Using autonomous SRE to move from alerts to action | OD800

Vyom Nagrani and Deepthi Chelupati explore how autonomous SRE can turn incident signals into concrete actions, combining AI-agent workflows with observability, SLO practices, and governance controls.

Overview

The session focuses on modern SRE workflows in which AI agents do more than detect and summarize incidents: they can also propose and execute mitigations within defined constraints.

Key themes covered include:

From alerts to action with autonomous remediation

SLO management and deep observability

Agent workflow testing via PR-based changes

Guardrails and human oversight

Custom logic and organizational adaptation

Live simulation: diagnosing a credential desync

Data-driven reasoning and use of past knowledge

Governance model: review vs autonomous run modes

Monitoring and evaluation of the system
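As a rough illustration of how the review vs autonomous run modes and guardrails above might interact, here is a minimal Python sketch of a gate for agent-proposed mitigations. Everything in it is an assumption for illustration: the `RunMode` and `Mitigation` types, the `decide` function, the `ALLOWED_ACTIONS` allow-list, and the action names are hypothetical and do not come from the session or from any specific product.

```python
from dataclasses import dataclass
from enum import Enum


class RunMode(Enum):
    REVIEW = "review"          # a human must approve each proposed action
    AUTONOMOUS = "autonomous"  # permitted actions may run without a human


@dataclass(frozen=True)
class Mitigation:
    action: str  # hypothetical action name, e.g. "rotate_credential"
    target: str  # hypothetical resource name


# Hypothetical allow-list: the constraints the agent must stay within.
ALLOWED_ACTIONS = {"restart_service", "rotate_credential", "scale_out"}


def decide(m: Mitigation, mode: RunMode, approved_by_human: bool = False) -> str:
    """Return what should happen to a mitigation proposed by an agent."""
    if m.action not in ALLOWED_ACTIONS:
        return "blocked"  # outside the guardrails, regardless of run mode
    if mode is RunMode.AUTONOMOUS:
        return "execute"  # permitted action, no human sign-off required
    # Review mode: wait for a human unless approval has already been given.
    return "execute" if approved_by_human else "await_approval"


if __name__ == "__main__":
    fix = Mitigation("rotate_credential", "payments-db")
    print(decide(fix, RunMode.REVIEW))                          # await_approval
    print(decide(fix, RunMode.REVIEW, approved_by_human=True))  # execute
    print(decide(fix, RunMode.AUTONOMOUS))                      # execute
    print(decide(Mitigation("delete_database", "payments-db"),
                 RunMode.AUTONOMOUS))                           # blocked
```

The allow-list check runs before the mode check, so an action outside the guardrails stays blocked even in autonomous mode; the run mode only determines whether a permitted action needs human sign-off first.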

Resources

Session context