Using autonomous SRE to move from alerts to action | OD800

Uploaded: 2026-06-03T14:27:41+00:00

Jun 3, 2026 by Vyom Nagrani, Deepthi Chelupati

Vyom Nagrani and Deepthi Chelupati explore how autonomous SRE can turn incident signals into concrete actions, combining AI-agent workflows with observability, SLO practices, and governance controls.

Overview

The session focuses on modern SRE workflows in which AI agents do more than detect and summarize incidents: they can also propose and execute mitigations under defined constraints.

Key themes covered include:

From alerts to action with autonomous remediation

Using AI agents to move incidents from detection through diagnosis to mitigation.
Emphasis on reducing downtime by catching issues earlier and speeding up the response when incidents do occur.

SLO management and deep observability

Positioning SLOs as a control surface for reliability work.
Using deep observability signals to support diagnosis, hypothesis validation, and decision-making during incidents.
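One common way to treat an SLO as a control surface is to gate automated action on how fast the error budget is burning. The sketch below assumes that interpretation; it is not the session's implementation. `SloWindow`, `burn_rate`, and `should_act` are hypothetical names, and the threshold of 14.4 is the example value the Google SRE Workbook uses for a 1-hour window against a 30-day SLO.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts observed over one evaluation window (hypothetical schema)."""
    total_requests: int
    failed_requests: int


def burn_rate(window: SloWindow, slo_target: float) -> float:
    """Ratio of the observed error rate to the error rate the SLO allows.

    1.0 means the budget is consumed exactly over the SLO period;
    larger values exhaust it early.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1")
    if window.total_requests == 0:
        return 0.0
    error_rate = window.failed_requests / window.total_requests
    error_budget = 1.0 - slo_target
    return error_rate / error_budget


def should_act(long_window: SloWindow, short_window: SloWindow,
               slo_target: float, threshold: float = 14.4) -> bool:
    """Multi-window gate: act only if both windows burn above the threshold.

    Requiring the short window to agree avoids acting on a burn that has
    already stopped.
    """
    return (burn_rate(long_window, slo_target) >= threshold
            and burn_rate(short_window, slo_target) >= threshold)


if __name__ == "__main__":
    one_hour = SloWindow(total_requests=100_000, failed_requests=2_000)
    five_min = SloWindow(total_requests=8_000, failed_requests=200)
    print(should_act(one_hour, five_min, slo_target=0.999))  # True
```

The two-window check is a design choice: the long window shows the burn is significant, and the short window shows it is still happening, so an agent does not start a mitigation for an incident that has already recovered.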

Agent workflow testing via PR-based changes

A demo workflow in which the agent is tested by opening a new pull request and monitoring the resulting automated analysis.
Using PR activity as a structured way to validate agent behavior and outcomes.

Guardrails and human oversight

Automated mitigation is framed as operating within guardrails and under explicit human oversight.
Discussion of how teams can keep control while still benefiting from autonomous execution.

Custom logic and organizational adaptation

Extending the workflow with custom logic to fit organizational requirements.
Adapting agent behavior to local processes and constraints.

Live simulation: diagnosing a credential desync

A simulated incident scenario focused on diagnosing a credential desynchronization issue.
Confirming the root cause is highlighted as an explicit step in the workflow.
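Root cause confirmation for a credential desync can be framed as testing a hypothesis against fleet data. The sketch below is a hypothetical illustration, not the demo's actual logic: it assumes each instance reports which credential version it has loaded and how many auth failures it saw, and `InstanceState` and `confirm_credential_desync` are invented names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceState:
    """Hypothetical per-instance telemetry an agent might collect."""
    name: str
    credential_version: str   # version of the secret the instance has loaded
    auth_failures: int        # auth errors observed in the incident window


def confirm_credential_desync(instances: list[InstanceState],
                              current_version: str) -> dict:
    """Test the hypothesis 'auth failures are caused by stale credentials'.

    Confirmed only if the set of failing instances equals the set of
    instances holding a credential version other than current_version.
    """
    stale = {i.name for i in instances if i.credential_version != current_version}
    failing = {i.name for i in instances if i.auth_failures > 0}
    return {
        "confirmed": bool(failing) and stale == failing,
        "stale": sorted(stale),
        "failing": sorted(failing),
        # Evidence that contradicts the hypothesis, kept for the incident record.
        "failing_but_current": sorted(failing - stale),
        "stale_but_healthy": sorted(stale - failing),
    }


if __name__ == "__main__":
    fleet = [
        InstanceState("web-1", "v7", 0),
        InstanceState("web-2", "v6", 42),
        InstanceState("web-3", "v6", 17),
    ]
    result = confirm_credential_desync(fleet, current_version="v7")
    print(result["confirmed"], result["stale"])  # True ['web-2', 'web-3']
```

Requiring an exact set match is deliberately strict. A stale instance that received no traffic would block confirmation, which surfaces it in `stale_but_healthy` for a human to judge instead of letting the agent overclaim.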

Data-driven reasoning and use of past knowledge

The agent applies prior knowledge and available data to identify patterns.
Hypothesis validation is treated as a first-class step rather than stopping at a plausible guess.

Governance model: review vs autonomous run modes

A governance approach that supports two operating modes:
- Review mode (human review before actions)
- Autonomous run mode (agent executes within constraints)
Command validation hooks are discussed as a control mechanism.
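A command validation hook can combine the operating mode with a per-command policy. The following is a minimal sketch under stated assumptions, not the product's hook API: `RunMode`, `Decision`, `validate_command`, and the allow and deny lists are hypothetical, the `kubectl` commands only serve as familiar examples, and read-only diagnostics are assumed not to count as "actions" that need review.

```python
import re
import shlex
from enum import Enum


class RunMode(Enum):
    REVIEW = "review"          # mitigations wait for human approval
    AUTONOMOUS = "autonomous"  # allow-listed mitigations run without approval


class Decision(Enum):
    EXECUTE = "execute"
    NEEDS_APPROVAL = "needs_approval"
    REJECT = "reject"


# Hypothetical policy lists.
READ_ONLY = ["kubectl get", "kubectl describe", "kubectl logs"]
MITIGATIONS = ["kubectl rollout restart"]
DENY_PATTERNS = [re.compile(r"\bdelete\b"), re.compile(r"--force\b")]


def _starts_with(tokens: list[str], prefix: str) -> bool:
    """True if the command's tokens begin with the tokens of prefix."""
    p = prefix.split()
    return tokens[:len(p)] == p


def validate_command(command: str, mode: RunMode) -> Decision:
    """Decide whether an agent-proposed command may run in the given mode."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return Decision.REJECT
    if not tokens or any(p.search(command) for p in DENY_PATTERNS):
        return Decision.REJECT
    if any(_starts_with(tokens, p) for p in READ_ONLY):
        return Decision.EXECUTE
    if any(_starts_with(tokens, p) for p in MITIGATIONS):
        if mode is RunMode.AUTONOMOUS:
            return Decision.EXECUTE
        return Decision.NEEDS_APPROVAL
    # Unknown commands always go to a human, whatever the mode.
    return Decision.NEEDS_APPROVAL


if __name__ == "__main__":
    cmd = "kubectl rollout restart deployment/api"
    print(validate_command(cmd, RunMode.REVIEW).value)      # needs_approval
    print(validate_command(cmd, RunMode.AUTONOMOUS).value)  # execute
```

The ordering matters: deny rules are checked first so that no allow-list entry can override them, and the fallback for unrecognized commands is human review rather than execution, even in autonomous mode.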

Monitoring and evaluation of the system

Monitoring through session insights, dashboards, and evaluation metrics.
Focus on measuring outcomes and behavior to ensure reliability and safety of autonomous operations.
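The session does not spell out its evaluation metrics, so the following is only an illustrative sketch of outcome-focused measures an operator might compute from agent session records. `SessionRecord`, `evaluate`, and the three metrics chosen here are assumptions, not the product's schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class SessionRecord:
    """Hypothetical summary of one agent incident session."""
    incident_id: str
    mitigated: bool                        # did the agent's mitigation resolve it?
    human_override: bool                   # did a human reject or change the plan?
    minutes_to_mitigate: Optional[float]   # None if never mitigated


def evaluate(sessions: list[SessionRecord]) -> dict[str, float]:
    """Aggregate outcome and oversight metrics across sessions."""
    if not sessions:
        raise ValueError("no sessions to evaluate")
    n = len(sessions)
    times = [s.minutes_to_mitigate for s in sessions
             if s.mitigated and s.minutes_to_mitigate is not None]
    return {
        "mitigation_success_rate": sum(s.mitigated for s in sessions) / n,
        "human_override_rate": sum(s.human_override for s in sessions) / n,
        # Median is less sensitive than the mean to one long-running incident.
        "median_minutes_to_mitigate": median(times) if times else float("nan"),
    }


if __name__ == "__main__":
    sessions = [
        SessionRecord("INC-1", True, False, 12.0),
        SessionRecord("INC-2", True, True, 30.0),
        SessionRecord("INC-3", False, True, None),
        SessionRecord("INC-4", True, False, 8.0),
    ]
    print(evaluate(sessions))
```

Tracking the human override rate alongside success rate helps separate an agent that resolves incidents on its own from one that only succeeds because reviewers keep correcting it.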

Resources

https://aka.ms/build26-next-steps

Session context

Microsoft Build 2026 session: OD800
Level: Advanced
Topic area: Developer tools & frameworks