Observe and control agents across any framework with open source tools | BRK250
Sarah Bird, Sandeep Atluri, and Mehrnoosh Sameki present a Microsoft Build 2026 breakout on shipping AI agents safely at enterprise scale, with a focus on governance, reliability, and controls that work across Microsoft Agent Framework and open-source stacks.
Overview
As AI agents move into production, the session focuses on how developers can own safety, governance, and reliability end to end:
- Turning requirements into context-aware evaluations
- Stress-testing agents against adversarial risks
- Applying open controls that work across frameworks
- Keeping humans in the loop for high-stakes actions
Key topics covered
Common failure modes for AI agents
The session frames four major ways agents can fail:
- Instruction failures
- Information integrity failures
- Tool misuse
- Emergent behavior
Defining risks and roles as configuration
- Agent risks and roles are defined in a YAML format.
Rubric-based judging and evaluation
- Introduces a rubric-based judge and an evaluation process for assessing agent behavior against requirements.
Automated test set creation
- Discusses automated creation of test sets, including:
- Singleton scenarios
- Multi-turn scenarios
Safety regression and system controls
- Covers the idea of AI safety regression and the need for system controls to prevent regressions as agents evolve.
Agent Control Specification (ACS)
- Introduces Agent Control Specification (ACS) as a way to unify control logic across frameworks.
- Explains how ACS operates between an agent runtime and a policy engine.
Continuous evaluations and attacker simulation
- Introduces continuous evaluations.
- Discusses reinforcement learning-based attackers as part of stress testing.