Observe and control agents across any framework with open source tools | BRK250

Name: Observe and control agents across any framework with open source tools | BRK250
Uploaded: 2026-06-03T08:52:06+00:00
Description: Sarah Bird, Sandeep Atluri, and Mehrnoosh Sameki explain how to govern AI agents end to end in production, focusing on safety, reliability, and human...

Jun 3, 2026 by Sarah Bird, Sandeep Atluri, Mehrnoosh Sameki

Sarah Bird, Sandeep Atluri, and Mehrnoosh Sameki present a Microsoft Build 2026 breakout on shipping AI agents safely at enterprise scale, with a focus on governance, reliability, and controls that work across Microsoft Agent Framework and open-source stacks.

Overview

As AI agents move into production, the session focuses on how developers can own safety, governance, and reliability end to end:

Turning requirements into context-aware evaluations
Stress-testing agents against adversarial risks
Applying open controls that work across frameworks
Keeping humans in the loop for high-stakes actions

Key topics covered

Common failure modes for AI agents

The session frames four major ways agents can fail:

Instruction failures
Information integrity failures
Tool misuse
Emergent behavior

Defining risks and roles as configuration

Agent risks and roles are defined in a YAML format.

Rubric-based judging and evaluation

Introduces a rubric-based judge and an evaluation process for assessing agent behavior against requirements.

Automated test set creation

Discusses automated creation of test sets, including:
- Singleton scenarios
- Multi-turn scenarios

Safety regression and system controls

Covers the idea of AI safety regression and the need for system controls to prevent regressions as agents evolve.

Agent Control Specification (ACS)

Introduces Agent Control Specification (ACS) as a way to unify control logic across frameworks.
Explains how ACS operates between an agent runtime and a policy engine.

Continuous evaluations and attacker simulation

Introduces continuous evaluations.
Discusses reinforcement learning-based attackers as part of stress testing.