Agent supervision is the new senior engineering skill | BRK244
swyx (Shawn Wang) argues that as AI agents start shipping code, “agent supervision” becomes a core senior engineering skill: scoping agent work, setting constraints, designing checkpoints, and reviewing outputs for correctness and security.
Overview
This Microsoft Build 2026 breakout focuses on practical ways to keep humans accountable while letting coding agents move quickly. The session frames the main bottleneck as trust, and proposes supervision patterns to reduce silent failures and improve correctness and security when agents produce code.
What “agent supervision” means in practice
The talk positions supervision as a senior-level responsibility that includes:
- Scoping what an agent is allowed to do (clear boundaries and goals)
- Setting constraints (what must not change, what must be verified)
- Designing checkpoints (where humans or automated gates must review)
- Reviewing outputs for:
- Correctness
- Security
Patterns to catch silent failures
The session highlights workflow patterns intended to detect failures that may not be obvious when an agent appears to “complete” a task:
- Strong specifications to reduce ambiguity
- Checkpoints and review stages to validate intermediate outputs
- Strong test suites as a primary safety net
Validation and hardening mindset
The talk discusses the need for hardening and validation as agent-generated code increases, including:
- “Goal loops” for self-verifying code (agents iterating toward a validated outcome)
- Using layered defenses (described via the Swiss cheese model) rather than relying on a single control
Broader ecosystem context (from session chapters)
The session also touches on:
- Early discussion of the Microsoft GitHub Copilot launch and broader industry rebranding trends
- Rapid growth of AI-generated code and multi-domain agent applications
- Microsoft’s vision for personalized and company-trained models
- Hardware trends: faster AI inference chips and custom hardware demonstrations
- Model landscape notes, including GPT-4.1, Llama 3, and the value of open weights
- Ongoing growth in model sizes and limits on compression