Orchestrate special agents with NVIDIA Nemotron models on Foundry | BRKSP94
Aysen Ilkbahar, Stephen McCullough, and Joey Conway present a Microsoft Build 2026 breakout on designing enterprise agentic AI with a tiered, system-of-models approach in Azure AI Foundry.
Overview
The session focuses on orchestrating “special agents” by routing work across multiple model tiers:
- Frontier models for higher-end reasoning tasks
- NVIDIA Nemotron models for complex sub-tasks
- Local models for latency-sensitive execution (including edge scenarios)
The goal is to minimize cost per task while maximizing output quality, using a plan-and-execute orchestration pattern.
Key architecture ideas
Tiered system-of-models
The presenters describe a tiered architecture where requests are routed to different models depending on the task’s needs:
- Use more capable (and typically more expensive) models when deeper reasoning is required.
- Offload specific complex sub-tasks to Nemotron models.
- Run local models when low latency is critical.
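The routing rule above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the `Task` fields, the `Tier` names, and the precedence (latency first, then reasoning depth, with Nemotron as the default for other sub-tasks) are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Hypothetical labels for the three tiers described in the session."""
    FRONTIER = "frontier"
    NEMOTRON = "nemotron"
    LOCAL = "local"


@dataclass
class Task:
    """Hypothetical task descriptor; the flags are illustrative, not a Foundry schema."""
    description: str
    needs_deep_reasoning: bool = False
    latency_critical: bool = False


def choose_tier(task: Task) -> Tier:
    """Pick a tier for a task.

    Assumed precedence: latency wins over reasoning depth, and
    anything else is treated as a sub-task for a Nemotron model.
    """
    if task.latency_critical:
        return Tier.LOCAL
    if task.needs_deep_reasoning:
        return Tier.FRONTIER
    return Tier.NEMOTRON


if __name__ == "__main__":
    for t in (
        Task("read a sensor value and react", latency_critical=True),
        Task("design a migration strategy", needs_deep_reasoning=True),
        Task("extract fields from a log file"),
    ):
        print(f"{t.description} -> {choose_tier(t).value}")
```

A real router would probably derive these flags from the request itself, for example from a classifier or from the planner's output, rather than relying on hand-set booleans.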
Plan-and-execute orchestration
The session highlights a plan-and-execute pattern where an orchestrator:
- Plans the steps needed to complete a task
- Routes each step to the most appropriate model tier
- Executes steps and composes results
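The three orchestrator responsibilities above can be sketched as a small loop. Everything named here is hypothetical: `plan` returns a fixed list where a real planner would likely call a model, and the handlers are stubs standing in for calls to each model tier. No Foundry or Nemotron API is used or implied.

```python
from typing import Callable, Dict, List, Tuple

# A step is (tier name, instruction). Both are plain strings in this sketch.
Step = Tuple[str, str]
Handler = Callable[[str], str]


def plan(task: str) -> List[Step]:
    """Hypothetical planner: returns a fixed three-step plan for any task."""
    return [
        ("frontier", f"Decide the approach for: {task}"),
        ("nemotron", f"Work through the sub-task for: {task}"),
        ("local", f"Format the result for: {task}"),
    ]


def run_task(task: str, handlers: Dict[str, Handler],
             planner: Callable[[str], List[Step]] = plan) -> str:
    """Plan the steps, route each one to its tier's handler, and compose the results."""
    results: List[str] = []
    for tier, instruction in planner(task):
        if tier not in handlers:
            raise ValueError(f"no handler registered for tier {tier!r}")
        results.append(handlers[tier](instruction))
    return "\n".join(results)


def make_stub(tier: str) -> Handler:
    """Build a stand-in handler that tags its output with the tier name."""
    def handler(instruction: str) -> str:
        return f"[{tier}] {instruction}"
    return handler


HANDLERS: Dict[str, Handler] = {
    tier: make_stub(tier) for tier in ("frontier", "nemotron", "local")
}

if __name__ == "__main__":
    print(run_task("triage a failing build", HANDLERS))
```

Keeping the planner and the handlers as plain callables means either side can be swapped out independently, which is one way to change the model behind a tier without touching the orchestration loop.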
Cloud + edge routing
A core theme is routing workloads across cloud and edge tiers to:
- Reduce latency for time-sensitive steps
- Control spend by avoiding unnecessary frontier-model usage
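One way to picture the latency and spend trade-off is a "cheapest tier that fits" selection. The sketch below is an assumption-heavy illustration: the tier names, costs, latencies, and capability ranks are placeholder numbers chosen for the example, not measurements or pricing from the session.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TierProfile:
    """Hypothetical profile of one tier; all values are placeholders."""
    name: str
    cost_per_call: float  # arbitrary cost units
    latency_ms: float     # assumed typical latency
    capability: int       # 1 = lowest, 3 = highest (illustrative ranking)


# Placeholder numbers for illustration only.
TIERS: List[TierProfile] = [
    TierProfile("edge-local", cost_per_call=0.1, latency_ms=50, capability=1),
    TierProfile("nemotron-cloud", cost_per_call=1.0, latency_ms=400, capability=2),
    TierProfile("frontier-cloud", cost_per_call=5.0, latency_ms=1500, capability=3),
]


def cheapest_fit(tiers: List[TierProfile], min_capability: int,
                 latency_budget_ms: float) -> Optional[TierProfile]:
    """Return the lowest-cost tier that meets both constraints, or None if none does."""
    candidates = [
        t for t in tiers
        if t.capability >= min_capability and t.latency_ms <= latency_budget_ms
    ]
    return min(candidates, key=lambda t: t.cost_per_call, default=None)


if __name__ == "__main__":
    choice = cheapest_fit(TIERS, min_capability=2, latency_budget_ms=2000)
    print(choice.name if choice else "no tier satisfies the constraints")
```

When no tier satisfies both constraints, the function returns `None`, leaving the caller to decide whether to relax the latency budget or accept a less capable model.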
Nemotron model family highlights
The talk includes an overview of the NVIDIA Nemotron open model family, including:
- Capabilities and positioning of Nemotron models for enterprise agentic workloads
- An announcement of Nemotron 3 Ultra as NVIDIA’s most capable open model
- Emphasis on open publication and building community confidence
Enterprise agent operations: observability and audit
The session includes an enterprise-focused demonstration that calls out operational requirements for agents, including:
- Observability for agent actions
- An audit trail for what the agent did and why
- A workflow covering pull request (PR) creation, review, and change monitoring
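An audit trail of "what the agent did and why" can be as simple as an append-only list of structured events. The sketch below assumes a hypothetical `AuditEvent` shape and serializes to JSON Lines using only the Python standard library. It does not represent how Foundry records agent activity.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


def _utc_now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AuditEvent:
    """Hypothetical audit record: who acted, what they did, and why."""
    agent: str
    action: str
    reason: str
    timestamp: str = field(default_factory=_utc_now)


class AuditTrail:
    """In-memory, append-only log of agent actions."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, agent: str, action: str, reason: str) -> AuditEvent:
        """Append one event and return it."""
        event = AuditEvent(agent=agent, action=action, reason=reason)
        self._events.append(event)
        return event

    def to_jsonl(self) -> str:
        """Serialize all events as JSON Lines, one event per line, in insertion order."""
        return "\n".join(json.dumps(asdict(e)) for e in self._events)


if __name__ == "__main__":
    trail = AuditTrail()
    trail.record("coding-agent", "open_pull_request", "fix requested by the plan")
    trail.record("review-agent", "review_pull_request", "change needs approval")
    print(trail.to_jsonl())
```

Storing the reason alongside the action is what makes each entry useful for audit rather than only for debugging. A production system would also need durable, tamper-resistant storage, which this sketch leaves out.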
Foundry-hosted agents and persistent learning
The presenters discuss Foundry-hosted Hermes agents and their benefits, covering:
- Their role as “special agents” in the orchestration approach
- Persistent learning to improve task completion over time
Session chapters (from the video)
- 00:00:00 – Introduction to NVIDIA announcements and partnerships
- 00:05:46 – Overview of Nemotron open model family and capabilities
- 00:07:52 – Announcement of Nemotron 3 Ultra
- 00:08:55 – Open publication and community confidence
- 00:18:35 – Vision of a digital workforce and agent-based future
- 00:18:54 – Hermes orchestration overview
- 00:26:31 – PR creation demo; observability and audit trail
- 00:27:15 – Reviewing PR and monitoring changes
- 00:34:35 – Foundry-hosted Hermes agents and persistent learning