Orchestrate special agents with NVIDIA Nemotron models on Foundry | BRKSP94

Name: Orchestrate special agents with NVIDIA Nemotron models on Foundry | BRKSP94
Uploaded: 2026-06-03T15:13:35+00:00
Description: Aysen Ilkbahar, Stephen McCullough, and Joey Conway explain how to optimize enterprise agentic AI using a tiered system-of-models architecture in Microsoft...

Jun 3, 2026 by Aysen Ilkbahar, Stephen McCullough, Joey Conway

Aysen Ilkbahar, Stephen McCullough, and Joey Conway present a Microsoft Build 2026 breakout on designing enterprise agentic AI with a tiered, system-of-models approach in Azure AI Foundry.

Overview

The session focuses on orchestrating “special agents” by routing work across multiple model tiers:

Frontier models for higher-end reasoning tasks
NVIDIA Nemotron models for complex sub-tasks
Local models for latency-sensitive execution (including edge scenarios)

The goal is to minimize cost per task while maximizing output quality, using a plan-and-execute orchestration pattern.

Key architecture ideas

Tiered system-of-models

The presenters describe a tiered architecture where requests are routed to different models depending on the task’s needs:

Use more capable (and typically more expensive) models when deeper reasoning is required.
Offload specific complex sub-tasks to Nemotron models.
Run local models when low latency is critical.
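
The three routing rules above can be sketched as a small decision function. This is a minimal illustration, not the presenters' implementation or a Foundry API: the `Task` fields, the `Tier` names, and the precedence (latency first, then reasoning depth) are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    FRONTIER = "frontier"  # more capable, typically more expensive
    NEMOTRON = "nemotron"  # complex sub-tasks
    LOCAL = "local"        # latency-sensitive execution


@dataclass
class Task:
    # Hypothetical task descriptor; the session summary does not define one.
    description: str
    needs_deep_reasoning: bool = False
    latency_critical: bool = False


def choose_tier(task: Task) -> Tier:
    """Pick a tier for a task.

    Assumed precedence: a latency-critical task always runs locally,
    a task needing deep reasoning goes to a frontier model, and
    everything else is offloaded to a Nemotron model.
    """
    if task.latency_critical:
        return Tier.LOCAL
    if task.needs_deep_reasoning:
        return Tier.FRONTIER
    return Tier.NEMOTRON
```

A real router would likely weigh these signals rather than apply a fixed order; the fixed order just keeps the sketch easy to read.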

Plan-and-execute orchestration

The session highlights a plan-and-execute pattern where an orchestrator:

Plans the steps needed to complete a task
Routes each step to the most appropriate model tier
Executes steps and composes results
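
The plan, route, and execute steps above can be sketched as a short loop. Everything here is hypothetical: the hard-coded `plan` function stands in for what would presumably be a model call, and the "models" are stub callables rather than real Foundry or Nemotron endpoints.

```python
from typing import Callable, Dict, List, Tuple

# A "model" in this sketch is any callable from prompt text to response text.
Model = Callable[[str], str]


def plan(task: str) -> List[Tuple[str, str]]:
    """Hypothetical planner returning (tier, step) pairs.

    Hard-coded for illustration; a real planner would decide the steps
    and their tiers dynamically.
    """
    return [
        ("frontier", f"outline approach for: {task}"),
        ("nemotron", f"carry out main sub-task for: {task}"),
        ("local", f"format result for: {task}"),
    ]


def execute(task: str, models: Dict[str, Model]) -> str:
    """Run each planned step on its assigned tier and compose the results."""
    results = []
    for tier, step in plan(task):
        results.append(models[tier](step))
    return "\n".join(results)


# Stub models that echo the tier name so the routing is visible.
stub_models: Dict[str, Model] = {
    tier: (lambda prompt, tier=tier: f"[{tier}] {prompt}")
    for tier in ("frontier", "nemotron", "local")
}

composed = execute("fix flaky test", stub_models)
```

Composition here is plain concatenation; an orchestrator could equally pass each step's output into the next step's prompt.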

Cloud + edge routing

A core theme is routing workloads across cloud and edge tiers to:

Reduce latency for time-sensitive steps
Control spend by avoiding unnecessary frontier-model usage
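
One way to express both goals at once is to pick the cheapest tier that still meets a step's latency budget and capability requirement. The sketch below assumes this selection rule; the tier names, latency figures, costs, and capability scale are invented for illustration and do not come from the session.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TierProfile:
    name: str
    typical_latency_ms: float  # made-up figure
    cost_per_call: float       # made-up figure, arbitrary units
    capability: int            # made-up scale, higher means more capable


# Illustrative profiles only; real numbers depend on the deployment.
TIERS: List[TierProfile] = [
    TierProfile("local-edge", 50, 0.0, 1),
    TierProfile("nemotron-cloud", 400, 1.0, 2),
    TierProfile("frontier-cloud", 1500, 10.0, 3),
]


def cheapest_adequate_tier(
    min_capability: int,
    latency_budget_ms: float,
    tiers: List[TierProfile] = TIERS,
) -> Optional[TierProfile]:
    """Return the lowest-cost tier that is capable and fast enough.

    Returns None when no tier satisfies both constraints, leaving the
    caller to relax the budget or the capability requirement.
    """
    candidates = [
        t for t in tiers
        if t.capability >= min_capability
        and t.typical_latency_ms <= latency_budget_ms
    ]
    return min(candidates, key=lambda t: t.cost_per_call, default=None)
```

With these invented numbers, a step that needs capability 2 and tolerates 2000 ms resolves to the Nemotron tier rather than the frontier tier, which is the "avoid unnecessary frontier-model usage" behavior described above.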

Nemotron model family highlights

The talk includes an overview of the NVIDIA Nemotron open model family, including:

Capabilities and positioning of Nemotron models for enterprise agentic workloads
An announcement of Nemotron 3 Ultra as NVIDIA’s most capable open model
Emphasis on open publication and building community confidence

Enterprise agent operations: observability and audit

The session includes an enterprise-focused demonstration that calls out operational requirements for agents, including:

Observability for agent actions
An audit trail for what the agent did and why
A workflow involving pull request (PR) creation, review, and monitoring changes
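
An audit trail of "what the agent did and why" can be as simple as an append-only list of structured events. The sketch below is an assumption about what such a record might contain; the `AuditEvent` fields, the agent name, and the JSON Lines export are hypothetical and are not the format shown in the demo.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class AuditEvent:
    agent: str   # which agent acted
    action: str  # what it did
    reason: str  # why it did it
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class AuditTrail:
    """Append-only, in-memory record of agent actions."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, agent: str, action: str, reason: str) -> AuditEvent:
        event = AuditEvent(agent, action, reason)
        self._events.append(event)
        return event

    def to_jsonl(self) -> str:
        """Serialize events as JSON Lines, one event per line."""
        return "\n".join(json.dumps(asdict(e)) for e in self._events)


# Hypothetical events loosely mirroring the PR workflow in the list above.
trail = AuditTrail()
trail.record("pr-agent", "create_pull_request", "proposed fix for failing check")
trail.record("pr-agent", "request_review", "change touches shared module")
exported = trail.to_jsonl()
```

A production system would persist these events to durable storage; the in-memory list only shows the shape of the record.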

Foundry-hosted agents and persistent learning

The presenters discuss Foundry-hosted Hermes agents, covering:

“Special agents” as part of the orchestration approach
Persistent learning concepts for improving task completion over time

Session chapters (from the video)

00:00:00 – Introduction to NVIDIA announcements and partnerships
00:05:46 – Overview of Nemotron open model family and capabilities
00:07:52 – Announcement of Nemotron 3 Ultra
00:08:55 – Open publication and community confidence
00:18:35 – Vision of a digital workforce and agent-based future
00:18:54 – Hermes orchestration overview
00:26:31 – PR creation demo; observability and audit trail
00:27:15 – Reviewing PR and monitoring changes
00:34:35 – Foundry-hosted Hermes agents and persistent learning