Orchestrate special agents with NVIDIA Nemotron models on Foundry | BRKSP94

Aysen Ilkbahar, Stephen McCullough, and Joey Conway present a Microsoft Build 2026 breakout on designing enterprise agentic AI with a tiered, system-of-models approach in Azure AI Foundry.

Overview

The session focuses on orchestrating “special agents” by routing work across multiple model tiers.

The goal is to minimize cost-per-task while maximizing output quality, using a plan-and-execute style orchestration pattern.

Key architecture ideas

Tiered system-of-models

The presenters describe a tiered architecture in which each request is routed to a different model depending on what the task needs.
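
As a rough illustration of tiered routing, the sketch below picks the cheapest tier that still meets a task's capability requirement, which mirrors the stated goal of minimizing cost-per-task without giving up quality. The tier names, relative costs, and capability scores are hypothetical and do not come from the session; a real deployment would dispatch to hosted models rather than return a label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str             # hypothetical tier label
    cost_per_task: float  # hypothetical relative cost
    capability: int       # hypothetical capability score (higher = stronger)


# Hypothetical tiers for illustration only; not taken from the session.
TIERS = [
    Tier("edge-small", 0.1, 1),
    Tier("cloud-medium", 1.0, 2),
    Tier("cloud-large", 5.0, 3),
]


def route(required_capability: int, tiers=TIERS) -> Tier:
    """Return the cheapest tier whose capability meets the requirement."""
    eligible = [t for t in tiers if t.capability >= required_capability]
    if not eligible:
        raise ValueError(f"no tier can handle capability {required_capability}")
    return min(eligible, key=lambda t: t.cost_per_task)


if __name__ == "__main__":
    for need in (1, 2, 3):
        print(need, "->", route(need).name)
```

Choosing by minimum cost among eligible tiers is only one possible policy; a production router might also weigh latency, data residency, or measured output quality.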

Plan-and-execute orchestration

The session highlights a plan-and-execute pattern driven by an orchestrator.
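
A minimal sketch of the general plan-and-execute idea follows: a planner breaks a task into steps, and an executor runs each step in order. The function names (`plan`, `execute`, `orchestrate`) and the semicolon-splitting planner are assumptions made for illustration; the session's actual orchestrator and its model calls are not shown here.

```python
from typing import Callable


def plan(task: str) -> list[str]:
    """Stand-in planner: split the task into steps on ';'.

    A real planner would typically ask a model to produce the steps.
    """
    return [step.strip() for step in task.split(";") if step.strip()]


def execute(step: str) -> str:
    """Stand-in executor: a real one would call a model or tool for the step."""
    return f"done: {step}"


def orchestrate(
    task: str,
    planner: Callable[[str], list[str]] = plan,
    executor: Callable[[str], str] = execute,
) -> list[str]:
    """Plan first, then execute each planned step in order."""
    return [executor(step) for step in planner(task)]


if __name__ == "__main__":
    print(orchestrate("fetch data; summarize"))
```

Passing the planner and executor as parameters keeps the orchestration loop independent of which model handles each role, so the two roles could be served by different tiers.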

Cloud + edge routing

A core theme is routing workloads across cloud and edge tiers.

Nemotron model family highlights

The talk includes an overview of the NVIDIA Nemotron open model family.

Enterprise agent operations: observability and audit

The session includes an enterprise-focused demonstration that calls out operational requirements for agents, such as observability and audit.

Foundry-hosted agents and persistent learning

The presenters discuss the benefits of using Foundry-hosted Hermes agents, including persistent learning.

Session chapters (from the video)