Hugging Face open‑source models to production on Microsoft Foundry | DEM320
Vaidyaraman Sambasivam, Osi Otugo, and Jean Boudier demonstrate an end-to-end flow for taking Hugging Face open-source models from discovery to production inference using Foundry Managed Compute in Azure AI Foundry, focusing on scaling, governance, and avoiding direct GPU management.
Overview
This lightning talk (DEM320, Microsoft Build 2026) focuses on operationalizing open-source models in production by deploying and scaling Hugging Face models on Azure using Foundry Managed Compute inside Azure AI Foundry.
What the session covers
Deploying Hugging Face models on Azure via Foundry
- The speakers position open-source models as a strong option for modern AI workloads, while calling out that productionizing them is often difficult.
- They show an end-to-end path from:
- model discovery
- to deployment
- to production inference
- The core pitch is reducing operational burden by avoiding direct GPU management.
Production concerns addressed
- Autoscaling for inference workloads
- Governance controls suitable for enterprise environments
- Enterprise-grade performance characteristics when running open models in production
Ownership and control with self-hosted weights
- The session highlights the ability to keep ownership and control by using self-hosted model weights, emphasizing transparency, flexibility, and control.
Hugging Face ecosystem context
- Notes the breadth of the Hugging Face ecosystem, including large numbers of public models and datasets.
Runtime and hardware considerations
- Mentions runtime optimizations for different types of models, including multimodal workloads.
- Demonstrates simplified GPU selection using Managed Compute.
Demo: deploy a model and use it in an agent
- Demonstrates deploying a model and then adding the model to an agent.
- Shows an example of an agent retrieving NBA Finals results using web integration.
Resources
Speakers
- Vaidyaraman Sambasivam
- Osi Otugo
- Jean Boudier