Hugging Face open‑source models to production on Microsoft Foundry | DEM320

Uploaded: 2026-06-04T11:26:07+00:00

Jun 4, 2026 by Vaidyaraman Sambasivam, Osi Otugo, Jean Boudier

Vaidyaraman Sambasivam, Osi Otugo, and Jean Boudier demonstrate an end-to-end flow for taking Hugging Face open-source models from discovery to production inference using Foundry Managed Compute in Azure AI Foundry, focusing on scaling, governance, and avoiding direct GPU management.

Overview

This lightning talk (DEM320, Microsoft Build 2026) focuses on operationalizing open-source models in production by deploying and scaling Hugging Face models on Azure using Foundry Managed Compute inside Azure AI Foundry.

What the session covers

Deploying Hugging Face models on Azure via Foundry

The speakers position open-source models as a strong option for modern AI workloads, while calling out that productionizing them is often difficult.
They show an end-to-end path from model discovery, through deployment, to production inference.
The core pitch is that Managed Compute reduces operational burden by removing the need to manage GPUs directly.
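
To make the final step of that path (production inference) more concrete, here is a minimal sketch of how a client might build a scoring request for a deployed model. It is not taken from the session. The endpoint URL, the API key, and the `inputs`/`parameters` payload shape are placeholder assumptions (the payload follows the convention commonly used for Hugging Face text-generation models), so substitute the values and schema of your own deployment.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the values from your own deployment.
ENDPOINT_URL = "https://example-endpoint.invalid/score"
API_KEY = "<your-api-key>"


def build_request(prompt: str, max_new_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a text-generation endpoint."""
    # Assumed payload shape; check the schema your deployed model expects.
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


request = build_request("Summarize the benefits of managed inference.")

# Once the placeholders are real, sending the request is a single call:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The request is only constructed here, not sent, so the sketch runs without network access or credentials.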

Production concerns addressed

- Autoscaling for inference workloads
- Governance controls suitable for enterprise environments
- Enterprise-grade performance characteristics when running open models in production
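
The summary does not say how Foundry decides when to scale, so the following is only a generic illustration of what autoscaling an inference workload involves. It uses a simple proportional rule (the same general idea as target-tracking autoscalers): size the replica count so that the load per replica approaches a target. The function name, the load metric, and the default limits are all hypothetical.

```python
import math


def desired_replicas(
    current: int,
    observed_load: float,
    target_load: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Return a replica count that moves per-replica load toward the target.

    observed_load and target_load are per-replica values in the same unit
    (for example, requests per second or GPU utilization percent).
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    # Proportional rule: scale the fleet by the ratio of observed to target load.
    raw = math.ceil(current * observed_load / target_load)
    # Clamp to the configured bounds (illustrative defaults, not service limits).
    return max(min_replicas, min(max_replicas, raw))


scale_out = desired_replicas(current=2, observed_load=90, target_load=60)
scale_in = desired_replicas(current=4, observed_load=15, target_load=60)
capped = desired_replicas(current=5, observed_load=300, target_load=60)
```

With these inputs the rule scales out when replicas are overloaded, scales in when they are mostly idle, and never exceeds the configured maximum. A managed service handles this kind of decision for you, which is part of the operational burden the speakers describe avoiding.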

Ownership and control with self-hosted weights

The session highlights that self-hosting model weights lets teams retain ownership of their models, emphasizing transparency, flexibility, and control.

Hugging Face ecosystem context

The speakers note the breadth of the Hugging Face ecosystem, including its large number of public models and datasets.

Runtime and hardware considerations

The talk mentions runtime optimizations for different types of models, including multimodal workloads.
It also demonstrates simplified GPU selection using Managed Compute.

Demo: deploy a model and use it in an agent

The demo deploys a model and then adds that model to an agent.
In the example shown, the agent retrieves NBA Finals results using web integration.

Speakers

Vaidyaraman Sambasivam
Osi Otugo
Jean Boudier