Smaller, faster, smarter: Distilling models with fine‑tuning | DEM322

William Liang demonstrates how teams use Azure AI Foundry to distill large models into smaller, task-focused language models using supervised fine-tuning, with an emphasis on reducing production latency and cost while maintaining accuracy through structured evaluation.

Overview

Large language models can be powerful but expensive to run in production. This Build 2026 demo focuses on practical techniques for creating smaller models that are cheaper and faster at inference time while still performing well on specific tasks.

Key ideas covered

Why distillation now

Distillation + fine-tuning workflow

Evaluation approach

Azure AI Foundry demonstration

Example scenario discussed

Refund and cancellation behavior comparison

Takeaways

Resources