Shipping custom models at scale from fine-tuning to inference | BRK234

Rob Ferguson leads a Microsoft Build 2026 panel with Daniel Han (Unsloth), Mark Saroufim (Stealth Startup), and other practitioners on how teams customize models, take them to production, and run them efficiently at scale.

Overview

The panel covers practical considerations for moving from fine-tuning to production inference: trade-offs between techniques, infrastructure choices, and performance and cost optimization.

Fine-tuning and production trade-offs

Reinforcement learning (RL): challenges and limits

Efficiency techniques: LoRA and collaboration

Inference performance: GPU and kernel-level optimization

Model math considerations for long contexts