Under the hood of Microsoft AI models | DEM323
Dave Citron (CVP, Microsoft AI) walks through what goes into training Microsoft's latest MAI model family. The session covers the new thinking, coding, voice, transcription, and image models, along with the architectural and evaluation choices behind their capabilities and performance.
Overview
Microsoft AI (MAI) presents a technical, “under the hood” look at a newly announced family of Microsoft models, including:
- Thinking models
- Coding models
- Voice models
- Transcription models
- Image models
The session focuses on what it takes to train these models, what the team learns during training and evaluation, and how those learnings are reflected in model architectures, features, and capabilities.
What’s covered (from the session description and chapter list)
Model family announcements and highlights
- Introduction of Transcribe 1.5, Voice 2, and Code 1 Flash, with performance highlights.
Voice 2: speech generation focus
- Launch of Voice 2 with emphasis on natural prosody.
- Showcase of fine-grained emotional control and multilingual availability.
- Demonstration of a specific emotional tone ("joy").
Evaluation and reporting
- Reference to a 100-page technical report describing model development.
- Discussion of benchmark performance and real-world code evaluation.
Tuning approach
- Introduction to Microsoft Frontier Tuning.
Real-world example
- Land O'Lakes: quality report generation.
Links
- Next steps and related Build resources: https://aka.ms/build26-next-steps
- Microsoft Build sessions: https://build.microsoft.com