Turn foundation models into production AI on Microsoft Foundry | BRKSP91
Vivek Chauhan explains how to turn generic foundation models into production-ready, use case-specific AI by combining Fireworks AI's training and inference capabilities with Microsoft (Azure) AI Foundry. The session focuses on practical patterns for reducing cost and latency and for deploying at scale.
Overview
This Microsoft Build 2026 breakout covers how teams can operationalize foundation models by:
- Customizing models for specific use cases using managed training and post-training approaches.
- Improving inference performance (cost and latency) using an optimized LLM serving/inference engine.
- Deploying and integrating models through Microsoft (Azure) AI Foundry, including using a model catalog and connecting deployed models to Azure agents.
Session segments (from the published chapters)
Fireworks' inference engine and LLM serving optimization
- Focuses on optimization challenges in large language model (LLM) serving.
- Frames the goal as reducing latency and cost while keeping production reliability.
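To make the latency and cost goal concrete, here is a minimal sketch of how a team might summarize a batch of serving measurements before and after an optimization. It is not taken from the session: the function name `summarize_serving`, its parameters, and the per-million-token pricing model are all illustrative assumptions.

```python
import statistics


def summarize_serving(latencies_ms, total_tokens, price_per_million_tokens):
    """Summarize latency percentiles and average cost for a batch of requests.

    All inputs are hypothetical measurements supplied by the caller:
    latencies_ms is one end-to-end latency per request, total_tokens is the
    token count across the whole batch, and price_per_million_tokens is an
    assumed flat price (real pricing may differ by model and token type).
    """
    if len(latencies_ms) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=100) returns 99 cut points; index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    total_cost = total_tokens / 1_000_000 * price_per_million_tokens
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "cost_per_request": total_cost / len(latencies_ms),
    }
```

Tracking a tail percentile such as p95 alongside the median matters here because a serving change can lower average latency while making the slowest requests worse.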
Flexible training options for teams at different stages
- Discusses how organizations can “own their AI stack” with training options that fit different maturity levels.
- Mentions managed training, APIs aimed at researchers, and one-click deployment.
PTU mode for production workloads
- Explains PTU (Provisioned Throughput Unit) mode as a way to run production workloads with predictable throughput.
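As a rough illustration of what planning for predictable throughput can involve, the sketch below estimates how many provisioned units a workload might need. The session summary gives no sizing formula, so everything here is an assumption: the function name `units_needed`, the tokens-per-minute capacity per unit, and the 20% headroom default are hypothetical placeholders, not published figures.

```python
import math


def units_needed(peak_requests_per_min, tokens_per_request,
                 tokens_per_min_per_unit, headroom=0.2):
    """Estimate provisioned units for a peak workload.

    tokens_per_min_per_unit is an assumed capacity figure; the real value
    depends on the model and the provider's published sizing guidance.
    headroom adds a safety margin on top of the measured peak.
    """
    if tokens_per_min_per_unit <= 0:
        raise ValueError("tokens_per_min_per_unit must be positive")
    peak_tokens_per_min = peak_requests_per_min * tokens_per_request
    # Round up: a fractional unit cannot be provisioned.
    return math.ceil(peak_tokens_per_min * (1 + headroom) / tokens_per_min_per_unit)


# Hypothetical workload: 300 requests/min at about 1,500 tokens each,
# against an assumed 50,000 tokens/min per unit.
print(units_needed(300, 1500, 50_000))
```

Sizing against peak rather than average load is the conservative choice for provisioned capacity, since throughput beyond what is reserved is not guaranteed.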
Live demo: model catalog on Azure AI Foundry
- Demonstrates exploring the Fireworks model catalog within Azure AI Foundry.
Integrating deployed models into Azure agents
- Shows how deployed models can be connected into Azure agents for application use.
Case study discussion: open-weight models and post-training
- Discusses Harvey's real-world strategy of using open-weight models and post-training to build domain expertise and meet governance needs.
Partnership value and closing guidance
- Summarizes how the Microsoft Foundry and Fireworks partnership is positioned for enterprise-grade AI deployment.
- Closes with advice aimed at founders and enterprises on using managed AI infrastructure to shorten time-to-production.