Turn foundation models into production AI on Microsoft Foundry | BRKSP91

Name: Turn foundation models into production AI on Microsoft Foundry | BRKSP91
Uploaded: 2026-06-04T10:46:17+00:00
Description: Vivek Chauhan explains how to move from generic foundation models to production-ready, use case-specific AI by combining Fireworks AI training/inference...

Jun 4, 2026 by Vivek Chauhan

Vivek Chauhan explains how to move from generic foundation models to production-ready, use case-specific AI by combining Fireworks AI training/inference capabilities with Microsoft (Azure) AI Foundry, focusing on practical patterns to reduce cost and latency and deploy at scale.

Overview

This Microsoft Build 2026 breakout covers how teams can operationalize foundation models by:

Customizing models for specific use cases using managed training and post-training approaches.
Improving inference performance (cost and latency) using an optimized LLM serving/inference engine.
Deploying and integrating models through Microsoft (Azure) AI Foundry, including using a model catalog and connecting deployed models to Azure agents.

Session segments (from the published chapters)

Fireworks' inference engine and LLM serving optimization

Focuses on optimization challenges in large language model (LLM) serving.
Frames the goal as reducing latency and cost while keeping production reliability.

Flexible training options for teams at different stages

Discusses how organizations can “own their AI stack” with training options that fit different maturity levels.
Mentions managed training, APIs aimed at researchers, and one-click deployment.

PTU mode for production workloads

Explains PTU (Provisioned Throughput Unit) mode as a way to run production workloads with predictable throughput.

Live demo: model catalog on Azure AI Foundry

Demonstrates exploring the Fireworks model catalog within Azure AI Foundry.

Integrating deployed models into Azure agents

Shows how deployed models can be connected into Azure agents for application use.

Case study discussion: open-weight models and post-training

Discusses a real-world strategy (Harvey) using open-weight models and post-training to achieve domain expertise and support governance needs.

Partnership value and closing guidance

Summarizes the Microsoft Foundry + Fireworks partnership positioning for enterprise-grade AI deployment.
Closes with advice aimed at founders and enterprises on using managed AI infrastructure to shorten time-to-production.