Move AI workflows from test to production on Microsoft Foundry | DEMSP383
Vignesh Sridhar demonstrates how to run high-performance LLM inference on Microsoft Foundry using Fireworks AI, and how to take an AI workflow from testing into production with a unified deployment and evaluation flow focused on latency, cost, and quality metrics.
Overview
This Microsoft Build 2026 demo walks through an end-to-end workflow for moving an enterprise AI use case from test to production, running high-performance inference directly on Microsoft Foundry through the Fireworks AI integration.
Key themes covered in the session description and chapter outline:
- Scale and serving capabilities
  - Example scale figures discussed: 30 trillion tokens/day and 180,000 requests/second.
  - Overview of the Fireworks serving stack and workload-aware optimization.
- Model selection and test deployment
  - Selecting and deploying a model for testing (example: Kimi K2.6).
- Production-oriented deployment setup
  - Setting up a single-tenant deployment.
  - Validating performance, with a focus on latency and throughput.
- Choosing models based on practical constraints
  - Comparing options on latency, quality, and token usage.
  - Saving the chosen configuration as an agent.
- Evaluation workflow
  - Selecting datasets.
  - Mapping evaluation fields.
  - Configuring a judge model.
  - Tracking key evaluation metrics:
    - Relevance
    - Groundedness
    - Coherence
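The scale figures quoted under "Scale and serving capabilities" are easier to compare on a common per-second basis. The arithmetic below is a quick sketch, assuming both numbers are sustained averages over the same period. The session summary does not say whether 180,000 requests/second is a peak or an average, nor whether the token count includes both input and output tokens.

```python
# Convert the scale figures quoted in the session to a per-second basis.
# Assumption: both figures are sustained averages over the same period.
# The session summary does not say whether 180,000 requests/second is a peak.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tokens_per_day = 30e12          # 30 trillion tokens/day (quoted in the session)
requests_per_second = 180_000   # 180,000 requests/second (quoted in the session)

tokens_per_second = tokens_per_day / SECONDS_PER_DAY
implied_tokens_per_request = tokens_per_second / requests_per_second

print(f"{tokens_per_second:,.0f} tokens/second")
print(f"~{implied_tokens_per_request:,.0f} tokens/request (only if both are averages)")
```

Under that assumption, 30 trillion tokens/day works out to roughly 347 million tokens per second, or about 1,900 tokens per request. If 180,000 requests/second is a peak figure, the average request rate is lower and the implied tokens per request would be higher.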
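The performance-validation and model-comparison steps can be sketched as a small selection routine. It summarizes each candidate's load-test measurements (p50/p95 latency, output-token throughput, mean tokens per request). It then drops candidates that miss a p95 latency budget or a quality floor and prefers the highest quality, breaking ties by token usage. This is a minimal sketch under assumptions: the `CandidateRun` structure, the candidate names, the thresholds, and the sample numbers are hypothetical. It does not call any Foundry or Fireworks API and only post-processes measurements you have already collected.

```python
from dataclasses import dataclass
from statistics import mean, median, quantiles


@dataclass
class CandidateRun:
    """Load-test measurements for one candidate deployment (hypothetical structure)."""
    name: str
    latencies_s: list[float]   # end-to-end latency per request, seconds (needs >= 2 samples)
    output_tokens: list[int]   # tokens generated per request
    quality: float             # e.g. mean judge score on an evaluation set
    wall_clock_s: float        # total duration of the load test, seconds


def summarize(run: CandidateRun) -> dict[str, float]:
    """Latency percentiles, output-token throughput, and token usage for one run."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    # method="inclusive" interpolates within the observed range of samples.
    p95 = quantiles(run.latencies_s, n=100, method="inclusive")[94]
    return {
        "p50_s": median(run.latencies_s),
        "p95_s": p95,
        "throughput_tok_s": sum(run.output_tokens) / run.wall_clock_s,
        "mean_tokens": mean(run.output_tokens),
        "quality": run.quality,
    }


def choose(runs: list[CandidateRun], p95_budget_s: float, min_quality: float):
    """Drop runs over the latency budget or under the quality floor, then prefer
    higher quality and, on ties, fewer tokens per request.
    Returns (name, summary) for the winner, or None if nothing qualifies."""
    eligible = []
    for run in runs:
        stats = summarize(run)
        if stats["p95_s"] <= p95_budget_s and stats["quality"] >= min_quality:
            eligible.append((run.name, stats))
    if not eligible:
        return None
    return min(eligible, key=lambda item: (-item[1]["quality"], item[1]["mean_tokens"]))


# Hypothetical measurements for three candidates.
runs = [
    CandidateRun("candidate-a", [0.8, 0.9, 1.0, 1.1, 2.5], [200, 210, 190, 205, 195], 4.5, 2.0),
    CandidateRun("candidate-b", [0.5, 0.6, 0.6, 0.7, 0.8], [300, 320, 310, 290, 305], 4.2, 1.5),
    CandidateRun("candidate-c", [0.4, 0.4, 0.5, 0.5, 0.6], [150, 150, 150, 150, 150], 4.2, 1.0),
]

best = choose(runs, p95_budget_s=1.5, min_quality=4.0)
if best is not None:
    name, stats = best
    print(name, {k: round(v, 2) for k, v in stats.items()})
```

With the sample data, `candidate-a` is excluded because its p95 latency (about 2.22 s) exceeds the 1.5 s budget. `candidate-c` then beats `candidate-b` on the token-usage tie-break because both score 4.2. The order of criteria is a design choice; a cost-sensitive workload might rank by token usage before quality. With only a handful of samples per candidate, the p95 estimate is rough, so a real validation run would collect far more requests.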
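The evaluation workflow (select a dataset, map its fields, configure a judge, track relevance, groundedness, and coherence) can be sketched as a small offline pipeline. This is an illustration under stated assumptions: the column names, the 1–5 score scale, the pass threshold of 3, and `placeholder_judge` are all hypothetical. The placeholder uses a toy substring check for groundedness and fixed scores for the other metrics, whereas a real setup would prompt the configured judge model. No Foundry evaluation API is used here.

```python
from statistics import mean
from typing import Callable

METRICS = ("relevance", "groundedness", "coherence")

# Evaluator input -> dataset column. Column names are hypothetical examples.
FIELD_MAPPING = {"query": "question", "response": "answer", "context": "retrieved_docs"}

Judge = Callable[[dict[str, str]], dict[str, float]]


def map_fields(row: dict[str, str], mapping: dict[str, str]) -> dict[str, str]:
    """Rename a dataset row's columns to the inputs the judge expects."""
    missing = [column for column in mapping.values() if column not in row]
    if missing:
        raise KeyError(f"dataset row is missing columns: {missing}")
    return {field: row[column] for field, column in mapping.items()}


def placeholder_judge(inputs: dict[str, str]) -> dict[str, float]:
    """Hypothetical stand-in for a judge-model call so the sketch runs offline.
    Toy rule: 'grounded' means the response text appears verbatim in the context.
    Relevance and coherence get fixed scores; a real judge would prompt an LLM."""
    grounded = 5.0 if inputs["response"] in inputs["context"] else 2.0
    return {"relevance": 4.0, "groundedness": grounded, "coherence": 4.0}


def run_evaluation(rows, mapping, judge: Judge, threshold: float = 3.0):
    """Score every row with the judge; report mean score and pass rate per metric."""
    per_row = [judge(map_fields(row, mapping)) for row in rows]
    summary = {}
    for metric in METRICS:
        scores = [scored[metric] for scored in per_row]
        summary[metric] = {
            "mean": mean(scores),
            "pass_rate": sum(score >= threshold for score in scores) / len(scores),
        }
    return summary


dataset = [
    {"question": "What is the capital of France?", "answer": "Paris",
     "retrieved_docs": "Paris is the capital of France."},
    {"question": "Who wrote Hamlet?", "answer": "Christopher Marlowe",
     "retrieved_docs": "Hamlet was written by William Shakespeare."},
]

summary = run_evaluation(dataset, FIELD_MAPPING, placeholder_judge)
for metric, stats in summary.items():
    print(f"{metric}: mean={stats['mean']:.2f}, pass_rate={stats['pass_rate']:.0%}")
```

Mapping fields before calling the judge keeps the judge independent of any one dataset's column names, so the same judge configuration can be reused across datasets by changing only the mapping. Reporting a pass rate alongside the mean makes a few badly grounded answers visible even when the average looks acceptable.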
Session context
- Session: DEMSP383 (Demo, Intermediate)
- Event: Microsoft Build 2026
- Speaker: Vignesh Sridhar (Fireworks AI)
- More Build sessions: https://build.microsoft.com