Move AI workflows from test to production on Microsoft Foundry | DEMSP383

Uploaded: 2026-06-03T10:46:44+00:00

Jun 3, 2026 by Vignesh Sridhar

Vignesh Sridhar demonstrates how to run high-performance LLM inference on Microsoft Foundry using Fireworks AI, and how to take an AI workflow from testing into production with a unified deployment and evaluation flow focused on latency, cost, and quality metrics.

Overview

This Microsoft Build 2026 demo shows an end-to-end workflow for moving an enterprise AI use case from test to production by running high-performance inference directly on Microsoft Foundry through the Fireworks AI integration.

Key themes covered in the session description and chapter outline:

- Scale and serving capabilities
  - Example scale figures discussed: 30 trillion tokens/day and 180,000 requests/second.
  - Overview of the Fireworks serving stack and workload-aware optimization.
- Model selection and test deployment
  - Selecting and deploying a model for testing (example mentioned: Kimi K2.6).
- Production-oriented deployment setup
  - Setting up a single-tenant deployment.
  - Validating performance, with a focus on latency and throughput.
- Choosing models based on practical constraints
  - Comparing candidate models on latency, quality, and token usage.
  - Saving the chosen configuration as an agent.
- Evaluation workflow
  - Selecting datasets.
  - Mapping evaluation fields.
  - Configuring a judge model.
  - Tracking key evaluation metrics:
    - Relevance
    - Groundedness
    - Coherence
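
As a rough sanity check on the scale figures listed above, the following sketch converts 30 trillion tokens/day into a per-second rate and divides it by 180,000 requests/second. It assumes both figures are sustained averages over the same period; the session summary does not say whether 180,000 requests/second is an average or a peak, so the derived tokens-per-request value is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tokens_per_second(tokens_per_day: float) -> float:
    """Convert a daily token volume into an average per-second rate."""
    return tokens_per_day / SECONDS_PER_DAY


def avg_tokens_per_request(tokens_per_day: float, requests_per_second: float) -> float:
    """Average tokens per request, ASSUMING both rates are sustained averages."""
    return tokens_per_second(tokens_per_day) / requests_per_second


if __name__ == "__main__":
    daily_tokens = 30e12  # 30 trillion tokens/day, as quoted in the session
    rps = 180_000         # 180,000 requests/second, as quoted in the session
    print(f"{tokens_per_second(daily_tokens):,.0f} tokens/s")
    print(f"{avg_tokens_per_request(daily_tokens, rps):,.0f} tokens/request")
```

Under that assumption the figures work out to roughly 347 million tokens per second and about 1,929 tokens per request. If 180,000 requests/second is a peak rather than an average, the true average request would be larger than this estimate.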
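
The performance-validation step (latency and throughput) can be approximated with a small summary over your own load-test results. This is a minimal sketch, not Fireworks or Foundry tooling: it assumes you have already collected end-to-end request latencies in seconds over a known test window, and it reports nearest-rank p50/p95/p99 latency plus completed requests per second. Streaming LLM workloads often also track time to first token and output tokens per second, which this sketch does not cover. `LoadTestSummary`, `percentile`, and `summarize` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LoadTestSummary:
    p50_s: float
    p95_s: float
    p99_s: float
    throughput_rps: float


def percentile(sorted_values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of an already-sorted, non-empty sequence."""
    if not sorted_values:
        raise ValueError("need at least one latency sample")
    # Multiply before dividing so whole-number percents avoid float rounding error.
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(latencies_s: Sequence[float], window_s: float) -> LoadTestSummary:
    """Summarize end-to-end latencies collected over a test window of window_s seconds."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    ordered = sorted(latencies_s)
    return LoadTestSummary(
        p50_s=percentile(ordered, 50),
        p95_s=percentile(ordered, 95),
        p99_s=percentile(ordered, 99),
        throughput_rps=len(ordered) / window_s,  # completed requests per second
    )


if __name__ == "__main__":
    # Placeholder samples: 100 requests taking 1 ms to 100 ms, completed within 2 s.
    samples = [i / 1000 for i in range(1, 101)]
    print(summarize(samples, window_s=2.0))
```

Tail percentiles such as p95 and p99 are reported alongside the median because a production deployment is usually judged by its slowest common requests, not its typical one.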
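
Choosing a model under practical constraints can be framed as filtering candidates by latency and token budgets, then taking the highest-quality survivor. The session does not specify how these trade-offs are weighed, so the filter-then-maximize rule, the tie-break on fewer tokens, the `Candidate` fields, and every model name and number in this sketch are hypothetical placeholders, not measured results. Saving the winner as an agent is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str             # hypothetical model or deployment label
    p95_latency_s: float  # measured p95 end-to-end latency, in seconds
    quality: float        # e.g. mean judge score on an evaluation set
    avg_tokens: float     # mean tokens consumed per request (a cost proxy)


def choose(
    candidates: Sequence[Candidate], max_latency_s: float, max_tokens: float
) -> Optional[Candidate]:
    """Return the best candidate within both budgets, or None if none qualifies."""
    eligible = [
        c for c in candidates
        if c.p95_latency_s <= max_latency_s and c.avg_tokens <= max_tokens
    ]
    if not eligible:
        return None
    # Highest quality wins; among equal quality, fewer tokens (lower cost) wins.
    return max(eligible, key=lambda c: (c.quality, -c.avg_tokens))


if __name__ == "__main__":
    options = [
        Candidate("model-a", p95_latency_s=1.8, quality=4.4, avg_tokens=900),
        Candidate("model-b", p95_latency_s=0.9, quality=4.1, avg_tokens=600),
        Candidate("model-c", p95_latency_s=0.7, quality=4.1, avg_tokens=450),
    ]
    best = choose(options, max_latency_s=1.0, max_tokens=1000)
    print(best.name if best else "no candidate fits the budgets")
```

With a 1-second latency budget, `model-a` is excluded despite its higher quality score, and `model-c` beats `model-b` on token usage at equal quality. Treating latency and tokens as hard limits, rather than blending everything into one weighted score, keeps the selection easy to explain.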
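
The evaluation workflow (select a dataset, map its fields, configure a judge, track relevance, groundedness, and coherence) can be sketched in plain Python. This is not the Foundry evaluation API. The dataset column names (`question`, `answer`, `retrieved_docs`), the `METRIC_INPUTS` table of which fields each metric needs, the 1 to 5 scoring scale, and `toy_judge` are all assumptions for illustration. In the demo, a judge model plays the role that `toy_judge` stubs out here.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Hypothetical: which mapped fields each metric's judge needs.
METRIC_INPUTS: Dict[str, List[str]] = {
    "relevance": ["query", "response"],
    "groundedness": ["response", "context"],
    "coherence": ["query", "response"],
}

# A judge takes (metric name, inputs) and returns a score, e.g. on a 1-5 scale.
Judge = Callable[[str, Dict[str, str]], float]


def map_fields(row: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Rename dataset columns to the field names the metrics expect."""
    return {field: row[column] for field, column in mapping.items()}


def run_evaluation(
    rows: Sequence[Dict[str, str]], mapping: Dict[str, str], judge: Judge
) -> Dict[str, float]:
    """Score every row on every metric and return the mean score per metric."""
    if not rows:
        raise ValueError("dataset is empty")
    scores: Dict[str, List[float]] = {metric: [] for metric in METRIC_INPUTS}
    for row in rows:
        fields = map_fields(row, mapping)
        for metric, needed in METRIC_INPUTS.items():
            inputs = {name: fields[name] for name in needed}
            scores[metric].append(judge(metric, inputs))
    return {metric: mean(values) for metric, values in scores.items()}


def toy_judge(metric: str, inputs: Dict[str, str]) -> float:
    """Placeholder for an LLM judge: 5.0 if every input is non-empty, else 1.0."""
    return 5.0 if all(value.strip() for value in inputs.values()) else 1.0


if __name__ == "__main__":
    dataset = [
        {"question": "q1", "answer": "a1", "retrieved_docs": "d1"},
        {"question": "q2", "answer": "a2", "retrieved_docs": ""},
    ]
    column_mapping = {"query": "question", "response": "answer", "context": "retrieved_docs"}
    print(run_evaluation(dataset, column_mapping, toy_judge))
```

Keeping the field mapping separate from the judge means the same metrics can run against datasets whose column names differ, which is the purpose of the field-mapping step.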

Session context

Session: DEMSP383 (Demo, Intermediate)
Event: Microsoft Build 2026
Speaker: Vignesh Sridhar (Fireworks AI)
More Build sessions: https://build.microsoft.com