Build Apps w/ Local AI for Unmetered Intelligence on every Windows PC | BRK260
Anastasiya Tarnouskaya, Aditi Narvekar, and Jordi Janer explain how to build AI-powered Windows apps that run models locally, including using Foundry Local and Windows ML to execute workloads across CPU, GPU, and NPU, plus new tooling like the Windows MLCLI preview and WebNN support for web apps.
Overview
This Build 2026 breakout focuses on shipping local AI experiences on Windows PCs using Microsoft’s Windows AI platform.
Windows AI APIs (solution-centric)
- Starts from Windows AI APIs intended to help app developers add AI features.
- Notes the platform is expanding beyond Copilot+ devices.
Running open-source models locally with Foundry Local
- Uses Foundry Local to run open-source models on-device.
- Emphasizes local execution benefits:
- Privacy
- Lower latency
- Network independence
- Cost savings (no per-request metering)
Preparing models for local deployment with Foundry Toolkit
- Introduces Foundry Toolkit tooling to:
- Optimize models
- Prepare models for local AI deployments
- Analyze hardware support and compatibility
Executing on-device inference with Windows ML
- Runs custom AI workloads locally across:
- GPU
- NPU
- CPU
- Mentions switching to NPU for faster inference.
Windows MLCLI preview
- Announces a Windows MLCLI (preview) for working with Windows ML workflows.
Web apps and WebNN
- Calls out WebNN support to enable Windows ML scenarios for web apps.
Example scenario: Voicemod
- Uses Voicemod as an example of real-time AI voice transformation.
- Highlights Voicemod’s use of Windows ML for low-latency AI processing.