Expand local AI reach with Windows ML | OD851
Andrew Leader and Maha Bayana explain how Windows ML enables local AI apps on Windows using custom or open-source ONNX models, with a focus on running inference efficiently across CPU, GPU, and NPU. They also cover what’s new, including WebNN support for web scenarios and improved tooling via AI Toolkit for VS Code.
Overview
Windows ML lets developers build local, AI-powered applications on Windows that run models efficiently across available hardware (GPU, NPU, and CPU) using a unified platform.
The session highlights:
- What’s new in Windows ML for local AI scenarios
- WebNN support for web experiences
- Tooling improvements with AI Toolkit for VS Code to simplify preparing and deploying models and AI workloads with Windows ML
Session outline (from the video chapters)
Introduction: cloud costs and benefits of local AI
- Motivation for running AI locally, including cost and performance considerations.
Windows ML overview and supported ONNX models
- What Windows ML is, where it fits for local AI on Windows, and how it works with ONNX models.
Compatibility and language support
- Windows ML compatibility with Windows 10
- Support for multiple programming languages
Demo scenario: sentiment analysis dashboard
- The demo is built around a sentiment analysis dashboard.
Model conversion with Windows ML CLI
- Using the Windows ML CLI for model conversion.
Web app integration
- Integrating an ONNX model into a web app using ONNX Runtime Web.
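After a model is loaded with ONNX Runtime Web, the app still has to turn raw model output into something a dashboard can show. The sketch below assumes a hypothetical sentiment classifier whose first output is a flat `[batch, numLabels]` buffer of logits, with the label order `["negative", "positive"]`. The label set and the helper names `softmax` and `toSentiments` are illustrative, not taken from the session.

```typescript
// Hypothetical label order; confirm it against the model's own metadata.
const LABELS = ["negative", "positive"] as const;
type SentimentLabel = (typeof LABELS)[number];

interface SentimentResult {
  label: SentimentLabel;
  confidence: number; // probability of the chosen label, in [0, 1]
}

// Numerically stable softmax: subtract the max logit before exponentiating.
function softmax(logits: ArrayLike<number>): number[] {
  const values = Array.from(logits);
  const max = Math.max(...values);
  const exps = values.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Splits a flat [batch, numLabels] logits buffer into one result per input row.
function toSentiments(flatLogits: ArrayLike<number>): SentimentResult[] {
  const numLabels = LABELS.length;
  if (flatLogits.length % numLabels !== 0) {
    throw new Error(`expected a multiple of ${numLabels} logits, got ${flatLogits.length}`);
  }
  const results: SentimentResult[] = [];
  for (let start = 0; start < flatLogits.length; start += numLabels) {
    const row: number[] = [];
    for (let i = 0; i < numLabels; i++) row.push(flatLogits[start + i]);
    const probs = softmax(row);
    // Pick the index with the highest probability.
    let best = 0;
    for (let i = 1; i < probs.length; i++) if (probs[i] > probs[best]) best = i;
    results.push({ label: LABELS[best], confidence: probs[best] });
  }
  return results;
}
```

With `onnxruntime-web`, the flat buffer would typically come from `(await session.run(feeds))[session.outputNames[0]].data`, assuming the logits are the model's first output. For example, `toSentiments(new Float32Array([-1.2, 2.3]))` yields a single `positive` result.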
Web inference via WebNN
- Running model inference on CPU via WebNN.
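In ONNX Runtime Web, the WebNN backend is selected through the `executionProviders` session option, and the WebNN device is chosen with `deviceType`. The sketch below assumes the `{ name: "webnn", deviceType }` option shape described in the ONNX Runtime WebNN execution provider documentation. The helper `webnnOptions` and the trailing `"wasm"` fallback entry are illustrative choices, not necessarily what the demo used.

```typescript
// Device types the WebNN execution provider accepts, per ONNX Runtime Web docs.
type WebNNDeviceType = "cpu" | "gpu" | "npu";

// Local copy of the WebNN provider entry shape, so this sketch has no
// runtime dependency on onnxruntime-web.
interface WebNNProviderOption {
  name: "webnn";
  deviceType: WebNNDeviceType;
}

interface SessionOptionsSketch {
  executionProviders: Array<WebNNProviderOption | "wasm">;
}

// Hypothetical helper: request WebNN on the given device (CPU by default),
// then list the WebAssembly CPU backend in case WebNN is unavailable.
function webnnOptions(deviceType: WebNNDeviceType = "cpu"): SessionOptionsSketch {
  return {
    executionProviders: [{ name: "webnn", deviceType }, "wasm"],
  };
}
```

In the app, this object would be passed as the second argument of `ort.InferenceSession.create(modelUrl, webnnOptions("cpu"))`, where `modelUrl` is a placeholder for wherever the ONNX model is served. Listing providers in priority order keeps the device choice in one place, so switching from CPU to another WebNN device is a one-argument change.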
Performance optimization for NPUs
- Optimizing for NPU execution to accelerate web app performance.
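Not every PC exposes an NPU through WebNN, so a web app that targets the NPU generally needs a fallback path. The sketch below tries WebNN device types in preference order and keeps the first session that initializes. It assumes session creation rejects when the requested device cannot be used. Some browser builds may fall back silently instead, so the reported device is the one requested, not proof of where each operator ran. The names `createWithFallback` and `SessionFactory` are hypothetical, and the factory is injected so the sketch does not depend on `onnxruntime-web` types.

```typescript
type WebNNDeviceType = "cpu" | "gpu" | "npu";

// Injected factory, for example a wrapper around
// ort.InferenceSession.create(modelUrl, { executionProviders: [{ name: "webnn", deviceType }] }).
type SessionFactory<S> = (deviceType: WebNNDeviceType) => Promise<S>;

interface CreatedSession<S> {
  session: S;
  deviceType: WebNNDeviceType; // device whose session creation succeeded
}

// Try each device in preference order; return the first session that
// initializes, and collect errors so a total failure is diagnosable.
async function createWithFallback<S>(
  create: SessionFactory<S>,
  preference: readonly WebNNDeviceType[] = ["npu", "gpu", "cpu"],
): Promise<CreatedSession<S>> {
  const errors: string[] = [];
  for (const deviceType of preference) {
    try {
      const session = await create(deviceType);
      return { session, deviceType };
    } catch (err) {
      errors.push(`${deviceType}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`no WebNN device could be initialized (${errors.join("; ")})`);
}
```

Returning the device that succeeded lets the dashboard report whether inference is running on the NPU or on a fallback device.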
Moving to a native Windows app
- Converting the project to a native WinUI 3 app with Windows ML.
Resources
- https://aka.ms/build26/OD851
- Microsoft Build 2026 sessions: https://build.microsoft.com