Microsoft Events presents a technical deep dive into deploying local AI models using Windows ML, led by Andrew Leader and Anastasiya Tarnouskaya, covering best practices and engineering details for high-performance, on-device inferencing.

Deploying Local AI Models in Enterprise with Windows ML

Session Speakers: Andrew Leader, Anastasiya Tarnouskaya
Event: Microsoft Ignite 2025 (Session BRK329)

Overview

Windows ML is an AI inference framework built directly into Windows, supporting secure and private on-device inferencing. This session explores why deploying models locally matters for privacy, security, and performance, eliminating the need for cloud inference infrastructure.

Key Topics

  • Benefits of Local AI:
    • Enhanced privacy and data security by keeping inferencing on-device
    • Faster response times due to reduced latency
    • Scalable solutions across diverse hardware tiers
  • Windows ML Architecture:
    • Abstracts complex model-to-hardware operations for developers
    • Supports execution on CPU, GPU, and NPU
    • Helps manage dependencies, minimizing app size and deployment risks
  • Registering Execution Providers:
    • Demonstrated use of ONNX Runtime for model execution
    • How to select and debug execution providers (including QNN NPU support)
    • Step-by-step: Register providers and ensure device readiness
  • Model Setup and Prompt Encoding:
    • Encoding prompts for local generation
    • Setting up model generators for various tasks
    • Example: Live GPU-based image generation demo using Windows ML
  • Deployment and Flexibility:
    • Windows ML enables lightweight deployments that scale from CPU to NPU
    • Comparison of inference performance on different hardware
  • Developer Experience:
    • Focus on app logic while the framework handles hardware abstraction
    • Tools and best practices for debugging and readiness

Useful Tags

#WindowsML #ONNX #LocalAI #EnterpriseAI #Privacy #Security #EdgeAI #MSIgnite


This session is recommended for developers and enterprise architects interested in robust, scalable, and private AI deployments using Windows ML. The stated technical requirements and step-by-step demonstrations equip attendees to implement these solutions in production environments.