Microsoft Events presents a technical deep dive into deploying local AI models using Windows ML, led by Andrew Leader and Anastasiya Tarnouskaya, covering best practices and engineering details for high-performance, on-device inferencing.

Deploying Local AI Models in Enterprise with Windows ML

Session Speakers: Andrew Leader, Anastasiya Tarnouskaya
Event: Microsoft Ignite 2025 (Session BRK329)

Overview

Windows ML is an AI inference framework built directly into Windows, supporting secure and private on-device inferencing. This session explores why deploying models locally matters for privacy, security, and performance, eliminating the need for cloud inference infrastructure.

Key Topics

  • Benefits of Local AI:
    • Enhanced privacy and data security by keeping inferencing on-device
    • Faster response times due to reduced latency
    • Scalable solutions across diverse hardware tiers
  • Windows ML Architecture:
    • Abstracts complex model-to-hardware operations for developers
    • Supports execution on CPU, GPU, and NPU
    • Helps manage dependencies, minimizing app size and deployment risks
  • Registering Execution Providers:
    • Demonstrated use of ONNX Runtime for model execution
    • How to select and debug execution providers (including QNN NPU support)
    • Step-by-step: Register providers and ensure device readiness
  • Model Setup and Prompt Encoding:
    • Encoding prompts for local generation
    • Setting up model generators for various tasks
    • Example: Live GPU-based image generation demo using Windows ML
  • Deployment and Flexibility:
    • Windows ML enables lightweight deployments that scale from CPU to NPU
    • Comparison of inference performance on different hardware
  • Developer Experience:
    • Focus on app logic while the framework handles hardware abstraction
    • Tools and best practices for debugging and readiness

Useful Tags

#WindowsML #ONNX #LocalAI #EnterpriseAI #Privacy #Security #EdgeAI #MSIgnite


This session is recommended for developers and enterprise architects interested in robust, scalable, and private AI deployments using Windows ML. The stated technical requirements and step-by-step demonstrations equip attendees to implement these solutions in production environments.