Azure Storage for AI workloads | OD870

Saurabh Sensharma and Vishnu Charan TJ explain how Azure Storage supports AI inference at scale, focusing on securely bringing enterprise data to models, speeding up model loading and data access, and reducing GPU idle time with caching and high-performance storage patterns.

Overview

This Microsoft Build 2026 session (OD870) covers how Azure Storage fits into AI inference architectures and what to consider when optimizing storage for performance and cost.

Introduction to Azure Storage for AI workloads

The speakers introduce Azure Storage as a key dependency for AI inference systems, especially where model artifacts and enterprise data must be accessed quickly and securely.

Storage for AI and AI for Storage

The session frames two directions: storage for AI, and AI for storage.

Azure Storage integration across the AI stack

The speakers describe how Azure Storage integrates across infrastructure and AI layers, including Microsoft and open-source AI frameworks, to support scalable inference and agent-based applications.

Azure Storage clients and tools for AI workloads

The session highlights Azure Storage client options and tooling used to connect AI workloads to storage efficiently, with an emphasis on optimized data access patterns.

Paths to run AI workloads with storage

The speakers outline common execution environments in which Azure Storage serves as a core data and model layer.

Storage requirements for agentic inference

The session discusses storage needs that become more pronounced with agentic inference patterns, where systems may perform repeated retrievals, tool calls, and multi-step workflows that increase storage access frequency and sensitivity to latency.

Inference optimization through caching

The speakers cover caching approaches intended to reduce repeated work and improve throughput.
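
As an illustration of the general idea, not of the specific mechanism shown in the session, the sketch below is a minimal read-through cache with least-recently-used eviction. The class name ReadThroughCache and the fetch callable are hypothetical; fetch stands in for whatever slow call reads an object from the backing store, and no Azure API is used here.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Tiny in-memory read-through cache with LRU eviction by total bytes."""

    def __init__(self, fetch: Callable[[str], bytes], capacity_bytes: int) -> None:
        self._fetch = fetch  # hypothetical loader for the backing store
        self._capacity = capacity_bytes
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self._size = 0
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._items:
            # Cache hit: mark the entry as most recently used.
            self._items.move_to_end(key)
            self.hits += 1
            return self._items[key]

        # Cache miss: take the slow path to the backing store.
        self.misses += 1
        data = self._fetch(key)

        # Objects larger than the whole cache are returned but not stored.
        if len(data) <= self._capacity:
            self._items[key] = data
            self._size += len(data)
            # Evict least recently used entries until we fit again.
            while self._size > self._capacity:
                _, evicted = self._items.popitem(last=False)
                self._size -= len(evicted)
        return data
```

Repeated reads of the same key are served from memory, so the backing store is contacted only on the first read or after eviction. A production cache would also need invalidation and thread safety, which this sketch leaves out.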

Demo: explicit caching with Azure Blob and NIXL integration

A demonstration shows explicit caching using Azure Blob Storage together with NIXL integration to improve data access behavior for inference workloads.

Fast model loading and distribution

The session covers techniques that reduce GPU idle time by making models available sooner.
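
One generic pattern for shortening load time, offered here as an assumption rather than as the method the speakers present, is to read a large model file as several byte ranges in parallel and reassemble them in order. In the sketch below, read_range is a hypothetical callable that returns the bytes at a given offset and length; split_ranges and parallel_load are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_ranges(total_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover [0, total_size) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (offset, min(chunk_size, total_size - offset))
        for offset in range(0, total_size, chunk_size)
    ]


def parallel_load(
    read_range: Callable[[int, int], bytes],  # hypothetical ranged reader
    total_size: int,
    chunk_size: int,
    max_workers: int = 8,
) -> bytes:
    """Fetch all ranges concurrently and join them in their original order."""
    ranges = split_ranges(total_size, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the chunks
        # reassemble correctly even if they finish out of order.
        chunks = pool.map(lambda r: read_range(r[0], r[1]), ranges)
        return b"".join(chunks)
```

With a real store, read_range would wrap a ranged read against the object. The best chunk size and worker count depend on object size and network limits, so both are left as parameters.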

Bringing enterprise data to AI via Azure integrations

The speakers discuss approaches for securely connecting enterprise data to AI systems using Azure integrations, including Foundry IQ, to support inference scenarios that depend on organizational data.

Storage Center and recap

The session closes by introducing Storage Center and summarizing the main themes: integrating Azure Storage into AI inference stacks, optimizing data access and model loading, and using caching to improve performance and cost efficiency.