Azure Storage for AI workloads | OD870
Saurabh Sensharma and Vishnu Charan TJ explain how Azure Storage supports AI inference at scale, focusing on securely bringing enterprise data to models, speeding up model loading and data access, and reducing GPU idle time with caching and high-performance storage patterns.
Overview
This Microsoft Build 2026 session (OD870) covers how Azure Storage fits into AI inference architectures and what to consider when optimizing storage for performance and cost.
Introduction to Azure Storage for AI workloads
The speakers introduce Azure Storage as a key dependency for AI inference systems, especially where model artifacts and enterprise data must be accessed quickly and securely.
Storage for AI and AI for Storage
The session frames two directions:
- Storage for AI: using Azure Storage to feed and accelerate AI workloads.
- AI for Storage: applying AI-driven approaches to improve storage-related experiences and operations.
Azure Storage integration across the AI stack
The speakers describe how Azure Storage integrates across infrastructure and AI layers, including Microsoft and open-source AI frameworks, to support scalable inference and agent-based applications.
Azure Storage clients and tools for AI workloads
The session highlights Azure Storage client options and tooling used to connect AI workloads to storage efficiently, with an emphasis on optimized data access patterns.
Paths to run AI workloads with storage
The speakers outline common execution environments where Azure Storage is used as a core data and model layer:
- Azure AI Foundry
- Azure Kubernetes Service (AKS)
- IaaS-based deployments
Storage requirements for agentic inference
The session discusses storage needs that become more pronounced with agentic inference. Agentic systems may perform repeated retrievals, tool calls, and multi-step workflows, so a single request can trigger many storage accesses, and overall response time becomes more sensitive to storage latency.
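To make the latency point concrete, the following sketch adds up the time one agent run spends waiting on storage. It assumes each step issues its reads sequentially and that every read costs the same fixed latency. The function name `storage_wait_ms`, the step counts, and the latencies are hypothetical illustrations, not figures from the session.

```python
# Hypothetical illustration: how per-read storage latency compounds across
# the sequential steps of an agentic workflow. All numbers are made up.

def storage_wait_ms(steps: list[int], latency_ms: float) -> float:
    """Total time spent waiting on storage when step i issues
    steps[i] sequential reads, each costing latency_ms."""
    return sum(reads * latency_ms for reads in steps)


# A hypothetical 5-step agent run; each step does a few retrievals or tool calls.
agent_steps = [2, 3, 1, 4, 2]  # 12 storage reads in total

for latency in (5.0, 50.0):
    total = storage_wait_ms(agent_steps, latency)
    print(f"{latency} ms per read -> {total:.0f} ms waiting on storage")
```

Because the reads are sequential, the wait grows linearly with both the number of reads and the per-read latency, which is why multi-step agent patterns magnify the cost of slow storage access.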
Inference optimization through caching
The speakers cover caching approaches intended to reduce repeated work and improve throughput:
- Prompt caching, which reuses work already done for a repeated prompt prefix instead of recomputing it on every request.
- Explicit caching patterns using Azure Storage.
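The explicit caching idea can be sketched as a content-addressed cache: hash the prompt prefix, look the hash up in a key/value store, and only run the expensive step on a miss. This is a minimal sketch under stated assumptions. A plain dict stands in for an object store such as a Blob container (it is not the Azure SDK), and `ExplicitPrefixCache`, `fake_prefill`, and the `prefix-cache/` key scheme are hypothetical names. It matches only an exact prefix; production systems commonly match at finer (for example, token-block) granularity.

```python
import hashlib
import json
from typing import Callable, MutableMapping


class ExplicitPrefixCache:
    """Sketch of explicit prompt-prefix caching over a key/value store.

    `store` is any mutable mapping; a dict stands in for an object store.
    `prefill` is a hypothetical stand-in for the expensive model step that
    turns a token prefix into reusable state (e.g. attention KV state).
    """

    def __init__(self, store: MutableMapping[str, bytes],
                 prefill: Callable[[list[int]], bytes]) -> None:
        self.store = store
        self.prefill = prefill
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key_for(prefix: list[int]) -> str:
        # Content-addressed key: identical prefixes map to the same object name.
        digest = hashlib.sha256(json.dumps(prefix).encode("utf-8")).hexdigest()
        return f"prefix-cache/{digest}"

    def get_or_compute(self, prefix: list[int]) -> bytes:
        key = self.key_for(prefix)
        cached = self.store.get(key)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        state = self.prefill(prefix)
        self.store[key] = state  # write back so later requests can reuse it
        return state


def fake_prefill(tokens: list[int]) -> bytes:
    # Hypothetical stand-in for running the model over the prefix.
    return bytes(t % 256 for t in tokens)


blob_like_store: dict[str, bytes] = {}
cache = ExplicitPrefixCache(blob_like_store, fake_prefill)

system_prompt = [101, 7, 42, 9]
cache.get_or_compute(system_prompt)  # miss: computes and stores
cache.get_or_compute(system_prompt)  # hit: served from the store
print(cache.hits, cache.misses)      # 1 1
```

Keeping the store behind a mapping interface is the design point: the same lookup logic works whether the backing tier is memory, local disk, or a remote object store, so the cache can outlive any single inference process.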
Demo: explicit caching with Azure Blob and NIXL integration
A demonstration shows explicit caching with Azure Blob Storage, integrated through NIXL (NVIDIA Inference Xfer Library), to speed up data access for inference workloads.
Fast model loading and distribution
The session covers techniques to reduce GPU idle time by speeding up model availability:
- Faster model loading and distribution patterns
- Use of the Run:ai Model Streamer
- Use of a distributed cache to improve model access and reduce repeated loading overhead
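The distributed-cache point can be illustrated with a read-through lookup across tiers: node-local cache first, then a shared cache, then origin storage. This is a minimal single-process sketch, not the Run:ai Model Streamer API or any specific Azure cache product. `TieredModelCache`, `fetch_from_origin`, and the shard names are hypothetical, and a dict stands in for a cluster-wide cache service. It ignores eviction, concurrency, and consistency.

```python
from typing import Callable


class TieredModelCache:
    """Read-through cache for model shards: local -> shared -> origin."""

    def __init__(self, shared: dict[str, bytes],
                 fetch_from_origin: Callable[[str], bytes]) -> None:
        self.local: dict[str, bytes] = {}  # per-node cache
        self.shared = shared               # stand-in for a distributed cache
        self.fetch_from_origin = fetch_from_origin

    def load(self, shard: str) -> bytes:
        if shard in self.local:
            return self.local[shard]
        data = self.shared.get(shard)
        if data is None:
            data = self.fetch_from_origin(shard)
            self.shared[shard] = data  # populate shared tier for other nodes
        self.local[shard] = data
        return data


origin_reads: list[str] = []


def fetch_from_origin(shard: str) -> bytes:
    # Hypothetical stand-in for reading a shard from object storage.
    origin_reads.append(shard)  # record each expensive origin read
    return f"weights:{shard}".encode()


shared_cache: dict[str, bytes] = {}
shards = ["model-00001", "model-00002"]

node_a = TieredModelCache(shared_cache, fetch_from_origin)
node_b = TieredModelCache(shared_cache, fetch_from_origin)
for node in (node_a, node_b):
    for s in shards:
        node.load(s)

print(len(origin_reads))  # 2: node_b is served entirely from the shared tier
```

Without the shared tier, every new GPU node would pull every shard from origin storage; with it, origin reads stay at one per shard in this idealized sketch regardless of how many nodes load the model.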
Bringing enterprise data to AI via Azure integrations
The speakers discuss approaches for securely connecting enterprise data to AI systems using Azure integrations, including Foundry IQ, to support inference scenarios that depend on organizational data.
Storage Center and recap
The session closes by introducing Storage Center and summarizing the main themes: integrating Azure Storage into AI inference stacks, optimizing data access and model loading, and using caching to improve performance and cost efficiency.