Azure Storage for AI workloads | OD870

Uploaded: 2026-06-03T14:38:52+00:00

Jun 3, 2026 by Saurabh Sensharma, Vishnu Charan TJ

Saurabh Sensharma, Vishnu Charan TJ, and Saloni Sonpal walk through how Azure Storage can be used to improve performance and cost efficiency for AI inference workloads, including caching patterns, faster model distribution, and integrations across the AI stack.

Overview

The session covers how Azure Storage powers AI inference at scale, with an emphasis on:

Securely bringing enterprise data to AI models
Accelerating AI workloads with high-performance storage
Reducing GPU idle time via faster model loading and optimized data access
Integrating Azure Storage with Microsoft and open-source AI frameworks
Enabling scalable, agent-based (agentic) applications

Topics and chapters

Introduction to Azure Storage for AI workloads

High-level framing of storage needs for AI inference at scale

Storage for AI and AI for Storage

Overview of how storage supports AI workloads, and how AI can be applied to storage scenarios

Azure Storage integration across the AI stack and infrastructure

How Azure Storage fits into AI infrastructure and the broader AI stack

Azure Storage clients and tools for AI workloads

Discussion of storage clients and tooling used to connect AI workloads to Azure Storage

Paths to run AI workloads with storage

The presenters outline common execution environments where Azure Storage is used:

Azure AI Foundry
Azure Kubernetes Service (AKS)
Infrastructure-as-a-Service (IaaS)

Storage requirements for agentic inference

Storage considerations for agent-based inference scenarios and the roles storage plays in those architectures

Inference optimization through prompt caching

Prompt caching as a technique to improve inference performance and reduce repeated work
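
To make the idea concrete, here is a minimal sketch of prefix-based prompt caching. It is not the presenters' implementation: the block size, the `block_keys` helper, and the `PrefixCache` class are all hypothetical names chosen for illustration, and a plain dictionary stands in for whatever holds the cached model state. The point it shows is that chaining block hashes lets two prompts reuse cached work only for the prefix they truly share.

```python
import hashlib

BLOCK_SIZE = 4  # tokens per cache block (illustrative value, an assumption)


def block_keys(tokens, block_size=BLOCK_SIZE):
    """Return one chained hash per full block of tokens.

    Each key depends on every token before it, so two prompts share a key
    only if they share the entire prefix up to the end of that block.
    """
    keys = []
    previous = ""
    for i in range(len(tokens) // block_size):
        block = tokens[i * block_size:(i + 1) * block_size]
        payload = previous + "|" + ",".join(str(t) for t in block)
        previous = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        keys.append(previous)
    return keys


class PrefixCache:
    """Hypothetical in-memory cache mapping block keys to cached state."""

    def __init__(self):
        self._store = {}

    def insert(self, tokens):
        """Record a placeholder for every full block of this prompt."""
        for index, key in enumerate(block_keys(tokens)):
            self._store.setdefault(key, f"state-for-block-{index}")

    def cached_prefix_len(self, tokens):
        """Number of leading tokens whose blocks are already cached."""
        hits = 0
        for key in block_keys(tokens):
            if key not in self._store:
                break
            hits += 1
        return hits * BLOCK_SIZE
```

With this sketch, a second request that starts with the same system prompt as an earlier one reports a non-zero `cached_prefix_len`, which is the portion of the prompt an inference engine could skip recomputing.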

Explicit caching with Azure Blob and NIXL (demo)

A demo showing explicit caching using Azure Blob Storage
NIXL integration is referenced as part of the caching approach
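
As a rough illustration of what an explicit, storage-backed cache can look like, the sketch below places a small local tier in front of a remote object store. Everything here is an assumption made for the example: `InMemoryObjectStore` is a stand-in, not the Azure Blob Storage SDK, and `TieredCache` is a hypothetical name. The transfer path used in the demo (NIXL) is not modeled at all.

```python
from collections import OrderedDict


class InMemoryObjectStore:
    """Stand-in for a remote object store. This is NOT the Azure Blob SDK."""

    def __init__(self):
        self._objects = {}
        self.reads = 0  # counts remote reads so the effect of caching is visible

    def put(self, name, data):
        self._objects[name] = bytes(data)

    def get(self, name):
        self.reads += 1
        return self._objects.get(name)


class TieredCache:
    """Hypothetical two-tier cache: small local LRU tier, durable remote tier."""

    def __init__(self, remote, local_capacity=2):
        self._remote = remote
        self._local = OrderedDict()
        self._capacity = local_capacity

    def save(self, key, data):
        # Write-through: the remote copy survives local eviction or restart.
        self._remote.put(key, data)
        self._remember(key, data)

    def load(self, key):
        if key in self._local:
            self._local.move_to_end(key)  # mark as most recently used
            return self._local[key]
        data = self._remote.get(key)  # local miss: fall back to the remote tier
        if data is not None:
            self._remember(key, data)
        return data

    def _remember(self, key, data):
        self._local[key] = data
        self._local.move_to_end(key)
        while len(self._local) > self._capacity:
            self._local.popitem(last=False)  # evict the least recently used entry
```

In a real deployment the stand-in store would be replaced by a client for the actual storage service; the `put`/`get` shape is only a convenient assumption for this sketch.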

Fast model loading and distribution

Approaches to reduce model load time and improve distribution efficiency
Run:AI Streamer and a distributed cache are referenced in this segment
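
One general way to shorten load time is to read a large file as many byte ranges in parallel rather than as a single sequential stream. The sketch below shows that pattern against a local file using only the Python standard library. It is not the Run:AI Streamer API and does not model a distributed cache; `read_range` and `load_concurrently` are hypothetical helper names, and the chunk size and worker count are arbitrary assumptions.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_range(path, offset, length):
    """Read `length` bytes starting at `offset` from the file at `path`."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)


def load_concurrently(path, chunk_size=1 << 20, workers=8):
    """Load a file by fetching fixed-size byte ranges in parallel.

    Each worker opens its own file handle, so the reads are independent.
    `pool.map` yields results in submission order, which keeps the chunks
    in the right sequence when they are joined.
    """
    size = os.path.getsize(path)
    offsets = range(0, size, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(
            lambda offset: read_range(path, offset, min(chunk_size, size - offset)),
            offsets,
        )
        return b"".join(chunks)
```

Against remote object storage the same shape would apply with ranged downloads in place of `seek` and `read`, where parallel requests tend to matter more than they do on a local disk.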

Bringing enterprise data to AI via Azure integrations

Azure integrations for connecting enterprise data to AI workflows
Foundry IQ is referenced in the context of these integrations

Storage Center and recap

Introduction of Storage Center
Session recap of the main performance and scalability themes