Scale agentic AI cost‑efficiently on Azure with Arm Cobalt VMs | DEMSP381
Sameer Nori, Pranay Bakre, and Govardhani Babu present a Microsoft Build 2026 session on scaling agentic AI workloads cost-efficiently on Azure using Arm-based Azure Cobalt VMs.
Overview
The session focuses on CPU-based AI inference for agentic and cloud-native applications, and how Azure Cobalt VMs can be used to scale LLM inferencing alongside application tiers.
Key themes covered in the description and agenda include:
- Using Azure Cobalt VMs (Arm-based) to improve cost/performance for inference-heavy workloads.
- A live Azure Kubernetes Service (AKS) demo showing deployment of:
  - An LLM inference component
  - Supporting application tiers in the same cluster
- Practical considerations for production deployments:
  - Performance characteristics and validation across first-party and third-party workloads
  - Scaling approaches for distributed, microservice-based systems
  - Design patterns for agentic and cloud-native architectures
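As an illustration of the deployment pattern the demo covers (an LLM inference component and application tiers running in one AKS cluster on Arm nodes), here is a minimal sketch that generates two Kubernetes Deployment manifests pinned to arm64 nodes. It is not taken from the session: the workload names, image references, replica counts, and resource requests are hypothetical placeholders. The only non-hypothetical piece is the well-known Kubernetes node label `kubernetes.io/arch`, which Arm-based nodes report as `arm64`.

```python
import json

# Well-known Kubernetes node label; Arm-based nodes report "arm64".
ARM64_SELECTOR = {"kubernetes.io/arch": "arm64"}


def build_deployment(name, image, replicas, cpu, memory):
    """Return a Deployment manifest (as a dict) scheduled onto arm64 nodes."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Pin the pods to Arm nodes so multi-arch clusters
                    # do not place them on x86 node pools.
                    "nodeSelector": dict(ARM64_SELECTOR),
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "resources": {
                                "requests": {"cpu": cpu, "memory": memory}
                            },
                        }
                    ],
                },
            },
        },
    }


# Hypothetical workloads: names, images, and sizes are placeholders.
manifests = [
    build_deployment(
        "llm-inference", "example.azurecr.io/llm-inference:latest", 2, "8", "16Gi"
    ),
    build_deployment(
        "cart-api", "example.azurecr.io/cart-api:latest", 3, "500m", "512Mi"
    ),
]

if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so the output can be applied directly.
    bundle = {"apiVersion": "v1", "kind": "List", "items": manifests}
    print(json.dumps(bundle, indent=2))
```

If saved as a script (for example `manifests.py`, a hypothetical filename), the output could be piped to `kubectl apply -f -`. The container images would need to be built for arm64 or published as multi-arch images for the pods to start on Arm nodes.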
Session chapters (as provided)
- 00:00:00 - Silicon Innovation Collaboration on Microsoft Cobalt Chips
- 00:02:21 - Transition to Technical Deep Dive: Handoff to Goa
- 00:05:40 - Microsoft first-party and third-party workloads validating new performance levels
- 00:05:53 - Future plans focused on agentic AI and cloud-native applications
- 00:08:13 - Scaling distributed microservices on Cobalt DS
- 00:08:29 - Demo setup of a cloud-native polyglot shopping cart application
- 00:10:40 - Adding AI-native capabilities to existing applications within cluster
- 00:15:36 - Announcement of interactive lab sessions for hands-on learning
- 00:15:59 - Introduction of the Arm Cloud Migration program for partners
Related links
- Microsoft Build sessions: https://build.microsoft.com