Record-Breaking AI Inference Performance with Azure ND Virtual Machines

In this Microsoft Events session, Hugo Affaticati and Nitin Nagarkatte demonstrate how Azure ND Virtual Machines deliver record-setting AI inference throughput and efficiency, drawing on engineering innovations showcased at Ignite 2025.


Speakers: Hugo Affaticati, Nitin Nagarkatte
Event: Microsoft Ignite 2025, Breakout Session BRK180 (Intermediate Level)

Introduction

This session highlights Azure's achievements in AI inference performance, with the ND GB200 v6 and ND GB300 v6 Virtual Machines reaching 865,000 and 1.1 million tokens per second, respectively. These results stem from optimizations across the entire compute stack, from low-level GPU kernels such as GEMM to sophisticated attention mechanisms and multi-node scaling solutions.
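Headline figures like these are typically reported as aggregate decode throughput: total tokens generated divided by wall-clock time across all concurrent requests. A minimal sketch of that measurement, using a hypothetical `generate` stand-in rather than Azure's actual benchmark harness:

```python
import time

def tokens_per_second(generate, prompts):
    """Aggregate decode throughput: total generated tokens / wall-clock time."""
    start = time.perf_counter()
    total_tokens = sum(len(generate(p)) for p in prompts)
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed

# Hypothetical stand-in for a real inference endpoint; a production
# benchmark would issue concurrent requests to the model server instead.
def fake_generate(prompt):
    return list(range(128))  # pretend the model emitted 128 tokens

if __name__ == "__main__":
    tps = tokens_per_second(fake_generate, ["example prompt"] * 10)
    print(f"{tps:,.0f} tokens/s")
```

Real harnesses (e.g., MLPerf Inference-style runs) additionally control for batch size, sequence lengths, and warm-up, which strongly affect the reported number.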

Key Topics Covered

- Infrastructure Overview
- Session Highlights
- Further Resources

Conclusion

This session gives practitioners a deep technical look at how Azure scales inference, optimizing every layer of the stack for accelerated, cost-effective AI workloads. It is aimed at enterprise developers, data scientists, and AI infrastructure architects seeking actionable strategies for deploying AI at production scale.