Performance and Scalability of Azure HBv5-series Virtual Machines
jvenkatesh and colleagues present an in-depth technical overview and benchmarks of Azure HBv5-series Virtual Machines, highlighting improvements for HPC workloads and practical guidance for Azure users.
Overview
Azure HBv5-series virtual machines (VMs) are the latest CPU-based high performance computing (HPC) offering from Microsoft Azure, now generally available. This article, contributed by Amirreza Rastegari and colleagues, provides a technical deep-dive into HBv5 architecture, performance, cost implications, and real-world HPC application benchmarks.
HBv5 VMs offer significant advances for memory bandwidth-bound workloads, including computational fluid dynamics (CFD), weather and geoscience simulations, and finite element analysis. Compared to the previous HBv4 generation, HBv5 VMs deliver up to:
- 5x higher performance for CFD workloads with 43% lower costs
- 3.2x higher performance for weather simulation with 16% lower costs
- 2.8x higher performance for geoscience workloads at the same cost
Technical Highlights
Key features of HBv5-series VMs:
- Up to 6.6 TB/s memory bandwidth (STREAM TRIAD) and 432 GB RAM per VM
- Up to 368 physical cores (user configurable) using custom AMD EPYC Zen4 CPUs (SMT disabled)
- Base clock 3.5 GHz, boost to 4 GHz across all cores
- 800 Gb/s NVIDIA Quantum-2 InfiniBand (4 x 200 Gb/s CX-7)
- 180 Gb/s Azure Accelerated Networking
- 15 TB local NVMe SSD, 50 GB/s read, 30 GB/s write bandwidth
- High-bandwidth memory (HBM), with ~9x memory bandwidth vs. dual-socket EPYC Genoa and ~7x vs. EPYC Turin
- Multiple constrained core configurations for ISV licensing and core-optimized scenarios
See the official documentation for full specifications.
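To confirm that a provisioned VM matches these specifications, a quick hands-on check is often useful. A minimal sketch, assuming a standard HPC image with numactl and the InfiniBand userspace tools installed (tool availability is an assumption, not a documented HBv5 guarantee):

```bash
# Inspect core count, SMT state, and NUMA layout
lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|NUMA node\(s\))'
numactl --hardware | head -n 4

# List InfiniBand adapters; four ConnectX-7 ports at 200 Gb/s are expected
ibstat | grep -E "^CA |Rate"
```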
Microbenchmark Results
Memory & Compute Performance
Industry-standard benchmarks were used:
- STREAM – Memory bandwidth
- HPCG – Sparse linear algebra performance
- HPL – Dense linear algebra performance
Example command for STREAM (the reported data size is shown as a comment):

```bash
OMP_NUM_THREADS=368 OMP_PROC_BIND=true OMP_PLACES=cores ./amd_zen_stream
# STREAM data size: 2621440000 bytes
```
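HPL and HPCG runs are typically launched under MPI. A hedged sketch, assuming reference builds with the standard binary names xhpl and xhpcg and one MPI rank per core (both assumptions, not launch lines taken from the article):

```bash
# HPL (dense linear algebra): reads HPL.dat from the working directory
mpirun -np 368 --map-by core ./xhpl

# HPCG (sparse linear algebra): reads hpcg.dat from the working directory
mpirun -np 368 --map-by core ./xhpcg
```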
InfiniBand Networking Performance
- Four NVIDIA Quantum-2 NICs per VM, each at 200 Gb/s, aggregate 800 Gb/s per VM
- IB perftest and OSU benchmarks achieve up to 99% of theoretical peak bandwidth
- Latency as low as 1.25 microseconds (dependent on message size)
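For readers who want to reproduce the network numbers, the tests referenced above can be run roughly as follows. A minimal sketch, assuming two VMs named node1 and node2 with the first adapter exposed as mlx5_0 (all placeholder names):

```bash
# perftest bandwidth across all message sizes, reported in Gbit/s
ib_write_bw -d mlx5_0 -a --report_gbits          # server side, on node1
ib_write_bw -d mlx5_0 -a --report_gbits node1    # client side, on node2

# OSU point-to-point latency, one rank per VM
mpirun -np 2 -host node1,node2 ./osu_latency
```

Note that a single-NIC test tops out at 200 Gb/s; reaching the full 800 Gb/s per VM requires driving all four adapters concurrently.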
Application Benchmarks: Performance, Cost, & Consolidation
Benchmarks were performed with real HPC workloads, comparing HBv5 with prior Azure VM generations (HBv4, HBv3, HBv2, HX series):
CFD
- OpenFOAM 2306: 4.8x the performance of HBv4 at 57% of the cost.
- Palabos 1.01: 4.4x the performance of HBv4 at 62% of the cost.
- Ansys Fluent 2025 R2: 3.4x the performance of HBv4 at 81% of the cost.
- Siemens Star-CCM+ 17.04: 3.4x the performance of HBv4 at 81% of the cost.
Weather Modeling
- WRF 4.2.2: 3.27x the performance of HBv4 at 84% of the cost.
Energy Research
- Devito 4.8.7: 3.27x the performance of HBv4 at equivalent cost.
Molecular Dynamics
- NAMD 2.15a2: 2.18x the performance of HBv4 at 26% higher cost (the workload is compute-bound, so it benefits less from HBv5's memory-bandwidth advantage).
Insights
- HBv5 VMs consistently enable ~2-5x consolidation of VM fleets for memory-bandwidth-bound applications.
- For compute-bound workloads (e.g., NAMD), HBv5 is the fastest option on Azure, but cost efficiency may favor other VM families or GPUs.
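The cost figures above combine two ratios: how much faster a job finishes and how much more a VM-hour costs. As a back-of-envelope check (illustrative arithmetic only, using the OpenFOAM figures; actual prices vary by region and contract):

```bash
# relative cost per job = (VM-hour price ratio) / (speedup)
# With 4.8x speedup at 57% of the per-job cost, the implied price ratio is:
awk 'BEGIN { printf "implied HBv5/HBv4 price ratio: %.2fx\n", 0.57 * 4.8 }'
```

In other words, a VM that costs a few times more per hour can still nearly halve per-job cost when the speedup is large enough.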
Scalability Analysis
Weak Scaling
- Demonstrated using Palabos 3D Cavity model
- HBv5 shows linear scaling as workload and core count increase together
Strong Scaling
- Demonstrated with NAMD 20M- and 210M-atom cases
- For moderate problem sizes, scaling is efficient up to a point (communication time limits gains)
- For the large case (210M atoms), HBv5 scales linearly even at >11,000 cores (32 VMs)
Recommendations:
- Run scaling tests with your specific application to identify the sweet spot between time-to-solution and job cost (see the sketch after this list)
- For communication-bound cases, consider fewer MPI processes per VM or restructuring the solver to overlap communication and computation
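A scaling test of this kind can be scripted as a simple sweep. A hedged sketch, assuming an MPI application ./APP with input CASE and pre-generated hostfiles, all of which are placeholders:

```bash
#!/bin/bash
# Strong-scaling sweep: fixed problem size, doubling VM count each step
CORES_PER_VM=368
for VMS in 1 2 4 8 16 32; do
  NP=$((VMS * CORES_PER_VM))
  mpirun -np "$NP" --hostfile hosts_${VMS} ./APP CASE > run_${VMS}.log
done
# Parallel efficiency at N VMs: T(1) / (N * T(N)).
# Per-job cost scales as N * (VM-hour price) * T(N); compare both curves
# to pick the time-to-solution / cost sweet spot described above.
```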
Conclusion
Azure HBv5-series VMs provide a major leap in memory bandwidth and CPU performance for Azure’s HPC offerings, enabling substantial performance and cost advantages for memory-bound workloads across various scientific and engineering domains. For detailed configurations, benchmarks, and scalability practices, consult the official Azure documentation and consider application-specific scaling tests to optimize your deployment.
This post appeared first on Microsoft Tech Community.