Content by Shantanu Patankar and Azin Heidarshenas
Shantanu Patankar and Azin Heidarshenas break down Azure's MLPerf Training v6.0 run for Llama 3.1 405B and share lessons from scaling pretraining to 8,192 NVIDIA GB200 GPUs on Fairwater. They cover where the time goes in each training step, why topology-aware mapping of the parallelism dimensions matters, and what actually limits scaling efficiency at extreme scale.