Content by Shantanu Patankar and Azin Heidarshenas

Shantanu Patankar and Azin Heidarshenas break down Azure's MLPerf Training v6.0 run for Llama 3.1 405B and share what they learned scaling pretraining to 8,192 NVIDIA GB200 GPUs on Fairwater. They cover where the time goes in each training step, why topology-aware mapping of the parallelism dimensions onto the hardware matters, and what actually limits scaling efficiency at extreme scale.
Community
