A Blueprint to Build the World’s Largest Liquid-Cooled GPU Cluster
Scaling out Supermicro SuperCluster with NVIDIA Spectrum™-X Ethernet

Supermicro’s SuperCluster, accelerated by the NVIDIA Blackwell Platform, empowers the next stage of AI, defined by new breakthroughs, including the evolution of scaling laws and the rise of reasoning models. These liquid-cooled SuperCluster offerings are available in 42U, 48U, or 52U rack configurations. The upgraded cold plates and 250 kW coolant distribution unit (CDU) more than double the cooling capacity of the previous generation. The new vertical coolant distribution manifold (CDM) frees the rack space that horizontal manifolds previously occupied. NVIDIA Quantum InfiniBand or NVIDIA Spectrum™ networking in a centralized rack enables a non-blocking, 256-GPU scalable unit in five racks, or an extended 768-GPU scalable unit in nine racks.

Supermicro’s SuperCluster, accelerated by the NVIDIA Blackwell Platform, empowers the next stage of AI, defined by new breakthroughs, including the evolution of scaling laws and the rise of reasoning models. Supermicro’s new air-cooled SuperCluster is built from the new Supermicro NVIDIA HGX B200 8-GPU systems. Each system features a redesigned 10U chassis that accommodates the thermals of its leading-edge AI compute performance, and is designed to handle heavy AI workloads of all types, from training to fine-tuning to inference. NVIDIA Quantum InfiniBand or NVIDIA Spectrum™ networking in a centralized rack enables a non-blocking, 256-GPU scalable unit in nine racks.
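The scalable-unit figures quoted above can be cross-checked with some simple arithmetic. The sketch below is illustrative only and rests on two assumptions that the text does not state outright: that exactly one rack in each scalable unit is the centralized networking rack, and that every system (liquid-cooled as well as air-cooled) carries 8 GPUs. The class and field names are hypothetical, chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalableUnit:
    """One scalable unit as quoted in the text (GPU count and rack count)."""
    name: str
    total_gpus: int
    total_racks: int
    network_racks: int = 1    # ASSUMPTION: one centralized networking rack
    gpus_per_system: int = 8  # stated for the air-cooled HGX B200 system; ASSUMED for liquid-cooled

    @property
    def compute_racks(self) -> int:
        # Racks left for GPU systems once the networking rack is set aside.
        return self.total_racks - self.network_racks

    @property
    def gpus_per_compute_rack(self) -> int:
        gpus, remainder = divmod(self.total_gpus, self.compute_racks)
        if remainder:
            raise ValueError(f"{self.name}: GPUs do not divide evenly across racks")
        return gpus

    @property
    def systems_per_compute_rack(self) -> int:
        systems, remainder = divmod(self.gpus_per_compute_rack, self.gpus_per_system)
        if remainder:
            raise ValueError(f"{self.name}: GPUs per rack is not a whole number of systems")
        return systems


# GPU and rack counts are taken from the text; everything derived is an estimate.
UNITS = [
    ScalableUnit("liquid-cooled, 256 GPUs", total_gpus=256, total_racks=5),
    ScalableUnit("liquid-cooled, 768 GPUs", total_gpus=768, total_racks=9),
    ScalableUnit("air-cooled, 256 GPUs", total_gpus=256, total_racks=9),
]

if __name__ == "__main__":
    for unit in UNITS:
        print(
            f"{unit.name}: {unit.compute_racks} compute racks, "
            f"{unit.gpus_per_compute_rack} GPUs per rack, "
            f"{unit.systems_per_compute_rack} systems per rack"
        )
```

Under these assumptions, the air-cooled unit works out to four 8-GPU systems per compute rack, while the liquid-cooled units work out to 8 and 12 systems per compute rack. This matches the text's emphasis on the rack space freed by liquid cooling, though the actual rack layouts are defined in the white paper itself, not by this estimate.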

This white paper reveals blueprints of a Supermicro Generative AI rack cluster with NVIDIA HGX™ H100/H200 GPUs. It delves into the design of SuperCluster’s individual system nodes, component selection, rack layout, network topology, and deployment steps.