ClusterSim: modeling thread block clusters in Hopper GPUs

Authors: Tim Julius Lühnen; Jyotirman Behera; Devashree Tripathy; Sohan Lal
Date: 2025-09-10
Venue: IEEE International Symposium on Workload Characterization, IISWC 2025
Handle: https://hdl.handle.net/11420/57345

Abstract: Modern Graphics Processing Units (GPUs), such as NVIDIA’s Hopper and Blackwell, leverage Thread Block Clusters (TBCs) to enhance performance and resource management. TBCs introduce a hierarchical organization, grouping thread blocks into clusters that enable efficient synchronization and distributed shared memory access. This innovation improves data locality and reduces latency in inter-thread block communication, unlocking new opportunities for executing complex parallel workloads. However, modeling the intricate interactions within TBCs, especially the balance between data locality and resource contention, is challenging. This is further complicated by limited access to cutting-edge hardware like Hopper GPUs, which restricts direct experimentation. As a result, robust simulation models are needed to accurately replicate TBC behavior. This paper presents a detailed simulation model that captures TBC performance characteristics. Our model enables researchers to explore TBC functionalities and evaluate performance implications without requiring physical Hopper GPUs. Validation against an NVIDIA H100 GPU shows a Mean Absolute Relative Error (MARE) of 4.7%, demonstrating the model’s accuracy and utility for advancing research in GPU architectures and parallel computing.

Language: en
Rights: http://rightsstatements.org/vocab/InC/1.0/
Keywords: GPUs; Thread block cluster; computer architecture modeling and simulation; Hopper architecture
Subject classification: Computer Science, Information and General Works::004: Computer Sciences
Type: Conference Paper
DOI (repository): https://doi.org/10.15480/882.15858
DOI (publisher): 10.1109/IISWC66894.2025.00048
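
The abstract reports accuracy as a Mean Absolute Relative Error (MARE) against an H100 but does not spell out the formula. The sketch below assumes the conventional definition, the mean of |simulated - measured| / |measured| over paired samples; the function name and the cycle counts are hypothetical and are not taken from the paper.

```python
def mean_absolute_relative_error(simulated, measured):
    """Mean of |simulated - measured| / |measured| over paired samples.

    Assumed (conventional) definition; the paper's exact formula is not
    given in the abstract.
    """
    if len(simulated) != len(measured) or len(measured) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(s - m) / abs(m) for s, m in zip(simulated, measured))
    return total / len(measured)


# Hypothetical cycle counts, made up for illustration (not data from the paper).
simulated_cycles = [1050.0, 1900.0, 3300.0]
measured_cycles = [1000.0, 2000.0, 3000.0]

mare = mean_absolute_relative_error(simulated_cycles, measured_cycles)
print(f"MARE = {mare:.1%}")
```

Because each error is divided by the measured value, a short kernel and a long kernel contribute equally to the average, which is why the metric is commonly used when comparing a simulator against hardware across benchmarks of very different run times.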