ClusterSim: modeling thread block clusters in Hopper GPUs

Citation Link: https://doi.org/10.15480/882.15858
Publication Type
Conference Paper
Date Issued
2025
Language
English
Author(s)
Lühnen, Tim Julius  
Massively Parallel Systems E-EXK5  
Behera, Jyotirman
Tripathy, Devashree  
Lal, Sohan  
Massively Parallel Systems E-EXK5  
TORE-DOI
10.15480/882.15858
TORE-URI
https://hdl.handle.net/11420/57345
Citation
IEEE International Symposium on Workload Characterization, IISWC 2025
Contribution to Conference
IEEE International Symposium on Workload Characterization, IISWC 2025  
Peer Reviewed
true
ISBN of container
979-8-3315-4917-6
979-8-3315-4918-3
Modern Graphics Processing Units (GPUs), such as NVIDIA’s Hopper and Blackwell, leverage Thread Block Clusters (TBCs) to enhance performance and resource management. TBCs introduce a hierarchical organization, grouping thread blocks into clusters that enable efficient synchronization and distributed shared memory access. This innovation improves data locality and reduces latency in inter-thread block communication, unlocking new opportunities for executing complex parallel workloads. However, modeling the intricate interactions within TBCs, especially the balance between data locality and resource contention, is challenging. This is further complicated by limited access to cutting-edge hardware like Hopper GPUs, which restricts direct experimentation. As a result, robust simulation models are needed to accurately replicate TBC behavior. This paper presents a detailed simulation model that captures TBC performance characteristics. Our model enables researchers to explore TBC functionalities and evaluate performance implications without requiring physical Hopper GPUs. Validation against an NVIDIA H100 GPU shows a Mean Absolute Relative Error (MARE) of 4.7%, demonstrating the model’s accuracy and utility for advancing research in GPU architectures and parallel computing.
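The abstract reports accuracy as a Mean Absolute Relative Error (MARE) but does not spell out the formula on this page. As a minimal sketch, assuming the common definition (the mean over all benchmarks of |simulated - measured| / |measured|), the metric could be computed as below. The function name and the sample values are hypothetical and chosen only for illustration; they are not taken from the paper, and the quantity the authors actually compare (for example cycles or execution time) is not stated here.

```python
from typing import Sequence


def mean_absolute_relative_error(
    simulated: Sequence[float], measured: Sequence[float]
) -> float:
    """Return the mean of |simulated - measured| / |measured| as a fraction.

    Assumes the common MARE definition; every measured value must be non-zero.
    """
    if len(simulated) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(s - m) / abs(m) for s, m in zip(simulated, measured))
    return total / len(measured)


# Hypothetical per-benchmark values, invented for illustration only
# (they are NOT results from the paper).
simulated_values = [1050.0, 1900.0, 3090.0]
measured_values = [1000.0, 2000.0, 3000.0]

mare = mean_absolute_relative_error(simulated_values, measured_values)
print(f"MARE = {mare:.1%}")
```

Under this definition, over- and under-estimates do not cancel out, so a low MARE indicates that the simulator is close to the hardware on each benchmark rather than only on average.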
Subjects
GPUs
Thread block cluster
computer architecture modeling and simulation
Hopper architecture
DDC Class
004: Computer Sciences
License
http://rightsstatements.org/vocab/InC/1.0/
Name
Simulator_Paper.pdf
Size
700.69 KB
Format
Adobe PDF