Accelerating GPGPU simulation by strategically parallelizing the compute bottleneck
Citation Link: https://doi.org/10.15480/882.16573
Publication Type
Conference Paper
Date Issued
2026-01
Language
English
TORE-DOI
Cycle-accurate GPGPU simulators such as GPGPU-Sim provide invaluable insights for hardware architecture research but suffer from extremely long runtimes, which hinders research productivity. This paper addresses this bottleneck by proposing a strategy to accelerate GPGPU-Sim. We first perform a holistic profiling analysis across diverse GPGPU benchmarks to identify the primary performance bottleneck, pinpointing SIMT-Core cluster execution within the CORE-clock cycle. Based on this finding, we implement a parallelization scheme that targets this hotspot, using a thread pool to manage the concurrent execution of SIMT-Core clusters. Our approach keeps modifications to the existing GPGPU-Sim codebase minimal to ensure long-term maintainability. Evaluation on a simulated NVIDIA H100 model shows an average simulation wall-time speedup of 3.58x with 8 worker threads and a maximum of 4.38x, while incurring a maximum cycle-count error of 3.22%; some benchmarks exhibit no error at all.
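The thread-pool idea in the abstract can be illustrated with a minimal C++ sketch. This is not GPGPU-Sim's code or the authors' implementation: `Cluster`, `core_cycle()`, `ThreadPool`, `run_batch()` and `simulate()` are hypothetical names chosen for illustration. The sketch assumes persistent worker threads that are created once and reused every CORE-clock cycle. Within a cycle, each cluster's work is submitted as a task, and the main thread waits for all tasks to finish, acting as a per-cycle barrier, before any serial work and the next cycle begin.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for a SIMT-Core cluster (not GPGPU-Sim's class).
struct Cluster {
  unsigned long long cycles = 0;
  // Advance this cluster by one CORE-clock cycle (placeholder work).
  void core_cycle() { ++cycles; }
};

// Fixed-size pool of persistent workers, reused across simulated cycles.
class ThreadPool {
 public:
  explicit ThreadPool(std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
      workers_.emplace_back([this] { worker_loop(); });
  }
  ~ThreadPool() {
    {
      std::lock_guard<std::mutex> lk(m_);
      stop_ = true;
    }
    cv_task_.notify_all();
    for (auto& w : workers_) w.join();
  }
  // Enqueue a batch of tasks and block until every one has completed.
  void run_batch(const std::vector<std::function<void()>>& tasks) {
    {
      std::lock_guard<std::mutex> lk(m_);
      for (const auto& t : tasks) queue_.push(t);
      pending_ += tasks.size();
    }
    cv_task_.notify_all();
    std::unique_lock<std::mutex> lk(m_);
    cv_done_.wait(lk, [this] { return pending_ == 0; });
  }

 private:
  void worker_loop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lk(m_);
        cv_task_.wait(lk, [this] { return stop_ || !queue_.empty(); });
        if (stop_ && queue_.empty()) return;
        task = std::move(queue_.front());
        queue_.pop();
      }
      task();  // run one cluster's cycle outside the lock
      {
        std::lock_guard<std::mutex> lk(m_);
        if (--pending_ == 0) cv_done_.notify_one();
      }
    }
  }
  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> queue_;
  std::mutex m_;
  std::condition_variable cv_task_, cv_done_;
  std::size_t pending_ = 0;
  bool stop_ = false;
};

// Simulate n_cycles CORE-clock cycles. Clusters run concurrently inside a
// cycle; run_batch() returns only when all of them are done (the barrier).
void simulate(std::vector<Cluster>& clusters, ThreadPool& pool,
              unsigned long long n_cycles) {
  for (unsigned long long c = 0; c < n_cycles; ++c) {
    std::vector<std::function<void()>> tasks;
    tasks.reserve(clusters.size());
    for (auto& cl : clusters) tasks.push_back([&cl] { cl.core_cycle(); });
    pool.run_batch(tasks);
    // Serial per-cycle work (e.g. shared-resource updates) would go here.
  }
}
```

For example, `ThreadPool pool(8);` followed by `simulate(clusters, pool, n)` advances every cluster by `n` cycles using 8 workers. Reusing the same threads avoids creating and joining threads on every cycle. In this toy model the clusters share no state, so each task is independent. In a real simulator, any structures shared between clusters during the cycle would need additional synchronization or ordering.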
Subjects
GPGPU
CUDA
Simulation
Computer Architecture
GPGPU-Sim
Thread Pool
DDC Class
004: Computer Sciences
621.3: Electrical Engineering, Electronic Engineering
005: Computer Programming, Programs, Data and Security
Publication version
submittedVersion
Name
Accelerating-GPGPU-Simulation.pdf
Size
870.83 KB
Format
Adobe PDF