Hybrid PTX analysis for GPU accelerated CNN inferencing aiding computer architecture design
Publication Type
Conference Paper
Publication Date
2023-09
Language
English
Volume
2023-September
Article Number
193383
Citation
Forum on Specification and Design Languages (FDL 2023)
Contribution to Conference
Publisher DOI
Scopus ID
Publisher
IEEE Computer Society
ISBN
9798350307375
General-Purpose Computation on Graphics Processing Units (GPGPU) is becoming crucial for increasing computing capacity. Thanks to their massive parallelism, GPUs can achieve speedups of up to 32 times compared to common CPUs. However, writing highly parallel code that uses a GPU well is challenging for developers, because GPUs handle threads and parallelism differently from CPUs. Academia and industry have proposed several profilers to support developers in code optimization. These profilers often require an actual device (e.g., a GPU), and the profiling process takes a long time. We propose HyPA, a hybrid Parallel Thread Execution (PTX) Analyzer that inspects PTX code statically and dynamically. HyPA implements a partly functional emulator that executes the instructions that rely on runtime dependencies in order to count the number of executed PTX instructions and divergent branches. HyPA executes compiled kernels (the programs that run on GPUs) generated by the CUDA compiler and supports the full PTX 7.7 specification. Our functional emulator allows significantly faster analysis of PTX code compared to standard profilers. In our evaluation, we quantify this increase in performance through benchmark runs. HyPA achieved speedups of up to 536% compared to the nvprof profiler. Moreover, our approach can gather performance metrics beyond static analysis (e.g., branch efficiency) in less time than profiling the application on an actual device. Finally, we provide an open-source implementation of HyPA to help developers and system designers in further research and development.
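To make the branch-efficiency metric mentioned in the abstract concrete, the following is a minimal Python sketch of how divergent branches could be counted by evaluating only the branch predicates per thread. It is not HyPA's implementation: the names `BranchStats` and `count_branches`, the fixed warp size of 32, and the modelling of each branch as a function of the thread ID are all assumptions made for illustration. Branch efficiency is taken here as the share of non-divergent branches among all executed branches, in percent.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

WARP_SIZE = 32  # assumption: threads are grouped into warps of 32


@dataclass
class BranchStats:
    """Hypothetical container for per-kernel branch counters."""
    total: int = 0
    divergent: int = 0

    @property
    def efficiency(self) -> float:
        # Share of non-divergent branches in percent; 100.0 if no branch ran.
        if self.total == 0:
            return 100.0
        return 100.0 * (self.total - self.divergent) / self.total


def count_branches(
    predicates: Sequence[Callable[[int], bool]],
    num_threads: int,
) -> BranchStats:
    """Evaluate each branch predicate for every thread of every warp.

    A branch counts once per warp. It is divergent when the threads of
    that warp do not all agree on the predicate's outcome.
    """
    stats = BranchStats()
    for warp_start in range(0, num_threads, WARP_SIZE):
        warp = range(warp_start, min(warp_start + WARP_SIZE, num_threads))
        for predicate in predicates:
            outcomes = {predicate(tid) for tid in warp}
            stats.total += 1
            if len(outcomes) > 1:
                stats.divergent += 1
    return stats


# Three illustrative branches executed by 64 threads (two warps).
example_predicates = [
    lambda tid: tid < 48,      # diverges only in the second warp
    lambda tid: tid % 2 == 0,  # diverges in both warps
    lambda tid: True,          # uniform in both warps
]
stats = count_branches(example_predicates, num_threads=64)
print(stats.total, stats.divergent, stats.efficiency)
```

In this sketch only the predicates are executed, which mirrors the idea of emulating just the instructions whose outcome depends on runtime values; everything else could be counted statically.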
Keywords
CUDA
GPU
Power and Performance Optimization
PTX
DDC Class
004: Computer Sciences