Hybrid PTX analysis for GPU accelerated CNN inferencing aiding computer architecture design

Publication Type
Conference Paper
Date Issued
2023-09
Language
English
Author(s)
Metz, Christopher  
Plump, Christina  
Berger, Bernhard Johannes
Eingebettete Systeme E-13  
Drechsler, Rolf  
TORE-URI
https://hdl.handle.net/11420/44189
Volume
2023-September
Article Number
193383
Citation
Forum on Specification and Design Languages (FDL 2023)
Contribution to Conference
Forum on Specification and Design Languages, FDL 2023  
Publisher DOI
10.1109/FDL59689.2023.10272088
Scopus ID
2-s2.0-85175261538
Publisher
IEEE Computer Society
ISBN
9798350307375
General-purpose computing on graphics processing units (GPGPU) is becoming crucial for increasing computing capacity. Thanks to their massive parallelism, GPUs can achieve impressive speedups of up to 32 times compared to common CPUs. However, writing highly parallel code and utilizing a GPU is challenging for programmers, because GPUs handle threads and parallelism differently from CPUs. Academia and industry have proposed several profilers to support developers in code optimization. These profilers often require an actual device (e.g., a GPU), and the profiling process takes a long time. We propose HyPA, a hybrid Parallel Thread Execution (PTX) Analyzer that inspects PTX code statically and dynamically. HyPA implements a partly functional emulator that executes the instructions that rely on runtime dependencies in order to count the number of executed PTX instructions and divergent branches. HyPA executes compiled kernels (the programs that run on GPUs) generated by the CUDA compiler and supports the full PTX 7.7 specification. Our functional emulator allows significantly faster analysis of PTX code compared to standard profilers. In our evaluation, we quantify this increase in performance through benchmark runs. HyPA achieved speedups of up to 536% compared to the nvprof profiler. Moreover, our approach can gather performance metrics beyond static analysis (e.g., branch efficiency) in less time than profiling the application on an actual device. Finally, we provide an open-source implementation of HyPA to help developers and system designers in further research and development.
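The idea of counting executed instructions and divergent branches by functional emulation can be illustrated with a small sketch. The following is not HyPA's implementation and does not parse real PTX: the tuple-based mini instruction set, the names `Stats` and `run_warp`, and the example program are all hypothetical. It also assumes a simplified min-PC scheduling rule to keep the threads of one warp in step, which is a stand-in for real hardware reconvergence. Branch efficiency is computed as the ratio of non-divergent branches to all executed branches.

```python
from dataclasses import dataclass

WARP_SIZE = 32  # threads that execute in lockstep


@dataclass
class Stats:
    executed: int = 0   # warp-level instructions issued
    branches: int = 0   # branch instructions issued
    divergent: int = 0  # branches where the active threads disagreed

    @property
    def branch_efficiency(self) -> float:
        # Ratio of non-divergent branches to all executed branches.
        if self.branches == 0:
            return 1.0
        return (self.branches - self.divergent) / self.branches


def run_warp(program, warp_size=WARP_SIZE) -> Stats:
    """Emulate one warp on a toy (hypothetical, non-PTX) instruction list."""
    regs = [{"tid": t} for t in range(warp_size)]  # per-thread registers
    pc = [0] * warp_size
    done = [False] * warp_size
    stats = Stats()

    def val(r, x):
        # Operands are register names (str) or integer immediates.
        return r[x] if isinstance(x, str) else x

    while not all(done):
        # Simplifying assumption: always issue the lowest pending PC.
        cur = min(pc[t] for t in range(warp_size) if not done[t])
        group = [t for t in range(warp_size) if not done[t] and pc[t] == cur]
        op, *args = program[cur]
        stats.executed += 1
        outcomes = set()
        for t in group:
            r, nxt = regs[t], cur + 1
            if op == "mov":
                r[args[0]] = val(r, args[1])
            elif op == "add":
                r[args[0]] = val(r, args[1]) + val(r, args[2])
            elif op == "setp_lt":
                r[args[0]] = val(r, args[1]) < val(r, args[2])
            elif op == "bra":
                pred, target = args
                taken = True if pred is None else r[pred]
                outcomes.add(taken)
                if taken:
                    nxt = target
            elif op == "exit":
                done[t] = True
            else:
                raise ValueError(f"unknown op: {op}")
            pc[t] = nxt
        if op == "bra":
            stats.branches += 1
            if len(outcomes) > 1:  # some threads took it, some did not
                stats.divergent += 1
    return stats


PROGRAM = [
    ("setp_lt", "p", "tid", 16),  # 0: p = tid < 16
    ("bra", "p", 3),              # 1: data-dependent, splits the warp
    ("add", "x", "tid", 100),     # 2: only threads with tid >= 16
    ("mov", "i", 0),              # 3: loop counter
    ("add", "i", "i", 1),         # 4: loop body
    ("setp_lt", "q", "i", 3),     # 5: q = i < 3
    ("bra", "q", 4),              # 6: uniform loop branch
    ("exit",),                    # 7
]

stats = run_warp(PROGRAM)
print(stats.executed, stats.branches, stats.divergent, stats.branch_efficiency)
# 14 4 1 0.75
```

In this toy program, only the branch at index 1 depends on the thread ID, so it is the single divergent branch; the loop branch is evaluated three times with the same outcome in every thread. Neither count could be derived without executing the instructions that produce the predicates, which is the kind of runtime dependency the abstract refers to.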
Subjects
CUDA
GPU
Power and Performance Optimization
PTX
DDC Class
004: Computer Sciences