TUHH Open Research
A quantitative study of locality in GPU caches for memory-divergent workloads

Citation Link: https://doi.org/10.15480/882.4333
Publication Type
Journal Article
Date Issued
2022-04-05
Language
English
Author(s)
Lal, Sohan  
Varma, Bogaraju Sharatchandra  
Juurlink, Ben  
Institute
Massively Parallel Systems E-EXK5  
TORE-DOI
10.15480/882.4333
TORE-URI
http://hdl.handle.net/11420/12244
Journal
International journal of parallel programming  
Volume
50
Issue
2
Start Page
189
End Page
216
Citation
International Journal of Parallel Programming 50 (2): 189-216 (2022)
Publisher DOI
10.1007/s10766-022-00729-2
Scopus ID
2-s2.0-85127581640
Publisher
Springer Science + Business Media B.V.
GPUs can deliver peak performance in the TFLOPs range; however, this peak is often difficult to achieve because of several performance bottlenecks. Memory divergence is one such bottleneck: it makes locality harder to exploit and causes cache thrashing and high miss rates, thereby impeding GPU performance. As data locality is crucial for performance, there have been several efforts to exploit data locality in GPUs. However, a quantitative analysis of data locality, which could pave the way for optimizations, is lacking. In this paper, we quantitatively study data locality and its limits in GPUs at different granularities. We show that, in contrast to previous studies, there is significantly higher inter-warp locality at the L1 data cache for memory-divergent workloads. We further show that about 50% of the cache capacity and other scarce resources such as NoC bandwidth are wasted due to data over-fetch caused by memory divergence. While the low spatial utilization of cache lines justifies the sectored-cache design, which fetches only those sectors of a cache line that are needed during a request, our limit study reveals the lost spatial locality for which additional memory requests are needed to fetch the other sectors of the same cache line. The lost spatial locality presents opportunities for further optimizing the cache design.
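The over-fetch effect described in the abstract can be illustrated with a small sketch. It is not the authors' methodology or simulator; it only counts how many cache lines and sectors a single warp-wide load touches. The warp size (32 threads), cache-line size (128 bytes), sector size (32 bytes), and 4-byte loads are assumptions chosen as typical values and are not stated in the abstract; the function names are hypothetical.

```python
# Illustrative assumptions (not taken from the abstract): 32 threads per warp,
# 128-byte cache lines split into four 32-byte sectors, word-aligned 4-byte loads.
WARP_SIZE = 32
LINE_BYTES = 128
SECTOR_BYTES = 32
WORD_BYTES = 4


def warp_footprint(addresses):
    """Return (lines, sectors, useful_bytes) touched by one warp-wide load."""
    lines = {addr // LINE_BYTES for addr in addresses}
    sectors = {addr // SECTOR_BYTES for addr in addresses}
    useful = len(set(addresses)) * WORD_BYTES
    return len(lines), len(sectors), useful


def utilization(addresses, sectored):
    """Fraction of the fetched bytes that the warp actually reads."""
    lines, sectors, useful = warp_footprint(addresses)
    fetched = sectors * SECTOR_BYTES if sectored else lines * LINE_BYTES
    return useful / fetched


# Coalesced: consecutive threads read consecutive 4-byte words.
coalesced = [tid * WORD_BYTES for tid in range(WARP_SIZE)]
# Divergent: each thread reads one word from a different cache line.
divergent = [tid * LINE_BYTES for tid in range(WARP_SIZE)]

for name, addrs in (("coalesced", coalesced), ("divergent", divergent)):
    lines, sectors, _ = warp_footprint(addrs)
    print(name, lines, sectors,
          utilization(addrs, sectored=False),
          utilization(addrs, sectored=True))
```

Under these assumptions the coalesced load touches one line and uses all of it, whereas the fully divergent load touches 32 lines and reads only 4 bytes from each. Fetching whole lines then wastes most of the transferred data, and fetching only the needed sector reduces the waste. This is a worst-case toy pattern, so its numbers are not comparable to the roughly 50% waste the paper measures on real workloads. The sketch also omits the cost the abstract points out: if another sector of the same line is requested later, a sectored cache needs an additional memory request to fetch it.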
Subjects
Data locality
GPU caches
Memory divergence
DDC Class
600: Technology
Funding(s)
Projekt DEAL  
Publication version
publishedVersion
License
https://creativecommons.org/licenses/by/4.0/
Name
Lal2022_Article_AQuantitativeStudyOfLocalityIn.pdf
Size
2.47 MB
Format
Adobe PDF