Please use this identifier to cite or link to this item: https://doi.org/10.15480/882.4333
Publisher DOI: 10.1007/s10766-022-00729-2
Title: A quantitative study of locality in GPU caches for memory-divergent workloads
Language: English
Authors: Lal, Sohan; Varma, Bogaraju Sharatchandra; Juurlink, Ben
Keywords: Data locality; GPU caches; Memory divergence
Issue Date: 5-Apr-2022
Publisher: Springer Science + Business Media B.V.
Source: International Journal of Parallel Programming 50 (2): 189-216 (2022)
Abstract (English):
GPUs are capable of delivering peak performance in TFLOPs; however, peak performance is often difficult to achieve due to several performance bottlenecks. Memory divergence is one such bottleneck: it makes locality harder to exploit and causes cache thrashing and a high miss rate, thereby impeding GPU performance. As data locality is crucial for performance, there have been several efforts to exploit data locality in GPUs. However, there is a lack of quantitative analysis of data locality, which could pave the way for optimizations. In this paper, we quantitatively study data locality and its limits in GPUs at different granularities. We show that, in contrast to previous studies, there is significantly higher inter-warp locality at the L1 data cache for memory-divergent workloads. We further show that about 50% of the cache capacity and other scarce resources such as NoC bandwidth are wasted due to data over-fetch caused by memory divergence. While the low spatial utilization of cache lines justifies the sectored-cache design, which fetches only those sectors of a cache line that are needed during a request, our limit study reveals the lost spatial locality for which additional memory requests are needed to fetch the other sectors of the same cache line. The lost spatial locality presents opportunities for further optimizing the cache design.
URI: http://hdl.handle.net/11420/12244
DOI: 10.15480/882.4333
ISSN: 1573-7640
Journal: International Journal of Parallel Programming
Institute: Massively Parallel Systems E-EXK5 
Document Type: Article
Project: Projekt DEAL 
License: CC BY 4.0 (Attribution)
Appears in Collections: Publications with fulltext

Files in This Item:
File: Lal2022_Article_AQuantitativeStudyOfLocalityIn.pdf
Description: Publisher's PDF
Size: 2.53 MB
Format: Adobe PDF

