Memory access granularity aware lossless compression for GPUs
Citation Link: https://doi.org/10.15480/882.4221
Publication Type
Conference Paper
Publication Date
2022
Language
English
Citation
36th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2022)
Contribution to Conference
Publisher DOI
Scopus ID
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Peer Reviewed
true
High-bandwidth off-chip memory has played a key role in the success of Graphics Processing Units (GPUs) as accelerators. However, as memory bandwidth scaling continues to lag behind computational power, it remains a key bottleneck in computing systems. While memory compression has shown immense potential to increase the effective memory bandwidth by transferring compressed data between on-chip and off-chip memory, the large memory access granularity (MAG) of off-chip memory prevents compression techniques from achieving a high effective compression ratio. Unfortunately, state-of-the-art lossless memory compression techniques do not take the large MAG of off-chip memory into account. A recent study has used MAG-aware approximation to increase the effective compression ratio; however, not all applications can tolerate errors, which limits its applicability. We propose extensions and GPU-specific optimizations to adapt a lossless memory compression technique to a MAG size to increase the effective compression ratio and performance gain. Our technique is based on the well-known Base-Delta-Immediate (BDI) compression technique that compresses a memory block to a common base and multiple deltas. We leverage the key observation that deltas often contain enough leading zeros to compress a block to a multiple of MAG without any loss of information. We show that MAG-aware BDI provides, on average, a 48% higher effective compression ratio, 10% (up to 27%) higher speedup, and 16% bandwidth reduction compared to normal BDI. While BDI, FPC, and CPACK have a similar compression ratio, MAG-aware BDI outperforms FPC, CPACK, and SLC by 56%, 47%, and 33%, respectively.
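The gap between the raw and the effective compression ratio described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 128-byte block size, the 32-byte MAG, the 4-byte word size, and the function names `bdi_compressed_size` and `effective_size` are assumptions chosen for illustration, and the base-plus-delta encoding is a simplified form of BDI (a single base, no implicit zero base, no metadata bits).

```python
import math

BLOCK_BYTES = 128  # assumed memory block size (not stated in the abstract)
MAG_BYTES = 32     # assumed memory access granularity (not stated in the abstract)


def bdi_compressed_size(block: bytes, word_bytes: int = 4) -> int:
    """Size in bytes of a simplified base+delta encoding of `block`.

    The first word is the base; every word is stored as a signed delta
    from it, using the narrowest delta width that fits all deltas.
    Returns len(block) if no narrower width works (incompressible).
    """
    words = [int.from_bytes(block[i:i + word_bytes], "little")
             for i in range(0, len(block), word_bytes)]
    base = words[0]
    deltas = [w - base for w in words]
    for delta_bytes in range(word_bytes):
        if delta_bytes == 0:
            fits = all(d == 0 for d in deltas)  # all words equal the base
        else:
            lo = -(1 << (8 * delta_bytes - 1))
            hi = (1 << (8 * delta_bytes - 1)) - 1
            fits = all(lo <= d <= hi for d in deltas)
        if fits:
            return word_bytes + delta_bytes * len(words)
    return len(block)


def effective_size(compressed_bytes: int, mag: int = MAG_BYTES) -> int:
    """Bytes actually transferred: the compressed size rounded up to a multiple of MAG."""
    return math.ceil(compressed_bytes / mag) * mag


# Hypothetical block: 32 four-byte words 1000, 1001, ..., 1031.
block = b"".join((1000 + i).to_bytes(4, "little") for i in range(32))
raw = bdi_compressed_size(block)   # 4-byte base + 32 one-byte deltas = 36
eff = effective_size(raw)          # 36 rounded up to the next multiple of 32 = 64
print(raw, eff, BLOCK_BYTES / raw, BLOCK_BYTES / eff)
```

In this example the block compresses to 36 bytes, but two 32-byte accesses are still needed, so the effective compression ratio is 2.0 rather than about 3.56. Shaving the 4 bytes that spill over the MAG boundary is the kind of saving the abstract attributes to exploiting leading zeros in the deltas; the sketch does not attempt that step.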
DDC Class
004: Computer Science
600: Technology
620: Engineering
Publication version
submittedVersion
Publisher's Credit Line
Sohan Lal, Manuel Renz, Julian Hartmer, Ben Juurlink. Memory Access Granularity Aware Lossless Compression for GPUs. In: Proceedings of the 36th IEEE International Parallel & Distributed Processing Symposium, IPDPS 2022. © 2022 IEEE
Name
paper.pdf
Size
408.57 KB
Format
Adobe PDF