Title: Memory access granularity aware lossless compression for GPUs
Authors: Lal, Sohan; Renz, Manuel; Hartmer, Julian; Juurlink, Ben H. H.
Date: 2022-03-22
Year: 2022
Venue: 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2022)
Type: Conference Paper
Language: en
Subjects: Computer science; Technology; Engineering
URI: http://hdl.handle.net/11420/11845
DOI: 10.15480/882.4221
Publisher DOI: 10.1109/IPDPS53621.2022.00108
Rights: http://rightsstatements.org/vocab/InC/1.0/
Citation: Sohan Lal, Manuel Renz, Julian Hartmer, Ben Juurlink. Memory Access Granularity Aware Lossless Compression for GPUs. In: Proceedings of the 36th IEEE International Parallel & Distributed Processing Symposium, IPDPS 2022. © 2022 IEEE

Abstract:
High-bandwidth off-chip memory has played a key role in the success of Graphics Processing Units (GPUs) as accelerators. However, as memory bandwidth scaling continues to lag behind computational power, it remains a key bottleneck in computing systems. While memory compression has shown immense potential to increase the effective memory bandwidth through compressed data transfers between on-chip and off-chip memory, the large memory access granularity (MAG) of off-chip memory prevents compression techniques from achieving a high effective compression ratio. Unfortunately, state-of-the-art lossless memory compression techniques do not take the large MAG of off-chip memory into account. A recent study has used MAG-aware approximation to increase the effective compression ratio; however, not all applications can tolerate errors, which limits its applicability. We propose extensions and GPU-specific optimizations to adapt a lossless memory compression technique to the MAG size in order to increase the effective compression ratio and performance gain. Our technique is based on the well-known Base-Delta-Immediate (BDI) compression technique, which compresses a memory block to a common base and multiple deltas. We leverage the key observation that deltas often contain enough leading zeros to compress a block to a multiple of MAG without any loss of information. We show that MAG-aware BDI provides, on average, a 48% higher effective compression ratio, 10% (up to 27%) higher speedup, and 16% bandwidth reduction compared to normal BDI. While BDI, FPC, and CPACK have a similar compression ratio, MAG-aware BDI outperforms FPC, CPACK, and SLC by 56%, 47%, and 33%, respectively.
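
The gap between raw and effective compression ratio described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. The 128-byte block, 32-byte MAG, 4-byte values, the use of the minimum value as base, and all function names are assumptions made only for this illustration. The sketch compares base-plus-delta storage with byte-wide deltas against storage that drops the leading zero bits of the deltas, and rounds both results up to a multiple of MAG.

```python
import math

# All constants are assumptions for illustration; the record does not state them.
BLOCK_BYTES = 128  # uncompressed block size
MAG_BYTES = 32     # memory access granularity of off-chip memory
WORD_BYTES = 4     # width of each value in the block


def delta_bits(words):
    """Bits needed for the largest unsigned delta from the minimum word.

    Using the minimum as base is a simplification chosen for this sketch.
    """
    base = min(words)
    return max(w - base for w in words).bit_length()


def compressed_bytes(n_words, bits_per_delta):
    """One base word plus tightly packed deltas, rounded up to whole bytes."""
    return WORD_BYTES + math.ceil(n_words * bits_per_delta / 8)


def effective_bytes(raw_bytes):
    """Bytes actually transferred: a multiple of MAG, at most one full block."""
    return min(BLOCK_BYTES, math.ceil(raw_bytes / MAG_BYTES) * MAG_BYTES)


def byte_granular_size(words):
    """Base-plus-delta with delta widths restricted to 1, 2, or 4 bytes."""
    bits = delta_bits(words)
    for width in (1, 2, 4):
        if bits <= 8 * width:
            return compressed_bytes(len(words), 8 * width)
    return len(words) * WORD_BYTES  # deltas do not fit: store uncompressed


def bit_granular_size(words):
    """Base-plus-delta that drops the leading zero bits of every delta."""
    return compressed_bytes(len(words), delta_bits(words))


# 32 values whose deltas from the base need at most 7 bits each.
block = [1000 + 3 * i for i in range(32)]

for name, size in (("byte-wide deltas", byte_granular_size(block)),
                   ("bit-wide deltas", bit_granular_size(block))):
    moved = effective_bytes(size)
    print(f"{name}: {size} B compressed, {moved} B transferred, "
          f"effective ratio {BLOCK_BYTES / moved:.1f}")
```

Under these assumptions, byte-wide deltas yield a 36-byte block that still occupies two 32-byte accesses, while removing one leading zero bit per delta brings the block to exactly 32 bytes and a single access. This is the kind of effect the abstract attributes to leading zeros in the deltas; the paper's actual encoding and GPU-specific optimizations may differ.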