Enhancing Practicality of Memory Compression for GPUs with High-Throughput Simplifications

Memory-bound Graphics Processing Unit (GPU) applications are limited by memory bandwidth, as the rapid growth in computational power has outpaced the slower increase in memory bandwidth. Consequently, approaches such as memory compression are gaining prominence as a way to effectively increase memory bandwidth and accelerate bandwidth-limited applications. Traditionally, compression techniques have been tailored towards achieving high compression ratios without considering the high bandwidth of modern GPU memory systems, which makes hardware integration costly and impractical. We analyze several state-of-the-art memory compression techniques and find that the throughput of Bit-Plane Compression (BPC) and Frequent Pattern Compression (FPC) is limited by Zero Run-Length Encoding (ZRLE). ZRLE efficiently compresses zero blocks; however, GPUs often do not benefit, as even heavily compressed blocks require the transfer of a full Memory-Access Granularity (MAG). We propose simplifying the BPC and FPC techniques by removing ZRLE and introducing a fixed-size tag section. Together with higher word-level parallelism, our simplifications increase compressor throughput by 14.0× and decompressor throughput by 13.5× without any loss in the effective compression ratio. Additionally, the area required for hardware integration is significantly reduced; for instance, the area cost of BPC is decreased by 3.6× and its power consumption by 1.8×, making hardware integration of memory compression more practical and cost-effective.
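
The MAG argument in the abstract can be illustrated with a small calculation. The sketch below is not taken from the paper: the 32-byte MAG, the 128-byte block size, and the function names `effective_size` and `effective_ratio` are assumptions chosen only for illustration. It shows why compressing a block far below one MAG does not reduce the data actually transferred.

```python
import math


def effective_size(compressed_bytes, mag_bytes=32, block_bytes=128):
    """Bytes actually moved for one block.

    Assumptions (hypothetical, for illustration only): transfers happen in
    whole multiples of mag_bytes, at least one MAG is always moved, and a
    block never costs more than its uncompressed size.
    """
    bursts = max(1, math.ceil(compressed_bytes / mag_bytes))
    return min(bursts * mag_bytes, block_bytes)


def effective_ratio(compressed_sizes, mag_bytes=32, block_bytes=128):
    """Effective compression ratio: raw bytes divided by bytes transferred."""
    total_raw = block_bytes * len(compressed_sizes)
    total_moved = sum(
        effective_size(size, mag_bytes, block_bytes) for size in compressed_sizes
    )
    return total_raw / total_moved


# A zero block squeezed to 2 bytes and a block compressed to 30 bytes both
# occupy one full MAG, so their effective ratios are identical.
tiny = effective_ratio([2])
moderate = effective_ratio([30])
```

Under these assumed sizes, any encoding that shrinks a block below one MAG yields the same effective ratio as one that merely fits within it, which is consistent with the abstract's claim that removing ZRLE need not reduce the effective compression ratio.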

Subjects

Memory Compression

GPUs

Throughput

DDC Class

621.3: Electrical Engineering, Electronic Engineering

004: Computer Sciences

License

http://rightsstatements.org/vocab/InC/1.0/

Publication version

publishedVersion

Name

memory_compression_high_throughput_simplifications.pdf

Size

465.21 KB

Format

Adobe PDF
