Enhancing Practicality of Memory Compression for GPUs with High-Throughput Simplifications

Citation Link: https://doi.org/10.15480/882.15123
Publication Type
Conference Paper
Date Issued
2025-05
Language
English
Author(s)
Renz, Manuel  
Massively Parallel Systems E-EXK5  
Lal, Sohan  
Massively Parallel Systems E-EXK5  
TORE-DOI
10.15480/882.15123
TORE-URI
https://hdl.handle.net/11420/55470
Citation
22nd ACM International Conference on Computing Frontiers, CF '25
Contribution to Conference
22nd ACM International Conference on Computing Frontiers, CF '25
Publisher DOI
10.1145/3719276.3725182
Publisher
ACM
Peer Reviewed
true
Memory-bound Graphics Processing Unit (GPU) applications are limited by memory bandwidth, as the rapid growth in computational power has outpaced the slower increase in memory bandwidth. Consequently, approaches such as memory compression are gaining prominence to synthetically enhance memory bandwidth and accelerate bandwidth-limited applications. Traditionally, compression techniques have been tailored towards achieving high compression ratios without considering the high bandwidth of modern GPU memory systems, which makes hardware integration costly and impractical. We analyze several state-of-the-art memory compression techniques and find that the throughput of Bit-Plane Compression (BPC) and Frequent Pattern Compression (FPC) is limited by Zero Run-Length Encoding (ZRLE). ZRLE efficiently compresses zero blocks; however, GPUs often do not benefit, as even heavily compressed blocks require the transfer of a full Memory-Access Granularity (MAG). We propose simplifying the BPC and FPC techniques by removing ZRLE and introducing a fixed-size tag section. Together with higher word-level parallelism, our simplifications increase compressor throughput by 14.0× and decompressor throughput by 13.5× without any loss in the effective compression ratio. Additionally, the area required for hardware integration is significantly reduced; for instance, the area cost of BPC is decreased by 3.6× and its power consumption by 1.8×, making hardware integration of memory compression more practical and cost-effective.
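The point about the Memory-Access Granularity can be illustrated with a small calculation. The sketch below is not taken from the paper: the 128-byte block size, the 32-byte MAG, and all function names are assumptions chosen for illustration. It shows why compressing a block below one MAG yields no additional bandwidth saving, since the transferred size is rounded up to whole MAGs.

```python
import math

# Assumed values for illustration only; the abstract does not state them.
BLOCK_BYTES = 128  # hypothetical uncompressed block size
MAG_BYTES = 32     # hypothetical memory-access granularity


def transferred_bytes(compressed_bytes, mag=MAG_BYTES, block=BLOCK_BYTES):
    """Bytes actually moved over the memory interface.

    The compressed size is rounded up to a whole number of MAGs,
    with a minimum of one MAG and a maximum of the full block.
    """
    if compressed_bytes < 0:
        raise ValueError("compressed size must be non-negative")
    mags = max(1, math.ceil(compressed_bytes / mag))
    return min(mags * mag, block)


def raw_ratio(compressed_bytes, block=BLOCK_BYTES):
    """Compression ratio based on the compressed size alone."""
    return block / compressed_bytes


def effective_ratio(compressed_bytes, mag=MAG_BYTES, block=BLOCK_BYTES):
    """Compression ratio based on the bytes that must be transferred."""
    return block / transferred_bytes(compressed_bytes, mag, block)


if __name__ == "__main__":
    # A zero block squeezed to 1 byte versus a block compressed to 20 bytes:
    # the raw ratios differ widely, the effective ratios do not.
    for size in (1, 20, 33, 128):
        print(size, raw_ratio(size), effective_ratio(size))
```

Under these assumed sizes, a block compressed to 1 byte and a block compressed to 20 bytes both occupy one MAG on the bus, so both reach the same effective ratio of 4. This is the sense in which removing a stage that only improves compression below one MAG need not reduce the effective compression ratio.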
Subjects
Memory Compression
GPUs
Throughput
DDC Class
621.3: Electrical Engineering, Electronic Engineering
004: Computer Sciences
Publication version
publishedVersion
License
http://rightsstatements.org/vocab/InC/1.0/
Name
memory_compression_high_throughput_simplifications.pdf
Size
465.21 KB
Format
Adobe PDF