HydraViT: stacking heads for a scalable ViT

Publication Type
Conference Paper
Date Issued
2024
Language
English
Author(s)
Haberer, Janek  
Hojjat, Ali  
Landsiedel, Olaf  
TORE-URI
https://hdl.handle.net/11420/61007
First published in
Advances in neural information processing systems  
Number in series
37
Start Page
40254
End Page
40277
Citation
38th Advances in Neural Information Processing System, NeurIPS 2024
Contribution to Conference
38th Advances in Neural Information Processing System, NeurIPS 2024  
Publisher DOI
10.52202/079017-1273
Publisher
Neural Information Processing Systems Foundation, Inc. (NeurIPS)
ISBN of container
979-8-3313-1438-5
The architecture of Vision Transformers (ViTs), particularly the Multi-head Attention (MHA) mechanism, imposes substantial hardware demands. Deploying ViTs on devices with varying constraints, such as mobile phones, requires multiple models of different sizes. However, this approach has limitations, such as the need to train and store each required model separately. This paper introduces HydraViT, a novel approach that addresses these limitations by stacking attention heads to achieve a scalable ViT. By repeatedly changing the size of the embedding dimensions throughout each layer, together with the corresponding number of attention heads in MHA, during training, HydraViT induces multiple subnetworks. HydraViT thereby achieves adaptability across a wide spectrum of hardware environments while maintaining performance. Our experimental results demonstrate the efficacy of HydraViT in achieving a scalable ViT with up to 10 subnetworks, covering a wide range of resource constraints. HydraViT achieves up to 5 p.p. more accuracy with the same GMACs and up to 7 p.p. more accuracy with the same throughput on ImageNet-1K compared to the baselines, making it an effective solution for scenarios where hardware availability is diverse or varies over time. The source code is available at https://github.com/ds-kiel/HydraViT.
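The idea of inducing nested subnetworks by varying the embedding width and the head count together can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). The per-head width of 64, the range of 3 to 12 heads, the uniform sampling, and all function names are assumptions made for illustration; the abstract only states that up to 10 subnetworks are induced.

```python
import random

HEAD_DIM = 64                   # assumption: per-head width, not stated in the abstract
MIN_HEADS, MAX_HEADS = 3, 12    # assumption: 10 subnetworks, one per head count


def subnetwork_dim(num_heads, head_dim=HEAD_DIM):
    """Embedding width used by a subnetwork with `num_heads` attention heads."""
    return num_heads * head_dim


def slice_weight(weight, in_dim, out_dim):
    """Keep the top-left (out_dim x in_dim) block of a full weight matrix."""
    return [row[:in_dim] for row in weight[:out_dim]]


def linear(x, weight):
    """Compute y = W x for a vector x and a row-major weight matrix W."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def sample_heads(rng):
    """Pick the head count for one training step (uniformly, by assumption)."""
    return rng.randint(MIN_HEADS, MAX_HEADS)


rng = random.Random(0)
full_dim = subnetwork_dim(MAX_HEADS)
# One shared weight matrix stands in for a layer of the largest model.
full_weight = [[rng.uniform(-1.0, 1.0) for _ in range(full_dim)]
               for _ in range(full_dim)]

num_heads = sample_heads(rng)            # subnetwork chosen for this step
dim = subnetwork_dim(num_heads)
sub_weight = slice_weight(full_weight, dim, dim)
output = linear([1.0] * dim, sub_weight)  # forward pass through the slice only
```

Because every subnetwork is a prefix slice of the same matrix, the smaller configurations are contained in the larger ones, so a single set of weights is stored for all of them. At deployment time, a device would pick the largest head count that fits its resource budget and run only the corresponding slice.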
Subjects
MLE@TUHH
DDC Class
600: Technology