HydraViT: Stacking Heads for a Scalable ViT
Publication Type
Conference Paper
Date Issued
2024
Language
English
Author(s)
First published in
Number in series
37
Start Page
40254
End Page
40277
Citation
Advances in Neural Information Processing Systems 37 (NeurIPS 2024)
Contribution to Conference
Publisher DOI
Publisher
Neural Information Processing Systems Foundation, Inc. (NeurIPS)
ISBN of container
979-8-3313-1438-5
Abstract
The architecture of Vision Transformers (ViTs), particularly the Multi-head Attention (MHA) mechanism, imposes substantial hardware demands. Deploying ViTs on devices with varying constraints, such as mobile phones, requires multiple models of different sizes. However, this approach has limitations, such as training and storing each required model separately. This paper introduces HydraViT, a novel approach that addresses these limitations by stacking attention heads to achieve a scalable ViT. By repeatedly changing the size of the embedded dimensions throughout each layer and their corresponding number of attention heads in MHA during training, HydraViT induces multiple subnetworks. Thereby, HydraViT achieves adaptability across a wide spectrum of hardware environments while maintaining performance. Our experimental results demonstrate the efficacy of HydraViT in achieving a scalable ViT with up to 10 subnetworks, covering a wide range of resource constraints. HydraViT achieves up to 5 p.p. more accuracy with the same GMACs and up to 7 p.p. more accuracy with the same throughput on ImageNet-1K compared to the baselines, making it an effective solution for scenarios where hardware availability is diverse or varies over time. The source code is available at https://github.com/ds-kiel/HydraViT.
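The abstract's core mechanism, obtaining a smaller ViT by keeping only the leading attention heads together with the matching slice of the embedding dimension, can be sketched in a few lines of PyTorch. The code below is a minimal, hypothetical illustration and not the authors' implementation (that lives in the linked repository); the class name SlicedMHA, the weight layout, and the slicing scheme are assumptions made for clarity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SlicedMHA(nn.Module):
    # Multi-head attention whose leading heads form standalone subnetworks.
    # One set of weights sized for max_heads is stored; a smaller model is
    # obtained by slicing the first num_heads heads (and the matching slice
    # of the embedding dimension) instead of keeping a separate model per size.

    def __init__(self, max_heads: int = 12, head_dim: int = 64):
        super().__init__()
        self.head_dim = head_dim
        self.full_dim = max_heads * head_dim
        self.qkv = nn.Linear(self.full_dim, 3 * self.full_dim)
        self.proj = nn.Linear(self.full_dim, self.full_dim)

    def forward(self, x: torch.Tensor, num_heads: int) -> torch.Tensor:
        # x is already embedded at the sub-width d = num_heads * head_dim
        B, N, d = x.shape
        assert d == num_heads * self.head_dim

        # Slice the shared Q/K/V projection down to the active heads.
        w = self.qkv.weight.view(3, self.full_dim, self.full_dim)[:, :d, :d]
        b = self.qkv.bias.view(3, self.full_dim)[:, :d]
        q, k, v = (F.linear(x, w[i], b[i])
                     .view(B, N, num_heads, self.head_dim)
                     .transpose(1, 2)
                   for i in range(3))

        out = F.scaled_dot_product_attention(q, k, v)  # (B, heads, N, head_dim)
        out = out.transpose(1, 2).reshape(B, N, d)

        # Output projection, sliced to the same sub-dimension.
        return F.linear(out, self.proj.weight[:d, :d], self.proj.bias[:d])

# Usage: run a 6-head subnetwork of a 12-head model with the same weights.
mha = SlicedMHA(max_heads=12, head_dim=64)
x = torch.randn(2, 197, 6 * 64)   # 197 tokens embedded at the 6-head width
y = mha(x, num_heads=6)           # y has shape (2, 197, 384)

A training loop in this spirit would sample num_heads per batch so that every subnetwork receives gradient updates; at deployment, the full model and each smaller slice then share a single set of stored weights, which is what removes the need to train and store a separate model per device class.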
Subjects
MLE@TUHH
DDC Class
600: Technology