CNN Implementation and Analysis on Xilinx Versal ACAP at European XFEL

Publication Type
Conference Paper
Date Issued
2022-09
Language
English
Author(s)
Al-Zoubi, Ahmad  
Martino, Gianluca
Bahnsen, Fin Hendrik  
Zhu, Jun  
Schlarb, Holger  
Fey, Görschwin
Institute
Embedded Systems E-13
TORE-URI
http://hdl.handle.net/11420/13970
Citation
35th IEEE International System-on-Chip Conference (SOCC 2022)
Contribution to Conference
35th IEEE International System-on-Chip Conference, SOCC 2022  
Publisher DOI
10.1109/SOCC56010.2022.9908101
Scopus ID
2-s2.0-85140775256
Developers have proposed various hardware accelerators to improve CNN inference performance on embedded platforms. Recently, Xilinx announced its first 7-nm FPGA accelerator, the Versal ACAP, a high-performance, heterogeneous computing platform that adapts to application requirements. However, early studies focused on the most common deep learning CNN architectures (e.g., VGG, ResNet, and Inception), which are fully supported by Xilinx Vitis-AI, so the performance of the Versal ACAP with customized CNN architectures has yet to be explored. In this study, we implement one of the CNN architectures considered at the European XFEL and compare its performance to that of a state-of-the-art GPU and another FPGA generation. In addition, this study evaluates the validity of quantization methods for critical regression applications and presents a complete analysis of the results based on the device time traces, providing recommendations for configuring the runtime parameters. The experimental results confirm the superior performance of the Versal ACAP in terms of latency and throughput. When all neural network layers were supported by the ACAP processing unit, it achieved 17x better throughput and 18x better latency than the GPU. In addition, when quantized using the fine-tuning method, the CNN model shows improved accuracy compared to the floating-point model, with a 6% reduction in loss.
Subjects
CNN
FPGA
Heterogeneous
Quantization
Versal ACAP
MLE@TUHH
TUHH