Latency-optimized hardware acceleration of multilayer perceptron inference
Publication Type
Conference Paper
Date Issued
2023
Language
English
Start Page
235
End Page
241
Citation
26th Euromicro Conference on Digital System Design: 235-241 (2023)
Contribution to Conference
26th Euromicro Conference on Digital System Design 2023
Publisher
IEEE
ISBN
979-835034419-6
Abstract
Reducing the inference latency of neural networks is crucial wherever real-time responses are required. We propose a new neuron architecture for parallel computation, targeting MLP implementations on FPGAs. The parallelism in the proposed architecture is exposed by segmenting the non-linear activation functions into sets of linear segments that approximate the original functions with high accuracy. The implementation combines this with further optimization techniques such as fixed-point arithmetic, pipelining, array partitioning, and loop unrolling. To validate the proposed architecture with the Xilinx Vitis HLS toolchain, four MLPs with a mix of non-linear activation functions were implemented and compared against accelerated models produced by hls4ml, an open-source Python package for latency-optimized machine-learning inference on FPGAs. Experimental results show that the proposed architecture outperforms the corresponding hls4ml models, with speedups of up to three times.
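The core idea, replacing a non-linear activation with a small table of linear segments so that each evaluation reduces to one multiply and one add, can be illustrated with a short sketch. The following is a minimal, software-only C++ illustration, not the paper's HLS implementation: the tanh example, the 16-segment table, and the Q4.12 fixed-point format are assumptions made for the sketch. In an actual Vitis HLS design the segment table would live in on-chip ROM, and pragmas such as ARRAY_PARTITION and UNROLL would expose the parallelism described in the abstract.

```cpp
#include <cstdint>
#include <cmath>
#include <cstdio>

// Q4.12 fixed point: 12 fractional bits (illustrative choice, not the paper's).
constexpr int     FRAC = 12;
constexpr int32_t ONE  = 1 << FRAC;

// 16 linear segments over [-4, 4]; tanh saturates outside this range.
constexpr int   NSEG = 16;
constexpr float XMIN = -4.0f, XMAX = 4.0f;

int32_t slope[NSEG], intercept[NSEG];   // per-segment a and b, fixed point

int32_t to_fix(float v) { return (int32_t)lroundf(v * ONE); }
float   to_flt(int32_t v) { return (float)v / ONE; }

// Build the table offline: on each segment, y ~= a*x + b interpolates tanh
// between the segment endpoints (chord approximation).
void build_table() {
    const float step = (XMAX - XMIN) / NSEG;   // 0.5 per segment
    for (int i = 0; i < NSEG; ++i) {
        float x0 = XMIN + i * step, x1 = x0 + step;
        float a = (tanhf(x1) - tanhf(x0)) / step;
        float b = tanhf(x0) - a * x0;
        slope[i]     = to_fix(a);
        intercept[i] = to_fix(b);
    }
}

// Piecewise-linear tanh: clamp, pick a segment, one multiply, one add.
int32_t pwl_tanh(int32_t x) {
    if (x <= to_fix(XMIN)) return to_fix(-1.0f);
    if (x >= to_fix(XMAX)) return to_fix(1.0f);
    // Segment width is 0.5 = 1 << (FRAC - 1), a power of two, so the
    // segment index is just a shift -- cheap combinational logic on an FPGA.
    int idx = (x - to_fix(XMIN)) >> (FRAC - 1);
    int32_t y = (int32_t)(((int64_t)slope[idx] * x) >> FRAC);  // Q4.12 * Q4.12 -> Q4.12
    return y + intercept[idx];
}

int main() {
    build_table();
    for (float x = -5.0f; x <= 5.0f; x += 1.25f)
        printf("tanh(%+.2f) ~ %+.4f (exact %+.4f)\n",
               x, to_flt(pwl_tanh(to_fix(x))), tanhf(x));
    return 0;
}
```

With 16 segments the chord approximation of tanh stays within a few hundredths of the exact value; increasing the segment count or the fixed-point precision tightens the error at the cost of a larger table, which is the accuracy/resource trade-off such designs navigate.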
Subjects
FPGA
MLP
Non-linear activation function
Parallel
Segmentation
MLE@TUHH
DDC Class
004: Computer Sciences
620: Engineering