Low-latency real-time inference for multilayer perceptrons on FPGAs
Publication Type
Conference Paper
Date Issued
2023
Language
English
Start Page
123
End Page
133
Citation
International Workshop on Boolean Problems (2023)
Publisher
Springer International Publishing
ISBN
978-3-031-28916-3
978-3-031-28915-6
978-3-031-28917-0
978-3-031-28918-7
Application domains such as process control, particle accelerator control systems, autonomous driving, and monitoring of critical infrastructures are latency critical. However, most studies and commercial processors focus on the throughput of the machine learning algorithms deployed in these domains. Given the wide use of multilayer perceptron neural networks in fast inference tasks and their competitive accuracy, we propose an efficient, latency-optimized architecture for multilayer perceptrons, implemented on field-programmable gate arrays (FPGAs). The proposed architecture exploits the inherent parallel computation model of the multilayer perceptron together with our proposed implementation of segmented activation functions. We analyze the latency, accuracy, and power consumption of the proposed architecture in comparison with state-of-the-art implementations. Experimental results show that, for a 7-9-9-9-5 network topology, the proposed architecture achieves a latency of 86.58 ns and a power consumption of 3.731 W at an accuracy of 98.21%. For the same topology, it outperforms the state of the art in latency by a factor of 2.1x against a customized implementation and 332.81x against a commercial IP.
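To illustrate the idea behind segmented activation functions, the sketch below approximates the sigmoid with a small number of linear pieces, which is the kind of approximation that maps well onto FPGA lookup-and-interpolate hardware. This is an illustrative model only: the segment count, breakpoints, and interval are assumptions for demonstration, not the parameters or implementation used in the paper.

```python
import numpy as np

def segmented_sigmoid(x, n_segments=8, x_min=-4.0, x_max=4.0):
    """Piecewise-linear ("segmented") sigmoid approximation.

    Uses n_segments linear pieces on [x_min, x_max]; inputs outside
    the interval saturate to the endpoint values. All parameters here
    are illustrative choices, not the paper's design.
    """
    # Precompute breakpoints and the exact sigmoid value at each one.
    xs = np.linspace(x_min, x_max, n_segments + 1)
    ys = 1.0 / (1.0 + np.exp(-xs))
    # np.interp performs linear interpolation between breakpoints and
    # clamps to the endpoint values outside the interval (saturation).
    return np.interp(x, xs, ys)

# Usage: measure the worst-case error against the exact sigmoid
# on a grid inside the approximated interval.
grid = np.linspace(-4.0, 4.0, 101)
exact = 1.0 / (1.0 + np.exp(-grid))
max_err = np.max(np.abs(segmented_sigmoid(grid) - exact))
print(f"max in-range error with 8 segments: {max_err:.4f}")
```

On hardware, the breakpoint values would typically sit in a small lookup table and the interpolation would reduce to one fixed-point multiply-add per evaluation, which is what keeps the per-neuron activation latency low.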
Subjects
MLE@TUHH
DDC Class
004: Computer Sciences
510: Mathematics