Al-Zoubi, Ahmad; Fey, Görschwin
Title: Low-latency real-time inference for multilayer perceptrons on FPGAs
Venue: International Workshop on Boolean Problems (2023)
Published: 2023-11-02
ISBN: 978-3-031-28916-3; 978-3-031-28915-6; 978-3-031-28917-0; 978-3-031-28918-7
Handle: https://hdl.handle.net/11420/44028
DOI: 10.1007/978-3-031-28916-3_9
Type: Conference Paper
Keywords: MLE@TUHH; Computer Sciences; Mathematics
Language: en

Abstract: Application domains such as process control, particle accelerator control systems, autonomous driving, and monitoring of critical infrastructure are latency critical. However, most studies and commercial processors focus on the throughput of the machine learning algorithms deployed in these domains. Given the wide use of multilayer perceptron neural networks in fast inference tasks and their competitive accuracy, we propose an efficient, latency-optimized architecture for multilayer perceptrons implemented on field-programmable gate arrays (FPGAs). The proposed architecture exploits the inherent parallelism of the multilayer perceptron computation model together with our proposed implementation of segmented activation functions. We analyze the latency, accuracy, and power consumption of the proposed architecture in comparison with state-of-the-art implementations. Experimental results show that, for a 7-9-9-9-5 network topology, the proposed architecture achieves a latency of 86.58 ns and a power consumption of 3.731 W at an accuracy of 98.21%. Compared to the same topology in the state of the art, our implementation reduces latency by a factor of 2.1x over a customized implementation and 332.81x over a commercial IP.
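The abstract mentions segmented activation functions as a key ingredient of the low-latency design. As a rough illustration only (the paper's actual segmentation scheme, segment count, and breakpoints are not given here), a segmented activation typically approximates a nonlinear function such as the sigmoid with a small table of linear pieces, so that hardware can evaluate it with one table lookup, one multiply, and one add:

```python
import math

def segmented_sigmoid(x, segments=8, x_min=-4.0, x_max=4.0):
    """Piecewise-linear approximation of sigmoid(x) over [x_min, x_max].

    Segment count and range are illustrative assumptions; in an FPGA
    implementation the per-segment intercepts and slopes would be
    precomputed and stored in small lookup tables.
    """
    if x <= x_min:  # saturate below the approximated range
        return 0.0
    if x >= x_max:  # saturate above the approximated range
        return 1.0
    width = (x_max - x_min) / segments
    i = int((x - x_min) / width)          # segment index (a LUT address in hardware)
    x0 = x_min + i * width                # left breakpoint of segment i
    y0 = 1.0 / (1.0 + math.exp(-x0))      # table entry: intercept
    y1 = 1.0 / (1.0 + math.exp(-(x0 + width)))
    slope = (y1 - y0) / width             # table entry: slope
    return y0 + slope * (x - x0)          # one multiply-add at inference time
```

Evaluating each piece costs constant time independent of the function's complexity, which is what makes such approximations attractive when every nanosecond of inference latency counts.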