Machine learning for SAST: a lightweight and adaptable approach

Lorenz Hüther, Karsten Sohr, Bernhard Johannes Berger, Hendrik Rothe, Stefan Edelkamp

Conference Paper, 28th European Symposium on Research in Computer Security, ESORICS 2023
DOI: 10.1007/978-3-031-51482-1_5
Handle: https://hdl.handle.net/11420/48582
Dates: issued 2024-01-11; made available in repository 2024-08-02
Language: English
Keywords: Machine Learning; SAST; Software Defect Prediction; Static Analysis
Subjects: MLE@TUHH; Computer Science, Information and General Works::004: Computer Sciences

Abstract
In this paper, we summarize a novel method for machine learning-based static application security testing (SAST), which was devised as part of a larger study funded by Germany’s Federal Office for Information Security (BSI). SAST is the practice of applying static analysis techniques to program code with the aim of detecting security-critical software defects early in the development process. Traditionally, this has been done with rule-based approaches, in which the program code is checked against a set of rules, each defining a pattern representative of a defect. Recently, a growing number of publications have discussed the application of machine learning methods to this problem. Our method is a lightweight take on this idea and makes two main contributions. First, we present a novel control-flow-based embedding method for program code; embedding the code into a metric space is necessary in order to apply machine learning techniques to SAST. Second, we describe how this method can be applied to generate expressive, yet simple, models of unwanted behavior. We have implemented these methods in a prototype for the C and C++ programming languages. Using tenfold cross-validation, we show that our prototype effectively predicts the location and type of software defects in previously unseen code.
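To make the idea of embedding program code into a metric space more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' embedding, which the abstract does not specify. It assumes each function has already been reduced to control-flow paths, each a sequence of abstract event labels such as `malloc`, `use`, and `free` (hypothetical labels), represents each path as a vector of bigram counts, and uses Euclidean distance with a 1-nearest-neighbour rule to predict a defect type. The functions `ngrams`, `build_vocabulary`, `embed`, `distance`, and `predict` and the toy training data are illustrative only.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Assumption: a control-flow path is the sequence of abstract events
# (calls, memory operations) met along one path through a function's CFG.
Path = Sequence[str]
Gram = Tuple[str, ...]


def ngrams(path: Path, n: int = 2) -> List[Gram]:
    """Return the consecutive n-grams of a path."""
    return [tuple(path[i:i + n]) for i in range(len(path) - n + 1)]


def build_vocabulary(paths: List[Path], n: int = 2) -> List[Gram]:
    """Collect every n-gram seen in the training paths, in a fixed sorted order."""
    return sorted({g for p in paths for g in ngrams(p, n)})


def embed(path: Path, vocabulary: List[Gram], n: int = 2) -> List[int]:
    """Map a path to a count vector; n-grams outside the vocabulary are ignored."""
    counts = Counter(ngrams(path, n))
    return [counts[g] for g in vocabulary]


def distance(u: List[int], v: List[int]) -> float:
    """Euclidean distance between two count vectors (a metric on the vector space)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict(query: Path, train: List[Tuple[Path, str]], vocabulary: List[Gram]) -> str:
    """Hypothetical 1-nearest-neighbour rule: label of the closest training path."""
    q = embed(query, vocabulary)
    _, label = min(train, key=lambda item: distance(q, embed(item[0], vocabulary)))
    return label


# Toy labelled paths (invented for illustration, not data from the paper).
train = [
    (["malloc", "use", "free", "use"], "use-after-free"),
    (["malloc", "use", "free"], "ok"),
    (["fopen", "read"], "resource-leak"),
    (["fopen", "read", "fclose"], "ok"),
]
vocab = build_vocabulary([p for p, _ in train])

print(predict(["malloc", "use", "use", "free", "use"], train, vocab))  # prints: use-after-free
```

Because the vectors live in a metric space, any distance-based learner could replace the 1-nearest-neighbour rule. A real evaluation, such as the tenfold cross-validation described in the abstract, would need a far larger labelled corpus than this toy set.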