Machine learning for SAST: a lightweight and adaptable approach
Publication Type
Conference Paper
Publication Date
2024-01-11
Language
English
Author
First published in
Number in series
14347 LNCS
Start Page
85
End Page
104
Citation
Lecture Notes in Computer Science 14347: 85-104 (2024)
Contribution to Conference
Publisher DOI
Scopus ID
Publisher
Springer Nature
ISBN
978-3-031-51482-1
978-3-031-51481-4
In this paper, we summarize a novel method for machine learning-based static application security testing (SAST), which was devised as part of a larger study funded by Germany’s Federal Office for Information Security (BSI). SAST is the practice of applying static analysis techniques to program code in order to detect security-critical software defects early in the development process. In the past, this was done with rule-based approaches, in which the program code is checked against a set of rules, each defining a pattern representative of a defect. Recently, a growing number of publications have discussed the application of machine learning methods to this problem. Our method is a lightweight approach to this concept and comprises two main contributions. First, we present a novel control-flow-based embedding method for program code; embedding the code into a metric space is necessary in order to apply machine learning techniques to SAST. Second, we describe how this method can be applied to generate expressive yet simple models of unwanted behavior. We have implemented these methods in a prototype for the C and C++ programming languages. Using tenfold cross-validation, we show that our prototype is capable of effectively predicting the location and type of software defects in previously unseen code.
Keywords
Machine Learning
SAST
Software Defect Prediction
Static Analysis
MLE@TUHH
DDC Class
004: Computer Sciences