Automatic extraction of security-rich dataflow diagrams for microservice applications written in Java
Dataflow diagrams (DFDs) are a valuable asset for securing applications, as they are the starting point for many security assessment techniques. Their creation, however, is often done manually, which is time-consuming and introduces problems concerning their correctness. Furthermore, as applications are continuously extended and modified in CI/CD pipelines, the DFDs need to be kept in sync, which is also challenging. In this paper, we present a novel, tool-supported technique to automatically extract DFDs from the implementation code of microservices. The technique parses source code and configuration files in search for keywords that are used as evidence for the model extraction. Our approach uses a novel technique that iteratively detects new keywords, thereby snowballing through an application's codebase. Coupled with other detection techniques, it produces a fully-fledged DFD enriched with security-relevant annotations. The extracted DFDs further provide full traceability between model items and code snippets. We evaluate our approach and the accompanying prototype for applications written in Java on a manually curated dataset of 17 open-source applications. In our testing set of applications, we observe an overall precision of 93% and recall of 85%. The dataset created for the evaluation is openly released to the research community, as additional contribution of this work.