The ground truth effect: investigating SZZ variants in Just-in-Time vulnerability prediction

Cannavale, Alfonso; Iannone, Emanuele; Di Lillo, Gianluca; Palomba, Fabio; De Lucia, Andrea

Publication Type

Conference Paper

Date Issued

2025

Language

English

Author(s)

Cannavale, Alfonso

Iannone, Emanuele

Software Security E-22

Di Lillo, Gianluca

Palomba, Fabio

De Lucia, Andrea

TORE-URI

https://hdl.handle.net/11420/58017

First published in

Lecture notes in computer science

Number in series

16083 LNCS

Start Page

317

End Page

326

Citation

51st Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2025

Contribution to Conference

51st Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2025

Publisher DOI

10.1007/978-3-032-04207-1_21

Scopus ID

2-s2.0-105016545057

Publisher

Springer

ISBN of container

978-3-032-04207-1

978-3-032-04206-4

978-3-032-04208-8

Just-in-Time (JIT) vulnerability prediction is critical for proactively securing software, yet its effectiveness heavily relies on the quality of the ground truth used for training models. This ground truth is commonly established using variants of the SZZ algorithm to identify vulnerability-contributing commits (VCCs). However, the impact of choosing a specific SZZ variant on model performance remains largely unexplored. In this study, we systematically investigate the effect of eight SZZ variants on JIT vulnerability prediction across seven open-source Java projects. Our findings reveal that the choice of the SZZ variant is a non-trivial factor. Models trained with datasets labeled by variants like B-SZZ, V-SZZ, and VCC-SZZ achieve strong and stable predictive performance, with median MCC scores often exceeding 0.50. In contrast, variants such as L-SZZ and R-SZZ produce models that perform no better than random chance, with median MCC scores close to 0.0. This performance gap demonstrates that an inappropriate SZZ variant can invalidate prediction models, underscoring the necessity of a principled approach to defining ground truth.
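
To make the reported MCC thresholds easier to interpret, the following is a minimal sketch of how the Matthews correlation coefficient is computed from a binary confusion matrix. The function name `mcc` and the confusion counts are hypothetical illustrations, not data or code from the paper; they only show that predictions statistically independent of the labels yield an MCC of 0, while a reasonably informative classifier can exceed 0.50.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero (a common convention,
    assumed here) to avoid division by zero.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for 1000 commits, 100 of which are vulnerability-contributing.
# Chance-level model: predictions are independent of the true labels.
chance_level = mcc(tp=10, fp=90, tn=810, fn=90)

# Informative model: most flagged commits are truly vulnerability-contributing.
informative = mcc(tp=60, fp=40, tn=860, fn=40)

print(f"chance-level MCC: {chance_level:.2f}")
print(f"informative MCC:  {informative:.2f}")
```

Because MCC uses all four cells of the confusion matrix, it stays meaningful on imbalanced data such as commit histories, where vulnerability-contributing commits are typically a small minority.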

DDC Class

005.8: Computer Security

Funding(s)

Cybersecurity for AI-Augmented Systems
