
The ground truth effect: investigating SZZ variants in Just-in-Time vulnerability prediction

Publication Type
Conference Paper
Date Issued
2025
Language
English
Author(s)
Cannavale, Alfonso  
Iannone, Emanuele 
Software Security E-22  
Di Lillo, Gianluca
Palomba, Fabio  
De Lucia, Andrea  
TORE-URI
https://hdl.handle.net/11420/58017
First published in
Lecture notes in computer science  
Number in series
16083 LNCS
Start Page
317
End Page
326
Citation
51st Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2025
Contribution to Conference
51st Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2025  
Publisher DOI
10.1007/978-3-032-04207-1_21
Scopus ID
2-s2.0-105016545057
Publisher
Springer
ISBN of container
978-3-032-04207-1
978-3-032-04206-4
978-3-032-04208-8
Abstract
Just-in-Time (JIT) vulnerability prediction is critical for proactively securing software, yet its effectiveness heavily relies on the quality of the ground truth used for training models. This ground truth is commonly established using variants of the SZZ algorithm to identify vulnerability-contributing commits (VCCs). However, the impact of choosing a specific SZZ variant on model performance remains largely unexplored. In this study, we systematically investigate the effect of eight SZZ variants on JIT vulnerability prediction across seven open-source Java projects. Our findings reveal that the choice of the SZZ variant is a non-trivial factor. Models trained with datasets labeled by variants like B-SZZ, V-SZZ, and VCC-SZZ achieve strong and stable predictive performance, with median MCC scores often exceeding 0.50. In contrast, variants such as L-SZZ and R-SZZ produce models that perform no better than random chance, with median MCC scores close to 0.0. This performance gap demonstrates that an inappropriate SZZ variant can invalidate prediction models, underscoring the necessity of a principled approach to defining ground truth.
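For readers unfamiliar with the metric, the MCC (Matthews correlation coefficient) values quoted in the abstract can be read against the following minimal sketch. It computes MCC from confusion-matrix counts, treating a vulnerability-contributing commit as the positive class. The function name `mcc` and all counts are hypothetical illustrations, not data or code from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0.0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical test set: 1000 commits, 70 of them labeled as VCCs.
# A model whose predictions track the labels reasonably well:
informative = mcc(tp=40, tn=900, fp=30, fn=30)

# A model flagging 100 commits independently of the labels
# (expected counts under random guessing):
chance_like = mcc(tp=7, tn=837, fp=93, fn=63)

print(f"informative model MCC: {informative:.2f}")
print(f"chance-like model MCC: {chance_like:.2f}")
```

MCC ranges from -1 to 1, and a value of 0 means the predictions are uncorrelated with the labels. This is why median scores close to 0.0 are described as no better than random chance, even on imbalanced data where plain accuracy could still look high.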
DDC Class
005.8: Computer Security
Funding(s)
Cybersecurity for AI-Augmented Systems  
TUHH