TUHH Open Research
Impact of identifier normalization on vulnerability detection techniques

Publication Type
Conference Paper
Date Issued
2025-04
Language
English
Author(s)
Hinrichs, Torge  
Software Security E-22  
Diercks, Tim
Software Security E-22  
Scandariato, Riccardo  
Software Security E-22  
TORE-URI
https://hdl.handle.net/11420/60362
Start Page
69
End Page
76
Citation
IEEE International Conference on Software Analysis, Evolution and Reengineering - Companion, SANER-C 2025
Publisher DOI
10.1109/saner-c66551.2025.00017
Publisher
IEEE
ISBN of container
979-8-3315-3749-4
Abstract
This study examines the impact of identifier normalization on software vulnerability detection using three approaches: static application security testing (SAST), specialized machine learning (ML) models, and Large Language Models (LLMs). Using the BigVul dataset of vulnerabilities in C/C++ projects, the research evaluates the performance of these methods on normalized code (generalized variable and function names) and on the original code. SAST tools such as Flawfinder and CppCheck exhibit limited effectiveness (F1 scores ∼ 0.1) and are unaffected by normalization. Specialized ML models, such as LineVul, achieve high F1 scores on non-normalized data (F1 ∼ 0.9) but suffer significant performance drops when tested on normalized inputs, highlighting their lack of generalizability. In contrast, LLMs such as Llama3, although underperforming in their pre-trained state, show substantial improvement after fine-tuning, achieving robust and consistent results across both normalized and non-normalized datasets. The findings suggest that while SAST tools are less effective, fine-tuned LLMs hold strong potential for scalable and generalized vulnerability detection. The study recommends further exploration of hybrid approaches that combine ML models, LLMs, and traditional tools to enhance accuracy and adaptability in diverse scenarios.
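To illustrate what identifier normalization means in practice, the following is a minimal Python sketch that rewrites user-defined names in a C snippet to generic VARn / FUNCn tokens. The regex-based tokenizer, the abbreviated keyword list, and the VARn/FUNCn naming scheme are illustrative assumptions, not the paper's actual procedure; a production pipeline would use a real C/C++ parser such as clang or tree-sitter.

```python
import re

# Illustrative subset of tokens to keep: C/C++ keywords, basic types, and
# standard-library names. A real pipeline would use the full keyword list
# and a proper parser (e.g. clang or tree-sitter) instead of a regex.
PRESERVED = {
    "if", "else", "for", "while", "return", "break", "continue",
    "int", "char", "void", "unsigned", "long", "const", "static",
    "struct", "sizeof", "NULL", "strcpy", "memcpy", "malloc", "free",
}

def normalize_identifiers(code: str) -> str:
    """Rename user-defined identifiers to generic VARn / FUNCn tokens."""
    mapping = {}                       # original name -> generic name
    counters = {"VAR": 0, "FUNC": 0}

    def rename(match: re.Match) -> str:
        name, tail = match.group(1), match.group(2)
        if name in PRESERVED:
            return match.group(0)
        if name not in mapping:
            # An identifier directly followed by '(' is treated as a function.
            kind = "FUNC" if tail.endswith("(") else "VAR"
            counters[kind] += 1
            mapping[name] = f"{kind}{counters[kind]}"
        return mapping[name] + tail

    # Capture each identifier plus an optional trailing '(' to spot calls.
    return re.sub(r"\b([A-Za-z_]\w*)(\s*\(|)", rename, code)

if __name__ == "__main__":
    snippet = "void copy_input(char *user_buf) { char buf[8]; strcpy(buf, user_buf); }"
    print(normalize_identifiers(snippet))
    # -> void FUNC1(char *VAR1) { char VAR2[8]; strcpy(VAR2, VAR1); }
```

Running the sketch turns strcpy(buf, user_buf) into strcpy(VAR2, VAR1): the buffer-overflow pattern survives while naming cues such as user_buf disappear, which is the kind of surface signal the study removes when comparing the detectors.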
Subjects
Vulnerability Detection
Data Set Normalization
LLM
Large Language Models
Machine Learning
Static Application Security Testing
DDC Class
005.8: Computer Security
Funding(s)
Cybersecurity for AI-Augmented Systems  
TUHH