Impact of identifier normalization on vulnerability detection techniques
Publication Type
Conference Paper
Date Issued
2025-04
Language
English
Author(s)
Start Page
69
End Page
76
Citation
IEEE International Conference on Software Analysis, Evolution and Reengineering - Companion, SANER-C 2025
Contribution to Conference
Publisher DOI
Publisher
IEEE
ISBN of container
979-8-3315-3749-4
This study examines the impact of identifier normalization on software vulnerability detection using three approaches: static application security testing (SAST), specialized machine learning (ML) models, and large language models (LLMs). Using the BigVul dataset of vulnerabilities in C/C++ projects, the research evaluates the performance of these methods on normalized inputs (generalized variable and function names) and on the original code. SAST tools such as Flawfinder and Cppcheck exhibit limited effectiveness (F1 scores ∼ 0.1) and are unaffected by normalization. Specialized ML models, such as LineVul, achieve high F1 scores on non-normalized data (F1 ∼ 0.9) but suffer significant performance drops when tested on normalized inputs, highlighting their lack of generalizability. In contrast, LLMs such as Llama3, although underperforming in their pre-trained state, show substantial improvement after fine-tuning, achieving robust and consistent results on both normalized and non-normalized datasets. The findings suggest that while SAST tools are less effective, fine-tuned LLMs hold strong potential for scalable and generalizable vulnerability detection. The study recommends further exploration of hybrid approaches that combine ML models, LLMs, and traditional tools to enhance accuracy and adaptability across diverse scenarios.
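
The exact normalization procedure is not specified in this record; the following minimal Python sketch illustrates one plausible token-level scheme for C code, in which user-defined names are mapped to generic VARn / FUNn placeholders. The keyword list, placeholder naming, and regex-based tokenization are illustrative assumptions, not the authors' pipeline; a production setup would use a real C parser.

import re

# Subset of C keywords to leave untouched (illustrative, not exhaustive).
C_KEYWORDS = {
    "if", "else", "for", "while", "return", "int", "char", "void",
    "unsigned", "long", "short", "float", "double", "struct", "sizeof",
    "static", "const", "break", "continue", "switch", "case", "default",
}

def normalize_identifiers(code: str) -> str:
    """Replace user-defined identifiers with generic VARn / FUNn tokens.

    An identifier directly followed by '(' is treated as a function
    name; any other identifier becomes a variable. Keywords pass
    through unchanged, and repeated names reuse the same placeholder.
    """
    var_map: dict[str, str] = {}
    fun_map: dict[str, str] = {}

    def repl(match: re.Match) -> str:
        name, paren = match.group(1), match.group(2)
        if name in C_KEYWORDS:
            return match.group(0)
        if paren:  # function call or definition
            return fun_map.setdefault(name, f"FUN{len(fun_map) + 1}") + paren
        return var_map.setdefault(name, f"VAR{len(var_map) + 1}")

    return re.sub(r"\b([A-Za-z_]\w*)\b(\s*\()?", repl, code)

print(normalize_identifiers("int add(int a, int b) { return a + b; }"))
# -> int FUN1(int VAR1, int VAR2) { return VAR1 + VAR2; }

A transformation of this kind preserves program structure while erasing naming cues, which is precisely what exposes models that rely on memorized identifier patterns rather than code semantics.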
Subjects
Vulnerability Detection
Data Set Normalization
LLM
Large Language Models
Machine Learning
Static Application Security Testing
DDC Class
005.8: Computer Security