AIDA paper studies the use of static data from the code to detect security vulnerabilities with Machine Learning

The most recent paper of the AIDA project was presented at the 17th European Dependable Computing Conference (EDCC 2021). It evaluates the ability of four Machine Learning algorithms to predict vulnerable files in an Open Source C/C++ project (Mozilla). The paper is entitled “Machine Learning to Combine Static Analysis Alerts with Software Metrics to Detect Security Vulnerabilities: An Empirical Study”, and its authors are José D’Abruzzo Pereira, João R. Campos, and Marco Vieira, from the University of Coimbra.



The algorithms took two different types of information as input (features) to predict vulnerable files: Software Metrics (SMs) and alerts from Static Analysis Tools (SATs). Predictions were made both without considering the type of vulnerability and per category of vulnerability. These categories were defined in previous work and are based on best practices for secure programming.


The paper studies the hypothesis that alerts from multiple SATs can be combined with software metrics as features for Machine Learning algorithms to predict software vulnerabilities in large software projects.
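To make the idea of combining both kinds of features more concrete, here is a minimal Python sketch that merges per-file software metrics and per-file SAT alert counts into one feature row per file. It is only an illustration: the file names, metric names, tool names, and values are all hypothetical and are not taken from the paper, and the helper `build_feature_table` is an assumption about how such a table could be assembled.

```python
from typing import Dict, List, Tuple

# Hypothetical per-file software metrics (names and values are made up).
software_metrics: Dict[str, Dict[str, float]] = {
    "a.cpp": {"lines_of_code": 120, "cyclomatic_complexity": 14},
    "b.cpp": {"lines_of_code": 40, "cyclomatic_complexity": 3},
}

# Hypothetical alert counts per file from two placeholder analysis tools.
sat_alerts: Dict[str, Dict[str, int]] = {
    "a.cpp": {"tool_a": 5, "tool_b": 2},
    "b.cpp": {"tool_a": 0},  # tool_b reported nothing for this file
}


def build_feature_table(
    metrics: Dict[str, Dict[str, float]],
    alerts: Dict[str, Dict[str, int]],
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Return column names and one combined feature row per file."""
    metric_names = sorted({name for m in metrics.values() for name in m})
    tool_names = sorted({tool for a in alerts.values() for tool in a})
    columns = metric_names + [f"alerts_{tool}" for tool in tool_names]

    rows: Dict[str, List[float]] = {}
    for path in sorted(metrics):
        file_alerts = alerts.get(path, {})
        metric_part = [float(metrics[path].get(n, 0)) for n in metric_names]
        # A missing alert count is treated as zero alerts.
        alert_part = [float(file_alerts.get(t, 0)) for t in tool_names]
        rows[path] = metric_part + alert_part
    return columns, rows


columns, rows = build_feature_table(software_metrics, sat_alerts)
for path, row in rows.items():
    print(path, dict(zip(columns, row)))
```

Each row could then be paired with a label (vulnerable or not) and passed to any classifier. The sketch deliberately stops before that step, since the specific algorithms and settings are described in the paper itself.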


The results show that the algorithms either raise a high number of false alarms (reported vulnerabilities that are not actual vulnerabilities) or fail to identify all vulnerabilities. Some of the vulnerability fixes were therefore analyzed to understand why such results were obtained. Overall, the fixes involve adding more source code, which increases the values of the software metrics.
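The trade-off between false alarms and missed vulnerabilities is commonly expressed as precision and recall. The Python sketch below computes both from a list of actual and predicted labels. The labels are hypothetical and do not come from the paper, and the paper may report different or additional measures.

```python
from typing import List, Tuple


def precision_recall(actual: List[bool], predicted: List[bool]) -> Tuple[float, float]:
    """Compute precision and recall for 'vulnerable' (True) predictions."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_pos = sum(1 for a, p in pairs if not a and p)   # false alarms
    false_neg = sum(1 for a, p in pairs if a and not p)   # missed vulnerabilities

    flagged = true_pos + false_pos
    vulnerable = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / vulnerable if vulnerable else 0.0
    return precision, recall


# Hypothetical labels for five files (True means vulnerable).
actual = [True, True, False, False, False]
predicted = [True, False, True, True, False]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up example, two of the three flagged files are false alarms and one of the two vulnerable files is missed, so both measures are low at the same time.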


EDCC 2021 took place virtually on 13-16 September 2021. The European Dependable Computing Conference is a leading venue for presenting and discussing the latest research, industrial practice and innovations in dependable and secure computing, with a long tradition that started in 1994.