Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/32219
Title: A machine learning approach to vulnerability detection combining software metrics and topic modelling: Evidence from smart contracts
Authors: Ibba, G; Neykova, R; Ortu, M; Tonelli, R; Counsell, S; Destefanis, G
Keywords: vulnerability detection;software metrics;topic modelling;machine learning;source code analysis;smart contracts
Issue Date: 17-Oct-2025
Publisher: Elsevier
Citation: Ibba, G. et al. (2025) 'A machine learning approach to vulnerability detection combining software metrics and topic modelling: Evidence from smart contracts', Machine Learning with Applications, 22, 100759, pp. 1 - 18. doi: 10.1016/j.mlwa.2025.100759.
Abstract: This paper introduces a methodology for software vulnerability detection that combines structural and semantic analysis through software metrics and topic modelling. We evaluate the approach using smart contracts as a case study, focusing on their structural properties and the presence of known security vulnerabilities. We identify the most relevant metrics for vulnerability detection, evaluate multiple machine learning classifiers for both binary and multi-label classification, and improve classification performance by integrating topic modelling techniques. Our analysis shows that metrics such as cyclomatic complexity, nesting depth, and function calls are strongly associated with vulnerability presence. Using these metrics, the Random Forest classifier achieved strong performance in binary classification (AUC: 0.982, accuracy: 0.977, F1-score: 0.808) and multi-label classification (AUC: 0.951, accuracy: 0.729, F1-score: 0.839). The addition of topic modelling using Non-Negative Matrix Factorisation further improved results, increasing the F1-score to 0.881. The evaluation is conducted on Ethereum smart contracts written in Solidity.
Description: Data availability: Data and scripts are shared within the replication package available online at: https://figshare.com/s/5d0129e78d0cf0c61274 .
URI: https://bura.brunel.ac.uk/handle/2438/32219
DOI: https://doi.org/10.1016/j.mlwa.2025.100759
Other Identifiers: ORCiD: Rumyana Neykova https://orcid.org/0000-0002-2755-7728; ORCiD: Steve Counsell https://orcid.org/0000-0002-2939-8919; ORCiD: Giuseppe Destefanis https://orcid.org/0000-0003-3982-6355; Article number: 100759
Appears in Collections: Dept of Computer Science Research Papers

Files in This Item:
File: FullText.pdf
Description: Copyright © 2025 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).
Size: 2.68 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.