Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/33014

| Title: | An audit of machine learning experiments on software defect prediction |
| Authors: | Destefanis, G.; Yousefi, L.; Shepperd, M.; Tucker, A.; Swift, S.; Counsell, S.; Arzoky, M. |
| Keywords: | software defect;machine learning;audit;research review |
| Issue Date: | 20-Feb-2026 |
| Publisher: | Springer Nature |
| Citation: | Destefanis, G. et al. (2026) 'An audit of machine learning experiments on software defect prediction', Empirical Software Engineering, 31 (4), 83, pp. 1–36. doi: 10.1007/s10664-025-10797-w. |
| Abstract: | Background: Machine learning algorithms are increasingly being proposed to solve the problem of predicting defect-prone software components. In this literature, computational experiments are the primary means of evaluating and comparing learners, and the credibility of findings depends critically on their experimental design and reporting. Objective: This paper audits recent software defect prediction (SDP) experiments by assessing their experimental design, analysis and reporting practices against widely accepted norms from statistics, machine learning and empirical software engineering. Our aim is to characterise the current state of practice and evaluate the reproducibility of published findings. Method: We undertook an audit of relevant studies retrieved from the SCOPUS database (2019-2023), focusing on their experimental design and analysis choices, e.g., outcome variables such as the F-measure, the type of out-of-sample (OOS) validation regime (e.g., cross-validation), and the statistical analysis and inference mechanisms. In all, we evaluated nine different study issues. This was complemented by an assessment of reproducibility using the instrument proposed by González-Barahona and Robles. Results: Our search located approximately 1,585 experiments in SDP (2019-2023), a substantial body of work. From this, we randomly sampled 101 (≈ 6.4%) papers: 61 journal and 40 conference papers. Almost 50% are behind ‘paywalls’. We found considerable divergence in research practice. The number of datasets used ranged from 1 to 365, the number of learners or learner variants evaluated from 1 to 34, and the number of performance metrics from 1 to 9. Approximately 45% of papers made use of formal statistical inference. We detected a total of 427 issues distributed across the 101 papers (median = 4), with only one paper being entirely issue-free. In terms of reproducibility, experiments ranged from near perfect to lacking almost all required information. We also found two examples of tortured phrases and potential “paper mill” activity. Conclusions: Approaches to designing and reporting computational experiments varied greatly, and almost half the studies provided insufficient information, such that reproduction would be challenging. Overall, our audit suggests that, as a research community, we have considerable scope for improvement. Fortunately, many improvements should be neither difficult nor costly to achieve. |
| Description: | Data and Code Availability: The dataset used in this study and the code to analyse it are available at https://zenodo.org/records/17696089 (DOI: https://doi.org/10.5281/zenodo.13927601). |
| URI: | https://bura.brunel.ac.uk/handle/2438/33014 |
| DOI: | https://doi.org/10.1007/s10664-025-10797-w |
| ISSN: | 1382-3256 |
| Other Identifiers: | ORCiD: Giuseppe Destefanis https://orcid.org/0000-0003-3982-6355; Leila Yousefi https://orcid.org/0000-0003-1952-0674; Martin Shepperd https://orcid.org/0000-0003-1874-6145; Allan Tucker https://orcid.org/0000-0001-5105-3506; Stephen Swift https://orcid.org/0000-0001-8918-3365; Steve Counsell https://orcid.org/0000-0002-2939-8919; Mahir Arzoky https://orcid.org/0000-0002-2721-643X |
| Appears in Collections: | Department of Computer Science Research Papers |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| FullText.pdf | Copyright © The Author(s) 2025. Rights and permissions: Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. | 2.92 MB | Adobe PDF | View/Open |
This item is licensed under a Creative Commons License