Please use this identifier to cite or link to this item:
http://bura.brunel.ac.uk/handle/2438/19135
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Shepperd, M | - |
dc.contributor.author | Guo, Y | - |
dc.contributor.author | Li, N | - |
dc.contributor.author | Arzoky, M | - |
dc.contributor.author | Capiluppi, A | - |
dc.contributor.author | Counsell, S | - |
dc.contributor.author | Destefanis, G | - |
dc.contributor.author | Swift, S | - |
dc.contributor.author | Tucker, A | - |
dc.contributor.author | Yousefi, L | - |
dc.date.accessioned | 2019-09-16T09:55:00Z | - |
dc.date.available | 2019-09-16T09:55:00Z | - |
dc.date.issued | 2019 | - |
dc.identifier.citation | arXiv:1909.04436v1 [cs.LG] | - |
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/19135 | - |
dc.identifier.uri | https://arxiv.org/abs/1909.04436v1 | - |
dc.description.abstract | Context: Conducting experiments is central to machine learning research, to benchmark, evaluate and compare learning algorithms. Consequently, it is important that we conduct reliable, trustworthy experiments. Objective: We investigate the incidence of errors in a sample of machine learning experiments in the domain of software defect prediction. Our focus is simple arithmetical and statistical errors. Method: We analyse 49 papers describing 2456 individual experimental results from a previously undertaken systematic review comparing supervised and unsupervised defect prediction classifiers. We extract the confusion matrices and test for relevant constraints, e.g., the marginal probabilities must sum to one. We also check for multiple statistical significance testing errors. Results: We find that a total of 22 out of 49 papers contain demonstrable errors. Of these, 7 were statistical and 16 related to confusion matrix inconsistency (one paper contained both classes of error). Conclusions: Whilst some errors may be of a relatively trivial nature, e.g., transcription errors, their presence does not engender confidence. We strongly urge researchers to follow open science principles so that errors can be more easily detected and corrected, thus, as a community, reducing this worryingly high error rate in our computational experiments. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Cornell University | en_US |
dc.subject | classifier | en_US |
dc.subject | computational experiment | en_US |
dc.subject | reliability | en_US |
dc.subject | error | en_US |
dc.title | The Prevalence of Errors in Machine Learning Experiments | en_US |
dc.type | Conference Paper | en_US |
pubs.notes | 20th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL), 14–16 November 2019 | - |
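The abstract describes extracting confusion matrices from published papers and testing them for arithmetical constraints. A minimal sketch of that kind of check (illustrative only, not the authors' actual tooling; the function name, parameters and tolerance are assumptions) might look like:

```python
# A minimal sketch (not the authors' actual tooling) of the kind of
# consistency check the abstract describes: given the four confusion
# matrix cells reported in a paper, the reported dataset size, and the
# reported recall and precision, flag simple arithmetical inconsistencies.

def check_result(tp, fp, fn, tn, n, reported_recall, reported_precision, tol=0.005):
    """Return a list of constraint violations; an empty list means consistent."""
    problems = []
    # The four cells must account for every instance in the dataset.
    if tp + fp + fn + tn != n:
        problems.append("confusion matrix cells do not sum to the dataset size")
    # Recall and precision recomputed from the cells must match the
    # reported values to within rounding tolerance.
    if tp + fn > 0 and abs(tp / (tp + fn) - reported_recall) > tol:
        problems.append("reported recall inconsistent with confusion matrix")
    if tp + fp > 0 and abs(tp / (tp + fp) - reported_precision) > tol:
        problems.append("reported precision inconsistent with confusion matrix")
    return problems
```

Checks of this form detect only internal inconsistency; a paper can pass them all and still be wrong, which is why the abstract's call for open science practices matters.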
Appears in Collections: | Dept of Computer Science Research Papers |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
FullText.pdf | | 238.02 kB | Adobe PDF | View/Open |
Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.