Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/3220

Title: Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation
Authors: Song, Q
Shepperd, MJ
Chen, X
Liu, J
Keywords: Machine learning
Imputation
Rule induction
Classifier
Software project
Cost estimation
Publication Date: 2008
Publisher: Elsevier
Citation: Journal of Systems and Software. 81 (12): 2361-2370
Abstract: Missing data is a widespread problem that can affect the ability to use data to construct effective prediction systems. We investigate a common machine learning technique that can tolerate missing values, namely C4.5, to predict cost using six real-world software project databases. We analyze the predictive performance after using the k-NN missing data imputation technique to see whether it is better to tolerate missing data or to impute missing values first and then apply the C4.5 algorithm. For the investigation, we simulated three missingness mechanisms, three missing data patterns, and five missing data percentages. We found that k-NN imputation can improve the prediction accuracy of C4.5. We also found that both C4.5 and k-NN are little affected by the missingness mechanism, but that the missing data pattern and the missing data percentage have a strong negative impact upon prediction (or imputation) accuracy, particularly if the missing data percentage exceeds 40%.
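The abstract names k-NN imputation and simulated missing-data percentages without describing either. The Python sketch below is a rough illustration only. It deletes a chosen fraction of cells completely at random (MCAR), then fills each gap with the mean of that attribute over the k nearest other rows. Distance is Euclidean, computed only over attributes both rows observe. The function names (inject_mcar, distance, knn_impute), the distance measure and the averaging rule are assumptions for illustration, not the procedure reported in the paper. The C4.5 training step is omitted.

```python
import math
import random
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value (assumed encoding)


def inject_mcar(data: List[Row], fraction: float, seed: int = 0) -> List[Row]:
    """Hypothetical helper: blank out a fraction of all cells completely at random."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(data) for j in range(len(row))]
    out = [list(row) for row in data]
    for i, j in rng.sample(cells, round(fraction * len(cells))):
        out[i][j] = None
    return out


def distance(a: Row, b: Row) -> float:
    """Euclidean distance over attributes observed in both rows (an assumption)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common attributes: treat as infinitely far
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute(data: List[Row], k: int = 3) -> List[Row]:
    """Fill each missing cell with the mean of that attribute over the k
    nearest other rows in which the attribute is observed."""
    out = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other row that observes attribute j.
            donors = [
                (distance(row, other), other[j])
                for idx, other in enumerate(data)
                if idx != i and other[j] is not None
            ]
            if not donors:
                continue  # no row observes this attribute; leave the gap
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            out[i][j] = sum(nearest) / len(nearest)
    return out


if __name__ == "__main__":
    complete = [[1.0, 2.0, 10.0], [1.1, 2.1, 11.0], [5.0, 6.0, 50.0], [1.05, 2.05, 10.5]]
    masked = inject_mcar(complete, fraction=0.25, seed=1)
    print(masked)
    print(knn_impute(masked, k=2))
```

Distances are always computed on the masked input rather than on partially imputed rows. As a result, an imputed value never influences a later imputation. A gap stays as None only when no other row observes that attribute.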
URI: http://bura.brunel.ac.uk/handle/2438/3220
http://www.sciencedirect.com/science/article/pii/S0164121208000988
DOI: http://dx.doi.org/10.1016/j.jss.2008.05.008
ISSN: 0164-1212
Appears in Collections: B-SERC Research Papers
Computer Science
Dept of Computer Science Research Papers

Files in This Item:

File: Can k-NN Imputation Improve the Performance of C4.5 With Small Software Project Data Sets A Comparative A Comparative.pdf
Size: 1.75 MB
Format: Adobe PDF

Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.