|Title:||Data sets and data quality in software engineering: Eight years on|
|Keywords:||Empirical software engineering;Data quality;Mapping study|
|Citation:||ACM International Conference Proceeding Series, 2016|
|Abstract:||Context: We revisit our review of data quality within the context of empirical software engineering eight years on from our PROMISE 2008 article. Objective: To assess the extent and types of techniques used to manage quality within data sets. We consider this a particularly interesting question in the context of initiatives to promote sharing and secondary analysis of data sets. Method: We update the 2008 mapping study through four subsequently published reviews and a snowballing exercise. Results: The original study located only 23 articles explicitly considering data quality. This picture has changed substantially: our updated review now finds 283 articles. However, we estimate that this still represents perhaps 1% of the total empirical software engineering literature. Conclusions: It appears the community is now taking the issue of data quality more seriously and there is more work exploring techniques to automatically detect (and sometimes repair) noise problems. However, there is still little systematic work to evaluate the various data sets that are widely used for secondary analysis; addressing this would be of considerable benefit. It should also be a priority to work collaboratively with practitioners to add new, higher-quality data to the existing corpora.|
|Appears in Collections:||Dept of Computer Science Research Papers|