Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/31319
Full metadata record
DC Field | Value | Language
dc.contributor.author | Sanderson, D | -
dc.contributor.author | Kalganova, T | -
dc.date.accessioned | 2025-05-25T16:44:34Z | -
dc.date.available | 2025-05-25T16:44:34Z | -
dc.date.issued | 2025-05-08 | -
dc.identifier | ORCiD: Dominic Sanderson https://orcid.org/0000-0002-1339-143X | -
dc.identifier | ORCiD: Tatiana Kalganova https://orcid.org/0000-0003-4859-7152 | -
dc.identifier | Article number: 98 | -
dc.identifier.citation | Sanderson D. and Kalganova, T. (2025) 'Identifying Suitability for Data Reduction in Imbalanced Time-Series Datasets', AI, 6 (5), 98, pp. 1 - 26. doi: 10.3390/ai6050098. | en_US
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/31319 | -
dc.description | Data Availability Statement: The open-source dataset used in this study may be found here: https://springernature.figshare.com/collections/A_High-Fidelity_Residential_Building_Occupancy_Detection_Dataset/5364449 (accessed on 16 April 2024). | en_US
dc.description.abstract | Occupancy detection for large buildings enables optimised control of indoor systems based on occupant presence, reducing the energy costs of heating and cooling. Through machine learning models, occupancy detection is achieved with an accuracy of over 95%. However, to achieve this, large amounts of data are collected with little consideration of which of the collected data are most useful to the task. This paper demonstrates methods to identify if data may be removed from the imbalanced time-series training datasets to optimise the training process and model performance. It also describes how the calculation of the class density of a dataset may be used to identify if a dataset is applicable for data reduction, and how dataset fusion may be used to combine occupancy datasets. The results show that over 50% of a training dataset may be removed from imbalanced datasets while maintaining performance, reducing training time and energy cost by over 40%. This indicates that a data-centric approach to developing artificial intelligence applications is as important as selecting the best model. | en_US
dc.description.sponsorship | This research was funded by InnovateUK, project number 10097909. | en_US
dc.format.extent | 1 - 26 | -
dc.format.medium | Electronic | -
dc.language | English | -
dc.language.iso | en_US | en_US
dc.publisher | MDPI | en_US
dc.relation.uri | https://springernature.figshare.com/collections/A_High-Fidelity_Residential_Building_Occupancy_Detection_Dataset/5364449 | -
dc.rights | Creative Commons Attribution 4.0 International | -
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | -
dc.subject | occupancy detection | en_US
dc.subject | data reduction | en_US
dc.subject | dynamic data application | en_US
dc.subject | time-series data | en_US
dc.subject | useful data | en_US
dc.subject | class balance | en_US
dc.subject | class density | en_US
dc.subject | dataset fusion | en_US
dc.subject | green AI | en_US
dc.title | Identifying Suitability for Data Reduction in Imbalanced Time-Series Datasets | en_US
dc.type | Article | en_US
dc.date.dateAccepted | 2025-04-18 | -
dc.identifier.doi | https://doi.org/10.3390/ai6050098 | -
dc.relation.isPartOf | AI | -
pubs.issue | 5 | -
pubs.publication-status | Published online | -
pubs.volume | 6 | -
dc.identifier.eissn | 2673-2688 | -
dc.rights.license | https://creativecommons.org/licenses/by/4.0/legalcode.en | -
dcterms.dateAccepted | 2025-04-18 | -
dc.rights.holder | The authors | -
Appears in Collections: Dept of Electronic and Electrical Engineering Research Papers

Files in This Item:
File | Description | Size | Format
FullText.pdf | Copyright © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). | 897.15 kB | Adobe PDF


This item is licensed under a Creative Commons License.