Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/33520
Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Lazzaro, I | - |
| dc.contributor.author | Milano, M | - |
| dc.contributor.author | Tucker, A | - |
| dc.contributor.author | Cannataro, M | - |
| dc.date.accessioned | 2026-06-26T12:33:28Z | - |
| dc.date.available | 2026-06-26T12:33:28Z | - |
| dc.date.issued | 2026-06-15 | - |
| dc.identifier | ORCiD: Ilaria Lazzaro https://orcid.org/0009-0007-1612-2538 | - |
| dc.identifier | ORCiD: Allan Tucker https://orcid.org/0000-0001-5105-3506 | - |
| dc.identifier.citation | Lazzaro, I. et al. (2026) 'Assessing the impact of synthetic data generated by Bayesian networks on heart disease prediction', Journal of Computational Science, 99, 102940, pp. 1–10. doi: 10.1016/j.jocs.2026.102940. | en-US |
| dc.identifier.issn | 1877-7503 | - |
| dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/33520 | - |
| dc.description | Data availability: All datasets used in this work are freely available in the UCI repository. | en-US |
| dc.description.abstract | Synthetic data generation using Bayesian networks (BN) offers a promising approach to overcoming data scarcity in clinical prediction tasks, yet its actual impact on model performance remains underexplored. This study investigates the use of Bayesian network-based generative models to produce synthetic patient data and examines how the quality of the original real data influences the effectiveness of such augmentation. Three benchmark datasets from the UCI Heart Disease repository (Cleveland, Hungary, and Switzerland) were employed, all sharing an identical structure comprising 13 clinical predictors. The Cleveland dataset, which is the most complete and consistent among the three, was used exclusively as the training source for learning the Bayesian network structure and parameters under clinically informed constraints. To ensure robust evaluation, the dataset was partitioned into two independent subsets: 153 patients were used to train the Bayesian network, while 150 held-out patients were used exclusively to generate synthetic records. Predictive models were trained under three configurations: real data only, synthetic data only, and a hybrid real + synthetic (filtered) dataset, and evaluated using 10-fold cross-validation and external validation on independent cohorts. Results indicate that integrating real and synthetic data significantly improved accuracy and precision, particularly for the Switzerland cohort (F(2,27) = 23.06, η² = 0.63), whereas improvements were smaller and partially non-significant in the noisier Hungarian dataset. These findings demonstrate that the effectiveness of synthetic augmentation depends on the structure and completeness of the source data, underscoring the importance of data quality for reliable generative modelling in clinical prediction. | en-US |
| dc.description.sponsorship | This work has been partially supported by the OFIDIAPlus (Operational Fire Danger preventIon plAtform Plus) project under the INTERREG GREECE-ITALY 2021–2027 PROGRAMME. | en-US |
| dc.format.extent | pp. 1–10 | - |
| dc.format.medium | Print-Electronic | - |
| dc.language | English | en-US |
| dc.language.iso | eng | en-US |
| dc.publisher | Elsevier | en-US |
| dc.rights | Creative Commons Attribution 4.0 International | - |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | - |
| dc.subject | Bayesian networks | en-US |
| dc.subject | synthetic data generation | en-US |
| dc.subject | heart disease prediction | en-US |
| dc.subject | data quality | en-US |
| dc.title | Assessing the impact of synthetic data generated by Bayesian networks on heart disease prediction | en-US |
| dc.type | Article | en-US |
| dc.date.dateAccepted | 2026-06-10 | - |
| dc.identifier.doi | https://doi.org/10.1016/j.jocs.2026.102940 | - |
| dc.relation.isPartOf | Journal of Computational Science | en-US |
| pubs.publication-status | Published | - |
| pubs.volume | 99 | - |
| dc.identifier.eissn | 1877-7511 | - |
| dc.rights.license | https://creativecommons.org/licenses/by/4.0/legalcode.en | - |
| dcterms.dateAccepted | 2026-06-10 | - |
| dc.rights.holder | The Authors | - |
| dc.contributor.orcid | Lazzaro, Ilaria [0009-0007-1612-2538] | - |
| dc.contributor.orcid | Tucker, Allan [0000-0001-5105-3506] | - |
| dc.identifier.number | 102940 | - |
Appears in Collections: Department of Computer Science Research Papers

Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| FullText.pdf | Copyright © 2026 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/). | 2.31 MB | Adobe PDF |


This item is licensed under a Creative Commons License.