Please use this identifier to cite or link to this item:
http://bura.brunel.ac.uk/handle/2438/33520

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Lazzaro, I | - |
| dc.contributor.author | Milano, M | - |
| dc.contributor.author | Tucker, A | - |
| dc.contributor.author | Cannataro, M | - |
| dc.date.accessioned | 2026-06-26T12:33:28Z | - |
| dc.date.available | 2026-06-26T12:33:28Z | - |
| dc.date.issued | 2026-06-15 | - |
| dc.identifier | ORCiD: Ilaria Lazzaro https://orcid.org/0009-0007-1612-2538 | - |
| dc.identifier | ORCiD: Allan Tucker https://orcid.org/0000-0001-5105-3506 | - |
| dc.identifier.citation | Lazzaro, I. et al. (2026) 'Assessing the impact of synthetic data generated by Bayesian networks on heart disease prediction', Journal of Computational Science, 99, 102940, pp. 1–10. doi: 10.1016/j.jocs.2026.102940. | en-US |
| dc.identifier.issn | 1877-7503 | - |
| dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/33520 | - |
| dc.description | Data availability: All datasets used in this work are freely available in the UCI repository. | en-US |
| dc.description.abstract | Synthetic data generation using Bayesian networks (BN) offers a promising approach to overcoming data scarcity in clinical prediction tasks, yet its actual impact on model performance remains underexplored. This study investigates the use of Bayesian network-based generative models to produce synthetic patient data and examines how the quality of the original real data influences the effectiveness of such augmentation. Three benchmark datasets from the UCI Heart Disease repository (Cleveland, Hungary, and Switzerland) were employed, all sharing an identical structure comprising 13 clinical predictors. The Cleveland dataset, which is the most complete and consistent among the three, was used exclusively as the training source for learning the Bayesian network structure and parameters under clinically informed constraints. To ensure robust evaluation, the dataset was partitioned into two independent subsets: 153 patients were used to train the Bayesian network, while 150 held-out patients were used exclusively to generate synthetic records. Predictive models were trained under three configurations: real data only, synthetic data only, and a hybrid real + synthetic (filtered) dataset, and evaluated using 10-fold cross-validation and external validation on independent cohorts. Results indicate that integrating real and synthetic data significantly improved accuracy and precision, particularly for the Switzerland cohort (F(2,27)=23.06, η²=0.63), whereas improvements were smaller and partially non-significant in the noisier Hungarian dataset. These findings demonstrate that the effectiveness of synthetic augmentation depends on the structure and completeness of the source data, underscoring the importance of data quality for reliable generative modelling in clinical prediction. | en-US |
| dc.description.sponsorship | This work has been partially supported by the OFIDIAPlus (Operational Fire Danger preventIon plAtform Plus) project under the INTERREG GREECE-ITALY 2021–2027 PROGRAMME. | en-US |
| dc.format.extent | pp. 1–10 | - |
| dc.format.medium | Print-Electronic | - |
| dc.language | English | en-US |
| dc.language.iso | eng | en-US |
| dc.publisher | Elsevier | en-US |
| dc.rights | Creative Commons Attribution 4.0 International | - |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | - |
| dc.subject | Bayesian networks | en-US |
| dc.subject | synthetic data generation | en-US |
| dc.subject | heart disease prediction | en-US |
| dc.subject | data quality | en-US |
| dc.title | Assessing the impact of synthetic data generated by Bayesian networks on heart disease prediction | en-US |
| dc.type | Article | en-US |
| dc.date.dateAccepted | 2026-06-10 | - |
| dc.identifier.doi | https://doi.org/10.1016/j.jocs.2026.102940 | - |
| dc.relation.isPartOf | Journal of Computational Science | en-US |
| pubs.publication-status | Published | - |
| pubs.volume | 99 | - |
| dc.identifier.eissn | 1877-7511 | - |
| dc.rights.license | https://creativecommons.org/licenses/by/4.0/legalcode.en | - |
| dcterms.dateAccepted | 2026-06-10 | - |
| dc.rights.holder | The Authors | - |
| dc.contributor.orcid | Lazzaro, Ilaria [0009-0007-1612-2538] | - |
| dc.contributor.orcid | Tucker, Allan [0000-0001-5105-3506] | - |
| dc.identifier.number | 102940 | - |

Appears in Collections: Department of Computer Science Research Papers

Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| FullText.pdf | Copyright © 2026 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/). | 2.31 MB | Adobe PDF |
This item is licensed under a Creative Commons License