Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/32454
Full metadata record
DC Field | Value | Language
dc.contributor.author | Alattal, D | -
dc.contributor.author | Draghi, B | -
dc.contributor.author | Myles, P | -
dc.contributor.author | Branson, R | -
dc.contributor.author | Tucker, A | -
dc.date.accessioned | 2025-12-04T17:48:36Z | -
dc.date.available | 2025-12-04T17:48:36Z | -
dc.date.issued | 2026-02-26 | -
dc.identifier | ORCiD: Allan Tucker https://orcid.org/0000-0001-5105-3506 | -
dc.identifier.citation | Alattal, D. et al. (2025) 'Probabilistic vs Deep Generative Models: A Fairness Centred Evaluation of Synthetic Healthcare Tabular Data', International Journal of Computational Intelligence Systems, 19 (1), 135, pp. 1–24. doi: 10.1007/s44196-026-01173-7. | en-GB
dc.identifier.issn | 1875-6891 | -
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/32454 | -
dc.description | Data Availability: CPRD cardiovascular disease synthetic dataset used in this paper can be requested from CPRD (https://cprd.com/cprd-cardiovascular-disease-synthetic-dataset). The diabetes dataset is publicly available on Kaggle (https://www.kaggle.com/datasets/rabieelkharoua/diabetes-health-datasetanalysis). | en-GB
dc.description | Code availability: Not applicable. | en-GB
dc.description | A preprint version of the article is available on Research Square at https://doi.org/10.21203/rs.3.rs-7565139/v1 . It has not been certified by peer review. | en-GB
dc.description.abstract | Synthetic data offers a promising avenue for addressing privacy, scarcity, and fairness challenges in healthcare datasets. However, there is limited evaluation of how different generation methods balance fidelity, utility, and fairness, particularly for underrepresented subgroups. This study addresses this gap by comparing representative generative modelling techniques, both probabilistic and deep approaches, that are popular in the research literature. We empirically evaluate BayesBoost, CTGAN, TVAE, CopulaGAN, and DECAF on two healthcare datasets containing numerical, binary, and categorical features. Each model’s performance is assessed along three axes: data fidelity, machine learning utility, and fairness, using Accuracy Parity, Equalised Odds, and Predictive Rate Parity. Results show that BayesBoost consistently achieved superior fidelity, utility, and fairness preservation, particularly when paired with Random Forest classifiers, achieving around 60–63% higher downstream utility than GAN-based deep generative baselines (e.g., Random Forest accuracy up to 0.88 with BayesBoost versus 0.54 to 0.55 for GAN-based methods). Deep generative models, while effective in capturing complex structures, often degraded fairness, especially for underrepresented groups, with equalised odds deviating by over 100% from the ideal parity value of 1.0 in some settings. The Variational Autoencoder outperformed other deep generative models in fairness preservation, especially for equalised odds, although with some reduction in fidelity and utility. Overall, these findings suggest that synthetic data generation for healthcare must move beyond fidelity evaluations to explicitly assess fairness and subgroup impacts, with probabilistic models such as BayesBoost showing strong potential for ethical deployment, while deep generative models require further adaptation for fairness-sensitive applications. | en-GB
dc.description.sponsorship | This work was funded by the Regulators Pioneer Fund, Department for Science, Innovation and Technology. This work was also supported by the UK Regulatory Science and Innovation Networks – Implementation Phase: Human Health CERSIs programme through the project RADIANT: Regulatory Science Empowering Innovation in Transformative Digital Health and AI (Grant Ref: MCPC24031), funded by the Medical Research Council (MRC) and Innovate UK. | en-GB
dc.format.extent | 1–24 | -
dc.format.medium | Print-Electronic | -
dc.language | en-GB | en-GB
dc.language.iso | en | en-GB
dc.publisher | Springer | en-GB
dc.relation.uri | https://www.researchsquare.com/article/rs-7565139/v1 | -
dc.relation.uri | https://doi.org/10.21203/rs.3.rs-7565139/v1 | -
dc.rights | https://creativecommons.org/licenses/by/4.0/ | -
dc.rights | Creative Commons Attribution 4.0 International | -
dc.subject | synthetic data generation | en-GB
dc.subject | tabular data | en-GB
dc.subject | fairness in machine learning | en-GB
dc.subject | healthcare data | en-GB
dc.subject | generative models | en-GB
dc.subject | data fidelity | en-GB
dc.subject | bias mitigation | en-GB
dc.subject | Bayes boost | en-GB
dc.subject | GAN | en-GB
dc.subject | VAE | en-GB
dc.title | Probabilistic vs Deep Generative Models: A Fairness Centred Evaluation of Synthetic Healthcare Tabular Data | en-GB
dc.type | Article | en-GB
dc.date.dateAccepted | 2026-01-15 | -
dc.identifier.doi | https://doi.org/10.1007/s44196-026-01173-7 | -
dc.relation.isPartOf | International Journal of Computational Intelligence Systems | -
pubs.issue | 1 | -
pubs.publication-status | Published | -
pubs.volume | 19 | -
dc.rights.license | https://creativecommons.org/licenses/by/4.0/legalcode.en | -
dcterms.dateAccepted | 2026-01-15 | -
dc.rights.holder | The Author(s) | -
dc.contributor.orcid | Tucker, Allan [0000-0001-5105-3506] | -
dc.identifier.number | 135 | -
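
Illustrative note on the fairness measures named in the abstract: this record does not give the paper's exact formulas and lists code availability as not applicable, so the following is only a minimal sketch under stated assumptions. It assumes binary labels and treats each measure as a ratio between a protected group and a reference group, so that 1.0 means parity, in line with the abstract's "ideal parity value of 1.0". Equalised Odds is reported here as two ratios (true positive rate and false positive rate) rather than one number, which may differ from the authors' aggregation. All function names, field names, and toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GroupRates:
    accuracy: float
    tpr: float  # true positive rate (recall)
    fpr: float  # false positive rate
    ppv: float  # positive predictive value (precision)


def _safe_div(num, den):
    # Return NaN instead of raising when a group has no relevant cases.
    return num / den if den else float("nan")


def group_rates(y_true, y_pred):
    """Confusion-matrix rates for one subgroup (binary 0/1 labels assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return GroupRates(
        accuracy=_safe_div(tp + tn, len(pairs)),
        tpr=_safe_div(tp, tp + fn),
        fpr=_safe_div(fp, fp + tn),
        ppv=_safe_div(tp, tp + fp),
    )


def fairness_ratios(y_true, y_pred, group, protected, reference):
    """Protected-to-reference ratios; 1.0 indicates parity (assumed convention)."""
    def rates_for(label):
        idx = [i for i, g in enumerate(group) if g == label]
        return group_rates([y_true[i] for i in idx], [y_pred[i] for i in idx])

    p, r = rates_for(protected), rates_for(reference)
    return {
        "accuracy_parity": _safe_div(p.accuracy, r.accuracy),
        "equalised_odds_tpr": _safe_div(p.tpr, r.tpr),
        "equalised_odds_fpr": _safe_div(p.fpr, r.fpr),
        "predictive_rate_parity": _safe_div(p.ppv, r.ppv),
    }


# Toy data (made up for illustration, not taken from the paper).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 1]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]

ratios = fairness_ratios(y_true, y_pred, group, protected="A", reference="B")
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

Under this ratio convention, a value of 2.0 or more would correspond to the "over 100%" deviation from parity that the abstract mentions; a difference-based convention (ideal value 0) would need a different reading.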
Appears in Collections:Department of Computer Science Research Papers

Files in This Item:
File | Description | Size | Format
FullText.pdf | Copyright © The Author(s) 2026. Rights and permissions: Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. | 2.54 MB | Adobe PDF

