Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/31098
Title: Evaluation of Machine Learning and Traditional Statistical Models to Assess the Value of Stroke Genetic Liability for Prediction of Risk of Stroke Within the UK Biobank
Authors: MacCarthy, G
Pazoki, R
Keywords: receiver operating characteristic (ROC);area under the curve (AUC);Brier score (BS);integrated calibration index (ICI)
Issue Date: 26-Apr-2025
Publisher: MDPI
Citation: MacCarthy, G. and Pazoki, R. (2025) 'Evaluation of Machine Learning and Traditional Statistical Models to Assess the Value of Stroke Genetic Liability for Prediction of Risk of Stroke Within the UK Biobank', Healthcare, 13 (9), 1003, pp. 1-19. doi: 10.3390/healthcare13091003.
Abstract: Background and Objective: Stroke is one of the leading causes of mortality and long-term disability in adults over 18 years of age worldwide, and its increasing incidence has become a global public health concern. Accurate stroke prediction is highly valuable for early intervention and treatment. Few studies have evaluated the predictive value of genetic liability for the risk of stroke. Materials and Methods: Our study involved 243,339 participants of European ancestry from the UK Biobank. We derived stroke genetic liability using data from the MEGASTROKE genome-wide association studies (GWASs). We built four predictive models in the training set, each with and without stroke genetic liability, namely a Cox proportional hazards (Coxph) model, a gradient boosting model (GBM), a decision tree (DT), and a random forest (RF), to estimate time-to-event risk of stroke. We then assessed their performance in the testing set. Results: Each one-standard-deviation increase in genetic liability was associated with a 7% higher risk of incident stroke (HR = 1.07, 95% CI = 1.02, 1.12, p-value = 0.0030). The risk of stroke was 14% greater in the higher genetic liability group (HR = 1.14, 95% CI = 1.02, 1.27, p-value = 0.02) than in the low genetic liability group. The Coxph model including genetic liability was the best-performing model for stroke prediction, achieving an AUC of 69.54 (95% CI = 67.40, 71.68). Compared with the Cox model without genetic liability, it yielded an NRI of 0.202 (95% CI = 0.12, 0.28; p-value < 0.001) and an IDI of 1.0 × 10⁻⁴ (95% CI = 0.000, 3.0 × 10⁻⁴; p-value = 0.13). Conclusions: Incorporating genetic liability slightly improved stroke prediction beyond conventional risk factors.
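Note: The abstract compares models with and without genetic liability using the net reclassification improvement (NRI) and the integrated discrimination improvement (IDI). As a rough illustration of what these two metrics measure, the sketch below computes a category-free NRI and the IDI from two sets of predicted risks. It is a simplified binary-outcome version that ignores censoring, not the authors' time-to-event analysis; the function names and the toy numbers are hypothetical and are not taken from the study.

```python
import numpy as np

def continuous_nri(y, p_old, p_new):
    """Category-free NRI for a binary outcome (simplification: no censoring)."""
    y = np.asarray(y, dtype=bool)
    diff = np.asarray(p_new, dtype=float) - np.asarray(p_old, dtype=float)
    up, down = diff > 0, diff < 0
    # Events should move to higher predicted risk, non-events to lower.
    nri_events = up[y].mean() - down[y].mean()
    nri_nonevents = down[~y].mean() - up[~y].mean()
    return nri_events + nri_nonevents

def idi(y, p_old, p_new):
    """IDI: change in the gap between mean risk in events and non-events."""
    y = np.asarray(y, dtype=bool)
    p_old = np.asarray(p_old, dtype=float)
    p_new = np.asarray(p_new, dtype=float)
    slope_old = p_old[y].mean() - p_old[~y].mean()
    slope_new = p_new[y].mean() - p_new[~y].mean()
    return slope_new - slope_old

# Hypothetical toy data: 1 = event, 0 = no event.
y = [1, 1, 0, 0, 0]
p_old = [0.10, 0.20, 0.10, 0.05, 0.08]  # risks from a model without the new predictor
p_new = [0.15, 0.25, 0.08, 0.05, 0.09]  # risks from a model with the new predictor

nri_value = continuous_nri(y, p_old, p_new)
idi_value = idi(y, p_old, p_new)
print(f"NRI = {nri_value:.3f}, IDI = {idi_value:.4f}")
```

A positive NRI means the added predictor tends to move predicted risks in the right direction; a small IDI, as reported in the abstract, means the average size of that shift is modest.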
Description: Data Availability Statement: The data used in this study is available on request from the UK Biobank.
Acknowledgments: This research was conducted using the UK Biobank under Application Number 60549 (www.ukbiobank.ac.uk (accessed on 5 February 2021)). The UK Biobank is generously supported by its founding funders, the Wellcome Trust and the UK Medical Research Council, as well as by the British Heart Foundation, Cancer Research UK, the Department of Health, the Northwest Regional Development Agency, and the Scottish Government. The MEGASTROKE project received funding from sources specified at https://megastroke.org/acknowledgements.html (accessed on 13 September 2022).
Supplementary Materials are available online at: https://www.mdpi.com/2227-9032/13/9/1003#app1-healthcare-13-01003.
URI: https://bura.brunel.ac.uk/handle/2438/31098
DOI: https://doi.org/10.3390/healthcare13091003
Other Identifiers: ORCiD: Raha Pazoki https://orcid.org/0000-0002-5142-2348
Article number 1003
Appears in Collections: Dept of Life Sciences Research Papers

Files in This Item:
File: FullText.pdf
Description: Copyright © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Size: 1.38 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.