Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/29411
Title: Using Machine Learning to Evaluate the Value of Genetic Liabilities in the Classification of Hypertension within the UK Biobank
Authors: MacCarthy, G; Pazoki, R
Keywords: receiver operating characteristic (ROC); area under the curve (AUC)
Issue Date: 17-May-2024
Publisher: MDPI
Citation: MacCarthy, G. and Pazoki, R. (2024) 'Using Machine Learning to Evaluate the Value of Genetic Liabilities in the Classification of Hypertension within the UK Biobank'. Journal of Clinical Medicine, 13 (10), pp. 1-20. doi: 10.3390/jcm13102955.
Abstract: Background and Objective: Hypertension increases the risk of cardiovascular diseases (CVD) such as stroke, heart attack, heart failure, and kidney disease, contributing to global disease burden and premature mortality. Previous studies have utilized statistical and machine learning techniques to develop hypertension prediction models. Only a few have included genetic liabilities and evaluated their predictive values. This study aimed to develop an effective hypertension classification model and investigate the potential influence of genetic liability for multiple risk factors linked to CVD on hypertension risk using the random forest and the neural network. Materials and Methods: The study involved 244,718 European participants, who were divided into training and testing sets. Genetic liabilities were constructed using genetic variants associated with CVD risk factors obtained from genome-wide association studies (GWAS). Various combinations of machine learning models before and after feature selection were tested to develop the best classification model. The models were evaluated using area under the curve (AUC), calibration, and net reclassification improvement in the testing set. Results: The models without genetic liabilities achieved AUCs of 0.70 and 0.72 using the random forest and the neural network methods, respectively. Adding genetic liabilities improved the AUC for the random forest but not for the neural network. The best classification model was achieved when feature selection and classification were performed using random forest (AUC = 0.71, Spiegelhalter z score = 0.10, p-value = 0.92, calibration slope = 0.99). This model included genetic liabilities for total cholesterol and low-density lipoprotein (LDL). Conclusions: The study highlighted that incorporating genetic liabilities for lipids in a machine learning model may provide incremental value for hypertension classification beyond baseline characteristics.
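Illustrative note: the abstract reports discrimination (AUC) and calibration (Spiegelhalter z score with its p-value). The sketch below shows how these quantities are commonly computed from observed outcomes and predicted probabilities. It is not the authors' code; the function names are hypothetical, and the formulas are the standard textbook definitions rather than anything taken from the article itself.

```python
import math

def auc(y_true, y_score):
    """AUC as the probability that a random case outranks a random control.

    Pairwise (Mann-Whitney) form with ties counted as 0.5. Quadratic in
    sample size, so suitable only for small illustrations.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def spiegelhalter_z(y_true, p_pred):
    """Spiegelhalter z statistic; values near 0 suggest good calibration.

    Undefined if every predicted probability is exactly 0.5 (zero variance).
    """
    num = sum((y - p) * (1 - 2 * p) for y, p in zip(y_true, p_pred))
    var = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in p_pred)
    return num / math.sqrt(var)

def two_sided_p(z):
    """Two-sided p-value for z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

Under these definitions, a z score of 0.10 corresponds to a two-sided p-value of about 0.92, which agrees with the pair of values quoted in the abstract.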
Description: Data Availability Statement: Data are contained within the article and supplementary materials.
Supplementary Materials: The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/jcm13102955/s1: Supplementary Data S1–S10 (list of genetic variants' summary statistics used to construct the genetic risk scores), Supplementary Tables S1–S4, and Supplementary Figures S1–S7.
URI: https://bura.brunel.ac.uk/handle/2438/29411
DOI: https://doi.org/10.3390/jcm13102955
Other Identifiers: Article No.: 2955
ORCiD: Gideon MacCarthy https://orcid.org/0009-0006-7139-1039
ORCiD: Raha Pazoki https://orcid.org/0000-0002-5142-2348
Appears in Collections: Dept of Life Sciences Research Papers

Files in This Item:
File: FullText.pdf
Description: Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Size: 2.43 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.