Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/25229
Title: Statistical modelling and machine learning for the epidemiology of diabetes in Saudi Arabia
Authors: Almutairi, Entissar
Advisors: Abbod, M
Date, P
Keywords: Mathematical modelling;Risk factors
Issue Date: 2022
Publisher: Brunel University London
Abstract: Mathematical modelling and machine learning algorithms have been successfully applied in healthcare and in the epidemiology of chronic diseases, including diabetes mellitus, which is classified as an epidemic because of its high prevalence around the world. Machine learning and statistical techniques are useful for describing, predicting, and evaluating various diseases, including diabetes, and can be efficient tools for modelling diabetes and its most closely related risk factors. Although machine learning methods have been used in different aspects of diabetes research, most studies have focused on diagnosing or detecting the disease, and little research has explored the use of machine learning methods to study trends in the prevalence of diabetes and forecast its future level in specific populations. This thesis therefore applies various machine learning and combination methods to study diabetes and make future predictions. It investigates the application of machine learning and statistical techniques to develop prediction models for diabetes and the relevant risk factors (smoking, obesity, and physical inactivity) in the Kingdom of Saudi Arabia that can be used to support health policy planning and diabetes control. Regression, classification, and time series modelling approaches were used for diabetes modelling. Several models were developed, namely Multiple Linear Regression, Adaptive Neuro-Fuzzy Inference System (ANFIS), Artificial Neural Network (ANN), Support Vector Regression, Bayesian Linear Regression, Support Vector Machine, K-Nearest Neighbour (KNN), Linear Discriminant, Neural Network Pattern Recognition, and Neural Network Time Series (NARX-NN) models.
These models integrate historical data on the prevalence of diabetes, smoking, obesity, and inactivity to examine trends in the prevalence of diabetes mellitus in the Kingdom of Saudi Arabia and to predict the future level of the disease. Regression models were combined to improve prediction accuracy using combination methods (Average, Weighted Average, Majority Voting, Weighted Majority, Minimum, Maximum, and a new consensus combination method). Several statistical metrics were applied to evaluate the performance of the regression and time series models: mean squared error, root mean squared error, mean absolute percentage error, and the coefficient of determination (R-squared). The classification models were evaluated by their accuracy. Results from the regression and combined models were validated by comparison with observed data from existing studies by the World Health Organization, the International Diabetes Federation, and the Family Health Survey of the Saudi General Authority for Statistics, revealing that the ANFIS model and the combined weighted average model achieved improved accuracy in comparison with previous studies. The experimental results demonstrate the effectiveness of the regression and combined models compared with the classification and time series models. The ANFIS and WAVR models were found to be suitable for diabetes prediction because of their flexibility and high accuracy.
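The evaluation metrics and the weighted-average combination named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the sample numbers, and the inverse-MSE weighting scheme are all assumptions made for illustration, since the abstract does not state how the weights were chosen.

```python
import math


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def weighted_average(predictions, weights):
    """Combine several models' predictions point by point with normalised weights."""
    total = sum(weights)
    n_points = len(predictions[0])
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_points)
    ]


def inverse_mse_weights(actual, predictions, eps=1e-12):
    """Assumed weighting scheme: a model with lower MSE gets a larger weight."""
    return [1.0 / (mse(actual, p) + eps) for p in predictions]


if __name__ == "__main__":
    # Hypothetical prevalence values (percent); not data from the thesis.
    observed = [10.0, 12.0, 14.0, 16.0]
    model_a = [11.0, 12.0, 13.0, 17.0]
    model_b = [9.0, 12.0, 15.0, 15.0]

    weights = inverse_mse_weights(observed, [model_a, model_b])
    combined = weighted_average([model_a, model_b], weights)
    for name, pred in [("A", model_a), ("B", model_b), ("combined", combined)]:
        print(name, mse(observed, pred), rmse(observed, pred),
              mape(observed, pred), r_squared(observed, pred))
```

Setting all weights equal reduces the weighted average to the simple Average method. In practice the weights would be fitted on data held out from the final evaluation, so that the combined model's reported accuracy is not inflated.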
Description: This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London
URI: http://bura.brunel.ac.uk/handle/2438/25229
Appears in Collections: Electronic and Computer Engineering
Dept of Electronic and Electrical Engineering Theses

Files in This Item:
FullText.pdf (2.54 MB, Adobe PDF)

