Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/31664
Full metadata record
DC Field | Value | Language
dc.contributor.author | Elmi, Z | -
dc.contributor.author | Elmi, S | -
dc.contributor.author | Danishvar, S | -
dc.date.accessioned | 2025-08-01T14:12:00Z | -
dc.date.available | 2025-08-01T14:12:00Z | -
dc.date.issued | 2025-07-29 | -
dc.identifier | ORCiD: Zahra Elmi https://orcid.org/0000-0003-1487-8570 | -
dc.identifier | ORCiD: Soheila Elmi https://orcid.org/0000-0003-1434-6494 | -
dc.identifier | ORCiD: Sebelan Danishvar https://orcid.org/0000-0002-8258-0437 | -
dc.identifier | Article number: 129194 | -
dc.identifier.citation | Elmi, Z., Elmi, S. and Danishvar, S. (2025) 'NRBO-AGP: A Novel Feature Selection Approach for Accurate Protein Solubility Prediction', Expert Systems with Applications, 0 (in press, pre-proof), 129194, pp. 1 - 38. doi: 10.1016/j.eswa.2025.129194. | en_US
dc.identifier.issn | 0957-4174 | -
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/31664 | -
dc.description | Data availability: Data will be made available on request. | en_US
dc.description.abstract | Protein solubility determines how well a protein dissolves in an aqueous solution, and it is a critical factor in the functional analysis of proteins and in biotechnological applications. Accurately estimating solubility can provide significant advantages in areas such as protein engineering and drug discovery. This study proposes a new feature selection method, Newton-Raphson-based Optimization and Adaptive Gradient Perturbation (NRBO-AGP), for predicting protein solubility. The method combines the accuracy and speed of the Newton-Raphson method with the capacity of population-based optimization techniques to balance exploration and exploitation. Using 3144 protein sequences from the eSOL database, descriptor features were obtained for each protein, resulting in a dataset with 3104 features. The performance of NRBO-AGP was compared with eight different metaheuristic algorithms and evaluated using five regression models: MLP, AdaBoost, Gradient Boosting Trees, Random Forest, and Support Vector Regressor (SVR). Mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²) metrics were used for performance evaluation. The results show that NRBO-AGP outperforms the other metaheuristic algorithms in all regression models. The best results were achieved with Gradient Boosting and Random Forest, reaching, respectively, MAE: 0.0001 ± 0.0000, RMSE: 0.0008 ± 0.0000, and R²: 0.9908 ± 0.0005, and MAE: 0.0002 ± 0.0000, RMSE: 0.0025 ± 0.0000, and R²: 0.9908 ± 0.0005. These findings show that NRBO-AGP is an effective feature selection tool for predicting protein solubility. Multiple statistical analyses based on Friedman and Nemenyi tests show that the NRBO-AGP method performs significantly better (p < 0.05) than the other metaheuristic algorithms in the MAE and RMSE metrics and also achieves the highest R² score. | en_US
dc.format.extent | 1 - 38 | -
dc.format.medium | Print-Electronic | -
dc.language | English | -
dc.language.iso | en_US | en_US
dc.publisher | Elsevier | en_US
dc.rights | Creative Commons Attribution 4.0 International | -
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | -
dc.subject | drug discovery | en_US
dc.subject | protein solubility prediction | en_US
dc.subject | metaheuristic approach | en_US
dc.subject | feature selection | en_US
dc.title | NRBO-AGP: A Novel Feature Selection Approach for Accurate Protein Solubility Prediction | en_US
dc.type | Article | en_US
dc.date.dateAccepted | 2025-07-29 | -
dc.identifier.doi | https://doi.org/10.1016/j.eswa.2025.129194 | -
dc.relation.isPartOf | Expert Systems with Applications | -
pubs.publication-status | Published | -
pubs.volume | 0 | -
dc.identifier.eissn | 1873-6793 | -
dc.rights.license | https://creativecommons.org/licenses/by/4.0/legalcode.en | -
dcterms.dateAccepted | 2025-07-29 | -
dc.rights.holder | Crown / The Authors | -
Appears in Collections: Dept of Electronic and Electrical Engineering Research Papers
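
The abstract in the record above reports results as MAE, RMSE, and the coefficient of determination. As a reading aid, the following is a minimal sketch of how these three metrics are conventionally computed. It uses the standard textbook definitions, not code from the article; the function name `regression_metrics` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equal-length sequences of numbers.

    Standard definitions, assumed here for illustration:
      MAE  = mean(|y - y_hat|)
      RMSE = sqrt(mean((y - y_hat)^2))
      R^2  = 1 - SS_res / SS_tot
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    r2 = 1.0 - ss_res / ss_tot

    return mae, rmse, r2


# Hypothetical values, made up for illustration (not data from the article).
y_true = [0.10, 0.40, 0.70, 1.00]
y_pred = [0.12, 0.38, 0.71, 0.97]
mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.4f} RMSE={rmse:.4f} R2={r2:.4f}")
```

Lower MAE and RMSE and an R² closer to 1 indicate a better fit, which is the sense in which the abstract compares feature selection methods across regressors.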

Files in This Item:
File | Description | Size | Format
FullText.pdf | Crown Copyright © 2025 Published by Elsevier Ltd. This is an open access article under a Creative Commons license (https://creativecommons.org/licenses/by/4.0/). | 2.23 MB | Adobe PDF


This item is licensed under a Creative Commons License.