Please use this identifier to cite or link to this item:
http://bura.brunel.ac.uk/handle/2438/31752
Title: | MDGraphEmb: A Toolkit for Graph Embedding and Classification of Protein Conformational Ensembles |
Authors: | Hossein Nezhad, F; Oues, N; Meli, M; Pandini, A |
Keywords: | protein conformation;graph representation learning;graph embedding;machine learning |
Issue Date: | 31-Jul-2025 |
Publisher: | Oxford University Press |
Citation: | Hossein Nezhad, et al. (2025) 'MDGraphEmb: A Toolkit for Graph Embedding and Classification of Protein Conformational Ensembles', Bioinformatics, 0 (ahead of print), btaf420, pp. 1 - 10. doi: 10.1093/bioinformatics/btaf420. |
Abstract: | Motivation: Molecular Dynamics (MD) simulations are essential for investigating protein dynamics and function. Although significant advances have been made in integrating simulation techniques and machine learning, there are still challenges in selecting the most suitable data representation for learning. Graph embedding is a powerful computational method that automatically learns low-dimensional representations of nodes in a graph while preserving graph topology and node properties, thereby bridging graph structures and machine learning methods. Graph embeddings hold great potential for efficiently representing MD simulation data and studying protein dynamics. Results: We present MDGraphEmb, a Python library built on MDAnalysis, specifically designed to convert protein MD simulation trajectories into graph-based representations and corresponding graph embeddings. This transformation enables the compression of high-dimensional, noisy trajectories from protein simulations into tabular formats suitable for machine learning. MDGraphEmb provides a framework that supports a range of graph embedding techniques and machine learning models, enabling the creation of workflows to analyse protein dynamics and identify important protein conformations. Graph embedding effectively captures and compresses structural information from protein MD simulation data, making it applicable to diverse downstream machine-learning classification tasks. We present an application for encoding and detecting important protein conformations from molecular dynamics simulations to classify functional states, using adenylate kinase (ADK) as the main case study. To assess the generalisability of the approach, two additional systems, Plantaricin E (PlnE) and HIV-1 protease, are included as supplementary validation examples. A performance comparison of different graph embedding methods combined with machine learning models is also provided.
Availability: MDGraphEMB GitHub Repository: https://github.com/FerdoosHN/MDGraphEMB . |
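Illustrative note: the abstract describes turning each trajectory frame into a graph, embedding its nodes in a low-dimensional space, and flattening the result into one table row per frame. The sketch below shows that idea with NumPy only. It does not use MDGraphEmb or MDAnalysis and does not reproduce their APIs: the function names (`contact_graph`, `spectral_embedding`, `frame_features`), the 8.0 distance cutoff, the random coordinates, and the choice of a Laplacian spectral embedding are all assumptions made for illustration.

```python
import numpy as np

def contact_graph(coords, cutoff=8.0):
    """Adjacency matrix linking residues closer than `cutoff` (hypothetical value)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist < cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj

def spectral_embedding(adj, dim=2):
    """Embed nodes with Laplacian eigenvectors, skipping the first one.

    Assumes a connected graph; eigenvector signs are arbitrary, so a real
    workflow would need to align them across frames.
    """
    lap = np.diag(adj.sum(axis=1)) - adj  # unnormalised graph Laplacian
    _, vecs = np.linalg.eigh(lap)         # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]

def frame_features(coords, cutoff=8.0, dim=2):
    """Flatten the node embedding of one frame into a single feature row."""
    return spectral_embedding(contact_graph(coords, cutoff), dim).ravel()

# Synthetic stand-in for a trajectory: 4 frames, 10 residues, xyz coordinates.
rng = np.random.default_rng(0)
trajectory = rng.normal(scale=5.0, size=(4, 10, 3))

# One row per frame, ready for a tabular classifier.
table = np.stack([frame_features(frame) for frame in trajectory])
print(table.shape)  # (4, 20)
```

Each row of `table` has 10 residues x 2 embedding dimensions = 20 columns, which is the kind of fixed-width tabular input a standard classifier expects.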
Description: | Accepted manuscripts are PDF versions of the author’s final manuscript, as accepted for publication by the journal but prior to copyediting or typesetting. They can be cited using the author(s), article title, journal title, year of online publication, and DOI. They will be replaced by the final typeset articles, which may therefore contain changes. The DOI will remain the same throughout. Data availability: Relevant data underpinning this publication can be accessed from Brunel University London’s data repository under CC BY licence: https://doi.org/10.17633/rd.brunel.c.7664645 . Supplementary data is available online at: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaf420/8220315#supplementary-data . |
URI: | https://bura.brunel.ac.uk/handle/2438/31752 |
DOI: | https://doi.org/10.1093/bioinformatics/btaf420 |
ISSN: | 1367-4803 |
Other Identifiers: | ORCiD: Ferdoos Hossein Nezhad https://orcid.org/0009-0007-9892-7662; ORCiD: Namir Oues https://orcid.org/0009-0003-2001-1065; ORCiD: Massimiliano Meli https://orcid.org/0000-0003-3304-6104; ORCiD: Alessandro Pandini https://orcid.org/0000-0002-4158-233X; Article number: btaf420 |
Appears in Collections: | Dept of Computer Science Research Papers |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
FullText.pdf | Copyright © The Author(s) 2025. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. | 3.04 MB | Adobe PDF | View/Open |
This item is licensed under a Creative Commons License