Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/31752
Title: MDGraphEmb: A Toolkit for Graph Embedding and Classification of Protein Conformational Ensembles
Authors: Hossein Nezhad, F
Oues, N
Meli, M
Pandini, A
Keywords: protein conformation;graph representation learning;graph embedding;machine learning
Issue Date: 31-Jul-2025
Publisher: Oxford University Press
Citation: Hossein Nezhad, F. et al. (2025) 'MDGraphEmb: A Toolkit for Graph Embedding and Classification of Protein Conformational Ensembles', Bioinformatics, 0 (ahead of print), btaf420, pp. 1 - 10. doi: 10.1093/bioinformatics/btaf420.
Abstract: Motivation: Molecular Dynamics (MD) simulations are essential for investigating protein dynamics and function. Although significant advances have been made in integrating simulation techniques and machine learning, there are still challenges in selecting the most suitable data representation for learning. Graph embedding is a powerful computational method that automatically learns low-dimensional representations of nodes in a graph while preserving graph topology and node properties, thereby bridging graph structures and machine learning methods. Graph embeddings hold great potential for efficiently representing MD simulation data and studying protein dynamics.
Results: We present MDGraphEmb, a Python library built on MDAnalysis, specifically designed to convert protein MD simulation trajectories into graph-based representations and corresponding graph embeddings. This transformation enables the compression of high-dimensional, noisy trajectories from protein simulations into tabular formats suitable for machine learning. MDGraphEmb provides a framework that supports a range of graph embedding techniques and machine learning models, enabling the creation of workflows to analyse protein dynamics and identify important protein conformations. Graph embedding effectively captures and compresses structural information from protein MD simulation data, making it applicable to diverse downstream machine-learning classification tasks. We present an application for encoding and detecting important protein conformations from molecular dynamics simulations to classify functional states, using adenylate kinase (ADK) as the main case study. To assess the generalisability of the approach, two additional systems, Plantaricin E (PlnE) and HIV-1 protease, are included as supplementary validation examples. A performance comparison of different graph embedding methods combined with machine learning models is also provided.
Availability: MDGraphEMB GitHub Repository: https://github.com/FerdoosHN/MDGraphEMB
Description: Accepted manuscripts are PDF versions of the author’s final manuscript, as accepted for publication by the journal but prior to copyediting or typesetting. They can be cited using the author(s), article title, journal title, year of online publication, and DOI. They will be replaced by the final typeset articles, which may therefore contain changes. The DOI will remain the same throughout.
Data availability: Relevant data underpinning this publication can be accessed from Brunel University London’s data repository under a CC BY licence: https://doi.org/10.17633/rd.brunel.c.7664645
Supplementary data is available online at: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaf420/8220315#supplementary-data
URI: https://bura.brunel.ac.uk/handle/2438/31752
DOI: https://doi.org/10.1093/bioinformatics/btaf420
ISSN: 1367-4803
Other Identifiers: ORCiD: Ferdoos Hossein Nezhad https://orcid.org/0009-0007-9892-7662
ORCiD: Namir Oues https://orcid.org/0009-0003-2001-1065
ORCiD: Massimiliano Meli https://orcid.org/0000-0003-3304-6104
ORCiD: Alessandro Pandini https://orcid.org/0000-0002-4158-233X
Article number: btaf420
Appears in Collections: Dept of Computer Science Research Papers

Files in This Item:
File: FullText.pdf
Description: Copyright © The Author(s) 2025. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Size: 3.04 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.