Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/29710
Full metadata record
DC Field | Value | Language
dc.contributor.author | Marshan, A | -
dc.contributor.author | Almutairi, AN | -
dc.contributor.author | Ioannou, A | -
dc.contributor.author | Bell, D | -
dc.contributor.author | Monaghan, A | -
dc.contributor.author | Arzoky, M | -
dc.date.accessioned | 2024-09-11T16:19:53Z | -
dc.date.available | 2024-09-11T16:19:53Z | -
dc.date.issued | 2024-06-26 | -
dc.identifier | ORCiD: Alaa Marshan https://orcid.org/0000-0001-6764-9160 | -
dc.identifier | ORCiD: David Bell https://orcid.org/0000-0003-3148-6691 | -
dc.identifier | ORCiD: Mahir Arzoky https://orcid.org/0000-0002-2721-643X | -
dc.identifier | 1371680 | -
dc.identifier.citation | Marshan, A. et al. (2024) 'MedT5SQL: a transformers-based large language model for text-to-SQL conversion in the healthcare domain', Frontiers in Big Data, 7, 1371680, pp. 1 - 20. doi: 10.3389/fdata.2024.1371680. | en_US
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/29710 | -
dc.description | Data availability statement: Publicly available datasets were analyzed in this study. This data can be found at: https://github.com/wangpinggl/TREQS/tree/master/mimicsql_data/mimicsql_natural_v2; https://huggingface.co/datasets/wikisql . | en_US
dc.description | Supplementary material: The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fdata.2024.1371680/full#supplementary-material . | -
dc.description.abstract | Introduction: In response to the increasing prevalence of electronic medical records (EMRs) stored in databases, healthcare staff are encountering difficulties retrieving these records due to their limited technical expertise in database operations. As these records are crucial for delivering appropriate medical care, there is a need for an accessible method for healthcare staff to access EMRs. Methods: To address this, natural language processing (NLP) for Text-to-SQL has emerged as a solution, enabling non-technical users to generate SQL queries using natural language text. This research assesses existing work on Text-to-SQL conversion and proposes the MedT5SQL model specifically designed for EMR retrieval. The proposed model utilizes the Text-to-Text Transfer Transformer (T5) model, a Large Language Model (LLM) commonly used in various text-based NLP tasks. The model is fine-tuned on the MIMICSQL dataset, the first Text-to-SQL dataset for the healthcare domain. Performance evaluation involves benchmarking the MedT5SQL model on two optimizers, varying numbers of training epochs, and using two datasets, MIMICSQL and WikiSQL. Results: For MIMICSQL dataset, the model demonstrates considerable effectiveness in generating question-SQL pairs achieving accuracy of 80.63%, 98.937%, and 90% for exact match accuracy matrix, approximate string-matching, and manual evaluation, respectively. When testing the performance of the model on WikiSQL dataset, the model demonstrates efficiency in generating SQL queries, with an accuracy of 44.2% on WikiSQL and 94.26% for approximate string-matching. Discussion: Results indicate improved performance with increased training epochs. This work highlights the potential of fine-tuned T5 model to convert medical-related questions written in natural language to Structured Query Language (SQL) in healthcare domain, providing a foundation for future research in this area. | en_US
dc.description.sponsorship | The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article. | en_US
dc.format.extent | 1 - 20 | -
dc.language | English | -
dc.language.iso | en_US | en_US
dc.publisher | Frontiers Media | en_US
dc.rights | Copyright © 2024 Marshan, Almutairi, Ioannou, Bell, Monaghan and Arzoky. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. | -
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | -
dc.subject | text-to-SQL conversion | en_US
dc.subject | large language model | en_US
dc.subject | transformers | en_US
dc.subject | T5 model | en_US
dc.subject | NLP | en_US
dc.subject | MIMICSQL dataset | en_US
dc.subject | healthcare domain | en_US
dc.title | MedT5SQL: a transformers-based large language model for text-to-SQL conversion in the healthcare domain | en_US
dc.type | Article | en_US
dc.date.dateAccepted | 2024-06-10 | -
dc.identifier.doi | https://doi.org/10.3389/fdata.2024.1371680 | -
dc.relation.isPartOf | Frontiers in Big Data | -
pubs.publication-status | Published | -
pubs.volume | 7 | -
dc.identifier.eissn | 2624-909X | -
dc.rights.license | https://creativecommons.org/licenses/by/4.0/legalcode.en | -
dc.rights.holder | Marshan, Almutairi, Ioannou, Bell, Monaghan and Arzoky | -
Appears in Collections: Dept of Computer Science Research Papers

Files in This Item:
File | Description | Size | Format
FullText.pdf | Copyright © 2024 Marshan, Almutairi, Ioannou, Bell, Monaghan and Arzoky. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. | 1.18 MB | Adobe PDF

This item is licensed under a Creative Commons License.