Please use this identifier to cite or link to this item:
http://bura.brunel.ac.uk/handle/2438/29710
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Marshan, A | - |
dc.contributor.author | Almutairi, AN | - |
dc.contributor.author | Ioannou, A | - |
dc.contributor.author | Bell, D | - |
dc.contributor.author | Monaghan, A | - |
dc.contributor.author | Arzoky, M | - |
dc.date.accessioned | 2024-09-11T16:19:53Z | - |
dc.date.available | 2024-09-11T16:19:53Z | - |
dc.date.issued | 2024-06-26 | - |
dc.identifier | ORCiD: Alaa Marshan https://orcid.org/0000-0001-6764-9160 | - |
dc.identifier | ORCiD: David Bell https://orcid.org/0000-0003-3148-6691 | - |
dc.identifier | ORCiD: Mahir Arzoky https://orcid.org/0000-0002-2721-643X | - |
dc.identifier | 1371680 | - |
dc.identifier.citation | Marshan, A. et al. (2024) 'MedT5SQL: a transformers-based large language model for text-to-SQL conversion in the healthcare domain', Frontiers in Big Data, 7, 1371680, pp. 1 - 20. doi: 10.3389/fdata.2024.1371680. | en_US |
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/29710 | - |
dc.description | Data availability statement: Publicly available datasets were analyzed in this study. This data can be found at: https://github.com/wangpinggl/TREQS/tree/master/mimicsql_data/mimicsql_natural_v2; https://huggingface.co/datasets/wikisql . | en_US |
dc.description | Supplementary material: The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fdata.2024.1371680/full#supplementary-material . | - |
dc.description.abstract | Introduction: In response to the increasing prevalence of electronic medical records (EMRs) stored in databases, healthcare staff are encountering difficulties retrieving these records due to their limited technical expertise in database operations. As these records are crucial for delivering appropriate medical care, there is a need for an accessible method for healthcare staff to access EMRs. Methods: To address this, natural language processing (NLP) for Text-to-SQL has emerged as a solution, enabling non-technical users to generate SQL queries using natural language text. This research assesses existing work on Text-to-SQL conversion and proposes the MedT5SQL model specifically designed for EMR retrieval. The proposed model utilizes the Text-to-Text Transfer Transformer (T5) model, a Large Language Model (LLM) commonly used in various text-based NLP tasks. The model is fine-tuned on the MIMICSQL dataset, the first Text-to-SQL dataset for the healthcare domain. Performance evaluation involves benchmarking the MedT5SQL model on two optimizers, varying numbers of training epochs, and using two datasets, MIMICSQL and WikiSQL. Results: For the MIMICSQL dataset, the model demonstrates considerable effectiveness in generating SQL queries from questions, achieving 80.63%, 98.937%, and 90% on the exact match accuracy metric, approximate string-matching, and manual evaluation, respectively. When tested on the WikiSQL dataset, the model demonstrates efficiency in generating SQL queries, with an accuracy of 44.2% and 94.26% for approximate string-matching. Discussion: Results indicate improved performance with increased training epochs. This work highlights the potential of a fine-tuned T5 model to convert medical-related questions written in natural language to Structured Query Language (SQL) in the healthcare domain, providing a foundation for future research in this area. | en_US |
dc.description.sponsorship | The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article. | en_US |
dc.format.extent | 1 - 20 | - |
dc.language | English | - |
dc.language.iso | en_US | en_US |
dc.publisher | Frontiers Media | en_US |
dc.rights | Copyright © 2024 Marshan, Almutairi, Ioannou, Bell, Monaghan and Arzoky. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. | - |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | - |
dc.subject | text-to-SQL conversion | en_US |
dc.subject | large language model | en_US |
dc.subject | transformers | en_US |
dc.subject | T5 model | en_US |
dc.subject | NLP | en_US |
dc.subject | MIMICSQL dataset | en_US |
dc.subject | healthcare domain | en_US |
dc.title | MedT5SQL: a transformers-based large language model for text-to-SQL conversion in the healthcare domain | en_US |
dc.type | Article | en_US |
dc.date.dateAccepted | 2024-06-10 | - |
dc.identifier.doi | https://doi.org/10.3389/fdata.2024.1371680 | - |
dc.relation.isPartOf | Frontiers in Big Data | - |
pubs.publication-status | Published | - |
pubs.volume | 7 | - |
dc.identifier.eissn | 2624-909X | - |
dc.rights.license | https://creativecommons.org/licenses/by/4.0/legalcode.en | - |
dc.rights.holder | Marshan, Almutairi, Ioannou, Bell, Monaghan and Arzoky | - |
Appears in Collections: | Dept of Computer Science Research Papers |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
FullText.pdf | Copyright and license terms as stated in dc.rights above (CC BY). | 1.18 MB | Adobe PDF |
This item is licensed under a Creative Commons License
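
Illustrative note: the abstract in this record reports two automatic scores, exact match accuracy and approximate string-matching, without saying how either is computed. The Python sketch below shows one plausible way to compute scores of this kind; it is not the authors' evaluation code. It assumes whitespace- and case-normalized string equality for exact match and uses the standard library's `difflib.SequenceMatcher` as a stand-in for approximate string-matching. The function names, queries, and table names are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(sql: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(sql.lower().split())


def exact_match(predicted: str, reference: str) -> bool:
    """True when the two queries are identical after normalization."""
    return normalize(predicted) == normalize(reference)


def similarity(predicted: str, reference: str) -> float:
    """Character-level similarity in [0, 1]; an assumed stand-in metric."""
    return SequenceMatcher(None, normalize(predicted), normalize(reference)).ratio()


def evaluate(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Average both scores over (predicted, reference) query pairs."""
    if not pairs:
        raise ValueError("at least one (predicted, reference) pair is required")
    n = len(pairs)
    return {
        "exact_match": sum(exact_match(p, r) for p, r in pairs) / n,
        "mean_similarity": sum(similarity(p, r) for p, r in pairs) / n,
    }


if __name__ == "__main__":
    # Hypothetical predictions and references, for illustration only.
    pairs = [
        ("SELECT  name FROM patients", "select name from patients"),
        ("SELECT age FROM patients", "SELECT name FROM patients"),
    ]
    print(evaluate(pairs))
```

A character-level similarity gives partial credit to a query that differs from the reference in a single column name, which is one reason an approximate score can sit well above an exact match score on the same predictions.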