Multilingual sentiment analysis of Arabic, Bahraini dialects and English

Omran, Thuraya Mohamed Maki

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/27228

Full metadata record

DC Field	Value	Language
dc.contributor.author	Omran, Thuraya Mohamed Maki	-
dc.date.accessioned	2023-09-20T14:03:37Z	-
dc.date.available	2023-09-20T14:03:37Z	-
dc.date.issued	2022	-
dc.identifier.uri	http://bura.brunel.ac.uk/handle/2438/27228	-
dc.description	This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London	en_US
dc.description.abstract	Sentiment analysis is a crucial natural language processing (NLP) task that analyzes users’ emotions and opinions towards entities such as events, services, or products. Arabic NLP faces numerous challenges, including: (1) the scarcity of resources, especially for modern standard Arabic (MSA) and Arabic dialects, particularly the Bahraini dialects (BDs); (2) the lack of multilingual deep learning models; and (3) insufficient transfer learning studies on Arabic dialects in general and Bahraini dialects specifically. This research aims to create a balanced dataset of product reviews in Bahraini dialects by translating English Amazon product reviews into modern standard Arabic and then converting the translations into Bahraini dialects. A second aim is to provide a multilingual deep learning long short-term memory (LSTM) model to analyze the parallel dataset of English, modern standard Arabic, and Bahraini dialects, which differ in linguistic properties. Many experiments were conducted using a train-validate-test split and k-fold cross-validation to evaluate model performance with the accuracy, F1 score, and AUC metrics. On all datasets, the model's average accuracy ranged from 96.72% to 97.04%, its F1 score from 97.91% to 97.93%, and its AUC from 98.46% to 98.7% when an augmentation technique was used. The LSTM model was incorporated into a stacking ensemble learning process that includes other LSTM architectures as base learners and a decision tree (DT) as a meta-learner. This yielded promising results: mean accuracies of 99.52%, 99.25%, and 98.52% for the English, MSA, and BD datasets, respectively. Moreover, the LSTM model was used as a pre-trained model in a transfer learning process that exploits the knowledge gained from analyzing product reviews in Bahraini dialects to perform another sentiment analysis task on a small dataset of movie comments in the same dialects. The pre-trained model achieved 96.97% accuracy, a 96.65% F1 score, and 97.94% AUC.	en_US
dc.publisher	Brunel University London	en_US
dc.relation.uri	http://bura.brunel.ac.uk/handle/2438/27228/1/FulltextThesis.pdf	-
dc.subject	Natural language processing	en_US
dc.subject	Resource scarcity	en_US
dc.subject	Parallel dataset	en_US
dc.subject	Transfer learning	en_US
dc.subject	LSTM deep learning model	en_US
dc.title	Multilingual sentiment analysis of Arabic, Bahraini dialects and English	en_US
dc.type	Thesis	en_US
Appears in Collections:	Computer Science; Department of Computer Science Theses

Files in This Item:

File	Description	Size	Format
FulltextThesis.pdf		5.26 MB	Adobe PDF
