Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/27043
Title: Text mining of stocktwits data for predicting stock prices
Authors: Jaggi, M; Mandal, P; Narang, S; Naseem, U; Khushi, M
Keywords: BERT;FinBERT;ALBERT;NLP;StockTwits;FinALBERT;FAANG;transformer;pre-training;fine-tuning
Issue Date: 17-Feb-2021
Publisher: MDPI
Citation: Jaggi, M. et al. (2021) 'Text mining of stocktwits data for predicting stock prices', Applied System Innovation, 2021, 4 (1), 22, pp. 1 - 22. doi: 10.3390/asi4010013.
Abstract: Copyright © 2021 by the authors. Stock price prediction can be made more efficient by considering the price fluctuations and understanding people’s sentiments. A limited number of models understand financial jargon or have labelled datasets concerning stock price change. To overcome this challenge, we introduced FinALBERT, an ALBERT based model trained to handle financial domain text classification tasks by labelling Stocktwits text data based on stock price change. We collected Stocktwits data for over ten years for 25 different companies, including the major five FAANG (Facebook, Amazon, Apple, Netflix, Google). These datasets were labelled with three labelling techniques based on stock price changes. Our proposed model FinALBERT is fine-tuned with these labels to achieve optimal results. We experimented with the labelled dataset by training it on traditional machine learning, BERT, and FinBERT models, which helped us understand how these labels behaved with different model architectures. Our labelling method’s competitive advantage is that it can help analyse the historical data effectively, and the mathematical function can be easily customised to predict stock movement.
Description: Data Availability Statement: The code and data are available from https://mkhushi.github.io/.
URI: https://bura.brunel.ac.uk/handle/2438/27043
DOI: https://doi.org/10.3390/asi4010013
Other Identifiers: ORCID iDs: Mukul Jaggi https://orcid.org/0000-0003-3324-0812; Priyanka Mandal https://orcid.org/0000-0003-3246-3440; Shreya Narang https://orcid.org/0000-0001-6905-8188; Usman Naseem https://orcid.org/0000-0003-0191-7171; Matloob Khushi https://orcid.org/0000-0001-7792-2327.
Appears in Collections: Dept of Computer Science Research Papers

Files in This Item:
File: FullText.pdf
Description: Copyright © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Size: 11.98 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.