COVIDSenti: A Large-Scale Benchmark Twitter Data Set for COVID-19 Sentiment Analysis

Naseem, U; Razzak, I; Khushi, M; Eklund, PW; Kim, J

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/33458

Title:	COVIDSenti: A Large-Scale Benchmark Twitter Data Set for COVID-19 Sentiment Analysis
Authors:	Naseem, U; Razzak, I; Khushi, M; Eklund, PW; Kim, J
Keywords:	COVID-19;epidemic;misinformation;opinion mining;pandemic;sentiment analysis;text mining;Twitter
Issue Date:	29-Jan-2021
Publisher:	Institute of Electrical and Electronics Engineers (IEEE) on behalf of the Computer Society, and the Systems, Man, and Cybernetics Society
Citation:	Naseem, U. et al. (2021) 'COVIDSenti: A Large-Scale Benchmark Twitter Data Set for COVID-19 Sentiment Analysis', IEEE Transactions on Computational Social Systems, 8 (4), pp. 1003–1015. doi: 10.1109/tcss.2021.3051189.
Abstract:	Social media (and the world at large) have been awash with news of the COVID-19 pandemic. With the passage of time, news and awareness about COVID-19 spread like the pandemic itself, with an explosion of messages, updates, videos, and posts. Mass hysteria manifested as another concern in addition to the health risk that COVID-19 presented. Predictably, public panic soon followed, mostly due to misconceptions, a lack of information, or sometimes outright misinformation about COVID-19 and its impacts. It is thus timely and important to conduct an ex post facto assessment of the early information flows during the pandemic on social media, as well as a case study of evolving public opinion on social media which is of general interest. This study aims to inform policy that can be applied to social media platforms; for example, determining what degree of moderation is necessary to curtail misinformation on social media. This study also analyzes views concerning COVID-19 by focusing on people who interact and share social media on Twitter. As a platform for our experiments, we present a new large-scale sentiment data set, COVIDSENTI, which consists of 90 000 COVID-19-related tweets collected in the early stages of the pandemic, from February to March 2020. The tweets have been labeled into positive, negative, and neutral sentiment classes. We analyzed the collected tweets for sentiment classification using different sets of features and classifiers. Negative opinion played an important role in conditioning public sentiment; for instance, we observed that people favored lockdown earlier in the pandemic; however, as expected, sentiment shifted by mid-March. Our study supports the view that there is a need to develop a proactive and agile public health presence to combat the spread of negative sentiment on social media following a pandemic.
URI:	https://bura.brunel.ac.uk/handle/2438/33458
DOI:	https://doi.org/10.1109/tcss.2021.3051189
Other Identifiers:	ORCiD: Usman Naseem https://orcid.org/0000-0003-0191-7171; ORCiD: Imran Razzak https://orcid.org/0000-0002-3930-6600; ORCiD: Matloob Khushi https://orcid.org/0000-0001-7792-2327; ORCiD: Peter W. Eklund https://orcid.org/0000-0003-2313-8603; ORCiD: Jinman Kim https://orcid.org/0000-0001-5960-1060
Appears in Collections:	Department of Computer Science Research Papers

Files in This Item:

File	Description	Size	Format
FullText.pdf	Copyright © 2020 The Author(s). This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).	2.35 MB	Adobe PDF

This item is licensed under a Creative Commons License