Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/29263
Title: CHUNAV: Analyzing Hindi Hate Speech and Targeted Groups in Indian Election Discourse
Authors: Jafri, FA
Rauniyar, K
Thapa, S
Siddiqui, MA
Khushi, M
Naseem, U
Keywords: hate speech;natural language processing;Indian election;topic modeling;ensemble methods
Issue Date: 16-May-2024
Publisher: Association for Computing Machinery (ACM)
Citation: Jafri, F.A. et al. (2024) 'CHUNAV: Analyzing Hindi Hate Speech and Targeted Groups in Indian Election Discourse', ACM Transactions on Asian and Low-Resource Language Information Processing, 0 (ahead of print), pp. 1 - 32. doi: 10.1145/3665245.
Abstract: In the ever-evolving landscape of online discourse and political dialogue, the rise of hate speech poses a significant challenge to maintaining a respectful and inclusive digital environment. The context becomes particularly complex when considering the Hindi language, a low-resource language with limited available data. To address this pressing concern, we introduce the CHUNAV dataset, a collection of 11,457 Hindi tweets gathered during assembly elections in various states. CHUNAV is purpose-built for hate speech categorization and the identification of target groups. The dataset is a valuable resource for exploring hate speech within the distinctive socio-political context of Indian elections. The tweets within CHUNAV have been meticulously categorized into “Hate” and “Non-Hate” labels, and further subdivided to pinpoint the specific targets of hate speech, including “Individual”, “Organization”, and “Community” labels (as shown in Figure 1). Furthermore, this paper presents multiple benchmark models for hate speech detection, along with an innovative ensemble and oversampling-based method. The paper also delves into the results of topic modeling, all aimed at effectively addressing hate speech and target identification in the Hindi language. This contribution seeks to advance the field of hate speech analysis and foster a safer and more inclusive online space within the distinctive realm of Indian Assembly Elections.
URI: https://bura.brunel.ac.uk/handle/2438/29263
DOI: https://doi.org/10.1145/3665245
ISSN: 2375-4699
Other Identifiers: ORCiD: Farhan Ahmad Jafri https://orcid.org/0000-0003-2494-2548
ORCiD: Kritesh Rauniyar https://orcid.org/0000-0001-6806-6688
ORCiD: Surendrabikram Thapa https://orcid.org/0000-0003-4119-8239
ORCiD: Mohammad Aman Siddiqui https://orcid.org/0000-0003-2191-9721
ORCiD: Matloob Khushi https://orcid.org/0000-0001-7792-2327
ORCiD: Usman Naseem https://orcid.org/0000-0003-0191-7171
Appears in Collections: Dept of Computer Science Research Papers

Files in This Item:
File | Description | Size | Format
FullText.pdf | © 2024 Copyright held by the owner/author(s). Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). The definitive Version of Record was published in ACM Transactions on Asian and Low-Resource Language Information Processing, https://doi.org/10.1145/3665245 (see: https://www.acm.org/publications/policies/copyright-policy). | 2.38 MB | Adobe PDF


Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.