CHUNAV: Analyzing Hindi Hate Speech and Targeted Groups in Indian Election Discourse

Jafri, FA; Rauniyar, K; Thapa, S; Siddiqui, MA; Khushi, M; Naseem, U

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/29263

Full metadata record

DC Field	Value	Language
dc.contributor.author	Jafri, FA	-
dc.contributor.author	Rauniyar, K	-
dc.contributor.author	Thapa, S	-
dc.contributor.author	Siddiqui, MA	-
dc.contributor.author	Khushi, M	-
dc.contributor.author	Naseem, U	-
dc.date.accessioned	2024-06-23T21:54:32Z	-
dc.date.available	2024-06-23T21:54:32Z	-
dc.date.issued	2024-05-16	-
dc.identifier	ORCiD: Farhan Ahmad Jafri https://orcid.org/0000-0003-2494-2548	-
dc.identifier	ORCiD: Kritesh Rauniyar https://orcid.org/0000-0001-6806-6688	-
dc.identifier	ORCiD: Surendrabikram Thapa https://orcid.org/0000-0003-4119-8239	-
dc.identifier	ORCiD: Mohammad Aman Siddiqui https://orcid.org/0000-0003-2191-9721	-
dc.identifier	ORCiD: Matloob Khushi https://orcid.org/0000-0001-7792-2327	-
dc.identifier	ORCiD: Usman Naseem https://orcid.org/0000-0003-0191-7171	-
dc.identifier.citation	Jafri, F.A. et al. (2024) 'CHUNAV: Analyzing Hindi Hate Speech and Targeted Groups in Indian Election Discourse', ACM Transactions on Asian and Low-Resource Language Information Processing, 0 (ahead of print), pp. 1 - 32. doi: 10.1145/3665245.	en_US
dc.identifier.issn	2375-4699	-
dc.identifier.uri	https://bura.brunel.ac.uk/handle/2438/29263	-
dc.description.abstract	In the ever-evolving landscape of online discourse and political dialogue, the rise of hate speech poses a significant challenge to maintaining a respectful and inclusive digital environment. The context becomes particularly complex when considering the Hindi language, a low-resource language with limited available data. To address this pressing concern, we introduce the CHUNAV dataset, a collection of 11,457 Hindi tweets gathered during assembly elections in various states. CHUNAV is purpose-built for hate speech categorization and the identification of target groups. The dataset is a valuable resource for exploring hate speech within the distinctive socio-political context of Indian elections. The tweets within CHUNAV have been meticulously categorized into “Hate” and “Non-Hate” labels, and further subdivided to pinpoint the specific targets of hate speech, including “Individual”, “Organization”, and “Community” labels (as shown in Figure 1). Furthermore, this paper presents multiple benchmark models for hate speech detection, along with an innovative ensemble and oversampling-based method. The paper also delves into the results of topic modeling, all aimed at effectively addressing hate speech and target identification in the Hindi language. This contribution seeks to advance the field of hate speech analysis and foster a safer and more inclusive online space within the distinctive realm of Indian Assembly Elections.	en_US
dc.description.sponsorship	MK is supported by UKRI NERC grant NE/X000192/12.	en_US
dc.format.extent	1 - 32	-
dc.format.medium	Print-Electronic	-
dc.language	English	-
dc.language.iso	en_US	en_US
dc.publisher	Association for Computing Machinery (ACM)	en_US
dc.rights	© 2024 Copyright held by the owner/author(s). Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). The definitive Version of Record was published in ACM Transactions on Asian and Low-Resource Language Information Processing, https://doi.org/10.1145/3665245 (see: https://www.acm.org/publications/policies/copyright-policy).	-
dc.rights.uri	https://www.acm.org/publications/policies/copyright-policy	-
dc.subject	hate speech	en_US
dc.subject	natural language processing	en_US
dc.subject	Indian election	en_US
dc.subject	topic modeling	en_US
dc.subject	ensemble methods	en_US
dc.title	CHUNAV: Analyzing Hindi Hate Speech and Targeted Groups in Indian Election Discourse	en_US
dc.type	Article	en_US
dc.date.dateAccepted	2024-05-09	-
dc.identifier.doi	https://doi.org/10.1145/3665245	-
dc.relation.isPartOf	ACM Transactions on Asian and Low-Resource Language Information Processing	-
pubs.issue	ahead of print	-
pubs.publication-status	Published online	-
pubs.volume	0	-
dc.identifier.eissn	2375-4702	-
dc.rights.holder	The owner/author(s)	-
Appears in Collections:	Department of Computer Science Research Papers

Files in This Item:

File	Description	Size	Format
FullText.pdf	© 2024 Copyright held by the owner/author(s). Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). The definitive Version of Record was published in ACM Transactions on Asian and Low-Resource Language Information Processing, https://doi.org/10.1145/3665245 (see: https://www.acm.org/publications/policies/copyright-policy).	2.38 MB	Adobe PDF
