Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/29263
Full metadata record
DC Field | Value | Language
dc.contributor.author | Jafri, FA | -
dc.contributor.author | Rauniyar, K | -
dc.contributor.author | Thapa, S | -
dc.contributor.author | Siddiqui, MA | -
dc.contributor.author | Khushi, M | -
dc.contributor.author | Naseem, U | -
dc.date.accessioned | 2024-06-23T21:54:32Z | -
dc.date.available | 2024-06-23T21:54:32Z | -
dc.date.issued | 2024-05-16 | -
dc.identifier | ORCiD: Farhan Ahmad Jafri https://orcid.org/0000-0003-2494-2548 | -
dc.identifier | ORCiD: Kritesh Rauniyar https://orcid.org/0000-0001-6806-6688 | -
dc.identifier | ORCiD: Surendrabikram Thapa https://orcid.org/0000-0003-4119-8239 | -
dc.identifier | ORCiD: Mohammad Aman Siddiqui https://orcid.org/0000-0003-2191-9721 | -
dc.identifier | ORCiD: Matloob Khushi https://orcid.org/0000-0001-7792-2327 | -
dc.identifier | ORCiD: Usman Naseem https://orcid.org/0000-0003-0191-7171 | -
dc.identifier.citation | Jafri, F.A. et al. (2024) 'CHUNAV: Analyzing Hindi Hate Speech and Targeted Groups in Indian Election Discourse', ACM Transactions on Asian and Low-Resource Language Information Processing, 0 (ahead of print), pp. 1 - 32. doi: 10.1145/3665245. | en_US
dc.identifier.issn | 2375-4699 | -
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/29263 | -
dc.description.abstract | In the ever-evolving landscape of online discourse and political dialogue, the rise of hate speech poses a significant challenge to maintaining a respectful and inclusive digital environment. The context becomes particularly complex when considering the Hindi language, a low-resource language with limited available data. To address this pressing concern, we introduce the CHUNAV dataset, a collection of 11,457 Hindi tweets gathered during assembly elections in various states. CHUNAV is purpose-built for hate speech categorization and the identification of target groups. The dataset is a valuable resource for exploring hate speech within the distinctive socio-political context of Indian elections. The tweets within CHUNAV have been meticulously categorized into “Hate” and “Non-Hate” labels, and further subdivided to pinpoint the specific targets of hate speech, including “Individual”, “Organization”, and “Community” labels (as shown in Figure 1). Furthermore, this paper presents multiple benchmark models for hate speech detection, along with an innovative ensemble and oversampling-based method. The paper also delves into the results of topic modeling, all aimed at effectively addressing hate speech and target identification in the Hindi language. This contribution seeks to advance the field of hate speech analysis and foster a safer and more inclusive online space within the distinctive realm of Indian Assembly Elections. | en_US
dc.description.sponsorship | MK is supported by UKRI NERC grant NE/X000192/12. | en_US
dc.format.extent | 1 - 32 | -
dc.format.medium | Print-Electronic | -
dc.language | English | -
dc.language.iso | en_US | en_US
dc.publisher | Association for Computing Machinery (ACM) | en_US
dc.rights | © 2024 Copyright held by the owner/author(s). Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). The definitive Version of Record was published in ACM Transactions on Asian and Low-Resource Language Information Processing, https://doi.org/10.1145/3665245 (see: https://www.acm.org/publications/policies/copyright-policy). | -
dc.rights.uri | https://www.acm.org/publications/policies/copyright-policy | -
dc.subject | hate speech | en_US
dc.subject | natural language processing | en_US
dc.subject | Indian election | en_US
dc.subject | topic modeling | en_US
dc.subject | ensemble methods | en_US
dc.title | CHUNAV: Analyzing Hindi Hate Speech and Targeted Groups in Indian Election Discourse | en_US
dc.type | Article | en_US
dc.date.dateAccepted | 2024-05-09 | -
dc.identifier.doi | https://doi.org/10.1145/3665245 | -
dc.relation.isPartOf | ACM Transactions on Asian and Low-Resource Language Information Processing | -
pubs.issue | ahead of print | -
pubs.publication-status | Published online | -
pubs.volume | 0 | -
dc.identifier.eissn | 2375-4702 | -
dc.rights.holder | The owner/author(s) | -
Appears in Collections: Dept of Computer Science Research Papers

Files in This Item:
File | Description | Size | Format
FullText.pdf | © 2024 Copyright held by the owner/author(s). Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). The definitive Version of Record was published in ACM Transactions on Asian and Low-Resource Language Information Processing, https://doi.org/10.1145/3665245 (see: https://www.acm.org/publications/policies/copyright-policy). | 2.38 MB | Adobe PDF


Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.