A Linguistic Grounding-Infused Contrastive Learning Approach for Health Mention Classification on Social Media

Naseem, U; Kim, J; Khushi, M; Dunn, AG

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/29262

Full metadata record

DC Field	Value	Language
dc.contributor.author	Naseem, U	-
dc.contributor.author	Kim, J	-
dc.contributor.author	Khushi, M	-
dc.contributor.author	Dunn, AG	-
dc.date.accessioned	2024-06-23T21:32:07Z	-
dc.date.available	2024-06-23T21:32:07Z	-
dc.date.issued	2024-03-04	-
dc.identifier	ORCiD: Usman Naseem https://orcid.org/0000-0003-0191-7171	-
dc.identifier	ORCiD: Jinman Kim https://orcid.org/0000-0001-5960-1060	-
dc.identifier	ORCiD: Matloob Khushi https://orcid.org/0000-0001-7792-2327	-
dc.identifier	ORCiD: Adam G. Dunn https://orcid.org/0000-0002-1720-8209	-
dc.identifier.citation	Naseem, U. et al. (2024) 'A Linguistic Grounding-Infused Contrastive Learning Approach for Health Mention Classification on Social Media', WSDM 2024 - Proceedings of the 17th ACM International Conference on Web Search and Data Mining, Merida, Mexico, 4-8 March, pp. 529 - 537. doi: 10.1145/3616855.3635763.	en_US
dc.identifier.isbn	979-8-4007-0371-3	-
dc.identifier.uri	https://bura.brunel.ac.uk/handle/2438/29262	-
dc.description.abstract	Social media users use disease and symptom words in different ways, including describing their personal health experiences figuratively or in other general discussions. The health mention classification (HMC) task aims to separate how people use terms, which is important in public health applications. Existing HMC studies address this problem using pretrained language models (PLMs). However, the remaining gaps in the area include the need for linguistic grounding, the requirement for large volumes of labelled data, and that solutions are often only tested on Twitter or Reddit, which provides limited evidence of the transportability of models. To address these gaps, we propose a novel method that uses a transformer-based PLM to obtain a contextual representation of target (disease or symptom) terms coupled with a contrastive loss to establish a larger gap between target terms' literal and figurative uses using linguistic theories. We introduce the use of a simple and effective approach for harvesting candidate instances from the broad corpus and generalising the proposed method using self-training to address the label scarcity challenge. Our experiments on publicly available health-mention datasets from Twitter (HMC2019) and Reddit (RHMD) demonstrate that our method outperforms the state-of-the-art HMC methods on both datasets for the HMC task. We further analyse the transferability and generalisability of our method and conclude with a discussion on the empirical and ethical considerations of our study.	en_US
dc.format.extent	529 - 537	-
dc.format.medium	Electronic	-
dc.language.iso	en_US	en_US
dc.rights	© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee (see: https://www.acm.org/publications/policies/copyright-policy). Request permissions from permissions@acm.org. The definitive Version of Record was published in WSDM’24, March 4–8, 2024, Merida, Mexico, https://doi.org/10.1145/3616855.3635763.	-
dc.rights.uri	https://www.acm.org/publications/policies/copyright-policy	-
dc.subject	health mention classification	en_US
dc.subject	public health surveillance	en_US
dc.subject	contrastive learning	en_US
dc.subject	social media	en_US
dc.title	A Linguistic Grounding-Infused Contrastive Learning Approach for Health Mention Classification on Social Media	en_US
dc.type	Conference Paper	en_US
dc.date.dateAccepted	2023-10-19	-
dc.identifier.doi	https://doi.org/10.1145/3616855.3635763	-
dc.relation.isPartOf	WSDM 2024 - Proceedings of the 17th ACM International Conference on Web Search and Data Mining	-
pubs.publication-status	Published	-
dc.rights.holder	The owner/author(s)	-
Appears in Collections:	Department of Computer Science Research Papers

Files in This Item:

File	Description	Size	Format
FullText.pdf	© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee (see: https://www.acm.org/publications/policies/copyright-policy). Request permissions from permissions@acm.org. The definitive Version of Record was published in WSDM’24, March 4–8, 2024, Merida, Mexico, https://doi.org/10.1145/3616855.3635763.	1.38 MB	Adobe PDF
