Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/30890
Full metadata record
dc.contributor.author: Liu, D
dc.contributor.author: Li, H
dc.contributor.author: Zhao, Z
dc.contributor.author: Su, F
dc.contributor.author: Meng, H
dc.date.accessioned: 2025-03-10T16:56:29Z
dc.date.available: 2025-03-10T16:56:29Z
dc.date.issued: 2023-11-25
dc.identifier: ORCiD: Hongying Meng https://orcid.org/0000-0002-8836-1382
dc.identifier: arXiv:2311.16515v3 [cs.CV]
dc.identifier.citation: Liu, D. et al. (2023) 'Word4Per: Zero-shot Composed Person Retrieval', arXiv preprint, arXiv:2311.16515v3 [cs.CV], pp. 1-12. doi: 10.48550/arXiv.2311.16515
dc.identifier.uri: https://bura.brunel.ac.uk/handle/2438/30890
dc.description: This version of the article is a preprint [v3], Mon, 25 Nov 2024 18:11:18 UTC (4,291 KB). It has not been certified by peer review.
dc.description: The code and ITCPR dataset will be publicly available at https://github.com/Delong-liu-bupt/Word4Per
dc.description.abstract: Searching for a specific person has great social benefit and security value, and it often involves a combination of visual and textual information. Conventional person retrieval methods, whether image-based or text-based, usually fail to harness both types of information effectively, leading to a loss of accuracy. In this paper, a new task called Composed Person Retrieval (CPR) is proposed to jointly utilize image and text information for target person retrieval. However, supervised CPR requires a very costly, manually annotated dataset, and no such resources are currently available. To mitigate this issue, we first introduce Zero-shot Composed Person Retrieval (ZS-CPR), which leverages existing domain-related data to solve the CPR problem without expensive annotations. Second, to learn a ZS-CPR model, we propose a two-stage learning framework, Word4Per, in which a lightweight Textual Inversion Network (TINet) and a text-based person retrieval model based on a fine-tuned Contrastive Language-Image Pre-training (CLIP) network are learned without utilizing any CPR data. Third, a finely annotated Image-Text Composed Person Retrieval (ITCPR) dataset is built as a benchmark to assess the performance of the proposed Word4Per framework. Extensive experiments under both Rank-1 and mAP demonstrate the effectiveness of Word4Per for the ZS-CPR task, surpassing the comparative methods by over 10%.
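The abstract's core idea, inverting a reference image into a pseudo-word token that is composed with a relative caption and retrieved via a shared embedding space, can be illustrated with a minimal, self-contained sketch. Everything here is a toy stand-in under stated assumptions: the random projections play the role of frozen CLIP image/text encoders, `tinet` is a single linear layer rather than the paper's actual Textual Inversion Network, and mean-pooling replaces a text transformer. This is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy embedding dimension (real CLIP models use 512 or 768)

# Hypothetical stand-ins for frozen CLIP encoders: fixed random projections
# followed by L2 normalization, so similarities are cosine similarities.
W_img = rng.normal(size=(D, D))
W_txt = rng.normal(size=(D, D))

def encode_image(x):
    """Embed a raw image feature vector onto the unit sphere."""
    v = W_img @ x
    return v / np.linalg.norm(v)

def encode_text(tokens):
    """Embed a list of D-dim token embeddings (mean-pool in place of a transformer)."""
    v = W_txt @ np.mean(tokens, axis=0)
    return v / np.linalg.norm(v)

# Toy TINet: maps an image embedding to one pseudo-word token embedding S*.
W_tinet = rng.normal(size=(D, D))
def tinet(img_emb):
    return W_tinet @ img_emb

def composed_query(ref_image, caption_tokens):
    """Invert the reference image into pseudo-token S*, append it to the
    relative caption's tokens, and encode the composed query."""
    s_star = tinet(encode_image(ref_image))
    return encode_text(caption_tokens + [s_star])

# Toy gallery of person images; retrieval ranks candidates by cosine similarity.
gallery = [rng.normal(size=D) for _ in range(5)]
gallery_embs = np.stack([encode_image(g) for g in gallery])

q = composed_query(rng.normal(size=D), [rng.normal(size=D) for _ in range(3)])
ranking = np.argsort(-(gallery_embs @ q))  # ranking[0] is the Rank-1 candidate
```

Because the encoders are frozen and only the inversion network is learned, this structure matches the zero-shot setting the abstract describes: no CPR triplets are needed to train the retrieval model itself.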
dc.description.sponsorship: The work is supported by the Key R&D Program of Yunnan Province (202102AE09001902-2) and the BUPT Innovation and Entrepreneurship Support Program (2024-YC-T030).
dc.language: English
dc.language.iso: en_US
dc.publisher: Cornell University
dc.rights: The URI http://arxiv.org/licenses/nonexclusive-distrib/1.0/ is used to record the fact that the submitter granted the following license to arXiv.org on submission of an article: * I grant arXiv.org a perpetual, non-exclusive license to distribute this article. * I certify that I have the right to grant this license. * I understand that submissions cannot be completely removed once accepted. * I understand that arXiv.org reserves the right to reclassify or reject any submission.
dc.rights.uri: https://arxiv.org/licenses/nonexclusive-distrib/1.0/license.html
dc.subject: zero-shot
dc.subject: composed person retrieval
dc.subject: ITCPR
dc.subject: dataset
dc.subject: textual inversion network
dc.subject: computer vision and pattern recognition (cs.CV)
dc.subject: artificial intelligence (cs.AI)
dc.subject: information retrieval (cs.IR)
dc.title: Word4Per: Zero-shot Composed Person Retrieval
dc.type: Preprint
dc.identifier.doi: https://doi.org/10.48550/arXiv.2311.16515
dc.relation.isPartOf: arXiv
dc.identifier.eissn: 2331-8422
dcterms.dateAccepted: 2023-11-25
dc.rights.holder: The Authors
Appears in Collections:Dept of Electronic and Electrical Engineering Research Papers

Files in This Item:
File: Preprint_v3.pdf (3.62 MB, Adobe PDF)


Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.