Enhancing Multi Object Tracking with CLIP: A Comparative Study on DeepSORT and StrongSORT

Alkandary, K; Yildiz, AS; Meng, H

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/32701

Full metadata record

DC Field	Value	Language
dc.contributor.author	Alkandary, K	-
dc.contributor.author	Yildiz, AS	-
dc.contributor.author	Meng, H	-
dc.date.accessioned	2026-01-23T10:58:28Z	-
dc.date.available	2026-01-23T10:58:28Z	-
dc.date.issued	2026-01-07	-
dc.identifier	ORCiD: Khadijah Alkandary https://orcid.org/0009-0000-0260-0817	-
dc.identifier	ORCiD: Ahmet Serhat Yildiz https://orcid.org/0000-0002-2957-7394	-
dc.identifier	ORCiD: Hongying Meng https://orcid.org/0000-0002-8836-1382	-
dc.identifier	Article number: 265	-
dc.identifier.citation	Alkandary, K., Yildiz, A.S. and Meng, H. (2026) 'Enhancing Multi Object Tracking with CLIP: A Comparative Study on DeepSORT and StrongSORT', Electronics, 15 (2), 265, pp. 1 - 18. doi: 10.3390/electronics15020265.	en_US
dc.identifier.uri	https://bura.brunel.ac.uk/handle/2438/32701	-
dc.description	Data Availability Statement: The data presented in this study are openly available at https://motchallenge.net (accessed on 1 September 2025) and in reference [14]: Wojke, N., Bewley, A. and Paulus, D. (2017) 'Simple Online and Realtime Tracking with a Deep Association Metric', Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17-20 September, pp. 3645 - 3649. doi: 10.1109/ICIP.2017.8296962.	en_US
dc.description.abstract	Multi-object tracking (MOT) is a crucial task in video analysis but is often hindered by frequent identity (ID) switches, particularly in crowded or occluded scenarios. This study explores the integration of a vision-language model, CLIP, into two tracking-by-detection frameworks, DeepSORT and StrongSORT, to enhance appearance-based re-identification. YOLOv8x is employed as the base detector due to its robust localization performance, while CLIP's visual features replace the default appearance encoders, providing more discriminative and semantically rich embeddings. We evaluate the CLIP-enhanced DeepSORT and StrongSORT on sequences from two challenging real-world benchmarks: MOT15 and MOT16. Furthermore, we analyze the generalizability of YOLOv8x when trained on the MOT20 benchmark and used as the detector for the chosen trackers on MOT15 and MOT16. Our findings show that both CLIP-enhanced trackers substantially reduce ID switches and improve ID-based tracking metrics, with CLIP-enhanced StrongSORT achieving the most consistent gains. In addition, YOLOv8x generalizes well to unseen datasets. These results highlight the effectiveness of incorporating vision-language models into MOT frameworks, particularly under visually challenging conditions.	en_US
dc.description.sponsorship	This research received no external funding.	en_US
dc.format.extent	1 - 18	-
dc.format.medium	Electronic	-
dc.language	English	-
dc.language.iso	en_US	en_US
dc.publisher	MDPI	en_US
dc.rights	Creative Commons Attribution 4.0 International	-
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	-
dc.subject	YOLO	en_US
dc.subject	DeepSORT	en_US
dc.subject	StrongSORT	en_US
dc.subject	detection	en_US
dc.subject	tracking	en_US
dc.subject	autonomous driving	en_US
dc.subject	CLIP	en_US
dc.subject	vision-language models	en_US
dc.title	Enhancing Multi Object Tracking with CLIP: A Comparative Study on DeepSORT and StrongSORT	en_US
dc.type	Article	en_US
dc.date.dateAccepted	2025-12-30	-
dc.identifier.doi	https://doi.org/10.3390/electronics15020265	-
dc.relation.isPartOf	Electronics	-
pubs.issue	2	-
pubs.publication-status	Published online	-
pubs.volume	15	-
dc.identifier.eissn	2079-9292	-
dc.rights.license	https://creativecommons.org/licenses/by/4.0/legalcode.en	-
dcterms.dateAccepted	2025-12-30	-
dc.rights.holder	The authors	-
dc.contributor.orcid	Alkandary, Khadijah [0009-0000-0260-0817]	-
dc.contributor.orcid	Yildiz, Ahmet Serhat [0000-0002-2957-7394]	-
dc.contributor.orcid	Meng, Hongying [0000-0002-8836-1382]	-
Appears in Collections:	Department of Electronic and Electrical Engineering Research Papers
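
The abstract above describes replacing the default re-identification encoders of DeepSORT and StrongSORT with CLIP visual features. The record does not include the authors' code, so the following is only a minimal sketch, under stated assumptions, of where such embeddings enter appearance-based association: each track and detection is compared by cosine distance, pairs are chosen by an optimal assignment with a distance gate, and matched tracks update their feature with a StrongSORT-style exponential moving average (alpha = 0.9, the momentum StrongSORT reports; DeepSORT instead keeps a gallery of recent features and uses the smallest distance). The function names, the toy 3-dimensional vectors standing in for CLIP image embeddings, and the gate max_dist = 0.3 are hypothetical. The sketch omits the Kalman-filter motion cues both trackers also use, and it brute-forces the assignment where a real tracker would use the Hungarian algorithm.

```python
import math
from itertools import permutations


def l2_normalize(v):
    """Scale a vector to unit length (embeddings are compared by direction)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the two vectors point the same way."""
    a, b = l2_normalize(a), l2_normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def ema_update(track_emb, det_emb, alpha=0.9):
    """StrongSORT-style EMA of a track's appearance feature (alpha weights history)."""
    det_emb = l2_normalize(det_emb)
    mixed = [alpha * t + (1.0 - alpha) * d for t, d in zip(track_emb, det_emb)]
    return l2_normalize(mixed)


def associate(track_embs, det_embs, max_dist=0.3):
    """Match tracks to detections by appearance only (hypothetical helper).

    Returns (matches, unmatched_tracks, unmatched_detections). The optimal
    assignment is found by brute force, which is only practical for tiny
    inputs; a real tracker would use the Hungarian algorithm.
    """
    n_rows, n_cols = len(track_embs), len(det_embs)
    if n_rows == 0 or n_cols == 0:
        return [], list(range(n_rows)), list(range(n_cols))
    cost = [[cosine_distance(t, d) for d in det_embs] for t in track_embs]
    # Clip gated entries (as DeepSORT does) so they never outweigh a feasible pair.
    clipped = [[min(c, max_dist + 1e-5) for c in row] for row in cost]

    best_total, best_pairs = math.inf, []
    if n_rows <= n_cols:
        # Each track takes a distinct detection.
        for cols in permutations(range(n_cols), n_rows):
            total = sum(clipped[r][c] for r, c in enumerate(cols))
            if total < best_total:
                best_total, best_pairs = total, list(enumerate(cols))
    else:
        # Each detection takes a distinct track.
        for rows in permutations(range(n_rows), n_cols):
            total = sum(clipped[r][c] for c, r in enumerate(rows))
            if total < best_total:
                best_total, best_pairs = total, [(r, c) for c, r in enumerate(rows)]

    # Pairs that were assigned only because of clipping stay unmatched.
    matches = sorted((r, c) for r, c in best_pairs if cost[r][c] <= max_dist)
    matched_r = {r for r, _ in matches}
    matched_c = {c for _, c in matches}
    unmatched_tracks = [r for r in range(n_rows) if r not in matched_r]
    unmatched_dets = [c for c in range(n_cols) if c not in matched_c]
    return matches, unmatched_tracks, unmatched_dets


# Toy 3-D vectors stand in for CLIP image embeddings of detection crops.
tracks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
detections = [[0.1, 0.95, 0.0], [0.98, 0.05, 0.1], [0.0, 0.0, 1.0]]
matches, unmatched_tracks, unmatched_dets = associate(tracks, detections)
for t, d in matches:
    tracks[t] = ema_update(tracks[t], detections[d])
print(matches, unmatched_tracks, unmatched_dets)  # [(0, 1), (1, 0)] [] [2]
```

With these toy inputs, each track is matched to the detection it most resembles, and the third detection, dissimilar to both tracks, stays unmatched; in a full tracking-by-detection pipeline such a detection would typically start a new tentative track.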

Files in This Item:

File	Description	Size	Format
FullText.pdf	Copyright © 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).	4.56 MB	Adobe PDF

This item is licensed under a Creative Commons License