Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/32701
Title: Enhancing Multi Object Tracking with CLIP: A Comparative Study on DeepSORT and StrongSORT
Authors: Alkandary, K
Yildiz, AS
Meng, H
Keywords: YOLO;DeepSORT;StrongSORT;detection;tracking;autonomous driving;CLIP;vision-language models
Issue Date: 7-Jan-2026
Publisher: MDPI
Citation: Alkandary, K., Yildiz, A.S. and Meng, H. (2026) 'Enhancing Multi Object Tracking with CLIP: A Comparative Study on DeepSORT and StrongSORT', Electronics, 15 (2), 265, pp. 1 - 18. doi: 10.3390/electronics15020265.
Abstract: Multi-object tracking (MOT) is a crucial task in video analysis but is often hindered by frequent identity (ID) switches, particularly in crowded or occluded scenarios. This study explores the integration of a vision-language model, CLIP, into two tracking-by-detection frameworks, DeepSORT and StrongSORT, to enhance appearance-based re-identification. YOLOv8x is employed as the base detector due to its robust localization performance, while CLIP’s visual features replace the default appearance encoders, providing more discriminative and semantically rich embeddings. We evaluated the CLIP-enhanced DeepSORT and StrongSORT on sequences from two challenging real-world benchmarks: MOT15 and MOT16. Furthermore, we analyze the generalizability of YOLOv8x when trained on the MOT20 benchmark and applied to the chosen trackers on MOT15 and MOT16. Our findings show that both CLIP-enhanced trackers substantially reduce ID switches and improve ID-based tracking metrics, with CLIP StrongSORT achieving the most consistent gains. In addition, YOLOv8x demonstrates strong generalization capabilities for unseen datasets. These results highlight the effectiveness of incorporating vision-language models into MOT frameworks, particularly under visually challenging conditions.
Description: Data Availability Statement: The data presented in this study are openly available at https://motchallenge.net (accessed on 1 September 2025) and in reference [14]: Wojke, N., Bewley, A. and Paulus, D. (2017) 'Simple Online and Realtime Tracking with a Deep Association Metric', Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17-20 September, pp. 3645 - 3649. doi: 10.1109/ICIP.2017.8296962.
URI: https://bura.brunel.ac.uk/handle/2438/32701
DOI: https://doi.org/10.3390/electronics15020265
Other Identifiers: ORCiD: Khadijah Alkandary https://orcid.org/0009-0000-0260-0817
ORCiD: Ahmet Serhat Yildiz https://orcid.org/0000-0002-2957-7394
ORCiD: Hongying Meng https://orcid.org/0000-0002-8836-1382
Article number: 265
Appears in Collections: Dept of Electronic and Electrical Engineering Research Papers

Files in This Item:
File: FullText.pdf
Description: Copyright © 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Size: 4.56 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.