Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/32701
Full metadata record
DC Field: Value (Language)
dc.contributor.author: Alkandary, K
dc.contributor.author: Yildiz, AS
dc.contributor.author: Meng, H
dc.date.accessioned: 2026-01-23T10:58:28Z
dc.date.available: 2026-01-23T10:58:28Z
dc.date.issued: 2026-01-07
dc.identifier: ORCiD: Khadijah Alkandary https://orcid.org/0009-0000-0260-0817
dc.identifier: ORCiD: Ahmet Serhat Yildiz https://orcid.org/0000-0002-2957-7394
dc.identifier: ORCiD: Hongying Meng https://orcid.org/0000-0002-8836-1382
dc.identifier: Article number: 265
dc.identifier.citation: Alkandary, K., Yildiz, A.S. and Meng, H. (2026) 'Enhancing Multi Object Tracking with CLIP: A Comparative Study on DeepSORT and StrongSORT', Electronics, 15 (2), 265, pp. 1 - 18. doi: 10.3390/electronics15020265. (en_US)
dc.identifier.uri: https://bura.brunel.ac.uk/handle/2438/32701
dc.description: Data Availability Statement: The data presented in this study are openly available at https://motchallenge.net (accessed on 1 September 2025) and in reference [14]: Wojke, N., Bewley, A. and Paulus, D. (2017) 'Simple Online and Realtime Tracking with a Deep Association Metric', Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17-20 September, pp. 3645 - 3649. doi: 10.1109/ICIP.2017.8296962. (en_US)
dc.description.abstract: Multi-object tracking (MOT) is a crucial task in video analysis but is often hindered by frequent identity (ID) switches, particularly in crowded or occluded scenarios. This study explores the integration of a vision-language model, CLIP, into two tracking-by-detection frameworks, DeepSORT and StrongSORT, to enhance appearance-based re-identification. YOLOv8x is employed as the base detector due to its robust localization performance, while CLIP's visual features replace the default appearance encoders, providing more discriminative and semantically rich embeddings. We evaluated the CLIP-enhanced DeepSORT and StrongSORT on sequences from two challenging real-world benchmarks: MOT15 and MOT16. Furthermore, we analyze the generalizability of YOLOv8x when trained on the MOT20 benchmark and applied with the chosen trackers to MOT15 and MOT16. Our findings show that both CLIP-enhanced trackers substantially reduce ID switches and improve ID-based tracking metrics, with CLIP-StrongSORT achieving the most consistent gains. In addition, YOLOv8x demonstrates strong generalization to unseen datasets. These results highlight the effectiveness of incorporating vision-language models into MOT frameworks, particularly under visually challenging conditions. (en_US)
(An illustrative sketch of the CLIP feature-extraction step follows the metadata record below.)
dc.description.sponsorship: This research received no external funding. (en_US)
dc.format.extent: 1 - 18
dc.format.medium: Electronic
dc.language: English
dc.language.iso: en_US (en_US)
dc.publisher: MDPI (en_US)
dc.rights: Creative Commons Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: YOLO (en_US)
dc.subject: DeepSORT (en_US)
dc.subject: StrongSORT (en_US)
dc.subject: detection (en_US)
dc.subject: tracking (en_US)
dc.subject: autonomous driving (en_US)
dc.subject: CLIP (en_US)
dc.subject: vision-language models (en_US)
dc.title: Enhancing Multi Object Tracking with CLIP: A Comparative Study on DeepSORT and StrongSORT (en_US)
dc.type: Article (en_US)
dc.date.dateAccepted: 2025-12-30
dc.identifier.doi: https://doi.org/10.3390/electronics15020265
dc.relation.isPartOf: Electronics
pubs.issue: 2
pubs.publication-status: Published online
pubs.volume: 15
dc.identifier.eissn: 2079-9292
dc.rights.license: https://creativecommons.org/licenses/by/4.0/legalcode.en
dcterms.dateAccepted: 2025-12-30
dc.rights.holder: The authors
dc.contributor.orcid: Alkandary, Khadijah [0009-0000-0260-0817]
dc.contributor.orcid: Yildiz, Ahmet Serhat [0000-0002-2957-7394]
dc.contributor.orcid: Meng, Hongying [0000-0002-8836-1382]
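The abstract above describes swapping CLIP visual features in for the trackers' default appearance encoders. As a rough illustration only, and not the authors' published pipeline, the Python sketch below extracts one CLIP embedding per detection crop; the ViT-B/32 checkpoint, the open-source OpenAI `clip` package, and the helper name `clip_appearance_features` are assumptions made for this example.

    # Minimal sketch, assuming the OpenAI `clip` package
    # (pip install git+https://github.com/openai/CLIP.git) and ViT-B/32.
    import clip
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    def clip_appearance_features(frame: Image.Image, boxes):
        """Return one L2-normalised CLIP embedding per (x1, y1, x2, y2) box."""
        if len(boxes) == 0:  # no detections in this frame
            return torch.empty(0, 512, device=device)  # ViT-B/32 embeds to 512-D
        # Boxes would come from a detector such as YOLOv8x; cast to integer
        # pixel coordinates before cropping.
        crops = [preprocess(frame.crop(tuple(map(int, b)))) for b in boxes]
        batch = torch.stack(crops).to(device)
        with torch.no_grad():
            feats = model.encode_image(batch)
        # Normalise so that cosine distance, the usual appearance metric in
        # DeepSORT-style association, reduces to a dot product.
        return feats / feats.norm(dim=-1, keepdim=True)

In a tracking loop, these normalised vectors would simply replace the output of the default re-identification CNN wherever DeepSORT or StrongSORT computes appearance costs for track-to-detection matching.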
Appears in Collections: Dept of Electronic and Electrical Engineering Research Papers

Files in This Item:
File: FullText.pdf
Description: Copyright © 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Size: 4.56 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.