Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/31532
Title: DAF-DETR: A dynamic adaptation feature transformer for enhanced object detection in unmanned aerial vehicles
Authors: Song, B
Zhao, S
Wang, Z
Liu, W
Liu, X
Keywords: object detection;complex environments;tiny object detection;transformer;unmanned aerial vehicles
Issue Date: 27-May-2025
Publisher: Elsevier
Citation: Song, B. et al. (2025) 'DAF-DETR: A dynamic adaptation feature transformer for enhanced object detection in unmanned aerial vehicles', Knowledge-Based Systems, 323, 113760, pp. 1-13. doi: 10.1016/j.knosys.2025.113760.
Abstract: Object detection in complex environments is challenged by overlapping objects, complex spatial relationships, and dynamic variations in target scales. To address these challenges, the Dynamic Adaptation Feature DEtection TRansformer (DAF-DETR) is proposed as a novel transformer-based model optimized for real-time detection in spatially complex environments. The framework introduces four key innovations. First, a learnable position encoding mechanism is employed in place of fixed positional encoding, enhancing adaptability and flexibility when processing complex spatial layouts. Second, the Resynthetic Network (ResynNet) backbone, which consists of stacked Resynthetic Blocks (ResynBlocks) integrating ResBlock and FasterBlock feature extraction strategies, is designed to optimize multi-scale feature representation and improve computational efficiency. Third, an enhanced feature fusion module is incorporated to strengthen the detection of small, densely packed objects by integrating multi-scale contextual information. Fourth, a dynamic perception module is introduced, utilizing deformable attention to capture complex spatial relationships between overlapping objects. Extensive experiments conducted on the Vision meets Drone 2019 (VisDrone2019) and Tiny Object Detection in Aerial Images (AI-TOD) datasets demonstrate the superiority of DAF-DETR, achieving state-of-the-art detection accuracy while maintaining real-time efficiency. The results confirm its robustness in handling scale variations, occlusions, and spatial complexity, establishing it as a reliable solution for real-world applications such as aerial imagery and crowded scene analysis.
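Illustration: the abstract's first innovation, replacing fixed positional encoding with a learnable one, follows a pattern familiar from DETR-style detectors. The minimal PyTorch sketch below shows one way such a learnable 2-D positional embedding can be built; the class name LearnablePosEmbed2D, the dimensions, and the row/column-embedding design are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class LearnablePosEmbed2D(nn.Module):
    """Learnable row/column embeddings combined into a (1, C, H, W) position code."""
    def __init__(self, num_pos_feats: int = 128, max_hw: int = 64):
        super().__init__()
        # One learned vector per row index and per column index,
        # trained jointly with the detector instead of being fixed.
        self.row_embed = nn.Embedding(max_hw, num_pos_feats)
        self.col_embed = nn.Embedding(max_hw, num_pos_feats)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) backbone feature map.
        h, w = feat.shape[-2:]
        x_emb = self.col_embed(torch.arange(w, device=feat.device))  # (W, C/2)
        y_emb = self.row_embed(torch.arange(h, device=feat.device))  # (H, C/2)
        pos = torch.cat(
            [
                x_emb.unsqueeze(0).expand(h, w, -1),  # column code, repeated per row
                y_emb.unsqueeze(1).expand(h, w, -1),  # row code, repeated per column
            ],
            dim=-1,
        )  # (H, W, C)
        return pos.permute(2, 0, 1).unsqueeze(0)  # (1, C, H, W), added to features

Because the embeddings are trained end to end, the spatial prior can adapt to the irregular layouts the abstract targets, whereas a fixed sinusoidal code cannot.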
Description: Data availability: Data will be made available on request.
URI: https://bura.brunel.ac.uk/handle/2438/31532
DOI: https://doi.org/10.1016/j.knosys.2025.113760
ISSN: 0950-7051
Other Identifiers: ORCiD: Baoye Song https://orcid.org/0000-0003-1631-5237
ORCiD: Zidong Wang https://orcid.org/0000-0002-9576-7401
ORCiD: Weibo Liu https://orcid.org/0000-0002-8169-3261
ORCiD: Xiaohui Liu https://orcid.org/0000-0003-1589-1267
Article number: 113760
Appears in Collections: Dept of Computer Science Embargoed Research Papers

Files in This Item:
File: FullText.pdf (9.31 MB, Adobe PDF)
Description: Embargoed until 27 May 2026. Copyright © 2025 Elsevier B.V. All rights reserved. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/ (see: https://www.elsevier.com/about/policies/sharing).


This item is licensed under a Creative Commons License.