Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/33009
Title: BANet: Enhancing Weakly Aligned Multimodal Object Detection via Balanced Bidirectional Alignment Network
Authors: Shi, Y
Li, G
Shen, Z
Meng, H
Pang, Y
Keywords: multimodal object detection; weak alignment; feature alignment; cross-modal fusion; remote sensing
Issue Date: 17-Mar-2026
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Citation: Shi, Y. et al. (2026) 'BANet: Enhancing Weakly Aligned Multimodal Object Detection via Balanced Bidirectional Alignment Network', IEEE Transactions on Geoscience and Remote Sensing, 0 (early access), pp. 1–13. doi: 10.1109/tgrs.2026.3674946.
Abstract: Multimodal object detection in remote sensing imagery has achieved remarkable performance, largely owing to its ability to exploit complementary information from multiple modalities. However, most existing methods suffer substantial performance degradation under weakly aligned conditions, primarily due to the asymmetric utilization of information across modalities. We therefore propose a novel multimodal object detection network, termed Balanced Bidirectional Alignment Network (BANet), which improves detection accuracy on weakly aligned multimodal remote sensing imagery by adopting a dual-path architecture and incorporating a dedicated Weakly Aligned Module (WAM) that explicitly mitigates misalignment and enhances cross-modal feature interaction. Specifically, the WAM comprises three cooperative components. First, the Adaptive Cross-Modal Correlation Module (ACMCM) establishes semantic correspondence by jointly modeling global dependencies and local similarities in a bidirectional manner. Second, the Symmetric Offset Generator (SOG) adopts a coarse-to-fine strategy to produce stable, symmetric offsets, enabling precise and robust spatial alignment. Finally, the Progressive Fusion Strategy (PFS) adaptively integrates the original and aligned features through learnable weighting, preserving modality-specific characteristics while enhancing both spatial alignment and semantic consistency. Extensive experiments on the DroneVehicle and VEDAI multimodal remote sensing datasets demonstrate the superiority of the proposed method over other advanced multimodal remote sensing object detectors. Notably, BANet achieves the best performance on both datasets with only 8.8M parameters, highlighting its effectiveness and efficiency for real-time UAV applications.
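To make the fusion idea in the abstract concrete, the sketch below illustrates a learnable-weighted combination of a modality's original features with their spatially aligned counterpart, in the spirit of the Progressive Fusion Strategy (PFS). This is a minimal PyTorch sketch under stated assumptions: the class name, the 1x1-convolution gate, and the tensor shapes are illustrative choices of ours, not the authors' implementation.

import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Illustrative sketch: fuse original features with their spatially
    aligned counterpart via a learned, spatially varying weight map.
    The gating design is an assumption, not BANet's actual PFS."""

    def __init__(self, channels: int):
        super().__init__()
        # A 1x1 conv predicts a per-pixel fusion weight from both feature maps.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, original: torch.Tensor, aligned: torch.Tensor) -> torch.Tensor:
        # w in (0, 1): how much of the aligned feature to trust at each location.
        w = self.gate(torch.cat([original, aligned], dim=1))
        # A convex combination keeps modality-specific content (original)
        # while injecting the alignment-corrected signal (aligned).
        return (1.0 - w) * original + w * aligned

if __name__ == "__main__":
    fuse = WeightedFusion(channels=64)
    rgb_feat = torch.randn(2, 64, 32, 32)    # hypothetical RGB branch features
    ir_aligned = torch.randn(2, 64, 32, 32)  # hypothetical aligned infrared features
    out = fuse(rgb_feat, ir_aligned)
    print(out.shape)  # torch.Size([2, 64, 32, 32])

A per-pixel weight (rather than a single scalar) lets the fusion lean on the aligned features where alignment is reliable and fall back to the original features elsewhere, which matches the abstract's stated goal of preserving modality-specific characteristics.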
URI: https://bura.brunel.ac.uk/handle/2438/33009
DOI: https://doi.org/10.1109/tgrs.2026.3674946
ISSN: 0196-2892
Other Identifiers: ORCiD: Guoquan Li https://orcid.org/0000-0001-8022-743X
ORCiD: Zhilong Shen https://orcid.org/0000-0002-2170-3907
ORCiD: Hongying Meng https://orcid.org/0000-0002-8836-1382
ORCiD: Yu Pang https://orcid.org/0000-0002-7507-5387
Appears in Collections: Department of Electronic and Electrical Engineering Research Papers

Files in This Item:
File: FullText.pdf
Description: For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.
Size: 34.11 MB
Format: Adobe PDF


Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.