Please use this identifier to cite or link to this item:
http://bura.brunel.ac.uk/handle/2438/33009
| Title: | BANet: Enhancing Weakly Aligned Multimodal Object Detection via Balanced Bidirectional Alignment Network |
| Authors: | Shi, Y; Li, G; Shen, Z; Meng, H; Pang, Y |
| Keywords: | multimodal object detection;weak alignment;feature alignment;cross-modal fusion;remote sensing |
| Issue Date: | 17-Mar-2026 |
| Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
| Citation: | Shi, Y. et al. (2026) 'BANet: Enhancing Weakly Aligned Multimodal Object Detection via Balanced Bidirectional Alignment Network', IEEE Transactions on Geoscience and Remote Sensing, 0 (early access), pp. 1–13. doi: 10.1109/tgrs.2026.3674946. |
| Abstract: | Multimodal object detection in remote sensing imagery has achieved remarkable performance, primarily owing to its ability to exploit complementary information from multiple modalities. However, most existing methods suffer substantial performance degradation under weakly aligned conditions, largely because information is used asymmetrically across modalities. We therefore propose a novel multimodal object detection network, termed the Balanced Bidirectional Alignment Network (BANet), which improves detection accuracy on weakly aligned multimodal remote sensing imagery by adopting a dual-path architecture and a dedicated Weakly Aligned Module (WAM) that explicitly mitigates misalignment and enhances cross-modal feature interaction. Specifically, the WAM comprises three cooperative components. First, the Adaptive Cross-Modal Correlation Module (ACMCM) establishes semantic correspondence by jointly modeling global dependencies and local similarities in a bidirectional manner. Second, the Symmetric Offset Generator (SOG) adopts a coarse-to-fine strategy to produce stable, symmetric offsets, enabling precise and robust spatial alignment. Finally, the Progressive Fusion Strategy (PFS) adaptively integrates the original and aligned features through learnable weighting, preserving modality-specific characteristics while enhancing both spatial alignment and semantic consistency. Extensive experiments on the DroneVehicle and VEDAI multimodal remote sensing datasets demonstrate that the proposed method outperforms other advanced multimodal remote sensing object detectors. Notably, BANet achieves the best results on both datasets with only 8.8M parameters, highlighting its effectiveness and efficiency for real-time UAV applications. |
| URI: | https://bura.brunel.ac.uk/handle/2438/33009 |
| DOI: | https://doi.org/10.1109/tgrs.2026.3674946 |
| ISSN: | 0196-2892 |
| Other Identifiers: | ORCiD: Guoquan Li https://orcid.org/0000-0001-8022-743X; ORCiD: Zhilong Shen https://orcid.org/0000-0002-2170-3907; ORCiD: Hongying Meng https://orcid.org/0000-0002-8836-1382; ORCiD: Yu Pang https://orcid.org/0000-0002-7507-5387 |
| Appears in Collections: | Department of Electronic and Electrical Engineering Research Papers |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| FullText.pdf | For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising. | 34.11 MB | Adobe PDF | View/Open |
Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.
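The abstract's Progressive Fusion Strategy blends original and aligned features via learnable weighting. As a minimal NumPy sketch only (the function name `progressive_fusion`, the scalar gate, and the sigmoid parameterization are illustrative assumptions, not the paper's actual formulation, which operates on learned per-feature weights inside the network):

```python
import numpy as np

def sigmoid(x):
    # Squash a real-valued logit into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def progressive_fusion(original, aligned, gate_logit):
    """Hypothetical learnable-weighted fusion: blend the original feature
    map with its spatially aligned counterpart. In a trained network,
    gate_logit would be a learned parameter; here it is a fixed scalar."""
    w = sigmoid(gate_logit)               # fusion weight in (0, 1)
    return w * aligned + (1.0 - w) * original

# Toy feature maps (channels x height x width), stand-ins for RGB/IR features
rgb_feat = np.ones((4, 8, 8))
ir_feat_aligned = np.full((4, 8, 8), 3.0)
fused = progressive_fusion(rgb_feat, ir_feat_aligned, gate_logit=0.0)
# gate_logit = 0 gives w = 0.5, so fused is the element-wise mean (2.0 everywhere)
```

A gate near 0 or 1 would let the network fall back to one modality when alignment is unreliable, which matches the abstract's goal of preserving modality-specific characteristics.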