Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/31289
Title: Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM
Authors: Huang, Y
Cheng, Y
Wang, K
Issue Date: 11-Jun-2025
Publisher: Institute of Electrical and Electronics Engineers (IEEE) on behalf of the Computer Vision Foundation
Citation: Huang, Y., Cheng, Y. and Wang, K. (2025) 'Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM', 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11-15 June, pp. 12058 - 12067. Available at: https://openaccess.thecvf.com/content/CVPR2025/html/Huang_Trajectory_Mamba_Efficient_Attention-Mamba_Forecasting_Model_Based_on_Selective_SSM_CVPR_2025_paper.html (accessed: 23 July 2025).
Abstract: Motion prediction is crucial for autonomous driving, as it enables accurate forecasting of future vehicle trajectories from historical inputs. This paper introduces Trajectory Mamba, a novel, efficient trajectory prediction framework based on the selective state-space model (SSM). Conventional attention-based models face computational costs that grow quadratically with the number of targets, hindering their application in highly dynamic environments. In response, we leverage the SSM to redesign the self-attention mechanism in the encoder-decoder architecture, thereby achieving linear time complexity. To address the potential loss of prediction accuracy caused by modifying the attention mechanism, we propose a joint polyline encoding strategy that better captures the associations between static and dynamic contexts, ultimately enhancing prediction accuracy. Additionally, to balance prediction accuracy and inference speed, we adopt a decoder that differs entirely from the encoder. Through cross-state space attention, all target agents share the scene context, allowing the SSM to interact with the shared scene representation during decoding and thus infer distinct trajectories over the prediction horizon. Our model achieves state-of-the-art results in terms of inference speed and parameter efficiency on both the Argoverse 1 and Argoverse 2 datasets. It demonstrates a four-fold reduction in FLOPs compared to existing methods and reduces the parameter count by over 40% while surpassing the performance of the vast majority of previous methods. These findings validate the effectiveness of Trajectory Mamba in trajectory prediction tasks.
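Note: The abstract hinges on replacing quadratic self-attention with a selective SSM whose cost grows linearly with sequence length. The NumPy sketch below illustrates the general form of an input-dependent (selective) SSM recurrence of the kind used in Mamba-style models; all names, shapes, and the simplified discretization are assumptions made for illustration and do not reproduce the authors' implementation.

    # Minimal sketch of a selective (input-dependent) SSM scan, assuming a
    # simplified zero-order-hold-style discretization. Illustrative only;
    # parameter names and shapes are not taken from the paper's code.
    import numpy as np

    def selective_ssm_scan(x, A, W_delta, W_B, W_C):
        """Sequential selective scan over a 1-D input sequence.

        x:       (T, D) input sequence
        A:       (D, N) per-channel diagonal state parameters (negative => stable)
        W_delta: (D, D) projection producing per-channel step sizes
        W_B:     (N, D) projection producing input-dependent B_t
        W_C:     (N, D) projection producing input-dependent C_t
        returns: (T, D) outputs; cost is O(T), unlike O(T^2) self-attention
        """
        T, D = x.shape
        N = A.shape[1]
        h = np.zeros((D, N))                        # hidden state per channel
        y = np.zeros((T, D))
        for t in range(T):
            xt = x[t]                               # (D,)
            delta = np.log1p(np.exp(W_delta @ xt))  # softplus step size, (D,)
            B_t = W_B @ xt                          # input-dependent B, (N,)
            C_t = W_C @ xt                          # input-dependent C, (N,)
            A_bar = np.exp(delta[:, None] * A)      # discretized A, (D, N)
            h = A_bar * h + (delta[:, None] * B_t[None, :]) * xt[:, None]
            y[t] = h @ C_t                          # readout, (D,)
        return y

    # Tiny usage example with random parameters.
    rng = np.random.default_rng(0)
    T, D, N = 20, 8, 4
    x = rng.standard_normal((T, D))
    A = -np.abs(rng.standard_normal((D, N)))        # keep the recurrence stable
    out = selective_ssm_scan(x, A,
                             0.1 * rng.standard_normal((D, D)),
                             0.1 * rng.standard_normal((N, D)),
                             0.1 * rng.standard_normal((N, D)))
    print(out.shape)  # (20, 8)

Because delta, B_t, and C_t are computed from the current input, the state update is content-dependent (the "selective" property), yet each time step still costs a fixed amount of work, which is the source of the linear-time claim in the abstract.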
Description: The CVPR 2025 papers at https://openaccess.thecvf.com/CVPR2025 are the Open Access versions, provided by the Computer Vision Foundation. Except for the watermark, they are identical to the accepted versions; the final published version of the proceedings is available on IEEE Xplore at https://ieeexplore.ieee.org/xpl/conhome/1000147/all-proceedings. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.
URI: https://bura.brunel.ac.uk/handle/2438/31289
Other Identifiers: ORCiD: Kezhi Wang https://orcid.org/0000-0001-8602-0800
arXiv:2503.10898 [cs.CV]
Appears in Collections:Dept of Computer Science Research Papers

Files in This Item:
File: FullText.pdf
Description: Copyright © 2025 The Authors. This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
Size: 1.51 MB
Format: Adobe PDF


This item is licensed under a Creative Commons Attribution 4.0 International License.