Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/32290
Title: Visual transformer with depthwise separable convolution projections for video-based human action recognition
Authors: Cao, Y
Wang, F
Zheng, Q
Issue Date: 1-Oct-2025
Publisher: EDP Sciences
Citation: Cao, Y., Wang, F. and Zheng, Q. (2025) 'Visual transformer with depthwise separable convolution projections for video-based human action recognition', MATEC Web of Conferences, 413, 06003, pp. 1 - 5. doi: 10.1051/matecconf/202541306003.
Abstract: Human action recognition is the task of recognizing human actions from videos using algorithms. Transformer-based algorithms have attracted growing attention in recent years. However, transformer networks often suffer from slow convergence and require large amounts of training data, due to their inability to prioritize information from neighboring pixels. To address these issues, we propose a novel network architecture that combines a depthwise separable convolution layer with transformer modules. The proposed network has been evaluated on the medium-sized benchmark dataset UCF101, and the results demonstrate that the proposed model converges quickly during training and achieves competitive performance compared with state-of-the-art pure transformer networks, while using approximately 7.4 million fewer parameters.
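The parameter saving reported in the abstract follows from the standard MobileNet-style factorization: a depthwise separable convolution replaces one dense k x k convolution with a per-channel k x k depthwise convolution plus a 1 x 1 pointwise convolution. The sketch below illustrates the arithmetic with example channel and kernel sizes chosen for illustration only (they are not taken from the paper):

```python
def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    """Parameter count of a standard 2-D convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Parameter count of a depthwise separable convolution:
    one k x k filter per input channel, then a 1 x 1 pointwise
    convolution mixing channels (bias ignored)."""
    depthwise = c_in * k * k       # per-channel spatial filtering
    pointwise = c_in * c_out       # 1 x 1 cross-channel projection
    return depthwise + pointwise

# Hypothetical projection: 256 -> 256 channels with a 3 x 3 kernel.
standard = conv2d_params(256, 256, 3)                 # 589,824
separable = depthwise_separable_params(256, 256, 3)   # 67,840
print(f"standard: {standard}, separable: {separable}, "
      f"ratio: {separable / standard:.3f}")
```

For this illustrative configuration the separable variant needs roughly 11.5% of the parameters of the dense convolution, which is why swapping dense projections for depthwise separable ones can shave millions of parameters off a network.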
URI: https://bura.brunel.ac.uk/handle/2438/32290
DOI: https://doi.org/10.1051/matecconf/202541306003
ISSN: 2274-7214
Other Identifiers: ORCiD: Fang Wang https://orcid.org/0000-0003-1987-9150
Article number: 06003
Appears in Collections: Dept of Mechanical and Aerospace Engineering Research Papers

Files in This Item:
File: FullText.pdf
Description: Copyright © The Authors, published by EDP Sciences, 2025. Licence: Creative Commons. This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0 (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Size: 323.91 kB
Format: Adobe PDF


This item is licensed under a Creative Commons License.