Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/32290
Title: Visual transformer with depthwise separable convolution projections for video-based human action recognition
Authors: Cao, Y
Wang, F
Zheng, Q
Issue Date: 1-Oct-2025
Publisher: EDP Sciences
Citation: Cao, Y., Wang, F. and Zheng, Q. (2025) 'Visual transformer with depthwise separable convolution projections for video-based human action recognition', MATEC Web of Conferences, 413, 06003, pp. 1 - 5. doi: 10.1051/matecconf/202541306003.
Abstract: Human action recognition is the task of recognizing human actions from videos using algorithms. Transformer-based algorithms have attracted growing attention in recent years. However, transformer networks often suffer from slow convergence and require large amounts of training data, due to their inability to prioritize information from neighboring pixels. To address these issues, we propose a novel network architecture that combines a depthwise separable convolution layer with transformer modules. The proposed network has been evaluated on the medium-sized benchmark dataset UCF101, and the results demonstrate that the proposed model converges quickly during training and achieves competitive performance compared with state-of-the-art pure transformer networks, while using approximately 7.4 million fewer parameters.
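The parameter saving reported in the abstract follows from the standard MobileNet-style factorization: a depthwise separable convolution replaces one dense k x k convolution with a per-channel k x k depthwise convolution plus a 1 x 1 pointwise convolution. The sketch below illustrates the arithmetic with example channel and kernel sizes chosen for illustration only (they are not taken from the paper):

```python
def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    """Parameter count of a standard 2-D convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Parameter count of a depthwise separable convolution:
    one k x k filter per input channel, then a 1 x 1 pointwise
    convolution mixing channels (bias ignored)."""
    depthwise = c_in * k * k       # per-channel spatial filtering
    pointwise = c_in * c_out       # 1 x 1 cross-channel projection
    return depthwise + pointwise

# Hypothetical projection: 256 -> 256 channels with a 3 x 3 kernel.
standard = conv2d_params(256, 256, 3)                 # 589,824
separable = depthwise_separable_params(256, 256, 3)   # 67,840
print(f"standard: {standard}, separable: {separable}, "
      f"ratio: {separable / standard:.3f}")
```

For this illustrative configuration the separable variant needs roughly 11.5% of the parameters of the dense convolution, which is why swapping dense projections for depthwise separable ones can shave millions of parameters off a network.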
URI: https://bura.brunel.ac.uk/handle/2438/32290
DOI: https://doi.org/10.1051/matecconf/202541306003
ISSN: 2274-7214
Other Identifiers: ORCiD: Fang Wang https://orcid.org/0000-0003-1987-9150
Article number: 06003
Appears in Collections: Dept of Mechanical and Aerospace Engineering Research Papers

Files in This Item:
File: FullText.pdf
Description: Copyright © The Authors, published by EDP Sciences, 2025. Licence: Creative Commons. This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0 (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Size: 323.91 kB
Format: Adobe PDF


This item is licensed under a Creative Commons License.