Please use this identifier to cite or link to this item:
http://bura.brunel.ac.uk/handle/2438/32290

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Cao, Y | - |
| dc.contributor.author | Wang, F | - |
| dc.contributor.author | Zheng, Q | - |
| dc.coverage.spatial | London, UK | - |
| dc.date.accessioned | 2025-11-05T12:58:27Z | - |
| dc.date.available | 2025-11-05T12:58:27Z | - |
| dc.date.issued | 2025-10-01 | - |
| dc.identifier | ORCiD: Fang Wang https://orcid.org/0000-0003-1987-9150 | - |
| dc.identifier | Article number: 06003 | - |
| dc.identifier.citation | Cao, Y., Wang, F. and Zheng, Q. (2025) 'Visual transformer with depthwise separable convolution projections for video-based human action recognition', MATEC Web of Conferences, 413, 06003, pp. 1 - 5. doi: 10.1051/matecconf/202541306003. | en_US |
| dc.identifier.issn | 2274-7214 | - |
| dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/32290 | - |
| dc.description.abstract | Human action recognition is a task that utilizes algorithms to recognize human actions from videos. Transformer-based algorithms have attracted growing attention in recent years. However, transformer networks often suffer from slow convergence and require large amounts of training data, due to their inability to prioritize information from neighboring pixels. To address these issues, we propose a novel network architecture that combines a depthwise separable convolution layer with transformer modules. The proposed network has been evaluated on the medium-sized benchmark dataset UCF101, and the results demonstrate that the proposed model converges quickly during training and achieves competitive performance compared with a state-of-the-art (SOTA) pure transformer network, while using approximately 7.4 million fewer parameters. | en_US |
| dc.description.sponsorship | This work is supported by the Zhongyuan University of Technology-Brunel University London (ZUT-BUL) Joint Doctoral Training Programme. This work is funded by the ZUT/BRUNEL scholarship. | en_US |
| dc.format.extent | 1 - 5 | - |
| dc.format.medium | Print-Electronic | - |
| dc.language | English | - |
| dc.language.iso | en_US | en_US |
| dc.publisher | EDP Sciences | en_US |
| dc.rights | Creative Commons Attribution 4.0 International | - |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | - |
| dc.source | International Conference on Measurement, AI, Quality and Sustainability (MAIQS 2025) | - |
| dc.title | Visual transformer with depthwise separable convolution projections for video-based human action recognition | en_US |
| dc.type | Conference Paper | en_US |
| dc.date.dateAccepted | 2025-06-08 | - |
| dc.identifier.doi | https://doi.org/10.1051/matecconf/202541306003 | - |
| dc.relation.isPartOf | MATEC Web of Conferences | - |
| pubs.finish-date | 2025-08-28 | - |
| pubs.publication-status | Published | - |
| pubs.start-date | 2025-08-26 | - |
| pubs.volume | 413 | - |
| dc.identifier.eissn | 2261-236X | - |
| dc.rights.license | https://creativecommons.org/licenses/by/4.0/legalcode.en | - |
| dcterms.dateAccepted | 2025-06-08 | - |
| dc.rights.holder | The Authors | - |
Appears in Collections: Dept of Mechanical and Aerospace Engineering Research Papers
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| FullText.pdf | Copyright © The Authors, published by EDP Sciences, 2025. Licence: Creative Commons. This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0 (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. | 323.91 kB | Adobe PDF | View/Open |
This item is licensed under a Creative Commons License
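The abstract attributes the parameter savings to using depthwise separable convolution projections in place of standard ones. A minimal sketch of the parameter-count arithmetic behind that claim (the channel and kernel sizes below are illustrative assumptions, not figures taken from the paper):

```python
def conv_params(c_in, c_out, k):
    # Standard convolution: one k x k kernel per (input, output) channel pair.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise step: one k x k kernel per input channel (groups = c_in),
    # followed by a pointwise 1 x 1 convolution that mixes channels.
    return c_in * k * k + c_in * c_out

# Hypothetical example: projecting 3-channel 16 x 16 patches to a 768-dim embedding
standard = conv_params(3, 768, 16)                   # 589,824 parameters
separable = depthwise_separable_params(3, 768, 16)   # 3,072 parameters
print(standard, separable)
```

The same substitution applied across a network's projection layers is the kind of change that can account for a multi-million-parameter reduction like the approximately 7.4 million reported in the abstract.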