Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/26708
Title: SSRL: Self-supervised Spatial-temporal Representation Learning for 3D Action recognition
Authors: Jin, Z
Wang, Y
Wang, Q
Shen, Y
Meng, H
Keywords: self-supervised learning;contrastive learning;skeleton action recognition
Issue Date: 9-Jun-2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Citation: Jin, Z. et al. (2024) 'SSRL: Self-supervised Spatial-temporal Representation Learning for 3D Action recognition', IEEE Transactions on Circuits and Systems for Video Technology, 34 (1), pp. 274 - 285. doi: 10.1109/tcsvt.2023.3284493.
Abstract: For 3D action recognition, the main challenge is to extract long-range semantic information in both the temporal and spatial dimensions. In this paper, in order to better excavate long-range semantic information from a large number of unlabelled skeleton sequences, we propose Self-supervised Spatial-temporal Representation Learning (SSRL), a contrastive learning framework for learning skeleton representations. SSRL consists of two novel inference tasks that enable the network to learn global semantic information in the temporal and spatial dimensions, respectively. The temporal inference task learns the temporal persistence of human actions from temporally incomplete skeleton sequences, while the spatial inference task learns the spatially coordinated nature of human actions from spatially partial skeleton sequences. We design two transformation modules to efficiently realize these two tasks while fitting the encoder network. To avoid the difficulty of constructing and maintaining high-quality negative samples, our proposed framework learns by maintaining consistency among positive samples, without the need for any negative samples. Experiments demonstrate that our proposed method achieves better results than state-of-the-art methods under a variety of evaluation protocols on the NTU RGB+D 60, PKU-MMD and NTU RGB+D 120 datasets.
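The abstract's key ideas — building a "temporally incomplete" positive view of a skeleton sequence and training by pulling positive views together without any negative samples — can be illustrated with a minimal sketch. This is not the authors' implementation: the frame-dropping function and the cosine-similarity consistency loss below are illustrative stand-ins (in the style of negative-free self-supervised methods), and the function names are hypothetical.

```python
import numpy as np

def temporal_mask(seq, keep=0.5, rng=None):
    """Hypothetical 'temporally incomplete' view: randomly keep a
    fraction of the frames of a skeleton sequence.

    seq: array of shape (frames, joints, coords)."""
    rng = rng or np.random.default_rng(0)
    t = seq.shape[0]
    idx = np.sort(rng.choice(t, size=max(1, int(t * keep)), replace=False))
    return seq[idx]

def consistency_loss(z1, z2):
    """Negative cosine similarity between the embeddings of two
    positive views of the same sequence; minimising it enforces
    consistency among positives with no negative samples involved."""
    z1 = z1 / np.linalg.norm(z1)
    z2 = z2 / np.linalg.norm(z2)
    return -float(np.dot(z1, z2))
```

In a full pipeline, `temporal_mask` (and an analogous spatial masking of joints) would produce the two views, an encoder would map each view to an embedding, and `consistency_loss` would be minimised over the pair.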
URI: https://bura.brunel.ac.uk/handle/2438/26708
DOI: https://doi.org/10.1109/tcsvt.2023.3284493
ISSN: 1051-8215
Other Identifiers: ORCiD: Qicong Wang https://orcid.org/0000-0001-7324-0433
ORCiD: Yehu Shen https://orcid.org/0000-0002-8917-719X
ORCiD: Hongying Meng https://orcid.org/0000-0002-8836-1382
Appears in Collections: Dept of Electronic and Electrical Engineering Research Papers

Files in This Item:
File: FullText.pdf
Size: 21.75 MB
Format: Adobe PDF
Description: Copyright © 2023 Institute of Electrical and Electronics Engineers (IEEE). Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works, by sending a request to pubs-permissions@ieee.org. For more information, see: https://journals.ieeeauthorcenter.ieee.org/become-an-ieee-journal-author/publishing-ethics/guidelinesand-policies/post-publication-policies/
View/Open


Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.