Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/26708
Full metadata record
DC Field | Value | Language
dc.contributor.author | Jin, Z | -
dc.contributor.author | Wang, Y | -
dc.contributor.author | Wang, Q | -
dc.contributor.author | Shen, Y | -
dc.contributor.author | Meng, H | -
dc.date.accessioned | 2023-06-21T12:45:41Z | -
dc.date.available | 2023-06-21T12:45:41Z | -
dc.date.issued | 2023-06-09 | -
dc.identifier | ORCiD: Qicong Wang https://orcid.org/0000-0001-7324-0433 | -
dc.identifier | ORCiD: Yehu Shen https://orcid.org/0000-0002-8917-719X | -
dc.identifier | ORCiD: Hongying Meng https://orcid.org/0000-0002-8836-1382 | -
dc.identifier.citation | Jin, Z. et al. (2024) 'SSRL: Self-supervised Spatial-temporal Representation Learning for 3D Action recognition', IEEE Transactions on Circuits and Systems for Video Technology, 34 (1), pp. 274-285. doi: 10.1109/tcsvt.2023.3284493. | en_US
dc.identifier.issn | 1051-8215 | -
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/26708 | -
dc.description.abstract | For 3D action recognition, the main challenge is to extract long-range semantic information in both the temporal and spatial dimensions. In this paper, in order to better excavate long-range semantic information from a large number of unlabelled skeleton sequences, we propose Self-supervised Spatial-temporal Representation Learning (SSRL), a contrastive learning framework for learning skeleton representations. SSRL consists of two novel inference tasks that enable the network to learn global semantic information in the temporal and spatial dimensions, respectively. The temporal inference task learns the temporal persistence of human actions through temporally incomplete skeleton sequences, and the spatial inference task learns the spatially coordinated nature of human actions through spatially partial skeleton sequences. We design two transformation modules to efficiently realize these two tasks while fitting the encoder network. To avoid the difficulty of constructing and maintaining high-quality negative samples, the proposed framework learns by maintaining consistency among positive samples, without the need for any negative samples. Experiments demonstrate that the proposed method achieves better results than state-of-the-art methods under a variety of evaluation protocols on the NTU RGB+D 60, PKU-MMD and NTU RGB+D 120 datasets. | -
dc.description.sponsorship | Shenzhen Science and Technology Project (Grant Number: JCYJ20200109143035495); Natural Science Foundation of Fujian Province (funder DOI: 10.13039/501100003392; Grant Number: 2023J01003); National Natural Science Foundation of China (funder DOI: 10.13039/501100001809; Grant Number: 51975394). | en_US
dc.format.extent | 274 - 285 | -
dc.format.medium | Print-Electronic | -
dc.language.iso | en_US | en_US
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | en_US
dc.rights | Copyright © 2023 Institute of Electrical and Electronics Engineers (IEEE). Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works by sending a request to pubs-permissions@ieee.org. For more information, see: https://journals.ieeeauthorcenter.ieee.org/become-an-ieee-journal-author/publishing-ethics/guidelinesand-policies/post-publication-policies/ | -
dc.rights.uri | https://journals.ieeeauthorcenter.ieee.org/become-an-ieee-journal-author/publishing-ethics/guidelinesand-policies/post-publication-policies/ | -
dc.subject | self-supervised learning | en_US
dc.subject | contrastive learning | en_US
dc.subject | skeleton action recognition | en_US
dc.title | SSRL: Self-supervised Spatial-temporal Representation Learning for 3D Action recognition | en_US
dc.type | Article | en_US
dc.identifier.doi | https://doi.org/10.1109/tcsvt.2023.3284493 | -
dc.relation.isPartOf | IEEE Transactions on Circuits and Systems for Video Technology | -
pubs.issue | 1 | -
pubs.publication-status | Published | -
pubs.volume | 34 | -
dc.identifier.eissn | 1558-2205 | -
dcterms.dateAccepted | 2023-06-06 | -
dc.rights.holder | Institute of Electrical and Electronics Engineers (IEEE) | -
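
Illustrative sketch: the abstract describes two skeleton transformations (one producing a temporally incomplete view, one a spatially partial view) trained with a negative-free consistency objective between positive pairs. Below is a minimal PyTorch-style sketch of that general idea; the masking functions, toy encoder, predictor and loss are assumptions made for illustration and do not reproduce the authors' actual transformation modules, backbone or training setup.

# Illustrative only: a negative-free consistency loss over two skeleton "views",
# one temporally incomplete and one spatially partial, loosely mirroring the two
# inference tasks described in the abstract. All names, shapes and module choices
# are assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def temporal_mask(x: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Keep only a contiguous block of frames, making the sequence temporally incomplete.
    x: (batch, frames, joints, coords)."""
    t = x.size(1)
    keep = max(1, int(t * keep_ratio))
    start = torch.randint(0, t - keep + 1, (1,)).item()
    out = torch.zeros_like(x)
    out[:, start:start + keep] = x[:, start:start + keep]
    return out

def spatial_mask(x: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Keep only a random subset of joints, making the sequence spatially partial."""
    v = x.size(2)
    keep = max(1, int(v * keep_ratio))
    idx = torch.randperm(v)[:keep]
    out = torch.zeros_like(x)
    out[:, :, idx] = x[:, :, idx]
    return out

class Encoder(nn.Module):
    """Toy stand-in for the skeleton encoder (the paper's backbone is not specified here)."""
    def __init__(self, in_dim: int, hid: int = 256, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, hid), nn.ReLU(),
                                 nn.Linear(hid, out_dim))
    def forward(self, x):
        return self.net(x)

def consistency_loss(p: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Negative-free objective: pull the online prediction p toward the stop-gradient
    target embedding z of the other view; no negative samples are used."""
    p = F.normalize(p, dim=-1)
    z = F.normalize(z.detach(), dim=-1)
    return (2 - 2 * (p * z).sum(dim=-1)).mean()

if __name__ == "__main__":
    B, T, V, C = 8, 64, 25, 3                  # batch, frames, joints, coords (NTU-like)
    x = torch.randn(B, T, V, C)                # a batch of unlabelled skeleton sequences
    online = Encoder(T * V * C)
    target = Encoder(T * V * C)                # e.g. a momentum copy of the online encoder (assumption)
    predictor = nn.Linear(128, 128)

    view_t = temporal_mask(x)                  # temporally incomplete view
    view_s = spatial_mask(x)                   # spatially partial view
    loss = consistency_loss(predictor(online(view_t)), target(view_s))
    loss.backward()
    print(f"consistency loss: {loss.item():.4f}")

The stop-gradient on the target embedding is one common way such positive-only objectives avoid representational collapse; whether SSRL uses a momentum target or some other asymmetry is not stated in this record.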
Appears in Collections: Dept of Electronic and Electrical Engineering Research Papers

Files in This Item:
File | Description | Size | Format
FullText.pdf | Copyright © 2023 Institute of Electrical and Electronics Engineers (IEEE); see the dc.rights statement in the metadata record above. | 21.75 MB | Adobe PDF


Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.