Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/33270
Full metadata record
DC Field | Value | Language
dc.contributor.author | Yang, J | -
dc.contributor.author | Liu, H | -
dc.contributor.author | Shi, L | -
dc.contributor.author | Gan, L | -
dc.contributor.author | Nishizaki, H | -
dc.contributor.author | Leow, CS | -
dc.coverage.spatial | Singapore | -
dc.date.accessioned | 2026-05-13T08:46:03Z | -
dc.date.available | 2026-05-13T08:46:03Z | -
dc.date.issued | 2025-10-22 | -
dc.identifier | ORCiD: Lu Gan https://orcid.org/0000-0003-1056-7660 | -
dc.identifier.citation | Yang, J. et al. (2025) 'A Semi-Supervised Acoustic Scene Classification Network Based on Multi-Modal Information Fusion', 2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Singapore, 22–24 October, pp. 177–181. doi: 10.1109/apsipaasc65261.2025.11249027. | en-US
dc.identifier.isbn | 979-8-3315-7206-8 | -
dc.identifier.isbn | 979-8-3315-7207-5 | -
dc.identifier.issn | 2640-009X | -
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/33270 | -
dc.description | Code Availability: We provide the code and checkpoint at https://github.com/JunkangYang/ALPS-ASC. | en-US
dc.description.abstract | This paper presents our semi-supervised acoustic scene classification (ASC) framework submitted to the APSIPA ASC 2025 Grand Challenge, which focuses on city- and time-aware ASC under limited labeled data. Our approach leverages a multi-modal network architecture that fuses audio mel-spectrograms with spatiotemporal metadata (city identity and timestamps) to capture dynamic acoustic scene variations across urban environments. The model employs a residual-based CNN with attention mechanisms for robust feature extraction, enhanced by multi-modal fusion. To address label scarcity, we adopt a staged semi-supervised pipeline: pre-training on the TAU Urban Acoustic Scenes 2020 and CochlScene datasets with SpecAugment and mixup augmentations, followed by iterative fine-tuning on the challenge data with pseudo-labeling to expand the training set, which yields further performance gains. Experimental results on our validation data demonstrate the efficacy of our city/time-aware design and semi-supervised strategies. [An illustrative sketch of the fusion architecture and the pseudo-labeling step follows this metadata record.] | en-US
dc.format.extent | 177–181 | -
dc.format.medium | Print-Electronic | -
dc.language | English | en-US
dc.language.iso | eng | en-US
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | en-US
dc.rights | Creative Commons Attribution 4.0 International | -
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | -
dc.source | 2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) | -
dc.subject | training | en-US
dc.subject | scene classification | en-US
dc.subject | urban areas | en-US
dc.subject | pipelines | en-US
dc.subject | network architecture | en-US
dc.subject | metadata | en-US
dc.subject | acoustics | en-US
dc.subject | spatiotemporal phenomena | en-US
dc.subject | reliability | en-US
dc.subject | iterative methods | en-US
dc.title | A Semi-Supervised Acoustic Scene Classification Network Based on Multi-Modal Information Fusion | en-US
dc.type | Conference Paper | en-US
dc.date.dateAccepted | 2025-09-05 | -
dc.identifier.doi | https://doi.org/10.1109/apsipaasc65261.2025.11249027 | -
dc.relation.isPartOf | 2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) | -
pubs.finish-date | 2025-10-24 | -
pubs.publication-status | Published | -
pubs.start-date | 2025-10-22 | -
dc.identifier.eissn | 2640-0103 | -
dcterms.dateAccepted | 2025-09-05 | -
dc.rights.holder | The Author(s) | -
dc.rights.holder | https://creativecommons.org/licenses/by/4.0/legalcode.en | -
dc.contributor.orcid | Gan, Lu [0000-0003-1056-7660] | -
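
Illustrative sketch: the abstract above describes a residual CNN with attention that fuses mel-spectrogram features with city and time metadata. The PyTorch code below is a minimal hypothetical reconstruction for illustration only, not the authors' released implementation (see the GitHub link in dc.description); the names FusionASC, SEAttention, and ResidualBlock, all layer widths, the embedding size, and the use of squeeze-and-excitation-style channel attention with late concatenation of the metadata embeddings are assumptions.

import torch
import torch.nn as nn

class SEAttention(nn.Module):
    # Squeeze-and-excitation-style channel attention (assumed stand-in for
    # the paper's unspecified attention mechanism).
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.gate(x.mean(dim=(2, 3)))   # (B, C) per-channel weights
        return x * w[:, :, None, None]      # reweight the feature maps

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels),
        )
        self.att = SEAttention(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.att(self.body(x)))  # attention-gated residual path

class FusionASC(nn.Module):
    # Fuses a mel-spectrogram CNN branch with city/time embeddings by
    # concatenation at the penultimate layer (the simplest late-fusion
    # choice; the paper's actual fusion point may differ).
    def __init__(self, n_cities, n_time_bins, n_classes, emb_dim=16):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),
            ResidualBlock(64),
            nn.AdaptiveAvgPool2d(1),        # pool time/frequency to one 64-d vector
        )
        self.city_emb = nn.Embedding(n_cities, emb_dim)
        self.time_emb = nn.Embedding(n_time_bins, emb_dim)
        self.head = nn.Sequential(
            nn.Linear(64 + 2 * emb_dim, 128), nn.ReLU(inplace=True),
            nn.Linear(128, n_classes),
        )

    def forward(self, mel, city_id, time_id):
        # mel: (B, 1, n_mels, frames); city_id, time_id: (B,) integer codes
        audio = self.cnn(mel).flatten(1)
        meta = torch.cat([self.city_emb(city_id), self.time_emb(time_id)], dim=1)
        return self.head(torch.cat([audio, meta], dim=1))  # class logits

The pseudo-labeling stage can likewise be sketched as a confidence-thresholded pass over the unlabeled challenge data; the pseudo_label function, its signature, and the 0.9 threshold below are illustrative assumptions, not values taken from the paper.

import torch
import torch.nn.functional as F

@torch.no_grad()
def pseudo_label(model, unlabeled_loader, device, threshold=0.9):
    # Keep only clips whose top class probability clears the threshold; the
    # accepted (input, pseudo-label) tuples are merged into the labeled pool
    # before the next fine-tuning round.
    model.eval()
    accepted = []
    for mel, city_id, time_id in unlabeled_loader:
        logits = model(mel.to(device), city_id.to(device), time_id.to(device))
        conf, label = F.softmax(logits, dim=1).max(dim=1)
        for i in (conf >= threshold).nonzero(as_tuple=True)[0]:
            accepted.append((mel[i], city_id[i], time_id[i], label[i].item()))
    return accepted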
Appears in Collections: Department of Electronic and Electrical Engineering Research Papers

Files in This Item:
File | Description | Size | Format
FullText.pdf | For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising. | 522.21 kB | Adobe PDF


This item is licensed under a Creative Commons Attribution 4.0 International License.