Cross Domain Optimization for Speech Enhancement: Parallel or Cascade?

Wan, L; Liu, H; Shi, L; Zhou, Y; Gan, L

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/30156

Full metadata record

DC Field	Value	Language
dc.contributor.author	Wan, L	-
dc.contributor.author	Liu, H	-
dc.contributor.author	Shi, L	-
dc.contributor.author	Zhou, Y	-
dc.contributor.author	Gan, L	-
dc.date.accessioned	2024-11-17T17:00:19Z	-
dc.date.available	2024-11-17T17:00:19Z	-
dc.date.issued	2024-09-26	-
dc.identifier	ORCiD: Hongqing Liu https://orcid.org/0000-0003-4839-1525	-
dc.identifier	ORCiD: Liming Shi https://orcid.org/0000-0003-4129-0668	-
dc.identifier	ORCiD: Yi Zhou https://orcid.org/0000-0001-7445-226X	-
dc.identifier	ORCiD: Lu Gan https://orcid.org/0000-0003-1056-7660	-
dc.identifier.citation	Wan, L., Liu, H., Shi, L., Zhou, Y. and Gan, L. (2024) 'Cross Domain Optimization for Speech Enhancement: Parallel or Cascade?', IEEE/ACM Transactions on Audio, Speech, and Language Processing, 32, pp. 4328-4341. doi: 10.1109/TASLP.2024.3468026.	en_US
dc.identifier.issn	2329-9290	-
dc.identifier.uri	https://bura.brunel.ac.uk/handle/2438/30156	-
dc.description	We provide a demo page containing enhanced audio clips from different models at https://wanliangdaxia.github.io/ .	-
dc.description.abstract	This paper introduces five novel deep-learning architectures for speech enhancement. Existing methods typically operate on time-domain or time-frequency representations, or on a hybrid of the two. Recognizing the unique contributions of each domain to feature extraction and model design, this study investigates the integration of waveform and complex spectrogram models through cross-domain fusion to enhance speech feature learning and noise reduction, thereby improving speech quality. We examine both cascaded and parallel configurations of waveform and complex spectrogram models to assess their effectiveness in speech enhancement. Additionally, we employ an orthogonal projection-based error decomposition technique and control the inputs of individual sub-models to analyze the factors affecting speech quality. The network is trained by optimizing three specific loss functions applied across all sub-models. Our experiments on the DNS Challenge (ICASSP 2021) dataset show that the proposed models surpass existing benchmarks in speech enhancement, offering superior speech quality and intelligibility. These results highlight the efficacy of our cross-domain fusion strategy.	en_US
dc.format.extent	4328 - 4341	-
dc.format.medium	Print-Electronic	-
dc.language	english	-
dc.language.iso	en_US	en_US
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.relation.uri	https://wanliangdaxia.github.io/	-
dc.rights	Copyright © 2024 Institute of Electrical and Electronics Engineers (IEEE). Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. See: https://journals.ieeeauthorcenter.ieee.org/become-an-ieee-journal-author/publishing-ethics/guidelines-and-policies/post-publication-policies/	-
dc.rights.uri	https://journals.ieeeauthorcenter.ieee.org/become-an-ieee-journal-author/publishing-ethics/guidelines-and-policies/post-publication-policies/	-
dc.subject	speech enhancement	en_US
dc.subject	waveform	en_US
dc.subject	time-frequency	en_US
dc.subject	complex domain	en_US
dc.subject	cross-domain speech	en_US
dc.title	Cross Domain Optimization for Speech Enhancement: Parallel or Cascade?	en_US
dc.type	Article	en_US
dc.date.dateAccepted	2024-09-16	-
dc.identifier.doi	https://doi.org/10.1109/TASLP.2024.3468026	-
dc.relation.isPartOf	IEEE/ACM Transactions on Audio, Speech, and Language Processing	-
pubs.publication-status	Published	-
pubs.volume	32	-
dc.identifier.eissn	2329-9304	-
dc.rights.holder	Institute of Electrical and Electronics Engineers (IEEE)	-
Appears in Collections:	Dept of Electronic and Electrical Engineering Research Papers

Files in This Item:

File	Description	Size	Format
FullText.pdf	Copyright © 2024 Institute of Electrical and Electronics Engineers (IEEE).	14.42 MB	Adobe PDF
