Visual Language Model based Cross-modal Semantic Communication Systems

Jiang, F; Tang, C; Dong, L; Wang, K; Yang, K; Pan, C

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/30985

Full metadata record

DC Field	Value	Language
dc.contributor.author	Jiang, F	-
dc.contributor.author	Tang, C	-
dc.contributor.author	Dong, L	-
dc.contributor.author	Wang, K	-
dc.contributor.author	Yang, K	-
dc.contributor.author	Pan, C	-
dc.date.accessioned	2025-03-28T08:15:17Z	-
dc.date.available	2025-03-28T08:15:17Z	-
dc.date.issued	2025-03-04	-
dc.identifier	ORCiD: Feibo Jiang https://orcid.org/0000-0002-0235-0253	-
dc.identifier	ORCiD: Li Dong https://orcid.org/0000-0002-0127-8480	-
dc.identifier	ORCiD: Kezhi Wang https://orcid.org/0000-0001-8602-0800	-
dc.identifier	ORCiD: Kun Yang https://orcid.org/0000-0002-6782-6689	-
dc.identifier	ORCiD: Cunhua Pan https://orcid.org/0000-0001-5286-7958	-
dc.identifier.citation	Jiang, F. et al. (2025) 'Visual Language Model based Cross-modal Semantic Communication Systems', IEEE Transactions on Wireless Communications, 24 (5), pp. 3937 - 3948. doi: 10.1109/TWC.2025.3539526.	en_US
dc.identifier.issn	1536-1276	-
dc.identifier.uri	https://bura.brunel.ac.uk/handle/2438/30985	-
dc.description.abstract	Semantic Communication (SC) has emerged as a novel communication paradigm in recent years. Nevertheless, extant Image Semantic Communication (ISC) systems face several challenges in dynamic environments, including low information density, catastrophic forgetting, and uncertain Signal-to-Noise Ratio (SNR). To address these challenges, we propose a novel Vision-Language Model-based Cross-modal Semantic Communication (VLM-CSC) system. The VLM-CSC comprises three novel components: (1) Cross-modal Knowledge Base (CKB) is used to extract high-density textual semantics from the semantically sparse image at the transmitter and reconstruct the original image based on textual semantics at the receiver. The transmission of high-density semantics contributes to alleviating bandwidth pressure. (2) Memory-assisted Encoder and Decoder (MED) employ a hybrid long/short-term memory mechanism, enabling the semantic encoder and decoder to overcome catastrophic forgetting in dynamic environments when there is a drift in the distribution of semantic features. (3) Noise Attention Module (NAM) employs attention mechanisms to adaptively adjust the semantic coding and the channel coding based on SNR, ensuring the robustness of the CSC system. The experimental simulations validate the effectiveness, adaptability, and robustness of the CSC system.	en_US
dc.description.sponsorship	10.13039/501100001809-National Natural Science Foundation of China (Grant Numbers: 41904127 and 62132004), in part by the Hunan Provincial Natural Science Foundation of China under Grant 2024JJ5270, in part by the Open Project of Xiangjiang Laboratory under Grant 22XJ03011, in part by the Scientific Research Fund of the Hunan Provincial Education Department under Grant 22B0663, in part by the Changsha Natural Science Foundation under Grants kq2402098 and kq2402162, in part by the Jiangsu Major Project on Basic Researches under Grant BK20243059 and Gusu Innovation Project for under Grant ZXL2024360.	en_US
dc.format.extent	3937 - 3948	-
dc.format.medium	Print-Electronic	-
dc.language	English	-
dc.language.iso	en_US	en_US
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.rights	Copyright © 2025 Institute of Electrical and Electronics Engineers (IEEE). Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. See: https://journals.ieeeauthorcenter.ieee.org/become-an-ieee-journal-author/publishing-ethics/guidelines-and-policies/post-publication-policies/.	-
dc.rights.uri	https://journals.ieeeauthorcenter.ieee.org/become-an-ieee-journal-author/publishing-ethics/guidelines-and-policies/post-publication-policies/	-
dc.subject	semantic communication	en_US
dc.subject	knowledge base	en_US
dc.subject	vision language model	en_US
dc.subject	large language model	en_US
dc.subject	continual learning	en_US
dc.title	Visual Language Model based Cross-modal Semantic Communication Systems	en_US
dc.type	Article	en_US
dc.date.dateAccepted	2025-01-23	-
dc.identifier.doi	https://doi.org/10.1109/TWC.2025.3539526	-
dc.relation.isPartOf	IEEE Transactions on Wireless Communications	-
pubs.issue	5	-
pubs.publication-status	Published	-
pubs.volume	24	-
dc.identifier.eissn	1558-2248	-
dcterms.dateAccepted	2025-01-23	-
dc.rights.holder	Institute of Electrical and Electronics Engineers (IEEE)	-
Appears in Collections:	Dept of Computer Science Research Papers

Files in This Item:

File	Description	Size	Format
FullText.pdf	Copyright © 2025 Institute of Electrical and Electronics Engineers (IEEE). Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. See: https://journals.ieeeauthorcenter.ieee.org/become-an-ieee-journal-author/publishing-ethics/guidelines-and-policies/post-publication-policies/.	4.62 MB	Adobe PDF
