Large AI Model Empowered Multimodal Semantic Communications

Jiang, F; Peng, Y; Dong, L; Wang, K; Yang, K; Pan, C; You, X

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/27173

Full metadata record

DC Field	Value	Language
dc.contributor.author	Jiang, F	-
dc.contributor.author	Peng, Y	-
dc.contributor.author	Dong, L	-
dc.contributor.author	Wang, K	-
dc.contributor.author	Yang, K	-
dc.contributor.author	Pan, C	-
dc.contributor.author	You, X	-
dc.date.accessioned	2023-09-13T09:34:18Z	-
dc.date.available	2023-09-13T09:34:18Z	-
dc.date.issued	2024-09-09	-
dc.identifier	ORCiD: Kezhi Wang https://orcid.org/0000-0001-8602-0800	-
dc.identifier	arXiv:2309.01249v2 [cs.AI]	-
dc.identifier.citation	Jiang, F. et al. (2024) 'Large AI Model Empowered Multimodal Semantic Communications', IEEE Communications Magazine, 63 (1), pp. 76 - 82. doi: 10.1109/MCOM.001.2300575.	en_US
dc.identifier.issn	0163-6804	-
dc.identifier.uri	https://bura.brunel.ac.uk/handle/2438/27173	-
dc.description.abstract	Multimodal signals, including text, audio, image, and video, can be integrated into semantic communication (SC) systems to provide an immersive experience with low latency and high quality at the semantic level. However, multimodal SC faces several challenges, including data heterogeneity, semantic ambiguity, and signal distortion during transmission. Recent advancements in large AI models, particularly the multimodal language model (MLM) and the large language model (LLM), offer potential solutions to these issues. To this end, we propose a large AI model-based multimodal SC (LAM-MSC) framework. First, we present MLM-based multimodal alignment (MMA), which uses the MLM to transform between multimodal and unimodal data while preserving semantic consistency. Then, we propose a personalized LLM-based knowledge base (LKB), which allows users to perform personalized semantic extraction or recovery through the LLM and thereby effectively addresses semantic ambiguity. Next, we apply conditional generative adversarial network-based channel estimation (CGE) to estimate the wireless channel state information, which effectively mitigates the impact of fading channels in SC. Finally, we conduct simulations that demonstrate the superior performance of the LAM-MSC framework.	en_US
dc.description.sponsorship	10.13039/501100001809-National Natural Science Foundation of China (Grant Numbers: 41604117, 41904127, 62132004). This work was supported in part by the National Natural Science Foundation of China under Grants 41604117, 41904127, and 62132004, in part by the Hunan Provincial Natural Science Foundation of China under Grant 2024JJ5270, in part by the Open Project of Xiangjiang Laboratory under Grant 22XJ03011, in part by the Scientific Research Fund of Hunan Provincial Education Department under Grant 22B0663, and in part by the Changsha Natural Science Foundation under Grants kq2402098 and kq2402162.	-
dc.format.extent	76 - 82	-
dc.format.medium	Print-Electronic	-
dc.language.iso	en_US	en_US
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.relation.uri	https://arxiv.org/abs/2309.01249v1	-
dc.rights	Copyright © 2024 The Author(s). arXiv.org perpetual, non-exclusive license 1.0 (https://arxiv.org/licenses/nonexclusive-distrib/1.0/). This license gives limited rights to arXiv to distribute the article, and also limits re-use of any type from other entities or individuals.	-
dc.rights.uri	https://arxiv.org/licenses/nonexclusive-distrib/1.0/	-
dc.subject	semantic communication	en_US
dc.subject	large AI models	en_US
dc.subject	LLM	en_US
dc.subject	MLM	en_US
dc.subject	knowledge base	-
dc.subject	artificial intelligence (cs.AI)	-
dc.subject	computation and language (cs.CL)	-
dc.subject	machine learning (cs.LG)	-
dc.title	Large AI Model Empowered Multimodal Semantic Communications	en_US
dc.type	Article	en_US
dc.identifier.doi	https://doi.org/10.1109/MCOM.001.2300575	-
dc.relation.isPartOf	IEEE Communications Magazine	-
pubs.issue	1	-
pubs.notes	Comments: Accepted by IEEE CM	-
pubs.volume	63	-
dc.identifier.eissn	1558-1896	-
dc.rights.license	https://arxiv.org/licenses/nonexclusive-distrib/1.0/	-
dc.rights.holder	The Author(s)	-
Appears in Collections:	Dept of Computer Science Research Papers

Files in This Item:

File	Description	Size	Format
FullText.pdf	The article version on this institutional repository is available at arXiv:2309.01249v2 [cs.AI], https://arxiv.org/abs/2309.01249. Comments: Accepted by IEEE CM. [v2] Sun, 4 Aug 2024 12:34:29 UTC (1,779 KB). Copyright © 2024 The Author(s). arXiv.org perpetual, non-exclusive license 1.0 (https://arxiv.org/licenses/nonexclusive-distrib/1.0/). This license gives limited rights to arXiv to distribute the article, and also limits re-use of any type from other entities or individuals.	1.84 MB	Adobe PDF