Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/30403
Full metadata record
DC Field | Value | Language
dc.contributor.author | Zhu, C | -
dc.contributor.author | Zhu, J | -
dc.contributor.author | Si, W | -
dc.contributor.author | Wang, X | -
dc.contributor.author | Wang, F | -
dc.date.accessioned | 2025-01-04T10:38:36Z | -
dc.date.available | 2025-01-04T10:38:36Z | -
dc.date.issued | 2024-11-12 | -
dc.identifier | ORCiD: Chenyang Zhu https://orcid.org/0000-0002-2145-0559 | -
dc.identifier | ORCiD: Fang Wang https://orcid.org/0000-0003-1987-9150 | -
dc.identifier | 112703 | -
dc.identifier.citation | Zhu, C. et al. (2024) 'Multi-agent reinforcement learning with synchronized and decomposed reward automaton synthesized from reactive temporal logic', Knowledge-Based Systems, 306, 112703, pp. 1 - 16. doi: 10.1016/j.knosys.2024.112703. | en_US
dc.identifier.issn | 0950-7051 | -
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/30403 | -
dc.description | Data availability: No data was used for the research described in the article. | en_US
dc.description.abstract | Multi-agent systems (MAS) consist of multiple autonomous agents interacting to achieve collective objectives. Multi-agent reinforcement learning (MARL) enhances these systems by enabling agents to learn optimal behaviors through interaction, thus improving their coordination in dynamic environments. However, MARL faces significant challenges in adapting to complex dependencies on past states and actions, which are not adequately represented by the current state alone in reactive systems. This paper addresses these challenges by considering MAS operating under task specifications formulated as Generalized Reactivity of rank 1 (GR(1)). Strategies synthesized from these specifications are used as a priori knowledge to guide the learning. To tackle the difficulties of handling non-Markovian tasks in reactive systems, we propose a novel synchronized decentralized training paradigm that guides agents to learn within the MARL framework using a reward structure constructed from decomposed synthesized strategies of GR(1). We first formalize the synthesis of GR(1) strategies as a reachability problem of winning states of the system. Subsequently, we develop a decomposition mechanism that constructs individual reward structures for decentralized MARL, incorporating potential values calculated through value iteration. Theoretical proofs are provided to verify that the safety and liveness properties are preserved. We evaluate our approach against other state-of-the-art methods under various GR(1) specifications and scenario maps, demonstrating superior learning efficacy and optimal rewards per episode. Additionally, we show that the decentralized training paradigm outperforms the centralized training paradigm. The value iteration strategy used to calculate potential values for the reward structure is compared against two other strategies, showcasing its advantages. | en_US
dc.description.sponsorship | This work was supported by the National Natural Science Foundation of China (No. 62202067), the Royal Society (IEC_NSFC_233444), the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (No. 22KJB520012), the Postgraduate Research and Practice Innovation Project of Jiangsu Province (No. KYCX24_3228), and the Changzhou Sci & Tech Program (No. CJ20220241). | en_US
dc.format.extent | 1 - 16 | -
dc.format.medium | Print-Electronic | -
dc.language | English | -
dc.language.iso | en_US | en_US
dc.publisher | Elsevier | en_US
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | -
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | -
dc.subject | multi-agent reinforcement learning | en_US
dc.subject | autonomous reasoning | en_US
dc.subject | swarm intelligence | en_US
dc.title | Multi-agent reinforcement learning with synchronized and decomposed reward automaton synthesized from reactive temporal logic | en_US
dc.type | Article | en_US
dc.date.dateAccepted | 2024-11-02 | -
dc.identifier.doi | https://doi.org/10.1016/j.knosys.2024.112703 | -
dc.relation.isPartOf | Knowledge-Based Systems | -
pubs.publication-status | Published | -
pubs.volume | 306 | -
dc.identifier.eissn | 1872-7409 | -
dc.rights.license | https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode.en | -
dc.rights.holder | Elsevier B.V. | -
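
Illustration of the approach described in the abstract: the paper constructs per-agent reward structures from decomposed GR(1) strategies, with potential values computed by value iteration. The Python sketch below shows the general pattern only, under stated assumptions — a small, hypothetical per-agent reward automaton with accepting states marking liveness progress, value iteration to obtain potentials Phi, and potential-based shaping r + GAMMA*Phi(u') - Phi(u). All names (RewardAutomaton, potentials, shaped_reward) and the reward convention are illustrative assumptions, not the authors' implementation.

    GAMMA = 0.95  # discount factor shared by value iteration and shaping

    class RewardAutomaton:
        """Hypothetical per-agent reward automaton decomposed from a GR(1) strategy."""
        def __init__(self, states, delta, accepting):
            self.states = states        # automaton states
            self.delta = delta          # dict: (state, event label) -> next state
            self.accepting = accepting  # states marking progress toward liveness goals

        def base_reward(self, u_next):
            # Assumed convention: reward 1.0 whenever the automaton is in an accepting state.
            return 1.0 if u_next in self.accepting else 0.0

    def potentials(ra, n_iters=200):
        """Value iteration over automaton states: Phi(u) = max over labels of r + GAMMA*Phi(u')."""
        phi = {u: 0.0 for u in ra.states}
        for _ in range(n_iters):
            for u in ra.states:
                succs = [u2 for (s, _), u2 in ra.delta.items() if s == u]
                if succs:
                    phi[u] = max(ra.base_reward(u2) + GAMMA * phi[u2] for u2 in succs)
        return phi

    def shaped_reward(ra, phi, u, label):
        """Potential-based shaping r + GAMMA*Phi(u') - Phi(u)."""
        u_next = ra.delta.get((u, label), u)  # treat undefined labels as a self-loop
        r = ra.base_reward(u_next)
        return r + GAMMA * phi[u_next] - phi[u], u_next

    # Tiny usage example: a single liveness goal, reached via the "goal" event.
    ra = RewardAutomaton(
        states=["u0", "u1"],
        delta={("u0", "goal"): "u1", ("u0", "step"): "u0", ("u1", "step"): "u1"},
        accepting={"u1"},
    )
    phi = potentials(ra)
    shaped_reward(ra, phi, "u0", "step")  # ~-1.0: making no progress is penalized
    shaped_reward(ra, phi, "u0", "goal")  # ~0.0: moving toward the accepting state

In a synchronized decentralized setup of the kind the abstract outlines, each agent would advance its own automaton on the event labels observed at every joint environment step and add the shaped term to its reward inside an otherwise standard decentralized learner. Potential-based shaping of this form is known not to change the set of optimal policies (Ng et al., 1999), which is consistent in spirit with the preservation of safety and liveness the paper proves for its own construction; consult the paper for the exact guarantees.
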
Appears in Collections: Dept of Computer Science Research Papers

Files in This Item:
File | Description | Size | Format
FullText.pdf | Embargoed until 12 November 2025. Copyright © 2024 Elsevier B.V. All rights reserved. This is the accepted manuscript version of an article which has been published in final form at https://doi.org/10.1016/j.knosys.2024.112703, archived on this repository under a Creative Commons CC BY-NC-ND attribution licence (https://creativecommons.org/licenses/by-nc-nd/4.0/). | 3.46 MB | Adobe PDF


This item is licensed under a Creative Commons License.