Please use this identifier to cite or link to this item:
http://bura.brunel.ac.uk/handle/2438/30647
Title: | Multiagent deep reinforcement learning-based cooperative optimal operation with strong scalability for residential microgrid clusters |
Authors: | Wang, C Wang, M Wang, A Zhang, X Zhang, J Ma, H Yang, N Zhao, Z Lai, CS Lai, LL |
Keywords: | optimal operation;residential microgrid (RM);deep reinforcement learning (DRL);multi-agent systems |
Issue Date: | 11-Dec-2024 |
Publisher: | Elsevier |
Citation: | Wang, C. et al. (2024) 'Multiagent deep reinforcement learning-based cooperative optimal operation with strong scalability for residential microgrid clusters', Energy, 314, 134165, pp. 1 - 14. doi: 10.1016/j.energy.2024.134165. |
Abstract: | With the rapid development of smart home technology, residential microgrid (RM) clusters have become an important way to utilize the demand-side resources of large-scale housing. However, existing RM cluster optimization methods suffer from several key problems, such as difficulty in adapting to locally observable environments and poor privacy and scalability. Therefore, this paper proposes a multi-agent deep reinforcement learning (MADRL)-based RM cluster optimization operation method. First, with the aim of minimizing the energy cost of each residence while satisfying the comfort level of residents and avoiding transformer overload, the optimization scheduling problem of an RM cluster is described as a Markov game with an unknown state transition probability function. Then, a novel MADRL method is proposed to determine the optimal operation strategy of multiple RMs in this game paradigm. Each agent in the proposed method contains a collective strategy model and an independent learner. The collective strategy model can simulate the energy consumption of the other RMs in the system and reflect their operating behavior. In addition, an independent learner based on a soft actor-critic (SAC) framework is used to learn the optimal scheduling strategy interactively with the environment. The proposed method has a completely decentralized and scalable structure, can handle continuous high-dimensional state and action spaces, and requires only local observations and approximations during training. Finally, a numerical example is given to verify that the proposed method can not only learn a stable cooperative energy management strategy but can also be extended to large-scale RM cluster problems. This gives it strong scalability and high potential for practical application. |
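The abstract describes each decentralized agent as a collective strategy model (which predicts the behaviour of the other RMs) paired with a SAC-based independent learner whose reward balances energy cost, resident comfort, and transformer overload. The following is a minimal, hypothetical Python sketch of that agent structure only; all class names, observation fields, weights, and the stub policy are illustrative assumptions, not the authors' implementation, and a trained SAC actor-critic would replace the stub `act` method.

```python
import numpy as np

class CollectiveStrategyModel:
    """Toy stand-in for the collective strategy model: predicts the
    aggregate consumption of the other RMs from a short local history."""
    def __init__(self, window=4):
        self.window = window
        self.history = []

    def update(self, observed_aggregate_kw):
        # Keep only the most recent `window` observations.
        self.history = (self.history + [observed_aggregate_kw])[-self.window:]

    def predict(self):
        # Simple moving-average prediction as a placeholder.
        return float(np.mean(self.history)) if self.history else 0.0


class ResidentialMicrogridAgent:
    """Independent-learner skeleton. The policy is a stub; a SAC learner
    would supply the actor, critics, and entropy-regularized updates."""
    def __init__(self, transformer_limit_kw=50.0,
                 comfort_weight=0.5, overload_weight=5.0):
        self.collective = CollectiveStrategyModel()
        self.transformer_limit_kw = transformer_limit_kw
        self.comfort_weight = comfort_weight
        self.overload_weight = overload_weight

    def act(self, local_obs):
        # Stub policy: meet the base demand, scaled down when prices rise.
        demand_kw, price = local_obs["demand_kw"], local_obs["price"]
        return max(0.0, demand_kw * (1.2 - price))

    def reward(self, power_kw, demand_kw, price, others_kw):
        # Negative of: energy cost + comfort deviation + shared overload penalty.
        energy_cost = price * power_kw
        comfort_penalty = self.comfort_weight * abs(power_kw - demand_kw)
        total_kw = power_kw + others_kw
        overload_penalty = self.overload_weight * max(
            0.0, total_kw - self.transformer_limit_kw)
        return -(energy_cost + comfort_penalty + overload_penalty)


if __name__ == "__main__":
    agent = ResidentialMicrogridAgent()
    obs = {"demand_kw": 3.0, "price": 0.25}
    agent.collective.update(observed_aggregate_kw=42.0)
    action_kw = agent.act(obs)
    r = agent.reward(action_kw, obs["demand_kw"], obs["price"],
                     agent.collective.predict())
    print(f"action={action_kw:.2f} kW, reward={r:.3f}")
```

Because the reward uses only the agent's own observation plus the collective model's estimate of the other RMs, each agent can train without access to the others' private data, which is the decentralization and privacy property the abstract emphasizes.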
Description: | Data availability: The authors do not have permission to share data. |
URI: | https://bura.brunel.ac.uk/handle/2438/30647 |
DOI: | https://doi.org/10.1016/j.energy.2024.134165 |
ISSN: | 0360-5442 |
Other Identifiers: | ORCiD: Can Wang https://orcid.org/0000-0002-5892-253X; ORCiD: Xiaojia Zhang https://orcid.org/0009-0007-5024-5363; ORCiD: Zhuoli Zhao https://orcid.org/0000-0003-2531-0614; ORCiD: Chun Sing Lai https://orcid.org/0000-0002-4169-4438; ORCiD: Loi Lei Lai https://orcid.org/0000-0003-4786-7931; Article number: 134165 |
Appears in Collections: | Dept of Electronic and Electrical Engineering Embargoed Research Papers |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
FullText.pdf | Embargoed until 11 December 2025. Copyright © 2024 Elsevier Ltd. All rights reserved. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/ (see: https://www.elsevier.com/about/policies/sharing). | 2.05 MB | Adobe PDF | View/Open |
This item is licensed under a Creative Commons License