Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/30166
Full metadata record
DC Field | Value | Language
dc.contributor.author | Wang, H | -
dc.contributor.author | Tang, Z | -
dc.contributor.author | Sun, Y | -
dc.contributor.author | Wang, F | -
dc.contributor.author | Zhang, S | -
dc.contributor.author | Chen, Y | -
dc.date.accessioned | 2024-11-18T11:58:42Z | -
dc.date.available | 2024-11-18T11:58:42Z | -
dc.date.issued | 2024-08-12 | -
dc.identifier | ORCiD: Haoran Wang https://orcid.org/0000-0002-4622-0119 | -
dc.identifier | ORCiD: Zeshen Tang https://orcid.org/0000-0001-8765-6464 | -
dc.identifier | ORCiD: Fang Wang https://orcid.org/0000-0003-1987-9150 | -
dc.identifier | ORCiD: Siyu Zhang https://orcid.org/0000-0002-0001-0204 | -
dc.identifier | ORCiD: Yeming Chen https://orcid.org/0009-0005-5515-1943 | -
dc.identifier.citation | Wang, H. et al. (2024) 'Guided Cooperation in Hierarchical Reinforcement Learning via Model-Based Rollout', IEEE Transactions on Neural Networks and Learning Systems, 36 (5), pp. 8455 - 8469. doi: 10.1109/tnnls.2024.3425809. | en_US
dc.identifier.issn | 2162-237X | -
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/30166 | -
dc.description | A preprint version of the article is available at arXiv:2309.13508v2 [cs.LG], https://arxiv.org/abs/2309.13508 ([v2] Sat, 6 Apr 2024 17:07:13 UTC (4,747 KB)). It is archived on this institutional repository but it has not been certified by peer review. Comments: Resubmitted a revised version, in which we provided more illustrative examples, corrected the writing errors, and added references. | en_US
dc.description | This article has supplementary downloadable material available at https://doi.org/10.1109/TNNLS.2024.3425809, provided by the authors. | -
dc.description.abstract | Goal-conditioned hierarchical reinforcement learning (HRL) presents a promising approach for enabling effective exploration in complex, long-horizon reinforcement learning (RL) tasks through temporal abstraction. Empirically, heightened interlevel communication and coordination can induce more stable and robust policy improvement in hierarchical systems. Yet, most existing goal-conditioned HRL algorithms have primarily focused on subgoal discovery, neglecting interlevel cooperation. Here, we propose a novel goal-conditioned HRL framework named Guided Cooperation via Model-Based Rollout (GCMR; code is available at https://github.com/HaoranWang-TJ/GCMR_ACLG_official), aiming to bridge interlayer information synchronization and cooperation by exploiting forward dynamics. First, GCMR mitigates the state-transition error within off-policy correction via model-based rollout, thereby enhancing sample efficiency. Second, to prevent disruption by unseen subgoals and states, lower-level Q-function gradients are constrained using a gradient penalty with a model-inferred upper bound, leading to a more stable behavioral policy conducive to effective exploration. Third, we propose one-step rollout-based planning that uses higher-level critics to guide the lower-level policy. Specifically, we estimate the value of future states of the lower-level policy using the higher-level critic function, thereby transmitting global task information downward to avoid local pitfalls. These three critical components in GCMR are expected to facilitate interlevel cooperation significantly. Experimental results demonstrate that incorporating the proposed GCMR framework with a disentangled variant of hierarchical reinforcement learning guided by landmarks (HIGL), namely, adjacency constraint and landmark-guided planning (ACLG), yields more stable and robust policy improvement compared with various baselines and significantly outperforms previous state-of-the-art (SOTA) algorithms. | en_US
dc.format.extent | 8455 - 8469 | -
dc.format.medium | Print-Electronic | -
dc.language | English | -
dc.language.iso | en_US | en_US
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | en_US
dc.relation.uri | https://arxiv.org/abs/2309.13508 | -
dc.rights | Copyright © 2024 Institute of Electrical and Electronics Engineers (IEEE). Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. See: https://journals.ieeeauthorcenter.ieee.org/become-an-ieee-journal-author/publishing-ethics/guidelines-and-policies/post-publication-policies/ | -
dc.rights.uri | https://journals.ieeeauthorcenter.ieee.org/become-an-ieee-journal-author/publishing-ethics/guidelines-and-policies/post-publication-policies/ | -
dc.subject | deep reinforcement learning (DRL) | en_US
dc.subject | goal conditioning | en_US
dc.subject | hierarchical reinforcement learning (HRL) | en_US
dc.subject | interlevel cooperation | en_US
dc.subject | model-based rollout | en_US
dc.title | Guided Cooperation in Hierarchical Reinforcement Learning via Model-Based Rollout | en_US
dc.type | Article | en_US
dc.date.dateAccepted | 2024-06-30 | -
dc.identifier.doi | https://doi.org/10.1109/tnnls.2024.3425809 | -
dc.relation.isPartOf | IEEE Transactions on Neural Networks and Learning Systems | -
pubs.issue | 5 | -
pubs.publication-status | Published | -
pubs.volume | 36 | -
dc.identifier.eissn | 2162-2388 | -
dc.rights.holder | Institute of Electrical and Electronics Engineers (IEEE) | -
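
The abstract above describes three cooperative mechanisms: model-based rollout within off-policy correction, a gradient penalty on the lower-level Q-function bounded by a model-inferred upper limit, and one-step rollout-based planning in which the higher-level critic evaluates model-predicted next states to guide the lower-level policy. The minimal sketch below illustrates only that third, rollout-then-evaluate idea; the names dynamics_model, high_level_critic, and candidate_actions are illustrative assumptions, not the authors' implementation (the official code is at the GitHub repository linked in the abstract).

import torch

def one_step_rollout_guidance(state, goal, candidate_actions,
                              dynamics_model, high_level_critic):
    """Rank candidate lower-level actions by rolling the learned forward
    dynamics one step ahead and scoring the predicted next states with the
    higher-level critic (illustrative sketch; all names are assumptions)."""
    num_candidates = candidate_actions.shape[0]
    with torch.no_grad():
        # Repeat the current state for each candidate action and predict
        # the next state with the learned dynamics model: s' = f(s, a).
        states = state.unsqueeze(0).expand(num_candidates, -1)
        next_states = dynamics_model(states, candidate_actions)
        # Score the predicted next states with the higher-level critic,
        # conditioned on the goal, to obtain a global value signal.
        goals = goal.unsqueeze(0).expand(num_candidates, -1)
        values = high_level_critic(next_states, goals).squeeze(-1)
    # These values can rank candidate actions or act as an auxiliary signal
    # when updating the lower-level policy, transmitting global task
    # information downward as described in the abstract.
    best = torch.argmax(values)
    return candidate_actions[best], values

In practice such value estimates would more likely feed into the lower-level policy update than a hard argmax, but the rollout-then-evaluate pattern is the guidance idea sketched here.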
Appears in Collections: Dept of Computer Science Research Papers

Files in This Item:
File | Description | Size | Format
Preprint.pdf | Copyright © 2024 Institute of Electrical and Electronics Engineers (IEEE). Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. See: https://journals.ieeeauthorcenter.ieee.org/become-an-ieee-journal-author/publishing-ethics/guidelines-and-policies/post-publication-policies/ | 5.95 MB | Adobe PDF


Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.