Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/32841
Full metadata record
DC Field | Value | Language
dc.contributor.author | Zhu, Z | -
dc.contributor.author | Chen, Z | -
dc.contributor.author | Zhu, C | -
dc.contributor.author | Si, W | -
dc.contributor.author | Wang, F | -
dc.date.accessioned | 2026-02-23T10:20:54Z | -
dc.date.available | 2026-02-23T10:20:54Z | -
dc.date.issued | 2026-02-09 | -
dc.identifier.citation | Zhu, Z. et al. (2026) 'Optimizing potential-based reward automata in partially observable reinforcement learning using genetic local search', Engineering Applications of Artificial Intelligence, 169, 114054, pp. 1–14. doi: 10.1016/j.engappai.2026.114054. | en-US
dc.identifier.issn | 0952-1976 | -
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/32841 | -
dc.description | Highlights: • Introduce a new RA learning mechanism with reward constraints for better strategies. • Develop evolutionary algorithms to optimize RA and policies in various environments. • Conduct experiments showing superior performance in six partially observable domains. • Analyze exploration–exploitation balance and environmental randomness effects. • Demonstrate the stability and efficiency of our genetic local search method. | en-US
dc.description | Data availability: No data was used for the research described in the article. | -
dc.description.abstract | Partially observable reinforcement learning extends the reinforcement learning framework to environments in which agents have limited visibility of the state space, making it particularly relevant for applications in robotics and autonomous vehicle navigation. However, a primary challenge in partially observable reinforcement learning is defining effective reward functions that can guide the learning process despite partial observability. To address this challenge, this paper introduces a novel approach for constructing potential-based reward automata by employing genetic local search methods. Specifically, our method constructs these automata from compressed representations of exploration trajectories, which succinctly capture critical decision points and essential state transitions while eliminating redundant steps. By optimizing trajectory samples and shortening agent trajectories to their crucial transitions, our technique significantly reduces computational overhead. Formally, we define the learning objective as an optimization problem aimed at maximizing the log-likelihood of future observations while simultaneously minimizing the structural complexity of the learned reward automata. Furthermore, by incorporating value-based strategies to estimate potential values within the reward automata, our approach improves learning efficiency and facilitates the identification of optimal reward structures. We empirically evaluate our proposed method on seven partially observable grid-world benchmarks. Experimental results demonstrate that our method achieves superior performance relative to state-of-the-art reward automata-based techniques, exhibiting both accelerated learning speeds and higher accumulated rewards. Additionally, our genetic local search algorithm consistently outperforms comparative heuristic methods in terms of learning curves and reward accumulation. | en-US
dc.description.sponsorship | This work was supported by National Natural Science Foundation of China (No. 62202067), CNPC Innovation Found (No. 2024DQ02-0501), Royal Society (IEC_NSFC_233444), Postgraduate Research and Practice Innovation Project of Jiangsu Province (No. KYCX24_3228) and Youth Science and Technology Talent Promotion Project of Jiangsu Province (No. JSTJ-2025-137). | en-US
dc.format.extent | 1–14 | -
dc.format.medium | Print-Electronic | -
dc.language | en | -
dc.language.iso | en-US | en-US
dc.publisher | Elsevier on behalf of International Federation of Automatic Control (IFAC) | en-US
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | -
dc.subject | partially observable reinforcement learning | en-US
dc.subject | reward automata | en-US
dc.subject | heuristic algorithms | en-US
dc.subject | reward shaping | en-US
dc.title | Optimizing potential-based reward automata in partially observable reinforcement learning using genetic local search | en-US
dc.type | Article | en-US
dc.date.dateAccepted | 2026-02-01 | -
dc.identifier.doi | https://doi.org/10.1016/j.engappai.2026.114054 | -
dc.relation.isPartOf | Engineering Applications of Artificial Intelligence | -
pubs.publication-status | Published | -
pubs.volume | 169 | -
dc.identifier.eissn | 1873-6769 | -
dc.rights.license | https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode.en | -
dcterms.dateAccepted | 2026-02-01 | -
dc.rights.holder | Elsevier Ltd. | -
dc.contributor.orcid | Zhu, Chenyang [0000-0002-2145-0559] | -
dc.identifier.number | 114054 | -
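
The abstract's central idea, potential-based reward shaping driven by a reward automaton's states, can be illustrated with a minimal sketch. This is not the paper's implementation: the class, the toy "key then door" task, the potentials, and the discount factor are all illustrative assumptions; the shaping term follows the standard potential-based form gamma * Phi(u') - Phi(u).

```python
# Hypothetical sketch of a reward automaton with potential-based shaping.
# All names and the toy task are illustrative, not taken from the paper.

GAMMA = 0.9  # assumed discount factor


class RewardAutomaton:
    """Minimal reward automaton whose states carry potential values."""

    def __init__(self, transitions, potentials, initial=0):
        self.transitions = transitions  # (state, event) -> next state
        self.potentials = potentials    # state -> potential Phi(state)
        self.state = initial

    def step(self, event, env_reward):
        """Advance on an observed event label; return the shaped reward."""
        prev = self.state
        # Stay in place if no transition is defined for this event.
        self.state = self.transitions.get((prev, event), prev)
        shaping = GAMMA * self.potentials[self.state] - self.potentials[prev]
        return env_reward + shaping


# Toy two-step task: observe "key", then "door".
ra = RewardAutomaton(
    transitions={(0, "key"): 1, (1, "door"): 2},
    potentials={0: 0.0, 1: 0.5, 2: 1.0},
)
r1 = ra.step("key", 0.0)   # progress toward the goal -> positive shaping (0.45)
r2 = ra.step("door", 1.0)  # environment reward plus final shaping (1.4)
```

Because the shaping term telescopes over a trajectory, it rewards progress through the automaton without changing which policies are optimal, which is why potential values over automaton states can densify sparse rewards under partial observability.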
Appears in Collections: Dept of Computer Science Embargoed Research Papers

Files in This Item:
File | Description | Size | Format
FullText.pdf | Embargoed until 9 February 2027. Copyright © Elsevier Ltd. All rights reserved. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/ (see: https://www.elsevier.com/about/policies/sharing). | 3.19 MB | Adobe PDF


This item is licensed under a Creative Commons License.