Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/32841
| Title: | Optimizing potential-based reward automata in partially observable reinforcement learning using genetic local search |
| Authors: | Zhu, Z; Chen, Z; Zhu, C; Si, W; Wang, F |
| Keywords: | partially observable reinforcement learning;reward automata;heuristic algorithms;reward shaping |
| Issue Date: | 9-Feb-2026 |
| Publisher: | Elsevier on behalf of International Federation of Automatic Control (IFAC) |
| Citation: | Zhu, Z. et al. (2026) 'Optimizing potential-based reward automata in partially observable reinforcement learning using genetic local search', Engineering Applications of Artificial Intelligence, 169, 114054, pp. 1–14. doi: 10.1016/j.engappai.2026.114054. |
| Abstract: | Partially observable reinforcement learning extends the reinforcement learning framework to environments in which agents have limited visibility of the state space, making it particularly relevant for applications in robotics and autonomous vehicle navigation. However, a primary challenge in partially observable reinforcement learning is defining effective reward functions that can guide the learning process despite partial observability. To address this challenge, this paper introduces a novel approach for constructing potential-based reward automata by employing genetic local search methods. Specifically, our method constructs these automata from compressed representations of exploration trajectories, which succinctly capture critical decision points and essential state transitions while eliminating redundant steps. By optimizing trajectory samples and shortening agent trajectories to their crucial transitions, our technique significantly reduces computational overhead. Formally, we define the learning objective as an optimization problem aimed at maximizing the log-likelihood of future observations while simultaneously minimizing the structural complexity of the learned reward automata. Furthermore, by incorporating value-based strategies to estimate potential values within the reward automata, our approach improves learning efficiency and facilitates the identification of optimal reward structures. We empirically evaluate our proposed method on seven partially observable grid-world benchmarks. Experimental results demonstrate that our method achieves superior performance relative to state-of-the-art reward automata-based techniques, exhibiting both accelerated learning speeds and higher accumulated rewards. Additionally, our genetic local search algorithm consistently outperforms comparative heuristic methods in terms of learning curves and reward accumulation. |
| Description: | Highlights:
• Introduce a new RA learning mechanism with reward constraints for better strategies.
• Develop evolutionary algorithms to optimize RA and policies in various environments.
• Conduct experiments showing superior performance in six partially observable domains.
• Analyze exploration–exploitation balance and environmental randomness effects.
• Demonstrate the stability and efficiency of our genetic local search method.
Data availability: No data was used for the research described in the article. |
| URI: | https://bura.brunel.ac.uk/handle/2438/32841 |
| DOI: | https://doi.org/10.1016/j.engappai.2026.114054 |
| ISSN: | 0952-1976 |
| Appears in Collections: | Dept of Computer Science Embargoed Research Papers |
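The abstract above centres on potential-based reward automata: an automaton tracks task progress from observation labels, and each automaton transition contributes a shaping term F = γ·Φ(v) − Φ(u) derived from per-state potentials. As a minimal illustration only (the class and all names here are hypothetical stand-ins, not the paper's implementation), that mechanism can be sketched as:

```python
# Minimal sketch of potential-based shaping over a reward automaton.
# Hypothetical names; not the paper's code.

GAMMA = 0.9  # discount factor used in the shaping term

class RewardAutomaton:
    """Tiny reward automaton: labeled transitions plus a potential per state."""

    def __init__(self, transitions, potentials, initial=0):
        self.transitions = transitions  # (state, label) -> next state
        self.potentials = potentials    # state -> potential Phi(state)
        self.state = initial

    def step(self, label):
        u = self.state
        # Stay in the current state if the label has no outgoing edge.
        v = self.transitions.get((u, label), u)
        self.state = v
        # Potential-based shaping term: F = gamma * Phi(v) - Phi(u)
        return GAMMA * self.potentials[v] - self.potentials[u]

# Usage: a two-step task "observe a, then b", with increasing potentials.
ra = RewardAutomaton(
    transitions={(0, "a"): 1, (1, "b"): 2},
    potentials={0: 0.0, 1: 1.0, 2: 2.0},
)
shaped = [ra.step(label) for label in ["a", "b"]]
```

Because the extra reward is a difference of potentials, this form of shaping is known to leave optimal policies unchanged while densifying the learning signal.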
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| FullText.pdf | Embargoed until 9 February 2027. Copyright © Elsevier Ltd. All rights reserved. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/ (see: https://www.elsevier.com/about/policies/sharing). | 3.19 MB | Adobe PDF | View/Open |
This item is licensed under a Creative Commons License
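The abstract frames learning as maximizing the log-likelihood of observations while minimizing the structural complexity of the automaton, optimized by genetic local search. A toy sketch of such a memetic scheme follows; the bitstring genome, `TARGET`, the surrogate `fitness`, and every name are illustrative assumptions standing in for the paper's automaton encoding and objective:

```python
# Toy genetic local search: global crossover/selection plus a hill-climbing
# refinement step, maximizing fitness = fit(x) - lambda * complexity(x).
# All names and the bitstring genome are illustrative, not the paper's.
import random

random.seed(0)
TARGET = [1, 0, 1, 1, 0, 1, 0, 0]  # stand-in for the "ideal" structure
LAM = 0.1                          # complexity penalty weight

def fitness(x):
    match = sum(a == b for a, b in zip(x, TARGET))  # surrogate log-likelihood
    complexity = sum(x)                             # surrogate automaton size
    return match - LAM * complexity

def mutate(x):
    y = x[:]
    y[random.randrange(len(y))] ^= 1  # flip one bit
    return y

def local_search(x, steps=20):
    # Hill-climb: keep single-bit flips only when they improve fitness.
    for _ in range(steps):
        y = mutate(x)
        if fitness(y) > fitness(x):
            x = y
    return x

def genetic_local_search(pop_size=10, generations=30):
    pop = [[random.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(TARGET))
            child = a[:cut] + b[cut:]           # one-point crossover
            children.append(local_search(child))  # memetic refinement
        pop = parents + children
    return max(pop, key=fitness)

best = genetic_local_search()
```

The defining design choice of genetic local search is the `local_search` call on each offspring: crossover explores the space globally while hill-climbing repairs each candidate locally, which is the hybridization the paper's title refers to.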