Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/32841
Full metadata record
DC Field | Value | Language
dc.contributor.author | Zhu, Z | -
dc.contributor.author | Chen, Z | -
dc.contributor.author | Zhu, C | -
dc.contributor.author | Si, W | -
dc.contributor.author | Wang, F | -
dc.date.accessioned | 2026-02-23T10:20:54Z | -
dc.date.available | 2026-02-23T10:20:54Z | -
dc.date.issued | 2026-02-09 | -
dc.identifier.citation | Zhu, Z. et al. (2026) 'Optimizing potential-based reward automata in partially observable reinforcement learning using genetic local search', Engineering Applications of Artificial Intelligence, 169, 114054, pp. 1–14. doi: 10.1016/j.engappai.2026.114054. | en-US
dc.identifier.issn | 0952-1976 | -
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/32841 | -
dc.description | Highlights: • Introduce a new RA learning mechanism with reward constraints for better strategies. • Develop evolutionary algorithms to optimize RA and policies in various environments. • Conduct experiments showing superior performance in six partially observable domains. • Analyze exploration–exploitation balance and environmental randomness effects. • Demonstrate the stability and efficiency of our genetic local search method. | en-US
dc.description | Data availability: No data was used for the research described in the article. | -
dc.description.abstract | Partially observable reinforcement learning extends the reinforcement learning framework to environments in which agents have limited visibility of the state space, making it particularly relevant for applications in robotics and autonomous vehicle navigation. However, a primary challenge in partially observable reinforcement learning is defining effective reward functions that can guide the learning process despite partial observability. To address this challenge, this paper introduces a novel approach for constructing potential-based reward automata by employing genetic local search methods. Specifically, our method constructs these automata from compressed representations of exploration trajectories, which succinctly capture critical decision points and essential state transitions while eliminating redundant steps. By optimizing trajectory samples and shortening agent trajectories to their crucial transitions, our technique significantly reduces computational overhead. Formally, we define the learning objective as an optimization problem aimed at maximizing the log-likelihood of future observations while simultaneously minimizing the structural complexity of the learned reward automata. Furthermore, by incorporating value-based strategies to estimate potential values within the reward automata, our approach improves learning efficiency and facilitates the identification of optimal reward structures. We empirically evaluate our proposed method on seven partially observable grid-world benchmarks. Experimental results demonstrate that our method achieves superior performance relative to state-of-the-art reward automata-based techniques, exhibiting both accelerated learning speeds and higher accumulated rewards. Additionally, our genetic local search algorithm consistently outperforms comparative heuristic methods in terms of learning curves and reward accumulation. | en-US
dc.description.sponsorship | This work was supported by National Natural Science Foundation of China (No. 62202067), CNPC Innovation Found (No. 2024DQ02-0501), Royal Society (IEC_NSFC_233444), Postgraduate Research and Practice Innovation Project of Jiangsu Province (No. KYCX24_3228) and Youth Science and Technology Talent Promotion Project of Jiangsu Province (No. JSTJ-2025-137). | en-US
dc.format.extent | 1–14 | -
dc.format.medium | Print-Electronic | -
dc.language | en | -
dc.language.iso | en-US | en-US
dc.publisher | Elsevier on behalf of International Federation of Automatic Control (IFAC) | en-US
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | -
dc.subject | partially observable reinforcement learning | en-US
dc.subject | reward automata | en-US
dc.subject | heuristic algorithms | en-US
dc.subject | reward shaping | en-US
dc.title | Optimizing potential-based reward automata in partially observable reinforcement learning using genetic local search | en-US
dc.type | Article | en-US
dc.date.dateAccepted | 2026-02-01 | -
dc.identifier.doi | https://doi.org/10.1016/j.engappai.2026.114054 | -
dc.relation.isPartOf | Engineering Applications of Artificial Intelligence | -
pubs.publication-status | Published | -
pubs.volume | 169 | -
dc.identifier.eissn | 1873-6769 | -
dc.rights.license | https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode.en | -
dcterms.dateAccepted | 2026-02-01 | -
dc.rights.holder | Elsevier Ltd. | -
dc.contributor.orcid | Zhu, Chenyang [0000-0002-2145-0559] | -
dc.identifier.number | 114054 | -
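
The abstract's central idea, potential-based reward shaping driven by a reward automaton's states, can be illustrated with a minimal sketch. This is not the paper's implementation: the class, the toy "key then door" task, the potentials, and the discount factor are all illustrative assumptions; the shaping term follows the standard potential-based form gamma * Phi(u') - Phi(u).

```python
# Hypothetical sketch of a reward automaton with potential-based shaping.
# All names and the toy task are illustrative, not taken from the paper.

GAMMA = 0.9  # assumed discount factor


class RewardAutomaton:
    """Minimal reward automaton whose states carry potential values."""

    def __init__(self, transitions, potentials, initial=0):
        self.transitions = transitions  # (state, event) -> next state
        self.potentials = potentials    # state -> potential Phi(state)
        self.state = initial

    def step(self, event, env_reward):
        """Advance on an observed event label; return the shaped reward."""
        prev = self.state
        # Stay in place if no transition is defined for this event.
        self.state = self.transitions.get((prev, event), prev)
        shaping = GAMMA * self.potentials[self.state] - self.potentials[prev]
        return env_reward + shaping


# Toy two-step task: observe "key", then "door".
ra = RewardAutomaton(
    transitions={(0, "key"): 1, (1, "door"): 2},
    potentials={0: 0.0, 1: 0.5, 2: 1.0},
)
r1 = ra.step("key", 0.0)   # progress toward the goal -> positive shaping (0.45)
r2 = ra.step("door", 1.0)  # environment reward plus final shaping (1.4)
```

Because the shaping term telescopes over a trajectory, it rewards progress through the automaton without changing which policies are optimal, which is why potential values over automaton states can densify sparse rewards under partial observability.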
Appears in Collections: Dept of Computer Science Embargoed Research Papers

Files in This Item:
File | Description | Size | Format
FullText.pdf | Embargoed until 9 February 2027. Copyright © Elsevier Ltd. All rights reserved. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/ (see: https://www.elsevier.com/about/policies/sharing). | 3.19 MB | Adobe PDF


This item is licensed under a Creative Commons License.