Using Reward Machines for Offline Reinforcement Learning With Non-Markovian Reward Functions

Authors

  • Yanze Wang, School for Engineering of Matter, Transport and Energy, Arizona State University, Tempe, AZ 85287, USA

Keywords

offline reinforcement learning, reward machine, non-Markovian reward function

Abstract

We investigate offline reinforcement learning with non-Markovian reward functions, which allow more realistic and intricate reward structures to be incorporated into the learning process. Offline reinforcement learning has shown promise for learning optimal policies when the agent only has access to previously collected static datasets. Reward machines offer a way to encode the high-level structure of non-Markovian reward functions. We introduce C-QRM, an offline reinforcement learning approach that uses non-Markovian reward functions specified as reward machines to accomplish complex tasks and to learn an optimal policy more efficiently from the offline dataset. Our objective is to learn a conservative Q-function that decomposes complex, high-level reward machines and under which the expected value of a policy lower-bounds its true value. By learning these lower-bounded Q-values, C-QRM mitigates overestimation bias and improves sample efficiency. We evaluate the proposed C-QRM algorithm against QRM as a baseline. The results indicate that C-QRM outperforms QRM while requiring fewer training steps and benefits from the offline dataset.
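
The abstract's core idea, learning from a static dataset per-reward-machine-state Q-values whose induced policy value lower-bounds the true value, can be sketched as follows. The Python snippet below is a hedged illustration rather than the authors' implementation: the QNet architecture, the cql_alpha weight, and the toy batch are invented purely for exposition, and only the general combination of a Bellman error over reward-machine-labelled transitions with a conservative (CQL-style) penalty reflects what the abstract describes.

# Minimal illustrative sketch (not the authors' code): a conservative
# Q-learning loss over reward-machine-labelled offline transitions.
# Network shape, cql_alpha, and the toy batch are assumptions for exposition.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS, N_RM_STATES = 8, 4, 3   # toy sizes (assumed)

class QNet(nn.Module):
    # Q(s, u, .): the Q-function is conditioned on the current reward-machine
    # state u, which is how QRM-style methods decompose the high-level task.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + N_RM_STATES, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, s, u):
        u_onehot = F.one_hot(u, N_RM_STATES).float()
        return self.net(torch.cat([s, u_onehot], dim=-1))

def conservative_qrm_loss(q, q_target, batch, gamma=0.99, cql_alpha=1.0):
    # Bellman error on transitions labelled with reward-machine states, plus a
    # CQL-style penalty (logsumexp over all actions minus the dataset action's
    # value) that keeps learned Q-values a lower bound and counteracts
    # overestimation of out-of-distribution actions in the offline setting.
    s, a, r, s2, u, u2, done = batch
    q_all = q(s, u)                                       # Q(s, u, .)
    q_sa = q_all.gather(1, a.unsqueeze(1)).squeeze(1)     # Q(s, u, a)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * q_target(s2, u2).max(dim=1).values
    bellman = F.mse_loss(q_sa, target)
    conservative = (torch.logsumexp(q_all, dim=1) - q_sa).mean()
    return bellman + cql_alpha * conservative

# Toy offline batch (random placeholders, shown only for the expected shapes):
B = 32
batch = (
    torch.randn(B, STATE_DIM),              # environment state s
    torch.randint(0, N_ACTIONS, (B,)),      # logged action a
    torch.randn(B),                         # reward emitted by the reward machine
    torch.randn(B, STATE_DIM),              # next state s'
    torch.randint(0, N_RM_STATES, (B,)),    # current reward-machine state u
    torch.randint(0, N_RM_STATES, (B,)),    # next reward-machine state u'
    torch.zeros(B),                         # episode-termination flags
)
q, q_target = QNet(), QNet()
q_target.load_state_dict(q.state_dict())
loss = conservative_qrm_loss(q, q_target, batch)
loss.backward()  # an optimizer step on q's parameters would follow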

Published

2025-04-27

How to Cite

Wang, Y. Using Reward Machines for Offline Reinforcement Learning With Non-Markovian Reward Functions. Sci. Insights 2025, 1, 5.

Section

Computer Science and Mathematics
