Using Reward Machines for Offline Reinforcement Learning With Non-Markovian Reward Functions

Authors

  • Yanze Wang, School for Engineering of Matter, Transport and Energy, Arizona State University, Tempe, AZ 85287, USA

Keywords

offline reinforcement learning, reward machine, non-Markovian reward function

Abstract

We investigate offline reinforcement learning with non-Markovian reward functions, which allow more realistic and intricate reward structures to be incorporated into the learning process. Offline reinforcement learning has shown promise for learning optimal policies when the agent only has access to previously collected static datasets. Reward machines offer a way to encode the high-level structure of non-Markovian reward functions. We introduce C-QRM, an offline reinforcement learning approach that uses non-Markovian reward functions specified as reward machines to accomplish complex tasks and to learn an optimal policy more efficiently from the offline dataset. Our objective is to learn a conservative Q-function that decomposes complex, high-level reward machines and under which the expected value of a policy lower-bounds its true value. By learning these lower-bounded Q-values, C-QRM mitigates overestimation bias and improves sample efficiency. We evaluate the proposed C-QRM algorithm against QRM as a baseline. The results indicate that C-QRM outperforms QRM while requiring fewer training steps and benefits from the offline dataset.
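
The abstract's core idea, learning from a static dataset per-reward-machine-state Q-values whose induced policy value lower-bounds the true value, can be sketched as follows. The Python snippet below is a hedged illustration rather than the authors' implementation: the QNet architecture, the cql_alpha weight, and the toy batch are invented purely for exposition, and only the general combination of a Bellman error over reward-machine-labelled transitions with a conservative (CQL-style) penalty reflects what the abstract describes.

# Minimal illustrative sketch (not the authors' code): a conservative
# Q-learning loss over reward-machine-labelled offline transitions.
# Network shape, cql_alpha, and the toy batch are assumptions for exposition.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS, N_RM_STATES = 8, 4, 3   # toy sizes (assumed)

class QNet(nn.Module):
    # Q(s, u, .): the Q-function is conditioned on the current reward-machine
    # state u, which is how QRM-style methods decompose the high-level task.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + N_RM_STATES, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, s, u):
        u_onehot = F.one_hot(u, N_RM_STATES).float()
        return self.net(torch.cat([s, u_onehot], dim=-1))

def conservative_qrm_loss(q, q_target, batch, gamma=0.99, cql_alpha=1.0):
    # Bellman error on transitions labelled with reward-machine states, plus a
    # CQL-style penalty (logsumexp over all actions minus the dataset action's
    # value) that keeps learned Q-values a lower bound and counteracts
    # overestimation of out-of-distribution actions in the offline setting.
    s, a, r, s2, u, u2, done = batch
    q_all = q(s, u)                                       # Q(s, u, .)
    q_sa = q_all.gather(1, a.unsqueeze(1)).squeeze(1)     # Q(s, u, a)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * q_target(s2, u2).max(dim=1).values
    bellman = F.mse_loss(q_sa, target)
    conservative = (torch.logsumexp(q_all, dim=1) - q_sa).mean()
    return bellman + cql_alpha * conservative

# Toy offline batch (random placeholders, shown only for the expected shapes):
B = 32
batch = (
    torch.randn(B, STATE_DIM),              # environment state s
    torch.randint(0, N_ACTIONS, (B,)),      # logged action a
    torch.randn(B),                         # reward emitted by the reward machine
    torch.randn(B, STATE_DIM),              # next state s'
    torch.randint(0, N_RM_STATES, (B,)),    # current reward-machine state u
    torch.randint(0, N_RM_STATES, (B,)),    # next reward-machine state u'
    torch.zeros(B),                         # episode-termination flags
)
q, q_target = QNet(), QNet()
q_target.load_state_dict(q.state_dict())
loss = conservative_qrm_loss(q, q_target, batch)
loss.backward()  # an optimizer step on q's parameters would follow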

Published

2025-04-27

How to Cite

Wang, Y. Using Reward Machines for Offline Reinforcement Learning With Non-Markovian Reward Functions. Sci. Insights 2025, 1, 5.

Section

Computer Science and Mathematics
