Abstract:
A key challenge in reinforcement learning is how to guide agents to explore sparse-reward environments efficiently. To overcome this challenge, state-of-the-art methods introduce additional intrinsic rewards based on state-related information, such as the novelty of states. Unfortunately, these methods frequently fail in procedurally-generated tasks, where a different environment is generated in each episode, so the agent is unlikely to visit the same state more than once. Recently, some exploration methods designed specifically for procedurally-generated tasks have been proposed. However, they still consider only state-related information, which leads to relatively inefficient exploration. In this work, we propose a novel exploration method that utilizes cross-episode policy-related information and intra-episode state-related information to jointly encourage exploration in procedurally-generated tasks. In terms of policy-related information, we first use an imitator-based unbalanced policy diversity to measure the difference between the agent's current policy and its previous policies, and then encourage the agent to maximize this difference. In terms of state-related information, we encourage the agent to maximize the state diversity within an episode, thereby visiting as many different states as possible in an episode. We show that our method significantly improves sample efficiency over state-of-the-art methods on three challenging benchmarks: MiniGrid, MiniWorld, and the sparse-reward version of Procgen.
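To make the reward structure described in the abstract concrete, the following is a minimal, illustrative sketch, not the authors' implementation. It shows one way to combine an extrinsic reward with two intrinsic bonuses: a cross-episode policy-diversity term, approximated here by the negative log-probability that a toy imitator (fit to the agent's previous behavior) assigns to the current action, and an intra-episode state-diversity term that rewards first visits within an episode. All names (`Imitator`, `policy_diversity_bonus`, `state_diversity_bonus`, `beta_p`, `beta_s`) and the tabular setting are hypothetical assumptions for illustration only.

```python
# Illustrative sketch (hypothetical names; not the paper's implementation):
# extrinsic reward plus cross-episode policy-diversity and
# intra-episode state-diversity bonuses.
import numpy as np

class Imitator:
    """Toy tabular imitator of the agent's previous policies."""
    def __init__(self, n_states, n_actions):
        self.counts = np.ones((n_states, n_actions))  # Laplace smoothing

    def action_prob(self, state, action):
        row = self.counts[state]
        return row[action] / row.sum()

    def update(self, state, action):
        # Fit the imitator on behavior collected in past episodes.
        self.counts[state, action] += 1.0


def policy_diversity_bonus(imitator, state, action):
    # Large when the imitator (trained on previous policies) assigns low
    # probability to the current action, i.e. the current policy differs
    # from past policies at this state.
    return -np.log(imitator.action_prob(state, action))


def state_diversity_bonus(episode_states, state):
    # Episodic bonus: reward states not yet visited in the current episode,
    # encouraging the agent to cover as many distinct states as possible.
    return 1.0 if state not in episode_states else 0.0


def shaped_reward(r_ext, imitator, episode_states, state, action,
                  beta_p=0.1, beta_s=0.1):
    # Total reward = extrinsic reward + weighted intrinsic bonuses.
    r_int = (beta_p * policy_diversity_bonus(imitator, state, action)
             + beta_s * state_diversity_bonus(episode_states, state))
    episode_states.add(state)
    return r_ext + r_int


if __name__ == "__main__":
    imitator = Imitator(n_states=10, n_actions=4)
    episode_states = set()          # reset at the start of each episode
    print(shaped_reward(0.0, imitator, episode_states, state=3, action=2))
```

In this sketch, the coefficients `beta_p` and `beta_s` trade off the two bonuses against the extrinsic reward; the episodic state set is cleared every episode, while the imitator persists across episodes, mirroring the cross-episode versus intra-episode split described in the abstract.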
Source: IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS
ISSN: 2168-2216
Year: 2025
Impact Factor: 8.600 (JCR@2023)