
Authors:

Xu, Pei [1] | Chen, Hao [2] | Yang, Wenjie [3] | Huang, Kaiqi [4]

Indexed by:

Scopus SCIE

Abstract:

A key challenge in reinforcement learning is how to guide agents to efficiently explore sparse reward environments. To overcome this challenge, state-of-the-art methods introduce additional intrinsic rewards based on state-related information, such as the novelty of states. Unfortunately, these methods frequently fail in procedurally-generated tasks, where a different environment is generated in each episode, so the agent is unlikely to visit the same state more than once. Recently, some exploration methods designed specifically for procedurally-generated tasks have been proposed. However, they still consider only state-related information, which leads to relatively inefficient exploration. In this work, we propose a novel exploration method that utilizes cross-episode policy-related information and intra-episode state-related information to jointly encourage exploration in procedurally-generated tasks. In terms of policy-related information, we first use an imitator-based unbalanced policy diversity to measure the difference between the agent's current policy and the agent's previous policies, and then encourage the agent to maximize this difference. In terms of state-related information, we encourage the agent to maximize the state diversity within an episode, thereby visiting as many different states as possible in an episode. We show that our method significantly improves sample efficiency over state-of-the-art methods on three challenging benchmarks: MiniGrid, MiniWorld, and the sparse-reward version of Procgen.
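The intra-episode state-diversity idea from the abstract can be illustrated with a minimal count-based sketch (this is not the paper's imitator-based method; the function name and the 1/sqrt(count) decay schedule are illustrative assumptions):

```python
from collections import Counter

def episodic_state_bonus(visited: Counter, state) -> float:
    """Intrinsic bonus encouraging intra-episode state diversity:
    the first visit to a state within an episode earns the full bonus,
    and repeat visits earn progressively less (1 / sqrt(visit count))."""
    visited[state] += 1
    return visited[state] ** -0.5

# Usage: reset the counter at the start of each episode, then add the
# bonus to the sparse extrinsic reward at every step.
visited = Counter()
bonuses = [episodic_state_bonus(visited, s) for s in ["a", "b", "a", "c", "a"]]
```

Because the counter is reset every episode, the bonus stays meaningful in procedurally-generated tasks where states rarely recur across episodes, unlike lifelong novelty counts.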

Keyword:

Benchmark testing; Current measurement; Cybernetics; Deep reinforcement learning; Diversity reception; exploration; Faces; Optimization; procedurally-generated task; Q-learning; sparse reward; Three-dimensional displays; Training

Affiliations:

  • [ 1 ] [Xu, Pei]Chinese Acad Sci, Inst Automat, Key Lab Cognit & Decis Intelligence Complex Syst, Beijing 100190, Peoples R China
  • [ 2 ] [Huang, Kaiqi]Chinese Acad Sci, Inst Automat, Key Lab Cognit & Decis Intelligence Complex Syst, Beijing 100190, Peoples R China
  • [ 3 ] [Chen, Hao]Univ Chinese Acad Sci, Sch Emergency Management Sci & Engn, Beijing 100190, Peoples R China
  • [ 4 ] [Yang, Wenjie]Fuzhou Univ, Coll Comp & Data Sci, Fuzhou 350100, Peoples R China
  • [ 5 ] [Huang, Kaiqi]Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China

Reprint Author's Address:

  • [Huang, Kaiqi]Chinese Acad Sci, Inst Automat, Key Lab Cognit & Decis Intelligence Complex Syst, Beijing 100190, Peoples R China


Source:

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS

ISSN: 2168-2216

Year: 2025

Impact Factor: 8.600 (JCR@2023)

ESI Highly Cited Papers on the List: 0

