DSpace at KOASAS: Hindsight goal ranking on replay buffer for sparse reward environment

Hindsight goal ranking on replay buffer for sparse reward environment (Korean title: 희소 보상 환경을 위한 재생 버퍼의 사후 목표 랭킹 방법)

Luu, Minh Tung

Reinforcement learning (RL) agents update their parameters incrementally by recalling past experience through experience replay. Strongly correlated updates violate the assumptions of many stochastic gradient-based algorithms, and experience replay breaks temporal correlations by mixing more and less recent experience in each update. It also allows rare experience to be reused. Prioritizing experience judiciously is known to improve sample efficiency. This work proposes Hindsight Goal Ranking (HGR), a method for prioritizing replay experience in off-policy RL that addresses a limitation of Hindsight Experience Replay (HER), which generates hindsight goals by uniform sampling. HGR samples with higher probability the states visited in an episode that yield larger temporal difference (TD) error, which serves as a proxy for how much the RL agent can learn from an experience. Sampling for large TD error proceeds in two steps: first, an episode is sampled from the replay buffer according to the average TD error of its experiences; then, within the sampled episode, a hindsight goal leading to larger TD error is sampled with higher probability from the future visited states. Combined with Deep Deterministic Policy Gradient (DDPG), an off-policy model-free actor-critic algorithm, the proposed method learns significantly faster than the same algorithm without prioritization on four challenging simulated robotic manipulation tasks. The empirical results show that HGR uses samples more efficiently than previous methods on all four tasks. A video of the experimental results is available at https://youtu.be/KKqQ3aDzk1A.
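The two-step sampling described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `sample_hgr`, the per-episode matrix layout `td_abs[e][t, g]`, the exponent `alpha`, the smoothing constant `eps`, and the uniform choice of the transition index are all assumptions made for illustration. Only the two-step structure (episode by average TD error, then hindsight goal by TD error among future states) comes from the abstract.

```python
import numpy as np


def sample_hgr(td_abs, rng, alpha=1.0, eps=1e-6):
    """Draw one (episode, transition, goal) index triple.

    td_abs is a list of (T, T) arrays with T >= 2. td_abs[e][t, g] is a
    hypothetical stored |TD error| for transition t of episode e when it is
    relabeled with the state reached at step g. Only entries with g > t
    (future states) are used.
    """
    # Step 1: sample an episode with probability proportional to the
    # mean |TD error| over its (transition, future goal) pairs.
    ep_prio = np.array([
        (m[np.triu_indices_from(m, k=1)].mean() + eps) ** alpha
        for m in td_abs
    ])
    e = rng.choice(len(td_abs), p=ep_prio / ep_prio.sum())

    # Step 2: pick a transition (uniformly here, which is an assumption),
    # then a future goal with probability proportional to its |TD error|.
    m = td_abs[e]
    T = m.shape[0]
    t = rng.integers(0, T - 1)  # upper bound is exclusive: t in 0..T-2
    goal_prio = (m[t, t + 1:] + eps) ** alpha
    g = t + 1 + rng.choice(T - 1 - t, p=goal_prio / goal_prio.sum())
    return int(e), int(t), int(g)
```

A caller would create a generator with `np.random.default_rng()` and refresh the entries of `td_abs` after each critic update so that the priorities track the current TD errors. Setting `alpha=0` makes both steps uniform, which corresponds to the unprioritized behavior the abstract attributes to HER.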

Advisors: Yoo, Changdong (유창동)

Description: Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2020

Identifier: 325007

Language: eng

Description: Master's thesis - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2020.8, [iii, 24 p.]

Keywords: Multi-Goal Reinforcement Learning; Sparse Reward; Sample Efficiency; Hindsight Goal Ranking

URI: http://hdl.handle.net/10203/285050

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=925214&flag=dissertation

Appears in Collection: EE-Theses_Master(석사논문)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
