Hindsight goal ranking on replay buffer for sparse reward environment

Reinforcement learning (RL) agents successively update their parameters by recalling past experience via experience replay. Strongly correlated updates violate the assumptions of many stochastic gradient-based algorithms, but experience replay breaks temporal correlations by mixing more and less recent experience in each update. Furthermore, it permits rare experience to be reused across updates. Prioritizing experience judiciously is known to improve sample efficiency. This paper proposes Hindsight Goal Ranking (HGR), a method for prioritizing replay experience in off-policy RL that addresses a limitation of Hindsight Experience Replay (HER), which generates hindsight goals by uniform sampling. HGR samples with higher probability the states visited in an episode that yield larger temporal difference (TD) error, which serves as a proxy for how much the RL agent can learn from an experience. Sampling toward large TD error is performed in two steps: first, an episode is sampled from the replay buffer according to the average TD error of its experiences; then, for the sampled episode, a hindsight goal leading to larger TD error is sampled with higher probability from the future visited states. Combined with Deep Deterministic Policy Gradient (DDPG), an off-policy model-free actor-critic algorithm, the proposed method learns significantly faster than the same algorithm without prioritization on four challenging simulated robotic manipulation tasks. The empirical results show that HGR uses samples more efficiently than previous methods on all four tasks. A video showing experimental results is available at https://youtu.be/KKqQ3aDzk1A.
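The two-step sampling described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the function name `sample_episode_and_goal`, the exponent `alpha`, the offset `eps`, and the layout of the TD-error tables are all assumptions made for this example. In particular, it assumes each episode stores a square array whose entry `[t, g]` (with `g >= t`) holds the TD error of transition `t` relabeled with the goal achieved at future step `g`, and that the transition and goal are drawn jointly in the second step.

```python
import numpy as np

def sample_episode_and_goal(td_errors, rng, alpha=1.0, eps=1e-6):
    """Hypothetical two-step prioritized sampling.

    td_errors: list of (T, T) arrays, one per episode; only the upper
    triangle (g >= t) is treated as valid (assumed layout).
    Returns (episode index, transition index t, goal index g).
    """
    # Step 1: sample an episode in proportion to its average |TD error|.
    episode_priority = np.array([
        (np.abs(err[np.triu_indices_from(err)]).mean() + eps) ** alpha
        for err in td_errors
    ])
    episode_probs = episode_priority / episode_priority.sum()
    ep = rng.choice(len(td_errors), p=episode_probs)

    # Step 2: within that episode, sample a (transition, future goal)
    # pair in proportion to its |TD error|.
    err = td_errors[ep]
    t_idx, g_idx = np.triu_indices_from(err)
    pair_priority = (np.abs(err[t_idx, g_idx]) + eps) ** alpha
    pair_probs = pair_priority / pair_priority.sum()
    k = rng.choice(len(t_idx), p=pair_probs)
    return int(ep), int(t_idx[k]), int(g_idx[k])
```

With `alpha = 0` both steps reduce to uniform sampling over episodes and over valid (transition, goal) pairs; larger values concentrate sampling on high-TD-error experience.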
Advisors
Yoo, Changdong (유창동)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2020
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2020.8, [iii, 24 p.]

Keywords

Multi-Goal Reinforcement Learning; Sparse Reward; Sample Efficiency; Hindsight Goal Ranking

URI
http://hdl.handle.net/10203/285050
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=925214&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's Theses)
