Rewards Prediction Based Credit Assignment for Reinforcement Learning (보상 예측 기반의 신뢰 할당을 통한 강화학습)

In many reinforcement learning problems, the reward for an action is not given immediately after the action; this is called a delayed reward. When rewards are sparse and binary, that is, given only when the agent achieves its goal, success signals appear infrequently, so learning is slow and difficult. This work proposes a method that performs credit assignment and improves sample efficiency by selecting the key-action, the action that contributed to receiving the reward, from a series of actions. Actions preceding the key-action are given a smaller reward than the key-action's, which alleviates the scarcity of success signals. The key-action is identified from the predicted value of the reward to be received, which is estimated from the preceding information in the episode. Traditional reward shaping is one credit assignment method, but it requires prior knowledge of the environment and is likely to reflect the designer's bias. The proposed method can achieve a dynamic reward shaping effect by using a reward function that is modified according to the agent's experience, while still relying on a sparse binary reward that requires no prior knowledge. Key-action detection is tested in a slide task, in which a robot hits a puck to send it to a goal point, and the performance of the proposed method is reported for push, slide, and maze-solving tasks. The first experiment confirms that the robot detects an appropriate key-action, namely the moment just before the robot hits the object. In the other experiments, every case using the proposed method shows a higher success rate or marginally improved performance compared with the corresponding case without it.
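The idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not state the detection rule or the reward magnitudes, so everything here is an assumption made for illustration. In particular, the key-action is assumed to be the step followed by the largest jump in predicted reward, the sparse reward is assumed to be 0/1, and the names `detect_key_action`, `shape_rewards`, `key_reward`, and `decay` are hypothetical.

```python
from typing import List, Sequence


def detect_key_action(predicted_rewards: Sequence[float]) -> int:
    """Return the step index followed by the largest rise in predicted reward.

    ASSUMPTION: "largest jump" is an illustrative detection rule only;
    the abstract does not specify how the key-action is chosen.
    """
    if len(predicted_rewards) < 2:
        return 0
    jumps = [
        predicted_rewards[t + 1] - predicted_rewards[t]
        for t in range(len(predicted_rewards) - 1)
    ]
    return max(range(len(jumps)), key=jumps.__getitem__)


def shape_rewards(
    sparse_rewards: Sequence[float],
    key_index: int,
    key_reward: float = 1.0,  # hypothetical bonus for the key-action
    decay: float = 0.5,       # hypothetical factor; < 1 keeps earlier bonuses smaller
) -> List[float]:
    """Add a bonus to the key-action and smaller bonuses to preceding actions."""
    shaped = list(sparse_rewards)
    # Only successful episodes carry a success signal to redistribute.
    if not any(r > 0 for r in sparse_rewards):
        return shaped
    for t in range(key_index + 1):
        shaped[t] += key_reward * decay ** (key_index - t)
    return shaped


# Example episode: the predicted reward rises sharply after step 2.
predicted = [0.1, 0.1, 0.2, 0.9, 0.95]
sparse = [0.0, 0.0, 0.0, 0.0, 1.0]  # success only at the final step
key = detect_key_action(predicted)
shaped = shape_rewards(sparse, key)
```

In this sketch the bonus shrinks geometrically with distance from the key-action, so earlier actions always receive less than the key-action, and unsuccessful episodes are left unchanged.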
Advisors
Har, Dong Soo (하동수)
Description
Korea Advanced Institute of Science and Technology (KAIST): Cho Chun Shik Graduate School of Green Transportation
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2019
Identifier
325007
Language
eng
Description

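Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Cho Chun Shik Graduate School of Green Transportation, 2019.8, [iii, 46 p.]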

Keywords

Credit Assignment; Reward Shaping; Reinforcement Learning; Delayed Reward; Sparse Binary Reward; 신뢰할당; 보상변형; 강화학습; 지연보상; 희소이진보상

URI
http://hdl.handle.net/10203/285192
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=927180&flag=dissertation
Appears in Collection
GT-Theses_Master(석사논문)