Models and algorithms for inverse reinforcement learning

Reinforcement learning (RL) is the problem of how an agent can learn to behave optimally. A reward function plays a central role in determining optimality because it specifies how much reward or punishment is given in every situation. To formalize an RL problem, we thus need an appropriate reward function that describes the objective of the problem or the preference of the agent. Specifying such a function, however, is a difficult task in practice: the reward function is often hand-tuned by domain experts, iteratively, until a satisfactory strategy is obtained via RL algorithms. A systematic way to determine the reward function is therefore highly desirable to avoid this labor-intensive process.

The main focus of this thesis is inverse reinforcement learning (IRL), which aims to infer the reward function that a domain expert is optimizing from her behavior data. Since IRL provides a framework for exploring the principles behind behavior, it can be utilized in various research areas, such as examining human and animal behavior, building intelligent agents that imitate a demonstrator, and developing econometric models of decision making.

A number of studies on IRL algorithms have appeared in the literature during the last decade, but several challenges remain: (1) The IRL problem is inherently ill-posed: there are infinitely many reward functions that make the expert's behavior optimal. (2) The expert is generally assumed to behave optimally, but she may choose sub-optimal actions. (3) The behavior data is typically assumed to be generated by a single expert with a single reward function, yet in practice it is often gathered from a number of experts in order to obtain a sufficient amount of data. (4) When dealing with large problems, we assume that pre-defined features are given and find the reward function as a linear function of those features; however, it is difficult to specify features that compactly represent the reward structure. (5) Although the expert is genera...
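The abstract's point (4), representing the reward as a linear function of pre-defined features, underlies a classic family of IRL methods based on matching feature expectations, in the spirit of apprenticeship learning. The following is a minimal sketch of that loop, not the algorithm developed in this thesis: the toy chain MDP, one-hot features, step size, and iteration counts are all illustrative assumptions.

```python
# A minimal feature-matching IRL sketch on a toy 5-state chain MDP.
# The environment, features, and hyperparameters are illustrative assumptions.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.95
P = np.zeros((n_actions, n_states, n_states))   # P[a, s, s'] = Pr(s' | s, a)
for s in range(n_states):
    P[0, s, max(s - 1, 0)] = 1.0                # action 0: step left
    P[1, s, min(s + 1, n_states - 1)] = 1.0     # action 1: step right
phi = np.eye(n_states)                          # one-hot state features

def greedy_policy(r, iters=500):
    """Value iteration for state-reward vector r; returns a greedy policy."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = r[None, :] + gamma * (P @ V)        # Q[a, s]
        V = Q.max(axis=0)
    return Q.argmax(axis=0)                     # pi[s] -> action

def feature_expectations(pi, start=0, horizon=100):
    """Discounted expected feature counts when following pi from `start`."""
    P_pi = P[pi, np.arange(n_states)]           # row s is P[pi[s], s, :]
    d = np.zeros(n_states)
    d[start] = 1.0                              # state distribution at time t
    mu = np.zeros(n_states)
    for t in range(horizon):
        mu += (gamma ** t) * (d @ phi)
        d = d @ P_pi
    return mu

# "Expert" demonstrations summarized by feature expectations; here the
# expert always moves right, as if the true (hidden) reward favored state 4.
mu_expert = feature_expectations(np.ones(n_states, dtype=int))

# Feature-matching loop: raise the weight of features the expert visits
# more often than the current policy does, then re-solve the forward problem.
w = np.zeros(n_states)
for _ in range(50):
    pi = greedy_policy(phi @ w)                 # reward r(s) = w . phi(s)
    w += 0.1 * (mu_expert - feature_expectations(pi))

print("learned reward weights:", np.round(w, 2))
print("greedy policy under learned reward:", greedy_policy(phi @ w))
```

Even in this toy setting, many different weight vectors reproduce the expert's policy, which illustrates the ill-posedness described in challenge (1).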
Advisors
Kim, Kee-Eung (김기응)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2013
Identifier
566041/325007  / 020095375
Language
eng
Description

Thesis (Ph.D.) - KAIST : Department of Computer Science, 2013.8, [viii, 102 p.]

Keywords

Inverse Reinforcement Learning; Reinforcement Learning; Markov Decision Process (MDP); Partially Observable Markov Decision Process (POMDP)

URI
http://hdl.handle.net/10203/197804
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=566041&flag=dissertation
Appears in Collection
CS-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
