Online actor-critic method based on incrementally generated radial basis functions

Reinforcement learning is learning what to do so as to maximize a numerical reward signal. Unlike most forms of machine learning, the reinforcement learning agent is not told which actions to take; instead, it must discover which actions yield the most reward through interaction with its environment. Detailed information about the environment is not given to the agent either. Because of these properties, reinforcement learning is a natural approach to sequential decision problems. Direct reinforcement learning methods, such as Q-learning and SARSA, are widely used because of their simplicity, but they have difficulty with continuous state and action problems. Applying them requires a discretization step in advance, which can introduce the curse of dimensionality. In addition, the discontinuity of action selection in those methods can cause oscillation or divergence during learning. An alternative is the actor-critic method based on the policy gradient, which guarantees convergence to a locally optimal policy. In this thesis, a novel actor-critic method using an incrementally constructed radial basis function network is developed for continuous state and action problems. There is one local model for each basis function, and the number of local models increases as the basis function network grows. The normalized weighted sum of their outputs estimates the value function for the critic, and the models are updated with a heuristic method that uses the local temporal-difference error in the receptive field of the corresponding basis function. A Gaussian policy is used for continuous action, parameterized by its mean and standard deviation. These parameters are determined by the normalized weighted sum of the corresponding sub-parameters assigned to the basis functions, and the regular policy gradient method is used for their update process...
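
The following is a minimal, self-contained Python sketch of the kind of construction the abstract describes: a critic whose value estimate is a normalized weighted sum over an incrementally grown set of Gaussian basis functions, and a Gaussian actor whose mean and standard deviation are normalized weighted sums of per-basis sub-parameters updated with the regular policy gradient. It is an illustration under assumptions, not the thesis implementation (no files are attached to this record): the per-basis local models are reduced to scalar critic weights, the basis width and the novelty threshold for allocating a new basis are fixed constants, and all names (IncrementalRBFActorCritic, novelty_thresh, etc.) are hypothetical.

import numpy as np


class IncrementalRBFActorCritic:
    """Online actor-critic over an incrementally grown Gaussian RBF network (sketch)."""

    def __init__(self, state_dim, action_dim, width=0.5, novelty_thresh=0.3,
                 alpha_v=0.1, alpha_pi=0.01, gamma=0.99):
        self.width = width                            # shared Gaussian width (assumed fixed)
        self.novelty_thresh = novelty_thresh          # activation level below which a basis is added
        self.alpha_v, self.alpha_pi, self.gamma = alpha_v, alpha_pi, gamma
        self.centers = np.empty((0, state_dim))       # RBF centers, added online
        self.v = np.empty(0)                          # per-basis critic weights (stand-in for local models)
        self.mu = np.empty((0, action_dim))           # per-basis policy-mean sub-parameters
        self.log_sigma = np.empty((0, action_dim))    # per-basis policy log-std sub-parameters

    def _activations(self, s):
        if len(self.centers) == 0:
            return np.empty(0)
        d2 = np.sum((self.centers - s) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def _maybe_add_basis(self, s):
        # Grow the network when no existing basis responds strongly enough to the state.
        phi = self._activations(s)
        if phi.size == 0 or phi.max() < self.novelty_thresh:
            self.centers = np.vstack([self.centers, s])
            self.v = np.append(self.v, 0.0)
            self.mu = np.vstack([self.mu, np.zeros(self.mu.shape[1])])
            self.log_sigma = np.vstack([self.log_sigma, np.zeros(self.log_sigma.shape[1])])

    def _features(self, s):
        phi = self._activations(s)
        return phi / (phi.sum() + 1e-8)               # normalized basis activations

    def value(self, s):
        # Critic: normalized weighted sum of per-basis values.
        return float(self._features(s) @ self.v)

    def policy_params(self, s):
        # Actor: Gaussian policy whose mean/std are normalized weighted sums
        # of the sub-parameters attached to each basis function.
        w = self._features(s)
        return w @ self.mu, np.exp(w @ self.log_sigma)

    def act(self, s):
        s = np.asarray(s, dtype=float)
        self._maybe_add_basis(s)
        mean, sigma = self.policy_params(s)
        return np.random.normal(mean, sigma)

    def update(self, s, a, r, s_next, done):
        s, s_next = np.asarray(s, float), np.asarray(s_next, float)
        w = self._features(s)
        target = r if done else r + self.gamma * self.value(s_next)
        delta = target - self.value(s)                # TD error, used as the advantage
        self.v += self.alpha_v * delta * w            # critic update in the active receptive fields
        mean, sigma = self.policy_params(s)
        grad_mean = (a - mean) / sigma ** 2                     # d log pi / d mean
        grad_log_sigma = (a - mean) ** 2 / sigma ** 2 - 1.0     # d log pi / d log sigma
        self.mu += self.alpha_pi * delta * np.outer(w, grad_mean)
        self.log_sigma += self.alpha_pi * delta * np.outer(w, grad_log_sigma)
        return delta

A short usage example on a toy, hypothetical environment (2-D state, 1-D action) shows the intended online loop; the dynamics and reward below are placeholders, not anything from the thesis:

agent = IncrementalRBFActorCritic(state_dim=2, action_dim=1)
s = np.zeros(2)
for t in range(1000):
    a = agent.act(s)
    s_next, r = np.clip(s + 0.05 * a, -1.0, 1.0), -float(np.sum(s ** 2))
    agent.update(s, a, r, s_next, done=False)
    s = s_next
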
Advisors
Lee, Ju-Jang (이주장)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2013
Identifier
513482/325007  / 020075312
Language
eng
Description

Doctoral thesis (Ph.D.) - KAIST : Interdisciplinary Program in Robotics, 2013.2, [ vii, 100 p. ]

Keywords

Reinforcement learning; actor-critic; local model; policy gradient; function approximation

URI
http://hdl.handle.net/10203/179591
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=513482&flag=dissertation
Appears in Collection
RE-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
