Online actor-critic method based on incrementally generated radial basis functions
(점진적으로 생성되는 방사형 기저함수 기반 온라인 액터-크리틱 방법)

DC Field | Value | Language
dc.contributor.advisor | Lee, Ju-Jang | -
dc.contributor.advisor | 이주장 | -
dc.contributor.author | Lee, Dong-Hyun | -
dc.contributor.author | 이동현 | -
dc.date.accessioned | 2013-09-10T07:32:26Z | -
dc.date.available | 2013-09-10T07:32:26Z | -
dc.date.issued | 2013 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=513482&flag=dissertation | -
dc.identifier.uri | http://hdl.handle.net/10203/179591 | -
dc.description | 학위논문(박사) - 한국과학기술원 : 로봇공학학제전공, 2013.2, [ vii, 100 p. ] | -
dc.description.abstract | Reinforcement learning is learning what to do so as to maximize a numerical reward signal. The reinforcement learning agent is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward through interaction with its environment. Detailed information about the environment is not given to the agent either. Because of these properties, reinforcement learning is a natural approach to sequential decision problems. Direct reinforcement learning methods, such as Q-learning and SARSA, are widely used because of their simplicity, but they are difficult to apply to continuous state and action problems. To use them, a discretization step is needed in advance, which can bring on the curse of dimensionality. In addition, the discontinuity of action selection in those methods can cause oscillation or divergence during learning. An alternative is the actor-critic method using the policy gradient, which guarantees convergence to a locally optimal policy. In this thesis, a novel actor-critic method using an incrementally constructed radial basis function network is developed to deal with continuous state and action problems. There is one local model for each basis function, and the number of local models increases as the basis function network grows. The normalized weighted sum of their outputs is used to estimate the value function for the critic, and the models are updated with a heuristic method that uses the local temporal difference error in the receptive field of the corresponding basis function. A Gaussian policy is used for continuous actions, parameterized by its mean and standard deviation. These parameters are determined by the normalized weighted sum of the corresponding sub-parameters assigned to the basis functions, and the regular policy gradient method is used for their update proces... (a minimal illustrative sketch of this scheme follows the metadata table) | eng
dc.language | eng | -
dc.publisher | 한국과학기술원 | -
dc.subject | Reinforcement learning | -
dc.subject | actor-critic | -
dc.subject | local model | -
dc.subject | policy gradient | -
dc.subject | 강화학습 | -
dc.subject | 액터-크리틱 | -
dc.subject | 지역 모델 | -
dc.subject | 정책기울기 | -
dc.subject | 함수 추정 | -
dc.subject | function approximation | -
dc.title | Online actor-critic method based on incrementally generated radial basis functions | -
dc.title.alternative | 점진적으로 생성되는 방사형 기저함수 기반 온라인 액터-크리틱 방법 | -
dc.type | Thesis(Ph.D) | -
dc.identifier.CNRN | 513482/325007 | -
dc.description.department | 한국과학기술원 : 로봇공학학제전공, | -
dc.identifier.uid | 020075312 | -
dc.contributor.localauthor | Lee, Ju-Jang | -
dc.contributor.localauthor | 이주장 | -
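
The abstract above outlines the method: a critic that estimates the value function as the normalized weighted sum of local models attached to incrementally added radial basis functions, updated from the temporal difference error in each basis function's receptive field, and a Gaussian policy whose mean and standard deviation are normalized weighted sums of per-basis sub-parameters updated with the regular policy gradient. The Python sketch below illustrates that scheme under simplifying assumptions (one-dimensional action, a shared RBF width, a simple activation-threshold rule for adding bases, and TD-error-weighted local corrections in place of the thesis's specific heuristic update); the class name and all hyper-parameters are hypothetical, not taken from the thesis.

import numpy as np

class IncrementalRBFActorCritic:
    # Illustrative sketch only: normalized-RBF critic with local models and a
    # Gaussian policy, both grown incrementally. Update rules and parameters
    # are assumptions, not the thesis's exact formulation.
    def __init__(self, state_dim, width=0.5, novelty_thresh=0.3,
                 alpha_v=0.1, alpha_mu=0.01, alpha_sigma=0.01, gamma=0.99):
        self.width = width                    # shared RBF width (assumption)
        self.novelty_thresh = novelty_thresh  # add a basis if max activation is below this
        self.alpha_v, self.alpha_mu, self.alpha_sigma = alpha_v, alpha_mu, alpha_sigma
        self.gamma = gamma
        self.centers = np.empty((0, state_dim))  # RBF centers
        self.v = np.empty(0)          # local value estimates (critic's local models)
        self.mu = np.empty(0)         # per-basis sub-parameters for the policy mean
        self.log_sigma = np.empty(0)  # per-basis sub-parameters for the policy log-std

    def _activations(self, s):
        if len(self.centers) == 0:
            return np.empty(0)
        d2 = np.sum((self.centers - s) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def _maybe_add_basis(self, s):
        phi = self._activations(s)
        if phi.size == 0 or phi.max() < self.novelty_thresh:
            # no existing basis covers this state well: add one centered here
            self.centers = np.vstack([self.centers, s])
            self.v = np.append(self.v, 0.0)
            self.mu = np.append(self.mu, 0.0)
            self.log_sigma = np.append(self.log_sigma, 0.0)

    def value(self, s):
        phi = self._activations(s)
        w = phi / (phi.sum() + 1e-8)          # normalized basis weights
        return float(w @ self.v), w           # normalized weighted sum of local models

    def act(self, s):
        self._maybe_add_basis(s)
        _, w = self.value(s)
        mean = float(w @ self.mu)             # policy mean from per-basis sub-parameters
        sigma = float(np.exp(w @ self.log_sigma))
        return np.random.normal(mean, sigma), w

    def update(self, s, a, r, s_next, done):
        v_s, w = self.value(s)
        v_next, _ = self.value(s_next)
        delta = r + (0.0 if done else self.gamma * v_next) - v_s   # TD error
        # critic: correct each local model in proportion to its normalized weight
        self.v += self.alpha_v * delta * w
        # actor: regular Gaussian policy gradient, propagated to the sub-parameters
        mean = float(w @ self.mu)
        sigma = float(np.exp(w @ self.log_sigma))
        self.mu += self.alpha_mu * delta * ((a - mean) / sigma ** 2) * w
        self.log_sigma += self.alpha_sigma * delta * (((a - mean) ** 2 / sigma ** 2) - 1.0) * w
        return delta

In a typical episode loop one would call act(s) to sample an action, apply it to the environment, and then call update(s, a, r, s_next, done); new basis functions (and their local models) are added only when no existing basis is sufficiently activated by the visited state, so the network grows with the explored region rather than with a fixed discretization.
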
Appears in Collection
RE-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
