DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Lee, Ju-Jang | - |
dc.contributor.advisor | 이주장 | - |
dc.contributor.author | Lee, Dong-Hyun | - |
dc.contributor.author | 이동현 | - |
dc.date.accessioned | 2013-09-10T07:32:26Z | - |
dc.date.available | 2013-09-10T07:32:26Z | - |
dc.date.issued | 2013 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=513482&flag=dissertation | - |
dc.identifier.uri | http://hdl.handle.net/10203/179591 | - |
dc.description | Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Interdisciplinary Program in Robotics, 2013.2, [ vii, 100 p. ] | - |
dc.description.abstract | Reinforcement learning is learning what to do so as to maximize a numerical reward signal. The reinforcement learning agent is not told which actions to take, as in most forms of machine learning, but must instead discover which actions yield the most reward through interaction with its environment. Detailed information about the environment is not given to the agent either. Because of these properties, reinforcement learning is a natural approach to sequential decision problems. Direct reinforcement learning methods, such as Q-learning and SARSA, are widely used because of their simplicity, but they are difficult to apply to continuous state and action problems: the state and action spaces must be discretized in advance, which can bring on the curse of dimensionality, and the discontinuity of action selection in those methods can cause oscillation or divergence during learning. An alternative is the actor-critic method using the policy gradient, which guarantees convergence to a locally optimal policy. In this thesis, a novel actor-critic method using an incrementally constructed radial basis function network is developed to deal with continuous state and action problems. There is one local model for each basis function, and the number of local models increases as the basis function network grows. The normalized weighted sum of the local models' outputs is used to estimate the value function for the critic, and the models are updated with a heuristic method that uses the local temporal-difference error in the receptive field of the corresponding basis function. A Gaussian policy, parameterized by its mean and standard deviation, is used for continuous actions. These parameters are determined by the normalized weighted sum of the corresponding sub-parameters assigned to the basis functions, and the regular policy gradient method is used for their update process... | eng |
dc.language | eng | - |
dc.publisher | 한국과학기술원 | - |
dc.subject | Reinforcement learning | - |
dc.subject | actor-critic | - |
dc.subject | local model | - |
dc.subject | policy gradient | - |
dc.subject | 강화학습 | - |
dc.subject | 액터-크리틱 | - |
dc.subject | 지역 모델 | - |
dc.subject | 정책기울기 | - |
dc.subject | 함수 추정 | - |
dc.subject | function approximation | - |
dc.title | Online actor-critic method based on incrementally generated radial basis functions | - |
dc.title.alternative | 점진적으로 생성되는 방사형 기저함수 기반 온라인 액터-크리틱 방법 | - |
dc.type | Thesis(Ph.D) | - |
dc.identifier.CNRN | 513482/325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : Interdisciplinary Program in Robotics | - |
dc.identifier.uid | 020075312 | - |
dc.contributor.localauthor | Lee, Ju-Jang | - |
dc.contributor.localauthor | 이주장 | - |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
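The abstract's scheme can be illustrated with a minimal sketch: a critic whose value estimate is a normalized weighted sum over Gaussian radial basis functions, and a Gaussian policy whose mean and standard deviation are normalized weighted sums of per-basis sub-parameters updated by the regular policy gradient. This is an illustration under stated assumptions, not the thesis's implementation: a fixed RBF grid stands in for the incrementally grown network, the local-model heuristic is replaced by plain TD(0), and the 1-D task, clipping bounds, and step sizes are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a fixed grid of Gaussian RBFs over a 1-D state space.
# (The thesis adds basis functions incrementally; here they are fixed.)
centers = np.linspace(-1.0, 1.0, 9)
width = 0.25

def features(s):
    """Normalized RBF activations for a scalar state s (they sum to 1)."""
    phi = np.exp(-0.5 * ((s - centers) / width) ** 2)
    return phi / phi.sum()

v = np.zeros_like(centers)          # critic: per-basis value weights
mu_w = np.zeros_like(centers)       # actor: per-basis mean sub-parameters
log_sig_w = np.zeros_like(centers)  # actor: per-basis log-std sub-parameters

def step(s, alpha_v=0.05, alpha_pi=0.005, gamma=0.95):
    """One interaction: sample a Gaussian action, observe a reward, update."""
    global v, mu_w, log_sig_w
    phi = features(s)
    # Policy mean/std are normalized weighted sums of the sub-parameters.
    mu = phi @ mu_w
    sigma = np.exp(np.clip(phi @ log_sig_w, -2.0, 1.0))  # clipped for stability
    a = np.clip(rng.normal(mu, sigma), -2.0, 2.0)
    r = -(a + s) ** 2               # toy reward: the best action is a = -s
    s_next = np.clip(s + 0.1 * a, -1.0, 1.0)
    delta = r + gamma * (features(s_next) @ v) - phi @ v  # TD error
    v += alpha_v * delta * phi      # critic: TD(0) on the weighted sum
    # Actor: regular policy-gradient update of the Gaussian's sub-parameters,
    # with the TD error standing in for the advantage.
    mu_w += alpha_pi * delta * ((a - mu) / sigma**2) * phi
    log_sig_w += alpha_pi * delta * (((a - mu) ** 2) / sigma**2 - 1.0) * phi
    return s_next

s = 0.5
for _ in range(500):
    s = step(s)
```

Because the features are normalized, each update is localized to the receptive fields active at the current state, which is the property the abstract exploits when it grows the network online.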