DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Lee, Ju-Jang | - |
dc.contributor.advisor | 이주장 | - |
dc.contributor.author | Lee, Dong-Hyun | - |
dc.contributor.author | 이동현 | - |
dc.date.accessioned | 2013-09-10T07:32:26Z | - |
dc.date.available | 2013-09-10T07:32:26Z | - |
dc.date.issued | 2013 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=513482&flag=dissertation | - |
dc.identifier.uri | http://hdl.handle.net/10203/179591 | - |
dc.description | Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Interdisciplinary Program in Robotics, 2013.2, [ vii, 100 p. ] | - |
dc.description.abstract | Reinforcement learning is learning what to do so as to maximize a numerical reward signal. The reinforcement learning agent is not told which actions to take, as in most forms of machine learning, but must instead discover which actions yield the most reward through interaction with its environment. Detailed information about the environment is not given to the agent either. Because of these properties, reinforcement learning is a natural approach to sequential decision problems. Direct reinforcement learning methods, such as Q-learning and SARSA, are widely used because of their simplicity, but they are difficult to apply to continuous state and action problems: the state and action spaces must be discretized in advance, which can bring on the curse of dimensionality, and the discontinuity of action selection in those methods can cause oscillation or divergence during learning. An alternative is the actor-critic method using the policy gradient, which guarantees convergence to a locally optimal policy. In this thesis, a novel actor-critic method using an incrementally constructed radial basis function network is developed to deal with continuous state and action problems. There is one local model for each basis function, and the number of local models increases as the basis function network grows. The normalized weighted sum of the local models' outputs is used to estimate the value function for the critic, and the models are updated with a heuristic method that uses the local temporal-difference error in the receptive field of the corresponding basis function. A Gaussian policy, parameterized by its mean and standard deviation, is used for continuous actions. These parameters are determined by the normalized weighted sum of the corresponding sub-parameters assigned to the basis functions, and the regular policy gradient method is used for their update process... | eng |
dc.language | eng | - |
dc.publisher | 한국과학기술원 | - |
dc.subject | Reinforcement learning | - |
dc.subject | actor-critic | - |
dc.subject | local model | - |
dc.subject | policy gradient | - |
dc.subject | 강화학습 | - |
dc.subject | 액터-크리틱 | - |
dc.subject | 지역 모델 | - |
dc.subject | 정책기울기 | - |
dc.subject | 함수 추정 | - |
dc.subject | function approximation | - |
dc.title | Online actor-critic method based on incrementally generated radial basis functions | - |
dc.title.alternative | 점진적으로 생성되는 방사형 기저함수 기반 온라인 액터-크리틱 방법 | - |
dc.type | Thesis(Ph.D) | - |
dc.identifier.CNRN | 513482/325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : Interdisciplinary Program in Robotics | - |
dc.identifier.uid | 020075312 | - |
dc.contributor.localauthor | Lee, Ju-Jang | - |
dc.contributor.localauthor | 이주장 | - |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
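The abstract's scheme can be illustrated with a minimal sketch: a critic whose value estimate is a normalized weighted sum over Gaussian radial basis functions, and a Gaussian policy whose mean and standard deviation are normalized weighted sums of per-basis sub-parameters updated by the regular policy gradient. This is an illustration under stated assumptions, not the thesis's implementation: a fixed RBF grid stands in for the incrementally grown network, the local-model heuristic is replaced by plain TD(0), and the 1-D task, clipping bounds, and step sizes are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a fixed grid of Gaussian RBFs over a 1-D state space.
# (The thesis adds basis functions incrementally; here they are fixed.)
centers = np.linspace(-1.0, 1.0, 9)
width = 0.25

def features(s):
    """Normalized RBF activations for a scalar state s (they sum to 1)."""
    phi = np.exp(-0.5 * ((s - centers) / width) ** 2)
    return phi / phi.sum()

v = np.zeros_like(centers)          # critic: per-basis value weights
mu_w = np.zeros_like(centers)       # actor: per-basis mean sub-parameters
log_sig_w = np.zeros_like(centers)  # actor: per-basis log-std sub-parameters

def step(s, alpha_v=0.05, alpha_pi=0.005, gamma=0.95):
    """One interaction: sample a Gaussian action, observe a reward, update."""
    global v, mu_w, log_sig_w
    phi = features(s)
    # Policy mean/std are normalized weighted sums of the sub-parameters.
    mu = phi @ mu_w
    sigma = np.exp(np.clip(phi @ log_sig_w, -2.0, 1.0))  # clipped for stability
    a = np.clip(rng.normal(mu, sigma), -2.0, 2.0)
    r = -(a + s) ** 2               # toy reward: the best action is a = -s
    s_next = np.clip(s + 0.1 * a, -1.0, 1.0)
    delta = r + gamma * (features(s_next) @ v) - phi @ v  # TD error
    v += alpha_v * delta * phi      # critic: TD(0) on the weighted sum
    # Actor: regular policy-gradient update of the Gaussian's sub-parameters,
    # with the TD error standing in for the advantage.
    mu_w += alpha_pi * delta * ((a - mu) / sigma**2) * phi
    log_sig_w += alpha_pi * delta * (((a - mu) ** 2) / sigma**2 - 1.0) * phi
    return s_next

s = 0.5
for _ in range(500):
    s = step(s)
```

Because the features are normalized, each update is localized to the receptive fields active at the current state, which is the property the abstract exploits when it grows the network online.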