Algorithms for Model-based Bayesian Reinforcement Learning = 모델 기반 베이지안 강화학습 알고리즘

Reinforcement learning (RL) is the problem of an agent that aims to maximize long-term reward while interacting with an unknown environment. A fundamental problem in RL is the exploration-exploitation tradeoff: the agent must balance exploring to gather more information about the environment against exploiting its current knowledge to maximize cumulative reward. The main focus of this thesis is model-based Bayesian reinforcement learning (BRL), which provides a principled framework for an optimal exploration-exploitation tradeoff from the Bayesian perspective. Formally, when the environment is assumed to be a Markov decision process (MDP), the Bayesian model under uncertainty in the environment parameters is defined as a Bayes-adaptive Markov decision process (BAMDP), which can be seen as a special case of a partially observable Markov decision process (POMDP). Although the BAMDP model provides a succinct formulation of model-based BRL, obtaining the Bayes-optimal policy remains computationally challenging. Many model-based BRL algorithms therefore rely on one of two approaches: approximate model construction or real-time search. In this thesis, we develop novel algorithms for finding the Bayes-optimal policy in both approaches. First, we propose an optimistic MDP construction algorithm, Bayesian Optimistic Kullback-Leibler Exploration (BOKLE), and provide a PAC-BAMDP analysis. We then propose a real-time heuristic search algorithm, Anytime Error Minimization Search for model-based BRL (AEMS-BRL), a natural adaptation of a well-known online POMDP planning algorithm to the model-based BRL setting. In addition, we suggest tighter value function bounds and integrate them into AEMS-BRL to improve the efficiency of search. We experimentally show that these contributions significantly improve learning performance in standard BRL domains.
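The BAMDP formulation mentioned above is commonly realized by maintaining a Dirichlet posterior over the unknown transition probabilities of each state-action pair; the "belief" is then just a table of pseudo-counts that grows with experience. The following is a minimal sketch of that count-based belief update (class and method names are illustrative, not from the thesis):

```python
from collections import defaultdict

class DirichletBelief:
    """Belief over an unknown MDP's transition model, kept as Dirichlet
    pseudo-counts per (state, action) pair -- the hyperstate that, together
    with the physical state, makes up a BAMDP state."""

    def __init__(self, n_states, prior=1.0):
        self.n_states = n_states
        # counts[(s, a)][s'] = prior pseudo-count + observed transitions
        self.counts = defaultdict(lambda: [prior] * n_states)

    def update(self, s, a, s_next):
        """Bayes update after observing a transition (s, a, s')."""
        self.counts[(s, a)][s_next] += 1.0

    def expected_transition(self, s, a):
        """Posterior mean P(s' | s, a): normalized Dirichlet counts."""
        c = self.counts[(s, a)]
        total = sum(c)
        return [x / total for x in c]

belief = DirichletBelief(n_states=3)
belief.update(0, 0, 2)
belief.update(0, 0, 2)
print(belief.expected_transition(0, 0))  # [0.2, 0.2, 0.6]
```

Because the belief is conjugate, each observation is a constant-time count increment; the computational difficulty of BAMDPs lies not in the belief update but in planning over the exponentially growing space of count configurations, which is what approximate constructions such as BOKLE and search methods such as AEMS-BRL address.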
Advisors
Kim, Kee-Eung (김기응)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2019
Identifier
325007
Language
eng
Description

Doctoral dissertation (Ph.D.) - Korea Advanced Institute of Science and Technology: School of Computing, 2019.2, [iv, 88 p.]

Keywords

Model-based Bayesian reinforcement learning; Bayes-adaptive Markov decision process; PAC-BAMDP; AEMS-BRL; belief-dependent value function bounds

URI
http://hdl.handle.net/10203/265309
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=842409&flag=dissertation
Appears in Collection
CS-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
