Algorithms for model-based Bayesian reinforcement learning = Model-based Bayesian reinforcement learning algorithms

DC Field / Value
dc.contributor.advisor: Kim, Kee-Eung
dc.contributor.author: Lee, Kanghoon
dc.description: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2019.2, [iv, 88 p.]
dc.description.abstract: Reinforcement learning (RL) is the problem of an agent maximizing long-term rewards while interacting with an unknown environment. A fundamental problem in RL is the exploration-exploitation tradeoff: the agent must balance exploring to gather more information about the environment against exploiting its current knowledge to maximize cumulative reward. The main focus of this thesis is model-based Bayesian reinforcement learning (BRL), which provides a principled framework for an optimal exploration-exploitation tradeoff from the Bayesian perspective. Formally, when the environment is assumed to be a Markov decision process (MDP), the Bayesian model under uncertainty in the environment parameters is defined as a Bayes-adaptive Markov decision process (BAMDP), which can be seen as a special case of a partially observable Markov decision process (POMDP). Although the BAMDP model provides a succinct formulation of model-based BRL, obtaining the Bayes-optimal policy remains a computational challenge. Therefore, many model-based BRL algorithms rely on one of two approaches: approximate model construction or real-time search. In this thesis, we develop novel algorithms for finding the Bayes-optimal policy in both approaches. First, we propose an optimistic MDP construction algorithm, Bayesian Optimistic Kullback-Leibler Exploration (BOKLE), and provide a PAC-BAMDP analysis. We then propose a real-time heuristic search algorithm, Anytime Error Minimization Search for model-based BRL (AEMS-BRL), a natural adaptation of a well-known online POMDP planning algorithm to the model-based BRL setting. In addition, we derive tighter value function bounds and integrate them into AEMS-BRL to improve the efficiency of the search. We show experimentally that these contributions significantly improve learning performance in standard BRL domains.
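The abstract's BAMDP formulation treats the unknown MDP transition model as part of the state: the agent maintains a posterior belief over transition probabilities, and each observed transition updates that belief. A minimal sketch of this idea, assuming independent Dirichlet priors per (state, action) pair (the class name and API below are illustrative, not from the thesis):

```python
from collections import defaultdict

class BamdpBelief:
    """Posterior over an unknown MDP's transition model, kept as
    Dirichlet counts per (state, action) pair. Observing a transition
    (s, a, s') is a conjugate update: increment a single count."""

    def __init__(self, n_states, prior=1.0):
        self.n_states = n_states
        self.prior = prior  # symmetric Dirichlet prior pseudo-count
        self.counts = defaultdict(lambda: [0.0] * n_states)

    def update(self, s, a, s_next):
        # The BAMDP "hyper-state" is (physical state, belief); this is
        # the belief half of its transition.
        self.counts[(s, a)][s_next] += 1.0

    def posterior_mean(self, s, a):
        # Expected transition distribution under the Dirichlet posterior.
        c = self.counts[(s, a)]
        total = sum(c) + self.prior * self.n_states
        return [(ci + self.prior) / total for ci in c]

belief = BamdpBelief(n_states=3)
belief.update(0, 0, 2)
belief.update(0, 0, 2)
print(belief.posterior_mean(0, 0))  # posterior mass shifts toward state 2
```

Planning in the BAMDP means optimizing over such belief-augmented states, which is what makes computing the Bayes-optimal policy intractable in general and motivates the optimistic-construction and real-time-search approaches the thesis develops.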
dc.subject: Model-based Bayesian reinforcement learning; Bayes-adaptive Markov decision process; PAC-BAMDP; AEMS-BRL; belief-dependent value function bounds
dc.subject (Korean): Model-based Bayesian reinforcement learning; Bayes-adaptive Markov decision process; PAC learning in Bayes-adaptive MDPs; anytime error minimization tree search for Bayesian RL; value function bounds dependent on the model-belief state
dc.title: Algorithms for model-based Bayesian reinforcement learning = Model-based Bayesian reinforcement learning algorithms
dc.description.department: Korea Advanced Institute of Science and Technology (KAIST), School of Computing
Files in This Item: There are no files associated with this item.



