Algorithms for Model-Based Bayesian Reinforcement Learning

Reinforcement learning (RL) is the problem of an agent that aims to maximize long-term reward while interacting with an unknown environment. A fundamental problem in RL is the exploration-exploitation tradeoff: the agent must balance exploring to gather more information about the environment against exploiting its current knowledge to maximize cumulative reward. The main focus of this thesis is model-based Bayesian reinforcement learning (BRL), which provides a principled framework for an optimal exploration-exploitation tradeoff from the Bayesian perspective. Formally, when the environment is assumed to be a Markov decision process (MDP), the Bayesian model under uncertainty in the environment parameters is defined as a Bayes-adaptive Markov decision process (BAMDP), which can be seen as a special case of a partially observable Markov decision process (POMDP). Although the BAMDP model provides a succinct formulation of model-based BRL, computing the Bayes-optimal policy remains a computational challenge. Many model-based BRL algorithms therefore rely on one of two approaches: approximate model construction or real-time search. In this thesis, we develop novel algorithms for finding the Bayes-optimal policy in both approaches. First, we propose an optimistic MDP construction algorithm, Bayesian Optimistic Kullback-Leibler Exploration (BOKLE), and provide a PAC-BAMDP analysis. We then propose a real-time heuristic search algorithm, Anytime Error Minimization Search for model-based BRL (AEMS-BRL), a natural adaptation of a well-known online POMDP planning algorithm to the model-based BRL setting. In addition, we derive tighter value function bounds and integrate them into AEMS-BRL to improve the efficiency of search. We experimentally show that these contributions significantly improve learning performance in standard BRL domains.
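To make the BAMDP formulation above concrete, the following is a minimal sketch of the standard Bayes-adaptive belief representation for a discrete MDP, assuming Dirichlet priors over transition probabilities. All names here (BayesAdaptiveBelief, update, expected_model, sample_mdp) are illustrative and not taken from the thesis; the thesis's own algorithms (BOKLE, AEMS-BRL) build planning and bound computations on top of a belief of this kind.

```python
# A minimal sketch of a Bayes-adaptive belief over a finite MDP's
# transition model, assuming symmetric Dirichlet priors. Illustrative
# only; not the thesis's implementation.
import numpy as np


class BayesAdaptiveBelief:
    """Dirichlet posterior over the transition model of a finite MDP.

    The BAMDP "hyper-state" is the pair (physical state, belief); here
    the belief is summarized by the Dirichlet count array alpha[s, a, s'].
    """

    def __init__(self, n_states: int, n_actions: int, prior: float = 1.0):
        # Symmetric Dirichlet prior: alpha[s, a, s'] = prior everywhere.
        self.alpha = np.full((n_states, n_actions, n_states), prior)

    def update(self, s: int, a: int, s_next: int) -> None:
        # Conjugate posterior update: observing (s, a, s') adds one count.
        self.alpha[s, a, s_next] += 1.0

    def expected_model(self) -> np.ndarray:
        # Posterior-mean transition probabilities P(s' | s, a).
        return self.alpha / self.alpha.sum(axis=2, keepdims=True)

    def sample_mdp(self, rng: np.random.Generator) -> np.ndarray:
        # Draw one plausible transition model from the posterior;
        # sampling models like this underlies common BRL baselines.
        flat = self.alpha.reshape(-1, self.alpha.shape[2])
        samples = np.stack([rng.dirichlet(row) for row in flat])
        return samples.reshape(self.alpha.shape)


# Usage: update the belief as transitions are observed.
belief = BayesAdaptiveBelief(n_states=3, n_actions=2)
belief.update(s=0, a=1, s_next=2)
print(belief.expected_model()[0, 1])  # posterior-mean row for (s=0, a=1)
```

Because the belief is conjugate, each observed transition is a constant-time count increment; the hard part, which the thesis addresses, is planning over the resulting (state, belief) space, whether by constructing an optimistic MDP from these counts or by searching the belief tree online with value function bounds.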
Advisors
Kim, Kee-Eung
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2019
Identifier
325007
Language
eng
Description

Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2019.2, [iv, 88 p.]

Keywords

Model-based Bayesian reinforcement learning; Bayes-adaptive Markov decision process; PAC-BAMDP; AEMS-BRL; belief-dependent value function bounds

URI
http://hdl.handle.net/10203/265309
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=842409&flag=dissertation
Appears in Collection
CS-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.