Improving upper confidence reinforcement learning with bootstrapping
(Korean title, translated: Use of bootstrap techniques for efficient exploration in reinforcement learning)

Upper confidence reinforcement learning (UCRL) algorithms have been shown to be very effective for solving online reinforcement learning problems in which the environment is a Markov decision process (MDP) with unknown reward distributions and unknown state transition probabilities. Analogously to the upper confidence bound (UCB) algorithm, a UCRL algorithm constructs a set of plausible MDPs that contains the true MDP with high probability, and finds an exploration policy based on an optimistic interpretation of this confidence set. To achieve an optimal balance between exploration and exploitation, it is crucial to make the set of plausible MDPs as tight as possible. We introduce bootstrap techniques into the construction of the set of plausible MDPs, in addition to the concentration inequalities, such as Hoeffding's inequality and the empirical Bernstein inequality, used in previous UCRL algorithms. By doing so, we can make fuller use of the whole distribution of the given data, thereby making the set of plausible MDPs tighter while preserving theoretical guarantees on worst-case performance. We demonstrate through experiments that our proposed bootstrapping UCRL algorithms improve on the existing UCRL algorithms by 5% to 30% in terms of cumulative regret, and we also provide a theoretical analysis showing that this improvement can be achieved without degrading their performance guarantees.
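To illustrate why a bootstrap-based bound can be tighter than a concentration-inequality bound, the sketch below compares two upper confidence bounds on the mean of rewards in [0, 1]. It is not the thesis's algorithm: it assumes a plain percentile bootstrap (which by itself carries no worst-case guarantee), and the function names `hoeffding_upper`, `bootstrap_upper`, and `demo`, the Bernoulli reward model, and all parameter values are hypothetical choices made for illustration only.

```python
import math
import random


def hoeffding_upper(samples, delta):
    """Upper bound on the mean of [0, 1]-valued samples from Hoeffding's
    inequality; holds with probability at least 1 - delta."""
    n = len(samples)
    mean = sum(samples) / n
    return mean + math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def bootstrap_upper(samples, delta, n_resamples=2000, seed=0):
    """Percentile-bootstrap upper bound: the (1 - delta) quantile of the
    means of resamples drawn with replacement (illustrative assumption,
    not the construction used in the thesis)."""
    rng = random.Random(seed)
    n = len(samples)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(samples, k=n)  # draw n points with replacement
        means.append(sum(resample) / n)
    means.sort()
    idx = min(n_resamples - 1, math.ceil((1.0 - delta) * n_resamples) - 1)
    return means[idx]


def demo(n=200, delta=0.05, seed=1):
    """Low-variance Bernoulli(0.05) rewards, where Hoeffding is loose
    because it ignores the observed spread of the data."""
    rng = random.Random(seed)
    samples = [1.0 if rng.random() < 0.05 else 0.0 for _ in range(n)]
    mean = sum(samples) / n
    return mean, hoeffding_upper(samples, delta), bootstrap_upper(samples, delta)


if __name__ == "__main__":
    mean, hoeff, boot = demo()
    print(f"empirical mean = {mean:.3f}")
    print(f"Hoeffding UCB  = {hoeff:.3f}")
    print(f"bootstrap UCB  = {boot:.3f}")
```

Hoeffding's radius depends only on the sample size and the range of the rewards, whereas the bootstrap quantile reflects the empirical distribution, so for low-variance data the bootstrap bound is typically much closer to the empirical mean.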
Advisors
Min, Seungki (민승기); Kim, Kyoung-Kuk (김경국)
Description
Korea Advanced Institute of Science and Technology (KAIST): Department of Industrial and Systems Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Department of Industrial and Systems Engineering, 2022.2, [iii, 30 p.]

URI
http://hdl.handle.net/10203/308787
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=997783&flag=dissertation
Appears in Collection
IE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
