DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Min, Seungki | - |
dc.contributor.advisor | 민승기 | - |
dc.contributor.advisor | Kim, Kyoung-Kuk | - |
dc.contributor.advisor | 김경국 | - |
dc.contributor.author | Kim, Sanghwa | - |
dc.date.accessioned | 2023-06-23T19:31:09Z | - |
dc.date.available | 2023-06-23T19:31:09Z | - |
dc.date.issued | 2022 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=997783&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/308787 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Department of Industrial and Systems Engineering, 2022.2, [iii, 30 p.] | - |
dc.description.abstract | Upper confidence reinforcement learning (UCRL) algorithms have proven highly effective for online reinforcement learning problems in which the environment is a Markov decision process (MDP) with unknown reward distributions and unknown state transition probabilities. Analogously to the upper confidence bound (UCB) algorithm, a UCRL algorithm constructs a set of plausible MDPs that contains the true MDP with high probability, and derives an exploration policy from an optimistic interpretation of this confidence set. To achieve an optimal balance between exploration and exploitation, it is crucial to make the set of plausible MDPs as tight as possible. We introduce bootstrap techniques into the construction of the set of plausible MDPs, complementing the concentration inequalities, such as Hoeffding's inequality and the empirical Bernstein inequality, used in previous UCRL algorithms. This allows us to exploit the whole distribution of the observed data, making the set of plausible MDPs tighter while preserving worst-case performance guarantees. We demonstrate through experiments that our proposed bootstrapping UCRL algorithms improve on existing UCRL algorithms by 5%-30% in terms of cumulative regret, and we provide theoretical analysis showing that this improvement comes without degrading their performance guarantees. | - |
dc.language | eng | - |
dc.publisher | 한국과학기술원 | - |
dc.title | Improving upper confidence reinforcement learning with bootstrapping | - |
dc.title.alternative | 강화학습에서의 효율적 탐색을 위한 부트스트랩 기법의 활용 | - |
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | KAIST: Department of Industrial and Systems Engineering | - |
dc.contributor.alternativeauthor | 김상화 | - |
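The abstract contrasts confidence bounds built from concentration inequalities (Hoeffding, empirical Bernstein) with bootstrap-based bounds. The sketch below is a minimal illustration of that idea for a single mean reward, not the thesis's actual algorithm: `hoeffding_ucb` and `bootstrap_ucb` are hypothetical helper names, and the bootstrap bound is taken as a quantile of resampled means.

```python
import math
import random

def hoeffding_ucb(samples, delta):
    # Hoeffding upper confidence bound for the mean of [0, 1]-valued samples:
    # empirical mean plus sqrt(log(1/delta) / (2n)).
    n = len(samples)
    mean = sum(samples) / n
    return mean + math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def bootstrap_ucb(samples, delta, n_boot=1000, seed=0):
    # Bootstrap upper confidence bound: resample with replacement,
    # recompute the mean for each resample, and take the
    # (1 - delta)-quantile of the bootstrap means. Unlike Hoeffding,
    # this adapts to the spread of the observed data.
    rng = random.Random(seed)
    n = len(samples)
    boot_means = sorted(
        sum(rng.choice(samples) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    idx = min(n_boot - 1, int((1.0 - delta) * n_boot))
    return boot_means[idx]
```

For data whose empirical variance is well below the worst case, the bootstrap quantile is typically tighter than the distribution-free Hoeffding radius, which is the effect the abstract exploits to shrink the set of plausible MDPs.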