DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Min, Seungki | - |
dc.contributor.advisor | 민승기 | - |
dc.contributor.advisor | Kim, Kyoung-Kuk | - |
dc.contributor.advisor | 김경국 | - |
dc.contributor.author | Kim, Sanghwa | - |
dc.date.accessioned | 2023-06-23T19:31:09Z | - |
dc.date.available | 2023-06-23T19:31:09Z | - |
dc.date.issued | 2022 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=997783&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/308787 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Department of Industrial and Systems Engineering, 2022.2, [iii, 30 p.] | - |
dc.description.abstract | Upper confidence reinforcement learning (UCRL) algorithms have proven highly effective for online reinforcement learning problems in which the environment is a Markov decision process (MDP) with unknown reward distributions and unknown state transition probabilities. Analogously to the upper confidence bound (UCB) algorithm, a UCRL algorithm constructs a set of plausible MDPs that contains the true MDP with high probability, and derives an exploration policy from an optimistic interpretation of this confidence set. To achieve an optimal balance between exploration and exploitation, it is crucial to make the set of plausible MDPs as tight as possible. We introduce bootstrap techniques into the construction of the set of plausible MDPs, complementing the concentration inequalities, such as Hoeffding's inequality and the empirical Bernstein inequality, used in previous UCRL algorithms. This allows us to exploit the whole distribution of the observed data, making the set of plausible MDPs tighter while preserving worst-case performance guarantees. We demonstrate through experiments that our proposed bootstrapping UCRL algorithms improve on existing UCRL algorithms by 5%-30% in terms of cumulative regret, and we provide theoretical analysis showing that this improvement comes without degrading their performance guarantees. | - |
dc.language | eng | - |
dc.publisher | 한국과학기술원 | - |
dc.title | Improving upper confidence reinforcement learning with bootstrapping | - |
dc.title.alternative | 강화학습에서의 효율적 탐색을 위한 부트스트랩 기법의 활용 | - |
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | KAIST: Department of Industrial and Systems Engineering | - |
dc.contributor.alternativeauthor | 김상화 | - |
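The abstract contrasts confidence bounds built from concentration inequalities (Hoeffding, empirical Bernstein) with bootstrap-based bounds. The sketch below is a minimal illustration of that idea for a single mean reward, not the thesis's actual algorithm: `hoeffding_ucb` and `bootstrap_ucb` are hypothetical helper names, and the bootstrap bound is taken as a quantile of resampled means.

```python
import math
import random

def hoeffding_ucb(samples, delta):
    # Hoeffding upper confidence bound for the mean of [0, 1]-valued samples:
    # empirical mean plus sqrt(log(1/delta) / (2n)).
    n = len(samples)
    mean = sum(samples) / n
    return mean + math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def bootstrap_ucb(samples, delta, n_boot=1000, seed=0):
    # Bootstrap upper confidence bound: resample with replacement,
    # recompute the mean for each resample, and take the
    # (1 - delta)-quantile of the bootstrap means. Unlike Hoeffding,
    # this adapts to the spread of the observed data.
    rng = random.Random(seed)
    n = len(samples)
    boot_means = sorted(
        sum(rng.choice(samples) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    idx = min(n_boot - 1, int((1.0 - delta) * n_boot))
    return boot_means[idx]
```

For data whose empirical variance is well below the worst case, the bootstrap quantile is typically tighter than the distribution-free Hoeffding radius, which is the effect the abstract exploits to shrink the set of plausible MDPs.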