DSpace at KOASAS: Sequential decision-making with rotting rewards and infinitely many actions

Sequential decision-making with rotting rewards and infinitely many actions (감소하는 보상 및 무한히 많은 액션에서의 순차적 의사 결정)

DC Field	Value	Language
dc.contributor.advisor	Yun, Se-Young (윤세영)	-
dc.contributor.author	Kim, Jung-hun	-
dc.contributor.author	김정훈	-
dc.date.accessioned	2024-07-26T19:30:34Z	-
dc.date.available	2024-07-26T19:30:34Z	-
dc.date.issued	2023	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1046811&flag=dissertation	en_US
dc.identifier.uri	http://hdl.handle.net/10203/320861	-
dc.description	Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: Department of Industrial and Systems Engineering, 2023.8, [ii, 70 p.]	-
dc.description.abstract	In this thesis, we study the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm may decrease each time the arm is pulled and otherwise remains unchanged. We first study a simple model in which initial mean rewards are drawn from a uniform distribution and the rotting rate is constrained by a maximum rate $\varrho=o(1)$. For this model, we provide a regret lower bound and then propose an efficient algorithm that combines UCB with a threshold for detecting suboptimal arms and achieves a near-optimal regret bound. We then study a more general model in which initial mean rewards follow a power-function class of distributions with exponent parameter $\beta > 0$. For the rotting rewards, we consider two cases over a time horizon of $T$ time steps: one in which the cumulative amount of rotting is $V_T$, and one in which the number of rotting instances is $S_T$. We provide regret lower bounds for both the slow-rotting scenario with $V_T=o(T)$ and the abrupt-rotting scenario with $S_T=o(T)$. We then propose an adaptive window-UCB algorithm that controls the bias-variance trade-off induced by the rotting rewards, together with a generalized threshold value for detecting suboptimal arms. The proposed algorithm achieves near-optimal regret bounds for both scenarios under some conditions.	-
dc.language	eng	-
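dc.publisher	Korea Advanced Institute of Science and Technology (한국과학기술원)	-
dc.subject	Sequential decision making; Bandit algorithms; Rotting rewards; Infinitely many actions	-
dc.subject	Sequential decision making; Bandit algorithms; Rotting rewards; Infinitely many arms	-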
dc.title	Sequential decision-making with rotting rewards and infinitely many actions	-
dc.title.alternative	감소하는 보상 및 무한히 많은 액션에서의 순차적 의사 결정	-
dc.type	Thesis(Ph.D)	-
dc.identifier.CNRN	325007	-
dc.description.department	Korea Advanced Institute of Science and Technology: Department of Industrial and Systems Engineering	-
dc.contributor.alternativeauthor	Yun, Se-Young	-
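
The first model described in the abstract (uniform initial mean rewards, a maximum rotting rate, and UCB combined with a discard threshold) can be illustrated with a toy simulation. The sketch below is not the thesis's algorithm: the function name `simulate_threshold_ucb`, the Gaussian noise, the confidence-radius formula, the choice to rot every arm by exactly `rot_rate` per pull, and all parameter values are assumptions made for illustration only.

```python
import math
import random


def simulate_threshold_ucb(horizon, rot_rate, threshold, noise_std=0.1, seed=0):
    """Toy 'pull until the UCB drops below a threshold' policy (hypothetical).

    Returns (sum of mean rewards collected, number of arms sampled).
    """
    rng = random.Random(seed)
    total_mean_reward = 0.0
    arms_sampled = 0
    t = 0
    while t < horizon:
        # Draw a fresh arm; assumption: initial mean is uniform on [0, 1).
        mean = rng.random()
        arms_sampled += 1
        pulls = 0
        reward_sum = 0.0
        while t < horizon:
            # Observe a noisy reward around the arm's current mean.
            reward = mean + rng.gauss(0.0, noise_std)
            total_mean_reward += mean
            reward_sum += reward
            pulls += 1
            t += 1
            # Assumption: the arm rots by the maximum rate after every pull.
            mean -= rot_rate
            # Upper confidence bound built from the plain empirical average.
            radius = noise_std * math.sqrt(2.0 * math.log(horizon) / pulls)
            ucb = reward_sum / pulls + radius
            # Discard the arm once even its optimistic estimate looks poor.
            if ucb < threshold:
                break
    return total_mean_reward, arms_sampled


if __name__ == "__main__":
    total, arms = simulate_threshold_ucb(horizon=10_000, rot_rate=0.001, threshold=0.8)
    print(f"collected mean reward: {total:.1f}, arms sampled: {arms}")
```

Because the plain average includes rewards observed before the arm rotted, it lags behind the arm's current mean. Restricting the average to a window of recent pulls reduces that bias at the cost of higher variance, which is the kind of trade-off the abstract attributes to its adaptive window-UCB algorithm.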

Appears in Collection: IE-Theses_Ph.D.(박사논문)

Files in This Item: There are no files associated with this item.

KOASAS

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
