DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 윤세영 | - |
dc.contributor.author | Kim, Jung-hun | - |
dc.contributor.author | 김정훈 | - |
dc.date.accessioned | 2024-07-26T19:30:34Z | - |
dc.date.available | 2024-07-26T19:30:34Z | - |
dc.date.issued | 2023 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1046811&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/320861 | - |
dc.description | Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Department of Industrial and Systems Engineering, 2023.8, [ii, 70 p.] | - |
dc.description.abstract | This thesis studies the infinitely many-armed bandit problem with rotting rewards, in which the mean reward of an arm may decrease each time the arm is pulled and otherwise remains unchanged. We first consider a simple model in which initial mean rewards are drawn from a uniform distribution and the rotting rate is constrained by a maximum rotting rate $\varrho=o(1)$. For this model, we establish a regret lower bound and then propose an efficient algorithm that combines UCB with a threshold for detecting suboptimal arms and achieves a near-optimal regret bound. We then consider a more general model in which the initial mean rewards follow a distribution from a power-function class with exponent parameter $\beta > 0$. For the rotting rewards, we study two cases over a time horizon of $T$ time steps: one in which the cumulative amount of rotting is $V_T$, and one in which the number of rotting instances is $S_T$. We provide regret lower bounds for both the slow-rotting scenario with $V_T=o(T)$ and the abrupt-rotting scenario with $S_T=o(T)$. We then propose an adaptive window-UCB algorithm that controls the bias-variance trade-off induced by rotting rewards, together with a generalized threshold for detecting suboptimal arms. Under some conditions, the proposed algorithm achieves near-optimal regret bounds in both scenarios. | - |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | Sequential decision making; Bandit algorithms; Decreasing rewards; Infinitely many actions | - |
dc.subject | Sequential decision making; Bandit algorithms; Rotting rewards; Infinitely many arms | - |
dc.title | Sequential decision-making with rotting rewards and infinitely many actions | - |
dc.title.alternative | 감소하는 보상 및 무한히 많은 액션에서의 순차적 의사 결정 | - |
dc.type | Thesis(Ph.D) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST): Department of Industrial and Systems Engineering | - |
dc.contributor.alternativeauthor | Yun, Se-Young | - |
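
The abstract describes a policy that combines UCB with a threshold for discarding suboptimal arms. The Python sketch below illustrates that general idea only; it is not the algorithm from the thesis. It makes several assumptions that are not in the record: Gaussian noise, a worst-case rotting schedule in which the mean drops by exactly `rot_rate` after every pull, a confidence radius of `noise_std * sqrt(2 log T / n)`, and the hypothetical names `ucb_threshold_policy`, `rot_rate`, and `threshold`. Because means never increase, the current mean is at most the average of the past means, so the empirical average plus the radius serves as an upper confidence bound on the current mean.

```python
import math
import random


def ucb_threshold_policy(horizon, rot_rate, threshold, noise_std=0.1, seed=0):
    """Simulate a simplified UCB-with-threshold policy on a rotting bandit.

    Illustrative sketch only (all modelling choices are assumptions).
    Returns (total_reward, arms_used).
    """
    rng = random.Random(seed)
    total_reward, arms_used, t = 0.0, 0, 0
    while t < horizon:
        # Draw a fresh arm: initial mean reward ~ Uniform[0, 1).
        mean = rng.random()
        arms_used += 1
        pulls, reward_sum = 0, 0.0
        while t < horizon:
            # Pull the current arm and observe a noisy reward.
            reward = mean + rng.gauss(0.0, noise_std)
            total_reward += reward
            reward_sum += reward
            pulls += 1
            t += 1
            # Assumed worst-case rotting: the mean drops after every pull.
            mean -= rot_rate
            # Upper confidence bound on the arm's current mean.
            radius = noise_std * math.sqrt(2.0 * math.log(horizon) / pulls)
            ucb = reward_sum / pulls + radius
            # Discard the arm once even the optimistic estimate is too low.
            if ucb < 1.0 - threshold:
                break
    return total_reward, arms_used
```

A call such as `ucb_threshold_policy(horizon=1000, rot_rate=0.01, threshold=0.2)` returns the cumulative reward and the number of arms sampled. A smaller `threshold` discards arms more aggressively, and a larger one keeps each arm longer.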