Sequential decision-making with rotting rewards and infinitely many actions

| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.advisor | 윤세영 | - |
| dc.contributor.author | Kim, Jung-hun | - |
| dc.contributor.author | 김정훈 | - |
| dc.date.accessioned | 2024-07-26T19:30:34Z | - |
| dc.date.available | 2024-07-26T19:30:34Z | - |
| dc.date.issued | 2023 | - |
| dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1046811&flag=dissertation | en_US |
| dc.identifier.uri | http://hdl.handle.net/10203/320861 | - |
| dc.description | Doctoral dissertation (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Department of Industrial and Systems Engineering, 2023.8, [ii, 70 p.] | - |
| dc.description.abstract | In this thesis, we study the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm may decrease each time the arm is pulled and otherwise remains unchanged. We first study a simple model in which initial mean rewards are drawn from a uniform distribution and the rotting rate is constrained by a maximum rotting rate $\varrho=o(1)$. For this model, we provide a regret lower bound and then propose an efficient algorithm that uses UCB with a threshold for detecting suboptimal arms and achieves a near-optimal regret bound. We then study a more general model in which initial mean rewards follow a power-function class of distributions with exponent parameter $\beta > 0$. For the rotting rewards, we consider two cases: one in which the cumulative amount of rotting is $V_T$, and one in which the number of rotting instances is $S_T$, over a time horizon of $T$ time steps. We provide regret lower bounds for both the slow-rotting scenario with $V_T=o(T)$ and the abrupt-rotting scenario with $S_T=o(T)$. We then propose an adaptive window-UCB algorithm that controls the bias-variance trade-off arising from rotting rewards, together with a generalized threshold for detecting suboptimal arms. The proposed algorithm achieves near-optimal regret bounds in both scenarios under some conditions. | - |
| dc.language | eng | - |
| dc.publisher | Korea Advanced Institute of Science and Technology (한국과학기술원) | - |
| dc.subject | Sequential decision making; Bandit algorithms; Rotting rewards; Infinitely many actions | - |
| dc.subject | Sequential decision making; Bandit algorithms; Rotting rewards; Infinitely many arms | - |
| dc.title | Sequential decision-making with rotting rewards and infinitely many actions | - |
| dc.title.alternative | 감소하는 보상 및 무한히 많은 액션에서의 순차적 의사 결정 | - |
| dc.type | Thesis(Ph.D) | - |
| dc.identifier.CNRN | 325007 | - |
| dc.description.department | Korea Advanced Institute of Science and Technology (KAIST): Department of Industrial and Systems Engineering | - |
| dc.contributor.alternativeauthor | Yun, Se-Young | - |
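
The first model in the abstract (uniformly distributed initial means, rotting on each pull, and UCB combined with a threshold for discarding suboptimal arms) can be illustrated with a small simulation. The sketch below is not the thesis's algorithm, whose details are not given in this record. It assumes Gaussian noise with unit variance, a deterministic decay of `rho` per pull, a hand-picked threshold margin `delta`, and regret measured against a mean reward of 1. The function name `run_threshold_ucb` and all of its parameters are hypothetical.

```python
import math
import random


def run_threshold_ucb(horizon, rho, delta, seed=0):
    """Simulate a simple UCB-plus-threshold policy on rotting arms.

    Illustrative assumptions (not taken from the thesis): unit-variance
    Gaussian noise, a fixed decay of `rho` per pull, a threshold of
    1 - delta, and regret measured against a mean reward of 1.
    """
    rng = random.Random(seed)
    total_mean_reward = 0.0  # sum of true means of the arms actually pulled
    arms_tried = 0
    t = 0
    while t < horizon:
        # Draw a fresh arm whose initial mean is Uniform[0, 1).
        mean = rng.random()
        arms_tried += 1
        pulls, reward_sum = 0, 0.0
        while t < horizon:
            reward_sum += mean + rng.gauss(0.0, 1.0)  # noisy observation
            total_mean_reward += mean
            mean -= rho  # the arm rots after every pull
            pulls += 1
            t += 1
            # Upper confidence bound on this arm's average reward so far.
            ucb = reward_sum / pulls + math.sqrt(2.0 * math.log(horizon) / pulls)
            if ucb < 1.0 - delta:
                break  # arm looks suboptimal: discard it and draw a new one
    regret = horizon * 1.0 - total_mean_reward
    return regret, arms_tried


if __name__ == "__main__":
    regret, arms_tried = run_threshold_ucb(horizon=10_000, rho=0.001, delta=0.3)
    print(f"regret={regret:.1f}, arms tried={arms_tried}")
```

Because the arm is kept only while its upper confidence bound stays above the threshold, a rotting arm is eventually discarded once its empirical average falls far enough, which is the role the abstract assigns to the threshold.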
Appears in Collection
IE-Theses_Ph.D. (Doctoral theses)
Files in This Item
There are no files associated with this item.
