Sequential decision-making with rotting rewards and infinitely many actions

This thesis studies the infinitely many-armed bandit problem with rotting rewards, in which the mean reward of an arm may decrease each time the arm is pulled and otherwise remains unchanged. We first consider a simple model in which initial mean rewards are drawn from a uniform distribution and the per-pull rotting rate is bounded by a maximum rate $\varrho=o(1)$. For this model, we derive a regret lower bound and propose an efficient algorithm that combines UCB with a threshold for detecting suboptimal arms and achieves a near-optimal regret bound. We then consider a more general model in which initial mean rewards follow a power-function class of distributions with exponent parameter $\beta > 0$. For the rotting rewards, we study two cases over a time horizon of $T$ steps: one in which the cumulative amount of rotting is $V_T$, and one in which the number of rotting instances is $S_T$. We derive regret lower bounds for both the slow-rotting scenario with $V_T=o(T)$ and the abrupt-rotting scenario with $S_T=o(T)$. We then propose an adaptive window-UCB algorithm that controls the bias-variance trade-off induced by rotting rewards, together with a generalized threshold for detecting suboptimal arms. The proposed algorithm achieves near-optimal regret bounds in both scenarios under some conditions.
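The idea of pairing a UCB index with a discard threshold can be illustrated with a small simulation. The sketch below is not the algorithm from the thesis; it only shows one plausible form of the technique under stated assumptions: rewards have unit-variance Gaussian noise, every pull rots the arm by the full rate `rho`, the confidence radius is the textbook $\sqrt{2\log T / n}$, and the threshold `1 - delta` is a hypothetical tuning parameter. The function name `ucb_threshold_rotting` and all of its parameters are illustrative.

```python
import math
import random


def ucb_threshold_rotting(T, rho, delta, seed=0):
    """Simulate a UCB-plus-threshold policy on a rotting, infinitely many-armed bandit.

    Assumptions (illustrative, not taken from the thesis):
      - initial means are Uniform[0, 1];
      - each pull lowers the pulled arm's mean by exactly rho;
      - observed rewards carry Gaussian noise with standard deviation 1;
      - an arm is discarded once its UCB index falls below 1 - delta.
    Returns (regret against a benchmark mean of 1 per step, number of arms tried).
    """
    rng = random.Random(seed)
    collected_mean = 0.0  # sum of the true means of the pulled arms
    arms_used = 0
    t = 0
    while t < T:
        mean = rng.random()  # draw a fresh arm
        arms_used += 1
        pulls, reward_sum = 0, 0.0
        while t < T:
            reward_sum += mean + rng.gauss(0.0, 1.0)  # noisy observation
            collected_mean += mean
            pulls += 1
            t += 1
            mean -= rho  # the arm rots after being pulled
            # Empirical mean plus confidence radius.
            ucb = reward_sum / pulls + math.sqrt(2.0 * math.log(T) / pulls)
            if ucb < 1.0 - delta:
                break  # arm looks suboptimal: discard it and draw a new one
    regret = T * 1.0 - collected_mean
    return regret, arms_used


if __name__ == "__main__":
    regret, arms_used = ucb_threshold_rotting(T=10_000, rho=0.001, delta=0.2)
    print(f"regret={regret:.1f}, arms tried={arms_used}")
```

In this sketch, a smaller `delta` discards arms more aggressively, which spends more pulls on exploring fresh arms, while a larger `delta` keeps mediocre arms longer and lets rotting accumulate. How the threshold should scale with $\varrho$, $\beta$, $V_T$, or $S_T$ is exactly what the thesis analyzes and is not reproduced here.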
Advisors
윤세영
Description
Korea Advanced Institute of Science and Technology (KAIST): Department of Industrial and Systems Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Doctoral thesis (Ph.D.) - KAIST: Department of Industrial and Systems Engineering, 2023.8, [ii, 70 p.]

Keywords

Sequential decision making; Bandit algorithms; Rotting rewards; Infinitely many arms

URI
http://hdl.handle.net/10203/320861
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1046811&flag=dissertation
Appears in Collection
IE-Theses_Ph.D. (doctoral theses)
Files in This Item
There are no files associated with this item.