On developing a realtime learning-based control framework for network resource management

DC Field: Value
dc.contributor.advisor: 정송
dc.contributor.author: Bae, Jeongmin
dc.contributor.author: 배정민
dc.date.accessioned: 2024-07-26T19:30:56Z
dc.date.available: 2024-07-26T19:30:56Z
dc.date.issued: 2023
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1047266&flag=dissertation (en_US)
dc.identifier.uri: http://hdl.handle.net/10203/320966
dc.description: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering, 2023.8, [vi, 87 p.]
dc.description.abstract: As network environments become more complex and user demands become more diverse, the limitations of rule-based network management techniques are becoming apparent. Learning-based network resource management techniques that can learn optimal policies without human intervention have therefore been actively researched in recent years. However, under time-varying network environments, policies learned offline from a limited set of environments cannot guarantee optimal performance. In this study, we investigate realtime learning-based control for network resource management that can learn the given network environment quickly and data-efficiently with minimal performance loss. Specifically, we propose a realtime learning-based control framework for downlink scheduling and a realtime learning-based control framework for congestion control.

We first present a reinforcement learning-based network scheduling algorithm for a single-hop downlink scenario that achieves throughput optimality and converges to minimal delay. To this end, we formulate the network optimization problem as a Markov decision process (MDP). We then introduce a new state-action value function called the $Q^+$-function and develop a reinforcement learning algorithm, $Q^+$-learning with UCB (Upper Confidence Bound) exploration, that guarantees small performance loss during the learning process. We also derive an upper bound on the sample complexity of our algorithm that is more efficient than the best-known bound for Q-learning with UCB exploration by a factor of $\gamma^2$, where $\gamma$ is the discount factor of the MDP.

Furthermore, we propose a novel realtime learning-based control framework for downlink scheduling under time-varying environments. We transform the problem of achieving optimal throughput and queueing delay under time-varying environments into a piecewise non-stationary MDP and design the modules needed to enable realtime learning on the transformed problem. In particular, we propose a novel method that leverages prior experiences to learn new optimal policies more efficiently than random exploration.

Finally, we consider a realtime learning-based control framework for congestion control. Although learning-based congestion controls have been attempted, none has realized such an ideal control, owing to two fundamental challenges: 1) while time-varying network states require a learning-based control to keep learning the environment and the optimal actions, it is unknown how to let it learn the optimal action without performing poorly during the learning process; and 2) it remains under-explored how to identify and classify, at a fine granularity, time-varying states that have not been encountered before (i.e., unseen environments), which is crucial for learning the environment continually. To address these challenges, we propose a new learning-based congestion control, CLINE, built on two techniques: 1) CLINE predicts best-projected actions under an unseen environment by exploiting and extrapolating its inductive bias on the mapping structure between currently observable network states and optimal states, which is learned during an offline training process over a finite set of environments and further improved by an online calibration process; and 2) CLINE identifies and classifies the given network much more precisely by utilizing packet timing information, which allows the mapping structure to expand as experiences in each unseen environment accumulate over time. (Generic, assumption-level sketches of a UCB-style exploration rule and of timing-based environment classification follow this record.)
dc.language: eng
dc.publisher: Korea Advanced Institute of Science and Technology (KAIST)
dc.subject: Realtime learning; Reinforcement learning; Continual learning; Network resource management; Congestion control
dc.title: On developing a realtime learning-based control framework for network resource management
dc.title.alternative: 네트워크 자원 관리를 위한 실시간 학습 기반 제어 프레임워크 개발
dc.type: Thesis (Ph.D.)
dc.identifier.CNRN: 325007
dc.description.department: Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering
dc.contributor.alternativeauthor: Chong, Song
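
The abstract's first contribution builds on Q-learning with UCB exploration. As a point of reference only, here is a minimal runnable sketch of generic tabular Q-learning with a count-based UCB bonus; the $Q^+$-function itself, the exact bonus term, the constants, and the toy two-state MDP below are all assumptions, not the thesis's algorithm.

```python
# Minimal sketch: tabular Q-learning with a UCB-style exploration bonus.
# NOT the thesis's Q+-learning: the bonus form, constants, and the toy
# two-state MDP are assumptions made to keep the example self-contained.
import math
import random
from collections import defaultdict

GAMMA = 0.9    # discount factor (the gamma in the abstract's bound)
ALPHA = 0.1    # learning rate
BONUS_C = 1.0  # assumed scale of the UCB exploration bonus
N_ACTIONS = 2

def step(state, action):
    # Toy stochastic environment standing in for a downlink scheduler.
    reward = 1.0 if (state + action) % 2 == 0 else 0.0
    return random.choice([0, 1]), reward

Q = defaultdict(float)     # state-action value estimates
visits = defaultdict(int)  # visit counts driving the exploration bonus

def ucb_action(state, t):
    # Act greedily w.r.t. Q plus a count-based bonus: rarely tried actions
    # get a large bonus and are explored; well-estimated ones are exploited.
    def score(a):
        bonus = BONUS_C * math.sqrt(math.log(t + 2) / (visits[(state, a)] + 1))
        return Q[(state, a)] + bonus
    return max(range(N_ACTIONS), key=score)

state = 0
for t in range(50_000):
    action = ucb_action(state, t)
    visits[(state, action)] += 1
    next_state, reward = step(state, action)
    target = reward + GAMMA * max(Q[(next_state, a)] for a in range(N_ACTIONS))
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
    state = next_state

print({sa: round(v, 2) for sa, v in sorted(Q.items())})
```

The optimism in `score` is what keeps performance loss small while learning in this family of algorithms; the $\gamma^2$ sample-complexity improvement the abstract claims comes from the $Q^+$-function, which is not reproduced here.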
Appears in Collection:
EE-Theses_Ph.D.(박사논문)
Files in This Item:
There are no files associated with this item.
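
For the CLINE part of the abstract, the following is a deliberately speculative sketch, under stated assumptions, of the two ideas it names: classifying an environment from packet-timing statistics and expanding a learned mapping from network-state features to actions as new environments are encountered. Every name, feature, threshold, and the least-squares extrapolation below is an illustrative assumption, not the thesis's design.

```python
# Speculative sketch of the two CLINE ideas named in the abstract:
# (1) classify the current environment from packet-timing features, and
# (2) keep an expandable mapping from environment features to an action
#     (here a sending rate), extrapolating for unseen environments.
import numpy as np

class EnvMap:
    def __init__(self, novelty_threshold=0.5):
        self.features = []   # one feature vector per environment seen so far
        self.actions = []    # the action (rate) calibrated for each of them
        self.threshold = novelty_threshold  # assumed novelty cutoff

    @staticmethod
    def timing_features(inter_arrival_times):
        # Summarize packet inter-arrival times; timing statistics are what
        # the abstract says CLINE uses to tell environments apart.
        x = np.asarray(inter_arrival_times, dtype=float)
        return np.array([x.mean(), x.std(), np.percentile(x, 95)])

    def act(self, feat):
        # Nearest stored environment wins; if none is close enough, the
        # environment is treated as unseen and we extrapolate with a
        # least-squares fit (a stand-in for the learned inductive bias).
        if not self.features:
            return 1.0  # assumed default rate for a cold start
        dists = np.linalg.norm(np.array(self.features) - feat, axis=1)
        i = int(dists.argmin())
        if dists[i] <= self.threshold:
            return self.actions[i]
        X = np.column_stack([np.array(self.features),
                             np.ones(len(self.features))])
        w, *_ = np.linalg.lstsq(X, np.array(self.actions), rcond=None)
        return float(np.append(feat, 1.0) @ w)

    def calibrate(self, feat, measured_best_action):
        # Online calibration: remember the new environment so the mapping
        # structure expands over time, as the abstract describes.
        self.features.append(feat)
        self.actions.append(measured_best_action)

# Usage: classify a burst of packets, act, then calibrate once the best
# rate for that environment has been measured online.
m = EnvMap()
feat = EnvMap.timing_features([0.01, 0.012, 0.011, 0.03])
rate = m.act(feat)        # cold start -> assumed default rate
m.calibrate(feat, 2.5)    # store the measured-best rate for this environment
print(rate, m.act(feat))  # second call now matches the stored environment
```

The nearest-neighbor lookup plus fallback regression is only one plausible realization of "exploiting and extrapolating an inductive bias"; the thesis's actual mapping structure and calibration procedure are not described in this record.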
