Offline-to-online reinforcement learning via balanced experience replay and pessimistic Q-ensemble

Abstract
Recent progress in offline reinforcement learning (RL) has made it possible to train strong agents from offline datasets. However, depending on the quality of the trained agents and the target application, it is often desirable to fine-tune such offline RL agents through further online interaction. We observe that state-action distribution shift can lead to severe bootstrap error during fine-tuning. To address this issue, we first propose a balanced replay scheme that prioritizes samples encountered online while also encouraging the use of near-on-policy samples from the offline dataset. Furthermore, we leverage multiple pessimistic offline Q-functions, thereby preventing over-optimism about unfamiliar actions at novel states during the initial training phase. We show that the proposed method stabilizes Q-learning during fine-tuning and improves the final performance and sample efficiency of fine-tuned agents on various continuous control tasks from the D4RL benchmark suite.
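
The two ideas summarized in the abstract can be illustrated with a short sketch. This is not the thesis implementation: the priority function standing in for a learned density-ratio estimator, the buffer layout, and the hyperparameters (min_priority, gamma, and the use of an ensemble minimum as the pessimistic backup) are illustrative assumptions.

import numpy as np

class BalancedReplay:
    """Minimal sketch of a balanced replay buffer.

    Offline and online transitions share one priority list. Priorities are
    assumed to come from an external estimate of how "near-on-policy" a
    transition is, so online samples and on-policy-like offline samples are
    drawn more often. priority_fn is a hypothetical stand-in for a learned
    density-ratio estimator.
    """

    def __init__(self, priority_fn, min_priority=1e-3):
        self.priority_fn = priority_fn      # transition -> non-negative score
        self.min_priority = min_priority
        self.transitions = []               # stored transitions
        self.priorities = []                # matching priority per transition

    def add(self, transition, from_online=False):
        score = self.priority_fn(transition)
        if from_online:
            # Keep online samples at a reasonably high priority so fresh
            # experience is replayed even if the estimator lags behind.
            score = max(score, 1.0)
        self.transitions.append(transition)
        self.priorities.append(max(score, self.min_priority))

    def sample(self, batch_size, rng=None):
        rng = rng or np.random.default_rng()
        p = np.asarray(self.priorities, dtype=np.float64)
        p = p / p.sum()
        idx = rng.choice(len(self.transitions), size=batch_size, p=p)
        return [self.transitions[i] for i in idx]


def pessimistic_target(reward, done, next_q_ensemble, gamma=0.99):
    """Bellman backup that is pessimistic over an ensemble of Q-values.

    next_q_ensemble: array of shape (num_ensemble,) holding each member's
    estimate of Q(s', a') for the next state-action. Bootstrapping from the
    minimum discourages over-optimism about unfamiliar actions.
    """
    next_q = np.min(next_q_ensemble)
    return reward + gamma * (1.0 - done) * next_q

In this sketch, sampling probability follows the stored priorities, so freshly collected online transitions and offline transitions that look near-on-policy dominate the replay batches, while the pessimistic target bootstraps from the most conservative ensemble member during the early phase of fine-tuning.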
Advisors
Shin, Jinwoo
Description
Korea Advanced Institute of Science and Technology (KAIST) : Graduate School of AI
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2021
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology : Graduate School of AI, 2021.8, [iv, 24 p.]

Keywords

Reinforcement Learning; Offline Reinforcement Learning; Fine-tuning; Experience Replay

URI
http://hdl.handle.net/10203/292498
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=963744&flag=dissertation
Appears in Collection
AI-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
