Offline-to-online reinforcement learning via balanced experience replay and pessimistic Q-ensemble

Abstract
Recent progress in offline reinforcement learning (RL) has made it possible to train strong agents from offline datasets. However, depending on the quality of the trained agents and the target application, it is often desirable to fine-tune such offline RL agents through further online interaction. We observe that state-action distribution shift can lead to severe bootstrap error during fine-tuning. To address this issue, we first propose a balanced replay scheme that prioritizes samples encountered online while also encouraging the use of near-on-policy samples from the offline dataset. Furthermore, we leverage multiple pessimistic offline Q-functions, thereby preventing over-optimism about unfamiliar actions at novel states during the initial training phase. We show that the proposed method stabilizes Q-learning during fine-tuning and improves the final performance and sample efficiency of fine-tuned agents on various continuous control tasks from the D4RL benchmark suite.
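
The two ideas summarized in the abstract can be illustrated with a short sketch. This is not the thesis implementation: the priority function standing in for a learned density-ratio estimator, the buffer layout, and the hyperparameters (min_priority, gamma, and the use of an ensemble minimum as the pessimistic backup) are illustrative assumptions.

import numpy as np

class BalancedReplay:
    """Minimal sketch of a balanced replay buffer.

    Offline and online transitions share one priority list. Priorities are
    assumed to come from an external estimate of how "near-on-policy" a
    transition is, so online samples and on-policy-like offline samples are
    drawn more often. priority_fn is a hypothetical stand-in for a learned
    density-ratio estimator.
    """

    def __init__(self, priority_fn, min_priority=1e-3):
        self.priority_fn = priority_fn      # transition -> non-negative score
        self.min_priority = min_priority
        self.transitions = []               # stored transitions
        self.priorities = []                # matching priority per transition

    def add(self, transition, from_online=False):
        score = self.priority_fn(transition)
        if from_online:
            # Keep online samples at a reasonably high priority so fresh
            # experience is replayed even if the estimator lags behind.
            score = max(score, 1.0)
        self.transitions.append(transition)
        self.priorities.append(max(score, self.min_priority))

    def sample(self, batch_size, rng=None):
        rng = rng or np.random.default_rng()
        p = np.asarray(self.priorities, dtype=np.float64)
        p = p / p.sum()
        idx = rng.choice(len(self.transitions), size=batch_size, p=p)
        return [self.transitions[i] for i in idx]


def pessimistic_target(reward, done, next_q_ensemble, gamma=0.99):
    """Bellman backup that is pessimistic over an ensemble of Q-values.

    next_q_ensemble: array of shape (num_ensemble,) holding each member's
    estimate of Q(s', a') for the next state-action. Bootstrapping from the
    minimum discourages over-optimism about unfamiliar actions.
    """
    next_q = np.min(next_q_ensemble)
    return reward + gamma * (1.0 - done) * next_q

In this sketch, sampling probability follows the stored priorities, so freshly collected online transitions and offline transitions that look near-on-policy dominate the replay batches, while the pessimistic target bootstraps from the most conservative ensemble member during the early phase of fine-tuning.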
Advisors
Shin, Jinwoo
Description
Korea Advanced Institute of Science and Technology (KAIST) : Graduate School of AI
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2021
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology : Graduate School of AI, 2021.8, [iv, 24 p.]

Keywords

Reinforcement Learning; Offline Reinforcement Learning; Fine-tuning; Experience Replay

URI
http://hdl.handle.net/10203/292498
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=963744&flag=dissertation
Appears in Collection
AI-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
