Recent advances in deep offline reinforcement learning (RL) have made it possible to train strong robotic agents from offline datasets. However, depending on the quality of the trained agents and the application being considered, it is often desirable to fine-tune such agents via further online interactions. In this paper, we observe that state-action distribution shift may lead to severe bootstrap error during fine-tuning, which destroys the good initial policy obtained via offline RL. To address this issue, we first propose a balanced replay scheme that prioritizes samples encountered online while also encouraging the use of near-on-policy samples from the offline dataset. Furthermore, we leverage multiple Q-functions trained pessimistically offline, thereby preventing overoptimism concerning unfamiliar actions at novel states during the initial training phase. We show that the proposed method improves the sample-efficiency and final performance of the fine-tuned robotic agents on various locomotion and manipulation tasks. Our code is available at: https://github.com/shlee94/Off2OnRL.
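
As an illustrative aid only (not the authors' implementation, which is linked above), the minimal sketch below shows one plausible reading of the two ingredients named in the abstract: a conservative target taken as the minimum over an ensemble of Q-estimates, and a replay scheme that mixes freshly collected online transitions with offline transitions weighted by a "near-on-policy" priority. The function names (`pessimistic_ensemble_target`, `balanced_sample`), the choice of uniform weights for online data, and the priority-proportional sampling rule are all assumptions introduced here for illustration.

```python
import numpy as np

def pessimistic_ensemble_target(q_values):
    """Conservative value estimate from an ensemble of Q-functions.

    Taking the minimum over ensemble members discourages overoptimism
    about unfamiliar actions at novel states early in fine-tuning.
    q_values: array of shape (num_q,), one estimate per ensemble member.
    """
    return np.min(q_values)

def balanced_sample(online_buffer, offline_buffer, offline_priorities,
                    batch_size, rng=None):
    """Draw a mini-batch that favors online data but still reuses offline data.

    Online transitions get uniform weight; offline transitions are weighted
    by a priority that is meant to reflect how near-on-policy they are
    (hypothetical stand-in for whatever priority the method actually uses).
    """
    rng = rng if rng is not None else np.random.default_rng()
    pool = list(online_buffer) + list(offline_buffer)
    weights = np.concatenate([
        np.ones(len(online_buffer)),            # online samples: always eligible
        np.asarray(offline_priorities, float),  # offline samples: priority-weighted
    ])
    probs = weights / weights.sum()
    idx = rng.choice(len(pool), size=batch_size, p=probs)
    return [pool[i] for i in idx]
```

For example, with five online transitions, ten offline transitions, and offline priorities that decay toward zero for far-off-policy data, `balanced_sample` would return batches dominated by online and near-on-policy samples, which is the qualitative behavior the abstract describes.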