DSpace at KOASAS: Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble


DC Field	Value	Language
dc.contributor.author	Lee, Seunghyun	ko
dc.contributor.author	Seo, Younggyo	ko
dc.contributor.author	Lee, Kimin	ko
dc.contributor.author	Abbeel, Pieter	ko
dc.contributor.author	Shin, Jinwoo	ko
dc.date.accessioned	2021-12-16T06:51:51Z	-
dc.date.available	2021-12-16T06:51:51Z	-
dc.date.created	2021-12-02	-
dc.date.issued	2021-11	-
dc.identifier.citation	5th Annual Conference on Robot Learning (CoRL 2021)	-
dc.identifier.uri	http://hdl.handle.net/10203/290710	-
dc.description.abstract	Recent advances in deep offline reinforcement learning (RL) have made it possible to train strong robotic agents from offline datasets. However, depending on the quality of the trained agents and the application being considered, it is often desirable to fine-tune such agents via further online interactions. In this paper, we observe that state-action distribution shift may lead to severe bootstrap error during fine-tuning, which destroys the good initial policy obtained via offline RL. To address this issue, we first propose a balanced replay scheme that prioritizes samples encountered online while also encouraging the use of near-on-policy samples from the offline dataset. Furthermore, we leverage multiple Q-functions trained pessimistically offline, thereby preventing overoptimism concerning unfamiliar actions at novel states during the initial training phase. We show that the proposed method improves the sample efficiency and final performance of the fine-tuned robotic agents on various locomotion and manipulation tasks. Our code is available at: https://github.com/shlee94/Off2OnRL.	-
dc.language	English	-
dc.publisher	CoRL Conference Chair	-
dc.title	Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble	-
dc.type	Conference	-
dc.type.rims	CONF	-
dc.citation.publicationname	5th Annual Conference on Robot Learning (CoRL 2021)	-
dc.identifier.conferencecountry	UK	-
dc.identifier.conferencelocation	London, Virtual	-
dc.contributor.localauthor	Lee, Kimin	-
dc.contributor.localauthor	Shin, Jinwoo	-
dc.contributor.nonIdAuthor	Abbeel, Pieter	-
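
The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their code is at the URL given in the abstract). The abstract does not say how "near-on-policy" offline samples are identified or how the multiple Q-functions are combined, so the per-sample `offline_scores`, the `online_weight` parameter, and the use of a plain average over the ensemble are all assumptions made here for illustration, as are the function names.

```python
import random


def balanced_sample(offline, online, offline_scores, batch_size,
                    online_weight=1.0, rng=None):
    """Draw a training batch from the union of offline and online transitions.

    Assumption: every online transition gets the same weight, and each
    offline transition gets a hypothetical "near-on-policy" score >= 0.
    How such a score is obtained is not specified in the abstract.
    """
    rng = rng or random.Random()
    pool = list(online) + list(offline)
    weights = [online_weight] * len(online) + list(offline_scores)
    # Weighted sampling with replacement over the combined pool.
    return rng.choices(pool, weights=weights, k=batch_size)


def ensemble_q(q_functions, state, action):
    """Combine several Q-functions into one estimate.

    Assumption: a simple average is used here; other reductions are possible.
    """
    values = [q(state, action) for q in q_functions]
    return sum(values) / len(values)
```

With all offline scores set to zero, `balanced_sample` returns only online transitions; raising the score of an offline transition makes it proportionally more likely to be replayed.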

Appears in Collection: AI-Conference Papers (Conference Papers)

Files in This Item: There are no files associated with this item.

KOASAS

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.