DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Seunghyun | ko |
dc.contributor.author | Seo, Younggyo | ko |
dc.contributor.author | Lee, Kimin | ko |
dc.contributor.author | Abbeel, Pieter | ko |
dc.contributor.author | Shin, Jinwoo | ko |
dc.date.accessioned | 2021-12-16T06:51:51Z | - |
dc.date.available | 2021-12-16T06:51:51Z | - |
dc.date.created | 2021-12-02 | - |
dc.date.issued | 2021-11 | - |
dc.identifier.citation | 5th Annual Conference on Robot Learning (CoRL 2021) | - |
dc.identifier.uri | http://hdl.handle.net/10203/290710 | - |
dc.description.abstract | Recent advance in deep offline reinforcement learning (RL) has made it possible to train strong robotic agents from offline datasets. However, depending on the quality of the trained agents and the application being considered, it is often desirable to fine-tune such agents via further online interactions. In this paper, we observe that state-action distribution shift may lead to severe bootstrap error during fine-tuning, which destroys the good initial policy obtained via offline RL. To address this issue, we first propose a balanced replay scheme that prioritizes samples encountered online while also encouraging the use of near-on-policy samples from the offline dataset. Furthermore, we leverage multiple Q-functions trained pessimistically offline, thereby preventing overoptimism concerning unfamiliar actions at novel states during the initial training phase. We show that the proposed method improves sample-efficiency and final performance of the fine-tuned robotic agents on various locomotion and manipulation tasks. Our code is available at: https://github.com/shlee94/Off2OnRL. | - |
dc.language | English | - |
dc.publisher | CoRL Conference Chair | - |
dc.title | Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble | - |
dc.type | Conference | - |
dc.type.rims | CONF | - |
dc.citation.publicationname | 5th Annual Conference on Robot Learning (CoRL 2021) | - |
dc.identifier.conferencecountry | UK | - |
dc.identifier.conferencelocation | London, Virtual | - |
dc.contributor.localauthor | Lee, Kimin | - |
dc.contributor.localauthor | Shin, Jinwoo | - |
dc.contributor.nonIdAuthor | Abbeel, Pieter | - |