Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble

Abstract
Recent advances in deep offline reinforcement learning (RL) have made it possible to train strong robotic agents from offline datasets. However, depending on the quality of the trained agents and the application being considered, it is often desirable to fine-tune such agents via further online interactions. In this paper, we observe that state-action distribution shift may lead to severe bootstrap error during fine-tuning, which destroys the good initial policy obtained via offline RL. To address this issue, we first propose a balanced replay scheme that prioritizes samples encountered online while also encouraging the use of near-on-policy samples from the offline dataset. Furthermore, we leverage multiple Q-functions trained pessimistically offline, thereby preventing overoptimism concerning unfamiliar actions at novel states during the initial training phase. We show that the proposed method improves the sample efficiency and final performance of the fine-tuned robotic agents on various locomotion and manipulation tasks. Our code is available at: https://github.com/shlee94/Off2OnRL.
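The abstract describes two components: a balanced replay scheme that mixes online transitions with near-on-policy offline transitions, and a pessimistic target computed from an ensemble of Q-functions. The sketch below is not the authors' released implementation (see the linked repository for that); it is a minimal illustration using hypothetical names (`BalancedReplay`, `pessimistic_target`, `online_fraction`), and it approximates the paper's density-ratio-based prioritization with a simple fixed mixing ratio.

```python
# Minimal sketch of the two ideas named in the abstract; class/function names,
# the fixed mixing ratio, and other details are assumptions for illustration only.
import random
import torch


class BalancedReplay:
    """Keeps offline and online transitions separately and samples a mix of both."""

    def __init__(self, offline_data, online_fraction=0.75):
        self.offline_data = list(offline_data)  # transitions from the offline dataset
        self.online_data = []                   # transitions gathered during fine-tuning
        self.online_fraction = online_fraction  # assumed mixing ratio, not from the paper

    def add(self, transition):
        self.online_data.append(transition)

    def sample(self, batch_size):
        # Prioritize samples encountered online; fill the rest of the batch from the
        # offline dataset, which dominates early in fine-tuning when little online
        # data is available.
        n_online = min(int(batch_size * self.online_fraction), len(self.online_data))
        n_offline = batch_size - n_online
        batch = random.sample(self.online_data, n_online) if n_online else []
        batch += random.sample(self.offline_data, min(n_offline, len(self.offline_data)))
        return batch


def pessimistic_target(q_ensemble, next_obs, next_act, reward, done, gamma=0.99):
    """Bootstrapped target taken as the minimum over an ensemble of Q-functions,
    keeping value estimates conservative for unfamiliar actions at novel states."""
    with torch.no_grad():
        qs = torch.stack([q(next_obs, next_act) for q in q_ensemble], dim=0)
        q_min = qs.min(dim=0).values
        return reward + gamma * (1.0 - done) * q_min
```

In this reading, the replay scheme addresses the state-action distribution shift on the data side, while the ensemble minimum addresses it on the value-estimation side during the initial online phase.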
Publisher
CoRL Conference Chair
Issue Date
2021-11
Language
English
Citation

5th Annual Conference on Robot Learning (CoRL 2021)

URI
http://hdl.handle.net/10203/290710
Appears in Collection
AI-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.
