DSpace at KOASAS: OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation


OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation


DC Field	Value	Language
dc.contributor.author	Lee, Jongmin	ko
dc.contributor.author	Jeon, Wonseok	ko
dc.contributor.author	Lee, Byung-Jun	ko
dc.contributor.author	Pineau, Joelle	ko
dc.contributor.author	Kim, Kee-Eung	ko
dc.date.accessioned	2021-10-27T07:10:11Z	-
dc.date.available	2021-10-27T07:10:11Z	-
dc.date.created	2021-10-27	-
dc.date.issued	2021-07	-
dc.identifier.citation	International Conference on Machine Learning (ICML)	-
dc.identifier.issn	2640-3498	-
dc.identifier.uri	http://hdl.handle.net/10203/288350	-
dc.description.abstract	We consider the offline reinforcement learning (RL) setting where the agent aims to optimize the policy solely from the data without further environment interactions. In offline RL, the distributional shift becomes the primary source of difficulty, which arises from the deviation of the target policy being optimized from the behavior policy used for data collection. This typically causes overestimation of action values, which poses severe problems for model-free algorithms that use bootstrapping. To mitigate the problem, prior offline RL algorithms often used sophisticated techniques that encourage underestimation of action values, which introduces an additional set of hyperparameters that need to be tuned properly. In this paper, we present an offline RL algorithm that prevents overestimation in a more principled way. Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy and does not rely on policy-gradients, unlike previous offline RL algorithms. Using an extensive set of benchmark datasets for offline RL, we show that OptiDICE performs competitively with the state-of-the-art methods.	-
dc.language	English	-
dc.publisher	JMLR-JOURNAL MACHINE LEARNING RESEARCH	-
dc.title	OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation	-
dc.type	Conference	-
dc.identifier.wosid	000683104606014	-
dc.type.rims	CONF	-
dc.citation.publicationname	International Conference on Machine Learning (ICML)	-
dc.identifier.conferencecountry	US	-
dc.identifier.conferencelocation	ELECTR NETWORK	-
dc.contributor.localauthor	Kim, Kee-Eung	-
dc.contributor.nonIdAuthor	Lee, Jongmin	-
dc.contributor.nonIdAuthor	Jeon, Wonseok	-
dc.contributor.nonIdAuthor	Lee, Byung-Jun	-
dc.contributor.nonIdAuthor	Pineau, Joelle	-

Appears in Collection: RIMS Conference Papers

Files in This Item: There are no files associated with this item.


