DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Jongmin | ko |
dc.contributor.author | Lee, Byung-Jun | ko |
dc.contributor.author | Vrancx, Peter | ko |
dc.contributor.author | Kim, Dongho | ko |
dc.contributor.author | Kim, Kee-Eung | ko |
dc.date.accessioned | 2020-12-10T12:30:30Z | - |
dc.date.available | 2020-12-10T12:30:30Z | - |
dc.date.created | 2020-12-02 | - |
dc.date.issued | 2020-07-16 | - |
dc.identifier.citation | The 37th International Conference on Machine Learning (ICML 2020), pp.5681 - 5691 | - |
dc.identifier.issn | 2640-3498 | - |
dc.identifier.uri | http://hdl.handle.net/10203/278163 | - |
dc.description.abstract | We consider the batch reinforcement learning problem, where the agent must learn from a fixed batch of data without further interaction with the environment. In this setting, we want to prevent the optimized policy from deviating too much from the data-collection policy, since the estimation otherwise becomes highly unstable due to the off-policy nature of the problem. However, imposing this requirement too strongly yields a policy that merely follows the data-collection policy. Unlike prior work, where this trade-off is controlled by hand-tuned hyperparameters, we propose a novel batch reinforcement learning approach, batch optimization of policy and hyperparameter (BOPAH), which performs gradient-based optimization of the hyperparameter using held-out data. We show that BOPAH outperforms other batch reinforcement learning algorithms on tabular and continuous control tasks by striking a good balance in the trade-off between adhering to the data-collection policy and pursuing possible policy improvement. | - |
dc.language | English | - |
dc.publisher | International Conference on Machine Learning | - |
dc.title | Batch Reinforcement Learning with Hyperparameter Gradients | - |
dc.type | Conference | - |
dc.identifier.wosid | 000683178505079 | - |
dc.identifier.scopusid | 2-s2.0-85105552796 | - |
dc.type.rims | CONF | - |
dc.citation.beginningpage | 5681 | - |
dc.citation.endingpage | 5691 | - |
dc.citation.publicationname | The 37th International Conference on Machine Learning (ICML 2020) | - |
dc.identifier.conferencecountry | AU | - |
dc.identifier.conferencelocation | Virtual | - |
dc.contributor.localauthor | Kim, Kee-Eung | - |
dc.contributor.nonIdAuthor | Vrancx, Peter | - |
dc.contributor.nonIdAuthor | Kim, Dongho | - |
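The abstract above describes tuning the strength of the behavior-policy constraint on held-out data rather than by hand. The following is a minimal illustrative sketch of that idea only, not the paper's BOPAH algorithm: the three-armed bandit setup, the KL-regularized objective, and the grid search over the regularization coefficient (standing in for the paper's hyperparameter gradient) are all assumptions introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-armed bandit with unknown true mean rewards (assumption for this sketch).
true_means = np.array([0.2, 0.5, 0.8])
n_actions = 3

# Data-collection (behavior) policy and a fixed batch of logged data.
pi_b = np.array([0.5, 0.3, 0.2])

def collect(n):
    a = rng.choice(n_actions, size=n, p=pi_b)
    r = rng.normal(true_means[a], 0.1)
    return a, r

train_a, train_r = collect(200)  # batch used to fit the policy
held_a, held_r = collect(200)    # held-out batch used to pick the hyperparameter

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def is_return(pi, a, r):
    # Importance-sampled estimate of the policy's expected reward from logged data.
    return np.mean(pi[a] / pi_b[a] * r)

def kl(pi):
    # KL divergence from the behavior policy; penalizes deviating from the data.
    return np.sum(pi * np.log(pi / pi_b))

def fit_policy(alpha, steps=300, lr=0.5):
    # Inner loop: maximize (IS return - alpha * KL) over softmax logits,
    # using finite-difference gradient ascent to keep the sketch dependency-free.
    theta = np.zeros(n_actions)
    eps = 1e-4
    def obj(t):
        p = softmax(t)
        return is_return(p, train_a, train_r) - alpha * kl(p)
    for _ in range(steps):
        grad = np.zeros(n_actions)
        for i in range(n_actions):
            d = np.zeros(n_actions)
            d[i] = eps
            grad[i] = (obj(theta + d) - obj(theta - d)) / (2 * eps)
        theta = theta + lr * grad
    return softmax(theta)

# Outer loop: choose the regularization strength alpha that maximizes the
# held-out importance-sampled return (grid search here, where the paper
# uses a gradient on the hyperparameter itself).
alphas = [0.01, 0.1, 1.0, 10.0]
scores = {a: is_return(fit_policy(a), held_a, held_r) for a in alphas}
best_alpha = max(scores, key=scores.get)
print(best_alpha, scores[best_alpha])
```

A small alpha lets the policy chase the apparently best arm (risking off-policy estimation error), while a large alpha collapses the policy onto pi_b; validating on held-out data selects between these regimes automatically, which is the trade-off the abstract describes.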