Batch Reinforcement Learning with Hyperparameter Gradients

DC Field | Value | Language
dc.contributor.author | Lee, Jongmin | ko
dc.contributor.author | Lee, Byung-Jun | ko
dc.contributor.author | Vrancx, Peter | ko
dc.contributor.author | Kim, Dongho | ko
dc.contributor.author | Kim, Kee-Eung | ko
dc.date.accessioned | 2020-12-10T12:30:30Z | -
dc.date.available | 2020-12-10T12:30:30Z | -
dc.date.created | 2020-12-02 | -
dc.date.issued | 2020-07-16 | -
dc.identifier.citation | The 37th International Conference on Machine Learning (ICML 2020), pp.5681 - 5691 | -
dc.identifier.issn | 2640-3498 | -
dc.identifier.uri | http://hdl.handle.net/10203/278163 | -
dc.description.abstract | We consider the batch reinforcement learning problem where the agent needs to learn only from a fixed batch of data, without further interaction with the environment. In such a scenario, we want to prevent the optimized policy from deviating too much from the data collection policy since the estimation becomes highly unstable otherwise due to the off-policy nature of the problem. However, imposing this requirement too strongly will result in a policy that merely follows the data collection policy. Unlike prior work where this trade-off is controlled by hand-tuned hyperparameters, we propose a novel batch reinforcement learning approach, batch optimization of policy and hyperparameter (BOPAH), that uses a gradient-based optimization of the hyperparameter using held-out data. We show that BOPAH outperforms other batch reinforcement learning algorithms in tabular and continuous control tasks, by finding a good balance to the trade-off between adhering to the data collection policy and pursuing the possible policy improvement. | -
dc.language | English | -
dc.publisher | International Conference on Machine Learning | -
dc.title | Batch Reinforcement Learning with Hyperparameter Gradients | -
dc.type | Conference | -
dc.identifier.wosid | 000683178505079 | -
dc.identifier.scopusid | 2-s2.0-85105552796 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 5681 | -
dc.citation.endingpage | 5691 | -
dc.citation.publicationname | The 37th International Conference on Machine Learning (ICML 2020) | -
dc.identifier.conferencecountry | AU | -
dc.identifier.conferencelocation | Virtual | -
dc.contributor.localauthor | Kim, Kee-Eung | -
dc.contributor.nonIdAuthor | Vrancx, Peter | -
dc.contributor.nonIdAuthor | Kim, Dongho | -
Appears in Collection
RIMS Conference Papers
Files in This Item
There are no files associated with this item.