DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Jongmin | ko |
dc.contributor.author | Lee, Byung-Jun | ko |
dc.contributor.author | Vrancx, Peter | ko |
dc.contributor.author | Kim, Dongho | ko |
dc.contributor.author | Kim, Kee-Eung | ko |
dc.date.accessioned | 2020-12-10T12:30:30Z | - |
dc.date.available | 2020-12-10T12:30:30Z | - |
dc.date.created | 2020-12-02 | - |
dc.date.issued | 2020-07-16 | - |
dc.identifier.citation | The 37th International Conference on Machine Learning (ICML 2020), pp.5681 - 5691 | - |
dc.identifier.issn | 2640-3498 | - |
dc.identifier.uri | http://hdl.handle.net/10203/278163 | - |
dc.description.abstract | We consider the batch reinforcement learning problem, where the agent must learn from a fixed batch of data without further interaction with the environment. In this setting, we want to prevent the optimized policy from deviating too much from the data-collection policy, since the estimation otherwise becomes highly unstable due to the off-policy nature of the problem. However, imposing this requirement too strongly yields a policy that merely follows the data-collection policy. Unlike prior work, where this trade-off is controlled by hand-tuned hyperparameters, we propose a novel batch reinforcement learning approach, batch optimization of policy and hyperparameter (BOPAH), which performs gradient-based optimization of the hyperparameter using held-out data. We show that BOPAH outperforms other batch reinforcement learning algorithms on tabular and continuous control tasks by striking a good balance in the trade-off between adhering to the data-collection policy and pursuing possible policy improvement. | - |
dc.language | English | - |
dc.publisher | International Conference on Machine Learning | - |
dc.title | Batch Reinforcement Learning with Hyperparameter Gradients | - |
dc.type | Conference | - |
dc.identifier.wosid | 000683178505079 | - |
dc.identifier.scopusid | 2-s2.0-85105552796 | - |
dc.type.rims | CONF | - |
dc.citation.beginningpage | 5681 | - |
dc.citation.endingpage | 5691 | - |
dc.citation.publicationname | The 37th International Conference on Machine Learning (ICML 2020) | - |
dc.identifier.conferencecountry | AU | - |
dc.identifier.conferencelocation | Virtual | - |
dc.contributor.localauthor | Kim, Kee-Eung | - |
dc.contributor.nonIdAuthor | Vrancx, Peter | - |
dc.contributor.nonIdAuthor | Kim, Dongho | - |
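The abstract above describes tuning the strength of the behavior-policy constraint on held-out data rather than by hand. The following is a minimal illustrative sketch of that idea only, not the paper's BOPAH algorithm: the three-armed bandit setup, the KL-regularized objective, and the grid search over the regularization coefficient (standing in for the paper's hyperparameter gradient) are all assumptions introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-armed bandit with unknown true mean rewards (assumption for this sketch).
true_means = np.array([0.2, 0.5, 0.8])
n_actions = 3

# Data-collection (behavior) policy and a fixed batch of logged data.
pi_b = np.array([0.5, 0.3, 0.2])

def collect(n):
    a = rng.choice(n_actions, size=n, p=pi_b)
    r = rng.normal(true_means[a], 0.1)
    return a, r

train_a, train_r = collect(200)  # batch used to fit the policy
held_a, held_r = collect(200)    # held-out batch used to pick the hyperparameter

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def is_return(pi, a, r):
    # Importance-sampled estimate of the policy's expected reward from logged data.
    return np.mean(pi[a] / pi_b[a] * r)

def kl(pi):
    # KL divergence from the behavior policy; penalizes deviating from the data.
    return np.sum(pi * np.log(pi / pi_b))

def fit_policy(alpha, steps=300, lr=0.5):
    # Inner loop: maximize (IS return - alpha * KL) over softmax logits,
    # using finite-difference gradient ascent to keep the sketch dependency-free.
    theta = np.zeros(n_actions)
    eps = 1e-4
    def obj(t):
        p = softmax(t)
        return is_return(p, train_a, train_r) - alpha * kl(p)
    for _ in range(steps):
        grad = np.zeros(n_actions)
        for i in range(n_actions):
            d = np.zeros(n_actions)
            d[i] = eps
            grad[i] = (obj(theta + d) - obj(theta - d)) / (2 * eps)
        theta = theta + lr * grad
    return softmax(theta)

# Outer loop: choose the regularization strength alpha that maximizes the
# held-out importance-sampled return (grid search here, where the paper
# uses a gradient on the hyperparameter itself).
alphas = [0.01, 0.1, 1.0, 10.0]
scores = {a: is_return(fit_policy(a), held_a, held_r) for a in alphas}
best_alpha = max(scores, key=scores.get)
print(best_alpha, scores[best_alpha])
```

A small alpha lets the policy chase the apparently best arm (risking off-policy estimation error), while a large alpha collapses the policy onto pi_b; validating on held-out data selects between these regimes automatically, which is the trade-off the abstract describes.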