DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Donghwan | ko |
dc.contributor.author | Hu, Jianghai | ko |
dc.contributor.author | He, Niao | ko |
dc.date.accessioned | 2023-08-14T02:00:09Z | - |
dc.date.available | 2023-08-14T02:00:09Z | - |
dc.date.created | 2022-11-14 | - |
dc.date.issued | 2023-08 | - |
dc.identifier.citation | SIAM JOURNAL ON CONTROL AND OPTIMIZATION, v.61, no.3, pp.1861 - 1880 | - |
dc.identifier.issn | 0363-0129 | - |
dc.identifier.uri | http://hdl.handle.net/10203/311457 | - |
dc.description.abstract | This paper develops a novel control-theoretic framework to analyze the non-asymptotic convergence of Q-learning. We show that the dynamics of asynchronous Q-learning with a constant step-size can be naturally formulated as a discrete-time stochastic affine switching system. Moreover, the evolution of the Q-learning estimation error is over- and underestimated by trajectories of two simpler dynamical systems. Based on these two systems, we derive a new finite-time error bound of asynchronous Q-learning when a constant step-size is used. Our analysis also sheds light on the overestimation phenomenon of Q-learning. We further illustrate and validate the analysis through numerical simulations. | - |
dc.language | English | - |
dc.publisher | SIAM PUBLICATIONS | - |
dc.title | A Discrete-Time Switching System Analysis of Q-learning | - |
dc.type | Article | - |
dc.identifier.wosid | 001031998600033 | - |
dc.identifier.scopusid | 2-s2.0-85165535477 | - |
dc.type.rims | ART | - |
dc.citation.volume | 61 | - |
dc.citation.issue | 3 | - |
dc.citation.beginningpage | 1861 | - |
dc.citation.endingpage | 1880 | - |
dc.citation.publicationname | SIAM JOURNAL ON CONTROL AND OPTIMIZATION | - |
dc.identifier.doi | 10.48550/arXiv.2102.08583 | - |
dc.contributor.localauthor | Lee, Donghwan | - |
dc.contributor.nonIdAuthor | Hu, Jianghai | - |
dc.contributor.nonIdAuthor | He, Niao | - |
dc.description.isOpenAccess | N | - |
dc.type.journalArticle | Article | - |
dc.subject.keywordAuthor | Q-learning | - |
dc.subject.keywordAuthor | switched linear system | - |
dc.subject.keywordAuthor | stochastic approximation | - |
dc.subject.keywordPlus | STOCHASTIC-APPROXIMATION | - |
dc.subject.keywordPlus | CONVERGENCE | - |
dc.subject.keywordPlus | RATES | - |