Bayesian Optimistic Kullback-Leibler Exploration

Cited 2 times in Web of Science · Cited 1 time in Scopus
DC Field | Value | Language
dc.contributor.author | Lee, Kang Hoon | ko
dc.contributor.author | Kim, Geon-Hyeong | ko
dc.contributor.author | Ortega, Pedro | ko
dc.contributor.author | Lee, Daniel D. | ko
dc.contributor.author | Kim, Kee-Eung | ko
dc.date.accessioned | 2019-06-24T01:30:17Z | -
dc.date.available | 2019-06-24T01:30:17Z | -
dc.date.created | 2019-03-08 | -
dc.date.issued | 2019-05 | -
dc.identifier.citation | MACHINE LEARNING, v.108, no.5, pp.765 - 783 | -
dc.identifier.issn | 0885-6125 | -
dc.identifier.uri | http://hdl.handle.net/10203/262791 | -
dc.description.abstract | We consider a Bayesian approach to model-based reinforcement learning, where the agent uses a distribution over environment models to find the action that optimally trades off exploration and exploitation. Unfortunately, it is intractable to find the Bayes-optimal solution to the problem except in restricted cases. In this paper, we present BOKLE, a simple algorithm that uses the Kullback–Leibler divergence to constrain the set of plausible models for guiding exploration. We provide a formal analysis showing that this algorithm is near Bayes-optimal with high probability. We also show an asymptotic relation between the solution pursued by BOKLE and a well-known algorithm called Bayesian exploration bonus. Finally, we present experimental results that clearly demonstrate the exploration efficiency of the algorithm. | -
dc.language | English | -
dc.publisher | SPRINGER | -
dc.title | Bayesian Optimistic Kullback-Leibler Exploration | -
dc.type | Article | -
dc.identifier.wosid | 000470185100004 | -
dc.identifier.scopusid | 2-s2.0-85058968448 | -
dc.type.rims | ART | -
dc.citation.volume | 108 | -
dc.citation.issue | 5 | -
dc.citation.beginningpage | 765 | -
dc.citation.endingpage | 783 | -
dc.citation.publicationname | MACHINE LEARNING | -
dc.identifier.doi | 10.1007/s10994-018-5767-4 | -
dc.contributor.localauthor | Kim, Kee-Eung | -
dc.contributor.nonIdAuthor | Ortega, Pedro | -
dc.contributor.nonIdAuthor | Lee, Daniel D. | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article; Proceedings Paper | -
dc.subject.keywordAuthor | Model-based Bayesian reinforcement learning | -
dc.subject.keywordAuthor | Bayes-adaptive Markov decision process | -
dc.subject.keywordAuthor | PAC-BAMDP | -
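
The abstract above describes the core idea behind BOKLE: maintain a posterior over transition models (e.g., Dirichlet counts), restrict attention to models within a Kullback–Leibler ball around the posterior mean, and plan optimistically against the best model in that ball. The following minimal Python sketch illustrates this style of KL-constrained optimistic value iteration on a toy MDP. It is not the paper's algorithm: the toy MDP, the radius schedule eps = 1/n, and all function names are illustrative assumptions.

```python
# Hypothetical sketch of KL-constrained optimistic planning for a discrete
# MDP with a Dirichlet posterior over transition models. Illustrative only.
import numpy as np
from scipy.optimize import minimize

def kl(q, p):
    """KL(q || p) for discrete distributions, with 0 log 0 := 0."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def optimistic_value(q, v, eps):
    """max_p p.v  s.t.  KL(q || p) <= eps and p in the simplex.
    Solved numerically here; a dual solve would be used in practice."""
    n = len(q)
    cons = [
        {"type": "eq",   "fun": lambda p: p.sum() - 1.0},  # simplex
        {"type": "ineq", "fun": lambda p: eps - kl(q, p)},  # KL ball
    ]
    res = minimize(lambda p: -(p @ v), q.copy(),
                   bounds=[(1e-9, 1.0)] * n, constraints=cons,
                   method="SLSQP")
    return -res.fun if res.success else q @ v

# Toy 3-state, 2-action MDP: Dirichlet pseudo-counts alpha[s, a] over
# next states, and known rewards R (both made up for this example).
S, A, gamma = 3, 2, 0.95
rng = np.random.default_rng(0)
alpha = 1.0 + rng.integers(0, 5, size=(S, A, S)).astype(float)
R = rng.random((S, A))

V = np.zeros(S)
for _ in range(100):                       # optimistic value iteration
    Q = np.empty((S, A))
    for s in range(S):
        for a in range(A):
            n = alpha[s, a].sum()
            q_hat = alpha[s, a] / n        # posterior-mean transition model
            eps = 1.0 / n                  # shrinking KL radius (assumed)
            Q[s, a] = R[s, a] + gamma * optimistic_value(q_hat, V, eps)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-6:
        break
    V = V_new
print("Optimistic values:", np.round(V, 3))
```

The optimism enters only through the inner maximization: poorly-visited state–action pairs have small counts, hence a large KL ball and inflated values, which drives exploration toward them. The generic SLSQP solve is just the shortest self-contained way to express that inner step; the KL-constrained maximization also admits more efficient dual solutions.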
Appears in Collection
AI-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.