Regularized Behavior Cloning for Blocking the Leakage of Past Action Information

DC Field | Value | Language
dc.contributor.author | Seo, Seokin | ko
dc.contributor.author | HWANG, HYEONGJOO | ko
dc.contributor.author | Yang, Hongseok | ko
dc.contributor.author | Kim, Kee-Eung | ko
dc.date.accessioned | 2023-11-30T01:02:39Z | -
dc.date.available | 2023-11-30T01:02:39Z | -
dc.date.created | 2023-11-08 | -
dc.date.issued | 2023-12-13 | -
dc.identifier.citation | The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) | -
dc.identifier.uri | http://hdl.handle.net/10203/315451 | -
dc.description.abstract | For partially observable environments, imitation learning with observation histories (ILOH) assumes that control-relevant information is sufficiently captured in the observation histories for imitating the expert actions. In the offline setting, where the agent is required to learn to imitate without interaction with the environment, behavior cloning (BC) has been shown to be a simple yet effective method for imitation learning. However, when information about the actions executed in past timesteps leaks into the observation histories, ILOH via BC often ends up imitating its own past actions. In this paper, we address this catastrophic failure by proposing a principled regularization for BC, which we name Past Action Leakage Regularization (PALR). The main idea behind our approach is to leverage the classical notion of conditional independence to mitigate the leakage. We compare different instances of our framework with natural choices of conditional independence metrics and their estimators. The result of our comparison advocates the use of a particular kernel-based estimator for the conditional independence metric. We conduct an extensive set of experiments on benchmark datasets to assess the effectiveness of our regularization method. The experimental results show that our method significantly outperforms prior related approaches, highlighting its potential to successfully imitate expert actions when past action information leaks into the observation histories. | -
dc.language | English | -
dc.publisher | Neural Information Processing Systems Foundation | -
dc.title | Regularized Behavior Cloning for Blocking the Leakage of Past Action Information | -
dc.type | Conference | -
dc.type.rims | CONF | -
dc.citation.publicationname | The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) | -
dc.identifier.conferencecountry | US | -
dc.identifier.conferencelocation | Ernest N. Morial Convention Center, New Orleans | -
dc.contributor.localauthor | Yang, Hongseok | -
dc.contributor.localauthor | Kim, Kee-Eung | -
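The abstract describes PALR only at a high level: a behavior cloning loss plus a regularizer that penalizes conditional dependence, with a kernel-based estimator of the conditional independence metric favored by the authors' comparison. The sketch below is not the authors' implementation. It assumes, hypothetically, that the penalty is a plug-in estimate of the Hilbert-Schmidt conditional independence criterion (HSCIC) between the learned history features and the previous action, conditioned on the current expert action, using Gaussian kernels and kernel ridge regression for the conditional mean embeddings. The names `hscic`, `regularized_bc_loss`, and the parameters `lam`, `bandwidth`, and `alpha` are illustrative, and the conditioning set is an assumption not stated in this record.

```python
import numpy as np


def gaussian_gram(x, bandwidth: float = 1.0) -> np.ndarray:
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2))."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum(x**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth**2))


def hscic(x, y, z, lam: float = 1e-2, bandwidth: float = 1.0) -> float:
    """Plug-in HSCIC(X, Y | Z) estimate, averaged over the observed z_j (assumed estimator).

    Column j of W holds the kernel ridge regression weights
    w(z_j) = (K_Z + n * lam * I)^{-1} k_Z(z_j) for the conditional mean embeddings.
    """
    n = len(z)
    kx = gaussian_gram(x, bandwidth)
    ky = gaussian_gram(y, bandwidth)
    kz = gaussian_gram(z, bandwidth)
    w = np.linalg.solve(kz + n * lam * np.eye(n), kz)  # shape (n, n)
    kxw, kyw = kx @ w, ky @ w
    # Squared norm of the joint embedding: w_j^T (K_X * K_Y) w_j
    joint = np.einsum("ij,ik,kj->j", w, kx * ky, w)
    # Inner product of the joint embedding with the product of marginals
    cross = np.sum(w * kxw * kyw, axis=0)
    # Squared norm of the product of marginal embeddings
    marg = np.sum(w * kxw, axis=0) * np.sum(w * kyw, axis=0)
    return float(np.mean(joint - 2.0 * cross + marg))


def regularized_bc_loss(pred_actions, expert_actions, features, prev_actions, alpha=1.0):
    """Hypothetical objective: MSE behavior cloning + alpha * HSCIC(features, a_{t-1} | a_t)."""
    diff = np.asarray(pred_actions, dtype=float) - np.asarray(expert_actions, dtype=float)
    bc = float(np.mean(diff**2))
    return bc + alpha * hscic(features, prev_actions, expert_actions)


# Toy check: X independent of (Y, Z) versus X that copies Y regardless of Z.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
y = rng.normal(size=n)
x_indep = rng.normal(size=n)
x_dep = y + 0.1 * rng.normal(size=n)
score_indep = hscic(x_indep, y, z)
score_dep = hscic(x_dep, y, z)
```

In this toy setup `x_dep` plays the role of features that leak the previous action, so its score should exceed that of `x_indep`. The numpy version only evaluates the objective; training a policy would require computing the same quantities in an autodiff framework so the penalty can be back-propagated into the feature extractor.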
Appears in Collections
CS-Conference Papers
AI-Conference Papers
Files in This Item
There are no files associated with this item.
