Inverse Reinforcement Learning in Partially Observable Environments

Cited 56 times in Web of Science · Cited 0 times in Scopus
  • Hit : 1162
  • Download : 670
Inverse reinforcement learning (IRL) is the problem of recovering the underlying reward function from the behavior of an expert. Most existing IRL algorithms assume that the environment is modeled as a Markov decision process (MDP), although handling partially observable settings is desirable for more realistic scenarios. In this paper, we present IRL algorithms for partially observable environments that can be modeled as a partially observable Markov decision process (POMDP). We deal with two cases according to the representation of the given expert's behavior: the case in which the expert's policy is explicitly given, and the case in which only the expert's trajectories are available. IRL in POMDPs poses a greater challenge than in MDPs, since it is not only ill-posed due to the nature of IRL but also computationally intractable due to the hardness of solving POMDPs. To overcome these obstacles, we present algorithms that exploit some classical results from the POMDP literature. Experimental results on several benchmark POMDP domains show that our work is useful for partially observable settings.
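As a rough illustration of the setting described in the abstract (a minimal sketch, not the paper's algorithm; the toy models, numbers, and names below are hypothetical), the snippet tracks a belief state along an action-observation sequence in a tiny POMDP. The belief update is one of the classical POMDP results that IRL methods for partially observable settings build on; the reward is then commonly assumed linear in known features, R(s, a) = w·phi(s, a), and the algorithm searches for weights w under which the expert's behavior looks optimal.

    import numpy as np

    # A minimal sketch: a two-state POMDP and the standard Bayes-filter
    # belief update. All numbers are made up for illustration.
    n_states, n_actions, n_obs = 2, 2, 2

    # T[a, s, s'] = P(s' | s, a): transition model.
    T = np.array([
        [[0.9, 0.1], [0.2, 0.8]],   # action 0
        [[0.5, 0.5], [0.5, 0.5]],   # action 1
    ])

    # O[a, s', o] = P(o | s', a): observation model.
    O = np.array([
        [[0.8, 0.2], [0.3, 0.7]],   # action 0
        [[0.6, 0.4], [0.4, 0.6]],   # action 1
    ])

    def belief_update(b, a, o):
        """Belief update: b'(s') is proportional to O[a, s', o] * sum_s T[a, s, s'] * b[s]."""
        unnormalized = O[a, :, o] * (T[a].T @ b)
        z = unnormalized.sum()
        return unnormalized / z if z > 0 else b

    # Starting from a uniform belief, follow a (hypothetical) expert
    # action-observation sequence and track the belief state.
    b = np.full(n_states, 1.0 / n_states)
    for a, o in [(0, 1), (1, 0), (0, 0)]:
        b = belief_update(b, a, o)
        print(f"action={a}, obs={o}, belief={b}")

In the trajectory-based case, such belief sequences (rather than an explicit policy) are what the learner can compute from the expert's observed behavior.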
Publisher
MICROTOME PUBL
Issue Date
2011-03
Language
English
Article Type
Article
Citation
JOURNAL OF MACHINE LEARNING RESEARCH, v.12, pp. 691-730
ISSN
1532-4435
URI
http://hdl.handle.net/10203/97629
Appears in Collection
AI-Journal Papers (Journal Papers)
Files in This Item
000289635000002.pdf (788.06 kB)