Monte-Carlo Tree Search for Constrained MDPs

Monte-Carlo Tree Search (MCTS) is the state-of-the-art online planning algorithm for very large MDPs. However, many real-world problems inherently have multiple goals, for which multi-objective sequential decision models are more natural. The constrained MDP (CMDP) is one such model: it maximizes the reward while constraining the cost. The common solution method for CMDPs is linear programming (LP), which is hardly applicable to large real-world problems. In this paper, we present CCUCT (Cost-Constrained UCT), an online planning algorithm for large CMDPs that leverages the optimization of LP-induced parameters. We show that CCUCT converges to the optimal stochastic action selection in CMDPs, and we demonstrate through experiments on the multi-objective version of an Atari 2600 arcade game that it is able to solve very large CMDPs.
Publisher
ICML/IJCAI/AAMAS Workshop on Planning and Learning (PAL)
Issue Date
2018-07-15
Language
English
Citation

ICML/IJCAI/AAMAS Workshop on Planning and Learning (PAL)

URI
http://hdl.handle.net/10203/251744
Appears in Collection
CS-Conference Papers
