AlphaGo for belief space planning (알파고의 직관을 이용한 믿음 공간 계획법; Belief-space planning using AlphaGo's intuition)

In robotics, the ability to make decisions in an environment that includes uncertainty is essential. The robot must infer its current state from a sequence of actions and observations and plan toward the goal accordingly. This problem can be modeled as a Partially Observable Markov Decision Process (POMDP). However, high-dimensional state, action, and observation spaces over long horizons make a POMDP problem considerably harder. Previous online planning algorithms, which do not guide the planner, require a large number of simulations to produce a plan for such complex POMDP problems. In this thesis, we learn a policy network and a value network by imitating prior experience data generated from simulations, and then use these networks to guide an online planning algorithm so that it solves complex POMDP problems more efficiently. We model the Light-Dark Room domain, a localization problem in robotics, as a continuous POMDP. Our guided planning algorithm achieves higher success rates in this domain with fewer simulations.
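The AlphaGo-style guidance the abstract describes is typically realized with a PUCT-style action-selection rule inside the search tree: a learned policy network supplies a prior over actions, and simulation returns supply the value estimate, so actions the network favors are explored first. The node layout, field names, and the `c_puct` constant below are illustrative assumptions for a minimal sketch, not the thesis's actual implementation.

```python
import math

def puct_select(node, c_puct=1.0):
    """Pick the action maximizing Q(b, a) + c_puct * P(a) * sqrt(N) / (1 + N(a)),
    where P(a) is the policy-network prior and Q is the mean simulated return."""
    total_visits = sum(stats["visits"] for stats in node["actions"].values())
    best_action, best_score = None, -math.inf
    for action, stats in node["actions"].items():
        # Mean return from simulations through this action (0 if unvisited).
        q = stats["value_sum"] / stats["visits"] if stats["visits"] else 0.0
        # Exploration bonus weighted by the learned prior: unvisited actions
        # with a high prior get a large bonus and are tried early.
        u = c_puct * stats["prior"] * math.sqrt(total_visits + 1) / (1 + stats["visits"])
        if q + u > best_score:
            best_action, best_score = action, q + u
    return best_action

# Example belief node: "move_to_light" is unvisited but the policy network
# assigns it a high prior, so guided search tries it before "move_to_goal".
belief_node = {
    "actions": {
        "move_to_goal": {"visits": 10, "value_sum": 5.0, "prior": 0.2},
        "move_to_light": {"visits": 0, "value_sum": 0.0, "prior": 0.8},
    }
}
```

Here `puct_select(belief_node)` returns `"move_to_light"`: its prior-driven bonus (about 2.65) outweighs the 0.5 mean return of the visited action, which is exactly how network guidance reduces the number of simulations an unguided planner would spend discovering the information-gathering move.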
Advisors
Kim, Beomjoon (김범준)
Description
Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI, 2023.2, [iii, 18 p.]

Keywords

Partially observable Markov decision process (POMDP); Online planning; Imitation learning

URI
http://hdl.handle.net/10203/308178
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1032324&flag=dissertation
Appears in Collection
AI-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
