POfD-BC : policy optimization from demonstrations with behavior cloning for robot hand manipulation
Korean title (translated): POfD-BC : policy optimization using demonstrations and BC for robot hand manipulation

DC Field | Value | Language
dc.contributor.advisor | Kim, Jong-Hwan | -
dc.contributor.advisor | 김종환 | -
dc.contributor.author | Choi, Yun-Seon | -
dc.date.accessioned | 2021-05-13T19:34:28Z | -
dc.date.available | 2021-05-13T19:34:28Z | -
dc.date.issued | 2020 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=911415&flag=dissertation | en_US
dc.identifier.uri | http://hdl.handle.net/10203/284785 | -
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : School of Electrical Engineering, 2020.2, [iv, 29 p.] | -
dc.description.abstract | Five-fingered robot hands have been developed for a long time, and commercially available robot hands now exist. However, controlling a robot hand remains very difficult: generating robot hand motions has been virtually impossible, even when captured human hand motions are leveraged. As deep learning has advanced, deep reinforcement learning (deep RL) has achieved remarkable results in several areas, and it has also become a solution for controlling robot hands despite their high complexity. One problem with RL, however, is that the reward function must be designed manually from human knowledge. The easiest way to design a reward function is a sparse reward, which indicates only whether certain subgoals have been accomplished. This thesis studies robot hand manipulation with RL under such sparse rewards. The existing algorithm POfD, which utilizes human demonstrations, has been successful in sparse-reward environments. We first applied POfD to robot hand manipulation tasks and analyzed the results: POfD did not solve all of the tasks, and the motions it generated appeared erratic. In terms of both performance and practicality, POfD therefore has limitations. We propose POfD-BC, which adapts POfD to imitation tasks such as robot hand manipulation. The new method tries to mimic human hand motions, producing far more natural movements. Furthermore, we transfer the learned behaviors to new environments. Four new task environments, related to the previous tasks, were constructed for transfer learning; behaviors derived from the pre-trained parameters are expected to share common parts with the actions the new tasks require. The experiments show that the new tasks cannot be completed without prior knowledge. POfD-BC successfully solves the contact-rich robot hand manipulation tasks and produces practical motions. With the knowledge gained in the previous step, the robot hand easily learns how to manipulate in more complex situations. Imitation tasks, such as those of a human hand, need behavior cloning for reliable and practical learning. | -
dc.language | eng | -
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | -
dc.subject | robotics; robot hand; reinforcement learning; Behavior Cloning (BC); motion generation | -
dc.subject | robotics; robot hand; reinforcement learning; Behavior Cloning (BC); motion generation | -
dc.title | POfD-BC | -
dc.title.alternative | POfD-BC : policy optimization using demonstrations and BC for robot hand manipulation | -
dc.type | Thesis (Master) | -
dc.identifier.CNRN | 325007 | -
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : School of Electrical Engineering | -
dc.contributor.alternativeauthor | 최윤선 | -
dc.title.subtitle | policy optimization from demonstrations with behavior cloning for robot hand manipulation | -
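The abstract names three ingredients: a sparse reward, POfD's use of demonstrations, and the behavior-cloning (BC) term that POfD-BC adds. This record does not give the thesis's equations, so what follows is only a minimal sketch under stated assumptions. The reshaped reward r' = r - lam * log D(s, a) follows the original POfD paper's formulation with a GAIL-style discriminator D. The way the BC term is combined here (an RL loss plus a beta-weighted mean-squared error to demonstrated actions) is a hypothetical form of POfD-BC, not the author's exact objective. All function names and the weights lam and beta are illustrative.

```python
import numpy as np


def sparse_reward(subgoals_done):
    """Sparse reward: 1.0 only when every subgoal flag is set, else 0.0."""
    return 1.0 if all(subgoals_done) else 0.0


def reshaped_reward(env_reward, d_prob, lam=0.1, eps=1e-8):
    """POfD-style reward: r' = r - lam * log D(s, a).

    Assumes the GAIL convention: d_prob is the discriminator's probability
    that (s, a) came from the learner policy rather than the demonstrations,
    so demonstration-like pairs (small d_prob) receive a larger bonus.
    """
    return env_reward - lam * np.log(np.clip(d_prob, eps, 1.0))


def bc_loss(policy_actions, demo_actions):
    """Behavior-cloning loss: mean squared error to demonstrated actions."""
    diff = np.asarray(policy_actions, dtype=float) - np.asarray(demo_actions, dtype=float)
    return float(np.mean(np.sum(diff ** 2, axis=-1)))


def combined_loss(rl_loss, policy_actions, demo_actions, beta=0.5):
    """Hypothetical POfD-BC-style total loss: RL surrogate loss + beta * BC loss."""
    return rl_loss + beta * bc_loss(policy_actions, demo_actions)


if __name__ == "__main__":
    r = sparse_reward([True, False])       # 0.0: one subgoal is still unmet
    print(reshaped_reward(r, d_prob=0.1))  # ~0.230: demonstration-like pair
    print(reshaped_reward(r, d_prob=0.9))  # ~0.011: policy-like pair
```

The sketch shows why demonstrations help under sparse rewards: even when the environment reward is 0, the discriminator term still gives a learning signal. The BC term additionally pulls the policy's actions toward the human demonstrations, which matches the abstract's stated aim of more natural hand motions.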
Appears in Collection
EE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
