Deep robust face state estimation for intelligent systems

These days, face-related tasks are central to human-robot interaction (HRI) applications such as humanoid robots, home service robots, remote control, user preference analysis, and advanced driver assistance systems (ADAS). An intelligent system monitors a driver's attention to the road ahead, identifies whether a robot's user is ready to give commands, and determines where customers are looking. Information about a user's head pose and gaze is therefore very important in the service robot industry. To be useful, the algorithm must be robust to varied environments and run in real time. In this dissertation, we present deep learning based methods for face detection, head pose estimation, and gaze estimation, and we show how synthetic data can supply a sufficient amount of training data. The dissertation covers the following three topics.

First, we present a multi-task deep neural network that performs multi-view face detection, bounding box refinement, and head pose estimation, and we verify it in intelligent vehicle application scenarios. Driver inattention is one of the main causes of traffic accidents; to help avoid such accidents, an advanced driver assistance system that passively monitors the driver's activities is needed. We present a novel method to estimate head pose from a monocular camera. The proposed algorithm is a multi-task learning deep neural network that operates on a small grayscale image. The network jointly detects multi-view faces and estimates head pose even under poor environmental conditions such as illumination change, vibration, large pose change, and occlusion. We also propose a multi-task learning method that does not become biased toward a specific task when trained with different datasets. Moreover, to enrich the training data, we establish the RCVFace dataset, which has accurate head pose annotations.
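The abstract does not spell out how the multi-task training avoids bias toward one task when datasets label different tasks. A minimal sketch of one common approach, assumed here for illustration (the names `balanced_multitask_loss`, `det_mask`, and `pose_mask` are hypothetical, not from the dissertation), is to average each task's loss only over the samples whose dataset actually provides that label:

```python
import numpy as np

def balanced_multitask_loss(det_loss, pose_loss, det_mask, pose_mask,
                            w_det=1.0, w_pose=1.0):
    """Combine per-sample task losses from mixed datasets.

    det_loss, pose_loss : per-sample losses, shape (N,)
    det_mask, pose_mask : 1 where the sample's dataset provides that
                          task's label, 0 otherwise.

    Each task's loss is averaged only over the samples that carry its
    label, so a batch dominated by one dataset does not pull the
    combined gradient toward that dataset's task.
    """
    det_term = (det_loss * det_mask).sum() / max(det_mask.sum(), 1)
    pose_term = (pose_loss * pose_mask).sum() / max(pose_mask.sum(), 1)
    return w_det * det_term + w_pose * pose_term

# Example: two detection-only samples and two pose-only samples.
det = np.array([1.0, 2.0, 3.0, 4.0])
pose = np.array([2.0, 2.0, 2.0, 2.0])
total = balanced_multitask_loss(det, pose,
                                det_mask=np.array([1.0, 1.0, 0.0, 0.0]),
                                pose_mask=np.array([0.0, 0.0, 1.0, 1.0]))
```

The per-task weights `w_det` and `w_pose` can then be tuned so neither head dominates training.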
The proposed framework outperforms state-of-the-art approaches quantitatively and qualitatively, with an average head pose error of less than 4° in real time. The algorithm is applicable to driver monitoring systems, which are crucial for driver safety.

Second, we extend the deep convolutional neural network for pose estimation to 3D model retrieval and pose estimation of industrial components. To achieve this, we propose a method to construct and use synthetic data in virtual space. To increase the quantity and quality of training data, we define our simulation space in the near-infrared (NIR) band and use the quasi-Monte Carlo (QMC) method for scalable photorealistic rendering of manufactured components. Two convolutional neural network (CNN) architectures are trained on these synthetic data together with a relatively small amount of real data. The first CNN model seeks the most discriminative information and uses it to classify industrial components with fine-grained shape attributes. Once a 3D model is identified, one of the category-specific CNNs performs pose regression in the second phase. Mixing synthetic and real data for learning object categories aids domain adaptation and the attention mechanism in our system. We validate our data-driven method with 88 component models; the experimental results are demonstrated qualitatively, and CNNs trained with various mixtures of data are analyzed quantitatively.

Finally, we propose a deep convolutional neural network for gaze estimation under free head pose. Before that, we present a gaze estimation method that requires no user calibration for a fixed head pose. A typical gaze estimator needs an explicit personal calibration stage with many discrete fixation points. This limitation can be resolved by mapping multiple eye images to the corresponding saliency maps of a video clip during an implicit calibration stage.
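The quasi-Monte Carlo idea used for scalable rendering can be illustrated with a Halton sequence, a standard low-discrepancy generator. This is a generic sketch of QMC sampling, not the dissertation's renderer; QMC replaces pseudo-random samples with evenly spread deterministic ones so rendering integrals converge with fewer samples:

```python
import numpy as np

def radical_inverse(i, base):
    """Van der Corput radical inverse: mirror the base-b digits of i
    across the radix point, giving a point in [0, 1)."""
    f, result = 1.0, 0.0
    while i > 0:
        f /= base
        result += f * (i % base)
        i //= base
    return result

def halton_2d(n):
    """First n points of the 2D Halton sequence (bases 2 and 3).
    Successive points fill the unit square far more evenly than
    independent uniform random draws."""
    return np.array([[radical_inverse(i, 2), radical_inverse(i, 3)]
                     for i in range(1, n + 1)])

pts = halton_2d(4)  # e.g. first coordinates: 0.5, 0.25, 0.75, 0.125
```

The same low-discrepancy points can parameterize light directions or camera viewpoints when generating synthetic training images.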
Compared to previous calibration-free methods, our approach clusters eye images with a Gaussian mixture model (GMM) to increase calibration accuracy and reduce training redundancy. Eye feature vectors representing the eye images undergo soft clustering with the GMM, as do the corresponding saliency maps for aggregation. The GMM-based soft clustering boosts the accuracy of the Gaussian process regression that maps eye feature vectors to gaze directions given this constructed data. Experimental results show improved gaze estimation accuracy compared with previous calibration-free methods. Furthermore, the proposed head pose-free gaze estimation method uses only a small grayscale image, without special equipment such as an IR illumination device. Unlike existing methods that estimate only 2D coordinates on a specific screen, the proposed method estimates 3D gaze direction in space, so it is applicable to a wider range of uses such as driver assistance systems, psychological analysis, and marketing. To enrich the training data, we establish and release a synthetic dataset (SynFace) with accurate head poses, gaze directions, and facial landmarks. The proposed method outperforms state-of-the-art methods with a mean error of less than 4°.
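The two building blocks named above, GMM soft clustering and Gaussian process (GP) regression, can be sketched in a few lines of numpy. This is a simplified illustration under assumed settings (isotropic Gaussian components, an RBF kernel, and the hypothetical names `gmm_responsibilities` and `gp_regress`), not the dissertation's implementation:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_regress(X_train, y_train, X_test, noise=1e-3):
    """GP regression mean: maps feature vectors to a target
    (e.g. eye features to a gaze angle)."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf_kernel(X_test, X_train) @ alpha

def gmm_responsibilities(X, means, var=1.0):
    """Soft cluster assignments under fixed isotropic Gaussian
    components; each row sums to 1 and can weight how much a sample
    contributes to each cluster's aggregated training pair."""
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    log_p = -0.5 * d2 / var
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)
```

In the spirit of the method, the responsibilities would aggregate redundant eye images (and their saliency maps) into cluster-level training pairs, and the GP would then regress gaze direction from those pairs.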
Advisors
Kweon, In So
Description
Korea Advanced Institute of Science and Technology (KAIST): Interdisciplinary Program in Robotics
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2018
Identifier
325007
Language
eng
Description

Doctoral dissertation - KAIST: Interdisciplinary Program in Robotics, 2018.8, [vi, 78 p.]

Keywords

Head pose; gaze; synthetic data; domain adaptation; advanced driver assistance system (ADAS); intelligent system; deep learning; convolutional neural network (CNN)

URI
http://hdl.handle.net/10203/264613
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=827867&flag=dissertation
Appears in Collection
RE-Theses_Ph.D. (Doctoral dissertations)
Files in This Item
There are no files associated with this item.
