Learning to embed, align, and augment: application to face and object recognition (Korean title: Training methods for embedding, alignment, and augmentation, with application to face and object recognition)

DC Field: Value [Language]
dc.contributor.advisor: Yoo, Chang Dong
dc.contributor.advisor: 유창동
dc.contributor.author: Lee, Donghoon
dc.date.accessioned: 2021-05-11T19:38:14Z
dc.date.available: 2021-05-11T19:38:14Z
dc.date.issued: 2019
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=871445&flag=dissertation [en_US]
dc.identifier.uri: http://hdl.handle.net/10203/283271
dc.description: Doctoral dissertation (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2019.8, [x, 90 p.]
dc.description.abstract: In computer vision, images are transformed into various forms for various purposes. For example, images are transformed into embedding vectors that lie in a special space to improve performance and robustness. Face images are transformed so that the eyes, nose, and mouth are located at the same positions. Images are augmented through label-preserving transformations. This dissertation studies models and learning methods for three such image transformations (embedding, alignment, and augmentation), with application to face and object recognition. First, this dissertation proposes a sparsity sharing embedding (SSE) method that transforms face images into embedding vectors that are robust to variations in pose, illumination, and expression. The SSE is built on a generic identity dataset in which each identity contains multiple faces under large intra-personal settings. An embedding space is learned to preserve the inter-personal structures of intra-personal settings. Because face images are transformed into embedding vectors in this space, robust face verification can be achieved under large variations in pose, illumination, and expression. Second, two face alignment methods that locate the eyes, nose, and mouth at the same positions are proposed. The first method, parallel joint boosting, simultaneously estimates poses and face landmarks. It iteratively updates the poses and face landmarks in a stage-wise manner: pose probabilities are updated based on the previous face landmark estimates, and face landmark estimates are updated based on the previous pose probabilities. The second method is cascade Gaussian process regression trees (cGPRT). Here, GPRT is a Gaussian process with a kernel defined by a set of trees. Prediction with cGPRT can be performed in the same framework as cascade regression trees (CRT), so prediction time does not increase, while generalization improves. Lastly, a data augmentation method is proposed to learn image transformations that improve generalization performance.
Data augmentation has a large impact on the generalization performance of an image classification model. However, it is currently conducted by trial and error, so its effect on generalization performance cannot be predicted during training. This study considers an influence function that predicts, in terms of validation loss, how generalization performance is affected by a particular augmented training sample. The influence function approximates the change in validation loss without actually training with and without the sample and comparing the results. Based on this function, a differentiable augmentation network is learned that augments each input training sample so as to reduce validation loss.
dc.language: eng
dc.publisher: Korea Advanced Institute of Science and Technology (KAIST)
dc.subject: Embedding; face alignment; data augmentation; influence function; image recognition
dc.title: Learning to embed, align, and augment
dc.title.alternative: Training methods for embedding, alignment, and augmentation, with application to face and object recognition
dc.type: Thesis (Ph.D.)
dc.identifier.CNRN: 325007
dc.description.department: Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
dc.contributor.alternativeauthor: 이동훈
dc.title.subtitle: application to face and object recognition
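
The abstract describes an influence function that approximates how one augmented training sample changes the validation loss without retraining. The following is a minimal sketch of that general idea, not the thesis's implementation. It assumes a ridge regression model as a stand-in for the image classifier, so that the Hessian is exact and the approximation can be checked against real retraining. All names (`fit_ridge`, `influence_on_val_loss`, the synthetic data, the shifted sample) are hypothetical.

```python
import numpy as np

def fit_ridge(x, y, lam, extra=None, eps=0.0):
    """Minimise mean squared loss plus a ridge penalty.

    If `extra` is given, that one sample is added to the objective with weight eps.
    """
    n, d = x.shape
    a = x.T @ x / n + lam * np.eye(d)
    b = x.T @ y / n
    if extra is not None:
        xe, ye = extra
        a = a + eps * np.outer(xe, xe)
        b = b + eps * ye * xe
    return np.linalg.solve(a, b)

def val_loss(w, xv, yv):
    """Mean of the per-sample loss 0.5 * (prediction - target)^2."""
    return 0.5 * np.mean((xv @ w - yv) ** 2)

def influence_on_val_loss(w, x, lam, sample, xv, yv):
    """Return -grad_val^T H^-1 grad_sample.

    This is the predicted derivative of the validation loss with respect to
    the weight eps of `sample`, evaluated at eps = 0.
    """
    n, d = x.shape
    hessian = x.T @ x / n + lam * np.eye(d)
    xs, ys = sample
    grad_sample = (xs @ w - ys) * xs
    grad_val = xv.T @ (xv @ w - yv) / len(yv)
    return -grad_val @ np.linalg.solve(hessian, grad_sample)

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
x = rng.normal(size=(50, 3))
y = x @ true_w + 0.1 * rng.normal(size=50)
xv = rng.normal(size=(20, 3))
yv = xv @ true_w + 0.1 * rng.normal(size=20)

lam, eps = 0.1, 1e-4
w = fit_ridge(x, y, lam)

# A hypothetical "augmented" sample: a shifted copy of x[0] with the same label.
augmented = (x[0] + 0.3, y[0])

# Predicted change in validation loss, with no retraining.
predicted_change = eps * influence_on_val_loss(w, x, lam, augmented, xv, yv)

# Actual change, obtained by retraining with the sample upweighted by eps.
w_retrained = fit_ridge(x, y, lam, extra=augmented, eps=eps)
actual_change = val_loss(w_retrained, xv, yv) - val_loss(w, xv, yv)
```

A negative predicted change suggests that the augmented sample would lower the validation loss. In the setting the abstract describes, a quantity of this kind would serve as the training signal for the augmentation network, which this sketch does not model.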
Appears in Collection
EE-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
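
The abstract states that GPRT is a Gaussian process with a kernel defined by a set of trees. The following sketch shows one way such a kernel could look. It assumes the kernel counts the trees in which two inputs fall into the same leaf, and it uses depth-1 stumps with hand-picked thresholds in place of learned trees. Both choices are assumptions made for illustration and are not taken from the thesis. All names are hypothetical.

```python
import numpy as np

def leaf_index(stump, x):
    """Route x to leaf 0 or 1 of a depth-1 stump given as (feature, threshold)."""
    feature, threshold = stump
    return int(x[feature] > threshold)

def tree_kernel(stumps, a, b, scale=1.0):
    """Assumed kernel: scale times the number of trees where a and b share a leaf."""
    shared = sum(leaf_index(s, a) == leaf_index(s, b) for s in stumps)
    return scale * shared

def gram(stumps, xs, ys, scale=1.0):
    """Kernel matrix between every row of xs and every row of ys."""
    return np.array([[tree_kernel(stumps, x, y, scale) for y in ys] for x in xs])

def gp_predict(stumps, train_x, train_y, test_x, noise=0.1):
    """Standard GP regression mean: K_test (K_train + noise * I)^-1 y."""
    k_train = gram(stumps, train_x, train_x)
    k_test = gram(stumps, test_x, train_x)
    weights = np.linalg.solve(k_train + noise * np.eye(len(train_x)), train_y)
    return k_test @ weights

# Toy data and hand-picked stumps (illustrative only).
stumps = [(0, 0.5), (1, 0.5), (0, 1.5)]
train_x = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0]])
train_y = np.array([0.0, 1.0, 2.0])
test_x = np.array([[0.1, 0.1], [1.9, 0.9]])

k_train = gram(stumps, train_x, train_x)
pred = gp_predict(stumps, train_x, train_y, test_x)
```

Under this assumed kernel, the predictive mean is a sum over trees of a value stored at the leaf each test input reaches. That is one possible reading of the abstract's remark that cGPRT prediction fits the same framework as cascade regression trees.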
