DSpace at KOASAS: Multimodal speaker identification using deep neural network



Multimodal speaker identification using deep neural network (깊은 신경망을 이용한 멀티모달 화자 인식 알고리즘)


DC Field	Value	Language
dc.contributor.advisor	Kim, Dae-Shik	-
dc.contributor.advisor	김대식	-
dc.contributor.author	Jeon, Jinwoo	-
dc.date.accessioned	2018-06-20T06:22:22Z	-
dc.date.available	2018-06-20T06:22:22Z	-
dc.date.issued	2017	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=675429&flag=dissertation	en_US
dc.identifier.uri	http://hdl.handle.net/10203/243321	-
dc.description	Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2017.2, [iii, 29 p.]	-
dc.description.abstract	Speaker identification is fundamentally important for various purposes such as home devices, surveillance, and authorization. The main difficulty in speaker recognition is achieving robust identification accuracy. In this paper, we present a multimodal method based on deep neural networks that identifies speakers using both face recognition and voice identification. Our proposed multimodal model shows more robust speaker identification performance. For face recognition, we use a convolutional neural network, specifically the VGG Face descriptor network. For voice identification, we use a Gaussian mixture model based on i-vectors. After feature extraction, the feature vectors from the face and voice information are concatenated and used to train a multimodal deep neural network that produces 1024-dimensional multimodal embeddings. We validate the performance of our model on a new dataset consisting of 281 TED videos. The multimodal DNN model shows more reliable identification performance than single-modality identification methods such as face recognition or speaker recognition alone.	-
dc.language	eng	-
dc.publisher	Korea Advanced Institute of Science and Technology (KAIST)	-
dc.subject	Speaker Identification	-
dc.subject	deep learning	-
dc.subject	multimodal model	-
dc.subject	i-vector	-
dc.subject	convolutional neural network	-
dc.subject	화자 인식 (speaker recognition)	-
dc.subject	딥 러닝 (deep learning)	-
dc.subject	멀티모달 모델 (multimodal model)	-
dc.subject	i-벡터 (i-vector)	-
dc.subject	컨볼루젼 신경망 (convolutional neural network)	-
dc.title	Multimodal speaker identification using deep neural network	-
dc.title.alternative	깊은 신경망을 이용한 멀티모달 화자 인식 알고리즘 (Multimodal speaker identification algorithm using deep neural networks)	-
dc.type	Thesis(Master)	-
dc.identifier.CNRN	325007	-
dc.description.department	Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering	-
dc.contributor.alternativeauthor	전진우	-
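
The fusion step described in the abstract (concatenating face and voice feature vectors and passing them through a network that yields a 1024-dimensional embedding) can be illustrated with a minimal sketch. This is not the thesis implementation. The record states only the 1024-dimensional embedding size; the face descriptor size, the i-vector size, the number of speakers, the single hidden layer, the ReLU activation, and the per-modality L2 normalization are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

# Only EMBED_DIM comes from the abstract; the other sizes are assumptions.
FACE_DIM = 4096      # assumed size of a VGG Face descriptor
VOICE_DIM = 400      # assumed i-vector size
EMBED_DIM = 1024     # multimodal embedding size stated in the abstract
NUM_SPEAKERS = 10    # hypothetical number of speaker classes


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (an assumed preprocessing step)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def init_params(rng):
    """Randomly initialise one hidden layer and a softmax classifier."""
    in_dim = FACE_DIM + VOICE_DIM
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / in_dim), (in_dim, EMBED_DIM)),
        "b1": np.zeros(EMBED_DIM),
        "W2": rng.normal(0.0, np.sqrt(2.0 / EMBED_DIM), (EMBED_DIM, NUM_SPEAKERS)),
        "b2": np.zeros(NUM_SPEAKERS),
    }


def forward(params, face_feat, voice_feat):
    """Concatenate both modalities; return (embedding, speaker probabilities)."""
    fused = np.concatenate(
        [l2_normalize(face_feat), l2_normalize(voice_feat)], axis=-1
    )
    embedding = np.maximum(0.0, fused @ params["W1"] + params["b1"])  # ReLU
    logits = embedding @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp_logits = np.exp(logits)
    probs = exp_logits / exp_logits.sum(axis=-1, keepdims=True)
    return embedding, probs


rng = np.random.default_rng(0)
params = init_params(rng)
# Random stand-ins for real face descriptors and i-vectors (batch of 4).
face = rng.normal(size=(4, FACE_DIM))
voice = rng.normal(size=(4, VOICE_DIM))
embedding, probs = forward(params, face, voice)
print(embedding.shape, probs.shape)
```

In an actual system the random inputs would be replaced by features extracted from video frames and audio, and the parameters would be learned with a classification loss over the speakers in the training set.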

Appears in Collection: EE-Theses_Master (Master's theses)

Files in This Item: There are no files associated with this item.

