DSpace at KOASAS: Style-based audio-driven talking head generation

Style-based audio-driven talking head generation (Korean title: 스타일 기반의 음성에 따른 얼굴 비디오 생성)

DC Field	Value	Language
dc.contributor.advisor	Hwang, Sung Ju	-
dc.contributor.advisor	황성주	-
dc.contributor.author	Song, Minyoung	-
dc.date.accessioned	2023-06-22T19:31:29Z	-
dc.date.available	2023-06-22T19:31:29Z	-
dc.date.issued	2022	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=997676&flag=dissertation	en_US
dc.identifier.uri	http://hdl.handle.net/10203/308230	-
dc.description	Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI, 2022.2, [iii, 17 p.]	-
dc.description.abstract	While audio-driven talking head generation has achieved highly realistic multi-speaker generation, previous works rely on predefined additional data such as 3D model parameters, landmarks, and head pose angles. However, this explicit supervision is expensive: scanning 3D models requires special devices in a controlled lab environment, and landmarks require manual annotation. In this paper, we propose the first multi-speaker talking video generation framework that does not use any predefined prior. We first design a novel style code manipulator that explores the latent space of a pretrained StyleGAN3 and generates a sequence of style codes within the distribution of the generator. In this way, we achieve identity-preserving head pose matching without any predefined supervision. Furthermore, by leveraging the power of StyleGAN3, our framework achieves high-quality video generation. Finally, we adopt a sync loss, computed from an expert discriminator that maps audio and visual features to a unified space, for better lip synchronization. Our framework is fully unsupervised since we do not include any model trained with additional data. Experimental results show that our method generates high-quality videos and performs competitively with state-of-the-art methods that use supervision.	-
dc.language	eng	-
dc.publisher	Korea Advanced Institute of Science and Technology (KAIST)	-
dc.title	Style-based audio-driven talking head generation	-
dc.title.alternative	스타일 기반의 음성에 따른 얼굴 비디오 생성	-
dc.type	Thesis(Master)	-
dc.identifier.CNRN	325007	-
dc.description.department	Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI	-
dc.contributor.alternativeauthor	송민영	-

Appears in Collection: AI-Theses_Master (Master's theses)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.