DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 노준용 | - |
dc.contributor.author | Kim, Gihoon | - |
dc.contributor.author | 김기훈 | - |
dc.date.accessioned | 2024-07-30T19:30:45Z | - |
dc.date.available | 2024-07-30T19:30:45Z | - |
dc.date.issued | 2024 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1096175&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/321390 | - |
dc.description | Thesis (Master's) - 한국과학기술원 : 문화기술대학원, 2024.2, [iv, 31 p.] | - |
dc.description.abstract | Audio-driven talking head generation is advancing from 2D to 3D content. Notably, recent methods leveraging Neural Radiance Fields (NeRF) have drawn attention for synthesizing 3D output, but they require extensive paired audio-visual data for each identity, which limits their scalability. On the other hand, some studies have shown that convincing audio-driven talking head synthesis is possible even from a single image. Despite their promise, these techniques struggle to produce accurate 3D-aware results because a single image carries insufficient information about occluded regions. In this paper, we propose NeRFFaceSpeech, a novel pipeline that bridges the trade-off between the number of required images and 3D fidelity. By combining the prior knowledge of generative models with NeRF, our method constructs a 3D-consistent facial feature space corresponding to a single image. Our approach then employs ray deformation to map audio-correlated vertex dynamics from a parametric face model onto this facial feature space, ensuring realistic 3D facial motion. Moreover, to replenish the missing information in the inner-mouth region, which cannot be recovered from a single image, we introduce LipaintNet, a novel network trained in a self-supervised manner. Finally, comprehensive experiments demonstrate that our pipeline achieves better 3D consistency than previous approaches when generating audio-driven talking heads from a single image. | - |
dc.language | eng | - |
dc.publisher | 한국과학기술원 | - |
dc.subject | 음성 기반 말하는 얼굴 생성; 3D 애니메이션; 자기 지도 학습; 신경 방사 필드; 생성적 사전지식 | - |
dc.subject | Audio-driven talking head generation; Neural radiance field (NeRF); 3D-aware imaging; Self-supervised learning; Generative prior | - |
dc.title | NeRFFaceSpeech: one-shot audio-driven 3D talking head synthesis via generative prior | - |
dc.title.alternative | 생성적 사전 지식을 이용한 단일 이미지로부터 음성 입력 기반 말하는 3D 얼굴 생성 | - |
dc.type | Thesis (Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | 한국과학기술원 : 문화기술대학원 | - |
dc.contributor.alternativeauthor | Noh, Junyong | - |
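The abstract's ray-deformation step (warping ray sample points by audio-correlated vertex dynamics before querying the canonical feature space) can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the thesis's implementation: the Gaussian proximity weighting, the function name `deform_points`, and the placeholder NumPy arrays standing in for the parametric face model's vertices and the generative-prior feature space.

```python
import numpy as np

def deform_points(points, verts, vert_offsets, sigma=1.0):
    """Displace ray sample points by audio-driven vertex motion.

    Each sample point inherits a Gaussian-weighted average of the
    offsets of nearby face-model vertices, so that points sampled
    along camera rays move with the animated mesh before the
    canonical (static) radiance field is queried at them.

    points       : (P, 3) sample positions along rays
    verts        : (V, 3) parametric face model vertices
    vert_offsets : (V, 3) audio-correlated vertex displacements
    """
    # (P, V) squared distances between sample points and vertices
    d2 = ((points[:, None, :] - verts[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma**2))           # proximity weights
    w = w / (w.sum(-1, keepdims=True) + 1e-8)    # normalize per point
    return points + w @ vert_offsets             # warped positions
```

In the actual pipeline, the canonical feature space comes from a generative prior combined with NeRF and the offsets from an audio-driven parametric face model; here both are random placeholders, and the weighting scheme is only one plausible way to transfer sparse vertex motion to continuous sample points.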