DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 노준용 | - |
dc.contributor.author | Kim, Gihoon | - |
dc.contributor.author | 김기훈 | - |
dc.date.accessioned | 2024-07-30T19:30:45Z | - |
dc.date.available | 2024-07-30T19:30:45Z | - |
dc.date.issued | 2024 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1096175&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/321390 | - |
dc.description | Thesis (Master's) - 한국과학기술원 : 문화기술대학원, 2024.2, [iv, 31 p.] | - |
dc.description.abstract | Audio-driven talking head generation is advancing from 2D to 3D content. Notably, recent methods leveraging Neural Radiance Fields (NeRF) have drawn attention for synthesizing 3D output, but they require extensive paired audio-visual data for each identity, which limits their scalability. On the other hand, some studies have shown that convincing audio-driven talking head synthesis is possible even from a single image. Despite their promise, these techniques struggle to produce accurate 3D-aware results because a single image carries insufficient information about occluded regions. In this paper, we propose NeRFFaceSpeech, a novel pipeline that bridges the trade-off between the number of required images and 3D fidelity. By combining the prior knowledge of generative models with NeRF, our method constructs a 3D-consistent facial feature space corresponding to a single image. Our approach then employs ray deformation to map audio-correlated vertex dynamics from a parametric face model onto this facial feature space, ensuring realistic 3D facial motion. Moreover, to replenish the missing information in the inner-mouth region, which cannot be recovered from a single image, we introduce LipaintNet, a novel network trained in a self-supervised manner. Finally, comprehensive experiments demonstrate that our pipeline achieves better 3D consistency than previous approaches when generating audio-driven talking heads from a single image. | - |
dc.language | eng | - |
dc.publisher | 한국과학기술원 | - |
dc.subject | 음성 기반 말하는 얼굴 생성; 3D 애니메이션; 자기 지도 학습; 신경 방사 필드; 생성적 사전지식 | - |
dc.subject | Audio-driven talking head generation; Neural radiance field (NeRF); 3D-aware imaging; Self-supervised learning; Generative prior | - |
dc.title | NeRFFaceSpeech: one-shot audio-driven 3D talking head synthesis via generative prior | - |
dc.title.alternative | 생성적 사전 지식을 이용한 단일 이미지로부터 음성 입력 기반 말하는 3D 얼굴 생성 | - |
dc.type | Thesis (Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | 한국과학기술원 : 문화기술대학원 | - |
dc.contributor.alternativeauthor | Noh, Junyong | - |
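The abstract's ray-deformation step (warping ray sample points by audio-correlated vertex dynamics before querying the canonical feature space) can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the thesis's implementation: the Gaussian proximity weighting, the function name `deform_points`, and the placeholder NumPy arrays standing in for the parametric face model's vertices and the generative-prior feature space.

```python
import numpy as np

def deform_points(points, verts, vert_offsets, sigma=1.0):
    """Displace ray sample points by audio-driven vertex motion.

    Each sample point inherits a Gaussian-weighted average of the
    offsets of nearby face-model vertices, so that points sampled
    along camera rays move with the animated mesh before the
    canonical (static) radiance field is queried at them.

    points       : (P, 3) sample positions along rays
    verts        : (V, 3) parametric face model vertices
    vert_offsets : (V, 3) audio-correlated vertex displacements
    """
    # (P, V) squared distances between sample points and vertices
    d2 = ((points[:, None, :] - verts[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma**2))           # proximity weights
    w = w / (w.sum(-1, keepdims=True) + 1e-8)    # normalize per point
    return points + w @ vert_offsets             # warped positions
```

In the actual pipeline, the canonical feature space comes from a generative prior combined with NeRF and the offsets from an audio-driven parametric face model; here both are random placeholders, and the weighting scheme is only one plausible way to transfer sparse vertex motion to continuous sample points.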