A study on free-view image synthesis with view-dependent effects based on camera motion and local context priors

Recent advances in neural scene rendering have shown the value of rendering view-dependent effects in the novel view synthesis task. In particular, NeRF has demonstrated that radiance fields can be effectively learned in a neural feature space via a multi-layer perceptron (MLP), allowing geometry and view-dependent effects to be rendered for scenes that are carefully captured from multiple views. However, NeRF relies purely on multi-view consistency and cannot exploit prior knowledge, such as texture and depth cues that are common across natural scenes, which limits its use when only a few views, or a single view, are available. To leverage the 3D prior knowledge in multi-view datasets, PixelNeRF instead trains an MLP that takes pixel locations and pixel-aligned features as input and generates the colors and opacities of the 3D points in the radiance fields. The pixel-aligned features, obtained from a CNN backbone, allow PixelNeRF to exploit priors shared across scenes when rendering radiance fields, but the resulting quality is considerably limited. Other works, such as single-image MPIs and MINE, have also proposed single-view free-view synthesis, but they cannot model view-dependent effects (VDEs).

View-dependent effects depend on the material's reflectance, which is a function of the material properties and the angle of incidence of the light. Learning such material properties and the light sources from a single image is a highly ill-posed problem. Previous works such as NeRF or PixelNeRF learn to directly regress pixel colors given the viewing directions, while other methods, such as NeX, encode view-dependent effects into a given or learned basis. These techniques are effective when learning from multiple input images, but they remain limited when only a single image is given as input. Instead, to estimate view-dependent effects in novel view synthesis, we propose to rely on the image contents and on camera motions that are estimated during training or user-defined at test time, producing photometrically realistic view-dependent effects from a single image for the first time.

In addition, we propose a new geometric rendering pipeline inspired by neural volumetric rendering (NVR), approximating NVR with a single pass of a convolutional (or transformer-based) auto-encoder network, a sampler MLP block, and a rendering MLP block. We train our networks in a self-supervised manner, that is, without camera poses or ground-truth depths during training (as in previous works). We present extensive experiments and show that our proposed method can learn free view synthesis with view-dependent effects on the challenging KITTI, RealEstate10k, and MannequinChallenge datasets.
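The sketch below is a minimal, hypothetical illustration (not the thesis code) of the single-pass pipeline outlined in the abstract: a convolutional auto-encoder extracts per-pixel features from one image, a sampler MLP approximates ray sampling with weights over a few depth candidates, and a rendering MLP maps features plus a camera-motion vector to colors, which is what allows view-dependent effects to change with the chosen motion. All module names, layer sizes, the 6-DoF motion encoding, and the depth-bin approximation are assumptions for illustration only.

```python
# Hypothetical sketch of a single-pass approximation of neural volumetric rendering.
import torch
import torch.nn as nn

class ConvAutoEncoder(nn.Module):
    """Extracts a per-pixel feature map from a single input image."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, feat_dim, 4, stride=2, padding=1),
        )

    def forward(self, img):
        return self.decoder(self.encoder(img))          # (B, feat_dim, H, W)

class SamplerMLP(nn.Module):
    """Predicts per-pixel weights over K depth candidates (a stand-in for ray sampling)."""
    def __init__(self, feat_dim=64, num_samples=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, num_samples),
        )

    def forward(self, feats):                            # feats: (B, H*W, feat_dim)
        return torch.softmax(self.mlp(feats), dim=-1)    # (B, H*W, K)

class RenderMLP(nn.Module):
    """Maps per-pixel features + camera motion to RGB, enabling view-dependent effects."""
    def __init__(self, feat_dim=64, motion_dim=6):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + motion_dim, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Sigmoid(),
        )

    def forward(self, feats, motion):                    # motion: (B, motion_dim)
        motion = motion[:, None, :].expand(-1, feats.shape[1], -1)
        return self.mlp(torch.cat([feats, motion], dim=-1))  # (B, H*W, 3)

if __name__ == "__main__":
    B, H, W = 1, 64, 64
    img = torch.rand(B, 3, H, W)
    cam_motion = torch.zeros(B, 6)                       # user-defined motion at test time
    feats = ConvAutoEncoder()(img)                       # (B, 64, H, W)
    feats = feats.flatten(2).transpose(1, 2)             # (B, H*W, 64)
    weights = SamplerMLP()(feats)                        # (B, H*W, 8)
    depth_bins = torch.linspace(1.0, 10.0, 8)            # hypothetical candidate depths
    depth = (weights * depth_bins).sum(-1)               # expected per-pixel depth (B, H*W)
    rgb = RenderMLP()(feats, cam_motion)                 # (B, H*W, 3)
    print(rgb.shape, depth.shape)
```

In this reading, conditioning the rendering MLP on the camera motion rather than on a per-ray viewing direction is what lets a single forward pass produce motion-dependent appearance changes from one image; the actual thesis architecture and losses may differ.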
Advisors
김문철
Description
Korea Advanced Institute of Science and Technology: School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2023.8, [ix, 102 p.]

Keywords

Novel view synthesis; Deep learning; Depth estimation

URI
http://hdl.handle.net/10203/320953
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1047249&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral theses)
Files in This Item
There are no files associated with this item.
