The demand for 3D modeling of real-world objects has grown rapidly with the spread of 3D printers and 3D visual content. Although 3D scanners such as laser scanners capture highly accurate depth, they are costly. As alternatives, simple cameras or commercial depth sensors have been used for 3D modeling. Most conventional works rely on visible-band images, while the near-infrared (NIR) band is neglected and filtered out. Our motivation is to analyze the beneficial aspects of NIR images as a photometric cue for recovering 3D geometry. We use a simple NIR camera and perform shape from shading so that we can estimate the 3D geometry of various real-world objects from a single-view NIR image. For this, we propose cost-optimization-based and deep-learning-based methods. For full 3D modeling from multiple views, we use commercial depth sensors such as the Kinect and exploit their rough 3D geometry to resolve the ambiguity of shape from shading, thereby obtaining improved 3D geometries. This dissertation consists of four sub-tasks, each summarized as follows:
Near-infrared (NIR) images of most materials exhibit less texture or albedo variation, making them beneficial for vision tasks such as intrinsic image decomposition and structured-light depth estimation. Understanding the reflectance properties (BRDF) of materials in the NIR wavelength range can be further useful for many photometric methods, including shape from shading and inverse rendering. However, even with less albedo variation, many materials, e.g., fabrics and leaves, exhibit complex fine-scale surface detail that makes it hard to estimate the BRDF accurately. In this task, we present an approach to simultaneously estimate the NIR BRDF and fine-scale surface details by imaging materials under different NIR lighting and viewing directions. This is achieved by an iterative scheme that alternately estimates the surface detail and the NIR BRDF of a material. Our setup does not require complicated gantries or calibration, and we present the first NIR dataset of 100 materials, including a variety of fabrics (knits, weaves, cotton, satin, leather), organic materials (skin, leaves, jute, trunk, fur), and inorganic materials (plastic, concrete, carpet). The NIR BRDFs measured from the material samples are used with a shape-from-shading algorithm to demonstrate fine-scale reconstruction of objects from a single NIR image.
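The alternating estimation can be illustrated with a toy single-point Lambertian instance; the dissertation's actual solvers handle full BRDFs and per-pixel surface detail, so the least-squares updates and names below are illustrative assumptions only:

```python
import numpy as np

def alternate_estimate(I, L, n_iters=20):
    # Toy alternation for one Lambertian point: I_j = rho * (n . l_j).
    # Stands in for the dissertation's surface-detail / NIR-BRDF alternation.
    rho = 1.0
    n = np.array([0.0, 0.0, 1.0])
    for _ in range(n_iters):
        # Fix the reflectance rho, solve for the normal by linear least squares.
        n, *_ = np.linalg.lstsq(rho * L, I, rcond=None)
        n /= np.linalg.norm(n)
        # Fix the normal, refit the scalar reflectance in closed form.
        s = L @ n
        rho = float(I @ s / (s @ s))
    return rho, n
```

In the toy case the alternation converges immediately because each sub-problem is solved exactly; the real scheme iterates because surface detail and BRDF are coupled through a non-linear image formation model.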
To cover a wider range of lighting directions and materials, we present deep-learning-based surface normal estimation using a single near-infrared (NIR) image. We focus on reconstructing fine-scale surface geometry from an image captured with an uncalibrated light source. To tackle this ill-posed problem, we adopt a generative adversarial network, which is effective in recovering the sharp outputs essential for fine-scale surface normal estimation. We incorporate the angular error and an integrability constraint into the objective function of the network so that the estimated normals respect physical characteristics of surfaces. We train and validate our network on a recent NIR dataset, and also evaluate the generality of our trained model on new external datasets captured with a different camera under different environments.
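The physics-aware terms of such an objective can be sketched as follows; the adversarial term and the weighting scheme are placeholders, not the network's actual formulation:

```python
import numpy as np

def angular_error(n_pred, n_gt):
    # Mean angular deviation (radians) between unit normal maps of shape (H, W, 3).
    dot = np.clip(np.sum(n_pred * n_gt, axis=-1), -1.0, 1.0)
    return np.arccos(dot).mean()

def integrability_penalty(n):
    # A normal field n = (nx, ny, nz) is integrable (comes from a valid surface)
    # when the gradients p = -nx/nz, q = -ny/nz satisfy dp/dy = dq/dx.
    p = -n[..., 0] / n[..., 2]
    q = -n[..., 1] / n[..., 2]
    p_y = np.gradient(p, axis=0)
    q_x = np.gradient(q, axis=1)
    return np.mean((p_y - q_x) ** 2)

def total_loss(adv_loss, n_pred, n_gt, lam_ang=1.0, lam_int=0.1):
    # Hypothetical combined objective: adversarial term plus the physics terms.
    return adv_loss + lam_ang * angular_error(n_pred, n_gt) \
                    + lam_int * integrability_penalty(n_pred)
```

The integrability term discourages normal maps that no continuous surface could produce, complementing the per-pixel angular error.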
In the next sub-task, we propose a method to refine the geometry of 3D meshes from a consumer-level depth camera, e.g., the Kinect, by exploiting shading cues captured by an NIR camera. A major benefit of using an NIR camera instead of an RGB camera is that the captured NIR images are narrow-band images that filter out most undesired ambient light, which makes our system robust against natural indoor illumination. Moreover, many natural objects that have colorful textures in the visible spectrum appear to have a uniform albedo in the NIR spectrum. Based on our analysis of the IR projector light of the Kinect, we define a near-light-source NIR shading model that describes the captured intensity as a function of surface normals, albedo, lighting direction, and the distance between the light source and surface points. To resolve the ambiguity in our model between normals and distances, we utilize an initial 3D mesh from KinectFusion and multi-view information to reliably estimate surface details that were not captured and reconstructed by KinectFusion. Our approach operates directly on the mesh model for geometry refinement. We ran experiments on geometries captured by both the Kinect I and the Kinect II, as depth acquisition in the Kinect I is based on a structured-light technique while that of the Kinect II is based on time-of-flight (ToF) technology. The effectiveness of our approach is demonstrated through several challenging real-world examples. We have also performed a user study to evaluate the quality of the mesh models before and after our refinement.
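A minimal sketch of a near-light shading model of this kind, assuming a single point light with inverse-square falloff and Lambertian reflectance (a common simplification; the dissertation's model is calibrated to the Kinect's actual IR projector):

```python
import numpy as np

def near_light_intensity(point, normal, albedo, light_pos, light_power=1.0):
    # Near-light Lambertian shading: intensity depends on the surface normal,
    # albedo, lighting direction, and squared light-to-surface distance.
    to_light = light_pos - point
    d2 = np.dot(to_light, to_light)      # squared distance to the light
    l = to_light / np.sqrt(d2)           # normalized lighting direction
    shading = max(np.dot(normal, l), 0)  # Lambertian cosine term
    return light_power * albedo * shading / d2
```

The coupling of the normal (in the cosine term) and the distance (in the falloff) is exactly the ambiguity the text resolves with the initial KinectFusion mesh.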
Lastly, we investigate whether pairs of RGB and NIR images are also beneficial for recognition tasks. In this sub-task, we present a data-driven method for scene parsing of road scenes that utilizes single-channel near-infrared (NIR) images. To overcome the data scarcity problem in the non-RGB spectrum, we define a new color space and decompose the task of deep scene parsing into two sub-tasks with two separate CNN architectures, one for chromaticity channels and one for semantic masks. For chromaticity estimation, we build a spatially aligned RGB-NIR image database (40k urban scenes) to infer color information through RGB-NIR spectrum learning, and leverage existing scene parsing networks trained on already available RGB masks. From our database, we sample key frames and manually annotate them (4k ground-truth masks) to fine-tune the network in the proposed color space. The key contribution of this work is to replace multispectral scene parsing methods with a simple yet effective approach using single NIR images. The benefits of our algorithm and dataset are confirmed in qualitative and quantitative experiments.
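The decomposition can be sketched with a hypothetical luminance–chromaticity split, where the single-channel NIR input plays the role of the luminance and a CNN supplies the chromaticity; the mean-based luminance below is only a stand-in for the color space the dissertation actually defines:

```python
import numpy as np

def decompose(rgb):
    # Split an RGB image into a luminance-like channel and chromaticity
    # ratios; a hypothetical stand-in for the proposed color space.
    lum = rgb.mean(axis=-1, keepdims=True)
    chroma = rgb / np.maximum(lum, 1e-6)
    return lum[..., 0], chroma

def recompose(lum, chroma):
    # Recombine a single-channel image (e.g., the NIR input standing in
    # for luminance) with estimated chromaticity to form a color image.
    return lum[..., None] * chroma
```

Training the chromaticity CNN on the 40k aligned RGB-NIR pairs then amounts to learning the `chroma` map from the NIR channel alone.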
For all the sub-tasks, we validate our approaches on various examples and demonstrate possible applications in the relevant industrial fields.