In this dissertation, we address the question of how the framing and labelling processes of human cognition can be computationally modelled for various industrial applications. Over the past decades, industrial jobs have increasingly involved highly repetitive tasks that often demand very high accuracy. For cases in which a human worker cannot perform the task, or in which conditions suitable for human work are not available, we introduce and demonstrate a new approach that enables artificial workers to perform the same tasks using better representation and segmentation methods. Specifically, we are interested in automating typical image editing tasks in media applications as well as a robotic assembly task in smart manufacturing.
For the first sub-task, we investigate multi-view object representations for highly accurate foreground-background separation in digital content creation. By linearly increasing the dimension of available information, we take advantage of the geometric relationship between different viewpoints. In addition to exploring distinctive 2D appearance models from a single viewpoint, we analyze how much the multi-view representation benefits robust initialization and segmentation. For image editing, matting regions can be adaptively detected along the object boundaries based on information theory. Our final results are high-quality alpha mattes that are geometrically consistent across all viewpoints. In addition to using multiple camera viewpoints, we study a new photometric object representation that uses multi-band information, such as RGB and near-infrared (NIR) channels, and develop a semantic segmentation system for smart vehicle applications. Across all the input data, we observe that salient information is critical for visual recognition.
In the next sub-task, we present a ranking system based on convolutional neural networks (CNNs) for automatically selecting natural bases and salient views of virtual 3D objects with arbitrary poses. Built on a large number of well-aligned 3D shapes and category-labelled 2D images, this data-driven solution requires a category learning process for the upright orientation and salient views of 3D models. Since direct annotations for the web data are not adequate, we make reasonable assumptions that allow category-labelled data to be used for supervised learning. Even though the context is slightly different, our system makes full use of the big data preserved by humans, so the salient views it selects for thumbnails or previews of 3D models are more appealing to humans than the views produced by conventional view selection algorithms. We define a good view as a recognizable view, and our analysis also shows that what is recognizable is in fact category-specific.
Lastly, we develop deep representations of industrial components using simulated images and data-specific salient viewpoints. While CNN-based representations are replacing hand-designed features, they require a huge amount of human annotation. Hence, we introduce a photo-realistic simulation space in the near-infrared band that minimizes the domain differences between real and simulated appearances. By doing so, we can learn the bidirectional reflectance distribution functions (BRDFs) of various industrial components and their fine-grained shape variations from real-world and simulated data, and improve recognition performance with the mixed data. Based on our experimental results, we discuss how the simulated samples interpolate between real-world samples and stabilize the training process. In addition, we select category-independent and category-specific viewpoints on target objects and analyze their benefits for recognition performance. After modifying state-of-the-art CNN architectures for detection and semantic segmentation, we demonstrate component retrieval and pixel-level localization in the context of robotic assembly automation.
Across all the sub-tasks, the purpose of this dissertation is to provide theoretical grounds and experimental confirmation for our new approaches to object representation and segmentation. After overcoming several technical issues that arise in challenging conditions, we validate the methods qualitatively and quantitatively on various examples and demonstrate interesting applications in each industrial field.