Geometric-guided domain adaptation for semantic segmentation

Training semantic segmentation models requires extensive annotated data, which is labor-intensive and expensive to obtain. To address this problem, photorealistic data rendered from simulators and game engines, with precise pixel-level semantic annotations, can be used to train segmentation networks. However, models trained on synthetic data may not perform well on real-world data because of cross-domain differences. In this dissertation, we present an extensive analysis of the causes of the gap between the source and target domains and introduce novel domain-adaptive semantic segmentation frameworks that minimize the domain discrepancy.

In Chapter 2, we propose to bridge the domain gap with self-supervision from the target data itself. Previous methods adapt models from the source data to the target data directly (reducing the inter-domain gap) but fail to consider the large distribution gap within the target data itself (the intra-domain gap). To address this limitation, we propose a two-step self-supervised domain adaptation approach that closes both the inter-domain and the intra-domain gaps. First, we adapt the model to the target domain and use an entropy-based ranking function to divide the target data into an easy and a hard split. Then, to reduce the intra-domain gap, we apply a self-supervised adaptation technique from the easy to the hard split.

In Chapter 3, we tackle the more practical open compound domain adaptation (OCDA) setting, where the target domain is a compound of multiple unknown homogeneous subdomains. The goal of OCDA is to minimize the domain gap between the labeled source domain and the unlabeled compound target domain, which also benefits generalization to unseen domains. Current OCDA methods for semantic segmentation rely on manual domain separation and employ a single model to adapt to all target subdomains simultaneously.
However, adapting to one target subdomain might hinder the model from adapting to other, dissimilar target subdomains, leading to limited performance. In this work, we introduce a multi-teacher framework with bidirectional photometric mixing that adapts to each target subdomain separately. First, we present an automatic domain separation scheme to find the optimal number of subdomains. On this basis, we propose a multi-teacher framework in which each teacher model uses bidirectional photometric mixing to adapt to one target subdomain. Furthermore, we perform adaptive distillation to learn a student model and apply consistency regularization to improve the student's generalization.

In Chapter 4, we leverage motion priors from videos and propose motion-guided domain adaptation (MoDA) to address the domain gap. MoDA exploits self-supervised 3D object motion to learn effective representations in the target domain, which distinguishes it from previous methods that use optical flow to establish consistency regularization. First, we propose a motion mask pre-processing (MMP) module to extract object-level motion masks from the object motion map. These masks may not accurately identify all moving instances, so directly using them to correct the target pseudo-labels is unreliable. To handle this issue, we design self-supervised object discovery (SOD) to update the object-level motion masks so that they accurately localize the moving objects. Moreover, we propose semantic label mining (SLM) to improve the noisy target pseudo-labels with guidance from the updated object-level motion masks.

In Chapter 5, we extend MoDA into MoDA-v2, which handles domain alignment separately for the foreground and background categories using different strategies.
For the foreground categories, MoDA-v2 uses object motion to close the domain gap with two novel modules: motion-guided self-training (MST) and moving object label mining (MLM), which take pixel-level and object-level guidance from the motion, respectively. For the background categories, MoDA-v2 introduces background adversarial training (BAT), which contains a background category-specific discriminator. Experimental results on multiple benchmarks demonstrate the effectiveness of MoDA-v2 against existing approaches. Moreover, MoDA-v2 is versatile and can be combined with existing state-of-the-art approaches to further improve performance.
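The entropy-based easy/hard ranking of Chapter 2 admits a compact sketch. The following is a minimal illustration, not the dissertation's implementation: function names and the 50% split ratio are hypothetical, and input is assumed to be per-image softmax maps of shape (C, H, W).

```python
import numpy as np

def prediction_entropy(probs, eps=1e-8):
    """Mean per-pixel Shannon entropy of a softmax map of shape (C, H, W).

    Low entropy means the model is confident on that image.
    """
    return float(-(probs * np.log(probs + eps)).sum(axis=0).mean())

def split_easy_hard(prob_maps, easy_ratio=0.5):
    """Rank target images by mean prediction entropy and split them.

    Images with the lowest entropy form the 'easy' split; the rest are
    'hard'. The ratio is a hypothetical hyperparameter for illustration.
    Returns two lists of indices into prob_maps.
    """
    scores = [prediction_entropy(p) for p in prob_maps]
    order = np.argsort(scores)          # ascending: most confident first
    n_easy = int(len(order) * easy_ratio)
    return order[:n_easy].tolist(), order[n_easy:].tolist()
```

Self-supervised adaptation then treats the easy split's predictions as pseudo-labels for training on the hard split, reducing the intra-domain gap.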
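The motion mask pre-processing step of Chapter 4 can likewise be sketched. This is a simplified stand-in for the MMP module, assuming the object motion map is reduced to a per-pixel motion-magnitude array; the thresholds and the connected-component heuristic are illustrative assumptions, not the thesis method.

```python
import numpy as np
from scipy import ndimage

def extract_object_motion_masks(motion_map, thresh=0.5, min_area=4):
    """Extract object-level binary masks from a motion-magnitude map.

    Pixels whose motion magnitude exceeds `thresh` are marked as moving,
    then grouped into connected components, one candidate object each.
    Components smaller than `min_area` pixels are discarded as noise.
    Both thresholds are hypothetical values for illustration.
    """
    moving = motion_map > thresh
    labeled, n_components = ndimage.label(moving)   # 4-connected labeling
    masks = []
    for k in range(1, n_components + 1):
        mask = labeled == k
        if mask.sum() >= min_area:
            masks.append(mask)
    return masks
```

Masks produced this way are exactly the noisy object-level estimates the abstract warns about, which is why SOD refines them before they are used to mine semantic labels.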
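Background adversarial training in MoDA-v2 follows the standard output-space adversarial pattern. The sketch below, in PyTorch, is a hedged illustration only: the discriminator architecture, loss form, and the assumption that it consumes background-category probability maps are my own simplifications, not the dissertation's BAT design.

```python
import torch
import torch.nn as nn

class BackgroundDiscriminator(nn.Module):
    """Patch discriminator over background-class probability maps.

    Hypothetical sketch: n_bg is the number of background categories
    whose softmax channels are fed to the discriminator.
    """
    def __init__(self, n_bg):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_bg, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 1, 4, stride=2, padding=1),  # per-patch domain logit
        )

    def forward(self, bg_probs):
        return self.net(bg_probs)

def adversarial_losses(disc, src_bg, tgt_bg):
    """Compute the two sides of the adversarial game.

    The discriminator learns to label source patches 1 and target
    patches 0; the segmenter's loss pushes target outputs to be
    classified as source, aligning the background distributions.
    """
    bce = nn.BCEWithLogitsLoss()
    d_src = disc(src_bg)
    d_tgt = disc(tgt_bg.detach())        # no gradient to the segmenter here
    d_loss = bce(d_src, torch.ones_like(d_src)) + \
             bce(d_tgt, torch.zeros_like(d_tgt))
    g_out = disc(tgt_bg)
    g_loss = bce(g_out, torch.ones_like(g_out))   # fool the discriminator
    return d_loss, g_loss
```

In training, `d_loss` updates only the discriminator and `g_loss` only the segmentation network, alternating each step.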
Advisors
권인소 (In-So Kweon)
Description
KAIST: School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - KAIST: School of Electrical Engineering, 2023.8, [vi, 66 p.]

Keywords

Domain adaptation; Geometric-guided domain adaptation; Open compound domain adaptation; Semantic segmentation; Object motion; Monocular depth estimation

URI
http://hdl.handle.net/10203/320941
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1047234&flag=dissertation
Appears in Collection
EE-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
