Towards human-level domain adaptation for scene understanding (장면이해를 위한 인간 수준의 도메인 적응 방법론)

Dublin Core metadata (field: value)
dc.contributor.advisor: 권인소
dc.contributor.advisor: Kweon, In-So
dc.contributor.advisor: 윤국진
dc.contributor.author: Shin, Inkyu
dc.contributor.author: 신인규
dc.date.accessioned: 2024-08-08T19:31:05Z
dc.date.available: 2024-08-08T19:31:05Z
dc.date.issued: 2024
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1099208&flag=dissertation (en_US)
dc.identifier.uri: http://hdl.handle.net/10203/322014
dc.description: Doctoral thesis (Ph.D.) - 한국과학기술원 (KAIST): 미래자동차학제전공, 2024.2, [xi, 100 p.]
dc.description.abstract: The human visual system analyzes visual data to create meaningful representations, enabling the performance of various tasks. Remarkably, it can autonomously discern and learn from unseen data by analyzing their patterns and distribution (unsupervised offline adaptation). Furthermore, it adapts robustly to data arriving in real time during inference (online adaptation). This adaptability greatly enhances the generalizability and effectiveness of the human visual system in diverse scenarios. In this thesis, we apply these two data-centric adaptation methods to machine vision systems, which are currently vulnerable to changes in data distribution, with the aim of achieving domain-adaptive, cost-effective, human-level computer vision. Below is a summary of how this approach is developed.

In Chapter 2, we present our pursuit of data-centric unsupervised domain adaptation (UDA) for machine vision. Our research identifies the crucial role of effectively acquiring and utilizing model outputs, such as pseudo-labels, from unseen target data to enhance adaptation. To this end, we propose a methodology that scales up pseudo-labels by carefully analyzing the patterns and relationships within the pixel-level outputs of the data. We further demonstrate that our approach significantly improves adaptability at both the image and video levels, achieved through spatial and temporal scaling strategies, respectively, allowing more nuanced and effective adaptation across diverse visual contexts.

In Chapter 3, our empirical studies reveal that unsupervised adaptation conducted without any real target labels, as in Chapter 2, is inherently limited and cannot match the performance of a fully supervised model. While cost-effective, this approach yields a model whose performance gap relative to its supervised counterpart makes practical deployment difficult. To address this challenge, we introduce a human-in-the-loop active domain adaptation method (Active DA) that strategically determines which areas of the target data to label, guided by the model's analysis of the target data. Our findings indicate that labeling a mere 2% of pixels in each image can approximate the performance of a supervised model. We additionally propose a technique for selecting representative points within this 2% budget (e.g., 40 points per image), demonstrating that this selective approach still yields results comparable to supervised models without severe performance degradation.

In Chapter 4, we delve into online adaptation, a pivotal element in our pursuit of human-level adaptability in machine learning models. Online adaptation is characterized by the model's capacity for bidirectional inference and learning, utilizing target test data in real time (test-time DA). This approach requires a more meticulous analysis of each data sample, as the model adapts by observing only the current batch or even a single sample. To strengthen the model's self-supervision on individual samples, we propose two methods. The first generates improved pseudo-labels through the integration and aggregation of multi-modal sensor data. Our findings reveal that the bidirectional interplay between modalities significantly enhances the quality of the pseudo-labels, thereby bolstering the model's adaptability during test time. For scenarios lacking multi-modal data, and consequently accurate pseudo-labels, we introduce a second method: a straightforward yet effective self-supervision technique we term 'masking and reconstruction', which exploits the inherent structure and correlations within the data and leads to a substantial improvement in test-time adaptation performance. These methodologies underscore our commitment to advancing online adaptation so that our models remain robust and effective across various tasks.

In Chapter 5, we culminate our exploration with a comprehensive framework for unified domain adaptation (UnDA), aimed at attaining human-level adaptability in machine learning. The chapter begins with supplementary experiments that extend the UDA methodology of Chapter 2 to test-time training and, conversely, incorporate the proposed test-time adaptation (TTA) strategies into the offline training phase. Our empirical evaluations reveal notable compatibility and synergy between our UDA and TTA approaches. We then integrate active adaptation strategies to augment the efficacy of the unified framework. A critical challenge arises when incorporating a human-in-the-loop active adaptation system into this unified framework, since we assume human labeling is infeasible in online scenarios. To navigate this obstacle, we leverage a pre-trained, domain-generalized foundation model as a surrogate for human-guided labeling, offering robust masking capabilities that are invariant to domain shifts. We demonstrate that pseudo-labels refined through both the training and test phases under the guidance of masks from the foundation model exhibit marked improvements. This approach to pseudo-label generation and refinement enables a more potent and effective unified adaptation, seamlessly bridging the gap between training and test phases.
dc.language: eng
dc.publisher: 한국과학기술원 (KAIST)
dc.subject: 비지도 도메인적응; 엑티브 도메인적응; 테스트타임 도메인 적응; 통합 도메인 적응; 인간 수준의 도메인 적응
dc.subject: Unsupervised Domain Adaptation; Active Domain Adaptation; Test-time Adaptation; Unified Domain Adaptation; Human-level Adaptation
dc.title: Towards human-level domain adaptation for scene understanding
dc.title.alternative: 장면이해를 위한 인간 수준의 도메인 적응 방법론
dc.type: Thesis (Ph.D.)
dc.identifier.CNRN: 325007
dc.description.department: 한국과학기술원 (KAIST): 미래자동차학제전공
dc.contributor.alternativeauthor: Yoon, Kuk-Jin
Appears in Collection: PD-Theses_Ph.D. (박사논문)
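
The point-budget result quoted in the Chapter 3 summary above (roughly 2% of pixels, or about 40 points per image, sent to a human annotator) can be pictured with a small selection heuristic. The sketch below is a minimal, hypothetical Python/PyTorch example: the entropy ranking, the `min_dist` spreading rule, and the function name `select_points` are illustrative assumptions, not the selection algorithm proposed in the thesis.

```python
# Illustrative sketch only: choosing a small budget of pixels to label for
# active domain adaptation. The entropy criterion and minimum-distance
# spreading are assumptions, not the selection rule used in the thesis.
import torch
import torch.nn.functional as F


def select_points(logits: torch.Tensor, budget: int = 40, min_dist: int = 16):
    """Pick `budget` pixel coordinates from a (C, H, W) logit map.

    Pixels are ranked by predictive entropy (high = uncertain) and picked
    greedily, skipping candidates closer than `min_dist` pixels to an
    already-selected point so the annotations spread over the image.
    """
    probs = F.softmax(logits, dim=0)                              # (C, H, W)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=0)   # (H, W)
    _, w = entropy.shape

    order = torch.argsort(entropy.flatten(), descending=True)
    selected = []
    for idx in order.tolist():
        y, x = divmod(idx, w)
        if all((y - sy) ** 2 + (x - sx) ** 2 >= min_dist ** 2 for sy, sx in selected):
            selected.append((y, x))
            if len(selected) == budget:
                break
    return selected  # list of (row, col) pixel coordinates to send for labeling


if __name__ == "__main__":
    dummy_logits = torch.randn(19, 128, 256)  # e.g. a 19-class segmentation head
    points = select_points(dummy_logits)
    print(len(points), "points selected, e.g.", points[:3])
```

For scale: 2% of a 512x1024 image is over ten thousand pixels, so the 40-point budget mentioned in the abstract is a far sparser, representative subset; the spreading rule above is just one simple way to keep such a subset spatially diverse.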
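
The 'masking and reconstruction' self-supervision described for test-time adaptation in Chapter 4 can likewise be pictured as masking random patches of an unlabeled test batch and updating the model to reconstruct them. The sketch below is a minimal, hypothetical version under that reading; the stand-in encoder/decoder, the 16-pixel patches, the 50% mask ratio, and the L2 reconstruction loss are illustrative assumptions, not the architecture or objective used in the thesis.

```python
# Illustrative sketch only: a masking-and-reconstruction self-supervision
# step at test time. The encoder/decoder modules, patch size, and loss are
# placeholders; the thesis's actual architecture and objective may differ.
import torch
import torch.nn as nn


def mask_patches(images: torch.Tensor, patch: int = 16, ratio: float = 0.5):
    """Zero out a random `ratio` of non-overlapping `patch`x`patch` squares."""
    b, _, h, w = images.shape
    gh, gw = h // patch, w // patch
    keep = (torch.rand(b, 1, gh, gw, device=images.device) > ratio).float()
    mask = keep.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
    return images * mask, mask


def tta_step(encoder: nn.Module, decoder: nn.Module,
             images: torch.Tensor, optimizer: torch.optim.Optimizer):
    """One test-time adaptation step on an unlabeled test batch."""
    masked, mask = mask_patches(images)
    recon = decoder(encoder(masked))
    # Penalize reconstruction error only on the masked (zeroed-out) regions.
    loss = ((recon - images) ** 2 * (1.0 - mask)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    enc = nn.Conv2d(3, 8, 3, padding=1)   # toy stand-in encoder
    dec = nn.Conv2d(8, 3, 3, padding=1)   # toy stand-in decoder
    opt = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    batch = torch.rand(2, 3, 64, 64)      # unlabeled test images
    print("reconstruction loss:", tta_step(enc, dec, batch, opt))
```

In a real test-time pipeline the reconstruction gradient would flow through the same backbone that feeds the downstream scene-understanding head, so the adaptation benefits the task predictions; the modules here are toy stand-ins only to keep the example self-contained.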