Associative learning for multimodal representation under ambiguous pair problems

Abstract
Our daily life is full of multimodal information such as visual, audio, and language representations. Humans recognize daily events naturally by processing such multimodal information comprehensively; this is possible because humans are aware of the relationships among the different modalities. Therefore, in order to understand the world at a human level, machines need to learn and be aware of the relationships among multimodal data, beyond any single modality. However, the relationships within multimodal data collected in real-world environments are not always sufficient or reliable for machines to learn from. For example, when data from a certain modality is difficult to obtain, the number of multimodal data pairs available for training can be limited. In addition, even when enough multimodal pairs exist, their relationships can sometimes be mismatched, which may confuse machines. Such situations with limited and mismatched pairs can be regarded as ambiguous pair problems that hinder machines from learning multimodal relationships. It is therefore necessary to address these ambiguous pair problems in order to learn multimodal relationships robustly in real-world environments. We deal with ambiguous pair problems for multimodal representation through multimodal association approaches that can compensate for the lack of paired information. We address audio-visual representation learning and text-video retrieval, which suffer from the limited pair problem and the mismatched pair problem, respectively. First, we propose a novel audio-visual representation learning approach based on associative learning that can utilize abundant unpaired data under the limited pair problem. Second, we introduce a novel text-video retrieval method based on associative learning that can recognize mismatched features and mitigate their effect under the mismatched pair problem. Extensive experiments, including comparisons with state-of-the-art methods, ablation studies, and further qualitative and quantitative analyses, validate the effectiveness of the proposed associative learning approaches under ambiguous pair problems.
Advisors
Ro, Yong Man
Description
Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2023.2, [vi, 70 p.]

Keywords

Multimodal; Ambiguous pair problems; Limited pairs; Mismatched pairs; Associative learning; Audio-visual representation learning; Text-video retrieval

URI
http://hdl.handle.net/10203/309104
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1030553&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral theses)
Files in This Item
There are no files associated with this item.
