Associative learning for multimodal representation under ambiguous pair problems

Abstract
Our daily life is full of multimodal information such as visual, audio, and language representations. Humans recognize daily events naturally by processing such multimodal information comprehensively; this is possible because humans are aware of the relationships among the different modalities. Therefore, in order to understand the world at a human level, machines need to learn and be aware of the relationships among multimodal data, beyond any single modality. However, the relationships within multimodal data collected in real-world environments are not always sufficient or reliable for machines to learn from. For example, when data from a certain modality is difficult to obtain, the number of multimodal data pairs available for training can be limited. In addition, even when enough multimodal pairs exist, their relationships can sometimes be mismatched, which may confuse machines. Such situations with limited and mismatched pairs can be regarded as ambiguous pair problems that hinder machines from learning multimodal relationships. It is therefore necessary to address these ambiguous pair problems in order to learn multimodal relationships robustly in real-world environments. We deal with ambiguous pair problems for multimodal representation through multimodal association approaches that can compensate for the lack of paired information. We address audio-visual representation learning and text-video retrieval, which suffer from the limited pair problem and the mismatched pair problem, respectively. First, we propose a novel audio-visual representation learning approach based on associative learning that can utilize abundant unpaired data under the limited pair problem. Second, we introduce a novel text-video retrieval method based on associative learning that can recognize mismatched features and mitigate their effect under the mismatched pair problem. Extensive experiments, including comparisons with state-of-the-art methods, ablation studies, and further qualitative and quantitative analyses, validate the effectiveness of the proposed associative learning approaches under ambiguous pair problems.
Advisors
Ro, Yong Man
Description
Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2023.2, [vi, 70 p.]

Keywords

Multimodal; Ambiguous pair problems; Limited pairs; Mismatched pairs; Associative learning; Audio-visual representation learning; Text-video retrieval

URI
http://hdl.handle.net/10203/309104
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1030553&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral theses)
Files in This Item
There are no files associated with this item.
