Cross-modal retrieval meets inference: improving zero-shot classification with cross-modal retrieval

Abstract
Contrastive language-image pre-training (CLIP) has demonstrated remarkable zero-shot classification ability, i.e., image classification with novel text labels. Existing works have attempted to enhance CLIP by fine-tuning it on downstream tasks, but this inadvertently degrades performance on unseen classes and thus harms zero-shot generalization. This thesis addresses the challenge by leveraging readily available image-text pairs from an external dataset for cross-modal guidance at inference time. To this end, we propose X-MoRe, a novel inference method comprising two key steps: (1) cross-modal retrieval and (2) modal-confidence-based ensemble. Given a query image, we harness CLIP's cross-modal representations to retrieve relevant textual information from an external image-text pair dataset. We then assign higher weight to the more reliable modality between the original query image and the retrieved text when forming the final prediction. X-MoRe demonstrates robust performance across a diverse set of tasks without any additional training, showcasing the effectiveness of cross-modal features in maximizing CLIP's zero-shot ability.
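The record includes no implementation, so the following is a minimal sketch of how the two steps described in the abstract could look, assuming OpenAI's CLIP package (github.com/openai/CLIP). The caption corpus, the class names, the top-k averaging, and the use of maximum softmax probability as the modal-confidence score are all illustrative assumptions, not the thesis's exact formulation.

```python
# A minimal sketch of X-MoRe's two inference steps as described in the
# abstract: (1) cross-modal retrieval, (2) modal-confidence-based ensemble.
# The corpus, labels, and confidence measure below are hypothetical.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical external image-text corpus: only its captions are used here.
corpus_captions = [
    "a photo of a tabby cat on a couch",
    "a golden retriever playing fetch in a park",
    "a sparrow perched on a bare branch",
]
class_names = ["cat", "dog", "bird"]  # hypothetical zero-shot labels


@torch.no_grad()
def x_more_predict(image_path: str, k: int = 1) -> str:
    # Embed the query image and all texts in CLIP's shared space.
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    img_emb = model.encode_image(image).float()
    img_emb /= img_emb.norm(dim=-1, keepdim=True)

    cap_emb = model.encode_text(clip.tokenize(corpus_captions).to(device)).float()
    cap_emb /= cap_emb.norm(dim=-1, keepdim=True)

    prompts = [f"a photo of a {c}" for c in class_names]
    cls_emb = model.encode_text(clip.tokenize(prompts).to(device)).float()
    cls_emb /= cls_emb.norm(dim=-1, keepdim=True)

    # Step 1: cross-modal retrieval -- fetch the top-k captions most
    # similar to the query image and average their embeddings.
    topk = (img_emb @ cap_emb.T).topk(k, dim=-1).indices[0]
    txt_emb = cap_emb[topk].mean(dim=0, keepdim=True)
    txt_emb /= txt_emb.norm(dim=-1, keepdim=True)

    # Per-modality class probabilities (100.0 is CLIP's usual logit scale).
    img_probs = (100.0 * img_emb @ cls_emb.T).softmax(dim=-1)
    txt_probs = (100.0 * txt_emb @ cls_emb.T).softmax(dim=-1)

    # Step 2: modal-confidence-based ensemble -- weight each modality by an
    # assumed confidence score (here, its maximum softmax probability).
    w_img, w_txt = img_probs.max(), txt_probs.max()
    probs = (w_img * img_probs + w_txt * txt_probs) / (w_img + w_txt)
    return class_names[probs.argmax(dim=-1).item()]
```

In a real deployment the corpus embeddings would be precomputed and served from an index rather than re-encoded per query, and the confidence weighting would follow whatever modal-confidence measure the thesis actually defines.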
Advisors
윤세영 (Se-Young Yun)
Publisher
한국과학기술원 (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description
Thesis (Master's) - KAIST : Kim Jaechul Graduate School of AI, 2024.2, [iii, 20 p.]
Keywords
Zero-shot classification; Cross-modal retrieval; Ensemble
URI
http://hdl.handle.net/10203/321358
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1096063&flag=dissertation
Appears in Collection
AI-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
