DSpace at KOASAS: Probabilistic representation learning for improved cross-modal retrieval using density-wise similarity


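Probabilistic representation learning for improved cross-modal retrieval using density-wise similarity
(Korean title, translated: Improving cross-modal retrieval through probabilistic representation learning based on inter-distribution similarity)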

Cited 0 times in Web of Science

Cited 0 times in Scopus

Hits: 2
Downloads: 0


Youn, Yeo Dong / 윤여동

For cross-modal retrieval tasks, building a joint representation space for data samples from different modalities has been common practice, especially in the vision and language domains. Two characteristics of image-caption pairs make this task especially challenging: the multiplicity of matches and the partiality of matching pairs. Given an image or a caption, there are multiple positive captions or images, and for each positive image-caption pair, the caption conveys only the key concepts of interest while ignoring other components. Previous studies, which learn pointwise embeddings deterministically, fail to capture these one-to-many correspondences and to correctly calibrate the semantic intersection between arbitrary image-caption pairs. This paper proposes a generalized method that learns the representations of images and captions as probability distributions in the joint representation space and explicitly models cross-modal uncertainty with differential entropy. The probabilistic embeddings are learned parametrically by fusing visual and text head modules with a pretrained visual-text encoder, and are trained in two stages. Through extensive qualitative experiments on the MS-COCO and Flickr30K datasets, the paper demonstrates the benefit of probabilistic representations by showing how cross-modal uncertainty can measure the multiplicity within each sample and how density-wise similarity preserves the partial similarity of each image-caption pair.
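The abstract names two quantities, differential entropy as a measure of cross-modal uncertainty and a density-wise similarity between embeddings, but this record does not state their exact form. The sketch below is a minimal illustration under stated assumptions: each image or caption embedding is assumed to be a diagonal Gaussian N(μ, diag(σ²)), its entropy uses the closed form H = (D/2)·log(2πe) + Σ_d log σ_d, and the Bhattacharyya coefficient stands in as one possible density-wise similarity. The function names `gaussian_entropy` and `bhattacharyya_coefficient` are hypothetical and are not taken from the thesis.

```python
import math
from typing import Sequence

# Hypothetical helpers for illustration only. Assumption: every embedding is a
# diagonal Gaussian N(mu, diag(sigma^2)); the thesis may use a different family
# and a different density-wise similarity.


def gaussian_entropy(sigma: Sequence[float]) -> float:
    """Differential entropy (nats) of a diagonal Gaussian with std devs sigma."""
    d = len(sigma)
    # H = D/2 * log(2*pi*e) + sum_d log(sigma_d); the mean does not affect it.
    return 0.5 * d * math.log(2 * math.pi * math.e) + sum(math.log(s) for s in sigma)


def bhattacharyya_coefficient(
    mu1: Sequence[float],
    sigma1: Sequence[float],
    mu2: Sequence[float],
    sigma2: Sequence[float],
) -> float:
    """Overlap in (0, 1] between two diagonal Gaussians; 1 iff they are identical."""
    if not (len(mu1) == len(sigma1) == len(mu2) == len(sigma2)):
        raise ValueError("all inputs must have the same dimension")
    distance = 0.0  # Bhattacharyya distance, accumulated dimension by dimension
    for m1, s1, m2, s2 in zip(mu1, sigma1, mu2, sigma2):
        var = 0.5 * (s1 ** 2 + s2 ** 2)              # averaged variance
        distance += (m1 - m2) ** 2 / (8.0 * var)      # mean-separation term
        distance += 0.5 * math.log(var / (s1 * s2))   # variance-mismatch term
    return math.exp(-distance)


if __name__ == "__main__":
    image = ([0.0, 0.0], [1.0, 1.0])            # hypothetical image embedding
    caption = ([0.2, -0.1], [0.5, 0.5])         # hypothetical caption embedding
    print(gaussian_entropy(image[1]), gaussian_entropy(caption[1]))
    print(bhattacharyya_coefficient(*image, *caption))
```

For example, `gaussian_entropy([2.0, 2.0])` exceeds `gaussian_entropy([0.5, 0.5])`, so a broader embedding reports higher uncertainty, the kind of signal the abstract associates with multiplicity. The Bhattacharyya coefficient was chosen here only because it has a closed form for Gaussians and is bounded; other overlap measures, such as the expected likelihood kernel, would serve equally well as illustrations.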

Advisor: Moon, Il-Chul (문일철)

Description: Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2023

Identifier: 325007

Language: eng

Description: Master's thesis, Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI, August 2023, [iii, 25 p.]

Keywords: Cross-modal retrieval; Multiplicity of matches; Partiality of matching pairs; Point-wise/distribution-wise embedding; Cross-modal uncertainty

URI: http://hdl.handle.net/10203/320557

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1045745&flag=dissertation

Appears in Collection: AI-Theses_Master (Master's theses)

Files in This Item: There are no files associated with this item.


KOASAS

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.