Probabilistic representation learning for improved cross-modal retrieval using density-wise similarity

For cross-modal retrieval tasks, building a joint representation space for data samples from different modalities has been a common practice, particularly in the vision and language domains. Two characteristics of image-caption pairs make this task especially challenging: the multiplicity of matches and the partiality of matching pairs. Given an image or a caption, there are multiple positive captions or images, and within each positive image-caption pair, the caption conveys only the key concepts of interest while ignoring other components. Previous studies, which learn pointwise embeddings in a deterministic way, fail to capture these one-to-many correspondences and do not correctly calibrate the semantic intersection between arbitrary image-caption pairs. This paper proposes a generalized method that learns the representations of images and captions as probability distributions in the joint representation space and explicitly models cross-modal uncertainty with differential entropy. The probabilistic embeddings are learned parametrically by fusing visual and text head modules onto a pretrained visual-text encoder, and are trained in two stages. Through extensive qualitative experiments on the MS-COCO and Flickr30K datasets, the paper demonstrates the benefit of probabilistic representations by showing how cross-modal uncertainty can measure the multiplicity within each sample and how density-wise similarity preserves the partial similarity of each image-caption pair.
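To make the two quantities named in the abstract concrete, the following is a minimal sketch, not the thesis's implementation. It assumes each embedding is a diagonal Gaussian (a mean vector plus per-dimension standard deviations) and uses the Bhattacharyya coefficient as one possible density-wise similarity; the abstract specifies neither the distribution family nor the similarity function, so both choices, along with the function names and toy values, are illustrative assumptions.

```python
import math


def differential_entropy(sigma):
    """Differential entropy (in nats) of a diagonal Gaussian.

    `sigma` holds the per-dimension standard deviations (assumed > 0).
    A larger value means a more spread-out, i.e. more uncertain, embedding.
    """
    d = len(sigma)
    return 0.5 * d * (1.0 + math.log(2.0 * math.pi)) + sum(math.log(s) for s in sigma)


def bhattacharyya_similarity(mu1, sigma1, mu2, sigma2):
    """Bhattacharyya coefficient between two diagonal Gaussians.

    Returns a value in (0, 1]; it equals 1 only when the two densities coincide.
    This is an assumed stand-in for "density-wise similarity".
    """
    distance = 0.0
    for m1, s1, m2, s2 in zip(mu1, sigma1, mu2, sigma2):
        avg_var = 0.5 * (s1 * s1 + s2 * s2)
        distance += 0.125 * (m1 - m2) ** 2 / avg_var
        distance += 0.5 * math.log(avg_var / (s1 * s2))
    return math.exp(-distance)


# Hypothetical toy embeddings: a broad image distribution and a narrow caption one.
image_mu, image_sigma = [0.0, 0.0], [1.0, 1.0]
caption_mu, caption_sigma = [0.5, 0.0], [0.3, 0.3]

image_uncertainty = differential_entropy(image_sigma)
caption_uncertainty = differential_entropy(caption_sigma)
similarity = bhattacharyya_similarity(image_mu, image_sigma, caption_mu, caption_sigma)
```

Under these assumptions, the broader image embedding has the higher entropy, which is the sense in which uncertainty can reflect multiplicity: a sample that matches many counterparts needs a wider distribution to cover them. The similarity depends on how much the two densities overlap rather than only on the distance between their means.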
Advisors
Il-Chul Moon (문일철)
Description
Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI, 2023.8, [iii, 25 p.]

Keywords

Cross-modal retrieval; Multiplicity of matches; Partiality of matching pairs; Point-wise/distribution-wise embedding; Cross-modal uncertainty

URI
http://hdl.handle.net/10203/320557
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1045745&flag=dissertation
Appears in Collection
AI-Theses_Master (Master's theses)
