Unconditional image-text pair generation with multimodal cross quantizer

Though deep generative models have received a great deal of attention, most existing work targets unimodal generation tasks. In this paper, we explore a new method for unconditional image-text pair generation. We propose MXQ-VAE, a vector quantization method for multimodal image-text representation. MXQ-VAE accepts a paired image and text as input and learns a joint quantized representation space, so that the image-text pair can be converted to a sequence of unified indices. An autoregressive generative model can then be trained over this joint image-text representation and can even perform unconditional image-text pair generation. Extensive experimental results demonstrate that our approach generates semantically consistent image-text pairs and also strengthens the alignment between image and text.
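The abstract describes a two-stage pipeline: a quantizer maps an image-text pair to one unified sequence of codebook indices, and an autoregressive model is trained over that sequence. The sketch below illustrates only the first stage under loose assumptions; the module names, the patch-based image encoder, the concatenation fusion, and all sizes are hypothetical simplifications for illustration, not the thesis's actual MXQ-VAE architecture.

```python
# Illustrative sketch only: encode an image-text pair into one fused feature
# sequence, quantize it against a single shared codebook, and emit one flat
# sequence of discrete indices covering both modalities. All names and sizes
# here are assumptions, not the MXQ-VAE implementation from the thesis.
import torch
import torch.nn as nn


class SharedQuantizer(nn.Module):
    """Nearest-neighbour vector quantization against one shared codebook."""

    def __init__(self, num_codes: int = 512, dim: int = 64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)

    def forward(self, z: torch.Tensor):
        # z: (batch, seq_len, dim) fused image-text features.
        w = self.codebook.weight                       # (num_codes, dim)
        # Squared Euclidean distance from every feature to every code.
        dists = (z.pow(2).sum(-1, keepdim=True)
                 - 2 * z @ w.t()
                 + w.pow(2).sum(-1))                   # (B, L, num_codes)
        indices = dists.argmin(dim=-1)                 # unified index sequence
        z_q = self.codebook(indices)
        # Straight-through estimator: copy gradients past the argmin.
        z_q = z + (z_q - z).detach()
        return z_q, indices


class ToyJointEncoder(nn.Module):
    """Projects image patches and text tokens into one joint feature sequence."""

    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.patch_proj = nn.Linear(3 * 8 * 8, dim)  # flattened 8x8 RGB patches
        self.token_emb = nn.Embedding(vocab_size, dim)

    def forward(self, image: torch.Tensor, tokens: torch.Tensor):
        # image: (B, 3, 32, 32) -> 16 non-overlapping 8x8 patches; tokens: (B, T).
        b = image.size(0)
        patches = image.unfold(2, 8, 8).unfold(3, 8, 8)  # (B, 3, 4, 4, 8, 8)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, 16, -1)
        img_feats = self.patch_proj(patches)             # (B, 16, dim)
        txt_feats = self.token_emb(tokens)               # (B, T, dim)
        # Concatenate along the sequence axis to form the joint representation.
        return torch.cat([img_feats, txt_feats], dim=1)


if __name__ == "__main__":
    enc, vq = ToyJointEncoder(), SharedQuantizer()
    image = torch.randn(2, 3, 32, 32)
    tokens = torch.randint(0, 1000, (2, 12))
    z_q, indices = vq(enc(image, tokens))
    # One flat index sequence covers both modalities: 16 image + 12 text codes.
    print(indices.shape)  # torch.Size([2, 28])
```

Training such a quantizer would typically add a reconstruction loss plus the usual codebook and commitment losses (as in VQ-VAE); once it converges, the unified index sequences become the vocabulary for the second-stage autoregressive model, and sampling that model then yields a paired image and text at once.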
Advisors
Choi, Edward (최윤재)
Description
Korea Advanced Institute of Science and Technology (KAIST) : Kim Jaechul Graduate School of AI
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology : Kim Jaechul Graduate School of AI, 2022.8, [iii, 13 p.]

Keywords

Multimodal Representation Learning; Vector Quantization; Unconditional Multimodal Generation

URI
http://hdl.handle.net/10203/308221
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1008214&flag=dissertation
Appears in Collection
AI-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
