Exploration into translation-equivariant image quantization

Recently, vector-quantized image modeling has demonstrated impressive performance on generation tasks such as text-to-image generation. However, we discover that current image quantizers do not satisfy translation equivariance in the quantized space due to aliasing. Instead of focusing on anti-aliasing, we propose a simple but effective way to achieve translation-equivariant image quantization by enforcing orthogonality among the codebook embeddings. To explore the advantages of translation-equivariant image quantization, we conduct three experiments on a carefully controlled dataset: (1) text-to-image generation, where the quantized image indices are the target to predict; (2) image-to-text generation, where the quantized image indices are given as a condition; and (3) using a smaller training set to analyze sample efficiency. From these strictly controlled experiments, we empirically verify that a translation-equivariant image quantizer improves not only sample efficiency but also accuracy over VQGAN, by up to +11.9% in text-to-image generation and +3.9% in image-to-text generation.
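The abstract states the method's core idea, enforcing orthogonality among codebook embeddings, without giving implementation details. As an illustration only, a minimal sketch of a generic orthogonality penalty on a vector-quantization codebook is shown below; the function name, the NumPy formulation, and the Gram-matrix loss are assumptions for exposition, not the thesis's actual implementation.

```python
import numpy as np

def orthogonality_loss(codebook: np.ndarray) -> float:
    """Generic orthogonality penalty on a VQ codebook (illustrative sketch).

    codebook: (K, D) array of K embedding vectors of dimension D.
    Returns the squared Frobenius norm of (E E^T - I), where E is the
    row-normalized codebook, so the penalty is zero iff all pairs of
    embeddings are mutually orthogonal.
    """
    # Normalize each embedding so the penalty depends only on pairwise angles.
    norms = np.linalg.norm(codebook, axis=1, keepdims=True)
    normalized = codebook / np.clip(norms, 1e-12, None)
    gram = normalized @ normalized.T          # (K, K) cosine similarities
    identity = np.eye(codebook.shape[0])
    return float(np.sum((gram - identity) ** 2))

# An orthonormal codebook incurs zero penalty; correlated embeddings do not.
print(orthogonality_loss(np.eye(3)))                          # 0.0
print(orthogonality_loss(np.array([[1.0, 0.0], [1.0, 1.0]])))  # > 0
```

In practice such a penalty would be added to the quantizer's training objective with a weighting coefficient; the thesis itself should be consulted for the exact regularizer and how it is combined with the VQGAN losses.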
Advisors
Choi, Edward (최윤재)
Description
Korea Advanced Institute of Science and Technology : Kim Jaechul Graduate School of AI
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology : Kim Jaechul Graduate School of AI, 2022.8, [iv, 28 p.]

Keywords

Vector Quantization; Translation Equivariance; Text-Image Multimodal Learning

URI
http://hdl.handle.net/10203/308219
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1008205&flag=dissertation
Appears in Collection
AI-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
