Simple but effective attention calibration for CLIP-guided diffusion models

DC Field: Value (Language)

dc.contributor.advisor: 김창익
dc.contributor.author: Jeon, Woo-jin
dc.contributor.author: 전우진
dc.date.accessioned: 2024-07-30T19:31:30Z
dc.date.available: 2024-07-30T19:31:30Z
dc.date.issued: 2024
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1097179&flag=dissertation (en_US)
dc.identifier.uri: http://hdl.handle.net/10203/321607
dc.description: Thesis (Master's) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2024.2, [vi, 31 p.]
dc.description.abstract: While the Contrastive Language-Image Pre-training (CLIP) model has significantly advanced text-to-image generation, we uncover two notable issues in its application to diffusion models, particularly in the use of local embeddings. First, the model disproportionately focuses on word embeddings of the input prompt that carry less information. Second, local embeddings disrupt the image geometry established by global embeddings at the initial timesteps, risking misalignment with the original prompt. To mitigate these issues, we introduce two adjustments to cross-attention: sequence-dependent and time-dependent attention calibration. Our method employs simple numerical operations, for which we provide the values, ensuring easy implementation. In the sequence-dependent attention calibration, constants are added to the logits in the cross-attention layer to counterbalance the diminishing attention across the word sequence. The time-dependent attention adjustment enhances the attention towards global embeddings in the initial stages, facilitating better geometry formation. Our experiments on various datasets show that this simple method significantly improves the performance of Stable Diffusion, yielding images that more accurately depict the input prompts.
dc.language: eng
dc.publisher: 한국과학기술원 (Korea Advanced Institute of Science and Technology)
dc.subject: CLIP; 디퓨젼; 교차 어텐션
dc.subject: CLIP; Diffusion; Cross-attention
dc.title: Simple but effective attention calibration for CLIP-guided diffusion models
dc.title.alternative: CLIP 지도 디퓨젼 모델을 위한 간단하지만 효과적인 주의 집중 교정
dc.type: Thesis (Master)
dc.identifier.CNRN: 325007
dc.description.department: Korea Advanced Institute of Science and Technology: School of Electrical Engineering
dc.contributor.alternativeauthor: Kim, Changick
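The abstract describes two adjustments to the cross-attention logits: a sequence-dependent bias that counteracts attention diminishing along the word sequence, and a time-dependent boost to the global embedding at early denoising steps. The thesis provides the actual calibration values; the sketch below is only an illustrative reconstruction of the idea, with the bias shapes, scale parameters (`seq_bias_scale`, `time_bias_scale`), and the global-token index all assumed rather than taken from the work.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def calibrated_attention(logits, t, T, seq_bias_scale=0.5,
                         time_bias_scale=1.0, global_idx=0):
    """Illustrative sequence- and time-dependent attention calibration.

    logits: (num_queries, num_tokens) cross-attention scores before softmax.
    t: current diffusion timestep; t close to T means early denoising,
       t close to 0 means late denoising.
    """
    n_q, n_tok = logits.shape
    # Sequence-dependent calibration: add a constant that grows with
    # token position, counterbalancing attention that diminishes
    # across the word sequence.
    seq_bias = seq_bias_scale * np.arange(n_tok) / max(n_tok - 1, 1)
    # Time-dependent calibration: boost the global-embedding token at
    # early timesteps so the global embedding can establish the image
    # geometry before local embeddings take over.
    time_bias = np.zeros(n_tok)
    time_bias[global_idx] = time_bias_scale * (t / T)
    return softmax(logits + seq_bias + time_bias, axis=-1)
```

With uniform logits, the sequence bias shifts attention toward later tokens, and the time bias gives the global token more weight early (large `t`) than late (small `t`); both effects vanish as the scales go to zero, recovering standard softmax attention.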
Appears in Collection
EE-Theses_Master (석사논문, Master's theses)
Files in This Item
There are no files associated with this item.
