Simple but effective attention calibration for CLIP-guided diffusion models

DC Field: Value (Language)

dc.contributor.advisor: 김창익
dc.contributor.author: Jeon, Woo-jin
dc.contributor.author: 전우진
dc.date.accessioned: 2024-07-30T19:31:30Z
dc.date.available: 2024-07-30T19:31:30Z
dc.date.issued: 2024
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1097179&flag=dissertation (en_US)
dc.identifier.uri: http://hdl.handle.net/10203/321607
dc.description: Thesis (Master's) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2024.2, [vi, 31 p.]
dc.description.abstract: While the Contrastive Language-Image Pre-training (CLIP) model has significantly advanced text-to-image generation, we uncover two notable issues in its application to diffusion models, particularly in the use of local embeddings. First, the model disproportionately focuses on word embeddings of the input prompt that carry less information. Second, local embeddings disrupt the image geometry established by global embeddings at the initial timesteps, risking misalignment with the original prompt. To mitigate these issues, we introduce two adjustments to cross-attention: sequence-dependent and time-dependent attention calibration. Our method employs simple numerical operations, for which we provide the values, ensuring easy implementation. In the sequence-dependent attention calibration, constants are added to the logits in the cross-attention layer to counterbalance the diminishing attention across the word sequence. The time-dependent attention adjustment enhances the attention towards global embeddings in the initial stages, facilitating better geometry formation. Our experiments on various datasets show that this simple method significantly improves the performance of Stable Diffusion, yielding images that more accurately depict the input prompts.
dc.language: eng
dc.publisher: 한국과학기술원 (Korea Advanced Institute of Science and Technology)
dc.subject: CLIP; 디퓨젼; 교차 어텐션
dc.subject: CLIP; Diffusion; Cross-attention
dc.title: Simple but effective attention calibration for CLIP-guided diffusion models
dc.title.alternative: CLIP 지도 디퓨젼 모델을 위한 간단하지만 효과적인 주의 집중 교정
dc.type: Thesis (Master)
dc.identifier.CNRN: 325007
dc.description.department: Korea Advanced Institute of Science and Technology: School of Electrical Engineering
dc.contributor.alternativeauthor: Kim, Changick
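The abstract describes two adjustments to the cross-attention logits: a sequence-dependent bias that counteracts attention diminishing along the word sequence, and a time-dependent boost to the global embedding at early denoising steps. The thesis provides the actual calibration values; the sketch below is only an illustrative reconstruction of the idea, with the bias shapes, scale parameters (`seq_bias_scale`, `time_bias_scale`), and the global-token index all assumed rather than taken from the work.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def calibrated_attention(logits, t, T, seq_bias_scale=0.5,
                         time_bias_scale=1.0, global_idx=0):
    """Illustrative sequence- and time-dependent attention calibration.

    logits: (num_queries, num_tokens) cross-attention scores before softmax.
    t: current diffusion timestep; t close to T means early denoising,
       t close to 0 means late denoising.
    """
    n_q, n_tok = logits.shape
    # Sequence-dependent calibration: add a constant that grows with
    # token position, counterbalancing attention that diminishes
    # across the word sequence.
    seq_bias = seq_bias_scale * np.arange(n_tok) / max(n_tok - 1, 1)
    # Time-dependent calibration: boost the global-embedding token at
    # early timesteps so the global embedding can establish the image
    # geometry before local embeddings take over.
    time_bias = np.zeros(n_tok)
    time_bias[global_idx] = time_bias_scale * (t / T)
    return softmax(logits + seq_bias + time_bias, axis=-1)
```

With uniform logits, the sequence bias shifts attention toward later tokens, and the time bias gives the global token more weight early (large `t`) than late (small `t`); both effects vanish as the scales go to zero, recovering standard softmax attention.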
Appears in Collection
EE-Theses_Master (석사논문, Master's theses)
Files in This Item
There are no files associated with this item.
