DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Kim, Dae-Shik (김대식) | - |
dc.contributor.author | Kang, Minchan | - |
dc.contributor.author | 강민찬 | - |
dc.date.accessioned | 2024-08-08T19:30:14Z | - |
dc.date.available | 2024-08-08T19:30:14Z | - |
dc.date.issued | 2024 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1097294&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/321776 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2024.2, [iv, 19 p.] | - |
dc.description.abstract | Vision Transformers (ViTs) achieve higher performance than Convolutional Neural Networks (CNNs), but at a greater computational cost. Knowledge Distillation (KD) has demonstrated potential for compressing complex networks by transferring knowledge from a large pre-trained model to a smaller one. However, existing KD methods for ViT either employ CNNs as teachers or overlook the importance of class token ([CLS]) information, and thus fail to effectively distill ViT's distinct knowledge. In this paper, we propose Class token Knowledge Distillation ([CLS]-KD), which fully exploits information from both the class token and the patches in ViT. For class embedding (CLS) distillation, the intermediate CLS of the student model is aligned with the corresponding CLS of the teacher model through a projector. Furthermore, we introduce CLS-patch attention map distillation, where an attention map between the CLS and patch embeddings is generated and matched at each layer. This enables the student model to learn, under teacher guidance, how to adaptively extract patch embedding information into the CLS. Through these two strategies, [CLS]-KD consistently outperforms existing state-of-the-art methods on the ImageNet-1K dataset across various teacher-student settings. Moreover, the proposed method demonstrates its generalization ability through transfer learning experiments on the CIFAR-10 and CIFAR-100 datasets. | - |
dc.language | eng | - |
dc.publisher | 한국과학기술원 (Korea Advanced Institute of Science and Technology) | - |
dc.subject | 딥러닝; 컴퓨터 비젼; 지식 증류; 비젼 트랜스포머 | - |
dc.subject | Deep learning; Computer vision; Knowledge distillation; Vision transformer | - |
dc.title | Class token knowledge distillation for efficient vision transformer | - |
dc.title.alternative | 효율적인 비전 트랜스포머를 위한 클래스 토큰 지식 증류 | - |
dc.type | Thesis (Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology: School of Electrical Engineering | - |
dc.contributor.alternativeauthor | Kim, Dae-Shik | - |
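The two objectives described in the abstract can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the thesis's implementation: the function names, the single linear projector `W_proj`, mean-squared error for CLS alignment, and KL divergence for matching the CLS-patch attention maps are all illustrative choices not confirmed by the record.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cls_patch_attention(cls_emb, patch_emb):
    # Scaled dot-product attention of the CLS token over the N patch
    # embeddings: (N, D) @ (D,) -> (N,), then softmax over patches.
    scores = patch_emb @ cls_emb / np.sqrt(cls_emb.shape[-1])
    return softmax(scores)

def cls_kd_loss(student_cls, teacher_cls, student_patches, teacher_patches,
                W_proj, alpha=1.0, beta=1.0):
    # CLS embedding distillation: project the student's intermediate CLS
    # into the teacher's dimension and align it (MSE is an assumption).
    proj_cls = student_cls @ W_proj
    l_cls = np.mean((proj_cls - teacher_cls) ** 2)

    # CLS-patch attention map distillation: build each model's attention
    # map between CLS and patches, then match them (KL is an assumption;
    # projecting the student patches with the same W_proj is also assumed).
    a_s = cls_patch_attention(proj_cls, student_patches @ W_proj)
    a_t = cls_patch_attention(teacher_cls, teacher_patches)
    l_attn = np.sum(a_t * (np.log(a_t + 1e-9) - np.log(a_s + 1e-9)))

    return alpha * l_cls + beta * l_attn
```

In a full training loop this per-layer loss would be summed over the distilled layers and added to the usual task loss; here it only illustrates the two terms the abstract names.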