DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Kim, Dae-Shik (김대식) | - |
dc.contributor.author | Kang, Minchan | - |
dc.contributor.author | 강민찬 | - |
dc.date.accessioned | 2024-08-08T19:30:14Z | - |
dc.date.available | 2024-08-08T19:30:14Z | - |
dc.date.issued | 2024 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1097294&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/321776 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2024.2, [iv, 19 p.] | - |
dc.description.abstract | Vision Transformers (ViTs) achieve higher performance than Convolutional Neural Networks (CNNs), but at a greater computational cost. Knowledge Distillation (KD) has demonstrated potential for compressing complex networks by transferring knowledge from a large pre-trained model to a smaller one. However, existing KD methods for ViT either employ CNNs as teachers or overlook the importance of class token ([CLS]) information, and thus fail to effectively distill ViT's distinct knowledge. In this paper, we propose Class token Knowledge Distillation ([CLS]-KD), which fully exploits information from both the class token and the patches in ViT. For class embedding (CLS) distillation, the intermediate CLS of the student model is aligned with the corresponding CLS of the teacher model through a projector. Furthermore, we introduce CLS-patch attention map distillation, where an attention map between the CLS and patch embeddings is generated and matched at each layer. This enables the student model to learn, under teacher guidance, how to adaptively extract patch embedding information into the CLS. Through these two strategies, [CLS]-KD consistently outperforms existing state-of-the-art methods on the ImageNet-1K dataset across various teacher-student settings. Moreover, the proposed method demonstrates its generalization ability through transfer learning experiments on the CIFAR-10 and CIFAR-100 datasets. | - |
dc.language | eng | - |
dc.publisher | 한국과학기술원 (Korea Advanced Institute of Science and Technology) | - |
dc.subject | 딥러닝; 컴퓨터 비젼; 지식 증류; 비젼 트랜스포머 | - |
dc.subject | Deep learning; Computer vision; Knowledge distillation; Vision transformer | - |
dc.title | Class token knowledge distillation for efficient vision transformer | - |
dc.title.alternative | 효율적인 비전 트랜스포머를 위한 클래스 토큰 지식 증류 | - |
dc.type | Thesis (Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology: School of Electrical Engineering | - |
dc.contributor.alternativeauthor | Kim, Dae-Shik | - |
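The two objectives described in the abstract can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the thesis's implementation: the function names, the single linear projector `W_proj`, mean-squared error for CLS alignment, and KL divergence for matching the CLS-patch attention maps are all illustrative choices not confirmed by the record.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cls_patch_attention(cls_emb, patch_emb):
    # Scaled dot-product attention of the CLS token over the N patch
    # embeddings: (N, D) @ (D,) -> (N,), then softmax over patches.
    scores = patch_emb @ cls_emb / np.sqrt(cls_emb.shape[-1])
    return softmax(scores)

def cls_kd_loss(student_cls, teacher_cls, student_patches, teacher_patches,
                W_proj, alpha=1.0, beta=1.0):
    # CLS embedding distillation: project the student's intermediate CLS
    # into the teacher's dimension and align it (MSE is an assumption).
    proj_cls = student_cls @ W_proj
    l_cls = np.mean((proj_cls - teacher_cls) ** 2)

    # CLS-patch attention map distillation: build each model's attention
    # map between CLS and patches, then match them (KL is an assumption;
    # projecting the student patches with the same W_proj is also assumed).
    a_s = cls_patch_attention(proj_cls, student_patches @ W_proj)
    a_t = cls_patch_attention(teacher_cls, teacher_patches)
    l_attn = np.sum(a_t * (np.log(a_t + 1e-9) - np.log(a_s + 1e-9)))

    return alpha * l_cls + beta * l_attn
```

In a full training loop this per-layer loss would be summed over the distilled layers and added to the usual task loss; here it only illustrates the two terms the abstract names.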