ReMixer: object-aware mixing layer for vision transformers

Abstract

Vision Transformers (ViTs) have shown impressive results on various visual recognition tasks, displacing classic convolutional networks. While the initial ViTs treated all patches equally, recent studies reveal that incorporating inductive biases such as spatiality benefits the learned representations. However, most prior works focused solely on the location of patches, overlooking the scene structure of images. This paper aims to further guide the interaction of patches using object information. Specifically, we propose ReMixer, which reweights the patch mixing layers of ViT based on patch-wise object labels obtained in an unsupervised or weakly-supervised manner, i.e., no additional human annotation cost is necessary. Using the object labels, we compute a reweighting mask with a learnable scale parameter that calibrates the patch interactions, e.g., the attention map of self-attention. We demonstrate that ReMixer improves ViTs on various downstream tasks, including classification, multi-object recognition, and background robustness. Finally, we show that our idea also works for MLP-Mixer and ConvMixer, implying its generic applicability to patch-based models.
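The reweighting idea in the abstract can be sketched in a few lines. This is a minimal single-head sketch, not the thesis code: the function name `remixer_attention`, the additive form of the mask, and the fixed `scale` argument are illustrative assumptions — in the paper the scale is a learnable parameter and the patch-wise object labels come from an unsupervised or weakly-supervised segmenter.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def remixer_attention(q, k, v, object_labels, scale=1.0):
    """Calibrate a self-attention map with patch-wise object labels.

    q, k, v: (N, d) per-patch query/key/value matrices.
    object_labels: (N,) object id per patch (assumed given here; in the
    paper they are obtained without extra human annotation).
    scale: stand-in for the learnable scale parameter.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                        # (N, N) patch-interaction logits
    same_object = object_labels[:, None] == object_labels[None, :]
    mask = np.where(same_object, scale, 0.0)             # reweighting mask from object labels
    attn = softmax(logits + mask, axis=-1)               # calibrated attention map
    return attn @ v, attn
```

With `scale > 0`, attention mass shifts toward patches that belong to the same object; with `scale = 0` the layer reduces to plain self-attention, so the mask only biases, never hard-restricts, the patch interactions.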
Advisors
Shin, Jinwoo (신진우)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2022.8, [iv, 23 p.]

Keywords

Object-centric; Inductive bias; Vision transformers; Patch-based models

URI
http://hdl.handle.net/10203/309947
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1008387&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
