Let It Reuse: a multi-mode sparse attention inference accelerator with a unified multi-precision datapath

Transformer-based models are rapidly emerging across many fields of deep learning. Accordingly, accelerators for the self-attention mechanism, the bottleneck of the transformer, are actively studied today. However, real-world accelerators require not only high performance but also generality and flexibility. First, because the required precision and datatype differ from task to task, accelerators should support multiple precisions. Second, because the required accuracy, energy, and latency change depending on the scenario, accelerators should flexibly support multiple modes without severe hardware underutilization. Real-world accelerators must deliver high performance even while providing these functionalities. This thesis shows that the prior design framework has reached its limit in terms of computational savings, and presents an interpretable design framework called "Let It Reuse." To effectively utilize this framework and satisfy real-world constraints, it takes a co-optimization approach spanning the algorithm, architecture, and microarchitecture. In detail, this thesis proposes a multi-mode-aware pipeline with a unified multi-precision datapath and explores reusability according to the datatype. In experiments on a Question Answering task, the Let It Reuse accelerator achieves geomean speedups of 24x over a GPU (an up-to-date NVIDIA Ampere architecture) and 4x over Sanger, a state-of-the-art attention accelerator.
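The thesis itself is not attached to this record, so as a minimal sketch of the computation the abstract refers to, the following NumPy example contrasts dense scaled dot-product attention with a sparse variant in which low-importance score entries are masked out; the masked entries correspond to the work a sparse attention accelerator can skip. The top-k selection rule and all names here are illustrative assumptions, not the thesis's actual sparsity scheme.

```python
import numpy as np

def attention(Q, K, V, mask=None):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    if mask is not None:
        # Sparse attention: zero out (mask) unimportant score entries.
        # A hardware accelerator would simply skip these computations.
        scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over each query row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))

dense = attention(Q, K, V)

# Illustrative sparsity pattern: keep only the top-2 scores per query row.
scores = Q @ K.T / np.sqrt(8)
mask = scores >= np.sort(scores, axis=-1)[:, -2:][:, :1]
sparse = attention(Q, K, V, mask)
```

With half of each 4-entry score row masked, half of the softmax and weighted-sum work per query is elided, which is the kind of computational saving a sparse attention accelerator exploits.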
Advisors
Kim, Lee-Sup (김이섭)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2022.8, [iv, 34 p.]

Keywords

Transformer; Self-attention Mechanism; Sparse; Multi-mode; Multi-precision; Co-optimization

URI
http://hdl.handle.net/10203/309836
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1008352&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
