DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Kim, Lee-Sup | - |
dc.contributor.advisor | 김이섭 | - |
dc.contributor.author | Yeo, Unhak | - |
dc.date.accessioned | 2023-06-26T19:33:39Z | - |
dc.date.available | 2023-06-26T19:33:39Z | - |
dc.date.issued | 2022 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1008352&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/309836 | - |
dc.description | Thesis (Master's) - KAIST : School of Electrical Engineering, 2022.8, [iv, 34 p.] | - |
dc.description.abstract | Transformer-based models are rapidly emerging across many fields of DNNs. Consequently, accelerators for the self-attention mechanism, the bottleneck of the transformer, are actively studied today. However, real-world accelerators require not only high performance but also generality and flexibility. First, because different tasks require different precisions and datatypes, an accelerator should support multi-precision computation. Second, because the accuracy, energy, and latency requirements change depending on the scenario, an accelerator should flexibly support multiple modes without severe HW underutilization. Real-world accelerators must deliver high performance even while providing these functionalities. This thesis shows that the prior design framework has reached its limit in terms of computational savings, and presents an interpretable design framework called "Let It Reuse." To utilize this framework effectively and satisfy real-world constraints, it takes a co-optimization approach spanning the algorithm, architecture, and microarchitecture. In detail, this thesis proposes a multi-mode-aware pipeline with a unified multi-precision datapath and explores reusability according to datatype. In experiments on a question-answering task, the Let It Reuse accelerator achieves geomean speedups of 24 times over a GPU (an up-to-date NVIDIA Ampere architecture) and 4 times over Sanger, a state-of-the-art attention accelerator, respectively. | - |
dc.language | eng | - |
dc.publisher | 한국과학기술원 | - |
dc.subject | Transformer; Self-attention Mechanism; Sparse; Multi-mode; Multi-precision; Co-optimization | - |
dc.subject | 트랜스포머; 셀프 어텐션 메커니즘; 희소성; 멀티 모드; 멀티 정밀도; 통합 최적화 | - |
dc.title | Let it reuse | - |
dc.title.alternative | 통합된 다중 정밀도 데이터 연산을 통한 다중 모드 희소 어텐션 추론 가속기 | - |
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | KAIST : School of Electrical Engineering | - |
dc.contributor.alternativeauthor | 여운학 | - |
dc.title.subtitle | a multi-mode sparse attention inference accelerator with a unified multi-precision datapath | - |