DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 김이섭 | - |
dc.contributor.author | Park, Junyoung | - |
dc.contributor.author | 박준영 | - |
dc.date.accessioned | 2024-07-30T19:31:37Z | - |
dc.date.available | 2024-07-30T19:31:37Z | - |
dc.date.issued | 2024 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1097215&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/321643 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2024.2, [iv, 38 p.] | - |
dc.description.abstract | Text generation models based on autoregressive transformers have been instrumental in advancing applications such as chatbot systems and virtual assistants. When the model generates text for multiple batched requests, the key/value pairs used in the attention mechanism cannot be shared across the batch, leading to prolonged execution time. Because the attention mechanism is memory-bound, off-chip memory accesses must be minimized for faster execution. Although previous methods reduced off-chip memory accesses for unimportant tokens, they fall short of selectively removing negligible tokens in each instance. Instead, this dissertation estimates attention weights using bit chunks of the key (K) vectors, effectively removing memory accesses for low-weight tokens and achieving a 12.1× pruning ratio without fine-tuning. Additionally, this dissertation presents a consecutive bit chunk request scheme that prevents the underutilization of Processing Elements (PEs) induced by on-demand DRAM accesses. Finally, dedicated hardware equipped with PEs and auxiliary modules is designed to support the proposed methods. As a result, it reduces memory accesses by 2.6×, leading to an average 2.3× speedup and 2.4× higher energy efficiency. | - |
dc.language | eng | - |
dc.publisher | 한국과학기술원 | - |
dc.subject | 트랜스포머 구조; 텍스트 생성; 어텐션 메커니즘; 인공지능 가속기 디자인; 비순차적 실행 | - |
dc.subject | Transformer architecture; text generation; attention mechanism; AI accelerator design; out-of-order processing | - |
dc.title | Accelerating text generation by minimizing memory transfer in attention mechanism | - |
dc.title.alternative | 어텐션 메커니즘의 메모리 전송 최소화를 통한 텍스트 생성 가속 | - |
dc.type | Thesis (Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology: School of Electrical Engineering | - |
dc.contributor.alternativeauthor | Kim, Lee-Sup | - |
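
For illustration, below is a minimal NumPy sketch of the bit-chunk idea the abstract describes: keys are (by assumption) quantized to 8 bits and split into a high and a low 4-bit chunk, an approximate attention score is computed from the high chunks alone, and only the tokens estimated to be important have their full keys fetched and scored exactly. The chunk widths, the `keep_ratio` selection rule, and all function names are illustrative assumptions, not the thesis's actual design.

```python
import numpy as np

# Hedged sketch only: the 8-bit quantization, 4-bit chunking, and top-k
# selection rule below are assumptions for illustration, not the thesis design.
CHUNK_BITS = 4   # width of one bit chunk (assumed)

def quantize_keys(K):
    """Uniformly quantize key vectors to signed 8-bit integers (illustrative)."""
    scale = max(np.abs(K).max(), 1e-8) / 127.0
    Kq = np.clip(np.round(K / scale), -128, 127).astype(np.int32)
    return Kq, scale

def split_chunks(Kq):
    """Split each signed 8-bit key element into a high and a low 4-bit chunk."""
    high = Kq >> CHUNK_BITS               # arithmetic shift keeps the sign
    low = Kq & ((1 << CHUNK_BITS) - 1)    # unsigned low 4 bits
    return high, low

def prune_with_bit_chunks(q, K, keep_ratio=0.1):
    """Estimate q.k from high bit chunks only, then score survivors exactly.

    Only the kept tokens would need their remaining bit chunks (and V rows)
    fetched from off-chip memory, which is where the traffic saving comes from.
    """
    Kq, scale = quantize_keys(K)
    high, low = split_chunks(Kq)
    approx = (high << CHUNK_BITS) @ q               # cheap estimate of q.k
    n_keep = max(1, int(round(len(K) * keep_ratio)))
    keep = np.sort(np.argsort(approx)[-n_keep:])    # tokens estimated important
    full = (high[keep] << CHUNK_BITS) | low[keep]   # reconstruct survivors' keys
    return keep, (full @ q) * scale

# Example: 64 cached tokens, head dimension 8, keep roughly 1 token in 8.
rng = np.random.default_rng(0)
K = rng.standard_normal((64, 8)).astype(np.float32)
q = rng.standard_normal(8).astype(np.float32)
kept, scores = prune_with_bit_chunks(q, K, keep_ratio=0.125)
```

The design point this sketch illustrates is that the high chunks carry most of each element's magnitude, so a shift-only partial dot product is often enough to rank tokens; exact scores are then recomputed only for the survivors.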