Accelerating text generation by minimizing memory transfer in attention mechanism
(어텐션 메커니즘의 메모리 전송 최소화를 통한 텍스트 생성 가속)

DC Field: Value
dc.contributor.advisor: 김이섭
dc.contributor.author: Park, Junyoung
dc.contributor.author: 박준영
dc.date.accessioned: 2024-07-30T19:31:37Z
dc.date.available: 2024-07-30T19:31:37Z
dc.date.issued: 2024
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1097215&flag=dissertation (en_US)
dc.identifier.uri: http://hdl.handle.net/10203/321643
dc.description: Master's thesis - KAIST (한국과학기술원), School of Electrical Engineering (전기및전자공학부), 2024.2, [iv, 38 p.]
dc.description.abstract: Text generation models based on autoregressive transformers have been instrumental in advancing applications such as chatbot systems and virtual assistants. When the model generates text for multiple batched requests, the key/value pairs used in the attention mechanism cannot be shared across requests, which prolongs execution time. Because the attention mechanism is memory-bound, off-chip memory accesses should be minimized for faster execution. Although previous methods reduced off-chip memory accesses for unimportant tokens, they fall short of selectively removing the negligible tokens in each instance. Instead, this dissertation estimates attention weights using bit chunks of the key (K) vectors, effectively removing the memory accesses for low-weight tokens and achieving a 12.1x pruning ratio without fine-tuning. Additionally, this dissertation presents a consecutive bit chunk request scheme that prevents the underutilization of Processing Elements (PEs) induced by on-demand DRAM accesses. Finally, dedicated hardware equipped with PEs and auxiliary modules is designed to support the proposed methods. As a result, it achieves 2.6x fewer memory accesses, leading to an average 2.3x speedup and 2.4x higher energy efficiency. (An illustrative sketch of the bit-chunk estimation idea appears after the record fields below.)
dc.language: eng
dc.publisher: 한국과학기술원 (KAIST)
dc.subject: 트랜스포머 구조; 텍스트 생성; 어텐션 메커니즘; 인공지능 가속기 디자인; 비순차적 실행
dc.subject: Transformer architecture; text generation; attention mechanism; AI accelerator design; out-of-order processing
dc.title: Accelerating text generation by minimizing memory transfer in attention mechanism
dc.title.alternative: 어텐션 메커니즘의 메모리 전송 최소화를 통한 텍스트 생성 가속
dc.type: Thesis (Master)
dc.identifier.CNRN: 325007
dc.description.department: 한국과학기술원 (KAIST), 전기및전자공학부 (School of Electrical Engineering)
dc.contributor.alternativeauthor: Kim, Lee-Sup
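
The following is a minimal, illustrative sketch of the bit-chunk idea described in the abstract, not the dissertation's actual design: cached keys are quantized, only the most significant bit chunk of each element is used to estimate attention scores, and full key/value rows are fetched only for the highest-scoring tokens. The chunk width (CHUNK_BITS), key precision (KEY_BITS), the keep_ratio parameter, and all function names are assumptions made purely for illustration.

import numpy as np

CHUNK_BITS = 4   # assumed chunk width: 8-bit keys split into two 4-bit chunks
KEY_BITS = 8     # assumed key quantization width

def quantize(x, bits=KEY_BITS):
    # Symmetric uniform quantization of a float array to signed integers.
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
    return np.round(x / scale).astype(np.int32), scale

def msb_chunk(k_int, chunk_bits=CHUNK_BITS, total_bits=KEY_BITS):
    # Keep only the most significant bit chunk of each quantized element.
    shift = total_bits - chunk_bits
    return (k_int >> shift) << shift   # zero out the low-order chunk

def estimate_and_prune(q, K, keep_ratio=0.1):
    # q: (d,) query vector; K: (n, d) cached key matrix.
    # Only the MSB chunk of K is used for the score estimate, standing in for a
    # cheap first pass that avoids transferring the full key cache from DRAM.
    K_int, scale = quantize(K)
    K_coarse = msb_chunk(K_int).astype(np.float32) * scale   # coarse keys
    est_scores = K_coarse @ q                                 # approximate q.k
    n_keep = max(1, int(round(len(est_scores) * keep_ratio)))
    return np.argsort(est_scores)[-n_keep:]   # tokens whose full K/V get fetched

# Example: 512 cached tokens, 64-dim head; roughly 10% of K/V rows survive.
rng = np.random.default_rng(0)
q = rng.standard_normal(64).astype(np.float32)
K = rng.standard_normal((512, 64)).astype(np.float32)
kept = estimate_and_prune(q, K)
print(f"keeping {kept.size} of {K.shape[0]} tokens")

In hardware terms the payoff is that the low-order chunks and the K/V rows of pruned tokens never need to leave DRAM; the 12.1x pruning ratio quoted in the abstract refers to the dissertation's method, not to this sketch.
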
Appears in Collection
EE-Theses_Master(석사논문)