Accelerating text generation by minimizing memory transfer in attention mechanism

Text generation models based on autoregressive transformers have been instrumental in advancing applications such as chatbot systems and virtual assistants. When the model generates text for multiple batched requests, the key/value pairs used in the attention mechanism cannot be shared across them, which prolongs execution time. Because the attention mechanism is memory-bound, off-chip memory accesses must be minimized for faster execution. Although previous methods reduced off-chip memory accesses for unimportant tokens, they fall short of selectively removing the negligible tokens in each instance. Instead, this dissertation estimates the attention weights using bit chunks of the key (K) vectors, effectively removing the memory accesses for low-weight tokens and achieving a 12.1x pruning ratio without fine-tuning. Additionally, this dissertation presents a consecutive bit-chunk request scheme that prevents the underutilization of Processing Elements (PEs) induced by on-demand DRAM access. Finally, dedicated hardware equipped with PEs and auxiliary modules is designed to support the proposed methods. As a result, the design reduces memory accesses by 2.6x, leading to an average 2.3x speedup and 2.4x higher energy efficiency.
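To illustrate the general idea described in the abstract, the following is a minimal sketch (not the thesis's actual design) of bit-chunk-based attention pruning: approximate scores are computed from only the high-order bit chunk of quantized K vectors, and full K/V rows are fetched only for tokens whose estimated weight survives. The function names, the 8-bit/4-bit split, and the keep ratio are illustrative assumptions.

```python
# Minimal sketch of bit-chunk-based attention pruning (illustrative only).
import numpy as np

def quantize(K, bits=8):
    """Symmetric quantization of K to signed integers (assumed scheme)."""
    scale = np.abs(K).max() / (2 ** (bits - 1) - 1)
    return np.round(K / scale).astype(np.int32), scale

def attention_with_bit_chunk_pruning(q, K, V, keep_ratio=0.1, bits=8, msb_bits=4):
    # Estimate q.k scores from the most-significant bit chunk of K only.
    K_q, scale = quantize(K, bits)
    K_msb = (K_q >> (bits - msb_bits)) << (bits - msb_bits)  # keep MSB chunk
    approx_scores = (K_msb * scale) @ q                      # cheap estimate

    # Keep only the tokens with the largest estimated weights.
    n_keep = max(1, int(len(K) * keep_ratio))
    kept = np.argsort(approx_scores)[-n_keep:]

    # Exact attention restricted to the surviving tokens
    # (only these K/V rows would be read from off-chip memory).
    scores = K[kept] @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V[kept]

# Toy usage: 1024 cached tokens, head dimension 64.
rng = np.random.default_rng(0)
q = rng.standard_normal(64).astype(np.float32)
K = rng.standard_normal((1024, 64)).astype(np.float32)
V = rng.standard_normal((1024, 64)).astype(np.float32)
out = attention_with_bit_chunk_pruning(q, K, V)
print(out.shape)  # (64,)
```

In this sketch the pruning decision is made per query instance, which mirrors the abstract's point that tokens negligible for one instance need not be fetched at all; how weights are actually estimated and scheduled in hardware is specified only in the dissertation itself.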
Advisors
김이섭
Description
Korea Advanced Institute of Science and Technology: School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2024.2, [iv, 38 p.]

Keywords

Transformer architecture; text generation; attention mechanism; AI accelerator design; out-of-order processing

URI
http://hdl.handle.net/10203/321643
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1097215&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
