SAL-PIM: a subarray-level processing-in-memory architecture for accelerating end-to-end generative transformer with LUT-based linear interpolation

Text generation is one of the representative applications of machine learning. Various deep-learning models have been presented and studied for text generation, but transformer-based models currently show state-of-the-art accuracy. Among them, transformer-decoder-based generative models, such as the generative pre-trained transformer (GPT), perform text generation in two stages: summarization and generation. Unlike the summarization stage, the generation stage is memory-bound because it operates sequentially. Therefore, processing-in-memory (PIM)-based accelerators have been proposed many times to address the von Neumann bottleneck. However, existing PIM accelerators either utilize only limited memory bandwidth or cannot accelerate the entire model. SAL-PIM is the first PIM architecture to accelerate the end-to-end transformer-decoder-based generative model. With an optimized mapping scheme, SAL-PIM exploits higher internal bandwidth through the subarray-level arithmetic logic unit (S-ALU). To minimize the area overhead of the S-ALU, it shares MAC units, exploiting the slow clock frequency of commands issued to the same bank. In addition, to support vector functions in PIM, DRAM cells are used as a look-up table (LUT), and the vector functions are computed by linear interpolation; an LUT-embedded subarray is proposed to optimize LUT operation in DRAM. Lastly, the channel-level arithmetic logic unit (C-ALU) performs accumulation and reduce-sum operations on data, enabling end-to-end inference on PIM. We implemented SAL-PIM in TSMC 28-nm CMOS technology and scaled it to DRAM technology to verify its feasibility. SAL-PIM incurs 23.43% additional area overhead compared to the original DRAM, which is below the threshold reported in previous work.
As a result, the SAL-PIM architecture achieves a maximum speedup of 73.17x and an average speedup of 27.74x over a GPU for text generation on the GPT-2 medium model, as measured with the SAL-PIM simulator.
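The LUT-based vector-function approach described above can be sketched in software. The following is a minimal illustration, not the thesis's actual implementation: table values for a nonlinear function (tanh is used here as a stand-in for the vector functions the abstract mentions) are precomputed into a table, standing in for the LUT-embedded subarray, and each input is evaluated by linear interpolation between the two nearest table entries. The table size and input range are assumptions chosen for the example.

```python
import numpy as np

# Assumed parameters for this sketch: a 256-entry table covering [-4, 4].
LUT_MIN, LUT_MAX, LUT_ENTRIES = -4.0, 4.0, 256
step = (LUT_MAX - LUT_MIN) / (LUT_ENTRIES - 1)

# Precomputed function samples; in SAL-PIM these would reside in DRAM cells
# of the LUT-embedded subarray.
xs = np.linspace(LUT_MIN, LUT_MAX, LUT_ENTRIES)
lut = np.tanh(xs)

def lut_interp(x):
    """Approximate tanh(x) by linear interpolation between LUT entries."""
    x = np.clip(x, LUT_MIN, LUT_MAX)
    # Index of the lower of the two bracketing table entries.
    idx = np.minimum(((x - LUT_MIN) / step).astype(int), LUT_ENTRIES - 2)
    # Fractional position of x between entry idx and entry idx+1.
    frac = (x - (LUT_MIN + idx * step)) / step
    return lut[idx] * (1.0 - frac) + lut[idx + 1] * frac

# Compare against the exact function on random inputs.
x = np.random.uniform(-3.0, 3.0, 1000)
max_err = np.max(np.abs(lut_interp(x) - np.tanh(x)))
```

With 256 entries over this range, the interpolation error stays well below 10^-3, which illustrates why a modest LUT plus linear interpolation can substitute for direct evaluation of smooth vector functions inside DRAM.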
Advisors
Kim, Joo-Young
Description
Korea Advanced Institute of Science and Technology: School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2023.2, [iv, 26 p.]

Keywords

Processing-in-memory; DRAM; Transformer; Text generation

URI
http://hdl.handle.net/10203/309864
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1032949&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
