SAL-PIM: a subarray-level processing-in-memory architecture for accelerating end-to-end generative transformer with LUT-based linear interpolation

DC Field: Value
dc.contributor.advisor: Kim, Joo-Young (김주영)
dc.contributor.author: Han, Wontak
dc.date.accessioned: 2023-06-26T19:33:48Z
dc.date.available: 2023-06-26T19:33:48Z
dc.date.issued: 2023
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1032949&flag=dissertation
dc.identifier.uri: http://hdl.handle.net/10203/309864
dc.description: Thesis (Master's) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2023.2, [iv, 26 p.]
dc.description.abstract: Text generation is a representative application of machine learning. Various deep-learning models have been studied for text generation, and transformer-based models currently achieve state-of-the-art accuracy. Among them, transformer-decoder-based generative models, such as the generative pre-trained transformer (GPT), perform text generation in two stages: summarization and generation. Unlike the summarization stage, the generation stage is memory-bound because it operates sequentially, one token at a time. Processing-in-memory (PIM) accelerators have therefore been proposed repeatedly to address the von Neumann bottleneck. However, existing PIM accelerators either exploit only limited memory bandwidth or cannot accelerate the entire model. SAL-PIM is the first PIM architecture to accelerate the end-to-end transformer-decoder-based generative model. With an optimized mapping scheme, SAL-PIM exploits higher internal bandwidth through subarray-level arithmetic logic units (S-ALUs). To minimize area overhead, the S-ALU shares MAC units, exploiting the slow clock frequency of commands issued to the same bank. In addition, to support vector functions in PIM, DRAM cells are used as a look-up table (LUT), and the vector functions are computed by linear interpolation; a LUT-embedded subarray is proposed to optimize LUT operation in DRAM. Lastly, the channel-level arithmetic logic unit (C-ALU) performs accumulation and reduce-sum operations, enabling end-to-end inference on PIM. We implemented SAL-PIM in TSMC 28-nm CMOS technology and scaled it to DRAM technology to verify its feasibility. SAL-PIM incurs 23.43% additional area over the original DRAM, which is below the threshold reported in previous work. As a result, on the SAL-PIM simulator, the SAL-PIM architecture achieves up to 73.17x speedup, and 27.74x on average, for text generation on the GPT-2 medium model compared to a GPU.
dc.language: eng
dc.publisher: Korea Advanced Institute of Science and Technology (KAIST)
dc.subject: Processing-in-memory; DRAM; Transformer; Text generation
dc.title: SAL-PIM: a subarray-level processing-in-memory architecture for accelerating end-to-end generative transformer with LUT-based linear interpolation
dc.title.alternative: 생성 트랜스포머의 종단간 가속을 위한 룩-업 테이블 기반 선형 보간을 이용하는 서브어레이-레벨 프로세싱-인-메모리 구조
dc.type: Thesis (Master)
dc.identifier.CNRN: 325007
dc.description.department: Korea Advanced Institute of Science and Technology: School of Electrical Engineering
dc.contributor.alternativeauthor: 한원탁
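
The abstract above computes nonlinear vector functions by treating DRAM cells as a look-up table and linearly interpolating between adjacent entries. Below is a minimal Python sketch of that LUT-plus-interpolation idea; the 256-interval table, the [-8, 8) input range, and the choice of GELU as the sampled function are illustrative assumptions, not details taken from the thesis.

import numpy as np

# Illustrative parameters (not from the thesis): a 256-interval LUT
# covering [-8, 8). SAL-PIM stores the table in DRAM cells inside a
# LUT-embedded subarray; here a NumPy array stands in for those cells.
LO, HI, INTERVALS = -8.0, 8.0, 256
STEP = (HI - LO) / INTERVALS

def gelu(x):
    # Reference nonlinearity to sample (tanh approximation of GELU).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Sample INTERVALS + 1 points so every interval has both endpoints stored.
xs = LO + STEP * np.arange(INTERVALS + 1)
lut = gelu(xs)

def lut_interp(x):
    # Approximate gelu(x): pick the enclosing interval, then blend the
    # two stored endpoints by the fractional position of x within it.
    x = np.clip(x, LO, HI - 1e-9)          # keep the table index in range
    idx = ((x - LO) / STEP).astype(int)    # which interval x falls in
    frac = (x - LO) / STEP - idx           # position within the interval
    return lut[idx] + frac * (lut[idx + 1] - lut[idx])

x = np.linspace(-6.0, 6.0, 5)
print(lut_interp(x))  # LUT approximation
print(gelu(x))        # exact values for comparison

The point of the sketch: once the samples are stored, each function evaluation reduces to two table reads plus one multiply-add, which is why the operation maps naturally onto DRAM reads combined with the in-memory ALUs the thesis describes.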
Appears in Collection
EE-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
