DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Kim, Joo-Young | - |
dc.contributor.advisor | 김주영 | - |
dc.contributor.author | Han, Wontak | - |
dc.date.accessioned | 2023-06-26T19:33:48Z | - |
dc.date.available | 2023-06-26T19:33:48Z | - |
dc.date.issued | 2023 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1032949&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/309864 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2023.2, [iv, 26 p.] | - |
dc.description.abstract | Text generation is one of the representative applications of machine learning. Various deep-learning models have been studied for text generation, but transformer-based models currently achieve state-of-the-art accuracy. Among them, transformer-decoder-based generative models such as the generative pre-trained transformer (GPT) perform text generation in two stages: summarization and generation. Unlike the summarization stage, the generation stage is memory-bound because tokens are produced sequentially. Processing-in-memory (PIM)-based accelerators have therefore been proposed repeatedly to address the von Neumann bottleneck, but existing PIM accelerators either exploit only limited memory bandwidth or cannot accelerate the entire model. SAL-PIM is the first PIM architecture to accelerate the end-to-end transformer-decoder-based generative model. With an optimized mapping scheme, SAL-PIM exploits higher internal bandwidth through the subarray-level arithmetic logic unit (S-ALU). To minimize the S-ALU's area overhead, its MAC units are shared, exploiting the slow clock frequency of commands issued to the same bank. In addition, to support vector functions in PIM, DRAM cells are used as a look-up table (LUT), and the vector functions are computed by linear interpolation; an LUT-embedded subarray is proposed to optimize LUT operation inside DRAM. Finally, the channel-level arithmetic logic unit (C-ALU) performs accumulation and reduce-sum operations, enabling end-to-end inference on PIM. We implemented SAL-PIM in TSMC 28-nm CMOS technology and scaled it to DRAM technology to verify its feasibility. SAL-PIM incurs 23.43% additional area overhead compared to the original DRAM, which is below the threshold reported in prior work. As a result, measured with the SAL-PIM simulator, the SAL-PIM architecture achieves up to 73.17x speedup and an average of 27.74x speedup over a GPU for text generation on the GPT-2 medium model. | - |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | Processing-in-memory; DRAM; Transformer; Text generation | - |
dc.subject | 프로세싱-인-메모리 | - |
dc.subject | 디램; 트랜스포머 모델; 텍스트 생성 | - |
dc.title | SAL-PIM: a subarray-level processing-in-memory architecture for accelerating end-to-end generative transformer with LUT-based linear interpolation | - |
dc.title.alternative | 생성 트랜스포머의 종단간 가속을 위한 룩-업 테이블 기반 선형 보간을 이용하는 서브어레이-레벨 프로세싱-인-메모리 구조 | - |
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology: School of Electrical Engineering | - |
dc.contributor.alternativeauthor | 한원탁 | - |
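The abstract above describes computing vector functions in PIM by storing sampled function values in DRAM cells as a look-up table and evaluating queries by linear interpolation between adjacent entries. A minimal software sketch of that interpolation scheme (the sampled function, input range, and entry count are illustrative assumptions, not parameters from the thesis):

```python
import numpy as np

def build_lut(fn, lo, hi, entries):
    """Sample fn at evenly spaced breakpoints over [lo, hi].

    In SAL-PIM these samples would reside in DRAM cells of an
    LUT-embedded subarray; here they are plain arrays.
    """
    xs = np.linspace(lo, hi, entries)
    return xs, fn(xs)

def lut_interp(q, xs, ys):
    """Approximate fn(q) by piecewise-linear interpolation on the LUT."""
    step = xs[1] - xs[0]
    # Index of the LUT entry at or just below each query point,
    # clamped so q values at the range edges stay in bounds.
    i = np.clip(((q - xs[0]) / step).astype(int), 0, len(xs) - 2)
    # Fractional position of q between entry i and entry i + 1.
    frac = (q - xs[i]) / step
    return ys[i] + frac * (ys[i + 1] - ys[i])
```

For example, a 1024-entry table of `exp` over [-4, 4] approximates the function to within roughly 1e-3 of its true value, which illustrates why a modest number of DRAM rows can stand in for a hardware transcendental unit.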
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.