(A) PIM-based SparsePU for transformer model acceleration with bit-slice level sparsity exploitation

This paper presents SparsePU, a processing unit that leverages bit-slice level sparsity to accelerate transformer models within a processing-in-memory (PIM) architecture. SparsePU improves performance by exploiting unstructured bit-slice level sparsity in both activations and weights, which has been difficult to support in conventional PIM structures. The accelerator performs row-wise matrix multiplication to exploit activation sparsity, and supports varying weight-sparsity ratios simultaneously through a row-wise compressed weight data format. An internal network accumulates the compressed weight data effectively and efficiently, and a multi-row skipping scheme maximizes speedup when activation sparsity is high. On actual transformer model layers, the accelerator achieves up to 857.27x faster computation and reduces the sparse weight data to be stored by up to 93.68%.
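The idea behind bit-slice level sparsity can be illustrated with a small sketch: each multi-bit weight is split into narrow bit-slices, and only the nonzero slices are kept in a row-wise compressed list. The 4-bit slice width, the `(column, slice index, value)` tuple format, and the weight distribution below are illustrative assumptions, not the thesis's exact hardware format.

```python
# Illustrative sketch of bit-slice decomposition and row-wise compression.
# Assumptions (not from the thesis): 8-bit weights, 4-bit slices, and a
# (column, slice_index, slice_value) tuple per stored nonzero slice.
import random

SLICE_BITS = 4                      # assumed slice width
SLICES_PER_WEIGHT = 8 // SLICE_BITS

def bit_slices(w):
    """Split an unsigned 8-bit weight into slices, least-significant first."""
    mask = (1 << SLICE_BITS) - 1
    return [(w >> (i * SLICE_BITS)) & mask for i in range(SLICES_PER_WEIGHT)]

def compress_row(row):
    """Row-wise compression: keep only nonzero slices with their positions."""
    kept = []
    for col, w in enumerate(row):
        for idx, s in enumerate(bit_slices(w)):
            if s != 0:
                kept.append((col, idx, s))
    return kept

random.seed(0)
# Small-magnitude weights make the upper slice all-zero, so most of the
# bit-slice level sparsity here comes from skipping those upper slices.
rows = [[random.randint(0, 15) for _ in range(64)] for _ in range(8)]

total = sum(len(r) for r in rows) * SLICES_PER_WEIGHT
stored = sum(len(compress_row(r)) for r in rows)
print(f"slices stored: {stored}/{total} "
      f"({100 * (1 - stored / total):.1f}% skipped)")
```

A compute unit walking such a compressed row only issues multiply-accumulates for the stored slices, which is the basic mechanism by which skipping zero bit-slices translates into speedup and storage reduction.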
Advisors
김주영 (Joo-Young Kim)
Description
Korea Advanced Institute of Science and Technology: School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2024.2, [ii, 21 p.]

Keywords

Processing-in-memory (PIM); Transformer model; Bit-slice level sparsity

URI
http://hdl.handle.net/10203/321578
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1096796&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
