Memory-centric sparse matrix application acceleration architecture

Many high-performance computing applications operate on sparse matrices through a variety of kernels, and these applications commonly rely on GPUs for parallel acceleration. However, the kernels are bottlenecked by memory I/O, which makes it hard to utilize the full compute throughput of the GPU. These memory-intensive kernels fall into four categories: sparse matrix general multiplication (SpGEMM), vector-vector operations, sparse matrix-vector multiplication (SpMV), and sparse triangular matrix-vector solve (SpTRSV). This dissertation therefore proposes an SpGEMM accelerator architecture and a processing-in-memory (PIM) architecture for the remaining kernels, together providing full hardware acceleration of sparse matrix applications.

Recent SpGEMM accelerator designs have advocated outer-product processing, which reads the input matrices sequentially and minimizes memory read traffic. This study first identifies the memory bloating problem of outer-product designs, which can cause availability problems. It therefore revisits the alternative inner-product approach and proposes a new accelerator design, InnerSP, that overcomes the memory bloating problem. The study shows that row-wise inner-product algorithms exhibit locality, which a modest on-chip cache can exploit. The row-wise inner product aggregates intermediate products on-chip in a fixed-size hash table; to handle the variance in output row sizes, InnerSP adds a pre-scanning technique that splits oversized rows and merges their partial results.

For the remaining kernels, this dissertation proposes pSyncPIM, a processing-in-memory architecture that extends the compute capability of commercial PIM devices to sparse matrix operations. Commercial PIM products operate all memory banks in lock step, with every bank accessing the same row and computing simultaneously. This approach cannot be applied directly to sparse matrix kernels because the nonzeros of sparse matrices are unevenly distributed across banks.
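The row-wise inner-product SpGEMM scheme described above can be sketched in software. This is a minimal illustration only, not InnerSP's hardware design: a Python dict stands in for the fixed-size on-chip hash table, the matrices are assumed to be in CSR layout, and `prescan_row_size` is a hypothetical helper showing the kind of upper bound a pre-scanning pass could compute before deciding to split a row.

```python
def spgemm_rowwise(a_ptr, a_col, a_val, b_ptr, b_col, b_val):
    """Row-wise SpGEMM over CSR inputs: each output row of C = A * B is
    aggregated in a hash table keyed by column index (the software analogue
    of the on-chip hash table in the abstract)."""
    c = []
    n_rows = len(a_ptr) - 1
    for i in range(n_rows):
        acc = {}  # column index -> partial sum for output row i
        for nz in range(a_ptr[i], a_ptr[i + 1]):
            k, av = a_col[nz], a_val[nz]
            # Multiply A[i, k] against every nonzero of B's row k.
            for bnz in range(b_ptr[k], b_ptr[k + 1]):
                j = b_col[bnz]
                acc[j] = acc.get(j, 0.0) + av * b_val[bnz]
        c.append(dict(sorted(acc.items())))
    return c  # list of {col: value} maps, one per output row


def prescan_row_size(a_ptr, a_col, b_ptr, i):
    """Upper bound on output row i's size: the sum of the lengths of the
    B rows it touches. A row whose bound exceeds the hash-table capacity
    would be split and its partial results merged afterwards."""
    return sum(b_ptr[k + 1] - b_ptr[k]
               for k in a_col[a_ptr[i]:a_ptr[i + 1]])
```

The bound from `prescan_row_size` is conservative (colliding column indices make the true row shorter), which is why splitting decisions based on it are always safe with respect to table capacity.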
In addition, the current DRAM interface does not allow memory commands to be generated inside the memory chip itself. This dissertation therefore proposes a partially synchronous execution technique: all memory banks still execute in lock step, but each bank's execution status is allowed to diverge. On top of this, it proposes a scheme for mapping the SpMV and SpTRSV kernels onto the banks.
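The partially synchronous idea can be illustrated with a toy model. This sketch is illustrative only (the function and data layout are assumptions, not the dissertation's actual design): every bank receives the same broadcast step each cycle, but each bank advances its own local cursor over its (unevenly sized) share of nonzeros and idles once that share is exhausted, so bank statuses diverge while the stepping stays lock-step.

```python
def run_lockstep(bank_rows, x):
    """Toy lock-step SpMV across PIM banks.

    bank_rows[b] holds the (row, col, val) nonzeros assigned to bank b;
    x is the dense input vector. Returns the result map and the number of
    lock-step cycles, which is set by the most heavily loaded bank."""
    n_banks = len(bank_rows)
    cursors = [0] * n_banks  # per-bank status; allowed to diverge
    y = {}                   # accumulated SpMV result: row -> value
    steps = 0
    while any(cursors[b] < len(bank_rows[b]) for b in range(n_banks)):
        # One broadcast step: every bank acts in the same cycle.
        for b in range(n_banks):
            if cursors[b] < len(bank_rows[b]):  # bank still has work
                r, c, v = bank_rows[b][cursors[b]]
                y[r] = y.get(r, 0.0) + v * x[c]
                cursors[b] += 1
            # else: this bank idles (no-op) for the step
        steps += 1
    return y, steps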
Advisors
허재혁
Description
Korea Advanced Institute of Science and Technology: School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description

Doctoral thesis - Korea Advanced Institute of Science and Technology: School of Computing, 2024.2, [vii, 64 p.]

Keywords

Sparse matrix general multiplication; Sparse matrix-vector multiplication; Sparse triangular matrix-vector solve; Hardware accelerator; Processing-in-memory

URI
http://hdl.handle.net/10203/322203
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1100112&flag=dissertation
Appears in Collection
CS-Theses_Ph.D. (Doctoral theses)
Files in This Item
There are no files associated with this item.
