DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 허재혁 | - |
dc.contributor.author | Baek, Daehyeon | - |
dc.contributor.author | 백대현 | - |
dc.date.accessioned | 2024-08-08T19:31:46Z | - |
dc.date.available | 2024-08-08T19:31:46Z | - |
dc.date.issued | 2024 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1100112&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/322203 | - |
dc.description | Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : School of Computing, 2024.2, [vii, 64 p.] | - |
dc.description.abstract | Many high-performance computing applications use sparse matrices across a variety of operations and kernels, and they rely on GPUs for parallel acceleration. However, these kernels suffer from a memory I/O bottleneck that prevents them from fully utilizing the GPU's performance. These memory-intensive kernels can be classified into four categories: sparse matrix general multiplication (SpGEMM), vector-vector operations, sparse matrix-vector multiplication (SpMV), and sparse triangular matrix-vector solve (SpTRSV). This dissertation therefore proposes an SpGEMM accelerator architecture together with a processing-in-memory (PIM) architecture for the remaining kernels, providing full hardware acceleration of sparse matrix applications. Recent SpGEMM accelerator designs advocate outer-product processing, which reads inputs sequentially and minimizes memory read traffic. However, this study first identifies the memory bloating problem of outer-product designs, which can cause availability problems. It therefore revisits the alternative inner-product approach and proposes a new accelerator design, InnerSP, to overcome the memory bloating problem. This study shows that row-wise inner-product algorithms exhibit locality that can be exploited with a modest on-chip cache. The row-wise inner product relies on on-chip aggregation of intermediate products in a fixed-size on-chip hash table; to handle the variance in output row sizes, InnerSP introduces a pre-scanning technique for row splitting and merging. For the remaining kernels, this dissertation proposes pSyncPIM, a processing-in-memory architecture that extends the compute capability of commercial PIMs to support sparse matrix operations. Commercial PIM products require all memory banks to access the same bank row and operate simultaneously. However, this approach cannot be applied directly to sparse matrix kernels because of the uneven value distribution of sparse matrices; moreover, the current DRAM interface does not allow memory commands to be generated inside the memory chip itself. This dissertation therefore proposes a partially synchronous execution technique in which all memory banks execute in lock step while each bank's execution status is still allowed to diverge. On top of that, it proposes a mapping scheme for the SpMV and SpTRSV kernels. | - |
dc.language | eng | - |
dc.publisher | 한국과학기술원 | - |
dc.subject | 희소 행렬 곱셈▼a희소 행렬-벡터 곱셈▼a희소 삼각 행렬-벡터 풀이▼a하드웨어 가속기▼a프로세싱-인-메모리 | - |
dc.subject | Sparse matrix general multiplication▼aSparse matrix-vector multiplication▼aSparse triangular matrix-vector solve▼aHardware accelerator▼aProcessing-in-memory | - |
dc.title | Memory-centric sparse matrix application acceleration architecture | - |
dc.title.alternative | 메모리 중심 희소 행렬 애플리케이션 가속 아키텍처 | - |
dc.type | Thesis (Ph.D.) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | 한국과학기술원 :전산학부, | - |
dc.contributor.alternativeauthor | Huh, Jaehyuk | - |
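The abstract's description of row-wise inner-product SpGEMM with hash-table aggregation of intermediate products can be illustrated by a minimal software sketch. This is not the dissertation's actual hardware design: the CSR layout, function name, and unbounded Python dictionary (standing in for the fixed-size on-chip hash table) are illustrative assumptions only.

```python
# Software sketch of row-wise SpGEMM (C = A @ B) over CSR inputs.
# Each output row's intermediate products are aggregated in a hash table,
# loosely analogous to the on-chip aggregation described in the abstract.
# Unlike the accelerator's fixed-size table, this dict grows without bound,
# so no row splitting/merging (InnerSP's pre-scanning) is modeled here.

def spgemm_rowwise(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, n_rows):
    """Multiply CSR matrices A and B, producing C one output row at a time."""
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(n_rows):
        acc = {}  # hash table keyed by output column index
        for t in range(a_ptr[i], a_ptr[i + 1]):
            k, av = a_idx[t], a_val[t]          # nonzero A[i, k]
            for u in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[u]                    # nonzero B[k, j]
                acc[j] = acc.get(j, 0.0) + av * b_val[u]
        for j in sorted(acc):                   # emit row i of C in order
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

Because each output row is finished before the next begins, intermediate products never spill to memory, which is the property that lets a bounded on-chip structure suffice when rows are split to fit.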