DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 정명수 | - |
dc.contributor.author | Lee, Sangwon | - |
dc.contributor.author | 이상원 | - |
dc.date.accessioned | 2024-08-08T19:31:43Z | - |
dc.date.available | 2024-08-08T19:31:43Z | - |
dc.date.issued | 2024 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1100092&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/322186 | - |
dc.description | Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2024.2, [iv, 57 p.] | - |
dc.description.abstract | In this thesis, three real systems that deploy persistent memory are proposed. The first is TensorPRAM, a scalable heterogeneous deep learning accelerator that realizes an FPGA-based domain-specific architecture and can be used to form a computational array for deep neural networks (DNNs). The current design of TensorPRAM includes systolic-array hardware, which accelerates the general matrix multiplication (GEMM) and convolution operations of DNNs. Our real-system evaluations show that TensorPRAM reduces the execution time of various DNN workloads by 99% and 48% on average, compared to a processor-only accelerator and a systolic-array-only accelerator, respectively. The second is LightPC, a lightweight persistence-centric platform that consists of hardware and software subsystems, referred to as open-channel PMEM (OC-PMEM) and persistence-centric OS (PecOS), respectively. OC-PMEM removes the physical and logical boundaries that draw a line between volatile and non-volatile data structures by unshackling new memory media from the conventional PMEM complex. PecOS provides a single execution persistence cut that quickly converts the execution states to persistent information in case of a power failure, which can eliminate persistence control overhead and make new memories transparent to existing software. Our evaluation results show that OC-PMEM can make user-level performance comparable with a DRAM-only non-persistent system, while consuming 72% lower power and 44.2% less energy. LightPC also shortens the execution time of diverse HPC and SPEC workloads by 1.9× and 7.7× on average, respectively, compared to traditional orthogonal persistent systems. The last is TrainingCXL, which can efficiently process large-scale recommendation datasets in a pool of disaggregated memory while making training fault tolerant with low overhead. To this end, we integrate persistent memory (PMEM) and GPU into a cache-coherent domain as Type-2. Enabling CXL allows PMEM to be placed directly in the GPU's memory hierarchy, such that the GPU can access PMEM without software intervention. The evaluation shows that TrainingCXL achieves a 5.2× training performance improvement and 76% energy savings, compared to modern PMEM-based recommendation systems. | - |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | 비휘발성 메모리; 지속성 메모리 모듈; 지속성 시스템 | - |
dc.subject | Non-volatile Memory; Persistent Memory Module; Persistent System | - |
dc.title | Integrating persistent memory into real system | - |
dc.title.alternative | 비휘발성 메모리의 실시스템으로의 통합 | - |
dc.type | Thesis(Ph.D) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering | - |
dc.contributor.alternativeauthor | Jung, Myoungsoo | - |