Integrating persistent memory into real system (비휘발성 메모리의 실시스템으로의 통합)

In this thesis, three real systems that deploy persistent memory are proposed. The first is TensorPRAM, a scalable heterogeneous deep learning accelerator that realizes an FPGA-based domain-specific architecture and can be used to form a computational array for deep neural networks (DNNs). The current design of TensorPRAM includes systolic-array hardware, which accelerates the general matrix multiplication (GEMM) and convolution operations of DNNs. Our real-system evaluations show that TensorPRAM reduces the execution time of various DNN workloads by 99% and 48% on average, compared to a processor-only accelerator and a systolic-array-only accelerator, respectively. The second is LightPC, a lightweight persistence-centric platform that consists of hardware and software subsystems, referred to as open-channel PMEM (OC-PMEM) and persistence-centric OS (PecOS), respectively. OC-PMEM removes the physical and logical boundaries that separate volatile and non-volatile data structures by unshackling new memory media from the conventional PMEM complex. PecOS provides a single execution persistence cut that quickly converts execution states to persistent information in the event of a power failure, which can eliminate persistent control overhead and let existing software use the new memories transparently. Our evaluation results show that OC-PMEM makes user-level performance comparable with that of a DRAM-only non-persistent system, while consuming 72% lower power and 44.2% less energy. LightPC also shortens the execution time of diverse HPC and SPEC workloads by 1.9× and 7.7×, on average, respectively, compared to traditional orthogonal persistent systems. The last is TrainingCXL, which efficiently processes large-scale recommendation datasets in a pool of disaggregated memory while making training fault tolerant with low overhead. To this end, we integrate persistent memory (PMEM) and GPU into a cache-coherent domain as Type-2. Enabling CXL allows PMEM to be placed directly in the GPU's memory hierarchy, so that the GPU can access PMEM without software intervention. The evaluation shows that TrainingCXL achieves a 5.2× training performance improvement and 76% energy savings compared to modern PMEM-based recommendation systems.
Advisors
정명수 (Myoungsoo Jung)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description

Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2024.2, [iv, 57 p.]

Keywords

Non-volatile Memory; Persistent Memory Module; Persistent System

URI
http://hdl.handle.net/10203/322186
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1100092&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
