This thesis proposes three real systems that deploy persistent memory.
The first is TensorPRAM, a scalable heterogeneous deep learning accelerator that realizes an FPGA-based domain-specific architecture and can be used to form a computational array for deep neural networks (DNNs). The current design of TensorPRAM includes systolic-array hardware, which accelerates the general matrix multiplication (GEMM) and convolution operations of DNNs. Our real-system evaluations show that TensorPRAM reduces the execution time of various DNN workloads by $99%$ and $48%$, on average, compared to a processor-only accelerator and a systolic-array-only accelerator, respectively.
The second is LightPC, a lightweight persistence-centric platform that consists of hardware and software subsystems, referred to as open-channel PMEM (OC-PMEM) and persistence-centric OS (PecOS), respectively. OC-PMEM removes the physical and logical boundaries that separate volatile and non-volatile data structures by unshackling new memory media from the conventional PMEM complex. PecOS provides a single execution persistence cut that quickly converts the execution states to persistent information in case of a power failure, which can eliminate persistence control overhead and make the new memories transparent to existing software. Our evaluation results show that OC-PMEM makes user-level performance comparable to that of a DRAM-only non-persistent system, while consuming $72%$ less power and $44.2%$ less energy. LightPC also shortens the execution time of diverse HPC and SPEC workloads by $1.9×$ and $7.7×$, on average, respectively, compared to traditional orthogonal persistent systems.
The last is TrainingCXL, which can efficiently process large-scale recommendation datasets in a pool of disaggregated memory while making training fault tolerant with low overhead. To this end, we integrate persistent memory (PMEM) and a GPU into a cache-coherent domain as a CXL Type-2 device. Enabling CXL allows PMEM to be placed directly in the GPU’s memory hierarchy, so that the GPU can access PMEM without software intervention. The evaluation shows that TrainingCXL achieves a $5.2×$ training performance improvement and $76%$ energy savings compared to modern PMEM-based recommendation systems.