DSpace at KOASAS: High-throughput system design with memory networks


High-throughput system design with memory networks (메모리 네트워크에 기반한 대용량 계산 시스템 설계에 관한 연구)

DC Field	Value	Language
dc.contributor.advisor	Kim, John Dongjun	-
dc.contributor.advisor	김동준	-
dc.contributor.author	Kim, Gwangsun	-
dc.date.accessioned	2019-08-25T02:47:34Z	-
dc.date.available	2019-08-25T02:47:34Z	-
dc.date.issued	2016	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=849848&flag=dissertation	en_US
dc.identifier.uri	http://hdl.handle.net/10203/265317	-
dc.description	Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2016.8, [vii, 78 p.]	-
dc.description.abstract	Recent advances in 3D integration technology and the high-bandwidth demand of modern processors have led to the development of 3D-stacked memory devices such as the Hybrid Memory Cube (HMC), which improve DRAM bandwidth while reducing energy cost. One of the salient features of the HMC is the routing capability provided by its logic layer, which enables the creation of a memory network. Memory networks create new opportunities in system design by enabling efficient communication among the different processors in a system, which can also lead to improved programmability. We first explore the design space of the system interconnect, which defines the connectivity of multiple processors and memory devices in a system. We show the limitations of the conventional system interconnect design, which we classify as a processor-centric network (PCN), in flexibly utilizing the processor bandwidth. By leveraging the routing capability of HMCs, we propose a memory-centric network (MCN), which can enable full processor bandwidth utilization for different traffic patterns. The MCN introduces challenges, including higher processor-to-processor latency and the need to properly exploit path diversity. Thus, we propose a distributor-based network and a pass-through microarchitecture to reduce network diameter and per-hop latency, while leveraging the path diversity within the memory network to provide high throughput for adversarial traffic patterns. Meanwhile, GPUs, which are commonly used to accelerate various workloads, employ the PCIe interface and can suffer from two major communication bottlenecks (accessing remote GPU memory and accessing the host CPU memory) that lead to programmability challenges. This work leverages the memory network to simplify memory management and proposes scalable kernel execution (SKE), in which multiple GPUs are encapsulated as a single virtual GPU to improve programmability. 
In addition, we propose a unified memory network (UMN), which combines the CPU memory network and the GPU memory network to provide high bandwidth between the CPU and multiple GPUs while eliminating memory copy overhead. To meet the high bandwidth requirement of the GPU and the low latency requirement of the CPU, we propose a sliced flattened butterfly topology, which provides high network bandwidth at low cost, and an overlay network architecture to minimize CPU packet latency. The memory network and the logic layer of 3D-stacked memory devices, which can provide computational capability, also create an opportunity for near-data processing (NDP), which has the potential to address several obstacles for modern computer systems, such as memory bandwidth and energy efficiency. Furthermore, standardizing the NDP interface can enable more pervasive use of NDP across a wide range of systems, leveraging economies of scale across the industry. To overcome the challenge of performing address translation in an architecture-neutral manner while providing access to data distributed across multiple memory stacks in NDP, we propose a partitioned execution model, which removes the need for an architecture-specific MMU or TLB in the logic layer. In addition, instead of employing a data cache in the logic layer, we introduce NDP buffers to avoid the issue of cache coherence among the main processor and multiple memory stacks. As offloading too much computation to the NDP logic can degrade performance by making it a bottleneck, we also propose low-complexity, dynamic offload decision mechanisms to enable high speedup as well as energy reduction.	-
dc.language	eng	-
dc.publisher	Korea Advanced Institute of Science and Technology (KAIST)	-
dc.subject	Memory network; 3D-stacked memory; multi-socket system; multi-GPU system; near-data processing	-
dc.subject	(Korean keywords, translated) Memory network; 3D-stacked memory; multi-socket system; multi-GPU system; near-memory computation	-
dc.title	High-throughput system design with memory networks	-
dc.title.alternative	메모리 네트워크에 기반한 대용량 계산 시스템 설계에 관한 연구 [A study on the design of large-scale computing systems based on memory networks]	-
dc.type	Thesis(Ph.D)	-
dc.identifier.CNRN	325007	-
dc.description.department	Korea Advanced Institute of Science and Technology (KAIST): School of Computing	-
dc.contributor.alternativeauthor	김광선	-
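
Illustrative note (not part of the original record): the abstract's claim that a processor-centric network (PCN) cannot flexibly utilize processor bandwidth, while a memory-centric network (MCN) can, may be sketched with a toy model. The model below is an assumption made for illustration only. It supposes that a PCN statically splits a processor's channels between local memory and other processors, that an MCN lets every channel carry either kind of traffic, and that contention inside the network is ignored. The function names, parameters, and numbers are hypothetical and are not taken from the thesis.

```python
def pcn_throughput(channels, mem_channels, remote_fraction, bw_per_channel=1.0):
    """Toy bound on sustainable processor bandwidth in a PCN.

    Assumption: `mem_channels` channels serve only local-memory traffic and
    the remaining channels serve only remote (processor-to-processor) traffic.
    `remote_fraction` is the share of traffic that is remote.
    """
    if not 0 <= mem_channels <= channels:
        raise ValueError("mem_channels must be between 0 and channels")
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")

    local_fraction = 1.0 - remote_fraction
    # Total traffic T must satisfy T * local_fraction <= memory-side capacity
    # and T * remote_fraction <= processor-side capacity.
    limits = [channels * bw_per_channel]
    if local_fraction > 0:
        limits.append(mem_channels * bw_per_channel / local_fraction)
    if remote_fraction > 0:
        limits.append((channels - mem_channels) * bw_per_channel / remote_fraction)
    return min(limits)


def mcn_throughput(channels, bw_per_channel=1.0):
    """Toy bound for an MCN: every channel can carry any traffic type."""
    return channels * bw_per_channel


if __name__ == "__main__":
    # Hypothetical processor with 8 channels, 4 of them tied to memory in the PCN.
    for remote in (0.0, 0.25, 0.5, 1.0):
        pcn = pcn_throughput(8, 4, remote)
        mcn = mcn_throughput(8)
        print(f"remote={remote:.2f}  PCN={pcn:.2f}  MCN={mcn:.2f}")
```

In this simplified model the PCN reaches full bandwidth only when the traffic mix happens to match the static channel split, whereas the MCN bound is independent of the mix. Latency and path-diversity effects, which the abstract identifies as the MCN's challenges, are deliberately left out.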

Appears in Collection: CS-Theses_Ph.D.(박사논문)

Files in This Item: There are no files associated with this item.


Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
