Automatically Exploiting Implicit Pipeline Parallelism from Multiple Dependent Kernels for GPUs

Cited 10 times in Web of Science; cited 11 times in Scopus
DC Field / Value / Language
dc.contributor.author: Kim, Gwangsun (ko)
dc.contributor.author: Jeong, Jiyun (ko)
dc.contributor.author: Kim, John (ko)
dc.contributor.author: Stephenson, Mark (ko)
dc.date.accessioned: 2020-02-10T03:20:08Z
dc.date.available: 2020-02-10T03:20:08Z
dc.date.created: 2020-02-10
dc.date.issued: 2016-09
dc.identifier.citation: 25th International Conference on Parallel Architectures and Compilation Techniques, PACT 2016, pp. 339 - 350
dc.identifier.uri: http://hdl.handle.net/10203/272189
dc.description.abstract: Execution of GPGPU workloads consists of different stages, including data I/O on the CPU, memory copies between the CPU and GPU, and kernel execution. While the GPU can remain idle during I/O and memory copies, prior work has shown that overlapping data movement (I/O and memory copies) with kernel execution can improve performance. However, when there are multiple dependent kernels, their execution is serialized and the benefit of overlapping data movement can be limited. To improve the performance of workloads that have multiple dependent kernels, we propose to automatically overlap the execution of kernels by exploiting implicit pipeline parallelism. We first propose Coarse-grained Reference Counting-based Scoreboarding (CRCS) to guarantee correctness during overlapped execution of multiple kernels. However, CRCS alone does not necessarily improve overall performance if the thread blocks (or CTAs) are scheduled sequentially. Thus, we propose an alternative CTA scheduler, the Pipeline Parallelism-aware CTA Scheduler (PPCS), which takes available pipeline parallelism into account in CTA scheduling to maximize pipeline parallelism and improve overall performance. Our evaluation results show that the proposed mechanisms can improve performance by up to 67% (33% on average). To the best of our knowledge, this is one of the first works to enable overlapped execution of multiple dependent kernels without any kernel modification or explicit expression of dependencies by the programmer.
dc.language: English
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.title: Automatically Exploiting Implicit Pipeline Parallelism from Multiple Dependent Kernels for GPUs
dc.type: Conference
dc.identifier.wosid: 000392249100029
dc.identifier.scopusid: 2-s2.0-84989332455
dc.type.rims: CONF
dc.citation.beginningpage: 339
dc.citation.endingpage: 350
dc.citation.publicationname: 25th International Conference on Parallel Architectures and Compilation Techniques, PACT 2016
dc.identifier.conferencecountry: IS
dc.identifier.conferencelocation: Dan Carmel Hotel, Haifa
dc.identifier.doi: 10.1145/2967938.2967952
dc.contributor.nonIdAuthor: Jeong, Jiyun
dc.contributor.nonIdAuthor: Kim, John
dc.contributor.nonIdAuthor: Stephenson, Mark
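The abstract's core idea, overlapping dependent kernels by tracking coarse memory regions with reference counts so a consumer thread block can start as soon as its producer block finishes, can be illustrated with a toy model. This is a minimal sketch of the concept only, not the paper's actual CRCS/PPCS hardware mechanism; the region granularity, function names, and the interleaved schedule below are all illustrative assumptions.

```python
# Toy model of the coarse-grained reference-counting scoreboarding (CRCS)
# idea from the abstract. Kernel B depends on kernel A's output. Instead of
# waiting for all of kernel A, each thread block of B may start as soon as
# the coarse memory region its corresponding block of A writes is complete.
# All names and the one-writer-per-region granularity are assumptions made
# for this sketch.

NUM_REGIONS = 4

# pending_writes[r] counts thread blocks of kernel A still writing region r.
pending_writes = {r: 1 for r in range(NUM_REGIONS)}

data = list(range(NUM_REGIONS))  # one coarse memory region per entry
timeline = []                    # records which block ran when

def kernel_a_block(r):
    """Producer block: writes region r, then clears its scoreboard entry."""
    data[r] = data[r] * 2
    pending_writes[r] -= 1
    timeline.append(("A", r))

def kernel_b_block(r):
    """Consumer block: may only run once region r has no pending writers."""
    assert pending_writes[r] == 0, "scoreboard prevents a premature read"
    data[r] = data[r] + 1
    timeline.append(("B", r))

# Pipeline-parallelism-aware schedule (in the spirit of PPCS): interleave
# dependent blocks so B(r) starts right after A(r) finishes, rather than
# serializing kernel B after all of kernel A.
for r in range(NUM_REGIONS):
    kernel_a_block(r)
    kernel_b_block(r)

print(data)      # each region holds x * 2 + 1
print(timeline)  # A and B blocks interleaved region by region
```

The scoreboard assertion in `kernel_b_block` is what guarantees correctness: any schedule that tried to run a consumer block before its producer region was complete would trip it, while any schedule that respects the per-region counts, however aggressively interleaved, is safe.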
Appears in Collection
RIMS Conference Papers
Files in This Item
There are no files associated with this item.