Automatically Exploiting Implicit Pipeline Parallelism from Multiple Dependent Kernels for GPUs

Cited 10 times in Web of Science; cited 11 times in Scopus
DC Field / Value / Language
dc.contributor.author: Kim, Gwangsun (ko)
dc.contributor.author: Jeong, Jiyun (ko)
dc.contributor.author: Kim, John (ko)
dc.contributor.author: Stephenson, Mark (ko)
dc.date.accessioned: 2020-02-10T03:20:08Z
dc.date.available: 2020-02-10T03:20:08Z
dc.date.created: 2020-02-10
dc.date.issued: 2016-09
dc.identifier.citation: 25th International Conference on Parallel Architectures and Compilation Techniques, PACT 2016, pp. 339 - 350
dc.identifier.uri: http://hdl.handle.net/10203/272189
dc.description.abstract: Execution of GPGPU workloads consists of different stages, including data I/O on the CPU, memory copies between the CPU and GPU, and kernel execution. While the GPU can remain idle during I/O and memory copies, prior work has shown that overlapping data movement (I/O and memory copies) with kernel execution can improve performance. However, when there are multiple dependent kernels, their execution is serialized and the benefit of overlapping data movement can be limited. To improve the performance of workloads that have multiple dependent kernels, we propose to automatically overlap the execution of kernels by exploiting implicit pipeline parallelism. We first propose Coarse-grained Reference Counting-based Scoreboarding (CRCS) to guarantee correctness during overlapped execution of multiple kernels. However, CRCS alone does not necessarily improve overall performance if the thread blocks (or CTAs) are scheduled sequentially. Thus, we propose an alternative CTA scheduler, the Pipeline Parallelism-aware CTA Scheduler (PPCS), which takes available pipeline parallelism into account in CTA scheduling to maximize pipeline parallelism and improve overall performance. Our evaluation results show that the proposed mechanisms can improve performance by up to 67% (33% on average). To the best of our knowledge, this is one of the first works to enable overlapped execution of multiple dependent kernels without any kernel modification or explicit expression of dependencies by the programmer.
dc.language: English
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.title: Automatically Exploiting Implicit Pipeline Parallelism from Multiple Dependent Kernels for GPUs
dc.type: Conference
dc.identifier.wosid: 000392249100029
dc.identifier.scopusid: 2-s2.0-84989332455
dc.type.rims: CONF
dc.citation.beginningpage: 339
dc.citation.endingpage: 350
dc.citation.publicationname: 25th International Conference on Parallel Architectures and Compilation Techniques, PACT 2016
dc.identifier.conferencecountry: IS
dc.identifier.conferencelocation: Dan Carmel Hotel, Haifa
dc.identifier.doi: 10.1145/2967938.2967952
dc.contributor.nonIdAuthor: Jeong, Jiyun
dc.contributor.nonIdAuthor: Kim, John
dc.contributor.nonIdAuthor: Stephenson, Mark
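The abstract's core idea, overlapping dependent kernels by tracking coarse memory regions with reference counts so a consumer thread block can start as soon as its producer block finishes, can be illustrated with a toy model. This is a minimal sketch of the concept only, not the paper's actual CRCS/PPCS hardware mechanism; the region granularity, function names, and the interleaved schedule below are all illustrative assumptions.

```python
# Toy model of the coarse-grained reference-counting scoreboarding (CRCS)
# idea from the abstract. Kernel B depends on kernel A's output. Instead of
# waiting for all of kernel A, each thread block of B may start as soon as
# the coarse memory region its corresponding block of A writes is complete.
# All names and the one-writer-per-region granularity are assumptions made
# for this sketch.

NUM_REGIONS = 4

# pending_writes[r] counts thread blocks of kernel A still writing region r.
pending_writes = {r: 1 for r in range(NUM_REGIONS)}

data = list(range(NUM_REGIONS))  # one coarse memory region per entry
timeline = []                    # records which block ran when

def kernel_a_block(r):
    """Producer block: writes region r, then clears its scoreboard entry."""
    data[r] = data[r] * 2
    pending_writes[r] -= 1
    timeline.append(("A", r))

def kernel_b_block(r):
    """Consumer block: may only run once region r has no pending writers."""
    assert pending_writes[r] == 0, "scoreboard prevents a premature read"
    data[r] = data[r] + 1
    timeline.append(("B", r))

# Pipeline-parallelism-aware schedule (in the spirit of PPCS): interleave
# dependent blocks so B(r) starts right after A(r) finishes, rather than
# serializing kernel B after all of kernel A.
for r in range(NUM_REGIONS):
    kernel_a_block(r)
    kernel_b_block(r)

print(data)      # each region holds x * 2 + 1
print(timeline)  # A and B blocks interleaved region by region
```

The scoreboard assertion in `kernel_b_block` is what guarantees correctness: any schedule that tried to run a consumer block before its producer region was complete would trip it, while any schedule that respects the per-region counts, however aggressively interleaved, is safe.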
Appears in Collection
RIMS Conference Papers
Files in This Item
There are no files associated with this item.