Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems

Cited 110 times in Web of Science; cited 0 times in Scopus
  • Hits: 60
  • Downloads: 0
DC Field | Value | Language
dc.contributor.author | Hsieh, Kevin | ko
dc.contributor.author | Ebrahimi, Eiman | ko
dc.contributor.author | Kim, Gwangsun | ko
dc.contributor.author | Chatterjee, Niladrish | ko
dc.contributor.author | O'Connor, Mike | ko
dc.contributor.author | Vijaykumar, Nandita | ko
dc.contributor.author | Mutlu, Onur | ko
dc.contributor.author | Keckler, Stephen W | ko
dc.date.accessioned | 2023-10-05T08:00:34Z | -
dc.date.available | 2023-10-05T08:00:34Z | -
dc.date.created | 2023-10-05 | -
dc.date.issued | 2016-06 | -
dc.identifier.citation | 43rd International Symposium on Computer Architecture, ISCA 2016, pp.204 - 216 | -
dc.identifier.uri | http://hdl.handle.net/10203/313020 | -
dc.description.abstract | Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to significantly alleviate this bottleneck by directly connecting a logic layer to the DRAM layers with high bandwidth connections. Recent work has shown promising potential performance benefits from an architecture that connects multiple such 3D-stacked memories and offloads bandwidth-intensive computations to a GPU in each of the logic layers. An unsolved key challenge in such a system is how to enable computation offloading and data mapping to multiple 3D-stacked memories without burdening the programmer such that any application can transparently benefit from near-data processing capabilities in the logic layer. Our paper develops two new mechanisms to address this key challenge. First, a compiler-based technique that automatically identifies code to offload to a logic-layer GPU based on a simple cost-benefit analysis. Second, a software/hardware cooperative mechanism that predicts which memory pages will be accessed by offloaded code, and places those pages in the memory stack closest to the offloaded code, to minimize off-chip bandwidth consumption. We call the combination of these two programmer-transparent mechanisms TOM: Transparent Offloading and Mapping. Our extensive evaluations across a variety of modern memory-intensive GPU workloads show that, without requiring any program modification, TOM significantly improves performance (by 30% on average, and up to 76%) compared to a baseline GPU system that cannot offload computation to 3D-stacked memories. | -
dc.language | English | -
dc.publisher | ACM SIGARCH and IEEE TCCA | -
dc.title | Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems | -
dc.type | Conference | -
dc.identifier.wosid | 000389548600017 | -
dc.identifier.scopusid | 2-s2.0-84988446752 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 204 | -
dc.citation.endingpage | 216 | -
dc.citation.publicationname | 43rd International Symposium on Computer Architecture, ISCA 2016 | -
dc.identifier.conferencecountry | KO | -
dc.identifier.conferencelocation | Seoul | -
dc.identifier.doi | 10.1109/ISCA.2016.27 | -
dc.contributor.nonIdAuthor | Hsieh, Kevin | -
dc.contributor.nonIdAuthor | Ebrahimi, Eiman | -
dc.contributor.nonIdAuthor | Chatterjee, Niladrish | -
dc.contributor.nonIdAuthor | O'Connor, Mike | -
dc.contributor.nonIdAuthor | Vijaykumar, Nandita | -
dc.contributor.nonIdAuthor | Mutlu, Onur | -
dc.contributor.nonIdAuthor | Keckler, Stephen W | -
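
Illustrative note on the abstract above: the compiler-based offloading decision weighs the off-chip memory traffic a candidate code block would avoid against the cost of shipping the block's context (live-in/live-out registers) to and from the memory-stack GPU. The sketch below is a minimal illustration of that cost-benefit idea only, not the paper's exact formula; the function name, parameters, and byte sizes are assumptions made for illustration.

```python
# Minimal sketch of an offload cost-benefit check, loosely following the idea
# described in the abstract (illustrative only; parameter names and byte sizes
# are assumptions, not the paper's exact model).

def should_offload(num_loads, num_stores, live_in_regs, live_out_regs,
                   access_bytes=32, reg_bytes=4):
    """Estimate whether offloading a candidate block to a memory-stack GPU
    saves more off-chip traffic than it costs to transfer its context."""
    # Benefit: loads/stores that would otherwise cross the off-chip link.
    traffic_saved = (num_loads + num_stores) * access_bytes
    # Cost: live-in registers shipped to the stack plus live-outs shipped back.
    transfer_cost = (live_in_regs + live_out_regs) * reg_bytes
    return traffic_saved > transfer_cost

# Example: a block with 8 loads, 4 stores, 6 live-in and 2 live-out registers
# would be marked as an offload candidate under these assumed costs.
print(should_offload(num_loads=8, num_stores=4, live_in_regs=6, live_out_regs=2))
```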
Appears in Collection
RIMS Conference Papers
Files in This Item
There are no files associated with this item.
This item is cited by other documents in WoS