DC Field | Value | Language |
---|---|---|
dc.contributor.author | Hsieh, Kevin | ko |
dc.contributor.author | Ebrahimi, Eiman | ko |
dc.contributor.author | Kim, Gwangsun | ko |
dc.contributor.author | Chatterjee, Niladrish | ko |
dc.contributor.author | O'Connor, Mike | ko |
dc.contributor.author | Vijaykumar, Nandita | ko |
dc.contributor.author | Mutlu, Onur | ko |
dc.contributor.author | Keckler, Stephen W | ko |
dc.date.accessioned | 2023-10-05T08:00:34Z | - |
dc.date.available | 2023-10-05T08:00:34Z | - |
dc.date.created | 2023-10-05 | - |
dc.date.issued | 2016-06 | - |
dc.identifier.citation | 43rd International Symposium on Computer Architecture, ISCA 2016, pp.204 - 216 | - |
dc.identifier.uri | http://hdl.handle.net/10203/313020 | - |
dc.description.abstract | Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to significantly alleviate this bottleneck by directly connecting a logic layer to the DRAM layers with high bandwidth connections. Recent work has shown promising potential performance benefits from an architecture that connects multiple such 3D-stacked memories and offloads bandwidth-intensive computations to a GPU in each of the logic layers. An unsolved key challenge in such a system is how to enable computation offloading and data mapping to multiple 3D-stacked memories without burdening the programmer such that any application can transparently benefit from near-data processing capabilities in the logic layer. Our paper develops two new mechanisms to address this key challenge. First, a compiler-based technique that automatically identifies code to offload to a logic-layer GPU based on a simple cost-benefit analysis. Second, a software/hardware cooperative mechanism that predicts which memory pages will be accessed by offloaded code, and places those pages in the memory stack closest to the offloaded code, to minimize off-chip bandwidth consumption. We call the combination of these two programmer-transparent mechanisms TOM: Transparent Offloading and Mapping. Our extensive evaluations across a variety of modern memory-intensive GPU workloads show that, without requiring any program modification, TOM significantly improves performance (by 30% on average, and up to 76%) compared to a baseline GPU system that cannot offload computation to 3D-stacked memories. | - |
dc.language | English | - |
dc.publisher | ACM SIGARCH and IEEE TCCA | - |
dc.title | Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems | - |
dc.type | Conference | - |
dc.identifier.wosid | 000389548600017 | - |
dc.identifier.scopusid | 2-s2.0-84988446752 | - |
dc.type.rims | CONF | - |
dc.citation.beginningpage | 204 | - |
dc.citation.endingpage | 216 | - |
dc.citation.publicationname | 43rd International Symposium on Computer Architecture, ISCA 2016 | - |
dc.identifier.conferencecountry | KR | - |
dc.identifier.conferencelocation | Seoul | - |
dc.identifier.doi | 10.1109/ISCA.2016.27 | - |
dc.contributor.nonIdAuthor | Hsieh, Kevin | - |
dc.contributor.nonIdAuthor | Ebrahimi, Eiman | - |
dc.contributor.nonIdAuthor | Chatterjee, Niladrish | - |
dc.contributor.nonIdAuthor | O'Connor, Mike | - |
dc.contributor.nonIdAuthor | Vijaykumar, Nandita | - |
dc.contributor.nonIdAuthor | Mutlu, Onur | - |
dc.contributor.nonIdAuthor | Keckler, Stephen W | - |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.