Exploiting bandwidth temporal loads for low latency and high bandwidth memory

Cited 1 time in Web of Science; cited 0 times in Scopus
DC Field | Value | Language
dc.contributor.author | Kim, Soontae | ko
dc.contributor.author | Vijaykrishnan, N | ko
dc.contributor.author | Kandemir, M | ko
dc.contributor.author | Irwin, MJ | ko
dc.date.accessioned | 2013-03-07T15:59:18Z | -
dc.date.available | 2013-03-07T15:59:18Z | -
dc.date.created | 2012-02-06 | -
dc.date.issued | 2005-07 | -
dc.identifier.citation | IEE PROCEEDINGS-COMPUTERS AND DIGITAL TECHNIQUES, v.152, no.4, pp.457 - 466 | -
dc.identifier.issn | 1350-2387 | -
dc.identifier.uri | http://hdl.handle.net/10203/90617 | -
dc.description.abstract | Increasing clock frequencies and issue rates aggravate the memory latency problem and impose higher memory bandwidth requirements. While caches can be multi-ported to provide high memory bandwidth, their access latency grows with the number of ports, which limits their potential. The paper proposes a novel technique, called the 'temporal load cache architecture', to reduce load latencies and provide higher memory bandwidth. The key motivation for the technique is that temporal loads (dynamic instances of a static load instruction that access the same address as the last dynamic instance of the same static load) constitute 48% of all dynamic loads on average for the SPEC2000 benchmarks. When a load is predicted to be temporal, the data it is predicted to access are read early in the pipeline from a small temporal load cache that stores the temporal data. The proposed temporal load cache architecture has two main advantages. First, since instructions dependent on a temporal load receive their data early in the pipeline, they can be issued as soon as they resolve their remaining data dependences and resource conflicts. Second, since a large percentage of loads can be filtered by the temporal load cache, the main data cache can better service the other (nontemporal) loads, providing higher memory bandwidth. The experimental results show that the proposed temporal load cache architecture improves performance by 8.3% on average for the SPEC2000 integer benchmarks. | -
dc.language | English | -
dc.publisher | IEE-INST ELEC ENG | -
dc.title | Exploiting bandwidth temporal loads for low latency and high bandwidth memory | -
dc.type | Article | -
dc.identifier.wosid | 000231541300001 | -
dc.identifier.scopusid | 2-s2.0-23744502675 | -
dc.type.rims | ART | -
dc.citation.volume | 152 | -
dc.citation.issue | 4 | -
dc.citation.beginningpage | 457 | -
dc.citation.endingpage | 466 | -
dc.citation.publicationname | IEE PROCEEDINGS-COMPUTERS AND DIGITAL TECHNIQUES | -
dc.identifier.doi | 10.1049/ip-cdt:20045124 | -
dc.contributor.localauthor | Kim, Soontae | -
dc.contributor.nonIdAuthor | Vijaykrishnan, N | -
dc.contributor.nonIdAuthor | Kandemir, M | -
dc.contributor.nonIdAuthor | Irwin, MJ | -
dc.type.journalArticle | Article | -
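The abstract defines a temporal load as a dynamic instance of a static load that reads the same address as that static load's previous dynamic instance, and says the architecture acts on loads predicted to be temporal. The Python sketch below is only an illustration of those two ideas under stated assumptions: a trace is assumed to be a list of (PC, address) pairs, `temporal_fraction` measures the share of temporal loads in such a trace, and `TemporalLoadPredictor` is a hypothetical last-address predictor with a 2-bit saturating counter per PC. It is not the predictor or the temporal load cache described in the paper; the cache itself is not modeled here.

```python
from typing import Dict, Iterable, Optional, Tuple

Load = Tuple[int, int]  # assumed trace format: (static load PC, effective address)


def temporal_fraction(trace: Iterable[Load]) -> float:
    """Return the fraction of dynamic loads that are temporal.

    A dynamic load is temporal when it accesses the same address as the
    previous dynamic instance of the same static load (same PC).
    """
    last_addr: Dict[int, int] = {}  # PC -> address of its last dynamic instance
    total = temporal = 0
    for pc, addr in trace:
        total += 1
        if last_addr.get(pc) == addr:
            temporal += 1
        last_addr[pc] = addr
    return temporal / total if total else 0.0


class TemporalLoadPredictor:
    """Hypothetical last-address predictor with a 2-bit saturating counter per PC.

    Illustrative stand-in only; not the predictor described in the paper.
    """

    def __init__(self) -> None:
        self.last_addr: Dict[int, int] = {}
        self.counter: Dict[int, int] = {}

    def predict(self, pc: int) -> Optional[int]:
        # Predict "temporal" when the counter is 2 or 3 and return the address
        # the load is expected to read (its previous address); otherwise None.
        if self.counter.get(pc, 0) >= 2:
            return self.last_addr[pc]
        return None

    def update(self, pc: int, addr: int) -> bool:
        # Train on the actual address once the load resolves; returns whether
        # this dynamic instance was temporal.
        was_temporal = self.last_addr.get(pc) == addr
        c = self.counter.get(pc, 0)
        self.counter[pc] = min(c + 1, 3) if was_temporal else max(c - 1, 0)
        self.last_addr[pc] = addr
        return was_temporal


if __name__ == "__main__":
    # One static load re-reading a fixed address, one static load striding.
    trace = [(0x40, 0x1000), (0x40, 0x1000), (0x44, 0x2000), (0x44, 0x2008)]
    print(temporal_fraction(trace))  # 0.25

    p = TemporalLoadPredictor()
    for _ in range(3):
        p.update(0x40, 0x1000)
    print(hex(p.predict(0x40)))  # 0x1000
```

The saturating counter is one common way to keep a single non-temporal instance from immediately flipping a load's prediction; a real design would also have to bound the table size and decide what happens on a misprediction, which this sketch leaves out.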
Appears in Collection
CS-Journal Papers
Files in This Item
There are no files associated with this item.