A Framework for Correction of Multi-Bit Soft Errors in L2 Caches Based on Redundancy

Cited 12 time in webofscience Cited 0 time in scopus
  • Hit : 451
  • Download : 0
With the continuous decrease in the minimum feature size and increase in the chip density due to technology scaling, on-chip L2,caches are becoming increasingly susceptible to multi-bit soft errors. The increase in multi-bit errors could lead to higher risk of data corruption and potentially result in the crashing of application programs. Traditionally, the L2 caches have been protected from soft errors using techniques such as: 1) error detection/correction codes; 2) physical interleaving of cache bit lines to convert multi-bit errors into single-bit errors; and 3) cache scrubbing. While the first two methods incur large area overheads for multi-bit errors, identifying the time interval for scrubbing could be tricky. In this paper, we investigate in detail the multi-bit soft error rates in large L2 caches and propose a framework of solutions for their correction based on the amount of redundancy present in the memory hierarchy. We investigate several new techniques for reducing multi-bit errors in large L2 caches, in which, the multi-bit errors are detected using simple error detection codes and corrected using the data redundancy in the memory hierarchy. We also propose several techniques to control/mine the redundancy in the memory hierarchy to further improve the reliability of the L2 cache. The proposed techniques were implemented in the Simplescalar framework and validated using the SPEC 2000 integer and floating point benchmarks for L2 cache vulnerability, global cache miss-rate, average cycle count and main memory write back rate, considering the area and power overheads. Experimental results indicate that the vulnerability of L2 caches can be decreased by 40% on the average for integer benchmarks and 32% on the average for floating point benchmarks, with an average multi-bit error coverage of about 96%, with significantly less area and power overheads and with virtually no performance penalty. The proposed techniques are applicable to both single and multi-core processor-based systems.
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Issue Date
2009-02
Language
English
Article Type
Article
Keywords

RELIABILITY; CHALLENGES; DESIGN

Citation

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, v.17, pp.194 - 206

ISSN
1063-8210
DOI
10.1109/TVLSI.2008.2003236
URI
http://hdl.handle.net/10203/97713
Appears in Collection
CS-Journal Papers(저널논문)
Files in This Item
There are no files associated with this item.
This item is cited by other documents in WoS
⊙ Detail Information in WoSⓡ Click to see webofscience_button
⊙ Cited 12 items in WoS Click to see citing articles in records_button

qr_code

  • mendeley

    citeulike


rss_1.0 rss_2.0 atom_1.0