Non-volatile memories such as spin-transfer torque random access memory (STT-RAM) have recently emerged as promising memory technologies owing to their non-volatility, high density, and low leakage power. Unfortunately, STT-RAM suffers from write-related drawbacks, namely high write energy consumption and limited write endurance. While STT-RAM has been emerging, traditional memory technologies such as SRAM (static random access memory) and DRAM (dynamic random access memory) remain in wide use. In particular, DRAM has been widely used as main memory in computing systems owing to its high density and low cost. However, DRAM has become the main contributor to total energy consumption in computing systems owing to growing demands for high bandwidth and large capacity. In this transition period of memory technologies, both conventional and emerging memory technologies need to be studied. In this dissertation, we improve the feasibility of STT-RAM-based caches by reducing energy consumption and increasing lifetime based on data spatial locality. We also improve DRAM systems by reducing unnecessarily large DRAM operation granularity based on data spatial locality and compression.
First, we evaluate and propose cache hierarchy management policies for low write energy consumption and long lifetime when L1 caches are composed of SRAM and L2 caches are implemented with STT-RAM. We first evaluate inclusion-related policies (inclusive, non-inclusive, exclusive, and non-exclusive) between L1 and L2 caches, because these policies differ in their impact on the number of write operations in the L2 cache. We find that a non-exclusive policy yields the lowest energy consumption and the longest lifetime because it reduces the number of write operations to the L2 cache by filling only the L1 caches upon L2 cache misses. We then propose a sub-block-based management policy, motivated by two observations: write energy consumption is proportional to the amount of written data, and not all of the words in a cache line are always used. The policy writes back only the used sub-blocks to the L2 cache to reduce the write amount and, consequently, to reduce write energy consumption and increase lifetime.
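The proportionality between written data and write energy can be illustrated with a minimal sketch, assuming a 64-byte cache line divided into four 16-byte sub-blocks and a per-byte relative write energy of 1.0 (hypothetical parameters for illustration; the dissertation's actual sub-block size and STT-RAM energy model may differ):

```python
# Hypothetical parameters: 64 B line, four sub-blocks, unit energy/byte.
LINE_BYTES = 64
SUB_BLOCKS = 4
SUB_BLOCK_BYTES = LINE_BYTES // SUB_BLOCKS
ENERGY_PER_BYTE = 1.0  # relative STT-RAM write energy per byte

def writeback_energy(used_mask):
    """Energy to write back only the sub-blocks marked as used.

    used_mask: list of SUB_BLOCKS booleans, one per sub-block, set
    when the corresponding words were touched while the line was
    resident in the L1 cache.
    """
    used = sum(used_mask)
    return used * SUB_BLOCK_BYTES * ENERGY_PER_BYTE

# Full-line write-back vs. sub-block write-back when only one
# sub-block was actually used:
full = writeback_energy([True] * SUB_BLOCKS)             # 64.0 units
partial = writeback_energy([True, False, False, False])  # 16.0 units
```

Under these assumptions, a line with only one used sub-block costs a quarter of the full-line write-back energy, which is the effect the sub-block policy exploits.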
Second, we evaluate the inefficiency of large DRAM operation granularity and propose a new DRAM system that supports fine-grained operations. Many applications exhibit low spatial locality in the fetched cache lines, so reading unused words from DRAM is inefficient: the parts of a row corresponding to unused words do not need to be activated, read, or written. To mitigate this inefficiency, we propose the Spatio-DRAM system, which supports variable-granularity accesses based on data spatial locality. Spatio-DRAM operates with two DRAM access schemes. The first, called Spatio-Row, reduces ACT-PRE and I/O power for both reads and writes based on data spatial locality. The second, called Spatio-Col, reduces read/write energy consumption by eliminating reads/writes of unused words and the time wasted transmitting them. Spatio-Col supports variable burst lengths (quarter, half, three-quarter, and full) because read/write energy consumption is proportional to the burst length. It also relaxes the tBURST constraint, allowing data bus turnaround, write recovery, and rank-to-rank switching to start earlier, thereby improving system performance.
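The burst-length effect can be sketched with a simple first-order model, assuming a DDR-style BL8 interface where a full burst occupies the data bus for four clock cycles and I/O energy scales linearly with the transmitted fraction (illustrative assumptions, not the dissertation's measured timings):

```python
# Supported Spatio-Col burst fractions, per the scheme's description.
FRACTIONS = {"quarter": 0.25, "half": 0.5, "three_quarter": 0.75, "full": 1.0}

TBURST_FULL_CYCLES = 4  # hypothetical: BL8 on a DDR interface
E_IO_FULL = 1.0         # relative I/O energy of a full burst

def burst_cost(fraction_name):
    """Return (bus cycles, relative I/O energy) for a given burst fraction.

    Shortening the burst frees the data bus earlier, so bus turnaround,
    write recovery, and rank-to-rank switching can start sooner.
    """
    f = FRACTIONS[fraction_name]
    return TBURST_FULL_CYCLES * f, E_IO_FULL * f

# Example: a half burst occupies the bus for 2 cycles at half the energy.
cycles, energy = burst_cost("half")
```

In this model a quarter burst both quarters the I/O energy and releases the bus three cycles earlier than a full burst, which is the timing slack that relaxing tBURST turns into performance.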
Lastly, we rethink the efficiency of data-compression-based memory. Several data compression algorithms exploit data similarity to provide larger effective memory capacity or higher memory bandwidth. Compression can also reduce memory energy consumption, since only the compressed data size needs to be accessed. However, data compression produces metadata as a by-product. Because main memory holds a large number of cache lines, the total amount of metadata is large, and adding a metadata cache in the memory controller is therefore inevitable. We propose a dynamic metadata cache management method that addresses the metadata cache paradox, in which the metadata cache fails to work properly and instead incurs overheads for low-locality workloads. Consequently, our proposed data-compression-aware DRAM system, which dynamically determines whether to use the metadata cache, improves system performance and reduces energy consumption with marginal overheads.
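One way to realize dynamic metadata-cache use determination is epoch-based monitoring: track the metadata cache hit rate over a fixed window and bypass the cache in the next epoch when locality is too low for lookups to pay off. The sketch below follows that general idea; the epoch length, threshold, and class name are illustrative assumptions, not the dissertation's tuned mechanism:

```python
# Hypothetical tuning parameters for the epoch-based governor.
EPOCH_ACCESSES = 10_000
HIT_RATE_THRESHOLD = 0.5

class MetadataCacheGovernor:
    """Decides each epoch whether metadata-cache lookups are worthwhile."""

    def __init__(self):
        self.hits = 0
        self.accesses = 0
        self.use_cache = True  # start optimistic

    def record(self, hit):
        """Record one metadata access; re-evaluate at epoch boundaries."""
        self.accesses += 1
        if hit:
            self.hits += 1
        if self.accesses >= EPOCH_ACCESSES:
            hit_rate = self.hits / self.accesses
            # Low locality: lookups add latency and energy with little
            # benefit, so bypass the metadata cache in the next epoch.
            self.use_cache = hit_rate >= HIT_RATE_THRESHOLD
            self.hits = self.accesses = 0
```

A low-locality workload (for example, a streaming access pattern with a ~10% metadata hit rate) would flip `use_cache` off after one epoch, avoiding the paradox where the metadata cache only adds overhead.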