The demand for high-performance architectures and powerful battery-operated mobile devices has accentuated the need for low-power systems. In many media and embedded applications, the memory system can consume more than 50% of the overall system energy, making it a ripe candidate for optimization. To address this increasingly important problem, this article studies energy-efficient cache architectures in the memory hierarchy that can have a significant impact on the overall system energy consumption.Existing cache optimization approaches have looked at partitioning the caches at the circuit level and enabling/disabling these cache partitions (subbanks) at the architectural level for both performance and energy. In contrast, this article focuses on partitioning the cache resources architecturally for energy and energy-delay optimizations. Specifically, we investigate ways of splitting the cache into several smaller units, each of which is a cache by itself (called a subcache). Subcache architectures not only reduce the per-access energy costs, but can potentially improve the locality behavior as well.The proposed subcache architecture employs a page-based placement strategy, a dynamic page remapping policy, and a subcache prediction policy in order to improve the memory system energy behavior, especially on-chip cache energy. Using applications from the SPECjvm98 and SPEC CPU2000 benchmarks, the proposed subcache architecture is shown to be very effective in improving both the energy and energy-delay metrics. It is more beneficial in larger caches as well.