In this article, we describe how to ease memory management between a Central Processing Unit (CPU) and one or more discrete Graphics Processing Units (GPUs) by architecting a novel hardware-based Unified Memory Hierarchy (UMH). With UMH, a GPU accesses CPU memory only if it does not find the required data in the directories associated with its high-bandwidth memory, or if the NMOESI coherency protocol restricts access to that data. Using UMH with NMOESI improves the performance of a CPU-multi-GPU system by at least 1.92x in comparison to alternative software-based approaches. It also allows the CPU to access GPU-modified data at least 13x faster.