On-chip cache memories play an important role in resource-constrained embedded systems by filtering out most off-chip memory accesses. Because cache latency and energy consumption generally scale with cache size, a small cache at the top level of the memory hierarchy is desirable. Previous work introduced a cache architecture called the filter cache to reduce the hit time and energy consumption of the L1 instruction cache. However, applying the filter cache to the data cache requires a different approach and has received little attention. In this paper, we propose a filter data cache architecture that effectively adapts the filter cache to the data cache hierarchy. We observe that when the filter cache is used as a data cache, misses occur frequently and tend to be consecutive. These misses degrade performance and increase energy consumption by lengthening cache latency and loading unnecessary data. The proposed filter data cache architecture reduces miss costs using three schemes: an early cache hit predictor (ECHP), locality-based allocation (LA), and no-tag-matching write (NTW). Experimental results show that the proposed filter data cache reduces the energy consumption of the data caches by 21% compared with the filter cache, and that of the ALU by 27.2% on average. The area and leakage-power overheads are small, and the proposed filter data cache architecture does not hurt performance.