To manage exponentially increasing large-scale data, scalable distributed file systems are widely regarded as a primary storage solution. The Google File System (GFS) and the Hadoop Distributed File System (HDFS) are representative distributed file systems that manage massive volumes of data. They provide high fault tolerance and availability by keeping multiple replicas of each file. However, the replication scheme incurs substantial storage overhead. In this paper, we address this space overhead problem and propose a practical solution that reduces it by combining distributed RAID techniques with a replication-based DFS. Distributed RAID provides availability based on parities generated by erasure coding, such as Reed-Solomon coding, and requires less space overhead than replication.
Our solution reduces the storage overhead by substituting parities for the replicas of selected groups of data blocks. We also present a quantitative analysis of distributed file systems with RAID parity. Finally, we discuss issues related to decreasing the space overhead and increasing the availability of distributed file systems.
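To make the parity-generation step concrete, below is a minimal, self-contained sketch of systematic erasure encoding over GF(2^8). It uses a Cauchy coding matrix, a standard variant of Reed-Solomon coding whose every square submatrix is nonsingular, so any k of the k + m blocks can reconstruct the data. The function names, block size, and the (k, m) = (4, 2) layout are illustrative assumptions of ours, not parameters of GFS or HDFS.

# A minimal sketch of Reed-Solomon-style parity generation over GF(2^8).
# The Cauchy matrix construction, block size, and (k, m) choice below are
# illustrative assumptions, not the coder of any particular DFS.

GF_POLY = 0x11D  # primitive polynomial x^8 + x^4 + x^3 + x^2 + 1

# Log/antilog tables for GF(256) with generator alpha = 2.
EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= GF_POLY
for i in range(255, 512):          # duplicate the table so gf_mul needs no modulo
    EXP[i] = EXP[i - 255]

def gf_mul(a, b):
    """Multiply two field elements via the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def gf_inv(a):
    """Multiplicative inverse in GF(256)."""
    return EXP[255 - LOG[a]]

def encode_parity(data_blocks, m):
    """Derive m parity blocks from k equal-length data blocks.
    Coefficients come from a Cauchy matrix c[j][i] = 1 / (x_j + y_i);
    every square submatrix of such a matrix is nonsingular, so any k
    of the k + m blocks suffice to reconstruct the original data."""
    k = len(data_blocks)
    size = len(data_blocks[0])
    parity = [bytearray(size) for _ in range(m)]
    for j in range(m):
        for i, block in enumerate(data_blocks):
            c = gf_inv(j ^ (m + i))    # x_j = j, y_i = m + i (disjoint sets)
            for t in range(size):
                parity[j][t] ^= gf_mul(c, block[t])
    return parity

# Example: a (k, m) = (4, 2) group stores (4 + 2) / 4 = 1.5x the raw data,
# versus 3x for triplication, while surviving any two block failures.
k, m = 4, 2
blocks = [bytes([i] * 8) for i in range(1, k + 1)]
print(encode_parity(blocks, m))

Lost blocks would be recovered by solving the corresponding linear system over GF(2^8) with the surviving rows of the coding matrix; that decoding step is omitted here for brevity.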