(An) efficient approach to improve space overhead in modern block-based DFS
(Korean title: A study on effective techniques for improving space efficiency in block-based distributed file systems)

Modern block-oriented distributed storage systems have proliferated in the era of big data and cloud computing. These systems replicate data at the block level to guarantee both fault tolerance and data availability: they typically partition each file into equal-sized blocks and make multiple copies of each block, which are then distributed across machines. However, this replication strategy incurs severe space overhead, because every copy consumes additional storage. Under this strategy, a large amount of storage is wasted just to keep copies of blocks whose data may rarely, if ever, be accessed. We present a novel technique called DynaEC to address this issue in modern block-oriented distributed storage systems. DynaEC integrates erasure coding into a distributed storage system and replaces block copies with far fewer parity blocks at runtime. In doing so, DynaEC provides features that have not been well addressed by other studies. First, DynaEC provides a striping and parity block placement algorithm that encodes data blocks randomly distributed across machines into parity blocks and places those parities on other machines, guaranteeing data availability in the event of any node failure. Second, DynaEC performs parity encoding without changing the original block placement policy of the distributed storage system, which lets it work seamlessly in block-oriented distributed storage systems. Third, during encoding each data node encodes only its own data blocks and requires no information about blocks on other data nodes, so the encoding procedure can run fully in parallel without any synchronization issues. Finally, after replacing replicas with parities, DynaEC achieves performance and availability comparable to those of the block-level replication scheme.
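To make the replication-versus-parity trade-off concrete, the following Python sketch uses a generic single-parity XOR code (as in RAID-5). It is not DynaEC's striping or placement algorithm, which the abstract does not specify; the function names, the `placement` mapping, and the one-parity-per-stripe layout are illustrative assumptions. It shows how a parity block is computed from a stripe, how one lost block is rebuilt from the survivors, and why a code with k data and m parity blocks stores (k + m)/k bytes per byte of user data instead of the n bytes of n-way replication.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-sized blocks byte by byte (single-parity, RAID-5-style code)."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("blocks in a stripe must be equal-sized")
    # zip(*blocks) yields one tuple per byte offset holding that byte from every block
    return bytes(reduce(xor, column) for column in zip(*blocks))


def encode_stripe(data_blocks: list[bytes]) -> bytes:
    """Compute the single parity block of a stripe of k data blocks."""
    return xor_blocks(data_blocks)


def recover_block(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild one lost data block from the k-1 surviving blocks and the parity."""
    return xor_blocks(surviving + [parity])


def tolerates_any_single_node_failure(placement: dict[str, str]) -> bool:
    """placement maps block id -> node id for every data and parity block of one
    stripe (hypothetical layout). With a single parity, one node failure is
    survivable only if it removes at most one block, i.e. all nodes are distinct."""
    nodes = list(placement.values())
    return len(nodes) == len(set(nodes))


def storage_overhead_replication(replicas: int) -> float:
    """Raw bytes stored per byte of user data under n-way replication."""
    return float(replicas)


def storage_overhead_erasure(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data with k data + m parity blocks."""
    return (k + m) / k


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = encode_stripe(data)
assert recover_block([data[0], data[2]], parity) == data[1]  # block 1 rebuilt
print(storage_overhead_replication(3), storage_overhead_erasure(4, 1))  # 3.0 1.25
```

The distinct-node check captures why parity placement matters: a single XOR parity can repair only one missing block per stripe, so two blocks of the same stripe on one machine would defeat it.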
Through extensive experiments with our implementation on Hadoop DFS (Distributed File System), we show that DynaEC reduces storage consumption to the theoretical limit while encoding orders of magnitude faster than previous approaches.

Solid state drives (SSDs), which have no mechanical moving parts, have attracted much attention as a new storage medium. As SSDs have become more cost-effective with the declining price of NAND flash memory, they are now used everywhere from embedded devices to enterprise-level servers. SSDs offer many benefits over HDDs, such as low power consumption, shock resilience, light weight, and significantly lower random-access latency. These benefits can reduce the overall operating costs of enterprise data centers, even though the initial cost of adopting SSDs may still be higher than that of HDDs. However, two serious problems still limit wider deployment of SSDs: limited lifespan and relatively poor random write performance. The main cause of both problems is write amplification, which results from the out-of-place update characteristic of NAND flash memory. In a parity-based DFS, writing an SSD page requires updating not only the data block but also the parity block, and both are random writes. More random writes trigger more garbage collection, which degrades system performance and shortens SSD lifetime. To solve this parity update problem in SSD-based DFS, we propose a novel technique called LPUS. LPUS transforms random parity updates into sequential log writes using additional log blocks in the SSD. Compared with the conventional parity update process, our approach reduces block fragmentation by writing parity logs sequentially, and reduces the number of parity updates by applying them lazily. We implement and evaluate the proposed method on SRsim, an open-source SSD array simulator.
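The log-based idea can be sketched in Python as follows. This is a simplified illustration, not LPUS itself: it relies only on the standard XOR parity identity P_new = P_old ⊕ D_old ⊕ D_new, and the class name `LazyParityLog`, its `capacity` threshold, and the flush-on-full and flush-on-read policies are hypothetical. Each data update appends the delta D_old ⊕ D_new to an append-only log (a sequential write) instead of rewriting the parity page in place; on flush, all deltas for a stripe are folded together, so the parity page is rewritten once per dirty stripe rather than once per update.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


class LazyParityLog:
    """Hypothetical sketch: log parity deltas sequentially, apply them lazily."""

    def __init__(self, parities: dict[int, bytes], capacity: int) -> None:
        self.parities = dict(parities)           # stripe id -> current parity block
        self.capacity = capacity                 # log entries allowed before a flush
        self.log: list[tuple[int, bytes]] = []   # append-only (stripe id, delta) records
        self.parity_writes = 0                   # in-place parity page rewrites issued

    def update(self, stripe: int, old_data: bytes, new_data: bytes) -> None:
        # Record D_old XOR D_new instead of rewriting the parity page now.
        self.log.append((stripe, xor_bytes(old_data, new_data)))
        if len(self.log) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        # Fold all logged deltas per stripe, then rewrite each dirty parity once.
        pending: dict[int, bytes] = {}
        for stripe, delta in self.log:
            pending[stripe] = xor_bytes(pending.get(stripe, bytes(len(delta))), delta)
        for stripe, combined in pending.items():
            self.parities[stripe] = xor_bytes(self.parities[stripe], combined)
            self.parity_writes += 1
        self.log.clear()

    def read_parity(self, stripe: int) -> bytes:
        # Parity is only up to date after pending deltas are applied.
        self.flush()
        return self.parities[stripe]


# Three successive updates to the same (single-byte) block of stripe 0.
plog = LazyParityLog({0: b"\x00"}, capacity=8)
plog.update(0, b"\x00", b"\x01")
plog.update(0, b"\x01", b"\x03")
plog.update(0, b"\x03", b"\x07")
print(plog.read_parity(0), plog.parity_writes)  # b'\x07' 1  (eager in-place: 3 writes)
```

A real SSD-resident design would also have to persist the log and recover it after a crash; the sketch only counts how many in-place parity rewrites the lazy policy avoids.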
The experimental results show that LPUS reduces write amplification by up to 37% and the number of erase operations by up to 50% with a reasonably sized log space.
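Write amplification is commonly defined as the ratio of data physically written to NAND flash to the data written by the host; garbage collection adds extra writes when it copies still-valid pages out of a block before erasing it. The Python sketch below computes this ratio from page counts and expresses a change as a relative reduction, the form in which the results above are reported. The function names are illustrative, and the page counts in the example are made-up values, not results from this thesis.

```python
def write_amplification(host_pages: int, gc_copied_pages: int) -> float:
    """WA = (host page writes + pages copied by garbage collection) / host page writes."""
    if host_pages <= 0:
        raise ValueError("host_pages must be positive")
    return (host_pages + gc_copied_pages) / host_pages


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of a metric, e.g. 0.37 means 37% lower than baseline."""
    return (baseline - improved) / baseline


# Illustrative numbers only (not measured data).
wa_before = write_amplification(host_pages=1000, gc_copied_pages=900)  # 1.9
wa_after = write_amplification(host_pages=1000, gc_copied_pages=400)   # 1.4
print(f"{relative_reduction(wa_before, wa_after):.1%}")  # 26.3%
```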
Advisors
Lee, YoonJoon (이윤준)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2016
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - KAIST: School of Computing, August 2016, [v, 60 p.]

Keywords

Distributed storage system; Storage efficiency; Hadoop; Replication; Erasure coding; RAID; NAND flash memory; SSD (Solid State Drives); Write amplification; Log-based approach

URI
http://hdl.handle.net/10203/222417
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=663210&flag=dissertation
Appears in Collection
CS-Theses_Ph.D. (Ph.D. theses)
Files in This Item
There are no files associated with this item.
