HPMR: Prefetching and pre-shuffling in a shared MapReduce computation environment

MapReduce is a programming model that supports distributed and parallel processing for large-scale data-intensive applications such as machine learning, data mining, and scientific simulation. Hadoop is an open-source implementation of the MapReduce programming model. Hadoop is used by many companies, including Yahoo!, Amazon, and Facebook, to perform various data-mining tasks on large-scale data sets such as user search logs and visit logs. In these settings, it is very common for multiple users to share the same computing resources, due to practical considerations about cost, system utilization, and manageability. However, Hadoop assumes that all cluster nodes are dedicated to a single user, and so fails to guarantee high performance in a shared MapReduce computation environment. In this paper, we propose two optimization schemes, prefetching and pre-shuffling, which improve overall performance under the shared environment while retaining compatibility with native Hadoop. The proposed schemes are implemented in native Hadoop-0.18.3 as a plug-in component called HPMR (High Performance MapReduce Engine). Our evaluation on the Yahoo! Grid platform, with three different workloads and seven types of test sets from Yahoo!, shows that HPMR reduces the execution time by up to 73%.
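The abstract does not detail how the schemes work, but the core intuition behind prefetching, overlapping data transfer with task execution so that a task rarely stalls waiting on input, can be sketched independently of Hadoop. The following Python sketch is purely illustrative: `run_with_prefetch`, `fetch`, and `process` are hypothetical names and are not part of HPMR or the Hadoop API.

```python
import threading
import queue

def run_with_prefetch(block_ids, fetch, process, depth=2):
    """Process data blocks while prefetching upcoming blocks in the background.

    fetch(block_id) -> data; process(data) -> result.
    A background thread keeps up to `depth` fetched blocks queued, so the
    worker overlaps I/O with computation -- the basic idea of prefetching.
    """
    fetched = queue.Queue(maxsize=depth)

    def prefetcher():
        for bid in block_ids:
            fetched.put(fetch(bid))   # blocks when `depth` items are queued
        fetched.put(None)             # sentinel: no more blocks

    threading.Thread(target=prefetcher, daemon=True).start()

    results = []
    while True:
        data = fetched.get()
        if data is None:
            break
        results.append(process(data))
    return results
```

With `depth=2`, up to two blocks sit fetched and ready while the current block is being processed, hiding fetch latency behind computation; this overlap is the effect a MapReduce prefetching scheme aims for when tasks in a shared cluster cannot rely on data-local execution.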
Advisors
Maeng, Seung-Ryoul (맹승렬)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2010
Identifier
455247/325007  / 020084062
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology: Department of Computer Science, 2010.08, [vi, 34 p.]

Keywords

Hadoop; MapReduce; Scheduling

URI
http://hdl.handle.net/10203/34938
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=455247&flag=dissertation
Appears in Collection
CS-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
