DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Maeng, Seung-Ryoul | - |
dc.contributor.advisor | 맹승렬 | - |
dc.contributor.author | Lee, Dae-Woo | - |
dc.contributor.author | 이대우 | - |
dc.date.accessioned | 2015-04-23T08:30:33Z | - |
dc.date.available | 2015-04-23T08:30:33Z | - |
dc.date.issued | 2014 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=568602&flag=dissertation | - |
dc.identifier.uri | http://hdl.handle.net/10203/197814 | - |
dc.description | Thesis (Ph.D.) - KAIST : Department of Computer Science, 2014.2, [ vii, 69 p. ] | - |
dc.description.abstract | An important property of today's big data processing is that the same computation is often repeated on datasets evolving over time, such as web and social network data. This style of repeated computation is also used for many iterative algorithms. While repeating full computation over the entire datasets is feasible with distributed computing frameworks such as Hadoop, it is obviously inefficient and wastes resources. In this dissertation, we present HadUP (Hadoop with Update Processing), a modified Hadoop architecture tailored to large-scale incremental processing for conventional MapReduce algorithms. Several approaches have been proposed to achieve a similar goal using task-level memoization. They keep the previous results of tasks permanently and reuse them when the same computation on the same task input is needed again. However, task-level memoization detects changes to datasets at a coarse granularity, which often makes such approaches ineffective. Our analysis reveals that task-level memoization can be effective only if each task processes a few KB of input data. In contrast, HadUP detects and computes changes to datasets at a fine granularity using a deduplication-based snapshot differential algorithm (D-SD) and update propagation. Update propagation is a key primitive for efficient incremental processing in HadUP. Many applications in today's big data processing consist of data-parallel operations, where an operation transforms one or more input datasets into one output dataset. For each operation, the same computation is concurrently applied to a single input record or a group of input records. The independence between these executions allows us to compute the records to be inserted into or deleted from the output dataset, provided that the records inserted into or deleted from the input datasets are explicitly given. In this way, update propagation computes the updated result without full recomputation. Our evaluation shows that HadUP provides high pe... | eng |
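The abstract describes update propagation: because each data-parallel operation applies the same function independently to each record (or group of records), output deltas can be derived from input deltas alone. The following is a minimal sketch of that idea, not HadUP's actual implementation; the function names `propagate_map` and `propagate_grouped_sum` are hypothetical, and the grouped case assumes a simple keyed sum as the aggregation.

```python
from collections import defaultdict

def propagate_map(fn, inserted, deleted):
    """Record-wise operation: output deltas come from applying the same
    per-record function `fn` to just the inserted and deleted input records,
    with no rescan of the unchanged records."""
    out_inserted = [fn(r) for r in inserted]
    out_deleted = [fn(r) for r in deleted]
    return out_inserted, out_deleted

def propagate_grouped_sum(current_sums, inserted, deleted):
    """Grouped operation (keyed sum): only keys touched by the deltas can
    change, so recompute the aggregate for those keys alone by adjusting
    the previously computed sums."""
    delta = defaultdict(int)
    for key, value in inserted:
        delta[key] += value
    for key, value in deleted:
        delta[key] -= value
    # Updated results for affected keys only; untouched keys keep old sums.
    return {key: current_sums.get(key, 0) + d for key, d in delta.items()}

# Example: squaring each record of an evolving dataset.
ins_out, del_out = propagate_map(lambda x: x * x, inserted=[4, 5], deleted=[2])

# Example: incrementally maintaining per-key sums.
updated = propagate_grouped_sum({"a": 10, "b": 3},
                                inserted=[("a", 2), ("c", 1)],
                                deleted=[("b", 3)])
```

The key property exploited here is the independence the abstract mentions: since no execution of the operation depends on another record's result, deltas suffice to determine exactly which output records appear or disappear.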
dc.language | eng | - |
dc.publisher | 한국과학기술원 | - |
dc.subject | Big data processing | - |
dc.subject | 중복 제거 | - |
dc.subject | 하둡 | - |
dc.subject | 맵리듀스 | - |
dc.subject | 점진적 처리 | - |
dc.subject | 빅데이터 처리 | - |
dc.subject | Incremental processing | - |
dc.subject | MapReduce | - |
dc.subject | Hadoop | - |
dc.subject | Data deduplication | - |
dc.title | Large-scale incremental processing for mapreduce | - |
dc.title.alternative | 맵리듀스를 위한 대규모 점진적 처리에 대한 연구 | - |
dc.type | Thesis (Ph.D.) | - |
dc.identifier.CNRN | 568602/325007 | - |
dc.description.department | KAIST : Department of Computer Science | - |
dc.identifier.uid | 020037429 | - |
dc.contributor.localauthor | Maeng, Seung-Ryoul | - |
dc.contributor.localauthor | 맹승렬 | - |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.