DSpace at KOASAS: Experimental comparison of flink, spark, and hadoop on big-benchmark

Experimental comparison of flink, spark, and hadoop on big-benchmark (Korean title, translated: An experimental comparative analysis of Flink, Spark, and Hadoop using a big benchmark)

Author: Lee, Hae Joon (이해준)

Various frameworks for Big Data analytics have become important for large-scale processing. Hadoop MapReduce is one of the most prominent programming models in distributed systems, but it has several disadvantages for running iterative algorithms, most notably low performance caused by disk I/O: each iteration runs as a separate job that reads its input from disk and writes its output back to disk. To overcome this inefficiency, in-memory frameworks such as Apache Flink and Apache Spark have been introduced, and Spark-SQL, developed by the Spark community, is now widely used. These systems are actively developed and deployed in industry. In this thesis, we experimentally compare Flink and Spark-SQL using BigBench, a comprehensive benchmark suite for Big Data systems. Twenty-one of the 30 BigBench queries were ported from HiveQL to Flink, and both systems are evaluated on these 21 queries for total elapsed time and for data scalability across different scale factors. In the data-scalability experiment, the elapsed time of Spark-SQL stays nearly constant for more than half of the queries as the dataset grows, whereas that of Flink increases linearly. In the total-elapsed-time experiment, Spark-SQL is slower than Flink for most queries at Scale Factor 100, but at Scale Factor 300 its elapsed time for more than half of the queries is similar to Flink's. We analyze the behavior of Flink and Spark-SQL query by query and point out the trade-offs of choosing between the two systems in different situations.
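The abstract contrasts queries whose elapsed time stays nearly constant as the data grows with queries whose time grows linearly. One way to make that distinction concrete is to fit an exponent b in t ∝ SF^b (elapsed time t against scale factor SF) by least squares on log-log data: b near 0 means near-constant time, b near 1 means linear growth. This is a minimal sketch under stated assumptions, not the thesis's own analysis method. The function names, the classification thresholds (0.2 and 0.8), and the timings in the usage lines are all hypothetical and are not measurements from the thesis.

```python
import math
from typing import List, Tuple


def scaling_exponent(points: List[Tuple[float, float]]) -> float:
    """Fit b in t ~ SF**b by least squares on (log SF, log t).

    `points` is a list of (scale_factor, elapsed_seconds) pairs (hypothetical input format).
    """
    if len(points) < 2:
        raise ValueError("need at least two (scale_factor, seconds) points")
    xs = [math.log(sf) for sf, _ in points]
    ys = [math.log(t) for _, t in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("scale factors must differ")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx  # slope of the log-log fit


def classify(b: float, flat: float = 0.2, linear: float = 0.8) -> str:
    """Label a scaling exponent; the thresholds are arbitrary assumptions."""
    if b < flat:
        return "near-constant"
    if b >= linear:
        return "near-linear or worse"
    return "sublinear"


if __name__ == "__main__":
    # Hypothetical timings at Scale Factor 100 and 300 (not data from the thesis).
    hypothetical = {
        "query_a": [(100, 60.0), (300, 62.0)],   # time barely changes
        "query_b": [(100, 60.0), (300, 180.0)],  # time triples with 3x data
    }
    for name, pts in hypothetical.items():
        b = scaling_exponent(pts)
        print(f"{name}: b = {b:.2f} -> {classify(b)}")
```

With only two scale factors the fit reduces to the slope between the two points; using more scale factors makes the estimate less sensitive to noise in any single run.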

Advisor: Maeng, Seungryoul (맹승렬)

Description: Korea Advanced Institute of Science and Technology (KAIST), School of Computing

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2016

Identifier: 325007

Language: eng

Description: Master's thesis, Korea Advanced Institute of Science and Technology (KAIST), School of Computing, August 2016 [v, 68 p.]

Keywords: Flink; Spark; Spark-SQL; Big Data; BigBench

URI: http://hdl.handle.net/10203/221871

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=663492&flag=dissertation

Appears in Collection: CS-Theses_Master (Master's theses)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.

KOASAS

KOASAS

Browse

Experimental comparison of flink, spark, and hadoop on big-benchmark빅벤치마크를 통한 플링크, 스파크, 하둡의 실험적 비교분석

KOASAS

Communities & Collections