Experimental comparison of Flink, Spark, and Hadoop on big-benchmark

Frameworks for Big Data analytics have become important for large-scale processing. Hadoop MapReduce is one of the most prominent programming models in the distributed systems field, but it has several disadvantages when running iterative algorithms, e.g., low performance caused by disk I/O cost. To overcome this inefficiency, in-memory frameworks such as Apache Flink and Apache Spark have been introduced. Spark-SQL, developed by the Spark community, is now very popular in the field. These systems are actively developed and used in industry. In this thesis, we experimentally compare Flink and Spark-SQL using BigBench, a comprehensive benchmark suite for Big Data systems. Twenty-one of the 30 BigBench queries are ported from HiveQL to Flink. Both systems are evaluated on these 21 queries to measure total elapsed time and data scalability across varying scale factors. In the data scalability experiment, the elapsed time of Spark-SQL stays nearly constant for over half of the queries as the dataset grows, whereas that of Flink increases linearly. In the total elapsed time experiment, Spark-SQL is slower than Flink at Scale Factor 100 for most queries, but at Scale Factor 300 the elapsed time of Spark-SQL is similar to that of Flink for over half of the queries. We analyze the behavior of Flink and Spark-SQL in detail on a per-query basis and point out the trade-offs of using each system in particular situations.
Advisors
Maeng, Seungryoul (맹승렬)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2016
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2016.8, [v, 68 p.]

Keywords

Flink; Spark; Spark-SQL; Big Data; BigBench

URI
http://hdl.handle.net/10203/221871
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=663492&flag=dissertation
Appears in Collection
CS-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
