An Experimental Analysis of Limitations of MapReduce for Iterative Algorithms on Spark

Cited 6 time in webofscience Cited 0 time in scopus
  • Hit : 669
  • Download : 0
MapReduce is the most popular framework for distributed processing. Recently, the scalability of data mining and machine learning algorithms has significantly improved with help from MapReduce. However, MapReduce does not handle iterative algorithms very efficiently. The problem is that many data mining and machine learning algorithms are iterative by nature. In order to overcome the limitations of MapReduce, many advanced distributed systems have been developed, including HaLoop, iMapReduce, Twister, and Spark. In this paper, we identify and categorize the limitations of MapReduce in handling iterative algorithms, and then, experimentally investigate the consequences of these limitations by using the most flexible and stable distributed system, Spark. According to our experiment results, the network I/O overhead was the primary factor that affected system performance the most. The disk I/O overhead also affected system performance, but it was less significant than the network I/O overhead. For the synchronization overhead, it affected system performance only when the static data was not cached.
Publisher
SPRINGER
Issue Date
2017-12
Language
English
Article Type
Article
Citation

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, v.20, no.4, pp.3593 - 3604

ISSN
1386-7857
DOI
10.1007/s10586-017-1167-y
URI
http://hdl.handle.net/10203/227498
Appears in Collection
IE-Journal Papers(저널논문)
Files in This Item
There are no files associated with this item.
This item is cited by other documents in WoS
⊙ Detail Information in WoSⓡ Click to see webofscience_button
⊙ Cited 6 items in WoS Click to see citing articles in records_button

qr_code

  • mendeley

    citeulike


rss_1.0 rss_2.0 atom_1.0