Effect of garbage collection in iterative algorithms on Spark: an experimental analysis

Cited 4 times in Web of Science; cited 4 times in Scopus
DC Field | Value | Language
dc.contributor.author | Kang, Minseo | ko
dc.contributor.author | Lee, Jae-Gil | ko
dc.date.accessioned | 2020-11-16T05:55:28Z | -
dc.date.available | 2020-11-16T05:55:28Z | -
dc.date.created | 2020-01-18 | -
dc.date.issued | 2020-09 | -
dc.identifier.citation | JOURNAL OF SUPERCOMPUTING, v.76, no.9, pp.7204 - 7218 | -
dc.identifier.issn | 0920-8542 | -
dc.identifier.uri | http://hdl.handle.net/10203/277306 | -
dc.description.abstract | Spark is one of the most widely used systems for the distributed processing of big data. Its performance bottlenecks are mainly due to the network I/O, disk I/O, and garbage collection. Previous studies quantitatively analyzed the performance impact of these bottlenecks but did not focus on iterative algorithms. In an iterative algorithm, garbage collection has more performance impact than other workloads because the algorithm repeatedly loads and deletes data in the main memory through multiple iterations. Spark provides three caching mechanisms which are "disk cache," "memory cache," and "no cache" to keep the unchanged data across iterations. In this paper, we provide an in-depth experimental analysis of the effect of garbage collection on the overall performance depending on the caching mechanisms of Spark with various combinations of algorithms and datasets. The experimental results show that garbage collection accounts for 16-47% of the total elapsed time of running iterative algorithms on Spark and that the memory cache is no less advantageous in terms of garbage collection than the disk cache. We expect the results of this paper to serve as a guide for the tuning of garbage collection in the running of iterative algorithms on Spark. | -
dc.language | English | -
dc.publisher | SPRINGER | -
dc.title | Effect of garbage collection in iterative algorithms on Spark: an experimental analysis | -
dc.type | Article | -
dc.identifier.wosid | 000507708500003 | -
dc.identifier.scopusid | 2-s2.0-85077979209 | -
dc.type.rims | ART | -
dc.citation.volume | 76 | -
dc.citation.issue | 9 | -
dc.citation.beginningpage | 7204 | -
dc.citation.endingpage | 7218 | -
dc.citation.publicationname | JOURNAL OF SUPERCOMPUTING | -
dc.identifier.doi | 10.1007/s11227-020-03150-z | -
dc.contributor.localauthor | Lee, Jae-Gil | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Spark | -
dc.subject.keywordAuthor | Garbage collection | -
dc.subject.keywordAuthor | Iterative algorithms | -
dc.subject.keywordAuthor | Distributed processing | -
dc.subject.keywordAuthor | Storage level | -
Appears in Collection
CS-Journal Papers (Journal Papers)