(A) study on performance improvement of communication-intensive applications in large-scale computational resources

Efforts to solve large-scale problems that require a huge number of CPU cores often face significant performance degradation, mainly due to the communication burden. The present study aims to improve the communication performance of parallel applications by investigating two approaches: latency hiding based on non-blocking collectives, and task mapping based on space-filling curves. The latency-hiding approach enhances parallel performance by leveraging non-blocking collective operations, which hide latency by overlapping computation and communication. We apply this approach to a highly communication-intensive code that performs numerical simulation of turbulent channel flows. Computation-communication overlapping enables 3.55 times faster computing than the non-optimized case on 16K CPU cores of the Nurion supercomputer. While effective, this approach may not be applicable to every application, and it requires significant effort from developers. In contrast, efficient task mapping between processes and compute resources is a useful approach because it can be applied regardless of the algorithm and with minimal code modification. We develop a space-filling-curve-based task-mapping approach that exploits the Hilbert curve to map a multidimensional task space onto a one-dimensional space while preserving locality between processes. It is accompanied by a novel performance analysis employing binary classification, which enables static assessment of applications to predict the potential benefit of the proposed approach before run time. The Hilbert-based approach is evaluated with three different workloads, each using a two- or three-dimensional domain decomposition strategy in Cartesian coordinates. Benchmarks for the three workloads are composed to exploit up to 65K CPU cores of a large-scale cluster system. Results show an overall performance improvement of ~1.3x to ~1.66x, depending on the workload, obtained by reducing communication overhead through the proposed approach. The accuracy of the proposed performance analysis is also evaluated against the benchmark results, using binary classification with neural network models. According to this analysis, the expected value of the performance improvement when using the binary classification is from 4% to 8%.
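As a rough illustration of the Hilbert-curve idea described in the abstract, the following Python sketch maps a two-dimensional Cartesian process grid onto a one-dimensional sequence of slots. It uses the widely known iterative Hilbert index conversion and is not taken from the thesis. The function names (`xy2d`, `d2xy`, `hilbert_placement`), the restriction to a square grid whose side is a power of two, and the row-major rank numbering are all assumptions made for this example.

```python
def _rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the curve keeps its orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n, x, y):
    """Hilbert index of cell (x, y) in an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def d2xy(n, d):
    """Inverse of xy2d: cell (x, y) visited at step d of the curve."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_placement(n):
    """Hypothetical helper: map each row-major rank (y * n + x) of an
    n x n process grid to a 1-D slot index along the Hilbert curve."""
    return {y * n + x: xy2d(n, x, y) for y in range(n) for x in range(n)}


if __name__ == "__main__":
    placement = hilbert_placement(4)
    for rank in sorted(placement, key=placement.get):
        print(f"slot {placement[rank]:2d} <- rank {rank:2d}")
```

Because consecutive points on the curve are always grid neighbors, processes placed in adjacent slots are also adjacent in the task space, which is the locality property the abstract refers to. A three-dimensional decomposition would need a 3-D Hilbert variant, which this sketch does not cover.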
Advisors
Song, Junehwa (송준화)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2022.8, [vi, 83 p.]

Keywords

HPC; MPI; Large-scale computing; Task mapping; Hilbert curve

URI
http://hdl.handle.net/10203/309250
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1007874&flag=dissertation
Appears in Collection
CS-Theses_Ph.D.(박사논문)
