(A) batch orchestration algorithm for straggler mitigation of synchronous SGD in heterogeneous GPU cluster

Training a deep learning model is time consuming, so extensive research has been conducted on accelerating training through distributed processing. Data parallelism is one of the most widely used distributed training schemes, and various algorithms for it have been studied. However, since most of these studies assume a homogeneous computing environment, they do not consider heterogeneous-performance graphics processing unit (GPU) clusters, which arise from the rapid generational change of GPU hardware. In such clusters, workers differ in performance, which leads to differences in per-iteration computation time under synchronous data parallelism, where the global mini-batch is usually divided equally among workers. Because of this difference, fast workers must wait for the slowest worker at each iteration, and this straggler problem slows down training. In this thesis, we propose a batch-orchestrating algorithm (BOA) that reduces training time by improving hardware efficiency in heterogeneous-performance GPU clusters. The proposed algorithm coordinates the local mini-batch sizes of all workers to reduce the time of one training iteration. Additionally, we perform performance tuning by searching for a better set of GPU workers. We confirmed that the proposed algorithm improves performance by 23% over synchronous SGD with one backup worker when training ResNet-194 on 8 GPUs of three different types: GTX 1080, GTX 1060, and Quadro M2000. The proposed BOA resolves the problem caused by the performance differences between GPU workers and accelerates the convergence of training.
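The core idea of coordinating local mini-batch sizes can be illustrated with a minimal sketch: split the global mini-batch in proportion to each worker's measured throughput rather than equally, so that all workers finish an iteration at roughly the same time. This is an illustrative sketch only, not the thesis implementation; the function name, worker ids, and throughput numbers below are assumptions.

    # Minimal sketch of throughput-proportional batch orchestration for
    # synchronous data parallelism. Not the thesis implementation; the
    # worker ids and throughput values below are illustrative assumptions.

    def orchestrate_batches(global_batch, throughputs):
        """Split global_batch among workers in proportion to throughput.

        throughputs: dict mapping worker id -> samples/sec measured in a
        profiling run. Faster workers receive larger local mini-batches,
        which shortens the wait on the slowest worker each iteration.
        """
        total = sum(throughputs.values())
        # Initial proportional allocation, rounded down.
        batches = {w: int(global_batch * t / total)
                   for w, t in throughputs.items()}
        # Hand the rounding remainder to the fastest workers so the
        # local sizes still sum exactly to the global mini-batch size.
        remainder = global_batch - sum(batches.values())
        for w in sorted(throughputs, key=throughputs.get, reverse=True)[:remainder]:
            batches[w] += 1
        return batches

    # Example with three heterogeneous GPU types (throughputs made up):
    throughputs = {"gtx1080": 420.0, "gtx1060": 250.0, "quadro_m2000": 130.0}
    print(orchestrate_batches(256, throughputs))
    # {'gtx1080': 135, 'gtx1060': 80, 'quadro_m2000': 41}

Under this allocation, the per-iteration time is governed by the combined throughput of the cluster rather than by the slowest worker's equal share, which is the hardware-efficiency gain the abstract describes.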
Advisors
Youn, Chan-Hyun (윤찬현)
Description
Korea Advanced Institute of Science and Technology : School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2018
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology : School of Electrical Engineering, 2018.2, [iii, 46 p.]

Keywords

deep learning; distributed training; synchronous SGD; straggler problem; mini-batch; batch orchestration

URI
http://hdl.handle.net/10203/266871
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=734028&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
