DSpace at KOASAS: (A) software framework for estimating training time of trillion-parameter scale distributed machine learning


(A) software framework for estimating training time of trillion-parameter scale distributed machine learning (Korean title: A software framework for predicting the training time of large-scale distributed machine learning)


Bang, Jehyeon

As deep neural network (DNN) models rapidly grow in size to improve performance, the compute resources required for DNN training are increasing exponentially. Such large-scale training is performed on distributed systems using various parallelism techniques, and training performance on these systems varies drastically depending on the DNN model architecture, the network topology, and the combination of parallelism techniques. However, searching for the optimal training configuration is extremely expensive, which makes it difficult to use compute resources effectively in large-scale training. To address this, this thesis proposes a simulation framework that predicts the iteration time of distributed training. The proposed framework predicts the training iteration time for various configurations with a mean absolute error of 12.80%, enabling efficient exploration of the optimal training configuration.
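
The abstract reports prediction accuracy as a mean absolute error expressed as a percentage. As a rough illustration of how such a figure is commonly computed, the sketch below averages the relative error between measured and predicted iteration times. The function name and the sample timings are hypothetical and do not come from the thesis, which may define its metric differently.

```python
def mean_absolute_percentage_error(measured, predicted):
    """Return the mean of |predicted - measured| / measured, as a percentage."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    # Relative error of each prediction with respect to its measured value.
    errors = [abs(p - m) / m for m, p in zip(measured, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical iteration times (seconds) for three training configurations.
measured_s = [10.0, 20.0, 40.0]
predicted_s = [11.0, 18.0, 44.0]

mape = mean_absolute_percentage_error(measured_s, predicted_s)
print(f"{mape:.2f}%")  # prints "10.00%"
```

Each made-up prediction here is off by 10% of its measured value, so the average is 10%. A reported value of 12.80% would be read the same way: an average relative deviation across the evaluated configurations.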

Advisors: Rhu, Minsoo (유민수)

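Description: Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering

Publisher: Korea Advanced Institute of Science and Technology (KAIST)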

Issue Date: 2023

Identifier: 325007

Language: eng

Description: Master's thesis - Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering, 2023.2, [iv, 24 p.]

Keywords: Distributed training; Deep neural networks; Simulation; Parallelization; GPU

URI: http://hdl.handle.net/10203/309962

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1033105&flag=dissertation

Appears in Collection: EE-Theses_Master(석사논문)

Files in This Item: There are no files associated with this item.


