In-network architecture for multi-GPU

Deep learning requires tremendous computing power and memory, and multi-GPU systems have developed accordingly. To perform deep learning training on a multi-GPU system, data parallelism is typically used. Data parallelism relies on All-Reduce, a collective communication operation that averages the gradients scattered across the GPUs. Having to communicate with the other GPUs in every training iteration imposes a significant burden. Recently, deep learning frameworks such as PyTorch and TensorFlow have tried to reduce this communication overhead by overlapping backpropagation with All-Reduce. While this increases throughput, we observed that the backpropagation time itself increases because of the overlap. We present an in-network architecture that mitigates the slow backpropagation caused by this overlapping. We verify the design using MGPUSim, a recent multi-GPU simulator, and show that it alleviates the slow-backpropagation phenomenon and improves performance for limited All-Reduce sizes.
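The gradient-averaging step described in the abstract can be illustrated with a minimal sketch (not taken from the thesis). It assumes a torch.distributed process group (e.g. NCCL) has already been initialized across the GPUs; the helper name average_gradients is hypothetical.

```python
# Minimal sketch of data-parallel gradient averaging with All-Reduce.
# Assumes dist.init_process_group(...) has been called on every worker.
import torch
import torch.distributed as dist

def average_gradients(model: torch.nn.Module) -> None:
    """Sum each parameter's gradient across all GPUs, then divide by the
    number of workers so every replica holds the averaged gradient."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size
```

In practice, frameworks such as PyTorch's DistributedDataParallel do not wait for the full backward pass; they launch All-Reduce on buckets of gradients as soon as those gradients are produced, which is the backpropagation/All-Reduce overlap the abstract refers to.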
Advisors
Kim, John Dongjun (김동준)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2021
Identifier
325007
Language
eng
Description

Master's thesis - KAIST: School of Electrical Engineering, 2021.8, [iii, 24 p.]

Keywords

GPU; Multi-GPU system; distributed system; deep learning; collective communication; switch

URI
http://hdl.handle.net/10203/295933
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=963409&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
