Energy-efficient deep-neural-network training processor with fine-grained mixed precision

dc.contributor.advisor: Yoo, Hoi-Jun
dc.contributor.advisor: 유회준
dc.contributor.author: Lee, Jinsu
dc.date.accessioned: 2021-05-12T19:45:42Z
dc.date.available: 2021-05-12T19:45:42Z
dc.date.issued: 2020
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=924545&flag=dissertation (en_US)
dc.identifier.uri: http://hdl.handle.net/10203/284457
dc.description: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2020.8, [vii, 113 p.]
dc.description.abstract: Recently, several hardware accelerators have been reported for deep-neural-network (DNN) operation; however, they focus only on inference rather than on DNN training, which is a crucial ingredient for user adaptation at the edge device as well as for transfer learning with domain-specific data. DNN training requires much heavier floating-point (FP) computation and memory access than inference, so dedicated DNN training hardware is essential. In this dissertation, we present a deep-learning neural processing unit (LNPU) that supports CNN and FC-layer training as well as inference, with the following key features. First, we propose a fine-grained mixed-precision (FGMP) scheme. FGMP divides data into an FP8 group and an FP16 group at the data-element level, and it dynamically adjusts the ratio between FP8 and FP16 to reduce external memory access while avoiding accuracy loss. With FGMP, external memory access is reduced by 38.9% for ResNet-18 training. Second, we design a hardware architecture to support FGMP: for high energy efficiency, we propose a DL core architecture with configurable PEs and data-paths for DNN training with FGMP. As a result, the energy efficiency of the LNPU is improved by $2.08\times$ for ResNet-18 training. Lastly, we propose a fully reconfigurable hardware architecture for the various kinds of operations in DNN training/inference with zero-skipping. With this architecture, the proposed LNPU supports all steps of DNN training while skipping the zeros that arise from FGMP, ReLU, and so on. As a result, its energy efficiency is $4.4\times$ higher than that of the NVIDIA V100 GPU, and its normalized peak performance is $2.4\times$ higher than that of the previous DNN training processor.
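
As a rough illustration of the FGMP scheme summarized in the abstract, the Python sketch below partitions a tensor's elements into an FP8 group and an FP16 group and estimates the memory-traffic saving. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the magnitude-quantile grouping rule and the fp8_fraction knob are hypothetical, and NumPy's float16 stands in for FP8, since NumPy has no 8-bit float type.

import numpy as np

def fgmp_partition(x: np.ndarray, fp8_fraction: float = 0.75):
    """Split the elements of x into an FP8 group and an FP16 group.

    Hypothetical grouping rule: the fp8_fraction of elements with the
    smallest magnitudes are assumed to tolerate FP8 storage, while the
    large-magnitude outliers stay in FP16. float16 stands in for FP8,
    since NumPy has no 8-bit float type.
    """
    flat = x.ravel()
    threshold = np.quantile(np.abs(flat), fp8_fraction)
    fp8_mask = np.abs(flat) <= threshold
    fp8_group = flat[fp8_mask].astype(np.float16)    # would be FP8 on-chip
    fp16_group = flat[~fp8_mask].astype(np.float16)  # kept at FP16
    return fp8_group, fp16_group, fp8_mask

def memory_bytes(n_fp8: int, n_fp16: int) -> int:
    """External-memory footprint: 1 byte per FP8 element, 2 per FP16."""
    return n_fp8 * 1 + n_fp16 * 2

# Usage: compare FGMP traffic against an all-FP16 baseline.
x = np.random.randn(1 << 16).astype(np.float32)
fp8_g, fp16_g, _ = fgmp_partition(x, fp8_fraction=0.75)
baseline = memory_bytes(0, x.size)
fgmp = memory_bytes(fp8_g.size, fp16_g.size)
print(f"memory access reduced by {100 * (1 - fgmp / baseline):.1f}%")

With fp8_fraction = 0.75 this toy model predicts a 37.5% reduction over an all-FP16 baseline. The actual scheme adjusts the FP8/FP16 ratio dynamically so that the reduction (38.9% for ResNet-18) comes without accuracy loss, which this fixed-fraction sketch does not attempt.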
dc.language: eng
dc.publisher: Korea Advanced Institute of Science and Technology (KAIST)
dc.subject: Deep learning; deep neural network; DNN training; digital processor; energy-efficient hardware; artificial intelligence; machine learning
dc.title: Energy-efficient deep-neural-network training processor with fine-grained mixed precision
dc.title.alternative: Energy-efficient mixed-precision DNN training processor capable of high-speed learning
dc.type: Thesis (Ph.D.)
dc.identifier.CNRN: 325007
dc.description.department: Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
dc.contributor.alternativeauthor: 이진수 (Lee, Jinsu)
Appears in Collection
EE-Theses_Ph.D. (Doctoral theses)
Files in This Item
There are no files associated with this item.
