Energy-efficient deep-neural-network training processor with fine-grained mixed precision

Recently, several hardware accelerators have been reported for deep-neural-network (DNN) operation; however, they focus only on inference rather than on DNN training, which is a crucial ingredient for user adaptation at the edge device as well as for transfer learning with domain-specific data. DNN training requires much heavier floating-point (FP) computation and memory access than DNN inference, so dedicated DNN training hardware is essential. In this dissertation, we present a deep-learning neural processing unit (LNPU) supporting CNN and FC-layer training as well as inference, with the following key features. First, we propose a fine-grained mixed-precision (FGMP) scheme. FGMP divides data into an FP8 group and an FP16 group at the data-element level, and dynamically adjusts the ratio between FP8 and FP16 to reduce external memory access while avoiding accuracy loss. With FGMP, external memory access is reduced by 38.9% for ResNet-18 training. Second, we design a hardware architecture to support FGMP: for high energy efficiency, we propose a DL core architecture with a configurable PE and data-path for DNN training with FGMP. As a result, the energy efficiency of the LNPU is improved by 2.08× for ResNet-18 training. Lastly, we propose a fully reconfigurable hardware architecture for the various kinds of operations in DNN training/inference, with zero-skipping. With the help of this fully reconfigurable architecture, the proposed LNPU supports all steps of DNN training while skipping the zeros that arise from FGMP, ReLU, and so on. As a result, its energy efficiency is 4.4× higher than that of the NVIDIA V100 GPU, and its normalized peak performance is 2.4× higher than that of the previous DNN training processor.
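The abstract states that FGMP splits tensor elements between FP8 and FP16 storage and tunes the group ratio to cut external memory traffic. The dissertation's actual grouping criterion is not given here; the sketch below is a minimal illustration only, assuming a hypothetical magnitude threshold (large-magnitude outliers are kept in FP16, the rest in FP8) and modeling memory traffic simply as bits stored per element relative to all-FP16:

```python
import numpy as np

def fgmp_partition(tensor, threshold):
    """Illustrative element-level FP8/FP16 grouping (not the LNPU algorithm).

    Hypothetical criterion: elements whose magnitude exceeds `threshold`
    are assumed to need FP16 dynamic range; all others go to the FP8 group.
    """
    flat = tensor.ravel()
    fp16_mask = np.abs(flat) > threshold
    fp8_group = flat[~fp16_mask]
    fp16_group = flat[fp16_mask]
    ratio_fp8 = fp8_group.size / flat.size
    # Relative external-memory traffic vs. storing everything in FP16:
    # FP8 elements cost 8 bits (0.5x), FP16 elements cost 16 bits (1.0x).
    traffic = ratio_fp8 * 0.5 + (1.0 - ratio_fp8) * 1.0
    return fp8_group, fp16_group, ratio_fp8, traffic

x = np.random.randn(1024).astype(np.float32)
fp8, fp16, r, t = fgmp_partition(x, threshold=2.0)
```

Because most activation and gradient values in trained networks cluster near zero, a high FP8 ratio is typically achievable, which is the intuition behind the reported 38.9% traffic reduction; the real scheme would also need per-group scaling/exponent handling, which this sketch omits.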
Advisors
Yoo, Hoi-Jun
Publisher
KAIST (Korea Advanced Institute of Science and Technology)
Issue Date
2020
Identifier
325007
Language
eng
Description

Doctoral dissertation (Ph.D.) - KAIST: School of Electrical Engineering, 2020.8, [vii, 113 p.]

Keywords

Deep learning; deep-neural-network; DNN training; digital processor; energy-efficient hardware; artificial intelligence; machine learning

URI
http://hdl.handle.net/10203/284457
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=924545&flag=dissertation
Appears in Collection
EE-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
