Energy-efficient deep-neural-network training processor with fine-grained mixed precision

dc.contributor.advisor: Yoo, Hoi-Jun
dc.contributor.advisor: 유회준
dc.contributor.author: Lee, Jinsu
dc.date.accessioned: 2021-05-12T19:45:42Z
dc.date.available: 2021-05-12T19:45:42Z
dc.date.issued: 2020
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=924545&flag=dissertation (en_US)
dc.identifier.uri: http://hdl.handle.net/10203/284457
dc.description: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2020.8, [vii, 113 p.]
dc.description.abstract: Recently, several hardware accelerators have been reported for deep-neural-network (DNN) operation; however, they focus only on inference rather than on DNN training, which is a crucial ingredient for user adaptation at the edge device as well as for transfer learning with domain-specific data. DNN training requires much heavier floating-point (FP) computation and memory access than inference, so dedicated DNN training hardware is essential. In this dissertation, we present a deep-learning neural processing unit (LNPU) that supports CNN and FC-layer training as well as inference, with the following key features. First, we propose a fine-grained mixed-precision (FGMP) scheme. FGMP divides data into an FP8 group and an FP16 group at the data-element level, and it dynamically adjusts the ratio between FP8 and FP16 to reduce external memory access while avoiding accuracy loss. With FGMP, external memory access is reduced by 38.9% for ResNet-18 training. Second, we design a hardware architecture to support FGMP: for high energy efficiency, we propose a DL core architecture with configurable PEs and data-paths for DNN training with FGMP. As a result, the energy efficiency of the LNPU is improved by $2.08\times$ for ResNet-18 training. Lastly, we propose a fully reconfigurable hardware architecture for the various kinds of operations in DNN training/inference with zero-skipping. With this architecture, the proposed LNPU supports all steps of DNN training while skipping the zeros that arise from FGMP, ReLU, and so on. As a result, its energy efficiency is $4.4\times$ higher than that of the NVIDIA V100 GPU, and its normalized peak performance is $2.4\times$ higher than that of the previous DNN training processor.
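
As a rough illustration of the FGMP scheme summarized in the abstract, the Python sketch below partitions a tensor's elements into an FP8 group and an FP16 group and estimates the memory-traffic saving. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the magnitude-quantile grouping rule and the fp8_fraction knob are hypothetical, and NumPy's float16 stands in for FP8, since NumPy has no 8-bit float type.

import numpy as np

def fgmp_partition(x: np.ndarray, fp8_fraction: float = 0.75):
    """Split the elements of x into an FP8 group and an FP16 group.

    Hypothetical grouping rule: the fp8_fraction of elements with the
    smallest magnitudes are assumed to tolerate FP8 storage, while the
    large-magnitude outliers stay in FP16. float16 stands in for FP8,
    since NumPy has no 8-bit float type.
    """
    flat = x.ravel()
    threshold = np.quantile(np.abs(flat), fp8_fraction)
    fp8_mask = np.abs(flat) <= threshold
    fp8_group = flat[fp8_mask].astype(np.float16)    # would be FP8 on-chip
    fp16_group = flat[~fp8_mask].astype(np.float16)  # kept at FP16
    return fp8_group, fp16_group, fp8_mask

def memory_bytes(n_fp8: int, n_fp16: int) -> int:
    """External-memory footprint: 1 byte per FP8 element, 2 per FP16."""
    return n_fp8 * 1 + n_fp16 * 2

# Usage: compare FGMP traffic against an all-FP16 baseline.
x = np.random.randn(1 << 16).astype(np.float32)
fp8_g, fp16_g, _ = fgmp_partition(x, fp8_fraction=0.75)
baseline = memory_bytes(0, x.size)
fgmp = memory_bytes(fp8_g.size, fp16_g.size)
print(f"memory access reduced by {100 * (1 - fgmp / baseline):.1f}%")

With fp8_fraction = 0.75 this toy model predicts a 37.5% reduction over an all-FP16 baseline. The actual scheme adjusts the FP8/FP16 ratio dynamically so that the reduction (38.9% for ResNet-18) comes without accuracy loss, which this fixed-fraction sketch does not attempt.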
dc.language: eng
dc.publisher: Korea Advanced Institute of Science and Technology (KAIST)
dc.subject: Deep learning; deep neural network; DNN training; digital processor; energy-efficient hardware; artificial intelligence; machine learning
dc.title: Energy-efficient deep-neural-network training processor with fine-grained mixed precision
dc.title.alternative: Energy-efficient mixed-precision DNN training processor capable of high-speed learning
dc.type: Thesis (Ph.D.)
dc.identifier.CNRN: 325007
dc.description.department: Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
dc.contributor.alternativeauthor: 이진수 (Lee, Jinsu)
Appears in Collection
EE-Theses_Ph.D. (Doctoral theses)
Files in This Item
There are no files associated with this item.
