(An) energy-efficient deep neural network acceleration by exploiting cross-layer weight scaling and bit-level data sharing

Various accelerators have been proposed to efficiently execute deep neural networks on mobile devices with severe energy and area limitations. However, these accelerators mostly use fixed-point representations for energy-efficient inference and are optimized for higher-level vision tasks such as classification and recognition. This dissertation proposes methods that overcome these limitations so that various tasks can be performed energy-efficiently with minimal overhead on existing accelerators. To this end, two methods are used to minimize access to external memory, the primary source of the accelerator's energy consumption, when performing diverse deep neural network tasks. A brief introduction to both methods follows.

1. Reformation of deep neural networks considering quantization error: Most deep neural networks trained and distributed by suppliers use floating-point representation for high prediction accuracy, whereas embedded accelerators use fixed-point representation for energy-efficient operation. Therefore, a quantization process that converts the network to a low-precision fixed-point representation while maintaining high accuracy is indispensable. This dissertation proposes network reformation methods that make deep neural networks robust to quantization error by predicting and analyzing the errors introduced during quantization.

2. Data compression using spatial correlation: To reduce the amount of data exchanged with external memory, most embedded accelerators use compression methods that exploit activation sparsity while supporting parallel operation. However, depending on the deep neural network and the type of data, there may not be enough sparsity for the existing compression methods to be effective. This dissertation proposes compression methods that exploit spatial correlation to effectively reduce communication even for deep neural networks and data with low sparsity.

In summary, this dissertation proposes energy-efficient execution strategies for performing various tasks on embedded accelerators. The overall goal is to improve the energy efficiency of deep neural network accelerators, and the main objective is to effectively reduce the amount of communication with external memory during execution at negligible overhead. To this end, the data characteristics of deep neural networks and the limitations of accelerators are analyzed, and the proposed methods address the identified problems.
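As an illustration of the quantization step described above, the following is a minimal sketch of symmetric fixed-point quantization in NumPy. The function names and the 8-bit setting are assumptions for illustration only; this is not the reformation method proposed in the dissertation, merely a way to see how quantization error arises.

```python
import numpy as np

def quantize_to_fixed_point(weights: np.ndarray, total_bits: int = 8):
    """Symmetric fixed-point quantization of a weight tensor (illustrative sketch).

    Returns signed integers of `total_bits` precision plus the scale
    needed to map them back to approximate real values.
    """
    max_abs = float(np.max(np.abs(weights)))  # dynamic range to cover
    q_max = 2 ** (total_bits - 1) - 1         # one bit reserved for the sign
    scale = max_abs / q_max if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -q_max - 1, q_max).astype(np.int32)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate floating-point weights; the gap from the
    originals is the quantization error that a reformation step would target."""
    return q.astype(np.float32) * scale

# Example: measure the error introduced by 8-bit quantization.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_to_fixed_point(w, total_bits=8)
print("max quantization error:", np.abs(w - dequantize(q, s)).max())
```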
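Similarly, the idea of exploiting spatial correlation rather than sparsity can be illustrated with a simple delta-encoding sketch: neighboring activations in a smooth feature map differ by small amounts, so the differences need fewer bits than the raw values. The code below is an assumption-based illustration of that principle, not the compression scheme proposed in the dissertation.

```python
import numpy as np

def delta_encode(feature_map: np.ndarray) -> np.ndarray:
    """Store each row as its first element followed by differences
    between horizontally adjacent elements."""
    deltas = feature_map.copy()
    deltas[:, 1:] = feature_map[:, 1:] - feature_map[:, :-1]
    return deltas

def delta_decode(deltas: np.ndarray) -> np.ndarray:
    """Invert delta_encode with a cumulative sum along each row."""
    return np.cumsum(deltas, axis=1)

# Example: a smooth (spatially correlated) map yields small deltas,
# which can be packed into fewer bits than the raw values.
x = np.cumsum(np.random.randint(0, 3, size=(4, 16)), axis=1)
d = delta_encode(x)
assert np.array_equal(delta_decode(d), x)
print("raw value range:  ", int(x.min()), "to", int(x.max()))
print("delta value range:", int(d.min()), "to", int(d.max()))
```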
Advisors
Kim, Lee-Sup (김이섭)
Description
Korea Advanced Institute of Science and Technology: School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2021
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2021.8, [iv, 59 p.]

Keywords

Deep neural network; Accelerator; Quantization; Neural network reformation; Data compression

URI
http://hdl.handle.net/10203/295620
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=962476&flag=dissertation
Appears in Collection
EE-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
