Energy-efficient MAC architecture design and runtime convergence detecting method for on-device DNN training

Artificial intelligence (AI) is now accessible to a wide range of real-life applications through vast networks of IoT devices. As both algorithmic efficiency and hardware performance have steadily improved, deep learning operations on devices have become highly efficient. Consequently, research on supporting training operations in deep learning accelerators, so that applications can adapt to the user's environment, has come into the spotlight. In this thesis, we propose design techniques to support on-device training in deep learning accelerators. It should first be noted that deep learning accelerators for devices perform inference as their main task; we therefore focus on developing an accelerator that supports training operations with minimal overhead and without degrading inference performance. Based on this objective, we propose a microarchitecture design and an algorithm-hardware co-design for the deep learning accelerator that alleviate the computational burden of DNN training. In the first study, we design a low-precision training architecture built on fixed-point multiply-and-accumulate (MAC) units, together with a device-personalization scheme suited to the on-device setting. A fabricated chip with a heterogeneous dataflow architecture is presented as the baseline accelerator for reconfigurable computation of inference and training, and a concrete method for designing a low-bit training architecture is then introduced. By constructing a framework with customized on-device training applications, we evaluate device-level low-bit training with the proposed schemes in detail. In the second study, to overcome the limitations of fixed-point-based training, we propose a novel architecture equipped with reconfigurable MAC units that accept heterogeneous data types. By replacing high-cost floating-point accumulation with separate brick-level fixed-point accumulations, the MAC units achieve area- and power-efficient operation while sustaining maximal throughput in the inference phase. In the last study, rather than designing another dedicated accelerator, we propose an algorithm-hardware co-optimization that can be easily deployed on off-the-shelf accelerators for efficient acceleration. Evaluated in a practical on-device setting of transfer-learning-based task adaptation, the proposed runtime convergence detection determines a near-optimal training time for each incoming task. We conclude the thesis with this algorithm-level optimization under realistic on-device scenarios.
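As a rough illustration of the first study's low-precision fixed-point MAC datapath, the Python sketch below emulates an INT8 multiply with a wide integer accumulator and a single output rescale. This is a generic low-precision MAC model under assumed parameters, not the thesis's actual architecture; the function names and the symmetric per-tensor quantization scheme are choices made for the example.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization of a float vector to int8 (illustrative)."""
    m = float(np.max(np.abs(x)))
    scale = m / 127.0 if m > 0 else 1.0        # avoid divide-by-zero on all-zero input
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def fixed_point_mac(a, w):
    """Dot product using int8 multiplies and a wide integer accumulator.

    Mimics a low-precision MAC array: floating point appears only in
    the single rescale applied outside the accumulation loop.
    """
    qa, sa = quantize_int8(a)
    qw, sw = quantize_int8(w)
    acc = int(np.sum(qa.astype(np.int32) * qw.astype(np.int32)))  # integer MACs
    return acc * sa * sw                        # one rescale at the output

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, w = rng.standard_normal(256), rng.standard_normal(256)
    print(fixed_point_mac(a, w), float(a @ w))  # close, not bit-identical
```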
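The brick-level fixed-point accumulation of the second study can be sketched numerically as well. The snippet below is a minimal software model, not the fabricated datapath: it assumes incoming floating-point products are routed by binary exponent into "bricks", each accumulated in a plain integer register, with a single floating-point recombination at the end. The constants BRICK_SPAN and FRAC_BITS and the function name brick_accumulate are hypothetical.

```python
import math
from collections import defaultdict

BRICK_SPAN = 8    # exponent range covered by one brick (assumed)
FRAC_BITS = 16    # fractional bits kept inside each brick (assumed)

def brick_accumulate(products):
    """Accumulate floating-point products with per-brick integer adders.

    Each product is routed to a brick by its binary exponent and
    quantized to a fixed-point integer at that brick's scale, so the
    running additions are integer adds rather than FP adds.
    """
    bricks = defaultdict(int)              # brick index -> integer accumulator
    for x in products:
        if x == 0.0:
            continue
        _, e = math.frexp(x)               # x = m * 2**e with 0.5 <= |m| < 1
        b = e // BRICK_SPAN                # exponent brick this product falls in
        scale = 2.0 ** (b * BRICK_SPAN - FRAC_BITS)
        bricks[b] += round(x / scale)      # fixed-point quantize + integer add
    # One final recombination of all bricks (cheap relative to the adds).
    return sum(q * 2.0 ** (b * BRICK_SPAN - FRAC_BITS) for b, q in bricks.items())

if __name__ == "__main__":
    prods = [0.3, -1.25e-3, 7.5, 2.0e-5, -0.125]
    print(brick_accumulate(prods), sum(prods))  # nearly identical results
```

In real hardware the bricks would be fixed-width registers with carry handling between adjacent bricks; the dictionary here only stands in for that register file.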
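Finally, the runtime convergence detection of the third study amounts to bounding the training time of each incoming task by stopping fine-tuning once the loss stops improving. The thesis's exact criterion is not reproduced here; the sketch below shows one generic variant (an exponential moving average of the loss with a relative-improvement threshold), and all names, defaults, and thresholds are illustrative assumptions.

```python
class ConvergenceDetector:
    """Flags convergence when the smoothed training loss stops improving.

    Generic early-stopping heuristic (illustrative, not the thesis's
    criterion): track an exponential moving average (EMA) of the loss
    and declare convergence after `patience` consecutive steps whose
    relative improvement falls below `min_rel_improve`.
    """

    def __init__(self, alpha=0.5, min_rel_improve=1e-2, patience=3):
        self.alpha = alpha                  # EMA smoothing factor
        self.min_rel_improve = min_rel_improve
        self.patience = patience
        self.ema = None
        self.stall = 0                      # consecutive non-improving steps

    def update(self, loss):
        """Feed one training-step loss; return True once converged."""
        if self.ema is None:
            self.ema = loss
            return False
        prev = self.ema
        self.ema = (1 - self.alpha) * self.ema + self.alpha * loss
        rel_improve = (prev - self.ema) / max(abs(prev), 1e-12)
        self.stall = 0 if rel_improve > self.min_rel_improve else self.stall + 1
        return self.stall >= self.patience

# Usage: stop the per-task fine-tuning loop once the detector fires.
detector = ConvergenceDetector()
for step, loss in enumerate([1.0, 0.6, 0.45, 0.40, 0.39, 0.388, 0.387,
                             0.3869, 0.3868, 0.3868, 0.3868, 0.3868]):
    if detector.update(loss):
        print(f"converged at step {step}")
        break
```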
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : School of Electrical Engineering, 2022.2, [vi, 79 p.]

Keywords

Deep learning; On-device training; Energy-efficient processor; Reconfigurable architecture; Algorithm-hardware co-design

URI
http://hdl.handle.net/10203/309074
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1000294&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
