An Energy-Efficient Deep Convolutional Neural Network Training Accelerator for In-Situ Personalization on Smart Devices

Cited 25 times in Web of Science, 16 times in Scopus
  • Hits: 318
  • Downloads: 0
DC Field | Value | Language
dc.contributor.author | Choi, Seungkyu | ko
dc.contributor.author | Sim, Jaehyeong | ko
dc.contributor.author | Kang, Myeonggu | ko
dc.contributor.author | Choi, Yeongjae | ko
dc.contributor.author | Kim, Hyeonuk | ko
dc.contributor.author | Kim, Lee-Sup | ko
dc.date.accessioned | 2020-10-14T02:55:10Z | -
dc.date.available | 2020-10-14T02:55:10Z | -
dc.date.created | 2020-08-12 | -
dc.date.issued | 2020-10 | -
dc.identifier.citation | IEEE JOURNAL OF SOLID-STATE CIRCUITS, v.55, no.10, pp.2691 - 2702 | -
dc.identifier.issn | 0018-9200 | -
dc.identifier.uri | http://hdl.handle.net/10203/276545 | -
dc.description.abstract | A scalable deep-learning accelerator supporting the training process is implemented for device personalization of deep convolutional neural networks (CNNs). It consists of three processor cores operating with distinct energy-efficient dataflows for the different types of computation in CNN training. Unlike previous works, which apply design techniques that exploit the same characteristics as inference, we analyze the major issues that arise from training in a resource-constrained system and resolve the bottlenecks. A masking scheme in the propagation core reduces the massive amount of intermediate activation data storage, eliminating the frequent off-chip memory accesses otherwise needed to hold the generated activation data until the backward path. A disparate dataflow architecture is implemented for the weight-gradient computation to enhance PE utilization while maximally reusing the input data. Furthermore, the modified weight update system enables an 8-bit fixed-point computing datapath. The processor is implemented in 65-nm CMOS technology and occupies 10.24 mm² of core area. It operates at supply voltages from 0.63 to 1.0 V, and the computing engine runs at a near-threshold voltage of 0.5 V. The chip consumes 40.7 mW at 50 MHz at its highest efficiency point and achieves 47.4 µJ/epoch of training efficiency for the customized CNN model. | -
dc.language | English | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | An Energy-Efficient Deep Convolutional Neural Network Training Accelerator for In-Situ Personalization on Smart Devices | -
dc.type | Article | -
dc.identifier.wosid | 000572629500007 | -
dc.identifier.scopusid | 2-s2.0-85089364270 | -
dc.type.rims | ART | -
dc.citation.volume | 55 | -
dc.citation.issue | 10 | -
dc.citation.beginningpage | 2691 | -
dc.citation.endingpage | 2702 | -
dc.citation.publicationname | IEEE JOURNAL OF SOLID-STATE CIRCUITS | -
dc.identifier.doi | 10.1109/JSSC.2020.3005786 | -
dc.contributor.localauthor | Kim, Lee-Sup | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordPlus | Convolutional neural network (CNN) | -
dc.subject.keywordPlus | dataflow | -
dc.subject.keywordPlus | deep-learning application-specific integrated circuit (ASIC) | -
dc.subject.keywordPlus | deep learning | -
dc.subject.keywordPlus | neural network training | -
dc.subject.keywordPlus | training processor | -
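
The masking scheme summarized in the abstract rests on a general property of ReLU layers: their backward pass needs only the sign of the forward activation, not its value, so a 1-bit mask can stand in for the full stored activation. Below is a minimal NumPy sketch of that general idea; the function names and storage format are illustrative assumptions, not the chip's actual propagation-core implementation.

```python
import numpy as np

def relu_forward(x):
    """Forward ReLU; retain only a 1-bit mask instead of the activation."""
    mask = x > 0                                       # 1 bit per element
    y = np.where(mask, x, 0.0).astype(x.dtype)
    return y, mask                                     # keep `mask`, not `x`, for backward

def relu_backward(dy, mask):
    """Backward ReLU uses only the sign information captured in `mask`."""
    return np.where(mask, dy, 0.0).astype(dy.dtype)

# Usage: the 1-bit mask replaces, e.g., a 16-bit stored activation,
# cutting the data held for this layer's backward path by roughly 16x.
x = np.random.randn(4, 4).astype(np.float32)
y, mask = relu_forward(x)
dy = np.ones_like(y)
dx = relu_backward(dy, mask)
```

Holding compact masks instead of full activations between the forward and backward paths is consistent with how the abstract describes eliminating frequent off-chip memory accesses for the generated activation data.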
Appears in Collection
EE-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.