7.7 LNPU: A 25.3TFLOPS/W Sparse Deep-Neural-Network Learning Processor with Fine-Grained Mixed Precision of FP8-FP16

DC Field | Value | Language
dc.contributor.author | Lee, Jinsu | ko
dc.contributor.author | Lee, Juhyoung | ko
dc.contributor.author | Han, Donghyeon | ko
dc.contributor.author | Lee, Jinmook | ko
dc.contributor.author | Park, Gwangtae | ko
dc.contributor.author | Yoo, Hoi-Jun | ko
dc.date.accessioned | 2019-11-28T03:20:44Z | -
dc.date.available | 2019-11-28T03:20:44Z | -
dc.date.created | 2019-11-27 | -
dc.date.issued | 2019-02 | -
dc.identifier.citation | 2019 IEEE International Solid-State Circuits Conference, ISSCC 2019, pp.142 - 144 | -
dc.identifier.uri | http://hdl.handle.net/10203/268663 | -
dc.description.abstract | Recently, deep neural network (DNN) hardware accelerators have been reported for energy-efficient deep learning (DL) acceleration [1-6]. Most prior DNN inference accelerators are trained in the cloud using public datasets; parameters are then downloaded to implement AI [1-5]. However, local DNN learning with domain-specific and private data is required to meet various user preferences on edge or mobile devices. Since edge and mobile devices have only limited computation capability and run on battery power, an energy-efficient DNN learning processor is necessary. Only [6] supported on-chip DNN learning, but it was not energy-efficient, as it did not utilize sparsity, which represents 37%-61% of the inputs for various CNNs, such as VGG16, AlexNet and ResNet-18, as shown in Fig. 7.7.1. Although [3-5] utilized the sparsity, they only considered the inference phase with inter-channel accumulation in Fig. 7.7.1, and did not support intra-channel accumulation for the weight-gradient generation (WG) step of the learning phase. Also, [6] adopted FP16, but it was not energy-optimal because FP8 is enough for many input operands, with 4× less energy than FP16. | -
dc.language | English | -
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | -
dc.title | 7.7 LNPU: A 25.3TFLOPS/W Sparse Deep-Neural-Network Learning Processor with Fine-Grained Mixed Precision of FP8-FP16 | -
dc.type | Conference | -
dc.identifier.wosid | 000463153600043 | -
dc.identifier.scopusid | 2-s2.0-85063504226 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 142 | -
dc.citation.endingpage | 144 | -
dc.citation.publicationname | 2019 IEEE International Solid-State Circuits Conference, ISSCC 2019 | -
dc.identifier.conferencecountry | US | -
dc.identifier.conferencelocation | San Francisco, CA | -
dc.identifier.doi | 10.1109/ISSCC.2019.8662302 | -
dc.contributor.localauthor | Yoo, Hoi-Jun | -
dc.contributor.nonIdAuthor | Lee, Jinsu | -
dc.contributor.nonIdAuthor | Lee, Juhyoung | -
dc.contributor.nonIdAuthor | Han, Donghyeon | -
dc.contributor.nonIdAuthor | Lee, Jinmook | -
dc.contributor.nonIdAuthor | Park, Gwangtae | -
Appears in Collection
EE-Conference Papers(학술회의논문)