A Deep Neural Network Training Architecture with Inference-aware Heterogeneous Data-type

Cited 4 times in Web of Science; cited 0 times in Scopus
DC Field: Value (Language)
dc.contributor.author: Choi, Seungkyu (ko)
dc.contributor.author: Shin, Jaekang (ko)
dc.contributor.author: Kim, Lee-Sup (ko)
dc.date.accessioned: 2022-04-25T10:00:12Z
dc.date.available: 2022-04-25T10:00:12Z
dc.date.created: 2021-06-15
dc.date.issued: 2022-05
dc.identifier.citation: IEEE TRANSACTIONS ON COMPUTERS, v.71, no.5, pp.1216 - 1229
dc.identifier.issn: 0018-9340
dc.identifier.uri: http://hdl.handle.net/10203/295891
dc.description.abstract: As deep learning applications often suffer accuracy degradation from inputs distorted by varying environmental conditions, training with personal data has become essential for edge devices, and trainable deep learning accelerators for on-device training have been actively studied. Nevertheless, previous research does not consider the fundamental datapath for training or the importance of retaining high performance for inference tasks. In this work, we propose NeuroFlix, a deep neural network training accelerator that supports heterogeneous data types, floating-point and fixed-point, for its input operands. Guided by two goals, (1) a separate precision decision for each input operand and (2) maintaining high inference performance, we represent activations and weights in low-bit fixed-point and error gradients in floating-point with up to half precision. A novel MAC architecture computes low- and high-precision modes for the different input combinations. By replacing costly floating-point additions with brick-level separate accumulations, we achieve both an area-efficient architecture and high throughput for low-precision computation. Consequently, NeuroFlix outperforms previous state-of-the-art architectures, proving its efficiency in both training and inference. Compared with an off-the-shelf bfloat16-based accelerator, it achieves 1.2x speedup and 2.0x energy efficiency in training, rising to 3.6x and 4.5x in inference.
dc.language: English
dc.publisher: IEEE COMPUTER SOC
dc.title: A Deep Neural Network Training Architecture with Inference-aware Heterogeneous Data-type
dc.type: Article
dc.identifier.wosid: 000778905700018
dc.identifier.scopusid: 2-s2.0-85105844527
dc.type.rims: ART
dc.citation.volume: 71
dc.citation.issue: 5
dc.citation.beginningpage: 1216
dc.citation.endingpage: 1229
dc.citation.publicationname: IEEE TRANSACTIONS ON COMPUTERS
dc.identifier.doi: 10.1109/TC.2021.3078316
dc.contributor.localauthor: Kim, Lee-Sup
dc.description.isOpenAccess: N
dc.type.journalArticle: Article
dc.subject.keywordAuthor: Training
dc.subject.keywordAuthor: Computer architecture
dc.subject.keywordAuthor: Throughput
dc.subject.keywordAuthor: Quantization (signal)
dc.subject.keywordAuthor: Neural networks
dc.subject.keywordAuthor: Computational modeling
dc.subject.keywordAuthor: Performance evaluation
dc.subject.keywordAuthor: Deep neural network
dc.subject.keywordAuthor: on-device training
dc.subject.keywordAuthor: multiply-and-accumulate unit
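
The abstract above describes a MAC datapath that takes low-bit fixed-point activations and weights alongside floating-point error gradients. The Python/NumPy sketch below is only a rough illustration of that heterogeneous data-type idea, not the paper's NeuroFlix unit or its brick-level accumulation scheme; the function names, 8-bit width, and half-precision gradient type are assumptions made here for illustration.

```python
# Conceptual sketch (assumed, not the NeuroFlix datapath): a MAC mixing
# low-bit fixed-point activations/weights with floating-point terms.
import numpy as np

def quantize_fixed(x, bits=8):
    """Symmetric fixed-point quantization: return integer codes and a scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def heterogeneous_mac(activations, weights, grad_fp16, bits=8):
    """Forward MAC in low-bit fixed-point; gradient term kept in half precision."""
    qa, sa = quantize_fixed(activations, bits)
    qw, sw = quantize_fixed(weights, bits)
    acc_int = np.dot(qa, qw)                 # cheap integer multiply-accumulate
    forward = np.float32(acc_int) * sa * sw  # one rescale back to real values
    # Backward-style term: error gradients stay floating-point (fp16 here).
    backward = np.dot(activations.astype(np.float16), grad_fp16)
    return forward, np.float32(backward)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal(64).astype(np.float32)
    w = rng.standard_normal(64).astype(np.float32)
    g = rng.standard_normal(64).astype(np.float16)
    fwd, bwd = heterogeneous_mac(a, w, g)
    print("forward (int8 MAC):", fwd, "reference:", float(np.dot(a, w)))
    print("backward (fp16 gradient term):", bwd)
```

The split mirrors the precision choice the abstract argues for: inference-side operands (activations, weights) stay in cheap fixed-point arithmetic, while the training-only error gradient keeps a floating-point representation.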
Appears in Collection
EE - Journal Papers
Files in This Item
There are no files associated with this item.