An Energy-Efficient Deep Convolutional Neural Network Inference Processor With Enhanced Output Stationary Dataflow in 65-nm CMOS

Cited 37 times in Web of Science; cited 26 times in Scopus.
DC Field | Value | Language
dc.contributor.author | Sim, Jaehyeong | ko
dc.contributor.author | Lee, Somin | ko
dc.contributor.author | Kim, Lee-Sup | ko
dc.date.accessioned | 2020-01-29T03:20:07Z | -
dc.date.available | 2020-01-29T03:20:07Z | -
dc.date.created | 2020-01-29 | -
dc.date.issued | 2020-01 | -
dc.identifier.citation | IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, v.28, no.1, pp. 87-100 | -
dc.identifier.issn | 1063-8210 | -
dc.identifier.uri | http://hdl.handle.net/10203/271828 | -
dc.description.abstract | We propose a deep convolutional neural network (CNN) inference processor based on a novel enhanced output stationary (EOS) dataflow. Based on the observation that some activations are commonly used in two successive convolutions, the EOS dataflow employs dedicated register files (RFs) to store such reused activation data, eliminating redundant accesses to highly energy-consuming SRAM banks. In addition, processing elements (PEs) are split into multiple small groups such that each group covers a tile of the input activation map, increasing the usability of the activation RFs (ARFs). The processor has two different voltage/frequency domains. The computation domain with 512 PEs operates at near-threshold voltage (NTV) (0.4 V) and 60 MHz to increase energy efficiency, while the rest of the processor, including 848 KB of SRAM, runs at 0.7 V and 120 MHz to increase both on-chip and off-chip memory bandwidth. The measurement results show that our processor achieves energy efficiencies of 831 GOPS/W on AlexNet, 1151 GOPS/W on VGG-16, 1004 GOPS/W on ResNet-18, and 948 GOPS/W on MobileNet. | -
dc.language | English | -
dc.publisher | IEEE, Institute of Electrical and Electronics Engineers | -
dc.title | An Energy-Efficient Deep Convolutional Neural Network Inference Processor With Enhanced Output Stationary Dataflow in 65-nm CMOS | -
dc.type | Article | -
dc.identifier.wosid | 000506608100009 | -
dc.identifier.scopusid | 2-s2.0-85077823130 | -
dc.type.rims | ART | -
dc.citation.volume | 28 | -
dc.citation.issue | 1 | -
dc.citation.beginningpage | 87 | -
dc.citation.endingpage | 100 | -
dc.citation.publicationname | IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS | -
dc.identifier.doi | 10.1109/TVLSI.2019.2935251 | -
dc.contributor.localauthor | Kim, Lee-Sup | -
dc.contributor.nonIdAuthor | Lee, Somin | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Earth Observing System | -
dc.subject.keywordAuthor | Radio frequency | -
dc.subject.keywordAuthor | Energy consumption | -
dc.subject.keywordAuthor | System-on-chip | -
dc.subject.keywordAuthor | Memory management | -
dc.subject.keywordAuthor | Registers | -
dc.subject.keywordAuthor | Random access memory | -
dc.subject.keywordAuthor | Convolutional neural network (CNN) | -
dc.subject.keywordAuthor | dataflow | -
dc.subject.keywordAuthor | deep learning | -
dc.subject.keywordAuthor | energy-efficient processor | -
dc.subject.keywordAuthor | near-threshold voltage (NTV) | -
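The abstract's central observation is that two successive convolution windows share activations. Keeping that shared data in an activation register file avoids fetching it from SRAM again. The sketch below is a simplified, hypothetical model of that effect and is not the authors' EOS implementation. It assumes one kernel-high strip of input, a kernel x kernel window sliding horizontally with a given stride, and an ARF large enough to hold the overlapping columns. The function name and parameters are illustrative only. The model does not capture PE-group tiling, vertical reuse, or weight traffic.

```python
def activation_sram_reads(in_width: int, kernel: int, stride: int, use_arf: bool) -> int:
    """Count SRAM activation reads for one row of output positions.

    Hypothetical model (not the paper's exact dataflow): a kernel x kernel
    window slides horizontally across a kernel-row strip of the input.
    - Without an ARF, every window fetches all kernel*kernel activations.
    - With an ARF that keeps the columns shared by two successive windows,
      only the kernel*stride newly exposed activations are fetched after
      the first window.
    """
    out_width = (in_width - kernel) // stride + 1
    if out_width <= 0:
        return 0
    full_window = kernel * kernel
    # No overlap between successive windows when stride >= kernel,
    # so an ARF cannot save any reads in this model.
    if not use_arf or stride >= kernel:
        return out_width * full_window
    new_per_step = kernel * stride
    return full_window + (out_width - 1) * new_per_step


if __name__ == "__main__":
    for arf in (False, True):
        reads = activation_sram_reads(in_width=16, kernel=3, stride=1, use_arf=arf)
        print(f"ARF={arf}: {reads} SRAM activation reads")
```

Consider a 3x3 kernel with stride 1 across a 16-wide strip. The baseline performs 14 x 9 = 126 reads. With the register file, each activation in the 3 x 16 strip is read once, for 48 reads in total. This is the kind of redundant SRAM traffic the abstract says the EOS dataflow removes.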
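The abstract reports efficiency in GOPS/W. Converting to energy per operation often makes the numbers easier to compare. The helper below is an illustrative sketch with a hypothetical name, `pj_per_op`. It uses 1 GOPS/W = 10^9 operations per joule, so the energy per operation is 1000 / (GOPS/W) picojoules. The abstract does not state what counts as one "operation" (for example, whether a multiply-accumulate counts as one or two). The result therefore inherits the paper's own convention.

```python
def pj_per_op(gops_per_watt: float) -> float:
    """Convert an energy-efficiency figure in GOPS/W to picojoules per operation.

    1 GOPS/W = 1e9 operations per joule, so one operation costs
    1 / (gops_per_watt * 1e9) J = 1e3 / gops_per_watt pJ.
    """
    if gops_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    return 1e3 / gops_per_watt


# Efficiency figures as reported in the abstract.
REPORTED_GOPS_PER_W = {
    "AlexNet": 831,
    "VGG-16": 1151,
    "ResNet-18": 1004,
    "MobileNet": 948,
}

if __name__ == "__main__":
    for net, eff in REPORTED_GOPS_PER_W.items():
        print(f"{net}: {pj_per_op(eff):.2f} pJ/op")
```

Applied to the reported figures, this gives roughly 1.20 pJ/op for AlexNet, 0.87 pJ/op for VGG-16, 1.00 pJ/op for ResNet-18, and 1.05 pJ/op for MobileNet.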
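The abstract runs the 512-PE computation domain at 0.4 V rather than the 0.7 V used elsewhere "to increase energy efficiency." A common first-order way to see why uses the textbook relation E_dyn = C·V² for dynamic switching energy per operation. The sketch below assumes equal switched capacitance C in both cases and ignores leakage. That is a significant simplification near threshold: the slower 60 MHz clock lengthens each operation, which raises the leakage energy per operation. The function name is hypothetical, and the result is a rough bound, not a figure from the paper.

```python
def relative_dynamic_energy(v_low: float, v_high: float) -> float:
    """First-order ratio of dynamic switching energy per operation at v_low vs. v_high.

    Assumes E_dyn = C * V**2 with the same switched capacitance C at both
    voltages. Leakage energy is ignored, although it becomes relatively
    more important near threshold.
    """
    if v_low <= 0 or v_high <= 0:
        raise ValueError("voltages must be positive")
    return (v_low / v_high) ** 2


if __name__ == "__main__":
    ratio = relative_dynamic_energy(0.4, 0.7)
    print(f"Dynamic energy at 0.4 V is about {ratio:.2f}x that at 0.7 V")
```

Under these assumptions, the 0.4 V domain spends about 0.33x the dynamic energy per operation of a 0.7 V design. The real saving is smaller once leakage is included.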
Appears in Collection
EE-Journal Papers (저널논문, "journal papers")
Files in This Item
There are no files associated with this item.