GANPU: An Energy-Efficient Multi-DNN Training Processor for GANs With Speculative Dual-Sparsity Exploitation

Cited 15 times in Web of Science; cited 0 times in Scopus
  • Hit : 639
  • Download : 61
DC Field | Value | Language
dc.contributor.author | Kang, Sanghoon | ko
dc.contributor.author | Han, Donghyeon | ko
dc.contributor.author | Lee, Juhyoung | ko
dc.contributor.author | Im, Dongseok | ko
dc.contributor.author | Kim, Sangyeob | ko
dc.contributor.author | Kim, Soyeon | ko
dc.contributor.author | Ryu, Junha | ko
dc.contributor.author | Yoo, Hoi-Jun | ko
dc.date.accessioned | 2021-09-26T01:30:17Z | -
dc.date.available | 2021-09-26T01:30:17Z | -
dc.date.created | 2021-09-24 | -
dc.date.issued | 2021-09 | -
dc.identifier.citation | IEEE JOURNAL OF SOLID-STATE CIRCUITS, v.56, no.9, pp.2845 - 2857 | -
dc.identifier.issn | 0018-9200 | -
dc.identifier.uri | http://hdl.handle.net/10203/287858 | -
dc.description.abstract | This article presents the generative adversarial network processing unit (GANPU), an energy-efficient multiple deep neural network (DNN) training processor for GANs. It enables on-device training of GANs on performance- and battery-limited mobile devices without sending user-specific data to servers, thereby avoiding privacy concerns. Training GANs requires a massive amount of computation and is therefore difficult to accelerate on a resource-constrained platform. Moreover, the networks and layers in a GAN show dramatically changing operational characteristics, making it difficult to optimize the processor's core and bandwidth allocation. For higher throughput and energy efficiency, this article proposes three key features. First, adaptive spatiotemporal workload multiplexing is proposed to maintain high utilization while accelerating the multiple DNNs of a single GAN model. Second, to take advantage of ReLU sparsity during both inference and training, a dual-sparsity exploitation architecture is proposed to skip redundant computations caused by input- and output-feature zeros. Third, an exponent-only ReLU speculation (EORS) algorithm is proposed, along with its lightweight processing element (PE) architecture, to estimate the locations of output-feature zeros during inference with minimal hardware overhead. Fabricated in a 65-nm process, the GANPU achieved an energy efficiency of 75.68 TFLOPS/W for 16-bit floating-point computation, 4.85x higher than the state of the art. As a result, GANPU enables on-device training of GANs with high energy efficiency. | -
dc.language | English | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | GANPU: An Energy-Efficient Multi-DNN Training Processor for GANs With Speculative Dual-Sparsity Exploitation | -
dc.type | Article | -
dc.identifier.wosid | 000690441300022 | -
dc.identifier.scopusid | 2-s2.0-85104671041 | -
dc.type.rims | ART | -
dc.citation.volume | 56 | -
dc.citation.issue | 9 | -
dc.citation.beginningpage | 2845 | -
dc.citation.endingpage | 2857 | -
dc.citation.publicationname | IEEE JOURNAL OF SOLID-STATE CIRCUITS | -
dc.identifier.doi | 10.1109/JSSC.2021.3066572 | -
dc.embargo.liftdate | 9999-12-31 | -
dc.embargo.terms | 9999-12-31 | -
dc.contributor.localauthor | Yoo, Hoi-Jun | -
dc.contributor.nonIdAuthor | Kim, Soyeon | -
dc.contributor.nonIdAuthor | Ryu, Junha | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Deep neural network (DNN) processor | -
dc.subject.keywordAuthor | generative adversarial network (GAN) | -
dc.subject.keywordAuthor | multiple DNN acceleration | -
dc.subject.keywordAuthor | neural processing unit (NPU) | -
dc.subject.keywordAuthor | on-chip training | -
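
Editorial note on the EORS algorithm named in the abstract above: the following is a minimal NumPy sketch of the exponent-only speculation idea, not the paper's PE-level implementation. It assumes only the standard floating-point decomposition of each operand into sign and binary exponent; the helper names (split_fp, eors_predict_zero) are hypothetical. Each product w_i * a_i is approximated by its sign times 2^(e_w + e_a), i.e., the mantissas are dropped; if the speculative accumulation is non-positive, the ReLU output is predicted to be zero and the exact MAC work for that output can be skipped (the output-zero half of the dual-sparsity scheme; exact input zeros contribute nothing and are skipped naturally).

    import numpy as np

    def split_fp(x):
        # Keep only the sign and the unbiased binary exponent of each element,
        # mimicking an FP16 datapath that discards the mantissa field.
        sign = np.where(x < 0, -1.0, 1.0)
        _, e = np.frexp(x)                         # x = m * 2**e, 0.5 <= |m| < 1
        exp = np.where(x == 0, -np.inf, e - 1.0)   # exact zeros get exponent -inf
        return sign, exp

    def eors_predict_zero(w_row, a):
        # Speculate whether ReLU(w_row . a) == 0 using signs and exponents only:
        # approximate each product w_i * a_i by sign_i * 2**(e_w + e_a).
        sw, ew = split_fp(w_row.astype(np.float32))
        sa, ea = split_fp(a.astype(np.float32))
        approx = sw * sa * np.exp2(ew + ea)        # zero inputs yield exp2(-inf) = 0
        return approx.sum() <= 0.0

    # Usage: skip the exact FP16 dot product when a zero output is predicted.
    rng = np.random.default_rng(0)
    w = rng.standard_normal(64).astype(np.float16)                  # one weight row
    a = np.maximum(rng.standard_normal(64), 0).astype(np.float16)   # ReLU'd inputs
    if eors_predict_zero(w, a):
        y = 0.0                           # predicted output zero: skip the MAC
    else:
        y = max(float(w @ a), 0.0)        # exact dot product followed by ReLU

Because the mantissas are dropped, such speculation can mispredict pre-activations near zero; per the abstract, the article pairs the algorithm with a lightweight PE architecture so that the speculation itself adds minimal hardware overhead.
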
Appears in Collection
EE-Journal Papers (저널논문)