FlashMAC: A Time-Frequency Hybrid MAC Architecture With Variable Latency-Aware Scheduling for TinyML Systems

DC Field | Value | Language
dc.contributor.author | Gweon, Surin | ko
dc.contributor.author | Kang, Sanghoon | ko
dc.contributor.author | Kim, Kwantae | ko
dc.contributor.author | Yoo, Hoi-Jun | ko
dc.date.accessioned | 2022-10-11T03:00:27Z | -
dc.date.available | 2022-10-11T03:00:27Z | -
dc.date.created | 2022-08-09 | -
dc.date.issued | 2022-10 | -
dc.identifier.citation | IEEE JOURNAL OF SOLID-STATE CIRCUITS, v.57, no.10, pp.2944 - 2956 | -
dc.identifier.issn | 0018-9200 | -
dc.identifier.uri | http://hdl.handle.net/10203/298917 | -
dc.description.abstract | With the widespread adoption of deep neural networks (DNNs) in diverse applications, tiny platforms such as Internet-of-Things devices are starting to adopt DNNs. Because of their extreme energy and form-factor constraints, conventional digital-only implementations of multiply-and-accumulate (MAC) acceleration face fundamental limitations. As a result, research into mixed-signal computing architectures is growing rapidly. Motivated by the flash ADC, this article proposes the FlashMAC architecture, which natively supports multibit multiplication. In addition, by fusing time- and frequency-domain computing without power-hungry oscillators, it enables low-latency, low-power accumulation. The proposed time-frequency hybrid architecture therefore achieves high energy efficiency while supporting complex DNN models that require higher precision. To improve the robustness of the mixed-signal architecture against process, voltage, and temperature (PVT) variation, a frequency calibration loop is integrated. In addition, motivated by the data-dependent performance of the FlashMAC architecture, variable latency-aware scheduling is proposed. Unlike zero-skipping architectures, FlashMAC does not skip MAC operations; instead, the latency of an operation can be lower when its operands are smaller in magnitude. Addressing the resulting utilization problem through software and hardware co-optimization, a loose synchronization architecture and magnitude-aware weight reordering improve DNN benchmark performance by achieving higher utilization of the parallel FlashMAC array. The proposed features are integrated into a test chip fabricated in 65-nm logic CMOS technology. The silicon chip achieves 56.52-TOPS/W peak energy efficiency and a peak operating frequency of 90 MHz. On the VGG16 benchmark trained on the ImageNet dataset, it achieves 17.04-ms latency at 11.15-TOPS/W energy efficiency. Compared with the previous state of the art, FlashMAC achieves 3.15x higher normalized energy efficiency. | -
dc.language | English | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | FlashMAC: A Time-Frequency Hybrid MAC Architecture With Variable Latency-Aware Scheduling for TinyML Systems | -
dc.type | Article | -
dc.identifier.wosid | 000824709300001 | -
dc.identifier.scopusid | 2-s2.0-85134265844 | -
dc.type.rims | ART | -
dc.citation.volume | 57 | -
dc.citation.issue | 10 | -
dc.citation.beginningpage | 2944 | -
dc.citation.endingpage | 2956 | -
dc.citation.publicationname | IEEE JOURNAL OF SOLID-STATE CIRCUITS | -
dc.identifier.doi | 10.1109/JSSC.2022.3182699 | -
dc.contributor.localauthor | Yoo, Hoi-Jun | -
dc.contributor.nonIdAuthor | Gweon, Surin | -
dc.contributor.nonIdAuthor | Kim, Kwantae | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Analog computing | -
dc.subject.keywordAuthor | deep learning (DL) | -
dc.subject.keywordAuthor | frequency-domain computing | -
dc.subject.keywordAuthor | mixed-signal multiply-and-accumulate (MAC) | -
dc.subject.keywordAuthor | time-domain computing | -
dc.subject.keywordAuthor | TinyML | -
dc.subject.keywordPlus | ACCELERATOR | -
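The abstract's point about data-dependent latency can be illustrated with a toy model. This is a minimal sketch, not the authors' scheduler or circuit model: it assumes (hypothetically) that one MAC costs 1 + |w| cycles, that a single dot product is spread across parallel lanes in fixed-size steps, and that "loose synchronization" means lanes re-synchronize only when the whole dot product is finished. The function names `mac_cycles`, `schedule`, `lockstep_cycles`, `loose_cycles`, `dot`, and `magnitude_reorder` are invented here for illustration.

```python
import random


def mac_cycles(w: int) -> int:
    """Hypothetical latency model: one cycle of overhead plus one per unit of |w|.

    No operation is skipped: a zero weight still costs one cycle.
    """
    return 1 + abs(w)


def schedule(terms, lanes):
    """Split (w, x) terms into steps of `lanes` terms; term k of a step runs on lane k."""
    return [terms[i:i + lanes] for i in range(0, len(terms), lanes)]


def lockstep_cycles(steps):
    """Tight synchronization: every step waits for its slowest lane."""
    return sum(max(mac_cycles(w) for w, _ in step) for step in steps)


def loose_cycles(steps, lanes):
    """Loose synchronization (assumed): each lane drains its own queue and the
    lanes only re-synchronize when the whole dot product is finished."""
    per_lane = [0] * lanes
    for step in steps:
        for lane, (w, _) in enumerate(step):
            per_lane[lane] += mac_cycles(w)
    return max(per_lane)


def dot(steps):
    """Accumulate all w * x products, independent of how they were scheduled."""
    return sum(w * x for step in steps for w, x in step)


def magnitude_reorder(terms):
    """Sort terms by |w| (largest first) so each step groups similar magnitudes.

    Each activation x travels with its weight w, so the dot product is unchanged.
    """
    return sorted(terms, key=lambda t: abs(t[0]), reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    LANES = 8
    # Hypothetical signed 4-bit weights and unsigned 4-bit activations.
    terms = [(rng.randint(-7, 7), rng.randint(0, 15)) for _ in range(64)]
    original = schedule(terms, LANES)
    reordered = schedule(magnitude_reorder(terms), LANES)
    assert dot(original) == dot(reordered)  # same result, different order
    print("lockstep, original order:", lockstep_cycles(original))
    print("lockstep, reordered:     ", lockstep_cycles(reordered))
    print("loose sync, reordered:   ", loose_cycles(reordered, LANES))
```

Reordering is legal because addition is commutative: each activation moves with its weight, so the accumulated result is identical. In this toy model, grouping similar magnitudes into the same step never increases lockstep cycles, and letting lanes drift apart can only remove waiting time. How closely this tracks the real chip depends on its actual latency model, which the abstract does not specify.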
Appears in Collection
EE-Journal Papers