DC Field | Value | Language |
---|---|---|
dc.contributor.author | Gweon, Surin | ko |
dc.contributor.author | Kang, Sanghoon | ko |
dc.contributor.author | Kim, Kwantae | ko |
dc.contributor.author | Yoo, Hoi-Jun | ko |
dc.date.accessioned | 2022-10-11T03:00:27Z | - |
dc.date.available | 2022-10-11T03:00:27Z | - |
dc.date.created | 2022-08-09 | - |
dc.date.issued | 2022-10 | - |
dc.identifier.citation | IEEE JOURNAL OF SOLID-STATE CIRCUITS, v.57, no.10, pp.2944 - 2956 | - |
dc.identifier.issn | 0018-9200 | - |
dc.identifier.uri | http://hdl.handle.net/10203/298917 | - |
dc.description.abstract | With the widespread of deep neural networks (DNNs) in diverse applications, tiny platforms such as Internet-of-Things devices are starting to adopt DNNs. Due to their extreme energy and form factor constraints, conventional digital-only implementations of multiply-and-accumulate (MAC) acceleration faced fundamental limitations. To that end, the investigation into mixed-signal computing architectures is growing rapidly. Motivated by the flash ADC, this article proposes FlashMAC architecture that can natively support multibit multiplication. In addition, through fusing time- and frequency-domain computing methods without power-hungry oscillators, it enables low latency accumulation with low power consumption. As a result, the proposed time-frequency hybrid architecture achieves high energy efficiency with the support for complex DNN models requiring higher precision. To enhance the robustness of PVT variation of the mixed-signal architecture, a frequency calibration loop is integrated. In addition, motivated by the data-dependent performance of the FlashMAC architecture, variable latency-aware scheduling is proposed. The FlashMAC does not skip MAC operations as zero-skipping architectures do, but the latency of the operation can be lower when operands are smaller in magnitude. Tackling the issue through software and hardware co-optimization, loose synchronization architecture and magnitude-aware weight reordering increase the DNN benchmark performance by achieving higher utilization of the parallel FlashMAC array. The proposed features are integrated into a test chip which is fabricated in 65-nm logic CMOS technology. The silicon chip achieves 56.52 TOPS/W peak energy efficiency and a peak operating frequency of 90 MHz. Tested with the VGG16 benchmark trained on the Imagenet dataset, it achieved 17.04-ms latency while showing 11.15 TOPS/W energy efficiency. As a result, compared to the previous state-of-the-art, the proposed FlashMAC achieved 3.15x higher normalized energy efficiency. | - |
dc.language | English | - |
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | - |
dc.title | FlashMAC: A Time-Frequency Hybrid MAC Architecture With Variable Latency-Aware Scheduling for TinyML Systems | - |
dc.type | Article | - |
dc.identifier.wosid | 000824709300001 | - |
dc.identifier.scopusid | 2-s2.0-85134265844 | - |
dc.type.rims | ART | - |
dc.citation.volume | 57 | - |
dc.citation.issue | 10 | - |
dc.citation.beginningpage | 2944 | - |
dc.citation.endingpage | 2956 | - |
dc.citation.publicationname | IEEE JOURNAL OF SOLID-STATE CIRCUITS | - |
dc.identifier.doi | 10.1109/JSSC.2022.3182699 | - |
dc.contributor.localauthor | Yoo, Hoi-Jun | - |
dc.contributor.nonIdAuthor | Gweon, Surin | - |
dc.contributor.nonIdAuthor | Kim, Kwantae | - |
dc.description.isOpenAccess | N | - |
dc.type.journalArticle | Article | - |
dc.subject.keywordAuthor | Analog computing | - |
dc.subject.keywordAuthor | deep learning (DL) | - |
dc.subject.keywordAuthor | frequency-domain computing | - |
dc.subject.keywordAuthor | mixed-signal multiply-and-accumulate (MAC) | - |
dc.subject.keywordAuthor | time-domain computing | - |
dc.subject.keywordAuthor | TinyML | - |
dc.subject.keywordPlus | ACCELERATOR | - |