DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Juhyoung | ko |
dc.contributor.author | Kim, Sangyeob | ko |
dc.contributor.author | Kim, Sangjin | ko |
dc.contributor.author | Jo, Wooyoung | ko |
dc.contributor.author | Kim, Ji-Hoon | ko |
dc.contributor.author | Han, Donghyeon | ko |
dc.contributor.author | Yoo, Hoi-Jun | ko |
dc.date.accessioned | 2022-04-13T06:49:42Z | - |
dc.date.available | 2022-04-13T06:49:42Z | - |
dc.date.created | 2022-02-06 | - |
dc.date.issued | 2022-04 | - |
dc.identifier.citation | IEEE JOURNAL OF SOLID-STATE CIRCUITS, v.57, no.4, pp.999 - 1012 | - |
dc.identifier.issn | 0018-9200 | - |
dc.identifier.uri | http://hdl.handle.net/10203/292574 | - |
dc.description.abstract | In this article, we present an energy-efficient deep reinforcement learning (DRL) processor, OmniDRL, for DRL training on edge devices. The need for DRL training is growing because DRL can adapt to each individual user. However, the massive amount of external and internal memory access limits the implementation of DRL training on resource-constrained platforms. OmniDRL proposes four key features that reduce external memory access by compressing as much data as possible and reduce internal memory access by directly processing compressed data. Group-sparse training (GST) achieves a high weight compression ratio (CR) for every DRL iteration through selective use of weight grouping and weight pruning. A group-sparse training core fully exploits the compressed weights from GST by skipping redundant operations and reusing duplicated data. Exponent-mean-delta encoding additionally compresses the exponents of both weights and feature maps for a higher CR and lower memory power consumption. A world-first on-chip sparse weight transposer enables DRL training on compressed weights without an off-chip transposer. OmniDRL is fabricated in a 28-nm CMOS technology and occupies a 3.6 × 3.6 mm² die area. It shows a state-of-the-art peak performance of 4.18 TFLOPS and a peak energy efficiency of 29.3 TFLOPS/W. It achieves 7.42-TFLOPS/W energy efficiency when training a robot agent (MuJoCo HalfCheetah, TD3), 2.4× higher than the previous state of the art. | - |
dc.language | English | - |
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | - |
dc.title | OmniDRL: An Energy-Efficient Deep Reinforcement Learning Processor With Dual-Mode Weight Compression and Sparse Weight Transposer | - |
dc.type | Article | - |
dc.identifier.wosid | 000745482800001 | - |
dc.identifier.scopusid | 2-s2.0-85122858788 | - |
dc.type.rims | ART | - |
dc.citation.volume | 57 | - |
dc.citation.issue | 4 | - |
dc.citation.beginningpage | 999 | - |
dc.citation.endingpage | 1012 | - |
dc.citation.publicationname | IEEE JOURNAL OF SOLID-STATE CIRCUITS | - |
dc.identifier.doi | 10.1109/JSSC.2021.3138520 | - |
dc.contributor.localauthor | Yoo, Hoi-Jun | - |
dc.contributor.nonIdAuthor | Jo, Wooyoung | - |
dc.contributor.nonIdAuthor | Kim, Ji-Hoon | - |
dc.description.isOpenAccess | N | - |
dc.type.journalArticle | Article | - |
dc.subject.keywordAuthor | Training | - |
dc.subject.keywordAuthor | Memory management | - |
dc.subject.keywordAuthor | Reinforcement learning | - |
dc.subject.keywordAuthor | Power demand | - |
dc.subject.keywordAuthor | Task analysis | - |
dc.subject.keywordAuthor | Computational modeling | - |
dc.subject.keywordAuthor | Bandwidth | - |
dc.subject.keywordAuthor | Data compression | - |
dc.subject.keywordAuthor | deep reinforcement learning (DRL) | - |
dc.subject.keywordAuthor | energy-efficient deep neural network (DNN) application-specific integrated circuit (ASIC) | - |
dc.subject.keywordAuthor | structured weight | - |
dc.subject.keywordAuthor | transposer | - |
dc.subject.keywordAuthor | weight pruning | - |
dc.subject.keywordPlus | LEVEL | - |
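The abstract describes an exponent-mean-delta encoding that compresses the IEEE-754 exponent field of weights and feature maps. The following is a minimal sketch of that general idea only; the grouping granularity, the use of a rounded mean, and the function names are illustrative assumptions, not the paper's actual hardware scheme.

```python
import struct

def exponents(values):
    # Extract the 8-bit IEEE-754 exponent field of each float32 value
    # by reinterpreting its bits as an unsigned 32-bit integer.
    return [(struct.unpack("<I", struct.pack("<f", v))[0] >> 23) & 0xFF
            for v in values]

def emd_encode(values):
    # Hypothetical exponent-mean-delta encoder: store one mean exponent
    # per group plus small signed deltas. When the values in a group have
    # similar magnitudes, the deltas need far fewer bits than the raw
    # 8-bit exponents, which is the source of the compression ratio gain.
    exps = exponents(values)
    mean = round(sum(exps) / len(exps))
    deltas = [e - mean for e in exps]
    return mean, deltas

def emd_decode(mean, deltas):
    # Reconstruct the original exponent fields exactly.
    return [mean + d for d in deltas]

# Example group of weights whose exponents cluster tightly.
weights = [0.5, 0.75, 1.5, 0.625]
mean, deltas = emd_encode(weights)
assert emd_decode(mean, deltas) == exponents(weights)
```

In this sketch the four 8-bit exponents collapse to one 8-bit mean plus deltas in {0, 1}, illustrating why clustered magnitudes compress well; the real design would fix a delta bit-width in hardware.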