A 8.81 TFLOPS/W Deep-Reinforcement-Learning Accelerator With Delta-Based Weight Sharing and Block-Mantissa Reconfigurable PE Array

DC Field | Value | Language
dc.contributor.author | An, Sanghyuk | ko
dc.contributor.author | Ryu, Junha | ko
dc.contributor.author | Park, Gwangtae | ko
dc.contributor.author | Yoo, Hoi-Jun | ko
dc.date.accessioned | 2024-08-30T03:00:14Z | -
dc.date.available | 2024-08-30T03:00:14Z | -
dc.date.created | 2024-08-29 | -
dc.date.issued | 2024-05 | -
dc.identifier.citation | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, v.71, no.5, pp.2529 - 2533 | -
dc.identifier.issn | 1549-7747 | -
dc.identifier.uri | http://hdl.handle.net/10203/322487 | -
dc.description.abstract | TD3 is one of the highest-performing Deep Reinforcement Learning (DRL) algorithms, providing high training stability and rewards. However, it suffers from low energy efficiency due to heavy External Memory Access (EMA) and floating-point operations. To mitigate this issue and achieve higher throughput and energy efficiency, we propose a DRL accelerator with three features: 1) Delta-based Weight Sharing (DWS) represents weights by referencing the corresponding network and exploits data locality, reducing EMA by up to 64.3% in the feed-forward stage and 39.7% in the gradient-generation and weight-update stage. 2) The Block-Mantissa Reconfigurable PE Array (BMRPA) supports operations with variable block sizes and mantissa widths to provide the optimal precision for each layer, yielding up to a 4x increase in throughput. 3) The Multi-mode Data Fetcher (MDF) supports bit-width-adaptive data fetching, achieving twice the bandwidth with an average read overhead of 5.3%. Combined with the BMRPA, the accelerator attains an energy efficiency of 8.81 TFLOPS/W. (An illustrative sketch of the delta-based weight-sharing idea follows the metadata listing below.) | -
dc.language | English | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | A 8.81 TFLOPS/W Deep-Reinforcement-Learning Accelerator With Delta-Based Weight Sharing and Block-Mantissa Reconfigurable PE Array | -
dc.type | Article | -
dc.identifier.wosid | 001230987700069 | -
dc.identifier.scopusid | 2-s2.0-85188005568 | -
dc.type.rims | ART | -
dc.citation.volume | 71 | -
dc.citation.issue | 5 | -
dc.citation.beginningpage | 2529 | -
dc.citation.endingpage | 2533 | -
dc.citation.publicationname | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS | -
dc.identifier.doi | 10.1109/TCSII.2024.3374725 | -
dc.contributor.localauthor | An, Sanghyuk | -
dc.contributor.localauthor | Yoo, Hoi-Jun | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Training | -
dc.subject.keywordAuthor | Throughput | -
dc.subject.keywordAuthor | Energy efficiency | -
dc.subject.keywordAuthor | Decoding | -
dc.subject.keywordAuthor | Artificial neural networks | -
dc.subject.keywordAuthor | Vectors | -
dc.subject.keywordAuthor | Task analysis | -
dc.subject.keywordAuthor | Deep reinforcement learning | -
dc.subject.keywordAuthor | TD3 | -
dc.subject.keywordAuthor | external memory access | -
dc.subject.keywordAuthor | block floating point | -
dc.subject.keywordAuthor | reconfigurable | -
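
The abstract describes Delta-based Weight Sharing only at a high level. Below is a minimal Python sketch, not the authors' hardware design, of the general idea it relies on: in TD3 the target networks track their online networks through Polyak averaging, so a target layer can be stored as small quantized deltas against a reference copy instead of as a second full FP32 tensor, which is one way such correlation can reduce external memory traffic. The function names, the 8-bit delta format, and the tau value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def polyak_update(w_online: np.ndarray, w_target: np.ndarray, tau: float = 0.005) -> np.ndarray:
    """Standard TD3 soft target update: w_target <- tau*w_online + (1-tau)*w_target."""
    return tau * w_online + (1.0 - tau) * w_target

def encode_delta(w_ref: np.ndarray, w: np.ndarray, bits: int = 8):
    """Quantize (w - w_ref) to a narrow signed-integer payload plus one scale factor."""
    delta = w - w_ref
    scale = float(np.max(np.abs(delta))) / (2 ** (bits - 1) - 1) + 1e-12
    q = np.round(delta / scale).astype(np.int8)   # narrow-width payload
    return q, scale

def decode_delta(w_ref: np.ndarray, q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate weights from the reference copy plus quantized deltas."""
    return w_ref + q.astype(np.float32) * scale

# Toy demo: after a soft update the target layer differs from the online layer
# only slightly, so 8-bit deltas reconstruct it almost exactly while the stored
# payload is 4x smaller than a second FP32 copy of the weights.
rng = np.random.default_rng(0)
w_online = rng.standard_normal((256, 256)).astype(np.float32)
w_target = w_online.copy()                                    # target starts as a copy
w_online += 0.01 * rng.standard_normal(w_online.shape).astype(np.float32)  # one "training step"
w_target = polyak_update(w_online, w_target)

q, scale = encode_delta(w_online, w_target)
w_rec = decode_delta(w_online, q, scale)
print("max reconstruction error:", float(np.abs(w_rec - w_target).max()))
print("bytes for FP32 copy:", w_target.nbytes, "| bytes for int8 deltas:", q.nbytes)
```

The same trade-off motivates the paper's reported EMA reductions: the reference weights are fetched once and only the compact deltas are moved for the related network, at the cost of a small decode step.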
Appears in Collections
RIMS Journal Papers; EE-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.
