An 8.81 TFLOPS/W Deep-Reinforcement-Learning Accelerator With Delta-Based Weight Sharing and Block-Mantissa Reconfigurable PE Array

TD3 is one of the highest-performing Deep Reinforcement Learning (DRL) algorithms, providing high training stability and rewards. However, it suffers from low energy efficiency due to heavy External Memory Access (EMA) and floating-point operations. To mitigate this issue and achieve higher throughput and energy efficiency, we propose a DRL accelerator with three features: 1) Delta-based Weight Sharing (DWS) represents weights by referencing the corresponding network and exploits data locality, reducing EMA by up to 64.3% in the feed-forward stage and 39.7% in the gradient-generation and weight-update stage. 2) The Block-Mantissa Reconfigurable PE Array (BMRPA) supports reconfigurable block and mantissa bit widths to provide optimal precision for each layer, yielding up to a 4x increase in throughput. 3) The Multi-mode Data Fetcher (MDF) supports bit-width-adaptive data fetching, achieving twice the bandwidth with an average read overhead of 5.3%. Combined with BMRPA, it attains an energy efficiency of 8.81 TFLOPS/W.
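The brief does not spell out how DWS encodes the shared weights; the sketch below is one plausible reading in Python, assuming the delta is taken between a TD3 main network and its slowly tracking target network and quantized to a narrow format (the function names, the tau value, and the 8-bit delta width are illustrative assumptions, not details taken from the paper).

    import numpy as np

    def soft_update(main_w, target_w, tau=0.005):
        # TD3-style Polyak averaging: the target network trails the main network,
        # so their weights stay numerically close after every update.
        return tau * main_w + (1.0 - tau) * target_w

    def delta_encode(reference_w, derived_w, bits=8):
        # Store a derived network's weights as narrow quantized deltas against a
        # reference network instead of as full-width values.
        delta = derived_w - reference_w
        scale = max(float(np.max(np.abs(delta))) / (2 ** (bits - 1) - 1), 1e-12)
        q = np.clip(np.round(delta / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
        return q.astype(np.int8), scale

    def delta_decode(reference_w, q, scale):
        # Reconstruct the derived weights from the reference plus the delta.
        return reference_w + q.astype(np.float32) * scale

    # Toy check: after repeated soft updates the target stays close to the main
    # network, so its delta quantizes to 8 bits with small reconstruction error.
    rng = np.random.default_rng(0)
    main_w = rng.standard_normal(1024).astype(np.float32)
    target_w = main_w.copy()
    for _ in range(100):
        main_w += 0.01 * rng.standard_normal(1024).astype(np.float32)  # stand-in for an optimizer step
        target_w = soft_update(main_w, target_w)

    q, scale = delta_encode(main_w, target_w)
    recon = delta_decode(main_w, q, scale)
    print("max |error|:", float(np.max(np.abs(recon - target_w))))

Because the delta carries far fewer significant bits than the full-precision weights, storing or streaming it in place of a second full weight set is one way such a scheme could cut external memory traffic, which is consistent with the EMA reductions reported in the abstract.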
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Issue Date
2024-05
Language
English
Article Type
Article
Citation

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, v.71, no.5, pp.2529 - 2533

ISSN
1549-7747
DOI
10.1109/TCSII.2024.3374725
URI
http://hdl.handle.net/10203/322487
Appears in Collection
RIMS Journal Papers; EE-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.