OmniDRL: A 29.3 TFLOPS/W Deep Reinforcement Learning Processor with Dual-mode Weight Compression and On-chip Sparse Weight Transposer

This paper presents OmniDRL, a deep reinforcement learning (DRL) processor that achieves 4.18 TFLOPS and 29.3 TFLOPS/W. A group-sparse training core and exponent mean delta encoding are proposed to enable weight and feature map compression in every iteration of DRL training. A sparse weight transposer transposes compressed weights on-chip, reducing external memory access. The processor is fabricated in 28 nm CMOS technology and occupies a 3.6 × 3.6 mm² die area. It achieves 7.16 TFLOPS/W energy efficiency when training a robot agent (MuJoCo HalfCheetah, TD3), which is 2.4× higher than the previous state of the art. © 2021 JSAP.
Publisher
Institute of Electrical and Electronics Engineers Inc.
Issue Date
2021-06
Language
English
Citation

35th Symposium on VLSI Circuits, VLSI Circuits 2021

ISSN
2158-5601
DOI
10.23919/VLSICircuits52068.2021.9492504
URI
http://hdl.handle.net/10203/288887
Appears in Collection
EE-Conference Papers