A 47.4µJ/epoch Trainable Deep Convolutional Neural Network Accelerator for In-Situ Personalization on Smart Devices

A scalable deep learning accelerator supporting both inference and training is implemented for on-device personalization of deep convolutional neural networks (CNNs). It consists of three processor cores, each operating with a distinct energy-efficient dataflow suited to a different type of computation in CNN training. Two cores perform forward and backward propagation in the convolutional layers and use a masking scheme that reduces the intermediate data stored for training by 88.3%. The third core executes the weight update process in the convolutional layers and the inner-product computation in the fully connected layers with a novel large-window dataflow. The system enables an 8-bit fixed-point datapath with lossless training and consumes 47.4 µJ/epoch for a customized deep CNN model.
Publisher
IEEE/SSCS
Issue Date
2019-11-05
Language
English
Citation

2019 IEEE Asian Solid-State Circuits Conference

URI
http://hdl.handle.net/10203/269001
Appears in Collection
EE-Conference Papers