Zero and Narrow-Width Value-Aware Compression for Quantized Convolutional Neural Networks

Convolutional neural networks (CNNs) are typically executed on systems with dedicated neural processing units for CNN-related computations. To achieve high performance at low hardware overhead, CNN datatype quantization is applied. As a further optimization to reduce DRAM accesses, compression algorithms have been applied to CNN data. However, conventional zero value-aware compression algorithms suffer a reduced compression ratio on the latest quantized CNNs, which contain only a small number of zero values. Moreover, the appropriate zero run-length code width varies across CNNs, layers, and quantization datatypes. The latest quantized CNNs do, however, contain many narrow-width values, another compressible value class that can raise the compression ratio. Because low-precision quantization reduces the data bit width, CNN data cluster around a few discrete values, producing a biased data distribution. These discrete values become narrow-width values and constitute a large proportion of the biased distribution. In this article, we propose ENCORE, an efficient compression algorithm for quantized CNNs that applies variable zero run-length encoding and compresses narrow-width values. On the latest quantized CNNs, ENCORE achieves compression ratios 93.55% (MobileNet v1) and 50.85% (Tiny YOLO v3) higher than those of conventional zero value-aware CNN data compression algorithms.
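The two ideas the abstract combines, zero run-length coding with a configurable run-length field and a separate short code for narrow-width values, can be illustrated with a minimal sketch. The encoding below is an illustration only, not the paper's actual ENCORE format: the symbol tags, the 3-bit run-length field, the 4-bit narrow-value width, and the function names are all assumptions made here for demonstration.

    import numpy as np

    def encode(values, zero_run_bits=3, narrow_bits=4, full_bits=8):
        """Encode a flat array of quantized (e.g., int8) CNN data as
        (tag, payload) symbols:
          ('Z', n) -- a run of n zeros, n < 2**zero_run_bits
          ('N', v) -- a value in the signed narrow_bits range
          ('F', v) -- a full-width value stored verbatim
        """
        symbols, run = [], 0
        narrow_max = 1 << (narrow_bits - 1)  # 4-bit signed range is [-8, 7]
        for v in np.asarray(values).ravel():
            if v == 0:
                run += 1
                if run == (1 << zero_run_bits) - 1:  # run-length field full
                    symbols.append(('Z', run))
                    run = 0
            else:
                if run:  # flush the pending zero run
                    symbols.append(('Z', run))
                    run = 0
                tag = 'N' if -narrow_max <= v < narrow_max else 'F'
                symbols.append((tag, int(v)))
        if run:
            symbols.append(('Z', run))
        return symbols

    def compressed_bits(symbols, zero_run_bits=3, narrow_bits=4,
                        full_bits=8, tag_bits=2):
        """Rough size estimate: every symbol costs a tag plus its payload."""
        payload = {'Z': zero_run_bits, 'N': narrow_bits, 'F': full_bits}
        return sum(tag_bits + payload[t] for t, _ in symbols)

    # A biased int8 tensor, mostly zeros and small magnitudes, as the
    # abstract describes for low-precision quantized CNN data.
    rng = np.random.default_rng(0)
    data = rng.choice([0, 0, 0, 1, -2, 3, 90], size=256).astype(np.int8)
    syms = encode(data)
    print(f"{data.size * 8} bits raw -> {compressed_bits(syms)} bits encoded")

On such a distribution most symbols are zero runs or narrow values, so the encoded size falls well below the raw 8 bits per element. Note also that a larger zero_run_bits pays off in sparse layers but wastes bits elsewhere, which is why the abstract argues the run-length code width should vary with the CNN, layer, and quantization datatype.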
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Issue Date
2024-01
Language
English
Article Type
Article
Citation
IEEE Transactions on Computers, vol. 73, no. 1, pp. 249–262
ISSN
0018-9340
DOI
10.1109/tc.2023.3315051
URI
http://hdl.handle.net/10203/318599
Appears in Collection
CS-Journal Papers (Journal Papers)