As deep neural networks (DNNs) achieve remarkable results in various fields, the demand for fast and energy-efficient DNN hardware keeps increasing. This demand has driven the development of domain-specific hardware architectures and various methods for efficient DNN processing. Previous studies improve computing performance and energy efficiency by skipping ineffectual computations and compressing ineffectual data. However, despite the gains from these methods, they are still insufficient to meet the growing demand.
In this thesis, we propose accelerator architectures that further eliminate redundancy in DNN processing. Specifically, two designs are proposed: the first targets redundant computation in DNN inference, and the second targets redundant data in DNN training.

The first design processes DNNs with redundancy-free computing, which goes beyond zero-free computing. Zero-free computing eliminates only the ineffectual computations that arise from zero-valued data. Redundancy-free computing, in contrast, identifies repeated data and the repeated computations they induce, performs each such computation only once, and skips all the remaining redundant computations; repeated sparse (zero-valued) data is handled as a special case. By eliminating more unnecessary computations, the proposed design makes DNN inference faster and more energy-efficient.

The second design eliminates redundant data that are not critical to training quality. DNN training accelerators must stash the data generated during forward propagation so that they can be reused in backpropagation; as a result, training efficiency is limited by the memory capacity and bandwidth required for this stashing. The proposed method is based on the observation that even if a large part of the stashed data is dropped and thus never used in backpropagation, training quality is not significantly affected. By eliminating this redundant data during training, the proposed architecture improves training performance with a reduced memory footprint.

As the demand for fast and energy-efficient DNN processing is constantly increasing, the proposed redundancy-free DNN accelerator architectures and methods are expected to aid the development and application of artificial intelligence.
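As a minimal illustration of the redundancy-free idea, the sketch below applies it to a single dot product in Python. The grouping-by-value scheme is an assumption made for clarity; it is not the exact dataflow of the proposed architecture.

```python
def redundancy_free_dot(weights, inputs):
    # Group inputs that share the same weight value, so each distinct
    # weight is multiplied only once instead of once per occurrence.
    groups = {}
    for w, x in zip(weights, inputs):
        groups[w] = groups.get(w, 0.0) + x
    # Zero-valued weights are the sparse special case: their whole
    # group is skipped, subsuming zero-free computing.
    return sum(w * s for w, s in groups.items() if w != 0.0)
```

A naive dot product over weights [2, 0, 2, 3] and inputs [1, 4, 5, 6] performs four multiplications; the grouped version performs only two (for the distinct nonzero weights 2 and 3) while producing the same result, 30.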
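The data-elimination idea for training can likewise be sketched for a toy ReLU layer. The top-magnitude selection rule and the `keep_ratio` knob below are illustrative assumptions, not the thesis's exact criterion for deciding which stashed data is redundant.

```python
import numpy as np

def forward(x, keep_ratio=0.25):
    # ReLU forward pass that stashes only the largest activations for
    # backpropagation, shrinking the stash memory footprint.
    y = np.maximum(x, 0.0)
    k = max(1, int(keep_ratio * y.size))
    keep = np.argpartition(y.ravel(), -k)[-k:]   # indices of the k largest
    stash = np.zeros_like(y).ravel()
    stash[keep] = y.ravel()[keep]                # sparse, approximate stash
    return y, stash.reshape(y.shape)

def backward(grad_out, stash):
    # ReLU gradient computed from the pruned stash alone: positions
    # dropped from the stash simply contribute zero gradient.
    return grad_out * (stash > 0)
```

With `keep_ratio=0.5`, only half of the activations are stashed, yet gradients still flow through the dominant positions.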