Regularizing Class-wise Predictions via Self-knowledge Distillation

Deep neural networks with millions of parameters may suffer from poor generalization due to overfitting. To mitigate this issue, we propose a new regularization method that penalizes discrepancies between the predictive distributions of similar samples. In particular, we distill the predictive distribution between different samples of the same label during training. This regularizes the dark knowledge (i.e., the knowledge about wrong predictions) of a single network (i.e., self-knowledge distillation) by forcing it to produce more meaningful and consistent predictions in a class-wise manner. Consequently, it mitigates overconfident predictions and reduces intra-class variations. Our experimental results on various image classification tasks demonstrate that this simple yet powerful method can significantly improve not only the generalization ability but also the calibration performance of modern convolutional neural networks.
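The class-wise regularizer described above can be sketched as a KL-divergence penalty between the softened predictions of two different samples that share a label, with one sample's output detached so it acts as the "teacher". This is a minimal illustrative sketch in PyTorch, not the authors' released implementation; the function name and the temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def class_wise_self_kd_loss(logits_student, logits_teacher, temperature=4.0):
    """KL divergence between softened predictions of two same-class samples.

    logits_student, logits_teacher: (batch, num_classes) logits from the SAME
    network for two different samples of the same label. The teacher branch is
    detached so gradients flow only through the student branch.
    """
    p_teacher = F.softmax(logits_teacher.detach() / temperature, dim=1)
    log_p_student = F.log_softmax(logits_student / temperature, dim=1)
    # Scale by T^2, as is conventional in knowledge distillation, so the
    # gradient magnitude is roughly independent of the temperature.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (temperature ** 2)

if __name__ == "__main__":
    # Toy usage: two batches of logits standing in for two same-class samples.
    a = torch.randn(8, 10)
    b = torch.randn(8, 10)
    print(class_wise_self_kd_loss(a, b).item())
```

In training, this term would be added (with a weighting hyperparameter) to the usual cross-entropy loss on the student sample.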
Publisher
IEEE Computer Society
Issue Date
2020-06-16
Language
English
Citation
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020
DOI
10.1109/CVPR42600.2020.01389
URI
http://hdl.handle.net/10203/278205
Appears in Collection
RIMS Conference Papers
