Bayesian weight decay for deep convolutional neural networks: approximation and generalization (심층 회선 신경망의 베이지언 가중치 감쇠: 근사화와 일반화)

This study determines the weight-decay parameter of a deep convolutional neural network (CNN) so that the trained network generalizes well. Although weight decay is theoretically related to generalization error, choosing its value is known to be a challenging problem. Deep CNNs are widely used in vision applications, and guaranteeing their classification accuracy on unseen data is important. Obtaining such a CNN generally requires numerical trials with different weight-decay values, but the larger the CNN architecture, the higher the computational cost of those trials. To address this problem, this study derives an analytical form for the decay parameter from a proposed objective function in conjunction with Bayesian probability distributions. For computational efficiency, a novel method is proposed that approximates this form using only a small amount of information from the Hessian matrix. Under general conditions, the approximation is guaranteed by a provable bound and is computed by a proposed algorithm operating on discretized information, with time complexity linear in the depth and width of the CNN. The bound establishes the consistency of the proposed learning scheme. The generalization error of a CNN trained by the proposed algorithm is analyzed with statistical learning theory, and the computational-complexity analysis quantifies the efficiency gain. By reducing the cost of determining the decay value, the approximation enables fast exploration of deep CNNs that yield a small generalization error. Experimental results with several deep CNNs verify that the underlying assumption holds on real-world image datasets.
In addition, the method achieves a substantial reduction in time complexity while maintaining good classification accuracy when applied to deeper classification networks, more complex training methods, and/or objective functions with high computational cost. A further advantage is that the proposed method applies to any deep classification network trained with a loss function satisfying mild conditions.
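The thesis derives its own analytical form for the decay parameter, which is not reproduced here. As a rough illustration of the general idea the abstract describes (re-estimating a weight-decay parameter from diagonal Hessian information, in time linear in the number of parameters), the sketch below implements the classic MacKay-style evidence-framework update, which is a well-known Bayesian scheme of this kind and not the thesis's algorithm; the function name and interface are hypothetical.

```python
import numpy as np

def evidence_weight_decay(weights, hessian_diag, n_iters=50, alpha0=1.0):
    """Iteratively re-estimate a weight-decay parameter alpha.

    Classic evidence-framework update (MacKay), NOT the thesis's method:
        gamma = sum_i h_i / (h_i + alpha)   # effective number of parameters
        alpha = gamma / ||w||^2
    It needs only the diagonal h_i >= 0 of the data-term Hessian, so each
    iteration costs O(number of parameters).
    """
    w2 = float(np.sum(weights ** 2))
    alpha = alpha0
    for _ in range(n_iters):
        gamma = float(np.sum(hessian_diag / (hessian_diag + alpha)))
        alpha = gamma / w2
    return alpha
```

With sharply curved directions (large h_i), gamma approaches the parameter count and the update favors a decay value near gamma / ||w||^2; flat directions contribute little to gamma and so push the decay down.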
Advisors
Jo, Sungho (조성호)
Description
Korea Advanced Institute of Science and Technology (KAIST), School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2020
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: School of Computing, 2020.2, [iv, 59 p.]

Keywords

Bayesian method; convolutional neural networks; computational complexity; inverse Hessian matrix; regularization; weight decay (베이지언 기법; 계산 복잡도; 회선 신경망; 역 헤시안 행렬; 학습 규제; 가중치 감소)

URI
http://hdl.handle.net/10203/284154
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=909372&flag=dissertation
Appears in Collection
CS-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
