Design and analysis of optimization problems in deep learning

It has recently been observed that probabilistic ideas can be useful in deep learning. For instance, stochastic gradient descent (SGD) enables a deep neural network to learn a task efficiently, and dropout prevents co-adaptation of neurons through random subnetworks. Despite their wide adoption, our understanding of their role in high-dimensional parameter spaces is limited. In this dissertation, we analyze SGD from a geometric perspective by inspecting the stochasticity of the norms and directions of minibatch gradients, and we argue that the directional uniformity of minibatch gradients increases over the course of SGD. Furthermore, we formulate dropout as a regularizer that penalizes deviation from the origin, with a strength that adapts along the optimization trajectory. Inspired by this theoretical analysis of dropout, we propose a new regularization technique, mixout, for transfer learning. Mixout greatly improves both the finetuning stability and the average performance of pretrained large-scale language models. For training from scratch, we introduce a variant of mixout that prevents generator forgetting and thereby avoids mode collapse in GANs.
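For context, mixout stochastically mixes the current parameters with their pretrained values during finetuning, generalizing dropout (which mixes with zeros). Below is a minimal sketch of that idea in PyTorch, assuming an element-wise formulation with inverted-dropout-style rescaling; the function name, signature, and mixing probability are illustrative and not the dissertation's exact implementation.

import torch

def mixout(weight, pretrained_weight, p=0.7, training=True):
    # Illustrative sketch only: with probability p, swap each parameter for its
    # pretrained value, then rescale so the expected output equals the current
    # weight (mirroring inverted dropout).
    if not training or p == 0.0:
        return weight
    mask = torch.bernoulli(torch.full_like(weight, p))  # 1 -> use pretrained value
    mixed = mask * pretrained_weight + (1.0 - mask) * weight
    return (mixed - p * pretrained_weight) / (1.0 - p)

Setting pretrained_weight to zero recovers ordinary (inverted) dropout, which is the sense in which mixout generalizes it.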
Advisors
Kang, Wanmo (강완모)
Description
Korea Advanced Institute of Science and Technology (KAIST): Department of Mathematical Sciences
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2020
Identifier
325007
Language
eng
Description

Doctoral thesis - Korea Advanced Institute of Science and Technology (KAIST): Department of Mathematical Sciences, 2020.8, [xii, 80 p.]

Keywords

deep learning; stochastic gradient descent; dropout; finetuning stability; mode collapse

URI
http://hdl.handle.net/10203/284354
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=924355&flag=dissertation
Appears in Collection
MA-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
