Design and analysis of optimization problems in deep learning

DC Field: Value
dc.contributor.advisor: Kang, Wanmo
dc.contributor.author: Lee, Cheolhyoung
dc.description: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST), Department of Mathematical Sciences, 2020.8, [xii, 80 p.]
dc.description.abstract: It has recently been observed that probabilistic ideas can be useful in deep learning. For instance, stochastic gradient descent (SGD) enables a deep neural network to learn a task efficiently, and dropout prevents co-adaptation of neurons through random subnetworks. Despite their wide adoption, our understanding of their roles in high-dimensional parameter spaces is limited. In this dissertation, we analyze SGD from a geometrical perspective by inspecting the stochasticity of the norms and directions of minibatch gradients. We claim that the directional uniformity of minibatch gradients increases over the course of SGD. Furthermore, we show that dropout regularizes learning by penalizing deviation from the origin, and that the strength of this regularization adapts along the optimization trajectory. Inspired by this theoretical analysis of dropout, we propose a new regularization technique, "mixout", useful in transfer learning. Mixout greatly improves both the finetuning stability and the average performance of pretrained large-scale language models. For training from scratch, we introduce a variant of mixout that prevents generator forgetting, thereby avoiding mode collapse in GANs.
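As a rough illustration of the idea behind mixout (a minimal sketch, not the dissertation's implementation): where dropout replaces each parameter with zero with probability p and rescales, mixout instead replaces it with the corresponding *pretrained* value and rescales so the expected output equals the current parameters. The function name `mixout` and the NumPy formulation below are illustrative assumptions.

```python
import numpy as np

def mixout(w, w_pre, p, rng):
    """Illustrative mixout sketch: each coordinate of the current
    parameters `w` is swapped for its pretrained value in `w_pre`
    with probability p, then rescaled (as in inverted dropout)
    so that the expectation of the output equals `w`."""
    mask = rng.random(w.shape) < p           # True -> use pretrained value
    mixed = np.where(mask, w_pre, w)
    # E[mixed] = p * w_pre + (1 - p) * w, so subtracting p * w_pre
    # and dividing by (1 - p) makes the output unbiased for w.
    return (mixed - p * w_pre) / (1.0 - p)
```

With p = 0 this reduces to the identity, and with `w_pre = 0` it reduces to inverted dropout, which matches the abstract's framing of mixout as a dropout-style regularizer that pulls finetuning toward the pretrained weights rather than toward the origin.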
dc.subject: deep learning; stochastic gradient descent; dropout; finetuning stability; mode collapse
dc.title: Design and analysis of optimization problems in deep learning
dc.description.department: Korea Advanced Institute of Science and Technology (KAIST), Department of Mathematical Sciences
Files in This Item: There are no files associated with this item.