Adaptive Proximal Gradient Methods for Structured Neural Networks

DC Field                           Value               Language
dc.contributor.author              Yun, Jihun          ko
dc.contributor.author              Lozano, Aurelie     ko
dc.contributor.author              Yang, Eunho         ko
dc.date.accessioned                2021-12-09T06:47:26Z
dc.date.available                  2021-12-09T06:47:26Z
dc.date.created                    2021-12-03
dc.date.issued                     2021-12-10
dc.identifier.citation             35th Conference on Neural Information Processing Systems (NeurIPS)
dc.identifier.issn                 1049-5258
dc.identifier.uri                  http://hdl.handle.net/10203/290286
dc.description.abstract            We consider the training of structured neural networks where the regularizer can be non-smooth and possibly non-convex. While popular machine learning libraries have resorted to stochastic (adaptive) subgradient approaches, the use of proximal gradient methods in the stochastic setting has been little explored and warrants further study, in particular regarding the incorporation of adaptivity. Towards this goal, we present a general framework of stochastic proximal gradient descent methods that allows for arbitrary positive preconditioners and lower semi-continuous regularizers. We derive two important instances of our framework: (i) the first proximal version of Adam, one of the most popular adaptive SGD algorithms, and (ii) a revised version of ProxQuant for quantization-specific regularizers, which improves upon the original approach by incorporating the effect of preconditioners in the proximal mapping computations. We provide convergence guarantees for our framework and show that adaptive gradient methods can have faster convergence in terms of constants than vanilla SGD for sparse data. Lastly, we demonstrate the superiority of stochastic proximal methods compared to subgradient-based approaches via extensive experiments. Interestingly, our results indicate that the benefit of proximal approaches over subgradient counterparts is more pronounced for non-convex regularizers than for convex ones.
dc.language                        English
dc.publisher                       Neural Information Processing Systems
dc.title                           Adaptive Proximal Gradient Methods for Structured Neural Networks
dc.type                            Conference
dc.identifier.wosid                000901616400055
dc.type.rims                       CONF
dc.citation.publicationname        35th Conference on Neural Information Processing Systems (NeurIPS)
dc.identifier.conferencecountry    US
dc.identifier.conferencelocation   Virtual
dc.contributor.localauthor         Yang, Eunho
dc.contributor.nonIdAuthor         Lozano, Aurelie
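
The abstract above describes updates that combine an adaptive (Adam-style) diagonal preconditioner with a proximal mapping computed in the metric induced by that preconditioner. The following is a minimal NumPy sketch of one such step, assuming an l1 regularizer; the function names, hyperparameters, and placement of the bias correction are illustrative assumptions, not details taken from the paper.

    import numpy as np

    def prox_l1_preconditioned(z, d, step, lam):
        # Proximal map of lam * ||x||_1 in the metric induced by diag(d):
        #   argmin_x  lam * ||x||_1 + (1 / (2 * step)) * sum_i d_i * (x_i - z_i)**2
        # i.e. coordinate-wise soft-thresholding with threshold step * lam / d_i.
        return np.sign(z) * np.maximum(np.abs(z) - step * lam / d, 0.0)

    def prox_adam_step(x, grad, m, v, t, step=1e-3, betas=(0.9, 0.999),
                       eps=1e-8, lam=1e-4):
        # One Adam-preconditioned proximal step (hypothetical sketch, not the
        # authors' exact algorithm).
        b1, b2 = betas
        m = b1 * m + (1.0 - b1) * grad           # first-moment estimate
        v = b2 * v + (1.0 - b2) * grad ** 2      # second-moment estimate
        m_hat = m / (1.0 - b1 ** t)              # bias correction
        v_hat = v / (1.0 - b2 ** t)
        d = np.sqrt(v_hat) + eps                 # diagonal preconditioner
        z = x - step * m_hat / d                 # preconditioned gradient step
        x_new = prox_l1_preconditioned(z, d, step, lam)  # preconditioned prox
        return x_new, m, v

With the identity preconditioner (d = 1) this reduces to ordinary proximal SGD with plain soft-thresholding; the diagonal d reweights the threshold per coordinate, which is the effect of incorporating the preconditioner into the proximal mapping.
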
Appears in Collection
AI-Conference Papers (학술대회논문; conference papers)
Files in This Item
There are no files associated with this item.
