Scalable Neural Video Representations with Learnable Positional Features

DC Field | Value | Language
dc.contributor.author | Kim, Subin | ko
dc.contributor.author | Yu, Sihyun | ko
dc.contributor.author | Lee, Jaeho | ko
dc.contributor.author | Shin, Jinwoo | ko
dc.date.accessioned | 2023-09-13T07:00:47Z | -
dc.date.available | 2023-09-13T07:00:47Z | -
dc.date.created | 2023-09-13 | -
dc.date.issued | 2022-11 | -
dc.identifier.citation | 36th Conference on Neural Information Processing Systems, NeurIPS 2022 | -
dc.identifier.uri | http://hdl.handle.net/10203/312583 | -
dc.description.abstract | Succinct representation of complex signals using coordinate-based neural representations (CNRs) has seen great progress, and several recent efforts focus on extending them for handling videos. Here, the main challenge is how to (a) alleviate a compute-inefficiency in training CNRs to (b) achieve high-quality video encoding while (c) maintaining the parameter-efficiency. To meet all requirements (a), (b), and (c) simultaneously, we propose neural video representations with learnable positional features (NVP), a novel CNR by introducing “learnable positional features” that effectively amortize a video as latent codes. Specifically, we first present a CNR architecture based on designing 2D latent keyframes to learn the common video contents across each spatio-temporal axis, which dramatically improves all of those three requirements. Then, we propose to utilize existing powerful image and video codecs as a compute-/memory-efficient compression procedure of latent codes. We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior arts, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality as 34.07→34.57 (measured with the PSNR metric), even using >8 times fewer parameters. We also show intriguing properties of NVP, e.g., video inpainting, video frame interpolation, etc. | -
dc.language | English | -
dc.publisher | Neural information processing systems foundation | -
dc.title | Scalable Neural Video Representations with Learnable Positional Features | -
dc.type | Conference | -
dc.identifier.scopusid | 2-s2.0-85162807450 | -
dc.type.rims | CONF | -
dc.citation.publicationname | 36th Conference on Neural Information Processing Systems, NeurIPS 2022 | -
dc.identifier.conferencecountry | US | -
dc.identifier.conferencelocation | New Orleans | -
dc.contributor.localauthor | Shin, Jinwoo | -
dc.contributor.nonIdAuthor | Kim, Subin | -
dc.contributor.nonIdAuthor | Lee, Jaeho | -
Appears in Collection
AI-Conference Papers(학술대회논문)
Files in This Item
There are no files associated with this item.
