Self-Supervised Monocular Depth and Motion Learning in Dynamic Scenes: Semantic Prior to Rescue

Cited 3 times in Web of Science; cited 0 times in Scopus.
DC Field | Value | Language
dc.contributor.author | Lee, Seokju | ko
dc.contributor.author | Rameau, Francois | ko
dc.contributor.author | Im, Sunghoon | ko
dc.contributor.author | Kweon, In So | ko
dc.date.accessioned | 2022-08-19T07:00:10Z | -
dc.date.available | 2022-08-19T07:00:10Z | -
dc.date.created | 2022-08-01 | -
dc.date.issued | 2022-09 | -
dc.identifier.citation | INTERNATIONAL JOURNAL OF COMPUTER VISION, v.130, no.9, pp.2265-2285 | -
dc.identifier.issn | 0920-5691 | -
dc.identifier.uri | http://hdl.handle.net/10203/298027 | -
dc.description.abstract | We introduce an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion, and depth in a monocular camera setup without geometric supervision. Our technical contributions are four-fold. First, we highlight the fundamental difference between inverse and forward projection while modeling the individual motion of each rigid object, and propose a geometrically correct projection pipeline using a neural forward projection module. Second, we propose two types of residual motion learning frameworks to explicitly disentangle camera and object motions in dynamic driving scenes with different levels of semantic prior knowledge: video instance segmentation as a strong prior, and object detection as a weak prior. Third, we design a unified photometric and geometric consistency loss that holistically imposes self-supervisory signals for every background and object region. Lastly, we present an unsupervised method of 3D motion field regularization for semantically plausible object motion representation. Our proposed elements are validated in a detailed ablation study. Through extensive experiments conducted on the KITTI, Cityscapes, and Waymo Open Dataset, our framework is shown to outperform state-of-the-art depth and motion estimation methods. Our code, dataset, and models are publicly available. | -
dc.language | English | -
dc.publisher | SPRINGER | -
dc.title | Self-Supervised Monocular Depth and Motion Learning in Dynamic Scenes: Semantic Prior to Rescue | -
dc.type | Article | -
dc.identifier.wosid | 000827403200001 | -
dc.identifier.scopusid | 2-s2.0-85134520935 | -
dc.type.rims | ART | -
dc.citation.volume | 130 | -
dc.citation.issue | 9 | -
dc.citation.beginningpage | 2265 | -
dc.citation.endingpage | 2285 | -
dc.citation.publicationname | INTERNATIONAL JOURNAL OF COMPUTER VISION | -
dc.identifier.doi | 10.1007/s11263-022-01641-5 | -
dc.contributor.localauthor | Kweon, In So | -
dc.contributor.nonIdAuthor | Lee, Seokju | -
dc.contributor.nonIdAuthor | Im, Sunghoon | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | 3D visual perception | -
dc.subject.keywordAuthor | Monocular depth estimation | -
dc.subject.keywordAuthor | Motion estimation | -
dc.subject.keywordAuthor | Self-supervised learning | -
Appears in Collection
EE-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.
