Self-Supervised Monocular Depth Estimation with Positional Shift Depth Variance and Adaptive Disparity Quantization

DC Field: Value [Language]
dc.contributor.author: Bello, Juan Luis Gonzalez [ko]
dc.contributor.author: Moon, Jaeho [ko]
dc.contributor.author: Kim, Munchurl [ko]
dc.date.accessioned: 2024-04-17T14:00:31Z
dc.date.available: 2024-04-17T14:00:31Z
dc.date.created: 2024-04-17
dc.date.issued: 2024-03
dc.identifier.citation: IEEE TRANSACTIONS ON IMAGE PROCESSING, v.33, pp.2074 - 2089
dc.identifier.issn: 1057-7149
dc.identifier.uri: http://hdl.handle.net/10203/319091
dc.description.abstract: Recently, attempts to learn the underlying 3D structures of a scene from monocular videos in a fully self-supervised fashion have drawn much attention. One of the most challenging aspects of this task is handling independently moving objects, as they break the rigid-scene assumption. In this paper, we show for the first time that pixel positional information can be exploited to learn single-view depth estimation (SVDE) from videos. The proposed moving object (MO) masks, which are induced by the depth variance with respect to shifted positional information (SPI) and are referred to as 'SPIMO' masks, are highly robust and consistently remove independently moving objects from the scenes, allowing for robust and consistent learning of SVDE from videos. Additionally, we introduce a new adaptive quantization scheme that assigns the best per-pixel quantization curve for depth discretization, improving the fine granularity and accuracy of the final aggregated depth maps. Finally, we employ existing boosting techniques in a new way to further self-supervise moving-object depths. With these features, our pipeline is robust against moving objects and generalizes well to high-resolution images, even when trained on small patches, yielding state-of-the-art (SOTA) results with four to eight times fewer parameters than previous SOTA techniques that learn from videos. Extensive experiments on KITTI and CityScapes demonstrate the effectiveness of our method. (Illustrative code sketches of the SPIMO masking and adaptive quantization ideas follow the metadata listing below.)
dc.language: English
dc.publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
dc.title: Self-Supervised Monocular Depth Estimation with Positional Shift Depth Variance and Adaptive Disparity Quantization
dc.type: Article
dc.identifier.wosid: 001188332200014
dc.identifier.scopusid: 2-s2.0-85187999582
dc.type.rims: ART
dc.citation.volume: 33
dc.citation.beginningpage: 2074
dc.citation.endingpage: 2089
dc.citation.publicationname: IEEE TRANSACTIONS ON IMAGE PROCESSING
dc.identifier.doi: 10.1109/TIP.2024.3374045
dc.contributor.localauthor: Kim, Munchurl
dc.description.isOpenAccess: N
dc.type.journalArticle: Article
dc.subject.keywordAuthor: Depth from videos
dc.subject.keywordAuthor: self-supervised
dc.subject.keywordAuthor: monocular depth estimation
dc.subject.keywordAuthor: deep convolutional neural networks
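
The abstract above describes two mechanisms concretely enough to illustrate: SPIMO masks, obtained by measuring per-pixel depth variance under shifted positional inputs, and adaptive disparity quantization, which aggregates a discrete disparity volume using a per-pixel quantization curve. Below is a minimal PyTorch sketch of both ideas, not the authors' implementation: the function names, the shift schedule, the variance threshold tau, the linear/exponential curve blend, and the disparity range d_min/d_max are all illustrative assumptions, and the depth network is assumed to accept two extra positional channels.

```python
import torch


def spimo_mask(depth_net, image, num_shifts=4, tau=0.05):
    """Sketch of a SPIMO-style moving-object mask.

    Runs the depth network several times, each time with a differently
    shifted positional encoding appended to the input. Static regions
    should yield consistent depth under any shift, while independently
    moving objects (whose apparent motion is not explained by camera
    motion) show high depth variance across the passes.
    """
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, h, device=image.device),
        torch.linspace(-1.0, 1.0, w, device=image.device),
        indexing="ij",
    )
    pos = torch.stack([xs, ys]).unsqueeze(0).expand(b, -1, -1, -1)

    depths = []
    for k in range(num_shifts):
        shift = 0.1 * k  # hypothetical shift schedule
        depths.append(depth_net(torch.cat([image, pos + shift], dim=1)))

    variance = torch.stack(depths, dim=0).var(dim=0)  # per-pixel depth variance
    # High variance suggests an independently moving object; mask it out
    # of the photometric reconstruction loss.
    return (variance > tau).float()


def aggregate_adaptive_disparity(logits, curve, d_min=0.01, d_max=0.3):
    """Sketch of per-pixel adaptive disparity quantization.

    logits: (B, N, H, W) scores over N disparity bins.
    curve:  (B, 1, H, W) values in [0, 1] that blend linear and
            exponential bin spacings per pixel (an assumed
            parameterization of the per-pixel quantization curve).
    """
    n = logits.shape[1]
    t = torch.linspace(0.0, 1.0, n, device=logits.device).view(1, n, 1, 1)

    lin = d_min + (d_max - d_min) * t       # uniform bin spacing
    expo = d_min * (d_max / d_min) ** t     # exponential bin spacing
    levels = curve * lin + (1.0 - curve) * expo  # (B, N, H, W) by broadcasting

    probs = torch.softmax(logits, dim=1)
    # Soft-argmax over the adaptive levels yields a continuous disparity map.
    return (probs * levels).sum(dim=1, keepdim=True)
```

In training, the binary SPIMO mask would gate the photometric loss so that moving-object pixels do not corrupt the rigid-scene supervision, and the aggregated disparity map would serve as the network's continuous depth output; both usages follow the abstract's description rather than any released code.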
Appears in Collection
EE-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.
