DC Field | Value | Language |
---|---|---|
dc.contributor.author | Bello, Juan Luis Gonzalez | ko |
dc.contributor.author | Moon, Jaeho | ko |
dc.contributor.author | Kim, Munchurl | ko |
dc.date.accessioned | 2024-04-17T14:00:31Z | - |
dc.date.available | 2024-04-17T14:00:31Z | - |
dc.date.created | 2024-04-17 | - |
dc.date.issued | 2024-03 | - |
dc.identifier.citation | IEEE TRANSACTIONS ON IMAGE PROCESSING, v.33, pp.2074 - 2089 | - |
dc.identifier.issn | 1057-7149 | - |
dc.identifier.uri | http://hdl.handle.net/10203/319091 | - |
dc.description.abstract | Recently, attempts to learn the underlying 3D structures of a scene from monocular videos in a fully self-supervised fashion have drawn much attention. One of the most challenging aspects of this task is to handle independently moving objects as they break the rigid-scene assumption. In this paper, we show for the first time that pixel positional information can be exploited to learn SVDE (Single View Depth Estimation) from videos. The proposed moving object (MO) masks, which are induced by the depth variance to shifted positional information (SPI) and are referred to as 'SPIMO' masks, are highly robust and consistently remove independently moving objects from the scenes, allowing for robust and consistent learning of SVDE from videos. Additionally, we introduce a new adaptive quantization scheme that assigns the best per-pixel quantization curve for depth discretization, improving the fine granularity and accuracy of the final aggregated depth maps. Finally, we employ existing boosting techniques in a new way that self-supervises moving object depths further. With these features, our pipeline is robust against moving objects and generalizes well to high-resolution images, even when trained with small patches, yielding state-of-the-art (SOTA) results with four- to eight-fold fewer parameters than the previous SOTA techniques that learn from videos. We present extensive experiments on KITTI and CityScapes that show the effectiveness of our method. | - |
dc.language | English | - |
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | - |
dc.title | Self-Supervised Monocular Depth Estimation with Positional Shift Depth Variance and Adaptive Disparity Quantization | - |
dc.type | Article | - |
dc.identifier.wosid | 001188332200014 | - |
dc.identifier.scopusid | 2-s2.0-85187999582 | - |
dc.type.rims | ART | - |
dc.citation.volume | 33 | - |
dc.citation.beginningpage | 2074 | - |
dc.citation.endingpage | 2089 | - |
dc.citation.publicationname | IEEE TRANSACTIONS ON IMAGE PROCESSING | - |
dc.identifier.doi | 10.1109/TIP.2024.3374045 | - |
dc.contributor.localauthor | Kim, Munchurl | - |
dc.description.isOpenAccess | N | - |
dc.type.journalArticle | Article | - |
dc.subject.keywordAuthor | Depth from videos | - |
dc.subject.keywordAuthor | self-supervised | - |
dc.subject.keywordAuthor | monocular depth estimation | - |
dc.subject.keywordAuthor | deep convolutional neural networks | - |