PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss

DC Field | Value | Language
dc.contributor.author | Bello, Juan Luis Gonzalez | ko
dc.contributor.author | Kim, Munchurl | ko
dc.date.accessioned | 2022-12-03T05:01:28Z | -
dc.date.available | 2022-12-03T05:01:28Z | -
dc.date.created | 2022-12-03 | -
dc.date.issued | 2021-06-23 | -
dc.identifier.citation | 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021, pp. 6851-6860 | -
dc.identifier.issn | 1063-6919 | -
dc.identifier.uri | http://hdl.handle.net/10203/301526 | -
dc.description.abstract | In this paper, we propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net. The PLADE-Net is the first work that shows remarkable accuracy levels, exceeding 95% in terms of the δ1 metric on the challenging KITTI dataset. Our PLADE-Net is based on a new network architecture with neural positional encoding and a novel loss function that borrows from the closed-form solution of the matting Laplacian to learn pixel-level accurate depth estimation from stereo images. Neural positional encoding allows our PLADE-Net to obtain more consistent depth estimates by letting the network reason about location-specific image properties such as projection (and potentially lens) distortions. Our novel distilled matting Laplacian loss allows our network to predict sharp depths at object boundaries and more consistent depths in highly homogeneous regions. Our proposed method outperforms all previous self-supervised single-view depth estimation methods by a large margin on the challenging KITTI dataset, with unparalleled levels of accuracy. Furthermore, our PLADE-Net, naively extended for stereo inputs, outperforms the most recent self-supervised stereo methods, even without any advanced blocks like 1D correlations, 3D convolutions, or spatial pyramid pooling. We present extensive ablation studies and experiments that support our method’s effectiveness on the KITTI, CityScapes, and Make3D datasets. | -
dc.language | English | -
dc.publisher | The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) | -
dc.title | PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss | -
dc.type | Conference | -
dc.identifier.wosid | 000739917307007 | -
dc.identifier.scopusid | 2-s2.0-85121320273 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 6851 | -
dc.citation.endingpage | 6860 | -
dc.citation.publicationname | 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 | -
dc.identifier.conferencecountry | US | -
dc.identifier.conferencelocation | Virtual | -
dc.identifier.doi | 10.1109/CVPR46437.2021.00678 | -
dc.contributor.localauthor | Kim, Munchurl | -
Appears in Collection
EE-Conference Papers (Academic Conference Papers)