Dense Pixel-Level Interpretation of Dynamic Scenes With Video Panoptic Segmentation

Cited 3 times in Web of Science · Cited 0 times in Scopus
  • Hit: 441
  • Download: 0
DC Field: Value (Language)
dc.contributor.author: Kim, Dahun (ko)
dc.contributor.author: Woo, Sanghyun (ko)
dc.contributor.author: Lee, Joon-Young (ko)
dc.contributor.author: Kweon, In So (ko)
dc.date.accessioned: 2022-09-06T03:00:51Z
dc.date.available: 2022-09-06T03:00:51Z
dc.date.created: 2022-09-06
dc.date.issued: 2022
dc.identifier.citation: IEEE TRANSACTIONS ON IMAGE PROCESSING, v.31, pp.5383 - 5395
dc.identifier.issn: 1057-7149
dc.identifier.uri: http://hdl.handle.net/10203/298370
dc.description.abstract: A holistic understanding of dynamic scenes is of fundamental importance in real-world computer vision problems such as autonomous driving, augmented reality, and spatio-temporal reasoning. In this paper, we propose a new computer vision benchmark: Video Panoptic Segmentation (VPS). To study this important problem, we present two datasets, Cityscapes-VPS and VIPER, together with a new evaluation metric, video panoptic quality (VPQ). We also propose VPSNet++, an advanced video panoptic segmentation network, which simultaneously performs classification, detection, segmentation, and tracking of all identities in videos. Specifically, VPSNet++ builds upon a top-down panoptic segmentation network by adding a pixel-level feature fusion head and an object-level association head. The former temporally augments the pixel features, while the latter performs object tracking. Furthermore, we propose panoptic boundary learning as an auxiliary task, and instance discrimination learning, which learns spatio-temporally clustered pixel embeddings for individual thing or stuff regions, i.e., exactly the objective of the video panoptic segmentation problem. Our VPSNet++ significantly outperforms the default VPSNet, i.e., the FuseTrack baseline, and achieves state-of-the-art results on both the Cityscapes-VPS and VIPER datasets. The datasets, metric, and models are publicly available at https://github.com/mcahny/vps.
dc.language: English
dc.publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
dc.title: Dense Pixel-Level Interpretation of Dynamic Scenes With Video Panoptic Segmentation
dc.type: Article
dc.identifier.wosid: 000842776300009
dc.identifier.scopusid: 2-s2.0-85133760600
dc.type.rims: ART
dc.citation.volume: 31
dc.citation.beginningpage: 5383
dc.citation.endingpage: 5395
dc.citation.publicationname: IEEE TRANSACTIONS ON IMAGE PROCESSING
dc.identifier.doi: 10.1109/TIP.2022.3183440
dc.contributor.localauthor: Kweon, In So
dc.contributor.nonIdAuthor: Lee, Joon-Young
dc.description.isOpenAccess: N
dc.type.journalArticle: Article
dc.subject.keywordAuthor: Task analysis
dc.subject.keywordAuthor: Image segmentation
dc.subject.keywordAuthor: Measurement
dc.subject.keywordAuthor: Electron tubes
dc.subject.keywordAuthor: Semantics
dc.subject.keywordAuthor: Head
dc.subject.keywordAuthor: Benchmark testing
dc.subject.keywordAuthor: Video panoptic segmentation
dc.subject.keywordAuthor: panoptic segmentation
dc.subject.keywordAuthor: video instance segmentation
dc.subject.keywordAuthor: video semantic segmentation
dc.subject.keywordAuthor: scene parsing
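The abstract's VPQ metric extends the standard panoptic quality (PQ) score from single images to tubes of segments spanning a temporal window. The exact tube-matching procedure is defined in the paper, not here; as a rough, hypothetical illustration only, the per-class PQ computation that VPQ builds on can be sketched as follows (function name and example IoU values are assumptions for illustration):

```python
def panoptic_quality(ious, num_fp, num_fn, match_threshold=0.5):
    """Per-class panoptic quality (PQ), the image-level score VPQ generalizes.

    ious: IoU values of candidate predicted/ground-truth segment pairs;
          pairs with IoU above match_threshold count as true positives (TP).
    num_fp: number of unmatched predicted segments (false positives).
    num_fn: number of unmatched ground-truth segments (false negatives).
    PQ = (sum of matched IoUs) / (|TP| + 0.5*|FP| + 0.5*|FN|)
    """
    matched = [iou for iou in ious if iou > match_threshold]
    denom = len(matched) + 0.5 * num_fp + 0.5 * num_fn
    return sum(matched) / denom if denom > 0 else 0.0

# Hypothetical example: two matched segments, one false positive, one false negative.
pq = panoptic_quality([0.9, 0.8], num_fp=1, num_fn=1)
# (0.9 + 0.8) / (2 + 0.5 + 0.5) = 0.5666...
```

Per the abstract's description, VPQ applies this matching to spatio-temporal tubes over a window of frames and averages across window sizes; with a window of one frame the computation reduces to the image-level PQ above.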
Appears in Collection
EE-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.
This item is cited by other documents in WoS
