Pivot Correlational Neural Network for Multimodal Video Categorization

DC Field | Value | Language
dc.contributor.author | Kang, Sunghun | ko
dc.contributor.author | Kim, Junyeong | ko
dc.contributor.author | Choi, Hyunsoo | ko
dc.contributor.author | Kim, Sungjin | ko
dc.contributor.author | Yoo, Chang-Dong | ko
dc.date.accessioned | 2018-12-20T02:03:53Z | -
dc.date.available | 2018-12-20T02:03:53Z | -
dc.date.created | 2018-11-30 | -
dc.date.issued | 2018-09-13 | -
dc.identifier.citation | European Conference on Computer Vision, pp.402 - 417 | -
dc.identifier.uri | http://hdl.handle.net/10203/247347 | -
dc.description.abstract | This paper considers an architecture for multimodal video categorization referred to as Pivot Correlational Neural Network (Pivot CorrNN). The architecture consists of modal-specific streams dedicated exclusively to one specific modal input as well as modal-agnostic pivot stream that considers all modal inputs without distinction, and the architecture tries to refine the pivot prediction based on modal-specific predictions. The Pivot CorrNN consists of three modules: (1) maximizing pivot-correlation module that maximizes the correlation between the hidden states as well as the predictions of the modal-agnostic pivot stream and modal-specific streams in the network, (2) contextual Gated Recurrent Unit (cGRU) module which extends the capability of a generic GRU to take multimodal inputs in updating the pivot hidden-state, and (3) adaptive aggregation module that aggregates all modal-specific predictions as well as the modal-agnostic pivot predictions into one final prediction. We evaluate the Pivot CorrNN on two publicly available large-scale multimodal video categorization datasets, FCVID and YouTube-8M. From the experimental results, Pivot CorrNN achieves the best performance on the FCVID database and performance comparable to the state-of-the-art on YouTube-8M database. | -
dc.language | English | -
dc.publisher | Springer International Publishing | -
dc.title | Pivot Correlational Neural Network for Multimodal Video Categorization | -
dc.type | Conference | -
dc.identifier.scopusid | 2-s2.0-85055721545 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 402 | -
dc.citation.endingpage | 417 | -
dc.citation.publicationname | European Conference on Computer Vision | -
dc.identifier.conferencecountry | GE | -
dc.identifier.conferencelocation | GASTEIG Cultural Center, Munich | -
dc.identifier.doi | 10.1007/978-3-030-01264-9_24 | -
dc.contributor.localauthor | Yoo, Chang-Dong | -
dc.contributor.nonIdAuthor | Choi, Hyunsoo | -
dc.contributor.nonIdAuthor | Kim, Sungjin | -
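
Illustrative note on the abstract: the record gives no formulas, so the following is only a rough sketch of two ideas the abstract names, namely maximizing the correlation between the pivot stream and the modal-specific streams, and aggregating all stream predictions into one final prediction. Everything concrete here is an assumption made for illustration and is not taken from the paper: the use of Pearson correlation, the softmax weighting, and the function names `pearson_correlation`, `pivot_correlation_loss`, and `aggregate_predictions`. The cGRU module is not sketched.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length sequences of floats."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    # A constant sequence has no defined correlation; report 0.0 in that case.
    return cov / denom if denom > 0 else 0.0


def pivot_correlation_loss(pivot, modal_streams):
    """Assumed loss: negative sum of pivot-to-stream correlations.

    Minimizing this value maximizes the correlation between the pivot
    stream and every modal-specific stream.
    """
    return -sum(pearson_correlation(pivot, m) for m in modal_streams)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_predictions(predictions, confidence_scores):
    """Assumed aggregation: softmax-weighted average of per-stream predictions.

    predictions: one class-probability list per stream (pivot included).
    confidence_scores: one unnormalized score per stream.
    """
    weights = softmax(confidence_scores)
    num_classes = len(predictions[0])
    return [
        sum(w * p[c] for w, p in zip(weights, predictions))
        for c in range(num_classes)
    ]
```

In this sketch a stream whose activations move together with the pivot lowers the loss, and a stream with a higher confidence score contributes more to the final prediction; how the paper actually defines either quantity should be checked against the published text (DOI 10.1007/978-3-030-01264-9_24).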
Appears in Collection
EE-Conference Papers(학술회의논문)
Files in This Item
There are no files associated with this item.
