DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kang, Sunghun | ko |
dc.contributor.author | Kim, Junyeong | ko |
dc.contributor.author | Choi, Hyunsoo | ko |
dc.contributor.author | Kim, Sungjin | ko |
dc.contributor.author | Yoo, Chang-Dong | ko |
dc.date.accessioned | 2018-12-20T02:03:53Z | - |
dc.date.available | 2018-12-20T02:03:53Z | - |
dc.date.created | 2018-11-30 | - |
dc.date.issued | 2018-09-13 | - |
dc.identifier.citation | European Conference on Computer Vision, pp. 402-417 | - |
dc.identifier.uri | http://hdl.handle.net/10203/247347 | - |
dc.description.abstract | This paper considers an architecture for multimodal video categorization referred to as the Pivot Correlational Neural Network (Pivot CorrNN). The architecture consists of modal-specific streams, each dedicated exclusively to one modal input, as well as a modal-agnostic pivot stream that considers all modal inputs without distinction, and it refines the pivot prediction based on the modal-specific predictions. The Pivot CorrNN consists of three modules: (1) a maximizing pivot-correlation module that maximizes the correlation between the hidden states as well as the predictions of the modal-agnostic pivot stream and the modal-specific streams in the network, (2) a contextual Gated Recurrent Unit (cGRU) module that extends a generic GRU to take multimodal inputs when updating the pivot hidden state, and (3) an adaptive aggregation module that aggregates all modal-specific predictions as well as the modal-agnostic pivot prediction into one final prediction. We evaluate the Pivot CorrNN on two publicly available large-scale multimodal video categorization datasets, FCVID and YouTube-8M. In the experiments, Pivot CorrNN achieves the best performance on the FCVID dataset and performance comparable to the state of the art on the YouTube-8M dataset. | - |
dc.language | English | - |
dc.publisher | Springer International Publishing | - |
dc.title | Pivot Correlational Neural Network for Multimodal Video Categorization | - |
dc.type | Conference | - |
dc.identifier.scopusid | 2-s2.0-85055721545 | - |
dc.type.rims | CONF | - |
dc.citation.beginningpage | 402 | - |
dc.citation.endingpage | 417 | - |
dc.citation.publicationname | European Conference on Computer Vision | - |
dc.identifier.conferencecountry | DE | - |
dc.identifier.conferencelocation | GASTEIG Cultural Center, Munich | - |
dc.identifier.doi | 10.1007/978-3-030-01264-9_24 | - |
dc.contributor.localauthor | Yoo, Chang-Dong | - |
dc.contributor.nonIdAuthor | Choi, Hyunsoo | - |
dc.contributor.nonIdAuthor | Kim, Sungjin | - |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
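The contextual GRU (cGRU) described in the abstract extends a generic GRU so that the pivot hidden state is updated from all modality inputs at once. The following is a minimal NumPy sketch of that idea only, not the paper's exact formulation: the class name, weight layout, and the choice of summing per-modality input projections before the standard GRU gating are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ContextualGRUCell:
    """Hypothetical cGRU sketch: a GRU cell whose gates are driven by
    several modality inputs (e.g. visual + audio) summed together."""

    def __init__(self, input_dims, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_dim = hidden_dim

        def proj():
            # One input projection per modality, per gate (assumed layout).
            return [0.1 * rng.standard_normal((hidden_dim, d)) for d in input_dims]

        self.W_z, self.W_r, self.W_h = proj(), proj(), proj()
        self.U_z, self.U_r, self.U_h = (
            0.1 * rng.standard_normal((hidden_dim, hidden_dim)) for _ in range(3)
        )

    def step(self, xs, h):
        """One pivot-state update; xs is a list of per-modality vectors."""
        mix = lambda Ws: sum(W @ x for W, x in zip(Ws, xs))
        z = sigmoid(mix(self.W_z) + self.U_z @ h)   # update gate
        r = sigmoid(mix(self.W_r) + self.U_r @ h)   # reset gate
        h_tilde = np.tanh(mix(self.W_h) + self.U_h @ (r * h))
        return (1.0 - z) * h + z * h_tilde          # new pivot hidden state


# Toy usage: a 2048-d "visual" and a 128-d "audio" feature per frame
# (dimensions chosen arbitrarily for the example).
cell = ContextualGRUCell(input_dims=[2048, 128], hidden_dim=64)
rng = np.random.default_rng(42)
h = np.zeros(64)
for _ in range(5):  # five video frames
    xs = [rng.standard_normal(2048), rng.standard_normal(128)]
    h = cell.step(xs, h)
```

Because the new state is a convex combination of the previous state and a `tanh` candidate, the pivot state stays bounded while still being conditioned on every modality at each step.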