Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning

DC Field | Value | Language
dc.contributor.author | Kim, Dong-Jin | ko
dc.contributor.author | Choi, Jinsoo | ko
dc.contributor.author | Oh, Tae-Hyun | ko
dc.contributor.author | Kweon, In-So | ko
dc.date.accessioned | 2019-11-28T08:26:22Z | -
dc.date.available | 2019-11-28T08:26:22Z | -
dc.date.created | 2019-11-26 | -
dc.date.issued | 2019-06-19 | -
dc.identifier.citation | IEEE Conference on Computer Vision and Pattern Recognition, pp. 6264-6273 | -
dc.identifier.uri | http://hdl.handle.net/10203/268690 | -
dc.description.abstract | Our goal in this work is to train an image captioning model that generates more dense and informative captions. We introduce "relational captioning," a novel image captioning task which aims to generate multiple captions with respect to relational information between objects in an image. Relational captioning is a framework that is advantageous in both diversity and amount of information, leading to image understanding based on relationships. Part-of-speech (POS, i.e. subject-object-predicate categories) tags can be assigned to every English word. We leverage the POS as a prior to guide the correct sequence of words in a caption. To this end, we propose a multi-task triple-stream network (MTTSNet) which consists of three recurrent units for the respective POS and jointly performs POS prediction and captioning. We demonstrate more diverse and richer representations generated by the proposed model against several baselines and competing methods. | -
dc.language | English | -
dc.publisher | IEEE Conference on Computer Vision and Pattern Recognition | -
dc.title | Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning | -
dc.type | Conference | -
dc.identifier.wosid | 000529484006047 | -
dc.identifier.scopusid | 2-s2.0-85066479484 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 6264 | -
dc.citation.endingpage | 6273 | -
dc.citation.publicationname | IEEE Conference on Computer Vision and Pattern Recognition | -
dc.identifier.conferencecountry | US | -
dc.identifier.conferencelocation | Long Beach, CA | -
dc.identifier.doi | 10.1109/CVPR.2019.00643 | -
dc.contributor.localauthor | Kweon, In-So | -
dc.contributor.nonIdAuthor | Kim, Dong-Jin | -
Appears in Collection
EE-Conference Papers (Academic Conference Papers)