Phrase-frames alignment network with contrastive attention loss for video description (Korean title: Video description through semantics-centered phrase-frame alignment and a contrastive attention loss)

This thesis presents a video captioning network, the Phrase-Frames Alignment Network (PFAN), that addresses the information redundancy of successively sampled frames, a problem prevalent in most video captioning algorithms. Because consecutive sampled frames are unlikely to each provide unique information, prior methods have focused on encoding a compact representation of the input video, for example by using a hierarchical encoder or by learning to sample informative frames. The PFAN instead encodes the input video compactly by using not only the visual features of the frames but also the semantics of the partially decoded caption. The PFAN (1) forms semantic groups by aligning each video frame feature with discriminating word phrases of the partially decoded caption and then (2) decodes the semantic groups to predict the next word of the partially decoded caption. In contrast to prior methods, the continuous feedback from decoded words enables the PFAN to dynamically update the video representation so that it adapts to the partially decoded caption. Furthermore, a contrastive attention loss is proposed to facilitate accurate alignment between word phrases and video frame features without requiring any manual annotations. The PFAN achieves state-of-the-art performance, outperforming the runner-up methods by margins of 2.1% and 2.4% in CIDEr-D score on the MSVD and MSR-VTT datasets, respectively. Extensive experiments are conducted to demonstrate the effectiveness and interpretability of the PFAN.
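The alignment step (1) described in the abstract can be illustrated with a minimal sketch. It assumes plain dot-product attention with a softmax over frames, which the abstract does not specify; the function names `softmax`, `dot`, and `form_semantic_groups` and the toy feature vectors are hypothetical and are not taken from the thesis. Each phrase feature attends over all frame features, and the attention-weighted sum of frames is treated as that phrase's semantic group.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    # Dot product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def form_semantic_groups(phrase_feats, frame_feats):
    """Hypothetical alignment: one semantic group per phrase.

    Assumption: alignment scores are dot products between phrase and
    frame features; the thesis may use a different scoring function.
    Returns (groups, weights), where weights[i][t] is the attention
    of phrase i on frame t.
    """
    dim = len(frame_feats[0])
    groups, weights = [], []
    for p in phrase_feats:
        w = softmax([dot(p, f) for f in frame_feats])
        # Attention-weighted sum of the frame features.
        g = [sum(w[t] * frame_feats[t][d] for t in range(len(frame_feats)))
             for d in range(dim)]
        groups.append(g)
        weights.append(w)
    return groups, weights

# Toy data: three frames (two nearly redundant) and two phrases.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
phrases = [[2.0, 0.0], [0.0, 2.0]]
groups, weights = form_semantic_groups(phrases, frames)
```

In this toy setting the two near-duplicate frames are pooled into the group of the first phrase, so the decoder would see two group vectors instead of three frame vectors. The contrastive attention loss mentioned in the abstract is not sketched here because the abstract gives no formulation for it.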
Advisors
Yoo, Chang Dong (유창동)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2020
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2020.8, [iv, 25 p.]

Keywords

Deep Learning; Computer Vision; Video Captioning; Multi-Modal Alignment; Contrastive Attention Mechanism

URI
http://hdl.handle.net/10203/285058
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=925222&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
