Attentive visual semantic embedding with multiple self-attention (Korean title: Semantic embedding of images using a multiple self-attention approach)

Visual-semantic embedding enables various tasks such as image-text retrieval, image captioning, and visual question answering. The key to successful visual-semantic embedding is to express visual and textual data properly by accounting for their intricate relationship. While previous studies have made considerable progress by encoding visual and textual data into a joint space where similar concepts are closely located, they often represent each datum by a single vector, ignoring the presence of multiple important components in an image or text. Thus, in addition to the joint embedding space, we propose a novel multi-view self-attention network that captures various components of visual and textual data by attending to important parts of the data. Our approach achieves new state-of-the-art results on image-text retrieval tasks on the MS-COCO and Flickr30K datasets. Through visualization of the attention maps, which capture distinct semantic components at multiple positions in the image and the text, we demonstrate that our method yields an effective and interpretable visual-semantic joint space.
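The multi-view self-attention described above can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: it assumes the common multi-hop formulation (softmax over `W2 · tanh(W1 · Hᵀ)`, in the style of structured self-attentive embeddings), and the names `H`, `W1`, `W2`, and `multi_view_self_attention` are hypothetical. Each of the `r` attention "hops" produces its own weighting over the input features, so an image or sentence is summarized by several vectors rather than one.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_view_self_attention(H, W1, W2):
    """Sketch of multi-hop self-attention (names are illustrative).

    H  : (n, d)  -- n region/word features of dimension d
    W1 : (da, d) -- first projection, hidden size da
    W2 : (r, da) -- second projection, r = number of attention views
    Returns (r, d): one attended embedding per view.
    """
    A = softmax(W2 @ np.tanh(W1 @ H.T), axis=-1)  # (r, n): r attention maps over n parts
    return A @ H                                   # (r, d): r view-specific embeddings

rng = np.random.default_rng(0)
n, d, da, r = 5, 8, 4, 3
H = rng.standard_normal((n, d))
M = multi_view_self_attention(H,
                              rng.standard_normal((da, d)),
                              rng.standard_normal((r, da)))
print(M.shape)  # one embedding vector per attention view
```

Because each row of the attention matrix is an independent softmax over the same input parts, different views are free to focus on different semantic components, which is what makes the attention maps interpretable.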
Advisors
Kim, Daeshik (김대식)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
KAIST (Korea Advanced Institute of Science and Technology)
Issue Date
2020
Identifier
325007
Language
eng
Description

Master's thesis - KAIST: School of Electrical Engineering, 2020.2, [iii, 25 p.]

Keywords

Deep Learning; Vision and Language; Image-Text Matching; Visual-Semantic Embedding; Self-Attention; Multihop Self-Attention (Korean: Deep Learning; Computer Vision; Image-Text Understanding; Attention; Self-Attention)

URI
http://hdl.handle.net/10203/284718
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=911327&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
