MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding

Visual-semantic embedding enables various tasks such as image-text retrieval, image captioning, and visual question answering. The key to successful visual-semantic embedding is to represent visual and textual data properly by accounting for their intricate relationship. While previous studies have made significant progress by encoding visual and textual data into a joint space where similar concepts are closely located, they often represent the data by a single vector, ignoring the presence of multiple important components in an image or text. Thus, in addition to the joint embedding space, we propose a novel multi-head self-attention network that captures various components of visual and textual data by attending to important parts of the data. Our approach achieves new state-of-the-art results in image-text retrieval tasks on the MS-COCO and Flickr30K datasets. Through visualization of the attention maps, which capture distinct semantic components at multiple positions in the image and the text, we demonstrate that our method yields an effective and interpretable visual-semantic joint space.
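The sketch below illustrates the general idea of multi-head self-attention pooling described in the abstract: a variable-length set of features (e.g., image region features or word embeddings) is reduced to several embedding vectors, one per attention head, so that each head can focus on a different semantic component. This is a minimal, hypothetical illustration in PyTorch, not the authors' released code; the module name, layer sizes, and normalization choices are assumptions for demonstration only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttentionPooling(nn.Module):
    """Pools a set of features into `num_heads` embeddings, one per head.
    (Illustrative sketch; not the paper's exact architecture.)"""
    def __init__(self, feat_dim, hidden_dim, num_heads):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden_dim)    # project features to a hidden space
        self.score = nn.Linear(hidden_dim, num_heads)  # one attention score per head

    def forward(self, feats):
        # feats: (batch, n_items, feat_dim), e.g. region or word features
        scores = self.score(torch.tanh(self.proj(feats)))   # (batch, n_items, num_heads)
        attn = F.softmax(scores, dim=1)                      # attention over items, per head
        # Weighted sum per head -> (batch, num_heads, feat_dim)
        heads = torch.einsum('bnh,bnd->bhd', attn, feats)
        return F.normalize(heads, dim=-1), attn              # L2-normalize for cosine similarity

# Example: 36 image regions with 2048-d features -> 8 head embeddings
pool = MultiHeadSelfAttentionPooling(feat_dim=2048, hidden_dim=512, num_heads=8)
regions = torch.randn(4, 36, 2048)
embeds, attn = pool(regions)
print(embeds.shape, attn.shape)  # torch.Size([4, 8, 2048]) torch.Size([4, 36, 8])

The returned attention weights can be visualized over image regions or words, which is how attention maps such as those mentioned in the abstract are typically inspected.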
Publisher
IEEE COMPUTER SOC
Issue Date
2020-03
Language
English
Citation
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1507-1515
ISSN
2472-6737
DOI
10.1109/WACV45572.2020.9093548
URI
http://hdl.handle.net/10203/288395
Appears in Collection
EE-Conference Papers (Conference Papers)