Explainable image caption generator using attention and Bayesian inference

Abstract
In this thesis, we propose an explainable image caption generation model that applies an attention mechanism and Bayesian inference. Image captioning is the task of generating a textual description of a given image. Traditional studies addressed this task by directly combining computer vision techniques with natural language processing. Because deep learning has shown strong performance in various applications, recent studies have applied it to image captioning, improving on traditional approaches. However, these models cannot reflect the important objects in a given image when generating captions, because they simply learn the direct correlation between the image and the corresponding ground-truth caption. Moreover, owing to the limited interpretability of deep learning, they cannot explain why specific words are selected in the generated caption. To overcome these limitations, we propose a novel image captioning model, the Explainable Image Caption Generator, which generates a caption for a given image by reflecting specific objects in the image and provides evidence explaining why specific words are generated. Our model is composed of two parts: a generation part, which generates the caption for a given image, and an explanation part, which produces an image-sentence relevance loss that guides the generation part to capture the important objects in the image and reflect them during training. Furthermore, the generation part provides a correlation matrix between extracted regions and generated words, which can be used to visualize the evidence for the words in the generated caption. We evaluate our model on three benchmark datasets: MSCOCO, Flickr8K, and Flickr30K. Qualitative results demonstrate the effectiveness of the explanations, and quantitative results on the generated captions show that the proposed model outperforms traditional approaches.
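The abstract describes the generation part's region-word correlation matrix only at a high level, and this record contains no implementation details. The following is a minimal, purely illustrative PyTorch sketch of how such a words-by-regions evidence matrix typically arises in attention-based captioning: at each decoding step the model attends over extracted region features and records the attention weights. The class, dimensions, and parameter names (AttentionCaptionDecoder, region_dim, and so on) are hypothetical stand-ins, not the thesis's actual model, and the explanation part's image-sentence relevance loss is omitted.

```python
# Illustrative sketch only -- NOT the thesis's implementation.
# Shows how an attention-based caption decoder can record, for each
# generated word, its attention weights over extracted image regions,
# yielding a (words x regions) evidence matrix.
import torch
import torch.nn as nn


class AttentionCaptionDecoder(nn.Module):
    """Hypothetical decoder: attends over R region features at each
    step and stacks the attention weights into a words-x-regions matrix."""

    def __init__(self, vocab_size, region_dim=2048, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attn = nn.Linear(hidden_dim + region_dim, 1)  # additive-style score
        self.rnn = nn.GRUCell(embed_dim + region_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, regions, captions):
        # regions: (B, R, region_dim) extracted region features
        # captions: (B, T) token ids (teacher forcing)
        B, R, _ = regions.shape
        h = regions.new_zeros(B, self.rnn.hidden_size)
        logits, attn_rows = [], []
        for t in range(captions.size(1)):
            # score each region against the current hidden state
            h_exp = h.unsqueeze(1).expand(-1, R, -1)
            scores = self.attn(torch.cat([h_exp, regions], dim=-1)).squeeze(-1)
            alpha = torch.softmax(scores, dim=-1)              # (B, R)
            context = (alpha.unsqueeze(-1) * regions).sum(1)   # weighted region sum
            x = torch.cat([self.embed(captions[:, t]), context], dim=-1)
            h = self.rnn(x, h)
            logits.append(self.out(h))
            attn_rows.append(alpha)
        # evidence matrix: attention weight of each region for each word
        return torch.stack(logits, dim=1), torch.stack(attn_rows, dim=1)


if __name__ == "__main__":
    dec = AttentionCaptionDecoder(vocab_size=1000)
    regions = torch.randn(2, 36, 2048)        # e.g. 36 detected regions per image
    caps = torch.randint(0, 1000, (2, 12))    # 12 caption tokens
    logits, attn = dec(regions, caps)
    print(attn.shape)  # (2, 12, 36): per-word attention over regions
```

Stacking the per-step weights yields a (words x regions) matrix; visualizing row t over the corresponding image regions indicates which regions served as evidence for the t-th generated word, which is the kind of visualization the abstract describes.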
Advisors
Choi, Ho Jin (최호진)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2018
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2018.8, [iv, 40 p.]

Keywords

Image captioning; attention model; generative model; object detection; Bayesian inference

URI
http://hdl.handle.net/10203/267108
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=828611&flag=dissertation
Appears in Collection
CS-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
