Explainable image caption generator using attention and Bayesian inference

Abstract
In this thesis, we propose an explainable image caption generation model that applies an attention mechanism and Bayesian inference. Image captioning is the task of generating a textual description of a given image. Traditional studies addressed this task by directly combining computer vision techniques with natural language processing. Because deep learning has shown strong performance in various applications, recent studies have applied it to image captioning, improving on traditional approaches. However, these models cannot reflect the important objects in a given image when generating captions, because they simply learn the direct correlation between the image and the corresponding ground-truth caption. Moreover, owing to the limited interpretability of deep learning, they cannot explain why specific words are selected in the generated caption. To overcome these limitations, we propose a novel image captioning model, the Explainable Image Caption Generator, which generates a caption for a given image by reflecting specific objects in the image and provides evidence explaining why specific words are generated. Our model is composed of two parts: a generation part, which generates the caption for a given image, and an explanation part, which produces an image-sentence relevance loss that guides the generation part to capture the important objects in the image and reflect them during training. Furthermore, the generation part provides a correlation matrix between extracted regions and generated words, which can be used to visualize the evidence for the words in the generated caption. We evaluate our model on three benchmark datasets: MSCOCO, Flickr8K, and Flickr30K. Qualitative results demonstrate the effectiveness of the explanations, and quantitative results on the generated captions show that the proposed model outperforms traditional approaches.
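The abstract describes the generation part's region-word correlation matrix only at a high level, and this record contains no implementation details. The following is a minimal, purely illustrative PyTorch sketch of how such a words-by-regions evidence matrix typically arises in attention-based captioning: at each decoding step the model attends over extracted region features and records the attention weights. The class, dimensions, and parameter names (AttentionCaptionDecoder, region_dim, and so on) are hypothetical stand-ins, not the thesis's actual model, and the explanation part's image-sentence relevance loss is omitted.

```python
# Illustrative sketch only -- NOT the thesis's implementation.
# Shows how an attention-based caption decoder can record, for each
# generated word, its attention weights over extracted image regions,
# yielding a (words x regions) evidence matrix.
import torch
import torch.nn as nn


class AttentionCaptionDecoder(nn.Module):
    """Hypothetical decoder: attends over R region features at each
    step and stacks the attention weights into a words-x-regions matrix."""

    def __init__(self, vocab_size, region_dim=2048, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attn = nn.Linear(hidden_dim + region_dim, 1)  # additive-style score
        self.rnn = nn.GRUCell(embed_dim + region_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, regions, captions):
        # regions: (B, R, region_dim) extracted region features
        # captions: (B, T) token ids (teacher forcing)
        B, R, _ = regions.shape
        h = regions.new_zeros(B, self.rnn.hidden_size)
        logits, attn_rows = [], []
        for t in range(captions.size(1)):
            # score each region against the current hidden state
            h_exp = h.unsqueeze(1).expand(-1, R, -1)
            scores = self.attn(torch.cat([h_exp, regions], dim=-1)).squeeze(-1)
            alpha = torch.softmax(scores, dim=-1)              # (B, R)
            context = (alpha.unsqueeze(-1) * regions).sum(1)   # weighted region sum
            x = torch.cat([self.embed(captions[:, t]), context], dim=-1)
            h = self.rnn(x, h)
            logits.append(self.out(h))
            attn_rows.append(alpha)
        # evidence matrix: attention weight of each region for each word
        return torch.stack(logits, dim=1), torch.stack(attn_rows, dim=1)


if __name__ == "__main__":
    dec = AttentionCaptionDecoder(vocab_size=1000)
    regions = torch.randn(2, 36, 2048)        # e.g. 36 detected regions per image
    caps = torch.randint(0, 1000, (2, 12))    # 12 caption tokens
    logits, attn = dec(regions, caps)
    print(attn.shape)  # (2, 12, 36): per-word attention over regions
```

Stacking the per-step weights yields a (words x regions) matrix; visualizing row t over the corresponding image regions indicates which regions served as evidence for the t-th generated word, which is the kind of visualization the abstract describes.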
Advisors
Choi, Ho Jin (최호진)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2018
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2018.8, [iv, 40 p.]

Keywords

Image captioning; attention model; generative model; object detection; Bayesian inference

URI
http://hdl.handle.net/10203/267108
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=828611&flag=dissertation
Appears in Collection
CS-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
