Besra: Self-correction for hallucination mitigation in large vision-language models

Large Vision-Language Models (LVLMs) have revolutionized the field of computer vision by unifying various computer vision tasks through their ability to comprehend visual information. However, they often suffer from hallucination, generating descriptions that are inconsistent with the input images. This paper introduces Besra, a Large Vision-Language Model designed to address hallucination by incorporating a self-correction task. Besra leverages its iterative refinement capability to improve the consistency of generated sentences with the provided images. The model iteratively refines descriptions by re-feeding them alongside the corresponding images, which facilitates a detailed examination of specific image regions. Besra-Self-Correction-30K, a proposed dataset, trains Besra's self-correction ability by inducing corrections based on predictions from a baseline LVLM. The approach aims to mitigate hallucination, enabling Besra to generate more accurate and contextually relevant descriptions through active scrutiny of the image. We evaluate Besra on the POPE and MME benchmarks and show that the self-correction task is effective for hallucination mitigation.
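The full thesis text is not part of this record, but the iterative refinement loop described in the abstract might look roughly like the sketch below. Here `lvlm_generate` is a hypothetical stand-in for Besra's actual inference interface, and the prompt wording, iteration count, and stopping criterion are illustrative assumptions rather than the author's exact method.

```python
# Minimal sketch of the self-correction loop described in the abstract.
# `lvlm_generate` is a hypothetical (image, prompt) -> text interface;
# prompts and the number of rounds are illustrative assumptions.

from typing import Callable


def self_correct(
    image,                                           # input image (format depends on the model)
    lvlm_generate: Callable[[object, str], str],     # hypothetical LVLM inference call
    num_rounds: int = 2,
) -> str:
    """Re-feed the current description with the image so the model can
    re-examine specific regions and revise claims not grounded in the image."""
    # Initial description generated from the image alone.
    description = lvlm_generate(image, "Describe this image in detail.")

    for _ in range(num_rounds):
        # Re-feed the previous description alongside the image and ask the
        # model to correct statements that are not supported by the image.
        correction_prompt = (
            "Here is a candidate description of the image:\n"
            f"{description}\n"
            "Check each claim against the image and rewrite the description, "
            "removing or fixing anything that is not actually visible."
        )
        description = lvlm_generate(image, correction_prompt)

    return description
```

In this reading, training on Besra-Self-Correction-30K would teach the model to handle the second prompt style, where a possibly hallucinated description from a baseline LVLM is given and the corrected description is the target output.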
Advisors
노용만 (Yong Man Ro)
Description
Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology, School of Electrical Engineering, 2024.2, [iii, 22 p.]

Keywords

Large vision-language model; Hallucination; Self-correction; Besra; Besra-self-correction-30K

URI
http://hdl.handle.net/10203/321570
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1096788&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
