DSpace at KOASAS: LLM-based framework for generating barrier-free audio descriptions

LLM-based framework for generating barrier-free audio descriptions
대형언어모델 기반 배리어프리 화면해설 자동 생성 프레임워크

DC Field	Value	Language
dc.contributor.advisor	Han, Dongsu (한동수)	-
dc.contributor.author	Park, Jaehyeong	-
dc.contributor.author	박재형	-
dc.date.accessioned	2024-07-30T19:31:46Z	-
dc.date.available	2024-07-30T19:31:46Z	-
dc.date.issued	2024	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1097268&flag=dissertation	en_US
dc.identifier.uri	http://hdl.handle.net/10203/321688	-
dc.description	Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2024.2, [iv, 23 p.]	-
dc.description.abstract	We present a framework for generating barrier-free audio descriptions (AD) for visually impaired people. A barrier-free audio description is a narration that conveys the key visual information in video media, such as movies and dramas, to visually impaired audiences. As visual media grows in importance, AD plays a significant role in preventing the social exclusion of visually impaired people. AD generation is more challenging than general video captioning because each description must reflect the overall context of the movie and refer to characters by name. We leverage the rich contextual information in movie scripts and the capabilities of a multimodal LLM to generate audio descriptions. The framework first identifies the parts of the movie script relevant to the video to be described and extracts from them the narrative context and the names of the characters. It then incorporates this information into the video descriptions using the multimodal LLM. Our framework produces higher-quality audio descriptions than previous work, without the need for additional training.	-
dc.language	eng	-
dc.publisher	Korea Advanced Institute of Science and Technology (KAIST)	-
dc.subject	배리어프리 화면해설; 대형 언어 모델; 영화 대본; 영상 캡션 생성	-
dc.subject	Barrier-free audio description; Large language model; Movie script; Video captioning	-
dc.title	LLM-based framework for generating barrier-free audio descriptions	-
dc.title.alternative	대형언어모델 기반 배리어프리 화면해설 자동 생성 프레임워크	-
dc.type	Thesis(Master)	-
dc.identifier.CNRN	325007	-
dc.description.department	Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering	-
dc.contributor.alternativeauthor	Han, Dongsu	-
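
The pipeline outlined in the abstract (locate the relevant script passage, extract context and character names, then condition the multimodal LLM on them) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the record does not say how script passages are matched or how the prompt is worded, so this sketch matches a clip to a scene by word overlap with the clip's dialogue, treats all-caps lines as speaker cues, and only builds a text prompt rather than calling any model. The names `ScriptScene`, `find_relevant_scene`, `extract_character_names`, and `build_prompt` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class ScriptScene:
    heading: str  # scene heading, e.g. "INT. KITCHEN - DAY"
    text: str     # speaker cues and dialogue lines of the scene


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def find_relevant_scene(scenes, clip_dialogue):
    """Assumed matching rule: pick the scene with the highest Jaccard
    overlap between its words and the dialogue heard in the clip."""
    clip_tokens = tokenize(clip_dialogue)

    def score(scene):
        scene_tokens = tokenize(scene.text)
        union = clip_tokens | scene_tokens
        return len(clip_tokens & scene_tokens) / len(union) if union else 0.0

    return max(scenes, key=score)


def extract_character_names(scene):
    """Assumed screenplay convention: a short all-caps line is a speaker cue."""
    names = []
    for line in scene.text.splitlines():
        cue = line.strip()
        if cue and cue.isupper() and len(cue.split()) <= 3 and cue not in names:
            names.append(cue)
    return names


def build_prompt(scene, names, raw_caption):
    """Assemble a hypothetical prompt that gives a multimodal LLM the
    script context and character names alongside a generic caption."""
    return (
        f"Scene: {scene.heading}\n"
        f"Characters present: {', '.join(names)}\n"
        f"Script excerpt:\n{scene.text}\n"
        f"Generic caption: {raw_caption}\n"
        "Rewrite the caption as an audio description that names the characters."
    )


scenes = [
    ScriptScene("INT. KITCHEN - DAY",
                "ANNA\nWhere did you put the letter?\nTOM\nIt is in the drawer, under the maps."),
    ScriptScene("EXT. HARBOR - NIGHT",
                "TOM\nThe boat leaves at midnight.\nCAPTAIN\nThen we should hurry."),
]
scene = find_relevant_scene(scenes, "the boat leaves at midnight")
names = extract_character_names(scene)
print(build_prompt(scene, names, "A man talks to another man on a dock."))
```

In this toy input the harbor scene is selected because it shares the most words with the clip's dialogue, and the resulting prompt would then be passed, together with the video frames, to whichever multimodal model is used.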

Appears in Collection: EE-Theses_Master (Master's theses)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.