Delving into human speech understanding through multimodal representation learning

DC Field : Value
dc.contributor.advisor: 노용만
dc.contributor.author: Hong, Joanna
dc.contributor.author: 홍요안나
dc.date.accessioned: 2024-08-08T19:31:33Z
dc.date.available: 2024-08-08T19:31:33Z
dc.date.issued: 2024
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1100039&flag=dissertation (en_US)
dc.identifier.uri: http://hdl.handle.net/10203/322139
dc.description: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2024.2, [iv, 54 p.]
dc.description.abstract: Speech perception is inherently multimodal: in human communication, visual information is routinely used and readily integrated with auditory speech. Aligned with human perception, machines can also better comprehend human communication by considering multiple modalities, and it is widely known that using complementary information from different modalities is effective for understanding speech. In this research, we address several issues that commonly arise in speech understanding techniques and provide solutions for the specific task of speech recognition using multimodal audio-visual information. First, we deal with human communication in noisy environments. Since visual information is not affected by acoustic noise, we design a noise-robust audio-visual speech recognition system that enhances noisy input audio using audio-visual correspondence. Second, we consider the case where both audio and visual information are corrupted: in real life, clean visual inputs are not always accessible and can even be corrupted by occluded lip regions or noise. We first show that previous speech recognition models are not robust to such corruption of the multimodal input streams; we then design multimodal input corruption modeling and develop an audio-visual speech recognition model that is robust to both audio and visual corruption. Third, we extend the study to a multilingual viewpoint, where existing multilingual techniques face a critical problem of data imbalance among languages. Motivated by the human cognitive ability to intuitively distinguish different languages without any conscious effort or guidance, we design a model that captures and recognizes which language is given as input speech. Overall, the proposed research aims to bridge the gaps caused by the insufficiency of certain modalities in communication, allowing for a more comprehensive understanding of human communication processes. The effectiveness of the proposed methods is evaluated through comprehensive experiments.
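The corruption-robust training described in the abstract hinges on synthesizing corrupted audio-visual pairs. Below is a minimal Python sketch of what such multimodal input corruption modeling could look like; it is an illustrative assumption, not the thesis's actual pipeline, and every name, the SNR range, and the fixed lip-region box are hypothetical. It mixes a noise clip into the audio at a target SNR and blacks out the lip region of the video frames, the two corruption types the abstract mentions.

```python
import numpy as np

def corrupt_audio(audio: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Additively mix a noise clip into clean audio at a target SNR in dB."""
    noise = np.resize(noise, audio.shape)  # tile/crop the noise to match the audio length
    p_signal = np.mean(audio ** 2) + 1e-10
    p_noise = np.mean(noise ** 2) + 1e-10
    # Choose a gain so that p_signal / (gain**2 * p_noise) == 10 ** (snr_db / 10).
    gain = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return audio + gain * noise

def occlude_lips(frames: np.ndarray, box: tuple) -> np.ndarray:
    """Mask a fixed lip-region box (y0, y1, x0, x1) in a (T, H, W, C) video clip."""
    y0, y1, x0, x1 = box
    out = frames.copy()
    out[:, y0:y1, x0:x1, :] = 0.0  # simple black-patch occlusion of the mouth area
    return out

def corrupt_pair(audio, frames, noise, rng=np.random.default_rng()):
    """Randomly corrupt one or both modalities of an audio-visual training pair."""
    if rng.random() < 0.5:
        audio = corrupt_audio(audio, noise, snr_db=rng.uniform(-5.0, 10.0))
    if rng.random() < 0.5:
        frames = occlude_lips(frames, box=(40, 70, 30, 66))  # hypothetical lip box
    return audio, frames
```

In a robustness-oriented training loop, a function like corrupt_pair would be applied on the fly, so the recognizer sees clean, singly corrupted, and doubly corrupted examples of the same utterance.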
dc.language: eng
dc.publisher: 한국과학기술원 (Korea Advanced Institute of Science and Technology)
dc.subject: Multimodal; Audio-visual; Speech processing; Speech understanding; Audio-visual speech recognition; Multilingual speech recognition
dc.title: Delving into human speech understanding through multimodal representation learning
dc.title.alternative: 멀티모달 표현 학습을 통한 인간 음성 이해 연구 (A study of human speech understanding through multimodal representation learning)
dc.type: Thesis (Ph.D.)
dc.identifier.CNRN: 325007
dc.description.department: Korea Advanced Institute of Science and Technology, School of Electrical Engineering
dc.contributor.alternativeauthor: Ro, Yong Man
Appears in Collection:
EE-Theses_Ph.D. (Ph.D. theses)
Files in This Item:
There are no files associated with this item.
