C2C: context to context mapping with audio-knowledge for lip reading (Context-based lip reading using audio knowledge)

DC Field | Value | Language
dc.contributor.advisor | Ro, Yong Man | -
dc.contributor.advisor | 노용만 | -
dc.contributor.author | Yeo, Jeong Hun | -
dc.date.accessioned | 2023-06-26T19:33:56Z | -
dc.date.available | 2023-06-26T19:33:56Z | -
dc.date.issued | 2022 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1008353&flag=dissertation | en_US
dc.identifier.uri | http://hdl.handle.net/10203/309885 | -
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2022.8, [iii, 22 p.] | -
dc.description.abstract | Lip reading aims to predict a spoken sentence from silent lip movements. However, because of homophenes (words that produce similar lip movements but different sounds), lip reading is a challenging task and performs worse than speech recognition. To mitigate the homophene problem in lip reading, in this paper we propose a novel Context to Context mapping (C2C) method composed of two main parts: 1) an Audio Context Memory Network, designed to complement insufficient visual information by storing and providing both phoneme- and context-level audio knowledge, without requiring audio input during the inference phase, and 2) a Visual Feature Decomposition Module (VFDM), which captures subtle differences between similar lip movements by decomposing visual features into multiple latent features that capture different amounts of temporal information. The visual feature reconstructed from these latent features can distinguish subtle differences in lip movement, and this more discriminative visual feature also helps reconstruct audio knowledge at the viseme-to-phoneme level. Through extensive experiments, we validate the effectiveness of the proposed C2C method, which achieves state-of-the-art performance on two public word-level lip reading datasets. | -
dc.language | eng | -
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | -
dc.subject | Lip Reading; Visual Speech Recognition; Context to Context Mapping; Visual Feature Decomposition | -
dc.subject | Lip reading; Multimodal learning; Audio-visual context mapping; Memory | -
dc.title | C2C | -
dc.title.alternative | Context-based lip reading using audio knowledge | -
dc.type | Thesis(Master) | -
dc.identifier.CNRN | 325007 | -
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering | -
dc.contributor.alternativeauthor | 여정훈 | -
dc.title.subtitle | context to context mapping with audio-knowledge for lip reading | -
Appears in Collection
EE-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
