C2C : context to context mapping with audio-knowledge for lip reading (Korean title: context-based lip reading using audio knowledge)

Lip reading is the task of predicting the spoken sentence from silent lip movement. However, due to the existence of homophenes, words with similar lip movements but different sounds, lip reading is challenging and shows inferior performance compared to speech recognition. To mitigate the homophene problem in lip reading, in this paper we propose a novel Context to Context mapping (C2C) method that is mainly composed of two parts: 1) an Audio Context Memory Network, designed to complement insufficient visual information by storing and providing both phoneme- and context-level audio knowledge without requiring audio input during the inference phase, and 2) a Visual Feature Decomposition Module (VFDM), presented to discern subtle differences among similar lip movements by decomposing visual features into multiple latent features that capture different amounts of temporal information. The visual feature reconstructed from these latent features can distinguish subtle differences in lip movement, and its discriminativeness is also helpful for reconstructing audio knowledge at the viseme-to-phoneme level. Through extensive experiments, we validate the effectiveness of the proposed C2C method, achieving state-of-the-art performance on two public word-level lip reading datasets.
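The Audio Context Memory Network described above stores audio knowledge and recalls it from visual queries alone at inference time. A minimal NumPy sketch of this attention-style memory addressing is shown below; all names, dimensions, and the random features are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def audio_memory_lookup(visual_query, mem_keys, mem_values):
    """Attention-based lookup: visual features address stored
    audio knowledge, so no audio input is needed at inference."""
    scores = visual_query @ mem_keys.T       # (T, M) addressing scores
    weights = softmax(scores, axis=-1)       # soft addressing over M slots
    return weights @ mem_values              # (T, D) recalled audio features

rng = np.random.default_rng(0)
T, M, D = 5, 8, 16                        # time steps, memory slots, feature dim
visual = rng.standard_normal((T, D))      # visual features (illustrative)
keys = rng.standard_normal((M, D))        # learned visual-side keys (hypothetical)
values = rng.standard_normal((M, D))      # stored audio-side values (hypothetical)

recalled = audio_memory_lookup(visual, keys, values)
print(recalled.shape)  # (5, 16)
```

At training time the key/value slots would be learned jointly with paired audio, while at test time only the visual query path is exercised, which matches the audio-free inference the abstract claims.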
Advisors
Ro, Yong Man (노용만)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2022.8, [iii, 22 p.]

Keywords

Lip Reading; Visual Speech Recognition; Context to Context Mapping; Visual Feature Decomposition; Multimodal Learning; Audio-Visual Context Mapping; Memory

URI
http://hdl.handle.net/10203/309885
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1008353&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
