AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model

DC Field | Value | Language
dc.contributor.author | Yeo, Jeong Hun | ko
dc.contributor.author | Kim, Minsu | ko
dc.contributor.author | Jeongsoo Choi | ko
dc.contributor.author | Kim, Dae Hoe | ko
dc.contributor.author | Ro, Yong Man | ko
dc.date.accessioned | 2024-06-20T10:00:30Z | -
dc.date.available | 2024-06-20T10:00:30Z | -
dc.date.created | 2024-01-12 | -
dc.date.issued | 2024-01 | -
dc.identifier.citation | IEEE TRANSACTIONS ON MULTIMEDIA, v.26, pp.6462 - 6474 | -
dc.identifier.issn | 1520-9210 | -
dc.identifier.uri | http://hdl.handle.net/10203/319902 | -
dc.description.abstract | Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements. VSR is regarded as a challenging task because lip movements carry insufficient speech information. In this article, we propose an Audio Knowledge empowered Visual Speech Recognition framework (AKVSR) that complements the insufficient speech information of the visual modality by using the audio modality. Different from previous methods, the proposed AKVSR 1) utilizes rich audio knowledge encoded by a large-scale pretrained audio model, 2) saves the linguistic information of the audio knowledge in a compact audio memory by discarding the non-linguistic information from the audio through quantization, and 3) includes an Audio Bridging Module that finds the best-matched audio features in the compact audio memory, which makes training possible without audio inputs once the compact audio memory is composed. We validate the effectiveness of the proposed method through extensive experiments and achieve new state-of-the-art performance on the widely used LRS3 dataset. | -
dc.language | English | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model | -
dc.type | Article | -
dc.identifier.wosid | 001200272600028 | -
dc.identifier.scopusid | 2-s2.0-85182355286 | -
dc.type.rims | ART | -
dc.citation.volume | 26 | -
dc.citation.beginningpage | 6462 | -
dc.citation.endingpage | 6474 | -
dc.citation.publicationname | IEEE TRANSACTIONS ON MULTIMEDIA | -
dc.identifier.doi | 10.1109/TMM.2024.3352388 | -
dc.contributor.localauthor | Ro, Yong Man | -
dc.contributor.nonIdAuthor | Yeo, Jeong Hun | -
dc.contributor.nonIdAuthor | Kim, Minsu | -
dc.contributor.nonIdAuthor | Jeongsoo Choi | -
dc.contributor.nonIdAuthor | Kim, Dae Hoe | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | VSR | -
dc.subject.keywordAuthor | Audio Knowledge via memory | -
dc.subject.keywordAuthor | audio empowered visual speech recognition | -
dc.subject.keywordAuthor | audio pretrained model | -
dc.subject.keywordAuthor | audio knowledge quantization | -
dc.subject.keywordPlus | END | -
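
Illustrative note on the abstract: the two ideas it names, quantizing audio features into a compact memory and retrieving the best-matched memory entries from a visual feature, can be sketched roughly as below. This record does not describe how the paper implements either step, so the sketch substitutes nearest-centroid quantization and dot-product attention as stand-ins. Every name in it (`quantize`, `bridge`, `codebook`, `temperature`) and all the toy numbers are assumptions made for illustration, not the authors' method.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(audio_features, codebook):
    """Map each audio feature to the index of its nearest codebook entry.

    Assumption: nearest-centroid assignment stands in for the paper's
    quantization step, whose details are not given in this record.
    """
    return [
        min(range(len(codebook)), key=lambda i: squared_distance(f, codebook[i]))
        for f in audio_features
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bridge(visual_query, memory, temperature=1.0):
    """Return an attention-weighted mix of memory entries for one visual query.

    Assumption: dot-product attention stands in for the Audio Bridging
    Module; no audio input is needed once `memory` has been built.
    """
    scores = [
        sum(q * m for q, m in zip(visual_query, entry)) / temperature
        for entry in memory
    ]
    weights = softmax(scores)
    dim = len(memory[0])
    return [sum(w * entry[j] for w, entry in zip(weights, memory)) for j in range(dim)]


# Toy 2-D example with a hypothetical three-entry codebook.
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
audio_features = [[0.9, 0.1], [0.1, 0.8], [-0.7, 0.2]]

indices = quantize(audio_features, codebook)  # [0, 1, 2]
retrieved = bridge([5.0, 0.0], codebook)      # dominated by codebook[0]
```

In this toy setup the compact memory is just the codebook itself; the point is only that the lookup in `bridge` depends on the visual query and the stored entries, not on any audio input at that stage.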
Appears in Collection
EE-Journal Papers(저널논문)
Files in This Item
There are no files associated with this item.
