DC Field | Value | Language |
---|---|---|
dc.contributor.author | Chung, Joon Son | ko |
dc.contributor.author | Zisserman, Andrew | ko |
dc.date.accessioned | 2021-11-27T06:40:34Z | - |
dc.date.available | 2021-11-27T06:40:34Z | - |
dc.date.created | 2021-11-26 | - |
dc.date.issued | 2018-08 | - |
dc.identifier.citation | COMPUTER VISION AND IMAGE UNDERSTANDING, v.173, pp.76 - 85 | - |
dc.identifier.issn | 1077-3142 | - |
dc.identifier.uri | http://hdl.handle.net/10203/289581 | - |
dc.description.abstract | Our aim is to recognise the words being spoken by a talking face, given only the video but not the audio. Existing works in this area have focussed on trying to recognise a small number of utterances in controlled environments (e.g. digits and alphabets), partially due to the shortage of suitable datasets. We make three novel contributions: first, we develop a pipeline for fully automated data collection from TV broadcasts. With this we have generated a dataset with over a million word instances, spoken by over a thousand different people; second, we develop a two-stream convolutional neural network that learns a joint embedding between the sound and the mouth motions from unlabelled data. We apply this network to the tasks of audio-to-video synchronisation and active speaker detection; third, we train convolutional and recurrent networks that are able to effectively learn and recognise hundreds of words from this large-scale dataset. In lip reading and in speaker detection, we demonstrate results that exceed the current state-of-the-art on public benchmark datasets. | - |
dc.language | English | - |
dc.publisher | ACADEMIC PRESS INC ELSEVIER SCIENCE | - |
dc.title | Learning to lip read words by watching videos | - |
dc.type | Article | - |
dc.identifier.wosid | 000454184600009 | - |
dc.identifier.scopusid | 2-s2.0-85044661381 | - |
dc.type.rims | ART | - |
dc.citation.volume | 173 | - |
dc.citation.beginningpage | 76 | - |
dc.citation.endingpage | 85 | - |
dc.citation.publicationname | COMPUTER VISION AND IMAGE UNDERSTANDING | - |
dc.identifier.doi | 10.1016/j.cviu.2018.02.001 | - |
dc.contributor.localauthor | Chung, Joon Son | - |
dc.contributor.nonIdAuthor | Zisserman, Andrew | - |
dc.description.isOpenAccess | N | - |
dc.type.journalArticle | Article | - |
dc.subject.keywordAuthor | Lip reading | - |
dc.subject.keywordAuthor | Lip synchronisation | - |
dc.subject.keywordAuthor | Active speaker detection | - |
dc.subject.keywordAuthor | Large vocabulary | - |
dc.subject.keywordAuthor | Dataset | - |
dc.subject.keywordPlus | SPEECH | - |
dc.subject.keywordPlus | EXTRACTION | - |
dc.subject.keywordPlus | FEATURES | - |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.