Feature generations analysis of lip image streams for isolate words recognition

To counter the drop in speech recognition accuracy in noisy environments, Audio Visual Speech Recognition (AVSR), which combines voice and lip information, has been attempted since the 1990s. This study investigates how well various features extracted from lip image data discriminate between words, using dynamic time warping (DTW) as the objective function, with the aim of implementing a robust lip-reading system as the core process of AVSR. The features, taken from the existing literature, are grid-based features, including gray level, optical flow, and Sobel operator gradient, and coordinate-based features, namely various lip-shape ratios calculated from coordinates. DTW was applied to each feature generation method on 180 utterances collected from ten subjects, each of whom uttered six isolated words three times; the mean recognition rate reached up to 60.55%. The feature with the highest recognition rate was the combined vector of the width/height ratio of the outer lip and the height of the inner lip, while grid-based features outperformed coordinate-based features for certain words.
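As a rough illustration of how DTW can compare per-frame lip features of differing lengths, the sketch below aligns two feature sequences and assigns a query utterance the label of its nearest template. The abstract does not state the local distance, step pattern, or decision rule used in the study, so the Euclidean frame distance, the symmetric step pattern, and the nearest-template rule are assumptions, and the function names, word labels, and feature values are hypothetical.

```python
import numpy as np


def _as_frames(seq):
    """Return seq as a (frames, dims) array; 1-D input becomes one dim per frame."""
    seq = np.asarray(seq, dtype=float)
    return seq[:, None] if seq.ndim == 1 else seq


def dtw_distance(a, b):
    """Accumulated DTW cost between two feature sequences.

    Assumption: Euclidean distance between frames and the basic
    (match, insert, delete) step pattern; the paper may use others.
    """
    a, b = _as_frames(a), _as_frames(b)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # frame of a repeated
                                 cost[i, j - 1],      # frame of b repeated
                                 cost[i - 1, j - 1])  # both advance
    return cost[n, m]


def classify(query, templates):
    """Label of the template with the smallest DTW distance to query.

    templates is a list of (label, sequence) pairs (hypothetical layout).
    """
    return min(templates, key=lambda t: dtw_distance(query, t[1]))[0]


# Made-up one-dimensional feature tracks (e.g. a lip ratio per frame).
templates = [("yes", [0, 1, 2, 1, 0]), ("no", [2, 1, 0, 1, 2])]
query = [0, 0, 1, 2, 2, 1, 0]  # same shape as "yes", spoken more slowly
print(classify(query, templates))
```

Because DTW lets a frame in one sequence match several frames in the other, the slower query still aligns with the "yes" template at zero cost, which is the property that makes it suitable for utterances of varying duration.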
Publisher
Science and Engineering Research Support Society
Issue Date
2015
Language
English
Citation

INTERNATIONAL JOURNAL OF MULTIMEDIA AND UBIQUITOUS ENGINEERING, v.10, no.10, pp.337 - 346

ISSN
1975-0080
DOI
10.14257/ijmue.2015.10.10.33
URI
http://hdl.handle.net/10203/212435
