Audio-visual speech recognition: stochastic optimization of hidden Markov models, modeling of interframe correlations, and integration with neural networks = 시청각 음성인식 : 은닉 마르코프 모델의 확률적 최적화, 프레임간 상관관계의 모델링 및 신경회로망을 이용한 통합
Automatic speech recognition has become a popular and important technique for human-machine interfaces. Although many existing speech recognition systems perform well under controlled conditions, their performance in noisy conditions is not yet satisfactory. Overcoming this limitation and achieving noise-robust recognition remains an important but difficult problem in automatic speech recognition.
Audio-visual speech recognition (AVSR) recognizes speech from both acoustic and visual signals in order to remain robust in such conditions: a microphone records the voice signal, a camera captures the speaker’s lip movements, and the two signals are combined to recognize the speech. Although recognition from the visual signal alone is considerably less accurate than conventional acoustic speech recognition in low-noise environments, it is unaffected by acoustic noise and can therefore compensate for the degradation of acoustic speech recognition in noisy environments.
In this dissertation, we focus on improving the robustness of AVSR by addressing the three parts of the recognition system: acoustic speech recognition, visual speech recognition, and integration of the two modalities.
First, to improve visual speech recognition performance, we propose a novel stochastic optimization algorithm for the hidden Markov models (HMMs) used in the recognizer. We combine a powerful stochastic search algorithm, simulated annealing, with a local optimization technique to obtain a hybrid simulated annealing algorithm that improves both the speed and the performance of the search. Whereas the conventional learning algorithm for HMMs, the expectation-maximization method, performs only local optimization of the likelihood function, the proposed algorithm can perform global search and thus improve the recognition performance of the HMMs. It ...
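The general idea of pairing simulated annealing with a local optimizer can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the one-dimensional `objective` is a toy stand-in for an HMM likelihood, `local_search` is a simple hill climber standing in for expectation-maximization, and all function names, the Gaussian proposal, and the geometric cooling schedule are assumptions made for illustration only.

```python
import math
import random


def objective(x):
    """Toy multimodal function to maximize (hypothetical stand-in for a likelihood)."""
    return math.sin(3.0 * x) + 0.5 * math.sin(7.0 * x) - 0.1 * x * x


def local_search(x, step=0.05, iters=50):
    """Hill climbing to a nearby local maximum (stand-in for a local optimizer such as EM)."""
    fx = objective(x)
    for _ in range(iters):
        improved = False
        for cand in (x - step, x + step):
            fc = objective(cand)
            if fc > fx:
                x, fx = cand, fc
                improved = True
        if not improved:
            step *= 0.5  # refine the step once no neighbor is better
    return x, fx


def hybrid_simulated_annealing(x0, t0=1.0, cooling=0.95, steps=200, seed=0):
    """Simulated annealing in which every proposal is refined by local search."""
    rng = random.Random(seed)
    x, fx = local_search(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        # Global move: random perturbation followed by local refinement.
        cand, fc = local_search(x + rng.gauss(0.0, 1.0))
        # Metropolis acceptance rule for a maximization problem.
        if fc >= fx or rng.random() < math.exp((fc - fx) / t):
            x, fx = cand, fc
        if fx > best_f:
            best_x, best_f = x, fx
        t *= cooling  # geometric cooling schedule (assumed)
    return best_x, best_f


if __name__ == "__main__":
    print("local only:", local_search(4.0))
    print("hybrid SA :", hybrid_simulated_annealing(4.0))
```

Because the hybrid search starts from the locally optimized point and keeps the best solution seen, its result is never worse than local search alone from the same start, and the random moves give it a chance to escape a poor local maximum.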