A speech recognizer operating in the real world is strongly affected by noise. A recognizer trained on clean speech cannot recognize speech recorded in noisy environments well, because the noise introduces mismatches between the training and test conditions. It is therefore necessary to compensate for these mismatches to achieve noise-robust speech recognition.
In this thesis, we studied how to improve stochastic feature extraction based on band SNR for noise-robust speech recognition. First, we proposed a slightly modified version of the multi-band spectral subtraction method, denoted M-MSS, which adjusts the subtraction level of the noise spectrum according to the band SNR. Second, we modified the architecture of the stochastic feature extraction method; the result is denoted M-SFE. We then proposed a stochastic feature extraction method that combines these two methods, exploiting the advantages of each to account more reliably for the effect of noise. In M-MSS, a newly introduced noise normalization factor controls the over-estimation factor according to the band SNR, which allowed the subtraction level of the noise spectrum to be adjusted more reliably. Applying the spectral subtraction in the power-spectrum domain gave better performance than applying it in the mel-scale domain. Finally, we applied the stochastic feature extraction framework to the modified multi-band spectral subtraction method. The resulting method, denoted MMSS-MSFE, compensates more effectively for variations in the noise spectrum by estimating the optimal clean-speech spectrum and using the mean and variance of the stochastic features.
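The abstract does not give the form of the noise normalization factor used in M-MSS, so the following is only a minimal sketch of the baseline that M-MSS modifies: conventional multi-band spectral subtraction, in which each band's over-subtraction (over-estimation) factor is a piecewise-linear function of that band's SNR. The thresholds (-5 dB and 20 dB), the spectral floor `beta = 0.002`, and the function names are assumptions taken from the commonly cited formulation of that method; this is not the thesis's M-MSS rule.

```python
import numpy as np


def over_subtraction_factor(band_snr_db):
    """Piecewise-linear over-subtraction factor as a function of band SNR (dB).

    Assumed rule: 4.75 below -5 dB, 1.0 above 20 dB, and 4 - (3/20) * SNR
    in between, so low-SNR bands have more noise subtracted.
    """
    if band_snr_db < -5.0:
        return 4.75
    if band_snr_db > 20.0:
        return 1.0
    return 4.0 - 0.15 * band_snr_db


def multiband_spectral_subtraction(noisy_power, noise_power, band_edges, beta=0.002):
    """Subtract a band-SNR-scaled noise estimate from one frame's power spectrum.

    noisy_power, noise_power: 1-D arrays of power-spectrum bins for one frame.
    band_edges: bin indices [b0, b1, ..., bN] delimiting N contiguous bands.
    beta: spectral floor relative to the noisy power (assumed value).
    Bins outside the listed bands are passed through unchanged.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    clean = noisy_power.copy()
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        y = noisy_power[lo:hi]
        d = noise_power[lo:hi]
        # Band SNR: noisy energy over estimated noise energy in this band.
        snr_db = 10.0 * np.log10(max(np.sum(y), 1e-12) / max(np.sum(d), 1e-12))
        alpha = over_subtraction_factor(snr_db)
        # Subtract the scaled noise estimate, then clamp to the spectral floor.
        clean[lo:hi] = np.maximum(y - alpha * d, beta * y)
    return clean
```

For example, with `noisy = [10, 10, 2, 2]`, `noise = [1, 1, 1, 1]`, and `band_edges = [0, 2, 4]`, the first band has an SNR of 10 dB (factor 2.5) and becomes `[7.5, 7.5]`, while the second band has an SNR of about 3 dB, is over-subtracted below zero, and is clamped to the floor `[0.004, 0.004]`.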
The proposed methods were evaluated on isolated-word recognition under various noise environments. When only the mean of the stochastic features was used, the M-MSS, M-SFE, and MMSS-MSFE methods reduced the average error rate relative to the ordinary spectral subtraction (SS) method by 18.6%, 11.0%,...