Although many researchers have studied digit recognition, it is still far from commercial application in Korea. Korean digit recognition is known to be more difficult than English digit recognition, and the difficulty grows for continuous digits. In this paper, I study various techniques for improving recognition, focusing on cepstral mean normalization, an environmental compensation preprocessing method, in combination with several acoustic-phonetic models. I found that the recognition results vary with the window size used for cepstral mean normalization, and that long-term cepstral mean normalization does not always produce the best results. This suggests that short-term cepstral mean normalization with a properly chosen window size can give better results for Korean digit recognition than conventional cepstral mean normalization. A possible reason is the variation in phone length caused by short-term cepstral mean normalization, which is believed to improve the recognition rate.
Monophone, triphone, whole-word, and tri-word digit models, as well as digit models that account for the phonological rules of Korean pronunciation, are tested with various numbers of states and mixtures. Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) cepstral coefficients are extracted as feature vectors. Long-term and short-term cepstral mean normalization/subtraction (CMN/CMS) and relative spectral (RASTA) processing are used for channel noise compensation. Kalman filtering is applied for additive noise reduction. Finally, a linear discriminant analysis (LDA) transformation is also tested for digit recognition.
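
To illustrate the difference between long-term and short-term cepstral mean normalization, the following is a minimal Python sketch, not the implementation used in this study. It assumes NumPy, a feature matrix of shape (frames, coefficients), and a centered sliding window that is truncated at the utterance boundaries; the function names and this window definition are hypothetical choices made for illustration only.

```python
import numpy as np


def long_term_cmn(cepstra):
    """Subtract the per-coefficient mean taken over the whole utterance."""
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)


def short_term_cmn(cepstra, window):
    """Subtract a per-coefficient mean taken over a sliding window.

    Assumption: the window is centered on each frame and truncated at
    the start and end of the utterance.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    n_frames, n_coeffs = cepstra.shape
    half = window // 2

    # Prefix sums let each window mean be computed in constant time.
    csum = np.vstack([np.zeros((1, n_coeffs)), np.cumsum(cepstra, axis=0)])

    idx = np.arange(n_frames)
    lo = np.maximum(idx - half, 0)             # first frame in the window
    hi = np.minimum(idx + half + 1, n_frames)  # one past the last frame

    means = (csum[hi] - csum[lo]) / (hi - lo)[:, None]
    return cepstra - means
```

In this sketch, a window at least twice the utterance length makes the short-term variant identical to the long-term one, so the window size can be treated as the single parameter to tune. Both variants remove a constant offset in the cepstral domain, which is how a fixed channel appears in the cepstrum.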