This thesis proposes a fast text-independent speaker identification method using phonetic GMMs. The individual Gaussian components of a GMM can accurately represent the acoustic characteristics of a speaker, so the GMM is an effective speaker model in the text-independent condition. In text-dependent speaker identification, the content of the input speech is determined a priori. In this application, using the hidden Markov model (HMM) as a speaker model gives better accuracy, since the HMM can model the temporal structure of the input speech as well as the speaker identity. When we build a speaker GMM for text-independent speaker identification, sufficient training data are required to estimate the GMM parameters precisely. On the other hand, the HMM-based text-independent speaker model does not demand as much training data to build the speaker HMMs. In order to combine the advantages of GMMs and HMMs in text-independent speaker identification, we propose a system architecture using phonetic GMMs. Speaker identification using phonetic GMMs uses three different types of models: a speaker-independent phone HMM, a baseline speaker GMM, and a phonetic speaker GMM. The HMM is used to obtain the segmental information of phones, the baseline GMM is used to obtain the N-best speakers from all registered speakers, and the phonetic GMM is finally used to identify the person speaking to the system.
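The two-stage decision described above (N-best shortlisting with the baseline GMM, then rescoring with the phonetic GMMs) can be illustrated with a minimal sketch. It is not the implementation used in this thesis: the function names `gmm_loglik` and `identify`, the diagonal-covariance assumption, and the toy single-component models are all hypothetical, and the per-frame phone labels are assumed to have been produced beforehand by aligning the input with the speaker-independent phone HMM.

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM (assumed form).

    frames: (T, D); weights: (M,); means and variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]            # (T, M, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_weighted = log_comp + np.log(weights)                 # (T, M)
    peak = log_weighted.max(axis=1, keepdims=True)            # stable log-sum-exp
    return (peak + np.log(np.exp(log_weighted - peak).sum(axis=1, keepdims=True)))[:, 0]

def identify(frames, phone_labels, baseline, phonetic, n_best=2):
    """Hypothetical two-stage search: baseline GMM shortlist, phonetic GMM rescoring."""
    # Stage 1: score every registered speaker with the baseline GMM, keep the N best.
    base = {spk: gmm_loglik(frames, *gmm).sum() for spk, gmm in baseline.items()}
    shortlist = sorted(base, key=base.get, reverse=True)[:n_best]
    # Stage 2: rescore only the shortlist, using the GMM of each frame's phone.
    final = {}
    for spk in shortlist:
        total = 0.0
        for phone in np.unique(phone_labels):
            segment = frames[phone_labels == phone]
            total += gmm_loglik(segment, *phonetic[spk][str(phone)]).sum()
        final[spk] = total
    return max(final, key=final.get), shortlist

# Toy setup (illustrative only): 3 speakers, 2 phones, 1 Gaussian per model.
one = np.array([1.0])
offsets = {"s0": 0.0, "s1": 5.0, "s2": 12.0}
shifts = {"a": np.array([1.0, 0.0]), "b": np.array([-1.0, 0.0])}
baseline = {s: (one, np.full((1, 2), o), np.full((1, 2), 2.0))
            for s, o in offsets.items()}
phonetic = {s: {p: (one, (np.full(2, o) + sh)[None, :], np.ones((1, 2)))
                for p, sh in shifts.items()}
            for s, o in offsets.items()}

# Frames that sit exactly on speaker s1's phone means, with their phone labels.
frames = np.array([[6.0, 5.0], [6.0, 5.0], [4.0, 5.0], [4.0, 5.0]])
phone_labels = np.array(["a", "a", "b", "b"])
best, shortlist = identify(frames, phone_labels, baseline, phonetic, n_best=2)
print(best, shortlist)
```

In this sketch the phonetic models are evaluated only for the shortlisted speakers, which is one plausible way the N-best step limits the amount of second-stage scoring.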
In the experiments, when the number of mixtures of the baseline GMM was increased to 320, we obtained an identification accuracy similar to that of the phonetic GMM with 14 mixtures for 45 phones, but the time taken to identify the speaker was five times longer than that of the phonetic GMM. Hence the phonetic GMMs can reduce the identification time, but the number of parameters is much greater than that of the baseline GMM because three model types are used. This problem can be mitigated to some extent by tying phones into classes. This is based on the fact that the lik...