This paper describes a voice conversion method for a hidden semi-Markov model (HSMM)-based speech synthesis system. Our HSMM system adopts a speaker-dependent (SD) model, which can be converted into a target speaker model through an adaptation process. The main purpose of this paper is to investigate whether a speaker adaptation method can be applied successfully to an SD model rather than to an average voice model.
Using the SD model-based HSMM system combined with maximum likelihood linear regression (MLLR) as the adaptation process, we show that the conversion of both spectral and prosodic information, such as F0 and phone duration, is faithfully reflected in the synthesized speech.