HMM-based Korean speech synthesizer with two-band mixed excitation model for embedded applications

A speech interface may be the first choice of user interface for robots and hand-held devices such as personal digital assistants (PDAs) and portable multimedia players (PMPs). However, these devices have limited memory and computing power. Hidden Markov model (HMM)-based speech synthesis is currently considered suitable for such embedded systems. This thesis describes an HMM-based Korean speech synthesis system, a comparison of spectral parameters, and a proposed two-band excitation model for HMM-based speech synthesis. First, the development and evaluation of an HMM-based Korean speech synthesis system are presented. Statistical HMMs for Korean speech units are trained on a hand-labeled speech database that includes contextual information about phoneme, word phrase, utterance, and break strength. The developed system produced speech with fairly good prosody. The synthesized speech is evaluated and compared with that of a corpus-based, unit-concatenating Korean text-to-speech system. The two systems were trained on the same manually labeled speech database. Second, the mel-cepstrum and the line spectrum pair (LSP) are compared as spectral parameters for the developed HMM-based speech synthesis system. Because mel-cepstral analysis has several merits over linear prediction analysis and ordinary cepstral analysis, mel-cepstral coefficients have been used as the feature for spectrum modeling in HMM-based speech synthesis. Although the LSP also satisfies the requirements on synthesis-filter stability and quantization/interpolation performance, its feasibility for HMM-based speech synthesis has not been tested. In this thesis, the LSP and mel-cepstrum parameters are both tested for HMM-based speech synthesis, and a comparative performance evaluation is carried out. The two systems are trained on the same manually labeled speech database. The results show that the LSP can be a go...
Advisors
Hahn, Min-Soo (한민수)
Description
Information and Communications University: School of Engineering
Publisher
Information and Communications University
Issue Date
2007
Identifier
392804/225023 / 020025338
Language
eng
Description

Thesis (Ph.D.) - Information and Communications University: School of Engineering, 2007.2, [xiii, 102 p.]

Keywords

speech synthesis; HMM (hidden Markov model); excitation model; maximum voiced frequency

URI
http://hdl.handle.net/10203/54582
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=392804&flag=dissertation
Appears in Collection
School of Engineering-Theses_Ph.D
Files in This Item
There are no files associated with this item.
