HMM-based Korean speech synthesis using suprasegmental prosodic features (초분절적 운율 정보를 이용한 HMM 기반 한국어 음성 합성)

Hidden Markov models (HMMs) are widely used in recent research on statistical parametric speech synthesis systems. An HMM is a generative model frequently used in speech recognition; in speech synthesis it is applied to parameter generation, the stage that precedes signal processing. HMM-based speech synthesis has several advantages. Much less storage is necessary because there is no need to keep the speech corpus after training is finished. Furthermore, it is easy to obtain speech with various voice characteristics, speaking styles, and emotions by modifying the parameters. Other advantages include multilingual support, robustness, and the ability to control each parameter separately. Commonly cited drawbacks of this kind of speech synthesis, such as a vocoder-like sound or unnaturalness caused by reconstructing speech from parameters, are gradually being overcome. However, most HMM-based speech synthesis approaches remain weak in prosody. Prosody is an important factor in verbal communication. One study argues that prosody has a greater impact on communication than the meaning of the words themselves. The primary weakness of HMM-based speech synthesis systems in generating prosody is that they model prosodic features in subword units, i.e. phones: each model in the trained HMM set corresponds to a phone. Therefore, such systems have difficulty utilizing suprasegmental information such as relations between words, the structure of the sentence, and the lengths of each word, phrase, and sentence. As a result, an HMM-based system lacks the capability to create natural speech with human-like changes in pitch and tempo; instead it creates machine-like speech that places the same pause at every space and pronounces all words in the same way, without any change in strength or rate. Context-dependent HMMs have been suggested to overcome this problem, but they have not been a fundamental solution for prosody. We have researched generating ...
Advisors
Oh, Yung-Hwan (오영환)
Description
Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2011
Identifier
467932/325007  / 020083380
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science, 2011.2, [ vi, 39 p. ]

Keywords

Speech synthesis; HMM (hidden Markov model); CART (classification and regression tree); prosody; Korean

URI
http://hdl.handle.net/10203/180572
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=467932&flag=dissertation
Appears in Collection
CS-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
