DSpace at KOASAS: HMM-based Korean speech synthesis using suprasegmental prosodic features

HMM-based Korean speech synthesis using suprasegmental prosodic features (Korean title: 초분절적 운율 정보를 이용한 HMM 기반 한국어 음성 합성)

Lee, Seung-Uk / 이승욱

Hidden Markov models (HMMs) are widely used in recent research on statistical parametric speech synthesis. An HMM is a generative model frequently used in speech recognition; in speech synthesis it is applied to parameter generation, the stage that precedes signal processing. HMM-based speech synthesis has several advantages. It requires much less storage, because the speech corpus need not be kept once training is finished. It is also easy to obtain speech with various voice characteristics, speaking styles, and emotions by modifying the parameters. Further advantages include multilingual support, robustness, and the ability to control each parameter separately. Commonly cited drawbacks of this kind of synthesis, such as a vocoder-like sound or unnaturalness caused by reconstructing speech from parameters, are gradually being overcome. However, most HMM-based speech synthesis approaches remain weak in prosody. Prosody is an important factor in verbal communication; some research claims that prosody has a greater impact on communication than the meaning of the words themselves. The primary weakness of HMM-based speech synthesis in generating prosody is that it models prosodic features in subword units, i.e. phones: each model in the trained HMM set corresponds to a phone. It therefore has difficulty using suprasegmental information such as relations between words, the structure of the sentence, and the lengths of each word, phrase, and sentence. As a result, the system cannot produce natural speech with human-like changes in pitch and tempo; instead it produces machine-like speech that places identical pauses at spaces and pronounces every word the same way, without any change in strength or rate. Context-dependent HMMs have been proposed to overcome this problem, but they have not been a fundamental solution for prosody. We have researched generating ...

Advisors: Oh, Yung-Hwan / 오영환

Description: Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2011

Identifier: 467932/325007 / 020083380

Language: eng

Description: Master's thesis - Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science, 2011.2, [ vi, 39 p. ]

Keywords: Speech synthesis; HMM (hidden Markov model); CART (classification and regression tree); prosody; Korean

URI: http://hdl.handle.net/10203/180572

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=467932&flag=dissertation

Appears in Collection: CS-Theses_Master (Master's theses)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
