In this thesis, a system capable of transforming neutral speech into emotional speech is developed. First, emotional speech data are collected and analysed. The analysed feature parameters include duration, average amplitude, amplitude range, average pitch, and pitch range, among others. Conversion rules are derived from these feature parameters and are applied directly to transform neutral speech into emotional speech.
Each feature parameter is modified in the time domain. Duration is modified by duplicating or deleting a segment of the speech signal, where the segment is one pitch period for voiced speech and a segment of fixed duration for unvoiced speech. Amplitude is controlled by multiplying the amplitude of the original speech by an amplitude ratio. Pitch is modified using TD-PSOLA with a triangular window.
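The duration and amplitude modifications can be illustrated with a minimal sketch. It is not the implementation used in this thesis: the function names `scale_amplitude` and `modify_duration` are hypothetical, the segment length is assumed to be constant (a real voiced signal has a varying pitch period), and segments are simply concatenated without the windowed overlap-add that TD-PSOLA performs.

```python
def scale_amplitude(samples, ratio):
    """Multiply every sample by a constant amplitude ratio."""
    return [s * ratio for s in samples]


def modify_duration(samples, segment_len, factor):
    """Lengthen or shorten a signal by duplicating or deleting whole segments.

    segment_len: assumed segment size in samples (one pitch period for
                 voiced speech, a fixed length for unvoiced speech).
    factor:      target duration divided by original duration.
    """
    if segment_len <= 0 or factor <= 0:
        raise ValueError("segment_len and factor must be positive")
    # Cut the signal into consecutive segments of equal length.
    segments = [samples[i:i + segment_len]
                for i in range(0, len(samples), segment_len)]
    n_out = max(1, round(len(segments) * factor))
    out = []
    for k in range(n_out):
        # Map each output segment back to a source segment (floor of k / factor).
        # factor > 1 repeats segments; factor < 1 skips some of them.
        src = min(int(k / factor), len(segments) - 1)
        out.extend(segments[src])
    return out
```

With `factor=2.0` every segment is emitted twice, and with `factor=0.5` every second segment is dropped, which corresponds to the duplication and deletion described above.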
Two conversion systems are implemented: a dependent conversion system and an independent conversion system. In the dependent conversion system, emotionless speech is converted into the target emotional speech by imitating the feature parameters of emotional speech. In contrast, the independent conversion system transforms emotionless speech into emotional speech using the conversion rules previously obtained from the analysis of the emotional speech database. The performance of the conversion systems is evaluated with a MOS (mean opinion score) test. The results show that conversions to angry and sad speech are more successful than conversion to joyful speech.
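As an illustration of how a MOS is obtained from listener ratings, the following sketch averages the scores given to one converted utterance. The five-point scale (1 to 5) is an assumption, since the rating scale is not stated here, and the function name `mean_opinion_score` and the ratings in the usage line are hypothetical, not results from this thesis.

```python
def mean_opinion_score(ratings, low=1, high=5):
    """Return the arithmetic mean of listener ratings on an assumed low..high scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not low <= r <= high:
            raise ValueError(f"rating {r} is outside the scale {low}..{high}")
    return sum(ratings) / len(ratings)


# Hypothetical ratings from four listeners for a single utterance.
example_mos = mean_opinion_score([4, 3, 5, 4])
```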