Emotional Speech Synthesis for Multi-Speaker Emotional Dataset Using WaveNet Vocoder

Cited 3 times in Web of Science; cited 6 times in Scopus
This paper studies methods for emotional speech synthesis using a neural vocoder. WaveNet is used as the neural vocoder, generating waveforms from mel spectrograms. We propose two networks: a deep convolutional neural network (CNN)-based text-to-speech (TTS) system and an emotional converter, where the deep CNN architecture is designed to exploit long-term context information. The first network estimates neutral mel spectrograms from linguistic features, and the second network converts neutral mel spectrograms into emotional mel spectrograms. Experimental results on the TTS system and the emotional TTS system show that the proposed systems are a promising approach.
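The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the class and function names are hypothetical, the weights are random, and the dilated 1-D convolution stack merely stands in for the "deep CNN with long-term context" idea (dilations 1, 2, 4, ... widen the receptive field). The final WaveNet vocoder step (mel spectrogram to waveform) is omitted as a placeholder.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """'Same'-length dilated 1-D convolution.
    x: (T, C_in) feature sequence; w: (K, C_in, C_out) kernel."""
    K, _, C_out = w.shape
    pad = dilation * (K - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    T = x.shape[0]
    y = np.zeros((T, C_out))
    for k in range(K):
        # each tap looks `k * dilation` frames into the padded input
        y += xp[k * dilation : k * dilation + T] @ w[k]
    return y

class ConvStack:
    """Stack of dilated convolutions with exponentially growing dilation,
    so deeper layers see long-term context (hypothetical stand-in for the
    paper's deep CNN)."""
    def __init__(self, c_in, c_hidden, c_out, n_layers=4, kernel=3, seed=0):
        rng = np.random.default_rng(seed)
        dims = [c_in] + [c_hidden] * (n_layers - 1) + [c_out]
        self.layers = [(rng.standard_normal((kernel, dims[i], dims[i + 1])) * 0.1,
                        2 ** i)  # dilation doubles per layer: 1, 2, 4, 8
                       for i in range(n_layers)]

    def __call__(self, x):
        for i, (w, d) in enumerate(self.layers):
            x = dilated_conv1d(x, w, d)
            if i < len(self.layers) - 1:
                x = np.maximum(x, 0.0)  # ReLU between hidden layers
        return x

# Stage 1: linguistic features -> neutral mel spectrogram (80 mel bins assumed)
text2mel = ConvStack(c_in=64, c_hidden=128, c_out=80, seed=1)
# Stage 2: neutral mel -> emotional mel (e.g. one converter per target emotion)
emo_convert = ConvStack(c_in=80, c_hidden=128, c_out=80, seed=2)

T = 50  # number of frames
linguistic = np.random.default_rng(0).standard_normal((T, 64))
neutral_mel = text2mel(linguistic)
emotional_mel = emo_convert(neutral_mel)
# Stage 3 (omitted): a WaveNet vocoder would synthesize the waveform
# conditioned on emotional_mel.
print(neutral_mel.shape, emotional_mel.shape)
```

Splitting synthesis into a neutral acoustic model plus a separate emotion converter means only the lightweight converter needs emotional training data, while the TTS front end can be trained on larger neutral corpora.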
Publisher
Institute of Electrical and Electronics Engineers Inc.
Issue Date
2019-01-12
Language
English
Citation

2019 IEEE International Conference on Consumer Electronics (ICCE 2019)

DOI
10.1109/ICCE.2019.8661919
URI
http://hdl.handle.net/10203/269539
Appears in Collection
EE-Conference Papers (conference papers)
