This paper studies methods for emotional speech synthesis using a neural vocoder. WaveNet serves as the neural vocoder, generating waveforms from mel spectrograms. We propose two networks, namely a deep convolutional neural network (CNN)-based text-to-speech (TTS) system and an emotional converter, where the deep CNN architecture is designed to exploit long-term context information. The first network estimates neutral mel spectrograms from linguistic features, and the second converts the neutral mel spectrograms into emotional mel spectrograms. Experimental results on both the TTS system and the emotional TTS system show that the proposed approach is promising.
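To make the two-stage pipeline concrete, the following is a minimal PyTorch sketch of how the two networks could be composed. All module and parameter names (`DilatedCNNStack`, the feature dimensions, layer counts, and kernel sizes) are illustrative assumptions, not the architectures used in this paper; dilated 1-D convolutions stand in for one common way a deep CNN can widen its receptive field to capture long-term context.

```python
# Hypothetical sketch of the two-network pipeline; dimensions and layer
# hyperparameters are assumptions for illustration, not the paper's values.
import torch
import torch.nn as nn

class DilatedCNNStack(nn.Module):
    """1-D CNN with exponentially growing dilation, one way to use long-term context."""
    def __init__(self, in_dim, out_dim, hidden=256, n_layers=6, kernel=3):
        super().__init__()
        layers, dim = [], in_dim
        for i in range(n_layers):
            d = 2 ** i  # dilation doubles per layer, widening the receptive field
            layers += [nn.Conv1d(dim, hidden, kernel,
                                 padding=d * (kernel - 1) // 2, dilation=d),
                       nn.ReLU()]
            dim = hidden
        layers.append(nn.Conv1d(dim, out_dim, 1))  # project to mel-spectrogram bins
        self.net = nn.Sequential(*layers)

    def forward(self, x):   # x: (batch, in_dim, frames)
        return self.net(x)  # -> (batch, out_dim, frames)

LING_DIM, MEL_DIM = 300, 80  # assumed feature sizes, for illustration only

tts_net = DilatedCNNStack(LING_DIM, MEL_DIM)  # linguistic features -> neutral mel
emo_net = DilatedCNNStack(MEL_DIM, MEL_DIM)   # neutral mel -> emotional mel

ling = torch.randn(1, LING_DIM, 200)          # dummy frame-aligned linguistic features
neutral_mel = tts_net(ling)
emotional_mel = emo_net(neutral_mel)
# emotional_mel would then condition a WaveNet vocoder to generate the waveform.
```

In this sketch the emotional converter operates purely in the mel-spectrogram domain, so the same WaveNet vocoder can render either the neutral or the emotional output without retraining.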