Waveform interpolation-based wideband speech compression for the text-to-speech database = 음성합성기의 데이터베이스를 위한 파형보간 기반의 광대역 음성신호 압축

This thesis presents low-bitrate wideband speech compression techniques for corpus-based TTS (Text-to-Speech) systems. In recent years, a variety of speech coding techniques have been proposed and refined, mainly for communication applications. These coders, however, are not well suited to storage applications such as compressing a TTS database: the requirements of communication and storage differ, so coders developed for communication cannot be adopted directly for storage. Corpus-based TTS systems, meanwhile, generally rely on a database of a large number of natural speech segments in order to synthesize speech of decent quality, so compressing that database is indispensable for a practical embedded TTS system. To meet these needs, we propose several new features that are more efficient and better suited to compressing the TTS database: a dimension conversion technique for quantizing variable-dimension spectrum vectors, an efficient decoding scheme for segmented frame decoding, and a complexity reduction method. The proposed dimension conversion method provides a practical way to quantize variable-dimension vectors with a small codebook memory. The efficient segmented frame decoding scheme is essential for reconstructing the decent-quality speech segments that the TTS system needs as input; it reconstructs good-quality speech by using the previous parameters and pre-obtained phase estimates. Reducing the computational complexity of the decoder is likewise a critical factor in realizing an embedded TTS system. The decoder complexity reduction is realized by removing the characteristic waveform realignmen...
Hahn, Min-Soo (한민수)
Information and Communications University (한국정보통신대학교) : School of Engineering
TTS; Waveform Interpolation; Speech Coding; Wideband Speech; 광대역음성; 음성합성기; 파형보간; 음성코딩

School of Engineering-Theses_Ph.D(공학부 박사논문)
