(A) study on the frequency-weighted spectral representations for robust speech recognition
(Robust한 음성인식을 위한 주파수 가중 스펙트럼 표현에 관한 연구)

To use a speech recognition system in a practical environment, the speech recognizer should be robust to unexpected changes in the acoustical environment as well as to the inherent variability of the speech signal. In this dissertation we propose and analyze robust spectral representations derived from the short-time speech spectrum, which are less sensitive to environmental changes due to speakers or background noise, and which yield a smaller number of bits or a lower-dimensional representation without performance degradation. By applying spectral peak enhancement to an all-pole model spectrum and integrating the resultant binarized spectrum on a nonlinear frequency scale, several kinds of frequency-weighted spectral representations were derived. The spectral peaks were enhanced by thresholding the second-order derivative of the spectral envelope with respect to frequency, both to enhance the peaks and to relax the sensitivity of the distance measure to peak amplitude. The proposed representations are applied to nasal, stop, and isolated digit recognition in clean, white-noise-added, and band-limited conditions. To evaluate the recognition performance, we used all-pole-model-based features (LPC cepstrum and mel-cepstrum) and filter-bank-based features, with Euclidean and several weighted cepstral distance measures, in a template-based isolated word recognition system. The proposed features showed improvement in recognition accuracy for the nasal-vowel syllables in clean, white-Gaussian-noise-added, and band-limited conditions. While the performance of stop-vowel syllable recognition decreased, the recognition result for isolated digits was competitive with the best conventional representation and distance measure, and the robustness and the representation efficiency were improved. Among the frequency-weighted spectral representations, the critical-band peak presence vector (CBPPV), which consists of 17 bits representing peak ...
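The processing chain described in the abstract (all-pole envelope, thresholded second derivative, binarization, integration on a nonlinear frequency scale) might be sketched roughly as follows. This is only an illustrative sketch, not the dissertation's implementation: the 8 kHz sampling rate, LPC order, FFT size, threshold value, the Zwicker critical-band edges, and the rule of OR-ing the binarized bins within each band are all assumptions, and the function names are hypothetical. Only the count of 17 bits comes from the abstract.

```python
import numpy as np

# Zwicker critical-band edges in Hz, giving 17 bands up to 3700 Hz.
# Assumed for illustration; the abstract does not state the band layout.
BARK_EDGES_HZ = np.array([0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                          1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700])


def lpc_log_envelope(frame, order=10, n_fft=512):
    """Log-magnitude all-pole (LPC) envelope, autocorrelation method."""
    x = frame * np.hamming(len(frame))
    # Autocorrelation lags 0..order (zero lag sits at index len(x) - 1).
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    r[0] *= 1.0 + 1e-6  # slight diagonal loading for numerical stability
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])  # predictor coefficients
    A = np.fft.rfft(np.concatenate(([1.0], -a)), n_fft)
    return -20.0 * np.log10(np.abs(A) + 1e-12)  # dB, gain term omitted


def peak_presence_vector(frame, fs=8000, order=10, n_fft=512, threshold=0.0):
    """One bit per critical band: 1 if a thresholded spectral peak falls in it."""
    env = lpc_log_envelope(frame, order, n_fft)
    # Second-order difference of the envelope along the frequency axis.
    d2 = np.zeros_like(env)
    d2[1:-1] = env[2:] - 2.0 * env[1:-1] + env[:-2]
    # Binarize: keep bins where the envelope is sufficiently concave (peak-like).
    binary = d2 < -threshold
    # Map each FFT bin to a critical band; bins above the last edge are ignored.
    freqs = np.arange(len(env)) * fs / n_fft
    band = np.digitize(freqs, BARK_EDGES_HZ) - 1
    bits = np.zeros(len(BARK_EDGES_HZ) - 1, dtype=np.uint8)
    for b in range(len(bits)):
        bits[b] = np.any(binary[band == b])  # assumed integration rule (OR)
    return bits
```

Because only the presence of a peak in each band is kept, the amplitude of the peak no longer affects a distance computed between two such vectors, which is the kind of amplitude insensitivity the abstract mentions. Raising `threshold` keeps only the more sharply curved peaks.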
Advisors
Cho, Jung-Wan (조정완)
Description
Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
1992
Identifier
59805/325007 / 000835041
Language
eng
Description

Doctoral dissertation (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science, 1992.2, [ [x], 144 p. ]

Keywords

Spectral representation (스펙트럼 표현)

URI
http://hdl.handle.net/10203/32923
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=59805&flag=dissertation
Appears in Collection
CS-Theses_Ph.D.(박사논문, doctoral dissertations)
Files in This Item
There are no files associated with this item.
