Robust speech recognition based on partial information technique (Korean title: 부분 정보 기법에 기반한 강인한 음성인식)

Automatic speech recognition (ASR) systems in real environments may have to cope with various noise signals that corrupt some time-frequency regions of speech more severely than others. Although great progress has been made in robust ASR, most techniques have focused on reasonably stationary wide-band noise and are therefore limited in their ability to achieve robustness in real noisy environments. The partial information technique, a developing area of research, models the human ability to emphasize reliable partial information in time-frequency regions. As one of its main approaches, the multi-band ASR scheme splits the whole frequency range into several sub-bands. The sub-band recognition results are then recombined, using sub-band reliabilities, to make a final decision. This dissertation aims to improve ASR performance on partially corrupted speech based on the partial information technique. In the frequency domain, three limitations of the multi-band recognition system are addressed. First, the multi-band scheme cannot make full use of the uncorrupted parts because its sub-band boundaries are fixed; the boundaries should adapt so as to localize the noise and better utilize partial information. Second, because sub-band feature vectors are processed independently in this method, the information contained in the global spectral structure may be lost. Finally, the whole ASR system must be rebuilt because its architecture differs from that of a full-band ASR system. This study proposes a weighted filter bank analysis and model adaptation (WFBA-MA) method to resolve these problems. The proposed scheme estimates reliability weights for the Mel filter bank channels and extracts weighted Mel frequency cepstral coefficients by suppressing unreliable log filter bank energies. The same weights are also applied to the entire set of HMM parameters. An environment selective processing (ESP) method is also proposed, which determines whether a...
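The feature-extraction idea in the abstract (down-weighting unreliable log filter bank energies before computing cepstral coefficients) can be illustrated with a minimal sketch. This is not the dissertation's implementation. The abstract does not say how the reliability weights are estimated or exactly how suppression is applied, so the sketch assumes the weights are already given in [0, 1] and that suppression means multiplying each log energy by its weight. The function names `dct_matrix` and `weighted_mfcc` are hypothetical.

```python
import numpy as np


def dct_matrix(n_ceps, n_channels):
    """Unnormalized DCT-II basis mapping log filter bank energies to cepstra."""
    k = np.arange(n_ceps)[:, None]       # cepstral index
    j = np.arange(n_channels)[None, :]   # filter bank channel index
    return np.cos(np.pi * k * (j + 0.5) / n_channels)


def weighted_mfcc(log_energies, weights, n_ceps=13):
    """Cepstral coefficients from reliability-weighted log energies.

    ASSUMPTION: suppression is modelled as an element-wise product of the
    per-channel reliability weight and the log filter bank energy. The
    weights are an input here; their estimation is not covered.
    """
    log_energies = np.asarray(log_energies, dtype=float)
    weights = np.clip(np.asarray(weights, dtype=float), 0.0, 1.0)
    if log_energies.ndim != 1 or log_energies.shape != weights.shape:
        raise ValueError("log_energies and weights must be 1-D and equal length")
    weighted = weights * log_energies    # unreliable channels shrink toward 0
    return dct_matrix(n_ceps, log_energies.shape[0]) @ weighted
```

Under these assumptions, a channel with weight 0 contributes nothing to the cepstrum, so corruption confined to that channel leaves the features unchanged. A channel with weight 1 is passed through as in ordinary MFCC extraction.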
Advisors
Oh, Yung-Hwan (오영환)
Description
Korea Advanced Institute of Science and Technology (KAIST) : Computer Science major
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2003
Identifier
181185/325007 / 000985369
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Computer Science major, 2003.2, [ ix, 88 p. ]

Keywords

Weighted filter bank analysis; Partially corrupted speech; Multi-band speech recognition; Partial information; Model adaptation

URI
http://hdl.handle.net/10203/32838
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=181185&flag=dissertation
Appears in Collection
CS-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
