(A) model of masking as a front-end for the robust speech recognition = Masking model for noise-insensitive speech recognition

Automatic speech recognition (ASR) is emerging as one of the most promising technologies for the near future. A key challenge in ASR research is the sensitivity of ASR systems to acoustic interference such as noise and reverberation. In this dissertation, the masking effect observed in human auditory perception is used to build noise-robust ASR systems. Masking is the process by which the threshold of audibility for one sound is raised by the presence of another sound, and it is believed to enhance hearing resolution by cutting off redundant signals. Biological evidence for two kinds of masking, frequency masking and temporal masking, is used to model the masking effects, and both types of masking are implemented with conventional speech recognition systems. To improve performance further, engineering approaches based on frequency-domain and time-domain filters are introduced. Frequency masking is modeled by lateral inhibition in the frequency domain, and temporal masking by unilateral inhibition in the time domain. The filter parameters, which determine the amount and range of inhibition, are searched on the basis of recognition performance on isolated-word recognition tasks. The proposed models are combined with conventional feature extraction methods, including the Mel-frequency cepstral coefficients (MFCC) model and the zero-crossing peak-amplitude (ZCPA) model. The MFCC model works well with the proposed model of frequency masking, while the ZCPA model has frequency masking as a built-in property. Temporal masking is applied to both models in the same way. Recognition with the proposed masking model shows superior performance, and the model is also computationally efficient. For further improvement, two additional methods are used with the proposed model. Spectral subtraction, a widely used conventional method, shows much greater improvement when used wi...
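The two inhibition schemes named in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the subtractive form, the half-wave rectification, the function names, and the `amount` and `reach` parameters (standing in for the "amount and range of inhibition") are all assumptions made for illustration, and the default values are arbitrary.

```python
def lateral_inhibition(spectrum, amount=0.2, reach=1):
    """Frequency-masking sketch: each channel is inhibited by its
    neighbours on BOTH sides along the frequency axis (hypothetical form)."""
    n = len(spectrum)
    out = []
    for k in range(n):
        lo, hi = max(0, k - reach), min(n, k + reach + 1)
        # Sum of neighbouring channels, excluding the channel itself.
        neighbours = sum(spectrum[lo:hi]) - spectrum[k]
        # Subtract the inhibition and clip negative values to zero.
        out.append(max(0.0, spectrum[k] - amount * neighbours))
    return out


def unilateral_inhibition(frames, amount=0.2, reach=1):
    """Temporal-masking sketch: each frame is inhibited ONLY by the
    preceding frames, never by later ones (hypothetical form)."""
    out = []
    for t, frame in enumerate(frames):
        # Unmasked frames from the recent past; empty for the first frame.
        past = frames[max(0, t - reach):t]
        masked = []
        for k, value in enumerate(frame):
            inhibition = amount * sum(p[k] for p in past)
            masked.append(max(0.0, value - inhibition))
        out.append(masked)
    return out
```

With `amount=0.5` and `reach=1`, a spectral peak flanked by weaker channels survives lateral inhibition while the flanks are suppressed, and a sustained channel is attenuated by unilateral inhibition from its second frame onward. In a front-end of the kind described, such filters would sit between the spectral analysis stage and the feature computation; where exactly the dissertation applies them is not stated in this record.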
Advisors
Lee, Soo-Young (이수영)
Description
Korea Advanced Institute of Science and Technology (KAIST): Electrical and Electronic Engineering major
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2003
Identifier
231120/325007  / 000995133
Language
eng
Description

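Doctoral dissertation (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Electrical and Electronic Engineering major, 2003.8, [ viii, 90 p. ]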

Keywords

feature extraction; speech recognition; masking model; auditory model

URI
http://hdl.handle.net/10203/35170
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=231120&flag=dissertation
Appears in Collection
EE-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
