Zero-crossing-based sound source localization, segregation and recognition = 영교차점에 기초한 음원의 방향 탐지, 분리 및 인식

This thesis presents new methods for spatial hearing algorithms. The first is zero-crossing-based sound source localization that exploits the precedence effect in severely reverberant conditions. The second is binaural mask estimation for sound segregation and recognition when multiple sound sources are present simultaneously. The precedence effect is a psychoacoustic effect related to a group of auditory phenomena. In reverberant conditions especially, when similar sounds from one or more sources arrive at the listener from different locations, the direct sound arrives first and is also heard first. This gives the listener the impression that the sound comes from that location alone, and suppresses the perception of later arrivals. By adapting the precedence effect to our sound source localization algorithm, we obtain very good simulation results in sound localization under severely reverberant conditions. For sound segregation and recognition, we use a ratio masking method. The mask is determined from the estimated sound source directions using spatial cues such as inter-aural time differences (ITDs) and inter-aural intensity differences (IIDs). In the suggested method, ITDs are estimated from the statistical properties of zero-crossings detected in binaural filter-bank outputs. We also consider estimating ITDs with the aid of IID samples to cope with the phase ambiguities of ITD estimates at high frequencies. For the masking method, we consider using the power ratio of the target to interference sources. We show that this power ratio is optimal from the viewpoint of reconstructing the target speech signal and is effectively used in missing-data speech recognition. To estimate the power ratio, the expectation-maximization (EM) method is applied to the ITD estimates. As a result, the proposed method is able to provide a better masking scheme for speech segregation and...
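The zero-crossing-based ITD estimation mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes a single band-limited channel instead of a filter bank, pairs each left-ear upward zero-crossing with the nearest right-ear one, and uses the median as a stand-in for the statistical treatment. The names `upward_zero_crossings`, `estimate_itd`, and `max_itd` are hypothetical.

```python
import numpy as np

def upward_zero_crossings(x, fs):
    """Times (in seconds) of upward zero-crossings, refined by linear interpolation."""
    x = np.asarray(x, dtype=float)
    # Index i is a crossing if the signal goes from negative to non-negative.
    idx = np.where((x[:-1] < 0) & (x[1:] >= 0))[0]
    frac = -x[idx] / (x[idx + 1] - x[idx])  # fractional position inside the sample
    return (idx + frac) / fs

def estimate_itd(left, right, fs, max_itd=1e-3):
    """Median ITD (right minus left, in seconds) from paired zero-crossings.

    Assumption for illustration: pairs farther apart than max_itd are
    treated as mismatches and discarded. Returns None if no pair survives.
    """
    tl = upward_zero_crossings(left, fs)
    tr = upward_zero_crossings(right, fs)
    if tl.size == 0 or tr.size == 0:
        return None
    diffs = tr[None, :] - tl[:, None]              # all pairwise time differences
    nearest = np.argmin(np.abs(diffs), axis=1)     # closest right crossing per left crossing
    itds = diffs[np.arange(tl.size), nearest]
    itds = itds[np.abs(itds) <= max_itd]
    return float(np.median(itds)) if itds.size else None

# Usage: a 500 Hz tone that reaches the right ear 0.25 ms after the left ear.
fs = 16000
t = np.arange(int(0.1 * fs)) / fs
delay = 0.25e-3
left = np.sin(2 * np.pi * 500 * t)
right = np.sin(2 * np.pi * 500 * (t - delay))
itd = estimate_itd(left, right, fs)
```

With this sign convention a positive value means the left ear leads. At high frequencies the period of the band signal becomes shorter than the largest plausible ITD, so nearest-crossing pairing becomes ambiguous; this is the phase ambiguity that the abstract addresses with the aid of IID samples.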
Kim, Sung-Ho (김성호); Kil, Rhee-Man (길이만)
Korea Advanced Institute of Science and Technology (KAIST): Department of Mathematical Sciences
418773/325007  / 020045146

Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Department of Mathematical Sciences, 2010.2, [ ix, 87 p. ]


반향; 음성 인식; 음원 방향 탐지; 음성 분리; 영교차점; Reverberation; Speech Recognition; Speech Segregation; Sound Source Localization; Zero-Crossing

