Monaural speech segregation based on pitch track correction using Bayesian filters

This work proposes a pitch tracking technique based on Bayesian filters, together with speech/music pitch classification using recurrent neural networks (RNNs), for segregating speech from mixtures of speech and competing sounds. Conventional speech segregation methods use sub-band masking, in which the masks are obtained from modulation at the estimated speech pitch frequency. Segregation performance therefore relies heavily on the quality of the pitch estimate, yet pitch estimation is difficult in severely noisy environments. To improve estimation accuracy, we use Bayesian filters, which are widely used for tracking objects in noisy video. Two types of Bayesian filter, the particle filter and the ensemble Kalman filter, are adopted for tracking the pitch contours. The particle filter uses a simple first-order Markov process from the past state to the present state, and the ensemble Kalman filter adds a linear transition model to the same Markov model. Because speech and music have similar harmonic structures, conventional segregation methods based on sub-band masking perform poorly against music interference. We therefore propose speech/music pitch classification with three RNN variants, a simple recurrent network, long short-term memory (LSTM), and bidirectional LSTM, to model the characteristics of speech pitch and music pitch. Experiments on mixtures of speech signals with various types of noise and music sources show that the proposed methods achieve significantly better segregation performance than the conventional method in most cases. Among the proposed methods, segregation with the ensemble Kalman filter and bidirectional LSTM achieves the best performance.
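As a rough illustration of the particle-filter idea mentioned in the abstract, the following minimal sketch smooths a noisy pitch contour with a bootstrap particle filter. It is not the thesis's implementation: the random-walk state model, the Gaussian observation model, the pitch range, and every parameter value (`n_particles`, `process_std`, `obs_std`, `f_min`, `f_max`) are assumptions chosen only for illustration.

```python
import math
import random


def particle_filter_pitch(observations, n_particles=500, process_std=5.0,
                          obs_std=20.0, f_min=60.0, f_max=400.0, seed=0):
    """Bootstrap particle filter over a pitch contour in Hz.

    Assumed state model: x_t = x_{t-1} + N(0, process_std^2), i.e. a
    first-order Markov random walk.
    Assumed observation model: y_t = x_t + N(0, obs_std^2).
    """
    rng = random.Random(seed)
    # Initialise particles uniformly over a hypothetical speech pitch range.
    particles = [rng.uniform(f_min, f_max) for _ in range(n_particles)]
    track = []
    for y in observations:
        # Predict: propagate each particle through the random-walk model,
        # clipping to the allowed pitch range.
        particles = [min(f_max, max(f_min, p + rng.gauss(0.0, process_std)))
                     for p in particles]
        # Update: weight each particle by the Gaussian observation likelihood.
        weights = [math.exp(-0.5 * ((y - p) / obs_std) ** 2)
                   for p in particles]
        total = sum(weights)
        if total == 0.0:
            # Every likelihood underflowed; fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Estimate: posterior mean of the pitch at this frame.
        track.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample (multinomial) to counter weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return track
```

Calling `particle_filter_pitch(noisy_pitch_hz)` on a list of per-frame pitch estimates returns a smoothed contour of the same length. A smaller `process_std` trusts the previous frame more, while a smaller `obs_std` trusts the raw pitch estimates more.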
Advisors
Choi, Ho-Jin; Oh, Yung-Hwan
Description
School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2018
Identifier
325007
Language
eng
Description

Doctoral thesis (Ph.D.) - School of Computing, 2018.8, [v, 65 p.]

Keywords

Monaural speech segregation; pitch track correction; particle filter; ensemble Kalman filter; speech/music pitch classification; recurrent neural network

URI
http://hdl.handle.net/10203/265354
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=828223&flag=dissertation
Appears in Collection
CS-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
