Monaural speech segregation based on pitch track correction using Bayesian filters

DC Field | Value | Language
dc.contributor.advisor | Choi, Ho-Jin | -
dc.contributor.advisor | 최호진 | -
dc.contributor.advisor | Oh, Yung-Hwan | -
dc.contributor.advisor | 오영환 | -
dc.contributor.author | Kim, Han-Gyu | -
dc.date.accessioned | 2019-08-25T02:48:12Z | -
dc.date.available | 2019-08-25T02:48:12Z | -
dc.date.issued | 2018 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=828223&flag=dissertation | en_US
dc.identifier.uri | http://hdl.handle.net/10203/265354 | -
dc.description | Thesis (Ph.D.) - School of Computing, August 2018, [v, 65 p.] | -
dc.description.abstract | In this work, a pitch tracking technique that adopts Bayesian filters and a speech/music pitch classification method using recurrent neural networks (RNNs) are proposed for segregating speech from mixtures of speech and competing sounds. Conventional speech segregation methods use sub-band masking, in which the masks are obtained from modulation at the estimated speech pitch frequency. Segregation performance therefore relies heavily on the quality of the pitch estimate. However, pitch estimation is difficult in severely noisy environments. To improve estimation accuracy, we use Bayesian filters, which are widely used for object tracking in noisy video. Two types of Bayesian filter, the particle filter and the ensemble Kalman filter, are adopted to track the pitch contours. The particle filter models the transition from the previous state to the current state as a simple first-order Markov process, and the ensemble Kalman filter adds a linear transition model to the same Markov model. Because speech and music have similar harmonic structures, conventional speech segregation methods based on sub-band masking perform poorly against music interference. We therefore propose speech/music pitch classification with RNNs (a simple recurrent network, long short-term memory (LSTM), and bidirectional LSTM) to model the characteristics of speech pitch and music pitch. Experiments on mixtures of speech with various types of noise and music show that the proposed methods achieve significantly better segregation performance than the conventional method in most cases. Among the proposed methods, segregation with the ensemble Kalman filter and bidirectional LSTM achieved the best performance. | -
dc.language | eng | -
dc.publisher | 한국과학기술원 (Korea Advanced Institute of Science and Technology, KAIST) | -
dc.subject | Monaural speech segregation; pitch track correction; particle filter; ensemble Kalman filter; speech/music pitch classification; recurrent neural network | -
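dc.subject | 단일채널 음성분리 (monaural speech segregation); 피치 트랙 수정 (pitch track correction); 파티클 필터 (particle filter); 앙상블 칼만 필터 (ensemble Kalman filter); 음성/음악 피치 분류 (speech/music pitch classification); 순환신경망 (recurrent neural network) | -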
dc.title | Monaural speech segregation based on pitch track correction using Bayesian filters | -
dc.title.alternative | 베이지안 필터를 사용한 피치 트랙 수정 기반 단일채널 음성분리 (Monaural speech segregation based on pitch track correction using Bayesian filters) | -
dc.type | Thesis (Ph.D.) | -
dc.identifier.CNRN | 325007 | -
dc.description.department | School of Computing | -
dc.contributor.alternativeauthor | 김한규 | -
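The abstract describes tracking the pitch contour with a particle filter whose state transition is a simple first-order Markov process. The sketch below is a generic bootstrap particle filter that smooths a noisy frame-level F0 track under that idea; it is not the thesis's implementation. The names `track_pitch` and `systematic_resample`, the parameters (`trans_std`, `obs_std`, `f0_range`, `outlier_prob`), and the Gaussian-plus-uniform observation likelihood are all hypothetical choices made for illustration, and every frame is assumed to be voiced.

```python
import math
import random


def systematic_resample(weights, rng):
    """Return particle indices chosen by systematic resampling."""
    n = len(weights)
    step = 1.0 / n
    start = rng.random() * step
    indices = []
    cumulative = weights[0]
    j = 0
    for i in range(n):
        target = start + i * step
        # Advance to the particle whose cumulative weight covers this target.
        while target > cumulative and j < n - 1:
            j += 1
            cumulative += weights[j]
        indices.append(j)
    return indices


def track_pitch(observations, n_particles=500, trans_std=5.0, obs_std=10.0,
                f0_range=(60.0, 400.0), outlier_prob=0.2, seed=0):
    """Hypothetical bootstrap particle filter that corrects a noisy F0 track.

    observations: per-frame F0 estimates in Hz from any frame-level pitch
    detector (assumed voiced in every frame). Returns one F0 value per frame.
    """
    rng = random.Random(seed)
    lo, hi = f0_range
    uniform_density = 1.0 / (hi - lo)
    norm = 1.0 / (obs_std * math.sqrt(2.0 * math.pi))
    # Initial particles: uniform over the allowed F0 range.
    particles = [rng.uniform(lo, hi) for _ in range(n_particles)]
    track = []
    for z in observations:
        # Predict: first-order Markov transition (Gaussian random walk), clipped.
        particles = [min(max(p + rng.gauss(0.0, trans_std), lo), hi)
                     for p in particles]
        # Update: Gaussian likelihood around z mixed with a uniform floor, so a
        # gross error (e.g. an octave jump) leaves the weights nearly equal.
        weights = [(1.0 - outlier_prob) * norm
                   * math.exp(-0.5 * ((z - p) / obs_std) ** 2)
                   + outlier_prob * uniform_density
                   for p in particles]
        total = sum(weights)
        if total == 0.0:  # only possible when outlier_prob == 0
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Estimate: posterior-mean F0 for this frame.
        track.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample to counter weight degeneracy.
        particles = [particles[i] for i in systematic_resample(weights, rng)]
    return track


if __name__ == "__main__":
    noisy = [150.0, 152.0, 304.0, 151.0, 149.0]  # third frame: simulated octave error
    print([round(f, 1) for f in track_pitch(noisy)])
```

Feeding it a contour with additive noise and occasional doubled (octave-error) frames should give a track closer to the true contour than the raw estimates. One caveat of this design: because far-off observations are absorbed by the uniform floor, a badly wrong first frame can leave the particles stuck, so practical trackers often re-inject a few uniformly drawn particles each frame.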
Appears in Collection
CS-Theses_Ph.D. (doctoral dissertations)
Files in This Item
There are no files associated with this item.
