DSpace at KOASAS: Monaural speech segregation based on pitch track correction using Bayesian filters

Monaural speech segregation based on pitch track correction using Bayesian filters (베이지안 필터를 사용한 피치 트랙 수정 기반 단일채널 음성분리)

DC Field	Value	Language
dc.contributor.advisor	Choi, Ho-Jin	-
dc.contributor.advisor	최호진	-
dc.contributor.advisor	Oh, Yung-Hwan	-
dc.contributor.advisor	오영환	-
dc.contributor.author	Kim, Han-Gyu	-
dc.date.accessioned	2019-08-25T02:48:12Z	-
dc.date.available	2019-08-25T02:48:12Z	-
dc.date.issued	2018	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=828223&flag=dissertation	en_US
dc.identifier.uri	http://hdl.handle.net/10203/265354	-
dc.description	Thesis (Ph.D.) - School of Computing, 2018.8, [v, 65 p.]	-
dc.description.abstract	In this work, we propose a pitch tracking technique based on Bayesian filters and a speech/music pitch classification method based on recurrent neural networks (RNNs) for segregating speech from mixtures of speech and competing sounds. Conventional speech segregation methods use sub-band masking, in which the masks are obtained by modulation at the estimated speech pitch frequency. Segregation performance therefore relies heavily on the quality of the pitch estimate. However, pitch estimation is difficult in severely noisy environments. To improve estimation accuracy, we use Bayesian filters, which are widely used for object tracking in noisy video. Two types of Bayesian filter, the particle filter and the ensemble Kalman filter, are adopted for tracking pitch contours. The particle filter uses a simple first-order Markovian process from the past state to the present, and the ensemble Kalman filter adds a linear transition model to the same Markovian model. Because speech and music have similar harmonic structures, conventional speech segregation methods based on sub-band masking perform poorly against music interference. We therefore propose speech/music pitch classification using three RNN variants, a simple recurrent network, long short-term memory (LSTM), and bidirectional LSTM, to model the characteristics of speech pitch and music pitch. Experiments on mixtures of speech signals and various types of noise and music sources show that the proposed methods achieve significantly better segregation performance than the conventional method in most cases. Among the proposed methods, segregation with the ensemble Kalman filter and bidirectional LSTM achieved the best performance.	-
dc.language	eng	-
dc.publisher	Korea Advanced Institute of Science and Technology (한국과학기술원)	-
dc.subject	Monaural speech segregation; pitch track correction; particle filter; ensemble Kalman filter; speech/music pitch classification; recurrent neural network	-
dc.subject	단일채널 음성분리; 피치 트랙 수정; 파티클 필터; 앙상블 칼만 필터; 음성/음악 피치 분류; 순환신경망	-
dc.title	Monaural speech segregation based on pitch track correction using Bayesian filters	-
dc.title.alternative	베이지안 필터를 사용한 피치 트랙 수정 기반 단일채널 음성분리	-
dc.type	Thesis(Ph.D)	-
dc.identifier.CNRN	325007	-
dc.description.department	School of Computing	-
dc.contributor.alternativeauthor	김한규	-
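
The abstract describes tracking pitch contours with a particle filter driven by a first-order Markovian state model. The record does not give the thesis's actual transition or observation models, so the following is only a minimal illustrative sketch of a bootstrap particle filter, not the author's implementation. The Gaussian random-walk transition, the Gaussian observation likelihood, the function name `particle_filter_pitch`, and every parameter value are assumptions made for illustration.

```python
import numpy as np


def particle_filter_pitch(observations, n_particles=500, process_std=3.0,
                          obs_std=15.0, f_min=60.0, f_max=400.0, seed=0):
    """Smooth a noisy pitch track (Hz) with a bootstrap particle filter.

    Assumed state model (first-order Markov random walk):
        x_t = x_{t-1} + N(0, process_std^2)
    Assumed observation model:
        y_t = x_t + N(0, obs_std^2)
    """
    rng = np.random.default_rng(seed)
    observations = np.asarray(observations, dtype=float)
    # Start with particles spread uniformly over the plausible pitch range.
    particles = rng.uniform(f_min, f_max, size=n_particles)
    estimates = np.empty(len(observations))
    for t, y in enumerate(observations):
        # Predict: move each particle with the random-walk transition.
        particles = particles + rng.normal(0.0, process_std, size=n_particles)
        particles = np.clip(particles, f_min, f_max)
        # Update: weight each particle by the Gaussian observation likelihood.
        log_w = -0.5 * ((y - particles) / obs_std) ** 2
        w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
        w /= w.sum()
        # Estimate: posterior mean of the pitch at frame t.
        estimates[t] = np.sum(w * particles)
        # Resample (multinomial) to avoid weight degeneracy.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return estimates


def rmse(a, b):
    """Root-mean-square error between two equal-length arrays."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


# Synthetic demo: a slowly varying pitch contour corrupted by Gaussian noise.
demo_rng = np.random.default_rng(1)
frames = np.arange(300)
true_pitch = 200.0 + 20.0 * np.sin(2.0 * np.pi * frames / 200.0)
noisy = true_pitch + demo_rng.normal(0.0, 15.0, size=frames.size)
smoothed = particle_filter_pitch(noisy)
```

On this synthetic contour, `rmse(smoothed, true_pitch)` can be compared with `rmse(noisy, true_pitch)` to see how much the filter reduces estimation error. An ensemble Kalman filter variant would, per the abstract, add a linear transition model to the same Markovian structure; that is not shown here.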

Appears in Collection: CS-Theses_Ph.D.(박사논문)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.