Speech and music pitch trajectory classification using recurrent neural networks for monaural speech segregation

Cited 3 times in Web of Science; cited 3 times in Scopus
DC Field | Value | Language
dc.contributor.author | Kim, Han-Gyu | ko
dc.contributor.author | Jang, Gil-Jin | ko
dc.contributor.author | Oh, Yung-Hwan | ko
dc.contributor.author | Choi, Ho-Jin | ko
dc.date.accessioned | 2020-10-15T00:55:11Z | -
dc.date.available | 2020-10-15T00:55:11Z | -
dc.date.created | 2019-07-01 | -
dc.date.issued | 2020-10 | -
dc.identifier.citation | JOURNAL OF SUPERCOMPUTING, v.76, no.10, pp.8193 - 8213 | -
dc.identifier.issn | 0920-8542 | -
dc.identifier.uri | http://hdl.handle.net/10203/276584 | -
dc.description.abstract | In this paper, we propose speech/music pitch classification based on recurrent neural networks (RNNs) for monaural speech segregation from music interference. The speech segregation methods in this paper exploit sub-band masking to construct segregation masks modulated by the estimated speech pitch. However, for speech signals mixed with music, speech pitch estimation becomes unreliable, as speech and music have similar harmonic structures. In order to remove the music interference effectively, we propose an RNN-based speech/music pitch classification. Our proposed method models the temporal trajectories of speech and music pitch values and determines whether an unknown continuous pitch sequence belongs to speech or music. Among various types of RNNs, we chose the simple recurrent network, long short-term memory (LSTM), and bidirectional LSTM for pitch classification. The experimental results show that our proposed method significantly outperforms the baseline methods on speech–music mixtures without loss of segregation performance on speech–noise mixtures. (An illustrative sketch of such a trajectory classifier appears after this record.) | -
dc.language | English | -
dc.publisher | SPRINGER | -
dc.title | Speech and music pitch trajectory classification using recurrent neural networks for monaural speech segregation | -
dc.type | Article | -
dc.identifier.wosid | 000569152500032 | -
dc.identifier.scopusid | 2-s2.0-85077145665 | -
dc.type.rims | ART | -
dc.citation.volume | 76 | -
dc.citation.issue | 10 | -
dc.citation.beginningpage | 8193 | -
dc.citation.endingpage | 8213 | -
dc.citation.publicationname | JOURNAL OF SUPERCOMPUTING | -
dc.identifier.doi | 10.1007/s11227-019-02785-x | -
dc.contributor.localauthor | Oh, Yung-Hwan | -
dc.contributor.localauthor | Choi, Ho-Jin | -
dc.contributor.nonIdAuthor | Kim, Han-Gyu | -
dc.contributor.nonIdAuthor | Jang, Gil-Jin | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Speech segregation | -
dc.subject.keywordAuthor | Speech pitch estimation | -
dc.subject.keywordAuthor | Pitch classification | -
dc.subject.keywordAuthor | Recurrent neural network | -
dc.subject.keywordAuthor | Long short-term memory | -
dc.subject.keywordAuthor | Bidirectional long short-term memory | -
dc.subject.keywordPlus | SEPARATION | -
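
Illustrative sketch: the abstract describes classifying whole pitch trajectories as speech or music with an RNN (LSTM or bidirectional LSTM). The Python/PyTorch snippet below is a minimal sketch of that idea only. The class name PitchTrajectoryClassifier, the one-pitch-value-per-frame input, the layer sizes, and the mean-pooling readout are assumptions made for illustration; the paper's actual features, architecture, and training setup are not specified in this record.

import torch
import torch.nn as nn

class PitchTrajectoryClassifier(nn.Module):
    # Classify a continuous pitch contour as speech (1) or music (0).
    # All sizes below are illustrative assumptions, not values from the paper.
    def __init__(self, input_dim=1, hidden_dim=64, num_layers=1, bidirectional=True):
        super().__init__()
        self.rnn = nn.LSTM(
            input_size=input_dim,      # e.g. one F0 value per time frame
            hidden_size=hidden_dim,
            num_layers=num_layers,
            batch_first=True,          # inputs are (batch, frames, features)
            bidirectional=bidirectional,
        )
        out_dim = hidden_dim * (2 if bidirectional else 1)
        self.classifier = nn.Linear(out_dim, 1)   # one logit: speech vs. music

    def forward(self, pitch_seq):
        # pitch_seq: (batch, frames, input_dim) pitch trajectories
        outputs, _ = self.rnn(pitch_seq)          # (batch, frames, out_dim)
        summary = outputs.mean(dim=1)             # pool over frames: whole-trajectory decision
        return self.classifier(summary).squeeze(-1)   # raw logits, one per trajectory

if __name__ == "__main__":
    model = PitchTrajectoryClassifier()
    # 8 synthetic contours, 100 frames each, values in a speech-like F0 range (Hz)
    contours = torch.rand(8, 100, 1) * 220.0 + 80.0
    probs = torch.sigmoid(model(contours))        # probability that each contour is speech
    print(probs.shape)                            # torch.Size([8])

Mean pooling over frames makes the decision depend on the entire contour rather than a single frame, loosely mirroring the trajectory-level (rather than frame-level) classification the abstract describes.
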
Appears in Collection
CS-Journal Papers (저널논문, Journal Papers)
Files in This Item
There are no files associated with this item.
