Singing melody extraction using multi-column deep neural networks (Korean title: 다중 심층 신경망을 사용한 가창 멜로디 추출)

As the music market grows, so does demand for new services such as cover song identification and query by humming. These services search for songs by melody, so extracting the melody, particularly from the singing voice, is essential to implementing them. In this thesis, we focus on algorithms that extract the singing melody from audio signals. Singing melody extraction is the task of tracking the pitch contour of the singing voice in polyphonic music. Most melody extraction algorithms either compute a saliency function over pitch candidates or separate the melody source from the mixture; data-driven approaches based on classification have rarely been explored. We present a classification-based approach to singing melody extraction using multi-column deep neural networks. In the proposed model, each neural network is trained to predict a pitch label of the singing voice from the spectrogram, with each network's output at a different pitch resolution. The melody contour is inferred by combining the outputs of the networks. We then apply Viterbi decoding based on a hidden Markov model to capture long-term temporal information. The system also includes a singing voice detector, implemented as an additional deep neural network, that selects the frames containing singing voice. It is trained with singing voice activity labels and the outputs of the melody extraction networks. To take advantage of the data-driven approach, we augment the training data by pitch-shifting the audio and modifying the pitch labels accordingly. We train the model on the RWC dataset and part of the MedleyDB dataset and evaluate it on the ADC 2004, MIREX 2005, and MIR-1K datasets. Across several experimental settings, we show incremental improvements in melody prediction. Finally, we compare our best result with those of previous state-of-the-art methods.
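The abstract does not specify how the network outputs are combined or how the hidden Markov model is parameterised. The sketch below is only an illustration of the general idea under loudly stated assumptions: `combine_columns` merges a coarse-resolution posterior into a fine-resolution one by element-wise product (an assumed rule), and `viterbi` smooths frame-wise posteriors with a toy "stay in the same pitch bin" transition model (also assumed). The function names and the `stay_prob` and `ratio` parameters are hypothetical and do not come from the thesis.

```python
import math


def combine_columns(coarse, fine, ratio):
    """Merge a coarse-resolution pitch posterior into a fine-resolution one.

    Assumption: each coarse bin covers `ratio` consecutive fine bins, and the
    two posteriors are merged by element-wise product, then renormalised.
    """
    assert len(fine) == len(coarse) * ratio
    merged = [fine[i] * coarse[i // ratio] for i in range(len(fine))]
    total = sum(merged)
    return [m / total for m in merged]


def viterbi(posteriors, stay_prob=0.9):
    """Return the most likely pitch-bin index for each frame.

    Assumption: a toy transition model that keeps the same bin with
    probability `stay_prob` and spreads the rest evenly over the other bins.
    Requires at least two bins per frame.
    """
    n_bins = len(posteriors[0])
    move_prob = (1.0 - stay_prob) / (n_bins - 1)
    eps = 1e-12  # avoids log(0) for zero-probability bins

    def log_trans(i, j):
        return math.log(stay_prob if i == j else move_prob)

    # Log-score of the best path ending in each bin at the first frame.
    score = [math.log(p + eps) for p in posteriors[0]]
    backptr = []
    for frame in posteriors[1:]:
        new_score, ptr = [], []
        for j in range(n_bins):
            best_i = max(range(n_bins), key=lambda i: score[i] + log_trans(i, j))
            new_score.append(
                score[best_i] + log_trans(best_i, j) + math.log(frame[j] + eps)
            )
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)

    # Trace the best path backwards from the best final bin.
    path = [max(range(n_bins), key=lambda j: score[j])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With frame posteriors `[[0.9, 0.1], [0.45, 0.55], [0.9, 0.1]]`, picking the maximum frame by frame would jump to bin 1 in the middle frame, whereas the decoded path stays in bin 0 throughout, which is the kind of temporal smoothing the abstract attributes to Viterbi decoding.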
Advisors
Nam, Juhan (남주한)
Description
Korea Advanced Institute of Science and Technology (KAIST): Graduate School of Culture Technology
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2016
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Graduate School of Culture Technology, 2016.8, [v, 39 p.]

Keywords

melody extraction; data-driven approach; multi-column deep neural network; data augmentation; singing voice detection; 가창 멜로디 추출; 데이터 기반 방법; 다중 심층 신경망; 데이터 증가 방법; 가창 목소리 검출

URI
http://hdl.handle.net/10203/221343
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=663324&flag=dissertation
Appears in Collection
GCT-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
