DSpace at KOASAS: Singing melody extraction using multi-column deep neural networks


Singing melody extraction using multi-column deep neural networks


Kum, Sangeun / 금상은

As the music market has grown, so has the demand for new services such as cover song identification and query by humming. These services search for songs by melody, so extracting the melody, particularly from the singing voice, is essential to implementing them. In this thesis, we focus on algorithms that extract the singing melody from audio signals. Singing melody extraction is the task of tracking the pitch contour of the singing voice in polyphonic music. While the majority of melody extraction algorithms compute a saliency function over pitch candidates or separate the melody source from the mixture, data-driven approaches based on classification have rarely been explored. We present a classification-based approach to singing melody extraction using multi-column deep neural networks. In the proposed model, each neural network is trained to predict a pitch label of the singing voice from the spectrogram, but the networks' outputs have different pitch resolutions. The melody contour is inferred by combining the outputs of the networks. We apply Viterbi decoding based on a hidden Markov model to capture long-term temporal information. Our system also includes a singing voice detector, an additional deep neural network that selects the frames containing singing voice. It is trained with singing voice activity labels and the outputs of the melody extraction networks. To take advantage of the data-driven approach, we also augment the training data by pitch-shifting the audio and modifying the pitch labels accordingly. We train the model on the RWC dataset and part of the MedleyDB dataset and evaluate it on the ADC 2004, MIREX 2005, and MIR-1k datasets. Across several experimental settings, we show incremental improvements in melody prediction. Lastly, we compare our best result with those of previous state-of-the-art methods.
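The label side of the pitch-shifting augmentation described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function names, the 440 Hz reference tuning, the convention that 0.0 marks an unvoiced frame, and the two resolutions (1 and 4 bins per semitone) are all assumptions made for illustration. The only fact relied on is that shifting audio by k semitones scales every fundamental frequency by 2^(k/12).

```python
import math

A4_HZ = 440.0   # assumed reference tuning
A4_MIDI = 69    # MIDI note number of A4


def hz_to_midi(f0_hz):
    """Convert a frequency in Hz to a fractional MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(f0_hz / A4_HZ)


def shift_labels(f0_hz, semitones):
    """Scale per-frame f0 labels to match audio pitch-shifted by `semitones`.

    Frames labeled 0.0 are treated as unvoiced (an assumed convention)
    and are left unchanged.
    """
    ratio = 2.0 ** (semitones / 12.0)
    return [f * ratio if f > 0 else 0.0 for f in f0_hz]


def quantize(f0_hz, bins_per_semitone):
    """Round voiced frames to the nearest 1/bins_per_semitone of a semitone.

    Returns MIDI-scale pitch values, with None for unvoiced frames.
    """
    labels = []
    for f in f0_hz:
        if f <= 0:
            labels.append(None)
        else:
            step = round(hz_to_midi(f) * bins_per_semitone)
            labels.append(step / bins_per_semitone)
    return labels


# Hypothetical per-frame labels: unvoiced, A3, A4, slightly sharp A4.
labels = [0.0, 220.0, 440.0, 450.0]
shifted = shift_labels(labels, 1)   # audio shifted up by one semitone
coarse = quantize(shifted, 1)       # semitone resolution
fine = quantize(shifted, 4)         # quarter-semitone resolution
```

Quantizing the same shifted labels at two resolutions mirrors the idea of networks whose outputs differ in pitch resolution: the coarse labels merge the last two frames into one class, while the fine labels keep them apart.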

Advisors: Nam, Juhan (남주한)

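Description: Korea Advanced Institute of Science and Technology (KAIST): Graduate School of Culture Technology

Publisher: Korea Advanced Institute of Science and Technology (KAIST)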

Issue Date: 2016

Identifier: 325007

Language: eng

Description: Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Graduate School of Culture Technology, 2016.8, [v, 39 p.]

Keywords: melody extraction; data-driven approach; multi-column deep neural network; data augmentation; singing voice detection

URI: http://hdl.handle.net/10203/221343

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=663324&flag=dissertation

Appears in Collection: GCT-Theses_Master (Master's theses)

Files in This Item: There are no files associated with this item.


Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
