(An) approach for melody extraction using a harmonic structure model = 하모닉 구조 모델을 이용한 멜로디 추출 접근

This thesis presents an algorithm that extracts the melody pitch from polyphonic audio using a harmonic structure model. The proposed algorithm performs melody extraction in two steps: (1) melody pitch candidate estimation and (2) melody pitch sequence identification, which includes a simple smoothing process. In the melody pitch candidate estimation step, multiple melody pitch candidates are estimated based on a cost that indicates the strength of the harmonic structure in the spectrum of a windowed signal. Several techniques are used to estimate the melody pitch candidates accurately: (1) several harmonic structures are estimated from monophonic data containing a melody sound, such as a singing voice, piano, or saxophone, because the harmonic structure of a melody sound varies with the melody instrument, pitch, and tempo; (2) a melody pitch range is estimated from the pitch candidates to increase accuracy and reduce computational complexity; (3) percussive sounds are suppressed to reduce their interference. In the melody pitch sequence identification step, a melody line is selected from the many possible pitch sequences based on the following properties of the melody line: (1) vibrato has an extent of 60-200 cents for the human singing voice and only 20-30 cents for other instruments; (2) transitions between melody notes are typically limited to one octave (1200 cents); (3) a rest during singing lasts longer than 50 ms. A smoothing process is then applied to correct spurious pitches and octave errors. The ADC04 database, the MIREX05 training database, and the RWC database are used in the experiments. The experimental results show that the proposed melody extraction algorithm is reasonable and performs comparably to state-of-the-art algorithms.
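
To make the candidate estimation step and the one-octave transition property concrete, the following is a minimal Python sketch. It is not the cost function or the harmonic structure model of the thesis: the 1/k harmonic weights, the number of harmonics, the candidate grid, and all function names (`harmonic_strength`, `pitch_candidates`, `within_octave`, `hz_to_cents`) are assumptions chosen only for illustration. The sketch scores each candidate pitch by summing the spectral magnitude of a Hann-windowed frame near its harmonics, and expresses pitch intervals in cents.

```python
import numpy as np


def hz_to_cents(f_hz, ref_hz=440.0):
    """Interval in cents between f_hz and ref_hz (1200 cents = one octave)."""
    return 1200.0 * np.log2(np.asarray(f_hz, dtype=float) / ref_hz)


def harmonic_strength(mag, freqs, f0, n_harmonics=5):
    """Illustrative cost (an assumption, not the thesis model): sum of the
    spectral magnitudes at the bins nearest to k * f0, weighted by 1 / k."""
    total = 0.0
    for k in range(1, n_harmonics + 1):
        target = k * f0
        if target > freqs[-1]:
            break  # harmonic lies above the Nyquist frequency
        total += mag[np.argmin(np.abs(freqs - target))] / k
    return total


def pitch_candidates(frame, sr, f0_grid, n_candidates=3):
    """Return the n_candidates (f0, score) pairs with the highest strength."""
    window = np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    scores = np.array([harmonic_strength(mag, freqs, f0) for f0 in f0_grid])
    best = np.argsort(scores)[::-1][:n_candidates]
    return [(float(f0_grid[i]), float(scores[i])) for i in best]


def within_octave(f_prev, f_next):
    """True if the transition between two pitches spans at most 1200 cents."""
    return bool(abs(hz_to_cents(f_next, ref_hz=f_prev)) <= 1200.0)


if __name__ == "__main__":
    # Synthetic harmonic tone at 220 Hz with five partials of amplitude 1 / k.
    sr, n = 16000, 4096
    t = np.arange(n) / sr
    frame = sum(np.sin(2 * np.pi * 220.0 * k * t) / k for k in range(1, 6))
    grid = np.array([110.0, 165.0, 220.0, 330.0, 440.0])
    print(pitch_candidates(frame, sr, grid))
    print(within_octave(220.0, 440.0), within_octave(220.0, 500.0))
```

In a full system the candidates from every frame would be linked into pitch sequences, and a check such as `within_octave` could be one of several constraints used to discard implausible transitions.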
