DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 남주한 | - |
dc.contributor.author | Yong, Sangeon | - |
dc.contributor.author | 용상언 | - |
dc.date.accessioned | 2024-08-08T19:31:02Z | - |
dc.date.available | 2024-08-08T19:31:02Z | - |
dc.date.issued | 2024 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1098157&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/322000 | - |
dc.description | Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : Graduate School of Culture Technology, 2024.2, [v, 90 p.] | - |
dc.description.abstract | In this thesis, we present a novel method that leverages phonetic information to enhance the performance of note-level singing transcription models. Note-level singing transcription is the task of detecting time-aligned musical notes in a given singing voice. Previous studies detected notes with hidden Markov models operating on audio features extracted by signal-processing methods, but the accuracy of these methods was relatively low. More recently, deep learning-based approaches have improved performance, yet because large-scale, well-annotated data are scarce, performance still lags behind that achieved for instruments with sufficient data. Generating a substantial amount of annotated data is particularly challenging for singing voices: unlike the piano, there is no automated tool for recording notes, and synthesizing singing voices with virtual instruments is comparatively difficult. Moreover, manual annotation is time-consuming and expensive owing to the expressive character of singing, and there are no definitive standard criteria for annotation. Given these constraints, we conducted a comparative analysis of publicly available datasets annotated under different standards and performed experiments to evaluate the impact of refined annotation on singing transcription. Based on this analysis, we developed a dataset that enables a more accurate evaluation of singing transcription performance. We then propose a model that capitalizes on phonetic information to improve transcription performance, and finally extend the proposed model to the scenario with background music by utilizing a melody extraction model and a speech recognition model pre-trained on large-scale datasets. Our approach rests on the fact that singing voices carry lyrics and that phonetic information is relatively less affected by the rich expressiveness of the singing voice. We discuss the impact and limitations of integrating phonetic information into singing transcription and introduce methods that overcome these limitations, further enhancing transcription accuracy. | - |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | Singing transcription; Music information retrieval; Phoneme recognition; Deep learning; Transfer learning | - |
dc.title | Improving note-level singing transcription with phonetic information and note label refinement | - |
dc.title.alternative | Improving note-level singing transcription through phonetic information and note label refinement | - |
dc.type | Thesis (Ph.D.) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology : Graduate School of Culture Technology | - |
dc.contributor.alternativeauthor | Nam, Juhan | - |
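The abstract above describes a pipeline that fuses phonetic information from a pre-trained speech recognition model with melody (pitch) information for note-level transcription. The Python sketch below illustrates that idea only in outline and is not the thesis implementation: it extracts frame-level phonetic embeddings with torchaudio's pre-trained wav2vec 2.0 bundle and frame-level F0 with librosa's pYIN, then segments notes with a deliberately naive pitch-change heuristic. The model choice, the 20 ms frame alignment, and the segmentation rule are all assumptions for illustration.

```python
# Illustrative sketch only -- NOT the thesis implementation.
# Assumptions: phonetic features from torchaudio's pre-trained wav2vec 2.0
# bundle, F0 from librosa's pYIN, and a deliberately naive pitch-change
# heuristic standing in for the actual note-segmentation model.
import librosa
import numpy as np
import torch
import torchaudio


def extract_features(wav_path: str):
    """Return frame-level phonetic embeddings and frame-level F0 (~50 fps each)."""
    # Phonetic branch: a pre-trained ASR encoder yields one feature per 20 ms.
    bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
    model = bundle.get_model().eval()
    waveform, sr = torchaudio.load(wav_path)
    waveform = waveform.mean(0, keepdim=True)  # mix down to mono
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
    with torch.inference_mode():
        layers, _ = model.extract_features(waveform)
    phonetic = layers[-1].squeeze(0)  # (frames, dim)

    # Melody branch: pYIN gives F0 together with voicing decisions.
    y, _ = librosa.load(wav_path, sr=bundle.sample_rate, mono=True)
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"),
        sr=bundle.sample_rate, hop_length=320)  # 320 samples = 20 ms at 16 kHz
    return phonetic, f0, voiced


def naive_note_segments(f0, voiced, hop_s=0.02):
    """Toy baseline: a new note starts whenever the quantized MIDI pitch changes."""
    midi = np.round(librosa.hz_to_midi(np.where(voiced, f0, np.nan)))
    notes, start, cur = [], None, None
    for i, m in enumerate(midi):
        if np.isnan(m):  # unvoiced frame: close any open note
            if start is not None:
                notes.append((start * hop_s, i * hop_s, int(cur)))
                start = None
            continue
        if cur != m:  # pitch change: close the previous note, open a new one
            if start is not None:
                notes.append((start * hop_s, i * hop_s, int(cur)))
            start, cur = i, m
    if start is not None:
        notes.append((start * hop_s, len(midi) * hop_s, int(cur)))
    return notes  # [(onset_s, offset_s, midi_pitch), ...]
```

In this sketch the phonetic embeddings are extracted but unused; in a setting like the one the abstract describes, they would inform the note decisions, for example by placing onsets at syllable boundaries where consecutive notes share a pitch, which a pitch-only heuristic cannot detect.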