Improving note-level singing transcription with phonetic information and note label refinement

DC Field | Value | Language
dc.contributor.advisor | 남주한 | -
dc.contributor.author | Yong, Sangeon | -
dc.contributor.author | 용상언 | -
dc.date.accessioned | 2024-08-08T19:31:02Z | -
dc.date.available | 2024-08-08T19:31:02Z | -
dc.date.issued | 2024 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1098157&flag=dissertation | en_US
dc.identifier.uri | http://hdl.handle.net/10203/322000 | -
dc.description | Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Graduate School of Culture Technology, 2024.2, [v, 90 p.] | -
dc.description.abstract | In this thesis, we present a novel method that leverages phonetic information to enhance the performance of note-level singing transcription models. Note-level singing transcription is the task of detecting time-aligned musical notes from a given singing voice. Previous studies used hidden Markov models over audio features extracted by signal-processing methods to detect notes, but the accuracy achieved by these methods was relatively low. Recently, deep learning-based approaches have been proposed to improve performance, but because large-scale, well-annotated data is scarce, performance still lags behind that achieved for other musical instruments with sufficient data. Generating a substantial amount of annotation data for singing voices is particularly challenging: unlike the piano, singing has no automated tool for recording notes, and synthesizing singing voices through virtual instruments is comparatively difficult. Moreover, manual annotation is both time-consuming and expensive due to the expressive characteristics of singing voices, and there are no definitive standard criteria for annotation. Given these constraints, we conducted a comparative analysis of publicly available datasets annotated according to different standards and performed experiments to evaluate the impact of refined annotation on singing transcription. Based on this analysis, we developed a dataset that enables a more accurate evaluation of singing transcription performance. We further propose a model that capitalizes on phonetic information to improve transcription performance, and finally extend the proposed model to the scenario with background music by leveraging melody extraction and speech recognition models pre-trained on large-scale datasets. Our approach builds on the fact that singing voices carry lyrics, and that phonetic information is relatively unaffected by the rich expressiveness of singing. We discuss the impact and limitations of integrating phonetic information into singing transcription and introduce methods to overcome these limitations, thereby further enhancing transcription accuracy. | -
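The abstract's central idea, using phoneme boundaries as cues for note segmentation, can be illustrated with a minimal sketch. This is not the thesis model; all function names, the half-semitone threshold, and the 10 ms hop size are illustrative assumptions. It groups a frame-level pitch track into note events, splitting at phoneme onsets and at large pitch jumps:

```python
import math

# Illustrative sketch only (not the thesis model): group a frame-level
# pitch track into note events, splitting at phoneme onsets and at
# pitch jumps larger than half a semitone.

def hz_to_midi(f0):
    """Convert a frequency in Hz to a (fractional) MIDI pitch."""
    return 69 + 12 * math.log2(f0 / 440.0)

def segment_notes(f0_frames, phoneme_onsets, hop_sec=0.01):
    """f0_frames: per-frame F0 in Hz (0 = unvoiced).
    phoneme_onsets: phoneme start times in seconds, used as
    candidate note boundaries.
    Returns (onset_sec, offset_sec, midi_pitch) tuples."""
    onset_frames = {round(t / hop_sec) for t in phoneme_onsets}
    notes, start, pitches = [], None, []
    for i, f0 in enumerate(list(f0_frames) + [0.0]):  # sentinel flushes last note
        voiced = f0 > 0
        # Close the running note at an unvoiced frame, a phoneme
        # boundary, or a pitch jump of more than half a semitone.
        if pitches and (not voiced or i in onset_frames or
                        abs(hz_to_midi(f0) - pitches[-1]) > 0.5):
            notes.append((start * hop_sec, i * hop_sec,
                          round(sum(pitches) / len(pitches))))
            start, pitches = None, []
        if voiced:
            if start is None:
                start = i
            pitches.append(hz_to_midi(f0))
    return notes
```

For example, 100 ms of A4 (440 Hz) followed by 100 ms of B4 (493.88 Hz) with a phoneme onset at 0.1 s yields two notes with MIDI pitches 69 and 71. In real singing, expressive pitch drift makes the pitch-jump rule alone unreliable, which is why the phoneme boundaries carry most of the segmentation signal in this sketch.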
dc.language | eng | -
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | -
dc.subject | 가창 채보; 음악 정보 검색; 음소 분류; 딥러닝; 전이 학습 | -
dc.subject | Singing transcription; Music information retrieval; Phoneme recognition; Deep learning; Transfer learning | -
dc.title | Improving note-level singing transcription with phonetic information and note label refinement | -
dc.title.alternative | 발음 정보와 음표 레이블 정제를 통한 노트 단위 가창 채보 개선 | -
dc.type | Thesis (Ph.D.) | -
dc.identifier.CNRN | 325007 | -
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST): Graduate School of Culture Technology | -
dc.contributor.alternativeauthor | Nam, Juhan | -
Appears in Collection
GCT-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.