(A) unified voice model for data-efficient singing voice synthesis (통합 음성 모델을 통한 데이터 효율적인 가창 합성)

DC Field | Value | Language
dc.contributor.advisor | Nam, Juhan | -
dc.contributor.advisor | 남주한 | -
dc.contributor.author | Choi, Soonbeom | -
dc.date.accessioned | 2023-06-21T19:33:56Z | -
dc.date.available | 2023-06-21T19:33:56Z | -
dc.date.issued | 2023 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1030385&flag=dissertation | en_US
dc.identifier.uri | http://hdl.handle.net/10203/307967 | -
dc.description | Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Graduate School of Culture Technology, 2023.2, [viii, 85 p.] | -
dc.description.abstract | In this thesis, we propose a unified voice model for singing voice synthesis, focusing on data efficiency. Singing voice synthesis (SVS) is the task of generating natural vocal sounds from lyrics and melody. Earlier SVS systems were based on concatenative synthesis or statistical parametric methods, but these have limitations in synthesizing a natural singing voice. In recent years, SVS models based on deep neural networks have made great improvements in synthesizing natural singing voices. To generate high-quality singing voices with deep neural networks, a large amount of data containing high-quality singing voices paired with lyrics and melodies is required. In addition, aligning melody labels to the corresponding singing voice audio is time-consuming manual work. Furthermore, there is a limited amount of publicly available singing voice data with melody labels, so it is hard to improve an SVS model and to synthesize the voices of a variety of singers. To address these issues, we propose a unified voice model that can be trained with either speech or singing voice. It distills pronunciation and timbre information from speech data, which are much more abundant than singing voice data, yet it can still generate singing voice. We first present a melody-unsupervision model, which can be trained without melody information or phoneme length information. We evaluate the proposed model on singing voice synthesis and show that it can generate singing voice from a small amount of singing voice data. In addition, we improve the model by disentangling global timbre so that it can synthesize a variety of singing voice timbres. Our model has the potential to be applied to a personalized singing voice synthesizer trained with a small amount of data. | -
dc.language | eng | -
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | -
dc.subject | Singing voice synthesis; Speech synthesis; Unsupervised learning; Deep learning | -
dc.subject | 가창 합성 (singing voice synthesis); 음성 합성 (speech synthesis); 비지도 학습 (unsupervised learning); 딥러닝 (deep learning) | -
dc.title | (A) unified voice model for data-efficient singing voice synthesis | -
dc.title.alternative | 통합 음성 모델을 통한 데이터 효율적인 가창 합성 | -
dc.type | Thesis(Ph.D) | -
dc.identifier.CNRN | 325007 | -
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST): Graduate School of Culture Technology | -
dc.contributor.alternativeauthor | 최순범 | -
Appears in Collection
GCT-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
