(A) unified voice model for data-efficient singing voice synthesis (Korean title: Data-efficient singing voice synthesis through a unified voice model)

In this thesis, we propose a unified voice model for singing voice synthesis that focuses on data efficiency. Singing voice synthesis (SVS) is the task of generating natural vocal sounds from lyrics and melody. Earlier SVS systems were based on concatenative synthesis or statistical parametric methods, which are limited in how natural a singing voice they can synthesize. In recent years, SVS models based on deep neural networks have greatly improved the naturalness of synthesized singing voices. However, generating high-quality singing voices with deep neural networks requires a large amount of high-quality singing voice data paired with lyrics and melodies. In addition, aligning melody labels to the corresponding singing voice audio is time-consuming manual work. Furthermore, only a limited amount of singing voice data with melody labels is publicly available, so it is hard to improve an SVS model and to synthesize a variety of singers' voices. To address these issues, we propose a unified voice model that can be trained with either speech or singing voice. It distills pronunciation and timbre information from speech data, which are much more abundant than singing voice data, yet it can still generate singing voice. First, we present a melody-unsupervision model that can be trained without melody information or phoneme length information. We evaluate the proposed model on singing voice synthesis and show that it can generate singing voice from a small amount of singing voice data. In addition, we improve the model by disentangling global timbre so that it can synthesize a variety of singing voice timbres. Our model has the potential to be applied as a personalized singing voice synthesizer that needs only a small amount of data.
Advisors
Nam, Juhan (남주한)
Description
Korea Advanced Institute of Science and Technology (KAIST): Graduate School of Culture Technology
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Graduate School of Culture Technology, 2023.2, [viii, 85 p.]

Keywords

Singing voice synthesis; Speech synthesis; Unsupervised learning; Deep learning

URI
http://hdl.handle.net/10203/307967
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1030385&flag=dissertation
Appears in Collection
GCT-Theses_Ph.D. (doctoral theses)
Files in This Item
There are no files associated with this item.
