DSpace at KOASAS: (A) unified voice model for data-efficient singing voice synthesis

(A) unified voice model for data-efficient singing voice synthesis (Korean title: 통합 음성 모델을 통한 데이터 효율적인 가창 합성, "Data-efficient singing voice synthesis through a unified voice model")

Choi, Soonbeom

In this thesis, we propose a unified voice model for singing voice synthesis with a focus on data efficiency. Singing voice synthesis (SVS) is the task of generating natural vocal sounds from lyrics and melody. Earlier SVS systems were based on concatenative synthesis or statistical parametric methods, which are limited in how natural a singing voice they can synthesize. In recent years, SVS models based on deep neural networks have greatly improved the naturalness of synthesized singing voices. Generating high-quality singing voices with deep neural networks, however, requires a large amount of high-quality singing voice data paired with lyrics and melodies. In addition, aligning melody labels to the corresponding singing voice audio is time-consuming manual work. Furthermore, only a limited amount of singing voice data with melody labels is publicly available, so it is hard to improve an SVS model or to synthesize the voices of a variety of singers. To address these issues, we propose a unified voice model that can be trained with either speech or singing voice. The model distills pronunciation and timbre information from speech data, which are far more abundant than singing voice data, yet it can still generate singing voice. First, we present a melody-unsupervision model that can be trained without melody information or phoneme length information. We evaluate the proposed model on singing voice synthesis and show that it can generate singing voice from a small amount of singing voice data. In addition, we improve the model by disentangling global timbre so that it can synthesize a variety of singing voice timbres. Our model has the potential to be applied to personalized singing voice synthesizers that require only a small amount of data.

Advisors: Nam, Juhan (남주한)

Description: Korea Advanced Institute of Science and Technology (KAIST): Graduate School of Culture Technology

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2023

Identifier: 325007

Language: eng

Description: Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Graduate School of Culture Technology, 2023.2, [viii, 85 p.]

Keywords: Singing voice synthesis; Speech synthesis; Unsupervised learning; Deep learning

URI: http://hdl.handle.net/10203/307967

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1030385&flag=dissertation

Appears in Collection: GCT-Theses_Ph.D. (Doctoral Theses)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
