Extracting TTS style via adversarial and supervised contrastive learning

Few-shot TTS is a useful but challenging task in which a new style must be mimicked given only a short reference speech. A popular approach to this problem relies on an architecture bottleneck to extract a style embedding. However, this approach can suffer from robustness issues if the extracted embedding is not independent of the text input, and its relevance to speaker identity may be limited by the bottleneck. In this study, we propose adversarial contrastive learning to extract style that is independent of the text. Furthermore, we propose supervised contrastive learning to reinforce relevance to speaker identity and to exploit the rich representation learned by contrastive learning. Quantitative evaluation on a benchmark dataset shows that our method indeed improves robustness and relevance to speaker identity.
Advisors
Yang, Eunho (researcher)
Description
KAIST: Kim Jaechul Graduate School of AI
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Thesis (Master's) - KAIST: Kim Jaechul Graduate School of AI, 2022.2, [iii, 21 p.]

URI
http://hdl.handle.net/10203/308187
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=997677&flag=dissertation
Appears in Collection
AI-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
