DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kang, Minki | ko |
dc.contributor.author | Min, Dongchan | ko |
dc.contributor.author | Hwang, Sung Ju | ko |
dc.date.accessioned | 2023-12-12T07:01:01Z | - |
dc.date.available | 2023-12-12T07:01:01Z | - |
dc.date.created | 2023-12-10 | - |
dc.date.issued | 2023-06-06 | - |
dc.identifier.citation | 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 | - |
dc.identifier.uri | http://hdl.handle.net/10203/316286 | - |
dc.description.abstract | There has been significant progress in Text-To-Speech (TTS) synthesis technology in recent years, thanks to advances in neural generative modeling. However, existing methods for any-speaker adaptive TTS have achieved unsatisfactory performance due to their suboptimal accuracy in mimicking target speakers’ styles. In this work, we present Grad-StyleSpeech, an any-speaker adaptive TTS framework based on a diffusion model that generates highly natural speech with extremely high similarity to a target speaker’s voice, given only a few seconds of reference speech. Grad-StyleSpeech significantly outperforms recent speaker-adaptive TTS baselines on English benchmarks. Audio samples are available at https://nardien.github.io/grad-stylespeech-demo. | - |
dc.language | English | - |
dc.publisher | IEEE Signal Processing Society | - |
dc.title | Grad-StyleSpeech: Any-speaker Adaptive Text-To-Speech Synthesis with Diffusion Models | - |
dc.type | Conference | - |
dc.identifier.scopusid | 2-s2.0-85177568036 | - |
dc.type.rims | CONF | - |
dc.citation.publicationname | 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 | - |
dc.identifier.conferencecountry | GR | - |
dc.identifier.conferencelocation | Rhodes Island | - |
dc.identifier.doi | 10.1109/ICASSP49357.2023.10095515 | - |
dc.contributor.localauthor | Hwang, Sung Ju | - |
dc.contributor.nonIdAuthor | Kang, Minki | - |