HiPhi-GAN: Improving neural vocoder by fixing the phase constant

Most modern neural vocoders generate a waveform from a mel-spectrogram, one of the acoustic features. A mel-spectrogram carries only magnitude information and says nothing about the phase constant. In other words, if x_ϕ denotes the waveform x after a phase transform by ϕ, the mel-spectrograms obtained from x_ϕ are the same for all ϕ. Consequently, a neural vocoder that takes only a mel-spectrogram as input is confused during training, because x_ϕ is a valid ground truth for every ϕ. In this paper, we propose a universal vocoder with two stages: one that avoids this confusion by fixing ϕ to ϕ̄, which guarantees a unique ground truth x_ϕ̄, and one that generates a full-band waveform according to the fixed ϕ̄. The stages are named the phase synchronizer and the waveform upsampler, respectively. The proposed neural vocoder, HiPhi-GAN, solves the existing problems of slow inference speed, poor audio quality in the mid-to-high band, and frequent phasing errors.
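The ambiguity described in the abstract can be illustrated with a minimal sketch. It assumes that "phase transform by ϕ" means rotating every non-DC, non-Nyquist frequency component of the signal by the same angle ϕ; the abstract does not define the transform, so this reading, the helper name phase_shift, and the test signal are all illustrative assumptions. The sketch compares plain FFT magnitudes of one frame rather than a windowed mel-spectrogram, and it is not the thesis's phase synchronizer.

```python
import numpy as np


def phase_shift(x, phi):
    """Rotate each frequency component of a real signal by phi radians.

    Hypothetical helper: one possible reading of the abstract's
    "phase transform by phi". DC (and Nyquist, for even lengths) are
    left untouched because they must stay real for a real signal.
    """
    n = len(x)
    spectrum = np.fft.rfft(x)
    rotated = spectrum.copy()
    # Index of the first bin that must NOT be rotated (Nyquist for even n).
    last = n // 2 if n % 2 == 0 else n // 2 + 1
    rotated[1:last] *= np.exp(1j * phi)
    return np.fft.irfft(rotated, n=n)


# Illustrative test signal: white noise standing in for one audio frame.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
magnitude = np.abs(np.fft.rfft(x))

for phi in (0.5, np.pi / 2, 2.0):
    x_phi = phase_shift(x, phi)
    magnitude_phi = np.abs(np.fft.rfft(x_phi))
    # The waveform changes, but its magnitude spectrum does not.
    print(
        f"phi={phi:.2f}",
        "waveform differs:", not np.allclose(x, x_phi),
        "magnitude matches:", np.allclose(magnitude, magnitude_phi),
    )
```

Under these assumptions, every x_ϕ is a different waveform with the same magnitude spectrum, so a model conditioned on magnitude alone cannot tell which one it is expected to reproduce.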
Advisors
Kim, Daeyoung
Description
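Korea Advanced Institute of Science and Technology (KAIST): School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)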
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2023.2, [iii, 29 p.]

Keywords

Neural vocoder; Universal vocoder; Phase transform; Generative adversarial network; Diffusion model; Real-time speech synthesis; Real-time voice conversion

URI
http://hdl.handle.net/10203/309499
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1032952&flag=dissertation
Appears in Collection
CS-Theses_Master (Master's theses)