Neural representations for speech recognition and natural language generation

Spoken dialog systems (SDS) are the most efficient interface for human-machine communication because a human can convey and receive a large amount of information in a short time via spoken language. In this dissertation, toward emotional dialog systems, our research goals in terms of applications are (i) performance improvements of acoustic models in automatic speech recognition (ASR) and (ii) natural language understanding and generation for emotional dialog. In terms of neural representations in deep architectures, we aim at (i) achieving discriminative yet insensitive representations of speech acoustics, and (ii) disentangling emotional attributes from the latent representations of texts for emotional response generation. In the first part of the dissertation, we study representations of speech acoustics for obtaining robustness to spectral variations in the convolutional neural network-hidden Markov model (CNN-HMM) hybrid acoustic model in ASR. We contend that convolution along the time axis is more effective than along the frequency axis. We also propose adding an intermap pooling (IMP) layer to deep CNNs that groups feature maps capturing common but spectrally varied features and then pools within each group, thereby achieving robustness to spectral variations. The IMP-CNNs with time convolution reduce word error rates further on various speech databases without speaker adaptation techniques. We expect the proposed model to be especially useful when access to speaker information is limited. In the second part of the dissertation, we deal with a neural empathic conversational agent that can generate emotional responses by controlling emotion attributes. We tackle this problem via two sub-goals: (1) controllable emotional sentence generation by disentangling the emotional latent vectors of a sentence, and (2) controllable emotional response generation by matching context and response latent vectors.
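The inter-map pooling idea described above can be sketched in a few lines: several adjacent feature maps that respond to spectrally shifted versions of the same pattern are grouped along the channel axis and max-pooled, yielding one map per group that is less sensitive to spectral variation. This is a minimal NumPy sketch of that operation only; the function name, shapes, and group size are illustrative assumptions, not the dissertation's actual implementation.

```python
import numpy as np

def intermap_pool(feature_maps, group_size):
    """Inter-map pooling sketch: max-pool over groups of adjacent
    feature maps (channel axis), so that maps capturing spectrally
    shifted versions of the same pattern collapse into one map.

    feature_maps: array of shape (channels, time, freq)
    group_size:   number of adjacent maps pooled per group
    """
    c, t, f = feature_maps.shape
    assert c % group_size == 0, "channels must divide evenly into groups"
    grouped = feature_maps.reshape(c // group_size, group_size, t, f)
    return grouped.max(axis=1)  # shape: (channels // group_size, time, freq)

# Example: 8 feature maps over a 100-frame, 40-bin spectrogram,
# pooled in groups of 4, giving 2 pooled maps.
x = np.random.randn(8, 100, 40)
y = intermap_pool(x, group_size=4)
print(y.shape)  # (2, 100, 40)
```

Pooling across maps (rather than across time or frequency within one map) is what provides the invariance: any one of the grouped maps may fire for a shifted spectral pattern, and the maximum carries that activation forward.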
We propose deep generative frameworks to solve these problems: the Wasserstein adversarial controllable autoencoder (WACAE) and the Wasserstein adversarial controllable response generator (WACRG). The models are experimentally evaluated on the DailyDialog dataset, showing that the proposed methods improve emotion expressivity as well as the feasibility of emotional text and response generation and of emotion transfer between sentences. From these results, we expect that the proposed models can be used to construct dialog systems that communicate emotionally with the user according to changes in the conversational agent's emotion.
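The disentanglement goal above amounts to factoring a sentence's latent code into a content part and an emotion part, so that emotion transfer keeps the content code fixed and swaps only the emotion code. The sketch below illustrates that latent-swap idea in the abstract; the dimensions, function names, and split layout are hypothetical and are not the actual WACAE interface.

```python
import numpy as np

# Assumed layout: the first CONTENT_DIM entries of the latent vector
# encode content, the remaining EMOTION_DIM entries encode emotion.
CONTENT_DIM, EMOTION_DIM = 96, 32

def split_latent(z):
    """Split a sentence latent vector into (content, emotion) parts."""
    return z[:CONTENT_DIM], z[CONTENT_DIM:]

def transfer_emotion(z_source, target_emotion_code):
    """Keep the source sentence's content code, swap in a target
    emotion code; decoding the result would yield the same content
    expressed with the target emotion."""
    content, _ = split_latent(z_source)
    return np.concatenate([content, target_emotion_code])

# Toy usage: re-express a sentence's latent with a different emotion code.
z_sentence = np.random.randn(CONTENT_DIM + EMOTION_DIM)
new_emotion = np.random.randn(EMOTION_DIM)
z_transferred = transfer_emotion(z_sentence, new_emotion)
print(z_transferred.shape)  # (128,)
```

In the adversarial framework, a discriminator would penalize emotion information leaking into the content part, which is what makes such a clean swap meaningful; that training machinery is omitted here.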
Advisors
Kim, Dae-Shik; Lee, Soo-Young
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2018
Identifier
325007
Language
eng
Description

Doctoral dissertation - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2018.8, [v, 96 p.]

Keywords

Deep learning; acoustic modeling; inter-map pooling; emotional text generation; emotional response generation

URI
http://hdl.handle.net/10203/265137
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=828206&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral theses)
Files in This Item
There are no files associated with this item.
