Semi-supervised domain adaptation using unlabeled data for end-to-end speech recognition (라벨이 없는 데이터를 사용한 종단간 음성인식기의 준교사 방식 도메인 적응)

Neural network-based deep learning has recently improved automatic speech recognition (ASR) performance dramatically over the classical Gaussian mixture model-based hidden Markov model (GMM-HMM) system. In addition, research on end-to-end (E2E) speech recognition systems, which integrate the language modeling and decoding processes, has been actively conducted to better exploit the advantages of deep learning. E2E ASR systems generally consist of a multi-layer encoder-decoder structure with attention, and they therefore require a large amount of speech-text paired data to achieve good performance. Obtaining speech-text paired data requires considerable human labor and time, which is a high barrier to building an E2E ASR system. Previous studies have improved the performance of E2E ASR systems using a relatively small amount of speech-text paired data, but most of them used either speech-only data or text-only data, not both. In this study, we propose a semi-supervised training method that enables an E2E ASR system to perform well on corpora from different domains by using both speech-only and text-only data. The proposed method adapts effectively to different domains, showing good performance in the target domain without much degradation in the source domain.
Publisher
한국음성학회 (The Korean Society of Speech Sciences)
Issue Date
2020-06
Language
Korean
Citation
Phonetics and Speech Sciences (말소리와 음성과학), v.12, no.2, pp.29-37
ISSN
2005-8063
DOI
10.13064/KSSS.2020.12.2.029
URI
http://hdl.handle.net/10203/280563
Appears in Collection
EE-Journal Papers(저널논문)
