Deep learning based approach for enhanced non-native speech recognition (비원어민의 음성인식 향상을 위한 딥러닝을 활용한 접근법)

DC Field | Value | Language
dc.contributor.advisor | 유창동 | -
dc.contributor.author | Yoon, Eunseop | -
dc.contributor.author | 윤은섭 | -
dc.date.accessioned | 2024-07-25T19:31:14Z | -
dc.date.available | 2024-07-25T19:31:14Z | -
dc.date.issued | 2023 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1045904&flag=dissertation | en_US
dc.identifier.uri | http://hdl.handle.net/10203/320674 | -
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2023.8, [v, 35 p.] | -
dc.description.abstract | Automatic Speech Recognition (ASR) converts spoken language into written text, and ASR systems have attained unprecedented performance with large speech models pre-trained through self-supervised speech representation learning. However, these pre-trained speech models suffer from representational bias: they tend to represent the prominent accents in the pre-training speech corpus (i.e., native (L1) English accents) better than less represented accents, resulting in deteriorated performance on non-native (L2) English accents. Although there have been some approaches to mitigate this issue, all of these methods require updating the pre-trained model weights. In this paper, we propose Information Theoretic Adversarial Prompt Tuning (INTapt), which introduces prompts concatenated to the original input that can re-modulate the attention of the pre-trained model such that the corresponding input resembles native (L1) English speech without updating the backbone weights. INTapt is trained simultaneously in the following two manners: (1) adversarial training to reduce accent feature dependence between the original input and the prompt-concatenated input and (2) training to minimize CTC loss for improving ASR performance on the prompt-concatenated input. Experimental results show that INTapt improves ASR performance on L2 English and increases feature similarity between L2 and L1 accents. | -
dc.language | eng | -
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | -
dc.subject | 음성 인식; 프롬프트 튜닝; 도메인 적 | -
dc.subject | Automatic speech recognition; Prompt tuning; Domain adaptation | -
dc.title | Deep learning based approach for enhanced non-native speech recognition | -
dc.title.alternative | 비원어민의 음성인식 향상을 위한 딥러닝을 활용한 접근법 | -
dc.type | Thesis(Master) | -
dc.identifier.CNRN | 325007 | -
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering | -
dc.contributor.alternativeauthor | Yoo, Changdong | -
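
The abstract in this record describes prompts that are concatenated to the original input while the backbone weights stay fixed, trained with two objectives at once. The Python sketch below illustrates only that general shape and is not the thesis implementation. The function names `concat_prompt` and `joint_loss`, the choice to prepend the prompt, and the weighted-sum combination with a `weight` parameter are all assumptions made for illustration; the record does not specify them.

```python
from typing import List

Frame = List[float]  # one feature vector of the input sequence


def concat_prompt(prompt: List[Frame], frames: List[Frame]) -> List[Frame]:
    """Return the prompt-concatenated input.

    Assumption: the prompt vectors are placed in front of the input frames.
    The backbone that consumes this sequence is left untouched; only the
    prompt vectors would be trainable.
    """
    if prompt and frames and len(prompt[0]) != len(frames[0]):
        raise ValueError("prompt and input frames must share a feature size")
    # Copy each vector so later edits to the result do not alter the inputs.
    return [list(v) for v in prompt] + [list(v) for v in frames]


def joint_loss(ctc_loss: float, adversarial_loss: float, weight: float = 1.0) -> float:
    """Combine the two training objectives into one scalar.

    Assumption: a simple weighted sum. The abstract only says the two
    objectives are trained simultaneously.
    """
    return ctc_loss + weight * adversarial_loss
```

With a one-vector prompt and a two-frame input, `concat_prompt` returns a three-element sequence whose first element is the prompt vector, and `joint_loss(2.0, 1.0, weight=0.5)` adds half of the adversarial term to the CTC term.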
Appears in Collection
EE-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
