DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 양은호 | - |
dc.contributor.author | Seo, Kyusung | - |
dc.contributor.author | 서규성 | - |
dc.date.accessioned | 2024-07-30T19:30:38Z | - |
dc.date.available | 2024-07-30T19:30:38Z | - |
dc.date.issued | 2024 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1096061&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/321356 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Kim Jaechul Graduate School of AI, 2024.2, [iii, 17 p.] | - |
dc.description.abstract | A data augmentation technique involving cut-and-paste operations has garnered significant interest within the field of computer vision because of its straightforward nature and its proven effectiveness in enhancing the ability to generalize. However, applying this method to Automatic Speech Recognition (ASR) tasks poses challenges due to the varying lengths of segments corresponding to specific output tokens such as words or sub-words. Furthermore, if speech segments are combined without regard for their meaning, there is a risk of generating incoherent or nonsensical sentences. In this paper, we introduce a method called WeavSpeech, which addresses these challenges by offering a straightforward yet powerful cut-and-paste augmentation approach for ASR tasks. WeavSpeech weaves together pairs of speech data while taking into account their semantics. This method is universally applicable to languages without requiring language-specific knowledge and can be seamlessly incorporated with other verified augmentation techniques such as SpecAugment. Our research demonstrates the superiority of WeavSpeech on well-known ASR benchmark datasets, including LibriSpeech and WSJ. | - |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | 음성 인식 (speech recognition); 데이터 증강 (data augmentation); 컷앤페이스트 (cut-and-paste); 컷믹스 (CutMix); 믹스업 (Mixup) | - |
dc.subject | Speech recognition; Data augmentation; Cut-and-paste; CutMix; Mixup | - |
dc.title | Semantically-driven cut-and-paste data augmentation strategy for automatic speech recognition | - |
dc.title.alternative | 자동 음성 인식을 위한 의미 중심 컷앤페이스트 데이터 증강 전략 (Semantics-driven cut-and-paste data augmentation strategy for automatic speech recognition) | - |
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : Kim Jaechul Graduate School of AI | - |
dc.contributor.alternativeauthor | Yang, Eunho | - |