SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization

DC Fields
dc.contributor.author: Kim, Changhun (ko)
dc.contributor.author: Park, Joonhyung (ko)
dc.contributor.author: Shim, Hajin (ko)
dc.contributor.author: Yang, Eunho (ko)
dc.date.accessioned: 2023-12-11T03:03:38Z
dc.date.available: 2023-12-11T03:03:38Z
dc.date.created: 2023-12-08
dc.date.issued: 2023-08-23
dc.identifier.citation: 24th International Speech Communication Association, Interspeech 2023, pp. 3367-3371
dc.identifier.uri: http://hdl.handle.net/10203/316202
dc.description.abstract: Automatic speech recognition (ASR) models are frequently exposed to data distribution shifts in many real-world scenarios, leading to erroneous predictions. To tackle this issue, a test-time adaptation (TTA) method was recently proposed to adapt a pre-trained ASR model to unlabeled test instances without access to source data. Despite its decent performance gains, that work relies solely on naive greedy decoding and performs adaptation across timesteps at the frame level, which may not be optimal given the sequential nature of the model output. Motivated by this, we propose a novel TTA framework, dubbed SGEM, for general ASR models. To handle the sequential output, SGEM first exploits beam search to explore candidate output logits and selects the most plausible one. It then utilizes generalized entropy minimization and negative sampling as unsupervised objectives to adapt the model. SGEM achieves state-of-the-art performance for three mainstream ASR models under various domain shifts.
dc.language: English
dc.publisher: International Speech Communication Association
dc.title: SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization
dc.type: Conference
dc.identifier.scopusid: 2-s2.0-85171535059
dc.type.rims: CONF
dc.citation.beginningpage: 3367
dc.citation.endingpage: 3371
dc.citation.publicationname: 24th International Speech Communication Association, Interspeech 2023
dc.identifier.conferencecountry: IE
dc.identifier.conferencelocation: Dublin
dc.identifier.doi: 10.21437/Interspeech.2023-1282
dc.contributor.localauthor: Yang, Eunho
Appears in Collection
AI-Conference Papers (학술대회논문)
Files in This Item
There are no files associated with this item.
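
The abstract above describes SGEM's two main ingredients: beam search to select the most plausible candidate output logits, followed by adaptation with generalized entropy minimization and negative sampling. The sketch below is a minimal, hypothetical PyTorch illustration of such an adaptation step; the Rényi-style entropy order alpha, the particular negative_sampling_loss variant, and the model interface (frame-level logits of shape (T, V)) are assumptions made for illustration, not the paper's exact formulation.

```python
# Minimal illustrative sketch of an SGEM-style test-time adaptation step.
# Assumptions (not taken from the paper): the ASR model returns per-frame
# logits of shape (T, V), and beam-search hypothesis selection is elided.
import torch
import torch.nn.functional as F


def generalized_entropy(log_probs: torch.Tensor, alpha: float = 0.33) -> torch.Tensor:
    # Renyi-style generalized entropy, averaged over frames.
    # H_alpha(p) = log(sum_v p_v^alpha) / (1 - alpha); alpha -> 1 recovers
    # the usual Shannon entropy used by frame-level TTA objectives.
    probs = log_probs.exp()
    h = probs.pow(alpha).sum(dim=-1).log() / (1.0 - alpha)  # per-frame entropy
    return h.mean()


def negative_sampling_loss(log_probs: torch.Tensor, k: int = 10) -> torch.Tensor:
    # Hypothetical variant: suppress the probability mass assigned to the
    # k least-likely classes per frame, discouraging implausible tokens.
    neg_log_probs, _ = log_probs.topk(k, dim=-1, largest=False)
    return neg_log_probs.exp().sum(dim=-1).mean()


def adaptation_step(model, optimizer, speech: torch.Tensor,
                    alpha: float = 0.33, lam: float = 1.0) -> float:
    # One unsupervised adaptation step on a single unlabeled test utterance.
    logits = model(speech)                     # assumed (T, V) frame-level logits
    log_probs = F.log_softmax(logits, dim=-1)
    # SGEM would first run beam search and adapt the logits of the most
    # plausible hypothesis; this sketch adapts the raw frame-level logits.
    loss = generalized_entropy(log_probs, alpha) + lam * negative_sampling_loss(log_probs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

As alpha approaches 1, generalized_entropy reduces to the Shannon entropy minimized by prior frame-level TTA; choosing alpha below 1 is one commonly cited way a generalized objective can avoid concentrating gradients on already-confident frames, though the exact value here is an assumption.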
