DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel Speech Enhancement

Cited 1 time in Web of Science; cited 0 times in Scopus.
DC Field                     Value                                                Language
dc.contributor.author        Lee, Dongheon                                        ko
dc.contributor.author        Choi, Jung-Woo                                       ko
dc.date.accessioned          2023-03-27T01:00:23Z                                 -
dc.date.available            2023-03-27T01:00:23Z                                 -
dc.date.created              2023-02-07                                           -
dc.date.issued               2023                                                 -
dc.identifier.citation       IEEE SIGNAL PROCESSING LETTERS, v.30, pp.155 - 159   -
dc.identifier.issn           1070-9908                                            -
dc.identifier.uri            http://hdl.handle.net/10203/305794                   -
dc.description.abstract      In this study, we propose a dense frequency-time attentive network (DeFT-AN) for multichannel speech enhancement. DeFT-AN is a mask estimation network that predicts a complex spectral masking pattern for suppressing the noise and reverberation embedded in the short-time Fourier transform (STFT) of an input signal. The proposed mask estimation network incorporates three different types of blocks for aggregating information in the spatial, spectral, and temporal dimensions. It utilizes a spectral transformer with a modified feed-forward network and a temporal conformer with sequential dilated convolutions. The use of dense blocks and transformers dedicated to the three different characteristics of audio signals enables more comprehensive enhancement in noisy and reverberant environments. The remarkable performance of DeFT-AN over state-of-the-art multichannel models is demonstrated on two popular noisy and reverberant datasets in terms of various metrics for speech quality and intelligibility.   -
dc.language                  English                                              -
dc.publisher                 IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC       -
dc.title                     DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel Speech Enhancement   -
dc.type                      Article                                              -
dc.identifier.wosid          000942334400001                                      -
dc.identifier.scopusid       2-s2.0-85149384907                                   -
dc.type.rims                 ART                                                  -
dc.citation.volume           30                                                   -
dc.citation.beginningpage    155                                                  -
dc.citation.endingpage       159                                                  -
dc.citation.publicationname  IEEE SIGNAL PROCESSING LETTERS                       -
dc.identifier.doi            10.1109/LSP.2023.3244428                             -
dc.contributor.localauthor   Choi, Jung-Woo                                       -
dc.description.isOpenAccess  N                                                    -
dc.type.journalArticle       Article                                              -
dc.subject.keywordAuthor     Speech enhancement                                   -
dc.subject.keywordAuthor     Transformers                                         -
dc.subject.keywordAuthor     Noise measurement                                    -
dc.subject.keywordAuthor     Convolution                                          -
dc.subject.keywordAuthor     Time-frequency analysis                              -
dc.subject.keywordAuthor     Time-domain analysis                                 -
dc.subject.keywordAuthor     Convolutional neural networks                        -
dc.subject.keywordAuthor     Complex-spectral masking                             -
dc.subject.keywordAuthor     multichannel                                         -
dc.subject.keywordAuthor     speech enhancement                                   -
dc.subject.keywordAuthor     transformer                                          -
dc.subject.keywordPlus       SPEECH                                               -
dc.subject.keywordPlus       INTELLIGIBILITY                                      -
dc.subject.keywordPlus       ATTENTION                                            -
dc.subject.keywordPlus       FRAMEWORK                                            -
dc.subject.keywordPlus       CORPUS                                               -
dc.subject.keywordPlus       CNN                                                  -
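The abstract describes complex spectral masking: the network predicts a complex-valued mask that is multiplied element-wise with the STFT of the noisy input, scaling the magnitude and rotating the phase of each time-frequency bin. A minimal NumPy sketch of that masking step only (not the DeFT-AN network itself, which estimates the mask with dense blocks, a spectral transformer, and a temporal conformer; the oracle mask below is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
F, T = 4, 5  # small toy time-frequency grid (frequency bins x frames)

# Toy complex STFTs: a clean spectrogram corrupted by additive noise.
clean = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
noise = 0.1 * (rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T)))
noisy = clean + noise

# Oracle complex ratio mask: the element-wise ratio of clean to noisy STFT.
# A mask estimation network would predict something like this from `noisy`.
mask = clean / noisy

# Enhancement is element-wise complex multiplication: each bin's magnitude
# is scaled and its phase is rotated, so an ideal mask recovers `clean`.
enhanced = noisy * mask

assert np.allclose(enhanced, clean)
```

The element-wise complex product is what distinguishes complex-spectral masking from magnitude-only masking: it can correct the phase of each bin, which matters in reverberant conditions.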
Appears in Collection
EE-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.