DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Dongheon | ko |
dc.contributor.author | Choi, Jung-Woo | ko |
dc.date.accessioned | 2023-03-27T01:00:23Z | - |
dc.date.available | 2023-03-27T01:00:23Z | - |
dc.date.created | 2023-02-07 | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | IEEE SIGNAL PROCESSING LETTERS, v.30, pp.155 - 159 | - |
dc.identifier.issn | 1070-9908 | - |
dc.identifier.uri | http://hdl.handle.net/10203/305794 | - |
dc.description.abstract | In this study, we propose a dense frequency-time attentive network (DeFT-AN) for multichannel speech enhancement. DeFT-AN is a mask estimation network that predicts a complex spectral masking pattern for suppressing the noise and reverberation embedded in the short-time Fourier transform (STFT) of an input signal. The proposed mask estimation network incorporates three different types of blocks for aggregating information in the spatial, spectral, and temporal dimensions. It utilizes a spectral transformer with a modified feed-forward network and a temporal conformer with sequential dilated convolutions. The use of dense blocks and transformers dedicated to the three different characteristics of audio signals enables more comprehensive enhancement in noisy and reverberant environments. The remarkable performance of DeFT-AN over state-of-the-art multichannel models is demonstrated based on two popular noisy and reverberant datasets in terms of various metrics for speech quality and intelligibility. | - |
dc.language | English | - |
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | - |
dc.title | DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel Speech Enhancement | - |
dc.type | Article | - |
dc.identifier.wosid | 000942334400001 | - |
dc.identifier.scopusid | 2-s2.0-85149384907 | - |
dc.type.rims | ART | - |
dc.citation.volume | 30 | - |
dc.citation.beginningpage | 155 | - |
dc.citation.endingpage | 159 | - |
dc.citation.publicationname | IEEE SIGNAL PROCESSING LETTERS | - |
dc.identifier.doi | 10.1109/LSP.2023.3244428 | - |
dc.contributor.localauthor | Choi, Jung-Woo | - |
dc.description.isOpenAccess | N | - |
dc.type.journalArticle | Article | - |
dc.subject.keywordAuthor | Speech enhancement | - |
dc.subject.keywordAuthor | Transformers | - |
dc.subject.keywordAuthor | Noise measurement | - |
dc.subject.keywordAuthor | Convolution | - |
dc.subject.keywordAuthor | Time-frequency analysis | - |
dc.subject.keywordAuthor | Time-domain analysis | - |
dc.subject.keywordAuthor | Convolutional neural networks | - |
dc.subject.keywordAuthor | Complex-spectral masking | - |
dc.subject.keywordAuthor | multichannel | - |
dc.subject.keywordAuthor | speech enhancement | - |
dc.subject.keywordAuthor | transformer | - |
dc.subject.keywordPlus | SPEECH | - |
dc.subject.keywordPlus | INTELLIGIBILITY | - |
dc.subject.keywordPlus | ATTENTION | - |
dc.subject.keywordPlus | FRAMEWORK | - |
dc.subject.keywordPlus | CORPUS | - |
dc.subject.keywordPlus | CNN | - |