LINEAR-SCALE FILTERBANK FOR DEEP NEURAL NETWORK-BASED VOICE ACTIVITY DETECTION

Voice activity detection (VAD) is an important preprocessing module in many speech applications. Choosing appropriate features and model structures is a significant challenge and an active area of current VAD research. Mel-scale features such as Mel-frequency cepstral coefficients (MFCCs) and log Mel-filterbank (LMFB) energies have been widely used in VAD as well as in speech recognition. Feature extraction on the Mel-frequency scale is one of the most popular methods because it mimics how the human ear processes sound. However, for certain types of sound whose important characteristics lie mainly in the high-frequency range, a linear frequency scale may provide more information than the Mel scale. Therefore, in this paper, we propose a deep neural network (DNN)-based VAD system using linear-scale features. This study shows that linear-scale features, especially log linear-filterbank (LLFB) energies, can be used in a DNN-based VAD system and perform better than LMFB for certain types of noise. Moreover, a combination of LMFB and LLFB can integrate the advantages of both features.
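The difference between the two features comes down to how the triangular filters are spaced along the frequency axis. The following is a minimal NumPy sketch of that idea, not the authors' implementation. The abstract does not give the filter count, frame length, FFT size, or the way the two features are combined, so the values used here (40 filters, 16 kHz sampling, 512-point FFT, simple concatenation) and all function names are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # Common O'Shaughnessy/HTK-style Mel formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filterbank_edges(n_filters, sample_rate, scale):
    """Return n_filters + 2 ascending edge/center frequencies in Hz."""
    nyquist = sample_rate / 2.0
    if scale == "mel":
        # Uniform spacing on the Mel axis: dense at low Hz, sparse at high Hz.
        return mel_to_hz(np.linspace(0.0, hz_to_mel(nyquist), n_filters + 2))
    if scale == "linear":
        # Uniform spacing in Hz: equal resolution across the whole band.
        return np.linspace(0.0, nyquist, n_filters + 2)
    raise ValueError("scale must be 'mel' or 'linear'")

def triangular_filterbank(edges_hz, n_fft, sample_rate):
    """Build triangular filters of shape (n_filters, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    n_filters = len(edges_hz) - 2
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_filterbank_energies(frame, scale, n_filters=40,
                            sample_rate=16000, n_fft=512):
    """Log filterbank energies of one frame (LMFB or LLFB, by scale)."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    fb = triangular_filterbank(
        filterbank_edges(n_filters, sample_rate, scale), n_fft, sample_rate)
    # Small floor avoids log(0) for silent frames.
    return np.log(fb @ power + 1e-10)

# One 25 ms frame of synthetic noise stands in for real audio (assumption).
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
lmfb = log_filterbank_energies(frame, scale="mel")
llfb = log_filterbank_energies(frame, scale="linear")
# Concatenation is one simple way to combine the two features (assumption).
combined = np.concatenate([lmfb, llfb])
```

With these settings the Mel filterbank places far more of its filters below 1 kHz than the linear one does, which is why a linear scale retains more detail in the high-frequency range. Either vector, or the concatenation, could then be fed per frame to a DNN classifier.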
Publisher
The Korean Society of Speech Sciences
Issue Date
2017-11-01
Language
English
Citation

20th Conference of the Oriental-Chapter-of-the-International-Coordinating-Committee-on-Speech-Databases-and-Speech-I/O-Systems-and-Assessment (O-COCOSDA), pp.43 - 47

DOI
10.1109/ICSDA.2017.8384446
URI
http://hdl.handle.net/10203/227283
Appears in Collection
EE-Conference Papers