POLYPHONIC SOUND EVENT DETECTION USING CONVOLUTIONAL BIDIRECTIONAL LSTM AND SYNTHETIC DATA-BASED TRANSFER LEARNING

Cited 24 times in Web of Science · Cited 15 times in Scopus
  • Hits: 244
  • Downloads: 0
This paper presents a novel approach that improves polyphonic sound event detection by combining a convolutional bidirectional recurrent neural network (CBRNN) with transfer learning. The ordinary convolutional recurrent neural network (CRNN) is known to suffer from the vanishing gradient problem, which significantly limits how effectively information from distant past events is propagated through the network. To resolve this issue, we combine forward and backward long short-term memory (LSTM) modules and demonstrate that they complement each other. To deal with the overfitting that arises from the increased model complexity, we apply transfer learning with a dataset that contains synthesized artifacts. We show that the model trains faster and performs better with less data. Experiments on the 2016 TUT dataset show that the CBRNN with transfer learning dramatically outperforms the ordinary CRNN: its F1 score is 28.4% higher, and its error rate is 0.42 lower.
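The abstract describes the architecture in words only; the sketch below is a minimal PyTorch illustration of a CBRNN of this kind, with a CNN front end followed by forward and backward LSTMs. The layer sizes, pooling factors, the CBRNN class name, and the freeze-then-fine-tune workflow at the end are assumptions chosen for clarity, not the paper's exact configuration.

import torch
import torch.nn as nn

class CBRNN(nn.Module):
    """Illustrative CBRNN sketch for polyphonic sound event detection:
    CNN blocks extract local spectro-temporal features from a log-mel
    spectrogram, and a bidirectional LSTM models context in both the
    forward and backward time directions."""

    def __init__(self, n_mels=40, n_classes=6, hidden=32):
        super().__init__()
        # Convolutional front end; pooling only along the frequency
        # axis so the frame-level (time) resolution is preserved.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 5)),   # pool frequency only
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 4)),
        )
        freq_out = n_mels // 5 // 4             # frequency bins left after pooling
        # bidirectional=True runs forward and backward LSTMs in parallel
        # and concatenates their hidden states at every frame.
        self.rnn = nn.LSTM(64 * freq_out, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                        # x: (batch, time, n_mels)
        x = x.unsqueeze(1)                       # -> (batch, 1, time, n_mels)
        x = self.cnn(x)                          # -> (batch, 64, time, freq_out)
        x = x.permute(0, 2, 1, 3).flatten(2)     # -> (batch, time, 64 * freq_out)
        x, _ = self.rnn(x)                       # -> (batch, time, 2 * hidden)
        return torch.sigmoid(self.head(x))       # per-frame multi-label probabilities

A transfer-learning pass under the same assumptions would pretrain this model on the synthetic dataset and then fine-tune it on TUT 2016, optionally freezing the convolutional front end:

model = CBRNN()
# ... pretrain on the synthesized dataset ...
for p in model.cnn.parameters():
    p.requires_grad = False          # freeze CNN features (illustrative choice)
# ... fine-tune on TUT 2016 with a frame-wise multi-label loss, e.g. nn.BCELoss() ...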
Publisher
IEEE
Issue Date
2019-05
Language
English
Citation
44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 885-889
ISSN
1520-6149
DOI
10.1109/ICASSP.2019.8682909
URI
http://hdl.handle.net/10203/274942
Appears in Collection
BiS-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.