DSpace at KOASAS: Comparison and Analysis of SampleCNN Architectures for Audio Classification

Comparison and Analysis of SampleCNN Architectures for Audio Classification

Cited 43 times in WoS

Cited 35 times in

DC Field	Value	Language
dc.contributor.author	Kim, Taejun	ko
dc.contributor.author	Lee, Jongpil	ko
dc.contributor.author	Nam, Juhan	ko
dc.date.accessioned	2019-06-12T07:50:19Z	-
dc.date.available	2019-06-12T07:50:19Z	-
dc.date.created	2019-06-12	-
dc.date.issued	2019-05	-
dc.identifier.citation	IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, v.13, no.2, pp.285 - 297	-
dc.identifier.issn	1932-4553	-
dc.identifier.uri	http://hdl.handle.net/10203/262580	-
dc.description.abstract	End-to-end learning with convolutional neural networks (CNNs) has become a standard approach in image classification. However, in audio classification, CNN-based models that use time-frequency representations as input are still popular. A recently proposed CNN architecture called SampleCNN takes raw waveforms directly and has very small sizes of filters. The architecture has proven to be effective in music classification tasks. In this paper, we scrutinize SampleCNN further by comparing it with spectrogram-based CNN and changing the subsampling operation in three different audio domains: music, speech, and acoustic scene sound. Also, we extend SampleCNN to more advanced versions using components from residual networks and squeeze-and-excitation networks. The results show that the squeeze-and-excitation block is particularly effective among them. Furthermore, we analyze the trained models to provide better understanding of the architectures. First, we visualize hierarchically learned features to see how the filters with small granularity adapt to audio signals from different domains. Second, we observe the squeeze-and-excitation block by plotting the distribution of excitation in several different ways. This analysis shows that the excitation tends to be increasingly class specific with increasing depth but the first layer that takes raw waveforms directly can be highly class specific, particularly in music data. We examine this further and show that the excitation in the first layer is sensitive to the loudness, which is an acoustic characteristic that distinguishes different genres of music.	-
dc.language	English	-
dc.publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC	-
dc.title	Comparison and Analysis of SampleCNN Architectures for Audio Classification	-
dc.type	Article	-
dc.identifier.wosid	000468435500009	-
dc.identifier.scopusid	2-s2.0-85065982131	-
dc.type.rims	ART	-
dc.citation.volume	13	-
dc.citation.issue	2	-
dc.citation.beginningpage	285	-
dc.citation.endingpage	297	-
dc.citation.publicationname	IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING	-
dc.identifier.doi	10.1109/JSTSP.2019.2909479	-
dc.contributor.localauthor	Nam, Juhan	-
dc.description.isOpenAccess	N	-
dc.type.journalArticle	Article	-
dc.subject.keywordAuthor	Audio classification	-
dc.subject.keywordAuthor	end-to-end learning	-
dc.subject.keywordAuthor	convolutional neural networks	-
dc.subject.keywordAuthor	residual networks	-
dc.subject.keywordAuthor	squeeze-and-excitation networks	-
dc.subject.keywordAuthor	interpretability	-
dc.subject.keywordPlus	CONVOLUTIONAL NEURAL-NETWORKS	-

Appears in Collection: GCT-Journal Papers(저널논문)

Files in This Item: There are no files associated with this item.

KOASAS

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
