Investigating Time-Frequency Representations for Audio Feature Extraction in Singing Technique Classification

Cited 3 times in Web of Science; cited 0 times in Scopus.
DC Field | Value | Language
dc.contributor.author | Yamamoto, Yuya | ko
dc.contributor.author | Nam, Juhan | ko
dc.contributor.author | Terasawa, Hiroko | ko
dc.contributor.author | Hiraga, Yuzuru | ko
dc.date.accessioned | 2022-11-09T03:00:11Z | -
dc.date.available | 2022-11-09T03:00:11Z | -
dc.date.created | 2022-09-14 | -
dc.date.issued | 2021-12-15 | -
dc.identifier.citation | 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021, pp.890 - 896 | -
dc.identifier.issn | 2309-9402 | -
dc.identifier.uri | http://hdl.handle.net/10203/299402 | -
dc.description.abstract | Singing techniques are used for expressive vocal performances by employing temporal fluctuations of timbre, pitch, and other components of the voice. In this study, we compare the performance of hand-crafted features and automatically extracted features obtained with deep learning methods for identifying different singing techniques. Hand-crafted acoustic features are based on expert knowledge of the singing voice, whereas the deep learning methods take low-level representations, such as spectrograms and raw waveforms, as inputs and learn features automatically using convolutional neural networks (CNNs). These extracted features are used as input to a random forest classifier for comparison with the hand-crafted features on 10-class singing technique classification. We show that the CNN-based features outperform the hand-crafted features in terms of classification accuracy. Furthermore, we explore various time-frequency representations as inputs to the CNNs. We show that the best-performing input is multi-resolution short-time Fourier transforms (STFTs), when the CNN kernels are oblong and slide along the frequency and time axes separately. | -
dc.language | English | -
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | -
dc.title | Investigating Time-Frequency Representations for Audio Feature Extraction in Singing Technique Classification | -
dc.type | Conference | -
dc.identifier.wosid | 000782454900145 | -
dc.identifier.scopusid | 2-s2.0-85126646903 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 890 | -
dc.citation.endingpage | 896 | -
dc.citation.publicationname | 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 | -
dc.identifier.conferencecountry | JA | -
dc.identifier.conferencelocation | Tokyo | -
dc.contributor.localauthor | Nam, Juhan | -
dc.contributor.nonIdAuthor | Yamamoto, Yuya | -
dc.contributor.nonIdAuthor | Terasawa, Hiroko | -
dc.contributor.nonIdAuthor | Hiraga, Yuzuru | -
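
The abstract describes a concrete pipeline: multi-resolution STFTs stacked as input channels, a CNN whose oblong kernels sweep the frequency and time axes in separate branches, and a random forest classifier trained on the learned embeddings. The following is a minimal illustrative sketch of that pipeline in Python. The hop length, kernel shapes, layer sizes, and the `dataset`/`labels` variables are assumptions made for illustration, not the authors' exact configuration.

```python
# Hypothetical sketch: multi-resolution STFT -> oblong-kernel CNN -> random forest.
import numpy as np
import librosa
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier


def multi_resolution_stft(y, sr, n_ffts=(512, 1024, 2048),
                          hop=256, n_bins=256, n_frames=256):
    """Stack log-magnitude STFTs at several window sizes as input channels."""
    channels = []
    for n_fft in n_ffts:
        spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
        spec = librosa.amplitude_to_db(spec, ref=np.max)
        # Crop/pad every resolution to a common grid so they can be stacked.
        spec = librosa.util.fix_length(spec, size=n_frames, axis=1)[:n_bins]
        channels.append(spec)
    return np.stack(channels)  # shape: (len(n_ffts), n_bins, n_frames)


class OblongCNN(nn.Module):
    """Two parallel branches: tall kernels slide along the frequency axis,
    wide kernels along the time axis, mimicking the oblong-kernel idea."""

    def __init__(self, in_ch=3, emb_dim=128):
        super().__init__()
        self.freq_branch = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=(12, 3), padding=(6, 1)),
            nn.ReLU(),
            nn.MaxPool2d((4, 2)),
            nn.Conv2d(32, 64, kernel_size=(12, 3), padding=(6, 1)),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.time_branch = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=(3, 12), padding=(1, 6)),
            nn.ReLU(),
            nn.MaxPool2d((2, 4)),
            nn.Conv2d(32, 64, kernel_size=(3, 12), padding=(1, 6)),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(128, emb_dim)

    def forward(self, x):  # x: (batch, in_ch, n_bins, n_frames)
        f = self.freq_branch(x).flatten(1)  # (batch, 64)
        t = self.time_branch(x).flatten(1)  # (batch, 64)
        return self.proj(torch.cat([f, t], dim=1))  # (batch, emb_dim)


# Extract embeddings and classify the 10 singing techniques with a random
# forest, as in the paper's comparison. `dataset` (an iterable of
# (waveform, sample_rate) pairs) and `labels` (10-class technique IDs)
# are assumed to exist for this sketch.
model = OblongCNN().eval()
with torch.no_grad():
    X = torch.stack([
        torch.from_numpy(multi_resolution_stft(y, sr)).float()
        for y, sr in dataset
    ])
    emb = model(X).numpy()

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(emb, labels)
```

The two-branch layout here is one plausible reading of "oblong kernels that slide along the frequency and time axes separately"; the paper's actual architecture and training setup may differ.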
Appears in Collection
GCT-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.