Investigating Time-Frequency Representations for Audio Feature Extraction in Singing Technique Classification

Cited 3 times in Web of Science; cited 0 times in Scopus.
DC Field | Value | Language
dc.contributor.author | Yamamoto, Yuya | ko
dc.contributor.author | Nam, Juhan | ko
dc.contributor.author | Terasawa, Hiroko | ko
dc.contributor.author | Hiraga, Yuzuru | ko
dc.date.accessioned | 2022-11-09T03:00:11Z | -
dc.date.available | 2022-11-09T03:00:11Z | -
dc.date.created | 2022-09-14 | -
dc.date.issued | 2021-12-15 | -
dc.identifier.citation | 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021, pp.890 - 896 | -
dc.identifier.issn | 2309-9402 | -
dc.identifier.uri | http://hdl.handle.net/10203/299402 | -
dc.description.abstract | Singing techniques are used for expressive vocal performances by employing temporal fluctuations of timbre, pitch, and other components of the voice. In this study, we compare the performance of hand-crafted features and automatically extracted features obtained with deep learning methods for identifying different singing techniques. Hand-crafted acoustic features are based on expert knowledge of the singing voice, whereas the deep learning methods take low-level representations, such as spectrograms and raw waveforms, as inputs and learn features automatically using convolutional neural networks (CNNs). These extracted features are used as input to a random forest classifier for comparison with the hand-crafted features on 10-class singing technique classification. We show that the CNN-based features outperform the hand-crafted features in terms of classification accuracy. Furthermore, we explore various time-frequency representations as inputs to the CNNs. We show that the best-performing input is multi-resolution short-time Fourier transforms (STFTs), when the CNN kernels are oblong and slide along the frequency and time axes separately. | -
dc.language | English | -
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | -
dc.title | Investigating Time-Frequency Representations for Audio Feature Extraction in Singing Technique Classification | -
dc.type | Conference | -
dc.identifier.wosid | 000782454900145 | -
dc.identifier.scopusid | 2-s2.0-85126646903 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 890 | -
dc.citation.endingpage | 896 | -
dc.citation.publicationname | 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 | -
dc.identifier.conferencecountry | JA | -
dc.identifier.conferencelocation | Tokyo | -
dc.contributor.localauthor | Nam, Juhan | -
dc.contributor.nonIdAuthor | Yamamoto, Yuya | -
dc.contributor.nonIdAuthor | Terasawa, Hiroko | -
dc.contributor.nonIdAuthor | Hiraga, Yuzuru | -
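
The abstract describes a concrete pipeline: multi-resolution STFTs stacked as input channels, a CNN whose oblong kernels sweep the frequency and time axes in separate branches, and a random forest classifier trained on the learned embeddings. The following is a minimal illustrative sketch of that pipeline in Python. The hop length, kernel shapes, layer sizes, and the `dataset`/`labels` variables are assumptions made for illustration, not the authors' exact configuration.

```python
# Hypothetical sketch: multi-resolution STFT -> oblong-kernel CNN -> random forest.
import numpy as np
import librosa
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier


def multi_resolution_stft(y, sr, n_ffts=(512, 1024, 2048),
                          hop=256, n_bins=256, n_frames=256):
    """Stack log-magnitude STFTs at several window sizes as input channels."""
    channels = []
    for n_fft in n_ffts:
        spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
        spec = librosa.amplitude_to_db(spec, ref=np.max)
        # Crop/pad every resolution to a common grid so they can be stacked.
        spec = librosa.util.fix_length(spec, size=n_frames, axis=1)[:n_bins]
        channels.append(spec)
    return np.stack(channels)  # shape: (len(n_ffts), n_bins, n_frames)


class OblongCNN(nn.Module):
    """Two parallel branches: tall kernels slide along the frequency axis,
    wide kernels along the time axis, mimicking the oblong-kernel idea."""

    def __init__(self, in_ch=3, emb_dim=128):
        super().__init__()
        self.freq_branch = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=(12, 3), padding=(6, 1)),
            nn.ReLU(),
            nn.MaxPool2d((4, 2)),
            nn.Conv2d(32, 64, kernel_size=(12, 3), padding=(6, 1)),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.time_branch = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=(3, 12), padding=(1, 6)),
            nn.ReLU(),
            nn.MaxPool2d((2, 4)),
            nn.Conv2d(32, 64, kernel_size=(3, 12), padding=(1, 6)),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(128, emb_dim)

    def forward(self, x):  # x: (batch, in_ch, n_bins, n_frames)
        f = self.freq_branch(x).flatten(1)  # (batch, 64)
        t = self.time_branch(x).flatten(1)  # (batch, 64)
        return self.proj(torch.cat([f, t], dim=1))  # (batch, emb_dim)


# Extract embeddings and classify the 10 singing techniques with a random
# forest, as in the paper's comparison. `dataset` (an iterable of
# (waveform, sample_rate) pairs) and `labels` (10-class technique IDs)
# are assumed to exist for this sketch.
model = OblongCNN().eval()
with torch.no_grad():
    X = torch.stack([
        torch.from_numpy(multi_resolution_stft(y, sr)).float()
        for y, sr in dataset
    ])
    emb = model(X).numpy()

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(emb, labels)
```

The two-branch layout here is one plausible reading of "oblong kernels that slide along the frequency and time axes separately"; the paper's actual architecture and training setup may differ.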
Appears in Collection
GCT-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.