PSEUDO-LABEL TRANSFER FROM FRAME-LEVEL TO NOTE-LEVEL IN A TEACHER-STUDENT FRAMEWORK FOR SINGING TRANSCRIPTION FROM POLYPHONIC MUSIC

DC Field | Value | Language
dc.contributor.author | Kum, Sangeun | ko
dc.contributor.author | Lee, Jongpil | ko
dc.contributor.author | Kim, Keunhyoung Luke | ko
dc.contributor.author | Kim, Taehyoung | ko
dc.contributor.author | Nam, Juhan | ko
dc.date.accessioned | 2022-09-27T10:00:18Z | -
dc.date.available | 2022-09-27T10:00:18Z | -
dc.date.created | 2022-09-14 | -
dc.date.issued | 2022-05-24 | -
dc.identifier.citation | 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022, pp.796 - 800 | -
dc.identifier.issn | 1520-6149 | -
dc.identifier.uri | http://hdl.handle.net/10203/298716 | -
dc.description.abstract | Lack of large-scale note-level labeled data is the major obstacle to singing transcription from polyphonic music. We address the issue by using pseudo labels from vocal pitch estimation models given unlabeled data. The proposed method first converts the frame-level pseudo labels to note-level through pitch and rhythm quantization steps. Then, it further improves the label quality through self-training in a teacher-student framework. To validate the method, we conduct various experiment settings by investigating two vocal pitch estimation models as pseudo-label generators, two setups of teacher-student frameworks, and the number of iterations in self-training. The results show that the proposed method can effectively leverage large-scale unlabeled audio data and self-training with the noisy student model helps to improve performance. Finally, we show that the model trained with only unlabeled data has comparable performance to previous works and the model trained with additional labeled data achieves higher accuracy than the model trained with only labeled data. | -
dc.language | English | -
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | -
dc.title | PSEUDO-LABEL TRANSFER FROM FRAME-LEVEL TO NOTE-LEVEL IN A TEACHER-STUDENT FRAMEWORK FOR SINGING TRANSCRIPTION FROM POLYPHONIC MUSIC | -
dc.type | Conference | -
dc.identifier.wosid | 000864187901014 | -
dc.identifier.scopusid | 2-s2.0-85131239972 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 796 | -
dc.citation.endingpage | 800 | -
dc.citation.publicationname | 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 | -
dc.identifier.conferencecountry | SI | -
dc.identifier.conferencelocation | Marina Bay Sands Expo & Convention Center | -
dc.identifier.doi | 10.1109/ICASSP43922.2022.9747147 | -
dc.contributor.localauthor | Nam, Juhan | -
dc.contributor.nonIdAuthor | Kum, Sangeun | -
dc.contributor.nonIdAuthor | Lee, Jongpil | -
dc.contributor.nonIdAuthor | Kim, Keunhyoung Luke | -
dc.contributor.nonIdAuthor | Kim, Taehyoung | -
Appears in Collection
GCT-Conference Papers (Academic Conference Papers)