DC Field | Value | Language |
---|---|---|
dc.contributor.author | Im, Jaekwon | ko |
dc.contributor.author | Choi, Soonbeom | ko |
dc.contributor.author | Yong, Sangeon | ko |
dc.contributor.author | Nam, Juhan | ko |
dc.date.accessioned | 2022-12-07T12:00:22Z | - |
dc.date.available | 2022-12-07T12:00:22Z | - |
dc.date.created | 2022-12-02 | - |
dc.date.issued | 2022-11-09 | - |
dc.identifier.citation | 14th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2022, pp.809 - 814 | - |
dc.identifier.issn | 2309-9402 | - |
dc.identifier.uri | http://hdl.handle.net/10203/302017 | - |
dc.description.abstract | Singing voice separation (SVS) is the task of separating singing voice audio from its mixture with instrumental audio. Previous SVS studies have mainly employed the spectrogram masking method, which requires a large dimensionality in predicting the binary masks. In addition, they focused on extracting a vocal stem that retains the wet sound with the reverberation effect, which may hinder the reusability of the isolated singing voice. This paper addresses these issues by predicting the mel-spectrograms of dry singing voices from the mixed audio as neural vocoder features and synthesizing the singing voice waveforms with the neural vocoder. We experimented with two separation methods: one predicts binary masks in the mel-spectrogram domain, and the other directly predicts the mel-spectrogram. Furthermore, we add a singing voice detector to identify the singing voice segments over time more explicitly. We measured the model performance in terms of audio, dereverberation, separation, and overall quality. The results show that our proposed model outperforms state-of-the-art singing voice separation models in both objective and subjective evaluation except for audio quality. © 2022 Asia-Pacific Signal and Information Processing Association (APSIPA). | - |
dc.language | English | - |
dc.publisher | Asia-Pacific Signal and Information Processing Association (APSIPA) | - |
dc.title | Neural Vocoder Feature Estimation for Dry Singing Voice Separation | - |
dc.type | Conference | - |
dc.identifier.wosid | 000922154500130 | - |
dc.identifier.scopusid | 2-s2.0-85146287665 | - |
dc.type.rims | CONF | - |
dc.citation.beginningpage | 809 | - |
dc.citation.endingpage | 814 | - |
dc.citation.publicationname | 14th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2022 | - |
dc.identifier.conferencecountry | TH | - |
dc.identifier.conferencelocation | Chiang Mai | - |
dc.identifier.doi | 10.23919/APSIPAASC55919.2022.9980093 | - |
dc.contributor.localauthor | Nam, Juhan | - |
dc.contributor.nonIdAuthor | Choi, Soonbeom | - |
dc.contributor.nonIdAuthor | Yong, Sangeon | - |