DC Field | Value | Language |
---|---|---|
dc.contributor.author | Im, Jaekwon | ko |
dc.contributor.author | Choi, Soonbeom | ko |
dc.contributor.author | Yong, Sangeon | ko |
dc.contributor.author | Nam, Juhan | ko |
dc.date.accessioned | 2022-12-07T12:00:22Z | - |
dc.date.available | 2022-12-07T12:00:22Z | - |
dc.date.created | 2022-12-02 | - |
dc.date.issued | 2022-11-09 | - |
dc.identifier.citation | 14th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2022, pp.809 - 814 | - |
dc.identifier.issn | 2309-9402 | - |
dc.identifier.uri | http://hdl.handle.net/10203/302017 | - |
dc.description.abstract | Singing voice separation (SVS) is the task of separating singing voice audio from its mixture with instrumental audio. Previous SVS studies have mainly employed the spectrogram masking method, which requires a large dimensionality in predicting the binary masks. In addition, they focused on extracting a vocal stem that retains the wet sound with the reverberation effect, which may hinder the reusability of the isolated singing voice. This paper addresses these issues by predicting the mel-spectrograms of dry singing voices from the mixed audio as neural vocoder features and synthesizing the singing voice waveforms with the neural vocoder. We experimented with two separation methods: one predicts binary masks in the mel-spectrogram domain, and the other directly predicts the mel-spectrogram. Furthermore, we add a singing voice detector to identify the singing voice segments over time more explicitly. We measured the model performance in terms of audio, dereverberation, separation, and overall quality. The results show that our proposed model outperforms state-of-the-art singing voice separation models in both objective and subjective evaluation except for audio quality. © 2022 Asia-Pacific Signal and Information Processing Association (APSIPA). | - |
dc.language | English | - |
dc.publisher | Asia-Pacific Signal and Information Processing Association (APSIPA) | - |
dc.title | Neural Vocoder Feature Estimation for Dry Singing Voice Separation | - |
dc.type | Conference | - |
dc.identifier.wosid | 000922154500130 | - |
dc.identifier.scopusid | 2-s2.0-85146287665 | - |
dc.type.rims | CONF | - |
dc.citation.beginningpage | 809 | - |
dc.citation.endingpage | 814 | - |
dc.citation.publicationname | 14th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2022 | - |
dc.identifier.conferencecountry | TH | - |
dc.identifier.conferencelocation | Chiang Mai | - |
dc.identifier.doi | 10.23919/APSIPAASC55919.2022.9980093 | - |
dc.contributor.localauthor | Nam, Juhan | - |
dc.contributor.nonIdAuthor | Choi, Soonbeom | - |
dc.contributor.nonIdAuthor | Yong, Sangeon | - |