DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kye, Seong Min | ko |
dc.contributor.author | Kwon, Yoohwan | ko |
dc.contributor.author | Chung, Joon Son | ko |
dc.date.accessioned | 2021-10-28T02:50:25Z | - |
dc.date.available | 2021-10-28T02:50:25Z | - |
dc.date.created | 2021-10-27 | - |
dc.date.issued | 2021-01 | - |
dc.identifier.citation | IEEE Spoken Language Technology Workshop (SLT), pp.294 - 300 | - |
dc.identifier.issn | 2639-5479 | - |
dc.identifier.uri | http://hdl.handle.net/10203/288414 | - |
dc.description.abstract | The goal of this paper is text-independent speaker verification where utterances come from 'in the wild' videos and may contain irrelevant signal. While speaker verification is naturally a pair-wise problem, existing methods to produce the speaker embeddings are instance-wise. In this paper, we propose Cross Attentive Pooling (CAP) that utilises the context information across the reference-query pair to generate utterance-level embeddings that contain the most discriminative information for the pair-wise matching problem. Experiments are performed on the VoxCeleb dataset in which our method outperforms comparable pooling strategies. | - |
dc.language | English | - |
dc.publisher | IEEE | - |
dc.title | CROSS ATTENTIVE POOLING FOR SPEAKER VERIFICATION | - |
dc.type | Conference | - |
dc.identifier.wosid | 000663633300041 | - |
dc.identifier.scopusid | 2-s2.0-85097919959 | - |
dc.type.rims | CONF | - |
dc.citation.beginningpage | 294 | - |
dc.citation.endingpage | 300 | - |
dc.citation.publicationname | IEEE Spoken Language Technology Workshop (SLT) | - |
dc.identifier.conferencecountry | CC | - |
dc.identifier.conferencelocation | Shenzhen | - |
dc.identifier.doi | 10.1109/SLT48900.2021.9383565 | - |
dc.contributor.localauthor | Kye, Seong Min | - |
dc.contributor.nonIdAuthor | Kwon, Yoohwan | - |
dc.contributor.nonIdAuthor | Chung, Joon Son | - |
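The abstract describes pooling frame-level features into an utterance-level embedding using context from the *other* utterance in the reference-query pair, rather than attending instance-wise. The following is a minimal sketch of that idea only, not the paper's actual formulation: here each utterance's frames are weighted by their similarity to the other utterance's mean context vector. The function names (`cross_attentive_pool`, `mean_vec`) and the use of a mean vector as the cross context are illustrative assumptions.

```python
import math

def mean_vec(frames):
    """Mean of a list of equal-length feature vectors."""
    d = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(d)]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attentive_pool(ref, qry):
    """Pool each utterance's frames while attending to the other
    utterance's context (a hypothetical simplification of CAP)."""
    def pool(frames, context):
        # score each frame by dot product with the cross-utterance context
        scores = [sum(f[i] * context[i] for i in range(len(context)))
                  for f in frames]
        weights = softmax(scores)
        d = len(frames[0])
        # weighted sum of frames -> utterance-level embedding
        return [sum(w * f[i] for w, f in zip(weights, frames))
                for i in range(d)]
    return pool(ref, mean_vec(qry)), pool(qry, mean_vec(ref))

# toy frame-level features: 3 reference frames, 2 query frames, dim 4
ref = [[0.1, 0.2, 0.0, 1.0], [0.9, 0.1, 0.3, 0.0], [0.4, 0.4, 0.4, 0.4]]
qry = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
e_ref, e_qry = cross_attentive_pool(ref, qry)
```

Because the attention weights depend on the paired utterance, the same reference utterance yields a different embedding when matched against a different query, which is the pair-wise behaviour the abstract contrasts with instance-wise pooling.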