CROSS ATTENTIVE POOLING FOR SPEAKER VERIFICATION

The goal of this paper is text-independent speaker verification where utterances come from 'in the wild' videos and may contain irrelevant signal. While speaker verification is naturally a pair-wise problem, existing methods to produce the speaker embeddings are instance-wise. In this paper, we propose Cross Attentive Pooling (CAP) that utilises the context information across the reference-query pair to generate utterance-level embeddings that contain the most discriminative information for the pair-wise matching problem. Experiments are performed on the VoxCeleb dataset in which our method outperforms comparable pooling strategies.
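The pair-wise idea in the abstract can be illustrated with a minimal sketch. This is not the paper's architecture: the function names, the mean-vector context summary, and the scaled dot-product scoring are all assumptions chosen only to show how frame weights for one utterance can depend on the other utterance in the pair.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attentive_pool(target, context):
    """Pool `target` frames (T, D) with weights derived from `context` frames (S, D).

    Hypothetical helper: the scoring rule below is an illustrative assumption,
    not the formulation used in the paper.
    """
    d = target.shape[1]
    # Summarise the other utterance as one vector (mean over its frames).
    context_summary = context.mean(axis=0)            # (D,)
    # Score every target frame against that summary (scaled dot product).
    scores = target @ context_summary / np.sqrt(d)    # (T,)
    weights = softmax(scores)                         # (T,), sums to 1
    # Weighted average of the target frames gives the utterance-level embedding.
    return weights @ target                           # (D,)


def pair_score(reference, query):
    """Cosine similarity between embeddings pooled with cross-utterance context."""
    ref_emb = cross_attentive_pool(reference, query)
    qry_emb = cross_attentive_pool(query, reference)
    denom = np.linalg.norm(ref_emb) * np.linalg.norm(qry_emb)
    return float(ref_emb @ qry_emb / denom)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.standard_normal((50, 16))  # 50 frames, 16-dim features
    query = rng.standard_normal((80, 16))      # 80 frames, 16-dim features
    print(pair_score(reference, query))
```

In contrast to instance-wise pooling, the embedding of each utterance here changes with its partner, so the same reference utterance yields different embeddings for different queries.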
Publisher
IEEE
Issue Date
2021-01
Language
English
Citation

IEEE Spoken Language Technology Workshop (SLT), pp. 294-300

ISSN
2639-5479
DOI
10.1109/SLT48900.2021.9383565
URI
http://hdl.handle.net/10203/288414
Appears in Collection
RIMS Conference Papers