Publication Details

Normalising Flows for Speaker and Language Recognition Backend

ESPUNA, A.; PRASAD, A.; MOTLÍČEK, P.; MADIKERI, S.; SCHUEPBACH, C. Normalising Flows for Speaker and Language Recognition Backend. Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop. Quebec: International Speech Communication Association, 2024. p. 74-80.
Czech title
Normalizace toků pro back-end pro rozpoznávání mluvčího a jazyka
Type
conference paper
Language
English
Authors
ESPUNA, A.
Prasad Amrutha (DCGM)
Motlíček Petr, doc. Ing., Ph.D. (DCGM)
Madikeri Srikanth
SCHUEPBACH, C.
URL
https://www.isca-archive.org/odyssey_2024/espuna24_odyssey.pdf
Keywords

Speaker recognition, language recognition

Abstract

In this paper, we address the Gaussian distribution assumption made in PLDA, a popular back-end classifier used in speaker and language recognition tasks. We study normalizing flows, which allow using non-linear transformations and still obtain a model that can explicitly represent a probability density. The model makes no assumption about the distribution of the observations. This alleviates the need for length normalization, a well-known data preprocessing step used to boost PLDA performance. We demonstrate the effectiveness of this flow model on the NIST SRE16, LRE17 and LRE22 datasets. We observe that when applying length normalization, both the flow model and PLDA achieve similar EERs for SRE16 (11.5% vs 11.8%). However, when length normalization is not applied, the flow shows more robustness and offers better EERs (13.1% vs 17.1%). For LRE17 and LRE22, the best classification accuracies (84.2%, 75.5%) are obtained by the flow model without any need for length normalization.
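
As a rough illustration of the two ideas mentioned in the abstract, the sketch below shows length normalization (scaling an embedding to unit Euclidean norm) and how a non-linear invertible map can still give an explicit log-density through the change-of-variables formula. It is a toy example and not the model from the paper: the function names, the element-wise map z = x + a*tanh(x), the standard Gaussian base density, and the parameter a are all assumptions chosen for brevity.

```python
import math


def length_normalize(x):
    """Scale an embedding (list of floats) to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return list(x)  # leave an all-zero vector unchanged
    return [v / norm for v in x]


def flow_log_density(x, a=0.5):
    """Log-density of x under a toy element-wise flow z = x + a*tanh(x).

    Assumptions (illustrative only): the base density over z is a
    standard Gaussian, and a > -1 so the map is strictly increasing
    and therefore invertible.
    """
    if a <= -1.0:
        raise ValueError("a must be > -1 for the map to be invertible")
    log_p = 0.0
    for v in x:
        t = math.tanh(v)
        z = v + a * t
        # Base term: log N(z; 0, 1)
        log_p += -0.5 * (z * z + math.log(2.0 * math.pi))
        # Change-of-variables term: log |dz/dx| = log(1 + a * (1 - tanh(x)^2))
        log_p += math.log(1.0 + a * (1.0 - t * t))
    return log_p
```

For instance, flow_log_density([0.3, -1.2]) evaluates the density directly on an unnormalized input, while length_normalize([3.0, 4.0]) rescales the vector to unit norm. With a = 0 the map is the identity and the result reduces to the standard Gaussian log-density.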

Published
2024
Pages
74–80
Proceedings
Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop
Conference
Odyssey 2024: The Speaker and Language Recognition Workshop, Quebec, Canada, CA
Publisher
International Speech Communication Association
Place
Quebec
DOI
10.21437/odyssey.2024-11
BibTeX
@inproceedings{BUT193369,
  author="ESPUNA, A. and PRASAD, A. and MOTLÍČEK, P. and MADIKERI, S. and SCHUEPBACH, C.",
  title="Normalising Flows for Speaker and Language Recognition Backend",
  booktitle="Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop",
  year="2024",
  pages="74--80",
  publisher="International Speech Communication Association",
  address="Quebec",
  doi="10.21437/odyssey.2024-11",
  url="https://www.isca-archive.org/odyssey_2024/espuna24_odyssey.pdf"
}