Publication Details

Analysis of Speaker Recognition Systems in Realistic Scenarios of the SITW 2016 Challenge

NOVOTNÝ, O.; MATĚJKA, P.; PLCHOT, O.; GLEMBEK, O.; BURGET, L.; ČERNOCKÝ, J. Analysis of Speaker Recognition Systems in Realistic Scenarios of the SITW 2016 Challenge. In Proceedings of Interspeech 2016. San Francisco: International Speech Communication Association, 2016. p. 828-832. ISBN: 978-1-5108-3313-5.
Czech title
Analýza systémů pro ověřování mluvčího v realistických podmínkách SITW 2016 Challenge
Type
conference paper
Language
English
Authors
Novotný Ondřej, Ing., Ph.D.
Matějka Pavel, Ing., Ph.D. (DCGM)
Plchot Oldřich, Ing., Ph.D. (DCGM)
Glembek Ondřej, Ing., Ph.D.
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)
URL
https://www.researchgate.net/publication/307889224_Analysis_of_Speaker_Recognition_Systems_in_Realistic_Scenarios_of_the_SITW_2016_Challenge
Keywords

speaker recognition, SRE systems, diarization

Abstract

In this paper, we summarize our efforts for the Speakers In The Wild (SITW) challenge and present our findings on this new dataset for speaker recognition. Apart from the standard comparison of different SRE systems, we analyze the use of diarization for dealing with audio segments containing multiple speakers, since diarization is a necessary system component in part of the newly introduced enrollment and test protocols. Our state-of-the-art systems used in this work utilize both cepstral and DNN-based bottleneck features and are based on i-vectors followed by a Probabilistic Linear Discriminant Analysis (PLDA) classifier and logistic regression calibration/fusion. We present both narrow-band (8 kHz) and wide-band (16 kHz) systems together with their fusions.

Published
2016
Pages
828–832
Proceedings
Proceedings of Interspeech 2016
ISBN
978-1-5108-3313-5
Publisher
International Speech Communication Association
Place
San Francisco
DOI
10.21437/Interspeech.2016-981
UT WoS
000409394400173
BibTeX
@inproceedings{BUT132599,
  author="Ondřej {Novotný} and Pavel {Matějka} and Oldřich {Plchot} and Ondřej {Glembek} and Lukáš {Burget} and Jan {Černocký}",
  title="Analysis of Speaker Recognition Systems in Realistic Scenarios of the SITW 2016 Challenge",
  booktitle="Proceedings of Interspeech 2016",
  year="2016",
  pages="828--832",
  publisher="International Speech Communication Association",
  address="San Francisco",
  doi="10.21437/Interspeech.2016-981",
  isbn="978-1-5108-3313-5",
  url="https://www.researchgate.net/publication/307889224_Analysis_of_Speaker_Recognition_Systems_in_Realistic_Scenarios_of_the_SITW_2016_Challenge"
}