Publication Details
Development of ABC systems for the 2021 edition of NIST Speaker Recognition evaluation
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Glembek Ondřej, Ing., Ph.D.
Matějka Pavel, Ing., Ph.D. (DCGM)
Mošner Ladislav, Ing. (DCGM)
Plchot Oldřich, Ing., Ph.D. (DCGM)
Rohdin Johan Andréas, M.Sc., Ph.D. (DCGM)
Silnova Anna, M.Sc., Ph.D. (DCGM)
Stafylakis Themos
and others
speaker verification, recognition, evaluation
In this contribution, we describe the ABC team's collaborative
efforts toward the development of speaker verification systems for the NIST
Speaker Recognition Evaluation 2021 (NIST-SRE2021). Cross-lingual and
cross-dataset trials are the two main challenges introduced in NIST-SRE2021.
The ABC team's submissions are the result of active collaboration among researchers
from BUT, CRIM, Omilia and Innovatrics. We took part in all three closed-condition
tracks for the audio-only, audio-visual and visual-only verification tasks. Our
audio-only systems follow the paradigm of deep speaker embeddings (e.g.,
x-vectors) with subsequent PLDA scoring. As embedding extractors, we use
variants of residual neural network (ResNet), factored time delay neural network
(FTDNN) and Hybrid Neural Network (HNN) architectures. The HNN embedding
extractor employs CNN, LSTM and TDNN networks and incorporates a multi-level
global-local statistics pooling method to aggregate speaker information
over both short time spans and utterance-level context. Our visual-only
systems are based on pretrained embedding extractors that use variants of
ResNet, with scoring based on cosine distance. When developing an
audio-visual system, we simply fuse the outputs of independent audio and visual
systems. Our final submitted systems are obtained by performing score-level
fusion of subsystems followed by score calibration.
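The cosine scoring and score-level fusion mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cosine_score` and `fuse_and_calibrate`, the embedding values, the fusion weights and the bias are all hypothetical. It also assumes that fusion and calibration can be expressed as a single affine transform of the subsystem scores (a common choice, typically trained with logistic regression), which the abstract does not specify. The scoring function returns cosine similarity, where higher means more similar; cosine distance is one minus this value.

```python
import numpy as np


def cosine_score(enroll, test):
    """Cosine similarity between an enrollment and a test embedding."""
    enroll = np.asarray(enroll, dtype=float)
    test = np.asarray(test, dtype=float)
    return float(enroll @ test / (np.linalg.norm(enroll) * np.linalg.norm(test)))


def fuse_and_calibrate(scores, weights, bias):
    """Affine score-level fusion: weighted sum of subsystem scores plus a bias.

    The weights and bias are assumed to have been trained beforehand
    (e.g. by logistic regression on a development set); they are
    hypothetical here.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(scores @ weights + bias)


# Hypothetical embeddings for one trial of a visual subsystem.
visual_score = cosine_score([0.2, 0.9, 0.1], [0.25, 0.8, 0.05])

# Hypothetical audio subsystem score (e.g. a PLDA log-likelihood ratio).
audio_score = 3.1

# Fuse the two subsystem scores with made-up weights and bias.
fused = fuse_and_calibrate([audio_score, visual_score], weights=[0.7, 2.0], bias=-1.5)
print(f"visual={visual_score:.3f} fused={fused:.3f}")
```

In this sketch the audio-visual decision score is simply the fused output of the independent audio and visual scores, mirroring the description above.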
@inproceedings{BUT179689,
author="Jahangir {Alam} and Lukáš {Burget} and Ondřej {Glembek} and Pavel {Matějka} and Ladislav {Mošner} and Oldřich {Plchot} and Johan Andréas {Rohdin} and Anna {Silnova} and Themos {Stafylakis}",
title="Development of ABC systems for the 2021 edition of NIST Speaker Recognition evaluation",
booktitle="Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022)",
year="2022",
pages="346--353",
publisher="International Speech Communication Association",
address="Beijing",
doi="10.21437/Odyssey.2022-48",
url="https://www.isca-speech.org/archive/pdfs/odyssey_2022/alam22_odyssey.pdf"
}