Publication Details
Discriminative Training of VBx Diarization
DIEZ SÁNCHEZ, M.
Landini Federico Nicolás, Ph.D. (RG SPEECH)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Silnova Anna, M.Sc., Ph.D. (DCGM)
Delcroix Marc
TAWARA, N.
speaker diarization, VBx, clustering, variational Bayes, discriminative training
Bayesian HMM clustering of x-vector sequences (VBx) has become a widely adopted
diarization baseline model in publications and challenges. It uses an HMM to
model speaker turns, a generatively trained probabilistic linear discriminant
analysis (PLDA) for speaker distribution modeling, and Bayesian inference to
estimate the assignment of x-vectors to speakers. This paper presents a new
framework for updating the VBx parameters using discriminative training, which
directly optimizes a predefined loss. We also propose a new loss that
correlates better with the diarization error rate than binary cross-entropy, the
default choice for end-to-end diarization systems. Proof-of-concept results
across three datasets (AMI, CALLHOME, and DIHARD II) demonstrate the method's
capability of automatically finding hyperparameters, achieving comparable
performance to those found by extensive grid search, which typically requires
additional hyperparameter behavior knowledge. Moreover, we show that
discriminative fine-tuning of PLDA can further improve the model's performance.
We release the source code with this publication.
@inproceedings{BUT189781,
author="KLEMENT, D. and DIEZ SÁNCHEZ, M. and LANDINI, F. and BURGET, L. and SILNOVA, A. and DELCROIX, M. and TAWARA, N.",
title="Discriminative Training of VBx Diarization",
booktitle="ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
year="2024",
pages="11871--11875",
publisher="IEEE Signal Processing Society",
address="Seoul",
doi="10.1109/ICASSP48485.2024.10446119",
isbn="979-8-3503-4485-1",
url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446119"
}