Publication Details

Discriminative Training of VBx Diarization

KLEMENT, D.; DIEZ SÁNCHEZ, M.; LANDINI, F.; BURGET, L.; SILNOVA, A.; DELCROIX, M.; TAWARA, N. Discriminative Training of VBx Diarization. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 11871-11875. ISBN: 979-8-3503-4485-1.

Czech title

Diskriminativní trénování VBx diarizace mluvčích

Type

conference paper

Language

English

Authors

Klement Dominik, Ing. (DCGM)
DIEZ SÁNCHEZ, M.
Landini Federico Nicolás, Ph.D. (RG SPEECH)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Silnova Anna, M.Sc., Ph.D. (DCGM)
Delcroix Marc (FIT)
TAWARA, N.

URL

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446119

Keywords

speaker diarization, VBx, clustering, variational Bayes, discriminative training

Abstract

Bayesian HMM clustering of x-vector sequences (VBx) has become a widely adopted diarization baseline model in publications and challenges. It uses an HMM to model speaker turns, a generatively trained probabilistic linear discriminant analysis (PLDA) for speaker distribution modeling, and Bayesian inference to estimate the assignment of x-vectors to speakers. This paper presents a new framework for updating the VBx parameters using discriminative training, which directly optimizes a predefined loss. We also propose a new loss that better correlates with the diarization error rate compared to binary cross-entropy - the default choice for diarization end-to-end systems. Proof-of-concept results across three datasets (AMI, CALLHOME, and DIHARD II) demonstrate the method's capability of automatically finding hyperparameters, achieving comparable performance to those found by extensive grid search, which typically requires additional hyperparameter behavior knowledge. Moreover, we show that discriminative fine-tuning of PLDA can further improve the model's performance. We release the source code with this publication.

Published

2024

Pages

11871–11875

Proceedings

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Conference

2024 IEEE International Conference on Acoustics, Speech and Signal Processing, Seoul, KR

ISBN

979-8-3503-4485-1

Publisher

IEEE Signal Processing Society

Place

Seoul

DOI

10.1109/ICASSP48485.2024.10446119

EID Scopus

2-s2.0-85195386292

BibTeX

@inproceedings{BUT189781,
  author="KLEMENT, D. and DIEZ SÁNCHEZ, M. and LANDINI, F. and BURGET, L. and SILNOVA, A. and DELCROIX, M. and TAWARA, N.",
  title="Discriminative Training of VBx Diarization",
  booktitle="ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  year="2024",
  pages="11871--11875",
  publisher="IEEE Signal Processing Society",
  address="Seoul",
  doi="10.1109/ICASSP48485.2024.10446119",
  isbn="979-8-3503-4485-1",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446119"
}

Files

pdf klement_icassp2024_Discriminative_Training_of_VBx_Diarization.pdf 945 kB