Publication Details
Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: Theory, implementation and analysis on standard tasks
Profant Ján, Ing.
Diez Sánchez Mireia, M.Sc., Ph.D. (DCGM)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Speaker diarization, Variational Bayes, HMM, x-vector, AMI
The recently proposed VBx diarization method uses a Bayesian hidden Markov model
to find speaker clusters in a sequence of x-vectors. In this work, we perform an
extensive comparison of the performance of VBx diarization with other approaches
in the literature, and we show that VBx achieves superior performance on three of
the most popular datasets for evaluating diarization: CALLHOME, AMI and DIHARD II.
Further, we present for the first time the derivation and update
formulae for the VBx model, focusing on the efficiency and simplicity of this
model compared to the previous, more complex BHMM model working on
standard frame-by-frame cepstral features. Together with this publication, we
release the recipe for training the x-vector extractors used in our experiments
on both wideband and narrowband data, and the VBx recipes that attain
state-of-the-art performance on all three datasets. In addition, we point out the
lack of a standardized evaluation protocol for the AMI dataset and we propose a new
protocol for both Beamformed and Mix-Headset audio based on the official AMI
partitions and transcriptions.
@article{BUT175852,
author="Federico Nicolás {Landini} and Ján {Profant} and Mireia {Diez Sánchez} and Lukáš {Burget}",
title="Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: Theory, implementation and analysis on standard tasks",
journal="Computer Speech and Language",
year="2022",
volume="71",
number="101254",
pages="1--16",
doi="10.1016/j.csl.2021.101254",
issn="0885-2308",
url="https://www.sciencedirect.com/science/article/pii/S0885230821000619"
}