Publication Details
Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?
Stafylakis Themos
Landini Federico Nicolás, Ph.D. (RG SPEECH)
Silnova Anna, M.Sc., Ph.D. (DCGM)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
End-to-End Neural Diarization, Speaker Characteristic Information
In this paper, we apply the variational information bottleneck approach to
end-to-end neural diarization with encoder-decoder attractors (EEND-EDA). This
allows us to investigate what information is essential for the model. EEND-EDA
utilizes attractors, vector representations of speakers in a conversation. Our
analysis shows that attractors do not necessarily have to contain speaker
characteristic information. On the other hand, giving the attractors more
freedom to allow them to encode some extra (possibly speaker-specific)
information leads to small but consistent diarization performance improvements.
Despite architectural differences in EEND systems, the notion of attractors
and frame embeddings is common to most of them and not specific to EEND-EDA. We
believe that the main conclusions of this work can apply to other variants of
EEND. Thus, we hope this paper will be a valuable contribution to guide the
community to make more informed decisions when designing new systems.
author="ZHANG, L. and STAFYLAKIS, T. and LANDINI, F. and DIEZ SÁNCHEZ, M. and SILNOVA, A. and BURGET, L.",
title="Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?",
booktitle="Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop",
publisher="International Speech Communication Association",
address="Québec City",