Publication Details
Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?
ZHANG, L.
Stafylakis Themos
Landini Federico Nicolás, Ph.D. (RG SPEECH)
DIEZ SÁNCHEZ, M.
Silnova Anna, M.Sc., Ph.D. (DCGM)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
End-to-End Neural Diarization, Speaker Characteristic Information
In this paper, we apply the variational information bottleneck approach to
end-to-end neural diarization with encoder-decoder attractors (EEND-EDA). This
allows us to investigate what information is essential for the model. EEND-EDA
utilizes attractors, vector representations of speakers in a conversation. Our
analysis shows that attractors do not necessarily have to contain speaker
characteristic information. On the other hand, giving the attractors more
freedom to allow them to encode some extra (possibly speaker-specific)
information leads to small but consistent diarization performance improvements.
Despite architectural differences in EEND systems, the notion of attractors
and frame embeddings is common to most of them and not specific to EEND-EDA. We
believe that the main conclusions of this work can apply to other variants of
EEND. Thus, we hope this paper will be a valuable contribution to guide the
community to make more informed decisions when designing new systems.
@inproceedings{BUT193432,
author="ZHANG, L. and STAFYLAKIS, T. and LANDINI, F. and DIEZ SÁNCHEZ, M. and SILNOVA, A. and BURGET, L.",
title="Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?",
booktitle="Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop",
year="2024",
pages="123--130",
publisher="International Speech Communication Association",
address="Québec City",
doi="10.21437/odyssey.2024-18",
url="https://www.isca-archive.org/odyssey_2024/zhang24_odyssey.pdf"
}