Publication Details
Advancing speaker embedding learning: Wespeaker toolkit for research and production
WANG, S.
CHEN, Z.
HAN, B.
WANG, H.
XIANG, X.
Rohdin Johan Andréas, M.Sc., Ph.D. (DCGM)
Silnova Anna, M.Sc., Ph.D. (DCGM)
Qian Yanmin
Li Haizhou
and others
- https://pdf.sciencedirectassets.com/271578/1-s2.0-S0167639324X00060/1-s2.0-S0167639324000761/main.pdf?X-Amz-Security-Token=IQoJb3JpZ2luX2VjEAsaCXVzLWVhc3QtMSJIMEYCIQC8Doe66%2Bu6V%2FODd2NY6EZwVTEeN05avzWi09%2FPx3ob%2FQIhAP%2BOyz3L2hXSsDYY4l3zSuz1pzOjFiaTh%
- https://www.fit.vut.cz/research/group/speech/public/publi/2024/wang_speech%20communication_2024.pdf PDF
Wespeaker; Speaker embedding learning; SSL; Open-source
Speaker modeling plays a crucial role in various tasks, and fixed-dimensional
vector representations, known as speaker embeddings, are the predominant modeling
approach. These embeddings are typically evaluated within the framework of
speaker verification, yet their utility extends to a broad scope of related tasks
including speaker diarization, speech synthesis, voice conversion, and target
speaker extraction. This paper presents Wespeaker, a user-friendly toolkit
designed for both research and production purposes, dedicated to the learning of
speaker embeddings. Wespeaker offers scalable data management, state-of-the-art
speaker embedding models, and self-supervised learning training schemes with the
potential to leverage large-scale unlabeled real-world data. The toolkit
incorporates structured recipes that have been successfully adopted in winning
systems across various speaker verification challenges, ensuring highly
competitive results. For production-oriented development, Wespeaker integrates
CPU- and GPU-compatible deployment and runtime code, supporting mainstream
platforms such as Windows, Linux, and macOS, as well as on-device chips such as
the Horizon X3 Pi. Wespeaker also offers off-the-shelf, high-quality speaker
embeddings through various pretrained models, which can be readily applied to
different tasks that require speaker modeling. The toolkit is publicly available
at https://github.com/wenet-e2e/wespeaker.
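The abstract notes that speaker embeddings are fixed-dimensional vectors that are typically evaluated through speaker verification. The sketch below shows what that evaluation involves. It is a minimal illustration, not Wespeaker's code or API, and it assumes the two embeddings have already been extracted, for example with a pretrained model. A trial is scored with cosine similarity, a common and simple backend for speaker embeddings, and a set of trials is summarized by the equal error rate (EER). The function names, the 0.5 threshold, and the toy vectors are all hypothetical.

```python
import math
from typing import Sequence


def cosine_score(emb_a: Sequence[float], emb_b: Sequence[float]) -> float:
    """Cosine similarity between two fixed-dimensional speaker embeddings."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a: Sequence[float], emb_b: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Accept a verification trial if its score reaches a threshold.

    The default of 0.5 is a hypothetical placeholder; in practice the
    threshold is tuned on held-out trials.
    """
    return cosine_score(emb_a, emb_b) >= threshold


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> float:
    """Approximate the EER by sweeping the threshold over all observed scores.

    At each candidate threshold t, a target trial scoring below t counts as a
    false rejection and a non-target trial scoring at or above t counts as a
    false acceptance. The EER is taken as the mean of the two rates at the
    threshold where they are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")
    best_gap, best_eer = math.inf, 1.0
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    # Toy 4-dimensional "embeddings"; real ones come from a trained model.
    enroll = [0.2, 0.9, -0.1, 0.4]
    same = [0.25, 0.85, -0.05, 0.35]
    other = [-0.7, 0.1, 0.6, -0.2]
    print(same_speaker(enroll, same), same_speaker(enroll, other))
    print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))
```

Cosine scoring is used here because it needs no trained backend. Other scoring backends or score normalization are equally plausible choices and are not shown. The EER sweep trades exactness for brevity. It reports the operating point nearest to equal false-acceptance and false-rejection rates rather than interpolating between thresholds.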
@article{BUT193986,
author="WANG, S. and CHEN, Z. and HAN, B. and WANG, H. and XIANG, X. and ROHDIN, J. and SILNOVA, A. and QIAN, Y. and LI, H.",
title="Advancing speaker embedding learning: Wespeaker toolkit for research and production",
journal="Speech Communication",
year="2024",
volume="162",
number="103104",
pages="1--12",
doi="10.1016/j.specom.2024.103104",
issn="0167-6393",
url="https://pdf.sciencedirectassets.com/271578/1-s2.0-S0167639324X00060/1-s2.0-S0167639324000761/main.pdf?X-Amz-Security-Token=IQoJb3JpZ2luX2VjEAsaCXVzLWVhc3QtMSJIMEYCIQC8Doe66%2Bu6V%2FODd2NY6EZwVTEeN05avzWi09%2FPx3ob%2FQIhAP%2BOyz3L2hXSsDYY4l3zSuz1pzOjFiaTh%"
}