Publication Details
Multi-Channel Extension of Pre-trained Models for Speaker Verification
MOŠNER, L.
SERIZEL, R.
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Plchot Oldřich, Ing., Ph.D. (DCGM)
VINCENT, E.
Peng Junyi (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)
multi-channel speaker verification, pre-trained models
In this work, we focus on designing a multi-channel speech processing system
based on large pre-trained models. These models are typically trained for
single-channel scenarios via self-supervised learning (SSL). A common approach to
using SSL models with microphone array data is to prepend a multi-channel
speech enhancement stage to them. The downside is that spatial information can
be leveraged only by the pre-processing stage, and enhancement errors get
propagated to the SSL model. We aim to alleviate the issue by designing METRO,
a Multi-channel ExTension of pRe-trained mOdels. It interleaves per-channel
processing with cross-channel information exchange, eventually fusing channels
into one. While our approach is general, here we focus on multi-channel speaker
verification. Our experiments on the MultiSV corpus show noteworthy improvements
over the best published results on the dataset.
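
The interleaving idea described in the abstract can be illustrated with a toy sketch. This is not the METRO architecture from the paper: the shared linear layer, the mean-based cross-channel exchange, the averaging fusion, and all function names below are assumptions chosen only to show the general pattern of per-channel processing, cross-channel information exchange, and final fusion into one channel.

```python
import numpy as np


def per_channel_layer(x, weight):
    """Apply the same (shared) layer to every channel independently.

    x has shape (channels, frames, features). The shared weights stand in
    for a layer of a single-channel pre-trained model (an assumption).
    """
    return np.tanh(x @ weight)


def cross_channel_exchange(x):
    """Hypothetical exchange step: add the cross-channel mean to each channel."""
    return x + x.mean(axis=0, keepdims=True)


def fuse_channels(x):
    """Hypothetical fusion step: average all channels into a single one."""
    return x.mean(axis=0)


def toy_multichannel_forward(x, weights):
    """Interleave per-channel processing with exchange, then fuse channels."""
    for weight in weights:
        x = per_channel_layer(x, weight)
        x = cross_channel_exchange(x)
    return fuse_channels(x)


rng = np.random.default_rng(0)
signal = rng.standard_normal((4, 50, 16))  # 4 microphones, 50 frames, 16 features
layers = [0.1 * rng.standard_normal((16, 16)) for _ in range(3)]
fused = toy_multichannel_forward(signal, layers)
print(fused.shape)
```

Because every step in this sketch treats channels symmetrically, the fused output does not depend on the order of the microphones. Whether the actual system has this property is not stated in the abstract.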
@inproceedings{BUT193682,
author="MOŠNER, L. and SERIZEL, R. and BURGET, L. and PLCHOT, O. and VINCENT, E. and PENG, J. and ČERNOCKÝ, J.",
title="Multi-Channel Extension of Pre-trained Models for Speaker Verification",
booktitle="Proceedings of Interspeech 2024",
year="2024",
journal="Proceedings of Interspeech",
volume="2024",
number="9",
pages="2135--2139",
publisher="International Speech Communication Association",
address="Kos",
doi="10.21437/Interspeech.2024-1260",
issn="1990-9772",
url="https://www.isca-archive.org/interspeech_2024/mosner24_interspeech.pdf"
}