Publication Details
Improving Speaker Verification with Self-Pretrained Transformer Models
Peng Junyi
Plchot Oldřich, Ing., Ph.D. (DCGM)
Stafylakis Themos
Mošner Ladislav, Ing. (DCGM)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)
speaker verification, pre-trained speech transformer model, pre-training
Recently, fine-tuning large pre-trained Transformer models using downstream datasets has received rising interest. Despite their success, it is still challenging to disentangle the benefits of large-scale datasets and Transformer structures from the limitations of the pre-training. In this paper, we introduce a hierarchical training approach, named self-pretraining, in which Transformer models are pre-trained and fine-tuned on the same dataset. Three pre-trained models, HuBERT, Conformer, and WavLM, are evaluated on four speaker verification datasets of varying sizes. Our experiments show that these self-pretrained models achieve competitive performance on downstream speaker verification tasks such as VoxCeleb1 and CNCeleb1, even though their pre-training data amount to only one-third of LibriSpeech. Furthermore, when pre-trained only on VoxCeleb2-dev, the Conformer model outperforms the one pre-trained on 94k hours of data under the same fine-tuning settings.
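
The abstract describes a two-stage recipe: first pre-train a Transformer encoder with a self-supervised objective on the speaker-verification corpus itself, then fine-tune that same encoder for speaker verification. The PyTorch sketch below only illustrates this flow under assumptions of ours (masked pseudo-label prediction for stage 1, mean-pooled speaker classification for stage 2, toy dimensions and random data); it is not the authors' implementation.

# Minimal sketch of the self-pretraining idea: the SAME dataset feeds both
# stages, only the objective and the output head change. All names, losses
# and sizes below are illustrative assumptions, not the paper's recipe.
import torch
import torch.nn as nn

class TransformerEncoder(nn.Module):
    """Toy stand-in for a HuBERT/WavLM/Conformer-style encoder."""
    def __init__(self, feat_dim=80, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, x):                     # x: (batch, frames, feat_dim)
        return self.encoder(self.proj(x))

def pretrain_step(encoder, head, feats, pseudo_labels, mask, optim):
    """Stage 1: masked pseudo-label prediction on the SV dataset itself."""
    hidden = encoder(feats)
    logits = head(hidden[mask])               # predict labels at masked frames
    loss = nn.functional.cross_entropy(logits, pseudo_labels[mask])
    optim.zero_grad(); loss.backward(); optim.step()
    return loss.item()

def finetune_step(encoder, pooling, spk_head, feats, spk_labels, optim):
    """Stage 2: fine-tune the same encoder for speaker classification."""
    hidden = encoder(feats)
    embedding = pooling(hidden.transpose(1, 2)).squeeze(-1)   # mean over frames
    loss = nn.functional.cross_entropy(spk_head(embedding), spk_labels)
    optim.zero_grad(); loss.backward(); optim.step()
    return loss.item()

if __name__ == "__main__":
    enc = TransformerEncoder()
    pretrain_head = nn.Linear(256, 100)        # 100 pseudo-label clusters (assumed)
    pooling = nn.AdaptiveAvgPool1d(1)
    spk_head = nn.Linear(256, 1211)            # e.g. number of VoxCeleb1 speakers

    feats = torch.randn(2, 200, 80)            # fake batch of filterbank features
    mask = torch.rand(2, 200) < 0.5            # frames hidden during pre-training
    pseudo = torch.randint(0, 100, (2, 200))   # fake cluster pseudo-labels
    spk = torch.randint(0, 1211, (2,))         # fake speaker labels

    opt1 = torch.optim.Adam(list(enc.parameters()) + list(pretrain_head.parameters()), 1e-4)
    print("pretrain loss:", pretrain_step(enc, pretrain_head, feats, pseudo, mask, opt1))

    opt2 = torch.optim.Adam(list(enc.parameters()) + list(spk_head.parameters()), 1e-4)
    print("finetune loss:", finetune_step(enc, pooling, spk_head, feats, spk, opt2))

The point the sketch encodes is that both stages consume the same corpus; in the paper this replaces the usual LibriSpeech or 94k-hour pre-training data with the speaker-verification data itself.
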
@inproceedings{BUT185575,
author="Junyi {Peng} and Oldřich {Plchot} and Themos {Stafylakis} and Ladislav {Mošner} and Lukáš {Burget} and Jan {Černocký}",
title="Improving Speaker Verification with Self-Pretrained Transformer Models",
booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
year="2023",
journal="Proceedings of Interspeech",
volume="2023",
number="08",
pages="5361--5365",
publisher="International Speech Communication Association",
address="Dublin",
doi="10.21437/Interspeech.2023-453",
issn="1990-9772",
url="https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23_interspeech.pdf"
}