Publication Details
Improving Speaker Verification with Self-Pretrained Transformer Models
Peng Junyi
Plchot Oldřich, Ing., Ph.D. (DCGM)
Stafylakis Themos
Mošner Ladislav, Ing. (DCGM)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)
speaker verification, pre-trained speech transformer model, pre-training
Recently, fine-tuning large pre-trained Transformer models using downstream datasets has received rising interest. Despite their success, it is still challenging to disentangle the benefits of large-scale datasets and Transformer structures from the limitations of the pre-training. In this paper, we introduce a hierarchical training approach, named self-pretraining, in which Transformer models are pre-trained and fine-tuned on the same dataset. Three pre-trained models, HuBERT, Conformer, and WavLM, are evaluated on four speaker verification datasets of varying sizes. Our experiments show that these self-pretrained models achieve competitive performance on downstream speaker verification tasks such as VoxCeleb1 and CNCeleb1, even though their pre-training data amount to only one-third of LibriSpeech. Furthermore, when pre-trained only on VoxCeleb2-dev, the Conformer model outperforms the one pre-trained on 94k hours of data under the same fine-tuning settings.
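
The abstract describes a two-stage recipe: first pre-train a Transformer encoder with a self-supervised objective on the speaker-verification corpus itself, then fine-tune that same encoder for speaker verification. The PyTorch sketch below only illustrates this flow under assumptions of ours (masked pseudo-label prediction for stage 1, mean-pooled speaker classification for stage 2, toy dimensions and random data); it is not the authors' implementation.

# Minimal sketch of the self-pretraining idea: the SAME dataset feeds both
# stages, only the objective and the output head change. All names, losses
# and sizes below are illustrative assumptions, not the paper's recipe.
import torch
import torch.nn as nn

class TransformerEncoder(nn.Module):
    """Toy stand-in for a HuBERT/WavLM/Conformer-style encoder."""
    def __init__(self, feat_dim=80, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, x):                     # x: (batch, frames, feat_dim)
        return self.encoder(self.proj(x))

def pretrain_step(encoder, head, feats, pseudo_labels, mask, optim):
    """Stage 1: masked pseudo-label prediction on the SV dataset itself."""
    hidden = encoder(feats)
    logits = head(hidden[mask])               # predict labels at masked frames
    loss = nn.functional.cross_entropy(logits, pseudo_labels[mask])
    optim.zero_grad(); loss.backward(); optim.step()
    return loss.item()

def finetune_step(encoder, pooling, spk_head, feats, spk_labels, optim):
    """Stage 2: fine-tune the same encoder for speaker classification."""
    hidden = encoder(feats)
    embedding = pooling(hidden.transpose(1, 2)).squeeze(-1)   # mean over frames
    loss = nn.functional.cross_entropy(spk_head(embedding), spk_labels)
    optim.zero_grad(); loss.backward(); optim.step()
    return loss.item()

if __name__ == "__main__":
    enc = TransformerEncoder()
    pretrain_head = nn.Linear(256, 100)        # 100 pseudo-label clusters (assumed)
    pooling = nn.AdaptiveAvgPool1d(1)
    spk_head = nn.Linear(256, 1211)            # e.g. number of VoxCeleb1 speakers

    feats = torch.randn(2, 200, 80)            # fake batch of filterbank features
    mask = torch.rand(2, 200) < 0.5            # frames hidden during pre-training
    pseudo = torch.randint(0, 100, (2, 200))   # fake cluster pseudo-labels
    spk = torch.randint(0, 1211, (2,))         # fake speaker labels

    opt1 = torch.optim.Adam(list(enc.parameters()) + list(pretrain_head.parameters()), 1e-4)
    print("pretrain loss:", pretrain_step(enc, pretrain_head, feats, pseudo, mask, opt1))

    opt2 = torch.optim.Adam(list(enc.parameters()) + list(spk_head.parameters()), 1e-4)
    print("finetune loss:", finetune_step(enc, pooling, spk_head, feats, spk, opt2))

The point the sketch encodes is that both stages consume the same corpus; in the paper this replaces the usual LibriSpeech or 94k-hour pre-training data with the speaker-verification data itself.
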
@inproceedings{BUT185575,
author="Junyi {Peng} and Oldřich {Plchot} and Themos {Stafylakis} and Ladislav {Mošner} and Lukáš {Burget} and Jan {Černocký}",
title="Improving Speaker Verification with Self-Pretrained Transformer Models",
booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
year="2023",
journal="Proceedings of Interspeech",
volume="2023",
number="08",
pages="5361--5365",
publisher="International Speech Communication Association",
address="Dublin",
doi="10.21437/Interspeech.2023-453",
issn="1990-9772",
url="https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23_interspeech.pdf"
}