Publication Details
Progressive contrastive learning for self-supervised text-independent speaker verification
self-supervised, text-independent, speaker, verification
Self-supervised speaker representation learning has drawn extensive attention in recent years. Most of the work is based on the iterative clustering-classification learning framework, and the performance is sensitive to the pre-defined number of clusters. However, the cluster number is hard to estimate when dealing with large-scale unlabeled data. In this paper, we propose a progressive contrastive learning (PCL) algorithm to dynamically estimate the cluster number at each step based on the statistical characteristics of the data itself, and the estimated number progressively approaches the ground-truth speaker number as training proceeds. Specifically, we first update the data queue with the current augmented samples. Then, eigendecomposition is introduced to estimate the number of speakers in the updated data queue. Finally, we assign the queued data to the estimated cluster centroids and construct a contrastive loss, which encourages each speaker representation to be closer to its cluster centroid and away from the others. Experimental results on VoxCeleb1 demonstrate the effectiveness of our proposed PCL compared with existing self-supervised approaches.
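The abstract outlines two key operations: estimating the speaker count in the data queue via eigendecomposition, and a contrastive loss that pulls each embedding toward its cluster centroid. The sketch below is a minimal illustration of those two ideas using the standard eigengap heuristic on a cosine-affinity matrix and a softmax-over-centroids contrastive loss; the function names, the queue shapes, and the use of the eigengap criterion are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def estimate_num_clusters(embeddings, max_k=10):
    """Estimate the cluster count of queued embeddings (eigengap heuristic, an
    assumed stand-in for the paper's eigendecomposition-based estimate)."""
    # Cosine affinity matrix of length-normalized embeddings.
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    A = X @ X.T
    # Sort eigenvalues in descending order; with k well-separated clusters,
    # the top k eigenvalues dominate and the (k -> k+1) gap is largest.
    eigvals = np.sort(np.linalg.eigvalsh(A))[::-1]
    gaps = eigvals[: max_k - 1] - eigvals[1:max_k]
    return int(np.argmax(gaps)) + 1

def centroid_contrastive_loss(embeddings, centroids, assignments, tau=0.1):
    """Cross-entropy over cosine similarities to cluster centroids: each
    embedding is pushed toward its own centroid and away from the others."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    C = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    logits = (X @ C.T) / tau                      # temperature-scaled similarity
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.mean(np.log(probs[np.arange(len(X)), assignments]))

# Toy queue: two well-separated "speakers" in embedding space.
rng = np.random.default_rng(0)
a = rng.normal([5.0, 0.0, 0.0], 0.1, size=(20, 3))
b = rng.normal([0.0, 5.0, 0.0], 0.1, size=(20, 3))
queue = np.vstack([a, b])

k = estimate_num_clusters(queue, max_k=6)         # expected to recover 2
assignments = np.array([0] * 20 + [1] * 20)
centroids = np.stack([a.mean(axis=0), b.mean(axis=0)])
loss = centroid_contrastive_loss(queue, centroids, assignments)
```

In the full algorithm the queue, centroids, and assignments would be refreshed at every training step, so the estimated cluster count can grow toward the true speaker number as representations improve.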
@inproceedings{BUT179661,
author="Junyi {Peng} and Chunlei {Zhang} and Jan {Černocký} and Dong {Yu}",
title="Progressive contrastive learning for self-supervised text-independent speaker verification",
booktitle="Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022)",
year="2022",
pages="17--24",
publisher="International Speech Communication Association",
address="Beijing",
doi="10.21437/Odyssey.2022-3",
url="https://www.isca-speech.org/archive/pdfs/odyssey_2022/peng22_odyssey.pdf"
}