Publication Details
Investigation of Specaugment for Deep Speaker Embedding Learning
Rohdin Johan Andréas, M.Sc., Ph.D. (DCGM)
Plchot Oldřich, Ing., Ph.D. (DCGM)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
YU, K.
Černocký Jan, prof. Dr. Ing. (DCGM)
speaker embedding, on-the-fly data augmentation, speaker verification,
SpecAugment is a newly proposed data augmentation method for speech recognition.
By randomly masking bands in the log Mel spectrogram, this method leads to
impressive performance improvements. In this paper, we investigate the usage of
SpecAugment for speaker verification tasks. Two different models, namely 1-D
convolutional TDNN and 2-D convolutional ResNet34, trained with either Softmax or
AAM-Softmax loss, are used to analyze SpecAugment's effectiveness. Experiments are
carried out on the VoxCeleb and NIST SRE 2016 datasets. By applying SpecAugment to
the original clean data in an on-the-fly manner without complex off-line data
augmentation methods, we obtained 3.72% and 11.49% EER for NIST SRE 2016
Cantonese and Tagalog, respectively. For the VoxCeleb1 evaluation set, we obtained
1.47% EER.
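
The band masking described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the number of masks, the maximum mask widths, and the fill value of zero are all assumptions chosen for illustration, and the time-warping step of the original SpecAugment recipe is left out. The input is assumed to be a log Mel spectrogram stored as a NumPy array of shape (frames, Mel bins).

```python
import numpy as np


def spec_augment(log_mel, num_freq_masks=1, max_freq_width=8,
                 num_time_masks=1, max_time_width=10, rng=None):
    """Return a copy of `log_mel` with random frequency and time bands masked.

    log_mel: array of shape (num_frames, num_mel_bins).
    All default values are illustrative assumptions, not settings from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = log_mel.copy()  # leave the caller's features untouched
    num_frames, num_bins = out.shape

    # Frequency masking: zero out a random band of consecutive Mel bins.
    for _ in range(num_freq_masks):
        width = int(rng.integers(0, min(max_freq_width, num_bins) + 1))
        start = int(rng.integers(0, num_bins - width + 1))
        out[:, start:start + width] = 0.0

    # Time masking: zero out a random run of consecutive frames.
    for _ in range(num_time_masks):
        width = int(rng.integers(0, min(max_time_width, num_frames) + 1))
        start = int(rng.integers(0, num_frames - width + 1))
        out[start:start + width, :] = 0.0

    return out
```

Because the masks are drawn anew on every call, applying the function to each training batch gives on-the-fly augmentation in the sense used above: no augmented copies of the data need to be generated and stored off-line. Filling with zero is equivalent to filling with the mean only if the features have been mean-normalized beforehand, which is a further assumption of this sketch.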
author="WANG, S. and ROHDIN, J. and PLCHOT, O. and BURGET, L. and YU, K. and ČERNOCKÝ, J.",
title="Investigation of Specaugment for Deep Speaker Embedding Learning",
booktitle="ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
publisher="IEEE Signal Processing Society",