Publication Details
ICSpk: Interpretable Complex Speaker Embedding Extractor from Raw Waveform
PENG, J.
QU, X.
WANG, J.
GU, R.
XIAO, J.
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)
end-to-end speaker verification, raw waveform, complex neural networks, interpretable complex filters
Recently, extracting speaker embedding directly from raw waveform has drawn increasing attention in the field of speaker verification. Parametric real-valued filters in the first convolutional layer are learned to transform the waveform into time-frequency representations. However, these methods only focus on the magnitude spectrum and the poor interpretability of the learned filters limits the performance. In this paper, we propose a complex speaker embedding extractor, named ICSpk, with higher interpretability and fewer parameters. Specifically, at first, to quantify the speaker-related frequency response of waveform, we modify the original short-term Fourier transform filters into a family of complex exponential filters, named interpretable complex (IC) filters. Each IC filter is confined by a complex exponential filter parameterized by frequency. Then, a deep complex-valued speaker embedding extractor is designed to operate on the complex-valued output of IC filters. The proposed ICSpk is evaluated on VoxCeleb and CNCeleb databases. Experimental results demonstrate the IC filters-based system exhibits a significant improvement over the complex spectrogram based systems. Furthermore, the proposed ICSpk outperforms existing raw waveform based systems by a large margin.
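The idea of a complex exponential filter parameterized by frequency can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Hann window, the fixed (not learned) centre frequencies, the kernel size and hop, and the function names `ic_filterbank` and `apply_filters` are all assumptions made for illustration.

```python
import numpy as np

def ic_filterbank(freqs_hz, kernel_size, sample_rate):
    """Build one windowed complex exponential filter per centre frequency.

    Assumption: a Hann window is used; in a trainable model the entries of
    freqs_hz would be the learnable parameters.
    """
    n = np.arange(kernel_size)
    window = np.hanning(kernel_size)
    phase = 2j * np.pi * np.outer(freqs_hz, n) / sample_rate
    return window * np.exp(-phase)  # complex, shape (num_filters, kernel_size)

def apply_filters(waveform, filters, hop):
    """Strided correlation of a real waveform with each complex filter."""
    kernel_size = filters.shape[1]
    num_frames = 1 + (len(waveform) - kernel_size) // hop
    # Index matrix that slices the waveform into overlapping frames.
    idx = hop * np.arange(num_frames)[:, None] + np.arange(kernel_size)[None, :]
    frames = waveform[idx]        # (num_frames, kernel_size)
    return frames @ filters.T     # complex, (num_frames, num_filters)

# Toy input: one second of a 1 kHz sinusoid sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
wave = np.sin(2 * np.pi * 1000.0 * t)

# Hypothetical filter grid: 81 filters spaced 100 Hz apart from 0 to 8 kHz.
freqs = np.linspace(0.0, 8000.0, 81)
filters = ic_filterbank(freqs, kernel_size=400, sample_rate=sample_rate)
out = apply_filters(wave, filters, hop=160)

# The filter with the largest average magnitude should sit at the tone.
peak = int(np.abs(out).mean(axis=0).argmax())
print(out.shape, out.dtype, freqs[peak])
```

When the centre frequencies lie on the DFT grid, this reduces to a windowed short-time Fourier transform; letting the frequencies move freely is what makes each filter describable by a single interpretable parameter. The output keeps both real and imaginary parts, so a downstream complex-valued network can use phase as well as magnitude.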
@inproceedings{BUT175835,
author="PENG, J. and QU, X. and WANG, J. and GU, R. and XIAO, J. and BURGET, L. and ČERNOCKÝ, J.",
title="ICSpk: Interpretable Complex Speaker Embedding Extractor from Raw Waveform",
booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
year="2021",
journal="Proceedings of Interspeech",
volume="2021",
number="8",
pages="511--515",
publisher="International Speech Communication Association",
address="Brno",
doi="10.21437/Interspeech.2021-2016",
issn="1990-9772",
url="https://www.isca-speech.org/archive/interspeech_2021/peng21_interspeech.html"
}