Publication Details
Learnable Sparse Filterbank for Speaker Verification
PENG, J.
GU, R.
Mošner Ladislav, Ing. (DCGM)
Plchot Oldřich, Ing., Ph.D. (DCGM)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)
learnable filter, sparse filtering, sparsity, speaker verification
Recently, feature extraction with learnable filters has been extensively investigated
for speaker verification systems, with filters learned in both the time and
frequency domains. However, most learned schemes end up with filters that stay close
to their initialization (e.g., a Mel filterbank) or that are strongly limited by
their constraints. In this paper, we propose a novel learnable sparse filterbank,
named LearnSF, which is learned by exclusively optimizing the sparsity of the filterbank
and does not explicitly constrain the filters to follow a predefined distribution.
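The abstract does not spell out the sparsity objective. One established way to learn a linear filterbank purely by optimizing sparsity is sparse filtering (Ngiam et al., 2011), and the sketch below assumes that objective, which may differ from the exact LearnSF formulation. The filterbank `W` maps power-spectrum bins to filter outputs; the outputs are made positive with a soft absolute value, each filter is normalized to unit L2 norm over frames, each frame to unit L2 norm over filters, and the L1 norm of the result is minimized. For brevity it uses numerical gradients with a backtracking step size; the function names, sizes, and random stand-in data are hypothetical.

```python
import numpy as np


def sparse_filtering_loss(W, X, eps=1e-8):
    """Sparse filtering objective for a filterbank (an assumption, not the LearnSF code).

    W: (n_filters, n_bins) filterbank weights
    X: (n_bins, n_frames) power-spectrum frames
    """
    F = W @ X                                         # filter outputs per frame
    F = np.sqrt(F ** 2 + eps)                         # soft absolute value
    F = F / np.linalg.norm(F, axis=1, keepdims=True)  # each filter: unit L2 over frames
    F = F / np.linalg.norm(F, axis=0, keepdims=True)  # each frame: unit L2 over filters
    return F.sum()                                    # L1 norm (all entries are positive)


def numerical_grad(f, W, h=1e-5):
    """Central-difference gradient of scalar f at W (slow, for illustration only)."""
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += h
        Wm[idx] -= h
        g[idx] = (f(Wp) - f(Wm)) / (2 * h)
    return g


def learn_filterbank(W, X, steps=20, lr=1.0):
    """Gradient descent on the sparsity objective only; each step halves the
    step size until the loss decreases (backtracking). W is not modified in place."""
    def loss(V):
        return sparse_filtering_loss(V, X)

    current = loss(W)
    for _ in range(steps):
        g = numerical_grad(loss, W)
        step = lr
        while step > 1e-8:
            candidate = W - step * g
            value = loss(candidate)
            if value < current:
                W, current = candidate, value
                break
            step /= 2
    return W, current


rng = np.random.default_rng(0)
X = rng.random((16, 200)) ** 4        # stand-in power spectrogram: 16 bins, 200 frames
W0 = rng.standard_normal((4, 16))     # hypothetical 4-filter bank, random initialization
W1, final_loss = learn_filterbank(W0, X)
print(sparse_filtering_loss(W0, X), final_loss)
```

Because both normalizations remove the overall scale, the objective rewards filters whose outputs are active on few frames and frames that activate few filters, without tying the filters to any predefined shape such as a Mel triangle.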
After standard pre-processing (STFT followed by squaring the magnitude spectrum), the
learnable sparse filterbank is applied, and its normalized outputs are fed into
a neural network that predicts the speaker identity. We evaluated the performance of
the proposed approach on the VoxCeleb and CNCeleb datasets. The experimental
results demonstrate the effectiveness of the proposed LearnSF compared to both
widely used acoustic features and existing parameterized learnable front-ends.
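The front-end pipeline described in the abstract can be sketched in NumPy as follows. Every setting here is an assumption for illustration: 16 kHz audio, 25 ms Hann-windowed frames with a 10 ms hop, a 512-point FFT, and log compression followed by per-utterance mean and variance normalization as the "normalization" step (the abstract does not specify it). A triangular Mel filterbank, the common baseline and the initialization mentioned in the abstract, stands in for `W`; the function names are hypothetical, and the speaker-identity network is omitted.

```python
import numpy as np


def power_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Hann-windowed STFT followed by squaring the magnitude.
    Returns shape (n_fft // 2 + 1, n_frames)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)      # (n_frames, frame_len)
    spec = np.fft.rfft(frames, n=n_fft, axis=1)       # zero-padded to n_fft points
    return (np.abs(spec) ** 2).T


def mel_filterbank(n_filters=40, n_fft=512, sr=16000):
    """Triangular filters equally spaced on the (HTK-style) Mel scale."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2))
    bins = np.linspace(0.0, sr / 2, n_fft // 2 + 1)   # frequency of each FFT bin in Hz
    W = np.zeros((n_filters, bins.size))
    for i in range(n_filters):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bins - left) / (center - left)
        falling = (right - bins) / (right - center)
        W[i] = np.maximum(0.0, np.minimum(rising, falling))
    return W


def filterbank_features(power_spec, W, eps=1e-6):
    """Apply filterbank W, log-compress, and normalize each channel over time."""
    energies = W @ power_spec                         # (n_filters, n_frames)
    log_e = np.log(np.abs(energies) + eps)            # abs() guards against negative learned weights
    mean = log_e.mean(axis=1, keepdims=True)
    std = log_e.std(axis=1, keepdims=True)
    return (log_e - mean) / (std + 1e-8)


rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)                   # 1 s of noise as stand-in audio
P = power_spectrogram(signal)
W = mel_filterbank()
feats = filterbank_features(P, W)
print(P.shape, W.shape, feats.shape)                  # (257, 98) (40, 257) (40, 98)
```

Keeping the filterbank as a plain `(n_filters, n_fft // 2 + 1)` matrix means a learned filterbank of the same shape can be passed to `filterbank_features` unchanged, so a learned front-end and the Mel baseline differ only in `W`.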
@inproceedings{BUT179826,
author="PENG, J. and GU, R. and MOŠNER, L. and PLCHOT, O. and BURGET, L. and ČERNOCKÝ, J.",
title="Learnable Sparse Filterbank for Speaker Verification",
booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
year="2022",
journal="Proceedings of Interspeech",
number="9",
pages="5110--5114",
publisher="International Speech Communication Association",
address="Incheon",
doi="10.21437/Interspeech.2022-11309",
issn="1990-9772",
url="https://www.isca-speech.org/archive/pdfs/interspeech_2022/peng22e_interspeech.pdf"
}