Publication Details

End-to-End DNN Based Speaker Recognition Inspired by i-Vector and PLDA

ROHDIN, J.; SILNOVA, A.; DIEZ SÁNCHEZ, M.; PLCHOT, O.; MATĚJKA, P.; BURGET, L. End-to-End DNN Based Speaker Recognition Inspired by i-Vector and PLDA. In Proceedings of ICASSP. Calgary: IEEE Signal Processing Society, 2018. p. 4874-4878. ISBN: 978-1-5386-4658-8.

Czech title

End-to-end DNN rozpoznávání mluvčího inspirované i-vektory a PLDA

Type

conference paper

Language

English

Authors

Rohdin Johan Andréas, M.Sc., Ph.D. (DCGM)
Silnova Anna, M.Sc., Ph.D. (DCGM)
Diez Sánchez Mireia, M.Sc., Ph.D. (DCGM)
Plchot Oldřich, Ing., Ph.D. (DCGM)
Matějka Pavel, Ing., Ph.D. (DCGM)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)

URL

http://www.fit.vutbr.cz/research/groups/speech/publi/2018/rohdin_icassp2018_0004874.pdf

Keywords

Speaker verification, DNN, end-to-end

Abstract

Recently, several end-to-end speaker verification systems based on deep neural networks (DNNs) have been proposed. These systems have been proven to be competitive for text-dependent tasks as well as for text-independent tasks with short utterances. However, for text-independent tasks with longer utterances, end-to-end systems are still outperformed by standard i-vector + PLDA systems. In this work, we develop an end-to-end speaker verification system that is initialized to mimic an i-vector + PLDA baseline. The system is then further trained in an end-to-end manner but regularized so that it does not deviate too far from the initial system. In this way we mitigate overfitting which normally limits the performance of end-to-end systems. The proposed system outperforms the i-vector + PLDA baseline on both long and short duration utterances.

Published

2018

Pages

4874–4878

Proceedings

Proceedings of ICASSP

ISBN

978-1-5386-4658-8

Publisher

IEEE Signal Processing Society

Place

Calgary

DOI

10.1109/ICASSP.2018.8461958

UT WoS

000446384605009

EID Scopus

2-s2.0-85054212885

BibTeX

@inproceedings{BUT155046,
  author="Johan Andréas {Rohdin} and Anna {Silnova} and Mireia {Diez Sánchez} and Oldřich {Plchot} and Pavel {Matějka} and Lukáš {Burget}",
  title="End-to-End DNN Based Speaker Recognition Inspired by i-Vector and PLDA",
  booktitle="Proceedings of ICASSP",
  year="2018",
  pages="4874--4878",
  publisher="IEEE Signal Processing Society",
  address="Calgary",
  doi="10.1109/ICASSP.2018.8461958",
  isbn="978-1-5386-4658-8",
  url="https://www.fit.vut.cz/research/publication/11724/"
}