Publication Details

End-to-end DNN based text-independent speaker recognition for long and short utterances

ROHDIN, J.; SILNOVA, A.; DIEZ SÁNCHEZ, M.; PLCHOT, O.; MATĚJKA, P.; BURGET, L.; GLEMBEK, O. End-to-end DNN based text-independent speaker recognition for long and short utterances. COMPUTER SPEECH AND LANGUAGE, 2020, vol. 59, p. 22-35. ISSN: 0885-2308.

Czech title

Rozpoznávání mluvčího nezávislé na textu založené na End-to-end DNN přístupu pro dlouhé a krátké promluvy

Type

journal article

Language

English

Authors

Rohdin Johan Andréas, M.Sc., Ph.D. (DCGM)
Silnova Anna, M.Sc., Ph.D. (DCGM)
Diez Sánchez Mireia, M.Sc., Ph.D. (DCGM)
Plchot Oldřich, Ing., Ph.D. (DCGM)
Matějka Pavel, Ing., Ph.D.
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Glembek Ondřej, Ing., Ph.D.

URL

https://www.sciencedirect.com/science/article/pii/S0885230818303632

Keywords

Speaker verification, DNN, End-to-end, Text-independent, i-vector, PLDA

Abstract

Recently, several end-to-end speaker verification systems based on deep neural networks (DNNs) have been proposed. These systems have proven competitive for text-dependent tasks as well as for text-independent tasks with short utterances. However, for text-independent tasks with longer utterances, end-to-end systems are still outperformed by standard i-vector + PLDA systems. In this work, we present an end-to-end speaker verification system that is initialized to mimic an i-vector + PLDA baseline. The system is then further trained in an end-to-end manner but regularized so that it does not deviate too far from the initial system. In this way, we mitigate overfitting, which normally limits the performance of end-to-end systems. The proposed system outperforms the i-vector + PLDA baseline on both long and short utterances.
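The idea of training while staying close to an initial system can be illustrated with a minimal sketch. The abstract does not state the exact form of the regularizer, so the squared-distance penalty below is an assumption, and the quadratic "task loss" is only a toy stand-in for a real speaker verification objective. All function and parameter names (`l2_to_init_penalty`, `regularized_loss`, `train_toy`, `strength`) are hypothetical and not taken from the paper.

```python
def l2_to_init_penalty(params, init_params):
    """Squared Euclidean distance between current and initial parameters."""
    return sum((p - p0) ** 2 for p, p0 in zip(params, init_params))


def regularized_loss(task_loss, params, init_params, strength):
    """Task loss plus a penalty that grows as params drift from init_params.

    Assumption: an L2 penalty toward the initial parameters; the paper's
    actual regularizer may differ.
    """
    return task_loss + strength * l2_to_init_penalty(params, init_params)


def train_toy(init_params, target, strength, lr=0.1, steps=500):
    """Gradient descent on sum((p - t)^2) + strength * sum((p - p0)^2).

    The toy task loss pulls each parameter toward `target`; the penalty
    pulls it back toward its initial value. For plain gradient descent to
    converge here, lr must be below 1 / (1 + strength).
    """
    params = list(init_params)
    for _ in range(steps):
        grads = [
            2 * (p - t) + 2 * strength * (p - p0)
            for p, t, p0 in zip(params, target, init_params)
        ]
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


if __name__ == "__main__":
    init = [0.0, 0.0]
    target = [2.0, -4.0]
    # Without the penalty the parameters move all the way to the target.
    print(train_toy(init, target, strength=0.0))
    # With the penalty they settle between the initial values and the target.
    print(train_toy(init, target, strength=1.0))
```

In this toy setting the optimum for each parameter is (t + strength * p0) / (1 + strength), so a larger `strength` keeps the trained parameters closer to the initial system, which is the mechanism the abstract credits with mitigating overfitting.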

Published

2020

Pages

22–35

Journal

COMPUTER SPEECH AND LANGUAGE, vol. 59, ISSN 0885-2308

DOI

10.1016/j.csl.2019.06.002

UT WoS

000490540900002

EID Scopus

2-s2.0-85067618095

BibTeX

@article{BUT158088,
  author="Johan Andréas {Rohdin} and Anna {Silnova} and Mireia {Diez Sánchez} and Oldřich {Plchot} and Pavel {Matějka} and Lukáš {Burget} and Ondřej {Glembek}",
  title="End-to-end DNN based text-independent speaker recognition for long and short utterances",
  journal="COMPUTER SPEECH AND LANGUAGE",
  year="2020",
  volume="59",
  pages="22--35",
  doi="10.1016/j.csl.2019.06.002",
  issn="0885-2308",
  url="https://www.sciencedirect.com/science/article/pii/S0885230818303632"
}

Files

rohdin_elsevier_Journal_Paper_2020_18303632.pdf (PDF, 522 kB)