Publication Details

DNN Based Embeddings for Language Recognition

LOZANO DÍEZ, A.; PLCHOT, O.; MATĚJKA, P.; GONZALEZ-RODRIGUEZ, J. DNN Based Embeddings for Language Recognition. In Proceedings of ICASSP 2018. Calgary: IEEE Signal Processing Society, 2018. p. 5184-5188. ISBN: 978-1-5386-4658-8.

Czech title

DNN Embeddings pro rozpoznávání jazyka

Type

conference paper

Language

English

Authors

Lozano Díez Alicia, Ph.D.
Plchot Oldřich, Ing., Ph.D. (DCGM)
Matějka Pavel, Ing., Ph.D.
Gonzalez-Rodriguez Joaquin

URL

http://www.fit.vutbr.cz/research/groups/speech/publi/2018/lozano_icassp2018_0005184.pdf

Keywords

Embeddings, language recognition, LID, DNN

Abstract

In this work, we present a language identification (LID) system based on embeddings. In our case, an embedding is a fixed-length vector (similar to an i-vector) that represents the whole utterance, but unlike an i-vector it is designed to contain mostly information relevant to the target task (LID). In order to obtain these embeddings, we train a deep neural network (DNN) with a sequence summarization layer to classify languages. In particular, we trained a DNN based on bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) layers, whose frame-by-frame outputs are summarized into mean and standard deviation statistics. After this pooling layer, we add two fully connected layers whose outputs correspond to embeddings. Finally, we add a softmax output layer and train the whole network with a multi-class cross-entropy objective to discriminate between languages. We report our results on NIST LRE 2015 and we compare the performance of embeddings and corresponding i-vectors, both modeled by a Gaussian Linear Classifier (GLC). Using only embeddings resulted in comparable performance to i-vectors, and by performing score-level fusion we achieved a 7.3% relative improvement over the baseline.
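
The part of the network that follows the BLSTM layers (statistics pooling, two fully connected layers, softmax) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The BLSTM itself is omitted, and random frames stand in for its frame-by-frame outputs. The layer sizes, the ReLU nonlinearity, the random weights, and all function names (stats_pooling, forward, init_params) are assumptions made for illustration only; the abstract specifies none of them.

```python
import numpy as np


def stats_pooling(frames):
    """Summarize frame-level outputs of shape (T, D) into one vector of shape (2*D,).

    The result is the per-dimension mean followed by the per-dimension
    standard deviation, so its length does not depend on T.
    """
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])


def dense(x, weights, bias):
    """Affine layer: x @ W + b."""
    return x @ weights + bias


def softmax(logits):
    """Numerically stable softmax over a 1-D vector of logits."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


def init_params(rng, frame_dim, emb_dim, num_languages):
    """Random weights for illustration; a real system would train them."""
    def layer(n_in, n_out):
        return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

    return {
        "fc1": layer(2 * frame_dim, emb_dim),
        "fc2": layer(emb_dim, emb_dim),
        "out": layer(emb_dim, num_languages),
    }


def forward(frames, params):
    """Return both embedding candidates and the language posteriors."""
    pooled = stats_pooling(frames)
    # ReLU is an assumption; the abstract does not name the nonlinearity.
    emb_a = np.maximum(dense(pooled, *params["fc1"]), 0.0)
    emb_b = np.maximum(dense(emb_a, *params["fc2"]), 0.0)
    posteriors = softmax(dense(emb_b, *params["out"]))
    return emb_a, emb_b, posteriors


rng = np.random.default_rng(0)
params = init_params(rng, frame_dim=8, emb_dim=16, num_languages=5)

# Two utterances of different length; random frames replace BLSTM outputs.
for num_frames in (50, 300):
    frames = rng.normal(size=(num_frames, 8))
    emb_a, emb_b, posteriors = forward(frames, params)
    print(num_frames, emb_a.shape, emb_b.shape, posteriors.shape)
```

Both utterances yield embeddings of the same size regardless of their number of frames, which is what allows a fixed-length vector to be passed to a back-end classifier such as the GLC mentioned in the abstract.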

Published

2018

Pages

5184–5188

Proceedings

Proceedings of ICASSP 2018

Conference

IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, CA

ISBN

978-1-5386-4658-8

Publisher

IEEE Signal Processing Society

Place

Calgary

DOI

10.1109/ICASSP.2018.8462403

UT WoS

000446384605071

EID Scopus

2-s2.0-85054288455

BibTeX

@inproceedings{BUT155045,
  author="Alicia {Lozano Díez} and Oldřich {Plchot} and Pavel {Matějka} and Joaquin {Gonzalez-Rodriguez}",
  title="DNN Based Embeddings for Language Recognition",
  booktitle="Proceedings of ICASSP 2018",
  year="2018",
  pages="5184--5188",
  publisher="IEEE Signal Processing Society",
  address="Calgary",
  doi="10.1109/ICASSP.2018.8462403",
  isbn="978-1-5386-4658-8",
  url="https://www.fit.vut.cz/research/publication/11723/"
}

Files

pdf lozano_icassp2018_0005184.pdf 275 kB