Publication Details

Sequence-discriminative Training of Deep Neural Networks

VESELÝ, K.; GHOSHAL, A.; BURGET, L.; POVEY, D. Sequence-discriminative Training of Deep Neural Networks. Proceedings of Interspeech 2013. Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013). Lyon: International Speech Communication Association, 2013. p. 2345-2349. ISBN: 978-1-62993-443-3. ISSN: 2308-457X.

Czech title

Sekvenční diskriminativní trénování hlubokých neuronových sítí

Type

conference paper

Language

English

Authors

Veselý Karel, Ing., Ph.D. (DCGM)
Ghoshal Arnab
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Povey Daniel

URL

http://www.fit.vutbr.cz/research/groups/speech/publi/2013/vesely_interspeech2013_IS131333.pdf

Keywords

speech recognition, deep learning, sequence-criterion training, neural networks, reproducible research

Abstract

This article presents experiments with DNN-HMM hybrid systems trained using frame-based cross-entropy and different sequence-discriminative criteria on the 300-hour Switchboard conversational telephone speech task.

Annotation

Sequence-discriminative training of deep neural networks (DNNs) is investigated on a standard 300-hour American English conversational telephone speech task. Different sequence-discriminative criteria are compared: maximum mutual information (MMI), minimum phone error (MPE), state-level minimum Bayes risk (sMBR), and boosted MMI (BMMI). Two heuristics are investigated to improve the performance of DNNs trained using sequence-based criteria: lattices are regenerated after the first iteration of training, and, for MMI and BMMI, the frames where the numerator and denominator hypotheses are disjoint are removed from the gradient computation. Starting from a competitive DNN baseline trained using cross-entropy, the sequence-discriminative criteria are shown to lower word error rates by 7-9% relative, on average. Little difference is observed between the sequence-based criteria that are investigated. The experiments are done using the open-source Kaldi toolkit, which makes it possible for the wider community to reproduce these results.
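
The "7-9% relative" figure above is a relative reduction in word error rate (WER), not an absolute difference in percentage points. The following is a minimal sketch of that arithmetic; the function name and the WER values are hypothetical and chosen only for illustration, and they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are word error rates in percent, e.g. 20.0 for 20% WER.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# Hypothetical numbers (NOT taken from the paper): a cross-entropy baseline
# at 20.0% WER and a sequence-trained system at 18.4% WER.
baseline = 20.0
sequence_trained = 18.4

absolute_gain = baseline - sequence_trained
relative_gain = relative_wer_reduction(baseline, sequence_trained)

print(f"absolute reduction: {absolute_gain:.1f} points")
print(f"relative reduction: {relative_gain:.1f}%")
```

With these illustrative values, a drop of 1.6 percentage points corresponds to a relative reduction of about 8%, which is the sense in which the 7-9% range should be read.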

Published

2013

Pages

2345–2349

Journal

Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), no. 8, ISSN 2308-457X

Proceedings

Proceedings of Interspeech 2013

Conference

Interspeech Conference, Lyon, FR

ISBN

978-1-62993-443-3

Publisher

International Speech Communication Association

Place

Lyon

BibTeX

@inproceedings{BUT103549,
  author="Karel {Veselý} and Arnab {Ghoshal} and Lukáš {Burget} and Daniel {Povey}",
  title="Sequence-discriminative Training of Deep Neural Networks",
  booktitle="Proceedings of Interspeech 2013",
  year="2013",
  journal="Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013).",
  number="8",
  pages="2345--2349",
  publisher="International Speech Communication Association",
  address="Lyon",
  isbn="978-1-62993-443-3",
  issn="2308-457X",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2013/vesely_interspeech2013_IS131333.pdf"
}