Publication Details
Sequence-discriminative Training of Deep Neural Networks
speech recognition, deep learning, sequence-criterion training, neural networks, reproducible research
This article presents experiments with DNN-HMM hybrid systems trained using frame-based cross-entropy and different sequence-discriminative criteria on the 300-hour Switchboard conversational telephone speech task.
Sequence-discriminative training of deep neural networks (DNNs) is investigated on a standard 300-hour American English conversational telephone speech task. Different sequence-discriminative criteria are compared: maximum mutual information (MMI), minimum phone error (MPE), state-level minimum Bayes risk (sMBR), and boosted MMI (BMMI). Two heuristics are investigated to improve the performance of DNNs trained using sequence-based criteria: lattices are regenerated after the first iteration of training, and, for MMI and BMMI, frames where the numerator and denominator hypotheses are disjoint are removed from the gradient computation. Starting from a competitive DNN baseline trained using cross-entropy, the sequence-discriminative criteria are shown to lower word error rates by 7-9% relative, on average. Little difference is observed among the sequence-based criteria investigated. The experiments are done using the open-source Kaldi toolkit, which makes it possible for the wider community to reproduce these results.
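To illustrate the MMI frame-dropping heuristic mentioned in the abstract, here is a minimal NumPy sketch, not the paper's or Kaldi's implementation. It assumes the numerator is a single forced-alignment path (so its state posteriors are one-hot), that denominator-lattice state posteriors are already computed, and that "disjoint" means the reference state receives zero denominator posterior at that frame. Under these assumptions, the MMI gradient with respect to the pre-softmax activations is kappa * (gamma_num - gamma_den), where kappa is the acoustic scale. The function name `mmi_frame_gradient` and its parameters are hypothetical.

```python
import numpy as np


def mmi_frame_gradient(num_states, den_post, kappa=1.0, drop_frames=True):
    """Per-frame MMI gradient w.r.t. softmax inputs, with optional frame dropping.

    Hypothetical helper (illustration only).
    num_states: int array [T], reference (numerator) state per frame, assumed to
                come from a forced alignment, so numerator posteriors are one-hot.
    den_post:   float array [T, S], denominator-lattice state posteriors.
    kappa:      acoustic scale.
    Returns (grad [T, S], keep [T]) where grad is the gradient of the objective
    being maximized and keep marks frames that contribute to the update.
    """
    T, _ = den_post.shape
    rows = np.arange(T)

    # One-hot numerator occupancies from the reference alignment.
    num_post = np.zeros_like(den_post)
    num_post[rows, num_states] = 1.0

    # d F_MMI / d a_t(s) = kappa * (gamma_num_t(s) - gamma_den_t(s)).
    grad = kappa * (num_post - den_post)

    keep = np.ones(T, dtype=bool)
    if drop_frames:
        # Assumed test for "disjoint" hypotheses: the reference state has zero
        # posterior in the denominator lattice, so the frame is removed.
        keep = den_post[rows, num_states] > 0.0
        grad[~keep] = 0.0
    return grad, keep


# Usage: frame 1's reference state (0) is absent from the denominator lattice.
den_post = np.array([[0.7, 0.3, 0.0],
                     [0.0, 0.5, 0.5]])
num_states = np.array([0, 0])
grad, keep = mmi_frame_gradient(num_states, den_post, kappa=0.1)
```

Because both numerator and denominator posteriors sum to one per frame, each row of the gradient sums to zero; dropping a frame simply zeroes its row instead of letting a large, unreliable error signal dominate the update.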
@inproceedings{BUT103549,
author="Karel {Veselý} and Arnab {Ghoshal} and Lukáš {Burget} and Daniel {Povey}",
title="Sequence-discriminative Training of Deep Neural Networks",
booktitle="Proceedings of Interspeech 2013",
year="2013",
journal="Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013).",
number="8",
pages="2345--2349",
publisher="International Speech Communication Association",
address="Lyon",
isbn="978-1-62993-443-3",
issn="2308-457X",
url="http://www.fit.vutbr.cz/research/groups/speech/publi/2013/vesely_interspeech2013_IS131333.pdf"
}