Publication Details
Variational Approximation of Long-span Language Models for LVCSR
Deoras Anoop
Mikolov Tomáš, Ing., Ph.D.
Kombrink Stefan, Dipl.-Linguist.
Karafiát Martin, Ing., Ph.D. (DCGM)
Khudanpur Sanjeev
Recurrent Neural Network, Language Model, Variational Inference
We have presented experimental evidence that (n-gram) variational approximations of long-span LMs yield greater accuracy in LVCSR than standard n-gram models estimated from the same training text.
Long-span language models that capture syntax and semantics are seldom used in the first pass of large vocabulary continuous speech recognition systems due to the prohibitive search space of sentence hypotheses. Instead, an N-best list of hypotheses is created using tractable n-gram models, and rescored using the long-span models. It is shown in this paper that computationally tractable variational approximations of the long-span models are a better choice than standard n-gram models for first pass decoding. They not only result in a better first pass output, but also produce a lattice with a lower oracle word error rate, and rescoring the N-best list from such lattices with the long-span models requires a smaller N to attain the same accuracy. Empirical results on the WSJ, MIT Lectures, NIST 2007 Meeting Recognition and NIST 2001 Conversational Telephone Recognition data sets are presented to support these claims.
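The core idea, replacing a long-span model with a tractable n-gram surrogate for first-pass decoding, can be illustrated with a minimal sketch. One common way to realize such an approximation is to sample text from the long-span model and estimate n-gram statistics from the samples; the snippet below does exactly that with a stand-in `long_span_prob` function, a toy vocabulary, and unsmoothed maximum-likelihood trigram estimates, all of which are illustrative assumptions rather than the paper's exact procedure.

```python
# Sketch: approximate a long-span LM by an n-gram model estimated from
# text sampled from it, then use that n-gram model for first-pass decoding.
import random
from collections import defaultdict

VOCAB = ["<s>", "</s>", "the", "meeting", "starts", "now", "recognition"]

def long_span_prob(history):
    """Hypothetical long-span LM (e.g. an RNN LM) that conditions on the
    entire sentence prefix; here a toy uniform distribution over the vocabulary."""
    words = [w for w in VOCAB if w != "<s>"]
    return {w: 1.0 / len(words) for w in words}

def sample_corpus(num_sentences, max_len=20, seed=0):
    """Draw sentences from the long-span LM by ancestral sampling."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(num_sentences):
        sent = ["<s>"]
        while sent[-1] != "</s>" and len(sent) < max_len:
            dist = long_span_prob(sent)
            words, probs = zip(*dist.items())
            sent.append(rng.choices(words, weights=probs, k=1)[0])
        corpus.append(sent)
    return corpus

def estimate_trigram(corpus):
    """Maximum-likelihood trigram estimates from the sampled corpus; a real
    system would apply smoothing (e.g. Kneser-Ney) before decoding."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in corpus:
        padded = ["<s>"] + sent  # two leading <s> tokens pad the first trigram
        for i in range(2, len(padded)):
            counts[(padded[i - 2], padded[i - 1])][padded[i]] += 1
    return {h: {w: c / sum(ws.values()) for w, c in ws.items()}
            for h, ws in counts.items()}

trigram = estimate_trigram(sample_corpus(10000))
print(list(trigram.items())[:2])
```

The resulting trigram table can be plugged into a standard first-pass decoder in place of an n-gram model trained on the original text, with the long-span model reserved for rescoring the N-best lists or lattices it produces.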
@inproceedings{BUT76377,
author="Anoop {Deoras} and Tomáš {Mikolov} and Stefan {Kombrink} and Martin {Karafiát} and Sanjeev {Khudanpur}",
title="Variational Approximation of Long-span Language Models for LVCSR",
booktitle="Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011",
year="2011",
pages="5532--5535",
publisher="IEEE Signal Processing Society",
address="Praha",
isbn="978-1-4577-0537-3",
url="http://www.fit.vutbr.cz/research/groups/speech/publi/2011/deoras_icassp2011_5532.pdf"
}