Publication Details

Improving Language Models for ASR Using Translated In-domain Data

KOMBRINK, S.; MIKOLOV, T.; KARAFIÁT, M.; BURGET, L. Improving Language Models for ASR Using Translated In-domain Data. Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto: IEEE Signal Processing Society, 2012. p. 4405-4408. ISBN: 978-1-4673-0044-5.

Czech title

Vylepšení jazykových modelů pro rozpoznávání řeči pomocí přeložených dat z cílové oblasti

Type

conference paper

Language

English

Authors

Kombrink Stefan, Dipl.-Linguist.
Mikolov Tomáš, Ing., Ph.D.
Karafiát Martin, Ing., Ph.D. (DCGM)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)

URL

http://www.fit.vutbr.cz/research/groups/speech/publi/2012/kombrink_icassp2012_0004405.pdf

Keywords

Low Resource ASR, Language Modeling, Machine Translation

Abstract

This paper describes how to acquire in-domain training data for building speech recognition systems for under-resourced languages.

Annotation

Acquiring in-domain training data to build speech recognition systems for under-resourced languages can be a costly, time-consuming and tedious process. In this work, we propose using machine translation to translate English transcripts of telephone speech into Czech in order to improve a Czech CTS speech recognition system. The translated transcripts are used as additional language model training data in a scenario where the baseline language model is trained on off- and close-domain data only. We report perplexities, OOV rates and word error rates, and examine different data sets and translators for their suitability for the described task.

Published

2012

Pages

4405–4408

Proceedings

Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

Conference

The 37th International Conference on Acoustics, Speech, and Signal Processing, Kyoto, JP

ISBN

978-1-4673-0044-5

Publisher

IEEE Signal Processing Society

Place

Kyoto

DOI

10.1109/ICASSP.2012.6288896

BibTeX

@inproceedings{BUT91478,
  author="Stefan {Kombrink} and Tomáš {Mikolov} and Martin {Karafiát} and Lukáš {Burget}",
  title="Improving Language Models for ASR Using Translated In-domain Data",
  booktitle="Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing",
  year="2012",
  pages="4405--4408",
  publisher="IEEE Signal Processing Society",
  address="Kyoto",
  doi="10.1109/ICASSP.2012.6288896",
  isbn="978-1-4673-0044-5",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2012/kombrink_icassp2012_0004405.pdf"
}