Publication Details
Approaches to automatic lexicon learning with limited training examples
Goel Nagendra
Thomas Samuel
Agarwal Mohit
Akyazi Pinar
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Feng Kai
Ghoshal Arnab
Glembek Ondřej, Ing., Ph.D.
Karafiát Martin, Ing., Ph.D. (DCGM)
Povey Daniel
Rastrow Ariya
Rose Richard
Schwarz Petr, Ing., Ph.D. (DCGM)
Lexicon Learning, LVCSR
The paper presents approaches to automatic lexicon learning from limited training examples, using a combination of lexicon learning techniques.
Preparing a lexicon for a speech recognition system can take significant effort in languages whose written form is not strictly phonetic. Conversely, in languages whose written form is largely phonetic, some common words are often mispronounced. In this paper, we use a combination of lexicon learning techniques to explore whether a lexicon can be learned when only a small lexicon is available for bootstrapping. We find that for a phonetic language such as Spanish, the learned lexicon can outperform one derived from generic rules or hand-crafted pronunciations. For a language with less phonetic spelling, such as English, we find that learning the lexicon is still possible, though with some loss of accuracy.
@inproceedings{BUT37050,
author="Nagendra {Goel} and Samuel {Thomas} and Mohit {Agarwal} and Pinar {Akyazi} and Lukáš {Burget} and Kai {Feng} and Arnab {Ghoshal} and Ondřej {Glembek} and Martin {Karafiát} and Daniel {Povey} and Ariya {Rastrow} and Richard {Rose} and Petr {Schwarz}",
title="Approaches to automatic lexicon learning with limited training examples",
booktitle="Proc. International Conference on Acoustics, Speech, and Signal Processing",
year="2010",
volume="2010",
number="3",
pages="5094--5097",
publisher="IEEE Signal Processing Society",
address="Dallas",
isbn="978-1-4244-4296-6",
issn="1520-6149",
url="http://www.fit.vutbr.cz/research/groups/speech/publi/2010/goel_icassp2010_0005094.pdf"
}