Publication Details
Lightly supervised vs. semi-supervised training of acoustic model on Luxembourgish for low-resource automatic speech recognition
VESELÝ, K.
PERALES, C.
Szőke Igor, Ing., Ph.D. (DCGM)
Luque Jordi
Černocký Jan, prof. Dr. Ing. (DCGM)
Luxembourgish, call centers, speech recognition, low-resourced ASR, unsupervised training
In this work, we focus on exploiting inexpensive data in order to improve the DNN acoustic model for ASR. We explore two strategies: the first uses untranscribed data from the target domain; the second concerns the proper selection of excerpts from imperfectly transcribed out-of-domain public data, such as parliamentary speeches. We found that both approaches lead to similar results, making them equally beneficial for practical use. The Luxembourgish ASR seed system had a 38.8% WER, and it improved by roughly 4% absolute, reaching 34.6% WER with the untranscribed data and 34.9% WER with the lightly supervised data. Adding both databases simultaneously led to 34.4% WER, which is only a small further improvement. As a secondary research topic, we experiment with semi-supervised state-level minimum Bayes risk (sMBR) training. Nonetheless, for sMBR we saw no improvement from adding the automatically transcribed target data, even though similar techniques yield good results in the case of cross-entropy (CE) training.
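The paper itself contains no code, but the two data-selection strategies summarized in the abstract can be sketched briefly. The Python snippet below is a hypothetical illustration, not the authors' implementation: it keeps automatically transcribed target-domain utterances whose mean decoder confidence is high (the semi-supervised case), and keeps out-of-domain excerpts whose decoder hypothesis agrees closely with the imperfect reference transcript (the lightly supervised case). All names, thresholds, and data structures are assumptions.

    from difflib import SequenceMatcher

    # Assumed utterance record:
    # (utt_id, hypothesis words, per-word decoder confidences,
    #  imperfect transcript words or None if untranscribed).
    CONF_THRESHOLD = 0.9   # assumed confidence cutoff, untranscribed data
    MATCH_THRESHOLD = 0.8  # assumed hypothesis/transcript agreement cutoff

    def select_semi_supervised(utts):
        """Keep untranscribed utterances whose mean decoder confidence
        is high enough; train on the hypothesis words."""
        kept = []
        for utt_id, hyp, confs, _ in utts:
            if confs and sum(confs) / len(confs) >= CONF_THRESHOLD:
                kept.append((utt_id, hyp))
        return kept

    def select_lightly_supervised(utts):
        """Keep excerpts where the decoder hypothesis agrees with the
        imperfect (e.g., parliamentary) transcript; train on the
        reference words, trusting them where decoding confirms them."""
        kept = []
        for utt_id, hyp, _, ref in utts:
            if ref is None:
                continue
            # Word-level similarity between hypothesis and reference.
            ratio = SequenceMatcher(None, hyp, ref).ratio()
            if ratio >= MATCH_THRESHOLD:
                kept.append((utt_id, ref))
        return kept

In the paper's setting, both selections would simply supply additional training data for the DNN acoustic model on top of the transcribed seed set; the roughly 4% absolute WER gain quoted above comes from retraining with the selected excerpts included.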
@inproceedings{BUT155104,
author="VESELÝ, K. and PERALES, C. and SZŐKE, I. and LUQUE, J. and ČERNOCKÝ, J.",
title="Lightly supervised vs. semi-supervised training of acoustic model on Luxembourgish for low-resource automatic speech recognition",
booktitle="Proceedings of Interspeech 2018",
year="2018",
journal="Proceedings of Interspeech",
volume="2018",
number="9",
pages="2883--2887",
publisher="International Speech Communication Association",
address="Hyderabad",
doi="10.21437/Interspeech.2018-2361",
issn="1990-9772",
url="https://www.isca-speech.org/archive/Interspeech_2018/abstracts/2361.html"
}