Publication Details
BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge
Kocour Martin
Umesh Jahnavi
Karafiát Martin, Ing., Ph.D. (DCGM)
Švec Ján, Ing. (DCGM)
Lopez Fernando
Beneš Karel, Ing. (DCGM)
Diez Sánchez Mireia, M.Sc., Ph.D. (DCGM)
Szőke Igor, Ing., Ph.D. (DCGM)
Luque Jordi
Veselý Karel, Ing., Ph.D. (DCGM)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)
ASR fusion, end-to-end model, self-supervised
learning, automatic speech recognition.
This work presents research on the development of Automatic Speech Recognition (ASR)
systems for the Albayzin 2022 Challenge. We train and evaluate
both hybrid systems and those based on end-to-end models.
We also investigate the use of self-supervised learning speech
representations from pre-trained models and their impact on
ASR performance (as opposed to training models directly from
scratch). Additionally, we apply the Whisper model in a
zero-shot fashion, postprocessing its output to fit the required
transcription format. On top of tuning the model architectures
and overall training schemes, we improve the robustness of our
models by augmenting the training data with noises extracted
from the target domain. Moreover, we rescore the N-best
hypotheses with an external language model (LM), adjusting each
sentence score before picking the single best hypothesis. All these
efforts lead to a significant reduction in word error rate (WER). Our
single best system and the fusion of selected systems achieved 16.3% and
13.7% WER, respectively, on the RTVE2020 test partition, i.e., the
official evaluation partition from the previous Albayzin challenge.
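The N-best rescoring step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the scores are combined, so the log-linear interpolation below, the `lm_weight` parameter, the `Hypothesis` type, and the toy language model `toy_lm_logprob` are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    text: str
    asr_score: float  # log-domain score from the ASR system (higher is better)


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.5,  # hypothetical interpolation weight
) -> Hypothesis:
    """Pick the hypothesis with the best combined ASR + external LM score."""
    if not nbest:
        raise ValueError("N-best list is empty")

    def combined(h: Hypothesis) -> float:
        # Log-linear combination: ASR score plus weighted external LM score.
        return h.asr_score + lm_weight * lm_logprob(h.text)

    return max(nbest, key=combined)


# Toy stand-in for an external LM (assumption): known words are cheap,
# unknown words are heavily penalised.
KNOWN_WORDS = {"the", "cat", "sat"}


def toy_lm_logprob(sentence: str) -> float:
    return sum(-1.0 if w in KNOWN_WORDS else -5.0 for w in sentence.split())


nbest = [
    Hypothesis("the cat sad", asr_score=-3.0),  # best according to ASR alone
    Hypothesis("the cat sat", asr_score=-3.2),
]

best = rescore_nbest(nbest, toy_lm_logprob, lm_weight=0.5)
print(best.text)
```

In this toy case the ASR score alone prefers the first hypothesis, while the external LM penalty on the out-of-vocabulary word moves the second one to the top. In practice the weight would be tuned on a development set.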
@inproceedings{BUT180167,
author="Martin {Kocour} and Jahnavi {Umesh} and Martin {Karafiát} and Ján {Švec} and Fernando {Lopez} and Karel {Beneš} and Mireia {Diez Sánchez} and Igor {Szőke} and Jordi {Luque} and Karel {Veselý} and Lukáš {Burget} and Jan {Černocký}",
title="BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge",
booktitle="Proceedings of IberSpeech 2022",
year="2022",
pages="276--280",
publisher="International Speech Communication Association",
address="Granada",
doi="10.21437/IberSPEECH.2022-56",
url="https://www.isca-speech.org/archive/pdfs/iberspeech_2022/kocour22_iberspeech.pdf"
}