Publication Details

BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge

KOCOUR, M.; UMESH, J.; KARAFIÁT, M.; ŠVEC, J.; LOPEZ, F.; BENEŠ, K.; DIEZ SÁNCHEZ, M.; SZŐKE, I.; LUQUE, J.; VESELÝ, K.; BURGET, L.; ČERNOCKÝ, J. BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge. Proceedings of IberSpeech 2022. Granada: International Speech Communication Association, 2022. p. 276-280.
Czech title
BCN2BRNO: Fúze ASR systémů pro Albayzin 2022 Speech to Text Challenge
Type
conference paper
Language
English
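Authors
Martin Kocour, Jahnavi Umesh, Martin Karafiát, Ján Švec, Fernando Lopez, Karel Beneš, Mireia Diez Sánchez, Igor Szőke, Jordi Luque, Karel Veselý, Lukáš Burget, Jan Černocký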
URL
https://www.isca-speech.org/archive/pdfs/iberspeech_2022/kocour22_iberspeech.pdf
Keywords

ASR fusion, end-to-end model, self-supervised learning, automatic speech
recognition.

Abstract

This paper describes research on the development of Automatic Speech Recognition
(ASR) systems for the Albayzin 2022 Challenge. We train and evaluate both hybrid systems and those
based on end-to-end models. We also investigate the use of self-supervised
learning speech representations from pre-trained models and their impact on ASR
performance (as opposed to training models directly from scratch). Additionally,
we apply the Whisper model in a zero-shot fashion, postprocessing its output
to fit the required transcription format. On top of tuning the model
architectures and overall training schemes, we improve the robustness of our
models by augmenting the training data with noises extracted from the target
domain. Moreover, we apply rescoring with an external LM on top of N-best
hypotheses to adjust each sentence score and pick the single best hypothesis. All
these efforts lead to a significant reduction in word error rate (WER). Our single
best system and the fusion of selected systems achieved 16.3% and 13.7% WER,
respectively, on the RTVE2020 test partition, i.e., the official evaluation
partition from the previous Albayzin challenge.

Published
2022
Pages
276–280
Proceedings
Proceedings of IberSpeech 2022
Conference
IberSPEECH 2022 Conference, Granada, ES
Publisher
International Speech Communication Association
Place
Granada
DOI
10.21437/IberSPEECH.2022-56
BibTeX
@inproceedings{BUT180167,
  author="Martin {Kocour} and Jahnavi {Umesh} and Martin {Karafiát} and Ján {Švec} and Fernando {Lopez} and Karel {Beneš} and Mireia {Diez Sánchez} and Igor {Szőke} and Jordi {Luque} and Karel {Veselý} and Lukáš {Burget} and Jan {Černocký}",
  title="BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge",
  booktitle="Proceedings of IberSpeech 2022",
  year="2022",
  pages="276--280",
  publisher="International Speech Communication Association",
  address="Granada",
  doi="10.21437/IberSPEECH.2022-56",
  url="https://www.isca-speech.org/archive/pdfs/iberspeech_2022/kocour22_iberspeech.pdf"
}