BCN2BRNO: Automatic speech recognition system for Albayzin 2022 Speech to Text Challenge

Czech title

BCN2BRNO: ASR systém pro Albayzin 2022 Speech to Text Challenge

Type

software

License

Any other entity must always acquire a license to use the result

License Fee

The licensor does not require a license fee for the result

Authors

Kocour Martin, Ing. (DCGM)
Umesh Jahnavi
Karafiát Martin, Ing., Ph.D. (DCGM)
Švec Ján, Ing. (DCGM)
Lopez Fernando
Beneš Karel, Ing. (DCGM)
Diez Sánchez Mireia, M.Sc., Ph.D. (DCGM)
Szőke Igor, Ing., Ph.D. (DCGM)
Luque Jordi
Veselý Karel, Ing., Ph.D. (DCGM)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)

Keywords

automatic speech recognition

Description

The software builds on the Automatic Speech Recognition (ASR) systems developed
for the Albayzin 2022 Challenge. We trained and evaluated both hybrid systems and
end-to-end models. We also investigated speech representations from pre-trained
self-supervised models and their impact on ASR performance, compared with
training models from scratch. Additionally, we applied the Whisper model in a
zero-shot fashion, post-processing its output to fit the required transcription
format. Besides tuning the model architectures and overall training schemes, we
improved the robustness of our models by augmenting the training data with noise
extracted from the target domain. Moreover, we rescored N-best hypotheses with an
external language model (LM) to adjust each sentence score and pick the single
best hypothesis. Together, these efforts led to a significant WER reduction. Our
single best system and the fusion of selected systems achieved 16.3% and 13.7%
WER, respectively, on the RTVE2020 test partition, i.e. the official evaluation
partition of the previous Albayzin challenge.
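
The N-best rescoring step mentioned in the description can be illustrated with a
minimal Python sketch. It is not the BCN2BRNO implementation: the names
`Hypothesis`, `rescore_nbest`, `lm_weight` and `toy_lm_logprob` are hypothetical,
the log-linear score combination is an assumed (though common) scheme, and the toy
language model only stands in for a real external LM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    text: str
    asr_score: float  # log-score assigned by the first-pass recognizer


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.5,
) -> Hypothesis:
    """Return the hypothesis with the highest combined score.

    Assumed combination: asr_score + lm_weight * lm_logprob(text).
    """
    if not nbest:
        raise ValueError("N-best list is empty")
    return max(nbest, key=lambda h: h.asr_score + lm_weight * lm_logprob(h.text))


def toy_lm_logprob(text: str) -> float:
    # Hypothetical stand-in for an external LM: words outside a tiny
    # vocabulary receive a fixed log-probability penalty.
    vocab = {"buenas", "tardes", "a", "todos"}
    return sum(0.0 if word in vocab else -5.0 for word in text.split())


nbest = [
    Hypothesis("buenas tardes a todas", -10.0),  # best first-pass score
    Hypothesis("buenas tardes a todos", -10.5),
]
best = rescore_nbest(nbest, toy_lm_logprob, lm_weight=1.0)
print(best.text)
```

With `lm_weight=0.0` the first-pass ranking is kept unchanged; in practice the
weight would be tuned on a development set.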

Location

Contact: https://www.fit.vut.cz/person/cernocky/ or https://www.fit.vut.cz/person/ikocour/

Projects

Multi-linguality in speech technologies, MŠMT, INTER-EXCELLENCE - Subprogramme INTER-ACTION 19LTAIN, LTAIN19087, start: 2020-01-01, end: 2023-08-31, running

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)