Publication Details
Automatic Speech Recognition Benchmark for Air-Traffic Communications
ZULUAGA-GOMEZ, J.
Motlíček Petr, doc. Ing., Ph.D. (DCGM)
ZHAN, Q.
Veselý Karel, Ing., Ph.D. (DCGM)
BRAUN, R.
Speech Recognition, Air Traffic Control, Transfer Learning, Deep Neural Networks, Lattice-Free MMI
Advances in Automatic Speech Recognition (ASR) over the last decade have opened new areas of speech-based automation, such as Air-Traffic Control (ATC) environments. Currently, voice communication and data-link communication are the only means of contact between pilots and Air-Traffic Controllers (ATCos): the former is the most widely used, while the latter is a non-spoken method that is mandatory for oceanic messages and limited for some domestic issues. ASR systems in ATC environments face increased complexity due to the accents of non-native English speakers, cockpit noise, speaker-dependent biases, and the small in-domain ATC databases available for training. Here, we introduce CleanSky EC-H2020 ATCO2, a project that aims to develop an ASR-based platform to collect, organize, and automatically pre-process ATCo speech data from the airspace. This paper presents an exploratory benchmark of several state-of-the-art ASR models trained on more than 170 hours of ATCo speech data. We demonstrate that cross-accent flaws caused by speakers' accents are minimized by the amount of training data, making the system feasible for ATC environments. The developed ASR system achieves an average word error rate (WER) of 7.75% across four databases. An additional 35% relative improvement in WER is achieved on one test set when training a TDNNF system with byte-pair encoding.
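The abstract reports results both as a word error rate (WER) and as a relative WER improvement. A minimal sketch of both quantities, assuming the standard definition WER = (substitutions + deletions + insertions) / reference words, computed here with a word-level Levenshtein distance. This is not the authors' scoring tool. The function names `word_error_rate` and `relative_improvement`, the ATC-style transcripts, and the WER values in the usage lines are hypothetical illustrations, not data from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, using a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction: 0.35 means 35% fewer errors than the baseline."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical ATC-style utterance: the recognizer drops one of ten words.
    ref = "lufthansa one two three climb flight level three four zero"
    hyp = "lufthansa one two three climb level three four zero"
    print(f"WER: {word_error_rate(ref, hyp):.2%}")
    # Illustrative values only: going from 10% to 6.5% WER is a 35% relative gain.
    print(f"relative improvement: {relative_improvement(0.10, 0.065):.0%}")
```

A relative improvement is measured against the baseline WER, so it is not the same as an absolute difference in percentage points: in the illustrative case, a 3.5-point absolute drop from 10% corresponds to a 35% relative reduction.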
@inproceedings{BUT168149,
author="ZULUAGA-GOMEZ, J. and MOTLÍČEK, P. and ZHAN, Q. and VESELÝ, K. and BRAUN, R.",
title="Automatic Speech Recognition Benchmark for Air-Traffic Communications",
booktitle="Proceedings of Interspeech 2020",
year="2020",
journal="Proceedings of Interspeech",
volume="2020",
number="10",
pages="2297--2301",
publisher="International Speech Communication Association",
address="Shanghai",
doi="10.21437/Interspeech.2020-2173",
issn="1990-9772",
url="https://isca-speech.org/archive/Interspeech_2020/pdfs/2173.pdf"
}