Publication Details

Automatic Speech Recognition Benchmark for Air-Traffic Communications

ZULUAGA-GOMEZ, J.; MOTLÍČEK, P.; ZHAN, Q.; VESELÝ, K.; BRAUN, R. Automatic Speech Recognition Benchmark for Air-Traffic Communications. In Proceedings of Interspeech 2020. Proceedings of Interspeech. Shanghai: International Speech Communication Association, 2020. p. 2297-2301. ISSN: 1990-9772.
Czech title
Srovnávací test automatického rozpoznávání řeči pro hlasovou komunikací v leteckém provozu
Type
conference paper
Language
English
Authors
ZULUAGA-GOMEZ, J.
Motlíček Petr, doc. Ing., Ph.D. (DCGM)
ZHAN, Q.
Veselý Karel, Ing., Ph.D. (DCGM)
BRAUN, R.
URL
https://isca-speech.org/archive/Interspeech_2020/pdfs/2173.pdf
Keywords

Speech Recognition, Air Traffic Control, Transfer Learning, Deep Neural Networks, Lattice-Free MMI

Abstract

Advances in Automatic Speech Recognition (ASR) over the last decade opened new areas of speech-based automation such as in Air-Traffic Control (ATC) environments. Currently, voice communication and data links communications are the only way of contact between pilots and Air-Traffic Controllers (ATCo), where the former is the most widely used and the latter is a non-spoken method mandatory for oceanic messages and limited for some domestic issues. ASR systems on ATCo environments inherit increasing complexity due to accents from non-English speakers, cockpit noise, speaker-dependent biases and small in-domain ATC databases for training. Hereby, we introduce CleanSky EC-H2020 ATCO2, a project that aims to develop an ASR-based platform to collect, organize and automatically pre-process ATCo speech-data from air space. This paper conveys an exploratory benchmark of several state-of-the-art ASR models trained on more than 170 hours of ATCo speech-data. We demonstrate that the cross-accent flaws due to speakers accents are minimized due to the amount of data, making the system feasible for ATC environments. The developed ASR system achieves an averaged word error rate (WER) of 7.75% across four databases. An additional 35% relative improvement in WER is achieved on one test set when training a TDNNF system with byte-pair encoding.

Published
2020
Pages
2297–2301
Journal
Proceedings of Interspeech, vol. 2020, no. 10, ISSN 1990-9772
Proceedings
Proceedings of Interspeech 2020
Conference
Interspeech, Shanghai, CN
Publisher
International Speech Communication Association
Place
Shanghai
DOI
10.21437/Interspeech.2020-2173
UT WoS
000833594102086
EID Scopus
BibTeX
@inproceedings{BUT168149,
  author="ZULUAGA-GOMEZ, J. and MOTLÍČEK, P. and ZHAN, Q. and VESELÝ, K. and BRAUN, R.",
  title="Automatic Speech Recognition Benchmark for Air-Traffic Communications",
  booktitle="Proceedings of Interspeech 2020",
  year="2020",
  journal="Proceedings of Interspeech",
  volume="2020",
  number="10",
  pages="2297--2301",
  publisher="International Speech Communication Association",
  address="Shanghai",
  doi="10.21437/Interspeech.2020-2173",
  issn="1990-9772",
  url="https://isca-speech.org/archive/Interspeech_2020/pdfs/2173.pdf"
}