Publication Details

Automatic Speech Recognition Benchmark for Air-Traffic Communications

ZULUAGA-GOMEZ, J.; MOTLÍČEK, P.; ZHAN, Q.; VESELÝ, K.; BRAUN, R. Automatic Speech Recognition Benchmark for Air-Traffic Communications. In Proceedings of Interspeech 2020. Proceedings of Interspeech. Shanghai: International Speech Communication Association, 2020. p. 2297-2301. ISSN: 1990-9772.

Czech title

Srovnávací test automatického rozpoznávání řeči pro hlasovou komunikací v leteckém provozu

Type

conference paper

Language

English

Authors

ZULUAGA-GOMEZ, J.
Motlíček Petr, doc. Ing., Ph.D. (DCGM)
ZHAN, Q.
Veselý Karel, Ing., Ph.D. (DCGM)
BRAUN, R.

URL

https://isca-speech.org/archive/Interspeech_2020/pdfs/2173.pdf

Keywords

Speech Recognition, Air Traffic Control, Transfer Learning, Deep Neural Networks, Lattice-Free MMI

Abstract

Advances in Automatic Speech Recognition (ASR) over the last decade opened new areas of speech-based automation such as in Air-Traffic Control (ATC) environments. Currently, voice communication and data links communications are the only way of contact between pilots and Air-Traffic Controllers (ATCo), where the former is the most widely used and the latter is a non-spoken method mandatory for oceanic messages and limited for some domestic issues. ASR systems on ATCo environments inherit increasing complexity due to accents from non-English speakers, cockpit noise, speaker-dependent biases and small in-domain ATC databases for training. Hereby, we introduce CleanSky EC-H2020 ATCO2, a project that aims to develop an ASR-based platform to collect, organize and automatically pre-process ATCo speech-data from air space. This paper conveys an exploratory benchmark of several state-of-the-art ASR models trained on more than 170 hours of ATCo speech-data. We demonstrate that the cross-accent flaws due to speakers' accents are minimized due to the amount of data, making the system feasible for ATC environments. The developed ASR system achieves an averaged word error rate (WER) of 7.75% across four databases. An additional 35% relative improvement in WER is achieved on one test set when training a TDNNF system with byte-pair encoding.

Published

2020

Pages

2297–2301

Journal

Proceedings of Interspeech, vol. 2020, no. 10, ISSN 1990-9772

Proceedings

Proceedings of Interspeech 2020

Conference

Interspeech, Shanghai, CN

Publisher

International Speech Communication Association

Place

Shanghai

DOI

10.21437/Interspeech.2020-2173

UT WoS

000833594102086

EID Scopus

2-s2.0-85098162088

BibTeX

@inproceedings{BUT168149,
  author="ZULUAGA-GOMEZ, J. and MOTLÍČEK, P. and ZHAN, Q. and VESELÝ, K. and BRAUN, R.",
  title="Automatic Speech Recognition Benchmark for Air-Traffic Communications",
  booktitle="Proceedings of Interspeech 2020",
  year="2020",
  journal="Proceedings of Interspeech",
  volume="2020",
  number="10",
  pages="2297--2301",
  publisher="International Speech Communication Association",
  address="Shanghai",
  doi="10.21437/Interspeech.2020-2173",
  issn="1990-9772",
  url="https://isca-speech.org/archive/Interspeech_2020/pdfs/2173.pdf"
}

Files

pdf zuluaga-gomez_Interspeech2020_2173.pdf 160 kB