Publication Details

Lessons Learned in Transcribing 5000 h of Air Traffic Control Communications for Robust Automatic Speech Understanding

ZULUAGA-GOMEZ, J.; NIGMATULINA, I.; PRASAD, A.; MOTLÍČEK, P.; KHALIL, D.; MADIKERI, S.; TART, A.; SZŐKE, I.; LENDERS, V.; RIGAULT, M.; CHOUKRI, K. Lessons Learned in Transcribing 5000 h of Air Traffic Control Communications for Robust Automatic Speech Understanding. Aerospace, 2023, vol. 2023, no. 10, p. 1-33. ISSN: 2226-4310.

Czech title

Poznatky získané při přepisu 5000 hodin komunikace řízení letového provozu pro robustní automatické porozumění řeči

Type

journal article

Language

English

Authors

ZULUAGA-GOMEZ, J.
NIGMATULINA, I.
Prasad Amrutha (DCGM)
Motlíček Petr, doc. Ing., Ph.D. (DCGM)
KHALIL, D.
Madikeri Srikanth
TART, A.
Szőke Igor, Ing., Ph.D. (DCGM)
LENDERS, V.
RIGAULT, M.
CHOUKRI, K.

URL

https://www.mdpi.com/2226-4310/10/10/898

Keywords

air traffic control communications; automatic speech recognition and understanding; OpenSky Network; callsign recognition; ADS-B data

Abstract

Voice communication between air traffic controllers (ATCos) and pilots is critical for
ensuring safe and efficient air traffic control (ATC). The handling of these voice communications
requires high levels of awareness from ATCos and can be tedious and error-prone. Recent attempts
aim at integrating artificial intelligence (AI) into ATC communications in order to lessen ATCos'
workload. However, the development of data-driven AI systems for understanding spoken ATC
communications demands large-scale annotated datasets, which are currently lacking in the field.
This paper explores the lessons learned from the ATCO2 project, which aimed to develop a unique
platform to collect, preprocess, and transcribe large amounts of ATC audio data from airspace in
real time. This paper reviews (i) robust automatic speech recognition (ASR), (ii) natural language
processing, (iii) English language identification, and (iv) contextual ASR biasing with surveillance
data. The pipeline developed during the ATCO2 project, along with the open-sourcing of its data,
encourages research in the ATC field, while the full corpus can be purchased through ELDA. The ATCO2
corpora are suitable for developing ASR systems when little to no transcribed ATC audio
data are available. For instance, the proposed ASR system trained with ATCO2 reaches as low as
17.9% WER on public ATC datasets, which is 6.6% absolute WER better than training with "out-of-domain"
but gold transcriptions. Finally, the release of 5000 h of ASR-transcribed speech, covering more
than 10 airports worldwide, is a step forward towards more robust automatic speech understanding
systems for ATC communications.

Published

2023

Pages

1–33

Journal

Aerospace, vol. 2023, no. 10, ISSN 2226-4310

DOI

10.3390/aerospace10100898

UT WoS

001093774900001

EID Scopus

2-s2.0-85175267376

BibTeX

@article{BUT185576,
  author="ZULUAGA-GOMEZ, J. and NIGMATULINA, I. and PRASAD, A. and MOTLÍČEK, P. and KHALIL, D. and MADIKERI, S. and TART, A. and SZŐKE, I. and LENDERS, V. and RIGAULT, M. and CHOUKRI, K.",
  title="Lessons Learned in Transcribing 5000 h of Air Traffic Control Communications for Robust Automatic Speech Understanding",
  journal="Aerospace",
  year="2023",
  volume="2023",
  number="10",
  pages="1--33",
  doi="10.3390/aerospace10100898",
  issn="2226-4310",
  url="https://www.mdpi.com/2226-4310/10/10/898"
}

Files

pdf zuluaga-gomez_aerospace2023-10-00898-v2.pdf 2 MB