Publication Details

MGB-3 BUT system: Low-resource ASR on Egyptian YouTube data

VESELÝ, K.; BASKAR, M.; DIEZ SÁNCHEZ, M.; BENEŠ, K. MGB-3 BUT system: Low-resource ASR on Egyptian YouTube data. In Proceedings of ASRU 2017. Okinawa: IEEE Signal Processing Society, 2017. pp. 368-373. ISBN: 978-1-5090-4788-8.

Czech title

MGB-3 BUT Systém: egyptské rozpoznávání řeči s omezenými zdroji (English: MGB-3 BUT System: Egyptian Speech Recognition with Limited Resources)

Type

conference paper

Language

English

Authors

Veselý Karel, Ing., Ph.D. (DCGM)
Baskar Murali Karthick, Ing., Ph.D.
Diez Sánchez Mireia, M.Sc., Ph.D. (DCGM)
Beneš Karel, Ing., Ph.D. (DCGM)

URL

http://www.fit.vutbr.cz/research/groups/speech/publi/2017/vesely_asru2017_mgb3-paper.pdf

Keywords

MGB-3, ASR adaptation, low-resource ASR, Egyptian Arabic, diarization

Abstract

This paper presents a series of experiments we performed during our work on the MGB-3 evaluations. We describe both the submitted system and the post-evaluation analysis. Our initial BLSTM-HMM system was trained on 250 hours of MGB-2 data (Al-Jazeera) and then adapted with 5 hours of Egyptian data (YouTube). We included techniques such as diarization, n-gram language model adaptation, speed perturbation of the adaptation data, and the use of all 4 correct references. The 4 references were used either as supervision through a confusion network, or by including each sentence 4 times, once with the transcript from each annotator. It was also helpful to blend the augmented MGB-3 adaptation data with 15 hours of MGB-2 data. Although our single system did not rank among the best teams in the evaluations, we believe that our analysis will be of interest not only to the other MGB-3 challenge participants but also to a wider audience.
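
The abstract names the data-preparation steps without giving details. The following is a minimal Python sketch, under our own assumptions, of how such an adaptation list could be assembled: each utterance repeated once per annotator transcript (the "4x" variant), speed perturbation of those copies, and a random MGB-2 subset capped at a target number of hours appended to the result. The speed factors 0.9/1.0/1.1 and the "sp<factor>-" ID prefix follow a common Kaldi recipe convention and are assumptions, not values stated in the abstract. All names (Utterance, expand_references, speed_perturb, blend) are hypothetical. The code only builds the training list and does not process audio. It is not the authors' implementation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """Hypothetical training-list entry: ID, audio path, transcript, speed factor."""
    utt_id: str
    audio: str
    text: str
    speed: float = 1.0


def expand_references(utt_id, audio, references):
    """Return one copy of the utterance per annotator transcript (the "4x" variant)."""
    return [
        Utterance(f"{utt_id}-ref{i}", audio, ref)
        for i, ref in enumerate(references, start=1)
    ]


def speed_perturb(utts, factors=(0.9, 1.0, 1.1)):
    """Emit one labelled copy per speed factor; resampling audio is left to the toolkit.

    The factors are an assumption (a common Kaldi choice), not taken from the paper.
    """
    out = []
    for u in utts:
        for f in factors:
            uid = u.utt_id if f == 1.0 else f"sp{f}-{u.utt_id}"
            out.append(Utterance(uid, u.audio, u.text, f))
    return out


def blend(adaptation, background, target_hours, seed=0):
    """Append a random subset of (utterance, duration_s) pairs totalling at most target_hours."""
    rng = random.Random(seed)
    pool = list(background)
    rng.shuffle(pool)
    budget_s = target_hours * 3600.0
    chosen, used_s = [], 0.0
    for utt, dur_s in pool:
        # Greedily keep background utterances while they still fit in the budget.
        if used_s + dur_s <= budget_s:
            chosen.append(utt)
            used_s += dur_s
    return list(adaptation) + chosen


if __name__ == "__main__":
    refs = ["transcript A", "transcript B", "transcript C", "transcript D"]
    adapt = speed_perturb(expand_references("eg001", "eg001.wav", refs))
    print(len(adapt))  # 12: 4 references x 3 speed factors
    background = [(Utterance(f"mgb2-{i}", f"mgb2-{i}.wav", "..."), 1800.0) for i in range(10)]
    train = blend(adapt, background, target_hours=2)
    print(len(train))  # 16: 12 adaptation copies + 4 half-hour MGB-2 utterances
```

The toy usage caps the background at 2 hours only to keep the numbers small; the abstract reports blending with 15 hours of MGB-2 data. Sampling the background by duration rather than by utterance count keeps the blend ratio stable when utterance lengths vary.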

Published

2017

Pages

368–373

Proceedings

Proceedings of ASRU 2017

Conference

2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Okinawa, JP

ISBN

978-1-5090-4788-8

Publisher

IEEE Signal Processing Society

Place

Okinawa

DOI

10.1109/ASRU.2017.8268959

UT WoS

000426066100051

EID Scopus

2-s2.0-85050549506

BibTeX

@inproceedings{BUT144502,
  author="Karel {Veselý} and Murali Karthick {Baskar} and Mireia {Diez Sánchez} and Karel {Beneš}",
  title="MGB-3 but system: Low-resource ASR on Egyptian YouTube data",
  booktitle="Proceedings of ASRU 2017",
  year="2017",
  pages="368--373",
  publisher="IEEE Signal Processing Society",
  address="Okinawa",
  doi="10.1109/ASRU.2017.8268959",
  isbn="978-1-5090-4788-8",
  url="https://www.fit.vut.cz/research/publication/11595/"
}

Files

vesely_asru2017_mgb3-paper.pdf (PDF, 155 kB)