Publication details

Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models

KESIRAJU, S.; SARVAŠ, M.; PAVLÍČEK, T.; MACAIRE, C.; CIUBA, A. Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. no. 08, p. 2148-2152. ISSN: 1990-9772.
Type
conference proceedings paper
Language
English
Authors
Kesiraju Santosh, Ph.D., UPGM (FIT)
Sarvaš Marek, Ing., FIT (FIT), UPGM (FIT)
Pavlíček Tomáš, Ing.
MACAIRE, C.
CIUBA, A.
Abstract

This paper presents techniques and findings for improving the performance of low-resource speech to text translation (ST). We conducted experiments on both simulated and real low-resource setups, on the language pairs English - Portuguese and Tamasheq - French, respectively. Using the encoder-decoder framework for ST, our results show that a multilingual automatic speech recognition system acts as a good initialization under low-resource scenarios. Furthermore, using CTC as an additional objective for translation during training and decoding helps to reorder the internal representations and improves the final translation. Through our experiments, we try to identify the factors (initializations, objectives, and hyperparameters) that contribute the most to improvements in low-resource setups. With only 300 hours of pre-training data, our model achieved a BLEU score of 7.3 on Tamasheq - French data, outperforming prior published works from IWSLT 2022 by 1.6 points.
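
The abstract mentions CTC as an additional objective during both training and decoding. A common way to combine a CTC term with an attention (cross-entropy) term is a weighted interpolation; the minimal sketch below illustrates that idea only. The function names and the default weight of 0.3 are assumptions made for illustration and are not taken from the paper.

```python
def joint_objective(ctc_loss: float, attention_loss: float,
                    ctc_weight: float = 0.3) -> float:
    """Interpolate a CTC loss with an attention (cross-entropy) loss.

    ctc_weight is a hypothetical hyperparameter in [0, 1]; 0.3 is an
    illustrative default, not a value reported in the paper.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


def joint_decoding_score(ctc_log_prob: float, attention_log_prob: float,
                         ctc_weight: float = 0.3) -> float:
    """Apply the same interpolation to hypothesis log-probabilities,
    as one might do when rescoring candidates during beam search."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_log_prob + (1.0 - ctc_weight) * attention_log_prob
```

With `ctc_weight = 0` both functions reduce to the attention-only model, which makes the weight a convenient hyperparameter to sweep when looking for the factors that matter in a low-resource setup.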

Keywords

speech translation, low-resource, multilingual, speech recognition

URL
https://www.isca-speech.org/archive/pdfs/interspeech_2023/kesiraju23_interspeech.pdf
Year
2023
Pages
2148–2152
Journal
Proceedings of Interspeech, vol. 2023, no. 08, ISSN 1990-9772
Proceedings
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Conference
Interspeech Conference
Publisher
International Speech Communication Association
Place
Dublin
DOI
10.21437/Interspeech.2023-2506
BibTeX
@inproceedings{BUT185572,
  author="KESIRAJU, S. and SARVAŠ, M. and PAVLÍČEK, T. and MACAIRE, C. and CIUBA, A.",
  title="Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2023",
  journal="Proceedings of Interspeech",
  volume="2023",
  number="08",
  pages="2148--2152",
  publisher="International Speech Communication Association",
  address="Dublin",
  doi="10.21437/Interspeech.2023-2506",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/pdfs/interspeech_2023/kesiraju23_interspeech.pdf"
}
Projects
Neural Representations in Multimodal and Multilingual Modeling, GAČR, EXPRO Grant Projects of Excellence in Basic Research - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed
Practical verification of the possibility of integrating artificial intelligence for receiving emergency calls using a voice chatbot, developed within the research project BV No. VI20192022169, with the technology for receiving emergency communication 112 and 150 in the Czech Republic (TCTV 112), MV, 1 VS OPSEC, VK01020132, start: 2023-01-06, end: 2025-10-31, completed
Exchanges for Speech Research and Technologies, EU, Horizon 2020, start: 2021-01-01, end: 2025-12-31, completed