Publication Details
Jointly Trained Transformers Models for Spoken Language Translation
Karafiát Martin, Ing., Ph.D. (DCGM)
Žmolíková Kateřina, Ing., Ph.D. (FIT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)
Spoken Language Translation, Transformers, Joint training, How2 dataset,
Auxiliary loss, ASR objective, Coupled decoding, End-to-End differentiable
pipeline.
End-to-End and cascade (ASR-MT) spoken language translation (SLT) systems are
reaching comparable performance; however, a large degradation is observed when
translating the ASR hypothesis compared with using oracle input text. In this
work, we reduce this degradation by creating an End-to-End differentiable
pipeline between the ASR and MT systems. We train SLT systems with the ASR
objective as an auxiliary loss, and the two networks are connected through
their neural hidden representations. This training has an
End-to-End differentiable path with respect to the final objective function and
utilizes the ASR objective for better optimization. This architecture has
improved the BLEU score from 41.21 to 44.69. Ensembling the proposed architecture
with independently trained ASR and MT systems further improved the BLEU score
from 44.69 to 46.9. All experiments are reported on the English-Portuguese speech
translation task using the How2 corpus. The final BLEU score is on par with the
best speech translation system on the How2 dataset, without using any additional
training data or language model, and with fewer parameters.
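The idea of training with the ASR objective as an auxiliary loss can be
illustrated with a minimal sketch. It is not the authors' implementation: the
function names, the weight asr_weight, and the toy probabilities are
assumptions made only for illustration, and the abstract does not say how the
two losses are weighted. The sketch only shows a translation loss and an ASR
loss being combined into one scalar objective, which is what lets a single
training signal reach both networks.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target_index])


def sequence_loss(step_probs, targets):
    """Mean cross-entropy over the steps of one output sequence."""
    losses = [cross_entropy(p, t) for p, t in zip(step_probs, targets)]
    return sum(losses) / len(losses)


def joint_loss(mt_loss, asr_loss, asr_weight=0.3):
    """Translation loss plus a weighted ASR auxiliary loss.

    asr_weight is a hypothetical hyperparameter; its value is not
    given in the abstract.
    """
    return mt_loss + asr_weight * asr_loss


# Toy per-step output distributions (made-up numbers, illustration only).
asr_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]   # ASR decoder outputs
asr_targets = [0, 1]                              # reference transcript tokens
mt_probs = [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]] # MT decoder outputs
mt_targets = [0, 2]                               # reference translation tokens

asr = sequence_loss(asr_probs, asr_targets)
mt = sequence_loss(mt_probs, mt_targets)
total = joint_loss(mt, asr)
print(f"ASR loss: {asr:.4f}  MT loss: {mt:.4f}  joint loss: {total:.4f}")
```

In a real system both losses would be computed by neural networks sharing
hidden representations, and the gradient of the joint loss would flow through
both; setting asr_weight to 0 in this sketch recovers training on the
translation objective alone.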
@inproceedings{BUT175791,
author="Hari Krishna {Vydana} and Martin {Karafiát} and Kateřina {Žmolíková} and Lukáš {Burget} and Jan {Černocký}",
title="Jointly Trained Transformers Models for Spoken Language Translation",
booktitle="ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)",
year="2021",
pages="7513--7517",
publisher="IEEE Signal Processing Society",
address="Toronto, Ontario",
doi="10.1109/ICASSP39728.2021.9414159",
isbn="978-1-7281-7605-5",
url="https://www.fit.vut.cz/research/publication/12522/"
}