Publication Details

Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling

CHO, J.; BASKAR, M.; LI, R.; WIESNER, M.; MALLIDI, S.; YALTA, N.; KARAFIÁT, M.; WATANABE, S.; HORI, T. Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling. In Proceedings of 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018). Athens: IEEE Signal Processing Society, 2018. p. 521-527. ISBN: 978-1-5386-4334-1.
Czech title
Multilingvální sequence-to-sequence rozpoznávání řeči: architektura, přenosové učení a jazykové modelování
Type
conference paper
Language
English
Authors
CHO, J.
Baskar Murali Karthick, Ing., Ph.D.
Li Ruizhi
Wiesner Matthew, PhD.
Mallidi Sri Harish
YALTA, N.
Karafiát Martin, Ing., Ph.D. (DCGM)
Watanabe Shinji
HORI, T.
URL
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8639655
Keywords

Automatic speech recognition (ASR), sequence to sequence, multilingual setup,
transfer learning, language modeling

Abstract

The sequence-to-sequence (seq2seq) approach to low-resource ASR is a relatively new
direction in speech research. Its advantage is that models can be trained
without a lexicon or alignments. However, this comes at the cost of
requiring more data than conventional DNN-HMM systems. In this work, we
use data from 10 BABEL languages to build a multilingual seq2seq model
as a prior model, and then port it to 4 other BABEL languages using a
transfer learning approach. We also explore different architectures for improving
the prior multilingual seq2seq model. The paper also discusses the effect of
integrating a recurrent neural network language model (RNNLM) with a seq2seq
model during decoding. Experimental results show that transfer learning
from the multilingual model yields substantial gains over monolingual
models across all 4 BABEL languages. Incorporating an RNNLM also brings
significant improvements in terms of %WER, and achieves recognition performance
comparable to that of models trained with twice as much training data.

Published
2018
Pages
521–527
Proceedings
Proceedings of 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018)
Conference
2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), Athens, GR
ISBN
978-1-5386-4334-1
Publisher
IEEE Signal Processing Society
Place
Athens
DOI
10.1109/SLT.2018.8639655
UT WoS
000463141800073
EID Scopus
BibTeX
@inproceedings{BUT163489,
  author="CHO, J. and BASKAR, M. and LI, R. and WIESNER, M. and MALLIDI, S. and YALTA, N. and KARAFIÁT, M. and WATANABE, S. and HORI, T.",
  title="Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling",
  booktitle="Proceedings of 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018)",
  year="2018",
  pages="521--527",
  publisher="IEEE Signal Processing Society",
  address="Athens",
  doi="10.1109/SLT.2018.8639655",
  isbn="978-1-5386-4334-1",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8639655"
}