Publication Details

Spelling-Aware Word-Based End-to-End ASR

EGOROVA, E.; VYDANA, H.; BURGET, L.; ČERNOCKÝ, J. Spelling-Aware Word-Based End-to-End ASR. IEEE SIGNAL PROCESSING LETTERS, 2022, vol. 29, no. 29, p. 1729-1733. ISSN: 1558-2361.
Czech title
End-to-End systém pro rozpoznávání řeči založený na slovech beroucí v úvahu jejich hláskování
Type
journal article
Language
English
Authors
Egorova Ekaterina, Ing., Ph.D.
Vydana Hari Krishna
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)
URL
https://ieeexplore.ieee.org/document/9833231
Keywords

end-to-end, ASR, OOV, Listen Attend and Spell architecture

Abstract

We propose a new end-to-end architecture for automatic speech recognition that extends the listen, attend and spell (LAS) paradigm. While the main word-predicting network is trained to predict words, the secondary speller network is optimized to predict word spellings from inner representations of the main network (e.g., word embeddings or context vectors from the attention module). We show that this joint training improves the word error rate of a word-based system and enables solving additional tasks, such as out-of-vocabulary word detection and recovery. The experiments are conducted on the LibriSpeech dataset, which consists of 1000 hours of read speech.
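
The joint training described in the abstract can be illustrated with a minimal sketch of a combined objective: a word-level loss from the main network plus a character-level loss from the speller. This is not the authors' implementation. The function names, the length-averaged spelling loss, and the interpolation weight `spell_weight` are all assumptions made for illustration; the abstract does not specify how the two losses are combined.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under a probability vector."""
    return -math.log(probs[target])


def joint_loss(word_probs, word_target, char_probs_seq, char_targets,
               spell_weight=0.5):
    """Word loss plus a weighted, length-averaged spelling loss.

    word_probs:     predicted distribution over the word vocabulary
    word_target:    index of the reference word
    char_probs_seq: one predicted character distribution per spelling step
    char_targets:   reference character index per spelling step
    spell_weight:   hypothetical interpolation weight (not taken from the paper)
    """
    word_loss = cross_entropy(word_probs, word_target)
    # Average over characters so long words do not dominate the objective.
    spell_loss = sum(
        cross_entropy(p, t) for p, t in zip(char_probs_seq, char_targets)
    ) / len(char_targets)
    return word_loss + spell_weight * spell_loss


# Toy example: a 2-word vocabulary and a 2-character spelling.
loss = joint_loss(
    word_probs=[0.5, 0.5],
    word_target=0,
    char_probs_seq=[[1.0, 0.0], [0.25, 0.75]],
    char_targets=[0, 1],
)
```

In a real system both distributions would come from neural networks sharing the main network's inner representations, and gradients of the combined loss would update both networks together.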

Published
2022
Pages
1729–1733
Journal
IEEE SIGNAL PROCESSING LETTERS, vol. 29, no. 29, ISSN 1558-2361
DOI
10.1109/LSP.2022.3192199
UT WoS
000842088200001
BibTeX
@article{BUT178877,
  author="Ekaterina {Egorova} and Hari Krishna {Vydana} and Lukáš {Burget} and Jan {Černocký}",
  title="Spelling-Aware Word-Based End-to-End ASR",
  journal="IEEE SIGNAL PROCESSING LETTERS",
  year="2022",
  volume="29",
  number="29",
  pages="1729--1733",
  doi="10.1109/LSP.2022.3192199",
  issn="1558-2361",
  url="https://ieeexplore.ieee.org/document/9833231"
}