Publication Details
Implementing contextual biasing in GPU decoder for online ASR
NIGMATULINA, I.
Madikeri Srikanth
VILLATORO-TELLO, E.
Motlíček Petr, doc. Ing., Ph.D. (DCGM)
ZULUAGA-GOMEZ, J.
PANDIA, K.
GANAPATHIRAJU, A.
real-time speech recognition, contextual adaptation, GPU decoding, finite-state
transducers
GPU decoding significantly accelerates the output of ASR predictions. While GPUs
are already being used for online ASR decoding, post-processing and rescoring on
GPUs have not been properly investigated yet. Rescoring with available contextual
information can considerably improve ASR predictions. Previous studies have
proven the viability of lattice rescoring in decoding and biasing language model
(LM) weights in offline and online CPU scenarios. In real-time GPU decoding,
partial recognition hypotheses are produced without lattice generation, which
makes the implementation of biasing more complex. The paper proposes and
describes an approach to integrate contextual biasing in real-time GPU decoding
while exploiting the standard Kaldi GPU decoder. Besides biasing partial
ASR predictions, our approach also permits dynamic context switching, allowing
flexible rescoring for each speech segment directly on the GPU. The code is
publicly released and tested with open-source test sets.
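The general idea of biasing partial hypotheses with a per-segment context can be illustrated with a small CPU-side sketch. This is not the paper's implementation, which works inside the Kaldi GPU decoder; it is a toy n-best rescoring under assumed conventions: hypotheses are (text, log-score) pairs, a context is a mapping from phrase to additive boost, and the names `count_phrase` and `bias_hypotheses` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: (hypothesis text, log-score); a higher score is better.
Hypothesis = Tuple[str, float]


def count_phrase(words: List[str], phrase: List[str]) -> int:
    """Count occurrences of `phrase` as a contiguous word sequence in `words`."""
    n = len(phrase)
    if n == 0:
        return 0
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)


def bias_hypotheses(hyps: List[Hypothesis],
                    context: Dict[str, float]) -> List[Hypothesis]:
    """Add a boost for every context phrase found, then re-rank best first."""
    rescored = []
    for text, score in hyps:
        words = text.split()
        bonus = sum(boost * count_phrase(words, phrase.split())
                    for phrase, boost in context.items())
        rescored.append((text, score + bonus))
    # sorted() is stable, so ties keep their original order.
    return sorted(rescored, key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical partial hypotheses for one speech segment.
    partial = [("loft hands a three two one", -4.0),
               ("lufthansa three two one", -5.0)]
    # Hypothetical per-segment contexts, switched between segments.
    segment_contexts = {
        "segment_1": {"lufthansa": 2.0},
        "segment_2": {"speed bird": 2.0},
    }
    for name, ctx in segment_contexts.items():
        print(name, bias_hypotheses(partial, ctx)[0])
```

Passing a different context dictionary for each segment stands in for the dynamic context switching mentioned in the abstract. The boost values here are arbitrary and chosen only to make the re-ranking visible.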
@inproceedings{BUT187754,
author="NIGMATULINA, I. and MADIKERI, S. and VILLATORO-TELLO, E. and MOTLÍČEK, P. and ZULUAGA-GOMEZ, J. and PANDIA, K. and GANAPATHIRAJU, A.",
title="Implementing contextual biasing in GPU decoder for online ASR",
booktitle="Proceedings of the Annual Conference of International Speech Communication Association, INTERSPEECH",
year="2023",
journal="Proceedings of Interspeech",
volume="2023",
number="8",
pages="4494--4498",
publisher="International Speech Communication Association",
address="Dublin",
doi="10.21437/Interspeech.2023-2449",
issn="1990-9772",
url="https://www.isca-archive.org/interspeech_2023/nigmatulina23_interspeech.html"
}