Result detail

Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model

KOCOUR, M.; ŽMOLÍKOVÁ, K.; ONDEL YANG, L.; ŠVEC, J.; DELCROIX, M.; OCHIAI, T.; BURGET, L.; ČERNOCKÝ, J. Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022. no. 9, p. 4955-4959. ISSN: 1990-9772.

Type

paper in conference proceedings

Language

English

Authors

Kocour Martin, Ing., UPGM (FIT)
Žmolíková Kateřina, Ing., Ph.D.
ONDEL YANG, L.
Švec Ján, Ing., UPGM (FIT)
Delcroix Marc, FIT (FIT)
OCHIAI, T.
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Černocký Jan, prof. Dr. Ing., UPGM (FIT)

Abstract

In typical multi-talker speech recognition systems, a neural
network-based acoustic model predicts senone state posteriors
for each speaker. These are later used by a single-talker decoder
which is applied on each speaker-specific output stream separately.
In this work, we argue that such a scheme is sub-optimal
and propose a principled solution that decodes all speakers
jointly. We modify the acoustic model to predict joint state
posteriors for all speakers, enabling the network to express uncertainty
about the attribution of parts of the speech signal to
the speakers. We employ a joint decoder that can make use
of this uncertainty together with higher-level language information.
For this, we revisit decoding algorithms used in factorial
generative models in early multi-talker speech recognition systems.
In contrast with these early works, we replace the GMM
acoustic model with a DNN, which provides greater modeling
power and simplifies part of the inference. We demonstrate the
advantage of joint decoding in proof-of-concept experiments on
a mixed-TIDIGITS dataset.
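
The idea of decoding all speakers jointly can be illustrated with a toy Viterbi search over pairs of states, one state per speaker. This is only a minimal sketch under stated assumptions, not the decoder used in the paper: the names `joint_viterbi`, `joint_scores`, `log_trans` and `log_init` are hypothetical, both speakers are assumed to share one small transition model, and the joint scores are used directly as frame-level log scores (no prior normalization, no language model or decoding graph).

```python
import math
from itertools import product


def joint_viterbi(joint_scores, log_trans, log_init):
    """Toy two-speaker Viterbi search over pairs of states.

    joint_scores[t][i][j]: log score of speaker A being in state i and
        speaker B being in state j at frame t (assumed given, e.g. by a
        network that outputs joint posteriors).
    log_trans[i][k]: log transition probability from state i to state k,
        shared by both speakers (an assumption of this sketch).
    log_init[i]: log initial probability of state i.
    Returns (best_log_score, path_a, path_b).
    """
    n = len(log_init)
    pairs = list(product(range(n), repeat=2))

    # Initialization: both speakers start independently, scored jointly.
    delta = {
        (i, j): log_init[i] + log_init[j] + joint_scores[0][i][j]
        for i, j in pairs
    }
    backpointers = []

    for frame in joint_scores[1:]:
        new_delta, back = {}, {}
        for k, l in pairs:
            # The transition score factorizes over the two speakers.
            def arrive(prev, k=k, l=l):
                return delta[prev] + log_trans[prev[0]][k] + log_trans[prev[1]][l]

            best_prev = max(pairs, key=arrive)
            new_delta[(k, l)] = arrive(best_prev) + frame[k][l]
            back[(k, l)] = best_prev
        delta = new_delta
        backpointers.append(back)

    # Backtrace from the best final state pair.
    last = max(delta, key=delta.get)
    best_score = delta[last]
    path = [last]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    path.reverse()
    return best_score, [p[0] for p in path], [p[1] for p in path]


# Hypothetical toy input: 2 states per speaker, 3 frames.
half = math.log(0.5)
log_init = [half, half]
log_trans = [[half, half], [half, half]]
probs = [
    [[0.1, 0.7], [0.1, 0.1]],
    [[0.1, 0.1], [0.7, 0.1]],
    [[0.1, 0.1], [0.1, 0.7]],
]
joint_scores = [[[math.log(p) for p in row] for row in frame] for frame in probs]

score, path_a, path_b = joint_viterbi(joint_scores, log_trans, log_init)
print(path_a, path_b)
```

With the uniform transitions and initial probabilities used here, every path pays the same transition cost, so the decoded pair sequence simply follows the highest joint score in each frame. Non-uniform transitions would let the search trade frame-level evidence against the state dynamics of both speakers at once, which is the kind of interaction a separate per-speaker decoder cannot exploit.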

Keywords

Multi-talker speech recognition, Permutation invariant
training, Factorial Hidden Markov models

URL

https://www.isca-speech.org/archive/pdfs/interspeech_2022/kocour22_interspeech.pdf

Year

2022

Pages

4955–4959

Journal

Proceedings of Interspeech, no. 9, ISSN 1990-9772

Proceedings

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Conference

Interspeech Conference

Publisher

International Speech Communication Association

Place

Incheon

DOI

10.21437/Interspeech.2022-10406

UT WoS

000900724505027

EID Scopus

2-s2.0-85140088159

BibTeX

@inproceedings{BUT179827,
  author="KOCOUR, M. and ŽMOLÍKOVÁ, K. and ONDEL YANG, L. and ŠVEC, J. and DELCROIX, M. and OCHIAI, T. and BURGET, L. and ČERNOCKÝ, J.",
  title="Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2022",
  journal="Proceedings of Interspeech",
  number="9",
  pages="4955--4959",
  publisher="International Speech Communication Association",
  address="Incheon",
  doi="10.21437/Interspeech.2022-10406",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/pdfs/interspeech_2022/kocour22_interspeech.pdf"
}

Files

pdf kocour22_interspeech2022_revisiting.pdf 302 kB

Projects

Automatic collection and processing of voice data from air-traffic communications, EU, Horizon 2020, start: 2019-11-01, end: 2022-02-28, completed
HAAWAII - Highly Automated Air Traffic Controller Workstations with Artificial Intelligence Integration, EU, Horizon 2020, H2020-SESAR-2019-2, start: 2020-06-01, end: 2022-11-30, completed
Multi-linguality in speech technologies, MŠMT, INTER-EXCELLENCE - Subprogram INTER-ACTION, LTAIN19087, start: 2020-01-01, end: 2023-08-31, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (VZ SPEECH)

Department

Department of Computer Graphics and Multimedia (UPGM)