Publication Details

Speaker activity driven neural speech extraction

DELCROIX, M.; ŽMOLÍKOVÁ, K.; OCHIAI, T.; KINOSHITA, K.; NAKATANI, T. Speaker activity driven neural speech extraction. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Toronto: IEEE Signal Processing Society, 2021. p. 6099-6103. ISBN: 978-1-7281-7605-5.
Czech title
Neurální extrakce řeči řízená aktivitou řečníka
Type
conference paper
Language
English
Authors
Delcroix Marc
Žmolíková Kateřina, Ing., Ph.D. (FIT)
Ochiai Tsuyoshi
Kinoshita Keisuke
Nakatani Tomohiro
URL
Keywords

Speech extraction, Speaker activity, Speech enhancement, Meeting recognition,
Neural network

Abstract

Target speech extraction, which extracts the speech of a target speaker in
a mixture given auxiliary speaker clues, has recently received increased
interest. Various clues have been investigated, such as pre-recorded enrollment
utterances, direction information, or video of the target speaker. In this paper,
we explore the use of speaker activity information as an auxiliary clue for
single-channel neural network-based speech extraction. We propose a speaker
activity driven speech extraction neural network (ADEnet) and show that it can
achieve performance levels competitive with enrollment-based approaches, without
the need for pre-recordings. We further demonstrate the potential of the proposed
approach for processing meeting-like recordings, where speaker activity obtained
from a diarization system is used as a speaker clue for ADEnet. We show that this
simple yet practical approach can successfully extract speakers after
diarization, which leads to improved ASR performance when using a single
microphone, especially in highly overlapping conditions, with a relative word
error rate reduction of up to 25%.
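
For illustration only, the minimal sketch below shows one way frame-level speaker activity could serve as a conditioning clue for a mask-based extraction network. The class name ActivityDrivenExtractor, the layer sizes, and the concatenation-based conditioning are assumptions made for this sketch; they do not reproduce the ADEnet architecture described in the paper.

import torch
import torch.nn as nn

# Minimal sketch (not the paper's ADEnet): the target speaker's frame-level
# activity (e.g. from a diarization system) is concatenated with the mixture
# features, and a BLSTM predicts a time-frequency mask for that speaker.
class ActivityDrivenExtractor(nn.Module):
    def __init__(self, n_freq=257, hidden=512):
        super().__init__()
        self.blstm = nn.LSTM(n_freq + 1, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.mask_head = nn.Sequential(nn.Linear(2 * hidden, n_freq), nn.Sigmoid())

    def forward(self, mix_mag, activity):
        # mix_mag:  (batch, frames, n_freq) mixture magnitude spectrogram
        # activity: (batch, frames) 0/1 activity of the target speaker
        x = torch.cat([mix_mag, activity.unsqueeze(-1)], dim=-1)
        h, _ = self.blstm(x)
        mask = self.mask_head(h)     # per-frame time-frequency mask
        return mask * mix_mag        # estimated target-speaker magnitudes

# Toy usage with stand-in tensors (400 frames, 257 frequency bins)
net = ActivityDrivenExtractor()
mix = torch.rand(1, 400, 257)
act = (torch.rand(1, 400) > 0.5).float()
print(net(mix, act).shape)   # torch.Size([1, 400, 257])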

Published
2021
Pages
6099–6103
Proceedings
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Conference
2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada
ISBN
978-1-7281-7605-5
Publisher
IEEE Signal Processing Society
Place
Toronto
DOI
10.1109/ICASSP39728.2021.9414998
UT WoS
000704288406074
EID Scopus
BibTeX
@inproceedings{BUT171749,
  author="DELCROIX, M. and ŽMOLÍKOVÁ, K. and OCHIAI, T. and KINOSHITA, K. and NAKATANI, T.",
  title="Speaker activity driven neural speech extraction",
  booktitle="ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  year="2021",
  pages="6099--6103",
  publisher="IEEE Signal Processing Society",
  address="Toronto",
  doi="10.1109/ICASSP39728.2021.9414998",
  isbn="978-1-7281-7605-5",
  url="https://www.fit.vut.cz/research/publication/12479/"
}
Files