Publication Details

Compact Network for SpeakerBeam Target Speaker Extraction

DELCROIX, M.; ŽMOLÍKOVÁ, K.; OCHIAI, T.; KINOSHITA, K.; ARAKI, S.; NAKATANI, T. Compact Network for SpeakerBeam Target Speaker Extraction. In Proceedings of ICASSP. Brighton: IEEE Signal Processing Society, 2019. p. 6965-6969. ISBN: 978-1-5386-4658-8.

Czech title

Kompaktní síť pro SpeakerBeam extrakci mluvčího

Type

conference paper

Language

English

Authors

Delcroix Marc
Žmolíková Kateřina, Ing., Ph.D. (FIT)
OCHIAI, T.
Kinoshita Keisuke
ARAKI, S.
Nakatani Tomohiro

URL

https://ieeexplore.ieee.org/document/8683087

Keywords

Target speech extraction, Neural network, Adaptation, Auxiliary feature, Speech
enhancement

Abstract

Speech separation that separates a mixture of speech signals into each of its
sources has been an active research topic for a long time and has seen recent
progress with the advent of deep learning. A related problem is target speaker
extraction, i.e. extraction of only speech of a target speaker out of a mixture,
given characteristics of his/her voice. We have recently proposed SpeakerBeam,
which is a neural network-based target speaker extraction method. SpeakerBeam
uses a speech extraction network that is adapted to the target speaker using
auxiliary features derived from an adaptation utterance of that speaker.
Initially, we implemented SpeakerBeam with a factorized adaptation layer, which
consists of several parallel linear transformations weighted by weights derived
from the auxiliary features. The factorized layer is effective for target speech
extraction, but it requires a large number of parameters. In this paper, we
propose to simply scale the activations of a hidden layer of the speech
extraction network with weights derived from the auxiliary features. This simpler
approach greatly reduces the number of model parameters by up to 60%, making it
much more practical, while maintaining a similar level of performance. We tested
our approach on simulated and real noisy and reverberant mixtures, showing the
potential of SpeakerBeam for real-life applications. Moreover, we showed that
speech extraction performance of SpeakerBeam compares favorably with that of
a state-of-the-art speech separation method with a similar network
configuration.
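
The two adaptation schemes described in the abstract can be contrasted in a minimal sketch. This is an illustration only, not the authors' implementation: the function names, the toy dimensions, and the hand-picked numbers are all assumptions, and the auxiliary network that would produce the weights `alpha` from the adaptation utterance is replaced by fixed values.

```python
def linear(weight, bias, h):
    """Apply one linear transformation: weight @ h + bias."""
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weight, bias)]


def factorized_layer(h, weights, biases, alpha):
    """Factorized adaptation: K parallel linear transformations of h,
    summed with one speaker-dependent weight alpha[k] per branch."""
    branches = [linear(w, b, h) for w, b in zip(weights, biases)]
    d_out = len(branches[0])
    return [sum(a * br[j] for a, br in zip(alpha, branches)) for j in range(d_out)]


def scaled_layer(h, alpha):
    """Scaling adaptation: multiply each hidden activation by one
    speaker-dependent weight (element-wise product)."""
    return [a * x for a, x in zip(alpha, h)]


def factorized_param_count(k, d_in, d_out):
    """Trainable parameters in the K parallel branches (weights plus biases)."""
    return k * (d_in * d_out + d_out)


# Toy values (hypothetical): K = 2 branches, 2 hidden units.
h = [1.0, 2.0]
weights = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
biases = [[0.0, 0.0], [1.0, 1.0]]
alpha_branches = [0.5, 2.0]  # stands in for auxiliary-network output, one per branch
alpha_units = [0.5, 2.0]     # stands in for auxiliary-network output, one per unit

out_factorized = factorized_layer(h, weights, biases, alpha_branches)  # [6.5, 5.0]
out_scaled = scaled_layer(h, alpha_units)                              # [0.5, 4.0]
```

In this sketch the factorized layer carries `factorized_param_count(K, d_in, d_out)` parameters of its own, whereas the scaling operation adds none to the adapted layer. The savings in a full model depend on the complete architecture, which the sketch does not attempt to reproduce.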

Published

2019

Pages

6965–6969

Proceedings

Proceedings of ICASSP

ISBN

978-1-5386-4658-8

Publisher

IEEE Signal Processing Society

Place

Brighton

DOI

10.1109/ICASSP.2019.8683087

UT WoS

000482554007040

EID Scopus

2-s2.0-85069006044

BibTeX

@inproceedings{BUT160003,
  author="DELCROIX, M. and ŽMOLÍKOVÁ, K. and OCHIAI, T. and KINOSHITA, K. and ARAKI, S. and NAKATANI, T.",
  title="Compact Network for SpeakerBeam Target Speaker Extraction",
  booktitle="Proceedings of ICASSP",
  year="2019",
  pages="6965--6969",
  publisher="IEEE Signal Processing Society",
  address="Brighton",
  doi="10.1109/ICASSP.2019.8683087",
  isbn="978-1-5386-4658-8",
  url="https://ieeexplore.ieee.org/document/8683087"
}

Files

pdf delcroix_icassp2019_0006965.pdf 944 kB