Publication Details
Multi-Channel Speech Separation with Cross-Attention and Beamforming
Mošner Ladislav
Plchot Oldřich, Ing., Ph.D. (DCGM)
Peng Junyi (DCGM)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)
multi-channel source separation, cross-channel attention, beamforming
Single-channel source separation originally attracted more research interest,
which resulted in immense progress. Multichannel (MC) separation comes with new
challenges posed by adverse indoor conditions, which makes it an important field
of study. We seek to combine promising ideas from the two worlds. First, we build
MC models by extending current single-channel time-domain separators, relying on
their strength. Our approach allows pre-trained models to be reused by inserting
a lightweight reference channel attention (RCA) combiner, which is the only
trained module. It comprises two blocks: the first attends to different parts of
the other channels with respect to the reference channel, and the second provides
an attention-based combination of channels. Second, like many successful MC
models, our system incorporates beamforming and allows the network and
beamformer outputs to be fused. We compare our approach with state-of-the-art
(SOTA) models on the SMS-WSJ dataset and show better or similar performance.
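The abstract describes the RCA combiner only at a high level. The following is a minimal NumPy sketch of the two-block idea, not the authors' implementation. It assumes per-channel encoder outputs of shape (C, T, D) from a frozen single-channel separator, scaled dot-product attention in both blocks, and random (untrained) projection matrices standing in for the trained weights. The function names `cross_attend`, `channel_combine` and `rca_combiner` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(ref, other, wq, wk, wv):
    """Block 1 (sketch): each reference frame attends to all frames of another channel.

    ref, other: (T, D) features; wq, wk, wv: (D, D) projections (hypothetical, random here).
    Returns (T, D) features of `other` re-aligned to the reference time axis.
    """
    q, k, v = ref @ wq, other @ wk, other @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (T, T) frame-to-frame similarities
    return softmax(scores, axis=-1) @ v       # weighted sum over the other channel's frames

def channel_combine(ref, aligned, wq, wk):
    """Block 2 (sketch): per-frame attention weights over channels, then a weighted sum.

    ref: (T, D); aligned: (C, T, D), including the reference channel itself.
    Returns combined (T, D) features and weights of shape (C, T) that sum to 1 over C.
    """
    q = ref @ wq                                       # (T, D)
    k = aligned @ wk                                   # (C, T, D)
    scores = np.einsum("td,ctd->ct", q, k) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=0)                  # normalise across channels per frame
    combined = np.einsum("ct,ctd->td", weights, aligned)
    return combined, weights

def rca_combiner(feats, ref_idx=0, seed=0):
    """feats: (C, T, D) per-channel encoder outputs -> single-channel-like (T, D) features."""
    rng = np.random.default_rng(seed)
    C, T, D = feats.shape
    # Assumption: random projections stand in for the trained RCA parameters.
    w = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(5)]
    ref = feats[ref_idx]
    aligned = np.stack([
        ref if c == ref_idx else cross_attend(ref, feats[c], w[0], w[1], w[2])
        for c in range(C)
    ])
    return channel_combine(ref, aligned, w[3], w[4])

# Usage: 4 channels, 50 frames, 16-dimensional features.
feats = np.random.default_rng(1).standard_normal((4, 50, 16))
out, weights = rca_combiner(feats)
print(out.shape, weights.shape)  # (50, 16) (4, 50)
```

Because the combiner maps (C, T, D) back to (T, D), its output has the same shape as a single-channel encoder output. This shape match is what would let a pre-trained single-channel separator consume it unchanged, with only the combiner being trained.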
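The abstract does not state which beamformer is used or how the network and beamformer outputs are fused. As an illustration only, the sketch below swaps in a common choice, a mask-based MVDR beamformer in the Souden formulation, together with a hypothetical fixed convex combination for fusion. The names `spatial_cov`, `mvdr_souden`, `fuse` and `alpha` are assumptions, and the toy data are synthetic.

```python
import numpy as np

def spatial_cov(stft, mask, eps=1e-8):
    """Mask-weighted spatial covariance per frequency.

    stft: (C, F, T) complex multichannel STFT; mask: (F, T) in [0, 1].
    Returns (F, C, C) Hermitian matrices.
    """
    num = np.einsum("ft,cft,dft->fcd", mask, stft, stft.conj())
    return num / (mask.sum(axis=1)[:, None, None] + eps)

def mvdr_souden(stft, speech_mask, ref=0, load=1e-6, eps=1e-8):
    """Mask-based MVDR: w_f = (Phi_n^-1 Phi_s / tr(Phi_n^-1 Phi_s)) u_ref, output w^H y."""
    C = stft.shape[0]
    phi_s = spatial_cov(stft, speech_mask)
    phi_n = spatial_cov(stft, 1.0 - speech_mask) + load * np.eye(C)  # diagonal loading
    ratio = np.linalg.solve(phi_n, phi_s)              # (F, C, C) = Phi_n^-1 Phi_s
    trace = np.trace(ratio, axis1=1, axis2=2)          # (F,)
    w = ratio[:, :, ref] / (trace[:, None] + eps)      # (F, C) beamformer weights
    return np.einsum("fc,cft->ft", w.conj(), stft)     # w^H y for every T-F bin

def fuse(net_est, bf_est, alpha=0.5):
    """Hypothetical fusion: a fixed convex combination of the two estimates."""
    return alpha * net_est + (1.0 - alpha) * bf_est

# Toy scene: speech-only bins in the first half of the frames, noise-only bins after.
rng = np.random.default_rng(0)
C, F, T = 4, 65, 100
h = rng.standard_normal((C, F)) + 1j * rng.standard_normal((C, F))  # toy steering vectors
s = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))  # toy source STFT
noise = rng.standard_normal((C, F, T)) + 1j * rng.standard_normal((C, F, T))
mask = np.zeros((F, T))
mask[:, : T // 2] = 1.0
stft = np.where(mask[None] > 0, h[:, :, None] * s[None], noise)
bf = mvdr_souden(stft, mask)
fused = fuse(mask * stft[0], bf)  # mask * stft[0] stands in for a network estimate
```

With a rank-1 speech covariance, this MVDR variant is distortionless with respect to the reference channel. On the speech-only frames, `bf` therefore reproduces the reference-channel speech `h[0] * s`. The fixed `alpha` is the simplest possible fusion rule; a trained system could instead learn the combination.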
@inproceedings{BUT185571,
author="Ladislav {Mošner} and Oldřich {Plchot} and Junyi {Peng} and Lukáš {Burget} and Jan {Černocký}",
title="Multi-Channel Speech Separation with Cross-Attention and Beamforming",
booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
year="2023",
journal="Proceedings of Interspeech",
volume="2023",
number="08",
pages="1693--1697",
publisher="International Speech Communication Association",
address="Dublin",
doi="10.21437/Interspeech.2023-2537",
issn="1990-9772",
url="https://www.isca-speech.org/archive/interspeech_2023/mosner23_interspeech.html"
}