Publication Details

Probing Self-Supervised Learning Models With Target Speech Extraction

PENG, J.; DELCROIX, M.; OCHIAI, T.; ASHIHARA, T.; PLCHOT, O.; ARAKI, S.; ČERNOCKÝ, J. Probing Self-Supervised Learning Models With Target Speech Extraction. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 535-539. ISBN: 979-8-3503-7451-3.

Czech title

Testování modelů získaných samoučením na úloze extrakce řeči cílového mluvčího

Type

conference paper

Language

English

Authors

Peng Junyi (DCGM)
Delcroix Marc
OCHIAI, T.
ASHIHARA, T.
Plchot Oldřich, Ing., Ph.D. (DCGM)
ARAKI, S.
Černocký Jan, prof. Dr. Ing. (DCGM)

URL

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10627502

Keywords

Target speech extraction, self-supervised learning, SUPERB

Abstract

Large-scale pre-trained self-supervised learning (SSL) models have shown
remarkable advancements in speech-related tasks. However, the utilization of
these models in complex multi-talker scenarios, such as extracting a target
speaker in a mixture, is yet to be fully evaluated. In this paper, we introduce
target speech extraction (TSE) as a novel downstream task to evaluate the feature
extraction capabilities of pre-trained SSL models. TSE uniquely requires both
speaker identification and speech separation, distinguishing it from other tasks
in the Speech processing Universal PERformance Benchmark (SUPERB) evaluation.
Specifically, we propose a TSE downstream model composed of two lightweight
task-oriented modules based on the same frozen SSL model. One module functions as
a speaker encoder to obtain target speaker information from an enrollment speech,
while the other estimates the target speaker's mask to extract that speaker's speech from
the mixture. Experimental results on the Libri2Mix datasets reveal the relevance
of the TSE downstream task to probe SSL models, as its performance cannot be
simply deduced from other related tasks such as speaker verification and
separation.

Published

2024

Pages

535–539

Proceedings

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Conference

2024 IEEE International Conference on Acoustics, Speech and Signal Processing, Seoul, KR

ISBN

979-8-3503-7451-3

Publisher

IEEE Signal Processing Society

Place

Seoul

DOI

10.1109/ICASSPW62465.2024.10627502

EID Scopus

2-s2.0-85202435980

BibTeX

@inproceedings{BUT189780,
  author="PENG, J. and DELCROIX, M. and OCHIAI, T. and ASHIHARA, T. and PLCHOT, O. and ARAKI, S. and ČERNOCKÝ, J.",
  title="Probing Self-Supervised Learning Models With Target Speech Extraction",
  booktitle="ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  year="2024",
  pages="535--539",
  publisher="IEEE Signal Processing Society",
  address="Seoul",
  doi="10.1109/ICASSPW62465.2024.10627502",
  isbn="979-8-3503-7451-3",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10627502"
}

Files

peng_icassp2024_Probing_Self-Supervised_Learning.pdf (PDF, 1 MB)