Project Details

Robust SPEAKER DIarization systems using Bayesian inferenCE and deep learning methods

Project Period: 1. 3. 2017 – 28. 2. 2019

Project Type: grant

Agency: European Union

Program: Horizon 2020

Czech title
Robustní diarizace mluvčích pomocí Bayesovské inference a hlubokého učení
Type
grant
Keywords

Machine learning, statistical data processing and applications using signal
processing, Numerical analysis, simulation, optimisation, modelling tools, data
mining, Ontologies, neural networks, genetic programming, fuzzy logic, Cognitive
science, human computer interaction, natural language processing, Complexity and
cryptography, electronic security, privacy, biometrics, Speaker Diarization,
Speaker Recognition, Variational Bayes Inference, Deep Neural Networks, Speech
Data Mining

Abstract

The proposed project deals with Speaker Diarization (SD), which is commonly
defined as the task of answering the question "who spoke when?" in a speech
recording. The first objective of the proposal is to optimize the Bayesian
approach to SD, which has been shown to be promising for this task. Because
Variational Bayes (VB) inference is very sensitive to initialization, we will
develop new, fast ways of obtaining a good starting point. We will also explore
alternative inference methods, such as collapsed VB or collapsed Gibbs sampling,
and investigate alternative priors similar to those introduced for Bayesian
speaker recognition models. The second part of the proposal is motivated by the
huge performance gains that, in recent years, have been brought to other
recognition tasks by Deep Neural Networks (DNNs). In the context of SD, DNNs have
been used in the computation of i-vectors, but their potential has not been
explored for other stages of SD. We will study ways of integrating DNNs into the different
stages of SD systems. The objectives of the proposal will be achieved by
theoretical work, implementation, and careful testing on real speech data. The
outcomes of the project are intended not only for scientific publications, but
are also eagerly awaited by the European speech data mining industry (for example,
Czech Phonexia or Spanish Agnitio). The project is proposed by an excellent female
researcher, Dr. Mireia Diez, who completed her thesis in the GTTS group of the
University of the Basque Country, one of the most important European labs dealing
with speaker recognition and diarization. The proposed host is the Speech@FIT
group of Brno University of Technology, with a 20-year track record of top speech
data mining research. The proposed research training and the combined skills of Dr.
Diez and the host institution have the potential to advance the state of the art in
speaker diarization, provide the applicant with improved career opportunities, and
benefit European industry.
