Project Details

DARPA Low Resource Languages for Emergent Incidents (LORELEI) - Exploiting Language Information for Situational Awareness (ELISA)

Project Period: 1 September 2015 – 31 March 2020

Project Type: contract

Partner: University of Southern California

Czech title
DARPA Jazyky s omezenými zdroji pro potenciální krizové situace (LORELEI) - Využití jazykové informace pro situační povědomí (ELISA)
Keywords

Speech processing, language, speech mining

Abstract

Speech processing in our proposal will be addressed by low-resource or
language-agnostic technologies. Rather than concentrating on mining the content
(for which standard resources such as an acoustic model, language model, or
pronunciation dictionary will obviously be lacking), speech data will be handled
by a multitude of "speech miners" that make minimal use of resources of the
target language. The processing will begin with reliable voice activity
detection (VAD) capable of segmenting the signal into useful and useless
portions. Although often regarded as "not rocket science", a good VAD is
crucial for the correct functioning of the subsequent blocks and for human
processing of the speech input. Our work will improve on an existing DNN-based
VAD that proved its efficiency in the difficult RATS setting [Ng2012].
Processing with several phone posterior estimators using either mono-lingual or
multilingual phoneme sets [Schwarz2009] will follow, providing the "miners"
with a coherent low-dimensional representation. The first real "miner" will be
language identification (LID) with a significant set of target languages (>60).
Even if the target language is not guaranteed to be in this set, LID will allow
us to detect segments in English, or possibly in other languages for which we
have ASR technology. We will follow our recent development of LID based on
features derived from phone posteriors [Plchot2013] as well as on DNNs. We will
also work on enrollment of a new language with very little data (down to one
utterance). Another "miner" will perform basic speaking-style recognition,
allowing read speech to be separated from spontaneous speech. Finally, speaker
recognition (SRE) or clustering will allow us to gather information about
speakers (in case they were previously enrolled), or at least to perform coarse
speaker clustering, since for the analyst, the information on who is speaking
can be as important as what is said. Here, we will build on our significant
track record in iVector-based SRE and will mainly work on automatic adaptation
and calibration on unlabeled data sets [Brummer2014].
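The coarse speaker-clustering step mentioned above can be illustrated with a minimal sketch: given fixed-dimensional utterance embeddings (stand-ins for the project's iVectors), length-normalize them and greedily merge clusters whose centroids have high cosine similarity. This is an illustrative simplification, not the project's actual system; the function names and the similarity threshold are hypothetical.

```python
import numpy as np

def length_normalize(x):
    # Project embeddings onto the unit sphere so that a dot
    # product between two rows equals their cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def cluster_speakers(embeddings, threshold=0.7):
    """Greedy agglomerative clustering of utterance embeddings
    (illustrative stand-ins for i-vectors). Repeatedly merges the
    pair of clusters whose centroids are most similar, stopping
    once the best similarity falls below `threshold`."""
    X = length_normalize(np.asarray(embeddings, dtype=float))
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > 1:
        # Centroid of each cluster, re-normalized to the unit sphere.
        cents = length_normalize(
            np.stack([X[c].mean(axis=0) for c in clusters]))
        sims = cents @ cents.T
        np.fill_diagonal(sims, -np.inf)  # ignore self-similarity
        i, j = np.unravel_index(np.argmax(sims), sims.shape)
        if sims[i, j] < threshold:
            break  # remaining clusters are distinct speakers
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

In a real system the threshold would be calibrated on held-out data (the calibration problem the abstract refers to), and the embeddings would come from an i-vector or DNN extractor rather than being given directly.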

Team members
Burget Lukáš, doc. Ing., Ph.D. (DCGM) – research leader
Beneš Karel, Ing. (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)
Fér Radek, Ing.
Glembek Ondřej, Ing., Ph.D.
Kocour Martin, Ing. (DCGM)
Matějka Pavel, Ing., Ph.D. (DCGM)
Ondel Lucas Antoine Francois, Mgr., Ph.D. (SSDIT)
Skácel Miroslav, Ing.
Szőke Igor, Ing., Ph.D. (DCGM)
Žmolíková Kateřina, Ing., Ph.D. (FIT)
