Project Details
Exchanges for SPEech ReseArch aNd TechnOlogies
Project Period: 1. 1. 2021 – 31. 12. 2025
Project Type: grant
Agency: European Union
Program: Horizon 2020
artificial intelligence, intelligent systems, multi-agent systems, machine learning, data mining, statistical data processing and application, modelling engineering, human-computer interaction, natural language processing, speech processing, neural networks, explainability, human-assisted learning, low resources, standardization, evaluation
The ESPERANTO project aims to push speech processing technologies to their next stage, to enable their diffusion in European SMEs and to maximize and secure their use in civil society for forensics, health and education. The ESPERANTO consortium foresees that the next generation of artificial intelligence algorithms for speech processing should:
1. be more accessible: cover a larger number of spoken languages and serve applications where resources are strongly limited (health, education, robotics);
2. integrate a human in the loop to guarantee higher usability and ease of deployment and maintenance;
3. be explainable, in order to enable sensitive applications related to forensics or health, and contribute to personal data preservation by detecting and characterizing existing biases due to the data-driven nature of current speech technologies.
ESPERANTO intends to lead the scientific community by releasing evaluation metrics, protocols and standards that will boost the development and evaluation of this new generation of algorithms. To achieve this ambitious goal, the ESPERANTO project gathers a large, cross-sector community of experts in speech-related applications such as speech transcription, separation, enhancement, translation and understanding, and speaker recognition and diarization, to transfer knowledge and to organize, produce and standardize resources with the aim of catalyzing and cross-pollinating this area.
The main goals of the ESPERANTO project are:
- support the development of open-source tools that will encourage fast development, exchanges and reproducibility;
- produce tutorials and competitive baselines on various topics of speech processing in order to foster new speech-AI students, researchers and engineers;
- facilitate the collection and sharing of linguistic and speech resources through standards;
- organize workshops to advance speech technologies and favor transfer of knowledge.
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Kohlová Renata, Ing. (DCGM)
Landini Federico Nicolás (RG SPEECH)
Matějka Pavel, Ing., Ph.D. (DCGM)
Mošner Ladislav, Ing. (DCGM)
Plchot Oldřich, Ing., Ph.D. (DCGM)
Rohdin Johan Andréas, M.Sc., Ph.D. (DCGM)
Silnova Anna, M.Sc., Ph.D. (DCGM)
2024
- HAN, J.; LANDINI, F.; ROHDIN, J.; DIEZ SÁNCHEZ, M.; BURGET, L.; CAO, Y.; LU, H.; ČERNOCKÝ, J. Diacorrect: Error Correction Back-End for Speaker Diarization. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul: IEEE Signal Processing Society, 2024. p. 11181-11185. ISBN: 979-8-3503-4485-1.
- KLEMENT, D.; DIEZ SÁNCHEZ, M.; LANDINI, F.; BURGET, L.; SILNOVA, A.; DELCROIX, M.; TAWARA, N. Discriminative Training of VBx Diarization. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 11871-11875. ISBN: 979-8-3503-4485-1.
- LANDINI, F.; DIEZ SÁNCHEZ, M.; STAFYLAKIS, T.; BURGET, L. DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors. IEEE Transactions on Audio, Speech, and Language Processing, 2024, vol. 32, no. 7, p. 3450-3465. ISSN: 1558-7916.
2023
- KAKOUROS, S.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L. Speech-Based Emotion Recognition with Self-Supervised Models Using Attentive Channel-Wise Correlations and Label Smoothing. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
- KESIRAJU, S.; BENEŠ, K.; TIKHONOV, M.; ČERNOCKÝ, J. BUT Systems for IWSLT 2023 Marathi - Hindi Low Resource Speech Translation Task. In 20th International Conference on Spoken Language Translation, IWSLT 2023 - Proceedings of the Conference. Toronto (in-person and online): Association for Computational Linguistics, 2023. p. 227-234. ISBN: 978-1-959429-84-5.
- KESIRAJU, S.; SARVAŠ, M.; PAVLÍČEK, T.; MACAIRE, C.; CIUBA, A. Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 2148-2152. ISSN: 1990-9772.
- LANDINI, F.; DIEZ SÁNCHEZ, M.; LOZANO DÍEZ, A.; BURGET, L. Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
- MATĚJKA, P.; SILNOVA, A.; SLAVÍČEK, J.; MOŠNER, L.; PLCHOT, O.; KLČO, M.; PENG, J.; STAFYLAKIS, T.; BURGET, L. Description and Analysis of ABC Submission to NIST LRE 2022. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 511-515. ISSN: 1990-9772.
- MOŠNER, L.; PLCHOT, O.; PENG, J.; BURGET, L.; ČERNOCKÝ, J. Multi-Channel Speech Separation with Cross-Attention and Beamforming. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 1693-1697. ISSN: 1990-9772.
- PENG, J.; PLCHOT, O.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J. An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification. In 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Doha: IEEE Signal Processing Society, 2023. p. 555-562. ISBN: 978-1-6654-7189-3.
- PENG, J.; PLCHOT, O.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J. Improving Speaker Verification with Self-Pretrained Transformer Models. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 5361-5365. ISSN: 1990-9772.
- PENG, J.; STAFYLAKIS, T.; GU, R.; PLCHOT, O.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J. Parameter-Efficient Transfer Learning of Pre-Trained Transformer Models for Speaker Verification Using Adapters. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
- SILNOVA, A.; BRUMMER, J.; SWART, A.; BURGET, L. Toroidal Probabilistic Spherical Discriminant Analysis. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
- SILNOVA, A.; SLAVÍČEK, J.; MOŠNER, L.; KLČO, M.; PLCHOT, O.; MATĚJKA, P.; PENG, J.; STAFYLAKIS, T.; BURGET, L. ABC System Description for NIST LRE 2022. Proceedings of NIST LRE 2022 Workshop. Washington DC: National Institute of Standards and Technology, 2023. p. 1-5.
- STAFYLAKIS, T.; MOŠNER, L.; KAKOUROS, S.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations. In 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Doha: IEEE Signal Processing Society, 2023. p. 1136-1143. ISBN: 978-1-6654-7189-3.
2022
- ALAM, J.; BURGET, L.; GLEMBEK, O.; MATĚJKA, P.; MOŠNER, L.; PLCHOT, O.; ROHDIN, J.; SILNOVA, A.; STAFYLAKIS, T. Development of ABC systems for the 2021 edition of NIST Speaker Recognition evaluation. Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022). Beijing: International Speech Communication Association, 2022. p. 346-353.
- BRUMMER, J.; SWART, A.; MOŠNER, L.; SILNOVA, A.; PLCHOT, O.; STAFYLAKIS, T.; BURGET, L. Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022. p. 1446-1450. ISSN: 1990-9772.
- KOCOUR, M.; UMESH, J.; KARAFIÁT, M.; ŠVEC, J.; LOPEZ, F.; BENEŠ, K.; DIEZ SÁNCHEZ, M.; SZŐKE, I.; LUQUE, J.; VESELÝ, K.; BURGET, L.; ČERNOCKÝ, J. BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge. Proceedings of IberSpeech 2022. Granada: International Speech Communication Association, 2022. p. 276-280.
- MOŠNER, L.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Multi-Channel Speaker Verification with Conv-Tasnet Based Beamformer. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Singapore: IEEE Signal Processing Society, 2022. p. 7982-7986. ISBN: 978-1-6654-0540-9.
- MOŠNER, L.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Multisv: Dataset for Far-Field Multi-Channel Speaker Verification. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Singapore: IEEE Signal Processing Society, 2022. p. 7977-7981. ISBN: 978-1-6654-0540-9.
- PENG, J.; GU, R.; MOŠNER, L.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Learnable Sparse Filterbank for Speaker Verification. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022. p. 5110-5114. ISSN: 1990-9772.
- SILNOVA, A.; STAFYLAKIS, T.; MOŠNER, L.; PLCHOT, O.; ROHDIN, J.; MATĚJKA, P.; BURGET, L.; GLEMBEK, O.; BRUMMER, J. Analyzing speaker verification embedding extractors and back-ends under language and channel mismatch. Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022). Beijing: International Speech Communication Association, 2022. p. 9-16.
- STAFYLAKIS, T.; MOŠNER, L.; PLCHOT, O.; ROHDIN, J.; SILNOVA, A.; BURGET, L.; ČERNOCKÝ, J. Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022. p. 605-609. ISSN: 1990-9772.
2021
- STAFYLAKIS, T.; ROHDIN, J.; BURGET, L. Speaker embeddings by modeling channel-wise correlations. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Brno: International Speech Communication Association, 2021. p. 501-505. ISSN: 1990-9772.