Detail projektu
Robustní zpracování nahrávek pro operativu a bezpečnost
Období řešení: 1. 10. 2020 – 30. 9. 2025
Typ projektu: grant
Kód: VJ01010108
Agentura: Ministerstvo vnitra ČR
rozpoznávání řeči, robustní, nahrávky, operativa, bezpečnost
Cílem projektu je zvýšení kompetencí, sjednocení a větší koordinace dvou předních českých výzkumných pracovišť, v oboru dolování informací z řeči z reálných nahrávek pro oblast bezpečnosti a úzká spolupráce s bezpečnostními sbory na uvádění výsledků výzkumu do praxe vyšetřování a zpravodajství. Tento cíl zahrnuje posun v robustním automatickém rozpoznávání řeči (ASR), trénování/adaptaci ASR pro různá prostředí, určení kdy kdo mluví v nahrávce (diarizace) a výzkum prohledávání nahrávek pomocí akustických dotazů (Query by Example)
Diez Sánchez Mireia, M.Sc., Ph.D. (UPGM)
Malenovský Vladimír, Ing., Ph.D. (UPGM)
Matějka Pavel, Ing., Ph.D. (UPGM)
Szőke Igor, Ing., Ph.D. (UPGM)
2024
- HAN, J.; LANDINI, F.; ROHDIN, J.; DIEZ SÁNCHEZ, M.; BURGET, L.; CAO, Y.; LU, H.; ČERNOCKÝ, J. Diacorrect: Error Correction Back-End for Speaker Diarization. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul: IEEE Signal Processing Society, 2024.
p. 11181-11185. ISBN: 979-8-3503-4485-1. Detail - LANDINI, F.; DIEZ SÁNCHEZ, M.; STAFYLAKIS, T.; BURGET, L. DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors. IEEE Transactions on Audio, Speech, and Language Processing, 2024, vol. 32, no. 7,
p. 3450-3465. ISSN: 1558-7916. Detail
2023
- KAKOUROS, S.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L. Speech-Based Emotion Recognition with Self-Supervised Models Using Attentive Channel-Wise Correlations and Label Smoothing. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023.
p. 1-5. ISBN: 978-1-7281-6327-7. Detail - LANDINI, F.; DIEZ SÁNCHEZ, M.; LOZANO DÍEZ, A.; BURGET, L. Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023.
p. 1-5. ISBN: 978-1-7281-6327-7. Detail - MATĚJKA, P.; SILNOVA, A.; SLAVÍČEK, J.; MOŠNER, L.; PLCHOT, O.; KLČO, M.; PENG, J.; STAFYLAKIS, T.; BURGET, L. Description and Analysis of ABC Submission to NIST LRE 2022. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023.
p. 511-515. ISSN: 1990-9772. Detail - MOŠNER, L.; PLCHOT, O.; PENG, J.; BURGET, L.; ČERNOCKÝ, J. Multi-Channel Speech Separation with Cross-Attention and Beamforming. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023.
p. 1693-1697. ISSN: 1990-9772. Detail - PENG, J.; STAFYLAKIS, T.; GU, R.; PLCHOT, O.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J. Parameter-Efficient Transfer Learning of Pre-Trained Transformer Models for Speaker Verification Using Adapters. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Rhodes Island: IEEE Signal Processing Society, 2023.
p. 1-5. ISBN: 978-1-7281-6327-7. Detail - SILNOVA, A.; SLAVÍČEK, J.; MOŠNER, L.; KLČO, M.; PLCHOT, O.; MATĚJKA, P.; PENG, J.; STAFYLAKIS, T.; BURGET, L. ABC System Description for NIST LRE 2022. Proceedings of NIST LRE 2022 Workshop. Washington DC: National Institute of Standards and Technology, 2023.
p. 1-5. Detail - STAFYLAKIS, T.; MOŠNER, L.; KAKOUROS, S.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations. In 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Doha: IEEE Signal Processing Society, 2023.
p. 1136-1143. ISBN: 978-1-6654-7189-3. Detail
2022
- ALAM, J.; BURGET, L.; GLEMBEK, O.; MATĚJKA, P.; MOŠNER, L.; PLCHOT, O.; ROHDIN, J.; SILNOVA, A.; STAFYLAKIS, T. Development of ABC systems for the 2021 edition of NIST Speaker Recognition evaluation. Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022). Beijing: International Speech Communication Association, 2022.
p. 346-353. Detail - BRUMMER, J.; SWART, A.; MOŠNER, L.; SILNOVA, A.; PLCHOT, O.; STAFYLAKIS, T.; BURGET, L. Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022.
p. 1446-1450. ISSN: 1990-9772. Detail - KOCOUR, M.; UMESH, J.; KARAFIÁT, M.; ŠVEC, J.; LOPEZ, F.; BENEŠ, K.; DIEZ SÁNCHEZ, M.; SZŐKE, I.; LUQUE, J.; VESELÝ, K.; BURGET, L.; ČERNOCKÝ, J. BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge. Proceedings of IberSpeech 2022. Granada: International Speech Communication Association, 2022.
p. 276-280. Detail - LANDINI, F.; LOZANO DÍEZ, A.; DIEZ SÁNCHEZ, M.; BURGET, L. From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022.
p. 5095-5099. ISSN: 1990-9772. Detail - MOŠNER, L.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Multi-Channel Speaker Verification with Conv-Tasnet Based Beamformer. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Singapore: IEEE Signal Processing Society, 2022.
p. 7982-7986. ISBN: 978-1-6654-0540-9. Detail - MOŠNER, L.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Multisv: Dataset for Far-Field Multi-Channel Speaker Verification. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Singapore: IEEE Signal Processing Society, 2022.
p. 7977-7981. ISBN: 978-1-6654-0540-9. Detail - SILNOVA, A.; STAFYLAKIS, T.; MOŠNER, L.; PLCHOT, O.; ROHDIN, J.; MATĚJKA, P.; BURGET, L.; GLEMBEK, O.; BRUMMER, J. Analyzing speaker verification embedding extractors and back-ends under language and channel mismatch. Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022). Beijing: International Speech Communication Association, 2022.
p. 9-16. Detail - STAFYLAKIS, T.; MOŠNER, L.; PLCHOT, O.; ROHDIN, J.; SILNOVA, A.; BURGET, L.; ČERNOCKÝ, J. Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022.
p. 605-609. ISSN: 1990-9772. Detail
2021
- KARAFIÁT, M.; VESELÝ, K.; ČERNOCKÝ, J.; PROFANT, J.; NYTRA, J.; HLAVÁČEK, M.; PAVLÍČEK, T. Analysis of X-Vectors for Low-Resource Speech Recognition. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Ontario: IEEE Signal Processing Society, 2021.
p. 6998-7002. ISBN: 978-1-7281-7605-5. Detail - LANDINI, F.; GLEMBEK, O.; MATĚJKA, P.; ROHDIN, J.; BURGET, L.; DIEZ SÁNCHEZ, M.; SILNOVA, A. Analysis of the BUT Diarization System for Voxconverse Challenge. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Ontario: IEEE Signal Processing Society, 2021.
p. 5819-5823. ISBN: 978-1-7281-7605-5. Detail