Project Details
New Trends in Research and Application of Voice Technologies (Nové směry ve výzkumu a využití hlasových technologií)
Project Period: 1. 1. 2005 – 31. 12. 2007
Project Type: grant
Code: GA102/05/0278
Agency: Czech Science Foundation
Program: Standard projects
Keywords: voice technology; automatic speech recognition; multi-lingual systems; speaker recognition and verification; spontaneous speech recognition; acoustic-visual speech processing; automatic transcription; large speech databases; dialogue systems; prosody optimization
The proposed project follows up on previous speech processing research carried out by a team that integrates all Czech research groups currently active in speech analysis, synthesis and recognition. The team was established in 1996 to participate in an ambitious 6-year project supported by the GACR, and later continued in another speech-oriented project ending in 2002. Each of the groups involved has its own expertise in a specific domain, which allows the consortium to work on integrated and complex tasks. In previous years the team has created large databases of annotated speech recordings, which are now available for both training and testing purposes in the speech recognition domain as well as for speech synthesis. In addition, a set of powerful tools and platforms for developing the team's own recognition and synthesis systems has been built, together with several working prototypes that serve for evaluation and demonstration purposes. Based on this state and with respect to recent trends in voice technologies, the project will focus on the investigation and implementation of algorithms applicable in distributed, embedded and mobile systems, in recognition engines working with very large vocabularies, in TTS modules for interactive communication and information services, in automatic transcription of broadcast news, and in multimodal audio-visual interfaces. Primarily, the research will address the specific needs of Czech.
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Grézl František, Ing., Ph.D. (DCGM)
Chalupníček Kamil, Ing. (RG SPEECH)
Karafiát Martin, Ing., Ph.D. (DCGM)
Matějka Pavel, Ing., Ph.D. (DCGM)
Motlíček Petr, doc. Ing., Ph.D. (DCGM)
Schwarz Petr, Ing., Ph.D. (DCGM)
Szőke Igor, Ing., Ph.D. (DCGM)
2007
- BRÜMMER, N.; BURGET, L.; ČERNOCKÝ, J.; GLEMBEK, O.; GRÉZL, F.; KARAFIÁT, M.; VAN LEEUWEN, D.; MATĚJKA, P.; SCHWARZ, P.; STRASHEIM, A. Fusion of heterogeneous speaker recognition systems in the STBU submission for the NIST speaker recognition evaluation 2006. IEEE Transactions on Audio, Speech, and Language Processing, 2007, vol. 15, no. 7, p. 2072-2084. ISSN: 1558-7916.
- BURGET, L.; MATĚJKA, P.; SCHWARZ, P.; GLEMBEK, O.; ČERNOCKÝ, J. Analysis of feature extraction and channel compensation in GMM speaker recognition system. IEEE Transactions on Audio, Speech, and Language Processing, 2007, vol. 15, no. 7, p. 1979-1986. ISSN: 1558-7916.
- ČERNOCKÝ, J.; BURGET, L.; SCHWARZ, P.; MATĚJKA, P.; KARAFIÁT, M.; GLEMBEK, O.; KOPECKÝ, J.; SZŐKE, I.; FAPŠO, M.; GRÉZL, F.; HUBEIKA, V.; OPARIN, I. Search in speech, language identification and speaker recognition in Speech@FIT. Proc. 17th International Conference Radioelektronika, 2007. Brno: Department of Radioelectronics FEEC BUT, 2007. p. 1-6. ISBN: 978-80-214-3390-8.
- ČERNOCKÝ, J.; SZŐKE, I.; FAPŠO, M.; KARAFIÁT, M.; BURGET, L.; KOPECKÝ, J.; GRÉZL, F.; SCHWARZ, P.; GLEMBEK, O.; OPARIN, I.; SMRŽ, P.; MATĚJKA, P. Search in speech for public security and defense. Proc. IEEE Workshop on Signal Processing Applications for Public Security and Forensics, 2007 (SAFE '07). Washington D.C.: IEEE Signal Processing Society, 2007. p. 1-7. ISBN: 1-4244-1226-9.
- FAPŠO, M. Search in speech records. Proc. 13th Conference STUDENT EEICT 2007. Brno: Faculty of Electrical Engineering and Communication BUT, 2007. p. 1-3. ISBN: 978-80-214-3410-3.
- GRÉZL, F.; ČERNOCKÝ, J. TRAP-based Techniques for Recognition of Noisy Speech. Proc. 10th International Conference on Text Speech and Dialogue (TSD 2007). LNCS. Berlin: Springer Verlag, 2007. p. 270-277. ISBN: 978-3-540-74627-0.
- GRÉZL, F.; KARAFIÁT, M.; ČERNOCKÝ, J. Neural network topologies and bottle neck features in speech recognition. Brno: 2007. p. 78-82.
- GRÉZL, F.; KARAFIÁT, M.; KONTÁR, S.; ČERNOCKÝ, J. Probabilistic and bottle-neck features for LVCSR of meetings. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007). Honolulu: IEEE Signal Processing Society, 2007. p. 757-760. ISBN: 1-4244-0728-1.
- HUBEIKA, V.; SZŐKE, I.; BURGET, L.; ČERNOCKÝ, J. Maximum Likelihood and Maximum Mutual Information Training in Gender and Age Recognition System. Proc. 10th International Conference on Text Speech and Dialogue (TSD 2007). Pilsen: Springer Verlag, 2007. p. 1-6. ISBN: 978-3-540-74627-0.
- KARAFIÁT, M.; BURGET, L.; ČERNOCKÝ, J.; HAIN, T. Application of CMLLR in narrow band wide band adapted systems. Proc. INTERSPEECH 2007. Proceedings of Interspeech. Antwerpen: International Speech Communication Association, 2007. p. 1260-1263. ISSN: 1990-9772.
- MATĚJKA, P.; BURGET, L.; GLEMBEK, O.; SCHWARZ, P.; HUBEIKA, V.; FAPŠO, M.; MIKOLOV, T.; PLCHOT, O. BUT system description for NIST LRE 2007. Proc. 2007 NIST Language Recognition Evaluation Workshop. Orlando: National Institute of Standards and Technology, 2007. p. 1-5.
- MATĚJKA, P.; BURGET, L.; SCHWARZ, P.; GLEMBEK, O.; KARAFIÁT, M.; GRÉZL, F.; ČERNOCKÝ, J.; VAN LEEUWEN, D.; BRÜMMER, N.; STRASHEIM, A. STBU system for the NIST 2006 speaker recognition evaluation. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007). Honolulu: IEEE Signal Processing Society, 2007. p. 221-224. ISBN: 1-4244-0728-1.
- MIKOLOV, T.; OPARIN, I.; GLEMBEK, O.; BURGET, L.; KARAFIÁT, M.; ČERNOCKÝ, J. Použití mluvených korpusů ve vývoji systému pro rozpoznávání českých přednášek [Use of spoken corpora in the development of a system for recognizing Czech lectures]. Prague: Charles University, 2007. p. 1-5.
- SZŐKE, I.; BURGET, L.; KARAFIÁT, M. Combination of Word and Phoneme Approach for Spoken Term Detection. Brno: 2007. p. 1 (1 p.).
- SZŐKE, I.; FAPŠO, M.; KARAFIÁT, M.; BURGET, L.; GRÉZL, F.; SCHWARZ, P.; GLEMBEK, O.; MATĚJKA, P.; KOPECKÝ, J.; ČERNOCKÝ, J. Spoken Term Detection System Based on a Combination of LVCSR and Phonetic Search. Brno: 2007. p. 1 (1 p.).
2006
- AL-HAMES, M.; HAIN, T.; ČERNOCKÝ, J.; SCHREIBER, S.; POEL, M.; MÜLLER, R.; MARCEL, S.; VAN LEEUWEN, D.; ODOBEZ, J.; BA, S.; BOURLARD, H.; CARDINAUX, F.; GATICA-PEREZ, D.; JANIN, A.; MOTLÍČEK, P.; REITER, S.; RENALS, S.; VAN REST, J.; RIENKS, R.; RIGOLL, G.; SMITH, K.; THEAN, A.; ZEMČÍK, P. Audio-Visual Processing in Meetings: Seven Questions and Current AMI Answers. Proc. 3rd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms (MLMI 2006). Washington D.C.: 2006. p. 1-12.
- BURGET, L.; FAPŠO, M.; MATĚJKA, P.; SMRŽ, P.; ČERNOCKÝ, J.; KARAFIÁT, M.; SCHWARZ, P.; SZŐKE, I. Indexing and search methods for spoken documents. Proceedings of the Ninth International Conference on Text, Speech and Dialogue, TSD 2006. Lecture Notes in Computer Science. LNCS. Berlin: Springer Verlag, 2006. p. 351-358. ISSN: 0302-9743.
- BURGET, L.; MATĚJKA, P.; ČERNOCKÝ, J. Discriminative Training Techniques for Acoustic Language Identification. Proceedings of ICASSP 2006. Toulouse: 2006. p. 209-212.
- ČERNOCKÝ, J.; MATĚJKA, P.; BURGET, L.; SCHWARZ, P. Automatic Language Identification System. Proceedings of the technical seminar "Nové technologie v radiokomunikacích" (New Technologies in Radiocommunications). Brno: University of Defence in Brno, 2006. p. 1-6.
- FAPŠO, M.; SCHWARZ, P.; SZŐKE, I.; SMRŽ, P.; SCHWARZ, M.; ČERNOCKÝ, J.; KARAFIÁT, M.; BURGET, L. Search Engine for Information Retrieval from Speech Records. Proceedings of the Third International Seminar on Computer Treatment of Slavic and East European Languages. Bratislava: 2006. p. 100-101.
- FAPŠO, M.; SMRŽ, P.; SCHWARZ, P.; SZŐKE, I.; SCHWARZ, M.; ČERNOCKÝ, J.; KARAFIÁT, M.; BURGET, L. Information Retrieval from Spoken Documents. Proceedings of the Seventh International Conference on Intelligent Text Processing and Computational Linguistics (CICLING 2006). Mexico City: Springer Verlag, 2006. p. 410-416. ISBN: 3-540-32205-1.
- GLEMBEK, O.; KARAFIÁT, M.; BURGET, L.; ČERNOCKÝ, J. Czech Speech Recognizer for Multiple Environments. Radioelektronika 2006. Bratislava: 2006. p. 1-4.
- HUBEIKA, V. Estimation of Gender and Age from Recorded Speech. Proc. ACM Student Research competition 2006. Prague: Czech Technical University, 2006. p. 25-32. ISBN: 80-01-03595-6.
- KARAFIÁT, M.; GRÉZL, F.; SCHWARZ, P.; BURGET, L.; ČERNOCKÝ, J. Robust heteroscedastic linear discriminant analysis and LCRC posterior features in large vocabulary continuous speech recognition. Proc. Fifth Slovenian and First International Language Technologies Conference. Ljubljana: 2006. p. 1-4.
- KARAFIÁT, M.; GRÉZL, F.; SCHWARZ, P.; BURGET, L.; ČERNOCKÝ, J. Robust heteroscedastic linear discriminant analysis and LCRC posterior features in meeting data recognition. Proc. 3rd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms (MLMI 2006). Lecture Notes in Computer Science. Berlin: Springer Verlag, 2006. p. 275-284. ISBN: 3-540-69267-3.
- KONTÁR, S. Parallel training of neural networks for speech recognition. Proc. 12th International Conference on Soft Computing MENDEL'06. Brno: Brno University of Technology, 2006. p. 6037-6042. ISBN: 80-214-3195-4.
- KOPECKÝ, J.; SZŐKE, I.; FAPŠO, M.; KARAFIÁT, M.; BURGET, L.; OPARIN, I.; SCHWARZ, P.; MATĚJKA, P.; ČERNOCKÝ, J.; GLEMBEK, O. BUT System for NIST STD 2006 - Arabic. Proc. NIST Spoken Term Detection Evaluation workshop (STD 2006). Washington D.C.: National Institute of Standards and Technology, 2006. p. 1-15.
- MATĚJKA, P.; BURGET, L.; SCHWARZ, P.; ČERNOCKÝ, J. Brno University of Technology System for NIST 2005 Language Recognition Evaluation. Proceedings of Odyssey 2006: The Speaker and Language Recognition Workshop. San Juan: 2006. p. 57-64. ISBN: 1-4244-0472-X.
- MATĚJKA, P.; BURGET, L.; SCHWARZ, P.; ČERNOCKÝ, J. NIST Language Recognition Evaluation 2005. Proceedings of NIST LRE 2005. Washington DC: National Institute of Standards and Technology, 2006. p. 1-37.
- MATĚJKA, P.; SCHWARZ, P.; BURGET, L.; ČERNOCKÝ, J. Use of anti-models to further improve state-of-the-art PRLM language recognition system. Proceedings of ICASSP 2006. Toulouse: 2006. p. 197-200.
- SCHWARZ, P.; MATĚJKA, P.; ČERNOCKÝ, J. Hierarchical structures of neural networks for phoneme recognition. Proceedings of ICASSP 2006. Toulouse: 2006. p. 325-328.
- SZŐKE, I.; FAPŠO, M.; KARAFIÁT, M.; BURGET, L.; GRÉZL, F.; SCHWARZ, P.; GLEMBEK, O.; MATĚJKA, P.; KONTÁR, S.; ČERNOCKÝ, J. BUT System for NIST STD 2006 - English. Proc. NIST Spoken Term Detection Evaluation workshop (STD 2006). Washington D.C.: National Institute of Standards and Technology, 2006. p. 1-26.
2005
- ASHBY, S.; BOURBAN, S.; CARLETTA, J.; FLYNN, M.; GUILLEMOT, M.; HAIN, T.; KARAISKOS, V.; KRAAIJ, W.; KRONENTHAL, M.; LATHOUD, G.; LINCOLN, M.; LISOWSKA, A.; MCCOWAN, I.; POST, W.; REIDSMA, D.; WELLNER, P.; KADLEC, J. The AMI Meeting Corpus: A Pre-Announcement. Workshop on Multimodal Interaction and Related Machine Learning Algorithms (MLMI). Edinburgh: 2005. p. 1-4.
- ASHBY, S.; BOURBAN, S.; CARLETTA, J.; FLYNN, M.; GUILLEMOT, M.; HAIN, T.; KARAISKOS, V.; KRAAIJ, W.; KRONENTHAL, M.; LATHOUD, G.; LINCOLN, M.; LISOWSKA, A.; MCCOWAN, I.; POST, W.; REIDSMA, D.; WELLNER, P.; KADLEC, J. The AMI Meeting Corpus. Measuring Behavior 2005 Proceedings Book. Wageningen: 2005. p. 1-4.
- FAPŠO, M.; SCHWARZ, P.; SZŐKE, I.; ČERNOCKÝ, J.; SMRŽ, P.; BURGET, L.; KARAFIÁT, M. Search Engine for Information Retrieval from Multi-modal Records. Edinburgh: 2005. p. 0-0.
- FAPŠO, M.; SMRŽ, P.; SCHWARZ, P.; SZŐKE, I.; BURGET, L.; KARAFIÁT, M.; ČERNOCKÝ, J. Systém pre efektívne vyhľadávanie v rečových databázach [A system for efficient search in speech databases]. Proceedings of the DATAKON 2005 database conference. Brno: Masaryk University, 2005. p. 323-333. ISBN: 80-210-3813-6.
- GRÉZL, F. Spectral plane investigation for probabilistic features for ASR. Edinburgh: 2005. p. 82-86.
- HAIN, T.; BURGET, L.; DINES, J.; GARAU, G.; KARAFIÁT, M.; LINCOLN, M.; MCCOWAN, I.; MOORE, D.; WAN, V.; ORDELMAN, R.; RENALS, S. The 2005 AMI System for the Transcription of Speech in Meetings. Machine Learning for Multimodal Interaction, Second International Workshop, MLMI 2005, Edinburgh, UK, July 11-13, 2005, Revised Selected Papers. Lecture Notes in Computer Science Volume 3869, Springer 2006. Edinburgh: University of Edinburgh, 2005. p. 450-462. ISBN: 978-3-540-32549-9.
- HAIN, T.; KARAFIÁT, M.; DINES, J.; MCCOWAN, I.; LINCOLN, M.; GARAU, G.; WAN, V.; ORDELMAN, R.; RENALS, S. The Development of the AMI System for the Transcription of Speech in Meetings. Machine Learning for Multimodal Interaction, Second International Workshop, MLMI 2005, Edinburgh, UK, July 11-13, 2005, Revised Selected Papers. Lecture Notes in Computer Science Volume 3869, Springer 2006. Edinburgh: University of Edinburgh, 2005. p. 344-356. ISBN: 978-3-540-32549-9.
- HAIN, T.; KARAFIÁT, M.; GARAU, G.; MOORE, D.; WAN, V.; ORDELMAN, R.; RENALS, S. Transcription of Conference Room Meetings: an Investigation. Interspeech'2005 - Eurospeech - 9th European Conference on Speech Communication and Technology. European Conference EUROSPEECH. Lisbon: International Speech Communication Association, 2005. p. 1-4. ISSN: 1018-4074.
- KARAFIÁT, M.; BURGET, L.; ČERNOCKÝ, J. Using Smoothed Heteroscedastic Linear Discriminant Analysis in Large Vocabulary Continuous Speech Recognition System. 2nd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms. (This paper was not included among the Revised Selected Papers and did not appear in LNCS 3869.) Edinburgh, Scotland: University of Edinburgh, 2005. p. 1-8.
- MATĚJKA, P. Phoneme Recognition Tuning for Language Identification System. Proceedings of the 11th conference STUDENT EEICT 2005. Brno: Faculty of Electrical Engineering and Communication BUT, 2005. p. 658-653. ISBN: 80-214-2890-2.
- MATĚJKA, P.; SCHWARZ, P.; ČERNOCKÝ, J.; CHYTIL, P. Phonotactic Language Identification using High Quality Phoneme Recognition. Interspeech'2005 - Eurospeech - 9th European Conference on Speech Communication and Technology. European Conference EUROSPEECH. Lisbon: International Speech Communication Association, 2005. p. 2237-2240. ISSN: 1018-4074.
- MATĚJKA, P.; SCHWARZ, P.; ČERNOCKÝ, J.; CHYTIL, P. Phonotactic Language Identification. Proceedings of Radioelektronika 2005. Brno: Faculty of Electrical Engineering and Communication BUT, 2005. p. 140-143. ISBN: 80-214-2904-6.
- MATĚJKA, P.; SCHWARZ, P.; ČERNOCKÝ, J.; CHYTIL, P. Tuning Phonotactic Language Identification System. Brno: Faculty of Information Technology BUT, 2005. p. 1-5.
- MOTLÍČEK, P.; BURGET, L.; ČERNOCKÝ, J. Non-parametric Speaker Turn Segmentation of Meeting Data. Interspeech'2005 - Eurospeech - 9th European Conference on Speech Communication and Technology. European Conference EUROSPEECH. Lisbon: International Speech Communication Association, 2005. p. 657-660. ISSN: 1018-4074.
- STOLCKE, A.; ANGUERA, X.; BOAKYE, K.; CETIN, Ö.; GRÉZL, F.; JANIN, A.; MANDAL, A.; PESKIN, B.; WOOTERS, C.; ZHENG, J. Further Progress in Meeting Recognition: The ICSI-SRI Spring 2005 Speech-to-Text Evaluation System. Machine Learning for Multimodal Interaction, Second International Workshop, MLMI 2005, Edinburgh, UK, July 11-13, 2005, Revised Selected Papers. Lecture Notes in Computer Science 3869, Springer 2006. Edinburgh, Scotland: University of Edinburgh, 2005. p. 463-475. ISBN: 978-3-540-32549-9.
- SUMEC, S.; KADLEC, J. Event Editor - The Multi-Modal Annotation Tool. Workshop on Multimodal Interaction and Related Machine Learning Algorithms (MLMI). Edinburgh: 2005. p. 1 (1 p.).
- SZŐKE, I. Smooth Pitch Tracker Based on Harmonic and Noise Model. STUDENT EEICT 2005. Brno: Faculty of Information Technology BUT, 2005. p. 673-677. ISBN: 80-214-2890-2.
- SZŐKE, I.; SCHWARZ, P.; BURGET, L.; FAPŠO, M.; KARAFIÁT, M.; ČERNOCKÝ, J.; MATĚJKA, P. Comparison of Keyword Spotting Approaches for Informal Continuous Speech. Interspeech'2005 - Eurospeech - 9th European Conference on Speech Communication and Technology. European Conference EUROSPEECH. Lisbon: 2005. p. 633-636. ISSN: 1018-4074.
- SZŐKE, I.; SCHWARZ, P.; BURGET, L.; KARAFIÁT, M.; MATĚJKA, P.; ČERNOCKÝ, J. Phoneme Based Acoustics Keyword Spotting in Informal Continuous Speech. Lecture Notes in Computer Science, 2005, vol. 2005, no. 3658, p. 302-309. ISSN: 0302-9743.
- SZŐKE, I.; SCHWARZ, P.; MATĚJKA, P.; BURGET, L.; FAPŠO, M.; KARAFIÁT, M.; ČERNOCKÝ, J. Comparison of Keyword Spotting Approaches for Informal Continuous Speech. 2nd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms. Edinburgh: 2005. p. 1-12.
- ZHU, Q.; CHEN, B.; GRÉZL, F.; MORGAN, N. Improved MLP Structures for Data-Driven Feature Extraction for ASR. Interspeech'2005 - Eurospeech - 9th European Conference on Speech Communication and Technology. European Conference EUROSPEECH. Lisbon: 2005. p. 2129-2132. ISSN: 1018-4074.