Day: 6 January 2026
In January, Ladislav Mošner from the Department of Computer Graphics and Multimedia will defend his dissertation
We invite you to the dissertation defense of Ing. Ladislav Mošner from the Department of Computer Graphics and Multimedia, FIT BUT, which will take place on Wednesday, January 14, 2026, at 9:00 a.m. in meeting room G108. The dissertation, entitled "Speaker recognition from a remote source with multi-channel audio processing", is supervised by Prof. Jan Černocký.
The broad research problem Mošner has long worked on is speaker verification from recordings captured by multiple distant microphones. A familiar example is talking to a voice assistant on a device such as Google Home or Amazon Echo. His work aims to make verification of a specific speaker's identity more accurate in such settings in two ways: (a) by addressing the lack of data for training neural-network models, and (b) by developing specialized data processing techniques.
"In the first step, the user registers their voice in the system, i.e., they provide a recording of their voice. From this recording, information is extracted using neural networks—embedding—which identifies and characterizes them," Mošner begins to describe the general context of speaker verification in layman's terms. "In addition, we have a second group of recordings available, which come from multiple channels, typically several microphones." From these multiple recordings, it is necessary to extract the aforementioned embedding, i.e., a characteristic vector (a typical representation of a given speaker), which is then compared with the initial registration embedding. The result of the comparison is a score which, again in layman's terms, indicates the extent to which the system believes that two speakers are one and the same person. The specificity of verification in Ladislav Mošner's research lies precisely in the existence of multiple channels from which the recordings originate.
This research field is narrowly defined, and few experts around the world work on it. Publications on the subject are correspondingly scarce. This caused problems that the author had to face in his dissertation, above all a lack of datasets, which are the foundation of machine learning. Until now, researchers have relied on datasets prepared for individual publications. Mošner therefore set out to create a new dataset for training and subsequent evaluation that others could use as well, in keeping with the principle of open data. The result is the MultiSV and MultiSV2 datasets.
Another outcome of Mošner's dissertation is a solution to the problem of multichannel verification itself. So complex a challenge had to be divided into subproblems. The first was multichannel processing that combines signal processing methods with neural networks. The second was the extraction of embeddings when the input is a single-channel recording that has been cleaned up (with reverberation and noise suppressed and speech enhanced), i.e., a better version of the original multichannel input. The core of the author's work lay in the first step: improving multichannel processing to obtain a better recording of the speaker, which in turn leads to more accurate verification. The release of the MultiSV2 dataset then enabled Mošner and his colleagues to train a complex system that takes a multichannel recording and extracts embeddings directly from it.
When asked what he considers his greatest research achievement during his doctorate, Ladislav Mošner responds stoically: "Well, we achieved exactly what the project set out to do. We created a functional, complex system that does not depend on preprocessing." He says he would like to stay at the faculty and extend his research on multichannel processing to other areas of human speech processing. He would also like to keep working on speech biometrics (speaker verification), where he already cooperates with an industrial partner, the Greek company Omilia, a major global player in the field of conversational systems and voice biometrics. He sees his dissertation as a major milestone in his successful research career and feels grateful to the people who surrounded him at the faculty. "I am glad that I was able to do my doctorate in Professor Černocký's group, where there are many great people and great experts." He also mentioned the importance of his research stay abroad at the French institute Inria (Institut national de recherche en sciences et technologies du numérique), which he completed
We wish Ladislav Mošner a successful defense and the fulfillment of his other scientific goals.