Thesis Details
Robust Speaker Recognition Using Neural Networks
The objective of this work is to study state-of-the-art speaker verification systems based on deep neural networks, called x-vectors, under various conditions, such as wideband and narrowband data, and to develop a system that is robust to an unseen language, a specific noise, or a speech codec. Such a system takes a variable-length audio recording and maps it to a fixed-length embedding, which is then used to represent the speaker. We compared our systems to BUT's submission to the Speakers in the Wild Speaker Recognition Challenge (SITW) from 2016, which used the previously popular statistical models, i-vectors. When comparing the single best systems, the recently published x-vectors gave an Equal Error Rate more than 4.38 times lower on the SITW core-core condition than BUT's SITW submission. Moreover, we found that diarization substantially reduces the error rate when a recording contains multiple speakers, as in the SITW core-multi condition, but we did not see the same trend on the NIST SRE 2018 VAST data.
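The mapping from a variable-length recording to a fixed-length vector mentioned in the abstract can be illustrated with a minimal sketch. It assumes a statistics pooling step (mean and standard deviation over time), as commonly described for x-vector systems; the function name `stats_pooling`, the feature dimension of 24, and the random frames are hypothetical and are not taken from the thesis. A real x-vector system applies trained neural network layers before and after the pooling step.

```python
import numpy as np

def stats_pooling(frames: np.ndarray) -> np.ndarray:
    """Collapse a (num_frames, feat_dim) matrix into one 2 * feat_dim vector.

    Hypothetical helper: concatenates the per-dimension mean and standard
    deviation over the time axis, so the output size does not depend on
    how many frames the recording has.
    """
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    return np.concatenate([mean, std])

# Two stand-in "recordings" of different lengths (random data, not real audio).
rng = np.random.default_rng(0)
short_utt = rng.normal(size=(150, 24))   # 150 frames, 24 features each
long_utt = rng.normal(size=(900, 24))    # 900 frames, 24 features each

emb_short = stats_pooling(short_utt)
emb_long = stats_pooling(long_utt)

# Both vectors have the same length regardless of recording duration.
print(emb_short.shape, emb_long.shape)
```

Because both outputs have the same dimensionality, recordings of any duration can be compared with one fixed scoring function.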
speaker verification, speaker recognition, neural networks, x-vector, i-vector
Čadík Martin, doc. Ing., Ph.D. (DCGM FIT BUT), member
Holub Jan, prof. Ing., Ph.D. (FIT CTU), member
Křivka Zbyněk, Ing., Ph.D. (DIFS FIT BUT), member
Polčák Libor, Ing., Ph.D. (DIFS FIT BUT), member
Szőke Igor, Ing., Ph.D. (DCGM FIT BUT), member
@mastersthesis{FITMT21835,
  author = "J\'{a}n Profant",
  type = "Master's thesis",
  title = "Robustn\'{i} rozpozn\'{a}v\'{a}n\'{i} mluv\v{c}\'{i}ho pomoc\'{i} neuronov\'{y}ch s\'{i}t\'{i}",
  school = "Brno University of Technology, Faculty of Information Technology",
  year = 2019,
  location = "Brno, CZ",
  language = "czech",
  url = "https://www.fit.vut.cz/study/thesis/21835/"
}