Publication Details
Phonotactic Language Recognition using i-vectors and Phoneme Posteriogram Counts
D’Haro Luis Fernando
Glembek Ondřej, Ing., Ph.D.
Plchot Oldřich, Ing., Ph.D. (DCGM)
Matějka Pavel, Ing., Ph.D.
Soufifar Mehdi Mohammad, Ing.
Cordoba Ricardo
Černocký Jan, prof. Dr. Ing. (DCGM)
- http://www.isca-speech.org/archive/interspeech_2012/i12_0042.html
- http://www.fit.vutbr.cz/research/groups/speech/publi/2012/d_haro_interspeech2012_558_pp1_4.pdf PDF
- http://www.fit.vutbr.cz/research/groups/speech/publi/2012/d_haro_interspeech2012_presentation_MonO1b_04.pdf PDF
- http://www.fit.vutbr.cz/research/groups/speech/publi/2012/d_haro_interspeech2012_presentation_Mon.O1b.04.pptx PPT
subspace modeling, multinomial distributions, LID
The article is about phonotactic language recognition using i-vectors and phoneme posteriogram counts.
This paper describes a novel approach to phonotactic LID in which, instead of using soft-counts based on phoneme lattices, we use posteriograms to obtain n-gram counts. The high-dimensional vectors of counts are reduced to low-dimensional units, for which we adapted the commonly used term i-vectors. The reduction is based on multinomial subspace modeling and is designed to work in the total-variability space. The proposed technique was tested on the NIST 2009 LRE set, with better results than a system based on soft-counts (Cavg on 30s: 3.15% vs 3.43%) and with very good results when fused with an acoustic i-vector LID system (Cavg on 30s: acoustic system alone 2.4% vs 1.25% after fusion). The proposed technique is also compared with another low-dimensional projection system based on PCA. Compared with the original soft-counts, the proposed technique provides better results, reduces the problems caused by sparse counts, and avoids the need for pruning techniques when creating the lattices.
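To illustrate the idea of obtaining n-gram counts from a posteriogram rather than from lattices, here is a minimal sketch. It assumes that expected bigram counts can be accumulated as products of phoneme posteriors at consecutive steps. This is one plausible reading of the abstract, not necessarily the exact procedure used in the paper. The function name `bigram_soft_counts` and the toy values are hypothetical.

```python
from itertools import product


def bigram_soft_counts(posteriogram):
    """Accumulate expected bigram counts from a phoneme posteriogram.

    posteriogram: list of per-step posterior vectors (each sums to 1).
    Returns a dict mapping (i, j) phoneme-index pairs to soft counts.
    Assumption: consecutive steps are treated as independent, so the
    expected count of bigram (i, j) at a step is prev[i] * curr[j].
    """
    n_phones = len(posteriogram[0])
    counts = {pair: 0.0 for pair in product(range(n_phones), repeat=2)}
    for prev, curr in zip(posteriogram, posteriogram[1:]):
        for i, j in counts:
            counts[(i, j)] += prev[i] * curr[j]
    return counts


# Toy posteriogram: 3 steps, 2 phonemes (made-up values for illustration).
toy = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
counts = bigram_soft_counts(toy)
```

Each pair of consecutive steps contributes a total mass of one, so the counts vector stays dense even for short utterances. In the approach described above, a vector of this kind is the high-dimensional input that the multinomial subspace model would then reduce to an i-vector.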
@inproceedings{BUT97012,
author="Luis Fernando {D’Haro} and Ondřej {Glembek} and Oldřich {Plchot} and Pavel {Matějka} and Mehdi Mohammad {Soufifar} and Ricardo {Cordoba} and Jan {Černocký}",
title="Phonotactic Language Recognition using i-vectors and Phoneme Posteriogram Counts",
booktitle="Proceedings of Interspeech 2012",
year="2012",
journal="Proceedings of Interspeech",
volume="2012",
number="9",
pages="1--4",
publisher="International Speech Communication Association",
address="Portland, Oregon",
isbn="978-1-62276-759-5",
issn="1990-9772",
url="http://www.isca-speech.org/archive/interspeech_2012/i12_0042.html"
}