Publication Details
Automatic Language Identification using Phoneme and Automatically Derived Unit Strings
Szőke Igor, Ing., Ph.D. (DCGM)
Schwarz Petr, Ing., Ph.D. (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)
language identification, phoneme recognizer, speech processing, ergodic hidden Markov model
Language identification (LID) based on phonotactic modeling is presented in this paper. Approaches using phoneme strings and strings of units automatically derived by an ergodic HMM (EHMM) are compared. The phoneme recognizers were trained on 6 languages from the OGI multi-language corpus and on Czech SpeechDat-E. LID results are obtained on 4 languages. The results show the superiority of the Czech phoneme recognizer when used for LID, and promising trends for the EHMM-derived units.
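As a rough illustration of phonotactic modeling in general, the sketch below scores a string of recognized units against one n-gram model per language and picks the best-scoring language. It is not taken from the paper: the bigram order, the add-one smoothing, the toy unit strings, and the names `BigramModel` and `identify_language` are all assumptions made for the example.

```python
import math
from collections import Counter


class BigramModel:
    """Bigram model over unit strings with additive smoothing (illustrative only)."""

    def __init__(self, sequences, vocab_size, alpha=1.0):
        self.alpha = alpha              # smoothing constant (assumed, not from the paper)
        self.vocab_size = vocab_size    # number of distinct units the recognizer can emit
        self.bigrams = Counter()
        self.contexts = Counter()
        for seq in sequences:
            padded = ["<s>"] + list(seq)    # "<s>" marks the start of a string
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_likelihood(self, seq):
        """Return the smoothed log-probability of a unit string under this model."""
        padded = ["<s>"] + list(seq)
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            num = self.bigrams[(prev, cur)] + self.alpha
            den = self.contexts[prev] + self.alpha * self.vocab_size
            total += math.log(num / den)
        return total


def identify_language(models, seq):
    """Pick the language whose model assigns the highest log-likelihood to seq."""
    return max(models, key=lambda lang: models[lang].log_likelihood(seq))


# Toy unit strings standing in for recognizer output (hypothetical data).
training = {
    "lang_a": [["a", "b", "a", "b"], ["b", "a", "b"]],
    "lang_b": [["a", "a", "b", "b"], ["b", "b", "a", "a"]],
}
models = {lang: BigramModel(seqs, vocab_size=2) for lang, seqs in training.items()}

guess_alternating = identify_language(models, ["a", "b", "a", "b", "a"])
guess_repeating = identify_language(models, ["a", "a", "b", "b"])
```

The same scoring scheme applies whether the units are phonemes from a recognizer or automatically derived units; only the source of the input strings changes.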
@article{BUT45738,
author="Pavel {Matějka} and Igor {Szőke} and Petr {Schwarz} and Jan {Černocký}",
title="Automatic Language Identification using Phoneme and Automatically Derived Unit Strings",
journal="Lecture Notes in Computer Science",
year="2004",
volume="2004",
number="3206",
pages="8",
issn="0302-9743",
url="http://www.springerlink.com/index/CUFLYEGQA8W1LNBE"
}