Publication Details

Automatic Language Identification System

ČERNOCKÝ, J.; MATĚJKA, P.; BURGET, L.; SCHWARZ, P. Automatic Language Identification System. Sborník příspěvků z odborného semináře "Nové technologie v radiokomunikacích". Brno: University of Defence in Brno, 2006. p. 1-6.

Czech title

Systém pro automatickou identifikaci jazyka

Type

conference paper

Language

English

Authors

Černocký Jan, prof. Dr. Ing. (DCGM)
Matějka Pavel, Ing., Ph.D.
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Schwarz Petr, Ing., Ph.D. (DCGM)

URL

http://www.fit.vutbr.cz/~cernocky/publi/2006/acr2006.pdf

Keywords

speech processing, automatic language identification

Abstract

This paper presents the language identification (LID) system developed in Speech@FIT. The system consists of two parts. Acoustic LID determines the language directly on the basis of features derived from the speech signal. We have improved existing approaches by adding discriminative training of acoustic models. In phonotactic LID, speech is first transcribed by a phoneme recognizer into strings or graphs (lattices) of phonemes. On these, language models are trained to capture statistics of sequences of phonemes. We have pioneered the use of so-called "anti-models" for this task. All experimental results are reported on standard NIST 2003 data; comparison with other published results is favorable to our system.

Annotation

This paper presents the language identification (LID) system developed in Speech@FIT. The system consists of two parts. Acoustic LID determines the language directly on the basis of features derived from the speech signal. We have improved existing approaches by adding discriminative training of acoustic models. In phonotactic LID, speech is first transcribed by a phoneme recognizer into strings or graphs (lattices) of phonemes. On these, language models are trained to capture statistics of sequences of phonemes. We have pioneered the use of so-called "anti-models" for this task. All experimental results are reported on standard NIST 2003 data; comparison with other published results is favorable to our system.

Published

2006

Pages

1–6

Proceedings

Sborník příspěvků z odborného semináře "Nové technologie v radiokomunikacích"

Conference

Special seminar "New technologies in radiocommunications", Brno, CZ

Publisher

University of Defence in Brno

Place

Brno

BibTeX

@inproceedings{BUT22285,
  author="Jan {Černocký} and Pavel {Matějka} and Lukáš {Burget} and Petr {Schwarz}",
  title="Automatic Language Identification System",
  booktitle="Sborník příspěvků z odborného semináře {"}Nové technologie v radiokomunikacích{"}",
  year="2006",
  pages="1--6",
  publisher="University of Defence in Brno",
  address="Brno",
  url="http://www.fit.vutbr.cz/~cernocky/publi/2006/acr2006.pdf"
}