Publication Details

Feature And Score Level Combination Of Subspace Gaussians In LVCSR Task

MOTLÍČEK, P.; POVEY, D.; KARAFIÁT, M. Feature And Score Level Combination Of Subspace Gaussians In LVCSR Task. Proceedings of ICASSP 2013. Vancouver: IEEE Signal Processing Society, 2013. p. 7604-7608. ISBN: 978-1-4799-0355-9.

Czech title

Kombinace výstupů Gaussovek v podprostorech na úrovni parametrů a skóre v LVCSR úloze

Type

conference paper

Language

English

Authors

Motlíček Petr, doc. Ing., Ph.D. (DCGM)
Povey Daniel
Karafiát Martin, Ing., Ph.D. (DCGM)

URL

http://www.fit.vutbr.cz/research/groups/speech/publi/2013/motlicek_icassp2013_0007604.pdf

Keywords

Automatic Speech Recognition, Discriminative features, System combination

Abstract

We have demonstrated that the SGMM framework is an efficient approach in the LVCSR task. Overall evaluations of SGMMs exploiting powerful but complex PLP-BN features yield results similar to those obtained by conventional HMM/GMMs. Nevertheless, the total number of SGMM parameters is about 3 times smaller than in the HMM/GMM framework. Evaluation results also indicate different properties of the examined acoustic modeling techniques. Although SGMMs consistently outperform HMM/GMMs when built over individual features, HMM/GMMs can benefit much more from the feature-level combination than SGMMs. Nevertheless, based on an analysis measuring complementarity of individual recognition systems, we show that SGMM-based recognizers produce heterogeneous outputs (scores) and thus subsequent score-level combination can bring additional improvement.

Annotation

In this paper, we investigate the use of discriminatively trained acoustic features modeled by Subspace Gaussian Mixture Models (SGMMs) for Rich Transcription meeting recognition. More specifically, we first focus on exploiting various types of complex features estimated using a neural network, combined with conventional cepstral features and modeled by standard HMM/GMMs and SGMMs. Then, outputs (word sequences) from individual recognizers trained using different features are also combined at the score level using ROVER for both acoustic modeling techniques. Experimental results indicate three important findings: (1) SGMMs consistently outperform HMM/GMMs (an average relative improvement of about 6% in terms of WER) when both techniques are applied to single features; (2) SGMMs benefit much less from feature-level combination (1% relative improvement) than HMM/GMMs (4% relative improvement), which can eventually match the performance of SGMMs; (3) SGMMs can be significantly improved when individual systems are combined at the score level. This suggests that the SGMM systems provide complementary recognition outputs. Overall relative improvements of the combined SGMM and HMM/GMM systems are 21% and 17%, respectively, compared to a standard ASR baseline.

Published

2013

Pages

7604–7608

Proceedings

Proceedings of ICASSP 2013

Conference

38th International Conference on Acoustics, Speech, and Signal Processing, Vancouver, CA

ISBN

978-1-4799-0355-9

Publisher

IEEE Signal Processing Society

Place

Vancouver

BibTeX

@inproceedings{BUT103519,
  author="Petr {Motlíček} and Daniel {Povey} and Martin {Karafiát}",
  title="Feature And Score Level Combination Of Subspace Gaussians In LVCSR Task",
  booktitle="Proceedings of ICASSP 2013",
  year="2013",
  pages="7604--7608",
  publisher="IEEE Signal Processing Society",
  address="Vancouver",
  isbn="978-1-4799-0355-9",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2013/motlicek_icassp2013_0007604.pdf"
}