Publication Details
Combination of MFCC and TRAP features for LVCSR of meeting data
Grézl František, Ing., Ph.D. (DCGM)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
speech recognition, TRAP, feature extraction, feature combination, HLDA
The aim of this work is to examine TempoRAl Patterns (TRAPs) based feature extraction for the task of large vocabulary continuous speech recognition (LVCSR). Previously, TRAPs based features were mainly used in conjunction with hybrid NN-HMM recognition systems (the connectionist approach). In this work, we use a Tandem-TRAPs system to generate speech features, which are then used as input to a standard GMM-HMM system. This approach allows for more precise modeling of phonetic context (context-dependent models), which is important for LVCSR. Experiments are carried out on the ICSI meetings database. For TRAPs processing, it is shown that the use of frequency differentiation and local operators can significantly improve recognition performance. The performance obtained with TRAPs based features is compared with that of conventional MFCC features. Although stand-alone TRAPs based features never outperform MFCC in our experiments, we report an improvement over MFCC when TRAPs based features and MFCC features are combined. The combined features are created by concatenating the original feature streams and then applying Heteroscedastic Linear Discriminant Analysis (HLDA) to perform decorrelation and dimensionality reduction. Compared to previous work, the main advantage comes from HLDA, which combines the two feature streams optimally without the strong assumptions imposed on the data by previously used transforms (such as PCA and LDA).
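The combination step described in the abstract (frame-wise concatenation of two feature streams, followed by a linear transform that decorrelates and reduces dimensionality) can be illustrated with a minimal sketch. This is not the authors' implementation: HLDA estimation is iterative and is not reproduced here, so PCA, one of the earlier transforms the abstract mentions, stands in for it. The function names, the random data, and the dimensions (39 MFCC coefficients, 25 TRAP features, 39 output dimensions) are all assumptions chosen for illustration.

```python
import numpy as np


def combine_streams(mfcc, trap):
    """Concatenate two frame-synchronous feature streams along the feature axis."""
    if mfcc.shape[0] != trap.shape[0]:
        raise ValueError("streams must have the same number of frames")
    return np.hstack([mfcc, trap])


def fit_pca(features, out_dim):
    """Estimate a PCA projection (a stand-in for HLDA, not HLDA itself)."""
    mean = features.mean(axis=0)
    cov = np.cov(features - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:out_dim]  # keep the largest-variance axes
    return mean, eigvecs[:, order]


def apply_transform(features, mean, proj):
    """Center the features and project them onto the retained axes."""
    return (features - mean) @ proj


# Hypothetical data: 500 frames, 39-dim MFCC stream, 25-dim TRAP stream.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(500, 39))
trap = rng.normal(size=(500, 25))

combined = combine_streams(mfcc, trap)            # 500 x 64
mean, proj = fit_pca(combined, out_dim=39)
reduced = apply_transform(combined, mean, proj)   # 500 x 39, decorrelated
```

In this sketch the reduced features would then serve as the input to the GMM-HMM system. Unlike the PCA stand-in, HLDA uses class labels and does not assume that all classes share one covariance matrix, which is the property the abstract credits for the improvement.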
@misc{BUT63339,
author="Martin {Karafiát} and František {Grézl} and Lukáš {Burget}",
title="Combination of MFCC and TRAP features for LVCSR of meeting data",
year="2004",
pages="1",
address="Martigny",
note="presentation, poster"
}