Publication Details
Temporal processing for feature extraction in speech recognition, habilitation thesis
speech recognition, feature extraction
Speech recognition is a booming research field with a large number of
applications in telecommunications (especially mobile), the automobile
industry, consumer electronics, military and security, etc. Speech
recognition systems are classically built from three basic blocks:
feature extraction, acoustic matching and language modeling. While the
last two are trained on data (annotated databases for acoustics and
large speech corpora for the LM), the feature extraction block is often
neglected, and in most cases mel-frequency cepstral coefficients (MFCC)
are used. This work concentrates on two techniques that should improve
feature extraction. The first is temporal filtering of feature
trajectories with filters derived from data by Linear Discriminant
Analysis (LDA). This technique is shown to improve the recognition
accuracy of isolated Czech words, confirming previous results on
US-English obtained by our colleagues from OGI Portland. The second part
of the work concentrates on a more revolutionary approach to feature
extraction using TRAPs (temporal patterns), whose fundamentals were also
laid at OGI. Several experiments were conducted on three databases
during the author's stay at OGI. Although we have shown that TRAPs are
comparable to MFCCs only on a small-vocabulary recognition task, we
believe that the combination of frequency-band processing and neural
nets will become very important in the next decade, and that they will
become standard blocks of feature extraction. A conclusion chapter is
included for both methods, giving directions for current and future work
at both OGI Portland and VUT Brno.
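
The first technique, temporal filters derived from data by LDA, can be illustrated with a minimal sketch. This is not the implementation used in the thesis: the function names `lda_temporal_filters` and `apply_filter`, the small ridge term, and the input layout (fixed-length windows cut from the time trajectory of a single feature, each labeled with the class of its center frame) are assumptions made for illustration only. The sketch builds within-class and between-class scatter matrices over the windows, solves the generalized eigenproblem, and treats the leading eigenvectors as FIR filter coefficients.

```python
import numpy as np


def lda_temporal_filters(segments, labels, n_filters=1, ridge=1e-6):
    """Derive FIR filters for one feature trajectory with LDA.

    segments: array (n_examples, T), windows of T consecutive frames
              of a single feature (hypothetical input layout).
    labels:   array (n_examples,), class of each window's center frame.
    Returns (filters, eigenvalues); filters has shape (n_filters, T).
    """
    segments = np.asarray(segments, dtype=float)
    labels = np.asarray(labels)
    T = segments.shape[1]
    global_mean = segments.mean(axis=0)

    Sw = np.zeros((T, T))  # within-class scatter
    Sb = np.zeros((T, T))  # between-class scatter
    for c in np.unique(labels):
        X = segments[labels == c]
        mu = X.mean(axis=0)
        D = X - mu
        Sw += D.T @ D
        d = (mu - global_mean)[:, None]
        Sb += X.shape[0] * (d @ d.T)

    # Small ridge (an assumption of this sketch) keeps Sw positive definite.
    Sw += ridge * np.trace(Sw) / T * np.eye(T)

    # Solve Sb v = lambda Sw v by whitening with the Cholesky factor of Sw.
    L = np.linalg.cholesky(Sw)
    L_inv = np.linalg.inv(L)
    eigvals, U = np.linalg.eigh(L_inv @ Sb @ L_inv.T)

    order = np.argsort(eigvals)[::-1][:n_filters]
    V = L_inv.T @ U[:, order]
    V /= np.linalg.norm(V, axis=0)  # unit-norm filters
    return V.T, eigvals[order]


def apply_filter(trajectory, fir):
    """Filter a 1-D feature trajectory with one LDA-derived filter.

    The filter is reversed so that each output sample equals the LDA
    projection of the window centered on that frame (odd-length filters).
    """
    return np.convolve(np.asarray(trajectory, dtype=float), fir[::-1], mode="same")
```

In use, one such filter bank would be estimated per feature (for example per cepstral coefficient or per frequency band) from labeled training data, and each trajectory would then be passed through `apply_filter` before acoustic matching. The window length and the number of filters kept are tuning choices that this sketch leaves open.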
@misc{BUT67489,
author="Jan {Černocký}",
title="Temporal processing for feature extraction in speech recognition, habilitation thesis",
year="2002",
pages="80",
address="Brno",
url="http://www.fit.vutbr.cz/~cernocky/publi/2002/habil.pdf",
note="habilitation thesis"
}