Publication Details

VISUAL FEATURES FOR MULTIMODAL SPEECH RECOGNITION

MOTLÍČEK, P.; BURGET, L.; ČERNOCKÝ, J. VISUAL FEATURES FOR MULTIMODAL SPEECH RECOGNITION. Radioelektronika 2005. Brno: Faculty of Electrical Engineering and Communication BUT, 2005. p. 187-190. ISBN: 80-214-2904-6.

Czech title

Video parametry pro multimodální rozpoznávání řeči

Type

conference paper

Language

English

Authors

Motlíček Petr, doc. Ing., Ph.D. (DCGM)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)

Keywords

speech recognition, feature extraction, parameterization, visual features, linear transforms, meeting data

Abstract

This paper proposes a bimodal speech recognition scheme using visual parameters extracted from meeting recordings.

Annotation

This paper demonstrates the use of visual parameters extracted from video for automatic recognition of phoneme strings. Encouraged by previous work on "visually clean" data, we investigate the efficiency of these parameters under the non-ideal conditions introduced by the audio-visual meeting data used in our experiments.

Published

2005

Pages

187–190

Proceedings

Radioelektronika 2005

Conference

15th International Czech-Slovak Scientific Conference Radioelektronika 2005, Brno, CZ

ISBN

80-214-2904-6

Publisher

Faculty of Electrical Engineering and Communication BUT

Place

Brno

BibTeX

@inproceedings{BUT21499,
  author="Petr {Motlíček} and Lukáš {Burget} and Jan {Černocký}",
  title="VISUAL FEATURES FOR MULTIMODAL SPEECH RECOGNITION",
  booktitle="Radioelektronika 2005",
  year="2005",
  pages="187--190",
  publisher="Faculty of Electrical Engineering and Communication BUT",
  address="Brno",
  isbn="80-214-2904-6",
  url="https://www.fit.vut.cz/research/publication/7784/"
}