Publication Details

EMPLOYMENT OF SUBSPACE GAUSSIAN MIXTURE MODELS IN SPEAKER RECOGNITION

MOTLÍČEK, P.; DEY, S.; MADIKERI, S.; BURGET, L. EMPLOYMENT OF SUBSPACE GAUSSIAN MIXTURE MODELS IN SPEAKER RECOGNITION. In Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. South Brisbane, Queensland: IEEE Signal Processing Society, 2015. p. 4445-4449. ISBN: 978-1-4673-6997-8.

Czech title

Využití podprostorových modelů Gaussovských směsí pro rozpoznávání mluvčího (Use of subspace Gaussian mixture models for speaker recognition)

Type

conference paper

Language

English

Authors

Motlíček Petr, doc. Ing., Ph.D. (DCGM)
Dey Subhadeep
Madikeri Srikanth
Burget Lukáš, doc. Ing., Ph.D. (DCGM)

URL

https://ieeexplore.ieee.org/document/7178811

Keywords

speaker recognition, i-vectors, subspace Gaussian mixture models, automatic speech recognition

Abstract

This paper presents a Subspace Gaussian Mixture Model (SGMM) approach, employed as a probabilistic generative model to estimate speaker vector representations that are subsequently used in the speaker verification task. SGMMs have already been shown to significantly outperform traditional HMM/GMMs in Automatic Speech Recognition (ASR) applications. An extension to the basic SGMM framework allows low-dimensional speaker vectors to be estimated robustly and exploited for speaker adaptation. We propose a speaker verification framework based on low-dimensional speaker vectors estimated using SGMMs, trained in an ASR manner using manual transcriptions. To test the robustness of the system, we evaluate the proposed approach against a state-of-the-art i-vector extractor on the NIST SRE 2010 evaluation set under four utterance-length conditions: 3–10 s, 10–30 s, 30–60 s, and full (untruncated) utterances. Experimental results reveal that while the i-vector system performs better on utterances truncated to 3–10 s and 10–30 s, noticeable improvements are observed with SGMMs, especially on full-length utterances. Finally, the proposed SGMM approach exhibits complementary properties and can thus be efficiently fused with an i-vector-based speaker verification system.
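For orientation, the SGMM with a speaker subspace, in the form commonly used for ASR (Povey et al., 2011), can be written as below. Here $j$ indexes HMM states, $m$ substates, and $i$ the $I$ Gaussians shared across all states; $\mathbf{v}_{jm}$ are state vectors, and $\mathbf{M}_i$, $\mathbf{N}_i$, $\mathbf{w}_i$, $\boldsymbol{\Sigma}_i$ are globally shared parameters. The low-dimensional speaker vector $\mathbf{v}^{(s)}$ shifts every Gaussian mean through the speaker projection $\mathbf{N}_i$, and it is this kind of vector that serves as the speaker representation for verification. The notation is assumed from the standard SGMM literature and may differ from the paper's own.

```latex
\begin{aligned}
p(\mathbf{x} \mid j, s) &= \sum_{m} c_{jm} \sum_{i=1}^{I} w_{jmi}\,
  \mathcal{N}\!\left(\mathbf{x};\, \boldsymbol{\mu}_{jmi}^{(s)},\, \boldsymbol{\Sigma}_i\right) \\
\boldsymbol{\mu}_{jmi}^{(s)} &= \mathbf{M}_i \mathbf{v}_{jm} + \mathbf{N}_i \mathbf{v}^{(s)} \\
w_{jmi} &= \frac{\exp\left(\mathbf{w}_i^{\top} \mathbf{v}_{jm}\right)}
               {\sum_{i'=1}^{I} \exp\left(\mathbf{w}_{i'}^{\top} \mathbf{v}_{jm}\right)}
\end{aligned}
```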
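The abstract does not state how trials are scored or how the SGMM and i-vector systems are fused. The sketch below assumes cosine scoring between enrolment and test speaker vectors, followed by a linear score fusion whose weights are fit by logistic regression; both are common choices in speaker verification, but the paper's actual back-end (for example PLDA scoring) may differ. The function names (`cosine_score`, `train_linear_fusion`, `fuse`) and the synthetic scores are hypothetical.

```python
import numpy as np


def cosine_score(enrol: np.ndarray, test: np.ndarray) -> float:
    """Cosine similarity between two speaker vectors (i-vectors or SGMM speaker vectors).

    Assumption: cosine scoring is used here only for illustration.
    """
    return float(enrol @ test / (np.linalg.norm(enrol) * np.linalg.norm(test)))


def train_linear_fusion(scores: np.ndarray, labels: np.ndarray,
                        lr: float = 0.1, epochs: int = 2000) -> np.ndarray:
    """Fit per-system weights and a bias for fused = scores @ w + b.

    Uses plain gradient descent on the logistic (cross-entropy) loss.
    scores: (n_trials, n_systems); labels: 1.0 = target trial, 0.0 = non-target.
    Returns [w_1, ..., w_k, b].
    """
    n, _ = scores.shape
    X = np.hstack([scores, np.ones((n, 1))])   # append a constant column for the bias
    params = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ params))  # predicted target probability
        grad = X.T @ (p - labels) / n          # gradient of the mean logistic loss
        params -= lr * grad
    return params


def fuse(scores: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Apply the learned linear fusion to per-system scores."""
    return scores @ params[:-1] + params[-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1000
    labels = rng.integers(0, 2, size=n).astype(float)
    # Hypothetical synthetic scores: each system sees the true label through independent noise.
    sgmm_scores = (2 * labels - 1) + rng.normal(0.0, 1.5, n)
    ivec_scores = (2 * labels - 1) + rng.normal(0.0, 1.5, n)
    scores = np.column_stack([sgmm_scores, ivec_scores])
    params = train_linear_fusion(scores, labels)
    print("weights (sgmm, ivec) and bias:", params)
```

Linear fusion rewards complementarity: when two systems make partly independent errors, a fitted weighted sum of their scores typically separates target from non-target trials better than either score alone.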

Published

2015

Pages

4445–4449

Proceedings

Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing

Conference

2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), Brisbane, AU

ISBN

978-1-4673-6997-8

Publisher

IEEE Signal Processing Society

Place

South Brisbane, Queensland

DOI

10.1109/ICASSP.2015.7178811

UT WoS

000427402904111

EID Scopus

2-s2.0-84946019484

BibTeX

@inproceedings{BUT119895,
  author="Petr {Motlíček} and Subhadeep {Dey} and Srikanth {Madikeri} and Lukáš {Burget}",
  title="EMPLOYMENT OF SUBSPACE GAUSSIAN MIXTURE MODELS IN SPEAKER RECOGNITION",
  booktitle="Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing",
  year="2015",
  pages="4445--4449",
  publisher="IEEE Signal Processing Society",
  address="South Brisbane, Queensland",
  doi="10.1109/ICASSP.2015.7178811",
  isbn="978-1-4673-6997-8",
  url="https://ieeexplore.ieee.org/document/7178811"
}

Files

pdf motlicek_icassp2015_0004445.pdf 446 kB