Publication Details

Learning document representations using subspace multinomial model

KESIRAJU, S.; BURGET, L.; SZŐKE, I.; ČERNOCKÝ, J. Learning document representations using subspace multinomial model. In Proceedings of Interspeech 2016. San Francisco: International Speech Communication Association, 2016. p. 700-704. ISBN: 978-1-5108-3313-5.
Czech title
Učení reprezentací dokumentů pomocí podprostorového multinomiálního modelu
Type
conference paper
Language
English
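Authors
Santosh Kesiraju, Lukáš Burget, Igor Szőke, Jan Černocký
URL
https://www.researchgate.net/publication/307889473_Learning_Document_Representations_Using_Subspace_Multinomial_Model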
Keywords

Document representation, subspace modelling, topic identification, latent topic discovery

Abstract

The subspace multinomial model (SMM) is a log-linear model that can be used to learn low-dimensional continuous representations of discrete data. SMM and its variants have been used for speaker verification based on prosodic features and for phonotactic language recognition. In this paper, we propose a new variant of SMM that introduces sparsity, which we call ℓ1 SMM. We show that ℓ1 SMM can be used to learn document representations that are helpful in topic identification (classification) and clustering tasks. Our document classification experiments show that SMM achieves results comparable to models such as latent Dirichlet allocation and sparse topical coding, with the useful property that the resulting document vectors are Gaussian distributed.
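
To make the log-linear formulation in the abstract concrete, the sketch below shows one common way an SMM-style model is written: each document d gets a K-dimensional vector w_d, and its word distribution is a softmax over m + T w_d, where m is a V-dimensional bias (often a log background unigram distribution) and T is a V x K subspace matrix. This is a minimal sketch under stated assumptions, not the authors' implementation: we assume the sparsity in ℓ1 SMM comes from an ℓ1 penalty on T and that document vectors carry an ℓ2 penalty; plain gradient ascent stands in for whatever optimiser the paper uses; only w_d is estimated (T and m are held fixed); and all function names and toy values are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # V rows (one per vocabulary word), K columns


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    mx = max(z)
    exps = [math.exp(v - mx) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def word_distribution(m: Vector, T: Matrix, w: Vector) -> Vector:
    """theta_v = exp(m_v + t_v . w) / sum_u exp(m_u + t_u . w)."""
    logits = [m_v + sum(t_vk * w_k for t_vk, w_k in zip(t_v, w))
              for m_v, t_v in zip(m, T)]
    return softmax(logits)


def objective(counts: Matrix, m: Vector, T: Matrix, W: Matrix,
              lam_t: float, lam_w: float) -> float:
    """Multinomial log-likelihood minus an l1 penalty on T (assumed source
    of sparsity) and an l2 penalty on the document vectors (assumed)."""
    ll = 0.0
    for x_d, w_d in zip(counts, W):
        theta = word_distribution(m, T, w_d)
        ll += sum(x * math.log(p) for x, p in zip(x_d, theta) if x > 0)
    l1 = sum(abs(t) for row in T for t in row)
    l2 = sum(w * w for w_d in W for w in w_d)
    return ll - lam_t * l1 - 0.5 * lam_w * l2


def grad_w(x_d: Vector, m: Vector, T: Matrix, w_d: Vector,
           lam_w: float) -> Vector:
    """Gradient of one document's penalised log-likelihood w.r.t. w_d:
    T^T (x_d - N_d * theta_d) - lam_w * w_d."""
    theta = word_distribution(m, T, w_d)
    n_d = sum(x_d)
    resid = [x - n_d * p for x, p in zip(x_d, theta)]
    return [sum(T[v][k] * resid[v] for v in range(len(resid))) - lam_w * w_d[k]
            for k in range(len(w_d))]


def extract_document_vector(x_d: Vector, m: Vector, T: Matrix,
                            lam_w: float = 1.0, lr: float = 0.1,
                            steps: int = 200) -> Vector:
    """Plain gradient ascent on w_d with m and T held fixed (a stand-in
    for the paper's own optimiser)."""
    w = [0.0] * len(T[0])
    for _ in range(steps):
        g = grad_w(x_d, m, T, w, lam_w)
        w = [w_k + lr * g_k for w_k, g_k in zip(w, g)]
    return w


if __name__ == "__main__":
    m = [0.0, 0.0, 0.0]          # hypothetical uniform (log) background
    T = [[1.0], [0.0], [-1.0]]   # hypothetical 3-word vocabulary, 1-D subspace
    x = [5, 1, 0]                # word counts of one toy document
    w = extract_document_vector(x, m, T)
    print(w, word_distribution(m, T, w))
```

Because the penalised objective is concave in w_d, a small fixed step size is enough for this toy case. The toy document uses word 0 most often, and word 0 has a positive loading in T, so the extracted one-dimensional document vector comes out positive.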

Published
2016
Pages
700–704
Proceedings
Proceedings of Interspeech 2016
ISBN
978-1-5108-3313-5
Publisher
International Speech Communication Association
Place
San Francisco
DOI
10.21437/Interspeech.2016-1634
UT WoS
000409394400145
BibTeX
@inproceedings{BUT132598,
  author="Santosh {Kesiraju} and Lukáš {Burget} and Igor {Szőke} and Jan {Černocký}",
  title="Learning document representations using subspace multinomial model",
  booktitle="Proceedings of Interspeech 2016",
  year="2016",
  pages="700--704",
  publisher="International Speech Communication Association",
  address="San Francisco",
  doi="10.21437/Interspeech.2016-1634",
  isbn="978-1-5108-3313-5",
  url="https://www.researchgate.net/publication/307889473_Learning_Document_Representations_Using_Subspace_Multinomial_Model"
}