Publication Details

Nalezení slovních kořenů v češtině

CHMELAŘ, P.; HELLEBRAND, D.; HRUŠECKÝ, M.; BARTÍK, V. Nalezení slovních kořenů v češtině. CEUR Workshop Proceedings, 2011, vol. 2011, no. 802, pp. 1–12. ISSN: 1613-0073.
English title
Czech Stemming Algorithm
Type
journal article
Language
Czech
Authors
Chmelař Petr, Ing.
Hellebrand David, Ing.
Hrušecký Michal
Bartík Vladimír, Ing., Ph.D. (DIFS)
Keywords

Lemmatization, stemming, Snowball, Czech, grammar.

Abstract

The goal was to create a stemming algorithm for the Czech language based on grammatical rules, as a complement to vocabulary-based methods for the retrieval and mining of Czech texts. The article covers the basics of Czech word formation for different word classes, describes the problems involved, and presents several stemming and lemmatization algorithms. The main contribution of this work is an implementation of the Snowball stemming algorithm for the Czech language, based on complete sets of the prefixes and suffixes that may occur in Czech words.

Published
2011
Pages
1–12
Journal
CEUR Workshop Proceedings, vol. 2011, no. 802, ISSN 1613-0073
Book
Selected papers from the 10th annual Czech and Slovak knowledge technology conference (Znalosti 2011)
Publisher
Aachen University of Technology
Place
Aachen
BibTeX
@article{BUT91156,
  author="Petr {Chmelař} and David {Hellebrand} and Michal {Hrušecký} and Vladimír {Bartík}",
  title="Nalezení slovních kořenů v češtině",
  journal="CEUR Workshop Proceedings",
  year="2011",
  volume="2011",
  number="802",
  pages="1--12",
  issn="1613-0073",
  url="http://www.ceur-ws.org/Vol-802"
}