Publication Details

Multi-aspect Document Content Analysis using Ontological Modelling

MILIČKA, M.; BURGET, R. Multi-aspect Document Content Analysis using Ontological Modelling. Proceedings of 9th Workshop on Intelligent and Knowledge Oriented Technologies (WIKT 2014). Smolenice: Vydavateľstvo STU, 2014. p. 9-12. ISBN: 978-80-227-4267-2.
Czech title
Analýza více aspektů obsahu dokumentu s využitím ontologií
Type
conference paper
Language
English
Authors
Milička Martin, Ing.
Burget Radek, doc. Ing., Ph.D. (DIFS)
Keywords

document modeling, information extraction, page segmentation, content
classification, ontology, RDF

Abstract

Existing methods of information extraction from web documents are usually based
on a single aspect of the document or its contents, such as the code, textual
features or visual features. Due to the great variability of the available online
documents, it seems reasonable to combine multiple kinds of analysis in order to
use all the available knowledge for identifying a particular piece of information
in the document. In this paper, we propose an ontological document model that
makes it possible to integrate the results of the analysis of different document
aspects. We propose a generic architecture of an information extraction system
based on this model and demonstrate its applicability with a practical example.

Published
2014
Pages
9–12
Proceedings
Proceedings of 9th Workshop on Intelligent and Knowledge Oriented Technologies (WIKT 2014)
Conference
9th Workshop on Intelligent and Knowledge Oriented Technologies, Smolenice, SK
ISBN
978-80-227-4267-2
Publisher
Vydavateľstvo STU
Place
Smolenice
BibTeX
@inproceedings{BUT111652,
  author="Martin {Milička} and Radek {Burget}",
  title="Multi-aspect Document Content Analysis using Ontological Modelling",
  booktitle="Proceedings of 9th Workshop on Intelligent and Knowledge Oriented Technologies (WIKT 2014)",
  year="2014",
  pages="9--12",
  publisher="Vydavateľstvo STU",
  address="Smolenice",
  isbn="978-80-227-4267-2",
  url="https://www.fit.vut.cz/research/publication/10724/"
}