Publication Details

Hierarchies in HTML Documents: Linking Text to Concepts

BURGET, R. Hierarchies in HTML Documents: Linking Text to Concepts. 15th International Workshop on Database and Expert Systems Applications. Zaragoza: IEEE Computer Society, 2004. p. 186-190. ISBN: 0-7695-2195-9.
Czech title
Hierarchie v HTML dokumentech: Přiřazování textu ke konceptům
Type
conference paper
Language
English
Authors
Radek Burget
Keywords

HTML, Information extraction, Ontology, Logical document structure

Abstract

For the Semantic Web to be established successfully, tools are needed for
linking the large amounts of data currently available in HTML documents to
Semantic Web ontologies. Because HTML code is enormously variable, defining
direct bindings between patterns in the HTML code and the concepts is very
limiting. We propose an approach based on modeling the visual part of the
rendered document and describing the key characteristics of the data
presentation in a general way. As a next step, we propose a way of using this
model to locate instances of the concepts in the document using approximate
tree matching algorithms and regular expressions.

Published
2004
Pages
186–190
Proceedings
15th International Workshop on Database and Expert Systems Applications
ISBN
0-7695-2195-9
Publisher
IEEE Computer Society
Place
Zaragoza
BibTeX
@inproceedings{BUT17352,
  author="Radek {Burget}",
  title="Hierarchies in HTML Documents: Linking Text to Concepts",
  booktitle="15th International Workshop on Database and Expert Systems Applications",
  year="2004",
  pages="186--190",
  publisher="IEEE Computer Society",
  address="Zaragoza",
  isbn="0-7695-2195-9"
}