Publication Details

Analyzing Logical Structure of a Web Site

BURGET, R. Analyzing Logical Structure of a Web Site. Proceedings of 5th International Conference ISM '02 - Information Systems Modelling. Ostrava: 2002. p. 29-35. ISBN: 80-85988-70-4.
Type
conference paper
Language
English
Authors
Radek Burget
URL
Keywords

HTML analysis, Semi-structured data, Information extraction

Abstract

Today's World Wide Web consists mainly of documents written in the Hypertext
Markup Language (HTML). This language was developed to describe the look of
documents and their references to other documents, and it therefore has very
poor facilities for describing the semantics and the structure of the contained
data. Moreover, some of these facilities are often not used by the authors of
the documents, or they are not used in an appropriate way. In our work, we
attempt to analyze the look and the structure of a Web site as represented by
the facilities of HTML and to create its logical model, which would represent
the data relations the same way a human user would see them. We propose a tree
representation of a Web site and algorithms for the analysis of the most
important HTML constructions: section headings, lists, tables and links.
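
The idea of a tree representation derived from section headings can be illustrated with a minimal sketch. This is not the algorithm from the paper, which is not reproduced on this page; it only assumes that a heading of level n opens a subtree nested under the nearest preceding heading of a lower level. The names `Node`, `HeadingTreeBuilder`, `build_heading_tree` and `outline` are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class Node:
    """One node of the logical tree: a section title and its subsections."""

    def __init__(self, title, level):
        self.title = title
        self.level = level
        self.children = []


class HeadingTreeBuilder(HTMLParser):
    """Builds a tree from h1..h6 headings (an illustrative assumption,
    not the method described in the paper)."""

    HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.root = Node("(document)", 0)
        self._stack = [self.root]   # path from the root to the current section
        self._level = None          # level of the heading being read, if any
        self._text = []             # text fragments of the heading being read

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = self.HEADINGS[tag]
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            node = Node("".join(self._text).strip(), self._level)
            # Close every open section of the same or a deeper level.
            while self._stack[-1].level >= node.level:
                self._stack.pop()
            self._stack[-1].children.append(node)
            self._stack.append(node)
            self._level = None


def build_heading_tree(html):
    """Parse an HTML string and return the root of the heading tree."""
    builder = HeadingTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as a list of indented section titles."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + child.title)
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    page = "<h1>Products</h1><h2>Printers</h2><h2>Scanners</h2><h1>Contact</h1>"
    print("\n".join(outline(build_heading_tree(page))))
```

A sketch like this relies entirely on heading tags being used correctly, which the abstract notes is often not the case; lists, tables and links would need separate handling.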

Published
2002
Pages
29–35
Proceedings
Proceedings of 5th International Conference ISM '02 - Information Systems Modelling
Conference
5th International Conference on Information System Modelling - ISM'02, Rožnov pod Radhoštěm, CZ
ISBN
80-85988-70-4
Place
Ostrava
BibTeX
@inproceedings{BUT10013,
  author="Radek {Burget}",
  title="Analyzing Logical Structure of a Web Site",
  booktitle="Proceedings of 5th International Conference ISM '02 - Information Systems Modelling",
  year="2002",
  pages="29--35",
  address="Ostrava",
  isbn="80-85988-70-4",
  url="http://www.fit.vutbr.cz/~burgetr/publications/ism2002.ps"
}