Publication Details
HTML Document Analysis for Information Extraction
BURGET, R. HTML Document Analysis for Information Extraction. Proceedings of 8th EEICT conference. Brno: Faculty of Information Technology BUT, 2002. p. 426-430. ISBN: 80-214-2116-9.
Czech title
Analýza HTML dokumentů pro extrakci informace
Type
conference paper
Language
English
Authors
Radek Burget
Keywords
HTML Analysis, Information Extraction
Abstract
Today's World Wide Web contains a vast amount of information stored in HTML documents. However, HTML primarily describes the appearance of documents and provides no facilities for describing the structure of the data they contain. In this paper, we propose a model of a Web site that describes the logical structure of the data it contains. Furthermore, we propose methods for creating such a model by analyzing the appearance and structure of HTML documents.
Published
2002
Pages
426–430
Proceedings
Proceedings of 8th EEICT conference
ISBN
80-214-2116-9
Publisher
Faculty of Information Technology BUT
Place
Brno
BibTeX
@inproceedings{BUT10014,
author="Radek {Burget}",
title="HTML Document Analysis for Information Extraction",
booktitle="Proceedings of 8th EEICT conference",
year="2002",
pages="426--430",
publisher="Faculty of Information Technology BUT",
address="Brno",
isbn="80-214-2116-9"
}