Publication Details

Automatic Web Document Restructuring Based on Visual Information Analysis

BURGET, R. Automatic Web Document Restructuring Based on Visual Information Analysis. In Advances in Intelligent Web Mastering - 2, Proceedings of the 6th Atlantic Web Intelligence Conference - AWIC'2009. Advances in Intelligent and Soft Computing, Vol. 67. Prague: Springer Verlag, 2010. p. 61-70. ISBN: 978-3-642-10686-6.
Czech title
Automatická úprava struktury webových dokumentů na základě analýzy vizuální informace
Type
conference paper
Language
English
Authors
Radek Burget
Keywords

document restructuring, page analysis, page segmentation, block importance

Abstract

Many documents available on the current web have quite a complex structure that
allows them to present various kinds of information. Apart from the main content, the
documents usually contain headers and footers, navigation sections and other
types of additional information. For many applications such as document indexing
or browsing on special devices, it is desirable that the main document
information should precede the additional information in the underlying HTML
code. In this paper, we propose a method of document preprocessing that
automatically restructures the document code according to this criterion. Our
method is based on rendered document analysis. A page segmentation algorithm is
used for detecting the basic blocks on the page and the relevance of the
individual parts is estimated from the visual properties of the text content.
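
The restructuring idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the Block type, the relevance heuristic (text length weighted by font size), and the base font size are all assumptions made for illustration, and the segmentation step that would produce the blocks is taken as given.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A visual block as a page segmentation step might report it (hypothetical type)."""
    html: str          # HTML fragment belonging to the block
    font_size: float   # dominant font size of the block's text, in px
    text_length: int   # number of text characters in the block


def relevance(block: Block, base_font_size: float = 12.0) -> float:
    # Assumed heuristic, not the formula from the paper:
    # longer text set in a larger font is treated as more relevant.
    return block.text_length * (block.font_size / base_font_size)


def restructure(blocks: List[Block]) -> str:
    # Emit the blocks in order of decreasing relevance so that the main
    # content precedes navigation, headers, and footers in the output code.
    # sorted() is stable, so blocks with equal scores keep their source order.
    ordered = sorted(blocks, key=relevance, reverse=True)
    return "\n".join(block.html for block in ordered)
```

Given a navigation block with a few characters of text and a content block with several hundred, restructure() places the content block first, whatever their order in the source.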

Published
2010
Pages
61–70
Proceedings
Advances in Intelligent Web Mastering - 2, Proceedings of the 6th Atlantic Web Intelligence Conference - AWIC'2009
Series
Advances in Intelligent and Soft Computing, Vol. 67
Conference
6th Atlantic Web Intelligence Conference, Prague, CZ
ISBN
978-3-642-10686-6
Publisher
Springer Verlag
Place
Prague
DOI
10.1007/978-3-642-10687-3_6
EID Scopus
BibTeX
@inproceedings{BUT30224,
  author="Radek {Burget}",
  title="Automatic Web Document Restructuring Based on Visual Information Analysis",
  booktitle="Advances in Intelligent Web Mastering - 2, Proceedings of the 6th Atlantic Web Intelligence Conference - AWIC'2009",
  year="2010",
  series="Advances in Intelligent and Soft Computing, Vol. 67",
  pages="61--70",
  publisher="Springer Verlag",
  address="Prague",
  doi="10.1007/978-3-642-10687-3_6",
  isbn="978-3-642-10686-6"
}