Publication Details

Cluster-based Page Segmentation - a fast and precise method for web page pre-processing

ZELENÝ, J.; BURGET, R. Cluster-based Page Segmentation - a fast and precise method for web page pre-processing. In The Third International Conference on Web Intelligence, Mining and Semantics. Madrid: Association for Computing Machinery, 2013. p. 1-12. ISBN: 978-1-4503-1850-1.
Czech title
Cluster-based Page Segmentation - rychlá a přesná metoda pro předzpracování webových stránek
Type
conference paper
Language
English
Authors
Zelený Jan, Ing., Ph.D.
Burget Radek, doc. Ing., Ph.D. (DIFS)
Keywords

VIPS, vision-based page segmentation, clustering, template, template detection

Abstract

Segmenting a web page may be one of the initial steps of information retrieval or
content classification performed on that page. While there has been extensive
research in this area, existing approaches usually focus on either performance or
the quality of the results. Vision-based segmentation is one of the quality-focused
methods, which are considerably slow. This paper proposes an approach for
boosting the performance of vision-based algorithms. Our approach is based on
concepts of the modern web and a very common scenario in which an entire web site is
processed at once. In this scenario, a substantial performance boost can be
gained by isomorphic mapping of previous results gathered from pages within the
site to other pages on the same site. We provide the results of experiments
performed on VIPS, the most common algorithm for page segmentation.

Published
2013
Pages
1–12
Proceedings
The Third International Conference on Web Intelligence, Mining and Semantics
Conference
International Conference on Web Intelligence, Mining and Semantics, Madrid, ES
ISBN
978-1-4503-1850-1
Publisher
Association for Computing Machinery
Place
Madrid
DOI
10.1145/2479787.2479792
BibTeX
@inproceedings{BUT106483,
  author="Jan {Zelený} and Radek {Burget}",
  title="Cluster-based Page Segmentation - a fast and precise method for web page pre-processing",
  booktitle="The Third International Conference on Web Intelligence, Mining and Semantics",
  year="2013",
  pages="1--12",
  publisher="Association for Computing Machinery",
  address="Madrid",
  doi="10.1145/2479787.2479792",
  isbn="978-1-4503-1850-1",
  url="https://www.fit.vut.cz/research/publication/10252/"
}