Publication Details
Accelerating the process of web page segmentation via template clustering
Burget Radek, doc. Ing., Ph.D. (DIFS)
VIPS, page segmentation, vision-based page segmentation, web page segmentation,
web page preprocessing, segmentation performance, clustering, template, template
detection
Segmenting a web page is often one of the first steps in mining data from that
page. A considerable body of research addresses segmentation based on the visual
perception of the web page. In this paper we propose a method for improving the
efficiency of virtually all vision-based segmentation algorithms. Our method,
called Cluster-based Page Segmentation, exploits the widespread use of web
templates: it clusters web pages and performs the segmentation on the cluster
as a whole instead of on each page in that cluster. To demonstrate the
efficiency of our algorithm, we present experimental results gathered using
three different vision-based segmentation algorithms.
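
The idea in the abstract, segmenting once per template cluster rather than once per page, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how pages are clustered, so the sketch assumes a deliberately simple rule (pages belong to the same cluster when they share the same set of root-to-element tag paths). The names `template_signature`, `segment_by_cluster`, and the `segment` callback are hypothetical and stand in for any vision-based segmentation algorithm.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _TagPathCollector(HTMLParser):
    """Records the root-to-element tag path of every start tag."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = []

    def handle_starttag(self, tag, attrs):
        self.paths.append("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerate unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def template_signature(html):
    """Assumed clustering key: the set of distinct tag paths in the page."""
    collector = _TagPathCollector()
    collector.feed(html)
    return frozenset(collector.paths)


def segment_by_cluster(pages, segment):
    """Run `segment` once per cluster and share the result within it.

    pages   -- dict mapping a page identifier to its HTML source
    segment -- hypothetical callable wrapping an expensive segmentation
    """
    clusters = defaultdict(list)
    for page_id, html in pages.items():
        clusters[template_signature(html)].append(page_id)

    results = {}
    for page_ids in clusters.values():
        # Segment only the first page of the cluster as its representative.
        segmentation = segment(pages[page_ids[0]])
        for page_id in page_ids:
            results[page_id] = segmentation
    return results
```

With this arrangement the cost of the segmentation step grows with the number of clusters rather than the number of pages. Exact signature matching is the crudest possible clustering rule and is used here only to keep the sketch short; a similarity threshold would tolerate small structural differences between pages built from the same template.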
@article{BUT130902,
author="Jan {Zelený} and Radek {Burget}",
title="Accelerating the process of web page segmentation via template clustering",
journal="International Journal of Intelligent Information and Database Systems",
year="2016",
volume="2016",
number="2",
pages="134--153",
doi="10.1504/IJIIDS.2016.075424",
issn="1751-5858",
url="https://www.fit.vut.cz/research/publication/10530/"
}