Product Details

Information Extraction Tools from CEUR Workshop Pages

Created: 2015

Czech title
Nástroje pro extrakci informací ze stránek workshopů CEUR
Type
software
License
Use of the result by another entity always requires a license.
License Fee
The licensor does not require a license fee for the result
Authors
Burget Radek, doc. Ing., Ph.D. (DIFS)
Milička Martin, Ing.
Keywords

information extraction, web mining, document analysis, text classification

Description

This project implements applications and tools for automatic information
extraction from CEUR-WS.org workshop proceedings pages. The tools take CEUR
HTML pages as input and produce a structured linked dataset in RDF format. The
implementation builds on the existing FITLayout document analysis framework,
with many extensions specific to this task. The resulting data may be used to
evaluate the quality of individual CEUR workshops. The tools were created as a
proposed solution to Task 1 of the Semantic Publishing Challenge 2015,
co-located with the Extended Semantic Web Conference (ESWC) 2015, where they
received the Best Performing Tool and Most Innovative Approach awards. They also
provide a case study demonstrating the developed document analysis methods.
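To illustrate the input/output relationship described above (CEUR HTML in, RDF out), here is a minimal sketch. It does not reproduce the FITLayout-based approach used by the actual tools. It assumes that paper titles on a volume page are marked with a CEURTITLE class attribute (treat this as an assumption; many volumes lack such markup, which is exactly why a more robust document analysis is needed), uses Dublin Core terms purely as an illustrative vocabulary, and mints hypothetical paper IRIs. The output is a list of N-Triples lines.

```python
from html.parser import HTMLParser

DCT = "http://purl.org/dc/terms/"  # Dublin Core terms, used here only for illustration
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collects the whitespace-normalised text of every element carrying target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0    # nesting depth inside the current matching element (0 = outside)
        self.buf = []     # text fragments of the current matching element
        self.texts = []   # finished results, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:  # void elements never get an end tag, so do not count them
            return
        if self.depth:
            self.depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.texts.append(" ".join("".join(self.buf).split()))

    def handle_data(self, data):
        if self.depth:
            self.buf.append(data)


def escape_literal(text):
    """Escapes backslashes and double quotes for an N-Triples string literal.

    Newlines cannot occur here because whitespace is collapsed by the collector."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(volume_iri, html, target_class="CEURTITLE"):
    """Returns N-Triples lines linking a volume to its papers and their titles."""
    parser = ClassTextCollector(target_class)
    parser.feed(html)
    parser.close()
    triples = []
    for i, title in enumerate(parser.texts, start=1):
        paper_iri = f"{volume_iri}#paper{i}"  # hypothetical IRI scheme
        triples.append(f"<{volume_iri}> <{DCT}hasPart> <{paper_iri}> .")
        triples.append(f'<{paper_iri}> <{DCT}title> "{escape_literal(title)}" .')
    return triples


page = ('<li><span class="CEURTITLE">Mining  Workshop\nPages</span> '
        '<span class="CEURAUTHOR">A. Author</span></li>')
for line in to_ntriples("http://ceur-ws.org/Vol-0000/", page):
    print(line)
```

For the sample page (with the hypothetical volume IRI Vol-0000), this prints one hasPart triple and one title triple with the literal "Mining Workshop Pages". Relying on explicit class markup keeps the sketch short, but it breaks as soon as a page omits that markup. Handling such pages is the harder problem the layout-based tools address.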

License Conditions

Free software under the terms of the GNU GPL.

Projects
The IT4Innovations Centre of Excellence, MŠMT (Ministry of Education, Youth and Sports), Operational Programme Research and Development for Innovations, ED1.1.00/02.0070, start: 2011-01-01, end: 2015-12-31, completed
Research of Advanced ICT Methods and Their Application, BUT, BUT internal projects, FIT-S-14-2299, start: 2014-01-01, end: 2016-12-31, completed