Product Details

Automatic document quality assessment software module

Created: 2019

Czech title
Softwarový nástroj pro automatické měření obrazové kvality digitalizovaných textových dokumentů (Software tool for automatic measurement of the image quality of digitized text documents)
Use of the result by another entity always requires a license.
License Fee
The licensor does not charge a license fee for the result.
Bako Matúš, Ing.
Buchal Petr, Ing.
Hradiš Michal, Ing., Ph.D. (DCGM)

OCR, document, text quality, readability, Convolutional Networks


This tool automatically assesses the quality of digitized documents. The
estimated quality scores closely correspond to how readable the documents are to
human readers. For each document page, the tool produces quality-score heatmaps
and an overall quality score. Local perceptual quality scores are computed either
from the confidence scores of Optical Character Recognition (OCR) or directly by
a fast convolutional neural network.
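The page description does not say how local scores are turned into a heatmap and a single page score. The sketch below shows one plausible way, under stated assumptions: each text line carries a perceptual score in [0, 1] (the `LineScore` box type is hypothetical), the heatmap is a coarse grid where each cell holds the area-weighted mean of the line scores overlapping it, and the overall page score is the area-weighted mean over all lines. The grid size and the weighting are illustrative choices, not the tool's documented method.

```python
from dataclasses import dataclass


@dataclass
class LineScore:
    """Hypothetical local measurement: a text-line box (exclusive x1, y1) and its score in [0, 1]."""
    x0: int
    y0: int
    x1: int
    y1: int
    score: float


def quality_heatmap(lines, width, height, cell=32):
    """Rasterize line scores into a coarse grid; cells without any text stay None."""
    cols = (width + cell - 1) // cell
    rows = (height + cell - 1) // cell
    sums = [[0.0] * cols for _ in range(rows)]
    weights = [[0] * cols for _ in range(rows)]
    for ln in lines:
        # Visit only the grid cells the line box can touch (clamped to the page grid).
        for r in range(ln.y0 // cell, min((ln.y1 - 1) // cell, rows - 1) + 1):
            for c in range(ln.x0 // cell, min((ln.x1 - 1) // cell, cols - 1) + 1):
                # Overlap between the line box and this grid cell, used as the weight.
                ox = min(ln.x1, (c + 1) * cell) - max(ln.x0, c * cell)
                oy = min(ln.y1, (r + 1) * cell) - max(ln.y0, r * cell)
                if ox > 0 and oy > 0:
                    sums[r][c] += ln.score * ox * oy
                    weights[r][c] += ox * oy
    return [[sums[r][c] / weights[r][c] if weights[r][c] else None
             for c in range(cols)] for r in range(rows)]


def page_score(lines):
    """Assumed overall page score: area-weighted mean of the line scores (None if no text)."""
    areas = [(ln.x1 - ln.x0) * (ln.y1 - ln.y0) for ln in lines]
    total = sum(areas)
    if total == 0:
        return None
    return sum(ln.score * a for ln, a in zip(lines, areas)) / total
```

Weighting by area lets long, well-recognized lines dominate the page score; a low percentile instead of the mean would be a reasonable alternative if a few unreadable regions should pull the whole page down.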

This module is built on top of the OCR engine developed in the PERO project
(pero-ocr). Text recognition works in several stages. First, the locations and
heights of text lines are determined by a fully convolutional neural network (a
modified U-Net). The individual text lines are then processed by
convolutional-recurrent networks trained with the Connectionist Temporal
Classification (CTC) loss. These networks output confidences for the recognized
characters, which are mapped locally to perceptual scores. This mapping was
calibrated on a large dataset of readability ratings collected from human readers.
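To make the last two steps concrete, here is a minimal sketch of how per-character confidences can come out of a CTC recognizer and then be mapped to perceptual scores. It is not the pero-ocr API. Assumptions: label 0 is the CTC blank, decoding is greedy (best label per frame, collapse repeats, drop blanks), a character's confidence is its highest frame probability, and the calibration is a monotone piecewise-linear curve. The knots in `CALIBRATION` are invented placeholders; the real mapping was fitted to human readability ratings, which are not given here. Averaging mapped scores per line (`line_score`) is likewise an illustrative choice.

```python
import bisect

BLANK = 0  # assumed index of the CTC blank label

# Hypothetical calibration knots (OCR confidence -> perceptual score), monotone increasing.
CALIBRATION = [(0.0, 0.0), (0.5, 0.2), (0.9, 0.7), (1.0, 1.0)]


def ctc_greedy_with_confidence(frame_probs, alphabet):
    """Greedy CTC decoding that also returns one confidence per emitted character.

    frame_probs: per-frame softmax outputs; label 0 is blank, label k > 0 is alphabet[k - 1].
    """
    chars, confs = [], []
    prev = BLANK
    for probs in frame_probs:
        label = max(range(len(probs)), key=probs.__getitem__)
        if label != BLANK and label != prev:
            # A new character starts here (repeats separated by a blank count as new).
            chars.append(alphabet[label - 1])
            confs.append(probs[label])
        elif label != BLANK:
            # Same character spread over several frames: keep its most confident frame.
            confs[-1] = max(confs[-1], probs[label])
        prev = label
    return "".join(chars), confs


def to_perceptual(conf, knots=CALIBRATION):
    """Piecewise-linear interpolation between calibration knots, clamped at both ends."""
    xs = [x for x, _ in knots]
    if conf <= xs[0]:
        return knots[0][1]
    if conf >= xs[-1]:
        return knots[-1][1]
    i = bisect.bisect_right(xs, conf)
    (x0, y0), (x1, y1) = knots[i - 1], knots[i]
    return y0 + (y1 - y0) * (conf - x0) / (x1 - x0)


def line_score(confs):
    """Assumed local aggregation: mean perceptual score of a line's characters."""
    if not confs:
        return None
    return sum(to_perceptual(c) for c in confs) / len(confs)
```

For example, decoding the frames `[[0.1, 0.8, 0.1], [0.1, 0.9, 0.0], [0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.2, 0.6]]` with alphabet `"ab"` yields the text `"aab"` with confidences `[0.9, 0.6, 0.6]`. A monotone curve keeps the ordering of OCR confidences while reshaping them to match how humans rate readability.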

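Advanced content extraction and recognition for printed and handwritten documents for better accessibility and usability, MK (Ministry of Culture of the Czech Republic), Program na podporu aplikovaného výzkumu a experimentálního vývoje národní a kulturní identity na léta 2016 až 2022 (NAKI II) (Programme to support applied research and experimental development of national and cultural identity for 2016 to 2022), DG18P02OVV055, start: 2018-03-01, end: 2022-12-31, status: running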