JBIG2 Supported by OCR

Bibliographic Details
Main Author: Radim Hatlapatka
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Subjects:
DML
OCR
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.294.6794
http://ceur-ws.org/Vol-921/wip-04.pdf
Description
Summary: Abstract. Digital mathematical libraries contain a large volume of PDF documents consisting of scanned text. In this paper we describe how these documents can be compressed and thus delivered to users more efficiently. We introduce the JBIG2 standard for compressing bitonal images such as scanned text, and we discuss the issues that arise when OCR is used to improve the compression ratio of the open-source jbig2enc encoder. For this purpose we have designed an API for using OCR in jbig2enc, which we describe in this paper together with the results achieved so far.