PDF Enhancements Tools for a Digital Library

summary:This paper describes several innovative PDF document enhancements and tools that can be used when building a digital library. The main result presented in this paper is the PDF re-compression tool pdfJbIm, developed using the jbig2enc encoder. This re-compression tool enables the size...


Bibliographic Details
Main Authors: Hatlapatka, Radim; Sojka, Petr
Format: Conference Object
Language: English
Published: Masaryk University Press 2010
Subjects:
DML
Online Access: http://hdl.handle.net/10338.dmlcz/702572
id ftdmlcz:oai:oai.dml.cz:10338.dmlcz/702572
record_format openpolar
institution Open Polar
collection DML-CZ (Czech Digital Mathematics Library)
op_collection_id ftdmlcz
language English
topic keyword:jbig2enc
keyword:JBIG2
keyword:PDF size optimization
keyword:compression
keyword:DML
keyword:digital signature
keyword:JB2
keyword:DjVu
keyword:pdfsign
keyword:DML-CZ
keyword:EuDML
keyword:pdfsizeopt.py
keyword:Google
keyword:JB2 algorithm
msc:68-06
msc:68U10
msc:68U15
msc:68U99
description summary:This paper describes several innovative PDF document enhancements and tools that can be used when building a digital library. The main result presented in this paper is the PDF re-compression tool pdfJbIm, developed using the jbig2enc encoder. This re-compression tool reduces the size of the original bitonal PDFs by one third on average. Some modifications to the jbig2enc encoder that increase the compression ratio even further are also described here. Together with another program, pdfsizeopt.py by Péter Szabó, we have managed to decrease PDF storage size to such an extent that the transmission needs of a digital library were significantly reduced. We report the storage savings that we have achieved on the Czech Digital Mathematics Library (DML-CZ): we have downsized the PDF corpus to 43% of its original size. We also describe the pdfsign tool for batch digital signature stamping of PDF documents.
format Conference Object
author Hatlapatka, Radim
Sojka, Petr
title PDF Enhancements Tools for a Digital Library
publisher Masaryk University Press
publishDate 2010
url http://hdl.handle.net/10338.dmlcz/702572
genre DML
op_relation isbn:978-80-210-5242-0
reference:1. Bartošek, M., Lhoták, M., Rákosník, J., Sojka, P., Šárfy, M.: DML-CZ: The Objectives and the First Steps. In: Borwein, J., Rocha, E.M., Rodrigues, J.F. (eds.) CMDE 2006: Communicating Mathematics in the Digital Era, pp. 69–79. A. K. Peters, MA, USA (2008) MR 2590568
reference:2. Bloomberg, D.: Leptonica. [online] (2010), [cit. 2010-04-25], http://www.leptonica.com/jbig2.html
reference:3. Bočák, P.: Digitáne podpisované PDF dokumenty (Bachelor thesis written in Czech, Digital signatures of PDF documents). Masaryk University, Faculty of Informatics (advisor Petr Sojka), Brno, Czech Republic (2008)
reference:4. Bottou, L., Haffner, P., Howard, P.G., Simard, P., Bengio, Y., Le Cun, Y.: High Quality Document Image Compression with DjVu. Journal of Electronic Imaging 7(3), 410–425 (1998), http://leon.bottou.org/papers/bottou-98
reference:5. Lowagie, B.: IText PDF. [online] (2009), http://www.itextpdf.com/
reference:6. JBIG Committee: 14492 FCD. ISO/IEC JTC 1/SC 29/WG 1 (1999), http://www.jpeg.org/public/fcd14492.pdf
reference:7. The Apache Software Foundation: Apache PDFBox – Java PDF Library. [online] (2010), http://pdfbox.apache.org/
reference:8. Hatlapatka, R.: JBIG2 komprese (Bachelor thesis written in Czech, JBIG2 compression). Masaryk University, Faculty of Informatics (advisor Petr Sojka), Brno, Czech Republic (2010)
reference:9. Hatlapatka, R.: PDF Recompression using JBIG2. [online] (2010), http://nlp.fi.muni.cz/projekty/eudml/pdfRecompression/
reference:10. Hatlapatka, R.: Source codes of pdfJbIm. [online] (2010), http://code.google.com/p/pdfrecompressor/
reference:11. Howard, P.: Text image compression using soft pattern matching. Computer Journal 40(2/3), 146–156 (1997)
reference:12. ISO/IEC JTC1/SC29/WG1: JBIG Maui Meeting Press Release. (December 1999), http://www.jpeg.org/public/mauijbig.pdf
reference:13. Langley, A.: Homepage of jbig2enc encoder. [online], http://github.com/agl/jbig2enc
reference:14. Sylwestrzak, W., Borbinha, J., Bouche, T., Nowiński, A., Sojka, P.: EuDML—Towards the European Digital Mathematics Library. In: Sojka, P. (ed.) Proceedings of DML 2010. Masaryk University Press, Paris, France (Jul 2010)
reference:15. Adobe Systems Incorporated: PDF Reference, pp. 90–100. Adobe Systems Incorporated, sixth edn. (2006), http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf
reference:16. Szabó, P.: Optimizing PDF output size of TeX documents. TUGboat 30(3), 112–130 (2009), [cit. 2010-04-26], http://code.google.com/p/pdfsizeopt/
reference:17. International Telecommunication Union: ITU-T Recommendation T.88. ITU-T Recommendation T.88 (2000), http://www.itu.int/rec/T-REC-T.88-200002-I/en
op_rights access:Unrestricted
rights:DML-CZ Czech Digital Mathematics Library, http://dml.cz/
rights:Institute of Mathematics AS CR, http://www.math.cas.cz/
conditionOfUse:http://dml.cz/use
_version_ 1768386254692417536
spelling ftdmlcz:oai:oai.dml.cz:10338.dmlcz/702572 2023-06-11T04:11:18+02:00 application/pdf eng type:math text:in_proceedings 2023-04-24T16:24:59Z