Content-Based Retrieval of Historical Ottoman Documents Stored as Textual Images

Cataloged from PDF version of article. There is an accelerating demand to access the visual content of documents stored in historical and cultural archives. Availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large da...


Bibliographic Details
Published in: IEEE Transactions on Image Processing
Main Authors: Saykol, E., Sinop, A. K., Gudukbay, U., Ulusoy, O., Cetin, A. E.
Format: Article in Journal/Newspaper
Language: English
Published: IEEE 2004 (deposited in the repository 2015)
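Subjects: Angular And Distance Span; Binary Wavelet Decomposition; Content-based Retrieval; Historical Document Compression; Partial Symbol-wise Matching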
Online Access: http://hdl.handle.net/11693/11270
https://doi.org/10.1109/TIP.2003.821114
id ftbilkentuniv:oai:repository.bilkent.edu.tr:11693/11270
record_format openpolar
institution Open Polar
collection Bilkent University: Institutional Repository
op_collection_id ftbilkentuniv
language English
topic Angular And Distance Span
Binary Wavelet Decomposition
Content-based Retrieval
Historical Document Compression
Partial Symbol-wise Matching
description Cataloged from PDF version of article. There is an accelerating demand to access the visual content of documents stored in historical and cultural archives. The availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large databases. This paper presents a framework for content-based retrieval of historical documents in the Ottoman Empire archives. The documents are stored as textual images, which are compressed by constructing a library (codebook) of the symbols occurring in a document; the symbols in the original image are then replaced with pointers into the codebook to obtain a compressed representation of the image. Features in the wavelet and spatial domains, based on the angular and distance span of shapes, are used to extract the symbols. To perform content-based retrieval in historical archives, a query is specified as a rectangular region in an input image, and the same symbol-extraction process is applied to the query region. The queries are processed on the codebook of the documents, and the query images are located in the resulting documents using the pointers in the textual images. The querying process does not require decompression of the images. The framework is also applicable to many other document archives using different scripts.
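The codebook-and-pointer idea in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Archive`, `CompressedPage`, `add_page`, and `query` are hypothetical, a single codebook shared across pages is a simplifying assumption, and exact bitmap equality stands in for the wavelet and angular/distance-span features that the article uses to extract and match symbols.

```python
from dataclasses import dataclass, field

# Assumption: a symbol is a tiny binary bitmap, stored as a tuple of rows.
Symbol = tuple[tuple[int, ...], ...]


@dataclass
class CompressedPage:
    # Each pointer is (codebook_index, x, y): which symbol, and where it sits.
    pointers: list[tuple[int, int, int]] = field(default_factory=list)


@dataclass
class Archive:
    codebook: list[Symbol] = field(default_factory=list)
    index_of: dict[Symbol, int] = field(default_factory=dict)
    pages: dict[str, CompressedPage] = field(default_factory=dict)

    def add_page(self, name, symbols):
        """Store a page as pointers into the codebook instead of raw bitmaps."""
        page = CompressedPage()
        for bitmap, x, y in symbols:
            idx = self.index_of.get(bitmap)
            if idx is None:
                # First occurrence: add the symbol to the codebook.
                idx = len(self.codebook)
                self.codebook.append(bitmap)
                self.index_of[bitmap] = idx
            page.pointers.append((idx, x, y))
        self.pages[name] = page

    def query(self, query_symbols):
        """Return {page: [(x, y), ...]} for pages containing every query symbol.

        Only the codebook and the pointers are consulted, so no page image
        is ever reconstructed (no decompression).
        """
        wanted = {self.index_of[s] for s in query_symbols if s in self.index_of}
        if len(wanted) < len(set(query_symbols)):
            return {}  # some query symbol is not in the codebook at all
        hits = {}
        for name, page in self.pages.items():
            present = {idx for idx, _, _ in page.pointers}
            if wanted <= present:
                hits[name] = [(x, y) for idx, x, y in page.pointers if idx in wanted]
        return hits


# Usage: two toy pages sharing one symbol.
A = ((1, 0), (1, 1))
B = ((0, 1), (1, 1))
C = ((1, 1), (0, 0))
archive = Archive()
archive.add_page("p1", [(A, 0, 0), (B, 5, 0), (A, 10, 0)])
archive.add_page("p2", [(C, 0, 0), (B, 3, 0)])
result = archive.query([B])
```

In the sketch, the symbols extracted from a query region are looked up in the codebook first, and the stored pointers then give the matching positions on each page, which mirrors the claim that querying does not require decompression.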
format Article in Journal/Newspaper
author Saykol, E.
Sinop, A. K.
Gudukbay, U.
Ulusoy, O.
Cetin, A. E.
title Content-Based Retrieval of Historical Ottoman Documents Stored as Textual Images
publisher IEEE
publishDate 2015
url http://hdl.handle.net/11693/11270
https://doi.org/10.1109/TIP.2003.821114
op_source IEEE Transactions on Image Processing
op_relation http://dx.doi.org/10.1109/TIP.2003.821114
Şaykol, E., Sinop, A. K., Güdükbay, U., Ulusoy, Ö., & Çetin, A. E. (2004). Content-based retrieval of historical Ottoman documents stored as textual images. IEEE Transactions on Image Processing, 13(3), 314-325.
1057-7149
http://hdl.handle.net/11693/11270
doi:10.1109/TIP.2003.821114
op_rights Copyright © 2004 IEEE.
op_doi https://doi.org/10.1109/TIP.2003.821114
container_title IEEE Transactions on Image Processing
container_volume 13
container_issue 3
container_start_page 314
op_container_end_page 325