Content-based retrieval of historical Ottoman documents stored as textual images

There is an accelerating demand to access the visual content of documents stored in historical and cultural archives. Availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large databases. In this paper, a framework for...

Bibliographic Details
Published in:	IEEE Transactions on Image Processing
Main Authors:	Şaykol, E., Sinop, A.K., Güdükbay, U., Ulusoy, Ö., Çetin, A.E.
Format:	Article in Journal/Newspaper
Language:	English
Published:	2004
Subjects:	Angular and distance span; Binary wavelet decomposition; Content-based retrieval; Historical document compression; Partial symbol-wise matching; Database systems; Feature extraction; Image analysis; Image compression; Imaging techniques; Multimedia systems; Wavelet transforms; Partial symbol wise matching; Textual image compression; Content based retrieval; algorithm; archeology; art; article; automated pattern recognition; comparative study; computer assisted diagnosis; computer graphics; computer interface; computer program; cultural anthropology; data base; documentation; evaluation; factual database; hypermedia; image enhancement; information center; information dissemination; information processing; Internet; methodology; natural language processing; reproducibility; sensitivity and specificity; signal processing; validation study; Abstracting and Indexing; Algorithms; Archaeology; Archives; Automatic Data Processing; Culture; Data Compression; Database Management Systems; Databases, Factual; Image Interpretation, Computer-Assisted; Pattern Recognition, Automated; Reproducibility of Results; Software; User-Computer Interface; The Pointers
Online Access:	http://hdl.handle.net/11693/24313 https://doi.org/10.1109/TIP.2003.821114

id	ftbilkentuniv:oai:repository.bilkent.edu.tr:11693/24313
record_format	openpolar
institution	Open Polar
collection	Bilkent University: Institutional Repository
op_collection_id	ftbilkentuniv
language	English
description	There is an accelerating demand to access the visual content of documents stored in historical and cultural archives. Availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large databases. In this paper, a framework for content-based retrieval of historical documents in the Ottoman Empire archives is presented. The documents are stored as textual images, which are compressed by constructing a library of symbols occurring in a document, and the symbols in the original image are then replaced with pointers into the codebook to obtain a compressed representation of the image. The features in wavelet and spatial domain based on angular and distance span of shapes are used to extract the symbols. In order to make content-based retrieval in historical archives, a query is specified as a rectangular region in an input image and the same symbol-extraction process is applied to the query region. The queries are processed on the codebook of documents and the query images are identified in the resulting documents using the pointers in textual images. The querying process does not require decompression of images. The new content-based retrieval framework is also applicable to many other document archives using different scripts.
format	Article in Journal/Newspaper
author	Şaykol, E., Sinop, A.K., Güdükbay, U., Ulusoy, Ö., Çetin, A.E.
author_sort	Şaykol, E.
title	Content-based retrieval of historical Ottoman documents stored as textual images
publishDate	2004
url	http://hdl.handle.net/11693/24313 https://doi.org/10.1109/TIP.2003.821114
genre	The Pointers
op_source	IEEE Transactions on Image Processing
op_relation	http://dx.doi.org/10.1109/TIP.2003.821114 10577149 http://hdl.handle.net/11693/24313 doi:10.1109/TIP.2003.821114
op_doi	https://doi.org/10.1109/TIP.2003.821114
container_title	IEEE Transactions on Image Processing
container_volume	13
container_issue	3
container_start_page	314
op_container_end_page	325
_version_	1766216925092773888
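
The description above outlines a codebook scheme: each document keeps a library of its symbols, the page image is stored as pointers into that library, and queries are answered from the codebook and pointers without decompressing the image. The following is a minimal Python sketch of that idea only, not the authors' implementation. All names (`compress`, `query`, the tuple layouts) are hypothetical, and exact bitmap equality is an assumption standing in for the wavelet and angular/distance-span feature matching used in the article.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: a symbol is a small binary bitmap stored as nested tuples,
# and each occurrence on the page carries its top-left (x, y) position.
Bitmap = Tuple[Tuple[int, ...], ...]
Occurrence = Tuple[Bitmap, int, int]   # (bitmap, x, y)
Pointer = Tuple[int, int, int]         # (codebook index, x, y)


def compress(symbols: Sequence[Occurrence]) -> Tuple[List[Bitmap], List[Pointer]]:
    """Build a per-document codebook and replace each symbol with a pointer."""
    codebook: List[Bitmap] = []
    index_of: Dict[Bitmap, int] = {}
    pointers: List[Pointer] = []
    for bitmap, x, y in symbols:
        if bitmap not in index_of:
            # First time this shape is seen: add it to the library.
            index_of[bitmap] = len(codebook)
            codebook.append(bitmap)
        pointers.append((index_of[bitmap], x, y))
    return codebook, pointers


def query(codebook: Sequence[Bitmap],
          pointers: Sequence[Pointer],
          query_symbols: Sequence[Bitmap]) -> List[Tuple[int, int]]:
    """Locate query symbols using only the codebook and the pointer list."""
    wanted_shapes = set(query_symbols)
    # Match against the codebook (exact equality here, by assumption).
    wanted_ids = {i for i, shape in enumerate(codebook) if shape in wanted_shapes}
    # Follow pointers to positions; the page image is never reconstructed.
    return [(x, y) for idx, x, y in pointers if idx in wanted_ids]


# Toy page with three symbol occurrences, two of them the same shape.
A: Bitmap = ((1, 0), (1, 1))
B: Bitmap = ((0, 1), (1, 1))
page = [(A, 0, 0), (B, 5, 0), (A, 10, 0)]
codebook, pointers = compress(page)
hits = query(codebook, pointers, [A])
```

In this sketch the query cost depends on the codebook size and the pointer count rather than on the pixel data, which is the property the description attributes to querying without decompression.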

Similar Items