Content-Based Retrieval of Historical Ottoman Documents Stored As Textual Images

There is an accelerating demand to access the visual content of documents stored in historical and cultural archives. Availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large databases. In this paper, a framework for...

Bibliographic Details
Main Authors:	Ediz Saykol, Ali Kemal Sinop, Ugur Gudukbay, Özgür Ulusoy, A. Enis Cetin
Other Authors:	The Pennsylvania State University CiteSeerX Archives
Format:	Text
Language:	English
Published:	2004
Subjects:	The Pointers
Online Access:	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.8791 http://www.cs.bilkent.edu.tr/~ediz/bilmdg/papers/ieeetip.pdf

id	ftciteseerx:oai:CiteSeerX.psu:10.1.1.4.8791
record_format	openpolar
institution	Open Polar
collection	Unknown
op_collection_id	ftciteseerx
language	English
description	There is an accelerating demand for access to the visual content of documents stored in historical and cultural archives. The availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large databases. In this paper, a framework for content-based retrieval of historical documents in the Ottoman Empire archives is presented. The documents are stored as textual images, which are compressed by constructing a library (codebook) of the symbols occurring in a document; the symbols in the original image are then replaced with pointers into the codebook to obtain a compressed representation of the image. Features in the wavelet and spatial domains, based on the angular and distance span of shapes, are used to extract the symbols. To support content-based retrieval in historical archives, a query is specified as a rectangular region in an input image, and the same symbol-extraction process is applied to the query region. Queries are processed on the codebook of the documents, and the query images are located in the resulting documents using the pointers in the textual images. The querying process does not require decompression of the images. The new content-based retrieval framework is also applicable to many other document archives that use different scripts.
author2	The Pennsylvania State University CiteSeerX Archives
format	Text
author	Ediz Saykol Ali Kemal Sinop Ugur Gudukbay Özgür Ulusoy A. Enis Cetin
author_sort	Ediz Saykol
title	Content-Based Retrieval of Historical Ottoman Documents Stored As Textual Images
publishDate	2004
url	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.8791 http://www.cs.bilkent.edu.tr/~ediz/bilmdg/papers/ieeetip.pdf
genre	The Pointers
op_source	http://www.cs.bilkent.edu.tr/~ediz/bilmdg/papers/ieeetip.pdf
op_relation	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.8791 http://www.cs.bilkent.edu.tr/~ediz/bilmdg/papers/ieeetip.pdf
op_rights	Metadata may be used without restrictions as long as the oai identifier remains attached to it.
_version_	1766216900678778880
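
The codebook scheme summarized in the description field can be illustrated with a minimal sketch. This is not the authors' implementation: symbols are stood in for by plain strings rather than shapes extracted from an image, exact equality replaces the wavelet and spatial feature matching, and the names `compress` and `find_query` are hypothetical. It only shows how a query can be answered from the codebook and the pointer sequence without reconstructing the original image.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: a "symbol" is a hashable stand-in (a string) for an extracted shape.
Symbol = str


def compress(symbols: Sequence[Symbol]) -> Tuple[List[Symbol], List[int]]:
    """Build a codebook of distinct symbols and a pointer list into it."""
    codebook: List[Symbol] = []
    index: Dict[Symbol, int] = {}
    pointers: List[int] = []
    for s in symbols:
        if s not in index:
            # First occurrence: add the symbol to the codebook.
            index[s] = len(codebook)
            codebook.append(s)
        pointers.append(index[s])
    return codebook, pointers


def find_query(codebook: Sequence[Symbol], pointers: Sequence[int],
               query: Sequence[Symbol]) -> List[int]:
    """Return start positions of the query in the pointer sequence.

    The query symbols are looked up in the codebook first; matching then
    runs on the pointers only, so the document is never decompressed.
    """
    index = {s: i for i, s in enumerate(codebook)}
    try:
        q = [index[s] for s in query]
    except KeyError:
        return []  # some query symbol does not occur in this document
    n = len(q)
    if n == 0:
        return []
    ptrs = list(pointers)
    return [i for i in range(len(ptrs) - n + 1) if ptrs[i:i + n] == q]


if __name__ == "__main__":
    doc = ["a", "b", "a", "c", "a", "b"]
    codebook, pointers = compress(doc)
    print(codebook, pointers)
    print(find_query(codebook, pointers, ["a", "b"]))
```

In a real system the codebook lookup would be a similarity search over shape features, so a query region could match a symbol that is close to, but not identical with, a codebook entry.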