The Complexity of Text-Preserving XML Transformations

While XML is nowadays adopted as the de facto standard for data exchange, historically, its predecessor SGML was invented for describing electronic documents, i.e., markedup text. Actually, today there are still large volumes of such XML texts. We consider simple transformations which can change the...

Full description

Bibliographic Details
Main Authors: Timos Antonopoulos, Wim Martens, Frank Neven
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language:English
Published: 2011
Subjects:
DML
Online Access:http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.302.4102
http://fox7.eu/wp-content/uploads/pods11a-antonopoulos.pdf
id ftciteseerx:oai:CiteSeerX.psu:10.1.1.302.4102
record_format openpolar
spelling ftciteseerx:oai:CiteSeerX.psu:10.1.1.302.4102 2023-05-15T16:02:02+02:00 The Complexity of Text-Preserving XML Transformations Timos Antonopoulos Wim Martens Frank Neven The Pennsylvania State University CiteSeerX Archives 2011 application/pdf http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.302.4102 http://fox7.eu/wp-content/uploads/pods11a-antonopoulos.pdf en eng http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.302.4102 http://fox7.eu/wp-content/uploads/pods11a-antonopoulos.pdf Metadata may be used without restrictions as long as the oai identifier remains attached to it. http://fox7.eu/wp-content/uploads/pods11a-antonopoulos.pdf Categories and Subject Descriptors H.2.3 [Database Management Languages—Data manipulation languages (DML F.4.3 [Mathematical Logic and Formal Languages Formal Languages D.2.4 [Software Engineering Software/Program Verification text 2011 ftciteseerx 2016-01-07T22:08:03Z While XML is nowadays adopted as the de facto standard for data exchange, historically, its predecessor SGML was invented for describing electronic documents, i.e., markedup text. Actually, today there are still large volumes of such XML texts. We consider simple transformations which can change the internal structure of documents, that is, the mark-up, and can filter out parts of the text but do not disrupt the ordering of the words. Specifically, we focus on XML transformations where the transformed document is a subsequence of the input document when ignoring mark-up. We call the latter text-preserving XML transformations. We characterize such transformations as copy- and rearrangefree transductions. Furthermore, we study the problem of deciding whether a given XML transducer is text-preserving over a given tree language. We consider top-down transducers as well as the abstraction of XSLT called DTL. We show that deciding whether a transformation is text-preserving over an unranked regular tree language is in Ptime for topdown transducers, EXPtime-complete for DTL with XPath, and decidable for DTL with MSO patterns. Finally, we obtain that for every transducer in one of the above mentioned classes, the maximal subset of the input schema can be computed on which the transformation is text-preserving. Text DML Unknown
institution Open Polar
collection Unknown
op_collection_id ftciteseerx
language English
topic Categories and Subject Descriptors H.2.3 [Database Management
Languages—Data manipulation languages (DML
F.4.3 [Mathematical Logic and Formal Languages
Formal Languages
D.2.4 [Software Engineering
Software/Program Verification
spellingShingle Categories and Subject Descriptors H.2.3 [Database Management
Languages—Data manipulation languages (DML
F.4.3 [Mathematical Logic and Formal Languages
Formal Languages
D.2.4 [Software Engineering
Software/Program Verification
Timos Antonopoulos
Wim Martens
Frank Neven
The Complexity of Text-Preserving XML Transformations
topic_facet Categories and Subject Descriptors H.2.3 [Database Management
Languages—Data manipulation languages (DML
F.4.3 [Mathematical Logic and Formal Languages
Formal Languages
D.2.4 [Software Engineering
Software/Program Verification
description While XML is nowadays adopted as the de facto standard for data exchange, historically, its predecessor SGML was invented for describing electronic documents, i.e., markedup text. Actually, today there are still large volumes of such XML texts. We consider simple transformations which can change the internal structure of documents, that is, the mark-up, and can filter out parts of the text but do not disrupt the ordering of the words. Specifically, we focus on XML transformations where the transformed document is a subsequence of the input document when ignoring mark-up. We call the latter text-preserving XML transformations. We characterize such transformations as copy- and rearrangefree transductions. Furthermore, we study the problem of deciding whether a given XML transducer is text-preserving over a given tree language. We consider top-down transducers as well as the abstraction of XSLT called DTL. We show that deciding whether a transformation is text-preserving over an unranked regular tree language is in Ptime for topdown transducers, EXPtime-complete for DTL with XPath, and decidable for DTL with MSO patterns. Finally, we obtain that for every transducer in one of the above mentioned classes, the maximal subset of the input schema can be computed on which the transformation is text-preserving.
author2 The Pennsylvania State University CiteSeerX Archives
format Text
author Timos Antonopoulos
Wim Martens
Frank Neven
author_facet Timos Antonopoulos
Wim Martens
Frank Neven
author_sort Timos Antonopoulos
title The Complexity of Text-Preserving XML Transformations
title_short The Complexity of Text-Preserving XML Transformations
title_full The Complexity of Text-Preserving XML Transformations
title_fullStr The Complexity of Text-Preserving XML Transformations
title_full_unstemmed The Complexity of Text-Preserving XML Transformations
title_sort complexity of text-preserving xml transformations
publishDate 2011
url http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.302.4102
http://fox7.eu/wp-content/uploads/pods11a-antonopoulos.pdf
genre DML
genre_facet DML
op_source http://fox7.eu/wp-content/uploads/pods11a-antonopoulos.pdf
op_relation http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.302.4102
http://fox7.eu/wp-content/uploads/pods11a-antonopoulos.pdf
op_rights Metadata may be used without restrictions as long as the oai identifier remains attached to it.
_version_ 1766397671289913344