
Models XP for Rewriting XPath Queries

Nicoleta Liviana TUDOR
Department of Computer Science, Petroleum-Gas University of Ploiesti
39, Bucuresti Street, 100680, Romania

Abstract: This paper defines XP models for classes of XPath queries stored in cache as materialized views. After stating the problem of the correspondence between the tree models used for classes of XPath queries stored in cache and the set of trees associated with an XML document, it presents solutions for rewriting XPath views by transforming tree patterns. The author's contribution consists in modelling the set of trees associated with XP queries for a range of constraints on XPath expressions, and in describing the correspondence functions in the XP{ /, //, *, [] } representation. Verifying whether the result of a query can be returned from the views materialized in cache requires analysing the compatibility of the tree models associated with the XPath query and with the XPath views materialized in cache. Finding a morphism of XP models demonstrates that the XP view can actually be rewritten. The paper also describes a method for establishing a semantic cache of XPath views. Composing queries over a semantic cache of XP views assumes the existence of a query which, composed with a view from the cache, returns the result of the original query.

Keywords: XPath query, rewriting views, XP models, XML tree, morphism, composition of queries, cache.

CITE THIS PAPER AS:
Nicoleta Liviana TUDOR, Models XP for Rewriting XPath Queries, Studies in Informatics and Control, ISSN 1220-1766, vol. 20 (2), pp. 121-128, 2011. https://doi.org/10.24846/v20i2y201104

1. Introduction

Data processed as business objects or through XML Web services, and exchanged between applications, requires the XML data format [17]. Relational systems convert XML data into a relational format that can be stored and queried efficiently as a set of relations.

Various approaches to storing XML data in relational databases suggest the use of metadata and of generic (pre-defined) relational table schemas onto which XML documents are mapped. Kossman and Florescu [1] represented XML documents as graphs, where each edge is a tuple of a relation. This approach uses recursive SQL99 queries, including constructs for evaluating XML queries. Zhang, DeWitt and others [2] proposed a system that labels each node with numbers obtained by pre-order and post-order traversal of the tree.
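As an illustration of the node-labeling idea, the following minimal sketch numbers the nodes of a small XML tree in pre-order and post-order and uses the two numbers to test the ancestor relation without walking the tree. It is one common reading of such a scheme, not the exact encoding used in [2]; the function names `label_nodes` and `is_ancestor` and the sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def label_nodes(root):
    """Return a dict mapping each element to its (pre, post) numbers."""
    labels = {}
    counters = {"pre": 0, "post": 0}

    def visit(node):
        pre = counters["pre"]          # number assigned on entering the node
        counters["pre"] += 1
        for child in node:
            visit(child)
        labels[node] = (pre, counters["post"])  # post number assigned on leaving
        counters["post"] += 1

    visit(root)
    return labels


def is_ancestor(labels, a, d):
    """a is a proper ancestor of d iff a starts before d and ends after d."""
    pre_a, post_a = labels[a]
    pre_d, post_d = labels[d]
    return pre_a < pre_d and post_a > post_d


# Hypothetical sample document: a has children b and d, b has child c.
doc = ET.fromstring("<a><b><c/></b><d/></a>")
labels = label_nodes(doc)
b, d = doc[0], doc[1]
c = b[0]
print(is_ancestor(labels, doc, c))  # True
print(is_ancestor(labels, b, d))    # False
```

Stored as two integer columns in a relational table, such labels would let an ancestor-descendant step be evaluated with two comparisons per node pair.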

To address the problem of multiple joins, Yoshikawa, Amagasa and others [3] suggest storing information about each node of the tree in the XML document, in particular the path from the root of the tree to each leaf node. The algorithm for translating XML data proposed by Yoshikawa and Amagasa is appropriate only for non-recursive data and fails to return correct results when the XML data have ancestors with the same label in the representation tree [4]. This algorithm requires joins with θ operators, which implies very expensive processing [2]. Bohannon, Freire and others [5] take a cost-based approach, using information from the XML schema to select the execution alternative for XML queries with the lowest cost.
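The path-based idea can be sketched as follows: each node is stored together with its root-to-node path, so that a simple path query becomes a single filter on the path column instead of a chain of joins. This is only a minimal sketch under stated assumptions; the two-column layout `(path, text)` and the function name `path_rows` are hypothetical and do not reproduce the actual tables of [3].

```python
import xml.etree.ElementTree as ET


def path_rows(root):
    """Return one (path, text) row per element, in document order."""
    rows = []

    def visit(node, prefix):
        path = f"{prefix}/{node.tag}"               # root-to-node path
        rows.append((path, (node.text or "").strip()))
        for child in node:
            visit(child, path)

    visit(root, "")
    return rows


# Hypothetical sample document.
doc = ET.fromstring(
    "<lib><book><title>A</title></book><book><title>B</title></book></lib>"
)
rows = path_rows(doc)

# The path query /lib/book/title is answered by one selection on the path column.
titles = [text for path, text in rows if path == "/lib/book/title"]
print(titles)  # ['A', 'B']
```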

The relational schema allows the development of algorithms for translating XQuery queries into SQL queries, XQuery being a standard language for querying XML data. For instance, Oracle XML [3] allows storing the entire XML document as CLOB data, the evaluation of XML queries being similar to query processing in native XML databases. Microsoft SQL Server enables the creation of XML views over relational data. Querying these views with the XPath language is restricted to XPath expressions.

To optimize XPath queries, a semantic cache can store XML views. To avoid repeated connections to the backend database, queries are run against the views materialized in cache. This type of middle-tier cache has become very popular in Web applications using relational databases [6].

Mandhani and Suciu [7] suggested a method for creating a semantic cache that stores XPath views used in query processing. The views materialized in cache are XPath expressions, and the queries can be XQuery or XPath fragments; to execute them, the system first checks whether the cache can return the result. They stored the cached views in relational tables and showed the efficiency of their view-selection techniques.

Lee and Wesley Chu [8] presented a semantic cache for Web databases that relies on query translation, and analysed the compatibility of queries with the views stored in cache. The semantic cache they proposed consists of a hash table with entries of the form (key, value), where the key is a semantic description of the queries already issued and the value contains the results of the queries associated with the key. In this type of cache, the semantic views use only conjunctive predicates, queries being decomposed into conjunctive components.
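A hash-table cache of this kind can be sketched in a few lines. The example below is a simplified illustration, not the scheme of [8]: it assumes that every predicate is an equality test on a field, that a conjunctive query is a set of such tests, and that cached rows carry every field a later query may filter on. The class name `SemanticCache` and its methods are hypothetical.

```python
class SemanticCache:
    """Hash table keyed by a conjunction of (field, value) equality predicates."""

    def __init__(self):
        self._entries = {}  # frozenset of (field, value) -> list of row dicts

    def put(self, predicates, rows):
        """Store the result rows of a conjunctive query."""
        self._entries[frozenset(predicates.items())] = list(rows)

    def lookup(self, predicates):
        """Answer a query from the cache, or return None on a miss.

        A cached view can answer the query when its predicates are a subset
        of the query's predicates (the view is less restrictive); the
        remaining predicates are then applied to the cached rows.
        """
        wanted = frozenset(predicates.items())
        for key, rows in self._entries.items():
            if key <= wanted:
                extra = wanted - key
                return [r for r in rows
                        if all(r.get(f) == v for f, v in extra)]
        return None


cache = SemanticCache()
cache.put({"type": "book"}, [
    {"type": "book", "year": 2001, "title": "A"},
    {"type": "book", "year": 2005, "title": "B"},
])
print(cache.lookup({"type": "book", "year": 2005}))  # answered from the cache
print(cache.lookup({"type": "article"}))             # None: go to the backend
```

The subset test stands in for the compatibility analysis between a query and the cached views; a real system would need a richer containment check than set inclusion over equality predicates.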

The paper is organized as follows:

  • the section Rewriting XPath views describes the problem of rewriting XML data based on XPath views within relational databases
  • the section Tree models for XPath queries is dedicated to the tree models associated with certain classes of XPath queries, using the XP{ /, //, *, [] } representation in the context of rewriting XPath views, and describes the correspondence functions
  • the section Morphism of XP models demonstrates that there is a morphism of XP models on the set of transformation functions of the tree models associated with XP queries
  • the section Semantic Cache of XP Views describes the composition of XPath queries and the way a query is processed using a semantic cache of XP views
  • the section Conclusions closes the paper.

References:

  1. KOSSMAN, D., D. FLORESCU, Storing and Querying XML Data using an RDBMS, IEEE Data Engineering Bulletin, 1999.
  2. ZHANG, C., J. NAUGHTON, D. DEWITT, Q. LUO, G. LOHMAN, On Supporting Containment Queries in Relational Database Management Systems, in ACM SIGMOD, 2001, pp. 425-436.
  3. YOSHIKAWA, M., T. AMAGASA, T. SHIMURA, S. UEMURA, XRel: A Path-based Approach to Storage and Retrieval of XML Documents using Relational Databases, ACM Transactions on Internet Technology, No. 1, 2001, pp. 110-141.
  4. KRISHNAMURTHY, R., R. KAUSHIK, J. F. NAUGHTON, XML to SQL Query Translation Literature: The State of the Art and Open Problems, in XML Database Symposium, XSym, 2003, pp. 31-38.
  5. BOHANNON, P., J. FREIRE, P. ROY, J. SIMEON, From XML Schema to Relations: A Cost-based Approach to XML Storage, in 18th International Conference on Data Engineering, 2002, pp. 64-76.
  6. LUO, Q., S. KRISHNAMURTHY, C. MOHAN, H. PIRAHESH, H. WOO, B. LINDSAY, J. NAUGHTON, Middle-tier Database Caching for e-Business, in Proceedings of the ACM SIGMOD, 2002, pp. 600-611.
  7. MANDHANI, B., D. SUCIU, Query Caching and View Selection for XML Databases, Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005.
  8. LEE, D., W. W. CHU, A Semantic Caching Scheme for Wrappers in Web Databases, in ACM International Workshop on Web Information and Data Management (WIDM), USA, 1999.
  9. TANG, J., S. ZHOU, A Theoretic Framework for Answering XPath Queries using Views, in XSym, 2005.
  10. BALMIN, A., F. ÖZCAN, K. S. BEYER, R. COCHRANE, H. PIRAHESH, A Framework for using Materialized XPath Views in XML Query Processing, in VLDB, 2004.
  11. LAKSHMANAN, L. V. S., H. WANG, Z. ZHAO, Answering Tree Pattern Queries using Views, in VLDB, 2006.
  12. GAO, J., T. WANG, D. YANG, MQTree based Query Rewriting over Multiple XML Views, in DEXA, 2007.
  13. CAUTIS, B., A. DEUTSCH, N. ONOSE, XPath Views for Documents with Persistent Identifiers, in SIGMOD, 2007.
  14. BENEDIKT, M., W. FAN, F. GEERTS, XPath Satisfiability in the Presence of DTDs, in PODS, 2005.
  15. GROPPE, S., S. BÖTTCHER, J. GROPPE, XPath Query Simplification with Regard to the Elimination of Intersect and Except Operators, in ICDE Workshops, 2006.
  16. XU, W., Z. M. ÖZSOYOGLU, Rewriting XPath Queries Using Materialized Views, VLDB 2005.
  17. VOLOVICI, D., G. D. CUREA, M. BREAZU, D. I. MORARIU, Statistical Methods for Performance Evaluation of WEB Document Classification, Studies in Informatics and Control, Vol. 19, No. 2, 2010.