Query Optimization against XML Data

Nicoleta Liviana TUDOR
Petroleum-Gas University of Ploiesti,
Department of Informatics, Information Technology, Mathematics and Physics,
39, Bucuresti Blvd., Ploiesti, 100680, Romania
Tudor.Liviana@gmail.com

Abstract: Web services allow middleware access to a relational database and require data representation in XML format. The XML views obtained from relational databases can be accessed by using XPath queries. This article proposes an optimization model for XML data processing based on a heuristic algorithm to extract data from XPath views. To this end, the author uses various XPath query classes temporarily stored in cache, as XPath views. For each view selected from cache, a compensation query can be found and composed with in order to solve an XML data query. Experimental results reveal the effectiveness of the heuristic method used to solve queries on XML documents.

Keywords: cache, heuristic algorithm, relational databases, query processing, XML data.

>>Full text
CITE THIS PAPER AS:
Nicoleta Liviana TUDOR, Query Optimization against XML Data, Studies in Informatics and Control, ISSN 1220-1766, vol. 25(2), pp. 173-180, 2016. https://doi.org/10.24846/v25i2y201604

Introduction

The distributed systems use business object or XML web services to share data between applications, ensuring the platform and programming language independence. The Web services architecture involves the existence of a few layers, protocols and related technologies like XML, SOAP (Simple Object Access Protocol), WSDL (Web Services Description Language).

Various XML data storage approaches for relational databases recommend the use of the generic relational structure, including XML document mapping. Kossman [1] has represented XML documents using graphs.

The ideas of memorizing information related to each node of the tree in an XML document has been developed by Yoshikawa, Amagasa and others [2]. The algorithm for XML data translation proposed by Yoshikawa and Amagasa is only appropriate for nonrecursive data and fails to obtain accurate results if the XML data has ascendants with the same label in the tree representation.

The relational model allows the elaboration of translation algorithms for the XQuery queries into SQL queries, where XQuery is a standard XML data interrogation language. For example, Oracle allows the creation of XML views for relational data, and the interrogation of XML views can use the XPath language.

Multiple queries optimization has been expressed in several contexts in the recent past including transient views [3], view maintenance [4], XML query optimization and continuous query optimization.

Tudor proposes a cache pattern with multiqueries and describes the multi-query optimization with scheduling, caching and pipelining. A set of cache patterns is derived from a set of class of multiqueries that are loaded into the cache [5].

A semantic cache memorizing XML views can be used to optimize business objects. To avoid repeated connections to a backend database, the views stored in cache are interrogated. This type of middle-tier cache has become very popular for Web applications with relational databases. Semantic cache uses the views’ semantics to determine if the queries can be solved with the cache information entirely or partially [6], [7].

The contributions to this article can be summarized as follows:

a method to optimize access to XML data has been identified and it is based on the extraction of XPath views from a semantic cache.
a new solving technique has been proposed for the XPath queries. Thus, this paper offers:

definition for the XPath query classes for which the XPath view heuristic extraction algorithm is evaluated;
an effective method to select an XPath view from cache based on the constraint satisfaction verification for the XPath expressions;

the heuristic solving techniques for the queries described above have been implemented for relational Oracle databases. The complexity analysis stands as proof for the performance of the proposed algorithm. The time complexity will be evaluated according to the size of the input space represented by the set of XPath views. The experimental results prove the feasibility and effectiveness of the newly proposed heuristic algorithm.

The article is organized as follows. Section 2 describes the way to process XML queries and the conversion of XQuery queries in relational databases. Section 3 describes the issue of XML data rewriting using XPath views in Oracle databases. The author presents the composition of XPath queries and the creation of an XPath views’ cache. Section 4 describes the HSelectXP heuristic algorithm for special XPath query classes. Section 5 describes a complexity evaluation for the heuristic algorithm. Section 6 presents the experimental results, comparisons against other known algorithms and the way to process queries using the heuristic algorithm. The experimental study emphasizes the performance of the heuristic algorithm in processing XML query data. Section 7 draws-up the conclusions regarding the effectiveness of a semantic cache and the heuristic algorithm in optimizing access to XML data.

REFERENCES

KOSSMAN, D., D. FLORESCU, Storing and Querying XML Data using an RDBMS, IEEE Data Eng. Bulletin, 1999.
YOSHIKAWA, M., T. AMAGASA, T. SHIMURA, S. S. UEMURA, Xrel: A Path-based Approach to Storage and Retrieval of XML Documents using Relational Databases, ACM Trans. on Internet Tech., no. 1, 2001, pp. 110-141.
SUBRAMANIAN, S. N., S. VENKATARAMAN, Cost based Optimization of Decision Support Queries using Transient Views, In ACM SIGMOD International Conference Management of Data, Seattle, WA, 1998.
MISTRY, H., P. ROY, S. SUDARSHAN, K. RAMAMRITHAM, Materialized View Selection and Maintenance using Multi-query Optimization, In ACM SIGMOD International Conference on Management of Data, 2001.
TUDOR, N. L., Cache Pattern with Multi-Queries, Advances in Electrical and Computer Engineering, Faculty of El. Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, 10, nr. 2, 2010, pp. 84-88.
ARINBJARNAR, M., B. PORSSON, B. P. JONSSON, Performance of Semantic caching Revisited, Technical Report RUTR–CS06002, Reikjavik University Iceland, 2006.
LUO, Q., S. KRISHNAMURTHY, C. MOHAN, H. PIRAHESH, H. WOO, B. LINDSAY, J. NAUGHTON, Middle-tier Database Caching for e-Business, In Proceedings of the ACM SIGMOD International conference on Management of data, 2002, pp 600 – 611.
DAMIANI, E., S. D. C. DI VIMERCATI, S. PARABOSCHI, P. SAMARATI, Design and Implementation of an Access Control Processor for XML Documents, Computer Networks, Amsterdam, Netherlands, 1999, vol. 33(1–6), 2000, pp. 59-75.
FUNDULAKI, I., M. MARX, Specifying Access Control Policies for XML Documents with XPath, In The ACM Symposium on Access Control Models and Technologies (SACMAT), ACM Press, 2004, pp 61-69.
MANDHANI, B., D. SUCIU, Query Caching and View Selection for XML Databases, Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005.
LEE, D., C. W. WESLEY, Semantic Caching Via Query Matching for Web Sources, In ACM Proceedings of the 8^th International Conference on Information and Knowledge Management, USA, 1999.
TANG, J., S. ZHOU, A Theoretic Framework for Answering XPath Queries using Views, Third International XML Database Symposium, XSym, Trondheim, Norway, 2005.
Balmin, A., F. Özcan, K. S. Beyer, R. Cochrane, H. Pirahesh, A Framework for using Materialized XPath Views in XML Query Processing, In VLDB, 2004.
TUDOR, L., Models XP for Rewriting XPath Queries, Studies in Informatics and Control, Bucharest, Romania, vol. 20, no. 2, 2011, pp. 121-128.
VASILE, S. L., Query Optimization on Random Databases, Studies in Informatics and Control, vol. 23(3), 2014, pp. 257-265.
TREMBLAY, J. P., P. G. SORENSON, An Introduction to Data Structures with Applications, second edition, McGraw-Hill, New-York, 1984.
CHEN, L., S. WANG, E. RUNDENSTEINER, Replacement Strategies for XQuerycaching Systems, Data & Knowledge Engineering, vol. 49, no. 2, 2004, 145-175.