Abstract: System and method for evaluating query performance The present invention provides a system and method for evaluating query performance. The proposed system (30) comprises a query normalization module (34) adapted to receive an instance of an input query (32) and to generate a normalized representation (36) of said input query (32). A query plan generator (38) generates a query tree from the normalized representation (36) of said input query (32) and generates a query execution plan (42) based upon index and statistical information pertaining to a relational database. The index information and statistical information pertaining to said relational database is contained in a system catalog (40) that further contains performance characteristics of operational primitives of a query language. The proposed system (30) includes an abstract interpreter (44) adapted to evaluate a query performance metric based upon said query execution plan (42) and said performance characteristics obtained from the system catalog (40). FIG 2
Description
System and method for evaluating query performance
The present invention relates to query processing and in particular to a system and method for estimating performance characteristics of a query pertaining to a relational database.
Databases are well known systems for information storage and retrieval. An example of a frequently used type of database is a relational database that stores data in a set of tables that may be reorganized and accessed in a number of different ways. A relational database management system uses relational techniques for storing and retrieving data. Structured query language (SQL) is a well known example of a database language that includes commands for retrieving, storing, updating and deleting data stored in a relational database.
For the most part, SQL queries are declarative, with their evaluation being defined within the context of a database instance. The evaluation of an SQL query yields result sets whose elements are drawn from the objects belonging to an execution domain, namely, the database instance. Till date, performance characteristics of SQL queries have been estimated with the assistance of a tool provided by the SQL server tool chain suite. This involves, for example, measurement of input/output (I/O) statistics and/or use of a query analyzer. However, such existing tools require a compatible database that is populated in a manner commensurate with real world requirements. Due to the dependence on a populated database, such an exercise is usually taken up only after an application program is fully functional.
Existing techniques for evaluating query performance thus require a populated real-world database first procured or created, making it impossible to attempt any kind of performance management or measurement activity until the
application (in whole or in part) is working and a target database is available. Hence performance management or measurement of SQL queries cannot be attempted during design or early phase of development of the application, since the investigation of alternatives concerning database modeling are difficult or impossible to undertake. That is, if assumptions on sizes and data distribution in the database change, it is difficult to recreate a live database for performance management or measurement.
It is an object of the present invention to provide an improved system and method for evaluating query performance.
The above object is achieved by a system for evaluating query performance, comprising:
- a query normalization module adapted to receive an instance
of an input query and to generate a normalized representation
of said input query,
-a query plan generator adapted to generate a query tree from
the normalized representation of said input query and to
generate a query execution plan based upon index and
statistical information pertaining to a relational database,
said index information and statistical information pertaining
to said relational database being contained in a system
catalog,
-said system catalog further containing performance
characteristics of operational primitives of a query
language, and
-an abstract interpreter adapted to evaluate a query
performance metric based upon said query execution plan and
said performance characteristics obtained from the system
catalog.
The above object is achieved by a method for evaluating query performance comprising the steps of:
- receiving an instance of an input query,
-generating a normalized representation of said input query, -generating a query execution plan from the normalized representation of said input query, based upon index and
statistical information pertaining to a relational database contained in a system catalog, said system catalog further containing performance characteristics of operational primitives of a query language, and
-evaluating a query performance metric based upon said query execution plan and said performance characteristics obtained from the system catalog.
The underlying idea of the present invention is to apply the technique of abstract interpretation to the problem of deriving a performance metric for a query. In the proposed system a query is evaluated to yield objects belonging to an abstract domain, instead of its normal execution domain (i.e. a real world database instance), thus obviating the need to procure a real world database during design or development of the application.
In a preferred embodiment, the query performance metric comprises the number of logical read operations incurred during processing of said query execution plan. Number of logical read operations incurred is a fundamental and widely accepted measure of query performance.
In one embodiment, the index and statistical information pertaining to the relational database comprises structure and size for indices defined and nature of distribution of data within said indices. This provides particularly advantageous means for generating a query execution plan.
In a further embodiment, the proposed system further comprises an execution time approximation means to estimate the time required to execute said query based upon the obtained query performance metric. Using this feature, the performance penalty may also be projected in terms of the time elapsed for executing the query.
The present invention is further described hereinafter with reference to preferred embodiments shown in the accompanying drawings, in which:
FIG 1 illustrates a system for executing an SQL query, and
FIG 2 illustrates a system for deriving a performance metric of an SQL query in accordance with an embodiment of the present invention.
The present invention makes use of the technique of abstract interpretation and applies it to the problem of deriving a performance measure for a query, for example in structured query language (SQL). Abstract interpretation has hitherto been used in the area of program analysis and symbolic execution. The underlying idea of abstract interpretation has been presented in the article Patrick Cousot, Radhia Cousot, "Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints", 4th ACM Symposium on Principles of Programming Languages, Jan. 1977, Los Angeles.
Having first described a method to arrive at a run-time complexity measure for SQL queries via abstract interpretation, it is next shown how this method can be integrated into an SQL engine to achieve a practical solution to the aforementioned problems of measuring query performance. In what is proposed herein, SQL queries are evaluated to yield objects belonging to an abstract domain, instead of their normal execution domain (i.e. a real world database instance). By appropriately choosing an abstract domain and the associated machinery for their evaluation, SQL queries can be interpreted to yield some property of interest. In this case, the property of interest is the number of logical read operations incurred during the processing of an SQL query, which is a well-accepted measure of SQL query performance.
FIG 1 illustrates a system 10 for execution of an SQL query to produce a result set. The system 10 accepts a user physical query 12 as input. This input query 12 may be user specified, and may typically comprise one or more selection
criteria and desired result fields. For example, the selection criteria may be adapted to retrieve information about a patient named xMary'. The particular information retrieved is specified by the desired result fields. The result field may include, for example, the age and test results for a particular test.
The input query 12 is received by a query normalization module 14 that parses the query 12 to generate a normalized representation 16 of the input query 12 using the principles of relational algebra. An example of a normalized relational representation of a physical query is given by the following:
(SELECT R.a, S.b WHERE R.m = S.n) => ...(1)
Herein R and S are tables in a relational database; a and m are attributes in table R; and b and n are attributes in table S. Next, this normalized representation 16 of the query is converted into a canonical query tree that identifies the relationship between different relational tables in the target database. The structure of the tree is derived from the underlying physical database being abstracted. This canonical tree is then optimized to yield an efficient query execution plan 22 by a query plan generator 18. The query execution plan 22 is an execution order for the units of the relational query that requires least system efforts and resources. The query plan generator 18 is responsible for both generation and optimization of the query execution plan 22.
For generating the query execution plan 22, the query plan generator 18 makes use of index information and statistical information about the tables in the relational database (R and S in the illustrated example). The statistical and index information include the structure and size of the indices defined and the depth of the index tree, the nature of distribution of data within the indices and the relations, size of the relations in blocks and records, selectivity of key attributes of significant relations, etc. Such
statistical and index information are defined in a system catalog 20. Depending on the index and statistical information obtained from the system catalog 20, the query plan generator 18 determines the optimal operational primitives that require least system efforts and resources to execute the query. For example, execution of the relational query given by (1) requires a joining operation. To that end, an appropriate operational primitive for carrying out an optimized joining operation (for example, sort-merge join, nested loop join, among others) will be determined by the query plan generator 18.
The final stage of processing involves a query plan interpreter 2 4 stepping through the execution plan and creating the result set 28 based upon the data drawn from a real world relational database 2 6 that defines the execution context.
In the proposed system for evaluating a query performance metric, the final stage of processing differs from the above-described SQL execution engine as can be seen from FIG 2. FIG 2 illustrates a system 30 for deriving a performance metric of an SQL query in accordance with an embodiment of the present invention. As before, the system 30 includes a query normalization module 34 to receive an instance of an input physical query 32 and to generate a normalized representation 36 of input query 32, As before, a query plan generator 38 generates a canonical query tree from the normalized representation 36 of the input query 32 and uses this tree to generate a query execution plan 42.
As mentioned above, for generating the query execution plan 42, the query plan generator 38 makes use of index information and statistical information about the relational tables in the target database- The statistical and index information include the structure and size of the indices defined and the depth of the index tree, the nature of distribution of data within the indices and the relations, size of the relations in blocks and records, selectivity of
key attributes of significant relations, etc. In this case, statistical and index information are defined in a system catalog 40. Depending on the index and statistical information obtained from the system catalog 40, the query plan generator 38 determines the optimal operational primitives that require least system efforts and resources to execute the query.
However, in the illustrated embodiment of the proposed system 30, the system catalog 4 0 is extended and augmented with information pertaining to performance penalties for various algorithms used in order to realize SQL operational primitives, on the database instance defining the execution context. Further, in the system 30, the query interpreter is replaced by an abstract interpreter that steps through the query execution plan 42 but operates within the context of the extended system catalog 40 (instead of a populated physical database), to determine the performance penalty associated with the operational primitives required to execute the query, and hence produces a query performance metric 46. For example, for the relational query given by (1) above, if the query execution plan 42 contains a sort-merge join operation, the abstract interpreter 44 would obtain the performance penalty associated with the sort-merge join algorithm from the context of the extended system catalog 40 for evaluating the performance metric of the query given by (1) .
The extended system catalog 4 0 thus contains the performance complexities for the different realizations of the SQL primitives in addition to the statistical and index information of the target database. The statistical and index information of a database obtained from the system catalog 40 is used for arriving at optimal query plans. However, this is created from an analysis of data present in the database tables. Consequently, the solution based on abstract interpretation may be somewhat dependent on the database instance. Even so, it is possible to render the solution generic and readily useful even in the absence of a real
world database. This, however, requires the solution to be capable of allowing users to define the system catalog 40 (for a database commensurate in terms of both size and distribution of data, with the real world database of interest). The advantage of such an arrangement is that it obviates the need for a real representative database, which is not easy to obtain. It also gives users the flexibility to readily modify the system catalog 40, if assumptions about data distribution and sizes are to change.
According to a further embodiment, the performance penalty thus derived may be projected in terms of elapsed time for executing a query. This may be done by an execution time approximation means that constructs a run-time model of the query plan interpreter making use of platform specific parameters. In this embodiment, having obtained the performance of the query in terms of the number of logical read operations, an approximation is derived for the time required to execute the query based upon the run-time model of the query plan interpreter, while making assumptions about buffer cache size and hit ratio, input/output (I/O) subsystems, and machine parameters like average physical block read time, memory size, among others.
The proposed system is hence advantageous while designing and developing solutions based on relational database management systems. The proposed system derives a performance metric/penalty for SQL query without requiring a real world database (i.e. one that is populated and matching in both size and distribution of data) to be procured. It is thus possible to carry out performance management or measurement activity during design or early phase of development of the application. This is because since no real world database is required, it is possible to investigate alternatives concerning database modeling modifying the assumptions on sizes and data distribution in the target database.
Summarizing, the present invention provides a system and method for evaluating query performance. The proposed system
comprises a query normalization module adapted to receive an instance of an input query and to generate a normalized representation of said input query. A query plan generator generates a query tree from the normalized representation of said input query and generates a query execution plan based upon index and statistical information pertaining to a relational database- The index information and statistical information pertaining to said relational database is contained in a system catalog that further contains performance characteristics of operational primitives of a query language. The proposed system includes an abstract interpreter adapted to evaluate a query performance metric based upon said query execution plan and said performance characteristics obtained from the system catalog.
Although the invention has been described with reference to specific embodiments, this description is not meant to be construed in a limiting sense. Various modifications of the disclosed embodiments, as well as alternate embodiments of the invention, will become apparent to persons skilled in the art upon reference to the description of the invention. It is therefore contemplated that such modifications can be made without departing from the spirit or scope of the present invention as defined.
We Claim,
A system (30) for evaluating query performance, comprising:
- a query normalization module (34) adapted to receive an instance of an input query (32) and to generate a normalized representation (36) of said input query (32),
- a query plan generator (38) adapted to generate a query tree from the normalized representation (36) of said input query (22) and to generate a query execution plan (38) based upon index and statistical information pertaining to a relational database, said index information and statistical information pertaining to said relational database being contained in a system catalog (40),
- said system catalog (40) further containing performance characteristics of operational primitives of a query language, and
- an abstract interpreter (44) adapted to evaluate a query performance metric based upon said query execution plan (42) and said performance characteristics obtained from the system catalog (40) .
The system according to claim 1, wherein said query performance metric comprises the number of logical read operations incurred during processing of said query execution plan.
The system according to any of the preceding claims, wherein said index and statistical information pertaining to the relational database comprises structure and size for indices defined and nature of distribution of data within said indices.
The system according to any of the preceding claims, further comprising an execution time approximation means to estimate the time required to execute said
query based upon the obtained query performance metric.
A method for evaluating query performance comprising the steps of:
- receiving an instance of an input query (32),
- generating a normalized representation (36) of said input query (32),
- generating a query execution plan (42) from the
normalized representation (36) of said input query
(32), based upon index and statistical information
pertaining to a relational database contained in a system catalog (40), said system catalog (40) further containing performance characteristics of operational primitives of a query language, and
- evaluating a query performance metric based upon
said query execution plan (42) and said performance
characteristics obtained from the system catalog
(40) .
The method according to claim 5, wherein evaluating said query performance metric comprises evaluating the number of logical read operations incurred during processing of said query execution plan.
The method according to any of claims 5 and 6, wherein said index and statistical information pertaining to the relational database comprises structure and size for indices defined and nature of distribution of data within said indices.
The method according to any of claims 5 to 7, further comprising estimating the time required to execute said query based upon the obtained query performance metric.
| # | Name | Date |
|---|---|---|
| 1 | 1131-che-2007-abstract.pdf | 2011-09-03 |
| 1 | abs-1131-che-2007.jpg | 2011-09-03 |
| 2 | 1131-che-2007-claims.pdf | 2011-09-03 |
| 2 | 1131-che-2007-form 5.pdf | 2011-09-03 |
| 3 | 1131-che-2007-correspondnece-others.pdf | 2011-09-03 |
| 3 | 1131-che-2007-form 3.pdf | 2011-09-03 |
| 4 | 1131-che-2007-description(complete).pdf | 2011-09-03 |
| 4 | 1131-che-2007-form 1.pdf | 2011-09-03 |
| 5 | 1131-che-2007-drawings.pdf | 2011-09-03 |
| 6 | 1131-che-2007-description(complete).pdf | 2011-09-03 |
| 6 | 1131-che-2007-form 1.pdf | 2011-09-03 |
| 7 | 1131-che-2007-correspondnece-others.pdf | 2011-09-03 |
| 7 | 1131-che-2007-form 3.pdf | 2011-09-03 |
| 8 | 1131-che-2007-claims.pdf | 2011-09-03 |
| 8 | 1131-che-2007-form 5.pdf | 2011-09-03 |
| 9 | 1131-che-2007-abstract.pdf | 2011-09-03 |
| 9 | abs-1131-che-2007.jpg | 2011-09-03 |