Global optimization of correlated subqueries and exists predicates5761657Abstract A method of optimizing a query and its subqueries together in a global manner regardless of the nesting levels. All joins in different nesting levels are effectively converted into joins in the same level by assigning a nesting level attribute to relations, assigning properties to the predicate, and assigning nesting level attributes and minimum relation set attributes to join predicate operands. The joins and their predicates are then considered as if they were contained in only the main query according to a set of rules based on the attributes of the relations and the join predicates. Claims What is claimed is: Description BACKGROUND OF THE INVENTION
______________________________________
SELECT X1
FROM T1
or:
SELECT X1, Y2
FROM T1, T2
WHERE T1.Y1 = T2.X2
______________________________________
A more complex query has two or more query blocks, such as:
______________________________________
SELECT X1
FROM T1
WHERE X1 NOT IN (SELECT Y2 FROM T2)
______________________________________
There is always an enclosing query block, the main or outermost query block, which represents the whole query, and there are zero or more enclosed query blocks, which represent subqueries inside the main query. An enclosed query block may also have its own enclosed query blocks. Nesting levels for query blocks are only limited by system resources used for query processing. When the nesting level is greater than two, the innermost query has more than one enclosing query, which is its immediate enclosing query, and the enclosing query of its immediate enclosing query. If a table is immediately contained inside a query block, it is considered to be an inner table; otherwise, it is an outer table. A table can be inner in only one block and outer in one or more subblocks. Queries and their subqueries are connected by any of three types of predicates: (1) connecting, (2) correlating, and (3) existential. For purposes of illustration, all examples in the present specification use the SELECT operations. However, it will be recognized by those of skill in the art that the method of the present invention also applies for INSERT, UPDATE and DELETE operations. CONNECTING PREDICATES A connecting predicate may contain a comparison operator or a comparison operator followed by the ANY or ALL quantifier. Comparison operators are <, >, =, .ltoreq., .gtoreq., and <>, which represents less than, greater than, equal, less than or equal, greater than or equal, and not equal, correspondingly. Comparison operators compare two values of the same type, which can be numeric or alphanumeric string. A subquery following a comparison operator must return a single row. If a comparison operator is followed by the quantifier ANY or ALL, the predicate is a quantified predicate. A subquery following a quantified predicate can return multiple rows. The following are examples of queries related via a connecting predicate:
______________________________________
SELECT X1
FROM T1
WHERE T1.Y1 = ANY (SELECT Y2 FROM T2)
SELECT X1
FROM T1
WHERE (T1.X1, T1.Y1) = ANY (SELECT X2,Y2 FROM T2)
SELECT Y1
FROM T1
WHERE X1 <= (SELECT AVERAGE (X2) FROM T2)
______________________________________
Connecting predicates connect an enclosed query block and its immediate enclosing query block, which are also referred to as the inner and the outer query blocks. CORRELATING PREDICATES A correlating predicate exists when an operand used in a subquery predicate references an outer table. The following is a correlating predicate example:
______________________________________
SELECT *
FROM T1, T2
WHERE T1.X1 = T2.X2 AND T1.Y1 NOT IN (SELECT Y3
FROM T3, T4 WHERE T2.Z2 < T3.Z3 AND
T3.X3 = T4.X4)
______________________________________
The above example shows the main query and the subquery related by both a connecting predicate (Y1 NOT IN Y3) and a correlating predicate (T2.Z2<T3.Z3) in the subquery which references T2, which is immediately contained in the main query block. EXISTENTIAL PREDICATES The operator EXISTS followed by a query forms an existential predicate, which is also known for short as an EXISTS predicate. As with connecting predicates, EXISTS predicates connect an outer block to an inner block. The left operand of EXISTS is a NULL. The following is an example query with an EXISTS predicate:
______________________________________
SELECT *
FROM T1
WHERE EXISTS (SELECT * FROM T2 WHERE
T1.X1 = T2.X2)
______________________________________
REGULAR PREDICATES A regular predicate is one which is not a connecting, correlating, or EXISTS predicate, is regular, and usually appears in a WHERE clause which selects a table subset. GLOBAL OPTIMIZATION OF JOINS IN SOL QUERIES FIG. 3 describes a preferred method of global optimization for joins in queries with subqueries according to the present invention. The following steps are incorporated in the global optimization method of the present invention: 1. At 302, reduce the query and determine the minimum relation sets for predicate operands at 304. 2. At 306, generate different join plans following predefined rules (described below) to select relations and predicates. 3. At 308, select the join plan with the least cost. The semantics of a correlated subquery is that each subquery is evaluated once for each outer row, and each result set is used to join back with the outer row using the connecting predicate. Effectively, a connecting predicate is evaluated once for each row in outer table. According to the method of the present invention, a connecting predicate is evaluated only once. Iteratively evaluating the subquery for each selected row of the outer query is achieved by joining the outer tables and the inner tables of the subquery in an order resulting in a relation which is divided into groups, each of which is associated with one selected row in the outer query. To identify the groups in this resulting relation, a unique identifier is assigned to each outer row and each row of the resulting relation will contain this unique outer row identifier. The groups are identified by these identifiers. For example, given the following query:
______________________________________
SELECT TO.*
FROM T1, T0
WHERE T0.X0 IN (SELECT T2.X2 FROM T2 WHERE
T1.Y1 < T2.Y2)
______________________________________
The subquery may be evaluated as follows:
______________________________________
TEMP (ID1, X2) = SELECT T1.UNIQUEID, T2.X2 FROM T1, T2
WHERE T1.Y1 < T2.Y2;
______________________________________
The table T0 and the temporary result table TEMP are then typically joined, such that a row from T0 will be returned at most once for each group of rows in TEMP associated with a row from table T1. If a connecting predicate's subquery specifies an aggregate function, the aggregate must be done on the subquery with the grouping field as the outer row's unique identifier before the connecting condition can be evaluated. For example, given the following query:
______________________________________
SELECT TO.*
FROM T0, T1
WHERE T0.X0 IN (SELECT MAX (T2.X2) FROM T2 WHERE
T1.Y1 < T2.Y2)
______________________________________
The following join plan may be used to handle the query:
______________________________________
TEMP (MAXX2) = SELECT MAX (T2.X2) FROM T1, T2 WHERE
T1.Y1 < T2.Y2 GROUP BY T1.UNIQUEID
SELECT TO. *
FROM T0, TEMP
WHERE T0.X0 = TEMP.MAXX2
______________________________________
In the above example, the resulting relation TEMP does not contain the identifiers because it has been "used up" in the aggregate MAX in the subquery. The outer row identifiers are always used in some manner, whether or not they are retained in the temporary relation. The rules to select relations and predicates given below in this specification clearly point out when the identifiers are needed in the resulting relation. If the left operance of the connecting condition specifies one of the outer table in the subquery result, a predicate will be added to match the outer row unique identifiers and the subquery result. This is illustrated by the following example:
______________________________________
SELECT T1.*
FROM T1
WHERE T1.X1 IN (SELECT MAX (T2.X2) FROM T2 WHERE
T1.Y1 < T2.Y2)
______________________________________
The above example differs from the previous example in the number of tables immediately contained in the main query. It will be noted that the connecting predicate and the correlating predicate share the same table T1. This fact requires the identifiers to be retained in the resulting relation when the connecting predicate is evaluated and an extra predicate to be internally generated. The rules to select relations and predicates given below in the present specification clearly identify this situation. For this example, the following join plan may be used to handle the query:
______________________________________
TEMP (ID1, MAXX2) = SELECT T1.UNIQUEID, MAX (T2.X2)
FROM T1, T2 WHERE T1.Y1 < T2.Y2 GROUP BY
T1.UNIQUEID
SELECT T1.*
FROM T1, TEMP
WHERE T1.UNIQUEID = TEMP.ID1 AND T1.X1 =
TEMP.MAXX2
______________________________________
The present invention performs a global optimization by converting a complex query into a simple one with a list of tables to be joined and a list of predicates to apply in the joins. T0 be able to generate a join order that efficiently return a correct result, the present invention incorporates the following elements: --A query reduction method to label the scope of tables and predicate operands, and to determine minimum relation sets for predicate operands and predicate properties. This reduction turns a complex query into a simple one. --A set of rules to determine the conditions to apply a join to any two relations and which predicates can be used to join them. These rules guarantee the correctness of results generated via the global optimization approach. QUERY REDUCTION METHOD FIG. 4 describes a preferred query reduction method according to the present invention. The following steps are incorporated in the optimization method of the present invention. Label Query. At 402, each query is assigned a unique number which is then used to form an ordered set to label query blocks, tables and predicate operands. Ordered sets are represented by an ordered number list inside a pair of curly brackets. Label Query Block. At 404, each query block is assigned a label comprising an ordered set of query numbers that indicate the enclosing queries and its own unique query number. For example, {1, 3, 5} indicates a query block numbered 5, defined inside a query block numbered 3, defined inside a query block numbered 1 (i.e., the outermost or main query). Label Table. At 406, each table is assigned a label comprising an ordered set of query numbers that indicate the query block where it is immediately contained. For example label {1, 4} for T indicates that T is immediately contained in query block numbered 4, inside a query block numbered 1. Label Predicate Operand. At 408, both the left and right operands of a regular predicate or correlating predicate are labeled with the label of the immediate query block where they are specified. For connecting predicates, the left operand is labeled with the label of the outer query block and the right operand is labeled with the label of the inner query block. For EXISTS predicates, the NULL left operand is labeled with the label of the outer query block and the right operand is labeled with the label of the inner query block. Minimum Relation Set. At 410, the minimum relation set attribute of a connecting predicate operand indicates the set of base tables which must be present in the table list of a relation for it to become a candidate operand for a join operation which uses the predicate as a join condition. If the subquery is correlated, the predicate will also have the minimum relation set of outer tables which must be a subset of the table list of either the left or the right operand of a join operation which uses the predicate as a join condition. The minimum relation set of the left operand of a connecting predicate is the list of tables used in the operands. The minimum relation set of the right operand of a connecting predicate or an EXISTS predicate is the list of tables immediately contained in the subquery and the nested subqueries of the subquery. The left operand of an EXISTS predicate (a NULL) has an empty minimum relation set and it is needed only for the purpose of labeling. If a correlated subquery specifies an aggregate function, the minimum relation set of the right operand includes the set of the outer tables. This results because the aggregate function has to be carried out on the subquery result grouped by the combined unique identifier associated with all outer tables in the subquery result before the connecting condition can be used in a join operation. In addition, before the aggregate can be done, join operations with the outer tables using the correlating conditions have to be performed to restrict the set of inner rows that would participate in the aggregate. The minimum relation set values are used to determine which connecting predicate is to be used in a binary join operation. When a join uses a predicate as a condition, it is said that the join applies to the predicate. Predicate Properties. At 412, if a subquery specifies an aggregate function, the aggregate function must be performed on the right operand before the connecting predicate can be used in a join operation. An attribute is used to indicate whether an aggregate function exists in the right operand. When a correlated subquery specifies a COUNT function, and if a row from the outer query has no match in the subquery, a COUNT of zero must be returned for this row. To satisfy this requirement, an outer join operation is used to join an outer table and an inner table. Another attribute is used to indicate that the join applying to a predicate has to be an outer join and which of its operands has to be the outer table. For each outer table referenced in a subquery, if there is a correlating predicate between this table and an inner table, the attribute of this predicate will indicate the outer join requirement; otherwise, if no correlating predicate is specified between this outer table and an inner table, a dummy predicate will be added and its attribute indicates the outer join requirement. Steps to Reduce Query. FIG. 5 describes the preferred steps to reduce a query. The following steps assume that QueryBlockNumber is declared as a set of numbers, RelationList is a set of relations and PredicateList is a list of predicates. It is also assumed that every WHERE condition in the query is already in conjunctive normal form (CNF). The following are the preferred steps to reduce the query according to the present invention: 1. At 502, increment the query number by 1 and add this number to the set identified as QueryBlockNumber. 2. At 504, label the current query block with the set in QueryBlockNumber. 3. At 506, label each table immediately contained in the corresponding query block with the label of the query block and add it to the RelationList. 4. At 508, for each predicate in the CNF predicate list in the block: a. Label the left operand with the label of the query block. b. If it is neither a connecting nor an EXISTS predicate, then: i. Label the right operand with the label of the query block. c. Otherwise: i. Repeat steps 1 to 4 recursively for the subquery. ii. Replace the reference to the subquery with the projection specified in the subquery. iii. Label the right operand with the label of the subquery. iv. Form the left operand minimum relation set by listing all the tables in the left operand. v. Form the right operand minimum relation set with the set of tables immediately contained in this subquery and all its nested subqueries. vi. Form the outer table set by listing all outer tables (with respect to this query block) referenced in this subquery and all its nested subqueries. d. Assign the aggregate attribute to the predicate. 5. At 510, assign the outer join properties. 6. At 512, add the predicate to the PredicateList. 7. Pop the last number from QueryBlockNumber. After performing the above-described steps, a complex query with one or more subqueries is reduced to a simple query with a list of tables (RelationList) and a list of predicates (PredicateList). Optimization is then performed on this simple query according to the following set of rules, which are used to determine which relations can be joined together and which predicates can be applied to a particular pair of relations. RULES TO SELECT RELATIONS AND PREDICATES FIG. 6 describes a table of preferred rules to select relations and predicates. The following rules are incorporated in the optimization method of the present invention: 1. At 602, a relation whose label set is the same, or a subset or a superset to the label set of another relation can be joined to that relation. For example, a relation with the label {1, 3} can be joined with a relation with the label {1}, {1, 3} or {1, 3, 7}, but it cannot be joined with one labeled {1, 8}. 2. At 604, when a join operation is performed on two relations, the resulting relation is labeled as follows: if a connecting predicate is used, the resulting relation inherits the label of the outer relation; otherwise, the connecting predicate inherits the label of the inner relation. The table list is the union of the two table lists of the joined relations. For example, if relation {1} has tables {T1, T2}, {1, 3} has {T2, T3, T4, T5}, {1, 3, 7} has {T6, T7}, then without using any connecting predicate, the result of joining {1} and {1, 3} is labeled {1, 3} with the table list {T1, T2, T3, T4, T5}; {1, 3} and {1, 3, 7}, {1, 3, 7} and the list {T2, T3, T4, T5, T6, T7}. Alternatively, if a connecting condition is used, the result of joining {1} and {1, 3} is labeled {1} with the table list {T1, T2, T3, T4, T5}. 3. At 606, for any two candidates for a join operation, a predicate is chosen to be a join condition if both of the following conditions are satisfied: the right operand label is a subset of the bigger set of either of the two candidates labels; if it is a connecting predicate, the table list of one candidate covers the minimum set for one operand, and the other candidate's table list covers the minimum set for the other operand, and either candidate's table list covers the outer table set. All qualified predicates are chosen and they are all ANDed together. 4. At 608, if a join operation is performed, then perform the following steps as shown in FIGS. 7A and 7B: a. At 702, record the join in the join plan and add its cost to the plan's cost. b. At 704, delete all the predicates used in the join from the list of predicates. c. At 706, label the resulting relation as described in step b above and add it to the list of relations. d. At 708, if each relation is not needed to cover for any other connecting predicate remaining in the predicate list, then at 710 delete each relation used in the join from the list of relations. A relation cannot be deleted if there is any remaining connecting condition whose left operand has a label that is a subset of the label of the join result and it has this relation in the left operand. e. At 712, if the unique identifier is between an outer table and an inner table and the connecting condition between them is not used in the join, then at 714 save the unique identifier of the outer table in the join result. f. At 716, if an outer table is not deleted from the list of relation, then at 718 add a predicate to equate the grouping identifier saved in the join result and that outer row identifier. g. At 720, aggregate the right operand grouped by the saved outer row identifier if a connecting condition is used and its property indicates aggregate has to be done on the right operand before the join. h. At 722, if a connecting condition is used and the right operand of the predicate contains an outer table, the join will be done such that a row from the left operand will be returned at most once for each group of rows in the right operand associated with an outer row. i. At 724, if a predicate is used in the join and its property indicates an outer join is required, or if the join is between an outer join result and some inner table with no connecting condition being applied, the join will be implemented using an outer join. j. At 726, if the connecting condition used in the join is an EXISTS predicate, process as follows: i. At 728, if one join table contains only outer tables and the other join table does not contain any outer table, then at 730 a semi-join will be used that returns each outer row at most once. ii. Otherwise, after the join is done, at 732 a unique sort will be done on the join result with the unique key as the unique identifier of all the outer tables. 5. At 610 in FIG. 6, stop processing when there is only one relation left in the relation list. This relation should include all tables contained in the query. If the predicate list is exhausted before the relation list is reduced to one, the remaining relations are product-joined to produce the final result. CONCLUSION This concludes the description of the preferred embodiment of the invention. The following paragraph describes an alternative embodiment for accomplishing the same invention. In one alternative embodiment, any type of computer, šsuch as a mainframe, minicomputer, or personal computer, could be used to implement the present invention. In addition, any software program adhering (either partially or entirely) to the SQL language could benefit from the present invention. In summary, the present invention discloses a method, apparatus, and program product for optimizing a query and its subqueries together in a global manner regardless of the nesting levels. All joins in different nesting levels are effectively converted into joins in the same level by assigning a nesting level attribute to relations, assigning properties to the predicate, and assigning nesting level attributes and minimum relation set attributes to join predicate operands. The joins and their predicates are then considered as if they were contained in only the main query according to a set of rules based on the attributes of the relations and the join predicates. The foregoing description of the preferred embodiment of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto.
|
Same subclass Same class Consider this |
||||||||||
