Method and system for using materialized views to evaluate queries involving aggregation5897632Abstract The present invention is a method and system for using materialized views to compute answers to SQL queries with grouping and aggregation. A query is evaluated a using a materialized view. The materialized view is semantically analyzed to determine whether the materialized view is usable in evaluating an input query. The semantic analysis includes determining that the materialized view does not project out any columns needed to evaluate the input query and determining that the view does not discard any tuple that satisfies a condition enforced in the input query. If the view is usable, the input query is rewritten to produce an output query that is multi-set equivalent to the input query and that specifies one or more occurrences of the materialized view as a source of information to be returned by the output query. The output query is then evaluated. The semantic analysis and rewriting may be iterated, with the output query of each iteration being the input query of the next iteration. The output query is evaluated after the last iteration. Claims What is claimed is: Description FIELD OF THE INVENTION
______________________________________
Q.sub.1 :
SELECT Year, Plan.sub.-- Name, SUM(Charge)
FROM Calls, Calling.sub.-- Plans
WHERE Calls.Plan.sub.-- Id = Calling.sub.-- Plans.Plan.sub.--
Id
AND Year .gtoreq. 1990 AND Year .ltoreq. 1995
GROUPBY Year, Plan.sub.-- Name
HAVING SUM(Charge) > 1,000,000
______________________________________
The telephone company also maintains materialized views that summarize the performance of each of their calling plans on a periodical basis. In particular assume that the following materialized view V.sub.1 (Plan.sub.-- Id,Month,Year,Earnings) is available:
______________________________________
V.sub.1 :
SELECT Plan-ID, Month, Year, SUM(Charge)
FROM Calls
GROUPBY Plan.sub.-- Id, Month, Year
______________________________________
View V.sub.1 can be used to evaluate the query Q.sub.1 by joining V.sub.1 with the table Calling.sub.-- Plans, collapsing multiple groups corresponding to the monthly plan earnings into annual plan earnings, and enforcing the additional conditions to get the summaries of plans earning more than a million dollars in one of the years between 1990 and 1995. The rewritten query Q.sub.1 that uses V.sub.1 is:
______________________________________
Q.sub.1 '
SELECT Year, Plan.sub.-- Name, SUM(Earnings)
FROM V.sub.1, Calling.sub.-- Plans
WHERE V.sub.1.Plan.sub.-- Id = Calling.sub.-- Plans.Plan.sub.
-- Id
AND Year .gtoreq. 1990 AND Year .ltoreq. 1995
GROUPBY Year, Plan.sub.-- Name
HAVING SUM(Earnings) > 1,000,000
______________________________________
The Calls table may be huge, and the materialized view V.sub.1 is likely to be orders of magnitude smaller than the Calls table. Hence, evaluating Q.sub.1 will be much faster than evaluating Q.sub.1, emphasizing the importance of recognizing that Q.sub.1 can be rewritten to use the materialized view V.sub.1. Consider now the case where, instead of V.sub.1, the telephone company maintains the materialized view V.sub.1 (Plan-Id, Month, Year, Earnings), summarizing the performance of their calling plans only since 1991:
______________________________________
V.sub.1 ':
SELECT Plan-ID, Month, Year, SUM(Charge)
FROM Calls
WHERE Year .gtoreq. 1991
GROUPBY Plan.sub.-- Id, Month, Year
______________________________________
View V.sub.1 can still be used to evaluate query Q.sub.1, However, not all the table in Q.sub.1 can be computed using V.sub.1 ': the summary information computation for 1990 would have to access the Calls table, and the rewritten query Q.sub.1 ": involves a UNION ALL.
______________________________________
Q.sub.1 ":
SELECT Year, Plan.sub.-- Name, SUM(Earnings)
FROM V.sub.1 ', Calling.sub.-- Plans
WHERE V.sub.1 '.Plan.sub.-- Id = Calling.sub.-- Plans.Plan.sub.
-- Id
AND Year .ltoreq. 1995
GROUPBY Year, Plan.sub.-- Name
HAVING SUM(Earnings) > 1,000,000
UNION ALL
SELECT Year, Plan.sub.-- Name, SUM(Charge)
FROM Calls, Calling.sub.-- Plans
WHERE Calls.Plan.sub.-- Id = Calling.sub.-- Plans.Plan.sub.--
Id
AND Year = 1990
GROUPBY Year, Plan.sub.-- Name
HAVING SUM(Charge) > 1,000,000
______________________________________
Evaluating Q.sub.1 " will still be faster than evaluating Q.sub.1, even though it involves accessing the Calls table. A process for rewriting a query is shown in FIG. 2. The process begins with step 202, in which the original query, which is to be rewritten, and the views that are to be used, are provided. One view is selected to be analyzed first. In step 204, the view is semantically analyzed to determine whether it is usable in evaluating the original query. The semantic analysis involves two parts. First, the view is analyzed to determine whether the view projects out any needed columns. Second, the view is analyzed to determine whether the view discards any tuples that satisfy a condition enforced in the original query. If a view V is usable in evaluating a query Q, then V must "replace" some of the tables and conditions enforced in Q: other tables and conditions from Q must remain in the rewritten query Q'. The rewritten query Q' can be a single-block query, or a multi-block query that is a UNION ALL of single-block queries. For view V to be usable in answering query Q, such that Q' is a single-block query, it must be the case that: V does not project out any columns needed by Q. A column A is needed by Q if it appears in the result of Q or if Q needs to enforce a condition involving A that has not been enforced in the computation of V. V does not discard any tuples needed by Q. A tuple is needed by Q if it satisfies the conditions enforced in Q. When Q' can be a multi-block query, the second requirement can be somewhat relaxed to require that V not discard any tuples needed for some of the groups in Q. In step 206, it is determined whether the view is usable, based on the results of step 204. If the view is not usable, then in step 214, the next view from among those provided is selected to be analyzed. If the view is usable, then in step 208, the query is rewritten to form a multi-set equivalent query that uses the view. In step 210, it is determined whether the iteration of the selected view is complete. If it is not, then the process loops back to step 204, in which the rewritten query is semantically analyzed. As long as the view remains usable, a rewritten query is generated for each iteration. If the iteration is complete, then in step 212, it is determined whether all provided views have been analyzed. If not, then in step 214, the next provided view is selected. The process then loops back to step 204 and the newly selected view is iteratively analyzed using the current rewritten query. If, in step 212, all provided views have been analyzed, then in step 216, the final rewritten query that results from the process is evaluated. NOTATION AND DEFINITIONS We consider SQL queries and views with grouping and aggregation. Queries can either be single-block queries (described below) or union multi-block queries that are the UNION ALL (i.e., additive multiset union) of single-block queries. A view is defined by a query, and the name of the view is associated with the result of the query. In this document, we consider only views defined by single-block queries. We give the form as well as a simple example of a single-block query below:
______________________________________
Q: SELECT Sel(Q)
FROM R.sub.1 (A.sub.1), . . . , R.sub.n (A.sub.n)
WHERE Conds(Q)
GROUPBY Groups(Q)
HAVING GConds(Q)
Qe: SELECT A, MAX(D), SUM(E)
FROM R(A, B), S(C, D, E)
WHERE B = C
GROUPBY A, B
HAVING SUM(D) > 1000
______________________________________
For notational convenience, we modify the naming convention of standard SQL to guarantee unique column names for each of the columns in a single-block query. For example, let R.sub.1 and R.sub.2 be two tables each with a single column named A. If a single-block query Q has both R.sub.1 and R.sub.2 in its FROM clause, our notation would replace them by R.sub.1 (A.sub.1) and R.sub.2 (A.sub.2). Every reference to R.sub.1.A in Q is replaced by A.sub.1, and every reference to R.sub.2.A in Q is replaced by A.sub.2. Similarly, if a single-block query Q has two range variables R.sub.1 and R.sub.2 ranging over table R in its FROM clause, our notation would replace them by R(A.sub.1) and R(A.sub.2). Every reference to R.sub.1.A in Q is replaced by A.sub.1, and every reference to R.sub.2.A in Q is replaced by A.sub.2. We use Tables(Q) to denote the set of tables (along with their columns) {R.sub.1 (A.sub.1), . . . ,R.sub.n (A.sub.n)} in the FROM clause of a single-block query Q, and Cols(Q) to denote A.sub.1 .orgate. . . . .orgate.A.sub.n, i.e., the set of columns of tables in Tables(Q). In the example of query Q.sub.e, Tables(Q.sub.e) is {R(A,B),S(C,D,E)} and Cols(Q.sub.e) is {A,B,C,D,E}. The set of columns in the SELECT clause of Q, denoted by Sel(Q), consists of: (a) non-aggregation columns: this is a subset of the columns in Cols(Q): and is denoted by ColSel(Q); and (b) aggregation columns: these are of the form AGG(Y), where Y is in Cols(Q) and AGG is one of the aggregate functions MIN, MAX, SUM and COUNT. The set of columns that are aggregated upon, such as Y above, is a subset of Cols(Q), and is denoted by AggSel(Q). In the example of query Q.sub.e, Sel(Q.sub.e) is {A,MAX(D),SUM(E)}, ColSel(Q.sub.e) is {A} and AggSel(Q.sub.e) is {D,E}. The grouping columns of query Q, denoted by Groups(Q), consists of a subset of the columns in Cols(Q). SQL requires that if Groups(Q) is not empty, then ColSel(Q) must be a subset of Groups(Q). In the example of query Q.sub.e, Groups(Q.sub.e) is {A,B} and ColSel(Q.sub.e) is a proper subset of Groups(Q.sub.e). We consider built-in predicates that are arithmetic predicates of the form .alpha. op .beta., where op is one of the comparison predicates {<,.ltoreq.,=,.gtoreq.,>}, and .alpha. and .beta. are terms formed from columns of tables, aggregation columns, and constants using the arithmetic operations +, - and *. The conditions in the WHERE clause of query Q, denoted by Conds(Q), consists of a Boolean combination of built-in predicates formed using columns in Cols(Q) and constants. The conditions in the HAVING clause of query Q, denoted by GConds(Q), consists of a Boolean combination of built-in predicates formed using columns in Groups(Q), aggregation columns of the form AGG(Y) where Y is in Cols(Q), and constants. In the example of query Q.sub.e, Conds(Q.sub.e) is B=C, and GConds(Q.sub.e) is SUM(D)>1000. Given a single-block query Q, if Groups(Q), AggSel(Q) and GConds(Q) are non-empty (Note that each of Groups(Q), AggSel(Q) and GConds(Q) can be empty without the other two being empty.), then Q is referred to as an aggregation query. Determining that a single-block view V is usable in evaluating a single-block query requires (as we show later in the paper) that we consider mappings from V to Q. These are specified by column mappings, defined below. Column Mappings A column mapping from a single-block query Q.sub.a to a single-block query Q.sub.b is a mapping .phi. from Cols(Q.sub.a) to Cols(Q.sub.b) such that if R(A.sub.1, . . . ,A.sub.n) is a table in Tables(Q.sub.a), then: (1) there exists a table R(B.sub.1, . . . ,B.sub.n) in Tables(Q.sub.b), and (2) B.sub.i =.phi.(A.sub.i),1.ltoreq.i.ltoreq.n. A 1-1 column mapping .phi. is a column mapping from Q.sub.a to Q.sub.b such that distinct columns in Cols(Q.sub.a) are mapped to distinct columns in Cols(Q.sub.b). Otherwise, the column mapping is a many-to-1 column mapping. As a shorthand, if R is a table in Tables(Q.sub.a), we use .phi.(R(A.sub.1, . . . ,A.sub.n)) to denote R(.phi.(A.sub.1), . . . , .phi.(A.sub.n)), where A.sub.1, . . . ,A.sub.n are columns in Cols(Q.sub.a). We use similar shorthand notation for mapping query results, sets and lists of columns, sets of tables, and conditions. We formalize the intuitive notation of "usability" of view V in evaluating query Q as finding a rewriting of Q, defined below. In this paper, we consider only rewritings that are either single-block queries or multi-block queries that are UNION ALLS of single-block queries. For example, rewriting Q.sub.1 ' in Example 1.1 is a single block query, whereas rewriting Q.sub.1 " in the same example is a multi-block query that is a UNION ALL of single-block queries. Rewriting of a Query A query Q' is a rewriting of query Q that uses view V if: (1) Q and Q' are multiset-equivalent, i.e., they compute the same multiset of answers for any given database, and (2) Q' contains one or more occurrences of V in the FROM clause of one of its blocks. In the sequel, we say that view V is usable in evaluating query Q, if there exists a single-block or a union multi-block query Q' such that Q' is a rewriting of Q that uses V. When the rewritten query can be a multi-block query, there is a certain trivial sense in which any view V is usable in evaluating a given query Q--the rewritten query can be the UNION ALL of Q itself and a single-block query in which V occurs in the FROM clause and which has an unsatisfiable conjunction of built-in predicates in the WHERE clause. Further, when Q is unsatisfiable, any rewriting of Q would also have to be unsatisfiable. Dealing with these and other such possibilities would complicate our presentation without aiding our understanding of the problem. Hence, we consider satisfiable queries and views, and do not permit multi-block rewritings where any block is unsatisfiable. AGGREGATION QUERY AND CONJUNCTIVE VIEWS In this section we consider the problem of using single-block conjunctive views to evaluate a single-block query with grouping and aggregation. Using a single-block view to evaluate a multi-block query can be achieved by independently testing usability of the view in evaluating each block of the multi-block query separately. We formalize these intuitions below, show that they yield both necessary and sufficient conditions for certain kinds of queries, and present an algorithm to rewrite Q using V. We first examine the case when the query does not have a HAVING clause, and then describe the effect of the HAVING clause on the conditions for usability and the rewriting algorithms Aggregation Query Without A HAVING Clause Single-block Rewritten Query The conditions for usability of a single-block view V in evaluating a single-block query Q, such that the rewritten query Q' is a single-block query, are presented formally in FIG. 3 in terms of column mappings. Note that the conditions apply also to the restricted case when both the view and the query are conjunctive. Condition C.sub.1 302 and the first part of condition C.sub.4 308 essentially guarantee that the view is multiset equivalent to its image under .phi.: these are a reformulation of the conditions presented in S. Chaudhuri and M. Y. Vardi, "Optimization of real conjunctive queries", In Proc. ACM PODS, 1993 for testing equivalence of conjunctive queries under the multiset semantics. Note that the 1-1 mapping is necessary because of the multiset semantics, whereas a many-to-1 mapping would suffice in the case of sets. Condition C.sub.4 308 ensures that constraints not enforced in the view can still be enforced in the query when the view is used, since they do not refer to columns that are projected out in the view and hence are no longer available. Conditions C.sub.2 304 and C.sub.3 306 ensure that the view does not project out any columns that are required in the SELECT clause of the query. If conditions C.sub.1 -C.sub.4 302-308 are satisfied, the rewritten query Q' is obtained from Q by applying process ConjViewSingleBlock 400, shown in FIG. 4. Process 400 begins with step S.sub.1 402, in which all the tables in .phi.(Tables(V)) are replaced by .phi.(V). In step S.sub.2 404, each column A in Groups(Q) .orgate. ColSel(Q) .orgate. AggSel(Q) are replaced by .phi.(B.sub.A), where B.sub.A satisfies conditions C.sub.2 304 and C.sub.3 306, part 1. In step S.sub.2 404-406, a Boolean combination of built-in predicates Conds' satisfying condition C.sub.4 308 is determined. Conds(Q) in Q is then replaced by Conds'. In step S.sub.4 408, COUNT(A) is replaced by COUNT(B), where B is any column in .phi.(V). COUNT(A) is an aggregation column in Sel(Q) such that A is in .phi.(Cols(V)), but not in .phi.(Sel(V)). Theorem 3.1 Let Q be a single-block aggregation query without a HAVING Clause, and let V be a single-block conjunctive view. If conditions C.sub.1 -C.sub.4 are satisfied, V is usable in evaluating Q. In that case Q', obtained by applying algorithm ConjViewSingleBlock is a rewriting of Q using V. If Conds(Q) and Conds(V) contain only equality predicates of the form A=B, where A and B are column names or constants, and the rewritten query is required to be a single-block query, V is usable in evaluating Q only if conditions C.sub.1 -C.sub.4 are satisfied. The following example illustrates conditions C.sub.1 -C.sub.4 302-308 and process ConjViewSingleBlock 400 for obtaining a single-block rewritten query. EXAMPLE 2 Consider the telephone company database from Example 1. The following query Q.sub.2 can be used to determine the total earnings of each calling plan as well as the total number of calls charged under each calling plan in December 1995.
______________________________________
Q.sub.2 :
SELECT PN.sub.1, SUM(C.sub.1), COUNT(C.sub.1)
FROM Calls(F.sub.1, T.sub.1, TI.sub.1, D.sub.1, M.sub.1,
Y.sub.1, DU.sub.1, P.sub.1, C.sub.1),
Calling.sub.-- Plans(PI.sub.1, PN.sub.1)
WHERE P.sub.1 = PI.sub.1 AND Y.sub.1 = 1995 AND M.sub.1 = 12
GROUPBY PN.sub.1
______________________________________
Assume that the telephone company maintains call data for December 1995 as the view V.sub.2 below:
______________________________________
V.sub.2 :
SELECT F.sub.2, T.sub.2, TI.sub.2, D.sub.2, M.sub.2, Y.sub.2,
DU.sub.2, P.sub.2, C.sub.2
FROM Calls (F.sub.2, T.sub.2, TI.sub.2, D.sub.2, M.sub.2,
Y.sub.2, DU.sub.2, P.sub.2, C.sub.2)
WHERE Y.sub.2 = 1995 and M.sub.2 = 12
______________________________________
View V.sub.2 can be used to evaluate query Q.sub.2 since conditions C.sub.1 -C.sub.4 are satisfied: (C.sub.1) The 1-1 column mapping .phi. from V.sub.2 to Q.sub.2 is {F.sub.2 .fwdarw.F.sub.1,T.sub.2 .fwdarw.T.sub.1,TI.sub.2 .fwdarw.TI.sub.1,D.sub.2 .fwdarw.D.sub.1,M.sub.2 .fwdarw.M.sub.1,Y.sub.2 .fwdarw.Y.sub.1,DU.sub.2 .fwdarw.DU.sub.1,P.sub.2 .fwdarw.P.sub.1,C.sub.2 .fwdarw.C.sub.1 }. (C.sub.2) Trivially satisfied. (C.sub.3) For column C.sub.1, B.sub.C.sbsb.1 is the column C.sub.2 in Sel(V.sub.2). (C.sub.4) Conds' is given by P.sub.1 =PI.sub.1. The single-block rewriting of Q.sub.2 that uses V.sub.2 is:
______________________________________
Q.sub.2 '
SELECT PN.sub.1, SUM(C.sub.1), COUNT(C.sub.1)
FROM V.sub.2 (F.sub.1, T.sub.1, TI.sub.1, D.sub.1, M.sub.1,
Y.sub.1, DU.sub.1, P.sub.1, C.sub.1),
Calling.sub.-- Plans(PI.sub.1, PN.sub.1)
WHERE P.sub.1 = PI.sub.1
GROUPBY PN.sub.1
______________________________________
Multi-Block Rewritten Query When the rewritten query is not required to be a single-block query, but can be a multi-block query that is a UNION ALL of single-block queries, additional usage of views in evaluating queries are possible. The conditions for usability of a single-block view V in evaluating a single-block query Q, when Q' can be a multi-block rewritten query, are similar to the conditions for usability when Q' has to be a single-block query. In particular conditions C.sub.1 -C.sub.3 302-306 are unchanged. Condition C.sub.4 308 has to be modified to reflect the possibility that V can be used to compute only some of the tuples of Q. The modified condition C.sub.4.sup.m 500 is formally presented in FIG. 5. Intuitively, given a view V that satisfied condition C.sub.1 302 query Q can always be reformulated as a UNION ALL of 2 single-block queries Q.sub.a and Q.sub.b, that differ from Q (and from each other) only in their WHERE clauses such that (1) Conds(Q.sub.a) is equivalent to Conds(Q) & .phi.(Conds(V)), and (2) Conds(Q.sub.b) is equivalent to Conds(Q) & .phi.(Conds(V)). View V can be potentially used to evaluate Q.sub.a, but clearly not Q.sub.b. Conditions C.sub.1 -C.sub.3, 302-306 and parts 1, 2(a) and 2(b) of condition C.sub.4.sup.m 500 essentially check whether view V can be used to evaluate Q.sub.a. The reformulation of Q as the UNION ALL of Q.sub.a and Q.sub.b, however, does not always preserve the semantics of Q. To preserve the semantics, it must be guaranteed that Q.sub.a and Q.sub.b do not compute tuples for the same group of Q--part 2(c) of condition C.sub.4.sup.m 500 embodies this requirement. If conditions C.sub.1 -C.sub.3 302-306 and C.sub.4.sup.m 500 are satisfied, the multi-block rewritten query Q' is obtained, using process ConjViewMultiBlock 600, shown in FIG. 6. Process 600 begins with step 602, in which .phi.(Conds(V)) is used to split Q into Q.sub.a and Q.sub.b, such that Conds(Q.sub.a) is equivalent to Conds(Q) & .phi.(Conds(V)), and Conds(Q.sub.b) is equivalent to Conds(Q) and .phi.(Conds(V)). In step 604, process ConjViewSingleBlock is used to rewrite Q.sub.a to make use of view V, resulting in the single-block query Q'.sub.a. In step 606, if Q.sub.b is satisfiable, the multi-block query Q' that is the rewriting of Q using V is the UNION ALL of Q'.sub.a and Q.sub.b. Else Q' is the same as Q'.sub.a. Theorem 3.2 Let Q be a single-block aggregation query without a HAVING clause, and let V be a single-block conjunctive view. If conditions C.sub.1 -C.sub.3 and C.sub.4.sup.m are satisfied, V is usable in evaluating Q. In that case Q', obtained by applying algorithm ConjViewMuliBlock, is a multi-block rewriting of Q using V. Multiple Uses of Views Often a query can make use of multiple views, or the same view multiple times. The rewriting algorithms ConjViewSingleBlock and ConjViewMultiBlock presented above can be used to incorporate multiple uses of views. To obtain rewritings with multiple views we create successive rewritings Q'.sub.a, . . . ,Q'.sub.n, where each rewriting is obtained from the previous one by testing conditions C.sub.1 -C.sub.3 302-306 and either C.sub.4 308 or C.sub.4.sup.m 500 (depending on the form of the rewriting desired), and applying the corresponding rewriting algorithm. At each successive rewriting, the views incorporated in previous rewritings are treated as database tables rather than being expanded using their view definitions. Theorem 3.3 Let Q be a single-block aggregation query without a HAVING clause, and let V.sub.a, . . . V.sub.m be single-block conjunctive views. Then the following hold: 1. An iterative application of algorithm ConjViewSingleBlock is sound, i.e., each successive rewriting is multiset-equivalent to Q. 2. An iterative application of algorithm ConjViewMuldiBlock is sound, i.e., each successive rewriting is multiset-equivalent to Q. 3. The rewriting algorithm ConjViewSingleBlock is order-independent. That is, if there is a single-block rewriting of Q that uses each of V.sub.a, . . .V.sub.m then the result of rewriting Q to incorporate views V.sub.a, . . .V.sub.m, would be the same regardless of the order in which the views are considered. 4. If Conds(Q), Conds(V.sub.a), . . . ,Conds(V.sub.m) contain only equality predicates of the form A=B, where A and B are column names, or constants, and the rewritten query is required to be a single-block query, then the iterative application of algorithm ConjViewSingleBlock is complete. That is, any rewriting of Q that uses one or more of V.sub.a, . . . ,V.sub.m can be obtained by iteratively applying algorithm ConjyiewSingleBlock. It is important to note that, for the case of equality predicates, the iterative application of ConjViewSingleBlock guarantees that we find all ways of using the views to answer a query, provided the rewritten query is required to be a single-block query. AGGREGATION QUERY WITH A HAVING CLAUSE We now describe how to extend the previous algorithms to the case in which the queries may contain a HAVING clause. We only consider the case when the rewritten query is required to be a single-block query. The case when the rewritten query can be a multi-block query is a straightforward extension, along the lines described for aggregation queries without HAVING clauses. We first describe how to extend our usability conditions to accommodate the HAVING clause, and then show how we can use various transformations on the query that can cause the conditions to be satisfied in a larger number of cases. Intuitively, when the single-block query Q has a HAVING clause, the conditions for usability of a conjunctive view V in evaluating Q and the rewriting algorithm ConjViewSingleBlock need to be extended to account for: Conditions in GConds(Q) that must be satisfied by the query, in addition to conditions in Conds(Q), and Aggregation columns of the form AGG(Y), that occur in GConds(Q), but not in Sel(Q). To accommodate such conditions we modify C.sub.3 306 to also consider Ad arguments that appear in GConds(Q). The extended condition, C.sub.3.sup.h 700, is formally presented in FIG. 7. If Q and V satisfy conditions C.sub.1 302, C.sub.2 304, C.sub.3.sup.h 700 and C.sub.4 308, the single-block rewritten query Q' is obtained using the algorithm HavingConjViewSingleBlock, comprising steps 802, 804 and 806, presented in FIG. 8. Theorem 3.4 Let Q be a single-block aggregation query with a HAVING clause, and let V be a single-block conjunctive view. If conditions C.sub.1, C.sub.2, C.sub.3.sup.h and C.sub.4 are satisfied, V is usable in evaluating Q. In that case Q', obtained by applying algorithm HavingConjViewSingleBlock, is a rewriting of Q using V. Strengthening the Conditions in the Query When query Q has a HAVING clause, the conditions in its HAVING clause may enable us to strengthen the conditions in the WHERE clause, without affecting the result of the query. Strengthening the conditions in the WHERE clause may allow us to detect usability of views that would otherwise not be determined to be usable, because it makes it more likely that condition C.sub.4 308 will be satisfied. Several authors have considered the problem of inferring conditions that can be conjoined to Conds(Q) given the conditions in GConds(Q), and removing redundant conditions in GConds(Q). These techniques can be applied to rewrite the query Q as a pre processing step, yielding possibly modified conditions Conds(Q) and GConds(Q). The modified Conds(Q) and GConds(Q) are then used in checking conditions C.sub.2 304, C.sub.3.sup.h 700, and C.sub.4 308. EXAMPLE 3 Consider again the telephone company database from Example 1.1. The following query Q.sub.3 can be used to determine, for each customer, the maximum charge for a single call under the calling plan "TrueUniverse" in December 1995, provided the charge exceeds $10.
______________________________________
Q.sub.3 :
SELECT F.sub.1, MAX(C.sub.1)
FROM Calls(F.sub.1, T.sub.1, TI.sub.1, D.sub.1, M.sub.1,
Y.sub.1, DU.sub.1, P.sub.1, C.sub.1),
Calling.sub.-- Plans(PI.sub.1, PN.sub.1)
WHERE P.sub.1 = PI.sub.1 AND PN.sub.1 = "TrueUniverse"
AND Y.sub.1 = 1995 AND M.sub.1 = 12
GROUPBY F.sub.1
HAVING MAX(C.sub.1) > 10
______________________________________
Assume that the telephone company maintains detailed call data for 1995, for calls whose charge exceeds $1, as the view V.sub.3 below:
______________________________________
V.sub.3 :
SELECT F.sub.2, T.sub.2, TI.sub.2, D.sub.2, M.sub.2, Y.sub.2,
DU.sub.2, P.sub.2, C.sub.2
FROM Calls(F.sub.2, T.sub.2, TI.sub.2, D.sub.2, M.sub.2,
Y.sub.2, DU.sub.2, P.sub.2, C.sub.2)
WHERE Y.sub.2 = 1995 AND C.sub.2 > 1
______________________________________
Although the WHERE clause of Q.sub.3 does not enforce any conditions on the Charge column, while the WHERE clause of V.sub.3 does, V.sub.3 can still be used to evaluate Q.sub.3. This is because the condition MAX(C.sub.1)>10 in the HAVING clause of Q.sub.3 is equivalent to having the condition C.sub.1 >10 in the WHERE clause of Q.sub.3. Strengthening. Conds(Q.sub.3) by conjoining C.sub.1 >10 (and subsequently removing the redundant HAVING clause) allows the detection of usability of V.sub.3 in evaluating Q.sub.3. The rewriting of Q.sub.3 that uses V.sub.3 is:
______________________________________
Q.sub.3 ':
SELECT F.sub.1, MAX(C.sub.1)
FROM V.sub.3 (F.sub.1, T.sub.1, TI.sub.1, D.sub.1, M.sub.1,
Y.sub.1, DU.sub.1, P.sub.1, C.sub.1),
Calling.sub.-- Plans(PI.sub.1, PN.sub.1)
WHERE P.sub.1 = PI.sub.1 AND PN.sub.1 = "TrueUniverse"
AND M.sub.1 = 12 AND C.sub.1 > 10
GROUPBY F.sub.1
______________________________________
Note that view V.sub.3 cannot be used to answer query Q.sub.2 (from Example 3.1) since conditions C.sub.4 308 and C.sub.4.sup.m 500 are violated--in particular V.sub.3 enforces the condition C.sub.2 304>1, which results in the discarding of Calls tuples needed by Q.sub.2. AGGREGATION QUERY AND VIEWS In this section we consider the problem of using single-block views in evaluating single-block queries when both the view and the query have grouping and aggregation. We only consider the case when the rewritten query is required to be a single-block query. Recall that the two intuitive requirements for the usability of a conjunctive view V in answering a single-block aggregation query Q are that V not project out columns needed in Q, and that V not discard tuples needed in Q. In the presence of grouping and aggregation in the view, these requirements become more subtle: An aggregation over a column in V can be thought of as though that column was partially projected out, since V contains just aggregate values over that column, not the original column values themselves. A GROUPBY in V results in the multiplicities of the tuples being lost. However, as the following examples illustrate, in some cases it is possible to overcome the difficulties introduced by grouping and aggregation in the view. EXAMPLE 4 Coalescing Subgroups The following example illustrates that the aggregate information in a view may be sufficient to compute the aggregate information needed in the query. Consider the telephone company database from Example 1. The following query Q.sub.4 can be used to determine the total earnings of various calling plans as well as the maximum charge under each calling plan in 1995.
______________________________________
Q.sub.4 :
SELECT P.sub.1, PN.sub.1 SUM(C.sub.1), MAX(C.sub.1)
FROM Calls(F.sub.1, T.sub.1, TI.sub.1, D.sub.1, M.sub.1,
Y.sub.1, DU.sub.1, P.sub.1, C.sub.1),
Calling.sub.-- Plans(PI.sub.1, PN.sub.1)
WHERE P.sub.1 = PI.sub.1 AND Y.sub.1 = 1995
GROUPBY P.sub.1, PN.sub.1
______________________________________
Assume that the telephone company also maintains information giving the total earnings as well as the maximum charge of each calling plan in each month in the form of view V.sub.4 below:
______________________________________
V.sub.4 :
SELECT P.sub.2, M2, Y2 SUM(C.sub.2), MAX(C.sub.2)
FROM Calls(F.sub.2, T.sub.2, TI.sub.2, D.sub.2, M.sub.2,
Y.sub.2, DU.sub.2, P.sub.2, C.sub.2),
GROUPBY P.sub.2, M.sub.2, Y.sub.2
______________________________________
View V.sub.4 groups the table Calls by the Plan.sub.-- Id,Month, and Year columns, and computes aggregate information on each such group. Query Q.sub.4, on the other hand, groups the table Calls only on the Plan.sub.-- Id column, resulting in more coarse groups than those computed in V.sub.4. However, the aggregate information of the Plan.sub.-- Id groups in Q.sub.4 can be computed by further aggregating the aggregate information computed for the (Plan-Id, Month, Year) groups in V.sub.4, as illustrated in the following rewritten query:
______________________________________
Q.sub.4.sup.'
SELECT P.sub.1, PN.sub.1 SUM(ME.sub.1), MAX(MC.sub.1)
FROM V.sub.a (P.sub.1, M.sub.1, Y.sub.1, ME.sub.1,
MC.sub.1),
Calling.sub.-- Plans(PI.sub.1, PN.sub.1)
WHERE P.sub.1 = PI.sub.1 AND Y.sub.1 = 1995
GROUPBY P.sub.1, PN.sub.1
______________________________________
The following example illustrates that the existence of other columns in the view may enable us to recover the tuple multiplicities lost because of grouping in the view. EXAMPLE 5 Recovery of Lost Multiplicities Consider again the telephone company database from Example 1. The following query Q.sub.5 can be used to determine the total number of calls under each calling plan in 1995:
______________________________________
Q.sub.5 :
SELECT P.sub.1, COUNT(CN.sub.1)
FROM Calls(F.sub.1, T.sub.1, TI.sub.1, D.sub.1, M.sub.1,
Y.sub.1, DU.sub.1, P.sub.1, C.sub.1)
Customer(PN.sub.1, CN.sub.1)
WHERE F.sub.1 = PN.sub.1 AND Y.sub.1 = 1995
GROUPBY P.sub.1
______________________________________
View V.sub.5a below maintains the total annual revenue for each customer, plan, and year:
______________________________________
V.sub.5a :
SELECT F.sub.2, P.sub.2, Y.sub.2, SUM(C.sub.2)
FROM Calls(F.sub.2, T.sub.2, TI.sub.2, D.sub.2, M.sub.2,
Y.sub.2, DU.sub.2, P.sub.2, C.sub.2)
WHERE F.sub.2, P.sub.2, Y.sub.2
______________________________________
V.sub.5a cannot be used to evaluate Q.sub.5. This is because the multiplicity of the From column of Calls is needed in order to compute COUNT(CN.sub.1), but that multiplicity is lost in the view V.sub.5a. However, consider view V.sub.5b below.
______________________________________
V.sub.5b :
SELECT F.sub.2, P.sub.2, Y.sub.2, SUM(C.sub.2), COUNT(C.sub.2)
FROM Calls(F.sub.2, T.sub.2, TI.sub.2, D.sub.2, M.sub.2,
Y.sub.2, DU.sub.2, P.sub.2, C.sub.2)
GROUPBY F.sub.2, P.sub.2, Y.sub.2
______________________________________
Although the multiplicities of the From column are not explicit in V.sub.5b, they can be computed using the available information, V.sub.5b can be used to evaluate Q.sub.5 as follows:
______________________________________
Q.sub.5.sup.'
SELECT P.sub.1, SUM(GC.sub.1)
FROM V.sub.5b (F.sub.1, P.sub.1, Y.sub.1, YE.sub.1,
CG.sub.1),
Customer(PN.sub.1, CN.sub.1)
WHERE F.sub.1 = PN.sub.1 AND Y.sub.1 = 1995
GROUPBY P.sub.1
______________________________________
As the examples illustrate, to use views that involve aggregations, we need to verify that (a) the aggregate information in the view is sufficient to compute the aggregates needed in the query, and that (b) the correct multiplicities exist or can be computed. We formalize these intuitions below, present conditions for usability, and provide an algorithm to rewrite Q using V. WITHOUT HAVING CLAUSES To specify conditions for usability for single-block aggregation views, we need to slightly modify conditions C.sub.2 304 and C.sub.4 308 and to substantially modify condition C.sub.3 306 to deal with the different cases of aggregates appearing in the SELECT clause of the query. (Condition C.sub.1 302 is unchanged.) The modified conditions are formally presented in FIG. 9. Since ColSel(Q) must be a subset of Groups(Q), condition C.sub.2.sup.a 904 is a generalization of condition C.sub.2 304. Intuitively, condition C.sub.3.sup.a 906 guarantees that the columns in the view contain enough information to compute the aggregates required in the query. In particular, condition C.sub.3.sup.a 906 parts 1(b), 1(c) and 2 guarantee that we can recover the multiplicities in the view in order to perform an aggregation that depends on such multiplicities (i.e., either SUM or COUNT). The two parts of the condition cover the cases when the aggregation is on a column mapped by the view, and not mapped by the view, respectively. Note that the second part of condition C.sub.4.sup.a 908 does not allow Conds' to constrain any of the columns in .phi.(AggSel(V)). Intuitively, this is because the columns in AggSel(V) are aggregated upon in view V, and hence are not available for imposition of additional constraints in the rewritten query Q'. If conditions C.sub.1.sup.a -C.sub.4.sup.a 902-908 are satisfied, the rewritten query Q' is obtained from Q by applying algorithm AggViewSingleBlock, presented in FIG. 10. Steps S.sub.1.sup.a 1002,S.sub.2.sup.a .sub.1004,S.sub.3.sup.a 1006 are similar to steps S.sub.1 402, S.sub.2 404 and S.sub.3 406 of algorithm ConjViewSingleBlock. Steps S.sub.4 .sup.a 1008 and S.sub.5.sup.a 1010 deal with the various kinds of aggregation that may occur in the view and the query. Theorem 4.1 Let Q and V be single-block aggregation queries without HAVING clauses. If conditions C.sub.1.sup.a -C.sub.4.sup.a are satisfied, V is usable in evaluating Q. In that case, Q', obtained by applying algorithm AggViewSingleBlock, is a rewriting of Q using V. EXAMPLE 6 Consider again the query Q.sub.4 and view V.sub.4 from Example 4.1. View V.sub.4 can be used to evaluate Q.sub.a since conditions C.sub.1.sup.a -C.sub.4.sup.a are satisfied. Condition C.sub.1.sup.a : The 1-1 column mapping .phi. from V.sub.4 to Q.sub.4 is {F.sub.2 .fwdarw.F.sub.1,T.sub.2 .fwdarw.T.sup.1,TI.sub.2 .fwdarw.TI.sub.1,D.sub.2 .fwdarw.D.sub.1,M.sub.2 .fwdarw.M.sub.1,Y.sub.2 .fwdarw.Y.sub.1,DU.sub.2 .fwdarw.DU.sub.1,P.sub.2 .fwdarw.P.sub.1,C.sub.2 .fwdarw.C.sub.1 }. Condition C.sub.2.sup.a : For column P.sub.1 in Groups(Q.sub.4), B.sub.P.sbsb.1 is the column P.sub.2 in ColSel(V.sub.4). Condition C.sub.3.sup.a : For column SUM(C.sub.1) in Sel(Q.sub.4), Sel(V.sub.4) contains column SUM(C.sub.2), and for column MAX(C.sub.1) in Sel(Q.sub.4), Sel(V.sub.4) contains column MAX(C.sub.2). Condition C.sub.4.sup.a : Conds' is the same as Conds(Q.sub.4), i.e., P.sub.1 =PI.sub.1 &Y.sub.1 =1995 since no conditions are enforced in V.sub.4. The rewritten query Q.sub.4 resulting from applying steps S.sub.1.sup.a -S.sub.5.sup.a is given in Example 4.1. EXAMPLE 7 Constraining .phi.(AggSel(V)) Consider again the telephone database from Example 1.1. The following Q.sub.6 can be used to determine the total earnings of various calling plans in 1995, considering only calls whose charge exceeds $1.
______________________________________
Q.sub.6 :
SELECT P.sub.1, SUM(C.sub.1)
FROM Calls(F.sub.1, T.sub.1, TI.sub.1, D.sub.1, M.sub.1,
Y.sub.1, DU.sub.1 P.sub.1, C.sub.1),
WHERE Y.sub.1 = 1995 and C.sub.1 > 1
GROUPBY P.sub.1
______________________________________
Let the view V.sub.6 be the same as view V.sub.4 (from Example 4.1):
______________________________________
V.sub.6 :
SELECT P.sub.2, M.sub.2, Y.sub.2, SUM(C.sub.1), MAX(C.sub.2)
FROM Calls(F.sub.2, T.sub.2, TI.sub.2, D.sub.2, M.sub.2,
Y.sub.2, DU.sub.2, P.sub.2, C.sub.2)
GROUPBY P.sub.2, M.sub.2, Y.sub.2
______________________________________
View V.sub.6 cannot be used to evaluate Q.sub.6 above, although in the absence of the condition "C.sub.1 >1" in the WHERE clause in Q.sub.6, V.sub.6 could be used to evaluate Q.sub.6. Intuitively, this is because the built-in predicates in the query constrain the possible values of C.sub.1 and C.sub.2 is aggregated upon in the view V.sub.6, no condition on the result of the SUM or the MAX in V.sub.6 can capture the effect of the condition on C.sub.1 in Q.sub.6. With HAVING Clauses Essentially, the additional subtleties that must be considered involve the relationships between the GROUPBY and HAVING clauses in the view V and the query Q. Intuitively, the HAVING clause in V may eliminate certain groups in V (i.e., those that do not satisfy GConds(V)). If any of these eliminated groups in V is "needed" to compute an aggregate function over a group in Q, by coalescing multiple groups in V, then V cannot be used to evaluate Q. Hence, condition C.sub.4.sup.a 908 must be extended to test whether there exists GConds' such that GConds (Q) is equivalent to the combination of GConds(V) and GConds', taking the grouping columns Groups(V) and Groups(Q) into account. Before checking any of the conditions for usability, the query Q and view V can be independently preprocessed to "move" maximum sets of conditions from the HAVING clause to the WHERE clause, as discussed in Section 3.3; the resulting normal form allows independent comparison of Conds(Q) and Conds(V), on the one hand, and of GConds(Q) and GConds(V), on the other. The rewriting algorithm takes these additional refinements of the conditions of usability into account. Specifically, step S.sub.3.sup.a 1006 determines a GConds' in addition to Conds', using GConds(V) and GConds(Q) (resulting from the preprocessing step). Steps S.sub.4.sup.a 1008 and S.sub.5.sup.a 1010 are augmented to compute aggregation columns appearing in GConds(Q) in addition to those appearing in Sel(Q). Conjunctive Query and Aggregation Views Consider the case when the query Q is a conjunctive query (i.e., no grouping and aggregation), but the view V has grouping and aggregation. In this case the GROUPBY clause in the view results in losing information about the multiplicities of tuples, and view V cannot be used to evaluate Q if the multiset semantics is desired. Theorem 5.1 Let Q be a conjunctive query, and V be a single-block aggregation view. Then, there is no single-block rewriting of Q using V. The following example illustrates the problem with conjunctive queries and aggregation views: EXAMPLE 8 Consider the telephone company database from Example 1.1. The query Q.sub.7 below is used to obtain information about calls exceeding an hour in duration:
______________________________________
Q.sub.7 :
SELECT F.sub.1, D.sub.1, M.sub.1, Y.sub.1
FROM Calls(F.sub.1, T.sub.1, TI.sub.1, D.sub.1, M.sub.1,
Y.sub.1, DU.sub.1, P.sub.1, C.sub.1)
WHERE DU.sub.1 > 3600
______________________________________
The view V.sub.7 below counts the number of calls exceeding an hour in duration made by each caller on a daily basis:
______________________________________
V.sub.7 :
SELECT F.sub.2, D.sub.2, M.sub.2, Y.sub.2, COUNT(T.sub.2)
FROM Calls(F.sub.2, T.sub.2, TI.sub.2, D.sub.2, M.sub.2,
Y.sub.2, DU.sub.2, P.sub.2, C.sub.2)
WHERE DU.sub.2 > 3600
GROUPBY F.sub.2, D.sub.2, M.sub.2, Y.sub.2
______________________________________
There is a 1-1 column mapping from V.sub.7 to Q.sub.7, Sel(V.sub.7) contains all the columns required in Sel(Q.sub.7), and the conditions enforced by the WHERE clauses are identical. Even through COUNT(T.sub.2) has the required multiplicity information, this information cannot be used in an SQL query to "replicate" the tuples in V.sub.7 the appropriate number of times. Thus there is no rewriting of Q.sub.7 that uses view V.sub.7. Although a specific embodiment of the present invention has been described, it will be understood by those of skill in the art that there are other embodiments which are equivalent to the described embodiment. Accordingly, it is to be understood that the invention is not to be limited by the specific illustrated embodiment, but only by the scope of the appended claims.
|
Same subclass Same class Consider this |
||||||||||
