Automated query optimization method using both global and parallel local optimizations for materialization access planning for distributed databases4769772Abstract In a Distributed Database System (DDS), database management and transaction management are extended to a distributed environment among a plurality of local sites which each have transaction server, file server, and data storage facilities. The Materialization and Access Planning (MAP) method of a distributed query, update, or transaction is an important part of the processing of the query, update, or transaction. Materialization and access planning results in a strategy for processing a query, update, or transaction in the distributed database management system (DSDBMS). Materialization consists of selecting data copies used to process the query, update, or transaction. This step is necessary since data may be stored at more than one site (i.e., computer) on the network. Access planing consists of choosing the execution order of operations and the actual execution site of each operation. Three access planning methods are used: General (Response), General (Total) and Initial Feasible Solution (IFS). For a distributed query, General (Response) and General (Total) decrease the communication cost and increase the local processing costs as compared to the IFS. Claims What is claimed is: Description BACKGROUND OF THE INVENTION
__________________________________________________________________________
<network commands> ::=
{ <preconditions> } ( <move command>
<insert command> <execute command>
<modify command> <delete command> )
<preconditions> ::= { <preconditions> } <command>
<move command> ::=
<execution site> <temporary relation name>
<destination>
<insert command> ::= <execution site list> <insert>
<execution site list> ::= { <execution site list> } <execution site>
<execute command> ::= <execution site> <assignment>
<modify command> ::= <execution site list> <modify>
<delete command> ::= <execution site list> <delete>
<assignment> ::= <target> <- <algebra expression>
<target> ::= <temporary relation name>
<algebra expression> ::=
<item list> <selection> <projection>
<join> <two relation operation>
<item list> ::= { <item list> , } <value spec>
<value spec> ::= <variable> <constant>
<selection> ::= SELECT <term> WHERE <logical expression>
<term> ::= <relation name> `(` <algebra expression> `)`
<projection> ::= PROJECT <term> OVER <attribute spec list>
<attribute spec list> ::= { <attribute spec list> , } <attribute spec>
<attribute spec> ::=
<attribute name> <relation name>
<attribute name>
<join> ::= <term> JOIN <term> WHERE <join attribute specs>
<join attribute specs> ::=
{<join attribute spec> AND}
<join attribute spec>
<join attribute spec> ::= <attribute spec> = <attribute spec>
<two relation operation> ::= <term> <two relation operator> <term>
<two re1ation operator> ::=
UNION INTERSECT MINUS
TIMES DIVIDEBY
<logical expression> ::=
{ <logical expression> (AND OR) }
<logical term>
<1ogical term> ::=
{ NOT } ( <condition>
`(` <logical expression> `)` }
<condition> ::=
<set expr> <set comparison op> <set expr>
<scalar expr> <scalar comparison op> <scalar expr>
<set comparison op> ::= INCLUDES INCLUDED --IN = <>
<set expr> ::= <item list> <relation spec>
<relation spec> ::= <relation name> <attribute spec list>
< scalar comparison op> ::= = <> < > <= >=
<scalr expr> ::= { <scalar expr> (+ -) } <term>
<term> ::= {<term> (X / DIV REM) } <factor>
<factor> ::= `(` <scalar expr> `)` <basic value>
<basic value> ::=
{ <scalar function> } (<value spec>
<attribute spec>)
<scalar function> ::= ABS SQRT
<insert> ::=
INSERT <temporary relation name> INTO
<base relation name>
<delete> ::= DELETE <base relation name> WHERE <logical expression>
<modify> ::=
MODIFY <attribute spec list> OF <base relation name> TO
<scalar expr list> WHERE <logical expression>
<scalar expr list> ::= {<scalar expr list>,} <scalar expr>
__________________________________________________________________________
For example, assume that Emp and Dept are at sites 1 and 2 respectively. Assume that the result site is site 2. The commands that produce the final result at the result site will be shown. For the query: get <LName, Sex> of Emp, the following commands may be generated (depending on the optimization used):
______________________________________
command#
site precondition
command
______________________________________
1 1 -- T1<-SELECT Emp
2 1 1 T2<-PROJECT T1
OVER Emp.LName, Emp.Sex
3 1 2 MOVE T2 TO SITE 2
______________________________________
For the query: get <LName, Sex> of Emp where FName=`Mary` and Name of Dept=`ISA`, the following commands satisfy the query:
______________________________________
com-
mand# site precondition
command
______________________________________
1 1 -- T1<-SELECT Emp WHERE
Emp.FName = `Mary`
2 1 1 MOVE T1 TO SITE 2
3 2 -- T2<-SELECT Dept WHERE
Dept.DName = `ISA`
4 2 2,3 T3<-JOIN T1, T2 WHERE
T1.Dep#=T2.D#
5 2 4 T4<-PROJECT T3 OVER
T3.LName, T3.Sex
______________________________________
For the query: get <LName, Sex> of Emp where FName32 `Mary` or Name of Dept=`ISA`, the following commands satisfy the query:
______________________________________
com-
mand# site precondition
command
______________________________________
1 1 -- T1<-SELECT Emp WHERE
Emp.FName = `Mary`
2 1 1 MOVE T1 TO SITE 2
3 1 -- T2<-SELECT Emp
4 1 3 MOVE T2 TO SITE 2
5 2 -- T3<-SELECT Dept WHERE
Dept.DName = `ISA`
6 2 4,5 T4<-JOIN T2,T3 WHERE
T2.Dep#=T3.D#
7 2 2 T5<-PROJECT T1 OVER
T1.LName, T1.Sex
8 2 6 T6<-PROJECT T4 OVER
T4.LName, T4.Sex
9 2 7,8 T7<-T5 UNION T6
______________________________________
For the update: delete Emp where Lname=`Smith` the following commands are generated:
______________________________________
com- precon-
mand# site dition command
______________________________________
1 1 -- T1<-SELECT Emp WHERE
Emp.LName = `Smith`
2 1 1 T2<-PROJECT T1 OVER T1.E#
3 1 1,2 DELETE Emp WHERE Emp.E#
INCLUDED IN T1
______________________________________
In summary, MAP is applied to queries, updates, and transactions expressed in terms of the relational model of data. The input to MAP is an internal SQL representation, and the output from MAP is a distributed relational algebra representation. Type of Distribution The type of data distribution affects the MAP process in three ways. The first is whether the data are locally or geographically distributed, the second is whether the data are replicated, and the third is whether the data are partitioned. Both the materialization and access planning methods used in the MAP process depend on whether data are locally or geographically distributed. Materialization and access planning processes may be divided into two classes. The first class finds a minimum cost execution strategy based on a given materialization, and the second class finds a minimum cost materialization based on a given execution strategy (see "The Optimization of Query Processing in Distributed Database Systems", Ph.D. Thesis, by A. R. Hevner, Technical Report DB-80-02, Department of Computer Science, Purdue University, December 1979, and "Methods for Data Retrieval in Distributed Systems", by A. R. Hevner, Proceedings of the Second Symposium on Reliability in Distributed Software and Database Systems, Pittsburgh, July 1982, which are incorporated herein by reference). The access planning methods used in DDS are GENERAL (RESPONSE), GENERAL (TOTAL), and Initial Feasible Solution (IFS) as described in "Optimization Algorithms for Distributed Queries", by P. M. G. Apers, A. R. Hevner, and S. B. Yao, IEEE Transactions on Software Engineering, Vol. SE-9, No. 1, January 1, 1983, which is incorporated herein by reference. These methods belong in the first class. That is, a materialization for all data has been chosen before the access planning process builds an execution strategy. The materialization process chooses a site for all data in the query. The method used depends on whether the data are locally or geographically distributed. In a geographically distributed environment, a "closest in distance" type of method may be used. In a locally distributed environment, distance is not a major factor, so a clustering method may be better. Both methods have been implemented on DDS. The access planning process also depends on whether the data are locally or geographically distributed. Access planning methods try to minimize a cost function of some type. A typical cost function is given by: cost=communication costs+processing costs. The communication costs are a function of the amount of data transmitted on the network. The processing costs are a function of the disk access and CPU costs at each site in the network. GENERAL (RESPONSE) and GENERAL (TOTAL) assume that processing costs are negligible with respect to communication costs. As a result of this assumption, execution strategies are produced that decrease the amount of data to be sent at the expense of computation. This assumption is more valid in a geographically distributed network than in a locally distributed network. In addition, dynamic system state factors such as communication line contention and queueing delays are not considered. The access planning method used does not depend on whether data is replicated; conversely, the materialization method used depends on whether data is replicated. This method chooses copies of data needed to execute the query, update, or transaction from the various sites. If the data are not replicated, there is no choice to make and the materialization is predetermined. The MAP process includes a simple materialization method that chooses sites based on the number of relations needed by the query, update or transaction located at each site. It chooses those sites with the most relations needed by the query, update, or transaction. This method can be easily extended to choose sites based on the number of records or bytes needed by the query, update, or transaction. Both the materialization and the access planning methods depend on the type of data partitioning. In the relational model, data can be partitioned in five ways: 1. By schema: a schema is a description of the relations and attributes in a database. At this level of partitioning, each database resides at a single site and cannot be distributed among the sites. 2. By relation: the relations in a schema can be distributed among the sites. All rows and columns of a relation are stored in at the same site. 3. By row (horizontally): the rows (tuples) in a relation can be distributed among the sites. 4. By column (vertically): the columns (attributes) in a relation can be distributed among the sites. 5. By row and column: the rows and columns in a relation can be distributed among the sites. The materialization and access planning methods of the preferred embodiment assume that partitioning by relation is supported. In summary, the type of distribution supported by the proposed MAP method is geographically distributed data, with replication, and partitioning by relation. Level of Distribution Transparency The level in the system where knowledge of data location is available affects the MAP process. In DDS, the data location information is used to map between the global representation schema 8 and the local representation schemas 12a, 12b and 12c. At each AP 20a, 20b and 20c, the MAP process has access to the global representation schema 8, the necessary local representation schemas 12a, 12b and 12c, and the data location information. In the preferred embodiment implementation of DDS, the global representation schema 8, the local representation schemas 12a, 12b and 12c, and the data location information are replicated at all APs 20a, 20b and 20c. Alternatively, the necessary data could be partitioned and distributed among the DPs 24a, 24b and 24c. In this case, the data used by MAP 38 must be collected from the DPs 24a, 24b and 24c before the MAP process begins. In summary, the MAP method must have access to the global representation schema 8, the local representation schemas 12a, 12b and 12c, the mapping between them, and the location of the data in the local representation schemas 12a, 12b and 12c. This necessary data can be initialized independently of the MAP process. In the DDS of the preferred embodiment implementation, it is replicated at each AP 20a, 20b and 20c in the distributed system such that each AP has access to a local data dictionary/directory (DD/D) 36. Usage Characteristics The type of usage of the system affects the MAP process. In DDS, the user interfaces with the system by using the GORDAS query and update language. The user can build a transaction that consists of one or more query or update requests. In the preferred embodiment, GORDAS does not provide control constructs. The MAP method of the preferred embodiment handles queries, updates, and transactions of one or more queries or updates. Queries are decomposed using the materialization and access planning methods. Updates are decomposed by first building an execution strategy that builds a temporary relation of the tuple identifiers of the tuples to be updated. Commands are then added to the execution strategy to move this list of tuple identifiers to each site that has the relation to be updated, and to update the relation. Transactions of one or more query or update requests are decomposed in a similar manner. Execution strategies are built for each query or update independent of one another. An improvement would be to recognize common subexpressions in the transaction. This would reduce the number of temporary relations built. A necessary extension to the MAP method of the preferred embodiment is to process control constructs. Work has been done on decomposing the IF-THEN-ELSE-ENDIF control construct as described in "Transaction Optimization in a Distributed Database Testbed System", by P. A. Dwyer and A. R. Hevner, Proceedings of the IEEE Seventh International Computer Software and Applications Conference, November 1983, which is incorporated herein by reference. A method was developed on DDS that handles the IF-THEN-ELSE-ENDIF, but it uses a simple access planning method. Further research is still necessary in this area. In summary, the MAP method handles general queries and updates, but does not attempt to recognize common subexpressions or to handle control constructs in a transaction. Transaction Management Facilities The Reliable Transaction Processing project described in "An Overview of Reliable Transaction Processing in DDTS", by J. P. Richardson, M. D. Spinrad, and M. G. Smith, HR-82-268: 17-38, Honeywell Corporate Computer Sciences Center, November 1982, which is incorporated herein by reference addressed the problem of providing reliable, serializable, and highly available transaction execution in a distributed database management system. Site recovery logic and a commit protocol are used to provide reliability; a distributed concurrency control process is used to provide serializability; and a data replication control process is used to provide highly available transaction execution. The MAP method is largely independent of the methods used to provide reliable transaction processing in DDS. These activities occur during transaction execution, well after the MAP process has occurred. The materialization method chooses the sites of all relations in the query or transaction. The execution strategies generated for a transaction may be stored for later execution. Each sub-query is sent to the local site where it is stored. Upon site failure, the MAP process must be reexecuted for transactions involving the failed site. Site failure information must be available to the materialization process, so that it can select new sites for use in the transaction. The materialization process handles replicated data at the granularity of relations. For a query, if a relation is replicated at more than one site, the materialization process chooses a single copy. For an update, all copies are chosen, and the execution strategy contains an update command for each chosen copy. The data replication control method ensures that the consistency of the replicated data copies is maintained in the presence of failures. In conclusion, the MAP process is independent of the protocols used for reliable transaction processing, but it is dependent on the location of redundant copies. The MAP process must be reexecuted if a site containing data to be accessed in an execution strategy becomes unavailable. The Materialization and Access Planning Method This section describes the MAP method of the preferred embodiment. The theoretical foundations of the MAP process and the design of the MAP process are presented. Theoretical Foundations The input to the MAP process is a cluster tree that represents a query, update, or transaction. Materialization and access planning are applied to each leaf node in a cluster tree. For cluster trees with non-leaf set nodes, the location of the final result of each leaf is used to decide where the set operation (UNION, INTERSECTION and DIFFERENCE) is to occur. For cluster trees with non-leaf update nodes, the update operation (DELETE, INSERT and MODIFY) is performed at all sites containing the relation to be updated. The materialization process used picks the sites that have the most number of relations in the leaf node. The access planning methods used are GENERAL (RESPONSE), GENERAL (TOTAL), or Initial Feasible Solution (IFS). In the DDS of the preferred embodiment, the user specifies what method to use. IFS generates a baseline execution strategy without any optimization. GENERAL (RESPONSE) minimizes response time and GENERAL (TOTAL) minimizes total time of a query. The cost function that is minimized is a linear function of the size of the data transmitted. The response time is the time elapsed between the start of the first transmission and the time at which the result arrives at the required computer. The total time is the sum of the costs of all transmissions required. GENERAL (RESPONSE) and GENERAL (TOTAL) require that all local processing for a query has been accounted for. That is, all selections, projections, and joins that may be computed between relations that reside at the same node are determined. The effect of the local processing on the size of the relations and the selectivities of the attributes is estimated. GENERAL (RESPONSE) and GENERAL (TOTAL) use the estimation of the effect of local processing at each node; therefore, the next step after materialization is to determine what processing can be done locally. Then, GENERAL (RESPONSE) or GENERAL (TOTAL) determine the best strategy for executing the query. For each base relation, R(i), i=1, 2, . . . , n, the following data are stored in the Data Dictionary/Directory (DD/D). For each relation, R(i),
______________________________________
n(i): the number of tuples (cardinality),
a(i): the number of attributes (degree),
s(i): the size, s(i) = n(i) * (sum over all a(i) of w(i,j)).
______________________________________
For each attribute, d(i,j), j=1, 2, . . . , a(i) of R(i):
______________________________________
v(i,j):
the number of possible domain values,
u(i,j):
the number of distinct domain values currently held,
w(i,j):
the size of one data item in attribute domain (bytes),
h(i,j):
the high value of d(i,j) in u(i,j),
l(i,j):
the low value d(i,j) in u(i,j).
______________________________________
The following data can be computed using the above data:
______________________________________
s(i,j) = size of all attributes j, = n(i) * w(i,j),
p(i,j) = selectivity of attribute j, = u(i,j)/v(i,j).
______________________________________
The selectivity is the number of different values occurring in the attribute divided by the number of all possible values of the attribute. Thus, 0<p(i,j)<=1. To process the queries represented by the leaf clusters, only the selection, projection, and join operations are needed. For a distributed query, joins on relations that are located at different sites can occur. Before computing these joins, the sizes of the relations are reduced by selections and projections at each site. All joins that can be done locally at each site are performed. The information required for GENERAL (RESPONSE) or GENERAL (TOTAL) for the resulting relations must be computed. This information is:
______________________________________
n(i): the cardinality,
a(i): the number of attributes,
s(i): the size in bytes.
______________________________________
For each attribute:
______________________________________
p(i,j): the selectivity,
s(i,j): the size in bytes of all of the data items in attribute
d(i,j).
______________________________________
In order to compute this information after a selection, an estimate of the expected fraction of tuples that satisfies a predicate is computed. The method used in MAP is based on a method described in "Access Path Selection in a Relational Database Management System", by P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price, Proceedings of the ACM SIGMOD Conference, June 1979, which is incorporated herein by reference. satisfies a predicate. For each of the following predicates, the selectivity factor is shown:
______________________________________
(1) column = value
F = 1/u(i,j)
(2) column1 = column2
F = 1/MAX (u(i,column1), u(i,column2))
(3) column > value
F = (h(i,column) - value) / (h(i,column) - l(i,column))
= 0 (if F < 0)
(4) column < value
F = (value - l(i,column) / (h(i,column) - l(i,column))
= 0 (if F < 0)
(5) (pred1) OR (pred2)
F = F(pred1) + F(pred2) - F(pred1) * F(pred2)
(6) (pred1) AND (pred2)
F = F(pred1) * F(pred2)
(7) NOT pred
F = 1 - F(pred)
______________________________________
The base relation parameters are changed in the following way for the select:
______________________________________
T <- SELECT R WHERE predicate
n(T) = n(R) * F
a(T) = a(R)
s(T) = s(R) * F
For all attributes, j,
v(T,j) = v(R,j)
u(T,j) = u(R,j) * F
w(T,j) = w(R,j) (size of ONE attribute)
h(T,j) = h(R,j)
l(T,j) = l(R,j)
s(T,j) = s(R,j) * F
p(T,j) = p(R,j) * F
______________________________________
The base relation parameters are changed in the following way for the project:
______________________________________
T <- PROJECT R OVER d(j), . . .
n(T) = n(R)
a(T) = count (d(j), . . .) - the number projected
s(T) = n(T) * (the sum over the attributes projected)
v(T,j) = v(R,j)
u(T,j) = u(R,j)
w(T,j) = w(R,j)
h(T,j) = h(R,j)
l(T,j) = l(R,j)
s(T,j) = s(R,j)
p(T,j) = p(R,j)
______________________________________
For a join operation, the selectivity factors are given by the selectivity, p(i,j), of the joining attributes. For the following join:
______________________________________
JOIN R(1), R(2) WHERE R(1).A = R(2).A
F(1) = p(2,A) and F(2) = p(1,A)
n(1) = n(1) * F(1) n(2) = n(2) * F(2)
a(1) = a(1) a(2) = a(2)
s(1) = s(1) * F(1) s(2) = s(2) * F(2)
For the attributes, j:
v(i,j) = v(i,j)
u(i,j) = u(i,j) * F(i)
w(i,j) = w(i,j)
h(i,j) = h(i,j)
l(i,j) = l(i,j)
______________________________________
For the new relation that results from the join, the parameters are:
______________________________________
if (n(1) > n(2))
n = n(1)
s = s(1) + s(2) + (s(2) * ((n(1) - n(2))/n(2))
else
n = n(2)
s = s(1) + s(2) + (s(1) * ((n(2) - n(1))/n(1))
a = a(1) + a(2)
______________________________________
After local processing is accounted for, GENERAL (RESPONSE) or GENERAL (TOTAL) determines the best way to execute the remaining join operations. The transmission cost of the data is assumed to be the same between any two computer sites and to be a linear function of the size of the data transmitted. To reduce the transmission cost, the amount of data transferred must be reduced. Semi-join operations are used to reduce the amount of data. A semi-join is used to make a relation, which is one of the operands of a join, smaller by deleting tuples that cannot play a role in the join. For example, suppose we want to join relations R and S that are located at two different sites. Two ways to do the join are to ship a copy of R to the site of S or to ship a copy of S to the site of R. Suppose that the transmission cost, C, is given by: C=Co+X where X is the amount of data transmitted, and Co is the cost of initiating a message. If R and S are of sizes r and s, the cost of the transmission is (where min is a minimum function): C=Co+min (r,s). Another way to do the join is to project R over the joining attribute and ship the projection to the site of S. At the site of S, we do a natural join of the projection of R and S. The result of the join is shipped back to the site of R where the final join is computed. If the projections of R and S over the joining attributes have sizes r' and s', and the natural join of the projections of R and S with S and R have sizes r" and s", then the cost of the transmission is: C=2Co+min (r'+s", s'+r"). This cost may be less than C=Co+min (r,s). The best way to perform a join depends on where the result of the join is required. Suppose that the result is required at a third site, other than the site of R or the site of S. Then, in this example, there are five ways to do the join to be considered: 1. The relations R and S are transmitted to the result site, where the join is computed. 2. The relation R is projected over the joining attributes, and the joining attributes are transmitted to the site of S. Relation S is reduced in size by semi-joining S with the joining attributes of R. The relation R and the reduced relation S are transmitted to the result site, where the join is computed. 3. The relation S is projected over the joining attributes, and the joining attributes are transmitted to the site of R. Relation R is reduced in size by semi-joining R with the joining attributes of S. The relation S and the reduced relation R are transmitted to the result site, where the join is computed. 4. The relation R is projected over the joining attributes, and the joining attributes are transmitted to the site of S. Relation S is reduced in size. The reduced relation S is projected over the joining attributes, and the joining attributes are transmitted to the site of R. Relation R is reduced in size. The reduced relations S and R are transmitted to the result site where the join is computed. 5. The relation S is projected over the joining attributes, and the joining attributes are transmitted to the site of R. Relation R is reduced in size. The reduced relation R is projected over the joining attributes, and the joining attributes are transmitted to the site of S. Relation S is reduced in size. The reduced relations R and S are transmitted to the result site where the join is computed. The data transmissions used for reducing a relation and the transmission of the reduced relation to the result computer site form a schedule for this relation. For example a schedule for relation R(1) follows: ##STR5## Attribute d(1,2) of relation R(1) is sent to the site of relation R(2), where a semi-join is performed on relation R(2). The cost of transmitting d(1,2) is c(1). The reduced attribute d(2,2) is sent back to the site of relation R(1) at cost c(2) in parallel with d(2,1) at cost c(3). R(1) is joined with d(2,2) and d(2,1) and the final reduced relation, R(1) is sent to the result node at cost c(4). The selectivity of a joining attribute is used to estimate the change in size of a relation or attribute after a join or semi-join. If relation R(i) has a semi-join with attribute d(k,l) with selectivity p(k,l) on attribute d(i,j), then the size, s(i), of R(i) is changed by: s(i)=s(i)*p(k,l) The size, s(i,j) and selectivity, p(i,j), of each joining attribute, j, are changed by:
______________________________________
s(i,j) = s(i,j) * p(k,l)
and
p(i,j) = p(i,j) * p(k,l).
______________________________________
The costs are then computed based on the new size. The incoming selectivity of a schedule for a relation is the product of the selectivities of all the attributes in the schedule excluding the attributes of the relation. If there is more than one occurrence of an attribute in a schedule, it can contribute only one instance of its selectivity. The response time of a schedule is the time elapsed between the start of the first transmission and the time at which the relation arrives at the result site. The total time of a schedule is the sum of the costs of all transmissions required in the schedule. For the above example, the response time, R, and the total time, T, are given by: R=max(c(1)+c(2)+c(4), c(3)+c(4)) and T=c(1)+c(2)+c(3)+c(4). The output of GENERAL (RESPONSE) or GENERAL (TOTAL) is a distribution strategy for a query. This strategy consists of the schedules for all relations in the query that do not reside at the result node. The final step in MAP is to translate the schedules to a canonical query representation that includes all local processing, move requests, and final processing at the result site. This query representation is the final execution strategy that is split up and distributed to the sites involved in the execution. The following section contains a detailed description of the MAP process. See FIG. 8. MAP Process 1. Build execution strategies For each cluster tree 72 in the list of trees input by Translation and Integrity Control 70, (each tree corresponds to a GORDAS query or update) build an execution strategy, using BUILD REQUESTS process. 2. Add print commands Add commands to print the result at the result application processor (AP). For queries, keep track of where the final result is built (the result DP), and the temporary relation containing the result. Add move commands to the strategies that move the temporary relations to the result AP and print commands to print the temporary relation at the result AP. BUILD REQUESTS Process 1. Build an execution strategy for a tree starting at the root node 74. Decision block 76 checks whether or not the current mode is a leaf. a. Root is a leaf. If the root of the cluster tree is a leaf, build a strategy using OPTIMIZE CLUSTER Process. b. Root is not a leaf. Decision block 80 checks whether or not all child nodes are processed. If so, then block 78 makes the parent mode the current mode and build a strategy using OPTIMIZE CLUSTER process. If not, then block 82 makes the next child node the root of subtree and apply BUILD REQUESTS process. Decision block 94 checks if there are more nodes. If so, then block 92 makes the next node the current node and apply BUILD REQUESTS. If there are no more nodes, pass the execution strategy 96 to the distributed execution monitor 98. Keep track of all final relations and result DPs. If the root cluster represents a set operation, choose a DP that has the most operands, and add a set command to the execution strategies. If the root cluster represents an update, add move commands of the final relations to all sites containing the relation to be updated. Add an update command for each site. OPTIMIZE CLUSTER Process 1. Materialization. In materialization planning 84, choose a site for each base relation in the variable list of the cluster and a result DP using MATERIALIZATION process. 2. Add local processing commands. In local process planning 86, generate the commands in the execution strategy for all initial local processing. This includes SELECT using any condition, PROJECT over only the attributes in the final result or needed to do JOINs, and JOIN any relations at the same site that can be joined. 3. Perform access planning. The first step in non local process planning 88 and build request 90 in access planning. a. GENERAL (RESPONSE) or GENERAL (TOTAL). If optimization method required is GENERAL (RESPONSE) process or GENERAL (TOTAL) process, calculate the sizes and selectivities of the relations resulting after local processing, and apply GENERAL (RESPONSE) or GENERAL (TOTAL) process. Build the commands in the execution strategy for each schedule generated. Do not add redundant commands. b. IFS. No optimization is performed. 4. Add commands for final processing at the result DP. The second step in non local process planning 88 and build requests 90 is the commands for processing at the result DP. a. GENERAL (RESPONSE) or GENERAL (TOTAL). For GENERAL (RESPONSE) or GENERAL (TOTAL) process, add a move to the result DP of each temporary relation in the execution strategy that corresponds to the final relation built in the schedule. Each temporary relation corresponds to one or more of the original base relations, that resulted after local processing. Build the commands to generate the final result at the result DP. b. IFS. For IFS process, add a move to the result DP of each temporary relation in the execution strategy that corresponds to the relation built after the initial local processing. Each temporary relation corresponds to one or more of the original base relations. Build the commands to generate the final result at the result DP. MATERIALIZATION Process 1. Choose a materialization for the base relations. While a site has not been chosen for each base relation variable, choose the site that has the most number of relations. Associate this site with the relations that reside at it. 2. Choose a result data processor. The result DP is the site chosen the most often. If the result AP site has a DP and there is a tie involving this DP, choose this DP. GENERAL (RESPONSE) or GENERAL (TOTAL) Process A query is "simple" if after initial local processing, the relations contain only one joining attribute, PARALLEL process computes minimum response time schedules for simple queries, and SERIAL process computes minimum total time schedules for simple queries. In a "general" query, the relations can contain more than one joining attribute; therefore, such a relation can be reduced in size by semi-joins on different joining attributes. GENERAL (RESPONSE) process computes minimum response time schedules for general queries using RESPONSE process, and GENERAL (TOTAL) computes minimum total time schedules for general queries using TOTAL process. 1. Generate candidate relation schedules. Isolate each of the joining attributes, and consider each to define a simple query with an undefined result node. a. GENERAL (RESPONSE). To minimize response time, apply PARALLEL process to each simple query. Save all candidate schedules for integration in step 2 below. b. GENERAL (TOTAL). To minimize total time, apply SERIAL process to each simple query. This results in one schedule per simple query. From these schedules, the candidate schedules for each joining attribute are extracted. For attribute d(i,j), its candidate schedule is identical to the schedule produced by SERIAL, applied to the simple query in which d(i,j) occurs, up to the transmission of d(i,j). All transmissions after d(i,j) are deleted. 2. Integrate the candidate schedules. For each relation R(i), that is not at the result node, the candidate schedules are integrated to form a schedule for R(i). To minimize response time, apply RESPONSE process. To minimize total time, apply TOTAL process. PARALLEL PROCESS 1. Order relations. Order relations R(i) such that s(1)<=s(2)<= . . . s(n). R(i) consists of one attribute. 2. Select best schedule. Consider each relation R(i) in ascending order of size. For each relation R(j), j<i, construct a schedule to R(i) that consists of the parallel transmission of the schedule of relation R(j) and all schedules of relations R(k), k<j. Select the schedule with minimum response time. SERIAL PROCESS 1. Order relations. Order relations R(i) such that s(1)<=s(2)<= . . . s(n). R(i) consists of one attribute. 2. Select best schedule. If no relations are at the result node, then select the strategy: R(1).fwdarw.R(2).fwdarw. . . . R(n).fwdarw.result node. If R(r) is at the result node, then there are two strategies. Select the one with the minimum response time: R(1).fwdarw.R(2).fwdarw. . . . R(r).fwdarw. . . . R(n).fwdarw.R(r) and R(1).fwdarw.R(2).fwdarw. . . . R(r-1).fwdarw.R(r+1).fwdarw. . . . R(n).fwdarw.R(r). RESPONSE PROCESS 1. Order candidate schedules. For each relation R(i), order the candidate schedules on the joining attributes in R(i), d(i,j), in ascending order of arrival time. 2. Integrate schedules. For each candidate schedule for R(i), CSCH(1), in ascending order, construct an integrated schedule that consists of the parallel transmission CSCH(1) and all CSCH(k) where k<1. Select the integrated schedule with minimum response time. TOTAL PROCESS 1. Generate additional candidate schedules. For each relation R(i) and each candidate schedule CSCH(1), if CSCH(1) contains a transmission of a joining attribute of R(i), d(i,j), then add an additional candidate schedule for R(i) that is the same as CSCH(1) except that the transmission of d(i,j) is deleted. 2. Select the best candidate schedule. For each relation R(i) and for each joining attribute d(i,j), select the candidate schedule, BEST(i,j), that minimizes total time for transmitting R(i) if only the joining attributes are considered which can be joined with d(i,j). 3. Order candidate schedules. For each relation R(i), order the schedules BEST(i,j) on joining attributes d(i,j) of R(i), in ascending order of arrival time. 4. Integrate schedules. For each BEST(i,j) in ascending order, construct an integrated schedule for R(i) that consists of the parallel transmission of BEST(i,j) and all BEST(i,k) where k<j. Select the integrated schedule with minimum total time. Materialization and Access Planning Program Modules The following sections present the details of the higher level program modules (procedures) of the preferred embodiment which perform the materialization and access planning function. These higher level program modules are: MAP, BUILD.sub.-- REQUESTS, OPTIMIZE.sub.-- CLUSTER, LOCAL.sub.-- PROCESSING, GENERAL, and BUILD.sub.-- REQUESTS.sub.-- TABLE. These sections describe the function, inputs and outputs of each program module as well as presenting the modules procedure in a program design notation known as "P-Notation", which is part of the WELLMADE methodology, which is described in "The WELLMADE System Design Methodology", Honeywell Corporate Computer Science Center, October 1979, which is incorporated herein by reference. MAP Procedure Functional Description Determine an optimal execution strategy for a given query, update, or transaction on a distributed database management system. The execution strategy is determined by trying to minimize communication costs.
______________________________________
Static Design
procedure map (
in: first --cluster: cluster --ptr,
result --ap: node --id,
opt --Type: map --optimization --choices;
out: block --table: block --ptr,
error: error --code
);
Permanent Data
None
Local Data
cl --prt: pointer to cluster --list;
req --table: request array;
final --temp: relation --name;
result --dp: node --id;
______________________________________
Input Specifications Background A conjunctive query is one that can be expressed using only select, project, and join operators with AND in the WHERE clause. The retrieve statement in QUEL and the SELECT-FROM-WHERE statement in SEQUEL yield conjunctive queries whenever the WHERE clauses are the logical and of terms that equate two components of tuples of equate a component to a constant. It is possible to find the optimal query equivalent to a conjunctive query. Assuming that selections and projections should be done as early as possible, then only the order in which the joins are done need be considered. Processes that optimize this type of query take advantage of the fact that the joins commute with each other. The query does not have to be represented by a tree, but can be represented by a linear structure: a list of variables in the query. a condition to be applied to the variables. a list of attributes to be projected over for the result. For a relationally complete language, the only operators needed are selection, projection, join, union, and difference. An intersection can be implemented using a join. In order to allow the logical OR in a condition, the UNION operator must be included. In system R*, the operators selection, projection, join, and union are handled. A compacted tree is used to represent a query. A tree must be used since the union and join operators do not commute over each other. To include difference is even harder, since the difference operator does not commute among other difference operators in general. A compacted tree contains clusters of operators. Each cluster is a non-procedural (unordered) grouping of the same binary operator. In DDS, the input to MAP consists of a tree compacted into a cluster tree or cluster. Each cluster can then be optimized independently. For GORDAS queries with only logical AND in the condition, one cluster is created. The output of Translation and Integrity Control (TIC) procedure is a compacted tree as described. The set operators supported will be UNION, INTERSECTION, and DIFFERENCE. The update operators will be DELETE, MODIFY, and INSERT. The leaf nodes in a compacted tree will represent the SELECT, PROJECT, JOIN portion of the total query. Each leaf has a variable list of the base relations involved, a condition on the bases, and a list of attributes to project for the result. The input to this procedure consists of three things. The first is a list of cluster roots. Each root is a root of a cluster tree that represents a GORDAS query or update. More than one root is needed for a GORDAS transaction. Each root is optimized independently in this version. A cluster tree represents a compacted tree of a query or update. The second input is a node identifier of the application processor. The third is the type of optimization: IFS, GENERAL (RESPONSE) or GENERAL (TOTAL). Output Specification The output consists of a block table that is used by the DEM to send the relational algebra commands to all LEMs chosen. Procedural Design Procedural Description For each root in the list input, build the relational algebra requests for the query or update. Add a move of the final temporary built in a query to the result AP and print it there. To build the output of MAP for DEM, get the dataflow relationships among the requests and build the integer command list and the block table.
______________________________________
P-Notation
error := FALSE;
initialize --req --table (out: req --table, error);
if (not error)
cl --ptr := first --cluster;
do (cl --ptr = NULL) and (not error)
build --requests (in: cl --ptr->cluster --root, result --ap,
opt --type, req --table; out: req --table, final --temp,
result --dp, error)
if (final --temp = "") and (not error)
add --move (in: req --table, final --temp, result --dp,
result --ap; out: req --table, error);
if (not error)
add --print (in: req --table, final --temp,
result --ap; out: req --table, error);
(error)
skip;
fi;
(error)
skip;
fi;
cl --ptr := cl --ptr->next --root;
od;
if (not error)
dataflow (in: req --table; out: req -- table, error);
if (not error)
build --command --list (in: req --table;
out: block --ptr, error);
(error)
skip;
fi;
(error)
skip;
fi;
free clusters;
______________________________________
BUILD.sub.-- REQUESTS Procedure Functional Description Build the requests in the requests table for a given query using the optimization process input.
______________________________________
Static Design
procedure build --requests
in: cl --root: pointer to cluster,
result --ap: node --id,
opt --type: map --optimization --choices,
req --table: request array;
out: req --table: request array,
final --temp: relation --name,
result --dp: node --id,
error: error --code;
);
Permanent Data
None
Local Data
temp --info --list: temp --dp;
var: cluster;
temp: relation --name;
dp: node --id;
______________________________________
Input Specifications The input to this procedure consists of three things. The first is the root of a compacted tree for the given query or update. The second is the node identifier of the application processor. The third is the type of optimization. Output Specification The output consists of the requests for this cluster tree in the requests table in the form of the relational algebra. The final temporary that is created for a query is also output. An error code of TRUE or FALSE is set. Procedural Design Procedural Description Process the cluster tree input. If the root input is a leaf node, optimize the cluster using the process indicated. If the root is a non-leaf, this procedure is called recursively for each child. If the node is a set operation, the operation is created in the request table to be performed using each child at the data processor chosen the most. If the node is an update operation, the update is performed at each node containing the base relation to be updated.
__________________________________________________________________________
P-Notation
error = FALSE;
if (cl --root = NULL)
if (tag (cl --leaf) = BASE) { leaf }
optimize --cluster (in: cl --root, result --ap, opt --type, req --table;
out: final --temp, req --table, result --dp, error);
.vertline. (otherwise) {UNION, INTERSECT, DIFFERENCE, DELETE, INSERT,
MODIFY}
var = cl --root->cl --var;
temp --info --list = NULL;
do (var = NULL) and (not error)
build --requests (in: var, result --ap, opt --type, req --table;
out: req --table, temp, dp, error);
if (not error)
add --to --list (in: temp --info --list, temp, dp;
out: temp --info --list, error);
.vertline. (error)
skip;
fi;
var := var->next --var;
od;
if (tag (cl --root) = UNION) or (tag (cl --root) = INTERSECT) or
(tag (cl --root) = DIFFERENCE)
( and (not error))
pick --dp (in: temp --info --list, result --ap; out: dp, error);
if (not error)
add --set --ops (in: temp --info --list, dp, req --table,
tag (cl --root); out: req --table, error);
.vertline. (error)
skip;
fi;
.vertline. ((tag (cl --root) = DELETE) or (tag (cl --root) = INSERT) or
(tag (cl --root) = MODIFY)) and (not error)
add --update --ops (in: temp --info --list, req --table, tag (cl
--root),
cl --root->update --cluster; out: req --table, error);
.vertline. (otherwise)
skip;
fi;
fi;
fi
__________________________________________________________________________
OPTIMIZE.sub.-- CLUSTER Procedure Functional Description Build the relational algebra requests for a leaf cluster in a compacted tree. The materialization and building of the query tree are accomplished.
______________________________________
Static Design
procedure optimize --cluster (
in: cl --leaf: cluster,
result --ap: node --id,
opt --type: map --optimization --choices,
req --table: request array;
out: final --temp: relation --name,
req --table: request array,
result --dp: node --id;
error: error --code
);
Permanent Data
Temp --rel --table: abstract; { table in data dictionary }
Local Data
rel --info: rel --info --ptr;
______________________________________
Input Specifications The input to this procedure consists of four things. The first is the cluster that we are currently processing. It must be of type BASE, that is, a leaf. The second is the result application processor. The third is the optimization type. The fourth is the requests table to which requests for this cluster are appended. Output Specification The output consists of the name of the temporary that is created as a result of this cluster. The output also consists of the requests added to the requests table to realize the command represented in the cluster, and the result data processor where the final temporary is created. Procedural Design Procedural Description Determine a materialization and result DP for the cluster. The sites chosen in the materialization are stored in the cluster base variable structures. Build the relation information table used in optimization process GENERAL and in building the relational algebra requests. The local processing is taken into account when this is built. If the optimization process is GENERAL, build the schedules for the relations. Build the requests table from the relation information table local processing information and the schedules built by GENERAL.
______________________________________
P-Notation
error := FALSE;
{ set the base sites in the cluster }
materialization (in: cl --lear->bases, result --ap;
out: cl --lear->bases, result --dp, error);
if (not error)
local --processing (in: cl --leaf; out:rel --info, error);
if (not error)
if (opt --type = GENERAL RESPONSE) or (opt --type =
General --TOTAL)
general (in: cl --leaf, result --dp, result --ap, rel --info;
out: rel --info, error);
.vertline. (otherwise)
skip;
fi;
{ don't do anything for IFS }
if (not error)
build --requests --table (in: rel --info, cl --leaf, req --table,
result --dp;
out: final --temp, req --table, error);
.vertline. (error)
skip;
fi;
free rel --info;
.vertline. (error)
skip;
fi;
.vertline. (error)
skip;
fi;
______________________________________
LOCAL.sub.-- PROCESSING Procedure Functional Description Initialize the relation information list. Include the processing that can be done at one site, i.e., the local processing.
______________________________________
Static Design
procedure local --processing (
in: cl --leaf: pointer to cluster;
out: rel --info: rel --info --ptr,
error: error --code;
);
Permanent Data
None
Local Data
None
______________________________________
Input Specifications The input to this procedure consists of one thing. It is the cluster that is currently being processed. It must be of type BASE, i.e., a leaf. Output Specification The output is the list of relation information records. The size and selectivity fields have been set based on the condition. Base relations that reside at the same node have been joined if they have a joining attribute in common. Procedural Design Procedural Description Create a relation information record for each base relation variable in the cluster. For each variable in the cluster, calculate the selectivity factor using the selection condition. Compute the selectivity factor using the necessary data in the data dictionary/directory. For each base relation, the following data are stored in the data dictionary/directory: For each relation, R(i),
______________________________________
n(i): the number of tuples (cardinality),
a(i): the number of attributes (degree),
s(i): size,
s(i) = n(i) * (sum over all a(i) of w(i,j))
______________________________________
For each attribute, d(i,j), j=1, . . . , a(i) of R(i):
______________________________________
v(i,j):
the number of possible domain values,
u(i,j):
the number of distinct domain values currently held
w(i,j):
the size of ONE data item in attribute domain (bytes)
h(i,j):
the high value of u(i,j)
l(i,j):
the low value of u(i,j)
______________________________________
The following data can be computed using the above data:
______________________________________
s(i,j) = size of all attributes j,
= n(i) * w(i,j)
p(i,j) = selectivity of attribute j,
= u(i,j)/v(i,j)
______________________________________
For a SELECTION operation, the selectivity factor is determined by the methode described by the Selinger, Astrahan, et al reference mentioned above by the condition in the following way:
______________________________________
(1) column = value
F = 1/u(i,j)
(2) column1 = column2
F = 1/MAX (u (i,column1), u(i,column2))
(3) column > value
F = (h(i,column) - value) / h(i,column) - l(i,column))
= 0 (if F < 0)
(4) column < value
F = (value - l(i,column) / (h(i,column) - l(i,column))
= 0 (if F < 0)
(5) (pred1) OR (pred2)
F = F(pred1) + F(pred2) - F(pred1) * F(pred2)
(6) (pred1) AND (pred2)
F = F(pred1) * F(pred2)
(7) NOT pred
F = 1 - F(pred)
______________________________________
The base relation parameters are changed in the following way for the select:
______________________________________
T <- SELECT R WHERE predicate
n(T) = n(R) * F
a(T) = a(R)
s(T) = s(R) * F
v(T,j) = v(R, j)
u(T,j) = u(R,j) * F
w(T,j) = w(R,j) (recall size of ONE attribute)
h(T,j) = h(R,j)
l(T,j) = l(R,j)
s(T,j) = s(R,j) * F
p(T,j) = p(R,j) * F
______________________________________
The selectivity factor is used to reduce the size of relations and attributes after operations are performed on them. The selectivity factor is the expected fraction of tuples that satisfies a predicate. For a PROJECTION operation, each data element in the data dictionary/directory is changed in the following way:
______________________________________
T <- PROJECT R over d(j), . . .
n(T) = n(R)
a(T) = count (d(j)) - the number projected
s(T) = n(T) * (the sum over the attributes projected)
v(T,j) = v(R, j)
u(T,j) = u(R,j)
w(T,j) = w(R,j)
h(T,j) = h(R,j)
l(T,j) = l(R,j)
s(T,j) = s(R,j)
p(T,j) = p(R,j)
______________________________________
Set the size and cardinality of the base relation using the selectivity factor. Add the joining attributes to each relation, setting the size and selectivity of each joining attribute. Combine the relation information records of base relations that are located at the same node and can be joined. Multiply the size and cardinality of the relations by the selectivity of the relation being joined to. Similarly, multiply the size and selectivity of each joining attribute by this same selectivity. For a JOIN operation, the selectivity factors are given by the selectivity, p(i,j), of the joining attributes. For the following join:
______________________________________
JOIN R(1), R(2) where R(1).A = R(2).A
F(1) = P(2,A) and F(2) = P(1,A)
n(1) = n(1) * F(1) n(2) = n(2) * F(2)
a(1) = a(1) a(2) = a(2)
s(1) = s(1) * F(1) s(2) = s(2) * F(2)
For the attributes:
v(i,A) = v(i,A)
u(i,A) = u(i,A) * F(i)
w(i,A) = w(i,A)
h(i,A) = h(i,A)
l(i,A) = l(i,A)
______________________________________
For the new relation that results from the join, the parameters are:
______________________________________
if (n(1) > n(2))
n = n(1)
s = s(1) + s(2) + (s(2) * ((n(1) - n(2))/n(2))
else
n = n(2)
s = s(1) + s(2) + (s(1) * ((n(2) - n(1))/n(1))
a = a(1) + a(2)
P-Notation
error := FALSE;
if tag(cl --leaf) = BASE
error := TRUE;
tag(cl --leaf) = BASE
rel --info --create
(in: cl --leaf.fwdarw.base --cluster;
.sup. out: re --info, error);
if (not error)
join --bases --at --same --node
(in: rel --info;
.sup. out: rel --info, error);
(error)
skip;
fi;
fi;
______________________________________
GENERAL Procedure Functional Description Determine the optimal execution strategy for a query represented by a leaf cluster. This process will minimize either response time or total time using the method described by the Apers, Hevner and Yao reference mentioned above.
______________________________________
Static Design
procedure general (
in: cl --leaf: cluster --ptr,
opt --type: map --optimization --choices,
result --dp: node --id,
result --ap: node --id,
rel --info: rel --info --ptr;
out: rel --info: rel --info --ptr,
error: error --code
);
Permanent Data
Total --number --of --joining --attrs: integer;
Local Data
ja: integer;
temp --relp: rel --info --ptr;
______________________________________
Input Specifications The input to this procedure consists of five things. The first is the cluster to be optimized. It must be of type BASE, i.e., a leaf. The second is the map optimization type. The third is the result data processor determined by the materialization process. The fourth is the result application processor. The fifth is the relation information list after all local processing has been determined. Output Specification The output consists of the schedule determined for each relation. An error code of TRUE or FALSE is returned. Procedural Design Procedural Description The best schedule based on response or total time for each joining attribute in the relation information list is determined. The optimal schedule for each relation is built using the attribute schedules.
______________________________________
P-Notation
error := FALSE;
ja := 0;
do (ja < Total --number --of --joining --attrs )
and (not error)
if (opt --type = GENERAL --RESPONSE)
parallel (in: rel --info, ja; out: rel --info, error);
(opt --type = GENERAL --TOTAL)
serial (in: rel --info, result --dp, ja;
.sup. out: rel --info, error);
fi;
ja := ja + 1;
od;
if (not error)
temp --relp := rel --info;
do (temp --relp = NULL) and (not error)
if (opt --type = GENERAL --RESPONSE)
response (in: rel --info, temp --relp, result --dp;
.sup. out: temp --relp, error);
(opt --type = GENERAL --TOTAL)
total (in: rel --info, temp --relp, result --dp;
.sup. out: temp --relp, error);
fi;
temp --relp := temp --relp.fwdarw.next --rel;
od;
(error)
skip;
fi;
______________________________________
BUILD.sub.-- REQUESTS.sub.-- TABLE Procedure Functional Description Build the relational algebra requests for a cluster of type BASE using the relation information list and the schedules built during optimization.
______________________________________
Static Design
procedure build --requests --table (
in: rel --info: rel --info --ptr,
cl --base: pointer to base --cluster,
req --table: abstract,
result --dp: node --id,
opt --type: map --optimization --choices;
out: final --temp: relation --name,
req --table: abstract,
error: error --code;
);
Permanent Data
None
Local Data
None
______________________________________
Input Specifications The input to this procedure consists of five things. The first is a pointer to the relation information list after all local processing and schedules have been determined. The second is the base cluster that is currently being processed. The third is the requests table to which the requests should be appended. The fourth is the result data processor. The fifth is the type of query optimization process. Output Specification The output is the final temporary relation that is created. The updated requests table is also output. Procedural Design Procedural Description Create the requests to do the local processing. For GENERAL, build the requests for the schedules. For IFS, don't do anything. Add moves to the result data processor for each temp created in the relation information list. Get the final answer at the result data processor.
__________________________________________________________________________
P-Notation
error := FALSE;
generate --local --requests
(in: rel --info, cl --base, req --table;
.sup. out: req --table, error);
if (not error)
if (opt --type = GENERAL --RESPONSE) or (GENERAL --TOTAL)
build --rel --ops --for --schedules
(in: rel --info, req --table;
.sup. out: req --table, error);
(opt type = IFS)
trelp := rel --info;
do (trelp = NULL)
trelp.fwdarw.final --temp --rel := trelp.fwdarw.temp --rel;
trelp := trelp.fwdarw.next --rel;
od;
fi;
if (not error)
get --final --result
(in: rel --info, cl --base.fwdarw.projects, result --dp,
.sup. req --table; out:req --table, final --temp, error);
(error)
skip;
fi;
(error)
skip;
fi;
__________________________________________________________________________
Query Processing Example The method by which a user query is processed by the above program modules is shown in the following example user query. The numbers in parentheses after the program module names indicate the level of recursion in the calling of the program modules. User query: get <<Firstname,Salary>> of EMPLOYEE where Lastname="Smith" or Name of department="Accounting":
__________________________________________________________________________
Program Module Action
__________________________________________________________________________
TIC parse and translate query producing one
cluster tree, C1
##STR6##
result --ap = 4 (user site)
opt --type = GENERAL RESPONSE
MAP requests --table is initialized.
C1 is passed to BUILD --REQUESTS.
BUILD --REQUESTS(1) first child is passed to BUILD --REQUESTS.
##STR7##
BUILD -- REQUESTS(2) first child is passed to OPTIMIZE --CLUSTER
OPTIMIZE --CLUSTER Materialization is chosen for EMPLOYEE at site 1,
result --dp = site 1.
first child is passed to LOCAL --PROCESSING.
LOCAL --PROCESSING Local processing is determined:
All operations can be performed locally.
Return to OPTIMIZE --CLUSTER.
OPTIMIZE --CLUSTER BUILD --REQUESTS --TABLE is called.
BUILD --REQUESTS --TABLE
Requests are appended to requests --table.
Return to OPTIMIZE --CLUSTER.
site requests-table:
1
##STR8##
1
##STR9##
OPTIMIZE --CLUSTER Return to BUILD --REQUESTS(2).
BUILD --REQUESTS(2) Return to BUILD --REQUESTS(1).
BUILD --REQUESTS(1) Keep track of T2 at site 1 in temp-info-list.
second-child is passed to BUILD --REQUESTS.
##STR10##
BUILD --REQUESTS(3) second child is passed to OPTIMIZE --CLUSTER.
OPTIMIZE --CLUSTER Materialization is chosen - EMPLOYEE at site 1,
DEPARTMENT at site 2; result-dp = site 2.
LOCAL --PROCESSING is called.
LOCAL --PROCESSING Local processing is determined.
Select all EMPLOYEES at site 1 and
select `Accounting` DEPARTMENTS at site 2,
Project needed attributes.
Affect of select and project operations on
size and selectivity is computed.
Return to OPTIMIZE --CLUSTER.
OPTIMIZE --CLUSTER GENERAL is called.
GENERAL PARALLEL and RESPONSE are called.
They use size and selectivity values
estimated in local processing to produce
two schedules (one for each relation).
Assume that the schedules are to use a semi-join to
reduce EMPLOYEE relation and to use
DEPARTMENT relation as is.
Return to OPTIMIZE --CLUSTER.
__________________________________________________________________________
Schedules
______________________________________
DEPARTMENT: Dept. D#
EMPLOYEE: Dept. D# EMPLOYEE
______________________________________
Program
Module Action
______________________________________
OPTIMIZE --CLUSTER
BUILD --REQUESTS --
TABLE is called.
BUILD --REQUESTS --TABLE
Requests are appended to
requests --table.
Return to OPTIMIZE --
CLUSTER.
______________________________________
Site Requests-table:
______________________________________
1 T1 <- Select EMPLOYEE where Lastname
= `Smith`
1 T2 <- Project T1 over Firstname, Salary
2 T3 <- Select DEPARTMENT where Dname
= `Accounting`
2 T4 <- Project T3 over D#
2 Move T4 to site 1
1 T5 <- Select EMPLOYEE
1 T6 <- Join T4, T5 over D#
1 T7 <- Project T6 over Firstname,
Salary
1 Move T7 to site 2
OPTIMIZE --
Return to BUILD --REQUESTS(3)
CLUSTER
BUILD -- Return to BUILD --REQUESTS(1)
REQUESTS(3)
BUILD -- Keep track of T7 at site 2
REQUESTS(1)
in temp-info-list.
PICK-DP chooses site 1 for
union operation.
ADD-SET-OPS adds union to
requests --table.
Return to: MAP.
(new ones added)
ADD --MOVE and ADD --PRINT are called.
1 Move T8 to site 4
4 print T8
DATAFLOW is called to generate
dataflow dependencies.
BUILD --COMMAND --LIST formats
the message information needed by DEM.
Final output to DEM based on
requests-table (logically equivalent).
______________________________________
Example of Optimization for Response Time or Total Time In the following example, which illustrates optimization for response time or total time, the relational database is used to keep track of PARTS, SUPPLIERS, PARTS ON-ORDER by SUPPLIERS, and JOBS requiring PARTS from SUPPLIERS. It consists of the following relations:
______________________________________
PARTS (P#, PNAME)
SUPPLIERS (S#, SNAME)
ON-ORDER (S#, P#, QTY)
S-P-J (S#, P#, J#)
______________________________________
Suppose that PARTS and SUPPLIERS are located at site 1, ON-ORDER at site 2, S-P-J at site 3, and the result is required at site 4. The following cluster tree, T, that contains one leaf node, represents the query: List the P#, PNAME, S# for all parts that are currently on order from suppliers who supply that part to job 50. ##STR11## The local processing consists of performing the selection on S-P-J where S-P-J.J#=50, and the projections of the required attributes in all relations. The resulting size and selectivity parameters of the new relations are:
______________________________________
Relation Size d(i,1)=P#
d(i,2)=S#
R(i): S(i) s(i,1) p(i,1)
s(i,2) p(i,2)
______________________________________
R(1):
ON-ORDER 10000 1000 0.4 500 0.3
R(2):
S-P-J 15000 1000 0.5 1500 0.8
R(3):
PARTS 20000 2000 0.7 -- --
______________________________________
The Initial Feasible Solution (IFS) is to transmit all relations after local processing to the result node, where the result is built. Assuming the cost function is C(X)=20+X where X is the size of the data transmitted in bytes, the schedules for this solution are: ##STR12## Response time=20020 Total time=45060 sing, and final processing at the result node for this set of schedules is:
______________________________________
site request
______________________________________
1 T1 <- PROJECT PARTS OVER PARTS.P#,
PARTS.PNAME
1 MOVE T1 TO 4
2 T2 <- PROJECT ON-ORDER OVER ON-ORDER.S#,
ON-ORDER.P#
2 MOVE T2 TO 4
3 T3 <- SELECT S-P-J WHERE S-P-J.J# = 50
3 T4 <- PROJECT T3 OVER T3.S#, T3.P#
3 MOVE T4 TO 4
4 T5 <- JOIN T2, T4 WHERE T2.S# = T4.S# AND
T2.P# = T4.P#
4 T6 <- JOIN T5, T1 WHERE T5.P# = T1.P#
4 RESULT <- PROJECT T6 OVER T6.P#,
T6.PNAME, T6.S#
______________________________________
To find the minimum response time schedule, GENERAL (RESPONSE) is used. After the initial local processing, two simple queries are formed, one having P# and the other having S# as common joining attributes. PARALLEL process is applied to each simple query. For P#, the resulting candidate schedules generated are: ##STR13## For S#, the resulting candidate schedules generated are: ##STR14## Now schedules for R(1), R(2), and R(3) are constructed. For R(1), the schedules of attributes that can be applied to R(1) are ordered on their arrival time at R(1):
______________________________________
Attribute Arrival Time
______________________________________
d(2,2) 990
d(2,1) 1020
d(3,1) 1440
______________________________________
For each of these three attributes, an integrated schedule for R(1) is constructed that consists of the parallel transmission of all attributes having an arrival time less than or equal to its own arrival time: ##STR15## From these 3 schedules and the IFS for R(1), the schedule with the minimum response time of 4260 is chosen. For R(2), the arrival times for applicable attributes are:
______________________________________
Attribute Arrival Time
______________________________________
d(1,2) 520
d(1,1) 1020
d(3,1) 1440
______________________________________
The following candidate schedules are built: ##STR16## The schedule with response time 2720 is chosen. For R(3), the arrival times for applicable attributes are:
______________________________________
Attribute Arrival Time
______________________________________
d(1,1) 1020
d(2,1) 1020
______________________________________
The following candidate schedules are built: ##STR17## The schedule with response time 5040 is chosen. For the example query, the response time is 5040 and the total time is 18630. To find the minimum total time schedule, GENERAL (TOTAL) is used. The SERIAL process is applied to each simple query. For P#, the resulting candidate schedules are: ##STR18## For S#, ##STR19## Now schedules for R(1), R(2), and R(3) are constructed. For R(1), two schedules are added for P#: ##STR20## Now the best candidate schedule, BEST(1,1) is chosen. The first schedule, d(1,1) is meaningless. The candidates are: ##STR21## BEST (1,1)=d(3,1) with a total time of 5380. For S#, one schedule is added: ##STR22## Now, BEST(1,2) is chosen. The candidates are: ##STR23## BEST(2,1)=d(1,2) with a total time of 9010. Now BEST(1,1) and BEST(1,2) are ordered on their total time, and the following two schedules are generated: ##STR24## The first is chosen as the final schedule for R(1). For R(2), one schedule is added for P#: ##STR25## The candidates for BEST(2,1) are: ##STR26## BEST(2,1)=with a total time of 6060. For S#, no schedules are added. The only candidate for BEST(2,2) is: ##STR27## BEST(2,1) and BEST(2,2) are ordered on their total time, and the following two schedules are generated: ##STR28## The second is chosen as the final schedule for R(2). For R(3), no schedules are added for P#. The candidates for BEST(3,1) are: ##STR29## BEST(3,1)=d(2,1) with a total time of 5460. Since this is the only joining attribute, the second is the final schedule for R(3). The total time is 14480, and the response time is 5460. Both the total time and response time for the schedules generated from GENERAL (RESPONSE) and GENERAL (TOTAL) are much better than the IFS. From the above discussion, it can be appreciated that the optimization method of the present invention partitions optimization into global and local optimization. The method provides that global optimization is done by selecting the sites from which data is to be accessed from and transmitted to, and determining where the accessed.ata is to be processed. Once this global optimization is performed and the execution strategy is determined, the Distributed Execution Monitor (DEM) oversees the distribution of the execution commands to the Local Execution Monitors (LEM) at the local sites. The LEM then passes on these execution commands to the Local Operation Module (LOM) which can then perform any desired local optimization using local data base parameters to optimize accessing their individual local databases. Because there may be multiple LEMs involved, this local optimization may be done in parallel at multiple sites, with each site optimizing its own local processing. This may result in starting the execution of the query sooner because the total elapsed optimization time is decreased. Further, a local site can begin execution of its portion of a query once it completes its local optimization processing, thus possibly further speeding up the completion of the query execution. Having the local optimization performed at the local sites also results in eliminating having to distribute local site parameters throughout the network to all sites which might perform optimization and further allows for local optimization to be done using local parameters which may be more current. DDS Performance In this selection, the performance of DDS and the MAP methods of the preferred embodiment are discussed based upon the experimental results. The experimental results are summarized below: (1) The elapsed real time spent in MAP is negligible compared to the time spent in distributed compilation and execution. The time spent in DEM is at least 98% of the response time. The time in DEM increases rapidly as the size of relations, number of relations, and number of attributes per relation accessed increases, due to increases in local execution time, execution time communications costs, and compile time communication costs. Local compile time is relatively constant. (2) In DDS, transactions are compiled and the execution strategies saved for later execution. If the cost of compiling a transaction is negligible compared to the cost of executing the transaction, it may be desirable to use MAP processes that are more dynamic and balance workloads. Both the cost of compiling and the cost of executing a query are complex functions of the size of relations, number of relations, number of attributes per relation accessed, and the distribution of relations. For the same distribution, the cost of executing a query increases much more rapidly with the size of relations accessed than the cost of compiling a query, and therefore the dynamic MAP processes may be desirable for large sizes of relations accessed. Both the costs of compiling and executing a query increases rapidly with the number of relations and/or number of attributes per relation. The effect on communications time is much greater when the AP and DP are on geographically separate computers. (3) The load and capacity of the site processors (i.e., computer) are very important to overall performance. For some queries, it is more efficient to use a DP physically located on a different processor than the same processor as the AP. As the size and/or number of relations accessed increases, the communications costs increase and negate this effect. The configuration of the processors (e.g., the existence of cache memory) influences the performance of DDS by making one DP superior to another. (4) There is a constant overhead associated with each query. Its magnitude depends on the processor configuration and load. One area for performance improvement is DDS tuning. DDS tuning involves further identification of the DDS bottlenecks and improvement of DDS software in an effort to increase the performance of DDS. Software modifications that would result in improvements to DDS performance based on the initial preferred embodiment results are: (1) Modify the cost functions used by MAP process to be the actual communications costs functions measured. Modify MAP so that it doesn't assume that the cost of sending a message from A to B is the same as sending a message from B to A. Use the cost functions in the process of choosing a result DP, so that the MAP materialization process does not assume that it costs the same to send a message of the same size from any DP to any AP. (2) Extend MAP to include local processing costs during both the materialization and access planning phases. Information about the configuration and loads of the DPs should be included. System and database parameter variation experimentation may lead to further performance improvement. The following experiments are suggested: (1) Evaluate the MAP processes in a geographically distributed DBMS. The performance benefits of the sophisticated MAP processes (GENERAL (RESPONSE), and GENERAL (TOTAL)) should be evident in this environment that includes long distance communications and large databases. This environment can provide performance data for a wide range of database parameters. (2) Evaluate the MAP processes in a locally distributed DBMS. This environment provides a third set of system parameters. The need for this type of MAP process is apparent from the initial DDS results, and will probably become much more important and necessary in an LAN environment with high speed communications. Another area of possible improvement includes the extension of the MAP cost function to include local execution costs and to do dynamic load balancing. The MAP process will need information about the local system configurations and loads. Some information, such as, the existence of cache memory, can be used at compile time. The load information is dynamic and a dynamic materialization process will be necessary. Further research into dynamic MAP processes, and MAP processes that perform access planning before materialization may also result in improved performance. The later process provides the ability to do access planning at compile time and materialization at run time. While the invention has been shown and described with reference to the preferred embodiment thereof, it will be understood by those skilled in the art that the above and other changes in form and detail may be made therein without departing from the spirit and scope of the invention.
|
Same subclass
| |||||||||||
