Generalized model for the exploitation of database indexes6253196Abstract A method, apparatus, and article of manufacture for computer-implemented exploitation of database indexes. A statement is executed in a database stored on a data storage device connected to a computer. The database contains data. A model based on pattern matching for a user-defined predicate and selection of an index exploitation rule based on a matched user-defined predicate is provided to be used for exploiting an index to retrieve data from the database. Claims What is claimed is: Description BACKGROUND OF THE INVENTION
CREATE TABLE employee (eid int,
dept int,
name varchar(30),
salary float,
bonus float,
award float, . . . );
CREATE TABLE department (deptID int,
name varchar (30),
budget float,
mgr int, . . . );
CREATE INDEX comp ON employee (salary, bonus, award);
The first CREATE TABLE statement creates a table, named "employee", with several columns. The second CREATE TABLE statement creates a table, named "department", with several columns. The CREATE INDEX statement creates an index, named "comp", for the "employee" table with the index keys "salary", "bonus", and "award". Each index key is called a "dimension" of the index. FIG. 5 is a diagram illustrating index exploitation in a conventional system. The following statement 500 is illustrated in FIG. 5: SELECT*FROM employee WHERE salary=50000 AND bonus<10000; Statement 500 selects records from the "employee" table, which has columns for salary, bonus, and award, which are used in the INDEX "comp". For statement 500, a conventional system is able to generate a range based on the limitations of the WHERE clause. In particular, for statement 500, for the INDEX "comp", the salary value is 50000, and the bonus value ranges from 0 to 10000. The award value ranges from 0 to infinity, as the value was not restricted by the WHERE clause. The range is represented as follows: (50000, 0, 0) 502 to (50000, 10000, .infin.) 504. The following statement 506 is also illustrated in FIG. 5: SELECT*FROM employee e, department d WHERE e.salary>d.budget*0.1; Statement 506 selects records from the "employee" table and the "department" table based on the salary of an employee being greater than 10% of the budget for the employee's department. For statement 506, a range is represented as follows: (d.budget*0.1,0,0) 508 to (.infin., .infin., .infin.) 510. The existing index exploitation technique, available in conventional systems, has several disadvantages. First, table records are sorted based on built-in relations, for example, the ordering may be based on mathematical relations on built-in data types, such as int, real, char, varchar, etc. Additionally, the built-in relations are based on "total ordering", which is a linear model. Conventional systems exploit indexes using only built-in predicates, such as <, >, <=, >=, =etc. Moreover, conventional systems find applicable -predicates for index exploitation based on very simple pattern matching mechanisms. Moreover, the usefulness of conventional indexes typically limited to one-dimensional searching. The following pseudocode represents the steps of existing index exploitation in a conventional system:
For each index I:
For each dimension D of index I;
For each predicate P in the query:
(1) Determine whether either side of predicate P is an
index key.
(2) If one side of predicate P is an index key, determine
whether the other side of predicate P is a "bounded"
expression.
(3) If the other side of predicate P is a "bounded"
expression, use predicate P as the key predicate for
the dimension D of index I.
(4) Next, determine the start/stop key for dimension D
with the "bounded" expression in predicate P.
The above pseudocode for existing index exploitation selects each index "I" and each dimension "D" of the index. A dimension is a key part that was defined for the index with a CREATE INDEX statement. For each index and dimension, and each predicate "P" in a query, it is determined whether either side of the predicate "P" is a simple column that is part of an index (i.e., an index part). A predicate "P" is an expression that compares two values (e.g., A B). If neither side of the predicate "P" is a simple column that is part of an index, then index exploitation is not performed. If either side of the predicate "P" is a simple column that is part of an index, then it is determined whether the other side of the predicate "P" is a "bounded" expression. A "bounded" expression is an expression that does not use the value from the other side of the predicate to compute its result (e.g., d.budget*0.01). If the other side of the predicate "P" is not a "bounded" expression, index exploitation is not performed. If the other side of the predicate "P" is a "bounded" expression, the predicate "P" is used as the key predicate for the dimension "D" of index "I". That is, predicate "P" is used to exploit the index "I" for the identified key part. The start and stop keys that define the range for dimension "D" are determined by the "bounded" expression in predicate "P". Weakness of the Existing System Pattern matching and selection of index exploitation rules are the major components involved in exploiting indexes. In the present invention, both pattern matching and selection of index exploitation rules can be user-defined. A conventional system recognizes only simple patterns, while the present invention recognizes complex patterns. Once the complex pattern has been matched, the present invention determines how to generate search ranges based on the matched predicates. Because the conventional system cannot perform complex pattern matching, the conventional system is not able to determine how to generate search ranges based on the matched complex predicates. The existing index exploitation has several disadvantages. For example, conventional index exploitation systems are not able to exploit indexes on expressions, such as "salary +bonus +award" or "location.area", referenced in the example CREATE INDEX statement below: CREATE INDEX ON employee (salary+bonus+award, location..area); In particular, in conventional index exploitation systems, sorting is based on built-in relations. That is, the above statement indicates that records of the "employee" table are to be sorted based on a value coming from the first expression "salary+bonus+award" and the second expression "location..area". The index manager sorts records, for example, based on an integer value when "salary+bonus+award" results in an integer value. However, the present invention no longer uses column values directly to sort. Instead, the present invention uses expressions over column values to sort records. Existing systems are not able to handle this because they only recognize simple columns and do not recognize expressions. Another disadvantage of conventional index exploitation is that it can only index on built-in relations. Therefore, a conventional system could not exploit indexes based n user-defined relations, such as the example CREATE INDEX statement below that indexes using a user-defined index type, which is further explained in Application Ser. No. 09/112,723, entitled "Supporting Database Indexes Based on a Generalized B-Tree ndex," filed on same date herewith, by Stefan Dessloch, et al., which is incorporated by reference herein: CREATE INDEX ON employee (location) USING spatial_index(location); In the above example, sorting is performed on a compound object, "location", using a user-defined function "spatial_index". On the other hand, conventional index exploitation systems are unable to sort on compound objects. Exploitation of Database Indexes Overview There are two major components involved in exploiting indexes: pattern matching and selection of index exploitation rules. In the present invention, both pattern matching and selection of index exploitation rules can be user-defined. The present invention recognizes complex patterns in predicates. Once the complex pattern has been matched, the present invention determines how to generate search ranges based on the matched predicates. FIG. 6 is a block diagram illustrating the index exploitation system 600 of the present invention. The index exploitation system 600 receives both built-in and user-defined relations along with both built-in and user-defined predicates. The index exploitation system 600 processes this input to identify an optimal access path for use in processing a query. In particular, the index exploitation system 600 determines whether an index for the database is based on at least one user-defined relationship. Then, if it is determined that the index is based on at least one user-defined relationship, the index exploitation system determines whether an index pattern matches at least one predicate. If the index pattern matches at least one predicate, the index is exploited using that predicate to delimit the search ranges based on the user-defined relationship. The user-defined relationship is encapsulated within a user-defined index-type on which the index is based. Unlike conventional systems, the index exploitation system 600 sorts table records based on user-defined relations as well as built-in relations. With the index exploitation system 600, the values for the index keys, by which table records are sorted, are generated with user-defined functions, such as "generated_grid_index" (i.e., the index entries are function (expression) result over index keys). The index exploitation system 600 is able to process user-defined relations on which an index is based, instead of using only the "total ordering" of conventional systems. Generalized Pattern Matching The index exploitation system 600 of the present invention performs generalized pattern matching . The following discussion focuses on pattern matching for two cases: pattern matching for the exploitation of a user-defined index type and pattern matching for the exploitation of an index-on-expression. The index exploitation system 600 uses user-defined predicates as well as built-in predicates (e.g., <, >, <=, >=, =) to exploit indexes. The index exploitation system 600 finds applicable predicates for index exploitation based on a generalized pattern matching mechanism. Additionally, the index exploitation system 600 uses indexes in a multi-dimensional search as well as in a one-dimensional search. Pattern Matching for the Exploitation of a User-Defined Index Type The following pseudocode represents the code used to create a user-defined function: CREATE FUNCTION within (x shape, y shape) RETURN INT LANGUAGE C . . . EXTERNAL NAME `spatialLib!within` AS PREDICATE WHEN=1 SEARCH BY INDEX EXTENSION gridlndex WHEN KEY(X) USE search1By2(y) WHEN KEY(Y) USE search2By1(x); The CREATE FUNCTION creates a function called "within" with arguments "x" and "y". The AS PREDICATE WHEN=1 statement identifies the matching pattern for index exploitation. The SEARCH BY INDEX EXTENSION statement identifies "gridlndex" as an index based on a user-defined relationship. The WHEN statements identify index exploitation rules to be used for each argument. In particular, if the argument represented by "x" is an index using the user-defined relationship in "gridlndex", the index exploitation rule used is "search1By2(y)". If the argument represented by "y" is an index using the user-defined relationship in "gridlndex", the index exploitation rule used is "search2By1(x)". The index exploitation rules would be further defined in the user-defined function "gridIndex". FIG. 7 is a diagram illustrating predicate cloning for a user-defined predicate by the index exploitation system 600. Predicate cloning is used to enable generalized pattern matching. The following SQL statement 700 is illustrated in FIG. 7: SELECT*FROM customers WHERE within (location, :circle) 1; SQL statement 700 would be represented in a conventional system as a binary tree 702. The index exploitation system 600 clones the "within (location, :circle)=1" user-defined predicate. In particular, the index exploitation system 600 generates a new binary tree 704 with binary sub-trees 706 and 708, which represent the two user-defined "within (location, :circle)=1" predicates of SQL statement 700 after the cloning. The two sub-trees 706 and 708 are ANDed together in the binary tree 704. When two predicates are ANDed together, if the evaluation of the original predicate would evaluate to a TRUE value, then ANDing the original predicate with itself would result in a TRUE value. Pattern Matching for the Exploitation of an Index-on-Expression The following is a sample SQL statement for creating a table used by the index exploitation system 600:
CREATE TABLE employee (eid int,
dept int,
name varchar (30),
salary float,
bonus float,
. . . )
The CREATE TABLE statement creates a table named "employee" with at least five columns, including columns for salary and bonus. The ellipses indicate that other columns may be included in the table. FIG. 8 is a diagram illustrating generalized pattern matching for key expressions with the index exploitation system 600. The following SQL statement 800 is illustrated in FIG. 8: SELECT*FROM employee WHERE salary+bonus=50000 AND salary+bonus<70000; The index exploitation system 600 is able to understand that the values in the salary and bonus columns of the "employee" table are to be added together to process the "salary +bonus" expression. Additionally, the index exploitation system 600 understands that the range of values will be from 50000 to under 70000. Moreover, the index exploitation system 600 is able to perform generalized pattern matching. For example, the following SQL statement 802 creates an index: CREATE INDEX compensation ON employee (salary+bonus); The CREATE INDEX statement 802 creates an index called "compensation" on the "employee" table using the expression "salary +bonus". The index exploitation system 600 recognizes the expression "salary +bonus" and is able to match this to the "salary+bonus" expression in the SELECT statement 800. Selection of Index Exploitation Rules The index exploitation system 600 of the present invention exploits index using index exploitation rules. The following discussion focuses on two cases of index exploitation: a case in which an index is based on built-in relations and a case in which an index is based on a user-defined relation. The following pseudocode represents the steps of the index exploitation system 600 to perform index exploitation: For each index I: IF the index is based on built-in relations THEN call exploitBtreelndex, ELSE call exploitNonBtreelndex; Initially, the index exploitation system 600 determines whether the index is based on built-in relations. If the index is based on built-in relations, the index exploitation system 600 will invoke the exploitBtreelndex function described below. If the index is based on user-defined relations, the index exploitation system 600 will invoke the exploitNonBtreelndex function described below. Index Exploitation Rule Selection for Indexes Based on Built-in Relationships The following is pseudocode for the exploitBtreelndex function:
For each index I:
For each dimension D of index I:
For each predicate P in the query:
(1) Determine whether either side of predicate P is an
index key part, which, in this case, can be either a
simple column or a scalar expression.
(2) If one side of predicate P is an index key part,
determine whether the other side of predicate P is a
"bounded" expression.
(3) If the other side of predicate P is a "bounded"
expression, use predicate P as the key predicate for
the dimension D of index I.
(4) Next, determine the start/stop key for dimension D
with the "bounded" expression in predicate P.
In the above pseudocode for index exploitation, the index exploitation system 600 selects each index "I" and each dimension "D" of the index. For each index and dimension, and each predicate "P" in a query, the index exploitation system 600 determines whether either side of the predicate "P" is an index key part. If neither side of the predicate "P" is an index key part, then the index exploitation system 600 does not perform index exploitation. If either side of the predicate "P" is an index key part, then the index exploitation system 600 determines whether the other side of the predicate "P" is a "bounded" expression. If the other side of the predicate "P" is not a "bounded" expression, the index exploitation system 600 does not perform index exploitation. If the other side of the predicate "P" is a "bounded" expression, the index exploitation system 600 uses the predicate "P" as the key predicate for the dimension "D" of index "I". That is, the predicate "P" is used to exploit the index "I" for the identified key part. The index exploitation system 600 sets the start and stop keys that define the range for dimension "D" are by the "bounded" expression in predicate "P". Selection of Index Exploitation Rules for User-Defined Index Types The following is pseudocode for the exploitNonBtreelndex function:
For each predicate P in the query:
For each combination of arguments A of predicate P:
(1) Determine whether each combination of arguments
A is an index key part, which, in this case, is a set of
columns specified in the index exploitation rules
of the corresponding CREATE FUNCTION
statement.
(2) If the combination of arguments A is an index key
part, find the index exploitation rule E for
the combination of arguments A.
(3) Determine whether the remaining arguments used
by E are "bounded".
(4) If the remaining arguments are "bounded", record
the user-defined range function in the access plan.
(5) Next, generate the start/stop keys using the user-
defined range function at run-time.
The above pseudocode for index exploitation selects each predicate "P" in the query, and then processes each combination of arguments "A" in the predicate "P". The index exploitation system 600 determine whether each combination of arguments "A" is an index key part. If the combination of arguments "A" is not an index key part, the index exploitation system 600 does not perform index exploitation. If the combination of arguments "A" is an index key part, the index exploitation system 600 finds the index exploitation rule "E" for argument "A". The index exploitation rule refers to a search method in which the range-producing function is specified. The index exploitation system 600 then determines whether the remaining arguments used by the index exploitation rule "E" are "bounded". If they are not "bounded", the index exploitation system 600 does not perform index exploitation. If they are "bounded", the index exploitation system 600 records the user-defined range function in the access plan. The index exploitation system 600 generates the start and stop keys for the range by using the range function at run-time. For example, if the predicate "P" is "within (location, :circle)=1", then the index exploitation system 600 would select each combination of arguments "A" of the predicate "P". Assuming that the "location" argument were selected, the index exploitation system 600 would then find the index exploitation rule for the argument. The user-defined function "within" is defined above and includes the following statement defining index exploitation rules: SEARCH BY INDEX EXTENSION gridlndex WHEN KEY(X) USE search1By2(y) WHEN KEY(Y) USE search2By1(x); Here, since the argument "location" is "x", the search1By2(y) index exploitation rule is used, with the argument ":circle" replacing "y". Then, the index exploitation system 600 determines whether the remaining argument ":circle" is "bounded". Since the argument ":circle" is bounded, the index exploitation system 600 would record the user-defined range function in the access plan and generated the range using the user-defined range function at run-time. Advantages of Exploitation of Database Indexes The index exploitation system 600 has several advantages. The index exploitation system 600 provides a uniform model for exploiting database indexes. Additionally, the index exploitation system 600 provides a generalized technique for exploiting indexes derived from the existing technique. Moreover, the index exploitation system 600 is easy to implement and has a minimum impact on the existing system. Nonetheless, the index exploitation system 600 is very powerful and can handle indexing on expressions, user-defined index types, etc. Flow Diagrams FIG. 9 is a flow diagram illustrating the steps performed by the index exploitation system 600 to perform index exploitation. In Block 900, the index exploitation system 600 determines whether the index is based on a user-defined relation. If the index is based on a user-defined relation, the index exploitation system 600 continues to Block 902, otherwise, the index exploitation system 600 continues to Block 904. In Block 902, the index exploitation system 600 invokes the exploitNonBtreelndex function. In Block 904, the index exploitation system 600 invokes the exploitBtreelndex function. FIG. 10 is a flow diagram illustrating the steps performed by the index exploitation system 600 to execute the exploitNonBtreelndex function. The index exploitation system 600 performs steps 1000-1012 for each predicate in the query. In Block 1000, for each predicate, the index exploitation system 600 selects the next combination of arguments of the predicate, starting with the first. In Block 1002, the index exploitation system 600 determines whether the combination of arguments is an index key part. If the combination of arguments is an index key part, the index exploitation system 600 continues to Block 1004, otherwise, the index exploitation system 600 does not perform index exploitation and is done. In Block 1004, the index exploitation system 600 identifies the index exploitation rule for the combination of arguments. In Block 1006, the index exploitation system 600 determines whether the remaining arguments are "bounded". If the remaining arguments are "bounded", the index exploitation system 600 continues to Block 1008, otherwise the index exploitation system 600 does not perform index exploitation and is done. In Block 1008, the index exploitation system 600 records the user-defined range function in the access plan. In Block 1010, the index exploitation system 600 generates the start and stop keys using the user-defined range function at run-time. In Block 1012, the index exploitation system 600 determines whether all arguments have been selected. If all arguments have been selected, the index exploitation system 600 is done, otherwise, the index exploitation system 600 loops back to Block 1000 to select the next combination of arguments. FIG. 11 is a flow diagram illustrating the steps performed by the index exploitation system 600 to execute the exploitBtreelndex function. The index exploitation system 600 performs steps 1100-1110 for each dimension of each index. In Block 1100, for each dimension of an index, the index exploitation system 600 selects the next predicate, starting with the first. In Block 1102, the index exploitation system 600 determines whether either side of the predicate is an index key part. If either side is an index key part, the index exploitation system 600 continues to Block 1104, otherwise, the index exploitation system 600 does not perform index exploitation and is done. In Block 1104, the index exploitation system 600 determines whether the other side of the predicate is a "bounded" expression. If the other side of the predicate is a "bounded" expression, the index exploitation system 600 continues to Block 1106, otherwise, the index exploitation system 600 does not perform index exploitation and is done. In Block 1106, the index exploitation system 600 uses the predicate as the key predicate for the dimension of the index. In Block 1108, the index exploitation system 600 generates the start and stop keys using the "bounded" expression in the predicate. In Block 1110, the index exploitation system 600 determines whether all predicates have been selected. If all predicates have been selected, the index exploitation system 600 is done, otherwise, the index exploitation system 600 loops back to Block 1100 to select the next predicate. Conclusion This concludes the description of the preferred embodiment of the invention. The following describes some alternative embodiments for accomplishing the present invention. For example, any type of computer, such as a mainframe, minicomputer, or personal computer, or computer configuration, such as a timesharing mainframe, local area network, or standalone personal computer, could be used with the present invention. In summary, the present invention discloses a method, apparatus, and article of manufacture for a computer-implemented exploitation of database indexes. The present invention exploits database indexes for databases containing structured data and non-structured data. Additionally, the present invention generates search ranges for user-defined predicates using built-in relations or user-defined relations. Moreover, the present invention recognizes general patterns for index exploitation. The foregoing description of the preferred embodiment of the invention has been resented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto.
|
Same subclass Same class Consider this |
||||||||||
