User-defined search using index exploitation6266663Abstract A method, apparatus, and article of manufacture for a computer-implemented model for user-defined search in Relational Database Management Systems. A statement is executed in a database stored on a data storage device connected to a computer. Data that is qualified by user-defined functions is located based on a model that supports user-defined search. Claims What is claimed is: Description BACKGROUND OF THE INVENTION
CREATE TABLE emp_table (name varchar (20),
age integer,
salary dec (11,2)
. . .
)
The above statement creates a table with several columns, including a name column, an age column, and a salary column. The following example illustrates a query on the table "emp_table": SELECT*FROM emp_table WHERE age>30 AND age<35; The above query selects all rows of the table "emp_table" for which the age is between 30 and 35. The following example illustrates creation of an index: CREATE INDEX ON emp_table (age) The above statement creates an index on the table "emp_table", with the index containing the age data. The index is used to increase the efficiency of processing the query. A traditional RDBMS only supports efficient searches for simple scalar predicates, such as "age>30" and "age<35", when an index has been created on the column referenced in the predicate (e.g, in this case, the "age" column). Advanced RDBMSs allow users to define data based on user-defined types. For example, a user can define a data type "polygon" or "point". Then, the user can define data having that type. For example, "square" and "rectangle" can be generated based on the user-defined type "polygon". Similarly, "point1 " can be generated based on the user-defined data type "point". When user-defined functions are defined to a RDBMS, a predicate can include a user-defined function. For example, if a user defines a function called "distance", then the user may use the user-defined function in a predicate, such as "distance (point1, point2)<=20". Similarly, if a user defines a function called "within", the user may use the function in a predicate, such as "within (point1, areal)=1". The traditional indexing and index exploitation techniques do not work in an environment in which user-defined data types and user-defined functions are used. The search model of the present invention extends a RDBMS traditional indexing technique to include user-defined behavior. The search model introduces a user-defined index type with a user chosen name. The user-defined index type is associated with the following elements: A number of index instance parameters. A number of index key source data types. A number of index key target data types. A key transform function. The key transform function accepts index key source data of index key source data types and index instance parameters as its inputs to generate index key target data of index key target data types. Key transformation may generate a single or multiple index key target data for one index key source data. A number of index search methods, each method is associated with the following: A number of search argument data types. A search argument transform function. The search argument transform function accepts search argument data of search argument data types and index instance parameters to generate a number of pairs of index key target data of index target data types. Each pair of index key target data is used as a start or stop condition for index searching. A screening function, which accepts search argument data, index instance parameters, and index entry data to filter out index entries. The user-defined index type can be chosen to create an index on columns of a table when the data types of these columns match the index key source data types. The search model of the present invention also introduces index exploitation rules for user-defined functions, which are used as query predicates. The following elements are introduced: A basic predicate of a form of "User-Defined Function op Value". A predicate index exploitation technique for the above basic predicate, which is associate with a number of index exploitation rules, with each rule being defined by the following: A user-defined index type name. A search method name of the user-defined index. A mapping from the subset of the basic predicate parameters and value item to the user-defined index type key sources. A mapping from the subset of the basic predicate parameters and value item to the user-defined index type search arguments. When a database query is submitted by a user, each table in the query is accessed. The present invention provides a simple and efficient technique to exploit each index created on the table with optimal exploiting predicates, especially when both the index and the predicate are based on user-defined data and functions. Each index of the table to be accessed is checked against the predicates of the query. One index can be efficiently exploited when there exists an index exploitation rule in a predicate of the query with the following conditions satisfied: The index type in the user-defined index matches the index type of the index exploitation rule of the predicate. The index key source columns specified in the index fully agree the index key source columns specified in the index exploitation rule of the predicate. The index search arguments specified in the index exploitation rule of the predicate can be passed to the index search method as valid values. The index can be exploited by the matching predicate with the matching search technique, and the predicate becomes the exploiting predicate for the index. The following discussion provides an example of SQL extensions in one embodiment of the present invention. The following illustrates creation of a user-defined index type by using "CREATE INDEX EXTENSION":
CREATE INDEX EXTENSION grids (level varchar(20))
with index keys
for (s shape)
generated by gridentry (s..mbr..xmin,
s..mbr..ymin,
s..mbr..xmax,
s..mbr..ymax,
level)
with search methods for index keys (gx, gy, xmin, ymin,
xmax, ymax)
when within (r shape)
range through gridwithin (r..mbr..xmin,
r..mbr..ymin,
r..mbr..xmax,
r..mbr..ymax,
level)
check with withinx (r..mbr..xmin,
r..mbr..ymin,
r..mbr..xmax,
r..mbr..ymax,
level,
gx, gy, xmin, ymin,
xmax, ymax)
when contains (r shape)
range through gridcontains (r..mbr..xmin,
r..mbr..ymin,
r..mbr..xmax,
r..mbr..ymax,
level)
check with containsx (r..mbr..xmin,
r..mbr..ymin,
r..mbr..xmax,
r..mbr..ymax,
level,
gx, gy, xmin, ymin,
xmax, ymax);
The CREATE INDEX EXTENSION "grids" statement creates an index type. The search model of the present invention has added the CREATE INDEX EXTENSION feature to enable users to create user-defined index types. The name of the index type being created is "grids". The index type "grids" takes a value for "level" as input when an index instance of "grids" is created. The statement "with index keys" identifies the input for the index. The statement "for (s shape)" indicates that data of type shape will be in the index (i.e., the data of type shape is the index source data). The statement "generated by gridentry" transforms the data of type "shape" to index target data using the user-defined function "gridentry". The statement "with search methods for index keys" defines the search methods to be used for the index of type "grids". The arguments "gx", "gy", "xmin", "ymin", "xmax", and "ymax" are used to rename the content of the index, and these arguments match the output of the user-defined function "gridentry". Each "when" statement under the "with search methods" statement identifies a particular search method. For example, for "when within (r shape)", "within" is a user-defined search method taking "r" of type "shape" as a search argument. The statement "range through gridwithin" generates a range for the target domain using the user-defined function "gridwithin". The statement "check with withinx" applies Boolean logic using the user-defined function "withinx" whose search arguments are "r..mbr..xmin", "r..mbr..ymin", "r..mbr..xmax", "r..mbr..ymax", and "level" and whose source data are arguments "gx", "gy", "xmin", "ymin", "xmax", and "ymax". The following illustrates creation of a user-defined index based on the above user-defined index type: CREATE INDEX emplocx on emp (loc) using grids (`10, 100, 1000`); The CREATE INDEX statement creates an index named "emplocx" on the table "emp" using the user-defined index type "grids". The following illustrates creation of a user-defined predicate based on a user-defined function:
CREATE FUNCTION within (s1 shape, s2 shape)
returns integer
language c
parameter style db2sq1
not variant not fenced
no sq1 no external action
external name `gis!within`
as predicate within (x, y) = 1
filter by withiny (x..mbr..xmin,
x..mbr..ymin,
x..mbr..xmax,
y..mbr..xmin,
y..mbr..ymin,
y..mbr..xmax,
y..mbr..ymax)
search by index extension grids
when key x use within (y)
when key y use contains (x);
The statement CREATE FUNCTION "within" creates a user-defined function "within" with arguments "s1" and "s2", each of which is of type "shape". The user-defined function "within" is different from the user-defined search method "within" discussed above, although both have the same name in this example. The statement "as predicate within (x, y)=1" indicates that index exploitation methods follow. The statement "filter by" enables a user to provide a function, such as "withiny", to filter data. The statement "search by index extension grids" defines search methods for the index defined using the user-defined index type "grids". The statement starting with "when" is an index exploitation rule. For example, the statement "when key x use within(y)" indicates that when the key of the index is "x", then the user-defined search method "within()" with argument "y" is used for searching. The statement "when key y use within(x)" indicates that when the key of the index is "y", then the user-defined search method "contains()" with argument "x" is used for searching. The following SELECT statement performs index exploitation for a query: SELECT*FROM employees e, stores s WHERE within (e.loc, s.territory)=1; The above select statement includes a predicate "within (e.loc, s.territory)=1" which uses the above user-defined function "within". The user-defined search method "within" was defined with a statement "search by index extension grids" that indicated the search methods to be used with the user-defined function "within". Using this, the search model of the present invention is able to perform index exploitation on the user-defined index of the user-defined index type using the exploitation rules specified when the user-defined function was created. The following SELECT statement shows an example SQL statement for which an index can be exploited to retrieve data from two tables "employees" and "stores" and for which the predicate includes a user-defined function "dist": SELECT*FROM employees e, stores s WHERE dist (e.loc, s.loc)<100; For the above statement, if one or both of the tables, "employees" or "stores", is accessed by an index based on a user-defined index type, and if the index exploitation rules specified by the user-defined function "dist" are applicable for the index, the exploitation rules are used to exploit the index. The following SELECT statement shows an example SQL statement for which an index can be exploited to retrieve data from two tables "employees" and "stores" and for which the predicate includes user-defined functions "within" and "dist": SELECT*FROM employees e, stores s WHERE within (e.loc, s.territory) or dist (e.loc, s.loc)<100; For the above statement, if one or both of the tables, "employees" or "stores", is accessed by an index based on a user-defined index type, and if the index exploitation rules specified by the user-defined functions "within" and "dist" are applicable to the index, then the index exploitation rules are used to exploit the index. The Extended DBMS Architecture for User-Defined Search FIG. 4 illustrates a compiler 400 of the present invention, which performs steps 204 and 206, discussed above. The compiler 400 of the present invention contains the following "extended" modules: Predicate Specification 404 and Index Exploitation 406. The run-time 450 of the present invention contains the following "extended" modules: Range Producer 410, DMS Filter 424, RDS Filter 426, and Key Transformer 440. The "extended" modules have been modified to provide the capability for pushing user-defined types, index maintenance and index exploitation, and user-defined functions and predicates inside the database. The Predicate Specification module 404 has been extended to handle user-defined predicates. The Index Exploitation module 406 has been modified to exploit user-defined indexes and provide more sophisticated pattern matching (e.g., recognizes "salary+bonus"). Additionally, the Predicate Specification module 404, the Index Exploitation module 406, and the DMS Filter module 424 work together to provide a technique to evaluate user-defined predicates using a three-stage technique. In the first stage, an index is applied to retrieve a subset of records using the following modules: Search Arguments 408, Range Producer 410, Search Range 412, Search 414, and Filter 420. For the records retrieved, in the second stage, an approximation of the original predicate is evaluated by applying a user-defined "approximation" function to obtain a smaller subset of records, which occurs in the DMS Filter module. In the third stage, the predicate itself is evaluated to determine whether the smaller subset of records satisfies the original predicate. The Range Producer module 410 has been extended to handle user-defined ranges, and, in particular, to determine ranges for predicates with user-defined functions and user-defined types. The DMS Filter module 424 and the RDS Filter module 426 have been extended to handle user-defined functions for filtering data. To process a query 402, the compiler 400 receives the query 402. The query 402 and the predicate specification from the Predicate Specification module 404 are submitted to the Index Exploitation module 406. The Index Exploitation module 406 performs some processing to exploit indexes. At run-time, the Search Arguments module 408 evaluates the search argument that will be used by the Range Producer module 410 to produce search ranges. The Range Producer module 410 will generate search ranges based on user-defined functions. The Search Range module 412 will generate final search ranges. The Search module 414 will perform a search using the B-Tree 416 to obtain the record identifier (ID) for data stored in the data storage device 418. The retrieved index key is submitted to the Filter module 420, which eliminates non-relevant records. Data is then fetched into the Record Buffer module 422 for storage. The DMS Filter module 424 and the RDS Filter module 426 perform final filtering. The Key Transformer module 440 has been modified to enable users to provide user-defined functions for processing inputs to produce a set of index keys. The user-defined functions can be scalar functions or table functions. A scalar function generates multiple key parts to be concatenated into an index key. A table function generates multiple sets of key parts, each of which is to be concatenated into an index key. Additionally, the input to the Key Transformer module 440 can include multiple values (e.g., values from multiple columns or multiple attributes of a structured type), and the user-defined functions can produce one or more index keys. The compiler 400 can process various statements, including a Drop 428, Create/Rebuild 430, or Insert/Delete/Update 432 statements. A Drop statement 428 may be handled by Miscellaneous modules 434 that work with the B-Tree 416 to drop data. An Insert/Delete/Update statement produce record data in the Record Buffer module 436 and the RID module 438. The data in the Record Buffer module 436 is submitted to the Key Transformer module 440, which identifies key sources in the records it receives. Key targets from the Key Transformer module 440 and record identifiers from the RID module 438 are used by the Index Key/RID module 442 to generate an index entry for the underlying record. Then, the information is passed to the appropriate module for processing, for example, an Add module 444 or a Delete module 446. The compiler 400 will process a Create/Rebuild statement 430 in the manner of the processing a Drop statement 428 when data is not in the table or an Insert/Delete/Update statement 432 when data is in the table. Flow Diagram FIG. 5 is a flow diagram illustrating the steps performed by the user-defined search mechanism. In Block 500, the user-defined search mechanism provides a model for locating data qualified by user-defined functions. In Block 502, the user-defined search mechanism creates a user-defined index type based on the provided model in response to user input. In Block 504, the user-defined search mechanism creates an index based on the created user-defined index type in response to user input. In Block 506, the user-defined search mechanism receives user-defined functions from a user, which include index exploitation rules for exploiting the index. In Block 508, the user-defined search mechanism receives a query that specifies a user-defined function from the user. In Block 510, the user-defined search mechanism exploits the index using the index exploitation rules defined in the user-defined function specified in the query. In particular, the user-defined search mechanism is able to exploit the index when it determines that the index to the table, from which data is to be retrieved using the query, is a user-defined index type that can be exploited with the user-defined function in the query. Conclusion This concludes the description of the preferred embodiment of the invention. The following describes some alternative embodiments for accomplishing the present invention. For example, any type of computer, such as a mainframe, minicomputer, or personal computer, or computer configuration, such as a timesharing mainframe, local area network, or standalone personal computer, could be used with the present invention. In summary, the present invention discloses a method, apparatus, and article of manufacture for a computer-implemented model for user-defined search in Relational Database Management Systems. The present invention provides a model to efficiently locate user-defined data qualified by user-defined functions. The present invention also ensures that the model works with existing RDBMSs. The foregoing description of the preferred embodiment of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto.
|
Same subclass Same class Consider this |
||||||||||
