Generalized key indexes5870747Abstract Generalized key indexes enable a first table of a relational database to be indexed using an index key and index conditions, wherein either or both of the index key and index conditions may reference multiple tables of the database or may be an expression using fields of one or more tables other than the first table. The generalized key indexes directly associate generalized index key values with record identifiers of records of the first table, thus enabling efficient storage and data retrieval. Claims What is claimed is: Description BACKGROUND
______________________________________
CREATE GK-INDEX <index-name>
ON <table-name>
(SELECT AS KEY <key-list>
FROM <other-table-list>
WHERE <condition-list>)
______________________________________
where the parameters are as follows:
______________________________________
index-name:
identifies the index being created
table-name:
identifies the table whose records are being indexed.
key-list: includes one or more fields from one or more tables
of the relational database.
other-table-list:
includes each table, other than the table
corresponding to table-name, having a field
referenced either in the key-list or the condition-list.
If the index being defined is a single-table index,
this list may be empty.
condition-list:
includes zero or more conditions (expressions) to be
satisfied. This list may be empty.
______________________________________
One embodiment creates a GK index from such a statement as if processing a query and storing the results. For example, the general format for creating a GK index may be expressed as the query:
______________________________________
SELECT <key-list>, <table-name>.RID
FROM <other-table-list>, <table-name>
WHERE <condition-list>
ORDER BY <order-list>
______________________________________
where <table-name>.RID references the table records and the <order-list> generally reflects the fields as ordered in the key-list. GK indexes may be used to create conventional single-table indexes, including virtual column indexes and partial indexes. In an implementation using the above format, single-table GK indexes are created by statements such as:
______________________________________
CREATE GK-INDEX <index-name>
ON <table-name>
(SELECT AS KEY <key-list>
WHERE <condition-list>)
______________________________________
which may be expressed as the query:
______________________________________
SELECT <key-list>, <table-name>.RID
FROM <table-name>
WHERE <condition-list>
ORDER BY <order-list>
______________________________________
Thus, for example, the standard one-table index discussed with reference to FIG. 4 indexing the CUST table by name would be created by a statement such as:
______________________________________
CREATE GK-INDEX Cust.sub.-- Name.sub.-- Index
ON CUST
(SELECT AS KEY CUST.cust.sub.-- name)
______________________________________
which may be expressed as the query:
______________________________________
SELECT CUST.cust.sub.-- name, CUST.RID
FROM CUST
ORDER BY CUST.cust.sub.-- name
______________________________________
The virtual column index 700 discussed above with reference to FIG. 7 may be created by the statement:
______________________________________
CREATE GK-INDEX VC.sub.-- Index
ON ORD
(SELECT AS KEY (ORD.price - ORD.discount)
______________________________________
which may be expressed as the query:
______________________________________
SELECT (ORD.price - ORD.discount)
ORD.RID
FROM ORD
ORDER BY (ORD.price - ORD.discount)
______________________________________
Similarly, the partial index 800 discussed with reference to FIG. 8 may be created by:
______________________________________
CREATE GK-INDEX Partial.sub.-- Index
ON ORD
(SELECT AS KEY ORD.date
WHERE ORD.price .gtoreq. 50,000.00)
______________________________________
which may be expressed as the query:
______________________________________
SELECT ORD.date, ORD.RID
FROM ORD
WHERE ORD.price .gtoreq. 50,000.00
ORDER BY ORD.date
______________________________________
As indicated above, GK indexes encompass conventional single-table indexes. GK indexes may also be multi-table. In processing the query corresponding to a multi-table GK index, intermediate tables are referenced in determining the final association of GK index key values with record identifiers of records of the indexed table, but the GK index stores the end results. Query processing methods have been extensively studied, and a variety of methods can be used to process a query corresponding to a "Create GK Index" command. FIG. 9 is a flow diagram illustrating a method for creating a GK index. After receiving a command to create a GK index GKI on a table T (referred to above as <table-name>), using an index key K (referred to above as <key-list>) and index conditions {C} (referred to above as <condition-list>) (910), values for index key K are determined (920). For each index key value, associated record identifiers of table T satisfying the conditions {C} are determined (930). Index key values and their associated record identifiers are stored as index entries (940). Determining associated record identifiers of records of table T for an index key value can reference an intermediate table (one of the tables in the <other-table-list>) other than T. For example, if a table in the <other-table-list> is referenced by the index key or in a condition of {C}, data from that other table will need to be retrieved. However, whether intermediate tables are referenced, the index stores the result of the mapping, directly correlating an index key value with associated record identifiers of records of indexed table T. FIG. 10 shows a GK index 1000 associating customer names with record identifiers of item records, performing the same function as the indexes of FIGS. 4-6. As a result of a command such as:
______________________________________
CREATE GK-INDEX
cust.sub.-- items
ON ITEM
(SELECT AS KEY CUST.cust.sub.-- name
FROM CUST
WHERE CUST.cust.sub.-- no = ORD.cust.sub.-- no AND
ORD.ord.sub.-- no = ITEM.ord.sub.-- no)
______________________________________
the corresponding query:
______________________________________
SELECT CUST.cust.sub.-- name, ITEM.RID
FROM CUST, ORD, ITEM
WHERE CUST.cust.sub.-- no = ORD.cust.sub.-- no AND
ORD.ord.sub.-- no = ITEM.ord.sub.-- no
ORDER BY CUST.cust.sub.-- name
______________________________________
is processed, creating the GK index 1000 of FIG. 10. Like the index 400 of FIG. 4, GK index 1000 associates customer names with record identifiers of item records. However, as shown, GK index 1000 has only a single level, directly associating values for customer names from the CUST table with record identifiers of records of the ITEM table, eliminating reference to the ORD table. Conventional multi-table indexes such as the indexes of FIGS. 4-6 have multiple levels referencing intermediate records to associate key values with record identifiers of table records. Not only do multiple levels result in larger memory requirements and slower access times, they also place some limitations on the types of indexes that can be created. For example, conventional multi-level multi-table indexing does not permit an index key to be based on fields of more than one table. In contrast, because GK indexes have only a single level, directly associating GK index key values with record identifiers of table records, users have great flexibility in determining how the association is defined. For example, a GK index key may be created on a first table, using an index key that includes an expression using one or more fields of a second table. Referring to the tables of FIGS. 1-3, to determine customers who had made single order purchases having a particular total cost, GK index 1100 of FIG. 11 may be created as follows:
______________________________________
CREATE GK-INDEX cust.sub.-- expenditure
ON CUST
(SELECT AS KEY ((ORD.price - ORD.discount) *
(1 + ORD.tax))
FROM ORD
WHERE ORD.cust.sub.-- no = CUST.cust.sub.-- no)
______________________________________
which would be processed as:
______________________________________
SELECT ((ORD.price - ORD.discount) *
(1 + ORD.tax)), CUST.RID
FROM ORD, CUST
WHERE ORD.cust.sub.-- no = CUST.cust.sub.-- no
______________________________________
GK index keys can also be created on fields from more than one table. For example, the following command would create GK index 1200 of FIG. 12, associating a combination of a customer name and an order date with record identifiers of the items purchased in that order:
______________________________________
CREATE GK-INDEX
cust.sub.-- date.sub.-- item
ON ITEM
(SELECT AS KEY (CUST.cust.sub.-- name,
ORD.date)
FROM CUST, ORD
WHERE CUST.cust.sub.-- no = ORD.cust.sub.-- no AND
ORD.ord.sub.-- no = ITEM.ord.sub.-- no)
______________________________________
which may be created by:
______________________________________
SELECT (CUST.cust.sub.-- name,
ORD.date), ITEM.RID
FROM CUST, ORD, ITEM
WHERE CUST.cust.sub.-- no = ORD.cust.sub.-- no AND
ORD.ord.sub.-- no = ITEM.ord.sub.-- no
ORDER BY CUST.cust.sub.-- name, ORD.date
______________________________________
These examples reflect the flexibility of generalized key indexing to create indexes customized to a user's specific needs. To ensure continuing accuracy of a GK index after data is added, deleted, or altered in a table that was referenced to create that GK index, the GK index must be updated to verify that its entries are still accurate after the data change has occurred. Various update methods exist and may be adapted to update GK indexes. One embodiment reconstructs a GK index when any of the referenced fields are altered, or deletes the GK index if the alteration is incompatible with the GK index (e.g., the field is deleted or changed to an incompatible data type). This method is especially suitable for data warehousing environments, in which data is not frequently updated. A database system uses a GK index to access data by finding the GK index key value in the GK index and following associated record identifiers to the relevant table records. For example, using GK index 1000 of FIG. 10 to find item records for a customer named Smith, GK index 1000 is first searched for an entry for Smith 1030a, and the item records are retrieved by the corresponding record identifiers 1030b. To use an index, relational database systems first determine whether that particular index is appropriately used to process a given query. Typically, database systems have a query optimizer responsible for determining which indexes, if any, are used to process a given query. Generally, query optimizers determine whether to use a particular index for a given query by determining (1) whether the index is applicable to the query; and (2) whether the index is optimally used to process the query. Consider, for example, the following query Q:
______________________________________
SELECT <query-select-list>
FROM <query-table-list>
WHERE <query-condition-list>
______________________________________
and GK index GKI, created as follows:
______________________________________
CREATE GK-INDEX GKI
ON T
(SELECT AS KEY <index-key-list>
FROM <index-other-table-list>
WHERE <index-condition-list>)
______________________________________
Generally, for a GK index to be applicable to a query, the table it indexes must be referenced by the query. FIG. 13 is a flow diagram illustrating the process in one embodiment of determining applicability. If the table T, indexed by GK index GKI, is not one of the tables in the <query-table-list> (1310), GK index GKI is not applicable to the query (1315). Similarly, if the <index-condition-list> is more restrictive than the <query-condition-list> (1320), GK index GKI is not applicable to the query (1315). Otherwise, GK index GKI is applicable to the query (1325). The restrictiveness evaluation (1320) ensures that using the index GKI will not produce incomplete results, which could occur if the index GKI is created on more restrictive conditions than are required by the query. In one embodiment, whether the restrictiveness condition is satisfied is determined by evaluating whether each condition in the <index-condition-list> is satisfied by at least one condition of the <query-condition-list>. The following example illustrates the concept of applicability. Using the tables of FIGS. 1-3, the following indexes are created: (1) cust.sub.-- items index: as illustrated in FIG. 10, cust.sub.-- items index 1000 associates a customer name with record identifiers of item records for items purchased by that customer. (2) cust.sub.-- big.sub.-- items index: illustrated in FIG. 14, cust.sub.-- big.sub.-- items 1400 associates a customer name with record identifiers of item records for items purchased by that customer in orders having a price of at least $50,000.00. This index may be created by:
______________________________________
CREATE GK-INDEX
cust.sub.-- big.sub.-- items
ON ITEM
(SELECT AS KEY CUST.cust.sub.-- name
FROM CUST, ORD
WHERE CUST.cust.sub.-- no = ORD.cust.sub.-- no AND
ORD.ord.sub.-- no = ITEM.ord.no AND
ORD.price .gtoreq. 50,000.00)
______________________________________
3. cust.sub.-- many.sub.-- big.sub.-- items: illustrated in FIG. 15, cust.sub.-- many.sub.-- big.sub.-- items 1500 associates a customer name with record identifiers of item records for items which were purchased in quantities of at least ten, purchased by that customer in orders having a price of at least $50,000.00. This index may be created by:
______________________________________
CREATE GK-INDEX
cust.sub.-- many.sub.-- big.sub.-- items
ON ITEM
(SELECT AS KEY
CUST.cust.sub.-- name
FROM CUST, ORD
WHERE CUST.cust.sub.-- no = ORD.cust.sub.-- no AND
ORD.ord.sub.-- no = ITEM.ord.no AND
ORD.price .gtoreq. 50,000.00 AND
ITBM.quantity .gtoreq. 10)
______________________________________
The following queries are submitted on the database including the tables of FIGS. 1-3: (1) Q1: Find item numbers and item quantities in orders by customer Smith. This query is formatted as:
______________________________________
SELECT ITEM.item.sub.-- no, ITEM.quantity
FROM CUST, ORD, ITEM
WHERE CUST.cust.sub.-- name = "Smith" AND
CUST.cust.sub.-- no = ORD.cust.sub.-- no AND
ORD.ord.sub.-- no = ITEM.ord.sub.-- no
______________________________________
(2) Q2: Find item numbers and quantities of items in orders of at least $60,000.00 by customer Smith. This query is formatted as:
______________________________________
SELECT ITEM.item.sub.-- no, ITEM.quantity
FROM CUST, ORD, ITEM
WHERE CUST.cust.sub.-- name = "Smith" AND
CUST.cust.sub.-- no = ORD.cust.sub.-- no AND
ORD.ord.sub.-- no = ITEM.ord.sub.-- no AND
ORD.price .gtoreq. 60,000.00
______________________________________
(3) Q3: Find item numbers and quantities of items purchased in quantities of at least 25 orders of at least $60,000.00 by customer Smith. This query is formatted as:
______________________________________
SELECT ITEM.item.sub.-- no, ITEM.quantity
FROM CUST, ORD, ITEM
WHERE CUST.cust.sub.-- name = "Smith" AND
CUST.cust.sub.-- no = ORD.cust.sub.-- no AND
ORD.ord.sub.-- no = ITEM.ord.sub.-- no AND
ORD.price .gtoreq. 60,000.00 AND
ITEM.quantity .gtoreq. 25
______________________________________
The applicability of indexes 1000, 1400, and 1500 to queries Q1, Q2, and Q3 is illustrated in FIG. 16. Q1 requests information about all items purchased by Smith. Only index 1000 is applicable because the index conditions of indexes 1400 and 1500 are more restrictive than the query conditions and would not lead to complete results: The condition in index 1400 that ORD.price .gtoreq.50,000 is not satisfied by any condition of Q1. Accordingly, index 1400 indicates only items purchased by Smith in orders of at least $50,000.00. Index 1500 includes conditions both that ORD.price .gtoreq.50,000 and that ITEM.quantity .gtoreq.10, neither of which are satisfied by any conditions of Q2. Thus, index 1500 indicates only items purchased by Smith in orders of at least $50,000.00. This example also illustrates that an index having conditions more restrictive than query conditions can lead to incorrect results: the proper response to Q1, requesting all items purchased by Smith, corresponds to items 201, 202, 203; index 1400 would produce only items 201 and 202; index 1500 produces only item 201. The concept of applicability is further illustrated by queries Q2 and Q3. Q2 requests items purchased by Smith in orders of at least $60,000.00. Index 1400 is applicable for Q2 because its condition that ORD.price .gtoreq.50,000.00 is satisfied by Q2 query condition that ORD.price .gtoreq.60,000.00. Index 1500 is not applicable because although its condition that ORD.price .gtoreq.50,000 is satisfied, its condition that ORD.quantity .gtoreq.10 is not satisfied by any condition of Q2. Q3 requests items purchased by Smith in quantities greater than 25 in orders of at least $60,000.00. Index 1400 is applicable for the same reason it is applicable for Q2. Index 1500 is also applicable because the condition that ITEM.quantity .gtoreq.10 is satisfied by Q3 query condition that ITEM.quantity .gtoreq.25. Once determining that an index is applicable to a given query, query optimizers generally determine whether the index is optimally used in processing the query. Although methods of optimization vary among query optimizers, a commonly used method is based on the Selinger optimizer algorithm, discussed in P. G. Selinger et al., An Access Specification Language for a Relational Data Base System, 1979 ACM SIGMOD INT'L CONF. ON MANAGEMENT OF DATA 23-34 (May/June 1979). The basic idea of the Selinger algorithm is to enumerate possible execution plans for processing a given query, estimating the cost of each possible plan, and selecting the best plan (the plan having the least cost). Thus, processing the query:
______________________________________
SELECT <query-select-list>
FROM <query-table-list>
WHERE <query-condition-list>)
______________________________________
where <query-table-list> is the set {T1, T2, . . . , Tn}, requires that all of the tables in <query-table-list> be joined. One possible Selinger optimizer generates plans for all permutations of tables of the query table set and estimates the cost for each permutation by starting with the first table in the permutation and estimating the cost of joining that table with a second table, adding that cost to the estimate of the cost of joining the result of the first join to a third table, and continues joining all tables of the tables set and calculating the accumulated estimated cost. The total cost also considers the cost of incorporating single-table conditions that do not require any joins. This process is repeated for every possible permutation. Indexes can reduce the plan cost. Selinger optimizers generally use indexes to reduce the cost of joining a table, or to reduce the cost of determining a condition on a single table. Referring the tables 100, 200, 300 of FIGS. 1-3, consider the query:
______________________________________
SELECT CUST.name, ITEM.quantity
FROM CUST, ORD, ITEM
WHERE CUST.cust.sub.-- name = "Smith" AND
CUST.cust.sub.-- no = ORD.cust.sub.-- no AND
ORD.ord.sub.-- no = ITEM.ord.sub.-- no AND
ITEM.quantity .gtoreq. 25
______________________________________
The table set is {CUST, ORD, ITEM}; the plans include the following possible permutations:
______________________________________
(1) {CUST, ORD, ITBM}
(2) {CUST, ITEM, ORD}
(3) {ORD, CUST, ITEM}
(4) {ORD, ITEM, CUST}
(5) {ITEM, CUST, ORD}
(6) {ITEM, ORD, CUST}
______________________________________
Index 1700 of FIG. 17 associates the field ITEM.quantity with ITEM.RID, and is helpful to process the condition ITEM.quantity .gtoreq.25. This condition references only the ITEM table, and therefore does not require that any other table be previously joined. Thus index 1700 can be used in each of plans (1)-(6) when the ITEM table is added. Index 1800 of FIG. 18 associates the field ITEM.ord.sub.-- no with ITEM.RID, and is useful to effect the condition ORD.ord.sub.-- no=ITEM.ord.sub.-- no, joining the ITEM table to the ORD table by the common ord.sub.-- no fields. However, because this condition references the ORD table, index 1800 is used only to join the ITEM table if the ORD table has already been joined. Thus, index 1800 is used only in plans (1), (3), and (4). Various methods of determining when to use GK indexes may be used to adapt Selinger optimizers to incorporate GK indexes into this algorithm. For example, in one embodiment, like conventional single-table indexes, a single-table GK index on a table T will always be considered when that table is joined. Multi-table GK-indexes will be considered when all of the tables referenced in the key-list have already been joined. Query optimizers based on algorithms other than the Selinger algorithm can be similarly adapted to handle GK indexes, although the specific modifications may vary. Other embodiments are within the scope of the following claims. For example, other embodiments may use different update methods. One embodiment, rather than deleting a GK index when an underlying table is altered, updates or recalculates the index. Other possible variations include use different types of data structures may be used to represent the GK key indexes, including b-trees and bit-mapped indexes. While the invention is described in terms of a software implementation, the invention may be implemented in hardware or software, or a combination of both. Preferably, the invention is implemented in a software program executing on a programmable processing system comprising a processor, a data storage system, an input device, and an output device. FIG. 19 illustrates one such programmable processing system 1900, including a CPU 1910, a RAM 1920, and an I/O controller 1930 coupled by a CPU bus 1940. The I/O controller 1930 is also coupled by an I/O bus 1950 to input devices such as a keyboard 1960, and output devices such as displays 1970. The present invention has been described in terms of an embodiment. The invention, however, is not limited to the embodiment depicted and described. Rather, the scope of the invention is defined by the appended claims.
|
Same subclass Same class Consider this |
||||||||||
