Technique for detecting a shared temporal relationship of valid time data in a relational database management system6691097Abstract A method, apparatus, and article of manufacture for detecting shared temporal relationships in a relational database. In accordance with the present invention, an invocation of a shares operation that specifies a first event and a second event is received. In response to the invocation, a combination of temporal relationships between the first event and the second event are evaluated to determine whether (1) the second event ends after the first event starts and the first event ends after the second event starts or (2) the first and second events start and end at the same time. Claims What is claimed is: Description BACKGROUND OF THE INVENTION
TABLE 1
ID NAME SALARY START_DATE END_DATE
123 Grace Theophila 45,000 02/01/1997 04/20/1998
123 Grace Theophila 48,000 04/20/1998 10/30/1998
123 Grace Theophila 49,500 10/30/1998 04/04/1999
... ... ... ... ...
Both Table 1 and Table 2 track valid time information about different business conditions. Table 1 records salary information for employees throughout various periods of time, and Table 2 records employees' job titles throughout various periods of time. The "period" nature is a characteristic of temporal data and temporal analysis. Traditional databases (i.e., databases which focus on currently valid data) rarely model employee salary and job title information in two separate tables, as shown in Table 1 and Table 2. However, for simplicity, a "temporal" database (i.e., one which attempts to track historical information and, possibly, current and future information as well) may model data in separate tables. An employee's salaries and job titles can vary over time, independently of one another. Storing both pieces of information in a single temporal table forces the DBMS professional to design the database in the following manner either: (1) retain only one start/end date pair to record the valid time for all the row's content; or (2) include multiple start/end date pairs, each recording the valid time for a single part of the row's content. Each of these design options increases the complexity of the temporal analysis. Therefore, in the interest of simplicity and clarity, the single function operator system 124 will be described herein with respect to separate tables for each type of data. However, if desired, the single function operator system 124 can be used with other database designs, e.g., single table designs. Both Table 1 and Table 2 use dates as their level of temporal granularity, because presumably, an employee's salary or job title remains constant for a single day. However, temporal data can be recorded at coarser or finer levels of granularity. The START_DATE represents the first day on which the condition became true, and the END_DATE represents the first day thereafter in which the condition failed to remain true. For example, beginning Feb. 1, 1997, Grace had a salary of $45,000 per year. Grace continued to earn this salary until--but not including--Apr. 20, 1998. This technique of modeling temporal data is sometimes referred to as a "closed-open" representation in research literature. Of course, other representations of the data are possible without exceeding the scope of the single function operator system 124. Many of the underlying principles for a preferred embodiment of the single function operator system 124 are based on the theoretical work of J. F. Allen, who identified a set of operators (commonly referred to as Allen's operators) that can be used to assess temporal relationships. Allen's operators can be expressed in a variety of languages, including SQL. Allen's operators are shown in FIG. 2. FIG. 2 has an OPERATOR column 200, a RELATIONSHIP column 202, and a GRAPHIC EXAMPLE column 204. The OPERATOR column 200 contains seven of Allen's operators, including BEFORE 206, MEETS 208, OVERLAPS 210, DURING 212, STARTS 214, FINISHES 216, and EQUAL 218. These operators 206, 208, 210, 212, 214, 216, and 218 perform a comparison operation. The result of each comparison operation yields a TRUE or FALSE value. The RELATIONSHIP column 202 shows the relationship between a time period X 220 and a time period Y 222. The GRAPHIC EXAMPLE column 204 displays a graphical representation of the relationship between the time period X 220 and the time period Y 222. Other Allen's operators include MET BY, OVERLAPPED BY, STARTED BY, and FINISHED BY. The results of these operators also produce a TRUE or FALSE value. The preferred embodiment of the single function operator system 124 combines some of the operators 206, 208, 210, 212, 214, 216, and 218 into a single function. Combining some of the operators 206, 208, 210, 212, 214, 216, and 218 simplifies certain queries and helps reduce the number of lines of SQL code. More specifically, an embodiment of the single function operator system 124 provides a WITHIN operator that combines the EQUAL 218, DURING 212, STARTS 214, and FINISHES 216 operators into a single function operator. The WITHIN operator returns a TRUE value when the time period X 220 is wholly or partly contained within the time period Y 222. Another embodiment of the present invention provides a SHARES operator. The SHARES operator is similar to the WITHIN operator. Like the WITHIN operator, the SHARES operator combines the EQUAL 218, DURING 212, STARTS 214, and FINISHES 216 operators into a single function operator. The difference between the WITHIN operator and the SHARES operator is that the SHARES operator adds the following operators to the combination: OVERLAPS 210, OVERLAPPED BY, CONTAINS, STARTED BY, and FINISHED BY. The SHARES operator returns a TRUE value when time period X 220 shares any time in common with time period Y 222. To illustrate the benefits of the SHARES operator, the SHARES operator is used to extract data from Table 3 and Table 4. Table 3 represents a Store database. The Store database records data about stores and the districts to which each store reports. Table 3 contains five columns, a SID column, a STORE_NAME column, a DID column, a ORG_START column, and ORG_END column. The SID column contains a store identifier. The STORE_NAME column contains the name of store. The DID column contains the district identifier of the district that the store reports to. The ORG_START column contains the start date of the store-to-district reporting structure, and the ORG_END column contains the end date of the store-to-district reporting structure.
TABLE 3
SID STORE_NAME DID ORG_START ORG_END
0 Acme 0 7 05/06/1998 07/20/1998
0 Acme 0 6 01/01/1998 05/06/1998
1 Acme 1 7 04/20/1998 05/05/1998
2 Acme 2 6 01/01/1998 09/30/1998
2 Acme 2 7 09/30/1998 12/30/1998
3 Acme 3 5 01/10/1998 12/30/1998
4 Acme 4 5 09/01/1998 12/30/1998
... ... ... ... ...
Table 4 represents a District database. The District database records data about the districts and about the districts associated trading area. Table 4 contains five columns: a DID column that contains a district identifier; a D_NAME column that contains a district name; a TID column that contains an identifier of the trading area that the districts reports to; an ORG_START column that contains a start date of the reporting structure, and an ORG_END column that contains an end date of the reporting structure.
TABLE 4
DID D_NAME TID ORG_START ORG_END
5 Valley District 11 01/01/1998 07/30/1998
6 Springs District 11 05/30/1998 12/30/1998
6 Lakes District 12 01/01/1998 05/30/1998
7 Mountain District 12 02/04/1998 11/30/1998
5 Willows District 12 07/30/1998 08/30/1998
6 Waterfront District 9 01/01/1997 12/30/1997
... ... ... ... ...
As an example, assume that a query seeks to report the names of stores and the districts which the stores are associated with over time. This type of query is sometimes referred to as a "temporal sequenced join". Such a query might produce a report that cites the name of each store, the name of the district into which the store reported, and the dates for which this store-to-district reporting information is valid. Table 5 represents a sample result.
TABLE 5
STORE_NAME D_NAME ORG_START ORG_END
Acme 0 Lakes District 01/01/1998 05/06/1998
Acme 0 Mountain District 05/06/1998 07/20/1998
Acme 1 Mountain District 04/20/1998 05/05/1998
Acme 2 Lakes District 01/01/1998 09/30/1998
Acme 2 Springs District 01/01/1998 09/30/1998
Acme 2 Mountain District 09/30/1998 12/30/1998
Acme 3 Valley District 01/10/1998 12/30/1998
Acme 3 Willows District 01/10/1998 12/30/1998
Some conventional techniques for drafting a query that produces the results contained in Table 5 require four SELECT statements, three UNION statements, and a total of eleven data comparison operations. Each SELECT statement tests for some relationship between the time period of validity for the store-to-district reporting structure. The data comparison operators which implement Allen's operators, test for various temporal relationships. After testing for the temporal relationships, the query then unions the results together. A sample conventional query is shown below: SELECT store_name, d_name, store.org_start, store.org_end FROM store, district WHERE store.did=district.did and district.org_start<=store.org_start and store.org_end<=district.org_end UNION SELECT store name, d_name, store.org_start, store.org_end FROM store, district WHERE store.did=district.did and store.org_start>district.org_start and district.org_end<store.org_end and store.org_start<district.org_end UNION SELECT store_name, d_name, store.org_start, store.org_end FROM store, district WHERE store.did=district.did and district.org_start>store.org_start and store.org_end<district.org_end and district.org_start<store.org_end UNION SELECT store_name, d_name, store.org_start, store.org_end FROM store, district WHERE store.did=district.did and district.org_start>=store.org_start and district.org_end<=store.org_end ORDER BY store_name The above query contains four query blocks. Each section of the query that begins with a SELECT statement is a query block. Each query block contains a standard join clause based on the district identification number (i.e., the DID column of the STORE and DISTRICT tables). Each query block also includes a temporal join clause. To simplify the discussion of the temporal join clauses, assume "P1" denotes the time period specified by the ORG_START and ORG_END dates of the STORE table shown in Table 3, and assume "P2" denotes the time period specified by the ORG_START and ORG_END dates of the DISTRICT table shown in Table 4. Thus, the four query blocks test for the following temporal conditions: Query Block 1: P1 DURING P2 or P1 EQUAL P2 or P1 STARTS P2 Query Block 2: P2 OVERLAPS P1 Query Block 3: P1 OVERLAPS P2 Query Block 4: P2 DURING P1 or P2 EQUAL P1 or P2 FINISHES P1 While this query produces the intended result set shown in Table 5, many users would experience difficulty formulating this query. In particular, few users are capable of developing the logic and correctly coding the syntax (particularly all the date comparison operators) in a timely manner. Assuming that users store their temporal data in a relational or object/relational DBMS, a user must perform the following steps to formulate the above query: (1) understand the logic of each of the relevant temporal conditions; (2) correctly translate the logic into SQL date comparison operators; (3) formulate appropriate query blocks; and (4) UNION these query blocks together. Such query logic can be difficult to debug, as an error in one date comparison operator will yield incorrect results. However, the that same error will fail to produce an error warning message from the database. In addition to the difficulty in formulating and debugging the SQL query, the execution of the SQL query can cause a database management system to scan the table(s) referenced in the query multiple time (one time for each query block). Such scanning may result in considerable input and output processing and poor performance (e.g., delays in receiving query results). Fortunately, the single function operator system 124 provides the SHARES operator. The SHARES operator simplifies the above query. More specifically, the SHARES operator eliminates three of the four SELECT statements, all of the UNION statements, and ten of the eleven date comparisons. Using the SHARES operator, the above query can be revised as follows: SELECT store_name, store.did, d_name, store.org_start, store.org_end FROM store, district WHERE store.did=district.did and shares(store.org_start, store.org_end, district.org_start, district.org_end)=1 ORDER BY store_name, store.did In addition to greatly simplifying the traditional query, the revised query adds a the district identifier (the DID column of Table 4) to the result shown in Table 6.
TABLE 6
STORE_NAME DID D_NAME ORG_START ORG_END
Acme 0 6 Lakes District 01/01/1998 05/05/1998
Acme 0 7 Mountain 05/06/1998 07/20/1998
District
Acme 1 7 Mountain 04/20/1998 05/05/1998
District
Acme 2 6 Springs District 01/01/1998 09/30/1998
Acme 2 6 Lakes District 01/01/1998 09/30/1998
Acme 2 7 Mountain 09/30/1998 12/30/1998
District
Acme 3 5 Valley District 01/10/1998 12/30/1998
In the revised query, the operator that eliminates the most code is the SHARES OPERATOR: shares(store.org_start, store.org_end, district.org_start, district.org_end)=1 The SHARES function combines several temporal tests into one. A temporal relationship exists when either time period is equal to the other time period, or overlaps with the other time period, or occurred during the other time period, or starts during the other time period, or finishes during the other time period. That is, the two periods share some time in common. The SHARES operator expects to receive four DATE values as input (each pair containing the start/end points of each time period). The SHARES operator returns a "1" if the test evaluates as TRUE or a "0" if the test evaluates as FALSE. The WITHIN operator also eliminates the amount of code used in a traditional query. To illustrate the benefit of using the WITHIN operator, the WITHIN operator is used to extract data from Table 7 and Table 8 below. Table 7 represents a Discount database. The discount database records the retail price discount offered by store for products. Table 7 contains five columns: a PID column, a SID column, a PERCENT_OFF column, a D_START column, and a D_END column. The PID column contains a product identifier. The SID column contains a store identifier. The PERCENT_OFF column contains a discount rate. The D_START column contains a start time for a discount on product. The D_END columns contains the end time for a discount on product.
TABLE 7
PID SID PERCENT_OFF D_START D_END
100 3 5 05/01/1998 05/30/1998
200 1 7 04/30/1998 05/10/1998
200 2 7 04/30/1998 05/10/1998
600 2 5 11/30/1998 12/05/1998
500 0 10 02/01/1998 02/10/1998
... ... ... ... ...
Table 8 represents a Manu_Special database. The Manu_Special database records a manufacturer's discount offered to retailers. Table 8 contains four columns, a PID column, a PERCENT_OFF column, a S_START column, and a S_END column. The PID column contains the product identifier. The PERCENT_OFF column contains manufacturer's discount rate. The S_START column contains the start date of manufacturer's special pricing. The S_END column contains the end date of manufacturer's special pricing.
TABLE 8
PID PERCENT_OFF S_START S_END
100 2 04/30/1998 05/15/1998
100 5 07/30/1998 08/15/1998
400 10 07/30/1998 08/30/1998
600 5 12/10/1998 12/30/1998
500 15 01/01/1998 02/15/1998
... ... ... ...
A sample query is shown below. The query seeks to determine which products were put on sale during a time period X 220, wherein the time period X 220 occurs outside of the time period Y 222. The time period Y 222 represents the time period in which the store was eligible to receive a manufacturer's rebate. In other words, the query seeks to determine if any portion of a product's retail discount period fell outside the manufacturer's rebate period. SELECT discount.pid, sid, d_start as disc_start, d_end as disc_end, s_start as rebate_start, s_end as rebate_end FROM discount, manu_special WHERE discount.pid=manu_special.pid and ((d_start<s_start) or (d_end>s_end)) The query contains one query blocks. The query block contains a standard join clause based on the product identification number (i.e., the PID column of the Discount and Manu_Special tables). The query block also includes a temporal join clause. Formulating this temporal join clause is difficult because correct date comparison operations must be specified. In this example, the goal is to produce a result that contains discounted products that fell outside the manufacturer's rebate period--that is, any discounts occurring before or after the rebate period. To simplify the discussion of the temporal join clauses, assume "P1" denotes the time period specified by the D_START and D_END dates of the Discount table shown in Table 7, and assume "P2" denotes the time period specified by the S_START and S_END dates of the Manu Special table shown in Table 8. While this query produces the intended result set, many people would experience difficulty formulating this query. In particular, few people are capable of developing the logic and correctly coding the syntax (particularly the date comparison operators) in a timely manner. The WITHIN operator simplifies the above query. Using the WITHIN operator, the above query can be revised as follows: SELECT discount.pid, sid, d_start as disc_start, d_end as disc_end, s_start as rebate_start, s_end as rebate_end FROM discount, manu_special WHERE discount.pid=manu_special.pid and within(d_start, d_end, s_start, s_end)=0 In the revised query, formulating the temporal portion of the query is simple. Namely, the revised query returns those rows that lack the WITHIN condition. Specifying that the query return a "0" or FALSE value produces rows that lack the WITHIN condition. FIGS. 3A and 3B are flow charts illustrating the steps performed by the present invention 124 in accordance with an embodiment of the single function operator system 124. In particular FIG. 3A illustrates the steps performed by the present invention to create a WITHIN operator and FIG. 3B illustrates the steps performed by the present invention to create a SHARES operator. In FIG. 3A, block 300 represents the single function operator system 124 receiving a WITHIN operator. Block 302 represents the single function operator system logically combining the EQUAL, DURING, STARTS, and FINISHES operators into a single operation represented by the WITHIN operator. In FIG. 3B, block 304 represents the single function operator system receiving a SHARES operator. Block 306 represents the single function operator system 124 logically combining the OVERLAP, OVERLAPPED BY, DURING, CONTAINS, STARTS, STARTED BY, FINISHES, FINISHED BY, and EQUALS operators into a single operation represented by the SHARES operator. CONCLUSION This concludes the description of the preferred embodiment of the invention. The following describes some alternative embodiments for accomplishing the present invention. For example, any type of computer, such as a mainframe, minicomputer, or personal computer, or computer configuration, such as a timesharing mainframe, local area network, or standalone personal computer, could be used with the present invention. The foregoing description of the preferred embodiment of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto.
|
Same subclass Same class Consider this |
||||||||||
