Incremental maintenance of materialized views containing one-to-N lossless joins6125360Abstract A method and apparatus are provided for performing incremental refreshes to materialized views defined by one-to-N lossless joins. Each base table of the materialized view is selected to be processed as the current "selected table". The processing of the current selected table varies depending on whether the selected table is the right side table of an outer join. If the selected table is not the right table of an outer join, then the selected table is processed by (1) deleting rows from the materialized view based on rows of the selected table that have been updated or deleted in the selected table during the batch window, and (2) inserting rows into the materialized view based on updates and inserts into the selected table that occurred during the batch window. If the selected table is the right table of an outer join, then changes made to the selected table are processed in a way that reduces the number of changes that have to be made to the materialized view. According to one embodiment of the invention, operations performed during the incremental refresh are performed by issuing database statements (e.g. SQL queries) to a database server. The incremental refresh techniques described herein are "memoryless" in that they do not require a record of the sequence of changes that were made during a batch window. Techniques are described for performing the incremental refresh steps through the use of database commands and queries. Claims What is claimed is: Description FIELD OF THE INVENTION
______________________________________
MV: select
<sel-cols>
from
T1, T2, T3, . . . , Tn
where
<join.sub.-- preds>
______________________________________
In this query definition, T1, T2, . . . , Tn are the base tables of MV. <sel-cols> are the columns in the select clause. <join.sub.-- preds> are the join predicates of the form T1.a=T2.a or T1.a=T2.a(+). Join predicates are in a conjunctive form. The MV stores rowids of the base table columns that are listed in <sel-cols> and the join columns that are listed in <join.sub.-- preds>. The notation MV.Ti.sub.-- rid refers to the column in MV that stores the rowid values from a particular table Ti. The rowid value stored in the rowid column of a given row in the MV is used to uniquely identify the row in Ti that combined to form the given row in the MV. Consequently, any other column or set of columns of Ti that is subject to uniqueness and non-null constraints may be used for this purpose instead of the rowid column. OJGraph(T1) is used to refer to the part of the join of a query Q that is reachable from T1 by either inner joins or left outer joins. For example, given the join S.rarw.L.fwdarw.O.fwdarw.C, OJGraph(O) is equal to O.fwdarw.C. When the same table is not allowed to be the right side of two different left outer joins, the following statement holds true: if there exists a Tj, such that Tj.fwdarw.Ti, then Tj.fwdarw.OJGraph(Ti) is a valid subgraph of the Join-graph of Q and furthermore this graph is unique for a given Ti. The subgraph Tj.fwdarw.OJGraph(Ti) is indicated by the notation POJGraph(Ti). For example, given the join S.rarw.L.fwdarw.O.fwdarw.C, POJGraph(O) is equal to L.fwdarw.O.fwdarw.C. If Ti is not the outer table of any outer join then POJGraph(Ti)=OJGraph(Ti) The notation .about.OJGraph(Ti) is used to refer to the complement of OJGraph(Ti). Thus, .about.OJGraph(Ti) =Join.sub.-- Graph(Q)--OJGraph(Ti). Dlt.sub.-- Tn is a table which contains the rowids of the rows in table Tn which have changed. Dlt.sub.-- Tn is referred to herein as the "delta table" for table Tn. VTn denotes a view on table Tn which produces the rows from Tn which have changed. SCN refers to a System Change Number. A system change number is a logical number assigned to transactions in commit time order. Each change is associated with the scn of the transaction that performed the change. The scn associated with a change indicates the logical time at which the change was made within the database. THE INCREMENTAL REFRESH OPERATION Referring to FIG. 5, it is a flow chart that illustrates the steps involved in an incremental refresh operation according to one embodiment of the invention. Incremental refresh is performed by applying Dlt.sub.-- Ti to MV. According to one embodiment, there is a partial order in application of Dlt.sub.-- Ti. Specifically, if Ti is a right side of an outer join Tj.fwdarw.Ti, then DIt.sub.-- Tj is applied before Dlt.sub.-- Ti. At step 500, the leftmost unprocessed table listed in the join query that defines the materialized view to be refreshed is selected. Thus, if the join query is T1.fwdarw.T2, and neither T1 nor T2 have yet been processed, then T1 is selected during step 500. During the second iteration of step 500, T1 will have already been processed, so T2 will be selected. At step 501, the membership of the rowid set Dlt.sub.-- Ti is determined. The rowid set Dlt.sub.-- Ti includes the rowids of all rows of the selected table that have undergone any change since the last refresh operation. Various techniques may be used to establish Dlt.sub.-- Ti. For the purposes of explanation, it shall be assumed that the database system maintains a "delta table" SNLog.sub.-- Ti that stores, for each changed row of table Ti, (1) the rowid of the changed row, and (2) the scn that indicates the logical time at which the row was changed. Assuming that the last refresh of a materialized view occurred at a time "Last.sub.-- Refresh.sub.-- SCN" and that the current refresh is intended to update the materialized view to reflect the time "Current.sub.-- Start.sub.-- SCN", the membership of Dlt.sub.-- Ti may be established according the database query:
______________________________________
Dlt.sub.-- Ti = select unique rid from SNLog.sub.-- Ti
where
scn > :Last.sub.-- Refresh.sub.-- SCN and
scn < :Current.sub.-- Start.sub.-- SCN;
______________________________________
After the membership of Dlt.sub.-- Ti is established, control passes to step 502. At step 502, the changed rows that still exist in the currently selected table are retrieved. This set of these rows is referred to as VTi. VTi may be established using the database query:
______________________________________
VTi =select Ti.* from Ti, Dlt.sub.-- Ti
where
Ti.rowid = Dlt.sub.-- Ti.rid;
______________________________________
Some changed rows may no longer exist in the selected table at the time the incremental refresh of the materialized view is being performed. Specifically, changed rows include rows that have been deleted, as well as rows that have been updated and inserted. The rowids of deleted rows will be returned in Dlt.sub.-- Ti, but the deleted rows themselves will not be part of VTi, since VTi is generated by executing a query on the selected table after the deleted rows have been removed from the table. After the membership of VTi is established, control passes to step 504. At step 504, it is determined whether the selected table is the right side of an outer join. If the selected table is the right side of an outer join, control passes to step 506. If the selected table is not the right side of an outer join, control passes to step 510. At step 506, the right side of an outer join is processed. The various steps for processing the right side of an outer join are illustrated in FIG. 7. Referring to FIG. 7, steps 700-706 are performed to update MV to reflect deletions that have occurred in the selected table (Ti) during the batch window. Steps 708-714 are performed to update MV to reflect insertions that have been made into the selected table during the batch window. Hence, if no deletions have occurred in Ti during the batch window, steps 700-706 may be skipped. Similarly, if no insertions have occurred in Ti during the batch window, then steps 708-714 may be skipped. In this context, each update constitutes both an insert (of the new values) and a deletion (of the old values). At step 700, three sets of information are generated. The three sets of information are as follows: Tmp.sub.-- mv=a list of rowid combinations <Tj.sub.-- rid, Ti.sub.-- rid> such that Ti.sub.-- rid is the rowid of a row in Ti that has changed, and Tj.sub.-- rid is a rowid of a row in Tj (the left side of the outer join) that combines with Ti.sub.-- rid in MV. For example, assume that a row Ti1 was changed, and that MV had two rows that resulted from rows Tj1 and Tj2 combining with Ti1. Under these circumstances, Tmp.sub.-- mv would contain the rowid combinations <Tj1, Ti1> and <Tj2, Ti1>. Tmp.sub.-- mvcnt.sub.-- rids=a list of tuples<Tj.sub.-- rid, count> such that Tj.sub.-- rid is the rowid of a row in Tj that is combined in MV with any changed row in Ti, and count is the number of rows in MV that result from the combination Tj.sub.-- rid with rows in other tables. For example, assume that MV has three rows that include the value Tj1, and that those three MV rows are the result of Tj1 combining with Ti1, Ti5 and Ti8. Further assume that at least one of Ti1, Ti5 and Ti8 is a changed row. Under these circumstances, Tmp.sub.-- mvcnt.sub.-- rids would contain a tuple<Tj1, 3>. Tmp.sub.-- cnt=a list of three-tuples <Tj.sub.-- rid, dif, count> such that Tj.sub.-- rid is the rowid of a row in Tj that is combined in MV with any changed row in Ti, and count is the number of rows in MV that result from the combination Tj.sub.-- rid with rows in other tables. Dif is the number of rows in MV that include Tj and which are the result of Tj combining with a row in Ti that is not changed. For example, assume that MV has three rows that include the value Tj1, and that those three MV rows are the result of Tj1 combining with Ti1, Ti5 and Ti8. Further assume out of the three rows Ti1, Ti5 and Ti8, only Ti1 is a changed row. Under these circumstances, Tmp.sub.-- mvcnt.sub.-- rids would contain the three-tuple <Tj1, 2, 3>. According to one embodiment of the invention, the three sets of information described above are gathered into temporary tables (Tmp.sub.-- mv, Tmp.sub.-- mvcnt.sub.-- rids, and Tmp.sub.-- cnt) by issuing database statements in SQL. The Tmp.sub.-- mv table is a table that contains the old rows in the MV that need to be acted upon during the incremental refresh. Tmp.sub.-- mv is created based on MV.sub.-- rid, Tj.sub.-- rid, Ti.sub.-- rid, and mv-columns-of (.about.OJGraph(Ti)). Tmp.sub.-- mv may be created according to the statement:
______________________________________
Tmp.sub.-- mv = select rowid, Tj.sub.-- rid, Ti.sub.-- rid from MV
where
Ti.sub.-- rid in (select rid from Dlt.sub.-- Ti);
______________________________________
This statement creates a table named Tmp.sub.-- mv with three columns: rowid, Tj.sub.-- rid and Ti.sub.-- rid. Values for all three columns come from the MV. One row is created in Tmp.sub.-- mv from every row in MV that has a rowid that exists in Dlt.sub.-- Ti. Since Dlt.sub.-- Ti includes the rowids of all rows of the selected table that have undergone any change since the last refresh operation, Tmp.sub.-- mv includes a row for every row of MV that includes values that originated from rows in the selected table that have undergone any change since the last refresh operation. The Tmp.sub.-- mvcnt.sub.-- rids table includes two columns: Tj.sub.-- rid and count. The Tj.sub.-- rid column contains rowids of the table on the left side of the outer join (Tj). The count column indicates how many rows of the selected table are combined with each row of Tj. Tmp.sub.-- mvcnt.sub.-- rids may be created by the statement:
______________________________________
* Create table Tmp.sub.-- mvcnt.sub.-- rids(Tj.sub.-- rid, count)
Tmp.sub.-- mvcnt.sub.-- rids = select Tj.sub.-- rid, count(*) from MV
where
Tj.sub.-- rid in (select Tj.sub.-- rid from Tmp.sub.-- mv) and
Ti.sub.-- rid not null
group by Tj.sub.-- rid;
______________________________________
The Tmp.sub.-- cnt table includes three columns: Tj.sub.-- rid, dif and count. The Tj.sub.-- rid column contains the rowids of table Tj. The dif column contains the difference between the count column of the Tmp.sub.-- mvcnt.sub.-- rids table and the number of rows in the Tmp.sub.-- mv table for a given Tj.sub.-- rid. The count column contains the same values as the count column of Tmp.sub.-- mvcnt.sub.-- rids. Tmp.sub.-- cnt can be created using the statement:
______________________________________
* Create table Tmp.sub.-- cnt(Tj.sub.-- rid, dif, count)
Tmp.sub.-- cnt = select a.Tj.sub.-- rid, (a.count-b.count), a.count
from Tmp.sub.-- mvcnt.sub.-- rids a, (select Tj.sub.-- rid, count(*) from
Tmp.sub.-- mv
group by Tj.sub.-- rid)b
where
a.Tj.sub.-- rid = b.Tj.sub.-- rid;
______________________________________
Control proceeds from step 700 to step 702. During steps 702, 704 and 706, MV is updated such that, for any give Tj that combines with a changed row in Ti, the following holds true: if Tj only combines in MV with changed rows of Ti, then the rows in MV that contain Tj are deleted and replaced with a single row in which selected columns are set to NULL; and if Tj combines in MV with both changed and unchanged rows of Ti, then the rows in MV that result from Tj combining with changed rows are deleted, leaving only the rows in which Tj combines with unchanged rows of Ti. For example, assume that the only rows in MV that include Tj1 are rows: <Tj1, 1, 1, Ti1, 1, 1> <Tj1, 1, 1, Ti2, 1, 2> Further assume that both Ti1 and Ti2 are changed rows. Under these conditions, MV is updated to leave only the row: <Ti1, 1, 1, NULL, NULL, NULL> On the other hand, assume that the only rows in MV that include Tj1 are rows: <Tj1, 1, 1, Ti1, 1, 1> <Tj1, 1, 1, Ti2, 1, 2> <Tj1, 1, 1, Ti3, 1, 3> Further assume that Ti1 and Ti2 are changed rows, but Ti3 has not been changed. Under these conditions, MV is updated to leave only the row: <Tj1, 1, 1, Ti3, 1, 3> According to one embodiment, the update just described is performed in three steps by issuing SQL statements to a database server. Specifically, at step 702, those rows of MV that join with exactly one changed row of Ti (the "update target rows") are updated. Specifically, the columns of MV that are in OJGraph(Ti) are set to NULL in the update target rows. The update target rows can be identified using values from the columns of Tmp.sub.-- cnt. Specifically, the update target rows are identified in the Tjrid column of every row where Tmp.sub.-- cnt.dif=0 and Tmp.sub.-- cnt.count=1. The update can be performed according to the following statement:
______________________________________
update MV
set
column-of(OJGraph(Ti)) = null
where
rowid in (select a.MV.sub.-- rid from Tmp.sub.-- mv a, Tmp.sub.-- cnt b
where
a.Tj.sub.-- rid = b.Tj.sub.-- rid and
b.dif = 0 and
b.count = 1;
);
______________________________________
At step 704, the rows from MV that join with more than one row of Ti are deleted. These rows (the "delete target rows") can be identified using Tmp.sub.-- mv and Tmp.sub.-- cnt. Specifically, a row of MV is a delete target row if the row contains a Tj.rid value that is in Tmp.sub.-- mv that whose Tmp.sub.-- cnt.count number is greater than one. The delete target rows may be deleted using the following statement:
______________________________________
delete from MV
where
rowid in (select a.MV.sub.-- rid from Tmp.sub.-- mv a, Tmp.sub.-- cnt b
where
a.Tj.sub.-- rid = b.Tj.sub.-- rid and
b.count > 1;
);
______________________________________
At step 706, a row with null cols for OJGraph(Ti) is inserted for (1) those rows in MV which join with more than 1 row of Ti and (2) all of which have been deleted in step 704. Assuming that Tj.fwdarw.OJGraph(Ti) is a valid outer join, this insertion can be performed using the statement:
______________________________________
insert into MV
select a.mv-columns-of(.about.OJGraph(Ti)), (mv-columns-of(OJGraph(Ti)=
null))
from Tmp.sub.-- mv a, Tmp.sub.-- cnt b
where
a.Tj.sub.-- rid = b.Tj.sub.-- rid and
b.dif =0 and
b.count > 1;
______________________________________
Control proceeds from step 706 to step 708. During steps 708-714, MV is updated to reflect inserts into the selected table (Ti) that occurred during the batch window. Specifically, Ti is updated such that new rows that result from the combination of Tj with the new rows of Ti are inserted into MV. In addition, MV rows that exist because a particular row of Tj did not previously combine with any row of Ti are removed if that particular row of Tj combines with a changed row of Ti. For example, assume that MV included the row: <Tj1, 1, 1, NULL, NULL, NULL> because Tj1 did not combine with any row of Ti prior to the batch window. Further assume that during the batch window a row <Tj8, 1, 8> was created in Ti (either through an insertion or an update). Under these circumstances, MV will be updated to replace the row <Tj1, 1, 1, NULL, NULL, NULL> with the row <Tj1, 1, 1, Tj8, 1, 8> At step 708, a second initialization phase is performed where three new tables are created. The three tables, New.sub.-- mv, Tmp.sub.-- mv.sub.-- rids, and Tmp.sub.-- newcnt, may be used to perform the operations associated with the subsequent steps 710, 712 and 714. The table New.sub.-- mv consists of all the updated/new rows to be added to MV, and may be created according to the statement:
______________________________________
New.sub.-- mv = select <sel-cols> . . . with all Ti replaced by VTi
from
T1, T2, T3, . . . , Ti, . . . , Tn
where
<join-preds> ; . . . with all Ti replaced by VTi
______________________________________
The table Tmp.sub.-- mv.sub.-- rids contains the rowids of the rows in table Tj that (1) are combined in MV with a row from Ti, and (2) also combine with a changed row of Ti. Tmp.sub.-- mv.sub.-- rids may be created according to the statement:
______________________________________
Tmp.sub.-- mv.sub.-- rids = select Tj.sub.-- rid from MV
where
Tj.sub.-- rid in (select Tj.sub.-- rid from New.sub.-- mv) and
Ti.sub.-- rid not NULL
______________________________________
The table Tmp.sub.-- newcnt specifies the number of times a Tj row combines with new or updated rows of Ti. Tmp.sub.-- newcnt may be created according to the statement:
______________________________________
* Create table Tmp.sub.-- newcnt(Tj.sub.-- rid, count)
Tmp.sub.-- newcnt = select Tj.sub.-- rid, count(*)
from New.sub.-- mv
group by Tj.sub.-- rid;
______________________________________
Control proceeds from step 708 to step 710. At step 710, the rows in MV which join with exactly one changed (updated or inserted) row that currently exists in Ti are updated to replace null values with values that reflect the new values of the changed row. These rows may be identified using the tables created in step 708 based on the expression (Tmp.sub.-- newcnt.count=1 and Tmp.sub.-- newcnt.Tj.sub.-- rid not in Tmp.sub.-- mv.sub.-- rids). Specifically, the update may be performed according to the statement:
______________________________________
update MV
(select
mv-columns-of-OJGraph(Ti), columns-of(OJGraph(Ti))
from
MV, New.sub.-- mv, Tmp.sub.-- newcnt
where
Tmp.sub.-- newcnt.count=1 and
Tmp.sub.-- newcnt.Tj.sub.-- rid not in (select Tj.sub.-- rid from
Tmp.sub.-- mv.sub.-- rids)
and
New.sub.-- mv.Tj.sub.-- rid = Tmp.sub.-- newcnt.Tj.sub.-- rid and
MV.Tj.sub.-- rid = New.sub.-- mv.Tj.sub.-- rid
set
mv-columns-of-OJGraph(Ti) = columns-of(OJGraph(Ti));
______________________________________
At step 712, selected rows are deleted from MV. The rows deleted from MV during step 712 are those rows that are derived from a Tj row that joins with no unchanged row of Ti and more than one changed Ti row. For example, assume that prior to the batch window, row Ti1 did not combine with any row of Tj. Thus, MV would contain the row: <Ti1, 1, 1, NULL, NULL, NULL> Assume that after the batch window, Ti1 combines with two rows Tj1 and Tj5 in Tj. Under these conditions, the row <Ti1, 1, 1, NULL, NULL, NULL> will be deleted during step 712. The rows to be deleted during step 712 will currently have null Ti.sub.-- rid values. The rows to be deleted in this step may be identified using the tables created in step 708 based on the expression (Tmp.sub.-- newcnt.count>1 and Tmp.sub.-- newcnt.Tj.sub.-- rid not in Tmp.sub.-- mv.sub.-- rids). Specifically, the deletion may be performed according to the statement:
______________________________________
delete from MV
where
MV.Tj.sub.-- rid in (select Tj.sub.-- rid from Tmp.sub.-- newcnt
where
Tmp.sub.-- newcnt.count>1 and
Tmp.sub.-- newcnt.count.Tj.sub.-- rid not in
(select Tj.sub.-- rid from Tmp.sub.-- mv.sub.-- rids)
);
______________________________________
During step 714, the new rows from New.sub.-- mv which have not already been updated in step 710 are inserted into MV. The rows to be inserted may be identified using the expression Tmp.sub.-- newcnt.count>1. Specifically, the insertion may be performed according to the statement:
______________________________________
insert into MV
select New.sub.-- mv.* from New.sub.-- mv, Tmp.sub.-- newcnt
where
New.sub.-- mv.Tj.sub.-- rid= Tmp.sub.-- newcnt.Tj.sub.-- rid and
Tmp.sub.-- newcnt.count > 1;
______________________________________
Upon completion of step 714, the processing of the right side of the outer join (step 506) is completed and control passes to step 514. At step 514, it is determined whether any base tables of the MV have not yet been processed. If any tables have not yet been processed, control returns to step 500 and the next unprocessed base table is processed. If all base tables have been processed, control proceeds to step 516. As mentioned above, control proceeds from step 504 to step 510 if the selected table is not a right side of an outer join. Therefore, control proceeds to step 510 when Ti is a left side on an outer join Ti.fwdarw.Tj or is either side of an inner join Ti><Tj. At step 510, rows in the materialized view that are derived from rows in the selected table that have been changed are removed from the materialized view. This removal can be performed according to the statement: delete MV where MV.Ti.sub.-- rid in (select rid from Dlt.sub.-- Ti); Control proceeds from step 510 to step 512. At step 512, the rows that were removed in step 510 are recalculated. This recalculation may be performed according to the following statement:
______________________________________
insert into MV
select
<sel-cols> . . . with all Ti replaced by VTi
from
VTi, T2, T3, . . . , Tn
where
<join-preds>; . . . with all Ti replaced by VTi
______________________________________
Control proceeds from step 512 to step 514. At step 514, it is determined whether there are any unprocessed base tables of the materialized view. If any base tables have not yet been processed, then control returns to step 500. Otherwise, control proceeds to step 516, and the incremental refresh process is complete. EXAMPLE For the purpose of explanation, an exemplary application of the incremental maintenance process set forth above shall be given with reference to tables X and Y, illustrated in FIG. 4a. Referring to FIG. 4a, table X includes two columns x.a and x.b and a pseudo-column "X.rowid". The pseudo-column contains a unique identifier for each row in table X. The pseudo-column may, for example, simply represent the addresses of the actual storage locations at which the rows are stored. Similar to table X, table Y includes two columns y.a and y.b and a pseudo-column Y.rowid. FIG. 4a represents the tables X and Y at a particular point in time (T1). At time T1, table X has five rows, with corresponding rowids of x1 to x5, and table Y has four rows, with corresponding rowids y1 to y4. FIG. 4b illustrates a materialized join view MV 400 created by performing a join on tables X and Y. The defining query for MV 400 is: QUERY3 SELECT * FROM X, Y WHERE x.a(+)=y.a; The MV 400 illustrated in FIG. 4b is current as of time T1. Therefore, it accurately represents the results produced by executing Query3 on tables X and Y at time T1. MV 400 includes the columns of the base tables X and Y, as well as the pseudo-columns X.rowid and Y.rowid. Query3 is an outer join. Consequently, the join produced by applying the query to tables X and Y are lossless (all rows of table X are reflected in MV 400). In addition, rows of X are allowed to combine with more than one row of Y. For example, row X2 combines with rows Y2 and Y4 of table Y. Therefore, the join that created MV 400 is also a one-to-N join. FIG. 4c illustrates changes made to tables X and Y between time T1 and a later time T2. Various mechanisms may be employed to identify these changes at the time an incremental refresh is to be performed. For example, snapshot logs may be used to record the changes that have been made to base tables after the most recent MV refresh. The present invention is not limited to any particular mechanism for recording changes to base tables. In the illustrated example, five changes were made to table X and four changes were made to table Y between T1 and T2. The changes to table X include two insert operations, two delete operations, and one update operation. The changes to table Y include one insert operation, one delete operation, and two update operations. FIG. 6 illustrates tables X and Y at time T2 after the changes have been made. At time T2, the technique described above is used to incrementally refresh MV 400 to accurately reflect the state of tables X and Y at time T2. The sequence of the incremental refresh would proceed as follows: SELECTED TABLE=TABLE X At step 500, the leftmost unprocessed table listed in the join query that defines the materialized view to be refreshed is selected. In the present example, none of the tables have been processed, and table X is the leftmost table in the join query, therefore table X is selected. At step 501, the membership of Dlt.sub.-- X is determined. Dlt.sub.-- X is the set of rowids of the rows that have been changed in table X since the last refresh of MV 400. For the purposes of explanation, it shall be assumed that the database system maintains a "delta table" SNLog.sub.-- X that stores, for each changed row of table X, (1) the rowid of the changed row, and (2) the scn that indicates the logical time at which the row was changed. The rowids of the changed rows of table X may be retrieved using the database query:
______________________________________
Dlt.sub.-- X = select unique rid from SNLog.sub.-- X
where
scn > :T1 and
scn < :T2;
______________________________________
In this query, scn stands for the system change number associated with changes that are recorded in SNLog.sub.-- X. As mentioned above, the system change number is a logical number assigned to transactions in commit time order, and which applies to all changes made by those transactions. Executing this query returns a set of rowids Dlt.sub.-- X whose membership consists of the rowids X6, X3, X7, and X2. Once the rowids of the changed rows in table X are identified, control proceeds to step 502. In step 502, VX is calculated. VX is the set of changed rows of table X that still exist in table X. VX may be calculated using the database query:
______________________________________
VX =select X.*from X, Dlt.sub.-- X
where
X.rowid = Dlt.sub.-- X.rid;
______________________________________
This query, executed against the state of table X at time T2 (FIG. 6), returns two rows: <X3, 7, 0> and <X6, 5, 1>. No rows are returned for the rowids X7 and X2 because those rows do not exist in table X at time T2. Therefore, at the end of step 502, Dlt.sub.-- X=X2, X3, X6 and X7 VX=<X3, 7, 0> and <X6, 5, 1> After VX is generated, control passes to step 504. At step 504, it is determined whether table X is the right side of an outer join. Table X is not the right side of an outer join, so control passes to step 510. At step 510, rows in the materialized view that are derived from rows in table X that have been changed are removed from the materialized view 400. This removal can be performed according to the statement: delete MV where MV.X.sub.-- rid in (select rid from Dlt.sub.-- X); This delete statement causes the rows in MV 400 that have X2, X3, X6 or X7 in column X.sub.-- rid to be deleted. The deletion of these rows from MV 400 leaves three rows remaining in MV 400. The three rows remaining in MV 400 after this deletion are: <R1, X1, 1, 1, Y1, 1, 0> <R4, X4, 4, 4, Y3, 4, 1> <R5, X5, 9, 0, NULL, NULL, NULL> Control proceeds from step 510 to step 512. At step 512, the rows that were removed from MV 400 in step 510 are recalculated. To perform this recalculation, the query that defines MV is re-executed after replacing references to table X with references to the view VX. Since the view VX will typically contain many orders of magnitude fewer rows than table X, execution of the join using VX will typically involve a much lower overhead than would be required to perform a total refresh. This recalculation may be performed according to the following statement:
______________________________________
insert into MV
select
*
from
VX, Y
where VX.a(+)=y.a
______________________________________
The two rows in VX, namely <X3, 7, 0> and <X6, 5, 1>, do not have values in column VX.a that combine with any values in column y.a of Table Y. Therefore, since the join is an outer join, the rows in VX are combined with Null rows to produce rows<X3, 7, 0, NULL, NULL, NULL> and <X6, 5, 1, NULL, NULL, NULL>. These newly produced rows are inserted into MV 400 by the insert statement specified above. After step 512, MV 400 has the following five rows: <R1, X1, 1, 1, Y1, 1, 0> <R4, X4, 4, 4, Y3, 4, 1> <R5, X5, 9, 0, NULL, NULL, NULL> <R7, X3, 7, 0, NULL, NULL, NULL> and <R8, X6, 5, 1, NULL, NULL, NULL> Control proceeds from step 512 to step 514. At step 514, it is determined whether there are any unprocessed base tables for MV 400. In the present example, MV 400 has two base tables X and Y. Table X has been processed but table Y has not yet been processed. Control therefore returns to step 500. SELECTED TABLE=TABLE Y At step 500, the leftmost unprocessed table listed in the join query that defines the materialized view to be refreshed is selected. In the present example, the only unprocessed table is table Y. Therefore table Y is selected. At step 501, the membership of Dlt.sub.-- Y is determined. Dlt.sub.-- Y is the set of rowids of the rows that have been changed in table Y since the last refresh of MV 400. The rowids of the changed rows of table Y may be retrieved using the database query:
______________________________________
Dlt.sub.-- Y = select unique rid from SNLog.sub.-- Y
where
scn > :T1 and
scn < :T2;
______________________________________
Executing this query returns a set of rowids Dlt.sub.-- Y that includes Y1, Y3, Y4 and Y5. Once the rowids of the changed rows in table Y are identified, control proceeds to step 502. At step 502, the set of still-existing changed rows of table Y (VY) is calculated using the database query:
______________________________________
VY =select Y.*from Y, Dlt.sub.-- Y
where
Y.rowid = Dlt.sub.-- Y.rid;
______________________________________
This query, executed against the state of table Y at time T2 (FIG. 6), returns three rows: <Y1, 9, 0>, <Y4, 9, 3> and <Y5, 3, 1>. No rows are returned for the rowid Y3 because the row that had rowid Y3 does not exist in table Y at time T2. Therefore, at the end of step 502, Dlt.sub.-- Y=Y1, Y3, Y4 and Y5, and VY=<Y1, 9, 0>, <Y4, 9, 3> and <Y5, 3, 1> After VY is generated, control passes to step 504. At step 504, it is determined whether table Y is the right side of an outer join. Table Y is the right side of an outer join, so control passes to step 506. At step 506, table Y is processed. The various steps for processing table Y are illustrated in FIG. 7. Referring to FIG. 7, three temporary tables (Tmp.sub.-- mv, Tmp.sub.-- mvcnt.sub.-- rids, and Tmp.sub.-- cnt) are initialized. The Tmp.sub.-- mv table is a table that contains the old rows in the MV that need to be acted upon during the incremental refresh. Tmp.sub.-- mv is created based on MV.sub.-- rid, X.sub.-- rid, Y.sub.-- rid, and mv-columns-of (.about.OJGraph(Y)). Tmp.sub.-- mv may be created according to the statement: Tmp.sub.-- mv=select rowid, X.sub.-- rid, Y.sub.-- rid from MV where Y.sub.-- rid in (select rid from Dlt.sub.-- Y); At this point in the incremental refresh process, MV 400 contains the rows: <R1, X1, 1, 1, Y1, 1, 0> <R4, X4, 4, 4, Y3, 4, 1> <R5, X5, 9, 0, NULL, NULL, NULL> <R7, X3, 7, 0, NULL, NULL, NULL> and <R8, X6, 5, 1, NULL, NULL, NULL> Therefore, the select statement specified above would create a Tmp.sub.-- mv table with the two rows: <R1, X1, Y1> <R4, X4, Y3> As mentioned above, one row is created in Tmp.sub.-- mv from every row in MV that has a rid that exists in Dlt.sub.-- Y. Specifically, Tmp.sub.-- mv includes a row for every row of MV that includes values that originated from rows in table Y that have undergone any change since the last refresh operation. Rows R1 and R4 contain values from Y1 and Y3, respectively, and Y1 and Y3 have undergone changes. Therefore, Tmp.sub.-- mv includes one row for each of rows R1 and R4 of MV 400. The Tmp.sub.-- mvcnt.sub.-- rids table is created by the statement:
______________________________________
* Create table Tmp.sub.-- mvcnt.sub.-- rids(X.sub.-- rid, count)
Tmp.sub.-- mvcnt.sub.-- rids = select X.sub.-- rid, count(*) from MV
where
X.sub.-- rid in (select X.sub.-- rid from Tmp.sub.-- mv)and
Y.sub.-- rid not null
group by X.sub.-- rid;
______________________________________
This statement produces one row for every row of MV that has an X.sub.-- rid value in Tmp.sub.-- mv and that does not have a NULL Y.sub.-- rid value. Once these rows are produced, the rows that have the same X.sub.-- rid value are combined to form a single row whose "count" value indicates how many rows were combined to form the row. At this point in the incremental refresh process, MV 400 is: <R1, X1, 1, 1, Y1, 1, 0> <R4, X4, 4, 4, Y3, 4, 1> <R5, X5, 9, 0, NULL, NULL, NULL> <R7, X3, 7, 0, NULL, NULL, NULL> and <R8, X6, 5, 1, NULL, NULL, NULL> and Tmp.sub.-- mv table is: <R1, X1, Y1> <R4, X4, Y3> The rows in MV that have an X.sub.-- rid value in Tmp.sub.-- mv and that do not have null Y.sub.-- rid values are: <R1, X1, 1, 1, Y1, 1, 0> <R4, X4, 4, 4, Y3, 4, 1> Performing the group and count operations results in a Tmp.sub.-- mvcnt.sub.-- rids table with rows: <X1, 1> <X4, 1> The Tmp.sub.-- cnt table is created using the statement:
______________________________________
* Create table Tmp.sub.-- cnt(X(.sub.-- rid, dif, count)
Tmp.sub.-- cnt = select a.X.sub.-- rid, (a.count-b.count), a.count
from Tmp.sub.-- mvcnt.sub.-- rids a, (select X.sub.-- rid, count(*)
from Tmp.sub.-- mv
group by X.sub.-- rid) b
where
a.X.sub.-- rid = b.X.sub.-- rid;
______________________________________
The Tmp.sub.-- cnt table includes three columns: X.sub.-- rid, dif, and count. The X.sub.-- rid column contains the rowids of table X that are in both Tmp.sub.-- mvcnt.sub.-- rids and Tmp.sub.-- mv. In the present example, the X.sub.-- rid values that qualify are X1 and X4. The dif column contains the difference between the count column of the Tmp.sub.-- mvcnt.sub.-- rids table and the number of rows in the Tmp.sub.-- mv table for a given X.sub.-- rid. The count column contains the same values as the count column of Tmp.sub.-- mvcnt.sub.-- rids. Thus, the calculation of Tmp.sub.-- cnt results in: <X1, 0, 1> <X4, 0, 1> Control proceeds from step 700 to step 702. During steps 702, 704 and 706, MV 400 is updated such that, for any give X that combines with a changed row in Y, the following holds true: if X only combines in MV 400 with changed rows of Y, then the rows in MV 400 that contain X are deleted and replaced with a single row in which selected columns are set to NULL; and if X combines in MV 400 with both changed and unchanged rows of Y, then the rows in MV 400 that result from X combining with changed rows are deleted, leaving only the rows in which X combines with unchanged rows of Y. At step 702, those rows of MV that join with exactly one changed row of Y (the "update target rows") are updated. Specifically, the columns of MV that are in OJGraph(Y) are set to NULL in the update target rows. The update target rows can be identified using values from the columns of Tmp.sub.-- cnt. Specifically, the update target rows are identified in the Xrid column of every row where Tmp.sub.-- cnt.dif=0 and Tmp.sub.-- cnt.count=1. The update can be performed according to the following statement:
______________________________________
update MV
set
columns-of(OjGraph(Y)) = null
where
rowid in (select a.MV.sub.-- rid from Tmp.sub.-- mv a, Tmp.sub.-- cnt b
where
a.X.sub.-- rid = b.X.sub.-- rid and
b.dif = 0
and
b.count = 1;
);
______________________________________
In the present example, both rows of Tmp.sub.-- cnt have a zero in the dif column and a one in the count column. The X.sub.-- rid values in those rows are X1 and X4. Therefore, the rows of MV with X.sub.-- rid values of X1 and X4 are update target rows. By executing the update statement, the update target rows: <R1, X1, 1, 1, Y1, 1, 0> <R4, X4, 4, 4, Y3, 4, 1> by setting to Null the columns that belong to OJGraph(Y). The resulting columns thus produced are: <R1, X1, 1, 1, NULL, NULL, NULL> <R4, X4, 4, 4, NULL, NULL, NULL> After this update, MV 400 includes the rows: <R1, X1, 1, 1, NULL, NULL, NULL> <R4, X4, 4, 4, NULL, NULL, NULL> <R5, X5, 9, 0, NULL, NULL, NULL> <R7, X3, 7, 0, NULL, NULL, NULL> and <R8, X6, 5, 1, NULL, NULL, NULL> At step 704, the rows from MV that join with more than one row of Y are deleted. These rows (the "delete target rows") can be identified using Tmp.sub.-- mv and Tmp.sub.-- cnt. Specifically, a row of MV is a delete target row if the row contains a X.rid value that is in Tmp.sub.-- mv whose Tmp.sub.-- cnt.count number is greater than one. The delete target rows may be deleted using the following statement:
______________________________________
delete from MV
where
rowid in (select a.MV.sub.-- rid from Tmp.sub.-- mv a, Tmp.sub.-- cnt b
where
a.X.sub.-- rid = b.X.sub.-- rid and
b.count > 1;
);
______________________________________
In the present example, the Tmp.sub.-- cnt table does not have any rows with a value greater than one in the count column. Therefore, there are no delete target rows. At step 706, a row with null cols for OJGraph(Y) is inserted for (1) those rows in MV which join with more than 1 row of Y and (2) all rows which have been deleted in step 704. Assuming that X.fwdarw.OJGraph(Y) is a valid outer join, this insertion can be performed using the statement:
______________________________________
insert into MV
select a.mv-columns-of(.about.OJGraph(Y)),
(mv-columns-of(OJGraph(Y)=null))
from Tmp.sub.-- mv a, Tmp.sub.-- cnt b
where
a.X.sub.-- rid = b.X.sub.-- rid and
b.dif = 0 and
b.count > 1;
______________________________________
In the present example, no rows in MV join with more than 1 row of Y, and no rows were deleted in step 704. Therefore no rows are inserted during step 706. At step 708, a second initialization phase is performed where three new tables are created. The three tables, New.sub.-- mv, Tmp.sub.-- mv.sub.-- rids, and Tmp.sub.-- newcnt, may be used to perform the operations associated with the subsequent steps 710, 712 and 714. The table New.sub.-- mv consists of all the updated/new rows to be added to MV, and may be created according to the statement: New.sub.-- mv=select SELECT * FROM X, VY WHERE x.a(+)=VY.a; The New.sub.-- mv table resulting from this join includes the rows: <X1, 1, 1, NULL, NULL, NULL> <X3, 7, 0, NULL, NULL, NULL> <X4, 4,4, NULL, NULL, NULL> <X5, 9, 0, Y1, 9, 0> <X5, 9, 0, Y4, 9, 3> <X6, 5, 1, NULL, NULL NULL> The table Tmp.sub.-- mv.sub.-- rids may be created according to the statement:
______________________________________
Tmp.sub.-- mv.sub.-- rids = select X.sub.-- rid from MV
where
X.sub.-- rid in (select X.sub.-- rid from New.sub.-- mv) and
Y.sub.-- rid not NULL
______________________________________
At this point in time, MV 400 contains the rows: <R1, X1, 1, 1, NULL, NULL, NULL> <R4, X4, 4, 4, NULL, NULL, NULL> <R5, X5, 9, 0, NULL, NULL, NULL> <R7, X3, 7, 0, NULL, NULL, NULL> and <R8, X6, 5, 1, NULL, NULL, NULL> The Y.sub.-- rid value in all of these rows is NULL, so no rows are selected. Therefore Tmp.sub.-- mv.sub.-- rids=empty. The table Tmp.sub.-- newcnt may be created according to the statement:
______________________________________
* Create table Tmp.sub.-- newcnt(X.sub.-- rid, count) gives the number of
times
a X row is in New.sub.-- mv.
Tmp.sub.-- newcnt = select X.sub.-- rid, count(*)
from New.sub.-- mv
group by X.sub.-- rid;
______________________________________
Executing this query creates a Tmp.sub.-- newcnt with the rows: <X1, 1> <X3, 1> <X4, 1> <X5, 2> <X6, 1> At step 710, the rows in MV which join with exactly one row in Y are updated. These rows may be identified using the tables created in step 708 based on the expression (Tmp.sub.-- newcnt.count=1 and Tmp.sub.-- newcnt.X.sub.-- rid not in Tmp.sub.-- mv.sub.-- rids). Specifically, the update may be performed according to the statement:
______________________________________
update MV
(select
mv-columns-of-OJGraph(Y), columns-of(OJGraph(Y))
from
MV, New.sub.-- mv, Tmp.sub.-- newcnt
where
Tmp.sub.-- newcnt.count=1 and
Tmp.sub.-- newcnt.X.sub.-- rid not in (select X.sub.-- rid from
Tmp.sub.-- mv.sub.-- rids) and
New.sub.-- mv.X.sub.-- rid = Tmp.sub.-- newcnt.X.sub.-- rid and
MV.X.sub.-- rid = New.sub.-- mv.X.sub.-- rid
set
mv-columns-of-OJGraph(Y) = columns-of(OJGraph(Y));
______________________________________
The rowids X1, X3, X4 and X6 are from the rows in Tmp.sub.-- newcnt that have a count of 1. None of these rowids are in Tmp.sub.-- mv.sub.-- rids. The rows of MV that are associated with these rowids, which are selected for update, include: <R1, X1, 1, 1, NULL, NULL, NULL> <R4, X4, 4, 4, NULL, NULL, NULL> After the update, MV includes the rows: <R1, X1, 1, 1, NULL, NULL, NULL> <R4, X4, 4, 4, NULL, NULL, NULL> <R5, X5, 9, 0, NULL, NULL, NULL> <R7, X3, 7, 0, NULL, NULL, NULL> and <R8, X6, 5, 1, NULL, NULL, NULL> At step 712, the rows in MV for which a X row will join with more than one Y row are deleted. These rows will currently have null Y.sub.-- rids. The rows to be deleted in this step may be identified using the tables created in step 708 based on the expression (Tmp.sub.-- newcnt.count>1 and Tmp.sub.-- newcnt.X.sub.-- rid not in Tmp.sub.-- mv.sub.-- rids). Specifically, the deletion may be performed according to the statement:
______________________________________
delete from MV
where
MV.X.sub.-- rid in (select X.sub.-- rid from Tmp.sub.-- newcnt
where
Tmp.sub.-- newcnt.count>1 and
Tmp.sub.-- newcnt.count.X.sub.-- rid not in
(select X.sub.-- rid from Tmp.sub.-- mv.sub.-- rids)
);
______________________________________
The rowid X5 is in a row in Tmp.sub.-- newcnt that has a count greater than 1. X5 is not in Tmp.sub.-- mv.sub.-- rids. Therefore, all rows of MV that include X5 in the X.sub.-- rid column are deleted. After this deletion, MV 400 includes the rows: <R1, X1, 1, 1, NULL, NULL, NULL> <R4, X4, 4, 4, NULL, NULL, NULL> <R7, X3, 7, 0, NULL, NULL, NULL> and <R8, X6, 5, 1, NULL, NULL, NULL> During step 714, the new rows from New.sub.-- mv which have not already been updated in step 710 are inserted into MV. The rows to be inserted may be identified using the expression Tmp.sub.-- newcnt.count>1. Specifically, the insertion may be performed according to the statement:
______________________________________
insert into MV
select New.sub.-- mv.* from New.sub.-- mv, Tmp.sub.-- newcnt
where
New.sub.-- mv.X.sub.-- rid = Tmp.sub.-- newcnt.X.sub.-- rid and
Tmp.sub.-- newcnt.count > 1;
______________________________________
In the present example, New.sub.-- mv includes two rows with X5. <X5, 9, 0, Y1, 9, 0> <X5, 9, 0, Y4, 9, 3> In Tmp.sub.-- newcnt, X5 is associated with a count value that is greater than 1. Therefore, the two rows of New.sub.-- mv that include the X5 rowid are inserted into MV 400. After this insertion, MV 400 includes the rows: <R1, X1, 1, 1, NULL, NULL, NULL> <R4, X4, 4, 4, NULL, NULL, NULL> <R7, X3, 7, 0, NULL, NULL, NULL> and <R8, X6, 5, 1, NULL, NULL, NULL> <R10, X5, 9, 0, Y1, 9, 0> <R9, X5, 9, 0, Y4, 9, 3> Upon completion of step 714, the processing of the right side of the outer join (step 506) is completed and control passes to step 514. At step 514, it is determined whether any base table has not yet been processed. In the present example, all of the base tables have been processed, so the incremental refresh operation is compete. FIG. 6B illustrates MV 400 after the completion of the incremental refresh process. The contents of MV 400 in FIG. 6B is identical to the results produced by executing the join query definition of MV 400 on the base tables X and Y at time T2, as shown in FIG. 6A. However, the cumulative overhead incurred in performing the incremental refresh using the techniques described herein will typically be significantly less than the overhead incurred by a total refresh operation. The sequence of steps illustrated in FIG. 5 are merely exemplary. The sequence of various steps may be altered without altering the outcome of the incremental refresh process. For example, step 502 may be performed between steps 510 and 512. This and various other alterations would be apparent to one skilled in the art. Consequently, the present invention is not limited to any particular sequence of steps for performing incremental refresh of a materialized join view. Under certain conditions, certain steps in the incremental refresh process described above may be skipped without affecting the result of the refresh, thus increasing the speed and decreasing the overhead of the incremental refresh. For example, if, since the last refresh, the only changes made to a base table were inserts, then steps 700, 702, 704, 706 and 510 may be skipped when that base table is being processed. Similarly, if since last refresh, the only changes made to a base table were deletes or drop partitions, then steps 708, 710, 712, 714 and 512 may be skipped when that base table is being processed. According to one embodiment of the invention, the rowids stored in the MV indicate the partition to which the row belongs. In such an embodiment, incremental refresh performance is improved when the only change since the last refresh is that a partition was dropped. Under these conditions, the rows in MV that include values from the dropped partition (the "rows of interest") are identified by inspecting the rowids in the MV. For inner joins, the rows of interest are deleted. For outer joins, some of the columns of the rows of interest are set to NULLs. In addition, if MV is partitioned in a way that parallels the partitioning of T, the same partition from MV could simply be dropped for inner joins. In both cases, no join to master tables are required. Significantly, the incremental refresh technique described above is "memoryless". That is, it does not require information about the order of updates to the base tables. Consequently, the overhead associated with maintaining such sequencing information is avoided. In addition, the technique is idempotent in that performing an incremental refresh N times on the same data using the techniques described herein yields the same results a single incremental refresh on the data. This property is valuable when, for example, the system crashes during an incremental refresh operation. After the crash, the incremental refresh operation may be restarted from the beginning without taking into account how far the operation had progressed prior to the crash. In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
|
Same subclass Same class Consider this |
||||||||||
