Apparatus and method for capturing and propagating changes from an operational database to data marts6032158Abstract A method for updating a target table of a data mart in response to changes made by a transaction to data stored in a source table of an operational database. Data that was changed in the source table by the transaction is stored in a dynamic image table of a change capture database. Data that was not changed in the source table by the transaction, but which is nevertheless required to be mapped to the target table, is stored in a static image table of the change capture database. The change capture database also contains relevant information regarding the transaction. Once the dynamic and static image tables are properly staged, the changes are propagated from the change capture database to the target tables of the data marts. In other words, data is extracted from the change capture database and subsequently transformed and loaded, thereby minimizing the impact to the operational database. Thereupon, the tables of the change capture database are truncated to discard data which is now no longer needed. Claims What is claimed is: Description FIELD OF THE INVENTION
TABLE 1
______________________________________
DB.sub.-- SERVER.sub.-- ID NUMBER
//The unique id of the operational
//database.
//Also known as the instance id.
X.sub.-- TID
VARCHAR(57) //The external transaction id (from
//the operational database
I.sub.-- TID.sub.-- HIGH
NUMBER //The high word of the internal
//transaction id
I.sub.-- TID.sub.-- LOW
NUMBER //The low word of the internal
//transaction id
START.sub.-- LSN
VARCHAR(65) //The Isn of the begin transaction
//log record
END.sub.-- LSN
VARCHAR(65) //The of the commit transaction log
//record
BEGIN.sub.-- TIME
DATETIME //The time when the transaction
//started
END.sub.-- TIME
DATETIME //The time when the transaction
//resolved
TRAN.sub.-- STATE
NUMBER //The state of the transaction
//1.fwdarw.started
//2.fwdarw.committed for image builder
//3.fwdarw.committed for pmserver
______________________________________
The server and database id's are used to separate change data streams coming from multiple different operational databases into the same CDC database. The transaction id is a unique number that is assigned internally for preserving the transaction order when changes are propagated to the target data mart. The id of the first log record and the last log record are kept with every transaction. The log record id's are commonly referred to as log sequence numbers (LSNs). They are used to handle rollback commands as may be generated by the operational database. The transaction begin and resolve times are maintained with every transaction in order to support a change propagation based on a time filter (e.g., "Get all changes as of Sunday, 12:00 p.m."). Lastly, the transaction state tracks whether the transaction commits or aborts. This ensures that changes from uncommitted transactions will never be propagated to the target data mart. The log record table 302 contains the unique id of the operational database; the internal transaction id; the source table id; and the dynamic image table name of the source table. Table 2 below shows the fields for an exemplary log record table.
TABLE 2
______________________________________
DB.sub.-- SERVER.sub.-- ID
NUMBER //The unique id of the
//operational database.
//Also known as the
//instance id.
I.sub.-- TID.sub.-- HIGH
NUMBER //The high word in the
//transaction id
I.sub.-- TID.sub.-- LOW
NUMBER //The low word in the
//transaction id
SOURCE.sub.-- TABLE.sub.-- ID
NUMBER //The source table id
IMAGE.sub.-- TABLE
VARCHAR(20) //The dynamic image table
//name of the source table.
______________________________________
The actual changes to the source tables of an operational database are stored in one or more image tables 303-306. For every source table in an operational database from which changes are to be captured, two image tables (one dynamic and one static) are created and maintained inside the CDC database 204. For example, FIG. 3 shows two sets of image tables (303-304 and 305-306). Dynamic image table 303 and static image table 304 correspond to one source table, whereas dynamic image table 305 an static image table 306 correspond to another, second source table. Image tables 303-306 are identical to the source tables, except that they contain additional information used by the CDC process. Dynamic image tables 303 and 305 are used to store the actual changes made to their corresponding operational source tables. Additionally, dynamic image tables 303 and 305 contain the following information: the internal transaction id that made the change; the LSN of the log record containing the change; a flag indicating whether the changed value (e.g., the row value) in the operational table is before the change took place or after the change took place; and a flag indicating whether the change in the operational table was an insert, delete or update row. Table 3 below shows the fields of an exemplary dynamic image table.
TABLE 3
______________________________________
OPB.sub.-- I.sub.-- TID.sub.-- HIGH
NUMBER //The high word of the
//internal transaction id
OPB.sub.-- I.sub.-- TID.sub.-- LOW
NUMBER //The low word of the
//internal transaction id
OPB.sub.-- I.sub.-- LSN
VARCHAR(65) //The log sequence num-
//ber of the log record
OPB.sub.-- CC.sub.-- IS.sub.-- B4.sub.-- IMG
NUMBER //Flag indicating if value
//in the row is the value
//before the change or
//after the change.
//1.fwdarw.before image value
//0.fwdarw.after image value
OPB.sub.-- CC.sub.-- ROW.sub.-- TYPE
NUMBER //The type of the log
//record
//4.fwdarw.insert log record
//5.fwdarw.delete log record
//6.fwdarw.update log record
______________________________________
The internal transaction id from the dynamic image table is used to sequence the transaction in accordance with their original order in the operational database. This ensures data consistency in the target tables after the changes are loaded in the target tables. The LSN from the dynamic image table is used to sequence multiple changes within a transaction and to preserve the data consistency in the target tables. The CDC process also allow changes from insert, delete or updates to be captured and propagated to the target data mart. Hence, every row in the dynamic image table is tagged by the type of change. This allows the extracting, transform, and load process to take appropriate actions in the target table when propagating the changes. When applying the changes from an update operation the engine removes the effect of the before image value and applies the after image value. Consequently, when the change type is update, the dynamic image table contains two rows--one row for the before image and one row for the after image. In contrast, static image tables 304 and 306 are used to store the historical information from their corresponding operational source tables. Such historical information is needed by the extracting, transform and load process to join the changes with historical data. The internal transaction id is also contained in the static image tables. The contents of the static image tables must be synchronized with that of the operational tables. When rows are deleted from the operational source table, the change is recorded in the dynamic image tables. However, that row cannot be deleted from the static image table until all the changes that occurred before the delete are processed. In implementation, the transaction id in the static image table is set to the transaction id of the transaction that deleted the row from the operational tables. If the same row is inserted in the operational database, the transaction id is set to zero. Otherwise, when all the changes prior to the delete are processed, the row with the non-zero transaction id is deleted from the static image table. The main reason to maintain the static image tables 304 and 306 is to reduce the impact on the operational database. The content of the static image tables are used to join with zero or more static image tables and exactly one dynamic image table. The capture process pulls historical data from the operational database. This happens when a change occurs to a primary source table and the associated data in the static image table of non-primary source table is not present. An in-memory queue (within the extract, transform, and load process) of in-flight transactions is maintained to improve the throughput of the system. When a transaction starts (e.g., a log record describing a new transaction is started), this information is recorded in the transaction table as well as in the in-memory queue of in-flight transactions. Upon receiving a change (e.g., a log record describing the change), the in-memory queue is consulted. If the table modified by this transaction is already present in the transaction queue (e.g., already being modified by the same transaction), then no row is added to the log record. Otherwise, the in-memory queue as well as the log record table is updated to record the table being modified by the transaction. When the transaction is resolved, the entry is removed from the in-memory queue. All transaction related lookup are performed on the in-memory queue. Thereby, a lookup in the database table is not necessary. Each element in the in-memory queue contains the following information: external transaction id, internal transaction id, source table id, dynamic image table id, and the transaction begin time. The functions of the CDC process are now described. Referring to FIG. 4, a diagram showing the interactions of the CDC process is shown. Two operational databases 401 and 402 are shown. A log transfer manager 403 scans the server log and forwards the changes of operational database 401 to the capture process. The capture process 404 is responsible for determining the source table that has been modified and the actual changes that had occurred in that particular source table. The changes are then staged in the dynamic image tables of CDC database 410. When the transaction responsible for the changes in the operational table is committed, the capture process 403 marks all the changes made by the transaction ready for the builder process 405. Thereupon, capture process 406 notifies builder process 405 to process these changes. Builder process 405 only processes the changes corresponding to those transactions which have been committed in the operational database. Upon finding the changes in the dynamic image table belonging to the committed transaction, builder process 405 ensures that the related data in the static image tables of all non-primary source tables is present. If the related data is not present, such data will be pulled out from the operational database and stored in the static image table. The list of such non-primary source tables is determined by inspecting every mapping in which the source table is used as a primary source table. If the source table whose changes are found in the dynamic image table is also used as a non-primary source table in other mappings, then the builder process 405 folds the changes in from the dynamic image table to its static image table. If the table which changed is only used as a non-primary source, then the changes are folded from the dynamic to static image tables, and the changes are then removed from the dynamic table. At this point, the changes are ready to be propagated to the target data marts 412 via extract, transform, and load process 411. After the staged data from the image tables are moved into the target data marts 412, such data is no longer needed to be present in the CDC database 410. It is the function of truncator process 406 to determine the lowest water mark points of change propagations to the target data marts 412. The dynamic image tables, static image tables, transaction table, and log record table are then truncated (e.g., records discarded) accordingly. The CDC process is itself, comprised of three separate processes: capture process 404, builder process 405, and truncator process 406. These processes is known as an instance 407. Another similar instance 409 may be utilized to work with a second operational database 402 via LTM 408. Thereby, changes to operational database 402 are also staged and stored onto the same CDC database 410. Eventually, these changes are propagated to the appropriate data marts 412 through the extract, transform, and load process 411. Each of these three different processes are now described in detail. First, the steps for performing the capture process are described in detail as shown by the flowchart of FIG. 5. Initially, the capture process receives incoming log records from the log transfer manager and parses each log record to determine whether the particular log record is of any interest to the CDC system, step 501. Next, in step 502, the log record's transaction type is determined. If the log record indicates a begin transaction, a new element in added in the memory queue, step 503. In addition, a new entry is inserted in the transaction table, step 504. A unique internal transaction id is then assigned to the external transaction that was received from the LTM, step 505. If the log record indicates a commit transaction, then all the elements from the in-memory queue of active transactions are removed, step 506. The transaction state in the transaction table is then updated so that the builder process can know when it can start processing the changes, steps 507 and 508. If the log record indicates an abort transaction, all in-memory queue elements belonging to this transaction are removed first, step 509. A determination is then made in step 510 as to whether the transaction modified any source tables. If a source table was modified, the changes are deleted from the dynamic image tables, step 511, and the entry of this transaction is deleted from the transaction and log record tables, step 512. However, if the log record indicates either an insert, delete, or update transaction, a determination is made in step 513 as to whether the source table is of any interest. If the log record does not change any source table in any mapping associated with the instance, then the log record is deemed of no interest, and it is discarded, step 517. Otherwise, the log record is stored in the log record table and in a specific dynamic image table. Thereupon, a determination is made in step 514 as to whether the table has already been modified by this transaction. If it has not yet been modified, a new entry is added in the log record table, step 515. The actual changes corresponding to the source table are stored in the dynamic image table, step 516. Step 518 determines whether another log record is received. For each subsequent log record, the process is repeated starting with step 501. The builder process is now described in detail. Basically, the capture process initiates the builder process when there is data for the builder to process. Once the builder process is initialized, it reads a transaction form the transaction table that has been marked for processing by the builder. It also determines the list of source tables and dynamic image tables modified by this transaction. Processing the log records belonging to a transaction originating at the source database according to the builder process is shown in the flowchart of FIG. 6. In step 601, a new transaction is started. The transaction and log record tables are consulted to find a transaction which is ready for the builder to start its process on, step 602. For the entire data corresponding to that particular transaction, the following steps are performed. In step 603, a determination is made as to whether there is a next log record belonging to this transaction. If there are more log records for this transaction, step 604 is performed. Otherwise, the process skips to step 612. In step 604, the mapping is determined in which the table is used. Thereupon, if the table is used only as a non-primary source table in any mapping, the changes are folded from the dynamic image table to its static image table, and the changes are deleted from the dynamic image table, steps 605-607. Otherwise, if the table is used as a primary source table, then it is ensured that the associated rows are present in the static image tables of all non-primary source tables in mappings in which the table is used as a primary source table, steps 608-609. If the data does not yet exist in the static tables, it is pulled out from the operational database. If, however, the table is used as both a primary source as sell as a non-primary source table, the changes are retained in the dynamic image table (step 610) and steps 606 and 609 are performed (step 611). The process then repeats, starting at step 603 again. If step 603 determines that there are no more log records belonging to this transaction, then the transaction state is changed to "committed" and the transaction is then actually committed, steps 612 and 613. If there is a new transaction, as determined by step 614, then the builder process repeats, starting with step 601. The truncator process is now described in detail. The truncator process only truncates data that has already been moved to the target data mart by the extract, transform, and load process and which no longer needs to be present in the Change Capture Database. The information on how much data (changes) has been propagated to various target tables are kept in a recovery table as part of the repository. An exemplary recovery table is shown below in Table 4.
TABLE 4
______________________________________
SESSION.sub.-- ID
NUMBER //Id of the session pumping data
//to the target table
SUBSCRIBER.sub.-- ID
NUMBER //unused
MAPPING.sub.-- ID
NUMBER //mapping id of the session
TARGET.sub.-- ID
NUMBER //target table id in which the
//changes are pumped
IS.sub.-- CC.sub.-- SESSION
NUMBER //flag indicating if this is a cc
//session
LAST.sub.-- TIMESTAMP
DATETIME //timestamp of the last changes
//made into the target
RECOVER.sub.-- FROM
NUMBER //used only for non-CC session
I.sub.-- HIGH.sub.-- TTD
NUMBER //the tid of the last change made
//into the target
I.sub.-- LOW.sub.-- TID
NUMBER //the tid of the last change made
//into the target
______________________________________
The truncator process uses this information to determine the truncation points for the various dynamic and static image tables, the log record table, and the transaction table. FIG. 7 is a flowchart describing the steps for performing the truncation process. Initially, the recovery table is read to determine the lowest water mark point for each dynamic image table, step 701. The lowest water mark point for a dynamic image table is the lowest internal transaction id amongst all target tables which use the source table as the primary source. In other words, the truncator determines all target tables that use this particular table as a primary source. The minimum propagation point among all the target tables is the lowest water mark point for this dynamic image table. Thereupon, all changes up to and including this lowest water mark, are discarded from the dynamic image table, step 702. A similar technique is also used to truncate the static image tables. For a static image table, the lowest water mark point is determined in step 703 by finding the lowest internal transaction id amongst all target tables which use this source table as a non-pnmary source. In other words, the truncator process determines all target tables that use this source table as a non-primary source. The minimum propagation point among all these target tables is designated as the lowest water mark point for the static image table. All data having an internal transaction id less than the lowest water mark point is discarded, step 704. Next, in step 705, the lowest water mark point amongst all dynamic and static image tables is then computed. The transaction table and log record tables are then truncated at this new lowest water mark, step 706. FIG. 8 shows an example of how changes are mapped from source tables of an operational database to target tables of a data mart via the dynamic and static image tables of a change capture database. Three source tables are shown for storing data relating to ITEMS 801, PRICES 802, and QUANTITES SOLD 803. Each of these source tables has a corresponding set of dynamic and static image tables. For example, source table 801 has an associated dynamic image table 804 and static image table 805; source table 802 has dynamic image table 806 and static image table 807; and source table 803 has dynamic image table 808 and static image table 809. Three target tables are shown as ITEMS table 810, PRICES table 811, and Sales.sub.13 REVENUE table 812. In this example, suppose that ITEMS source table 801 is designated as being the primary source for ITEMS target table 810. Hence, any changes to the data in source table 801 are captured and staged into dynamic image table 804. The actual changed data is then propagated from dynamic image table 804 to the target ITEMS table 810. Similarly, the PRICES source table 802 is designated as being the primary source for the PRICES target table 811. Likewise, any changes to the prices in source table 802 are captured and staged into dynamic image table 806. When the transaction commits, the changes in dynamic image table 806 are propagated to the target table 811. However, for the SALES.sub.-- REVENUE target table 812, the QUANTITIES SOLD source table 803 is designated as the primary source. But since items and prices data are mapped onto the SALES.sub.-- REVENUE target table 812 (in order to calculate sales information), the ITEMS source table 801 and PRICES source table 802 are designated as being non-primary sources. Consequently, data from static image table 805, static image table 807, and dynamic image table 808 are propagated to target SALES.sub.-- REVENUE table 812. Subsequently, if a change is made to an entry in the ITEMS source table 801, the change is staged in dynamic image table 804. The actual updated, changed data is written from dynamic image table 804 to static image table 805 and then propagated to SALES.sub.-- REVENUE table 812. If, however, the requisite price data does not currently reside within static image table 807, the data (historical and unchanged) is read from the PRICES source table 802 and stored in static image table 807. From static image table 807, the requisite data is then propagated to the SALES.sub.-- REVENUE target table 812. Changes to the QUANTITIES SOLD source table 803 are captured and staged in dynamic image table 808 and eventually propagated to SALES.sub.-- REVENUE table 812. Hence, an apparatus and method for efficiently capturing and propagating changes made upon an operational database to one or more target datamarts with minimal impact on the operational database has been disclosed. The foregoing descriptions of specific embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the Claims appended hereto and their equivalents.
|
Same subclass Same class Consider this |
||||||||||
