Method and system for consistent updates of redundant data in relational databases6567798Abstract An improved method for consistent updates of redundant data in a database is achieved advantageously by providing a dependency model identifying source fields and respective derived fields. The model having the necessary information for dependent data to be identified and calculated on a change of respective source data according to some predetermined rules. Further, by recording changes, i.e., modifications to data involved in a redundancy, and propagating changes including cascaded changes directed to the derived fields using the rules defied in said dependency model. Advantageously, the dependency model is stored as a database table (2) and the changes are recorded in a change table in the database. The step of recording changes is separated from the step of propagating changes to derived fields in order to improve performance. Claims Having thus described our invention, what we claim as new, and desire to secure by letters patent is: Description BACKGROUND OF THE INVENTION
Term Definition
Field An attribute of an
encapsulation of a concept
derived field A field of which the value
is dependent on one or more
other fields
source field A field required for the
calculation a derived field
dependency relation A named and typed
relationship between a set
of source fields and a set
of derived fields. The
dependency relation
specifies the rules for
derivation between the two
sets of fields.
cascading change The case where a derived
field is also a source
filed in another dependency
relation. The original
change cascades through the
database
change table A database table where the
changes to source fields
are recorded
propagation field A field that is maintained
purely for the change
propagator. The field is
not part of the "natural"
encapsulation of the
entity.
With reference to FIG. 1 first an overview of the inventional concepts will be given next below. In the upper box of the drawing a schema 1 is given for indicating by an arrow `a` that a standard Entity Relationship (ER) model is used for describing a redundant database schema which is depicted in the bottom `database area` of the drawing and containing exemplarily the entities `Node` and `Port` with their respective attributes which will be described in more detail below. The dependency model relies on the specification of dependency relations depicted in a box 2. A-field may be involved in several dependency relations with another field. The different dependency relations between two fields are distinguished by their names. In this sample, the database contains data about a data network in two tables, one table `Node` for the nodes of the network and one table `Port` for the ports of the nodes in the network. In a non-redundant schema only the table `Port` would contain a status field. A node may contain several ports as indicated by 1 and m in box 3 the drawing. The status of the node is dependent on the statuses of the ports contained in the node. This example is now described holding redundant data such that `Node` contains a derived field and therewith a redundant field `status`. The field status in the table `Node` should contain the summary status of all ports in the node. This is depicted in the second upper box 3. The node status can become inconsistent in one of the following cases: a) port changes its status and the node status is not updated. b) a port record is inserted or deleted and the node status is not updated to reflect the new or missing port c) the status of the node is directly set by someone using SQL or another application. In every case, extra action is needed to re-establish the consistency of the data base. Further now, there is introduced a new relationship "veto" defined between Node and Port, as depicted in the box 3. This relationship describes the desired effect, that when the Node has a status down, all Ports should be set to down. A real life example of the need for this effect may be, e.g., in the case of a power failure, where alarms are received meaning that the node and all ports are down. According to the invention the existence of dependency relationships is specified by some extensions indicated in box 3 which exist in form of some annotations to the regular relationships in the E/R-Model. These annotations are transformed into a suitable format and stored in the database like other information on the data model, see arrow `b` and box 2. This information identifies the source fields and the derived fields. The following information is needed about the relation.
Information Description
relationID Unique Identifier of the dependency
relation
name Name of the dependency relation
SourceFieldList The list of the source fields used in
the dependency relation
derivedFieldList The list of derived fields in the
dependency relation
On a change of a predetermined source field a source field trigger is fired. see arrow `c` in FIG. 1 which records this change in form of an insert in a so-called change table 4, whereby the new value --if present--is inserted in the associated entry, too. Here, the new value is depicted as being "down". Then, said changes of source fields are propagated to any concerned derived fields with a mechanism 5 called Change Propagator such that finally all derived fields have their respective current value imposed by said one change in said source field. It is obvious that many different scenarios exist which can be used to transport said information depicted above into the logic comprised of the change propagator. For a good performance it is proposed exemplarily to create classes for each entity, each class having objects provided with a number of attributes as e.g. status in the above example. Then, the logic relationships can be transformed into program logic, and code portions of said logic reflecting said relationships between the information depicted above can be created. These code portions can advantageously be run as an executable piece of code instead of realizing it by predicate logic techniques. The inventional concepts, however are more general and cover all of said scenarios mentioned above. Next, the inventional method for optimizing database access will be described with special reference to FIG. 2. Firstly, details of the change recorder will be described. After having established a mapping between all source fields and their respective plurality of derived fields, step 8, for all source fields, a database trigger is set up which is activated by changes to the field--step 10. The trigger mechanism is available in state-of-the-art databases; it allows for additional processing to be associated with normal database operations such as update and delete. According to the invention the triggers fire--step 20--upon a change to the source field and write respective entries into a change table in the data base--step 30. The entries in the change table are committed with the original transaction--steps 40, 50. Then, according to the present invention control is passed to the Change Propagator algorithm 13 step 60 which will be appreciated by a person skilled in the art when understanding the problems caused by blocking mechanism described next below. In some configurations, however, there can exist a so-called blocking problem which will be described and discussed below. Generally, it would be possible to implement the calculation of derived fields in the database triggers. This would have the big advantage that the database is always in a consistent state. This may be possible in certain systems, for example where are simple dependencies or where are only a few updating processes or where are only low rate of changes to the data. In large and sophisticated systems, however, where completely different conditions are present the question arises whether to store redundant data. In such systems the following conditions and facts prevail: Dependencies without redundant information are complicated and compute intensive, there are multiple processes updating the source fields, and there is a high rate of data changes, or a high peak rate of data changes possible, e.g., a blackout causes tens of thousands of Node or Port down status changes. If the calculation of all derived fields takes place in the change recorder, then in a complicated system with many derived fields, cascading changes and multiple updates, it is possible that substantial blocking problems arise. A blocking problem occurs when a database transaction locks resources that another transaction needs and does not give them free in an acceptable amount of time. An example of the effect of this blocking and the decision for a separate Change Propagator process is described below. In this example, all derived fields and cascading changes are implemented in the Change Recorder. In the system presented here, updates to Nodes and Ports usually occur automatically from messages from the Network Management System (NMS). However sometimes messages are lost or are not able to be sent. In this case the network operator must enter the changes manually. A network operator receives a call informing her that a port has gone down. The network operator enters this information into her computer. A database transaction (A) is started, which updates the status of the port. The triggers on the source field Port.status propagate that to the status field of the affected node. Before the change is committed the computer asks the operator to confirm her entry. Unfortunately, the operator has just gone to lunch. During her lunch, messages start arriving from the Network Managing System (NMS) about the status of a second port of the same node. These messages result in a transaction B. Unfortunately, transaction B requires the same Node record as transaction A and is blocked until transaction. A is completed. In this case, it is blocked until the operator has had her lunch, which is quite late. Whether manual intervention is involved or not, performing all calculations in the Change Recorder can lead to substantial blocking, or in the worst case, it can lead even to deadlocks. In order to avoid this, the preferred embodiment recommends dividing the actions involving derived fields into two parts, as called, e.g., the Change Recorder and the Change Propagator. With reference to FIG. 3 the Change Propagator is generally responsible for propagating a change to a source field throughout the database, steps 110 to 210 in FIG. 2. The change to a single source field may result directly in many derived fields in the database being updated, as a source field may be involved in several dependency relations or related to many other records in the database. In more detail, after the start of the Change Propagator which can run in an endless loop in hard-coded form, for example, the decision is taken if there are any new entries in the change table as described above with reference to FIG. 1--decision 120. If yes, then said change table is promoted `active` for providing a signal in order to know that additional actions must be taken now--step 130. Then in a step 140 by help of an inspection of the source field/derived field mapping list described above all relevant values of derived fields are calculated and database update is performed. Then in a decision 150, it is determined if the obligation to propagate changes further on must be taken. If there are other field(s) being derived by that source field updated, see the YES-branch of decision 150, then, first the `active` change table entries are first marked as `done`--step 160, and then, the source field/derived field mapping table described further above is further scanned for subsequent changes necessary for `second degree` derived fields, step 170, and entries which are marked as 'done are deleted from the Change table,--step 180. Then, a loop is reentered comprising steps 140 to 180 with working on third degree derived fields, and so on, until no derived fields are to be found in this `cascade` of fields. On exiting said loop with a NO--condition in decision 150 all change table entries which are not new are deleted for clearing the change table--step 190. Then, in a step 200 all changes to the change table are committed to the database system and the procedure can be terminated--step 210--for the next call. Additionally, the Change Propagator is also responsible for updates to fields resulting from so-called cascading changes. Cascading changes occur when a derived field is also a source field in another relation. This is best illustrated with reference to FIG. 2 and with the following example illustrating a cascading change from Node.status to Link.status. In this example a new entity "Link" is added. The new entity Link models a network connection between two Port entities. The fields of the entity Link are described in the following table.
Field Description
id Unique Identifier
name Name of the Link
fromPortId Identifier of the
originating port
toPortId Identifier of the
destination port
status status of the connection
Link has its own status, reflecting its ability to transmit data. This status may be `down` in the case of a cable being cut or other physical damage to the connection. The Link status is also derived from the status of the `from` and `to` Ports. In the example depicted above, this means that there is a dependent relation `veto` between Port and Link. This veto relation means, in the case of a `from` or `to` Port having the status down, the status of the Link is also down, i.e. the link cannot transmit data if the port is not working. The example depicted above illustrates the basis for a cascading change. A change to the source field Node.status involves an update to the derived field Port.status due to the dependency relation `veto` between Node and Port. However, the change to Node.Status has a cascading effect because Port.status is in turn a source field in the dependency relation `veto` between the entities Port and Link. The Change Propagator is responsible not only for propagating to the derived field Port.Status but also further to the field Link.Status. A change to the status field of a Node may then result in the change of the status for a Link, despite the fact that there is no direct dependency relation between the two entities Node and Link. Thus, if a node is shut down, then not only the node is down and all of its ports are down, but also every link that goes to a port in that node is down. According to the invention, preferably, all direct changes and cascading changes to derived fields occur in a single transaction. This transaction also involves deleting the entries in the change table that were in the change table at the beginning of the Change Propagator transaction. The general flow of processing is illustrated in FIG. 2. The Change Propagator should be implemented as a "singleton" job. That means, there should be only always one Change Propagator running as a database job. It is possible to implement Change Propagators that run in parallel, but the implementation is significantly harder and there is a much bigger chance of blocking or causing deadlocks. Going back now to the blocking problem illustrated in the "network operator" example given further above. The introduction of the Change Propagator allows transaction B to complete. This is because transaction A does not take a lock out on the status of Node in the trigger of the Change Recorder any more. Next, the preferred timing of the Change Propagator will be described. The blocking problem described above is resolved at the cost of potential inconsistencies. The degree to which inconsistencies, are tolerable depends on the problem domain. In the example of network management given further above it is most likely acceptable that a completely consistent picture of the network is available after every 5, 10 or even 60 seconds. The calculation of a highly visible stock index, however, may have a tolerance of a tenth of a second. According to the invention the time window of inconsistency can be prescribed in two different ways. First, the Change Propagator is triggered by the Change Recorder commit. After the Change Recorder has committed its changes--step 40 from FIG. 2--the Change Propagator job is started--step 110 in FIG. 3. This has the advantage that the time window of inconsistency is held to a minimum. A slight disadvantage of this approach is that for every change a job has to be started, and there will be actually no easy way for any optimizing of the activities of the Change Propagator. As to the second way to control the timing of the change propagator an example is given where such an optimization is needed. In this example a faulty port has a tricky mechanical problem: It changes its status very rapidly from down to up and back again. Each change thus results in many changes in the database. The database is constantly under load. According to the present invention, in this situation a starting interval for the Change Recorder is defined: The Change Propagator is configured to start after a configurable time interval. The time window of inconsistency begins with the commit of the Change Recorder and ends after the compute and commit of the Change Propagator after the Change Propagator had started. The advantage of this solution is that entries in the Change Table from multiple transactions can be processed at the same time. The problem situation illustrated in the last example is thus alleviated because several "up"-to-"down" status changes can be reduced to a single change, or, in the case of "up" to "down" followed by "down" to "up" no change will occur at all. The disadvantage is that the window of inconsistency is longer than in the first way described. Next, a preferred feature of the present invention will be described concerning a performance enhancement through Propagation Fields. Before explaining the purpose of propagation fields, it is necessary to illustrate the logic involved in deriving the status of the Node. The status of the node is derived from the status of all of its ports. Thus it is a datum which is derived from a plurality of entities. For example, the status of the node can have one of three values which are depicted with the respective meaning in the table next below.
up all ports have a status of up
down all ports have a status of down
partly down some ports are up and some are down
The logic for changing the node's status is already much more complex. The table next below gives a summary thereof:
old status new status event
up partly down a port went
down, but some
ports still have
status up
partly down down the last port
that was up is
now down
down partly down a port went up,
but some ports
are still down
partly down up the last port
that was down is
now up
partly down partly down a port changed,
but some ports
are up and
some are down
up down the single port
in a node, change
from up to down
down up the single port
in a node,
changed from down
to up
An analysis of the relationship.between node and port quickly shows that in order to calculate the status of the node, information is needed about the status of all of the ports of the node. In database terms that means, that although only one port has changed its status, all of the ports need to be accessed in order to calculate the status of the node. This represents a complete contradiction to the previously defined advantage of storing derived data redundantly, namely quick data access and calculation at run-time. In order to overcome such divergences the present invention's concepts propose to implement so-called Propagation Fields. A Propagation Field is an extra field associated with an entity. Only entities with derived fields should have Propagation Fields. The Propagation Field is not part of the `natural` semantic encapsulation of an entity, but instead, it solely serves to improve the calculation of the derived fields and to avoid unnecessary database read processes. In the above example referring to the propagation of a change to a Port Status to the Node Status, two additional Propagation Fields are needed for the entity Node. They are depicted in the table below:
Propagation Field Description
upCounter Count of all Ports
contained in the Node with
a status of up
downCounter Count of all Ports
contained in the Node with
a status of down
These Propagation Fields are maintained by both the Change Recorder during inserts and deletes and as well by the Change Propagator during propagation. These additional Propagation Fields summarize a collection of single facts into a concentrated and somehow inexact form 3 as they ignore which ports are down and which are up, but on the other hand they enable the calculation of the status of the Node without having to access any additional Port records. This effect can be examined by looking at the change event table given above. In the table given next below the logic column upCounter is abbreviated to uC and downCounter is abbreviated to dC.
old new
logic status status event
uC > 0 && up partly a port went down, but some
dC > 0 down ports still have status up
uC == 0 partly down the last port that was up is
down now down
uC > 0 && down partly a port went up, but some ports
dC > 0 down are still down
uD == 0 partly up the last port that was down is
down now up
uC > 0 && partly partly a port changed, but some ports
dC > 0 down down are up and some are down.
uC == 0 up down the single port in a node,
changed from up to down
uD == 0 down up the single port in a node,
changed from down to up
As can be seen in the left most logic column of the table above, the calculation of the status of the node can now be achieved by examining the data in the single Node record and the change to a single Port status. The change to a Port status changes either the upCounter or downCounter. After this `counter` change, the status of the node can be calculated by evaluating which of the three criteria in the `logic` column has been fulfilled, namely upCounter equals 0 (result is Node status is downdownCounter equal 0 (result is Node status is both upCounter and downCounter are greater than zero (result is that NodeStatus is partly down Thus, no other information about other Ports is needed. For reasons of increased clarity, the dependencies in FIG. 1 do not incorporate the Propagation fields described above. Thus, it should be added that the value of the derived field Node.status is actually derived from the two Propagation Fields Node.upCounter and Node.downCounter as illustrated in the table and the list above. The values of the Propagation Fields, however, are derived from the status of the Ports in the Node, i.e., the fields Port.nodeId and Port.Status. In conclusion it should be noted that not every dependency relation requires Propagation Fields. In the case of an account balance being the sum of all account transactions, the delta representing the gap relative to the current balance is easily calculated from the change to an account transaction. No propagation fields are required because the other account transaction records are not needed to calculate the new account balance. While the invention has been particularly shown and described with respect to preferred embodiments therof, it will be understood by those skilled in the art that the foregoing and other changes in form and details may be made therein without departing from the spirit and scope of the invention.
|
Same subclass Same class Consider this |
||||||||||
