Transaction and version management system5535386Abstract Each element of a database may have multiple versions; the versions are partitioned into branches, and versions of a branch are ordered linearly according to their timestamps. Branches are timestamped and related to one another by a version graph. Each version of an element of a database is represented by a unique identifier, a timestamp, a branch name and a value. A new version of an element associated with a branch is created in response to an operation associated with the branch which would modify the element. An object graph in the database is represented independent of the branches and version; an application coded for elements in one version (and branch) can be reused for the same elements in a different version and (different branch) without any re-coding effort. Methods for long duration transactions, cooperative transactions and schema evolutions are provided. Claims What is claimed is: Description RELATED APPLICATIONS
TABLE 1
______________________________________
t = 1 t = 2 t = 3 t = 4
______________________________________
A x(0) -> x(1) y(0) -> y(3)
B x(0) -> x(2) y(0) -> y(4)
______________________________________
The most recent versions in branch A and branch B are, respectively, {x(1) , y(3)} and {x(2), y(4)}. The versions that correspond to different time contexts are shown in Table 2.
TABLE 2
______________________________________
TIME CONTEXT VERSIONS
______________________________________
t = 0 {x(0),y(0)}
t = 1 {x(1),y(0)}
t = 2 {x(2),y(0)}
t = 3 {x(2),y(3)}
t> = 4 {x(2),y(4)}
______________________________________
The above example is typical for mapping different branch versions of an object into different timestamps of the same object. This kind of mapping also precludes versions of two branches of an object having the same timestamps being created. The object graph is again no longer version independent. It is only by coincidence that branch B's most recent version is the same as those of time context t.gtoreq.4. In general, a version of an object graph in a given branch cannot be identified with a time context. The current object faulting algorithm cannot, therefore, be used to fault in a version of an object graph in a given branch. An application program must rely on the explicit fetch command to fetch one object at a time in referencing an object graph--a tedious operation for a user to do. The present invention may be implemented using a general purpose computer. In the present invention, a multi-linear approach is utilized. In a multi-linear version scheme, the database can be viewed as a set of 4-tuples, <oid, b, t, value> and a version graph. The new quantity b represents the name of a branch of versions. The version graph represents the relationship among the branches of versions. The other three quantities, oid, t, and value, represent the same factors as in linear versions. An object graph in this model is represented in a version independent way--a pointer to an object in the database contains only oid. Whenever an application changes an object in a multi-linear version database, the object is never modified in place; a new version of the object is stamped (b.sub.m,t.sub.n), where b.sub.m is the name of a branch of versions (i.e., a branch name) and t.sub.n a unique timestamp, and appended to the branch b.sub.m in the database. When a new branch b.sub.m+1 is created from branch b.sub.m, the new branch is timestamped T(b.sub.m+1); the parent and child relationship between branch b.sub.m and b.sub.m+1 together with their timestamps are kept in a version graph shown schematically in FIG. 2. The version graph shows a parent branch 28 (with branch name "b.sub.0 " and timestamp "t.sub.0 ") with two child branches 30 and 32. The child branches have each created new versions of the "car" object from the parent branch. An object in the parent branch b.sub.m is accessible in the child branch b.sub.m+1 using the concept of "copy-on-write" wherein the objects from the parent are copied only at the time when they can no loner be shared between parent and child branches Also, a "context", c, is defined to be (B(c),S(c)) where B(c) is the branch name of c and S(c) the timestamp of c. Context is a generalization of the time context of linear versions. The previous example is presented again using the multi-linear approach described above. Assume that a database of multi-linear versions contains two objects x and y. Let x(b,t.sub.1) denote the version of object x with branch name b and timestamp t.sub.1. Let A and B denote the names of two branches of versions. The following represents one scenario that two applications, one using the branch of versions A and the other B, may have updated the database at time t=1,2,3, and 4. Let A and B have a common parent branch 0 at time t=0 and the versions of x and y at time t=0 be x(0,0) and y(0,0).
TABLE 3
______________________________________
t = 1 t = 2 t = 3 t = 4
______________________________________
A x(0,0)->x(A,1) y(0,0)->y(A,3)
B x(0,0)->x(B,2)
y(0,0)->y(B,4)
______________________________________
The most recent versions in branches A, B and 0 at t>4 are, respectively, {x(A,1),y(A,3)}, {x(B,2),y(B,4)} and {x(0,0),y(0,0)}. The versions that correspond to different contexts are shown in Table 4.
TABLE 4
______________________________________
c(A,t) c(B,t)
t = 0 {x(0,0),y(0,0) {x(0,0),y(0,0)}
t = 1 {x(A,1),y(0,0) {x(0,0),y(0,0)}
t = 2 {x(A,1),y(0,0) {x(B,2),y(0,0)}
t = 3 {x(A,1),y(A,3) {x(B,2),y(0,0)}
t> = 4 {x(A,1),y(A,3) {x(B,2),y(B,4)}
______________________________________
The example illustrates that the set of instances for each context is clearly identifiable; the object faulting algorithm can thus be used to fetch implicitly an object graph from database into main memory for each given context. A storage manager supporting multi-linear versions has the following interface functions: Fetch(c.sub.1,c.sub.2,oid) function fetches an object from the database. The arguments c.sub.1 and c.sub.2 are contexts an oid is the identifier of the fetched object; the branch B(c.sub.1) is an ancestor branch of B(c.sub.2) in the version graph; and the timestamp S(c.sub.1).ltoreq.the timestamp S(c.sub.2). The following steps implement the idea of copy-on-write in fetching an object: 1. Let b=B(c.sub.2) and t=S(c.sub.2). Execute Step 2. 2. Search for an object with the given oid in branch b and which has a timestamp that is most recent with respect to t; if an object is found, return it. Otherwise execute step 3. 3. Let t=T(b) and b be the parent branch of b. If either t.ltoreq.S(c.sub.1) or b is a parent branch of B(c.sub.1), then return "not found". Otherwise go to execute step 2 again. Step 3 is executed when the object that has an earlier timestamp than t cannot be found in branch b; the search of the same object then begin at the parent branch of b with a timestamp earlier than the timestamp of b--the creation time of b. The adjustment of the timestamp of t is required because the object may have a version created in the parent branch of b after b has been created. CreateObject(c) function creates a new object in the current context. A unique oid is returned. CreateBranch(from) function creates a new branch from a given existing branch, from. A timestamped unique branch name is entered in the version graph and returned to the caller. An application can fetch an object either using explicitly the fetch function or implicitly the object faulting mechanism. Object fault occurs when an application dereferences a pointer. Object faulting also invokes the fetch function; the arguments of fetch are a default context and the oid in the pointer being dereferenced. The invocation of the fetch function by the computer during object faulting is transparent to the user. A version graph comprises branches as nodes and parent-child relationships as edges. Each branch is created with a unique timestamp. A node in a version graph contains a branch name and its timestamp. A direct edge from node (b.sub.1,t.sub.1) of branch b.sub.1 and timestamp t.sub.1 to node (b.sub.2,t.sub.2) of branch b.sub.2 and timestamp t.sub.2 means that branch b.sub.2 is a child branch of b.sub.1. The fetch function uses the version graph to implement copy-on-write in fetching an object. Given an oid, the same objects in different branches are locked separately; locking an object of one branch does not preclude the same object (i.e., with the same oid) in other branches from being accessed. A long transaction and a short transaction accessing the same object of different branches do not, therefore, block each other. A long transaction in a multi-linear version model can be thought as a sequence of "regular" transactions operating on a private branch of a database. The intermediate results are saved in the database when a member transaction of the sequence commits. To abort a long transaction in this model simply discards the associated branch of objects. To commit a long transaction is equivalent to merging the branch with its parent branch; minor inconsistencies between versions encountered during merging can be adjusted manually. A branch and its parent branch in a version graph can be associated with different data definitions of an object as found in schema evolution. When an object that exists in the parent branch is referenced for the first time in the child branch, a conversion from the old to the new data definition can then be triggered to take place. In other words, the proposed scheme can support directly a lazy evaluation style of schema evolution. In a cooperative design team, members can view each other's intermediate results. The work done by an individual member cannot be atomic with respect to other members'. But the collection of work done by all members should preserve database consistency. That is, a cooperating team's work should be serializable with the work done outside the team. In short, a cooperative transaction model should satisfy: * The work done by the entire team is an atomic transaction. * Members of a cooperative design team can view and modify each other's intermediate results. The multi-threaded transaction model disclosed herein solves the cooperative transaction problem. A thread models the work of a member of a cooperating team. Objects locked by a transaction are accessible to all the threads of the transaction; members of a team could access each other's intermediate results. A team's work that is modeled by a multi-threaded transaction preserves database consistency as a "regular" transaction does. In a multi-threaded transaction that models the cooperative design work a thread models the work of a member of a cooperating team. The following schemes solves the concurrency control problems associated in a multi-threaded transaction. 1. Shared access among threads-- Objects locked by a transaction are accessible to all the threads of a transaction. Concurrency control among threads again can be resolved through locking. The execution of the threads of a transaction clearly are not serializable. Locking at thread level does not need to follow the two-phase locking protocol. As soon as the access of an object is over, a thread should release the lock. A lock released by a thread should be retained by the transaction until the transaction commits so that a strict two-phase locking protocol is observed at the transaction level. "ThreadRead" is a shared lock and guarantees that no other threads can append a new version to the object. This command may be used by a team member who must have the latest version of an object. "ThreadWrite" is an exclusive lock that allows the owner of the lock to append a new version to the object. A "ThreadNotification" lock is used when a thread wishes to be notified whenever there is a newer version appended to the object by other threads or transactions. 2. Transparency of thread level locks-- An application writer should see no difference between thread level and transaction level locks. A thread should release a lock as soon as its use is over. The system manages the mapping between the thread level locks and the transaction levels. The system also retains the transaction level locks for a transaction until it commits (or aborts) to enforce the two-phase locking protocol at the transaction level. 3. Two-phase commit-- The commit action must be synchronized among the threads using two-phase commit protocol. 4. Deadlock-- A thread waiting for a lock to be released by a different thread or transaction may deadlock with the other threads or transactions. Since a thread cannot be restarted the same way as a transaction, the resolution of deadlock involving a thread should be left as a user's or a cooperating team's responsibility. Deadlock among threads of a transaction distinguishes multi-thread transaction from a distributed transaction because there is generally no deadlock among the threads at different sites of a distributed transaction if the objects at different sites are disjoint and each thread access the data local to its thread. If deadlock is unwanted among threads of a transaction, an application may use any scheme itself to prevent deadlock from occurring. 5. User Interface-- A Begin.sub.-- Thread function with a transaction id as input argument returns a unique thread id if it is successfully registered with the transaction. Otherwise Begin.sub.-- Thread returns a Null value. Most interface functions that are available in a transaction are also applicable in a thread of a transaction. Representing an object graph independent of its versions is an important design feature that enables code reuse when an application needs to work with different versions. The current implementation in Zeitgeist is adequate for linear versions, but not for branching versions. Both long duration transactions and fine grain version management need to deal with branching versions at storage level. Multi-linear versions, a model for supporting branching versions, preserve the representation independence of an object graph and are a natural extension of the current implementation of linear versions in Zeitgeist. The present invention provides several technical advantages over the prior art. A database may be organized in multi-linear versions and a version graph so that an application can access implicitly an object graph of a given version in a given branch through object fault. Further, a long transaction may be modeled using a sequence of "regular" transactions accessing a common branch of versions. The present invention supports fine grain version management at the storage management level. The version graph is for the entire database-not one version graph per object. Also, the present invention supports lazy evaluation style schema evolution directly. While the present invention has been described in connection with an object oriented database, it may be used in connection with other databases as well. Although the present invention has been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
|
Same subclass Same class Consider this |
||||||||||
