High concurrency in use manager5265245Abstract An in use table manager in a computer system uses an in use table to track the use of files, or objects. The in use table is used to determine which objects may need recovery in the event of a system failure. Object addresses are hashed by the in use manager to identify a preferred slot in the table. The slots contain information identifying the object, and indicating the extent of use of the object. The in use manager assigns alternate slots, and dynamically changes the size of the in use table to reduce contention for slots. Several atomic operations on the table ensure integrity of the table, while permitting concurrent use. Portions of the table are bundled into single I/O operations to enhance system performance by minimizing I/O. Claims What is claimed is: Description BACKGROUND OF THE INVENTION
TABLE 1
______________________________________
procedure
Compare.sub.-- and.sub.-- swap ( a , wethink, wewant, rc )
{ a is the storage location being operated on
atomically }
{ wethink is the value variable a is assumed
to contain }
{ wewant is new value we want variable a to
be set to if }
{ a is equal to wethink. }
if a = wethink
then
begin
a := wewant;
rc := compare.sub.-- and.sub.-- swap successful;
{ rc is the return code }
end
else
begin
wethink := a;
rc := compare.sub.-- and.sub.-- swap failed;
end
end Compare.sub.-- and.sub.-- swap; :
______________________________________
The pseudo-code shown in Table 2 below describes the logic of acquiring exclusive control of a table entry. When exclusive control is achieved, the entry is said to be locked. The notation: entry.sub.-- address .fwdarw. lock, signifies a de-referencing of the pointer value (entry.sub.-- address) thereby granting addressability to the lock field residing within the preferred synchronization slot of the in-use table entry addressed by the Variable known as entry.sub.-- address. In the following code, the literal "locked" represents one state of the slot, and the literal "unlocked" represents the other state. The logic in Table 2 shows the method of obtaining exclusive control of an in-use table entry (such as the preferred synchronization slot for an object).
TABLE 2
______________________________________
acquire.sub.-- entry : procedure ( entry.sub.-- address )
compare.sub.-- and.sub.-- swap ( entry.sub.-- address -> lock,
unlocked, locked, rc)
while ( rc = compare.sub.-- and.sub.-- swap.sub.-- failed )
begin
delay.sub.-- task.sub.-- briefly;
compare.sub.-- and.sub.-- swap ( entry.sub.-- address -> lock,
unlocked, locked, rc);
end
end acquire.sub.-- entry
______________________________________
The following logic in Table 3 shows the method of relinquishing exclusive control of an in-use table entry. This routine is only used in cases where the entry is already held exclusively by the invoking process and therefore no check for compare and swap success is necessary.
TABLE 3
______________________________________
release.sub.-- entry : procedure ( entry.sub.-- address )
compare.sub.-- and.sub.-- swap ( entry.sub.-- address -> lock,
locked, unlocked, rc)
end Release.sub.-- entry
______________________________________
A procedure is shown in Table 4 to get control of the "proper" entry for an object. It is called the "proper" slot because the first impression regarding the location of the slot which will serve to house recovery information regarding the data base object may need to be revised. Consider, for example, what happens when two objects, which have never been placed in-use previously, both hash to the same slot. The collision is handled by granting the slot to the first requestor and informing the second requestor to hereafter revise the location considered to be the preferred synchronization slot for this object. The revisions performed by the slot manager consist of replacing the zero slot address residing within the object header with the address of the revised slot. Any concurrently executing jobs which had sampled the old zero slot address initially attempt to lock the original slot by hashing the object address, only to discover that in the meantime the preferred slot location had been revised. Consequently, in highly concurrent environments, one may occasionally lock a slot presumed to be the "proper" slot only to discover after acquiring the lock that the "preferred" slot now resides elsewhere. The following pseudo code is provided in order to handle this condition. By using this mechanism, high levels of concurrency are supported and serialization of access at the data base object level is not required. Note that the object is mapped to an entry in a repetitive manner to handle the situation of discovering that the preferred slot resides elsewhere. Repetition is used to verify correct mapping. The in use manager discovers that the address in the table entry does not correspond to the object address which was hashed to identify the preferred slot. The mechanism provides a definite performance advantage.
TABLE 4
______________________________________
Procedure Concurrent.sub.-- lock ( object.sub.-- id, Var
synch.sub.-- entry.sub.-- address)
synch.sub.-- entry.sub.-- address := Nil;
Map.sub.-- object.sub.-- to.sub.-- entry( object.sub.-- id,
original.sub.-- entry.sub.-- address)
{Provides our initial impression regarding
the location of the "preferred" slot}
Concurrent.sub.-- success := false;
Repeat
Acquire.sub.-- entry(original.sub.-- entry.sub.-- address);
Map.sub.-- object.sub.-- to.sub.-- entry( object.sub.-- id,
synch.sub.-- entry.sub.-- address);
{ This is the repetitive mapping described
above, it reveals whether object still
points to same slot}
if { Object no longer maps to the entry we
locked above . }
synch.sub.-- entry.sub.-- address <>
original.sub.-- entry.sub.-- address
then
Release.sub.-- entry(original.sub.-- entry.sub. -- address);
original.sub.-- entry.sub.-- address :=
synch.sub.-- entry.sub.-- address;
Else { The synchronization entry did not
move }
Concurrent.sub.-- success := true;
Until concurrent.sub.-- success = true ;
end Concurrent.sub.-- lock;
______________________________________
In-use Table Operations This section describes a method of performing operations on the in use table in a concurrent environment. A FIND operation is used by the in use manager to retrieve information from the table for a specific object. It is important that this operation be performed with minimal impact on other jobs or processes contending for the in use table as a whole, as well as low impact on those seeking to alter the in use status of the specific data base object whose status is sought. The following pseudo code in Table 5 conveys the method for FIND. The slot is first found, either by using the slot address in the object header, or if zero, hashing to the preferred slot address. The slot is locked while the FIND operation is being performed.
TABLE 5
______________________________________
Procedure In.sub.-- Use.sub.-- Find ( object.sub.-- id, Var Entry.sub.--
info,
Find.sub.-- return.sub.-- code)
Concurrent.sub.-- lock( object.sub.-- id, entry.sub.-- address);
if entry.sub.-- address -> Entry.object.sub.-- id =
object.sub.-- id
{ Slot referenced in the in use table
knows something about the data base
object }
then
begin
{ Return information found in in-use
table for this entry to the user. }
Return.sub.-- entry.sub.-- info(entry.sub.-- address,
entry.sub.-- info)
Find.sub.-- Return.sub.-- code := Find successful;
end
else
Find.sub.-- Return.sub.-- code := Not in table;
Release.sub.-- entry(entry.sub.-- address)
end in.sub.-- use.sub.-- find;
______________________________________
Whenever a process begins using an object the in-use manager is invoked to perform an INCR (increment use-count) operation. If the object was not already in-use a new table entry indicated at 50 in FIG. 3 is allocated for the object and initialized with a use count 58 of 1. On the other hand, if the object is already in-use by others, the INCR operation simply maps to the pre-existing preferred slot using the slot address, and increments the count 58 stored therein. Notice that only 0 to 1 and 1 to 0 transitions regarding the use-count signify a genuine change in the recovery state of an object. It is the 0 to 1 transition which must efficiently reach non volatile secondary storage in order to provide adequate recovery information in the face of a crash. Whenever a 0 to 1 transition is made, the update flag 64 is also set in the slot entry. Note that the actual recovery process is not part of this invention. The in use table is used to identify objects which may need recovery. The table is used during IPL following abnormal system termination. Thus, it is important that the slot entry be written to DASD prior to actual changes in the object. In the preferred implementation on a single level store machine an object identifier 60 is simply the address of the object. For multi-level store machines, this may be simply an identifier such as a file name. The update flag 64 is used during recovery to determine whether or not an object needs special recovery actions performed. If the update flag is on, recovery of the object is done. A lock flag 66 is used to ensure that the entry is not changed by more than one process at any one point in time. It is used by the acquire entry and release entry function described in Tables 2 and 3. An I/O count 6B is used to determine whether or not the entry has been written to secondary storage following a change to the entry. These fields will be discussed in further detail in later sections of the description. The following pseudo code in Table 6 shows the method used for incrementing the use count of an object. There are three different cases encountered for the actual INCR function, and they are shown in Tables 7, 8, and 9. The following Tables use the shortened phrase, sync entry, to refer to the item called: preferred synchronization slot, elsewhere.
TABLE 6
______________________________________
Procedure INCR ( object.sub.-- id, entry.sub.-- info )
Repeat { loop until an entry is allocated }
concurrent.sub.-- lock (object.sub.-- id, entry.sub.-- address);
if { the synch.sub.-- entry is unallocated
entry.sub.-- address -> use.sub.-- count = 0
then { Store the fresh object information in
this slot
INCR.sub.-- case.sub.-- 1(entry.sub.-- address); { See Table 7}
else if { the entry locked already represents
the data base object whose use-count
is to be incremented
entry.sub.-- address -> object.sub.-- address =
object.sub.-- id
then
INCR.sub.-- case.sub.-- 2; { See Table 8 }
else { The entry we've locked is allocated to
some other object, so a new table entry
will need to be allocated
INCR.sub.-- case.sub.-- 3; { See Table 9 }
UNTIL Incr.sub.-- processing = complete ;
if { The surrounding Data Base manager is
dependent on the recovery information
being stored in this slot and therefore
requires the recovery state to be written
to DASD }
ensure.sub.-- level = ensure.sub.-- all OR ensure.sub.-- level =
ensure.sub.-- specific
then
Handle.sub.-- in.sub.-- use.sub.-- Ensurance; { Provides
support for assuring that a write
from main memory to DASD is scheduled
and performed efficiently }
end Incr Procedure;
______________________________________
Contention is likely to arise against such a table from two primary sources: Simultaneous attempts to lock the same slot, and attempts to modify a page of the table at the same time that some other job is attempting to write this page out to secondary storage. The use of the low overhead Compare-and-Swap mechanism coupled with the Concurrent-lock procedure described above addresses the first concern. Careful selection of free slot placement coupled with bundled write operations, described in greater detail below, help alleviate the second concern. A further alternative is the provision of a mechanism by which free slot placement is influenced by knowledge about which formerly modified pages of the table were still in the midst of experiencing I/O at the time a new slot needed to be assigned. The following Table 7 describes the logic used when the probe into the in-use table finds an empty table entry. It is called from the code in Table 6. The entry is used for the object information.
TABLE 7
______________________________________
Procedure INCR.sub.-- case.sub.-- 1 ( entry.sub.-- address);
begin
Set.sub.-- Object.sub.-- in-use.sub.-- pointer
(entry.sub.-- address);
Initialize.sub.-- entry(entry.sub.-- address);
Update.sub.-- IO.sub.-- ranges(entry.sub.-- address);
{ I/0 ranges are maintained in
order to identify which portion
of the in-use table has been
modified recently but not yet
written to DASD }
Update.sub.-- on.sub.-- page.sub.-- allocation.sub.-- bite(entry.sub.--
address);
{ Reveals that another slot on page has
been consumed }
INCR.sub.-- processing := complete
end
end INCR.sub.-- case.sub.-- 1;
______________________________________
The following Pseudo code in Table 8, called from Table 6, describes the case when the probe into the in-use table finds an entry which points back to the object we are putting in use. That is; the object identifier stored in the in-use entry is the same as the object.sub.-- id of the object being put in-use.
TABLE 8
______________________________________
Procedure INCR.sub.-- Case.sub.-- 2 (entry.sub.-- address);
begin
Update.sub.-- entry(entry.sub.-- address);
Update.sub.-- IO.sub.-- ranges(entry.sub.-- address);
INCR.sub.-- processing := complete
end
end.
______________________________________
Once a new slot has been selected, it can be initialized as shown in a part of Table 9, following the statement that increment processing is complete. The last case, shown in Table 9, is when the probe into the in-use table finds a table entry which is allocated for some other object. This results in the allocation of a free entry while the synch.sub.-- entry is held to synchronize table access for this object. Once the new entry is allocated and initialized, and the object is made to point to the new.sub.-- in-use table entry, then the first acquired lock (that of the sync-entry) is released.
TABLE 9
______________________________________
Procedure INCR.sub.-- case.sub.-- 3(entry.sub.-- address);
begin
Allocate.sub.-- new.sub.-- entry(entry.sub.-- address,
New-entry@);
if allocation.sub.-- rc = success
then
incr.sub.-- processing := complete;
Set.sub.-- Object.sub.-- in-use.sub.-- pointer (Object.sub.-- id,
New.sub.-- entry@);
Release.sub.-- lock(Entry.sub.-- address);
Initialize.sub.-- entry(new.sub.-- entry@);
Update.sub.-- IO.sub.-- ranges(new.sub.-- entry@);
Update.sub.-- on.sub.-- page.sub.-- allocation.sub.-- bitmap(new.sub.--
entry@);
Release.sub.-- lock(New.sub.-- entry@);
else
processing.sub.-- complete := NOT.sub.-- COMPLETE;
Release.sub.-- lock(Entry.sub.-- address);
Enlarge.sub.-- in.sub.-- use.sub.-- table;
end
______________________________________
The Decrement function of the in use manager allows a process to reduce the count of active users of an object, when it no longer needs to access the object. This function is to record that a user is no longer using the object (as might happen when a data base file is closed). The use count is decremented to show that one less user is sharing the object. When the use count associated with the object goes to 0, then the entry is de-allocated by turning off the appropriate bit of the on-page slot allocation bit-map. In Table 10, pseudo code for the Decrement Function (DECR) is shown.
TABLE 10
______________________________________
Procedure DECR ( object.sub.-- id, Var Object.sub.-- info,
Return.sub.-- code );
concurrent.sub.-- lock (object.sub.-- id, entry.sub.-- address);
if entry.sub.-- address -> entry.object.sub.-- address=
object.sub.-- ID
{ Entry represents for which DECR is being
performed }
then
begin
{ Return information found in in-use
table for this entry to the user. }
Return.sub.-- entry.sub.-- info(entry.sub.-- address,
entry.sub.-- info);
Decrement the Use Count;
Return.sub.-- code := Object.sub.-- found;
if { the use count is now 0
then
begin
{Turn on the bit in the on-page
bitmap associated with this
slot. This indicates that this
slot is unallocated. }
if { all bits in the on-page bitmap
for this page have a value of 1}
then
{ Toggle the bit in the page
allocation bitmap
corresponding to the page
containing the slot just
decremented to 0 )
end
else { Object is not in - use }
begin
return.sub.-- code := Object.sub.-- not.sub.-- in.sub.-- use;
end
Release.sub.-- lock(Entry.sub.-- address);
end DECR Procedure;
______________________________________
The in use manager finds and allocates entries in the in-use table. The allocation is required when the INCR function chooses a synch entry which is already allocated to an object other than the object being put in-use. The structure of the in use table is shown in FIG. 4. There are three distinct parts of the in-use table structure. An in use table control information header 80 includes a first and last entry address which thereby designate the ensure required range, a size of table indication, and a number of entries in the table value. Also included is a count of the number of I/O write operations issued for the entire table. The table comprises an in use entry section which contains the slots, or entries containing in use information about the objects. The entry section consists of 512 byte logical pages, 82, 84 and 86 each containing 16 entries in one preferred embodiment. Each page can be separately written to DASD. Each page has self contained entry allocation information in the form of entry allocation bit maps, 88, 90 and 92. The entry bit maps comprise 16 bits indicating the status of each entry on the page. A page allocation bit map in the header 80 of the table indicates which pages of the table have available entries. The jth bit of the page allocation bit-map represents the jth logical page of the entry section of the table. Each bit represents the allocation state of a page in the In-Use entry section. The bit has a value of 1 if the page it represents is full. A 0 value means there is still space available on the logical page. As referred to above, the in use table is broken into logical pages each containing 16 entries. In addition to object-related in use status information shown in FIG. 3, the very first entry of each logical page also contains a 16 bit field which is referred to as the on page entry allocation bit map. The leftmost bit represents the allocation status of the first entry on the surrounding logical page containing the bit-map. The 2nd bit from the left represents the allocation status of the 2nd entry on the logical page, and so on. The rightmost bit of the on page entry bit map represents the allocation status of the 16th entry on the logical page. The ith bit of the on page entry bit map can only be changed when the corresponding entry on that page has been locked via the acquire entry procedure described earlier. All accesses to the on page entry bit map are preferably performed using an atomic instruction such as Compare and Swap to ensure that only the desired bit is changed. Allocating Entries As can been seen from the INCR function, the first step in trying to allocate an entry is to map the object to a synchronization entry and check if the synchronization entry can be used. The first case to consider involves the entry not being already allocated to a different object. It is then allocated and the on page entry allocation bit-map is updated. If the logical page becomes full as a result of this allocation, the corresponding bit of the page allocation bit-map 80 is TOGGLED because the page has gone from a not full to a full state. The toggling of the bit is done atomically to ensure that all processors see the correct data when they next access the bit. A page can be in the process of becoming full on behalf of the actions of one process, and at the same time another process could be freeing an entry on the same page. Toggling the bit ensures that the correct status is present when the two operations are complete. The second case arises when the preferred synchronization entry is already allocated for some other object. In this instance, the on page entry allocation bit-map 88 is examined to find an available entry on the same page. This is important because the processor may have already taken one page fault to bring in the page containing the synchronization point from secondary storage. It is desirable to avoid additional page faults by attempting to select a substitute slot from the same logical page already brought into main memory. If an entry on the same page as the synchronization entry is available, that entry is locked and allocated as in case 1 above. A page fault is thus easily avoided. The third case to consider arises when the synchronization point was allocated to a different object, and examination of the on page entry bit map shows that all entries on the page were also allocated. This requires that a different page be selected. Since a full page is represented with a bit setting of one (1), finding a new page involves searching the page allocation bit map looking for a zero (0) bit. One could simply start searching through the page allocation bit map 80 starting at the first bit, but this would result in substantial contention for the first free page 82. The preferred method is to use that synchronization entry's offset into the table as input to yet another hash function that chooses a starting point in the page allocation bit map. In the preferred embodiment, the page finding hash is performed so that slots are allocated in a more uniform manner. The offset is halved and then translated to a page number containing the halved offset. The search begins with the byte in the page allocation bit map containing the nth bit where n is the page number found by the just mentioned translation. Another feature of the preferred implementation technique makes use of a common Translate and Test instruction to search for the first zero (0) and also translate the bit pattern of the page allocation bit map to the bit number of the first zero (0) bit. This encoding of the bit number in the translate table saves the effort of scanning the byte. High level Pseudo code for the general allocation of an entry is shown in Table 11. Case 1 mentioned above is not shown here because it is the first case of the INCR function already presented.
TABLE 11
______________________________________
Procedure Allocate.sub.-- New.sub.-- entry( entry.sub.-- address, var
new.sub.-- entry@);
Map.sub.-- entry.sub.-- to.sub.-- page.sub.-- address(entry.sub.--
address,
page.sub.-- address);
Try.sub.-- to.sub.-- allocate.sub.-- entry.sub.-- on.sub.-- page(page.sub.
-- address,
new.sub.-- entry@, Try.sub.-- page.sub.-- Rc)
While try.sub.-- page.sub.-- rc <> Success do
Begin
{ Look for a page which appears to have
available space }
Find.sub.-- Unallocated.sub.-- page(entry.sub.-- address,
new.sub.-- page.sub.-- address);
{ Try to find an unallocated entry on the
selected page }
Try.sub.-- to.sub.-- allocate.sub.-- entry.sub.-- on.sub.-- page
(page.sub.-- address, New.sub.-- entry@,
Try.sub.-- page.sub.-- Rc)
end
______________________________________
Trying to allocate an entry once a page has been selected is fairly straight forward, and is shown in Table 12. But since the table manager is concurrent there is one interesting wrinkle. The on page entry bit map 88 is examined to determine a candidate entry. The examination is done using a compare and swap instruction to get a most current snapshot of the bit-map. The snapshot is examined to find the first unallocated entry on this page. Unallocated entries on the page are indicated in the on page entry bit map by 1 values. A hardware instruction is used to locate the first 1 bit in the bit-map. Many other alternative means of locating the first 1 bit are apparent to one skilled in the art. The next step is to gain exclusive control of the entry indicated from the snapshot of the bit-map. At this point, it is possible that a delay may be encountered if a concurrently executing process has exclusive control of the entry which was thought to be free. Hence, when the lock for this entry is finally granted, a fresh atomic snapshot of the bit-map must be acquired. The fresh snapshot is examined to make sure the bit corresponding to the locked entry is still a 1 meaning that the entry is still unallocated. If so, the bit is turned off and the new.sub.-- entry@ is set to this new entry. If upon obtaining exclusive control of the entry the reexamination of the on-page allocation bitmap revealed that the entry is already allocated, then the procedure checks the remaining bits of the bit-map and repeats the above process for another candidate entry. This process continues for this page until either an entry is successfully allocated, or no unallocated entries exist on the page.
TABLE 12
______________________________________
Procedure
Try.sub.-- to.sub.-- allocate.sub.-- entry.sub.-- on.sub.-- page (
page.sub.-- address,
new.sub.-- entry@, Try.sub.-- page.sub.-- rc)
get.sub.-- snapshot.sub.-- of.sub.-- bitmap (page.sub.-- address,Var
bitmap)
Find.sub.-- first.sub.-- 1.sub.-- bit(bitmap,Var bit.sub.-- number,
Entry.sub.-- number);
Acquire.sub.-- lock(Entry.sub.-- number);
trypage.sub.-- rc = Do.sub.-- not.sub.-- know.sub.-- yet;
Repeat { Until an entry is allocated, or no
more free entries on page }
get.sub.-- snapshot.sub.-- of.sub.-- bitmap (page.sub.-- address,Var
bitmap)
if { the bit is still set to 1 }
Bit.sub.-- test( bit-map, bit.sub.-- number) = 1
then { we can allocate the entry .
begin
{ Turn off the bit to show entry
allocated }
Reset.sub.-- Bit ( Var bitmap, bit.sub.-- number )
Trypage.sub.-- rc := Success;
end
Else if { every slot on this logical page is
now allocated }
bitmap = 0
then Trypage.sub.-- rc := Failure;
Until trypage.sub.-- rc = success OR Trypage.sub.-- rc =
failure
______________________________________
The following Pseudo code in Table 13 describes how a candidate page is selected using the Page allocation bit map 80.
TABLE 13
______________________________________
Procedure
Find.sub.-- Unallocated.sub.-- page(entry.sub.-- address,
new.sub.-- page.sub.-- address);
{ Choose a point in the bit-map where search
should begin.}
starting.sub.-- bit := (entry.sub.-- address -
first.sub.-- entry.sub.-- address )/
pagesize / 2
starting.sub.-- byte := starting.sub.-- bit / 8 + 1
scan.sub.-- bitmap (starting.sub.-- byte, var stop.sub.-- byte,
bit.sub.-- number, rc)
{ Calculate the proper page.sub.-- number from
bitmap }
Page.sub.-- number := stop.sub.-- byte *8 + bit.sub.-- number -1;
{ determine the actual page address of the
candidate page. }
page.sub.-- address := first.sub.-- entry.sub.-- address +
page.sub.-- number * pagesize;
______________________________________
Minimizing I/O The in use manager minimizes I/O by only performing I/O to save entries to nonvolatile storage when absolutely needed. To achieve this, several fields are added to the table header. First, there are the "high boundary" and "low boundary" pointers which indicate the range of table entries which require forcing. Forcing is used in the sense that the range of table entries will be forced or written from main, volatile storage to nonvolatile secondary storage. It is a common operation performed in a paging environment. The pointers are initialized to indicate that no entries need forcing. Also in the table header is a count field which is incremented just before each I/O is started, and just after it completes thus indicating the pendency of I/O operations on the table. This count field is initialized to zero. This means that whenever the field is an even integer, no I/O is in progress, and whenever it is odd, an I/O is in progress. This fact will be used later. Each table entry also contains an I/O count field 68. By comparing the I/O count with the header count, a process can easily and efficiently determine whether a particular slot's contents have been written to auxiliary storage yet. Since in an interactive computing environment, data base objects are generally put in use when the corresponding data base file is opened, but may not experience actual modification until a later time when the application program begins modifying the underlying data base, we have elected to separate the actions of placing an object in use (INCR) from the subsequent actions (often minutes later) of flagging the object as subject to recovery risks. This second step is signified by requesting that the in use table manager write the corresponding slot to auxiliary storage. (This is known as the ENSURE operation.) Hence, when an object is put in use, the table entry housing it need not be written to disk immediately. Rather, when the system is about to make the first change to an object that may require recovery, the in-use table manager is invoked to ensure that the corresponding table entry has been forced to disk. This permits several objects' entries to be written to secondary storage all at once, thereby minimizing the total number of I/O operations performed against the table as a whole. Some I/O overhead includes bus allocation processing, etc. Without this invention, the simple-minded approach would be to repetitively write the table anew if any change had been made anywhere within the table that had not yet been written. This means that the corresponding entry might have been written several I/O operations previously, and yet is still written again. By means of an even/odd count method described below, when the object is placed in use, the I/O count field 68 in the table entry is set to the value that the header count will be set to when the next full I/O operation has been done. If an I/O operation is currently in progress (the header field is odd), the entry field is set to the header field plus three. If the header field is even, the entry field is set to the header field plus two. The I/O count field is never changed from this point on, until the object is no longer in use. A simple comparison between the header field and the entry field is used to determine whether or not an I/O operation is required for the entry to ensure it is on secondary storage before a modification requiring recovery is made to the associated object. FIG. 5 shows a time ordered sequence depiction of the I/O count stored in the in use table header 103a, 103b, 103c, 103d, and 103e, along with sets 104a, 104b, 104c, 104d, and 104e of four table slots showing their I/O counts. At time t1, no activity has occurred and all values begin at zero (0). In addition, the low and high boundary indicators 105a and 106a are initialized to indicate NO slots need I/O. At time t2, two slots have been allocated and marked with the I/O count which corresponds to the I/O count that will be in the header 103b when the next complete I/O operation has taken place. Since the next I/O will increment the header I/O count from 0 to 2, a 2 is placed in the first and third slots of set 104b. Also, the low and high boundary indicators 105b and 106b are adjusted to indicate which slots need to be written with the next I/O. At time t3, an I/O has been initiated to write set 104c slots one and three to nonvolatile storage. The high and low boundary indicators 105c and 106c have been adjusted to indicate that no slots require an I/O. However, slots one and three still indicate that they have not been written to nonvolatile storage since their I/O counts are greater than the header I/O count. This is correct since the I/O has only been initiated and is not yet complete, as indicated by the 1 in the header I/O count. At time t4, two more objects have been placed in use and have occupied table set 104d slots two and four. Accordingly, the high and low boundary indicators 105d and 106d have been adjusted. Since it cannot be determined if the changes to slots 2 and 4 will be written by the previously requested I/O (slot 2 might be, since it was within the previous high/low indicators, but 4 was not) they cannot be guaranteed to be written until the header I/O count reaches 4. Note that even while the I/O is in progress for slots 1 and 3, changes can be made in volatile storage to slots 2 and 4 which reside in the same page as slots 1 and 3. This is accomplished by making a separate copy (cloning) of the page being written when a subsequent access is made to it. This effectively allows a page to be modified as it is being written. At time t5, the first I/O operation has been completed and the header 103e I/O count updated to 2. This will serve to inform other processes that set 104e slots 1 and 3 have been successfully written to nonvolatile storage. This assures that another I/O will not be issued until it is required that slot 2 or slot 4 be written. The implementation of the I/O reduction logic requires two locks for best performance. The first lock is obtained to limit access to the upper and lower boundaries of the I/O window (table entries between the pointers). This lock is held for the shortest of times, and is not held while an I/O operation is in progress to the holding task. So this lock does not impair performance. Simple procedures are used, and shown in the pseudo code to obtain and release the Limit lock. A second lock, the I/O lock is used to limit the number of simultaneous I/O operations for the in use table to one. This was done so that subsequent requests which are started within a very short time period, can be reduced to one I/O operation. The scenario is as follows: Task 1 issues an I/O request to ensure some changes. (time t0) : Task 2 makes some changes and updates the window of I/O at time t1. : Task 1 now obtains the I/O lock and Limit lock at time t2. : An I/O operation is performed for task 1 consequently writing task 2's data. : Task 2 finally gains control of the I/O lock and discovers that the I/O operation has already been completed. Task 2 does not perform the I/O since it is unnecessary. For even better granularity, the method is also applicable to synchronization of I/O operations to multiple zones I/O of the in-use table. Each zone is allocated on separate DASD devices. One I/O operation is then performed for each zone. To achieve this the low and high boundaries as well as the I/O count are replicated for each zone (extent) of the table. Also, a limit and I/O lock are required for each zone. This extension provides improved performance for large tables which span several storage extents. The implementation of this method logically breaks into two functional pieces. The first piece is invoked when the table entry is operated on via an increment operation. A Handle Update of Ensurance Information routine is invoked from the Increment procedure explained earlier. The pseudo code follows in Table 14.
TABLE 14
______________________________________
Procedure Handle in-use ensurance;
If { This entry has gone from a read-only state to
a read-write thereby revealing that the object
is about to be changed and needs to be
classified as a candidate for potential
recovery processing }
then
Begin
Obtain.sub.-- in-use.sub.-- limit.sub.-- lock;
IF { Lower boundary = 0, or address of table
entry is less than the lower boundary}
THEN { Set Lower Boundary to address of
table entry. }
IF { Address of table entry is greater than
upper boundary of window }
THEN { Set upper boundary to address of
table entry }
{ Set the entry's i/o # to the future value
of I/0 number when this table entry is
guaranteed to be written out. }
DSIUIO# := (DSIUFRC# + 3) & (-2);
{ IF header.sub.-- IO.sub.-- count is odd
THEN
entry.sub.-- IO.sub.-- count = header.sub.-- IO.sub.-- count + 3
ELSE
entry.sub.-- IO.sub.-- count = header.sub.-- IO.sub.-- count + 2}
Release.sub.-- in-use.sub.-- limit.sub.-- lock;
If { Invoker has asked us to assure that a
specific slot be written to DASD AND
the I/0 count in the in use table
header is less than this entry's I/0
count and ensure.sub.-- specific = True and
Header.sub.-- IO.sub.-- count < Entry.sub.-- IO.sub.-- count}
Then { Tell the force routine that a force
is required because of a specific
ensure. }
FORCESPC := SET;
End
End Handle.sub.-- in-use.sub.-- ensurance;
______________________________________
An In Use Force Routine handles the writing of the table entries to auxiliary storage when appropriate as shown in Table 15.
TABLE 15
______________________________________
Procedure In-Use.sub.-- IO;
Var CURIO INTEGER, { Snapshot of current I/0
counter. }
FORCLO@ PTR, { Local copy of lower
bndry of force range}
FORCHI@ PTR; { Local copy of upper
bndry of force range}
Begin
If { The handle.sub.-- ensure procedure has indicated
that I/0 is necessary, OR Global Ensure was
requested }
Then
Begin
{ Set CURIO to the current value of the I/0
counter. }
Obtain.sub.-- in-use.sub.-- IO.sub.-- lock; May be delayed here
during I/0 by
another process
Obtain.sub.-- in-use.sub.-- Limit.sub.-- lock;
If { I/0 hasn't been done since we tried to
get the locks and there is some I/0 to
perform }
Then
Begin
{ Set FORCLO@ to current value of the
BOTTOM of I/0 range}
{ Set FORCHI@ to current value of the
TOP of I/0 range}
{ Clear Lower and Upper limits of I/0
range }
{ Increment the I/0 count once before
I/0 is performed }
Release.sub.-- in-use.sub.-- LIMIT.sub.-- lock; { Letting
other allocations take place. }
Write.sub.-- storage(forclo@, forchi@);
{ Increment the I/0 count once after
I/0 is performed }
Release.sub.-- IN-USE.sub.-- IO.sub.-- lock;
End
Else
Begin
Release.sub.-- inuse.sub.-- limit.sub.-- lock;
Release.sub.-- in-use.sub.-- IO.sub.-- lock;
End
End
End.
______________________________________
Some objects may not need recovery following abnormal system termination if they had not been changed by any of the processes accessing them. Some objects are placed in-use for both read only and read/write functions. Prior to this invention, when an object was not placed in use for read/write purposes, but had been put in use solely for read only operations, and the system failed, restart processing was not able to distinguish between the read only and read/write in-use statuses, and hence read only objects underwent recovery processing needlessly. The in use manager avoids this inefficiency by keeping track with a separate bit (update flag 68) in the table entry which will indicate if the object had ever been placed in use for read/write purposes. This bit indicates whether or not the object should be recovered should the system fail. When ensuring individual entries, the surrounding page(s) containing the entries are written to auxiliary storage. Several methods are used to reduce contention caused by I/O operations which are pending. One method involves copying the contents of the real storage frame of main memory to a clone frame. One is then free to access the clone frame directly, while the original storage frame is available to be written out to secondary storage. Other techniques developed to minimize I/O contention include placing an I/O pending bit in the page allocation bitmap to keep track of which pages were currently being written to storage. The FIND.sub.-- Unallocated.sub.-- page procedure described earlier is used to search for pages which are not full and which are not currently being written to auxiliary storage thereby avoiding the need to clone a page frame as described above. An encoding was developed which allowed a Translate and test instruction to efficiently search a bit string where two adjacent bits represent I/O pending status and allocation status of each page. In one scan of the bit-map the proper free and available page are selected. In a further preferred embodiment, since there are many entries per page, it is desirable to reflect that all neighboring entries have been written to auxiliary storage by a page-out operation intended for a specific entry. The idea is to turn on a bit in an entry when the entry is first allocated. A one value indicates that the entry has not been written to auxiliary storage. When a page-out operation is performed, a storage management exit is invoked which turns this bit off for each entry on the page being written. This prevents the need to turn off the bit outside of storage management since doing so would mark the page as changed, and thus trigger yet another page I/O. The Storage management exit must recognize that the page-out is for table entries of the in-use table. The recognition was made easy on the Application System/400 because the in-use table is recognized by it's virtual address. A further method of reducing contention involves the use of shadow pages. In the absence of the above mentioned clone method, I/O contention is reduced by having two tables which mirror one another. The first table is used for reading and updates. The first table is never explicitly written to auxiliary storage at the request of the in use manager. All updates are copied from the first table into the shadow. When the table entry must be forced, then the shadow portion is written to auxiliary storage. This leaves the first table available for unlimited references and high levels of concurrency irrespective of the I/O traffic. If updates are made to the first table which do not need to be ensured, then the copying of the update may be deferred until an update to the page is made which needs to be ensured. Contention is limited to the times when updates are being copied from the first table to the shadow. While the present invention has been described in terms of a preferred embodiment, it will be recognized that alternative embodiments are within the scope of the invention as claimed. The benefits of identifying slots without serially searching could be achieved by methods other than the particular hash described. There are also alternatives to the atomic operations described. Further, in multiprocessor environments, the in use table may be split between processors, or reside entirely on one processor, dependent on the addressing schemes employed.
|
Same subclass Same class Consider this |
||||||||||
