Method for deallocating a log in database systems5832508Abstract A SQL database server system having an enhanced logging system is described. The logging system implements a "private log cache" (PLC) for reducing the contention on the system's "log" resource (which is protected by a log semaphore). An area of memory private to a user's task is set aside as a PLC--a cache where log records are built and stored before being posted to the log. Each PLC may hold multiple log records for a single transaction before they are flushed to the log (page chain) through the log semaphore. When a transaction commits or the memory fills with log records, the PLC associated with the transaction is flushed to the log. Also described is improved log deallocation methodology provided by the system which alleviates the needs to read each of the log pages to be deallocated, as was conventionally required. The operation to do the deallocation occurs at the allocation page, not at the individual pages (which are to be deallocated). In accordance with this method, the system writes a log record which includes the first page and the last page of the run (i.e., the number of those pages). The system need not, however, read any of the pages of the run, other than the first page and the last page. In a common case of a completely filled allocation unit, the system reads two pages--the first page and the last page--and writes one log record. As a result, the system reads far fewer log pages and writes but a single log record. Claims What is claimed is: Description The present application claims priority from and is a continuation-in-part of provisional application Ser. No. 60/025,880, filed Sep. 18, 1996 pending, the disclosure of which is hereby incorporated by reference.
__________________________________________________________________________
/*
** XXS - XLS Session descriptor
*/
typedef struct xxs
/* Link for active transaction list - MUST BE FIRST */
LINK xxs.sub.-- activelist;
/* External data to XLS - none */
/* Internal data */
/*
** This set remain constant once the XXS object
** is attached to the XDES object.
*/
struct xdes
*xxs.sub.-- xdes;
/* Pointer to attached XDES */
struct pss
*xxs.sub.-- pss;
/* Pointer to task's PSS
** (housekeeping)
PLCSTATE
*xxs.sub.-- plc;
/* Pointer to task's PLC */
/*
** This set remain constant during a transaction.
*/
struct dbtable *xxs.sub.-- dbt; /* Database for transaction */
/*
** This set varies over the life of a transaction.
*/
struct sdes
*xxs.sub.-- syslogs.sub.-- xact; /* SDES for SYSLOGS for xacts
*/
uint32 xxs.sub.-- state; /* State of XLS session */
int xxs.sub.-- plclrn; /* LRN id of first lr in PLC */
int xxs.sub.-- count; /* Total number of lr's written */
circ.sub.-- long
xxs.sub.-- flushseq; /* Current dirty seq. for flush */
int xxs.sub.-- error; /* Error for last failure */
struct sdes
*xxs.sub.-- syslogs.sub.-- scan; /* SDES for SYSLOGS for SCANs
*/
XLSCAN *xxs.sub.-- scan; /* Current scan structure */
uint32 xxs.sub.-- reqmarkers;
uint32 xxs.sub.-- nlreqmarkers;
XLRMARKER
xxs.sub.-- markers ›XLSM.sub.-- NUM.sub.-- MARKERS!;
}XXS;
__________________________________________________________________________
The XXS data structure references a private log cache data structure or "PLC object," as shown at 365. This controls the region in memory where the log records are actually cached. In an exemplary embodiment, the PLC object (an object of type PLCSTATE) may be constructed as follows.
__________________________________________________________________________
/*
**
PLCSTATE - PLC object
**
**
The PLCSTATE holds the state information for the PLC itself.
**
Each PLCSTATE owns one PLC.sub.-- DATA object, which is where log
records are
**
actually cached.
**
**
XXS <-> PLC synchronisation
**
** plc.sub.-- xkeeper -
Identifies the the XLS session (via the internal
** descriptor (XXS *)) that keeps the PLC. This field
** may only be changed by the task that owns
** the XLS session and only when holding the PLC
** lock.
**
** This means that the keeper of the PLC can read
** this field without any synchronisation. All other
** tasks must grab the PLC lock to see a stable value for
** this field.
**
** The PLC at any time is kept by at most one XLS session.
** If an XLS session does not have the PLC kept
** (plc.sub.-- xkeeper is NULL), then there can be
** no log records in the PLC. The thread owning
** the PLC may avoid locking the PLC if the current XLS
** session does not have the PLC kept. The assumption
** is that other threads have no interest in the PLC
** when it is empty.
**
** plc.sub.-- lockcount Can only be read and modified under the PLC
spinlock.
*/
typedef struct plcstate
/* External data to XLS - none */
/* Internal data */
/*
** This set remain constant once the PLCSTATE object
** is attached to the PLC.sub.-- DATA object.
*/
struct spinlock *plc.sub.-- spinlock; /* Spinlock for PLC */
BYTE *plc.sub.-- start;
/* Start of PLC.sub.-- DATA */
BYTE *plc.sub.-- end;
/* End of PLC.sub.-- DATA */
int plc.sub.-- maxpagecount;
/* Maximum number of
** log pages required
** to flush the PLC.
*/
/*
** This set is variable during the life of the PLCSTATE
*/
int plc.sub.-- lockcount;
/* PLC semaphore count */
int plc.sub.-- waiters;
/* # of waiters for PLC lock */
int plc.sub.-- maxlength;
/* max len of next log rec */
int plc.sub.-- count;
/* # of log records in PLC */
int plc.sub.-- logspace.sub.-- free;
/* number of bytes still
** reserved in log through
** the threshold manager.
*/
/*
** plc.sub.-- offset
**
** if the PLC is not being flushed then plc.sub.-- offset is the
** address where the next log record will be placed in the PLC.
**
** Otherwise plc.sub.-- offset is the address of the next log record
** to be removed from the PLC.
*/
BYTE *plc.sub.-- offset;
struct xxs
*plc.sub.-- xkeeper;
/* Current XXS keeper of PLC */
struct pss
*plc.sub.-- locker;
/* Pss of task locking the PLC */
struct syb.sub.-- buf *plc.sub.-- bufhead; /* Header of pinned buffer
chain
*/
}PCSTATE;
__________________________________________________________________________
Although the PLC is shown inside the XXS data structure, in a preferred embodiment the XXS data structure would instead store a pointer or handle referencing the PLC structure or object. The XXS data structure 360 also stores or is associated with an array of transaction log record (XLR) markers 370. In a preferred embodiment, the array 370 comprises a fixed array (size=6) XLR markers. Each array entry, which serves as a pre-defined marker pointing to a particular part of the transaction, may be constructed from the following definitions.
__________________________________________________________________________
/*
**
XLRMARKER - Log Record Marker - OPAQUE to XLS clients.
**
**
Clients of the XLS must not assume or have knowledge of
**
any of the below.
**
**
Internally a XLRMARKER has two formats, non-permanent and permanent.
**
**
In the non-permanent form the marker is the numeric
**
order of the log record with the first being XXS.sub.-- LRN.sub.--
FIRST (==1)
**
This is stored in u.sub.-- xlr.sub.-- marker.
**
**
In the permanent form the marker is the RID of the log record
**
(ie. once it has made it to the log page chain). This is stored
**
in u.sub.-- xlr.sub.-- rid.
*/
typedef struct xlrmarker
union
{
struct
{
pgid.sub.-- t fakepage;
int32 u.sub.-- xlr.sub.-- marker;
}s;
RID u.sub.-- xlr rid;
}u;
} XLRMARKER;
/* Markers stored in XXS */
#define
XLSM.sub.-- FIRST 0
/* First log record of transaction */
#define
XLSM.sub.-- LAST 1
/* Last log record of transaction */
#define
XLSM.sub.-- CMDSTART
2 /* Client defined CMDSTART */
#define
XLSM.sub.-- ABORTMARK
3 /* Client defined ABORTMARK i.e.
** a place in log to rollback to
*/
#define
XLSM.sub.-- LASTSAVE
4 /* Client define LASTSAVE
** i.e. a savepoint in log to rollback to
*/
#define
XLSM.sub.-- LASTINPAGECHAIN
5 /* Last log record added to page
chain */
#define
XLSM.sub.-- NUM.sub.-- MARKERS 6
/* Number stored in XXS */
/* Markers not stored in XXS */
#define
XLSM.sub.-- SCAN (-1)
/* Current log record of log scan
#define
XLSM.sub.-- OLDESTXACT
(-2) /* Oldest active transaction */
#define
XLSM.sub.-- LOGLAST
(-3) /* Last log record in log page chain */
#define
XLSM.sub.-- LOGFIRST
(-4) /* First log record inlog page chain
*/
__________________________________________________________________________
Client code refers to the markers by indexing into the array. The first three markers are of particular interest to the present invention. The "first" XLR marker 371 points to the first log record associated with a transaction. In a corresponding manner, the "last" XLR marker 372 points to the last log record associated with the transaction. Finally, the "command start" XLR marker 373 points to the log record associated with the first data access/manipulation command of the current SQL statement (e.g., SQL INSERT command). As indicated by the definition for XLRMARKER and illustrated in FIG. 3B, each XLR marker can take one or two forms. In a first form, the marker stores a pointer 381 to a log record while the log record is in a private log cache. In an alternative form, the marker stores a "RID"--a record ID--as shown at 391. In this latter form, the marker points to the log record in the page chain by referencing a particular ID of the page together with a row number on that page where the record is located. The index stored by pointer 381, on the other hand, simply is the ordinal number of the log record within the transaction; it is used to locate a record within the PLC. 3. Two-Phase "Pinning" And "Unpinning" Recall that the "log pin" represents where in the log page chain log records must be flushed to disk before the corresponding data page can be flushed. Besides this mechanism, the present invention introduces a second mechanism. Because log records can reside in a (user) private log cache (i.e., per "user" connection), a mechanism is needed to ensure that such records are flushed (to the log page chain). If a change which affects a data page is represented in a private log cache, the PLC pin for that page will be instantiated and will point to the corresponding PLC.sub.-- DATA data structure (through the xdes data structure). In this manner, when such a data page is written, the system first unpins the PLC pin, thus causing the corresponding PLC.sub.-- DATA data structure to be flushed to the log page chain. Now, the data page will get a pin to the log, for indicating where in the records must be flushed in order to write the data page. The addition of the second pointer to the data page buffer provides a two-phase unpinning mechanism. Unpinning a page now requires that the corresponding private log cache (i.e., the one being pointed to by the pointer stored in the page) be first flushed to the log page chain. Secondly, the in-memory log itself is flushed to disk (up to the point in the log required for writing that data page). C. Methods Of Using Private Log Caches Operation of the internal methods in the system of the present invention are perhaps best described by way of example. The following description will focus on various logging scenarios and describe changes which occur to the core data structures throughout these scenarios. 1. EXAMPLE #1 FIGS. 4A-E illustrate operation of the system for a transaction which includes an INSERT operation affecting two data pages. FIG. 4A represents the state of the logging data structures (400a) at the outset, when the transaction first begins (i.e., corresponding to the point of processing an SQL statement of BEGIN TRAN). Here, a "begin transaction" log record (BEGINXACT) is posted to the corresponding private log cache, as indicated at 401. Further, the "first" XLRMARKER is now instantiated to point or index into this first log record, as shown at 403. The "last" XLRMARKER also points at this record. Other pointers at this instance either point to themselves or store NULL values, as indicated by "X". At the point in time indicated by FIG. 4A, no actual work has begun. Note that in contrast to prior systems, however, the truncation point in the log (page chain) is not tied up. Specifically, the "begin transaction" log record is not "sitting" in the log page chain but, instead, resides in the private log cache. Further, the system includes at this point an additional flag for indicating that no user data resides in the private log cache. If for some reason the cache needs to be flushed (e.g., because of another transaction), the system knows as a result of the flag that there is not useful user data to be flushed from the private log cache, since it only contains a single "begin transaction" record. In an exemplary embodiment, the flag is maintained by the XXS context descriptor data structure. FIG. 4B illustrates a point in time when actual work has begun. For this example, the transaction has undertaken an INSERT (e.g., as a result of an INSERT SQL statement). The logging data structures, now illustrated as 400b, change as follows. A new log record, INSERT DP1, is appended to the private log cache, as shown at 411. This represents an "insert" log record which indicates that an INSERT operation is occurring at data page 1, as shown at 412. Further, the "last" marker and the "command start" marker are instantiated as follows. The "last" marker now points to this newly-added log record, as indicated at 413. This last marker now indicates the last log record written by the transaction (at this point in time). At this point, the "command start" marker stores a pointer or reference to the INSERT log record as indicated at 414. The "command start" marker now also points to the newly-added log record, but for different reasons. Here, the log record indicates the start of a command statement, here, an SQL INSERT statement. This information is useful in the event that the system needs to roll back the statement: it can roll back the effects which have occurred to the start of the statement. In addition to data being inserted into data page 1, the page is "pinned." Here, the PLC pin is instantiated to point to the private log cache, as indicated at 415. In the currently-preferred embodiment, the PLC pin stores a reference to the xdes data member which, in turn, stores a reference to the XXS data member which, in turn, pins the page to the PLC data structure. Proceeding with the example, suppose an insert occurs on data page 2. This may occur, for instance, as a result of insertion of a data record having a key value which requires that the data record be stored on data page 2. This may occur, for instance, because of an existing clustered index requiring particular records to be inserted into particular data pages (based on key value). As a result, another "insert" log record is appended to the private log cache, as shown at 421 for log data structures 400c. This represents the insertion of data into data page 2, as indicated at 422. Data page 2 is now also "pinned" to the private log cache, as indicated at 423. Specifically, it stores a reference to the xdes data member which controls the XXS data member. Finally, the XLRMARKERs are updated by setting the "last" marker to point to the newly-added "insert" log record, as indicated at 424. FIG. 4D illustrates the state of the data structures (now 400d) at the time when the transaction is to be committed (i.e., corresponds to SQL COMMIT statement). As shown, an "end transaction" record (ENDXACT) is appended to the private log cache, as shown at 431. Now, the "last" marker is updated to point to this newly-added log record, as indicated at 432. At this point, the user is trying to commit the transaction. For the system, this means that all the log records for the transaction must go to disk. FIG. 4E illustrates the state of the logging data structures (now 400e) at the point when the log records have been flushed to the log page chain. FIG. 4E therefore represents a continuation of the events which occur at commit time. As indicated at 440, the log records have been copied from the PLC to the page chain. During this copy operation, the log semaphore is held, as indicated at 441. As each log record is copied, if there is a marker that points to that record then the marker is updated to now store a RID (record ID) value, for indicating the new location of the log record in the log page chain. Recall that each RID value comprises a page ID and row number. Each row number itself points to an entry in a row number table or directory (e.g., row number table 443). Each row number table entry, in turn, points to a particular offset within the page where the row begins. As the BEGINXACT log record is copied to the log page chain, the "first" marker is updated, with a RID value, for pointing to the corresponding location of the log record in the page chain, as indicated at 445. In a similar fashion, the "last" marker is updated to point to the ENDXACT log record in the page chain, indicated at 446. The "command start" marker, similarly, now points to the "insert" log record in the page chain, as indicated at 447. The three markers have all been updated to point to their respective log records, now stored in the log page chain. In addition to the changes to the log records and the markers, the buffers are updated as follows. Each buffer is "unpinned" from the PLC structure and then "pinned" to the log page chain. The buffers for data pages 1 and 2 are "unpinned" from the PLC structure and then "pinned" to the last log page storing the flushed log records, as indicated at 452. Therefore, in a manner similar to that done for the markers, the buffers are now pinned to the log page chain. 2. EXAMPLE #2 The next example illustrates processing when the private log cache reaches a state where it is full. FIG. 5A illustrates the state of logging data structures (500a) at the point which a transaction has begun and a command to insert a large row (i.e., "big" INSERT) has been processed. As illustrated at 501, the insert is sufficiently large that it has almost filled up the private log cache. For this example, another insert will occur which cannot be immediately accommodated by the private log cache. FIG. 5B illustrates the logging data structures (500b) when the additional insert has arrived. Now, since the private log cache cannot accommodate the new insert, the previous "big" INSERT (log records) need to be flushed from the private log cache to the log page chain. As illustrated in FIG. 5B, the log records are copied to the log page chain, indicated at 511. This is done under the control of the log semaphore, as shown at 512. In a manner similar to that described above, the XLRMARKERs are updated to now point to their respective log records as they reside in the log page chain. Also, the buffer (buffer for data page 1) is "unpinned" from the private log cache and "pinned" to the corresponding log page chain, as indicated at 515. FIG. 5C illustrates the state of the logging data structures (500c) at the point of processing the new insert. Now, the "insert" log record for the data insert on data page 2 is appended to the private log cache, as indicated at 521. The "first" marker points to the log record in the log page chain, as indicated by 522. The "last" marker points to the new "insert" log record, as indicated at 523. Additionally, the "command start" marker points to the new "insert" log record, illustrated at 524, since the insert is associated with a different statement (i.e., different SQL INSERT statement). At this point, data page 1 is pinned to the log, as indicated at 525. Data page 2, on the other hand, is pinned to the private log cache, as indicated at 526. Continuing with the example, FIG. 5D illustrates the state of the logging data structures (500d) at the point when the transaction attempts to commit (i.e., COMMIT SQL command statement). The corresponding ENDXACT log record is appended to the private log cache, as indicated at 531. As this represents the last log record, the "last" marker is set to point to it, as indicated at 532. FIG. 5E illustrates the state of the logging data structures (500e) at the point where all logging records are flushed to the page chain. As illustrated in the figure, both data page buffers are now pinned to the log page chain. 3. EXAMPLE #3 FIGS. 6A-E illustrate processing of the logging data structures for the scenario where a data page used by a pending transaction is written from memory, for instance, when the page is written from memory by a virtual memory manager, because the transaction is inactive. Because the data page is being written to disk, it is necessary for the buffer for that page to undergo the unpinned process. The private log cache itself operates under control of a lock so that another transaction, if appropriate, can lock the structure and flush the cache (e.g., for memory management purposes). As shown for logging data structures 600a in FIG. 6A, the buffer for data page 1 is pinned to the private log cache, since log records reside there. Suppose now that the transaction using the page, transaction T1, is inactive for some period of time, thus, causing the system to swap out the page. The act of unpinning of the page's buffer causes the private log cache to flush the log records to the log page chain. This is illustrated for the logging data structures (now 600b) in FIG. 6B. As shown at 611, the log records have been copied to the log page chain; the private log cache is now empty. Accordingly, the markers are updated (with RID values) for referencing their respective log records as they reside in the log page chain. Finally, FIG. 6B indicates that the buffer for data page 1 is now pinned to a log page, as indicated at 612. FIG. 6C illustrates logging data structures (600c) when the transaction T1 again resumes activity. Note here that the log records are not brought back into the private log cache. Instead, they are simply left in the log page chain. As the transaction continues, however, new log records are posted to the private log cache, such as the log record at 621 indicating a new insert on the data page. FIG. 6D illustrates the state of logging data structures (600d) at the point when the transaction is to commit. Now, the "last" marker is updated to point to the ENDXACT log record, as indicated at 631. The system is now ready to flush the log records. This is illustrated in FIG. 6E, for logging data structures 600e. Specifically, all remaining log records have been flushed to the log page chain, as indicated at 641. The buffer for data page 1 has been unpinned from the private log cache and pinned to the log page chain, as indicated at 642. Finally, the XLRMARKERs which pointed to log records in the private log cache are updated (with RID values), as indicated at 643 and 644, for pointing to the log records as they reside in the log page chain. 4. EXAMPLE #4 FIGS. 7A-C illustrate a final scenario where a client requests immediate flushing of log records. This request is referred to herein as a "single log record" (SLR) transaction. A common example of such a transaction occurs during a request for page deallocation (e.g., occurring in response from a DELETE FROM SQL statement). FIG. 7A illustrates the state of logging data structures (700a) at the point where page deallocation is requested. As shown, the private log cache stores a "begin transaction" log record and a "page deallocation" log record. The XLRMARKERs point to appropriate log records, in the manner previously described. Next, flushing of the log records to the log page chain occurs. This is illustrated in FIG. 7B, for logging data structures 700b. The log records have been flushed to the log page chain, as indicated as 711. The XLRMARKERs have been updated to now point to the log records as they reside in the log page chain. Even though an event has occurred which requires immediate flushing of log records, the private log cache accommodates the scenario and then continues on supporting the transaction. As illustrated in FIG. 7C for logging data structures 700c, data has been inserted into data page 1, shown at 721, thus leading to an insert log record being appended to the private log cache, as indicated at 722. The "last" marker and the "command start" marker point to this new log record. The "first" marker, on the other hand, continues to point to the BEGINXACT log record, as it resides in the log page chain. This example serves to illustrate, therefore, that the transaction continues to use the private log cache despite the fact that certain log records associated with the transaction have already been flushed. Backup system A. Glossary For the purposes of this discussion the following terms will be defined: An ALLOCATION PAGE is a descriptor within the database or file system which describes a block of PAGES in a database. In one embodiment an ALLOCATION PAGE will contain 32 EXTENT DESCRIPTORS. An ALLOCATION UNIT is a block of 256 database pages described by an ALLOCATION PAGE. In one embodiment, an ALLOCATION UNIT will contain 32 EXTENTS. An ARCHIVE DEVICE is an I/O device used for making bulk copies of on-line data. An ASSIGNMENT OF A PAGE is the creation of a mapping of a logical page to a stripe. This mapping occupies an element in a STRIPE DIRECTORY. A BACKLOGGED DEVICE is a device which has a high number of pending requests for work compared to other devices available for service. A CURRENT ELEMENT NUMBER is a temporary variable used in determining whether the STRIPE NUMBER for a given logical page has been located. Given the stripe number the invention can refer to the corresponding archive device to which it will write the page. A DATABASE DISK PIECE is a distinct contiguous interval of database disk blocks. In Sybase.RTM. databases, the sysusage table identifies these intervals of disk blocks. A DIRECTORY ELEMENT is a slot in the STRIPE DIRECTORY that expresses the number of EXTENTS assigned to a single STRIPE for a specific ALLOCATION PAGE. A DUMP START RID identifies the most recent transaction log record as of the start of the backup session. The invention uses it to establish the start of the region of the transaction log that will contain records for transactions which commit while the backup session proceeds. An EXTENT is a contiguous interval of disk pages the identifier of whose first page is an integer multiple of the size of the interval. In one embodiment an EXTENT contains 8 PAGES. An EXTENT COUNT is the number of EXTENTS assigned to a STRIPE. An EXTENT DESCRIPTOR is an entry contained in an ALLOCATION PAGE which describes an EXTENT within the corresponding ALLOCATION UNIT. An EXTENT GROUP is an interval of EXTENTS assigned to a STRIPE. A FREE EXTENT is an EXTENT which is not accounted for as belonging to any object within the database or file system. A FREE EXTENT is available for allocation to an object and filling with new data. A LOGICAL PAGE is a page which belongs to a Sybase database. Each logical page has a unique integer identifier called a logical page number. The MAXIMUM STRIPE DENSITY is the smallest number of stripes which must be used when determining the STRIPE AFFINITY for a set of EXTENTS. A PAGE is a fixed-size, addressable container of data in a computing system. Pages generally begin at locations, or addresses, that are integral multiples of the page size. Generally, to access a page, one gives its number or address to the computing system. A PAGE TIMESTAMP is a field within a logical page which indicates the time of the last modification to that page. A PARTIAL EXTENT is a RESERVED EXTENT which contains at least one PAGE available for allocation. A PHASE is a time interval within a backup session during which only database pages having specific properties may be dumped. Under Sybase.RTM. Backup Server software Release 10.0, three phases are defined. These phases elapse within a single session and execute as: 1) dump all allocated database and log pages; 2) dump all pages which changed during Phase 1 in a way that cannot be completely reconstructed from transaction log information; and 3) dump all log pages that have existed continuously between the beginning of Phase 1 and the end of Phase 2. Phases start when the SQL Server sends an RPC to the Backup Server initiating a phase. Phases end when all of the I/O initiated during the phase has physically been completed. The PHASE BOUNDARY is the point in time that marks the beginning or end of a PHASE. PHASE OPTIONS are a list of qualifiers provided to the RPC to indicate that a new phase should begin. PHASE OPTIONS are used by the SQL Server to tell the Backup Server how to perform dump I/O for the phase. One PHASE OPTION is BS.sub.-- YOUSCAN which tells the Backup Server to read the ALLOCATION PAGES directly from the database, perform STRIPE AFFINITY calculations, generate lists of DATABASE PAGES to dump to their corresponding STRIPES, and finally perform the STRIPE I/O specified in the lists. When all STRIPE I/O from the lists is physically complete, the Backup Server signals completion of the begin-phase RPC to the SQL Server. The SQL Server then sends another RPC to mark the end of that phase. In a non-optimized situation, if the SQL Server does not specify BS.sub.-- YOUSCAN, the Backup Server generates a default set of STRIPE DIRECTORIES, the contents of which do not depend on ALLOCATION PAGE contents. The SQL Server then sends lists of pages for dumping to the Backup Server, instead of the Backup Server generating the lists internally as for it does for BS.sub.-- YOUSCAN. The SQL Server then sends another RPC to mark the end of that phase. A REMAINDER STRIPE is the STRIPE to which all EXTENTS unaccounted for in the DIRECTORY ELEMENTS for the ALLOCATION PAGE are to be assigned. A RESERVED EXTENT is an EXTENT that belongs to some database or file system object. AN RPC is a Remote Procedure Call which is a service that allows one process to call a software procedure in another process. The processes may be running on separate machines. The SESSION HANDLER is the process created within the Backup Server in response to a connection from the SQL Server. This process controls the progress of a DUMP or LOAD session. A STRIPE is an archive device denoted by its position within an ordered set of archive devices in concurrent use. Usage of STRIPES involves writing or reading a collection of PAGES to or from a set of archive devices so that the devices operate concurrently and performance is increased. STRIPE AFFINITY is a method for ensuring the integrity of database backups made to multiple backup devices simultaneously. Stripe affinity permits reloading backups from fewer devices than used to make the backup. The STRIPE DENSITY is the number of stripes used to dump a given block of disk pages, such as an ALLOCATION UNIT. The STRIPE NUMBER is the numeric position of a STRIPE within an ordered set of STRIPES. The STRIPE DIRECTORY is the data structure that expresses the number of EXTENTS assigned to each of the STRIPES for a specific ALLOCATION PAGE, a collection of DIRECTORY ELEMENTS plus a REMAINDER STRIPE FIELD. VARIABLE WIDTH DIRECTORY ELEMENTS allow the allocation of smaller stripe directory elements as the number of stripes increases beyond a pre-defined point. B. SQL Backup Server Referring first to FIG. 8, a typical operating environment for a Database Server and Backup Server provided in accordance with the invention is shown. For purposes of this description, it is assumed that the database engine or server is a SQL Server, Release 10.0, such as the type currently available from Sybase, Inc. of Emeryville, Calif. However, the architecture of the improved data backup system may be applied to any suitable transactional database or transactional file system which requires the benefits provided by the instant invention. In the preferred embodiment, the backup system of the invention coordinates the transfer of information between a Backup Server 810 and an SQL Server 812 to produce a recoverable database dump using a special dump synchronization protocol. The dump synchronization protocol may run on a controller (not shown) and control the operation of both the Backup Server and the SQL Server. The controller may also be integrated into the architecture of the Backup Server 810, the SQL Server 812 or both. Both the Backup Server 810 and the SQL Server 812 have access to database devices 816. In addition, the Backup Server 810 can address multiple archive devices 814. The dump synchronization protocol ensures recoverability of archived data by organizing the dump operation from the SQL Server 812 to the Backup Server 810 into a set of phases. In addition, through the use of a number of phase options, the Backup Server 810 is capable of determining the best strategy and order for moving database pages stored on database device 816 from the SQL Server 812 to the Backup Server 810 in order to expeditiously and efficiently archive the contents of the database. Since one of the strategies for increasing the performance of the backup system is to enable unsynchronized database dumps, it is preferable that once a dump is initiated the Backup Server 810 and the SQL Server 812 perform only minimal or no synchronization of access to those database pages on the target database device 816 which are to be backed up. This is advantageous over currently available systems since by providing for such unsynchronized database dumps, the full bandwidth of the available I/O subsystem may be utilized without being locked to SQL Server events. Referring next to FIG. 9, the basic synchronization scheme used in the database dump of the instant invention is shown. As can be seen, dumps are ordered into phases. In Phase 1, a user initiates a request to dump an image of the database to an archive device. The SQL Server blocks the initiation of any other dumps of the target database 920 and records a dump START RID 922. After recording the dump START RID, the SQL Server signals the Backup Server to begin dumping 924. At this point, the Backup Server begins its dump utilizing begins its dump utilizing the fully available I/O bandwidth 926. A transaction list is also built at this point including a flush list 928. When this initial Phase 1 dump is completed, the Backup Server signals the SQL Server that Phase 1 of the dump is completed 930 thereby creating a baseline dump of all the pages which will need to be recorded in the backup database. It should be noted that the backup system of the instant invention is especially well suited to operation in a transactional database environment where it is necessary to update database pages which have already been backed up during some part of Phase 1, but which may have then changed while another part of the Phase 1 backup was in effect. This rewriting of information takes place during Phase 2 during which pages which have been allocated within pending or committed transactions following the time that the dump START RID was received are dumped again. As noted, it is necessary to dump these pages again because an allocation to a transaction tracking log or a page split which takes place during Phase 1, but after the corresponding allocation unit has been dumped, would not otherwise be recoverable. Under the architecture of the instant invention, it is sufficient to re-dump only those pages which have changed, because those pages will contain information created later in time and will therefore overwrite any earlier recorded data with more current data during a restoration session. A useful way to perform this task without limiting the throughput of the SQL Server during Phase 2 is to keep track of all physical changes made to the database which cannot be completely recovered from log information. These are changes which have not yet been committed. These physical changes include, but are not limited to, page splits from B-tree index updates and new pages generated by index creation. During Phase 1, the SQL Server will maintain a list of pages allocated for these purposes, and this list is known as a flush list 928. As can be seen in FIG. 9, at the beginning of Phase 2, the SQL Server blocks any tasks attempting to add to the flush list before they issue the corresponding log records 932. The preferred order is to (1) log the change; (2) flush the page; and (3) add the page number to the flush list. This is necessary since without such blocking those log records could not be redone. The SQL Server then determines an end point for the flush list and sends it to the Backup Server 934 while awaiting acknowledgment from the Backup Server that pages in the flush list have been dumped. The Backup Server then dumps those pages 936 and returns a completion message to the SQL Server 938 indicating that Phase 2 has been completed. Turning next to Phase 3, as can be seen in FIG. 9, the SQL Server handles all the log pages which have been allocated since the start of Phase 1 . The records all fall between the dump START RID and the current last record of the log. All other allocations and data are recoverable from this log. The SQL Server captures the current last record of the log 940 (called the dump instant) and constructs a list of all log pages between the dump START RID and the dump instant. It ensures that all those pages have been written to the database and then sends a list of log pages to the Backup Server for dumping 946. Finally, the flush list is discarded and deactivated which has the effect of reactivating tasks waiting on the flush list. When the Backup Server signals the end of Phase 3, the SQL Server permits new dumps once again 950. It is noted that the incorporation of three (3) phases is not critical to the invention, as Phase 2 is necessitated due to the way that the Sybase.RTM. SQL Server program (in its current release) performs transactional logging. Rather, at a minimum only Phase 1 (the data phase) and Phase 3 (the log phase) are required. For example, in a transactional data repository system, such as a database or transactional file system, it is only necessary to perform an unsynchronized copy of the repository in Phase 1 , followed by a synchronized copy of the transaction log in Phase 3 (which would be the second actual phase in this example). As noted, the function of a later phase is to capture recovery information for all the changes which have occurred during or subsequent to an initial phase but prior to a secondary phase. In the implementation under the Sybase SQL Server Release 10.0, it is necessary to have a Phase 2 for flushed pages due to the particular Sybase design. However this flushed page concept may not exist in other embodiments and, consequently, a data phase and a log phase alone will suffice. Utilizing the above process, the use of phases provides for recoverable dumps because the phases are designed to guarantee that at a future load time, every page is restored with the most recent image taken before the dump instant. For this guarantee to hold, the Backup Server must enforce the condition such that for any Phase P, where P is greater than 1, and any page p, all images of page p from Phases 1 through P-1 must be loaded before any image from Phase P is loaded. In other words, the system must enforce a condition that data is saved and then reloaded in time sequence order such that later transactions will overwrite earlier transactions, thus maintaining the integrity of the database. Since it is desirable to use multiple archive devices in order to increase system throughput and thereby reduce backup time, several basic ways of meeting the above noted conditions are possible. The first approach is to ensure that all images of any page p always appear only within the volumes of a single stripe. The second approach is to physically require the loading of all Phases 1 through P-1 before any later Phase P. Finally, the third approach is to keep track of page time stamps, which tell when a page has been created, in memory, in order to prevent overwriting a later image of a page with an older image of the same page. By analyzing these options, it can be seen that the third approach, while functional, is not practical for large databases since large amounts of memory are required to keep track of the numerous page time stamps. For example, a one-terabyte database composed of two-kilobyte pages, each of which contains a four-byte time stamp, would require two gigabytes of main memory to store all the time stamps. The second approach of physically requiring the loading of all Phases 1 through P-1 before any later Phase P, is also not optimal since in a situation where a set of dump volumes are loaded from fewer archive devices than were used at dump time, it is necessary to change volumes at each Phase boundary on the volumes so that all Phases 1 through P-1 are loaded before any Phase P. As noted, earlier, this is necessary because without such changes, a secondary Phase image of a restored page might be overwritten by a primary phase image from a subsequently loaded stripe volume, thus erroneously backdating that particular page in an unrecoverable manner. While this backdating may be overcome through manipulation by a backup operator, it is too inconvenient to require the user to manage this aspect of Phase ordering while performing backups. Therefore, it has been determined that option 1 provides an optimal strategy for ensuring that for any Phase P where P is greater than 1 and any page p, all images of page p from Phases 1 through P-1 are loaded before any image from Phase P is loaded. This approach is named stripe affinity and is defined as the property that every image of a page in a database appears on exactly one stripe at dump time. Stated another way, the purpose of stripe affinity is to track the disposition of each page in the database or file system so that if a subsequent request comes in to dump a page (for example, in a later phase), the stripe affinity algorithm can determine on which stripe the page should be dumped. This property removes the volume changing associated with the second approach since all the images of any page follow each other physically, as well as chronologically, on the dump volumes of a stripe and so are automatically restored sequentially, in time order, resulting in the latest version of a page always being restored last. It also addresses the memory usage limitations associated with the third (time stamp) approach illustrated above. In other words, using stripe affinity every image of a page will appear on the same stripe and will always appear sequentially in forward time order. A pseudo code listing of a stripe affinity algorithm which may be used is provided in Appendix A of commonly-owned U.S. Pat. No. 5,515,502, issued May 7, 1996; the disclosure of the patent is hereby incorporated by reference. Referring to the phases shown in FIG. 9, during a primary phase (or Phase 1 ) each dumped page is considered for a stripe affinity assignment once and only once. In addition, each dumped page is not seen in any preceding phase because under the invention a database disk piece may only be mapped to a single primary phase. Therefore, under the invention the Backup Server economically remembers the stripe to which a page is dumped by recording this information in a space efficient manner, using as little memory as possible, as well as in a manner which enables the Backup Server to access the information in a time efficient way. In order for this to take place, two primitive operations must be optimized for stripe affinity. These operations are (1 ) assigning a page seen for the first time to a stripe and remembering that assignment; and (2) determining the assignment of a page which may have been seen and previously assigned in a primary phase. In order to meet these two goals, the Backup Server must have the ability to spread I/O requests across all database disks in parallel in order to make full use of the available I/O bandwidth. However, it is preferable that the actual assignments of pages to archive devices be limited to a granularity somewhat coarser than individual pages since it is desirable that the memory used to account for stripe assignments be kept small. In other words, it is desirable to store more than one page at a time on a particular stripe in order to reduce memory needed to reference stripe assignments, since the Backup Server is oriented towards backing up large databases. As such large databases grow, the memory needed to track stripe assignments grows as well, so it is important that the growth be controlled in a way that will not tend to exhaust the memory resources of the computing system performing the backup. In the instant invention, pages are assigned a stripe affinity only during a primary dump phase or Phase 1. The pages may then be dumped in any order, but all the pages from a previous phase must be dumped before any page from the next phase is dumped. A similar rule applies for loads in that the pages in a phase may be loaded in any order, but all the pages from a previous phase must be loaded before any page from the next phase is loaded. Since in the dump sequence pages for later phases may arrive at the Backup Server before that phase has begun, the Backup Server must save these pages and dump them when the proper phase to which they belong begins. In a preferred embodiment, phases are identified by number and later phases may have higher numbers than earlier phases. Phases are not nested; the previous phase must end before the next phase begins. Additionally, although the invention is described with respect to communications between the SQL Server and a Backup Server, the invention does not require two separate programs. Rather it is only required that some service provider perform the necessary functions. The agent and service provider may be as close as a subroutine call. As noted earlier, when a dump has been requested, and before the initiation of the first phase, the SQL Server must inform the Backup Server of the pages mapped to each phase by issuing a series of calls to link each phase with those disk pieces from which it will dump pages. Since these two types of phases, primary (corresponding to Phase 1) and secondary (corresponding to Phases 2 & 3) depend on the phase property bits which are set, in some cases the SQL Server may not send data for a primary phase. Instead, the Backup Server may drive its own transfer of all pages from a device given in a phase map for that phase to the archive device. Then, when the Backup Server has completed the transfer, it will return an acknowledgment for the completion of that phase. Each secondary phase is bound to a single primary phase. A secondary phase contains run lists from the page mapping of its reference primary phase that refers to those pages which may have already been dumped during that particular primary phase. In other words, the pages identified in a secondary phase are referring to the same page maps as referred to by some of the pages dumped during that primary phase. As noted above, the purpose of a secondary phase is to compensate for those inconsistencies which occur in backing up a transactional database which remains on-line while the backup is taking place, by ensuring that the most recent version of each page is present in the dump so that these later versions of pages will be written to the database at load time. In a preferred embodiment, each phase has properties which determine how pages are read from the database and then handled by the Backup Server. Since pages are assigned a stripe only during a primary dump phase, in one embodiment of the invention, if such a phase specifies that a BSYOUSCAN property is active, it indicates that the Backup Server should read allocation pages directly from the disk pieces specified in the phase map for that phase. Run lists are then constructed reflecting reserved (or written) and free (or empty) extents on the allocation page read in order to pass I/O service tasks for transfer. In the preferred embodiment, pages are dumped and loaded by logical page number. The Backup Server has to map these logical pages into offsets in the files or devices which contain the database to backed up. This logical/physical mapping is defined by the SQL Server, but carried out by the Backup Server. This mapping is specified by a series of procedures known as as.sub.--db.sub.--map. Each as.sub.-- db.sub.-- map procedure defines a disk piece which is a range of logical pages, a file or device name, and the offset in the file or device at which the range begins. The Backup Server then determines the physical location of a page in the database by subtracting the start of the range from the logical page number, adding the offset, and using that as a page number in the device. Since the logical-to-physical mapping at the time of the dump may not match the mapping at the time of the load, the Backup Server uses the mapping provided by the as.sub.-- db.sub.-- map procedure at the time of the load no matter what the mapping was at the time of the dump. There may be any number of disk pieces defined by the database via as.sub.-- db.sub.-- map procedures, and the same device may be referenced in more than one as.sub.-- db.sub.-- map. Referring next to FIGS. 10A, 10B and 10C, the determination of stripe assignments for each allocation page is shown. As illustrated, FIG. 10A shows the construction of a stripe directory based on the contents of an allocation page. FIG. 10B describes the interface between the Session Handler and the Service Tasks in greater detail. In FIG. 10B, the interface is a queue of Run List structures 1070. In operation the Session Handler produces Run Lists 1070 as its output and the Service Task 1080 takes the Run Lists as its input. FIG. 10B depicts the queue 1090, an expression of which disks have entries in the Run Lists in the queue for each stripe, the structure of a Run List 1095 and the structure of a Run 1000. These structures determine which blocks the Service Task backs up from the database disks to its archive device. Finally, FIG. 10C illustrates in greater detail the mapping of assigned extents 1005 to Run Lists 1010. In FIG. 10C the contents of an extent group 1007 are represented in a Run List structure 1010, showing in particular how runs from multiple database disks can appear in one Run List. Under the invention stripe assignments are made at the extent level, and no extent ever spans stripes. The use of dump striping allows N archive devices to be used in parallel to create an N-way striped dump. This means that a database may be split logically into N roughly equal sized portions and those portions dumped concurrently to N archive devices. The advantage of this architecture is that it reduces dump time by a factor proportional to N and may, in fact, obviate the need to change physical archive volumes during a backup session since the N archive devices may have enough capacity to hold their corresponding portions of the database on a single device and/or single volume. For the purposes of the invention the set of devices utilized in a striped dump or load is called the stripe set. In a preferred embodiment, the maximum number of devices in the stripe set is 32. The system labels dump volumes per the ANSI tape labeling standard and each volume of the dump saves the ordinal number of the device in the stripe set as well as the logical volume number in the ANSI volume label. This information is then used at load time and in connection with the change of volumes to verify that volumes are mounted in the same order as they were at dump time so that the database is reconstructed consistently. This scheme also allows any combination of devices to be used at dump or load time and only constrains that volume ordering (that is, the order of media within a particular stripe) be the same at dump and load times. Most importantly, using this scheme it is possible to load from a smaller number of stripe devices than were used at dump time. As noted above, stripe assignments are determined for each allocation page, accounted for at the extent level, and never permit an extent to span stripes. The mapping of assigned extents 1005 to run lists 1010 is shown in FIG. 10C. The stripe affinity provided in Appendix A of commonly-owned U.S. Pat. No. 5,515,502 analyzes the distribution of allocated extents in the allocation unit and assigns groups of extents (both free and reserved) in the allocation unit to individual stripes. In a preferred embodiment, a local 4 byte bit map variable is used to represent the used/free status for each of the 32 extents on any allocation page. This assignment decision attempts to balance the distribution of reserved extents across stripes evenly. As seen in FIG. 10A, each allocation page has associated with it a stripe directory 1060. The stripe directory describes the distribution of extents in an allocation unit 1062 to the available stripes 1064a-1064e (1063). The stripe directory is represented as an array of variable width bytes followed by a remainder stripe byte 1065. Each byte corresponds positionally to a particular stripe and contains the number of extents which are assigned to that stripe. The extent counts accumulate up to a maximum of 32 since there are a maximum of 32 archive devices 1064 (FIG. 10A), and there are an array of such entries for each disk piece that is a part of a primary Phase. To ease the implementation on 8-bit-byte platforms, the stripe directory can be composed of an 8-bit remainder stripe number with 8-bit extent counts for 16 and fewer stripes, and 4-bit extent counts for 17-32 stripes. Under this concept the size of the directory elements does not change the algorithm for creating and accessing a stripe directory. Rather, using 4 or 8-bits simplifies the reading and writing of elements on most computers because a 5-bit value need not be extracted into 8-bits, and vice versa. This extraction work is the cost of using the most compact representation. On the other hand using the less-compact design saves on extraction time but uses more memory. In the preferred embodiment, any extents not counted in the directory elements are then assigned to a remainder stripe 1066. The remainder stripe 1066 is simply a stripe to which are assigned all the extents not otherwise assigned to any other stripe. The purpose of the remainder stripe is to allow all of the extents in an allocation unit to be assigned to one stripe without requiring that each directory element be able to represent the maximum number of extents on any given allocation page. The remainder stripe field takes the place of the element in the directory because any extents not assigned to any other of the stripes must be assigned to a remaining stripe. If a directory element contains 0, then no extents are assigned to the corresponding stripe. In this case the stripe affinity algorithm attempts to equalize the number of reserved extents assigned to each stripe, in order to load balance as much as possible. However, every extent, free or not, is assigned in the directory to some stripe. This allows assignment of subsequently allocated extents named in a reconciliation phase (such as Phase 2 or Phase 3) to a stripe without modifying the directory of that extent's allocation page. As noted, in a preferred embodiment the elements of the stripe directory are of a variable width with the width of the elements 1218 decreasing step-wise as the number of stripes 1222 increases. This property is shown in FIG. 12 and allows space and resource savings when representing stripe directories in memory. The decreasing element width reflects the fact that a decreasing number of bits are needed to represent the maximum number of free extents which may lie between reserved extents on an allocation page when at least one reserved extent is written to each stripe. This relationship may be shown by the proof that given a stripe set of size S, with X extents per allocation page, such that S is less than or equal to X, and given that at least one reserved extent is assigned to each stripe, it can be shown that maximum of (X-S+1) reserved or free extents can be assigned to any one stripe by showing that the stripe set is partitioned so that stripes ›1,S-1! archive extents ›1,S-1!. Stripe S then archives extents ›S,X! and there are (X-S+1) extents in ›S,X!. In one modification to the invention, the stripe assignment algorithm may be modified to include a factor based upon feedback information from I/O service tasks as a way of adapting the backup process to differences in the data transfer rates supported by the available archive devices. Using the proof above noted to make an assignment to the stripes shown in FIG. 10A, it can be seen that stripe affinity is based on the I/O backlog which affects stripe 2 (1064b) with the result being that stripe 2 (1064b) is temporarily not available to archive data. In such a case where we have S stripes, and X extents, and where S=5 and X=32, we can use the equation of the log.sub.2 (x-S+1) is equal to the log base log.sub.2 (32-5+1) which is equal to the log.sub.2 (28) resulting in a stripe directory containing 5 bits per element times 4 stripe fields plus 5 bit remainder stripe field for a total of 25 bits in the stripe directory. This equation is generalized in the chart shown in FIG. 12. As seen in FIG. 12, generally the greater the number of stripes 1222, the greater the number of directory bits 1218 which will be necessary. For example, if 5 bits are used to represent the number of extents which may be assigned to one stripe then we can assign 31 extents to one stripe. However, if we have 32 extents on an allocation page, we need to use another stripe to assign the remaining extent. This leads to a calculation known as maximum stripe density (msd), which is the minimum number of stripes which compose used if we make an assignment for every extent (as we must) with a minimum number of extents assigned to the remainder stripe. As shown if the number of stripes grows to 15 (1222) then 80 directory bits will be used. Therefore, in an attempt to make the stripe map smaller, it is possible to have a lower maximum stripe density in return for a smaller number of directory bits. In other words, FIG. 12 shows a manner of controlling the growth of the size of the stripe directory to a rate approximately logarithmic in relation to the growth of the number of stripes beyond 16. As noted above, stripe assignment takes place during the primary phase or Phase 1. In a secondary phase, such as Phase 2 or Phase 3, the SQL Server sends run lists, and logical page number intervals to the Backup Server for archiving. Since this information is an update of information which has previously been stored on the archive devices, the Backup Server must be able to determine quickly the stripes on which the listed pages should be stored. Therefore, a basic optimization of the invention is to calculate stripes only for each whole or partial extent contained in the run list since an extent never spans more than one stripe. Referring back to FIG. 10A, after fetching the stripe directory 1065 for the extents allocation page, the next step is to determine in which stripe (1060a-1060e) a given extent to be assigned falls. This is done by scanning across the elements of the stripe directory 1065, and summing the extent counts from each element. The element for which the accumulated extent count exceeds the given extent number is the stopping point of the scan. Then, the remainder stripe field 1066 is checked against the current element position. If the remainder stripe value is less than or equal to the current element number, then the assignment stripe will be 1+the current element number, otherwise, the assignment stripe is the current element number. If the element scan reaches the remainder stripe field 1066 without the accumulated extent count equaling or exceeding the given extent number, then the extent is assigned to the stripe given in the remainder stripe field 1066. For the stripe assignment example given in FIG. 10A, a page on extent 24 (1060e) would be assigned to stripe 5 (1064e), the remainder stripe, because the sum of the extent counts over the elements for stripes 1 through 4 is less than 24. As indicated earlier, an object of the invention is to provide maximum concurrency or throughput between database I/O and the available archive devices. To reach this goal it has been determined that: 1. to provide load balancing of archive space usage, the concurrency strategy should attempt to place approximately equal amounts of data on each stripe; 2. to provide load balancing of time usage, (that is to prevent large imbalances of backlogged I/O requests across stripes), the concurrency strategy should utilize feedback so it can avoid sending pages to backlogged devices until the device clears its backlog; 3. to provide database disk concurrency, the concurrency strategy should arrange that as many reads as possible be issued to different database disks at any given instant, rather than reading each database disk sequentially; and 4. to avoid conflicts, archive devices should not "fight" with each other for disk access; that is, two archive devices should not be told to read from widely physically spaced locations on the same disk at the same time. Rather, successive I/O requests should be for blocks at, or a short distance ahead of, the current disk head seek position. In such an embodiment, the database disk arm ideally moves across the platters smoothly (ignoring ordinary database activity.) The first objective is met through the use of the stripe affinity scheme of the invention because that scheme attempts to divide up allocated extents on an allocation page evenly across stripes. The second objective is met by the communication and assignment of I/O loads away from backlogged devices. Objectives 3 and 4 are met through the use of their own logic design. Throughout the foregoing the term "Database Disk" or some variant is used. It is noted that these terms may be construed to mean a virtual disk; that is, disk entries for the database stored on a system device. More than one portion of a database can be resident on a virtual disk, and a virtual disk can contain portions of more than one database. In addition, disk pieces for the same virtual disk are grouped together and the disk pieces for a virtual disk then form a single stream of allocation pages. In practice, one stripe affinity directory array will be associated with one disk piece, since disk pieces on a virtual disk can correspond to widely separated intervals of logical pages. This scheme then yields maximum I/O concurrency when virtual disks map to distinct physical spindles with the result that concurrent virtual disk I/O's are physically parallelized. In actual use it is not practical to guarantee that all the stripe archive devices will each read from a different disk whenever it is possible to do so, because the number of combinations of available disks with available stripes can easily become unmanageable even for small numbers of database disks and archive devices. Instead, a best effort approach is preferred so that during a primary stripe assignment phase, data transfers will be allocated to each archive stripe in sequence so that each stripe will tend to access a different disk from the others at a given instant. As seen in FIG. 11A, when there are at least as many disks as stripes, this is a straightforward process. The preferred embodiment follows the scheme depicted below, iterating over the extent groups until all the extents from all the current allocation pages have been sent via run lists to the stripe devices. These iterations are repeated until all the allocation pages from the target disks have been scanned. If there are any remaining disks, the next S disks are selected and the cycle begins again. For example, assuming 5 database disks and 5 stripe archive devices, in a first iteration extent 1 from database disk 1 is written to stripe 1; extent 2 from database disk 2 is written to stripe 2 . . . and extent 5 from database disk 5 is written to extent 5. Then on the second iteration, extent 2 from database disk 1 is written to stripe 2; extent 3 from database disk 2 is written to strip 3; and so on. In this way, extent group 1 from every database disk is always written to stripe 1; extent group 2 from every database disk is always written to stripe 2; etc. This ensures that maximum I/O concurrency can be maintained as the tasks are spread across the available database disks and stripes. It also ensures that extents are always written to the same stripe, which makes certain that data recorded later in time will be written after data recorded earlier in time and further that during restoration data will be provided in a proper time sequence order regardless of the number of archive devices used for restoration. Referring to FIG. 11B, on the other hand, if there are fewer disks than stripes, either at the beginning or at the end of one or more iterations of the cycle described above, then the allocation scheme will still work as depicted except that database pieces will be passed only to D stripes per iteration (rather than S stripes per iteration). Since the passing of data may proceed much faster than the stripe device I/O, and the algorithm of the invention distributes data to different but mostly overlapping stripes during each iteration, the stripe devices will still tend to have I/O requests to all disks pending at one time. This, in turn, will tend to maximize concurrent disk activity as there will be fewer than the total number of stripes per request at each iteration. In a preferred embodiment, an incorporated task scheduler mechanism is used to allocate processor time among the tasks that comprise the backup system. In this way the I/O service tasks can process their event queues while the invention processes the current set of allocation pages. In general, the Backup Server will be able to sense the type of device that the archival device name represents. This backup logic removes the need for users to pre-specify device types or capabilities. The Backup Server is capable of determining the control method for devices given just the name, and popular supported devices include, but are not limited to, 8 mm tape, DAT, 9 track tape, streaming cartridges, disk files, raw foreign mounted disks, floppy discs or other removable disks as well as all of the preceding devices when accessed via a network. As touched on above, a desirable effect of using a dump synchronization logic incorporating stripe affinity of the invention is that it simplifies the load logic used during restoration to a nearly trivial level. No phase synchronization is necessary during loading because of the stripe affinity property. Database disk I/O concurrency takes place automatically assuming similar mapping of logical pages to disks at dump and load times because the dump time logic tends to assign pages from different disks to different stripes. As a result of the calculations performed during dump time, there is little backup specific work to do at load time. Finally, referring to FIG. 13, an illustration depicting the restoration of data from fewer achive devices than used to perform a backup is shown. The "Write Disks" sequences show the approximate time sequence of access by each Service Task to the database disks. This restoration follows the backup illustrated in FIG. 11B. For example, stripe 1 will provide extent I from disk 1; then extent 1 from disk 3; then extent 1 from disk 2. Concurrently stripe 2 will provide extent 2 from disk 2; and then extent 2 from disk 1; then extent 2 from disk 3; and so on. Using this arrangement, data may be restored with good concurrency (that is, the Service Tasks will tend to access separate disks at each time step), and with fewer restoration archive devices. In summary, then, through the teaching of the invention an improved data backup system is provided which is especially suited to the backup of information stored in a large transactional database. The backup system is capable of backing up data while the transactional database is still active, and is capable of utilizing multiple backup archive devices to increase the speed at which a backup takes place. Additionally, the information to be backed up is processed through a calculation to determine stripe affinity, a method for ensuring the integrity of the database backups made to multiple backup devices simultaneously, while also permitting the reloading of data from fewer devices than used to make the original backup. Log deallocation A. Introduction Log deallocation is the process of freeing up that portion of a transaction log which no longer tracks active transactions. A log deallocation typically occurs in the context of a DUMP TRANSACTION command. In Sybase SQL Server, for instance, a DUMP TRANSACTION command is principally a command to archive part of the log, specifically the "unused" portion of the log. Variant forms of the DUMP TRANSACTION command exist, largely for the purpose of truncating the log. An alternative form, DUMP TRANSACTION WITH TRUNCATE.sub.-- ONLY, for instance, serves to only perform the truncation part. Here, there is no effort to archive the deallocated portion to tape. In general, however, the main DUMP TRANSACTION command serves to archive the unused portion of the log, typically to a tape device, and then deallocates that unused portion of the log. The task of allocation and deallocation of pages in a database largely involves tracking which pages (uniform storage units) are employed. In the particular instance of the log, pages are grouped together into another uniform storage unit, a "segment." A segment simply serves as a mechanism for keeping together certain pages. By keeping track of which pages are used (and which ones are not used), the system can optimize space management. In SQL database servers, there is a fairly standardized mechanism whereby transactions write log records to the log. There is, however, no such standardized mechanism for deleting log records from a log. Since SQL standards do not specify such a mechanism, each database vendor has implemented this functionality in its own proprietary manner. In the instance of Sybase SQL Server, for instance, this is done by deallocating log pages. B. Log deallocation process FIG. 14 illustrates diagrammatically how a log file is deallocated. For log 1400, there exists a beginning of the log 1401 and a current end of the log 1403. In an active system, the current end of the log is continually changing as the log file grows. At various points in the log 1400, checkpoint records are inserted, such as checkpoint record 1405. The checkpoint record points back in the log to the oldest active transaction (log record), such as indicated at 1407. Therefore, the portion of the log which is currently in use exists between the current end of the log and that part of the log pointed to by the checkpoint record. Here, the unused portion of the log exists from the beginning of the log to where the checkpoint record points to, as indicated by hatching. When a new DUMP TRANSACTION command is executed, the system inserts a new checkpoint record. In Sybase SQL Server, when parts of the log are deallocated, the system logs a record describing that deallocation. In other words, the deallocation itself is a logged event or transaction. If the command fails or is interrupted, the log truncation will be undone. When the deallocation is complete, the transaction will commit and a commit record will be written for the transaction; at this point, the deallocation will be made permanent. C. Improved log deallocation in system having a backup server In one configuration of a database system, the above-described "Backup Server" may be employed in combination with an SQL database server. FIG. 15 illustrates a SQL database server system 1500 comprising both a SQL database server component 1510 and a backup server component 1520. In such a system, the "archiving" part of a DUMP TRANSACTION command is done by the backup server 1520. Here, the backup server 1520 copies the information which is in the log to an archive device (e.g., tape drive). The "deallocation" part of the command is performed by the SQL database server 1510 deallocating particular log pages. In the specific commercial embodiment of Sybase SQL Server, this task is carried out as follows. The SQL database server 1510 sends a "run list" of pages to the backup server. Since the log pages, like data pages, are divided up into allocation units, the run list represents a portion or whole of an allocation unit, such as allocation unit 1600 illustrated in FIG. 16. In the exemplary embodiment of SQL Server, an allocation unit comprises 256 pages. Of that unit, one page is designated as an "allocation page" (of that unit), such as allocation page 1601. The allocation page itself stores information describing which pages of the allocation unit had been allocated and deallocated. Each allocation unit can be further divided into extents, such as extent 1603. Extents, each which typically comprises eight pages, map to different objects. Within a data segment of a database, therefore, extents can map to different objects. Here, each extent is associated with no more than one object. If residing on a shared data and log segment, the run of pages to be deallocated will be an extent or part of an extent. For a dedicated log segment (e.g., in a production database), the allocation is more serial; the run in such an instance can be as large as an allocation unit. Consider the allocation units 1700 illustrated in FIG. 17. The largest run which can exist occurs when an allocation unit is fully allocated for the log, such as shown at 1705. Here, 255 pages would be deallocated. In a production system, it is common to have many allocation units such as allocation unit 1705. In addition to these, however, there typically exists an allocation unit at the beginning which is partially filled, such as allocation unit 1701. Additionally, there typically exists an allocation unit at the end which is partially filled, such as allocation unit 1703. The common case in a production database system is to have a large number of 255-page runs, such as for allocation unit 1705. Each allocation unit itself represents a unit of work for the backup server to archive. In prior versions of Sybase SQL Server, such as System 10, the deallocation part of the DUMP TRANSACTION work is done one page at a time within each allocation unit, by the SQL database server. In System 10, for each log page to be deallocated, the page must be read. This is required since the log record contains the page header (of the page being deallocated). In System 10, therefore, to deallocate a particular allocation unit, the system would read each page of the part of the log to be deallocated. For each page, the system would write a log record which describes the deallocation of that page. As a result, many pages of the log are read and many log records are written. In the common case of an allocation unit such as unit 1705, for instance, the system would read all 255 pages and write 255 log records. According to the present invention, this methodology is modified such that the system no longer needs to read each of the log pages. Instead, the system now reads the first page and the last page of the run. For allocation unit 1705, for instance, the system would read first page 1721 and last page 1723. According to the present invention, therefore, the operation to do the deallocation occurs at the allocation page, not at the individual pages (which are to be deallocated). In accordance with this method, the system writes a log record which includes the first page and the last page of the run (i.e., the number of those pages). A new log record type--LOGDEALLOC--is provided by the system for this purpose. This new log record is associated with information describing the first and last page of a run. The system need not, however, read any of the pages of the run, other than the first page and the last page. In a common case of a completely filled allocation unit, such as allocation unit 1705, the system reads two pages--the first page and the last page--and writes one log record. As a result, the system reads far fewer log pages and writes but a single log record. In the instance where an allocation unit is not completely filled, the method is instead adapted to deallocate an extent at a time, thus still reading fewer pages and writing fewer log records. Here, the system reads two pages of the extent and writes a single log record. In prior approaches, in contrast, a database system would have read all eight pages of the extent and written eight log records for describing deallocation of each those pages. The approach of the present invention is particularly advantageous in a multi-user environment where writing numerous log records leads to increased contention for the system log, a shared resource. A method of the present invention for log deallocation is summarized by flow chart 1800, shown in FIG. 18. At step 1801, a loop is established to loop through all allocation units containing log pages to be deallocated. When no more allocation units remain, the method is done and may terminate. At step 1802, the method stores the first and last page IDs in the log record, for those log pages to be deallocated; at step 1803, the log record is written. Steps 1804-1806 represent method steps for processing each allocation unit; these will now be described in further detail. At step 1804, the method fetches and locks the allocation page for the current allocation unit under exam. At step 1805, the method updates the allocation page to deallocate the entirety of the log run (for that allocation unit). The method loops through the extents in the allocation page, as indicated at step 1805a. For each extent, this includes turning off the allocation bits in the extent structure on the allocation page corresponding to the pages to be deallocated, as shown at step 1805b. The method step 1805 also includes turning on the deallocation bit in the extent structure on the allocation page, as shown at step 1805c. Finally, the allocation page for the current allocation unit under exam is unlocked, at step 1806. The method loops back to step 1801 for any remaining allocation units; otherwise, all allocation units have been processed and the method is done. While the invention is described in some detail with specific reference to a single preferred embodiment and certain alternatives, there is no intent to limit the invention to that particular embodiment or those specific alternatives. Thus, the true scope of the present invention is not limited to any one of the foregoing exemplary embodiments but is instead defined by the appended claims.
|
Same subclass | ||||||||||
