Recoverability

Gap recovery for off-site data storage and recovery systems

5412801

Abstract

A method for creating a control record for use in future machine recovery of gaps in a complete series of journal data formed by a computing machine from a complete series of transactional data. A copy of the series of journal data is formed for transmission to a remote site. A plurality of data group records are formed. Each data group record identifies a different boundary of each of a plurality of data groups in the series of journal data. The presence of a gap is detected in the copy of the journal data for transmission. One of the data group records is created for a boundary of each gap.


Claims

What is claimed is:

1. A method of logging a complete series of journal data, which is sequentially created and is stored in a computing machine system at a central site, to a remote site, the method being characterized in that there may be a gap in the series of data, created at the central site, that is being received at the remote site, the method comprising the steps of:

a) copying the journal data being stored at the central site;

b) transmitting the copied journal data to the remote site;

c) detecting a gap in the complete series of journal data that is in the copied journal data transmitted to the remote site; and

d) identifying groups of data (data groups) in the series of journal data being transmitted at the remote site and creating an ending timestamp for each data group indicating an ending of the corresponding data group.

2. A method of logging a complete series of journal data, which is sequentially created and is stored in a computing machine system at a central site, to a remote site, the method being characterized in that there may be a gap in the series of data, created at the central site, that is being received at the remote site, the method comprising the steps of:

a) copying the journal data being stored at the central site;

b) transmitting the copied journal data to the remote site;

c) detecting a gap in the complete series of journal data that is in the copied journal data transmitted to the remote site; and

d) creating an ending timestamp for the gap in the copied journal data indicating, in the series of journal data, an ending time for the gap in the series of journal data.

3. A method of logging a complete series of journal data, which is sequentially created and is stored in a computing machine system at a central site, to a remote site, the method being characterized in that there may be a gap in the series of data, created at the central site, that is being received at the remote site, the method comprising the steps of:

a) copying the journal data being stored at the central site;

b) transmitting the copied journal data to the remote site;

c) detecting a gap in the complete series of journal data that is in the copied journal data transmitted to the remote site; and

d) identifying groups of data (data groups) in the series of journal data being transmitted at the remote site and creating a beginning timestamp indicating a beginning of the corresponding data group and an ending timestamp for each data group indicating an ending of the corresponding data group.

4. A method of logging a complete series of journal data, which is sequentially created and is stored in a computing machine system at a central site, to a remote site, the method being characterized in that there may be a gap in the series of data, created at the central site, that is being received at the remote site, the method comprising the steps of:

a) copying the journal data being stored at the central site;

b) transmitting the copied journal data to the remote site;

c) detecting a gap in the complete series of journal data that is in the copied journal data transmitted to the remote site; and

d) creating a beginning and an ending timestamp for the gap in the copied journal data indicating, in the series of journal data, a beginning and ending time for the gap in the series of journal data.

5. A method of logging a complete series of journal data, which is sequentially created and is stored in a computing machine system at a central site, to a remote site, the method being characterized in that there may be a gap in the series of data, created at the central site, that is being received at the remote site, the method comprising the steps of:

a) copying the journal data being stored at the central site;

b) transmitting the copied journal data to the remote site;

c) detecting a gap in the complete series of journal data that is in the copied journal data transmitted to the remote site;

d) identifying data groups in the series of journal data being transmitted to the remote site and creating for each said data group and each said gap a different one of a sequence of incrementally valued data group numbers;

e) arranging the series of journal data created at the central site into a series of data blocks occurring in a sequence in time and each such data group comprises a plurality of such data blocks;

f) creating a beginning and an ending timestamp indicating a beginning boundary and an ending boundary of the data blocks in each such data group and in each such gap, which are adjacent in the series;

g) retrieving, from the complete series of journal data at the central site, a copy of any of the journal data which goes in such gap upon detecting such gap; and

h) transmitting the copy of the journal data which goes in such gap to the remote site.

6. The method of claim 5 wherein the step of creating the timestamps for the gap comprises the step of extending the time for each boundary by a gap extender value to thereby form the timestamps.

7. A method of logging a complete series of journal data, which is sequentially created and is stored in a computing machine system at a central site, to a remote site, the method being characterized in that there may be a gap in the series of data, created at the central site, that is being received at the remote site, the method comprising the steps of:

a) copying the journal data being stored at the central site;

b) transmitting the copied journal data to the remote site;

c) detecting a gap in the complete series of journal data that is in the copied journal data transmitted to the remote site;

d) creating at least one data group identifier for a group of journal data (data group) in the copied journal data which, in the series, is adjacent to and outside of the gap;

e) creating a data group identifier (gap data group identifier) for a group of the journal data that is in the gap;

f) storing in a check point file an indication of time, in the series of journal data created at the central site, for any of such data groups;

g) using the indication of time stored in the check point file to create a time of occurrence indication (data group time of occurrence indication) for said data group and a time of occurrence indication for said gap (gap time of occurrence indication);

h) retrieving, from the complete series of journal data at the central site, a copy of any of the journal data which goes in such gap upon detecting such gap; and

i) transmitting the copy of the journal data which goes in such gap to the remote site.

8. The method of claim 7 wherein the series of journal data created at the central site is arranged into a series of data blocks and each of the data groups comprises a plurality of the data blocks, the method comprising the steps of creating a sequence of incrementally valued data block identifiers, a different one of such data block identifiers for each of individual ones of the data blocks and comprising the step of storing, as the indication of time in the check point file, a timestamp indicating the time of the data blocks in the series of journal data created at the central site.

9. The method of claim 8 comprising the step of forming an indication confirming that such last one of the data blocks has at least been received at the remote site and the step of storing in the checkpoint file is performed in response to such confirmation.

10. The method of claim 9 wherein the step of forming an indication comprises the step of confirming that such last one of the data blocks has been safely and completely stored at the remote site.

11. A method for creating a control record for future machine recovery of gaps in a copy of a complete series of journal data formed sequentially in time by a computing machine from a complete series of transactional data and transmitted to a remote site, the method comprising the steps of:

a) forming a plurality of data group records, at least one such data group record being formed for each of a plurality of different data groups in the series of journal data;

b) detecting the presence of a gap in the copy of the journal data that has been transmitted to the remote site;

c) creating one said data group record for a boundary of each such gap; and

d) forming an ending timestamp in one of said data group records indicative of the end of the corresponding data group conditioned upon the presence of a gap being detected following such data group.

12. The method of claim 11 comprising the step of creating another said data group record for the gap upon the detection of such gap.

13. A method for creating a control record for future machine recovery of gaps in a copy of a complete series of journal data formed sequentially in time by a computing machine from a complete series of transactional data and transmitted to a remote site, the method comprising the steps of:

a) forming a plurality of data group records, at least one such data group record being formed for each of a plurality of different data groups in the series of journal data;

b) detecting the presence of a gap in the copy of the journal data that has been transmitted to the remote site;

c) creating one said data group record for a boundary of each such gap; and

d) creating an ending timestamp in the data group record for the series of journal data which indicates a time preceding any such gap and creating a beginning timestamp, in the data group record for the gap, which indicates a time for the starting of the corresponding gap in the series of journal data.

14. A method for creating a control record for future machine recovery of gaps in a copy of a complete series of journal data formed sequentially in time by a computing machine from a complete series of transactional data and transmitted to a remote site, the method comprising the steps of:

a) forming a plurality of data group records, at least one such data group record being formed for each of a plurality of different data groups in the series of journal data;

b) forming a timestamp in each said data group record indicative of a time in the sequence of journal data for the corresponding data group;

c) detecting the presence of a gap in the copy of the journal data that has been transmitted to the remote site;

d) creating one said data group record for a boundary of each such gap; and

e) forming an ending timestamp, for each data group record in such gap, which indicates a time that is outside of the time for the gap in the series of journal data to insure that upon such future machine recovery all data in each such gap is recovered.

15. A method for creating a control record for future machine recovery of gaps in a copy of a complete series of journal data formed sequentially in time by a computing machine from a complete series of transactional data and transmitted to a remote site, the method comprising the steps of:

a) forming a plurality of data group records, at least one such data group record being formed for each of a plurality of different data groups in the series of journal data;

b) forming a timestamp in each said data group record indicative of a time in the sequence of journal data for the corresponding data group;

c) detecting the presence of a gap in the copy of the journal data that has been transmitted to the remote site;

d) creating one said data group record for a boundary of each such gap; and

e) forming a timestamp for a beginning and an ending of the corresponding data group, for each data group record in such gap, which indicates a time that is outside of the time for the gap in the series of journal data to insure that upon such future machine recovery all data in each such gap is recovered.

16. The method of claim 14 or 15 wherein the step of creating a timestamp comprises the step of modifying a value representing the time of a boundary of the gap by a gap extender value.

17. A method of logging a complete series of journal data, which is sequentially created in time and is stored in a computing machine system at a central site, to a remote site, the method being characterized in that there may be a gap in the series of data, created at the central site, that is being received at the remote site, the method comprising the steps of:

a) copying the journal data being stored at the central site into a series of data group records, each data group record comprising the complete series of journal data created during a time interval;

b) the step of copying comprising the step of marking a boundary of the series of journal data in each data group record with a data group identifier, which identifies a boundary of such series of journal data, once said data group record contains the complete series of journal data for the corresponding time interval;

c) detecting a condition occurring in the computing system, at the central site, pursuant to which there may be a gap in the complete series of journal data being copied and sent to the remote site;

d) upon detecting the condition pursuant to which there may be such a gap, creating a gap said data group record with said data group identifier, the step of creating the gap data group record comprising the step of retrieving for the gap data group record from the complete series of journal data formed at the central site, a copy of all of the series of journal data created at the central site during a gap time interval extending at least between the time the condition is detected and the time for the end of the series of journal data in the preceding data group record for which there is one of said data group identifiers for a complete series of journal data; and

e) transmitting each said data group record and said gap data group record to the remote site.

18. The method of claim 17 wherein the step of marking each data group record with a data group identifier comprises the step of marking each data group record with an ending timestamp for the last journal data in the corresponding data group record.

19. The method of claim 18 wherein the step of transmitting comprises the step of transmitting each said data group record and said gap data group record over at least one automatic electronic data communication system.

20. The method of claim 19 wherein the step of transmitting comprises the step of transmitting each said data group record and said gap data group record over the same automatic electronic communication system.

21. The method of claim 18 wherein the step of creating the gap data group record comprises the step of

creating the gap data group record with said journal data (data group) in the copied journal data which, in the series, is adjacent to and outside of the gap time interval.

22. The method of claim 21 wherein the step of transmitting comprises the step of transmitting to the remote site a representation of each of said data group identifiers, including the gap data group identifier.

23. The method of claim 21 comprising the step of creating a time of occurrence indication (data group time of occurrence indication) for said data group record and a time of occurrence indication for said gap data group record (gap time of occurrence indication).

24. The method of claim 23 comprising the step of storing, at the central site, a representation of the data group identifier, the gap data group identifier, the data group time of occurrence indication and the gap time of occurrence indication.

25. The method of claim 24 comprising the step of transmitting a representation of the data group identifier, the gap data group identifier, the data group time of occurrence indication and the gap time of occurrence indication to the remote site.

26. The method of claim 18 wherein the step of retrieving for the gap data group record and transmitting comprises the step of copying and transmitting additional data in the series of journal data which, is in the series of journal data created at the central site and is outside of the time for the data in the gap time interval.

27. The method of claim 17 wherein the step of marking an end of the series of journal data in each data group record with a data group identifier comprises the step of marking each data group record with a beginning timestamp corresponding to the end of the previous data group record.

28. The method of claim 18 wherein the step of retrieving comprises the step of using the timestamp for locating the copy of the data which goes in the gap from the stored journal data.

29. The method of claim 4, comprising the step of creating for each said data group record and gap data group record a different one of a sequence of incrementally valued data group numbers.

30. The method of claim 21 or 29 wherein the series of journal data created at the central site is arranged into a series of data blocks and each of the data groups records comprises a plurality of the data blocks, the method comprising the steps of creating a sequence of incrementally valued data block identifiers, a different one of such data block identifiers for each of individual ones of the data blocks.

31. The method of claim 30 comprising the step of storing representations of the data group records identifiers and the data block identifiers at the central site.

32. The method of claim 30 comprising the step of transmitting a representation of the data group record identifiers and data block identifiers in association with the corresponding data group records and data blocks respectively to the remote site.

33. The method of claim 29 wherein the series of journal data created at the central site is arranged into a series of data blocks occurring in a sequence in time and each such data group records comprises a plurality of such data blocks, the method comprising the step of creating an indication of a boundary in the series of data group records between data group records which are adjacent in the series.

34. The method of claim 33 wherein the step of creating an indication of the boundary comprises the step of creating at least one timestamp indicative of a time for the boundary.

35. The method of claim 34 wherein the step of retrieving and transmitting any of the journal data in a gap occurs after the time during which the gap occurred in the series of journal data and using the at least one timestamp for the step of retrieving the data.

36. The method of claim 17 comprising the steps of:

a) temporarily storing in a buffer, comprising a first-in, first-out queue, a copy of the journal data before it is transmitted to the remote site; and

b) storing the copy of the journal data in sequence in a first-in, first-out order in the queue and transmitting the stored journal data from the queue in such order to the remote site.

37. The method of claim 36 comprising the step of detecting when the amount of journal data stored in the queue is above a threshold level and upon such detection, storing the journal data from the queue into a spill file.

38. The method of claim 37 comprising the step of creating at least one data group record corresponding to the journal data in the spill file.

39. The method of claim 38 comprising the step of transmitting the journal data from the spill file to the remote site along with the corresponding data group record.

40. The method of claim 39 wherein the step of transmitting the journal data from the spill file to the remote site is initiated after the spill file is filled with stored journal data above a threshold level in the spill file.

41. The method of claim 40 comprising the step of storing the journal data from the queue into one said spill file and simultaneously transferring the journal data to the remote site from a second said spill file, filled using the step of storing in a spill file.

42. The method of claim 37 wherein the step of storing journal data in a spill file from the queue is halted upon the amount of the stored journal data in such queue falling below a predetermined level.

43. The method of claim 37 comprising the step of temporarily storing the journal data being transferred to the remote site in a further buffer during such transmission.

44. The method of claim 17 comprising the step of creating the journal data by a plurality of databases running simultaneously at the central site.

45. The method of claim 17 comprising the step of creating the data group identifiers as incrementally ordered sequence numbers from one data group record to another.

46. The method of claim 17 wherein the step of detecting a condition occurring in the computing system comprises the step of testing, upon system start-up, the most recent data group identifier for an indication of a complete series of journal data.

47. The method of claim 17 wherein the step of copying the journal data comprises the step of copying the journal data into a memory at the central site and the step of detecting a condition occurring in the computing system comprises the step of detecting an overflow of the system memory.

48. The method of claim 17 wherein the computing machine system comprises a database program for logging journal data and a discrete computer program for executing the recited steps, and wherein the step of detecting a condition at the central site comprises detecting when journal data has been logged at the central site by the database program during a time interval when the discrete computer program was not initialized to execute the recited steps.

49. A system for logging a complete series of journal data, which is sequentially created in time and stored in a computing machine at a central site, to a remote site, the system being characterized in that there may be a gap in the series of data, created at the central site, that is being received at the remote site, the system comprising:

a) means for copying the journal data as it is being stored at the central site into a series of data group records, each data group record comprising the complete series of journal data created during a time interval;

b) the means for copying a series of data group records comprising means for marking an end of the series of journal data in each data group record with a data group identifier, which identifies a boundary of such series of journal data, once said data group record contains the complete series of journal data for the corresponding time interval;

c) means for detecting a condition occurring in the computing system, at the central site, pursuant to which there may be such a gap in the complete series of journal data being copied and sent to the remote site;

d) means operative, upon detecting the condition pursuant to which there may be such a gap, for creating a gap data group record with said data group identifier, the means for creating the gap data group record comprising means for retrieving for the gap data group record from the complete series of journal data formed at the central site, a copy of all of the series of journal data created at the central site during a gap time interval extending at least between the time the condition is detected and the time for the end of the series of journal data in the preceding data group record for which there is one of said data group identifiers for a complete series of journal data; and

e) means for transmitting each data group record and gap data group record to the remote site.


Description

FIELD OF THE INVENTION

The current invention is in the field of remote reliable database change log duplication.

BACKGROUND OF THE INVENTION

Since the automation of business operations, data managers have worried about how to recover from a loss of data, be it from computer malfunction, natural disaster, or human error. Typically, businesses make image copies of the current state of their data on tape and send the tapes to an off-site location for storage in the event of a disaster at the on-site location. In the event of a total disaster, the business can easily restart its operations with the data in the state it was in when the last off-site backup was made. This approach suffers from several drawbacks. Transactions occurring after the backup copies are made are lost and have to be recovered by some method other than via the off-site backups. For some businesses, this reentry would just add to the nightmare of the disaster. For example, a bank's automatic teller users or an airline's agents would have to reenter days' or weeks' worth of on-line activity. Full image backups of this type should be made as frequently as possible to minimize the number of transactions that would have to be reentered.

At a typical computer site, backup could be done as often as once a day if there is a period during the day in which little or no activity is taking place. Minimal activity periods are needed for this scheme, since, in many installations, no transactions can be entered while the data is being backed up. Having to stop computer operations presents a problem to sites that are working 24 hours per day. Often, the value of computer equipment is measured by the number of hours per day that it can be actively operating, so increasing downtime and resource usage for backups decreases the value of the equipment.

One solution to this problem is for the computer system to make instantaneous backups. As each transaction is created and written to disk, it is also written to tape. While this does keep information more up to date and doesn't require any downtime, the disadvantage of this approach is that the perceived response time of the computer is increased to unacceptable levels, since the processor must now do twice as much writing to computer peripherals.

This instantaneous information, commonly called log or journal data, can be used in combination with the backups and a utility, generally provided by the DBMS vendor, which reconstructs the current data from the backups and the log information which shows each successive change to the database. Doing this reliably requires that every single record of a change to the database be known, since changes later in time are directly dependent on changes that were done earlier in time. If earlier changes are not recorded, later changes will have a tendency to corrupt the data. Generally in a recovery process, a data administrator would not use any record of changes that occurred after the point where information about the changes had been lost. This is a case where having no data is actually preferable to having some incomplete data. There is currently no easy way, aside from making complete image backups of the databases, to ensure that the journal of database changes which has been moved off-site is actually a complete set of all changes made to the database. Without this assurance, a prudent data administrator would abandon the entire change log back to the last complete image backup. The present invention solves each of these shortcomings of the prior art.
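The rule that changes recorded after a point of loss should not be used can be illustrated with a short sketch. It is only an illustration under stated assumptions: the `Change` record, its `seq` field (a hypothetical sequence number assigned when a change is logged) and the `usable_prefix` helper are invented here and are not part of the described system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    seq: int    # hypothetical sequence number assigned when the change was logged
    key: str    # item in the database that was changed
    value: str  # new value written by the change


def usable_prefix(changes, first_seq=1):
    """Return the unbroken run of changes starting at first_seq.

    Everything after the first missing sequence number is discarded,
    because later changes depend on the earlier ones.
    """
    usable = []
    expected = first_seq
    for change in sorted(changes, key=lambda c: c.seq):
        if change.seq != expected:
            break  # a change is missing; nothing after this point is trusted
        usable.append(change)
        expected += 1
    return usable


# Change 3 never reached the off-site log, so change 4 must not be applied.
log = [Change(1, "acct-7", "100"), Change(2, "acct-7", "80"), Change(4, "acct-7", "50")]
safe = usable_prefix(log)
print([c.seq for c in safe])  # [1, 2]
```

Without a way to prove where the log is complete, the only safe prefix is the empty one, which is why the administrator falls back to the last full image backup.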

One embodiment of the present invention is known as E-NET1 Release 2.0. An earlier version, Release 1.1 of E-NET1, which is prior art to the present invention, solved some of the problems associated with data recovery, but failed to solve most of the problems that other prior art also failed to solve. E-NET1 Release 1.1 could not handle abnormal terminations of the operating system at the sending site, nor the abnormal termination of E-NET1 Release 1.1. Any gaps in the data that were created in such situations remained as gaps in the data. No capability existed within Release 1.1 to recover a loss of data, and users were required to restore the lost data manually. While E-NET1 Release 1.1 could move log data from a central site to a remote site via electronic communication lines, it required the communication lines to be operating and throughput to be high enough so that the sending queue at the central site did not overflow, since queue overflow in Release 1.1 resulted in lost data. E-NET1 Release 2.0 is referred to hereafter for brevity as E-NET1.
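The queue overflow problem can be made concrete with a minimal sketch of the alternative recited in claims 36, 37 and 42: a first-in, first-out queue whose oldest blocks are moved to a spill file once the queue grows above a threshold, with spilling halted when the queue falls below a predetermined level. The class name `SendQueue`, the `high_water` and `low_water` parameters, and the in-memory list standing in for a spill file are all assumptions made for illustration, not the E-NET1 implementation.

```python
from collections import deque


class SendQueue:
    """FIFO send queue that spills its oldest blocks instead of dropping them."""

    def __init__(self, high_water, low_water):
        self.queue = deque()          # blocks waiting to be transmitted
        self.spill = []               # stand-in for a spill file on disk
        self.high_water = high_water  # spilling starts above this queue length
        self.low_water = low_water    # spilling stops below this queue length

    def put(self, block):
        self.queue.append(block)
        if len(self.queue) > self.high_water:
            # Move the oldest blocks to the spill area until the queue
            # has fallen below the low-water level.
            while self.queue and len(self.queue) >= self.low_water:
                self.spill.append(self.queue.popleft())

    def get(self):
        """Return the next block to transmit from the queue, or None if empty."""
        return self.queue.popleft() if self.queue else None


q = SendQueue(high_water=3, low_water=2)
for block in range(1, 6):
    q.put(block)
print(list(q.queue), q.spill)  # [4, 5] [1, 2, 3]
```

Every block ends up either in the queue or in the spill area, so a slow or failed communication line delays transmission rather than losing data. The sketch does not model how spilled blocks are later transmitted or ordered relative to queued ones.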

SUMMARY OF THE INVENTION

An embodiment of the invention is a method for creating a control record for use in future machine recovery of gaps in a complete series of journal data formed by a computing machine from a complete series of transactional data. The method includes forming a copy of the series of journal data for transmission to a remote site, forming a plurality of data group records, each data group record identifying a different boundary of each of a plurality of data groups in the series of journal data, detecting the presence of a gap in the copy of the journal data for transmission, and creating one of said data group records for a boundary of each gap.
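As a rough sketch of such a control record, the following builds one record per data group and one extra record per gap, each carrying beginning and ending timestamps and an incrementally valued number. The `DataGroupRecord` layout, the `build_control_records` function and the assumption that a gap shows up as a hole in time between two consecutive groups are illustrative only; how a gap is actually detected is not modeled. The `gap_extender` parameter widens each gap boundary in the manner of the gap extender value named in claims 6 and 16.

```python
from dataclasses import dataclass


@dataclass
class DataGroupRecord:
    number: int    # incrementally valued data group number
    begin: float   # beginning timestamp, in seconds
    end: float     # ending timestamp, in seconds
    is_gap: bool   # True when the record describes a gap


def build_control_records(groups, gap_extender=0.0):
    """Build data group records from (begin, end) pairs given in time order.

    Any hole in time between two consecutive groups gets its own gap
    record, widened on both sides by gap_extender.
    """
    records = []
    number = 1
    prev_end = None
    for begin, end in groups:
        if prev_end is not None and begin > prev_end:
            records.append(
                DataGroupRecord(number, prev_end - gap_extender, begin + gap_extender, True)
            )
            number += 1
        records.append(DataGroupRecord(number, begin, end, False))
        number += 1
        prev_end = end
    return records


records = build_control_records([(0.0, 10.0), (10.0, 20.0), (35.0, 40.0)], gap_extender=1.0)
for r in records:
    print(r.number, r.begin, r.end, "gap" if r.is_gap else "data")
```

With these inputs the third record is a gap running from 19.0 to 36.0, slightly wider than the 20.0 to 35.0 hole, so a later recovery that reads by timestamp overlaps the neighbouring groups rather than missing data at the edges.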

An embodiment of the invention is also a method of logging a complete series of journal data, which are created and stored in a computing machine at a central site, to a remote site. The method is characterized in that there may be a gap in the data created at the central site that is being received at the remote site. The method involves copying the journal data as it is being stored at the central site, testing for a gap in the copied data, creating a data group identifier for a group of data in the copied data which in the series is outside of the gap, creating a data group identifier for the gap, and transmitting to the remote site for logging a copy of any of the data including the data in the gap and the data group identifiers including the gap data group record.
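The step of obtaining the data that belongs in a gap can be sketched as a timestamp range lookup against the complete journal kept at the central site. This is a minimal sketch, assuming the journal is available as a list of `(timestamp, payload)` pairs sorted by timestamp; the `retrieve_gap_data` name and that representation are hypothetical.

```python
import bisect


def retrieve_gap_data(journal, gap_begin, gap_end):
    """Return journal entries whose timestamp lies in [gap_begin, gap_end].

    journal is a list of (timestamp, payload) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in journal]
    lo = bisect.bisect_left(timestamps, gap_begin)   # first entry at or after gap_begin
    hi = bisect.bisect_right(timestamps, gap_end)    # one past the last entry at or before gap_end
    return journal[lo:hi]


journal = [(1.0, "a"), (2.0, "b"), (3.0, "c"), (4.0, "d"), (5.0, "e")]
print(retrieve_gap_data(journal, 2.5, 4.0))  # [(3.0, 'c'), (4.0, 'd')]
```

The returned entries would then be sent to the remote site together with the gap's data group identifier. Transmission itself is left out of the sketch.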

An embodiment of the invention is also a method for logging at a remote site for future recovery in the event of a failure at the central site, a copy of a complete series of journal data. The journal data is computed at the central site from a complete series of transactional data. The method is characterized in that there may be a gap in the copy of the journal data received at the remote site. The method involves forming a unique series of incrementally valued data group identifiers each of which identifies a different boundary in the complete series of journal data, forming at least one said data group identifier for such data gap, and transmitting the data group identifiers, including such gap data group identifier to the remote site along with the corresponding data group.

An embodiment of the invention is also a method of using a computing machine for recreating data from journal data stored on readable tape at a remote site without the use of the database which was the source of the journal data. The method involves reformatting the journal data from such tape into a form usable by a roll-forward recovery utility program, detecting duplications in such journal data, removing such duplications, recording the data without the duplications, and processing the data without the duplication with a roll-forward recovery utility program to recreate a copy of the original data.
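The duplicate removal and roll-forward steps can be illustrated as follows. Duplicates may arise, for example, where data retrieved for a gap overlaps data already received, as claim 26 allows. The record layout `(block_id, key, value)`, the two function names and the dictionary standing in for the database are assumptions for illustration; a real roll-forward recovery utility is supplied by the DBMS vendor and is not modeled here.

```python
def remove_duplicates(records):
    """Keep the first occurrence of each block identifier, preserving order.

    records is an iterable of (block_id, key, value) tuples.
    """
    seen = set()
    unique = []
    for record in records:
        block_id = record[0]
        if block_id in seen:
            continue  # this block was already read once; skip the repeat
        seen.add(block_id)
        unique.append(record)
    return unique


def roll_forward(backup, records):
    """Apply each change, in order, to a copy of the backup data."""
    data = dict(backup)
    for _, key, value in records:
        data[key] = value
    return data


backup = {"acct-7": 100}
tape = [(1, "acct-7", 80), (2, "acct-9", 25), (2, "acct-9", 25), (3, "acct-7", 50)]
restored = roll_forward(backup, remove_duplicates(tape))
print(restored)  # {'acct-7': 50, 'acct-9': 25}
```

In this toy example applying a duplicate twice would be harmless, because each change simply overwrites a value. The sketch removes duplicates anyway, to mirror the order of the steps in the method.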

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a central mainframe site and a remote mainframe site;

FIG. 2 is a block diagram of a DBMS region of mainframe main memory;

FIG. 3 is a block diagram of an E-NET1 region of mainframe main memory;

FIG. 4 is a block diagram of a receiving E-NET1 region of the remote site mainframe main memory;

FIG. 5 is a logic flow diagram illustrating the E-NET1 region initialization;

FIG. 6 is a logic flow diagram illustrating DBMS interface initialization;

FIG. 7 is a logic flow diagram illustrating the DBMS write routine;

FIG. 8 is a diagram showing the structure of a T-record and an example of the contents of a T-record;

FIGS. 9A through 9D are a block diagram showing the states of the queue handler;

FIG. 10A is a logic flow diagram illustrating the queue handler operations while in states 1 and 2;

FIG. 10B is a logic flow diagram illustrating the queue handler operations while in states 3 and 4;

FIG. 10C is a logic flow diagram illustrating the queue handler operations when the queue fills;

FIG. 11 is a block diagram of the gap recovery region;

FIGS. 12 and 13 are logic flow diagrams showing the operation of the gap recovery programs;

FIG. 14 is a block diagram of the recovery process;

FIGS. 15A and 15B are a logic flow diagram (in parts A, B) showing the flow of the DBMS recovery program E1EXTC;

FIG. 16 is a logic flow diagram of the T-record output routine;

FIGS. 17A, 17B, 17C, 17D, 17E, 17F, 17G, and 17H are block diagrams showing usage of spill files; and

FIG. 18 is a block diagram of database shadowing.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

INDEX

I. DBMS AND MAINFRAME OPERATION, GENERALLY

II. GENERAL DESCRIPTION OF OFF-SITE DATA STORAGE

III. DETAILED DESCRIPTION

A. NORMAL OPERATION OF SENDING AND RECEIVING DATA

B. SPILL FILE OPERATIONS

C. GAP CREATION

D. GAP RECOVERY

E. DBMS RECOVERY

F. EXAMPLE OF SPILL FILE USAGE

G. E-NET1 SHUTDOWN

H. DATABASE SHADOWING OPERATION

IV. SUMMARY

V. TABLES

1.

A. UP-TO-DATE BASE DATASET

B. BACKUP BASE DATASET

C. CHANGE LOG DATASET

2.

A. SUCCESSFUL RECREATION OF BASE DATA

B. ATTEMPTED RECREATION OF BASE DATA

3. DCF DATA STRUCTURE

A. ARCHIVED DATA RECORD STRUCTURE

B. DCF CONTROL RECORD STRUCTURE

4.

A. ECVT STRUCTURE

B. SCOM STRUCTURE

5. SCF DATA STRUCTURE

A. SCF CONTROL RECORD STRUCTURE

B. DBMS CONTROL RECORDS STRUCTURE

C. DATA GROUP RECORDS STRUCTURE

6. SCF CONTENTS EXAMPLE

7. DBMS PARAMETER FILE STRUCTURE

8. E-NET1 PARAMETERS FILE EXAMPLE

9. RCF DATA STRUCTURE

10. RCF CONTENTS EXAMPLE

11. USER PARAMETERS FILE EXAMPLE

12. DCF/SCF/RCF CONTROL RECORD STRUCTURE DETAIL

13. GAP RECOVERY CONTROL FILE EXAMPLE

I. DBMS AND MAINFRAME OPERATION, GENERALLY

FIG. 1 depicts a typical computer site. All of the programs and data discussed subsequently reside in one of two physical locations, the central site 10 and the remote site 12. The central site 10 is where the computer activity of interest is occurring, and the remote site 12 is used to assist in the recovery of the central site in the event of a disaster at the central site. FIG. 1 is a bank computer operation and is disclosed by way of example, but the scope of the invention is by no means so limited. The central site could be a bank data center, a bank branch, or a check processing center. In a bank branch, the computer, by way of example, keeps track of customers' accounts, so as to allow a teller to quickly find a customer's balance, updates customer information when a deposit is made, and updates a customer's name or address. The central site 10 houses a mainframe computer ("mainframe") 16, by way of example, the Model 3090 by International Business Machines Corporation (IBM). The mainframe 16 has such typical components as main memory 14, for the storage of programs and data while the computer is operating; a tape drive 46 for writing to and reading from removable reels of magnetic storage tapes 39; magnetic storage disks and disk drives (disk memory) 18, for storage of data and programs for retrieval by the mainframe 16 and for storage into main memory 14, such disk memory also being used for storage of data and programs while the computer is not operating; user terminals 42, 43 for accepting data and commands relating to the manipulation of data contained in the mainframe 16; an operator console 44, for receiving messages on screen 44a and issuing commands on keyboard 44b relating to the operation of the mainframe 16; and communications interface 50 to a modem 52, for the purpose of electronically sending data from the mainframe 16 to remote locations over communication line 53, which can be a normal telephone line, a dedicated phone line, or other means of transmitting data. Although it will be understood that the central site typically includes multiple user terminals, operator consoles, disk memories, tape drives, communication links, modems, and even multiple mainframes tied together as one large, virtual mainframe, many such duplications have not been shown for simplicity.

The mainframe 16, when first initialized by applying power or using reset procedures, loads a program from a predetermined location in disk memory 18 into main memory 14 that controls the hardware of the mainframe and the software being used on the mainframe. This program is known as the "operating system". The operating system program, Multiple Virtual Storage (MVS) 20, is loaded into main memory 14 and controls the tape drive 46 and disk memory 18 as well as the operator console 44 and user terminals 42, 43. MVS 20 also controls the placement of programs and data into main memory 14, as well as allocating unused main memory for use by the various executing programs loaded into main memory. MVS logically divides main memory 14 into multiple regions, sometimes called "address spaces", to allow multiple operations or programs to execute simultaneously. FIG. 1 depicts three regions: A region 26, B region 28, and C region 30, but the number of regions is not limited to three. MVS keeps track of what is happening in each of the regions, creating new regions from unused memory and removing regions and programs after the programs have finished their operation.

By way of example, the program, Integrated Database Management Software (IDMS), a product of Computer Associates International, Inc. (formerly a product of Cullinet Software, Inc.), is loaded and running in A Region and the program, Customer Information Control System (CICS), a product of IBM, is loaded and running in B Region. No programs are running in C Region. The programs of each region do not generally interact with each other, but do interact with their own MVS-allocated terminals and disks. However, MVS provides inter-region interaction by the use of an extra-regional memory area known as the Common System Area (CSA) 40, and subprograms of MVS collectively known as Cross Memory Services (XMS) 48. MVS also includes subprograms collectively known as Virtual Telecommunications Access Method (VTAM) 64 for controlling communications through the modem 52.

The programs IDMS and CICS are commonly called Database Management Systems (DBMSs). The purpose of a DBMS is to accept input data from users, store data in an orderly fashion, and retrieve it for later users. Although a detailed discussion will not be given of each DBMS, a discussion of the detailed operation of one DBMS, CICS, will provide one skilled in the art an understanding of how to make and use the invention. Further information will be found for MVS in MVS/Extended Architecture Overview, published by IBM (First Ed., 3/1984, Order #GC28-1348-0 File S370-34), for IDMS in IDMS/R Concepts and Facilities, published by Computer Associates International, Inc. (formerly published by Cullinet Software, Inc.) (Rev. 0.0, 9/1986, Order #TDDB-010010000), for CICS in CICS General Information, published by IBM (4th. Ed., 2/1987, Order #GC33-0155-3), for IMS in IMS/VS Version 2 General Information Manual, published by IBM (2nd. Ed., 6/1987, Order #GC26-4180-1), for CPCS in Check Processing Control System Program Reference and Operations Manual, published by IBM (9th. Ed., 6/1988, Order #SH20-1228-8), and for VTAM communications in VTAM Programming, published by IBM (3rd. Ed., 9/1985, Order #SC27-0611-2 File S370/4300/30XX-50), the disclosure of each of which is incorporated by reference herein.

Mainframe 16 running MVS operating system 20, uses terminal 42 for interaction with the IDMS program running in A Region, and user terminal 43 for interaction with the CICS program running in B Region. Further, MVS has allocated several files, called datasets, two of which are shown at 72 and 74 in disk memory 18, for use by CICS. Base dataset 72 contains information vital to the bank, namely data concerning its customer accounts. Table 1A shows typical data contained in the base dataset 72.

If the information contained in base dataset 72 were lost or inaccessible, the disruption to the bank's business would be disastrous. Possible causes of such data loss are varied. The data might be lost due to hardware or software failure, or due to a natural disaster. The data could be temporarily lost as a result of a central site power failure, or an operating system stoppage. If the power is lost, or the operating system stops abnormally, the data stored in main memory 14 is lost, but the data stored in disk memory 18 can usually be accessed normally once power to the mainframe and the operating system is restored. The operating system should not normally stop absent an operator command to shutdown. The condition of abnormal operating system failure is designated ABEND (ABnormal ENDing).

To avoid total loss of data in the case of permanent failure of the disk memory 18, or erasure of the dataset 72, the mainframe operator makes backup copies of the data stored in disk memory 18. Typically this is done during a period when no DBMS programs are running, and on a regular daily, weekly, or other basis. As shown in the example backup data of Table 1B, the bank backs up its data at the beginning of every month. A backup is typically done by an operator issuing a command at console 44 which starts image backup program 106 copying the data from disk memory 18 to tapes 39. The tapes 39 are then sent off-site for secure storage. Typically, this step makes an entire copy of all data. A backup of an entire database is called an image backup since an image of all the data is taken. The opposite of an image backup is an incremental backup where data is not backed up at once, but is backed up in increments.

Image backup procedures have drawbacks. If the data is saved at the beginning of the month, and a disaster occurs in the middle of the month, the data restored from tape would be a half month out of date. For a bank, this would be unacceptable. Customer balances would not reflect deposits and withdrawals that occurred during that half month. A backup could be made more often, but even a daily backup would be out of date if the disaster occurred in the middle of the day. Another drawback to image backups is that, since an exact copy of the entire database must be made, in most installations no part of the database can be changed during the backup process. Because no changes can be made while the image backup copy is being taken, the mainframe's operations must be suspended while the backup is in progress. Since computer downtime is costly, backups must be kept to a minimum. Further costs accrue if the computer center is operated 24 hours a day, since work must stop during the backup.

Banks and other businesses that process transactional data store change logs. A change log is a dataset showing all the changes to a base dataset that have occurred since the last backup. The DBMS software packages, CICS, IDMS, and IMS, include programs which automatically produce change log datasets. Table 1C lists an example of change log dataset 74, and details all the changes made to the base dataset 72, Table 1A. Since a record of a change could take up much less disk space than the data which could have been altered by the change, the log dataset 74, sometimes called journal data, is continuously backed up to tape for safekeeping.

When a disaster strikes, the operator loads the backup data from the last image backup and, in some installations, the change log from tapes 39 onto the disk memory 18, and issues a command via the operator console 44 to initiate a roll-forward recovery program (not shown in FIG. 1) provided by the supplier of the DBMS software. Roll-forward recovery programs are well known in the art and read a record of a change from the log dataset 74, Table 1C, and use the record to determine how the base dataset 72, Table 1B, was changed. Knowing how the base dataset 72 changed, the recovery program can change the restored backup data to reflect the change. After all the change records are applied to the backup data in chronological order, the backup data will have changed to accurately reflect the most current base data.

Table 2A illustrates an example of the operation of the roll-forward recovery process. Table 1B depicts the base dataset 72 as it was restored, current as of Jul. 1, 1989, but out of date as of Jul. 20, 1989, the time of an assumed disaster. Table 1B indicates that the customer account 01-6056 had a credit limit of $1,000 as of Jul. 1, 1989. In the time between Jul. 1, 1989 and Jul. 20, 1989, several changes were made to the value of the credit limit, as shown by Table 1C.

Changes to base dataset 72, Table 1A, are made by a CICS user issuing commands to the DBMS main program 83 in region B (shown in FIG. 2) via user terminal 43. DBMS main program 83 acts upon a command by changing the value stored in the base dataset 72 and adding a record of the change to log dataset 74, Table 1C. The modification of base dataset 72 and the writing to log dataset 74 are done by a subroutine of CICS, journalling routine 66. The journalling routine 66 ensures that every change to base dataset 72 is logged to the log dataset 74. The credit limit for the account 01-6056 changed four times between Jul. 1, 1989 and Jul. 20, 1989, as shown in Table 1C, lines 1, 6, 9, and 16. After the assumed Jul. 20, 1989 disaster, the correct value of the credit limit can be restored by taking the backup value $1,000, in Table 1B, and applying the changes of lines 1, 6, 9, and 16, in order, to arrive at the final value of $6,000. The same can be done for the balance and the interest rate. All other accounts and base data can similarly be restored using this method.
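The roll-forward procedure just described can be sketched in a few lines of Python. This is a minimal illustration only, not the recovery utility supplied with any DBMS. The record layout, the name `roll_forward`, the treatment of each change as a replacement value, and the three intermediate credit-limit values are all assumptions made for the example. Only the backup value of $1,000, the log line numbers 1, 6, 9, and 16, and the final value of $6,000 come from the text.

```python
from typing import NamedTuple

class ChangeRecord(NamedTuple):
    line: int        # line number of the record in the change log (Table 1C)
    account: str
    field: str
    new_value: int   # assumption: each record carries the replacement value

def roll_forward(backup: dict, log: list) -> dict:
    """Apply change records, in log order, to a copy of the restored backup."""
    restored = {acct: dict(fields) for acct, fields in backup.items()}
    for rec in sorted(log, key=lambda r: r.line):
        restored[rec.account][rec.field] = rec.new_value
    return restored

# Backup value from Table 1B; the intermediate values below are hypothetical.
backup = {"01-6056": {"credit_limit": 1000}}
log = [
    ChangeRecord(1, "01-6056", "credit_limit", 2000),
    ChangeRecord(6, "01-6056", "credit_limit", 3000),
    ChangeRecord(9, "01-6056", "credit_limit", 4500),
    ChangeRecord(16, "01-6056", "credit_limit", 6000),
]
current = roll_forward(backup, log)
```

The backup itself is left untouched, so the same restored image can be rolled forward again if more log records arrive later.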

Most DBMSs typically have the built-in ability to create log datasets in tandem with base datasets, and include roll-forward processing capability, as well as the ability to periodically archive data from log datasets to tapes 39 using what is called a batch job. However, the change log dataset is often lost in the same disaster that destroys the base dataset. Ideally, the change log data should be moved off-site as soon as it is created. One problem with continually sending change log data off-site is the possibility of losing at least one record of the log. While losing one record out of possibly millions of records may not seem like a problem, the effect of one missing record in a complex system could be compounded to the point where it would be better to have none of the records than to have all but one of the records.

The ill effects of a missing log record are illustrated by Tables 2A and 2B. Table 2A depicts the correct restoration of the balance of account 01-6056 using the backup base dataset of Table 1B and the log dataset of Table 1C. The changing balance is shown in the "Balance" column of Table 2A as the prior balance is adjusted by the change amount in each log record of Table 1C. The final correct balance is $645.00.

The fact that the change log records are applied in order is critical in DBMSs where data "ripples," i.e., where one change in a datum causes a change in another datum, which in turn causes another datum to change, and so on to a point where the full effect of a given change of a datum cannot be readily ascertained. In the example, rippling of data occurs because a change in the account balance of account 01-6056 causes a change in other data relating to the amount of interest paid to the account, which causes a change in the data relating to the total amount of interest the bank has paid to accounts and the total amount of deposits which the bank has, which affects still further data relating to the entire balance sheet of the bank. Thus, the effect of each change to the database cannot be known unless the change is applied to a database that is in the same state as the original database was at the time the change was originally applied.
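The order dependence can be seen in a small sketch. The two change types below (a deposit and an interest payment computed from the current balance) and all amounts are hypothetical, chosen only to show that the same two changes applied in a different order leave the data in a different state. Amounts are held in cents to keep the arithmetic exact.

```python
def deposit(amount_cents):
    """Return a change that adds a fixed amount to the balance."""
    return lambda balance: balance + amount_cents

def pay_interest(percent):
    """Return a change whose effect depends on the balance it is applied to."""
    return lambda balance: balance * (100 + percent) // 100

def apply_all(balance, changes):
    """Apply the changes one after another, in the order given."""
    for change in changes:
        balance = change(balance)
    return balance

start = 10000  # $100.00, hypothetical
in_order = apply_all(start, [deposit(10000), pay_interest(10)])
out_of_order = apply_all(start, [pay_interest(10), deposit(10000)])
```

Here `in_order` is 22000 cents and `out_of_order` is 21000 cents, so appending a late-found change to the end of the log does not reproduce the original state.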

Table 2B depicts the same process as Table 2A, but with a missing change log record number 17 of Jul. 14, 1989, at 10:01, increasing the balance by $210. Recreating the balance from the prior value and the change log records works fine up until the missing record number 17. From that point on, the balance is incorrect. Finding the missing value and adding another change record at the end of the change log is unacceptable since the effect on rippling is different depending on the order of the change records. Further, the order in which the changes occurred must be known. The backup data can only be brought up to date to the point just before the first missing record number 17, so the value of the backup data is related to the availability of the log of all the changes made to the data.
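A hedged sketch of this limitation follows: roll-forward proceeds record by record and must stop at the first missing record number, even if later records are present. The function name, the starting balance, and every amount other than the missing record 17 are hypothetical. Amounts are in cents.

```python
def roll_forward_until_gap(start_balance, records, first_number):
    """Apply changes in record-number order, stopping at the first gap.

    records maps record number -> change amount in cents.
    Returns the last trustworthy balance and the first record number
    that could not be applied.
    """
    balance = start_balance
    expected = first_number
    while expected in records:
        balance += records[expected]
        expected += 1
    return balance, expected

# Record 17 (the +$210.00 change in the text) never arrived.
received = {15: 5000, 16: -2000, 18: 1000, 19: -500}
balance, first_missing = roll_forward_until_gap(10000, received, 15)
```

Records 18 and 19 are present but unused, because the state they were originally applied to cannot be reconstructed without record 17.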

The present invention involves a method and means for getting the log data to a remote site for safekeeping as soon as possible after a change to the base data is made. The log data is moved quickly, since the length of time it would take for the data to be moved is the length of time for which the base data would be out of date. For example, if a log record representing a change to a base dataset is sent for safekeeping one second from the time the change was made, the reconstructed base data after a disaster would be at most one second out of date. However, the base data can only be reliably updated to the point of the first gap in the log data, however small the gap. If a disaster occurs a month after the first gap, a recovered base dataset would be a month out of date. For this reason, any change logging method must ensure that no gaps occur in the log data sent off-site for remote safekeeping, and if gaps do occur, that the gap data is recovered and sent to the remote site so that the data remains complete up to the moment before a disaster strikes the central site.

Referring again to FIG. 1, the remote site 12 is geographically remote from the central site so that data logged there is not destroyed in the event of a natural disaster or power outage in the area of the central site 10. The remote site 12 houses a mainframe 24 similar to the mainframe 16, which in turn contains much the same hardware and software that is described above for the central site, except as described below. In some situations, the remote site could be used as a replacement for the entire operation of the central site.

Mainframe 24 has main memory 15, disk drive (disk memory) 17, tape drive 47, reels of magnetic storage tapes (tapes) 61, MVS operating system 22, communications interface 56, modem 54, and operator console 13. MVS 22 is the same as MVS 20 and controls the allocation and execution of programs in the multiple regions A 34 and B 36 of main memory 15.

II. GENERAL DESCRIPTION OF OFF-SITE DATA STORAGE

FIGS. 2 and 3 depict the mainframe 16 of FIG. 1 in greater detail with the addition of E-NET1, and embody the present invention. The computer program E-NET1 causes copies of log records to be sent from the central site 10 to the remote site 12 as the log records are created by journalling routine 66, and recovers log records to fill in missing data gaps in the transmitted data. Gaps occur when the sending site ABENDs, when E-NET1 terminates operation abnormally, when the DBMS that is generating the data is initiated before E-NET1 is initiated, when the remote site ABENDs, when communication between the central and remote sites fails, or when the memory for E-NET1 overflows.

E-NET1 manipulates and transfers log data in data structures known as transmission records (T-records) 51. Each time journalling routine 66 writes log data to log dataset 74, the exit routine extracts selected data from the log data, as described below, and if any records are extracted, a T-record is created by exit routine 80, and the selected data is duplicated in the data field 49 (see FIG. 8). The data written each time by journalling routine 66 varies in number of bytes each time, but generally does not exceed 30,000 bytes. After creating a T-record in memory and copying log data into the data field 49, exit routine 80 fills in the record length field 302 with the total number of bytes of the T-record, fills in the record type field 304 with the value 20 hex, indicating that the T-record contains data, fills in the DBMS origin code field 308 with the DBMS origin code which was specified when CICS was initiated, and fills in the Original Data Length Before Compression field 314 with the number of bytes contained in the data field. The Data Group Number (#) field 310 and Sequence Number (#) field 312 are not filled in by exit routine 80.
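The fields filled in by exit routine 80 can be pictured with the following sketch. It is an illustration under stated assumptions, not the layout of FIG. 8: the header size `HEADER_BYTES`, the Python types, and the names `TRecord` and `build_t_record` are hypothetical. The record type value of 20 hex and the fact that the data group number and sequence number are left unfilled at this stage are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

RECORD_TYPE_DATA = 0x20  # value given in the text for a T-record containing data
HEADER_BYTES = 16        # hypothetical header size; the actual layout is in FIG. 8

@dataclass
class TRecord:
    record_length: int                       # field 302: total bytes in the T-record
    record_type: int                         # field 304
    dbms_origin_code: int                    # field 308 (type is an assumption)
    original_data_length: int                # field 314: bytes before compression
    data: bytes                              # field 49
    data_group_number: Optional[int] = None  # field 310: not set by the exit routine
    sequence_number: Optional[int] = None    # field 312: not set by the exit routine

def build_t_record(selected_data: bytes, dbms_origin_code: int) -> Optional[TRecord]:
    """Create a T-record only if some log data was selected."""
    if not selected_data:
        return None
    return TRecord(
        record_length=HEADER_BYTES + len(selected_data),
        record_type=RECORD_TYPE_DATA,
        dbms_origin_code=dbms_origin_code,
        original_data_length=len(selected_data),
        data=bytes(selected_data),  # duplicate the selected data into the record
    )

record = build_t_record(b"log data", dbms_origin_code=1)
```

Leaving the two numbering fields empty reflects the division of labor in the text, where a later stage of E-NET1 assigns them.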

E-NET1 runs in its own region, C Region (see FIG. 3), with a small routine located in B Region. Therefore the response time of the DBMS appears to the user to be the same as if E-NET1 were not running. Because the copying of T-records 51 is done directly from one main memory location to another, E-NET1 does not affect the access times for the disk memory 18.

While the embodiment depicted in FIGS. 2 and 3 has only one DBMS, namely CICS, and one B region 28, running E-NET1, several DBMSs and multiple regions could be running multiple copies of E-NET1 simultaneously. Also, E-NET1 may run on several types of mainframes such as the IBM Models 4341, 4381, 3081, 3083, 3084, 3090, and compatible machines that can run the IBM Multiple Virtual Storage/Extended Architecture operating system (MVS/XA), Version 2, Release 1 or greater, or the IBM Multiple Virtual Storage/Enterprise Systems Architecture operating system (MVS/ESA), Version 3, Release 1 or greater. E-NET1 can operate with several DBMSs, such as IDMS, IMS, CICS, CPCS, but the invention is not limited to specific DBMSs or specific sizes or types of computers. For example, the computer could be a mini-computer if the mini-computer had sufficient power. Although only one DBMS will be described, essentially the same description would apply to any other DBMS, except where noted.

Generally, an embodiment of the invention works by capturing copies of log data as they are written to disk memory by the DBMS, organizing the data into T-records, queuing the T-records, recovering lost T-records (gaps), and sending the T-records over communications lines to the remote site, where the data is eventually stored in suitable archive media. An embodiment of the invention also has the ability to restore the data in the archive media at the remote site into a form suitable for use by roll-forward recovery programs.

III. DETAILED DESCRIPTION

A. NORMAL OPERATION OF SENDING AND RECEIVING DATA

Normal operation, which is comparatively simple, will be discussed first, and the more complex operations in adverse conditions will be discussed subsequently.

Refer now to FIGS. 2 and 3. As the T-records are moved from B region to C region and then to remote site 12, the progress of this movement is tracked by various parts of E-NET1, and the status of the data movement is stored in the Sending Control File (SCF) 76. The structure of SCF 76 will be discussed in more detail below. Database Control File (DCF) 78 is filled with data relating to the data stored on tapes 39. The DCF 78, the structure of which is illustrated in Table 3, is filled with data by the E-NET1 DCF jobstep 84 and by the E-NET1 exit routine 80.

Jobstep is a term known in the art for describing a set of statements in a Job Control Language (JCL) file. JCL is a method of initiating programs for execution and a method of locating the data which the executing program will need to execute properly. This method is well known in the art. A JCL file, which is stored in disk memory 18, sets up a program for execution, creates pointers to data for use by the program, and performs housekeeping and initialization tasks. A JCL file consists of jobsteps, each of which indicates an action to be taken. Upon operator or program request, mainframe 16 will read the requested JCL file and perform, in order, the actions specified by each jobstep. The result of a request can be altered by altering the JCL file for that request. For example, DBMS main program 83 regularly moves data from log dataset 74 to tape drive 46 by means of a JCL file known as the DBMS Log Archive Job 82. This JCL file, or examples and recommended usage, is supplied as part of the CICS program by its manufacturer. A jobstep, the E-NET1 DCF jobstep 84, has been added to the file so that whenever data is archived to tape drive 46 using the JCL file, an extra step is done, namely to process the E-NET1 DCF jobstep, which fills a record in DCF 78 with data relating to location and contents of the data being written to tape drive 46 by DBMS Log Archive Job 82.

DCF 78 is read by E-NET1 gap recovery programs, and is used to find the location on tapes 39 of specific log data records, for recovery of data from the gaps being recovered. DCF 78 also indicates which log data records exist, but which have yet to be written to tapes 39.

Checkpoint file 104 contains data written by the E-NET1 queue handler subprogram (E1DEQU) 92, shown in FIG. 3. The data in the checkpoint file (discussed in detail below) relates to the progress and success of the data sending operation performed by E-NET1. The checkpoint file 104 is read during a recovery of central site 10 to determine the last T-record 51 successfully sent by E-NET1 at the point in time when disaster struck. If multiple regions are running E-NET1, as discussed previously, each region would have its own unique, separate SCF, DCF, and checkpoint files.

The SCF, DCF, and checkpoint files are stored in disk memory 18. By way of example, each of these files is physically implemented in two identical files residing on two different physical disk drives.

FIG. 5 depicts the logic flow when starting execution of E-NET1 in the E-NET1 C Region 30. During flow block 202, the program image of E-NET1 is loaded from disk memory 18 into main memory 14 and assigned by the operating system 20 to C Region 30. The particular region assigned for E-NET1 is unimportant for the purposes of the present invention, and is assigned based on logic internal to MVS operating system 20.

Consider now some of the programs and flags which are loaded and created. The main E-NET1 program, E1MAIN 94, attaches and controls other processes (shown in FIG. 3), and the enqueuer 88, which runs independently from the main program, is loaded.

Two status flag areas of memory are created. One is the E-NET1 Communications Vector Table 41 (ECVT), having the structure of Table 4A. The ECVT is used to communicate the status of E-NET1 from the E-NET1 region to other regions, such as the DBMS regions. Since the ECVT is accessed by many different regions, it is located in an area of main memory set aside by the MVS operating system in the mainframe for region-to-region communications, the Common System Area (CSA) 40. The other is a Sending Communications block (SCOM) 98, depicted in Table 4B, a memory area which holds status information for use within the E-NET1 region, including flag SCOMQFUL indicating that the queue is full, flag SCOMQMTY indicating that the queue is empty, flag SCOMGRIP indicating that gap recovery is in progress, and flag SCOMSDT indicating that the VTAM session is active.
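As a rough illustration, the four SCOM status flags named above can be modeled as independent bits in a single status word. The flag names come from the text. The bit positions, the use of Python's `enum.Flag`, and the sequence of state changes shown are assumptions for the sketch and do not reflect the structure of Table 4B.

```python
from enum import Flag, auto

class ScomFlags(Flag):
    SCOMQFUL = auto()  # queue is full
    SCOMQMTY = auto()  # queue is empty
    SCOMGRIP = auto()  # gap recovery is in progress
    SCOMSDT = auto()   # VTAM session is active

# Hypothetical sequence: session comes up with an empty queue,
# then gap recovery starts and the queue receives its first record.
status = ScomFlags.SCOMQMTY | ScomFlags.SCOMSDT
status |= ScomFlags.SCOMGRIP    # set a flag
status &= ~ScomFlags.SCOMQMTY   # clear a flag

gap_recovery_running = ScomFlags.SCOMGRIP in status
```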

In flow block 204, E-NET1 checks the SCF 76 for any records therein that indicate that E-NET1 had previously abnormally terminated as discussed below.

As shown in Tables 5 and 6, SCF 76 is a "duplexed Virtual Storage Access Method Key Sequenced DataSet" (duplexed VSAM KSDS). The key features of this type of file, as known in the art, are that it is a dataset, is physically duplicated (duplex), consists of multiple records that can be accessed in a logical order (VSAM), and the logical order is based on the value of a key datum stored with the record (KSDS). The key value in each record of the SCF is a combination of a data group number (data group #) and a DBMS ID; this key is used to identify and locate a record in the SCF and to identify the type of record.

The SCF file contains three types of records. The three types of structures are depicted in Tables 5A, 5B and 5C and are the SCF control record identified by a data group number (#) of 0 and a DBMS ID of 0 (Table 5A); DBMS control records identified by a data group number (#) of 0, and a non-zero DBMS ID field (Table 5B); and Data Group records identified by a non-zero data group number (Table 5C).
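The rule for telling the three record types apart from the key alone can be sketched as follows. The function name and the sample keys are hypothetical, and treating the key as an ordered pair with the data group number first is an assumption about how the two parts are combined.

```python
def classify_scf_record(data_group_number: int, dbms_id: int) -> str:
    """Identify the SCF record type from the two parts of its key."""
    if data_group_number != 0:
        return "data group record"    # Table 5C
    if dbms_id != 0:
        return "DBMS control record"  # Table 5B
    return "SCF control record"       # Table 5A

# Hypothetical keys as (data group #, DBMS ID) pairs, in arbitrary order.
keys = [(2, 0), (0, 1), (1, 0), (0, 0), (0, 2), (3, 1)]

# A key-sequenced file presents records in key order.
ordered = sorted(keys)
kinds = [classify_scf_record(group, dbms) for group, dbms in ordered]
```

Under this assumed key ordering, the all-zero key sorts first, followed by the DBMS control records and then the data group records.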

Within broader concepts of the invention, SCF records generally, and Data Group records in particular, may be fixed or variable length records, logical records or physical records. Also, within broader concepts of the invention, Data Group numbers and sequence numbers, also known as indicators, may be a series of incrementally valued numbers or may be parts of a linked list identifying the next data group or data blocks in the series of data groups or data blocks respectively. For example, using incrementally valued numbers, a series of data groups can be inspected for missing data groups by checking, in series, that each number is represented by a data group. A missing data group results in a skipped number. Similarly, a linked list of data groups and data group records could be used in place of incrementally valued numbers. The inspection for missing data groups would, in the latter case, consist of checking each record in the list in series. Likewise, use of sequence numbers is preferred, but not essential. Within broader concepts, data group records representing live data groups, data group records representing spill data groups, and data group records representing gap data groups need not be put in one SCF file, but each type of record could be stored to separate control records or files.
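The inspection for skipped numbers described above can be sketched as a short function. The name `find_missing_data_groups` and the sample numbers are hypothetical. The sketch covers only the incrementally valued numbering scheme, not the linked-list alternative.

```python
def find_missing_data_groups(numbers):
    """Return every data group number skipped between the lowest and highest seen."""
    present = sorted(set(numbers))
    missing = []
    for previous, current in zip(present, present[1:]):
        # Any integers strictly between two neighbours were never received.
        missing.extend(range(previous + 1, current))
    return missing

received_groups = [1, 2, 3, 5, 6, 9]
missing_groups = find_missing_data_groups(received_groups)
```

For the sample input, `missing_groups` is `[4, 7, 8]`. A missing number at the very end of the series cannot be found this way, since nothing after it reveals the skip.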

A data group is a set of T-records 51 whose continuity is not in question. The idea of data groups is important to the operation of E-NET1. All data handled by E-NET1 has an associated Data Group number (#). In normal operation, all data sent by E-NET1 is unquestionably continuous, and is contained in a single Data Group. If E-NET1 is aborted and restarted, there would be a question of continuity of data. If E-NET1 could not send data to remote site 12 as fast as it was being generated, causing E-NET1 to buffer the data to disk memory 18 or causing E-NET1 to lose data, there would also be a question of continuity. Therefore, the data from before an abort and the data after the abort must be included in different data groups. E-NET1 handles the assignment of data groups using the SCF.

There is one record per dataset for each key value, so there is one SCF control record per SCF dataset. An example of the contents of an SCF control record is illustrated in Table 6. The SCF control record appears at the logical top of the file since it has the lowest key value. This record, as indicated on Table 5A contains information about the SCF file itself such as flags indicating if either of the SCF files in the duplexed pair is bad, and the multi-user information relating to the SCF file (Data Cluster VSI length and Data Cluster VSI Level #) which is required for handling multiple simultaneous accesses to the file. Rather than closing and opening the SCF every time a series of records are written to the SCF by a single process, the SCF is left open to improve performance, but the multi-use information relating to the SCF, which is in the SCF control record, is updated to allow multiple processes (such as gap recovery) to correctly access the SCF without having to close the file after each series of reads and/or writes by a single process.

The contents of two DBMS control records are shown in Table 6 below the SCF control record. The purpose of DBMS control records is to allow E-NET1 to track, in DBMS Type and DBMS ID (Table 5B), which DBMS regions are in use and the type of each DBMS. Generally, one DBMS control record exists for each DBMS which is used with E-NET1. These records also keep track, in Selective Journal Exit Name, of the name of the exit routine to be used to selectively filter the records coming from the DBMS represented by the DBMS type field of the record.

All other records in the SCF 76 of Table 6 below the SCF control record and the DBMS control records, are data group records having the format depicted in Table 5C. Referring to the data group records, the DBMS ID is blank (filled with hex `0` characters) unless the record represents gap data, in which case the DBMS ID is the DBMS ID of the DBMS which created the gap. Each data group record contains two timestamps, a starting timestamp indicating the beginning time for the T-records included in the Data Group and an ending timestamp giving the ending time of the T-records included in the Data Group. Timestamps consist of the date (i.e., Julian century, year and day of year) and time when the T-record was created. The date and 24 hour time (in tenths of a second) is read directly from the T-records 51. Timestamps can be created by a program making a request to the mainframe to read the time and date from real-time clock 8, which is maintained by the mainframe, but the current embodiment reads the date and time directly from the T-record, to arrive at the time the T-record was created instead of the time E-NET1 received the T-record.

The timestamp from the creation time of the T-record, rather than the receipt of the T-record, is preferred, since gap recovery will be extracting information based on the creation time, not the receipt time. Within broader concepts of the invention, timestamps are preferred but not essential. Instead, data group numbers, and preferably sequence numbers, can be used in place of timestamps to reference data groups and T-records, recover data and gaps of data.

Although the date portion of the timestamp is stored in SCF 76 whenever the time portion of the timestamp is stored, Table 6 does not show the date portion, for simplicity. If a data group record represents a data group that contains log records currently being generated, the end timestamp for that data group record is blank, since the time and date are that of an event that has not yet occurred. Table 5C depicts the Status flag in a bit representation, but for clarity the Status flag is depicted in Table 6 as a logical representation, showing both the type of the Data Group (DG TYPE) and the status of the record (STATUS).

The structure of the SCF 76 file is by way of example, and is the preferred layout, but other layouts may be devised within the scope of the invention. For example, although beginning and ending timestamps are disclosed and preferred, one may use a single timestamp in each control record, i.e., a timestamp for the beginning (or end) of one data group and merely imply the end (or beginning) of the preceding (or next) data group from the one timestamp. Within the scope of the invention, one may even replace the use of timestamps with the use of a linked list referencing the next data group or blocks in a series of data groups or blocks, which would be an alternate method of determining the boundaries of the data groups in a series of data groups.

Refer now to FIG. 5 and consider operation when bringing up an E-NET1 region. A blank ending timestamp (in Table 6) indicates to E-NET1, in the flow of block 204, that E-NET1 had abnormally terminated. When E-NET1 terminates normally, E-NET1 records an ending timestamp for all data group records which have blank ending timestamps. Thus, when E-NET1 is restarted after a normal termination, E-NET1, during block 204, will not find any data group records in the SCF 76 that have blank ending timestamps. After an abnormal termination, E-NET1 will, during block 204, place a current timestamp into the location for the ending timestamp for each open (unended) Data Group record in SCF 76 and label each such record a "gap" record by setting the corresponding bit in the Status flag (Table 5C) for that record. Also, for each DBMS control record that was found to be open, a Data Group record is created at the end of the SCF 76 and is assigned a new data group number. E-NET1 determines the value of newly created data group numbers by looking at the SCF, finding the highest data group number used, and adding one to that data group number. In the example of Table 6, the highest data group number in use is 6, so if E-NET1 needs a new data group number, the new data group would be data group 7.
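The handling of open data group records during block 204 can be sketched as follows. This is a minimal illustration under stated assumptions: the record class, its field names, and the timestamp strings are hypothetical and do not reproduce the layout of Table 5C.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataGroupRecord:
    number: int                 # data group number
    start: str                  # starting timestamp
    end: Optional[str] = None   # None models a blank ending timestamp
    is_gap: bool = False        # models the gap bit of the Status flag


def close_open_records(scf: List[DataGroupRecord], now: str) -> int:
    """Close every open record with the current timestamp and mark it as a gap."""
    closed = 0
    for rec in scf:
        if rec.end is None:
            rec.end = now
            rec.is_gap = True
            closed += 1
    return closed


def next_data_group_number(scf: List[DataGroupRecord]) -> int:
    """Highest data group number in use, plus one."""
    return max((rec.number for rec in scf), default=0) + 1
```

With six data group records numbered 1 through 6, the next number computed this way is 7, matching the example of Table 6.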

E-NET1 then proceeds to the flow of block 206, and initializes the ECVT and SCOM memory areas. Once E-NET1 finishes initializing itself, E-NET1 then proceeds to block 208 and sets E-NET1 status flag ECVTACTV in the ECVT 41 (FIG. 2) to indicate that E-NET1 is active and accepting data. E-NET1 sends a message to the operator console 44 indicating that E-NET1 is active and running at flow block 210, and that installation is complete, leaving E-NET1 to run without further operator interaction at flow block 212.

When the DBMS main program 83 is started in B region and attempts its first write operation to disk memory 18, the DBMS main program executes programs illustrated in the logic flow of FIG. 6; then the logic flow of FIG. 7. The IDMS and CPCS DBMS programs also work in this manner, but for IMS the logic of FIG. 6 is executed immediately when IMS is started, without waiting for the first write operation.

The flow of FIG. 6 starts at block 220 when the operator issues a command at the console 44 for the operating system to start up a region containing the CICS as the DBMS main program, using a JCL file associated with CICS. The operating system then executes the jobsteps listed in the JCL file. Additional programming steps have been added, through the process of installation, to the JCL file causing it to execute the program steps illustrated in FIG. 6. Flow proceeds to block 222 and the JCL program checks the CSA 40 for the existence of the ECVT 41 (FIG. 2), and if the ECVT exists, the program checks the E-NET1 status flag ECVTACTV to see if it is set, indicating that E-NET1 is active. If E-NET1 is active, the program initialization ends and control passes normally to block 230 to the DBMS main program. If the ECVT is not found in the CSA, or the flag ECVTACTV is not set, the program then steps to block 224 and sends a message to the display 44a of operator console 44 requesting that the operator make a decision to hold up the loading of CICS until E-NET1 is loaded and active, or to force the loading of CICS without E-NET1 running. The program then checks the response at 226, and if the operator requested a retry, presumably starting E-NET1 before responding, the program moves back to block 222 to recheck the E-NET1 status as described above. If the operator had responded that CICS should be forced, i.e., started anyway without E-NET1, the program proceeds to block 228, and inserts a gap data group record in the SCF (i.e., at the end of the last record in Table 6), indicating that CICS will be running without E-NET1. Running a DBMS without E-NET1 is one of the conditions that trigger creation of gap data group records and the need to do gap recovery.
Summarizing, a gap data group record is inserted into the SCF 76, and the Data Group field of the record is filled with the next available data group number, the DBMS ID field is filled with the DBMS ID for the DBMS, found from the DBMS dependent parameters file 77, the starting timestamp field is filled with the current time and date; and the status flag indicating that the record is a gap record, is set. After the gap data group record is created and stored in SCF 76, the program proceeds normally past block 230 for execution of the CICS program. If the particular DBMS had been used previously with E-NET1, the DBMS control record for that DBMS would already exist in the SCF. If the DBMS record already exists, a new DBMS control record will not be created and the old DBMS control record will be used, but the old one will be "reopened," by filling the starting timestamp field with the current time.

During installation of E-NET1, E-NET1 exit routine 80 (FIG. 2) is linked into the journalling routine 66, so that while the DBMS is running, any write access of disk memory 18 done by the journalling routine (XJCWR in CICS) 66 requires a call to exit routine 80. The actual name of the journalling routine differs from DBMS to DBMS, but can be found in the documentation for the specific DBMS, with the exception of IMS. For IMS, the source program which writes the journal code must be modified. As part of the installation, the DCF 78 file must be created and E-NET1 DCF jobstep 84 is attached to CICS to gather the data necessary for the DCF. Table 3 illustrates the structure of the DCF 78.

The DCF 78 is used by E-NET1 gap recovery programs, discussed subsequently, to locate log records on tape 39 when the records are needed for the gap recovery process. Like the SCF, the DCF is a duplexed VSAM KSDS type file. One DCF record is formed for each contiguous block of log records on tape 39. The key fields for each DCF record in the DCF file are Log/Journal number (#), which identifies the dataset which was the source of the data, and the starting date and starting time of the data. E-NET1 DCF jobstep 84 is added to the JCL file that is supplied with CICS for archiving log records. The added jobstep is the operation of executing program 84, and the JCL file is used to perform the general function of archiving log records. This jobstep writes a record to the DCF 78 showing where on tape 39 to find the data.

The step of installing E-NET1, which needs to be done only once for each DBMS, is slightly different for each DBMS, but the goal is the same: attach an exit routine to capture data as it is written to a log or journal dataset, and provide a means to track the location on tape of records archived from the log or journal dataset. The steps of attaching exit routines and adding jobsteps are conventional and are known to those skilled in the art. The details of the steps vary from DBMS to DBMS. For example, when installing IDMS, an exit routine 80 and an archive copy routine 84 are "link-edited" into the IDMS program image. For IMS, only the exit routine 80 is used, as IMS already maintains the data for locating data on archive tape in a file labelled "RECON," eliminating the need for E-NET1 to separately track such information.

Log records are generally not written to log dataset 74 one at a time, but rather are kept by DBMS main program 83 until several records are ready to be written to disk. Journalling routine 66 is only called by the DBMS main program 83 when the block of records is ready to be written to disk. Table 1C shows an example of how multiple records are blocked. A first block of records 1-10 would become one logical block, records 11-20 would become another logical block, etc. Multiple record blocking increases the speed of DBMS operations by limiting the amount of disk access required to write log records. In the example of Table 1C, to write all 28 records would require only 3 write-to-disk operations. Typical block size would be larger than 10 records/block, but 10 is used for purposes of illustration. Each time journalling routine 66 writes data to log dataset 74, a new T-record 51 is created by exit routine 80, and the T-record 51 contains a copy of the data written. T-records 51 are illustrated in FIGS. 2, 3, 4, and 8.
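The blocking arithmetic of the example can be checked with a short sketch. The function name and the representation of log records as integers are assumptions made for illustration.

```python
def block_records(records, block_size):
    """Group log records into logical blocks of at most `block_size` records."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


# 28 records with 10 records per block, as in the example of Table 1C.
blocks = block_records(list(range(1, 29)), 10)
```

The 28 records fall into three blocks (records 1-10, 11-20 and 21-28), so three write-to-disk operations suffice.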

FIG. 7 depicts the logic flow for the DBMS main program 83 which writes data to log dataset 74 and using E-NET1 exit routine 80 (FIG. 2), creates a T-record and transfers it to C Region 30, where the T-record comes under the control of E-NET1 (more specifically, the enqueuer (E1QPOR) 88) if E-NET1 is active, and handles the case where E-NET1 is not active. The E-NET1 exit routine 80 is added to the journalling routine 66 during installation. Although block 250 of FIG. 7 depicts that the normal DBMS Write Routine 250 occurs after the E-NET1 exit routine 80, the write routine could come first. Whenever the internal logic of the DBMS main program 83 decides to write a block of data which it has generated to disk memory 18, such data, consisting of one or more log records, DBMS main program 83 executes the journalling routine 66 (FIG. 2). At block 237, the program calls user-supplied selective journalling routine 81. This program, supplied by the user, is well known in the art and indicates to the exit routine 80 which records, if any, of the block to be written to disk memory 18 should be ignored, and which should be sent to the E-NET1 C region 30 (FIG. 3). Some users may choose not to provide this routine 81, and consequently, E-NET1 will store every log record generated by CICS. When routine 81 returns control to the journalling routine 66, the flow proceeds to block 238. If routine 81 returns with no records selected, control proceeds to block 250 without activating any other part of the exit routine 80 indicated in FIG. 7. If the routine 81 has selected some records, control passes to block 240. During flow block 240, the flag ECVTACTV of ECVT 41 in the CSA 40 (FIG. 2) is checked to find the status of E-NET1, and if E-NET1 is not active, control passes to block 241.

If the block to be written is the first block written since the DBMS was started, E-NET1 will give the operator an opportunity to fix the problem and retry the operation. To this end during block 242, the operator is asked to make a selection on console 44 (FIG. 1) between a RETRY operation or a FORCE operation. If the operator keys in a RETRY, control passes back to block 240. If the user keys in a FORCE, control passes to block 243, where a check is made to see if a gap data group record is open, indicating that a gap in the data already exists. If a gap does not exist, block 244 is entered, where a gap record is created. If a gap exists, control passes to block 250.

The embodiment depicted in FIG. 7 only requests at block 241 a response from the operator when the first write operation fails. An alternate embodiment would request operator action whenever any write operation fails. This could be accomplished by having block 241 always respond with "yes" regardless of whether or not the write operation is the first one. Another embodiment would never request operator action. This could be accomplished by having block 241 always respond with "no" regardless of whether or not a write operation is the first one.
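The decision path through blocks 240 to 244 can be sketched as follows. This is an illustration only. All function and parameter names are hypothetical, and the sketch assumes that a "no" at block 241 leads directly to the gap check of block 243, which the description implies but does not state outright.

```python
def exit_routine_path(enet_active, is_first_write, ask_operator, gap_is_open):
    """Return the action taken for one block of selected records.

    enet_active   -- callable, True when the ECVTACTV flag is set (block 240)
    is_first_write-- True for the first write since the DBMS started (block 241)
    ask_operator  -- callable returning "RETRY" or "FORCE" (block 242)
    gap_is_open   -- True if a gap data group record is already open (block 243)
    """
    while not enet_active():                              # block 240
        if is_first_write and ask_operator() == "RETRY":  # blocks 241, 242
            continue                                      # recheck E-NET1 status
        # FORCE, or no operator request: record the gap if none is open yet.
        return "gap_exists" if gap_is_open else "create_gap"  # blocks 243, 244
    return "enqueue"                                      # E-NET1 is active
```

The alternate embodiments correspond to passing `is_first_write` as always true (always ask the operator) or always false (never ask).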

To create a gap data group record, any open live type data group record in SCF 76 (Tables 5C, 6), i.e., one with a blank ending timestamp, is closed by putting a current timestamp into the ending timestamp field of the record, and a new data group record is created in the SCF 76 for a new gap data group. If a DBMS control record (Tables 5B, 6) does not exist in the SCF 76 for the current DBMS, a DBMS control record will be added to the SCF 76 containing a current starting timestamp. The existence of a DBMS control record is found by searching the SCF 76 for a record with a data group number of 0 and a DBMS identifier equal to the DBMS identifier stored in the DBMS parameter file 77 (FIG. 2), the structure of which is shown in Table 7. This DBMS identifier is also the DBMS identifier used when a new DBMS control record is added to the SCF 76. A DBMS control record is created during block 244 of FIG. 7 during the first execution of this logic after the start up of the DBMS.

FIG. 8 shows the word format or structure of a T-record 51 (FIG. 2) and an example T-record 51. The exit routine fills the DBMS Origin Code field 308 with the DBMS code read from the DBMS dependent parameters file 77, and fills the original length field 314 with the length of the data field before compression. The data group number field 310 and the sequence number field 312 are filled by the queue handler 92 when T-record 51 is removed from the FIFO queue 100 and a data group number (#) and sequence number are assigned to the T-record. The Timestamp field 316 is filled in with the time of creation of the T-record.

The time of creation may be determined differently for different DBMSs. By way of example, in a T-record from a CICS DBMS, the timestamp can be read from the beginning of the data field 49, in an area known in the art as a "CICS label record". In a T-record from an IDMS DBMS, the timestamp can be read from the beginning of the data field 49, in an area known in the art as an "IDMS TIME record". In a T-record from an IMS DBMS, the timestamp is determined by looking at all the IMS log records stored in a data field and setting the timestamp equal to the latest timestamp value found in an IMS log record.

The other fields stored in a T-record are the overall length field 302, which changes from time to time as the T-record is compressed, encrypted, decompressed and decrypted and which indicates the current size of the T-record, and a Record Type field 304 in which all records containing actual data will have a Record Type of 20 H (hex) and those without data have either an 80 H, 40 H, or 10 H. T-records having a Record Type of 80 H are E-NET1 Startup Header Records. An E-NET1 Startup Header Record is sent to the remote site each time E-NET1 first starts up. The only other field used in such a T-record is the Timestamp field which indicates the time at which E-NET1 was started. T-records having a Record Type of 40 H are E-NET1 Shutdown Trailer Records. An E-NET1 Shutdown Trailer Record is the last T-record to be sent to the remote site before E-NET1 shuts down at the central site. The only other field used in such a record is the Timestamp field, which contains the time at which the T-record was created. T-records having a Record Type of 10 H are Data Group Header Records. A Data Group Header Record is sent to the remote site just before the first T-record of each data group is sent. This type of T-record contains a data group number in the Data Group Number field 310, and a timestamp in the Timestamp field 316 which is a copy of the starting timestamp stored in the SCF 76 for the data group specified in the Data Group Number field.
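The four Record Type values named above can be summarized in a short sketch. The enumeration and member names are chosen for illustration and are not identifiers used by E-NET1; only the hexadecimal values come from the description.

```python
from enum import IntEnum


class RecordType(IntEnum):
    """Record Type values of field 304 (names are illustrative)."""
    DATA_GROUP_HEADER = 0x10   # sent before the first T-record of a data group
    DATA = 0x20                # T-record containing actual data
    SHUTDOWN_TRAILER = 0x40    # last T-record sent before E-NET1 shuts down
    STARTUP_HEADER = 0x80      # sent each time E-NET1 first starts up


def carries_data(record_type: int) -> bool:
    """True only for T-records that contain actual data."""
    return RecordType(record_type) is RecordType.DATA
```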

The Record Type 304 is a byte field. The DBMS Origin Code 308 is a 2-byte field where the first byte is a character, the second byte has three digits with a range of only 000-255, the data group number 310 and sequence number 312 are full words, the original length 314 is a full word, and the data field 49 is a variable length, up to 32735 bytes.

Referring back to FIGS. 3 and 7, if E-NET1 is found to be active during block 240, a pointer to the enqueuer (E1QPOR) 88 located in a different region, C region 30 (FIG. 3), is read from the ECVT 41, in the variable ECVTPCN0 (Table 4A). The pointer is then passed to XMS 48, provided as part of the operating system, which maintains a table to convert the pointers to program locations, and which handles moving T-records from B region 28 to C region 30. XMS 48 invokes the enqueuer routine 88, to be discussed. If the enqueuer routine is able to put the T-records passed by XMS onto the FIFO queue 100 during block 246, the enqueuer will pass back a return code of 0. If the enqueuer cannot place the T-records onto the FIFO queue, e.g., because the FIFO queue has filled to capacity, the enqueuer will return a non-zero return code. Normally, the queue handler 92 will prevent the FIFO queue 100 from filling to capacity. If the return code is non-zero, control proceeds from block 248 to block 241, to give the operator an opportunity to retry moving the T-record to FIFO queue 100. If the return code is 0, indicating that the data has been successfully written onto the FIFO queue, control passes from block 248 to block 250. After the block of data is written to log dataset 74 during block 250, control is passed back from routine 66 to DBMS main program 83 at block 252.

The activity of enqueuer 88 is initiated only by XMS 48. When initiated, the enqueuer 88 attempts to place a T-record incoming from XMS onto the FIFO queue 100, and returns a code to XMS indicating the success of the operation.
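The return-code convention of the enqueuer can be sketched with a bounded queue. This is a simplified model: the class name, the capacity expressed as a record count, and the particular non-zero value are assumptions made for illustration.

```python
from collections import deque


class FifoQueue:
    """Bounded first-in, first-out queue modelling FIFO queue 100."""

    QUEUE_FULL = 4  # hypothetical non-zero return code

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def enqueue(self, t_record):
        """Return 0 if the T-record was queued, non-zero if the queue is full."""
        if len(self.items) >= self.capacity:
            return self.QUEUE_FULL
        self.items.append(t_record)
        return 0
```

A caller that receives a non-zero code would follow the path from block 248 to block 241 rather than treating the T-record as captured.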

E1MAIN 94, the main E-NET1 program, controls the attachment of a warning generator 86 (E1WARN), a wake-up timer (E1TIMR) 99, the queue handler (E1DEQU) 92, a spill file reader (E1SPRD) 95, and a buffer sending program (E1TRAN) 97. The wake-up timer (E1TIMR) starts up the queue handler and the spill file reader at regular intervals so that they have a chance to handle the changing status of flags in the SCOM memory area 98 and in the SCF 76. The queue handler 92 moves T-records out of the FIFO queue 100 and prepares the T-records for transfer to the remote site 12, as well as initiates gap and spill file recovery as necessary. During normal operation, the logic of gap recovery and spill file handling will not be used. Under normal conditions, the queue handler moves T-records from the FIFO queue 100 to the VTAM buffers 102. The queue handler 92 assigns the data group 310 and sequence numbers 312 (FIG. 8) to the T-records 51 as they are taken off the queue. The data group number to be assigned is read from the last open data group record in the SCF 76. The sequence number to be assigned is remembered by the queue handler, and is a sequential number which increases by one for every T-record 51 read from the FIFO queue, and is set back to 1 every time a new data group is formed. Thus, a T-record 51 has a unique identifier of the data group 310 in combination with the sequence number 312.
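The numbering rule applied by the queue handler can be sketched as follows. The class and method names are hypothetical; the rule itself (increase by one per T-record, reset to 1 for each new data group) is taken from the description above.

```python
class TRecordNumberer:
    """Assigns (data group number, sequence number) pairs to dequeued T-records."""

    def __init__(self, data_group):
        self.data_group = data_group
        self.next_sequence = 1

    def assign(self):
        """Return the identifier for the next T-record taken off the queue."""
        ident = (self.data_group, self.next_sequence)
        self.next_sequence += 1
        return ident

    def start_new_group(self, data_group):
        """A new data group was formed: sequence numbering restarts at 1."""
        self.data_group = data_group
        self.next_sequence = 1
```

Because the sequence restarts with each data group, only the pair taken together identifies a T-record uniquely.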

The checkpoint file 104 (FIG. 3) contains the data group number and sequence number of the last successfully transmitted T-record, as well as a timestamp of that last T-record. For a T-record to be determined to have been successfully transmitted, the VTAM buffer for that T-record must have been released for reuse. The operation of transmission and transmission confirmation, which releases the buffers, is discussed subsequently.

The queue handler E1DEQU and E1TRAN track which buffers in FIFO queue 100 and VTAM buffers 102 are free by means of a flag associated with each buffer. Another embodiment uses two linked lists, one to track which buffers are occupied and one to track which buffers are free.

The regular interval for updating the checkpoint file 104 is specified as a number of T-records and the number is kept in a variable, CKPTINT of the E-NET1 parameters file (see Table 8). Table 8 shows the format and example contents of the E-NET1 parameters file 87. The file 87 is a text file with multiple lines, each line having a variable identifier and a variable value. CKPTINT in the example of Table 8 is set to record a checkpoint every 1000 T-records. A checkpoint is simply a marker indicating the latest T-record successfully received by the remote site and confirmed by the remote site. The checkpoint is used as the starting point for a recovery.
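The effect of CKPTINT can be sketched as a counter over confirmed T-records. This is a simplified model that keeps the checkpoint in memory; the class and attribute names are assumptions and do not describe the actual format of checkpoint file 104.

```python
class Checkpoint:
    """Remembers the latest confirmed T-record at every CKPTINT-th confirmation."""

    def __init__(self, ckptint):
        self.ckptint = ckptint
        self.confirmed = 0
        self.last = None  # (data group number, sequence number, timestamp)

    def record_confirmed(self, data_group, sequence, timestamp):
        """Call once for each T-record confirmed by the remote site."""
        self.confirmed += 1
        if self.confirmed % self.ckptint == 0:
            self.last = (data_group, sequence, timestamp)
```

With CKPTINT set to 1000 as in Table 8, a recovery that begins after 2500 confirmed T-records would start from the checkpoint taken at the 2000th.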

Referring to FIG. 3, the E-NET1 parameters file 87 also contains the values QUESIZE for the size of the FIFO queue 100 in kilobytes, the queue high threshold percentage QUETHHI, queue low threshold percentage QUETHLO, and queue warning threshold percentage QUEWARN, respectively. The flow of the queue handler 92 (E1DEQU) is determined in part by these values. Under normal operation, the FIFO queue 100 never fills up to the queue high threshold, but is emptied to the VTAM buffers 102 and transmitted to the remote site 12 at a rate at least as fast as the rate the data is being collected by the enqueuer 88. If the FIFO queue 100 were to fill up past the QUEWARN percentage, E1DEQU 92 sends a command via E1MAIN 94 to the warning generator E1WARN 86, which would then display a message on the operator's console 44 to the effect that the FIFO queue has filled past its warning level.
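The warning test against QUEWARN can be sketched as a percentage comparison. The function names are hypothetical, and the sketch assumes that one kilobyte of QUESIZE equals 1024 bytes, which the description does not specify.

```python
def queue_fill_percent(used_bytes, quesize_kb):
    """Percentage of the FIFO queue in use (assumes 1 kilobyte = 1024 bytes)."""
    return 100.0 * used_bytes / (quesize_kb * 1024)


def should_warn(used_bytes, quesize_kb, quewarn):
    """True once the queue has filled past the QUEWARN percentage."""
    return queue_fill_percent(used_bytes, quesize_kb) > quewarn
```

The same comparison, made against QUETHHI and QUETHLO, would give the high and low threshold tests referred to in the spill file discussion.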

The queue handler (E1DEQU) compresses the data field of the T-record as it is read from the FIFO queue for efficient handling and transfer of data. After compression, E1DEQU encrypts the data if a user-supplied encryption program 91 is made available to E1DEQU. To make the encryption program available to E1DEQU, the user names the program "E1CRPT" and includes it in a link-edit of E1DEQU. If the user supplies an encryption routine 91 (FIG. 3), the user must also supply a decryption routine 816 (FIG. 14), to be used in the event that database recovery is needed. E1DEQU calls the encryption routine 91 if used, passing the routine a T-record with an unencrypted data field 49, and the encryption routine returns control to E1DEQU, passing back the T-record 51 with the data field 49 encrypted.

Buffer sending program E1TRAN 97 is another program started and attached by the E-NET1 main program E1MAIN 94. E1TRAN passes data from the VTAM buffers 102 to the operating system supplied VTAM communications programs 64 which move the data to the remote site. E1TRAN invokes the communication programs. The VTAM buffers are not released by E1TRAN 97 for reuse until E1TRAN receives a return confirmation from the remote site 12 via VTAM communication programs 64. This ensures that data will not be discarded before it is safely copied to disk memory 17 at the remote site 12. This feature ensures that no data is lost if the communication line 53 or the remote site mainframe 24 unexpectedly fails. VTAM communications programs 64 on mainframe 16, communication programs 60 on mainframe 24, communications interfaces 50, 56, modems 52, 54, and communication line 53 are collectively referred to as a "VTAM link." Alternate embodiments of the present invention may use means other than those depicted in FIG. 1 to communicate from the central site to the remote site, such as a direct hardware link, or dedicated wires. Significantly, such communication links operate similarly to the embodiment depicted in FIG. 1, and each is generally referred to as a "VTAM link." The effects of the remote site failing to confirm receipt of data blocks are discussed below along with spill file operations. If confirmations are not received in a timely manner, FIFO queue 100 fills, and the creation of spill files becomes necessary.
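The rule that a buffer is released only upon confirmation can be sketched as follows. This is a simplified model, not the VTAM interface: the class, its methods, and the use of integer buffer indexes are assumptions made for illustration.

```python
class SendBuffers:
    """Pool of send buffers that are freed only when the remote site confirms."""

    def __init__(self, count):
        self.free = list(range(count))
        self.in_flight = {}  # buffer index -> T-record awaiting confirmation

    def send(self, t_record):
        """Occupy a buffer for transmission; return its index, or None if none is free."""
        if not self.free:
            return None
        buf = self.free.pop()
        self.in_flight[buf] = t_record
        return buf

    def confirm(self, buf):
        """Remote site confirmed the write to disk: the buffer may be reused."""
        del self.in_flight[buf]
        self.free.append(buf)
```

When confirmations stop arriving, no buffer becomes free, which models how a failed link causes data to back up into FIFO queue 100.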

FIG. 4 depicts the remote site 12 with the E-NET1 remote site software installed and running in the A region 34 of mainframe 24 (FIG. 1). B region 36 is shown in FIG. 1 along with A region 34 for illustration of the multiple regions of the main memory 15 of remote mainframe 24, but only one region is used at the remote mainframe 24 for E-NET1. Region 36 is empty and is not running any programs. The receive control task program E1MNRC 130 is a program initially loaded to control the E-NET1 receiving region 34. After attaching and starting the receive journal blocks program E1RECV 128 and the write-data-to-disk program E1JFMS 132, E1MNRC 130 is not active. E1RECV 128 receives the T-records 51 as sent by E1TRAN 97 (FIG. 3) running on the central site mainframe 16, and puts the T-records 51 into VTAM buffers 126. E1JFMS 132 then removes the T-records 51, in the order received, from the VTAM buffers 126 and puts the T-records into a disk journal file 120 in disk memory 17 (FIG. 1). When a T-record 51 is written with certainty to the file 120, E1JFMS 132 posts a message to E1RECV 128 to the effect that a data block was securely written to disk. E1JFMS 132 then frees the VTAM buffer containing the just-written data block for reuse by E1RECV 128. When E1RECV receives the message from E1JFMS, a VTAM confirmation is sent to the central site over communication line 53 (FIG. 1). This confirmation is known in the art as a "DRI" or Definite Response Confirm. VTAM confirmation is done as taught and described in the VTAM manual supplied by IBM with the VTAM product, which is incorporated herein by reference.

When E1JFMS 132 fills a file 120, it initiates the unload disk to tape routine (unloader) 134, which moves the file 120 to tape drive 47 which copies the file 120 onto a removable magnetic tape 61.

The Receive Control File (RCF) 124 contains data relating to the progress of data being received and the location of received data. The structure of the RCF 124 is shown in Table 9. Table 10 shows typical data contained in the RCF.

Table 9 shows the format of each record in the RCF 124. Each record represents a contiguous series of T-records 51, all of which are stored in a single location. The first four fields shown in Table 9 indicate the start and end of the extent of the T-records 51 in the contiguous series. A T-record is uniquely identified by its data group number and sequence number; thus the first four fields of a record in the RCF 124 identify a first and a last T-record. In addition, the starting timestamp indicates the time applicable to the first T-record of the series, and the ending timestamp indicates the time applicable to the last T-record of the series. The Dataset Name field indicates which file 120 in disk memory 17 contains the T-records. After the data in file 120 is moved to tape 61, the Unit Type, Volume Serial Number, and File Sequence Number fields indicate which tape the series of T-records was written to.

Table 10 depicts an example RCF. The first series of T-records, as shown in the first line of Table 10, begins with the T-record with a data group number of 1 and a sequence number of 1, and the series ends with the T-record with a data group number of 1 and a sequence number of 1299. The second series of T-records, as shown in the second line of Table 10, begins with the T-record with a data group number of 1 and a sequence number of 1300, and the series ends with the T-record with a data group number of 1 and a sequence number of 1503. The third series of T-records, as shown in the third line of Table 10, begins with the T-record with a data group number of 2 and a sequence number of 1, and the series ends with the T-record with a data group number of 2 and sequence number of 306.

A record is created in the RCF whenever E1JFMS 132 writes a T-record to file 120 and finds no open records in the RCF to record the fact that the T-record was written to file 120. An open record is defined as one that has blank ending data group number (#) and sequence number (#) fields. Whenever a file 120 is filled, E1JFMS program 132 writes the data group number, sequence number and timestamp of the last written T-record 51 into the ending data group number, ending sequence number and ending timestamp fields of the last open record of the RCF 124, thus closing the record. The next time E1JFMS 132 writes a data block 49 to file 120, a new open record, as depicted in Table 9, will be created in the RCF 124 as, for example, at the end of Table 10. When a file 120 fills up, E1JFMS 132 will close the file, start to fill a new file 120, and initiate, via a method known in the art as submitting, the batch JCL job E1AJNL 134. The size of each file 120 is selected to closely match the capacity of the tapes 61 so that one file 120 should just completely fill one tape. The batch JCL job E1AJNL 134 moves the data from a file 120 in disk memory 17 to tape drive 47, which in turn writes the data to a tape 61. The batch JCL job E1AJNL 134 then frees up the file 120 for reuse by E1JFMS 132.

The write-data-to-disk program E1JFMS 132, when processing two T-records 51 that are out of order relative to one another, will close the currently open RCF record (Table 9) by filling in the ending data group number, ending sequence number and ending timestamp fields using the data group number, sequence number and timestamp of the first T-record. Furthermore, a new open RCF record will be created with the starting data group number, starting sequence number and starting timestamp fields filled in using the data group number, sequence number and timestamp of the second T-record.
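The closing and reopening of RCF records on out-of-order T-records can be sketched as follows. The description does not define "out of order" precisely, so the sketch assumes that two T-records are in order only when they share a data group number and have consecutive sequence numbers. The record class and function names are likewise hypothetical, and closing a record when a file 120 fills is not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# A T-record is represented here as (data group number, sequence number, timestamp).
TRecord = Tuple[int, int, str]


@dataclass
class RcfRecord:
    start: TRecord
    end: Optional[TRecord] = None  # None models blank ending fields (open record)


def in_order(prev: TRecord, cur: TRecord) -> bool:
    """Assumed rule: same data group and the next sequence number."""
    return cur[0] == prev[0] and cur[1] == prev[1] + 1


def build_rcf(t_records):
    """Return RCF records for T-records in the order they were written to disk."""
    rcf = []
    prev = None
    for cur in t_records:
        if prev is None:
            rcf.append(RcfRecord(start=cur))
        elif not in_order(prev, cur):
            rcf[-1].end = prev                # close using the first T-record
            rcf.append(RcfRecord(start=cur))  # open using the second T-record
        prev = cur
    return rcf
```

Each closed record then describes one contiguous series, and a break between the end of one record and the start of the next marks where T-records are missing.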

The tapes 61 containing the T-records 51 derived originally from the log datasets 70, 74 at the central site 10 are stored for use in future recovery in case recovery is ever necessary because of a disaster or the like at the central site. DBMS recovery using these tapes is discussed subsequently.

B. SPILL FILE OPERATIONS

If the queue handler 92 (FIG. 3) cannot empty the FIFO queue 100 due to the communication line 53 being down, or due to data being generated too quickly, E1DEQU 92 will "spill" the FIFO queue into a dataset (i.e., spill file 93) to keep the FIFO queue from overflowing. When E1DEQU 92 cannot fill spill files 93, which would be the case if disk memory 18 (FIG. 1) fills to capacity, E1DEQU 92 allows the FIFO queue 100 to fill completely, which triggers the enqueuer E1QPOR 88 to stop accepting data from XMS 48, which in turn generates a gap data record in the SCF 76 when the first non-zero return code (as shown in FIG. 7) is sent back to exit routine 80 (FIG. 2). This process is discussed in detail subsequently, during the discussion of gap creation. The spill files are saved for transmission until the communication line 53 is restored to operation, or when the DBMS activity slows to the point that the spill files 93 can be sent.

Referring to the flow diagram of FIGS. 9 A, B, C and D, the queue handler E1DEQU 92 is always in one of four states identified as the normal state 1; the queue spilling state 2; the co-process state 3; and the queue filling state 4. FIGS. 9A-D illustrate a high threshold level HT and a low threshold LT of data in FIFO queue 100 which will be referred to in the subsequent discussion.

Information used to control the flow of E1DEQU 92 (FIG. 3) comes from other programs. For example, the status of VTAM is known by the buffer sending program E1TRAN 97. E1TRAN 97 keeps several flags (see Table 4B) in SCOM 98 updated showing the status of VTAM. E1DEQU 92 determines the status of the VTAM buffers by looking to see if a flag in SCOM has been set. The flags in SCOM 98 and their meaning are shown in Table 4B. When set, the flag SCOMSDT indicates that VTAM is operational. The flag is set and reset by E1TRAN based on its internal logic.

Consider the operation in more detail with reference to Table 4B and FIGS. 10A, 10B, and 10C which depict the logic of the queue handler program E1DEQU 92. E1DEQU 92 starts at block 502 when E1DEQU 92 is attached by E1MAIN 94. At block 502, program ECVT 41 and the SCF 76 are set up as depicted in more detail in FIG. 5, and as discussed in Section IIIA, entitled, "Normal Operation of Sending and Receiving Data." The SCOM 98 flag SCOMQFUL is checked at block 503. If the flag is set, usually by the enqueuer 88, the FIFO queue 100 is full as indicated in FIGS. 9B and 9C and must be handled. This handling is discussed in more detail subsequently in connection with FIG. 10C.

If SCOMQFUL is not set, FIFO queue is not full, as depicted at FIG. 9A, and block 504 is entered where E1DEQU 92 is set to state 1. E1DEQU then repeatedly executes a loop starting with block 506 and ending at block 505 until a shutdown is requested by the operator at console 44.

Assume the system is at block 505. If a shutdown has been requested, control passes to block 507 and E1DEQU 92 stops. Otherwise control is passed to block 506 to check the state of E1DEQU. During flow block 507, E1DEQU closes the currently open data group record in SCF 76 by placing the current time and date from real-time clock 8 in the ending timestamp field of that data group record. This step is necessary since an open record indicates that E-NET1 is currently running or has improperly stopped.

E1DEQU 92 adds a live data group record to the SCF 76 as each new data group is created. As discussed previously, a new data group and data group number are created whenever continuity of data comes into question. The situations where a new data group is created are discussed subsequently along with the discussion of the logic accompanying the creation of new data groups.

Each time a T-record 51 is read from the FIFO queue 100, the T-record 51 is deleted from the FIFO queue, and is assigned a data group number and sequence number, which numbers are placed in the appropriate fields. More specifically, each T-record 51 is read from the FIFO queue 100 in first-in, first-out order. E1DEQU 92 keeps track of the current data group number and sequence number in that each time a T-record is read from the FIFO queue, the sequence number is incremented by 1. Each time a new data group is created, the sequence number is reset to 1. Both the data group number and the sequence number are written by E1DEQU to the T-record 51 (FIG. 8) being taken from the FIFO queue 100.
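The numbering scheme just described can be sketched as follows. The names TRecord and Numberer are hypothetical, and the sketch assumes the next data group number is simply the current number plus one; in the system described, that number is derived from the SCF 76.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class TRecord:
    payload: bytes
    data_group: Optional[int] = None
    sequence: Optional[int] = None


class Numberer:
    """Hypothetical helper that stamps T-records as they leave the FIFO queue."""

    def __init__(self, first_data_group: int = 1) -> None:
        self.data_group = first_data_group
        self.next_sequence = 1

    def start_new_data_group(self) -> None:
        # Assumption: the next data group number is the current one plus 1.
        self.data_group += 1
        self.next_sequence = 1  # sequence numbering restarts at 1

    def dequeue(self, fifo: Deque[TRecord]) -> TRecord:
        record = fifo.popleft()  # first-in, first-out; the record leaves the queue
        record.data_group = self.data_group
        record.sequence = self.next_sequence
        self.next_sequence += 1
        return record


fifo = deque(TRecord(b"a") for _ in range(3))
numberer = Numberer()
first = numberer.dequeue(fifo)
second = numberer.dequeue(fifo)
numberer.start_new_data_group()
third = numberer.dequeue(fifo)
```

Here the first two records fall in data group 1 with sequence numbers 1 and 2, and the third record opens data group 2 at sequence number 1.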

E1DEQU 92 also compresses and encrypts the data fields 49 of T-records 51 after removal from the FIFO queue, so all the data in the VTAM buffers 102 and in the spill files 93 is in the compressed and encrypted format, ready to be sent to the remote site. The processes of compressing and encrypting were discussed previously, as part of normal operations discussions in Section III. A.

Briefly, state 1, the normal state illustrated in FIG. 9A, is generally the state occupied by E1DEQU when no spill files 93 exist and the queue 100 is less full than the queue high threshold percentage value QUETHHI (HT in FIGS. 9A and 10) found in the E-NET1 parameters file 87. Although spill files 93 could exist while E1DEQU is in state 1, none are shown in FIG. 9A, since any existing spill files are ignored while E1DEQU is in state 1. This is the state to which E1DEQU will return eventually if VTAM is operating and data is entering the FIFO queue 100 at a rate lower than the maximum capacity of E-NET1. As long as E1DEQU stays in state 1, no new data groups are created.

Consider state 1 in more detail. During block 506 (FIG. 10A), if the state is found to be state 1, E1DEQU proceeds to block 508. During block 508, E1DEQU moves as many T-records as possible from the FIFO queue 100 over to available VTAM buffers 102. The logic flow then proceeds to block 510, where E1DEQU makes a test or query. The query has four parts, all of which must be true: (1) is the FIFO queue less full than the queue low threshold percentage value (LT in FIG. 9A) QUETHLO found in the E-NET1 parameters file 87, (2) is the SCOMSPRD flag of Table 4B set, indicating that data exists in spill files 93 and is ready to be transmitted, (3) is the SCOMSDT flag of Table 4B set, indicating that VTAM communications 64 is currently operating, and (4) are all VTAM buffers free or empty. If this multi-part query is true, E1DEQU flows to block 512, and the state is changed to state 4 (FIG. 9D). During block 513, the current data group record in the SCF 76 is closed, and a new live data group and live data group record are started. Control is then passed to block 505.

If the multi-part query is false, E1DEQU moves to block 514 and checks whether the FIFO queue 100 is as full as or fuller than the queue high threshold QUETHHI (HT in FIG. 9A). If the queue is not as full as the high threshold QUETHHI, E1DEQU remains in the normal state 1, and control passes to block 505. If the FIFO queue is as full as or fuller than the high threshold, E1DEQU checks the SCOM flag SCOMSPRD of Table 4B at block 516 for a spill file condition, and checks the SCOM flag SCOMSDT of Table 4B at block 518. If either of these flags is unset, the state is changed to the queue spilling state 2 at block 520. At block 521 E1DEQU creates a new spill data group. The new spill data group begins with the T-records from the unconfirmed VTAM buffers. Then control is passed to block 505. If both flags are set, E1DEQU then checks at block 522 to see if VTAM buffers 102 are not empty (i.e., VTAM is currently sending data), or if VTAM buffers 102 are empty (see subsequent discussion on how to detect if VTAM buffers are empty). If the VTAM buffers are empty, the state is changed to the co-process state 3 at block 524 and then at block 525 a new spill data group and spill data group record are created. If VTAM buffers are not empty, E1DEQU stays in state 1, and no new data group and data group record are created. In either case, control passes to block 505.
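The state 1 decisions of blocks 510 through 525 can be summarized in a short Python sketch. The function name, its parameters and the representation of queue fullness as a percentage are illustrative assumptions; only the block numbers, flags and thresholds come from the description above.

```python
from typing import Optional, Tuple

NORMAL, SPILLING, CO_PROCESS, FILLING = 1, 2, 3, 4


def next_from_state_1(queue_pct: float, lt: float, ht: float,
                      scomsprd: bool, scomsdt: bool,
                      vtam_buffers_empty: bool) -> Tuple[int, Optional[str]]:
    """Return (next state, type of new data group or None) for one pass in state 1."""
    # Block 510: four-part query, all parts must hold.
    if queue_pct < lt and scomsprd and scomsdt and vtam_buffers_empty:
        return FILLING, "live"       # blocks 512 and 513
    # Block 514: below the high threshold, stay in the normal state.
    if queue_pct < ht:
        return NORMAL, None
    # Blocks 516 and 518: either flag unset forces spilling.
    if not (scomsprd and scomsdt):
        return SPILLING, "spill"     # blocks 520 and 521
    # Block 522: both flags set, so the outcome depends on the VTAM buffers.
    if vtam_buffers_empty:
        return CO_PROCESS, "spill"   # blocks 524 and 525
    return NORMAL, None              # VTAM still sending: remain in state 1
```

For instance, with LT at 10 percent and HT at 80 percent, a queue at 90 percent with SCOMSDT unset moves to state 2 and starts a spill data group, while a queue at 50 percent stays in state 1.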

E1DEQU determines that all VTAM buffers are empty (free) by scanning all the VTAM buffers to check the flag associated with each buffer. In an alternate embodiment, E1DEQU checks an occupied buffer linked list to see if the list is empty, indicating that no occupied buffers exist. The linked list approach may be preferable, for performance reasons, when using a large number of VTAM buffers.

Briefly, state 2, the queue spilling state of FIG. 9B, is the state where data is moved from the FIFO queue 100 to spill files 93 (FIG. 3) and no data goes to the VTAM buffers 102. E1DEQU transitions to state 2 when the VTAM buffers are not operating or the FIFO queue is full and no other spill files exist. Since state 2 is the only state that does not involve the use of VTAM buffers, it is the proper state for E1DEQU when VTAM is not operating. If VTAM buffers are operating and other spill files exist, state 2 is not the proper state, since the content of other spill files could be sent to the remote site via VTAM, and other states would be more appropriate. However, if a spill file is in the process of being filled but is the only spill file, state 2 is the appropriate state, given that the VTAM buffers cannot be sent a partially full file. A spill file might not be absolutely full, but if a T-record cannot completely fit into a spill file, the file is considered full. In some situations, a partially full file can be sent, but the requirement that the spill files fill before sending them simplifies operation of the spill file sending programs. From state 2, E1DEQU transitions to either state 1 or state 4.

Consider state 2 in more detail. During the flow of block 506, if the state is found to be state 2, E1DEQU proceeds to block 526 where data is quickly spilled from unconfirmed VTAM buffers 102 and FIFO queue 100 to a spill file 93. T-records from unconfirmed VTAM buffers followed by the FIFO queue are copied to a spill file 93. Unconfirmed VTAM buffers are copied to spill file 93, otherwise a case could arise where the data is sent to the remote site 12 but remote site mainframe 24 aborts before it writes the data safely to disk memory 17. This would mean that although the data was sent, the confirmation from the remote site to the central site was never received. If this condition persists, and only T-records from the FIFO queue are spilled to spill files 93, there is a chance that the central site mainframe 16 could abort, resulting in the total loss of the data in the VTAM buffer 102.

Since the T-records in the VTAM buffer 102 are older than the T-records from the FIFO queue 100, loss of the data in the VTAM buffer 102 would make the T-records from the FIFO queue useless, for reasons discussed previously. While this condition will occur very infrequently, it must be addressed since the usefulness of E-NET1 is in situations that happen, but that don't happen very often. When T-records are removed from unconfirmed VTAM buffers 102 by E1DEQU 92, the data group number field and the sequence number field are changed to the current data group number and sequence number. This places the T-records in a new data group, a spill type data group.

After the T-records are written to a spill file 93, E1DEQU at block 528 checks the status of the currently open spill file 93. If the spill file is not full, E1DEQU remains in state 2 and control is passed to block 505 since a spill file must be filled before it can be sent to the remote site. If the currently open spill file 93 is full, E1DEQU closes the full file at block 529, creates a new data group and data group record and opens a new spill file. Control is then passed to block 532, and the level of the FIFO queue is compared with the queue low threshold value QUETHLO at block 532 (LT in FIG. 9B). If the FIFO queue is below the low threshold, the VTAM buffers 102 are checked at block 534. If the VTAM buffers are empty, the state is changed to state 4 at block 536. Either way, control passes to block 505. If the FIFO queue is not below the low threshold, the SCOMSDT flag of Table 4B is checked at block 538. If the flag is unset, E1DEQU stays in state 2. If the flag is set, the state is changed to state 1 at block 540. Either way, control passes to block 505.

Although state 1, the normal state, is entered from state 2 while spill files still exist, E1DEQU will remain in state 1 only long enough to empty the FIFO queue below the low threshold. E1DEQU then transitions to state 4, and the spill files are sent.
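The state 2 decisions of blocks 528 through 540 can be sketched in the same way. The function and parameter names are hypothetical, and the boolean in the result simply records whether block 529 created a new data group during the pass.

```python
from typing import Tuple

NORMAL, SPILLING, CO_PROCESS, FILLING = 1, 2, 3, 4


def next_from_state_2(spill_file_full: bool, queue_pct: float, lt: float,
                      scomsdt: bool, vtam_buffers_empty: bool) -> Tuple[int, bool]:
    """Return (next state, new data group created?) for one pass in state 2."""
    # Block 528: a spill file must fill before anything else can change.
    if not spill_file_full:
        return SPILLING, False
    # Block 529: close the full file, create a new data group, open a new file.
    new_group = True
    # Block 532: compare the queue level with the low threshold.
    if queue_pct < lt:
        # Block 534: state 4 is entered only when the VTAM buffers are empty.
        if vtam_buffers_empty:
            return FILLING, new_group     # block 536
        return SPILLING, new_group
    # Block 538: queue not below LT, so the VTAM status decides.
    if scomsdt:
        return NORMAL, new_group          # block 540
    return SPILLING, new_group
```

The sketch makes the short stay in state 1 visible: with a full spill file, VTAM operating and the queue still above LT, the handler returns to state 1, from which the four-part query of block 510 later leads to state 4.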

Briefly, state 3, the co-processing state of FIG. 9C, is a state where data is moved from the FIFO queue 100 to spill files 93, while other spill files 93 are being moved into the VTAM buffers 102 and sent to the remote site. This state is arrived at from state 4 by having the FIFO queue 100 fill past the high threshold HT and having VTAM operational. E1DEQU transitions out of state 3 into state 4 when the queue drops below the low threshold LT, VTAM is operating, and all the spill files 93 are full.

Consider state 3 in more detail with reference to FIGS. 10A and 10B. When control is passed to block 506, if E1DEQU is in state 3, control moves to block 542 where E1DEQU spills the FIFO queue 100 to a spill file 93, then at block 544 sends the contents of the oldest spill file to the VTAM buffer 102. During block 546, E1DEQU checks that the conditions are right to transition to state 4. The condition tested in block 546 to enter state 4 is true if the FIFO queue is less than the queue low threshold LT, the SCOMSDT flag is set, and the currently filling spill file is full. If the condition is true, the state is changed to state 4 at block 548. If false, E1DEQU remains in state 3. During block 547, E1DEQU tests for a full spill file. This condition is always true if the state was just changed to state 4, since the condition for changing state includes having a full spill file. If the spill file is full, a new data group and data group record are created at block 549. Control is then passed to block 505.
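A minimal sketch of the state 3 decisions of blocks 546 through 549 follows. The function and parameter names are assumptions made for illustration only.

```python
from typing import Tuple

NORMAL, SPILLING, CO_PROCESS, FILLING = 1, 2, 3, 4


def next_from_state_3(queue_pct: float, lt: float, scomsdt: bool,
                      spill_file_full: bool) -> Tuple[int, bool]:
    """Return (next state, new data group created?) for one pass in state 3."""
    # Block 546: all three conditions are required to enter state 4.
    to_filling = queue_pct < lt and scomsdt and spill_file_full
    state = FILLING if to_filling else CO_PROCESS     # block 548
    # Blocks 547 and 549: a full spill file always starts a new data group,
    # whether or not the state changed.
    new_group = spill_file_full
    return state, new_group
```

Because a full spill file is part of the block 546 condition, every transition to state 4 returned by this function is accompanied by a new data group, which matches the remark in the text about block 547.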

Briefly, state 4, the FIFO queue filling state, is a state where data is not moved out of the FIFO queue 100, and spill files 93 are sent to the remote site via the VTAM buffers. This state is arrived at by having the queue less full than the low threshold LT. E1DEQU transitions to state 1 if VTAM is operating and no more spill files exist, but transitions to state 3 before then if the FIFO queue fills past the high threshold HT.

Consider state 4 in more detail with reference to FIGS. 10A and 10B. When control is passed to block 506, if E1DEQU is in state 4, control moves to block 550, where E1DEQU allows the FIFO queue to fill with data. During block 552, E1DEQU sends the existing spill files, oldest first, to the remote site via the VTAM buffers 102. During block 554 E1DEQU checks the level of the FIFO queue 100. If the FIFO queue is as full as or fuller than the queue high threshold HT, the state is changed to state 3, the co-processing state, at block 556, and control is passed to block 505. If in block 554 the FIFO queue is less full than the high threshold HT, E1DEQU checks the SCOM flag SCOMSPRD at block 558 and the flag SCOMSDT at block 560. If SCOMSPRD is unset and SCOMSDT is set, E1DEQU changes back to the normal state 1, at block 562. If available spill files do exist at block 558 (SCOMSPRD set), or VTAM communications 64 is not available at block 560 (SCOMSDT unset), E1DEQU stays in state 4 until the FIFO queue fills above threshold HT, and at that point transitions to state 3 going from block 554 to block 556. In either case, control is then passed to block 505.

No new data groups or data group records are created in state 4. If a data group were created in state 4, the previous data group, which would have been created just before entering state 4, would always be empty since no data is ever read from the FIFO queue 100 during state 4.
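The state 4 decisions of blocks 554 through 562 can be sketched as a function that returns only the next state, since no data group is ever created in this state. The names are hypothetical; the sketch follows the earlier definition of SCOMSPRD, under which a set flag means that spill file data is ready to be transmitted.

```python
NORMAL, SPILLING, CO_PROCESS, FILLING = 1, 2, 3, 4


def next_from_state_4(queue_pct: float, ht: float,
                      scomsprd: bool, scomsdt: bool) -> int:
    """Return the next state for one pass in state 4."""
    # Block 554: a queue at or above HT forces co-processing.
    if queue_pct >= ht:
        return CO_PROCESS                 # block 556
    # Blocks 558 and 560: no spill data left and VTAM operating.
    if not scomsprd and scomsdt:
        return NORMAL                     # block 562
    # Spill files remain, or VTAM is unavailable: keep filling the queue.
    return FILLING
```

The only exits are to state 3 when the queue reaches HT and to state 1 when the spill files are exhausted while VTAM is operating.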

Spill file handling by E1DEQU takes into account the possibility that programs other than E1DEQU, such as the gap recovery program, will also create spill files. The gap recovery program will be discussed below.

Within broader concepts of the invention, spill files need not be transmitted to the remote site on a first-in, first-out basis, but could be output in any number of orders. Also, the trigger for the use of a spill file need not be a threshold level. For example, the use of spill files for containing data to be transmitted could be continuous, where any data in the FIFO queue is unconditionally sent to spill files, or, instead of a threshold level triggering spilling, other indicators could serve as the trigger, such as the rate of data generation by the E-NET1 exit routine, or any number or combination of other factors.

Referring to FIGS. 10A and 10C, if the SCOMQFUL flag is set when E1DEQU is started, E1DEQU branches at block 503 to block 580. At that point, the SCOMCLQU (meaning "clear queue") flag is set ON. At block 582, E1DEQU moves the FIFO queue 100 T-records into spill file 93. Each spill file 93 is generally made to be the same size as the FIFO queue, so that this operation will fill exactly one full spill file. After spilling the FIFO queue, SCOMCLQU is set OFF in block 584, SCOMQFUL is set OFF at block 586, and the open gap data group record, which was created by the event that caused the SCOMQFUL flag to be turned on, is closed at block 588 by placing an ending timestamp and setting the "gap established" flag in the SCF 76 record for the gap. At block 590 E1DEQU initiates or "submits" gap recovery for the newly closed gap, and returns to block 505, which is the normal return location for the other E1DEQU operations. As illustrated in FIGS. 10A and 10C, the FIFO queue full processing is independent of the state of E1DEQU.

Submitting is a method known in the art whereby one process initiates another process (by means of JCL), but where the submitted process executes independently of the submitting process, and executes in its own region, on the same mainframe computer or a different mainframe computer in the same complex. The submitting process does not wait for the submitted process to finish, but continues executing after "submitting" a request to the operating system to initiate the submitted process. In the case of E1DEQU submitting the gap recovery process 600 (FIG. 11), E1DEQU continues its own processing while the gap recovery process executes. When the gap recovery process completes, the operating system removes the region which the operating system created for the submitted process.

C. GAP CREATION

A "gap" is defined as a time interval in which an unknown amount of database activity, possibly none, occurred during which E-NET1 was not able to record and store T-records with certainty at the remote mainframe 24. A gap can be characterized as a time span of missing log dataset activity from one of the DBMSs tracked by E-NET1. The time span is characterized by a starting and an ending timestamp. There are several causes of gaps, and each cause may be detected by a different program within E-NET1. The details of each different program are discussed subsequently; however, each program handles gaps in essentially the same manner.

Generally, when a gap is first detected by any program, the detecting program creates a new gap data group record in the SCF 76 (Table 6) for that gap. Some of the fields of the gap data group record are filled in by the detecting program. The fields filled are the data group number (DG#), the starting timestamp (date and time), the DBMS ID and the DG Type flag indicating that the record is for a data group which contains gap data.

Part of the process, by each program, of detecting the gap is estimating the starting time of the missing data gap. Each cause of gaps, as discussed in subsequent detail, requires a slightly different means of estimating the starting time of the gap. Even if the exact starting time of the missing data is known, the time range of the gap is enlarged to ensure that all missing data is recovered.

To this end, the E-NET1 parameters file 87 (FIG. 8) contains a gap interval extender variable GAPINTX, which is in hundredths of seconds, the same as the timestamps. The starting timestamp of a gap in SCF 76 is set to a value earlier in time than the actual beginning time of the loss of data by the amount of the gap interval GAPINTX. For example, if the FIFO queue 100 overflows at 11:45:30.20 and the value of GAPINTX is 100 (see Table 8) indicating a gap interval extender of 1 second, the starting timestamp in the gap data group record is set to 11:45:29.20. This starting timestamp is used by gap recovery programs in that the gap recovery programs will recover a range of data beginning with data created on or after the time 11:45:29.20, thus ensuring that all the missing data is recovered. A gap due to queue overflow begins when the queue is unable to store a T-record, and the gap ends when the queue 100 is again able to receive T-records 51 from the enqueuer 88. If, continuing the example, the queue is restored at 11:47:13.00, the ending timestamp in the SCF gap data group record is set to 11:47:14.00.
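The timestamp arithmetic in this example can be checked with a short Python sketch. The function name extend_gap is hypothetical, and the calendar date used below is arbitrary, since the text gives only times of day. GAPINTX is taken to be in hundredths of a second, as stated above.

```python
from datetime import datetime, timedelta
from typing import Tuple


def extend_gap(detected_start: datetime, detected_end: datetime,
               gapintx: int) -> Tuple[datetime, datetime]:
    """Widen a detected gap by GAPINTX hundredths of a second on both sides."""
    extender = timedelta(milliseconds=10 * gapintx)  # 1/100 s equals 10 ms
    return detected_start - extender, detected_end + extender


# Values from the example in the text; the date itself is an arbitrary placeholder.
overflow = datetime(1993, 1, 1, 11, 45, 30, 200000)   # 11:45:30.20
restored = datetime(1993, 1, 1, 11, 47, 13)            # 11:47:13.00
gap_start, gap_end = extend_gap(overflow, restored, gapintx=100)
```

With GAPINTX equal to 100, gap_start is 11:45:29.20 and gap_end is 11:47:14.00, matching the starting and ending timestamps given for the gap data group record.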

Since gaps only occur when something goes wrong with the operation of E-NET1, there is a strong likelihood that the operations started to go wrong a short time before the detectable error occurred. These undetectable error periods can be easily recovered using the gap interval extender variable GAPINTX. Using the gap interval extender variable GAPINTX, some T-records could be duplicated at the remote site, but this is not a problem since DBMS recovery, discussed subsequently, is able to eliminate duplicate T-records.

The data group number for the new gap data group record is found by looking at the data group number (Table 6) of the last data group record in the SCF 76 and assigning the next data group number to the new data group. Since the SCF is ordered by data group, the last record will always have the largest data group number. The DBMS ID field for the gap data group record is determined by the detecting program as discussed below.

A gap could be caused by a) the mainframe 16, MVS 20, or the E-NET1 region 30 unexpectedly losing power or suffering an ABEND causing an unexpected stoppage of E-NET1, b) by the FIFO queue 100 overflowing, or c) by the operator choosing to execute a DBMS which has been configured to run with E-NET1 without having an active E-NET1 region. Note that a gap is not created by faulty transmission of data, since the VTAM buffers are sent and resent until confirmed. Gaps are also not created due to the failure of the communications line 53, or any other communications device, since the data will be safely stored in spill files 93 until such time that the contents of the spill files can be sent to the remote site.

A gap due to unexpected stoppage of E-NET1 is detected by the program E1MAIN 94 (FIG. 3) when the system comes back up after the stoppage. As shown in block 204 of the logic diagram of FIG. 5, E1MAIN checks the SCF 76 for any open data group records. If E-NET1 is terminated normally just before E1MAIN is run, all data group records in the SCF will be closed (as defined above) when E1MAIN checks for open data group and DBMS control records at block 507 (FIG. 10A). However, if E-NET1 suffers an unexpected stoppage, E-NET1 would not have had a chance to close the open records, and very probably one or more T-records represented by the open data group record would also be lost.

If an open live or spill data group record (Table 5C) in the SCF 76 exists when E1MAIN starts up, E1MAIN will close the live or spill data group record, and change the status flags of the record to make it a gap data group record. At any given time, only one live or spill data group record is open. This is the record representing the data group that was last being filled. This record is closed by E1MAIN by inserting an ending timestamp equal to the current time.

Briefly, for each open DBMS control record in the SCF 76, E1MAIN closes the open DBMS control record and, as it does so, creates a new gap data group record for that DBMS control record. E1MAIN, in each new data group record, sets the DBMS ID of the new record equal to the DBMS ID field of the DBMS control record, sets the DG Type flag of the new record to indicate a gap, sets the starting timestamp for the new record equal to the starting timestamp of the DBMS control record moved back in time by the gap interval extender time, sets the ending timestamp of the new record to the current time, read from real-time clock 8 (which is some time after the system comes back up) moved forward in time by the gap interval extender time.
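The startup processing described in the two preceding paragraphs can be sketched as follows. The record layouts and the function recover_on_startup are hypothetical simplifications of the SCF 76 records. The sketch assumes that closing a DBMS control record means filling in its ending timestamp with the current time, and that the next data group number is one more than that of the last record in the SCF.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class DataGroupRecord:
    dg_number: int
    dg_type: str                      # "live", "spill" or "gap"
    start: datetime
    end: Optional[datetime] = None    # None models an open record
    dbms_id: Optional[str] = None


@dataclass
class DbmsControlRecord:
    dbms_id: str
    start: datetime
    end: Optional[datetime] = None


def recover_on_startup(groups: List[DataGroupRecord],
                       controls: List[DbmsControlRecord],
                       now: datetime, gapintx: int) -> None:
    extender = timedelta(milliseconds=10 * gapintx)
    # An open live or spill record is closed at the current time and becomes a gap.
    for rec in groups:
        if rec.end is None and rec.dg_type in ("live", "spill"):
            rec.end = now
            rec.dg_type = "gap"
    # Each open DBMS control record is closed and yields a new gap record.
    for ctl in controls:
        if ctl.end is None:
            ctl.end = now  # assumption: closing means stamping the current time
            next_number = groups[-1].dg_number + 1 if groups else 1
            groups.append(DataGroupRecord(
                next_number, "gap",
                ctl.start - extender,   # moved back by the gap interval extender
                now + extender,         # moved forward by the gap interval extender
                ctl.dbms_id))
```

The sketch modifies the lists in place, mirroring the way E1MAIN updates the SCF before normal processing resumes.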

Checkpoint file 104 (FIG. 3) is used to minimize the number of T-records retransmitted when processing terminates abnormally and has a) a data group number, b) a sequence number, and c) a timestamp, all relating to the most recently check-pointed successfully transmitted T-record. The checkpoint file can be used to determine approximately how far along T-record transmission was when the system terminated. The data group number and sequence number and timestamp are the same as the T-record that was last successfully sent to the remote site 12. The checkpoint file is updated at regular intervals, as discussed above, and is not updated to reflect the transmission of any one T-record until all T-records prior to it have been received by the remote site. Thus, any gap recovery only need concern T-records generated after the T-record indicated by the checkpoint file. To take advantage of this known point, if any gap records need to be created, the starting timestamp can be taken to be no earlier than the checkpoint file timestamp, so that no gap extends further back in time than the timestamp held in the checkpoint file.
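The checkpoint bound on a gap's starting timestamp amounts to taking the later of two times. A minimal sketch, with the hypothetical function name clamp_gap_start and arbitrary example values:

```python
from datetime import datetime


def clamp_gap_start(proposed_start: datetime, checkpoint_ts: datetime) -> datetime:
    """Never let a gap begin earlier than the checkpointed T-record's timestamp."""
    return max(proposed_start, checkpoint_ts)


# Arbitrary example values: the proposed start precedes the checkpoint.
checkpoint = datetime(1993, 1, 1, 11, 40, 0)
proposed = datetime(1993, 1, 1, 11, 30, 0)
bounded = clamp_gap_start(proposed, checkpoint)
```

In this example the bounded starting timestamp equals the checkpoint timestamp, so ten minutes of already confirmed T-records are excluded from recovery.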

Referring to FIG. 3, a gap caused by FIFO queue 100 overflow is detected by the enqueuer program E1QPOR 88. To this end if E1QPOR is unsuccessful in placing a T-record 51, passed from the exit routine 80, onto the FIFO queue 100, the enqueuer sets the SCOMQFUL flag in the SCOM area 98 to indicate that the FIFO queue is presently full. The enqueuer expects that E1DEQU will reset SCOMQFUL, but not before completely emptying the FIFO queue below the low queue threshold percentage (QUETHLO). Without this design feature, the enqueuer might be able to get every other T-record onto the FIFO queue as the FIFO queue is being emptied slowly, but a gap data group and gap data group record in SCF 76 would be created for every other T-record.

When the enqueuer 88 first fails in an attempt to put a T-record onto the FIFO queue 100, the enqueuer sets the SCOMQFUL flag, which alerts E1DEQU 92, which, when E1DEQU is executed, handles the situation. The exit routine 80 (FIG. 2) that was not able to send data to the enqueuer creates a new open gap data group record. The exit routine 80 fills the appropriate fields of the new gap data group record with the DBMS ID and DBMS type read from the DBMS parameter file 77 for that DBMS, as well as the starting timestamp derived from the rejected T-record, including an adjustment backward in time in the amount of the gap recovery interval. The exit routine 80 also sets the proper flag to indicate that the data group is now a gap data group.
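The effect of holding SCOMQFUL until the queue has been cleared can be illustrated with a small Python sketch. The Enqueuer class, its capacity measured in T-records and its gap counter are hypothetical. The sketch assumes, as one reading of the text, that the enqueuer rejects every T-record while the flag is set, even if a slot has become free.

```python
from collections import deque
from typing import Deque


class Enqueuer:
    """Hypothetical model of E1QPOR 88 and the SCOMQFUL flag."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.fifo: Deque[bytes] = deque()
        self.scomqful = False
        self.gap_records_opened = 0

    def put(self, t_record: bytes) -> bool:
        if self.scomqful:
            # Still inside the same gap: reject without opening another gap record.
            return False
        if len(self.fifo) >= self.capacity:
            self.scomqful = True
            self.gap_records_opened += 1  # the exit routine opens one gap record
            return False
        self.fifo.append(t_record)
        return True

    def handler_cleared_queue(self) -> None:
        # Stands in for E1DEQU spilling the queue and then resetting SCOMQFUL.
        self.fifo.clear()
        self.scomqful = False
```

Without the flag, a queue that drains one slot at a time would accept and reject T-records alternately, and each rejection would open its own gap; with the flag, one overflow episode produces a single gap record.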

Referring to FIG. 2, a gap can be caused by a DBMS being started without E-NET1 running, if E-NET1 is still not running when the exit routine 80 for that DBMS tries to write a block of data to E-NET1. Specifically, a gap data group record will be created when the exit routine 80 tries to send data to the enqueuer E1QPOR 88 via XMS 48. The exit routine 80 first queries the E-NET1 status flag ECVTACTV stored in ECVT 41 before sending T-record 51, to check if E-NET1 is active. If the E-NET1 ECVTACTV status flag in ECVT 41 indicates that E-NET1 is not active, the exit routine 80 creates a new gap data group record in the SCF 76 if an open gap data group record for the specific DBMS using the exit routine does not already exist.

The new gap data group record (Table 6) has a starting timestamp corresponding, not to the start up of the DBMS, but to the timestamp assigned by the DBMS to the first record in the T-record sought to be transferred. The starting timestamp is actually the time taken from the first record in the T-record adjusted back in time by the amount of the gap interval extender GAPINTX and is stored in the new gap data group record. Also, the DBMS ID field is filled in with the DBMS ID read from the T-record 51 that the exit routine 80 tried to send to E-NET1, and the DBMS type field is read directly from the code of the exit routine, since each type of DBMS requires a different exit routine.

The exit routine creates a gap data group record if it cannot send the T-record to the E-NET1 region even if E-NET1 was active when the DBMS was initiated. For subsequent T-records, a new gap record is not necessary if there is already an open gap data group record.
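The exit routine's rule of one open gap data group record per DBMS can be sketched as follows. GapRecord and on_send_failure are hypothetical names, the list stands in for the gap records of the SCF 76, and the data group number is assumed to be one more than the largest number already present.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class GapRecord:
    dg_number: int
    dbms_id: str
    dbms_type: str
    start: datetime
    end: Optional[datetime] = None    # None models an open gap record


def on_send_failure(scf: List[GapRecord], dbms_id: str, dbms_type: str,
                    first_record_ts: datetime, gapintx: int) -> Optional[GapRecord]:
    """Open a gap record for this DBMS unless one is already open."""
    for rec in scf:
        if rec.dbms_id == dbms_id and rec.end is None:
            return None               # an open gap already covers this DBMS
    extender = timedelta(milliseconds=10 * gapintx)
    next_number = max((r.dg_number for r in scf), default=0) + 1
    rec = GapRecord(next_number, dbms_id, dbms_type,
                    first_record_ts - extender)  # start moved back by GAPINTX
    scf.append(rec)
    return rec
```

Repeated failures for the same DBMS therefore leave a single open record whose starting timestamp reflects the first T-record that could not be sent.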

D. GAP RECOVERY

Briefly, as soon as the full extent of a gap is known, recovery of the missing data can be started. When the extent of the gap is known, the ending timestamp field in the SCF record 76 for that gap is filled in. The starting and ending timestamps define the extent of the gap data group record, and since the starting timestamp is always put in the gap data group record when it is created, gap recovery can be initiated by the same program that "closes" the gap data group record by filling in its ending timestamp field when the end of a gap is detected. The program that detects the end of a gap also sets the "gap established" flag, submit