Space reclamation system and method for use in connection with tape logging system (Patent No. 5778394)
Abstract
A digital data processing system comprises a host information generating device, a mass storage subsystem, and a back-up information storage subsystem. The host information generating device generates information and provides it to the mass storage subsystem for storage. The mass storage subsystem receives the generated information from the host information generating device and transfers the generated information to the storage element for storage, and further transfers the generated information to the back-up information storage subsystem. The back-up information storage subsystem receives and stores the generated information from the mass storage subsystem's control element. The back-up information storage subsystem includes a filter/buffer module, a tape log module and a reconstruction module. The filter/buffer module filters and buffers the information received from the mass storage subsystem and provides the buffered information to the tape log module for storage. The tape log module stores the information received from the filter/buffer module in logging fashion on tape cartridges. The filter/buffer module filters the information received from the mass storage subsystem so as to reduce the amount of information to be logged, so that, if the host changes the information while it is being buffered, the filter/buffer module will provide only the most recent information to the tape log module for storage. If a failure occurs in the mass storage subsystem, the reconstruction module can reconstruct the information that was on the failed device using the stored information from the tape log module and the buffered information.
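The filtering behavior described in the abstract (only the most recent version of an item that changes while buffered is passed on to the tape log) can be illustrated with a minimal sketch. The class name `FilterBuffer`, its methods, and the identifiers such as `"track-7"` are hypothetical and chosen for illustration only; the abstract does not specify how the buffer is organized.

```python
from collections import OrderedDict


class FilterBuffer:
    """Hypothetical buffer holding at most one pending update per item identifier."""

    def __init__(self):
        # Maps item identifier -> most recent buffered data (assumed layout).
        self._pending = OrderedDict()

    def put(self, item_id, data):
        # A newer update for the same item replaces the older buffered one,
        # so the older version is never sent to the tape log.
        self._pending.pop(item_id, None)
        self._pending[item_id] = data

    def flush(self):
        """Return buffered updates in order of their last write and empty the buffer."""
        out = list(self._pending.items())
        self._pending.clear()
        return out


buf = FilterBuffer()
buf.put("track-7", b"v1")
buf.put("track-9", b"a")
buf.put("track-7", b"v2")   # supersedes b"v1" while still buffered
flushed = buf.flush()       # only b"v2" is logged for "track-7"
```

In this sketch the reduction in logged data comes entirely from overwriting the pending entry; a real subsystem would also need to decide when to flush, which the abstract leaves open.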
Claims
What is claimed as new and desired to be secured by Letters Patent of the United States is:
1. A valid data item update retrieval and storage subsystem for copying valid ones of a plurality of data item updates serially recorded on a source storage medium, onto a target storage medium, each data item update having a data item identifier which is one of a set of data item identifier values, the valid data item update retrieval and recording arrangement comprising:
A. a storage medium directory including a series of directory entries each identifying, for a corresponding one of the series of data item updates recorded on said source storage medium, the data item identifier associated with said corresponding one of the series of data item updates;
B. a data item identifier flag set comprising a plurality of data item identifier flags, each associated with a data item identifier, each flag having a valid condition indicating that the source storage medium has a valid data item update associated with the respective data item identifier flag's associated data item identifier stored thereon, and at least one other condition;
C. a valid data item update identifier for using the storage medium directory and said data item identifier flag set to identify a set of valid data item updates on said source storage medium, the valid data item update identifier scanning the directory entries in said directory in reverse order and, for each directory entry, determining whether the data item identifier flag associated with the data item identifier contained in the directory entry indicates that the source cartridge has a valid data item update associated with the respective flag's associated data item identifier stored thereon and, if so, determining that the data item update associated with the directory entry is a valid data item update;
D. a valid data item update transfer control for providing valid ones of the data item updates as identified by said valid data item update identifier from the source storage medium for storage on the target storage medium.
2. A valid data item update retrieval and storage subsystem as defined in claim 1 in which the valid data item update identifier further conditions the one of the data item identifier flags associated with the data item identifier to said at least one other condition when it determines that the data item update associated with the directory entry is a valid data item update.
3. A valid data item update retrieval and storage subsystem as defined in claim 2 in which said valid data item update identifier includes:
A. a directory entry selector for selecting a directory entry;
B. a validity determination generation element for using the data item identifier flag associated with the data item identifier identified by the selected directory entry to generate a validity indication indicating whether the selected directory entry is associated with a valid data item update;
C. a data item identifier flag conditioner responsive to the validity indication generated by the validity determination generation element indicating that the selected directory entry is associated with a valid data item update for conditioning the data item identifier flag associated with the data item identifier identified by the selected directory entry to said at least one other condition; and
D. an iteration control element for controlling the directory entry selector element, said validity determination generation element and said data item identifier flag conditioner through a series of iterations.
4. A valid data item update retrieval and storage subsystem as defined in claim 3 in which said iteration control element enables said directory entry selector to, in each of a series of iterations, select the one of the directory entries which precedes the directory entry selected during the preceding iteration.
5. A valid data item update retrieval and storage subsystem as defined in claim 4 in which said iteration control element enables said directory entry selector to, in a first iteration, select the last directory entry in the storage medium directory.
6. A valid data item update retrieval and storage subsystem as defined in claim 3 in which said iteration control element terminates iterations after the iteration in which the directory entry selector selects the first directory entry in the storage medium directory.
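The reverse scan recited in claims 1 through 6 can be sketched as follows. This is an illustrative reading of the claims, not the patented implementation: the function name, the list representation of the directory, and the dictionary of flags are all assumptions, and the sketch works on a copy of the flag set so that the caller's flags are left untouched.

```python
def find_valid_updates(directory, valid_flags):
    """Return the positions of the valid updates on a source medium.

    directory   -- list of data item identifiers, in recording order
                   (assumed representation of the storage medium directory)
    valid_flags -- dict mapping identifier -> True if the medium holds a
                   valid update for that identifier (assumed representation)
    """
    flags = dict(valid_flags)  # assumption: scan against a working copy
    valid_positions = []
    # Start at the last directory entry and step to the preceding entry
    # on each iteration, stopping after the first entry has been examined.
    for pos in range(len(directory) - 1, -1, -1):
        item_id = directory[pos]
        if flags.get(item_id, False):
            # The latest update seen for this identifier is the valid one.
            valid_positions.append(pos)
            # Move the flag to its other condition so that earlier
            # updates for the same identifier are skipped.
            flags[item_id] = False
    valid_positions.reverse()  # report in recording order
    return valid_positions


directory = ["A", "B", "A", "C", "B"]
flags = {"A": True, "B": True, "C": False}
result = find_valid_updates(directory, flags)
```

Here the updates at positions 2 and 4 are the most recent ones for "A" and "B", while "C" is skipped because its flag is not in the valid condition (for example, because a newer update for "C" sits on another medium).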
7. A valid data item update retrieval and storage subsystem as defined in claim 1 further comprising a data item update storage control for controlling storage of valid data item updates provided by said valid data item update transfer control on the target storage medium.
8. A valid data item update retrieval and storage subsystem as defined in claim 7 in which said data item update storage control further controls storage of data item updates provided by a data item update source on the target storage medium.
9. A valid data item update retrieval and storage subsystem as defined in claim 8, the target storage medium having associated therewith a target storage medium data item identifier flag set comprising a plurality of target storage medium data item identifier flags, each associated with a data item identifier, the data item update storage control conditioning the target storage medium data item identifier flags in response to the storage of data item updates on the target storage medium.
10. A valid data item update retrieval and storage subsystem as defined in claim 9 in which each target storage medium data item identifier flag has a valid condition and at least one other condition, the data item update storage control conditioning the target storage medium data item identifier flags associated with data item identifiers for data item updates received from the data item update source and stored on said target storage medium to the valid condition.
11. A valid data item update retrieval and storage subsystem as defined in claim 10, the valid data item update retrieval and storage subsystem being used in a system comprising at least one other storage medium having associated therewith a data item identifier flag set comprising a plurality of data item identifier flags, each associated with a data item identifier, each data item identifier flag having a valid condition indicating that the other storage medium has a valid data item update associated with the respective data item identifier flag's associated data item identifier stored thereon, and at least one other condition, the data item update storage control further conditioning the data item identifier flag of the data item identifier flag set associated with the other storage medium to said other condition for each target storage medium data item identifier flag which has the valid condition.
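The flag handling of claims 9 through 11 can be sketched as below: when an update from the data item update source is stored on the target medium, the target's flag for that identifier is set to the valid condition, and the corresponding flag of every other medium is moved to its other condition. The dictionary layout with `"log"` and `"flags"` keys and the function name are hypothetical.

```python
def store_source_update(item_id, data, target, other_media):
    """Append an update to the target medium and adjust the per-medium flags.

    Each medium is modeled (as an assumption) as a dict with a "log" list
    of (identifier, data) pairs and a "flags" dict of identifier -> bool.
    """
    target["log"].append((item_id, data))
    # The target now holds the valid update for this identifier.
    target["flags"][item_id] = True
    # Any copy of this item on another medium is now out of date.
    for medium in other_media:
        if item_id in medium["flags"]:
            medium["flags"][item_id] = False


target = {"log": [], "flags": {}}
older = {"log": [("A", b"old"), ("B", b"b1")], "flags": {"A": True, "B": True}}
store_source_update("A", b"new", target, [older])
```

After the call, the older medium still holds the valid update for "B" but no longer for "A", which is what later lets a reverse scan of that medium skip its stale "A" entry.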
12. A valid data item update retrieval and storage subsystem as defined in claim 9 in which each target storage medium data item identifier flag has a valid condition and at least one other condition, the data item update storage control conditioning the target storage medium data item identifier flags associated with data item identifiers for data item updates received from the data item update source and stored on said target storage medium to the valid condition.
13. A valid data item update retrieval and storage subsystem as defined in claim 12 in which said data item update storage control further selectively conditions the target storage medium data item identifier flags associated with data item identifiers for data item updates received from the source storage medium to one of the valid condition or the other condition.
14. A valid data item update retrieval and storage subsystem as defined in claim 13, the data item update storage control conditioning the target storage medium data item identifier flags associated with data item identifiers for valid data item updates received from the source storage medium to the valid condition.
15. A valid data item update retrieval and storage subsystem as defined in claim 1, further comprising a data item update storage control for controlling storage of successive data item updates on said target storage medium, the data item update storage control maintaining a storage medium directory for said target storage medium, the data item update storage control providing an entry for each data item update stored on said target storage medium including the data item identifier, so that the successive entries in the target storage medium's storage medium directory identify the successive data item identifiers associated with successive data item updates stored on the target storage medium.
16. A valid data item update retrieval and storage subsystem as defined in claim 15 in which each entry of said target storage medium's storage medium directory further includes an invalid flag having an invalid condition and at least one other condition, the condition of the invalid flag being controlled by said data item update storage control.
17. A valid data item update retrieval and storage subsystem as defined in claim 16, the valid data item update identifier and the valid data item update transfer control operating during a space reclamation operation in which valid data item updates are copied from the source storage medium to the target medium, the data item update storage control during the space reclamation operation maintaining an auxiliary data item identifier flag set comprising a plurality of auxiliary data item identifier flags, each associated with a data item identifier, each flag having a plurality of conditions, the data item update storage control controlling the condition of the invalid flag of selected ones of the entries of said target storage medium's storage medium directory in relation to the condition of the auxiliary data item identifier flag associated with the data item identifier of the respective entries.
18. A valid data item update retrieval and storage subsystem as defined in claim 17, in which said data item update storage control further controls storage of data item updates provided by a data item update source, each being associated with a said data item identifier, on the target storage medium, the data item update storage control including:
A. an auxiliary data item identifier flag conditioner for, during a said space reclamation operation, conditioning the auxiliary data item identifier flag associated with the data item identifier of each data item update provided by said data item update source to a predetermined condition; and
B. a target storage medium directory entry generator for generating an entry for the target storage medium's storage medium directory, the target storage medium directory entry generator conditioning the entry's invalid flag to an invalid condition if the data item update stored on the target storage medium was provided by the valid data item update transfer control and the auxiliary data item identifier flag associated with the data item identifier of the data item update had the predetermined condition.
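The interplay of the target directory's invalid flag and the auxiliary flag set in claims 15 through 18 can be sketched as follows. The idea, as read from the claims, is that an update arriving from the data item update source during space reclamation marks its identifier in the auxiliary set; a reclaimed update for the same identifier that is written afterwards is then recorded with its invalid flag set, since a newer version already precedes it on the target. The class and method names are hypothetical, and the auxiliary flag set is modeled as a Python set.

```python
class TargetDirectoryWriter:
    """Hypothetical model of the data item update storage control."""

    def __init__(self):
        self.directory = []      # one entry per stored update, in order
        self.aux_flags = set()   # identifiers updated by the source during reclamation
        self.reclaiming = False

    def begin_reclamation(self):
        self.reclaiming = True
        self.aux_flags.clear()

    def end_reclamation(self):
        self.reclaiming = False
        self.aux_flags.clear()

    def store_from_source(self, item_id):
        # During reclamation, remember that a fresh update for this
        # identifier has already been written to the target.
        if self.reclaiming:
            self.aux_flags.add(item_id)
        self.directory.append({"id": item_id, "invalid": False})

    def store_reclaimed(self, item_id):
        # A reclaimed update is stale if the source has already supplied
        # a newer update for the same identifier during this operation.
        superseded = item_id in self.aux_flags
        self.directory.append({"id": item_id, "invalid": superseded})


writer = TargetDirectoryWriter()
writer.begin_reclamation()
writer.store_reclaimed("A")     # no newer update seen: entry stays valid
writer.store_from_source("B")   # fresh update arrives mid-reclamation
writer.store_reclaimed("B")     # older copy of "B": entry marked invalid
writer.end_reclamation()
```

Without the per-entry invalid flag, a later reverse scan of the target would wrongly treat the reclaimed copy of "B" as the most recent one, because it appears after the fresh update in recording order.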
19. A method of copying valid ones of a plurality of data item updates serially recorded on a source storage medium, onto a target storage medium, each data item update having a data item identifier which is one of a set of data item identifier values, a set of said valid data item updates for storing on a valid data item update storage medium, the valid data item update retrieval and recording arrangement comprising:
A. providing a storage medium directory including a series of directory entries each identifying, for a corresponding one of the series of data item updates recorded on said source storage medium, the data item identifier associated with said corresponding one of the series of data item updates;
B. providing a data item identifier flag set comprising a plurality of data item identifier flags, each associated with a data item identifier, each flag having a valid condition indicating that the source storage medium has a valid data item update associated with the respective data item identifier flag's associated data item identifier stored thereon, and at least one other condition;
C. using the storage medium directory and said data item identifier flag set to identify a set of valid data item updates on said source storage medium, the valid data item update identifier scanning the directory entries in said directory in reverse order and, for each directory entry, determining whether the data item identifier flag associated with the data item identifier contained in the directory entry indicates that the source cartridge has a valid data item update associated with the respective flag's associated data item identifier stored thereon and, if so, determining that the data item update associated with the directory entry is a valid data item update;
D. providing valid ones of the data item updates as identified by said valid data item update identifier from the source storage medium for storage on the target storage medium.
20. A method as defined in claim 19 in which the valid data item update identification step further includes the step of conditioning the one of the data item identifier flags associated with the data item identifier to said at least one other condition when it determines that the data item update associated with the directory entry is a valid data item update.
21. A method as defined in claim 20 in which said valid data item update identification step includes the steps of iteratively:
A. selecting a directory entry;
B. using the data item identifier flag associated with the data item identifier identified by the selected directory entry to generate a validity indication indicating whether the selected directory entry is associated with a valid data item update; and
C. conditioning the data item identifier flag associated with the data item identifier identified by the selected directory entry to said at least one other condition in response to the validity indication indicating that the selected directory entry is associated with a valid data item update.
22. A method as defined in claim 21 in which, during the directory entry selection step, in each iteration the one of the directory entries is selected which precedes the directory entry selected during the preceding iteration.
23. A method as defined in claim 22 in which, during the directory entry selection step of a first iteration, the last directory entry in the storage medium directory is selected.
24. A method as defined in claim 21 in which iterations are terminated after the iteration in which the first directory entry in the storage medium directory is selected during the directory entry selection step.
25. A method as defined in claim 19 further comprising the step of controlling storage of valid data item updates provided by said valid data item update transfer control on the target storage medium.
26. A method as defined in claim 25 in which, during the data item update storage step, data item updates provided by a data item update source are also stored on the target storage medium.
27. A method as defined in claim 26, the target storage medium having associated therewith a target storage medium data item identifier flag set comprising a plurality of target storage medium data item identifier flags, each associated with a data item identifier, the data item update storage step including the step of conditioning the target storage medium data item identifier flags in response to the storage of data item updates on the target storage medium.
28. A method as defined in claim 27 in which each target storage medium data item identifier flag has a valid condition and at least one other condition, the data item update storage step including the step of conditioning the target storage medium data item identifier flags associated with data item identifiers for data item updates received from the data item update source and stored on said target storage medium to the valid condition.
29. A method as defined in claim 28, the method being used in a system comprising at least one other storage medium having associated therewith a data item identifier flag set comprising a plurality of data item identifier flags, each associated with a data item identifier, each data item identifier flag having a valid condition indicating that the other storage medium has a valid data item update associated with the respective data item identifier flag's associated data item identifier stored thereon, and at least one other condition, the data item update storage step further including the step of conditioning the data item identifier flag of the data item identifier flag set associated with the other storage medium to said other condition for each target storage medium data item identifier flag which has the valid condition.
30. A method as defined in claim 27 in which each target storage medium data item identifier flag has a valid condition and at least one other condition, the data item update storage step further including the step of conditioning the target storage medium data item identifier flags associated with data item identifiers for data item updates received from the data item update source and stored on said target storage medium to the valid condition.
31. A method as defined in claim 30 in which said data item update storage step further includes the step of selectively conditioning the target storage medium data item identifier flags associated with data item identifiers for data item updates received from the source storage medium to one of the valid condition or the other condition.
32. A method as defined in claim 31, the data item update storage step including the step of conditioning the target storage medium data item identifier flags associated with data item identifiers for valid data item updates received from the source storage medium to the valid condition.
33. A method as defined in claim 19, further comprising the step of storing successive data item updates on said target storage medium, the data item update storage step maintaining a storage medium directory for said target storage medium, the data item update storage step including the step of providing an entry for each data item update stored on said target storage medium including the data item identifier, so that the successive entries in the target storage medium's storage medium directory identify the successive data item identifiers associated with successive data item updates stored on the target storage medium.
34. A method as defined in claim 33 in which each entry of said target storage medium's storage medium directory further includes an invalid flag having an invalid condition and at least one other condition, the data item update storage step including the step of conditioning the invalid flag.
35. A method as defined in claim 34, the valid data item update identification step and the valid data item update transfer step being performed during a space reclamation operation in which valid data item updates are copied from the source storage medium to the target storage medium, the data item update storage step including the step of, during the space reclamation operation, maintaining an auxiliary data item identifier flag set comprising a plurality of auxiliary data item identifier flags, each associated with a data item identifier, each flag having a plurality of conditions, the data item update storage step including the step of controlling the condition of the invalid flag of selected ones of the entries of said target storage medium's storage medium directory in relation to the condition of the auxiliary data item identifier flag associated with the data item identifier of the respective entries.
36. A method as defined in claim 35, in which said data item update storage step further includes the step of storing of data item updates provided by a data item update source, each being associated with a said data item identifier, on the target storage medium, the data item update storage step further including the steps of:
A. during a said space reclamation operation, conditioning the auxiliary data item identifier flag associated with the data item identifier of each data item update provided by said data item update source to a predetermined condition; and
B. generating an entry for the target storage medium's storage medium directory, the target storage medium directory entry generator conditioning the entry's invalid flag to an invalid condition if the data item update stored on the target storage medium was provided by the valid data item update transfer control and the auxiliary data item identifier flag associated with the data item identifier of the data item update had the predetermined condition.
37. A valid data item update retrieval and storage subsystem for copying valid ones of a plurality of data item updates serially recorded on a source storage medium, onto a target storage medium, each data item update having a data item identifier which is one of a set of data item identifier values, a set of said valid data item updates for storing on a valid data item update storage medium, the valid data item update retrieval and recording arrangement comprising:
A. a digital data processor; and
B. a control subsystem for controlling the processor, the control subsystem comprising:
i. a storage medium directory including a series of directory entries each identifying, for a corresponding one of the series of data item updates recorded on said source storage medium, the data item identifier associated with said corresponding one of the series of data item updates;
ii. a data item identifier flag set module for enabling the processor to maintain a data item identifier flag set comprising a plurality of data item identifier flags, each associated with a data item identifier, each flag having a valid condition indicating that the source storage medium has a valid data item update associated with the respective data item identifier flag's associated data item identifier stored thereon, and at least one other condition;
iii. a valid data item update identifier module for enabling the processor to use the storage medium directory and said data item identifier flag set to identify a set of valid data item updates on said source storage medium, the valid data item update identifier scanning the directory entries in said directory in reverse order and, for each directory entry, determining whether the data item identifier flag associated with the data item identifier contained in the directory entry indicates that the source cartridge has a valid data item update associated with the respective flag's associated data item identifier stored thereon and, if so, determining that the data item update associated with the directory entry is a valid data item update;
iv. a valid data item update transfer control for enabling the processor to provide valid ones of the data item updates as identified by said valid data item update identifier from the source storage medium for storage on the target storage medium.
38. A valid data item update retrieval and storage subsystem as defined in claim 37 in which the valid data item update identifier module further enables the processor to condition the one of the data item identifier flags associated with the data item identifier to said at least one other condition when it determines that the data item update associated with the directory entry is a valid data item update.
39. A valid data item update retrieval and storage subsystem as defined in claim 38 in which said valid data item update identifier module includes:
A. a directory entry selector module for enabling said processor to select a directory entry;
B. a validity determination generation module for enabling said processor to use the data item identifier flag associated with the data item identifier identified by the selected directory entry to generate a validity indication indicating whether the selected directory entry is associated with a valid data item update;
C. a data item identifier flag conditioner module for enabling the processor to, in response to the validity indication indicating that the selected directory entry is associated with a valid data item update, condition the data item identifier flag associated with the data item identifier identified by the selected directory entry to said at least one other condition; and
D. an iteration control module for controlling the processor to process the directory entry selector module, said validity determination generation module and said data item identifier flag conditioner module through a series of iterations.
40. A valid data item update retrieval and storage subsystem as defined in claim 39 in which said iteration control module enables said processor to, when processing the directory entry selector in each iteration, select the one of the directory entries which precedes the directory entry selected during the preceding iteration.
41. A valid data item update retrieval and storage subsystem as defined in claim 40 in which said iteration control module enables said processor to, when processing said directory entry selector in a first iteration, select the last directory entry in the storage medium directory.
42. A valid data item update retrieval and storage subsystem as defined in claim 39 in which said iteration control module enables said processor to terminate iterations after the iteration in which, during the directory entry selection step, the first directory entry in the storage medium directory is selected.
43. A valid data item update retrieval and storage subsystem as defined in claim 37 further comprising a data item update storage control module for enabling said processor to control storage of valid data item updates provided by said valid data item update transfer control on the target storage medium.
44. A valid data item update retrieval and storage subsystem as defined in claim 43 in which said data item update storage control module further enables said processor to control storage of data item updates provided by a data item update source on the target storage medium.
45. A valid data item update retrieval and storage subsystem as defined in claim 44, the target storage medium having associated therewith a target storage medium data item identifier flag set comprising a plurality of target storage medium data item identifier flags, each associated with a data item identifier, the data item update storage control module further enabling said processor to condition the target storage medium data item identifier flags in response to the storage of data item updates on the target storage medium.
46. A valid data item update retrieval and storage subsystem as defined in claim 45 in which each target storage medium data item identifier flag has a valid condition and at least one other condition, the data item update storage control module enabling the processor to condition the target storage medium data item identifier flags associated with data item identifiers for data item updates received from the data item update source and stored on said target storage medium to the valid condition.
47. A valid data item update retrieval and storage subsystem as defined in claim 46, the valid data item update retrieval and storage subsystem being used in a system comprising at least one other storage medium having associated therewith a data item identifier flag set comprising a plurality of data item identifier flags, each associated with a data item identifier, each data item identifier flag having a valid condition indicating that the other storage medium has a valid data item update associated with the respective data item identifier flag's associated data item identifier stored thereon, and at least one other condition, the data item update storage control module further enabling said processor to condition the data item identifier flag of the data item identifier flag set associated with the other storage medium to said other condition for each target storage medium data item identifier flag which has the valid condition.
48. A valid data item update retrieval and storage subsystem as defined in claim 47 in which each target storage medium data item identifier flag has a valid condition and at least one other condition, the data item update storage control module enabling the processor to condition the target storage medium data item identifier flags associated with data item identifiers for data item updates received from the data item update source and stored on said target storage medium to the valid condition.
49. A valid data item update retrieval and storage subsystem as defined in claim 48 in which said data item update storage control module further enables the processor to selectively condition the target storage medium data item identifier flags associated with data item identifiers for data item updates received from the source storage medium to one of the valid condition or the other condition.
50. A valid data item update retrieval and storage subsystem as defined in claim 49, the data item update storage control module enabling the processor to condition the target storage medium data item identifier flags associated with data item identifiers for valid data item updates received from the source storage medium to the valid condition.
51. A valid data item update retrieval and storage subsystem as defined in claim 37, further comprising a data item update storage control module for enabling said processor to (i) control storage of successive data item updates on said target storage medium, (ii) enable the maintaining a storage medium directory for said target storage medium, and (iii) provide an entry for each data item update stored on said target storage medium including the data item identifier, so that the successive entries in the target storage medium's storage medium directory identify the successive data item identifiers associated with successive data item updates stored on the target storage medium.
52. A valid data item update retrieval and storage subsystem as defined in claim 51 in which each entry of said target storage medium's storage medium directory further includes an invalid flag having an invalid condition and at least one other condition, the data item update storage control module enabling the processor to condition the invalid flag.
53. A valid data item update retrieval and storage subsystem as defined in claim 52, the valid data item update identifier and the valid data item update transfer control module controlling said processor during a space reclamation operation in which valid data item updates are copied from the source storage medium to the target medium, the data item update storage control module enabling said processor to, during the space reclamation operation, maintain an auxiliary data item identifier flag set comprising a plurality of auxiliary data item identifier flags, each associated with a data item identifier, each flag having a plurality of conditions, the data item update storage control module further controlling the processor to condition the invalid flag of selected ones of the entries of said target storage medium's storage medium directory in relation to the condition of the auxiliary data item identifier flag associated with the data item identifier of the respective entries.
54. A valid data item update retrieval and storage subsystem as defined in claim 53, in which said data item update storage control further controls storage of data item updates provided by a data item update source, each being associated with a said data item identifier, on the target storage medium, the data item update storage control module including:
A. an auxiliary data item identifier flag conditioning module for, during a said space reclamation operation, enabling said processor to condition the auxiliary data item identifier flag associated with the data item identifier of each data item update provided by said data item update source to a predetermined condition; and
B. a target storage medium directory entry generation module for controlling said processor to generate an entry for the target storage medium's storage medium directory, the target storage medium directory entry generation module enabling said processor to condition the entry's invalid flag to an invalid condition if the data item update stored on the target storage medium was provided by the valid data item update transfer control and the auxiliary data item identifier flag associated with the data item identifier of the data item update had the predetermined condition.
55. A control subsystem for use in connection with a processor to form a valid data item update retrieval and storage subsystem for copying valid ones of a plurality of data item updates serially recorded on a source storage medium, onto a target storage medium, each data item update having a data item identifier which is one of a set of data item identifier values, a set of said valid data item updates for storing on a valid data item update storage medium, the control subsystem comprising:
A. a storage medium directory including a series of directory entries each identifying, for a corresponding one of the series of data item updates recorded on said source storage medium, the data item identifier associated with said corresponding one of the series of data item updates;
B. a data item identifier flag set module for enabling the processor to maintain a data item identifier flag set comprising a plurality of data item identifier flags, each associated with a data item identifier, each flag having a valid condition indicating that the source storage medium has a valid data item update associated with the respective data item identifier flag's associated data item identifier stored thereon, and at least one other condition;
C. a valid data item update identifier module for enabling the processor to use the storage medium directory and said data item identifier flag set to identify a set of valid data item updates on said source storage medium, the valid data item update identifier module scanning the directory entries in said directory in reverse order and, for each directory entry, determining whether the data item identifier flag associated with the data item identifier contained in the directory entry indicates that the source cartridge has a valid data item update associated with the respective flag's associated data item identifier stored thereon and, if so, determining that the data item update associated with the directory entry is a valid data item update;
D. a valid data item update transfer control for enabling the processor to provide valid ones of the data item updates as identified by said valid data item update identifier from the source storage medium for storage on the target storage medium.
56. A control subsystem as defined in claim 55 in which the valid data item update identifier module further enables the processor to condition the one of the data item identifier flags associated with the data item identifier to said at least one other condition when it determines that the data item update associated with the directory entry is a valid data item update.
57. A control subsystem as defined in claim 56 in which said valid data item update identifier module includes:
A. a directory entry selector module for enabling said processor to select a directory entry;
B. a validity determination generation module for enabling said processor to use the data item identifier flag associated with the data item identifier identified by the selected directory entry to generate a validity indication indicating whether the selected directory entry is associated with a valid data item update;
C. a data item identifier flag conditioner module for enabling the processor to, in response to the validity indication indicating that the selected directory entry is associated with a valid data item update, condition the data item identifier flag associated with the data item identifier identified by the selected directory entry to said at least one other condition; and
D. an iteration control module for controlling said processor to process the directory entry selector module, said validity determination generation module and said data item identifier flag conditioner module through a series of iterations.
58. A control subsystem as defined in claim 57 in which said iteration control module enables said processor to, when processing the directory entry selector in each iteration, select the one of the directory entries which precedes the directory entry selected during the preceding iteration.
59. A control subsystem as defined in claim 58 in which said iteration control module enables said processor to, when processing said directory entry selector in a first iteration, select the last directory entry in the storage medium directory.
60. A control subsystem as defined in claim 58 in which said iteration control module enables said processor to terminate iterations after the iteration in which, during the directory entry selection step, the first directory entry in the storage medium directory is selected.
61. A control subsystem as defined in claim 55 further comprising a data item update storage control module for enabling said processor to control storage of valid data item updates provided by said valid data item update transfer control on the target storage medium.
62. A control subsystem as defined in claim 61 in which said data item update storage control module further enables said processor to control storage of data item updates provided by a data item update source on the target storage medium.
63. A control subsystem as defined in claim 62, the target storage medium having associated therewith a target storage medium data item identifier flag set comprising a plurality of target storage medium data item identifier flags, each associated with a data item identifier, the data item update storage control module further enabling said processor to condition the target storage medium data item identifier flags in response to the storage of data item updates on the target storage medium.
64. A control subsystem as defined in claim 63 in which each target storage medium data item identifier flag has a valid condition and at least one other condition, the data item update storage control module enabling the processor to condition the target storage medium data item identifier flags associated with data item identifiers for data item updates received from the data item update source and stored on said target storage medium to the valid condition.
65. A control subsystem as defined in claim 64, the valid data item update retrieval and storage subsystem being used in a system comprising at least one other storage medium having associated therewith a data item identifier flag set comprising a plurality of data item identifier flags, each associated with a data item identifier, each data item identifier flag having a valid condition indicating that the other storage medium has a valid data item update associated with the respective data item identifier flag's associated data item identifier stored thereon, and at least one other condition, the data item update storage control module further enabling said processor to condition the data item identifier flag of the data item identifier flag set associated with the other storage medium to said other condition for each target storage medium data item identifier flag which has the valid condition.
66. A control subsystem as defined in claim 65 in which each target storage medium data item identifier flag has a valid condition and at least one other condition, the data item update storage control module enabling the processor to condition the target storage medium data item identifier flags associated with data item identifiers for data item updates received from the data item update source and stored on said target storage medium to the valid condition.
67. A control subsystem as defined in claim 66 in which said data item update storage control module further enables the processor to selectively condition the target storage medium data item identifier flags associated with data item identifiers for data item updates received from the source storage medium to one of the valid condition or the other condition.
68. A control subsystem as defined in claim 67, the data item update storage control module enabling the processor to condition the target storage medium data item identifier flags associated with data item identifiers for valid data item updates received from the source storage medium to the valid condition.
69. A control subsystem as defined in claim 65, further comprising a data item update storage control module for enabling said processor to (i) control storage of successive data item updates on said target storage medium, (ii) enable the maintaining of a storage medium directory for said target storage medium, and (iii) provide an entry for each data item update stored on said target storage medium including the data item identifier, so that the successive entries in the target storage medium's storage medium directory identify the successive data item identifiers associated with successive data item updates stored on the target storage medium.
70. A control subsystem as defined in claim 69 in which each entry of said target storage medium's storage medium directory further includes an invalid flag having an invalid condition and at least one other condition, the data item update storage control module enabling the processor to condition the invalid flag.
71. A control subsystem as defined in claim 70, the valid data item update identifier and the valid data item update transfer control module controlling said processor during a space reclamation operation in which valid data item updates are copied from the source storage medium to the target medium, the data item update storage control module enabling said processor to, during the space reclamation operation, maintain an auxiliary data item identifier flag set comprising a plurality of auxiliary data item identifier flags, each associated with a data item identifier, each flag having a plurality of conditions, the data item update storage control module further controlling the processor to condition the invalid flag of selected ones of the entries of said target storage medium's storage medium directory in relation to the condition of the auxiliary data item identifier flag associated with the data item identifier of the respective entries.
72. A control subsystem as defined in claim 71, in which said data item update storage control further controls storage of data item updates provided by a data item update source, each being associated with a said data item identifier, on the target storage medium, the data item update storage control module including:
A. an auxiliary data item identifier flag conditioning module for, during a said space reclamation operation, enabling said processor to condition the auxiliary data item identifier flag associated with the data item identifier of each data item update provided by said data item update source to a predetermined condition; and
B. a target storage medium directory entry generation module for controlling said processor to generate an entry for the target storage medium's storage medium directory, the target storage medium directory entry generation module enabling said processor to condition the entry's invalid flag to an invalid condition if the data item update stored on the target storage medium was provided by the valid data item update transfer control and the auxiliary data item identifier flag associated with the data item identifier of the data item update had the predetermined condition.
Description
FIELD OF THE INVENTION
The invention relates generally to digital data storage subsystems for use in storing information from, for example, digital computers. The invention more particularly relates to storage subsystems which may be used as back-up stores for one or more digital computer systems, and which further may be remotely located from one or more of the digital computer systems so as to ensure that catastrophic failures which may occur at the sites of the respective digital computer systems do not result in unavailability of the information stored thereon.
BACKGROUND OF THE INVENTION
Digital computer systems are used in a number of applications in which virtually continuous availability of data is important to the operation of businesses or other entities using the systems. Generally, computer centers will periodically produce back-up copies of data on their various digital computer systems. Such back-up copies are usually not maintained on a continuous basis, but instead at particular points in time, often at night, and in any case represent the data at the particular points in time at which the back-up copies are generated. Accordingly, if a failure occurs between back-ups, data which has been received and processed by the digital computer systems since the last back-up copy was produced, may be lost.
Typically, such back-up copies will be maintained by the computer centers at their respective sites so that they may be used in the event of a failure, although some off-site archival back-ups may be maintained. Significant additional problems arise in the case of, for example, catastrophic events that can occur, such as may result from, for example, fire, flood or other natural disasters, intentional tampering or sabotage and the like, which may result in unintentional or intentional damage to an entire site or some significant portion thereof, since some or all of the back-up copies may also be damaged and the data contained thereon may be unavailable.
SUMMARY OF THE INVENTION
The invention provides a new and improved digital data storage subsystem which provides secure remote mirrored storage of digital data for one or more digital data processing systems.
In brief summary, the invention provides a space reclamation subsystem for copying valid ones of a plurality of data item updates serially recorded on a source storage medium, such as a magnetic tape medium, onto a target storage medium. Each data item update has a data item identifier which is one of a set of data item identifier values. The space reclamation subsystem comprises a storage medium directory, a data item identifier flag set, a valid data item update identifier, and a valid data item update transfer control. The storage medium directory includes a series of directory entries each identifying, for a corresponding one of the series of data item updates recorded on the source storage medium, the data item identifier associated with the corresponding one of the series of data item updates. The data item identifier flag set comprises a plurality of data item identifier flags, each associated with a data item identifier, each flag having a valid condition indicating that the source storage medium has a valid data item update associated with the respective data item identifier flag's associated data item identifier stored thereon, and at least one other condition. The valid data item update identifier uses the storage medium directory and the data item identifier flag set to identify a set of valid data item updates on the source storage medium. In that operation, the valid data item update identifier scans the directory entries in the directory in reverse order and, for each directory entry, determines whether the data item identifier flag associated with the data item identifier contained in the directory entry indicates that the source cartridge has a valid data item update associated with the respective flag's associated data item identifier stored thereon and, if so, determines that the data item update associated with the directory entry is a valid data item update.
The valid data item update transfer control provides valid ones of the data item updates as identified by the valid data item update identifier from the source storage medium for storage on the target storage medium.
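By way of illustration only, the reverse-order scan summarized above may be sketched as follows. The sketch is not taken from the specification; the names DirectoryEntry, find_valid_updates and reclaim, the use of a dictionary for the data item identifier flag set, and the use of lists to stand in for the source and target storage media are all assumptions made for the example. It also assumes, as recited in claim 56, that a flag is conditioned to the other condition once a valid data item update has been found for its data item identifier, so that earlier updates for the same identifier are treated as superseded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectoryEntry:
    """Hypothetical directory entry: one per update recorded on the medium."""
    position: int  # ordinal position of the update on the source medium
    item_id: int   # data item identifier associated with the update


def find_valid_updates(directory, valid_flags):
    """Scan the directory in reverse order and return the valid entries.

    valid_flags maps a data item identifier to True (valid condition) or
    False (other condition). The caller's flag set is copied, not modified.
    """
    flags = dict(valid_flags)
    valid = []
    for entry in reversed(directory):
        if flags.get(entry.item_id, False):
            # The most recent update for this identifier is the valid one.
            valid.append(entry)
            # Condition the flag to the other condition so that earlier
            # updates for the same identifier are skipped.
            flags[entry.item_id] = False
    valid.reverse()  # restore original recording order for copying
    return valid


def reclaim(directory, valid_flags, source, target):
    """Copy only the valid updates from source to target (both plain lists)."""
    for entry in find_valid_updates(directory, valid_flags):
        target.append((entry.item_id, source[entry.position]))
    return target


# Example: identifiers 7 and 3 were each updated twice on this medium;
# identifier 9 is assumed to have been superseded on another medium.
directory = [DirectoryEntry(i, item) for i, item in enumerate([7, 3, 7, 9, 3])]
flags = {7: True, 3: True, 9: False}
source = ["a", "b", "c", "d", "e"]
print(reclaim(directory, flags, source, []))
```

Scanning from the last entry toward the first means that the first entry encountered for any identifier is the most recently recorded one, so a single pass over the directory suffices and no update data need be read from the medium until the valid set is known.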
BRIEF DESCRIPTION OF THE DRAWINGS
This invention is pointed out with particularity in the appended claims. The above and further advantages of this invention may be better understood by referring to the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a functional block diagram of a system including a remote data protection facility constructed in accordance with the invention;
FIG. 2 is a functional block diagram of one embodiment of a digital data processing system which is useful with the remote data protection facility;
FIG. 3 is a diagram which is useful in understanding the operation of the digital data processing system depicted in FIG. 2;
FIG. 4 is a functional block diagram of an input module useful in the remote data protection facility depicted in FIG. 1;
FIG. 5 is a functional block diagram of a filter/buffer module which is useful in the remote data protection facility depicted in FIG. 1;
FIG. 6 is a functional block diagram of a tape log module useful in the remote data protection facility depicted in FIG. 1;
FIG. 7 is a functional block diagram of an output module useful in the remote data protection facility depicted in FIG. 1;
FIGS. 8 and 9 are flow charts detailing operations performed by the filter/buffer module's control module in controlling the filter/buffer module depicted in FIG. 5;
FIGS. 10 and 11 are flow charts detailing operations performed by the tape log module's tape log control module in controlling the tape log module depicted in FIG. 6; and
FIG. 12 is a flow chart detailing operations performed by the reconstruction module 53 depicted in FIG. 1.
DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT
I. General
FIG. 1 is a functional block diagram of a remote data protection facility 5 constructed in accordance with the invention. With reference to FIG. 1, the remote data protection facility 5 is shown connected to one or more digital data processing systems 10(0) through 10(N) (generally identified by reference numeral 10(n)) over one or more communication links 12(0) through 12(N) (generally identified by reference numeral 12(n)). Each digital data processing system 10(n) includes one or more host computers generally identified by reference numeral 11(n) and an associated local mass storage subsystem generally identified by reference numeral 14(n). The host computer 11(n) may comprise, for example, a mainframe computer system, a personal computer, workstation, or the like which can be connected directly or indirectly to the respective mass storage subsystem 14(n). Each host computer 11(n) may initiate an access operation in connection with its associated local mass storage subsystem 14(n) to perform a retrieval operation, in which the host computer 11(n) initiates retrieval of computer programs and digital data (which will generally be referred to herein as "information" or "data") from the local mass storage subsystem 14(n) for use by the host computer 11(n) in its processing operations. In addition, each host computer 11(n) may initiate storage of processed data in the mass storage subsystem 14(n). Generally, retrieval operations and storage operations in connection with the mass storage subsystems 14(n) will collectively be referred to as "access operations."
The mass storage subsystems 14(n) in one embodiment are generally similar to the digital data storage subsystem described in U.S. Pat. No. 5,206,939, entitled System And Method For Disk Mapping And Data Retrieval, issued Apr. 27, 1993 to Moshe Yanai, et al. (hereinafter, "the '939 patent"), U.S. patent application Ser. No. 07/893,509 filed Jun. 4, 1992, in the name of Moshe Yanai, et al., entitled "System And Method For Dynamically Controlling Cache Management" now U.S. Pat. No. 5,381,539, issued Jan. 19, 1995, and U.S. patent application Ser. No. 08/523,304, filed Sep. 5, 1995, in the name of Natan Vishlitzky, et al., and entitled "Cache Management System Using Time Stamping For Replacement Queue" (hereafter "Vishlitzky cache management application"), now U.S. Pat. No. 5,592,432 issued Jan. 7, 1997, and U.S. patent application Ser. No. 08/619,931, filed Mar. 18, 1996, in the name of Natan Vishlitzky, et al., and entitled "System And Method For Caching Information In A Digital Data Storage Subsystem," all of which are assigned to the assignee of the present invention and incorporated herein by reference.
The remote data protection facility 5 provides "mirrored" back-up protection for data stored in the mass storage subsystems 14(n) of the various digital data processing systems 10(n), to protect against numerous types of failures, including, for example, catastrophic failures at the sites of the respective mass storage subsystems 14(n). Such catastrophic failures may result from numerous types of events at the respective sites of the mass storage subsystems 14(n), including, for example, fire, flood or other natural disasters, intentional tampering or sabotage, and the like, which may result in unintentional or intentional damage to a mass storage subsystem 14(n) and/or its site and consequent loss of availability of the data stored in the respective mass storage subsystem 14(n). The remote data protection facility 5 will preferably be located geographically remotely from the sites of the digital data processing systems 10(n), sufficiently far to ensure that, if a catastrophic failure occurs at the digital data processing system, the remote data protection facility will survive. It will be appreciated that remote data protection facilities may be provided at a number of sites that are distributed geographically, and a digital data processing system 10(n) may be connected to remote data protection facilities at one or more of the sites. In addition, the remote data protection facility 5 will also protect against digital data processing system failures which are less than catastrophic, such as, for example, failure of some or all of the elements of the mass storage subsystems 14(n) as described below, for reasons that are not limited to catastrophic causes.
Each of the mass storage subsystems 14(n) may transfer information to the remote data protection facility 5 over the respective communication link 12(n) for protected mirrored storage at the remote data protection facility site remote from the respective digital data processing system 10(n). Furthermore, each mass storage subsystem 14(n) may transmit control commands to the remote data protection facility 5 to control certain operations of the remote data protection facility 5. In addition, if the digital data processing system 10(n) requires information stored on the remote data protection facility 5 for processing, which may be a result of an earlier catastrophic failure at the site of the digital data processing system 10(n), failure to maintain suitable information data protection locally at the digital data processing system 10(n), or the like, the digital data processing system 10(n) (in particular its mass storage subsystem 14(n)) may retrieve the information that was previously stored on the remote data protection facility 5 for use in its subsequent processing. Furthermore, if, for example, a digital data processing system 10(n) is unavailable due to, for example, a catastrophe at its site, another digital data processing system 10(n') (n' ≠ n) may retrieve information from the remote data protection facility 5 which was previously stored by the digital data processing system 10(n) for use in its processing, which may assist in ensuring that the information is continually available for processing even if the digital data processing system 10(n) is not available, thereby ensuring that the information will be available to at least one of the digital data processing systems 10(n).
The communication links 12(n) interconnecting the respective digital data processing systems 10(n), on the one hand, and the remote data protection facility 5, on the other hand, are preferably high-speed data communications links, such as may be used in connection with computer networks, including, for example, optical fibers, high-speed telephone lines, and the like. The information transferred over the communication links 12(n) is preferably compressed, using any convenient compression mechanism, and some or all of the information may be encrypted to protect against improper eavesdropping or dissemination during communication over the communication links. If the remote data protection facility 5 is to be used in connection with information which belongs to multiple owners, each of the owners of the information may use its own encryption mechanism (such as its own encryption algorithm or its own encryption key); as will be clear from the following description, the remote data protection facility 5 may store information in encrypted or unencrypted form, but will preferably be provided with an identifier for each item of information so that, if an item is updated, it will be able to associate the item with its update.
In accordance with one aspect of the invention, in one embodiment, the mass storage subsystems 14(n) provide access requests, including storage requests and retrieval requests, to the remote data protection facility 5 when information is to be stored in, or retrieved from, the remote data protection facility 5, without requiring any action by a host computer 11(n). In that embodiment, the generation of storage and retrieval requests is, accordingly, effectively transparent to the host computers 11(n) and programs being processed thereby. The structure and operation of one embodiment of a digital data processing system 10(n) useful in connection with the remote data protection facility 5 will be described in connection with FIGS. 2 and 3, and the structure and operation of the remote data protection facility 5 itself will be described in connection with FIGS. 4 through 12.
II. Digital Data Processing System 10(n)
FIG. 2 depicts a functional block diagram of a digital data processing system 10(n) which is useful with the remote data protection facility 5. The digital data processing system 10(n) is generally similar to the digital data processing system described in the above-identified Yanai patent, Vishlitzky cache management application and Vishlitzky patent applications. FIG. 3 depicts several data structures which are useful in understanding the operation of the digital data processing system 10(n) depicted in FIG. 2. With reference to FIG. 2, digital data processing system 10(n) includes a plurality of host computers 11(n)(1) through 11(n)(K) (generally identified by reference numeral 11(n)(k)), mass storage subsystem 14(n) and a remote data protection facility interface 17 interconnected by a common bus 13. Each host computer 11(n)(k) includes a local computer 16(k), which may comprise, for example, a personal computer, workstation, or the like which may be used by a single operator, or a multi-user computer system which may be used by a number of operators.
Each local computer 16(k) is connected to an associated host adapter 15(k), which, in turn, is connected to bus 13. Each local computer 16(k) may control its associated host adapter 15(k) to perform a retrieval operation, in which the host adapter 15(k) initiates retrieval of information from the mass storage subsystem 14(n) for use by the local computer 16(k) in its processing operations. In addition, the local computer 16(k) may control its associated host adapter 15(k) to perform a storage operation in which the host adapter 15(k) initiates storage of processed data in the mass storage subsystem 14(n). Generally, storage operations and retrieval operations in connection with the mass storage subsystem 14(n) will collectively be referred to as "access operations."
The remote data protection facility interface 17 monitors storage operations by the local computers' host adapters 15(k) and, when the host adapter 15(k) initiates a storage operation as described below, it will also receive the processed data and transfer it to the remote data protection facility 5 for mirrored storage. The remote data protection facility interface 17 can also initiate retrieval operations to retrieve information from the mass storage subsystem 14(n) to be transferred to the remote data protection facility 5 for mirrored storage, as will also be described below. In addition, the remote data protection facility interface 17 can transfer operational commands to the remote data protection facility 5 to enable the remote data protection facility to perform predetermined operations. The operational commands may be provided by, for example, a system manager through the host computers 11(n)(1), or through a system manager console 19. Furthermore, the remote data protection facility interface 17 can also receive information from the remote data protection facility 5 for storage in the mass storage subsystem 14(n).
In connection with both retrieval and storage operations, the host adapter 15(k) will transfer access operation command information, together with processed data to be stored during a storage operation, over the bus 13, and a bus access control logic circuit 18 is provided to arbitrate among devices connected to the bus, including the host adapters 15(k), which require access to the bus 13. In controlling access to the bus 13, the bus access control logic circuit 18 may use any of a number of known bus access arbitration techniques, including centralized bus access control techniques in which bus access is controlled by one device connected to bus 13, as well as distributed arbitration techniques in which bus access control logic circuitry is distributed among the devices which require access to the bus. In addition, the digital data processing system 10(n) includes the system manager console 19 which, in addition to permitting the system manager to control the remote data protection facility 5, also can permit a system manager to control various elements of the system 10(n) in a conventional manner. It will be appreciated that, although the system manager console 19 is shown in FIG. 1 as a separate element, any of the local computers 16(k) may provide the functionality of the console 19, in which case a separate element need not be provided.
The mass storage subsystem 14(n) in one embodiment is generally similar to the mass storage subsystem described in U.S. Pat. No. 5,206,939, entitled System And Method For Disk Mapping And Data Retrieval, issued Apr. 27, 1993 to Moshe Yanai, et al (hereinafter, "the '939 patent"). As shown in FIG. 1, the mass storage subsystem 14(n) includes a plurality of digital data stores 20(1) through 20(M) (generally identified by reference numeral 20(m)), each of which is also connected to bus 13. Each of the data stores 20(m) stores information, including programs and data, which may be accessed by the host computers 11(n)(k) as well as processed data provided to the mass storage subsystem 14(n) by the host computers 11(n)(k). Generally, the information is in the form of records, which may be of variable length.
Each data store 20(m), in turn, includes a storage controller 21(m) and one or more storage devices generally identified by reference numeral 22. The storage devices 22 may comprise any of the conventional magnetic disk and tape storage devices, as well as optical disk storage devices and CD-ROM devices from which information may be retrieved. Each storage controller 21(m) connects to bus 13 and controls the storage of information which it receives thereover in the storage devices connected thereto. In addition, each storage controller 21(m) controls the retrieval of information from the storage devices 22 which are connected thereto for transmission over bus 13. In addition to controlling access by the host adapters 15(k) to bus 13, the bus access control logic circuit 18 also controls access by the storage controllers to the bus 13.
The mass storage subsystem 14(n) also includes a common memory subsystem 30 for caching information during an access operation and event status information providing selected status information concerning the status of the host computers 11(n)(k) and the data stores 20(m) at certain points in their operations. The caching of event status information by the common memory subsystem 30 is described in detail in U.S. patent application Ser. No. 08/532,240 filed Sep. 22, 1992, in the name of Eli Shagam, et al., and entitled Digital Computer System Including Common Event Log For Logging Event Information Generated By A Plurality of Devices (Atty. Docket No. 95-034) assigned to the assignee of the present invention and incorporated herein by reference. The information cached by the common memory subsystem 30 during an access operation includes data provided by a host computer 11(n)(k) to be stored in a data store 20(m) during a storage operation, as well as data provided by a data store 20(m) to be retrieved by a host computer 11(n)(k) during a retrieval operation. The common memory subsystem 30 effectively operates as a buffer to buffer information transferred between the host computers 11(n)(k) and the data stores 20(m) during a local access operation.
The common memory subsystem 30 includes a cache memory 31, a cache index directory 32 and a cache manager memory 33, which are generally described in U.S. patent application Ser. No. 07/893,509 filed Jun. 4, 1992, in the name of Moshe Yanai, et al., entitled "System And Method For Dynamically Controlling Cache Management," now U.S. Pat. No. 5,381,539 issued Jan. 19, 1995, and U.S. patent application Ser. No. 08/523,304, filed Sep. 5, 1995, in the name of Natan Vishlitzky, et al., and entitled "Cache Management System Using Time Stamping For Replacement Queue" (hereafter "Vishlitzky cache management application"), now U.S. Pat. No. 5,592,432 issued Jan. 7, 1997, both of which are assigned to the assignee of the present invention and incorporated herein by reference. The cache memory 31 operates as a buffer in connection with storage and retrieval operations, in particular buffering records received from the host computers 11(n)(k) to be transferred to the storage devices for storage, and buffering data received from the data stores 20(m) to be transferred to the host computers 11(n)(k) for processing.
The cache memory 31 and cache index directory 32 will generally be described in connection with FIG. 3. With reference to FIG. 3, the cache memory 31 includes a series of storage locations, which are organized in a series of cache slots 35(0) through 35(S) (generally identified by reference numeral 35(s)). The storage locations are, in turn, identified by a series of addresses, with the starting address of a cache slot being identified by a base address. The cache slots 35(s), in turn, operate as the cache memory's buffer as described above.
The cache index directory 32 operates as an index for the cache slots 35(s) in the cache memory 31. The cache index directory 32 includes a plurality of cache index tables 36(0) through 36(D) (generally identified by reference numeral 36(d)), each of which is associated with one of the storage devices 22 in the storage subsystem 14(n). Each cache index table 36(d) includes a device header field 40, which provides, for example, selected identification and status information for the device 22 associated with the cache index table 36(d). In addition, each cache index table 36(d) includes a plurality of cylinder descriptors 41(0) through 41(C) (generally identified by reference numeral 41(c)) each of which is associated with one of the cylinders in the storage device 22 that is associated with the cache index table 36(d). Each cylinder descriptor 41(c), in turn, includes a cylinder header 42(c), which provides, for example, selected identification and status information for the cylinder associated with the cylinder descriptor 41(c).
In addition, each cylinder descriptor 41(c) includes a plurality of track descriptors 43(c)(0) through 43(c)(T) (generally identified by reference numeral 43(c)(t)), each of which is associated with one of the tracks in the cylinder associated with the cylinder descriptor 41(c). Each track descriptor 43(c)(t), in turn, includes information for the associated track of the storage device 22, including whether a copy of the data stored on the track is cached in the cache memory 31, and, if so, the identification of the cache slot 35(s) in which the data is cached. In particular, each track descriptor 43(c)(t) includes a cached flag 44(c)(t) and a cache slot pointer 45(s)(t). The cached flag 44(c)(t), if set, indicates that the data on the track associated with the track descriptor is cached in a cache slot 35(s), and the cache slot pointer 45(s)(t) identifies the particular cache slot in which the data is cached. In addition, each track descriptor 43(c)(t) includes a used flag 46(c)(t) which may be used to indicate whether the data, after being stored in the cache slot identified by the cache slot pointer 45(s)(t), has been used by the host computer 11(n)(k) during a retrieval operation. This "host used" flag may be used to determine whether the cache slot may be re-used for another access operation.
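By way of illustration only, the hierarchy of the cache index directory 32 described above (one table per storage device, one cylinder descriptor per cylinder, one track descriptor per track) might be modeled as in the following minimal sketch. The class names, field names, and the `make_table` and `lookup` helpers are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TrackDescriptor:
    cached: bool = False              # cached flag: track data is in a cache slot
    cache_slot: Optional[int] = None  # cache slot pointer: index of the slot
    host_used: bool = False           # "host used" flag


@dataclass
class CylinderDescriptor:
    header: dict = field(default_factory=dict)  # cylinder identification/status
    tracks: List[TrackDescriptor] = field(default_factory=list)


@dataclass
class CacheIndexTable:
    device_header: dict = field(default_factory=dict)  # device identification/status
    cylinders: List[CylinderDescriptor] = field(default_factory=list)


def make_table(num_cylinders: int, tracks_per_cylinder: int) -> CacheIndexTable:
    """Build an empty table (assumed geometry) in which no track is cached."""
    return CacheIndexTable(
        cylinders=[
            CylinderDescriptor(
                tracks=[TrackDescriptor() for _ in range(tracks_per_cylinder)]
            )
            for _ in range(num_cylinders)
        ]
    )


def lookup(table: CacheIndexTable, cylinder: int, track: int) -> Optional[int]:
    """Return the cache slot holding the track, or None if it is not cached."""
    descriptor = table.cylinders[cylinder].tracks[track]
    return descriptor.cache_slot if descriptor.cached else None
```

In this sketch the cache slot pointer is only consulted when the cached flag is set, mirroring the order in which the two fields are described above.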
Each of the host adapters 15(k) and each of the storage controllers 21(m) includes a cache manager 23(k) and 24(m), respectively, to access the cache memory 31, cache index directory 32 and cache manager memory 33. The particular operations performed during an access operation will depend on a number of factors, including the access operation to be performed, whether or not the data from the particular track to be accessed is cached in the cache memory 31, and whether or not the data contained in a cache slot 35(s) has been modified or updated by a host adapter's cache manager 23(k) during a storage operation. As described in the aforementioned Vishlitzky cache management application, the host computers 11(n)(k) typically perform storage and retrieval operations in connection with data in the cache memory 31, and the storage controllers 21(m) perform "staging" and "de-staging" operations to transfer data in the storage devices 22 to the cache memory 31 for buffering (the staging operations) and to transfer data from the cache memory 31 to the storage devices 22 for storage (the de-staging operations). In performing the staging and de-staging operations, the storage controllers 21(m) generally transfer data to and from the cache memory 31 in units of a track, that is, they will during a staging operation transfer all of the data in a track from a storage device 22 to a cache slot 35(s) in the cache memory 31, and during a de-staging operation copy all of the data in a slot in the cache memory 31 to the track of the storage device 22 from which it was originally staged.
The cache manager memory 33 maintains a number of work lists which are used to control operations by the host adapters 15(k) and storage controllers 21(m) during an access operation. In particular, the cache manager memory 33 includes a cache slot replacement list, a pending write list and various lists which the host adapters 15(k) and storage controllers 21(m) use to communicate to coordinate staging operations (not shown). The various lists maintained by the cache manager memory 33 may comprise any of a number of convenient forms, including queues, trees, stacks or the like. The cache slot replacement list is used to control re-use of cache slots during staging operations in accordance with a convenient cache-slot re-use methodology. During a staging operation, the storage controller's cache manager 24(m) uses the cache slot replacement list to select a cache slot 35(s) into which it will load the data retrieved from a storage device. (The aforementioned Vishlitzky cache management application describes a modified least-recently-used cache-slot re-use methodology used in one embodiment of the invention). The pending write list is used to identify cache slots 35(s) which contain updated data which has not yet been written to a storage device. During de-staging operations, the storage controllers' cache managers 24(m) will use the pending write list to identify cache slots to be written to a storage device 22. Preferably, the cache slots 35(s) which are identified in the pending write list will not also be listed in the cache slot replacement list, so that cache slots 35(s) which contain updated data will not be re-used until the data has been written to a storage device through a de-staging operation.
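The relationship between the cache slot replacement list and the pending write list may be sketched as follows. This is an illustrative model only: it assumes a plain least-recently-used ordering rather than the modified methodology of the Vishlitzky cache management application, and the class and method names are hypothetical.

```python
from collections import OrderedDict, deque


class CacheLists:
    """Toy model of a replacement list and a pending write list."""

    def __init__(self, num_slots):
        # Replacement list, ordered least recently used first (assumption).
        self.replacement = OrderedDict((s, None) for s in range(num_slots))
        # Slots holding updated data that has not yet been de-staged.
        self.pending_write = deque()

    def select_for_staging(self):
        """Pick the least recently used slot that holds no updated data."""
        if not self.replacement:
            raise RuntimeError("no reusable slot; de-stage a pending slot first")
        slot, _ = self.replacement.popitem(last=False)
        self.replacement[slot] = None  # re-listed as most recently used
        return slot

    def mark_updated(self, slot):
        """Move a slot from the replacement list to the pending write list."""
        if slot in self.replacement:
            del self.replacement[slot]
            self.pending_write.append(slot)

    def destage_next(self):
        """Return the next slot to write back and make it reusable again."""
        if not self.pending_write:
            return None
        slot = self.pending_write.popleft()
        self.replacement[slot] = None
        return slot
```

Because a slot is listed in at most one of the two lists at any time, `select_for_staging` can never return a slot whose updated data has not been written back.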
The staging operation coordination communication lists include a plurality of stage request lists and a plurality of stage completion lists, with one stage request list being associated with each data store 20(m) and one stage completion list being associated with each host computer 11(n)(k). The host computers' cache managers 23(k) use the stage request lists to store stage requests to be performed by the respective data stores 20(m), and the data stores' cache managers 24(m) use the stage completion lists to store stage completion messages to indicate to the respective host computers' cache managers 23(k) that the stage requests have been completed.
Generally, a host computer 11(n)(k), during a retrieval operation, attempts to retrieve the data from the cache memory 31. However, if the data is not in the cache memory 31, it will enable the storage controller 21(m) which controls the storage device 22 that contains the data to be retrieved to "stage" the track which contains the data to be retrieved, that is, to transfer all of the data in the track which contains the data to be retrieved into a slot in the cache memory 31. After the data to be retrieved is in a slot in the cache memory 31, the host computer 11(n)(k) will retrieve the data from the slot. Similarly, during a storage operation, the host computer 11(n)(k) will determine whether the particular track into which the data is to be written is in a slot in the cache memory 31 and if so will store the data in the slot. However, if the data is not in the cache memory 31, the host computer 11(n)(k) will enable the cache manager 24(m) and storage controller 21(m) which controls the storage device 22 that contains the track whose data is to be updated to perform a staging operation in connection with the track, thereby to transfer the data in the track into a slot in the cache memory 31. After the data from the track has been copied into the cache memory 31, the host computer 11(n)(k) will update the data in the track.
The storage controller 21(m) generally attempts to perform a staging operation in connection with an empty slot in the cache memory 31. However, if the storage controller 21(m) finds that all of the cache slots in the cache memory 31 are filled, it will in any case select one of the slots to be used with the staging operation. Before transferring the data from the track to the selected cache slot, it will determine whether the data in the slot has been updated by a storage operation, and if so copy the data to the storage device 22 in a de-staging operation, and thereafter perform a staging operation to copy the data from the storage device to the selected cache slot. It will be appreciated that the storage controller 21(m) need only perform a de-staging operation in connection with a cache slot if the data in the cache slot has been updated, since if the data in the cache slot has not been updated before the slot is re-used (which may occur if a host computer 11(n)(k) has only performed retrieval operations therewith), the data in the cache slot corresponds to the data in the storage device 22.
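The rule that a slot is de-staged before re-use only if its contents have been updated can be illustrated with a short sketch. The dictionary layout of a slot and the function name `stage_into_slot` are assumptions made for the example; the storage device is modeled as a mapping from track identifier to track data.

```python
def stage_into_slot(slot, new_track, device):
    """Re-use `slot` for `new_track`; write back old contents only if updated.

    `slot` is a dict with keys "track", "data" and "updated" (hypothetical
    layout). Returns True if a de-staging operation was needed.
    """
    destaged = False
    if slot["track"] is not None and slot["updated"]:
        # De-staging: the cached copy differs from the device copy.
        device[slot["track"]] = slot["data"]
        destaged = True
    # Staging: load the whole requested track into the slot.
    slot["track"] = new_track
    slot["data"] = device[new_track]
    slot["updated"] = False
    return destaged
```

When the slot has only been read, the write-back is skipped because the cached data already matches the device.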
More specifically, as described in the aforementioned Vishlitzky cache management application, during a retrieval operation, the cache manager 23(k) of the initiating host adapter 15(k) will initially access the cache index table 36(d) in the cache index directory 32 associated with the storage device 22 in which the data to be retrieved is stored, in particular accessing the track descriptor 36(c)(t) of the cylinder descriptor 36(c) to determine, from the condition of the cached flag 42(c)(t), whether the data from the track is cached in a cache slot 35(s) in the cache memory. If the cached flag 42(c)(t) indicates that data from the track is cached in a cache slot 35(s), the cache manager 23(k) uses the cache slot pointer 43(t) to identify the particular cache slot 35(s) in which the data is cached and retrieves the required data from the cache slot 35(s).
On the other hand, if the cache manager 23(k) determines from the cached flag 36(c)(t) that the data from the track is not cached in a cache slot 35(s), it will generate a stage request to enable the storage controller 21(m) for the storage device 22 which maintains the data to be retrieved, load the stage request in the stage request queue for the data store 20(m) and notify the storage controller 21(m) that a stage request had been loaded in the stage request queue. At some point after receiving the notification, the storage controller 21(m) will retrieve the stage request and perform a staging operation in response thereto. In performing the staging operation, the storage controller 21(m) will retrieve the data from the requested track, use the above-described cache slot replacement list to select a cache slot 35(s), load the data into cache slot 35(s) and update the track descriptor 36(c)(t) in the cache index table 36(d) associated with the storage device 22 to indicate that the data from the track is in the cache slot 35(s), in particular setting the cached flag 42(c)(t) and loading a pointer to the cache slot in the cache slot pointer 43(c)(t).
After the storage controller 21(m) has completed the staging operation, it will load a staging completed message in the stage completion list in the cache manager memory 33 associated with the host computer 11(n)(k) which issued the staging request, and notify the host computer's cache manager 23(k) that a stage completed message has been loaded therein. At some point after receiving the notification, the host computer's cache manager 23(k) can repeat the operations performed in connection with the retrieval request as described above, in particular accessing the cache index table 36(d) in the cache index directory 32 associated with the storage device 22 in which the data to be retrieved is stored, in particular accessing the track descriptor 36(c)(t) of the cylinder descriptor 36(c) to determine, from the condition of the cached flag 42(c)(t), whether the data from the track is cached in a cache slot 35(s) in the cache memory and, if so, use the cache slot pointer 43(t) to identify the particular cache slot 35(s) in which the data is cached and retrieve the required data from the cache slot 35(s). Since at this point the cached flag 42(c)(t) should indicate that the data from the track is cached in a cache slot 35(s), the cache manager 23(k) should be able to complete the retrieval operation.
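The retrieval sequence just described (a cache miss, a stage request, a staging operation, a stage completion message, and a repeated lookup) may be sketched as a single-threaded simulation. All names are hypothetical; the cache index is reduced to a dictionary in which the presence of a track stands for a set cached flag, and slots are taken from a simple free list rather than a replacement list.

```python
from collections import deque

# Stand-in for the contents of a storage device, keyed by (device, cylinder, track).
TRACK_DATA = {("dev0", 0, 0): b"track-0-0", ("dev0", 0, 1): b"track-0-1"}

cache_slots = {}             # slot number -> cached track data
index = {}                   # track id -> slot number (presence == cached flag set)
stage_requests = deque()     # stage request list for the data store
stage_completions = deque()  # stage completion list for the host
free_slots = deque(range(4))


def host_retrieve(track):
    """Host adapter side: return data on a hit, else queue a stage request."""
    if track in index:
        return cache_slots[index[track]]
    stage_requests.append(track)
    return None


def controller_service():
    """Storage controller side: stage every requested track into a slot."""
    while stage_requests:
        track = stage_requests.popleft()
        slot = free_slots.popleft()
        cache_slots[slot] = TRACK_DATA[track]
        index[track] = slot              # set cached flag and slot pointer
        stage_completions.append(track)  # stage completed message
```

A host would call `host_retrieve`, receive `None` on a miss, and call it again after the completion message for the track appears in `stage_completions`.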
Similar operations occur during a storage operation, in which data in a particular track is updated, with the additional operation of removing the identification of the cache slot 35(s) containing data to be updated from the replacement list and loading it into the pending write list. During a storage operation, the cache manager 23(k) of the initiating host adapter 15(k) will initially access the cache index table 36(d) in the cache index directory 32 associated with the storage device 22 in which the data to be updated is stored, in particular accessing the track descriptor 36(c)(t) of the cylinder descriptor 36(c) to determine, from the condition of the cached flag 42(c)(t), whether the data from the track is cached in a cache slot 35(s) in the cache memory. If the cached flag 42(c)(t) indicates that data from the track is cached in a cache slot 35(s), the cache manager 23(k) uses the cache slot pointer 43(t) to identify the particular cache slot 35(s) in which the data is cached and loads the update data into the cache slot 35(s). In addition, the host adapter's cache manager 23(k) will move the identification of the selected cache slot 35(s) from the replacement list to the pending write list so that the cache slot 35(s) will not be re-used until a de-staging operation has been performed in connection with the cache slot 35(s).
On the other hand, if the cache manager 23(k) determines from the cached flag 36(c)(t) that the data from the track is not cached in a cache slot 35(s), it will generate a stage request to enable the storage controller 21(m) for the storage device 22 which maintains the data to be retrieved, load the stage request in the stage request queue for the data store 20(m) and notify the storage controller 21(m) that a stage request had been loaded in the stage request queue. At some point after receiving the notification, the storage controller 21(m) will retrieve the stage request and perform a staging operation in response thereto. In performing the staging operation, the storage controller 21(m) will retrieve the data from the requested track, select a cache slot 35(s), load the data into cache slot 35(s) and update the track descriptor 36(c)(t) in the cache index table 36(d) associated with the storage device 22 to indicate that the data from the track is in the cache slot 35(s), in particular setting the cached flag 42(c)(t) and loading a pointer to the cache slot in the cache slot pointer 43(c)(t).
After the storage controller 21(m) has completed the staging operation, it will load a staging completed message in the stage completion queue in the cache manager memory 33 associated with the host computer 11(n)(k) which issued the staging request, and notify the cache manager 23(k) that a stage completed message has been loaded therein. At some point after receiving the notification, the cache manager 23(k) can repeat the operations performed in connection with the retrieval request as described above, in particular accessing the cache index table 36(d) in the cache index directory 32 associated with the storage device 22 in which the data to be retrieved is stored, in particular accessing the track descriptor 36(c)(t) of the cylinder descriptor 36(c) to determine, from the condition of the cached flag 42(c)(t), whether the data from the track is cached in a cache slot 35(s) in the cache memory and, if so, use the cache slot pointer 43(t) to identify the particular cache slot 35(s) in which the data is cached and retrieve the required data from the cache slot 35(s). Since at this point the cached flag 42(c)(t) should indicate that the data from the track is cached in a cache slot 35(s), the cache manager 23(k) should be able to complete the storage operation as described above.
As described above, the data stores' cache managers 24(m) also perform de-staging operations using the pending write list to identify cache slots 35(s) which contain updated data to be written back to the original storage device 22 and track whose data was cached in the respective cache slots 35(s). After the data store's cache manager 24(m) has de-staged a cache slot 35(s), it will notify the remote data protection facility interface 17, which, in turn, will retrieve the records in the de-staged cache slot 35(s) and transfer them to the remote data protection facility 5 for storage. After it receives an acknowledgment for the records from the remote data protection facility 5, the remote data protection facility interface 17 can remove the cache slot's identification from the pending write list and return it to the replacement list so that the cache slot 35(s) can be reused.
As indicated above, the remote data protection facility interface 17 performs several operations. Generally, the remote data protection facility interface 17:
(i) monitors storage operations by the local computers' host adapters 15(k) and, when a host adapter 15(k) initiates a storage operation, it will also receive the processed data and transfer it to the remote data protection facility 5;
(ii) initiates retrieval operations to retrieve information from the mass storage subsystem 14(n) for transfer to the remote data protection facility 5 for mirrored storage; and
(iii) receives information from the remote data protection facility 5 for storage in the mass storage subsystem 14(n) during a reconstruction operation.
In addition, the remote data protection facility interface 17 can transfer operational commands provided by a system manager to the remote data protection facility 5 to control the operations thereof. The remote data protection facility interface 17 can also receive status information representing the operational status of the remote data protection facility 5, which status information can be provided to a system manager.
As will be described below in connection with FIGS. 4 through 13, the remote data protection facility 5 stores data from the digital data processing systems 10(n) in the form of fixed-length portions which will be referred to as "segments." In one embodiment, in which the storage devices comprise disk storage units, each segment is selected to comprise contents of an entire track of a respective storage device 22, which can include one or more CKD records as described above. Generally, when a host adapter 15(k) initiates a storage operation, the information that it stores in the cache memory 31 will not comprise a complete segment of data (that is, data for a complete track), but instead will comprise only a partial segment. As described above, the remote data protection facility interface 17 will also transfer this partial segment to the remote data protection facility 5 for mirrored storage.
As will be described below, the remote data protection facility 5 operates in two phases, including (i) filtering and buffering information received from the digital data processing systems 10(n) and thereafter (ii) storing the filtered information on, in one embodiment, magnetic tape storage cartridges. During the filtering and buffering phase, the remote data protection facility 5 will buffer all of the information that it receives from the remote data protection facility interface 17. However, at some point prior to storing the filtered information on the tape cartridges, the remote data protection facility 5 will determine whether the particular information received from the remote data protection facility interface 17 comprises a partial segment or a full segment, and if the information comprises a partial segment the remote data protection facility 5 will request the remote data protection facility interface 17 to provide the complete segment. At that point, the remote data protection facility interface 17 can initiate a retrieval operation in connection with the mass storage subsystem 14(n) to retrieve the segment. The operations performed by the remote data protection facility interface 17 and the mass storage subsystem 14(n) during this retrieval operation are similar to those described above in connection with retrieval operations initiated by a host computer's host adapter 15(k), and may necessitate performance of a staging operation as described above to enable the segment to be loaded in the cache memory 31. After the segment has been loaded in the cache memory 31, the remote data protection facility interface 17 can retrieve it and transfer it to the remote data protection facility 5.
When the full segment is received by the remote data protection facility 5, the filtering performed during the filtering phase will preferably provide that the previously-received partial segment will be filtered-out and discarded. It will also be apparent from the description of the remote data protection facility 5 below that if it (that is, the remote data protection facility 5) has received any other partial segments for the particular segment prior to receiving the full segment from the remote data protection facility interface 17, those other partial segments will also be discarded, since the information contained therein will also be contained in the full segment received from the remote data protection facility interface 17. On the other hand, if the remote data protection facility interface 17 is unable to provide the full segment, which may occur, for example, as a result of a malfunction or other failure in connection with the mass storage subsystem 14(n), the partial segment(s) may be useful in reconstructing the full segment during a reconstruction operation.
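The handling of partial and full segments during the filtering and buffering phase may be illustrated by the following sketch. It is a simplified model under stated assumptions: segments are keyed by an opaque identifier, the class and method names are hypothetical, and partial segments are retained (rather than merged) until a full segment arrives, so that they remain available for reconstruction if the full segment cannot be provided.

```python
class FilterBuffer:
    """Toy buffer in which a full segment supersedes earlier partial segments."""

    def __init__(self):
        # segment id -> {"full": bytes or None, "partials": list of bytes}
        self.buffered = {}

    def receive_partial(self, seg_id, data):
        """Buffer a partial segment; a full segment will be requested later."""
        entry = self.buffered.setdefault(seg_id, {"full": None, "partials": []})
        entry["partials"].append(data)

    def receive_full(self, seg_id, data):
        """Buffer a full segment and discard the partials it supersedes."""
        self.buffered[seg_id] = {"full": data, "partials": []}

    def needs_full(self):
        """Segments with outstanding partials, for which a full copy is needed."""
        return [s for s, e in self.buffered.items() if e["partials"]]
```

After `receive_full`, only the most recent complete copy of the segment remains to be passed on for storage.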
III. Remote Data Protection Facility 5
A. General
The structure and operation of the remote data protection facility 5 will be described in connection with FIGS. 1 and 4 through 13. With reference initially to FIG. 1, the remote data protection facility 5 generally includes an input module 50, a filter/buffer module 51, a tape log module 52, a reconstruction module 53 and an output module 54, all under control of a control module 55. The input module 50, as will be described below in detail in connection with FIG. 4, receives information from the respective digital data processing systems 10(n) which is to be stored by the remote data protection facility 5, couples it to the filter/buffer module 51, and generates acknowledgments for transmission to the digital data processing system 10(n) from which the information was received. In addition, the input module 50 receives control commands from the respective digital data processing systems 10(n), and couples them to the control module 55. The control commands may, for example, enable the remote data protection facility 5 to begin storing information from the digital data processing system 10(n), retrieve previously-stored information for transmission to the same or another digital data processing system 10(n), and the like.
The filter/buffer module 51 performs the filtering and buffering phase as described above. The filter/buffer module 51 buffers information received from the input module, formats it into predetermined formats for storage, and filters the buffered information, as will be described below in detail in connection with FIGS. 5 and 8. In one particular embodiment, the filter/buffer module 51 buffers the received information using one or more disk storage devices, although it will be appreciated that other digital data storage devices, such as conventional random access memories, may be used instead of the disk storage devices or to augment the storage provided by the disk storage devices. If information received from a digital data processing system 10(n) is in the form of a partial segment, the filter/buffer module 51 at some point during the filtering and buffering operation will also request the source digital data processing system 10(n), that is, the digital data processing system 10(n) from which the partial segment was received, to provide the entire segment.
After filtering by the filter/buffer module 51, the filtered information is transferred to the tape log module 52 for storage. The tape log module 52 performs the storage phase as described above. In the tape log module 52, which will be described below in detail in connection with FIGS. 6, 10 and 11, the information received from the digital data processing systems 10(n) is logged onto tape cartridges, such as digital linear tape ("DLT") cartridges, using a conventional autochanger (not separately shown) which forms part of the tape log module 52. In logging the information onto the tape cartridges, the tape log module 52 stores the information received from the filter/buffer module 51 on a currently-selected "logging" cartridge, without regard to whether the information currently being stored is an update of previously-stored information which may be stored on the same or another cartridge. As will be described below in connection with FIGS. 6, 10 and 11, the tape log module 52 uses various data structures to determine, if multiple updates for the same segment are stored on one or more tape cartridges in the tape log module 52, which update was most recently received from the respective digital data processing system 10(n), and, thus, is the valid update.
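One way to picture append-only logging with a record of which update is valid is sketched below. The data structures actually used by the tape log module 52 are described in connection with FIGS. 6, 10 and 11 and are not reproduced here; the `TapeLog` class, its `latest` directory, and the list-based cartridges are assumptions made purely for illustration.

```python
class TapeLog:
    """Toy append-only log: the newest update for a segment is the valid one."""

    def __init__(self):
        self.cartridges = {0: []}  # cartridge number -> list of (segment id, data)
        self.current = 0           # currently-selected logging cartridge
        self.latest = {}           # segment id -> (cartridge, position) of valid update

    def log(self, seg_id, data):
        """Append an update without checking for earlier copies of the segment."""
        tape = self.cartridges[self.current]
        tape.append((seg_id, data))
        self.latest[seg_id] = (self.current, len(tape) - 1)

    def is_valid(self, cartridge, position):
        """True if the update at this location has not been superseded."""
        seg_id, _ = self.cartridges[cartridge][position]
        return self.latest[seg_id] == (cartridge, position)
```

Earlier updates stay on the cartridge after being superseded; they simply stop being referenced by the directory.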
In one embodiment, the autochanger includes eighty-eight tape cartridges divided into eight groups, or "logging sets," of eleven cartridges each. The autochanger includes one robot arm, which is used to move cartridges between cartridge storage slots in which the cartridges are normally stored and ones of nine drives for storing information on and retrieving information from the tape cartridges. Generally, one drive will be allocated for use with an associated one of the logging sets, and the ninth drive will be used if a space reclamation operation is being performed in connection with a cartridge from one of the logging sets as described below.
Each "protected volume" whose data is mirrored by the remote data protection facility 5 is associated with one logging set, although one logging set may be associated with a number of protected volumes. In one embodiment, each protected volume is associated with one of the storage devices 22 in a mass storage subsystem 14(n). Each segment which is received by the remote data protection facility 5 is associated with a segment identifier that uniquely identifies the particular mass storage subsystem 14(n), protected volume, cylinder and track on which the segment is stored.
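A segment identifier of the kind described above, together with the association of protected volumes with logging sets, might be represented as in the following sketch. The field names, the example mapping, and the `logging_set_for` helper are hypothetical.

```python
from typing import NamedTuple


class SegmentId(NamedTuple):
    subsystem: int  # mass storage subsystem
    volume: int     # protected volume within that subsystem
    cylinder: int
    track: int


# Hypothetical assignment: several protected volumes may share one logging set,
# but each protected volume belongs to exactly one logging set.
volume_to_logging_set = {(0, 0): 0, (0, 1): 0, (1, 0): 3}


def logging_set_for(seg: SegmentId) -> int:
    """Return the logging set that mirrors the segment's protected volume."""
    return volume_to_logging_set[(seg.subsystem, seg.volume)]
```

Because the identifier is a tuple of all four components, two segments compare equal only if they name the same track of the same cylinder, volume and subsystem.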
Periodically, the tape log module 52 will perform a space reclamation operation in connection with a cartridge, as a space reclamation source cartridge, to consolidate all of the valid segment updates from the space reclamation source cartridge onto one or more other cartridges. Preferably, a large portion of the segment updates on the space reclamation source cartridge will be invalid, that is, they will have been superseded by more recently-received segment updates which may be stored on other cartridges. After the space reclamation operation, the space reclamation source cartridge may be considered empty and used for storing data during subsequent storage and space reclamation operations. During a space reclamation operation, the valid segment updates will be copied from the source cartridge onto the cartridge from the log set which is currently being used for logging, that is, the cartridge onto which information from the filter/buffer module 51 is being stored. The space reclamation operation will be performed concurrently with the logging operation, so that valid segment updates retrieved from the space reclamation source cartridge will be stored on the current logging cartridge interleaved with segment updates that are provided to the tape log module 52 by the filter/buffer module 51. During a space reclamation operation, if the current logging cartridge becomes filled, another cartridge may be selected as the current logging cartridge; accordingly, during a space reclamation operation, valid segment updates from the space reclamation source cartridge may be copied onto several cartridges in the logging set.
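The space reclamation operation can be sketched as a single pass over the source cartridge that copies only the valid updates. This is a simplified, sequential illustration: it does not model the interleaving with concurrent logging, cartridges are plain lists with an assumed capacity counted in updates, and the `latest` directory and function name are hypothetical.

```python
def reclaim(cartridges, latest, source, current, spare, capacity):
    """Copy valid updates from `source` onto the logging cartridge(s).

    cartridges: cartridge number -> list of (segment id, data)
    latest:     segment id -> (cartridge, position) of the valid update
    spare:      empty cartridge numbers available as the next logging cartridge
    Returns the cartridge that is the current logging cartridge afterwards.
    """
    for pos, (seg_id, data) in enumerate(cartridges[source]):
        if latest.get(seg_id) != (source, pos):
            continue  # superseded by a more recent update elsewhere
        if len(cartridges[current]) >= capacity:
            # Logging cartridge is full: select another one.
            current = spare.pop(0)
            cartridges.setdefault(current, [])
        cartridges[current].append((seg_id, data))
        latest[seg_id] = (current, len(cartridges[current]) - 1)
    cartridges[source] = []  # source may now be treated as empty
    return current
```

The fewer valid updates remain on the source cartridge, the less copying is needed to free it, which is why a mostly-invalid cartridge is the preferred source.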
When a protected volume needs to be provided with information stored in a log set of the tape log module 52, in either a full reconstruction operation or a partial reconstruction operation, the reconstruction module 53 will retrieve the required information and provide it to the output module 54, which, in turn, provides the information to the protected volume's digital data processing system 10(n) or to another digital data processing system 10(n'). In that operation, the reconstruction module 53 may obtain the information from the particular ones of the cartridges on which the information has been stored by the tape log module 52, as well as from the filter/buffer module 51 if that module 51 is buffering more recently received information than is stored on the cartridges. The reconstruction module 53 may perform a full reconstruction operation if all of the information from, for example, a particular digital data processing system 10(n) needs to be reconstructed, which may occur, for example, in the event of a catastrophic failure at the digital data processing system 10(n). On the other hand, the reconstruction module 53 may perform a partial reconstruction operation if information from only one or several storage devices 22 (FIG. 2) needs to be provided, which can occur, for example, in the event of a failure by the storage devices 22.
During a reconstruction operation in connection with a protected volume, the reconstruction module 53 will enable the tape log module 52 to scan through the cartridges of the log set on which the information from the protected volume is mirrored to retrieve the valid information for the protected volume or volumes whose information is to be reconstructed. In addition, the reconstruction module 53 can retrieve information that is currently being buffered for the protected volume or volumes whose information is to be reconstructed from the filter/buffer module 51 and merge that information with the information retrieved from the cartridges. The reconstruction module 53 will provide the merged information to the output module 54, which in turn will transmit the information to the protected volume's digital data processing system 10(n), or to another digital data processing system 10(n') (n' ≠ n) if, for example, there was a catastrophic failure at the original digital data processing system 10(n).
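The merge of logged and buffered information might be sketched as follows. This is an illustration only: it assumes the tape log is scanned in recording order so that a later record for a segment supersedes an earlier one, and that anything still held by the filter/buffer module is newer than anything on tape. The function name and the tuple layout are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def reconstruct(tape_log: Iterable[Tuple[int, bytes]],
                buffered: Iterable[Tuple[int, bytes]]) -> Dict[int, bytes]:
    """Return the most recent data for each segment identifier."""
    image: Dict[int, bytes] = {}
    for seg_id, data in tape_log:   # assumed recording order: later wins
        image[seg_id] = data
    for seg_id, data in buffered:   # assumed newer than anything on tape
        image[seg_id] = data
    return image
```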
The reconstruction module 53 can perform essentially two types of reconstruction operations, namely, a full reconstruction operation and a partial reconstruction operation. In the embodiment in which a protected volume comprises a single storage device 22, in a partial reconstruction, the reconstruction module can perform a reconstruction operation in connection with the mirrored information for the protected volume and provide the reconstructed information to the output module 54 to be transferred to the mass storage subsystem 12(n) which contains the protected volume. The mass storage subsystem 12(n), in turn, can distribute the information among its other storage devices, load it onto a spare, or the like. During a partial reconstruction operation, the tape log module 52 can retrieve information from a plurality of the cartridges of the log set in parallel, in multiple ones of the drives provided by the autochanger, to reduce the time required for the partial reconstruction. A full reconstruction operation, in which all or a subset of protected volumes of one or more mass storage subsystems 14(n) will be reconstructed, is generally similar, except that information may be retrieved from cartridges from multiple log sets. In any case, by limiting storage of information from a single protected volume to a single log set, the number of cartridges that need to be scanned to reconstruct the information from the protected volume can be limited, which, in turn, can also serve to reduce the time required for the partial reconstruction.
As indicated above, the various elements 50 through 55 of the remote data protection facility 5 operate under control of the control module 55. The control module 55 controls the remote data protection facility in response to commands received from the various digital data processing systems 10(n), which may enable it to, for example, initiate logging for a respective digital data processing system 10(n), and initiate a full or partial data reconstruction operation, as will be described below.
B. Input Module 50
FIG. 4 depicts the structure of the input module 50 useful in the remote data protection facility 5. With reference to FIG. 4, the input module 50 includes a plurality of interfaces 60(1) through 60(N) (generally identified by reference numeral 60(n)) each of which is connected to receive information from a correspondingly-indexed digital data processing system 10(n) over a communication link 12(n). Each interface 60(n) receives signals, either in electrical or optical form representing digital information or control commands that are transmitted to the remote data protection facility 5, converts the signals to digital form and provides the digitized information to a respective block generator 61(1) through 61(N) (generally identified by reference numeral 61(n)).
Each block generator 61(n), in turn, receives the digital information provided by the interface 60(n) and generates therefrom individual items of information, and in addition aggregates the individual items into blocks to be logged. Each item of information, which will be termed herein a "segment update," corresponds to information from either a partial segment or a full segment, with, as indicated above, a full segment corresponding in one embodiment to the information stored on a track of a storage device 22. Since a segment update may comprise a partial segment or a full segment, a segment update may be of variable length, up to a maximum length which corresponds to the maximum amount of information that can be stored on a track of a storage device 22. Each segment update is associated with a segment identifier, which in one embodiment is a selected function of an identifier identifying the mass storage subsystem 14(n) which contains the protected volume on which the segment associated with the update is stored, an identifier for the protected volume itself, and an identifier for the track on which the segment update is stored. The segment identifier will remain constant if the contents of the record are changed, modified or updated, for reasons which will be clear from the following description.
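One possible form of the "selected function" is a simple packing of the component identifiers into a single integer, sketched below in Python. The field widths are illustrative assumptions and are not taken from the description. The cylinder is included because the segment identifier is described earlier as identifying the subsystem, protected volume, cylinder and track.

```python
def segment_id(subsystem: int, volume: int, cylinder: int, track: int) -> int:
    """Pack the component identifiers into one integer.

    Assumed widths: 8 bits subsystem, 16 bits volume, 16 bits cylinder,
    8 bits track. These are hypothetical values chosen for illustration.
    """
    if not (0 <= subsystem < 1 << 8 and 0 <= volume < 1 << 16
            and 0 <= cylinder < 1 << 16 and 0 <= track < 1 << 8):
        raise ValueError("identifier component out of range")
    return (subsystem << 40) | (volume << 24) | (cylinder << 8) | track
```

Because the result depends only on where the segment is stored and not on its contents, successive updates to the same track yield the same identifier, which is the property the filtering described below relies on.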
Each block generator 61(n) aggregates the received segment updates, along with the segment identifiers and other information, into fixed-sized blocks, which have structures which will be described below in more detail in connection with FIG. 5. The blocks may have any convenient length; in one embodiment, in which the filter/buffer module 51 buffers the information received from the digital data processing systems 10(n) in disk storage devices, the block length is selected to be greater than the maximum segment length. Each block accommodates at least one segment update. In one embodiment, each segment update is stored in at most one block, so that segment updates will not be divided across multiple blocks. Each block generator 61(n), after generating the segment updates and aggregating them into respective blocks, passes the blocks to the filter/buffer module 51 for buffering and filtering.
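The aggregation rule, under which a segment update is never divided across blocks, can be sketched as follows. The per-entry overhead of 8 bytes and the function name are assumptions made for illustration. A real block generator would presumably also close a partly filled block after some delay, which is not modelled here.

```python
from typing import List, Tuple

Update = Tuple[int, bytes]  # (segment identifier, update data)


def pack_blocks(updates: List[Update], block_size: int,
                entry_overhead: int = 8) -> List[List[Update]]:
    """Group updates into blocks of at most `block_size` bytes each,
    never splitting one update across two blocks."""
    blocks: List[List[Update]] = []
    current: List[Update] = []
    used = 0
    for seg_id, data in updates:
        need = entry_overhead + len(data)
        if need > block_size:
            # The block length is chosen to exceed the maximum segment
            # length, so this should not occur in practice.
            raise ValueError("update larger than a block")
        if current and used + need > block_size:
            blocks.append(current)      # close the block that is full
            current, used = [], 0
        current.append((seg_id, data))
        used += need
    if current:
        blocks.append(current)
    return blocks
```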
In addition, each block generator 61(n) receives the digital data relating to control commands and generates command information therefrom which it provides to the control module 55. The control information may enable the control module to, for example, enable the filter/buffer module 51, tape log module 52 and reconstruction module 53 to begin operations in connection with a new digital data processing system, and to perform a full or partial reconstruction operation to reconstruct information which it has been logging for a digital data processing system.
C. Filter/Buffer Module 51
FIG. 5 depicts the structure of the filter/buffer module 51 useful in the remote data protection facility 5 (FIG. 1). With reference to FIG. 5, the filter/buffer module 51 maintains a block queue 70, and a segment identifier hash table 71 all under control of a control module 73. The block queue 70 generally includes the blocks generated by the block generators 61(n). The block queue 70 can have enqueued therewith a variable number of blocks. After each block has passed through the queue, the filter/buffer module 51 selectively provides ones of the segment update(s) in the block to the tape log module 52 for recording on the respective tape log sets maintained thereby. In particular, when a segment update reaches the head of the block queue 70, if no more recent segment update has been received by the filter/buffer module 51 and enqueued with the block queue 70, the filter/buffer module 51 will provide the segment update to the tape log module 52 for recording. As will be described below in connection with FIGS. 6, 10 and 11, for each segment update received by the tape log module 52, the tape log module 52 in turn will store the segment update on the tape log set which is used to store segment updates for the particular protected volume with which the segment update is associated.
On the other hand, for each segment update for a segment, for which a more recent segment update has been received by the filter/buffer module 51 and enqueued with the block queue 70, the filter/buffer module 51 will discard the earlier-received segment update, and not provide it to the tape log module for recording. As will be described below in greater detail, the filter/buffer module 51 will provide segment updates which reach the head of the block queue 70 to the tape log module 52 for recording, and so, if the later-received segment update reaches the head of the block queue 70 before the filter/buffer module 51 receives a yet later segment update for the segment, the filter/buffer module 51 will provide that later-received segment update to the tape log module 52 for recording. On the other hand, if a yet later segment update is received, when the "later-received segment update" reaches the head of the block queue 70, that "later received segment update" will also be discarded.
The segment identifier hash table 71 is used to identify the particular block of the block queue 70 which contains the most recently received segment update for each segment for which a segment update is contained in a block of the block queue 70. Thus, when the filter/buffer module 51 is to determine whether a segment update contained in a block of the block queue 70 is the most recently received update, prior to providing the segment update to the tape log module 52 for storage, it (that is, the filter/buffer module 51) will determine whether the segment identifier hash table identifies the block as containing the most recently-received segment update. On the other hand, if the segment identifier hash table 71 indicates, for a segment update in a block, that an updated copy of the segment update is stored in another block in the filter/buffer module 51, that particular segment update in the block will not be passed to the tape log module 52 for storage. This will occur for each segment update in each of the blocks in the block queue 70, and so the filter/buffer module 51 will ensure that each block preferably remains in the queue for a period of time that is sufficiently long that it is likely that, if the digital data processing system 10(n) updates the information contained in the segment again within some time after an update is stored in the queue, it will be superseded or filtered out before the update is stored by the tape log module 52. In one embodiment, the time that a block remains in the block queue 70 is controlled to some extent by providing that the block queue 70 will have at least a minimum number of blocks prior to providing segment updates to the tape log module 52.
This filtering will serve to reduce the number of segment updates associated with a particular segment (that is, which are associated with a particular segment identifier) which are stored by the tape log module 52 if the digital data processing system 10(n) modifies the record several times within a relatively short period of time.
More specifically, the block queue 70 contains the various blocks that are generated by the block generators 61(n) and provided to the filter/buffer module 51. Block queue 70 comprises a block queue header 74 and a series of one or more block queue elements 75(1) through 75(B) (generally identified by reference numeral 75(b)), with block queue element 75(1) comprising the "head" of the block queue 70 and block queue element 75(B) comprising the "tail" of the block queue 70. The block queue header 74 includes two fields, including a head pointer field 80 and a tail pointer field 81, with the head pointer field 80 pointing to the head block queue element 75(1), and the tail pointer field 81 pointing to the tail block queue element 75(B). Each successive block queue element 75(1) through 75(B-1) points to the next block queue element in the series of block queue elements comprising the block queue 70, thereby to define and establish the series of block queue elements defining the block queue 70. The tail block queue element 75(B) may contain a null or other value which indicates that it is the last block queue element in the block queue 70.
Each block queue element 75(b), in turn, comprises a queue element header 76 and a block 77. The queue element headers 76 of the respective block queue elements 75(b) essentially serve to define the order of the block queue elements 75(b) in the block queue 70 and identify the respective blocks that are associated with the queue 70. The block 77 associated with each queue element header 76 generally corresponds to one of the blocks that is generated by the block generator 61(n) of the input module 50 (FIG. 4). Each queue element header 76 includes several fields, including at least a next block pointer field 82 and a block pointer field 84. The next block pointer field 82 in header 76 of a block queue element 75(b) contains a next block pointer to the next block queue element 75(b+1) in the block queue 70, and thus the next block pointers effectively serve to define the order of the block queue 70 as described above.
The block pointer field 84 includes a block pointer that points to the block 77 that is associated with the block queue element 75(b). In the embodiment in which the filter/buffer module 51 buffers the information received for the protected volumes in a disk storage device, the block pointer will preferably comprise the address of the storage location in the disk drive unit in which the block 77 is stored; it will be appreciated, however, that if other storage media, such as conventional random access memories, are used to store the information, the block pointer in field 84 will generally contain an address that identifies the location in the storage media in which the block 77 is stored.
Block 77, which, as indicated above, is pointed to by the block pointer 84, includes one or more entries 85(1) through 85(R) (generally identified by reference numeral 85(r)), with each entry 85(r) being associated with one segment update loaded into the block by the block generator 61(n). Each entry 85(r), in turn, includes a number of fields, including a segment identifier field 90, a segment length field 91, and a segment update information storage field 93. The actual segment update is stored in the segment update information storage field 93. The segment identifier field 90 receives the segment identifier for the segment update. As indicated above, the segment update can be of variable length, and the segment length field 91 stores a segment length value that identifies the length of the segment update. As described above, the segment update can be either a full segment or a partial segment, and it will be appreciated that, in addition to helping identify the beginning of the next entry 85(r) in the block 77, the segment length value in the segment length field 91 can also be used to indicate whether the segment update stored in the entry 85(r) comprises a partial segment or a full segment.
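The two uses of the segment length value, locating the next entry and distinguishing partial from full segments, can be illustrated with a small Python sketch. The binary layout (an 8-byte identifier followed by a 4-byte length, little-endian) is an assumption made for illustration, as are the function names. Padding of a fixed-size block is omitted.

```python
import struct
from typing import Iterator, List, Tuple

# Assumed widths: 8-byte segment identifier (field 90) and 4-byte segment
# length (field 91), followed by the update data itself (field 93).
ENTRY_HEADER = struct.Struct("<QI")


def pack_entries(entries: List[Tuple[int, bytes]]) -> bytes:
    """Serialize (segment_id, data) pairs back to back."""
    out = bytearray()
    for seg_id, data in entries:
        out += ENTRY_HEADER.pack(seg_id, len(data))
        out += data
    return bytes(out)


def iter_entries(block: bytes,
                 full_segment_len: int) -> Iterator[Tuple[int, bytes, bool]]:
    """Yield (segment_id, data, is_full_segment) for each entry."""
    offset = 0
    while offset + ENTRY_HEADER.size <= len(block):
        seg_id, length = ENTRY_HEADER.unpack_from(block, offset)
        start = offset + ENTRY_HEADER.size
        data = block[start:start + length]
        # The length marks an update as a full segment when it equals the
        # full track length, and otherwise as a partial segment.
        yield seg_id, data, length == full_segment_len
        offset = start + length  # the length also locates the next entry
```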
As indicated above, the filter/buffer module 51 also provides a source of queue elements (not shown). The queue element source may be in the form of, for example, a queue which buffers unused queue elements until they are required for use in the block queue 70. Thus, when a new block queue element is required for a new block received by the filter/buffer module 51 from a block generator 61(n) of the input module 50, the queue element will be provided by the queue element source for use in establishing a block queue element 75(b). In addition, when segment updates from a block 77 associated with a block queue element 75(b) have been either transferred to the tape log module 52 for storage or discarded, the queue element 75(b) is returned to the queue element source.
As noted above, the filter/buffer module 51 also includes the segment identifier hash table 71 which identifies the block queue elements 75(b), and thus the blocks 77, which contain the most recently received segment updates. The segment identifier hash table 71 includes a selected number of pointer entries 110(1) through 110(H) (generally identified by reference numeral 110(h)) which point to respective linked lists 111(h), with each index "h" representing a hash value which may be generated by applying a selected hash function to the segment identifiers for the various segment updates that may be received by the remote data protection facility 5 from the digital data processing systems 10(n). Each pointer entry 110(h) contains a pointer that points to the correspondingly-indexed linked list 111(h).
Each linked list 111(h), in turn, can comprise one or more list entries 111(h)(1) through 111(h)(J) (generally identified by reference numeral 111(h)(j)) which are associated with the various segment updates for the various segments whose segment identifiers hash to the hash value corresponding to the index "h." Each entry in list 111(h) comprises a number of fields, including a segment identifier field 112, a block queue entry pointer field 113 and a next hash entry pointer field 114. When a new block 77 is received from the block generator 61(n) and used in a new block queue element 75(b), for each segment update in the block, a hash function is applied to the segment update's segment identifier to provide a hash value "h," which is used as an index to identify a pointer entry 110(h) in the segment identifier hash table 71. If the pointer entry 110(h) contains a null or other value that indicates that there is no list 111(h) associated with the pointer entry 110(h), which can occur if the block queue 70 does not contain any blocks 77 which, in turn, contain segment updates whose segment identifier hashes to the entry's index value "h," a list 111(h) will be established by creating a new list entry 111(h)(1) for the segment update. In addition, the segment identifier for the segment update will be loaded into the segment identifier field 112, and a block pointer loaded into the block queue entry pointer field 113 to point to the block 77 of the block queue 70 that contains the segment update. In addition, a null or other value may be provided in the next hash entry pointer field 114 to indicate that the entry is the last entry 111(h)(J) in the list 111(h).
On the other hand, if the pointer entry 110(h) contains a pointer to a list 111(h), the list 111(h) contains one or more entries 111(h)(j) whose segment identifier values hash to the index value "h." One of the entries 111(h)(j) in that list may contain a segment identifier field 112 which contains a segment identifier that corresponds to the segment update's segment identifier. In that case, the entries 111(h)(j) in the list 111(h) can be scanned to determine whether it contains an entry for which the segment identifier field 112 contains a segment identifier value that corresponds to the segment identifier for the new segment update. If so, the block pointer field 113 for that entry 111(h)(j) can be updated to point to the new block, which, in turn, will ensure that the segment identifier hash table 71 will always point to the block 77 which contains the most recently received segment update for a particular segment identifier. On the other hand, if the list 111(h) does not contain an entry 111(h)(j) for which the segment identifier field 112 contains a segment identifier value that corresponds to the segment identifier for the new segment update, a new entry 111(h)(j) can be added to the list 111(h) in a manner similar to that described above, and linked to the list 111(h) by loading a pointer pointing to the new entry 111(h)(j) in the next pointer field 114 of the last entry in the list 111(h).
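The three cases just described (no list yet, an existing entry for the segment identifier, and a list with no matching entry) can be sketched as a chained hash table. The class and method names, the table size and the modulo hash are placeholders chosen for illustration. The description leaves the "selected hash function" unspecified.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ListEntry:                         # list entry 111(h)(j)
    segment_id: int                      # segment identifier field 112
    block: object                        # block pointer field 113
    next: Optional["ListEntry"] = None   # next hash entry pointer field 114


class SegmentHashTable:                  # segment identifier hash table 71
    def __init__(self, size: int = 64) -> None:
        # Pointer entries 110(h); None means no list 111(h) exists yet.
        self.pointers: List[Optional[ListEntry]] = [None] * size

    def _hash(self, segment_id: int) -> int:
        return segment_id % len(self.pointers)   # placeholder hash function

    def record(self, segment_id: int, block: object) -> None:
        """Note that `block` holds the newest update for `segment_id`."""
        h = self._hash(segment_id)
        entry = self.pointers[h]
        if entry is None:                # case 1: establish a new list
            self.pointers[h] = ListEntry(segment_id, block)
            return
        while True:
            if entry.segment_id == segment_id:   # case 2: repoint entry
                entry.block = block
                return
            if entry.next is None:       # case 3: append to end of list
                entry.next = ListEntry(segment_id, block)
                return
            entry = entry.next

    def latest_block(self, segment_id: int) -> Optional[object]:
        entry = self.pointers[self._hash(segment_id)]
        while entry is not None:
            if entry.segment_id == segment_id:
                return entry.block
            entry = entry.next
        return None
```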
As described above, the filter/buffer module 51 also includes a control module 73 to control the block queue 70 and the segment identifier hash table 71 to receive blocks from the input module 50, establish block queue elements 75(b) therefor, and selectively transfer segment updates from the block queue 70 to the tape log module 52 for storage. In addition, the control module 73 will determine whether segment updates stored in the block queue elements 75(b) are partial segments and, if so, will enable a request to be transmitted to the remote data protection facility interface 17 of the appropriate digital data processing system 10(n) (FIG. 1) to initiate the retrieval of the corresponding full segments. Preferably, for each such partial segment in the block queue 70, the control module 73 will issue a request for the corresponding full segment so that it would normally receive the full segment before the block queue element containing the partial segment reaches the head of the block queue. It will be appreciated that, when the full segment is received, the entry 111(h)(j) in the segment identifier hash table will be updated to point to the block queue element 75(b) which contains the full segment, in which case the partial segment will be discarded and not passed to the tape log module 52 for storage. In one embodiment, if the full segment is not so received, the partial segment will not be passed to the tape log module 52 (in that embodiment, only full segments are stored by the tape log module 52), but instead the remote data protection facility 5 marks the segment as being invalid, so that it will not be reconstructed by the reconstruction module 53 during a reconstruction operation.
Generally, the control module 73, when a block 77 is received from a block generator 61(n), forms a block queue element 75(b) and enqueues it (that is, the block queue element) to the block queue. In those operations, in response to receipt of a block from a block generator 61(n), the control module 73 will:
(a) retrieve a queue element from the queue element source, generate a block queue element and link the generated block queue element as the tail block queue element 75(B) for the block queue 70, and
(b) update the segment identifier hash table 71 to enable the respective entries of the lists 111(h) whose segment identifiers identify the segment updates in the new tail block queue element 75(B) to point to the new tail block queue element 75(B).
In generating a block queue element and linking it as the tail block queue element 75(B) (item (a) above), the control module 73 will update both (i) the tail pointer 81 of the block queue's block queue header 74 and (ii) the next block pointer 82 of the block queue element which was previously at the tail of the block queue 70, to point to the new tail block queue element 75(B). The control module 73 will also condition the queue element header 76 of the new tail block queue element 75(B), in particular, (i) providing an appropriate value as the next block pointer 82 (which, as noted above, may illustratively comprise a null value); and (ii) providing a pointer for the block pointer field 84 which points to the new block 77. The control module 73 will perform these operations for each of the blocks received from the block generator 61(n) of the input module 50 (FIG. 4).
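The pointer updates involved in linking a new tail element can be sketched as a singly linked queue. This is a minimal illustration: elements are allocated directly rather than drawn from the queue element source, the handling of an initially empty queue and the `length` counter are additions not spelled out in the description, and the class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueElement:                          # block queue element 75(b)
    block: object                            # block pointer field 84
    next: Optional["QueueElement"] = None    # next block pointer field 82


class BlockQueue:                            # block queue 70
    def __init__(self) -> None:
        self.head: Optional[QueueElement] = None   # head pointer field 80
        self.tail: Optional[QueueElement] = None   # tail pointer field 81
        self.length = 0                      # convenience counter (assumed)

    def enqueue(self, block: object) -> QueueElement:
        element = QueueElement(block)        # next stays None: marks the tail
        if self.tail is None:
            self.head = element              # queue was empty
        else:
            self.tail.next = element         # old tail points to new tail
        self.tail = element                  # header tail pointer updated
        self.length += 1
        return element

    def dequeue(self) -> Optional[QueueElement]:
        element = self.head
        if element is None:
            return None
        self.head = element.next
        if self.head is None:
            self.tail = None
        element.next = None
        self.length -= 1
        return element
```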
As indicated above, the filter/buffer module 51 preferably maintains at least a minimum number of block queue elements 75(b) in the block queue 70 to ensure that the block queue elements 75(b), and thus the segment updates stored therein, will remain in the filter/buffer module 51 for at least some time before they are transferred to the tape log module 52 for storage. Thus, while the block queue 70 contains at least the required minimum number of block queue elements 75(b), the control module 73 will selectively provide the segment updates from entries 85(r) of the head block queue element 75(1) to the tape log module 52 for storage on the appropriate tape log set. In that operation, the control module 73, for each entry 85(r) of the block 77 contained in the head block queue element 75(1), will determine whether the segment update contained in the entry is the most recently received segment update for the segment.
In making that determination, the control module 73 will, in turn, use the selected hash function as described above to generate the hash value "h" for the segment identifier in field 90 of the entry 85(r) and determine whether the list 111(h) associated with that hash value "h" in the segment identifier hash table 71 contains an entry whose block pointer 113 points to the head block queue element's block 77. If the control module 73 determines that the list 111(h) associated with that hash value "h" in the segment identifier hash table 71 contains an entry whose block pointer 113 points to the head block queue element's block 77, it can determine that the segment update contained in the entry 85(r) is the most recently-received segment update for the segment identified in field 90, and provide that entry 85(r) to the tape log module 52 for storage. On the other hand, if the control module 73 determines that the list 111(h) associated with that hash value "h" in the segment identifier hash table 71 contains an entry whose block pointer 113 points to the block 77 of a different queue element 75(b) (b ≠ 1), it can determine that the segment update contained in the entry 85(r) is not the most recently-received segment update for the segment identified in field 90, and discard that entry 85(r).
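The decision made at the head of the queue, together with the minimum queue depth, can be illustrated end to end with the following Python sketch. For brevity a `deque` and a `dict` stand in for the linked block queue 70 and the segment identifier hash table 71. The sketch assumes that a block holds at most one update per segment identifier, and removing the table entry once an update has been logged is likewise an assumption not stated in the passage above. All names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Block = List[Tuple[int, bytes]]   # entries: (segment identifier, data)


class FilterBuffer:
    def __init__(self, min_blocks: int) -> None:
        self.min_blocks = min_blocks
        self.queue: Deque[Block] = deque()
        # Maps a segment identifier to the block holding its newest update.
        self.latest: Dict[int, Block] = {}

    def receive(self, block: Block) -> None:
        """Enqueue a block at the tail and repoint the table at it."""
        self.queue.append(block)
        for seg_id, _ in block:
            self.latest[seg_id] = block

    def drain(self) -> List[Tuple[int, bytes]]:
        """Return the updates that would be handed to the tape log."""
        logged: List[Tuple[int, bytes]] = []
        # Release head blocks only while the minimum depth is met.
        while len(self.queue) >= self.min_blocks:
            head = self.queue.popleft()
            for seg_id, data in head:
                if self.latest.get(seg_id) is head:
                    logged.append((seg_id, data))   # newest update: log it
                    del self.latest[seg_id]         # assumed cleanup step
                # Otherwise a later block supersedes this entry: discard.
        return logged
```

With a minimum depth of two, an update to segment 1 that is followed by a second update to segment 1 in the next block is dropped when its block reaches the head, and only the second update is eventually logged.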
With this background, the detailed operations enabled by the control module 73 will be described in connection with the flow charts depicted in FIGS. 8 and 9, with FIG. 8 depicting operations performed by the control module 73 when a block is received from the input module 50, and FIG. 9 depicting operations performed by the control module 73 in connection with transferring of entries 85(r) from the head block queue element 75(1) to the tape log module 52. With reference initially to FIG. 8, the control module 73 will
(i) receive a block 77 from the input module 50 (step 200);
(ii) retrieve a queue element from the queue element source (step 201);
(iii) enqueue the queue element, which was retrieved in step 201, to the block queue 70 (step 202), in the process updating the next block pointer field 82 of the current tail block queue element 75(B) and the tail pointer 81 of the header 74 of the block queue; and
(iv) load a pointer to the block 77 received from the input module 50 into the block pointer field 84 (step 203) thereby to link the block 77 to the tail block queue element 75(B).
Thereafter, the control module 73 will update the segment identifier hash table for each of the entries 85(r) in the block 77 of the new queue element, and in those operations will:
(v) select the first entry 85(r) in the block 77 (step 204),
(vi) use the selected hashing function in connection with the segment identifier in field 90 of the selected entry to generate a hash value "h" (step 205); and
(vii) scan the list 111(h) of the segment identifier hash table 71 pointed to by pointer entry 110(h), to determine whether an entry exists whose segment identifier field 112 contains the same segment identifier as the segment identifier field 90 of the selected entry 85(r) (step 206).
(viii) If the control module 73 makes a positive determination in step 206, it will update the block pointer field 113 of the entry to point to the block 77 of the new block queue element (step 207), but
(ix) if the control module makes a negative determination in step 206, it will generate a new entry for the list 111(h) and insert the segment update's segment identifier in segment identifier field 112 and a pointer to the block 77 of the new block queue element in block pointer field 113 of the new entry (step 208).
(x) Thereafter, the control module 73 will determine whether the block 77 of the new block queue element contains any additional entries 85(r) (step 209), and
(xi) in response to a positive determination in step 209, select the next entry 85(r) (step 210) and return to step 205 to process that entry.
FIG. 9 depicts operations performed by the control module 73 in connection with transferring of entries 85(r) from the head block queue element 75(1) to the tape log module 52. In connection with those operations, the control module |