FILE OR DATABASE MAINTENANCE
Outboard file cache system
5809527
Abstract
A system and method are described for caching files of data in a cache which is beyond the input/output boundary of a host. A host references a file with file access commands containing a logical file-identifier and a logical offset into the file. An outboard file cache coupled to the input/output section of the host receives the file access commands. The outboard file cache is transparent to users who program the host. Generation of input/output channel programs and mapping of the referenced data to a physical address in secondary storage are eliminated when the referenced data is present in the cache. A file descriptor table in the outboard file cache identifies the logical portions of the logical files which are present in the cache. If the data referenced by the logical file-identifier and logical offset in a file access command is present in the outboard file cache, the data is transferred from the outboard file cache to the host memory. Otherwise, a miss status is returned to the host, and the host stages data from secondary storage to the outboard file cache. The outboard file cache further includes a lock table for storing file locks which inhibit access to selected files. Excessive writes to a file are detected in the outboard cache. The outboard cache has a first division for storage of selected portions of normal files and a second division for storage of selected portions of resident files. The storage used in the second division is monitored, and storage is automatically converted from the second division to the first division when the storage used in the second division of cache memory falls below a periodic minimum.
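The hit and miss flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class `OutboardFileCache`, the function `host_read`, the string status values, and the use of a dictionary as both file descriptor table and cache memory are all assumptions made for illustration. Secondary storage is modeled as a plain dictionary keyed by file-identifier and offset.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical key: (logical file-identifier, logical segment offset).
SegmentKey = Tuple[str, int]


@dataclass
class OutboardFileCache:
    """Toy stand-in for the outboard file cache (all names are assumptions)."""

    # Plays the role of the file descriptor table: it records which logical
    # segments are present and, for simplicity, also holds their data.
    descriptor_table: Dict[SegmentKey, bytes] = field(default_factory=dict)

    def read(self, file_id: str, offset: int) -> Tuple[str, Optional[bytes]]:
        """Return ("hit", data) if the segment is cached, else ("miss", None)."""
        key = (file_id, offset)
        if key in self.descriptor_table:
            return "hit", self.descriptor_table[key]
        return "miss", None

    def stage(self, file_id: str, offset: int, data: bytes) -> None:
        """Store a segment that the host staged from secondary storage."""
        self.descriptor_table[(file_id, offset)] = data


def host_read(
    cache: OutboardFileCache,
    secondary_storage: Dict[SegmentKey, bytes],
    file_id: str,
    offset: int,
) -> Tuple[bytes, str]:
    """Read one segment; return the data and the status of the first attempt."""
    first_status, data = cache.read(file_id, offset)
    if first_status == "miss":
        # On a miss the host, not the cache, fetches the segment from
        # secondary storage, stages it, and then reissues the command.
        cache.stage(file_id, offset, secondary_storage[(file_id, offset)])
        _, data = cache.read(file_id, offset)
    assert data is not None
    return data, first_status


if __name__ == "__main__":
    disk = {("payroll", 0): b"segment-0"}
    cache = OutboardFileCache()
    print(host_read(cache, disk, "payroll", 0))  # first access misses
    print(host_read(cache, disk, "payroll", 0))  # second access hits
```

The sketch addresses data only by logical file-identifier and offset; the physical address is needed only on the miss path, which mirrors the abstract's point that address mapping is avoided when the referenced data is already cached.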
Claims
That which is claimed is:
1. A data processing system comprising:
a host processor for issuing file access commands, wherein each file access command defines an operation to be performed on a selectable one of one or more files and includes a file-identifier referencing one file of said one or more files and a logical offset referencing a selected portion of said one file, said host processor including an input-output logic section which provides an interface for input of data to said host processor and output of data from said host processor;
an outboard file cache coupled to said input-output logic section of said host processor and responsive to said file access commands, wherein said outboard file cache provides cache storage for said one or more files and comprises
a cache memory, wherein said cache memory provides random access storage for selectable portions of said one or more files;
a file descriptor table, wherein said file descriptor table provides storage of file-identifiers and offsets which are indicative of portions of said one or more files which are present in said cache memory;
cache detection control interfaced with said file descriptor table and responsive to said file access commands, wherein said cache detection control detects whether said selected portion is present in said cache memory and provides a hit code if said selected portion is present in said cache memory; and
cache access control responsive to said hit code and interfaced with said cache memory, wherein said cache access control provides access to said selected portion of said one file if said hit code is provided.
2. The data processing system of claim 1, further comprising:
a secondary storage device responsively coupled to said input-output logic section of said host processor for storing said one or more files;
said cache detection control further provides a miss code if said selectable portion is not present in said cache memory, whereby a miss condition is indicated; and
staging means responsive to said miss code for reading said selectable portion from said secondary storage device and writing said selectable portion in said cache memory.
3. The data processing system of claim 2, wherein,
said selected portion references one or more segments; and
said cache detection control provides a separate miss code for each said one or more segments which is not present in said cache memory.
4. The data processing system of claim 3, wherein,
said staging means further comprises
address indicator means for providing a device identifier and a device address to said outboard file cache, wherein said device identifier identifies said secondary storage device and said device address indicates the address in said secondary storage device at which each said one or more segments are stored;
said outboard file cache further comprises
address storage means responsive to said staging means for storing said device identifier and device address in said file descriptor table;
destage initiation means for detecting when to destage one or more segments from said cache memory and providing a destage request, wherein said destage request specifies said one or more segments to destage;
destage means responsive to said destage request and interfaced with said file descriptor table for reading said one or more segments to destage from said cache memory and writing said one or more segments to destage to said secondary storage device, whereby said file descriptor table provides said device identifier and said device address where said one or more segments are to be written.
5. The data processing system of claim 3, wherein,
said secondary storage device includes a first secondary storage device responsively coupled to said input-output logic section of said host processor for storing said one or more files; and
a second secondary storage device responsively coupled to said input-output logic section of said host processor for storing a copy of selected ones of said one or more files;
said staging means further comprising
address indicator means for providing a first device identifier, first device address, second device identifier, and second device address to said outboard file cache, wherein said first device identifier is indicative of said first secondary storage device, said first device address indicates the address in said first secondary storage device at which each said one or more segments is stored, said second device identifier is indicative of said second secondary storage device, and said second device address indicates the address in said second secondary storage device at which each said one or more segments is stored;
said outboard file cache further comprises
address storage means responsive to said staging means for storing said first device identifier, said first device address, said second device identifier, and said second device address in said file descriptor table;
destage initiation means for detecting when to destage one or more segments from said cache memory and providing a destage request, wherein said destage request specifies one or more segments to destage;
destage means responsive to said destage request and interfaced with said file descriptor table for reading said one or more segments to destage from said cache memory and writing said one or more segments to destage to said first secondary storage device and to said second secondary storage device, whereby said file descriptor table provides said first device identifier, said first device address, said second device identifier, and said second device address.
6. The data processing system of claim 1, further comprising:
a secondary storage device responsively coupled to said input-output logic section of said host processor for storing said one or more files;
said host processor further including first initiation queue means for queuing said file access commands;
first initiation queue processing means interfaced with said first initiation queue means for monitoring said first initiation queue means, dequeuing a file access command, and sending said file access command to said outboard file cache.
7. The data processing system of claim 6, further comprising:
second initiation queue means for queuing said file access commands;
said host processor further including command enqueuing means interfaced with said first initiation queue means and said second initiation queue means for selecting either said first initiation queue means or said second initiation queue means and enqueuing a file access command in either said first initiation queue means or said second initiation queue means;
second initiation queue processing means interfaced with said second initiation queue means for monitoring said second initiation queue means, dequeuing a file access command, and sending said file access command to said outboard file cache.
8. The data processing system of claim 1,
wherein each file access command further includes a file type which designates whether said one file is a resident file;
wherein said file descriptor table further includes file type designators which are indicative of portions of cache memory in which resident files are stored, whereby said cache memory allocated to said resident files is not eligible for cache replacement;
wherein said cache detection means provides a miss code if said selectable portion is not present in said cache memory;
wherein said outboard file cache further comprises
cache replacement control interfaced with said file descriptor table and responsive to said miss code and said file type from a file access command, wherein said cache replacement control selects a portion of said cache memory which is not allocated to a resident file for storing said selectable portion if said miss code is detected and said file type indicates said one file is not a resident file; and
resident file storage control interfaced with said file descriptor table and responsive to said miss code and said file type from a file access command, wherein said resident file storage control allocates a portion of said cache memory which is not presently allocated to a resident file for storing said selectable portion if said miss code is detected and said file type indicates said one file is a resident file.
9. The data processing system of claim 8,
wherein said cache memory comprises:
a first division of storage for storing selected portions of files which are eligible for cache replacement; and
a second division of storage for storing selected portions of resident files which are not eligible for cache replacement;
wherein said outboard file cache further comprises
apportioning control interfaced with said cache memory and responsive to said resident file storage control, wherein said apportioning control automatically converts a first predetermined amount of storage from said first division to said second division when all of said second division of storage is currently assigned to one or more resident files.
10. The data processing system of claim 9, further comprising:
a storage monitor interfaced with said second division of storage, wherein said storage monitor observes an amount of storage in said second division which is assigned to one or more resident files;
a minimum usage indicator responsive to said storage monitor, wherein said minimum usage indicator stores a periodic minimum of said amount of storage in said second division which is assigned to one or more resident files;
wherein said apportioning control is further interfaced with said minimum usage indicator and automatically converts a second predetermined amount of storage from said second division to said first division when said second division of cache memory falls below said minimum usage indicator.
11. A data processing system comprising:
a first host processor for issuing file access commands, wherein each file access command defines an operation to be performed on a selectable one of one or more files and includes a file-identifier referencing one file of said one or more files and a logical offset referencing a selected portion of said one file, wherein one of said file access commands is a lock command, said first host processor including an input-output logic section which provides an interface for input of data to said first host processor and output of data from said first host processor;
a second host processor for issuing file access commands wherein said one or more files are accessible by each of said first and said second host processor, said second host processor including an input-output logic section which provides an interface for input of data to said second host processor and output of data from said second host processor;
an outboard file cache coupled to said input-output logic section of said first host processor and coupled to said input-output logic section of said second host processor and responsive to said file access commands from said first host processor and said second host processor, wherein said outboard file cache provides cache storage for said one or more files;
said outboard file cache comprising,
a cache memory, wherein said cache memory provides random access storage for selectable portions of said one or more files;
a file descriptor table, wherein said file descriptor table provides storage for file-identifiers and offsets which are indicative of portions of said one or more files which are present in said cache memory;
cache detection control interfaced with said file descriptor table and responsive to said file access commands, wherein said cache detection control detects whether said selected portion is present in said cache memory and provides a hit code if said selectable portion is present in said cache memory; and
cache access control responsive to said hit code and interfaced with said cache memory, wherein said cache access control provides access to said one or more requested segments if said hit code is provided.
12. The data processing system of claim 11 wherein:
said outboard file cache further comprises
activity queue means for queuing file access commands received by said outboard file cache;
said cache detection control comprises
first cache detection control interfaced with said file descriptor table and said activity queue means, wherein said first cache detection control reads a file access command from said activity queue means, detects whether said selected portion referenced by said file access command is present in said cache memory, and provides a hit code if said selected portion is present in said cache memory; and
second cache detection control interfaced with said file descriptor table and said activity queue means, wherein said second cache detection control reads a file access command from said activity queue means, detects whether said selected portion referenced by said file access command is present in said cache memory, and provides a hit code if said selected portion is present in said cache memory;
said cache access control comprises
first cache access control responsive to said hit code, interfaced with said cache memory, and coupled to said first host processor, wherein said first cache access means provides access for said first host processor to said one or more requested segments if said hit code is detected; and
second cache access control responsive to said hit code, interfaced with said cache memory, and coupled to said second host processor, wherein said second cache access control provides access for said second host processor to said one or more requested segments if said hit code is detected.
13. The data processing system of claim 12 wherein said outboard file cache further comprises:
lock table means for storing file locks;
means responsive to a file lock command and interfaced with said lock table means for locking a file; and
means interfaced with said lock table means and responsive to said miss code for responding to said file access commands and indicating whether said selected portion of said one file is locked if said miss code is detected.
14. A data processing system comprising:
a host processor for issuing file access commands, wherein each file access command defines an operation to be performed on a selectable one of one or more files, said host processor including an input-output logic section which provides an interface for input of data to said host processor and output of data from said host processor, wherein each of said file access commands includes a file-identifier referencing one of said one or more files and an offset referencing a selected portion of said one file;
an outboard file cache coupled to said input-output logic section of said host processor and responsive to said file access commands, wherein said outboard file cache provides cache storage for said one or more files;
said outboard file cache comprising,
a cache memory, wherein said cache memory provides random access storage for said one or more files;
a file descriptor table for storage of file-identifiers and offsets which are indicative of portions of said one or more files which are present in said cache memory, wherein said file descriptor table further includes one or more stage-pending indicators for designating when portions of files are being staged;
cache detection control interfaced with said file descriptor table and responsive to said file access commands, wherein said cache detection control detects whether said selected portion of said one file is present in said cache memory, provides a hit code if said selected portion is present in said cache memory, provides a miss code if said selected portion is not present in said cache memory, and provides a resend code if a stage-pending indicator indicates that said selected portion is being staged to said cache memory;
cache access control responsive to said hit code and interfaced with said cache memory, wherein said cache access control provides access to said selected portion of said one file if said hit code is provided;
a secondary storage device responsively coupled to said input-output logic section of said host processor for storing said one or more files;
staging means responsive to said miss code for reading said selected portion of said one file from said secondary storage device and writing said selected portion in said cache memory; and
resend means responsive to said resend code for resending a file access command if said resend code is provided.
15. The data processing system of claim 14, wherein
said staging means further comprises
address indicator means for providing a device identifier and a device address to said outboard file cache, wherein said device identifier identifies said secondary storage device and said device address indicates the address in said secondary storage device at which said selected portion of said one file is stored;
said outboard file cache further comprises
address storage means responsive to said staging means for storing said device identifier and device address in said file descriptor table;
destage initiation means for detecting when to destage a portion of a file from said cache memory and providing a destage request, wherein said destage request specifies a portion of a file to destage;
destage means responsive to said destage request and interfaced with said file descriptor table for reading said portion of said file to destage from said cache memory and writing said portion of said file to destage to said secondary storage device, whereby said file descriptor table provides said device identifier and said device address where said portion of said file is to be written.
16. In a data processing system including a host processor having one or more instruction processors, primary storage, and an input-output section interfacing with devices external to the host processor, wherein the data processing system further includes one or more secondary storage devices and an outboard file cache, each coupled to the input-output section of the host processor, wherein the data accessible to the host processor is logically grouped into one or more files and the secondary storage devices provide storage for the one or more files, and the files are referenced by providing a file access command to the operating system of the host processor, a method for providing access to a selectable portion of the one or more files of data, comprising the steps of:
issuing a file access command to the outboard file cache, wherein the file access command includes a file-identifier and a file-relative-segment-offset, said file-identifier referencing a selected file, and said file-relative-segment-offset referencing a selected portion of said selected file;
comparing the file access command to file-identifiers and file-relative-segment-offsets stored in the outboard file cache to detect whether said selected portion of said selected file is present in the outboard file cache; and
providing access to said selected portion of said selected file if said selected portion is present in the outboard file cache.
17. The method of claim 16, further comprising the steps of:
staging said selected portion of said selected file referenced by the file access command from a secondary storage device to the outboard file cache if said selected portion is not present in the outboard file cache; and
storing said file-identifier and said file-relative-segment-offset of said selected portion of said selected file in the outboard file cache to indicate said selected portion of said selected file is present in the outboard file cache.
18. The method of claim 17, further comprising the step of reissuing the file access command to the outboard file cache after said storing step to provide access to said selected portion of said selected file.
19. The method of claim 17, further comprising the steps of:
inhibiting access to said selected portion of said selected file until said staging step is complete.
20. The method of claim 19, further comprising the steps of:
designating said portion of said selected file as stage-pending if said selected portion is not present in the outboard file cache;
designating said selected portion of said selected file available after said staging step.
21. The method of claim 17,
wherein said storing step further comprises storing a device identifier and a device address for said selected portion of said selected file, wherein said device identifier indicates the secondary storage device and said device address indicates the physical address on the secondary storage device where said selected portion of said selected file is stored;
the method further comprises the steps of
selecting a portion of a file in the outboard file cache to destage, wherein said portion of said file to destage contains data which is not present on the secondary storage device which provides storage for said portion of said file to destage;
destaging said portion of said file from said selecting step to secondary storage, whereby said device identifier and said device address in the outboard file cache indicate the secondary storage device and the physical address on the secondary storage device at which said portion of said file is to be destaged.
22. The method of claim 21, wherein said destaging step comprises the steps of:
notifying the host processor when said portion of said file from said selecting step should be destaged;
issuing a destage file access command to the outboard file cache, wherein said destage file access command indicates said portion of said file to destage;
transferring said portion of said file to destage from the outboard file cache to the host processor; and
writing said portion of said file in said transferring step to the secondary storage device.
23. The method of claim 16 further comprising the steps of:
issuing a lock file access command to the outboard file cache indicating a selected file to lock;
locking said selected file; and
inhibiting access to said selected file from said locking step for subsequently issued file access commands if said subsequently issued file access commands reference said selected file which is locked and said selected file which is locked is not present in the outboard file cache.
24. The method of claim 16 further comprising the steps of:
issuing a lock file access command to the outboard file cache indicating a selected portion of a selected file to lock;
locking said selected portion of said selected file;
inhibiting access to said selected portion of said selected file from said locking step for subsequently issued file access commands if said subsequently issued file access commands reference said selected portion of said selected file which is locked and said selected portion of said selected file which is locked is not present in the outboard file cache.
25. The method of claim 16,
wherein said file access command further comprises a file type which designates whether said selected file is a normal file or a resident file;
and further comprising the steps of:
if said file type indicates a normal file and said selected portion of said selected file is not present in the outboard file cache, performing steps (a), (b), and (c);
(a) selecting a first portion of cache memory which is unused or presently assigned to a normal file for assignment to said selected portion of said selected file;
(b) destaging said first portion of cache memory if the file data stored therein has been written; and
(c) assigning said first portion of cache memory for storage of said selected portion of said selected file;
if said file type indicates a resident file and said selected portion of said selected file is not present in the outboard file cache, performing steps (d) and (e);
(d) selecting an unused portion of cache memory in the outboard file cache for storing said selected portion of said selected file; and
(e) assigning said unused portion of cache memory for storage of said selected portion of said selected file.
26. The method of claim 25, further comprising the steps of:
designating a first division of cache memory in the outboard file cache for storage of selected portions of files which are eligible for cache replacement; and
designating a second division of cache memory in the outboard file cache for storage of selected portions of resident files, wherein portions of said second division of cache memory are not eligible for reassignment.
27. The method of claim 26, further comprising the step of automatically converting a first predetermined amount of storage from said first division of cache memory to said second division of cache memory when all of said second division of cache memory is assigned to resident files.
28. The method of claim 26, further comprising the steps of:
monitoring an amount of storage in said second division of cache memory which is assigned to resident files;
establishing a periodic minimum of said amount of storage in said second division which is assigned to resident files;
automatically converting a second predetermined amount of storage from said second division to said first division when the storage used in said second division of cache memory falls below said periodic minimum.
29. The method of claim 27, further comprising the steps of:
monitoring said amount of storage in said second division of cache memory which is assigned to resident files;
establishing a periodic minimum of said amount of storage in said second division which is assigned to resident files;
automatically converting a second predetermined amount of storage from said second division to said first division when storage used in said second division of cache memory falls below said periodic minimum.
30. A cache system responsive to data access commands issued by a host processor in a data processing system, wherein each of the data access commands designates an operation to perform on data addressed by the command and a data type indicating whether the data addressed by the command is resident data or replaceable data, wherein resident data presently stored in the cache is not subject to cache replacement and replaceable data presently stored in the cache is subject to cache replacement, the cache system comprising:
a cache memory;
cache detection control responsive to a data access command, wherein said cache detection control detects whether the data addressed by the data access command is present in said cache memory, provides a hit code if the data addressed by the data access command is present in said cache memory, and provides a miss code if the data addressed by the data access command is not present in said cache memory;
cache access control interfaced with said cache memory and responsive to said hit code, wherein said cache access control provides access to the data addressed by the data access command if said hit code is provided;
cache replacement control responsive to said miss code and the data access command, wherein said cache replacement control selects a portion of said cache memory in which replaceable data is stored for storing the data addressed by the data access command if said miss code is detected and the data type in the data access command is replaceable data; and
resident data storage control responsive to said miss code and the data access command, wherein said resident data storage control selects a portion of said cache memory in which neither resident data nor replaceable data is stored for storing the data addressed by the data access command if said miss code is detected and the data type in the data access command is resident data.
31. The cache system of claim 30,
wherein said cache memory comprises
a first division of storage for storing replaceable data;
a second division of storage for storing resident data;
wherein said cache system further comprises
apportioning control interfaced with said cache memory and responsive to said resident data storage control, wherein said apportioning control automatically converts a first predetermined amount of storage from said first division to said second division when all of said second division is filled with resident data.
32. The cache system of claim 31, further comprising:
a second division storage monitor interfaced with said second division of storage, wherein said second division storage monitor reports said amount of storage in said second division in which resident data is stored;
a minimum usage indicator responsive to said second division storage monitor, wherein said minimum usage indicator stores a periodic minimum of said amount of storage in said second division in which resident data is stored;
wherein said apportioning control is further interfaced with said minimum usage indicator and automatically converts a second predetermined amount of storage from said second division to said first division when usage of storage in said second division of said cache memory falls below said minimum usage indicator.
33. In a cache system responsive to data access commands issued by a host processor in a data processing system, wherein each of the data access commands designates an operation to perform on data addressed by the command and a data type indicating whether the data addressed by the command is resident data or replaceable data, wherein resident data presently stored in the cache is not subject to cache replacement and replaceable data presently stored in the cache is subject to cache replacement, a method of operating the cache system comprising the steps of:
detecting whether the data addressed by a data access command is present in the cache;
providing access to the data addressed by the data access command if the data addressed is present in the cache;
selecting a portion of the cache in which replaceable data is stored for storing the data addressed by the data access command if the data addressed is not present in the cache and the data type in the data access command is replaceable data; and
selecting a portion of the cache in which neither resident data nor replaceable data is stored for storing the data addressed by the data access command if the data addressed is not present in the cache and the data type in the data access command is resident data.
34. The method of claim 33, further comprising the steps of:
designating a first division of the cache for storage of resident data; and
designating a second division of the cache for storage of replaceable data.
35. The method of claim 34, further comprising the step of automatically converting a first predetermined amount of storage from said first division to said second division when resident data is stored in all of said second division.
36. The method of claim 35, further comprising the steps of:
monitoring an amount of storage in said second division in which resident data is stored;
establishing a periodic minimum of said amount of storage in said second division in which resident data is stored; and
automatically converting a second predetermined amount of storage from said second division to said first division when said amount of storage in said second division in which resident data is stored falls below said periodic minimum.
37. In a cache system responsive to data access commands issued by a host processor in a data processing system, wherein each of the data access commands designates an operation to perform on data addressed by the command and a replacement level indicating a relative priority for which the data addressed by the command is subject to cache replacement once the data addressed by the command is stored in memory of the cache system, a method of operating the cache system comprising the steps of:
detecting whether the data addressed by a data access command is present in the memory of the cache system;
providing access to the data addressed by the data access command if the data addressed is present in the memory of the cache system;
selecting a portion of the memory of the cache system in which data with the lowest replacement level is stored for storing the data addressed by the data access command if the data addressed is not present in the memory of the cache system;
storing the data addressed by the data access command in said portion of the memory of the cache system from said selecting step;
reading the replacement level from the data access command;
associating the replacement level from the data access command with said portion of the memory of the cache system from said selecting step, whereby the data access command provides said replacement level; and
decreasing replacement levels associated with portions of the memory of the cache system which are not selected for storing the data addressed by the data access command.
Description
TABLE OF CONTENTS
I. BACKGROUND OF THE INVENTION
A. Field of the Invention
B. General Background
II. SUMMARY OF THE INVENTION
III. BRIEF DESCRIPTION OF THE DRAWINGS
IV. DESCRIPTION OF THE PREFERRED EMBODIMENT
A. Host Data Processing System
B. Prior Art Data Storage Hierarchy
C. File Cache System Overview
1. Functional Block Diagram
2. Data Flow
a. Command Packet
b. Program Initiation Queue
c. Status Packet Queue and Program Status Packet
3. File Space Management
4. Major Component Overview
a. Host Software
(1) Input/Output Software
(2) File Cache Handler Software
b. Data Mover (DM) and Host Interface Adapter (HIA)
c. Index Processor (IXP)
d. Storage Interface Controller (SICT)
e. Non-volatile Storage (NVS)
f. Street Interprocessor Network
5. Multi-Host Capability
D. File Cache Handler Software Detailed Description
1. Data Transfer
2. Host Local Buffers and Outboard File Cache Buffer
3. DATA_DESCRIPTOR_WORD and Data Chain
4. Status Processing
5. Destage Process
6. READ Command
a. READ Command Packet
b. FILE_IDENTIFIER
c. READ Status Packet
d. Destage Request Packet
7. ALLOCATE Command
a. ALLOCATE Command Packet
b. ALLOCATE Status Packet
8. CLEAR PENDING Command
a. CLEAR PENDING Command Packet
b. CLEAR PENDING Status Packet
9. DESTAGE Command
a. DESTAGE Command Packet
b. DESTAGE Status Packet
c. Segment Information Packet
10. DESTAGE COMPLETE Command
a. DESTAGE COMPLETE Command Packet
b. DESTAGE COMPLETE Status Packet
11. DESTAGE AND PURGE DISK Command
a. DESTAGE AND PURGE DISK Command Packet
b. DESTAGE AND PURGE DISK Status Packet
12. DESTAGE AND PURGE FILE Command
a. DESTAGE AND PURGE FILE Command Packet
b. DESTAGE AND PURGE FILE Status Packet
13. DESTAGE AND PURGE FILES BY ATTRIBUTES Command
a. DESTAGE AND PURGE FILES BY ATTRIBUTES Command Packet
b. DESTAGE AND PURGE FILES BY ATTRIBUTES Status Packet
14. LOCK CACHE FILE Command
a. LOCK CACHE FILE Command Packet
b. LOCK CACHE FILE Status Packet
15. LOCK CACHE FILES BY ATTRIBUTES Command
a. LOCK CACHE FILES BY ATTRIBUTES Command Packet
b. LOCK CACHE FILES BY ATTRIBUTES Status Packet
16. MODIFY File Descriptor Command
a. MODIFY File Descriptor Command Packet
b. MODIFY File Descriptor Status Packet
17. PURGE DISK Command
a. PURGE DISK Command Packet
b. PURGE DISK Status Packet
18. PURGE FILE Command
a. PURGE FILE Command Packet
b. PURGE FILE Status Packet
19. PURGE FILES BY ATTRIBUTES Command
a. PURGE FILES BY ATTRIBUTES Command Packet
b. PURGE FILES BY ATTRIBUTES Status Packet
20. RETURN SEGMENT STATE Command
a. RETURN SEGMENT STATE Command Packet
b. RETURN SEGMENT STATE Status Packet
c. Segment State Packet
21. STAGE BLOCKS Command
a. STAGE BLOCKS Command Packet
b. STAGE BLOCKS Status Packet
22. STAGE SEGMENTS Command
a. STAGE SEGMENTS Command Packet
b. STAGE SEGMENTS Status Packet
23. STAGE WITHOUT DATA Command
a. STAGE WITHOUT DATA Command Packet
b. STAGE WITHOUT DATA Status Packet
24. UNLOCK CACHE FILE Command
a. UNLOCK CACHE FILE Command Packet
b. UNLOCK CACHE FILE Status Packet
25. UNLOCK CACHE FILES BY ATTRIBUTES Command
a. UNLOCK CACHE FILES BY ATTRIBUTES Command Packet
b. UNLOCK CACHE FILES BY ATTRIBUTES Status Packet
26. WRITE Command
a. WRITE Command Packet
b. WRITE Status Packet
27. WRITE OFF BLOCK BOUNDARY Command
a. WRITE OFF BLOCK BOUNDARY Command Packet
b. WRITE OFF BLOCK BOUNDARY Status Packet
E. Index Processor (IXP) Detailed Description
1. Data Structures
2. Index Processor Processing
V. CLAIMS
Appendix
A. Glossary and Acronyms
I. BACKGROUND OF THE INVENTION
A. Field of the Invention
This invention relates to data storage architectures used by data processing systems, and more particularly to a system for outboard caching of file data.
B. General Background
The performance of data processing systems has improved dramatically through the years. While new technology has brought performance improvements to all functional areas of data processing systems, the advances in some areas have outpaced the advances in other areas. For example, advancements in the rate at which computer instructions can be executed have far exceeded improvements in the rate at which data can be retrieved from storage devices and supplied to the instruction processor. Thus, applications that are input/output intensive, such as transaction processing systems, have been constrained in their performance enhancements by data retrieval and storage performance.
The relationship between the throughput rate of a data processing system, input/output (I/O) intensity, and data storage technology is discussed in "Storage hierarchies" by E. I. Cohen, et al., IBM Systems Journal, 28 No. 1 (1989). The concept of the storage hierarchy, as discussed in the article, is used here in the discussion of the prior art. In general terms, the storage hierarchy consists of data storage components within a data processing system, ranging from the cache of the central processing unit at the highest level of the hierarchy, to direct access storage devices at the lowest level of the hierarchy. I/O operations are required for access to data stored at the lowest level of the storage hierarchy.
Varied attempts have been made to relieve the I/O bottleneck which constrains the performance of I/O intensive applications. Three ways in which the I/O bottleneck has been addressed include solid state disks, cache disks, and file caches.
Solid state disks (SSDs) were invented to address the relatively slow electromechanical speeds at which data stored on magnetic disks is read or written. SSDs are implemented using dynamic random access memory (DRAM) technology. The logical organization of the DRAM corresponds to the particular magnetic disk which the SSD is emulating. This allows software applications to access files stored on the SSD in the same manner they would access files stored on a magnetic disk.
The major advantage SSDs have over magnetic disks is that data can be read or written at electronic speeds rather than the electromechanical speeds of magnetic disks. An application's throughput may be significantly improved if the application makes a substantial number of disk requests to an SSD rather than a magnetic disk.
At least three problems persist with SSDs. First, the data path length for making requests to the SSD remains the same as for magnetic disks; second, the overhead involved in addressing the proper location in SSD storage is still allotted to the instruction processor or central processing unit; and third, a fault tolerant SSD configuration requires two write operations for data security. All three problems result in added processing time and reduced system throughput.
The first disadvantage associated with SSDs remains because an SSD resides at the same level of the data storage hierarchy as a magnetic disk. To access a given file at a particular location within the file (offset), the file and offset must be located in the storage hierarchy: the SSD on which the file is stored must be identified; the disk controller which provides access to the SSD must be identified; the input/output channel to which the disk controller is coupled must be identified; and the input/output processor which controls the input/output channel must be identified. All this processing is performed by the instruction processor. While the instruction processor is performing these tasks, others must wait, and the result is a reduction in the overall data processing throughput rate. Furthermore, the application seeking access to the file data must wait for the input/output request to travel to the I/O processor, through the I/O channel, through the disk controller, to the desired disk, and back up the data path to the application.
The second disadvantage for SSDs is that the instruction processor is required to map relative file addresses to physical disk addresses and manage allocation of SSD space. While the instruction processor is mapping file requests and managing disk space it cannot perform other tasks, and the data processing system throughput rate suffers.
The third disadvantage associated with SSDs remains because two SSDs are required if fault tolerant capabilities are required. Fault tolerance with SSDs involves coupling two SSDs to a data processing system through two different data paths. A backup SSD mirrors the data on the primary SSD and is available in the event of failure of the primary SSD. To keep the backup SSD synchronized with the primary SSD, the instruction processor must perform two write operations when updating a file: the first write operation updates the primary SSD, and the second write operation updates the backup SSD. This method adds additional overhead to the data processing system to the detriment of the system throughput rate.
A cache disk subsystem is a second invention which was made to address the I/O bottleneck. U.S. Pat. No. 4,394,733, issued to Robert Swenson, discloses a cache disk subsystem. The cache disk subsystem utilizes DRAM storage for buffering portions of magnetic disks, and resides at the disk controller level of the data storage hierarchy so that portions of a plurality of magnetic disks can be cached.
The chief advantage of the cache disk subsystem is that I/O requests addressing a portion of a disk which is cached can be processed at electronic speeds rather than the electromechanical speed of a disk. While this advantage is substantial, the cache disk subsystem's position in the data storage hierarchy constricts the flow of I/O requests. The I/O performance gained by cache disk subsystems is limited by the data path length and numerous files competing for limited cache storage space. Because the caching of disk storage takes place at the disk controller level of the data storage hierarchy, the operating system must determine the appropriate data path in the same manner as described with the SSD. As described above, a lengthy data path reduces overall system throughput.
Where a large number of files compete for cache disk subsystem cache space, the I/O performance gains may be severely limited due to excess overhead processing. If two or more files have a high I/O request rate and they are stored on the same or different disks under a common disk controller, a substantial amount of the processing performed by the cache disk subsystem may be overhead. The overhead is incurred when most or all of cache storage is in use, and the cache disk subsystem is experiencing a high miss rate. A miss is defined as an I/O request which references a portion of disk which is not currently in cache storage. When a miss occurs, the cache disk subsystem must select a segment of cache storage to allocate to the latest I/O request (the selected segment may currently hold a different portion of a different disk), and read the referenced portion of disk and store it in the cache segment. If this processing is required for a large proportion of I/O requests, the benefit of caching disk storage is lost to overhead processing.
One way in which the aforementioned problem is addressed is by separating files with a high access rate by storing them on separate disks under different storage controllers. This solution is expensive in two respects. First, human resources are required to physically separate the files and ensure that the operating system has the correct configuration information. Continual monitoring is required to detect when the location of files is hampering the I/O rate, and files must then be redistributed as necessary. Second, hardware costs are substantial because additional disks, disk controllers, and cache disk subsystems are required to physically separate the files.
A third strategy for relieving the I/O bottleneck is file caching. File caching differs from cache disk subsystems in that file data is buffered in main DRAM storage of a data processing system, and file management software manages allocation of main storage for file buffers. In "Scale and Performance in a Distributed File System" by John Howard, et al., ACM Transactions on Computer Systems, 6, No. 1, (1988), 51-81; "Caching in the Sprite Network File System", by Michael Nelson, et al., ACM Transactions on Computer Systems, 6, No. 1, (1988), 134-154; and U.S. Pat. No. 5,163,131, entitled, "Parallel I/O Network File Server Architecture", to Edward Row, et al., three different approaches to file caching are discussed.
The file caching described in "Scale and Performance in a Distributed File System" involves files which are distributed across a network of workstations. Each workstation contains server software for providing access to each of the files it manages. File cache software on the workstation seeking access to a selected file locates the server which controls access to the file and requests the file data from the server software. The file cache software stores the file data it receives on the local disk storage of the client workstation. In contrast, the file cache system described in "Caching in the Sprite Network File System" caches file data to the main memory of the client workstation. The disadvantages with each approach are readily apparent.
With the first approach to file caching, the "cached" file data is stored on a disk controlled by the client workstation. This means that the rate at which file data can be accessed is still dependent upon the access rate of the local disk. Furthermore, any updates to the locally cached file must be written to the server's version of the file before other clients are allowed to access the file.
While the second approach provides access to file data at main memory access speed, it is still burdened with the overhead of keeping the server's version of the file consistent with the client's cached version. In addition, file data loss is also possible if main memory on the client workstation fails. In particular, if the cached file is updated and the client workstation crashes before the update is forwarded to the server, the file update may be lost. Therefore, to provide file data integrity for a file update occurring on the client, before the operation is allowed to complete, the file update must be transmitted to the server workstation and stored on its disk.
U.S. Pat. No. 5,163,131 also discusses a file cache architecture applicable to a networked workstation environment. In this patent, the file data is cached in the main memory of the server workstation. For other workstations on the network to access the file data cached on the server, network communication must be initiated for the transfer of file data. Thus, the benefits of file caching are limited by the amount of traffic on the network and the network bandwidth.
The current state of file caching schemes involves the tradeoff between the security of storing file data on disk and an increased access rate by storing the file data in main memory. Alternatively, the file data can be stored in electronic memories which are closer to the disk in the storage hierarchy, but the access rate is constrained by the length of the data path from an application to the electronic memory. Therefore, it would be desirable for a file cache to provide a high I/O rate while still maintaining data security which is comparable to disk storage.
II. SUMMARY OF THE INVENTION
It is an object of the invention to increase the rate at which access to file data is provided when the file data is not present in the main memory of a host.
Another object is to cache file data in storage which is non-volatile relative to a host.
A further object of the invention is to eliminate having to map a logical file access command to the physical storage device and storage device address of the backing store for the file where the data referenced in the logical file access command resides when the referenced data is present in the cache.
Still another object of the invention is to minimize the processing required to destage file data from the cache storage to the storage device where the file data resides.
Yet another object is to cache file data from a plurality of hosts in shared cache storage.
A further object is to cache file data which is shared between a plurality of hosts.
Another object is to selectively allow files to permanently remain in cache storage, whereby files which are permanently in cache storage are not subject to cache replacement.
Still another object is to allow selected portions of files to permanently remain in cache storage, whereby the portions of files which are permanently in cache storage are not subject to cache replacement.
Another object is to dynamically vary the proportion of cache storage allocated to permanently cached files and files which are subject to cache replacement.
A further object is to selectively vary the level of cache replacement to which selected portions of files are subject.
According to the present invention, the foregoing and other objects and advantages are attained by coupling an outboard file cache to the input/output logic section of a host. The host issues file access commands which include a logical file-identifier and a logical offset. The outboard file cache includes a file descriptor table and cache memory for electronic random access storage of the cached files. The file descriptor table stores the logical file-identifiers and offsets of the portions of the files in the cache storage. Cache detection logic is interfaced with the file descriptor table and receives file access commands from the host. The file descriptor table is used to determine whether the portion of the file referenced by the file access command is present in the cache memory. Cache access control is responsive to the cache detection logic, and if the portion of the file referenced in the cache access command is present in cache memory, the desired access is provided. The outboard file cache is non-volatile relative to the main memory of the host because it is a separately powered storage system. Neither the host nor the outboard file cache is required to map the file data referenced in a file access command to the physical storage device and the physical address of the backing store on which the file data is stored if the referenced data is present in cache storage.
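The hit and miss behavior described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name OutboardFileCache, the method names, the status strings, and the SEGMENT_SIZE value are all assumptions introduced here, and the file descriptor table is modeled as a simple dictionary keyed by logical file-identifier and segment index. Note that no physical device address is consulted on a hit.

```python
from typing import Dict, Optional, Tuple

SEGMENT_SIZE = 1024  # hypothetical segment size, chosen only for illustration


class OutboardFileCache:
    """Toy model: the file descriptor table maps logical addresses to cached data."""

    def __init__(self) -> None:
        # (logical file-identifier, segment index) -> cached segment contents
        self.file_descriptor_table: Dict[Tuple[str, int], bytearray] = {}

    def stage_segment(self, file_id: str, segment_index: int, data: bytes) -> None:
        """Host stages a segment into the cache after a miss."""
        self.file_descriptor_table[(file_id, segment_index)] = bytearray(data)

    def read(self, file_id: str, offset: int, length: int) -> Tuple[str, Optional[bytes]]:
        """Look up the logical address; return the data on a hit, a miss status otherwise."""
        key = (file_id, offset // SEGMENT_SIZE)
        segment = self.file_descriptor_table.get(key)
        if segment is None:
            return ("MISS", None)  # host must stage the data from secondary storage
        start = offset % SEGMENT_SIZE
        return ("HIT", bytes(segment[start:start + length]))
```

In this sketch a read is resolved purely from the logical file-identifier and logical offset; only after a miss status would the host fall back to its normal mapping to secondary storage.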
In accordance with yet another aspect of the invention, device identifiers and device addresses are stored in the file descriptor table for cached files. Destage initiation logic determines when it is necessary to destage a portion of a file in cache memory to the backing store on which the file is permanently stored. Destage control is responsive to the destage initiation logic and performs the destaging of file data from the outboard file cache to the backing store when prompted to do so by the destage initiation logic. The destage control is interfaced with the file descriptor table, whereby the device identifier and device address which specify the backing store to which the data is to be destaged are directly attainable. Mapping at destage time is minimized by pre-mapping, that is storing the device identifier and device address in the file descriptor table at the time file data is staged to the outboard file cache.
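A minimal sketch of pre-mapping follows, under stated assumptions: the names FileDescriptor, PreMappedCache, stage, write, and destage are hypothetical, and a simple "written" flag stands in for the destage initiation logic. The point illustrated is only that the device identifier and device address are recorded at stage time, so destaging needs no further mapping.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileDescriptor:
    device_id: int        # backing-store device, recorded at stage time
    device_address: int   # address on that device, recorded at stage time
    data: bytes
    written: bool = False  # assumed flag: cached copy is newer than backing store


class PreMappedCache:
    def __init__(self) -> None:
        self.table: Dict[Tuple[str, int], FileDescriptor] = {}

    def stage(self, file_id: str, segment: int,
              device_id: int, device_address: int, data: bytes) -> None:
        # Pre-mapping: the host supplies the physical location once, at stage time.
        self.table[(file_id, segment)] = FileDescriptor(device_id, device_address, data)

    def write(self, file_id: str, segment: int, data: bytes) -> None:
        descriptor = self.table[(file_id, segment)]
        descriptor.data = data
        descriptor.written = True

    def destage(self) -> List[Tuple[int, int, bytes]]:
        """Return (device_id, device_address, data) for every updated segment."""
        requests = []
        for descriptor in self.table.values():
            if descriptor.written:
                requests.append((descriptor.device_id,
                                 descriptor.device_address,
                                 descriptor.data))
                descriptor.written = False
        return requests
```

Each destage request already carries its destination, which is read directly from the descriptor rather than recomputed from the logical file address.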
In an additional aspect of the invention, a first host and a second host are coupled to the outboard file cache. The cache memory in the outboard file cache is shared between the files of the first host and the files of the second host. The outboard file cache includes dual cache detection logic sections. Each of the cache detection logic sections may process file access commands from either the first host or the second host and each section operates concurrently with the other. The outboard file cache includes a first cache access control section and a second cache access control section. The first cache access control section is dedicated to providing access to the cache storage for the first host and the second cache access control section is dedicated to providing access to the cache storage for the second host.
In carrying out another aspect of the invention, file data in cache memory may be shared between a plurality of hosts. The outboard file cache includes a lock table which indicates the logical file and the portion of the file that is locked. Control is provided which is interfaced with the lock table and responsive to lock commands for locking selected files. The outboard file cache further includes control interfaced with the lock table and responsive to a cache miss condition for rejecting the access specified in a file access command if the referenced portion of the referenced file is locked and is not present in the cache storage.
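The interaction between the lock table and a cache miss can be sketched as below. This is a simplification under assumptions: the lock table is modeled as a set of locked file-identifiers (the text also allows a locked portion of a file), and the function name check_access and the status strings are hypothetical.

```python
from typing import Set, Tuple


def check_access(file_id: str, segment: int,
                 cached: Set[Tuple[str, int]],
                 lock_table: Set[str]) -> str:
    """Decide the outcome of a file access command.

    cached     -- (file_id, segment) pairs currently present in cache storage
    lock_table -- file-identifiers currently locked (simplified to whole files)
    """
    if (file_id, segment) in cached:
        return "ACCESS"      # referenced data is present: access is provided
    if file_id in lock_table:
        return "REJECTED"    # miss on a locked file: the access is rejected
    return "MISS"            # ordinary miss: the host stages the data
```

For example, with cached = {("F", 0)} and lock_table = {"F"}, a reference to segment 0 of "F" is served, while a reference to segment 1 of "F" is rejected rather than answered with a miss.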
Still another aspect of the invention involves selectively allowing files to permanently remain in cache memory, whereby files which are permanently in cache memory are not subject to cache replacement. File access commands further include a file data type indicator. The file data type indicator dictates whether the file data, once it is staged to cache storage, is eligible for cache replacement. File data present in cache storage which is not subject to cache replacement is referred to as resident data, and file data present in cache storage which is subject to cache replacement is referred to as replaceable data. Cache replacement control selects a portion of cache memory in which replaceable file data is presently stored for storage of data referenced in a file access command if the referenced data is not present in the cache and the file data type indicates replaceable data. Resident file storage control selects a portion of cache memory, in which neither replaceable nor resident file data is stored, for storage of data referenced in a file access command if the referenced data is not present in the cache and the file data type indicates resident file data.
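The two selection rules can be sketched as a single function. The representation is an assumption made for illustration: each portion of cache memory is a list entry holding "resident", "replaceable", or None (nothing stored), and the function name select_portion is hypothetical. Taking the first matching portion is also an assumption; the text does not state which matching portion is preferred.

```python
from typing import List, Optional


def select_portion(portions: List[Optional[str]], data_type: str) -> Optional[int]:
    """Pick a portion of cache memory for data that missed in the cache.

    Replaceable data displaces a portion that currently holds replaceable data.
    Resident data takes a portion holding neither resident nor replaceable data.
    Returns the index of the selected portion, or None if no portion qualifies.
    """
    wanted = "replaceable" if data_type == "replaceable" else None
    for index, contents in enumerate(portions):
        if contents == wanted:
            return index
    return None
```

With portions = ["resident", "replaceable", None], replaceable data selects index 1 and resident data selects index 2; resident data already in the cache is never chosen as a victim.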
In accordance with another aspect of the invention, the proportion of cache storage which is allocated to resident file data and the portion of cache storage which is allocated to replaceable file data is dynamically adjusted, thereby ensuring availability of adequate cache storage for each type of file. The cache storage is divided into two divisions, a first division for storing replaceable file data and a second division for storing resident file data. Apportioning control is responsive to the control for managing resident file storage. When all of the storage in the second division has resident file data stored therein, the apportioning control converts a predetermined amount of storage from the first division of storage to the second division of storage whereby additional cache storage is made available for resident file data. In addition, control is provided to monitor the amount of storage in the second division of cache storage in which resident file data is stored. The monitor establishes a periodic minimum usage level for the second division. The periodic minimum usage indicates the minimum amount of storage in which resident file data was stored over the prior predetermined period of time. Further control is provided which converts storage from the second division of cache storage to the first division of cache storage if the current amount of storage in the second division in which resident file data is stored falls below the periodic minimum.
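A minimal sketch of the apportioning behavior follows. The class name ApportioningControl, the method names, the unit of one segment per call, the use of a single chunk size for both conversions, and the explicit end_period call that closes a monitoring period are all assumptions introduced for illustration; the text specifies only the conditions under which storage is converted between the two divisions.

```python
class ApportioningControl:
    """Toy model: first division holds replaceable data, second holds resident data."""

    def __init__(self, first_size: int, second_size: int, chunk: int) -> None:
        self.first_size = first_size     # storage in the first division
        self.second_size = second_size   # storage in the second division
        self.chunk = chunk               # assumed predetermined conversion amount
        self.resident_used = 0           # second-division storage holding resident data
        self.periodic_min = 0            # minimum usage over the prior period
        self._running_min = 0            # minimum usage seen in the current period

    def store_resident(self) -> bool:
        """Store one unit of resident data, growing the second division if it is full."""
        if self.resident_used == self.second_size:
            if self.first_size < self.chunk:
                return False             # nothing left to convert
            self.first_size -= self.chunk
            self.second_size += self.chunk
        self.resident_used += 1
        return True

    def release_resident(self) -> None:
        """Release one unit of resident data; shrink the second division if usage
        has fallen below the periodic minimum."""
        if self.resident_used == 0:
            return
        self.resident_used -= 1
        self._running_min = min(self._running_min, self.resident_used)
        if (self.resident_used < self.periodic_min
                and self.second_size - self.chunk >= self.resident_used):
            self.second_size -= self.chunk
            self.first_size += self.chunk

    def end_period(self) -> None:
        """Close the current monitoring period and start a new one."""
        self.periodic_min = self._running_min
        self._running_min = self.resident_used
```

Starting from ApportioningControl(first_size=8, second_size=2, chunk=2), a third store_resident call finds the second division full and converts storage, giving divisions of 6 and 4; once usage later drops below the periodic minimum, the converted storage returns to the first division.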
Corresponding to another aspect of the invention, the priority to which file data in cache storage is subject to cache replacement may be selectively varied in an incremental fashion, whereby portions of files for which future access is likely are temporarily made unavailable for cache replacement. A replacement level is provided to the cache system in the file data access command. If the data referenced in the command is not present in the cache, a portion of the cache is selected to which the data referenced in the command is staged. The particular portion of cache chosen is the portion of cache in which data with the lowest designated replacement level is stored. As each of the other portions of cache is considered for replacement, its associated replacement level is decremented, whereby each of the other portions of cache storage becomes a higher priority candidate for cache replacement.
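The replacement-level scheme can be sketched as follows. The function name stage_with_level and the list-of-pairs representation are hypothetical, and two details are assumptions not stated in the text: levels are not decremented below zero, and ties are broken in favor of the first portion examined.

```python
from typing import List


def stage_with_level(portions: List[list], key: str, level: int) -> int:
    """Stage data for `key` on a miss, using replacement levels.

    portions -- list of [key, replacement_level] pairs, one per portion of cache
    level    -- replacement level supplied in the data access command
    Returns the index of the portion that was selected for replacement.
    """
    # Select the portion holding data with the lowest replacement level.
    victim = min(range(len(portions)), key=lambda i: portions[i][1])
    # Every portion not selected becomes a higher-priority replacement candidate.
    for i, portion in enumerate(portions):
        if i != victim:
            portion[1] = max(0, portion[1] - 1)  # assumed floor at zero
    # Store the new data and associate the command's replacement level with it.
    portions[victim] = [key, level]
    return victim
```

Starting from [["a", 2], ["b", 0], ["c", 1]], staging "d" with level 3 replaces "b" and leaves [["a", 1], ["d", 3], ["c", 0]]; data staged with a high level therefore survives several subsequent misses before it becomes the lowest-level candidate.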
Still other objects and advantages of the present invention will become readily apparent to those skilled in the art from the following detailed description, wherein only the preferred embodiment of the invention is shown, simply by way of illustration of the best mode contemplated for carrying out the invention. As will be realized, the invention is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the invention. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
III. BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows an exemplary data processing system, or "host", with which the present invention could be used;
FIG. 2 shows the architecture of an Input/Output Complex of the exemplary Host;
FIG. 3 is a block diagram of a plurality of Hosts coupled to a variety of prior art disk subsystem configurations;
FIG. 4 illustrates an Outboard File Cache in a data storage hierarchy;
FIG. 5 shows the overall file processing within the data storage hierarchy shown in FIG. 4;
FIG. 6 is a functional block diagram of the hardware and software components of the preferred embodiment of the outboard file cache system;
FIGS. 7A, 7B, and 7C contain a data flow diagram illustrating the flow of data between each of the major functional components of the file cache system;
FIG. 8 shows the general layout of a Command Packet and the information contained therein;
FIG. 9 illustrates the Program Initiation Queue;
FIG. 10 shows the information contained in and the format of a Program Initiation Packet;
FIGS. 11 and 12 respectively illustrate the Status Packet Queue and the format and information contained in a Program Status Packet;
FIG. 13 illustrates the HIA ACB Buffer;
FIG. 14 illustrates the Activity Queue, and FIG. 15 shows the information contained in each Activity Queue Entry;
FIG. 16 illustrates the file space available in the Outboard File Cache;
FIG. 17 shows the logical organization of a single Segment;
FIG. 18 shows the logical composition of a Block;
FIG. 19 shows the logical division between Cache File Space, Nail Space, and Resident File Space in the File Space of the Outboard File Cache;
FIG. 20 illustrates the File Descriptor Table;
FIG. 21 shows the information contained in a File Descriptor;
FIG. 22 is a flow chart of the general processing the I/O Software performs for file requests from Application Software;
FIG. 23 shows a flow chart of the FILE CACHE INTERFACE processing performed by the File Cache Handler Software;
FIG. 24 shows a flow chart of the general processing for detecting when the processing of a Command Packet (or a chain) is complete;
FIGS. 25A and 25B respectively show the components of a Data Mover (DM) and Host Interface Adapter (HIA);
FIG. 26 is a functional block diagram of the Index Processor (IXP);
FIG. 27 is a flow chart of the main processing loop of the IXP;
FIG. 28 is a block diagram to further illustrate the functional components of the Street interprocessor communication and storage access network within the Outboard File Cache;
FIG. 29 is a block diagram illustrating a data processing configuration including a plurality of Hosts coupled to an Outboard File Cache;
FIGS. 30 and 31 illustrate the relationship between Host Local Buffers, a Cache Buffer, and a Data Chain;
FIGS. 32, 33, and 34 respectively illustrate the implementation of the Data Chain, Data Chain Packet, and Data Descriptor Word;
FIG. 34 shows the format and content of a DATA_DESCRIPTOR_WORD;
FIGS. 35A and 35B illustrate the general status processing which is invoked from the Completion Monitor processing;
FIG. 36 is a flowchart showing the processing which occurs when the "Resend" RECOMMENDED_ACTION is returned from the Outboard File Cache;
FIG. 37 is a flowchart of the Purge Disabled Segments and then Resend processing;
FIG. 38 is a flowchart of the Send CLEAR PENDING Followed by Original Command processing;
FIG. 39 is a flowchart showing the processing which occurs when the "Stage Data" RECOMMENDED_ACTION is returned from the Outboard File Cache;
FIG. 40 shows a flowchart of the processing performed when the Stage Data and Log No Resident File Space Condition RECOMMENDED_ACTION is returned from the Outboard File Cache;
FIG. 41 contains a flowchart of the processing performed for the Down File Cache Interface RECOMMENDED_ACTION;
FIG. 42 contains a flowchart of the processing performed for the Down Outboard File Cache RECOMMENDED_ACTION;
FIGS. 43A and 43B contain a flowchart of the general processing of the Destage Process;
FIG. 44 shows the format of a READ Command Packet;
FIG. 45 shows the content and format of the FILE_IDENTIFIER;
FIG. 46 shows the content and format of the READ Status Packet;
FIG. 47 illustrates the format and content of a Destage Request Packet;
FIG. 48 illustrates the format and content of an ALLOCATE Command Packet;
FIG. 49 illustrates the format and content of an ALLOCATE Status Packet;
FIG. 50 is a flowchart illustrating the processing in which the CLEAR PENDING command may be used;
FIG. 51 illustrates the information and format of the CLEAR PENDING Command Packet;
FIG. 52 illustrates the information and format of the CLEAR PENDING Status Packet;
FIG. 53 shows the format of a DESTAGE Command Packet;
FIG. 54 shows the format of a DESTAGE Status Packet;
FIG. 55 shows the format of a Segment Information Packet;
FIG. 56 shows the format of a DESTAGE COMPLETE Command Packet;
FIG. 57 shows the format of a DESTAGE COMPLETE Status Packet;
FIG. 58 contains a flowchart which describes the process in which the DESTAGE AND PURGE DISK command may be used;
FIG. 59 shows the format of a DESTAGE AND PURGE DISK Command Packet;
FIG. 60 shows the format of a DESTAGE AND PURGE DISK Status Packet;
FIG. 61 contains a flowchart which describes the process in which the DESTAGE AND PURGE FILE command may be used;
FIG. 62 shows the format of a DESTAGE AND PURGE FILE Command Packet;
FIG. 63 shows the format of a DESTAGE AND PURGE FILE Status Packet;
FIG. 64 shows the format of a DESTAGE AND PURGE FILES BY ATTRIBUTES Command Packet;
FIG. 65 shows the format of a DESTAGE AND PURGE FILES BY ATTRIBUTES Status Packet;
FIG. 66 shows the format of a LOCK CACHE FILE Command Packet;
FIG. 67 shows the format of a LOCK CACHE FILE Status Packet;
FIG. 68 shows the format of a LOCK CACHE FILES BY ATTRIBUTES Command Packet;
FIG. 69 shows the format of a LOCK CACHE FILES BY ATTRIBUTES Status Packet;
FIG. 70 shows the format of a MODIFY File Descriptor Command Packet;
FIG. 71 illustrates the format and content of a MODIFY File Descriptor Status Packet;
FIG. 72 contains a flowchart showing the processing in which the PURGE DISK command may be used;
FIG. 73 shows the format of a PURGE DISK Command Packet;
FIG. 74 shows the format of a PURGE DISK Status Packet;
FIG. 75 contains a flowchart which describes the process in which the PURGE FILE command may be used;
FIG. 76 shows the format of a PURGE FILE Command Packet;
FIG. 77 shows the format of a PURGE FILE Status Packet;
FIG. 78 is a flowchart showing the processing in which the PURGE FILES BY ATTRIBUTES command may be used;
FIG. 79 shows the format of a PURGE FILES BY ATTRIBUTES Command Packet;
FIG. 80 shows the format of a PURGE FILES BY ATTRIBUTES Status Packet;
FIG. 81 shows the format of a RETURN SEGMENT STATE Command Packet;
FIG. 82 shows the format of a RETURN SEGMENT STATE Status Packet;
FIG. 83 shows the format of a Segment State Packet;
FIG. 84 illustrates the information and format of the STAGE BLOCKS Command Packet;
FIG. 85 illustrates the information and format of the STAGE BLOCKS Status Packet;
FIG. 86 illustrates the information and format of the STAGE SEGMENTS Command Packet;
FIG. 87 illustrates the information and format of the STAGE SEGMENTS Status Packet;
FIG. 88 illustrates the information and format of the STAGE WITHOUT DATA Command Packet;
FIG. 89 illustrates the information and format of the STAGE WITHOUT DATA Status Packet;
FIG. 90 shows the format of an UNLOCK CACHE FILE Command Packet;
FIG. 91 shows the format of an UNLOCK CACHE FILE Status Packet;
FIG. 92 shows the format of an UNLOCK CACHE FILES BY ATTRIBUTES Command Packet;
FIG. 93 shows the format of an UNLOCK CACHE FILES BY ATTRIBUTES Status Packet;
FIG. 94 shows the format and content of a WRITE Command Packet;
FIG. 95 shows the content and format of a WRITE Status Packet;
FIG. 96 shows the format and content of a WRITE OFF BLOCK BOUNDARY Command Packet;
FIG. 97 shows the content and format of a WRITE OFF BLOCK BOUNDARY Status Packet;
FIG. 98 illustrates logical block diagrams of the Hash Table, the File Descriptor Table, and File Space;
FIG. 99 illustrates the layout of data and control structures of the Outboard File Cache in Non-Volatile Storage (NVS);
FIGS. 100A and 100B contain a flowchart of the COMMAND BRANCH processing;
FIGS. 101A, 101B, 101C, and 101D contain a flowchart of the READ-WRITE routine;
FIGS. 102A, 102B, 102C, and 102D contain a flowchart of the processing performed for STAGE commands;
FIGS. 103A and 103B contain a flowchart describing the processing performed by the Outboard File Cache for a DESTAGE command;
FIGS. 104A, 104B, and 104C contain a flowchart of the processing performed by the Outboard File Cache in processing a DESTAGE COMPLETE command;
FIG. 105 contains a flowchart of the processing done by the Outboard File Cache for a WRITE OFF BLOCK BOUNDARY command;
FIG. 106 contains a flowchart of the processing performed by the Outboard File Cache for a CLEAR PENDING command;
FIG. 107 contains a flowchart of the processing performed by the Outboard File Cache for a RETURN SEGMENT STATE command;
FIG. 108 illustrates lock tables used for coordinating file locks as used in the LOCK CACHE FILE, LOCK CACHE FILES BY ATTRIBUTES, UNLOCK CACHE FILE, and UNLOCK CACHE FILES BY ATTRIBUTES commands;
FIGS. 109A and 109B contain a flowchart of the processing performed for the LOCK CACHE FILE and LOCK CACHE FILES BY ATTRIBUTES commands;
FIG. 110 contains a flowchart of the processing performed for processing LOCK CACHE FILES BY ATTRIBUTES commands;
FIGS. 111A and 111B contain a flowchart of the processing performed for the UNLOCK CACHE FILE and UNLOCK CACHE FILES BY ATTRIBUTES commands;
FIG. 112 contains a flowchart of the processing performed for an UNLOCK CACHE FILES BY ATTRIBUTES command;
FIGS. 113A, 113B, 113C, 113D, 113E, and 113F contain a flowchart of the LOGICAL-SCAN processing performed by the Outboard File Cache in processing the DESTAGE, DESTAGE AND PURGE FILE, MODIFY File Descriptor, CLEAR PENDING, and RETURN SEGMENT STATE commands;
FIGS. 114A, 114B, 114C, 114D, 114E, 114F, and 114G illustrate the flowchart for the PHYSICAL-SCAN processing performed by the Outboard File Cache;
FIG. 115 illustrates the flowchart for SEARCH processing;
FIGS. 116A, 116B, and 116C contain a flowchart of the processing performed when access to a segment is requested and the segment is not present in the Outboard File Cache;
FIGS. 117A and 117B contain a flowchart of the processing performed upon invocation of the MISS-B and SPECULATE-HIT-1 processing;
FIG. 118 contains a flowchart of the MISS-END processing;
FIG. 119 contains a flowchart of the MISS-BA processing;
FIGS. 120A, 120B, 120C, and 120D contain a flowchart of FLAGS processing which tests the flags in the File Descriptor when a segment hit occurs;
FIG. 121 contains a flowchart of CLEAR-STAGE-PENDING processing which clears the STAGE.sub.-- PENDING state for segments which have been placed in a STAGE.sub.-- PENDING as a result of processing a READ, WRITE, or WRITE OFF BLOCK BOUNDARY command;
FIG. 122 contains a flowchart of the processing performed for both FIX-STATE and FIX-STATE-1;
FIG. 123 contains a flowchart of the HASH function;
FIGS. 124A, 124B, 124C, 124D, 124E, and 124F contain a flowchart of REUSE processing which selects a segment in the cache for allocation;
FIG. 125 contains a flowchart of the PRE-USE processing which reserves a segment for an Index Processor;
FIGS. 126A, 126B, 126C, and 126D contain a flowchart of the DESTAGE-CHECK processing which identifies segments for destaging and creates Destage Request Packets;
FIG. 127 contains a flowchart of RELINK processing which links a File Descriptor into a hash list of File Descriptors;
FIG. 128 contains a flowchart of the DELINK processing to remove a File Descriptor from a hash list;
FIG. 129 contains a flowchart of the processing performed in iterating many of the processing loops described herein;
FIGS. 130A and 130B contain a flowchart of the processing for detecting whether a file is surging;
FIG. 131 contains a flowchart illustrating the SPECULATE-DECISION processing which determines whether more segments should be staged than were identified in the Command Packet;
FIGS. 132A and 132B contain a flowchart of the DESTAGE-GROUP processing which gathers a group of segments to be included in a Destage Request Packet;
FIGS. 133A and 133B contain a flowchart of the DESTAGE-BUILD processing which forms a Segment Information Packet to return to a Host for destaging segments;
FIG. 134 contains a flowchart of CACHE-TIGHT processing for gathering segments to destage when a cache-tight condition is detected;
FIG. 135 contains a flowchart for SPECULATIVE-HIT-TEST processing;
FIG. 136 contains a flowchart of the FIX-STATE-FOR-HITS processing;
FIG. 137 contains a flowchart of the processing performed in purging a segment from the Outboard File Cache;
FIG. 138 contains a flowchart of the PURGE-BLOCKS processing to purge selected blocks from a segment in the Outboard File Cache;
FIGS. 139A-E contain a flowchart of the processing for the ALLOCATE command;
FIG. 140 contains a flowchart for the ENDERR, ENDWT, and END processing which completes processing of a Command Packet;
FIG. 141 contains a flowchart of NEW-BIT processing which tests whether the NEW flag in a File Descriptor should be set for the segment in process;
FIGS. 142A and 142B contain a flowchart of GET-NAIL processing which locates an available segment in Nail Space for allocation;
FIG. 143 contains a flowchart of CONVERT-SPACE processing which reapportions Cache File Space and Nail Space;
FIGS. 144A, 144B, and 144C contain a flowchart for LESS-NAIL processing which converts 64 segments at the end of Nail Space in each Storage Module to Cache File Space;
FIG. 145 contains a flowchart for GIVE-SEGMENT processing which returns an allocated nailed segment to the linked list of available segments in Nail Space;
FIG. 146 contains a flowchart illustrating the processing for GET-RESIDENT-FILE;
FIG. 147 contains a flowchart of MORE-RESIDENT-FILE processing which reapportions Cache File Space and Resident File Space;
FIGS. 148A-D contain a flowchart of LESS-RESIDENT-FILE processing which reapportions File Space between Resident File Space and Cache File Space; and
FIG. 149 contains a flowchart for GIVE-RESIDENT-FILE processing which returns an allocated segment in Resident File Space to the linked list of available segments in Resident File Space.
IV. DESCRIPTION OF THE PREFERRED EMBODIMENT
A. Host Data Processing System
FIG. 1 shows an exemplary data processing system, or "host", with which the present invention could be used. The Host 10 architecture is that of the 2200/900 Series data processing system which is commercially available from the Unisys Corporation.
The Instruction Processors (IPs) 12 are the basic instruction execution units of the system. Each IP includes a first level cache (not shown) having a section for instructions and a section for operands. The IPs 12 fetch instructions from memory, execute the instructions, store the results, and, in general, perform data manipulation.
Each of the IPs 12 is directly coupled via Cables 13 to a Storage Controller (SC) 14. The maximum configuration for the 2200/900 data processing system includes four SCs 14, each SC having two directly coupled IPs 12. The SCs 14 provide logic and interconnects which provide access to Main Storage Units (MSUs) 16. The MSUs comprise the main random access memory of the Host 10. Each SC 14 controls access to two directly coupled MSUs 16. Cables 18 couple the MSUs to their respective SCs 14.
The SCs 14 contain interconnect logic that ties all IPs 12 together in a tightly coupled system. SC1 is coupled to SC2 via Cable 20; SC1 is coupled to SC3 via Cable 22; SC1 is coupled to SC4 via Cable 24; SC2 is coupled to SC3 via Cable 26; SC2 is coupled to SC4 via Cable 28; and SC3 is coupled to SC4 via Cable 30. Each IP 12 can address every MSU 16 of Host 10. For example, the SC intercoupling allows IP6 to have access to the addressable memory of MSU8. A memory request originating in IP6 is first sent to SC3; SC3 sends the memory request to SC4; SC4 provides access to the portion of addressable memory; and if requested, SC4 returns data to SC3 which in turn forwards the data to IP6.
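The routing in the IP6-to-MSU8 example can be sketched in a few lines. This is an illustrative model only: it assumes that IPs and MSUs are numbered sequentially with two of each per SC (which agrees with IP6 being coupled to SC3 and MSU8 to SC4 in the example), and the function names `sc_for_unit` and `route_memory_request` are hypothetical.

```python
# Illustrative model of the SC interconnect; not part of the described system.
# Assumption: units are numbered sequentially, two IPs and two MSUs per SC,
# so that IP6 maps to SC3 and MSU8 maps to SC4 as in the example in the text.

def sc_for_unit(unit_number: int) -> int:
    """Return the number of the SC directly coupled to IP n or MSU n."""
    return (unit_number + 1) // 2


def route_memory_request(ip: int, msu: int) -> list:
    """List the components a memory request from IP `ip` to MSU `msu` visits."""
    local_sc = sc_for_unit(ip)
    owning_sc = sc_for_unit(msu)
    path = [f"IP{ip}", f"SC{local_sc}"]
    if owning_sc != local_sc:
        # Every pair of SCs is joined by its own cable, so one SC-to-SC hop
        # is enough to reach the SC that controls the target MSU.
        path.append(f"SC{owning_sc}")
    path.append(f"MSU{msu}")
    return path


print(route_memory_request(6, 8))  # request path for the example in the text
```

Returned data would follow the same path in reverse, from the owning SC back through the local SC to the requesting IP.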
Each of the SCs 14 also provides interfaces for two Input/Output Complexes (IOCs) 32. Cables 34 couple each of the IOCs 32 to their respective SCs 14. Each of the IOCs 32 may contain multiple Input/Output Processors (IOPs, not shown). The IOPs read data from the MSUs 16 for writing to peripheral devices, and read data from peripheral devices for writing to the MSUs 16. Peripheral devices may include printers, tape drives, disk drives, network communication processors, etc.
The 2200 Series data processing architecture allows a Host 10 to be logically partitioned into one or more independent operating environments. Each independent operating environment is referred to as a partition. A partition has its own operating system software which manages the allocation of resources within the partition. Because a partition has its own operating system, it may be also referred to as a Host. Using Host 10 as an example, it could be partitioned into four Hosts: a first host having the resources accompanying SC1, a second host having the resources accompanying SC2, a third host having the resources accompanying SC3, and a fourth host having the resources accompanying SC4.
FIG. 2 shows the architecture of an Input/Output Complex of the exemplary Host. Input/Output Remote Adapter (IRA) 36 is a non-intelligent adapter which transfers data and messages between an SC 14 and an IOP 38 via an Input/Output Bus 40. The IRA 36 occupies one physical drop out of the thirteen available on Input/Output Bus 40 and has the highest priority of any unit connected to Input/Output Bus 40. IRA 36 does not participate in any rotational priority operations and can gain access to the Input/Output Bus 40 through the normal request process even when other units coupled to the Input/Output Bus are operating in a rotational priority mode.
The Input/Output Bus 40 provides the communication path and protocol to transfer data between the attached units. The Input/Output Bus 40 can accommodate twelve Input/Output Processors 38. It will be recognized that bus architectures are well known in the prior art and a further discussion of the Input/Output Bus shown is not necessary for the purposes of the present invention.
The IOPs 38 are microprocessor controlled units that control the initiation, data transfer, and termination sequences associated with software generated I/O channel programs. Initiation and termination sequences are executed by the microprocessor and data transfer is controlled by hard-wired logic. Each IOP 38 is coupled to a Data Bus 42, which in turn has available slots for up to four Block Mux Channel Adapters 44 and a Word Channel Adapter 46. Channel Adapters 44 and 46 are coupled to their respective peripheral subsystems via Cables 48. While not shown, it should be understood that each of IOP2, IOP3, . . . , and IOP12 is coupled to its associated Data Bus. The eleven Data Buses which are not shown provide connections for additional Channel Adapters. Lines 50 represent the coupling between IOP2, IOP3, . . . , and IOP12 and their associated Data Buses.
B. Prior Art Data Storage Hierarchy
FIG. 3 is a block diagram of a plurality of Hosts coupled to a variety of prior art disk subsystem configurations. FIG. 3 serves to illustrate the hierarchical relationship between the configurations. Each Host 10 is coupled to one or more of the Control Units 80, 82, 88, or 92 by Line 48. Host-1 is coupled to Control Units 80 and 82. Control Unit 80 provides access to Magnetic Disks 84, and Control Unit 82 provides access to Magnetic Disks 86. If application software on Host-1 requests access to a file stored on Magnetic Disks 84 or 86, operating system software is required to find: (1) the Disk 84 or 86 on which the file is stored; (2) which Control Unit 80 or 82 provides access to the disk; (3) the IOP 38 to which the Control Unit is coupled; and (4) the Input/Output Bus 40 to which the IOP 38 is coupled. Once the necessary information is determined, a control program can be constructed and sent along the identified data path to access the file. File data may be buffered in the Main Storage 16 of Host-1 to enhance the access rate for file data; however, the file data must be destaged to Disks 84 to protect against data loss.
Control Unit 82 is coupled to and shared by Host-1, Host-2, and Host-3. Each of the coupled Hosts can access data stored on Disks 86. A Multi-Host File Sharing (MHFS) system, which is commercially available from Unisys Corporation, allows application software on Host-1, Host-2, and Host-3 to share file data stored on Disks 86 and coordinates locking files or portions thereof.
Host-3 is coupled to Cache Disk Controller 88. Cache Disk Controller 88 provides access to Disks 90 and buffers portions of Disks 90. The cache storage that Cache Disk Controller 88 uses to buffer Disks 90 resides within the Cache Disk Controller 88. Operation of the Cache Disk Controller 88 is transparent to application and system software on Host-3. The cache storage is allocated to all application and system software having access to files stored on Disks 90 on a first-come first-served basis.
Control Unit 92 is coupled to Host-n and controls access to Disks 94 and a Solid State Disk 96. The Solid State Disk 96 resides at the Disk 94 level of the data storage hierarchy and provides access to data stored therein at electronic rather than the electromechanical speed of the Disks 94. In order to gain access to data stored on Solid State Disk 96, the data path on which the disk resides must be constructed in the same manner as discussed above for Disks 84.
C. File Cache System Overview
FIG. 4 illustrates an Outboard File Cache in a data storage hierarchy. A plurality of Control Units 104 are coupled to Host 10 via IOPs 38 for providing access to Disks 106. Application and system software executing on Host 10 reads data from and writes data to Files 108a-h. While Files 108a-h are depicted as blocks it should be understood that the data is not necessarily stored contiguously in Disks 106. The Disks provide mass storage for retaining the Files. In the storage hierarchy, disks would fall into the category of secondary storage, with primary storage being the main memory of a Host.
Outboard File Cache 102 provides cache storage for Files 108a-h with resiliency against data loss which is comparable to Disks 106. A Data Mover 110 is coupled to the Input/Output Bus 40 in the Host and provides a functionality which is similar to the IOPs 38. The Data Mover provides a Fiber Optic Link 112 to the Outboard File Cache. All or part of Files 108 may be stored in the Outboard File Cache 102 depending upon the storage capacity of the Outboard File Cache 102, and the size and number of Files 108 selected to be cached.
The portions of Files 108a-h that are stored in the Outboard File Cache 102 are shown as blocks 114a-h. The cached portions of Files 108 are labeled File-A', File-B', . . . , File-H' for discussion purposes. File-A' 114a is the portion of File-A that is stored in Outboard File Cache 102, File-B' 114b is the portion of File-B that is stored in Outboard File Cache 102, etc. The Outboard File Cache at this level of the storage hierarchy allows references to cached files to be immediately directed to the Outboard File Cache 102 for processing, in contrast with a non-cached file where an I/O channel program must be constructed to access the proper disk and the request and data must flow through a possibly lengthy data path.
FIG. 5 shows the overall file processing within the data storage hierarchy shown in FIG. 4. The processing begins at Step 122 where a software application executing on Host 10 requests access to a selected file. The access request may involve either reading data from or writing data to the selected file.
A file access command is sent to the Outboard File Cache 102 at Step 124. Included in the file access command are a file identifier which specifies the file on which the operation is to be performed, an offset from the beginning of the file which specifies precisely where in the file the operation is to begin, and the quantity of data which is to be read from or written to the file. At Decision Step 126, the Outboard File Cache determines whether the referenced data is present in the Outboard File Cache 102 based on the file identifier, offset, and quantity. If the referenced data is not in the Outboard File Cache 102, Control Path 128 is followed to Step 130.
Step 130 involves staging the data from the appropriate Disk 106 to the Outboard File Cache 102. Staging the data involves reading the required data from Disk 106 and then storing the data in the Outboard File Cache. Subsequent references to the staged data normally will not result in a miss, and the data can be accessed in the Outboard File Cache. If Decision Step 126 finds that the referenced data is in Outboard File Cache 102, Control Path 132 is followed to Step 134 where access is granted to the referenced data.
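The flow of FIG. 5 (Steps 122 through 134) can be summarized in a short sketch. It is a simplified model under stated assumptions, not the cache's actual lookup logic: the cache is keyed here by individual (file identifier, offset) pairs, the disk is a plain dictionary of word lists, and the names `OutboardFileCacheSketch` and `host_access` are hypothetical.

```python
# Simplified model of the hit/miss/stage flow of FIG. 5.
# Assumption: presence is tracked per (file_id, offset) word, purely for
# illustration; the text does not define the lookup granularity at this point.

class OutboardFileCacheSketch:
    def __init__(self):
        self.cached = {}  # (file_id, offset) -> data word

    def access(self, file_id, offset, quantity):
        """Decision Step 126: return ("hit", data) or ("miss", None)."""
        keys = [(file_id, offset + i) for i in range(quantity)]
        if all(key in self.cached for key in keys):
            return "hit", [self.cached[key] for key in keys]
        return "miss", None

    def stage(self, file_id, offset, words):
        """Step 130 (second half): store data read from disk in the cache."""
        for i, word in enumerate(words):
            self.cached[(file_id, offset + i)] = word


def host_access(cache, disk, file_id, offset, quantity):
    """Steps 124 to 134 as seen from the Host."""
    status, data = cache.access(file_id, offset, quantity)      # Steps 124, 126
    if status == "miss":
        words = disk[file_id][offset:offset + quantity]         # Step 130: read disk
        cache.stage(file_id, offset, words)
        status, data = cache.access(file_id, offset, quantity)  # now a hit
    return data                                                 # Step 134


disk = {"File-A": list(range(100))}
cache = OutboardFileCacheSketch()
print(host_access(cache, disk, "File-A", 10, 3))  # first reference: miss, then staged
print(cache.access("File-A", 10, 3)[0])           # later reference to the same data
```

The second reference illustrates the point made above: once data has been staged, subsequent references to it normally do not result in a miss.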
1. Functional Block Diagram
FIG. 6 is a functional block diagram of the hardware and software components of the preferred embodiment of the outboard file cache system. The overall system is comprised of hardware and software elements in both the Host 10 and Outboard File Cache 102. The software on Host 10 is shown by blocks 202, 204, 206, and 208. The blocks are joined to signify the interrelationships and software interfaces between the software elements.
Application Software 202 provides data processing functionality to end users and includes applications such as bank transaction processing and airline reservations systems. Data bases maintained by Application Software 202 may be stored in one or more of the exemplary Files 108 as shown in FIG. 4. File Management Software 204, Input/Output Software 206, and File Cache Handler Software 208 are all part of the operating system (not shown). In general, File Management Software 204 provides overall management of file control structures, and in particular handles the creating, deleting, opening, and closing of files.
Input/Output Software 206 provides the software interface to each of the various I/O devices coupled to the Host 10. The I/O devices may include network communication processors, magnetic disks, printers, magnetic tapes, and optical disks. Input/Output Software 206 builds channel programs, provides the channel programs to the appropriate IOP 38, and returns control to the requesting program at the appropriate time.
File Cache Handler Software 208 coordinates the overall processing for cached files. In general, File Cache Handler Software 208 provides the operating system level interface to the Outboard File Cache 102, stages file data from Disks 106 to the Outboard File Cache 102, and destages file data from the Outboard File Cache 102 to Disks 106. The File Cache Handler Software 208 provides file data and file access commands to the hardware interface to the Outboard File Cache 102 via Main Storage 16. Main Storage 16 is coupled to the Input/Output Bus 40 by Line 210. Line 210 logically represents the Storage Controller 14 and Input/Output Remote Adapter 36 of FIGS. 1 and 2.
A Data Mover (DM) 110a provides the hardware interface to the Outboard File Cache 102. While two DMs 110a and 110b are shown, the system does not require two DMs for normal operations. A configuration with two DMs provides fault tolerant operation; that is, if DM 110a fails, DM 110b is available to process file requests. Each of the DMs is coupled to the Input/Output Bus 40 of Host 10. File Cache Handler Software 208 distributes file access commands among each of the DMs coupled to Input/Output Bus 40. If DM 110a fails, file access commands queued to DM 110a can be redistributed to DM 110b.
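The distribution and redistribution of file access commands among DMs might be modeled as follows. The text does not state how commands are distributed, so the round-robin policy below is an assumption, as are the class name `DataMoverQueues` and its methods.

```python
from collections import deque

# Hypothetical sketch of per-DM command queues with failover.
# Assumption: commands are spread round-robin; the text only says that
# commands are distributed among the DMs and can be redistributed on failure.

class DataMoverQueues:
    def __init__(self, dm_names):
        self.queues = {name: deque() for name in dm_names}
        self._turn = 0

    def submit(self, command):
        """Queue a file access command to the next available DM."""
        names = list(self.queues)
        name = names[self._turn % len(names)]
        self._turn += 1
        self.queues[name].append(command)
        return name

    def fail(self, failed_dm):
        """Remove a failed DM and requeue its pending commands elsewhere."""
        orphaned = self.queues.pop(failed_dm)
        if not self.queues:
            raise RuntimeError("no Data Mover left to take over")
        for command in orphaned:
            self.submit(command)


dms = DataMoverQueues(["DM 110a", "DM 110b"])
for cmd in ["read-1", "write-2", "read-3", "write-4"]:
    dms.submit(cmd)
dms.fail("DM 110a")                 # DM 110a fails
print(list(dms.queues["DM 110b"]))  # DM 110b now holds all pending commands
```

With a single DM configured, `fail` raises an error, which reflects that one DM suffices for normal operation but provides no fault tolerance.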
The DMs 110a and 110b provide functionality which is similar to the IOPs 38 of FIG. 2, that is to read data from and write data to a peripheral device. The DMs can read from and write to Main Storage 16 without the aid of IPs 12. The DMs coordinate the processing of file access commands between File Cache Handler Software 208 and the Outboard File Cache 102 and move file data between Main Storage 16 and the Outboard File Cache 102. Each of the DMs is coupled to a Host Interface Adapter (HIA) 214 logic section within the Outboard File Cache 102. DM 110a is coupled to HIA 214a by a pair of fiber optic cables shown as Line 112a, and DM 110b is coupled to HIA 214b by a second pair of fiber optic cables shown as Line 112b.
The Outboard File Cache 102 is configured with redundant power, redundant clocking, redundant storage, redundant storage access paths, and redundant processors for processing file access commands, all of which cooperate to provide a fault tolerant architecture for storing file data. The Outboard File Cache 102 is powered by dual Power Supplies 222a and 222b. The portion of the Outboard File Cache 102 to the left of dashed line 224 is powered by Power Supply 222a and is referred to as Power Domain 225a, and the portion of the Outboard File Cache 102 to the right of dashed line 224 is powered by Power Supply 222b and is referred to as Power Domain 225b. Each of Power Supplies 222a and 222b has a dedicated battery and generator backup to protect against loss of the input power source.
Two separately powered Clock Sources 226a and 226b provide timing signals to all the logic sections of Outboard File Cache 102. Clock Source 226a provides timing to the logic sections within Power Domain 225a and Clock Source 226b provides timing to the logic sections within Power Domain 225b. Redundant oscillators within each Clock Source provide protection against the failure of one, and Clock Sources A and B are synchronized for consistent timing across Power Domains A and B.
Non-Volatile Storage (NVS) section 220 includes multiple DRAM storage modules and provides the cache memory. Half of the storage modules are within Power Domain 225a and the other half are within Power Domain 225b. The data contained within the storage modules in Power Domain 225b reflects the data stored in storage modules within Power Domain 225a. NVS 220 thereby provides for redundant storage of file data and the control structures used by the Outboard File Cache 102. The redundant storage organization provides for both single and multiple bit error detection and correction.
The portion of NVS 220 within each of the Power Domains 225a and 225b is coupled to two Storage Interface Controllers (SICTs) 228a and 228b. While only two SICTs are shown in FIG. 6, each half of NVS 220 is addressable by up to four SICTs. Line 230 represents the coupling between SICT 228a and the portion of NVS 220 within each of Power Domains 225a and 225b. Similarly, Line 232 represents the coupling between SICT 228b and NVS 220.
Read and write requests for NVS 220 are sent to the SICTs 228a and 228b via Street Networks 234a and 234b. The Street Network provides the data transfer and interprocessor communication between the major logic sections within the Outboard File Cache 102. The Street Network is built to provide multiple requesters (HIAs 214a and 214b or Index Processors 236a and 236b) with high bandwidth access to NVS 220, as well as multiple paths for redundant access. Crossover 238 provides a path whereby NVS 220 requests may be sent from Street 234a to Street 234b, or vice versa, if a SICT is unavailable. For example, if SICT 228a fails, NVS requests sent from requesters (HIAs and IXPs) are sent to Street 234b via Crossover 238, whereby NVS 220 access is provided by SICT 228b.
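The Crossover behavior can be illustrated with a small selection routine. It assumes that SICT 228a is normally reached from Street 234a and SICT 228b from Street 234b, which is implied by the failure example but not stated outright; the function name `select_sict` and the string labels are hypothetical.

```python
# Hypothetical sketch of SICT selection with Crossover 238.
# Assumption: each Street has a "home" SICT (234a -> 228a, 234b -> 228b).

HOME_SICT = {"234a": "228a", "234b": "228b"}
OTHER_STREET = {"234a": "234b", "234b": "234a"}


def select_sict(requester_street, sict_available):
    """Return (sict, used_crossover) for an NVS request entering on a Street."""
    home = HOME_SICT[requester_street]
    if sict_available.get(home, False):
        return home, False
    # Home SICT is unavailable: cross over to the other Street.
    alternate = HOME_SICT[OTHER_STREET[requester_street]]
    if sict_available.get(alternate, False):
        return alternate, True
    raise RuntimeError("no SICT available for NVS access")


# SICT 228a has failed; a request entering on Street 234a crosses over.
print(select_sict("234a", {"228a": False, "228b": True}))
```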
The HIAs 214a and 214b provide functionality in the Outboard File Cache 102 which is similar to the functionality provided by the DMs 110a and 110b on the Host 10. In particular, the HIAs receive file access commands sent from the DM and provide general cache access control such as writing file data sent from the Host to Non-Volatile Storage (NVS) 220 and reading file data from NVS and sending it to the Host. The HIAs also contain the logic for sending and receiving data over fiber optic Cables 112a and 112b.
Index Processors (IXPs) 236a and 236b manage allocation and cache replacement for the storage space available in NVS 220, service file data access commands sent from Host 10, and generally provide for overall file cache management. The IXPs contain microcode control for detecting whether the file data referenced in a file data access command is present in the cache memory, and for managing and coordinating access to the cache memory. The functionality provided by an IXP will be discussed in greater detail later in this specification.
2. Data Flow
FIGS. 7A, 7B, and 7C contain a data flow diagram illustrating the flow of data between each of the major functional components of the file cache system. Each of the blocks represents a major logic section, a software component, or a storage section of the file cache system. Within each of the blocks are data structures which are shown as labelled online storage symbols and circles representing processing performed by the component. Although the circles represent the processing performed, they are not intended to illustrate the flow of control. The directional lines represent the flow of data between processing circles and data structures and are labelled according to the data being transferred. FIGS. 8 through 15 show the information contained within the data structures referenced in FIG. 7. Each of FIGS. 8 through 15 will be discussed as it is encountered in the discussion of FIG. 7.
File access commands begin with application software on the Host 10 (not shown in FIG. 7) requesting input or output services (I/O) for a selected file. I/O requests for cached files are processed by the File Cache Handler Software 208. Data flow Line 300 shows the input of an I/O request to File Cache Handler Software 208. I/O requests are sent from the Host 10 to the Outboard File Cache 102 in Command Packets. At Process Node 302 the File Cache Handler Software 208 builds a Command Packet (CP) for the specified I/O request and stores the Command Packet in a Command Packet Data Structure 304. Line 306 represents storing the I/O request information in the Command Packet Data Structure 304.
a. Command Packet
FIG. 8 shows the general layout of a Command Packet and the information contained therein. The Command Packet 452 contains information that describes one of the available Outboard File Cache commands (read, write, stage, destage, etc.). Each of the commands is identified and discussed later in this specification. FIG. 8 shows only the command information which is common to all Command Packets for the various command types.
A Command Packet can have from 4 to 67 36-bit words, depending upon the command type. Words 0 and 1, bits 12 through 23 of Word 3, and Words 4 through n of the Command Packet, respectively referenced by 452a, 452b, and 452c, are dependent upon the command type.
The file cache system permits Command Packets to be chained together. That is, a first Command Packet 452 may point to a second Command Packet, and the second Command Packet may point to a third Command Packet, and so on. The NEXT.sub.-- COMMAND.sub.-- PACKET 452d is used for chaining the Command Packets together. It contains the address of the next Command Packet in the command chain. If the CCF 452e (Command Chain Flag) is set, then NEXT.sub.-- COMMAND.sub.-- PACKET contains the address of the next Command Packet in the command chain. A chain of commands is also referred to as a "program." If CCF is clear, then no Command Packets follow the Command Packet in the command chain. The CCF is stored at Bit 5 of Word 3 in the Command Packet.
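Following a command chain amounts to walking a linked list that is terminated by a clear CCF. The sketch below models Main Storage as a dictionary from address to packet and represents each packet as a dictionary with only the two chaining fields; this representation, the function name `walk_command_chain`, and the `limit` guard against malformed chains are all assumptions made for illustration.

```python
# Minimal sketch of traversing a chain ("program") of Command Packets.
# Assumption: memory maps an address to a packet holding the fields
# "ccf" (Command Chain Flag) and "next_command_packet".

def walk_command_chain(memory, first_address, limit=1000):
    """Return the addresses of all Command Packets in the chain, in order."""
    chain = []
    address = first_address
    while True:
        packet = memory[address]
        chain.append(address)
        if not packet["ccf"]:
            return chain                    # CCF clear: last packet in the chain
        if len(chain) >= limit:
            raise ValueError("command chain exceeds limit; possible loop")
        address = packet["next_command_packet"]


memory = {
    0o1000: {"ccf": 1, "next_command_packet": 0o2000},
    0o2000: {"ccf": 1, "next_command_packet": 0o3000},
    0o3000: {"ccf": 0, "next_command_packet": 0},
}
print([oct(a) for a in walk_command_chain(memory, 0o1000)])
```

Note that NEXT_COMMAND_PACKET is consulted only when CCF is set, so its content in the final packet is irrelevant.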
The LENGTH 452f of the Command Packet, that is, the number of words in the Command Packet following Word 3, is stored in bits 6 through 11 of Word 3. Bits 24 through 35 of Word 3 contain COMMAND.sub.-- CODE 452f which indicates the operation to be performed by the Outboard File Cache. Bits 0 through 4 of Word 3, referenced by 452g, are reserved.
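The Word 3 layout described above can be exercised with a small encoder and decoder. The sketch assumes that bit 0 is the most significant bit of the 36-bit word, which the text does not state explicitly; the helper names `field`, `encode_word3`, and `decode_word3` are hypothetical.

```python
# Sketch of packing and unpacking Word 3 of a Command Packet.
# Assumption: bits are numbered from 0 (most significant) to 35 (least
# significant) within a 36-bit word.

WORD_BITS = 36


def field(word, first_bit, last_bit):
    """Extract bits first_bit..last_bit (inclusive) of a 36-bit word."""
    width = last_bit - first_bit + 1
    shift = WORD_BITS - 1 - last_bit
    return (word >> shift) & ((1 << width) - 1)


def encode_word3(ccf, length, command_code, command_dependent=0):
    """Build Word 3; reserved bits 0 through 4 are left as zero."""
    if ccf not in (0, 1):
        raise ValueError("CCF is a single bit")
    if not 0 <= length < (1 << 6):
        raise ValueError("LENGTH occupies 6 bits (bits 6-11)")
    if not 0 <= command_dependent < (1 << 12):
        raise ValueError("bits 12-23 hold 12 bits")
    if not 0 <= command_code < (1 << 12):
        raise ValueError("COMMAND_CODE occupies 12 bits (bits 24-35)")
    return (ccf << 30) | (length << 24) | (command_dependent << 12) | command_code


def decode_word3(word):
    return {
        "ccf": field(word, 5, 5),
        "length": field(word, 6, 11),
        "command_dependent": field(word, 12, 23),
        "command_code": field(word, 24, 35),
    }


word3 = encode_word3(ccf=1, length=63, command_code=0o17)
fields = decode_word3(word3)
print(fields, "total words:", 4 + fields["length"])
```

A six-bit LENGTH allows at most 63 words after Word 3, which together with Words 0 through 3 gives the 67-word maximum packet size stated earlier.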
Processing Node 308 in FIG. 7 enqueues a Program Initiation Packet (PIP) in a Program Initiation Queue (PIQ) 310. Line 312 represents the flow of Program Initiation Packet information to the Program Initiation Queue 310. The Command Packet (CP) Address from Node 302 is used in enqueuing a PIP. The CP Address supplied to Node 308 is shown by Line 309.
b. Program Initiation Queue
FIG. 9 illustrates the Program Initiation Queue. The Program Initiation Queue 310 may contain up to 32 Program Initiation Packets (PIPs), respectively referenced 456-1, 456-2, 456-3, . . . , 456-32. The Program Initiation Queue may be larger or smaller depending upon the implementation chosen. Once the Program Initiation Queue is filled with Program Initiation Packets, further queuing is performed to handle the overflow.
FIG. 10 shows the information contained in and the format of a Program Initiation Packet. VF (Valid Flag) 456a is stored in bit 0 of Word 0 of the Program Initiation Packet 456. VF indicates whether the information in the Program Initiation Queue 310 entry is valid.
Bits 1 through 35 of Word 0 and Bits 0 through 3 of Word 1 are reserved for future use and are respectively referenced in FIG. 10 by 456b and 456c. The PROGRAM.sub.-- ID 456d is stored in bits 4 through 35 of Word 1. The PROGRAM.sub.-- ID uniquely identifies the program being submitted to the Outboard File Cache 102. The PROGRAM.sub.-- ID is used to associate the status returned from the Outboard File Cache 102 with the program to which it applies.
Word 2 of the Program Initiation Packet 456 contains the COMMAND.sub.-- PACKET.sub.-- ADDRESS 456e which is the real address of the first Command Packet 452 in a command chain or a single Command Packet. Word 3 contains the NEXT.sub.-- SP.sub.-- ADDRESS 456f. The NEXT.sub.-- SP.sub.-- ADDRESS is the real address in Main Storage 16 of an area where the Outboard File Cache 102 can write status information.
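The four-word Program Initiation Packet described above can be sketched as a list of 36-bit integers. This is only an illustration under stated assumptions: bit 0 is taken to be the most significant bit, reserved bits are written as zero, and `build_pip`, `parse_pip`, and `field` are hypothetical helper names.

```python
WORD_BITS = 36
WORD_MASK = (1 << WORD_BITS) - 1


def field(word, first_bit, last_bit):
    """Extract bits first_bit..last_bit (inclusive); bit 0 is assumed most significant."""
    width = last_bit - first_bit + 1
    shift = WORD_BITS - 1 - last_bit
    return (word >> shift) & ((1 << width) - 1)


def build_pip(program_id, command_packet_address, next_sp_address):
    """Return Words 0-3 of a valid Program Initiation Packet."""
    if not 0 <= program_id < (1 << 32):
        raise ValueError("PROGRAM_ID must fit in bits 4-35 of Word 1")
    for address in (command_packet_address, next_sp_address):
        if not 0 <= address <= WORD_MASK:
            raise ValueError("address must fit in one 36-bit word")
    word0 = 1 << (WORD_BITS - 1)   # VF = 1 in bit 0; bits 1-35 reserved
    word1 = program_id             # bits 0-3 reserved, bits 4-35 PROGRAM_ID
    return [word0, word1, command_packet_address, next_sp_address]


def parse_pip(words):
    """Recover the named fields from Words 0-3."""
    return {
        "vf": field(words[0], 0, 0),
        "program_id": field(words[1], 4, 35),
        "command_packet_address": words[2],
        "next_sp_address": words[3],
    }


pip = build_pip(program_id=0xCAFE, command_packet_address=0o4000, next_sp_address=0o7000)
```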
After the Outboard File Cache 102 has processed a command, the status of the command is reported back to the Host 10 in a Program Status Packet (PSP). Line 314 shows the flow of a Program Status Packet from the Data Mover (DM) 110 to an entry in the Status Packet Queue (SPQ) 316. The format of the Status Packet Queue 316 and the Program Status Packet is described next, followed by further discussion of Command Packet processing.
c. Status Packet Queue and Program Status Packet
FIGS. 11 and 12 respectively illustrate the Status Packet Queue and the format and information contained in a Program Status Packet. The number of Program Status Packets 460 in the Status Packet Queue 316 is equal to the number of programs queued in the Program Initiation Queue; the Program Status Packets are respectively referenced 460-1, 460-2, 460-3, . . . , 460-n. Generally, the content and format of a Program Status Packet is as follows:
______________________________________
Word Bit Definition
______________________________________
0 0-5 Valid Flag (VF) 460a indicates whether the Program
Status Packet contains valid status information. If
VF=0, then the Program Status Packet does not
contain valid status information. If the VF=1, then
the Program Status Packet does contain valid status
information.
0 6-17 Reserved as referenced by 460b.
0 18-35 UPI.sub.-- NUMBER 460c is the Universal
Processor Interrupt (UPI) number associated
with the Outboard File Cache interface.
1 0-3 Reserved as referenced by 460d.
1 4-35 PROGRAM.sub.-- ID 460e is a value which identifies the
Command Packet (or Command Packet Chain) which
is associated with the Program Status Packet. If
NO.sub.-- PROGRAM in the FLAGS field is set,
PROGRAM.sub.-- ID is reserved. Every Outboard File
Cache program issued by a Host has an associated
PROGRAM.sub.-- ID which is unique within the Host.
When status is returned to the Host, PROGRAM.sub.-- ID
is used to relate the status to the program to which it
applies. Note that PROGRAM.sub.-- ID applies to all
commands within a single program. A status is
associated with a command in a command chain by
using the COMMAND.sub.-- PACKET.sub.-- ADDRESS. The
portion of the File Cache Handler that builds and
initiates Outboard File Cache programs generates the
PROGRAM.sub.-- ID.
2 0-35 COMMAND.sub.-- PACKET.sub.-- ADDRESS 460f is a value
which contains the real address of the Command
Packet to which the status applies. When a chain of
commands is submitted to the Outboard File Cache
102 for processing, the Command Packet Address will
point to the Command Packet which caused an error.
If all the Command Packets in the command chain
were processed without error, then the Command
Packet Address points to the last Command Packet in
the command chain.
3 3-35 HARDWARE.sub.-- DEPENDENT.sub.-- STATUS-1 460g is an
address within Main Storage 16 which was referenced
and an error was detected. The File Cache Handler
Software 208 takes the RECOMMENDED.sub.-- ACTION.
4 0-35 This word is reserved and is beyond the scope of this
invention.
5 0-11 RECOMMENDED.sub.-- ACTION 460i is the processing
that should be performed by the File Cache Handler
Software 208 upon receiving a Program Status Packet.
5 12-23 REASON 460j indicates the condition that caused the
particular status to be returned.
5 24-29 COUNT 460k is the recommended number of times
that the File Cache Handler Software 208 should retry
when responding to the status in the Program Status
Packet. For example, if the
RECOMMENDED.sub.-- ACTION returned is Resend,
then the Count indicates the number of times which the
File Cache Handler Software 208 should resend the
Command Packet. If NO.sub.-- PROGRAM in the FLAGS
field is not set and the RECOMMENDED.sub.-- ACTION
does not equal "no action required", this field
specifies the number of times the command specified
by the Command Packet pointed to by
COMMAND.sub.-- PACKET.sub.-- ADDRESS should be
retried. Retries apply only to that command and not to
any other commands in a command chain. All retries
use the same Outboard File Cache Interface to which
the original command was directed. If
NO.sub.-- PROGRAM in the FLAGS field is not set and
RECOMMENDED.sub.-- ACTION equals "no action
required", COUNT must be equal to 0. If
NO.sub.-- PROGRAM in the FLAGS field is set, this field
is reserved.
5 30-35 FLAGS 460l is a set of bits that relay ancillary
information.
5 30 PRIORITY.sub.-- DESTAGE indicates whether priority
destage is required. If PRIORITY.sub.-- DESTAGE is set,
then the Destage Request Packets in the Destage
Request Table (see the READ Status Packet) refer to
segments that must be destaged as soon as possible.
If NO.sub.-- PROGRAM is set or
DESTAGE.sub.-- REQUEST.sub.-- PACKETS is not set,
PRIORITY.sub.-- DESTAGE must equal 0.
5 31 DESTAGE.sub.-- REQUEST.sub.-- PACKETS is a flag which
indicates whether the Destage Request Table exists
(see the READ Status Packet). If NO.sub.-- PROGRAM is
set, or the status applies to an invalid command, or
the status applies to a non-I/O command, then this
flag must be 0.
5 32 TERMINATED.sub.-- POLLING is a flag which indicates
that a Program Initiation Queue is no longer being
polled.
5 33 Reserved.
5 34 NO.sub.-- PROGRAM is a flag which indicates whether the
status is associated with a Command Packet. If
NO.sub.-- PROGRAM is set, then the status is not
associated with a Command Packet. If
TERMINATED.sub.-- POLLING is set, NO.sub.-- PROGRAM
must also be set. If the Program Status Packet is
returned via the Status Packet Queue,
NO.sub.-- PROGRAM must equal 0. This flag is beyond
the scope of this invention.
5 35 Reserved and is beyond the scope of this invention.
6 0-35 STATISTICS 460m is a set of codes which indicate
how successful the Outboard File Cache 102 has been
in avoiding destaging file data, speculating upon the
future file access commands, and the time the
Outboard File Cache 102 spent in processing the
Command Packet(s).
7 0-11 RECOVERY.sub.-- TIME is used to indicate to a Host 10
that the Outboard File Cache 102 is in the process of
performing a set of actions to recover from an internal
fault condition. The nature of the fault recovery
prohibits the Outboard File Cache from responding to
any commands received from a Host. When a
command is received, it is not processed by the
Outboard File Cache and is returned to the sending
Host with a RECOMMENDED.sub.-- ACTION equal to
"Resend." RECOVERY.sub.-- TIME is only used when the
NO.sub.-- PROGRAM flag is not set and the
RECOMMENDED.sub.-- ACTION is Resend. The value
contained in RECOVERY.sub.-- TIME provides the number
of six second intervals required to complete the
necessary recovery actions.
7 12-35 See Words 8-127
8-127 These words contain information which is
dependent upon the particular command in the
Command Packet which is associated with the
Program Status Packet. Words 7-119,
referenced by 460n depend upon
NO.sub.-- PROGRAM and COMMAND.sub.-- CODE (see
the READ Status Packet), and words 120
through 127 are reserved for future use as
referenced by 460o.
______________________________________
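Word 5 of the Program Status Packet packs four fields, and the table states several consistency rules among its flags. The following sketch decodes Word 5 and checks those rules. It assumes bit 0 is the most significant bit of the 36-bit word; `decode_word5` and `flag_violations` are hypothetical names, and the COUNT rule is omitted because the numeric code for "no action required" is not given here.

```python
WORD_BITS = 36


def field(word, first_bit, last_bit):
    """Extract bits first_bit..last_bit (inclusive); bit 0 is assumed most significant."""
    width = last_bit - first_bit + 1
    shift = WORD_BITS - 1 - last_bit
    return (word >> shift) & ((1 << width) - 1)


def decode_word5(word5):
    """Split Word 5 into RECOMMENDED_ACTION, REASON, COUNT, and the individual flags."""
    return {
        "recommended_action": field(word5, 0, 11),
        "reason": field(word5, 12, 23),
        "count": field(word5, 24, 29),
        "priority_destage": bool(field(word5, 30, 30)),
        "destage_request_packets": bool(field(word5, 31, 31)),
        "terminated_polling": bool(field(word5, 32, 32)),
        "no_program": bool(field(word5, 34, 34)),
    }


def flag_violations(f):
    """Return a list of the flag rules from the table that the decoded word breaks."""
    problems = []
    if f["priority_destage"] and (f["no_program"] or not f["destage_request_packets"]):
        problems.append("PRIORITY_DESTAGE requires DESTAGE_REQUEST_PACKETS and no NO_PROGRAM")
    if f["destage_request_packets"] and f["no_program"]:
        problems.append("DESTAGE_REQUEST_PACKETS must be 0 when NO_PROGRAM is set")
    if f["terminated_polling"] and not f["no_program"]:
        problems.append("TERMINATED_POLLING requires NO_PROGRAM")
    return problems


# Hypothetical status word: action 3, reason 7, COUNT 5, bits 30 and 31 set.
status = decode_word5((3 << 24) | (7 << 12) | (5 << 6) | 0b110000)
```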
The discussion now returns to Command Packet processing as shown in FIG. 7. Before the enqueue Processing Node 308 writes an entry in the Program Initiation Queue 310, it first obtains the address of an available Program Status Packet 460 from the Status Packet Queue 316, as shown by Line 318. If the Valid Flag 460a in the Program Status Packet is 0, then the Program Status Packet is available for status reporting. The address of the Program Status Packet is stored in NEXT.sub.-- SP.sub.-- ADDRESS 456f in the Program Initiation Packet 456 in the Program Initiation Queue 310.
The Data Mover 110 continually monitors the Program Initiation Queue 310 for the presence of Command Packets 452 to process as shown by the Monitor and Retrieve Processing Node 320. A pointer to an entry in the Program Initiation Queue 310 is used for monitoring the Program Initiation Queue. If the VF 456a for the Program Initiation Packet 456 referenced by the pointer is equal to 1, then the Program Initiation Packet is valid and a Command Packet is available. If the VF equals 0, then the Program Initiation Packet is invalid which means there is no Command Packet available for processing; the same Program Initiation Packet is monitored until the VF is set. Line 322 represents the reading of a Program Initiation Packet from the Program Initiation Queue.
Where the VF 456a in the PIP is set, the Program Initiation Queue 310 pointer is advanced to the next entry in the queue, and the next entry is thereafter monitored. The Program Initiation Packet 456 with the VF set is then used to retrieve the Command Packet 452. The COMMAND.sub.-- PACKET.sub.-- ADDRESS 456e in the Program Initiation Packet is used to read the Command Packet from the Command Packet Data Structure 304 as indicated by Line 324.
The information in the Command Packet 452 is then written to one of the Activity Control Block (ACB) Buffers 326 which is local to the Data Mover 110, as indicated by data flow Line 328. There are three buffers used by the Data Mover 110 to manage Command Packets. Each of the ACB Buffers is described in greater detail in the discussion for the Data Mover. The Buffers are large enough for 16 entries, which allows for a maximum of 16 Command Packets to be "active." When there are 16 active commands, the Data Mover 110 suspends monitoring the Program Initiation Queue 310 until one of the 16 commands is complete. In general, the ACB Buffers hold Command Packets and assorted control codes for the transfer of data between the Data Mover 110 and Main Storage 16.
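The monitoring behavior described in the preceding paragraphs (watch one queue entry until its VF is set, advance the pointer, and stop taking new work while 16 commands are active) can be sketched as a small simulation. Several details are assumptions that the text does not state: that the pointer wraps to the first entry after the last, that the Data Mover clears VF when it consumes an entry, and that the lowest free ACB number is chosen. The class and method names are hypothetical.

```python
class DataMoverMonitor:
    """Toy model of the Data Mover's view of the Program Initiation Queue."""

    MAX_ACTIVE = 16  # the ACB Buffers hold at most 16 active Command Packets

    def __init__(self, piq):
        self.piq = piq        # list of dicts: {"vf": 0 or 1, "cp_address": int}
        self.pointer = 0      # index of the entry currently being monitored
        self.active = {}      # ACB number (1-16) -> Command Packet address

    def poll(self):
        """Run one monitoring step; return the ACB number used, or None."""
        if len(self.active) >= self.MAX_ACTIVE:
            return None       # monitoring is suspended until a command completes
        pip = self.piq[self.pointer]
        if not pip["vf"]:
            return None       # entry not valid yet: keep watching the same entry
        pip["vf"] = 0         # assumption: consuming an entry clears its VF
        self.pointer = (self.pointer + 1) % len(self.piq)  # assumption: wraps around
        acb = next(n for n in range(1, self.MAX_ACTIVE + 1) if n not in self.active)
        self.active[acb] = pip["cp_address"]
        return acb

    def complete(self, acb):
        """Mark a command as finished, freeing its ACB entry."""
        del self.active[acb]
```

In use, a caller would invoke `poll()` repeatedly and call `complete()` when status for an ACB entry has been returned; once 16 entries are active, `poll()` returns `None` without advancing the pointer.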
After a Command Packet is written to the ACB Buffers 326, the Send Processing Node 330 reads the Command Packet 452 from the appropriate ACB Buffer as shown by data flow Line 332. The Command Packet is then sent via the Fiber Optic Cable 216 to the Host Interface Adapter 214 as shown by data flow Line 334. The Receive Processing Node receives the Command Packet and enters the Command Packet into the HIA ACB Buffer 338 as indicated by data flow Line 340.
FIG. 13 illustrates the HIA ACB Buffer. The HIA ACB Buffer 338 has 16 entries, respectively referenced 338-1 through 338-16, for managing activities. Each entry in the HIA ACB Buffer contains a Command Packet and Status Information associated with the Command Packet. Associated with each entry in the HIA ACB Buffer is an ACB Number. ACB Number 1 references the first entry 338-1 in the HIA ACB Buffer, ACB Number 2 references the second entry 338-2, . . . , and ACB Number 16 references the sixteenth entry 338-16.
The Monitor and Put Processing Node 342 monitors the HIA ACB Buffer 338 for the arrival of Command Packets. When a Command Packet arrives in the HIA ACB Buffer 338, the ACB Number associated with the HIA ACB Buffer entry is read as indicated by data flow Line 344. Processing Node 342 then puts an Activity Queue (AQ) Entry in the Activity Queue as shown by data flow Line 348. An entry in the Activity Queue 346 indicates to the Index Processor 236 that there is a Command Packet available for processing.
FIG. 14 illustrates the Activity Queue, and FIG. 15 shows the information contained in each Activity Queue Entry. The Activity Queue 346 may contain up to n Activity Queue Entries, referenced in FIG. 14 as 347-1, 347-2, 347-3, . . . , 347-n. Word 0 of an Activity Queue Entry contains a MESSAGE CODE 347a, an ACBID 347b, a HIA UID 347c, and a HIA BPID 347d. Word 1 of the Activity Queue Entry contains a MESSAGE 347e. Each of these fields will be discussed in greater detail in the discussions relating to the Host Interface Adapter and Index Processor. But briefly, the MESSAGE CODE indicates the type of operation to be performed by the Index Processor 236. For an operation type indicating a new entry has been made in the HIA ACB Buffer 338, the ACBID indicates the ACB Number of the entry in the HIA ACB Buffer where the Command Packet information resides. The HIA UID (HIA Identifier) field indicates the particular Host Interface Adapter 214 which put the Activity Queue Entry in the Activity Queue 346. In the interest of clarity, the description of the HIA BPID and the MESSAGE fields will be reserved for later sections of the specification.
The Monitor and Request Processing Node 350 in the Index Processor 236 monitors the Activity Queue 346 for Activity Queue Entries. When an entry is added to the Activity Queue, Processing Node 350 reads the Activity Queue Entry from the Activity Queue 346 as indicated by data flow Line 352. Based upon the information in the Activity Queue Entry, Processing Node 350 sends an ACB Request to the HIA 214 as shown by data flow Line 354. The ACB Request contains the ACB Number from the Activity Queue Entry.
Send Processing Node 356 takes the Command Packet from the entry in the HIA ACB Buffer 338 which is associated with the ACB Number specified in the ACB Request and sends the Command Packet to the Process Node 358 of Index Processor 236. Data flow Lines 360 and 362 show the flow of a Command Packet from the HIA ACB Buffer 338 to the Process Node 358.
Process Node 358 decodes the command contained in the Command Packet and references the Control Structures 364 which contain information for managing the available storage space in NVS 220 and referencing Cached Files 366 stored therein. For file access commands, File Information is read from the Control Structures 364 as shown by data flow Line 368. Based upon the File Information and the decoded command, Process Node 358 initiates the appropriate processing. For the rest of this discussion for FIG. 7 assume that either a read or write request was contained in the Command Packet, and the referenced file data is present in Cached Files 366.
Two pieces of information are returned to the HIA 214 from the Process Node 358: a Status and Address as indicated by data flow Lines 370 and 372. Both pieces of information are tagged with the ACB Number so that the Status and Address information are stored in the appropriate entry in the HIA ACB Buffer 338.
Read and Send Processing Node 374 and Receive and Write Processing Node 376 control the flow of data between the Data Mover 110 and the NVS 220. Processing Node 374 is active when file data is read from Cached Files 366, and Processing Node 376 is active when file data is being written to Cached Files 366. For both Processing Nodes 374 and 376, Data Transfer Parameters are read from an entry in the HIA ACB Buffer 338 as respectively shown by data flow Lines 378 and 380. The Data Transfer Parameters indicate the address within NVS 220 where the operation is to begin and the number of words to be transferred.
Read and Send Processing Node 374 sends a Reconnect Message to the Data Mover 110 as shown by data flow Line 382. The Reconnect Processing Node 384 on the Data Mover 110 receives the Reconnect Message and supplies the ACB Number in the Reconnect Message to Receive and Write Processing Node 386. Data flow Line 388 shows the ACB Number flowing from Processing Node 384 to Receive and Write Processing Node 386.
Receive and Write Processing Node 386 retrieves the Data Transfer Parameters from the appropriate ACB Buffer 326 as referenced by the ACB Number. Data flow Line 390 illustrates the Data Transfer Parameters retrieved by Processing Node 386 from ACB Buffers 326. The Data Transfer Parameters indicate the location in Application Storage 392 where the file data is to be written. As File Data is received by Processing Node 386, as shown by data flow Line 394, it is written to Application Storage 392. Data flow Line 396 shows the File Data flowing to Application Storage 392. In Host Interface Adapter 214, the Read and Send Processing Node 374 reads the referenced File Data from Cached Files 366 as illustrated by data flow Line 398.
As previously stated, Receive and Write Processing Node 376 writes file data to Cached Files 366. File Data is shown as being written to Cached Files 366 by data flow Line 400. The transfer of File Data from the Data Mover 110 to the Host Interface Adapter 214 is initiated by the Receive and Write Processing Node 376 by sending a Reconnect Message. Data flow Line 402 shows the Reconnect Message. The Reconnect Message contains an ACB Number which is forwarded to Read and Send Processing Node 404. The ACB Number is shown at Line 406. Read and Send Processing Node 404 obtains the Data Transfer Parameters from the appropriate ACB Buffer 326 as referenced by the ACB Number. Data flow Line 408 shows the Data Transfer Parameters. The Data Transfer Parameters indicate the real address in Main Storage 16 where the file data to transfer resides. Processing Node 404 reads the referenced File Data from Application Storage 392 as shown by data flow Line 410. Data flow Line 412 shows File Data being sent by Processing Node 404 in the Data Mover 110 to the Receive and Write Processing Node 376 in the Host Interface Adapter 214. The File Data is then written to Cached Files 366.
For each of Processing Nodes 374 and 376, when the respective data transfer tasks are complete, a Status is written to the appropriate entry in the HIA ACB Buffer 338. Data flow Lines 414 and 416 respectively show the writing of the Status for Processing Nodes 374 and 376.
Return Status Processing Node 418 reads the Program Status Packet from the HIA ACB Buffer 338 when an activity completes and sends the Program Status Packet to the Write Status Processing Node 420 on the Data Mover 110. Processing Node 420 writes the Program Status Packet to the appropriate entry in one of the ACB Buffers 326. Data flow Lines 422, 424, and 426 illustrate the flow of a Program Status Packet from the HIA ACB Buffer 338 to the ACB Buffers 326 on the Data Mover 110.
Once the Data Mover 110 has received a Program Status Packet in its ACB Buffers 326, the Program Status Packet can be returned to the File Cache Handler Software 208. Return Status Processing Node 428 reads the Program Status Packet from ACB Buffers 326. The Program Status Packet is then written to an available entry in the Status Packet Queue 316. The entry in the Status Packet Queue to which the Program Status Packet is written is selected from a queue of pointers to available entries in the Status Packet Queue 316. The File Cache Handler Software reads the Status from the entry in the Status Packet Queue 316 and returns the appropriate status to the application software from which the I/O request originated. Processing Node 430 and data flow Lines 432 and 434 illustrate the status reporting.
3. File Space Management
This section provides an overview of the logical organization and maintenance of storage space in the Outboard File Cache 102. The preferred embodiment for this invention is predicated upon the file management and input/output systems associated with the OS1100 and OS2200 operating systems from Unisys Corporation. Those skilled in the art will recognize that this invention could be adapted to the file management systems associated with other operating systems without departing from the spirit of this invention.
FIG. 16 illustrates the file space available in the Outboard File Cache. The File Space 502 is logically organized in Segments 503-0, 503-1, 503-2, . . . , 503-(n-1), wherein each Segment contains 1792 words. The number of Segments available varies according to the amount of RAM storage configured in the Outboard File Cache 102. A segment has the same logical format as a logical track, which is the basic unit of storage allocation in the 1100/2200 file system.
FIG. 17 shows the logical organization of a single Segment. Each Segment 503 contains 64 blocks, numbered consecutively from 0 to 63 and respectively referenced 504-0, 504-1, 504-2, . . . , 504-63. FIG. 18 shows the logical composition of a Block. Each block is comprised of 28 words, numbered consecutively from 0 to 27 and respectively referenced 506-0, 506-1, 506-2, . . . , 506-27.
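The segment geometry follows directly from the figures above: 64 blocks of 28 words give the 1792 words per Segment. As a minimal sketch, assuming a flat zero-based word offset into File Space (an illustrative convention, not one defined in this section), an offset can be split into Segment, block, and word numbers; `locate` is a hypothetical helper name.

```python
WORDS_PER_BLOCK = 28
BLOCKS_PER_SEGMENT = 64
WORDS_PER_SEGMENT = WORDS_PER_BLOCK * BLOCKS_PER_SEGMENT  # 28 * 64 = 1792


def locate(word_offset):
    """Map a flat word offset to (segment number, block number, word number)."""
    if word_offset < 0:
        raise ValueError("word offset must be non-negative")
    segment, within_segment = divmod(word_offset, WORDS_PER_SEGMENT)
    block, word = divmod(within_segment, WORDS_PER_BLOCK)
    return segment, block, word


# The last word of Segment 0 and the first word of Segment 1.
last_of_first = locate(WORDS_PER_SEGMENT - 1)
first_of_second = locate(WORDS_PER_SEGMENT)
```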
A Segment 503 may either be assigned or unassigned. Assigned means that the segment is directly associated with a specific track on a Disk 106 which belongs to a particular file and contains data which belongs to that file. An unassigned segment is not associated with any track or file. When the Outboard File Cache 102 is first started, all segments in the File Space 502 are unassigned. A Segment's transition from unassigned to assigned is initiated by Host 10 software and occurs when an appropriate command is sent to the Outboard File Cache 102. The transition from an assigned state to an unassigned state (hereafter referred to as "deassignment") is jointly controlled by the Host 10 and the Outboard File Cache 102. Any of the following three events may cause a Segment to be deassigned.
First, a Host 10 may send a command to the Outboard File Cache 102 which specifies that the Segment 503 is to be purged. Purged means that the identified Segment 503 should no longer be associated with the identified file. The segment may thereafter be used for storing segments of other files.
Second, File Space 502 in the Outboard File Cache 102 may be in short supply. The segment may be required to be assigned or "allocated" to a different file. The particular Segment 503 chosen depends upon the cache segment replacement algorithm implemented in the Outboard File Cache 102.
Third, the Outboard File Cache 102 may detect that a hardware condition has rendered the RAM space occupied by the segment unusable. The segment is deassigned and is thereafter unavailable for future assignment.
Deassignment of a segment may require that the data contained in the segment be copied to the Disk 106 and track with which it is associated. For example, if a segment to be deassigned contains data that does not also exist in the track with which it is directly associated, the track may need to be made current with the data contained in the segment. The data transfer is called destaging.
If the need to deassign a segment is detected and initiated by Host 10 software, the requirement to destage a segment is also determined by Host 10 software. The Outboard File Cache 102 may also initiate the deassignment of a segment, and the decision whether the segment must also be destaged is made according to the following rule: If the segment contains data that is not in its associated track, the segment must be destaged before it can be deassigned. This is initiated by sending a destage request from the Outboard File Cache 102 to the Host 10. The Host 10 responds by transferring the data in the identified segment(s) from the Outboard File Cache 102 to Disk 106. When the Host 10 has completed destaging the segment(s), the Outboard File Cache 102 may deassign the segment(s). If the segment and its associated track contain identical data, then no destaging is required and the Outboard File Cache 102 may unilaterally deassign the segment.
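The rule for cache-initiated deassignment can be summarized in a short sketch: a Segment whose data differs from its associated track must be destaged by the Host before it is deassigned, while a Segment identical to its track may be deassigned unilaterally. The `Segment` class, its `dirty` flag, and the function names are hypothetical modeling choices; the actual control structures are described elsewhere in this specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    track: Optional[str] = None  # None models an unassigned Segment
    dirty: bool = False          # True if the Segment holds data not in its track


def request_deassign(segment):
    """Cache-initiated deassignment; returns the step that was taken."""
    if segment.track is None:
        raise ValueError("segment is already unassigned")
    if segment.dirty:
        # Data differs from the track: a destage request goes to the Host first.
        return "destage requested"
    segment.track = None         # identical data: deassign without Host involvement
    return "deassigned"


def destage_complete(segment):
    """Called once the Host has copied the Segment's data to Disk."""
    segment.dirty = False


# A modified Segment needs a destage before it can be deassigned.
seg = Segment(track="file-A/track-7", dirty=True)
first = request_deassign(seg)
destage_complete(seg)
second = request_deassign(seg)
```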
FIG. 19 shows the logical division between Cache File Space, Nail Space, and Resident File Space in the File Space of the Outboard File Cache. The proportion of segments allocated between Cache File Space 522, Nail Space 523, and Resident File Space 524 varies according to runtime requirements. Cache File Space is allocated segment by segment to files. As demand for Cache File Space increases, allocation of segments is managed according to a cache replacement algorithm. Segments in Resident File Space are assigned to tracks of files which are to remain in File Space for an extended period of time. For example, Resident File Space may be used for files which are accessed frequently and for data which is recovery critical. The segments in Resident File Space are not eligible for replacement by the cache replacement algorithm for Cache File Space. An overview of Cache File Space management and Resident File Space management is provided in the following paragraphs.
A segment in Cache File Space 522 may either be "nailed" or "unnailed." A nailed segment is one that is permanently stored in the Outboard File Cache 102. A nailed segment remains in Cache File Space until it is purged by a Host 10. The Outboard File Cache never initiates deass |