Method for reducing disk I/O accesses in a multi-processor clustered type data processing system
5239643
Abstract
A method for minimizing mechanical I/O access operations on secondary storage devices in a data processing system having a plurality of processor units interconnected in a cluster configuration to permit each processor unit to request and obtain data that is resident only on a secondary storage device of one processor unit. The method involves the steps of maintaining at each processor unit information about each copy of data that has been sent from the unit to another unit to permit a second request to the unit to be serviced by transferring a copy of the data from the main memory which is storing the data to the requesting unit rather than servicing the request with a relatively slow I/O accessing operation to a secondary storage device.
Claims
We claim:
Description
FIELD OF THE INVENTION
______________________________________
State Indicator                      4 bits
Local Segment ID                     24 bits
Page Number Within Local Segment     16 bits
Last Entry Indicator                 1 bit
Processor ID                         8 bits
Index of Next Entry on Hash Chain    31 bits
______________________________________
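The entry layout in the table above can be illustrated with a short sketch. This is only an illustration under stated assumptions: the field order follows the table, but the packing order (first field in the most significant bits), the class name VsmtEntry, and the attribute names are hypothetical and are not taken from the text.

```python
from dataclasses import dataclass

# Field widths in bits, in the order listed in the table above.
# The attribute names are illustrative assumptions.
FIELDS = [
    ("state", 4),
    ("local_segment_id", 24),
    ("page_number", 16),
    ("last_entry", 1),
    ("processor_id", 8),
    ("next_index", 31),
]

ENTRY_BITS = sum(width for _, width in FIELDS)  # total width of one entry


@dataclass
class VsmtEntry:
    state: int
    local_segment_id: int
    page_number: int
    last_entry: int
    processor_id: int
    next_index: int

    def pack(self) -> int:
        """Pack the fields into one integer, first field most significant."""
        word = 0
        for name, width in FIELDS:
            value = getattr(self, name)
            if not 0 <= value < (1 << width):
                raise ValueError(f"{name}={value} does not fit in {width} bits")
            word = (word << width) | value
        return word

    @classmethod
    def unpack(cls, word: int) -> "VsmtEntry":
        """Inverse of pack(): peel fields off from the least significant end."""
        values = {}
        for name, width in reversed(FIELDS):
            values[name] = word & ((1 << width) - 1)
            word >>= width
        return cls(**values)
```

Summing the widths gives 84 bits per entry, so an implementation would presumably round each entry up to a convenient word multiple; the text does not say how.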
As discussed earlier, the inverted page table employed by each processor unit 10 functions to correlate page frame addresses in main memory with the virtual address of the page that is stored in the page frame. An inverted page table as shown in FIG. 5 has one entry for each page frame in its main memory. The data contained in each of the inverted page tables in the cluster is not per se duplicated in the VSMT that is stored by that processor unit. The function of the VSMT for a processor unit is to log entries that reflect that a virtual page which is being coordinated by that processor unit has been sent to another unit in the cluster. Stated differently, the VSMT for a processor unit is updated when a virtual page that the processor unit is coordinating is transferred to another processor unit of the cluster. The VSMT for a processor unit 10 is a list of entries as shown in FIG. 7. Entry into the table is by an index stored in the system hash anchor table shown in FIG. 8. The index is to the first entry in the section of the table whose virtual addresses hash to the same value. These sections are referred to as hash value sections. Entries in each hash value section are sequenced in ascending order of Local Segment ID (LSID). Within the same LSID, entries are sequenced in ascending order of virtual page index. The hash value for a virtual address is obtained, for example, by hashing the LSID of the segment containing the page with the page's Virtual Page Index. The hash value is an index into the anchor table to provide a pointer to the head of the hash entries in the VSMT. A shared mapped file consists of at least two virtual memory segments. At each using processing unit on which at least one application program has the file open there is one segment per open system call issued by an application program, and at the owning processing unit there is one segment. FIG. 9 shows a model of a shared mapped file. 
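The hash anchor table and the ordered hash value sections of the VSMT described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the table size NUM_ANCHORS, the XOR-based hash, the use of None in place of the Last Entry Indicator and the Empty bit, and all class and method names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_ANCHORS = 64  # assumed size of the hash anchor table


@dataclass
class Entry:
    lsid: int                          # Local Segment ID
    page: int                          # page number within the segment
    processor_id: int                  # unit that was sent a copy
    state: str                         # e.g. "ReadOnly"
    next_index: Optional[int] = None   # None stands in for "last entry"


class Vsmt:
    def __init__(self) -> None:
        self.entries: List[Entry] = []
        # None stands in for an empty hash chain.
        self.anchors: List[Optional[int]] = [None] * NUM_ANCHORS

    @staticmethod
    def hash_value(lsid: int, page: int) -> int:
        # Assumed hash: XOR the LSID with the page index.
        return (lsid ^ page) % NUM_ANCHORS

    def add(self, lsid: int, page: int, processor_id: int, state: str) -> int:
        """Insert an entry, keeping the chain ordered by (LSID, page)."""
        new_index = len(self.entries)
        self.entries.append(Entry(lsid, page, processor_id, state))
        h = self.hash_value(lsid, page)
        prev, cur = None, self.anchors[h]
        # Walk past every entry that sorts at or before the new one.
        while cur is not None and (
            (self.entries[cur].lsid, self.entries[cur].page) <= (lsid, page)
        ):
            prev, cur = cur, self.entries[cur].next_index
        self.entries[new_index].next_index = cur
        if prev is None:
            self.anchors[h] = new_index
        else:
            self.entries[prev].next_index = new_index
        return new_index

    def holders(self, lsid: int, page: int) -> List[Entry]:
        """Return all entries recorded for one page, in chain order."""
        found = []
        cur = self.anchors[self.hash_value(lsid, page)]
        while cur is not None:
            e = self.entries[cur]
            if (e.lsid, e.page) == (lsid, page):
                found.append(e)
            elif (e.lsid, e.page) > (lsid, page):
                break  # the chain is sorted, so no later match is possible
            cur = e.next_index
        return found
```

Keeping each chain sorted lets a lookup stop as soon as it passes the position where the page would appear, which is one plausible reason for the ordering the text describes.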
The segment at using processing unit 10a is bound to the segment at owning processing unit 10b using the Bind Remote Segment Service (BRSS). All of the pages of processing unit 10a's segment are mapped read-write to the owning processing unit segment using the Map Remote Page Range Service (MRPRS). To map a file, an application program first issues the open system call to open the file. The application program then issues the shmat system call to map the file into the application program's virtual address space. The shmat system call uses the Map Page Range Service (MPRS) to load the file into virtual memory. The application program can now directly access the file with load and/or store instructions. No other interaction with the operating system is required to access the file. When the application program is finished with the file, it can remove the file from its virtual address space by issuing the shmdt system call. Alternatively, the program could just issue the close system call since the close system call will automatically perform the shmdt system call as required. Sharing of the data in a given mapped file is performed in a cluster environment by binding the using processing unit's segment associated with the open file to the owning processing unit's segment associated with the open file. Each using processing unit that has a file mapped into a virtual memory segment has one segment per mapped file that application programs executing on it have opened. All application programs at a given using processing unit or the owning processing unit logically share the same segment. The owning processing unit's segment associated with the open file is the segment to which each using processing unit's segment associated with the open file is linked. One of the requirements for sharing virtual memory segments across a cluster configuration is to ensure that updates to mapped files behave the same in both a stand alone configuration and a cluster configuration. 
This implies that each store instruction executed against a mapped file must appear to be immediately applied to all copies of the mapped file shared throughout the cluster. This may be achieved by enforcing a set of consistency rules on access to mapped files. These consistency rules are: (1) At most one processing unit within the cluster configuration may have a copy of a given page of the segment if one or more application programs executing on that processing unit is (are) writing into the page. (2) Any number of processing units within the cluster configuration may have a copy of a given page of the segment if no application programs executing on any processing units in the cluster are writing into the page. The virtual memory managers (VMMs) at the owning processing unit and the using processing units cooperate to enforce these consistency rules. Each VMM enforces these consistency rules by using a prior art hardware page protection mechanism. Each page in shared virtual memory has two page protection keys associated with it. The former key is the requested protection key specified using the Create Segment Service (CSS) or the Protect Page Service (PPS). This key is used to determine what type of memory access is valid for the page. The latter key is the effective protection key. This key is used to enforce the data consistency rules for shared mapped files. Each page in a mapped file has one of three distinct consistency states at any given time. These consistency states apply to both using processing unit and owning processing unit shared mapped files. The consistency states for a given page of a mapped file that is shared across a cluster are recorded in the Virtual Shared Memory Table (VSMT) State Field (see FIG. 7). A description of how the VSMT data structure is updated is described below. The consistency states are as follows: NoAccess: A copy of the page is not in the main memory of the processing unit. 
Any access to the page by an application program will result in a page fault occurred interrupt to be signaled to the VMM. This state places no additional restrictions on the valid consistency states of a copy of the page at any other processing unit in the cluster. ReadOnly: A copy of the page is in the main memory of the processing unit and the copy of the page has not been modified since having been placed in main memory. The effective protection key for the page is read-only. A store access to the page will result in a page fault occurred interrupt to be signalled to the VMM if the requested protection key is read-write. A store access to the page will result in a protection exception occurred interrupt to be signalled to the VMM if the requested protection key is read-only. The former interrupt is used to inform the VMM that an application program attempted to access the page for writing. The latter interrupt is used to inform the VMM that an application program attempted to access the page for writing although it did not have permission to do so. This is generally considered an error condition, and appropriate error handling must be executed to handle the error in an appropriate way. Other processing units within the cluster may access the same page for reading when the page is in the ReadOnly consistency state. No other processing units within the cluster may access the same page for writing when the page is in the ReadOnly consistency state. ReadWrite: A copy of the page is in the main memory of the processing unit and the page has been modified since it was placed in the main memory of the processing unit. The effective protection key for the page is read-write. An access to the page for either reading or writing will be allowed without causing a page fault occurred interrupt to be signalled to the VMM. 
An access to the page for reading may cause a protection exception interrupt to be signalled to the VMM if the requested protection key does not allow read access to the page. An access to the page for writing may cause a protection exception interrupt to be signalled to the VMM if the requested protection key does not allow write access to the page. No other processing unit within the cluster may access the same page for either reading or writing when the page is in the ReadWrite consistency state. The consistency state of a page may be affected by the occurrence of one of several different events. These events are: (1) accesses to the page by application programs executing on the same processing unit; (2) execution of the Purge Page Range Service (PPRS) at the same processing unit; (3) execution of the Purge Segment Service (PSS) at the same processing unit; (4) execution of the VMM page replacement mechanism at the same processing unit; and (5) changes to the consistency state of the page at another processing unit within the cluster. The VMMs executing at each of the processing units within a cluster cooperate to ensure that an occurrence of any of these events results in a valid transition of the page consistency state. The valid transitions allowed by the VMMs are: NoAccess to ReadOnly: This consistency state transition is triggered by a page fault occurred interrupt having been signalled to the using processing unit VMM resulting from a read access to the page. Upon receipt of the page fault occurred interrupt, the using processing unit VMM sends a message to the owning processing unit VMM requesting that the owning processing unit VMM send the data for the page along with permission to access the page for reading to the using processing unit VMM. 
In some instances it may be desirable for the VMM at the using processing unit to "remember" the previous consistency state for the page and treat this transition as if it were a write access to the page occurring when the consistency state of the page was ReadWrite instead of a read access to the page occurring when the consistency state of the page was NoAccess. This variation in protocol would prevent two consistency state changes when a write access to the page follows a read access to the page which in practice is often the case. If this variant protocol is adopted, upon receipt of the page fault occurred interrupt, the using processing unit VMM sends a message to the owning processing unit VMM requesting that the owning processing unit VMM send the data for the page along with permission to access the page for writing to the using processing unit VMM. NoAccess to ReadWrite: This consistency state transition is triggered by a page fault occurred interrupt being signalled to the using processing unit VMM resulting from a write access to the page. Upon receipt of the page fault occurred interrupt, the using processing unit VMM sends a message to the owning processing unit VMM requesting that the owning processing unit VMM send the data for the page along with permission to access the page for writing to the using processing unit VMM. ReadOnly to ReadWrite: This consistency state transition is triggered by a page fault occurred interrupt being signalled to the using processing unit VMM resulting from a write access to the page. Upon receipt of the page fault occurred interrupt, the using processing unit VMM sends a message to the owning processing unit VMM requesting that the owning processing unit VMM send permission to access the page for writing to the using processing unit VMM. 
ReadOnly to NoAccess: This consistency state transition is triggered when a page frame containing an unmodified page is reassigned by the using processing unit VMM to hold another page of data or when an unmodified page is purged using either the Purge Segment Service (PSS) or the Purge Page Range Service (PPRS), or when the owning processing unit VMM requested that the using processing unit VMM change the effective protection key for the page to NoAccess, which would occur if an application program executing at another using processing unit that has the file mapped for read-write attempts to access the page for writing. ReadWrite to NoAccess: This consistency state transition is triggered when a page frame containing a modified page is selected for replacement by the using processing unit VMM or when a modified page is purged using either the Purge Segment Service (PSS) or the Purge Page Range Service (PPRS), or when the owning processing unit VMM requested that the using processing unit VMM change the effective protection key for the page to NoAccess, which would occur if an application program executing at another using processing unit that has the file mapped for read-write attempts to access the page for writing. The using processing unit VMM sends the data contained in the page to the owning processing unit VMM along with notification that the using processing unit VMM has changed the consistency state for the page to NoAccess, and has purged the page from its main memory. 
ReadWrite to ReadOnly: This consistency state transition is triggered when a page frame containing a modified page is selected for replacement by the using processing unit VMM or when a modified page is purged using either the Purge Segment Service (PSS) or the Purge Page Range Service (PPRS), or when the owning processing unit VMM requested that the using processing unit VMM change the effective protection key for the page to NoAccess, which would occur if an application program executing at another using processing unit that has the file mapped for read-write attempts to access the page for writing. The using processing unit VMM sends the data contained in the page to the owning processing unit VMM along with notification that the using processing unit VMM has changed the consistency state for the page to ReadOnly, and has set the effective protection key to allow read-only access to the page. The owning processing unit VMM ensures that a valid combination of consistency states exists at each of the nodes accessing the mapped file. This is achieved by having the owning processing unit VMM maintain a list of writers to each page that is managed under the owning processing unit's consistency control algorithm and by having the owning processing unit send requests to using processing units to change the consistency state of a given page of data. Any request by any using processing unit to access the page for reading will cause the owning processing unit to send a request to a using processing unit that has an application program executing on it that has written to the page to change the consistency state for the page from ReadWrite to ReadOnly. 
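The consistency rules and the valid state transitions listed above can be summarised in a small sketch. It is an illustration only: the enum, the function names, and the representation of a page's cluster-wide status as a list of per-unit states are assumptions made for the example, not structures named in the text.

```python
from enum import Enum
from typing import Iterable


class State(Enum):
    NO_ACCESS = "NoAccess"
    READ_ONLY = "ReadOnly"
    READ_WRITE = "ReadWrite"


# The six transitions enumerated in the text.
VALID_TRANSITIONS = {
    (State.NO_ACCESS, State.READ_ONLY),
    (State.NO_ACCESS, State.READ_WRITE),
    (State.READ_ONLY, State.READ_WRITE),
    (State.READ_ONLY, State.NO_ACCESS),
    (State.READ_WRITE, State.NO_ACCESS),
    (State.READ_WRITE, State.READ_ONLY),
}


def is_valid_transition(old: State, new: State) -> bool:
    """True if the text lists old -> new as an allowed transition."""
    return (old, new) in VALID_TRANSITIONS


def is_valid_combination(states: Iterable[State]) -> bool:
    """Check the two consistency rules across all units for one page.

    Rule 1: a ReadWrite copy must be the only copy in the cluster.
    Rule 2: any number of ReadOnly copies may coexist if nobody writes.
    """
    states = list(states)
    writers = sum(s is State.READ_WRITE for s in states)
    readers = sum(s is State.READ_ONLY for s in states)
    return writers == 0 or (writers == 1 and readers == 0)
```

Under this model the owning processing unit's job amounts to requesting state changes at using units until is_valid_combination would hold again for the page.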
Any request by any using processing unit to access the page for writing will cause the owning processing unit to send a request to a using processing unit that has an application program executing on it that has the page in the ReadWrite consistency state to change the consistency state for the page to NoAccess, or to send a request to each using processing unit that has the page in the ReadOnly consistency state to change the consistency state for the page to NoAccess. The protocol for updating the VSMT of each processor unit to reflect the processing, by other processor units, of the pages that it is coordinating will depend to a large extent on the particular application. In some applications it may be more efficient to notify the Access Coordinator when the copy is no longer in main memory of the requestor so that the coordinator will not service another request by a triangular I/O operation involving that unit. Likewise, the protocol for protecting pages being written by more than one processor unit could take the form of many of the prior art protection schemes involving locking bits. The techniques discussed by A. Chang and M. Mergen in an article entitled "801 Storage: Architecture and Programming", presented in the Proceedings of the 1987 Conference of the ACM Special Interest Group on Operating Systems on Nov. 26, 1987 may be employed. Typical operations will now be described in connection with FIGS. 10. FIG. 10 is a block diagram of a cluster configuration consisting of three processing units 10a, 10b, and 10c, a switch 11, and three communication links 12 that connect the processing units to the switch. Each of the processing units has a secondary storage device which may be thought of as a disk attached directly to it. Except for the contents of the files stored on the secondary storage devices attached to an individual processing unit, processing units 10a, 10b, and 10c should be thought of as identical. We shall use FIGS. 
10-13 to illustrate typical operations in the cluster configuration. The description of these operations and the flow of messages is at a level of detail such that a person skilled in the art of implementing a software virtual memory manager component of a general purpose operating system will be able, without undue experimentation, to implement the method. FIG. 11 is a flow chart illustrating the steps for creating a new file that is stored on one of the processing units in the cluster. In Step A of FIG. 11, an application program executing on processing unit 10a uses the create [sic] system call to create the file "/u/smorgan/database". In Step B the operating system executing on processing unit 10a intercepts the system call from the application program. In Step C the operating system examines the "root" system directory. The system directories individually contain lists of file names each with the name of the access coordinator for that file. We shall assume for the purpose of discussion that a file naming convention and directory structure of the UNIX operating system is used, although persons skilled in the art will understand that this assumption is not necessary for the purpose of implementing the method. For example, the application program may have asked to create the file "/u/smorgan/database". The operating system examines the root system directory, called "/", and finds that it contains an entry for "u" and that u is a directory. In Step D the operating system examines the u directory and determines that it contains an entry for "smorgan" and that smorgan is a directory. In Step E the operating system examines the smorgan directory and finds that it does not contain an entry for "database". Steps C-E are called the directory lookup phase of the create system call. In Step F the operating system determines which processing unit in the cluster is a good candidate to serve as the access coordinator for the file once it is created. 
The operating system uses some algorithm, whose exact working is unnecessary for an understanding of the method, to make a judicious choice. For example, the choice of an access coordinator might be based on a computation of which of the processing units is least heavily loaded with access coordinator duties for other existing files. By picking the least heavily loaded processing unit, the operating system might be making an assumption that the configuration will provide the best overall performance if the access coordination function is spread uniformly among the various processing units in the configuration. Having chosen one processing unit in the configuration as the access coordinator for the to-be-created file /u/smorgan/database, which for the purpose of discussion is assumed to be processing unit 10c, in Step G processing unit 10a sends message 1 to processing unit 10c to create the file. In Step H, upon receipt of message 1 from processing unit 10a, processing unit 10c determines that the file does not yet exist within the configuration by examining the various shared directories in a way similar to that performed by processing unit 10a in the directory lookup phase (Steps C-E) of the create system call. In Step I processing unit 10c creates the file and assigns it a file identifier FID. For the purpose of this discussion we shall assume that a file identifier is a 32 bit integer that uniquely identifies the file in the configuration. The file identifier may have been composed by concatenating the processing unit identifier for the access coordinator (processing unit 10c) with a number chosen by the access coordinator that uniquely identifies the file to the access coordinator. A processor identifier is a 7 bit integer that uniquely identifies a given processing unit within a cluster configuration. 
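The coordinator choice of Step F and the file identifier composition of Step I can be sketched as follows. This is a hedged illustration: the text gives only the 32 bit and 7 bit widths and the idea of concatenation, so the 25 bit local number, the placement of the processor identifier in the high-order bits, the tie-breaking rule, and all function names are assumptions.

```python
from typing import Dict, Tuple

FID_BITS = 32                      # file identifier width stated in the text
PID_BITS = 7                       # processor identifier width stated in the text
LOCAL_BITS = FID_BITS - PID_BITS   # assumed: remaining 25 bits for the local number


def choose_coordinator(load: Dict[int, int]) -> int:
    """Pick the unit with the fewest coordinated files (ties: lowest PID).

    `load` maps processor identifier -> number of files it coordinates.
    """
    return min(load, key=lambda pid: (load[pid], pid))


def make_fid(pid: int, local_number: int) -> int:
    """Concatenate PID (assumed high-order) with a coordinator-chosen number."""
    if not 0 <= pid < (1 << PID_BITS):
        raise ValueError("processor identifier does not fit in 7 bits")
    if not 0 <= local_number < (1 << LOCAL_BITS):
        raise ValueError("local file number does not fit in 25 bits")
    return (pid << LOCAL_BITS) | local_number


def split_fid(fid: int) -> Tuple[int, int]:
    """Recover (pid, local_number) from a file identifier."""
    return fid >> LOCAL_BITS, fid & ((1 << LOCAL_BITS) - 1)
```

With this layout any unit can recover the access coordinator from the file identifier alone, although the text also stores the coordinator's identifier in the system directories.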
In Step J processing unit 10c sends message 2 to each of the other processing units 10a and 10b in the configuration, indicating that the file identified by FID has been created. Message 2 includes the name of the file, its file identifier FID, and the processor identifier PID of the access coordinator. In Step H, upon receipt of message 2 from processing unit 10c, each of the other processing units 10a and 10b updates its copy of the system directories to indicate the existence of the newly created file /u/smorgan/database along with the file identifier FID and the access coordinator processor identifier PID for the file. In Step K, upon receipt of message 2 from processing unit 10c, processing unit 10a determines that the file /u/smorgan/database has been created, and in Step L processing unit 10a informs the application program executing on processing unit 10a that this is the case. FIGS. 12A and 12B are a flow chart illustrating how an existing file is opened by an application program running on a processing unit within the cluster. In Step A of FIG. 12A, an application program executing on processing unit 10a uses the open system call to open the file "/u/smorgan/database" for read-write access. In Step B the operating system executing on processing unit 10a intercepts the system call from the application program. In Step C the operating system examines the root system directory "/" and finds that it contains an entry for "u" and that u is a directory. In Step D the operating system examines the u directory for "smorgan" and determines that smorgan is a directory. In Step E the operating system examines the smorgan directory for "database" and determines: (1) that database is a file; (2) that the access coordinator for the file is processing unit 10c; and (3) that the file identifier FID is associated with the file. 
In Step F the operating system executing at processing unit 10a sends message 1 containing file identifier FID to processing unit 10c, requesting that the file identified by FID be opened on behalf of an application program executing on processing unit 10a. In Step G, upon receipt of message 1 from processing unit 10a, processing unit 10c determines the location on its secondary storage device of file descriptor FD, which describes the file identified by FID. The processing unit 10c locates file descriptor FD by using file identifier FID to index into the File Descriptor Table (FDT) located at processing unit 10c. The FDT located at processing unit 10c contains a file descriptor for each existing file for which processing unit 10c serves as access coordinator. A file descriptor identifies the number and location of disk blocks on secondary storage devices attached to processing unit 10c that are part of a given file. In addition, a file descriptor contains other information about a file, such as its length, the time it was most recently accessed, the name of its owner, etc. Persons skilled in the art will understand that the additional information contained in a file descriptor is irrelevant insofar as developing an understanding of the method is concerned; thus, it is not discussed. In Step H processing unit 10c determines that the file identified by FID is not currently open, i.e. it does not currently have a local virtual segment associated with it. In Step I processing unit 10c uses the Create Segment Service (CSS) to create a virtual memory segment for the file FID. In doing so, processing unit 10c specifies that the segment is to be created using file descriptor FD, and also that the requested protection key for the segment to be created is to be read-write. CSS returns a segment identifier S by which the segment it created may be identified. 
In Step J processing unit 10c sends message 2 to processing unit 10a responding that processing unit 10c has successfully opened the file identified by FID on behalf of processing unit 10a. Message 2 identifies the segment identifier S as the segment associated with the file identified by FID. In Step K processing unit 10a determines that the file identified by FID is not currently open, i.e. it does not currently have a local virtual segment associated with it. In Step L processing unit 10a creates a local segment SA for the file identified by FID using the Create Remote Segment Service (CRSS). CRSS takes the segment identifier S and creates a "dummy" segment SA. A dummy segment is a local segment with a segment identifier and a Segment Identifier Table (SIT) entry, but without an External Page Table (XPT). In Step M processing unit 10a uses the Bind Remote Segment Service (BRSS) to bind the local segment SA to the global segment S. BRSS takes the segment identifiers S and SA and the processor identifier PID of the access coordinator (processing unit 10c), and modifies the SIT entry associated with segment SA to indicate that segment SA relates to segment S whose access is coordinated by processing unit PID. In Step N processing unit 10a determines that file /u/smorgan/database has been successfully opened and informs the application program that this is the case. FIG. 13 is a flow chart illustrating how an open file is loaded into the virtual memory shared in a cluster configuration. In Step A of FIG. 13, an application program executing on processing unit 10a uses the shmat system call to map the local segment SA associated with the open file "/u/smorgan/database" into the application program's virtual address space for read-write access. In Step B the operating system executing on processing unit 10a intercepts the system call from the application program. 
In Step C the operating system determines that the local segment SA is bound to a remote segment S whose access is coordinated by processing unit 10c. Processing unit 10a makes this determination by examining the Segment Identifier Table (SIT) relating a given segment identifier for a currently open file to the appropriate remote segment for the currently open file and the processor identifier of the access coordinator associated with that segment. In Step D processing unit 10a uses the Map Page Range Service (MPRS) to map the contents of segment SA into the virtual address space of the application program. In Step E processing unit 10a determines that the file /u/smorgan/database has been successfully mapped into the virtual address space of the application program and informs the application program that this is the case. FIG. 14 is a flow chart illustrating the steps performed by the access coordinator when a using processing unit wishes to page-in a copy of a page that is not in the memory of any of the processing units in the configuration. This description assumes for the purpose of discussion that: (1) an application program executing on processing unit 10a has previously opened the file and had the file mapped into the application program's virtual address space; and (2) that processing unit 10c serves as the access coordinator for the file. In Step A of FIG. 14 an application program executing on processing unit 10a attempts to access for reading page P of segment SL containing file F. In Step B the application program page faults. In Step C the operating system executing on processing unit 10a intercepts the page fault and determines that it was caused by a read access to page P of segment SL by the application program. In Step D processing unit 10a determines that segment SL is a remote segment whose access is coordinated by processing unit 10c. In Step E processing unit 10a determines that segment SL is bound to remote segment SR. 
In Step F processing unit 10a sends message 1 to processing unit 10c requesting that processing unit 10c send a copy of page P of segment SR to processing unit 10a. In Step G, upon receipt of message 1, processing unit 10c examines its VSM Table looking for entries for page P of segment SR. Assume for the sake of discussion that exactly one entry exists in the VSM Table for page P of Segment SR, and that the entry indicates that processing unit 10b has a copy of the page in its memory, and that the ReadOnly consistency state is associated with that copy of the page. In Step H processing unit 10c determines that segment SR is bound to segment ST in processing unit 10b. In Step I processing unit 10c sends message 2 to processing unit 10b requesting that processing unit 10b send a copy of page P of segment ST to processing unit 10a and that the copy of the page have the ReadOnly consistency state associated with it. Message 2 further indicates that processing unit 10a refers to segment ST as segment SL. FIGS. 15A and 15B are a flow chart illustrating the detailed steps of how the VSMT is updated by the access coordinator when a page of data is transferred from one processing unit to another processing unit. This description assumes for the purpose of discussion that: (1) an application program executing on processing unit 10a has previously opened the file and had the file mapped into the application program's virtual address space; and (2) that processing unit 10c serves as the access coordinator for the file. In Step A of FIG. 15A an application program executing on processing unit 10a attempts to access for reading page P of segment SA containing the file F. In Step B the application program page faults. In Step C the operating system executing on processing unit 10a intercepts the page fault and determines that it was caused by a read access to page P of segment SA by the application program. 
In Step D processing unit 10a determines that segment SA is a local segment bound to remote segment S whose access is coordinated by processing unit 10c. In Step E processing unit 10a sends message 1 to processing unit 10c requesting that processing unit 10c send a copy of page P of segment S to processing unit 10a. In Step F, upon receipt of message 1, processing unit 10c examines its VSM Table looking for entries for page P of segment S. We shall assume for the purpose of discussion that: (1) exactly one entry exists in the VSM Table for page P of Segment S; (2) the entry indicates that processing unit 10b has a copy of the page in its memory; (3) the ReadOnly consistency state is associated with that copy of the page. In Step G processing unit 10c sends message 2 to processing unit 10b requesting that processing unit 10b send a copy of page P of segment S to processing unit 10a and that the copy of the page have the ReadOnly consistency state associated with it. In Step H processing unit 10c adds an entry to its VSM Table indicating that processing unit 10a has been sent a copy of page P of segment S with the ReadOnly consistency state associated with it. In order to add an entry to the VSM Table for page P of segment S, the following steps must be performed by processing unit 10c: (H1) Hash the segment identifier SR and the page number P together to locate the hash anchor table entry that would correspond to page P of segment SR if there were already an entry for this page in the VSM Table. (H2) Determine whether the hash chain is empty. Perform this operation by examining the Empty bit in the hash anchor table entry for the computed hash value. In the case at hand the hash chain contains at least one entry, which is the entry for page P of segment S located at processing unit 10b; thus, the Empty bit will be clear. (H3) Follow the hash chain for the computed hash value until reaching the entry for page P of segment S at processing unit 10b. 
We shall refer to this below as entry E of the VSM Table. (H4) Allocate an entry F in the VSM Table by taking an entry off the free-list of currently unused VSM Table entries. Allocating an entry in a data structure from a free list is well known, simple, and will be understood by a person skilled in the art of computer programming; therefore, it is not illustrated here. (H5) Fill the appropriate values into entry F. Specifically, fill in: (a) the Processor Identifier field with an integer that uniquely identifies processing unit 10a; (b) the Page Number field with the page number P; (c) the Local Segment Identifier field with the segment identifier S; and (d) the State field with an integer that uniquely identifies consistency state ReadOnly. (H6) Add entry F to the hash chain for the computed hash value. Perform this operation by: (a) copying the Next Entry Index field of entry E into the Next Entry Index field of entry F; then (b) copying the number F into the Next Entry Index field of entry E. After Step H6 has been completed, entry F is on the hash chain for the computed hash value. In Step I, upon receipt of message 2 from processing unit 10c, processing unit 10b locates page P of segment S in its main memory. In Step J processing unit 10b sends message 3 containing page P of segment S to processing unit 10a. Message 3 indicates that page P of segment S has the ReadOnly consistency state associated with it. In Step K, upon receipt of message 3 from processing unit 10b, processing unit 10a places the copy of page P of segment S in its main memory, changes the virtual address of page P of segment S to indicate that the page is page P of segment SA, then sets the effective protection key for the page to ReadOnly.
FIG. 16 is a flow chart illustrating the steps performed by the access coordinator when a using processing unit sends a request to cast out a page from its main memory. In Step A of FIG.
16 processing unit 10a selects page P of segment SA for replacement. This would happen in the normal course of events if, for example, the virtual memory manager (VMM) component of the operating system executing on processing unit 10a determined that page P of segment SA had not been accessed by any application program for an extended period of time. In Step B, processing unit 10a determines that page P is contained within segment SA, and that segment SA is a local segment bound to remote segment S whose access is coordinated by processing unit 10c. In Step C processing unit 10a sends message 1 to processing unit 10c requesting that processing unit 10a be allowed to cast page P of segment S out of its main memory. In Step D, upon receipt of message 1 from processing unit 10a, processing unit 10c examines its VSM Table for all entries corresponding to page P of segment S. We shall assume for the purpose of discussion that: (1) exactly two entries exist in its VSM Table for page P of segment S; (2) the former entry indicates that processing unit 10a has a copy of page P of segment S in its memory in ReadOnly consistency state; and (3) the latter entry indicates that processing unit 10b also has a copy of page P of segment S in its memory, and that this copy is also in ReadOnly consistency state. In Step E processing unit 10c determines that, since: (1) there are currently two copies of the page cached in the main memory of processing units within the cluster configuration; and (2) both copies of the page are in ReadOnly consistency state, then processing unit 10a may be allowed to cast page P of segment S out of its main memory without the significant degradation of performance that re-reading page P of segment S from secondary storage might later incur. In Step F processing unit 10c sends message 2 to processing unit 10a responding that processing unit 10a may cast page P of segment S out of its main memory. 
In Step G, upon receipt of message 2 from processing unit 10c, processing unit 10a casts page P of segment S out of its main memory.
FIG. 17 is a flow chart illustrating the steps performed by the access coordinator when a using processing unit sends a request to cast a page out of its main memory and there is no copy of the page in the memory of any other processing unit. In Step A of FIG. 17 the VMM component of the operating system executing at processing unit 10a selects page P of segment SA as a candidate for replacement. In Step B processing unit 10a sends message 1 to processing unit 10c requesting that processing unit 10a be allowed to cast page P of segment S out of its main memory. In Step C, upon receipt of message 1, processing unit 10c examines its VSM Table for entries for page P of segment S. We shall assume for the purpose of discussion that no entry exists in its VSM Table for page P of segment S. In Step D processing unit 10c determines that it has enough space in its main memory to hold page P of segment S and allocates a frame for that purpose. In Step E processing unit 10c sends message 2 to processing unit 10a requesting that processing unit 10a send a copy of page P of segment S to processing unit 10c. In Step F, upon receipt of message 2 from processing unit 10c, processing unit 10a sends message 3 containing page P of segment S to processing unit 10c. In Step G, upon receipt of message 3 from processing unit 10a, processing unit 10c adds page P of segment S to its main memory. In Step H processing unit 10c updates its VSM Table to indicate that a copy of page P with the ReadOnly consistency state associated with it has been moved from processing unit 10a's main memory to processing unit 10c's main memory. In order to update an entry in the VSM Table for page P of segment S, the following steps must be performed by processing unit 10c:
(H1) Hash the segment identifier S and the page number P together to locate the hash anchor table entry that would correspond to page P of segment S if there were already an entry for this page in the VSM Table. (H2) Determine whether the hash chain is empty. Perform this operation by examining the Empty bit in the hash anchor table entry for the computed hash value. In the case at hand the hash chain contains at least one entry, which is the entry for page P of segment S located at processing unit 10b; thus, the Empty bit will be clear. (H3) Follow the hash chain for the computed hash value until the entry for page P of segment S at processing unit 10a is found; this is entry E of the VSM Table. (H4) Update the Processor Identifier field of entry E with an integer that uniquely identifies processing unit 10c. After Step H4 has been completed, entry E has been updated.
FIG. 18 is a flow chart illustrating the steps performed by the access coordinator when it determines that a given page of data must be cached by a shared memory processing unit. In Step A of FIG. 18 the VMM component of the operating system executing at processing unit 10a selects page P of segment S as a candidate for replacement. In Step B processing unit 10a sends message 1 to processing unit 10c, the access coordinator for segment S, requesting that processing unit 10a be allowed to cast page P of segment S out of its main memory. In Step C, upon receipt of message 1, processing unit 10c examines its VSM Table for entries for page P of segment S. Assume for the purpose of discussion that no entry exists in its VSM Table for page P of segment S. In Step D processing unit 10c determines that it does not have enough space in its main memory to cache page P of segment S.
In Step E processing unit 10c determines that processing unit 10b is acting as a shared memory unit for the cluster configuration and sends message 2 to processing unit 10b requesting that processing unit 10b cache a copy of page P of segment S in its main memory. In Step F processing unit 10c adds an entry to its VSM Table indicating that processing unit 10b now holds a copy of page P of segment S with the ReadOnly consistency state associated with it. In order to add an entry to the VSM Table for page P of segment S, the following steps must be performed by processing unit 10c: (F1) Hash the segment identifier S and the page number P together to locate the hash anchor table entry that would correspond to page P of segment S if there were already an entry for this page in the VSM Table. (F2) Determine whether the hash chain is empty. Perform this operation by examining the Empty bit in the hash anchor table entry for the computed hash value. In the case at hand the hash chain contains at least one entry, which is the entry for page P of segment S located at processing unit 10b; thus, the Empty bit will be clear. (F3) Follow the hash chain for the computed hash value until the entry for page P of segment S at processing unit 10a is found; this is entry E of the VSM Table. (F4) Update the Processor Identifier field of entry E with an integer that uniquely identifies processing unit 10b. After Step F4 has been completed, entry E has been updated. In Step G, upon receipt of message 2 from processing unit 10c, processing unit 10b allocates a page frame in its main memory and sends message 3 to processing unit 10a requesting that processing unit 10a send page P of segment S to processing unit 10b. In Step H, upon receipt of message 3 from processing unit 10b, processing unit 10a sends page P of segment S, along with the ReadOnly consistency state associated with that page, to processing unit 10b.
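The VSM Table operations used in FIGS. 15 through 18 (hashing the segment identifier with the page number, following the hash chain, linking a new entry F after an existing entry E, and updating the Processor Identifier field of entry E) can be illustrated with a minimal Python sketch. It is not the patented implementation. The class and method names (`VSMTable`, `add`, `move`, `holders`), the table sizes, the XOR hash, and the integer codes `READ_ONLY` and `NIL` are all hypothetical; `NIL` in an anchor slot stands in for the Empty bit. The sketch does not keep entries in ascending order of Local Segment ID and page index within a hash value section, and placing a new entry at the head of an empty chain is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

READ_ONLY = 1   # hypothetical integer code for the ReadOnly consistency state
NIL = -1        # hypothetical "no entry" index; stands in for the Empty bit


@dataclass
class Entry:
    state: int = 0
    segment: int = 0      # Local Segment ID
    page: int = 0         # Page Number Within Local Segment
    processor: int = 0    # Processor ID
    next: int = NIL       # Index of Next Entry on Hash Chain


class VSMTable:
    def __init__(self, n_entries: int = 64, n_anchors: int = 16) -> None:
        self.entries = [Entry() for _ in range(n_entries)]
        self.free = list(range(n_entries))   # free list of unused entry indices
        self.anchors = [NIL] * n_anchors     # hash anchor table

    def _hash(self, segment: int, page: int) -> int:
        # Assumption: XOR then modulo; the text only says the two are hashed.
        return (segment ^ page) % len(self.anchors)

    def find(self, segment: int, page: int,
             processor: Optional[int] = None) -> int:
        """Follow the hash chain; return the matching entry index or NIL."""
        i = self.anchors[self._hash(segment, page)]
        while i != NIL:
            e = self.entries[i]
            if (e.segment, e.page) == (segment, page) and \
                    processor in (None, e.processor):
                return i
            i = e.next
        return NIL

    def holders(self, segment: int, page: int) -> List[int]:
        """Processor IDs of every unit recorded as holding the page."""
        found = []
        i = self.anchors[self._hash(segment, page)]
        while i != NIL:
            e = self.entries[i]
            if (e.segment, e.page) == (segment, page):
                found.append(e.processor)
            i = e.next
        return found

    def add(self, segment: int, page: int, processor: int, state: int) -> int:
        """Allocate entry F, fill it in, and link it onto the hash chain."""
        h = self._hash(segment, page)
        e_idx = self.find(segment, page)       # existing entry E, if any
        f_idx = self.free.pop()                # raises IndexError if table is full
        self.entries[f_idx] = Entry(state, segment, page, processor)
        if e_idx == NIL:
            # Assumption: with no entry E, F becomes the head of the chain.
            self.entries[f_idx].next = self.anchors[h]
            self.anchors[h] = f_idx
        else:
            self.entries[f_idx].next = self.entries[e_idx].next  # E.next -> F.next
            self.entries[e_idx].next = f_idx                     # F -> E.next
        return f_idx

    def move(self, segment: int, page: int,
             old_processor: int, new_processor: int) -> int:
        """Rewrite the Processor ID of the entry held for old_processor."""
        e_idx = self.find(segment, page, old_processor)
        if e_idx == NIL:
            raise KeyError("no entry for that page at that processor")
        self.entries[e_idx].processor = new_processor
        return e_idx


table = VSMTable()
table.add(segment=7, page=3, processor=2, state=READ_ONLY)   # copy at unit 2
table.add(segment=7, page=3, processor=1, state=READ_ONLY)   # copy sent to unit 1
table.move(segment=7, page=3, old_processor=1, new_processor=3)
print(sorted(table.holders(7, 3)))   # prints [2, 3]
```

In this sketch processor IDs 1, 2 and 3 play the roles of processing units 10a, 10b and 10c. The `move` call corresponds to the case in which the page cast out by one unit is taken over by another unit, so the coordinator rewrites the Processor Identifier field instead of allocating a new entry.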
While the preferred embodiment of applicant's method has been described for use in a virtual memory environment, it will be apparent that the method is equally applicable to a cluster configuration comprising a plurality of processing units which do not employ a virtual type of memory. The underlying problem of I/O disk access for obtaining a copy of data that is currently in the main memory of another unit can be solved in the manner taught in this application, namely by maintaining information on what data has been sent to the main memory by what processor unit and transferring a copy of that data to the requestor from the main memory having the copy rather than performing an I/O operation to disk to obtain the data.
It will be apparent to those persons skilled in the art that other modifications may be made in the preferred embodiment of the method without departing from the spirit of the invention and the scope of the appended claims.
