Communications between partitioned host processors and management processor
6785892

Abstract
An inventive protocol for communicating between a management processor and host processors allows for the cooperative management of resources among host processors within a partition and also among a set of partitions in a computer system, wherein each partition may function under an instantiation of an operating system with a group of host processors. The protocol employs a message passing system using mailbox pairs in fixed but moveable or relocatable locations within the computer system shared memory. The messages share a format having specific codes or descriptors that act as codes for coordination of message interpretation. These codes include at least a validity flag and a sequence enumerator, and in a response message of a request/response message pair, a status indicator. Additionally, routing codes and function codes and code modifiers may be provided. Specific implementation details and messages are described to enable the smooth functioning of complex multiprocessor systems.

Description
A portion of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
TABLE 1
REQUEST mailbox format.
##STR1##
Note that a byte is 8 bits, and each column represents a byte. The descriptions below define what is in each column, or set of columns, that represents a reserved area in the message format defined in the table. If a buffer is used in a message, the buffer at another location (identified by the buffer address and buffer size information) will contain information the message recipient needs for the communication to be complete. The buffer address/size areas may instead be coded with message information, and these codes may change for different protocol versions. The owner of the mailbox fills in the appropriate fields and then sets the valid flag; the target of the message saves the data before clearing the valid flag.
Description of Elements of the Table
function code: Located preferably at byte 00, the function code tells what kind of message this is. Its interpretation depends on the direction (Host to MIP or vice versa). For example, if the request is to change mailbox locations, a function code identifying that function of the instant message will be in this byte. These codes from a current embodiment are in columns g and l of FIGS. 6 and 7, respectively.
MIP routing code: This identifies the kind of message. Specifically, it indicates which process in the MIP or MIS will need to deal with the message, so that the front end process within the MIP can direct the message for appropriate response.
host routing code: This performs the same function for the Host processor entity. It identifies the kind of message and where within the message handling system of the Host the message should be directed. Thus, in a Host to MIP message, the Host routing code will be sent back to the front end message receiving processor and from there be directed to the process within the host needing to get the reply. In a MIP to Host message, the MIP routing code will perform this function and the Host routing code will just be rewritten for the Host. While this kind of routing protocol may be found in communications systems, it is heretofore unknown in the context of controlling the run-time operating environment of a multiprocessor computer system.
code modifiers: These are flags which designate specific attributes of the class/function code.
bit 7: If set, this is a MIP to Host message; if clear, it is a Host to MIP message.
bit 6: If set, this indicates that there is a buffer address/size indicator in the message; otherwise the message space is usable for other purposes (codes) if needed. (NOTE: if the buffer bit is set in the request, indicating a buffer will be used for the request, and there is no buffer for the response, the buffer bit will preferably be clear in the response.)
Other bits in the code modifiers are usable as desired for improvements/modifications.
sequence number: This is an identifier written by the requestor (the host in a Host→MIP message, or the MIP in a MIP→Host message), and it is returned in the response mailbox. The requestor must guarantee that the sequence number is different and increasing for each message initiated from the requestor (even if there are multiple users of the requestor, such as when more than one processor in a Sub-POD may be sending requests). The sequence number is allowed to wrap back to zero. If the requestor is the MIP, a different sequence will be kept for each mailbox user. The sequence numbers used by the requestors (host or MIP) can overlap; coordination of the sequence numbers for messages that are initiated in the separate directions is not required.
buffer address: This is the address of a buffer of information that is an extension to the information in the mailbox. All buffers are allocated by the host in host memory. The buffer address in the preferred embodiment is always in bytes 08 through 0f.
buffer size: This is the size in bytes of the buffer whose address is specified in the buffer address field. The buffer size occupies bytes 10 through 13. This field may also be used for other data if bit 6 of the code modifier field is clear.
valid flag: Two values are currently used. 0=contents not valid; 1=contents valid; values 2-ff are reserved for future use. In preferred embodiments the receiver (host or MIP) of a message is enabled to write over the valid flag area of the mailbox in which it receives a message in order to let the sender know the message was received by the receiver.
The rest of the mailbox, if used, is function dependent.
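The field offsets described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the table image is not reproduced here, so the placement of the MIP and host routing codes at bytes 01 and 02, the 64-byte mailbox size, the valid flag in the last byte, the little-endian byte order, and the function code value used in the example are all hypothetical. Only the positions of the function code (byte 00), code modifiers (byte 03), sequence number (bytes 04-07), buffer address (bytes 08-0f) and buffer size (bytes 10-13) come from the description.

```python
import struct

MAILBOX_SIZE = 64        # assumption: one mailbox occupies a 64-byte cache line
VALID_OFFSET = 0x3F      # assumption: valid flag kept in the last byte
MOD_MIP_TO_HOST = 0x80   # code modifier bit 7: set for MIP to Host
MOD_HAS_BUFFER = 0x40    # code modifier bit 6: set when buffer address/size are present

# Bytes 00-13: function, MIP routing, host routing, modifiers,
# sequence number (04-07), buffer address (08-0f), buffer size (10-13).
# Routing code offsets and little-endian order are assumptions.
_HEADER = struct.Struct("<BBBBIQI")


def pack_request(function, mip_route, host_route, seq, *, mip_to_host=False, buffer=None):
    """Build a request mailbox image; buffer is an optional (address, size) pair."""
    modifiers = MOD_MIP_TO_HOST if mip_to_host else 0
    addr, size = 0, 0
    if buffer is not None:
        addr, size = buffer
        modifiers |= MOD_HAS_BUFFER
    box = bytearray(MAILBOX_SIZE)
    # The sequence number is allowed to wrap back to zero.
    _HEADER.pack_into(box, 0, function, mip_route, host_route,
                      modifiers, seq & 0xFFFFFFFF, addr, size)
    box[VALID_OFFSET] = 1  # the owner sets the valid flag after filling the fields
    return box


def unpack_request(box):
    """Decode the fixed header fields of a mailbox image."""
    function, mip_route, host_route, modifiers, seq, addr, size = _HEADER.unpack_from(box, 0)
    return {
        "function": function,
        "mip_route": mip_route,
        "host_route": host_route,
        "mip_to_host": bool(modifiers & MOD_MIP_TO_HOST),
        "has_buffer": bool(modifiers & MOD_HAS_BUFFER),
        "sequence": seq,
        "buffer_addr": addr,
        "buffer_size": size,
        "valid": box[VALID_OFFSET],
    }


# Hypothetical Host to MIP request: function code 0x05 is a made-up value.
example = pack_request(0x05, mip_route=3, host_route=7, seq=1, buffer=(0x1000, 128))
fields = unpack_request(example)
```

Writing the valid flag as the final step mirrors the ordering in the description, where the owner fills in the fields first and only then marks the contents valid.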
TABLE 2
RESPONSE mailbox format.
##STR2##
function code: SAME EXPLANATION AS REQUEST
MIP routing code: SAME EXPLANATION AS REQUEST; copied from sender if this is a Host→MIP message.
host routing code: SAME EXPLANATION AS REQUEST; copied from sender if this is a MIP→Host message.
code modifiers: SAME EXPLANATION AS REQUEST
sequence number: copied from request mailbox
buffer address: SAME EXPLANATION AS REQUEST
buffer size: SAME EXPLANATION AS REQUEST
status: result of the request determined by recipient
valid flag: SAME EXPLANATION AS REQUEST
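As an illustration of how a response relates to its request, the following Python sketch derives a response record from a request record. The `Mailbox` class, its field names, and the `make_response` helper are hypothetical names introduced for this example; the copying of the sequence number, routing codes, and direction bit, and the clearing of the buffer bit when the response carries no buffer, follow the description.

```python
from dataclasses import dataclass, replace

MOD_HAS_BUFFER = 0x40  # code modifier bit 6


@dataclass(frozen=True)
class Mailbox:
    """Hypothetical in-memory view of one mailbox (not the on-wire layout)."""
    function: int
    mip_route: int
    host_route: int
    modifiers: int
    sequence: int
    buffer_addr: int = 0
    buffer_size: int = 0
    status: int = 0
    valid: int = 0


def make_response(request, status, buffer=None):
    """Build the response for a request.

    Function code, routing codes, sequence number and the direction bit are
    carried over unchanged; the buffer bit reflects whether the response
    itself references a buffer.
    """
    if buffer is None:
        modifiers = request.modifiers & ~MOD_HAS_BUFFER & 0xFF
        addr, size = 0, 0
    else:
        modifiers = request.modifiers | MOD_HAS_BUFFER
        addr, size = buffer
    return replace(request, modifiers=modifiers, buffer_addr=addr,
                   buffer_size=size, status=status, valid=1)


# A request that used a buffer, answered by a response that does not.
req = Mailbox(function=2, mip_route=1, host_route=1, modifiers=MOD_HAS_BUFFER,
              sequence=41, buffer_addr=0x2000, buffer_size=256, valid=1)
resp = make_response(req, status=0)
```

Because the sequence number is copied back, a requestor with several outstanding users can match each response to the request that produced it.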
Certain features of this message pair format are needed for a smoothly operational protocol. In the request and response messages of a Host to MIP message, the MIP routing code should be specified. Since the MIP will copy the host routing code from the request message to the host routing code field of the response message, the host should use this routing code for routing the response message. Thus, the host can use any routing code in the request message, since it will be copied into the response message. In the MIP to Host message, the host routing code should be specified. Also, the host will copy the MIP routing code from the request message to the MIP routing code field of the response message. Therefore the MIP can use any routing code in a request message. The routing codes for a preferred embodiment are specified in Table 3 below. If in a Host→MIP message the host specifies "3" as the MIP routing code value, it is saying that the Mailbox Handler routines in the IMS/MIP software will be called upon to handle the message's request; in this example, perhaps changing a mailbox address. In the Host system, the handling process would likely be a "7" since this is a communications control message, but since hosts particularly should be able to evolve, the Host should have flexibility in which process it uses to respond.
TABLE 3
Routing Code Summary

MIP Routing Code (hex)    Description
0                         Not used
1                         Platform Messages
2                         Architectural Messages
3                         Mailbox Handler Messages
4                         Shared Memory Control Messages

Host Routing Code (hex)   Description
0                         Generic Message
1                         BIOS Messages
2                         HAL (Hardware Abstraction Layer)/PSM (Unisys equivalent of HAL) Control Messages
3                         Processor Messages
4                         I/O Messages
5                         Exclusive Memory Messages
6                         Shared Memory Messages
7                         MIP/Host Communications Control Messages
8                         Partition Data Messages
9                         Partition Control Messages
A                         General Logging Messages
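The routing codes of Table 3 can be expressed as enumerations, with a front end that hands each message to the process registered for its code. The Python sketch below is illustrative only: the enumeration member names, the `select_mip_handler` function, and the handler table are hypothetical; the numeric values are those listed in Table 3.

```python
from enum import IntEnum


class MipRoute(IntEnum):
    NOT_USED = 0x0
    PLATFORM = 0x1
    ARCHITECTURAL = 0x2
    MAILBOX_HANDLER = 0x3
    SHARED_MEMORY_CONTROL = 0x4


class HostRoute(IntEnum):
    GENERIC = 0x0
    BIOS = 0x1
    HAL_PSM_CONTROL = 0x2
    PROCESSOR = 0x3
    IO = 0x4
    EXCLUSIVE_MEMORY = 0x5
    SHARED_MEMORY = 0x6
    COMMUNICATIONS_CONTROL = 0x7
    PARTITION_DATA = 0x8
    PARTITION_CONTROL = 0x9
    GENERAL_LOGGING = 0xA


def select_mip_handler(mip_routing_code, handlers):
    """Return the handler registered for a MIP routing code.

    Raises ValueError for a code outside Table 3 or for the unused code 0,
    and KeyError if no handler has been registered for a valid code.
    """
    route = MipRoute(mip_routing_code)  # ValueError if not in Table 3
    if route is MipRoute.NOT_USED:
        raise ValueError("MIP routing code 0 is not used")
    return handlers[route]


# Hypothetical handler table for the MIP front end.
handlers = {
    MipRoute.MAILBOX_HANDLER: lambda msg: "mailbox handler got " + msg,
    MipRoute.PLATFORM: lambda msg: "platform handler got " + msg,
}
result = select_mip_handler(3, handlers)("switch mailbox request")
```

A host-side front end could use `HostRoute` in the same way to direct a response to the process that issued the request.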
A specific starting mailbox location is always the same for the Boot Strap Processor (BSP), that is, the processor which is responsible for starting itself and its POD or Sub-POD when released out of reset by the MIP. The BIOS for the BSP will identify this location for each Sub-POD. The multiprocessor computer system with Intel processors uses a single mailbox pair during normal operation. Once the master BSP Sub-POD has been selected, a single partition mailbox pair is selected by the BIOS for use by both the BIOS and the Host OS. (The BIOS selection will be translated for the partition by the address translation hardware for the partition, which is organized/programmed when the partition is first set up.) For Intel processor platforms, the MIP functions like a single device in communication with the BIOS initially and, after booting, with the Host OS, having a single mailbox location. The partition mailbox pair also contains two mailboxes: one for host to MIP and one for MIP to host communication. During the boot, the MIP is unable to generate interrupts for responses on host to MIP communication because the interrupt organization (APIC) is set during the boot procedure. During the initialization and boot sequence, the hardware is in "Virtual Wire" mode and interrupts from the MIP cannot be generated. Therefore, until the host switches to APIC mode, the BIOS and OS loader have to poll the MIP to Host mailbox to determine when the MIP has presented a response to any Host to MIP request. The structure of the data in a partition mailbox pair is preferably the same as in the BSP mailboxes. The initial location of the partition mailbox pair in the preferred embodiment is 0xF000 in the Host memory unit or MEM. (The "0x" in this address means that it is a hexadecimal number, as opposed to decimal, octal, binary, etc.)
We order the host to MIP mailbox as the first mailbox (cache line) of the pair: the host to MIP mailbox is at address 0xF000, and the MIP to host mailbox is at address 0xF040. Once the BIOS has done appropriate initialization, the partition mailbox pair will be moved to a more secure area. The mailbox pairs need not be in contiguous memory locations or in order, but we find this organization easier to track.

The particulars of the preferred embodiment protocol include the messages listed in the tables reproduced as FIGS. 6 and 7. Table 4 in FIG. 6 identifies the names of the preferred embodiment host to MIP messages in the first column, identifies whether there is a request buffer and a response buffer in the second and third columns, and identifies the routing codes and function codes in the last three (3) columns. The direction column merely indicates that these are all host to MIP messages. As may be clear by now, messages in this system travel in pairs (a host to MIP request answered by a MIP to host response, or vice versa). Accordingly, the description of each message begins with the request form, and then the response form is described.

The first message illustrated, NOP (an abbreviation for "No Operation"), is preferably used to test the host to MIP communication channel. It uses the format of the REQUEST mailbox from Table 1 above. In the code modifiers (byte 03) the REQUEST has bit 7 clear, indicating it is a host to MIP message, and bit 6 clear, indicating that there is no buffer. The RESPONSE message in reply from the MIP to the host will also have bit 7 clear in the code modifiers (byte 03), indicating that this is a response to a host to MIP message. In other words, bit 7 is simply copied back in the response. Bit 6 receives the same treatment, since there is no buffer being sent with the response message. In none of the host to MIP messages will this copy back of bits 6 and 7 of byte 03 need to change.
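The initial mailbox pair layout and the two code modifier bits can be summarized in a few lines of Python. This is a sketch only: the function names are hypothetical, and the contiguous layout is the preferred organization described above rather than a requirement of the protocol.

```python
CACHE_LINE = 0x40                # spacing between 0xF000 and 0xF040
INITIAL_PARTITION_PAIR = 0xF000  # initial partition mailbox pair location


def mailbox_pair(base=INITIAL_PARTITION_PAIR):
    """Addresses of a contiguous pair: host to MIP first, MIP to host next."""
    return {"host_to_mip": base, "mip_to_host": base + CACHE_LINE}


def describe_modifiers(byte03):
    """Interpret bits 7 and 6 of the code modifiers byte."""
    return {
        "direction": "MIP to Host" if byte03 & 0x80 else "Host to MIP",
        "has_buffer": bool(byte03 & 0x40),
    }


pair = mailbox_pair()
nop_request = describe_modifiers(0x00)  # NOP: bit 7 clear, bit 6 clear
```

For a NOP request the modifiers byte is zero, which decodes as a Host to MIP message with no buffer; the response carries the same two bits back.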
As will become clear when reference is made to FIG. 5 in the discussion later, separate mailbox addresses for the messages from the MIP to the host and from the host to the MIP are provided in each mailbox pair, so the participants in the protocol cannot get confused about which message they are sending or receiving.

The second host to MIP message identified in Table 4 supports an organizational procedure that enhances the efficiency of the multiprocessor system. This function, "Save CMOS Image" (message 2), is used by the BIOS at the end of the boot sequence, prior to the hand-off to the operating system loader, to pass a copy of the current contents of the processor associated CMOS to the MIP software for storage. By storing the basic information on the configuration of the resources for a particular processor in an area accessible to the MIP, the MIP will have more ability to manage the multiprocessor system. In the request to save CMOS Image message, bytes 0f to 08 will contain the address of the buffer to save, and the number of bytes in the buffer will be contained in bytes 13 to 10. Note that the request buffer entry in line 2 of Table 4 is marked "yes", since the request message uses a buffer, and the response buffer entry is marked "no". In the preferred embodiment this only means that buffer bit 6 of the code modifiers is clear in the response, indicating that no buffer is used by the returned response message. This would be the case whether the status is successful or failed in the response from the MIP to the host.

In message 3, the processing unit running the BIOS notifies the MIP software that the read of the boot sector has been successful and that the BIOS is about to turn control over to the operating system loader software. This message is called "BIOS to OS Transition".
No buffer is needed in the response, and the only piece of information required is that the MIP has received the message; this can be conveyed in the status bytes. In the preferred embodiment, a status of zero means "received".

The fourth message, "Release APs" (Release the Application Processors), is used by the BIOS to inform the MIP software that all Sub-PODs that are up in the partition and are not the master Sub-POD should be switched to APs. The MIP will issue a become AP message in the mailbox of all non-master BSP Sub-PODs. This is an example of a cooperative control function. In other words, the processors in a partition organize themselves and, based on the results of that organization reported to the MIP, the MIP issues the become AP messages and become BSP messages appropriately. These messages will be discussed in further detail with reference to Table 5 in FIG. 7. The Release APs message again requires no buffer and only an indication that it is a host to MIP message. The response merely needs to indicate in the status fields that it has been received.

Another function provided by the protocol is the message in line 5, "Switch to Partition Mailbox". On successful completion of this message pair, the partition mailbox will be used and all the other mailbox pairs previously in use will no longer be used. This function is only used by the host BIOS to select the mailbox for the BIOS and the operating system loader. To accomplish it, the BIOS should be programmed to perform a test of the partition mailbox pair location it plans to use. It should zero out the partition mailbox pair locations. It should then issue the "Switch to Partition Mailbox" message. However, there should be no outstanding message requests when this message is issued. In the preferred embodiment the BIOS cannot send any additional messages on any mailbox until the switch to partition mailbox message is complete, either successfully or unsuccessfully.
The MIP software will then test the partition mailbox pair location, and if its test is successful the MIP will send a "good" status in a response message through the mailbox pair in which the request was made. Then the partition mailbox will be used for all subsequent communications between this host and the MIP. If the test is unsuccessful, the MIP will send a "bad" status response message and subsequent messages will proceed through the original mailbox pair. This message request by the host to the MIP will require an address of the new host to MIP mailbox, referenced by the buffer, as well as an address of the new MIP to host mailbox memory location. Accordingly, bit 6 of the code modifier byte would be set and the buffer address and size information included in the request message. The response message will only require a status indication of whether the MIP has tested the location successfully or failed.

The sixth message pair is called "Load Code". This message function can be used by the BIOS to load parts of itself that were not included in the initial load image, if desired. The address and length of the buffer at which this additional BIOS may be located are carried in the request message. The response message requires a change in the status area to indicate whether this is an acceptable change to the MIP or not. The RESPONSE should also contain a buffer address and size to show where to get the additional code.

The "Map Memory" message 7 enables the passing of the DIB (that is, the Direct I/O Bridge, a module for connecting to I/O peripherals) configuration information to the MIP software to perform initialization of range registers in the Sub-PODs in order to set up the appropriate addresses of the I/O available through the DIB. The request message has the address of the buffer and the number of bytes in the buffer, if a buffer is used.
In the preferred embodiment a fixed configuration of codes will be expected in the buffer itself, saving space that might otherwise be used to identify its parts. The response message sent in reply may, in addition to the status, contain a buffer address. This buffer passing will enable the MIP software (the IMS) to allocate ranges in the MSU global address space for each partition. The MIP will give out such address information as each partition needs to know through this process. In preferred embodiments, the MIP software will identify several regions in the memory space for the partition it is communicating with, and identify any regions that may be shared with other partitions. The Host will pass, through its buffer, the information on which addresses it has assigned for various resources, including interrupts, I/O, out of range addresses and the like. The MIP will send back in the response message the region addresses and sizes allocated to this partition, and the states of these regions, whether they are shared or exclusive to this partition, and the like.

The "Shadow ROM" message 8 is used by the BIOS to change the read/write access to the four Shadow memory spaces in order to write protect the Shadowed ROM. (Providing a place to put the slow ROM information is not new, but it has not previously been done in an unassigned multiprocessor computer system, such as is provided under current Intel-based multiprocessor systems. Shadow ROM is defined as is commonly known in the industry.) The code modifiers in the request for this message should indicate that there is no buffer (thus bit 6 of byte 03 will be clear); however, in the preferred embodiment bytes 08-0c will contain information regarding the number of areas that are to be changed in state and the state changes to be made in those areas. The response message merely needs to indicate in the status fields whether the change has been successful or has failed.
The cluster check-in complete message in line 9 is used to notify the MIP software that the Sub-POD has completed its processor start-up and APIC interrupt system initialization. The data provided in the buffer in this message is the CPU-Identification information along with the status. The status should indicate whether the processor identified is operational; whether the processor specified is "up" in the partition and not in use (waiting for a become AP message); whether the processor has failed the built-in self test; whether the processor specified as up in the BIOS configuration table has failed to respond; whether the processor specified as up in the partition but not in use has failed to respond; whether the CPU-Identification stepping ID, model ID, or family ID did not match the CMOS configuration data; whether the BIOS has microcode for this particular processor; or whether the processor is not available for some other reason, including failure of the hardware or being hard powered down or disabled. Additional information, including the APIC version, the logical APIC ID, the indicator of whether this is a BSP processor, and the type of processor, can all be reported in the buffer space referenced by this message pair. In the status of the response message only a "received" status (or "not received") needs to be sent. Thus, with each of the host instruction processors reporting in using the cluster check-in message, the MIP system is enabled to report out to a user of the MIP system the status of each POD, Sub-POD, and processor and its operational configuration within the computer system.

In keeping with the reporting theme, the message "Report Fault" 10 is used by the host processor to report fault information to the MIP. This can include fault information for hardware or software faults reported by the BIOS, the hardware abstraction layer (HAL), or PSM (the UnixWare Platform Support Module, which serves the same function as the HAL),
or other host components. This message may have various routing codes. The host routing code is used by the host software to route the response to the message back to the intended recipients transparently to the MIP. For this to work, the MIP response will in all cases echo the original host routing code. In the request message, byte 18 will be reserved to indicate the operating system type so that the messages can be properly interpreted. Degree of severity, probable cause of the problem and type of alarm, as well as other information relating to the problem or status, can be transferred from a buffer area indicated by the buffer address in the message. In the response message, the status choices should include indicators of "received", "internal error", or "failed".

Message 11 is a request to write-protect the BIOS or extend the BIOS area. This is provided to allow the BIOS to selectively read from DRAM in the MSU and write to ROM in the DIB. This may be used during the shadowing of the option ROMs to guarantee that the option ROMs do not corrupt the BIOS area. No buffer is transferred on the request message, but one of the bytes in the buffer area, in the preferred embodiment byte 08, is reserved to indicate the BIOS state. The response message status will indicate whether this has been a successful change in BIOS state or whether it has failed.

The twelfth message, "Switch to APIC Mode" 12, is used by the operating system in the host partition to request the MIP software to switch the partition hardware from virtual wire mode to APIC mode. (APIC stands for Advanced Programmable Interrupt Controller and is an Intel function.) The timing of the setup of the APIC controllers in the boot-up sequence of a multiprocessor system in accord with the systems described herein is described in RA-5187, U.S. patent application Ser. No.
09/362,388, "Method and Apparatus for Initiating Execution of an Application Processor in a Clustered Multiprocessor System" and communications between APIC controllers of different Sub-PODs are discussed in RA-5262, U.S. patent application Ser. No. 09/362,389, "Method and Apparatus for Routing Interrupts in a Clustered Multiprocessor System". Both of these applications are hereby incorporated by this reference in their entirety. The MIP does not generate interrupts to the host operating system until the "Enable MIP Interrupts" message is received. It is the host operating system's responsibility to issue the "Switch to APIC Mode" message before the "Enable MIP Interrupts" message. No buffer is sent in the request in this message, and the status is either "successful" or "failed" in the response message.

Message 13, "Enable Unsolicited Messages", is used by the host to inform the MIP software that the host software is ready to accept unsolicited messages from the management system software. Once enabled, the host cannot disable unsolicited messages. This message does not enable MIP interrupts, just unsolicited messages. Again, in the request no buffer is required, and in the response the only status indicator needed is "received".

Message 14 of the host to MIP messages, "Enable MIP Interrupts", does enable MIP interrupts. This function is used by the host to inform the MIP software that the host software is ready to accept interrupts from the management system software run by the MIP. Once enabled, these cannot be disabled. The response message from the MIP will have "received" in the status area, and the MIP will also, preferably, issue a confirmatory interrupt at the same time.

The next messages, "Heartbeat" and "Heartbeat Enable", are actually two related message pairs. The Heartbeat message 15 will be issued at predefined intervals by the host software to the MIP software merely to validate that the host is operational.
If the MIP does not receive this message in a predefined interval, the MIP software may initiate an automatic reboot if other partition state information indicates that this partition is not operating. In the request message of a Heartbeat message pair, the Heartbeat's source should be identified, and in the preferred embodiment two bytes are reserved for this in the buffer address area. A response by the MIP indicating that the Heartbeat request has been "received" or "not received" is preferred. In the Heartbeat Enable message 16, the host software tells the MIP to enable the Heartbeat, in other words, to look for the Heartbeat message and start a timer mechanism. This message should specify the Heartbeat interval to the MIP. In the preferred embodiment request, both the interval and the Heartbeat source information will be included in the buffer address area of the request message. The reply by the MIP will indicate in the response message status area either that the request has been "received", and thus that the MIP will start timing, or that it has "not" been "received", indicating that there is some failure in the MIP software or some inability to accomplish the Heartbeat monitoring function. Each Heartbeat request should increment the value in the sequence number field, bytes 04-07.

Message 17, "MIP I/O Redirection Table-Modify", is used by the host to request a change to the redirection table information. In this protocol the I/O redirection table entry for the MIP can only be written by the MIP; it is a dynamic string that is only MIP accessible. This table entry for the MIP specifies the interrupt delivery mechanism for MIP interrupts. On receipt of Message 17, the MIP software is expected to write the I/O redirection table information in all DIBs in the partition through the MIP_INT dynamic string.
The host will use this message request 17 to request a change to the redirection table information, including destination ID, destination EID, delivery mode, and interrupt vector. Bytes 08-0b in the request message will contain the delivery mode, interrupt vector, destination EID and destination ID information currently used by this host processor partition. Since the MIP software will be writing the I/O redirection table information through a different communications channel (the MIP_INT dynamic string), the response message is simple: the request has either succeeded or failed for some reason, and the failure mode may be indicated in the status bytes of the response message.

Message 18, "Select Sub-POD for Lowest Priority", is provided to allow the host to change the Sub-POD mapping. When this message is received, all the DIBs in the partition will be set to the same value. This is a way to let the Intel APIC system function in a multiprocessor environment. Thus, where a processor in a Sub-POD has been selected to handle the undirected interrupts (called lowest priority interrupts in the Intel APIC system) for the partition, all the DIBs in this partition will generate service requests at the lowest priority value for this Sub-POD processor. This message pair is relevant for handling interrupts within a Sub-POD. For I/O interrupts there is an APIC mode called lowest priority mode; in this mode the interrupt is sent to a group of processors, and the processor with the lowest priority claims and handles the interrupt. In multiprocessor systems using the inventive protocol, there is no method for all processors in a partition to make the lowest priority decision; instead it must be done on a Sub-POD basis. The DIB (Direct I/O Bridge) should have a register to allow the selection of the Sub-POD for lowest priority interrupts based on the lower 3 bits of the interrupt vector.
For first deliveries, this register will be initialized such that Sub-POD selection is rotated for each vector number. This means vector 0 will go to the lowest Sub-POD in the partition, vector 1 will go to the next Sub-POD, and so forth, rotating back to the lowest until all 8 vectors have a value. Thus, in the request message, bytes 0f-08 represent the values for the vectors for each of the 8 Sub-PODs, vector 0 indicating the Sub-POD identifier with the lowest priority through vector 7 indicating the Sub-POD with the highest priority. Thus an interrupt vector whose lower 3 bits are 0 would be directed to the lowest priority processor in the Sub-POD identified with vector 0 in byte 08. An interrupt vector whose lower 3 bits are 7 would be directed to the lowest priority processor in the Sub-POD identified in byte 0f. The response again may indicate merely whether the operation was successful or has failed, and the failure mode if that is desired to be indicated.

Message 19 is "Shutdown". The BIOS or host OS may desire to shut down its partition and thus request the MIP to perform a shutdown of the partition. Preferably there are two options: shutdown with automatic reboot, and shutdown with a halt. If an automatic reboot is selected, the MIP will automatically reboot this partition; if a halt, the MIP will halt all units in the partition. The host should not issue such a command unless the partition is in a reboot or halt state, if possible. On auto recovery it is preferred that, as a part of the auto recovery sequence, the L2 caches of the processors are flushed prior to reset, and the management system will return all the data in the partition, including the L2 cache contents of all processors, back to the MSU prior to initiating a reboot. In this way the MSU will have ownership of all the data for the partition. This return of L2 caches will be performed regardless of whether the shutdown is with a halt or a reboot.
If shutdown and halt is selected and sharing had been enabled for the partition, the management system will need to clear the sharing information for this partition and in the agent tables of all the other partitions that may have participated in the sharing. By enabling this message pair, this protocol allows for smooth shutdown procedures for each partition. In the preferred embodiment, 2 bytes in the request message are reserved for the reboot selection, although a single bit could be used. The response message should provide an indication that the request was received.

Message 20, "Change Partitioning State", is issued by the host to the MIP in order to dynamically change the partitioning state of the host unit. This can be initiated due to a host user interface command, a fault, or a workload management software request, or can be issued after the MIP has issued a "Request Partitioning Change" message to the host. For this message, a buffer is included. The buffer is used by the MIP to return specialized data to the host for the unit specified in the request message. For multiprocessor systems where more than one type of processor may be used, the response data for bringing up a processor must contain the processor type information from the EEPROM, including stepping, model, and family. It also should include the application processor (AP) instance number within the system or other identifier, if any, and the start-up address for each processor. This information will also be provided by the BIOS in the multiprocessor table after the reset sequence is complete. It is provided in the response to this message in the event that the host software cannot access the multiprocessor table when a dynamic "up" is performed. The response data for the "up" of the memory unit contains the starting address and length of the memory area and includes the number of down memory ranges for the memory unit.
We only need the number of down memory ranges and not their addresses because there is another message for the host to request the addresses (Request Down Memory Ranges) that moves this particular conversation along. This number, if non-zero, tells the host that it should issue the Request Down Memory Ranges message. "Request a Unit Configuration Data and Status", message 21, is used by the host to obtain the configuration data and Identifier of a specific unit, all units of a specific type, or all units that are members of the partition. (Identifier here means the specific Intel OS type and the instance number for that particular type of unit. Non-Intel processor types could use names. We mean by "units" here all visible unit types that are members of the partition. This includes, in our preferred embodiments, MSU's (or most preferably MEMs, which are memory units that are identifiable not by hardware but as part of a particular partition, such that the MEM is the host visible memory unit or all the memory that host partition sees), IP's, PCI Bridges, PCI Buses, and CPB's (which are hardware boards to make the hardware conform to the PC specifications expected by the boot processors).) The partitioning state of the unit or units being inquired about will be sent too, that is, the state this host thinks this unit is in, up or down (or another state if a system has some intermediate states that are tracked; the preferred embodiment uses just up or down). The response message will have a format for the buffer for all units that are not "up" and will confirm they exist and may be available to the host partition at a future time. In the preferred embodiments the response message has a particular format for UP MEMs and UP Processors. For other unit types (PCI Bridges, etc.) that are UP, and for all units that are DOWN, a generic "additional unit entry format" is returned. 
This can confirm that these units are members of the partition and could therefore be Upped into the partition by the host OS through a Change Partitioning State message if desired. The response message will have different formats for the buffers for all UP units, unique to the unit type. Processors in up states have different information of relevance than memory or other additional units, and the format will support that information by the organization of the format of the buffer. In the response message itself, the address and size of the buffer with the information will be generated by the MIP/IMS system along with the unit type and instance number generated by the host in the originating request message. In the status area of the response message, if the process was not successful, the MIP can indicate that the unit is of an unknown type or instance number, that there is a buffer overflow error, or that an internal problem with the IMS software makes the response unavailable. Message 22 is a request for the attributes of shared memory ("Request Attributes--Shared Memory"). In the preferred embodiment systems, the MIP/IMS system keeps track of the attributes of all shared memory and provides the user interface to change these values. The host will be notified by a MIP to Host message of a change in the shared memory attributes. (See discussion of M7, FIG. 7 below for that MIP to Host message.) Here also, the unit type and instance number should be included in the request message. The buffer is used by the response message to specify the attribute values in the preferred embodiment. The length of the buffer, if zero, indicates no attribute values will be passed to the host partition and the buffer is thus not accessed by the MIP. If there was a status field value in the response indicating a buffer overflow error, this length of the buffer field will contain the buffer length required to cover the overflow. 
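The buffer-length convention just described for message 22 can be illustrated with a minimal sketch. The status values, the `Response` type, and both function names are hypothetical and are not taken from the protocol tables; the sketch also assumes that a zero length arrives in the request and that a successful response reports the number of bytes written, neither of which the text states outright.

```python
from dataclasses import dataclass

# Hypothetical status codes; the real values are defined by the protocol tables.
STATUS_OK = 0
STATUS_BUFFER_OVERFLOW = 1


@dataclass
class Response:
    status: int
    buffer_length: int  # bytes written, or bytes required when status is overflow
    data: bytes = b""


def mip_request_attributes(attributes: bytes, buffer_length: int) -> Response:
    """MIP side: fill the host-supplied buffer or report the size it needs."""
    if buffer_length == 0:
        # Zero length: no attribute values are passed and the buffer is not touched.
        return Response(STATUS_OK, 0)
    if buffer_length < len(attributes):
        # Overflow: the length field carries the required buffer length.
        return Response(STATUS_BUFFER_OVERFLOW, len(attributes))
    return Response(STATUS_OK, len(attributes), attributes)


def host_fetch_attributes(attributes: bytes, initial_length: int) -> Response:
    """Host side: retry once with the length reported by an overflow response."""
    response = mip_request_attributes(attributes, initial_length)
    if response.status == STATUS_BUFFER_OVERFLOW:
        response = mip_request_attributes(attributes, response.buffer_length)
    return response
```

Because the overflow response reports the required length, a single retry is enough in this sketch; a real host might also cap the size it is willing to allocate.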
To clarify this message's process, note that the Host is providing an empty buffer. The MIP then puts the current attribute values into the buffer and passes the buffer back to the Host. The MIP cannot acquire a memory buffer in the Host's address space in the MEM, which is why the Host provides the empty buffer. If the buffer is too small, the overflow status tells the Host to issue a larger buffer for this conversation. The "Move Mailbox" message 23 is used to move the mailbox pair from its current location to a new location specified by the host. The response is sent to the old mailbox. If the response indicates that the MIP was not successful, the new mailbox will not be used. The preferred algorithm for employing the Move Mailbox function has six (6) steps: 1) the host performs a test of the new mailbox pair locations; 2) the host zeros-out the new mailbox pair locations; 3) the move mailbox message, identifying that it wants to do this in its function code, is sent by the host via the old mailbox location; 4) MIP software tests the new mailbox pair location; 5) if the test is successful, the MIP sends a good status response message through the old mailbox and switches itself to use the new mailbox for all subsequent message communications; and 6) if the test is unsuccessful, the MIP sends a bad status response and the old mailbox is used for subsequent messages. In the request message, the addresses of the new MIP to host and new host to MIP mailboxes are both loaded. The "Negotiate Mailbox Version" message 24 is used by the host to negotiate a version level for the host to MIP communications. The host will provide the minimum and maximum version level that its software supports. The MIP software will respond with the version that it has chosen from the range specified. The host entity must be sure that all messages it has issued have been completed before issuing a negotiate mailbox version message. 
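The six-step Move Mailbox algorithm described above can be sketched as follows. This is a minimal model under stated assumptions: the class and function names, the mailbox size, and the two boolean flags standing in for the host and MIP memory tests are all hypothetical, and the sketch assumes the host simply keeps the old pair if its own test in step 1 fails, which the text does not specify.

```python
from dataclasses import dataclass, field

MAILBOX_SIZE = 64  # hypothetical size in bytes, chosen only for illustration


@dataclass
class MailboxPair:
    host_to_mip: bytearray = field(default_factory=lambda: bytearray(MAILBOX_SIZE))
    mip_to_host: bytearray = field(default_factory=lambda: bytearray(MAILBOX_SIZE))
    host_test_ok: bool = True  # stand-in for the result of the host's test
    mip_test_ok: bool = True   # stand-in for the result of the MIP's test


class Mip:
    def __init__(self, active: MailboxPair):
        self.active = active  # mailbox pair the MIP currently uses

    def handle_move_mailbox(self, new_pair: MailboxPair) -> bool:
        """Return the good/bad status that is sent through the OLD mailbox."""
        if new_pair.mip_test_ok:    # step 4: MIP tests the new location
            self.active = new_pair  # step 5: good status, switch to the new pair
            return True
        return False                # step 6: bad status, old pair stays in use


def host_move_mailbox(mip: Mip, old_pair: MailboxPair,
                      new_pair: MailboxPair) -> MailboxPair:
    """Return the mailbox pair the host should use after the attempt."""
    if not new_pair.host_test_ok:   # step 1: host tests the new locations
        return old_pair             # assumption: abandon the move on failure
    # Step 2: host zeros out both new mailboxes.
    new_pair.host_to_mip[:] = bytes(len(new_pair.host_to_mip))
    new_pair.mip_to_host[:] = bytes(len(new_pair.mip_to_host))
    # Step 3: the request travels via the old mailbox (modeled as a direct call).
    moved = mip.handle_move_mailbox(new_pair)
    return new_pair if moved else old_pair
```

Sending the response through the old mailbox means both sides agree on where the answer will appear even when the new location turns out to be unusable.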
Prior to receiving a negotiate mailbox version message the MIP will use the lowest mailbox version level supported. It is preferred that the first message issued by the BIOS for Intel processor based servers is the negotiate mailbox version message. No buffer is required for the request message, merely an indication of the maximum version and minimum version. This can be accomplished in a few reserved bytes. The response message should indicate first whether it has been successful, and second which version the MIP has selected for use. "Save and Distribute Partition Data", message 25, is used by the host software to distribute data to other running partitions via the MIP. This will also have the MIP save away partition specific data for later retrieval. This partition specific data will be sent by the MIP to all running partitions, including the original partition that issued the "save and distribute partition data" message. The partition data is sent to the host using the "partition data notification" MIP to host message. The partition specific data for a partition can be requested using the "request partition data" host to MIP message. The data saved and distributed is host dependent and kept track of by the MIP software. A message type field is used in the request message to allow the host software to save and distribute multiple different messages. Each time the save and distribute partition data message is received by the MIP, the partition specific data is distributed and saved. Only the last copy of the partition specific data for each message type is saved by the MIP. When the host OS stops, all messages saved by the MIP for that partition are deleted. The OS type is also saved by the MIP as part of the specific data for the partition. Thus, the OS type, message type, and the partition specific data should all be included in the request message. 
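The save-and-distribute bookkeeping described above (last copy per message type, deletion when the host OS stops) can be sketched as follows. The class, its method names, and the tuple layout of a notification are hypothetical illustrations, not part of the protocol.

```python
class PartitionDataStore:
    """Hypothetical model of the MIP-side record of saved partition data."""

    def __init__(self):
        # (partition_id, message_type) -> (os_type, data); only the last copy is kept
        self._saved = {}

    def save_and_distribute(self, partition_id, os_type, message_type, data, running):
        """Save the data and return one notification per running partition."""
        self._saved[(partition_id, message_type)] = (os_type, data)
        # The sender is itself in `running`, so it receives a notification too.
        return [(target, partition_id, os_type, message_type, data)
                for target in running]

    def request(self, partition_id, message_type):
        """Return the saved (os_type, data), or None if nothing is saved."""
        return self._saved.get((partition_id, message_type))

    def host_os_stopped(self, partition_id):
        """Delete every message saved for a partition whose host OS has stopped."""
        for key in [k for k in self._saved if k[0] == partition_id]:
            del self._saved[key]
```

Keying the store on the pair of partition and message type is one straightforward way to get the "last copy per message type" behavior: a later save for the same pair overwrites the earlier one.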
Currently, the OS type may be either Windows NT, UnixWare, A-Series MCP or OS2200, although others could be used in the future. The response message merely needs to indicate that it has been received or that there may have been a failure in the status area. When the host software wishes to request the partition data saved by the MIP through the previous message, it employs the "request partition data" message 26. In the request message the partition data and message type need to be included. The response will have the partition specific data, the OS type, the partition status, the partition ID, and the message type information all included. It will also indicate that the request was received or that it has failed because of an internal error, an unknown partition ID, or an unknown message type. Message 27 "Report Event to MIP for Notification" is used by the host OS to report event information. This can include OS type and version, software critical events including workload information, I/O events, etc. Notification could include electronic service requests, stability or performance data, remote support requests, customer alerts, and so forth, which can be initiated by the MIP/IMS system monitoring the host. Thus if a piece of the host is failing or in trouble, a request can be automatically routed through the MIP indicating that service is required. The request message would include a buffer address in the preferred embodiment and the reply would indicate receipt. The buffer format would have areas reserved for unit type, instance number, nature of fault or request and the like. The response would need to indicate successful receipt of the message and the buffer, if any was sent. "Software Check-in" message 28. This message is used by the host software to let the MIP/IMS system know that the host software is up and running. 
When received by the MIP, the MIP will indicate that the partition is running on its monitoring function software in preferred embodiments. All check-in messages with their accompanying partition specific data are logged in the MIP log. The OS type, partition specific data, and message type information should be included in this request message. The response merely needs to indicate that it has been received or if there has been a failure in the status area of the response message. "Request Down Memory Ranges" message 29 is used by the host to obtain all the inoperative memory ranges (i.e., down memory ranges) for a specific memory unit in the "up" partitioning state. The request message must contain the instance number, the unit type, and the number of bytes in the buffer and address of the buffer to which the information should be written by the MIP. The response will repeat all this information and include the starting address and length of all the down memory ranges. The response also indicates whether it has been successful or failed in the transfer of this information and the nature of the failure if desired. Message 30 "Additional Proposed Host to MIP Software (MAP) Functions" may be designed as desired. These may include a performance monitor data collection and coordination message pair and unattended operation including power scheduling message pairs, among other useful functional messages. Thus, it can be seen that with additional functions addressable through particular versions, the protocol is extensible and quite flexible. The MIP to host messaging functions are not as extensive in this protocol system. The first message, "Become AP", has been mentioned before. It is used by the MIP software during the boot sequence to convert the BSP on all non-master BSP Sub-PODs to APs, when the MIP has already received a "release APs" message from the host. This message will be directed to each non-master Sub-POD BSP check-in mailbox. 
When the request message is received, the BIOS changes the BSP on the Sub-POD to an AP as may be required by the particular processor. The response from the BIOS executing on the BSP for that Sub-POD should indicate whether this has been successful or failed. Message M2, "Become BSP", directs the master BSP Sub-POD that is the BSP for the partition to remain one. When this message is received, the BIOS resumes the boot of the partition using the BSP on the designated master BSP Sub-POD. In the request message a processor map area is reserved. This processor map will inform the BSP which instruction processors within its partition are up. The response from the host merely needs to indicate whether the message has been successful or failed. Message M3 "Orderly Halt" is used by the MIP to request the host OS to shut down. When complete, the host OS will issue the shutdown Host to MIP message with the appropriate reboot selection. The request message for an orderly halt will include a reboot selection. The response message merely needs to indicate receipt. Message M4 "IMS Available" will indicate to the host that the MIP is available for Host to MIP communications after a fail over to a backup MIP or a reboot of the MIP. The MIP software will issue this message to all running partitions. No special information is required in the request and only an indication of receipt in the status area of the response is needed. "Request Partitioning Change" message M5 is used for dynamic partitioning. The host will respond with a "change partitioning state" message if the partitioning state change can be made. The host routing code for this message is dependent on the unit type which is specified in the request message. The host routing code will thus depend on the unit type that is being affected by the partitioning. The host will account for changes to its processors, memory and I/O differently, so it will be useful to have a different routing code for the affected unit types. 
Accordingly the MIP should order the partitioning requests so that partitioning can be done safely, since in the preferred embodiment only one instance of one unit type can be subject to this messaging at a time. The response message should indicate that the request message was received and whether the request can be accommodated, whether it is for an unknown unit type or instance, whether the partitioning state requested is unknown or if there is another host software error causing failure. "Down Memory Range Notification" message M6. Using this message the MIP can inform the host of new failing memory ranges in a memory unit. The MIP provides the logical memory addresses utilized by the processors through their address translation hardware not the physical memory address. Different addresses may in fact be provided for each partition which shares a single shared memory range due to the starting addresses of the shared memory for the partitions. The request message should indicate the number of bytes in the down memory range and the starting address of the down memory range. The Intel OS type and the instance number should also be indicated in the request message. The status area should indicate whether the request was received or whether it is failed and the reasons for the failure if known in the response message. Message M7 "Host Notification-Shared Memory" will notify the host of a shared memory command or of a change in the values of the shared memory attributes. The host is expected to request the attribute values using the "Request Attributes-Shared Memory" host to MIP mailbox message 22. In the request by the MIP, the notification type, the unit type, and the instance number should be included in the request message. Dump, shutdown, and attribute change, as well as any other kind of indication of what change should be occurring in the shared memory area, can be included in the notification type area of the request message. 
The response, of course, should indicate whether the request was successful or failed, and why it may have failed if possible. After the reply is sent and has been received, the host system, if the request was successful, should send a Request Attributes-Shared Memory message 22 (from FIG. 6), to continue the conversation and allow itself to comply with the MIP notification of changes. Message M8 "Notification of Configuration Change" is sent by the MIP software when a component is added or removed from the configuration for this partition. The request message should contain a list of the Intel OS types that have been added or removed. The host issues the request unit configuration data and status message 21 to the MIP when it receives this Notification of Configuration Change message M8 from the MIP. The MIP returns the unit information for the requested components. Along with the unit types in the request message, a count of the number of unit types listed should be included. Again, the response message should indicate that it has received the request and whether it has failed, and if so for what reason. Message M9 "Partition Data Notification" is used to distribute partition data to all the partitions that have sent a "Save and Distribute Partition Data" message 25 and are still up and running. A report fault or shutdown message from the host for catastrophic failure may be interpreted by the MIP as a reason for not distributing a partition data notification message to a host partition. In the request message, the partition specific data, OS type, partition status, partition ID, and message type should all be included. Again, there are five (5) OS types currently available (unknown, Windows NT, UnixWare, A-Series MCP, and OS 2200), although more may be available in the future. The response message merely needs to indicate receipt or failure. Message M10 "Report Fault" is used to report hardware faults from the MIP to the host. 
The data format in the request message will indicate the type of fault and the response merely needs to indicate receipt. Part 3. Preferred Operation of the Protocol in the System. The mailbox system. Refer now to FIG. 5 in which a main memory unit 160a is drawn in outline form. Within it, two (2) memory locations 98 and 99 are located which may be used as a mailbox pair. A host processor 175 and a MIP processor 115b communicate with these mailbox pairs in a one-way process. Thus, MIP 115b can write a message to mailbox 98 employing any of the MIP to host mailbox messages available in the protocol, and the host 175 will be able to read the MIP to host message in mailbox 98, but not be able to write to it. Similarly, the host processor will be able to write a host to MIP message into mailbox 99, and the MIP processor 115b will be able to read this message but not be able to write to mailbox 99. All of the mailbox systems described previously within this document function in this general manner as illustrated in FIG. 4. Conversations. There are several types of conversations or dialogs that may be enacted using this protocol in order for control over the multiprocessor computer system to work. There can be conversations related to the protocol itself, conversations related to resource management, and conversations related to reporting out states of the partitions and the overall computer system to provide appropriate maintenance and reconfiguration options. All of these conversations proceed using elemental units of speech. These elemental units are the pairs of request/response messages described above, and any enhancements or additions to them. The conversations in this protocol take place through mailboxes, and always between a MIP and a host, not Host to Host. The message pair interpretation can be described by a short algorithm pair for each message pair. Algorithms for message pair interpretation. 
In order for any protocol to work a simple procedure is required. Please refer to FIGS. 8 and 9 in which this procedure is outlined. In FIG. 8, the process 10 begins with the test of the mailbox currently being used between the Host and the MIP that the Host has authority to write to for the messaging system being discussed. As mentioned, this will be in an area of memory controlled by that host. Initially it may be a mailbox shared by numerous processors or it may be the one agreed upon between the MIP and this processor in the boot strap process. With reference to FIG. 5, the read valid bit line 102b functions to allow the host to read the valid bit from the Host→MIP mailbox (mailbox is referred to as "MB" in FIGS. 8 and 9), in order to assure itself that the values in the mailbox 99 are in fact valid. The MIP is allowed to write into this Host→MIP mailbox a valid indicator through line 101a, the clear valid bit line. In other systems one could clearly use an extra mailbox or some other communications means to signal whether data in a mailbox is valid between the two processors, but we have chosen to give each processor the ability to clear the valid bits of a mailbox it ordinarily reads, once the processor has read the contents supplied by the other. Likewise, when a new set of data is to be stuffed into a sending processor's mailbox, the sender must wait for the clearing of the valid bit from the last time it stuffed the mailbox before it can know it is safe to communicate the next message. Back to FIG. 8, once the Host finds the valid bit clear 11, it can then communicate with the MIP processor by executing the processes in step 15, that is, loading the data buffer (if one is used for this message) and then writing the message in the mailbox, with the function indicator, the data itself and/or the buffer address and length if required by the message type being used. 
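The valid-flag handshake described above can be sketched as follows. The `Mailbox` type and both function names are hypothetical, the field set is reduced to what the handshake needs, and the sketch ignores the memory-ordering and interrupt details a real shared-memory implementation would have to handle.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Mailbox:
    """One direction of a mailbox pair (for example Host to MIP)."""
    valid: bool = False
    sequence: int = 0
    function_code: int = 0
    payload: bytes = b""


def sender_post(mailbox: Mailbox, function_code: int, payload: bytes,
                sequence: int) -> bool:
    """Owner side: write the fields, then set the valid flag last."""
    if mailbox.valid:
        # The previous message has not been consumed yet; the sender must wait.
        return False
    mailbox.function_code = function_code
    mailbox.payload = payload
    mailbox.sequence = sequence
    mailbox.valid = True
    return True


def receiver_take(mailbox: Mailbox) -> Optional[Tuple[int, int, bytes]]:
    """Target side: save the data first, then clear the valid flag."""
    if not mailbox.valid:
        return None
    message = (mailbox.sequence, mailbox.function_code, mailbox.payload)
    mailbox.valid = False
    return message
```

Setting the flag last on the sending side and clearing it only after the contents are saved on the receiving side is what lets a single flag coordinate two processors without a separate lock.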
Also, as the originator, the Host will in preferred embodiments set the sequence number data field as discussed previously. Any other relevant data that may be used in more advanced protocols would be set at this time too, by loading it into the Host→MIP mailbox. In preferred embodiments, another channel is used to generate a service request to the MIP, to stimulate the MIP to look at the mailbox. In some systems this may not be needed if the MIP polls the mailboxes often enough for the needs of the system and its users. As mentioned previously, the mailbox in question can be either a pre-complete boot mailbox or an after boot is complete mailbox. Although this need not be a step taken in the order shown here, step 17a may be part of step 17 or it could have taken place earlier. In any event, if the communication is before the boot strap process is complete and partition mailboxes set up, the action occurs via the Host polling the MIP→Host mailbox for a message indicating completion of the action requested by the Host. If after boot is complete, the interrupt system would be set up and the partition mailbox in use so there would be the ability to rely on the interrupt system to avoid continually looking at the MIP→Host mailbox for message completion, as is indicated in step 18. In both paths, the Host after completion of either steps 16 or 18 will act on the contents of the mailbox (7 or 19) to complete or move to the next step in the dialog. In FIG. 9, the MIP→Host communications process 20 is set out. Here too, the mailbox written to by the MIP will be polled or tested for a valid flag. If it is set the process will wait 22 for a sufficient period as may be set by a user or by the manufacturer, but past that period 23, the IMS software controlling the MIP will have the ability to adapt to the failed communication. 
This may be by recording a system failure, attempting to restart, calling for service, and the like as may be best in the situation. Assuming normal communications, there will be a cleared valid bit or flag found at step 21 and the MIP will write any needed data to the data buffer, and then write the MIP→Host mailbox with the function code, set the valid bit, and fill in any needed addresses or sequence codes and the like to generate its message in accord with the protocol described previously. In the preferred embodiments, the MIP will also generate an interrupt to signal the Host to look at its mailbox to retrieve the MIP→Host message. Initialization and Boot. A preferred process for completing the set up operation for a multi-partitioned multiprocessor computer system is described in some detail in U.S. patent application Ser. No. 09/362,388, entitled, "METHOD AND APPARATUS FOR INITIATING EXECUTION OF AN APPLICATION PROCESSOR IN A CLUSTERED MULTIPROCESSOR SYSTEM", hereby incorporated herein by this reference in its entirety, and therefore this procedure is not described again here. However it is the kind of conversation or dialog that is facilitated by the inventive protocol, so it will be referred to here briefly for that purpose. The MIP starts the system by de-asserting a reset signal for a particular Sub-POD, allowing that group of processors to determine which one is the master BSP (Boot Strap Processor), which will then use the Sub-POD mailbox for that Sub-POD. Once this is completed, the BSP will send a message "Cluster-Check-In Complete" Message 9, to the Sub-POD mailbox. The MIP will use that information to control the overall computer system partition configuration and resources, based on criteria specified for this computer system in the IMS (Integrated Management System) software that is running on the MIP. 
Thus it will determine which BSP within a given partition is to be named the BSP of that partition, after de-asserting the reset signal for (releasing) as many Sub-PODs as it will specify for the given partition. Using the protocol, the MIP will reply with an acknowledgement message to the BSP of a released Sub-POD, and after deciding which one of the available BSP's in the partition is to be the BSP for the partition, it will send messages to each Sub-POD mailbox within the partition of either a Become BSP M2 message or Become AP M1 message variety. (Obviously only one of the contending BSP's will be the BSP of the partition so only one Become BSP message will be sent for the partition). In both cases the reply message indicates merely success or failure of the message delivery, presuming the BSP will follow this instruction if it can. It should be noted that the MIP will send a Become BSP message to a single mailbox first before sending messages out to the other Sub-PODs to become AP's so that the BSP can set up its resources and confirm with a Release AP's message 4, that it is ready to be the BSP for the partition. One could use the Switch to Partition Mailbox message 5, from the BSP host to the MIP to tell the MIP to go check it out and reply with a good or not good indication in the reply message in the old mailbox. It is possible to use the Move Mailbox message 23 or combine the two since their function is so similar. Either way, once the partition mailbox use has been established, the BIOS to OS transition can take place, with the BSP host for the partition running its BIOS, and when ready, sending the MIP the BIOS to OS Transition message. In the preferred embodiment, the MIP will have loaded the BIOS into main memory for each partition. The BIOS is located in a file on the MIP hard disk. The file contains the BIOS code and information on where to load the BIOS. The MIP loads the BIOS in main memory using the information from the BIOS file. 
Multiple segments may be loaded by the MIP from the BIOS file at designated addresses specified in the file. The BIOS configuration data area is preferably loaded in main memory at the address 0xF200. This loading is done before the de-assertion of the reset for the Sub-PODs. Additionally in the preferred embodiment, the BSP on the first Sub-POD let out of reset negotiates the mailbox version, allowing the BIOS to issue the Negotiate Mailbox Version message 24 to the MIP. The BIOS writes the host to MIP mailbox for the Sub-POD (Cluster) with the Negotiate Mailbox Version message; the data includes the minimum and maximum version supported by the BIOS. The MIP clears the mailbox valid bit. The MIP returns the version selected for the Mailbox in the status of the Negotiate Mailbox Version response. This is the version that will be used for all of the messages between the host BIOS for this BSP and the MIP. The version applies to both the mailbox format and the message data format. The MIP will maintain under IMS control an indication of the BSP processor for use during the operation of the partition, to allow the partition to be rebooted if needed, and to perform other useful functions. The MIP provides a list of the IPs to remove from the MP Table in the mailbox. The default is to indicate the IPs that failed to start. If a flag to allow the down of a Sub-POD on an IP failure during boot is set, then the data will indicate all IPs on the Sub-POD. The MIP initializes the BSP Sub-POD to direct accesses in the range A_0000 to B_FFFF to the Compatibility DIB. BIOS begins VGA execution from the ROM. The shadow of the video ROM to C_0000 to C_7FFF is performed by the BIOS writing the base register in the PCI card to a higher memory mapped address; the data is read from the card and written to main memory at address C_0000. 
If the BIOS decided to write protect the shadowed video ROM area, the BIOS will issue the Shadow ROM message 8 to the MIP. The BSP host allows its BIOS to issue the Shadow ROM message to the MIP to set the video ROM range register to the mode where the reads are from main memory and the writes to ROM (DIB). The MIP halts the Sub-POD. The MIP then writes the register in the Sub-POD and resumes the Sub-POD, writing the message response indicating completion. (These addresses are details of the Intel PC boot requirements. The compatibility DIB is the DIB (PCI-BRIDGE) which contains special hardware called the CPB (Compatibility Board). This special hardware is required to boot an Intel based system under Windows and/or UnixWare. In addition, each partition may have at most one instance of this hardware. More than one DIB may actually contain this hardware. In order to adhere to the "at most one" requirement above, redundant instances of the Compatibility hardware are disabled so that only one instance is usable in any partition. The compatibility DIB is the one containing the compatibility hardware which is to be used for this partition.) In the CMP platform, the MIP tests and verifies main memory as a part of the MSU initialization, instead of the BIOS. The BIOS issues the Release APs message to the MIP. Each AP (Application Processor or just processor) is in a loop waiting for a fixed value to be written in a safe memory location by the host OS. The write of this memory location releases the processor for execution by the host OS. The safe address is selected by the BIOS in the BIOS memory area and is passed to the host in the OEM Table. In the preferred embodiment system, which we sometimes call a CMP platform, the execution of a HALT instruction and then the usage of the STARTUP IPI cannot be done due to the Hierarchical APIC structure and the platform does not support physical APIC mode. 
The BIOS scans for expansion ROMs, and initializes and shadows the boot device expansion ROMs using the following procedure: The MIP initializes the Sub-POD range registers to direct the range between C_0000 and D_FFFF to the mode where the reads and writes are from RAM (main memory). The shadow of the expansion ROMs to the C_0000 to D_FFFF range is performed by the BIOS writing the base register in the PCI card to a higher memory-mapped address; the data is read from the card then written to main memory at the address in the range between C_0000 and D_FFFF. If the BIOS decides to write protect the shadowed expansion ROM area, the BIOS will issue the Shadow ROM message to the MIP. BIOS issues the Shadow ROM message to the MIP to set the appropriate range register(s) to the mode where the reads are from main memory and the writes to ROM (DIB). MIP halts the Sub-POD. MIP writes the register(s) in the Sub-POD and resumes the Sub-POD. MIP writes the message response indicating completion. The BIOS enables hardware interrupts. 4. Conclusion We have described a system for communicating between any host processor in a multi-host multiprocessor system which includes a protocol useable with Intel brand processors. It should be recalled that a Host processor may include a BSP and a plurality or multiplicity of subordinate processors we called APs in a preferred embodiment running computer system. The protocol requires a messaging delivery system which we describe as a mailbox pair or channel that can reside in the host memory, accessible by a management processor with independent memory. Numerous specific pairs of messages are described which are of four basic varieties, filling out the possible combinations of messages with buffer pointers and those without. Details of how the management processor gets information into and out of the host's memory locations are also described. 
Other processor types may be used with this system and other computer system configurations may also take advantage of the teachings provided herein. Accordingly, the scope of the invention is only limited by the following appended claims.
