Operations controller for a fault tolerant multiple node processing system
4980857
Abstract
A task communicator for each node in a multiple node processing system has a data memory storing data received from a voter interface, which is used for the execution of tasks by an associated applications processor, and a next task register storing the identification code of the next task to be executed by the applications processor, received from a scheduler through a scheduler interface. An input handler passes the identification code of the next task and the data required for the execution of that task to an input FIFO register interfacing the applications processor. An output FIFO register temporarily stores the data generated by the applications processor, and an output handler generates inter-node messages containing data stored in the output FIFO and passes these inter-node messages to a transmitter through a transmitter interface for transmission to all of the other nodes in the processing system.
Claims
What is claimed is:
Description
CROSS REFERENCE
TABLE I
______________________________________________________________________
Inter-Node Message Format
Message
Type      Description/                    Byte
Number    Abbreviation                    Number   Content
______________________________________________________________________
MT0       One Byte Data Value             1        NID/Message Type
                                          2        Data I.D.
                                          3        Data Value
                                          4        Block Check
MT1       Two Byte Data Value             1        NID/Message Type
                                          2        Data I.D.
                                          3-4      Data Value
                                          5        Block Check
MT1       Task Interactive                1        NID/Message Type
          Consistency (TIC)               2        Data I.D. = 0
                                          3        Task Completed Vector
                                          4        Task Branch Condition Bits
                                          5        Block Check
MT2       Four Byte Data Value (D4B)      1        NID/Message Type
                                          2        Data I.D.
                                          3-6      Data Value
                                          7        Block Check
MT3       Four Byte Data Value (D4B2)     1        NID/Message Type
                                          2        Data I.D.
                                          3-6      Data Value
                                          7        Block Check
MT4       Base Penalty Count (BPC)        1        NID/Message Type
                                          2        Base Count 0
                                          3        Base Count 1
                                          4        Base Count 2
                                          5        Base Count 3
                                          6        Base Count 4
                                          7        Base Count 5
                                          8        Base Count 6
                                          9        Base Count 7
                                          10       Block Check
MT5       System State (SS)               1        NID/Message Type
                                          2        Function Bits
                                          3        Task Completed Vector
                                          4        Task Branch Condition Bits
                                          5        Current System State
                                          6        New System State
                                          7        Period Counter (High)
                                          8        Period Counter (Low)
                                          9        ISW Byte
                                          10       Reserved
                                          11       Block Check
MT6       Task Completed/Started (TC/S)   1        NID/Message Type
                                          2        Completed Task ID
                                          3        Started Task ID
                                          4        Branch Condition/ECC
                                          5        Block Check
MT7       Error (ERR)                     1        NID/Message Type
                                          2        Faulty Node ID
                                          3        Error Byte 1
                                          4        Error Byte 2
                                          5        Error Byte 3
                                          6        Error Byte 4
                                          7        Penalty Base Count
                                          8        Penalty Increment Count
                                          9        Block Check
______________________________________________________________________
The inter-node messages all have the same basic format so as to simplify their handling in the receiving mode. The first byte of each inter-node message contains the Node identification (NID) code of the Node from which the message originated and a message type (MT) code identifying the message type. The last byte in each inter-node message is always a block check byte which is checked by the Receivers 32a through 32n to detect transmission errors.
There are four different Data Value messages, which range from a one byte Data Value message to a four byte Data Value message. These Data Value messages are identified as message types MT0 through MT3. The second byte of a Data Value message is a data identification (DID) code which, when combined with the message type code, uniquely distinguishes that particular data value from the other data values used in the system. The data identification (DID) code is used by the Message Checker 34 to define the types of checks that are to be performed. The MT/DID codes are used by the Message Checker 34 to identify which limits will be applied, by the Voter 38 to define the permissible deviance of each actual data value from the voted values, and by the Task Communicator 44 to identify the data value to be supplied to the Applications Processor 14 in the execution of the current task. The bytes following the data identification byte are the data values themselves, with the last byte being the block check byte as previously indicated.
A Task Interactive Consistency (TIC) message is a special case of the two byte Data Value message which is identified by the DID being set to zero (0). The Task Interactive Consistency message, message type MT1, is a rebroadcast of the task completed vector and branch condition data contained in the Task Completed/Started (TC/S) messages received from the other Nodes and is transmitted at the end of each Subatomic period (SAP), as shall be explained in the discussion of the timing sequence.
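The message lengths in Table I and the DID = 0 rule for Task Interactive Consistency messages can be captured in a small lookup. The following Python sketch is illustrative only: the names MESSAGE_LENGTH and classify, and the short labels it returns, are hypothetical and do not come from the specification.

```python
# Total length of each message type in bytes, first byte through block check,
# as listed in Table I.
MESSAGE_LENGTH = {0: 4, 1: 5, 2: 7, 3: 7, 4: 10, 5: 11, 6: 5, 7: 9}

# Hypothetical short labels for the message types that need no second-byte test.
_NAMES = {0: "DATA_1B", 2: "D4B", 3: "D4B2", 4: "BPC", 5: "SS", 6: "TC/S", 7: "ERR"}


def classify(mt, second_byte):
    """Return a label for a message given its type code and second byte."""
    if mt == 1:
        # MT1 with a DID of zero is the Task Interactive Consistency message;
        # any other DID is an ordinary two byte Data Value message.
        return "TIC" if second_byte == 0 else "DATA_2B"
    return _NAMES[mt]


def data_key(mt, did):
    """MT and DID together identify one data value in the system."""
    return (mt, did)
```

For example, classify(1, 0) yields "TIC" while classify(1, 7) yields "DATA_2B", which mirrors how the same message type code carries both kinds of message.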
The information content of the Task Interactive Consistency messages is voted on by each Node and the voted values are used by the Scheduler 40 in the task selection and scheduling process.
A Base Penalty Count (BPC) message, message type MT4, contains the base penalty count that the individual Node is storing for each Node in the system including itself. Each Node will use this information to generate a voted base penalty count for each Node in the system. Thereafter, each Node will store the voted base penalty count as the current base penalty count for each Node. This assures that at the beginning of each Master period each Node is storing the same base penalty count for every other Node in the system. The Base Penalty Count message is transmitted by each Node at the beginning of each Master period timing interval.
A System State (SS) message, message type MT5, is sent at the end of each Atomic period timing interval and is used for the point-to-point synchronization of the Nodes and to globally affirm reconfiguration when a majority of the Nodes conclude that reconfiguration is required. The transmission of the System State message is timed so that the end of its transmission coincides with the end of the preceding Atomic period and the beginning of the next Atomic period. The first byte of the System State message contains the node identification (NID) code of the originating Node and the message type (MT) code. The second byte contains three function bits. The first two bits are the synchronization and presynchronization bits which are used in the Synchronization process described above. The third bit identifies whether the Node is operating or excluded. The third and fourth bytes of the System State message are the task completed vector and the branch condition vector, respectively. Byte five contains the current system state vector and byte six contains the new system state vector.
When the sending Node has concluded that reconfiguration is necessary, the new system state vector will be different from the current system state vector. Bytes seven and eight contain the higher and lower order bits of the Node's own period counter. Byte nine is an "in sync with" (ISW) vector which identifies the Nodes with which that particular Node has determined it is synchronized, and byte ten is reserved for future use. Byte eleven is the conventional block check byte at the end of the message.
The Synchronizer uses the time stamp of the pre-synchronization System State messages, identified by the pre-synchronization bit in the second byte being set, to generate an error estimate used to compute a correction to the time duration of the last Subatomic period. This correction synchronizes the beginning of the next Atomic period in that Node with the Atomic period being generated by the other Nodes. The period counter bytes are used to align the Master periods of all the Nodes in the system. The period counter counts the number of Atomic periods from the beginning of each Master period and is reset when it counts up to the fixed number of Atomic periods in each Master period. Byte nine is used only during an automatic cold start, as shall also be explained in more detail in the discussion of the Synchronizer 46.
The Task Completed/Started (TC/S) message, message type MT6, is generated by the Task Communicator 44 each time the Applications Processor 14 starts a new task. The second and third bytes of the Task Completed/Started message contain the task identification (TID) codes of the task completed and the new task started by the Node's Applications Processor 14. The fourth byte of this message contains the branch condition of the completed task and an error correction code (ECC).
The last inter-node message is the Error message, message type MT7, which is sent whenever the Transmitter 30 is free during an Atomic period.
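The eleven byte layout of the System State message described above can be unpacked as in the following Python sketch. It is a minimal illustration, not the patented hardware: the SystemState class, its field names, and parse_system_state are hypothetical, and treating the high byte as the upper eight bits of a 16 bit period counter is an assumption. The positions of the individual function bits within byte two are not given in the text, so the byte is kept whole.

```python
from dataclasses import dataclass


@dataclass
class SystemState:
    function_bits: int      # byte 2 (bit positions not specified in the text)
    task_completed: int     # byte 3, task completed vector
    branch_conditions: int  # byte 4, branch condition vector
    current_state: int      # byte 5, current system state vector
    new_state: int          # byte 6, new system state vector
    period_counter: int     # bytes 7 and 8 combined (assumed high byte first)
    in_sync_with: int       # byte 9, ISW vector

    @property
    def wants_reconfiguration(self):
        # The sender signals reconfiguration by a new state vector that
        # differs from the current one.
        return self.new_state != self.current_state


def parse_system_state(msg):
    """Split an 11 byte System State message into its fields."""
    if len(msg) != 11:
        raise ValueError("a System State message is 11 bytes long")
    return SystemState(
        function_bits=msg[1],
        task_completed=msg[2],
        branch_conditions=msg[3],
        current_state=msg[4],
        new_state=msg[5],
        period_counter=(msg[6] << 8) | msg[7],
        in_sync_with=msg[8],
    )
```

Byte ten (reserved) and byte eleven (block check) are deliberately ignored here; checking the block check byte belongs to the Receiver.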
Only one Error message reporting the errors attributed to a particular Node can be sent in an Atomic period. The second byte of the Error message is the Node identification (NID) code of the Node accused of being faulty. The following four bytes contain error flags identifying each error detected. The seventh and eighth bytes of the Error message contain the base penalty count of the identified Node and the increment penalty count which is to be added to the base penalty count if the errors are supported by Error messages received from other Nodes. The increment penalty count is based on the number of errors detected and the severity of these errors. This information is used by the other Nodes to generate a new voted base penalty count for the Node identified in the Error message. A separate Error message is sent for each Node which generates a message having a detected error.
TIMING PERIODS
The overall control system of the multi-computer architecture contains a number of concurrently operating control loops with different time cycles. The system imposes the constraint that each cycle time be an integer power of two times a fundamental time interval called an Atomic period. This greatly simplifies the implementation of the Operations Controller 12 and facilitates the verification of correct task scheduling. The length of the Atomic period is selected within broad limits by the system designer for each particular application. The System State messages which are used for synchronization are sent at the end of each Atomic period.
The longest control loop employed by the system is the Master period. Each Master period contains a fixed number of Atomic periods, as shown in FIG. 3. All task scheduling parameters are reinitialized at the beginning of each Master period to prevent the propagation of any scheduling errors. The Nodes will also exchange Base Penalty Count messages immediately following the beginning of each Master period.
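The power-of-two constraint on cycle times and the wrap-around of the period counter can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes cycle times are expressed as a whole number of Atomic periods (a non-negative exponent), which the text does not state explicitly.

```python
def is_valid_cycle(cycle_atomic_periods):
    """True if the cycle is 2**k Atomic periods for some integer k >= 0."""
    n = cycle_atomic_periods
    # A positive integer is a power of two exactly when it has one bit set.
    return n > 0 and (n & (n - 1)) == 0


def advance_period_counter(counter, atomic_per_master):
    """Count one more Atomic period, resetting at the Master period boundary."""
    return (counter + 1) % atomic_per_master
```

With eight Atomic periods per Master period, the counter runs 0 through 7 and then returns to 0, at which point the scheduling parameters would be reinitialized.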
The shortest time period used in the system is the Subatomic period (SAP), as shown in FIG. 4, which defines the shortest execution time recognized by the Operations Controller 12 for any one task. For example, if the execution time of a task is less than a Subatomic period, the Operations Controller 12 will not forward the next scheduled task to the Applications Processor 14 until the beginning of the next Subatomic period. However, when the execution time of a task is longer than a Subatomic period, the Operations Controller 12 will forward the next scheduled task to the Applications Processor 14 as soon as it is ready for it. There is an integer number of Subatomic periods in each Atomic period, which is selectable by the systems designer to customize the multi-computer architecture to the particular application. As shown in FIG. 4, each Subatomic period is delineated by a Task Interactive Consistency message as previously described.
TRANSMITTER
FIG. 5 is a block diagram of the Transmitter 30 embodied in each of the Operations Controllers 12. The Transmitter 30 has three interfaces: a Synchronizer Interface 50 receiving Task Interactive Consistency messages and System State messages generated by the Synchronizer 46, a Fault Tolerator Interface 52 receiving the Error and Base Penalty Count messages generated by the Fault Tolerator 36, and a Task Communicator Interface 54 receiving Data Value and Task Completed/Started messages generated by the Task Communicator 44. The three interfaces are connected to a Message Arbitrator 56 and a Longitudinal Redundancy Code Generator 58. The Message Arbitrator 56 determines the order in which the messages ready for transmission are to be sent. The Longitudinal Redundancy Code Generator 58 generates a longitudinal redundancy code byte which is appended as the last byte to each transmitted message.
The message bytes are individually transferred to a Parallel-to-Serial Converter 60 where they are framed between a start bit and two stop bits, then transmitted in a serial format on communication link 16. The Transmitter 30 also includes a Self-Test Interface 62 which, upon command, retrieves a predetermined self-test message from an external ROM (not shown); the self-test message is input into the Longitudinal Redundancy Code Generator 58 and transmitted to the communication link by the Parallel-to-Serial Converter 60. The Transmitter 30 also has an Initial Parameter Load Module 64 which will load into the Transmitter various predetermined parameters, such as the length of the minimum synchronization period between messages, the length of a warning period for Task Interactive Consistency and System State messages, and the starting address in the ROM where the self-test messages are stored.
As shown in FIG. 6, each of the three interfaces has an eight bit Input Register 66 which receives the messages to be transmitted from its associated message source through a multiplexer 68. The multiplexer 68 also receives the three bit Node identification (NID) code which identifies the Node which is generating the message. Whenever the associated message source has a message to be transmitted, it will hold the message until a buffer available signal is present, signifying that the Input Register 66 is empty. The message source will then transmit the first byte of the message to the Input Register 66. A bit counter 70 will count the strobe pulses clocking the message into the Input Register 66 and will, in coordination with a flip flop 72 and an AND gate 74, actuate the multiplexer 68 to clock the three bit Node identification code into the Input Register 66 as the three most significant bits of the first byte.
The flip flop 72 is responsive to the signal "transmit quiet period" (TQP) generated at the end of the preceding message to generate a first byte signal at its Q output which enables AND gates 74 and 76. The AND gate 74 will transmit the three most significant bits generated by the bit counter 70 in response to the strobe signals loading the first byte into the Input Register 66 and will actuate the multiplexer 68 to load the three bit Node identification code into the three most significant bit places of the Input Register 66. The AND gate 76 will respond to the loading of the eighth bit into the Input Register 66 and will generate an output which will actuate the flip flop 78 to a set state. In the set state, the flip flop 78 will generate a message available signal at its Q output and will terminate the buffer available signal at its complementary (Q-bar) output. The message available (MA) signal will reset the flip flop 72, terminating the first byte signal, which in turn disables the AND gates 74 and 76. The message available (MA) signal is also transmitted to the Message Arbitrator 56, signifying that a message is ready for transmission. Termination of the buffer available (BA) signal when the flip flop 78 is put in the set state inhibits the message source from transmitting the remaining bytes of the message to the Transmitter 30.
The three least significant bits of the first byte, which are the message type code, are communicated directly to the Message Arbitrator 56 and are used in the arbitration process to determine which message is to be sent when more than one message is available for transmission, and whether the sending of that message would interfere with the transmission of a time critical message generated by the Synchronizer 46. The Message Arbitrator 56 will generate a transmit (Txxx) signal identifying the next message to be sent when there is more than one message ready for transmission.
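The first-byte layout described above, with the Node identification code in the three most significant bits and the message type code in the three least significant bits, can be modelled in software as follows. This Python sketch only illustrates the bit positions; the function names are hypothetical, and the use of the two middle bits is not described in the text, so they are passed through unchanged.

```python
def stamp_nid(first_byte, nid):
    """Overwrite the three most significant bits of the first byte with the NID."""
    if not 0 <= nid <= 7:
        raise ValueError("the NID is a three bit code")
    # Keep the five low bits supplied by the message source, replace the top three.
    return (nid << 5) | (first_byte & 0b00011111)


def split_header(first_byte):
    """Return (nid, message_type) taken from the first byte of a message."""
    return (first_byte >> 5) & 0b111, first_byte & 0b111
```

For instance, a source byte carrying message type 5 and stamped with NID 6 becomes 0b11000101, and split_header recovers (6, 5) from it.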
This signal will actuate the Longitudinal Redundancy Code Generator 58 to pass the selected message to the Parallel-to-Serial Converter 60 for transmission. The transmit signal will also reset the flip flop 78 in the appropriate interface, which reasserts the buffer available (BA) signal, actuating the associated message source to transmit the remaining bytes of the message to the interface. These are then transmitted directly to the Longitudinal Redundancy Code Generator 58 as they are received. When all of the bytes of the message are transmitted, the Message Arbitrator 56 will generate a transmit quiet period (TQP) signal which actuates the Parallel-to-Serial Converter 60 to transmit a null (synchronization) signal for a predetermined period of time following the transmission of each message. In the preferred embodiment, the quiet period is the time required for the transmission of 24 bits or two (2) null bytes. The transmit quiet period (TQP) signal will also set the flip flop 72, indicating that the preceding message has been sent and that the next byte received from the associated message source will be the first byte of the next message.
The details of the Message Arbitrator 56 are shown in FIG. 7. Under normal operation, when no time critical messages, such as Task Interactive Consistency (TIC) and System State (SS) messages, are to be sent, a Fault Tolerator (FLT)-Task Communicator (TSC) Arbitration Logic 82 will generate, in an alternating manner, PFLT and PTSC polling signals which are received at the inputs of AND gates 84 and 86, respectively. The AND gate 84 will also receive the Fault Tolerator message available (FLTMA) signal generated by the Fault Tolerator Interface 52, while AND gate 86 will receive a Task Communicator message available (TSCMA) signal generated by the Task Communicator Interface 54 after the Task Communicator 44 has completed the loading of the first byte of the message ready for transmission.
The outputs of the AND gates 84 and 86 are transmit Fault Tolerator (TFLT) and transmit Task Communicator (TTSC) signals which are applied to AND gates 88 and 90, respectively. The alternate inputs to AND gates 88 and 90 are received from a Time Remaining-Message Length Comparator 92 which produces an enabling signal whenever the transmission of the selected message will not interfere with the transmission of a time dependent message, as shall be explained hereinafter. If the AND gate 88 is enabled, it will pass the transmit Fault Tolerator (TFLT) signal to the Fault Tolerator Interface 52 to reassert the buffer available signal, enabling it to receive the remaining bytes of the message from the Fault Tolerator 36, and to the Longitudinal Redundancy Code Generator 58, enabling it to pass the message, byte-by-byte, from the Fault Tolerator Interface 52 to the Parallel-to-Serial Converter 60 for transmission on the communication link 16. In a like manner, when the AND gate 90 is enabled, and the polling of the Task Communicator Interface 54 indicates that the Task Communicator 44 has a message ready for transmission, then the AND gate 86 will generate a transmit Task Communicator (TTSC) signal which, if passed by the AND gate 90, will result in the transmission of the Task Communicator's message. The TFLT and the TTSC signals, when generated, are fed back to lock the FLT-TSC Arbitration Logic 82 in its current state until after the message is sent.
The message arbitration between the Fault Tolerator's and Task Communicator's messages is primarily dependent upon the type of the message currently being transmitted. The logic performed by the FLT-TSC Arbitration Logic 82 is summarized in Table II.
TABLE II
______________________________________________________________________
FLT-TSC Arbitration Logic Table
                                Poll Next Then      Poll Next Then
Current Message                 Alternate           Wait for Message
______________________________________________________________________
Fault Tolerator                 Task Communicator
Task Communicator               Fault Tolerator
System State (Master Period)                        Fault Tolerator
System State (Atomic Period)    Task Communicator
Interactive Consistency         Task Communicator
Self Test                                           Task Communicator
______________________________________________________________________
Normally the FLT-TSC Arbitration Logic 82 will poll the Fault Tolerator Interface 52 and the Task Communicator Interface 54 in an alternating sequence. However, at the beginning of each Atomic period, the FLT-TSC Arbitration Logic 82 will first poll the Task Communicator Interface 54 for a Task Completed/Started message which will identify the task being started by that Node. If the Task Completed/Started message is not available it will then poll the Fault Tolerator Interface 52. At the beginning of each Master period, all of the Nodes should transmit a Base Penalty Count message which is used for global verification of the health of each Node in the system. Therefore, after each System State message which is coincident with the beginning of a Master period, the FLT-TSC Arbitration Logic will first poll the Fault Tolerator Interface 52 and wait until it receives the Base Penalty Count message from the Fault Tolerator 36. After the transmission of the Base Penalty Count message, it will then poll the Task Communicator Interface 54 and transmit a Task Completed/Started message identifying the task scheduled to be started by the Applications Processor. If the Fault Tolerator 36 does not generate a Base Penalty Count message within a predetermined period of time, the FLT-TSC Arbitration Logic 82 will resume polling of the Fault Tolerator Interface 52 and the Task Communicator Interface 54 in an alternating sequence. In a like manner, after a self-test message, the FLT-TSC Arbitration Logic 82 will poll the Task Communicator Interface 54 and wait for a Task Completed/Started message. The Synchronizer 46 will load the first byte of either a Task Interactive Consistency or System State message in the Synchronizer Interface 50 a predetermined period of time before the beginning of the next Subatomic or Atomic periods. 
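The polling rules of Table II can be written as a lookup from the message just sent to the interface polled next. The Python sketch below is a simplified reading of the table, not the gate-level logic: the string keys, NEXT_POLL, and poll_order are hypothetical names, and the timeout that returns the logic to alternating polling when no Base Penalty Count message arrives is not modelled.

```python
FLT, TSC = "FLT", "TSC"

# Message just transmitted -> (interface polled first, wait for its message?)
NEXT_POLL = {
    "FLT": (TSC, False),
    "TSC": (FLT, False),
    "SS_MASTER": (FLT, True),    # System State at a Master period boundary
    "SS_ATOMIC": (TSC, False),   # System State at any other Atomic period
    "TIC": (TSC, False),
    "SELF_TEST": (TSC, True),
}


def poll_order(current):
    """Return the interfaces to poll, in order, after the given message."""
    first, wait = NEXT_POLL[current]
    if wait:
        # Keep polling the same interface until its message arrives.
        return [first]
    other = FLT if first == TSC else TSC
    return [first, other]
```

After a System State message that begins a Master period, poll_order returns only the Fault Tolerator Interface, reflecting the wait for the Base Penalty Count message.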
A Warning Period Generator 94 will load a warning period counter with a number corresponding to the number of bits that are capable of being transmitted before the Task Interactive Consistency or System State messages are to be transmitted. As described previously, the transmission of the final bit of either of these messages marks the end of the previous Subatomic or Atomic period, respectively; therefore, their transmission will begin a predetermined time (bit count) before the end of the period. Since the Task Interactive Consistency and System State messages are of different bit lengths, the number loaded into the warning period counter will be different. The Warning Period Generator 94 will decode the message type code contained in the first byte of the message stored in the Synchronizer Interface 50 and will load the warning period counter with a number indicative of the length of the warning period for that particular type of time critical message. The warning period counter will be counted down at the bit transmission rate of the Parallel-to-Serial Converter 60 to generate a number indicative of the time remaining before the transmission of a time critical message. The number of counts remaining in the warning period counter is communicated to a Synchronizer Transmission Control 96 and the Time Remaining-Message Length Comparator 92. When the warning period counter is counted down to zero, the Synchronizer Transmission Control 96 will generate a transmit synchronizer (TSYN) signal which will actuate the Synchronizer Interface 50 to reassert the buffer available signal and will actuate the Longitudinal Redundancy Code Generator 58 to pass the message from the Synchronizer Interface 50 to the Parallel-to-Serial Converter 60 for transmission on the Node's own communication link 16.
The Time Remaining-Message Length Comparator 92 will decode the message type of a message selected for transmission by the FLT-TSC Arbitration Logic and determine the number of bits that have to be transmitted for that message. To this number the Time Remaining-Message Length Comparator 92 will add a number equal to the number of bits corresponding to the quiet period between the messages and compare the sum of the message and the quiet period with the count remaining in the warning period counter to determine if the transmission of the selected message will or will not interfere with the transmission of the time critical message from the Synchronizer Interface 50. If the transmission of the selected message will not interfere with the sending of the time critical message from the Synchronizer 46, the Time Remaining-Message Length Comparator 92 will generate a signal enabling AND gates 88 and 90 to pass the TFLT or TTSC signals, otherwise the Time Remaining-Message Length Comparator 92 will generate a signal disabling AND gates 88 and 90, inhibiting the transmission of the selected message from either the Fault Tolerator Interface 52 or the Task Communicator Interface 54. This signal will also toggle the FLT-TSC Arbitration Logic 82 to poll the nonselected interface to determine if it has a message to transmit. If the nonselected interface has a message ready for transmission, the Time Remaining-Message Length Comparator 92 will determine if there is sufficient time to transmit the message from the nonselected interface before the transmission of the time critical message from the Synchronizer Interface 50. If there is sufficient time, the message from the nonselected interface will be transmitted, otherwise the AND gates 88 and 90 will remain disabled. The Message Arbitrator 56 also has a Byte Counter 100 which counts the number of bytes transmitted by the Parallel-to-Serial Converter 60. The output of the Byte Counter 100 is received by a Message Byte Logic 102. 
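The comparison made by the Time Remaining-Message Length Comparator 92 amounts to simple arithmetic on bit counts. The Python sketch below assumes 12 transmitted bits per byte (start bit, eight data bits, parity bit, two stop bits) and the 24 bit quiet period of the preferred embodiment; the function name may_transmit and the choice to allow transmission when the required bits exactly equal the remaining count are assumptions.

```python
BITS_PER_FRAME = 12     # start bit + 8 data bits + parity bit + 2 stop bits
QUIET_PERIOD_BITS = 24  # two null bytes in the preferred embodiment

# Message length in bytes, including the block check byte, per Table I.
MESSAGE_LENGTH = {0: 4, 1: 5, 2: 7, 3: 7, 4: 10, 5: 11, 6: 5, 7: 9}


def may_transmit(mt, warning_count_bits):
    """True if a message of type mt plus its quiet period fits in the time left."""
    needed = MESSAGE_LENGTH[mt] * BITS_PER_FRAME + QUIET_PERIOD_BITS
    return needed <= warning_count_bits
```

An Error message (MT7) needs 9 x 12 + 24 = 132 bit times, so under these assumptions it is held back once fewer than 132 counts remain in the warning period counter.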
The Message Byte Logic 102 decodes the message type code of the message being transmitted and determines the number of bytes in that message. After the last byte of the message is transmitted, the Message Byte Logic 102 will first generate a transmit longitudinal redundancy code (TLRC) signal which enables the Longitudinal Redundancy Code Generator 58 to transmit the generated longitudinal redundancy code as the final byte of the message. The Message Byte Logic 102 will then generate a transmit quiet period (TQP) signal enabling the Parallel-to-Serial Converter 60 to transmit the null signal for a predetermined number of bytes, which is used for message synchronization. The transmit quiet period (TQP) signal is also transmitted to the Synchronizer Transmission Control 96 where it is used to terminate the transmit synchronizer (TSYN) signal. At the end of the quiet period, the Message Byte Logic 102 will generate an end of quiet period (EQP) signal which will reset the Byte Counter 100 and unlatch the FLT-TSC Arbitration Logic 82 for selection of the next message for transmission.
A Self-Test Arbitration Logic 104 recognizes a request for a self-test in response to a transmitted Task Completed/Started message in which the task identification (TID) code is the same as the Node identification (NID) code. After the transmission of a self-test request message, the Self-Test Arbitration Logic 104 will inhibit a Fault Tolerator Enable (FLTE) signal and a Task Communicator Enable (TSCE) signal, as shown in FIG. 8, which, when applied to AND gates 84 and 86, respectively, inhibits all transmissions from the Fault Tolerator Interface 52 and the Task Communicator Interface 54. Immediately following the next Task Interactive Consistency or System State message, the Self-Test Arbitration Logic 104 will generate a transmit self-test (TSLT) signal which will actuate the Self-Test Interface 62 to read the self-test message from an associated off board read only memory (ROM).
The TSLT signal will also enable the Longitudinal Redundancy Code Generator 58 to pass the self-test message from the Self-Test Interface 62 to the Parallel-to-Serial Converter 60 for transmission. After transmission of the self-test message, the Self-Test Arbitration Logic 104 will restore the Task Communicator Enable (TSCE) signal to permit the transmission of a Task Completed/Started message signifying the completion of the self-test. As indicated in Table II, the FLT-TSC Arbitration Logic 82 will automatically select the message from the Task Communicator Interface 54 as the next message to be transmitted following the transmission of the self-test message. After the transmission of the Task Completed/Started message, the Self-Test Arbitration Logic 104 will terminate the Task Communicator Enable (TSCE) signal until after the next Task Interactive Consistency or System State message is transmitted, as indicated in FIG. 8.
The Self-Test Interface 62 serves to transfer the self-test message from the off board ROM (not shown) to the Longitudinal Redundancy Code Generator 58. The off board ROM will store a plurality of self-test messages which are transmitted one at a time, one for each self-test request. The first byte of each self-test message is a number indicative of the number of bytes in the self-test message, which is passed back to the Message Byte Logic 102 to identify the completion of the self-test. The last byte in each self-test message stored in the off board ROM is the starting address for the next self-test message. The starting address is not transmitted, but rather is stored in the Self-Test Interface 62 to locate the next self-test message in the off board ROM to be transmitted. The last byte of the last self-test message stored in the off board ROM contains the starting address of the first self-test message, so that the self-test message sequence is repeated.
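The chained storage of self-test messages in the ROM behaves like a circular linked list. The Python sketch below assumes a particular record layout that the text does not fully specify: a count byte, followed by that many message bytes, followed by a one byte address of the next record. The function name read_self_test and the sample ROM contents are hypothetical.

```python
def read_self_test(rom, addr):
    """Read one self-test record; return (message_bytes, next_start_address)."""
    count = rom[addr]                         # first byte: number of message bytes
    body = rom[addr + 1 : addr + 1 + count]   # bytes that would be transmitted
    next_addr = rom[addr + 1 + count]         # last byte: kept, never transmitted
    return bytes(body), next_addr


# Two hypothetical records; the second points back to address 0.
SAMPLE_ROM = bytes([2, 0xAA, 0xBB, 4, 3, 0x01, 0x02, 0x03, 0])
```

Starting from address 0, successive calls return the first message with next address 4, then the second message with next address 0, so the sequence repeats as described.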
The starting address for the first self-test message is loaded into the Self-Test Interface 62 by the Initial Parameter Load Module 64 in response to an initial load command generated by the Synchronizer 46 in response to the electrical power being turned on.
As illustrated in FIG. 9, the Longitudinal Redundancy Code Generator 58 has a 4:1 Input Multiplexer 110 which receives the message bytes from the Synchronizer Interface 50, Fault Tolerator Interface 52, Task Communicator Interface 54, and Self-Test Interface 62. The Input Multiplexer 110 controls which message will be transmitted to the Parallel-to-Serial Converter 60 in response to the transmit (TFLT, TTSC, TSYN, and TSLT) signals generated by the Message Arbitrator 56, as previously described. Each byte of a message selected for transmission by the Message Arbitrator 56 is transmitted to an Output Multiplexer 112 by means of nine parallel lines, one for each bit in the received byte plus the parity bit generated by the associated interface. A Longitudinal Redundancy (LR) Bit Generator 114 is connected to each of the nine parallel bit lines, and the LR Bit Generators 114 collectively generate a nine bit longitudinal redundancy code. Each bit in the longitudinal redundancy code is a function of the bit values in the same bit locations in the preceding bytes. The outputs of all the LR Bit Generators 114 are also received by the Output Multiplexer 112. The Output Multiplexer 112 is responsive to the transmit longitudinal redundancy code (TLRC) signal generated by the Message Arbitrator 56 to output the last bit generated by each of the LR Bit Generators 114 as the last byte of the message being transmitted. The output of the Output Multiplexer 112 is connected directly to the Parallel-to-Serial Converter 60 which frames each received byte between predetermined start and stop bits before it is transmitted on the Node's communication link.
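A longitudinal redundancy code computed per bit position over nine bit words can be sketched as follows. The text says only that each code bit is a function of the bits in the same position of the preceding bytes; the Python sketch below assumes that function is an exclusive-OR, which is the conventional choice, and it assumes even parity for the ninth bit. The function names are hypothetical.

```python
from functools import reduce


def with_parity(byte):
    """Return a nine bit word: the data byte plus a parity bit in bit 8."""
    data = byte & 0xFF
    parity = bin(data).count("1") & 1   # assumed even parity
    return (parity << 8) | data


def lrc(words):
    """Exclusive-OR of all words, computed independently per bit position."""
    return reduce(lambda acc, w: acc ^ w, words, 0) & 0x1FF


def append_lrc(message):
    """Return the nine bit words of a message followed by its check word."""
    words = [with_parity(b) for b in message]
    return words + [lrc(words)]
```

A convenient property of the exclusive-OR choice is that recomputing the code over the whole message, check word included, gives zero when no bit has been disturbed.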
RECEIVERS
The structures of the Receivers 32a through 32n are identical; therefore, only the structure of the Receiver 32a will be discussed in detail. Referring to FIG. 10, the messages from Node A transmitted on communication link 16a are received by a Noise Filter and Sync Detector 116. The synchronization portion of the Noise Filter and Sync Detector 116 requires that a proper synchronization interval exist prior to the reception of a message. As described relative to the Transmitter 30, the synchronization interval preferably is the time required for the Transmitter 30 to transmit two complete null bytes after each transmitted message. The low pass portion of the Noise Filter and Sync Detector 116 prevents false sensing of the "start" and "stop" bits by the Receiver 32a due to noise which may be present on the communication link 16a. The low pass filter portion requires that the signal on the communication link 16a be present for four (4) consecutive system clock cycles before it is interpreted as a start or a stop bit. The Noise Filter and Sync Detector 116 will generate a new message signal in response to receiving a start bit after a proper synchronization interval.
After passing through the Noise Filter and Sync Detector 116, the message is converted, byte-by-byte, from a serial to a parallel format in a Serial-to-Parallel Converter 118. The Serial-to-Parallel Converter 118 also determines when a complete 12-bit byte has been received. If the 12-bit byte is not properly framed by a "start" bit and two "stop" bits, a new bit is added, the bit first received is discarded, and the framing is rechecked. Framing errors are not flagged by the Receiver 32a since this fault will manifest itself during a vertical parity check. After conversion to a parallel format, the start and stop bits are stripped from each byte and the remaining 9-bit byte is transferred to a Longitudinal Redundancy Code and Vertical Parity Code (LRC and VPC) Checker 122 to check for parity errors.
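The reframing rule of the Serial-to-Parallel Converter 118, in which a badly framed 12-bit group drops its oldest bit and takes in one new bit, is a sliding window over the bit stream. The Python sketch below assumes a start bit of 0, stop bits of 1, and least significant bit first ordering, none of which is stated in the text; the names frame and deframe are hypothetical.

```python
def frame(word):
    """Frame a nine bit word: start bit, nine bits LSB first, two stop bits."""
    return [0] + [(word >> i) & 1 for i in range(9)] + [1, 1]


def deframe(bits):
    """Yield nine bit words recovered from a stream of 0/1 line samples."""
    window = []
    for bit in bits:
        window.append(bit)
        if len(window) < 12:
            continue
        if window[0] == 0 and window[10] == 1 and window[11] == 1:
            # Properly framed: strip the start and stop bits, keep the nine bits.
            yield sum(b << i for i, b in enumerate(window[1:10]))
            window = []
        else:
            # Not framed: discard the bit first received and recheck next time.
            window.pop(0)
```

Feeding a few idle 1 bits followed by two framed words shows the window sliding past the idle bits until it locks onto the first start bit. As in the description, a false lock is not flagged here and would have to be caught by the later parity check.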
The error checking logic outputs the current combinational value of the vertical parity and the longitudinal redundancy codes. The vertical parity check portion checks the parity vertically across the received message while the longitudinal redundancy code checker portion performs a longitudinal redundancy code check on each byte received from the Serial-to-Parallel Converter 118. The Message Checker 34 decodes the message type information contained in the first byte of the message and determines which byte is the last byte in the message and, therefore, for which byte the longitudinal redundancy code check is valid. The Message Checker 34 will ignore all other LRC error signals generated by the LRC and VPC Code Checker 122. In parallel with the vertical parity and longitudinal redundancy checks, the 8-bit message byte is transferred to a Buffer 120 which interfaces with the Message Checker 34. The Buffer 120 temporarily stores each 8-bit message byte until the Message Checker 34 is ready to check it. Upon receipt of a message byte, the Buffer will set a byte ready flag signifying to the Message Checker 34 that it has a message byte ready for transfer. The Message Checker 34 will unload the message bytes from the Buffer 120 independent of the loading of new message bytes by the Serial-to-Parallel Converter 118. The 8-bit message bytes are transferred to the Message Checker 34 via a common bus 124 which is shared with all of the Receivers 32a through 32n in the Operations Controller 12. The transfer of the message between the Receivers 32 and the Message Checker 34 is on a byte-by-byte basis in response to a polling signal generated by the Message Checker. The Message Checker 34 will systematically poll each Receiver one at a time in a repetitious sequence. MESSAGE CHECKER The details of the Message Checker 34 are shown in FIG. 11. 
The Message Checker 34 processes the messages received by the Receivers 32a through 32n and verifies their logical content, records any errors detected, and forwards the messages to the Fault Tolerator 36. The operation of the Message Checker 34 is controlled by a Sequencer 126 which context switches among the multiple Receivers 32a through 32n in order to prevent overrun of the Buffers 120 in each Receiver. Each Receiver 32a through 32n is polled in a token fashion to determine if it has a message byte ready for processing. If the message byte is ready for processing when it is polled by the Sequencer 126, the byte will be processed immediately by the Message Checker 34. Otherwise the Sequencer 126 will advance and poll the next Receiver in the polling sequence. The Sequencer 126 stores the Node identification (NID) code of the Node 10 associated with each Receiver. The Sequencer 126 also has a Byte Counter associated with each Receiver 32a through 32n which is indexed each time the Sequencer 126 unloads a byte from that particular Receiver. The byte count uniquely identifies the particular byte being processed by the Message Checker 34. The Sequencer 126 will transfer the Node identification code and the byte count to a Data Multiplexer 128 to tag the message byte as it is transferred to the Fault Tolerator 36. The Node identification code and the byte count are also transmitted to an Error Check Logic 130 and a Context Storage 132. The Error Check Logic 130 will compare the Node identification code expected by the Sequencer 126 with the Node identification code contained in the first byte of the message being checked to determine if they are the same. When they are different, the Error Check Logic 130 will generate an error signal which is recorded in an error status byte being generated in the Context Storage 132. 
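The token-style polling performed by the Sequencer 126 can be sketched as follows. This is a minimal software model, not the hardware of FIG. 11: the `Receiver` class, the `poll_pass` function, and the convention that each buffered entry carries a "new message" marker that restarts the byte count at 1 are all assumptions introduced for illustration.

```python
from collections import deque


class Receiver:
    """Hypothetical stand-in for one Receiver 32 and its Buffer 120."""

    def __init__(self, nid):
        self.nid = nid
        self.buffer = deque()  # entries: (starts_new_message, byte_value)

    def byte_ready(self):
        return bool(self.buffer)


def poll_pass(receivers, byte_counts):
    """Visit each Receiver once, unloading at most one byte from each."""
    tagged = []
    for rx in receivers:
        if not rx.byte_ready():
            continue  # nothing ready: advance to the next Receiver
        new_message, value = rx.buffer.popleft()
        # Assumed: the count restarts at 1 on the first byte of a message.
        byte_counts[rx.nid] = 1 if new_message else byte_counts[rx.nid] + 1
        # Tag the byte with NID and byte count, as the Data Multiplexer does.
        tagged.append((rx.nid, byte_counts[rx.nid], value))
    return tagged
```

Because each pass takes at most one byte per Receiver, no single busy link can starve the others, which is the overrun protection the text describes.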
The Node identification code is also used as an address into the Context Storage 132 where the relevant information pertaining to the message being processed is stored. The Context Storage 132 has a separate storage location for each Node 10 in the system which is addressed by the Node identification code contained in the message. The Context Storage 132 stores the message type (MT) code, the data identification (DID) code, the byte count, an error status byte, a data value mask, and an intermediate error signal for each message as it is being processed. As each byte is unloaded from the Receivers, the information in the Context Storage 132 will be used by an Address Generator 134 with the message type (MT) code, the data identification (DID) code, and the byte count which identifies the specific byte to be processed. In response to this information, the Address Generator 134 will output an address where the required processing information is stored in a Message Checker ROM 136. The Message Checker ROM 136 stores the maximum and minimum values for the data contained in the message, the valid data identification numbers for each message type, and a data mask which identifies how many data values are contained in the message being processed and the number of bytes in each data value. The maximum and minimum data values are transmitted to a Between Limits Checker 138 which will check the data contained in each data byte against these maximum and minimum values. The Between Limits Checker 138 will generate four different error signals as a result of the between limits checks. The first two are the maximum value (MXER) and minimum value (MNER) error signals, signifying the data value exceeded the maximum value or was less than the minimum value. The other two error signals are the equal to maximum value (MXEQ) and equal to minimum value (MNEQ) signals. 
These latter error signals are transmitted to the Error Check Logic 130 which will store them in the Context Storage 132 as intermediate error signals. The Error Check Logic 130 will OR the vertical parity code and the longitudinal redundancy code error signals generated by the Receiver and generate a parity error signal which is recorded in the error status byte being generated in the Context Storage 132. As previously described, the Error Check Logic 130 will check the expected Node identification (NID) code against the Node identification code contained in the first byte of the message and will check the message type (MT) code by checking to see if the bits in bit positions 1, 3, and 4 of the first byte are identical. As previously described in the detailed description of the Transmitter 30, the middle bit of the 3-bit message type code is repeated in bit positions 3 and 4 for message type error detection. The Error Check Logic 130 will also check the validity of the data identification (DID) code contained in the second byte of the message against the maximum value for a DID code received from the Message Checker ROM 136 and will generate an error signal if the data identification code has a value greater than the maximum value. The Error Check Logic 130 will further check the two's complement range of the appropriate data byte and generate a range error (RNGER) signal when a two's complement range error is detected. It will also record in the Context Storage 132 the maximum (MXER) and the minimum (MNER) error signals generated by the Between Limits Checker 138. With regard to the Between Limits Checker 138, often it can be determined from the first byte of a multi-byte data value whether the data value is within or outside the maximum or minimum values received from the Message Checker ROM 136, in which case checking of the remaining bytes is no longer necessary. 
However, when the Between Limits Checker 138 generates an MXEQ or MNEQ signal signifying that the data value of the byte being checked is equal to either the maximum or minimum limit value, it will be necessary to check the next byte against a maximum or a minimum value to make a factual determination of whether or not the received data value is within or outside the predetermined limits. The Error Check Logic 130 in response to an MXEQ or an MNEQ signal from the Between Limits Checker 138 will store in the Context Storage 132 an intermediate value signal which signifies that the between limits check is to be continued on the next byte containing that data value. This process will be repeated with the next subsequent byte if necessary to make a final determination. During the checking of the next byte of the particular data value, the Context Storage 132 will supply to the Error Check Logic 130 the stored intermediate value which identifies to which limit, maximum or minimum, the data value of the preceding data byte was equal. From this information, the existence or non-existence of a between limits error can readily be determined by relatively simple logic as shown on FIG. 12. A Decoder 140 responsive to the intermediate value stored in the Context Storage 132 will enable AND gates 142 and 144 if the preceding between limits check generated a signal signifying the data value contained in the preceding byte was equal to the maximum value. Alternatively, the intermediate value will enable AND gates 146 and 148 signifying that the data value contained in the preceding byte was equal to the minimum value. If on the second byte the Between Limits Checker 138 detects a maximum limit error (MXER) and AND gate 142 is enabled, the maximum limit error (MXER) will be recorded in the error status byte being generated in the Context Storage 132. 
In a like manner, if a minimum limit error (MNER) is detected on the second byte and the AND gate 146 is enabled, the minimum limit error (MNER) will be stored in the error status byte. If the second byte applies an equal to maximum (MXEQ) or equal to minimum (MNEQ) signal to the inputs of the AND gates 144 and 148, respectively, an intermediate value will again be stored in the Context Storage 132 and the final decision delayed to the next byte. The data value mask received by the Context Storage 132 from the Message Checker ROM 136 identifies the number of individual data values that are in the Data Value message being processed and which data bytes belong to each data value. This mask is used by the Error Check Logic 130 to identify the last byte in each data value. On the last byte of any data value, only maximum or minimum limit errors will be recorded in the Context Storage error status byte. The MXEQ and MNEQ signals will be ignored. The Error Check Logic 130 will also detect if the message contained the correct number of bytes. The Context Storage 132 stores the message type (MT) code for each message being processed. In response to a new message signal received with a message byte from a particular Receiver 32, the Error Check Logic 130 will decode the message type code stored in the Context Storage 132 and generate a number corresponding to the number of bytes that type of message should have. It will then compare this number with the byte count generated by the Sequencer 126 prior to receiving the new message signal from the Receiver 32 and will generate a message length error (LENER) signal when they are not the same. Because the length error (LENER) signal may not be generated until after the error status byte has been sent to the Fault Tolerator 36, the message length error signal will be passed to the Fault Tolerator 36 in the error status byte for the next message received from that Node. 
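The byte-by-byte between limits check described above amounts to comparing a multi-byte value against its limits one byte at a time, carrying an "equal so far" state from byte to byte. The sketch below is a software analogy only. It assumes the most significant byte arrives first and that bytes compare as unsigned integers, and it tracks equality to both limits at once, which is a generalization of the single intermediate value described for FIG. 12; the function name and return strings are hypothetical.

```python
def between_limits(value_bytes, max_bytes, min_bytes):
    """Return "MXER", "MNER", or None for one multi-byte data value.

    All three arguments are equal-length sequences of byte values,
    most significant byte first (an assumption of this sketch).
    """
    tied_max = True  # value has equalled the maximum on every byte so far
    tied_min = True  # value has equalled the minimum on every byte so far
    for b, hi, lo in zip(value_bytes, max_bytes, min_bytes):
        if tied_max:
            if b > hi:
                return "MXER"        # exceeded the maximum limit
            tied_max = (b == hi)     # MXEQ: defer the decision to the next byte
        if tied_min:
            if b < lo:
                return "MNER"        # fell below the minimum limit
            tied_min = (b == lo)     # MNEQ: defer the decision to the next byte
        if not tied_max and not tied_min:
            return None              # strictly inside: later bytes cannot matter
    return None  # equality on the last byte is not an error
```

As in the text, a value that is still equal to a limit after its last byte is accepted, since the MXEQ and MNEQ signals are ignored on the last byte of a data value.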
The format of the error status byte formed in the Context Storage 132 is shown in FIG. 13. In an ascending order of bit positions, starting with the least significant or zero bit position, the error status byte contains a flag for the parity error (PARER), a flag for the length error (LENER) for the preceding message, a flag bit for the Node identification (NID) error, a flag bit for the data identification (DID) error, a flag bit for the message type (MT) error, a flag bit for the two's complement range error (RNGER) and flag bits for the maximum and minimum limit (MXER and MNER) errors. Returning to FIG. 11, the Data Multiplexer 128 transmits each message byte directly to the Fault Tolerator 36 as it is processed by the Message Checker 34. The Data Multiplexer will append to each message byte a descriptor byte which contains the Node identification code (NID) and the byte count (BYTC) received from the Sequencer 126 for that particular byte of the message. At the end of the message, independent of its length, the Data Multiplexer 128 will transmit the error status byte stored in the Context Storage 132 as the last byte. The last byte is identified by a byte count "15" so that it can readily be identified by the Fault Tolerator 36 for fault analysis. FAULT TOLERATOR The details of the Fault Tolerator 36 are shown on FIG. 14. The Fault Tolerator 36 has a Message Checker Interface 150 which receives the messages byte-by-byte after being checked by the Message Checker 34. Upon receipt of an error free Task Completed/Started message, the Message Checker Interface 150 will forward the identity (NID) of the Node which sent the message and the branch condition contained in the message to a Synchronizer Interface 152, and the identity (TID) of the new task started, together with the NID of the sending Node, to the Scheduler Interface 154. 
The Message Checker Interface 150 will also send the Node identification (NID) code and the message type (MT) code to a Voter Interface 158 and the data along with a partition bit to a Fault Tolerator RAM Interface 160. The Message Checker Interface 150 will also forward the error status byte (byte=15) generated by the Message Checker 34 to an Error Handler 164 for processing. The Synchronizer 46 will report to the Error Handler 164 through the Synchronizer Interface 152 any errors it has detected in the Task Interactive Consistency (TIC) and System State (SS) messages. The Scheduler Interface 154 will forward to the Scheduler 40 the task identification (TID) code of the task started and the Node identity (NID) of each received Task Completed/Started message. In return, the Scheduler 40 will transmit to the Error Handler 164 through the Scheduler Interface 154 any errors it has detected. The Transmitter Interface 156 will forward to the Transmitter 30 the Base Penalty Count and Error messages generated by the Error Handler 164. As previously described, the Transmitter Interface 156 will load the first byte of the message to be transferred into the Transmitter's Input Register to signify it has a message ready for transmission. It will then await the reassertion of the buffer available (BAB) signal by the Transmitter 30 before forwarding the remainder of the message to the Transmitter 30 for transmission. A Reset Generator 157 is responsive to a reset signal generated by the Error Handler 164 when it determines its own Node is faulty, and to a power on reset (POR) signal generated when electrical power is first applied to the Node, to generate an Operations Controller reset (OCRES) signal and an initial parameter load (IPL) signal which are transmitted to the other subsystems, effecting a reset of the Operations Controller 12. 
The Fault Tolerator RAM Interface 160 will store in a Fault Tolerator RAM 162 the data contained in the message bytes as they are received from the Message Checker Interface 150. The Fault Tolerator RAM 162 is a random access memory partitioned as shown in FIG. 15. A message partition section 166, as shown on FIG. 15, stores in predetermined locations the messages received from each Node. In the message partition section 166 the messages are reassembled to their original format using the descriptor byte appended to the message bytes by the Message Checker 34. A double buffering or double partitioning scheme is used to prevent overwriting of the data that is still being used by the Voter 38. A context bit generated by the Message Checker Interface 150 determines into which of the two partitions the new data is to be written. Separate context bits are kept for each Node and are toggled only when the error status byte indicates the current message is error free. As previously discussed relative to the Message Checker 34, the message length (LENER) bit of the error status byte signifies that the preceding message had a message length error and, therefore, is ignored in the determination of an error free condition for the current message. The format for a single message in the message partition section 166 is illustrated in FIG. 16. As shown, the message is reconstructed in its original format in the Fault Tolerator RAM 162 using the Node identification (NID) code and the byte count appended to each message byte in the Message Checker as a portion of the address. The context bit generated by the Message Checker Interface 150, along with the message partition code (bits 8 through 11) generated by the Fault Tolerator RAM Interface 160, completes the address and identifies in which of the two locations in the message partition 166 the message from each Node is to be stored. 
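The error status byte layout of FIG. 13 and the context bit rule just described can be illustrated with a short Python sketch. The flag names `NIDER`, `DIDER`, and `MTER`, the relative order of MXER and MNER in bits 6 and 7, and both function names are assumptions made for the example; the text gives only the ascending order of the flags and the rule that LENER is ignored when judging the current message.

```python
from enum import IntFlag


class ErrorStatus(IntFlag):
    """Error status byte flags, least significant bit first."""
    PARER = 1 << 0  # parity (VPC or LRC) error
    LENER = 1 << 1  # length error of the PRECEDING message
    NIDER = 1 << 2  # Node identification error (name assumed)
    DIDER = 1 << 3  # data identification error (name assumed)
    MTER = 1 << 4   # message type error (name assumed)
    RNGER = 1 << 5  # two's complement range error
    MXER = 1 << 6   # maximum limit error (bit position assumed)
    MNER = 1 << 7   # minimum limit error (bit position assumed)


def current_message_error_free(status):
    """True when every flag other than LENER is clear."""
    return (status & ~int(ErrorStatus.LENER) & 0xFF) == 0


def next_context_bit(context_bit, status):
    """Toggle the per-Node partition bit only after an error free message."""
    return context_bit ^ 1 if current_message_error_free(status) else context_bit
```

With this rule, a message carrying an error leaves the context bit unchanged, so the next message from that Node overwrites it rather than the partition the Voter 38 may still be reading.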
The Fault Tolerator RAM 162 has three sections used by the Error Handler 164 for generating the Base Penalty Count and Error messages. An error code file section 170 stores the error codes used to generate the Error messages transmitted immediately after the beginning of each Atomic period and to generate the increment penalty count which is included in the Error message. Since there are thirty-five different error detection mechanisms in each Operations Controller 12, there is a possibility of two to the thirty-fifth power of error combinations that may result from each message transmitted in the system. In order to reduce the number of combinations of errors to a reasonable number, compatible with the state of the art storage capabilities of the Fault Tolerator RAM 162, the error reports from the various subsystems are formatted into special error codes as they are received. The formatted error codes, as shown on FIG. 17, include an identification of the subsystem which reported the error plus a flag indication of the errors detected. For example, the error status byte received from the Message Checker 34 is formatted into two separate error codes. The first error code contains the subsystem code 0000 which reported the errors and the error flags from the four least significant bits of the error status byte. The second error code contains the subsystem code 0001 and the error flags from the four most significant bits of the error status byte. These error codes are stored in the error code file section 170 at an address defined by the faulty Node's identification (NID) code and report number as shown in FIG. 19. The error code file section 170 is double partitioned the same as the message partition section 166 so that two error files are stored for each Node. The context bit generated by the Message Checker Interface 150 identifies in which of the two error files for that Node the error code will be reported. 
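The splitting of the Message Checker error status byte into two formatted error codes can be sketched as below. The placement of the 4-bit subsystem code in the upper nibble of an 8-bit code is an assumption (FIG. 17 is not reproduced here), and the constant and function names are hypothetical.

```python
MSC_LOW_FLAGS = 0b0000   # subsystem code for the four least significant flags
MSC_HIGH_FLAGS = 0b0001  # subsystem code for the four most significant flags


def format_error_codes(status_byte):
    """Split one error status byte into two (subsystem code, flags) codes."""
    low_flags = status_byte & 0x0F
    high_flags = (status_byte >> 4) & 0x0F
    # Assumed layout: subsystem code in bits 7-4, error flags in bits 3-0.
    return [
        (MSC_LOW_FLAGS << 4) | low_flags,
        (MSC_HIGH_FLAGS << 4) | high_flags,
    ]
```

Each formatted code then has only sixteen possible flag combinations per subsystem code, which is what keeps the lookup tables in the Fault Tolerator RAM 162 small.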
Each error code is used to address a group mapping section 168 of the Fault Tolerator RAM 162. The error code addresses a penalty weight pointer, as shown in FIG. 18, which addresses a penalty weight section 172 of the Fault Tolerator RAM. As shown in FIG. 20, the penalty weight pointer addresses a specific penalty weight which is assigned to the specific combination of reported errors contained in the formatted error code. The penalty weights resulting from each error code stored in the error file for that Node are summed in the Error Handler 164 and appended to the Error message as an increment penalty count (byte-8) for that Node. As previously indicated, the Error Handler 164 will generate only one Error message in each Atomic period for each Node which transmitted a message which contained an error. The Fault Tolerator RAM 162 will also store the deviance limits for the one byte (MT0), two byte (MT1), and four byte (MT2 and MT3) Data Value messages in four separate sections, 174, 176, 178 and 180, which are used by the Voter 38, as shall be explained with reference to the Voter hereinafter. The details of the Message Checker Interface 150 are illustrated in FIG. 21. A Store Message Module 182 receives the message bytes directly from the Message Checker 34 and stores them in the message partition section 166 of the Fault Tolerator RAM 162. The Store Message Module 182 will add the context bits stored in a Message Checker Interface Context Store 190 to the descriptor (NID plus byte count) appended to the message byte by the Message Checker 34 to generate a partition address (PID). The partition address identifies the location in the message partition section 166 where the particular message byte is to be stored. As previously discussed, at the beginning of each Master period, each Node will first transmit a Base Penalty Count message followed by a Task Completed/Started message. 
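The two-level lookup described above, from formatted error code to penalty weight pointer to penalty weight, followed by summation into an increment penalty count, can be sketched as follows. The table contents in the usage lines are invented placeholders, and the function name is hypothetical; the actual contents of sections 168 and 172 are not given in the text.

```python
def increment_penalty_count(error_codes, group_map, penalty_weights):
    """Sum the penalty weights for every error code filed against one Node.

    group_map:       formatted error code -> penalty weight pointer
    penalty_weights: penalty weight pointer -> penalty weight
    """
    total = 0
    for code in error_codes:
        pointer = group_map[code]          # group mapping section lookup
        total += penalty_weights[pointer]  # penalty weight section lookup
    return total


# Placeholder tables, for illustration only.
example_group_map = {0x05: 2, 0x1A: 0}
example_weights = [4, 1, 3]
example_total = increment_penalty_count([0x05, 0x1A], example_group_map, example_weights)
```

The indirection lets many different error combinations share one weight, so changing a weight does not require rewriting the group mapping.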
The Store Message Module 182 stores for each Node a first flag signifying the receipt of the Base Penalty Count message and a second flag signifying the receipt of the subsequent Task Completed/Started message. These flags are set to false at the beginning of each Master period and are set to true when the Base Penalty Count and the Task Completed/Started messages are received for that Node. Unless both of these flags are set to true, the Store Message Module 182 will disable the writing of the address of any subsequently received messages from that Node in a Voter Interface Buffer 184. As a result, the subsequently received data from that Node will not be processed by the Voter 38 and will be ignored during any subsequent processing. The Voter Interface Buffer 184 is an 8×7 first in-first out buffer in which the four most significant bits are the four most significant bits of the partition address (context bits plus NID) for the received message in the message partition section 166 of the Fault Tolerator RAM 162. The remaining three bits are the message type code contained in the first byte of the message. An Error Status Byte Detector 186 listens to the messages being transmitted from the Message Checker 34 to the Fault Tolerator 36 and will detect the receipt of each error status byte (byte 15) generated by the Message Checker 34. If the content of the error status byte, with the exception of the length error (LENER) bit, is all zeros, the Error Status Byte Detector 186 will enable the Message Checker Interface Context Storage 190 to load the Voter Interface Buffer 184 through the Store Message Module 182, or to load a Task Completed Register 202 or to load a Branch Condition Register 200 as required. Otherwise the Error Status Byte Detector 186 will load each non-zero error status byte in an Error Status Buffer 188 for subsequent processing by the Error Handler 164. 
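The 7-bit Voter Interface Buffer entry and the two-flag gate described above can be sketched as follows. The ordering of the context bit above the 3-bit NID within the upper four bits is an assumption, as are the function names; the text states only that the upper four bits are "context bits plus NID" and the lower three are the message type code.

```python
def voter_buffer_entry(context_bit, nid, message_type):
    """Build one 7-bit entry: [context bit | NID (3 bits) | MT (3 bits)]."""
    partition_bits = ((context_bit & 0x1) << 3) | (nid & 0x7)  # assumed order
    return (partition_bits << 3) | (message_type & 0x7)


def may_queue_for_voter(bpc_received, task_msg_received):
    """Entries are written only after both start-of-period messages arrive."""
    return bpc_received and task_msg_received
```

For example, under these assumptions a message of type 2 from Node 5 written with context bit 1 would produce the entry `0b1101010`.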
The Error Status Byte Detector 186 will also detect if a message is a self-test message (TID=NID) and set a self-test flag in the Error Status Buffer 188. The Error Status Buffer 188 is an 8×12 first in-first out buffer in which the most significant bit is a self-test flag, the next three bits are the Node's identification (NID) code and the remaining 8 bits are the received error status byte. The Message Checker Interface Context Storage 190 temporarily stores for each Node the information contained in Table III. This information is temporarily stored since it is not known if the message is error free until the error status byte is received.
TABLE III
______________________________________
Message Checker Interface Context Storage
Bit     Description             When Written
______________________________________
13      TIC Flag                MT1, Byte Count = 2 (DID = 0)
12      Partition Context Bit   Byte Count = 15
11-9    Message Type Code       Byte Count = 1
8       Branch Condition Bit    MT6, Byte Count = 4
7-0     Started TID             MT6, Byte Count = 3
______________________________________
The most significant bit, bit 13, signifies that the received message is a Task Interactive Consistency (TIC) message which is processed by the Synchronizer 46. This flag is set by a Task Interactive Consistency Message Detector 192 in response to a message type MT1 having a data identification code which is all zeros (DID=0) and will inhibit the loading of the address of this message in the Voter Interface Buffer 184 since it is only used by the Synchronizer and no other subsystem of the Operations Controller. Bit 12 is the partition context bit which identifies in which partition of the message partition section 166 the message will be stored. The context bit is toggled when the Error Status Byte Detector 186 indicates the prior message was error free. If the message is not error free, the context bit is not toggled and the next message received from that Node is written over the prior message in the Fault Tolerator RAM 162. The message type code bits are received directly from the first byte of the message. The branch condition bit, bit 8, is received from a Branch Condition Detector 194 which detects the branch condition contained in the fourth byte of the Task Completed/Started (MT6) message. The identification of the started task (TID) is obtained from a Task Started Detector 196 which loads the TID of the started task into the eight least significant bit locations (bits 7-0 of Table III) of the Message Checker Interface Context Storage 190. Upon the receipt of an error status byte which signifies that the received message was error free and if the message is not a Task Interactive Consistency message, the Message Checker Interface Context Storage 190 will transfer the context bit and the message type to the Store Message Module 182. In the Store Message Module 182, the context bit is added to the Node identification (NID) code to form the starting partition (PID) address of that message in the Fault Tolerator RAM 162. 
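The per-Node word held in the Message Checker Interface Context Storage 190 can be modeled as a 14-bit field layout taken directly from Table III. The Python function names and the dictionary keys below are hypothetical; only the bit positions come from the table.

```python
def pack_context(tic, context_bit, message_type, branch, started_tid):
    """Pack the Table III fields into one 14-bit word."""
    return (
        ((tic & 0x1) << 13)            # bit 13: TIC flag
        | ((context_bit & 0x1) << 12)  # bit 12: partition context bit
        | ((message_type & 0x7) << 9)  # bits 11-9: message type code
        | ((branch & 0x1) << 8)        # bit 8: branch condition bit
        | (started_tid & 0xFF)         # bits 7-0: started TID
    )


def unpack_context(word):
    """Recover the Table III fields from a packed word."""
    return {
        "tic": (word >> 13) & 0x1,
        "context_bit": (word >> 12) & 0x1,
        "message_type": (word >> 9) & 0x7,
        "branch": (word >> 8) & 0x1,
        "started_tid": word & 0xFF,
    }
```

A round trip through `pack_context` and `unpack_context` returns the original field values, provided each value fits in its field.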
The message type code is appended to the partition address and they are transferred to the Voter Interface Buffer 184 for subsequent use by the Voter 38 to extract the data necessary for the voting process. Upon the receipt of an error status byte signifying the receipt of an error free Task Completed/Started (MT6) message, the Message Checker Interface Context Storage 190 will transfer the identification (TID) code of the started task and the Node identification (NID) code to a Scheduler Interface Buffer 198 where it is transferred to the Scheduler 40 when requested. The Scheduler Interface Buffer 198 is an 8×11 bit first in-first out buffer which is reset at the end of the soft error window (SEW). The soft error window is generated by the Synchronizer 46 and defines a period of time bracketing the end of each Subatomic period during which the time critical messages from other Nodes should be received if they are in synchronization with each other. In parallel, the Message Checker Interface Context Storage 190 will transfer the stored branch condition (BC) bit to the Branch Condition Register 200 and transfer the Node identification (NID) code of the Node that sent the message to the Task Completed Register 202. These registers are read by the Synchronizer Interface 152 when requested by the Synchronizer 46. The Branch Condition Register 200 and the Task Completed Register 202 are double buffered with a different set of registers being reset at the end of each hard error window (HEW) signal. The hard error window signal is generated by the Synchronizer 46 and brackets the soft error window (SEW) at the end of each Subatomic period and defines the maximum deviance in the arrival time of the time critical messages from the other Nodes. The function of the hard error window (HEW) and soft error window (SEW) will be discussed in greater detail in the detailed description of the Synchronizer 46. The Error Handler, as shown on FIG. 
22, includes an Error Filer 204, an Error Consistency Checker 206, an Error Message Generator 208, and an Error Handler Context Store 210. The Error Filer 204 polls the Message Checker Interface 150, the Synchronizer Interface 152, the Scheduler Interface 154, and the Voter Interface 158 for error reports from the various subsystems within the Operations Controller. The Error Filer will format the received error reports into a formatted error code, as shown on FIG. 17, and tag them with an error file address, as shown on FIG. 19. The error file address is a 3-bit error file identification code, a context bit which is the one generated by the Message Checker Interface 150 for filing the message in the message partition of the Fault Tolerator RAM 162, the Node identification (NID) code and a report number. As previously described, the formatted error code contains a 4-bit code which identifies the subsystem which detected the error and four flag bits identifying the errors detected. The Error Filer 204 will pass these formatted error codes to the Fault Tolerator RAM Interface 160 which will store them in the error code file section 170 of the Fault Tolerator RAM 162. The Error Filer 204 will also forward the number of error reports written to the Error Handler Context Store 210 so that the Error Message Generator 208 will be able to determine how many error reports to process from the Fault Tolerator RAM 162. The Error Filer 204 will also detect the self-test flag generated by the Message Checker 34 and forward this flag to the Error Message Generator 208. The self-test flag is part of one of the group codes whose penalty weight is programmed to be zero or a very small value. The self-test error message will identify all of the errors detected and will include the Incremental and Base Penalty Count. The Error Consistency Checker 206 is responsible for consistent handling of the error reports and the base penalty counts for each Node in the system. 
A form of implicit interactive consistency is used to achieve this goal. At the beginning of each Master period, the Error Consistency Checker 206 receives through the Voter Interface 158 a voted base penalty count (VBPC) which is generated by the Voter 38 in response to the Base Penalty Count messages received from all the Nodes in the system including its own. Referring now to FIG. 23, these voted base penalty counts are stored in a Base Penalty Count Store 212 as the base penalty counts for each Node independent of the base penalty counts stored for the preceding Master period. In this manner all the Nodes in the system will begin each Master period with the same base penalty counts for each Node in the system. The Base Penalty Count Store 212 also receives a voted increment penalty count (VIPC) which is generated by the Voter 38 from the error messages received from all of the Nodes including its own. The voted increment penalty count (VIPC) is added to the base penalty count of the accused Node when the error is verified by a Validity Checker 218. Preferably the Validity Checker 218 is embodied in the Voter 38, but may be part of the Error Consistency Checker 206 as shown in FIG. 23. The Error Consistency Checker 206 also maintains a Current System State Register 214 which stores a voted current system state (CSS) vector and a Next System State Register 216 which stores a next system state (NSS) vector. The current system state vector identifies which Nodes are currently active in the system and which are excluded, while the next system state vector identifies which Nodes are to be included and/or which are to be excluded in the next system state of the system. The system will change its state at the beginning of the next Master period if the voted next system state vector is different from the current system state vector. 
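The bookkeeping performed by the Base Penalty Count Store 212 and the two system state registers can be sketched in software as follows. The class and method names are hypothetical, and the mapping of flag bit n to Node n in the 8-bit state vectors is an assumption; the text says only that there is one flag bit per Node, set when the Node is excluded.

```python
class BasePenaltyCountStore:
    """Minimal model of per-Node base penalty counts (illustrative only)."""

    def __init__(self, node_count=8):
        self.counts = [0] * node_count

    def load_voted_base_counts(self, voted_counts):
        """Start of a Master period: voted values replace the old ones."""
        self.counts = list(voted_counts)

    def add_voted_increment(self, nid, vipc, verified):
        """Charge the accused Node only when the error was verified."""
        if verified:
            self.counts[nid] += vipc


def excluded_nodes(state_vector, node_count=8):
    """List the Nodes whose flag bit is set (assumed: bit n is Node n)."""
    return [n for n in range(node_count) if (state_vector >> n) & 1]


def state_will_change(current_state, next_state):
    """The system changes state when the voted vectors differ."""
    return current_state != next_state
```

Because every Node overwrites its stored counts with the same voted values, a Node whose local counts have drifted is brought back into agreement at the next Master period.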
The current and next system state vectors have 8 flag bits, one for each Node, which are set when the Node is excluded and which are reset when the Node is readmitted to the operating set of Nodes. Prior to the discussion of the Validity Checker 218, the various types of errors that are detected in each Node will be discussed briefly. Table IV is a list of twenty-five fault detection mechanisms used in the systems.
TABLE IV
______________________________________
Fault Detection Mechanisms
Error                              Subsystem  Sym/Asym
______________________________________
Message Vertical Parity            MSC        A
Message Longitudinal Redundancy    MSC        A
Message Length                     MSC        A
Synchronization - Hard             MSC        A
Synchronization - Soft             MSC        A
Send Node ID                       MSC        S
Invalid Message Type               MSC        S
Invalid Data ID                    MSC        S
Task ID Sequence                   FLT        S
Data ID Sequence                   FLT        S
Data Limit                         MSC        S
Data Deviance                      FLT        S
Task Run Time                      SCH        S
Current System State               FLT        S
Next System State                  FLT        S
Penalty Count Base Deviance        FLT        S
Penalty Count Increment Deviance   FLT        S
Missed BPC Message                 FLT        S
Unsupported Error Report           FLT        S
Missing Error Report               FLT        S
Self Detection Monitor             FLT        S
M.P. Misalignment                  SYN        S
Sync Sequence Error                SYN        S
Sync Missing Message               SYN        S
Too Many Data Messages             VTR        S
AP Reported Error                  TSC        S
Last DID Shipped                   TSC        S
Wrong Message during SEW           FLT        A
Too Many Error Reports             VTR        S
Too Many BPC                       VTR        S
Exceeded Max. No. of Errors        FLT        A
______________________________________
This table lists the error, the subsystem which detects the error, and whether the detection of the error is symmetric (S) or asymmetric (A). Since the system is symmetric in its structure, most of the errors contained in the messages the Nodes transmit to each other should be detected by every other Node. Therefore, every Node should generate an error message which identifies the error detected and the incremental penalty counts to be charged against the Node that made the error. Errors which are detected by all of the Nodes are called symmetric errors, and the existence of a symmetric error should be verified by at least a majority of the active Nodes in the system. There is also the case where channel noise occurs so that an error manifests itself differently among the receiving Nodes. In this case, the majority of the Nodes will agree which Node is faulty, but the error or errors detected may be different for each Node and the incremental penalty count reported in the various error messages may likewise be different. A median vote on the incremental penalty count will be used to increment the base penalty count for that Node. In this asymmetric case, however, the Validity Checker 218 will not generate a deviance error report to the Error Filer 204 identifying those Nodes whose incremental penalty counts differed from the voted incremental penalty count by more than the allowed amount. This is to prevent the unjust penalizing of a healthy Node. Turning now to FIG. 24, the Validity Checker 218, whether embodied in the Voter 38 or the Fault Tolerator 36, has a Majority Agree Detector 224, an Asymmetric Error Detector 226, and an Error Reporter 230.
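The median vote on the incremental penalty count can be sketched as follows. This is an illustration only: the text does not say how the Voter 38 resolves an even number of reports, so the use of the lower median (`statistics.median_low`) is an assumption, as are the function name and the `allowed_deviance` parameter.

```python
from statistics import median_low


def vote_increment(reports, allowed_deviance):
    """Median-vote the reported increment penalty counts.

    reports maps a reporting Node's ID to the increment it reported.
    Returns the voted increment and the sorted list of Nodes whose report
    differs from the voted value by more than allowed_deviance.
    """
    # Assumption: with an even number of reports, take the lower median.
    voted = median_low(list(reports.values()))
    deviant = sorted(
        node for node, value in reports.items()
        if abs(value - voted) > allowed_deviance
    )
    return voted, deviant
```

For a symmetric error the deviant list would feed a deviance error report; for an asymmetric error that report is suppressed, although the voted increment is still applied.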
The Majority Agree Detector 224 receives a "missing vector" from the Voter 38 indicating from which Nodes the Base Penalty Count or Error messages were not received and a "deviance vector" indicating which Nodes generated a Base Penalty Count or Error message in which the base penalty or increment penalty counts were outside of the permitted deviances about the voted values. The Majority Agree Detector 224 also receives a current system state vector from the Current System State Register 214. The Majority Agree Detector 224 will subtract the "deviance vector" and the "missing vector" from the current system state vector to generate a number corresponding to the number of Nodes which agree with the voted value. This number is then compared with the number of Nodes currently active in the system identified by the current system state vector. If a tie or a majority of the Nodes sent messages whose values agree with the voted values, then if the message is a Base Penalty Count message, the voted base penalty counts are stored in the Base Penalty Count Store 212. Otherwise, if the message is an Error message, the base penalty count stored in the Base Penalty Count Store 212 is incremented by the voted increment penalty count. If the messages received from the other Nodes do not represent a majority, then the Majority Agree Detector 224 will generate a write inhibit signal which is applied to the Base Penalty Count Store 212 through an AND gate 234. This write inhibit signal will inhibit the writing of the voted values in the Base Penalty Count Store 212 provided the reported error or errors are not asymmetric errors. The Asymmetric Error Detector 226 receives the deviance vector, the missing vector, and the current system state vector, and generates a deviance report inhibit signal when a majority of the Nodes send error messages identifying a particular Node as faulty but they disagree as to the incremental penalty counts to be charged against the faulty Node.
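The counting performed by the Majority Agree Detector 224 can be sketched with bit vectors. This is a simplified model, not the disclosed hardware: it assumes bit i of each vector corresponds to Node i, uses the convention that a set bit in the current system state vector marks an excluded Node, and the function names are hypothetical.

```python
NUM_NODES = 8
ALL_NODES = (1 << NUM_NODES) - 1  # mask covering the 8 Node flag bits


def popcount(vector):
    # Number of set bits in a non-negative integer.
    return bin(vector).count("1")


def majority_agree(css, deviance, missing):
    """Return (agreeing, active, write_inhibit) for one voted message.

    css:      set bit = Node excluded from the operating set
    deviance: set bit = Node's count was outside the permitted deviance
    missing:  set bit = Node's message was not received
    """
    active = ~css & ALL_NODES                 # Nodes currently in the system
    agreeing = active & ~deviance & ~missing  # "subtract" both vectors
    n_agree = popcount(agreeing)
    n_active = popcount(active)
    # A tie or a majority permits the write; anything less inhibits it.
    write_inhibit = 2 * n_agree < n_active
    return n_agree, n_active, write_inhibit
```

With six active Nodes, three agreeing reports count as a tie and the voted value is still written; two agreeing reports raise the write inhibit.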
The Asymmetric Error Detector 226 will interrogate the Error Handler Context Store 210 and will generate the deviance report inhibit signal when the detected errors are determined to be asymmetric errors of the type identified in Table IV. The deviance report inhibit signal will inhibit the Error Reporter 230 from reporting to the Error Filer 204 a deviance error for any Node which sent an error message containing an incremental penalty count which deviated from the voted incremental penalty count by more than the permitted tolerance. The deviance report inhibit signal is also applied to an inverted (negative) input of the AND gate 234. The deviance report inhibit signal will disable the AND gate 234 and block the write inhibit signal generated by the Majority Agree Detector 224. This will enable the voted increment penalty count to be added to the base penalty count stored in the Base Penalty Count Store 212. The Error Reporter 230 receives the missing and deviance vectors from the Voter 38, the current system state (CSS) vector from the Current System State Register 214, the error report inhibit signal from the Asymmetric Error Detector 226, and the write inhibit signal from the output of the AND gate 234. In response to the absence of a write inhibit signal, the Error Reporter 230 will report to the Error Filer 204 each Node identified in the deviance vector as having deviance errors; it will also report, in response to the missing vector, each Node which did not send a Base Penalty Count or Error message as required. In response to a write inhibit signal and the absence of an error report inhibit signal from the Asymmetric Error Detector 226, the Error Reporter 230 will report each Node having reported an unsupported error. No deviance errors are reported for these unsupported Error messages.
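The gating and reporting rules can be summarized in a short sketch. It rests on several assumptions: the error report inhibit and deviance report inhibit signals are treated as one signal, the `reporters` vector (the Nodes that sent the unsupported Error message) is a hypothetical input not named in the text, and the asymmetric branch follows the behavior described in the paragraph that follows.

```python
def gated_write_inhibit(write_inhibit, report_inhibit):
    # AND gate 234: the inhibit from the Asymmetric Error Detector 226
    # arrives on an inverted input, so it blocks the write inhibit.
    return write_inhibit and not report_inhibit


def reports_to_file(deviance, missing, reporters, write_inhibit, report_inhibit):
    """Return the error categories the Error Reporter 230 would file.

    All vectors are bit masks with one bit per Node. `reporters` is a
    hypothetical mask of the Nodes that sent the Error message in question.
    """
    if report_inhibit:
        # Asymmetric error: only Nodes that failed to report are charged;
        # deviance errors are suppressed.
        return {"missing": missing}
    if gated_write_inhibit(write_inhibit, report_inhibit):
        # No majority support: the reporting Nodes are charged with an
        # unsupported error report, and no deviance errors are filed.
        return {"unsupported": reporters}
    # Normal symmetric case: file both deviance and missing errors.
    return {"deviance": deviance, "missing": missing}
```

Modeling the inverted input explicitly makes the priority visible: an asymmetric finding always overrides the lack of a majority, so the voted increment is still applied.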
Finally, in response to an error report inhibit signal from the Asymmetric Error Detector 226, the Error Reporter 230 will report to the Error Filer 204 any Node which fails to report the asymmetric error as identified by the missing vector. As previously described, the Error Reporter 230 will not report any deviance errors in the presence of a deviance report inhibit signal from the Asymmetric Error Detector 226. Returning to FIG. 23, the Error Consistency Checker 206 also includes an Exclude/Readmit Threshold Comparator 220 responsive to the incrementing of the base penalty count in the Base Penalty Count Store 212 by the voted increment penalty count. The Exclude/Readmit Threshold Comparator 220 will compare the incremented base penalty count with a predetermined exclusion threshold value and when the incremented base penalty count exce
