Hardware process scheduler and processor interrupter for parallel processing computer systems5379428Abstract A general-purpose device for scheduling and dispatching processes having variable processing priorities in parallel processing computer systems having multiple processors. The device comprises a plurality of cluster schedulers for scheduling processes on processor clusters and a processor interrupter for interrupting the processors. The cluster scheduler has an insert register queue that queues processes for execution on the processors. The cluster scheduler outputs the most significant process priority number awaiting execution. Processors access the cluster scheduler output in order to determine the next process to execute. The processor interrupter is initiated by the cluster scheduler whenever a new process is queued by the cluster scheduler. The processor interrupter compares the priority number of the active processes executing on the processors with the cluster scheduler output, and interrupts the processor if it is executing a process with a priority number of lesser value than the cluster scheduler output. Claims What is claimed is: Description BACKGROUND OF THE INVENTION
TABLE 1
______________________________________
FLIGHT TESTER APPLICATION
Process Process Function
Priority Number
______________________________________
ACQ Acquires and 200
Determines validity
of incoming raw
data. Assembles
data into "frames."
TP Builds well- 199
structured display
blocking buffer.
Rate 1 Builds display
198
buffers at rate of
64 buffers per
second.
Rate 2 Builds display
197
buffers at rate of
32 buffers per
second.
Rate 3 Builds display
196
buffers at rate of
16 buffers per
second.
Rate 4 Builds display
195
buffers at rate of
8 buffers per
second.
Rate 5 Builds display
194
buffers at rate of
4 buffers per
second.
Rate 6 Builds display
193
buffers at rate of
1 buffer per
second.
______________________________________
In this example, the process labeled ACQ is assigned the highest priority number. When a frame is deposited in the frame buffer memory on the VME bus, an interrupt is generated over the VME bus which triggers the ACQ process. Request Flow Manager (RFM) 12a then places a request over VME bus 8 for processor execution to the only cluster scheduler 20a (only one cluster scheduler 20a is employed because the system shown in FIG. 2 is configured for only one processor). RFM 12a places a request to cluster scheduler 20a by inserting the priority number of the ACQ process into the insert register (not shown) of cluster scheduler 20a. Because there are no other processes now executing on the only processor located on cluster 14a, the priority number of ACQ is the highest number in the queue, and is loaded into the next register (not shown) associated with cluster scheduler 20a located in processor interrupter 22. Again, because there are no other processes executing, processor interrupter 22 immediately dispatches the ACQ process to execute on the processor (e.g. CPU1 referred to in FIG. 2 by numeral 31) located in cluster 14a. The ACQ process determines the validity of an incoming frame. When a frame is determined to be valid, the ACQ process begins its termination sequence. The ACQ process triggers the TP process to execute on CPU1. The TP process sifts through incoming frames in order to build a "display blocking buffer". The display blocking buffer has a well structured description of the incoming telemetry data from the airplane and has parameters in order p.sub.1 to Pn. All other processes in this flight tester application manipulate data in the display blocking buffer. The display blocking buffer is stored in cluster memory 17a-17d. The TP process executes in a relatively short period of time. The TP process has several sub-processes shown in Table 1. As shown in Table 1, these sub-processes are labeled RATE 1 through RATE 6. The lower numbered rates use higher process priority numbers and require executions more frequently than higher numbered rates. For example, RATE 1 has a priority number of 198 and operates on data that needs to be updated 64 times per second. RATE 2 has a priority number of 197, and operates on data that needs to updated 32 times per second. RATE 3 has a priority number of 196, and needs to operate on data 16 times per second. RATE 4 has a priority number of 195 and needs to operate on data 8 times per second. RATE 5 has a priority number of 194 and needs to operate on data 4 times per second. Finally, RATE 6 has a priority number of 193, and needs to operate on data once a second. When the TP process concludes, it requests that two of its subordinate processes (RATE processes) be executed. However, because there is only one processor in the system CPU1 31, only one RATE process can be executed. The other RATE process must be queued. The TP process writes the priority number of both RATE processes into the insert register (not shown) of cluster scheduler 20a. Assume that TP requested that RATE 2 and RATE 4 be executed. In that case, the priority number of RATE 2 (decimal 197) and the priority number of RATE 4 (decimal 195) is written to the cluster scheduler 20a. Cluster scheduler 20a then determines the highest priority number (197) in its insert register. Cluster scheduler 20a writes this number into the processor interrupter 22 next register (described hereinbelow). At this point, the decimal value 197, in the processor interrupter 22, may be read by any processor in the system (in this example, only processor CPU1 31). When the TP process begins its termination sequence, it reads the processor interrupter next register to determine the process to execute on CPU1 31. TP then triggers RATE 2 to execute on CPU1 31. If RATE 2 is not a multi-threaded process, its priority number is withdrawn from the cluster scheduler insert register (not shown), which causes the priority number of RATE 4 (195) to be loaded into the next register of the processor interrupter 22 (as it is the highest priority in the queue). Processor interrupter 22 then compares the value of the highest priority waiting process (195) with the lowest priority active process (197). Because CPU1 31 is currently executing the highest possible priority process, processor interrupter does nothing. However, when RATE 2 begins its termination sequence, it first reads the next register in the processor interrupter. Because the next register has the priority number of RATE 4, RATE 4 is executed on CPU1 31. As processing requests increase in number and frequency, the number of waiting processes also increases, eventually degrading system performance. At some point, the demand for processing power will surpass the ability of the single CPU1 31. However, by using the preferred embodiment of the present invention PSI 10 shown in FIG. 2, processing power is increased simply by adding more parallel processing CPUs and more cluster schedulers. The utility of the present invention is that PSI 10 permits design of software application programming without consideration for scheduling needs. The system shown in FIG. 2, using the hardware scheduler of the present invention PSI 10, does not need to accommodate scheduling algorithms in software. Parallel processing tasks are accomplished simply by adding more CPUs. There is no need to change existing application software. For example, in the system using processes with the priority numbers given in table 1, and designed with the 16 parallel processors and PSI 10 shown in FIG. 2, assume that CPU1 31 executes the ACQ process. When ACQ, with priority number 200 finishes, the TP process is executed on CPU2 32. At the conclusion of the TP process, two of the RATE processes are initiated. Assume that process RATE 3 and RATE 5 are triggered by TP. RATE 3 is initiated on CPU1 31 in cluster 14a. RATE 5 is initiated on CPU2 32. RATE 3 has a priority number of 196. At some point during the execution of RATES 3 and 5 the acquisition process (ACQ) will be initiated. At that time, the priority number 200 is presented to PSI 10. PSI 10 determines the lowest priority active process executing on a cluster 14a. If the lowest priority active process has a lower priority than the highest priority of the processes awaiting execution, that active process is interrupted by processor interrupter 22. In this example, CPU1 31 is executing process RATE 3 having a priority number of 196 and CPU2 32 is executing process RATE 5 having a priority number of 194. CPUs 33 and 34 are idle and therefore have active process priority numbers of 0. Therefore the lowest priority active process in CS1 20a for cluster 14a is 0. Process ACQ, with a priority number of 200, is greater than active process priority number 0 of CPU3 33. Therefore, process ACQ will be dispatched by PSI 10 to execute on CPU3 33. At the same time, the cluster scheduler associated with cluster 14a, i.e. cluster scheduler 20a, has priority number 200 withdrawn from its queue of waiting processes. Furthermore, processor interrupter 22 adds priority number 200 to its list of active process priority numbers associated with cluster 14a. Eventually there will be several RATE processes executing on all four CPUs, CPU1 31, CPU2 32, CPU3 33 and CPU4 34, in cluster 14a. For this example, assume that CPU1 31 is executing process RATE 2, CPU2 32 is executing process RATE 3, CPU3 33 is executing process RATE 4, and CPU4 34 is executing process RATE 5. At some point, an interrupt is generated from Front End CPU 12b and priority number "200" assigned to process ACQ is inserted via system bus 8 into cluster scheduler 20a in PSI 10. Cluster scheduler 20a then produces the highest waiting process priority number of processes awaiting execution on processors located in cluster 14a. In this case, process priority number "200" is the highest waiting process priority number, and therefore, cluster scheduler 20a will transmit priority number 200 to processor interrupter 22 via bus 24a. Cluster scheduler 20a will also generate a signal over bus 24a that loads the next register (not shown) in processor interrupter 22 with the priority number " 200." Loading the next register causes processor interrupter 22 to compare the four active process priority numbers of processes executing on cluster 14a to the priority number just loaded into the next register. Because the lowest active process priority number is 194, executing on CPU4 34, CPU4 34 must be interrupted by processor interrupter 22. Process RATE 5 is then preempted and priority number 194, associated with RATE 5, is written into the waiting process insert register (not shown) in cluster scheduler 20a. The priority number of "200" is then withdrawn from the insert register (not shown) of the cluster scheduler 20a. Withdrawing the priority number causes the next register of the processor interrupter 22 to be loaded with the highest waiting process number in the insert register of cluster scheduler 20a, i.e. the value of "194." The number "200" is written to the processor interrupter 22 and recorded as an active process. Process ACQ will then execute on CPU4 34 until it concludes. As is shown by this example, the preferred embodiment of the present invention provides for flexible and easily upgraded parallel processing system design. Processing power is increased simply by adding more processors and more cluster schedulers. Changes in system hardware are transparent to application programming. Process scheduling is also transparent to application programming. The embodiment of the present invention used by the parallel processing system of FIG. 2 is now further described with respect to FIG. 3. As shown in FIG. 3, cluster schedulers 20 comprise four independent cluster schedulers: cluster schedulers 20a, 20b, 20c, and 20d. As described hereinabove in reference to FIG. 2, each cluster scheduler 20a-20d independently schedules processes for its respective processor cluster 14a-14d (shown in FIG. 2). Referring now to FIG. 3, priority numbers for processes requesting execution are input over communication busses 21a, 21b, 21c and 21d. Communication busses 21a-21d are connected to bus interface logic 18 (shown in FIG. 1). Processes requiring execution input their priority numbers into the cluster scheduler that serves the desired processor. For example, a process requesting execution on a processor in cluster 14a (shown in FIG. 2) inputs the priority number of the process into cluster scheduler 20a via communication bus 21a. That process also presents a signal via communication bus 21a that indicates that a process priority number must be inserted into the insert register (not shown) of cluster scheduler 20a. Cluster scheduler 20a is described in more detail hereinbelow with reference to FIG. 4. The four cluster schedulers 20a-20d communicate with processor interrupter 22 over PSI internal bus 24 (shown in FIG. 1), shown in FIG. 3 as communication busses 24a-24d, and lines 24a'-24d'. The plurality of cluster schedulers 20 are designed to continuously provide at their output the highest process priority number awaiting execution. For example, cluster scheduler 20a, described more fully in reference to FIG. 4, is designed to constantly provide, at communication bus 24a, the highest process priority number awaiting execution on a processor in processor cluster 14a (shown in FIG. 2). Similarly, cluster scheduler 20b produces upon bus 24b the priority number of the highest priority process awaiting execution by a processor located in processor cluster 14b (shown in FIG. 2). Cluster schedulers 20c and 20d perform identical functions in order to schedule processes for their associated processor clusters 14c and 14d (shown in FIG. 2). In addition to providing the highest process priority number awaiting execution, the cluster schedulers 20a-20d also provide signals 24a', 24b', 24c' and 24d', that indicate that a new highest process number is ready for execution. For example, when a process requires execution on processor cluster 14a (shown in FIG. 2) the priority number of that process is inserted, via line 21a, into the insert register (not shown) of cluster scheduler 20a. Cluster scheduler 20a then determines whether the new incoming process has a higher priority number than any other process priority numbers already awaiting execution and stored in the insert register (not shown) contained in cluster scheduler 20a. If the new incoming process priority number is greater than all other process priority numbers stored in the insert register (not shown), then cluster scheduler 20a produces that process priority number on communication bus 24a. In this way, cluster scheduler 20a constantly provides the highest priority number of waiting processes to processor interrupter 22 via communication bus 24a. In addition, cluster scheduler 20a also provides a signal via control line 24a' to indicate to processor interrupter 22 that there is a new highest process priority number in the insert register (not shown) of cluster scheduler 20a. Cluster schedulers 20b-20d similarly have control lines 24b'-24d' which provide similar signals to processor interrupter 22. As shown in FIG. 3, the processor interrupter 22 comprises a preemptor sequencer 80, a group of four "next" registers 38a-38d, a bank of 16 active registers 40, and a comparison circuit 42. Preemptor sequencer 80 monitors lines 24a'-24d' to determine whether processor interrupter 22 is initiated. The group of four next registers 38a-38d are loaded with the value of the highest process priority number awaiting execution. For example, next register 38a is loaded by cluster scheduler 20a via bus 24a (which carries the highest priority number of waiting processes) and line 24a' (the load next signal) with the highest priority process number stored in the insert register (not shown) located in cluster scheduler 20a. Similarly, next register 38b is loaded by cluster scheduler 20b, next register 38c is loaded by cluster scheduler 20c and next register 38d is loaded by cluster scheduler 20d. As described hereinabove, the next registers may be read by any processor in the parallel processing system. As is described below in more detail with reference to FIG. 6, the next registers may be read using communication bus 23. Any processor on the system may address the next registers by communicating with the preemptor sequencer 80 over bus 23. Preemptor 80 then will enable the tri-state outputs of the next registers (shown in FIG. 6) and gate the appropriate output data onto bus 23, to be read by the addressing processor. This process is described in more detail below with reference to FIG. 6. Anytime one of the next registers 38a-38d, is loaded with a process priority number, one of the lines 24a'-24d is asserted, and the preemptor sequencer 80 is initiated. The preemptor sequencer 80 controls the operation of the processor interrupter 22. Whenever a next register 38a-38d is loaded by a cluster scheduler, preemptor sequencer causes the next register value associated with that cluster scheduler to be compared with the priority numbers of active processes executing on processors located in the processor cluster associated with that cluster scheduler. If the value of the next register is greater than the value of any active process priority number, comparison circuit 42 will produce a signal on communication bus 23 to interrupt the processor that is executing the lowest active priority process. For example, cluster scheduler 20a produces the highest waiting process priority number on bus 24a and triggers the loading of that priority number into next register 38a by asserting a signal on line 24a'. Therefore, next register 38a is loaded with the highest process priority number awaiting execution on a processor associated with cluster scheduler 20a, in this case, processor cluster 14a (shown in FIG. 2). Soon after loading next register 38a with the process priority number, preemptor sequencer 80 initiates operation. The process priority number in next register 38a is compared with the priority number of all four active processes executing on the four processors in processor cluster 14a (shown in FIG. 2). The priority number of the active processes executing on processor cluster 14a (shown in FIG. 2) are contained in the bank of 16 active process registers 40, and are referred to in FIG. 3 as 40a, 40b, 40c and 40d. The value of the priority number stored in active register 40a is compared with the value stored in next register 38a by eight-bit comparator 42a located in comparison circuit 42. Similarly, the values stored in active process registers 40b-40d are compared with the value of the priority numbers stored in next register 38a using eight-bit comparators 42b-42d. If any active process number stored in active process registers 40a-40d have a value less than the priority number stored in next register 38a, a logic high is produced at logic line 44, which causes an interrupt signal to be generated at communication line 23a. The value of next registers 38b-38d, when loaded by their respective cluster schedulers 20b-20d, are similarly compared with the other twelve active process registers contained in the bank of active process registers 40. In this manner, whenever one of the cluster schedulers 20a-20d produces a next highest priority process for scheduling, processor interrupter 22 compares the next register value with the active process priority numbers currently executing. If the incoming process has a priority number that is higher than the lowest active process currently executing, processor interrupter 22 initiates an interrupt causing the processor executing the lowest priority active process to be interrupted. Preemptor sequencer 80 indicates which processor in which cluster to interrupt using communication bus 23b. Referring now to FIG. 4, the preferred embodiment of the cluster scheduler of the present invention is shown in block diagram format. In the embodiment of the present invention shown in FIG. 2, all cluster schedulers 20a, 20b, 20c, 20c, are identical and are implemented as shown in FIG. 4. Cluster scheduler 20a comprises a scheduler sequencer 46, an insert register 48, and a priority encoder 50. Scheduler sequencer 46 controls the operation of cluster scheduler 20a. Insert register 48 is a 256 bit long by one bit wide register. Insert register 48 is designed so that any one bit within insert register 48 is accessible to any processor in the system, and any one bit within insert register 48 may be set or cleared without affecting any other bit within insert register 48. Communication bus 21a carries the process priority number for the process requesting scheduling by cluster scheduler 20a. In the embodiment of the present invention shown in FIG. 4, the priority number for the process requesting scheduling is presented to the address lines of insert register 48 on bus 52. Insert register 48 decodes the address presented via line 52 to select one of the 256 bits in insert register 48. The signal on line 54 indicates whether the bit selected by address lines presented by bus 52 should be set or cleared. If input line 54 is at a logical low value, the bit selected by the value presented on address line 52 is cleared; alternatively, if line 54 is at a logical high value the selected bit is set within insert register 48. Insert register 48 is arranged so that the lowest addressable bit represents the lowest process priority number, the highest addressable bit represents the highest process priority number. The lowest possible priority number is one (or hexadecimal 01, hereinafter represented as x01); the highest is 255 (or xff). When the address value of x01 is presented to the address lines of insert register 48 on bus 52, the least significant bit in insert register 48, representing the lowest process priority number, is either set or cleared, depending upon the status of line 54. Similarly, when the value of xff is presented on the address lines of insert register 48 via bus 52, the most significant bit in insert register 48, representing the highest priority process number 255, is either set or cleared depending on the status of line 54. Accordingly, insert register 48 produces a 255 bit-wide value on bus 56. The least significant bit in insert register 48, and thus the least significant bit on bus 56, represents the least significant priority process number. The most significant bit represents the most significant priority process number. This 255 bit-wide number is input to priority encoder 50. Priority encoder 50 provides an eight-bit binary representation of the most significant bit that is set in insert register 48. For example, if bin one is the only bit set in register 48 priority encoder 50 will produce a x01 on its output and transmit it over 24a. If the highest bit set in insert register 48 is the most significant bit 255, then priority encoder 50 will transmit the binary representation of 255, or xff, over bus 24a to the processor interrupter 22. Scheduler sequencer 46 controls the operation of insert register 48 and priority encoder 50 via control lines (not shown). Any time a bit is set or cleared in the insert register, scheduler sequencer 46, using signal line 24a', then signals processor interrupter 22 to load the highest process number provided on bus 24a into the next register 38a (shown in FIG. 3) located within processor interrupter 22. Cluster scheduler 20a of FIGS. 1-4 is now described in more detail with reference to FIG. 5. As shown in FIG. 5, cluster scheduler 20a comprises a scheduler sequencer 46, an input PAL 56, an insert random access memory (RAM) 58, and an output PAL 60. FIG. 5 demonstrates how cluster scheduler 20a of FIG. 4 is easily and inexpensively implemented using only a few available SSI and MSI integrated circuits. Communication bus 21a is the conduit for all system bus 8 (shown in FIG. 1) signals to cluster scheduler 20a. Line 54 is connected to input PAL 56. Bus 52, which transmits the priority number of the process requesting scheduling, is divided into bus lines 52a and 52b. Bus 52a is presented to the input of scheduler sequencer 46, and bus 52b is presented to the input of input PAL 56. Bus 52a is a five-bit wide bus which carries the five most significant bits of the priority number for the process requesting scheduling transmitted over bus 52. Bus 52b transmits the three least significant bits of the priority number for the process requesting scheduling. Scheduler sequencer 46 is initially reset by a system reset signal provided via bus 21a over reset line 62. Insert RAM 58, in conjunction with input PAL 56 and scheduler sequencer 46, implements the function of insert register 48 of FIG. 4. Insert RAM 58 has 32 separately addressable eight-bit memory locations (a larger RAM having more than 32 addressable words may be used so long as the unused address lines are tied to a logical low value) and is addressed by scheduler sequencer 46 via bus 64. Scheduler sequencer 46 may read from and write to any one of the 32 selected memory locations using control line 66 connected to the read and write control input of insert RAM 58. The operation of cluster scheduler 20a now is described with reference to FIG. 5. When the reset signal 62 is asserted, the scheduler sequencer 46 clears all the bits in insert RAM 58 to zero. Scheduler sequencer 46 accomplishes this clearing function by starting at address x00 asserted on bus line 64 and continuing to write zeros in the addressed location using control lines 66 and 68. This initialization process ensures that all process priority numbers contained in the insert RAM 58 are initially set to zero. At the end of the initialization process, the scheduler sequencer 46 loads the next register 38a (shown in FIG. 3) with zero by asserting signal line 24a'. At this point the scheduler sequencer 46 enters an idle stage. Scheduler sequencer 46 remains in the idle state until a new process priority number is presented on bus 52. When a process requires scheduling by cluster scheduler 20a, the process priority number is transmitted by the process over system bus 8 (shown in FIG. 2) to the cluster scheduler 20a via communication bus 21a. Line 54 is asserted high to indicate that the process priority number needs to be inserted into RAM 58. Line 54 going high causes scheduler sequencer 46 to transfer the five most significant process priority bits from bus 52a to bus 64. The five most significant bits of the process priority number presented on bus 64 are thus presented to the address lines of insert RAM 58. In this way, the scheduler sequencer 46 uses the five most significant bits of incoming process priority numbers to address one of 32 eight-bit words in insert RAM 58. The scheduler sequencer 46 then asserts line 66 high, which enables the selected data word addressed by the five most significant process priority number bits to be output onto bus 70. Scheduler sequencer 46 then latches the selected insert RAM 58 data word from bus 70 into the input PAL 56 by asserting control line 72. At this point, the input PAL 56 contains a copy of the selected one of the 32 eight-bit words in insert RAM 58. The eight-bit data word contained in input PAL 56 was selected by the five most significant bits of the priority process number that was inserted on bus 52. The three least significant bits of the priority number inserted on bus 52 are input to input PAL 56 over bus 52b. The three least significant bits of the process priority number presented on bus 52b define an address that selects one of the eight previously latched into input PAL 56. The bit selected by the three least significant bits of the priority process number transmitted on bus 52b is either set or cleared depending on the status of control line 54. Input PAL 56 then affects one of the eight bits previously latched from insert RAM 58 as determined by the status of control line 54 and the three least significant bits of priority process on bus 52b. Input PAL 56 sets a bit in the RAM 58 anytime a process priority number is inserted into cluster scheduler 20a; input PAL 56 clears a bit in the RAM 58 anytime a process priority number is withdrawn from cluster scheduler 20a. For example, if control line 54 is a logical high and bus: 52b is a binary 101, insert PAL 56 sets the fifth bit of the eight bits previously latched from insert RAM 58. Once input PAL 56 sets or clears one of the eight data bits as determined by the least significant process priority bits presented on bus 52b, the affected data word is ready to be reinserted into insert RAM 58. The input PAL 56 signals the scheduler sequencer 46 that it is has affected the selected bit by asserting status line 74. Scheduler sequencer 46 then causes insert RAM 58 to read the affected data word from input PAL 56 into the same address previously selected by the five most significant process priority bits on bus 64. Scheduler sequencer 46 accomplishes the write function by asserting line 66 to a logical low value (assuming that the RAM write control input is true when low). At this point, insert RAM 58 has one of its 256 bits set. The bit that is set is determined by the value of the priority process number inserted on bus 52. In effect, the insert RAM 58 performs the function of the insert register 40a shown in FIG. 4. Essentially, the insert RAM 58 emulates a 256 by 1 bit register. By using scheduler sequencer 46 in conjunction with the input PAL 56, the insert RAM 58 emulates a 256 by 1 bit register, where any one of the 256 bits can be set or cleared without affecting any other bit in the register. After sequencer 46 has written the affected data word into insert RAM 58, sequencer 46 then scans insert RAM 58, searching for the highest bit that is set. As discussed hereinabove with reference to insert register 48 (shown in FIG. 4), insert RAM 58 has data bits which represent process priority numbers. The least significant bit of the lowest addressable data word, i.e., bit 0 of data word x00, represents process priority number 0. The most significant bit of the most significant data word, i.e., bit 8 of data word 31 (address x1f), represents process priority number 255. Sequencer 46 scans insert RAM 58 for the most significant bit set in the most significant data word. Sequencer 46 therefore starts by asserting hexadecimal address x1f on bus 64 and keeps line 66 high in order to read the data word associated with the address selected on bus 64 onto bus 70. Output PAL 60 scans the output thus produced by insert RAM 58 and generates a logical high signal on line 76 if any of the selected bits on bus 70 are set. The sequencer 46 continues to read the contents of RAM 58 until line 76 is asserted high indicating that the sequencer 46 has found a set bit in a data word of insert RAM 58. When sequencer 46 finds a set bit in insert RAM 58, sequencer 46 has found the most significant set bit representing the most significant process priority number awaiting execution (stored in insert RAM 58). At this point, sequencer 46 must present an eight-bit binary representation of the highest process priority number awaiting execution onto communication bus 24a. The architecture shown in FIG. 5 easily allows for this function. The five most significant bits of the process priority numbers are readily available: these bits are simply the same as the bits used to address insert RAM 58 using bus 64. However, the 3 least significant bits of the process priority number to be communicated on communication bus 24a must be decoded from the eight-bit data word selected from the insert RAM 58 by sequencer 46. The output PAL 60 has an eight-bit to three-bit priority encoder (not shown) which outputs a three-bit binary representation of the highest set bit in the selected data word. The input of the encoder is the eight-bit output of RAM 58 over bus 70. The output of the eight-bit to three-bit priority encoder is provided to the three least significant bits of bus 24a. Therefore, after the scheduler sequencer 46 has scanned the insert RAM 58 and found the most significant bit that is set, the eight-bit binary representation of that bit is asserted over the communication bus 24a. The sequencer 46 then signals processor interrupter 22 (shown in FIG. 3) using load next line 24a' to load next register 38a (shown in FIG. 3) with the value of the highest process priority number awaiting execution now presented on communication bus 24a. Therefore, as described hereinabove in reference to FIG. 5, cluster scheduler 20a maintains a queue for processes awaiting execution on processors in processor cluster 14a (shown in FIG. 2). Cluster scheduler 20a constantly provides the highest process priority number awaiting execution. Any processor in the parallel processing system of FIG. 2 may insert or withdraw process priority numbers into cluster scheduler 20a via bus 21a. The act of inserting or withdrawing a process priority number will cause cluster scheduler 20a to produce a "new" highest waiting process priority on bus 24a, and will also cause cluster scheduler 20a to load the next register in processor interrupter 22. The operation of cluster scheduler 20a when withdrawing a process priority number from insert RAM 58 works identically to the operation described above when inserting a process priority number into insert RAM 58. The only difference is the status of line 54 when the input PAL affects the selected bit. If the process priority number is to be withdrawn, line 54 will be a logical low, and the affected bit will be cleared instead of set as described hereinabove. Referring to FIG. 6, the processor interrupter 22 is now described in detail. As shown in FIG. 6, one preferred embodiment of the present invention has the processor interrupter 22 which comprises four "tri-state" next registers 38a-38d, a preemptor sequencer 80, an active process RAM 82, an eight-bit register referred to as the lowest register 84, an eight-bit comparator circuit 86, and a four-bit preempt latch 88. The four next registers 38a-38d are loaded by the four cluster schedulers 20a-20d (shown in FIG. 3) with the highest priority number of all the waiting processes awaiting execution on the processor cluster served by each cluster scheduler. Referring simultaneously to FIGS. 3 and 6, next register 38a is loaded by cluster scheduler 20a, next register 38b is loaded by cluster scheduler 20b, next register 38c is loaded by cluster scheduler 20c and next register 38d is loaded by cluster scheduler 20d. The four load control lines 24a'-24d' are monitored by the preemptor sequencer 80. The preemptor sequencer 80 controls the operation of the processor interrupter 22 and is described in more detail hereinafter. The outputs of any next register may be read by any processor on the VME bus 8 (shown in FIG. 2). Next register outputs may be read by processors on the system bus 8 by addressing the sequencer 80 using bus 23d. The next registers are implemented using conventional "tri-state" latches (e.g. Part No. 74LS374 available from Texas Instruments). The outputs of the next registers 38a-38d are bussed together with the active process RAM outputs on bus 98. Both the next register 38a-38d outputs and the RAM 82 outputs are tri-stated and connected to bus 23c. The outputs of each next register is separately enabled by sequencer 80 via separate control lines. For example, next register 38a is enabled via control line 104. The output of the RAM 82 is also enabled by the sequencer 80 using a control line 96 which controls the read line of RAM 82. When a next register is read by a processor on the bus, the preemptor sequencer will then enable tile next register addressed and generate control signals to bus 23. The next register selected then outputs its data over bus 23c to bus 23 and to the bus interface 18 (shown in FIG. 1). Bus interface 18 generates the necessary VME timing and control signals and transmits the next register data from processor interrupter 22 to the processor reading the next register. Referring to FIG. 6, the active process RAM 82 implements the bank of 16 active process registers 40 (shown in FIG. 3). The active process RAM 82 is a 16-word by eight-bit random access memory (a memory having more than 16 addressable words can be used, so long as the unused address lines are tied to a logical low value) that is addressed by preemptor sequencer 80 using 4-bit bus 90. Each eight-bit word stored in the RAM 82 holds the priority number of the active process currently executing on the processor associated with that word. For example, the first addressable eight-bit word in RAM 82, word 0 (addressed by asserting a binary 0000 on 4-bit bus 90), holds the priority number of the active process currently executing on processor CPU1 31 (shown in FIG. 2). The second addressable word in RAM 82 (addressed by asserting a binary 0001 on 4-bit bus 90) holds the priority number of the active process currently executing on processor CPU2 32 (shown in FIG. 2). Similarly, the third through sixteenth addressable words (addressed with binary addresses 0002-1111) in RAM 82 respectively hold the priority numbers of the active processes currently executing on processors CPU3 33 through CPU16 35 (shown in FIG. 2). A processor can write the priority number of the active process that it is currently executing into the eight-bit word in RAM 82 that is associated with that processor. For example, processor CPU1 31 (shown in FIG. 2) can write the priority number of the active process that it is currently executing into word 0 (addressed with a binary 0000 on 4-bit bus 90) of the RAM 82. Similarly, CPU16 35 can write the priority number of the active process that it is currently executing into word 15 (addressed with a binary 1111 on 4-bit bus 90) of the RAM 82. Similarly to the access method discussed above with reference to accessing the next registers 38a-38d, processors gain access to the RAM 82 by addressing the preemptor sequencer 80 via communication busses 23c and 23d, which transmit (via bus 23) data, control and address signals from the bus interface 18 (shown in FIG. 1) to the components within the processor interrupter 22. Specifically, communication bus 23c transmits the eight-bit data from bus 23 to the data bus 98 connected to the data input/output lines of RAM 82. Communication bus 23d transmits a four-bit address and control signals from bus 23 to the sequencer 80. The sequencer 80 transmits the four-bit address from bus 23d to bus 90 whenever a processor writes its active process number (via bus 23c) into its associated word in RAM 82. As shown in FIG. 6, the write and read controls of active process RAM 82 are controlled by the preemptor sequencer 80 using a control line 92 for writing the RAM 82 and a control line 96 for reading the RAM 82. The processor interrupter 22 may be best understood by describing its operation with reference to FIG. 6. Upon system start-up, preemptor sequencer 80 is initially reset. Sequencer 80 performs an initialization sequence which writes the decimal value of 255 (the highest possible priority number, hexadecimal xff) into each of the 16 eight-bit words of the active process RAM 82. The sequencer 80 disables the outputs of each next register and asserts write line 92 low (assuming that the conventional write control of RAM 82 is activated when low). Disabling the next register outputs causes the tri-state data bus 98 of the active process RAM 82 to be pulled to a logical high value by pull-up resistors (not shown). Sequencer 80 then sequences through every address (starting from binary 0000 through binary 1111) of each of the 16 words of the RAM 82. When sequencer 80 has completed its initialization sequence, all of the bits in RAM 82 are set to a logical high value. Accordingly, all of the active process numbers are set to the highest possible priority number 255 (xff). The sequencer performs this initialization sequence to prevent the processor interrupter 22 from interrupting a processor until the processor is first initialized some time after system start-up. When a processor is initialized, it will write a priority number of zero (a zero priority number indicates that the processor is available) into the data word associated with that processor in RAM 82. By writing a zero to the data word associated with the processor, the processor is then released for scheduling. Processes may then be scheduled by the PSI 10 for execution by that processor. For example, some interval of time after system start-up, processor CPU2 32 (shown in FIG. 2) is initialized. Upon initialization, CPU2 32 writes a x00 to the address of its associated data word in RAM 82, in this case word 1 (addressed with a binary 0001 on bus 90). The address is decoded by bus interface 18 (shown in FIG. 1), and transmitted via bus 23 to bus 23d. The four-bit address and control signals are then presented to the sequencer 80 via bus 23d. The sequencer, in turn, passes the same four bit address to the address control lines of RAM 82 via 4-bit bus 90. The data (x00) is presented to the data input/output port of RAM 82 via bus 23c which is bussed together with tristate bus 98. The sequencer then writes the word, thus clearing the active process priority number, and enabling the scheduling of processes for CPU2 32. After the sequencer 80 has completed its initialization sequence, it then assumes an idle condition and waits until at least one of the next registers is loaded with a waiting process priority number by at least one of the cluster schedulers. When one of the cluster schedulers loads its next register, one of the load control lines 24a'-24d', will be asserted, and sequencer 80 will leave its idle state and begin sequencing. Sequencer 80 addresses RAM 82 over four-bit bus 90. The two most significant bits of the address contain a two-bit representation of the cluster scheduler which loaded its next register. For example, if next register 38a is loaded by cluster scheduler 20a (shown in FIG. 3) the two most significant bits on 4-bit bus 90 will be 00. However, if next register 38c was loaded, the two most significant bits of the address presented to RAM 82 on 4-bit bus 90 will be 10. Therefore, depending upon which next register is loaded, sequencer 80 selects one of four groups of data words in RAM 82 which corresponds to the next register which was loaded. Sequencer 80, having set the two most significant bits on 4-bit bus 90 to select a group of four data words, then uses the two least significant bits on 4-bit bus 90 to sequence through the four data words associated with the next register that was loaded. Each group of four data words in RAM 82 contain the priority number of the processes currently executing on processors associated with each next register 38a-38d. More specifically, the RAM 82 is arranged so that the four least significant data words (0-3) contain the priority process numbers of the four processes currently executing on the processors associated with next register 38a. The next four data words in RAM 82 (4-7) contain the active process priority numbers for the processes currently executing on the processors associated with next register 38b. The next four data words in RAM 82 contain the active process priority numbers for the processes executing on processors associated with next register 38c. Finally, the last four data words in RAM 82 (12-15) contain the active process priority numbers for the processes currently executing on processors associated with next register 38d. Sequencer 80, after sensing that a next register is loaded, reads the four active process priority numbers associated with the next register that caused sequencer 80 to leave its idle state. Sequencer 80 reads the four values from RAM 82 and, in conjunction with register 84 and comparator 86, determines the lowest active process priority number of the four active process priority numbers selected. Once the lowest process priority number is selected, it is placed in register 84 and compared with the process priority number contained in the next register. The comparison function is performed by compare circuit 86. If the value in the next register is higher than the lowest active process priority number, the sequencer 80 interrupts the processor associated with the lowest process priority number. For example, assume that cluster scheduler 20a (shown in FIG. 3) loads next register 38a with the value of the highest process priority number requiring scheduling. Sequencer 80 will detect that next register 38a was loaded by sensing line 24a' going high. Sequencer 80 then begins addressing RAM 82 by placing a 00 on the two most significant bits of 4-bit bus 90, and a 00 on the two least significant bits which represent the process or number. Therefore, sequencer 80 starts addressing RAM 82 at binary address 0000. Sequencer 80 then asserts line 96 high causing the active process priority number contained in data word 0 of RAM 82 to be asserted on bus 98. Using control line 100, the sequencer 80 latches the value of the read active process priority number into lowest register 84. Sequencer 80 then increments the address on 4-bit bus 90 to binary 0001 and reads the next active process priority number associated with the next processor onto bus 98. The two active processes are then compared using compare circuit 86. The output of compare circuit 86 is connected to the sequencer 80 with line 102. If the output of RAM 82 is less than the output of the lowest register 84, the sequencer 80 loads the output of RAM 82 into the lowest register 84. If the output of the RAM 82 is less than or equal to the output of the lowest register 84, the sequencer 80 leaves the lowest register 84 unaffected. In either case, sequencer 80 increases the address on 4-bit bus 90 by 1, to a binary 0010. This action causes the next active process priority number to be asserted onto bus 98 and compared with the value of lowest register 84. This sequence repeats until all four process priority numbers for active processes in RAM 82 have been read and compared. When sequencer 80 has sequenced through all four active process priority numbers, the value of the lowest active process priority number remains in the lowest register 84. Sequencer 80 then enables the output of next register 38a using enable line 104. The value of next register 38a is then compared to the lowest active process priority number by eight-bit comparator circuit 86. If the value of next register 38a is less than the value of the lowest active process priority number, sequencer 80 takes no action and returns to an idle condition. However, if the value of next register 38a is greater than the value of the lowest active process priority number, sequencer 80 loads the four-bit address of the lowest active process priority number into the preempt latch 88. Therefore, the preempt latch 88 contains a four-bit nibble which indicates which of the 16 processors require interrupting. Sequencer 80 then raises interrupt flag 23a and causes the interrupt address to be asserted over communication bus 23. Raising signal line 23a causes the selected processor to be interrupted. In the parallel processing system of FIG. 2, processor interrupter 22 initiates an interrupt over system bus 8. The interrupt is handled by request flow manager 12a. Request flow manager 12a reads the value of preempt latch 88 (shown in FIG. 6) to determine which processor requires interruption. The interrupt line of the selected processor is driven causing the active process on that processor to suspend execution. The suspended process stores context information into cluster memory located in the cluster associated with that processor. This allows any other processor to resume processing where the suspended process terminated. Whenever a process is suspended, it inserts its process priority number into a cluster scheduler. This causes the suspended process to be queued for scheduling. When the preempting process, which is not a multi-threaded process, begins execution on the interrupted processor, it withdraws its priority process number from the insert register of the cluster scheduler which scheduled that process. The preempting process also writes its priority number into its associated data word in the active process RAM 82 of the processor interrupter 22 (shown in FIG. 6). In this manner, processor interrupter 22 keeps a record of the priority numbers of all the active processes in the system. As discussed above, any processor may dynamically change its priority number by writing a different priority number to the active process RAM in processor preemptor 22. Also, any processor may read any next register from processor preemptor 22 by addressing the register over system bus 8. The cluster schedulers ensure that the next registers always contain the highest priority waiting process. The next register may not be read while the cluster is scheduling. The scheduler sequencer prevents any processor from reading the next register for the short period of time after being initiated by a new insert register access. When the cluster scheduler sequencer finishes scheduling, the next register may be read. Therefore, any processor can determine which process to execute next by reading its next register. The process is guaranteed to be the next highest priority process to execute. An alternative embodiment of the present invention has an individual processor interrupter 22 for each cluster scheduler 20. The preferred embodiment of the present invention shown in FIG. 2 uses a central processor interrupter 22 which services four cluster schedulers. The central processor interrupter 22 was implemented due to the cost and size constraints presented by having individual cluster scheduler processor interrupters. However, as discussed above, applicants expect that the cost of VLSI technology will decrease and that the present invention will in the future be implemented on a single VLSI device. At that time, each cluster scheduler will have a dedicated processor interrupter which works identically to the processor interrupter of the preferred embodiment, except that it services only one cluster scheduler instead of four. As discussed above, the PSI 10 of the present invention also facilitates scheduling of "multi-threaded processes" The software application using the present invention maintains tables which indicate whether a process is multi-threaded. The software tables also indicate how many processors are to be allocated to this multi-threaded process. Multi-threaded processes require that one to n processors execute one to n sub-processes. The processors operate on separate data. Typically, processes will not reschedule themselves during execution. An exception is made for multi-threaded processes. If a multi-threaded process executes on a processor, the process priority number associated with that process is left in the cluster scheduler until the multi-threaded process terminates. More specifically, when a process is dispatched to a processor, it usually withdraws its priority number from the cluster scheduler insert register as described above. However, if a process is multi-threaded, it does not withdraw its priority number from the cluster scheduler. The process is able to determine that it is multi-threaded by referring to software tables maintained by the software system. After leaving its priority number in the cluster scheduler insert register, the multi-threaded process then determines how many processors it has allocated to it. It then can initiate its sub-processes to the available processors. When a processor no longer executes a higher priority process, the PSI dispatches sub-processes to available processors in the manner described hereinabove. As processors become available, or as the multi-threaded process becomes the highest priority process awaiting execution, the multi-threaded process will be dispatched to a processor for execution. One preferred embodiment of the present invention has been implemented using the well known VME bus. The cluster scheduler and processor interrupter, as currently implemented, are now briefly discussed. However, the actual implementation of the PSI 10 of the present invention represents only one embodiment. The present invention may be implemented on several computer busses using the preferred embodiments discussed in detail above. Therefore, the following discussion is for explanation purposes and should not limit the scope of the invention. FIGS. 7a-7d, taken together with the scheduler sequencer flow-diagrams shown in FIGS. 7e-7f, in conjunction with the PAL programming equations attached in Appendix A, show the actual VME bus implementation of the cluster scheduler (of FIG. 5) of the present invention. Referring simultaneously to FIGS. 5 and 7a, the VME implementation of the cluster scheduler 20a shown in FIG. 5 is now described with reference to FIG. 7a. As shown in FIG. 7a, cluster scheduler 20a comprises a sequencer 46a, a PAL 46b, an input PAL 56, an insert random access memory (RAM) 58, and an output PAL 60. FIG. 7a demonstrates how cluster scheduler 20a of FIG. 5 is easily and inexpensively implemented using only a few available SSI and MSI integrated circuits. The scheduler sequencer 46 (shown as one device in FIG. 5) is actually implemented with two devices: the sequencer 46a and the PAL 46b. PAL 46b provides some of the interface functions to interface the cluster scheduler 20a with the VME bus. The other four devices, the sequencer 46a, the input pal 56, the insert RAM 58, and the output PAL 60, perform as described above with reference to FIG. 5. Insert RAM 58 is implemented with a conventional 2048 word by 8-bit Static RAM (Part No. CY7C128, available from Cypress Semiconductor Corp.). As shown in FIG. 7a, the 2K by 8 SRAM is used as a 32 word by 8-bit RAM because the lowest six address bits are connected to ground. PAL 46b, input PAL 56 and output PAL 60 are all implemented using conventional 16-input, 8-output and-or-invert programmable array logic devices (Part No. PAL 16L8, available from Texas Instruments, Inc.). The logic diagram of the programmed input PAL device 56 is shown in FIG. 7b. The logic diagram of the programmed output PAL device 60 is shown in FIG. 7c. The logic diagram of the programmed sequencer/interface device 46b is shown in FIG. 7d. The scheduler sequencer 46a is a programmable sequence generator (Part No. PSG 507, available from Texas Instruments, Inc.). The scheduler sequencer 46a is programmed in accordance with the scheduler sequencer flow-diagrams shown in FIGS. 7e-7f. The flow-diagram illustrates the operation of the sequencer 46 as described above with reference to FIG. 5. The PAL programming equations for programming the scheduler sequencer are given in Appendix A. Appendix A shows the source code for programming the scheduler sequencer 46a generated using the ProLogic Compiler (1988 Copyright Inlab, Inc.) available from Texas Instruments, Inc. The source code is readily convertible into standard JEDIC format used for programming the sequencer device. FIGS. 8a-8h, taken together with the preemptor sequencer flow-diagrams shown in FIGS. 8i-8j, in conjunction with the PAL programming equations attached in Appendix B, show the actual VME implementation of the processor interrupter 22 (of FIG. 6) of the present invention. Referring to FIG. 8a, the processor interrupter 22 as actually implemented on the VME bus is now briefly described. As shown in FIG. 8a, this embodiment of the present invention has a processor interrupter 22 which comprises a preemptor sequencer 80a, PAL 80b and PAL 80c (which together implement the preemptor sequencer 80 of FIG. 6), an active process RAM 82, an eight-bit register referred to as the lowest register 84, an eight-bit comparator circuit 86, and a four-bit preempt latch PAL 88. In addition, processor interrupter has four next registers as described above with reference to FIG. 6. FIGS. 8b-8e show the four "tri-state" next registers 38a-38d of the processor interrupter 22. The four next registers 38a-38d are loaded by the four cluster schedulers 20a-20d (shown in FIGS. 8b-8e as optional mezzanine modules) with the highest priority number of all the waiting processes in the cluster schedulers. Referring simultaneously to FIGS. 8b-8e, next register 38a is loaded by cluster scheduler 20a, next register 38b is loaded by cluster scheduler 20b, next register 38c is loaded by cluster scheduler 20c and next register 38d is loaded by cluster scheduler 20d. The next register outputs may be accessed by any processor on the bus via bus 23c. The next registers are implemented with octal edge-triggered tri-state flip-flop devices (Part No. 74LS374, available from Signetics Company). Referring now to FIG. 8a, the four load control lines 24a'-24d' are monitored by the preemptor sequencer 80a-80c shown in FIG. 8a. The preemptor sequencer 80a-80c controls the operation of the processor interrupter 22. The processor interrupter 22 functions as described above with reference to FIG. 6, with the exception that PALs 80b and 80c perform some VME interface functions as well as assisting the preemptor sequencer 80a. Specifically, the preempt latch 88, active process RAM 82, next registers 38a-38d (shown in FIGS. 8b-8e), and lowest latch 84 perform as described above with reference to FIG. 6. The active process RAM 82 is implemented with a conventional 2048 word by 8-bit Static RAM (Part No. CY7C128, available from Cypress Semiconductor Corp.). As shown in FIG. 8a, the 2K by 8 SRAM is used as a 16 word by 8-bit RAM because the seven most significant address bits are connected to ground. PAL 80b and preempt latch 88 are implemented using conventional 16-input, 8-output and-or-invert programmable array logic devices (Part No. PAL 16L8, available from Texas: Instruments, Inc). The logic diagram of the programmed preempt latch 88 PAL device is shown in FIG. 8f. The logic diagram of the programmed PAL device 80b is shown in FIG. 8g. PAL 80c is implemented using a conventional programmable array logic device (Part No. PAL 16R4, available from Texas Instruments, Inc.). The logic diagram of the programmed PAL device 80c is shown in FIG. 8h. The preemptor sequencer 80a is a programmable logic sequencer (Part No. PLS 506, available from Texas Instruments, Inc.). The preemptor sequencer 80a is programmed in accordance with the preemptor sequencer flow-diagram shown in FIGS. 8i-j. The flow-diagram illustrates the operation of the sequencer 80 as described above with reference to FIG. 6. The PAL programming equations for programming the preemptor sequencer 80a are given in Appendix B. Appendix B shows the source code for programming the preemptor sequencer 80a generated using the ProLogic Compiler (1988 Copyright Inlab, Inc.) available from Texas Instruments, Inc. The source code is readily convertible into standard JEDIC format used for programming the sequencer device. While the above detailed description has shown, described and pointed out the fundamental novel features of the invention as applied to various embodiments, it will be understood that various omissions and substitutions and changes in the form and details of the device illustrated may be made by those skilled in the art, without departing from the spirit of the invention. ##SPC1##
|
Same subclass Same class Consider this |
||||||||||
