Dependency based cooperative processing of multiple programs working together to accomplish a larger task

Dynamic activity-creating data-driven computer architecture

4644461

Abstract

A computer architecture wherein data inputs causes the dynamic creation of appropriate activities employing stored functions as necessary to accomplish the desired end result for the data. The architecture employs a large scale multi-processing environment for parallel computation. A fast forward propagating queue structure and improved interfacing crossbar are employed.


Claims

Wherefore, having thus described my invention, I claim:

1. In a computer architecture of the Petri Net type wherein a plurality of pre-established sub-functions, defining a total function, process portions of the common input data asynchronously to generate a single combined output reflecting the total function as applied to input data, an improvement comprising:

(a) a plurality of data processors;

(b) means for subdividing said total function when said total function is beyond the capability of an individual one of said data processors.

(c) function library means coupled to said data processors for storing each unique sub-function in a storage area as a replicator which can be copied and used by said data processors as needed on a dynamic basis;

(d) data queue means for recirculating data to be processed by said data processors continuously past said data processors until data is removed from said data queue means by one of said data processors, and for supplying said data to selected data processors;

(e) activity queue means for storing a plurality of sets of instructions for operating on selected corresponding data by said data processors, and for recirculating said instructions past said data processors until one of said instructions is taken from said queue by one of said data processors, said activity queue means including means for supplying to all of said data processors, instruction codes, for causing said data processors, (1) to remove particular labeled data from the data queue means, (2) to perform a computational function on said data (3) to introduce output labelled data into said data queue means, (4) to retrieve from the function library stored generalized instruction sequences, and (5) to introduce additional instruction codes into the activity queue means;

(f) means for associating one of said sets of instructions from said activity queue with an idle one of said processors;

(g) means for applying selected data from said data queue to said one processor, said selected data corresponding to said instructions selected from said activity queue; and

(h) said means for subdividing including means for applying subdivided data to the data queue, means for applying sub-function instructions to the activity queue such that further processing by a plurality of idle ones of said data processors can be accomplished and means for recombining the separately processed sub-functions whereby the processing of said total function is accomplished.

2. A computer architecture as defined in claim 1 wherein said queue means includes a plurality stages of latch means for holding the data.

3. A computer architecture as defined in claim 1 including crossbar circuit means for accessing the function library.

4. A computer architecture as defined in claim 1 wherein auxiliary storage nodes are provided, and wherein said function library resides therein.

5. A computer architecture as defined in claim 1 wherein the data in said data queue is labelled data and forms part of the means for applying selected data corresponding to instructions selected by said data processor from said activity queue.


Description

BACKGROUND OF THE INVENTION

The present invention relates to large system computer architecture and, more particularly, to interfacing elements employed between devices in large computer systems for the transfer of data such as queues and crossbars.

Basic digital computing operates in the cycles shown on FIG. 1. In what is referred to as a Von Neuman computer a sequence of instructions are alternately fetched from their location in memory into an execution register and executed. The computational sequences begin by providing the computer with a first instruction address for execution. Thereafter, the address of the next instruction is automatically calculated byy the hardware as a result of the previous execution.

A simple Von Neuman system is shown in FIG. 2. The arithmetic and control unit 10 sequentially fetches and executes a series of instructions located in instructional memory 12. Typically, the instructions from memory 12 cause the arithmetic and control unit 10 to operate on data read in through input lines and/or maintained in data memory 14. The results of the calculations can be output to be seen by an operator on a device such as typewriter 16.

Certain computational projects of large magnitude have traditionally imposed severe time constraints on computational ability. Things such as major conversion problems and military command and control systems requiring repetitive, rapid, and immediate results fall into this category. To solve such problems, the Von Neuman type computer has been made bigger and bigger and faster and faster. Recently, much thought has been given to deviating from this approach and having the data drive the problem rather than the problem drive the data. Much of the impetus for research in this area has been as a result of recent developments in hardware wherein microprocessors and "smart" chips have been made available in smaller and smaller sizes and at lower and lower costs. Such an approach is shown in its simplest form in FIG. 3. In such a system, the driving data elements are referred to as tokens. The tokens cause the "firing" of an activity which generates results depending upon the value of the tokens. Assume that an activity is implemented by a separate chip, and that chip 18 has two inputs 20 and 22 and an output 24. Assume further that chip 18 in the presence of both inputs 20, 22 will produce their sum at the output 24. The activity of chip 18 is simply that of the summing of the inputs.

Such a series of activities can be hard-implemented as "activity chips" or can be treated as logical entities emulated by more general devices, such as microprocessors 26 which are shown interconnected in FIG. 4 in what is referred to as a Petri net. Each of the activities 26 performs a function (indicated as f.sub.1 -f.sub.5 respectively) on the input(s). Such Petri nets are pre-established and do not change during the course of a computation. Each of the activities 26 is independent and, therefore, the sequence of computations is a function of the arrival of the tokens. When tokens A and B are present and available for function f.sub.1, its output is available as a input to functions f.sub.2 and f.sub.4. Token C in the presence of the output of function f.sub.1 will cause function f.sub.2 to be calculated providing the second input for function f.sub.4 which, in turn, provides one of the two inputs required for function f.sub.5. The presence of tokens D and E cause the output of function of f.sub.3 to be made available as the second input for function f.sub.5.

Turning now to FIG. 5, a simplified drawing of a computer system of the Von Neuman type as used in real-time on-line systems is shown. The main memory 30 as accessed by the arithmetic and control unit (not shown) contains resident system programs 32, resident sub-routines 34 (usually re-entrant i.e., more than one program can be using them at a time), and a large section of available memory 36 wherein programs can be loaded for temporary execution. The majority of the programs and data are maintained on a mass storage device 38. When, as a result of an appropriate stimulus such as an interrupt, data arrival, or the like, the resident system program 32 requires the execution of a program on the mass storage device 38, that program is transferred from the mass storage device 38 into an unoccupied portion of available execution memory 36 as symbolized by the dotted area 40. When the program has been transferred into area 40, control is transferred to the first instruction and execution proceeds normally. During the execution procedure, the program within dotted area 40 can employ one or more of the sub-routines 34. When the program has completed its operation, dotted area 40 is merely returned to available status. That is, programs are not transferred back to mass storage 38. In the preferred mode of operation of such systems, the programs on mass storage device 38 are maintained in "dump" format and coded according to "run anywhere" techniques. Thus, a simple block transfer from mass storage 38 into execution memory 36 can be accomplished and the program will run wherever loaded. Moreover, the program can be located and operated simultaneously in more than one available area within memory 36.

A more exotic version of such a system is shown in simplified form in FIG. 6. Such a multi-processor system is typically found in military command and control systems wherein such a large quantity of functions must be accomplished that no single processor (e.g. computer) can accomplish the entire task. In such a system, one or more mass storage devices 38 containing programs and data are interfaced with a communication bus 42. The processors 44 are also interfaced with the communication buss 42. In such manner, the processors 44 can communicate between one another along the buss 42 as well as accessing the mass storage 38. When operating in a task oriented mode, a common task list or queue is maintained and tasks are assigned to the next available processor 44 in response to an initiating stimulus.

Communication crossbar circuits and fast propagating first in, first out (FIFO) queues have been postulated for use in such large systems in order to eliminate potential bottlenecks. That is, when one entity needs to communicate with another, a bottleneck can occur if a communication path between the two cannot be established immediately. In such case, a communication crossbar circuit is proposed. In like manner, when a large number of items are placed on a queue having many stages, a potential bottleneck can occur if the next piece of available data is many stages away from the requester and a considerable period of time and clock pulses must be occupied in order to shift the data to the user.

The proposed devices to date either exist on paper only or, if built, only operate as a stand-alone device. This is because of the inability of the designer of such apparatus to recognize the limitations which must be included within such apparatus when working in the given environment. For example, a common technique is to make such devices operate asynchronously to the system. The patents to Faustini (U.S. Pat. No. 3,757,231), Mallerich, Jr. (U.S. Pat. No. 3,736,575), and Derickson, III, et al (U.S. Pat. No. 3,972,034) are examples of such apparatus. Faustini describes an asynchronous unclocked FIFO which shifts stage by stage. Mallerich describes an improvement over Faustini which operates in the same manner, but with fewer components. The Derickson patent includes the description of a FIFO as part thereof beginning with the description in column 3. That FIFO also is asynchronous.

In a single computer of the Von Neuman type, the problem cannot occur because any word in memory will always be completely accessed. That is, a word in memory will always be written into completely or read completely. In an interrupt-operated system having different priority levels, recognition of the potential problem being discussed herein is recognized by the technique of "re-entrant" coding which must be employed whenever common data can be accessed by multiple programming levels. Still in all, even in that case, reading and writing are on a complete word basis.

By contrast, if a common memory location is being accessed asynchronously, it is possible for one device to be accessing a particular memory location as, for example, by reading from it simultaneously with another device reading into it. Such an occurence can result in completely garbled and unintelligible data. The reason that this is not normally considered or accounted for is that, typically, it only happens on rare occasions. Thus, the designer running a short test case probably does not have the event occur or, if it does, it is not recognized. As a practical matter, however, it is something which does occur and must be accounted for.

Thus, any crossbar or FIFO operating within a computer architecture as to be herein described must be synchronized to the clock driving the system so that appropriate preventive measures can be incorporated to prevent such potentially catastrophic events from occurring.

It is the object of the present invention to provide a data driven computer architecture having the benefits of data driven initiation with the flexibility of more traditional Von Neuman computers.

It is a further object of the present invention to provide such an architecture with interfacing mechanisms which maximize the flow rate of data between elements and minimizes the time delays inherent therein.

SUMMARY

The foregoing objectives have been met in a computer architecture of the Petri Net type wherein pre-established sub-functions, defining a total function, process portions of common input data asynchronously to generate a single combined output reflecting the total function as applied to the input data by the improvement comprising, each unique sub-function existing in a storage area as a replicator which can be copied and used by a processor as needed on a dynamic basis; and, each replicator when copied and used having the ability, when presented with data to be processed by it, of determining if the data is within its present processing power and, if it is, processing it and presenting the results to its output; otherwise, subdividing the data, making copies of itself and spawning a combining logic having the original output destination as its output destination, designating the input of the combining logic as the output designation of the copies, presenting the subdivided data to respective ones of the copies for processing, and terminating itself, whereby a sequence of replicator copies and combining logic will be automatically generated at run-time which has the capability of processing the data input thereto and then be discarded when finished with its processing of that data.

In the preferred embodiment, the copies of the replicators and the spawned combining logic are placed as activities in a queue structure according to a unique embodiment described herein and accessed by a plurality of processors through a unique cross bar structure described herein whereby each activity is processed asynchronously on a processor-available basis.

The activities are disposed in a queue or FIFO comprising a plurality of stages into which data can be placed at input points and from which data can be removed at output points upon the occurrence of respective clock pulses of a clock line to which it is synchronized and by which it is driven, wherein the improvement comprises, status means interconnecting the stages for passing each stage's status to the stage in front of it and behind it between the clock pulses; and, shifting means interconnecting the stages for shifting data between stages towards output points over unused stages between clock pulses at combinatorial logic speeds whereby the availability of data at the output points during the clock pulses is maximized.

The basic crossbar circuit shown for selectively connecting between respective ones of a plurality, M, of inputs and respective ones of a plurality, N, of outputs comprises, M input bar circuit means disposed in spaced, non-intersecting relationship for respectively conducting signals as applied to respective ones of the input bar circuit means; N output bar circuit means disposed in spaced, non-intersecting relationship to one another and in close adjacent spaced relationship to the input bar circuit means for respectively conducting signals as supplied to respective ones of the output bar circuit means, each of the output bar circuit means being oriented to intersect each of the input bar circuit means once; and, a plurality (M.times.N) of control interfaced circuit means connecting the input bar circuit means to the output bar circuit means at each of respective ones of the intersections of the bar circuit means for recognizing its associated input bar circuit means number, m, in a signal applied to the M input bar circuit means and responding thereto, for recognizing its associated output bar circuit means member, n, in a responded-to signal, for establishing a connection along input bar circuit means, m, and output bar circuit means, n, through the intersection thereof and flagging the bar circuit means, m and n, as "in use" if not previously in use, sending an "in use" signal back along input bar circuit means m if "in use", and flagging the bar circuit means m and n as "not in use" when use of the established connection is completed.

DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a prior art computing cycle of the Von Neuman type wherein the operating instructions are alternately fetched and executed.

FIG. 2 is a block diagram of a prior art computer system of the Von Neuman type.

FIG. 3 is a simplified block diagram showing a basic prior art approach to data driven computing.

FIG. 4 is a block diagram of a prior art activity driven structure referred to as a Petri Net.

FIG. 5 is a simplified drawing of a prior art computer system of the Von Neuman type as used in real-time on-line systems.

FIG. 6 is a block diagram of a prior art multi-processing system.

FIG. 7 is a simplified block diagram demonstrating the basic premise of the present invention wherein data is presented to a copy of a replicator defining a basic function which has the capability of spawning additional activities and copies of itself as necessary to accomplish the computational task necessary on the data presented.

FIG. 8 is a simplified block diagram showing the computer architecture of the present invention in its basic form.

FIG. 9 is a simplified diagram of the computer architecture of the present invention in an improved form.

FIG. 10 is a diagram of the computer architecture according to the present invention in yet a further refined form for improved operation.

FIG. 11 is a diagram of the computer architecture of the present invention in its most refined and preferred form.

FIG. 12 is a basic block diagram showing the necessity for an interface between senders and receivers of messages and signals.

FIG. 13 is a simplified drawing of the internal workings of the interface network shown as a block in FIG. 12.

FIG. 14 is a simplified drawing showing how the horizontal rows of the interface network of FIG. 13 could be produced on individual horizontal bar circuits.

FIG. 15 is a simplified drawing showing how the vertical rows of the interface network of FIG. 13 could be produced as individual vertical bar circuits.

FIG. 16 is a simplified drawing showing how the horizontal and vertical bar circuits of FIGS. 14 and 15 could be combined to create the interface structure of FIG. 13.

FIG. 17 is an enlarged block diagram of one intersection of the structure of FIG. 16 showing how logic would be incorporated therein to allow the horizontal and vertical bar components to communicate with one another.

FIG. 18 is a simplified drawing showing how the basic crossbar structure as shown in FIG. 16 can be stacked with additional input and output interfaces to create an improved crossbar structure having the same number of inputs and outputs but with an improved probability of creating a successful path therethrough.

FIG. 19 is a simplified drawing showing how a number of basic crossbar circuits according to the present invention as shown in FIG. 16 can be interconnected to create a crossbar circuit with an increased number of input and output lines.

FIG. 20 is a simplified block diagram of several stages in a queue or FIFO according to the present invention.

FIG. 21 is a detailed block diagram of one stage from FIG. 20.

FIG. 22 is a table showing the four states possible in the stage of FIG. 21 according to the present invention and what action should be taken.

FIGS. 23-101 comprise detailed drawings of the implementation of the crossbar and queue FIFO circuits employed in the present invention and, in particular:

FIG. 23 shows circuit M-1 which is the P-bit rotating one-bit circuit.

FIG. 24 shows circuit M-2 which is the break-in switch circuit.

FIG. 25 shows circuit M-3 which is the arbitration signal generator circuit.

FIG. 27 shows circuit M-4 which is the innermost crossbar multiplexer circuit.

FIG. 28 shows circuit M-5 which is the innermost path latch circuit.

FIG. 29 shows circuit M-6 which is the want decoder circuit.

FIG. 30 shows circuit M-7 which is the P-Y compete four row circuit.

FIG. 31 shows circuit M-8 which is the P-compete 2.sup.r s path-latching broadcast crossbar circuit.

FIG. 32 is a block diagram showing the memory system crossbar facility.

FIG. 33 shows circuit M-9 which is a free-memory queue master bit internal signal circuit.

FIG. 34 shows circuit M-10 which is also a free-memory queue master bit internal signal circuit.

FIG. 35 shows circuit M-11 which is the free-memory queue master bit circuit.

FIG. 36 shows circuit M-12 which is the free-memory queue slate bit circuit.

FIG. 52 shows circuit M-25 which is incorporated into, for example, the interface to M-8/M-22: sender.

FIG. 53 shows circuit M-26 which is incorporated into, for example, the interface to M-8/M-22: receiver.

FIG. 54 shows circuit M-27 which is the memory code circuit.

FIG. 55 shows circuit M-28 which is the memory code for input-output circuit.

FIG. 56 shows a shorthand notation for the M-8 circuit.

FIG. 57 shows the bit fields of the address.

FIG. 58 is a block diagram showing the synthesis of larger crossbars.

FIG. 59 is a shorthand nnotation for an expanded M-8 circuit.

FIG. 60 is a shorthand notation showing one possible configuration of address bit-field "C".

FIG. 61 shows the special case of an M-8 circuit which results when there are alpha address bits on the inputs and 2.sup.alpha outputs.

FIG. 62 is an informal designation for the circuit of FIG. 61.

FIG. 63 is a block diagram of the inner connection between the M-13 circuits output routing to the memory nodes.

FIG. 64 shows circuit Q-1 which is a parallel preparator circuit.

FIG. 65 shows circuit Q-2 which is a claimant circuit.

FIG. 66 shows circuit Q-3 which is an input port claimant circuit.

FIG. 67 shows circuit Q-4 which is a priority select circuit.

FIG. 68 shows circuit Q-5 which is a row-wise comparison network circuit.

FIG. 69 shows circuit Q-6 which is a token position proper circuit.

FIG. 70 shows circuit Q-7 which is an RD field multiplexer circuit.

FIG. 71 shows circuit Q-8 which is a processor input port latches circuit.

FIG. 72 shows circuit Q-9 which is a row absorption network circuit.

FIG. 73 shows circuit Q-10 which is a processor output port circuit.

FIG. 74 shows circuit Q-11 which is an output port K-group multiplexer circuit.

FIG. 75 shows circuit Q-12 which is an output port K-group circuit.

FIG. 76 shows circuit Q-13 which is an output-gating multiplexer circuit.

FIG. 77 shows circuit Q-14 which is a K-group merged onto token queue.

FIG. 78 shows circuit Q-15 which is a token queue row complete circuit.

FIG. 79 shows circuit Q-16 which is a self-switching wait stage master bit circuit.

FIG. 80 shows circuit Q-17 which is a self-switching wait stage slave bit circuit.

FIG. 81 shows circuit Q-18 which is a token-wide wait stage circuit.

FIG. 82 shows circuit Q-19 which is a wait stage block circuit.

FIG. 83 shows circuit Q-20 which is a crossing switch circuit.

FIG. 84 shows circuit Q-21 which is a token-wide crossing switch circuit.

FIG. 85 shows the circuit of FIG. 84 in symbolic form.

FIG. 86 shows circuit Q-22 which is an accordian store balancing stage circuit.

FIG. 87 shows circuit Q-23 which is an accordian store circuit.

FIG. 88 shows circuit Q-24 which is a token queue proper circuit.

FIG. 89 shows circuit Q-25 which is a token queue complete circuit.

FIGS. 90-95 are improvement circuits on circuits 79-82 to remove inherent delays.

FIG. 96 is a block diagram showing the improved FIFO of FIG. 9 hooked up to form a longer FIFO.

FIGS. 97-99 are symbolic representations of the circuits of 92-94.

FIG. 100 is the symbolic representation of the circuit of FIG. 96.

FIG. 101 is a table provided to show how a basic stage responds to the three independent variable TSF, FWS, and CONT.

DESCRIPTION OF THE PREFERRED EMBODIMENT

Referring first to FIG. 7, the basic premise of the computer architecture of the present invention is shown in simplified form. Only the initial portion of the computation's Petri net is initially present in the mashine as indicated by activity group ROOT within box 52. The Petri net for the entire computation is a function of the data entering this initial portion, and, therefore, cannot be pre-established as in the traditional data driven computer.

Assume now that initial ROOT of 52 is the initial step in generating a Petri net execution graph which will sum all elements of an incoming vector (of any length) into a single scalar. FIG. 7 shows the introduction of a four-element vector 50 borne on a token 46 to initial ROOT 52. ROOT 52 responds dynamically to this vector, in simple terms, in this way: the ROOT activity group sees that vector 50 is "too large" and, therefore, spawns a second group of activities called the THEN group, also within box 52. The ROOT group of box 52 then vanishes. That is, the activities of the ROOT group have "fired" once and thereafter immediately vanish. The initial THEN group of 52 itself sets up more of the Petri net as follows: it generates an addition ("+") activity 58; it absorbs the original vector 50 and breaks it into two halves 54 and 56; and it spawns two new ROOT activity groups in boxes 55 and 57, and guides the split vector 54 to the new ROOT group of 55, and split vector 56 to the new ROOT group of 57. Having done its work, the initial THEN activity group of box 52 vanishes.

Except for its position in the overall Petri net (which is dynamically generated piecemeal), the ROOT group of 55 is behaviorally identical to the original ROOT group of 52. The two groups are, in fact, isomorphic, built according to a single master blueprint. The blueprint designated a "REPLICATOR" (and not shown in FIG. 7) is the fundamental for of stored program in the present invention. A replicator is a blueprint for an activity group. When the THEN group of 52 "spawned" the two ROOT groups of 55 and 57, the static replicator defining a ROOT group was called on as the pattern for construction.

Continuing with the example, the left ROOT of 55 operates on its input vector 54 independently of the right ROOT 57 operating on its vector 56. This is the parallelism consequential of any Petri net scheme. One can also see that the original ROOT of 52 led to the generation of the two new ROOTs 55 and 57, so that the overall computation was recursively defined. Single activities fire once, only once, and them immediately vanish. This forces the synamically generated activity groups to be acyclic and, viewed as a unit, also vanishing once they have been used. This, in turn, means the overall dynamically-generated Petri net will be acyclic; that is, without "loops" in which any activity is traversed twice (since activities vanish on their first traversal). These characteristics are fundamental to the present invention.

The ROOT group of 55 examines its incoming vector 54 and determines that the vector is "small enough". That ROOT group then spawns an ELSE group of box 55 which creates a waiting addition activity 64, absorbs incoming vector 54 and breaks it into isolated scalars "a" and "b" shown as inputs to activity 64, guiding those scalars to the prepared activity. The ROOT group of 55 vanishes after spawning the ELSE group of 55 and the ELSE group vanishes above the above steps. It should be seen that the right-hand ROOT group of box 57 undergoes similar behavior (behavior which is isomorphic to the left-hand group) but due to its context (that is, its global placement in the overall Perti net), it is not identical. The right-hand group absorbs its own vector as determined by its context, has its own local behavior as affected by that incoming vector and produces its own result destined ultimately for the right-hand input of the waiting addition activity 58. The linkage mechanisms cause the overall computation to be deterministic. That is, there is no ambiguity as to which activity will absorb which token, nor where a result will go once produced.

The computation proceeds to the point where only the four isolates scalars "a", "b", "c", and "d" (each on its own token), along with the three addition activities 64, 66, and 58, are the only entities remaining in the system. The linkages completely determine their interaction, as indicated by the arrows. Activity 64 absorbs the "a" and "b", produces the sum "a+b" and vanishes. Activity 66 produces "c+d" and the single activity 58. The ultimate posture should be clear: a single token "a+b+c+d" and no activities.

It is a fundamental feature of the present invention that the creation of new activities can be initiated (if not totally carried out) by existing activities in the system. The new activities are built having a very specific relationship to one another and to the context thay are introduced into. These new activities and their interrelationships constitute a "sub-graph" (referred to as a "group" in the above text). A sub-graph is itself a Petri net. Existing activities, under stimulus of the data they are currently processing, may choose to initiate construction of an activity group (a Petri net sub-graph of new activities which are properly interrelated) according to a pre-established pattern. The pattern, called the replicator, is accessed and used by the existing activity to establish the new activities. The existing activity also provides the information needed to interface ("concatenate") the new sub-graph properly to the existing overall Petri net. This latter information needed to link the new sub-graph to its context is provided by the "parent" activity (the activity initiating the creation) and not by the replicator. The replicator only indicates how that information must be applied. It is here that one should begin to see the detailed interaction of data (leading to decisions as to whether or not an activity group is to be created and, if so, according to which replicator's pattern), context (providing the linkage data needed to interface the new sub-graph properly to the existing Petri net), and replicators (which are the analog of a Von Neumann "program"; that is, the static descriptiors which are stored to effect the computation intended by the programmer or the "code" as distinct from the execution of the code).

Note that in the simplified example of FIG. 7, the activities are dynamic and the basic elements (i.e., the functions) are allocated on an as-needed basis. Once utilized, the functions are merely discarded. It is envisioned that the functions would be operated and executed in a manner similar to that discussed with respect to FIGS. 5 and 6 above and, therefore, no greater detail is to be set forth herein in this regard.

Turning now to FIG. 8, certain aspects of the computer architecture of the present invention can be seen. As envisioned, the computational facilities, as indicated by the dotted box 68, comprise a multitude of individual computer processors 70. It is not inconceivable with modern technology that in a large computational system according to the present invention there could be thousands of computer processors 70 comprising facilities 68. The system is stimulated by tokens 72. These are placed onto a token queue, generally indicated as 76, by action of executing activities in the computing facilities 68 ("input" and "output" activities guide data between external devices and the token queue 76). The token queue 76 can be accessed by all the processors 70. Activities 78 are generated by executing activities involved in the "spawning" process discussed earlier, according to patterns stored in replicator storage; activity generation, with the replicator store, is generally indicated at this time as box 80. As activities 78 are generated, they are placed on an activity queue, generally indicated as 82, which is also accessible by all the processors 70. As facilities 68 become available, the activities 78 are removed from the activity queue 82 and assigned to appropriate facilities 68, as symbolized by the arrow 84. The activity needs data (generally speaking) in order to fire and, therefore, it expects to find that data on tokens 72 in the token queue 76. The scheme used to establish token flow through the dynamically-generated Petri net takes the form of "labels" on both tokens and activities which are supported by the architecture. As a match between a token 72 label and a label on an activity 78 within facilities 68 is recognized, the token 72 is provided to the activity 78, as symbolized by the arrow 86. That token 72 is removed from the token queue 76 and effectively attached to the activity 78 to effect the process called "absorption" in earlier text. Subsequent generation of a result token 72 will cause it to be placed on the token queue 76, as symbolized by the arrow 88.

It is contemplated that in a highly active system a virtual swarm of tokens and activities will be occupying the respective queues 76, 82. Making the tokens 72 and activities 78 available to the facilities 68 and to one another in the most expedient manner is, therefore, a major concern of the computer architecture of the present invention.

As shown in FIGS. 9-11, several constructions according to the architecture concept of the present invention will be described.

In implementing the novel computer architecture of the present invention, there are two major areas of concern requiring consideration and apparatus for implementation beyond the scope of anything available presently in the art in order to make the architecture an achievable reality operating in a manner which totally realizes the desired objective. Those considerations are, of course, a practical interface system between the various physical components and a practical and time-efficient queueing structure for holding the waiting activities, tokens, and replicator descriptors (within the function library) in a manner which allows them to be readily and easily accessible to the facilities requiring them. Both these requirements have been addressed by the applicant herein and solutions are described hereinafter. Both the novel cross-bar circuitry and the accordian-store queue structure will be described functionally, and, thereafter, described with particularity with reference to a specific design.

Turning first to FIG. 12, the problem is shown in its simplest form. At any particular time, the elements of the system fall into a plurality of senders, generally indicated as 200, who wish to communicate with a plurality of receivers, generally indicated as 202. What is desired is an interface network 204 to which the senders 200 and receivers 202 can be connected such that any sender 200 can call for and obtain a direct path to any receiver 202 connected to the interface network.

Turning now to FIG. 13, the internal workings of the interface network 204 are broken down to a functional level. The interface network 204 is contained within the dotted box so indicated. A series of input lines 206 (labelled "I.sub.1 " to "I.sub.6 ") enter interface network 204 and a plurality of output lines 208 (labelled "O.sub.1 " through "O.sub.5 ") leave interface network 24. As can be seen, the input lines 206 connect to internal columns labelled "1" through "6", respectively. Likewise, the output lines 208 connect to internal rows labelled "1" through "5", respectively. Assume that each intersection of a row and column has a control logic 210 associated therewith. The problem to be solved, therefore, can be seen to be one of establishing the control logic 210 in a workable manner to accomplish the desired objectives. Assume, for example, that it is desired to effect an inter-connection between input I.sub.1 206 and output O.sub.3 208. The control logic 210 at the junction of column 1, row 1, must recognize that the desired address resides on row 3. Control, therefore, must be passed through the control logic 210 at row 1, column 1, to the control logic 220 at row 2, column 1. That control logic 210 must, of course, make the same type of evaluation. Upon the request signal arriving at control logic 210 for row 3, column 1, the address identity must be recognized. The function of control logic 210 for row 3, column 1, then becomes one of requesting forward for control of the row 3 output line 108 (i.e., output O.sub.3). Each control logic block 210 along row 3 must be asked whether it has taken control of the row 3 output line 208. If any one of them has, this must be signaled back across row 3 to the control logic block 210 at row 3, column 1 which is initating the horizontal request. This lack of availability, then, is sent vertically along the control logic blocks 210 of column 1 to the requester. In the event that no control logic block 210 along row 3 has control of the row 3 output line 208, control logic block 210 must then be so informed and, itself, cause the row 3 output line 208 to be in a "occupied" state and a successful path message sent vertically along the control logic block 210 of column 1 to the initiater on line I.sub.1 206.

As can be seen, such logic capability as must, thefefore, be incorporated in each control logic block 210 is fairly complex. That is only part of the problem. Cross-bar circuits effecting the above-described interconnecting logic are not completely unknown in the art as previously mentioned. On the other hand, known cross-bar circuits are hard wired and unflexible so as to not be easily adaptable to a large system as wherein the cross-bar circuit of the present invention must operate.

Turning now to FIG. 14, one can envision how the five rows of interface network 204 could be produced on five individual horizontal bar circuits 212 having control logic chips 214 thereon. Likewise, with reference to FIG. 15, one can invision how the interface network 204 could be divided into six vertical bar circuits 216 having control logic chips 218 thereon.

If, then, the horizontal bar circuits 212 of FIG. 14 were interposed over the vertical bar circuits 216 of FIG. 15, a composite circuit 204' would be created if each control logic 214 and 218 contained complementary portions of the totally required control logic of control logic 210 and was connected to intercommunicate one with another. Such construction would make a highly flexible cross-bar circuit in that each horizontal bar circuit 212 would be identical to every other horizontal bar circuit 212 and likewise for the vertical bar circuits 216. If each horizontal bar circuit 212 contained X positions of control logic 214 and each vertical bar circuit 216 had Y positions of control logic 218, any combination up to X by Y could be created by merely adding more or less bar circuits 212, 216. Such is the intent and achieved objective of the cross bar circuit of the present invention.

The interconnection forming control logic 210' at the intersection between each vertical bar circuit 216 and horizontal bar circuit 212 is shown diagramatically in detail in FIG. 17. Each horizontal bar circuit 212 contains a horizontal electrical bus 220. Correspondingly, each vertical bar circuit 216 contains a vertical electrical bus 222. At the intersection between each horizontal bar circuit 212 and vertical bar circuit 216 a bus interface 224 provides access to both buses 220, 222. Also provided are vertical logic 226 and horizontal logic 228. Vertical logic 226 and horizontal logic 228 can intercommunicate with one another and with the bus interface 224. Additionally, vertical logic 226 can sense signals travelling along vertical electrical bus 222 and inject signals into vertical bus 222. Likewise, horizontal logic 228 can both read signals from and inject signals into horizontal electrical bus 220. The signals read and injected by vertical and horizontal logic 226, 228 is, of course, described above with reference to FIG. 13.

To provide the flexibility and the expandability necessary for a cross-bar operating in the architecture of the present invention, two additional features not possible by cross-bars of the prior art and accomplished by the cross-bar of the present invention are required. The first is shown diagramatically in FIG. 18. To this point, the cross-bar circuit as described is one of flexibility in a single plane. As defined, the interfaced network 204 and 204' comprise six columns and five rows. While the numbers used for pusposes of description are small, one can envision how with a large number (X) of columns and large number (Y) of rows interconnected along a single plane the achieving of a path between a single input I.sub.i and a single output O.sub.o might run into problems of achievability within time constraints. This is, as the rows and columns became busy, the testing and retesting procedure along the columns consuming. In such case, more paths between the same number (X,Y) of inputs and outputs could help alleviate the problem. FIG. 18 diagramatically shows such a solution. That is, the cross-bar interfaced network 204" within the dashed box still has input lines I.sub.1 through I.sub.6 and output lines O.sub.1 through O.sub.5 so as to appear to the outside world identical to the cross-bar 204 and 204' previously described. Internally, however, one can see that a plurality of identical cross-bar boards 204' have been placed in parallel and interconnected along their inputs by input interconnecting logic 230 and along their outputs by output interconnecting logic 232.

The other case of expandability is one of requiring more input and output lines 206, 208. This problem and its solution according to the present invention is shown in simplified form in FIG. 19. Due to size limitations in the drawings, the cross-bar elements 204' are shown with three inputs and two outputs. By interconnecting in the manner shown, an enlarged planar structure 204"' having nine inputs and four outputs is constructed. As should be understood, since, for example, the cubic crossbar 204" of FIG. 18 appears to the outside world as if it were any planar cross-bar 204, such a structure could be incorporated within the enlarged planar cross-bar 204"' of FIG. 19 to create a composite cross-bar with both increased numbers of inputs and outputs along with greater through paths for achieving rapid interconnections between the inputs and outputs when requested.

A specific example of such cross-bar circuits as could be implemented according to the present invention is described hereinafter.

Before turning to specifics, however, the problem of queue storage according to the novel techniques of the present invention will be shown and discussed in simplified form.

Turning now to FIG. 20, a segment from a queue or FIFO is shown. Those skilled in the art will recognize that the principles being described are identical and merely depend on whether the final output is wrapped around to the input or not. That is, a queue as employed in the present invention and as described in detail hereinafter is circular in nature, to circulate its data past various points of inquiry until such time as they are individually removed. By contrast, a FIFO propogates its data from the input towards the output such that that which is first in is first out. A FIFO has one sampling point (the output) while the queue structure described and used in the architecture of the present invention has many sampling points.

In the conventional queue or FIFO, a plurality of stages 234 are connected to a clock line 236. Upon each clock pulse on clock line 236, data, as represented by the large arrows 238, moves from one stage 234 to the next, if the next stage 234 is (or will be) empty, or does not shift forward if the next stage 234 is (and will remain) full. In a slowly operating system with one or two accessing entities, this does not pose a problem. By contrast, in a system such as the architecture of the present invention, data must be made rapidly available to any and all users. Because of the vast numbers of tokens and activities which will be swarming in the queues, a movement of data from stage to stage at single clock pulse intervals is intolerable. What is required is the heretofor non-existent concept of interconnected stages which rapidly skip or slid forward over unused positions at combinatorial logic speeds between clock pulses so to always have data available for sampling at each sampling station rather than empty positions offering only wasted time. In this regard, the queue of FIFO structure of the present invention is like the bellows of an accordian (and, therefore, is oftimes referred to thusly) in that empty stages will virtually fold to place actual data stages in virtual proximity to one another on a time scale basis. That is, because the data will skip over unused positions between clock pulses, to the user, the data will appear to have shifted in from the previous stage in one clock pulse time where, in fact, it may have shifted in from a stage many positions removed.

With additional reference now to FIG. 21 with particularity, the mode of operation according to the present invention and an enlarged block diagram view of a single stage 234 thereof is shown. Each stage 234 of the queue or FIFO of the present invention contains a plurality of data lines 240, each carrying a unit of circulating data. In conventional structures, the data lines 40 pass into and out of latches 242. Each stage 234 according to the present invention also contains latches 242 on the data lines 240. Additionally, however, the latches 242 are contained within by-pass logic 244. Each latch 242 has a by-pass line 246 around the latch 242 controlled by by-pass logic 244. As symbolized by the arrows 248 and 250, the stages 234 communicate backward and forward, respectively. The stages 234 are interconnected by signal paths represented by the lines 252 and 254 in FIG. 21. Each stage 234 contains forward logic 256 and backward logic 258 capable of impressing signals on the line 252, 254 and reading signals therefrom in order to control the action of the stage 234. The bypass logic 244 and the two logics 256, 258 are combinatorial logic circuits whereby numerous signals can be sent and received and appropriate actions taken at the stages 234 between clock pulses on line 236 so as to effect the objectives of the present invention.

Each stage 234 can be in one of four states and will take appropriate action depending upon its state. These four states, the significance thereof, and the actions to be taken are descirbed in the Table of FIG. 22. In state number 1, there is valid data present in the latches 242. If backwards logic 258 senses that the next stage in sequence will accept data, it means that data in the latches 242 of this stage can be shifted forward to the next stage and, therefore, the backwards logic 258 of this stage can signal the stage preceding it that it will accept data. As a result, the data in the latches 242 is shifted out, the clock coming in on line 236 is enabled, and the incoming data is sent to the latches 242 of this stage.

In state number 2, valid data is once again contained in the latches 242 of this stage. By contrast, the next stage is signalling that it will not accept data under those conditions. This stage 234 can do nothing except maintain the data presently in its circuits. The clock on line 236 is inhibited for this stage, any incoming data is not accepted, and nothing is shifted out.

In state number 3, this stage 234 has no valid data. The next stage, however, is signalling that it will accept data. Under such conditions, there is no need for the latches 242 of this stage to become involved. Consequently, the by-pass logic 244 connects the by-pass line 246 between the input and output on each line such that the incoming data is propagated directly to the output, thus bypassing the latches. This, of course, is the novel fast propagation forward feature of the present invention.

In state number 4, this stage 234 contains no data and the next stage is signalling that it will not accept data. This stage, therefore, can accommodate any available data and so signals. Nothing is shifted out but the clock is enabled and any incoming data is sent to the latches 242 of this stage.

Thus, not only has the computer architecture according to the present invention been described, but additionally, the functional design for novel and necessary cross-bar and queue structures as would be employed in such a computer architecture have been described. The construction of the cross-bar circuit and the queue circuits will now be described with reference to particular embodiments thereof so as to completely describe them sufficiently for those skilled in the art to build them as necessary. It is to be understood that the following design is only one possible approach to turning the principles of the present invention into a real machine. It is probably the most literal interpretation in the sense that one can identify a swarm of tokens and a swarm of activities, and can also see that the physical location of both is not only insignificant but, in fact, is continuously changing. Tokens are absorbed by activities when the two "happen" to meet. There is no concerted effort to bring tokens directly to the corresponding waiting activities (or vice versa). Instead, activities scan the swarm of tokens looking for those tokens which apply to them. When the activities find those, they remove those tokens from the swarm. The circuits insure that, eventually, activities will find their tokens. The probability of an activity never finding its tokens is virtually zero.

Other additional architecture could be provided to take more deliberate means to bring tokens together with their activities. Recall in this concept, it is possible for either the tokens or the corresponding activity to appear first in time. Associative circuits, then, must manage either contingency. Those skilled in the art should recognize an associative approach in the following design, although the associative function is not concentrated into a well-defined "associative functional block", "associative memory", "associative switching network", or whatever. Instead, the associative function is distributed throughout the entire machine.

The basic logical machine of FIG. 9 consists of four elements. These are (a) the token queue 76, (b) the activity queue 82, (c) the processors 70, and (d) the function store 90.

The token queue 76 is the novel shift register (FIFO) which moves the tokens 72 past the processors 70. The tokens 72 are in constant motion. Gating is provided to permit the processors 70 to "absorb" tokens 72 from the token queue 76 as they pass by. Tokens 72 that are not absorbed while passing over the processors 70 simply return over the closed-loop path and pass over the processors 70 again. In a properly designed machine where deadlocks are avoided, each token 72 will eventually be absorbed by the activity 78 that is expecting it. Note, this also assumes the machine is properly programmed as well.

The activity queue 82 is of the same novel construction and carries descriptors of activities 78. As is clear from the conceptual description above, the activity descriptors are a set of rules which must be "executed" to achieve execution ("firing") of an activity 78. One can think of an activity descriptor as a "program" which is to be executed by a processor 70. Therefore, the activity descriptors are brought into processors 70, which are capable of decoding the implied instructions in the descriptors and carrying them out. As activities 78 give birth to new activities 78, these new activities 78 are ejected onto the activity queue 82 where they follow the closed path, find their way into a processor 78, and ultimately fire. When an activity 78 has fired, its descriptor is simply erased, meaning that the processor 70, which was executing it will eventually simply declare itself free, i.e. the activity 78 which was executing in it has died. This free processor 70 is now available to accept a new activity descriptor from the activity queue 82 and being its "execution". Remember of course, that the activity 78 may not be able to proceed until it has absorbed necessary tokens 72 from the token queue 76.

The function store 90 is a memory which can be read-accessed by all of the processors 70. At some initialization time, the user's replicators ("function library") are moved into the function store 90. Once this is done, together with causing some initial activity to appear on the activity queue 82, the machine can "bootstrap" itself and begin executing the program. Note, this program could actually be an "operating system" and not merely a user program.

The processors 70 themselves, then, are independent computers which interface to the overall system as shown. It should be recognized that a conventional computer, whether it be microprocessor or an enormous mainframe containing dozens of central processors, can be considered a single "processor" 70 in this machine. Exactly what the processor 70 looks like is not directly relevant to the present invention, which is more concerned with the interconnection of functionally-specified processors and the overall scheme used to co-ordinate them to work on a single specific user program. The processors 70 must be capable only of carrying out activities 78. Exactly how they do this is actually a "soft" specification, as almost any conventional computer can be programmed to perform that function, qualifying it for a position in this invention. More likely, however, a new design will emerge for these processors 70 for each new implementation of the present invention, disguising this "soft" specification as a "hardware" one. This is rhetorical, as the equivalence of one conventional computer to another is well understood in the art. The present invention encompasses any device which is capable of causing the effective functional " firing" of activities 78. It is sufficient here to note that a conventional computer can do this job.

Those skilled in the art will appreciate that such a logical structure assumes that there is sufficient capacity in each position of the token queue 76 to accomodate an entire token, both the label and the data (which could be very large structures such as matrices and higher-order arrays). This is not yet completely practical, so some "extension" of the token queue 76 will most likely be needed for implementation in the near future. It should be noted, though, that the machine will perform most effectively when any "extension" can be bypassed. That is, when, in fact, an entire logical token 72, can fit into a position in the token queue 76. Depending on the implementation, this is not an always/never proposition. Some tokens 72 will fit, and some tokens 72 won't. Individual tokens 72, then, may be provided with the ability to call on the "extension". Those that don't need it then, would not have to deal with it.

A similar argument applies to the activity descriptors. Of particular concern is the specification of "constants" in activity descriptors. Constants can be very large structures as well.

Finally, the replicator store 90, in a real machine, will have to deal with contention problems of one form or another from the continuous reproductive activity of the activities 78 in the processors 70.

Deadlocks can appear at several points. The two most obvious are running out of capacity in the token queue 76, and running out of capacity in the activity queue 82. For example, an activity 78 (in a processor 70) wanting to give birth to a new activity 78 may find that the "activity queue out" cannot accept the new "son", so the activity 78 will have to wait until the room appears. Until then, it occupies a processor 70. If enough processors 70 are tied up this way, the full activity queue 82 will have no processor 70 to shift into, insuring that the activity queue 82 will have no capacity and producing deadlock. The same thing can happen in the token queue 76.

Assuming that the queues 76, 82 can be built with enough capacity to accomodate new tokens 72 and activity descriptors, deadlocks can still occur. Processors may be filled with a subset of the activities which are waiting on tokens 72, and those tokens 72 will be produced only if some of the activities 78 which are not in the processors 70 are allowed to execute. Those occupying the processors 70 stay there waiting for tokens 72 which, simply because they are sitting there waiting, can never be generated by the activities 79 which are not yet in the processors 70. This can be monitored, with some difficulty, by noting when all the processors 70 have become full. Each processor 70 can be programmed to give a logic signal indication of a "wait for token" condition. If the logical AND of all of these signals from all the processors 70 ever emerges as TRUE, then all the processors 70 are waiting. At that moment, the token queue 76 can be monitored. When it is clear that the token queue 76 has rotated around enough times without a token absorption having taken place (again, the processors can provide the signals necessary to determine this) then a deadlock has occurred. Then the processors 70 have to be freed.

Toward eliminating some of the problems mentioned above, the improved machine of FIG. 10 consists of (a) a token queue 76 with a buffer 92, (b) an activity queue 82 with a buffer 94 and addressing circuits 96, (c) processors 70 as before, and (d) a memory subsystem, generally indicated as 98, which not only provides the function needed to implement the replicator store, but also the effective extensions of the token and activity descriptors.

The token queue 86 is improved by placing in it an expandable buffer 92, the above-described FIFO element which functionally swells and shrinks. The activity queue 82 also contains one of these accordion buffers 94.

The memory subsystem 98 consists of (a) many memory nodes 100 which will be defined below, (b) crossbar communications circuits 102 (to be discussed in detail hereinafter) which permit any processor 70 to communicate to any memory node 100, vice-versa, and permit any memory node to communicate with any other memory node, and (c) (not shown in the figure), a free-memory rotating queue which is connected to all memory nodes 100 and all processors 70, into which a memory node 190 places its own address when it considers itself "free" and from which either a processor 70 or a memory node 100 may remove such names in order to claim free memory for its own use.

Memory nodes 100 are processors themselves and, therefore, any conventional computer can serve this purpose. Unlike the activity processors 70, however, memory nodes 100 perform different functions. The only real difference between memory nodes 100 and activity processors 70 is that functional one. Activity processors 70 are expected to execute activities 78, while memory nodes 100 are expected to manage auxiliary memory functions required by actions initiated by the activity processors 70. Implementors will probably find that activity processors 70 and memory node computers 100 will be different from each other, but that (with a few exceptions) all activity processors 70 are alike, and all memory nodes 100 are alike. In this way, the benefits of mass-production can be gained in producing the thousands of processors of each type which are envisioned as being integrated as components of the present invention.

All processors 70 each have a unique "processor address" and all memory nodes 100 each have a unique "memory node address". These addresses are required in order for the processors 70 and memory node computers 100 to direct messages to each other.

The function library 90 can now be contained in one or more distinguishable memory nodes 100. Although each memory node 100 is able to write as well as read into its internal memory, the function library 90 is never to be altered once it is placed into the machine during execution of its functions. However, the writability of that memory permits the initial function library load.

Memory nodes 100 can be viewed as being intelligent storage areas, each providing some minimum amount of memory capacity which is available for allocation and use by the other processors 70 in the system. If one thinks in terms of the machine's logical quanta ("words," which are logical data units, not physical, so that they need not be fixed-length physically) there is an apparent capacity of NODE LENGTH words. Memory nodes 100 will require enough physical memory to provide this minimum capacity as seen by outside units, and additional storage as well, in order to contain the control program and scratch storage for internal operations.

Input-output is also achieved through the memory subsystem 98 by attaching peripheral devices to memory nodes 100 and modifying the control programs of those nodes. Activities 78 can be informed, via the supplied program, of the address of these memory nodes 100, so that the addresses of those nodes become I/O addresses. Input and output generic activities (which in this machine require one and two input tokens to "fire" respectively) are supplied this "format" information (containing the I/O address) on one of its input tokens. The second input token of the OUTPUT generic activity is the data to be output, which is passed to the memory node which manages all formating as per predefined rules OUTPUT then, produces an "acknowledge" output token. INPUT does not permit the input data token to enter the system until it absorbs a "format" token which describes where the input is to come from. At that point, the activity 78 gathers a token from that memory (I/O node) whose address was specified in the format token, and outputs the data token into the token queue 76.

Some additional benefits can be gained for the configuration of FIG. 11. The problem of contention for the function library 90 can be taken care of by brute-force. As it is a read-only structure, it can be copied by all of the processors 70 out of the memory subsystem 98 and into their own internal storage, so that processors 70 about to require information from the function library 90 do not have to invoke the memory subsystem 98 to obtain it. The processors 70 can be instructed to make this copy during a reset sequence initiated after the function library 90 has been moved into the memory subsystem 98. Logically, as all copies are identical, it is still the same function store (i.e. a "virtual" function store).

Any processor 70 has to be prepared to execute any activity descriptor with any possible combination of variables. Therefore, every processor 70 will have pretty much the same internal control program which contains the detailed rules and definitions for each generic activity 78 defined in the system, as well as the rules for how to declare itself "idle", what to do when it is idle, what to do if system deadlock is declared, and so forth. This control program can be burned-in to ROM, for example, making the soft definitions much less easily altered. Alternatively, the same previous described technique can be used, during an initialization sequence, of loading the control program into the memory subsystem 98 and then instructing all activity processors 70, which are executing some minimal "bootstrap", to make a copy of it. In this way the very "instruction set" of the machine can be quickly modified, providing for rapid definition of new generic activities or other functions. This is a soft approach to "microcoding" in the sense that these control programs support the user's applications code described in terms of tokens 72, activities 78, and function libraries 90.

Most important, though, is that tokens 72 which do not "fit" into the token queue 76 can have part of themselves reside in the memory subsystem 98. The same is true for activities 78 and constants. Once this is done, it leads to tremendous advantages in the very important operations of duplicating, splitting, and concatenating structures simply by managing "use counts" in the memory subsystem 98 and replicating pointers into the memory. Memory nodes 100, within certain restrictions, should be permitted to re-organize the structures which they may contain so as to keep them as balanced as possible. This represents no overhead to activity executions unless the memory node 100 is unable to service requests while it is performing such an internal reorganization, and very often it will be able to service external requests during such operations as the requests might not always require access into the structure itself. The benefits of keeping structures as "balanced" as possible is clear in the context of "splits", which try to divide sequences roughly in the middle. If this is not achieved, logarithmic algorithms could deteriorate into linear ones (in terms of their execution time) or even worse. This is one reason for memory nodes 100 each having their own data processing capability, to manage such reorganizations automatically.

The processors 70 of the machine are arranged, logically, as a matrix of X rows with Y processors 70 in each row. The token queue 76 moves over the processors 70 parallel to this Y-axis, that is, from row to row. Processors 70 are assumed to have NIP physical input ports each, and NOP physical output ports. The number of input ports in one row is NIP.multidot.Y=NI, and the number of output ports per row is NOP.multidot.Y=NO. The token queue 76 actually moves L token positions over each row in parallel. The ratio (NO/L) is assumed integral, L is assumed even and X is assumed even. (NO/L)=K.

A physical token position is TW parallel bits. These break down into one "PA" bit, NCL "CL" bits, and NRD "RD" bits. "PA" indicates presence/absence, "CL" corresponds to the (present) token's call address, and "RD" accounts for the "remaining data".

A physical activity queue position is AW parallel bits. These break down into one "PA" bit, ADW "address" bits, and the remaining bits. AP=(AW-1), so these remaining bits are AP-ADW in number.

The activity queue 82 is, physically, J activity positions "wide" so that J activity positions move toward the processors 70 on each queue shift. J is assumed even. The activity queue 82 moves in the direction of the x-axis, that is, across the columns of the processor matrix. It is logically at right angles to the token queue 76.

The memory subsystem 98 consists of mu (.mu.) memory nodes 100, each of which provides an apparent memory resource of NODE LENGTH words to the rest of the system. Therefore, the address of a memory node 100 (MNODE=X) can be described in roof (log.sub.2 .mu.) bits. Processors 70 also have addresses (PROC=x) and can be described in roof (log (X.multidot.)) bits.

Using a massive crossbar circuit 100 of novel design (to be discussed shortly), the path between any two intercommunicating points is MTW bits in parallel, and sequential transfers at this parallel width are assumed. MTW does not correspond to any logical quantum (necessarily), such as a "word". Additional address lines and crossbar control lines are also included.

The memory subsystem 98 also includes the "free memory queue" (not shown) which has .mu.+XY positions.

Processors 70 in the X-Y matrix are assumed to have one activity input port and one activity output port each. They also interface to the memory subsystem 98 and the free memory queue, and to the token queue 76 as indicated above.

All autonomous (physically) processors 70 may have their own internal clock, provided their operation is synchronous with the external circuits to the degree that adverse behavior is avoided. Beyond this, there are three system clocks, which drive the (synchronous) circuits of the token queue 76, activity queue 82, and memory subsystem arbitration networks respectively. These clocks may also be independent. The token queue clock is assumed to run as fast as possible. The activity queue clock should be also as fast as possible. The memory clock should be as fast as reasonable in view of achievable transfer rates into the activity processors 70 and the memory node processors 100.

Finally, an overseeing "monitor" processor is necessary as a sort of machine "front panel" which senses adverse condition signals generated by the subsystems, takes care of generating resets, alarming the machine operator, and otherwise managing the "watchdog" function over the machine proper.

The drawings of the figures which are described herein are logical schematics only. They do not delve into the details of gate delays and fanouts, as any designer skilled in the art knows how to modify the designs to handle these properly. The design is also committed to no particular technology. This is positive logic, 1=true, 0=false.

MEMORY AND INPUT-OUTPUT SUBSYSTEM DETAILED CIRCUITS

M-1: P-bit Rotating One-bit

Generally indicated as 300 in FIG. 23.

Given an integer "p" (p>0) there is some integer n such that 2.sup.n .gtoreq.p>2.sup.(n-1). Then the circuit shows an n-bit binary up-counter 302 with these properties: (a) the output of the counter changes after an appropriate delay from the rising edge of the positive-duty clock, as shown; (b) the counter cycles through the values 0, 1, . . . , (p-1), then follows (p-1) with 0, 1, . . . ; (c) if ENB=0 on the rising edge of the clock pulse, the output of the counter will not change after the pulse has passed. Construction of such a counter is straightforward and should be obvious to those skilled in the art (consider for example, TTL 161 circuits).

The n-bits of the counter are input to an n-in, 2.sup.n -out decoder 304 which always has one output line true (1) and all other lines false. The line which is true corresponds to the binary value represented on the n lines from the counter; therefore the decoder will present T.sub.1 =1 (and all other T.sub.i =0), then (on the next clock pulse which changes the counter) T.sub.2 =1 (and all other T.sub.i =0) and so on. T.sub.1 corresponds to counter output=0, T.sub.2 corresponds to counter output=1, . . . , and T.sub.p =1 corresponds to counter output=(p-1).

The result is a circuit 300 which (assume ENB=1) has a one-bit rotating around the p outputs, with all the other outputs zero.

M-2: Break-in Switch

Generally indicated as 306 in FIG. 24.

This circuit, if H=1, passes input I to STAT, and a logic 1 to output "O". If H=0, input I is copied to output "O" and STAT is a logic zero.

M-3: Arbitration Signal Generator

Generally indicated as 308 in FIG. 25.

This circuit is presented without further comment than to note that only one of the P M-2 circuits 306 will have its H-input=1, so that the AC output is simply equal to the inverse of the STAT of that M-2 306 having H=1. Note also that the "I" and "O" lines are grouped together such that the "O" output of any given M-2 306 is associated with the "I" input of the next M-2 306, except for the output of the p.sup.th M-2 306, which is associated with the input of the first M-2 306. It should be readily recognized that the designations of 1, 2, . . . , p are somewhat irrelevant, as the circuit 308 defines a ring where no one M-2 306 is more important than another.

M-4: Innermost Crossbar Multiplexer

Generally indicated as 310 in FIG. 27.

Each box 312 is a 2:1 MUX. Output lines LTO, VDO, EO ("s" parallel lines) and MDO ("mtw" parallel lines) are copies of corresponding LT, VD, E, and MD if LOC=1, else (LOC=0) are copies of corresponding LTT, VDT, ET, and MDT. "s" may equal zero. That is, "E" lines are not always needed.

M-5: Innermost Path Latch

Generally indicated as 314 in FIG. 28.

This circuit is complex enough to be best described simply by Table 1 which is a truth table of five independent variables. However, the circuit shows a J-K flipflop 316. As always herein, this is meant only to indicate any appropriate memory device having the properties in this circuit: (a) inputs J and K are sampled on the rising edge of the clock pulse, but output Q does not change until after an appropriate delay; (b) if J=1 and K=O on the rising edge, then Q will be 1 after the pulse has passed; (c) if J=0 and K=1 on the rising edge, then Q will be zero after the pulse has passed. (d) if J and K are both O, then Q will remain as it was before the clock pulse arrived. Note that, in this circuit, it is not possible for both J and K to equal 1 at the same time.

A systemwide reset pulse (not shown) causes these J-K flipflops in all M-5 circuits 314 in the system to output a zero (Q=0).

There are actually six independent variables (one of them being input signal ACKI) but Table 1 is in terms of five, with ACKI appearing in the table itself and meant to indicate 1 if ACKI=1, or 0 if ACKI=0, whenever ACKI appears in the truth table as a dependent variable output.

    __________________________________________________________________________
                  LCHED prior
                          (prior to clock)
                                          LCHED after
    WANT ARBI
             HOLDI
                  to clock pulse
                          LT IGOT
                                 WANTF
                                      ATW clock pulse
    __________________________________________________________________________
    1    1   1    1       1  1   1    ACKI
                                          0
    1    1   1    1       0  1   1    ACKI
                                          ACKI
    1    1   1    0       1  0   0    0   0
    1    1   1    0       0  0   0    0   0
    1    1   0    1       1  1   1    ACKI
                                          0
    1    1   0    1       0  1   1    ACKI
                                          ACKI
    1    1   0    0       1  1   1    ACKI
                                          0
    1    1   0    0       0  1   1    ACKI
                                          ACKI
    1    0   1    1       1  1   1    ACKI
                                          0
    1    0   1    1       0  1   1    ACKI
                                          ACKI
    1    0   1    0       1  0   0    0   0
    1    0   1    0       0  0   0    0   0
    1    0   0    1       1  1   1    ACKI
                                          0
    1    0   0    1       0  1   1    ACKI
                                          ACKI
    1    0   0    0       1  0   0    0   0
    1    0   0    0       0  0   0    0   0
    0    1   1    1       1  1   0    0   0
    0    1   1    1       0  1   0    0   1
    0    1   1    0       1  0   0    0   0
    0    1   1    0       0  0   0    0   0
    0    1   0    1       1  1   0    0   0
    0    1   0    1       0  1   0    0   1
    0    1   0    0       1  0   0    0   0
    0    1   0    0       0  0   0    0   0
    0    0   1    1       1  1   0    0   0
    0    0   1    1       0  1   0    0   1
    0    0   1    0       1  0   0    0   0
    0    0   1    0       0  0   0    0   0
    0    0   0    1       1  1   0    0   0
    0    0   0    1       0  1   0    0   1
    0    0   0    0       1  0   0    0   0
    0    0   0    0       0  0   0    0   0
    __________________________________________________________________________


It would be difficult to discuss each of the 32 cases in depth at this point, so no such attempt at an explanation will be made. But, in capsule form, what is going on here is:

Bus (LTT.fwdarw.LTO, VDT.fwdarw.VTO, ET.fwdarw.EO, MDT.fwdarw.MDO) is a "horizontal" bus onto which several of these M-5 circuits 314 will be hung. All of these M-5 circuits 314 serve to try to gain control of this bus. If one gains control of the bus, it will route its own incoming data to achieve (LT.fwdarw.LTO, VD.fwdarw.VDO, E.fwdarw.EO, MD.fwdarw.MDO) and prevent any other M-5 314 from claiming this same bus. Therefore only one M-5 314 may claim the bus at a time (it is possible for the bus to be unclaimed, also). An M-5 314 tries to claim the bus by asserting WANT=1. At that moment, an involved arbitration series begins.

First, it is possible that this M-5 314 has already claimed this bus. This is indicated by the state of the flipflop 316. If LCHED=1, this M-5 314 already owns the bus. But, if LCHED=0, this M-5 314 must compete with the other M-5s 314 to try to stake a claim to it. This requires that (a) the bus be currently unclaimed in the sense that no other M-5's 314 flipflop 316 is set (indicated by HOLDI=0), and this M-5 314 must beat the competition on the arbitration line (ARBI.fwdarw.ARBO). This latter means that ARBI must be true. This M-5 314 will then output ARBO=0 to cause all subsequent M-5s 314 to see ARBI=0. One way or the other, that is, whether it already owns it (LCHED=1) or whether it wins a competition, it will cause IGOT=1, which controls the gating of the M-4 310 multiplexer.

If IGOT=1 and WANT is still true, the result is a WANTF=1 (want forward). That is, WANTF will not occur unless the current WANT was successful. WANTF (want forward), as will be seen will cause competitions in other M-5s 314 in the overall crossbar network; and if all of these forward probes succeed, signal ACKI will be true. ACKI means, in effect, that the entire path from start to finish was claimed. Having established a complete path (ACKI=1) permits path-latching to occur.

Path-latching means that, once an entire path has been claimed, all flipflops 316 in all M-5s 314 in that path will be thrown to LCHED=1. Therefore, once the path is completed, it is claimed until such time as the flipflops 316 are reset (back to LCHED=0). If ACKI=1, latching of the path will occur automatically on the next clock pulse unless line LT=1 (LT="last transfer").

If WANT=1 and ACKI=1, then signal ATW ("all the way") will become true. However, if a want forward (WANTF) is posed but ACKI=0 (forward claims failed) line ACDIS will go true. ACDIS, as will be seen, prevents the ARBI ARBO chain from changing its logical beginning and end (one should review circuit M-3, 308 in FIG. 25, and understand its purpose now) by preventing the counter 302 of an associated M-3 308 from advancing. Although this does not guarantee that this circuit will win the arbitration during the next clock period, it at least increases the probability of this.

For proper understanding, one must study, at some length, the overall behavior of M-5s 314 in the context of a complete M-8 circuit to be described shortly.

M-6: Want Decoder

Generally indicated as 318 in FIG. 29.

The source of lines (LT, VD, E, MD) in the M-5 circuit 314 is from one of the "legs" of this (M-6) circuit. They are simply direct fanouts of the (LT, VD, LE, MD) signals of M-6 318, respectively. A new set of R lines HE is introduced here which operates on an r: 2.sup.4 decoder 320 having the following properties. If its ENABLE is zero, all output lines of the decoder 320 will be zero. If ENABLE=1, then one and only one of the outputs will be true (the rest being false). The binary value represented in r determines which output (0, 1, . . . , 2.sup.r -1) will be true. These control outputs W.sub.1, W.sub.2, . . . , W.sub.rr as shown (rr=2.sup.r). There are also rr ATW.sub.i lines all ORed together to create a single unified signal CATW.

It is important to note here the grouping of the HE and LE lines in M-6 318. Think of HE meaning "high-order E" and LE meaning "low-order E." They are treated as a single bus M-6 318 "E" which is r+s bits wide. This is not the same as an M-5 314 "E" necessarily.

M-7: P-wide Compete for Row

Generally indicated as 322 in FIG. 30.

As stated earlier, many M-5's 314 will compete for this single "horizontal" bus. In general, there will be p M-5's 314. A single M-3 308 generates the ARBI-ARBO chain for the M-5s 314. Note that the M-3 308 will rotate the chain head/tail around as long as all ACDIS outputs of the M-5s 314 are all zero; but, if any one posts ACDIS=1 (arbitration counter disable) then the M-3 308 will freeze. The HOLDI of all the M-5s 314 is the OR of all the LCHED outputs of those same M-5s 314. The WANTF outputs of all M-5s 314 are ORed to form a single unified want-forward (recall that only one M-5 314 will be able to generate a WANTF at a time, as, at most, one M-5 314 will have control of this horizontal bus). The ACKI inputs to all M-5s 314 is also from a single source, as (for the same reason) this ACKI will have meaning only to one of the M-5s 314. The "right hand" end of the horizontal bus is tied to logic false. Everything is tied to the same clock.

M-8: P-compete, 2.sup.r s Path-Latching Broadcast Crossbar

Generally indicated as 324 in FIG. 31.

This circuit is vitally important in the overall design.

Consider some entity which would be able to generate signals into the inputs of one of M-6s 318 (say M-6 #"p", having inputs E.sub.p which is r+s bits wide, LT, VD, and MD all grouped together as DI.sub.p, with return line THRU.sub.p and overall "boss" line WI.sub.p.

DI represents data to be gated through the crossbar. It consists of line LT ("last transfer"), VD ("valid data") and the data lines themselves, MD ("memory data") which is mtw-bits wide ("memory transfer width"). Note that mtw has nothing to do (necessarily) with the width of a "word" or any such logical grouping of bits. In general, the bigger mtw is, the more information can go through the crossbar at any one time.

E.sub.p represents the address of the destination of the data, that is, where the data is supposed to go. It is represented as a binary integer which can be contained in r+s bits. With the data and address on the DI.sub.p and E.sub.p lines, that entity can then assert WI.sub.p =1. As the path-finding circuits are purely combinatorial, this entity will be informed, within the same clock period as when it asserted WI, whether it has achieved the path or not. It is told this via THRU.sub.p, with a successful path indicated by THRU.sub.p =1. This is a very important aspect of the present invention.

Recall that, when the clock pulse arrives, this entity apparently will gain permanent possession of this path (assuming LT=0). Suppose that the entity does so, and permits the clock pulse to arrive having set WI=1, LT=0, and observing THRU=1. Then, after the clock pulse has passed, the path will have been locked-in by the J-K flipflops 316 in the M-7s 322. At this point, the entity could send a sequence of transfers to the destination (one transfer for each clock pulse) without having to assert WI anymore. The first such transfer could have accompanied the WI=1 assertion, as that data would be gated through to the destination even though the path was not locked by the flipflops 316 (it was at least locked by the arbitration mechanisms). The entity would indicate VD (valid data)=1 had it meant the receiver to pay attention to the data lines at that time. In general, if VD=1, the input data on MD lines is to be taken as valid.

The reason for permitting VD=0 is seen in the following scenario. Assume this entity has just locked up one path. It then decides that it wants to claim a second path without releasing the first one. If one studies the drawings carefully, it will be found that if this entity succeeds, it will be able to send the same data to both destinations at once, achieving a selective "broadcast" capability which could be very important in the performance of the computer system, assuming the memory nodes operate under a suitable algorithm. VD is 0 while the entity is staking the first claim, as it does not yet want to send any data until it has established all claims that it can. Then, it will begin sending data.

To release all of the paths it has locked up, it need only send one final data transfer having line LT=1. If one again studies the circuits in detail, it will be seen that all locked paths, no matter how numerous, will all be released as soon as the falling edge of the clock pulse arrives.

Obviously, this "crossbar" means that the entity may or may not succeed in staking a claim to the path it needs on the first try. If it sees THRU=0 when a clock pulse passes by, then it may be persistent and simply "hang" onto the input lines as long as it wants to. It is important to note that it is guaranteed that eventually the entity will "get through" provided that all such entities in the system follow some basic rules:

(1) no entity may wait indefinitely for a second, third, etc. path after having claimed a first one. That is, it may "hang" indefinitely for the first path but cannot do so for subsequent paths; and,

(2) once all paths are established that can be established, the entity will make use of the paths and release them as soon as the transfers are complete. In other words, once a path has been established, it is required that (eventually) the path will be released.

As shall be seen, it may be reasonable for an entity to "hang" onto the inputs for several clock cycles in an attempt to establish a claim, but the longer it does so, the worse overall contention problems in the crossbar become (assuming all such entities behave the same way).

The crossbar switch strips r of the r+s bits from the incoming E.sub.p lines to determine which of the 2.sup.r "horizontal busses" to contend for. What emerges on the output of the horizontal bus are the remaining s bits which were not yet used in the addressing effort. However, each of the horizontal busses corresponds to a single value for the r lines. In FIG. 31, the top M-7 322 corresponds to HE=0; the second (not shown) would be HE=1; and so on, until the bottom M-7 322 which corresponds to HE=2.sup.r -1. All entities whose HE lines are the same bit configuration will compete for the same bus.

A shorthand (informal) designation for the M-8 324 circuit sometimes used herein is shown in FIG. 56 and has

IN: (number of input ports, width of address supplied by input ports),

OUT: (number of output paths, width of address pass through on output paths).

For example, as shown, the informal designator may have epsilon competing input ports (room for epsilon entities) and each such entity will supply an address ("E") which is alpha bits wide. The M-8 324 will use the first beta of these alpha bits resulting in 2.sup.beta output paths. The address lines on these output paths will be the remaining (alpha-beta) bits.

The bit fields of the address are shown in FIG. 57. Field A, which is alpha bits wide, is broken up into fields B and C by the action of the M-8 circuit. Field B is beta bits wide; and it is field B which is effectively decoded and used to direct the data. Field B, being beta bits wide, can take on the values B=0, B=1, . . . , B=(2.sup.beta -2), and B=(2.sup.beta -1). There is a unique output from the M-8 324 circuit for every such value of B. Note that not all such output paths have to lead anywhere necessarily, as paths with nothing important "out there" can simply be considered an "undefined destination address."

Synthesis of Larger Crossbars

Shown with reference to FIG. 58.

This important recursive design step permits the construction of crossbars of conceptually unlimited size, with constraints being those of implementation details, and also of the increasingly hostile probabilities for achieving a complete path from input to output. These probabilities are actually a consequence of the increasing number of input entities, with the exact design of the crossbar having a lesser effect than one might at first assume.

The formula given here is for the sake of simplicity, as it is obvious that it is not necessary that all M-8s 324 used in the synthesis of the larger M-8 324 be identical, as long as there are no errors in terms of misdirected addresses. However, the process is most easily understood that way.

Begin with an IN: (epsilon, alpha), OUT: (2.sup.beta, alpha-beta) crossbar 324'. The total number of inputs such a circuit provides for is epsilon.

Now, place omega of these same circuits 324' side-by-side, as shown on the bottom of FIG. 58. Each of these epsilon-in circuits 324' has 2.sup.beta outputs, each output corresponding to one possible configuration of address bit-field "C" as shown in the drawing of FIG. 60.

Next, place 2.sup.beta M-8s 324" which are IN: (omega, alpha-beta), OUT: (2.sup.theta alpha-(beta+theta)). The first of these 2.sup.beta will correspond to field C=0, the next for field C=1, and so on until the last is for field C=2.sup.beta -1.

Now connect the lower row of M-8s 324' to the upper row of M-8s 324" as shown. The lower row's first M-8 324' has its C=0 output directed to input one of the C=0 M-8 324" in the upper row. The lower row's second M-8 324' has its C=0 output directed to input one of the C=0 M-8 324" in the upper row, etc. The lower row's omegath M-8 324' has its C=0 output directed to input omega of the C=0 M-8 324" in the upper row. Now, the lower row's first M-8 324' has its C=1 output directed to the first input of the C=1 M-8 324" in the upper row. The lower row's second M-8 324' has its C=1 output directed to the second input of the C=1 M-8 324" in the upper row and so forth.

The upper M-8s 324" strip off an additional theta bits to achieve address decoding, so the emerging address lines from the top of these M-8s 324" is (beta+theta) bits narrower than the address lines coming into the lower M-8s 324'.

The result, though, is something that looks exactly like another M-8 324, although its internal complexity is greater. This is not visible to outside entities, however, so is functionally transparent. The result is an IN: (epsilon.times.omega, alpha), OUT: (2.sup.beta+theta, alpha-(beta+theta)) M-8 324"'. The shorthand symbol therefor is shown in FIG. 59.

Fully decoding Path-Latching Broadcast Crossbar

Shown with reference to FIGS. 61 and 62. The drawings of FIGS. 61 and 62 deal with this special case of an M-8 circuit 324, which results when there are alpha address bits on the inputs, and 2.sup.alpha outputs. In that case, there are no more address bits available for passing through as outputs EO, so these outputs do not appear. The informal designation of FIG. 62 indicates this situation by showing outcoming address paths of width zero; and in general the informal designation for such a circuit would be IN: (epsilon, alpha), OUT: (2.sup.alpha, zero).

Memory System Crossbar Facilities

See FIG. 32.

This is the overall picture of the memory subsystem minus a few features and details which will be described later.

The top row (squares 326) represent all the processors in the machine. Assuming this is a matrix of X by Y, the number of processors 316 here is X.multidot.Y processors 326. In the discussions that follow, if log.sub.2 (XY) is not integral, then round it up to the next largest integer.

Then, there are mu memory "nodes" 328. A memory node 328 is, in lay terms, a computer unto itself. It is intelligent, can be programmed by the machine designers, and manages a large block of "memory" (storage locations) which provide the basis of the overall machine's memory function. Except for possibly a few, all memory nodes 328 are alike.

Again, in the discussion which follows, if the value log.sub.2 (mu) is not integral, round it up to the next largest integer.

Then: there are three crossbars (324) in the machine.

(1) All processors 326 are inputs to an IN: (XY, log.sub.2 mu), OUT: (mu, zero) crossbar 330, whose outputs are directed to every memory node 328 in the machine. In this way, processors can "speak" to memory.

(2) All memory nodes 328 are inputs to an IN: (mu, log.sub.2 (XY)), OUT: (XY, zero) crossbar 332, whose outputs are directed to every processor 326 in the system. In this way, memory can "speak" to the processors 326.

(3) All memory nodes 328 are inputs to an IN: (mu, log.sub.2 mu), OUT: (mu,zero) crossbar 334, whose outputs are directed to every memory node 328 in the machine. In this way, memory nodes 328 can "speak" to each other.

This is the simplest approach. The design could be modified to place several crossbars 324 in parallel where just one appears right now, and could be made use of like this. Assume that instead of just one crossbar 324, there are, say, gamma such crossbars 324 with all paths leading from the same places to the same places. This means that any one "entity" rather than seeing just one input port to the crossbar 324, would see an effective gamma input ports. So this entity could assert WI and place other information on the other lines on all gamma input ports at once. The result will be gamma THRu signals to deal with. The OR of these would tell the entity that it succeeded in getting through on at least one of the crossbars 324. It only needs one of these, and circuits would "choose" one of the successful paths so as not to tie up the others needlessly. This will be described in later drawings.

It should now be clear that all physical processors 326 have a unique address. Wsith X.multidot.Y processors 326, these addresses could be assigned as PROC=0; PROC=1; . . . ; PROC=(X.multidot.Y-1).

Just as clear, all memory nodes 328 have a unique address. With mu memory nodes 328, these could be assigned as MNODE=0; MNODE=1; . . . ; MNODE=(mu-1).

All such addresses may be "hard-wired" into the processors 326 and memory nodes 328. It is certainly possible to assign addresses to processors 326 and memory cells 328 arbitrarily. All that is important is that once assigned, fthe crossbars 324 are wired to them correctly. This means that, for example in the case of processors 326, the processor address need not be a simple function of its matrix coordinates (row, column). All that is necessary is for all processors to have unique addresses.

These processor and memory cell addresses have nothing to do with the activity address field of activities. Activities' addresses are fundamental to the concept of the machine and are supplied to activities by circuits in the activity queue. Processor and memory cell activities are part of this particular example of the concept.

Clever assignments of the processor and memory node addresses, when taken into consideration along with the overall architecture and dynamics of the machine, could probably make a considerable difference in the machine's performance.

M-9, M-10 Free-Memory Queue Master Bit Internal Signals

Generally indicated as 336 and 338 in FIGS. 33 and 34, respectively.

These combinatorial circuits will be understood best in the context of circuit M-11.

M-11: Free-Memory Queue Master Bit

Generally indicated as 340 in FIG. 35.

The function of this circuit will again be best described by the truth table of Table 2, but the overall intent and meaning of this circuit is described here. Again, the "D"-flipflop 342 is meant to indicate any appropriate memory device, whose D-input is latched on the rising edge of the clock pulse, but whose output does not change until after an appropriate delay. A system-wide reset pulse (not shown) would clear all of these flipflops 342 (in all M-11 circuits 340 in the system) to Q=0.

This bit manages a present/absent bit. Presence (=1) indicates that the memory address of an uncommitted (free) memory node 328 is present in associated parallel bits (see M-13 hereinafter). If absent (=0) it indicates that there is no meaningful information here. A free memory address arrives from the right-hand side into input "R", and leaves by output "FQ". The free memory bit, then, is actually part of another "rotating queue". This one bit is part of another big shift register, closed onto itself. Each "stage" of the shift register, then, may either hold a valid free-memory address, or else holds a "bubble", an unused place in the queue.

Now, in addition to the input R and output FQ, the situation is complicated by two needs: (1) an entity wanting to claim free memory must be able to remove a valid address from this queue; and (2) a memory node which has just declared itself "free" must be able to place its address onto the queue. Any combination of these demands may occur at the same time.

As in other parts of the machine (e.g., the "accordion stores" of the token and activity queues) a favorable set of demands will permit input R to propogate directly to output FQ without getting the latch involved. This is very important toward moving free-queue information quickly to those entities which need it. Otherwise, an entity would have to wait much longer for a free memory address to come moving toward it, stage by stage, even if there are no intervening demands for it.

An entity needing free memory will assert input T=1. His claim is satisfied when the circuit returns TSUP=1. A memory node wanting to place its address into the queue would assert G=1, and acceptance of its address into the queue is acknowledged when GA=1. The current stage of the queue contains a valid address if the flipflop 342 output Q=1, and a preceeding stage is presenting a valid address to this stage if R=1. These are the four independent variables which the circuit must deal with. There are five things it must determine, based on these: (1) what information is to be routed to the "taker" (the entity trying to claim free memory); (2) what information is to be routed to output FQ; (3) what signal to present on output TSUP; (4) what signal to present on output GA; and (5) what information to present to the input of the D-latch (which determines what information will be present at its Q-output when the next clock pulse has passed). "dc"=don't care.

                  TABLE 2
    ______________________________________
                                 source
                                       source give
                                 of    of     to
    R    Q     G      T   TSUP   TSUP  FQ     D     GA
    ______________________________________
    1    1     1      1   1      G     Q      R     1
    1    1     1      0   0      dc    Q      R     0
    1    1     0      1   1      Q     R      0     0
    1    1     0      0   0      dc    Q      R     0
    1    0     1      1   1      G     R      0     1
    1    0     1      0   0      dc    G      R     1
    1    0     0      1   1      R     Q(=0)  0     0
    1    0     0      0   0      dc    R      0     0
    0    1     1      1   1      G     Q      R(-0) 1
    0    1     1      0   0      dc    Q      R     0
    0    1     0      1   1      Q     R(-0)  R     0
    0    1     1      1   0      dc    Q      R     0
    0    0     1      1   1      G     Q(-0)  R(-0) 1
    0    0     1      0   0      dc    G      R     1
    0    0     0      1   0      dc    R(-0)  R     0
    0    0     0      0   0      dc    Q(-0)  R     0
    ______________________________________


The circuit contains one 4:1 multiplexer 344 to control what comes out of FQ; but the table entry "source of TSUP" makes less sense in the context of the master bit than it does for a slave bit, in which case this determines what data will be presented to the "taker" if in fact TSUP=1. This information is provided over the two lines TH and TL generated by the M-9 circuit 336 in the master bit.

The truth table of Table 2 is for the case of input line TRICK=0. However, one must study the drawing to understand the effect of TRICK=1. It effectively forces incoming data to be latched rather than propagated directly through. It is used only for a few of the M-11s 340 in the system to "break up" the overall queue to control gate delays. What is important is that TRICK never=1 on an M-11 340 whose G input could also equal 1. To do so would prevent GA from ever arising so that the memory node asserting G would never be able to get its address onto the queue.

Outputs OH, OL and TH, TL control multiplexers in the address-bearing slave bits as seen directly hereinafter.

M-12: Free-Memory Queue Slave Bit

Generally indicated as 346 in FIG. 36.

Straightforward, this circuit contains the same kind of memory device as M-11 340 except that it need not be RESET by the system-wide reset pulse.

M-13: Free-Memory Queue Stage

Generally indicated as 348 in FIG. 40.

By connecting log.sub.2 mu (round up if not integral) slave bits as shown to a single master bit, one builds a stage having an input bus RB, output bus FB, input port FG and output port TB all of width log.sub.2 mu. In addition, here are control lines G, GA, T, TSUP, and TRICK which were explained for M-11 340. All circuits operate off of the same clock.

M-14: Free Memory Queue

Generally indicated as 350 in FIG. 41.

Since the overall machine has been defined as having X Y processors and mu memory nodes, there will be (X.multidot.Y)+mu M-13s 348 in the system arranged as shown. "TRICK" inputs are not drawn here but are discussed below.

Now, the memory nodes 328 will each have their own M-13 348 to provide input for. Therefore, mu of the M-13s 348 will have their G/GA inputs committed.

Next, there are (X.multidot.Y) processors 326 which might each require free memory at any time. Therefore, connect the T/TB outputs of (X.multidot.Y) M-13s 348 to the processors 326. When doing so, use as many of those M-13s 348 which have their inputs committed as possible. The result now is that mu of the mu+(X.multidot.Y) inputs are committed, and (X.multidot.Y) of the mu+(X.multidot.Y) outputs are committed.

The remaining M-13s 348 with uncommitted outputs are each assigned to a memory cell 328, so that all remaining such M-13s 348 will have their outputs routed to a memory cell input. This is best seen in FIG. 63. It is clear that not only is memory able to free itself by placing its address onto the queue, but also is able to claim free memory by removing an entry from the queue. Processors can only claim free memory.

What is left after following the above steps is a queue with (X.multidot.Y) uncommitted inputs. These can be dubbed "drones". Tie all G-inputs of the drones to logic false.

Now, assuming that drones are positioned physically in the queue with forethought to the step about to be described here, tie the "TRICK" input of a few select drones to logic true (1), strategically selected to control gate delays in the queue. Then, tie all other "TRICK" inputs of all other M-13's 348 to false. It should be noted here that M-14 is a special closed-loop application of the fast-forward FIFO described elsewhere herein.

M-15: Base-of-Crossbar Inner Arbitrator

Generally indicated as 352 in FIG. 42.

As discussed with respect to FIG. 32 above, a method will be developed for connecting M-8 circuits 324 in parallel to improve the probabilities of achieving a complete path by giving the opportunity to choose among several parallel paths. It will be seen that the resulting circuit (M-21 hereinafter) will appear to the outside world exactly as an M-8 324. Its performance, however, should be superior.

As a start toward this circuit, part of the job of getting input into the crossbar is managed with M-15 352. This is best understood while looking at M-16 below.

M-16: Parallel Crossbar Input Fanout

Generally indicated as 354 in FIG. 43.

Ultimately, we will hook gamma M-8s 324 in parallel, so that any one input entity will in effect be attempting to achieve the same path on gamma M-8s 324 all at once. It is possible, then, for more than one of the M-8s 324 to reply affirmative to the request, but only one path is needed (the other should be freed at once, leaving them available to other contentions). This circuit manages not only the one-in gamma-out fanout of the data lines and address lines (DI and E resp.), but manipulates WI and THRU so that either no M-15 352 will received an input THRU, or else only one M-15 352 will receive an input THRU. This means that only one of the M-8s 324 will be allowed to retain a claim on a complete path (assuming, of course, that one or more of them in fact achieved one).

As will be seen, later circuits to be discussed prevent more than one M-8 324 from declaring a complete path. The arbitration at this end causes no trouble and acts as a safety feature.

M-17: Top-of-Crossbar Inner Arbitrator

Generally indicated as 356 in FIG. 44.

Although a complete understanding of this may have to wait until the discussion of M-21 hereinafter, the most important feature is that signal THRU cannot be one unless signal WFMC is also one. And, as shall be seen, among all the M-17s 356 which lead to the same destination, only one of them will have its WFMC true.

Circuits M-17 through M-21 manage the job of taking all of the gamma incoming requests-for-complete-path and choosing to reply to only one of them (leaving the remaining M-8s 324 free for other work). It turns out that, just as the M-8s 324 provide for internal path latching, it is therefore necessary, once the choice among the gamma contenders is made, to latch this choice as well (as it determines part of the overall path). This should be clear after studying circuit M-21.

M-18, M-19: Brute-Force Multiplexer

See FIGS. 45 and 46.

This multiplexer design, generally indicated as 360 in FIG. 46, guarantees that all output data lines are at defined logic levels. M-19 360 clearly chooses one of the sets of DO lines (there will be gamma sets emerging from the gamma M-8s 324 headed for this single destination) by selecting one (and only one) of the SEL inputs. Setting SEL.sub.i =1 selects DO set DO.sub.i. If no SEL input is high, then the outputs will be all zeroes.

M-20: Choice-of-Address Multiplexer

Generally indicated as 362 in FIG. 47.

An M-19 multiplexer 360 has its SEL.sub.i inputs delivered from the outputs of 2-in 1-out multiplexers 312 controlled by input LADD. If LADD=1, the M-19 SEL.sub.i inputs will come from the Q (outputs) of the D-flipflops 342 shown. Otherwise, (LADD=0) the SEL.sub.i inputs will come from the D (inputs) of the flipflops 342. The usual statement about the D-flipflops applies. These are meant to indicate any appropriate memory device whose input is latched on the rising edge of the clock pulse (LCLK) but not copied through to the output until after an appropriate delay. The inputs to these flipflops 342 come from inputs WC.sub.i as shown. The system reset pulse (not shown) clears all these flipflops to Q=0.

M-21: Parallel Crossbar Output Multiplexer

Generally indicated as 364 in FIG. 48.

Recall that the starting point here is that it is possible for more than one of the WI.sub.i to be true at the same time, representing more than one successful path, tying up more than one of the gamma parallel M-8s 324. Assume J-K flipflop 316 output Q=0 (more about this flipflop later). Then, the M-17 circuits 356 will permit only one of the WC.sub.i inputs to M-20 362 be true at once. In addition, the signal WF will be allowed to go true, indicating a complete path "want forward" to the ultimate receiving entity. Since input LADD to M-20 362 is zero, these WC.sub.i inputs control the M-19 360 internal multiplexer directly, so that the inputs corresponding to that M-17 356 whose WFMC output was allowed to be true will be copied through to the DO outputs. One of those bits is bit LT (last transfer).

Assume now that LT=0. That is, the sender at the other end of the crossbar wants to latch this path. Then, clearly, this circuit must latch this choice of which of the gamma M-8s 324 it is currently paying attention to (as the sender will probably not reassert its "want" signal, nor could it expect to win the arbitration on the next clock pulse even if it did). So with LT false and WF currently true, input J to the J-K flipflop 316 will be true (input K false) so that after the clock pulse has passed, the J-K flipflop 316 will output Q=1. The action of this same clock on M-20 362 must also be noted. Before the clock pulse, Q=0, so that the input LCLK will be enabled to receive the clock pulse. When it arrives, then, the internal latches of M-20 362 will latch the WC.sub.i inputs. After the pulse has passed, the WE.sub.i inputs before the pulse will be secure in the latches. Also, the J-K flipflop 316 will output a 1 (true) which does two things: (a) it prevents any more clock pulses from entering LCLK until some other action causes the J-K flipflop 316 to become a zero again, and (b) LADD will get a true input, meaning that on this new clock cycle, it will gate the data in from the same M-8 324 that it did the last time. Once the J-K flipflop 316 outputs a 1, it is no longer possible for WF to output a 1 until some action causes the latch to release (cause the J-K fliptop 316 to output zero). Exactly what this action is is clear, and is what is expected. As soon as the sender sends one more transfer with its bit LT=1, the clock pulse will see J=0 and K=1 on the flipflop 316, and reset it. The circuit then becomes available for arbitration again.

The receiving circuits (forward circuits) which deal with WF and generate ACKI are expected, just as for an M-8 324, not to assert ACKI unless WF is true AND the circuit is indeed ready to accept the transfer (possibly a burst transfer) at once. The M-17s 356 will insure that the ACKI is routed through that M-8 324 which has the path of interest.

The J-K flipflop 316, again, is meant to indicate any approriate memory device just as for circuit M-5 314. The reset pulse also applies to this device (will cause it to output zero).

M-22: Parallel Fully-Decoding Path-Latching Broadcast Crossbar

See FIG. 49.

Instead of using just one M-8 circuit 324, one can interconnect gamma M-8s 324 as shown to yield the gamma-parallel crossbar generally indicated as 368. This circuit is fully-decoding. That is, there are no EO busses emerging from the top of the circuit. This is a consequence of the particular designs of M-16 354 and M-21 364 circuits which did not provide for the throughput of E busses.

But, their addition follows logically. M-19 360 is readily modified to handle incoming E lines by the addition of as many more M-18s 358 as are needed (equal to the number of lines in the E-bus) and interconnecting them in the obvious way. The result affects M-20 362 in that now each DI.sub.i will also have an accompanying E.sub.i input, and there will be a single set EO emerging from it. This affects M-21 364 in the obvious way.

With this E-bus handling capability added to the M-21 circuits 364, it is not necessary to use fully-decoding M-8s 324, but instead one can use M-8s 324 which pass through a nonzero number of bits for EO.sub.i. These are then connected to the M-21s 364 in an apparent way, with the result that we have synthesized a not-fully-decoding, parallel path-latching broadcast crossbar. Such a crossbar is also an M-22 circuit 368.

The implications of this should be clear. M-22s 368 may be used wherever M-8s 324 are used, and vice-versa. Larger M-8s used 324 and larger M-22s 368 may be synthesized from lesser M-8s 324 and M-22s 368 in any conceivable combination. Which "mix" of circuits does the best job can be found by queueing theory analyses, trial and-error, simulation, or other known techniques.

In the four figures to be described next, as shall be seen, the circuits describe the interface of the memory system developed so far to individual memory nodes 328 and to the processors 326. Memory nodes 328 and processors 326, simply put, may be conventional computers. Consequently, there is some danger of these conventional computers being not properly synchronized with the circuits described so far. This danger is simply assumed avoided by proper implementation according to techniques well known to those skilled in the art.

Memory nodes 328 and processors 326, then, may interface to these other circuits through a conventional I/O mechanism. The next four figures described show what sorts of interfaces might be devised to accomodate such interfaces to demonstrate that the overall circuits can, in fact, be integrated. Exactly what sort of conventional computer (or computers, for that matter) will be "inside" of a memory node 328 or activity processor 326 will be discussed after the interfaces are demonstrated.

M-23: Example: Interface to M-13 Give-Free-Memory

See FIG. 50.

M-23 shows how a memory node might "give" its "hardwired" memory node address to the memory free queue. Its connection to an M-13 circuit 348 should be obvious. The "S-R flipflop" 368 is assumed not to reset its output until the falling edge of the pulse shown. The memory node 328 would generate a pulse "declare memory free" and would simply wait until a "free memory address accepted into queue" pulse emerged before assuming that it has in fact been properly declared as free memory.

M-24: Example: Interface to M-13 Take-Free-Memory

See FIG. 51.

This circuit is common both to memory nodes 328, and to activity processors 326. Therefore, this example could well apply to both. Its interconnection to an M-13 circuit 348 again should be obvious. A memory node 328 or processor 326 needing free memory would generate a pulse "need free memory" and would use the leading edge of the returning pulse "free memory supplied" to latch the data present on lines TB. The falling edge of that same pulse would also clear the request S-R flipflop 370.

M-25: Example: Interface to M-8/M-22: Sender

See FIG. 52.

This circuit is also common to both memory nodes 328 and to activity processors 326, as both will need to move data from themselves over a crossbar. This example was designed with a direct-memory-access type of transmission method in mind. The sending processor 326, when it has prepared in advance the sequence of information to be sent (which will be sequenced onto lines LT, VD, E, and MD properly), then throws a pulse "initiate" which sets request flipflop 372. Assuming there was not a latched path already at that time, interface line WANT will go high. Then, on the arrival of the clock pulse, either ACKI will be true (path established) or false (path not yet established). If ACKI is true, the logic level "path achieved" will go high, which will permit the generation of a "transfer occurred" pulse which can be used to sequence the information present on the LT, VD, E, and MD lines. If this was in fact the last transfer, a "last tranfer occurred" pulse is generated to indicate to the processor 326 the completion of the requested service. "Transfer occurred" also clears the request flipflop 372 (the falling edge thereof).

However, if ACKI was not true on the request out, a pulse "want request failed" will be generated. If this pulse is generated, another flipflop 374 (managed by the processor signals "try continuously" and "try only once more") will control whether the circuit will simply continue to assert itself, or give up. If it gives up, signal "want request failed, will not reassert" is generated.

If the processor 326 is trying to achieve a "broadcast", that is, trying to send the same information to more than one place at once, additional control would be necessary and the specific circuits for that are not shown. But, for purposes of this example, a processor 326 trying to achieve multiple paths would generally keep the VD output low, keep the LT line low, and on its first path claim, would proceed as above. But, as soon as the "transfer occurred" pulse arrived, the circuit would then reset the J-K flipflop 326, be sure the new address is present on the E lines, and generate "initiate" again. The additional circuits (not shown here) must remember that there is already a complete path achieved. The new request will result in success or failure, as before. If a failure, the processor 326 must not permit the circuit to assert itself without end on this subsequent-path request, as this could hang up the crossbar. It can manage this with the "try" flipflop 374 as before. When all paths have been established that can be established out of the processor's attempts, then (with proper synchronization) the J-K flipflop 376 can be set again and the transfer will occur normally. In such broadcast sequences, the establishment of paths must be very carefully managed and all control pulses and data lines properly sequenced.

M-26: Example: Interface to M-8/M-22: Receiver

See FIG. 53.

Again with direct-memory-access in mind, this example shows how a processor 326 or memory node 328 might set itself up to receive over the crossbar. Having set up its internal transfer buffers and so forth, the processor 326 generates the "enable initial reception" pulse. This sets a "permission" flipflop 378 which is gated to the incoming "want" from the crossbar.

A true output from this gate 380 is the ACK (acknowledge) signal returned to the crossbar. If ACK is true and this is not the sender's last transfer (LT line is false) then the J-K flipflop 382 will see J=1 and K=0, so will set output true after the pulse has passed (this J-K flipflop 382 and that in M-25 are as all others in the system: inputs sampled on clock rising edge, output changes after an appropriate delay).

When the clock pulse arrives, and either this circuit is asserting ACK or if it has latched the J-K flipflop 382 true, the clock pulse will generate "transfer occurred". Additionally, if the data was valid (VD line true), this pulse will generate "valid data transfer occurred" which can be used to enable the transfer of the data lines into the processor's memory or what-have-you. When LT is true before a clock pulse, the clock will generate "last transfer occurred", which can be used to alert the processor 326 that a complete transfer from the processor 326 has taken place. This LT=1 also clears the J-K flipflop 382. Any transfer will clear the "permission" flipflop 378 ("transfer occurred" falling edge) so that ACK is only asserted for the first clock period.

M-27: Memory Node

Generally indicated as 392 in FIG. 54.

Given that M-27 392 can be a conventional processor, this drawing is sufficient to specify it. The parameters of this conventional processor are not especially important, as long as it effectively provides for all of the functions expected of it. It is enough to consider this now as a "black box" which operates under its own internal control "program", and whether this black box contain one or many processors, whether it contain a conventional processor at all for that matter, is not consequential here. It should be obvious to any computer scientist that a conventional processor here will do the job, that it can be provided an endless variety of control programs, whether these be in ROM, or other devices is again not important. This "soft" control program, microcode if you like, will generally be the same for all M-27s 328 in the overall computer, except for a special few which get involved in machine initialization, and function-library lookup tables for improved implementation of function name and constant name lookups.

Each M-27 392, then, must contain enough internal storage (whether this is RAM, bubbles, magnetic media, etc. is again of no consequence here) to permit it to appear to the remainder of the system as if it has capacity for some minimum number of the overall computer's "tagged words", the basic quantum of information in the machine. This minimum number (call it NODE LENGTH) is another design parameter at the discretion of the implementor. The overall machine could address information in a memory node 328 by a memory pointer word (this is "tagged" as being a memory pointer) containing several fields, one field of which is the address of the memory node 328 itself, and the other field or fields specifying divisions of this memory node's address space, that is, exactly where or what in the memory node is being addressed.

In addition to the storage required to implement the NODE LENGTH, the memory node 328 will probably contain a great deal of additional storage to manage its own internal bookkeeping, manage communications with the overall system as fast as possible, implement decisions as to whether to "broadcast" or manage requests sequentially, and so forth.

The CLK shown is NOT this memory node's internal clock, but it is the clock provided by the external circuits (the crossbars and memory free queue). Proper synchronization between this clock and actions of the memory node 328 itself is assumed. Each memory node 328, as mentioned before, has its own unique address, and this is indicated by the application of this address from the outside. This could be switches, bus lines in the circuits, or assigned dynamically at system initialization into memory circuits. Again, this is not consequential here. This "hard-wired" memory node address is available to the memory node 328 itself (it can "see" its own address) and the processor 326 would use that address when declaring itself free (module M-23 of FIG. 50), in addition to using it to identify itself to modules it is communicating with.

M-28: Memory Node for Input-Output

Generally indicated as 394 in FIG. 55.

One method of providing input-output is to set aside some of the unique addresses which would otherwise be given to memory nodes, and assign them to modified memory node processors as shown. These interface to the system exactly as M-27s 392, except that they generally will not declare themselves "free" (they do not serve a memory function) so will not need an M-23 384 interface. Instead, they will interface to real peripheral devices, such as printers, bubble memory stores, real-time devices, or etc. These M-28s 394 are, therefore, I/O processors. The activity processors executing input, output, and I/O control activities can communicate with these exactly as they would communicate with an M-27 392.

As M-28s 394 are conventional processors just as M-27s 392 are, they can be very sophisticated and manage all of the output formatting, specific device control, conversions to character or from character formats, and any other conceivable management of input-output functions which the overall machine should not have to take care of at the activity level. The simple generic activity "output", given a data token to output and a "format" token to give it any other needed details, can initiate the output of many pages of information, especially if the data token is a matrix or other structured datum. Once in a processor, the activity would cause the processor to communicate that token to the M-28 394, which could then eject it from the system through the I/O device and ultimately declare the token as "absorbed". Input operations would be similar. An input activity would initiate an input operation by an M-28 394. When the M-28 394 has assembled all input, created a machine-compatible token out of it, balanced it properly, and so forth, the M-28 394 reports completion to the processor containing the waiting activity, which then ejects the token (or its descriptor) into the token queue.

M-28s 394 will probably not all be alike, as they will be managing different peripherals and be performing different functions. Otherwise, their integration into the overall computer is obvious, and also obvious that input-output can be achieved with them.

Review, Memory Subsystem Complete

The memory subsystem, then, consists of three M-8,/M-22 crossbars (processor-to-memory, memory-to-processor, and intermemory), mu memory nodes (M-27s and M-28s, which contain, for example, interfaces M-23 through M-26), a free memory queue (M-13s as necessary), and M-24, M-25, and M-26 interfaces to the activity processors (X Y of each). All of these circuits require a clock input: they are all driven from the same clock (call it MCLK, memory clock). This clock is not necessarily the same clock as the token queue clock, or the activity queue clock.

TOKEN QUEUE DETAILED CIRCUITS

Q-1: Parallel Comparator

Generally indicated as 400 in FIG. 64.

This circuit is used to decide whether or not there is a match between a token's call address and an input port's call label. A call address and a call label both are defined as being of length NCL bits. A logic true is produced at output M if:

(a) a token is present (A.sub.PA =true), AND

(b) the input port to which this comparator belongs is in CALLING state (B.sub.PA =true), AND

(c) Each bit of the token's call address (supplied on the A-bits A.sub.1 through A.sub.NCL) matches each correspondng bit of the input port's call label (supplied on the B-bits B.sub.1 through B.sub.NCL).

The quantity NCL+1 is defined to be =NCX. Circuit Q-1 400, then, can be thought of as requiring two inputs, each consisting of NCX parallel bits, and producing a single output bit.

Q-2: Claimant Circuit

Generally indicated as 402 in FIG. 65.

This circuit pr