For a parallel or multiprocessor system

Syntactically self-structuring cellular computer

4583164

Abstract

A design is disclosed for a cellular computer consisting of many processors, of two kinds, connected in the form of a tree. The computer is intended for the highly parallel execution of programs written in an applicative programming language. The program is stored in the leaf cells of the tree. The computer uses the syntactic structure of the program to guide the embedding of a network of "syntactic nodes" in the tree of machine cells, and execution of the program is accomplished through operations performed by the embedded network of nodes. This computer can execute many user programs simultaneously, it can take advantage of all the parallelism expressed in each user program (storage space permitting), and it can perform in parallel many operations below the level expressed in the user programs.


Claims

What is claimed is:

1. A method for parallel execution of specified operations upon data having a hierarchical syntactic structure, said structure comprising expressions nested within expressions, said specified operations to be executed on specified ones of said expressions, comprising the steps of:

1. Entering said data into an assemblage of cells, each cell comprising computational resources and communications resources,

2. Forming a processing network by apportioning said computational resources to form, in said cells, nodes in correspondence with certain expressions of said data, and apportioning said communications resources to form channels connecting said nodes, said processing network comprising said nodes interconnected by said channels, said processing network having a hierarchical structure corresponding essentially to said hierarchical syntactic structure of said data, each said node comprising computational resources entirely within one of said cells and each said channel comprising communications resources within at least one of said cells, and connecting two said nodes, at least one of which comprises computational resources belonging to the same cell as do certain communications resources forming said channel, and

3. Executing said operations upon said data by using said apportioned computational resources of said nodes to execute said operations on said expressions in respective correspondence with said nodes, and using said channels for communications among nodes coupled by said channels during said execution.

2. A method according to claim 1 wherein said assemblage of cells is a tree network of cells.

3. A method according claim 2 wherein said data comprises expressions in an applicative language, such as a Formal Functional Programming language wherein certain of said expressions comprise reducible applications, each said reducible application comprising an operator expression and an operand expression and wherein said specified operations to be executed are specified by said operator expressions and wherein said specified ones of expressions on which operations are to be executed are the said operand expressions.

4. A method according to claim 1 wherein said step of executing said operations on each said specified expression comprises the steps of:

1. Executing operations in a node corresponding to a highest level of said syntactic structure of said specified expression, said operations being executed on said specified expression, and said operations causing function calls to be made on nodes in correspondence with said expressions nested within said specified expression at a next lower level of said hierarchy of said processing network,

2. Executing operations in said nodes at said next lower level of said hierarchy of said processing network, in accordance with information specified in said function calls, said operations being executed on said expressions nested within said specified expression,

3. Recursively executing further operations in nodes at still lower levels of said hierarchy of said processing network, said nodes in correspondence with expressions nested still more deeply within said specified expression, and said operations being executed on said more-deeplynested expressions, and,

4. Recursively returning information from nodes at lower levels to nodes at higher levels of said hierarchy of said processing network in accordance with information specified in said function calls, so as to complete execution of said operations on said specified expression.

5. Apparatus for processing data, said data comprising atomic symbols belonging to a specified set of valid atomic symbols, and instances of syntactic markers belonging to a specified set of valid syntactic markers, said instances of syntactic markers delimiting sequences of said atomic symbols to form disjoint expressions, other instances of said syntactic markers delimiting sequences of said expressions to form further disjoint expressions, said data thereby being stuctured by said syntactic markers into a hierarchy of multiple levels of expressions nested within expressions, each expression at a given level of said hierarchy comprising either one said atomic symbol or a syntactically delimited sequence of disjoint expressions at a next lower level of said hierarchy, said processing to be done in accordance with specified compositions of elementary operations on specified ones of said expressions, said apparatus comprising:

a network of interconnected cells, input means for entering said data into certain of said cells, and output means for removing processed data after said processing,

each said cell comprising computational resources adapted to performing said elementary operations, communications resources adapted to sending and receiving data during said processing, and logic means responsive to said syntactically delimited hierarchical structure of said data, said logic means being adapted to communicate with and cooperate with said logic means in other said cells to create a processing network of computational nodes and communications channels in said network of cells by apportioning said computational resources to form said computational nodes and by apportioning said communications resources to form said communications channels interconnecting said nodes,

each said node comprising computational resources entirely within one of said cells and each said channel comprising communications resources within at least one of said cells, and connecting two said nodes, at least one of which comprises computational resources belonging to the same cell as do certain communications resources forming said channel,

said nodes being of at least two kinds, expression nodes corresponding to individual expressions in said data, and combining nodes corresponding to groups of expressions in said data,

said processing network being structured by said interconnecting communications channels into a hierarchy of multiple levels of nodes, each expression node at a given level of said hierarchy being connected by said channels through combining nodes at said given level of said hierarchy to expression nodes at a next lower level of said hierarchy,

said hierarchical structure of said processing network corresponding essentially to said hierarchical structure of said data in that each given expression E to be processed in said data corresponds to an expression node N(E) at some level of said hierarchy of said processing network and in that said expressions nested immediately within said given expression E correspond to expression nodes at said next lower level of said hierarchy of said processing network,

each said node of said processing network being adapted to cooperate with other said nodes to process said data in accordance with its specified compositions of elementary operations by using said apportioned computational resources to perform said elementary operations on said individual expression or group of expressions corresponding to said node and employing any partially processed data passed to it via communications channels immediately connecting said node to other nodes above and below and by passing resulting partially processed data to other said nodes immediately coupled to said node through said communications channels for further processing in accordance with said specified compositions of elementary operations until said processing is complete.

6. Apparatus according to claim 5 wherein said network of cells is a tree network of cells comprising a root cell, interior cells and leaf cells.

7. Apparatus according to claim 6 wherein said data to be processed is stored in said leaf cells prior to said processing.

8. Apparatus according to claim 7 wherein said data to be processed comprises expressions in an applicative language such as a Formal Functional Programming language, wherein certain of said expressions comprise reducible applications, each said reducible application comprising an operator expression and an operand expression, and wherein said specified compositions of elementary operations are specified by said operator expressions, and wherein said specified ones of said expressions to be processed are said operand expressions.


Description

TABLE OF CONTENTS

Background of the Invention

1. Formal Functional Programming Languages (FFPs)

2. Prior Art

2.1 The Mago Machine

2.2 DDMl

2.3 CORAL

2.4 The Expression Processor

Summary of the Present Invention

Brief Description of the Drawings

Description of a Preferred Embodiment of the Invention

1. The Overall Structure of the Computer

2. A Compact Representation of the FFP Text

3 Partitioning

3.1 Introduction

3.2 The Nodes and Channels of the TA-Mediator Network

3.3 Embedding the Leaf Nodes of the TA-Mediator Network

3.4 The Partitioning Algorithm

3.5 An Example

3.6 Properties of the TA-Mediator Network

3.7 Storing Information in the IN Nodes

3.7.1 Mtext, Ltext, and Rtext

3.7.2 Relative Offsets

3.8 Some Implications of Partitioning

3.9 Another Example of Partitioning, with Relative Offsets

4. Overview of Execution

5. Embedding the SN-CP Network

5.1 Introduction

5.2 The Downsweep: Computing the RLNs

5.3 The Nodes and Channels of the RA's SN-CP Network

5.4 The Upsweep: Building the SN-CP Network for the RA

5.4.1 A Simple Example of an SN-CP Network

5.4.2 Embedding the Leaf Nodes of the SN-CP Network

5.4.3 Embedding the non-Leaf Nodes of the SN-CP Network

5.4.4 An Example

5.4.5 Characterization of the SN-CP Network

5.4.6 Storing Syntactic Information in the Nodes

5.4.7 Storing Information about the Syntactic Father

5.5 Some Implications of the SN-CP Network

6. The Syntax Tree Language (STL)

6.1 Three Language Levels: FFP, STL, and MCL

6.2 Libraries of Operator Definitions

6.3 The STL and the Microprogramming Language of Machine I

6.4 Overview of the STL Execution Scheme

6.5 Storage and Transmission of Data in the SN-CP Network

6.5.1 Registers

6.5.2 Streams and s-queues

6.5.2.1 Creation of s-queues by CALL-SONS

6.5.2.2 Creation of s-queues by pipe operators

6.6 Reducing the RA: REDUCE

6.7 Operations in FN and FC nodes

6.8 Requesting Storage

7. Storage Management

7.1 Overtime

7.2 Finding the molten zones

7.3 User-program removal

7.4 Compaction of brackets (an option deferred)

7.5 Insertion or denial decisions

7.5.1 Insertion decisions

7.5.2 Denial decisions

7.6 Destination computation

7.6.1 The upsweep: embedding the BN network

7.6.2 the downsweep: computing BL and BR

7.6.3 The destination computation in the leaf BN nodes

7.7 Text movement

7.8 New user-program insertion

7.9 Partitioning and parsing

Claims

BACKGROUND OF THE INVENTION

This invention is a computer composed of numerous processor/memory units connected in the form of a tree, and intended for the efficient execution of applicative programming languages. In recent years, it has become possible to put one or more processing units and memory units on a single chip of silicon and to make numerous identical copies of such a chip at a small unit price. Given this technology, it is natural to try to combine many processors and memory units to obtain a powerful general-purpose computer. It is not obvious, though, how they should be connected, nor how they can be organized to cooperate with each other.

One promising approach is that proposed recently by Mago in U.S. Pat. No. 4,251,861 and in "A network of microprocessors to execute reduction languages," two parts, International Journal of Computer and Information Sciences 8, 5 (October 1979) and 8, 6 (December 1979). His computer architecture is designed specifically to execute an applicative programming language such as the Formal Functional Programming (FFP) languages introduced by Backus in "Can programming be liberated from the von Neumann style? A functional style and its algebra of programs," Communications of the ACM 21, 8 (August 1978), 613-641. These documents by Mago and by Backus are incorporated here by reference. An FFP language is defined by a small set of syntactic and semantic rules, with useful mathematical properties for reasoning about programs written in the language. FFPs are applicative (expression-oriented) rather than imperative (statement-oriented). There are no variables and no statements. Powerful operators can be built from simpler operators in a uniform style not possible in conventional programming languages. Most importantly, parallelism is easily and naturally expressed.

The machine disclosed here, like Mago's machine, can execute many programs simultaneously, and it can perform in parallel many operations below the level expressed in the program, since a single "primitive" FFP operation may be composed of more basic machine operations, some of which can be executed in parallel.

The machine disclosed here has been previously described in Tolle's Ph.D. dissertation, "Coordination of Computation in a Binary Tree of Processors: an Architectural Proposal," Department of Computer Science, the University of North Carolina at Chapel Hill, April 1981. That document is incorporated here by reference.

Since the invention described here is related to Mago's machine, it is necessary to discuss some of the details of his design. Before that is possible, the FFP languages must be described.

1. Formal Functional Programming Languages (FFPs)

This section covers FFPs only to the extent necessary to indicate the requirements of a machine to execute such a language. For a complete treatment, see the paper by Backus, previously referenced.

An FFP program is an expression. Every FFP expression e has a unique meaning, which is also an expression. The meaning of the expression e is denoted by mu(e); mu is called the meaning function. Execution of a program e consists in evaluating e, that is, finding mu(e).

The most elementary kind of expression is an atom. The choice of set A of atoms, in conjunction with the syntactic rules, determines what expressions are in the particular FFP language. Typically, A is chosen to include alphanumeric character strings and special symbols such as +, -, and *.

There are three other kinds of expressions: the sequence, the application, and bottom. Let x1, . . . , xn be expressions. Then <x1, . . . ,xn> is an expression, called a sequence (of length n), and xi is the i-th element of the sequence. Backus uses 0 to represent the empty sequence (the sequence of length 0), but here < > is used instead. If x and y are expressions, then (x:y) is an expression, called an application; x is its operator and y its operand. Both are elements of the expression. Bottom, denoted by .perp., plays a special error-indicating role in the language. It is not an atom, a sequence or an application. If x=<x1, . . ., xn> and one of the elements of x is .perp., then x=.perp.. That is , < . . . ,.perp., . . . >=.perp.. All expressions are formed by finite use of these rules. The symbols `<` and `>` are called sequence brackets and the symbols `(` and `)` application brackets. All four are referred to as syntactic brackets.

The subexpressions of an expression e are e itself and the subexpressions of the elements of e. An object is an expression that has no application as a subexpression. Thus .perp. is an object, an atom is an object, and a sequence is an object if and only if each of its elements is an object. The meaning mu(e) of an expression e is always an object. If e itself is an object, then mu(e)=e; the objects are precisely the fixed points of the meaning function mu. If e=<e1, . . ., en>, then mu(e)=<mu(el), . . . ,mu(en)>.

Execution of an FFP program (expression) e consists in finding mu(e). If e is an object--i.e., contains no applications--then mu(e) is simply e. If e does contain applications, then mu(e) is found by repeatedly replacing the innermost applications (the applications that have no other applications as subexpressions) by other expressions with the same meanings. If this process is non-terminating, then mu(e)=.perp.. If this process does terminate, the resulting object is mu(e). FFP languages have the Church-Rosser property: the meaning of an expression is independent of the order in which the innermost applications are evaluated. If the right machinery were available, all of them could be evaluated simultaneously.

The meaning of an innermost application (x:y) depends on the function represented by the operator x. This function is determined as follows:

1. If x is an atom, then there may be some FFP definition associated with that atom. An FFP definition must be an object--i.e., an expression with no applications. In this case the definition is substituted for the atom it defines, yielding a new application whose meaning is the same as that of the original application. Suppose, for instance, that the atom DDBL is associated with the definition <COMP,DBL,DBL>. Then evaluation of (DDBL:Q) begins by substituting the definition in place of DDBL, yielding (<COMP,DBL,DBL>:Q).

2. The second possibility is that x is an atom that is associated with some primitive ("built in") function. For instance, the atom DBL might be associated with a primitive function that "doubles" its operand, in the following sense: (DBL:U).fwdarw.<U,U>. (The arrow notation ".fwdarw." here is not part of the FFP. It is used to show a transition from one FFP expression to another, in accordance with the rules of the language.) Thus, executing the program (DBL:(DBL:A)) would yield the following transitions: (DBL:(DBL:A).fwdarw.(DBL:<A,A>) .fwdarw.<<A,A>,<A,A>>.

3. Another possibility is that x is an atom that is neither primitive nor defined. This is considered to be an error, and the meaning of the application is taken to be .perp..

4. If x itself is .perp., then this is also considered to be an error: mu(.perp.:y)=.perp..

5. If x=< >, then (< >:y).fwdarw.y. That is, < > represents the identity function.

6. The final possibility is that x is a non-empty sequence containing no applications. (Recall the assumption that (x:y) is an innermost application.) Thus (x:y) is of the form (<x1, . . . ,xn>:y), were n>=1 and each xi is an object. Then the function represented by <x1, . . . , xn> is given (recursively) by the metacomposition rule:

(<x1, . . . ,xn>:y).fwdarw.(x1:<<x1, . . . ,xn>,y>).

According to this rule, the expression (<AA,DBL>:<A,B,C>) will yield

(AA:<<AA,DBL>,<A,B,C>>)

which will then be evaluated by using the function associated with AA. (Backus uses AA to represent the primitive function "Apply to All," which applies its parameter to each element of the original operand. The parameter in the above example is DBL, and the next step of the evaluation yields

<(DBL:A),(DBL:B),(DBL:C)>,

which then yields

<<A,A>,<B,B>,<C,C>>.)

The association of primitive functions and FFP definitions with atoms is made prior to execution of the FFP program, and is accomplished by means external to the FFP language. The intention is for the FFP programmer to be able to construct his own FFP definitions--indeed, the programming process consists in writing FFP operator definitions. The primitive operators, on the other hand, are not intended to be changed by the FFP programmer--they are "built in" to the language, and changing them presumably requires the services of a systems programmer.

The process of replacing an innermost application according to the rules above is called reducing the application. Thus an innermost application is referred to as an RA (Reducible Application). Notice that "reducing" can be a misleading term in some circumstances: the new text may contain many new applications and may be either longer or shorter than the old.

2. Prior Art

2.1 The Mago Machine

This section introduces the basic ideas and terminology associated with Mago's machine, giving only that information needed for understanding the invention disclosed here. For the definitive treatment, see Mago's paper and patent disclosure, previously referenced. Although Mago designed the machine to execute Reduction languages, the precursors of FFPs, his machine will be discussed here as if it were designed for FFPs. This involves primarily a minor notational change.

The Mago machine is essentially a binary tree of microprocessors, with additional connections between adjacent leaf cells. The leaf cells are called L cells (for Leaf, or Linear), and collectively are known as the L array. The internal cells are called T cells (for Tree). All the L cells are identical; all the T cells are identical except for the root cell, which acts as an I/O port for the machine. (Since this would be too tight a bottleneck in a large machine, Mago suggests the possibility of allowing more I/O ports--all the T cells at some particular level of the tree, say.)

The FFP program text resides in the L cells, one symbol per cell, in left to right order. The text is not required to occupy contiguous L cells; empty L cells may be interspersed among the occupied ones. The colon used to separate the operator and operand of an application is syntactically superfluous and is not stored in the machine. For the same reason, the commas separating the elements of a sequence are not stored. The right application bracket and the right sequence bracket, `)` and `>`, are not explicitly stored either; the structural information they convey is represented in the L array by absolute level numbers (ALNs) attached to the symbols of the FFP text. The ALN of a symbol represents the symbol's nesting level in the FFP program. The program (DBL:<A,<B,C>>), for instance, might appear in the L array as follows:

    ______________________________________
    Symbol  (     DBL      <         A   <     B   C
    ALN     0     1        1         2   2     3   3
    Cell #  6     7        8   9     10  11    12  13
    ______________________________________


The machine works by sending ("sweeping") information up and down the tree network, combining or copying it upward and broadcasting or selectively distributing it downward. The operation is asynchronous, each cell operating at its own speed and accepting input from neighboring cells only when it is ready. Some functions of the machine require the cooperation of all the cells; others can be done locally, by one cell or by a group of cells.

The overall operation can be described in terms of states that the cells pass through, corresponding to the various activities that go on in the machine. The state changes sweep up and down the tree, usually in constant time, but in data-dependent time for some activities. A basic machine cycle, sufficient to execute a simple RA, goes through 14 states, corresponding roughly to 14 sweeps (7 up and 7 down).

The hardware requirements for each cell are modest. An L cell consists of a CPU, a "microprogram" store (see below), a state control unit, and several registers. The T cell has similar requirements, but it does not need a microprogram store.

Execution is accomplished through the various internal mechanisms of the machine, which provide a framework of fixed services such as syntactic analysis and storage management, and through the microprogramming language, which expresses the detailed actions the cells should take to execute each FFP primitive operator.

The major activities of the machine are:

1. Finding the RAs and allocating to them disjoint areas of the network within which to work. This process is called partitioning. An RA's area consists of the L cells in which its text resides and a binary tree of T cells (or parts of T cells) above the RA. An individual T cell may serve as many as four RAs at once. Partitioning is done in parallel throughout the machine and is accomplished in one upsweep and one downsweep, in time proportional to the height of the tree network.

2. Supplying microprograms to the RAs, on the basis of their operators, to control their execution. A microprogram defines an FFP primitive operator. A microprogram consists of instructions to be executed by the individual L cells holding the text of the RA. Corresponding to each microinstruction is a destination expression specifying the relative level number of the subexpression to which the instruction should be given. (The relative level number of a symbol in an RA is the difference between the symbol's ALN and the ALN of the application bracket of the RA.)

The microprograms for the primitive FFP operators are stored outside the machine and are brought in through the root of the machine upon request by the top cells of the areas. A microprogram is distributed through the T cells of the area to the L cells of the RA using a directory previously computed and stored in the T cells. The directory in a T cell is a synopsis of the high-level structure of the FFP text beneath the T cell. The distribution process matches up the microinstructions and the FFP expressions from left to right. All the RAs found by the partitioning process get their microprograms during a particular sweep of states through the machine. The time spent in distributing microprograms is proportional to the total length of all the distinct microprograms requested by the RAs.

3. Syntactically analyzing the RA. This is done in two phases. The first is the computation of the directory (see above), which analyzes the entire RA down to three levels relative to the application bracket, for the purpose of guiding the distribution of the microprograms. This requires a single upsweep, taking time proportional to the height of the tree.

The second phase of syntactic analysis is done in response to a Mark instruction in the microprogram. Each Mark instruction causes its FFP expression to be analyzed to an additional two levels. The results of this analysis are stored in the L cells holding the expression, so that each symbol of the marked expression knows such things as: the marker (e.g., `X`) that it is marked with; the number of top-level subexpressions in the expression; which top-level subexpression this symbol is a part of; how many symbols are in the expression; the sequence number (1-origin) of this symbol among all the symbols of the expression. Marking requires two upsweeps and two downsweeps, each taking time proportional to the height of the tree.

Marking allows reference by the microprogram to FFP expressions nested to a depth of five levels relative to the application bracket, so that different parts of the RA can communicate with each other on the basis of syntactic structure.

4. Providing storage (L cells) where needed. If an RA needs additional space to store its results, the space is made available through the mechanism of storage management. As soon as each L cell in the machine knows how many empty cells to request on its right and left (on the basis of the size of the expression(s) to be inserted there), storage management preparation takes place. This consists of two upsweeps and two downsweeps (taking time proportional to the height of the tree) to compute how far each symbol in the L array needs to move right or left to provide empty cells where they are needed. Then the storage management itself takes place: each symbol is moved directly from L cell to neighboring L cell, until it reaches its destination, leaving behind it a trail of cell markers in the "empty" cells. The cell marker in an "empty" L cell is later used by the cell during execution of the microprogram to determine which incoming symbol to accept and store. The time required for moving the symbols to their destinations and setting the cell markers is proportional to the longest distance any symbol has to travel. (The unit of distance here is the distance between two adjacent L cells.) This can be as small as one or as large as one less than the number of L cells in the machine, depending on how much storage space is requested and where the full and empty L cells happen to lie. Immediately after storage management, the entire machine has to go through partitioning again, since storage management generally makes the old areas obsolete. After partitioning, the old RAs (those that already have microprograms) can resume execution. The new RAs (those created as a result of the latest set of reductions) get their microprograms.

5. Executing the microprogram. Typical micro-instructions do things such as these: erase the symbol in the L cell; replace the symbol with a literal symbol specified by the microprogram; replace the symbol with some symbol received from the father T cell; send the symbol to the father T cell to be combined with other symbols using some operator; insert at the left (or right) of this symbol the expression marked with `X`; mark with `X` the FFP expression containing this symbol; if this symbol is the last one in its FFP expression, then erase, else do nothing.

The process of sending data up into the tree--to be combined with other data or simply to be copied--and then broadcasting it back down to the entire RA is called data movement. The time it takes to move an expression this way is linear in the amount of data moved through the root of the area, whereas all the other microprogram operations require time proportional merely to the height of the machine. (As has been noted, though, the internal mechanisms of microprogram distribution and storage management can also take time that is independent of and much greater than the height of the tree.) This great difference makes it undesirable to require the machine to wait for all the "slow" RAs to finish executing before looking for new RAs. Hence a mechanism is provided which interrupts all the RAs that are moving a lot of data, just prior to storage management. Each data item in the tree at the time of interruption is pushed back down to where it came from in L, and its special status is noted there. After storage management and partitioning, these interrupted RAs can resume data movement from wherever they left off, and continue until the next interruption, or until they finish.

Data movement can be going on in parts of the machine concurrently with certain other activities taking place elsewhere in the machine: directory building, microprogram distribution, marking, and storage management preparation. However, the entire machine has to participate in storage management itself and in partitioning. No other activities can proceed in parallel with either of these.

2.2 DDMl

U.S. Pat. No. 3,978,452 discloses a data driven network of uniform processing or function modules and local storage units which may be partitioned to accommodate various concurrent operations. The same machine is also described in "The architecture and system method of DDMl: a recursively structured data driven machine," by A. L. Davis, Conference Proceedings of the 5th Annual Symposium on Computer Architecture, April 1978, pp. 210-215, and also in "A data flow evaluation system based on the concept of recursive locality," by A. L. Davis, AFIPS Conference Proceedings, Volume 48, NCC, 1979, pp. 1079-1086. DDMl, the machine described in these references, is a tree machine that executes programs expressed in the form of graphs called DDNs: data-driven nets. Davis considers DDNs to be too low-level for direct use as a programming language.

DDMl differs in some important respects from Mago's machine and from the present invention. DDMl is not cellular, in that the individual PSEs ("processor storage elements") of DDMl are capable of executing any DDN, given sufficient storage, whereas a cellular computer in the sense of Mago and Tolle has the property that each cell performs but a small part of any applicative computation--i.e., the cells cooperate in performing the primitives of the applicative language.

Furthermore, decomposition of the DDN program--when it occurs at all--is done top-down, both in terms of the movement of information among the PSEs and in terms of the parsing of the DDN; thus DDMl has to decide which subtrees should be given which parts of the decomposed net. By contrast, in both Mago3 s (in both senses), and there is no question of how best to decompose the program.

A DDN, furthermore, is such a general kind of graph that mapping it onto the tree of PSEs requires a complex compilation process to be performed by a separate machine before the graph can be entered into DDM1. No such compilation is needed by Mago's machine or by the present invention. That compilation process yields an inflexibly static tree structure that is then decomposed, top-down, and parcelled out in pieces to various PSEs. In the compiled tree structure, certain nodes represent operations to be performed on data that (conceptually) flows to them along their input arcs. Although at first glance this may seem similar to the idea of the SN-CP networks of the present invention, there are in fact significant differences: a node of an SN-CP network corresponds not to a fixed operator, but to a syntactically-defined collection of data stored in the leaf nodes below the node in question, and any of a wide variety of operators may at run-time be associated with the node, quite dynamically. Each operator node of a DDN corresponds to a fixed operator, determined at compile-time. Furthermore, the SN-CP networks are re-built when the underlying program text is modified in the course of execution, whereas the DDN nets are built once and for all at compile-time.

2.3 CORAL

The machine proposed in "A binary tree multiprocessor: CORAL," by Yoshizo Takahashi, Naoki Wakabayashi, and Yoshihiro Nobutomo, in Journal of Information Processing, Vol. 3, No. 4, February 1981, pp. 230-237, consists of 100 or more processors connected in the form of a binary tree. The device is not cellular in construction. It does not contemplate the execution of language primitives by the cooperation of a group of processors. It does not execute programs of an applicative language. The authors state that they have no general method of decomposing a job into programs for the individual processors to execute. The device does not embed syntactic networks corresponding to the syntactic structure of programs or data.

2.4 The Expression Processor

In a paper entitled "The expression processor: a pipelined, multiple-processor architecture," published in IEEE Transactions on Computers, Vol. C-30, No. 8, August 1981, pp. 525-536, Jerry R. VanAken and Gregory L. Zick discuss a machine that includes a number of processing elements (PEs) together with a single central control unit and a single main memory containing data and program code. The PEs can be configured either as an SIMD (single instruction, multiple data-stream) array or as a binary tree network, called X-pipe. The machine is not cellular in nature; as the authors indicate, the expression processor can be compared to a conventional von Neumann machine, with the X-pipe taking the place of the arithmetic-logical unit. The machine does not execute programs of an applicative language. Programs have to be compiled before being entered into the machine; the compiler is responsible for determining a mapping of the parse trees onto the binary tree of PEs. No such compilation is needed by the present invention. Execution of a program is accomplished under control of a single central control unit that feeds operands and instructions to the PEs through an alignment network of switching elements at the base of X-pipe. All movement of information in X-pipe is from the leaves upward toward the root, the results eventually flowing out of the root PE back to the main memory.

In the compiled parse tree structure, certain nodes represent operations to be performed on data that (conceptually) flows to them along their input arcs. Although at first glance this may seem similar to the idea of the SN-CP networks of the present invention, there are in fact significant differences: a node of an SN-CP network corresponds not to a fixed operator, but to a syntactically-defined collection of data stored in the leaf nodes below the node in question, and any of a wide variety of operators may at run-time be associated with the node, quite dynamically. Each operator node in a parse tree in the X-pipe corresponds to a fixed operator determined at compile-time. Furthermore, all data flows in one direction in the parse trees in X-pipe, whereas data can be streamed up and down repeatedly and combined in sophisticated ways in the present invention. Furthermore, the SN-CP networks are re-built when the underlying program text is modified in the course of execution, whereas the parse trees of X-pipe are built once and for all at compile-time. Furthermore, the program text does not reside in X-pipe, but in a central memory, from which portions of the parse trees have to be fed into X-pipe by the central control unit.

3. Summary of the Present Invention

Mago's architecture is a first solution to the problem of mapping an FFP computation into a cellular computer and achieving highly concurrent execution. It is an undertaking of some difficulty. FFPs are elegant, abstract, and far-removed from the realities of hardware. Backus's computational model for their execution is mathematical--it includes no concept of storage space or data movement or processor. It does support concurrency in one vital respect: all innermost applications can be reduced simultaneously. The question is how to achieve this in hardware.

The challenge is to design a scheme for coordinating the activities of a large number of cells. The difficulty does not lie in a lack of power--the cells can be general-purpose processor/memory units. The difficulty lies in controling that power, in putting it to good use. How can the computational tasks and the data be parcelled out to the the processors? How can the processors know what their responsibilities are and what to expect of their neighbors in the way of cooperation?

Mago's work brings to light the interesting possibilities of a binary tree of processors operating on very small components (e.g., one symbol) of an FFP expression. There are advantages to the cellular approach. There are advantages to a language-based design, and to FFPs as the language. There are advantages to the binary tree interconnection pattern and the spreading thin of the FFP text so that communication paths between symbols will be short and so that concurrency can be had at the level of operations on individual symbols.

During the actual execution of an RA in Mago's machine, the T cells are used only for the transmission of messages between L cells and for simple combining operations on the data in certain of those messages. The complexity and the power reside principally in the L cells. Can the T cells be used to better effect? Can they be organized to cooperate in more sophisticated tasks than those possible in Mago's machine?

The invention disclosed here results from the realization that the binary tree of cells offers possibilities for combining and distributing information that were only hinted at in Mago's design. The central theme of this work can be stated in a single sentence: The syntactic structure of an innermost application of an applicative program can be used to impose a structure on the T cells and L cells of the machine--i.e., to embed a network of nodes corresponding to the subexpressions of the application--and the reduction of the application can be expressed in terms of operations performed by those embedded nodes.

The embedded structure, corresponding as it does to the syntactic structure of the RA, and being distributed over a number of L cells and T cells, gives a natural allocation of computing resources, connected in a useful way. The problem of programming the nodes admits of a natural, hierarchical solution corresponding to the hierarchical syntactic structure of the RA.

Mago's machine will be referred to as Machine I. The present invention will be referred to as Machine II.

Although the present invention is described in terms of execution of FFP language programs, it should be understood that only minor modifications are required to make it suitable for executing any similar applicative language. It should be further understood that the techniques described here for embedding syntactic networks and for performing computations in such networks are useful for any suitably structured data, and not only for expressions of an applicative language.

This invention provides:

1. A new representation of the program text, at the syntactic level. Explicit `)` and `>` are used, eliminating the use of ALNs. This makes syntactic analysis simpler and faster. Also, the representation allows the packing of one or more syntactic brackets into an L cell (along with an atom), reducing the number of L cells needed to hold a program. This representation, furthermore, makes it possible to perform certain frequently-occurring operations without having to do storage management. This is an important advantage.

2. A new means of finding the RAs. With the explicit closing brackets, one upsweep suffices to locate the RAs, associate with each RA a TA (Top of Area), and build a tree of storage management mediators above the TAs. This TA-Mediator network is used to control the storage management.

3. A means for building a syntax tree for each RA. Once the TA-Mediator network has been built, another upsweep is used to embed in the T cells above each RA a network of syntax nodes reflecting the syntactic structure of the RA's text. This network is used to accomplish the execution of the RA.

4. A means for reducing an RA, using the embedded network of syntax nodes. This involves the execution of a Syntax Tree Language (STL) program. The STL expresses the actions to be taken by an RA's syntax nodes in order to execute each primitive FFP operator. Thus, the role of the STL in Machine II is similar to that of the microprogramming language in Machine I. In Machine I, the microprogramming instructions are directed principally to the L cells--the T cells are used for some simple combining operations, but primarily for routing of data. In Machine II, by contrast, the computational power resides in the RA's network of syntax nodes, which is embedded in both the L cells and the T cells.

5. A new technique for storage management. The complexity of executing the STL makes it difficult to interrupt an RA for storage management. An executing RA may have complex information stored in the tree, information which is not simply a copy of data in L, but which is a non-trivial combination of that data. Such information cannot gracefully be stored back into the L array in the midst of execution as in Machine I. Thus, Machine II does not interrupt executing RAs to perform global storage management. Instead, the TA-Mediator network is used to perform local storage management in those areas willing to participate.

Furthermore, in Machine II the storage management algorithm moves data through the tree rather than directly from L cell to L cell. This technique is sometimes radically faster than Machine I's, never very much slower, and probably somewhat faster on the average. Using the tree for storage management also eliminates the need for direct connections between adjacent L cells, thus simplifying the physical layout problem as well as reducing the amount of hardware needed.

Other features and advantages of the invention will be apparent from the following drawings and detailed description of a preferred embodiment.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic representation of the interconnection of the cells of the present invention.

FIG. 2 is a schematic block diagram of a first alternative structure for a T cell of the present invention.

FIG. 3 is a schematic block diagram of a second alternative structure for a T cell of the present invention.

FIG. 4 is a schematic block diagram of a multi-purpose module that can be configured to serve as a MED module, an XN module, a TA module, an SN-CP module, or an SN-FC module in a cell of the present invention.

FIG. 5 is a schematic block diagram of an IN module of the present invention.

FIG. 6 is a schematic block diagram of a first alternative structure for an L cell of the present invention.

FIG. 7 is a schematic block diagram of a second alternative structure for an L cell of the present invention.

FIG. 8 is a schematic block diagram of an L cell basis module for the present invention.

FIG. 9 is a schematic representation of the correspondence between the text and nodes in an L cell.

FIG. 10 is a schematic representation of an example of partitioning in the present invention.

FIG. 11 is a schematic representation of all the possible channels that can come into and out of the nodes of the TA-Mediator network.

FIG. 12 is a schematic representation of all the configurations that a T cell can take on during partitioning in the present invention.

FIG. 12A is a schematic representation of all the configurations that an IN module can take on during partitioning in the present invention.

FIG. 12B is a schematic representation of all the configurations that a TA module can take on during partitioning in the present invention.

FIG. 12C is a schematic representation of all the configurations that an XN module can take one during partitioning in the present invention.

FIG. 12D is a schematic representation of all the configurations that a MED module can take on during partitioning in the present invention.

FIG. 13 shows an example of LB, RB, LOFF, and ROFF as computed during partitioning in the present invention.

FIG. 14 is a schematic representation of an example of an embedded TA-Mediator network.

FIG. 15 is a schematic representation of the structure of the TA-Mediator network shown in FIG. 14.

FIG. 16 is a schematic representation of an example of LB, RB, LOFF, and ROFF as computed during partitioning for the rightmost RA shown in FIG. 14.

FIG. 17 is a schematic representation of the execution cycle of the present invention.

FIG. 18 is a schematic representation of an example of an embedded SN-CP network according to a first alignment of text.

FIG. 19 is a schematic representation of an example of an SN-CP network structure according to a first alignment of text, as shown in FIG. 18.

FIG. 20 is a schematic representation of an example of an embedded SN-CP network according to a second alignment of text.

FIG. 21 is a schematic representation of an example of an SN-CP network structure according to a second alignment of text, as shown in FIG. 20.

FIG. 22 is a schematic representation of an example of the correspondence between the text and the leaf nodes of the SN-CP network embedded in an L cell.

FIG. 23 is a schematic representation of several examples of lseq, Lrln, AL, EL, and DL in an L cell.

FIG. 24 is a schematic representation of several examples of rseq, Rrln, AR, ER, and DR in an L cell.

FIG. 25 is a schematic representation of all the possible channels that can come into and out of the interior nodes of an SN-CP network.

FIG. 26 is a schematic representation of an example of a syntactic father and its four syntactic sons.

FIG. 27 is a schematic representation of an example of an embedded SN-CP network.

FIG. 28 is a schematic representation of the structure of the SN-CP network shown in FIG. 27.

FIG. 29 is a schematic representation of an example of #sons, #sym, f#sons, f#sym, pos#sons, and pos#sym for an SN-CP network.

FIG. 30 is a schematic representation of an SN node that has just been called by its syntactic father node.

FIG. 31 is a schematic representation of the s-queues set up by an execution of CALL.sub.-- SONS.

FIG. 32 is a schematic representation of an example of an SN-CP network for an RA that expresses a matrix multiplication.

FIG. 33 is a schematic representation of a pipeline of operators set up in one SN node during execution of the RA shown in FIG. 32.

FIG. 34 is a schematic representation of some of the s-queues in the SN-CP network of FIG. 32, at some point in time after execution of the first CALL.sub.-- SONS.

FIG. 35 is a schematic representation of some of the s-queues in the SN-CP network of FIG. 32, at some point in time prior to the execution of the third CALL.sub.-- SONS.

FIG. 36 is a schematic representation of some of the s-queues in the SN-CP network of FIG. 32 at some point in time after execution of the third CALL.sub.-- SONS.

FIG. 37 is a schematic representation of an example of an SN-CP network after the repartitioning and reparsing that follow the allocation of storage for an RA.

FIG. 38 is a schematic representation of all the configurations that a Mediator node can take on during the process of finding the molten zones.

FIG. 39 is a schematic representation of all the possible channels that can come into and out of the nodes of the TQ-QN network.

FIG. 40 is a schematic representation of an example of molten zones.

FIG. 41 is an schematic representation of an example of computing the number of requested L cells in a molten zone.

FIG. 42 is a schematic representation of an example of denial decisions made in a molten zone.

FIG. 43 is a schematic representation of an example of molten and frozen zones before the BN networks are embedded.

FIG. 44 is a schematic representation of the BN networks that get embedded by the molten zones of FIG. 43.

FIG. 45 is a schematic representation of an example of the computation by a BN network of 1#empty, 1#req, r#empty, r#req, BL, and BR.

FIG. 46 is a schematic representation of an example of text movement in a molten zone.

FIG. 47 is a schematic representation of a syntax tree corresponding to an FFP expression.

FIG. 48 is a schematic representation of a syntax tree and its corresponding relative level numbers.

FIG. 49 is a schematic representation of a deeply-nested FFP expression as it might be stored in the machine.

FIG. 50 is a schematic representation of a syntax tree.

FIG. 51 is a schematic representation of a different form of the syntax tree of FIG. 50.

FIG. 52 is a schematic representation of a flattened syntax tree corresponding to the syntax tree of FIG. 47.

FIG. 53 is a schematic representation of an I channel enclosing three interior channels.

FIG. 54 is a schematic representation of three interior channels, without their enclosing I channel.

FIG. 55 is a schematic representation of an IN node and its channels.

FIG. 56 is a schematic representation of the same IN node and its channels, as it will commonly be drawn.

FIG. 57 is a schematic representation of a syntax tree.

FIG. 58 is a schematic representation of a syntax tree.

FIG. 59 is a schematic representation of a flattened syntax tree corresponding to the syntax tree of FIG. 58.

FIG. 60 is a diagram of an SN-CP network, illustrating the algorithm used to compute the values of certain parse registers.

FIG. 61 is a schematic diagram of a worstcase embedding in an L cell.

FIG. 62 is a schematic representation of a CCC (current channel configuration).

FIG. 63 is a schematic representation of the configuration yielding the embedding in a T cell of the largest posssible number of nodes and channels of each kind.

FIG. 64 is a schematic representation, when D=2, of the maximum number of nodes and channels that ever have to be embedded in a T cell.

FIG. 65 is a schematic representation of an SN-CP network, showing elements of a sequence to be added together.

FIG. 66 is a schematic representation of the SN-CP network of FIG. 65, together with an indication of the symbols sent up by the leaf SN nodes when they have executed WRAPR.

FIG. 67 is a schematic representation of the SN-CP network shown in FIGS. 65 and 66, at a later stage of execution.

FIG. 68 is a schematic representation of an SN-CP network below an SN node.

FIG. 69 is a schematic representation of some L cells after storage management.

FIG. 70 is an illustration of the natural left-to-right ordering of the TAs, the DTAs, and those XNs that see entire XRAs.

FIG. 71 illustrates the relationships among TAs, DTAs, XNs and MEDs.

FIG. 72 is an example of FFP text and insertion request seen by an IN leaf node residing by itself in an L cell.

FIG. 73 shows schematically the 43 L cells required when the insertion requests of FIG. 72 are expanded.

FIG. 74 is a schematic representation of some FFP text and some approved insertion requests.

FIG. 75 shows the text that should appear in L cells 14 through 18, assuming that UPROG1 has a destination range of [14,18] and a sequence range of [1,5].

DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION

For ease of understanding, this description is divided into chapters.

1. The Overall Structure of the Computer

The preferred embodiment of the present invention is a full binary tree of two kinds of cells, as indicated in FIG. 1. The leaf cells 600 are referred to as L cells, and the entire collection of L cells is sometimes referred to as L. The non-leaf cells 200 are referred to as T cells, and the entire collection of T cells is sometimes referred to as T. In this description, it is assumed that the root cell of T acts as the I/O port of the processor, but it should be understood that other arrangements, such as having all the cells at some level of the tree act as I/O ports, are possible.

The program text is stored in the L cells in a representation decribed in the next chapter. The machine 100 operates by sweeping information upward and downward through the binary tree of machine cells, thereby locating the reducible applications, determining the operators to be executed, and reducing the applications. The succeeding chapters describe in detail how these things are accomplished. As is discussed in those chapters, networks of nodes are embedded in the tree of machine cells. Those networks of nodes correspond in well-defined ways to portions of the program text stored in L.

Each node corresponds to a particular portion of program text. A node acts as a processor, performing computations related to its portion of program text. Each node is connected to certain other nodes via two-way channels, which are capable of carrying information between the nodes. By embedding a network of nodes and channels in the machine, we mean the process of configuring the resources of the cells in such a way that the network is represented somehow in the totality of cells.

Any given cell, in general, embeds only a small part of the network--a few nodes and a few channels--and the network is spread across many cells. Later chapters specify in detail the abstract rules for building each kind of network required in the machine. In the present chapter we discuss the relationship between the abstract notion of a network of nodes and channels and the physical hardware required in the cells to represent ("embed") such a network. Various means of representing the embedded networks of nodes are possible; two alternatives are here described.

A network is a graph, and many well-known techniques are suitable for representing and manipulating graphs in a conventional computer. See, for instance, The Art of Computer Programming, Volume 1, Fundamental Algorithms, Second Edition, by Donald E. Knuth, Addison-Wesley, 1973; Combinatorial Algorithms: Theory and Practice, by Edward M. Reingold, Jurg Nievergelt, and Narsingh Deo, Prentice-Hall, Inc., 1977; and The Design and Analysis of Computer Algorithms, by Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman, Addison-Wesley, 1974.

We may, therefore, as a first alternative structure for a T cell, use the one shown in FIG. 2. According to this alternative, the T cell is connected to its left son, its right son, and its father by communication paths 251, 261, and 241, respectively. These communication paths have identical capabilities, and can carry information in both directions simultaneously. Each communication path, in fact, can serve as several channels of the various embedded networks, and each channel can independently carry information in both directions simultaneously. Each channel may have separate physical resources, or the effect of separate channels may be achieved by time-multiplexing over a single physical path. These communication paths connect to communication ports 250, 260, and 240, respectively, in the T cell 200. It is assumed that the cells of the present invention operate asynchronously. The communication ports, therefore, have the necessary logic for asynchronous communication, as well as for handling multiple two-way channels.

These communication ports are connected via communication paths 252, 262, and 223 to a control module 220, which has the ability to perform the partitioning algorithm described in chapter 3, the parsing algorithm described in chapter 5, and the algorithm that embeds the BN network as described in chapter 7. The control module 220 is connected by a communication path 222 to a CPU 230, which is capable of executing STL programs as described in chapter 6 and of performing processing related to storage management as described in chapters 4 and 7. The control module 220 is connected by a communication path 221 to a local storage unit 210, which contains (1) definitions of STL-defined operators; (2) storage used to represent the portions of the networks embedded in the cell; and (3) storage used by the embedded nodes during execution of STL operators as described in chapter 6. In this first alternative structure of a T cell, the embedded networks are presented in the local storage unit 210 using conventional data structures and programming techniques such as those described in the books referenced above. In this first alternative structure of a T cell (and in the corresponding one for an L cell, discussed below), the effect of having a node act as a processor is achieved by time-sharing the CPU among all the anodes represented in the local storage unit.

FIG. 3 indicates a second alternative structure for a T cell 200 of the present invention. As in the first alternative, the T cell is connected to its left son, its right son, and its father by communication paths 251, 261, and 241, respectively. These communication paths connect to communication ports 350, 360, and 340, respectively. Communication port 350 is connected via communication paths 352, 353, 354, 355, and 356 to communication port 340, to a MED module 490, to an XN module 491, to a TA module 492, and to an IN module 500, respectively. Symmetrically, communication port 360 is connected via communication paths 362 through 366 to the same entities. The IN module 500 is connected via communication path 381 to the TA module 492, which is connected via communication path 382 to the XN module 491, which is connected via communication path 383 to the MED module 490, which is connected via communication path 384 to the communication port 340. In this alternative structure for the T cell, the MED module 490, the XN module 491, and the TA module 492 are capable of embedding the portion of the TA-Mediator network (described in Chapter 3 ) corresponding to the MED nodes, the XN nodes, and the TA nodes, respectively. The IN module 500 is capable of embedding the IN nodes described in Chapter 3 and the SN-CP network described in Chapter 5.

The MED module 490, the XN module 491, and the TA module 492 are all specific instances of a multi-purpose module 400, shown in FIG. 4. The multi-purpose module 400 can serve as a MED module, an XN module, a TA module, an SN-CP module (see below), or an SN-FC module (see below), according to the specific logic built in to its control module 420. The multi-purpose module 400 has communication paths 451, 461, 471, and 441, which connect to communication ports 450, 460, 470, and 440, respectively, in the multi-purpose module 400. These communication ports are connected to a control module 420 by communication paths 452, 462, 472, and 423, respectively. The control module has the ability to embed either the MED nodes, the XN nodes, the TA node and two CP nodes, one syntactic level of SN and CP nodes, or one syntactic level of SN and FC nodes (together with associated channels), depending on the role to be served by the particular instance of the multi-purpose module 400. The portions of embedded networks are represented in the local storage unit 410 using conventional data structures and programming techniques such as those described in the books referenced above. The control module 420 is connected by communication path 422 to a CPU 430, which is capable either of executing STL operators or of performing processing related to storage management, depending on the role served by the particular instance of the multi-purpose module 400. The control module 420 is connected by a communication path 421 to a local storage unit 410, which contains (1) storage used to represent the portions of the networks embedded in the cell; and, if this instance of the multi-purpose module 400 is to be used as an SN-CP module or as an SN-FC module, (2) definitions of STL-defined operators; and (3) storage used by the embedded nodes during execution of STL operators as described in chapter 6.

FIG. 5 shows the structure of an IN module 500, as shown in FIG. 3. The IN module 500 has a left communication path 551, a right communication path 561, and a top communication path 541, which correspond respectively to communication paths 356, 366, and 381 of FIG. 3. These communication paths connect to communication ports 550, 560, and 540, respectively, in the IN module 500. Communication port 550 is connected via communication path 552 to communication port 540, and via communication paths 553, 554, 555, and 556 to SN-CP modules 525, 524, 523, and 522, respectively, and via communication path 557 to SN-FC module 521. Symmetrically, communication port 560 is connected via communication paths 562 through 567 to the same entities. The SN-FC module 521 is connected via communication path 581 to SN-CP module 522, which is connected via communication path 582 to SN-CP module 523, which is connected via communication path 583 to SN-CP module 524, which is connected via communication path 584 to SN-CP module 525, which is connected via communication path 585 to the communication port 540. The SN-FC module and each of the SN-CP modules is an instance of the multi-purpose module 400 shown in FIG. 4. Although FIG. 5 shows exactly four SN-CP modules, the intention is that in any particular embodiment of the invention, in accordance with said second alternative structure of the T cell, the IN module 500 shall have a plurality of SN-CP modules, the exact number of which shall determine the parsing depth, D, of the machine, as described in Chapter 5. Each SN-CP module has the capability of embedding one SN node and two CP nodes, and of executing STL operators as described in Chapter 6. The SN-FC module has the capability of embedding one SN node and one FC node, and of executing STL operators as described in Chapter 6.

FIG. 6 indicates a first alternative structure for an L cell 600 of the present invention. According to this alternative, the L cell is connected to its father by communication path 641. This communication path can carry information in both directions simultaneously. It can, in fact, can serve as several channels of the various embedded networks, and each channel can independently carry information in both directions simultaneously. Each channel may have separate physical resources, or the effect of separate channels may be achieved by time-multiplexing over a single physical path. This communication path connects to communication port 640, which is connected via communication path 623 to a control module 620, which has the ability to perform the partitioning algorithm described in chapter 3, the parsing algorithm described in chapter 5, and the algorithm that embeds the BN network as described in chapter 7. The control module 620 is connected by a communication path 622 to a CPU 630, which is capable of executing STL programs as described in chapter 6 and of performing processing related to storage management as described in chapters 4 and 7. The control module 620 is connected by a communication path 621 to a local storage unit 610, which contains (1) definitions of STL-defined operators; (2) storage used to represent the portions of the networks embedded in the cell; and (3) storage used by the embedded nodes during execution of STL operators as described in chapter 6. In this first alternative structure of an L cell, the embedded networks are represented in the local storage unit 610 using conventional data structures and programming techniques such as those described in the books reference above.

FIG. 7 indicates a second alternative structure for an L cell 600 of the present invention. As in the first alternative, the L cell is connected to its father by communication path 641. Communication path 641 connects to communication port 740. Communication port 740 is connected via communication paths 871 and 841 to the L cell basic module 800, and by communication path 742 to an IN module 500. The IN module 500 is connected to the L cell basis module 800 via communication paths 861 and 851. The IN module 500 is identical to the one already shown in FIG. 5.

In this alternative structure for the L cell, the L cell basis module 800 is capable of embedding the leaf nodes of the TA-Mediator network (described in Chapter 3), and the leaf nodes of the SN-CP network (described in Chapter 5), and the leaf nodes of the BN network (described in Chapter 7).

FIG. 8 shows the structure of the L cell basis module 800, which has four communication paths, 841, 851, 861, and 871, which connect to communication ports 840, 850, 860, and 870, respectively, which in turn are connected to control module 820 via communication paths 823, 824, 825, and 826, respectively. The control module 820 is connected via communication path 822 to a CPU 830, and via communication path 821 to a local storage unit 810. The control module 820 is capable of embedding the leaf nodes of the TA-Mediator network, the leaf nodes of the SN-CP network, and the leaf nodes of the BN network. The CPU 830 is capable of executing STL programs as described in Chapter 6 and of performing processing related to storage management as described in chapters 4 and 7. The local storage unit 810 contains (1) definitions of STL-defined operators; (2) storage used to represent the portions of the networks embedded in the basis module 800; and (3) storage used by the embedded nodes during execution of STL operators as described in Chapter 6.

The paragraphs above describe two alternative structures for T cells and L cells. In the first alternative for each kind of cell, there is a single CPU, a single control module, and a single local storage unit for each cell. In the second alternative each cell has a plurality of modules, each with its own set of computing resources; and each module has responsibility for a relatively small portion of the duties of the cell. As will become clearer in the following chapters, the duties of each module in the second alternative structures for the cells are syntactically defined--i.e., each module has responsibility for computations involving specific kinds of nodes that correspond to specific syntactically-defined entities in the program text. In the case of each SN-CP and SN-FC module, these computational responsibilities are further restricted to expressions at a certain syntactic nesting depth in an innermost application of a program.

It should be understood that the two alternative structures here described for the T cells and for the L cells do not exhaust the possibilities included in the intent and scope of the present invention. Among other alternatives particularly included are structures in which various of the modules share certain resources, such as storage and such as CPUs. Also particularly included are structures in which each module is further subdivided into submodules with further-restricted responsibilities, such as computational responsibility for a single node of an embedded network.

2. A Compact Representation of the FFP Text

Recall that in Machine I, the FFP text is stored in the L cells, at most one symbol (atom or syntactic bracket) per cell. All closing brackets (`>` and `)`) are omitted in this representation, and the structural information they convey is preserved by storing with each remaining symbol an absolute level number (ALN). Consider the possibility of using instead the following "natural" representation: at most one symbol per L cell, all brackets appearing explicitly, with no ALNs.

There are advantages to be gained by using this natural representation. First, it is natural, and simple, and requires less pre-processing of the FFP text upon input and output. More importantly, parsing is faster with this representation. Locating the boundaries of the RAs, for instance, requires in Machine I an upsweep and a downsweep; with the natural representation of the text, a single upsweep suffices. Even more importantly, corresponding to the natural representation is a natural embedding in the T cells of a syntax tree reflecting the structure of the FFP text in the L array. Unfortunately, the natural representation requires more L cells than Mago's does to store a given FFP program. This is a serious program, since the L cell is such a precious and limited resource in the machine. It would be possible, of course, to pack more than one FFP atom per L cell, but this would reduce the degree of parallelism possible on the machine. It is, however, possible to pack many syntactic brackets into an L cell, with little or no loss of opportunity for parallelism, and with but little additional hardware required per L cell. One way to do so is described below. It is called the PC representation (for Potentially Compact). The PC representation ordinarily uses even fewer L cells than Mago's to store a given FFP program. It has the further advantage of obviating storage management for some of the common syntactic transformations on FFP text. This is an important advantage.

For the PC representation of Machine II, five registers are needed in each L cell:

S, to store an FFP atomic symbol

LAP, to store a count of left application brackets

RAP, to store a count of right application brackets

LSEQ, to store a count of left sequence brackets

RSEQ, to store a count of right sequence brackets.

The registers are logically ordered from left to right like this:

LAP LESQ S RSEQ RAP

Let the values stored in these registers be denoted by lap, lseq, s, rseq, and rap, respectively. Then the text represented by these values is:

    ______________________________________
     ##STR1##


______________________________________


For example, suppose that the following values are stored in the registers of some L cell:

    ______________________________________
    REGISTER:
             LAP       LSEQ    S      RSEQ  RAP
    VALUE:   3         2       DBL    1     0
    ______________________________________


then the text represented in the L cell is:

(((<<DLB>

It is clear that lap, lseq, rseq, and rap will always be non-negative integers. Furthermore, it will always be the case that, in any given L cell, either lap=0 or rap=0. For if lap and rap were both positive, then an entire RA would be represented in the cell. This is not possible, because an RA has to have two top-level subexpressions--the operator and the operand--and it is easy to see that no more than one entire expression at any single nesting level can be stored in one L cell in this representation.

The representation is called "Potentially Compact" rather than simply "Compact" because execution of FFP programs naturally leads to less-than-fully-compact text, and the proper functioning of the machine is independent of the degree of compaction of the text. The text of the example above, for instance, could occupy anywhere from 1 to 7 cells. The degree of compaction is immaterial, except as it affects the amount of storage space (i.e., the number of L cells) available for storage management.

Notice the following simple but important properties of the PC representation (wich are also true, incidentally, of the representation of Machine I):

1. The left-to-right order of the FFP text is preserved.

2. Given two RAs, RA1 and RA2, one of them lies entirely to the left of the other.

3. No L cell contains text for more than one RA. In other words, no two RAs have to share any L cell.

4. No L cell contains more than one entire FFP expression at any given nesting level.

5. No L cell contains a entire RA, since each RA has two expressions at the same nesting level: the operator and the operand.

3. Partitioning

3.1 Introduction

In order to allow the simultaneous execution of all the RAs, Machine II--like Machine I--partitions itself into disjoint areas, thereby allocating to each RA a part of the machine for that RA's exclusive use. The partitioning process embeds in the machine a tree of several kinds of nodes, connected by several kinds of channels. The nodes act as processors and the channels carry information between the nodes. A given machine cell may have several nodes and channels embedded in it. The resulting embedded structure is called the TA-Mediator network, in reference to two of its several kinds of nodes, the TA and the Mediator.

"TA" stands for "Top of Area of an RA." The partitioning process associates exactly one TA with each RA, embedding the TA in the lowest cell that "sees" the entire RA. The TA supervises execution of the RA.

Corresponding to each pair of adjacent RAs (i.e., each pair of RAs not separated by another RA) is a Mediator node, embedded in the lowest machine cell that "sees" the entire text of both RAs. The Mediators supervise storage management in Machine II.

3.2 The Nodes and Channels of the TA-Mediator Network

The TA-Mediator network embedded by the partitioning process is a tree whose nodes are of seven types, denoted by the following node symbols:

1. TA--"Top of the Area of an RA" node.

2. MED--"Mediator" node.

3. IN--"Internal" node.

4. XN--"External" node.

5. (- --"Left application bracket" node.

6. -)--"Right application bracket" node.

7. DTA--"Dummy TA" node.

An explanation of the names may be helpful. The TA and the Mediator have already been mentioned. An IN node embedded in a cell corresponds to text that that cell has determined may be internal to some RA. An XN node corresponds to text that the cell knows is not internal to any RA (hence is external). The (- and -) nodes correspond resectively to `(` and `)` in the program text. Exactly two DTAs (Dummy TAs) are embedded in the machine--one in the leftmost L cell and one in the rightmost L cell--in order to simplify the partitioning possibilities at the boundaries of the machine. The DTA acts just like a TA in the partitioning process, but (unlike a TA) has no RA associated with it. (It is sometimes convenient to assume that a DTA has a dummy RA associated with it.)

The arc between two nodes of the TA-Mediator network is called a channel, because it is used to carry information between the nodes. There are five channel types, denoted by the following channel symbols:

1. M--"Mediator" channel.

2. I--"Internal" channel.

3. X--"External" channel.

4. (--"Left application bracket" channel.

5. )--"Right application bracket" channel.

The reasons for choosing these names for the channels should become clear shortly.

The partitioning process is done in a single sweep of information up through the machine. This upsweep does two things:

1. Embeds the nodes and channels of the TA-Mediator network.

2. Stores in every IN node some information concerning the FFP text below it, for use on the following downsweep.

Although both of these are done at the same time, for simplicity they are described separately.

3.3 Embedding the Leaf Nodes of the TA-Mediator Network

The L cells initiate the partitioning process by examining the FFP text they hold and embedding the appropriate leaf nodes of the TA-Mediator network, as specified in Chart A. (Recall that lap and rap are, respectively, the number of left and right application brackets stored in the L cell.)

                  CHART A
    ______________________________________
    Leaf nodes embedded by an L cell
                rap = 0 rap > 0
    ______________________________________
    lap = 0       IN        IN -) XN
    lap > 0       XN (- IN  [Error]
    ______________________________________


The two endmost L cells of the machine, furthermore, embed three additional nodes: XN DTA XN. The leftmost cell embeds them to the left of whatever else it embeds, the rightmost to the right. The left-to-right order of the nodes is important: it reflects the order of the text. The error case occurs if both `(` and `)` are represented in the cell; this is illegal in the PC representation and assumed not to happen. Notice that every L cell embeds exactly one IN node.

Both of the DTAs in the machine are leaf nodes of the TA-Mediator network. Every (- and every -) is a leaf node. Every TA and every MED is a non-leaf node. An IN or XN node may be either a leaf node or a non-leaf node. These facts will all be easy to verify once the partitioning algorithm has been specified.

The concept of the text seen by (or corresponding to) a node is important. For now, the definition is specified for a leaf node. Later, the definition is extended to include all the nodes of the TA-Mediator network.

1. An IN leaf node sees the text represented by the LSEQ, S, and RSEQ registers of the L cell--i.e., all the text in the L cell that may lie inside some RA. Since every L cell embeds an IN node, it is quite possible for an IN node to see no text; this is the case if lseq=0, rseq=0, and no atom is stored in S.

2. The (- node sees the most deeply nested `(` in the L cell. Notice that if there is no `(` represented in the cell (i.e., if lap=0), then no (- is embedded.

3. The -) node sees the most deeply nested `)` in the L cell. Notice that if there is no `)` represented in the cell (i.e., if rap=0), then no -) is embedded.

4. An XN leaf node immediately to the left of a (- node sees any left application brackets to the left of the innermost `(` in the L cell. Notice that if lap=1, the XN immediately to the left of (- sees no text.

5. An XN leaf node immediately to the right of a -) node sees any right application brackets to the right of the innermost `)` in the L cell. Notice that if rap=1, the XN immediately to the right of -) sees no text.

6. An XN leaf node immediately next to a DTA sees no text.

7. The DTA node sees no text.

FIG. 9 indicates informally the correspondence between the FFP text and the embedded leaf nodes of an L cell (although no single L cell could actually have all the text shown.) This correspondence is, more formally, an order-preserving function from the sequence of FFP symbols in the machine into (but not onto) the sequence of leaf nodes of the TA-Mediator network. It therefore makes sense to speak, for instance, of the text between a DTA and an RA, even though the DTA is not itself part of the text; the correspondence defines the DTA's position with respect to any text symbol.

3.4 The Partitioning Algorithm

Corresponding to every node is a channel leading upward from the node: the upward channel of the node. Chart B gives the correspondence. Each L cell, having embedded nodes in accordance with Chart A, then uses the information in Chart B to guide the embedding of the appropriate channels leading upward from those nodes. The left-to-right order of the channels is important; it reflects the order of the nodes.

    ______________________________________
    Chart B
    The channel leading upward from a node
    Node      Upward channel of the node
    ______________________________________
    TA        M
    MED       M
    IN        I
    XN        X
    (-        (
    -)        )
    DTA       M
    ______________________________________


Having embedded the leaf nodes and their upward channels, each L cell then determines whether any adjacent channels can be brought together to form a (non-leaf) node in the L cell. Chart C shows all the legal combinations. The two or three channels brought together to form a node are called the downward channels of the node, since they connect the node to lower-level nodes of the TA-Mediator network.

    ______________________________________
    Chart C
    The node formed from adjacent channels
    Pattern
    (downward channels)
                      Node to be embedded
    ______________________________________
    1.     I I            IN
    2.     I X            XN
    3.     X I            XN
    4.     X X            XN
    5.     X )            XN
    6.     ( X            XN
    7.     ( I )          TA
    8.     M X M          MED
    ______________________________________


Having embedded a non-leaf node, the L cell then uses the information in Chart B to embed the upward channel for that node. This yields a new set of channels for the L cell to try to bring together--the upward channel of the new node having replaced the downward channels. The cell continues to embed new nodes in this manner until no more adjacent channels match the patterns of Chart C. Then the final set of channels is extended upward to the father cell.

The process just described is specified more formally by Algorithm PARTITION, shown in Chart D. The vector of channels being processed by a cell at any instant during partitioning is called the Curent Channel Configuration, or CCC of the cell. Algorithm PARTITION is executed not only by each L cell, but also by each T cell. When a cell (other than the root cell of the machine) has embedded the nodes and channels as described, it sends its final CCC on to its father T cell. A T cell gets its initial CCC by concatenating the CCCs it receives from its sons. The global effect of all this is to embed the TA-Mediator network in the machine. The characteristics of the TA-Mediator network are not significantly affected by the particular choices made in the "Select" statement of the algorithm.

    ______________________________________
    Chart D
    Algorithm PARTITION
    ______________________________________
    Do while (the CCC contains adjacent channels
    matching a pattern in Chart C)
    { Select any adjacent channels in the CCC which
    match a pattern in Chart C.
    Embed the corresponding node, giving it as
    downward channels the selected channels
    of the CCC.
    Using Chart B, embed the appropriate upward
    channel for the node.
    Form a new CCC from the old CCC by substituting
    the channel just embedded in place of the
    selected channels.
    Send the CCC to the father T cell (unless this is
    the root cell of the machine).
    ______________________________________


The reader may have noticed that the term "channel" has been used here to denote two distinct concepts:

1. The entire arc between two nodes

2. The part of a channel that lies in a single machine cell.

In most situations, the context makes clear which meaning is intended. When there is no danger of confusion, "channel" will continue to be used in both senses. The term channel segment will be used to refer specifically to the second concept.

3.5 An Example

FIG. 10 shows the TA-Mediator network embedded in a machine holding this RA: (B<CD>). The machine has only four L cells. For simplicity, such diagrams ordinarily show the syntactic brackets rather than the PC representation registers. Each channel is labelled with its channel type. The nodes are circled to help distinguish them from the channel labels. This convention is followed from here on.

With reference to FIG. 10, we see that after the embedding of leaf nodes has been done, the cells proceed to combine channels together and embed the resulting non-leaf nodes. In this example, cell #1 combines X and I into an XN node, which yields, according to Chart B, an X channel. The CCC thus becomes X M X, which cannot be reduced further by Chart C. Hence the final CCC for cell #1 is X M X.

Cell #2 and cell #3 have no matching patterns, so they send on their CCCs without change. Cell #4 combines X X into an XN node, yielding an X channel and a final CCC of: I ) X M X.

The T cell above L cells #1 and #2 gets this inital CCC: X M X X (I. The pattern X X forms an XN node, yielding an X channel, so the CCC sent on to the father T cell is: X M X (I.

The T cell above cells #3 and #4 has as its initial CCC: I I ) X M X. It combines I I into an IN node, which yields an I channel, and this final CCC: I ) X M X.

Finally, the root T cell initially has this CCC: X M X (I I ) X M X. Four nodes are formed. First, I I yields IN, which yields an I channel. Then (I ) yields a TA node, which yields an M channel. The CCC at this point is X M X M X M X.

The second, third and fourth channels match pattern #8 of Chart C. So do the fourth, fifth, and sixth channels. In FIG. 10 an arbitrary choice has been made to reduce first the rightmost selectable channels. This gives a MED and leaves a CCC of X M X M X, which again matches pattern #8, producing a second MED and a final CCC of X M X. Since this is the root T cell, the partitioning is complete at this point. (If it were not the root, the CCC of X M X would be sent to the father T cell.)

3.6 Properties of the TA-Mediator Network

Charts A, B, C, and D together define the types of channels that can lead into and out of each type of node. FIG. 11 summarizes all the possibilities schematically. FIG. 12 gives a synopsis of the possibilities for a T cell in the partitioning process. There are 64 configurations a T cell may have to take on (but only one during any particular partitioning). The figure coalesces some similar configurations via the "alpha" and "beta" notation, as explained in the figure. The order of the rows and columns has been chosen deliberately to show the left-right symmetry. The entry in the bottom right corner (which represents 4 of the 64 configurations) is the only one in which there was a choice of which channels to combine together into a node. As in the example, an arbitrary choice has been made to combine the rightmost selectable channels first. The only other choice--using the leftmost selectable channels first--would have been just as appropriate; nothing depends critically on that choice. In both cases, the same number of nodes, of the same types, get embedded. The same CCC is sent to the father T cell. The only distinction is whether the TA is the right son or the left son of its MED father, and whether that MED is the right son or left son of its MED father. The only possible difference this can make is in the storage management process. In fact, it makes no significant difference.

With reference to FIG. 3, it is now possible to give, for each configuration of channels entering communications port 350, the configuration of channels carried by each of the communications paths leading upward from communications port 350. Table A below gives this correspondence.

                  TABLE A
    ______________________________________
     350        352     353    354    355  356
    ______________________________________
    I                                      I
    I ) X       I )            X
    X ( I                      X      (    I
    I ) X ( I   I )            X      (    I
    X M X       X       M      X
    I ) X M X   I ) X   M      X
    X M X ( I   X       M      X      (    I
    I ) X M X ( I
                I ) X   M      X      (    I
    ______________________________________


Again with reference to FIG. 3, it is now possible to give, for each configuration of channels entering communications port 360, the configuration of channels carried by each of the communications paths leading upward from communications port 360. Table B below gives this correspondence.

                  TABLE B
    ______________________________________
     360        366    365      364  363    362
    ______________________________________
    I           I
    I ) X       I      )        X
    X ( I                       X           ( I
    I ) X ( I   I      )        X           ( I
    X M X                       X    M      X
    I ) X M X   I      )        X    M      X
    X M X ( I                   X    M      X ( I
    I ) X M X ( I
                I      )        X    M      X ( I
    ______________________________________


FIG. 12A shows all the configurations that may be taken on by an IN module during partitioning. (See FIGS. 3, 5, and 7.) Notice that one of the possibilities is that no channels and no nodes get embedded.

FIG. 12B shows all the configurations that may be taken on by a TA module during partitioning. (See FIG. 3.) Notice that one of the possibilities is that no channels and no nodes get embedded.

FIG. 12C shows all the configurations that may be taken on by an XN module during partitioning. (See FIG. 3.)

FIG. 12D shows all the configurations that may be taken on by a MED module during partitioning. (See FIG. 3.)

A definition has been given for the text seen by a leaf node of the TA-Mediator network. That definition is now extended. The text seen by a non-leaf node of the TA-Mediator network is defined (recursively) to be the text seen by the sons of the node. The L cells seen by a node are the L cells containing the text seen by the node. The text seen by an L cell is the text represented in that L cell (by LAP, LSEQ, S, RSEQ, and RAP). The text seen by a T cell is the text seen by its son cells.

The FFP text strictly between two adjacent RAs (or between an RA and an adjacent DTA, or between two adjacent DTAs) is called an XRA (external to an RA). An XRA consists of zero or more FFP symbols. Every FFP symbol in the machine lies either in some RA or in some XRA.

The following properties of the TA-Mediator network are easy to verify:

1. A non-leaf IN node has two sons, both of which are IN nodes.

2. A non-leaf XN node has two sons, at least one of which is an XN node.

3. A TA node has three sons: (- and IN and -).

4. A MED node has three sons. The left son is a DTA, a TA, or a MED. So is the right son. The middle son is an XN.

5. A DTA, (-, or -) is always a leaf node.

6. The middle subtree of a TA is a binary tree of IN nodes.

7. Each TA sees one RA, and nothing else.

8. The IN son of a TA sees all the text between the `(` and `)` of the RA, and nothing else.

9. Each RA is seen by exactly one TA.

10. The TA of an RA lies in the lowest cell seeing the whole RA.

11. Each RA area (i.e., the TA and its subtree) is disjoint from the other RA areas.

12. The XN son of each MED sees precisely one XRA, and nothing else.

13. Each XRA is seen by the XN son of exactly one MED.

14. The MED whose XN son sees a given XRA lies in the lowest T cell seeing the two RAs (or the DTA and RA, or the two DTAs) that bound the XRA.

15. There is a MED (or two of them) in the root cell of the machine. This is guaranteed by the presence of the DTAs in the endmost cells.

16. Except for the two outermost XN nodes and their X channels, the TA-Mediator network is a tree.

17. The subgraph of the TA-Mediator network consisting solely of the TAs, DTAs, MEDs, and M channels is a binary tree whose non-leaf nodes are MEDs and whose leaf nodes are TAs and DTAs.

3.7 Storing Information in the IN Nodes

The partitioning upsweep not only builds the TA-Mediator network, it stores in each IN node two kinds of information concerning the text seen by the node:

1. Indications of whether it and its sons see any FFP text at all.

2. Relative syntactic nesting depth information.

These are now discussed in more detail.

3.7.1 Mtext, Ltext, and Rtext

Each IN leaf node has a Boolean flag, denoted by Mtext (for "My text") that is true if the node sees some FFP text, and is false otherwise. Each IN non-leaf node has three Boolean flags: Mtext, Ltext, and Rtext, indicating, respectively, whether it, its left son, and its right son see any FFP text.

Thus each I channel carries an Mtext value during the partitioning upsweep. When a non-leaf IN node is formed, it sets Ltext to the value received from the left, Rtext to the value received from the right, and Mtext to the logical or of the two values. It sends Mtext on up its upward I channel.

Only the IN nodes have to perform this computation. The I channel leading into a TA node carries Mtext=true. The TA can discard this value. The only other node with an I downward channel is the XN node; it, too, can discard any Mtext values that come to it.

3.7.2 Relative Offsets

The other computation done on the partitioning upsweep stores in the IN nodes information about the relative syntactic nesting depths of the FFP text corresponding to the two subtrees of the node. This informtion is used on the next downsweep to tell every FFP symbol that lies inside an RA what its nesting depth is, relative to the RA's application brackets.

On the partitioning upsweep, for this purpose, every I channel carries two non-negative integers, LB and RB, and every IN node stores two non-negative integers, LOFF and ROFF.

LB and RB represent the number of unmatched sequence brackets on the left and right boundaries, respectively, of the subtree under the I channel. By definition, the number of unmatched sequence brackets on the left boundary is the number of unmatched right sequence brackets in the subtree, and vice versa.

"LOFF" and "ROFF" stand for "left offset" and "right offset." The highest level of FFP text in one subtree of an IN node may be at a deeper syntactic nesting level than the highest level of FFP text in the other subtree. LOFF and ROFF indicate which (if either) subtree is deeper. If LOFF<0 then ROFF=0 and the left subtree is deeper by LOFF levels. If ROFF<0 then LOFF=0 and the right subtree is deeper by ROFF levels. Otherwise, the highest levels of FFP text in the subtrees are at the same nesting depth (and LOFF=ROFF=0).

Each IN node, upon being formed from two adjacent I channels, uses the LB and RB values on these channels to compute LOFF, ROFF, and the output values of LB and RB. Let LB1 and RB1 denote the values from the left channel; LB2 and RB2 denote the values from the right channel; and LB and RB denote the values sent on up to the father on the upward channel.

The leaf IN nodes can use the same computational algorithm as the other IN nodes if the appropriate input values of LB1, RB1, LB2, and RB2 are assumed. Recall that every L cell embeds exactly one IN leaf node; that IN leaf node should act as if it had the following input values:

LB1=0 RB1=lseq LB2=rseq RB2=0

(Recall that lseq and rseq are the number of left and right sequence brackets stored in the cell.) In other words, the IN leaf node acts as if it has a left son that sees the (lseq) left sequence brackets and a right son that sees the (rseq) right sequence brackets.

The algorithm for computing LOFF, ROFF, LB, and RB is given in Chart E. In effect, this algorithm matches up the (unmatched) left sequence brackets of the left son with the (unmatched) right sequence brackets of the right son. If there are more left brackets, ROFF becomes positive; if there are more right brackets, LOFF becomes positive; otherwise both become 0.

    ______________________________________
    Chart E
    Algorithm for computing relative offsets
    ______________________________________
    OFF              := RB1 - LB2
    ROFF             := Max ( 0, OFF )
    LOFF             := Max ( 0, -OFF )
    LB               := LB1 + LOFF
    RB               := RB2 + ROFF
    Store LOFF and ROFF.
    Send LB and RB up the upward channel.
    ______________________________________


Only the IN nodes have to perform this computation. The I channel leading into a TA node carries LB=0 and RB=0, assuming the RA is syntactically well-formed; the TA can discard these values. The only other node with an I downward channel is the XN node; it, too, can discard the LB and RB values that come to it. FIG. 13 shows a small example.

3.8 Some Implications of Partitioning

The following brief observations can now be made about the requirements of the cells of Machine II:

1. The nodes act as processors. Each has some local storage. Whether each node actually is a separate processor, or the nodes time-share one or more processors, is a question to be decided on the basis of cost and performance.

2. The channels are used as two-way communication links between nodes. This implies the need for some registers and some simple processing capabilities for each channel in each cell.

3. No L cell other than the leftmost and the rightmost in the machine has to embed more than 3 nodes. The L cells on the ends of the machine have to embed no more than 7 nodes. Exactly one of the nodes embedded in each L cell is an IN node. Chapter 5 describes how that IN node may be refined by embedding several other nodes within it.

4. No L cell other than the leftmost and rightmost in the machine has to embed more than 3 channel segments. The L cells on the ends of the machine have to embed no more than 7 channel segments. Exactly one of the channel segments in an L cell is an I channel segment. Chapter 5 describes how that I channel segment may be refined by embedding several other channel segments within it.

5. No T cell has to embed more than 4 nodes. (See FIG. 12.) At most one of these is an IN node. Chapter 5 describes how the IN node may be refined by embedding several other nodes within it.

6. No T cell has to embed more than 1 TA node.

7. No T cell has to embed more than 2 MED nodes.

8. No T cell has to embed more than 3 XN nodes.

9. No T cell has to embed more than 18 channel segments. At most five of these are I channel segments. Chapter 5 describes how each of these may be refined by embedding one or more new channel segments within it.

3.9 Another Example of Partitioning, with Relative Offsets

The next three figures show an example of partitioning in a machine with eight L cells. FIG. 14 shows which cells the various nodes and channels are embedded in. FIG. 15 shows the resulting TA-Mediator network structure, without the machine cells. FIG. 16 is a blow-up of the internal structure of the rightmost RA, showing the LB, RB, LOFF, and ROFF values.

4. Overview of Execution

What is stored in the L array is one or more applicative expressions, each comprising a single user-program. For simplicity, the convention is adopted that each user-program must be an application (u d ), where u is an operator expression of some standard form, say <QUIT jobinfo userinfo>, holding some identifying information about the job and the user; and where d is the applicative program the user wants executed. The operator QUIT is executed, of course, only after d has been reduced to a constant. It is also assumed that the application brackets enclosing a user-program are tagged so that the machine can distinguish them from ordinary application brackets, and a user-program is not allowed to be nested inside another expression. This makes it easy for the machine to detect completion of user-programs.

Machine II has one or more T cells that can serve as I/O ports, as they simultaneously serve in their normal roles as T cells. Each T cell in the machine is connected to each of its two sons by an I/O channel. The I/O channels are (logically, at least) distinct from the channels of the TA-Mediator network. Every T cell knows how to send information to an I/O port along the I/O channels.

Outside of the machine is a processor P that acts as the interface between the users and the machine. P can communicate with each I/O port of the machine. P accepts user-programs from the users and collects information about their space (and perhaps time) requirements. P and Machine II jointly decide when and where new user-programs should be inserted (as described below), and P makes those programs available to the appropriate I/O ports. When a user-program has been executed, it flows back to P through an I/O port, and P returns it to the user. This I/O, of course, must be coordinated with the other activities of Machine II; the following paragraphs describe at a high level how this is done. The succeeding chapters describe certain aspects in more detail.

Assume that some user-programs have been brought into the machine and stored in L, and that the machine has been running for a while. The activities of the machine are cyclic. FIG. 17 shows how the steps of the cycle are related. The description begins with the most easily identified point in the cycle:

1. [Polling] The topmost MED node in the machine sends a polling signal down its M channels. This signal, which is intended to determine the status of the RAs, spreads to all the TAs in the machine.

2. [Finding molten zones] Every TA responds immediately to the polling signal by sending one of these three status indications to its father MED:

(a) the RA has been reduced--i.e., its execution is complete

(b) the RA needs storage now

(c) the RA is still executing, and storage is not needed at this time.

On the basis of the status information, the MEDs locate the so-called molten zones of the machine: those regions outside the RAs with status (c). The RAs with status (c) continue their execution without interference. Any RA with status (a) or (b) becomes part of a molten zone, as does the non-RA text in the machine. The molten zones are the regions of the machine that are ready to have their text shifted around to provide storage where it is needed. Corresponding to each molten zone is a unique MED node that sees that entire molten zone; that MED node supervises steps 3 through 8 of the cycle for that zone.

3. [User-program removal] In each molten zone, each completed user-program is detected and removed from the machine--i.e., the result (the final expression) is returned to the user and erased from the L cells.

4. [Compaction] In each molten zone, after the removal of all completed user-programs, certain of the syntactic brackets may be coalesced, so that the text becomes fully compact in the zone. Execution of RAs--in particular, erasure of expressions--can yield text that is not fully compact. The purpose of this step of the cycle is to regain any such wasted space.

5. [Insertion decisions and denial decisions] Each molten zone decides whether it has too little space to honor all its storage requests, exactly the required amount, or an abundance. If there is too little, it denies the requests of some RAs; they wait until next time. If there is exactly the required amount, then all the requests are satisfied. If there is an abundance, and if the zone is not entirely inside a user-program, then the number of excess cells is reported to an I/O port, which reports it to P. If there are user-programs of the appropriate sizes waiting to be brought in, then a decision is made concerning which of them to bring in, and their sizes are reported back to the top MED node of the molten zone.

6. [Destination computation] Each molten zone--on the basis of the sizes and positions of its undenied storage requests, the sizes of the new user-programs to be inserted, and the locations of the empty L cells--computes where each text symbol presently in the zone should move in order to create empty L cells where they are needed to satisfy all the storage requests of the (undenied) RAs and to provide space for the new user-programs.

7. [Text movement] In each molten zone, the program symbols move from their present locations in L, up through the machine tree, and back down to their L cell destinations.

8. [New user-program insertion] The new user-programs are brought in and stored in the space allocated for them.

9. [Partitioning] In each molten zone, when the text has finished moving and the new programs (if any) have been stored, partitioning is done, as described in the previous chapter. This process builds a new TA-Mediator network reflecting the new alignment of text in the L array. Actually, the algorithm used here is a slightly modified version of Algorithm PARTITION of Chapter 3, the modification being necessary to take into account the fact that part of the TA-Mediator network (the part corresponding to the RAs whose status in step 2 above was (c): still executing) is already standing in place. The partitioning upsweep, in a sense, synchronizes the whole machine, terminating in the root cell only when all the molten zones have finished their text movemet and been re-integrated into the global network (i.e., the new TA-Mediator network). The topmost MED in the machine can then initiate another polling signal to start the next cycle.

10. [Parsing] As each TA is found on the partitioning upsweep, it notices the status of its RA. There are three possibilities:

(a) new RA

(b) old RA, storage having been allocated

(c) old RA, storage having been denied.

In case (c), there was insufficient space available to satisfy the RA's storage request this time, so the TA simply has to wait for the next polling signal to come down, and then ask again for storage. In case (a) or (b), though, the TA immediately initiates a parsing process in its area. This process, described in detail in the next chapter, refines the TA's tree of IN nodes by embedding a more elaborate tree of nodes reflecting the syntax of the RA. This is done in each RA's area concurrently with the continuing upsweep of the partitioning process elsewhere in the machine. There is no need for the RAs to communicate with each other in this process; each proceeds independently of the others.

11. [RA execution] The RAs are reduced concurrently, each beginning as soon as its parsing is completed. Within each RA's area, all the nodes are initially quiescent. The TA initiates the execution. It can activate computational processes in the other nodes of its network. The techniques for specifying and implementing these computations are described in Chapter 6. For now, it is merely noted that the computations can be highly concurrent and that they culminate in the reduction of the RA. Although the computation can involve the concurrent actions of a large number of machine cells, it is always the case that control eventually returns to the TA, all the other nodes of the RA's network becoming quiescent. At this point, the computation may be complete, or it may merely have reached a point at which additional storage is needed in order to continue the reduction. In either case, the TA waits until the next polling signal and then announces its status.

Steps 2 through 8 are collectively referred to as storage management. Chapter 7 discusses storage management.

In any particular molten zone, steps 2 through 9 are mutually exclusive: each step is completed before the next one begins. However, each molten zone goes through these steps independently of the other molten zones. Step 9 (partitioning) eventually causes the molten zones to go out of existence, as the new TA-Mediator network is built. (Steps 7, 8, and 9 could, in each molten zone, be overlapped to some degree, but, for simplicity, this possibility is not pursued here.)

The internal parsing of an RA (step 10) takes place concurrently with the continuing upsweep of the partitioning process (step 9); and as soon as the parsing is complete for an RA, the RA's execution (step 11) begins, independent of whether partitioning and parsing are still going on elsewhere in the machine.

An RA, once it has started execution, proceeds without pause through as many cycles as needed, until it either finishes or decides to ask for storage. Its only continuing responsibility outside itself is to respond immediately to each polling signal that comes down to the TA. (That response requires but a trivial amount of work by the TA.) This contrasts sharply with the situation in Machine I, in which every RA that has not finished by a certain point in the cycle is interrupted, forced to push back into L any data moving through the tree, and then required to participate in global storage management.

5. Embedding the SN-CP Network

5.1 Introduction

A human parses an FFP expression such as

(E<<F<G H I>><<J K>L>M>)

only with some difficulty. The difficulty is eased if the expression is drawn as a syntax tree in which each atom and each pair of matching brackets forms a node, as shown in FIG. 47.

In this form it is immediately obvious that the expression is an application with two subexpressions, the first being E and the second being a sequence which itself has three subexpressions, and so on. The translation back and forth between the two representations is straightforward.

A relative level number (RLN) is associated with each subexpression of the RA, as follows:

The RLN of the RA itself is 0.

If x is an element of expression e (see the discussion of FFPs), and e has RLN=i, then x has RLN=i+1.

There is, furthermore, a natural way to associate an RLN with each symbol of an RA. If the symbol is an FFP atom, then it is also an FFP expression, and thus already has an RLN. If the symbol is a syntactic bracket, then its RLN is taken to be the RLN of the expression delimited by that bracket.

EXAMPLE

    ______________________________________
    Text:  ( E < < F < G H I > > < < J K > L > M > )
    RLN:   011233444322344332210
    ______________________________________


The syntax tree shows it nicely of FIG. 48.

The execution of an RA in Machine II takes place in a tree of nodes reflecting the syntactic structure of the RA. The present chapter describes how that tree of nodes--a refinement of the TA-Mediator network--is embedded in the machine. Ideally, the tree embedded for each RA would look like the syntax trees in FIGS. 47 and 48. Two problems arise, however. Both of them occur because only a fixed amount of hardware can be incorporated in each machine cell.

The first problem is that there is virtually no limit to the number of elements a sequence can have. If <a1 a2 a3 . . . an > is embedded in the machine, the <> node has n sons. Does this mean it has to have n channels feeding directly into it from the sons? If so, the machine cell holding <> has to be able to embed n channels.

The second problem is that there is virtually no limit to the depth to which an expression can be nested. Suppose, for instance, that

<a1<a2<a3 . . . <an-1<an bn>bn-1>. . . b3>b2>b1>

is to be embedded in the machine. The natural way to do so is to embed, for each subexpression, a node in the lowest machine cell seeing the entire subexpression. It may happen that a single cell is the lowest one for all n of the non-atomic subexpressions in the example. The text might, for instance, be stored in the machine as indicated in FIG. 49.

In this case, one machine cell (the one marked in FIG. 49 with *) has to embed n nodes. In other words, the amount of hardware needed in the cell grows linearly with the nesting depth of the text seen by the cell.

The solution proposed here to the first problem is to combine together the upward channels of the sons of a node before the channels reach the node. For instance, the syntax tree shown in FIG. 50 might be treated by the machine as if it were of the form shown in FIG. 51.

It is possible to do so in such a way that every node is guaranteed to have only a few downward channels. Furthermore, it can be guaranteed that only a few of the circular of FIG. 51 to be embedded in any single machine cell.

The solution proposed here to the second problem is to embed a syntax tree corresponding only to the top few syntactic levels (i.e., the least deeply nested levels) of each RA. Most of the computation will be expressed in terms of these top few levels, and will be carried out by the nodes so embedded. (Of all the primitive operators mentioned in Backus's Turing Award Lecture, "Can programming be liberated from the von Neumann style? A functional style and its algebra of programs," Communications of the ACM 21, 8 (August 1978), 613-641, only one requires knowledge of the structure of the operand to a depth greater than three.) It is essential, though, that all the text of the RA be "visible" to the TA for certain basic functions such as copying the text to another location or comparing two expressions for equality.

Thus the following strategy is used. A small positive integer D is chosen at the time the machine is built. In each RA, every sequence bracket with RLN>D is treated, for purposes of parsing, as if it were an FFP atom. In effect, this "flattens" all the syntactic structure below level D into a single "flat" level (with RLN=D+1).

Suppose, for instance, that D=2 for the machine. Then, given the text

(I<<F<G H I>><<J K>L>M>),

the machine would embed a tree structure corresponding not to the syntax tree of FIG. 47 but to the flattened syntax tree shown in FIG. 52.

The coming sections show how to use these ideas to embed, for each RA, a tree of nodes reflecting the top levels of the syntax of the RA, assuming but a fixed amount of hardware in each machine cell. The term parsing will be used to refer to this process of embedding nodes corresponding to the RA's syntax.

5.2 The Downsweep: Computing the RLNs

Once a TA has been formed, it takes charge of the reduction of its RA. Since each TA sees its RA through its own separate network of nodes and channels, the TAs can proceed independently of each other. Each TA initiates parsing immediately upon being formed, so that parsing occurs simultaneously with the continued building of the TA-Mediator network higher up in the machine.

The parsing process refines the middle subtree of the TA (recall that it is a binary tree of IN nodes) by embedding a more elaborate tree reflecting the top several levels of the RA's syntax. The only nodes involved in the parsing are the TA nodes and the IN nodes that lie somewhere beneath a TA and see text. The IN nodes outside the RA areas and the IN nodes inside an RA but seeing no text have, by the time parsing begins, served their purpose. Now they simply sit until the next round of storage management. Each IN node that lies inside an RA area and sees FFP text, though, actively participates in the parsing, and thereby embeds within itself one or more new nodes and channels. These IN nodes, as distinguished from the non-participating ones, are called active IN nodes.

The resulting tree under each TA is called the SN-CP network of the RA, in reference to two of its several kinds of nodes, the SN node and the CP node. As was previously discussed, only the top levels of syntax of the RA can be parsed fully, since the amount of hardware in each machine cell grows with the number of levels of parsing depth. Let D denote the maximum number of levels of the RA's syntax that can be fully parsed. The value of D will determine the amount of hardware built into each machine cell.

The first downsweep after partitioning informs every FFP symbol in the RA what its RLN is. Then, to initiate the parsing upsweep, the symbols that are deeper than D cause "flat" nodes to be embedded in the L cells, while the symbols at higher levels cause ordinary syntax nodes to be embedded. (Notice that the symbols at the higher levels of syntax have small RLNs, and the symbols at the deeper levels of syntax have large RLNs).

The downsweep is initiated by the TA immediately upon its birth: the TA sends to its IN son the integer 1, which is the RLN of the highest-level text seen by the IN son of the TA. During this downsweep, each active IN node receives a single such integer, which is denoted by Mrln (for "My RLN"). Each IN non-leaf node, upon receipt of Mrln, computes the corresponding values to send to its sons, suing the algorithm shown in Chart F. In each case, the value of Mrln received by an IN node (non-leaf or leaf) is the RLN of the highest-level text seen by that node. What an IN leaf node does with Mrln is described in Section 5.4. Notice that only the active IN nodes receive Mrln.

    ______________________________________
    Chart F
    Algorithm for computing nesting depths
    ______________________________________
    Accept Mrln from father.
    Lrln := LOFF + Mrln
    Rrln := ROFF + Mrln
    If Ltext = true, send Lrln to left son.
    (This is the left son's Mrln.)
    If Rtext = true, send Rrln to right son.
    (This is the right son's Mrln.)
    ______________________________________


5.3 The Nodes and Channels of the RA's SN-CP Network

The parsing process in each TA's area refines the binary tree of IN nodes and I channels by embedding a tree--the SN-CP network of the RA--consisting of six kinds of nodes and four kinds of channels. These are referred to as interior nodes and interior channels. The interior nodes are represented by the following node symbols:

1. SN--"Syntax" node.

2. CP--"Syntax combining place" node.

3. FN--"Flat" node.

4. FC--"Flat combining place" node.

5. <---"Left sequence bracket" node.

6. ->--"Right sequence bracket" node.

The interior channels are represented by the following channel symbols:

1. S--"Syntax" channel.

2. F--"Flat" channel.

3. <"Left sequence bracket" channel.

4. >"Right sequence bracket" channel.

Each <- and each -> and each FN is a leaf node of the SN-CP network. Each CP and each FC is a non-leaf node. An SN may be either a leaf or a non-leaf. These facts will be easy to verify when the parsing algorithm has been specified.

Each SN node corresponds to an FFP expression at or above level D. Each FN node corresponds to one or more FFP symbols below level D. Each <- node corresponds to a `<` at or above level D. Each -> node corresponds to a `>` at or above level D. Each CP node combines the channels from SN (or other CP) nodes. Each FC node combines the channels from FN (or other FC) nodes.

The upsweep that embeds the SN-CP network also stores in the interior nodes information concerning the text seen by the nodes. For simplicity, the embedding process is described first. Then the enhancements that store the necessary information in the nodes are described.

In the upsweep of the parsing process, the active IN nodes get refined: they embed various interior nodes and channels within themselves. The I channels also get refined, embedding various interior channels (subchannels, one might say) within themselves.

If an I channel is refined by embedding, say, three channels of type S and > and S, respectively, then the situation could be pictured by showing the three interior channels enclosed in the I channel as depicted in FIG. 53.

It is usually clearer, however, to omit the I channel from the diagram, and simply show the interior channels, as shown in FIG. 54.

Similarly, it is usually convenient and clear to show the interior nodes and channels that get embedded within an IN node, without showing the IN node itself. Thus, for instance, the diagram of an IN node and its channels shown in FIG. 55 will instead commonly be drawn as shown in FIG. 56.

After the parsing has been done, the IN nodes and the I channels--as distinguished from the interior nodes and channels embedded within them--have no role to play in the RA's execution. They simply sit passively and wait until the RA goes through storage management; at that point they again become useful, as is explained in Chapter 7.

5.4 The Upsweep: Building the SN-CP Network for the RA

5.4.1 A Simple Example of an SN-CP Network

Before delving into the details of the parsing upsweep, consider an example of what is to be embedded. The complications introduced by the "flat nodes" are avoided in this example, by assuming that D is greater than the nesting depth of the text in question.

Consider this RA: (SP<<3 2 8><6 4 5>>). It has the syntax tree shown in FIG. 57.

FIG. 18 shows, for one way of storing this RA in the machine, the SN-CP network that gets embedded. (As explained above, the IN nodes and the I channels are not pictured.) FIG. 19 shows the structure of that SN-CP network, without the clutter of the machine cells.

FIG. 20 and FIG. 21 show the corresponding information for a different alignment of the text in the machine. (Still other alignments are possible, of course.)

Notice how similar the two SN-CP networks are, when abstracted from the machine. The only distinction between them is the order in which the S channels come together to form the CP nodes. This order depends entirely on how the text happens to sit under the T cells.

Notice that there is an SN node corresponding to each atom and to <3 2 8 > and to <6 4 5 > and to <<3 2 8 > <6 4 5 >>. The TA node corresponds to the entire RA. The CP nodes are used to combine together pairs of S channels of the top-level elements of a sequence or application. The close correspondence between the syntax tree and the SN-CP network should be apparent.

Assume that SP is a primitive FFP operator that takes a "sum of products" of the components of its operand. In the present example, then, the result should be 168, since 168=(3) (2) (8)+(6) (4) (5). Given the appropriate instructions, the nodes of the SN-CP network can easily and concurrently compute this result. In the SN-CP network of FIG. 19, for instance, the 2 and 8 can be multiplied in the CP node just above the SN nodes corresponding to 2 and 8. Then in the next CP above, the 3 and the 16 can be multiplied together.

While this is going on, the product of 6, 4, and 5 can be found in a similar way. Then the sum can be computed in the CP node that joins the <3 2 8 > and the <6 4 5 >. The degree of concurrency is greatly increased if the operand has many subsequences, each with many elements. The next chapter shows how to program the modes of the SN-CP network to perform highly concurrent computations of this kind, and of much more complex kinds.

5.4.2 Embedding the Leaf Nodes of the SN-CP Network

A leaf IN node, upon receiving the Mrln value from its father, adds Mrln to LOFF and ROFF and uses the resulting values to guide the embedding of one or more nodes within the IN node. The new nodes can conveniently be thought of as belonging to three groups: a left group, a middle group, and a right group. The nodes in the left group correspond to the FFP text represented by the LSEQ register--i.e., to the lseq left sequence brackets represented in the cell. The middle group corresponds to the contents of the S register of the cell. The right group corresponds to the rseq right sequence brackets in the cell.

When the IN leaf node receives Mrln, it performs these assignments:

Lrln:=LOFF+Mrln

Rrln:=ROFF+Mrln.

Lrln is the RLN of the topmost (i.e., outermost) `<` represented in the cell, if there is one, and Rrln is the RLN of the topmost `>`, if there is one. The chart below shows the RLNs of all the symbols of one RA. It also shows the LOFF, ROFF, Mrln, Lrln, and Rrln values of the leaf IN node. It is always true that lseq+Lrln=rseq+Rrln=the RLN of the atom, if there is an atom. If there is no atom in the cell, then this number is the RLN that an ato