Interpreter

Object-oriented, logic, and database programming tool with garbage collection

4989132

Abstract

A programming tool is provided which integrates an object-oriented programming language system, a logic programming language system, and a database in such a manner that logic terms can be treated as objects in the object-oriented programming language system, objects can be treated as logic terms in the logic programming language system, and logic terms and objects are stored in the database in a common data structure format. Automatic management of the database is provided which is transparent to the user.


Claims

We claim:

1. A program tool, comprising

a. a workstation having an operator interface, a mass memory, a CPU, and main memory;

b. an object oriented programming language system including,

(1) an object oriented programming language, and

(2) object oriented language compiler means for translating source code written in the object oriented programming language into objects and interpreter code;

c. a logic programming language system, having components representing terms, clauses, predicates, atoms, and variables, including,

(1) a logic programming language, and

(2) logic compiler means for translating source code written in the logic programming language into objects;

d. a database residing in said mass memory, for storing objects and components of a logic programming language as objects in a common data structure format, applications data, and applications stored as compiled interpreter code;

e. object database management for representing objects and components of a logic program in said common data structure format as objects, and responsive to calls for retrieving and storing such objects in said database, and for automatically deleting objects from said data base when they become obsolete;

f. interpreter means for executing said interpreter code and generating calls to said database management means; and

g. logic subsystem means for solving logic queries, said logic subsystem means treating any object as a term in the logic programming language.

2. The programming tool claimed in claim 1, wherein said object oriented programming language system includes means for calling subroutines written in another language and treating the call as an object.

3. The programming tool claimed in claim 1 or 2, wherein said logic programming language system includes means for processing attribute labels in a manner such that the attribute labels are taken as identical to object attribute names in the object programming language.

4. The programming tool claimed in claim 3, wherein said object oriented language compiler means comprises:

a. first phase means for performing compilation including parsing, optimization, and interpreter code generation;

b. second phase means in communication with said first phase means for resolving global symbols and loading the database with objects and interpreter code; and

c. an assembler-like intermediate language for communication between said first and second phase means.

5. The programming tool claimed in claim 3, wherein said object-oriented programming language system includes a language that is a dialect of Smalltalk (Alltalk), and means for treating primitive invocations as objects; and wherein said logic programming language system includes a language that is an extension of Prolog (ALF), and means for treating attribute labels as identical to instance variable names, and means for typing logic variables using objects.

6. The programming tool claimed in claim 5, wherein said interpreter code comprises a plurality of types of bytecodes, and said interpreter includes a plurality of bytecode handler means, one such means for processing each type of bytecode.

7. The programming tool claimed in claim 6, wherein said bytecode types comprise:

execute a primitive,

send a message,

define a block,

evaluate a block,

return from a block or method,

branch, and

assign from one variable to another.

8. The programming tool claimed in claim 7, wherein said interpreter means includes means for maintaining blocks as C data structures, and for making blocks into objects when blocks are assigned to instance variables, or returned as the result of a message.

9. The programming tool claimed in claim 8, wherein said interpreter means includes means for maintaining contexts as C data structures, and for making blocks into objects if and when an associated block is made into an object.

10. The programming tool claimed in claim 9, wherein the bytecode handler means for the "define a block" type bytecode generates block stubs, and wherein said interpreter means creates active context(s) for a block stub, stored separately from said block stub, and wherein said active contexts associated with block stubs obey a stack discipline.

11. The programming tool claimed in claim 10, wherein said interpreter means maintains running data structures for object-oriented processes in an array, each element in the array representing one process, each element containing a stack of active contexts, a pointer to the current context in the stack, an array of block stubs, and a pointer to the next available block stub.

12. The programming tool claimed in claim 11, wherein said interpreter means manages processes by creating processes, switching processes, destroying processes, and performing optimizations on processes.

13. The programming tool claimed in claim 12, wherein said interpreter means performs optimization on processes by message flattening.

14. The programming tool claimed in claim 12, wherein said interpreter means performs optimizations on processes by treating each primitive as its own bytecode.

15. The programming tool claimed in claim 5, wherein said logic programming language includes a set of built in predicates SEND N and including means for sendng messages between the logic programming language system and the object-oriented programming system by employing said predicates SEND N.

16. The programming tool claimed in claim 15, wherein said set of built-in predicates take arguments "receiver", "answer", "selector", and n additional arguments; wherein "receiver" is the receiver of the message to be sent, "answer" is the object returned from the message, and "selector" is that of the message send, and n remaining arguments are arguments to the message send itself.

17. The programming tool claimed in claim 5, wherein all clauses in the logic programming language are represented as instances of class "Clause", and are rules, facts and queries, and wherein included in the instance variables of class "Clause" are "head" and "tail"; if "head" is nil, the clause is a query, if "tail" is nil, the clause is a fact; "head" is of class "Predicate", or a sub-class thereof, "tail" is of class "LinkedList" whose links are of class "Predicate", or a sub-class thereof; and wherein values of the instance variables of the "head" and "tail links" can be arbitrary objects.

18. The programming tool claimed in claim 3, wherein said database management means includes:

a. object manager means employed by the object oriented language compiler, the interpreter means, primitives, and utilities for providing access to objects in the object database and for mainatining the orginization of objects in the database;

b. method fetcher means for calling the object manager means to fetch methods for the interpreter;

c. access manager means for managing access to the database, and being called by,

(1) a buffer manager for retrieving objects from the database,

(2) a transaction manager for adding/updating objects in the object database at commit points, and

(3) the object manager for providing higher level interface of the database;

d. buffer manager means for,

(1) generating calls to the access manager means when called by the object manager means, and

(2) keeping an in-memory copy of objects when called by the pool manager means;

e. pool manager means for maintaining memory for buffers; and

f. garbage collector means integrated with said object manager means and said interpreter means for identifying objects in main memory that are no longer reachable.

19. The programming tool claimed in claim 18, wherein said garbage collector means includes means for defining numbered regions for garbage collection, such that when a context is created, it is assigned a region number, each object created or accessed being assigned the region number of the context that created or accessed it, unless it was previously associated with a lower number; and when an object is returned from a called method to the calling method, the object being moved to the region of the calling method, and when a reference is made from a first object to a second object in another region, the second object being moved to the region of the first object, and when returning from a method, if the context to which it is returning belongs to a region whose number is at least two lower than that of the current region, then the said garbage collector means collects garbage in the regions with the higher numbers than that of the context to which return is being made.

20. The programming tool claimed in claim 19, wherein said garbage collector means includes region cleaning means for detecting when a region has accumulated an excessive number of objects and cleanng the region thus detected.

21. The programming tool claimed in claim 19, wherein said garbage collector means includes means for detecting when objects are shared across processes and for insuring that no object is discarded that is in use by another process.

22. The programming tool claimed in claim 19, wherein said garbage collector means includes an off-line mark/sweep collector means for periodically removing objects from the object database that have become unreachable by any other object in the database, by first marking all objects in the database that can be reached, and then sweeping the database to remove unmarked objects.

23. The programming tool claimed in claim 22, wherein said object database contains constants that are permanently marked such that they cannot be removed by said off-line mark/sweep collector means.

24. The programming tool claimed in claim 3, wherein said logic subsystem means performs unification of logic variables to answer logic queries, and in doing so, takes into account the typing of the logic variables to enable constraint of permissible values of logic variables.

25. The programming tool claimed in claim 3, further comprising debugger means for providing debugging functions comprising setting break points, stepping through program execution, tracing information (e.g. messages, blocks, bytecodes, processes), and displaying values of data structures, said debugger means being integrated with said interpreter means and including a set of C routines for performing tasks associated with the debugger commands, code within the interpreter, and a set of global variables and constants used to communicate between the C routines and the code in the interpreter.

26. The programming tool claimed in claim 3, wherein said object database comprises a key file and a prime file, the prime file having records of variable length containing objects, and the key file having records of fixed length containing the address and record length of objects in the prime file.

27. The programming tool claimed in claim 26, wherein objects in the prime file can be one of 6 types, including:

normal objects,

a symbol cross reference record that contains a string for a symbol and associated object identification of a symbol object,

a dictionary cross reference,

a control record,

a checkpoint integrity record, and

logically deleted objects.

28. In a heap based programming language system, having garbage collector means for removing objects from memory that are no longer reachable by the system, and improved garbage collector means, wherein the improvement comprises: means for defining numbered regions for garbage collection, such that when a context (representing the state of a method which is executing in the system) is created, it is assigned a region number, when an object is created or accessed by a method it is assigned the region number of the on context of the method that created or accessed it, unless the object was previously assigned a lower number; means for moving an object to the region of a calling method when an object is returned from a called method to the calling method, means for moving a second object to the region of a first object when reference is made from the first object to the second object assigned to another region and wherein said garbage collector means collects garbage when returning from a method, if the context to which it is returning belongs to a number at least two lower than the current region number before returning; the regions with the higher number than that of the context to which it is returning being collected (i.e. the objects in the regions are discarded).

29. The improvement claimed in claim 28 wherein; said garbage collector means includes region cleaning means for detecting when a region has accumulated an excessive number of objects, and cleaning the regions thus detected.

30. The improvement claimed in claim 28 wherein said garbage collector means includes means for detecting when objects are shared across processes for ensuring that no object is collected that is in use by another process.

31. The improvement claimed in claim 28, wherein said system further comprises an object database and wherein said garbage collector means includes off-line mark/sweep collector means for periodically removing objects from the database that have become unreachable by any other object in the database, by first marking all objects in the database that can be reached, and then sweeping the database to remove unmarked objects.

32. The improvement claimed in claim 31 wherein said object database contains constants that are permanently marked, and said off-line mark/sweep collector means includes means for recognizing said marks and preventing removal of said constants by said collector means from said database.

33. The improvement claimed in claims 28, 29, 30, 31, or 32, wherein said system further comprises an in-use table containing a list of objects that must be kept in-memory, said table including a field designating each object's region.


Description

TECHNICAL FIELD OF THE INVENTION

The invention relates to a programming tool that allows application programming in both logic and object-oriented style, and which provides integrated database support.

BACKGROUND OF THE INVENTION

Object-oriented programming, logic programming, and database facilities have all been shown to have significant power in the writing of applications to run on a computer. No single programming tool has successfully integrated all three facilities in such a way as to eliminate an explicit interface between them. Normally, one must convert between object data to logic data to use the logic programming system, and then convert the logic data back again in order to use the object-oriented system. Furthermore, one must normally make explicit calls to a database manager in order to retrieve and store application data.

There have been some attempts to provide combined logic and object-oriented programming tools. For example, the Smalltalk/V (Smalltalk Tutorial and Programming Handbook, Digitalk, Inc., 1987) allows the user to invoke a logic programming tool (Prolog) from an object-oriented on (Smalltalk). However, the only kind of data (terms) that Prolog understands are strings, symbols, numbers, structures, and lists of any of the above. Furthermore, the Prolog structures are constrained to be a type of list from the object-oriented programming tool. Additionally, Smalltalk/V does not have database storage for the objects.

There have also been attempts to provide database support for object-oriented tools. For example, the Gemstone system, a product of Servio- Logic, Inc., while supporting a database server that can be programmed in Smalltalk, does not allow the application to be written in Smalltalk in such a way that the database server is transparent: i.e. the application must make speciific calls to the database server (`Integrating an Object Server with Other Worlds`, by Alan Purdy et al, ACM Transactions on Office Information, Vol. 5, Number 1, Jan. 1987). Gemstone does not contain any logic programming tools.

Some so-called "expert system shells"(e.g., Nexpert Object from Neuron Data, Inc.) allow for objects, rules and database features to be combined, but these tools are for the construction of a certain class of application ("expert systems"), and do not provide a general-purpose programming tool.

It is the object of the present invention to solve the problem of providing a general purpose programming tool that smoothly integrates object-oriented and logic programming, and provides the user with database facilities that are transparent to the user.

SUMMARY OF THE INVENTION

The present invention solves the problem by providing a single programming tool (referred to herein as Alltalk) which allows the programmer to write applications in an object-oriented language (a dialect of Smalltalk, also referred to herein as Alltalk), a logic programming language, (which is an extension of Prolog, herein called ALF) or a combination of the object and logic programming languages which allows the logic programming language system to consider any object from the object-oriented programming language system as a term in the logic programming language, and which supplies database management on behalf of the programmer, without the need for any specific database management control statements to be supplied by the programmer.

The main components of the Alltalk tool include a work station having an operator interface, a mass memory, and a CPU. An object-oriented programming language system running on the work station includes an object-oriented programming language and an object-oriented language compiler for translating source code written in the object-oriented programming language into objects and interpreter code. Also running on the work station is a logic programming system including a logic programming language having components of terms, clauses, predicates, atoms, and logic variables, and a logic language compiler for translating source code written in the logic programming language into objects. A database residing in the mass memory stores objects and components of logic programs as objects in a common data structure format, applications data, and application stored as compiled interpreter code. The database is managed by an database manager that represents objects and components of the logic programming language in the common data structure format as objects and is responsive to calls for retrieving and storing objects in the database and for automatically deleting objects from the database when they have become obsolete. An interpreter executes the interpreter code and generates calls to the database manager. A logic subsystem solves logic queries and treats objects as components of a logic program.

According to a further aspect of the present invention, an improved database format is provided for an object-oriented programming language system. The database has a key file and a prime file. The prime file contains records of variable length for storing objects, and the key file contains records of fixed length for storing the address, record length, and type of object in the prime file. An improved database manager for managing this database includes an object manager employed by the compiler, interpreter, primitives and utilities for providing access to objects in the database, and for maintaining organization of objects in the database. An access manager is called by a buffer manager for retrieving objects from the database, a transaction manager for updating the database with new or changed objects at commit points, and for undoing changes to objects upon aborts, and the object manager for providing high level interface to the database. A buffer manager is called by the object manager for generating calls to the access manager, and by a pool manager for keeping an in-memory copy of objects. The pool manager maintains memory for buffers.

According to another aspect of the present invention, an improved garbage collector is provided for a heap based programming language system. The garbage collector employs the concept of regions for garbage collection. When a context (representing the state of a method which is executing in the system) is created, it is assigned a region number. When an object is created or accessed by a method, it is assigned the region number of the context of the method that created or accessed it, unless the object was previously assigned a lower number. When an object is returned from a called method to the calling method, the object is moved to the region of the calling method. When reference is made from a first object to a second object assigned to another reigon, the second object is moved to the region of the first object. When returning from a method, if the context to which it is returning belongs to a number at least two lower than the current region number before returning, the regions with the higher number than that of the context to which it is returning are collected (i.e., the objects in these regions are discarded).

According to a still further aspect of the present invention, the runtime performance of a Smalltalk programming language system is improved by implementing a technique called message flattening. The compiler flags any method which consists of a single return statement which returns either an instance variable, or the result of a primitive, for which the first argument is self, and the other arguments correspond to arguments to the method. The interpreter detects these flags at runtime and flattens any message that would normally invoke these methods, by replacing this message send in the first instance with an assign, and in the second instance with a primitive invocation.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram showing an overview of the invention;

FIG. 2 is a schematic diagram of the compiler;

FIG. 3 is a schematic diagram of the runtime environment;

FIG. 4 is a schematic diagram showing initialized stacks for contexts and block stubs;

FIG. 5 is a schematic diagram showing the creation of a new context;

FIG. 6 is a schematic diagram showing creation of a block;

FIG. 7 is a schematic diagram showing creation of a second block;

FIG. 8 is a schematic diagram showing the creation of a new context;

FIG. 9 is a schematic diagram showing a block evaluation;

FIGS. 10-13 illustrate the modes of block execution;

FIG. 14 shows the creation of a process;

FIGS. 15-16 illustrate process management;

FIGS. 17-18 show the relationships between the context stack, processes, regions, and objects;

FIG. 19 is a schematic block diagram illustrating the functions of the garbage collector;

FIG. 20 shows the in-use table's structure and internal relationships;

FIG. 21 shows the in-use table's relationships with the object table, the buffers, and the database;

FIG. 22 shows how garbage is collected upon a method and return; and

FIG. 23 is a schematic block diagram illustrating the functions of the ALF compiler.

DESCRIPTION OF THE INVENTION

A portion of the disclosure of this patent document contains material to which a claim of copyright protection is made. The copyright owner has no objection to the copying of the patent document or the patent disclosure, but reserves all other rights.

1. Introduction

The Alltalk tool runs on workstation type hardware, such as a Sun 4/360 by Sun Microsystems, Inc., executing the UNIX operating system (UNIX is a trademark of AT&T). Referring to the Drawings, FIG. 1, the hardware includes an operator interface including a visual display (CRT) 10, a keyboard 12, and a pointing device 14, such as a 3 button mouse. The hardware also includes mass memory, such as a disk 16 on which the Alltalk database resides, as well as a CPU and main memory 18. The Alltalk software which is executed by the CPU and main memory 18 consists of an Alltalk compiler 20 for a dialect of the Smalltalk language (also called Alltalk) and an Alltalk runtime environment 22. The hardware components of the workstation are connected by a bus 24.

2. Overview

The Alltalk compiler 20 is a program for translating Alltalk language source statements into interpreter code. The compiler is generated by the YACC and LEX utilities in the UNIX operating system, and contains subroutines written in the C programming language.

Referring to the Drawings, FIG. 2, the compiler operates in 2 phases: the first phase 26 parses the source code written in the Alltalk language 28 and constructs an intermediate code 30. The second phase 32 takes the intermediate code and generates class objects, constant objects, and method objects and places these in a database 40. These objects are subsequently retrieved by the runtime environment 22 (see FIG. 1).

The runtime environment 22 is written in the C programming language, and in Alltalk. Referring to the Drawings, FIG. 3, the logic language compiler 36 and the logic subsystem 38 are both written in Alltalk. These are compiled through the previously mentioned Alltalk compiler 20, and the output placed in the database 40 and hence available to the runtime environment 22. Other applications 42 written in Alltalk are similarly available to the runtime environment after compilation. Application programs 42 (called methods) are processed by an interpreter 44, which calls other components of the runtime environment, which includes: a transaction manager 46 which can commit and abort transactions, an object manager 48 which is called to create and retrieve objects, a method fetcher 50 which determines the correct method to execute next, and a garbage collector 52 which detects and removes unneeded objects from main memory. The object manager 48 calls upon a buffer manager 54 to determine if a requested object is in memory or needs to be fetched from the database. If the object is to be retrieved from the database, a pool manager 56 is called to find space in an appropriate buffer, after which an access manager 58 is called. It is the access manager 58 that accesses the disk 16 containing the database 40.

3. Compiler

The Alltalk compiler 20 translates class descriptions written in a dialect of the Smalltalk language herein referred to as Alltalk into database objects for use by the Alltalk interpreter 44 during execution.

3.1. Synopsis

The Alltalk compiler 20 takes a file containing one or more complete Alltalk class descriptions, and for each class generates:

1. A class object, containing a dictionary of the methods in the class and specification of the instance and class variables,

2. Compiled methods, each consisting of "bytecodes", which drive the runtime interpreter, and

3. Objects representing constants encountered during compilation, (numeric values, strings, etc.)

which are placed in the database 40 for use by the interpreter 44 during execution.

The Alltalk compiler 20 consists of two phases. The first phase 26 (see FIG. 2) does the compilation work, (parse, optimization, and code generation), while the second phase 32 resolves global symbols and loads the results into the database. The two phases communicate via intermediate code 30 (written in an assembler-like intermediate language) which can be examined and altered by the user, if desired. The following is a description of the organization of the internals of the Alltalk compiler 20, including code generation strategies and optimization techniques.

3.2. Phase 1 (kcom)

3.2.1. Parsing

The first phase 26 of the Alltalk compiler 20 consists of two distinct processing stages:

1. Parse tree construction, and

2. Code generation, (including optimization).

The parsing phase is implemented in a fairly straightforward manner using the UNIX yacc/lex parser generator/lexical analyzer tools. The primary goal of the parsing stage is to create an internal parse tree representation of the class description and its methods which can be analyzed using a relatively simple set of mutually recursive tree-walking routines. In addition, the grammer of the input file is checked and errors are reported to the user.

The grammer specification of the object-oriented language is virtually identical to that specified in the syntax diagrams of the standard Smalltalk language reference, Smalltalk-80 The Language and Its Implementation, by Goldberg and Robson. The most notable variation in the Alltalk grammar is that of allowing a primitive invocation to be used as a primary expression and to have primary expressions as arguments, (this is adopted from Little Smalltalk, by Timothy Budd). This allows Alltalk primitives to be intermixed freely with the Alltalk language as if they were function calls which return a value, (which is essentially what the primitives really are), instead of as wholesale replacements for methods, as in standard Smalltalk.

Additional productions have been included to allow for reading an entire class description from a file, (in a form roughly similar to Smalltalk "fileIn/fileOut" format). These additional productions include "header" information such as superclass specification, instance/class variable declarations, and instance/class method classification statements.

While it is possible to build the entire analysis and generation mechanisms directly into the action portions of the yacc productions, the conciseness of the analysis and generation stage would be lost in that it becomes difficult to piece together how the parser actions interact to accomplish that stage when the controlling function is the yacc parser. Clarity is enhanced by having the analysis and generation functions make explicit their own walking of the parsed information, since it may vary from that of the parser at various points in the compilation. For example, more complex/global optimization techniques, such as inter-statement optimizations, may need to determine their own scope of applicability across several statements worth of parsed information. Such techniques are harder to embody as a single understandable function when mixed with the simple actions of parsing.

The basic parse node is a simple binary node, (left and right child pointers), with placeholders for the node type constant, a source code line number, and a string pointer.

Parse nodes are created via a function called makenode(), which allocates storage storage for the node, inserts the current source line number, and sets the other elements as specified by the user. The storage allocated for these nodes, (as well as for the class and method structures and copies of strings), is not tracked in the Alltalk compiler 20 since the compiler is expected to be run only for the duration of the compilation of a file.

A sample parse tree for an Alltalk method is given in Table 3.1. Syntactical shorthand and default meanings, such as the return value of a method having no statements being "self", or a block having no statements being "nil", are fleshed out during the parse phase in order to limit the amount of special case logic in the analysis and generation phase.

                                      TABLE 3.1
    __________________________________________________________________________
    Parse Tree Example
    __________________________________________________________________________
    Method Code
    ! SequenceableCollection methodsFor: 'enumerating'
    do: aBlock
           .vertline. index length .vertline.
           index <- 0.
           length <- self size.
           [(index <- index + 1) <= length]
               whileTrue: [aBlock value: (self at: index)]
    __________________________________________________________________________
    Parse Tree
    statement    statement
    assign       assign
    "index"      "length"
    number       unary.sub.-- expr
    "0"          "size"
                 identifier
                 "self"
    statement
    keyword.sub.---- expr
                 keyword.sub.-- arg
    "whileTrue:"
    block        block
    statement    statement
    binary.sub.-- expr
                 keyword.sub.-- expr
                         keyword.sub.-- arg
    "<="         "value:"
    assign  identifier
                 identifier
                         keyword.sub.-- expr
                                 keyword.sub.-- arg
    "index" "length"
                 "aBlock"
                         "at:"
    binary.sub.-- expr   identifier
                                 identifier
    "+"                  "self"  "index"
    identifier
            number
    "index" "1"
    __________________________________________________________________________


A successful parse generates a parse tree for the statements of each method in the class. These parse trees are anchored in a method structure for each method, which are all, in turn, linked to a single class structure. When the parsing of a class is complete, the class structure is handed to analysis and code generation routines.

3.2.2. Code Generation

The major components of this stage of the compiler are:

1. Symbol table (symbol.c)

2. Code generation (compile.c)

3. Code management (code.c)

4. Optimization (optimize.c)

Generally, the processing steps involved in this stage, (implemented by a function called compileClass()), proceed as follows:

1. The symbol table is initialized with the instance and class variables available via the superclass chain for the class. These symbols are retrieved from a symbol file in the local directory. It is considered a fatal error if the superclass cannot be found in the symbol file, (i.e., the superclass must be compiled first).

2. The instance and class variables for the class are added to the symbol table. Name clashes involving superclass variables are also considered fatal.

3. Each method is compiled. During method compilation, bytecodes are collected into segments corresponding to groups of statements in the method: one for the method itself, and one for each block within the method. When method compilation is complete, the method code segment is emitted first, followed by the segments for each block.

4. After the methods have been successfully compiled, a record of the class' instance and class variables are written to the symbol file in the local directory. This makes the class available for use as a superclass in subsequent compilations.

The method compilation step is the heart of the compilation task. Before describing this step in detail, a description of the compiler's view of symbolic references and the symbol table is given.

3.2.3. Symbols

Throughout the Alltalk compiler 20, references to named symbols in the program being compiled, as well as references to unnamed runtime storage are represented in a uniform manner. This uniform representation allows the code generation stage to freely create and pass references between the recursive routines which implement this stage without regard for their type until a leaf routine which needs detailed type information is executed. The conciseness of the code generation routines is greatly enhanced with this representation scheme.

There are nine reference types, as follows:

Named References

Instance Variable

Class Variable

Method Parameter

Formal Method Temporary

Block Parameter

Global Symbol ("true", "false", "nil", class name, etc.)

Unnamed References

Constant ("10", "3.14", `a string`, #symbol, etc.)

Compiler Temporary (used in evaluating intermediate

expressions) Block Stub/Closure (reference to storage holding the runtime id of the closure)

The symbol table supports a subset of the named references, separating them into the three categories of: (1) class, (2) instance, and (3) temporary symbols. Temporary symbols encompass the method parameter, formal method temporary, and block parameter references. Global symbol references are never actually placed in the symbol table, but are materialized whenever the search for a name fails. These symbols are resolved by the second phase 32 of the Alltalk compiler 20, since the cross reference values for these names are actually present in the runtime system dictionary contained in the database 40.

The symbol table interface routines contain the usual routines for the addition of symbols, (addSymbol()), and name-based search for symbols, (findSymbol()). An initialization routine, (initSymbols()), purges the table and then uses the globally specified superclass name to populate the table with "ref" structures for the instance and class variable symbols available via the superclass chain, as recorded in the symbol file in the local directory. A routine for writing the instance and class variable symbols, (writeSymbols()), to the local symbol file for the globally specified class, (i.e., the one being compiled), is also provided. Finally, a pair of general routines, (markSymbols() and releaseSymbols()), are available for get/set of placeholders in the symbol table. These are primarily used to record the starting position of method and/or block temporary symbols, so that they can be removed at the end of the compilation of the method and/or block statements.

3.2.4. Method Compilation

In the runtime environment 22 (see FIG. 3), a method is executed with an associated "context" containing local storage organized as an array of temporary slots, analogous to a "stack frame" in a conventional language. This local storage is divided into the following five sections from the compiler's point of view:

1. The object id of the receiver, known as "self".

2. Method parameters.

3. Formal method temporaries, (named temporaries).

4. Compiler scratch area for intermediate expression evaluation.

5. Block stub/closure id storage for blocks in the method.

A general mechanism for tracking the use of the temporary slots is implemented in the compiler using a set of macro routines. This set includes routines for: allocating a number of temporaries, (allocTemp()), which returns the starting slot for the requested count; freeing a number of temporaries, (freeTemp()); get/set of temporary usage information, (getTempUse(); and setTempUse()); clearing usage information, (clearTempUse()); and requesting the high water mark for temporary usage, (maxTempUse()). Temporary usage is tracked with these routines for the first four kinds of temporaries listed above. Storage for block ids is tallied separately during method code generation since it is not known what the required number of compiler scratch temporaries will be until the method compilation has finished.

Before the method statements are examined, several initialization steps are performed:

1. The symbol table is populated with the entries for "self", "super", the parameters, and the formal temporaries. The slot index for each entry is determined by allocating a temporary as each symbol is added to the table.

2. A code segment is allocated for the method statements and made to be the "active" segment. During compilation, generated code is placed in the "active" code segment, which is switched when compilation of a new list of statements, (e.g., a block), is started or completed.

3. Label generation is reset, (used for branch targets and block entry points).

4. The block count is reset.

Also prior to commencing code generation, the parse tree of the method is examined to see if it can be tagged for "flattening" at runtime. "Method flattening" is a technique for determining whether a runtime message send can be avoided because the method is "trivial". A "trivial" method is one which contains a single statement returning either:

1. An instance variable, (can replace send with an assign), or

2. The result of a primitive for which first argument is self and the remaining arguments to the primitive invocation line up exactly with arguments to the method, (can replace send with primitive invocation).

At this point, compilation of the method statements is initiated by calling compileStatementList() with a pointer to the first statement parse node of the method. This routine invokes compileExpr() to compile the expression associated with each statement in the list. CompileStatementList() is used to compile lists of statements for blocks as well as methods, generating appropriate return bytecodes when explicit return statements are encountered and after the last statement in the method or block. CompileStatementList() distinguishes between method and block statement lists by the value of active code segment id, which is -1 for a method or >=0 for a block. Provision is also made for the case of "inline" block code generation, which is used in optimization of certain messages involving blocks, (such as messages to booleans), described later.

3.2.5. Expressions

Expression compilation is the center of most activity during the compilation of a method. CompileExpr() defines the compilation actions for all parse nodes other than statements in a concise manner. This routine is invoked with the node to be compiled and a destination specification for the result of compiling the expression indicated by the node in the form of a "ref" structure. The destination specification allows the calling routine to control placement of the expression's value, which is particularly useful for aligning values for message sends, minimizing unnecessary data movement at runtime. Simple expressions, such as identifiers and constants, are trivial compilations requiring only assignment of the value associated with the identifier or constant description to the specified destination. An explicit Alltalk assignment expression, (e.g., "a.rarw.b+c"), only requires compilation of the expression on the right of the ".rarw.", with the reference on the left as the destination, in addition to assigning this result into the specified destination for the assignment expression itself, (e.g., "d.rarw.(a.rarw.b+c)"), if indicated. The remaining expression types, (messages, cascades, primitive invocations and blocks), require somewhat more involved compilation steps, hence, these cases have been split into separate routines, (genSend(), genCascade(), genExecPrim(), and genBlock()). We now describe the compilation steps performed in each of these cases.

3.2.6. Messages and Cascades

The runtime implementation of the send message bytecode requires that the receiver and arguments be present in a contiguous set of the sending context's temporaries. The location for the return value of the message send is also required to be a temporary in the sending context, though it need not be adjacent to the receiver and arguments.

GenSend() ) complies with the first condition by allocating a contiguous set of temporaries, (via allocTemp() ), and compiling the receiver and argument expressions with each of these temporaries, (in order), as the specified destination. Hence, results of receiver and argument expressions are cleanly aligned with their use in containing message sends, eliminating unnecessary data re-positioning assignments. A simple optimization is also done at this point. If the receiver and argument temporaries already happen to line up,(detected by lineup() ), new temporaries are not allocated and the receiver and argument values need not be moved.

The second condition, (destination must be a temporary), is honored by examining the specified destination reference and allocating a temporary to hold the result of the message send if the destination is not already a temporary. This situation is remembered and code is generated for moving the result from the allocated temporary to the actual destination after the message send. This implementation style allows for the addition of variations of the send message bytecode for non-temporary destionations, if the need arises.

The previous comments also apply to cascaded message sends, (genCascade() ), except that the receiver expression is only evaluated once and the result placed in a temporary, to which the remaining messages in the cascade are sent, (genCascadeSend() ).

3.2.7. Primitive Invocations

As with message sends, Alltalk's primitive invocations require that arguments to the primitive be in a contiguous set of the invoking context's temporaries, and the destination for the result be a temporary in the invoking context. GenExecPrim() ) handles the non-temporary destination and argument alignment cases, (using lineup() ), in the same manner as is done for message sends. Each primitive argument is compiled, allowing arbitrary expressions to be used as arguments.

3.2.8. Blocks

Blocks are the most involved expression compilations in that they cause changes in the global state of the compiler. In Alltalk, a block is a list of statements which are to be executed with their own context when a "value" message is sent to it. The lexical aspects of a block allow it to refer to names available to the method in which the block is defined, as well as the names in any containing block. These names include the method's parameters and formal temporaries and any containing block's parameters. These semantics imply that a block is a static "object" of sorts which can potentially have muliple runtime activations, with each activation dynamically establishing variable name .rarw..fwdarw. storage bindings. Hence, from the compiler's point of view, a block is a separate list of statements to be compiled and "set-up" as an object, which may also include cross-context runtime references to be represented.

GenBlock() ) alters the global state of the compiler to create the proper compilation conditions to meet the needs described above. The block being compiled is given a unique id within the method, and a new code segment is allocated and marked with this id and connected to the list of code segments generated for the method so far. The currently active code segment is saved, along with its temporary usage, (since the block will have its own context), and the new segment is made the active segment, (generated code is always placed in the active segment). The previous code segment and its temporary usage are restored when compilation of the block is completed. The symbol table is marked so that the block's symbols, (block parameter names), can be released at the end of the block's compilation, and the block's symbols are added to symbol table for proper scoping. Finally, a label is generated to mark the start of the block's code in the method.

At this point, the state of the compiler has been properly altered, and compilation of the statements in the block is initiated via compileStatementList() ).

Once the block has been compiled, all the information needed to describe the block's activation characteristics at runtime, (temporary usage and entry point), has been established. This information is supplied to the runtime interpreter 44 via the set up block bytecode. This bytecode causes the interpreter to copy this information and associate it with a unique runtime id, known as a block stub id. A block stub id can be manipulated much the same way as any other object id. In the case of returning a block stub, or assigning a block stub to an instance variable, Alltalk establishes an object for the block stub. The information associated with the block stub id is used to establish a context for executing the statements of the associated block whenever the "value" message is sent to this id, (i.e., the evaluate block bytecode is executed for the id). Note that this requires that a block must be "set up" before it can be "evaluated" at runtime.

Alltalk choses placement of the set up bytecode for a specific block so that the bytecode is not executed an uncontrolled number of times. This is because the set up block implementation in the interpreter does not check for multiple "set ups" performed for the same block.

The solution to the placement problem is to group the set up block bytecodes for any "top level" block, (i.e., any block encountered while generating code for the method statements), and its contained blocks, and place them in the method code segment ahead of the first use of the "top level" block. This technique avoids executing set ups for any block(s) which are not in the specific control flow path at runtime. GenBlock() ) implements this strategy by setting a pointer to a position in the method segment code at which the set up block bytecodes are to be "spliced" when a "top level" block is entered.

3.2.9. Message Optimizations

Except for aiding the runtime environment for "method flattening", the rest of the compiler optimizations involve recognition of specific message selectors in the source code, (optimize.c). The optimization strategy for these selectors is to generate inline code to implement the specific semantics of the selector, (assuming a specific receiver class), in order to avoid sending the actual message at runtime. These optimizations are detected by the genOptiSend() ) routine which is invoked from compileExpr( ) ) when a message expression is compiled. If genOptiSend () ) can handle the message, the normal compilation via genSend() ) is avoided by compileExpr() ). The message selectors/receiver class combinations which are currently optimized are listed in Table 3.2.

                  TABLE 3.2
    ______________________________________
    Optimized Messages
    Class           Selector
    ______________________________________
    Object          perform:
                    perform:with:
                    perform:with:with:
                    perform:with:with:with:
    Integer         +
                    -
                    =
    Block           value
                    value:
                    value:value:
                    value:value:value:
                    value:value:value:value:
                    value:value:value:value:value:
                    whileTrue:
                    whileFalse:
                    whileTrue
                    whileFalse
    True/False      ifTrue:
                    ifFalse:
                    ifTrue:ifFalse:
                    ifFalse:ifTrue:
                    and:
                    or:
                    not
                    &
                    .vertline.
    Object/         isNil
    UndefinedObject notNil
    ______________________________________


The complexity of these optimizations vary from simply generating special bytecodes, (e.g., Integer messages), to inline block code generation with conditional branch bytecodes for implementing looping constructs, (e.g., Block "while" messages).

Due to the straightforward expression of these optimizations, they are not treated in detail here. However, one of the more complex optimizations, (optWhile() ), will be described to highlight and convey an understanding of some of the issues and supporting procedure structure involved in these optimizations.

OptWhile() ) handles optimization of the various "while" messages which can be sent to blocks, (whileTrue:, whileFalse:, whileTrue, and whileFalse). This routine demonstrates the need to deal with:

1. Evaluation of literal or non-literal block objects in receiver and/or argument positions,

2. Proper placement of set up block bytecodes to avoid repeated set up of the same block(s), (described previously in the section on block expression compilation), and

3. Generation of additional code to implement the semantics of the message, (looping, in this case).

Since the semantics of the "while" messages clearly involves sequenced evaluation of receiver and argument blocks, it is possible, if either block is literal, to treat the statements of that block as if they were in the statement list of the method or block containing the "while" message. This causes code to be generated directly into the currently active code segment, ("inline"), resulting in evaluation of that block in the current context at runtime, instead of setting up a separate context for evaluation of the block code. If either block is not a literal, (e.g., passed in as a parameter), that block must be evaluated in a separate context, (performed by the "evalb" bytecode).

This situation of block code generation strategy arises in the optimization of many other of the messages listed in Table 3.2. OptBlock() ) determines the code generation strategy based on the type of the parse node representing the block in the parse tree. If the node represents a literal block in the source code, the statement list for the block is compiled into the active code segment using compileStatementList() ). Otherwise, a "value" unary message expression, with the node representing the block as the receiver, is constructed and compiled under the explicit assumption that the receiver will be a block, (genEvalBlock() ). Note that this assumption is not made, (and a different bytecode is generated), when the "value" message is encountered in the original source code, since the actural receiver may not be a block at runtime, in this case.

Literal blocks which are part of the "while" message may be "top level" blocks, (i.e., outermost block of a nesting within a method). Beacuse of this, optWhile() ) must set the "splice point" in the method segment code for set up block bytecodes for any blocks contained in the "while" blocks, such that these bytecodes are placed outside the looping portion of the "while" code. This avoids the multiple "set up" problem for a block discussed in the previous section on block compilation.

With the background of the preceding discussion, the implementation of the optimization of "while" messages is summarized in the following steps:

1. If the "while" message is encountered in the method statement list, set a marker to the current position in the code as the "splice point" for set up block bytecodes for blocks which are encountered during compilation of the "while" message.

2. Generate a label to mark the start of the condition block, (i.e., the receiver of the "while" message).

3. Compile the condition block, (using optBlock() ), with an allocated temporary as the destination for its evaluation result.

4. Generate a conditional branch to the end of the "while" message code, (step 7), based on the result of evaluating the condition block and the specific message being compiled.

5. Compile the body block, (using optBlock() ), with no destination for its evaluation result.

6. Place an unconditional branch back to the label generated in step 2 to close the loop.

7. Generate code to assign "nil" to the destination specified for the value of the "while" message expression, (the destination may be "none"). This is the defined value for a "while" message expression.

8. Free the temporary allocated for the result of the condition block in step 3.

An example of the code generated for "while" message expressions with different combinations of literal and non-literal condition and body blocks is shown in Table 3.3.

                  TABLE 3.3
    ______________________________________
    "While" Message Code Generation
    Source Statement  Generated Code
    ______________________________________
    [x < y]whileTrue: L1
    [x <- x + 1].
                             send    t1[x],0,t5,2,#<
                             jne     L2,t5,'true
                             mov     t6,t1[x]
                             mov     t7,1
                             send    t6,0,t1[x],2,#+
                             jmp     L1
                      L2
    b1 whileTrue: [x <- x + 1].
                      L5
                             evalb   t3[b1],t5,1
                             jne     L6,t5,`true
                             mov     t6,t1[x]
                             mov     t7,1
                             send    t6,0,t1[x],2,#+
                             jmp     L5
                      L6
    [x < y] whileTrue: b2.
                      L9
                             send    t1[x],0,t5,2,#<
                             jne     L10,t5,`true
                             evalb   t4[b2],t6,1
                             jmp     L9
                      L10
    b1 whileTrue: b2. L13
                             evalb   t3[b1],t5,1
                             jne     L14,t5,`true
                             evalb   t4[b2],t6,1
                             jmp     L13
                      L14
    ______________________________________


The intermediate language is discussed further in the next section.

3.3 . Phase 2 (kasm)

As noted in the synopsis, the second phase 32 of the Alltalk compiler 20 concerns itself with resolving symbols, and creating and loading the classes and methods into the database.

3.3.1. Intermediate Language

The intermediate language expected as input for this phase consists of tokens representing bytecodes, along with directives for establishing the class, delimiting methods, and tracking the Alltalk source file name and line numbers. A summary of these tokens and directives are listed in Tables 3.4 and 3.5, respectively.

                  TABLE 3.4
    ______________________________________
    Intermediate Language Bytecode Tokens
    ______________________________________
    Message Send/Return
    send     send message
    sendp    send parameterized message, ("perform")
    mret     return from method, (" " in source code)
    Integer Arithmetic Optimizations
    seq      send "=" message
    sadd     send "+" message
    sadd1    send "+ 1" message
    ssub     send "-" message
    ssub1    send "- 1" message
    Block Set-Up/Evaluation/Return
    setb     set up block stub/closure
    evalb    evaluate block (receiver must be a block)
    evalbo   evaluate block (receiver might not be a block)
    bret     return from block
    Data Movement
    mov      src/dest specified by "effective addresses"
    Primitive Invocation
    prim     execute specified primitive
    Control Flow
    jeq      jump on target equal to constant
    jne      jump on target not equal to constant
    jmp      unconditional jump
    ______________________________________


TABLE 3.5 ______________________________________ Intermediate Language Directives ______________________________________ Class Information .class start specified class .supervar number of inherited superclass variables .ivar instance variable names defined in this class .cvar class variable names defined in this class Method Information .imethod start instance method .cmethod start class method .mparam method parameter names .mtemp method temporary names .mprim "flattenable" primitive-only method .mattr "flattenable" attribute-return method .mend end method Source File Tracking .file source code file name .line source code line number ______________________________________


In addition to the basic elements of the language, symbolic labels of the form "L<number>" are also available for use in the code, (for jeq, jne, jmp, and setb bytecodes), with target labels being required to start at the beginning of a line which contains no other tokens. Comments are allowed on any line, and are defined to be anything contained between a semicolon, (";"), and the end of the line. An example of this intermediate language for a method of a class call Foo, is shown in Table 3.6, which was constructed in order to demonstrate the variety of code and reference type representations generated by phase 1.

                  TABLE 3.6
    ______________________________________
    Intermediate Code Examples
    ______________________________________
    Method
    Class Foo :Object
           .vertline. instvar Classvar .vertline.
    .vertline.
    do: aBlock
           .vertline. a b .vertline.
           instvar <- Classvar.
           instvar associationsDo:
               [:assoc .vertline. assoc value timesRepeat:
               [aBlock value: assoc key]].
           a <- `hi andy`.
           b <- #(2 ( foo `hi` $a ) `fred`).
    ]
    Intermediate Language
    .file Foo.st
    .class Foo Object
    .supervar 0
    .ivar instvar
    .cvar Classvar
    .imethod do:
    .mparam aBlock
    .mtemp a b
    L0
           mov   i0[instvr],`Foo@i1[Classvar]
           mov   t5,i0[instvar]
           setb  b0,L1,5
           setb  b1,L2,5
           mov   t6,b0
           send  t5,0,t4,2,#associationsDo:
           mov   t2[a],`hi andy`
           mov   t3[b],( 2 ( #foo `hi` $a ) `fred`)
           mret  t0[self]
    L1
             evalbo  t1[assoc],t3,1
             send    t1[assoc],0,t3,1,#value
             mov     t4,b1
             send    t3,0,t2,2,#timesRepeat:
             bret    t2
    L2
             mov     t2,mt1[aBlock]
             mov     t4,b0@t1[assoc]
             send    t4,0,t3,1,#key
             evalbo  t2,t1,2
             send    t2,0,t1,2,#value:
             bret    t1
    .mend 7 2
    ______________________________________


3.3.2. Effective Addresses

References to various types of runtime variables and constants are represented in specific symbolic forms in the intermediate language which we call "effective addresses". These forms appear in the argument fields of many of the bytecode tokens, although not all forms are valid in specific argument positions of specific bytecodes. These effective address forms are summarized in Table 3.7, and the reader is again referred to the code in Table 3.6 for examples.

                                      TABLE 3.7
    __________________________________________________________________________
    Effective Address Forms
                    Runtime
    Form       Example
                    Type Description
    __________________________________________________________________________
    <num>      10.2 1      Numeric constant.
    $<char>    $a   1      Character constant.
    `<chars>`  `hello`
                    1      String constant.
    #<chars>   #size
                    1      Symbol constant, (object id of symbol).
    `<name>    `Bag 1    5 Global symbol cross reference constant, (object
                           id
                           associated with symbol name).
    #(<consts>)
               #(1 `hi`)
                    1      Array constant, (can be nested).
    t<num>     t5   5      Current context temporary.
    mt<num>    mt2  2      Owning method context temporary, (only found in
                         10
                           block code).
    b<num>     b1   2      Block stub id as slot in owning method context
                           temporary.
    i<num>     i0   3      Instance variable slot.
    `<name>@i<num>
               `Bag@i2
                    4      Class variable. Combination of cross reference
                         15
                           constant, (the class name), and slot.
    b<num>@t<num>
               b1@t3
                    6      Block parameter reference. Generated only when a
                           block refers to a containing block's
    __________________________________________________________________________
                           parameters.


In the particular case of the "mov" bytecode, phase 2 translates both the source and destination effective address forms into one of six specific runtime reference types.

3.3.3. Operational Description

Phase 2 maintains a global state around the current class and method being "assembled", resulting in method-at-a-time assembly and placement into the database. Ths class object is not given to the object manager until all methods described in the input file have been successfully translated and passed to the object manager. This insures that the old version of the class, (hence, its methods), is not replaced unless assembly of the new version is successful.

In contrast to phase 1, this phase is very "flat", that is, it contains no recursive functions to walk parse trees, since each input statement is essentially a self-contained description. All the implementing functions, (assemble.c), are despatched directly from the parser on a per statement, (or group of statements), basis, resulting in a very simple control flow.

Assembly of a method essentially consists of collecting the bytecodes described by the bytecode statements into a scratch area, (MethodBytes), and recording labels, references to labels, and block references in these statements for resolution when the end of the method is reached. Each bytecode statement has a corresponding translation routine, (assemble.c), which builds the runtime representation of the bytecode in the scratch area.

When the end of the method is reached, (endMethod() ), all label and block references are resolved and the object manager is called upon to allocate space for the compiled method object. In this area, the instance variable slots for the method object are initialized, (noTemps, noParms, classOop, selectorSymbol, . . . , etc.), and the bytecodes are copied in from the scratch area. A dictionary entry relating the method selector symbol id of the method to the id of the compiled method object is also created and added to entries already established for other methods, (in the Methods global array). These dictionary entries are stored in the class object when the end of the class is reached, (i.e., when a new `.class` directive or end-of-file is encountered).

When the end of the class is reached, (endClass() ), space is obtained from the object manager under the same object id as the previous version of the class, to cause replacement of that class. The class is then built in this area by filling in control information, including the object id of the first instance of the class obtained from previous version, the object id of the class name symbol, the id of the superclass and the size of the method dictionary for the class. The method dictionary entries are then closed-hashed, (by method selector symbol id), into a dictionary area in the class object. The class object is then flushed to the database, signaling completion of the assembly of the class, ending phase 2.

4. Interpreter

The interpreter 44 (see FIG. 3) is that portion of the Alltalk runtime environment 22 which the user invokes to run Alltalk applications. The interpreter 44 decodes the object code generated by the compiler 20 (FIG. 2), and executes it, calling upon many of the other services of the runtime environment 22. The interpreter 44 also includes a debugger, described below, which allows the programmer to inspect the running program in a variety of ways.

4.1. Synopsis

The previously described Alltalk compiler 20 for the Alltalk dialect of the Smalltalk language translates Alltalk source code into an intermediate representation, called bytecodes, and stores this representation in the database 40. Each bytecode represents an instruction for the interpreter 44, and consists of an operation code (a = bit integer) and a variable number of parameters. Applications are executed using the Alltalk interpreter 44. The Alltalk interpreter 44 uses the object manager 48 as the interface to the database 40. It also calls on the transaction manager 46 and the garbage collector 52. In addition, it invokes primitives which interface to the UNIX operating system to do things like operating on primitive data types (integer addition, floating point multiplication, string concatenation, etc.), performing file I/O, managing the display, and controlling keyboard and mouse input.

The object manager 48, transaction manager 46, garbage collector 52, and primitives are described in later sections in more detail.

The main functions in the interpreter 44 that are discussed in this section can be grouped into the following main categories:

(1) bytecode loop;

(2) bytecode handlers;

(3) context management;

(4) process management;

(5) initialization and shutdown; and

(6) the debugger.

4.2. Bytecode Loop

The state of the Alltalk interpreter 44 is captured, essentially, in a global array called Processes. Each element of this array represents one Smalltalk process. At interpreter initialization, one process is created. The user's application can create new processes, switch processes, and destroy processes as needed. Associated with each process is a stack of contexts, and a pointer to one which is the currently-executing context of that process. A context is created when a message is sent or a block is evaluated, and is destroyed when the corresponding message/block returns. Associated with each context is a set of bytecodes for the corresponding method/block, and a pointer to one which is the currently-executing bytecode of that econtext. The bytecodes are the object code to which the user's application was compiled. Each context also has an array of temporaries which are used to hold intermediate results of the execution of the associated method/block.

At any given time, only one of the Processes is running; it is the current process. The current context of that process is, then, the current context. The current bytecode of that context is the current bytecode.

The basic operation of the Alltalk interpreter 44 is a bytecode decode/dispatch loop. Code exists in the interpreter for handling each type of bytecode generated by the compiler. The interpreter decodes a bytecode to determine its type, then invokes the appropriate code for that bytecode type. We call the piece of code for a particular bytecode type a bytecode handler. Each bytecode handler increments the bytecode pointer so that after the handler completes, the interpreter main loop can decode and dispatch the next bytecode. Bytecode handlers can manipulate the bytecode pointer and other interpreter data structures in ways to affect program flow.

The routine exec.sub.-- bcodesO contains the bytecode loop. It decodes the bytecodes and invokes the appropriate bytecode handler. Before doing so, however, it checks to see if it should switch processes, i.e., it checks whether a different Smalltalk process should become the currently-active process. See the section below on Process Management for details on how process switches are handled and new processes are created.

4.3.Bytecode Handlers

There is one bytecode handler in the Alltalk interpreter 44 for each type of bytecode generated by the Alltalk compiler 20. Each handler is one (or more) case(s) in a C-language switch statement. The switch statement is part of exec.sub.-- bcodesO in the file exec.sub.-- bcodes.c. Each case of the switch is in a separate file to make source code maintenance easier. At compile time, these files are included in exec.sub.-- bcodes.c via #include's. This strategy was chosen over making the bytecode handlers each separate procedures because it cuts down on call overhead in the bytecode loop. It also allows the use of machine registers for certain control variables, since the handlers are all within a single C language function. Note that thousands of bytecodes are executed each second; overhead for that many calls would be very large.

A complete description of the bytecodes and their parameters is included in Table 4.1. ##SPC1##

The bytecodes can be grouped into the following categories:

execute a primitive;

send a message;

define a block;

evaluate a block;

return from a block or method;

branch; and

assign from one variable to another.

We describe each of these next.

4.3.1. Execute a primitive

bytecodes discussed:

0x000 --<0x0FF (primitives 0-255)

Primitives are called by Alltalk code to do the low-level tasks. These tasks generally depend on the underlying hardware and operating system, and include things like file I/O, integer and floating point arithmetic, and using the display. The bytecodes numbered from 0 to 255 (decimal), i.e., 00 to FF hex, are reserved for primitives. Primitives are similar to methods in that they have a receiver, they have optional arguments, and they return an object. They are unlike methods in that they are written in C rather than Alltalk, and no context is set up for them.

4.3.2. Send a message

bytecodes discussed:

0x100 (send.sub.-- msg.sub.-- bcode) 0x10B (send.sub.-- param.sub.-- msg.sub.-- bcode) 0x10C (send.sub.-- message.sub.-- add) 0x10D (send.sub.-- message.sub.-- sub) 0x10E (send.sub.-- message.sub.-- eq) 0x10F (send.sub.-- message.sub.-- add1) 0x110 (send.sub.-- message.sub.-- sub1)

The compiler generates several different types of bytecodes for messages. The normal message send is handled by send.sub.-- msg.sub.-- bcode. Messages of the type `perform:` and `perform:with:` are handled by send.sub.-- param.sub.-- msg.sub.-- bcode. These two handlers operate in very similar manner. The main difference is that for send.sub.-- msg.sub.-- bcode, the message selector is known at compile time, and is included in the bytecode itself; for send.sub.-- param.sub.-- msg.sub.-- bcode, the oop of the message selector is found, at run time, in a temporary of the current context.

The normal processing of a send.sub.-- msg (and send.sub.-- param.sub.-- msg) is as follows. Note that we do not discuss various optimizations that we have put into send.sub.-- msg bytecodes. These are discussed in a separate section below.

(1) Get the oop of the receiver of the message from the temporaries of the sending context. The send.sub.-- msg parameter arg.sub.-- start.sub.-- slot is the index into the temporaries at which this oop is found.

(2) If the receiver is not a context or positive integer, call the object manager to fetch the receiver object. Note that contexts and positive integers are not managed by the object manager: contexts are not objects in Alltalk, and positive integers are encoded as negative oops.

(3) Determine the receiver's class. If the receiver is not a context or positive integer, its class is found in its object header.

(4) Call the object manager to fetch the method associated with the message we are processing. We pass to the object manager the hashed.sub.-- selector and super.sub.-- flag parameters from the bytecode, plus the class of the receiver. It returns the method object which contains the bytecodes for the message we are processing.

(5) In the sending context, store the value of the bytecode parameter put.sub.-- answ.sub.-- slot. This is needed when we return to this context from the method we are about to execute. It represents the index of the sending context's temporaries into which the returned result is to be put.

(6) Increment the bytecode pointer in the sending context. When we return to this context, we will continue executing bytecodes in this context at that point.

(7) Create a new context for the message we are processing. Copy num.sub.-- args arguments from the sending context, starting at arg.sub.-- start.sub.-- slot in the temporaries of the sending context. They are copied into the temporaries of the new context starting at slot 0. Note that this assures that the receiver of a message can always be found in context temporary 0. The new context will have a bytecode pointer which points to its first bytecode. We make this context the current context, and return to the bytecode loop.

4.3.3. Define a block

bytecodes discussed:

0x106 (setup.sub.-- blk.sub.-- bcode)

In Alltalk, blocks are not objects managed by the object manager, but rather are maintained by the interpreter as C data structures. When they are assigned to instance variables, or returned as the result of a message, they are made into objects, is the home context (this is discussed in more detail below). They can, however, be assigned to method temporaries and passed as parameters in messages without being made into objects first.

When the interpreter encounters a setup.sub.-- blk bytecode, it creates a data structure called a block stub, and gives it an object id (which is a 32 bit integer) which we call an oop). The oop is in a special range, i.e., greater than or equal to INIT.sub.-- CNTX.sub.-- ID, so it can be recognized later as a block by the interpreter. The block stub contains enough information to evaluate the block when an eval.sub.-- blk bytecode is later encountered. Its oop is stored back in the temporaries of the home method in which it is defined. It can then be handled like any other oop stored in temporaries (except for the cases mentioned above).

4.3.4. Evaluate a block

bytecodes discussed:

0x109 (eval.sub.-- blk.sub.-- bcode) 0x10A(eval.sub.-- blk.sub.-- bcode2)

Evaluating a block means executing the code that the block contains. Note that a block must be `set up` before it can be evaluated. However, a block which is set up may or may not be evaluated. For example, the ifTrue: block and ifFalse: block of an ifTrue: ifFalse: message won't both be evaluated. A block may be evaluated immediately after it gets set up, or later. It may be evaluated by the context in which it was set up, or the context which sets it up may pass it as a parameter in a message send, so that it gets evaluated by another context.

The eval.sub.-- blk bytecode handler causes a block to be evaluated by converting the block stub for that block into an active context on the context stack of the active process. It makes that new context be the current context, and makes the global bytecode pointer point to the block's first bytecode.

4.3.5. Return from a block or method

bytecodes discussed:

0x101 (long.sub.-- return) 0x104(short.sub.-- return)

When the Alltalk interpreter 44 encounters a return bytecode, it means that the currently executing context is finished, and it switches control to a previous context. In addition, it passes back an object (actually the object's oop) to the context to which it is returning.

There are two different return bytecodes. What we call the long return (also known as return from method) causes the interpreter to return to the context just previous to the home context of the current context. The home context of a method context is itself; the home context of a block context is the context of the method in which the block is defined/setup. Therefore, a long return from a block is the same as doing a return from the block's home method. Long returns are indicated in the Alltalk code by the caret symbol," ".

A short return causes the interpreter to return to the context just previous to the current context, regardless of what it is. A short return and a long return from a method context are the same. (The Alltalk compiler 20 always generates a long return for returns from a method context.) A short return from a block means to simply return to the previous context in the stack. This previous context is the context which caused the block to be evaluated; it may or may not be the block's home context.

4.3.6. Branch

bytecodes discussed:

0x105 (branch) 0x107 (branch.sub.-- on.sub.-- equal) 0x108 (branch.sub.-- on.sub.-- not.sub.-- equal)

Branch bytecodes are used to implement control structures. In addition, branching bytecodes are used by the compiler as part of several optimizations.

The unconditional branch bytecode (0x105) simply increments/decrements the bytecode pointer by a certain amount. The conditional branch bytecodes compare an oop found in a temporary of the current context with an oop contained in the bytecode itself. Whether or not the bytecode pointer is incremented depends on the results of comparing these two oops.

4.3.7. Assign from one variable to another

bytecodes discussed:

0x1nm (assign type n variable from type m variable)

The Alltalk compiler 20 and the Alltalk interpreter 44 understand six different types of variables. These six types are as follows:

Type 1

This type of variable is simply an oop that the Alltalk compiler 20 generates, and includes as part of the assignment bytecode. Obviously, it cannot be the destination of an assignment statement, only the source. Examples of Type 1 variables are string, character, integer, and floating point constants, and class names.

Type 2

This type of variable is a temporary in the home context of the current context. The Alltalk compiler 20 specifies it as an index into the array of temporaries.

Type 3

This type of variable is an instance variable of the object which is `self` in the current context. The Alltalk compiler 20 specifies it as an index into the instance variables of the receiver.

Type 4

This type of variable is an indirect reference to a particular instance variable of a particular object. The Alltalk compiler 20 specifies the instance variable by specifying an index into the temporaries of the current context (which specifies the object), plus an index into the instance variables of that object (which specifies the particular instance variable).

Type 5

This type of variable is a temporary in the current context. The Alltalk compiler 20 specifies it as an index into the array of temporaries. Note the difference between this and the type 2 variable. For a method context, type 2 and type 5 are the same because a method's home context is itself; for a block, type 2 refers to its home context's temporaries, and type 5 refers to its own temporaries.

Type 6

This type of variable is needed for nested blocks in which an inner block refers to an argument of an outer block. The Alltalk compiler 20 specifies the argument by giving two parameters in the bytecode. First is an index into the temporaries of the home context. In that particular temporary is found the id of the block stub of the outer block. The second parameter is an index into the temporaries of the outer block. In that particular temporary is found the oop of interest. Since Smalltalk does not allow assignment to the arguments of a block, a type 6 variable cannot be the destination of an assignment statement, only the source.

Each assignment bytecode has a source variable type and a destination variable type. The destination is specified first, then the source. Because type 1 and type 6 variables cannot be destinations, there are 24 assignment bytecodes (4 destination types * 6 source types). The assignment bytecode handlers simply put the oop specified by the source into the location specified by the destination.

4.4. Context Management

As mentioned above, the state of the Alltalk interpreter 44 is contained in the global array, Processes. Each element in that array represents a process. In addition to the interpreter's C-language data structure for a process, there is also an instance of Smalltalk Class Process for each Smalltalk process in an application. In the following, we concentrate on the interpreter's data structure for processes, and ignore the Smalltalk object. Each process has associated with it a set of contexts. In the following, we explain how contexts are implemented for one process, but one should remember that there is one set of contexts for each process.

In order to improve performance of the Alltalk interpreter 44, it does not treat contexts as objects. Instead, they are maintained by the interpreter as C data structures. (As mentioned above, however, the home context may be turned into an object if an owned block is turned into an object).

The Alltalk interpreter 44 manages contexts in two pieces. One piece contains what are called active contexts. These are contexts associated with methods which have not yet returned and blocks which are executing and have not yet returned. This piece operates like a stack: when a message is sent or a block starts execution, the Alltalk interpreter 44 pushes another context on the stack; when a method or block returns, the Alltalk interpreter 44 pops one context (or more, in the case of a long return from a block) off the stack.

The second piece contains what are called block stubs. A block stub is established as the result of a setup.sub.-- blk bytecode (see setup.sub.-- blk.sub.-- bcode). In order to treat blocks as objects, object id's (oops) are given to such blocks. The block stubs represent these pseudo-objects. They hold just enough information so that when a block is evaluated (a value message is sent to it), the Alltalk interpreter 44 can create an active context for it. Note that a block stub exists as long as its home context exists; it does not go away just because its associated active context returns. In fact, in the case of loops in Smalltalk code, the same block stub might be evaluated many times, having an active context created from it and destroyed each time.

Because block stubs are stored as a separate piece, the active contexts can be allowed to obey a stack discipline. This simplifies context management, and improves performance.

The data structure for contexts is defined in "interp.sub.-- types.h". Contexts are of fixed size, and have 64 temporaries each. (Smalltalk defines 64 as the maximum number of temporaries a context may have.) This allows the Alltalk interpreter 44 to allocate space for them and doubly link them at interpreter initialization time, rather than on the fly. They are allocated as an array, and have one array/stack of contexts per Smalltalk process. The routine init.sub.-- cntx0 initializes one context, and it is called by init.sub.-- cntx.sub.-- stackO which initializes and links all contexts for a given process when the process gets created.

The data structure for block stubs is also defined in "interp.sub.-- types.h". Block stubs are of fixed size. This allows space to be allocated for them and allows them to be linked at interpreter initialization time, rather than on the fly. They are allocated as an array, and have one array/stack of contexts per Smalltalk process. The routine init.sub.-- blk.sub.-- stub.sub.-- stackO initializes and links all block stubs for a given process when the process gets created.

In addition to the two arrays, the Alltalk interpreter 44 maintains a pointer to the current active context, cur.sub.-- cntx, and a pointer to the next available (unused) block stub, next.sub.-- blk.sub.-- stub, for each process.

The fields of a context that are important for context management are described next.

prev, next

Each context has a prev pointer which links it to the previous context in the array/stack, and a next pointer which links it to the next context in the array/stack. These pointers are used rather than the array index to move between contexts. The Alltalk interpreter 44 follows the next pointer of the current context when it needs to add a new context. This happens when a message is sent (see send.sub.-- msg.sub.-- bcode), or a block is evaluated (see eval.sub.-- blk.sub.-- bcode). The Alltalk interpreter 44 follows the prev pointer of the home context of the current context to find the context to which it should return when it does a long return; it follows the prev pointer of the current context itself when it does a short return(see "short return", Table 4.1).

home.sub.-- cntx

For a method context, home.sub.-- cntx points to itself. For a block context, home.sub.-- cntx points to the context of the method in which the block is defined. This pointer is needed when the Alltalk interpreter 44 does long returns from blocks, and when blocks refer to the temporaries of their home method. By having a method context's home be itself, the Alltalk interpreter 44 can handle all long returns (both from method contexts and from block contexts) in the same way.

first.sub.-- block

The first.sub.-- block field of a context points to the first block stub that the context could allocate. This is used to free up block stubs when an active context returns.

my.sub.-- blk.sub.-- stub

For a method context, my.sub.-- blk.sub.-- stub is not used, and is NULL. For a block context, the field points to the context's corresponding stub. This pointer is used by the debugger (described below), and is also used in conjunction with the prev.sub.-- active.sub.-- cntx field to handle the case where one block stub has multiple active contexts at the same time.

prev.sub.-- active.sub.-- cntx

For a method context, prev.sub.-- active.sub.-- cntx is not used, and is NULL. For a block context, it is used in conjunction with the my.sub.-- blk.sub.-- stub field to handle the case where one block stub has multiple active contexts at the same time. It saves a pointer to the previous active block context associated with this block context's block stub. If this context is the only active context associated with the block stub, then this field holds a NULL pointer.

The fields of a block stub that are important for context management are described next.

id

Each block has an id which is an oop (long integer) in a special range, that is, greater than or equal to the constant INIT.sub.-- CNTX.sub.-- ID. The id's are assigned to a stub when the process to which it belongs is initialized. The id can be stored in the temporaries of other contexts, and can be passed as a parameter in a message send. In this way, blocks can be treated (almost) like real objects for flexibility, and yet be managed by the interpreter for good performance.

next

Each block stub has a next pointer which links it to the next block stub on the array/stack. When a new block stub is needed, the Alltalk interpreter 44 uses the one pointed to by the global pointer, next.sub.-- blk.sub.-- stub. At that time, it follows the next pointer of the stub pointed to by next.sub.-- blk.sub.-- stub to update next.sub.-- blk.sub.-- stub.

home.sub.-- cntx

Each block stub has a pointer to its home context. If the stub gets evaluated, the Alltalk interpreter 44 needs this pointer in the active context created for the block. Via this pointer, it can get at the temporaries of the home context.

active.sub.-- cntx

When a block gets evaluated, the Alltalk interpreter 44 updates the stub with a pointer to the active context that gets created to do the evaluation. This pointer is needed in order to resolve references to type 6 variables.

When a block is stored in an instance variable, or passed back from a method the Alltalk interpreter 44 must make the block a persistent object. In so doing, it must also make the home context a persistent object as well, since the block can reference temporaries of the home context. Alltalk contains routines to make the block and its home context persistent objects (and thus they may then be stored in the database and manipulated as any other object), and to put the block and home context back on the stack so that the block can be executed.

Referring to the Drawings, FIGS. 4 through 13 show how context management is done in Alltalk. Each Figure shows the same portion of the active context stack and the block stub stack for one process. Each box in the Figures represents one context or one stub; only the fields involved in context management are shown. (The my.sub.-- blk.sub.-- stub and prev.sub.-- active.sub.-- cntx fields are shown only in FIG. 13.) Pointers are indicated by arrows; pointers "connected to ground" represent NULL pointers. Pointers shown in double lines indicate pointers which were changed from the previous figure. The stacks grow downward.

FIG. 4, shows the state of the two stacks after the interpreter has been initialized, but no messages have been sent. Note that the next and prev pointers of the contexts, and the next pointers of the stubs have been established. Also, the id's of the stubs have been set.

FIG. 5, shows what happens when a message is sent. (We assume that the sending context is just off the top of the figure; the context we are about to create is the top box we see in the figure.) We follow the next pointer of the sending context to "create" a new context (from here on, called method context #1). The new context becomes the cur.sub.-- cntx, and its class is MethodContext. Since it's a method context, its home.sub.-- cntx is made to point to itself. Its first.sub.-- block pointer is made to point to the stub pointed to by next.sub.-- blk.sub.-- stub. Note that next.sub.-- blk.sub.-- stub is not moved; only when a block stub is used (i.e., set up) is the next.sub.-- blk.sub.-- stub moved forward.

FIG. 6 shows the stacks after method context #1 sets up its first block. Setting up a block means that the Alltalk interpreter 44 created a block stub; it does not mean that the Alltalk interpreter 44 creates another active context. The block stub pointed to by next.sub.-- blk.sub.-- stub becomes the new block stub. The Alltalk interpreter 44 pushes next.sub.-- blk.sub.-- stub forward to the stub pointed to by the next field of the new block stub. The home.sub.-- cntx field of the new block stub is made to point to the home.sub.-- cntx of cur.sub.-- cntx, i.e., method context #1. Note that if cur.sub.-- cntx were a block context, the home.sub.-- cntx of the new block stub would not be that block context, but rather the block's home context. Note also that method context #1 does not change.

FIG. 7 shows what the stacks look like after method context #1 sets up another block. We now have two block stubs whose home.sub.-- cntx is method context #1.

FIG. 8 shows the stacks after method context #1 sends a message. To handle this, the Alltalk interpreter 44 must "create" a new context (from here on, called method context #2). The Alltalk interpreter 44 follows the next pointer of the current context to find the next available active context, and make it the cur.sub.-- cntx. Its first.sub.-- block pointer is made to point to the block stub pointed to by next.sub.-- blk.sub.-- stub. Since the new context is a method context, its home.sub.-- cntx field is made to point to itself.

FIG. 9 is somewhat more complicated. In that figure, we see the stacks after method context #2 starts to evaluate one of the blocks that was set up by method context #1. (We assume that the block was passed as a parameter in the message which resulted in the creation of method context #2.) The stub to be evaluated is #214740009. To handle this, the Alltalk interpreter 44 must "create" a new active context--but this time, it is a block context. Just as with method context creation, the Alltalk interpreter 44 follows the next pointer of the cur.sub.-- cntx to find the next available active context, and make it the cur.sub.-- cntx. Also, the Alltalk interpreter 44 makes its first.sub.-- block pointer point to the block stub pointed to by next.sub.-- blk.sub.-- stub. However, the home.sub.-- cntx pointer of the new context does not point to the new context itself; because the new context is a block context, its home.sub.-- cntx pointer is gotten from its block stub. In this case, home.sub.-- cntx points to method context #1. Note also that the block stub's active.sub.-- cntx pointer is made to point to the new block context. The transformation of a block stub to an active context is handled by the routine stub.sub.-- to.sub.-- cntxO.

FIG. 10, shows how the stacks would appear if the block were to do a short return. Note that the Alltalk interpreter 44 simply follows the prev pointer of the current context to find the context to return to; it is made the cur.sub.-- cntx. Note also that the block stub associated with the evaluated block does not go away, even though its active context did go away. Block stubs go away when their home context goes away (returns). The Alltalk interpreter 44 also moves next.sub.-- blk.sub.-- stub to point to the block context's first.sub.-- block. This effectively "destroys" and frees up any block stubs set up by the block context. (In this case, the block context created no block stubs, so next.sub.-- blk.sub.-- stub does not change.)

FIG. 11, shows how the stacks would appear if the block were to do a long return. Remember that a long return from a block is the same as doing a return from the block's home context. In this case, the block's home context is method context #1, so the Alltalk interpreter 44 (in essence) does a return from method context #1. It follows the prev pointer of method context #1 to find the context to return to; it becomes the cur.sub.-- cntx. It also moves the next.sub.-- blk.sub.-- stub pointer back to point to the stub pointed to by first.sub.-- block of method context #1. This effectively "destroys" and frees up all blocks created by method context #1 and any of its descendent contexts.

FIGS. 12 and 13 show how the my.sub.-- blk.sub.-- stub and prev.sub.-- active.sub.-- cntx fields are used to handle the case where a block stub may have multiple active contexts associated with it. Note that these fields are shown in these figures only, and only for block contexts. Note also that we have shifted our view of the stacks down (or up) by one context in order to fit the contexts of interest on the page.

FIG. 12 shows how the stacks would appear if a second block context was activated for the same block stub as the current context. Note that the two block contexts created from the same block stub are very similar; only their prev.sub.-- active.sub.-- cntx fields differ. Note that the second one uses this field to point back to the previous (first) one. Note also that the active.sub.-- cntx field in the stub is updated so it points to the new context.

FIG. 13 shows how the stacks would appear if the second block context did a short return. The Alltalk interpreter 44 follows the my.sub.-- blk.sub.-- stub pointer of the returning block context to find its associated block stub. It copies the prev.sub.-- active.sub.-- cntx pointer of the returning block context into the active.sub.-- cntx field of the stub. Then it does the normal processing for a short return, that is, follow the returning context's prev pointer to find the sending context, and makes it the new current context. Note that in this example, prev and prev.sub.-- active.sub.-- cntx point to the same context, that is, the first block context; however, this will not necessarily be the case. There could be other intervening contexts between these two activations of the same stub. This is why it must save this information in the newly-created context.

4.5. Process Management

As mentioned above, the Alltalk interpreter 44 maintains run time data structures for Smalltalk processes in an array called Processes[]. Each element in that array represents one Smalltalk process. Each element contains (basically) a stack of active contexts, a pointer to the current context in that stack, an array of block stubs, and a pointer to the next available stub. The management of these two stacks and two pointers was described in the previous section. However, we have not yet discussed how processes are created, switched, or destroyed. These topics will be discused in this section.

4.5.1. Creating Processes

A Smalltalk process is created by sending a message to a block. The block contains the code that is to be executed in the new process. The message sent to the block might be forkAt:, fork, etc. However, all of these messages eventually result in the message newProcess being sent to the block. The Smalltalk code for method newProcess in Class Block is shown in Table 4.2.

                  TABLE 4.2
    ______________________________________
    newProcess
    "Answer a new process running the code in the receiver. The
    process is not scheduled."
     Process
    forContext:
    [self value.
    Processor terminateActive]
    priority: Processor activePriority
    ______________________________________


The forContext:priority: method in Class Process is a class method for creating new processes, and it is implemented as a primitive in Alltalk.

The routine createProcessO is the main routine for creating a new process. It first finds an available element in the Processes[ ] array by calling get.sub.-- proc.sub.-- idO. Then, in order to create a new process in Alltalk, the Alltalk interpreter 44 establishes the first context in that new process. It does that by copying appropriate active contexts and block stubs from the creator process to the created (new) process, and then making slight adjustments to the copies. This is best explained using an example.

Suppose an application wishes to create a process that simply prints a message. An example of code to do this is shown in Table 4.3.

                  TABLE 4.3
    ______________________________________
    Class Test
    .vertline.
    myTest
    .vertline. aProc .vertline.
    aProc <- [`This is a new process` print.] newProcess.
      "create it"
    aProc resume. "schedule it"
    Processor yield. "switch to it"
    ______________________________________


What contexts and stubs should be copied? Obviously, the Alltalk interpreter 44 must copy the user's block, that is, the one in method myTest. Because a block may refer to its home method's temporaries (though in this case it does not), and because a block's bytecodes are actually contained in its home method, it copies both the block stub and its home. In this case, the home context is the method context associated with the execution of myTest. But this is not enough. Note that the method, Block newProcess, which actually sends the message which directly creates the new process (Process forContext:priority:) also creates a block. This block, [self value. Processor terminateActive.], also must be copied; and its home context must be copied as well. In what follows, we call this block the outer block. Note that self in the outer block refers to the user's block.

To summarize: the Alltalk interpreter 44 copies the user's block and its home context (see proc.sub.-- copy.sub.-- cntx1O), plus the outer block and its home context (see proc.sub.-- copy.sub.-- cntx2O). After that, it evaluates the outer block, that is, it creates an active context from the block stub. When the new process becomes active, this, in turn, causes the user's block to be evaluated (as a result of the message self value). When that block finishes, the new process is destroyed (as a result of the message Processor terminateActive).

Referring to the Drawings, FIGS. 14 through 16 illustrate the relationships between these contexts and blocks. FIG. 14 shows a portion of the active context stack and block stub stack of the creator process. The contexts and stubs shown are the ones that are of interest when the Alltalk interpreter 44 creates the new process. FIG. 15 shows the active context stack and block stub stack of the created process just after it is created by the interpreter. FIG. 16 shows the same stack just after the new process has become active, and the user's block begins to execute.

4.5.2. Switching Processes

Switching processes is fairly straightforward. Before each bytecode is executed, the Alltalk interpreter 44 tests the Divert flag; if set, it switches to the process returned by the routine processSwitchO. The routine processSwitchO returns an oop; the routine find.sub.-- processO takes the oop as an argument, and returns a pointer to the corresponding element of the Process[ ] array.

The machinery for managing process switches is contained in the module process.c. It follows the implementation described in the standard reference for Smalltalk by Golberg and Robson, mentioned above.

4.5.3. Destroying Processes

Destroying (i.e., terminating) a process involves two basic steps. First, the appropriate element of the Processes[ ] array is marked as not in use so it can be reused if needed. Second, the garbage collector (described below) is told to clean up after the process. The routine destroyProcessO handles these two tasks.

Processes are destroyed in two situations. The first case is when the interpreter quits. At that time, all active processes are destroyed so garbage collection can be performed correctly. The second case is when a terminate message is sent to a Process object. This second case is implemented via primitives. Note that process 0 is created automatically when the interpreter is initialized; it cannot be destroyed, except by shutting down the interpreter.

4.6. Optimizations

Various techniques are used to improve the run-time performance of the Alltalk tool. These techniques are useful independently of the Alltalk tool. They can be advantageously employed in any Smalltalk-like object-oriented programming tool to improve the runtime performance. We describe these techniques below.

4.6.1. Replacing certain message sends with less expensive processing

This is referred to as message flattening. The Alltalk interpreter 44 detects at runtime if a message send's only purpose is either of the following 2 cases:

1. Return of an instance variable.

2. Execution of a primitive.

The Alltalk compiler 20 flags methods that are of these types, for easy detection at runtime. The Alltalk interpreter 44 will execute the appropriate logic in-line, and modify flags in the bytecode that is being executed, as well as caching in the bytecode itself the class of the receiver. Subsequent executions of the bytecode involved will cause the class of the now current receiver to be checked against the class cached in the bytecode. If it matches, the Alltalk interpreter 44 performs the optimized logic, in-line, without fetching (or executing) the method. Thus this optimization saves the fetching of the method, allocation (and subsequent deallocation) of a new context, and interpretation of the method.

4.6.2. Treating primitives as bytecodes

Rather than have one bytecode just for dispatching primitives, (e.g., an execute.sub.-- primitive bytecode), in Alltalk, each primitive is its own bytecode. This eliminates the extra level of indirection to get to the code for primitives. As mentioned previously, primitive bytecodes are in the range of 0.times.000 to 0.times.0FF hex; other bytecodes being at 0.times.100.

4.6.3. Saving a call to the object manager to fetch receiver

If the receiver of a message is the same as the receiver of the sending method, the Alltalk interpreter 44 avoids the call to the object manager to fetch the receiver again. Instead, since in Alltalk a pointer to the receiver is held in the associated context, the Alltalk interpreter 44 gets the receiver pointer from the associated context instead.

4.6.4. Replacing `value` messages with block evaluation

Since evaluating a block is less expensive than sending a message, the Alltalk interpreter 44 attempts to replace send.sub.-- msg.sub.-- bcodes with eval.sub.-- blk.sub.-- bcodes when possible. The Alltalk compiler 20 recognizes messages with the selector value (and value:, etc.), and replaces them with eval.sub.-- blk.sub.-- bcode2 bytecodes. This bytecode is the same as the eval.sub.-- blk.sub.-- bcode, except that it must check to see that the "receiver" of the value message is a block. If it is not a block, eval.sub.-- blk.sub.-- bcode2 simply returns, and lets processing fall through to the next bytecode which is a send.sub.-- msg.sub.-- bcode for the value message; if the receiver is a block, eval.sub.-- blk.sub.-- bcode2 operates like eval.sub.-- blk.sub.-- bcode, except that it must push the bytecode pointer past the following send.sub.-- msg.sub.-- bcode which it replaces.

4.6.5. Caching methods in send.sub.-- msg bytecodes

Alltalk uses a performance-improving technique, common to most Smalltalk implementations, known as method caching. The technique takes advantage of the fact that, while Smalltalk allows polymorphism, a given message often ends up being resolved to the same method every time. How Alltalk takes advantage of this is as follows.

The send.sub.-- msg bytecode has two extra fields which implement a method cache. One field is likely.sub.-- class. This saves the class of the receiver of the message when it was last sent. The second field is likely.sub.-- method. This saves the oop of the compiled method to which the message was resolved last time it was sent. When the bytecode is encountered again, the Alltalk interpreter 44 checks to see if the new receiver's class matches likely.sub.-- class; if it does, it uses the compiled method in likely.sub.-- method. If the classes do not match, it must do the normal, more expensive processing to fetch the appropriate method. Note that in Alltalk, when the cache is used, the Alltalk interpreter 44 calls the object manager to reserve the method object, to insure the object is not garbage collected until the object is no longer needed. However, this is less expensive than normal method fetching. Note also, that if the cache is not usable (i.e., the receiver's class does not match likely.sub.-- class), the Alltalk interpreter 44 updates the cache to match the receiver's class and the method's oop in the current message.

4.7. Initialization and Shutdown

The main procedure of the Alltalk interpreter 44 is contained in the module interp.c. It performs various types of initializations, then invokes the bytecode loop by calling exec.sub.-- bcodesO. When exec.sub.-- bcodesO returns, mainO does some minor clean up, and exits.

Initialization procedures are the following.

(1) Command line arguments are processed. These are parameters passed on the statement used to invoke the runtime environment 22. They include switches for relinquishing control of the keyboard and mouse to the Smalltalk application, and for avoiding the normal system booting procedures. Another parameter is an optional filename; it indicates that the interpreter should get the information for the initial message of the application from that file rather than by prompting the user.

(2) Signal handling is set up for the I/O primitives.

(3) The object manager 48 is initialized via a call to init.sub.-- omO.

(4) The values for the initial message are processed via a call to get.sub.-- init.sub.-- valsO.

(5) Keyboard and Mouse are `opened` via calls to openMouseO and openKeyboardO, if appropriate.

(6) The oops of certain Alltalk objects are referenced in the Alltalk interpreter 44 via global variables. Some of these are fixed to certain oops. For example, true is always oop 257. However some of the oops referenced via interpreter globals must be determined at start up of the interpreter--they are not fixed forever, just for the duration of the interpreter's run. The appropriate assignments are made by calling initializeOopsO. Likewise, certain instance variable indices are referenced by the interpreter via globals. These, too, must be determined at start up. A call to initializeIndicesO takes care of this.

(7) The first Smalltalk process is established. See the section above on Process Management for more details. The routines createProcessO, and init.sub.-- processorO do most of this work.

(8) The display is `opened` via a call to openDisplayO.

(9) The bytecodes and context for the first message are built and made the first one to be executed. Basically, the interpreter 44 builds:

(a) send.sub.-- msg and return bytecodes for the message startUp sent to Class SystemBoot;

(b) send.sub.-- msg and return bytecodes for the user-supplied initial message.

The routines bld.sub.-- dummy.sub.-- bcodesO and bld.sub.-- dummy.sub.-- cntxO perform these tasks.

4.8. The debugger

The debugger is named RAID, and it combines many of the features of the standard Smalltalk debugger and the UNIX debugger, dbx.

4.8.1. Overview of the Debugger

RAID (Revised Alltalk Interactive Debugger) is the debugger for the Alltalk system. We designed it to be used for debugging both Alltalk applications code, and the Alltalk system (implementation) itself. RAID provides typical debugger capabilities such as:

setting break points;

stepping through program execution;

tracing various types of information (messages, blocks, bytecodes, processes); and

displaying values of data structures/variables.

RAID is written in C, and is integrated quite closely with the Alltalk interpreter.

The user interface is a simple command interpreter, that looks somewhat like the Unix debugger, dbx, to the user. The command interpreter uses UNIX utilities lex and yacc to parse input and dispatch the appropriate C routines that perform the tasks of the RAID commands.

4.8.2. Basic Architecture of RAID

There are several versions of the Alltalk interpreter 44, each geared to a particular need. Not all of these interpreters contain RAID. For example, one version is optimized for running debugged applications as fast as possible; leaving out the debugger improves performance considerably. Another version is geared toward the collection of performance statistics; it also does not include the debugger. The version of the interpreter built by default, however, does include RAID.

Conceptually, there are three pieces to the implementation of RAID. One piece is a set of C routines in a library separate from the interpreter, that performs the tasks associated with the RAID commands. Each command has a C procedure associated with it, and that procedure may use other utility procedures to do its work. This first piece is conditionally linked to the interpreter depending on which version of the interpreter is made.

A second piece is the code within the interpreter that can get conditionally compiled into the interpreter itself; by default, it is included, but it can be excluded if debugging is not needed. This code is included when the C compiler switch DEBUGGER is on.

The third piece is a set of global variables and constants that are used to communicate between the first two pieces.

In what follows, we will refer to piece one simply as the debugger; piece two will be referred to as RAID code in the interpreter; piece three will be called debugger globals.

RAID is invoked when the interpreter calls a routine in the debugger called, appropriately enough, debuggerO. Flow of control is as follows:

(1) RAID code in the interpreter calls debuggerO.

(2) debuggerO prompts the user, and invokes the lex/yacc command interpreter.

(3) The command interpreter parses and interprets the user input, and calls the appropriate C-procedure with the appropriate parameters.

(4) The C-procedure performs the tasks associated with the desired command. This usually results in either display of some information (like the contents of the current context), or the updating of the debugger globals (like turning on or off the switch that tells the interpreter to stop at the next message-send).

(5) When the C-procedure returns, either control will be passed back to the interpreter at the point at which it called debuggerO, or the debugger goes to step 2. Which path is taken depends on the command just processed. For example, after the continue command executes, control is returned to the interpreter; after the print.sub.-- active.sub.-- cntx command executes, the user is given another RAID prompt.

(6) When control returns to the interpreter, it continues, executing both normal code and RAID code. RAID code within the interpreter may call debuggerO (step 1 above); it may update debugger globals; or it may display data to the user based on the values of the debugger globals (switches).

4.8.3. Command Interpreter

As previously mentioned, the interactive interface to RAID is a simple command interpreter built using the UNIX utilities lex and yacc.

The utility lex defines what are valid tokens in the RAID "command language"; the grammar defines how these tokens can legally be put together to form commands. In addition, the grammar calls the C-procedure associated with the command, passing the command parameters as arguments.

The following naming/capitalization conventions are employed for tokens:

(1) Tokens representing command names are all uppercase, e.g., MSG.sub.-- STEP.

(2) Other terminals have first letter uppercase, all others lowercase, e.g. Hex.sub.-- numeric.

(3) Non-terminals are all lowercase, e.g., help.sub.-- param.

4.8.4. Implementation of the Commands

This section will give a brief description of how each RAID command is implemented. For each command, we discuss how each of the three pieces of the RAID implementation (debugger, RAID code within the interpreter, and debugger globals) is used. First, we describe the naming/capitalization conventions used in the RAID implementation.

4.8.4.1. Naming conventions

Almost all variables, constants, and procedures that RAID uses begin with the letters "d.sub.-- " or "D.sub.-- " (the letter "d" or "D" followed by the underscore character). In addition, we use the following capitalization conventions:

(1) RAID global constants are all uppercase, e.g., D.sub.-- PROMPT.sub.-- SYMBOL.

(2) RAID typedefs and structure definitions are all lowercase, e.g., d.sub.-- ostat.sub.-- struct.

(3) RAID global variables have first letter uppercase, all others lowercase, e.g., D.sub.-- init.sub.-- vals.

(4) RAID macros are all uppercase, e.g., D.sub.-- CRESETO.

(5) RAID procedures are all lowercase, e.g., d.sub.-- whereO.

(6) Associated with each command with name command.sub.-- name is a routine with the name d.sub.-- command.sub.-- nameO.

4.8.4.2. RAID Switches

Some operations of RAID are controlled by two sets of binary switches. One set of switches controls the trace information that is displayed as the interpreter runs, e.g., message sends and returns. The other set holds state information, e.g., which RAID command is currently executing.

Each set of switches is implemented using a global variable bit vector, plus three macros: one for setting a particular switch (bit), one for resetting a particular switch (bit), and one for testing whether or not a switch (bit) is set. The first set of switches uses the global variable D.sub.-- display.sub.-- switches, and the corresponding macros are D.sub.-- DSETO, D.sub.-- DRESETO, and D.sub.-- ISDSETO. The second set of switches uses the global variable D.sub.-- control.sub.-- switches, and the corresponding macros are D.sub.-- CSETO, D.sub.-- CRESETO, and D.sub.-- ISCSETO.

4.8.4.3. Commands for starting and stopping execution

continue

This command simply continues execution of the interpreter by causing debuggerO to do a return. We cause debuggerO to return by setting the global variable D.sub.-- in.sub.-- debugger to "0" (zero).

quit, restart, rerun

The quit command causes the interpreter to exit; restart aborts the current Alltalk application, restarts the interpreter on the same application, and gives a RAID prompt; rerun is equivalent to restart followed immediately by a continue--it does not re-prompt the user before restarting the application. It is important to do garbage collection before aborting an application, so these commands make sure each active Smalltalk process is explicitly destroyed before aborting. The code does different things depending on the state of the Alltalk interpreter 44 when the command is invoked.

If the bytecode loop has not yet started, the user is forced to get into the bytecode loop (by executing one bytecode, for example) before allowing any of these commands to be used.

If the interpreter is in the middle of an application, i.e., it is inside the bytecode loop, the debugger does longjmpO to an appropriate spot in exec.sub.-- bcodesO where all active processes are destroyed in order to be sure garbage collection is done appropriately. Then it returns to interpO.

If an application has just completed, it is already outside the bytecode loop, so the debugger simply returns to the routine interpO; no garbage collection is needed since all active processes ran to completion.

In either of these last two cases, the debugger sets the appropriate global switch (D.sub.-- QUIT, D.sub.-- RERUN, or D.sub.-- RESTART) so that when it returns to interp, it knows whether to exit, restart, or rerun.

run

The run command is similar to restart, but it is used when the user wants to run a different Alltalk application without leaving the interpreter. The code, then must clear all breakpoints (since these are probably not meaningful in the new application), and get new values for the interpreter's initial message.

4.8.4.4. Commands for finding out where you are

print.sub.-- message, where

The where command is analogous to the dbx command of the same name. It prints out the currently active messages, i.e., the messages sends that have not yet returned. Only those messages in the currently-active process are printed. The print.sub.-- message command prints only the most-recently activated (last sent) message. Both commands use the routine d.sub.-- print.sub.-- msgO to print the message associated with a given context; where calls this routine on all the contexts in the context stack of the current process; print.sub.-- message calls this routine only on the current context.

4.8.4.5. Commands for setting breakpoints

stop.sub.-- at

This command handles a stop set for a particular bytecode type, e.g., send.sub.-- msg.sub.-- bcode. If the user enters the command without a parameter, the debugger simply prints out the currently-set stop, if any. If a parameter is given, the debugger stores it into the RAID global variable, D.sub.-- stop.sub.-- at.sub.-- bcode. Bytecodes range from (hex) 0.times.100 to 0.times.156; primitives range from (decimal) 0 to 255. The user may specify a bytecode in either range. As the interpreter executes, within exec.sub.-- bcodesO, before executing a bytecode, it checks the bytecode against the D.sub.-- stop.sub.-- at.sub.-- bcode; if it matches, the interpreter calls debuggerO.

stop.sub.-- in, delete

These commands handle stops set for particular methods and/or classes and/or selectors. More than one stop can be set at a time; the constant D.sub.-- MAX.sub.-- STOPS determines how many stops can be used. Stops are stored in the global array D.sub.-- stop.sub.-- in.sub.-- data. They are identified by number, from 1 to D.sub.-- MAX.sub.-- STOPS. The parameters of the stop.sub.-- in command define a new stop; new stops are added using d.sub.-- add.sub.-- stopO called from d.sub.-- stop.sub.-- inO. As with stop.sub.-- at, if invoked with no parameters, stop.sub.-- in simply prints the currently set stops using d.sub.-- print.sub.-- stopO; Stops are deleted using the delete command. Note that deleted stops cannot be re-used.

During interpreter execution, in the send.sub.-- msg bytecode handler, after each send.sub.-- msg bytecode is executed, a check is made to see if the just-executed bytecode matches any of the stops. If so, the stop is printed, and debuggerO is called.

4.8.4.6. Commands for executing a limited portion of the application

bcode.sub.-- step

This command simply causes the interpreter to continue execution until the next bytecode is about to be executed. It sets the D.sub.-- BCODE.sub.-- STEP switch. During interpreter execution, before a bytecode is executed in exec.sub.-- bcodesO, this switch is tested; if set, debuggerO is called. The switch is reset every time debuggerO is called.

goto, skip.sub.-- msg

These commands cause the interpreter to continue execution until a particular message is sent. The message is identified by the process in which it executes, and by its sequence number within that process. With the goto command, the user specifies an absolute message sequence number; with the skip.sub.-- msg command, he specifies a relative message sequence number. Note that goto also allows the user to specify a particular process; skip.sub.-- msg uses the current process. The process and message sequence number are stored in D.sub.-- goto.sub.-- skip. These are cleared every time debuggerO is called.

msg.sub.-- step

This command is to messages as bcode.sub.-- step is to bytecodes. It causes the interpreter to continue execution until the next message is sent. It sets the D.sub.-- MSG.sub.-- STEP switch. During interpreter execution, after a send.sub.-- msg bytecode is executed, this switch is tested; if set, debuggerO is called. The switch is reset every time debuggerO is called.

next.sub.-- msg

This command is rather more complicated than msg.sub.-- step. This command is to msg.sub.-- step as the dbx command next is to the dbx command step. That is, it causes the interpreter to continue executing until the next message at the current level is sent. In order to do this, it must keep track of what the current level was when the command was invoked; this is stored in D.sub.-- base.sub.-- cntx. As the interpreter executes a send.sub.-- msg bytecode, it checks to see if the message just sent was sent from D.sub.-- base.sub.-- cntx. If so, then debuggerO is called. Also, on every return bytecode, the interpreter checks to