Object-oriented, logic, and database programming tool with garbage collection4989132Abstract A programming tool is provided which integrates an object-oriented programming language system, a logic programming language system, and a database in such a manner that logic terms can be treated as objects in the object-oriented programming language system, objects can be treated as logic terms in the logic programming language system, and logic terms and objects are stored in the database in a common data structure format. Automatic management of the database is provided which is transparent to the user. Claims We claim: Description TECHNICAL FIELD OF THE INVENTION
TABLE 3.1
__________________________________________________________________________
Parse Tree Example
__________________________________________________________________________
Method Code
! SequenceableCollection methodsFor: 'enumerating'
do: aBlock
.vertline. index length .vertline.
index <- 0.
length <- self size.
[(index <- index + 1) <= length]
whileTrue: [aBlock value: (self at: index)]
__________________________________________________________________________
Parse Tree
statement statement
assign assign
"index" "length"
number unary.sub.-- expr
"0" "size"
identifier
"self"
statement
keyword.sub.---- expr
keyword.sub.-- arg
"whileTrue:"
block block
statement statement
binary.sub.-- expr
keyword.sub.-- expr
keyword.sub.-- arg
"<=" "value:"
assign identifier
identifier
keyword.sub.-- expr
keyword.sub.-- arg
"index" "length"
"aBlock"
"at:"
binary.sub.-- expr identifier
identifier
"+" "self" "index"
identifier
number
"index" "1"
__________________________________________________________________________
A successful parse generates a parse tree for the statements of each method in the class. These parse trees are anchored in a method structure for each method, which are all, in turn, linked to a single class structure. When the parsing of a class is complete, the class structure is handed to analysis and code generation routines. 3.2.2. Code Generation The major components of this stage of the compiler are: 1. Symbol table (symbol.c) 2. Code generation (compile.c) 3. Code management (code.c) 4. Optimization (optimize.c) Generally, the processing steps involved in this stage, (implemented by a function called compileClass()), proceed as follows: 1. The symbol table is initialized with the instance and class variables available via the superclass chain for the class. These symbols are retrieved from a symbol file in the local directory. It is considered a fatal error if the superclass cannot be found in the symbol file, (i.e., the superclass must be compiled first). 2. The instance and class variables for the class are added to the symbol table. Name clashes involving superclass variables are also considered fatal. 3. Each method is compiled. During method compilation, bytecodes are collected into segments corresponding to groups of statements in the method: one for the method itself, and one for each block within the method. When method compilation is complete, the method code segment is emitted first, followed by the segments for each block. 4. After the methods have been successfully compiled, a record of the class' instance and class variables are written to the symbol file in the local directory. This makes the class available for use as a superclass in subsequent compilations. The method compilation step is the heart of the compilation task. Before describing this step in detail, a description of the compiler's view of symbolic references and the symbol table is given. 3.2.3. Symbols Throughout the Alltalk compiler 20, references to named symbols in the program being compiled, as well as references to unnamed runtime storage are represented in a uniform manner. This uniform representation allows the code generation stage to freely create and pass references between the recursive routines which implement this stage without regard for their type until a leaf routine which needs detailed type information is executed. The conciseness of the code generation routines is greatly enhanced with this representation scheme. There are nine reference types, as follows: Named References Instance Variable Class Variable Method Parameter Formal Method Temporary Block Parameter Global Symbol ("true", "false", "nil", class name, etc.) Unnamed References Constant ("10", "3.14", `a string`, #symbol, etc.) Compiler Temporary (used in evaluating intermediate expressions) Block Stub/Closure (reference to storage holding the runtime id of the closure) The symbol table supports a subset of the named references, separating them into the three categories of: (1) class, (2) instance, and (3) temporary symbols. Temporary symbols encompass the method parameter, formal method temporary, and block parameter references. Global symbol references are never actually placed in the symbol table, but are materialized whenever the search for a name fails. These symbols are resolved by the second phase 32 of the Alltalk compiler 20, since the cross reference values for these names are actually present in the runtime system dictionary contained in the database 40. The symbol table interface routines contain the usual routines for the addition of symbols, (addSymbol()), and name-based search for symbols, (findSymbol()). An initialization routine, (initSymbols()), purges the table and then uses the globally specified superclass name to populate the table with "ref" structures for the instance and class variable symbols available via the superclass chain, as recorded in the symbol file in the local directory. A routine for writing the instance and class variable symbols, (writeSymbols()), to the local symbol file for the globally specified class, (i.e., the one being compiled), is also provided. Finally, a pair of general routines, (markSymbols() and releaseSymbols()), are available for get/set of placeholders in the symbol table. These are primarily used to record the starting position of method and/or block temporary symbols, so that they can be removed at the end of the compilation of the method and/or block statements. 3.2.4. Method Compilation In the runtime environment 22 (see FIG. 3), a method is executed with an associated "context" containing local storage organized as an array of temporary slots, analogous to a "stack frame" in a conventional language. This local storage is divided into the following five sections from the compiler's point of view: 1. The object id of the receiver, known as "self". 2. Method parameters. 3. Formal method temporaries, (named temporaries). 4. Compiler scratch area for intermediate expression evaluation. 5. Block stub/closure id storage for blocks in the method. A general mechanism for tracking the use of the temporary slots is implemented in the compiler using a set of macro routines. This set includes routines for: allocating a number of temporaries, (allocTemp()), which returns the starting slot for the requested count; freeing a number of temporaries, (freeTemp()); get/set of temporary usage information, (getTempUse(); and setTempUse()); clearing usage information, (clearTempUse()); and requesting the high water mark for temporary usage, (maxTempUse()). Temporary usage is tracked with these routines for the first four kinds of temporaries listed above. Storage for block ids is tallied separately during method code generation since it is not known what the required number of compiler scratch temporaries will be until the method compilation has finished. Before the method statements are examined, several initialization steps are performed: 1. The symbol table is populated with the entries for "self", "super", the parameters, and the formal temporaries. The slot index for each entry is determined by allocating a temporary as each symbol is added to the table. 2. A code segment is allocated for the method statements and made to be the "active" segment. During compilation, generated code is placed in the "active" code segment, which is switched when compilation of a new list of statements, (e.g., a block), is started or completed. 3. Label generation is reset, (used for branch targets and block entry points). 4. The block count is reset. Also prior to commencing code generation, the parse tree of the method is examined to see if it can be tagged for "flattening" at runtime. "Method flattening" is a technique for determining whether a runtime message send can be avoided because the method is "trivial". A "trivial" method is one which contains a single statement returning either: 1. An instance variable, (can replace send with an assign), or 2. The result of a primitive for which first argument is self and the remaining arguments to the primitive invocation line up exactly with arguments to the method, (can replace send with primitive invocation). At this point, compilation of the method statements is initiated by calling compileStatementList() with a pointer to the first statement parse node of the method. This routine invokes compileExpr() to compile the expression associated with each statement in the list. CompileStatementList() is used to compile lists of statements for blocks as well as methods, generating appropriate return bytecodes when explicit return statements are encountered and after the last statement in the method or block. CompileStatementList() distinguishes between method and block statement lists by the value of active code segment id, which is -1 for a method or >=0 for a block. Provision is also made for the case of "inline" block code generation, which is used in optimization of certain messages involving blocks, (such as messages to booleans), described later. 3.2.5. Expressions Expression compilation is the center of most activity during the compilation of a method. CompileExpr() defines the compilation actions for all parse nodes other than statements in a concise manner. This routine is invoked with the node to be compiled and a destination specification for the result of compiling the expression indicated by the node in the form of a "ref" structure. The destination specification allows the calling routine to control placement of the expression's value, which is particularly useful for aligning values for message sends, minimizing unnecessary data movement at runtime. Simple expressions, such as identifiers and constants, are trivial compilations requiring only assignment of the value associated with the identifier or constant description to the specified destination. An explicit Alltalk assignment expression, (e.g., "a.rarw.b+c"), only requires compilation of the expression on the right of the ".rarw.", with the reference on the left as the destination, in addition to assigning this result into the specified destination for the assignment expression itself, (e.g., "d.rarw.(a.rarw.b+c)"), if indicated. The remaining expression types, (messages, cascades, primitive invocations and blocks), require somewhat more involved compilation steps, hence, these cases have been split into separate routines, (genSend(), genCascade(), genExecPrim(), and genBlock()). We now describe the compilation steps performed in each of these cases. 3.2.6. Messages and Cascades The runtime implementation of the send message bytecode requires that the receiver and arguments be present in a contiguous set of the sending context's temporaries. The location for the return value of the message send is also required to be a temporary in the sending context, though it need not be adjacent to the receiver and arguments. GenSend() ) complies with the first condition by allocating a contiguous set of temporaries, (via allocTemp() ), and compiling the receiver and argument expressions with each of these temporaries, (in order), as the specified destination. Hence, results of receiver and argument expressions are cleanly aligned with their use in containing message sends, eliminating unnecessary data re-positioning assignments. A simple optimization is also done at this point. If the receiver and argument temporaries already happen to line up,(detected by lineup() ), new temporaries are not allocated and the receiver and argument values need not be moved. The second condition, (destination must be a temporary), is honored by examining the specified destination reference and allocating a temporary to hold the result of the message send if the destination is not already a temporary. This situation is remembered and code is generated for moving the result from the allocated temporary to the actual destination after the message send. This implementation style allows for the addition of variations of the send message bytecode for non-temporary destionations, if the need arises. The previous comments also apply to cascaded message sends, (genCascade() ), except that the receiver expression is only evaluated once and the result placed in a temporary, to which the remaining messages in the cascade are sent, (genCascadeSend() ). 3.2.7. Primitive Invocations As with message sends, Alltalk's primitive invocations require that arguments to the primitive be in a contiguous set of the invoking context's temporaries, and the destination for the result be a temporary in the invoking context. GenExecPrim() ) handles the non-temporary destination and argument alignment cases, (using lineup() ), in the same manner as is done for message sends. Each primitive argument is compiled, allowing arbitrary expressions to be used as arguments. 3.2.8. Blocks Blocks are the most involved expression compilations in that they cause changes in the global state of the compiler. In Alltalk, a block is a list of statements which are to be executed with their own context when a "value" message is sent to it. The lexical aspects of a block allow it to refer to names available to the method in which the block is defined, as well as the names in any containing block. These names include the method's parameters and formal temporaries and any containing block's parameters. These semantics imply that a block is a static "object" of sorts which can potentially have muliple runtime activations, with each activation dynamically establishing variable name .rarw..fwdarw. storage bindings. Hence, from the compiler's point of view, a block is a separate list of statements to be compiled and "set-up" as an object, which may also include cross-context runtime references to be represented. GenBlock() ) alters the global state of the compiler to create the proper compilation conditions to meet the needs described above. The block being compiled is given a unique id within the method, and a new code segment is allocated and marked with this id and connected to the list of code segments generated for the method so far. The currently active code segment is saved, along with its temporary usage, (since the block will have its own context), and the new segment is made the active segment, (generated code is always placed in the active segment). The previous code segment and its temporary usage are restored when compilation of the block is completed. The symbol table is marked so that the block's symbols, (block parameter names), can be released at the end of the block's compilation, and the block's symbols are added to symbol table for proper scoping. Finally, a label is generated to mark the start of the block's code in the method. At this point, the state of the compiler has been properly altered, and compilation of the statements in the block is initiated via compileStatementList() ). Once the block has been compiled, all the information needed to describe the block's activation characteristics at runtime, (temporary usage and entry point), has been established. This information is supplied to the runtime interpreter 44 via the set up block bytecode. This bytecode causes the interpreter to copy this information and associate it with a unique runtime id, known as a block stub id. A block stub id can be manipulated much the same way as any other object id. In the case of returning a block stub, or assigning a block stub to an instance variable, Alltalk establishes an object for the block stub. The information associated with the block stub id is used to establish a context for executing the statements of the associated block whenever the "value" message is sent to this id, (i.e., the evaluate block bytecode is executed for the id). Note that this requires that a block must be "set up" before it can be "evaluated" at runtime. Alltalk choses placement of the set up bytecode for a specific block so that the bytecode is not executed an uncontrolled number of times. This is because the set up block implementation in the interpreter does not check for multiple "set ups" performed for the same block. The solution to the placement problem is to group the set up block bytecodes for any "top level" block, (i.e., any block encountered while generating code for the method statements), and its contained blocks, and place them in the method code segment ahead of the first use of the "top level" block. This technique avoids executing set ups for any block(s) which are not in the specific control flow path at runtime. GenBlock() ) implements this strategy by setting a pointer to a position in the method segment code at which the set up block bytecodes are to be "spliced" when a "top level" block is entered. 3.2.9. Message Optimizations Except for aiding the runtime environment for "method flattening", the rest of the compiler optimizations involve recognition of specific message selectors in the source code, (optimize.c). The optimization strategy for these selectors is to generate inline code to implement the specific semantics of the selector, (assuming a specific receiver class), in order to avoid sending the actual message at runtime. These optimizations are detected by the genOptiSend() ) routine which is invoked from compileExpr( ) ) when a message expression is compiled. If genOptiSend () ) can handle the message, the normal compilation via genSend() ) is avoided by compileExpr() ). The message selectors/receiver class combinations which are currently optimized are listed in Table 3.2.
TABLE 3.2
______________________________________
Optimized Messages
Class Selector
______________________________________
Object perform:
perform:with:
perform:with:with:
perform:with:with:with:
Integer +
-
=
Block value
value:
value:value:
value:value:value:
value:value:value:value:
value:value:value:value:value:
whileTrue:
whileFalse:
whileTrue
whileFalse
True/False ifTrue:
ifFalse:
ifTrue:ifFalse:
ifFalse:ifTrue:
and:
or:
not
&
.vertline.
Object/ isNil
UndefinedObject notNil
______________________________________
The complexity of these optimizations vary from simply generating special bytecodes, (e.g., Integer messages), to inline block code generation with conditional branch bytecodes for implementing looping constructs, (e.g., Block "while" messages). Due to the straightforward expression of these optimizations, they are not treated in detail here. However, one of the more complex optimizations, (optWhile() ), will be described to highlight and convey an understanding of some of the issues and supporting procedure structure involved in these optimizations. OptWhile() ) handles optimization of the various "while" messages which can be sent to blocks, (whileTrue:, whileFalse:, whileTrue, and whileFalse). This routine demonstrates the need to deal with: 1. Evaluation of literal or non-literal block objects in receiver and/or argument positions, 2. Proper placement of set up block bytecodes to avoid repeated set up of the same block(s), (described previously in the section on block expression compilation), and 3. Generation of additional code to implement the semantics of the message, (looping, in this case). Since the semantics of the "while" messages clearly involves sequenced evaluation of receiver and argument blocks, it is possible, if either block is literal, to treat the statements of that block as if they were in the statement list of the method or block containing the "while" message. This causes code to be generated directly into the currently active code segment, ("inline"), resulting in evaluation of that block in the current context at runtime, instead of setting up a separate context for evaluation of the block code. If either block is not a literal, (e.g., passed in as a parameter), that block must be evaluated in a separate context, (performed by the "evalb" bytecode). This situation of block code generation strategy arises in the optimization of many other of the messages listed in Table 3.2. OptBlock() ) determines the code generation strategy based on the type of the parse node representing the block in the parse tree. If the node represents a literal block in the source code, the statement list for the block is compiled into the active code segment using compileStatementList() ). Otherwise, a "value" unary message expression, with the node representing the block as the receiver, is constructed and compiled under the explicit assumption that the receiver will be a block, (genEvalBlock() ). Note that this assumption is not made, (and a different bytecode is generated), when the "value" message is encountered in the original source code, since the actural receiver may not be a block at runtime, in this case. Literal blocks which are part of the "while" message may be "top level" blocks, (i.e., outermost block of a nesting within a method). Beacuse of this, optWhile() ) must set the "splice point" in the method segment code for set up block bytecodes for any blocks contained in the "while" blocks, such that these bytecodes are placed outside the looping portion of the "while" code. This avoids the multiple "set up" problem for a block discussed in the previous section on block compilation. With the background of the preceding discussion, the implementation of the optimization of "while" messages is summarized in the following steps: 1. If the "while" message is encountered in the method statement list, set a marker to the current position in the code as the "splice point" for set up block bytecodes for blocks which are encountered during compilation of the "while" message. 2. Generate a label to mark the start of the condition block, (i.e., the receiver of the "while" message). 3. Compile the condition block, (using optBlock() ), with an allocated temporary as the destination for its evaluation result. 4. Generate a conditional branch to the end of the "while" message code, (step 7), based on the result of evaluating the condition block and the specific message being compiled. 5. Compile the body block, (using optBlock() ), with no destination for its evaluation result. 6. Place an unconditional branch back to the label generated in step 2 to close the loop. 7. Generate code to assign "nil" to the destination specified for the value of the "while" message expression, (the destination may be "none"). This is the defined value for a "while" message expression. 8. Free the temporary allocated for the result of the condition block in step 3. An example of the code generated for "while" message expressions with different combinations of literal and non-literal condition and body blocks is shown in Table 3.3.
TABLE 3.3
______________________________________
"While" Message Code Generation
Source Statement Generated Code
______________________________________
[x < y]whileTrue: L1
[x <- x + 1].
send t1[x],0,t5,2,#<
jne L2,t5,'true
mov t6,t1[x]
mov t7,1
send t6,0,t1[x],2,#+
jmp L1
L2
b1 whileTrue: [x <- x + 1].
L5
evalb t3[b1],t5,1
jne L6,t5,`true
mov t6,t1[x]
mov t7,1
send t6,0,t1[x],2,#+
jmp L5
L6
[x < y] whileTrue: b2.
L9
send t1[x],0,t5,2,#<
jne L10,t5,`true
evalb t4[b2],t6,1
jmp L9
L10
b1 whileTrue: b2. L13
evalb t3[b1],t5,1
jne L14,t5,`true
evalb t4[b2],t6,1
jmp L13
L14
______________________________________
The intermediate language is discussed further in the next section. 3.3 . Phase 2 (kasm) As noted in the synopsis, the second phase 32 of the Alltalk compiler 20 concerns itself with resolving symbols, and creating and loading the classes and methods into the database. 3.3.1. Intermediate Language The intermediate language expected as input for this phase consists of tokens representing bytecodes, along with directives for establishing the class, delimiting methods, and tracking the Alltalk source file name and line numbers. A summary of these tokens and directives are listed in Tables 3.4 and 3.5, respectively.
TABLE 3.4
______________________________________
Intermediate Language Bytecode Tokens
______________________________________
Message Send/Return
send send message
sendp send parameterized message, ("perform")
mret return from method, (" " in source code)
Integer Arithmetic Optimizations
seq send "=" message
sadd send "+" message
sadd1 send "+ 1" message
ssub send "-" message
ssub1 send "- 1" message
Block Set-Up/Evaluation/Return
setb set up block stub/closure
evalb evaluate block (receiver must be a block)
evalbo evaluate block (receiver might not be a block)
bret return from block
Data Movement
mov src/dest specified by "effective addresses"
Primitive Invocation
prim execute specified primitive
Control Flow
jeq jump on target equal to constant
jne jump on target not equal to constant
jmp unconditional jump
______________________________________
In addition to the basic elements of the language, symbolic labels of the form "L<number>" are also available for use in the code, (for jeq, jne, jmp, and setb bytecodes), with target labels being required to start at the beginning of a line which contains no other tokens. Comments are allowed on any line, and are defined to be anything contained between a semicolon, (";"), and the end of the line. An example of this intermediate language for a method of a class call Foo, is shown in Table 3.6, which was constructed in order to demonstrate the variety of code and reference type representations generated by phase 1.
TABLE 3.6
______________________________________
Intermediate Code Examples
______________________________________
Method
Class Foo :Object
.vertline. instvar Classvar .vertline.
.vertline.
do: aBlock
.vertline. a b .vertline.
instvar <- Classvar.
instvar associationsDo:
[:assoc .vertline. assoc value timesRepeat:
[aBlock value: assoc key]].
a <- `hi andy`.
b <- #(2 ( foo `hi` $a ) `fred`).
]
Intermediate Language
.file Foo.st
.class Foo Object
.supervar 0
.ivar instvar
.cvar Classvar
.imethod do:
.mparam aBlock
.mtemp a b
L0
mov i0[instvr],`Foo@i1[Classvar]
mov t5,i0[instvar]
setb b0,L1,5
setb b1,L2,5
mov t6,b0
send t5,0,t4,2,#associationsDo:
mov t2[a],`hi andy`
mov t3[b],( 2 ( #foo `hi` $a ) `fred`)
mret t0[self]
L1
evalbo t1[assoc],t3,1
send t1[assoc],0,t3,1,#value
mov t4,b1
send t3,0,t2,2,#timesRepeat:
bret t2
L2
mov t2,mt1[aBlock]
mov t4,b0@t1[assoc]
send t4,0,t3,1,#key
evalbo t2,t1,2
send t2,0,t1,2,#value:
bret t1
.mend 7 2
______________________________________
3.3.2. Effective Addresses References to various types of runtime variables and constants are represented in specific symbolic forms in the intermediate language which we call "effective addresses". These forms appear in the argument fields of many of the bytecode tokens, although not all forms are valid in specific argument positions of specific bytecodes. These effective address forms are summarized in Table 3.7, and the reader is again referred to the code in Table 3.6 for examples.
TABLE 3.7
__________________________________________________________________________
Effective Address Forms
Runtime
Form Example
Type Description
__________________________________________________________________________
<num> 10.2 1 Numeric constant.
$<char> $a 1 Character constant.
`<chars>` `hello`
1 String constant.
#<chars> #size
1 Symbol constant, (object id of symbol).
`<name> `Bag 1 5 Global symbol cross reference constant, (object
id
associated with symbol name).
#(<consts>)
#(1 `hi`)
1 Array constant, (can be nested).
t<num> t5 5 Current context temporary.
mt<num> mt2 2 Owning method context temporary, (only found in
10
block code).
b<num> b1 2 Block stub id as slot in owning method context
temporary.
i<num> i0 3 Instance variable slot.
`<name>@i<num>
`Bag@i2
4 Class variable. Combination of cross reference
15
constant, (the class name), and slot.
b<num>@t<num>
b1@t3
6 Block parameter reference. Generated only when a
block refers to a containing block's
__________________________________________________________________________
parameters.
In the particular case of the "mov" bytecode, phase 2 translates both the source and destination effective address forms into one of six specific runtime reference types. 3.3.3. Operational Description Phase 2 maintains a global state around the current class and method being "assembled", resulting in method-at-a-time assembly and placement into the database. Ths class object is not given to the object manager until all methods described in the input file have been successfully translated and passed to the object manager. This insures that the old version of the class, (hence, its methods), is not replaced unless assembly of the new version is successful. In contrast to phase 1, this phase is very "flat", that is, it contains no recursive functions to walk parse trees, since each input statement is essentially a self-contained description. All the implementing functions, (assemble.c), are despatched directly from the parser on a per statement, (or group of statements), basis, resulting in a very simple control flow. Assembly of a method essentially consists of collecting the bytecodes described by the bytecode statements into a scratch area, (MethodBytes), and recording labels, references to labels, and block references in these statements for resolution when the end of the method is reached. Each bytecode statement has a corresponding translation routine, (assemble.c), which builds the runtime representation of the bytecode in the scratch area. When the end of the method is reached, (endMethod() ), all label and block references are resolved and the object manager is called upon to allocate space for the compiled method object. In this area, the instance variable slots for the method object are initialized, (noTemps, noParms, classOop, selectorSymbol, . . . , etc.), and the bytecodes are copied in from the scratch area. A dictionary entry relating the method selector symbol id of the method to the id of the compiled method object is also created and added to entries already established for other methods, (in the Methods global array). These dictionary entries are stored in the class object when the end of the class is reached, (i.e., when a new `.class` directive or end-of-file is encountered). When the end of the class is reached, (endClass() ), space is obtained from the object manager under the same object id as the previous version of the class, to cause replacement of that class. The class is then built in this area by filling in control information, including the object id of the first instance of the class obtained from previous version, the object id of the class name symbol, the id of the superclass and the size of the method dictionary for the class. The method dictionary entries are then closed-hashed, (by method selector symbol id), into a dictionary area in the class object. The class object is then flushed to the database, signaling completion of the assembly of the class, ending phase 2. 4. Interpreter The interpreter 44 (see FIG. 3) is that portion of the Alltalk runtime environment 22 which the user invokes to run Alltalk applications. The interpreter 44 decodes the object code generated by the compiler 20 (FIG. 2), and executes it, calling upon many of the other services of the runtime environment 22. The interpreter 44 also includes a debugger, described below, which allows the programmer to inspect the running program in a variety of ways. 4.1. Synopsis The previously described Alltalk compiler 20 for the Alltalk dialect of the Smalltalk language translates Alltalk source code into an intermediate representation, called bytecodes, and stores this representation in the database 40. Each bytecode represents an instruction for the interpreter 44, and consists of an operation code (a = bit integer) and a variable number of parameters. Applications are executed using the Alltalk interpreter 44. The Alltalk interpreter 44 uses the object manager 48 as the interface to the database 40. It also calls on the transaction manager 46 and the garbage collector 52. In addition, it invokes primitives which interface to the UNIX operating system to do things like operating on primitive data types (integer addition, floating point multiplication, string concatenation, etc.), performing file I/O, managing the display, and controlling keyboard and mouse input. The object manager 48, transaction manager 46, garbage collector 52, and primitives are described in later sections in more detail. The main functions in the interpreter 44 that are discussed in this section can be grouped into the following main categories: (1) bytecode loop; (2) bytecode handlers; (3) context management; (4) process management; (5) initialization and shutdown; and (6) the debugger. 4.2. Bytecode Loop The state of the Alltalk interpreter 44 is captured, essentially, in a global array called Processes. Each element of this array represents one Smalltalk process. At interpreter initialization, one process is created. The user's application can create new processes, switch processes, and destroy processes as needed. Associated with each process is a stack of contexts, and a pointer to one which is the currently-executing context of that process. A context is created when a message is sent or a block is evaluated, and is destroyed when the corresponding message/block returns. Associated with each context is a set of bytecodes for the corresponding method/block, and a pointer to one which is the currently-executing bytecode of that econtext. The bytecodes are the object code to which the user's application was compiled. Each context also has an array of temporaries which are used to hold intermediate results of the execution of the associated method/block. At any given time, only one of the Processes is running; it is the current process. The current context of that process is, then, the current context. The current bytecode of that context is the current bytecode. The basic operation of the Alltalk interpreter 44 is a bytecode decode/dispatch loop. Code exists in the interpreter for handling each type of bytecode generated by the compiler. The interpreter decodes a bytecode to determine its type, then invokes the appropriate code for that bytecode type. We call the piece of code for a particular bytecode type a bytecode handler. Each bytecode handler increments the bytecode pointer so that after the handler completes, the interpreter main loop can decode and dispatch the next bytecode. Bytecode handlers can manipulate the bytecode pointer and other interpreter data structures in ways to affect program flow. The routine exec.sub.-- bcodesO contains the bytecode loop. It decodes the bytecodes and invokes the appropriate bytecode handler. Before doing so, however, it checks to see if it should switch processes, i.e., it checks whether a different Smalltalk process should become the currently-active process. See the section below on Process Management for details on how process switches are handled and new processes are created. 4.3.Bytecode Handlers There is one bytecode handler in the Alltalk interpreter 44 for each type of bytecode generated by the Alltalk compiler 20. Each handler is one (or more) case(s) in a C-language switch statement. The switch statement is part of exec.sub.-- bcodesO in the file exec.sub.-- bcodes.c. Each case of the switch is in a separate file to make source code maintenance easier. At compile time, these files are included in exec.sub.-- bcodes.c via #include's. This strategy was chosen over making the bytecode handlers each separate procedures because it cuts down on call overhead in the bytecode loop. It also allows the use of machine registers for certain control variables, since the handlers are all within a single C language function. Note that thousands of bytecodes are executed each second; overhead for that many calls would be very large. A complete description of the bytecodes and their parameters is included in Table 4.1. ##SPC1## The bytecodes can be grouped into the following categories: execute a primitive; send a message; define a block; evaluate a block; return from a block or method; branch; and assign from one variable to another. We describe each of these next. 4.3.1. Execute a primitive bytecodes discussed: 0x000 --<0x0FF (primitives 0-255) Primitives are called by Alltalk code to do the low-level tasks. These tasks generally depend on the underlying hardware and operating system, and include things like file I/O, integer and floating point arithmetic, and using the display. The bytecodes numbered from 0 to 255 (decimal), i.e., 00 to FF hex, are reserved for primitives. Primitives are similar to methods in that they have a receiver, they have optional arguments, and they return an object. They are unlike methods in that they are written in C rather than Alltalk, and no context is set up for them. 4.3.2. Send a message bytecodes discussed: 0x100 (send.sub.-- msg.sub.-- bcode) 0x10B (send.sub.-- param.sub.-- msg.sub.-- bcode) 0x10C (send.sub.-- message.sub.-- add) 0x10D (send.sub.-- message.sub.-- sub) 0x10E (send.sub.-- message.sub.-- eq) 0x10F (send.sub.-- message.sub.-- add1) 0x110 (send.sub.-- message.sub.-- sub1) The compiler generates several different types of bytecodes for messages. The normal message send is handled by send.sub.-- msg.sub.-- bcode. Messages of the type `perform:` and `perform:with:` are handled by send.sub.-- param.sub.-- msg.sub.-- bcode. These two handlers operate in very similar manner. The main difference is that for send.sub.-- msg.sub.-- bcode, the message selector is known at compile time, and is included in the bytecode itself; for send.sub.-- param.sub.-- msg.sub.-- bcode, the oop of the message selector is found, at run time, in a temporary of the current context. The normal processing of a send.sub.-- msg (and send.sub.-- param.sub.-- msg) is as follows. Note that we do not discuss various optimizations that we have put into send.sub.-- msg bytecodes. These are discussed in a separate section below. (1) Get the oop of the receiver of the message from the temporaries of the sending context. The send.sub.-- msg parameter arg.sub.-- start.sub.-- slot is the index into the temporaries at which this oop is found. (2) If the receiver is not a context or positive integer, call the object manager to fetch the receiver object. Note that contexts and positive integers are not managed by the object manager: contexts are not objects in Alltalk, and positive integers are encoded as negative oops. (3) Determine the receiver's class. If the receiver is not a context or positive integer, its class is found in its object header. (4) Call the object manager to fetch the method associated with the message we are processing. We pass to the object manager the hashed.sub.-- selector and super.sub.-- flag parameters from the bytecode, plus the class of the receiver. It returns the method object which contains the bytecodes for the message we are processing. (5) In the sending context, store the value of the bytecode parameter put.sub.-- answ.sub.-- slot. This is needed when we return to this context from the method we are about to execute. It represents the index of the sending context's temporaries into which the returned result is to be put. (6) Increment the bytecode pointer in the sending context. When we return to this context, we will continue executing bytecodes in this context at that point. (7) Create a new context for the message we are processing. Copy num.sub.-- args arguments from the sending context, starting at arg.sub.-- start.sub.-- slot in the temporaries of the sending context. They are copied into the temporaries of the new context starting at slot 0. Note that this assures that the receiver of a message can always be found in context temporary 0. The new context will have a bytecode pointer which points to its first bytecode. We make this context the current context, and return to the bytecode loop. 4.3.3. Define a block bytecodes discussed: 0x106 (setup.sub.-- blk.sub.-- bcode) In Alltalk, blocks are not objects managed by the object manager, but rather are maintained by the interpreter as C data structures. When they are assigned to instance variables, or returned as the result of a message, they are made into objects, is the home context (this is discussed in more detail below). They can, however, be assigned to method temporaries and passed as parameters in messages without being made into objects first. When the interpreter encounters a setup.sub.-- blk bytecode, it creates a data structure called a block stub, and gives it an object id (which is a 32 bit integer) which we call an oop). The oop is in a special range, i.e., greater than or equal to INIT.sub.-- CNTX.sub.-- ID, so it can be recognized later as a block by the interpreter. The block stub contains enough information to evaluate the block when an eval.sub.-- blk bytecode is later encountered. Its oop is stored back in the temporaries of the home method in which it is defined. It can then be handled like any other oop stored in temporaries (except for the cases mentioned above). 4.3.4. Evaluate a block bytecodes discussed: 0x109 (eval.sub.-- blk.sub.-- bcode) 0x10A(eval.sub.-- blk.sub.-- bcode2) Evaluating a block means executing the code that the block contains. Note that a block must be `set up` before it can be evaluated. However, a block which is set up may or may not be evaluated. For example, the ifTrue: block and ifFalse: block of an ifTrue: ifFalse: message won't both be evaluated. A block may be evaluated immediately after it gets set up, or later. It may be evaluated by the context in which it was set up, or the context which sets it up may pass it as a parameter in a message send, so that it gets evaluated by another context. The eval.sub.-- blk bytecode handler causes a block to be evaluated by converting the block stub for that block into an active context on the context stack of the active process. It makes that new context be the current context, and makes the global bytecode pointer point to the block's first bytecode. 4.3.5. Return from a block or method bytecodes discussed: 0x101 (long.sub.-- return) 0x104(short.sub.-- return) When the Alltalk interpreter 44 encounters a return bytecode, it means that the currently executing context is finished, and it switches control to a previous context. In addition, it passes back an object (actually the object's oop) to the context to which it is returning. There are two different return bytecodes. What we call the long return (also known as return from method) causes the interpreter to return to the context just previous to the home context of the current context. The home context of a method context is itself; the home context of a block context is the context of the method in which the block is defined/setup. Therefore, a long return from a block is the same as doing a return from the block's home method. Long returns are indicated in the Alltalk code by the caret symbol," ". A short return causes the interpreter to return to the context just previous to the current context, regardless of what it is. A short return and a long return from a method context are the same. (The Alltalk compiler 20 always generates a long return for returns from a method context.) A short return from a block means to simply return to the previous context in the stack. This previous context is the context which caused the block to be evaluated; it may or may not be the block's home context. 4.3.6. Branch bytecodes discussed: 0x105 (branch) 0x107 (branch.sub.-- on.sub.-- equal) 0x108 (branch.sub.-- on.sub.-- not.sub.-- equal) Branch bytecodes are used to implement control structures. In addition, branching bytecodes are used by the compiler as part of several optimizations. The unconditional branch bytecode (0x105) simply increments/decrements the bytecode pointer by a certain amount. The conditional branch bytecodes compare an oop found in a temporary of the current context with an oop contained in the bytecode itself. Whether or not the bytecode pointer is incremented depends on the results of comparing these two oops. 4.3.7. Assign from one variable to another bytecodes discussed: 0x1nm (assign type n variable from type m variable) The Alltalk compiler 20 and the Alltalk interpreter 44 understand six different types of variables. These six types are as follows: Type 1 This type of variable is simply an oop that the Alltalk compiler 20 generates, and includes as part of the assignment bytecode. Obviously, it cannot be the destination of an assignment statement, only the source. Examples of Type 1 variables are string, character, integer, and floating point constants, and class names. Type 2 This type of variable is a temporary in the home context of the current context. The Alltalk compiler 20 specifies it as an index into the array of temporaries. Type 3 This type of variable is an instance variable of the object which is `self` in the current context. The Alltalk compiler 20 specifies it as an index into the instance variables of the receiver. Type 4 This type of variable is an indirect reference to a particular instance variable of a particular object. The Alltalk compiler 20 specifies the instance variable by specifying an index into the temporaries of the current context (which specifies the object), plus an index into the instance variables of that object (which specifies the particular instance variable). Type 5 This type of variable is a temporary in the current context. The Alltalk compiler 20 specifies it as an index into the array of temporaries. Note the difference between this and the type 2 variable. For a method context, type 2 and type 5 are the same because a method's home context is itself; for a block, type 2 refers to its home context's temporaries, and type 5 refers to its own temporaries. Type 6 This type of variable is needed for nested blocks in which an inner block refers to an argument of an outer block. The Alltalk compiler 20 specifies the argument by giving two parameters in the bytecode. First is an index into the temporaries of the home context. In that particular temporary is found the id of the block stub of the outer block. The second parameter is an index into the temporaries of the outer block. In that particular temporary is found the oop of interest. Since Smalltalk does not allow assignment to the arguments of a block, a type 6 variable cannot be the destination of an assignment statement, only the source. Each assignment bytecode has a source variable type and a destination variable type. The destination is specified first, then the source. Because type 1 and type 6 variables cannot be destinations, there are 24 assignment bytecodes (4 destination types * 6 source types). The assignment bytecode handlers simply put the oop specified by the source into the location specified by the destination. 4.4. Context Management As mentioned above, the state of the Alltalk interpreter 44 is contained in the global array, Processes. Each element in that array represents a process. In addition to the interpreter's C-language data structure for a process, there is also an instance of Smalltalk Class Process for each Smalltalk process in an application. In the following, we concentrate on the interpreter's data structure for processes, and ignore the Smalltalk object. Each process has associated with it a set of contexts. In the following, we explain how contexts are implemented for one process, but one should remember that there is one set of contexts for each process. In order to improve performance of the Alltalk interpreter 44, it does not treat contexts as objects. Instead, they are maintained by the interpreter as C data structures. (As mentioned above, however, the home context may be turned into an object if an owned block is turned into an object). The Alltalk interpreter 44 manages contexts in two pieces. One piece contains what are called active contexts. These are contexts associated with methods which have not yet returned and blocks which are executing and have not yet returned. This piece operates like a stack: when a message is sent or a block starts execution, the Alltalk interpreter 44 pushes another context on the stack; when a method or block returns, the Alltalk interpreter 44 pops one context (or more, in the case of a long return from a block) off the stack. The second piece contains what are called block stubs. A block stub is established as the result of a setup.sub.-- blk bytecode (see setup.sub.-- blk.sub.-- bcode). In order to treat blocks as objects, object id's (oops) are given to such blocks. The block stubs represent these pseudo-objects. They hold just enough information so that when a block is evaluated (a value message is sent to it), the Alltalk interpreter 44 can create an active context for it. Note that a block stub exists as long as its home context exists; it does not go away just because its associated active context returns. In fact, in the case of loops in Smalltalk code, the same block stub might be evaluated many times, having an active context created from it and destroyed each time. Because block stubs are stored as a separate piece, the active contexts can be allowed to obey a stack discipline. This simplifies context management, and improves performance. The data structure for contexts is defined in "interp.sub.-- types.h". Contexts are of fixed size, and have 64 temporaries each. (Smalltalk defines 64 as the maximum number of temporaries a context may have.) This allows the Alltalk interpreter 44 to allocate space for them and doubly link them at interpreter initialization time, rather than on the fly. They are allocated as an array, and have one array/stack of contexts per Smalltalk process. The routine init.sub.-- cntx0 initializes one context, and it is called by init.sub.-- cntx.sub.-- stackO which initializes and links all contexts for a given process when the process gets created. The data structure for block stubs is also defined in "interp.sub.-- types.h". Block stubs are of fixed size. This allows space to be allocated for them and allows them to be linked at interpreter initialization time, rather than on the fly. They are allocated as an array, and have one array/stack of contexts per Smalltalk process. The routine init.sub.-- blk.sub.-- stub.sub.-- stackO initializes and links all block stubs for a given process when the process gets created. In addition to the two arrays, the Alltalk interpreter 44 maintains a pointer to the current active context, cur.sub.-- cntx, and a pointer to the next available (unused) block stub, next.sub.-- blk.sub.-- stub, for each process. The fields of a context that are important for context management are described next. prev, next Each context has a prev pointer which links it to the previous context in the array/stack, and a next pointer which links it to the next context in the array/stack. These pointers are used rather than the array index to move between contexts. The Alltalk interpreter 44 follows the next pointer of the current context when it needs to add a new context. This happens when a message is sent (see send.sub.-- msg.sub.-- bcode), or a block is evaluated (see eval.sub.-- blk.sub.-- bcode). The Alltalk interpreter 44 follows the prev pointer of the home context of the current context to find the context to which it should return when it does a long return; it follows the prev pointer of the current context itself when it does a short return(see "short return", Table 4.1). home.sub.-- cntx For a method context, home.sub.-- cntx points to itself. For a block context, home.sub.-- cntx points to the context of the method in which the block is defined. This pointer is needed when the Alltalk interpreter 44 does long returns from blocks, and when blocks refer to the temporaries of their home method. By having a method context's home be itself, the Alltalk interpreter 44 can handle all long returns (both from method contexts and from block contexts) in the same way. first.sub.-- block The first.sub.-- block field of a context points to the first block stub that the context could allocate. This is used to free up block stubs when an active context returns. my.sub.-- blk.sub.-- stub For a method context, my.sub.-- blk.sub.-- stub is not used, and is NULL. For a block context, the field points to the context's corresponding stub. This pointer is used by the debugger (described below), and is also used in conjunction with the prev.sub.-- active.sub.-- cntx field to handle the case where one block stub has multiple active contexts at the same time. prev.sub.-- active.sub.-- cntx For a method context, prev.sub.-- active.sub.-- cntx is not used, and is NULL. For a block context, it is used in conjunction with the my.sub.-- blk.sub.-- stub field to handle the case where one block stub has multiple active contexts at the same time. It saves a pointer to the previous active block context associated with this block context's block stub. If this context is the only active context associated with the block stub, then this field holds a NULL pointer. The fields of a block stub that are important for context management are described next. id Each block has an id which is an oop (long integer) in a special range, that is, greater than or equal to the constant INIT.sub.-- CNTX.sub.-- ID. The id's are assigned to a stub when the process to which it belongs is initialized. The id can be stored in the temporaries of other contexts, and can be passed as a parameter in a message send. In this way, blocks can be treated (almost) like real objects for flexibility, and yet be managed by the interpreter for good performance. next Each block stub has a next pointer which links it to the next block stub on the array/stack. When a new block stub is needed, the Alltalk interpreter 44 uses the one pointed to by the global pointer, next.sub.-- blk.sub.-- stub. At that time, it follows the next pointer of the stub pointed to by next.sub.-- blk.sub.-- stub to update next.sub.-- blk.sub.-- stub. home.sub.-- cntx Each block stub has a pointer to its home context. If the stub gets evaluated, the Alltalk interpreter 44 needs this pointer in the active context created for the block. Via this pointer, it can get at the temporaries of the home context. active.sub.-- cntx When a block gets evaluated, the Alltalk interpreter 44 updates the stub with a pointer to the active context that gets created to do the evaluation. This pointer is needed in order to resolve references to type 6 variables. When a block is stored in an instance variable, or passed back from a method the Alltalk interpreter 44 must make the block a persistent object. In so doing, it must also make the home context a persistent object as well, since the block can reference temporaries of the home context. Alltalk contains routines to make the block and its home context persistent objects (and thus they may then be stored in the database and manipulated as any other object), and to put the block and home context back on the stack so that the block can be executed. Referring to the Drawings, FIGS. 4 through 13 show how context management is done in Alltalk. Each Figure shows the same portion of the active context stack and the block stub stack for one process. Each box in the Figures represents one context or one stub; only the fields involved in context management are shown. (The my.sub.-- blk.sub.-- stub and prev.sub.-- active.sub.-- cntx fields are shown only in FIG. 13.) Pointers are indicated by arrows; pointers "connected to ground" represent NULL pointers. Pointers shown in double lines indicate pointers which were changed from the previous figure. The stacks grow downward. FIG. 4, shows the state of the two stacks after the interpreter has been initialized, but no messages have been sent. Note that the next and prev pointers of the contexts, and the next pointers of the stubs have been established. Also, the id's of the stubs have been set. FIG. 5, shows what happens when a message is sent. (We assume that the sending context is just off the top of the figure; the context we are about to create is the top box we see in the figure.) We follow the next pointer of the sending context to "create" a new context (from here on, called method context #1). The new context becomes the cur.sub.-- cntx, and its class is MethodContext. Since it's a method context, its home.sub.-- cntx is made to point to itself. Its first.sub.-- block pointer is made to point to the stub pointed to by next.sub.-- blk.sub.-- stub. Note that next.sub.-- blk.sub.-- stub is not moved; only when a block stub is used (i.e., set up) is the next.sub.-- blk.sub.-- stub moved forward. FIG. 6 shows the stacks after method context #1 sets up its first block. Setting up a block means that the Alltalk interpreter 44 created a block stub; it does not mean that the Alltalk interpreter 44 creates another active context. The block stub pointed to by next.sub.-- blk.sub.-- stub becomes the new block stub. The Alltalk interpreter 44 pushes next.sub.-- blk.sub.-- stub forward to the stub pointed to by the next field of the new block stub. The home.sub.-- cntx field of the new block stub is made to point to the home.sub.-- cntx of cur.sub.-- cntx, i.e., method context #1. Note that if cur.sub.-- cntx were a block context, the home.sub.-- cntx of the new block stub would not be that block context, but rather the block's home context. Note also that method context #1 does not change. FIG. 7 shows what the stacks look like after method context #1 sets up another block. We now have two block stubs whose home.sub.-- cntx is method context #1. FIG. 8 shows the stacks after method context #1 sends a message. To handle this, the Alltalk interpreter 44 must "create" a new context (from here on, called method context #2). The Alltalk interpreter 44 follows the next pointer of the current context to find the next available active context, and make it the cur.sub.-- cntx. Its first.sub.-- block pointer is made to point to the block stub pointed to by next.sub.-- blk.sub.-- stub. Since the new context is a method context, its home.sub.-- cntx field is made to point to itself. FIG. 9 is somewhat more complicated. In that figure, we see the stacks after method context #2 starts to evaluate one of the blocks that was set up by method context #1. (We assume that the block was passed as a parameter in the message which resulted in the creation of method context #2.) The stub to be evaluated is #214740009. To handle this, the Alltalk interpreter 44 must "create" a new active context--but this time, it is a block context. Just as with method context creation, the Alltalk interpreter 44 follows the next pointer of the cur.sub.-- cntx to find the next available active context, and make it the cur.sub.-- cntx. Also, the Alltalk interpreter 44 makes its first.sub.-- block pointer point to the block stub pointed to by next.sub.-- blk.sub.-- stub. However, the home.sub.-- cntx pointer of the new context does not point to the new context itself; because the new context is a block context, its home.sub.-- cntx pointer is gotten from its block stub. In this case, home.sub.-- cntx points to method context #1. Note also that the block stub's active.sub.-- cntx pointer is made to point to the new block context. The transformation of a block stub to an active context is handled by the routine stub.sub.-- to.sub.-- cntxO. FIG. 10, shows how the stacks would appear if the block were to do a short return. Note that the Alltalk interpreter 44 simply follows the prev pointer of the current context to find the context to return to; it is made the cur.sub.-- cntx. Note also that the block stub associated with the evaluated block does not go away, even though its active context did go away. Block stubs go away when their home context goes away (returns). The Alltalk interpreter 44 also moves next.sub.-- blk.sub.-- stub to point to the block context's first.sub.-- block. This effectively "destroys" and frees up any block stubs set up by the block context. (In this case, the block context created no block stubs, so next.sub.-- blk.sub.-- stub does not change.) FIG. 11, shows how the stacks would appear if the block were to do a long return. Remember that a long return from a block is the same as doing a return from the block's home context. In this case, the block's home context is method context #1, so the Alltalk interpreter 44 (in essence) does a return from method context #1. It follows the prev pointer of method context #1 to find the context to return to; it becomes the cur.sub.-- cntx. It also moves the next.sub.-- blk.sub.-- stub pointer back to point to the stub pointed to by first.sub.-- block of method context #1. This effectively "destroys" and frees up all blocks created by method context #1 and any of its descendent contexts. FIGS. 12 and 13 show how the my.sub.-- blk.sub.-- stub and prev.sub.-- active.sub.-- cntx fields are used to handle the case where a block stub may have multiple active contexts associated with it. Note that these fields are shown in these figures only, and only for block contexts. Note also that we have shifted our view of the stacks down (or up) by one context in order to fit the contexts of interest on the page. FIG. 12 shows how the stacks would appear if a second block context was activated for the same block stub as the current context. Note that the two block contexts created from the same block stub are very similar; only their prev.sub.-- active.sub.-- cntx fields differ. Note that the second one uses this field to point back to the previous (first) one. Note also that the active.sub.-- cntx field in the stub is updated so it points to the new context. FIG. 13 shows how the stacks would appear if the second block context did a short return. The Alltalk interpreter 44 follows the my.sub.-- blk.sub.-- stub pointer of the returning block context to find its associated block stub. It copies the prev.sub.-- active.sub.-- cntx pointer of the returning block context into the active.sub.-- cntx field of the stub. Then it does the normal processing for a short return, that is, follow the returning context's prev pointer to find the sending context, and makes it the new current context. Note that in this example, prev and prev.sub.-- active.sub.-- cntx point to the same context, that is, the first block context; however, this will not necessarily be the case. There could be other intervening contexts between these two activations of the same stub. This is why it must save this information in the newly-created context. 4.5. Process Management As mentioned above, the Alltalk interpreter 44 maintains run time data structures for Smalltalk processes in an array called Processes[]. Each element in that array represents one Smalltalk process. Each element contains (basically) a stack of active contexts, a pointer to the current context in that stack, an array of block stubs, and a pointer to the next available stub. The management of these two stacks and two pointers was described in the previous section. However, we have not yet discussed how processes are created, switched, or destroyed. These topics will be discused in this section. 4.5.1. Creating Processes A Smalltalk process is created by sending a message to a block. The block contains the code that is to be executed in the new process. The message sent to the block might be forkAt:, fork, etc. However, all of these messages eventually result in the message newProcess being sent to the block. The Smalltalk code for method newProcess in Class Block is shown in Table 4.2.
TABLE 4.2
______________________________________
newProcess
"Answer a new process running the code in the receiver. The
process is not scheduled."
Process
forContext:
[self value.
Processor terminateActive]
priority: Processor activePriority
______________________________________
The forContext:priority: method in Class Process is a class method for creating new processes, and it is implemented as a primitive in Alltalk. The routine createProcessO is the main routine for creating a new process. It first finds an available element in the Processes[ ] array by calling get.sub.-- proc.sub.-- idO. Then, in order to create a new process in Alltalk, the Alltalk interpreter 44 establishes the first context in that new process. It does that by copying appropriate active contexts and block stubs from the creator process to the created (new) process, and then making slight adjustments to the copies. This is best explained using an example. Suppose an application wishes to create a process that simply prints a message. An example of code to do this is shown in Table 4.3.
TABLE 4.3
______________________________________
Class Test
.vertline.
myTest
.vertline. aProc .vertline.
aProc <- [`This is a new process` print.] newProcess.
"create it"
aProc resume. "schedule it"
Processor yield. "switch to it"
______________________________________
What contexts and stubs should be copied? Obviously, the Alltalk interpreter 44 must copy the user's block, that is, the one in method myTest. Because a block may refer to its home method's temporaries (though in this case it does not), and because a block's bytecodes are actually contained in its home method, it copies both the block stub and its home. In this case, the home context is the method context associated with the execution of myTest. But this is not enough. Note that the method, Block newProcess, which actually sends the message which directly creates the new process (Process forContext:priority:) also creates a block. This block, [self value. Processor terminateActive.], also must be copied; and its home context must be copied as well. In what follows, we call this block the outer block. Note that self in the outer block refers to the user's block. To summarize: the Alltalk interpreter 44 copies the user's block and its home context (see proc.sub.-- copy.sub.-- cntx1O), plus the outer block and its home context (see proc.sub.-- copy.sub.-- cntx2O). After that, it evaluates the outer block, that is, it creates an active context from the block stub. When the new process becomes active, this, in turn, causes the user's block to be evaluated (as a result of the message self value). When that block finishes, the new process is destroyed (as a result of the message Processor terminateActive). Referring to the Drawings, FIGS. 14 through 16 illustrate the relationships between these contexts and blocks. FIG. 14 shows a portion of the active context stack and block stub stack of the creator process. The contexts and stubs shown are the ones that are of interest when the Alltalk interpreter 44 creates the new process. FIG. 15 shows the active context stack and block stub stack of the created process just after it is created by the interpreter. FIG. 16 shows the same stack just after the new process has become active, and the user's block begins to execute. 4.5.2. Switching Processes Switching processes is fairly straightforward. Before each bytecode is executed, the Alltalk interpreter 44 tests the Divert flag; if set, it switches to the process returned by the routine processSwitchO. The routine processSwitchO returns an oop; the routine find.sub.-- processO takes the oop as an argument, and returns a pointer to the corresponding element of the Process[ ] array. The machinery for managing process switches is contained in the module process.c. It follows the implementation described in the standard reference for Smalltalk by Golberg and Robson, mentioned above. 4.5.3. Destroying Processes Destroying (i.e., terminating) a process involves two basic steps. First, the appropriate element of the Processes[ ] array is marked as not in use so it can be reused if needed. Second, the garbage collector (described below) is told to clean up after the process. The routine destroyProcessO handles these two tasks. Processes are destroyed in two situations. The first case is when the interpreter quits. At that time, all active processes are destroyed so garbage collection can be performed correctly. The second case is when a terminate message is sent to a Process object. This second case is implemented via primitives. Note that process 0 is created automatically when the interpreter is initialized; it cannot be destroyed, except by shutting down the interpreter. 4.6. Optimizations Various techniques are used to improve the run-time performance of the Alltalk tool. These techniques are useful independently of the Alltalk tool. They can be advantageously employed in any Smalltalk-like object-oriented programming tool to improve the runtime performance. We describe these techniques below. 4.6.1. Replacing certain message sends with less expensive processing This is referred to as message flattening. The Alltalk interpreter 44 detects at runtime if a message send's only purpose is either of the following 2 cases: 1. Return of an instance variable. 2. Execution of a primitive. The Alltalk compiler 20 flags methods that are of these types, for easy detection at runtime. The Alltalk interpreter 44 will execute the appropriate logic in-line, and modify flags in the bytecode that is being executed, as well as caching in the bytecode itself the class of the receiver. Subsequent executions of the bytecode involved will cause the class of the now current receiver to be checked against the class cached in the bytecode. If it matches, the Alltalk interpreter 44 performs the optimized logic, in-line, without fetching (or executing) the method. Thus this optimization saves the fetching of the method, allocation (and subsequent deallocation) of a new context, and interpretation of the method. 4.6.2. Treating primitives as bytecodes Rather than have one bytecode just for dispatching primitives, (e.g., an execute.sub.-- primitive bytecode), in Alltalk, each primitive is its own bytecode. This eliminates the extra level of indirection to get to the code for primitives. As mentioned previously, primitive bytecodes are in the range of 0.times.000 to 0.times.0FF hex; other bytecodes being at 0.times.100. 4.6.3. Saving a call to the object manager to fetch receiver If the receiver of a message is the same as the receiver of the sending method, the Alltalk interpreter 44 avoids the call to the object manager to fetch the receiver again. Instead, since in Alltalk a pointer to the receiver is held in the associated context, the Alltalk interpreter 44 gets the receiver pointer from the associated context instead. 4.6.4. Replacing `value` messages with block evaluation Since evaluating a block is less expensive than sending a message, the Alltalk interpreter 44 attempts to replace send.sub.-- msg.sub.-- bcodes with eval.sub.-- blk.sub.-- bcodes when possible. The Alltalk compiler 20 recognizes messages with the selector value (and value:, etc.), and replaces them with eval.sub.-- blk.sub.-- bcode2 bytecodes. This bytecode is the same as the eval.sub.-- blk.sub.-- bcode, except that it must check to see that the "receiver" of the value message is a block. If it is not a block, eval.sub.-- blk.sub.-- bcode2 simply returns, and lets processing fall through to the next bytecode which is a send.sub.-- msg.sub.-- bcode for the value message; if the receiver is a block, eval.sub.-- blk.sub.-- bcode2 operates like eval.sub.-- blk.sub.-- bcode, except that it must push the bytecode pointer past the following send.sub.-- msg.sub.-- bcode which it replaces. 4.6.5. Caching methods in send.sub.-- msg bytecodes Alltalk uses a performance-improving technique, common to most Smalltalk implementations, known as method caching. The technique takes advantage of the fact that, while Smalltalk allows polymorphism, a given message often ends up being resolved to the same method every time. How Alltalk takes advantage of this is as follows. The send.sub.-- msg bytecode has two extra fields which implement a method cache. One field is likely.sub.-- class. This saves the class of the receiver of the message when it was last sent. The second field is likely.sub.-- method. This saves the oop of the compiled method to which the message was resolved last time it was sent. When the bytecode is encountered again, the Alltalk interpreter 44 checks to see if the new receiver's class matches likely.sub.-- class; if it does, it uses the compiled method in likely.sub.-- method. If the classes do not match, it must do the normal, more expensive processing to fetch the appropriate method. Note that in Alltalk, when the cache is used, the Alltalk interpreter 44 calls the object manager to reserve the method object, to insure the object is not garbage collected until the object is no longer needed. However, this is less expensive than normal method fetching. Note also, that if the cache is not usable (i.e., the receiver's class does not match likely.sub.-- class), the Alltalk interpreter 44 updates the cache to match the receiver's class and the method's oop in the current message. 4.7. Initialization and Shutdown The main procedure of the Alltalk interpreter 44 is contained in the module interp.c. It performs various types of initializations, then invokes the bytecode loop by calling exec.sub.-- bcodesO. When exec.sub.-- bcodesO returns, mainO does some minor clean up, and exits. Initialization procedures are the following. (1) Command line arguments are processed. These are parameters passed on the statement used to invoke the runtime environment 22. They include switches for relinquishing control of the keyboard and mouse to the Smalltalk application, and for avoiding the normal system booting procedures. Another parameter is an optional filename; it indicates that the interpreter should get the information for the initial message of the application from that file rather than by prompting the user. (2) Signal handling is set up for the I/O primitives. (3) The object manager 48 is initialized via a call to init.sub.-- omO. (4) The values for the initial message are processed via a call to get.sub.-- init.sub.-- valsO. (5) Keyboard and Mouse are `opened` via calls to openMouseO and openKeyboardO, if appropriate. (6) The oops of certain Alltalk objects are referenced in the Alltalk interpreter 44 via global variables. Some of these are fixed to certain oops. For example, true is always oop 257. However some of the oops referenced via interpreter globals must be determined at start up of the interpreter--they are not fixed forever, just for the duration of the interpreter's run. The appropriate assignments are made by calling initializeOopsO. Likewise, certain instance variable indices are referenced by the interpreter via globals. These, too, must be determined at start up. A call to initializeIndicesO takes care of this. (7) The first Smalltalk process is established. See the section above on Process Management for more details. The routines createProcessO, and init.sub.-- processorO do most of this work. (8) The display is `opened` via a call to openDisplayO. (9) The bytecodes and context for the first message are built and made the first one to be executed. Basically, the interpreter 44 builds: (a) send.sub.-- msg and return bytecodes for the message startUp sent to Class SystemBoot; (b) send.sub.-- msg and return bytecodes for the user-supplied initial message. The routines bld.sub.-- dummy.sub.-- bcodesO and bld.sub.-- dummy.sub.-- cntxO perform these tasks. 4.8. The debugger The debugger is named RAID, and it combines many of the features of the standard Smalltalk debugger and the UNIX debugger, dbx. 4.8.1. Overview of the Debugger RAID (Revised Alltalk Interactive Debugger) is the debugger for the Alltalk system. We designed it to be used for debugging both Alltalk applications code, and the Alltalk system (implementation) itself. RAID provides typical debugger capabilities such as: setting break points; stepping through program execution; tracing various types of information (messages, blocks, bytecodes, processes); and displaying values of data structures/variables. RAID is written in C, and is integrated quite closely with the Alltalk interpreter. The user interface is a simple command interpreter, that looks somewhat like the Unix debugger, dbx, to the user. The command interpreter uses UNIX utilities lex and yacc to parse input and dispatch the appropriate C routines that perform the tasks of the RAID commands. 4.8.2. Basic Architecture of RAID There are several versions of the Alltalk interpreter 44, each geared to a particular need. Not all of these interpreters contain RAID. For example, one version is optimized for running debugged applications as fast as possible; leaving out the debugger improves performance considerably. Another version is geared toward the collection of performance statistics; it also does not include the debugger. The version of the interpreter built by default, however, does include RAID. Conceptually, there are three pieces to the implementation of RAID. One piece is a set of C routines in a library separate from the interpreter, that performs the tasks associated with the RAID commands. Each command has a C procedure associated with it, and that procedure may use other utility procedures to do its work. This first piece is conditionally linked to the interpreter depending on which version of the interpreter is made. A second piece is the code within the interpreter that can get conditionally compiled into the interpreter itself; by default, it is included, but it can be excluded if debugging is not needed. This code is included when the C compiler switch DEBUGGER is on. The third piece is a set of global variables and constants that are used to communicate between the first two pieces. In what follows, we will refer to piece one simply as the debugger; piece two will be referred to as RAID code in the interpreter; piece three will be called debugger globals. RAID is invoked when the interpreter calls a routine in the debugger called, appropriately enough, debuggerO. Flow of control is as follows: (1) RAID code in the interpreter calls debuggerO. (2) debuggerO prompts the user, and invokes the lex/yacc command interpreter. (3) The command interpreter parses and interprets the user input, and calls the appropriate C-procedure with the appropriate parameters. (4) The C-procedure performs the tasks associated with the desired command. This usually results in either display of some information (like the contents of the current context), or the updating of the debugger globals (like turning on or off the switch that tells the interpreter to stop at the next message-send). (5) When the C-procedure returns, either control will be passed back to the interpreter at the point at which it called debuggerO, or the debugger goes to step 2. Which path is taken depends on the command just processed. For example, after the continue command executes, control is returned to the interpreter; after the print.sub.-- active.sub.-- cntx command executes, the user is given another RAID prompt. (6) When control returns to the interpreter, it continues, executing both normal code and RAID code. RAID code within the interpreter may call debuggerO (step 1 above); it may update debugger globals; or it may display data to the user based on the values of the debugger globals (switches). 4.8.3. Command Interpreter As previously mentioned, the interactive interface to RAID is a simple command interpreter built using the UNIX utilities lex and yacc. The utility lex defines what are valid tokens in the RAID "command language"; the grammar defines how these tokens can legally be put together to form commands. In addition, the grammar calls the C-procedure associated with the command, passing the command parameters as arguments. The following naming/capitalization conventions are employed for tokens: (1) Tokens representing command names are all uppercase, e.g., MSG.sub.-- STEP. (2) Other terminals have first letter uppercase, all others lowercase, e.g. Hex.sub.-- numeric. (3) Non-terminals are all lowercase, e.g., help.sub.-- param. 4.8.4. Implementation of the Commands This section will give a brief description of how each RAID command is implemented. For each command, we discuss how each of the three pieces of the RAID implementation (debugger, RAID code within the interpreter, and debugger globals) is used. First, we describe the naming/capitalization conventions used in the RAID implementation. 4.8.4.1. Naming conventions Almost all variables, constants, and procedures that RAID uses begin with the letters "d.sub.-- " or "D.sub.-- " (the letter "d" or "D" followed by the underscore character). In addition, we use the following capitalization conventions: (1) RAID global constants are all uppercase, e.g., D.sub.-- PROMPT.sub.-- SYMBOL. (2) RAID typedefs and structure definitions are all lowercase, e.g., d.sub.-- ostat.sub.-- struct. (3) RAID global variables have first letter uppercase, all others lowercase, e.g., D.sub.-- init.sub.-- vals. (4) RAID macros are all uppercase, e.g., D.sub.-- CRESETO. (5) RAID procedures are all lowercase, e.g., d.sub.-- whereO. (6) Associated with each command with name command.sub.-- name is a routine with the name d.sub.-- command.sub.-- nameO. 4.8.4.2. RAID Switches Some operations of RAID are controlled by two sets of binary switches. One set of switches controls the trace information that is displayed as the interpreter runs, e.g., message sends and returns. The other set holds state information, e.g., which RAID command is currently executing. Each set of switches is implemented using a global variable bit vector, plus three macros: one for setting a particular switch (bit), one for resetting a particular switch (bit), and one for testing whether or not a switch (bit) is set. The first set of switches uses the global variable D.sub.-- display.sub.-- switches, and the corresponding macros are D.sub.-- DSETO, D.sub.-- DRESETO, and D.sub.-- ISDSETO. The second set of switches uses the global variable D.sub.-- control.sub.-- switches, and the corresponding macros are D.sub.-- CSETO, D.sub.-- CRESETO, and D.sub.-- ISCSETO. 4.8.4.3. Commands for starting and stopping execution continue This command simply continues execution of the interpreter by causing debuggerO to do a return. We cause debuggerO to return by setting the global variable D.sub.-- in.sub.-- debugger to "0" (zero). quit, restart, rerun The quit command causes the interpreter to exit; restart aborts the current Alltalk application, restarts the interpreter on the same application, and gives a RAID prompt; rerun is equivalent to restart followed immediately by a continue--it does not re-prompt the user before restarting the application. It is important to do garbage collection before aborting an application, so these commands make sure each active Smalltalk process is explicitly destroyed before aborting. The code does different things depending on the state of the Alltalk interpreter 44 when the command is invoked. If the bytecode loop has not yet started, the user is forced to get into the bytecode loop (by executing one bytecode, for example) before allowing any of these commands to be used. If the interpreter is in the middle of an application, i.e., it is inside the bytecode loop, the debugger does longjmpO to an appropriate spot in exec.sub.-- bcodesO where all active processes are destroyed in order to be sure garbage collection is done appropriately. Then it returns to interpO. If an application has just completed, it is already outside the bytecode loop, so the debugger simply returns to the routine interpO; no garbage collection is needed since all active processes ran to completion. In either of these last two cases, the debugger sets the appropriate global switch (D.sub.-- QUIT, D.sub.-- RERUN, or D.sub.-- RESTART) so that when it returns to interp, it knows whether to exit, restart, or rerun. run The run command is similar to restart, but it is used when the user wants to run a different Alltalk application without leaving the interpreter. The code, then must clear all breakpoints (since these are probably not meaningful in the new application), and get new values for the interpreter's initial message. 4.8.4.4. Commands for finding out where you are print.sub.-- message, where The where command is analogous to the dbx command of the same name. It prints out the currently active messages, i.e., the messages sends that have not yet returned. Only those messages in the currently-active process are printed. The print.sub.-- message command prints only the most-recently activated (last sent) message. Both commands use the routine d.sub.-- print.sub.-- msgO to print the message associated with a given context; where calls this routine on all the contexts in the context stack of the current process; print.sub.-- message calls this routine only on the current context. 4.8.4.5. Commands for setting breakpoints stop.sub.-- at This command handles a stop set for a particular bytecode type, e.g., send.sub.-- msg.sub.-- bcode. If the user enters the command without a parameter, the debugger simply prints out the currently-set stop, if any. If a parameter is given, the debugger stores it into the RAID global variable, D.sub.-- stop.sub.-- at.sub.-- bcode. Bytecodes range from (hex) 0.times.100 to 0.times.156; primitives range from (decimal) 0 to 255. The user may specify a bytecode in either range. As the interpreter executes, within exec.sub.-- bcodesO, before executing a bytecode, it checks the bytecode against the D.sub.-- stop.sub.-- at.sub.-- bcode; if it matches, the interpreter calls debuggerO. stop.sub.-- in, delete These commands handle stops set for particular methods and/or classes and/or selectors. More than one stop can be set at a time; the constant D.sub.-- MAX.sub.-- STOPS determines how many stops can be used. Stops are stored in the global array D.sub.-- stop.sub.-- in.sub.-- data. They are identified by number, from 1 to D.sub.-- MAX.sub.-- STOPS. The parameters of the stop.sub.-- in command define a new stop; new stops are added using d.sub.-- add.sub.-- stopO called from d.sub.-- stop.sub.-- inO. As with stop.sub.-- at, if invoked with no parameters, stop.sub.-- in simply prints the currently set stops using d.sub.-- print.sub.-- stopO; Stops are deleted using the delete command. Note that deleted stops cannot be re-used. During interpreter execution, in the send.sub.-- msg bytecode handler, after each send.sub.-- msg bytecode is executed, a check is made to see if the just-executed bytecode matches any of the stops. If so, the stop is printed, and debuggerO is called. 4.8.4.6. Commands for executing a limited portion of the application bcode.sub.-- step This command simply causes the interpreter to continue execution until the next bytecode is about to be executed. It sets the D.sub.-- BCODE.sub.-- STEP switch. During interpreter execution, before a bytecode is executed in exec.sub.-- bcodesO, this switch is tested; if set, debuggerO is called. The switch is reset every time debuggerO is called. goto, skip.sub.-- msg These commands cause the interpreter to continue execution until a particular message is sent. The message is identified by the process in which it executes, and by its sequence number within that process. With the goto command, the user specifies an absolute message sequence number; with the skip.sub.-- msg command, he specifies a relative message sequence number. Note that goto also allows the user to specify a particular process; skip.sub.-- msg uses the current process. The process and message sequence number are stored in D.sub.-- goto.sub.-- skip. These are cleared every time debuggerO is called. msg.sub.-- step This command is to messages as bcode.sub.-- step is to bytecodes. It causes the interpreter to continue execution until the next message is sent. It sets the D.sub.-- MSG.sub.-- STEP switch. During interpreter execution, after a send.sub.-- msg bytecode is executed, this switch is tested; if set, debuggerO is called. The switch is reset every time debuggerO is called. next.sub.-- msg This command is rather more complicated than msg.sub.-- step. This command is to msg.sub.-- step as the dbx command next is to the dbx command step. That is, it causes the interpreter to continue executing until the next message at the current level is sent. In order to do this, it must keep track of what the current level was when the command was invoked; this is stored in D.sub.-- base.sub.-- cntx. As the interpreter executes a send.sub.-- msg bytecode, it checks to see if the message just sent was sent from D.sub.-- base.sub.-- cntx. If so, then debuggerO is called. Also, on every return bytecode, the interpreter che | ||||||
