Compilers using a universal intermediate language4667290Abstract A method for directing a digital data processor to translate a program written in a source language into a sequence of machine executable instructions. The method consists of the translation of the source code into an intermediate language, followed by generation of object code for the target machine, the method being generally applicable to known source languages and to digital data processors. Claims We claim: Description For those ordinarily skilled in the art, attached is a microfiche appendix providing a complete coding printout of the present invention which will be of benefit.
______________________________________
q --srcC (for C)
q --srcPLI (for PL/I)
q --srcCBASIC (for C-Basic)
q --srcF77 (for FORTRAN 77)
q --srcCOBOL (for COBOL)
q --srcPASCAL (for Pascal)
(for RPG)
(for ADA)
(for Modula-2)
______________________________________
2. proc q.sub.-- new(qop.sub.-- type) Start construction of a new quad. The old quad (if there is one) becomes inaccessable and the new quad is current. The argument is the opcode of the new quad--This is one of the q.sub.-- xxx values given in the list and table above. The line and column entries must be set before another q new is performed as well as any Fields which must be set (according to the operator). The start and end values are assumed to be false unless set otherwise. 3. proc q.sub.-- sopcode(qop.sub.-- type) Resets the opcode of the current quad to the code given in the argument. 4. proc q.sub.-- sop(int, *blk.sub.-- type) Set Field n (1, 2, or 3) of the current quad to the supplied block table marker. A pointer to the block table marker (containing the block table index and block table displacement) is passed. The indirect value is assumed to be false and the live and use values are given a default value of true. 5. funct blk type *q.sub.-- rop(int) Returns a pointer to the block table marker for the operand of the current quad specified by the argument. This routine is identical in purpose to the q.sub.-- op() routine described below, except that this must be used exclusively when creating files and q.sub.-- op() is used exclusively when accessing them. 6. proc q.sub.-- sind(int, bool) Set the use value of operand Field int to the boolean value. Note that this must be done after a q.sub.-- sop call. 7. proc q.sub.-- suse(int, bool) Set the use value of operand Field n to the boolean value. Note that this must be done after a q-sop call. 8. proc q.sub.-- slit(int, int) Set the value of the Field specified by the first operand to the value given in the second operand. 9. proc q.sub.-- sstart(bool) Set the start value of the current quad to the boolean argument. 10. proc q.sub.-- send(bool) Set the end value of the current quad to the boolean argument. 11. proc q.sub.-- slocation(unt, unt) Set the line (first arg) and column (second arg) of the current quad. 12. proc q.sub.-- sdupl(unt) Set the duplication field of a data quad. 13. proc q.sub.-- scount(int) Set the count field of the current data quad. 14. proc q.sub.-- sdatabuf(*byte) Set the ten byte data field of the current data quad to the ten bytes pointed to by the argument. 15. venture q.sub.-- crclose() Close the Phi-code.TM. file currently being created. G. File Access Phi-code.TM. files are accessed through a Phi-code.TM. control block ("pcb"). A pcb is allocated by the caller and a pointer to it is passed to the file access routines as required. However, the caller should never interrogate the pcb data structure directly. Calls to manager routines are used to obtain such information. The caller may maintain many open Phi-code.TM. files. The limit is six less than the number of open files allowed in the operating system being used. There is a concept of the `current` quad file and the `current` quad. Whereas most information about the current quad is obtained from routines, some information is maintained in global variables. The following are the global variables used:
______________________________________
qord-type
quadcount The number of quads in the current
file.
qop --type
qopcode The opcode of the current quad.
qord --type
qordinal The ordinal of the current quad.
qline --type
qline The line number of the current quad.
qcol --type
qcolumn The column number.
bool qstart The value of the start bit.
bool qend The value of the end bit.
______________________________________
File access routines will now be listed 1. venture q.sub.-- open(*char, *pcb) Open an existing Phi-code.TM. file for access and make it the current file. There is no current quad. 2. proc q.sub.-- current(*pcb) Make the Phi-code.TM. file specified by the supplied pcb the `current` file. All other file access routines in this section assume the current file. Note that, after a q.sub.-- open or q current is issued, a q.sub.-- get must be done to position the file. The following routines retrieve information from the file header. These calls can only be made after a q.sub.-- open and before a q get or q.sub.-- current call is made. 3. funct short q.sub.-- srclang() Return the source language of the current file. This is a one of the q.sub.-- srcXXX codes described previously. 4. funct char q.sub.-- format() Return the format of the current file. 5. funct *char q.sub.-- qver() Return a pointer to a string which gives the version number of the Phi-code.TM. manager under which this file was created. 6. funct long q.sub.-- crdate() Return the number of seconds which elapsed from 00:00:00 GMT Jan. 1, 1970 (hereafter--the Zero Date) to the creation of the file. 7. funct *char q.sub.-- moddate() Return the number of seconds from the Zero Date to the last modifiication of the file. 8. proc q.sub.-- get(unt) Make the quad whose ordinal is specified by the argument the current quad. q.sub.-- get(0) moves to the first quad in the file. The values of the global variables are set. If the file is positioned past the end of the file, q.sub.-- opcode has the value q.sub.-- eof and the other variables are undefined. 9. funct int q.sub.-- getnxt() Position to the next sequential quad and return the ordinal of this quad. The global variables are set as for q.sub.-- get. 10. funct *blk.sub.-- type q.sub.-- op(int) Return a pointer to the block table marker for the Field specified by the argument (1, 2, or 3). The values of the components of the pointed to structure must be copied out immediately after this routine returns. 11. predicate q.sub.-- ind(int) Yields the boolean value of the indirect flag for the Field specified by the argument. 12. pre dicate q.sub.-- use(int) Yields the boolean value of the use flag for the Field specified by the argument. 13. funct int q.sub.-- lit(int) Return the integer literal value in the Field specified by the argument. 14. funct unsigned int q.sub.-- dupl() Return the value of the duplication field of the current data quad. 15. funct int q.sub.-- dcount() Return the value of the count field. 16. proc q.sub.-- databuf(*byte) Fill the ten bytes pointed to by the argument with the contents of the ten byte data field in the current data quad. 17. proc q.sub.-- acclose() Closes the currently accessed file. The current file can no longer be accessed and the pcb may be re-used on another file. Since no file is current after a q.sub.-- acclose, a q.sub.-- open or q current must be issued. File Modification routines follow. A single file may be opened for modification. It must be linked format. 18. venture q.sub.-- modify(*char) Open a file for modification. The file name is given by the argument. Once the file is open, the following operations may be used to access quads: q.sub.-- get, q.sub.-- getnxt, q.sub.-- op, q.sub.-- ind, q.sub.-- use, q.sub.-- lit, q.sub.-- dupl, q.sub.-- dcount, and q.sub.-- databuf. Also, the following routines may be used to modify the fields of the current quad: q.sub.-- sop, q.sub.-- slit, q.sub.-- sind, q.sub.-- suse, q.sub.-- sstart, q.sub.-- send, q.sub.-- sdupl, q.sub.-- scount, and q.sub.-- sdatabuf. 19. proc q.sub.-- sopcode(qop.sub.-- type) Sets the opcode of the current quad to the code given in the argument. Note that, after changing the opcode of a quad, the contents and number of fields needs to be set correctly, according to the quad. 20. proc q.sub.-- insert(short) Insert a quad after the current quad with the supplied opcode. The new quad becomes the current quad. 21. proc q.sub.-- delete() Delete the current quad. The next sequential quad becomes the current quad. 22. proc q.sub.-- modclose() Close the modified file. H. Phi-code.TM. File Formats This section specifies the disk file format of sequential and linked format files. The first section of a Phi-code.TM. file contains information about the file. This section, called the header section, contains the allowing fields, in order: a. File format: a single character: `s`=sequential, `l`=linked. b. File ID: The 18 character string: "The Philon System". c. Version of the Phi-code.TM. manager which created the file. This is a 5 character string of the form "x.xx". d. Creation time: 4 bytes which give the number of seconds which elapsed between the Zero Date and the creation of the file. e. Time of last modification: 4 bytes. f. Quad count: Two bytes which give the count of quads in the file. g. Identity of source language a single byte which contains the code of the source language h. The remainder of the file contains quads. 1. Sequential Format All quads are stored in variable length fields. This section describes the format of storage for operational quads. The format of data initialization quads has been described previously. The first byte (byte 0) contains the operator code. This is followed by a pair of bytes which contain the line number as an unsigned 16-bit integer quantity. The high-order bit of the following byte (byte 3) is set whenever the first operand is a stack operand; the column number is specified by the low-order 7 bits of the same byte. A column value of 1 indicates the first column in the source code. Following the column is an exec count/break count of 16 bits for labels only. Bit 0 of byte 4 is set if the third operand is a stack operand. Bit 1 is set if the second operand is a stack operand. The remainder of byte 4 contain the indirect and use bits for the 3 operands (top for first operand). Up to three Fields follow, each requiring 3 bytes The block index is given in the first byte and the block displacement is stored in the next two bytes. The format of a quad is summarized: ##STR1## If a field contains a literal, the value of the literal is stored in the first two bytes of the field. The last byte of a literal field is unused. 2. Linked Format The format of quad storage in linked format is identical to that for sequential storage. However, each quad has two additional link fields which point to the disk address (byte offset from the start of the file) for the preceeding and subsequent quad. Each link field requires three bytes. These two fields follow, in order, after the basic 14 bytes for the quad. III. SYMBOL TABLE The term "symbol table" refers collectively to three objects: the hash table, the name table, and the block table. The block table (often called synonymously the "symbol table") is the main source of information concerning user-defined and pre-defined items in a source program. It is created and used by the passes of a Philon System compiler. Two other tables are also used. One of these, the "hash table", results from a mapping from the item's name to an integer. This integer gives the location of the item in the hash table and provides a quick access to the item. An entry in the hash table consists of a "block table marker", which gives the location of the symbol table entry for the item, and a "names table marker", which gives a similar location in the name table, described below. Thus the hash table serves essentially as a an index to the item's information contained in the symbol and name tables. The "name table" is simply a list of the names of all relevant items in the program, terminated by nulls. This is the only piece of information contained here. The symbol table is created during execution of the first pass of the compiler. Here a standard symbol table containing all pre-defined entries (such as run-time procedures and built-in types) is simply copied. No new entries are made during this pass. The second pass actually creates the symbol table, making entries for objects and types encountered in the source program. The allocator will fill in certain fields of entries, such as their size and offsets. The table will now be used by the code generator or the source debugger. A. FORMAT OF THE SYMBOL TABLE FILE The symbol table file resides on the disk and is paged in to memory during execution of a pass of the compiler. The file itself consists of arbitrarily many pages of 512 bytes each. Each block of the symbol table (see below to one or more pages of the file. The first page (page 0) of the file is the file header, which contains several fields. Field 0 is a single character which gives the status of the file (a blank means "file OK"; anything else means "file corrupt"). Field 1 is the string "The Philon System". Field 2 is a string containing the version number of the symbol-table manager module. Fields 3 and 4 each contain 4-byte long integers which are time stamps in UNIX System format. The first time stamp is the file creation date, the second is the date of last modification. Fields 5 and 6 each contain 2-byte integers, the first of which is the file block count, the second the file page count. Page 1 of the block-table file is the first page of a variable-length directory called the Block Directory (BD). This directory gives, for each block in the table, the block table markers of the first and last entries of the block (both are zero if the block is empty). A block table file is initialized with one page devoted to the BD, but more pages can be allocated for this purpose if the size of the table demands it. The first two bytes of each page of the BD contain a forward link to the next page allocated to the BD (zero means no next page). The second two bytes are unused, to permit an integral number of four-byte entries for each block to fit on each page of the BD. "Normal" pages of the block table contain the entries. However, the first six bytes contain special information. The first two bytes are the block table marker of the first entry of the next page allocated to this block (zero means no next page). The next two bytes are the block table marker of the last entry of the previous page allocated to this block (zero means no previous page). The fifth and sixth bytes contain the block index of the block to which this page belongs. B. THE SYMBOL TABLE AND BLOCK STRUCTURE The symbol table was designed with the implementation of block-structured languages in mind. Each time a new scope is encountered, a new "block" is created. Entries local to this scope are put in this block. When an item is searched, the search proceeds from the innermost block out. Several items in the symbol table may thus exist with the same name. At the end of a scope the innermost block is "popped". If several items in the symbol table have the same name, these are chained together using a field in the entry. The hash table, through which access is made, will always point to the visible entry. This entry will in turn contain in its "back chain" field a pointer to the entry it has hidden. Thus, when a block is popped, the hash table can be restored to once again point to the entry momentarily hidden. C. FORMAT OF AN INDIVIDUAL ENTRY In general, an entry in the symbol table is either an object or a type. Certain fields are common to objects and types. These include: the size in the symbol table of the entry; its visibility; its hash table location; the size of the object or type; alignment information; a block table marker for another entry of; the same name; and whether this item is an object or a type All entries for objects have the same size. They contain the following fields: a block table marker for its type; whether or not the item is constant; the address of the object; a trap or register allocation flag; and initialization information Entries for types, on the other hand, may vary considerably in size and the amount of information they contain. Each type entry contains a field, called its "base type". Depending on its base type, it may contain other fields. For example, an entry with base type "array" will contain a field which gives the element type of the array and another field which will give bound information, etc. D. NOTATION FOR DESCRIBING ATTRIBUTES Each piece of information in the symbol table is called an attribute. We describe all of these below. The format we use can be illustrated by a sample entry. nature (ntr) (at.sub.-- type) Is the item an object or a type? Possible values: at.sub.-- obj: an object at.sub.-- typdf: a type This attribute gives the "nature" of the item. Below we explain this simply means whether it's an object or a type. The first item in parentheses is the "abbreviation" of the attribute itself (at-type means "attribute type"). If the possible values of the attribute are limited, these are enumerated below: here we see the nature can only be "at.sub.-- obj" or "at.sub.-- typdf". Abbreviations are used in giving the attributes values and in obtaining their values. In general, if "abbr" is the abbreviation of an attribute, to set this field in the entry whose block table marker is "btm", one would write st.sub.-- abbr(btm, value) That is, calling the routine "st.sub.-- " followed by the abbreviation will set this field to the value "value". To retrieve this value, one can write abbr.sub.-- of(btm) where the routine abbreviation followed by "of" is a function returning the value of the appropriate attribute. However, if the type of the item is "bool" (either true or false), the routine "abbr.sub.-- of" is called instead "is.sub.-- abbr" so one writes is.sub.-- abbr(btm) which returns true or false. E. ATTRIBUTES COMMON TO ALL ENTRIES 1. visibility (vis) (at.sub.-- type) The visibility of an item. Possible values are: at.sub.-- loc: local--visible in this module only at.sub.-- imp: imported--actually defined in another module but can be referenced here at.sub.-- exp: exported--defined here and can be accessed on an other module 2. alignment (algn) (int16) Alignment necessary for this item. At present, only two bits are allocated for this field, so the value can therefore be 0, 1, 2, or 3. 0: no alignment necessary 1: byte alignment 2: double byte alignment 3: quad byte alignment 3. object size (osize) (unt32) The actual size to allocate for this object or for an object of this type. Note that this field is now 32 bits wide. For type entries which are procedures or blocks, this field is used to give the automatic frame size. 4. nature (ntr) (at.sub.-- type) Whether this item is an object or a type. Possible values are: at.sub.-- obj: an object at.sub.-- typdf: a type 5. hash-table marker (htm) (hs.sub.-- type) The location in the hash table of this item. Certain items such as temporaries are unnamed, and so are not entered in the hash table. This field will contain zero for such items. 6. backchain marker (bkchn) (blk.sub.-- type) The block-table marker for another entry in the symbol table which has the same name as this item. As mentioned above under "Block Structure", this field is used to maintain scoping. If no item has the same name (or this is the first entry with this name), this field will be zero. 7. block index (bkdx) (bindx.sub.-- typ) The block number of the block in which this entry resides. Each scope (block) of the program is given a unique number. This value is actually not contained in the symbol table entry for the item itself, but is rather computed from the block table marker. F. ATTRIBUTES FOR OBJECTS 1. storage class (stcl) (at.sub.-- type) When and where does this object live? Possible values are: a. at.sub.-- stat: static--lives forever (one copy only) b. at.sub.-- auto: automatic--lives on the stock, created when local block is entered c. at.sub.-- reg: register--like automatic, but user has instructed to keep in register as much as possible (essentially treated like automatic, as we do our own register allocation) d. at.sub.-- comm: common--lives statically, in a COMMON block (FORTRAN). e. at.sub.-- eqv: equivalenced--it lives in a location relative to another object. The "equivalence link" field gives the symbol table entry of the item it is equivalenced to. The "equivalence offset" field gives its relative displacement to or from this latter item. f. at.sub.-- base: based--does not live until explicitly allocated in the program. The "base pointer" field points to the block table marker for a pointer which will point to this item after allocation. References to this object are through this pointer. g. at.sub.-- cntr: controlled--similar to based (PL/I only). Not implemented. h. at.sub.-- temp: temporary--like automatic, but results from intermediate calculations, rather than being user-defined. Identifies the object as a temporary. i. at.sub.-- stemp: static temporary--a temporary for languages like FORTRAN, BASIC, and COBOL, where all items live statically. j. at.sub.-- prm: parameter--a formal parameter to a procedure. Allocated upon entry to the procedure (below frame). k. at.sub.-- regprm: parameter which is instructed to keep in a register (`C` only). l. at.sub.-- fbuff: this item is the buffer for a file--it has static allocation. The initialization area for the entry is used to indicate the file it is the buffer for, and the next buffer (if any other) for the same file. m. at.sub.-- nost: no storage--no space required for this item. Procedures and labels have this storage class. 2. equivalenced item (eqlk) (blk.sub.-- type) Applies only if the object has storage class "eqv". This is the block table marker of the object to which it is equivalenced. 3. equivalence offset (eqoff) (int16) Applies only if the object has storage class "eqv". This is the offset (plus or minus) relative to the equivalenced item which determines exactly where in memory this object should be allocated. 4. file to which I am buffer (tofile) (blk.sub.-- type) Applies only if the object has storage class "fbuff". This is the block table marker of the file which this object is the buffer for. User in deciphering references of the form FILE-BUFFER OF FILE-NAME in COBOL. 5. next file buffer (nxtbuf) (blk.sub.-- type) Applies only if the object has storage class "fbuff". This is the block table marker of the next file buffer for the same file. 6. is a constant (const) (bool) Is this item a constant? Values are: true: is a constant false: is not a constant 7. is pre-defined (pdef) (bool) Only applies to procedure objects. Is this a pre-defined (built-in) or user-defined procedure? Values are: true: is pre-defined false: is user-defined 8. is referenced (refd) (bool) Only applies to procedure objects. Is this procedure actually called in the program? Used for pre-defined procedures, so that only required run-time routines are actually linked. The storage allocator also uses this item to indicate "forward references" in the generation of storage directives. Values are: true: is referenced false: is not referenced 9. address (addr) (unt32) Only applies to objects whose storage class is static. Value contains the location where storage for this object starts, relative to the starting point of some area. 10. offset (ofs) (int32) Only applies to objects whose storage class is automatic, parameter, register, or temporary. Its value is zero or positive for objects allocated in the automatic area of a stack frame and zero or negative for parameters. 11. kind of initialization (kdin) (at.sub.-- type) The kind of initialization supplied for this object. Valves are: a. at quad: initialization is in q-code (see attribute "q-code pointer") b. at.sub.-- name: initialization is in name table (see "name pointer") c. at.sub.-- intlt: initialization is an integer (see "integer value") d. at.sub.-- untlt: initialization is an unsigned integer (see "unsigned integer value") e. at.sub.-- lblt: initialization is a label value (see "lavel value") f. at.sub.-- strlt: initialization is a string (see "string value") g. at.sub.-- chrlt: initialization is a character (see "character value") h. at.sub.-- desc: initialization is a descriptor for some dynamic run-time object and is given a standard initialization. i. at.sub.-- ptlt: initialization is for a pointer to be set to the address of a data object, possibly modulo some offset (see "pointer value" and "pointer offset") j. at.sub.-- cxlt: initialization for a complex data item (FORTRAN; see "complex--real value" and "complex--imaginary value") k. at.sub.-- figlt: initialization is a figurative constant (COBOL). The high-order 3 bits of the initializatlon give the number associated with each figurative constant (1 for ZERO, 2 for SPACE, etc.). The remaining 29 bits supply the length of initialization required. l. at.sub.-- allfig: an initialization of the form ALL "string" (COBOL). The high-order 16 bits of the initializationn give the names table marker for the string, and the low-order 16 bits give the length required. m. at.sub.-- biglt: this initialization will appear only within q-code. Similar to `ptlt`, except the block table marker of the object addressed is in the low-order 16 bits of the associated initialization value, and the offset is the entire 32-bit initialization value following quad. n. at.sub.-- none: no initialization 12. q-code pointer (qdpt) (qord.sub.-- typ) A pointer (actually the byte offset) to a quadruple in the q-code file. Applies when kind initialization=at.sub.-- quad 13. integer value (invl) (int32) The integer initialization value. Applies when kind initialization=at.sub.-- intlt 14. unsigned integer value (unvl) (unt32) The unsigned integer initialization value. Applies when kind initialization=at.sub.-- untlt 15. float value (ftvl) (nm.sub.-- type) A name table index. Applies when kind initialization=at.sub.-- name 16. label value (lbvl) (qord.sub.-- typ) A q-code pointer to the "label" quadruple for this label. Effectively the location of this label in the intermediate code. Applies when kind initialization=at.sub.-- lblt 17. string value (stvl) (nm.sub.-- type) A name table index, providing access to the string. The string is enclosed in quotes. Applies when kind initialization=at.sub.-- strlt 18. character value (chvl) (char) The ASCII value of the character which is the initialization. Applies when kind initialization=at.sub.-- chrlt 19. pointer value (ptvl) (blk.sub.-- type) The block table marker of the object whose address is to be the initialization of this item (which is a pointer). Applies when kind initialization=at.sub.-- ptlt. See the next entry, "pointer offset". 20. complex--real value (cx1) (nm.sub.-- type) The real part of the initialization of a complex data item. It is a name table marker for the representation in characters of the value. Kind of initialization=at.sub.-- cxlt 21. complex--imaginary value (cx2) (nm.sub.-- type) The imaginary part of the initialization of a complex data item. It is a name table marker for the representation in characters of the value. Kind of initialization=at.sub.-- cxlt. 22. bcd value (bcd) (nm.sub.-- type) The initialization value for a BCD data item (with base type=at.sub.-- bcd). It is a name table marker giving the string representation of the value. Kind of initialization=at.sub.-- name. 23. type (tref) (blk.sub.-- type) The block table marker for the type entry which is the type of this object. G. ATTRIBUTES OF TYPE ENTRIES Unlike objects, type entries vary greatly in the fields they contain. The most significant attribute of a type entry is its "base type", which might be integer, float, array, structure, procedure, or many other things. Depending on the base type, the entry will contain other fields. Type entries in general can be sure of containing only one other attribute, which is: 1. examined (exam) (bool) Has this entry already been looked at by the storage allocator?0 The allocator processes object entries in order, but only processes type entries when an object of that type is found. By setting this field to true, the allocator avoids unnecessary processing, as well as infinite loops. Values are: true: has been processed false: has not been processed 2. base type (bsty) (at-type) The fundamental characteristic of this type. Says what kind of animal this type is. The remaining attributes this entry may have depend on the base type. The continuation of the description of the symbol table appears in Appendix 2. IV. Back Ends of Compilers According to the Present Invention The third pass of the compiler 14 is responsible for memory allocation. This program operates by examining each entry in the symbol table, and assigning a memory location to it. In determining the amount of memory space to be allocated to a particular entry, consideration is given to the entry's attributes, as contained in the symbol table. Examples of attributes considered are type, storage class, whether the entry is exportable, and whether its location is determined automatically relative to some internal frame of reference. The output of the memory allocation phase consists of additional entries in the symbol table which indicate where entries will be stored in memory after compilation is complete, as well as assembly language directives for implementing this storage. The Code Generator is the fourth 18 part of this compiler. This program module is responsible for generating assembly language instructions 20 from the intermediate language, the Phi-code.TM. quads in the preferred embodiment, and the symbol table created in the semantics pass. The Code Generator operates by identifying the assembly language instruction required to implement a corresponding line of Phi-code.TM.. This is accomplished by creating look-up tables containing pointers to the appropriate assembly language instruction. Since different labels and identifiers will appear in the second and third fields of the Phi-code.TM. instructions, for every source code program, the variable and label entries in the assembly language instructions in the code generator tables must be left incomplete. These incomplete instructions function as templates. When a particular assembly language instruction is required, the opcode and number of operands are determined from the stored template form; the operands are determined from the symbol table. Subroutines, known as machine dependent routines, or MDR's perform the task of completing the assembly language templates. For example, in the case of integer addition, the sum of two integer variables is obtained and stored in a location assigned to a third integer variable. A possible source code statement written in FORTRAN would be I=J+K (eqn 7) This would be translated into a series of intermediate language, statements, Phi-code.TM. in the preferred embodiment the exact statements depending upon how the independent variable J and K receive values. The Phi-code.TM. statement for integer addition, however, is simply q.sub.-- plus J K I (eqn 8) where the operand fields have been assigned the labels from the source code program. Assuming that the target machine utilizes a PDP-11 instruction set, the assembly language instructions produced by the Code Generator for eqn 8 would be mar J, R O (eqn 9) ADD K, R O (eqn 10) MOV RO, I (eqn 11) The Code Generator tables, referred to earlier, are used in like fashion to retrieve corresponding assembly language templates for each line of Phi-code.TM. in the intermediate program. The actual templates fetched by the Code Generator manager would not include the variable labels shown in the above example; these would be obtained from the symbol table by the MDRs. The fifth part of the compiler is an assembler 24. As is customary, the assembler converts assembly language statements 20 created by the code generator into executable machine language statements 26. The output of the code generator consists of a file of opcodes and operands, specified in ASCII code, appropriate for the target machine. These must be converted into binary code by the assembler. The actual operation of the assembler consists of examining each line of assembly language 20, identifying the op-code portion, then locating the corresponding machine language equivalent in a table, together with permissible arguments (operands) for the op-codes. The operands in the assembly language statement are then checked against the table entries to determine whether they are permissible and, if so, translated into binary. The output of the assembler is a translation of the source program into object code 26, in machine language form. This concludes the process of compilation. V. Optimization Techniques in Compilers According to the Present Invention As mentioned previously, and as shown diagramatically in FIG. 1, there are measures which may be taken to "optimize" the object code produced by a compiler operating according to the present invention. Listed below are the optimization techniques employed by the optimizer. Techniques are identified as faster and/or smaller to show how they improve the program. A deliberate attempt has been made to explain these techniques in simple, easy-to-understand terms. A. Code Elimination There are two ways of eliminating useless code: forward elimination, which removes code that can never be executed, and backward elimination, which removes code that even if executed would perform no useful function. An example of useless code is the middle statement of the following BASIC program: GOTO 30 PRINT "UNREACHABLE" 30 END The statement after a GOTO is always a potential candidate for being unreachable. These situations occur only rarely in source code. However, they often arise after other optimization techniques have been applied. 1. Forward Elimination The optimizer uses three techniques to identify code which can never be executed. They are listed in order of increasing thoroughness: a. Unlabelled Code Elimination (smaller) As in the BASIC example above, the optimizer can eliminate code that is always bypassed and never branched to. This technique removes code which follows a GOTO and has no statement number or label. b. Unreferenced Code Elimination (smaller). Even if the middle statement in the example did have a statement number, it still would never be executed. This technique removes code which has a statement number or label that follows a GOTO, but which is never the destination of a GOTO statement elsewhere in the program. c. Unreachable Code Elimination (smaller). Finally, suppose there was a GOTO statement which GOes TO the middle statement in the example. If that GOTO statement itself can never be executed, then neither will the middle statement, and we can eliminate both of them. This technique removes disconnected subgraphs of the program's flow graph. 2. Backward Elimination a. Dead Variable Elimination (faster, smaller). Most of the computations of a program are performed in the computer's registers, and are then stored in variables in the computer's memory. If the result of a computation is used while it is still in a register, there is no need to store it into memory. This technique removes the code which stores variables which are dead on exit from a basic block. b. Dead Computation Elimination (faster, smaller). If the result of a computation is never used at all, the code that computes the result can be eliminated as well as the code that stores the result in memory. This technique uses du-chains to remove code that computes and stores a result that has no next use in the current basic block and is dead on exit. c. Dead Code Elimination (faster, smaller). This technique removes code that has no effect on the output of the program. For example, in a routine which calls no other routines, we can eliminate a computation which cannot affect any global data, output statements, or the return value of the routine. B. Register Allocation Computations may be performed either in the computer's register or in its memory. The code will be faster and smaller if it uses the registers as much as possible, and avoids unnecessary movement of data between the registers and memory. Unfortunately, there are usually more variables in a program than registers in a computer. In addition, there are some languages which do not permit certain variables to reside in the registers. For example, variables whose addresses are taken in the language C must be located in memory. Whenever possible, the present invention allocates the limited number of registers to the most frequently used variables in the program. 1. Local Register Allocation (faster, smaller). Let the result of a computation stay in a register to make it more accessible for future use. For example, the result of the two multiplications shown below will be used immediately by an addition. PRINT A*B+C*D Therefore, the two products are kept in registers, rather than moving them into memory and back. 2. Local Register Allocation with Next-Use Information (faster, smaller). In the above example, the optimizer holds the products in two registers for use by the addition. Next-use information allows allocation of the two registers for new purposes after performing this addition. This technique immediately signals when used information is no longer needed in the valuable registers. 3. Available Register Allocation (faster, smaller). The above techniques allow the optimizer to remember only for a short period (within a basic block) the valuable results that are held in the registers. Available Register Allocation is a technique for keeping track of the contents of the registers, and eliminating the code to move a value from memory into a register when the value is already there. At the start of each basic block, the register descriptors are initialized to the intersection of the register descriptors at the end of each predecessor block. 4. Loop Based Register Allocation (faster). This technique keeps the induction variables of loops in a register. For example, the variable I is accessed each time the loop is executed,
______________________________________
FOR I = 1 TO 1000
. . .
NEXT I
______________________________________
so the optimizer keeps it in a register to avoid having to move it there from memory a thousand times. 5. Base Register Allocation (faster, smaller). There are two ways (or addressing modes) that an instruction can access a variable in memory. The instruction can specify the variable's memory address, or it can specify a register that holds the variable's memory address. The code will be faster and smaller if it uses the second addressing mode as much as possible. This technique allocates a base register to hold the address of frequently accessed variables. Note that a single base register can access many static variables in the same segment of memory. C. Code Motion A program can often be made faster and smaller by rearranging the order of the instructions. These optimizations need the Flow Graph, Data Flow information and, possibly, Chaining information. 1. Code Hoisting and Sinking (smaller). Sometimes the same piece of code occurs in different places in a program. Hoisting and sinking are techniques for removing repeated pieces of code and replacing them by single copy at one strategic point. Hoisting inserts the single copy at a point before all the places where the code was before; sinking inserts the single copy at a point after all the places where the code was before. More techinically, if a piece of code appears at the head of each successor of a basic block, the optimizer hoists the code out of the successors, and replaces it by a single copy at the tail of the basic block. Sinking is handled similarly. 2. Branch Minimization (faster, smaller). This technique moves the sections of the program around so that fewer GOTO's are necessary from section to section. It topologically sorts the basic blocks of the program. 3. Code Juxtaposition (faster). Code that is within a loop is executed many times; code outside of a loop is executed once. This technique moves instructions out of a loop so that fewer instructions will be repeatedly executed. For example, consider the BASIC WHILE loop:
______________________________________
WHILE(A = B)
. . .
WEND
END
______________________________________
The loop may be translated into the following three instructions: 1 if a<>b then goto 4 2 . . . 3 goto 1 4 end After code juxtaposition, the loop contains only two instructions. 1 goto 3 2 . . . 3 if a=b then goto 2 4 end 4. Constant Propagation and Folding (faster, smaller). If the optimizer can predict the value of a variable in a given statement, it will generate code which contains the variable's value instead of its memory address. This eliminates the need for code that copies the value from memory into a register. In addition, if the optimizer can predict the result of a computation, it will generate code that uses the result without performing the computation. Finally, knowing the values of variables can sometimes allows prediction of whether an IF statement is true or false, allowing an opportunity for Forward Elimination. (See Section A.) 5. Available Expression Elimination and Very Busy Expression Hoisting (faster, smaller) Are two techniques for replacing many similar computations by a smaller number of computations. The results of the replacement computations are stored in variables which are used in place of the deleted computations. Technically, if an expression is available in a basic block (i.e., computed on all paths into the block), the optimizer stores the value into a temporary variable as soon as it is computed. Similarly, if an expression is very busy in a basic block (i.e., computed on all paths out of the block), the optimizer hoists its computation into the basic block and stores the value into a temporary variable. In both cases, the optimizer accesses the temporary instead of recomputing the expression whenever there has not been a reassignment to the operands of the expression. 6. Loop Invariant Code Motion (faster). Instead of performing the same computation over and over inside a loop, perform it once before the loop begins. For example, in this BASIC program, FOR R=1 TO 3 PRINT "CIRCUMFERENCE IS"; 2 * PI * R NEXT R the multiplication of 2*PI yields the same product during each loop, so the optimizer needs to compute it only once. 7. Loop Unrolling (faster). In the above example, the program divides its time between the PRINT statement and the FOR/NEXT statements which serve only to repeat the PRINT statement. The FOR/NEXT statements are eliminated simply by producing three translations of the PRINT statement. PRINT "CIRCUMFERENCE IS"; 2 * PI * 1 PRINT "CIRCUMFERENCE IS"; 2 * PI * 2 PRINT "CIRCUMFERENCE IS"; 2 * PI * 3 Given a loop which runs a constant number of times (or a multiple of a constant number of times), the optimizer copies the body a number of times equal to some small divisor of the constant. Then each copy can be terminated simply by an update of the index variable rather than by an update of the index variable rather than by an update, test, and branch. Furthermore, if the index variable is not used within the body, the update can be done once (by a larger amount) at the end of the expanded body. D. Computation Alteration These techniques make loops smaller and much faster. 1. Induction Variable Strength Reduction (faster, smaller). Replace computations in a loop which involve the induction variables by faster and smaller computations. This is done by mimicking the computation using an arithmetic progression. 2. Induction Variable Elimination (faster, smaller) Combine several induction variables where each forms an arithmetic progression into a single induction variable. This is often applicable after Strength Reduction since it generates new induction variables. E. Control Structure Modification 1. Switch Optimization (faster, smaller). These techniques generate efficient code for IF and SWITCH statements to optimize time and/or space. Several data structures are possible for code generated for SWITCH, including a lookup table, index table, binary search table, near-perfect hash table, or a combination of conditional logic and some of the above. F. Miscellaneous 1. Linked Jump Elimination (faster). In the following BASIC program, 10 GOTO 100 20 GOTO 10 Statement 20 is translated as if it said GOTO 100. This eliminates unnecessary jumping around. Such situations often occur as a result of other optimizations. 2. Branch Reduction (faster, smaller). In native code, a GOTO to a distant instruction in the program takes more time and space than a GOTO to a nearer instruction. All GOTO's are made as fast and small as possible by checklng the distance from each GOTO to its destination. Note that shortening one GOTO instruction may shorten the branching distance of another GOTO, allowing opportunities for further improvement. 3. Linking Span Dependent Instructions (smaller). If one part of a program has many GOTO's to the same distant instruction, the optimizer replaces them all by "short" GOTO's to a single "long" GOTO whose destination is the distant instruction. G. Peephole Optimization 1. Context Sensitive Peephole Optimization (faster, smaller). The Philon System translates each statement of the source program into a separate list of native code instructions. The Peephole Optimizer can examine an arbitrarily large section of code, including instructions that were generated from separate source statements. It merges together instructions generated from adjacent statements, to take advantage of shortcuts which can not be anticipated during a statement-by-statement translation. The Peephole Optimizer changes patterns in the code into faster and smaller replacements. It performs context-sensitive pattern matching using a well-designed set of matching primatives are available to detect usage of a register later in the basic block and whether the register is live on exit from the block. H. Program Level Optimizations 1. In-line Routine Substitution (faster). In the same way that loops can be unrolled (see Section C), subroutine calls are unfolded by replacing each call by a copy of the subroutine. This frees the program from spending time executing the GOSUB/RETURN statements. Calls to small nonrecursive routines are replaced by the body of the routine, substituting formal parameters by actual arguments. 2. Subroutine Instantiation (faster). This is a combination of the previous technique with Constant Propagation (See Section C). It is used for subroutines that are frequently called with a constant argument. 3. Macro Compression (smaller). This is a more thorough technique than Hoisting and Sinking (See Section C) for removing repeated sections of code and replacing them by a single copy. Macro Compression catches repeated sections which are immune to Hoisting and Sinking, and changes them into calls to a subroutine which performs the same function. In the simple case, the synthesized subroutine has no parameters and replaces a constant sequence of instructions. This may be made even smaller by allowing the synthesized subroutine to take parameters so that it can replace different sequences of instructions. Although the present invention has been disclosed in terms of particular examples and embodiments, it should be clear that the scope of the invention is not limited to these disclosed examples. Rather, as will be readily apparent to one with ordinary skill in this art, many other implementations of the invention disclosed herein are possible. APPENDIX I This appendix defines the intermediate form (AST) used as the interface between passes one and two of the Philon COBOL compiler.
______________________________________
program: `a --cmpunt` iddiv envdiv datadiv procdiv
`a --endcmpunt`
iddiv: id
id: `a --id`
<hash value>
envdiv: `a --envdiv` configsect iosect samearea
configsect:
`a --config` spnames currsign dpcomma
spnames: `a --begin` switch* `a --end`
`a --empty`
line: `a --line` <line number> <column number>
optid: id
`a --empty`
currsign: optlit
optlit: lit
`a --empty`
lit: `a --lit` <literal 16-bit value>
dpcomma: flag
flag: `a --yes`
`a --no`
iosect: `a --begin` filecontrol* `a --end`
`a --empty`
samearea: `a --begin` areagroup* `a --end`
`a --empty`
areagroup: `a --begin` id* `a --end`
filecontrol:
`a -- filedef` line id isopt assigntowhat
org fstatus access keys `a --endfiledef`
isopt: flag
assigntowhat:
`a --text` filename
`a --binary` filename
`a --printer` optfilename*
lit
optifilename:
filename `a --empty`
filename: `a --item` forwardref
`a --command` forwardref
idlist: `a --begin` id* `a --end`
`a --empty`
org: `a --seq`
`a --rel`
`a --inx`
fstatus: optforwardref
optforwardref:
`a --item` forwardref
`a --empty`
forwardref:
`a --item` modname
`a --forward` modname
access: `a --seqacc` optforwardref
`a --rndacc` optforwardref
`a --dynacc` optforwardref
keys: `a --key` forwardref altkeys
`a --empty`
altkeys: `a --begin` altkey* `a --end`
`a --empty`
altkey: `a --altkey` hasdupls forwardref
hasdupls: flag
datadiv: `a --datadiv` filesect ussect linksect
commsect reptsect
filesect: `a --filesect` fdsds
fdsds: `a --begin` fdsd* `a --end`
`a --empty`
fdsd: `a --fd` line id fdclauses records
`a --sd` line id sdclauses records
fdclauses: `a --begin` fdclause* `a --end`
fdclause: `a --recsize` recsize
`a --labels` flag
`a --labval` labvallist
`a --datarecs` idlist
`a --linage` linage
`a --reports` idlist
recsize: `a --pair` name name
name
labvallist:
`a --begin` labval* `a --end`
labval: `a --check` forwardref
`a --wrlab` forwardref
`a --setlab`
forwardref
linage: forwardref optforwardref optforwardref
optforwardref
records: `a --begin` dd* `a --end`
`a --empty`
sdclauses: `a --begin` sdclause* `a --end`
sdclause: `a --recsize` recsize
`a --datarecs` idlist
wssect: `a --begin` dd* `a --end`
dd: `a --dd` line lit optid dclauses
dclauses: `a --begin` dclause* `a --end`
dclause: `a --redef` modname
`a --pic` name
`a --usage` usage
`a sign` sign
`a --occurs` range asckeys tabindexes
`a --just`
`a --blankzero`
`a --value` initval
`a --renames` renamed
`a --condval` condvallist
modname: `a --select` modname id
id
name: `a --name` <names table marker>
usage: `a --comp`
`a --display`
`a --index`
sign: `a --lead`
`a --trail`
`a --slead`
`a --strail`
range: `a --varrange` name name modname
name
asckeys: `a --begin` asckey* `a --end`
`a --empty`
asckey: `a --asckey` ascdesc modname
ascdesc: flag
tabindexes:
idlist
`a --empty`
initval: `a --fig` lit
`a --allfig` name
name
renamed: `a --pair` modname modname
`a --item` modname
condvallist:
`a --begin` conval* `a --end`
condval: `a --pair` id id
`a --item` id
linksect: `a --begin dd* `a --end`
commsect: stub
stub: `a --empty`
reptsect: stub
procdiv: `a --procdiv` line procprmlist
restofprocdiv
procprmlist:
`a --begin` procprm* `a --end`
`a --empty`
procprm: `a --prm` modname
restofprocdiv: `a --declaratives` declaratives sections
`a --sects` sections
`a --pargs` paragraphs
`a --stms` stms
declaratives:
`a --begin` declarative* `a --end`
declarative:
`a --decl` line para optlit usestate
paragraphs
usestate: `a --useio` useio
`a --usereport` vbl
`a --usedebug` stub
useio: `a --input`
`a --output`
`a --io`
`a --extend`
idlist
vbl: `a --subscr` modname subs
`a --fig` lit
`a --allfig` name
modname
subs: `a --begin` sub* `a --end`
sub: `a --plus` modname id
`a --minus` modname id
modname
sections: `a --begin` sections* `a --end`
section: `a --section` line para optlit paragraphs
paragraphs:
`a --begin` paragraph* `a --end`
paragraph: `a --para` line para stms
stms: `a --begin` state* `a --end`
state: `a --stm` line stm
srm: `a --libcall` id prms
`a --excpcall` id flag prms optstms
`a --call` vbl flag prmlist optstms
`a --accdisp` id flag optvbl prms optstms
`a --addcorr` flag vbl rndvbl optstms
`a --alter` para para
`a --compute` lit flag expr rndvbllist
optstms
`a --exitprog`
`a --goto` optpara
`a --cgoto` vbl lit litlist
`a --if` condition optstms optstms
`a --inspect` vbl tally replace
`a --merge` id asckeys idlist outproc
`a --move` lit vbl vbllist
`a --movecorr` vbl vbl
`a --perform` performed
`a --perftimes` vbl performed
`a --perfuntil` condition performed
`a --perfvar` initlist condlist performed
incrlist lastincr
`a --search` flag optstms vbl optvbl
whenlist
`a --srchall` flag vbl optstms searchcond
optstms
`a --set` lit vbl vbllist
`a --setup` lit vbl idlist
`a --setdown` lit vbl idlist
`a --sort` id asckeys inproc outproc
`a --string` flag vbl optvbl stringlist
optstms
`a --subtcorr` flag vbl rndvbl optstms
`a --unstring` flag vbl optvbl optvbl
delimlist targlist optstms
`a --endunstring`
`a --addto` addto
`a --addgiving` addgiving
`a --divinto` divinto
`a -- divgiving` divgiving
`a --rmndr` rmndr
`a --multby` multby
`a --multgiving` multgiving
`a --subtfrom` subtfrom
`a --subtgiving` subtgiving
prmlist: `a --begin` coprm* `a --end`
`a --empty`
coprm: `a --prm` vbl
prms: `a --begin` prm* `a --end`
prm: `a --nullprm`
`a --nulldesc`
`a --numprm` vbl
`a --intprm` vbl
`a --prm` vbl
`a --litprm` lit
`a --alfaprm` vbl
`a --fileprm` id
`a --keyprm` vbl
`a --recprm` vbl
`a --intoprm` vbl
`a --fromprm` vbl
`a --startprm`
vbllist: `a --begin` taggedvbl* `a -- end`
taggedvbl: `a --vbl` vbl
rndvbllist:
`a --begin` rndvbl* `a --end`
rndvbl: `a --rnd` vbl
`a --vbl` vbl
optstms: stms
`a --empty`
optmodname:
taggedmodname
`a --empty`
modnamelist:
`a --begin` taggedmodname* `a --end`
taggedmodname:
`a --item` modname
tally: `a --begin` tallyitem* `a --end`
`a --empty`
tallyitem: `a --tally` vbl tallywhat beforeafter
tallywhat: `a --all` vbl
`a --lead` vbl
`a --chars`
beforeafter:
APPENDIX 2 I. Attributes of Symbol Table (continued) A. Possible Values for Base Type 1. at.sub.-- int: integer (medium-size) 2. at.sub.-- sint: short integer 3. at.sub.-- lint: long integer 4. at.sub.-- char: character 5. at.sub.-- uchar: unsigned character 6. at.sub.-- fit: floating-point (medium-size) 7. at.sub.-- sflt: short floating-point 8. at.sub.-- lflt: long floating-point 9. at.sub.-- enum: enumeration type 10. at.sub.-- enmb: member of enumeration 11. at.sub.-- unsg: unsigned integer (medium-size) 12. at.sub.-- sunsg: short unsigned integer 13. at.sub.-- lunsg: long unsigned integer 14. at.sub.-- pntr: pointer type 15. at.sub.-- bit: bit field 16. at.sub.-- arr: array 17. at.sub.-- strct: structure 18. at.sub.-- union: union 19. at.sub.-- string: string 20. at.sub.-- darr: dynamic array 21. at.sub.-- bcd: bcd arithmetic type 22. at.sub.-- blck: a block 23. at.sub.-- lbl: label 24. at.sub.-- proc: procedure 25. at.sub.-- pkg: package B. ATTRIBUTES FOR BASE TYPE at.sub.-- enum, at.sub.-- enmb An enumeration type is specified by listing the values an item of this type can take on. The "enumeration list" field of the enumeration type entry points to an "enumeration member" type entry. This first enumeration member has a pointer, the "enumeration object" field, to a constant object of this enumeration type. This object is initialized to zero. Subsequent values in the enumeration each cause an "enumeration member" type entry and a corresponding object entry to be made. The enumeration members are chained together. The objects in turn are initialized to 0, 1, 2, . . . 1. enumeration list (enlst) (blk type) For an item of base type "at.sub.-- enum", this field holds the block table of the first "enumeration member" field of the enumeration. 2. enumeration link (enlk) (blk type) For an item of base type "at.sub.-- enmb", this field holds the block table marker of the next "enumeration member" field of the enumeration. 3. enumeration object (enobj) (blk type) For an item of base type "at.sub.-- enmb", this field holds the block table marker of an object which is the corresponding item in the enumeration. C. BASE TYPE at.sub.-- arr, at.sub.-- ardx For array types, we must indicate the element type of the array and also the number of dimensions and the range of each dimension. To do this, an array type entry has an "element type" field and an "index list" field. The index list field leads to a type entry of base type "at.sub.-- ardx". Array index (at.sub.-- ardx) types contain two fields. The first (the "index pointer") points to a type entry which gives the range of this dimension. The latter type entry is thus a subrange type or a constrained integral type. The second field is the "index link", and points to the next array index for the array (giving the bounds for the next dimension). If this is the last dimension of the array, the "index link" is zero. One point: a multi-dimensional array can be represented in two ways. As an array of scalar type with two array indexes, or as an array of element type array with one array index. 1. element type (elty) (blk.sub.-- type) For a type of base type "at.sub.-- arr", this field points to a type entry which is the element (member) type of the array. 2. ex list (inlst) (blk.sub.-- type) For a type of base type "at.sub.-- arr", this field points to an "at ardx" type entry which gives bound information for the first dimension of the array. 3. row or column order (row) (bool) For a type of base type "ar.sub.-- arr", this field says whether storage is to be allocated for the array in row-major or column-major order. Values are: true: row-major order false: column-major order 4. number of elements (nelt) (unt16) For a type of base type "at.sub.-- arr", this field provides the number of elements in the array. 5. number of dimensions (ndim) (unt8) For a type of base type "at.sub.-- arr", this field gives the number of dimensions in the array. 6. index pointer (inptr) (blk type) For a type of base type "at.sub.-- ardx", this field gives the type of the array bound. This type is either a subrange type or a constrained integral type. The actual upper and lower bounds can be found by investigating this type. 7. index link (inlk) (blk.sub.-- type) For a type of base type "at.sub.-- ardx", this field gives the next array index (at.sub.-- ardx) for the next dimension of the array. If zero, this is the last dimension of the array. D. BASE TYPE at.sub.-- strct, at.sub.-- stfd Structures or records have base type "at.sub.-- strct". For such a type, the "field list" attribute points to the first member of the structure, which has base type "at.sub.-- stfd". Each structure field type entry has a "field pointer" attribute, which gives the type of the field, a "field name" for its name, and a "field link" field, to get to the next member of the structure. It also contains a "field parent" field, which points back to the structure of which it is a member. 1. field list (fdlst) (blk.sub.-- type) For a type of base type "at.sub.-- strct", this field points to the first structure field (base type "at.sub.-- stfd") of the structure. 2. number of fields (nfld) (unt8) For a type of base type "at.sub.-- strct", this field gives the number of fields in the structure. 3. record block number (rblk) (bindx.sub.-- type) For PASCAL and MODULA.sub.-- 2 only. In these languages, a reference to a structure field is of the form "r.f" where r is the record (structure) and f is a field. We regard the reference to "r" as opening up a new block where the fields or r become visible. Thus, in these languages, a new scope is created for each record, and the block number where its fields lie is saved in this attribute. 4. field name (fdnm) (nm.sub.-- type) For a type of base type "at.sub.-- stfd", this field gives the name table marker for the field's name. 5. field pointer (fdptr) (blk.sub.-- type) For a type of base type "at.sub.-- stfd", this field points to the type entry which gives the type of the field. 6. field link (fdlk) (blk.sub.-- type) For a type of base type "at.sub.-- stfd", this field points to the next structure field belonging to the same structure. 7. field parent (fdpar) (blk.sub.-- type) For a type of base type "at.sub.-- stfd", this field points back to the object or structure field whose type is the structure of which this item is a field. This attribute is provided for such languages as PL/I and COBOL where a field may be referenced without naming the parent. 8. displacement (disp) (unt32) For a type of base type "at.sub.-- stfd", this field gives the displacement (offset) of the field from the start of the structure. E. BASE TYPE at.sub.-- union, at.sub.-- unmb Unions are similar to structures. 1. member list (mblst) (blk type) For a type of base type "at.sub.-- union", this field gives the first member of the union. 2. number of members For a type of base type "at.sub.-- union", this frid gives the number of members in the union. 3. member name (mbnm) (nm.sub.-- type) For a type of base type "at.sub.-- unmb", this field gives the name table marker for the member's name. 4. member pointer (mbptr) (blk.sub.-- type) For a type of base type "at.sub.-- unmb:, this field gives the type of the member. 5. member link (mblk) (blk.sub.-- type) For a type of base type "at.sub.-- unmb", this field gives the next member of the union. 6. member parent (mbpar) (blk.sub.-- type) For a type of base type "at.sub.-- unmb", this field points to the block table marker for the objector structure field whose type is the union of which this field is a member, or if the union is anonymous is the immediate enclosing structure. F. BASE TYPE at.sub.-- darr This means the corresponding object is a dynamic array. As such no further information is available at compile time. The object itself generally has received a standard initialization (at.sub.-- desc) and will be handled by run-time routines. 1. element type (elty) (blk.sub.-- type)--Same as for types with base type "at.sub.-- arr". 2. Index list (inlst) (blk.sub.-- type)--Same as for types with base type "at.sub.-- arr". 3. number of dimensions (ndim) (unt8)--Same as for types with base type "at.sub.-- arr". 4. row or column order (row) (bool)--Same as for types of base type "at.sub.13 arr". G. BASE TYPE at.sub.-- bcd This means the corresponding object is a binary-coded-decimal numeric item. Such a type has two attributes, "number of digits" and "number of decimal places". Actually, this base type was introduced for BASICs, where these attributes are not used, since all such objects have a standard size. However, in PL/I, these fields will be used. 1. number of digits (dgt) (unt16)--The number of digits represented. 2. number of decimal places (dcpt) (unt16)--The number of decimal places represented. H. BASE TYPE at.sub.-- string This indicates the corresponding object is a string. It is used in BASICs only. In CBASIC, only the string itself is allocated (for constant; for variables, a pointer is allocated since the strings are dynamic). In MBASIC, a descriptor consisting of length, pointer to string, and the string is allocated for constants. For variables, space for a length and a pointer are allocated. There is no further attribute for this base type. I. BASE TYPE at.sub.-- fstring A FORTRAN or PL/I character string. The corresponding object in general has two relevant sizes. The first is the maximal size it can take on. This is the size to allocate. The second is the "logical" size the string has at any given moment at run-time. This is due to varying-length strings in PL/I and to the substringing operation in FORTRAN. 1. allocated length (alleng) (unt16) The allocated length of the string. 2. working length (wkleng) (blk.sub.-- type) Block table marker of object giving working length of the string. 3. is varying (vary) (bool) Is the string varying in length (PL/I)? If so, an integer must be allocated for the string's length, which is used in run-time routines. Values are: true: it is varying false: no it's not J. BASE TYPE at.sub.-- numeric A COBOL numeric data item. This has the fields listed below. 1. length (leng) (unt16) The size of the item (in bytes). 2. decimal places (dp) (int8) Number of decimal places of item (can be negative). 3. sign information (sign) (unt8) Tells whether the item is signed or not and the internal representation of the sign. Values are: UN: not signed LS: sign is leading, separate character LN: sign is leading, not separate character TS: sign is trailing, separate character TN: sign is trailing, not separate character 4. is computational? (comp) (bool) Is this a COMPUTATIONAL data item? If so, will be allocated like a native integer. Values are: true: it is false: it is not K. BASE TYPE at.sub.-- numed A COBOL numeric edited item with the following fields: length 1. (leng) (unt16) The length of item. 2. decimal places (dp) (int8) Number of decimal places. 3. picture (pic) (blk.sub.-- type) Pointer to the block table entry for the item's picture (which is passed along with the item to run-time routines). blank 4. when zero? (bwz) (bool) Is the item to appear blank when it has the value zero? Values are: true yes false: no L. BASE TYPE at.sub.-- alfnum: A COBOL alphanumeric edited item with the fields: 1. length (leng) (unt16) Length of the item. 2. is justified (just) (bool) Is the item right justified? Values are: true: yes false: no M BASE TYPE at.sub.-- alfa: A COBOL alphabetic item with fields: 1. length (leng) (unt16) length of the item 2. is justified (bist) bool) Is the item right justified. Like above 3. picture (pic) (blk type) Pointer to the block table marker for the item's picture. If the picture is simple (no insertion of blanks), this value will be zero. N. BASE TYPE at.sub.-- alfed: COBOL alphanumeric edited item. Fields are: 1. length (leng) (unt16) Length of item. 2. is justified (just) (bool) Is it right justified? Like above. 3. picture (pic) (blk type) Block table markeer for picture of item. O. BASE TYPE at.sub.-- para COBOL paragraph or section. The base type "label" is not used because different information must be kept. 1. is a section? (issect) (bool) Is this a paragraph or a section? true: a section false: a paragraph 2. parent section (sect) (blk.sub.-- type) If a paragraph, the block table marker of the section to which it belongs. If a section or a paragraph in a program without sections, it is `blk.sub.-- null`. 3. referenced outside of it section? (extref) (bool) Has this paragraph been referenced anywhere outside of the section in which it is defined? Values are: true: yes false: not 4. number of times defined? (dfcount) (unt8) Number of times a paragraph with this name has been defined. (this and the preceding field are used to check for invalid paragraph references in COBOL.) 5. next paragraph (next) (blk.sub.-- type) The block table marker of a pointer which is set to point to the following paragraph. Space for this pointer is allocated, as there are statements in COBOL which change its value (ALTER and PERFORM). 6. saved next paragraph (save) (blk.sub.-- type) The block table marker of a pointer which points to the next physical paragraph. This value is used at run-time to restore the value of the `next` pointer after a return from a PERFORM. P. BASE TYPE at.sub.-- cond A COBOL condition (88-item). This is an item which is either true or false according to whether a referenced item has a given value(s) or range of value(s). 1. referenced object (cndobj) (blk.sub.-- type) The block table marker for the object whose value is being inquired. Note that if the object is subscripted, the pointer will be to a temporary. 2. value list (cndlst) (blk.sub.-- type) The block table marker for the first condition value against which to test the referenced object. Q. BASE TYPE at.sub.-- cndvl A condition value. This gives the value or range of values against which the referenced object in a condition is tested. 1. low value (locnd) (blk.sub.-- type) The block table marker of an object which gives the low value of a range. 2. high value (hicnd) (blk.sub.-- type) The block table marker of an object which gives the high value of a range. If the comparison is against a single value rather than a range, the pointer is to the single value and the `low value` is unused. 3. is a single value? (sngl) (bool) Is the condition a single value or a range? true: single value (use `high value` for the value) false: range (use `low value` and `high value`) 4. next condition value (condlk) (blk.sub.-- type) The block table marker of the next condition value against which to test the referenced object. R. BASE TYPE at.sub.-- index An INDEX data item in COBOL. This is implemented identically to an integer, but is a separate data type in order to check against COBOL semantics. There are no further attributes. S. BASE TYPE at.sub.-- blck Entry made for a block (an inter scope). In most languages, this corresponds to a new frame. However, in `C` and in MODULA-2 items in the block should be allocated at the same level as the current scope, as the block has only a significance with regard to visibility. 1. sub-block-index (sblk) (unt16) The block index for the corresponding block in the symbol table which this entry refers to. 2. lexical level (bklev) (unt8) The static nesting level of this block. Used to maintain the display. T. BASE TYPE at.sub.-- proc Indicates a procedure type. Such an entry has a "parameter list" field which leads to its formal parameters. It also has a "return type" field. Like a block entry, it has "sub-block-index" and "lexical level" fields. Notice a procedure, unlike a block in `C` or MODULA-2 always creates a new frame. 1. parameter list (pmlst) (blk.sub.-- type) Block table marker for first formal parameter of the procedure. Value is zero if procedure has no parameters. 2. return type (rtty) (blk.sub.-- type) Block table marker for return type of procedure. Value is zero if procedure does not return anything. 3. sub-block-index (sblk) (unt16) Block number for block corresponding to this procedure in the symbol table. 4. lexical level (bklev) (unt8) Static nesting level of the procedure. 5. number of parameters (nprm) (unt8) The number of parameters to the procedure 6. routine number (rout) (unt8) For built-in procedures in certain languages only. This is an index into a pre-defined array of routines which handle special built-in functions of the language. U. BASE TYPE at.sub.-- parm Objects having this type are parameters. This entry is used to chain the parameters of a procedure together. 1. parameter mode (pmode) (unt8) Indicates direction of information flow between formal parameter and corresponding actual parameter. Values are: a. at.sub.-- in: input parameter (may not be altered) b. at.sub.-- out: output parameter (undefined on entry) c. at.sub.-- inout: input-output parameter 2. parameter call (pcall) (unt8) Indicates method of parameter passing. Values are: at.sub.-- val: call by value at.sub.-- ref: call by address 3. parameter object (poptr) (blk.sub.-- type) Block table marker of the object which is the formal parameter. 4. parameter type (ptptr) (blk.sub.-- type) Block table marker for the type of the formal parameter. 5. parameter link (pmlk) (blk.sub.-- type) Block table marker for the next entry (of base type "at.sub.-- parm") for the next parameter of the procedure. 6. is conformant parm (conf) (bool) Is this a conformant array parameter? Used only in ISO PASCAL. Values are: true: it is false: it isn't V. BASE TYPE at.sub.-- undf This entry indicates the corresponding object is undefined. Items in error are generally given this type to avoid generating extra error messages. There are no other attributes. W. BASE TYPE at.sub.-- pkg Indicates a package (ADA). X. BASE TYPE at.sub.-- def Indicates a defined type (ADA). 1. defined type (dfty) (blk.sub.-- type) Block table marker for the type which this entry is defining. Y. BASE TYPE at.sub.-- log A logical (or boolean) type. Should always be allocated like a medium-size integer. Has no further attributes. Z. BASE TYPE at.sub.-- sbrng A subrange type. Has a "subrange type" field, indicating what type it is a subrange of. Also has "lower bound" and "upper bound" fields, pointing to objects giving the range. 1. subrange of (sbtp) (blk.sub.-- type) Block table marker for type of which this type is a subrange. 2. lower bound (lbd) (blk.sub.-- type) Block table marker for object giving lower bound of range. 3. upper bound (ubd) (blk.sub.-- type) Block table marker for object giving upper bound of range. AA. BASE TYPE at.sub.-- bit The corresponding object is a bit field. The fields include a "bit width" field for the size (in bits) to allocate, a "working width" field for the item's current length, and an "is aligned" field for PL/I aligned bit fields. 1. bit width (btwidth) (unt16) The number of bits to allocate for this item. May be zero for parameters and based items (in PL/I) or for zero-length bit fields (in `C`). 2. working width (wkwidth) (blk.sub.-- type) Block table marker for item which holds the run-time width of the field (passed to run-time routines). 3. is aligned (bitalgn) (bool) Is the field to be aligned on a byte boundary? Values are: true: yes false: no 4. bit offset (btofset) (unt8) The length in bits from the start of the byte in which this bit field starts. AB. BASE TYPE at.sub.-- lbl Label type. The only attribute gives the nesting level of the label. 1. nesting level (bklev) (unt8) The static nesting level of block in which this label is defined (to handle jumps out of current scope). AC. BASE TYPE at.sub.-- ieee, at.sub.-- sieee, at.sub.-- lieee Medium short, and long size real numbers represented internally in IEEE notation. Standard sizes only, so no further attributes. AD. BASE TYPE at.sub.-- file Indicates an object of this type is a file. In PASCAL, we will store the file descriptor here, so an integer will be allocated. In COBOL, an entry for the file table will be allocated for each file. 1. file type (fltp) (blk.sub.-- type) Block table marker for type entry which indicates file consists of sequence of components of this type (PASCAL). 2. file buffer (fbuff) (blk.sub.-- type) Block table marker for object which is the file's buffer area. 3. Linage counter (lcount) (blk.sub.-- type) Block table marker for the object which will serve as the LINAGE-COUNTER for the file (COBOL). 4. is a sort file? (issort) (bool) Is this a sort file? (COBOL). Values: true: yes false: no 5. File organization (org) (unt8) a. at.sub.-- seq: sequential b. at.sub.-- rel: relative (random) c at.sub.-- inx: indexed sequential 6. key list of file (fkylst) (blk.sub.-- type) Block table marker for the first (relative or indexed) key for the file. AE. TYPE at.sub.-- table A COBOL table. Similar to an array, but in addition can vary in size (maximum limit given by some data item) and may be sorted according to certain keys. It also may contain a list of its own indexes. 1. element type (elty) (blk.sub.-- type) Like base type "at.sub.-- arr", the element type of the table. 2. index list (inlst) (blk.sub.-- type) Also like "at.sub.-- arr", block table marker for an "at.sub.-- ardx" entry which will give array bounds. If the table is variable-size, the upper array bound is the (constant) maximum value. 3. variable bound (varbnd) (blk.sub.-- type) Block table marker for object which will give dynamic upper bound to size of table. If zero, table is fixed in size. 4. key list (kylst) (blk.sub.-- type) Block table marker for a key type entry (base type "at.sub.-- key") which is the first (primary) key by which the table is sorted. 5. index (COBOL) list (ixlst) (blk --type) Block table marker for an index chain entry (base type "at ixchn") which will lead to the first index associated with the table. Is zero if no indices are associated with the table. 6. The number of elements (nelt) (unt16) The number of elements in the array (see under base type "at.sub.-- arr"). BASE TYPE at.sub.-- key COBOL only. Used to chain together keys of table. 1. key pointer (kyptr) (blk.sub.-- type) Block table marker of object (which is member of the table) which serves as the key. 2. key mode (kymode) (at.sub.-- type) Whether the key is ascending or descending. Values are: a. at.sub.-- asckey: ascending key b. at.sub.-- deskey: descending key 3. key link (kylk) (blk.sub.-- type) Block table marker for next key in list. AG. BASE TYPE at.sub.-- ixchn COBOL only. Used to chain together indexes of table. 1. index object (ixobj) (blk.sub.-- type) Block table marker for the object which is the corresponding index. 2. index link (ixlk) (blk.sub.-- type) Block table marker for the member of the index chain. AH. BASE TYPE at.sub.-- set A set type. Currently, allocate objects which are sets as integers. For now, we will not allow sets to contain more member than there are bits in a medium-size word. 1. set type (setp) (blk.sub.-- type) Block table marker for the type which this entry is a set of. AI. BASE TYPE at.sub.-- char, at.sub.-- uchar Character or unsigned character type. There are no attributes. AJ. BASE TYPE at.sub.-- flt, at.sub.-- sflt, at.sub.-- lflt AK. BASE TYPE at.sub.-- ieee, at.sub.-- sieee, at.sub.-- lieee Medium-size, short, and long floating-point types. `flt` means hardware floating-point representation; `ieee` refers to IEEE representation. Attributes are: lower bound (lbd) (blk.sub.-- type) upper bound (ubd) (blk.sub.-- type) These attributes exist due to the misguided belief that there were such things as subranges of floating-point types. AL. BASE TYPE at.sub.-- cmblk A COMMON block (FORTRAN). Identifies the object whose type this is as a common block. Has the following attributes: 1. common-block list (cmlst) (blk.sub.-- type) Block table marker for the first item in a linked list for the same common block, but belonging to a different procedure. AM. BASE TYPE at.sub.-- cmitem An item in a common block linked list. This type is used to chain the members together. 1. Common-block object (cmobj) (blk.sub.-- type) Block table marker for the object in the list 2. common-block link (cmlk) (blk.sub.-- type) Block table marker for the next item in the linked list. AN. BASE TYPE at.sub.-- pntr A pointer type. Attribute is "target type". 1. target type (trty) (blk.sub.-- type) Block table marker of the type the pointer points to. AO. BASE TYPE at.sub.-- ucsdstring The associated object is a STRING in UCSD PASCAL. Its length is the only attribute. 1. length of string (leng) (unt16) The string's maximum length. The logical length of the string may vary dynamically, so the allocator must also allocate an integer in front of the string which will hold the current length. AP. BASE TYPE at.sub.-- arglist An argument list (at present COBOL only). This is a collection of parameters, a pointer to which will be passed to a run-time procedure. It is always initialized via q-code, and its length is always calculated from the intialization values themselves. There are no attributes. II. ACTUAL LAYOUT OF ATTRIBUTES IN BLOCK TABLE ENTRIES We now describe where each attribute is stored in the entry in the symbol table. This information is not needed by a user of the symbol table module. It is internal to the system and maintained by the macros in the file "s3mac.h", which must be consistent with this layout. A. All Entries: byte 0 bits 0-3: size of the entry in bytes/2 bits 4-7: for objects: the storage class (stcl) for integral types: Is constrained (cnstr) for parameter types: the parameter mode (pmode) byte 1 bits 0-1: the visibility (vis) bit 2: for procedure objects only: is referenced (refd) bits 3-4: alignment (algn) bit 5: for types: is examined (exam) bit 6: for objects: is constant (const) bit 7: nature (ntr) bytes 2-3: has table marker of entry (htm) bytes 4-7: for objects: for procedure and block types: size of the frame (osize) bytes 8-9: back-chain marker (bkchn) B. For Objects Only: bytes 10-11: type reference (tref) bytes 12-15: initialization value: integer value (invl) 12-15 unsigned int value (unvl) 12-15 character value (chvl) 12 q-code pointer (qdpt) 12-15 name table marker (nmpt) 12-13 float value (ftvl) 12-13 string value (stvl) 12-13 label value (lbvl) 12-15 bcd value (bcd) 12-13 if initialization is for complex: real part (cxl) 12-13 imaginary part (cx2) 14-15 if initialization is for pointer: pointer object (ptvl) 12-13 pointer offset (ptofs) 14-15 if storage class is `eqv` (equivalence), this will instead hold: equivalence link (eqlk) 12-13 equivalence offset (eqoff) 14-15 if storage class if `fbuff` (file buffer), this will instead hold: file to which is buffer (tobuff) 12-13 next file buffer (nxtbuf) 14-15 bytes 16-19: address (for static objects) or offset (for parameters, temporaries, or automatics) (addr) byte 20: trap flag (for source debugger), register allocation flag (code generator), or local modes (source debugger--for procedure objects) byte 21: bits 0-3: size of entry in bytes/2 bits 4-7: kind of initialization (kdin) C. For Types Only Byte 10: base type (bsty) D. Depending on the base type, the size of type entries vary. The last byte of every entry is the size of the entry itself, given in bytes/2. 1. BASE TYPE at.sub.-- char 2. BASE TYPE at.sub.-- uchar 3. BASE TYPE at.sub.-- pkg 4. BASE TYPE at.sub.-- undf 5. BASE TYPE at.sub.-- string 6. BASE TYPE at.sub.-- log 7. BASE TYPE at.sub.-- cmplx byte 11: size of entry 10. BASE TYPE at.sub.-- int 11. BASE TYPE at.sub.-- sint 12. BASE TYPE at.sub.-- lint 13. BASE TYPE at.sub.-- flt 14. BASE TYPE at.sub.-- sflt 15. BASE TYPE at.sub.-- lflt 16. BASE TYPE at.sub.-- unsg 17. BASE TYPE at.sub.-- sunsg 18. BASE TYPE at.sub.-- lunsg 19. BASE TYPE at.sub.-- sieee 20. BASE TYPE at.sub.-- ieee 21. BASE TYPE at)lieee bytes 11-12: lower bound (lbd) Bytes 13-14: upper bound (ubd) byte 15: size of entry 22. BASE TYPE at.sub.-- pntr bytes 11-12: target type (trty) byte 13: size of entry 23. BASE TYPE at.sub.-- arr bytes 11-12: element type (elty) bytes 13-14 index list (inlst) byte 15: number of dimensions (ndim) byte 16 | ||||||
