Method for generating short form instructions in an optimizing compiler
4763255
Abstract
A method for improving the quality of code generated by a compiler or assembler, for a target machine that has short and long forms of some of its instructions, with the short forms executing faster or occupying less space. The method first determines which bits of the result of each computational instruction are significant, by a backwards pass over the program that is similar to liveness analysis. The significant bits thus computed are then used to guide the code selection process to select the most efficient instruction that computes the correct result in all the significant bit positions.
Claims
Having thus described our invention, what we claim as new, and desire to secure by Letters Patent is:
Description
FIELD OF THE INVENTION
______________________________________
ADD.L r1,r2 ADDI.L #123,r1
ADD.W r1,r2 ADDI.W #123,r1
ADD.B r1,r2 ADDI.B #123,r1
______________________________________
ADD.L (add long) adds the entire 32-bit contents of register r1 to register r2, and places the result in r2. ADD.W (add word) adds the rightmost 16 bits of r1 to the rightmost 16 bits of r2, leaving the leftmost 16 bits of r2 unchanged. ADD.B (add byte) adds the rightmost eight bits of r1 to the rightmost eight bits of r2, leaving the leftmost 24 bits of r2 unchanged. Similarly, ADDI.L (add immediate long) adds a number (123 is shown) to the entire 32-bit contents of register r1, ADDI.W adds to the rightmost 16 bits, and ADDI.B adds to the rightmost eight bits. The instructions ADD.W and ADD.B execute faster than ADD.L, and hence are preferred in a situation in which either would do. The instructions ADDI.W and ADDI.B execute faster and occupy less storage than ADDI.L, and hence are preferred to ADDI.L. The Motorola MC68000 has many other instruction types that exist in "long" and "short" forms, with the shorter form being faster in execution and often occupying less storage. Further details, including instruction timings, may be found in: MC68000 16-bit Microprocessor User's Manual, Second edition, Motorola, Inc., (January 1980). As an example of the code improvement accomplished by this invention, suppose a compiler has generated the instruction sequence:
______________________________________
ADD.L r2,r1
SUBI.L #16,r1
MOVE.W r1,6(r5)
______________________________________
and suppose further that the MOVE.W instruction, which stores the rightmost 16 bits of register r1 into storage at a location addressed by the contents of register r5 plus 6, is the last use of register r1. Then this invention will replace the ADD.L instruction with ADD.W, and the SUBI.L instruction with SUBI.W. The latter forms execute faster than the former, and the SUBI.W instruction occupies less storage than SUBI.L.
DESCRIPTION OF THE PRIOR ART
A number of computer data bases were searched for prior art. No art relevant to this invention was found.
SUMMARY AND OBJECTS OF THE INVENTION
It is a primary object of the present invention to provide an optimizing compiler with a module that replaces certain generated instructions with instructions that are equivalent in the sense that they do not change the overall computation performed by the program, but are preferable in that they execute faster and/or occupy less storage. It is a further object of the invention to utilize a generalization of the concept of "liveness" analysis of a program. It is another object to utilize this generalization for calculating which bits of a general register are "live" at a point in the program, rather than calculating only a summary bit which indicates whether or not any bit in the register is live, as is the usual practice. In the following description this generalization is called "significant bit analysis."
DESCRIPTION OF THE DRAWINGS
FIG. 1 is a very high level functional flowchart of an optimizing compiler in which the present invention has particular utility.
FIG. 2 is a high level flowchart of the herein disclosed compiler module for effecting the desired significant bit analysis.
FIG. 3 is a more detailed flowchart illustrating how some of the computational types of instructions are processed by the herein disclosed compiler module.
DESCRIPTION OF THE PREFERRED EMBODIMENT
The invention will be described as it fits into an optimizing compiler, and for the Motorola MC68000 target machine. The first step in applying this invention is to do "significant bit analysis" of the program being compiled. This is a process of determining, for each instruction, which bits of the result of that instruction are "significant" (it is assumed, for simplicity, that there is only one result). A bit is significant if it is used by the program after being computed. This is the same concept as the well known "liveness," except that each bit of the result is examined to determine whether or not it is "live," or "significant." In conventional liveness analysis, only a single summary bit is computed for each instruction, which indicates whether or not any bits of the result are "live," or "significant." There are three levels at which significance analysis could be done:
1. Globally, as ordinary liveness analysis is done,
2. On a basic block (actually branch-to-branch) level, with assistance from global liveness analysis, or
3. On a basic block (or branch-to-branch) level with no assistance.
The first choice above is the most expensive to compute but gives the best results. The last is the cheapest to compute but gives the poorest results. The second is a compromise that is only a little more expensive to compute than (3) if liveness analysis has been done anyway, and gives results of intermediate quality. Choice (2) will be described herein; however, the invention is not intended to be limited to this particular approach. FIG. 1 shows where level (2) of significant bit analysis (block 5) is most conveniently done within the framework of a typical optimizing compiler. The important thing is that global code optimization (block 3) is done before significant bit analysis.
This is because conventional liveness analysis is part of global code optimization, and we will need the "last use" bits that are a byproduct of liveness analysis. It is preferable, although not necessary, that register allocation (block 4) be done before significant bit analysis. This permits a more efficient compiler, because if register allocation is done first, significant bit analysis can be done in terms of the real registers of the machine, and there is usually a fairly small number of them (e.g., 16 on the Motorola MC68000). Significant bit analysis has to be done before final code generation (block 6). The final code generation module of the compiler will use the results of significant bit analysis to determine what form (8-, 16-, or 32-bit) of each instruction to generate.
FIG. 2 is a high level flowchart of the significant bit analysis shown in block 5 of FIG. 1. The analysis is done in a single backwards pass over the program. Although the processing shown here could be done on a basic block basis, it is shown being done on a branch-to-branch basis. This is just as easy to program, and it will sometimes result in better quality code being generated. Thus label points in the program are ignored (block 3). However, branch instructions (block 5) result in resetting the program's knowledge of which bits are significant in each register, to the state "all bits of all registers are presumed to be significant." This is the safe state to assume at those points at which the program has no information. If an instruction is not one that represents a label point, and is not a branch or subroutine "return" instruction, then it is an ordinary computational instruction such as "add," "load," "store," "shift," etc. The processing of computational instructions, shown as block 6 of FIG. 2, is shown in more detail in FIG. 3.
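The branch-to-branch backwards pass of FIG. 2 can be sketched roughly as follows. This is an illustrative Python sketch, not the subroutine of Appendix I: the instruction representation (a dict with a `kind` field), the names `scan_backwards` and `process_computational`, and the use of one 32-bit integer mask per register are all assumptions made for the example.

```python
ALL_SIGNIFICANT = 0xFFFFFFFF   # every bit of a 32-bit register presumed significant
NUM_REGS = 16                  # register count, as on the MC68000


def scan_backwards(instructions, process_computational):
    """Scan a list of instructions from last to first.

    Each instruction is a dict with a hypothetical 'kind' field:
    'label', 'branch' (also standing for subroutine return), or 'compute'.
    Returns the per-register significance masks at the top of the list.
    """
    sbt = [ALL_SIGNIFICANT] * NUM_REGS          # safe starting state
    for ins in reversed(instructions):
        if ins["kind"] == "label":
            continue                            # label points are ignored
        if ins["kind"] == "branch":
            # No information is available here: presume all bits significant.
            sbt[:] = [ALL_SIGNIFICANT] * NUM_REGS
            continue
        # Ordinary computational instruction: handled per instruction type.
        process_computational(ins, sbt)
    return sbt
```

The caller supplies `process_computational`, which plays the role of the per-instruction processing detailed in FIG. 3 and updates the mask list in place.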
This program works with an array of bits referred to herein as the significant bit table (SBT), that has a number of rows equal to the number of registers on the machine (e.g., 16 on the Motorola MC68000), and a number of columns equal to the register length of the machine (32 on the MC68000). At a typical point in processing, the array might look like this:
______________________________________
32 bits
______________________________________
0. 0000FFFF 8. FFFFFFFF
1. FFFFFFFF 9. 0000FFFF
2. FFFFFFFF 10. 000001FF
3. FFFFFFFF 11. 7FFFFFFF
4. FFFFFFF0 12. 0F0F0F0F
5. 000000FF 13. 0000FF00
6. FFFFFFFF 14. FFFFFFFF
7. FFFFFFFF 15. FFFFFFFF
______________________________________
Here we have shown the bits in hexadecimal notation, e.g., "0000FFFF" denotes 16 zero-bits followed by 16 one-bits. The values in the array change as the program scans backwards in the instruction stream. If, at a certain point, the array has the values shown above, then the meaning is that at that point, the leftmost 16 bits of register 0 are not significant ("dead"), but the rightmost 16 bits are significant. A value of "FFFFFFFF" means that all bits in the associated register are significant, etc.
Now, with reference to FIG. 3, let us see how an instruction is processed to determine the significant bits of its result. The process is to propagate the significant bits from a result to the input operands of an instruction. The significance is then propagated from the input operands of the current instruction to the result operands of earlier instructions, as the program scans backwards through the instruction stream. Just how the bits propagate from a result to the input operands depends on the instruction type (add, shift, store, etc.), as shown in FIG. 3. To get started, the program must know, or assume, which bits of the result of the first encountered instruction are significant. For the process being described, the first encountered instruction is assumed to have all its result bits significant. This is recorded by initializing the entire 16×32 bit array to all ones when a branch instruction is encountered.
Now, suppose the middle of a branch-to-branch section of code is currently being processed, and an add or subtract instruction is encountered. In particular, suppose the instruction is:
ADD.L r1,r2
This means to add the contents of register r1 to the contents of register r2. Register r2 is both an input to and the result of the instruction. It will be easier for us to think in terms of a three-address instruction:
ADD.L r1,r2,r3
in which r3 is the result register.
First it is necessary to refer to the significant bit table at position r3, to see what bits of the result are significant. The bit mask retrieved from the table is associated (stored) with the add instruction, so that it can be used later by the assembly and final code generation module to generate the optimum form of the add instruction. Actually, for efficiency it suffices to associate only two bits with the add instruction, to record whether the instruction should be generated in long (32-bit) form, word (16-bit) form, or byte (8-bit) form, as those are the only choices on the MC68000. This association of two bits with the instruction will be referred to subsequently as "marking" the instruction. Suppose the significant bits of the result register r3 (as determined from the SBT) are X'00008012'. Then we can mark the add instruction as "word," or 16-bit, form, because all significant bits of the result lie in the rightmost 16 bits of register r3. Then, since addition is a right-to-left process (two's complement arithmetic is assumed throughout), the bits in the leftmost 16 positions of registers r1 and r2 cannot possibly affect a significant bit of the result, but bits anywhere in the rightmost 16 positions can. Therefore, the significant bits of registers r1 and r2 for this instruction are X'0000FFFF'. This is next recorded in the SBT, at the rows for registers r1 and r2. If the add instruction is the last use of r1 (or r2) in that block, then the table position for r1 (or r2) is set to X'0000FFFF'. Whether it is the last use is determined by looking at the liveness ("last use") bit described above, which may be set during the "Code Optimization" phase. On the other hand, if the add instruction is not the last use of r1 (or r2), then we "OR" X'0000FFFF' into the table at position r1 (or r2).
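The add/subtract step just described can be sketched as follows, in Python and under stated assumptions. The function names (`mark_form`, `smear_right`, `process_add`), the list-of-integers table, and the `last_use` set are hypothetical; the input mask is computed here as "every bit from the highest significant result bit down to bit 0", which is one reading of the right-to-left carry argument and yields X'0000FFFF' for the X'00008012' example.

```python
def mark_form(result_mask):
    """Pick the narrowest MC68000 form that covers all significant result bits."""
    if (result_mask & 0xFFFFFF00) == 0:
        return "byte"
    if (result_mask & 0xFFFF0000) == 0:
        return "word"
    return "long"


def smear_right(mask):
    """All bit positions at or below the highest set bit of mask."""
    return (1 << mask.bit_length()) - 1


def process_add(sbt, src1, src2, dst, last_use):
    """Handle a three-address add/subtract: dst = src1 op src2.

    sbt is a list of 32-bit masks indexed by register number;
    last_use is the set of input registers flagged as last uses here.
    Returns the form ("byte", "word", "long") the instruction is marked with.
    """
    result_mask = sbt[dst]              # significant bits of the result
    form = mark_form(result_mask)       # the "marking" of the instruction
    needed = smear_right(result_mask)   # input bits that can affect them
    for reg in (src1, src2):
        if reg in last_use:
            sbt[reg] = needed           # nothing below uses reg: overwrite
        else:
            sbt[reg] |= needed          # keep bits needed by later uses
    return form
```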
The point is that r1 and r2 may already have certain bits marked significant because of uses of these registers below the add instruction, i.e., uses that were processed earlier in this backwards scan, and those uses must not be "forgotten". This processing of an add (or subtract) instruction is shown in blocks 1 and 2 of FIG. 3. As the backwards scan proceeds, it will likely come to an instruction that sets r1 or r2. At this point, it refers to the table at position r1 or r2, respectively, to determine which bits of that register are significant. It then propagates this information back to the input operands, in a manner similar to the processing of the add instruction described above.
Suppose as another example that a "store byte" instruction is encountered (FIG. 3 blocks 9 and 10). This instruction would be written:
MOVE.B r1,d(r2,r3)
in which register r1 contains the byte being stored, and r2 and r3 are "base" and "index" registers that serve to address storage. "d" is a displacement (a constant) that has no role in significance analysis. The MOVE instruction has no result register (it doesn't alter the contents of any register). It uses only the rightmost eight bits of register r1. Therefore, a mask of X'000000FF' is OR'ed into the table at the position of r1. The MOVE instruction uses the rightmost 24 bits of the base and index registers, so a mask of X'00FFFFFF' is OR'ed into the table at the positions of r2 and r3.
FIG. 3 shows the processing of six instruction types. The complete process should be expanded to include other instruction types not shown in FIG. 3, such as "load" and "shift" instructions, etc. To now present a more substantial example, suppose that a sequence of code between branches is:
______________________________________
significance of r1
______________________________________
MOVE.L 4(r2),r1 0000FF00
LSR.L #8,r1 000000FF
ANDI.L X'000000FF',r1 000000FF
MOVE.B r1',0(r2) FFFFFFFF
______________________________________
The program reads a long word (32 bits) from memory, shifts it right eight positions (LSR=logical shift right), "AND's" it with the mask X'000000FF', and stores the rightmost byte of register r1 into main memory. The column headed "significance of r1" shows one row of the significance array, that for r1, as processing proceeds backwards. The following will describe what happens to the significance of r1 as this sequence of instructions is processed.
Initially (bottom row), the significance of r1 is set to X'FFFFFFFF', which is what has to be assumed in the absence of any knowledge. Then the MOVE.B instruction is encountered. For this example, assume that the use of r1 in this instruction is flagged as a "last use," which has been denoted with a prime (') after r1. Then the significance of r1 is set to X'000000FF' in the table, following FIG. 3 block 10.
Next the ANDI.L is encountered. This instruction uses r1 as both an input and the result register. The significance of r1 as a result, X'000000FF', is "AND'd" with the mask, also X'000000FF', and the result, X'000000FF', is "OR'ed" into the table at position r1. The result is X'000000FF' (no change to the significance of r1). These steps are summarized in FIG. 3 block 12. Now at this point, the significance analysis program could observe that the "AND" instruction turns off only insignificant bits, and hence can be omitted. Alternatively, the instruction could be marked as byte form, and final code generation could delete it, since the immediate mask ends in eight "1" bits.
Next the LSR.L is encountered. It is marked as byte form, because the significance of the result is X'000000FF'. The significance of the input register, r1, is the significance of the result register, also r1, shifted left eight positions (handling of shifts is not shown in FIG. 3). Lastly, the MOVE.L is encountered.
This is marked as word (16-bit) form, because the significance of the result register (r1) is X`0000FF00`, i.e., only bits in the rightmost 16 positions are significant. By using the marking computed above, final code generation can output the following instructions as a faster and shorter equivalent to those shown above:
______________________________________
MOVE.W 6(r2),r1 (load instruction from memory)
LSR.W #8,r1
MOVE.B r1',0(r2) (store instruction to memory)
______________________________________
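The walk-through above can be replayed in a few lines of Python. This is only a sketch of the worked example, with a single integer standing in for the r1 row of the table; the function name `replay_example` and the tuple layout are assumptions for illustration. Each recorded value is the significance of r1 just below the named instruction, matching the "significance of r1" column of the first table.

```python
MASK32 = 0xFFFFFFFF


def replay_example():
    """Process the four-instruction example from last to first."""
    sig = MASK32                 # nothing known below the final instruction
    rows = []                    # (instruction, significance of r1 below it)

    # MOVE.B r1',0(r2): store byte, last use of r1, so the row is set, not OR'ed.
    rows.append(("MOVE.B r1',0(r2)", sig))
    sig = 0x000000FF

    # ANDI.L X'000000FF',r1: result significance AND'd with the immediate,
    # then OR'ed into the row for r1.
    imm = 0x000000FF
    rows.append(("ANDI.L X'000000FF',r1", sig))
    result_sig = sig
    # The AND can be omitted if it clears only insignificant bits.
    andi_redundant = (result_sig & ~imm & MASK32) == 0
    sig |= result_sig & imm

    # LSR.L #8,r1: input significance is the result significance shifted left 8.
    rows.append(("LSR.L #8,r1", sig))
    sig = (sig << 8) & MASK32

    # MOVE.L 4(r2),r1: a load; its form follows from the result significance.
    rows.append(("MOVE.L 4(r2),r1", sig))
    if (sig & 0xFFFFFF00) == 0:
        load_form = "byte"
    elif (sig & 0xFFFF0000) == 0:
        load_form = "word"
    else:
        load_form = "long"

    rows.reverse()               # present in program order, like the table
    return rows, andi_redundant, load_form
```

Calling `replay_example()` yields the column values 0000FF00, 000000FF, 000000FF, FFFFFFFF in program order, reports the ANDI as removable, and marks the load as word form.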
There are two things that final code generation must be cognizant of that arise in the above example: (1) the MOVE.L cannot be changed to MOVE.W unless the increase of two in the displacement (from 4 to 6 in the example) results in a displacement within the limits allowed by the instruction (32767 for the MC68000), and (2) the selection of the most efficient form of the LSR instruction depends upon both the significance of the result and the shift amount. In the example, the significance of the result is X'000000FF', but the LSR instruction cannot be made LSR.B, because bits in positions 16-23 of register r1 are shifted into positions 24-31. It can, however, be made LSR.W, which is faster than the original LSR.L.
Appendix I shows a complete working subroutine for performing significance analysis. It includes the steps that were illustrated in FIG. 2 and FIG. 3. It has been utilized successfully with several target machines: the Motorola MC68000, the IBM System/370, and several experimental reduced instruction set machine architectures. It is written in a language similar to PL/I, and is sufficiently annotated to allow any skilled programmer to incorporate the present invention into an optimizing compiler of the form shown in FIG. 2, or to rewrite the subroutine in another language, or for another target machine, or for a compiler or assembler of different structure. It will of course be appreciated that the specific significance values introduced into the bit table would have to be tailored to a specific system architecture and particular instruction format. The invention would have equal applicability to, for example, a 16 bit full word machine as well as to a 45 or 64 bit machine architecture.
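The two final code generation checks mentioned above (the displacement limit and the choice of LSR form) might be sketched as follows. This is an assumption-laden Python illustration, not taken from Appendix I: the names `lsr_form` and `narrowed_displacement` are hypothetical, the rule "the operation width must cover both the significant result bits and the input bits shifted into them" is one way to express the constraint described in the text, and the byte offset of 3 is an inference from the MC68000 storing the low-order byte of a long word last (the text itself only states the offset of two for word form).

```python
MASK32 = 0xFFFFFFFF
MAX_DISP = 32767                       # displacement limit cited for the MC68000

# Extra displacement needed to address the low-order part of a long word.
# The "byte" entry is an assumption; only the word case appears in the text.
LOW_ORDER_OFFSET = {"long": 0, "word": 2, "byte": 3}


def lsr_form(result_sig, shift):
    """Narrowest form of a logical shift right that keeps significant bits correct.

    The operation must be wide enough to contain the significant result bits
    and the input bits that are shifted into those positions.
    """
    touched = (result_sig | (result_sig << shift)) & MASK32
    if (touched & 0xFFFFFF00) == 0:
        return "byte"
    if (touched & 0xFFFF0000) == 0:
        return "word"
    return "long"


def narrowed_displacement(disp, form):
    """Displacement for a narrowed load, or None if it exceeds the limit."""
    new_disp = disp + LOW_ORDER_OFFSET[form]
    return new_disp if new_disp <= MAX_DISP else None
```

For the example, `lsr_form(0x000000FF, 8)` gives "word" rather than "byte", and `narrowed_displacement(4, "word")` gives 6; a result of `None` would mean the load has to stay in long form.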
It should be noted more particularly that the advantages of the present invention will be realized in a machine architecture where the shorter form instructions take less machine time or less storage space than the longer form, regardless of the underlying word length of the machine.
