Compiler for performing a loop fusion, dependent upon loop peeling and/or loop reversal
6070011
Abstract
A compile method employs loop fusion to improve execution of a first loop and a second loop in a code sequence. A compile method initially peels one or more loop iterations from one of the loops to cause each of the loops to exhibit an equal number of loop iterations. Thereafter, an attempt is made to fuse the first and second loops, upon a condition that the resulting fused loop produces a same computational result as would be produced if the first loop and second loop were not fused. If the condition is not met, a loop reversal is performed on one of the loops and a fusing action is again attempted; if the attempted fusing action of the loops does not fulfill the condition, a loop reversal is performed on the other loop and a fusing action is again attempted. The combined loop peeling/loop reversal actions provide a higher probability of an ability to fuse the loops than otherwise.
Claims
We claim:
Description
FIELD OF THE INVENTION
______________________________________
DO I = 1, 3                DO I = 1, 3
  A(I) = I         -->       A(4-I) = 4-I
ENDDO                      ENDDO
______________________________________
To illustrate the effect of a loop reversal, the example below shows the computations performed by a loop before and after loop reversal, during each iteration of the loop, respectively:
______________________________________
          Original Loop    Reversed Loop
______________________________________
Iter 1:   A(1) = 1         A(3) = 3
Iter 2:   A(2) = 2         A(2) = 2
Iter 3:   A(3) = 3         A(1) = 1
______________________________________
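The iteration trace in the table can be reproduced with a small sketch. This is an illustration only: the function names and the use of a Python list with an unused index 0 (to mirror the 1-based indexing of the text) are assumptions made here, not part of the patent.

```python
def original_loop(n):
    """Model of: DO I = 1, n / A(I) = I / ENDDO, recording each assignment."""
    a = [0] * (n + 1)              # index 0 unused so indices match the text
    trace = []
    for i in range(1, n + 1):
        a[i] = i
        trace.append((i, a[i]))    # (element written, value written)
    return a[1:], trace


def reversed_loop(n):
    """Model of the reversed form: DO I = 1, n / A(n+1-I) = n+1-I / ENDDO."""
    a = [0] * (n + 1)
    trace = []
    for i in range(1, n + 1):
        k = n + 1 - i              # for n = 3 this is the 4-I of the example
        a[k] = k
        trace.append((k, a[k]))
    return a[1:], trace


a_fwd, t_fwd = original_loop(3)
a_rev, t_rev = reversed_loop(3)
print(t_fwd)           # [(1, 1), (2, 2), (3, 3)]
print(t_rev)           # [(3, 3), (2, 2), (1, 1)]
print(a_fwd == a_rev)  # True
```

The two traces visit the elements in opposite order, while the final contents of the array are identical.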
It can be seen that the order in which the elements of array A are computed is reversed by the reversed loop and yet the end results are the same, i.e. array A is assigned values (1, 2, 3). Loop Fusion is a combination of two or more adjacent loops, both with the same number of iterations, into a single loop (L1 and L2 are labels used to identify the loops). For example:
______________________________________
L1: DO I = 1, 3                L3: DO I = 1, 3
      A(I) = I                       A(I) = I
    ENDDO              -->           B(I) = A(I)
L2: DO I = 1, 3                    ENDDO
      B(I) = A(I)
    ENDDO
______________________________________
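As a rough check of the transformation, the sketch below runs the two candidate loops and the fused loop and compares the resulting arrays. The function names and the zero-initialised lists (index 0 unused) are assumptions for illustration.

```python
def run_separate(n):
    """L1 followed by L2, as two separate loops."""
    a = [0] * (n + 1)
    b = [0] * (n + 1)
    for i in range(1, n + 1):      # L1
        a[i] = i
    for i in range(1, n + 1):      # L2
        b[i] = a[i]
    return a[1:], b[1:]


def run_fused(n):
    """L3: both statements in a single loop body."""
    a = [0] * (n + 1)
    b = [0] * (n + 1)
    for i in range(1, n + 1):      # L3
        a[i] = i
        b[i] = a[i]                # A(I) was written just above in the same iteration
    return a[1:], b[1:]


print(run_separate(3))  # ([1, 2, 3], [1, 2, 3])
print(run_fused(3))     # ([1, 2, 3], [1, 2, 3])
```

In the fused form each element of A is read immediately after it is written, which is the memory reuse that makes fusion attractive.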
After the loop fusion transformation shown above, the same computations performed by the two fusion candidate loops L1 and L2 are performed by the single fused loop L3, i.e. arrays A and B are assigned values (1, 2, 3).
Loop fusion has potential benefits in reducing the runtime of a program. The most important ones are:
1. It enables the exploitation of memory reuse across loops, which can have a significant positive impact on the program's run time performance because it can reduce, even eliminate, cache and Translation Look-aside Buffer misses by bringing close together (in time) multiple accesses to the same or nearby memory locations.
2. It reduces the overhead for the run time execution of the loop by reducing the number of termination tests and branching instructions needed to restart the iterations required for the execution of each of the original loops down to that required for the single fused loop.
A loop fusion can only "legally" be performed under the following conditions:
1. The candidate loops must be adjacent to each other in the program, i.e. there must be no other statements between the two loops.
2. The candidate loops must have identical numbers of iterations.
3. The fused loop must perform the same computation as the candidate loops.
The procedure employed to test for the legality of a loop fusion operation is often called "data dependence analysis". See Zima et al., "Supercompilers for Parallel and Vector Computers", Addison Wesley, 1991. "Reuse" is a further test that is used to determine whether a revised code sequence operates more efficiently (or "profitably") than the unrevised code sequence.
Recall that loop fusion can only be performed when the candidate loops have the same number of iterations and that the fused loop must perform computations equivalent to those performed by the individual candidate loops. To enable fusion of loops, loop peeling on one of the loops has been used when candidate loops do not have the same number of iterations.
Consider the following two loops L1 and L2 below. They cannot be directly fused because L1 contains one more iteration than L2 does:
______________________________________
L1: DO I = 1, 3            L2: DO J = 2, 3
      A(I) = I                   B(J) = A(J)
    ENDDO                      ENDDO
______________________________________
However, if the first iteration of L1 is peeled from the loop, then the remainder of Loop L1 (i.e., L1') and Loop L2 can be fused to form loop L3 as shown below:
______________________________________
Peel: A(1) = 1                       A(1) = 1
L1':  DO I = 2, 3              L3:   DO I = 2, 3
        A(I) = I                       A(I) = I
      ENDDO            -->             B(I) = A(I)
L2:   DO J = 2, 3                    ENDDO
        B(J) = A(J)
      ENDDO
______________________________________
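The peel-then-fuse step can be sketched as follows. The function names are hypothetical, and B(1) is assumed to start at zero since neither version of the code writes it.

```python
def run_original():
    """L1 over 1..3 followed by L2 over 2..3 (different iteration counts)."""
    a = [0] * 4                    # index 0 unused
    b = [0] * 4
    for i in range(1, 4):          # L1: DO I = 1, 3
        a[i] = i
    for j in range(2, 4):          # L2: DO J = 2, 3
        b[j] = a[j]
    return a[1:], b[1:]


def run_peeled_and_fused():
    """First iteration of L1 peeled, then L1' and L2 fused into L3."""
    a = [0] * 4
    b = [0] * 4
    a[1] = 1                       # peeled iteration: A(1) = 1
    for i in range(2, 4):          # L3: DO I = 2, 3
        a[i] = i
        b[i] = a[i]
    return a[1:], b[1:]


print(run_original())          # ([1, 2, 3], [0, 2, 3])
print(run_peeled_and_fused())  # ([1, 2, 3], [0, 2, 3])
```

The peeled assignment costs one extra statement outside the loop, which is the code-size growth attributed to peeling.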
The problem in practice is that neither loop peeling nor loop reversal is guaranteed to enable legal and profitable loop fusion. Furthermore, loop peeling or loop reversal by themselves may cause loop performance deterioration, as the former increases the size of the program and the latter changes the data dependencies of the loop, which may disable other profitable transformations.
The prior art does teach the determination of the legality and profitability of loop fusion. See Carr et al., "Compiler Optimizations for Improving Data Locality", Proc. of ASPLOS VI, San Jose, Calif., October, 1994. Also the prior art has used loop reversal to enable loop permutation, but the use of loop reversal to enable legal and profitable loop fusion is, to Applicants' knowledge, not known to be in the prior art.
Notwithstanding the use of loop peeling to enable subsequent loop fusion, it often occurs that a data dependency test on a pair of loops will indicate a resultant illegality if the loops are fused. In such case, a fusion action on the loops is inhibited. Nevertheless, the important operational efficiencies which can be achieved through loop fusion are sufficiently important to justify explorations of other loop manipulations to enable subsequent fusion actions.
Accordingly, there is a need for an improved compiler method for the generation of fused loop sequences.
SUMMARY OF THE INVENTION
A compile method employs loop fusion to improve execution of a first loop and a second loop in a code sequence. A compile method first peels one or more loop iterations from one of the loops to cause each of the loops to exhibit an equal number of loop iterations. Thereafter, an attempt is made to fuse the first and second loops, upon a condition that the resulting fused loop produces a same computational result as would be produced if the first loop and second loop were not fused. If the condition is not met, a loop reversal is performed on one of the loops and a fusing action is again attempted; if the attempted fusing action of the loops does not fulfill the condition, a loop reversal is performed on the other loop and a fusing action is again attempted. The combined loop peeling/loop reversal actions provide a higher probability of an ability to fuse the loops than otherwise.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a high level block diagram of a computer system adapted to perform the invention.
FIG. 2 is a high level logical flow diagram illustrating the steps of the invention.
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT
The invention provides a technique that uses loop peeling and/or loop reversal to enable legal and profitable loop fusions in situations where the candidate loops do not have the same number of iterations and/or the fused loop does not perform equivalent computations as the candidate loops do.
Loop reversal is used to enable loop fusion when directly fusing the candidate loops yields a fused loop that does not perform the equivalent computations as the candidate loops or the fused loop is no more efficient at run time than the original loops. Consider the example below:
______________________________________
L1: DO I = 1, 3                L3: DO I = 1, 3
      A(I) = I                       A(I) = I
    ENDDO              -->           B(4-I) = A(4-I)
L2: DO I = 1, 3                    ENDDO
      B(4-I) = A(4-I)
    ENDDO
______________________________________
The fused loop L3 (on the right above) is not semantically equivalent to loops L1 and L2 (on the left above) because the execution of fused loop L3 will produce different values for the same elements of array B than those produced by the execution of the original loops L1 and L2. Suppose, before Loop L1 is executed, the elements of array A contain values (0, 0, 0), respectively. After Loop L1 is executed, array A becomes (1, 2, 3). After Loop L2 is executed, array B contains (1, 2, 3) as well. However, after execution of fused loop L3, array A contains (1, 2, 3) but array B contains (1, 2, 0), which is not the same. The invention solves this problem by first applying loop reversal to Loop L2, yielding Loop L2' and then applying loop fusion to Loop L1 and Loop L2' to produce the fused Loop L3' as shown below:
______________________________________
L1:  DO I = 1, 3               L3': DO I = 1, 3
       A(I) = I                       A(I) = I
     ENDDO             -->            B(I) = A(I)
L2': DO I = 1, 3                    ENDDO
       B(I) = A(I)
     ENDDO
______________________________________
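The difference between the illegal direct fusion and the fusion enabled by reversing L2 can be checked with a short sketch. The function names are hypothetical; array A starts at (0, 0, 0) as in the discussion above, and array B is assumed to start at zero as well.

```python
def separate_loops():
    """Original L1 followed by original L2."""
    a = [0] * 4                    # index 0 unused
    b = [0] * 4
    for i in range(1, 4):          # L1
        a[i] = i
    for i in range(1, 4):          # L2
        b[4 - i] = a[4 - i]
    return a[1:], b[1:]


def fused_directly():
    """L3: direct fusion, which reads A(4-I) before some elements are written."""
    a = [0] * 4
    b = [0] * 4
    for i in range(1, 4):
        a[i] = i
        b[4 - i] = a[4 - i]        # at I = 1 this reads A(3), still 0
    return a[1:], b[1:]


def fused_after_reversal():
    """L3': L2 reversed to B(I) = A(I) first, then fused with L1."""
    a = [0] * 4
    b = [0] * 4
    for i in range(1, 4):
        a[i] = i
        b[i] = a[i]
    return a[1:], b[1:]


print(separate_loops())        # ([1, 2, 3], [1, 2, 3])
print(fused_directly())        # ([1, 2, 3], [1, 2, 0])
print(fused_after_reversal())  # ([1, 2, 3], [1, 2, 3])
```

The direct fusion leaves B(3) at its initial value because A(3) is read in the first iteration and written only in the third.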
It must be noted that loop reversal cannot always be applied to a loop to produce an equivalent loop. Data dependence analysis is used to determine whether a loop reversal can be safely applied.
Referring now to FIG. 1, a computer 10 includes a central processing unit (CPU) 12, an input/output module 14, a disk drive memory 16 and a random access memory (RAM) 18. Each of the aforesaid elements of computer 10 is coupled by a bus 20 which enables communication therebetween. Input/output module 14 is utilized to receive a program 22 for storage in either disk drive 16 or memory 18. Also stored in either memory 18 or disk drive 16 is a compiler program 24 which includes an optimizer procedure 26, and the following procedures: a loop peel procedure 28; a loop reversal procedure 30; and a loop fusion procedure 32. Compiler 24 further includes a dependency analyzer 34 and a reuse analyzer 38. While it will be hereafter assumed that each of the aforesaid procedures and subprocedures is contained within memory 18 and operates, in conjunction with CPU 12, to perform the invention hereof, it is to be understood that each of the aforesaid procedures can be present on a magnetic disk or other storage media 40 for direct loading, via CPU 12, into memory 18 or disk drive 16, as the case may be.
Briefly stated, upon receipt of a program 22 to be compiled, compiler 24 is started and commences an analysis of program 22. While many analytical procedures are carried out, in regards to the invention to be described herein, the initial action of compiler 24 is identification of loops that are present in program 22. A further determination is made as to whether such loops are capable of being fused. As will be understood from the description below, once the loops are identified, loop peel procedure 28, loop reversal procedure 30 or a combination thereof are attempted in order to enable a loop fusion. During the operation of these procedures, dependency analyzer 34 is operated to determine whether a loop reversal and/or a loop fusion will be "legal", i.e., that the resulting fused loop will provide an identical output as the individual loops, prior to fusion. Further, reuse analyzer procedure 38 is operated to determine whether the resulting reversal and/or fusion actions will be profitable or pessimizing. If pessimizing, a fusion action is not undertaken.
Turning now to FIG. 2, the method of the invention will be described in relation to the steps shown therein. Thereafter, a pseudo-code listing of the algorithm employed by the invention will be illustrated.
Initially, it is to be assumed that compiler 24 has identified loops that are present in program 22. Assume further that loops L1 and L2 have been identified and that they are adjacent in program 22 (i.e., no other code is present therebetween). Assuming adjacency, loops L1 and L2 are examined to determine if they have an equal number of iterations (decision box 50). If no, the loop with the greater number of iterations is peeled until both loops L1 and L2 have an equal number of iterations (box 52). More specifically, each peeled iteration is appended as an in-line code sequence with the variable that would have been otherwise handled within the loop.
Now that loops L1 and L2 have the same number of iterations, dependency and reuse analyses are performed thereon to determine if a fusion action can be performed both legally and profitably (box 54). If yes (decision box 56), a fusion action is undertaken (box 58), and the procedure is at an end. More specifically, the dependency analysis determines whether the data dependencies that are present in loops L1 and L2 will be altered by a resulting fusion action. If so, the fusion action cannot be undertaken immediately. The reuse analysis determines whether a loop resulting from the fusion of L1 and L2 will operate in a more efficient manner than loops L1 and L2, operating separately. Here again, only if the answer is yes does the loop fusion procedure 32 indicated in box 58 operate upon loops L1 and L2.
Otherwise, the procedure moves to box 60 wherein a dependency analysis is performed on loops L1 and L2 to determine if a reversal of either loop will result in an illegal operation. Specifically, it is determined whether the data dependencies in a loop will be altered if a reversal is attempted, in which case, the reversal cannot be performed. Accordingly, if a loop under consideration is not already reversed (decision box 62) (e.g. in this case loop L1) and if the dependency analysis has indicated that it is legal to attempt to reverse loop L1 (decision box 62), loop reversal procedure 30 operates to reverse the sequence of operations of loop L1 (box 64). The procedure then recycles back to box 54 where dependency and reuse analyses are performed on reversed loop L1 and nonreversed loop L2. If the potential fusion is both legal and profitable, then the fusion action is performed (box 58). Otherwise, the procedure cycles down to decision box 66 where it is determined whether it is legal to reverse loop L2 (and loop L2 has not theretofore been reversed). If yes, loop reversal procedure 30 operates upon loop L2 (box 68) and the procedure again returns to box 54. Thereafter, if it is found that loop L1 and loop L2 may both be legally fused and that the resulting fusion provides a profitable result, then L1 and L2 are fused (box 58). Otherwise, the procedure is at an end.
The following pseudo code listing describes how loop peeling, loop reversal, and loop fusion are integrated to achieve legal and profitable transformations and to avoid pessimizing or unnecessary ones.
______________________________________
Input:  Two adjacent Loops L1 and L2
Output: L1 and L2 or an equivalent and more efficient
        Loop L3
Algorithm
START:
    Loop1 = L1
    Loop2 = L2
    if L1 and L2 have identical number of iterations then
        goto DO_FUSION
    else if L1 has one more iteration than L2 then
        Peel off the first iteration of L1
        Loop1 = Remainder of L1
        goto DO_FUSION
    else if L2 has one more iteration than L1 then
        Peel off the last iteration of L2
        Loop2 = Remainder of L2
        goto DO_FUSION
DO_FUSION:
    if it is legal and profitable to fuse Loop1 and
       Loop2 then
        L3 = Fuse Loop1 and Loop2
        goto STOP
    else if L1 is not reversed and it is legal to
       reverse L1 then
        Loop1 = Reversed L1
        goto DO_FUSION
    else if L2 is not reversed and it is legal to
       reverse L2 then
        Loop2 = Reversed L2
        goto DO_FUSION
    else
        goto STOP
STOP:
______________________________________
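The control flow of the listing above can be sketched in executable form. This is a minimal illustration under stated assumptions, not the patented implementation: the `Loop` class, `run_plan` and `plan_fusion` are hypothetical names, reversal is applied to the current (possibly peeled) loop, the profitability test is a caller-supplied placeholder that defaults to "always profitable", and legality is approximated by executing both forms on one sample state and comparing results. A real compiler would rely on data dependence analysis and reuse analysis instead of simulation.

```python
from copy import deepcopy
from dataclasses import dataclass, replace
from typing import Callable, Dict, List

State = Dict[str, List[int]]


@dataclass(frozen=True)
class Loop:
    """Hypothetical loop model: inclusive bounds plus a body acting on shared state."""
    lo: int
    hi: int
    body: Callable[[int, State], None]
    is_reversed: bool = False

    def indices(self):
        idx = range(self.lo, self.hi + 1)
        return reversed(idx) if self.is_reversed else idx

    def trip_count(self) -> int:
        return self.hi - self.lo + 1


def run_plan(pre, loop1, loop2, post, state, fused):
    """Run peeled code and the two loops (fused or not) on a copy of state."""
    s = deepcopy(state)
    for loop in pre:
        for i in loop.indices():
            loop.body(i, s)
    if fused:
        for i, j in zip(loop1.indices(), loop2.indices()):
            loop1.body(i, s)
            loop2.body(j, s)
    else:
        for loop in (loop1, loop2):
            for i in loop.indices():
                loop.body(i, s)
    for loop in post:
        for i in loop.indices():
            loop.body(i, s)
    return s


def plan_fusion(l1, l2, state, profitable=lambda a, b: True):
    """Follow START / DO_FUSION / STOP; return (pre, loop1, loop2, post) or None."""
    reference = run_plan([], l1, l2, [], state, fused=False)
    pre, post, loop1, loop2 = [], [], l1, l2
    if l1.trip_count() == l2.trip_count() + 1:
        pre = [replace(l1, hi=l1.lo)]        # peel the first iteration of L1
        loop1 = replace(l1, lo=l1.lo + 1)
    elif l2.trip_count() == l1.trip_count() + 1:
        post = [replace(l2, lo=l2.hi)]       # peel the last iteration of L2
        loop2 = replace(l2, hi=l2.hi - 1)
    elif l1.trip_count() != l2.trip_count():
        return None                          # listing covers a difference of one only
    while True:                              # DO_FUSION
        legal = run_plan(pre, loop1, loop2, post, state, fused=True) == reference
        if legal and profitable(loop1, loop2):
            return pre, loop1, loop2, post
        rev1 = replace(loop1, is_reversed=True)
        rev2 = replace(loop2, is_reversed=True)
        if not loop1.is_reversed and \
                run_plan(pre, rev1, loop2, post, state, fused=False) == reference:
            loop1 = rev1
        elif not loop2.is_reversed and \
                run_plan(pre, loop1, rev2, post, state, fused=False) == reference:
            loop2 = rev2
        else:
            return None                      # STOP without fusing


def body_l1(i, s):
    s["A"][i] = i                            # A(I) = I


def body_l2(i, s):
    s["B"][4 - i] = s["A"][4 - i]            # B(4-I) = A(4-I)


sample = {"A": [0, 0, 0, 0], "B": [0, 0, 0, 0]}   # index 0 unused
plan = plan_fusion(Loop(1, 3, body_l1), Loop(1, 3, body_l2), sample)
pre, f1, f2, post = plan
print(f1.is_reversed, f2.is_reversed)        # True False
```

For the reversal example given earlier, this sketch reverses L1 rather than L2, because the listing tries L1 first; fusing the reversed L1 with the unreversed L2 also reproduces the results of the separate loops on the sample state.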
In the above algorithm, the fusion of two adjacent loops is used to illustrate the overall procedure. In practice this algorithm applies to any number of adjacent loops and can be extended to work for loops whose iteration counts differ by a number that is greater than one. It should be understood that the foregoing description is only illustrative of the invention. Various alternatives and modifications can be devised by those skilled in the art without departing from the invention. Accordingly, the present invention is intended to embrace all such alternatives, modifications and variances which fall within the scope of the appended claims.
