Compiler method and apparatus for elimination of redundant speculative computations from innermost loops6301706Abstract A method and system for use with VLIW processing architectures for avoiding redundant speculative computations in the compilation of the innermost loops. The method includes identifying a plurality of compiled flow paths, where each of the paths includes a plurality of conditions associated with the loop that permits transformation of the loop for more optimum execution. It is then determined whether the loop has an inductive variable and a conditional statement that depends on the inductive variable. It is also determined whether the loop set up values of the inductive variables to subsets, and at least one of which the conditional statement is a loop invariant. Finally, if conditions in the determination steps satisfy the conditions of one of the paths, the loop is transformed into two consecutive loops executable with a reduced set of values of the inductive variable. Claims We claim: Description I. FIELD OF THE INVENTION
for(i = 0; i < N; i++)
{
if (i < K) /* K is a loop invariant */
{
then_alternative;
}
else
{
else_alternative;
}
}
In this example, the entire set of values of the inductive variable "i" can be divided into two subsets: [0, K) and [K, N). (In the C language notation (i>=0 && i<K) and (i>=K && i<N)). After transformation:
for(i = 0; i < K; i++)
{
if (i < K) /* this condition is equal to TRUE */
{
then_alternative;
}
else
{
else_alternative;
}
}
for(; i < N; i++)
{
if (i < K) /* this condition is equal to FALSE */
{
then_alternative;
}
else
{
else_alternative;
}
}
After known transformations (such as unreachable code removing) the following resulting loops are obtained:
for(i = 0; i < K; i++)
{
then_alternative;
}
for(; i < N; i++)
{
else_alternative;
}
If the examined loop has no redundant speculative computations (see 106, FIG. 1), the loop is compiled for software pipelining, see 107, FIG. 1. Then, the next innermost loop is considered starting with step 101, FIG. 1. If the examined loop has an inductive variable and does not have cross-iterational data dependencies (except for incrementing for decrementing the inductive variable), the compiler divides the examined loop into two or three consecutive loops. The loop is divided at the conditional statement, which brings redundant speculative calculations into the loop body. See 108 and 109, FIG. 1. The transformation 109 is referred to herein as condition-driven loop splitting. The first of the newly formed loops in 109 performs computations in the portion of the body of the original loop, which precedes the conditional statement, and also computes the value of this conditional statement. The second loop and the third one, if necessary, perform computations for one of the alternatives of the conditional statement as well as the remaining computations in the body of original loop. To pass information from the first newly formed loop to the second and possibly third newly formed loops, intermediate vectors are introduced in the first loop. Namely one vector is provided for each data dependence between the new loops. Each vector length depends on the number of times the condition was satisfied. If the maximum possible value of the inductive variable in the original loop is not computed, an additional outer loop is formed so as to prevent violations of the bounds of the intermediate vectors. This technique is employed when it is not determined how much memory should be allocated for a vector. Thereafter, execution of the compiler returns to step 101 for all the newly formed loops. The basic principles discussed above in connection with 108 and 109 are illustrated in the examples of condition-driven loop splitting below (Examples 2, 3, and 4). EXAMPLE 2 Before transformation
for (i=0;i<N;i++)
{
A[i] = B[i] + 1;
if (q)
C[i] = D[i] * B[i] + E[i] * F[i] + A[i];
}
After transformation:
M = 0;
for (i=0;i<N;i++)
{
A[i] = B[i] + 1;
if (q)
VEC [m++] = i;
}
for (j=0;j<M;j++)
{
ii = VEC1[j];
C[ii] = D[ii] * B[ii] + E[ii] * F[ii] + A[ii];
}
Another illustration of this condition-driven loop splitting transformation is provided in the following Example 3: EXAMPLE 3 Before transformation:
for (i = 0; i < N; i++)
{
q = A[i] + B[i];
if (q > 0)
{
C[i] = D[i] * B[i] + q * i;
}
else
{
C[i] = E[i] * F[i] + G[i] * i;
}
}
When this loop is divided at the condition "q>0", there exist two data dependencies between the first resultant loop and the second resultant loop ("q" and "i") and there is one data dependence between the first resultant loop and the third resultant loop ("i"). After transformation:
/* First resultant loop */
count_then = count_else = 0;
for (i = 0; i < N; i++)
{
q = A[i] + B[i];
if (q > 0)
{
VEC_i_then[count_then++] = i;
VEC_q[count_then++] = q;
}
else
{
VEC_i_else[count_else++] = i;
}
}
/* Second resultant loop */
for (j = 0; j < count_then; j++)
{
ii = VEC_i_then[j];
qq = VEC_q[j];
C[ii] = D[ii] * B[ii] + qq * ii;
}
/* Third resultant loop */
for (j = 0; j < count_else; j++)
{
ii = VEC_i_else[j];
C[ii] = E[ii] * F[ii] + G[ii] * i;
}
It should be noted that the loop iteration counter ("i" in this example) is simply one of data dependencies between the resultant loops. Another example (Example 4 below) of condition-driven loop splitting demonstrated a general case of the use of this technique. EXAMPLE 4 Before transformation:
for (i = 0; i < N; i++)
{
q = A[i] + B[i];
if (q > 0)
{
C[i] = D[i] * B[i] + q * i;
}
else
{
C[i] = E[i] * F[i] + G[i] * i;
}
}
This source program is similar to the source in the previous example of the condition-driven loop splitting transformation. In this example, however, let us assume that the compiler cannot compute the maximum possible value of N. In this case, the compiler does not know the size of the intermediate vectors and, therefore, it cannot statically allocate memory for these vectors. Moreover, in some cases the compiler knows the maximal possible value of N, but this value is too large to statically allocate an intermediate vector of such a size. In these cases, an outer loop is formed. More specifically, in the cases where the size of an intermediate vector is unknown or too large for static allocation, two consecutive transformations are performed. First, the compiler performs a known transformation, which is generally referred to as loop tiling or loop blocking (see e.g. Bacon et al., Compiler transformations for High-Performance Computing, ACM Computing Surveys, v26, n4, December 1994). This transformation of the above program is shown below. After loop blocking:
for(I=0; I < N; I += 1024)
{
N_min = min(I+1024,N);
for (i = I; i < N_min; i++)
{
q = A[i] + B[i];
if (q > 0)
{
C[i] = D[i] * B[i] + q * i;
}
else
{
C[i] = E[i] * F[i] + G[i] * i;
}
}
}
After the loop blocking, the innermost loop has a known maximum repetition number. This number is chosen by the compiler (it is 1024 in this example). The compiler "knows" that it is possible to statically allocate intermediate vectors of this size. As illustrated below, condition-based loop splitting is then applied to the transformed loop. After transformation:
/* Statically allocated intermediate vectors */
int VEC_i_then[1024];
(type of q) VEC_q[1024];
int VEC_i_else[1024];
for(I=0; I < N; I += 1024) /* outer loop */
{
N_min = min(I+1024,N);
/* First resulting loop */
count_then = count_else = 0;
for (i = I; i < N_min; i++)
{
q = A[i] + B[i];
if (q > 0)
{
VEC_i_then[count_then++] = i;
VEC_q[count_then++] = q;
}
else
{
VEC_i_else[count_else++] = i;
}
}
/* Second resultant loop */
for (j = 0; j < count_then; j++)
{
ii = VEC_i_then[j];
qq = VEC_q[j];
C[ii] = D[ii] * B[ii] + qq * ii;
}
/* Third resultant loop */
for (j = 0; j < count_else; j++)
{
ii = VEC_i_else[j];
C[ii] = E[ii] * F[ii] + G[ii] * i;
}
}
If the examined loop has a conditional statement and, in general, no improbable alternatives can be identified, but such improbable alternatives exist on certain paths reaching the loop (i.e. if there is a correlation between the behavior of this conditional statement and the control flow paths reaching the examined loop), the operation of the compiler is as follows. (See FIG. 1, blocks 110 and 111). The compiler performs loop cloning for the examined loop such that at least in one of the newly formed loops the conditional statement has improbable alternative. Thereafter, the execution of the compiler returns to step 101 for both newly formed loops. The transformation 111 is referred to herein as correlation-driven loop cloning. The following example (Example 5) illustrates the correlation-driven loop cloning. This example assists in understanding of loop cloning when there is a correlation in behavior of an in-loop condition and a control flow path leading to the loop. EXAMPLE 5 Before transformation:
if (condition1)
then_expression1;
else
else_expression1;
while(loop_condition)
{
/*
* There is a correlation in behavior of condition1
* and condition2.
*/
if (condition2)
then_expression2;
else
else_expression2;
}
After transformation:
if (condition1)
{
then_expression1;
while(loop_condition)
{
if (condition2)
then_expression2; /* improbable computations in this
loop copy */
else
else_expression2;
}
}
else
{
else_expression1;
while(loop_condition)
{
if (condition2)
then_expression2;
else
else_expression2; /* improbable computations in this
loop copy */
}
}
This transformation (as well as self-correlation based transformation described herein) does not eliminate computations from bodies of the resultant loops. However, after these transformations, both resultant loops become good candidates for other transformations described here, such as loop nesting, for example. If the examined loop has a conditional statement, which has no improbable alternatives, but has a high "self-correlation" (i.e. if the result of the condition remains the same in successive loop iterations with high probability--this information can be collected by a profiler, as known in the art), then the compiler performs the following transformation: instead of the original loop, the compiler forms an outer loop, which contains two copies of the original loop. See 112 and 113, FIG. 1; the transformation 113 is referred to herein as self-correlation-driven loop serialization. In each copy of the original loop the conditional statement is transformed such that the improbable result of this statement (i.e. true for one loop copy and false for another one) will lead to exit from the current loop copy and transfer to another loop copy. After performing this transformation the compiler returns to step 101 for both newly formed innermost loops. This transformation is illustrated in the following Example 6. EXAMPLE 6 Before transformation:
while (q1)
{
if (q2)
expr1 ();
else
expr2 ();
}
After transformation:
while (TRUE)
{
while (q1)
{
if (q2)
expr1 ();
else
goto L1;
}
break;
L1: expr2 ();
while (q1)
{
if (q2)
goto L2;
else
expr2 ();
}
break;
L2: expr1 ();
}
If a "probable area" can be selected in the body of the examined loop (see 114, FIG. 1), then the compiler transforms the original loop into nested loops in the following manner (see 115): The body of the examined loop is divided into the probable area and the improbable area; all control flow paths from the improbable area to the probable area (except for the cross-iterational paths) are eliminated due to the duplication of the parts of the probable area; a new outer loop is formed such that all the back edges of the originate loop that originate in the improbable area are moved from the head of original loop to the head of the new outer loop. As a result, the new outer loop contains the inner loop, which body has only probable computations of the loop. After performing loop nesting the compiler returns to step 101 for the newly formed inner loop. This transformation in 115 is referred to as loop nesting. The loop nesting transformation 115 is illustrated in the Example 7 below. EXAMPLE 7 Before transformations:
while (q1)
{
a++;
if (q2)
{
. . .
/* various computations */
}
}
After transformations:
while (TRUE)
{
while (q1)
{
a++;
if (q2)
goto L;
}
break;
L: . . .
/* The same computations */
}
The processing of the compiler for optimizing inner loops, employing the methods discussed above, is illustrated in the following example.
innermost_loop_list = NULL;
for each innermost loop L in procedure
{
insert L into innermost_loop_list;
}
while((L = next_unit(innermost_loop_list)) != NULL)
{
apply_known_flow_optimizations(L);
if (attempt_to_apply_invariant_driven_loop_cloning(L))
{
append both resulting loops to the end of
innermost_loop_list;
continue;
}
if (attempt_to_apply_range_driven_loop_splitting(L))
{
append both resulting loops to the end of
innermost_loop_list;
continue;
}
if (!loop_has_redundant_speculative_calculations(L))
{
enable_software_pipelining(L);
continue;
}
if (attempt_to_apply_condition_driven_loop_splitting(L))
{
append 2 or 3 resulting loops to the end of
innermost_loop_list;
continue;
}
if (attempt_to_apply_correlation_driven_loop_cloning(L))
{
append both resulting loops to the end of
innermost_loop_list;
continue;
}
if (attempt_to_apply_self_correlation_driven_loop_
serialization(L))
{
append both resulting innermost loops to the end of
innermost_loop_list;
continue;
}
if (attempt_to_apply_loop_nesting(L))
{
append resulting innermost loop to the end of
innermost_loop_list;
continue;
}
}
|
Same subclass Same class Consider this |
||||||||||
