Random test generation for compiler optimization6223337Abstract An optimized compiler is tested. Code segments are stored in a segment file. Each code segment includes a description of an external interface with other segments. A source function is built using the code segments. The source function is compiled using optimization to produce first executable code. The source function is also compiled without using optimization to produce second executable code. The first executable code is executed to produce first results. The second executable code is executed to produce second results. The first results are compared with the second results. An error is reported when the first results differ from the second results. Claims I claim: Description BACKGROUND
TABLE 1
Segment a:
INPUT: float z
BEGIN
z = 1.0;
END
Segment b:
OUTPUT: int z
BEGIN
z = 2;
END
Segment c:
INPUT: float x
OUTPUT: int z
BEGIN
z = (int)x;
END
Segment d:
INPUT: int x
INOUT: float z
BEGIN
z += z;
END
Segment e:
INPUT: float, float y
OUTPUT: float z
LOCAL: int t
BEGIN
t = (int)(x+y);
z = (float)t;
END
A segment consists of a snippet of legal source code enclosed by BEGIN/END markers. The legal source code is preceded by one or more special statements which name the input, output, inout, and local variables of the code snippet. The four possible special statements are INPUT, OUTPUT, INOUT, and LOCAL. The type of each variable is declared in each special statement. In the preferred embodiment, a variable can occur in only one of these statements and at most once in that statement. Also in the preferred embodiment, each segment is required to have either exactly one output variable or one inout variable. This restriction eases the implementation of the algorithm for random source code generator engine 20. The special statements are needed for several purposes. First, the special statements describe all the variables used within the segment. This allows random source code generator engine 20 to rename these variables to avoid name clashes between segments. Second, the special statements describe the interfaces of the segment. That is, the special statements describe what variables are produced or consumed by the segment, so that it can ensure that other segments would consume or produce these variables, so as to avoid generating programs with undefined variables or dead code. Inout variables are treated as both input and output variables. Third, the type specified for each variable is used to ensure that there are no type mismatches of variables shared across two or more segments. The type is also used by Random source code generator engine 20 to generate the type declarations for the variables. FIG. 2 shows the testing as orchestrated by random source code generator engine 20. In a step 31, two procedures with the same body of code are generated. The body is generated via the generate_func_body( ) algorithm shown below in Table 2. Compiler pragmas are also generated to enable optimization for one of the procedures and to disable optimization for the other. This step can be repeated many times to generate many tests in a single source file. In a step 32, the main program is generated. The main program calls each pair of procedures generated earlier and compares their results. If they differ. An error message is printed out. In a step 33, the generated source file is compiled. In a step 34 it is determined whether any compile-time errors occurred. If so, in a step 35, the source file and the errors are saved. In a step 36, the compiled source file is executed. In a step 37 it is determined whether any failures occurred during execution. If so, in a step 38, the source file and errors are saved. If there were no errors detected, in a step 39 the source file is discarded. In a step 40, the test may be repeated as often as desired. For example, the test may be repeated using either the same or a different segment file. Table 2 below shows the basic algorithm (generate_func_body( )) for generating a random procedure used by random source code generator engine 20.
TABLE 2
function generate_func_body( )
v = new_var("_retval", int)
U = P = {v}
code = "return _retval;.backslash.n"
n = 0
while (U != {}) do
v = pop(U)
s = choose_a_segment(v->type, n)
b = copy of s->body
b = rename_var(b, s->output, v)
if (s->has_inout) then U = union(U, {v})
for each x in union(s->inputs, s->locals) do
y = new_var(new_name(x), x->type)
b = rename_var(b,x,y)
if(x is in s->inputs) then U = union(U, {y})
P = union(O, {y})
end for
code = b . code
n = n + 1
end while
for each z in P do
code = z->type . "". z. ";.backslash.n". code
end for
return code
end function
As shown by the algorithm in Table 2, random source code generator engine 20 generates code for a function body from the bottom-up, starting with the return statement of the function. Random source code generator engine 20 first generates a statement that returns the return-value (_retval). Random source code generator engine 20 then inserts the return value (_retval) into the set of undefined variables (U) and the set of procedure-local variables (P). Random source code generator engine 20 then enters the main loop of the algorithm, which iterates until there are no more undefined variables. For each iteration of the loop, random source code generator engine 20 removes a variable (v) from the set of undefined variables (U), chooses a segment that defines that variable's type, and prints out that segment after some renaming. More specifically, the first step is to choose a segment whose output variable has the same type as the undefined variable. (The algorithm for choosing a segment is set out in Table 3 below). If the segment's output variable is an inout variable, the variable is put back into the set of undefined variables (U). Random source code generator engine 20 then makes a copy of the segment's body and renames all occurrences of the segment's output variable in the copy of its body to be the undefined variable (v). Finally, the iteration must handle all the input and local variables in the segment. For each input and local variable (x), Random source code generator engine 20 comes up with a new, unique name for the variable (y) and renames that variable (x) to its new name (y). Random source code generator engine 20 also adds the new variable to the set of procedure local variables (P) and if the variable is an input variable, to the set of undefined variables (U) as well. When the while loop finishes, the algorithm generates a declaration for each procedure local variable. Table 3 below sets out an algorithm (choose_a_segment( )) by which random source code generator engine 20 chooses a segment.
TABLE 3
function choose_a_segment(type t, int n)
if (n <= max_internal_segs) then
S = all segments
else
S = all segments s with no input nor inout variables
end if
S = intersection of S and all segments with output type t
if (S == {}) then abort
return some random segment in S
end function
The algorithm shown in Table 3 takes two arguments: the desired type for the output variable of the to-be-chosen segment (t), and the number of segments generated so far (n). If n is less than the maximum number of allowed internal segments, (i.e., segments with inputs), the algorithm simply returns some random segment whose output variable type is t. However, if n exceeds the maximum number of allowed internal segments, the algorithm imposes the additional constraint that the returned segment has no input nor inout variables. This was done to prevent the algorithm from generating a random program with potentially infinite number of segments, thus ensuring that the algorithm terminates. If for some reason, random source code generator engine 20 cannot find a segment with the imposed constraints on input and output variables, random source code generator engine 20 generates an error and aborts. Table 4 below sets out an example procedure generated by random source code generator engine 20 using the segments set out in Table 1 above and the algorithms set out in Table 2 and Table 3 above:
TABLE 4
#pragma OPT_LEVEL 0
int test_1( ) {
float v5;
float v4
float v3
int v2;
float v1;
int _retval;
v5 = 1.0;
v4 = 1.0;
v3 = (int)(v4 + v5);
v1 = (float)v3;
v2 = 2
v1 += v2;
_retval = (int)v1;
return _retval;
}
#pragma OPT_LEVEL 2
int test_opt_1( ) {
float v5:
float v4
float v3
int v2;
float v1;
int _retval;
v5 = 1.0;
v4 = 1.0;
v3 = (int)(v4 + v5);
v1 = (float)v3;
v2 = 2;
v1 += v2;
_retval = (int) v1;
return _retval;
}
void main( ) {
if (test_1( ) != test_opt_1( ))
printf("Test 1 failed.");
}
Random source code generator engine 20 begins by generating the statement "return _retval" and inserting the variable _retval in the set of undefined variables and the set of procedure local variables. Random source code generator engine 20 then enters the main while loop of function generate_func_body( ). This iterates until there are no more undefined variables. On the first iteration, the loop removes _retval from the set of undefined variables. Function generate_func_body( ) then calls choose_a_segment( ) with the type of _retval (int) and the number of segments generated so far (0). Now since the number of segments generated so far is less than the maximum number of internal segments, (for example, max_internal_segs is 3 for this example), function choose_a_segment( ) can use either segment (b) or segment (c). For example, segment (c) is randomly chosen from these two segments and returned. Next, random source code generator engine 20 renames the output variable z of the segment's body ("z=(int) x;") to _retval, then renames the input variable x to new variable v1. Random source code generator engine 20 then adds v1 to the set of undefined variables and the set of procedure-local variables. At the end of the iteration, the updated text of the segment's body is prepended to the start of the procedure's body. This process continues until there are no more undefined segments. Termination is guaranteed since after max_internal_segs number of segments are generated, the routine choose_a_segment( ) restricts the set of legal segments to either segment (a) or (b). Once there are no more undefined variables, random source code generator engine 20 inserts a declaration for each procedure-local variable and returns. In an alternative embodiment of the present invention, random source code generator engine 20 is extended to randomly generate control flow. To support control flow, a new SEGMENT( ) function is added to segment file 21 syntax. The SEGMENT( ) function is a subsegment which is used in the body of a segment to indicate locations where random source code generator engine 20 should insert randomly generated code. When a segment's body is added to the code of the random program, all SEGMENT( ) functions in that body are replaced with randomly generated sequences of segments. This allows random source code generator engine 20 to generate random, structured control flow without needing explicit knowledge on how specific control flow constructs work. Some examples of uses of the SEGMENT( ) functions are set out in Table 5 below.
TABLE 5
Segment a:
INPUT: int x, int i
OUTPUT: int j
BEGIN
if (x) {
SEGMENT(IN i, OUT j);
} else {
SEGMENT(IN i, OUT j);
}
END
Segment b:
INPUT: float x, int n
OUTPUT: float z
LOCAL: int i, float y
BEGIN
for (i = 0; i < n; ++i) {
SEGMENT(IN i, IN x, OUT y);
z += y;
}
END
The SEGMENT( ) function takes several arguments, which describe the variables to be used or defined by the random code sequence that will replace the SEGMENT( ) function. Each argument of a SEGMENT( ) function is prepended by an IN, OUT, or INOUT keyword, so to identify if the given variable is to be used, defined, or used and modified. The algorithm for generating random code for a SEGMENT( ) function is similar to the algorithm set out in Table 2, but with a few modifications. At the start of the algorithm, all OUT and INOUT arguments are placed in the undefined variable set. Also, whenever a new segment is chosen, before putting all input and inout variables of that segment in the undefined set, an attempt is made to match the input and inout variables with all unmatched IN and INOUT arguments. If an input/inout variable is the same type as an unmatched IN/INOUT argument, that variable is matched with the argument, the variable is renamed to the IN/INOUT argument's name, and the variable is not placed in the undefined variable set. If max_internal_segs number of segments is not generated yet, some random subset of the segment's input variables are purposely not matched to the IN arguments, so to encourage longer random code sequences. Occasionally, copies are inserted to resolve naming conflicts of INOUT arguments, where the names of the input and output variables matched with the INOUT argument do not match. The algorithm for choosing a segment, set out in Table 3 is modified to handle control flow. First, when more than max_internal_segs have been generated, instead of using segments with no inputs, the algorithm only uses the segments whose input variable's can match the unmatched IN arguments, (i.e., there are no more input variables of a particular type than the number of unmatched IN arguments of that type). Also, any segments containing a SEGMENT( ) function are not allowed when there are more than max_internal_segs segments. This is to prevent the algorithm from generating programs with infinitely deep nesting of control flow. One issue with the above modifications to handle SEGMENT( ) functions is that it is possible for the algorithm to generate random code sequences for a SEGMENT( ) function that do not use all the IN arguments. Although this does not result in incorrectly running programs, it is undesirable since it can result in a lot of dead code. This issue is partially remedied by extending the algorithm for choosing a segment to perform some lookahead to identify specific candidates. This technique is not foolproof, but it does make it less likely for unresolved IN arguments to occur. Other preferred embodiments of the present invention include extensions that are added to random source code generator engine 20 to improve its usefulness. For example, header segments are used. This allows specification of code in the header of a randomly generated program so that it is possible to specify global variables, constants, and type declarations. Also random number/token generators are used to cut down on the number of segments that need to be written in order to specify some random range of numbers, (e.g., from -2 to 2) or some random token, (e.g., `+`, `-`, `*`, `/`). Also, in various embodiments of the present invention, segments are assigned weights to make the segments more or less likely to be used. Further, multiple input (INM) arguments and multiple input/output (INOUTM) arguments of SEGMENT functions are provided for. This allows certain input to be used multiple times, (e.g., the induction variable of a loop or array types). To allow this, the two additional qualifiers for SEGMENT function arguments (IN and INOUTM) are added. Also scoping constraints are used. The scoping constraints restrict the places where a segment can be used to particular instances of SEGMENT functions. For example, if a segment uses the break statement in C, the use of that segment is restricted to loop or switch bodies. To allow this, segments are labeled with a particular scope type and specific SEGMENT( ) functions are labeled with that type. Also global (SGLOBAL) variables are used. Random source code generator engine 20 can be used to generate random loops with arrays accesses. A simplified example of segments random source code generator engine 20 uses to generate random code with arrays is shown in Table 6 below:
TABLE 6
Segment a:
HEADER
BEGIN
typedef int_array int[100];
typedef index int;
END
OUTPUT: int_array x
LOCAL: index i;
BEGIN
for (i = 0; i < 100; ++i)
x[i] = rand( );
END
Segment b:
INPUT: int_array x
OUTPUT: int z
LOCAL: index i;
BEGIN
z = 0;
for (i = 0; i < 100; ++i)
z = z*3 + x[i];
END
Segment c:
INPUT: int_array x
INOUT: int_array y
LOCAL: index i
OUTPUT:
BEGIN
for (i = 0; i < 100; ++i) {
SEGMENT(INM i, IN x, INOUT y);
}
END
Segment d:
INPUT: index i, int_array x
INOUT: int_array y
BEGIN
y[i] = y[i] * RND(-2:2) + x[i]*RND(-2:2);
END
Table 7 shows an example of random code random source code generator engine 20 generates from the segments shown in Table 6 above.
TABLE 7
typedef int_array int[100];
typedef index int;
int test_1( ) {
index i1, i2, i3, i4, i5;
int_array x1, x2, x3;
int_retval;
_retval = 0;
for (i5 = 0; i5 < 100; ++i5)
x2[i5] = rand( );
for (i4 = 0; i4 < 100; ++i4)
x1[i4] = rand( );
for (i2 = 0; i2 < 100; ++i2) 55
for (i3 = 0; i3 < 100; ++i3)
x3[i3] = rand( );
x1[i2] = x1[i2] *-1 + x3[i2]*-2;
x1[i2] = x1[i2] *2 + x2[i2]*1;
}
for (i1 = 0; i1 < 100; ++i1)
_retval = _retval*3 + x1[i1];
return _retval;
}
int test_opt_1( ) {
(Same source code as test_1( ))
}
int main( ) {
if (test_1( ) != test_opt_1( ))
printf("Test 1 failed.backslash.n");
}
In the example shown in Table 7 above, a new type called int_arrays is declared in the header segment of the algorithm shown. Header segments are described above. The new type called int_arrays declares an integer array. By doing this, arrays can be used as inputs, outputs and local variables in segments. Also, in the header segment there is declared a new type called index to hold the array index. A special type for array indices is declared so that array index variables can be specified as inputs or outputs of segments without the worry of out-of-bounds errors. Also, a segment is included that collapses an array to a single checksum integer so that all of the elements of an array can contribute to the return value of the procedure. With careful design of the segment file 21, more complex examples can be generated with multidimensional arrays, multiply nested loops, complex array indexing, and other characteristics of array intensive programs. The foregoing discussion discloses and describes merely exemplary methods and embodiments of the present invention. As will be understood by those familiar with the art, the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Accordingly, the disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.
|
Same subclass Same class Consider this |
||||||||||
