System and method for parallel variable optimization5515535Abstract An optimizer for optimizing an intermediate representation (IR) tree having multiple nodes. The IR tree represents a partial compilation of a source code. The source code is written using a high level language supporting data parallel processing. According to the present invention, the optimizer optimizes the IR tree by minimizing the number and size of temporary parallel variables in the IR tree. Such minimization optimizes the IR tree because temporary parallel variables require an enormous amount of space in memory. Claims What is claimed is: Description A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
______________________________________
1 shape 2! 10! shape A;
2 shape 2! 10! shape B;
4 unsigned int: shape A Pvariable A;
5 unsigned int: shape B Pvariable B;
______________________________________
Copyright, Thinking Machines Corporation, 1991
Code Segment 1
______________________________________
In Code Segment 1, the line numbers to the left are not included in the C* instructions but are for reference purposes only. This is true for all Code Segments contained herein. The shape declaration statements in lines 1 and 2 declare shapeA and shape B. Note that shapeA and shapeB have the same layout (that is, the same size, dimension, and order in which their elements are stored in memory). The parallel variable statements on lines 4 and 5 declare parallel variables PvariableA and PvariableB. Note that PvariableA and PvariableB do not have equivalent shapes since shapeA and shapeB, while they have the same layout, are not equivalent shapes. For two shapes to be equivalent, the shapes must either be identical (in which case the shapes would be equal), or be the subject of a shape assignment statement. A shape assignment statement is shown below in Code Segment 2.
______________________________________
1 shape ! ! shape A;
2 shape 2! 10! shape B;
4 shape A = shape B; /* shape assignment statement */
5
6 {
7 unsigned int: shape A Pvariable A;
8 unsigned int: shape B Pvariable B;
9 }
______________________________________
Copyright, Thinking Machines Corporation, 1991
Code Segment 2
______________________________________
The shape assignment statement in line 4 assigns shapeB to shapeA, making shapeA and shapeB denote the same size, dimension, and layout. The shape assignment statement can be used only when shapeA and shapeB are declared as shapes having compatible ranks and the destination shape (that is, shapeA) is not compile time fully specified. The shape assignment node makes shapeA equivalent to shapeB. Thus, due to the shape assignment instruction, PvariableA and PvariableB have equivalent shapes. Returning to FIG. 5, if the IR code optimizer 313 determines, in step 506, that the node is a shape assignment node, then the IR code optimizer 313 performs step 526. In step 526, the IR code optimizer adds the source of the assignment statement (that is, the shape to the right of the equal sign in the shape assignment statement) to the list of equivalents for the destination of the shape assignment statement (that is, the shape to the left of the equal sign in the shape assignment statement). In step 528, the IR code optimizer 313 adds the destination shape to the list of equivalents for the source shape. An example of a list of equivalents is shown below in Table 1. As Table 1 illustrates, shapeA has as an equivalent shape, shapeB. Similarly, shapeB has as an equivalent shape, shapeA.
TABLE 1
______________________________________
List of Equivalents
Shape Equivalent Shapes
______________________________________
shape A shape B
shape B shape A
______________________________________
In step 508, the IR code optimizer 313 determines whether any nodes remain in the IR tree to process. If nodes remain, the IR code optimizer 313 returns to step 504. Therefore, the IR code optimizer 313 performs steps 504 and 506 (and conditionally steps 526 and 528) for each of the nodes in the IR tree. If the IR code optimizer 313 determines, in step 508, that all of the nodes in the IR tree have been processed, then the IR code optimizer 313 processes step 510. In step 510, the IR code optimizer 313 returns to the top of the IR tree and selects the first node. In step 512, the IR code optimizer 313 determines whether the node is a (1) a parallel instruction containing a promotion of a scalar value to a parallel variable, (2) a parallel operation containing a type demotion of a parallel variable, or (3) any other type of instruction. If the IR code optimizer 313 determines that the node is a parallel operation containing a promotion of a scalar value to a parallel variable, then the IR code optimizer 313 executes step 516. In step 516, the IR code optimizer 313 attempts to eliminate the promotion of the scalar value to the parallel variable. The IR code optimizer 313 attempts to eliminate this promotion in order to minimize the number of temporary parallel variables. Generally, the IR code optimizer 313 performs step 516 by replacing an instruction having a parallel variable operand with an equivalent instruction having a scalar value operand, instead of the parallel variable operand. If the IR code optimizer 313 determines that the node contains a parallel operation which contains a type demotion of a parallel variable, then the IR code optimizer 3 13 executes step 518. In step 518, the IR code optimizer 313 attempts to reduce the size of the demoted parallel variable. Steps 516 and 518 are described in greater detail below. If, in step 512, the IR code optimizer 313 determines that the node is something other than a parallel operation containing a promotion or a parallel operation containing a type demotion, then the IR code optimizer 313 executes step 520. The IR code optimizer 313 also executes step 520 after executing steps 516 and 518. In step 520, the IR code optimizer 313 determines whether there are any more nodes in the IR tree to process. If there are more nodes, then the IR code optimizer 313 returns to step 510. Therefore, the IR code optimizer 313 executes steps 510 and 512 (and conditionally, steps 516 and 518) for each of the nodes in the IR tree. If, in step 520, the IR code optimizer 313 determines that all the nodes have been processed, then the IR code optimizer 313 executes step 522. In step 522, the IR code optimizer 313 eliminates all unnecessary temporary parallel variables. After executing step 522, the parallel variable optimization performed by the IR code optimizer 313 is complete. Step 522 is described in greater detail below. 5. Step 516: Attempt to Eliminate Promotion As noted above, during step 516 the IR code optimizer 313 attempts to eliminate promotions of scalar values to parallel variables. Of course, promotions are sometimes necessary. Thus, the IR code optimizer 313 must distinguish between promotions which may be eliminated and promotions which may not be eliminated. The example C* instructions in Code Segments 3, 4, and 5 illustrate a promotion which may be eliminated. Code Segment 3 contains an addition of a+3 at line 5 a is a parallel variable. 3 is a scalar. According to C*, the scalar 3 must be promoted to a parallel variable before it may be added to a. Code Segment 5 illustrates the unoptimized IR code 332 generated by the compiler 154. Note that Code Segment 5 has not been optimized by the IR code optimizer 313. In line 7 of the Code Segment 5, a temporary parallel variable CMC.sub.-- p.sub.-- temp.sub.-- 0 (hereinafter called temporary parallel variable 0) is created. In line 9, the scalar 3 is moved into temporary parallel variable 0, thereby promoting the scalar 3 to a parallel variable. In line 10, a is added to the temporary parallel variable 0. In line 11, the content of the temporary parallel variable 0 is moved into the parallel variable result. Therefore, the C* instruction: result=a+3 from line 5 of Code Segment 3 is represented by 4 instructions (at lines 7, 9, 10, 11) in the unoptimized IR code 332 shown in Code Segment 5. Additionally, the temporary parallel variable 0 was required. Code Segment 4 illustrates the optimized IR code 399 representation of the Code Segment 3. The optimized IR code 399 illustrated in Code Segment 4 is generated by the IR code optimizer 313. In line 9 of the Code Segment 4, the parallel variable a is added to the scalar 3. The sum is assigned to the parallel variable result. The instruction in line 9 automatically promotes the scalar 3 to a parallel variable. As illustrated by Code Segment 4, the IR code optimizer 313 has optimized the unoptimized IR code optimizer 332 illustrated in Code Segment 5 in that the addition in line 5 of the Code Segment 3 is represented by a single instruction in the optimized IR code 399 (at line 9 of Code Segment 4). Also, the IR code optimizer 313 is able to compile the addition at line 5 of Code Segment 3 without using a temporary parallel variable. Therefore, the optimized IR code 399 (illustrated in Code Segment 4) generated by the IR code optimizer 313 executes faster and requires less RAM memory.
__________________________________________________________________________
Code Segment 3
1 int: current sum.sub.-- product (
2 int: current a)
3 {
4 int: current result;
5 result = (a+3);
6 return result;
7 }
__________________________________________________________________________
Copyright, Thinking Machines Corporation, 1991
Code Segment 4
1
CMC.sub.-- i sum.sub.-- product (a)
2
CMC.sub.-- i a;
3
4
{
5
CMC.sub.-- Shape.sub.-- t CMC.sub.-- entry.sub.-- shape = ((CMC.sub.--
call.sub.-- start.sub.-- trap && CMC.sub.-- start.sub.-- trap()),
6
CM current.sub.-- vp set);
7
CMC.sub.-- Pvar.sub.-- t CMC.sub.-- retrun.sub.-- val = CMC.sub.--
global.sub.-- ret.sub.-- val;
8
CMC.sub.-- i result = CM.sub.-- allocate.sub.-- stack.sub.-- field.sub.-
- vp.sub.-- set (32, CM.sub.-- current.sub.-- vp.sub.-- set);
9
CM.sub.-- s.sub.-- add.sub.-- constant.sub.-- 2.sub.-- 1L (result, a,
3, 32);
10
CM.sub.-- s.sub.-- move.sub.-- 1L(CMC.sub.-- return.sub.-- val, result,
32);
11
CM.sub.-- deallocate.sub.-- stack.sub.-- through (result);
12
return(CMC.sub.-- return.sub.-- val);
13
}
__________________________________________________________________________
Copyright, Thinking Machines Corporation, 1991
Code Segment 5
1
CMC.sub.-- i sum.sub.-- product (a)
2
CMC.sub.-- i a;
3
{
4
CMC.sub.-- Shape.sub.-- t CMC.sub.-- entry.sub.-- shape = (CMC.sub.--
call.sub.-- start.sub.-- trap && CMC.sub.-- start.sub.-- trap()),
5
CM.sub.-- current.sub.-- vp.sub.-- set);
6
CMC.sub.-- Pvar.sub.-- t CMC.sub.-- return.sub.-- val = CMC.sub.--
global.sub.-- ret.sub.-- val;
7
CMC.sub.-- i CMC.sub.-- p.sub.-- temp 0 = CM.sub.-- allocate.sub.--
stack.sub.-- filed.sub.-- vp.sub.-- set (32, CM.sub.-- current.sub.--
vp.sub.-- set);
8
CMC.sub.-- i result = CM.sub.-- allocate.sub.-- stack.sub.-- filed.sub.-
- bp.sub.-- set(32, CM.sub.-- current.sub.-- vp.sub.-- set);
9
CM.sub.-- s.sub.-- move.sub.-- const.sub.-- always.sub.-- 1L(CMC.sub.--
p.sub.-- temp.sub.-- 0, 3, 32);
10
CM.sub.-- s.sub.-- add.sub.-- 2.sub.-- 1L(CMC.sub.-- p.sub.-- temp.sub.-
- 0, a, 32);
11
CM.sub.-- s.sub.-- move.sub.-- 1L(result, CMC.sub.-- p.sub.-- temp.sub.-
- 0, 32);
12
CM.sub.-- s.sub.-- move.sub.-- 1L(CMC.sub.-- return.sub.-- val, result,
32);
13
CM.sub.-- deallocate.sub.-- stack.sub.-- through(CMC.sub.-- p.sub.--
temp.sub.-- 0);
14
return(CMC.sub.-- return.sub.-- val);
15
}
__________________________________________________________________________
Copyright, Thinking Machines Corporation, 1991
__________________________________________________________________________
FIG. 7 illustrates the operation of the IR code optimizer 313 during step 516 in greater detail. In describing the flow chart in FIG. 7, reference will be made to Code Segments 3, 4, and 5. In step 703, the IR code optimizer 313 scans forward in the unoptimized IR code 332 for the node in which the destination of the promotion occurs as the source. For example, in Code Segment 5, the destination of the promotion in line 9 is the temporary parallel variable 0. Specifically, the temporary parallel variable 0 receives the promotion of the scalar value 3. Scanning forward from line 9, the IR code optimizer 313 would locate the instruction at line 10. The temporary parallel variable 0 is the source in the instruction at line 10 (note that in the instruction at line 10, the temporary parallel variable 0 is both a source and a destination). Therefore, the IR code optimizer 313 in step 703 would locate the instruction at line 10. In step 704, the IR code optimizer 313 would determine whether a node was found in step 703. If a node was not found, the IR code optimizer 313 could not eliminate the promotion. Therefore, the IR code optimizer 313 would return. If the IR code optimizer 313 determines that a node was found in step 703, then the IR code optimizer 313 would process step 706. In step 706, the IR code optimizer 313 would determine whether an equivalent instruction from the Paris library 162 exists wherein the field associated with the promoted value is a scalar. For example, in Code Segment 4, the instruction at line 9 performs the same function (that is, an addition) as the instruction at line 10 in Code Segment 5. However, the instruction at line 9 in Code Segment 4 contains a scalar parameter whereas the instruction at line 10 in Code Segment 5 contains a parallel variable. If such an equivalent node is not found in step 706, then the promotion cannot be eliminated and the IR code optimizer 313 returns. However, if such a node is found in step 706, then the IR code optimizer 313 processes step 708. In step 708, the IR code optimizer 313 replaces the node in the unoptimized IR code 332 with its equivalent found in step 706. Therefore, referring to Code Segment 5, the IR code optimizer 313 would replace the line 10 with the instruction shown in Code Segment 4 at line 9. In step 710, the IR code optimizer 313 eliminates the promote node. For example, in Code Segment 5, the IR code optimizer 313 eliminates the instructions at lines 7, 9, and 11. Note that these instructions are not in the optimized IR code 399 shown in Code Segment 4. 6. Step 518: Attempt to Reduce Size of Elements of Demoted Parallel Variable As noted above, during step 518 the IR code optimizer 313 attempts to reduce the size of the demoted parallel variable. Such operation by the IR code optimizer 313 is called "demotion optimization." Demotion optimization according to the present invention shall be described below with reference to Code Segments 6, 7, and 8.
__________________________________________________________________________
1 short:current sum.sub.-- product (
2 short:current a,
3 short:current b,
4 short:current c)
5 {
6 short:current result;
7 result = (a+b)*c;
8 return result;
9 }
__________________________________________________________________________
Copyright, Thinking Machines Corporation, 1991
Code Segment 6
__________________________________________________________________________
1
CMC.sub.-- s sum.sub.-- product(a,b,c)
2
CMC.sub.-- s a;
3
CMC.sub.-- s b;
4
CMC.sub.-- s c;
5
{
6
CMC.sub.-- Shape.sub.-- t CMC.sub.-- entry.sub.-- shape = ((CMC.sub.--
call.sub.-- start.sub.-- trap && CMC.sub.-- start.sub.-- trap()),
7
CM.sub.-- current.sub.-- vp.sub.-- set);
8
CMC.sub.-- Pvar.sub.-- t CMC.sub.-- retrun.sub.-- val = CMC.sub.--
global.sub.-- ret.sub.-- val;
9
CMC.sub.-- s CMC.sub.-- op.sub.-- ptemp.sub.-- 0 = CM.sub.-- allocate.s
ub.-- stack.sub.-- field.sub.-- vp.sub.-- set(16,
CM.sub.-- current.sub.-- vp.sub.-- set);
10
CMC.sub.-- s result = CM.sub.-- allocate.sub.-- stack.sub.-- field.sub.
-- vp.sub.-- set(16, CM.sub.-- current.sub.-- vp.sub.-- set);
11
CM.sub.-- s.sub.-- add.sub.-- 3.sub.-- 1L (CMC.sub.-- op.sub.--
ptemp.sub.-- 0, a, b, 16);
12
CM.sub.-- s.sub.-- multiply.sub.-- 3.sub.-- 1L(result, CMC.sub.--
op.sub.-- ptemp.sub.-- 0, c, 16);
13
CM.sub.-- s.sub.-- move.sub.-- 1L(CMC.sub.-- retrun.sub.-- val,
result, 16);
14
CM.sub.-- deallocate.sub.-- stack.sub.-- through(CMC.sub.-- op.sub.--
ptemp.sub.-- 0);
15
return(CMC.sub.-- return.sub.-- val);
16
}
__________________________________________________________________________
Copyright, Thinking Machines Corporation, 1991
Code Segment 7
__________________________________________________________________________
1
CMC.sub.-- s sum.sub.-- product(a,b,c)
2
CMC.sub.-- s a;
3
CMC.sub.-- s b;
4
CMC.sub.-- s c;
5
{
6
CMC.sub.-- Shape.sub.-- t CMC.sub.-- entry.sub.-- shape = ((CMC.sub.--
call.sub.-- start.sub.-- trap && CMC.sub.-- start.sub.-- trap()),
7
CM.sub.-- current.sub.-- vp.sub.-- set);
8
CMC.sub.-- Pvar.sub.-- t CMC.sub.-- retrun.sub.-- val = CMC.sub.--
global.sub.-- ret.sub.-- val;
9
CMC.sub.-- i CMC.sub.-- p.sub.-- temp.sub.-- 0 = CM.sub.-- allocate.sub
.-- stack.sub.-- filed.sub.-- vp.sub.-- set(64,
CM.sub.-- current.sub.-- vp.sub.-- set);
10
CMC.sub.-- i CMC.sub.-- p.sub.-- temp.sub.-- 1 = CM.sub.-- add.sub.--
offset.sub.-- to.sub.-- field.sub.-- id(CMC.sub.-- p.sub.-- temp.sub.--
0, 32);
11
CMC.sub.-- s result = CM.sub.-- allocate.sub.-- stack.sub.-- field.sub.
-- vp.sub.-- set (16, CM.sub.-- current.sub.-- vp.sub.-- set);
12
CM.sub.-- s.sub.-- move.sub.-- 2L(CMC.sub.-- p.sub.-- temp.sub.-- 0,
a, 32, 16);
13
CM.sub.-- s.sub.-- move.sub.-- 2L(CMC.sub.-- p.sub.-- temp.sub.-- 1,
b, 32, 16);
14
CM.sub.-- s.sub.-- add.sub.-- 2.sub.-- 1L (CMC.sub.-- p.sub.--
temp.sub.-- 0, CMC.sub.-- p.sub.-- temp.sub.-- 1, 32);
15
CM.sub.-- s.sub.-- move.sub.-- 2L(CMC.sub.-- p.sub.-- temp.sub.-- 1,
c, 32, 16);
16
CM.sub.-- s.sub.-- multiply.sub.-- 2.sub.-- 1L(CMC.sub.-- p.sub.--
temp.sub.-- 0, CMC.sub.-- p.sub.-- temp.sub.-- 1, 32);
17
CM.sub.-- s.sub.-- move.sub.-- 2L(result, CMC.sub.-- p.sub.-- temp.sub.
-- 0, 16, 32);
18
CM.sub.-- s.sub.-- move.sub.-- 1L(CMC.sub.-- return.sub.-- val,
result, 16);
19
CM.sub.-- deallocate.sub.-- stack.sub.-- through(CMC.sub.-- p.sub.--
temp.sub.-- 0);
20
return(CMC.sub.-- retrun.sub.-- val);
21
}
__________________________________________________________________________
Copyright, Thinking Machines Corporation, 1991
Code Segment 8
__________________________________________________________________________
Code Segment 6 illustrates a routine written in C*. Code Segment illustrates an unoptimized IR code corresponding to the routine in Code Segment 6. Code Segment 7 illustrates an optimized IR code corresponding to the unoptimized IR code of Code Segment 8. The IR code optimizer 313 performs demotion optimization in order to generate the optimized IR code of Code Segment 7 from the unoptimized IR code from Code Segment 8. Referring first to Code Segment 8, the variable a is moved to CMC.sub.-- p.sub.-- temp.sub.-- 0 (hereinafter called "temporary parallel variable 0") in line 12. At line 13, the variable b is moved to CMC.sub.-- p.sub.-- temp.sub.-- 1 (hereinafter called "temporary parallel variable 1"). In line 14, temporary parallel variable 0 (that is, variable a) is added to temporary parallel variable 1 (that is, variable b). In line 15, the variable c is moved into temporary parallel variable 1. In line 16, temporary parallel variable 0 (that is, the addition of a and b) is multiplied with temporary parallel variable 1 (that is, variable c). In line 17, the product is moved into the variable result. Note that temporary parallel variable 0 and temporary parallel variable 1 are integers having 32 bits. Variables a, b and c are short integers having 16 bits. Therefore, it is inefficient for the compiler to compile certain mathematical operations involving the short integer variables a, b, and c by using the integer temporary parallel variable 0 and temporary parallel variable 1. Code Segment 7 illustrates the optimized IR code corresponding to the unoptimized IR code in Code Segment 8. Note that in Code Segment 7, the IR code optimizer 313 has optimized the mathematical operations relating to the short integer variables a, b, and c using the variables themselves plus a short integer temporary parallel variable called CMC.sub.-- op.sub.-- ptemp.sub.-- 0 (hereinafter called "ptemp.sub.-- 0"). FIGS. 8A and 8B collectively illustrate the operation of the IR code optimizer 313 during step 518. In the operation shown in FIG. 8A, the IR code optimizer 313 determines whether a node having an output precision may be implemented using another node having a lower output precision. If such implementation is possible, then the IR code optimizer 313 inserts code to lower the output precision. In the operation shown in FIG. 8B, the IR code optimizer 313 eliminates unnecessary demote and promote nodes which were necessary to perform the above precision reduction. In describing the flow chart contained in FIGS. 8A and 8B, reference shall be made to Code Segments 7 and 8. Specifically, a running example shall be used wherein the IR code optimizer 313 is compiling the unoptimized IR code illustrated in Code Segment 8 to produce the optimized IR code shown in Code Segment 7. In step 802, the IR code optimizer 3 13 scans the IR tree backwards from the demote node (found in step 512) for a node having a destination equal to the source of the demote node. This node shall be called node 1 for reference purposes. Referring to Code Segment 8, assume that the demote node is as shown at line 17. During step 802, the IR code optimizer 313 would scan backwards and find the node corresponding to line 16. The node represented by line 16 is node 1. In step 804, the IR code optimizer 313 determines whether the operation of node 1 may be performed at a lower precision. Generally, mathematical and logical operations wherein the high order bits do not affect the result of the lower order bits may be performed at lower precisions. Such mathematical and logical operations which may be performed at lower precision include addition, subtraction, logical OR, logical AND, exclusive OR, and multiplication. Examples of mathematical and logical operations which may not be performed at a lower precision include division, shift right, and square root. Referring to Code Segment 8, the operation of node i is a multiply. As noted above, multiplies may be performed at lower precision. Therefore, the IR code optimizer 313 executes step 808. Note that if the IR code optimizer 313 determined that the operation of node 1 could not be performed at a lower precision, then the IR code optimizer 313 would not be able to perform demotion optimization. Therefore, the IR code optimizer 313 would return. In step 808, the IR code optimizer 313 generates, for each source of node 1, a temporary parallel variable having the same shape and type of the destination of the demote node. Referring to Code Segment 8, the destination of the demote node is the variable result. Result has a shape current and a type short integer. Node 1 has two sources: temporary parallel variable 0 and temporary parallel variable 1. Therefore, in step 808, the IR code optimizer 313 generates two new temporary parallel variables, each having the shape current and the type short integer. For reference purposes, these :new temporary parallel variables shall be called NT0 and NT1 (where NT stands for "new temporary parallel variable"). In step 810, for each source of node 1, the IR code optimizer 313 inserts a demote node Oust above node 1), having one source of node 1 as its source and one of the temporary parallel variables as its destination. Referring to Code Segment 8, in step 810, the IR code optimizer 313 would insert the two nodes shown in Code Segment 8.1. Code Segment 8.1 is an addition to Code Segment 8. As their line numbers indicate, the lines in Code Segment 8.1 occur after line 15 in Code Segment 8. The lines in Code Segment 8.1 represent type casts wherein temporary parallel variable 0 is east to a short integer of shape current and assigned to NT0. Similarly, temporary parallel variable 1 is cast as a short integer of shape current and assigned to NT1. In step 812, the IR code optimizer 313 replaces the destination of node 1 with the destination of the demote node. Referring to Code Segment 8, the IR code optimizer 313 replaces the node 1 at line 16 with the node 1 shown in Code Segment 8.2. Note that the IR code optimizer 313 has replaced the two variable multiply operation of line 16 (in Code Segment 8) with the three variable multiply operation in Code Segment 8.2.
__________________________________________________________________________
15.1
CM.sub.-- s.sub.-- move.sub.-- 2L(NT0, CMC.sub.-- p.sub.-- temp.sub.--
0, 16, 32);
15.2
CM.sub.-- s.sub.-- move.sub.-- 2L(NT1, CMC.sub.-- p.sub.-- temp.sub.--
1, 16, 32);
__________________________________________________________________________
Copyright, Thinking Machines Corporation, 1991
Code Segment 8.1
__________________________________________________________________________
16 CM.sub.-- s.sub.-- multiply.sub.-- 3.sub.-- 3L(result, CMC.sub.--
p.sub.-- temp.sub.-- 0, CMC.sub.-- p.sub.-- temp.sub.-- 1, 16, 32,
__________________________________________________________________________
32);
Copyright, Thinking Machines Corporation, 1991
Code Segment 8,1
__________________________________________________________________________
In step 814, the IR code optimizer 313 replaces each source of node 1 with its associated temporary parallel variable. Referring to Code Segment 8, the IR code optimizer 313 produces the node 1 as shown in Code Segment 8.3. Note that, in Code Segment 8.3, the variables result, NT0, and NT1 are all short integers. Also note that the node 1 in Code Segment 8.3 is similar to the multiply shown in line 12 of Code Segment 7. In step 815, the IR code optimizer 313 removes the demote node. Following the completion of step 815, the IR code optimizer 313 has replaced mathematical operations, from the unoptimized IR code, which use temporary parallel values of large sizes, with equivalent mathematical operations, in the optimized IR code, which use temporary parallel variables having smaller sizes. In the remaining steps of the flow chart shown in FIGS. 8A and 8B, the IR code optimizer 313 attempts to delete demotions (and corresponding promotions) which were required to construct the optimized mathematical operation (using temporary parallel variables of smaller sizes). In step 818, the IR code optimizer 313 selects one of the nodes in the IR tree. The IR code optimizer 313 starts at the top node and progresses forward. In step 820, the IR code optimizer 313 determines whether the selected node is a promote node. If the selected node is not a promote node, then the IR code optimizer 313 determines whether there are any other nodes to process (in step 822). If there are more nodes to process, the IR code optimizer 313 selects another node for processing by moving to step 818. If, in step 820, the IR code optimizer 313 determined that the selected node was a promote node, then the IR code optimizer 313 executes step 824. Referring to Code Segment 8, assume that in step 818 the IR code optimizer 313 selected the node at line 12. The node at line 12 is a promote node wherein a short integer (variable a) is promoted to an integer (temporary parallel variable 0). In processing step 820, the IR code optimizer 313 would move to step 824. In step 824, the IR code optimizer 313 scans forward from the promote node for a demote node having a source equal to the destination of the promote node. Referring to Code Segment 8, the destination of the promote node (in line 12) is temporary parallel variable 0. The node at line 15.1 (in Code Segment 8.1) is a demote node having a source equal to temporary parallel variable 0. Therefore, the node at line 15.1 is a demote node having a source (temporary parallel variable 0) equal to the destination (temporary parallel variable 0) of the promote node. Therefore, while processing step 824, the IR code optimizer 313 would select the node at line 15.1 (in Code Segment 8.1). In step 826, the IR code optimizer 313 would determine whether a demote node as specified in step 824 was found. If such a demote node was not found, then demote nodes could not be eliminated and the IR code optimizer 313 returns. If, however, the IR code optimizer 313 found a demote node in step 824, the IR code optimizer 313 would process step 828. In step 828, the IR code optimizer 313 determines whether the type of the source of the promote node equals the type of the destination of the demote node. If the types are not equal, then the IR code optimizer 313 cannot eliminate demote nodes. Therefore, the IR code optimizer 313 returns. However, if the IR code optimizer 313 determines that the types are equal, then the IR code optimizer 313 processes step 830. Referring to Code Segment 8, the source of the promote node is variable a having a type of short integer. The destination of the demote node is temporary parallel variable NT0 having a type of short integer. Therefore, the IR code optimizer 313 would find that step 828 was satisfied and therefore process step 830. In step 830, the IR code optimizer 313 removes the promote node. Referring to Code Segment 8, in processing step 830, the IR code optimizer 313 would remove the promote node at line 12. In step 832, the IR code optimizer 313 replaces the demote node with an assignment wherein the destination of the demote node is set equal to the source of the promote node. Referring to Code Segment 8.1, the IR code optimizer 313 in step 832 would replace the node at line 15.1 with the node shown in Code Segment 8.4, below.
______________________________________
16 CM.sub.-- s.sub.-- multiply.sub.-- 3.sub.-- 1L (result, NT0, NT1,
16);
______________________________________
Copyright, Thinking Machines Corporation, 1991
Code Segment 8.3
______________________________________
15.1 CM.sub.-- s.sub.-- move.sub.-- 1L(NT0, a, 16);
______________________________________
Copyright, Thinking Machines Corporation, 1991
Cose Segment 8,4
______________________________________
In step 834, the IR code optimizer 313 copy propagates the source of the assignment (created in step 832) to remove this assignment. Copy propagation is well known to those skilled in the art. Consider Code Segment 8.4. The assignment of a to NT0 essentially means that a may be used whenever NT0 is used. Therefore, during step 834, the IR code optimizer 313 scans through the IR tree and replaces all occurrences of NT0 with the variable a. Since a is used everywhere instead of NT0, NT0 is superfluous. Therefore, during step 834, the IR code optimizer 3 13 removes .the assignment of a to NT0 at line 15.1 (in Code Segment 8.4). This is illustrated in Code Segment 7 at line 11 wherein a is used in the add operation rather than a temporary parallel variable. After executing step 834, the IR code optimizer 313 has completed its demotion optimization of the demoted node found in step 512. Note that step 518 will be performed for the demote node in line 15.2 of Code Segment 8.1. After line 518 has been performed for all of the demote nodes in the Code Segment 8 (including the additions to the Code Segment 8 shown in Code Segments 8.1 and 8.3), the optimized IR code shown in Code Segment 7 will be achieved. 7. Step 522: Eliminate Unnecessary Temporary Parallel Variables As noted above, during step 522 the IR code optimizer 313 eliminates unnecessary temporary parallel variables. Unnecessary temporary parallel variables and the elimination of such unnecessary temporary parallel variables are illustrated in the C* instruction shown in Code Segment 9, 10, and 11. Code Segment 9 represents the source code 308. Code Segment 11 represents the unoptimized IR code 332 of the source code 308 shown in Code Segment 9. Code Segment 10 represents the optimized IR code 399 of the unoptimized IR code 332 shown in Code Segment 11.
__________________________________________________________________________
1 void fraction (a, b, c)
2 double:current a, b, c;
3 {
4 double:current m;
6 m = a*a/b = b*b/c;
7 }
__________________________________________________________________________
Copyright, Thinking Machines Corporation, 1991
Code Segment 9
__________________________________________________________________________
1 void fraction (a, b, c)
2 CMC.sub.-- d a;
3 CMC.sub.-- d b;
4 CMC.sub.-- d c;
5 {
6 CMC.sub.-- Shape.sub.-- t CMC.sub.-- entry.sub.-- shape = ((CMC.sub.--
call.sub.-- start.sub.-- trap && CMC.sub.-- start.sub.-- trap()),
7 CM.sub.-- current.sub.-- vp.sub.-- set);
8 CMC.sub.-- d CMC.sub.-- p.sub.-- temp.sub.-- 0 = CM.sub.-- allocate.sub
.-- stack.sub.-- field.sub.-- vp.sub.-- set(128, CM.sub.-- current.sub.
-- vp.sub.-- set);
9 CMC.sub.-- d CMC.sub.-- p.sub.-- temp.sub.-- 2 = CM.sub.-- add.sub.--
offset.sub.-- to.sub.-- field.sub.-- id(CMC.sub.-- p.sub.-- temp.sub.--
0, 64);
10 CMC.sub.-- d m = CM.sub.-- allocate.sub.-- stack.sub.-- field.sub.--
vp.sub.-- set(64, CM.sub.-- current.sub.-- vp.sub.-- set);
11 CM.sub.-- f.sub.-- multiply.sub.-- always.sub.-- 3.sub.-- 1L(CMC.sub.--
p.sub.-- temp.sub.-- 0, a, a, 52, 11);
12 CM.sub.-- f.sub.-- divide.sub.-- always.sub.-- 3.sub.-- 1L(CMC.sub.--
p.sub.-- temp.sub.-- 0, b, 52, 11);
13 CM.sub.-- f.sub.-- multiply.sub.-- always.sub.-- 3.sub.-- 1L(CMC.sub.--
p.sub.-- temp.sub.-- 2, b, b, 52, 11);
14 CM.sub.-- f.sub.-- divide.sub.-- always.sub.-- 2.sub.-- 1L(CMC.sub.--
p.sub.-- temp.sub.-- 2, c, 52, 11);
15 CM.sub.-- f.sub.-- add.sub.-- 3.sub.-- 1L(m, CMC.sub.-- p.sub.--
temp.sub.-- 0, CMC.sub.-- p.sub.-- temp.sub.-- 2, 52, 11);
16 CM.sub.-- deallocate.sub.-- stack.sub.-- through(CMC.sub.-- p.sub.--
temp.sub.-- 0);
17 }
__________________________________________________________________________
Copyright, Thinking Machines Corporation, 1991
Code Segment 10
__________________________________________________________________________
void fraction(a, b, c)
2 CMC.sub.-- d a;
3 CMC.sub.-- d b;
4 CMC.sub.-- d c;
5 {
6 CMC.sub.-- Shape.sub.-- t.sub.-- CMC.sub.-- entry.sub.-- shape =
((CMC.sub.-- call.sub.-- start.sub.-- trap && CMC.sub.-- start.sub.--
trap()),
7 CM.sub.-- current.sub.-- vp.sub.-- set);
8 CMC.sub.-- d CMC.sub.-- p.sub.-- temp.sub.-- 0 = CM.sub.-- allocate.sub
.-- stack.sub.-- field.sub.-- vp.sub.-- set(256, CM.sub.-- current.sub.
-- vp.sub.-- set);
9 CMC.sub.-- d CMC.sub.-- p.sub.-- temp.sub.-- 1 = CM.sub.-- add.sub.--
offset.sub.-- to.sub.-- field.sub.-- id(CMC.sub.-- p.sub.-- temp.sub.--
0, 64);
10 CMC.sub.-- d CMC.sub.-- p.sub.-- temp.sub.-- 2 = CM.sub.-- add.sub.--
offset.sub.-- to.sub.-- field.sub.-- id(CMC.sub.-- p.sub.-- temp.sub.--
1, 64);
11 CMC.sub.-- d CMC.sub.-- p.sub.-- temp.sub.-- 3 = CM.sub.-- add.sub.--
offset.sub.-- to.sub.-- field.sub.-- id(CMC.sub.-- p.sub.-- temp.sub.--
2, 64);
12 CMC.sub.-- d m = CM.sub.-- allocatte.sub.-- stack.sub.-- field.sub.--
vp.sub.-- set(64, CM.sub.-- current.sub.-- vp.sub.-- set);
13 CM.sub.-- f.sub.-- multiply.sub.-- always.sub.-- 3.sub.-- 1L(CMC.sub.--
p.sub.-- temp.sub.-- 0, a, a, 52, 11);
14 CM.sub.-- f.sub.-- divide.sub.-- always.sub.-- 3.sub.-- 1L(CMC.sub.--
p.sub.-- temp.sub.-- 1, CMC.sub.-- p.sub.-- temp.sub.-- 0, b, 52,
11);
15 CM.sub.-- f.sub.-- multiply.sub.-- always.sub.-- 3.sub.-- 1L(CMC.sub.--
p.sub.-- temp.sub.-- 2, b, b, 52, 11);
16 CM.sub.-- f.sub.-- divide.sub.-- always.sub.-- 3.sub.-- 1L(CMC.sub.--
p.sub.-- temp.sub.-- 3, CMC.sub.-- p.sub.-- temp.sub.-- 2, c, 52,
11);
17 CM.sub.-- f.sub.-- add.sub.-- 3.sub.-- 1L(m, CMC.sub.-- p.sub.--
temp.sub.-- 1, CMC.sub.-- p.sub.-- temp.sub.-- 3, 52, 11);
18 CM.sub.-- deallocate.sub.-- stack.sub.-- through(CMC.sub.-- p.sub.--
temp.sub.-- 0);
19 }
__________________________________________________________________________
Copyright, Thinking Machines Corporation, 1991
Code Segment 11
__________________________________________________________________________
Note that the unoptimized IR code 399 shown in Code Segment includes four temporary parallel variables. The optimized IR code 399 shown in Code Segment 10 uses two temporary parallel variables. The manner in which the IR code optimizer 313 executes step 522 to eliminate unnecessary temporary parallel variables will now be described in greater detail. FIGS. 9A and 9B collectively illustrate the operation of the IR code optimizer 313 during step 522. In step 902, the IR code optimizer 313 conducts a live variable analysis to generate conflict sets of the temporary parallels in the unoptimized IR code 332. The conflict sets contain live variables. In this context, live variables refer to temporary parallel variables which are simultaneously needed for processing some instruction. Therefore, the temporary parallel variables in a conflict set are said to conflict since one of the temporary parallel variables cannot be used in place of another temporary parallel variable since both are required to process a future instruction. Live variables, conflict sets, and methods for generating conflict sets are well-known to those skilled in the art. For example, referring to the unoptimized IR code 332 in Code Segment 11, the temporary parallel variable 0 and temporary parallel variable 1 do not conflict. Also, the temporary parallel variable 2 and temporary parallel variable 3 do not conflict. Therefore, the temporary parallel variable 0 and the temporary parallel variable 1 would not be in the same conflict set. Also, the temporary parallel variable 2 and the temporary parallel variable 3 would not be in the same conflict set. In step 904, the IR code optimizer 313 selects one of the temporary parallel variables in the unoptimized IR code 332 for processing. In step 906, the IR code optimizer 313 determines whether any equivalence classes exist. An equivalence class is essentially the opposite of a conflict set. An equivalence class contains temporary parallel variables which do not conflict. Additionally, the temporary parallel variables in an equivalence class have the same data type and the same shape. According to the present invention, each of the equivalence classes has a class representative. The class representative is one of the temporary variables in the equivalence class. All of the other temporary variables in the equivalence class map to the class representative. In other words, the class representative in an equivalence class replaces all of the other temporary parallel variables in the equivalence class. In this manner, the number of temporary parallel variables is minimized. If the IR code optimizer 313 determines in step 906 that more equivalence classes exist, then the IR code optimizer 313 processes step 908. In processing step 908, and subsequent steps 910, 912, and 914, the IR code optimizer 313 attempts to assign the selected temporary parallel variable to an equivalence class. Therefore, in step 908, the IR code optimizer 313 selects one of the equivalence classes. In step 910, the IR code optimizer 313 determines whether the selected temporary parallel variable conflicts with the class representative of the selected equivalence class. If they conflict, then the selected temporary parallel variable cannot be added to the current equivalence class. Therefore, the IR code optimizer 313 loops back to step 906 to process the next equivalence class. If, however, in step 910, the IR code optimizer 313 determines that the selected temporary parallel variable does not conflict with the class representative of the selected equivalence class (selected in step 908), then the IR code optimizer 313 processes step 912. In step 912, the IR code optimizer 313 determines whether the selected temporary parallel variable has a shape and data type which is equivalent to the class representative of the selected equivalent class. The manner in which the IR code optimizer 313 determines the shape equivalence/nonequivalence of two parallel variables is illustrated in FIG. 6. Note that the flowchart in FIG. 6 is written in terms of determining whether the shape of a parallel variable A is equivalent to the shape of a parallel variable B. In the context of implementing step 912 of FIG. 9A, assume that parallel variable A represents the selected temporary parallel variable and that parallel variable B represents the class representative of the selected equivalence class. In step 602, the IR code optimizer 313 determines whether the shape of variable A is equal to the shape of variable B. As discussed above, two parallel variables can only have equal shapes if the parallel variables were declared using the identical shape. This is illustrated in the C* instructions shown in Example 6. As illustrated in Example 6, variable1 and variable2 are declared using shapeC (see line 8). Therefore, the shape of variable1 and the shape of variable2 are equal.
______________________________________
1 shape 100! shape C;
2 shape ! shpae D;
3 shape 10! shape E;
4
5 shape D = shape E;
6
7 {
8 int: shape C variable 1, variable 2;
9 int: shape D variable 3;
10 int: shape E variable 4;
11 }
______________________________________
Copyright, Thinking Machines Corporation, 1991
Example 6
______________________________________
If, in step 602, the shape of the variable A is not equal to the shape of the variable B, then the IR code optimizer 313 executes step 604. In step 604, the IR code optimizer 313 determines whether the shape of the variable A and the shape of the variable B are equivalent. As noted above, the shapes of two parallel variables are equivalent if the shape of one of the parallel variables has been assigned to the shape of the other parallel variable via a shape assignment node. For example, in Example 6, the shape of variable3 is equivalent to the shape of variable4 because shaped and shapeE have been assigned to one another in the shape assignment instruction in line 5. According to the preferred embodiment of the present invention, the IR code optimizer 313 performs step 604 by using the list of equivalents which the IR code optimizer 313 generated in steps 504, 506, 508, 526, and 528. Specifically, the IR code optimizer 313 accesses the list of equivalents and determines whether the shape of the variable A is listed as an equivalent to the shape of the variable B (alternatively, the IR code optimizer 313 could access the list of equivalents and determine whether the shape of the variable B is listed as an equivalent of the shape of the variable A). If the IR code optimizer 313 determines that the shape of the variable A is either equal to (from step 602) or equivalent to (from step 604) the shape of the variable B, then the IR code optimizer 313 executes step 608. Otherwise, the IR code optimizer 313 executes step 606. In step 608, the IR code optimizer 313 indicates that the shape of the variable A is equivalent to the shape of the variable B. (Note that if two shapes are equal, then they are also equivalent.) In step 606, the IR code optimizer 313 indicates that the shape of the variable A is not equivalent to the shape of the variable B. (Note that if two shapes are not equivalent, then the two shapes cannot be equal.) Referring again to FIG. 9A, if the IR code optimizer 313 determines in step 912 that the selected temporary variable and the class representative of the selected equivalence class do not have equivalent shape and data type, then the selected temporary parallel variable cannot be added to the selected equivalence class. Therefore, the IR code optimizer 313 loops back to step 906 to process the next equivalence class. However, if the selected temporary parallel variable has an equivalent shape and data type as the class representative of the selected equivalence class, the IR code optimizer 313 processes step 914. In step 914, the IR code optimizer 313 adds the selected temporary parallel variable to the selected equivalence class. In steps 917, 930, 932, and 934, the IR code optimizer 313 determines whether the selected temporary parallel variable, which has just been added to the selected equivalence class, should be assigned as the class representative of the selected equivalence class. Therefore, in step 917, the IR code optimizer 313 determines whether the selected temporary parallel variable and class representative of the selected equivalence class have the same scope in the unoptimized IR code 332. The meaning and implication of scope, with regard to computer programs, are well-known to those skilled in the art. If, in step 917, the IR code optimizer 313 determines that they have the same scope, then the IR code optimizer 313 executes step 932. Otherwise, the IR code optimizer 313 executes step 930. In step 932, the IR code optimizer 313 determines whether the temporary parallel variable was declared before the class representative. If the selected temporary parallel variable was declared before the class representative, then the IR code optimizer 313 sets the class representative equal to the selected temporary parallel variable. This is done in step 934. If, in step 932, the IR code optimizer 313 finds that the selected temporary parallel variable was declared after the class representative, then the IR code optimizer 313 does not change the class representative. Rather, the IR code optimizer 313 loops to step 926 to process the next temporary parallel variable. In step 930, the IR code optimizer 313 determines whether the selected temporary parallel variable is at a higher scope than the class representative. If the selected temporary parallel variable is at a higher scope, then the IR code optimizer 313 sets the class representative equal to the selected temporary parallel variable. This is done in step 934. Otherwise, the IR code optimizer 313 loops to step 926 in order to process the next temporary parallel variable. Consider once again step 906. If, in processing step 906, the IR code optimizer 313 determines that more equivalence classes do not exist, then the IR code optimizer 313 executes steps 918, 920, 922, and 926 in order to create a new equivalence class. Specifically, in step 918 the IR code optimizer 313 creates a new equivalence class. In step 920, the IR code optimizer 313 assigns the selected temporary parallel variable to the new equivalence class. In step 922, the IR code optimizer 313 sets the class representative of the new equivalence class equal to the selected temporary parallel variable. In step 926, the IR code optimizer 313 determines whether there are more temporary parallel variables in the unoptimized IR code 332 to process. If more temporary parallel variables exist, then the IR code optimizer 313 loops back to 904 to process the next temporary parallel variable. Otherwise, the IR code optimizer 313 returns. The operation of the IR code optimizer 313 during step 522 is further described below with reference to Code Segments 9, 10, and 11. Specifically, if the IR code optimizer 313 processed the unoptimized IR code 332 of Code Segment 11 according to the flow chart shown in FIGS. 9A and 9B, then the IR code optimizer 313 would create two equivalence classes according to steps 918, 920, and 922. The first equivalence class would have temporary parallel variable 0 and temporary parallel variable 1. The second equivalence class would contain temporary parallel variable 2 and temporary parallel variable 3. The IR code optimizer 313 would build the equivalence classes by performing steps 906, 908, 910, 912, and 914. Note that temporary parallel variable 0, temporary parallel variable 1, temporary parallel variable 2, and temporary parallel variable 3 all have equivalent shape and data type. Therefore, the membership in the first and second equivalence classes is based entirely on conflict among the temporary parallel variables (as determined in step 910). Such conflict between the four temporary parallel variables of Code Segment 11 is described above. The class representative of the first equivalence class is temporary parallel variable 0. The class representative of the second equivalence class is temporary parallel variable 2. Note that the four temporary parallel variables in the Code Segment 1 all have the same scope. Therefore, class representative selection in the first and second equivalence classes is determined entirely on order of declaration (as determined in step 932). Note that temporary parallel variable 0 is declared before temporary parallel variable 1. Also note that temporary parallel variable 2 is declared before temporary parallel variable 3. Therefore, temporary parallel variable 0 and temporary parallel variable 2 are the class representatives in their respective equivalence classes. Since temporary parallel variable 0 is the class representative in the first equivalence class, the temporary parallel variable 1 maps to temporary parallel variable 0. This is illustrated in the optimized IR code 399 of Code Segment 10 where all references to temporary parallel variable 1 are replaced by references to temporary parallel variable 0. Since temporary parallel variable 2 is the class representative in the second equivalence class, the temporary parallel variable 3 maps to the temporary parallel variable 2. This is illustrated in the optimized IR code 399 of Code Segment 10 wherein all references to the temporary parallel variable 3 are replaced by references to the temporary parallel variable 2. While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
|
Same subclass Same class Consider this |
||||||||||
