System and method for multilevel promotion5561801Abstract A compiler for compiling a computer program wherein the computer program is adapted for use with a data parallel computer. The compiler comprises a front end which generates a parse tree from a source code. In generating the parse tree, the front end coordinates the compilation of type conversion operations and promotion operations such that run-time efficiency is maximized. In other words, the front end compiles the type conversion operations and promotion operations in an order which maximizes run-time efficiency. Claims What is claimed is: Description A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
______________________________________
shape [8]employees; (Example 1)
shape [16]companies;
______________________________________
The statements in Example 1 declare a shape called employees and a shape called companies. The employees shape has one dimension (a rank of 1) and 8 positions. The companies shape has one dimension and 16 positions. A dimension is also referred to as an axis. A shape can have multiple axes. Each of the axes are specified in a set of brackets to the left of the shape name. For example, the following statement in Example 2 declares a two-dimensional shape:
______________________________________
shape [256][512]image;
(Example 2)
______________________________________
The statement in Example 2 declares a shape called image. The shape image has two dimensions (a rank of 2), one of 256 positions and another of 512 positions. The left-most axis is referred to as axis 0. The next axis to the right is referred to as axis 1. Parallel variables are similar to standard C variables. However, parallel variables have a shape in addition to their data type and storage classes. The shape defines how many elements of a parallel variable exists, and how they are organized. Each element occupies one position within the shape and contains a single value. If a shape has 16384 positions, for example, a parallel variable of that shape has 16384 elements--one for each position. Parallel variables are declared as shown in Example 3.
______________________________________
double:employees employee.sub.-- id;
(Example 3)
char:companies company.sub.-- id;
______________________________________
In Example 3, employee.sub.-- id is declared as a parallel variable having shape employees and type double. company.sub.-- id is declared as a parallel variable having shape companies and type character. Each element of a parallel variable can be thought of as a single scalar variable. But a C* program can also carry out operations on all elements (or any subset of the elements) of a parallel variable at the same time. Operations which operate on parallel variables are called parallel operations. Once a parallel variable has been declared, left indexing may be used to specify an individual element of it. For example, [2]employee.sub.-- id refers to the third element of employee.sub.-- id. [2] is called the coordinate for this element. Elements of parallel variables are mapped to physical processors 112. This is illustrated in FIG. 1B. FIG. 1B illustrates the data parallel computer 110 in greater detail. As shown in FIG. 1B, the first element of employee.sub.-- id is mapped to the processor 0. The second element of employee.sub.-- id is mapped to the processor 1. Similarly, the third, forth, fifth, sixth, seventh, and eight elements of employee.sub.-- id are mapped to the processors 2, 3, 4, 5, 6 and 7, respectively. In a similar manner, the elements of the parallel variable company.sub.-- id are mapped to the processors 0-7. With regard to parallel variables of shape companies, note that each element does not map to a different physical processor. For example, both elements [0]company.sub.-- id and [8]company.sub.-- id map to physical processor 0. However, according to C*, all elements map to different virtual processors. Thus, for particular shapes, each physical processor may represent multiple virtual processors. For the shape companies, the number of virtual processors per physical processor is 2. For the shape employees, the number of virtual processors per physical processor is 1. This is called the virtual processor (VP) ratio. Before a parallel operation may be executed on a parallel variable, the current shape must be set to the shape of the parallel variable. The current shape may be set by using the C* "with" statement. For example, to operate with parallel variables of shape employees (such as employee.sub.-- id), the C* statements in Code Segment 1 may be used.
______________________________________
with(employees) {
/* operations on parallel variables of
shape employees go here */
};
Code Segment 1
______________________________________
Within the "with" statement of Code Segment 1, parallel operations may be performed on parallel variables of shape employees. However, parallel operations may not be performed on parallel variables (such as company.sub.-- id), having other shapes (such as companies). According to C*, parallel operations may be performed on a subset of the elements of a parallel variable. This is done using the C* "where" statement. The "where" statement restricts the context in which parallel operations are performed. A "where" statement specifies which positions in a shape remain active. Code in the body of a "where" statement applies only on elements in active positions. For example, the C* statements in Code Segment 2, below, restrict parallel operations to positions of shape employees where the value of the parallel variable employee.sub.-- id is greater than 6.
______________________________________
with(employees)
where (employee.sub.-- id > 6)
/* parallel code in restricted context
goes here */
Code Segment 2
______________________________________
The controlling expression that "where" evaluates to set the context must operate on a parallel variable of the current shape. The controlling expression evaluates to 0 (false) or non-zero (true) separately for each position. Positions in which the expression is false are made inactive. If no positions are active, code is still executed, but operations on the parallel variable of the current shape has no result. Initially, operations in all shapes are active. Referring to FIG. 1B, for example, the "where" statement in Code Segment 2 evaluates to true for the employee.sub.-- id elements in processors 4, 5 and 7. Therefore, as a result of the "where" statement in Code Segment 2, the employee.sub.-- id elements in processors 4, 5 and 7 are active. The use of the "where" statement to make certain positions of a shape inactive and other positions of the shape active is called context manipulation. C* supports promotion of scalar values to parallel variables. According to C*, programmers can use standard C binary operations when one of the operands is parallel and the other is scalar (the parallel variable must be of the current shape). In such cases, the scalar value is first promoted to a parallel value of the shape of the parallel operand, and this parallel value is used in the operation. C* also supports type conversions. Both promotions and type conversions may be performed in the same instruction. Consider Code Segment 3, below.
______________________________________
1 char c;
2 employee.sub.-- id = c;
Code Segment 3
______________________________________
Recall from Example 3 that employee.sub.-- id was declared as a parallel variable having a shape employees and a type double. In code segment 3, c is declared as a scalar variable having a type character. Therefore, the assignment statement in line 2 of Code Segment 3 causes c to be promoted to a parallel variable having a shape employees and a type double. In other words, the assignment statement in line 2 of Code Segment 3 causes two operations to be performed. Specifically, a promotion of a scalar variable to a parallel variable is performed. Also, a type conversion from an original base type (that is, character) to a final base type (that is, double) is performed. 3. Overview of the C* Compiler The general operation of the C* compiler is similar to the operation of conventional compilers. Such operation of conventional compilers are well known and are described in many publically available documents, such as Compilers, Principles, Techniques, and Tools by Aho, Sethi, and Ullman (Addison-Wesley Publishing Company, Reading Mass., 1988), which is herein incorporated by reference in its entirety. As noted above, however, the C* source code 308 may include instructions involving parallel variables. The compilation of such parallel instructions are not well known. In compiling the C* source code 308, the C* compiler 154 replaces these parallel instructions with calls to functions in the Paris library 162. FIG. 3A is a high level block diagram of the structure of the C* compiler 154 of the present invention. As FIG. 3A shows, the C* compiler 154 is comprised of a front end 310, a middle end 314, an optimizer 313, and a back end 314. FIG. 3B illustrates the structure of the front end 310. The front end 310 includes a lexical analyzer 320, a syntax analyzer 322, a dot handler 324, a type caster 326 and a semantic analyzer 328. FIG. 4 is a flow chart which illustrates the interaction between the modules of FIG. 3A. The flow chart of FIG. 4 also illustrates the detailed method of the step 210 of FIG. 2 of generating C source code from C* source code. In a step 410, the front end 310 receives C* source code 308 and constructs a parse tree 318. Parse tree construction is carried out by the lexical analyzer 320, syntax analyzer 322, dot handler 324, type caster 326 and semantic analyzer 328. In a step 412, the middle end 312 receives the parse tree 318 and generates an intermediate representation (IR) code 332. The IR code 332 is also called the IR tree 332. The IR code 332 contains calls to functions in the Paris library 1962. Note that the IR code 332 generated by the middle end 312 is unoptimized. In a step 414, the IR code optimizer 313 optimizes the unoptimized IR code 332 received from the middle end 312 to produce optimized IR code 399. One aspect of the optimization is carried out by an efficient parallel communication module 338 of the optimizer 313. Specifically, the efficient grid communication module 338 replaces IR nodes which specify general parallel communication with IR nodes which specify grid communication where doing so is possible and would result in a more efficient communication. The structure and operation of the efficient grid communication module 338 is described in a pending patent application entitled "A Compiler For Parallel Communication Instructions", which was cited above. In a step 415, the optimizer 313 performs other optimization, such as parallel variable optimization. Such optimization is described in a pending patent application entitled "System and Method For Parallel Variable Optimization", which was cited above. The back end 314 receives either the unoptimized IR code 332 (from the middle end 312) or the optimized IR code 399 (from the optimizer 313). In a step 416, the back end 318 generates C source code. As noted above, the C source code contains calls to functions in the Paris library 162. Thus, the C source code is also called C/Paris source code. 4. System and Method for Multilevel Promotions As noted above (see Code Segment 3 and the accompanying text), C* supports promotion of scalar variables to parallel variables. C* also supports type conversions. As also noted above, the elements of parallel variables are mapped to physical processors. Therefore, a promotion of a scalar variable to a parallel variable can be viewed as a broadcast operation wherein the value of the scalar variable is broadcast to the processors in which the elements of the destination parallel variable are located. Consequently, the terms broadcast operation and promotion operation shall be used interchangeably in this patent document. In compiling a parallel instruction which contains both a promotion operation and a type conversion operation (such as the assignment statement in line 2 of Code Segment 3), the front end 310 endeavors to achieve two goals. First, the front end attempts to make it explicit that the parallel instruction contains both a promotion operation and a type conversion operation. In other words, the front end 310 compiles the parallel instruction using at least two operations, wherein one of the operations is a promotion operation and another of the operations is a type conversion operation. By making it explicit that the parallel instruction contains both a promotion operation and a type conversion operation, the front end 310 facilitates the operation of the other portions of the compiler 154 (such as the middle end 312, optimizer 313, and back end 314). In addition, the front end 310 attempts to optimize the parallel instruction with regard to the order in which the promotion operation and the type conversion operation are performed. In some instances, it is more efficient for the promotion operation to be executed before the type conversion operation. In other instances, it is more efficient for the type conversion operation to be executed before the promotion operation. According to the present invention, the front end 310 determines and uses the ordering which is most run-time efficient. FIG. 5 illustrates the operation of the front end 310 when the front end 310 is compiling a parallel instruction which contains both a promotion operation and a type conversion operation. According to the method shown in FIG. 5, the front end 310 determines and compares the cost of a first alternative and a second alternative. The front end 310 then selects and uses the alternative which has the lowest cost. Herein, the term "first alternative" refers to performing the broadcast operation and then performing the type conversion operation. The term "second alternative" refers to first performing the type conversion operation and then the broadcast operation. According to the first alternative, the type conversion operation is performed by the processors 112 in the data parallel computer 110. This is the case since the scalar value is first broadcast from the front end computer 146 to the data parallel computer 110 before the type cast operation is performed. According to the second alternative, the type conversion operation is performed by the front end computer 146. This is the case since the type conversion operation is performed before the broadcast operation is performed. Referring now to FIG. 5, in steps 504, 506, and 508 the front end 310 determines the cost of the first alternative. Specifically, in step 504 the front end 310 determines the cost to broadcast a value of the original type. For example, referring to Code Segment 3, the front end 310 would determine the cost to broadcast a character value (associated with variable c) to the relevant processors (that is, the processors on which the elements of employee.sub.-- id are mapped). In step 506, the front end 310 determines the cost of performing a type conversion operation from a parallel value of the original base type to a parallel value of the final base type. For example, referring to Code Segment 3, in step 506, the front end 310 would determine the cost of performing a type conversion operation in the processors 112 from the character type (that is, the original base type) to the double type (that is, the final base type). In step 508, the front end 310 sums the cost associated with the first alternative. In other words, the front end 310 sums the costs determined in step 504 and 506. In steps 510, 512, and 514 the front end 310 determines the cost of the second alternative. Specifically, in step 510, the front end 310 determines the cost of a type conversion operation of a scalar value of the original base type to a scalar value of the final base type performed in the front end computer 146. For example, referring to Code Segment 3, the front end 310 in step 510 would determine the cost of a type conversion performed by the front end computer 146 from the character type (that is, the original base type) to the double type (that is, the final base type). In step 512, the front end 310 determines the cost to broadcast the converted value of the final base type. For example, referring to Code Segment 3, the front end 310 in step 512 determines the cost to broadcast a value of the type double (that is, the final base type) to the relevant processors 112 (that is, the processors on which the elements of employee.sub.-- id are mapped). In step 514, the front end 310 sums the cost of the second alternative. In other words, the front end 310 sums the costs determined in steps 510 and 512. In step 516, the front end 310 determines whether the costs associated with the first alternative are greater than the costs associated with the second alternative. If the costs associated with the first alternative are greater than the costs associated with the second alternative, then the front end 310 forms step 520. Otherwise, the front end performs step 518. In step 520, the front end 310 compiles the parallel instruction containing both the promotion operation and the type conversion operation by first emitting a type conversion instruction and then emitting a broadcast instruction. In step 518, the front end 310 compiles the parallel instruction containing both the promotion operation and the type conversion operation by first emitting a broadcast operation and then emitting a type conversion operation. Note that such costs relating to type conversion operations and promotion operations are readily determined from the information which accompany specific processors. Ordinarily, this processor information specifies promotion operations and type conversion operations in terms of required machine cycles. The operation of the front end 310 according to the method shown in FIG. 5 may be compile-time impractical since the flow chart in FIG. 5 must be performed by the front end 310 for every parallel instruction containing both a promotion operation and a type conversion operation. Therefore, according to a second embodiment of the present invention, either the first alternative or the second alternative is selected. The front end 310 then always operates according to the selected alternative. The selected alternative is determined by performing computer simulations which yield run-time performance numbers regarding the first alternative and the second alternative. From these computer simulations, it is possible to empirically determine which alternative, on average, is more run-time efficient. The first alternative may be advantageous since it broadcasts values of the original base type rather than values of the final base type. Thus, the first alternative is advantageous if the original base type is smaller than the final base type since it takes less time to transmit smaller values. However, the first alternative is not advantageous in this regard if the original base type is larger than the final base time. The second alternative may be advantageous because the front end computer 146 may be more efficient at performing type conversion operations than the processors 112 of the data parallel computer 110. The run-time efficiency of the first alternative decreases as the VP ratio increases. This is true because, with greater VP ratios, multiple iterations may be required to perform the type conversion operations on the processors 112 in the data parallel computer 110. Since the VP ratio is dependent on machine size (that is, the number of processors) and problem size (that is, the number of positions in shapes), the VP ratio cannot be determined at compile time. Thus, it is impossible to determine at compile time whether the VP ratio will adversely affect the run-time efficiency of the first alternative. In other words, it is impossible to determine at compile time the run-time efficiency of the first alternative. In contrast, the run-time efficiency of the second alternative is not affected by the VP ratio. Thus, the second alternative is advantageous because, unlike the first alternative, its run-time efficiency may be more precisely determined at compile-time. According to the preferred embodiment of the present invention, the front end 310 always operates according to the second alternative. In other words, in compiling parallel instructions containing both promotion operations and type conversion operations, the front end 310 always first emits a type conversion operation. Then, the front end 310 emits a promotion operation. The front end 310 operates in this manner because, according to the preferred embodiment, the front end computer 146 is more efficient at performing type conversion operations than the processors 112 in the data parallel computer 110. Code Segments 4 and 5 illustrate the operation of the front end 310 with regard to compiling parallel instructions containing both promotion operations and type conversion operations. Code Segment 4 contains a C* source code 308. Code Segment 5 contains C/Paris code 340 which corresponds to the C* source code 308 in Code Segment 4.
______________________________________
Code Segment 4
1 main() {
2 shape [8192]S;
3 char c;
4 int x;
5 int:S i;
6 double:S d;
8 /* Implicit cast */
9 x = c;
10 i = c;
11 d = c;
12
13 /* Explicit cast */
14 x = (int) c;
15 i = (int:S) c;
16 d = (double:S) c;
17
18 /* Explicit casts: first type conversion then promotion */
19 i = (int:S)(int) c;
20 d = (double:S)(double) c;
21
22 /* Explicit casts: first promotion then type conversion */
23 i = (int:S)(char:S) c;
24 d = (double:S)(char:S) c;
25 }
Code Segment 5
1 #include <.sub.-- CMC.sub.-- types.h>
2 #include <.sub.-- CMC.sub.-- defs.h>
3
4 static char CMC.sub.-- version[] = "C* Version 6.0.2 (60) for sun4";
5 int main();
6 int main()
7 {
8 CMC.sub.-- Shape.sub.-- t CMC.sub.-- entry.sub.-- shape =
((CMC.sub.-- call.sub.-- start.sub.-- trap &&
9 CMC.sub.-- start.sub.-- trap()), CM.sub.-- current.sub.-- vp.sub.--
set);
10 int CMC.sub.-- s.sub.-- temp.sub.-- 1;
11 double CMC.sub.-- s.sub.-- temp.sub.-- 2;
12 CMC.sub.-- Shape.sub.-- t S = allocate.sub.-- shape(&S, 1, 8192);
13 char c,
14 int x;
15 CMC.sub.-- Pvar.sub.-- t i = CM.sub.-- allocate.sub.-- stack.sub.--
field.sub.-- vp.sub.-- set(96, S);
16 CMC.sub.-- Pvar.sub.-- t d = CM.sub.-- add.sub.-- offset.sub.--
to.sub.-- field.sub.-- id(i, 32);
17 x = (int )c;
18 CMC.sub.-- s.sub.-- temp.sub.-- 1 = (int )c;
19 CM.sub.-- s.sub.-- move.sub.-- constant.sub.-- 1L(i, CMC.sub.--
s.sub.-- temp.sub.-- 1, 32);
20 CMC.sub.-- s.sub.-- temp.sub.-- 2 = (double )c;
21 CM.sub.-- f.sub.-- move.sub.-- constant.sub.-- 1L(d, CMC.sub.--
s.sub.-- temp.sub.-- 2, 52, 11);
22 x = (int )c;
23 CMC.sub.-- s.sub.-- temp.sub.-- 1 = (int )c;
24 CM.sub.-- s.sub.-- move.sub.-- constant.sub.-- 1L(i, CMC.sub.--
s.sub.-- temp.sub.-- 1, 32);
25 CMC.sub.-- s.sub.-- temp.sub.-- 2 = (double )c;
26 CM.sub.-- f.sub.-- move.sub.-- constant.sub.-- 1L(d, CMC.sub.--
s.sub.-- temp.sub.-- 2, 52, 11);
27 CMC.sub.-- s.sub.-- temp.sub.-- 1 = (int )c;
28 CM.sub.-- s.sub.-- move.sub.-- constant.sub.-- 1L(i, CMC.sub.--
s.sub.-- temp.sub.-- 1, 32);
29 CMC.sub.-- s.sub.-- temp.sub.-- 2 = (double )c;
30 CM.sub.-- f.sub.-- move.sub.-- constant.sub.-- 1L(d, CMC.sub.--
s.sub.-- temp.sub.-- 2, 52, 11);
31 CMC.sub.-- s.sub.-- temp.sub.-- 1 = (int )c;
32 CM.sub.-- s.sub.-- move.sub.-- constant.sub.-- 1L(i, CMC.sub.--
s.sub.-- temp.sub.-- 1, 32);
33 CMC.sub.-- s.sub.-- temp.sub.-- 2 = (double )c;
34 CM.sub.-- f.sub.-- move.sub.-- constant.sub.-- 1L(d, CMC.sub.--
s.sub.-- temp.sub.-- 2, 52, 11);
35 CM.sub.-- deallocate.sub.-- stack.sub.-- through(i);
36 if (S != CMC.sub.-- no.sub.-- vp.sub.-- set)
37 deallocate.sub.-- shape(&S);
38 }
39 static int CMC.sub.-- start.sub.-- trap()
40 {
41 CMC.sub.-- Shape.sub.-- t saved.sub.-- shape =
(CMC.sub.-- init(), CM.sub.-- current.sub.-- vp.sub.-- set);
42 CMC.sub.-- call.sub.-- start.sub.-- trap = 0;
43
44 CM.sub.-- set.sub.-- vp.sub.-- set(saved.sub.-- shape);
45 return 0;
46 }
______________________________________
Copyright, Thinking Machines Corporation, 1991
Referring first to Code Segment 4, lines 9-11 contain implicit cast instructions. Lines 14-16 contain explicit cast instructions. Lines 19-20 contain explicit cast instructions wherein the programmer attempts to first perform type conversion and then perform promotion. Lines 23-24 contain explicit cast instructions wherein the programmer attempts to first perform promotion and then perform type conversion. Line 17 in Code Segment 5 corresponds to line 9 in Code Segment 4. As shown in line 17 of Code Segment 5, the front end 310 has emitted a type conversion operation. Note that the line 9 in Code Segment 4 represents a scalar instruction having only a type conversion operation. Lines 18 and 19 in Code Segment 5 correspond to line 10 in Code Segment 4. In compiling the parallel instruction in line 10 of Code Segment 4 (which contains both a promotion operation and a type conversion operation), the front end 310 first emitted a type conversion operation in line 18 of Code Segment 5 and then emitted a promotion operation in line 19 of Code Segment 5. Lines 20 and 21 of Code Segment 5 correspond to line 11 of Code Segment 4. In compiling line 11 of Code Segment 4 (which contains both a promotion operation and a type conversion operation), the front end 310 first emitted a type conversion operation in line 20 of Code Segment 5 and then emitted a promotion operation in line 21 of Code Segment 5. In compiling the instructions in lines 14-16, 19-20, and 23-24 of Code Segment 4, the front end 310 emitted the instructions in lines 22-26, 27-30, and 31-34. respectively, in Code Segment 5. The instructions in lines 22-26 are the same as the instructions in line 17-21. The instructions in lines 27-30 and 31-34 are the same as the instructions in lines 18-21. This illustrates that the front end 310 operates in the same manner for implicit cast operations, explicit cast operations, and explicit casts operations wherein the programmer attempts to perform the promotion operation before the type yconversion operation, or vice versa. While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
|
Same subclass Same class Consider this |
||||||||||
