System, method, and computer program product for partial redundancy elimination based on static single assignment form during compilation6026241Abstract Partial redundancy elimination of a computer program is described that operates using a static single assignment (SSA) representation of a computer program. The SSA representation of the computer program is processed to eliminate partially redundant expressions in the computer program. This processing involves inserting .PHI. functions for expressions where different values of the expressions reach common points in the computer program. A result of each of the .PHI. functions is stored in a hypothetical variable h. The processing also involves a renaming step where SSA versions are assigned to hypothetical variables h in the computer program, a down safety step of determining whether each .PHI. function in the computer program is down safe, and a will be available step of determining whether each expression in the computer program will be available at each .PHI. function following eventual insertion of code into the computer program for purposes of partial redundancy elimination. The processing also includes a finalize step of transforming the SSA representation of the computer program having hypothetical variables h to a SSA graph that includes some insertion information reflecting eventual insertions of code into the computer program for purposes of partial redundancy elimination, and a code motion step of updating the SSA graph based on the insertion information to introduce real temporary variables t for the hypothetical variables h. Claims What is claimed is: Description A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
______________________________________
procedure .PHI.-Insertion
for each expression E.sub.i do {
DF.sub.-- phis[i] .rarw. empty-set
for each variable j in E.sub.i do
Var.sub.-- phis[i][j] .rarw. { }
for each occurrence X of E.sub.i in program do {
DF.sub.-- phis[i].rarw. DF.sub.-- phis[i] .orgate. DF.sup.+ (X)
for each variable occurence V in X do
if (V is defined by .phi.) {
j .rarw. index of V in X
Set.sub.-- var.sub.-- phis(Phi(V), i, j)
}
}
for each expression E.sub.i do {
for each variable j in E.sub.i do
DF.sub.-- phis[i].rarw. DF.sub.-- phis[i].orgate. Var.sub.-- phis[i][j]
insert .PHI.'s for E.sub.i according to DF.sub.-- phis[i]
}
end .PHI.-Insertion
procedure Set.sub.-- var.sub.-- phis(phi, i, j)
if (phi .epsilon slash. Var.sub.-- phis[i][j]) {
Var.sub.-- phis[i][j] .rarw. Var.sub.-- phis[i][j] .orgate. {phi}
for each operand V in phi do
if (V is defined by .phi.)
Set.sub.-- var.sub.-- phis(Phi(V), i, j)
}
end Set.sub.-- var.sub.-- phis
______________________________________
Copyright, Silicon Graphics, Inc., 1997
Well known algorithms for SSA .phi. placement with linear time complexity can alternatively be used to insert .PHI.'s (that is, to implement the .PHI.-Insertion step 912), such as the algorithms described in R. Johnson, D. Pearson, and K. Pingali, "The program structure tree: Computing control regions in linear time," Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation, pages 171-185, June 1994; V. Sreedhar and G. Gao, "A linear time algorithm for placing .phi.-nodes," Conference Record of the Eighteenth ACM Symposium on Principles of Programming Languages, pages 62-73, January 1995; and R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck., "Efficiently computing static single assignment form and the control dependence graph," ACM Trans. on Programming Languages and Systems, 13(4):451-490, October 1991, incorporated herein by reference in their entireties. Any modifications to these well known algorithms needed to insert .PHI.'s as described above will be apparent to persons skilled in the relevant art(s) based on the discussion contained herein. The correctness of the .PHI.-Insertion step 912 as described above is proven by the following LEMMA 1 and PROOF. LEMMA 1 (Sufficiency of .PHI. insertion) If B is a basic block where no expression .PHI. is inserted and the expression is partially anticipated at the entry to B, exactly one evaluation of the expression (counting .perp. as an evaluation) can reach the entry to B. PROOF: Suppose at least two different evaluations of the expression, .psi..sub.1 and .psi..sub.2, reach the entry to B. It cannot be the case that .psi..sub.1 and .psi..sub.2 both dominate B; suppose without loss of generality that .psi..sub.1 does not dominate B. Now there exists a block B.sub.0 that dominates B, is reached by .psi..sub.1 and .psi..sub.2, and lies in DF.sup.+ (.psi..sub.1) (n.b., B.sub.0 may be B). If .psi..sub.1 is a computation of the expression, the .PHI.-Insertion step 912 must have placed a .PHI. in B.sub.0, contradicting the proposition that .psi..sub.1 reaches B. If on the other hand .psi..sub.1 is an assignment to an operand .nu. of the expression (so .perp. is among the values reaching B), there must be a .phi. for .nu. in B.sub.0 by the correctness of the input SSA form. Hence when .PHI.-Insertion processed B.sub.0, it must have placed a .PHI. there, once again contradicting the proposition that .psi..sub.1 reaches B. 4.2 The Rename Step During the Rename step 914, the optimizer 106 assigns SSA versions to h in its SSA form. The version numbering produced for h differs from the eventual SSA form for the temporary t, but has the following two important properties. First, occurrences that have identical h-versions have identical values (and are subject to possible elimination). Second, any control flow path that includes two different h-versions must cross an assignment to an operand of the expression or a .PHI. for h. In performing the Rename step 914, the optimizer 106 preferably utilizes the well known SSA Renaming algorithm in which a preorder traversal of the dominator tree is conducted. This SSA Renaming algorithm is described in a number of publicly available documents, such as R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, "Efficiently computing static single assignment form and the control dependence graph," ACM Trans. on Programming Languages and Systems, 13(4):451-490, October 1991, incorporated by reference herein. However, this SSA Renaming algorithm performed by the optimizer 106 is modified as follows. In addition to maintaining a renaming stack for each variable in the program, the optimizer 106 maintains a renaming stack for every expression; entries on these expression stacks are popped as, during the traversal of the dominator tree, the optimizer 106 backs up to the blocks that define them. Maintaining the variable and expression stacks together allows the optimizer 106 to decide efficiently whether two occurrences of an expression should be given the same h-version. There are three kinds of occurrences of expressions in the program: (1) the expressions in the original program, which are herein called real occurrences: (2) the .PHI.'s inserted in the .PHI.-Insertion step 912; and (3) .PHI. operands, which are regarded as occurring at the exits of the predecessor nodes of the corresponding edges. The Rename step 914 performs the following steps upon encountering an occurrence q of the expression E.sub.i. If q is a .PHI., the optimizer 106 assigns q a new version. Otherwise, the optimizer 106 checks the current version of every variable in E.sub.i (i.e., the version on the top of each variable's rename stack) against the version of the corresponding variable in the occurrence on the top of E.sub.i 's rename stack. If all the variable versions match, the optimizer 106 assigns q the same version as the top of E.sub.i 's stack. If any of the variable versions does not match, then there are two cases: (a) if q is a real occurrence, the optimizer 106 assigns q a new version; (b) if q is a .PHI. operand, the optimizer 106 assigns the special version .perp. to that .PHI. operand to denote that the value of E.sub.i is unavailable at that point. Finally, the optimizer 106 pushes q on E.sub.i 's stack and proceeds. For example, FIG. 6 (corresponding to the example of FIGS. 4 and 5) shows the dense SSA graph that results after h in the example has been renamed. This expression renaming technique also takes advantage of the SSA representation of the program variables. The remaining steps of the SSAPRE algorithm (described below) rely on the fact that .PHI.'s are placed only where E.sub.i is partially anticipated (i.e., there is no dead .PHI. in the SSA graph of h). Dead .PHI.'s can efficiently be identified by applying the standard SSA-based dead store elimination algorithm (described in R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, "Efficiently computing static single assignment form and the control dependence graph," ACM Trans. on Programming Languages and Systems, 13(4):451-490, October 1991, incorporated by reference herein) on the SSA graph formed after renaming. From here on, it is assumed that only live .PHI.'s are represented in the SSA form of h. The correctness of the Rename step 914 as described above is proven by the following LEMMAS and PROOFS. LEMMA 2 (Correctness of version renaming) If two occurrences are assigned the same version by Rename, the expression has the same value at those two occurrences. PROOF: This lemma follows directly from the fact that the Rename step 914 assigns the same version to two occurrences of an expression E.sub.i only if all the SSA versions of their expression operands match. This proof can be completed by reference to the single-assignment property and the correctness of the SSA renaming algorithm for variables (described in R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, "Efficiently computing static single assignment form and the control dependence graph," ACM Trans. on Programming Languages and Systems, 13(4):451-490, October 1991). LEMMA 3 (Versions capture all the redundancy) If two occurrences .psi..sub.x, .psi..sub.y are assigned versions x, y by Rename, exactly one of the following holds: no control flow path can reach from .psi..sub.x to .psi..sub.y without passing through a real (i.e., non-.phi.) assignment to an operand of the expression (meaning that there is no redundancy between the occurrences); or there is a path (possibly empty, in which case x=y) in the SSA graph of use-def arcs from y to x (implying that any redundancy between .psi..sub.x and .psi..sub.y is exposed to the algorithm). PROOF: Suppose there is a control flow path from .psi..sub.x to .psi..sub.y that does not pass through any assignment to an operand of the expression. The proof will proceed by induction on the number of .PHI.'s for the expression traversed by . If encounters no .PHI., x=y establishing the basis for our induction. If hits at least one .PHI., the last .PHI. on defines .psi..sub.y. Now the induction hypothesis is applied to that part of up to the corresponding operand of that .PHI.. 4.3 The DownSafety Step One criterion required for PRE to insert a computation is that the computation is down-safe (or anticipated) at the point of insertion. Down safety is a well known concept. Down safe flags are designated as "ds" in the figures. Down safety is briefly described with reference to FIG. 15. The expression a+b in basic blocks 1506 and 1510 is partially redundant. The issue is whether the expression a+b can be inserted into basic block 1504 (for the purpose of eliminating the occurrence of expression a+b in basic block 1510). a+b can only be inserted into basic block 1504 if that insertion would be down safe. An insertion is not down-safe if there is a control flow path from that insertion in which the expression is not evaluated before program exit. The path from basic block 1504 to basic block 1512 via basic block 1508 is such a path. Accordingly, an insertion of the expression a+b in basic block 1504 is not down safe. Thus, the insertion should not take place. In the dense SSA graph constructed by the Rename step 914, each node either represents a real occurrence of the expression or is a .PHI.. It can be shown that SSAPRE insertions are only necessary at .PHI.'s, so down-safety only needs to be computed for them. Using the SSA graph, down-safety is sparsely computed by backward propagation along the use-def edges. A .PHI. is not down-safe if there is a control flow path from that .PHI. along which the expression is not evaluated before program exit or before being altered by redefinition of one of its variables. Except for loops with no exit, this can happen only due to one of the following cases: (a) there is a path to exit along which the .PHI. result version is not used; or (b) there is a path to exit along which the only use of the .PHI. result version is as an operand of a .PHI. that is not down-safe. Case (a) represents the initialization for the backward propagation of down-safety; all other .PHI. 's are initially marked down.sub.-- safe. DownSafety propagation is based on case (b). Since a real occurrence of the expression blocks the case (b) propagation, the algorithm marks each .PHI. operand with a flag has.sub.-- real.sub.-- use when the path to the .PHI. operand crosses a real occurrence of the same version of the expression. It is convenient to perform initialization of the case (a) down.sub.-- safe and computation of the has.sub.-- real.sub.-- use flags during a dominator-tree preorder pass over the SSA graph. Since the Rename step 914 conducts such a pass, the optimizer 106 preferably includes these calculations in the Rename step 914 with minimal overhead. The operation of the optimizer 106 when performing the tasks related to down safe during the Rename step 914 is represented in a flowchart 1602 shown in FIG. 16. Initially, in step 1604, all down safe flags are true and all has.sub.-- real.sub.-- use flags are false. When the Rename step 914 assigns a new version to a real occurrence of expression E.sub.i or encounters a program exit, it examines the occurrence on the top of E.sub.i 's stack before pushing the current occurrence (steps 1606 and 1608). If the top of stack is a .PHI. occurrence, the Rename step 914 clears that .PHI.'s down.sub.-- safe flag because the version it defines is not used along the path to the current occurrence (or exit). This is represented by steps 1610 and 1612. When the Rename step 914 assigns a version to a .PHI. operand, it sets that operand's has.sub.-- real.sub.-- use flag if and only if a real occurrence for the same version appears at the top of the rename stack. This is represented by steps 1616 and 1618. As indicated by step 1614, after performance of the down safe steps, the optimizer 106 continues with the Rename step 914. The DownSafety propagation algorithm 918 is alternatively represented by the following high level pseudocode:
______________________________________
procedure DownSafety
for each expr-.PHI. F in program do
if (not down.sub.-- safe (F))
for each operand opnd of F do
Reset.sub.-- downsafe (opnd)
end DownSafety
procedure Reset.sub.-- downsafe(X)
if (has.sub.-- real.sub.-- use(X) or X not defined by .PHI.)
return
F .rarw. .PHI. that defines X
if (not down.sub.-- safe(F))
return
down.sub.-- safe(F) .rarw. false
for each operand opnd of F do
Reset.sub.-- downsafe(opnd)
end Reset.sub.-- downsafe
______________________________________
Copyright, Silicon Graphics, Inc., 1997
The correctness of the Down Safety step 918 as described above is proven by the following LEMMA and PROOF. LEMMA 4 (Correctness of down.sub.-- safe). A .PHI. is marked down.sub.-- safe after DownSafety if and only if the expression is fully anticipated at that .PHI.. PROOF: It is first noted that each .PHI. marked not down.sub.-- safe during the Rename step 914 is indeed not down-safe. The SSA renaming algorithm has the property that every definition dominates all its uses. Suppose that a .PHI. appears on the top of the stack when the Rename step 914 creates a new version or encounters a program exit. In the case where a program exit is encountered, the .PHI. is obviously not down-safe because there is a path in the dominator tree from the .PHI. to exit containing no use of the .PHI.. Similarly, if the Rename step 914 assigns a new version to a real occurrence, it does so because some expression operand has a different version in the current occurrence from its version at the .PHI.. Therefore, there exists a path in the dominator tree from the .PHI. to the current occurrence along which there is an assignment to .nu.. Minimality of the input HSSA program implies, then, that any path from the .PHI. to the current occurrence and continuing to a program exit must encounter an assignment to .nu. before encountering an evaluation of the expression. Therefore the expression is not fully anticipated at the .PHI.. Next we make the observation that any .PHI. whose down.sub.-- safe flag gets cleared during the DownSafety step 918 is not down-safe, since there is a path in the SSA use-def graph from an unused version to that .PHI. where no arc in the path crosses any real use of the expression value. Indeed one such path appears on the recursion stack of the Reset.sub.-- downsafe procedure at the time the down.sub.-- safe flag is cleared. Finally, it is necessary to show that all the .PHI.'s that are not down-safe are so marked at the end of DownSafety. This fact is a straightforward property of the depth-first search propagation performed by Reset.sub.-- downsafe. 4.4 The WillBeAvail Step The WillBeAvail step 920 has the task of predicting whether an expression will be available at each .PHI. result following insertions for PRE. In the Finalize step 922, insertions will be performed at incoming edges corresponding to .PHI. operands at which the expression will not be available (without that insertion), but the .PHI.'s will.sub.-- be.sub.13 avail predicate is true. The operation of the optimizer 106 when performing the WillBeAvail step 920 is represented by a flowchart 1702 shown in FIG. 17. The WillBeAvail step 920 includes two forward propagation passes performed sequentially, in which the optimizer 106 conducts a reachability search in the SSA graph for each expression. The first pass (represented by step 1706) computes the can.sub.-- be.sub.-- avail predicate for each .PHI. by first initializing it to true for all .PHI.'s. It then begins with the "boundary" set of .PHI.D's at which the expression cannot be made available by any down-safe set of insertions. These are .PHI.'s that do not satisfy the down.sub.-- safe predicate and have at least one .perp.-valued operand. The can.sub.-- be.sub.-- avail predicate is set to false and the false value is propagated from such nodes to others that are not down-safe and that are reachable along def-use arcs in the SSA graph, excluding arcs at which has.sub.-- real.sub.-- use is true. .PHI. operands defined by .PHI.'s that are not can.sub.-- be.sub.-- avail are set to .perp. along the way. After this propagation step, can.sub.-- be.sub.-- avail is false for a .PHI. if and only if no down-safe placement of computations can make the expression available. The .PHI.'s where can.sub.-- be.sub.-- avail is true together designate the range of down-safe program areas for insertion of the expression, plus areas that are not down-safe but where the expression is fully available in the original program. The entry points to this region (the .perp.-valued .PHI. operands) can be thought of as SSAPRE's earliest insertion points. The second pass (represented by step 1708) works within the region computed by the first pass to determine the .PHI.'s where the expression will be available following the insertions that will actually be made, which implicitly determines the latest (and final) insertion points. It works by propagating the later predicate, which it initializes to true wherever can.sub.-- be.sub.-- avail is true. It then begins with the real occurrences of the expression in the program, and propagates the false value of later forward to those points beyond which insertions cannot be postponed (moved downward) without introducing unnecessary new redundancy. At the end of the second pass, will.sub.-- be.sub.-- avail for a .PHI. is given by ##EQU1## FIG. 6 shows the values of down.sub.-- safe (ds), can.sub.-- be.sub.-- avail (cba), later and will.sub.-- be.sub.-- avail (wba) for the program example at each .PHI. for h. For convenience, a predicate is defined to indicate those .PHI. operands where insertions will be performed: It is said that insert holds for a .PHI. operand if and only if the following hold: the .PHI. satisfies will.sub.-- be.sub.-- avail; and the operand is .perp. or has.sub.-- real.sub.-- use is false for the operand and the operand is defined by a .PHI. that does not satisfy will.sub.-- be.sub.-- avail. The invention preferably uses the term placement to refer to the set of points in the program where a particular expression's value is computed. The WillBeAvail step 920 is alternatively represented by the following high level pseudocode:
______________________________________
procedure Compute.sub.-- can.sub.-- be.sub.-- avail
for each expr-.PHI. F in program do
if (not down.sub.-- safe(F) and
can.sub.-- be.sub.-- avail (F) and
.E-backward. an operand of F that is .perp.)
Reset.sub.-- can.sub.-- be.sub.-- avail(F)
end Computer.sub.-- can.sub.-- be.sub.-- avail
procedure Reset.sub.-- can.sub.-- be.sub.-- avail(G)
can.sub.-- be.sub.-- avail(G) .rarw. false
for each expr-.PHI. F with operand opnd defined by G do
if (not has.sub.-- real.sub.-- use(opnd)) {
set that .PHI. operand to .perp.
if (not down.sub.-- safe (F) and can.sub.-- be.sub.-- avail (F))
Reset.sub.-- can.sub.-- be.sub.-- avail (F)
end Reset.sub.-- can.sub.-- be.sub.-- avail
procedure Compute.sub.-- later
for each expr-.PHI. F in program do
later (F) .rarw. can.sub.-- be.sub.-- avail (F)
for each expr-.PHI. F in program do
if (later (F)) and
.E-backward. an operand opnd of F such that
(opnd .noteq. .perp. and has.sub.-- reach.sub.-- use (opnd)))
Reset.sub.-- later (F)
end Computer.sub.-- later
procedure Reset.sub.-- later (G)
later (G) .rarw. false
for each expr-.PHI. F with operand opnd defined by G do
if (later (F))
Reset.sub.-- later (F)
end Reset.sub.-- later
procedure Will Be Avail
Compute.sub.-- can.sub.-- be.sub.-- avail
Compute.sub.-- later
end Will Be Avail
______________________________________
Copyright, Silicon Graphics, Inc., 1997
The correctness of the WillBeAvail step 920 as described above is proven by the following LEMMAS and PROOFS. LEMMA 5 (Correctness of can.sub.-- be.sub.-- avail). A .PHI. satisfies can.sub.-- be.sub.-- avail if and only if some safe placement of insertions makes the expression available immediately after the .PHI.. PROOFS: Let F be a .PHI. satisfying can.sub.-- be.sub.-- avail. If .PHI. satisfies down.sub.-- safe, the result is immediate because it is safe to insert computations of the expression at each of .PHI.'s operands. If F is not down-safe and satisfies can.sub.-- be.sub.-- avail, note that the expression is available in the unoptimized program at F because there is no path to F from a .PHI. with a .perp.-valued operand along def-use arcs in the SSA graph. Now let F be a .PHI. that does not satisfy can.sub.-- be.sub.-- avail. When the algorithm resets this can.sub.-- be.sub.-- avail flag, the recursion stack of Reset.sub.-- can.sub.-- be.sub.-- avail gives a path bearing witness to the fact that no safe set of insertions can make the expression available at F. LEMMA 6 (Correctness of later). A can.sub.-- be.sub.-- avail .PHI. satisfies later after WillBeAvail if and only if there exists a computationally optimal placement under which that .PHI.'s result is not available immediately after the .PHI.. PROOF: The set of .PHI.'s not satisfying later after WillBeAvail is exactly the set of can.sub.-- be.sub.-- avail .PHI.'s reachable along defuse arcs in the SSA graph from has.sub.-- real.sub.-- use operands of can.sub.-- be.sub.-- avail .PHI.'s. Let e a path in the def-use SSA graph from such a .PHI. operand to a given expr-.PHI. F with later(F)=false. We will prove by induction on the length of that F must be made available by any computationally optimal placement. If F is not down-safe, the fact that F is can.sub.-- be.sub.-- avail means all of F's operands must be fully available in the unoptimized program. They are therefore trivially available under any computationally optimal placement, making the result of F available as well. In the case where F is down-safe, if contains no arcs there is a has.sub.13 real.sub.-- use operand of F. Such an operand must be fully available in the optimized program, so any insertion below F would be redundant with that operand, contradicting computational optimality. Since F is down-safe, that operand is already redundant with real occurrence(s) in the unoptimized program and any computationally optimal placement must eliminate that redundancy. The way to accomplish this is to perform insertions that make the expression fully available at F. If F is down-safe and contains at least one arc, we apply the induction hypothesis to the .PHI. defining the operand of F corresponding to the final arc on to conclude that operand must be made available by any computationally optimal placement. As a consequence, any computationally optimal placement must make F available by the same argument as in the basis step (previous paragraph). The following lemma shows that the will.sub.-- be.sub.-- avail predicate computed by WillBeAvail faithfully corresponds to availability in the program after insertions are performed for .PHI. operands satisfying insert. LEMMA 7 (Correctness of will.sub.-- be.sub.-- avail). The set of insertions chosen by SSAPRE together with the set of real occurrences makes the expression available immediately after a .PHI. if and only if that .PHI. satisfies will.sub.-- be.sub.-- avail. PROOF: We establish the "if" direction with a simple induction proof showing that if there is some path leading to a particular .PHI. in the optimized program along which the expression is unavailable, that .PHI. does not satisfy will.sub.-- be.sub.-- avail. Let Q(k) be the following proposition: For any expr-.PHI. F, if there is a path P(F) of length k in the SSA def-use graph beginning with .perp., passing only through .PHI.'s that are not will.sub.-- be.sub.-- avail along arcs that do not satisfy has.sub.-- real.sub.-- use insert, and ending at F, F is not will.sub.-- be.sub.-- avail. Q(0) follows directly from the fact that no insertion is performed for any operand of F, since it is not marked will.sub.-- be.sub.-- avail. The fact that F has a .perp.-valued operand implies that such an insertion would be required to make F available. Now to see Q(k) for k>0, notice that Q(k-1) implies that the operand of F corresponding to the final arc of (F) is defined by a .PHI. that is not will.sub.-- be.sub.-- avail, and there is no real occurrence of the expression on the path from that defining .PHI. to the operand of F. Since we do not perform an insertion for that operand, F cannot satisfy will-be-avail. To establish the "only if" direction, suppose expr-.PHI. D F does not satisfy will-be-avail. Either F does not satisfy can-be-avail or F satisfies later. In the former case, F is not available in the optimized program because the insertions performed by SSAPRE are down-safe. In the latter case, F was not processed by Reset.sub.-- Later, meaning that it is not reachable along def-use arcs from a .PHI. satisfying will-be-avail. Therefore, insertion above F would be required to make F's result available, but F is not will-be-avail so the algorithm performs no such insertion. 4.5 The Finalize Step The Finalize step 922 plays the role of transforming the SSA graph for the hypothetical temporary h to the valid SSA form that reflects insertions and in which no .PHI. operand is .perp.. This valid SSA form is stored in a precise SSA graph 924. Operation of the optimizer 106 when performing the Finalize step 922 is represented by a flowchart 1802 in FIG. 18. When performing the Finalize step 922, the optimizer 106 performs the following tasks: It decides for each real occurrence of the expression whether it should be computed on the spot or reloaded from the temporary. For each one that is computed, it also decides whether the result should be saved to the temporary. It sets two flags, reload and save, to represent these two pieces of information. This is represented by step 1806. For .PHI.'s where will-be-avail is true, insertions are performed at the incoming edges that correspond to .PHI. operands at which the expression is not available. This is represented by step 1808. Expression .PHI.'s whose will-be-avail predicate is true may become .phi.'s for t. .PHI.'s that are not will-be-avail will not be part of the SSA form for t, and links from will-be-avail .PHI.'s that reference them are fixed up to refer to other (real or inserted) occurrences. This is represented by step 1810. Extraneous .PHI.'s are removed. This is represented by step 1812. The flags discussed herein are generally called insertion information. The Finalize step 922 creates a table Avail-def.sub.i (for available definitions) for each expression E.sub.i to perform the first three of the above tasks. The indices into this table are the SSA versions for E.sub.i 's hypothetical temporary h. Avail-def.sub.i [x] points to the defining occurrence of E.sub.i for h.sub.x, which must be either: (a) a real occurrence; or (b) a .PHI. for which will-be-avail is true. The Finalize step 922 performs a preorder traversal of the dominator tree of the program control flow graph. In the course of this traversal it will visit each defining occurrence whose value will be saved to a version of the temporary, t.sub.y, before it visits the occurrences that will reference t.sub.y ; such a reference is either: (a) a redundant computation that will be replaced by a reload of t.sub.y, or (b) a use of h.sub.x as a .PHI. operand that will become a use of t.sub.y as a .phi. operand. Although in an embodiment of the invention the processing order of the Finalize step 922 is modeled after the standard SSA rename step (described in R Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, "Efficiently computing static single assignment form and the control dependence graph," ACM Trans. on Programming Languages and Systems, 13(4):451-490, October 1991, incorporated by reference herein), the Finalize step 922 does not require any renaming stack because SSA versions have already been assigned. In the course of its traversal, the Finalize step 922 processes occurrences as follows: 1. .PHI.--If its will-be-avail is false, nothing needs to be done. (An example of this is the .PHI. in block 3 of the running example. See FIG. 6.) Otherwise, h.sub.x must be visited for the first time. Set Avail-def.sub.i [x] to this .PHI.. 2. Real occurrence of E.sub.i --If Avail-def.sub.i [x] is .perp., h.sub.x is being visited for the first time. If Avail-def.sub.i [x] is set, but that occurrence does not dominate the current occurrence, the current occurrence is also a definition of h.sub.x. (An example of this latter case is the first h.sub.2 in block 9 of FIG. 6.) In both of these cases, Avail-def.sub.i [x] is updated to the current occurrence. Otherwise, the current occurrence is a use of h.sub.x, and the save flag is set in the occurrence pointed to by Avail-def.sub.i [x] and the reload flag of the current occurrence. 3. Operand of .PHI. in a successor block Recall that .PHI. operands are considered as occurring at their corresponding predecessor blocks.)--If will-be-avail of the .PHI. is false, nothing needs to be done. Otherwise if the operand satisfies insert (e.g., operand h.sub.2 in the .PHI. at block 6 of FIG. 6), insert a computation of E.sub.i at the exit of the current block. If will-be-avail holds but the operand does not satisfy insert, set the save flag in the occurrence pointed to by Avail-def.sub.i [x] (which cannot be .perp.), and update that .PHI. operand to refer to Avail-def.sub.i [x] (e.g., operand h.sub.3 in the .PHI. at block 8 of FIG. 6). The Finalize step 922 is alternatively represented by the following high level pseudocode:
______________________________________
procedure Finalize.sub.-- visit (block)
for each occurrence X of E.sub.i in block do {
save (X) .rarw. false
reload (X) .rarw. false
x .rarw. version (X)
if (X is .PHI.) {
if (will.sub.-- be.sub.-- avail (X))
Avail.sub.-- def[i][x] .rarw. X
else if (Avail.sub.-- def[i][x] is .perp. or
Avail.sub.-- def[i][x] does not dominate X)
Avail.sub.-- def[i][x] .rarw. X
else if (Avail.sub.-- def[i][x] is real) {
save (Avail.sub.-- def[i][x]) .rarw. true
reload (X) .rarw. true
}
}
for each S in Succ (block) do {
j .rarw. WhichPred (S, block)
for each expr-.PHI. F in S do
if (will.sub.-- be.sub.-- avail (F)) {
i .rarw. Which Expr (F)
if (j.sup.th operand of F satisfies insert) {
insert E.sub.i at the exit of block
set (j.sup.th operand of F to inserted occurrence
}
else {
x .rarw. version (j.sup.th operand of F)
if (Avail.sub.-- def[i][x] is real) {
save (Avail.sub.-- def[i][x]) .rarw. true
set j.sup.th operand of F to Avail.sub.-- def[i][x]
}
}
}
}
for each K in Children (DT, block) do
Finalize.sub.-- visit (K)
end Finalize.sub.-- visit
procedure Finalize
for each version x of E.sub.i in program do
Avail.sub.-- def[i][x] .rarw. .perp.
Finalize.sub.-- visit (Root (DT))
end Finalize
______________________________________
Copyright, Silicon Graphics, Inc., 1997
The removal of extraneous .PHI.'s, or SSA minimization, for h is not a necessary task as far as PRE is concerned. However, the extraneous .PHI.'s take up storage in the program representation, and may affect the efficiency of other SSA-based optimizations to be applied after PRE. Removing extraneous .PHI.'s also requires changing their uses to refer to their replacing versions. SSA minimization can be implemented as a variant of the .phi. insertion step in SSA construction. All of the .phi.'s are initially marked as being extraneous. Applying the .phi. insertion algorithm, the optimizer 106 can find and mark the .PHI.'s that are not extraneous based on the iterated dominance frontier of the set of real assignments to h in the program (i.e., real occurrences with the save bit set plus the inserted computations). The optimizer 106 then passes over all the extraneous .PHI.'s to determine a replacing version for each one. Whenever an extraneous .PHI. defines version h.sub.x and has an operand using h.sub.y that is not defined by an extraneous .PHI., y is the replacing version for x. From such a .PHI. the optimizer 106 propagates the replacing version through all its uses: once the replacing version for a .PHI. is known, the replacing version for every use of that .PHI. becomes known (the replacing version of each use is the same as the replacing version of the .PHI.) and the optimizer 106 propagates recursively to all uses of that .PHI.. It is straightforward to see that this method replaces all references to extraneous .PHI.'s by references to non-extraneous occurrences. FIG. 7 shows the example program at the end of the Finalize step 922. The correctness of the Finalize step 922 as described above is proven by the following LEMMAS and PROOFS. LEMMA 8 (Correctness of save/reload). At the point of any reload, the temporary contains the value of the expression. PROOF. This lemma follows directly from the Finalize algorithm 922 and from the fact that Rename assigns versions while traversing the SSA graph in dominator-tree preorder. In particular, Finalize 922 ensures directly that each reload is dominated by its available definition. Because the live ranges of different versions of h do not overlap, each reloaded occurrence must refer to its available definition. LEMMA 9 (Optionality of reload). The optimized program does not compute the expression at any point where it is fully available. PROOF: It is straightforward to check that the optimized program reloads the expression value for any occurrence defined by a .PHI. satisfying will-be-avail and it reloads the expression value for any occurrence dominated by another real occurrence of the same version. Therefore we need only note that will-be-avail accurately reflects availability in the optimized program (by Lemma 7) and that by the definition of insert we only insert for .PHI. operands where the insertion is required to achieve availability. 4.6 The CodeMotion Step Once the hypothetical temporary h has been put into valid SSA form, the only remaining task is to update the SSA program representation to reflect the results of PRE. This involves introducing the real temporary t for the purpose of eliminating redundant computations. This task is performed by the optimizer 106 in the Code Motion step 926. This task is straightforward due to the fact that h is already in valid SSA form. The SSA form of t is a subgraph of the SSA form of i, since defs of h (including .PHI.'s) with no use are omitted. The operation of the optimizer 106 when performing the Code Motion step 926 is represented by a flowchart 1902 in FIG. 19. The CodeMotion step 926 walks over the SSA graph of h. At a real occurrence, if save is true, the optimizer 106 generates a save of the result of the computation into a new version of t (step 1904). If reload is true, the optimizer 106 replaces the computation by a use of t (step 1906). At an inserted occurrence, the optimizer 106 saves the value of the inserted computation into a new version of t (step 1908). At a .PHI. of h, the optimizer 106 generates a corresponding (for t (step 1910). FIG. 8 shows the example program at the end of the CodeMotion step 926. 5.0 Theoretical Results In this section, the main results about SSAPRE derived from the LEMMAS already given above are discussed. Theorem 1. SSAPRE chooses a safe placement of computations; i.e., along any path from entry to exit exactly the same values are computed in the optimized program as in the original program. Proof: Since insertions take place only at points satisfying down.sub.-- safe, this theorem follows directly from Lemma 4. Theorem 2. SSAPRE generates a reload of the correct expression value from temporary at a real occurrence point if and only if the expression value is available at that point in the optimized program. Proof: This theorem follows from the fact that reloads are generated only when the reloaded occurrence is dominated by a will.sub.-- be.sub.-- avail .PHI. of the same version (in which case we refer to Lemma 7 for the availability of the expression at the reload point), or by a real occurrence of the same version that is marked save by Finalize. Theorem 3. SSAPRE generates a save to temporary at a real occurrence or insertion point if and only if the following hold: the expression value is unavailable (in the optimized program) just before that point, and the expression value is partially anticipated just after that point (i.e., there will be a use of the saved value). Proof: This theorem follows directly from Lemma 9 and from the fact that the Finalize algorithm 922 sets the save flag for a real occurrence only when that occurrence dominates a use of the same version by another real occurrence or by a .PHI. operand. In the former case the result is immediate, and in the latter case we need only refer to the fact that the expression is partially anticipated at every .PHI. remaining after the Rename step 914. Theorem 4. SSAPRE chooses a computationally optimal placement; i.e., no safe placement can result in fewer computations along any path from entry to exit in the control flow graph. Proof: We need only show that any redundancy remaining in the optimized program cannot be eliminated by any safe placement of computations. Suppose is a control flow path in the optimized program leading from one computation, .psi..sub.1, of the expression to another computation, .psi..sub.2, of the same expression with no assignment to any operand of the expression along . By Theorem 2, the expression value cannot be available just before .psi..sub.2, so .psi..sub.2 is not dominated by a real occurrence of the same version (by Lemma 9) nor is it defined by a will.sub.-- be.sub.-- avail .PHI. (by Lemma 7). Because .psi..sub.1, and .psi..sub.2 do not have the same version and there is no assignment to any expression operand along , the definition of .psi..sub.2 's version must lie on , and since it cannot be a real occurrence nor a will.sub.-- be.sub.-- avail .PHI., it must be a .PHI. that is not will.sub.-- be.sub.-- avail. Such a .PHI. cannot satisfy later because one of its operands is reached by .psi..sub.1, so it must not be down-safe. So no safe set of insertions could make .psi..sub.2 available while eliminating a computation from . Theorem 5. SSAPRE chooses a lifetime-optimal placement, specifically, if p is the point just after an insertion made by SSAPRE and C denotes any computationally optimal placement, C makes the expression fully available at p. Proof: This theorem is a direct consequence of Lemma 6 and Theorem 4. Theorem 6. SSAPRE produces minimal SSA form for the generated temporary. Proof: This minimality result follows directly from the correctness of the dominance-frontier .PHI.-insertion algorithm. Each .PHI. remaining after the Finalize step 922 is justified by being on the iterated dominance frontier of some real or inserted occurrence that will be saved to the temporary. 6.0 Additional Embodiments Since SSAPRE is a sparse algorithm, an implementation can reduce the maximum storage needed to optimize all the expressions in the program by finishing the work on each expression before moving on to the next one. Under this scheme, the different lexically identical expressions that need to be worked on by SSAPRE are maintained as a worklist. If the expressions in the program are represented in tree form, the invention can also exploit the nesting relationship in expression trees to reduce the overhead in the optimization of large expressions. There is also a more efficient algorithm for performing the Rename step 914 of SSAPRE. These alternate embodiments of the invention, which represent practical implementation embodiments, are described in this section. 6.1 Worklist-driven PRE Under worklist-driven PRE, the invention includes an initial pass, Collect-Occurrences (see step 906 in FIG. 9), that scans the entire program and creates a worklist for all the expressions in the program that need to be worked on by SSAPRE. For each element of the worklist, the invention represents its occurrences in the program as a set of occurrence nodes. Each occurrence node provides enough information to pinpoint the location of the occurrence in the program. Collect-Occurrences 906 is the only pass that needs to look at the entire program. The six steps of SSAPRE (steps 912, 914, 918, 920, 922, 926) operate on each expression based only on its occurrence nodes. The intermediate storage needed to work on each expression can be reclaimed when working on the next one. The Collect-Occurrences step 906 enters only first order expressions into the worklist. First order expressions contain only one operator. For example, in the expression (a+b)-c, a+b is the first order expression and is entered into the worklist, but (a+b)-c is not initially entered into the worklist. After SSAPRE has worked on a+b, any redundant occurrence of a+b will be replaced by a temporary t. If PRE on a+b changes (a+b)-c to t-c, the CodeMotion step 926 will enter the new first order expression t-c as a new member of the worklist. Redundant occurrences of t-c, and hence redundancies in (a+b)-c, will be replaced when t-c is processed. If the expression (a+b)-c does not yield t-c when a+b is being worked on, a+b is not redundant, implying that (a+b)-c has no redundancy and can be skipped by SSAPRE. This approach deals cleanly with the interaction between the optimizations of nested expressions and gains efficiency by ignoring the higher order expressions that exhibit no redundancy. (For higher order expressions that have redundancies, this approach also has the secondary effect of converting the expression tree essentially to triplet form.) This strategy is difficult to implement in conventional bit-vector PRE, which typically works on all expressions in the program simultaneously in order to take advantage of the parallelism inherent in bit-vector operations. In manipulating the sparse representation of each expression, some steps in the algorithm need to visit the occurrence nodes in an order corresponding to a preorder traversal of the dominator tree of the control flow graph. For this purpose, the invention maintains the occurrence nodes for a given expression in the order of this preorder traversal of the dominator tree. As mentioned in Section 4.2, there are three kinds of occurrences. Collect-Occurrences 906 only creates the real occurrence nodes. The .PHI.-Insertion step 9121 inserts new occurrence nodes that represent .PHI.'s and .PHI. operands. Under worklist-driven PRE, a fourth kind of occurrence nodes is required to indicate when the optimizer 106 reaches the program exits in the Rename step 914. These exit occurrence nodes can be represented just once and shared by all expressions. 6.2 Delayed Renaming The Rename algorithm 914 described in Section 4.2 maintains version stacks for all the variables in the program in addition to the version stacks for the expressions. Apart from taking up additional storage, updating the variable stacks requires keeping track of when the values of the variables change, which may incur significant overhead. The algorithm 914 described above is not in line with sparseness, because in a sparse algorithm, the time spent in optimizing an expression should not be affected by the number of times its variables are redefined. Also, under the worklist-driven implementation of SSAPRE, it is not possible to pass over the entire program in the Rename step 914, because that would imply passing over the entire program once for every expression in the program. The solution of both of these problems is to use a more efficient algorithm for the renaming step 914 called delayed renaming. Recall the purpose of the variable stacks in the Rename step 914 is to enable the determination of when the value of an available expression is no longer current by checking if the versions of all the variables are the same as the current versions. At a real occurrence of the expression, it is not necessary to rely on the variable stacks, because the current versions of all its variables are represented in the expression. The variable stacks are only needed when renaming .PHI. operands. To implement delayed renaming, the Rename step 914 uses two separate passes. The first pass, Rename-1, is the same as Rename described above, except that it does not use any variable stack. At a .PHI. operand, it optimistically assumes that its version is the version on top of the expression stack. Thus, it can perform all its work based on the occurrence nodes of the expression. Rename-1 computes an initial version of the SSA graph for h that is optimistic and not entirely correct. The correct renaming of .PHI. operands is delayed to the second pass, Rename-2, which relies on seeing a later real occurrence of the expression to determine the current versions of the variables. Seeing a later real occurrence implies that at the earlier .PHI., the expression is partially anticipated. Thus, the versions of the .PHI. operands are fixed up only for these .PHI.'s. Rename-2 works according to a worklist built for it by Rename-1, which contains all the real occurrences that are defined by .PHI.'s. From the versions of the variables at the merge block of a .PHI., it determines the versions of the variables at each predecessor block based on the presence or absence of .phi.'s for the variables at that merge block. If they are different from the versions assumed at the .PHI. operand in the Rename-1 pass, Rename-2 invalidates the .PHI. operand by resetting it to .perp.. Otherwise, the .PHI. operand renamed by Rename-1 is correct. If the .PHI. operand is also defined by .PHI., it is added to the worklist so that the process can continue up the SSA graph. For example, Rename-1 will initially set the second operand of the .PHI. for h in block 8 of FIG. 6 to h.sub.3. Rename-2 resets it to .perp.. Table 1 specifies the rules for deciding when two occurrences should be assigned the same h-version in the absence of the variable stacks. Rules 1 and 3 are applied in Rename-1, while rules 2 and 4 are applied in Rename-2.
TABLE 1
______________________________________
Assigning h-versions in Delayed Renaming
current condition for
def at top
occurrence
identical
Rule of stack X h-version
______________________________________
1 real real all corresponding variables have
2 real .PHI. operand
same versions
3 .PHI. real defs of all variables in X dominate
4 .PHI. .PHI. operand
the .PHI.
______________________________________
An additional advantage of delayed renaming is that it allows the invention to determine the .PHI.'s that are not live without performing a separate dead store elimination phase. In delayed renaming, only the operands at .PHI.'s at which the expression is partially anticipated are fixed up. The remaining .PHI.'s correspond to dead .PHI.'s, and they can be marked for deletion. CONCLUSION The SSAPRE algorithm of the invention performs PRE while taking full advantage of the SSA form in the input program and within its operation. It incorporates the advantages shared by all the other SSA-based optimization techniques: no separate phase to collect local attributes, no data flow analysis involving bit vectors, sparse representation, sparse computation of global attributes, and unified handling of each optimization's global and local forms. In actual implementation, by working on one expression at a time, the invention can also lower the maximum storage requirement needed to optimize all the expressions in the program, and also exploit the nesting relationship in expression trees to speed up the optimization of large expressions. SSAPRE enables PRE to be seamlessly integrated into a global optimizer that uses SSA as its internal representation. Because the SSA form is updated as optimization progresses, optimizations can be re-invoked as needed without incurring the cost of repeatedly rebuilding SSA. From an engineering point of view, SSAPRE permits a cohesive software implementation by making SSA and sparseness the theme throughout the optimizer. Previous uses of SSA were directed at problems related to variables. SSAPRE represents the first use of SSA to solve data flow problems related to expressions or operations in the program. The invention shows that data flow problems for expressions can be modeled in SSA form by introducing hypothetical temporaries that store the values of expressions. Such an approach opens up new ways to solve many data flow problems by first formulating their solution in terms of the SSA graph of the hypothetical temporary. Candidates for this new approach are code hoisting and the elimination of load and store redundancies. The SSAPRE approach can also incorporate techniques developed in the context of classical PRE, such as the integration of strength reduction into the PRE optimization phase. Processing expressions one at a time also allows other possibilities for SSAPRE by customizing the handling of different types of expressions. For example, one might suppress PRE for expressions that are branch conditions because the branch instructions can evaluate the conditions without extra cost. One might also move selected loop-invariant operations out of loops to points that are not down-safe because they will not raise exceptions. Since SSAPRE works bottom-up with respect to an expression tree, it can reassociate the expression tree when no optimization opportunity was found with the original form. While various embodiments of the present invention have been described above, it should be understood that they have been presented by the way of example only, and not limitation. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined in the appended claims. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
|
Same subclass Same class Consider this |
||||||||||
