System and method for performing selective dynamic compilation using run-time information6427234Abstract Selective dynamic compilation of source code is performed using run-time information. A system is disclosed that implements a declarative, annotation based dynamic compilation of the source code, employing a partial evaluation, binding-time analysis (BTA), and including program-point-specific polyvariant division and specialization and dynamic versions of traditional global and peephole optimizations. The system allows programmers to declaratively specify policies that govern the aggressiveness of specialization and caching, providing fine control over the dynamic compilation process. The policies include directions for controlling specialization at promotion points and merge points, and further define caching policies, and speculative-specialization policies. The system also enables programmers to specialize programs across arbitrary edges, both at traditional locations, such as procedure boundaries, but also within procedures. Programmers are enabled to conditionally specialize programs based on evaluation of arbitrary compile-time and run-time conditions. Claims The invention in which an exclusive right is claimed is defined by the following: Description FIELD OF THE INVENTION
TABLE 1
Dynamic-to-Static Promotion Specialization Policy Values
Promotion Policies
Policy Value Description
Auto_promote Automatically insert a dynamic-to-static promotion
when the annotated static variable is possibly assigned
a dynamic value.
Manual_promote Introduce promotions only at explicit annotations.
Note that the caching policies provide control over both the number of cached versions of specialized code that are maintained (i.e., none, one, or all previously specialized versions may be cached), and over whether to dispatch on the promoted value when choosing the cached version. 2. Control Flow Merges A control flow merge is a point where multiple program paths join. Policies that apply at merge points are shown below in TABLES 3-5.
TABLE 3
Merge Division Policy Values
Division Policies
Policy Value Description
Poly_divide Perform polyvariant division.
Mono_divide Perform monovariant division.
Monovariant division permits variables to be divided between static and dynamic in only one combination at a program point. Polyvariant division allows different combinations of variables to be treated as static at one program point. Multiple divisions arise after a merge when incoming paths into the merge are annotated differently.
TABLE 4
Merge Specialization Policy Values
Specialization Policy
Policy Value Description
Poly_specialize Perform polyvariant specialization at merges.
Mono_specialize Perform monovariant specialization at merges.
Monovariant specialization permits only one version of code to be generated for the merge for each version preceding it. Polyvariant specialization produces a version for each path, if there are different values of static variables on different paths. The effect can be powerful, causing multi-way loop unrolling. However, unfolding all possible paths through loops with dynamic branches can lead to code explosion.
TABLE 5
Merge Caching Policy Values
Merge Caching Policy
Policy Value Description
m_cache_all.sub.-- Specialize at merges, assuming that the context
unchecked is different than any previous or subsequent
specialization.
m_cache_all Cache each specialized version at merges.
m_cache_one Cache only the latest version at merges, throwing
away the previous version if the context changes.
m_cache_one.sub.-- Cache one version, and assume the context is the
unchecked same for all future executions of this merge.
Merge caching policies at merge points are analogous to the caching policies at promotion points. 3. Dynamic Branches A dynamic branch is one whose direction is determined by a dynamic predicate. Dynamic branches determine execution of dynamically generated code. When the branch is specialized, it is not known whether both successors will be executed. The run-time specializer may decide (according to the current laziness policy) whether to speculatively specialize both successors, or whether to suspend specialization and allow the successors to be specialized on demand. TABLE 6 lists different policy values that correspond to the laziness policy.
TABLE 6
Speculative-Specialization (Laziness) Policy Values
Laziness Policies
Policy Values Description
Lazy Suspend specialization at all dynamic branches,
avoiding all speculative code generation.
Specialize lazy Suspend specialization at all dynamic branch
successors dominating specializable merge points
and specializable call sites, avoiding speculative
specialization of multiple versions of code after
merges.
Loop_specialize.sub.-- Suspend specialization at all dynamic branch
lazy successors dominating specializable loop-head
merge points and specializable call sites, allowing
speculative specialization except where it might be
unbounded.
Eager Eagerly specialize successors of branches, assuming
that no unbounded specialization will result,
allowing full speculative specialization.
Policy Implementation DyC employs two main components to enable annotation of policies. The first component is an interface for associating a policy with a variable at a given program point. The interface is implemented by annotating a program's source code. The basic annotations that drive run-time specialization are make_static, and make_dynamic. Make_static takes a list of variables, each of which is treated as a run-time constant at all subsequent program points until DyC reaches either a make_dynamic annotation that lists the variable or the end of the variable's scope (which acts as an implicit make_dynamic). Source code can be annotated with make_static, using the following syntax, which associates the variable identified as variable-name with the policy value policy-value. make_static (variable-name:policy-value) A policy value is simply one of the different values corresponding to a policy. A make_dynamic annotation may be used to clear all policies associated with a variable at a given program point, using the following syntax. make_dynamic (variable-name) The second component comprises a system for propagating bindings from variables to policy values through the program. The system uses an iterative data flow analysis (IDFA) on a CFG to propagate sets of divisions. A division is a mapping from variables to policy values. Since polyvariant division is supported, a given variable can have more than one policy value associated with it for each policy at a given program point, giving rise to multiple divisions at each program point. FIG. 6 illustrates the logic used by the system in determining how sets of divisions are caused to flow across nodes in the CFG. In a decision block 300, a determination is made as to whether the node being processed is an entry node. If the answer is yes, the set of divisions is initialized to be empty in a block 302, and the processing of the current node is complete. If the node is not an entry node, the logic flows to a decision block 304, which determines if the node is of the form make_static (x:new_value). If the answer is yes, all pairs ((x, old_value), where old_value is a value of the same policy as new_value), are deleted from each division being caused to flow, and a pair of annotated variables (x, new_value) is added to each division in a block 306, whereupon the processing of the node is complete. If the answer to decision block 304 is no, the logic flows to a decision block 308, which determines if the node is a make_dynamic (x) node. If so, the annotated variables, e.g., all bindings of the form (x, *), are removed from each division in a block 310, and the processing of the node is complete. If the node is not a make_dynamic (x) node, the logic flows to a decision block 312, wherein a determination is made as to whether the node being processed is a merge node. If so, a block 314 performs a meet operation on incoming data sets, whereby the set of divisions succeeding the node is set to the result of a meet of the two incoming sets of divisions at the node, and the process is complete. The meet of two sets of divisions is the union of the pair-wise meet of the divisions in the sets. When meeting a pair of divisions, if either division has variables undergoing polyvariant division, the meet results in one extra division being generated for each policy value of the variable at the meet. For the other variables in the divisions being met, the resultant division simply contains the meet of the policy values for each policy of those variables from each division. The policy values are assumed to fit in a lattice, with more aggressive policies lower in the lattice (see TABLE 7 below for the lattices corresponding to particular policies), and with the meet of two lattice points corresponding to the greatest lower bound of the points. The flow analysis initializes values to the highest point on the lattice and solves for the greatest fixed point. The remaining possibility (where all determinations made in decision blocks 300, 304, 308, and 312 have resulted in "no" answers) is that the operation at the node is "normal" (i.e., of the form x:=. . . ), and in a decision block 316, a determination is made to whether or not the right-hand side of the foregoing equation consists of variables that are all static. If the answer is yes, x is a derived static variable, and the logic flows to a decision block 318. This decision block determines if the derived static variable already has a policy value attached to it. If no policy value is attached to it, in a block 320, the derived static variable is assigned the highest policy value (i.e., least aggressive policy) in the lattice of values for that policy. The basic intuition is that in the absence of instructions to the contrary, specialization for derived variables should be the least aggressive possible. If the derived variable had a policy associated with it in the incoming division (i.e., the answer to decision block 318 is yes), x is bound to that policy in a block 322. If on the other hand, the node makes x dynamic, the answer to decision block 316 is no, and the system removes bindings of x (x,*) from all divisions in a block 324. The lattices for the policies the invention implements are shown below. Each of these lattices is a total order on the possible policy values. Picking the greatest lower bound of two policy values therefore corresponds simply to picking the lower of the values in the lattice.
TABLE 7
Lattices of Policy Values
Policy Lattice
Policy Lattice of Policy Values
Specialization poly_specialize <= mono_specialize
Division poly_divide<=mono_divide
Promotion auto_promote <= manual_promote
merge-caching m_cache_all_unchecked <= m_cache_all <=
m_cache_one
<= m_cache_one_unchecked
promotion-caching p_cache_none_unchecked <= p_cache_all <=
p_cache_one
<= p_cache_one_unchecked
Laziness laz <= specialize_lazy <= loop_specialize.sub.--
lazy <= eager
The IDFA to propagate policy values from annotations to all relevant program points is completely defined by the foregoing policy lattice, the flow functions, and the meet functions. In traditional implementations of specializers, a BTA exists to determine which variables are static, and which are dynamic. Decision block 316 of FIG. 6 requires information from the BTA to determine whether the variables are static or dynamic. In turn, some of the policy information (such as the specialization and promotion policies) influences the BTA's determination of which variables are static or dynamic. Due to this co-dependence between the BTA and the policy flowing analyses, the two necessarily run in lock-step. In fact, in a preferred implementation, the two flow analyses are folded into one IDFA. It should be noted that the particular syntax presented above for associating variables with policy values (or for disassociating variables from policies) is one of many possible syntax's that could be used. Any syntax that sets up (or destroys) the association of variables with policies could be implemented. Additionally, the discussion above specifies assigning the least aggressive policy possible to derived static variables (when no policy has been previously assigned). As an option, the policy assigned to the derived state variable could be some (approximately defined) meet of the policies of the variables from which it is defined. Though several policies are shown in the tables above, the framework for specifying and propagating policies in the preferred embodiment is not limited to these particular policies. Specifically, additional policies that have richer lattices than those presented above can be implemented following the same scheme. Furthermore, the policies can control any other aspect of specialization, as long as they are expressible as policy values, attributable at program points, and flow functions are definable to propagate these policy values from their annotation points. Interprocedural Annotations Run-time specialization normally applies within the body of a single procedure: calls to a procedure P from within a dynamic region or specialized function all branch to the same unspecialized version of P. P itself may have another specialized region in its body, but this break in the specialization will cause all the different specialized calls of P to merge together at the entry to P, only to be split back apart again by the cache checks at the make_static annotation in P's body. To avoid this overhead, calls can themselves be specialized, branching to correspondingly specialized versions of the callee procedure, thereby extending dynamic regions across procedure boundaries. The specialize annotation names a procedure with a given number of arguments and provides a list of divisions for the procedure. Each division lists a non-empty subset of the formal parameters of the procedure that will be treated as run-time constants; a division can specify the same policies for listed variables as a make_static annotation. As described below, for each division, DyC's static compiler produces a code generation procedure (i.e., a generating extension) for that division that takes the static formals as arguments and, when invoked on their run-time values, produces a specialized residual procedure that takes the remaining arguments of the original procedure (if any), in classical partial evaluation style. At each call site in a specialized region to a procedure P with an associated specialize annotation, DyC searches for the division specified for P that most closely matches the division of actual arguments at the call site (favoring divisions listed earlier in P's specialize annotation, in case of ties). The most closely matching division is the one with the greatest number of formal parameters annotated as static that correspond to static actual arguments and no static formals that correspond to dynamic actuals. If one is found, the static compiler produces code that, when specializing the call site at run time: (1) invokes the generating extension for the selected division of P, passing the necessary run-time constant arguments; and, (2) generates code that will invoke the resulting specialized version for P, passing any remaining arguments. Thus, when the specialized call is eventually executed, the call will branch directly to the specialized callee and pass only the run-time variable arguments. If no division specified for P matches the call, then the general unspecialized version of P is called. Calls to P outside any dynamic region continue to invoke the unspecialized version of P. The callee procedure and any call sites can be compiled separately. They need to agree on the specialize annotation, which typically is placed next to the procedure's extern declaration in a header file. Since call boundaries across which specialization should take place are explicitly identified by the programmer, the inter-procedural analysis that would normally be required to identify (and propagate run-time constants through) specializable callees is avoided. The constant prefix to the specialize annotation is an (unsafe) assertion by the programmer that the annotated procedure acts like a pure function; in other words, it returns the same result given the same arguments without looping forever, making externally observable side effects, or generating any exceptions or faults. DyC exploits this information by calling a constant function from call sites whose arguments are static at specialization time and treating its result as a run-time constant, i.e., reducing the call rather than specializing or residualizing the call. This behavior is different than simply providing a division where all formals are static, since that condition would leave a zero-argument call whose result was a dynamic value in the specialized code. The system also allows the programmer to prefix individual function calls with an @ annotation to specify that the result of a function call should be treated as a run-time constant if its arguments are run-time constants. For instance, to indicate that a call to the cosine function is a pure function, a programmer could write: make_static(x); y=cos@(x); . . . /* later uses of y are specialized for y's value at specialization time */ . . . The above example illustrates the use of "@" as a per-call site version of the constant annotation. This annotation is beneficial because the programmer may know that particular uses of a function will not generate side effects, although the function may produce side effects in general. Analysis of the Annotations Given the programmer annotations described in the previous section, DyC performs data flow analysis akin to BTA over each procedure's CFG representation to compute where and how run-time specialization should be performed. The output of this analysis is information associated with each program point (formally, each edge between instructions in the CFG); the domain of the information, BTA, is specified in FIG. 7. Some of the constraints on the form of the domain are specified in FIG. 8. Flow graph nodes are generated from the grammar shown in FIG. 9, where Var, Const, UnaryOp, BinaryOp, and Proc are terminals and Policies are as defined in FIG. 7. This output is used to produce the generating extension that invokes the run-time specializer, as described below. In the accepted notation, .fwdarw. constructs the domain of partial finite maps (sets of ordered pairs) from one domain to another, dom and range project the first and second elements, respectively, of the ordered pairs in the map, and applying a map f to an element in dom(f) returns the corresponding range element. The notation uses x to construct cross-product domains. D(p) is written to project from the product p the element that corresponds to component domain D, and p[D.fwdarw.v] is written to compute a new product p' that is like p but whose D element has a value v. Pow denotes the powerset domain constructor. Note that: A.fwdarw.B.lhalfcircle.Pow(A.times.B). The analysis essentially considers only scalar local variables and compiler temporaries, and annotated data structures are treated as static pointers. The binding times of memory locations are not computed. The analysis computes a set of divisions for each program point. Each division maps variables annotated as static by make_static or specialize to their associated policies at that program point. Two divisions are distinct if and only if there is some variable in one division that is annotated with the polyvariant division policy and is either not found (i.e., it is dynamic) or is annotated differently in the other division; divisions that do not differ in the policies of any variables annotated with the polyvariant division policy will be merged together by the analysis. For each division the analysis computes the following information: (1) The analysis computes the set of static variables (run-time constants) at that program point, including both user-annotated static variables (called root variables) and any derived static variables computed (directly or indirectly) from them. The computed set of static variables will be used to determine which computations and operands are static, versus which are dynamic. In addition, it is used to index into the run-time specializer's caches; consequently, the analysis also computes the appropriate caching policy for each static variable. For internal purposes, the analysis tracks the set of annotated run-time constants from which each static variable was computed, directly or indirectly, as described in below. (2) The analysis computes those points that require dynamic-to-static promotions of variables. Non-empty promotion sets correspond to promotion points for the listed variables. Promotions get inserted after make_static annotations for variables that are not already static, and after (potential) assignments of dynamic values to variables that are annotated with the auto-promotion policy. (3) The analysis computes those points that require the demotion of variables. The set of demoted variables indicates which previously static variables have become dynamic and need to be initialized with their last static value by residual assignments (called explicators). (4) The analysis identifies which merge points require polyvariant specialization, called specializable merges points, because at least one variable that is annotated with the polyvariant specialization policy has potentially different definitions on different merge predecessors. The set of such discordant variables is computed at these merge points, and is empty at all other points. The following paragraph describes the procedure representation that the system assumes and the set of data flow analyses used to construct this output. Procedure Representation The system assumes that the procedures being analyzed are represented in a standard CFG, where nodes in the graph can be of one of the following forms: (1) an operator node such as a move, add, or call, with one predecessor and successor; (2) a merge node with multiple predecessors and one successor; (3) a conditional branch node with one predecessor and multiple successors, with a single operand that selects the appropriate successor edge; (4) an entry node with no predecessors and a single successor, which acts to bind the procedure's formals upon entry; or, (5) a return node with one predecessor and no successors, and with a single operand that is the procedure's result. To enable the analyses to detect when potentially different definitions of a variable merge, it is assumed that merge nodes are annotated with a list of variables that have different reaching definitions along different predecessors, yielding one variable in the list for each .phi.-function that would be inserted if the procedure was converted to static single assignment (SSA) form. Prepasses The analyses will need to identify those program points where a variable may be assigned. Direct assignments as part of an OpNode are clear, but assignments through pointers and as side effects of calls are more difficult to track. The system abstracts this "may-side-effect" analysis problem into a prepass whose output is MayDefVars. MayDefVars is a set of variables at each program point that may be modified during execution of the previous node (other than the left-hand side variable of the node). The analyses will work better if it can identify when annotated and derived run-time constant variables are no longer used (i.e., "dead"). The system abstracts the result of a live variables analysis into a prepass that computes LiveVars, the set of live variables at each program point. It also computes and abstracts a similar analysis, UsedVars, which is the set of variables at each program point that has an earlier definition and a later use (but may temporarily be dead at this point). LiveVars is used to determine when variables can be removed from StaticVarInfo. Because Division contains the policies attributed to annotated variables, a variable cannot be removed from Division when it simply goes dead; when the variable is used again downstream, its policy information will be needed. Hence, UsedVars is used to determine when an annotated variable can be removed from Division. Finally, the system processes the inter-procedural specialization directives and records them in the Specializations domain. Specializations maps each annotated procedure to a set of divisions given in the specialize annotation and indicates whether the procedure was annotated as constant. This information is assumed to be replicated at all program points, for convenience in writing the analysis functions. The Main Analysis FIGS. 10A-C define flow functions of the annotation analysis. FIGS. 11A-B define helper functions of the annotation analysis. The BTA family of data flow equations defines the information on the program point(s) after a node in terms of the information computed for the point(s) before the node (bta), the helper information described below for the program point(s) after the node (lvs, uvs, and mds), and the ever-present specialized function information (sp). A solution to the (recursive) data flow equations is the greatest fixed point of the set of equations for each node in the procedure, which is solved by a simple iterative data flow analysis. The top element of the lattice, used to initialize back edges during the initial iteration of analysis of loops, is the empty set (no divisions). The system follows the conventions of data flow analysis in solving for greatest fixed points and initializing information along edges to the top of the lattice. In general, each flow function computes a new, updated set of divisions from the inflowing set(s) of divisions. Any permanently dead variables (those no longer in the UsedVars set) are removed from the set of annotated variables, Division, and any, at least temporarily, dead variables (those no longer in the LiveVars set) are removed from the set of run-time constants, StaticVarInfo, to avoid unnecessary polyvariant division or specialization. Permanently dead variables are not removed from Division if any static variables derived from them are still live, because doing so would require those derived static variables to be killed, as described below. Once a new set of divisions and associated information is computed, divisions that no longer differ in the policies of any variables annotated as leading to polyvariant division are merged together into a single division. Thus the degree of polyvariant division can vary from program point to program point. Entry Nodes The analysis of the procedure entry node creates the initial division(s), including at least the empty unspecialized division with no run-time constants. For a specialized procedure, each of the divisions listed in the specialize annotation introduces an additional specialized division in the analysis. For each division, the set of run-time constants is initialized to the set of annotated variables, with each variable's initial caching policy taken from its specified PromotionCachingPolicy. Make Static and Make Dynamic Nodes The analysis of a make_static pseudo-instruction adds a new static variable to each of the existing divisions and replaces the policies associated with the variable, if it is already present in some division. If the variable was not already a run-time constant in some division, then the make_static instruction introduces a dynamic-to-static promotion. The make_dynamic instruction simply removes the annotated variable from each of the inflowing divisions; as described above, this step may cause divisions to merge and run-time static variables derived from the newly dynamic variable to be dropped. Assignment and Store Nodes The various forms of assignment nodes all have similar analyses, dependent only on whether the right-hand side expression is a run-time constant expression. Compile-time constants are trivially run-time constants. A unary or binary expression yields a run-time constant, if its operands are run-time constants and if the operator is a pure function (e.g., it cannot trap and always returns the same result given the same arguments). A load instruction yields a run-time constant if and only if its address operand is a run-time constant (which includes fixed values, such as the address of a global or local variable) and it is annotated with @ by the programmer. A call to a procedure annotated by the programmer as constant yields a run-time constant if all its arguments are run-time constants. A store instruction has no definitely assigned result variable, only potential side effects, as described by the MayDefVars set. The effect of these nodes is summarized into two sets. The first is a (singleton or empty) set of variables definitely assigned run-time constant values; the other is a set of variables possibly assigned dynamic expressions (comprised of the assigned variable if the right-hand side expression is dynamic, as well as any variables in the MayDefVars set). The definitely static variables are added to the set of run-time constant variables. The possibly dynamic variables are divided into those annotated with the auto-promote policy (which instructs DyC to insert a dynamic-to-static promotion automatically if these variables are ever assigned a dynamic value), and those that aren't auto-promoted (which DyC drops from the set of annotated variables and the set of run-time constants, if present in either). As with the analysis of any node, dropping variables from the set of annotated variables can cause divisions to merge. Merge Nodes The analysis of a merge node must deal with discordant variables that have potentially different definitions along different predecessors (these variables were identified by a prepass and stored with the merge node, as described above). For those discordant variables that the programmer annotated as run-time constants with a polyvariant specialization policy, the analysis will mark this merge as discordant in those variables, triggering specialization of the merge and downstream code. Any other discordant variables are dropped from the set of annotated variables and run-time constants, if present. (Again, this dropping of variables from the annotated set may cause divisions to merge.) Derived run-time constants are implicitly monovariantly specialized, since they were not explicitly annotated as polyvariantly specialized by the programmer. The caching policy for all discordant variables at the merge is set to those variables' merge caching policy. This analysis can be improved for the case of a static merge. A static merge is a merge where at most one of the merge's predecessors can be followed at specialization time, because the predecessors are reached only on mutually exclusive static conditions. Since only one predecessor will be specialized, the merge node won't actually merge any branches in the specialized code and only one definition of each static variable will reach the merge when the residual code is executed. In fact, all that is required is to ensure that only one definition of a static variable can reach the merge at execution time, either because there is only one reaching definition, or potentially different definitions are only along predecessors with mutually exclusive static reachability conditions. Such variables are not included in the set of discordant variables. The reachability analysis used to identify static merges is discussed below. Branch and Return Nodes The analysis of a branch node simply replicates its incoming information along both successors (as always, after filtering the set of variables to exclude those that are no longer live along that successor). Return nodes need no analysis function, since there are no program points after return nodes. Caching Policies and Derivations of Static Variables At each program point, the analysis computes a caching policy for each variable. This caching policy is used to control indexing into the run-time specializer's caches of previously specialized code. Annotated variables at promotion points (and at the start of analysis of a division of a specialized function) are given the user-specified PromotionCachingPolicy value. At specializable merge points, a discordant variable is changed to use the variable's MergeCachingPolicy value. Derived run-time constants are given the CacheOneUnchecked policy. This policy ensures that unannotated run-time constants are never used in cache lookups and consequently do not lead to additional specialization beyond that explicitly requested by the user. The unchecked caching policy is safe, as long as each derived run-time constant is a pure function of some set of annotated variables. An annotated variable can be assigned a static expression, in which case it is treated (more efficiently) as a derived run-time constant with a CacheOneUnchecked policy, instead of its annotated caching policy. Assignments to root annotated variables violate the assumption that a derived run-time expression is a function of a set of root annotated variables. In this case, the derived run-time constants need to be dropped from the set of static variables, and annotated derived run-time constants need to be assigned new cache policies; preferably, the system meets the cache policies of their prior root variables. The analysis tracks the set of root annotated variables, SourceRoots, on which a derived run-time constant depends; whenever a root variable is (possibly) assigned to or is removed from the division, all dependent run-time constants are dropped (or restored to their regular caching policy, if roots themselves). This distinction between root and derived variables is a significant source of complexity in the analysis. Computation of Demotions At each program point, the analysis computes the set of demoted variables. A variable can be demoted in two ways: (1) if it was static before the point but is dynamic after the point (svi-svo in the equations), or (2) if it becomes static at the node but is dropped from the set of static variables right after the node because of filtering of live variables (svo-svf in the equations). Additional Lattice Meet Operations The Merge helper function uses the lattice meet operators for the Division and DivisionInfo domains. The lattice meet operator .andgate..sub.Division over elements of Division indicates how to combine different annotations for a set of variables in the same division, and is defined as follows: d.sub.1.andgate..sub.Division d.sub.2.ident.{(v,p).vertline.v.di-elect cons.dom(d.sub.1).andgate.dom(d.sub.2) p=d.sub.1 (v).andgate..sub.Policies d.sub.2 (v)} Elements of Policies are met point-wise. Elements of individual policy domains are totally ordered, with elements listed earlier in the set of alternatives for a domain in FIG. 11 ordered less than elements listed later; for example: AutoPromote.ltoreq..sub.PromotionPolicy ManualPromote Thus, the lattice meet operator for a particular policy domain returns its smallest argument, for example: AutoPromote.andgate..sub.Promotionpolicy ManualPromote=AutoPromote This rule has the effect of picking the strongest policy of any of the merging divisions. The lattice meet operator .andgate..sub.Divisioninfo over elements of Divisioninfo is the point wise meet over its component domains, which are defined as follows:
si.sub.1 .andgate.StaticVarInfo si.sub.2 .ident.
let si.sub.new =
{ (v, (p,rvs)) .vertline. v.epsilon.dom(si.sub.1).andgate.dom(si.sub.2)
.LAMBDA.
P = P.sub.1 .andgate.CachingPolicy P.sub.2 .LAMBDA.
rvs = rvs.sub.i .orgate. rvs.sub.2
where P.sub.2 = CachingPolicy(si.sub.2 (v))
P.sub.1 = CachingPolicy(si.sub.1 (v))
rvs.sub.1 = SourceRoots(si.sub.1 (v))
rvs.sub.2 = SourceRoots(si.sub.2 (v))}
vs.sub.1 .andgate.Promotions vs.sub.2 .ident.
vs.sub.1.orgate.vs.sub.2.andgate.dom(si.sub.new)
vs.sub.1 .andgate.DiscordantVars vs.sub.2 .ident.
vs.sub.1.orgate.vs.sub.2.andgate.dom(si.sub.new)
vs.sub.1 .andgate.Demotions vs.sub.2 .ident. vs.sub.1.orgate.vs.sub.2
Reachability Analysis The system identifies static merges by computing a static reachability condition at each program point for each division. A static reachability condition is a boolean expression (in conjunctive normal form) over the static branch outcomes, which are required in order to reach that program point. A static branch is a branch whose test variable is identified as a run-time constant by the BTA analysis. A static merge is one whose predecessors have mutually exclusive static reachability conditions. A merge is static for a particular variable x with respect to a given division if and only if at most one possible definition reaches the merge, or different incoming potential definitions are along mutually exclusive predecessors. Reachability conditions are computed at the same time as the BTA information, since they depend on the BTA's division and static variable analysis and influence the BTA analysis's treatment of merge nodes. Creating the Generating Extensions Given the output of the BTA analysis, DyC statically constructs the code and static data structures that, when executed at run time, will call the run-time specializer with the appropriate run-time constant arguments to produce and cache the run-time specialized code, i.e., the generating extensions. With reference back to FIG. 2, the generating extensions are created by the generating extension section of DyC's Multiflow compiler, using the following steps. A split divisions step is first performed in a block 228. The compiler statically replicates control flow paths, so that each division receives its own code. After replication, each program point corresponds to a single division. Replication starts at entries to specialized functions (producing several distinct functions), and at merge points where different divisions combine. Replicated paths remerge at points where divisions cease to differ and are joined by the Merge function. In a block 230, the compiler identifies which branch successor edges should be lazy specialization edges. Lazy points (due to dynamic-to-static promotions) are identified. The compiler also identifies the boundaries of the units manipulated by the run-time specializer (described above), as shown by a block 232. Unit boundaries primarily correspond to dynamic-to-static promotion points, eviction points (where variables are evicted from the set of annotated variables), specializable merge points, and lazy branch successor edges. The first three cases are cache lookup points, and the last case avoids speculative specialization. A clustering algorithm then attempts to merge boundaries together to minimize their cost. The Unit and UnitEdge specializer data structures are generated at the end of this process. Further discussion concerning the identification of lazy edges is presented below. Next, the process flows to a block 234, wherein the compiler separates the static operations (OpNodes whose right-hand side expressions were computed to be static by the BTA analysis) and the dynamic operations into two separate, parallel control flow sub-graphs. These sub-graphs are respectively called "set-up code" and "template code." Further aspects of this separation are discussed below. Also discussed below is the method the system uses for determining the control flow of the static sub-graph, after all dynamic branches have been removed from it. In a block 236, the compiler inserts explicators in the dynamic sub-graph for all variables in the Demotions set at each program point. For Demotions sets at merge nodes, each assignment must be inserted on each predecessor edge to the merge where the now-dynamic variable was previously static. The DC operations needed to complete the implementation of Specialize, such as cache lookups, memory allocation, and branch patching, are inserted into the static and dynamic sub-graphs before they are passed through the back end of the compiler, as shown by a block 238. Some optimizations of the calls to the run-time specializer are discussed below. The primary steps for generating extensions is thus complete. The next step is performed by back end 208, wherein Multiflow's combined register allocator and instruction scheduler optimizes the ordinary static code, the static code to be executed by the run-time specializer, and the dynamic code. A post-pass integration step that follows assembly code generation is performed in block 216, to integrate the dynamic code into the static specializer code so that the dynamic code is emitted at run time when the corresponding static code is executed by a generating extension. In this integration step, each unit's ReduceAndResidualize function is completed. The control flow and the reduce operations of the ReduceAndResidualize function are derived from the static control flow sub-graph. The residualize operations are introduced by translating the operations and dynamic branches of the dynamic sub-graph into code to emit the dynamic instructions (perhaps with run-time constant operands) in the static sub-graph; this process is described in more detail below. The resulting sub-graph forms the ReduceAndResidualize function for the unit, and the dynamic sub-graph is discarded. Computing Lazy Branch Successors Laziness policies on variables indicate the extent of speculative specialization that should be performed after dynamic branches. Based on these policies, successors of some dynamic branches are determined to be lazy edges, each of which corresponds to a one-time suspension and resumption of specialization at run time. A branch successor edge is lazy if and only if its test variable is dynamic and at least one of the following conditions holds. (1) At least one of the run-time constants at the branch is annotated with the Lazy policy, (2) The branch successor edge determines execution (as defined below) of a predecessor edge of a later specializable merge node, where at least one of the discordant variables is annotated with the SpecializeLazy policy, (3) The branch successor edge determines execution of a predecessor edge of a later specializable loop-head merge node, where at least one of the discordant variables is annotated with the LoopSpecializeLazy policy, or (4) The branch successor edge determines execution of a later call to a specialized division of a procedure, and some run-time constant live at the call is not annotated with the Eager policy. A branch successor edge determines execution of a program point if and only if the edge is "postdominated" by the program point, but the branch node itself is not, i.e., the branch successor is (one of) the earliest point(s) where it is determined that the downstream program point will eventually be executed. Once the postdominated information relating program points is computed, a linear scan over the dynamic branches, specializable merge points, and specialized calls serves to compute the lazy edge information. Unit Identification and Automatic Positioning of Cache Lookups It is often preferable to cache dynamically generated code, so that it can be reused without having to perform additional dynamic compilations on the same code segment, both at the program point at which the code is generated and by other program points that need identical code, such as multiple calls sites for the same function. In determining caching points, the system considers both the savings due to code reuse and the cost of using and maintaining the cache. Reuse is achieved by caching the specialized code when it is dynamically generated and then performing a cache lookup, based on the values of the static variables, to retrieve the cached specialized code when the code is executed again. A cache lookup compares the current values of the static variables (variables whose values are used for run-time specialization) with previous sets of values that have been used to generate specialized versions of the code following the current program point. The purpose of performing cache lookups during run-time code generation is to avoid producing redundant code, which takes time and occupies storage space. However, cache lookups also take time, so it is desired that they not be used with excessive frequency. The method described below is a technique for selecting where to insert cache lookups in the process of performing run-time specialization for a particular piece of code, and is independent of how the cache lookups are implemented. The first step for positioning cache lookups is to identify points in the region of the program to be dynamically compiled, called caching points, where the potential for reuse may arise. This step is performed during the BTA. Initial placement of these caching points coincide with unit boundaries, which are identified as follows: (1) A transition, called a lazy point, from executing (statically or dynamically generated code) to generating code. Before dynamically generating the code, the system checks in the reusable code cache to see if the code has been previously specialized to identical static-values. If they have changed, then new static values, call promoted values, must be passed to the dynamic compiler, so that code that is specialized to the new values can be dynamically compiled. Lazy points where values are promoted are called promotion points. A non-empty Promotions set at a program point corresponds to a dynamic-to-static promotion point, and introduces a unit boundary. (2) A procedure entry. The procedure could be invoked again with the same parameter values and other (global) values for which it has been specialized. (3) A merge of control paths on which static variables could be assigned the same values. Under this condition, the code that follows would be the same for both paths. A non-empty DiscordantVars list corresponds to a specializable merge point, and introduces a unit boundary. (4) A program point that occurs after a static variable has become either dynamic (i.e., it is no longer used for specialization) or dead (i.e., it is no longer used at all). Static variables that meet this criteria are called evicted variables. If an evicted variable was not a pure function of other static variables, then eliminating it from the set of specialized values could eliminate differences between specialized versions of the code that follows. Any point where the cache context differs from the cache context at a predecessor point is a unit boundary, since different degrees of polyvariant specialization or of cache retention can occur. If any static variable is annotated with the CacheAllUnchecked policy, then the cache context is the special marker replicate. Otherwise, the cache context is the pair of the set of variables annotated with the CacheAll policy and the set of variables annotated with the CacheOne policy. (The set of variables annotated with CacheOneUnchecked do not contribute to the cache context.) In practice, the rule concerning different cache contexts can be relaxed since, except at promotion points, these boundaries are not required. Unit boundary clustering (see the next subsection) also helps to mitigate the impact of the many of the boundaries that are caused to be inserted by this rule. It is noted that a program point can be a boundary in more than one way. In addition, units are constrained to be single-entry regions. To ensure this constraint is met, additional unit boundaries are inserted at control flow merges of paths (including loop back edges) from different units. These unit boundaries can be omitted, however, if all paths from different units have mutually exclusive static reachability conditions (the same way it is determined that multiple static definitions are not truly discordant, as discussed above). Omitting such unit boundaries eliminates the overhead associated with crossing the omitted unit boundaries (discussed in the next subsection), and permits program points to be shared among multiple units, at the cost of larger generating extensions. A UnitEdge data structure records whether each unit edge should be specialized eagerly or lazily. A unit boundary is eager, unless it is a promotion point (which must be suspended until the computed run-time value is available) or a lazy edge. Coalescing Caching Points by Clustering Unit Boundaries A unit boundary introduces run-time specialization overhead--to package the run-time constant context from the exiting unit's ReduceAndResidualize function, to execute the run-time specializer and any cache lookups, and to invoke the target unit's ReduceAndResidualize function (unpacking the target's run-time context). In some circumstances, series of unit boundaries can be created with little if any work intervening, for instance when a series of annotated static variables become dead, leading to a series of eviction points and corresponding unit boundaries. To avoid excessive unit boundaries, the system attempts to combine multiple boundaries whenever possible. This goal is accomplished by a unit boundary clustering algorithm, which works as follows. First, for each boundary, a range over the procedure where that boundary can be legally moved is constructed. Procedure entries, specializable merge points and lazy edge boundaries cannot be moved, so their range is a single program point. Eviction and promotion points can move to any control equivalent program point (i.e., a program point with the same control dependencies) that is bounded by earlier and later uses of the evicted or promoted variables, respectively; however, promotion points cannot move above earlier definitions. It is preferable to delay inserting the single-entry producing unit boundaries until after all of the other boundaries have been clustered, so they do not participate in the clustering algorithm. Second, the system sorts the boundary ranges in increasing order of their ends, and then makes a linear scan through this sorted list. The system removes the range that ends first in the list (the kernel range), removes all other ranges that overlap with the kernel range (the union of these ranges forms a cluster), and finds the intersection of these ranges. This resulting intersection is the program region where the caching points can be placed. It is preferable to use the earliest possible points for evictions and later points for promotions, as these will reduce the amount of specialized code. The system chooses either the start or end of the intersection range, based on the relative mix of promotions and evictions, and inserts a single caching point for all the merged ranges at that point. Then it continues processing the sorted list of boundary ranges, until the list is exhausted. This algorithm for coalescing boundary ranges produces the minimum number of unit boundaries possible, given the restricted kinds of ranges produced in the first step (the restriction to control equivalent program points is key). To prove this, note that the system produces a cluster if and only if it detects a kernel range, so that the number of clusters is equal to the number of kernels. Since kernels never overlap, no clustering scheme could place two kernels in the same cluster. The number of kernels is therefore also the minimum number of clusters required, implying that the algorithm produces no more clusters and, therefore, no more boundaries than necessary. Because caching points may be placed at unit boundaries, moving them can increase or decrease the amount of code reuse. Thus, clustering sometimes trades off reuse for fewer boundary crossings. It may be desirable to limit the length of the ranges so that boundaries sufficiently far from each other are not coalesced, or otherwise limit the length of the ranges to prevent different types of boundaries that are relatively distant from each other from being clustered together. For example, it may not be beneficial to combine distant boundaries due to evictions and promotions, since eviction boundaries must occur earlier and promotion boundaries later, in order to maximize reuse. As an alternative, the foregoing algorithm could be modified to only introduce boundaries at predetermined locations, such as eviction points. More elaborate versions of the clustering algorithm could permit coalescing of unit boundaries beyond control equivalent regions, but this variation would require more than a straightforward extension to the algorithm presented above. The ranges would no longer be strictly linear. Moving boundaries below branches or above control flow merges would create identical boundaries on all paths from the branches or to the merges. Moving boundaries in the opposite direction could only be permitted if identical boundaries existed on all the paths. An example illustrating the clustering of unit boundaries based on promoted and evicted variables is shown in FIG. 12. In this example, a variable is evicted because the variable becomes dead, i.e., it is not used downstream. The left-hand side of FIG. 12 represents a CFG comprising a plurality of unit blocks of code, including blocks B1, B2, B3, B4, and B5. Each unit block comprises a definition of one or more variables (i.e., "x=. . . ") and/or one or more lines of code operating on a previously-defined variable (i.e., ". . . x . . . "). Blocks B1, B3, and B5 are control equivalent blocks. Two (or more) blocks are control equivalent if the execution of one of the blocks guarantees that execution of the other blocks will occur. In this case, execution of block B1 guarantees that execution of blocks B3 and B5 will occur, no matter what path is taken. Conversely, execution of blocks B2 and B4 are not guaranteed. Furthermore, blocks B2 and B4 are not control equivalent, because execution of either of blocks B2 or B4 does not guarantee execution of the other block. Each definition of a variable is assumed to cause a promotion for that variable. The allowed range of movement for each boundary is denoted by the double-headed arrows. For each variable v, v.sub.p denotes the legal range of motion for the boundary induced by promotion of the variable, and v.sub.e denotes the range for the boundary induced by eviction of the variable. Note that a promotion-induced boundary v.sub.p ranges between the definition of v, which caused the promotion, and the first use of v. Furthermore, the boundary may only be moved to (and merged with) a control equivalent program point. Therefore, the range for y.sub.p ranges from the middle of block B1 to the end of block B3, but not into block B5, due to the intervening use of y in block B4. Blocks B2 and B4 are excluded from the allowed range for y.sub.p since they are not control equivalent to block B1. Given the foregoing input specification, the algorithm works as follows. The algorithm first clusters boundaries in block B1 and blocks that are control equivalent to block B1, i.e., blocks B3 and B5. The ranges corresponding to these control equivalent blocks (both promoted and evicted variable ranges) are sorted by their respective end points, starting with the earliest (beginning at the bottom of the flow graph, working towards the top) end point. Note that ranges w.sub.p and w.sub.e are not eligible to be coalesced with these ranges, since they are in a block (B2), which is not control equivalent to block B1. At the end of this step, the ranges are arranged as in a list L, following the form L=(y.sub.e, x.sub.e, z.sub.e, z.sub.p, y.sub.p, x.sub.p). The algorithm next picks y.sub.e from L, since it ends first, and determines which ranges overlap with y.sub.e at its end points; in this case, x.sub.e and z.sub.e are the overlapping ranges. The intersection of these three ranges is simply the range y.sub.e. Finally, since all three of these ranges are eviction-based ranges, the algorithm places a coalesced boundary point CB.sub.1 at the start of y.sub.e. The coalesced ranges y.sub.e, x.sub.e, and z.sub.e are then removed from L, so that L=(z.sub.p,y.sub.p,x.sub.p). Of the remaining ranges in L, the one with the earliest endpoint is z.sub.p. At the point where z.sub.p ends, it intersects only with y.sub.p. The intersection of the ranges z.sub.p and y.sub.p results in the range z.sub.p. A coalesced boundary point CB.sub.2 for ranges z.sub.p and y.sub.p is placed at the end of z.sub.p, since this is a promotion-based boundary. After removing y.sub.p and z.sub.p from L, the remaining range for this set of control equivalent blocks is x.sub.p. x.sub.p can therefore not be coalesced with any range. Therefore, a coalesced boundary point CB.sub.3 for x.sub.p is simply placed at the end of the range x.sub.p. The system now examines the blocks that are not control equivalent to block B1, i.e., blocks B2 and B4. Block B2 is first selected. There are no other blocks that are control equivalent to block B2. Additionally, the boundary ranges in block B2, w.sub.p and w.sub.e, have no other boundary ranges that could potentially be merged with ranges w.sub.p and w.sub.e. List L in this case is simply L=(w.sub.e,w.sub.p). Since we and w.sub.p do not overlap, the algorithm simply results in a coalesced boundary point CB.sub.4 at the beginning of w.sub.e and a boundary point CB.sub.5 at the end of w.sub.p. The remaining block (B4) has no boundaries in it, so the process is complete. The caching points are then placed at the coalesced boundaries defined above by the algorithm. Separating Static and Dynamic Operations For most straight-line operations, it is clear whether the operation is static or dynamic. However, call instructions are trickier. A call to a regular unspecialized function (or to the unspecialized version of a specialized function) is treated as a dynamic operation and appears only in the dynamic sub-graph. A call to a constant function (or one annotated with @) with static arguments is treated as a regular static computation, appearing only in the static sub-graph. A call to a particular specialized division of a function has both static and dynamic components. To implement such a call, the call operation is split into two separate calls, one static and one dynamic. The static version of the call invokes the statically compiled generating extension for the selected division of the callee, taking as arguments the division's static arguments and returning a static procedure address. This step is followed by a dynamic call that invokes the static procedure address and passes the remaining arguments to produce a dynamic result. The static call will be moved to the static sub-graph, and the dynamic call will appear in the dynamic sub-graph. Control flow nodes, including branches and merges, initially are replicated in both the static and the dynamic sub-graphs. Later transformations can optimize them. Determining Control Flow of the Static Sub-graph Once each unit has been identified and split into separate static and dynamic control flow sub-graphs, the control flow structure of the unit's ReduceAndResidualize function is computed. Static and dynamic branches in the unit receive different treatment. A static branch is taken at specialization time, and does not appear in the dynamically generated (residual) code; accordingly, only one of its successors produces dynamically generated code. Consequently a static branch appears as a regular branch in the final ReduceAndResidualize function, selecting some single successor to pursue and residualize. A dynamic branch, on the other hand, is emitted as a regular branch into the dynamically generated code, and both its successors must be residualized. Consequently, no branch appears in the ReduceAndResidualize function at a dynamic branch, and the successors of the dynamic branch are linearized instead. In the presence of arbitrary, unstructured control flow with mixed static and dynamic branches, this linearization process may require some code duplication to avoid maintaining specialization-time data structures and overhead. The algorithm first splits all static control paths within the unit, linearizing dynamic branches by topologically sorting their successors, then re-merges the common tails of the static paths, bottom-up. A static control path includes all dynamically reachable basic blocks, given particular decisions for all static conditional branches. Each static branch can appear on a static control path at most once, because units cannot contain static loops. The time required by the algorithm can be exponentially related to the maximum number of sequential static branches on any static control path within a single unit, which is expected to be a small number in practice. The next step performs a linearization of the separate branches. Linearization causes what were originally alternative code segments (i.e., blocks that shared a common dynamic branch) to be executed sequentially. The algorithm must ensure that the segments executed earlier do not alter the initial static state expected by subsequent alternative segments. This constraint can be achieved by saving the static state at each dynamic branch and restoring it before executing each branch successor, which is the approach that the algorithm uses to propagate the static context between units. However, within a single unit, a more efficient solution is possible if static variables are converted to SSA form. SSA form ensures that only one assignment is made to each variable, which implies that state changes made by segments that occur earlier in the linearized unit are made to variables not read by alternative segments. In this case, the SSA form is easy to compute, because issues arising from loops and aliasing can be safely ignored due to DyC's restrictions on the form of units (i.e., units cannot contain static loops) and its prohibition of static stores. If these restrictions were eased, however, an alternate solution might have to be found. FIGS. 13A-13B graphically portray the before and after results of an exemplary set of code blocks that have been split and linearized by a system employing the algorithm described above. The process begins with the non-linearized sub-graph shown in FIG. 13A, which comprises six numbered boxes 1, 2, 3, 4, 5, and 6 that respectively represent basic code blocks 401, 402, 403, 404, 405, and 406. A circle 408 enclosing an "S" represents a static branch, while a circle 410 enclosing a "D" represents a dynamic branch. As shown in FIG. 13B, the splitting process creates two separate branches below static branch of circle 408. Since block 406 is originally (see FIG. 13A) common to both branches, it is necessary to create a duplicate block 406' of block 406. The linearization process linearizes execution of blocks 404 and 405, which were previously common descendents of the dynamic branch of circle 410 (now removed). Integrating Dynamic Code into Static Code To produce the final code for a unit's ReduceAndResidualize function, the system uses the linearized static CFG, which computes all the static expressions, and blends in code to generate the dynamic calculations with the appropriate run-time constants embedded in them. To accomplish this step, the system maintains a mapping from each basic block in the dynamic sub-graph to a set of corresponding basic blocks in the static sub-graph. When splitting apart static and dynamic operations, each dynamic block is mapped to its static counterpart(s). Unit linearization may create multiple instances of a basic block in the static sub-graph, as discussed above. The mapping is updated as the static sub-graph is linearized and some blocks are replicated, and as the sub-graphs are optimized, through instruction scheduling. The two sub-graphs are integrated, one dynamic block at a time. First, the static code computes any run-time constants used in the block's dynamic instructions. Then, code to emit the dynamic block is appended to its corresponding static block. The code to emit a dynamic instruction embeds the values of any small run-time constant operands into the immediate field of the emitted instruction. If the run-time constant is too large to fit in the immediate field, code is emitted to load the run-time constant from a global table into a scratch register. The emitted instruction then reads the scratch register to access the run-time constant. The emitting code also performs any peephole optimizations that are based on the run-time constant value, such as replacing multiplications by constants with sequences of shifts and adds. FIG. 13C illustrates the split and linearized sub-graph of FIG. 13B after it has been integrated (i.e. merged). Note that the right-hand branch is still linearized, while block 406 once again becomes a common descendent of both branches. Optimizing Specializer Interactions Each initial promotion point at the entrance to a dynamic region is implemented by generating a static call to the run-time specializer, passing the run-time values of the cache context at that program point. The specializer section above described the run-time specializer as if a single, general purpose specializer took control at this and all other unit boundaries. The system optimizes this pedagogical model as follows. The Specialize function is specialized for each Unit argument. All the run-time manipulations of the Unit and UnitEdge data structures are eliminated, the unit's ReduceAndResidualize function is inlined, and the processing of outgoing lazy unit edges is inlined. If the cache policy for any of the unit's context variables is CacheAllUnchecked, then the cache lookup and store calls are omitted. Rather than recursively call Specialize, a pending-list is used to keep track of unprocessed (eager) unit edges. Furthermore, the overhead of pushing and popping the static context on and off the pending-list can be avoided for one successor of each unit, which eliminates more than half of this overhead in dynamic regions without dynamic switch statements. In addition, ends of dynamic regions are compiled into direct jumps to statically compiled code. Enabling On-demand (Lazy) Specialization Across Arbitrary Control flow Edges It is often desirable to defer specialization of arbitrary program parts until those parts are guaranteed to be executed. The system preferably uses lazy specialization to accomplish this objective. Lazy specialization provides faster specialization, enables aggressive specialization that depends on interleaving execution and specialization, guarantees termination of certain aggressive forms of specialization, and reduces the amount of space occupied by the specialized code. Lazy specialization enables specialization of promoted static variables whose values may only be known when the thread of execution reaches the code awaiting lazy specialization. The value of a promoted static variable at a given program point may change many times during execution of the program. Lazy specialization can produce optimized code for each of these values and also enables deferred specialization of parts of the code that may not be executed, thereby reducing both the amount of specialized code produced and the overhead of producing it. In particular, it enables an aggressive form of loop specialization, termed complete loop unrolling, wherein each iteration of the loop is specialized on demand, as it is about to execute. In the absence of lazy behavior, specialization by complete loop-unrolling is not guaranteed to terminate. In most prior art systems, specialization akin to lazy specialization as used in the present invention is restricted to specializing entire functions on different values of function arguments. However, as is done in a preferred embodiment of the present invention, it is often beneficial to lazily specialize parts of a program smaller than a function. Such parts may include a rarely taken side of a conditional branch, separate iterations of a loop, or a fragment of code within a function that uses a static variable. Some prior art systems do allow a limited set of optimizations other than specialization to be performed lazily on arbitrary parts of programs. However, when these systems resume optimization after the deferral, they cannot use optimization information computed before the deferral, produce optimized code for different instances of this optimization information, or defer optimization again when optimizing the initial deferred part of the program. Without these capabilities, these systems cannot add to their repertoire of optimizations any specialization that specializes effectively across a deferral, specializes on promoted static variables, or provides complete loop-unrolling that is guaranteed to terminate. As noted above, DyC supports specialization of arbitrary program fragments, not just of entire functions. Further, because the system has a mechanism to propagate information across deferrals (called the context of specialization), it can both propagate optimization information across deferrals and specialize on promoted static variables (the many values of which correspond to different contexts). Finally, the system is capable of deferring specialization while specializing code that is already deferred, thereby enabling complete loop unrolling. This capability is referred to as nested laziness. DyC's Implementation of Lazy Specialization Given a CFG representation of a program, and a particular edge in the CFG that connects two basic blocks, the system provides a mechanism for eagerly specializing the source block of the edge, while deferring the specialization of the destination block until the specialized version of the source block and the edge are executed (i.e., a mechanism for lazy specialization). The system replaces the edge with code that generates, immediately after the source block is specialized, i.e., with a stub. The stub, when executed, gathers values of variables that comprise the context of specialization of the destination block and invokes a routine to specialize the destination block with respect to the context. If possible, the stub patches in a direct branch from the specialized version of the source block to the specialized version of the destination block, so that in future traversals of this edge, the stub may be bypassed. Finally, the stub jumps to the newly specialized version of the destination block. In more detail, consider an edge E.sub.12 that connects two basic blocks, B.sub.1 and B.sub.2, as shown in FIG. 14. Suppose both B.sub.1 and B.sub.2 are to be specialized, but specialization will be suspended across E.sub.12, thereby making E.sub.12 a lazy edge. In other words, B.sub.2 will not be specialized until both the specialized version of B.sub.1 has been executed and the execution seeks to continue across edge E.sub.12. FIG. 15 illustrates the logic used by the system to transform the sub-graph shown in FIG. 14 into a sub-graph, which, when executed, achieves suspended specialization across edge E.sub.12. The logic begins in a block 500, which invokes a procedure to generate a specialized version of B.sub.1 called version B.sub.1 ', based on the specialization principles discussed above. The execution of B.sub.1 ' computes values that constitute the context that will be used to specialize block B.sub.2. The code to construct this contact is generated, as noted in a block 502. In a block 504, the system patches the branch at the end of B.sub.1 ' to point to the beginning of the context-gathering code that was generated in block 502. The code generated in block 502 is the beginning of the stub that will be executed if the execution of B.sub.1 ' is followed by a traversal of edge E.sub.12. The next part of the stub is generated in a block 506, and comprises code that invokes the specialization mechanism to specialize B.sub.2 with respect to the context just constructed. This specialized version of B.sub.2 is called B.sub.2 '. Note that if B.sub.2 has a lazy edge emanating from it (i.e., it has nested laziness), the mechanism in FIG. 15 would suspend specialization across that edge as well. Specialization of deferred code can thus itself be lazy. The logic next proceeds to a decision block 508, which determines whether edge E.sub.12 is one-time lazy. If the context for specializing B.sub.2 will be unchanged on all subsequent executions of B.sub.1 ', edge E.sub.12 is termed one-time lazy, and the system can patch the stub so that it will not be executed during future traversals of edge E.sub.12. If the answer to decision block 508 is yes, the logic flows to a block 510, which generates code to patch the branch at the end of B.sub.1 ' to point to the beginning of B.sub.2 ', thus bypassing the stub in future traversals of edge E.sub.2. If the answer is no, block 510 is skipped, and the logic proceeds to a block 512, which generates a jump to B.sub.2 ', so that B.sub.2 ' is to be executed after the stub is executed. The logic also proceeds to block 512 after completion of block 510, if applicable. The process is completed in a block 514, which specializes any other blocks than can be specialized with B.sub.1 without deferral. The code example in FIG. 16 illustrates an implementation of the lazy specialization process described above. In particular, this example shows how stubs are generated to achieve suspension of specialization, and how these stubs are replaced by specialized basic blocks once specialization resumes. In FIG. 16, an exemplary function, foo(x,y), has a CFG comprising three basic blocks B11, B12, and B13, connected by edges 600 and 602. It is assumed that a programmer specifies (e.g. by the make_static annotation shown) that the body of the function will be specialized with respect to a variable x. Furthermore, it is assumed that block B3 will be infrequently executed, and that the programmer has chosen to specialize the if(y) branch lazily (using the "lazy" policy in the annotation). The effect of specializing a branch lazily is to suspend specialization across any edge of the branch that has not been traversed at least once at the time of specialization. Therefore, the first time foo() is invoked, specialization will be suspended across edges 600 and 602. FIG. 17 illustrates the stubs (labeled stub-B12 and stub-B13) generated by the system to effect the suspension of specialization; the stubs are respectively connected to a specialized version of block B11, labeled B11', by edges 604 and 606. Now suppose that the specialized code in FIG. 17 is executed. Suppose also that the common case holds, and edge 604 is traversed, causing stub-B12 to be executed, leading to the creation (and execution) of a specialized version of B12 (labeled B12'). Stub B12' takes the place of stub-B12. As shown in FIG. 18, in future calls to foo() (if the common case holds), execution will traverse an edge 608, causing the specialized code B12' to be executed without entering stub-B12. Finally, suppose the uncommon case (traversal of edge 606) eventually holds, causing stub-B13 to be executed. A specialized version of block B13, called B13', will then be created and executed. As shown in FIG. 19, stub-B13 will be replaced by block B13', creating a new edge 610, so that subsequent instances of the uncommon case will cause execution to directly flow to block B13'. At the end of this process, foo() will have been completely specialized, with no specialization deferred further along either branch edge. The principles presented above regarding lazy specialization can be applied to other program conditions. For instance, the notion of a value of a variable (on which specialization takes place) can be broadened to include class, type, or similar abstract properties of the variable. The context information gathered and passed can have any of these kinds of information. Furthermore, the run-time optimizations performed need not be restricted to specialization. Any run-time optimization that benefits from deferring optimization across certain branches can benefit from these principles. For these optimizations, the system provides the benefits of lazy optimization at arbitrary program points, of propagating optimization information across deferrals (through the context, which in this case would contain the particular optimization information), and of nested laziness. Conditional Specialization It is often the case that it is unknown at compile time whether it would be advantageous to apply specialization at run time. In order to overcome this problem, the system provides a method for implementing conditional specialization, whereby a programmer can have code portions dynamically specialized based on arbitrary compile-time and run-time information. The method preferably uses extensions to the C programming language to specify the conditions on which specialization should depend, thereby enabling a programmer to decide when it is profitable to specialize. The two opposing factors that usually determine whether | ||||||
