Method for optimizing array bounds checks in programs6343375Abstract A method and several variants for optimizing the detection of out of bounds array references in computer programs are described, while preserving the semantics of the computer program. Depending on the variant implemented, the program is divided at run-time or compile-time into two or more regions. The regions are differentiated by the number of checks that need to be performed at run-time on the array accesses within the region. In particular, some regions of the program will not need any array bounds checks performed at run-time, which will increase the speed at which the computer program executes. As well, the state of program variables at the time any out of bounds access is detected is the same as the state of the program variables would have been had the transformation not been performed. Moreover, the regions not needing any checks at run-time will be known at compile-time, enabling further compiler optimizations on the region. The variants of the method are distinguished by the number of regions created, the number of checks needed at run-time, and the size of the program that results from the optimization. Claims Having thus described our invention, what we claim as new and desire to secure by Letters Patent is as follows: Description BACKGROUND OF THE INVENTION
concise form explicit-references form
do i=l.sub.i,u.sub.i
A.sub.1 [.sigma..sub.1 ]
do i=l.sub.i,u.sub.i A.sub.2 [.sigma..sub.2 ]
B(i) .
.
.
end do
A.sub..rho. [.sigma..sub..rho. ]
end do
We define .chi..sub.j [i] as the outcome of reference A.sub.j [.sigma..sub.j ] in iteration i. An outcome represents the result of executing the reference. The possible values for .chi..sub.j [i] are given in FIG. 2. We can express the outcome .chi..sub.j [i] as a function of the reference A.sub.j [.sigma..sub.j ] and the value of the loop index i: .chi..sub.j [i]=.gamma.(A.sub.j [.sigma..sub.j ],i). (2) We represent by .chi.[i] the vector describing the .rho. outcomes for iteration i. Also, we represent by X[l.sub.i :u.sub.i ] the matrix of all outcomes for all iterations of loop L(i,l.sub.i,u.sub.i,B(i)). X[i]=.chi.[i] and X[i][j]=.chi..sub.j [i]. The structure of X is shown in FIG. 7. Again, note that it is valid to use a function .gamma.(A.sub.j [.sigma..sub.j ],i) that computes an estimate .chi..sub.j [i] of the outcome as long as .tau..sub.min (.chi..sub.j [i]).gtoreq..tau..sub.min (.chi.j[i]) We want to identify any violations (null pointers or out of bounds) that happen during execution of reference A.sub.j [.sigma..sub.j ]. Therefore, before each reference A.sub.j [.sigma..sub.j ] in iteration i, we need to perform a test .tau..sub.j [i] that identifies the appropriate violation. The actual test to be performed before an array reference A.sub.j [.sigma..sub.j ] in iteration i can be expressed as a function of the outcomes of array references, the value of j for the particular reference, and the loop index i: .tau..sub.j [i]=.zeta.(X,j,i) (3) where .zeta.(X,j,i) can be any function satisfying .zeta.(X,j,i).gtoreq..tau..sub.min (.chi..sub.j [i]). (4) In particular, the choice of test for one reference can depend on the outcome of another reference, as in: ##EQU5## In this case, for each iteration, either an all tests is performed before each array reference, or no tests are performed. We represent by .tau.[i] the vector describing the .rho. tests that have to be performed for iteration i. We represent by T[l.sub.i :u.sub.i ] the matrix of all tests for all iterations of loop L(i,l.sub.i,u.sub.i,B(i)). T[i]=.tau.[i] and T[i][j]=.tau..sub.j [i]. The structure of T is shown in FIG. 8. The computation of .tau..sub.j [i] for all j and all i can be performed by a function Z that operates on matrix X producing matrix T (the function Z(X) can be seen as a matrix form of .zeta.(X,j,i)): T[l.sub.i :u.sub.i ]=Z(X[l.sub.i :u.sub.i ]). (6) We define B.sub..tau.[i] (i) as a version of body B(i) that performs the tests described by .tau.[i]. B.sub..tau.[i] (i) can be constructed by the transformation: ##EQU6## The loop L(i,l.sub.i,u.sub.i,B(i)) can be transformed into a loop L(i,l.sub.i,u.sub.i,B.sub.96 [i] (i)) which performs the tests to detect the violations: ##STR1## The transformed loop L(i,l.sub.i,u.sub.i,B.sub..tau.[i] (i)) can be generated dynamically or statically. Using dynamic code generation, the appropriate versions of the loop body B.sub..tau.[i] (i) can be generated before each iteration. (Versions can be saved and reused.) Using static code generation techniques, the transformed loop 901 in FIG. 9 can be implemented as shown in 905. For reasons of performance, it is appropriate to group together iterations of the iteration space that have similar characteristics. A region of an iteration space is defined as a sequence of consecutive iterations of a loop that perform the same tests. That is, if iterations i.sub.1 and i.sub.2 belong to the same region, then .tau.[i.sub.1 ]=.tau.[i.sub.2 ]. The iteration space of a loop can be divided into n regions, represented by a vector {character pullout}[1:n] of regions, with each region identified by a triple: {character pullout}[.delta.]=({character pullout}[.delta.].l,{character pullout}[.delta.].u,{character pullout}[.delta.]..tau.),.delta.=1, . . . , n (7) where {character pullout}[.delta.].l is the first iteration of region .delta., {character pullout}[.delta.].u is the last iteration of region .delta. and {character pullout}[.delta.]..tau. is the vector of tests for all iterations in region .delta.. The structure of vector {character pullout}[1:n] is shown in FIG. 10. Using this partitioning of the iteration space into regions, the loop L(i, l.sub.i, u.sub.i, B(i)) can be transformed as follows: ##EQU7## A partitioning can have any number of empty regions. A region {character pullout}[.delta.] is empty if {character pullout}[.delta.].l>{character pullout}[.delta.].u. Given a partitioning vector with empty regions, we can form an equivalent partitioning vector with only nonempty regions by removing the empty regions from the first vector. From now on, we consider only partitioning vectors for which every region is nonempty. For a partitioning vector {character pullout}[1:n] to be valid, the following conditions must hold: 1. The first iteration of region {character pullout}[.delta.+1] must be the iteration immediately succeeding the last iteration of region {character pullout}[.delta.] in execution order, for .delta.=1, . . . , n-1. We can express this requirement as: {character pullout}[.delta.+1].l={character pullout}[.delta.].u+1. (8) 2. The first iteration of {character pullout}[1] must be the first iteration, in execution order, of the iteration space. We can express this requirement as: {character pullout}[1].l=l.sub.i. (9) 3. The last iteration of {character pullout}[n] must be the last iteration, in execution order, of the iteration space. We can express this requirement as: {character pullout}[n].u=u.sub.i. (10) 4. The test for array reference A.sub.j [.sigma..sub.j ] in region {character pullout}[.sigma.], as defined by the test vector {character pullout}[.delta.]..tau., has to be greater than or equal to the minimum test for each outcome of reference A.sub.j [.sigma..sub.j ] in region {character pullout}[.delta.]. We can express this requirement as: ##EQU8## Note that there are many legal partitionings of an iteration space (infinite, if empty regions are used). In particular, given a valid partitioning vector {character pullout}[1:n], it is legal to move the first iteration of a region {character pullout}[.delta.] to the preceding region {character pullout}[.delta.-1] provided that: 1. Region 1{character pullout}[.delta.-1] exists. That is, .delta.=2, . . . , n. 2. The test vector {character pullout}[.delta.-1]..tau. is greater than or equal to the test vector {character pullout}[.delta.]..tau.. That is, {character pullout}[.delta.-1]..tau..sub.j.gtoreq.{character pullout}[.delta.]..tau..sub.j, for j=1, . . . , 92 . Conversely, given a valid partitioning vector {character pullout}[1:n], it is legal to move the last iteration of a region {character pullout}[.delta.] to the succeeding region {character pullout}[.delta.+1] provided that: 1. Region {character pullout}[.delta.+1] exists. That is, .delta.=1, . . . , n-1. 2. The test vector {character pullout}[.delta.+1]..tau. is greater than or equal to the test vector {character pullout}[.delta.]..tau.. That is, {character pullout}[.delta.]..tau..sub.j.gtoreq.{character pullout}[.delta.]..tau..sub.j, for j=1, . . . , .sigma.. These rules can be applied repeatedly to allow multiple iterations to be moved from one region to another. The partitioning of most interest is what we call the minimum partitioning, which has the minimum number of regions. This partitioning is obtained when all the nonempty regions {character pullout}[.delta.] are maximal in size. A region {character pullout}[.delta.]=({character pullout}[.delta.].l,{character pullout}[.delta.].u,{character pullout}[.delta.]..tau.) is maximal in size if .tau.[{character pullout}[.delta.].l-1].noteq.{character pullout}[.delta.]..tau. and .tau.[{character pullout}[.delta.].u+1].noteq.{character pullout}[.delta.]..tau.. That is, if the region can not be enlarged by adding preceding or succeeding iteration, then the region is maximal in size. Note that .tau.[l.sub.i -1] and .tau.[u.sub.i +1] are undefined and therefore different from any {character pullout}[.delta.]..tau.. The minimum partitioning may contain regions that could be shown to be empty at run-time or with extensive compile-time analysis, but cannot be shown empty at compile-time or with naive compile-time analysis. The vector of tests for array references in a loop execution (represented by matrix T) is defined by function .xi.(X,j,i). The matrix T defines which versions of the loop body need to be generated. Depending on when code generation occurs, different levels of information are available about the spaces which form the domain and range of the function .xi.(X,j,i) for a particular execution of the loop. The .xi.(X,j,i) function is a mapping from the space of outcomes(X), array references, and the loop iteration space into the space of tests (T). Of the inputs, or domain, only the space of array references can generally be precisely known from a naive examination of the source code of the program. Without further analysis of the program, the space of the vector of outcomes is assumed to contain all possible vectors of outcomes, and the integer space forming the loop iteration space is bounded only by language limits on the iteration space. Standard analytic techniques such as constant propagation (S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers, 1997) and range analysis (W. Blum and R. Eigenmann, "Symbolic range propagation", Proceedings of the 9.sup.th International Parallel Processing Symposium, April 1995) can be used to more precisely compute the iteration space that partially forms the domain of the .xi.(X,j,i) function. Analysis may also show that some vectors of outcomes are not possible. For example, if the loop lower bound is known, the array lower bound is known, and all terms in array references that do not depend on the loop control variable are known, it may be possible to show, by examining the source code, that no lower bounds violations will be in the vector of outcomes. More aggressive symbolic analysis can be used to show that no consistent solution exists for some combinations of outcomes. For example, it may be possible to show that a lower bound violation of A.sub.1 and an upper bound violation of A.sub.2 are inconsistent for the same value of the loop index variable i. When more complete information is available to the algorithms computing regions, more precise region generation (closer to a minimum partitioning) is possible. More information can be made available by delaying the computation of regions until run-time (using the inspector-executor and other techniques described below) or until load-time (using dynamic or just-in-time compilation) (J. Auslander, M. Philiopose, C. Change, S. Eggers, and B. Bershad, "Fast effective dynamic compilation", Procedings of the ACM SIGPLAN '96 Conference on Programming Language Design and Implementation, pp. 149-159, Philadelphia, Pa., 21-24 May 1996). The following algorithm (procedure inspector) shows the implementation of an inspector that computes the minimum partitioning of an iteration space. The inspector collapses the computation of the outcome vector .chi.[i], the test vector .tau.[i], and the actual partitioning. For each reference A.sub.j [.sigma..sub.j ], the inspector verifies the outcome .chi..sub.j [i] of the reference and immediately computes .tau..sub.j [i]. The particular inspector shown implements .xi.(X,j,i)==.tau..sub.min (.chi..sub.j [i]), but any legal .xi.(X,j,i) function could be used. After building the vector .tau. for iteration i, the inspector checks if this .tau. is different from the test vector .tau. of the current region. The inspector either starts a new region or adds iteration i to the current region.
procedure inspector (R,n,l.sub.i,u.sub.i,(Aj[.sigma..sub.j ],j=1,
. . . , .rho.))
n=0
.tau.=undefined
do i=l.sub.i,u.sub.i
do j=1,.rho.
.tau..sub.j =no test
if (A.sub.j =null) .tau..sub.j =null test
elsif (.sigma..sub.j <lo(A.sub.j)).tau..sub.j =lb test
elsif (.sigma..sub.j >up(A.sub.j)).tau..sub.j =ub test
end if
end do
if (.tau.=.tau.) then
R[n].u=i
else
n=n+1
R[n]=(i,i,.tau.)
.tau.=.tau.
end if
end do
end procedure
An inspector to construct the minimum partitioning of an iteration space.
An embodiment of the invention to optimize array reference tests for a loop L(i,l.sub.i,u.sub.i,B(i)) is shown in FIG. 11. Outcomes for each array reference of a loop are computed in 1101 of FIG. 11. Tests for each array reference of a loop are computed in 1102 of FIG. 11. The different versions of the loop body B.sub..tau. (i) are generated in 1103 of FIG. 11 for all possible values of .tau.. Execution of loop L(i,l.sub.i,u.sub.i,B.sub..tau.[i] (i)) in 1104 of FIG. 11 completes the process. Based solely on the list of primitive tests of FIG. 3, there are 5.sup..rho. different versions of a loop body with .rho. array references A.sub.j [.sigma..sub.j ]. We arrive at this number because for each A.sub.j [.sigma..sub.j ] there are five possible tests (FIG. 3): no test, null test, lb test, ub test, and all tests. However, the choice of a function .xi.(X,j,i) naturally limits the test vectors that can be generated. The particular inspector shown never generates an all tests, and therefore only 4.sup..rho. different versions of the loop body are possible. Summarizing, the process of optimizing the array access checks in a loop involves: 1. Using a function .gamma.(A.sub.j [.sigma..sub.j ],i) to compute matrix X where X[i][j]=.chi..sub.j [i] is the outcome of reference (A.sub.j [.sigma..sub.j ] in iteration i (1101 in FIG. 11). 2. Using a function .xi.(X,j,i) to compute vector T where T[i][j]=.tau..sub.j [i] is the test to be performed before reference (A.sub.j [.sigma..sub.j ] in iteration i (1102 in FIG. 11). 3. Generating the necessary versions B.sub..tau. (i) of body B(i) (1103 in FIG. 11). 4. Executing the appropriate version B.sub..tau.[i] (i) in each iteration i of loop L(i,l.sub.i,u.sub.i,B(i)) (1104 in FIG. 11). The process for optimizing array reference tests using regions is shown in FIG. 12. The main differences between this process and that of FIG. 11 are in the computation of regions 1203 and in the execution of loop 1205. The transformation used to execute the loop in 1205 is shown in FIG. 13. The original loop 1301 (shown in more detail in 1311) is transformed into loop 1305 (shown in more detail in 1314). Driver loop 1315 iterates over the regions, and loop 1316 executes the iterations for each region. Execution of 1314 corresponds to performing 1205 in FIG. 12. Again, the loop body versions can be generated either dynamically or statically. Static generation can be performed following the patterns shown in FIG. 14 and FIG. 15, as explained next. Let there be m distinct values for the test vectors {character pullout}[1]..tau., {character pullout}[2]..tau., . . . ,{character pullout}[n]..tau.. We label these test vectors .tau..sup.1,.tau..sup.2, . . . ,.tau..sup.m. We need to generate m versions of the loop body, B.sub..tau..sub..sup.1 (i),B.sub..tau..sub..sup.2 (i), . . . ,B.sub..tau..sub..sup.m (i), one for each possible test vector .tau.. In FIG. 14, m different loops of the form do i={character pullout}[.delta.].l,{character pullout}[.delta.].u B.sub..tau..sub..sup.k (i) end do are instantiated, one for each version of .tau..sup.k, k=1, . . . , m. These loops are shown in 1410, 1415, and 1421. The loops are guarded by their respective if-statements (1409, 1414, and 1420) that select only one loop for execution in each iteration of the driver loop 1408. An alternative is shown in FIG. 15. Again, m different loops of the form do i=l.sup.k (.delta.),u.sup.k (.delta.) B.sub..tau..sub..sup.k (i) end do are instantiated, one for each version of .tau..sup.k, k=1, . . . , m. These loops are shown in 1509, 1512, and 1516. To guarantee that only one of the loops executes in each iteration of the driver loop 1508, we define: ##EQU9## This method of partitioning the iteration space of a loop into regions and implementing each region with code that tests for at least the array violations that occur in that region constitute our method for optimizing array reference tests during the execution of a loop while preserving the semantics of the loop. It is important to note that this method can be applied to each loop in a program or program fragment independently. FIG. 16 illustrates the method as applied to both loops 1602 and 1604 of a doubly nested loop structure 1601. The resulting structure is shown in 1609. {character pullout}.sub.1 is the regions vector for loop 1602, and {character pullout}.sub.2 is the regions vector for loop 1604. Note that the versions S.sup.1 (i.sub.1) and S.sup.2 (i.sub.1) in 1609 (1612 and 1618) are controlled just by {character pullout}.sub.1, while the versions of the loop body B(i.sub.1, i2) (1615) is controlled both by {character pullout}.sub.1 and {character pullout}.sub.2. The actual implementation of 1609 can use any combination of the methods in FIG. 14 and FIG. 15. FIG. 17 shows an implementation using the method of FIG. 14 for both loops. In some situations, it may be possible to directly compute .chi..sub.j [i] and .tau..sub.j [i] for a reference A.sub.j [.sigma..sub.j ] and all values of i. When this is the case, the reference A.sub.j [.sigma..sub.j ] can be dropped from procedure inspector, and we can just make .tau..sub.j =.tau..sub.j [i] in each iteration i. When .tau..sub.j [i] can be directly computed for all references A.sub.j [.sigma..sub.j ] and all values of i, the inspector becomes unnecessary. Stated differently, the role of the inspector is performed by direct computation. Direct Computation of Outcomes Consider an array reference of the form A[.function.(i)] in the body B(i) of loop L(i,l.sub.i,u.sub.i,B(i)). Let the function .function.(i) be either monotonically increasing or monotonically decreasing in the domain i=l.sub.i, . . . , u.sub.i. (We later relax this constraint for some types of functions.) We use the notation lo(A) and up(A) to denote the lower and upper bounds of A respectively. For the method to work in all cases, we define lo(A)=0 and up(A)=-1 when A is null. The safe region of the iteration space of i with respect to an indexing operation A[.function.(i)] is the set of values of i that make .function.(i) fall within the bounds of A. That is, for all i in the safe region, the outcome .chi.[i] of reference A[.function.(i)] is OK. Therefore, in the safe region we want: lo(A).ltoreq..function.(i).ltoreq.up(A), .A-inverted.i .epsilon. safe region. If .function.(i) is monotonically increasing, the safe region is defined by: .left brkt-top..function..sup.-1 (lo(A)).right brkt-top..ltoreq.i.ltoreq..left brkt-bot..function..sup.-1 (up(A)).right brkt-bot.. (14) Conversely, if .function.(i) is monotonically decreasing the safe region is defined by: .left brkt-top..function..sup.-1 (up(A)).right brkt-top..ltoreq.i.ltoreq..left brkt-bot..function..sup.-1 (lo(A)).right brkt-bot.. (15) Equations (14) and (15) define the range of values of i that generate legal array indices for A. However, i must also lie between the loop bounds l.sub.i and u.sub.i. In general, we want: {character pullout}(.function.(.),A).ltoreq.i.ltoreq.{character pullout}(.function.(.),A) where ##EQU10## {character pullout}(.function.(.),A) and {character pullout}(.function.(.),A) are, respectively, the safe lower bound and safe upper bound of the iteration space of i with respect to A[.function.(i)]. If {character pullout}(.function.(.),A)>{character pullout}(.function.(.),A), then the safe region of the iteration space of i with respect to A[.function.(i)] is empty. If {character pullout}(.function.(.),A).ltoreq.{character pullout}(.function.(.),A), then these safe bounds partition the iteration space into three regions: (i) the region from l.sub.i to {character pullout}(.function.(.),A)-1, (ii) the safe region from {character pullout}(.function.(.),A) to {character pullout}(.function.(.),A), and (iii) the region from {character pullout}(.function.(.),A)+1 to u.sub.i. Any one of regions (i), (ii), and (iii) could be empty. We can compute the outcome .chi.[i] of array reference A[.function.(i)] in each of these regions: if .function.(i) is monotonically increasing, ##EQU11## if .function.(i) is monotonically decreasing, ##EQU12## If the body B(i) has .rho. indexing operations on i, of the form A.sub.1 [.function..sub.1 (i)],A.sub.2 [.function..sub.2 (i)], . . . , A.sub..rho. [.function..sub..rho. (i)], we can compute {character pullout}(.function.(.),A.sub.j) and {character pullout}(.function..sub.j (.),A.sub.j) (1.ltoreq.i.ltoreq..rho.) using Equations (16) and (17). From there, we can compute the outcome .chi..sub.j [i] for each reference A.sub.j [.function..sub.j (i)] and iteration i using Equations (18) and (19). Next, we discuss how to actually compute {character pullout}(.function.(.),A) and {character pullout}(.function.(.),A) in the case of some common subscript functions. Linear Subscripts In the particular case of a linear subscript function of the form .function.(i)=ai+b, the inverse function .function..sup.-1 can be easily computed: ##EQU13## Also, the monotonicity of .function.(i) is determined from the value of a: if a>0, then .function.(i) is monotonically increasing, and if a<0, then .function.(i) is monotonically decreasing. Note that the values of a and b need not be known at compile time, since {character pullout}(.function.(.),A) and {character pullout}(.function.(.),A) can be efficiently computed at run-time. Affine Subscripts Consider the d-dimensional loop nest
do i.sub.1 = l.sub.i.sub..sub.1 ,u.sub.i.sub..sub.1
do i.sub.2 = l.sub.i.sub..sub.2 ,u.sub.i.sub..sub.2
. . .
do i.sub.d = l.sub.i.sub..sub.d ,u.sub.i.sub..sub.d
B(i.sub.1,i.sub.2, . . . , i.sub.d)
end do
. . .
end do
end do.
Let there be an array reference A[.function.(i.sub.1,i.sub.2, . . . , i.sub.d)] in the body of B(i.sub.1,i.sub.2, . . . , i.sub.d). Let the subscript be an affine function of the form .function.(i.sub.1,i.sub.2, . . . , i.sub.d)=a.sub.1 i.sub.1 +a.sub.2 i.sub.2 + . . . +a.sub.d i.sub.d +b, where i.sub.1,i.sub.2, . . . , i.sub.d are loop index variables, and a.sub.1,a.sub.2, . . . , a.sub.d, b are loop invariants. At the innermost loop (i.sub.d) the values of i.sub.1,i.sub.2, . . . , i.sub.d-1 are fixed, and .function.(.) can be treated as linear on i.sub.d. Determination of safe bounds for the i.sub.1,i.sub.2, . . . , i.sub.d-1 loops can be done using the inspector/executor method described above. Alternatively, these safe bounds can be approximated. Replacing true safe bounds {character pullout}(.function.(.),A) and {character pullout}(.function.(.),A) by approximated safe bounds {character pullout} and {character pullout} does not introduce any hazards as long as {character pullout}.gtoreq.{character pullout}(.function.(.),A) and {character pullout}.ltoreq.{character pullout}(.function.(.),A). Techniques for approximating the iteration subspace of a loop that accesses some range of an affinely subscripted array axis are described in S. P. Midkiff, "Computing the local iteration set of block-cyclically distributed reference with affine subscripts", Sixth Workshop on Compilers for Parallel Computing, 1996, and K. van Reeuwijk, W. Denissen, H. J. Sips, and E. M. R. M. Paalvast, "An implementation framework for HPF distributed arrays on message-passing parallel computer systems", IEEE Transactions on Parallel and Distributed Systems, 7(9):897-914, September 1996. Constant Subscripts For an array reference A[.function.(i)] where .function.(i)=k (a constant), .function.(i) is neither monotonically increasing nor monotonically decreasing. Nevertheless, we can treat this special case by defining ##EQU14## {character pullout}(k,A)=u. Basically, the safe region for reference A[k] is either the whole iteration space, if k falls within the bounds of A, or empty otherwise. Modulo-function Subscripts Another common form of array reference is A[.function.(i)] where .function.(i)=g(i) mod m+ai+b. In general, this is not a monotonic function. However, we know that the values of .function.(i) are within the range described by ai+b+j, for i=l.sub.i, l.sub.i +1, . . . , u.sub.i and j=0,1, . . . , m-1. We define a function h(i,j)=ai+b+j. Let h.sub.max be the maximum value of h(i,j) in the domain i=l.sub.i, l.sub.i +1, . . . , u.sub.i and j=0,1, . . . , m-1. Let h.sub.min be the minimum value of h(i,j) in the same domain. These extreme values of h(i,j) can be computed using the techniques described by Utpal Banerjee, in "Dependence Analysis", Loop Transformations for Restructuring Compilers, Kluwer Academic Publishers, Boston, Mass., 1997. Then we can define ##EQU15## {character pullout}(.function.(i),A)=u.sub.i. That is, the safe region is the whole iteration space if we can guarantee that .function.(i) is always within the bounds of A, or empty otherwise. Known Subscript Range All the previous functions are particular cases of subscript functions for which we can compute their range of values. If for an array reference A[.function.i)] we know, by some means, that h.sub.min.ltoreq..function.(i).ltoreq..sub.max, then we can define ##EQU16## {character pullout}(.function.(i),A)=u.sub.i. That is, the safe region is the whole iteration space if we can guarantee that .function.(i) is always within the bounds of A, or empty otherwise. If the subscript function is not one of the described cases, then the more general inspector/executor method described above should be used to determine the outcomes of an array reference A[.sigma.]. This method also lifts the restriction on function .function.(.) being monotonically increasing or decreasing. Application of Direct Computation of Outcomes Let there be .rho. array references A.sub.j [.rho..sub.i ] in the body B(i) of a loop L(i,l.sub.i,u.sub.i,B(i)). Consider the case where all .rho. references are of the form A.sub.j [.function..sub.j (i)],1.ltoreq.j.ltoreq..rho. and .function..sub.j (i) is such that we can compute {character pullout}(.function..sub.j (.),A.sub.j) and {character pullout}(.function..sub.j (.),A.sub.j) as described above. Therefore, we can directly compute .chi..sub.j [i] for any j and i. Moreover, there are only three possible values for an outcome .chi..sub.j [i]:(<,OK,>). We number these outcomes 0, 1, and 2, respectively. Note that there are 3.sup..rho. possible values for .chi.[i]. Two arrays L[1:.rho.][0:2] (for lower bounds) and U[1:.rho.][0:2] (for upper bounds) are formed. Element L[i][j] contains the upper bound of loop indices that lead to outcome j at reference i. Element U[i][j] contains the upper bound of loop indices that lead to outcome j at reference i. Elements L[i][j] and U[i][j] can be computed using: ##EQU17## L[j][1]={character pullout}(.function..sub.j (.),A.sub.j), (23) U[j][1]={character pullout}(.function..sub.j (.),A.sub.j), (24) ##EQU18## Outcome types on a particular reference divide the iteration space into three disjoint segments (regions). Therefore, outcome types for all .rho. references divide the iteration space into 3.sup..rho. disjoint regions (3.sup..rho. is the number of possible values for .chi.[i]). Let .xi.(.delta.) denote the j.sup.th digit of the radix-3 representation of .delta.. We can form a vector {character pullout}[1:3.sup..rho. ] of all possible regions, where {character pullout}[.delta.] is the region with outcome .xi..sub.j (.delta.) for reference A.sub.j [.function..sub.j (i)]. The tests performed in this region are the minimum necessary to detect array access violations, according to the outcome. The region {character pullout}[.delta.] can be computed as: ##EQU19## {character pullout}[.delta.]..tau.=(.tau..sub.min (.xi..sub.1 (.delta.)),.tau..sub.min (.xi..sub.2 (.delta.)), . . . ,.tau..sub.min (.xi..sub..rho. (.delta.))). (29) Vector {character pullout} cannot be used directly in 1314 of FIG. 13 because the regions are not necessarily ordered by increasing iteration number. Therefore, we compute vector {character pullout} by sorting {character pullout}[.delta.] on its l field: {character pullout}[1:3.rho.]={character pullout}[1:3.sup..rho. ] sorted in ascending order of l. (30) This vector {character pullout} can now be used in 1314 to execute the loop. The next method is a variant of the method for optimizing array reference tests during the execution of a loop while preserving the semantics of the loop described above in which only the versions of the loop body B(i) that might actually be executed are generated. In particular, versions of the loop body B(i) are generated only for those regions described in the previous section for which it cannot be shown by data flow and symbolic analysis that {character pullout}[.delta.].l>{character pullout}[.delta.].u, 1.ltoreq..delta..ltoreq.3.sup..rho.. We use the same notation developed above. An embodiment of the method is shown in FIG. 18. FIG. 18 shows the process by which the necessary versions of the loop are generated. Computation of .chi..sub.j [i] (shown in 1801) and .tau..sub.j [i] (shown in 1802) for i=l.sub.i, . . . , u.sub.i,j=1, . . . ,.rho. takes place as in FIG. 12. The computation of the regions vector {character pullout}[1:n] (shown in 1803) is done as before, except as noted below. In step 1804, the set S.sub..tau. of tests for each iteration is computed, and in step 1805, a loop body B.sub..tau. (i) is generated for each unique .tau. in set S.sub..tau.. It is the action of steps 1803, 1804 and 1805 that distinguish this variant from the section entitled "Specializing the Method for Loops" disclosed above. In particular, step 1803 is implemented using data flow and symbolic analysis to more accurately determine non-empty regions. In steps 1804 and 1805, the regions so identified are generated. Finally, in step 1806, the transformed version of loop L(i,l.sub.i,u.sub.i,B(i)) is executed. Computing a Safe Region for the Iteration Space We have shown above how to compute the safe bounds of a loop with respect to an array reference A[.function.(i)]. If the body B(i) has .rho. indexing operations on i, of the form A.sub.1 [.function..sub.1 (i)],A.sub.2 [.function..sub.2 (i)], . . . , A.sub..rho. [.function..sub..rho. (i)], we can compute {character pullout}(.function..sub.j (.),A.sub.j) and {character pullout}(.function..sub.j (.),A.sub.j)(1.ltoreq.j.ltoreq..rho.) using Equations (16) and (17) (or one of the other variations described above). The safe range for i, with respect to all references, is between the maximum of all {character pullout}(.function..sub.j (.),A.sub.j) and the minimum of all {character pullout}(.function..sub.j (.),A.sub.j): ##EQU20## where the safe region of the iteration space is defined by i={character pullout}.sup.s,{character pullout}.sup.s +1, . . . , {character pullout}.sup.s. For values of i in this region, we are guaranteed that any array reference A.sub.j [.function..sub.j (i)] will not cause a violation. If {character pullout}.sup.s >{character pullout}.sup.s, then this region is empty. Note that the safe region, as defined by {character pullout}.sup.s and {character pullout}.sup.s, corresponds to region {character pullout}[11 . . . 1.sub.3 ] defined above: {character pullout}.sup.s ={character pullout}[11 . . . 1.sub.3 ].l (33) {character pullout}.sup.s ={character pullout}[11 . . . 1.sub.3 ].u (34) To extract loop bounds that we can use in our methods, we define two operators, .PSI..sub.l and .PSI.u: (l.sup.s,.tau..sup.l)=.PSI..sub.l (l.sub.i,u.sub.i {character pullout}.sup.s,{character pullout}.sup.s,(.function..sub.l (.), . . . ,.function..sub..rho. (.))), (u.sup.s,.tau..sup.u)=.PSI..sub.u (l.sub.i,u.sub.i,{character pullout}.sup.s,{character pullout}.sup.s,(.function..sub.l (.), . . . ,.function..sub..rho. (.))), where l.sup.s is the lower safe bound and u.sup.s is the upper safe bound for the entire iteration space of i. We want to generate l.sup.s and u.sup.s such that they partition the iteration space into three regions: (i) a region from l to l.sup.s -1, (ii) a region from l.sup.s to u.sup.s, and (iii) a region from u.sup.s +1 to u. Region (ii) has the property that no run-time checks are necessary. The operators .PSI..sub.i and .PSI..sub.u also produce two vectors, .tau..sup.l and .tau..sup.u, with .rho. elements each. The element .tau..sup.l (j) defines the type of run-time check that is necessary for reference A.sub.j [.function..sub.j (i)] in region (i). Correspondingly, the element .tau..sup.u (j) defines the type of run-time check necessary for reference A.sub.j [.function..sub.j (i)] in region (iii). We consider the two possible cases for {character pullout}.sup.s and {character pullout}.sup.s separately. {character pullout}.sup.s >{character pullout}.sup.s : In this case, the safe region of the loop is empty. We select l.sup.s =u+1 and u.sup.s =u, thus making region (i) equal to the whole iteration space and the other regions empty. The value of .tau..sup.u is irrelevant, and .tau..sup.l can be selected, for simplicity, to indicate that all checks need to be performed in all references. {character pullout}.sup.s.ltoreq.{character pullout}.sup.s : In this case, there is a nonempty safe region, and we simply make l.sup.s ={character pullout}.sup.s and u.sup.s ={character pullout}.sup.s. From the computation of the safe bounds, we derive that for values of i less than l.sup.s, accesses of the form A[.function.(i)] can cause violations on the lower bound of A, if .function.(i) is monotonically increasing, and on the upper bound of A, if .function.(i) is monotonically decreasing. For values of i greater than u.sup.s accesses of the form A[.function.(i)] can cause violations on the upper bound of A, if .function.(i) is monotonically increasing, and on the lower bound of A, if .function.(i) is monotonically decreasing. The values of .tau..sup.l and .tau..sup.u have to reflect the necessary run-time checks. Summarizing: We define operators .PSI..sub.l and .PSI..sub.u to compute: ##EQU21## If A.sub.j is null in a reference A.sub.j [.function..sub.j (i)], then, according to our definition, lo(A.sub.j)=0 and up(A.sub.j)=-1. This, in turn, makes {character pullout}(.function..sub.j (.),A.sub.j)>{character pullout}(f.sub.j (.),A.sub.j) and consequently {character pullout}.sup.s >{character pullout}.sup.s. Therefore, if there is a nonempty safe region (i.e., {character pullout}.sup.s.ltoreq.{character pullout}.sup.s), then necessarily none of the pointers A.sub.j is null. This safe region needs neither run-time bounds tests nor run-time null pointer tests. In fact, if {character pullout}.sup.s.ltoreq.{character pullout}.sup.s, then null pointer tests are superfluous in the regions preceding and succeeding the safe region. The method to optimize array reference tests works by partitioning the iteration space into regions of three different types: (i) regions that do not require any tests, (ii) regions which require tests as defined by .tau..sup.l, and (iii) regions which require tests as defined by .tau..sup.u. Of particular interest is the partitioning into exactly three regions, as defined by the following vector {character pullout}[1:3]: ##EQU22## where .tau..sup.false is a test vector indicating that no tests are performed (on the .rho. references of the form A.sub.j [.function..sub.j (i)]). Other legal partitioning vectors can be generated by subdividing each of the three regions defined in Equation (39). It is also legal to move the first iteration of {character pullout}[2] to {character pullout}[1] or the last iteration of {character pullout}[2] to {character pullout}[3]. (The moving can be repeated as many times as desired.) Note also that it is perfectly legal to compute .tau..sup.l and .tau..sup.u in any manner that results in test vectors greater than those computed in Equations (37) and (38), respectively. Even .tau..sup.false can be redefined as to include as many tests as desired. However, using exactly the three regions as computed in Equation (39) and .tau..sup.l and .tau..sup.u computed in Equations (37) and (38) has the advantage that the partitioning is minimal for a general loop, and that the tests for each reference are minimal for the corresponding partitioning. An implementation of the method using the generic transformation of FIG. 13 is shown in FIG. 19. Note that only three versions of the loop body are necessary, since {character pullout}[.delta.]. .tau. can only assume three values (.tau..sup.l, .tau..sup.u, and .tau..sup.false) as shown in 1911. For compactness, we use the notation B.sub.l, B.sub.u, and B.sub.false, to denote the three loop body versions B.sub..tau..sub..sup.l , B.sub..tau..sub..sup.u , and B.sub..tau..sub..sup.false , respectively. The explicit implementation of the transformed loop can follow either of the patterns shown in FIG. 14 and FIG. 15. We mentioned that .tau..sup.l and .tau..sup.u can be chosen to be any vector greater than as defined in Equations (37) and (38). In particular, we can make {character pullout}[1]..tau.={character pullout}[3]..tau.=.tau..sup.true, where .tau..sup.true is a test vector indicating that an all tests should be performed before each array reference. This leads to a variation of the method that only requires two versions of the loop body: B.sub..tau..sub..sup.false and B.sub..tau..sub..sup.true , which for compactness, we represent by B.sub.false and B.sub.true, respectively. The transformations representing this variant method are illustrated in FIGS. 20 and 21. FIG. 20 follows the pattern of FIG. 14, while FIG. 21 follows the pattern of FIG. 15. This variation of the method only uses two versions of code for the loop body. Because our method and its variant always generate three regions with specific test vectors, they can be implemented by explicitly instantiating three loops with the appropriate loop bounds as defined by the {character pullout} vector. The implementation of the method using three loop body versions (B.sub.l, B.sub.u, and B.sub.false) is shown in FIG. 22 while the implementation using only two versions (B.sub.true and B.sub.false) is shown in FIG. 23. In FIG. 22, loop 2206 implements region {character pullout}[1] of 2201, loop 2209 implements region {character pullout}[2] of 2201, and loop 2212 implements region {character pullout}[3] of 2201. Correspondingly, in FIG. 23, loop 2306 implements region {character pullout}[1] of 2301, loop 2309 implements region {character pullout}[2] of 2301, and loop 2312 implements region {character pullout}[3] of 2301. Applying the Optimizations to Loop Nests The methods described above can be applied to each and every loop in a program. In particular, they can be applied recursively to all loops in a loop nest. The order of application does not affect the final result. In our discussion we will sometimes show the application of the methods from outermost loop to innermost loop order. At other times, we will show the application of the methods from innermost to outermost loop order. In FIG. 24, a doubly nested original loop is shown in 2401. The body of the outer i.sub.1 loop (2402) consists of a (possibly empty) section of straight-line code S.sup.1 (i.sub.1) (2403), followed by an inner i.sub.2 loop (2404), followed by a (possibly empty) section of straight-line code S.sup.2 (i.sub.1) (2407). The body of the inner loop B(i.sub.1,i.sub.2) (2405) will, in general, contain references to both loop index variables, i.sub.1 and i.sub.2. We first apply the transformation to the outer i.sub.1 loop, resulting in the structure shown in 2409. {character pullout}.sub.1 is the partitioning vector for the i.sub.1 loop. The three test vectors for regions {character pullout}.sub.1 [1], {character pullout}.sub.1 [2], and {character pullout}.sub.1 [3] are .tau..sub.1.sup.l, .tau..sub.1.sup.false, and .tau..sub.1.sup.u, respectively. S.sub.R.sub..sub.1 .sub.[.delta.]..tau. (i.sub.1).sup.1, S.sub.R.sub..sub.1 .sub.[.delta.]..tau. (i.sub.1).sup.2, and B.sub.R.sub..sub.1 .sub.[.delta..sub..sub.1 .sub.]..tau. (i.sub.1,i.sub.2) represent S.sup.1 (i.sub.1), S.sup.2 (i.sub.1), and B(i.sub.1,i.sub.2), respectively, with the array reference tests indicated by {character pullout}.sub.1 [.delta..sub.1 ]..tau.. We then apply the method for optimizing array reference tests within an arbitrary loop structure to the inner loop i.sub.2, resulting in the structure shown in 2419. {character pullout}.sub.2 is the partitioning vector for the i.sub.2 loop. The three test vectors for regions {character pullout}.sub.2 [1], {character pullout}.sub.2 [2], and {character pullout}.sub.2 [3] are .tau..sub.2.sup.l, .tau..sub.2.sup.false, and .tau..sub.2.sup.u, respectively. B.sub.R.sub..sub.1 .sub.[.delta..sub..sub.1 .sub.]..tau..R.sub..sub.2 .sub.[.delta..sub..sub.2 .sub.]..tau. (i.sub.1,i.sub.2) represents B(i.sub.1,i.sub.2) with the array reference tests indicated by {character pullout}.sub.1 [.delta..sub.1 ]..tau. and {character pullout}.sub.2 [.delta..sub.2 ]..tau.. The actual implementation of 2419 can follow either of the patterns shown in FIG. 14 or FIG. 15. Note that this recursive application of the method for optimizing array reference tests within an arbitrary loop structure generates three versions of S.sup.1 (i.sub.1) and S.sup.2 (i.sub.1) (one for each value of {character pullout}.sub.1 [.delta..sub.1 ]..tau. .epsilon. (.tau..sub.1.sup.l,.tau..sub.1.sup.u,.tau..sub.1.sup.false)) and nine versions of B(i.sub.1,i.sub.2) (one for each combination of {character pullout}.sub.1 [.delta..sub.1 ]..tau. .epsilon. (.tau..sub.1.sup.l,.tau..sub.1.sup.u,.tau..sub.1.sup.false) and {character pullout}.sub.2 [.delta..sub.2 ]..tau..epsilon.(.tau..sub.2.sup.l,.tau..sub.2.sup.u,.tau..sub.2.sup. falsel). In general, for a d-dimensional loop nest, 3.sup.d versions of the innermost loop body B(i.sub.1,i.sub.2, . . . ,i.sub.d) are generated by the recursive application of the method for optimnizing array reference tests within an arbitrary loop structure. FIG. 25 shows the first step in the application of a variation of the method for optimizing array reference tests within an arbitrary loop structure which only uses two versions of code for the loop being optimized. In FIG. 25, this variation is applied to a doubly nested loop 2501. This doubly nested loop has the same structure as 2401. In this particular case we are using the implementation of the method shown in FIG. 20. In this first step, the transformation is applied to the outer i.sub.1 loop 2502, resulting on the structure shown in 2509. {character pullout}.sub.1 is the partitioning vector for the i.sub.1 loop. The three test vectors for regions {character pullout}.sub.1 [1], {character pullout}.sub.1 [2], and {character pullout}.sub.1 [3] are ,.tau..sub.1.sup.true, .tau..sub.1.sup.false, and .tau..sub.1.sup.true, respectively. S.sub.true.sub..sub.1 .sup.1 (i.sub.1), S.sub.true.sub..sub.1 .sup.2 (i.sub.1) and B.sub.true.sub..sub.1 (i.sub.1,i.sub.2) represent S.sup.1 (i.sub.1), S.sup.2 (i.sub.1), and B(i.sub.1,i.sub.2), respectively, with the array reference tests indicated by .tau..sub.1.sup.true. Correspondingly, S.sub.false.sub..sub.1 .sup.1 (i.sub.1), S.sub.false.sub..sub.1 .sup.2 (i.sub.1), and B.sub.false.sub..sub.1 (i.sub.1,i.sub.2) represent S.sup.1 (i.sub.1), S.sup.2 (i.sub.1), and B(i.sub.1,i.sub.2), respectively, with the array reference tests indicated by .tau..sub.1.sup.false. The second step in the application of this variation of the method to the doubly nested loop is shown in FIG. 26. The result 2509 of the first step is shown again in 2601 for convenience. The resulting structure, after the second step of transformations, is shown in 2622. The inner i.sub.2 loop 2606 is transformed into loop 2627, while the inner i.sub.2 loop 2615 is transformed into loop 2645. For each iteration of the i.sub.1 loop, {character pullout}.sub.2 is the partitioning vector for the i.sub.2 loop. The three test vectors for regions {character pullout}.sub.2 [1], {character pullout}.sub.2 [2], and {character pullout}.sub.2 [3] are .tau..sub.2.sup.true, .tau..sub.2.sup.false, and .tau..sub.2.sup.true, respectively. Note that there are four versions of B(i.sub.1,i.sub.2), one for each combination of {character pullout}.sub.1 [.delta..sub.1 ]..tau..epsilon. (.tau..sub.1.sup.true, .tau..sub.1.sup.false) and {character pullout}.sub.2 [.delta..sub.2 ]..tau..epsilon. (.tau..sub.2.sup.true, .tau..sub.2.sup.false). In general, for a d-dimensional loop nest, 2.sup.d versions of the innermost loop body B(i.sub.1,i.sub.2, . . . ,i.sub.d) are generated by recursive application of this variation of the method for optimizing array reference tests within an arbitrary loop structure. This same method can be applied recursively using the implementation shown in FIG. 21. The first step of this particular case is shown in FIG. 27. The method is applied to a doubly nested loop 2701, which has the same structure as 2401. In the first step, the outer i.sub.1 loop 2702 is transformed into loop 2710, which iterates over the three regions of the iteration space of 2702. Iterations of 2711 are only executed for {character pullout}.sub.1 [.delta..sub.1 ]..tau.=true. Iterations of 2718 are only executed for {character pullout}.sub.1 [.delta..sub.1 ]..tau.=false. This is achieved by setting the loop bounds appropriately: ##EQU23## S.sub.true.sub..sub.1 .sup.1 (i.sub.1), S.sub.true.sub..sub.1 .sup.2 (i.sub.1), B.sub.true.sub..sub.1 (i.sub.1,i.sub.2), S.sub.false.sub..sub.1 .sup.1 (i.sub.1), S.sub.false.sub..sub.1 .sup.2 (i.sub.1), and B.sub.false.sub..sub.1 (i.sub.1,i.sub.2) in FIG. 27 are exaclty the same as in FIG. 25. The result 2709 from the first step of the transformation is replicated in 2801 of FIG. 28 for convenience. FIG. 28 shows the second step of the transformation. The inner i.sub.2 loop 2805 is transformed into loop 2822, while the inner i.sub.2 loop 2812 is transformed into loop 2834. Each of these loops implements one instance of the partitioning vector {character pullout}.sub.2. Once again, the bounds for the i.sub.2 loops have to be set appropriately: ##EQU24## The method for optimizing array reference tests within an arbitrary loop structure and its variations through explicit instantiation of the regions of the loop iteration space can also be applied recursively. Application of the method of FIG. 22 to a doubly nested loop is shown in FIG. 29. The usual original loop is shown in 2901. We can apply the method to the inner i.sub.2 loop 2904 first, which results in the three loops 2912, 2915, and 2918 shown in 2909. Each of these loops implements one region of the iteration space of the original i.sub.2 loop 2904. The method is then applied to the outer i.sub.1 loop 2910, resulting in the three loops 2924, 2937, and 2950. Each of these loops implements one region of the iteration space of the original i.sub.1 loop 2902. Note that, as previously discussed, there are nine versions of loop body B(i.sub.1,i.sub.2), which appear explicitly in 2923. Application of the method of FIG. 23 to a doubly nested loop is shown in FIG. 30. The usual original loop is shown in 3001. We can apply the method to the inner i.sub.2 loop 3004 first, which results in the three loops 3012, 3015, and 3018 shown in 3009. Each of these loops implements one region of the iteration space of the original i.sub.2 loop 3004. The method is then applied to the outer i.sub.1 loop 3010, resulting in the three loops 3024, 3037, and 3050. Each of these loops implements one region of the iteration space of the original i.sub.1 loop 3002. Note that, as previously discussed, there are nine versions of loop body B(i.sub.1,i.sub.2), which appear explicitly in 3023. Selective Application of the Methods to Loop Nests Instead of applying the foregoing methods recursively to all loops in a loop nest, different strategies can be applied. The application of any of these methods to a loop results in a partitioning of the iteration space of the loop into two or three types of regions (depending on the method). One of the types, characterized by the .tau..sup.false test vector, does not need any array reference tests on references indexed by the loop control variable. Since the other types, characterized by the .tau..sup.l, .tau..sup.u, or .tau..sup.true test vectors, contain tests on at least some array references, the benefits of applying the transformation recursively are diminished. On the other hand, by continuing to apply the transformation to the .tau..sup.false regions we can potentially arrive at regions that are free of all array reference tests. For many programs, these regions with no tests are expected to contain most or all of the iterations of the loop nest. Therefore, we propose variants to the foregoing methods in which the transformation is recursively applied always from outer loop to inner loop order. The recursion is applied only to those regions without tests on the loop index variable. Consider first the method for optimizing array reference tests within an arbitrary loop structure as described above. In FIG. 31 we show the transformation applied to the outer i.sub.1 loop 3102 of a doubly nested loop 3101. The resulting structure is shown in 3109, using the implementation of FIG. 14. We show the implementation explicitly in this case because the next step of the transformation will be applied only to loop 3123, which belongs to the region with no tests. 3109 is replicated in 3201 of FIG. 32 for convenience. This figure shows the recursive application of the method to the i.sub.2 loop 3215. This loop is transformed into loop 3245, which contains three instances of the i.sub.2 loop in 3247, 3252, and 3257. Each instance implements one of the three types of regions. The overall structure resulting from the selective recursive application of the method is shown in 3231. Note that only five versions of the loop body B(i.sub.1,i.sub.2) are generated: B.sub.l.sub..sub.1 .sub.,true.sub..sub.2 (i.sub.1 i.sub.2) (which is the same as B.sub.l.sub..sub.1 (i.sub.1,i.sub.2), B.sub.false.sub..sub.1 .sub.,l.sub..sub.2 (i.sub.1,i.sub.2), B.sub.false.sub..sub.1 .sub.,false.sub.2 (i.sub.1,i.sub.2), B.sub.false.sub..sub.1 .sub.,u.sub..sub.2 (i.sub.1,i.sub.2) and B.sub.u.sub..sub.1 .sub.,true.sub..sub.2 (i.sub.1,i.sub.2) (which is the same as B.sub.u.sub..sub.1 (i.sub.1,i.sub.2)). In general, for a d-dimensional loop nest, 2d+1 versions of the innermost loop body B(i.sub.1,i.sub.2, . . . , i.sub.d) are generated by the selective recursive application of the method. (If tests are not specified for a loop with control variable i.sub.j, then all references indexed by i.sub.j must be fully tested.) The exact same operation can be performed using the implementation of FIG. 15. The first step is shown in FIG. 33. The method is applied to the outer i.sub.1 loop 3302 of the double loop nest 3301. This results in structure 3309, replicated in 3401 of FIG. 34 for convenience. The method is applied again to the inner i.sub.2 loop 3412, resulting in the final structure 3425. Selective recursion can also be applied to the variation of the method which only uses two versions of code for the loop being optimized. FIG. 35 shows the method applied to the outer i.sub.1 loop 3502 of a doubly nested loop 3501, using the implementation of FIG. 20. FIG. 35 is identical to FIG. 25. Using selective recursion, the method is applied next only to loop 3523. Structure 3509 is replicated in 3601 of FIG. 36 for convenience. Loop 3615 is transformed into loop 3636, which contains two instances, 3638 and 3643, of the inner i.sub.2 loop. Note that three versions of the loop body B(i.sub.1,i.sub.2) are generated: B.sub.true.sub..sub.1 .sub.,true.sub..sub.2 (i.sub.1,i.sub.2) (which is the same as B.sub.true.sub..sub.1 (i.sub.1,i.sub.2)), B.sub.false.sub..sub.1 .sub.,true.sub..sub.2 (i.sub.1,i.sub.2), and B.sub.false.sub..sub.1 .sub.,false.sub..sub.2 (i.sub.1,i.sub.2). In general, for a d-dimensional loop nest, d+1 versions of the innermost loop body B(i.sub.1, i.sub.2, . . . , i.sub.d) are generated by the selective recursive application of the method. FIGS. 37 and 38 show the selective recursive application of the method using the implementation of FIG. 21. FIG. 37 is identical to FIG. 27. In it, the method is applied to the outer i.sub.1 loop 3702 of a doubly nested loop 3701. This results in structure 3709. Structure 3709 is replicated in 3801 of FIG. 38. The method is then applied to the i.sub.2 loop 3812. Loop 3812 is transformed into loop 3829. Note the same three versions of the loop body B(i.sub.1,i.sub.2) in 3818 as in 3622. Finally, selective recursion can also be used with the method for optimizing array reference tests within an arbitrary loop structure and its variations through explicit instantiation of the regions of the loop iteration space. Selective recursive application of the method of FIG. 22 to a doubly nested loop 3901 is shown in FIG. 39. The method is first applied to the outer i.sub.l loop 3902, which results in the three loops 3910, 3917 and 3924 in 3909. Loop 3917 implements the region with no array reference tests on the index variable i.sub.1, as indicated by the S.sub.false.sub..sub.1 .sup.1 (i.sub.1), B.sub.false.sub..sub.1 (i.sub.1,i.sub.2), S.sub.false.sub..sub.1 .sup.2 (i.sub.1) versions of code. The method is then applied recursively only to i.sub.2 loop 3919, which results in the three loops 3941, 3944, and 3947 in 3931. Note the same five versions of the loop body B(i.sub.1,i.sub.2) as in FIGS. 32 and 34. Selective recursive application of the method of FIG. 23 to a doubly nested loop 4001 is shown in FIG. 40. The method is first applied to the outer i.sub.1 loop 4002 which results in the three loops 4010, 4017, and 4024 in 4009. Loop 4017 implements the region with no array reference tests on the index variable i.sub.1, as indicated by the S.sub.false.sub..sub.1 .sup.1 (i.sub.1), B.sub.false.sub.1 (i.sub.1,i.sub.2), S.sub.false.sub..sub.1 .sup.2 (i.sub.1) versions of code. The method is then applied recursively only to i.sub.2 loop 4019 which results in the three loops 4041, 4044, and 4047 in 4031. Note the same three versions of the loop body B(i.sub.1,i.sub.2) as in FIGS. 36 and 38. Methods for Perfect Loop Nests Consider a d-dimensional perfect loop nest as follows:
do i.sub.l = l.sub.i.sub..sub.1 ,u.sub.i.sub..sub.1
do i.sub.2 = l.sub.i.sub..sub.2 ,u.sub.i.sub..sub.2
. . .
do i.sub.d = l.sub.i.sub..sub.d ,u.sub.i.sub..sub.d
B(i.sub.1,i.sub.2,...,i.sub.d)
end do
. . .
end do
end do
This loop nest defines a d-dimensional iteration space where i.sub.1, i.sub.2, . . . , i.sub.d are the axes of the iteration space. Each iteration, except for the last, has an immediately succeeding iteration in execution order. Conversely, each iteration, except the first, has an immediately preceding iteration in execution order. This d-dimensional iteration space can be partitioned into a sequence of regions defined by a vector {character pullout}[1:n], where {character pullout}[.delta.]=(({character pullout}[.delta.].l.sub.1, {character pullout}[.delta.].l.sub.2, . . . , {character pullout}[.delta.].l.sub.d), {character pullout}[.delta.].u.sub.1, {character pullout}[.delta.].u.sub.2, . . . , {character pullout}[.delta.].u.sub.d), {character pullout}[.delta.]..tau.), {character pullout}[.delta.].l.sub.j =lower bound of loop i.sub.j in region {character pullout}[.delta.], {character pullout}[.delta.].u.sub.j =upper bound of loop i.sub.j in region {character pullout}[.delta.], {character pullout}[.delta.]..tau.=test vector for region {character pullout}[.delta.]. A partitioning can have any number of empty regions, since an empty region does not execute any iterations of the iteration space. Given a generic partitioning vector {character pullout}[1:m] with (m-n) empty regions, we can form an equivalent partitioning vector {character pullout}[1:n] with only nonempty regions by removing the empty regions from {character pullout}[1:m]. From now on, we consider only partitionings where every region {character pullout}[.delta.] is nonempty. For a partitioning vector to be valid, the following conditions must hold: 1. The first iteration of region {character pullout}[.delta.+1] must be the iteration immediately succeeding the last iteration of region {character pullout}[.delta.] in execution order, for .delta.=1, . . . ,n-1. 2. The first iteration of {character pullout}[.delta.] must be the first iteration, in execution order, of the iteration space. 3. The last iteration of {character pullout}[n] must be the last iteration, in execution order, of the iteration space. 4. The test for array reference A.sub.j [.sigma..sub.j ] in region {character pullout}[.delta.], as defined by the test vector element {character pullout}[.delta.]..tau..sub.j, has to be greater than or equal to the minimum test for each outcome of reference A.sub.j [.sigma..sub.j ] in region {character pullout}[.delta.]. We can express this requirement as: {character pullout}[.delta.]..tau..gtoreq..sub.{character pullout}[.delta.] max(.tau..sub.min (.chi..sub.j [i.sub.1,i.sub.2, . . . ,i.sub.d ])).jk (48) where the max is computed over all iterations of region {character pullout}[.delta.] and .chi..sub.j [i.sub.1,i.sub.2, . . . , i.sub.d ] is the (possibly estimated) outcome of reference A.sub.j [.sigma..sub.j ] in iteration point (i.sub.1,i.sub.2, . . . , i.sub.d). Using any valid partitioning, a d-dimensional perfect loop nest can be implemented by a driver loop that iterates over the regions. This is shown in FIG. 41. 4101 is the original perfect loop nest, while 4111 is an implementation of this loop nest with a driver loop 4112 iterating over the regions. The partitioning also works when the body of each loop i.sub.j in a d-dimensional loop nest consists of three sections, each possibly empty: (i) a section of straight line code S.sup.j, followed by (ii) a loop i.sub.j+1, followed by (iii) another section of straight line code S'.sup.j. This structure is shown in 4201 of FIG. 42. The implementation with a driver loop 4218 iterating over the regions is shown in 4217. The execution of each section of straight line code S.sup.j and S'.sup.j is guarded with an if-statement to guarantee that they are executed only at the right times. As a result of the partitioning of the iteration space, the same value for the set (i.sub.1, i.sub.2, . . . , i.sub.j) can occur on consecutive iterations of the driver loop. For correct execution, S.sup.j should only execute for the first occurrence of a particular value of (i.sub.1, i.sup.2, . . . , i.sub.j). This first occurrence corresponds to the execution of a region with {character pullout}[.delta.].l.sub.j+1 =l.sub.i.sub..sub.j+1 , {character pullout}[.delta.].l.sub.j+2 =l.sub.i.sub..sub.j+2 , . . . {character pullout}[.delta.].l.sub.d =l.sub.i.sub..sub.d . Correspondingly, S'.sub.j should only execute for the last occurrence of a particular value of (i.sub.1, i.sub.2, . . . , i.sub.j). This last occurrence corresponds to the execution of a region with {character pullout}[.delta.].u.sub.j+1 =u.sub.i.sub..sub.j+1 , {character pullout}[.delta.].u.sub.j+2 =u.sub.i.sub..sub.j+2 , . . . , {character pullout}[.delta.].u.sub.d =u.sub.i.sub..sub.d . Thus, we arrive at the expressions for the guards used in 4217: ##EQU25## Finally, the test vector {character pullout}[.delta.]..tau. must be applied to the section of straight line code for each region {character pullout}[.delta.]. We denote by S.sup.j.sub.{character pullout}[.delta.]..tau. a version of S.sup.j that performs test {character pullout}[.delta.]..tau..sub.j before reference A.sub.j [.sigma..sub.j ], if this reference occurs in S.sup.j. The same is valid for S'.sup.j.sub.{character pullout}[.delta.]..tau. with respect to S'.sup.j. We are particularly interested in computing a partitioning where the regions can be of two kinds: 1. All array references A.sub.j [.sigma..sub.j ] in the region execute successfully. 2. Some array reference A.sub.j [.sigma..sub.j ] in the region causes a violation. For a region {character pullout}[.delta.] of kind (1), we can define {character pullout}[.delta.]..tau.=false as a test vector that does not test any reference A.sub.j [.sigma..sub.j ]. For a region {character pullout}[.delta.] of kind (2), we can define {character pullout}[.delta.]..tau.=true as a test vector that performs all tests on every array reference A.sub.j [.sigma..sub.j ]. In this case, we only need two versions of B(i.sub.1, i.sub.2, . . . , id): (i) B.sub.true (i.sub.1, i.sub.2, . . . , i.sub.d) performs all tests for all array references, and (ii) B.sub.false (i.sub.1, i.sub.2, . . . , i.sub.d) does not perform any tests. If all references are of the form that allow the safe bounds l.sub.i.sub..sub.j .sup.s and u.sub.i.sub..sub.j .sup.s to be computed for every loop i.sub.j, then the aforementioned partitioning can be computed by the following algorithm:
Procedure for computing the regions of a loop nest.
procedure regions(R,j,(.alpha..sub.1, .alpha..sub.2, ..., .alpha..sub.j-1),
(.omega..sub.1, .omega..sub.2, ..., .omega..sub.j-1), .delta., d, B
S1 if(l.sub.j < l.sub.j.sup.s) then
S2 .delta. = .delta. + 1
S3 R[.delta.] = ((.alpha..sub.1, ..., .alpha..sub.j-1, l.sub.j
l.sub.j+1, ..., l.sub.d), (.omega..sub.1, ..., .omega..sub.j-1,
l.sub.j.sup.s -1, u.sub.j+1, ...,
u.sub.d), true)
S4 endif
S5 if (j = d) then
S6 if (l.sub.d.sup.s .ltoreq. u.sub.d.sup.s) then
S7 .delta. = .delta. + 1
S8 R[.delta.] = ((.alpha..sub.1, ..., .alpha..sub.d-1, l.sub.d.sup.s),
(.omega..sub.1, ..., .omega..sub.d-1, u.sub.d.sup.s), false)
S9 endif
S10 else
S11 do k = l.sub.j.sup.s, u.sub.j.sup.s
S12 regions(R,j+1,(.alpha..sub.1, .alpha..sub.2, ..., .alpha..sub.j-1,
k), (.omega..sub.1, .omega..sub.2, ..., .omega..sub.j-1, k), .delta., d, B
S13 end do
S14 end if
S15 if (u.sub.j > u.sub.j.sup.s) then
S16 .delta. = .delta. + 1
S17 R[.delta.] = ((.alpha..sub.1, .alpha..sub.j-1, u.sub.j.sup.s +1,
l.sub.j+1, ..., l.sub.d), (.omega..sub.1, ..., .omega..sub.j-1, u.sub.j,
u.sub.j+1, ...,
u.sub.d), true)
S18 end if
end procedure
The algorithm in procedure regions ( ) takes seven parameters: 1. The vector {character pullout} that is being computed. 2. The index j indicating that region extends along index variable i.sub.j are being computed. 3. The vector (.alpha..sub.1,.alpha..sub.2, . . . ,.alpha..sub.j-1) where .alpha..sub.k is the lower bound for loop index i.sub.k in the regions to be computed. 4. The vector (.omega..sub.1,.omega..sub.2, . . . ,.omega..sub.j-1), where .omega..sub.k is the upper bound for loop index i.sub.k in the regions to be computed. 5. The count .delta. of the number of regions already computed. 6. The dimensionality d of the loop nest. 7. The vector {character pullout}[1:d], where {character pullout}[j]=(l.sub.j, u.sub.j, l.sub.j.sup.s, u.sub.j.sup.s) contains the full and safe bounds for loop i.sub.j. To compute the entire vector of regions, the invocation regions ({character pullout}, 1, ( ), ( ), .delta.=0, d, {character pullout}) should be performed. The value of .delta. at the end of the computation is the total number of regions in {character pullout}. An important optimization. If, for a particular value of j, l.sub.j.sup.s =l.sub.j and u.sub.j.sup.s =u.sub.j, then the safe region along axis i.sub.j of the iteration space corresponds to the entire extent of the axis. If l.sub.k.sup.s =l.sub.k and u.sub.k.sup.s =u.sub.k for k=j+1, . . . , d, then axis i.sub.j can be partitioned into only three regions: (i) one region from l.sub.j to l.sub.j.sup.s -1, (ii) one region from l.sub.j.sup.s to u.sub.j.sup.s, and (iii) one region from u.sub.j.sup.s +1 to u.sub.j. Each of these regions spans the entire iteration space along axes i.sub.j+1, i.sub.j+2, . . . , i.sub.d. Collapsing multiple regions into a single region reduces the total number of iterations in the driver loop 4218 and, consequently, reduces the run-time overhead of the method. To incorporate this optimization in the computation of regions, procedure regions is modified as follows:
Optimized procedure for computing the regions of a loop nest.
procedure regions(R,j,(.alpha..sub.1, .alpha..sub.2, ..., .alpha..sub.j-1),
(.omega..sub.1, .omega..sub.2, ..., .omega..sub.j-1), .delta., d, B
if(l.sub.j < l.sub.j.sup.s) then
.delta. = .delta. + 1
R[.delta.] = ((.alpha..sub.1, ..., .alpha..sub.j-1, l.sub.j+1, ...,
l.sub.d), (.omega..sub.1, ..., .omega..sub.j-1, l.sub.j.sup.s -1,
u.sub.j+1, ...,
u.sub.d), true)
endif
if (j = d) then
if (l.sub.d.sup.s .ltoreq. u.sub.d.sup.s) then
.delta. = .delta. + 1
R[.delta.] = ((.alpha..sub.1, ..., .alpha..sub.d-1, l.sub.d.sup.s),
(.omega..sub.1, ..., .omega..sub.d-1, u.sub.d.sup.s) false)
endif
elseif (nochecks((l.sub.j+1, l.sub.j+2, ..., l.sub.d), (l.sup.s.sub.j+1,
l.sup.s.sub.j+2, ..., l.sub.d.sup.s), (u.sub.j+1, u.sub.j+2,
..., u.sub.d),
(u.sup.s.sub.j+1, u.sup.s.sub.j+2, ..., u.sub.d.sup.s)) then
.delta. = .delta. + 1
R[.delta.] = ((.alpha..sub.1, ..., .alpha..sub.j-1, l.sub.j.sup.s,
l.sub.j+1, ..., l.sub.d), (.omega..sub.1, ..., .omega..sub.j-1,
u.sub.j.sup.s,
u.sub.j+1, ..., u.sub.d), false)
else
do k + l.sub.j.sup.s, u.sub.j.sup.s
regions(R,j+1,(.alpha..sub.1, .alpha..sub.2, ..., .alpha..sub.j-1, k),
(.omega..sub.1, .omega..sub.2, ..., .omega..sub.j-1, k), .delta., d, B)
end do
end if
if (u.sub.j > j.sub.j.sup.s) then
.delta. = .delta. + 1
R[.delta.] = ((.alpha..sub.1, ..., .alpha..sub.j-1, u.sub.j.sup.s +1,
l.sub.j+1, ..., l.sub.d), (.omega..sub.1, ..., .omega..sub.j-1, u.sub.j,
u.sub.j+1, ..., u.sub.d), true)
end if
end procedure
boolean function no checks((l.sub.l, ..., l.sub.m), (l.sub.l.sup.s, ...,
l.sub.m.sup.s), (u.sub.l, ..., u.sub.m), u.sub.l.sup.s,
..., u.sub.m.sup.s))
if (((l.sub.i = l.sub.i.sup.s)(u.sub.i = u.sub.i.sup.s)).A-inverted.i=1,
..., m) then
return true
else
return false
end if
end function
We discuss two alternatives for the implementation of the method for optimizing array reference tests within a loop nest using static code generation. If the vector {character pullout} is computed as previously described, only two versions of the loop body need to be generated, B.sub.true (i.sub.1, i.sub.2, . . . , i.sub.d) and B.sub.false (i.sub.1, i.sub.2, . . . , i.sub.d). Also, two versions of S.sup.j, S.sup.j.sub.true and S.sup.j.sub.false and two versions of S'.sup.j, S'.sup.j.sub.true and S'.sup.j.sub.false, need to be generated. Using these two versions of the loop body and two versions of each section of straight line code, a d-dimensional loop nest can be transformed as shown in FIGS. 43 and 44. The loop nest without sections of straight line code 4301 is transformed into 4311. The loop nest with sections of straight line code 4401 is transformed into 4417. The transformed code in 4311 and 4417 has two instances of the original loop nest. The bounds for each loop in these loop nests can be computed by ##EQU26## The guard expressions used in the if-statements in 4417 can be expressed in terms of these bounds: ##EQU27## Alternatively, the implementation shown in FIG. 45 can be used. It is semantically equivalent to the method shown in FIG. 44, but it does not require computing the new loop bounds l.sub.j.sup.true (.delta.), u.sub.j.sup.true (.delta.), l.sub.j.sup.false (.delta.), and u.sub.j.sup.false (.delta.) used in FIG. 44. Instead, an if-statement 4519 is used to verify which kind of region is region {character pullout}[.delta.]. If the region requires a test on any array reference, then the loop nest 4520-4534 is executed. If the region does not require any tests on array references, then the loop nest 4536-4550 is executed. Using a Single Region In the extreme case, we can partition the entire iteration space of a d-dimensional loop nest into a single region {character pullout}[1]. This region is defined as: {character pullout}[1].l.sub.j =l.sub.i.sub..sub.j ,for j=1, . . . , d, (57) {character pullout}[1].u.sub.j =u.sub.i.sub..sub.j ,for j=1, . . . , d, (58) ##EQU28## We use {character pullout}[1]..tau.=true to denote that all tests must be performed, and {character pullout}[1]..tau.=false to denote that no tests are to be performed. Using only two versions of the loop body and each section of straight line code, we transform the loop nest 4601 into 4617. Note that, since there is only one region, there is no need for a driver loop in 4617. There are two instances of the loop nest in 4617, and the bounds are computed by ##EQU29## FIG. 47 shows another transformation of the loop nest 4701 into 4717 that implements a partitioning with a single region. A single test 4718 is used in 4717 to verify the kind of region. The appropriate instance of the loop nest is selected based on the test. The value of the flag check is the same as {character pullout}[1]..tau.. If the safe bounds l.sub.i.sub..sub.j .sup.s and u.sub.i.sub..sub.j .sup.s can be computed for every loop i.sub.j, then check can be computed by ##EQU30## In general, check=true if the outcome of any array reference A[.sigma.] in the execution of the loop nest is a violation or is unknown, and check=false if all outcomes of array references in the execution of the loop nest are either successful or not executed. Treating Sequences of Array References with Versions This approach of dynamically selecting from two versions of a loop (one with all tests and another with no tests) can be extended to any sequence of array references. Consider the sequence of .rho. array references (A.sub.1 [.sigma..sub.1 ], A.sub.2 [.sigma..sub.2 ], . . . , A.sub..rho. [.sigma..sub..rho. ]) shown in 4801 of FIG. 48. This sequence of references can be replaced by the code in 4805 which dynamically selects from two versions of the references. The version in lines 4807-4812 performs all tests, while the version in lines 4814-4819 does not perform any tests. The selection is based on the value of check, which can be computed as ##EQU31## In general, the evaluation of lo(A.sub.j).ltoreq..sigma..sub.j and .sigma..sub.j.ltoreq.up(A.sub.j) requires symbolic computation and/or run-time evaluation. For loops in particular, it is necessary to represent the range of values that .sigma..sub.j can take. This representation can be in the form of symbolic lower and upper bounds of the range. This method can be applied to any body of code for which lo(A.sub.j).ltoreq..sigma..sub.j and .sigma..sub.j.ltoreq.up(A.sub.j) can be evaluated. This body of code can be a loop, a section of straight-line code, a whole procedure, or even an entire program or program module. In the worst case, if the value of lo(A.sub.j).ltoreq..sigma..sub.j or .sigma..sub.j.ltoreq.up(A.sub.j) cannot be determined, then a conservative guess has to be made in either comparison, which will cause check to evaluate to true, and the version with all tests to be selected. Note that the selected sequence of references can contain any subset of the actual sequence present in a body of code. Array references left out of the sequence can be treated as a special form of reference that includes tests. These references with tests appear in both versions of code. Speculative Execution Quite often, explicit tests for the validity of array references impose an additional performance penalty by constraining some compiler or programmer optimizations. On some systems, the cost associated with these optimization constraints is actually higher than the cost of performing the tests themselves. The cost of these constraints can be minimized by speculatively executing the loop. In speculative execution, two versions of the code are formed. The first contains tests to determine if an invalid array reference occurs, but does not insist that the violations be detected precisely. This allows many optimizations that depend on code motion to occur in this first version. (See S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers, 1997.) The results of the speculative execution are saved in temporary storage until it is determined that the speculative execution finished without any invalid references occurring. At that point, the results are saved to permanent storage. If an invalid array reference occurs or any other exception is detected, the results in temporary storage are discarded, and the computation is performed again in precise order. Let S be a sequence of code containing one or more array references. The transformation of this code to support speculative execution is shown in FIG. 49, where S is represented by 4901. The transformation proceeds as follows. First, two versions of S are generated, with an imprecisely ordered version S' in 4907-4912, and a precisely ordered version S in 4914-4920. Tests are placed into the precisely checked version as described in previous methods (4914, 4916, and 4919). The placement of tests in the imprecisely ordered version is constrained only by the data flow of the program. That is, a the test cannot be done before the array is created or the subscript value can been computed. These tests can be optimized using the methods discussed in the Background of the Invention. When the tests show that an invalid array reference occurs, a flag is set to true. (See, for example, 4906.) This flag indicates that a problem occurred during the execution of the imprecise version and that the precise version should be run. Because array reference violations are not necessarily being detected at the time they occur (either the computation or the tests may have been moved by optimizations), the effect of the computation cannot be allowed to be observed outside of S. Thus, each reference to an array A.sub.j possibly written to in S' is replaced by a reference to a corresponding auxiliary array A'.sub.j (lines 4907, 4909, 4912). An array A'.sub.j is initialized to the value of A.sub.j before execution of S' begins. If flag is set to true at the end of execution of S', then a violation occurred, and the computation represented by S is rerun with precise ordering in S (4914-4920). The A'.sub.j arrays are ignored. If the flag is set to false, then no array reference violations occurred, and it is necessary to copy the results in the various auxiliary A.sub.j ' arrays to their permanent locations. Lines 4922 through Lines 4925 are added to accomplish this. We note that standard techniques can be used to reduce the number of elements that are actually copied between A.sub.j ' and A.sub.j. At the end of execution of 4905, independent of the execution path taken, the A.sub.j ' auxiliary arrays associated with the A.sub.j arrays can be explicitly freed or garbage collected, depending on the functionality of the underlying run-time system. While the invention has been described in terms of several preferred embodiments, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.
|
Same subclass | ||||||||||
