Method and system for generating compact code for the loop unrolling transformation6035125Abstract A loop unrolling trasformation specified by loop unrolling factors UF[1], . . . , UF[k] is performed on a perfect nest of k multiple loops to produce an unrolled loop representation as follows. Moving from the outermost loop to the innermost loop of the nest, the unroll factor UF[j] of the current loop is examined. First, the separate unrolled loop body is expanded by the specified unroll factor UF[j]. Second, the loop header for the current loop is adjusted so that if the loop's iteration count, COUNT[j], is known to be less than or equal to the unroll factor, UP[j], then the loop header is simply an assignment of the index variable to the lower-bound expression; otherwise, the loop header is adjusted so that the unrolled loop's iteration count equals .left brkt-bot.COUNT[J]/UF[J].right brkt-bot. a rounded down truncation of the division. Third, a remainder loop nest is generated, if needed. The size of the generated code when unrolling multiple nested loops is substantially reduced. The proportion of the object code comprising lower execution frequency remainder loops is also substantially reduced. The compile-time of unrolled multiple nested loops is also substantially reduced. Claims We claim: Description A portion of the Disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
TABLE A
______________________________________
do l = 1, n
do k = 1, n
do j = 1, n
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l) + c(i,k,l)
end do
end do
end do
end do
______________________________________
Applying the unroll transformation of the present invention to the perfect nest of loops of Table A with unroll factors UF[1]=UF[2]=UF[3]=4 produces the unrolled loop of Table B containing 64 copies of the code from the original loop body (UF[1].times. . . . .times. UF[3]=4.times.4.times.4=64) and also containing 21 remainder loops, each containing a single copy of the code from the original loop body ((UF[1].times. . . . .times.UF[k-1])+(UF[1].times. . . . .times.UF[k-2])+ . . . +(UF[1].times.UF[2])+(UF[1])+1=(UF[1].times.UF[2])+(UF[1])+1=16+4+1=21). To understand how the loop unrolling of the present invention can improve register locality, notice that the original loop in Table A has no register locality for arrays a, b, and c as the original loop performs three load operations per iteration. However, the main unrolled loop body in Table B has an abundant amount of register locality, e.g., the value of a(i,j,k) can be loaded once and reused in a register in the first four statements of the unrolled loop body. In fact, of the 64*3=192 references to arrays a, b, and c in the main unrolled loop body, only 48 contain distinct values. If the target processor has sufficient registers to hold these 48 array elements, then the main unrolled loop would only perform 48 load operations. However, the original loop would have performed 192 loads for 64 iterations. Thus, the practice of the present invention reduces the number of load operations performed by a factor of four in this example. If the program semantics permit reassociation of addition (+) operations, loop unrolling can also help reveal more instruction-level parallelism in this example. For example, a partial sum can be computed for the first 32 statements on one functional unit, another partial sum can be computed in parallel for the remaining 32 statements on another functional unit, and the two partial sums could be combined by a single add operation at the end.
TABLE B
______________________________________
do l=1,n - 3,4
do k=1,n - 3,4
do j=1,n - 3,4
do i=1,n,1
sum = sum + a(i,j,k) + b(i,j,l) + c(i,k,l)
sum = sum + a(i,j,k) + b(i,j,l + 1) + c(i,k,l + 1)
sum = sum + a(i,j,k) + b(i,j,l + 2) + c(i,k,l + 2)
sum = sum + a(i,j,k) + b(i,j,l + 3) + c(i,k,l + 3)
sum = sum + a(i,j,k + 1) + b(i,j,l) + c(i,k + 1,l)
sum = sum + a(i,j,k + 1) + b(i,j,l + 1) + c(i,k + 1,l + 1)
sum = sum + a(i,j,k + 1) + b(i,j,l + 2) + c(i,k + 1,l + 2)
sum = sum + a(i,j,k + 1) + b(i,j,l + 3) + c(i,k + 1,l + 3)
sum = sum + a(i,j,k + 2) + b(i,j,l) + c(i,k + 2,l)
sum = sum + a(i,j,k + 2) + b(i,j,l + 1) + c(i,k + 2,l + 1)
sum = sum + a(i,j,k + 2) + b(i,j,l + 2) + c(i,k + 2,l + 2)
sum = sum + a(i,j,k + 2) + b(i,j,l + 3) + c(i,k + 2,l + 3)
sum = sum + a(i,j,k + 3) + b(i,j,l) + c(i,k + 3,l)
sum = sum + a(i,j,k + 3) + b(i,j,l + 1) + c(i,k + 3,l + 1)
sum = sum + a(i,j,k + 3) + b(i,j,l + 2) + c(i,k + 3,l + 2)
sum = sum + a(i,j,k + 3) + b(i,j,l + 3) + c(i,k + 3,l + 3)
sum = sum + a(i,j + 1,k) + b(i,j + 1,l) + c(i,k,l)
sum = sum + a(i,j + 1,k) + b(i,j + 1,l + 1) + c(i,k,l + 1)
sum = sum + a(i,j + 1,k) + b(i,j + 1,l + 2) + c(i,k,l + 2)
sum = sum + a(i,j + 1,k) + b(i,j + 1,l + 3) + c(i,k,l + 3)
sum = sum + a(i,j + 1,k + 1) + b(i,j + 1,l) + c(i,k + 1,l)
sum = sum + a(i,j + 1,k + 1) + b(i,j + 1,l + 1) + c(i,k + i,l + 1)
sum = sum + a(i,j + 1,k + 1) + b(i,j + 1,l + 2) + c(i,k + i,l + 2)
sum = sum + a(i,j + 1,k + 1) + b(i,j + 1,l + 3) + c(i,k + i,l + 3)
sum = sum + a(i,j + 1,k + 2) + b(i,j + 1,l) + c(i,k + 2,l)
sum = sum + a(i,j + 1,k + 2) + b(i,j + 1,l + 1) + c(i,k + 2,l + 1)
sum = sum + a(i,j + 1,k + 2) + b(1,j + 1,l + 2) + c(i,k.+ 2,l + 2)
sum = sum + a(i,j + 1,k + 2) + b(i,j + 1,l + 3) + c(i,k + 2,l + 3)
sum = sum + a(i,j + 1,k + 3) + b(i,j + 1,l) + c(i,k + 3,l)
sum = sum + a(i,j + 1,k + 3) + b(i,j + 1,l + 1) + c(i,k + 3,l + 1)
sum = sum + a(i,j + 1,k + 3) + b(i,j + 1,l + 2) + c(i,k + 3,l + 2)
sum = sum + a(i,j + 1,k + 3) + b(i,j + 1,l + 3) + c(i,k + 3,l + 3)
sum = sum + a(i,j + 2,k) + b(i,j + 2,l) + c(i,k,l)
sum = sum + a(i,j + 2,k) + b(i,j + 2,l + 1) + c(i,k,l + 1)
sum = sum + a(i,j + 2,k) + b(i,j + 2,l + 2) + c(i,k,l + 2)
sum = sum + a(i,j + 2,k) + b(i,j + 2,l + 3) + c(i,k,l + 3)
sum = sum + a(i,j + 2,k + 1) + b(i,j + 2,l) + c(i,k + 1,l)
sum = sum + a(i,j + 2,k + 1) + b(i,j + 2,l + 1) + c(i,k + 1,l + 1)
sum = sum + a(i,j + 2,k + 1) + b(i,j + 2,l + 2) + c(i,k + 1,l + 2)
sum = sum + a(i,j + 2,k + 1) + b(i,j + 2,l + 3) + c(i,k + 1,l + 3)
sum = sum + a(i,j + 2,k + 2) + b(i,j + 2,l) + c(1,k + 2,l)
sum = sum + a(i,j + 2,k + 2) + b(i,j + 2,l + 1) + c(i,k + 2,l + 1)
sum = sum + a(i,j + 2,k + 2) + b(i,j + 2,l + 2) + c(i,k + 2,l + 2)
sum = sum + a(i,j + 2,k + 2) + b(i,j + 2,l + 3) + c(i,k + 2,l + 3)
sum = sum + a(i,j + 2,k + 3) + b(i,j + 2,l) + c(i,k + 3,l)
sum = sum + a(i,j + 2,k + 3) + b(i,j + 2,l + 1) + c(i,k + 3,l + 1)
sum = sum + a(i,j + 2,k + 3) + b(i,j + 2,l + 2) + c(i,k + 3,l + 2)
sum = sum + a(i,j + 2,k + 3) + b(i,j + 2,l + 3) + c(i,k + 3,l + 3)
sum = sum + a(i,j + 3,k) + b(i,j + 3,l) + c(i,k,l)
sum = sum + a(i,j + 3,k) + b(i,j + 3,l + 1) + c(i,k,l + 1)
sum = sum + a(i,j + 3,k) + b(i,j + 3,l + 2) + c(i,k,l + 2)
sum = sum + a(i,j + 3,k) + b(i,j + 3,l + 3) + c(i,k,l + 3)
sum = sum + a(i,j + 3,k + 1) + b(i,j + 3,l) + c(i,k + 1,l)
sum = sum + a(i,j + 3,k + 1) + b(i,j + 3,l + 1) + c(i,k + i,l + 1)
sum = sum + a(i,j + 3,k + 1) + b(i,j + 3,l + 2) + c(i,k + i,l + 2)
sum = sum + a(i,j + 3,k + 1) + b(i,j + 3,l + 3) + c(i,k + i,l + 3)
sum = sum + a(i,j + 3,k + 2) + b(i,j + 3,l) + c(i,k + 2,l)
sum = sum + a(i,j + 3,k + 2) + b(i,j + 3,l + 1) + c(i,k + 2,l + 1)
sum = sum + a(i,j + 3,k + 2) + b(i,j + 3,l + 2) + c(i,k + 2,l + 2)
sum = sum + a(i,j + 3,k + 2) + b(i,j + 3,l + 3) + c(i,k + 2,l + 3)
sum = sum + a(i,j + 3,k + 3) + b(i,j + 3,l) + c(i,k + 3,l)
sum = sum + a(i,j + 3,k + 3) + b(i,j + 3,l + 1) + c(i,k + 3,l + 1)
sum = sum + a(i,j + 3,k + 3) + b(i,j + 3,l + 2) + c(i,k + 3,l + 2)
sum = sum + a(i,j + 3,k + 3) + b(i,j + 3,l + 3) + c(i,k + 3,l + 3)
end do
end do
do j=j,n,1
do i=1,n,1
sum = sum + a(i,j,k) + b(i,j,l) + c(i,k,l)
sum = sum + a(i,j,k) + b(i,j,l + 1) + c(i,k,l + 1)
sum = sum + a(i,j,k) + b(i,j,l + 2) + c(i,k,l + 2)
sum = sum + a(i,j,k) + b(i,j,l + 3) + c(i,k,l + 3)
sum = sum + a(i,j,k + 1) + b(i,j,l) + c(i,k + 1,l)
sum = sum + a(i,j,k + 1) + b(i,j,l + 1) + c(i,k + 1,l + 1)
sum = sum + a(i,j,k + 1) + b(i,j,l + 2) + c(i,k + 1,l + 2)
sum = sum + a(i,j,k + 1) + b(i,j,l + 3) + c(i,k + 1,l + 3)
sum = sum + a(i,j,k + 2) + b(i,j,l) + c(i,k + 2,l)
sum = sum + a(i,j,k + 2) + b(i,j,l + 1) + c(i,k + 2,l + 1)
sum = sum + a(i,j,k + 2) + b(i,j,l + 2) + c(i,k + 2,l + 2)
sum = sum + a(i,j,k + 2) + b(i,j,l + 3) + c(i,k + 2,l + 3)
sum = sum + a(i,j,k + 3) + b(i,j,l) + c(i,k + 3,l)
sum = sum + a(i,j,k + 3) + b(i,j,l + 1) + c(i,k + 3,l + 1)
sum = sum + a(i,j,k + 3) + b(i,j,l + 2) + c(i,k + 3,l + 2)
sum = sum + a(i,j,k + 3) + b(i,j,l + 3) + c(i,k + 3,l + 3)
end do
end do
end do
do k=k,n,1
do j=1,n,1
do i=1,n,1
sum = sum + a(i,j,k) + b(i,j,l) + c(i,k,l)
sum = sum + a(i,j,k) + b(i,j,l + 1) + c(i,k,l + 1)
sum = sum + a(i,j,k) + b(i,j,l + 2) + c(i,k,l + 2)
sum = sum + a(i,j,k) + b(i,j,l + 3) + c(i,k,l + 3)
end do
end do
end do
end do
do i=l,n,1
do k=1,n,1
do j=1,n,1
do i=1,n,1
sum = sum + a(i,j,k) + b(i,j,l) + c(i,k,l)
end do
end do
end do
end do
______________________________________
The advantages and benefits of the present invention may be appreciated by comparing the 85 copies of the loop body generated by the present invention to the 125 copies generated by the conventional unroll-and-jam transformation of the prior art as shown in Table C. The advantages and benefits of the present invention may be further appreciated by comparing the 21 copies of the lower execution frequency remainder loops generated by the present invention to the 61 copies generated by the conventional unroll-and-jam transformation of the prior art as shown in Table C. Though it may be possible to perform additional loop fusion or loop jamming on some of the remainder loops in the prior art version, the loop fusion will only affect remainder loops and will not reduce the number of copies.
TABLE C
______________________________________
do l = 1, n-3, 4
do k = 1, n-3, 4
do j = 1, n-3, 4
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l) + c(i,k,l)
sum = sum + a(i,j+1,k) + b(i,j+1,l) + c(i,k,l)
sum = sum + a(i,j+2,k) + b(i,j+2,l) + c(i,k,l)
sum = sum + a(i,j+3,k) + b(i,j+3,l) + c(i,k,l)
sum = sum + a(i,j,k+1) + b(i,j,l) + c(i,k+1,l)
sum = sum + a(i,j+1,k+1) + b(i,j+1,l) + c(i,k+1,l)
sum = sum + a(i,j+2,k+1) + b(i,j+2,l) + c(i,k+1,l)
sum = sum + a(i,j+3,k+1) + b(i,j+3,l) + c(i,k+1,l)
sum = sum + a(i,j,k+2) + b(i,j,l) + c(i,k+2,l)
sum = sum + a(i,j+1,k+2) + b(i,j+1,l) + c(i,k+2,l)
sum = sum + a(i,j+2,k+2) + b(i,j+2,l) + c(i,k+2,l)
sum = sum + a(i,j+3,k+2) + b(i,j+3,l) + c(i,k+2,l)
sum = sum + a(i,j,k+3) + b(i,j,l) + c(i,k+3,l)
sum = sum + a(i,j+1,k+3) + b(i,j+1,l) + c(i,k+3,l)
sum = sum + a(i,j+2,k+3) + b(i,j+2,l) + c(i,k+3,l)
sum = sum + a(i,j+3,k+3) + b(i,j+3,l) + c(i,k+3,l)
sum = sum + a(i,j,k) + b(i,j,l+1) + c(i,k,l+l)
sum = sum + a(i,j+1,k) + b(i,j+1,l+1) + c(i,k,l+l)
sum = sum + a(i,j+2,k) + b(i,j+2,l+1) + c(i,k,l+l)
sum = sum + a(i,j+3,k) + b(i,j+3,l+1) + c(i,k,l+l)
sum = sum + a(i,j,k+1) + b(i,j,l+l) + c(i,k+1,l+1)
sum = sum + a(i,j+1,k+1) + b(i,j+1,l+1) + c(i,k+1,l+1)
sum = sum + a(i,j+2,k+1) + b(i,j+2,l+1) + c(i,k+1,l+1)
sum = sum + a(i,j+3,k+1) + b(i,j+3,l+1) + c(i,k+1,l+1)
sum = sum + a(i,j,k+2) + b(i,j,l+1) + c(i,k+2,l+1)
sum = sum + a(i,j+1,k+2) + b(i,j+1,l+1) + c(i,k+2,l+1)
sum = sum + a(i,j+2,k+2) + b(i,j+2,l+1) + c(i,k+2,l+1)
sum = sum + a(i,j+3,k+2) + b(i,j+3,l+1) + c(i,k+2,l+1)
sum = sum + a(i,j,k+3) + b(i,j,l+1) + c(i,k+3,l+1)
sum = sum + a(i,j+1,k+3) + b(i,j+1,l+1) + c(i,k+3,l+1)
sum = sum + a(i,j+2,k+3) + b(i,j+2,l+1) + c(i,k+3,l+1)
sum = sum + a(i,j+3,k+3) + b(i,j+3,l+1) + c(i,k+3,l+1)
sum = sum + a(i,j,k) + b(i,j,l+2) + c(i,k,l+2)
sum = sum + a(i,j+1,k) + b(i,j+1,l+2) + c(i,k,l+2)
sum = sum + a(i,j+2,k) 4 b(1,j+2,l+2) + c(i,k,l+2)
sum = sum + a(i,j+3,k) + b(i,j+3,l+2) + c(i,k,l+2)
sum = sum + a(i,j,k+1) + b(i,j,l+2) + c(i,k+1,l+2)
sum = sum + a(i,j+1,k+1) + b(i,j+1,l+2) + c(i,k+1,l+2)
sum = sum + a(i,j+2,k+1) + b(i,j+2,l+2) + c(i,k+1,l+2)
sum = sum + a(i,j+3,k+1) + b(i,j+3,l+2) + c(i,k+1,l+2)
sum = sum + a(i,j,k+2) + b(i,j,l+2) + c(i,k+2,l+2)
sum = sum + a(i,j+1,k+2) + b(i,j+1,l+2) + c(i,k+2,l+2)
sum = sum + a(i,j+2,k+2) + b(i,j+2,l+2) + c(i,k+2,l+2)
sum = sum + a(i,j+3,k+2) + b(i,j+3,l+2) + c(i,k+2,l+2)
sum = sum + a(i,j,k+3) + b(i,j,l+2) + c(i,k+3,l+2)
sum = sum + a(i,j+1,k+3) + b(i,j+1,l+2) + c(i,k+3,l+2)
sum = sum + a(i,j+2,k+3) + b(i,j+2,l+2) + c(i,k+3,l+2)
sum = sum + a(i,j+3,k+3) + b(i,j+3,l+2) + c(i,k+3,l+2)
sum = sum + a(i,j,k) + b(i,j,l+3) + c(i,k,l+3)
sum = sum + a(i,j+1,k) + b(i,j+1,l+3) + c(i,k,l+3)
sum = sum + a(i,j+2,k) + b(i,j+2,l+3) + c(i,k,l+3)
sum = sum + a(i,j+3,k) + b(i,j+3,l+3) + c(i,k,l+3)
sum = sum + a(i,j,k+1) + b(i,j,l+3) + c(i,k+1,l+3)
sum = sum + a(i,j+1,k+1) + b(i,j+1,l+3) + c(i,k+1,l+3)
sum = sum + a(i,j+2,k+1) + b(i,j+2,l+3) + c(i,k+1,l+3)
sum = sum + a(i,j+3,k+1) + b(i,j+3,l+3) + c(i,k+1,l+3)
sum = sum + a(i,j,k+2) + b(i,j,l+3) + c(i,k+2,l+3)
sum = sum + a(i,j+1,k+2) + b(i,j+1,l+3) + c(i,k+2,l+3)
sum = sum + a(i,j+2,k+2) + b(i,j+2,l+3) + c(i,k+2,l+3)
sum = sum + a(i,j+3,k+2) + b(i,j+3,l+3) + c(i,k+2,l+3)
sum = sum + a(i,j,k+3) + b(i,j,l+3) + c(i,k+3,l+3)
sum = sum + a(i,j+1,k+3) + b(i,j+1,l+3) + c(i,k+3,l+3)
sum = sum + a(i,j+2,k+3) + b(i,j+2,l+3) + c(i,k+3,l+3)
sum = sum + a(i,j+3,k+3) + b(i,j+3,l+3) + c(i,k+3,l+3)
end do
end do
do j = j, n
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l) + c(i,k,l)
sum = sum + a(i,j,k+1) + b(i,j,l) + c(i,k+1,l)
sum = sum + a(i,j,k+2) + b(i,j,l) + c(i,k+2,l)
sum = sum + a(i,j,k+3) + b(i,j,l) + c(i,k+3,l)
end do
end do
end do
do k = k, n
do j = 1, n-3, 4
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l) + c(i,k,l)
sum = sum + a(i,j+1,k) + b(i,j+1,l) + c(i,k,l)
sum = sum + a(i,j+2,k) + b(i,j+2,l) + c(i,k,l)
sum = sum + a(i,j+3,k) + b(i,j+3,l) + c(i,k,l)
end do
end do
do j = j, n
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l) + c(i,k,l)
end do
end do
end do
do k = 1, n-3, 4
do j = j, n
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l+1) + c(i,k,l+1)
sum = sum + a(i,j,k+1) + b(i,j,l+1) + c(i,k+1,l+1)
sum = sum + a(i,j,k+2) + b(i,j,l+1) + c(i,k+2,l+1)
sum = sum + a(i,j,k+3) + b(i,j,l+1) + c(i,k+3,l+1)
end do
end do
end do
do k = k,n
do j = 1, n-3, 4
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l+1) + c(i,k,l+1)
sum = sum + a(i,j+1,k) + b(i,j+1,l+1) + c(i,k,l+1)
sum = sum + a(i,j+2,k) + b(i,j+2,l+1) + c(i,k,l+1)
sum = sum + a(i,j+3,k) + b(i,j+3,l+1) + c(i,k,l+1)
end do
end do
do j = j, n
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l+1) + c(i,k,l+1)
end do
end do
end do
do k = 1, n-3, 4
do j = j, n
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l+2) + c(i,k,l+2)
sum = sum + a(i,j,k+1) + b(i,j,l+2) + c(i,k+1,l+2)
sum = sum + a(i,j,k+2) + b(i,j,l+2) + c(i,k+2,l+2)
sum = sum + a(i,j,k+3) + b(i,j,l+2) + c(i,k+3,l+2)
end do
end do
end do
do k = k, n
do j = 1, n-3, 4
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l+2) + c(i,k,l+2)
sum = sum + a(i,j+1,k) + b(i,j+1,l+2) + c(i,k,l+2)
sum = sum + a(i,j+2,k) + b(i,j+2,l+2) + c(i,k,l+2)
sum = sum + a(i,j+3,k) + b(i,j+3,l+2) + c(i,k,l+2)
end do
end do
do j = j, n
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l+2) + c(i,k,l+2)
end do
end do
end do
do k = 1, n-3, 4
do j = j, n
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l+3) + c(i,k,l+3)
sum = sum + a(i,j,k+1) + b(i,j,l+3) + c(i,k+1,l+3)
sum = sum + a(i,j,k+2) + b(i,j,l+3) + c(i,k+2,l+3)
sum = sum + a(i,j,k+3) + b(i,j,l+3) + c(i,k+3,l+3)
end do
end do
end do
do k = k, n
do j = 1, n-3, 4
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l+3) + c(i,k,l+3)
sum = sum + a(i,j+1,k) + b(i,j+1,l+3) + c(i,k,l+3)
sum = sum + a(i,j+2,k) + b(i,j+2,l+3) + c(i,k,l+3)
sum = sum + a(i,j+3,k) + b(i,j+3,l+3) + c(i,k,l+3)
end do
end do
do j = j, n
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l+3) + c(i,k,l+3)
end do
end do
end do
end do
do l = l, n
do k = 1, n-3, 4
do j = 1, n-3, 4
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l) + c(i,k,l)
sum = sum + a(i,j+1,k) + b(i,j+1,l) + c(i,k,l)
sum = sum + a(i,j+2,k) + b(i,j+2,l) + c(i,k,l)
sum = sum + a(i,j+3,k) + b(i,j+3,l) + c(i,k,l)
sum = sum + a(i,j,k+1) + b(i,j,l) + c(i,k+1,l)
sum = sum + a(i,j+1,k+1) + b(i,j+1,l) + c(i,k+1,l)
sum = sum + a(i,j+2,k+1) + b(i,j+2,l) + c(i,k+1,l)
sum = sum + a(i,j+3,k+1) + b(i,j+3,l) + c(i,k+1,l)
sum = sum + a(i,j,k+2) + b(i,j,l) + c(i,k+2,l)
sum = sum + a(i,j+1,k+2) + b(i,j+1,l) + c(i,k+2,l)
sum = sum + a(i,j+2,k+2) + b(i,j+2,l) + c(i,k+2,l)
sum = sum + a(i,j+3,k+2) + b(i,j+3,l) + c(i,k+2,l)
sum = sum + a(i,j,k+3) + b(i,j,l) + c(i,k+3,l)
sum = sum + a(i,j+1,k+3) + b(i,j+1,l) + c(i,k+3,l)
sum = sum + a(i,j+2,k+3) + b(i,j+2,l) + c(i,k+3,l)
sum = sum + a(i,j+3,k+3) + b(i,j+3,l) + c(i,k+3,l)
end do
end do
do j = j, n
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l) + c(i,k,l)
sum = sum + a(i,j,k+1) + b(i,j,l) + c(i,k+1,l)
sum = sum + a(i,j,k+2) + b(i,j,l) + c(i,k+2,l)
sum = sum + a(i,j,k+3) + b(i,j,l) + c(i,k+3,l)
end do
end do
end do
do k = k, n
do j = 1, n-3, 4
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l) + c(i,k,l)
sum = sum + a(i,j+1,k) + b(i,j+1,l) + c(i,k,l)
sum = sum + a(i,j+2,k) + b(i,j+2,l) + c(i,k,l)
sum = sum + a(i,j+3,k) + b(i,j+3,l) + c(i,k,l)
end do
end do
do j = j, n
do i = 1, n
sum = sum + a(i,j,k) + b(i,j,l) + c(i,k,l)
end do
end do
end do
end do
______________________________________
Referring next to Table D, psuedo-code illustrating operations preferred in the carrying out of the present invention are shown. This psuedo-code is also illustrated in FIG. 3.
TABLE D
______________________________________
INPUTS:
1. LOOP[l]. . . LOOP[k], a perfect nest of k loops, numbered from
outermost to innermost, in an optimizer intermediate representation.
The notation, "do i.sub.j = lb.sub.j, ub.sub.j, inc.sub.j ", denotes
the contents
of statement LOOP[j] with index variable i.sub.j, lower bound
lb.sub.j, upper bound ub.sub.j, and increment inc.sub.j.
2. UF[j] >= 1, unroll factor for each LOOP[j].
3. COUNT[j], constant value or symbolic expression for number of
iterations executed by LOOP[j], where UF[j] is assumed to be less
than or equal to COUNT[j] if COUNT[j] is a constant. It is well
known by those skilled in the art to derive COUNT[j] from ib.sub.j
(lower bound of j), ub.sub.j (upper bound of j), and inc.sub.j
(increment of j).
OUTPUT:
1. Updated intermediate representation of the unrolled loops to reflect
the loop unrolling transformation specified by UF[l], . . ., UF[k].
METHOD:
Set next.sub.-- parent = parent of LOOP[j] in intermediate
representation, the
original loop nest;
Detach subtree rooted at LOOP[l] from next.sub.-- parent/* this subtree
is used as the source for generating copies of the original loop body
in the unrolled and remainder loops */;
Set unrolled.sub.-- body = copy of body of innermost loop, LOOP[k];
for(j = l; j <= k; j++) {
Set current.sub.--parent = next.sub.-- parent;
/* STEP 1: Expand unrolled.sub.-- body by factor UF[j] for index i.sub.j
*/
Set new.sub.-- unrolled.sub.-- loop.sub.-- body = copy of
unrolled.sub.-- body;
for (u = l; u <= UF[j] - 1; u++) {
Set one.sub.-- copy = copy of unrolled.sub.-- body;
replace all references of loop index variable "i.sub.j " in one.sub.--
copy by
"i.sub.j + inc.sub.j *u";
append one.sub.-- copy to end of new.sub.-- unrolled.sub.-- loop.sub.--
body;
}
Delete unrolled.sub.-- body and set unrolled.sub.-- body =
new.sub.-- unrolled.sub.-- loop.sub.-- body;
/* STEP 2: Adjust header for unrolled loop j */
Construct the remainder expression er.sub.j, = mod(COUNT[j], UF[j]);
if( COUNT[j] is constant and COUNT[j] = UF[j]) {
/* loop j is to be completely unrolled */
Construct the statement, "i" = lb.sub.j ", call it next.sub.-- parent,
and make it the first
(leftmost) child of current.sub.-- parent;
}
else {
Make a copy of the LOOP[j] statement, call it next.sub.-- parent,
change
it to "do i.sub.j = lb.sub.j, ub.sub.j - er.sub.j *inc.sub.j,
UF[j]*inc.sub.j ", and make it a child of
current.sub.-- parent;
}
/* STEP 3: Generate remainder loop sub-nest, if necessary */
if (er.sub.j is not a constant 0) {
Set tree.sub.-- copy = copy of subtree rooted at LOOP[j];
change the outermost statement in tree.sub.-- copy to
"do i.sub.j = ub.sub.j - (er.sub.j -l)*inc.sub.j, ub.sub.j, inc.sub.j
";
Make tree.sub.-- copy a child of current.sub.-- parent;
}
} /* for (j = . . . ) */
Make unrolled.sub.-- body a child of next.sub.-- parent;
Delete subtree rooted at LOOP[l] (original loop nest);
______________________________________
Referring next to FIG. 3 through FIG. 6, flowcharts illustrating operations preferred in carrying out the present invention are shown. In the flowcharts, the graphical conventions of a diamond for a test or decision and a rectangle for a process or function are used. These conventions are well understood by those skilled in the art, and the flowcharts are sufficient to enable one of ordinary skill to write code in any suitable computer programming language. These flowcharts correspond to the pseudo-code of Table D. Referring now to FIG. 4, the process of the invention, generally referred to as 400, begins at process block 405. Thereafter, process block 410 sets a next.sub.-- parent equal to the parent of LOOP[1] in an intermediate representation, and process block 415 detaches a subtree rooted at LOOP[1] from the next.sub.-- parent. FIG. 5 illustrates the tree structure 500 for the original intermediate representation of the loops to be unrolled comprising a parent 510 of LOOP[1] 520, and k perfectly nested loops comprising LOOP[1] 520, LOOP[2] 530, and LOOP[3] 540. Next, process block 420 sets an unrolled.sub.-- body 550 equal to a copy of a body of an innermost loop, LOOP[k]. Then, process block 425 begins a for loop with j initialized to 1 and incremented by j++ while j is less than or equal to k. Within this loop, process block 430 sets a current.sub.-- parent equal to the next.sub.-- parent, and then performs three steps. Step one, performed by process block 435, expands the unrolled.sub.-- body by a factor UF[j] for index i.sub.j. FIG. 6 illustrates an unrolled body 600 produced by process block 435 after j=1, and FIG. 7 illustrates an unrolled body 700 produced by process block 435 after j=2. Step two, performed by process block 440, adjusts a header for an unrolled loop j. Step three, performed by process block 445, generates a remainder loop sub-nest, if necessary. Thereafter, process block 450 increments the for loop index j by j++, and then decision block 455 determines if the for loop is completed (if j less than or equal to k). If the for loop is not completed, then processing loops back to process block 430 to process the next.sub.-- parent. Returning now to decision block 455, if the for loop is completed (if j not less than or equal to k), then process block 460 makes the unrolled.sub.-- body a child of next.sub.-- parent, and process block 465 deletes the subtree rooted at LOOP[1]. The process then ends at process block 470. Referring now to FIG. 8, an expansion of process block 435 of FIG. 4, generally referred to as 800, is illustrated. This process 800 performs the first step of expanding the unrolled.sub.-- body by a factor UF[J] for index i.sub.j. The process 800 begins at process block 805, and thereafter, process block 810 sets a new.sub.-- unrolled.sub.-- loop.sub.-- body equal to a copy of the unrolled.sub.-- body. Then, process block 815 begins a for loop with u initialized to 1 and incremented by u++ while u is less than or equal to UF[j]-1. Within this loop, process block 820 sets one.sub.-- copy equal to the copy of unrolled.sub.-- body; process block 825 replaces all references of loop index variable "i.sub.j " in one.sub.-- copy by "i.sub.j +inc.sub.j *u"; and process block 830 appends one.sub.-- copy to the end of new.sub.-- unrolled.sub.-- loop.sub.-- body. Process block 835 then increments the for loop index u by u++, and decision block 840 determines if the for loop is completed (if u less than or equal to UF[j]-1). If the for loop is not completed, then processing loops back to process block 820 to again process one.sub.-- copy. Returning now to decision block 840, if the for loop is completed (if u not less than or equal to UF[j]-1), then process block 845 deletes the unrolled.sub.-- body and sets unrolled.sub.-- body equal to the new.sub.-- unrolled.sub.-- loop.sub.-- body, and then processing returns through process block 850 to continue processing process block 440 of FIG. 4. Referring now to FIG. 9, an expansion of process block 440 of FIG. 4, generally referred to as 900, is illustrated. This process 900 performs the second step of adjusting the header for an unrolled loop j. The process 900 begins at process block 905, and thereafter, process block 920 constructs the remainder expression erj which is equal to mod(COUNT[j], UF[j]). Decision block 930 then determines if COUNT[j] is a constant equal to UF[j]. If COUNT[j] is a constant equal to UF[j], then loop j is to be completely unrolled and process block 940 constructs the statement, "i.sub.j =lb.sub.j ", calls it next.sub.-- parent, and makes it a child of current.sub.-- parent, and then processing returns through process block 950 to continue processing process block 445 of FIG. 4. Returning now to decision block 930, if COUNT[j] is not a constant equal to UF[j], then process block 960 makes a copy of the LOOP[j] statement; calls it next.sub.-- parent; changes it to "do i.sub.j lb.sub.j, ub.sub.j -er.sub.j *inc.sub.j, UF[j]*inc.sub.j "; and makes it a child of current.sub.-- parent. Processing then returns through process block 950 to continue processing process block 445 of FIG. 4. Referring now to FIG. 10, an expansion of process block 445 of FIG. 4, generally referred to as 1000, is illustrated. This process 1000 performs the third step of generating a remainder loop sub-nest, if necessary. The process 1000 begins at process block 1010, and thereafter, decision block 1020 determines if er.sub.j is not a constant equal to 0. If er.sub.j is not a constant equal to 0, then process block 1030 sets tree.sub.-- copy equal to the copy of subtree rooted at LOOP[j]; process block 1040 changes the outermost statement in tree.sub.-- copy to "do i.sub.j =ub.sub.j -(er.sub.j -1)*inc.sub.j, ub.sub.j, inc.sub.j "; and process block 1050 makes tree.sub.-- copy a child of current.sub.-- parent. Processing then returns through process block 1060 to continue processing process block 450 of FIG. 4. Returning now to decision block 1020, if er.sub.j is a constant equal to 0, then processing returns through process block 1060 to continue processing process block 450 of FIG. 4. Referring now to FIG. 11, a block diagram illustrates a computer system 1100 used in performing the method of the present invention, forming part of the apparatus of the present invention, and which may use the article of manufacture comprising a computer-readable storage medium having a computer program embodied in said medium which may cause the computer system to practice the present invention. The computer system 1100 includes a processor 1102, which includes a central processing unit (CPU) 1104, and a memory 1106. Additional memory, in the form of a hard disk file storage 1108 and a computer-readable storage device 1110, is connected to the processor 1102. Computer-readable storage device 1110 receives a computer-readable storage medium 1112 having a computer program embodied in said medium which may cause the computer system to implement the present invention in the computer system 1100. The computer system 1100 includes user interface hardware, including a mouse 1114 and a keyboard 1116 for allowing user input to the processor 1102 and a display 1118 for presenting visual data to the user. The computer system may also include a printer 1120. Although the present invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the spirit and the scope of the invention.
|
Same subclass Same class Consider this |
||||||||||
