Method and system for improving the locality of memory references during execution of a computer program
6292934
Abstract
The present invention provides a method and system for determining an optimal placement order for basic blocks within a computer program to improve locality of reference and reduce the working set of the computer program. By reducing the working set, the computer program requires less memory than it normally would require to execute on a computer system. The optimal placement order for basic blocks within a computer program reflects the concurrency of usage for basic blocks during execution of the computer program. The method for determining an optimal placement order includes analyzing the computer program to identify all of the basic blocks, determining how many times each basic block is executed, assigning a placement order to each basic block depending upon how many times each basic block was executed, and reordering the basic blocks according to their assigned placement orders to produce an optimized computer program. The method used to identify all of the basic blocks includes disassembling known instruction addresses to identify the beginning and end of basic blocks and processing jump tables to identify more instruction addresses. Processing jump tables includes processing the first entry of every jump table before processing the second entry of any jump table. The present invention further optimizes a computer program by replacing rarely executed instructions with other instructions that require a smaller amount of storage space.
Claims
We claim:
Description
TECHNICAL FIELD
JMP *(BaseAddress + index)
{pad bytes}
BaseAddress &(TARGET1)
&(TARGET2)
.
.
.
&(TARGETn)
{pad bytes}
TARGET1 .
.
.
{pad bytes}
TARGETn .
.
.
Notice the appearance of pad bytes at various locations within the above code. For performance reasons, a compiler program typically inserts pad bytes to align code and data to a specific address. In the above code example, a jump table containing "n" entries is located at the label "BaseAddress." The starting address of a jump table is its base address. The instruction "JMP *(BaseAddress+index)" jumps to one of the "Targetn" labels indirectly through the jump table. The "index" indicates which entry in the jump table to jump through. A jump table may also be used by an indirect call instruction. Also, as shown above, the first entry in a jump table typically points to code that is located immediately after the jump table and a jump table typically follows a basic block having an indirect branch exit instruction. Due to the complexities and problems associated with jump table analysis, the optimizer program 114 uses special processing for jump tables. A routine ProcessJumpTable identifies instructions referenced by jump table entries. As new instruction addresses are identified by the jump table analysis, ProcessJumpTable calls FindBB to disassemble the instructions at those addresses and identify all basic blocks that are encountered during the disassembly process. The routine ProcessJumpTable is explained below in more detail with reference to FIG. 5. FIG. 4 is a flow diagram of the routine FindBB in accordance with a preferred embodiment of the present invention. In step 401, FindBB determines whether the resolve list contains any addresses. As explained above, known instruction addresses are stored on the resolve list. If the resolve list does not contain any addresses, then FindBB is done. If the resolve list is not empty, then, in step 403, FindBB removes an instruction address from the resolve list and scans a list of known code blocks to determine whether a known code block starts at this instruction address. 
The list of known code blocks contains addresses of labeled instructions. For example, referring to the above example code for a jump table, the labels "Target1" and "Targetn" indicate the start of code blocks. If a block starts at the instruction address, there is no need to re-examine the address so FindBB loops back to step 401. If a known code block does not start at the instruction address, then the instruction address must be the start of a new code block. In step 405, FindBB splits the known or unknown code block that contains the instruction address and records the instruction address as the start of a new basic block. In steps 407 and 408, FindBB sequentially disassembles the instructions that follow the start of the new basic block until a transfer exit instruction is found. A transfer exit instruction is any instruction that may cause a transfer of control to another basic block. Examples of such exit instructions include branches, conditional branches, traps, calls, and returns. When a transfer exit instruction is found, in step 409, FindBB records the address of the exit instruction as the end of the new code block. All addresses within range of the previously identified block that follow the exit instruction of the newly identified basic block become another new basic block. In steps 411-414, FindBB determines the follower and target addresses, if any, for the new code block, and queues the follower and target addresses on the resolve list for later examination. A follower address is the address of an entrance instruction of a "fall through" block; that is, no branch or jump instruction is needed to access the block. A target address is the address of an instruction for a block of code that is the destination of a branch or jump instruction. If the exit instruction for the new block is an indirect jump or call instruction, then FindBB determines whether a jump table may start at the base address of the instruction. 
Because jump tables require special handling, in steps 415 and 416, FindBB stores the base address of the termination instruction in a base list. Each entry in the base list contains an address and an index into a jump table. The entries in the base list are sorted by index value so that the first entry in the list has the lowest index. Whenever a base address is added to the base list, the corresponding index value is set to zero. The index value corresponds to the entry in the jump table that will be processed next as discussed below. FindBB then loops back to step 401 to examine the next address on the resolve list, if more addresses exist.
As mentioned above, FindBB uses special processing to identify the extent of a jump table. This special processing includes processing all jump tables in a breadth-first manner. That is, a routine ProcessJumpTable processes the first entry in every jump table before processing the second or subsequent entries in any jump table. When FindBB disassembles an instruction that references a jump table, the base address of the jump table is put on the base list (see step 416 of FIG. 4).
FIG. 5 is a flow diagram of the routine ProcessJumpTable in accordance with a preferred embodiment of the present invention. In step 501, ProcessJumpTable determines whether the base list contains any entries. If the base list does not contain any entries, then ProcessJumpTable ends. If the base list contains one or more entries, then, in step 503, ProcessJumpTable places the address pointed to by the first entry on the resolve list. This address is read from the jump table entry located at the base address plus the index value, that is, *(baseAddress + index) in the pseudo code of Table B. In steps 505 and 506, ProcessJumpTable determines whether the end of the jump table has been reached, and, if not, places the next entry in the jump table onto the base list with the index value incremented. The end of a jump table has been reached when the next address is a pad byte or the entrance instruction of a code block.
In step 507, ProcessJumpTable calls the routine FindBB. FindBB may then identify the start of additional jump tables. ProcessJumpTable processes the newly identified jump tables to the same depth as the other jump tables because the base address of a newly identified jump table is added to the base list in index order. This breadth-first processing of jump tables tends to maximize the chances of identifying a code block that immediately follows a jump table. In this way, ProcessJumpTable ceases processing a jump table when the next address following a jump table entry contains the entrance instruction of a basic block.
Each basic block identified has associated data that includes an address, a size, a unique identifier known as a block identifier ("BID"), a follower block identifier ("BIDFollower"), and a target block identifier ("BIDTarget"). Each BIDFollower field contains the BID of a block to which control will pass if a block exits with a fall through condition. Each BIDTarget field contains the BID of a block to which control will pass if a block exits with a branch condition. Referring to example basic blocks shown below in Table A, block "B1" has a size of 17 bytes. Additionally, block "B2" is the follower block of block "B1" and block "B10" is the target block of block "B1." A "nil" value stored in either the BIDFollower or BIDTarget fields indicates no follower or target block, respectively.
TABLE A
Address Instruction Assembled Instruction
Id: B1 Size: 0x11(17) BidFollower: B2 BidTarget: B10
0075FE00 53 push ebx
0075FE01 56 push esi
0075FE02 57 push edi
0075FE03 8B 44 24 14 mov eax,dword ptr [esp+14]
0075FE07 8B F8 mov edi,eax
0075FE09 8B 74 24 18 mov esi,dword ptr [esp+18]
0075FE0D 85 F6 test esi,esi
0075FE0F 74 30 je 0075FE41
Id: B2 Size: 0xf(15) BidFollower: B3 BidTarget: nil
0075FE11 C7 06 FF FF FF FF mov dword ptr [esi],FFFFFFFF
0075FE17 8B 4C 24 10 mov ecx,dword ptr [esp+10]
0075FE1B BB 26 00 00 00 mov ebx,00000026
Id: B3 Size: 0x4(4) BidFollower: B4 BidTarget: B8
0075FE20 38 19 cmp byte ptr [ecx],bl
0075FE22 75 11 jne 0075FE35
Id: B4 Size: 0x5(5) BidFollower: B5 BidTarget: B7
0075FE24 83 3E FF cmp dword ptr [esi],FF
0075FE27 75 0B jne 0075FE34
Id: B5 Size: 0x5(5) BidFollower: B6 BidTarget: B7
0075FE29 38 59 01 cmp byte ptr [ecx+01],bl
0075FE2C 74 06 je 0075FE34
Id: B6 Size: 0x6(6) BidFollower: B7 BidTarget: nil
0075FE2E 8B D0 mov edx,eax
0075FE30 2B D7 sub edx,edi
0075FE32 89 16 mov dword ptr [esi],edx
Id: B7 Size: 0x1(1) BidFollower: B8 BidTarget: nil
0075FE34 41 inc ecx
Id: B8 Size: 0x9(9) BidFollower: B9 BidTarget: B13
0075FE35 8A 11 mov dl,byte ptr [ecx]
0075FE37 88 10 mov byte ptr [eax],dl
0075FE39 41 inc ecx
0075FE3A 84 D2 test dl,dl
0075FE3C 74 1C je 0075FE5A
Id: B9 Size: 0x3(3) BidFollower: nil BidTarget: B3
0075FE3E 40 inc eax
0075FE3F EB DF jmp 0075FE20
Id: B10 Size: 0xd(13) BidFollower: B11 BidTarget: B13
0075FE41 8B 4C 24 10 mov ecx,dword ptr [esp+10]
0075FE45 8A 11 mov dl,byte ptr [ecx]
0075FE47 88 10 mov byte ptr [eax],dl
0075FE49 41 inc ecx
0075FE4A 84 D2 test dl,dl
0075FE4C 74 0C je 0075FE5A
Id: B11 Size: 0x2(2) BidFollower: B12 BidTarget: nil
0075FE4E 8B FF mov edi,edi
Id: B12 Size: 0xa(10) BidFollower: B13 BidTarget: B12
0075FE50 40 inc eax
0075FE51 8A 11 mov dl,byte ptr [ecx]
0075FE53 88 10 mov byte ptr [eax],dl
0075FE55 41 inc ecx
0075FE56 84 D2 test dl,dl
0075FE58 75 F6 jne 0075FE50
Id: B13 Size: 0x8(8) BidFollower: nil BidTarget: nil
0075FE5A 2B C7 sub eax,edi
0075FE5C 5F pop edi
0075FE5D 5E pop esi
0075FE5E 5B pop ebx
0075FE5F C2 0C 00 ret 000C
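The per-block data described before Table A can be pictured as a small record type. The following Python sketch is illustrative only: the class name BasicBlock, the field names, and the helper follower_is_adjacent are assumptions made for this example and do not appear in the original description. It encodes the first four blocks of Table A and checks that each fall through follower begins at the address immediately after its predecessor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BasicBlock:
    """Hypothetical record mirroring the fields listed for each block in Table A."""
    bid: str                  # block identifier ("BID")
    address: int              # address of the entrance instruction
    size: int                 # size of the block in bytes
    follower: Optional[str]   # BIDFollower; None stands for "nil"
    target: Optional[str]     # BIDTarget; None stands for "nil"


# The first four blocks of Table A.
BLOCKS = [
    BasicBlock("B1", 0x0075FE00, 0x11, "B2", "B10"),
    BasicBlock("B2", 0x0075FE11, 0x0F, "B3", None),
    BasicBlock("B3", 0x0075FE20, 0x04, "B4", "B8"),
    BasicBlock("B4", 0x0075FE24, 0x05, "B5", "B7"),
]


def follower_is_adjacent(blocks):
    """Return True if every known follower starts right after its predecessor."""
    by_id = {b.bid: b for b in blocks}
    for b in blocks:
        nxt = by_id.get(b.follower) if b.follower else None
        # Followers that are not in the list (here B5) are simply skipped.
        if nxt is not None and nxt.address != b.address + b.size:
            return False
    return True
```

For the blocks above the check succeeds, because 0x0075FE00 + 0x11 is 0x0075FE11, the address of block B2, and so on down the list.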
The pseudo code for the method used in a preferred embodiment of the present invention to identify basic blocks is shown below in Table B. The pseudo code illustrates the situation in which the computer program has multiple entry points. The addresses of the entry points are stored in the table named EPTable.
TABLE B
EntryPointTable (EPTable): each entry contains an entry point into the
    program code being disassembled.
BaseAddressTable (BATable): each entry contains a base address of a jump
    table and an index of the next entry to be processed. The entries in
    the table are sorted by index.
IdentifyBB()
{   while (EPTable != empty)
        nextEntryPoint = GetEPTable()
        FindBB(nextEntryPoint)
    endwhile
    while (BATable != empty)
        GetBATable(baseAddress, index)
        FindBB(*(baseAddress + index))
        PutBATable(baseAddress, index + 1)
    endwhile
}
FindBB(address)
{   startBB(address)
    nextAddress = address
    do
        curAddress = nextAddress
        disassemble instruction at curAddress
        nextAddress = nextAddress + 1
    while (instruction != end of BB)
    endBB(curAddress)
    if instruction is a jump
        FindBB(address of target of instruction)
    if instruction is conditional jump
        FindBB(address of target of instruction)
        FindBB(address of follower of instruction)
    if instruction is indirect jump or call
        PutBATable(BaseAddress in instruction, 0)
}
PutBATable(BaseAddress, index)
{   if (BaseAddress is a fixup &&
        BaseAddress is in code or unknown section)
        store (BaseAddress, index) in BATable in sorted order by index
}
GetBATable(BaseAddress, index)
{   retrieve BaseAddress with lowest index from BATable
}
GetEPTable(address)
{   retrieve address stored in next entry of EPTable
}
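The interplay between the resolve list and the breadth-first base list can be tried out on a toy memory image. The Python sketch below is a simplified illustration, not the routine of the preferred embodiment. It assumes a made-up instruction set in which every cell is one address wide, the kinds "op", "jmp", "jcc", "ijmp" and "ret" are invented for the example, and a ("ptr", t) cell stands in for a jump table entry holding the address t. Call instructions and real disassembly are left out.

```python
import heapq
from collections import deque

# Toy memory image (an assumption of this sketch): one cell per address.
MEMORY = {
    0: ("op",),
    1: ("jcc", 3),    # conditional jump: target 3, follower 2
    2: ("op",),
    3: ("ijmp", 4),   # indirect jump through the jump table based at 4
    4: ("ptr", 6),    # jump table entry 0
    5: ("ptr", 8),    # jump table entry 1
    6: ("op",),       # first target, located immediately after the table
    7: ("ret",),
    8: ("op",),
    9: ("ret",),
}
EXITS = {"jmp", "jcc", "ijmp", "ret"}


def find_bb(memory, resolve, leaders, base_list):
    """Drain the resolve list, recording block starts and queueing successors."""
    while resolve:
        addr = resolve.popleft()
        if addr in leaders or addr not in memory:
            continue                      # already a known block start
        leaders.add(addr)
        cur = addr
        while True:                       # scan forward to the exit instruction
            kind = memory[cur][0]
            if kind in EXITS:
                break
            cur += 1
            if cur in leaders or cur not in memory:
                kind = None               # ran into a known block or the image end
                break
        if kind in ("jmp", "jcc"):
            resolve.append(memory[cur][1])                    # target address
        if kind == "jcc":
            resolve.append(cur + 1)                           # follower address
        if kind == "ijmp":
            heapq.heappush(base_list, (0, memory[cur][1]))    # new table, index 0


def identify_bb(memory, entry_points):
    """Return the sorted start addresses of all basic blocks that were found."""
    leaders, base_list = set(), []
    resolve = deque(entry_points)
    find_bb(memory, resolve, leaders, base_list)
    while base_list:
        # The lowest index is popped first, so entry k of every table is
        # handled before entry k + 1 of any table (breadth-first).
        index, base = heapq.heappop(base_list)
        cell = memory.get(base + index)
        # The table ends at anything that is not a pointer or is known code.
        if cell is None or cell[0] != "ptr" or (base + index) in leaders:
            continue
        resolve.append(cell[1])
        heapq.heappush(base_list, (index + 1, base))
        find_bb(memory, resolve, leaders, base_list)
    return sorted(leaders)
```

Calling identify_bb(MEMORY, [0]) on this image walks both table entries and stops at address 6, which by then is already known to be the entrance of a block. Keeping the base list as a heap keyed on the index is one simple way to obtain the "sorted by index" behavior; the original text does not say which data structure is used.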
Referring back to FIG. 2, in step 203, the optimizer program 114 records execution data for each basic block during execution of an instrumented version of the computer program 116. The instrumented version of the computer program 116 preferably includes instrumentation code in the form of calls to one or more library routines. Instrumentation code may be manually added to the computer program 116, or the optimizer program 114 may automatically insert a call to a library routine into each basic block when the basic block is identified. A library routine is a routine stored in a library file that can be used by any program that can link into the library file. A library routine is typically used so that the same code does not have to be duplicated throughout the instrumented computer program. After the instrumentation code is added, addresses must be adjusted to account for the added instructions. In a preferred embodiment, the library routine records execution data by causing a counter corresponding to a basic block to be incremented every time the basic block is accessed. Although the added instructions are preferably in the form of a call to a library routine, this is an optimization and not necessary to carry out the present invention. The call to the library routine is preferably inserted immediately before the exit instruction of a basic block. During execution of the instrumented computer program on the computer system 100, execution data is gathered for each basic block. In one embodiment of the present invention, a user interacts with the instrumented program while the instrumented computer program is executing. In another embodiment of the present invention, an executor program interacts with the instrumented computer program according to a programmed scenario. The scenario may take the form of an execution script.
While instrumentation code may be added to every basic block, a preferred embodiment of the present invention adds instrumentation code only to selected basic blocks, called instrumentation points. When the instrumented computer program is executed on the computer system, the instrumentation code records execution information for only the basic blocks selected as instrumentation points. The recorded execution information is then used to calculate execution information for the non-instrumented basic blocks. This method is described in detail in the patent application filed concurrently herewith and entitled "METHOD AND SYSTEM FOR SELECTING INSTRUMENTATION POINTS IN A COMPUTER PROGRAM," which is incorporated herein by reference. Example execution data for each basic block shown in Table A is provided below in Table C.
TABLE C
Execution Data   Address   Instruction   Assembled Instruction
Id: B1 Size: 0x11(17) BidFollower: B2 BidTarget: B10
89 0075FE00 53 push ebx
0075FE01 56 push esi
0075FE02 57 push edi
0075FE03 8B 44 24 14 mov eax,dword ptr [esp+14]
0075FE07 8B F8 mov edi,eax
0075FE09 8B 74 24 18 mov esi,dword ptr [esp+18]
0075FE0D 85 F6 test esi,esi
0075FE0F 74 30 je 0075FE41
Id: B2 Size: 0xf(15) BidFollower: B3 BidTarget: nil
89 0075FE11 C7 06 FF FF FF FF mov dword ptr [esi],FFFFFFFF
0075FE17 8B 4C 24 10 mov ecx,dword ptr [esp+10]
0075FE1B BB 26 00 00 00 mov ebx,00000026
Id: B3 Size: 0x4(4) BidFollower: B4 BidTarget: B8
927 0075FE20 38 19 cmp byte ptr [ecx],bl
0075FE22 75 11 jne 0075FE35
Id: B4 Size: 0x5(5) BidFollower: B5 BidTarget: B7
59 0075FE24 83 3E FF cmp dword ptr [esi],FF
0075FE27 75 0B jne 0075FE34
Id: B5 Size: 0x5(5) BidFollower: B6 BidTarget: B7
59 0075FE29 38 59 01 cmp byte ptr [ecx+01],bl
0075FE2C 74 06 je 0075FE34
Id: B6 Size: 0x6(6) BidFollower: B7 BidTarget: nil
59 0075FE2E 8B D0 mov edx,eax
0075FE30 2B D7 sub edx,edi
0075FE32 89 16 mov dword ptr [esi],edx
Id: B7 Size: 0x1(1) BidFollower: B8 BidTarget: nil
59 0075FE34 41 inc ecx
Id: B8 Size: 0x9(9) BidFollower: B9 BidTarget: B13
927 0075FE35 8A 11 mov dl,byte ptr [ecx]
0075FE37 88 10 mov byte ptr [eax],dl
0075FE39 41 inc ecx
0075FE3A 84 D2 test dl,dl
0075FE3C 74 1C je 0075FE5A
Id: B9 Size: 0x3(3) BidFollower: nil BidTarget: B3
838 0075FE3E 40 inc eax
0075FE3F EB DF jmp 0075FE20
Id: B10 Size: 0xd(13) BidFollower: B11 BidTarget: B13
0 0075FE41 8B 4C 24 10 mov ecx,dword ptr [esp+10]
0075FE45 8A 11 mov dl,byte ptr [ecx]
0075FE47 88 10 mov byte ptr [eax],dl
0075FE49 41 inc ecx
0075FE4A 84 D2 test dl,dl
0075FE4C 74 0C je 0075FE5A
Id: B11 Size: 0x2(2) BidFollower: B12 BidTarget: nil
0 0075FE4E 8B FF mov edi,edi
Id: B12 Size: 0xa(10) BidFollower: B13 BidTarget: B12
0 0075FE50 40 inc eax
0075FE51 8A 11 mov dl,byte ptr [ecx]
0075FE53 88 10 mov byte ptr [eax],dl
0075FE55 41 inc ecx
0075FE56 84 D2 test dl,dl
0075FE58 75 F6 jne 0075FE50
Id: B13 Size: 0x8(8) BidFollower: nil BidTarget: nil
89 0075FE5A 2B C7 sub eax,edi
0075FE5C 5F pop edi
0075FE5D 5E pop esi
0075FE5E 5B pop ebx
0075FE5F C2 0C 00 ret 000C
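The counter-based recording that produces data such as Table C can be sketched in a few lines. In the Python example below, the class BlockCounter plays the role of the library routine called by the instrumentation code. The class name, the record method, and the short trace are assumptions made for illustration, and the trace is not taken from an actual run.

```python
from collections import Counter


class BlockCounter:
    """Hypothetical stand-in for the library routine called from each block."""

    def __init__(self):
        self.counts = Counter()

    def record(self, bid):
        # One counter per basic block, incremented on every access.
        self.counts[bid] += 1


def run_trace(counter, trace):
    """Replay a sequence of block identifiers as if the program had executed them."""
    for bid in trace:
        counter.record(bid)
    return counter.counts


# Made-up trace: one call that goes around the B3/B8/B9 loop once, then exits.
TRACE = ["B1", "B2", "B3", "B8", "B9", "B3", "B8", "B13"]
COUNTS = run_trace(BlockCounter(), TRACE)
```

In this trace the count for B9 equals the count for B8 minus the count for B13, which is consistent with the figures in Table C, where 927 - 89 = 838. A block that is never reached, such as B10, reads back as zero because a Counter returns 0 for missing keys.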
Referring back to FIG. 2, in step 205 the optimizer program 114 assigns a placement order to each basic block based upon the execution data recorded in step 203. The placement order assigned to a basic block reflects how many times the block is accessed during execution of the instrumented computer program. For example, basic blocks that are rarely accessed (i.e., "dead code") are assigned a low placement order, while basic blocks that are commonly executed (i.e., "live code") are assigned a high placement order. Those skilled in the art will appreciate that many methods exist for determining how many times a basic block will be accessed during execution of a computer program. For example, a programmer with knowledge about when and how often basic blocks are accessed may manually assign placement orders. Alternatively, instrumentation code such as a call to a library routine may be inserted into each basic block to record when the basic block is accessed. Those skilled in the art will appreciate that many methods exist for determining an optimal placement order for the basic blocks, and those methods may be used separately or in conjunction with the methods described herein.
In one embodiment of the present invention, a process called "run building" may be used to order basic blocks in such a way so as to maximize the probability of executing straight line code. Run building improves locality by reducing the number of "jumps" that must be taken. FIG. 6 is an overview flow diagram of a method for determining an optimal placement order using a run builder in accordance with this embodiment of the present invention. In step 601, the run builder orders all edges by execution count. An edge is an exit instruction in a basic block; it defines the flow of control from one block, called a source block, to another block, called a destination block.
In steps 602-605, the run builder iterates over each edge, determining whether the edge's source block may be joined with the edge's destination block. If neither the source nor the destination blocks have been previously joined to a different block, then the run builder assigns consecutive placement orders to the source and destination blocks. In another embodiment of the present invention, a separation process is used to order blocks in such a way so as to group basic blocks together based on the number of times during execution of the computer program that each basic block is executed. Using the separation process, basic blocks that are executed frequently are grouped together and basic blocks that are executed infrequently are grouped together. FIG. 7 is an overview flow diagram of a method for determining an optimal placement order using a separator program in accordance with this embodiment of the present invention. In step 701, the separator determines whether there are any basic blocks that have not been placed into the optimized computer program. If there are basic blocks that have not been placed into the optimized computer program, then in step 705 the separator selects one of these basic blocks and compares the selected basic block's execution count (part of the execution data) with a predetermined separation value. If the selected basic block's execution count is greater than the predetermined separation value, then in step 706 the separator appends the selected basic block to a list of "active" basic blocks. If the selected basic block's execution count is less than or equal to the predetermined separation value, then in step 707 the separator appends the selected basic block to a list of "inactive" basic blocks. Different numbers may be assigned to the separation value, depending upon the type of block separation desired. 
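Both ordering strategies can be sketched briefly. The Python example below is a minimal illustration under stated assumptions: build_runs is a greedy chaining variant that is one possible reading of the run builder (the text gives only the joining condition), the edge counts in EDGES are hypothetical values chosen for the example, and separate follows steps 705-707 using the block counts of Table C. All function and variable names are invented here.

```python
def build_runs(blocks, edges):
    """Greedy run building: visit edges by descending count and chain blocks."""
    succ, pred = {}, {}

    def head_of(block):
        while block in pred:
            block = pred[block]
        return block

    for src, dst, _count in sorted(edges, key=lambda e: e[2], reverse=True):
        if src == dst or src in succ or dst in pred:
            continue                  # one end is already joined elsewhere
        if head_of(src) == dst:
            continue                  # joining would close a loop
        succ[src], pred[dst] = dst, src

    order = []
    for block in blocks:              # emit each run starting from its head
        if block in pred:
            continue
        while block is not None:
            order.append(block)
            block = succ.get(block)
    return order


def separate(counts, separation_value=0):
    """Split blocks into "active" and "inactive" lists by execution count."""
    active, inactive = [], []
    for bid, count in counts.items():
        (active if count > separation_value else inactive).append(bid)
    return active, inactive


# Block execution counts from Table C.
COUNTS = {"B1": 89, "B2": 89, "B3": 927, "B4": 59, "B5": 59, "B6": 59, "B7": 59,
          "B8": 927, "B9": 838, "B10": 0, "B11": 0, "B12": 0, "B13": 89}
# Hypothetical edge counts (source, destination, count), for illustration only.
EDGES = [("B3", "B8", 868), ("B8", "B9", 838), ("B9", "B3", 838),
         ("B1", "B2", 89), ("B2", "B3", 89), ("B8", "B13", 89), ("B3", "B4", 59)]
```

With these inputs, build_runs places B3, B8 and B9 consecutively so that the hot loop body falls through instead of jumping, and separate(COUNTS, 0) puts B10, B11 and B12 on the inactive list.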
For example, if a zero value is used as the predetermined separation value, then dynamically dead code will be separated from dynamically live code.
Referring back to FIG. 2, after an optimal placement order has been determined, in step 207 the basic block linker program 112 produces an optimized computer program by reordering the basic blocks according to the determined optimal placement order. To reorder the basic blocks, the basic block linker ("BBLinker") program 112 re-links the basic blocks according to their assigned placement orders. FIG. 8 is an overview flow diagram of this re-linking process. In step 801, the BBLinker program loads all of the basic blocks in the computer program 116 into the main memory 104. In step 803, the BBLinker program orders the basic blocks according to their assigned placement orders. At this time, the BBLinker also notes the new address of each block. In step 805, the BBLinker reviews the exit instruction of each block to determine if modifications are required to reflect the new address of each basic block, and, if modifications are required, makes the necessary modifications. In step 807, the BBLinker modifies any references to the reordered basic blocks to reflect the new ordering and updates the symbol table to reflect the new addresses. Optimization of the computer program 116 is now complete. In step 809, the BBLinker program copies the optimized computer program to the secondary memory 106.
Table D shown below demonstrates some of the code transformations that may be made to the basic blocks shown above in Table C. These transformations include grouping frequently executed blocks together, grouping infrequently executed blocks together, adjusting jump instruction indexes, and inverting the test on conditional branches.
TABLE D
Execution Count   Address   Instruction   Assembled Instruction
Id: B1 Size: 0x11(17) BidFollower: B2 BidTarget: B10
89 0075FE00 53 push ebx
0075FE01 56 push esi
0075FE02 57 push edi
0075FE03 8B 44 24 14 mov eax,dword ptr [esp+14]
0075FE07 8B F8 mov edi,eax
0075FE09 8B 74 24 18 mov esi,dword ptr [esp+18]
0075FE0D 85 F6 test esi,esi
0075FE0F 74 EC 01 0A 00 je 00800000
Id: B2 Size: 0xf(15) BidFollower: B3 BidTarget: nil
89 0075FE14 C7 06 FF FF FF FF mov dword ptr [esi],FFFFFFFF
0075FE1A 8B 4C 24 10 mov ecx,dword ptr [esp+10]
0075FE1E BB 26 00 00 00 mov ebx,00000026
Id: B3 Size: 0x4(4) BidFollower: B4 BidTarget: B8
927 0075FE23 38 19 cmp byte ptr [ecx],bl
0075FE25 74 14 je 0075FE3B
Id: B4 Size: 0x9(9) BidFollower: B5 BidTarget: B7
927 0075FE27 8A 11 mov dl,byte ptr [ecx]
0075FE29 88 10 mov byte ptr [eax],dl
0075FE2B 41 inc ecx
0075FE2C 84 D2 test dl,dl
0075FE2E 74 03 je 0075FE33
Id: B5 Size: 0x3(3) BidFollower: B6 BidTarget: B7
838 0075FE30 40 inc eax
0075FE31 EB DF jmp 0075FE23
Id: B6 Size: 0x8(8) BidFollower: B7 BidTarget: nil
89 0075FE33 2B C7 sub eax,edi
0075FE35 5F pop edi
0075FE36 5E pop esi
0075FE37 5B pop ebx
0075FE38 C2 0C 00 ret 000C
Id: B7 Size: 0x5(5) BidFollower: B8 BidTarget: nil
59 0075FE3B 83 3E FF cmp dword ptr [esi],FF
0075FE3E 75 0B jne 0075FE4A
Id: B8 Size: 0x5(5) BidFollower: B9 BidTarget: B13
59 0075FE40 38 59 01 cmp byte ptr [ecx+01],bl
0075FE43 74 06 je 0075FE4A
Id: B9 Size: 0x6(6) BidFollower: nil BidTarget: B3
59 0075FE45 8B D0 mov edx,eax
0075FE47 2B D7 sub edx,edi
0075FE49 89 16 mov dword ptr [esi],edx
Id: B10 Size: 0x1(1) BidFollower: B11 BidTarget: B13
59 0075FE4B 41 inc ecx
0075FE4C EB D6 jmp 0075FE27
The following three basic blocks are relocated away from the rest of the basic blocks because they were not executed during execution of the instrumented computer program.
Id: B11 Size: 0xd(13) BidFollower: B12 BidTarget: nil
0 00800000 8B 4C 24 10 mov ecx,dword ptr [esp+10]
00800004 8A 11 mov dl,byte ptr [ecx]
00800006 88 10 mov byte ptr [eax],dl
00800008 41 inc ecx
00800009 84 D2 test dl,dl
0080000B 74 DC je 0075FE41
Id: B12 Size: 0x2(2) BidFollower: B13 BidTarget: B12
0 0080000D 8B FF mov edi,edi
Id: B13 Size: 0xa(10) BidFollower: nil BidTarget: nil
0 0080000F 40 inc eax
00800010 8A 11 mov dl,byte ptr [ecx]
00800012 88 10 mov byte ptr [eax],dl
00800014 41 inc ecx
00800015 84 D2 test dl,dl
00800017 75 F6 jne 0080000F
00800019 E9 FF F5 FE 23 jmp 0075FE41
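Two of the transformations listed for Table D, adjusting jump displacements and inverting the test on a conditional branch, come down to simple arithmetic for 2-byte x86 short branches. The Python sketch below shows that arithmetic. The function names are assumptions of this example, and it covers only the short (8-bit displacement) encodings in the opcode range 0x70-0x7F, where the two senses of a condition differ in the low opcode bit.

```python
def rel8_target(address, displacement_byte):
    """Target of a 2-byte short branch at `address` with the given displacement byte."""
    disp = displacement_byte - 0x100 if displacement_byte >= 0x80 else displacement_byte
    return address + 2 + disp         # displacement is relative to the next instruction


def encode_rel8(opcode, address, target):
    """Encode a short branch at `address` that transfers control to `target`."""
    disp = target - (address + 2)
    if not -128 <= disp <= 127:
        raise ValueError("target is out of range for a short branch")
    return bytes([opcode, disp & 0xFF])


def invert_short_jcc(opcode):
    """Flip the condition of a short conditional branch, for example je <-> jne."""
    if not 0x70 <= opcode <= 0x7F:
        raise ValueError("not a short conditional branch opcode")
    return opcode ^ 1
```

As a check against the tables, rel8_target(0x0075FE3F, 0xDF) gives 0x0075FE20, the target of the jmp in block B9 of Table A. The branch `75 11 jne 0075FE35` at 0075FE22 in Table A appears in Table D at 0075FE25 as `74 14 je 0075FE3B`, and encode_rel8(invert_short_jcc(0x75), 0x0075FE25, 0x0075FE3B) reproduces those two bytes.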
In other embodiments of the present invention, further optimizations may be made to the computer program 116 by replacing rarely executed instructions with other instructions that require a smaller amount of storage space. For example, a conditional branch instruction typically occupies 5-6 bytes of storage space, while a conditional branch-to-self instruction typically occupies 2 bytes of storage space. After using the previously-described methods to identify and separate live code blocks from dead code blocks, conditional branches from the live code to the dead code may be identified and replaced by conditional branch-to-self instructions. For each replacement, a savings of 3-4 bytes of storage space is realized. The execution of a conditional branch-to-self instruction results in an execution of an "infinite loop" when the condition is met (i.e., transfer to a rarely executed code block). A monitor process detects when a program is executing such an infinite loop and causes the program to branch to the rarely executed code. FIG. 9 is a flow diagram of a method used in this alternate embodiment of the present invention to identify conditional branch instructions within the computer program and replace each conditional branch instruction with a conditional branch-to-self instruction. In steps 901-903, the optimizer program 114 examines the executable image, searching for a conditional branch instruction that branches from live code to dead code. In step 905, the optimizer program 114 stores the address, or index from some location within the executable image, of the located conditional branch instruction and the address of the conditional branch instruction's target instruction in a storage data structure such as a table. In step 907, the optimizer program 114 replaces the located conditional branch instruction with a conditional branch-to-self instruction. 
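The bookkeeping behind the branch-to-self replacement and the later redirection by a monitor can be modeled without touching real machine code. The Python sketch below is a simplified model, not the implementation of this embodiment. The class CondBranch, the functions plan_replacements and monitor_step, the assumed branch sizes, and the extent of the dead range are all hypothetical, and the address adjustments that would follow from shrinking an instruction are ignored.

```python
from dataclasses import dataclass


@dataclass
class CondBranch:
    """Hypothetical model of one conditional branch in the executable image."""
    address: int
    target: int
    size: int = 6      # assumed size of a long conditional branch, in bytes


def plan_replacements(branches, dead_ranges):
    """Record live-to-dead branches in a table and turn them into branch-to-self."""
    table = {}
    for br in branches:
        if any(lo <= br.target < hi for lo, hi in dead_ranges):
            table[br.address] = br.target   # remember the original target
            br.target = br.address          # the branch now targets itself
            br.size = 2                     # a short branch-to-self is 2 bytes
    return table


def monitor_step(pc, branches_by_addr, table, condition_met):
    """Return the address at which the monitored program should continue."""
    br = branches_by_addr.get(pc)
    looping = br is not None and br.target == br.address and condition_met
    if looping and pc in table:
        return table[pc]                    # resume at the rarely executed code
    return pc                               # nothing to do


# Branch addresses and targets taken from Table D; the dead range is assumed.
BRANCHES = [CondBranch(0x0075FE0F, 0x00800000),
            CondBranch(0x0075FE25, 0x0075FE3B, size=2)]
TABLE = plan_replacements(BRANCHES, [(0x00800000, 0x0080001E)])
```

Only the branch at 0075FE0F is rewritten, since its target lies in the relocated dead code. When the monitor observes the program counter resting on that address with the condition satisfied, the table lookup supplies 00800000 as the place to continue.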
Execution of the conditional branch-to-self instruction will cause an infinite loop to occur when the condition is met. This alternate embodiment also provides a monitoring process to monitor the executing computer program, detect an infinite loop, and take appropriate action. FIG. 10 is a flow diagram of a monitoring process used in this alternate embodiment of the present invention. When the monitoring process detects that an infinite loop is occurring (step 1003), in step 1005 the monitoring process determines the address of the instruction which caused the infinite loop and then searches the storage data structure for an entry matching the address. To detect that an infinite loop is occurring, the monitoring process reads the address stored in the program counter, determines which instruction is stored at that address, and, if the instruction stored at that address is a conditional branch-to-self, determines if the condition has been satisfied. One method of determining the address of the instruction which caused the infinite loop is reading the address currently stored in the program counter. If a matching entry is found in the storage data structure (step 1006), then in step 1007 the monitoring routine causes the monitored computer program to continue execution at the target instruction corresponding to the entry in the storage data structure. Preferably, the monitoring process is a background process, that is, the monitoring routine is assigned a lower priority than the executing computer program in the computer system's allotment of time to tasks so that the monitoring routine only gets a small percentage of processing time. Although the present invention has been described in terms of a preferred embodiment, it is not intended that the invention be limited to this embodiment. Modifications within the spirit of the invention will be apparent to those skilled in the art; the scope of the present invention is defined by the claims which follow.
