Method and system for simulating execution of a target program in a simulated target system6973417Abstract A method and system for simulating the execution of a software program on a simulated hardware system. An instrumented software program is divided into program segments delineated by tags and is then analyzed for data describing the program segments. The data is tabulated and indexed in a function data table according to the program segments. Hardware parameters that at least define a portion of the simulated hardware system are tabulated in a hardware configuration file. The software program is executed on a host system, and when a tag is executed, data indexed in the function data table under the program segment corresponding to the executed tag and hardware parameters tabulated in the hardware configuration file are used to calculate an estimated execution time for the program segment corresponding to the executed tag. The estimated execution time for the program segment is added to a running total for the overall execution time of the software program. The system includes a memory for storing software parameters describing program segments of the software program, and for storing hardware parameters that at least partially define the simulated hardware system. The system further includes a processor for calculating an estimated execution time for the software program using the software and hardware parameters. Claims 1. A method for simulating the execution of a software program on a simulated hardware system at least partially defined by hardware parameters, the method comprising: Description TECHNICAL FIELD
Note that in general, the above data may be tabulated multiple times for the same code segment, but with different predicted branch tag numbers. It will be appreciated that more or less data for each code segment may be tabulated in the function data table 26. A more detailed description of the assembly analyzer 24 will be provided below. Where the software developer has been provided only with a compiled version of the target software rather than the source code, which may be the case where a portion of the target software has been written by a different software developer, a library data table 34 may be provided in place of the function data table 26. The library data table would contain similar data as that tabulated in the function data table 26 such that the library data table 34 can be referenced to calculate the estimated execution time for the code segments of the compiled version of the target software. One preferred method for invoking the information in the library data table is to instrument the available source code with external function tags which indicate when a library function is being called. A technique for doing this is described in the patent application "Instrumentation Tagging for Testing and Debugging a Computer Program having Source Code Unavailable for Instrumentation." by Sidney R. Maxwell, bearing application Ser. No. 09/234,785, filed Jan. 20, 1999 and commonly owned with the present application. Other techniques for invoking the information in the library data table may also be used. When encountered in the program, these external function tags would access the information in the library data table. It is also possible to use instruction-level instrumentation in the host version of the library to accomplish a similar effect. The instrumented source code 18 is compiled again using the host compiler 38 to provide a host executable 40 that will eventually be executed in the host system 30. Whenever a coverage tag in the host executable 40 is executed by the host system 30, a Tag function is called by the executed coverage tag to reference the function data table 26 and obtain the data indexed under the unique number provided to the Tag function by the executed coverage tag. As will be explained in more detail below, the Tag function calculates the execution time of the corresponding code segment and adds this value to a running total for the total target program execution time. In addition to referencing the function data table 26, the Tag function references a hardware configuration file 44 to obtain hardware parameters defining a simulated target hardware system 46. The target hardware includes a target processor 48 as well as target circuitry, such as a target memory 49. These hardware parameters may be provided by the software developer. Prior to executing the host executable 40 on the host system 30, the data of the hardware configuration file 44 is provided to the host system 30 during an initialization sequence. The hardware parameters stored in the hardware configuration file 44 include parameters that can be observed directly from the hardware design of the target hardware 46, as well as parameters that may be calculated analytically from a known design or estimated experimentally. For example, some hardware parameters that may be directly observed from the hardware design are:
Another example of possible hardware parameters in another embodiment include the following: Target processor parameters:
Program store parameters:
Data store parameters:
Volatile store parameters:
Some examples of hardware parameters that can be estimated experimentally are the data look-ahead factor, the real-time operating system (RTOS) kernal overhead percentage, and the RTOS interrupt instructions. The data look-ahead factor can be estimated by running a program which creates a data array that is too large to fit in the data cache, and then runs a set of experiments with increasing amounts of computation between data accesses. In each experiment, execution time is measured for the same computation with and without data accesses. In the first experiment, each element of the data array is read successively with an absolute minimum of computation. This measurement is then repeated with essentially the same computation, but without the data array accesses. The difference between these two execution times provides a first estimate of the data access time. This experiment is then repeated with increasing amounts of computation between data accesses. Results of these experiments can be analyzed by plotting data access time as a function of computation time. What should be observed is that the amount of time consumed by data accesses decreases approximately linearly from some upper limit for very small amounts of computation to a lower limit for relatively large amounts of computation. The slope of this curve is the data look-ahead factor. The lower limit of the data access time should be noted as well. In order to experimentally estimate the RTOS kernel overhead percentage, the execution time of the test program is measured both when the program is run as a stand-alone program on the target hardware 46, and when the program is run as a task by the RTOS. Any difference in execution time is assumed to be RTOS overhead. This experiment should probably be run with several test programs of different length in order to make sure that the modeling method is valid, as well as to obtain more accurate parameters. The RTOS interrupt instructions parameter must be determined through a hardware measurement. Ideally, one would want to measure the time between assertion of the hardware interrupt and access to the address of the associated interrupt service routine (ISR). However, this method may not be possible where there is instruction caching. That is, the ISR could be present in the instruction cache, so that the address of the first instruction in the ISR might never show up on the address bus. In order to avoid this difficulty, the ISR in the test program should write immediately to a static address chosen from the memory map of the test circuit. The total interrupt latency can then be measured as the delay between the assertion of an interrupt at the pin of the processor and the appearance on the address bus of the write address from the ISR. Once the total interrupt latency has been measured, it can be decomposed into its component parts. The interrupt latency should be measured with and without the RTOS. In the absence of the RTOS, the interrupt latency should turn out to be the sum of the hardware latency plus the function call time. However, this assumption should be verified by the software developer. With the RTOS, any increase in interrupt latency will be attributed to the RTOS. It will be appreciated that the type of hardware parameters included in the hardware configuration file 44 may include more or less than that described above and still remain within the scope of the present invention. As with the types of code segment data tabulated in the function data table 26, the types of data that may be included in the hardware configuration file 44 described herein is not intended to be exhaustive. As mentioned previously, when a coverage tag is executed by the host system 30, a Tag function is called to calculate the execution time of the code segment corresponding to the executed coverage tag. The execution time of a code segment is calculated with respect to the following variables:
The execution time for a program segment consists of three components: time spent in the CPU processing the instructions, time spent getting the next instructions, and time spent reading and writing data. Although it will be possible to accurately estimate the CPU processing time and the instruction access time, it will not be possible to accurately estimate the data access time because that will, in general, be data-dependent. The most general version of the estimation procedure will therefore yield a minimum, maximum, and expected execution time, along with an estimation variance. For processor architectures which involve pipelining, branch prediction and pre-fetching, the number of clock cycles spent executing a program segment in the CPU will, in general, be a function of the program segment which follows it. That is, there will be clock cycles wasted due to branch mis-prediction. Also, the time taken to execute a switch statement may be an even more complex function of the branch taken. Thus, to make the solution suitably general, the number of CPU cycles is expressed as a function of the preceding tag and the current tag. Thus, the tag table will have entries for a number of possible pairs of tags. The resulting equation is Namely, the number of instructions, which is a function of the tag number and the predicted branch tag, is multiplied by the CPU cycle time. The amount of time spent reading instructions from memory is heavily dependent on instruction caching behavior. It is assumed, however, that instruction processing will be performed in parallel with instruction access, so that the cpu cycle time can be subtracted from the instruction access. The time taken to get an instruction from program store must take into account that instructions will usually be fetched as part of a cache line (and also the number of wait states for the program store). This calculation also takes into account the fact that the effects of pipeline delay in the CPU can be eliminated by counting the program segment length at a fixed location in the CPU pipe. If the result of the equation is less than zero, the answer is set equal to zero. It is assumed that processing must stop to await a data access. This is a somewhat pessimistic assumption; but anything more precise must take into account the fact that there may be instruction accesses as well as data accesses during a given program segment. If the processor contains a data cache, then the data access time is otherwise Similarly, there will be some program segments for which the data access is to volatile memory. The data access time for these program segments is In an alternative embodiment, the execution time of a code segment is calculated based on the following principles and equations: The first component of the code segment execution time, the CPU time, is the time needed to execute the instructions of the code segment, including any affect from parallel instruction execution and branch prediction. The CPU time can be calculated from the equation: where the number of CPU cycles to execute the code segment is obtained from the function data table 26 and, as will be explained in greater detail below, the branch prediction penalty may be experimentally obtained. The CPU cycle period is obtained from the hardware configuration file 44. The second component of the code segment execution time, the instruction access time, is the additional time required to make the instruction available for execution by the target processor 48, and includes the effects of caching and the wait states associated with program memory. Single-level caches, as well as two-level caches having a primary L1 cache and a secondary L2 cache, are accommodated in the instruction access time calculation. The instruction access time may be calculated from the equation: The resulting instruction access time is added to the CPU time when it is greater than or equal zero. If the result is less than zero, the instruction access time is ignored. The program memory access time per instruction may be calculated from the equation: Information related to the size of the L1 and L2 caches, cache access time, memory access time, bus cycle time, and memory wait states are provided by the hardware parameters of the hardware configuration file 44. The number of instructions not in the caches, and consequently, the number of instructions that must be loaded from program memory, will be calculated based on information provided from an Instruction Cache model of the simulated target processor 48. The Instruction Cache model will be explained in greater detail below. The third component of the code segment execution time, the data access time, is the additional time required to make the data available for the target processor 48, and includes the effects of data caching, look-ahead accesses, and access to volatile memory. The data access time may be calculated from the equation: The resulting data access time is added to the CPU time and instruction access time when it is greater than or equal to zero. Otherwise, the data access time is ignored. The data memory access time per instruction may be calculated from the equation: and the volatile memory access time may be calculated from the equation: The hardware parameters, such as the data memory access time per byte, and data and volatile memory wait states, are obtained from the hardware configuration file 44. It will be appreciated that the various equations provided above for calculating the code segment execution time may be modified and remain within the scope of the present invention. The equations may be modified in light of the hardware configuration of the simulated target hardware 46. For example, if the simulated target processor 48 has only a single-level instruction cache, the equation for calculating the instruction access time will be to: If the result of the equation is less than zero, the answer is set equal to zero. Two additional factors which must be considered in a real-time system are interrupt latency and RTOS kernal overhead. These two factors can be estimated using the following parameters:
The first two parameters can be derived from data provided by the RTOS supplier, while the third parameter can be determined through a hardware measurement. Ideally, one would want to measure the time between assertion of the hardware interrupt and access to the address of the associated ISR; however that measurement could be invalidated by instruction caching. That is, the ISR could be present in the instruction cache, so that the address of the first instruction in the ISR might never show up on the address bus. In order to avoid this difficulty, a test program should be written in which the ISR writes immediately to a static address chosen from the memory map of the test circuit. The total interrupt latency can then be measured as the delay between the assertion of an interrupt at the pin of the processor and the appearance on the address bus of the write address from the ISR. This total interrupt latency should be measured with and without the RTOS. Once the total interrupt latency has been measured, it can be decomposed into its component parts. The interrupt latency should be measured with and without the RTOS. In the absence of the RTOS, the interrupt latency will be the sum of the hardware latency plus the function call time. With the RTOS, any increase in interrupt latency will be attributed to the RTOS. The interrupt latency must be added to the execution time every time an ISR is called. This delay is calculated as In addition, the RTOS interrupt handler may need to be loaded into instruction cache, causing additional delay essentially as described in the equation for tinstruction. If the RTOS interrupt handler is not described in the Library Function Table, then the microkernal size, Nkernal, may be used to estimate the time to load the cache. As will be appreciated, other techniques that fall within the scope of the invention may also be used, for example, the equations for calculating the instruction access time may be: This equation may further be modified depending on the desired accuracy of the instruction access time calculation. For example, if the software programmer determines that the effect of the instruction look-ahead factor is negligible, then the deduction from the instruction access time due to this factor can be ignored, resulting in a modified instruction access time equation: Modifying the time equations to adjust the desired accuracy may be helpful to the software programmer where rough time calculations will suffice. Two additional factors that should considered when estimating the execution time of a code segment in a real-time operating system are interrupt latency and RTOS kernal overhead. The interrupt latency may be estimated from the equation: This equation assumes that the interrupt latency is the sum of the hardware and software interrupt latencies, and further assumes that the software interrupt latency may be estimated by the RTOS interrupt latency. The RTOS kernal overhead may be calculated from the equation: where the RTOS overhead percentage is obtained from the hardware configuration file 44 and the segment execution time is provided by the Tag function. This equation assumes that the RTOS kernal is invoked at every clock tick and performs about the same amount of work regardless of conditions. However, this assumption may not be accurate, and the software developer should determine its validity prior to simulation. As mentioned previously, the instrumented source code 14 is compiled once using the target system compiler 20, and another time using the host system compiler 38. The resulting compiled target software 22 is analyzed by the assembly analyzer 24 which generates data regarding the program segments defined by the coverage tags instrumented in the source code 14, such as the number of instructions and data accesses between the coverage tags, the stack size, and similar data. The various data is organized into the function data table 26 and indexed by the tag numbers corresponding to the tag values of the coverage tags. The tabulated data is referenced by the host system 30 for calculating the execution time. The resulting host system executable 40 is executed on the host system 30. As shown in FIG. 2A, the alternative embodiment 2B and 3, upon initiating the execution of the host system executable 40, a Connect to Host function 56 will request the hardware parameters 45 from a processor block 50 of the host system 30 (step 102 of FIG. 3). As mentioned previously, the hardware parameters 45 are obtained by the processor block 50 from the hardware configuration file 44, which includes hardware parameters such as the CPU cycle time, the size of the cache, volatile memory access time, and the like. After the Connect to Host function 56 obtains the hardware parameters from the processor block 50, it constructs an Instruction Cache object 58 as directed by the processor block 50, and stores the rest of the hardware parameters obtained in a shared memory 52. The host system executable 40, which includes the coverage tags instrumented into the source code 14 by the software developer, is executed (step 104) after the initialization sequence. When the coverage tags are executed by the host system 44 (step 108), the coverage tags call a Tag function 60 that performs the execution time calculation and provides the Tag function 60 a tag number (step 110) that corresponds to a code segment defined in the function data table 26 created by the assembly analyzer 24. The Tag function 60 references the function data table 26 (step 112) and obtains data indexed under the tag number, such as the number of instructions for the code segment, the number of CPU cycles, stack data size, and the name of the code segment corresponding to the tag number. As discussed previously, the Tag function 60 calculates a total estimated execution time 70 by summing together a CPU time 62, an instruction access time 64, and a data access time 68. The general equations that are used by the Tag function 60 to compute these times use hardware parameters obtained from the hardware configuration file 44 and the data tabulated in the function data table 26 constructed by the assembly analyzer 24. The Tag function maintains a running total for the total execution time 70 of the target software on the simulated target processor 48. A running total for each portion of the total execution time 70 may also be maintained by the Tag function 60, that is, separate running totals for the CPU time 62, the instruction access time 64, and the data access time 68. After each coverage tag is executed by the host system 30, the Tag function 60 calculates the execution time for the corresponding code segment and adds the execution time to the running total. The equations of the following example are specific applications of the more general equations for calculating the code segment execution time that were discussed previously. The following equations have been simplified from the more general equations in order to facilitate explanation and have been provided as merely examples. The Tag Function 60 calculates the CPU time 62 of the total segment execution time 70 (step 114) from the data obtained from the function data table 26 and the hardware configuration file 44. Namely, the CPU time 62 may be calculated by using the equation: where the number of CPU cycles for the segment corresponding to the tag number is obtained from the tabulated data, and the CPU cycle period is provided by the hardware configuration file 44. Namely, the number of instructions, which is a function of the tag number and the predicted branch tag, is multiplied by the CPU cycle time. As an alternative, the CPU time 62 may be calculated using the equation. where the number of CPU cycles for the segment corresponding to the tag number is obtained from the tabulated data, and the CPU cycle period is provided by the hardware configuration file 44. Alternatively, the CPU time 62 can be calculated using the equation: where the simulated target processor 48 employs a branch prediction technique. The value of the branch prediction penalty may be determined experimentally. After calculating the CPU time 62, the Tag function 60 provides the segment name, number of instructions, tag number, and a cache flag to the Instruction Cache object 58 to calculate the instruction access time (step 116). The Instruction Cache object 58 maintains an image of the instruction cache of the simulated target processor 48 and carries out the calculation for the instruction access time 64 of the total execution time 70. The contents of the instruction cache image may be viewed by the software developer on a display 74. In calculating the instruction access time, the Instruction Cache object 58 determines whether an instruction cache is used by checking the status of the cache flag from the function data table 26. If the cache flag indicates that an instruction cache is not used, the Instruction Cache object 58 calculates the instruction access time using the equation: Alternatively, the Instruction Cache object 58 may calculate the instruction access time using the equation: where the number of instructions is obtained from the function data table 26, and the program memory access time per instruction is the product of the bus cycle time and the program memory wait states, as discussed previously. In the case where an instruction cache is employed, the Instruction Cache object 58 determines whether the code segment defined by the currently executed coverage tag is present in the instruction cache image. If the code segment is present, a zero is returned by the Instruction Cache object 58 to the Tag function 60 which represents the fact that no time will be added to the total execution time 70 for instruction access time. However, if the code segment is not present, the contents of the instruction cache image are removed according to the caching policy, and the currently executed code segment must be loaded. The time to load the code segment may be calculated in two ways, depending on whether the simulated target processor 48 utilizes a single- or two-level cache design. In a single-level cache design, the time to load the code segment may be calculated using the same equation as for calculating the instruction access time where the instruction cache is not used. However, for a two-level cache design, having a primary L1 cache and a secondary L2 cache, the time may be calculated using the equation: Alternatively, the equation: may be used where the L2 access time per instruction is provided by the hardware configuration file 44. After the instruction access time 64 is calculated, the Instruction Cache object 58 updates the instruction cache image in the shared memory 52 so that the software developer can view the most recent contents even if the target software has stopped. The data access time 68 is calculated by the Tag function 60 in a manner similar to the instruction access time 64 (step 118). That is, the data access time 68 is the time necessary to load the data cache with the data not already present. The relevant data to calculate the data access time 68 is obtained by the Tag function 60 from the function data table 26 and the hardware configuration file 44. The Tag function 60 sums together the CPU time 62, the instruction access time 64, and the data access time 68, and adds the result to the running total for the total estimated execution time 70 of target software for the simulated target processor 48 (step 120). The software developer can view the running total of the execution time and the contents of the instruction cache on the display 74 by accessing the shared memory through a user interface. An example of such a user interface is provided in the Rees et al. patent, previously referenced. The user interface requests the processor block 50 of the host system 30 to display the contents of the instruction cache image. The processor block 50 allocates a section of the shared memory 52 to store instruction cache display data, and causes a Instruction Cache Display function 80 to be constructed. The processor block 50 further provides an offset of the shared memory 52 to the Instruction Cache Display function 80 and to the Instruction Cache object 58. The Instruction Cache object 58 updates the image of the instruction cache, and the Instruction Cache Display function 80 displays the data found in the shared memory 52. Shown in FIGS. 4A-C is an embodiment of an assembly analyzer 150 that may be substituted for the assembly analyzer 24 of FIG. 1. The software developer's instrumented code 18 (FIG. 1) is assembled using the target compiler 20 to generate assembly files 154a-b. The assembly files 154a-b are placed in an assembly directory 156. There may be one or more assembly directories, such as 156a-c, containing the assembly code for the user's program. An optional memory map file 158 lists the assembly files 154a-b that describe code which will be placed in memory other than instruction-cached RAM, which is the default memory type. For example, some assembly files may contain initialization code which will be stored in FLASH memory (not shown). In one embodiment, the syntax for the optional memory map file 158 is:
As mentioned previously, a function data table 160 is produced by the assembly analyzer 150. The function data table 160 contains the names of the function and a table with one line of data for every branch of each of the functions analyzed. In one embodiment, the format of the function data table 160 is:
@! A main program 164 shown in FIG. 4A is responsible for constructing a compiler object 166 and a target object 168, routing the assembly files 154a-b to the compiler and target objects, 166 and 168, respectively, and producing the resulting function data table 160. The compiler object 166 is responsible for parsing the assembly files 154a-b and passing the individual instructions to the target object 168. There is typically a different type of compiler object (e.g., derived class) 166 for each compiler family supported. A new class derived from the base class compiler should be written, however, no other significant changes to the program will be required in order to support an additional processor type. The target object 168 is responsible for interpreting the execution flow of individual instructions of the assembly files 154a-b, accumulating the execution time description for each code segment, and writing out the function data table 160. There are different types of target objects (e.g., derived class) 168 for each processor or processor family supported. To support an additional processor type, a new class derived from the base class target may need to be written. However, no other significant changes to the program will be required. The flow chart shown in FIGS. 4A-C further illustrates the relationship between the various elements previously mentioned. The software developer invokes the assembly file analyzer 150, possibly through a shell script or similar process automation (step 200). The analyzer 150 may be invoked with the following command line options:
Once invoked, the main program 164 interprets the command line arguments to obtain a list of assembly directories 156a-c, an extension for the assembly files 154a-b, a memory map file 158, a compiler type, a target type, and a filename for the function data table 160 (step 202). The main program 164 then constructs the target object 168 (step 204) and the compiler object 166 (step 206), and provides a pointer to the target object as one of the arguments to the constructor for the compiler object. The main program 164 further obtains a directory for the assembly directory 156 (step 208) and searches the directory for an assembly file 154 with the specified file extension (step 210). The name of the assembly file 154 is supplied to the compiler object 166. The main program 164 searches the memory map file 158 for the name of the assembly file 154 to be processed (step 212). If the main program 164 finds the name, it sends the memory type to the target object 168 (steps 214-216). At this time, the compiler object 166 opens the assembly file 154 (step 218) and identifies individual instructions (step 220). The op-code, operands, and most recent label are sent to the target object 168. The target object 168 identifies the beginning and end of the individual code segments of the assembly file 154 and accumulates the execution time data for each such segment until the end of the file is reached. A list of function names from the instruction labels is also accumulated by the target object 168 (steps 220-224). Steps 210 through 224 are repeated for each assembly file 154 in the assembly directory 156 (step 226). Similarly, steps 208 through 226 are repeated for each assembly directory 156 in the list of assembly directories (step 228). After completing the list of assembly directories, the main program 164 provides the name for the function data table 160 to the target object 168. The target object 168 sorts the list of function names into alphabetical order, outputs the list of function names to the function data table 160, and sorts the function data table 160 in order of ascending tag number (step 230). The function data table 160 is provided by the target object 168 upon completion. An embodiment of a target object 250 that may be substituted for the target object 168 of FIG. 4C is illustrated in FIG. 5. The target object 250 represents an instruction set and architecture for the Motorola/IBM Power PC family of processors in general, and specifically the Motorola PPC 601. However, it will be appreciated that the selection of the specific target processor is made for the sake of providing an example, and should not be interpreted as limiting the scope of the invention. As mentioned previously, there is typically a different type of target object for each target processor or processor family. As explained above, the target object 250 generally performs two functions. First, a stream of instructions is analyzed to identify program code segments, accumulate a set of execution parameters for each code segment, and accumulate a list of function names. As mentioned previously, the program developer's source code 14 is broken down into code segments that are defined by tags inserted by the instrumenter 16 (FIG. 1). Second, the list of function names and code segment execution parameters is output by the target object 250. The target object 250 maintains a function name list 252 of the function names that have been encountered. A tag table list 254 of accumulated tag table entries 256a-d is also maintained by the target object 250. Each tag table entry 256a-d represents a data structure describing the execution parameters for an individual code segment. FIG. 5 further includes a logical block diagram of a CPU instruction pipeline 270 for a Motorola PPC 601 processor in order to illustrate the overall relationship between the target object 250 and the CPU instruction pipeline 270. The CPU instruction pipeline includes several CPU stages, such as Fetch Arbitration (FA) and Cache Arbitration (CARB) CPU stages, 302 and 304, respectively. Each CPU stage represents an execution stage in one of the pipelines of the target processor. Each stage can contain name and instruction information. The CPU stages also have a processing function which takes the instruction's op code as an argument. For example, the Integer Execution (IE) CPU stage 310 counts the number of CPU cycles consumed by a code segment, starting with the first non-null instruction it executes in the segment and ending when it processes the first instruction for the next code segment. The Cache Access (CACC)CPU stage 306 counts the number of data accesses in a given code segment. The Branch Execute (BE) CPU stage 308 is responsible for determining the predicted branch tag. It does so by storing the label for the predicted branch. The label will be used later as an index to the corresponding tag number. The stages are arranged in FIG. 5 in the same dependency order as they are in the target processor. FIG. 6 illustrates the relationship between an instruction 400 and its surroundings in the target processor. The instruction 400 typically includes an op code, operands, and label. It also has pointers to four other instruction objects 402a-d. These pointers can be used as execution synchronization tags required by the superscalar architecture of the PPC 601. From its op code, the instruction 400 can locate its associated execution table 404. The execution table 404 describes the execution sequence of an instruction and the prerequisites for each execution stage in a tabulated format. Registers 406a-n receive information from the instruction execution table 404 and provide data to the instruction 400. The registers 406a-n may be either a general or special purpose registers in the target processor. 406a-n may be either general or special purpose registers in the target processor. Each of the registers 406a-n knows whether an instruction is making exclusive use of it, whether or not its contents have been defined and, if defined, what value it contains. In operation, instructions 400 are read into the FA and CARB stages 302 and 304 until eight instructions 400a-h are present in the FA/CARB stages. Starting from the end of the CPU instruction pipeline 270, each CPU stage is given the opportunity to update its contents. This process will typically involve determining whether it has completed processing the current instruction and, if it has, requesting the next instruction from the preceding CPU stage. The requested instruction 400 uses its instruction execution table 404 to determine whether the resources are available to move to the next CPU stage. If the required resources are available, the instruction 400 moves to the requesting CPU stage, which immediately processes its instruction(s), thus making the result available for any applicable feed-forward operations. The FA and CARB stages 302 and 304 are the last stages to be updated. This, in effect, returns execution to the beginning. The FA/CARB keeps a record of all the instructions it has processed for a given code segment, thus making those instructions available in case there is an inline loop in the segment. The FA is responsible for counting the number of instructions in a code segment. Although the software developer's source code has been described as being separated into code segments, it is more precise to state that the source code is separated into functions, and each function can be separated into one or more code segments. When instrumented with coverage tags, a function will contain a function entry tag, at least one function exit tag, and a branch tag for every branch point in the function. The segmentation of the code is performed in such a way that there is a unique code segment associated with each tag. The algorithm which assigns code segments to tags must therefore take function entry and exit into account. The first code segment in a function extends from the beginning of the function to the second call to the tag function. Subsequent code segments begin with the call to the tag function, and end either at the next call to the tag function, or else at the very end of the function. Detecting the beginning of the function is also useful in that it makes the function name available when displaying the contents of instruction cache. A function begins either when the first instruction or tagged bubble reaches the IE stage 310 or on the cycle after the tag for a blr or rfi instruction completes in the IE. When a new function begins, the label on the first instruction in the function is assumed to be the function name. The IE stage searches the function name list to see if the new label is on the list. If it is, the IE issues a warning message and continues processing. Otherwise, it adds the new label to the function name list. The IE stage subtracts 8 from the constant portion of the second operand in the stwu instruction and then divides by four to get the number of words of stack data. A new segment begins either at the beginning of a function (as above) or, if not at the beginning of a function, on the cycle after the tag for a bl instruction to the label ExportTag completes in the IE. When the tag for a bl instruction (branch with return address in the link register) to ExportTag completes in the IE, the IE uses the value in the last register to be loaded with a constant value as the tag number. When a new segment begins, the IE appends a new Tag Table Entry to the Tag Table List. This Tag Table Entry is initialized with the memory type and function name. A segment ends either when a tag for a blr (branch to link register) or rfi (return from interrupt) instruction completes in the IE or, if not at the beginning of a function, when a tag for a by instruction to the label ExportTag completes in the IE. If at the beginning of a function, the segment ends either when a tag for a blr or rfi instruction completes in the IE or when the second tag for a bl instruction to the label ExportTag completes in the IE. In other words, the first segment of a function contains the first branch to ExportTag, and gets its segment number from that call to ExportTag. All other segments get their segment number from the call to ExportTag which precedes them. There will be cases in which the source level instrumentation does not detect when a loop will be inserted in the target assembly code. These "in-line" loops will occur, for example when an assignment is made to a large struct, causing all the bytes in the struct to be copied from the source to the destination. These loops will always iterate a constant number of cycles, and can therefore be accurately included inside a program segment. During normal operation, the IE maintains a list of all of the labels it has encountered since the beginning of the program segment, and the number of CPU cycles since each label was first encountered. The BE stage detects when a conditional branch is already resolved (for example by having a defined value in the counter (CTR) register). The BE stage then sends the instruction identity, the branch label and number of loop iterations to the IE stage. The IE stage waits until it detects the branch tag for the loop branching instruction. When it detects the branch tag, the IE stage looks for the loop label in its list of labels. If the label is found, the IE calculates the number of additional CPU cycles consumed by the internal loop: This calculation takes into account that fact that the fetching of loop instructions starts one CPU cycle before the branch tag gets to the IE, and the fact that the loop instructions will not have to be fetched for the first loop iteration and the continued execution past the loop. The IE stage adds the CPU cycle increment to the execution time for each of the labels. Performing the calculation in this way allows nested inlined loops to be supported. When a segment ends, the IE stage creates a function data table entry for the predicted branch for any branch tags which were not resolved within the program segment, and a function data table entry with a null (default) predicted branch tag. For each such function data table entry, the number of CPU cycles is taken from the IE stage, the number of data accesses is taken from the CACC stage, the number of instructions is taken from the FA stage, and the predicted branch tag is taken from the BE stage. The number of CPU cycles is decreased by two to account for the instructions needed to load the tag number, and the number of instructions is decreased by three to account for the instructions used to call ExportTag. Alternatively, a code segment may be viewed conceptually as being either a branch segment or a function. A function is generally a sequence of instructions that does return to the branch call when the sequence is finished. A branch segment, on the other hand, is generally a sequence of instructions that does not return from where the branch segment was initially called when the sequence is completed. A function begins either when the first instruction or tagged bubble reaches the IE stage 310 or on the cycle after the tag for a conditional branch-return (blr) instruction completes in the IE stage 310. When a new function begins, the label on the first instruction in the function is assumed to be the function name. The IE stage 310 searches the function name list 252 to see if the new label is on the list. If it is, the IE stage 310 issues a warning message and continues processing. Otherwise, it adds the new label to the function name list 252. The IE stage 310 subtracts 8 from the constant portion of the second operand in the stwu instruction and then divides by four to get the number of words of stack data. A new branch segment begins either at the beginning of a function, as previously described, or on the cycle after the tag for a conditional branch (bl) instruction to the label ExportTag completes in the IE stage 310. When the tag for a bl instruction to ExportTag completes in the IE stage 310, the IE stage 310 uses the value in the last register to be loaded with a constant value as the tag number. When a new segment begins, the IE stage 310 appends a new tag table entry 256 to the tag table list 254. The tag table entry 256 is initialized with the memory type and segment name. The function and branch segment ends either when a tag for a blr instruction or a tag for a bl instruction to the label ExportTag completes in the IE stage 310. When a segment ends, the number of CPU cycles is taken from the IE stage 310, the number of data accesses is taken from the CACC stage 306, the number of instructions is taken from the FA stage 302, and the predicted branch tag is taken from the BE stage 308. The number of CPU cycles is decreased by two to account for the instructions needed to load the tag number, and the number of instructions is decreased by three to account for the instructions used to call ExportTag. From the foregoing it will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention. Accordingly, the invention is not limited except as by the appended claims.
|
Same subclass Same class Consider this |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
