System and method for state restoration in a diagnostic module for a high-speed microprocessor
7043416
Abstract
A system and method are presented for saving and restoring the state of a diagnostic module in a microprocessor. The diagnostic module contains a complex break state machine, capable of halting the microprocessor at specified breakpoints. These breakpoints are based on combinations of instruction locations and/or data values, along with previous machine states. A problem occurs with prior art diagnostic modules when the processor returns from an exception occurring during a fix-up cycle inserted to handle a data load miss associated with an instruction located in a branch delay slot (the location immediately following a conditional branch instruction). Under these circumstances, the exception handler restores the program counter to the location of the branch instruction, causing the branch to be re-executed. The prior art state machine erroneously updates its internal state a second time when the branch is re-executed. According to the system and method disclosed herein, at each state change the previous machine state is saved. Thus, when a branch instruction is re-executed, the complex break state machine of the present invention is restored to its previous state, thereby correcting the error.
Claims
What is claimed is:
Description
BACKGROUND OF THE INVENTION
FIG. 1 illustrates the operation of a pipelined processor, which performs all five operations concurrently on five different instructions. Each of the five operations is completed in one clock cycle, as indicated by the T1 . . . T10 axis along the top of the diagram. Note that INSTR. 2 is fetched into the pipeline at time T2, while INSTR. 1 is being decoded. Similarly, processing of INSTR. 2 is completed at time T6, just one clock cycle after the completion of INSTR. 1. The operation of the pipeline is analogous to an automobile assembly line, where each worker on the line performs a single assembly operation on a car as it moves down the line, as opposed to completely assembling each car before beginning the next. The benefit of the pipeline is apparent in FIG. 1, where six instructions are processed in the same amount of time (from T1 to T10) that would be taken to process two instructions if each instruction were processed separately. Optimally, each of the pipeline operations is accomplished in one processor clock cycle (as in FIG. 1). However, if any stage of the pipeline is unable to complete its operation in a single clock cycle, all the preceding stages must be made to wait. This is known as "stalling" the pipeline. This situation often arises in connection with conditional branch instructions. Normally, the processor executes instructions consecutively, in the same order in which they are fetched from memory. A conditional branch is an instruction that tests for a particular condition, and depending on the outcome, redirects the processor to an instruction at a different location (i.e., the "branch target"), rather than the next consecutive instruction. As an example, consider the following sequence of instructions:
The second instruction in this sequence is a conditional branch to the location labeled "underflo." The BLTZ instruction tests the contents of register r3—if r3 contains a value greater than or equal to zero, the next instruction in the sequence (i.e., LW) is executed. However, if r3 contains a negative value, the processor goes to the branch target location ("underflo") and continues execution from there. Conditional branch instructions are part of the instruction set of virtually every microprocessor. However, they pose a problem for the instruction pipeline. To maintain maximum throughput, it is necessary to keep the pipeline full. This requires that as a given instruction enters the execution stage of the pipeline, the next instructions to be executed must be fetched. In FIG. 1, for example, it can be seen that the pipeline fetches the third instruction in the sequence (INSTR. 3) and decodes the second (INSTR. 2) while it executes the first (INSTR. 1). However, in the case of a conditional branch instruction, the next instruction to be executed could (depending on the result of the test) be either the instruction following the branch, or the branch target; prior to executing the branch instruction, it is not known which it will be. To deal with this situation, many modern pipeline-equipped microprocessors employ special classes of branch instructions that allow the programmer to optimize pipeline usage, based on the most likely path taken by a branch. Normally, the instruction immediately following a jump or branch is always executed while the target instruction is being fetched from storage. The "branch likely" class of instructions operate exactly like their non-likely counterparts, except that when the branch is not taken, the instruction following the branch is cancelled. In many cases, the programmer can make a reasonable assumption about the direction most likely to be taken by a branch instruction. 
By making an appropriate choice of either a normal branch instruction or a branch likely instruction, the pipeline can remain filled in the majority of cases. If it is assumed the branch will not be taken, a normal branch instruction is used; the pipeline then fetches instructions immediately following the branch. On the other hand, if it is assumed the branch will be taken, a branch likely instruction is used; the pipeline then fetches instructions from the branch target, and ignores the instructions directly after the branch. For example, in the above instruction sequence, if it is assumed that the branch will not be taken, the third and fourth instructions will appear in the pipeline immediately following the branch instruction. On the other hand, if it is assumed that the branch will be taken, the fifth and sixth instructions will appear in the pipeline immediately following the branch instruction. In either case, of course, the "guess" may be incorrect—this is often referred to as a "mispredicted branch." When this happens, the pipeline stalls and a few cycles are lost while the pipeline is refilled with the correct instructions. It is often necessary to break the normal program flow in a microprocessor in order to perform some urgent, unscheduled function. For example, a microprocessor might be tasked with monitoring a boiler. Its normal activities might include reporting the temperature of the boiler and the volume of water it contains. It would be desirable, if the pressure within the boiler became dangerously high for instance, to have a means of preemptively overriding the normal activities of the processor to permit it to respond (as quickly as possible) to the situation—say, by venting the boiler and sounding an alarm. The speed with which a microprocessor executes in-line instructions is not a useful indication of how quickly it may react to such an unscheduled stimulus. 
Although the program may include instructions to periodically read the pressure gauge, if the processor is busy it may not get around to checking the pressure soon enough to avoid an explosion. Interrupts provide a mechanism for forcing the processor to abruptly (and usually, temporarily) abandon its normal program to execute special instructions associated with the interrupt. Interrupts typically make use of special hardware features in the processor, such as an interrupt vector table. Each entry in the vector table is the address of a software routine associated with a particular interrupt (commonly known as an "Interrupt Service Routine" or "ISR"). For example, a processor may receive interrupts from external sources, such as a keyboard or mouse, as well as internal sources, such as a timer. For each potential interrupt source, there is an ISR designed to respond to the interrupt. Associated with every interrupt is an entry in the vector table, containing the address of the corresponding ISR. When an interrupt occurs, the processor finds the location in the vector table corresponding to the interrupt and performs an immediate jump to the address contained there. Additional special hardware in the processor makes it possible to resume processing following the interrupt. Interrupts may arise from a variety of sources, such as timers, external alarms, user input, etc. A special class of interrupts, known as "exceptions," is generally associated with conditions originating within the processor itself. For example, a common type of exception occurs when the processor (generally because of an oversight on the part of the human programmer) attempts to divide by zero. When the processor attempts to perform this operation, an internally generated interrupt (i.e., an exception) redirects the processor to an interrupt vector—e.g. an error message generator. 
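The vector table lookup described above can be illustrated with a minimal sketch. The interrupt numbers, handler names, and the dispatch function are hypothetical and chosen only for illustration; real processors implement this mechanism in hardware, and the return address handling is reduced here to passing the saved program counter back to the caller.

```python
# Hypothetical interrupt service routines (ISRs); names are illustrative only.
def isr_timer():
    return "timer tick serviced"

def isr_divide_by_zero():
    # An exception: an interrupt originating inside the processor itself.
    return "error: division by zero"

# Vector table: one entry per interrupt source, each holding the "address"
# (here, a Python function object) of the corresponding ISR.
# The numbering 0 and 1 is an assumption made for this sketch.
VECTOR_TABLE = {
    0: isr_timer,
    1: isr_divide_by_zero,
}

def dispatch(interrupt_number, saved_pc):
    """Look up the ISR for the interrupt, run it, and return the ISR result
    together with the saved program counter so normal flow can resume."""
    handler = VECTOR_TABLE[interrupt_number]
    result = handler()
    return result, saved_pc
```

Calling `dispatch(1, 0x100)` runs the divide-by-zero handler and hands back `0x100` as the point at which the interrupted program would resume.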
To deal with the increased speed and complexity of today's integrated circuits, the semiconductor industry has developed a standard for on-chip testing. Issued as IEEE Std. 1149.1 and 1149.1a, the JTAG ("Joint Test Action Group") standard was developed to allow standardized testing of an integrated circuit after it has been assembled onto a printed circuit board. The JTAG standard defines an interface, by means of which diagnostic bit patterns ("test vectors") may be applied to the inputs and test results returned from the outputs of the core logic in the device under test. Test vectors are entered and test results retrieved in serial form. Consequently, providing a JTAG interface in an IC does not entail the addition of a large number of pins to the device package. The diagnostic capabilities of a microprocessor may include the use of breakpoints on both data and instructions. Breakpoints permit the processor to run at full speed until a particular instruction or data value, or combination of instructions or data values is encountered, whereupon the processor is immediately halted. Breakpoints are a valuable debugging technique, since they allow the user to effectively "freeze" the processor at a precise stage in its execution and examine its internal state to reveal conditions leading to a problem. Breakpoints may be implemented by state machines in a hardware break module within the processor. These state machines constantly monitor the processor's internal address and data lines, comparing the values present on these lines to predetermined "target" values. The state machine for simple breakpoints responds at the first occurrence of a target address or data value, while the one for complex breakpoints responds to combinations of simple breakpoint events (and previous machine states). For example, a simple breakpoint could be defined to halt the processor as soon as the contents of a specified register become zero. 
A complex breakpoint, on the other hand, might halt the processor only if the register contains zero while the program counter is in a specified address range. EJTAG is an extension of the JTAG standard, with a similar serial interface. To facilitate debugging, the simple and complex break state machines may be programmed via an EJTAG-compliant interface on the microprocessor. A further diagnostic feature present in many advanced microprocessors is a real-time program counter (PC) trace. The PC trace feature outputs the current value of the program counter while the processor executes at full speed. If a problem occurs, the PC trace can furnish valuable data for a "post mortem" analysis. To reduce the overhead associated with the trace, program counter information is provided relative to a specified anchor point. Thus, when instructions flow sequentially, there is no need for continuous updating of the program counter value. Only when the program counter changes via a jump, branch, etc. is it necessary to indicate the new program location. FIG. 2 contains a block diagram of an exemplary microprocessor, along with a hardware break module. Referring to FIG. 2, the Central Processor (CPU) 10 is directly coupled to a Memory Management Unit (MMU) 12, as well as to Complex Break Unit 14, within which is a Simple Break Unit 16. Execution of the CPU 10 can be temporarily halted by a Debug Break signal received from OR gate 30. Note that one of the sources of the Debug Break signal can be a Hardware Break signal from the Complex Break Unit 14. Similarly, OR gate 28 generates the Hardware Break signal in response to four possible inputs, three of which are generated by the Simple Break Unit 16, and one of which is generated by Complex Break Logic 24.
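As a rough software analogy for the signal paths just described, the following sketch models the complex breakpoint example (register zero while the program counter lies in a range) and the two OR gates of FIG. 2. The function and parameter names are hypothetical, the inclusive address range is an assumption, and the other inputs to OR gate 30 are left unspecified because the text does not name them.

```python
def complex_condition(reg_value, pc, pc_low, pc_high):
    """Complex breakpoint example from the text: the register contains zero
    while the program counter is within [pc_low, pc_high] (inclusive bounds
    are an assumption of this sketch)."""
    return reg_value == 0 and pc_low <= pc <= pc_high

def hardware_break(instr_addr_match, bus_match, data_match, complex_trigger):
    """OR gate 28: three triggers from the Simple Break Unit plus one
    trigger from the Complex Break Logic."""
    return bool(instr_addr_match or bus_match or data_match or complex_trigger)

def debug_break(hw_break, other_sources=()):
    """OR gate 30: the Hardware Break signal is one of several possible
    sources of Debug Break; the remaining sources are not named in the text."""
    return bool(hw_break or any(other_sources))
```

For instance, `debug_break(hardware_break(False, False, False, complex_condition(0, 0x1004, 0x1000, 0x10FF)))` evaluates to `True`, since the complex condition alone is enough to raise the Hardware Break signal.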
The Simple Break Unit 16 contains three modules (Instruction Address Match Logic 18, Processor Address Bus and Processor Data Bus Match Logic 20, and Data Address and Data Value Match Logic 22) that originate break inputs to OR gate 28. The three modules constantly compare values on the address bus, data bus and instruction bus with breakpoint values stored in internal registers (not shown), and generate trigger signals when there is a match. If Simple Breaks are enabled, these trigger signals are passed through OR gate 28 to OR gate 30, where they cause a break in the execution of the CPU 10. The three simple trigger signals are also forwarded to Complex Break Logic 24, which generates a trigger complex signal derived from combinations of simple break events. In the present embodiment, the Complex Break Logic 24 is implemented as a state machine, which changes state based on its current state and the trigger signals it receives from the Simple Break Unit 16. The three simple break triggers and the complex break trigger are also forwarded to OR gate 26, which generates a Trace Trigger signal 32, used to generate a PC trace. As stated earlier, the use of fix-up cycles to handle load misses can lead to a problem for the diagnostic circuitry in a pipeline-equipped microprocessor. Recall that a fix-up cycle is inserted during the pipeline processing of an instruction requiring a data fetch. When data required by the instruction (typically, from cache memory) is not available in time to complete the MEM stage of the pipeline, a data load miss occurs. The pipeline must then be halted temporarily and a fix-up cycle added to the pipeline timing while the data is obtained. FIG. 3 illustrates the operation of the 5-stage pipeline for the following sequence of instructions:
The first instruction in this example subtracts the contents of r1 from r2 and places the result in r3. The second instruction tests the subtraction result and branches conditionally (i.e., if the result is positive) to the instruction at the label "go_here." If the contents of r3 are not positive, the third instruction is executed, which loads a value from memory into r2. The fourth instruction simply increments r3, while the fifth instruction stores the value in r3 to memory. The sixth instruction starts the sequence all over again. In FIG. 3, the instructions are shown to the left, in the order in which they are fetched from memory. To the right of each instruction is the sequence of operations performed on it in the pipeline, with the processor cycle associated with each operation shown above. As described earlier, during each processor cycle, the pipeline can perform the following five operations, for example, simultaneously:
The pipeline states along any row indicate the stage of processing of a given instruction for each processor cycle. For example, the third row in FIG. 3 indicates that the LW instruction is fetched during processor cycle T3, decoded during processor cycle T4, executed during processor cycle T5, etc. Since this instruction sequence includes a branch instruction, the behavior of the pipeline depends on whether or not the branch to "go_here" is taken. In this example, it is assumed that the branch will not be taken, so all of the instructions are executed in the order in which they appear in memory. Note that the third instruction (LW) occurs in a branch delay slot, and that a data load miss occurs during processor clock cycle T7, in the WB pipeline stage of this instruction. A fix-up cycle is inserted to allow the data required by the LW instruction to be fetched from memory. Processing of all of the pending instructions in the pipeline is temporarily suspended during the fix-up cycle. Thus, for example, the fourth instruction does not advance beyond the MEM pipeline stage from T7 to T8. On the other hand, the branch instruction has already left the pipeline by the time the fix-up cycle occurs—this event is recorded by the complex break logic within the state machine (item 24 in FIG. 2), as described in greater detail below. A block diagram representing prior art logic for handling fix-up cycles in an exemplary pipelined processor is shown in FIG. 4. The fix-up logic 50 receives inputs from data 52 and address 54 breakpoint comparators; these comparators generate an active logic level signal when a data or address breakpoint is detected. The output of data and address comparators 52 and 54 is latched by registers C1 60, and C3 64, respectively. Similarly, signal dhq_dloadp, signifying a data load in the MEM stage of the pipeline, is latched by register C2 68. 
During a fix-up cycle, the inputs to the data and address comparators may change state, so registers C1-C3 preserve the state of their respective signals for use by the complex break state machine. Multiplexers 62, 66 and 70 select between the current and the saved state of the data, address and dhq_dloadp signals, respectively. If a fix-up cycle is in progress, the signal dc_fixd is active, and each multiplexer couples the output of its respective register to the siggen module 72a. The siggen module 72a generates trigger signals to halt the processor, initiate a PC trace, etc. During a fix-up cycle, the siggen module 72a also places the combined states received from multiplexers 62, 66 and 70 into register M 74, the output of which is coupled to additional logic 72b within the siggen module. The prior art logic just described suffers from a significant drawback that affects the operation of the on-chip debug circuitry. This flaw relates to the manner in which the complex break state machine is updated when an exception occurs during the fix-up cycle for a conditional branch instruction. As previously described, the complex break state machine is updated each time a qualifying trigger event occurs. It is essential that these updates be accurate, since the breakpoints generated by the state machine are based on particular combinations of trigger events and previous machine states. However, under circumstances such as those described in connection with FIG. 3, the complex break state machine may receive an erroneous update. Referring again to FIG. 3, recall that the branch instruction (BGTZ) emerges from the instruction pipeline during the fix-up cycle inserted to handle a data load miss associated with the instruction in the branch delay slot (LW). As stated above, this causes the diagnostic state machine to be updated. Now, assume that an exception occurs during the fix-up cycle T7. 
This can occur when the branch delay slot contains a special instruction that directly generates an exception, or with other types of instructions in that location that unintentionally result in an exception (e.g., an overflow on an arithmetic operation). In any case, the exception is handled by special program code within an exception handler. When it has finished executing, the exception handler typically restores the program counter to the instruction following the one that caused the exception. However, when the instruction responsible for the exception occupies a branch delay slot, the exception handler returns the PC to the previous instruction, i.e., to the branch instruction (BGTZ) itself. As a result, the branch instruction passes through the pipeline again, causing the diagnostic state machine to be (incorrectly) updated a second time. Thus, for the instruction sequence of FIG. 3, the prior art fix-up logic shown in FIG. 4 does not correctly update the complex break state machine. The two occurrences in the instruction pipeline of the same branch instruction will result in a spurious second update of the complex break state machine, violating specified requirements for the on-chip diagnostic module. An embodiment of the system and method for backing up and restoring complex break state information, illustrated in FIG. 5, overcomes this problem. Note that some of the circuitry in FIG. 5 is contained in the prior art system shown in FIG. 4. Components in FIG. 5 that are also present in FIG. 4 have the same item numbers. In the embodiment shown in FIG. 5, the state contained in register M 74 is forwarded through a portion of the siggen circuitry 72b to multiplexer 76. During normal operation (i.e., not a fix-up cycle), the multiplexer selects this input and latches the state in register W 78. The contents of register W 78 are also copied into a Backup register 80.
During a fix-up cycle, the multiplexer selects the contents of the Backup register 80, instead of the current state presented by the siggen module 72b. In the embodiment of FIG. 5, the state saved in register W 78 will not be incorrectly updated a second time when the branch instruction is re-executed upon returning from the exception handler. Instead, the previous state (saved in the backup register 80) is retained. Thus, the state history of the complex break state machine and the PC trace will correctly indicate only one execution instance of the branch instruction. The system and method disclosed herein correctly update the complex break state machine in the diagnostic module of a microprocessor. Advantageously, this is accomplished without resorting to costly, extensive modification of the microprocessor architecture (e.g., lengthening the instruction pipeline). It is believed that this system and method may be incorporated into a high-performance microprocessor design with no loss in performance or capabilities. It will be appreciated by those skilled in the art having the benefit of this disclosure that this invention is believed to present a system and method for saving and restoring the state of a diagnostic module. Further modifications and alternative embodiments of various aspects of the invention will be apparent to those skilled in the art in view of this description. Details described herein, such as the number of state machines in the diagnostic module and the exact manner in which machine states are backed up and restored, are exemplary of a particular embodiment. It is intended that the following claims be interpreted to embrace all such modifications and changes and, accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
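The difference between the prior art update path and the backup and restore path of FIG. 5 can be illustrated with a small behavioral sketch. This is a software analogy under stated assumptions, not the disclosed circuit: the complex break state is reduced to a count of trigger events, the class and function names are hypothetical, and the Backup register is modelled as holding the most recent state latched outside a fix-up cycle (the exact register timing is not specified in the text).

```python
from dataclasses import dataclass

@dataclass
class PriorArtState:
    w: int = 0  # register W: complex break state, modelled as an event count

    def clock(self, next_state, fixup):
        # Prior art: W is loaded from siggen whether or not a fix-up
        # cycle is in progress.
        self.w = next_state

@dataclass
class BackedUpState:
    w: int = 0       # register W 78
    backup: int = 0  # Backup register 80

    def clock(self, next_state, fixup):
        if fixup:
            # Multiplexer 76 selects the Backup register during a fix-up cycle.
            self.w = self.backup
        else:
            # Normal operation: latch the siggen state and copy it to Backup.
            self.w = next_state
            self.backup = self.w

def run(machine, cycles):
    """Each cycle is (trigger, fixup); siggen is assumed to present w + 1
    when a trigger event is seen, and w otherwise."""
    for trigger, fixup in cycles:
        machine.clock(machine.w + (1 if trigger else 0), fixup)
    return machine.w

# Idle cycle, then the branch trigger seen during the fix-up cycle, then the
# same branch re-executed after returning from the exception handler.
CYCLES = [(False, False), (True, True), (True, False)]
```

With these assumptions, `run(PriorArtState(), CYCLES)` counts the branch twice, while `run(BackedUpState(), CYCLES)` counts it once, which is the behavior the text attributes to the embodiment of FIG. 5.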