System for obtaining parallel execution of existing instructions in a particular data processing configuration by compounding rules based on instruction categories
5732234
Abstract
A system for processing a sequence of instructions has a set of compounding rules based on an analysis of existing instructions to separate them into different classes. The analysis determines which instructions qualify, either with instructions in their own class or with instructions in other classes, for parallel execution in a particular hardware configuration. Such compounding rules are used as a standard for pre-processing an instruction stream in order to look for groups of two or more adjacent scalar instructions that can be executed in parallel.
Claims
We claim:
Description
FIELD OF THE INVENTION
TABLE 1
______________________________________
X1 ;any compoundable instruction
X2 ;any compoundable instruction
LOAD R1,(X) ;load R1 from memory location X
ADD R3,R1 ;R3 = R3 + R1
SUB R1,R2 ;R1 = R1 - R2
COMP R1,R3 ;compare R1 with R3
X3 ;any compoundable instruction
X4 ;any compoundable instruction
______________________________________
If the hardware-imposed upper limit on compounding is two (at most, two instructions can be executed in parallel in the same cycle), then there are a number of ways to compound this sequence of instructions depending on the scope of the compounding software. If the scope of compounding were equal to four, then the compounding software would consider together (X1, X2, LOAD, ADD) and then slide forward one instruction at a time to consider together (X2, LOAD, ADD, SUB) and (LOAD, ADD, SUB, COMP) and (ADD, SUB, COMP, X3) and (SUB, COMP, X3, X4), thereby producing the following optimum pairings as candidates for a compound instruction: [-- X1] [X2 LOAD] [ADD SUB] [COMP X3] [X4 --]. This optimum pairing provided by the invention completely relieves the interlocks between the LOAD and ADD and between the SUB and COMP, and provides the additional possibilities of X1 being compounded with its preceding instruction and of X4 being compounded with its following instruction. On the other hand, a superscalar machine which pairs instructions dynamically in its instruction issue logic on strictly a FIFO basis, would produce only the following pairings as candidates for parallel execution: [X1 X2] [LOAD ADD] [SUB COMP] [X3 X4]. This inflexible pairing incurs the full penalty of certain interlocking instructions, and only a partial benefit of parallel processing is achieved. The self explanatory flow chart of FIG. 13 shows the various steps taken to determine which adjacent existing instructions in a byte stream are in categories or classes which qualify them for being grouped together to form a compound instruction for a particular computer system configuration. Referring to FIG. 5, there are many possible locations in a computer system where compounding may occur, both in software and in hardware. Each has unique advantages and disadvantages. As shown in FIG. 5, there are various stages that a program typically takes from source code to actual execution. 
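The comparison made above for the Table 1 sequence can be illustrated with a small sketch. It is only an illustration under stated assumptions: the names `SEQ`, `INTERLOCKED`, `pair_up` and `interlocks` are hypothetical, the two interlocked pairs are the ones named in the text (LOAD/ADD and SUB/COMP), and pairing by a fixed starting offset is a simplification, not the compounding software's actual procedure.

```python
# Illustrative sketch only: compares strict FIFO pairing with the shifted
# pairing described for Table 1. All names here are hypothetical.

SEQ = ["X1", "X2", "LOAD", "ADD", "SUB", "COMP", "X3", "X4"]

# Adjacent pairs that the text says interlock (the second needs the first's result).
INTERLOCKED = {("LOAD", "ADD"), ("SUB", "COMP")}


def pair_up(seq, offset):
    """Pair adjacent instructions starting at `offset`; the rest stay single."""
    groups = [(s,) for s in seq[:offset]]
    i = offset
    while i + 1 < len(seq):
        groups.append((seq[i], seq[i + 1]))
        i += 2
    groups += [(s,) for s in seq[i:]]
    return groups


def interlocks(groups):
    """Count the groups whose two members interlock with each other."""
    return sum(1 for g in groups if g in INTERLOCKED)


fifo = pair_up(SEQ, 0)      # [X1 X2] [LOAD ADD] [SUB COMP] [X3 X4]
shifted = pair_up(SEQ, 1)   # [X1] [X2 LOAD] [ADD SUB] [COMP X3] [X4]
print(interlocks(fifo), interlocks(shifted))
```

With these assumptions the FIFO pairing places both interlocked pairs inside a group, while the shifted pairing places neither, which is the point the Table 1 discussion makes.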
During the compilation phase, a source program is translated into machine code and stored on a disk 46. During the execution phase the program is read from the disk 46 and loaded into a main memory 48 of a particular computer system configuration 50 where the instructions are executed by appropriate instruction processing units 52, 54, 56. Compounding could take place anywhere along this path. In general as the compounder is located closer to an instruction processing unit or CPUs, the time constraints become more stringent. As the compounder is located further from the CPU, more instructions can be examined in a large sized instruction stream window to determine the best grouping for compounding for increasing execution performance. However such early compounding tends to have more of an impact on the rest of the system design in terms of additional development and cost requirements. One of the important objects of the invention is to provide a technique for existing programs written in existing high level languages or existing assembly language programs to be processed by software means which can identify sequences of adjacent instructions capable of parallel execution by individual functional units. The flow diagram of FIG. 6 shows the generation of a compound instruction set program from an assembly language program in accordance with a set of customized compounding rules 58 which reflect both the system and hardware architecture. The assembly language program is provided as an input to a software compounding facility 59 that produces the compound instruction program. Successive blocks of instructions having a predetermined length are analyzed by the software compounding facility 59. The length of each block 60, 62, 64 in the byte stream which contains the group of instructions considered together for compounding is dependent on the complexity of the compounding facility. As shown in FIG. 
6, this particular compounding facility is designed to consider two-way compounding for "m" number of fixed length instructions in each block. The primary first step is to consider if the first and second instructions constitute a compoundable pair, and then if the second and third constitute a compoundable pair, and then if the third and fourth constitute a compoundable pair, all the way to the end of the block. Once the various possible compoundable pairs C1-C5 have been identified, an additional very desirable step is to determine the optimum choice of compound instructions formed by adjacent scalar instructions for parallel execution. In the example shown, the following different sequences of compounded instructions are possible (assuming no branching): I1, C2, I4, I5, C3, C5, I10; I1, C2, I4, I5, I6, C4, I9, I10; C1, I3, I4, I5, C3, C5, I10; C1, I3, I4, I5, I6, C4, I9, I10. Based on the particular hardware configuration, the compounding facility can select the preferred sequence of compounded instructions and use flags or identifier bits to identify the optimum sequence of compound instructions. If there is no optimum sequence, all of the compoundable adjacent scalar instructions can be identified so that a branch to a target located amongst various compound instructions can exploit any of the compounded pairs which are encountered (See FIG. 14). Where multiple compounding units are available, multiple successive blocks in the instruction stream could be compounded at the same time. The specific design of a software compounding facility will not be discussed here because the details are unique to a given instruction set architecture and underlying implementation. 
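The four alternative sequences listed above for FIG. 6 can be reproduced by a short enumeration. This is a sketch under assumptions: the mapping of C1-C5 to instruction positions (C1 = I1+I2, C2 = I2+I3, C3 = I6+I7, C4 = I7+I8, C5 = I8+I9) is inferred from the listed sequences rather than taken from the figure, and the function name `enumerate_sequences` is hypothetical. Only sequences in which no further compoundable pair is left uncompounded are kept.

```python
# Illustrative sketch only: enumerate the alternative two-way compoundings of
# a block. The pair positions below are inferred from the sequences in the text.

PAIRS = {1: "C1", 2: "C2", 6: "C3", 7: "C4", 8: "C5"}  # first member -> pair name


def enumerate_sequences(n, pairs):
    """Return every maximal choice of non-overlapping adjacent pairs for I1..In."""
    results = []

    def walk(i, seq, prev_single):
        if i > n:
            results.append(seq)
            return
        # Option 1: compound instruction i with instruction i+1.
        if i in pairs and i + 1 <= n:
            walk(i + 2, seq + [pairs[i]], False)
        # Option 2: leave instruction i single, unless it and the previous
        # single instruction could have been compounded (not maximal).
        if not (prev_single and (i - 1) in pairs):
            walk(i + 1, seq + [f"I{i}"], True)

    walk(1, [], False)
    return results


for s in enumerate_sequences(10, PAIRS):
    print(", ".join(s))
```

Choosing among the resulting sequences is left to the hardware-specific criteria the text mentions; the sketch only lists the candidates.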
Although the design of such compounding programs is somewhat similar in concept to modern compilers which perform instruction scheduling and other optimizations based on a specific machine architecture, the criteria used to complete such compounding are unique to this invention, as best shown in the flow chart of FIG. 13. In both instances, given an input program and a description of the instruction set and also of the hardware architectures (i.e., the structural aspects of the implementation), an output program is produced. In the case of the modern compiler, the output is an optimized new sequence of existing instructions. In the case of the invention, the output is a series of compound instructions each formed by a group of adjacent scalar instructions capable of parallel execution, with the compound instructions being intermixed with non-compounded scalar instructions, and with the necessary control bits for execution of the compound instructions included as part of the output. Of course, it is easier to pre-process an instruction stream for the purpose of creating compound instructions if known reference points already exist to indicate where instructions begin. As used herein, a reference point means some marking field or other indicator which provides information about the location of instruction boundaries. In many computer systems such a reference point is expressly known only by the compiler at compile time and only by the CPU when instructions are fetched. Such a reference point is unknown between compile time and instruction fetch unless a special reference tagging scheme is adopted. When compounding is done after compile time, a compiler could indicate with reference tags (see FIG. 11) which bytes contain the first byte of an instruction and which contain data. This extra information results in a more efficient compounder since exact instruction locations are known. 
Of course, the compiler could identify instructions and differentiate between instructions and data in other ways in order to provide the compounder with specific information indicating instruction boundaries. When such instruction boundary information is known, the generation of the appropriate compounding identifier bits proceeds in a straightforward manner based on the compounding rules developed for a particular architecture and system hardware configuration (See FIG. 8). When such instruction boundary information is not known, and the instructions are of variable length, a more complex problem is presented (See FIGS. 9 and 16). Incidentally, these figures are based on a preferred encoding scheme described in more detail in Table 2A below, wherein two-way compounding provides a tag bit of "1" if an instruction is compounded with the next instruction, and a tag bit of "0" if it is not compounded with the next instruction. The control bits in a control field added by a compounder contain information relevant to the execution of compound instructions and may contain as little or as much information as is deemed effective for a particular implementation. An exemplary 8-bit control field is shown in FIG. 12. However, only the first control bit is required in the simplest embodiment to indicate the beginning of a compound instruction. The other control bits provide additional optional information relating to the execution of the instructions. In an alternate encoding pattern for compounded instructions applicable to both two-way compounding as well as large group compounding, a first control bit is set to "1" to indicate that the corresponding instruction marks the beginning of a compound instruction. All other members of the compound instruction will have their first control bit set to "0". On occasion, it will not be possible to combine a given instruction with other instructions, so such a given instruction will appear to be a compound instruction of length one. 
That is, the first control bit will be set to "1", but the first control bit of the following instruction will also be set to "1". Under this alternate encoding scheme, the decoding hardware will be able to detect how many instructions comprise the compound instruction by monitoring all of the identifier bits for a series of scalar instructions, rather than merely monitoring the identifier bit for the beginning of a compound instruction as in the preferred encoding scheme shown below in Tables 2A-2C. The flow diagram of FIG. 7 shows a typical implementation for executing a compound instruction set program which has been generated by a hardware preprocessor 66 or a software processor 67. A byte stream having compound instructions flows into a compound instruction (CI) cache 68 that serves as a storage buffer providing fast access to compound instructions. CI issue logic 69 fetches compound instructions from the CI Cache and issues their individual compounded instructions to the appropriate functional units for parallel execution. It is to be emphasized that compound instruction execution units (CI EU) 71 such as ALU's in a compound instruction computer system are capable of executing either scalar instructions one at a time by themselves or alternatively compounded scalar instructions in parallel with other compounded scalar instructions. Also, such parallel execution can be done in different types of execution units such as ALU's, floating point (FP) units 73, storage address-generation units (AU) 75 or in a plurality of the same type of units (FP1, FP2, etc) in accordance with the computer architecture and the specific computer system configuration. Thus, the hardware configurations which can implement the present invention are scalable up to virtually unlimited numbers of execution units in order to obtain maximum parallel processing performance. 
Combining several existing instructions into a single compound instruction allows one or more instruction processing units in a computer system to effectively decode and execute those compounded existing instructions in parallel without the delay that arises in conventional parallel processing computer systems. In the simplest exemplary encoding schemes of this application, minimal compounding information is added to the instruction stream as one bit for every two bytes of text (instructions and data). In general, a tag containing control information can be added to each instruction in the compounded byte stream--that is, to each non-compounded scalar instruction as well as to each compounded scalar instruction included in a pair, triplet, or larger compounded group. As used herein, identifier bits refers to that part of the tag used specifically to identify and differentiate those compounded scalar instructions forming a compounded group from the remaining non-compounded scalar instructions. Such non-compounded scalar instructions remain in the compound instruction program and when fetched are executed singly. In a system with all 4-byte instructions aligned on a four byte boundary, one tag is associated with each four bytes of text. Similarly, if instructions can be aligned arbitrarily, a tag is needed for every byte of text. In the illustrated embodiment herein, where all System/370 instructions are aligned on a halfword (two-byte) boundary with lengths of either two or four or six bytes, one tag with identifier bits is needed for every halfword. In a small grouping example for compounding pairs of adjacent instructions, an identifier bit "1" indicates that the instruction that begins in the byte under consideration is compounded with the following instruction, while a "0" indicates that the instruction that begins in the byte under consideration is not compounded. 
The identifier bit associated with halfwords that do not contain the first byte of an instruction is ignored. The identifier bit for the first byte of the second instruction in a compounded pair is also ignored. (However, in some branching situations, these identifier bits are not ignored.) As a result, this encoding procedure for identifier bits means that in the simplest case of two-way compounding, only one bit of information is needed by a CPU during execution to identify a compounded instruction. Where more than two scalar instructions can be grouped together to form a compound instruction, additional identifier bits may be required to provide adequate control information. However, in order to reduce the number of bits required for minimal control information, there is still another alternative format for keeping track of the compounding information. For example, even with large group compounding, it is possible to achieve one bit per instruction with the following encoding: the value "1" means to compound with the next instruction, and the value "0" means to not compound with the next instruction. A compound instruction formed with a group of four individual instructions would have a sequence of compounding identifier bits (1,1,1,0). As with the execution of other compound instructions described herein, compounding identifier bits associated with halfwords which are not instructions and therefore do not have any opcodes are ignored at execution time. Under the preferred encoding scheme described in detail below, the minimum number of identifier bits needed to provide the additional information of indicating the specific number of scalar instructions actually compounded is the logarithm to the base 2 (rounded up to the nearest whole number) of the maximum number of scalar instructions that can be grouped to form a compound instruction. For example, if the maximum is two, then one identifier bit is needed for each compound instruction. 
If the maximum is three or four, then two identifier bits are needed for each compound instruction. If the maximum is five, six, seven or eight, then three identifier bits are needed for each compound instruction. This encoding scheme is shown below in Tables 2A, 2B and 2C:
TABLE 2A
______________________________________
(maximum of two)
Identifier                                        Total #
Bits        Encoded meaning                       Compounded
______________________________________
0           This instruction is not compounded    none
            with its following instruction
1           This instruction is compounded        two
            with its one following instruction
______________________________________
It will therefore be understood that each halfword needs a tag, but under this preferred encoding scheme the CPU ignores all but the tag for the first instruction in the instruction stream being executed. In other words, a byte is examined to determine if it is a compound instruction by checking its identifier bits. If it is not the beginning of a compound instruction, its identifier bits are zero. If the byte is the beginning of a compound instruction containing two scalar instructions, the identifier bits are "1" for the first instruction and "0" for the second instruction. If the byte is the beginning of a compound instruction containing three scalar instructions, the identifier bits are "2" for the first instruction and "1" for the second instruction and "0" for the third instruction. In other words, the identifier bits for each half word identify whether or not this particular byte is the beginning of a compound instruction while at the same time indicating the number of instructions which make up the compounded group. These exemplary methods of encoding compound instructions assume that if three instructions are compounded to form a triplet group, the second and third instructions are also compounded to form a pair group. In other words, if a branch to the second instruction in a triplet group occurs, the identifier bit "1" for the second instruction indicates that the second and third instruction will execute as a compounded pair in parallel, even though the first instruction in the triplet group was not executed. Of course, the invention is not limited to this particular preferred encoding scheme. Various other encoding rules, such as the alternate encoding scheme previously described, are possible within the scope and teachings of the invention. 
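The preferred encoding described above can be summarized in a few lines. This is a minimal sketch, not the patent's implementation: the function names are hypothetical, and the only facts used are that the bit count is the base-2 logarithm of the maximum group size rounded up, and that the identifier values within a group count down to zero (for example 2, 1, 0 for a triplet).

```python
# Illustrative sketch only of the preferred identifier-bit encoding.
# Function names are hypothetical.


def identifier_bits_needed(max_group):
    """ceil(log2(max_group)) computed with integers; valid for max_group >= 2."""
    return (max_group - 1).bit_length()


def encode_group(size):
    """Identifier values for the members of one group: size-1 down to 0."""
    return list(range(size - 1, -1, -1))


def group_size_at(tag_value):
    """Number of instructions issued together when execution starts at this tag."""
    return tag_value + 1


triplet = encode_group(3)          # identifier values 2, 1, 0
# A branch to the second member of the triplet sees the value 1, so the second
# and third instructions still execute together as a pair.
print(triplet, group_size_at(triplet[1]))
```

The count-down values are what make a branch into the middle of a group behave sensibly, since each member's tag describes the group that starts at that member.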
It will be apparent to those skilled in the art that the present invention requires an instruction stream to be compounded only once for a particular computer system configuration, and thereafter any fetch of compounded instructions will also cause a fetch of the identifier bits associated therewith. This avoids the need for the inefficient last-minute determination and selection of certain scalar instructions for parallel execution that repeatedly occurs every time the same or different instructions are fetched for execution in the so-called super scalar machine. Despite all of the advantages of compounding a binary instruction stream, it becomes difficult to do so under certain computer architectures unless a technique is developed for determining instruction boundaries in a byte string. Such a determination is complicated when variable length instructions are allowed, and is further complicated when data and instructions can be intermixed, and when modifications are allowed to be made directly to the instruction stream. Of course, at execution time instruction boundaries must be known to allow proper execution. But since compounding is preferably done a sufficient time prior to instruction execution, a unique technique has been developed to compound instructions without knowledge of where instructions start and without knowledge of which bytes are data. This technique is described generally below and can be used for creating compound instructions formed from adjacent pairs of scalar instructions as well as for creating compound instructions formed from larger groups of scalar instructions. This technique is applicable to all instruction sets of the various conventional types of architectures, including the RISC (Reduced Instruction Set Computers) architectures in which instructions are usually a constant length and are not intermixed with data. Additional details of this compounding technique are disclosed in copending application Ser. No. 
07/519,382 entitled "General Purpose Compounding Technique For Instruction-Level Processors" filed May 4, 1990. Generally speaking, the compounding technique provides for the compounding of two or more scalar instructions from an instruction stream without knowing the starting point or length of each individual instruction. Typical instructions already include an opcode at a predetermined field location which identifies the instruction and its length. Those adjacent instructions which qualify for parallel execution in a particular computer system configuration are provided with appropriate tags to indicate they are candidates for compounding. In IBM System/370 architecture where instructions are either two, four or six bytes in length, the field positions for the opcode are presumed based on an estimated instruction length code. The value of each tag based on a presumed opcode is recorded, and the instruction length code in the presumed opcode is used to locate a complete sequence of possible instructions. Once an actual instruction boundary is found, the corresponding correct tag values are used to identify the commencement of a compound instruction, and other incorrectly generated tags are ignored. This unique compounding technique is exemplified in the drawings of FIGS. 8-9 and 14-15 wherein the compounding rules are defined to provide that all instructions which are 2 bytes or 4 bytes long are compoundable with each other (i.e., a 2 byte instruction is capable of parallel execution in this particular computer configuration with another 2 byte or another 4 byte instruction). The exemplary compounding rules further provide that all instructions which are 6 bytes long are not compoundable at all (i.e., a 6 byte instruction is only capable of execution singly by itself in this particular computer configuration). 
Of course, the invention is not limited to these exemplary compounding rules, but is applicable to any set of compounding rules which define the criteria for parallel execution of existing instructions in a specific configuration for a given computer architecture. The instruction set used in these exemplary compounding techniques of the invention is taken from the System/370 architecture. By examining the opcode for each instruction, the type and length of each instruction can be determined and the control tag containing identifier bits is then generated for that specific instruction, as described in more detail hereinafter. Of course, the present invention is not limited to any specific architecture or instruction set, and the aforementioned compounding rules are by way of example only. The preferred encoding scheme for compound instructions in these illustrated embodiments has already been shown above in Table 2A-2C. In a First case with fixed length instructions having no data intermixed and with a known reference point location for the opcode, the compounding can proceed in accordance with the applicable rules for that particular computer configuration. Since the field reserved for the opcode also contains the instruction length, a sequence of scalar instructions is readily determined, and each instruction in the sequence can be considered as possible candidates for parallel execution with a following instruction. A first encoded value in the control tag indicates the instruction is not compoundable with the next instruction, while a second encoded value in the control tag indicates the instruction is compoundable for parallel execution with the next instruction. In a Second case with variable length instructions having no data intermixed, and with a known reference point location for the opcode and also for the instruction length code (which in System/370 is included as part of the opcode), the compounding can proceed in a routine manner. As shown in FIG. 
8, the opcodes indicate an instruction sequence 70 as follows: the first instruction is 6 bytes long, the second and third are each 2 bytes long, the fourth is 4 bytes long, the fifth is 2 bytes long, the sixth is 6 bytes long, and the seventh and eighth are each 2 bytes long. A C-vector 72 in FIG. 8 shows the values for the identifier bits (called compounding bits in the drawings) for this particular sequence 70 of instructions where a reference point indicating the beginning of the first instruction is known. Based on the values of such identifier bits, the second and third instructions form a compounded pair as indicated by the "1" in the identifier bit for the second instruction. The fourth and fifth instructions form another compounded pair as indicated by the "1" in the identifier bit for the fourth instruction. The seventh and eighth instructions also form a compounded pair as indicated by the "1" in the identifier bit for the seventh instruction. The C-vector 72 of FIG. 8 is relatively easy to generate when there are no data bytes intermixed with the instruction bytes, and where the instructions are all of the same length with known boundaries. Another situation is presented in a Third case where instructions are mixed with non-instructions, with a reference point still being provided to indicate the beginning of an instruction. The schematic diagram of FIG. 11 shows one way of indicating an instruction reference point, where every halfword has been flagged with a reference tag to indicate whether or not it contains the first byte of an instruction. This could occur with both fixed length and variable length instructions. By providing the reference point, it is unnecessary to evaluate the data portion of the byte stream for possible compounding. Accordingly, the compounding unit can skip over and ignore all of the non-instruction bytes. 
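For the Second case sequence described above (instruction lengths 6, 2, 2, 4, 2, 6, 2, 2 with a known starting point), the identifier bits can be generated in a single pass. The sketch below assumes the exemplary rule that 2-byte and 4-byte instructions compound with each other and 6-byte instructions never compound. The name `c_vector_known` is hypothetical, and writing "0" into halfwords that do not begin an instruction is an assumption made for display, since the text says those bits are ignored.

```python
# Illustrative sketch only: one identifier bit per halfword when instruction
# boundaries are known. Names and the 0-fill for ignored halfwords are assumptions.

COMPOUNDABLE_LENGTHS = {2, 4}   # exemplary rule: 6-byte instructions never compound


def c_vector_known(lengths):
    """lengths: instruction lengths in bytes, in program order."""
    bits = []
    i = 0
    while i < len(lengths):
        is_pair = (
            i + 1 < len(lengths)
            and lengths[i] in COMPOUNDABLE_LENGTHS
            and lengths[i + 1] in COMPOUNDABLE_LENGTHS
        )
        if is_pair:
            # "1" on the first halfword of the first member, "0" elsewhere.
            bits += [1] + [0] * (lengths[i] // 2 - 1)
            bits += [0] * (lengths[i + 1] // 2)
            i += 2
        else:
            bits += [0] * (lengths[i] // 2)
            i += 1
    return bits


print(c_vector_known([6, 2, 2, 4, 2, 6, 2, 2]))
```

The "1" bits fall on the second, fourth and seventh instructions, matching the three compounded pairs named in the text.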
A more complicated situation arises where a byte stream includes variable length instructions (without data), but it is not known where a first instruction begins. Since the maximum length instruction is six bytes, and since instructions are aligned on two byte boundaries, there are three possible starting points for the first instruction in the stream. Accordingly, the technique provides for considering all possible starting points for the first instruction in the text of a byte stream 79, as shown in FIG. 9. Sequence 1 assumes that the first instruction starts with the first byte, and proceeds with compounding on that premise. In this exemplary embodiment, the length field is also determinative of the C-vector value for each possible instruction. Therefore a C-vector 74 for Sequence 1 only has a "1" value for the first instruction of a possible compounded pair formed by combinations of 2 byte and 4 byte instructions. Sequence 2 assumes that the first instruction starts at the third byte (the beginning of the second halfword), and proceeds on that premise. The value in the length field for the third byte is 2 indicating the next instruction begins with the fifth byte. By proceeding through each possible instruction based on the length field value in the preceding instruction, the entire potential instructions of Sequence 2 are generated along with the possible identifier bits as shown in a C-vector 76. Sequence 3 assumes that the first instruction starts at the fifth byte (the beginning of the third halfword), and proceeds on that premise. The value in the length field for the fifth byte is 4 indicating the next instruction begins with the ninth byte. By proceeding through each possible instruction based on the length field value in the preceding instruction, the entire potential instructions of Sequence 3 are generated along with the possible identifier bits as shown in a C-vector 78. 
In some instances the three different Sequences of potential instructions will converge into one unique sequence. In FIG. 9 it is noted that the three Sequences converge on instruction boundaries at the end 80 of the eighth byte. Sequences 2 and 3, while converging on instruction boundaries at the end 82 of the fourth byte, are out-of-phase in compounding until the end of the sixteenth byte. In other words, the two sequences consider different pairs of instructions based on the same sequence of instructions. Since the seventeenth byte begins a non-compoundable instruction at 84, the out-of-phase convergence is ended. When no valid convergence occurs, it is necessary to continue all three possible instruction sequences to the end of the window. However, where valid convergence occurs and is detected, the number of sequences collapses from three to two (one of the identical sequences becomes inoperative), and in some instances from two to one. Thus, prior to convergence, tentative instruction boundaries are determined for each possible instruction sequence and identifier bits assigned for each such instruction indicating the location of the potential compound instructions. It is apparent from FIG. 9 that this technique generates three separate identifier bits for every two text bytes. In order to provide consistency with the pre-processing done in the aforementioned first, second and third cases, it is desirable to reduce the three possible sequences to a single sequence of identifier bits where only one bit is associated with each halfword. Since the only information needed is whether the current instruction is compounded with the following instruction, the three bits can be logically ORed to produce a single sequence in a CC-vector 86. For purposes of parallel execution, the composite identifier bits of a composite CC-vector are equivalent to the separate C-vectors of the individual three Sequences 1-3. 
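The three-sequence technique and the ORing into a composite CC-vector can be sketched as follows. This is a hedged illustration, not a reproduction of FIG. 9: the per-halfword length values in `PRESUMED_LENGTHS` are hypothetical (chosen only to be consistent with the length values 2 and 4 quoted for the third and fifth bytes), the function names are invented, and each sequence is assumed to pair instructions greedily from its own starting point.

```python
# Illustrative sketch only: compound from three presumed starting points and OR
# the results. PRESUMED_LENGTHS is hypothetical data, not the FIG. 9 stream.

COMPOUNDABLE_LENGTHS = {2, 4}

# PRESUMED_LENGTHS[h] = instruction length in bytes that the length field would
# indicate if an instruction began at halfword h.
PRESUMED_LENGTHS = [6, 2, 4, 2, 2, 2, 2, 2, 6, 4, 2, 2]


def c_vector(lengths, start):
    """Identifier bits for the sequence that assumes an instruction at `start`."""
    n = len(lengths)
    bits = [0] * n
    h = start
    while h < n:
        nxt = h + lengths[h] // 2
        if (nxt < n and lengths[h] in COMPOUNDABLE_LENGTHS
                and lengths[nxt] in COMPOUNDABLE_LENGTHS):
            bits[h] = 1                      # first member of a presumed pair
            h = nxt + lengths[nxt] // 2      # skip over the second member
        else:
            h = nxt
    return bits


def cc_vector(lengths):
    """Composite identifier bits: logical OR of the three candidate C-vectors."""
    vectors = [c_vector(lengths, start) for start in range(3)]
    return [a | b | c for a, b, c in zip(*vectors)]


for start in range(3):
    print(start, c_vector(PRESUMED_LENGTHS, start))
print("CC", cc_vector(PRESUMED_LENGTHS))
```

Because only one of the three sequences reflects the real instruction boundaries, the bits contributed by the other two land on halfwords that are either never used as instruction starts or are ignored as described in the text.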
In other words, the composite identifier bits in the CC-vector allow any of the three possible sequences to execute properly in parallel for compound instructions or singly for non-compounded instructions. The composite identifier bits also work properly for branching. For example, if a branch to the beginning 88 of the ninth byte occurs, then the ninth byte must begin an instruction. Otherwise there is an error in the program. The identifier bit "1" associated with the ninth byte is used and correct parallel execution of such instruction with its next instruction proceeds. The various steps in the compounding method shown in FIG. 9 as described above are illustrated in the self-explanatory flow chart of FIG. 16. The best time for providing reference point information for instruction boundaries is at the time of compiling. Reference tags 101 could be added at compile time to identify the beginning of each instruction, as shown in FIG. 11. This enables the compounder to proceed with the simplified technique for the aforementioned First, Second and Third cases. Of course, the compiler could identify instruction boundaries and differentiate between instructions and data in other ways, in order to simplify the work of the compounding unit and avoid the complications of a technique like the one shown in FIG. 9. FIG. 10 shows a flow diagram of a possible implementation of a compounder for handling instruction streams like the one in FIG. 9. A multiple number of compounder units 104, 106,108 are shown, and for efficiency purposes this number could be as large as the number of halfwords that could be held in a text buffer. In this version, the three compounder units would begin their processing sequences at the first, third, and fifth bytes, respectively. Upon finishing with a possible instruction sequence, each compounder starts examining the next possible sequence offset by six bytes from its previous sequence. 
Each compounder produces compound identifier bits (C-vector values) for each halfword in the text. The three sequences from the three compounders are ORed 110 and the resulting composite identifier bits (CC-vector values) are stored in association with their corresponding textual bytes.
One advantage provided by the composite identifier bits in the CC-vector is the creation of multiple valid compounding bit sequences based on which instruction is addressed by a branch target. As best shown in FIGS. 14-15, differently formed compounded instructions are possible from the same byte stream.
FIG. 14 shows the possible combinations of compounded instructions when the computer configuration provides for parallel issuance and execution of no more than two instructions. Where an instruction stream 90 containing compounded instructions is processed in normal sequence, the Compound Instruction I will be issued for parallel execution based on decoding of the identifier bit for the first byte in a CC-vector 92. However, if a branch to the fifth byte occurs, the Compound Instruction II will be issued for parallel execution based on decoding of the identifier bit for the fifth byte. Similarly, a normal sequential processing of another compounded byte stream 94 will result in Compound Instructions IV, VI and VIII being sequentially executed (the component instructions in each compound instruction being executed in parallel). In contrast, branching to the third byte in the compounded byte stream will result in Compound Instructions V and VII being sequentially executed, and the instruction beginning at the fifteenth byte (it forms the second part of Compound Instruction VIII) will be issued and executed singly, all based on the identifier bits in the CC-vector 96. Branching to the seventh byte will result in Compound Instructions VI and VIII being sequentially executed, and branching to the eleventh byte will result in Compound Instruction VIII being executed.
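The dependence of the issued groups on the entry point can be sketched for a two-way configuration as follows. This is a minimal illustration under stated assumptions: the function `issue_groups`, the 1-based byte addresses, and in particular the instruction lengths are hypothetical, chosen only so that the behavior matches the branch cases described for byte stream 94; they are not reproduced from FIG. 14.

```python
from typing import Dict, List, Tuple


def issue_groups(lengths: Dict[int, int], cc_bits: Dict[int, int],
                 entry: int, end: int) -> List[Tuple[int, ...]]:
    """Walk a byte stream from `entry` and group instructions for issue.

    lengths maps the starting byte of each instruction to its length in
    bytes; cc_bits maps a starting byte to its composite identifier bit.
    `end` is one past the last byte. At most two instructions are issued
    together (an assumed pair-only configuration).
    """
    groups: List[Tuple[int, ...]] = []
    addr = entry
    while addr < end:
        if addr not in lengths:
            raise ValueError(f"byte {addr} does not begin an instruction")
        nxt = addr + lengths[addr]
        if cc_bits.get(addr, 0) == 1 and nxt in lengths:
            # Bit "1": issue this instruction together with the next one.
            groups.append((addr, nxt))
            addr = nxt + lengths[nxt]
        else:
            # Bit "0": issue this instruction singly.
            groups.append((addr,))
            addr = nxt
    return groups


# Assumed layout: instructions start at bytes 1, 3, 7, 9, 11 and 15.
lengths = {1: 2, 3: 4, 7: 2, 9: 2, 11: 4, 15: 2}
cc_bits = {1: 1, 3: 1, 7: 1, 9: 1, 11: 1, 15: 0}

print(issue_groups(lengths, cc_bits, entry=1, end=17))
print(issue_groups(lengths, cc_bits, entry=3, end=17))
```

With this assumed layout, sequential entry at the first byte yields three pairs, while entry at the third byte yields two different pairs followed by the instruction at the fifteenth byte issued singly. The bits belonging to pairs that the walk skips over are simply never decoded.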
In contrast, branching to the ninth byte in the compounded byte stream will result in Compound Instruction VII being executed (it is formed by the second part of Compound Instruction VI and the first part of Compound Instruction VIII). Thus, the identifier bits "1" in the CC-vector 96 for Compound Instructions IV, VI and VIII are ignored when either of the Compound Instructions V or VII is being executed. Conversely, the identifier bits "1" in the CC-vector 96 for Compound Instructions V and VII are ignored when any of Compound Instructions IV, VI or VIII is executed.
FIG. 15 shows the possible combinations of compounded instructions when the computer configuration provides for parallel issuance and execution of up to three instructions. Where an instruction stream 98 containing compounded instructions is processed in normal sequence, the Compound Instructions X (a triplet group) and XIII (a pair group) will be executed. In contrast, branching to the eleventh byte will result in Compound Instruction XI (a triplet group) being executed, and branching to the thirteenth byte will result in Compound Instruction XII (a different triplet group) being executed. Thus, the identifier bits "2" in a CC-vector 99 for Compound Instructions XI and XII are ignored when Compound Instructions X and XIII are executed. On the other hand, when Compound Instruction XI is executed, the identifier bits for the other three Compound Instructions X, XII, XIII are ignored. Similarly, when Compound Instruction XII is executed, the identifier bits for the other three Compound Instructions X, XI, XIII are ignored.
There are many possible designs for an instruction compounding unit depending on its location and the knowledge of the text contents. In the simplest situation, it would be desirable for a compiler to indicate with reference tags which bytes contain the first byte of an instruction and which contain data.
This extra information results in a more efficient compounder since exact instruction locations are known. This means that compounding could always be handled as in the First, Second and Third case situations in order to generate the C-vector identifier bits for each compound instruction. A compiler could also add other information such as static branch prediction or even insert directives to the compounder. Other ways could be used to differentiate data from instructions where the instruction stream to be compounded is stored in memory. For example, if the data portions are infrequent, a simple list of addresses containing data would require less space than reference tags. Such combinations of a compounder in hardware and software provide many options for efficiently producing compound instructions. While exemplary preferred embodiments of the invention have been disclosed, it will be appreciated by those skilled in the art that various modifications and changes can be made without departing from the spirit and scope of the invention as defined by the following claims.
