Data processing device for processing virtual machine instructions
6298434
Abstract
A preprocessor is functionally inserted between a memory and a processor core. The preprocessor fetches virtual machine instructions, like Java instructions, from the memory and from them it generates native instructions which are supplied to the processor core. In response to a special virtual instruction the preprocessor supplies a native jump to subroutine to the processor core, monitors when the processor core returns from that subroutine and then resumes supplying generated native instructions. The invention also provides for a processor which has a special instruction which calls a subroutine and causes the processor to convert a call context for virtual machine instructions to a call context for a high level language subroutine before making the call.
Description
Unpublished European patent application No. 97203033.2 improves the speed with which programs expressed in the virtual machine instruction set are processed. This is achieved by adding a preprocessor between the memory and the processor core. The preprocessor stores for each particular virtual machine instruction one or more native instructions that express the function of that particular machine instruction. The preprocessor reads a virtual machine instruction from memory, selects the native instruction or instructions defined for that virtual machine instruction and supplies this instruction or these instructions to the processor core for execution. The processor core executes the native instructions that perform the function defined by the virtual machine instruction consecutively: between the clock cycles for those native instructions no processor core clock cycles are used in which the processor core executes additional instructions to select the appropriate native instructions.
lw $a0, 4 ($tosp) // load the top element of the stack in register $a0
lw $a1, 8 ($tosp) // load the second element of the stack in $a1
add $a0, $a1, $a0 // add $a0 and $a1, place the sum in $a0
addi $tosp, $tosp, 4 // lower the stack by one element
sw $a0, 4 ($tosp) // store the sum in the new top of stack
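The effect of the five native instructions above on the memory stack can be traced with a small simulation. This is only a sketch under stated assumptions: the dictionary `memory`, the function name `run_iadd_sequence` and the example addresses are hypothetical, $tosp is taken to point at the first free location below the top element, and the overflow trap of the native add instruction is not modelled (the sum simply wraps to 32 bits).

```python
def run_iadd_sequence(memory, tosp):
    """Trace the five native instructions; memory maps byte addresses to 32-bit words."""
    a0 = memory[tosp + 4]            # lw  $a0, 4($tosp)   : top element
    a1 = memory[tosp + 8]            # lw  $a1, 8($tosp)   : second element
    a0 = (a1 + a0) & 0xFFFFFFFF      # add $a0, $a1, $a0   : wraps to 32 bits (assumption)
    tosp = tosp + 4                  # addi $tosp, $tosp, 4: stack is one element lower
    memory[tosp + 4] = a0            # sw  $a0, 4($tosp)   : sum becomes the new top
    return tosp


# Hypothetical stack contents: top element 7 at 0x104, second element 5 at 0x108.
memory = {0x104: 7, 0x108: 5}
new_tosp = run_iadd_sequence(memory, 0x100)
print(hex(new_tosp), memory[new_tosp + 4])
```

In this trace the two operands are replaced by a single element holding their sum, which is the stack effect expected of the integer addition.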
Preferably, the converter 132 comprises a table for converting a virtual machine instruction to a sequence of native instructions. A one dimensional table may be used, where each cell of the table comprises a sequence of native instructions for one corresponding virtual machine instruction. The cell number may correspond to the value of the corresponding virtual machine instruction. As an example, the sequence of native instructions for the Java integer addition (0x60) may be located in cell 96 (= 0x60 in hexadecimal notation). Since the length of the sequence of native instructions may vary considerably for the various virtual instructions, preferably the sequences are located in a table where the sequences immediately follow each other, that is without any explicit cell structure. Such a translation table 200 is shown in FIG. 2, where the implicit cell boundaries are indicated using dotted lines. In order to be able to locate a sequence for a virtual machine instruction a code index table 210 may be used, which for each virtual machine instruction (VMI 1 to VMI N) indicates the starting point of the corresponding sequence in the translation table 200. For the cell of the translation table 200 which corresponds to VMI 3 the related sequence 220 of native instructions NI 1 to NI M is shown. When the converter 132 receives a virtual machine instruction I, the converter uses the virtual machine instruction I as an index for accessing code index table 210, in order to retrieve the starting point of the sequence of native instructions that should be executed each time an instruction of the same type as I is processed. Subsequently, the converter 132 accesses the translation table 200 at a first address indicated by the code index retrieved from code index table 210 in order to read a first native instruction. This first native instruction is supplied to the processor core 112 via the feed unit 136.
Subsequently, the converter accesses the translation table 200 at addresses following the first address to read further native instructions used to execute the virtual machine instruction I. When all these instructions have been read, which is indicated for example by a termination entry in the translation table 200, the process is repeated for a subsequent virtual machine instruction. A further example of a conversion is given for the Java byte code bipush n (used for sign extending byte n and placing the result on top of the stack). This virtual machine instruction consists of two bytes (0x10 and n), where the first byte specifies the operation and the second byte provides the parameter n. The instruction may be converted to the following sequence of native MIPS instructions:
ori $a0, $0, n /* Load register $a0 with constant n */
sll $a0, $a0, 24 /* Shift left by 24 bits */
sra $a0, $a0, 24 /* Arithmetic shift right, causing sign extension,
by replicating last left-most bit */
sw $a0, 0 ($tosp) /* Store result at new top of stack */
addi $tosp, -4 /* Increment stack size */
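The table-driven conversion described above (a code index table giving the starting point of each sequence in a translation table whose sequences follow each other and end in a termination entry) can be sketched as follows. This is an illustration under assumptions, not the converter's actual layout: the names `TRANSLATION_TABLE`, `CODE_INDEX_TABLE`, `TERMINATOR` and `convert` are hypothetical, native instructions are held as text rather than as 4-byte words, and the operand of bipush is marked by a `{n}` placeholder that is filled in during conversion.

```python
TERMINATOR = None   # hypothetical termination entry closing each sequence

IADD = 0x60         # Java integer addition
BIPUSH = 0x10       # bipush opcode in the JVM specification (decimal 16)

# Sequences are stored back to back, without any explicit cell structure.
TRANSLATION_TABLE = [
    "lw $a0, 4($tosp)",
    "lw $a1, 8($tosp)",
    "add $a0, $a1, $a0",
    "addi $tosp, $tosp, 4",
    "sw $a0, 4($tosp)",
    TERMINATOR,
    "ori $a0, $0, {n}",      # the operand n is left open in the table
    "sll $a0, $a0, 24",
    "sra $a0, $a0, 24",
    "sw $a0, 0($tosp)",
    "addi $tosp, -4",
    TERMINATOR,
]

# For each virtual machine instruction: starting point of its sequence.
CODE_INDEX_TABLE = {IADD: 0, BIPUSH: 6}


def convert(opcode, operand=None):
    """Return the native instructions for one virtual machine instruction."""
    index = CODE_INDEX_TABLE[opcode]
    native = []
    while TRANSLATION_TABLE[index] is not TERMINATOR:
        # Entries without a placeholder pass through unchanged.
        native.append(TRANSLATION_TABLE[index].format(n=operand))
        index += 1
    return native


for instruction in convert(BIPUSH, 5):
    print(instruction)
```

Storing only a start index per virtual machine instruction keeps the index table uniform while letting sequences differ in length.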
This example illustrates that a virtual machine instruction may be parametrized, where an operation code is followed by at least one operand. Advantageously, the converter 132 comprises a translation table 200, where native instructions are represented either by the full code or by an instruction skeleton. As an example, the instruction addi $tosp, -4 (last instruction of the sequence of the previous example) contains no variable parts and may be located in full as a 4-byte entry in the table. The instruction ori $a0, $0, n (first instruction of the sequence of the previous example) contains a variable part and may be located in the table as a skeleton, not specifying the variable part (being n). Preferably, the entry in the table for an instruction skeleton is the same width as a full instruction (e.g. 4 bytes for a MIPS processor), allowing a uniform table. Further information may be located in the table (or in separate table(s)) for indicating how the unspecified part of the native instruction skeleton should be filled in. Advantageously, micro-programming is used to fill in the unspecified parts. The further information may then comprise or indicate micro code. It will be appreciated that it is advantageous to use for an instruction skeleton the same structure (width and composition) as for a full native instruction. However, other structures may be used as well. If the virtual machine is a stack oriented machine, preferably the stack or at least the top elements of the stack are mapped onto registers of the processor core 112.
In this way the memory stack (with the virtual machine stack) is mapped to the register stack (using for example memory locations with increasing addresses for increasingly higher positions on the stack or memory locations with decreasing addresses for increasingly higher positions on the stack, and independently register locations with increasing addresses for increasingly higher positions on the stack or register locations with decreasing addresses for increasingly higher positions on the stack). Assuming that registers $r1, $r2 and $r3 contain three successive elements of the memory stack, where initially $r1 corresponds to the first empty location of the memory stack, $r2 contains the top of the memory stack, and $r3 contains the second element of the memory stack, the Java byte code bipush n may be converted to the following sequence of native MIPS instructions:
ori $r1, $0, n
sll $r1, $r1, 24
sra $r1, $r1, 24
Similarly, if $r2 is the top of stack, the Java byte code instruction iadd may be converted into:
add $r3, $r2, $r3
In this case, the preprocessor 130 also keeps a pointer indicative of the register (e.g. $r2) that stores the top of stack. When a virtual machine instruction is translated, the preprocessor generates native instructions in which the references to registers are adapted according to the content of this pointer to the top of stack. The pointer is updated according to the virtual machine instruction. E.g. after the bipush instruction one item has been added to the stack and if the bipush instruction of the example is followed by a next bipush instruction, the top of stack might be $r1, so that in the translation of this next bipush instruction $r0 is used instead of $r1 in the native instructions (ori $r0, $0, n' / sll $r0, $r0, 24 / sra $r0, $r0, 24).
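The bookkeeping of the top-of-stack pointer can be sketched as below. This is a minimal illustration under assumptions: the class name `RegisterStackMapper` and its methods are hypothetical, higher stack positions are taken to use lower register numbers as in the example above, and the handling needed when the available registers run out (transferring elements to memory) is left out.

```python
class RegisterStackMapper:
    """Tracks which register holds the top of the virtual machine stack."""

    def __init__(self, top):
        self.top = top  # number of the register currently holding the top of stack

    def bipush(self, n):
        dest = self.top - 1   # first empty location is one register below the top
        self.top = dest       # one item has been added to the stack
        return [
            f"ori $r{dest}, $0, {n}",
            f"sll $r{dest}, $r{dest}, 24",
            f"sra $r{dest}, $r{dest}, 24",
        ]

    def iadd(self):
        top, second = self.top, self.top + 1
        self.top = second     # the stack now contains one item less
        return [f"add $r{second}, $r{top}, $r{second}"]


mapper = RegisterStackMapper(top=2)   # $r2 holds the top of stack
print(mapper.bipush(5))               # written to $r1, which becomes the top
print(mapper.bipush(6))               # the next bipush is written to $r0
```

The same skeleton sequence is reused for every bipush; only the register numbers substituted into it change with the pointer.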
Similarly, after the Java byte code instruction iadd the stack contains one item less, so that if the iadd instruction of the example is followed by a bipush instruction, the top of stack might be $r3, so that in the translation of this next bipush instruction $r2 is used instead of $r1 in the native instructions (ori $r2, $0, n' / sll $r2, $r2, 24 / sra $r2, $r2, 24).
Logically, the pre-processor 130 manages an independent virtual machine instruction pointer indicating the current (or next) virtual machine instruction in the instruction memory 120. The processor core 112 has its own instruction pointer (program counter) and issues instruction addresses based upon that program counter. When the pre-processor 130 is active these instruction addresses are not normally used to select which individual virtual machine instructions are to be executed. In this case the program counter of the processor core is partly redundant. The program counter of the processor core 112 and its instruction fetch mechanism are present in the processor core 112 anyway because the processor core 112 is a standard core, designed for executing native programs on its own, the preprocessor being added as an option. During normal processing of virtual machine instructions, the values of the instruction addresses issued by the processor core 112 are irrelevant for selecting individual instructions. Instead, the preprocessor 130 uses its own virtual machine instruction pointer to determine which virtual machine instruction should be loaded and supplies to the processor core 112 the native machine instructions that are stored for that virtual machine instruction in the preprocessor 130. These stored native machine instructions normally do not include jump, branch etc. instructions that cause a change in the value of the program counter of the processor core.
However, according to the invention, the preprocessor 130 responds to some special virtual machine instructions by supplying a "jump to subroutine" instruction (or its equivalent) to the processor core 112. The jump target address of this instruction points at a location in instruction memory 120 which contains the start of a subroutine in native instructions. In response to this jump to subroutine instruction, the processor core 112 starts issuing instruction addresses from the jump target address, receiving the native instructions that make up the subroutine from memory 120 and executing these instructions without intervention of the preprocessor 130. During execution of the subroutine the preprocessor 130 suspends the issuing of converted instructions: it merely monitors the operation of the processor core 112 to determine whether a return from subroutine corresponding to the original jump to subroutine occurs. If the preprocessor 130 detects such a return from subroutine, the preprocessor 130 resumes issuing native instructions dependent on the virtual machine instructions.
FIG. 3 shows a flow-chart of an embodiment of operation of the pre-processor 130. The flow-chart shows a loop of sequential steps for reasons of clarity; in practice the steps may be pipelined: steps from one or more iterations of the loop may be executed concurrently. In a first step 30 of the flow-chart, the pre-processor 130 increments its virtual machine program counter. In a second step 31 of the flow-chart, the pre-processor 130 fetches the virtual machine instruction pointed at by the virtual machine program counter. In a third step 32, the preprocessor 130 tests whether there are any native instructions for that virtual machine instruction. If not, the flow-chart continues from the first step 30; if so, the preprocessor 130 continues with a fourth step 33. In the fourth step 33, the preprocessor 130 tests whether a native subroutine should be called or not.
If not, the preprocessor 130 executes a fifth step 34, in which it supplies a native instruction to the processor core 112 and continues with the third step 32 to see if there are any further native instructions for the virtual machine instruction left. If in the fourth step 33 the preprocessor 130 decides that a native subroutine should be called, the preprocessor 130 proceeds to a sixth step 35 in which it supplies an appropriate subroutine call instruction to the processor core 112. After that the preprocessor 130 monitors operation of the processor core in a seventh step 36 and decides in an eighth step 37 whether the processor core 112 has returned from the subroutine. The pre-processor 130 keeps monitoring operation of the processor core 112 until the processor core 112 returns from the subroutine. Thereupon the pre-processor returns to the third step 32.
An advantageous way of monitoring the operation of the processor core 112 is to observe the instruction addresses issued by the processor core 112. In this case the pre-processor 130 uses the instruction address issued by the processor core 112 to detect whether it should supply converted instructions to the processor core 112 or allow the processor core 112 to execute native instructions. E.g. a converted instruction is supplied only if the issued address is in a predetermined range. In this case, the instruction address issued by the processor core 112 will be in this predetermined range when the jump to subroutine instruction is issued. The jump target address will be outside the predetermined range, causing execution of native instructions from memory 120, and upon return from subroutine the instruction address issued by the processor core 112 will return to the predetermined range, causing the pre-processor 130 to resume the supplying of converted instructions.
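The address-based monitoring described above can be sketched as a check of each issued instruction address against the predetermined range. This is an illustration only: the bounds in `VM_RANGE`, the function names and the sample addresses are hypothetical and do not come from the description.

```python
# Hypothetical predetermined range in which converted instructions are supplied.
VM_RANGE = range(0x8000, 0x9000)


def source_for(address):
    """Decide who supplies the instruction for an address issued by the core."""
    return "preprocessor" if address in VM_RANGE else "memory"


def trace(addresses):
    """Report the transitions seen while observing a stream of issued addresses."""
    events = []
    previous = None
    for address in addresses:
        source = source_for(address)
        if previous == "preprocessor" and source == "memory":
            events.append("subroutine entered")   # jump target lies outside the range
        if previous == "memory" and source == "preprocessor":
            events.append("return detected")      # addresses are back inside the range
        previous = source
    return events


# Two addresses in the range, a native subroutine at 0x2000, then the return.
print(trace([0x8000, 0x8004, 0x2000, 0x2004, 0x8008]))
```

Because the decision depends only on the issued address, such a monitor needs no knowledge of which instruction caused the transfer of control.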
Of course, the switch between supplying converted instructions, executing the native instructions of the subroutine and back can be implemented in various other ways. For example, one might use a jump instead of a jump to subroutine to start the subroutine and execute a jump back anywhere into the predetermined range at the end of the subroutine. This will also enable the pre-processor to detect that execution of the subroutine has been completed.
Preferably, the preprocessor uses a RAM for storing one or more addresses of the subroutines. In this case, the table of native instructions which is used to convert virtual machine instructions into native instructions may contain, for selected instructions, an entry indicating that the pre-processor 130 should supply a native jump-to-subroutine instruction to the processor core 112 and a pointer to the location in RAM where the target address of that jump to subroutine instruction is stored. Upon encountering such an entry, the preprocessor 130 will supply the jump to subroutine instruction to the processor core 112, for example with the target address substituted into this instruction or using a jump to subroutine indirect instruction using a pointer to the RAM location which stores the target address. The use of a RAM to store the target address makes it possible to alter the subroutine that is executed for a virtual machine instruction very quickly, without constructing different hardware.
Preferably, the pre-processor 130 makes it possible to use subroutines of native instructions compiled from a high level language (HLL), like the language C, for executing more complex virtual machine instructions. Causing the processor core to execute a subroutine of native instructions compiled from a HLL normally requires that the arguments of the subroutine are prepared in a specific way for the processor core.
For example in a MIPS processor the first four arguments of the subroutine are expected to be present in registers $r4 to $r7. Similarly, results are normally delivered in a specific way. For example in a MIPS processor results are delivered in registers $r2, $r3. The return address for the subroutine is expected in a specific register, for example register $r31. In the virtual machine model these arguments and results are in virtual storage locations, which the preprocessor maps to physical registers and memory locations. However these registers and storage locations may differ from those used in the HLL subroutine calls. Therefore, the pre-processor is preferably arranged so that in response to one or more virtual machine instructions the preprocessor first feeds native instructions to the processor core which cause the processor core to transfer information from storage elements that hold the content of positions of the stack of the virtual machine to storage elements where arguments are expected for calling a subroutine compiled from the HLL. Subsequently, the preprocessor feeds an instruction to the processor core to transfer control to the subroutine and after return from that subroutine the pre-processor feeds native instructions to the processor core to transfer results of the subroutine from predefined storage elements to storage elements that hold the content of positions on the stack of the virtual machine. Preferably the pre-processor causes the processor core to fetch the address of the subroutine from a RAM location in response to at least one of the virtual machine instructions that are treated in this way.
Arguments of the HLL subroutine may be compile-time arguments and run-time arguments or a mix of both. Run-time arguments are computed during program execution. The run-time arguments are therefore not available when the program is compiled (i.e. at compile-time).
An example of a simple virtual machine instruction whose argument is available at compile-time is the Java bytecode `bipush n`, where `n` is supplied by the compiler. An example of a virtual machine instruction that requires run-time arguments is the Java bytecode `iadd` (of course in practice one would use HLL subroutines only for more complicated instructions). This instruction adds two integer arguments from the stack and places the result on the stack. A high level language (e.g. C) function for this virtual machine instruction could be defined as follows:
int iadd(int a0, int a1) {return a0+a1;}
The programming conventions for the MIPS processor require subroutines compiled from the HLL to expect the first 4 arguments in registers $r4 to $r7, and to place results in registers $r2 and $r3, and then return to the address pointed to by register $r31. The HLL compiler might translate the HLL iadd function to the following sequence of MIPS instructions:
add $r2, $r4, $r5
jr $r31
These instructions provide for using the arguments, producing a result (add) and returning from the HLL subroutine. Before transferring control to the HLL subroutine, the preprocessor has to manipulate run-time data, that is, arguments for the above function have to be moved from the program context into the argument registers, prior to calling function `iadd`. For the Java virtual machine, this requires manipulating stack values, since the Java VM places computational values on a stack. In case a preprocessor generates processor instructions for such a call, it may generate the following sequence of MIPS instructions:
lw $r4,4($sp) // Get arguments from the virtual machine stack
lw $r5,8($sp)
addi $sp,$sp,8
jalr $r31,iadd // Call the HLL subroutine
nop // Fill branch-delay slot of the MIPS processor core.
sw $r2,0($sp) // Place result back on stack.
addi $sp,$sp,-4
In addition, the preprocessor may cause the processor core to save any information from storage elements that may be overwritten by the HLL subroutine. In particular, the part of the virtual machine stack that is stored in registers may be transferred to memory, preferably to the locations where the virtual machine stack is stored during normal execution of virtual machine instructions (i.e. without HLL subroutine execution). HLL subroutines often also use a stack; normally it is guaranteed that HLL subroutines do not alter the contents of the stack "below" what is present on the stack before the HLL subroutine is called. Therefore in principle one may use the same physical stack and stack pointer register for the virtual machine and for the HLL subroutine, if necessary after reserving a sufficient amount of stack space (by incrementing or decrementing the stack pointer as appropriate for a growing stack) before the HLL subroutine is called.
Another example of a Java bytecode which requires run-time arguments is the `jsr n` (jump to subroutine at location `current+n`). An HLL function for this virtual machine instruction could have the following call definition (e.g. in an include file):
void jsr( unsigned n, unsigned current);
A preprocessor generating a call to this function would feed the following sequence of MIPS instructions to the processor core:
ori $r4,$0,n
lui $r5,(current>>16)
ori $r5,$r5,(current&0xFFFF)
jalr $r31,jsr
nop
Of course the use of a mechanism for transferring argument information from locations representing storage in a virtual machine to storage elements where this argument information is expected by a HLL subroutine is not limited to preprocessors. One might implement this type of mechanism inside the processor core, or in a processor more generally, so that the processor in response to a special instruction transfers information to standard locations used by HLL subroutines and then transfers control to such a subroutine.
Thus, in response to normal instructions the processor uses arguments from storage locations defined by a first argument storage convention, e.g. by taking arguments for each normal instruction from locations pointed at by a stack pointer, executing the function of the normal instruction, writing back the results at a conventional storage location and then executing the next instruction. In response to the special instruction, arguments are transferred and the next instruction to be executed is the entry point of the subroutine.
FIG. 4 shows a flow-chart of processor operation which implements the transfer of arguments. At the left of this figure is a normal execution loop: in a first step 40, the program counter is incremented, in a second step 41 the instruction pointed at by the program counter is fetched, in a third step 42 it is tested whether this instruction is a special instruction, if not the instruction is executed in a fourth step 43 and the processor returns to the first step 40. The loop is shown as a sequence of steps, but of course steps from several iterations of the loop may be executing concurrently in a pipe-line. During execution of the instruction the processor reads the arguments of the instruction from locations as defined by the first argument storage convention. If in the third step 42 the processor finds that the instruction is special, the processor executes a fifth step 45, to test whether there are any arguments. If so, the processor executes a sixth step 46, to transfer an argument from a storage location as defined in the first storage convention to its corresponding storage location in the second storage convention and returns to the fifth step 45 to see if there are any further arguments. If the processor determines in the fifth step 45 that there are no further arguments left, the processor transfers control to the subroutine in a seventh step 47. Thereupon the processor executes the subroutine (shown as a dashed step 48).
If the flow-chart is executed by a pre-processor 130, the pre-processor 130 monitors the processor core 112 in this step until a return is detected. Upon return from the subroutine the processor transfers the results of the subroutine (if any) from storage locations defined by the second argument convention to storage locations defined by the first argument storage convention in an eighth step. After that the processor resumes with the first step 40 (of course the program counter may already be incremented when the subroutine is called, in which case the first step may be skipped over).
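The handling of a special instruction in FIG. 4 can be sketched as follows: arguments are moved from the virtual machine stack (first convention) to argument registers (second convention), the subroutine is called, and the result is moved back. This is a simplified model under assumptions: the function name `call_hll`, the use of a Python list as the virtual machine stack and a dictionary as the register file are hypothetical, only a single result in $r2 is handled, and the return address register is not modelled.

```python
def call_hll(vm_stack, subroutine, arg_count):
    """Call an HLL subroutine with arguments taken from the virtual machine stack."""
    if arg_count > 4:
        raise ValueError("only registers $r4 to $r7 are modelled in this sketch")
    registers = {}
    # Steps 45 and 46: transfer each argument from the stack to its register.
    # The top of stack goes to $r4, the next element to $r5, and so on.
    for i in range(arg_count):
        registers[f"$r{4 + i}"] = vm_stack.pop()
    # Steps 47 and 48: transfer control to the subroutine and execute it.
    args = [registers[f"$r{4 + i}"] for i in range(arg_count)]
    registers["$r2"] = subroutine(*args)
    # After the return: transfer the result back to the virtual machine stack.
    vm_stack.append(registers["$r2"])
    return registers


def iadd(a0, a1):
    return a0 + a1


stack = [1, 5, 7]            # 7 is the top of the virtual machine stack
call_hll(stack, iadd, 2)
print(stack)
```

After the call the two operands on top of the stack have been replaced by their sum, as the virtual machine instruction requires.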
