Method and apparatus for performing binary translation5930509
Abstract
A computer system for executing a binary image conversion system which converts instructions from a instruction set of a first, non native computer system to a second, different, native computer system, includes an run-time system which in response to a non-native image of an application program written for a non-native instruction set provides an native instruction or a native instruction routine. The run-time system collects profile data in response to execution of the native instructions to determine execution characteristics of the non-native instruction. Thereafter, the non-native instructions and the profile statistics are fed to a binary translator operating in a background mode and which is responsive to the profile data generated by the run-time system to form a translated native image. The run-time system and the binary translator are under the control of a server process. The non-native image is executed in two different enviroments with first portion executed as an interpreted image and remaining portions as a translated image. The run-time system includes an interpreter which is capable of handling condition codes corresponding to the non-native architecute. A technique is also provided to jacket calls between the two execution enviroments and to support object based services. Preferred techniques are also provide to determine interprocedural translation units. Further, intermixed translation/optimization techniques are discussed.
Claims
What is claimed is:
1. A method executed in a computer system for performing binary translation of a first binary image associated with a first computer architecture having a first instruction set to a second binary image associated with a second computer architecture having a second instruction set, the method comprising the steps of:
translating, in accordance with profile execution information produced from a runtime execution of said first binary image by a runtime interpreter, a first portion of said first binary image to a second portion of said second binary image by producing a plurality of intermediate representations, each of said plurality of intermediate representations including one or more intermediate elements, each of said intermediate elements associated with a set of invariant characteristics remaining constant throughout said translating step, said set of invariant characteristics including at least two invariant characteristics, said at least two invariant characteristics including correspondence to an instruction included in said second instruction set, and correspondence to an instruction in said first instruction set, said translating step including the substeps of:
producing an initial intermediate representation of said first portion;
producing a final intermediate representation of said first portion, said initial and said final intermediate representations being of said plurality of intermediate representations; and
optimizing said initial intermediate representation, substeps of optimizing and translating being intermixed to produce said final intermediate representation; and
generating said second portion of said second binary image using said final intermediate representation, said profile execution information characterizing said runtime execution of said first binary image;
wherein said initial and final intermediate representations include a list of instruction code cells, each of said instruction code cells including an instruction opcode and one or more code cell operands, said first computer architecture being a complex instruction set computer and said second computer architecture being a reduced instruction set computer, and wherein a machine instruction included in said first portion of said first binary image corresponds to one or more instruction code cells in said initial intermediate representation, and wherein said step of producing said initial intermediate representation includes the substeps of:
associating an exception bit flag with each of said instruction code cells;
initializing, as indicated in an opcode exception table, each of said exception bit flags, each of said exception bit flags being set to a bit value of 1 when said opcode exception table indicates a runtime exception can be generated when a machine instruction, which is in said first instruction set and corresponds to the instruction opcode, is executed, otherwise each of said exception bit flags being set to a bit value of 0, said opcode exception table being indexed by instruction opcode and having an entry for each instruction opcode corresponding to an instruction in said first instruction set;
examining each machine instruction included in said first portion of said first binary image to determine memory operands, each of said memory operands representing a memory location;
determining for each of said memory operands an effective address formation to reference a corresponding memory location;
selecting for each of said memory operands one of a load operation or a store operation, the load operation being used to read from the corresponding memory location, and the store operation being used to write to the corresponding memory location;
determining for each machine instruction a functional operation performed by said each machine instruction;
generating, for each of said memory operands, a first instruction code cell and a second instruction code cell, said first instruction code cell computing the corresponding to the effective address formation, said second instruction code cell performing said selected one of said load operation or said store operation;
generating, for said each machine instruction, a third instruction code cell corresponding to said functional operation performed by said machine instruction; and
recording with each of said instruction code cells a first binary image address corresponding to the machine instruction from which said each instruction code cell is generated.
2. The method of claim 1, wherein said translating step further includes the step of optimizing said initial intermediate representation producing another intermediate representation used to produce said final intermediate representation.
3. The method of claim 2, wherein said initial, final and other intermediate representations each comprise a list of instruction code cells, each of said instruction code cells including an instruction opcode and one or more associated operands, said instruction opcode being one of a set of instruction opcodes comprising opcodes corresponding to source instructions associated with said first computer architecture and destination instructions associated with said second computer architecture.
4. The method of claim 1, wherein said initial intermediate representation comprises one or more basic blocks of instruction code cells, each of said basic blocks comprising one or more instruction code cells having no entry or exit between code cells of said each basic block, and wherein a condition code bit mask is associated with each of said instruction code cells of said initial intermediate representation, and wherein said substeps of optimizing and translating include condition code processing comprising the steps of:
performing, for each of said basic blocks, data flow analysis using said condition code bit masks associated with said instruction code cells in said each basic block to produce local summary condition code information associated with said each basic block;
determining, for each of said basic blocks using said summary condition code information associated with said each basic block, global condition code information identifying a portion of said condition codes which are referenced in other basic blocks; and
updating the initial intermediate representation to include a representation of said condition codes and references to said condition codes within and between basic blocks as determined, respectively, by said local summary condition code information and said global condition code information.
5. The method of claim 4, wherein said step of updating said initial intermediate representation includes the steps of:
adding to the initial intermediate representation one or more instruction code cells which sets said each condition code; and
adding to the initial intermediate representation one or more other instruction code cells which uses said each condition code.
6. The method of claim 1, wherein said step of translating includes register processing, said register processing comprising the steps of:
determining which operands of instruction code cells are partial register operands;
replacing each of said partial register operands with an entire register operand;
adding one or more instruction code cells to said first intermediate representation to produce a result using said entire register operand that is equivalent to using said partial register operand; and
determining and updating any data dependencies of said partial register operands.
7. The method of claim 1, wherein said step of translating includes performing floating-point peephole optimization processing which replaces a first group comprising four instruction code cells with a single other instruction code cell, said first group of instruction code cells corresponding to machine instruction in said first instruction set, said single other instruction code cell corresponding to a machine instruction in said second instruction set, and wherein said first group of instruction code cells comprises code cells corresponding to four machine instructions as follows:
a floating point test instruction that tests a floating point value and accordingly sets one or more status values;
a store instruction which stores said status values to a register;
a test instruction which compares the register containing the status values to a bit mask to determine a condition, and sets condition codes accordingly; and
a conditional branch instruction transferring control to one instruction upon the condition being satisfied, otherwise control is transferred to another instruction.
8. The method of claim 1, wherein said translating step includes performing floating point register stack addressing processing that replaces a first group of one or more instruction code cells with a second group of one or more other instruction code cells, said first group of instruction code cells corresponding to machine instructions in said first instruction set which push and pop floating point values used as operands from the stack register.
9. The method of claim 1, wherein said initial intermediate representation comprises one or more basic blocks of instruction code cells, each of said basic blocks comprising one or more instruction code cells having no entry or exit between code cells of said each basic block, and wherein said step of translating includes performing local and global optimizations, said local optimizations being performed within a basic block and said global optimizations being performed between two or more basic blocks.
10. The method of claim 1, wherein said step of translating includes performing exception handler processing that includes the steps of:
examining each instruction code cell in said initial intermediate representation to determine which operands cause exceptions to occur;
determining and recording a mapping of registers for each instruction code cell from the second instruction set to first instruction set, said mapping being used during execution of said second binary image if an exception occurs and being included in said second binary image; and
recording in said second binary image an intervening runtime user exception handler to which execution control is passed when a runtime exception occurs, said intervening runtime user exception handler mapping a first context to a second context which is passed to another exception handler, said first context including registers in said second instruction set, said second context including registers in said first instruction set, said intervening runtime user exception handler using said mapping to map said first context to said second context.
11. The method of claim 1, wherein said step of translating includes final processing comprising the steps of:
replacing instruction opcodes corresponding to operations in said first instruction set with other instruction opcodes corresponding to operations in said second instruction set;
replacing 32-bit operands with 64-bit sign-extended operands; performing interimage call processing for each translation unit that is called from another translation unit, said each translation unit being not defined within said first binary image, said interimage call processing causing runtime execution control to be passed to said runtime interpreter when performing said interimage call at a subsequent runtime; and
performing intra-image call processing for calls between translation units in said first image file, intra-image call processing including, for each intra-image call made from a first translation unit to a second translation unit, the step of:
replacing one or more instruction code cells corresponding to said intra-image call with other instruction code cells causing runtime execution control to directly transfer from said first translation unit to said second translation unit without passing runtime execution control to said runtime interpreter when performing said intra-image call.
12. The method of claim 11, wherein said step of replacing 32-bit operands with 64-bit sign-extended operands uses local data flow analysis information to locate operand references and definitions in said initial intermediate representation.
13. The method of claim 11, wherein said intra-image call processing is performed for a first group of one or more instruction code cells in said initial intermediate representation corresponding to one or more instructions in said first instruction set for implementing one of a program counter-relative call or absolute call from a first translation unit to a second translation unit, and wherein said first group of instruction code cells is replaced with a second group of instruction code cells corresponding to one or more instructions for a program counter-relative call in said second instruction set.
Description
BACKGROUND OF THE INVENTION
This invention relates generally to computer systems and more particularly to binary translation performed in a computer system.
As it is known in the art, computer systems which generally include a central processing unit (CPU), a main memory and an input/output device interconnected by a system bus are used to execute computer programs to perform a useful task. One type of computer program is an operating system which is used to interface the CPU to an application program. The aforementioned application program is used by a user of the computer system to perform the useful task. The operating system typically includes software resources needed by the computer system to interface hardware elements to the computer system as well as to interface the application program with other programs executing in a computer system.
Application programs typically include programs such as word processors which execute in the computer system under the control of the operating system. An application program typically includes one or more binary image containing machine instructions for execution in a computer system. One technique commonly used to produce the binary image includes compiling source code written in the programming language to produce object code. The object code is usually linked with a linker to produce the binary image. The source code generally comprises one or more statements written in a programming language, such as one of the well-known commercially available C, C++, or Fortran programming languages. The object code generally comprises machine instructions and data. The machine instructions are typically executed by a computer processor in a computer system.
It is generally known that binary images or machine executable programs comprising an application program are made for execution in a computer system of a particular computer architecture or instruction set as well as the particular operating system. Typically, binary images made for one architecture and operating system cannot execute on a different architecture and/or operating system.
New architectures are developed to provide significant performance improvements for hardware associated with the architecture. However, one drawback to a new architecture is that existing binary images comprising an application that execute on an older architecture cannot directly run on a newer architecture due to different instruction sets of the new and old architectures. While it is desirable to migrate to a new architecture that may be faster or more efficient than an older architecture, one of the most significant drawbacks for a user is the result that an existing application and data files used on an old architecture are not directly transferrable for use in the new architecture.
As a result, techniques have been developed to assist users in migrating applications and data from an older architecture to a new architecture. One such technique includes translating a binary image designed for execution in an older computer architecture to another binary image for execution in a new computer architecture.
One problem in performing binary translations is formulating an efficient method for transforming a first binary image associated with a first computer architecture to a second binary image associated with a second computer architecture. The method should also be flexible and not unduly restrict the binary translation process.
SUMMARY OF THE INVENTION
In accordance with the present invention is an apparatus for performing a binary translation of a first binary image to a second binary image. The apparatus includes a means for producing a first intermediate representation of the first binary image, a binary image converter, and a code generating means. The first intermediate representation includes one or more instruction codecells. A codecell includes an opcode corresponding to an instruction included in either a first or a second instruction set. The binary image converter includes a translator and an optimizer. The translator translates the first intermediate representation into a second intermediate representation in response to profile execution information. The optimizer intercommunicates with the translator to optimize the first intermediate representation. The code generating means generates the second binary image using the second intermediate representation.
With such an arrangement, a binary image is translated and optimized in a new and flexible way which efficiently uses computer system resources. The foregoing arrangement is flexible in that the steps of optimization and translation can be intermixed and performed in a variety of different orderings. The intermediate representation affords this flexibility by not imposing undue restrictions or making assumptions about the state of an intermediate representation at various points during translation and optimization.
Using the foregoing arrangement of the invention decreases development and maintenance costs associated with a binary translation process. Using a single intermediate representation throughout the binary translation process allows a common service routines to be used throughout the binary translation process, as contrasted with a more costly binary translation process having various IRs requiring multiple corresponding sets of service routines operating on the various IRs.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing features and other aspects of the invention will now become apparent when the accompanying description is read in conjunction with the following drawings in which:
FIG. 1 is a block diagram of a computer system;
FIG. 2 is a block diagram of a dual stage instruction conversion system including a run-time system and a background system;
FIG. 3 is a block diagram of the run-time system portion of the instruction conversion system of FIG. 2;
FIG. 3A is a flow chart depicting the steps performed at run-time to execute a non-native image on the system of FIG. 1;
FIG. 4 is a more detailed block diagram of a binary translator used in the background system portion of the conversion system of FIG. 2;
FIG. 5 is block diagram of a data structure representing a profile record structure;
FIG. 6 is a block diagram of a representative profile record of the profile record structure of FIG. 5;
FIG. 7 a diagram showing a typical arrangement for a instruction for a complex instruction set computer (CISC);
FIG. 8 is block diagram of a register file in the computer system of FIG. 1 showing assignment of registers corresponding to the non-native architecture;
FIG. 9 is a diagram showing a typical construct for one of the registers in the register file of FIG. 8
FIG. 10 is a pictorial representation of connections of various data structures including a dispatch table to determine an equivalent routine for the interpreter;
FIG. 11 is a pictorial representation of the process for activating an alternate dispatch table;
FIG. 12 is a diagram showing an arrangement of an entry from the dispatch table of FIG. 10;
FIG. 13 is diagram showing a typical arrangement of condition codes of a CISC architecture which implements condition codes;
FIG. 14 is a block diagram of an arrangement to determine evaluation routines for condition codes;
FIG. 15 is a block diagram of an arrangement to determine evaluation routines for current and previous values of condition codes;
FIGS. 16-18 are a series of diagrams useful in understanding how condition codes are handled in the run-time system of FIG. 3;
FIGS. 19 and 20 are diagrams showing relationship between address spaces;
FIG. 21 is a diagram of a context data structure used in the interpreter of FIG. 4;
FIG. 22 is a block diagram of a pair of data structures stored in memory which represents a return address stack for a non-native image of a program as well as shadow stack for a native image of the program;
FIG. 23 is a diagram showing the relationship between the data structures of FIG. 22 and execution of non-native and native routines with calls into corresponding non-native and native routines;
FIG. 24 is a diagram of a data structure including translated or native-image routines and call address translation table;
FIG. 25 is a diagram depicting the relationship of the routine call tables in the translated image and the shadow stack to the on-line and background systems;
FIG. 26 is a flow diagram of a typical application program instruction sequence used to illustrate aspects of the invention;
FIG. 27 is a block diagram showing an example of an object;
FIG. 28 is a block diagram showing an example of cross process calling of object methods;
FIG. 29 is a block diagram showing an example of an interface structure;
FIG. 30 is a flow chart showing an example of steps leading to the use of an object in an object oriented service system;
FIG. 31 is a flow chart showing steps in an example embodiment of a method for intercepting functions to perform interface structure replacement;
FIG. 32 is a flow chart showing an example replacement interface structure;
FIG. 33 shows an example embodiment of a template for a jacket function;
FIG. 34 is a flow chart showing steps performed in an example embodiment of a PBJA jacket function when called from non-native code;
FIG. 35 is a flow chart showing steps performed by an example embodiment of a PBJA jacket function when called from native code;
FIG. 36 is a flow chart showing steps performed by an example embodiment of a PAJB jacket function when called from native code;
FIG. 37 is a flow chart showing steps performed by an example embodiment of a PAJB jacket function when called from non-native code;
FIG. 38 is a block diagram showing an example of a system for load time processing to support interception of functions which take a pointer to an object as a parameter;
FIG. 39 is a flow chart showing an example of steps performed at run time to support interception of functions which take a pointer to an object as a parameter;
FIG. 40 is a flow chart showing an example embodiment of steps performed during general function jacketing;
FIG. 41 is a flow chart showing steps to determine and use translation units when performing a binary translation;
FIG. 41A is a flow chart showing steps to form translation units of a non-native binary image;
FIG. 42 is a flow chart showing steps of flow path determination;
FIG. 42A is a flow chart showing steps to determine transfer of control target locations for an indirect transfer instruction;
FIG. 43 is a block diagram showing two types of entries included in the profile statistics;
FIG. 44 is a flow chart showing steps for determining regions;
FIG. 45 is a block diagram of a list of code cells;
FIG. 46 is a diagram which shows the relationship between FIGS. 47 and 48;
FIGS. 47 and 48 are block diagrams which illustrate an arrangement of local data flow analysis information;
FIG. 49 is a block diagram of an opcode table;
FIG. 50 is a block diagram of a data flow analysis arrangement illustrating the use of read-modify and modify-write fields of the basic block value (BBV) data structure of FIG. 47;
FIG. 51 is a block diagram which depicts the BBSC summary information field of FIG. 48;
FIG. 52 is a block diagram of an arrangement comprising global data flow analysis information;
FIG. 53 is a more detailed block diagram of the global data flow connections of FIG. 52;
FIG. 54 is a block diagram of the control flow edge (CFE) data structure;
FIG. 55 is a flowchart that sets forth steps of performing a global data flow analysis;
FIGS. 56A and 56B are flowcharts that set forth method steps for determining merge points during global data flow analysis;
FIG. 57 is a block diagram of a global data flow analysis arrangement illustrating a merge point.
FIG. 58A-58D are block diagrams depicting different variations of the binary image transformer;
FIG. 59 is a flow chart of steps of translating the binary image;
FIG. 60 is a flow chart of the step for one method for selecting the translation unit to be processed;
FIG. 60A is a representation of a call graph used in the method steps of FIG. 60;
FIG. 61 is a flow chart depicting an alternative method for selecting a translation unit to be processed;
FIG. 62A is a flow chart listing steps for forming an initial intermediate representation (IR) of a binary image;
FIG. 62B is a block diagram of a data structure illustrating a transformation of a source instruction to an IR with memory operands removed;
FIG. 62C is a block diagram of a data structure used to indicate whether an IR instruction corresponds to a machine instruction which can generate an exception;
FIG. 63 is a flow chart showing steps for translating and optimizing an initial IR to produce the final IR for a given translation unit;
FIG. 64 is a flow chart showing steps for performing condition code processing;
FIG. 65A is a block diagram of a bit mask associated with an IR instruction code cell used to represent condition codes that can be affected by the corresponding IR instruction code cell;
FIG. 65B is a block diagram which depicts an example transformation from source instructions comprising the first binary image as affected by condition code processing;
FIG. 66 is a flow chart depicting steps for register processing;
FIG. 67A is a block diagram which depicts a 32 bit register in an architecture which has partial register operands;
FIG. 67B is a block diagram which depicts a transformation of an initial IR as a result of register processing;
FIG. 68A is a block diagram which depicts a code pattern which is detected by early floating point optimization processing;
FIG. 68B is a block diagram which is a table indicating a replacement instruction for a specific code pattern detected in early floating point optimization processing;
FIG. 69 is a flow chart depicting steps for local basic block and global routine optimization processing;
FIG. 70 is a flow chart depicting steps of code selection and operand processing which place the IR in final form;
FIG. 70A is a flow chart depicting steps of intra image call processing;
FIG. 71A is a block diagram depicting a translated image comprising tables used in exception handling;
FIG. 71B is a block diagram depicting a table entry in a translator exception table; and
FIG. 71C is a block diagram depicting run time transfer of control when a translated image is executed and an exception occurs.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Computer System
Referring now to FIG. 1, a computer system 10, is shown to include a processor module 12 which has a high performance processor 12a. The computer system 10 further includes, in addition to the processor module 12, a main memory 14, an disk adaptor 15 and an I/O user interface 18, as well as a monitor 19 all coupled by a system bus 20, as shown. Here the processor 12a is a high performance microprocessor such as an Alpha.RTM. microprocessor manufactured by Digital Equipment Corporation, assignee of the present invention, or other high performance processor.
The main memory 14 is comprised of dynamic random access memory and is used to store instructions and data for use by the microprocessor 12a on the processor module 12. The disk adaptor 15 is used to couple the system bus 20 to a disk bus which itself is coupled to disk storage device 17.
The disk storage device 17 is here illustratively partitioned into a plurality of segments or blocks of data which are here represented for convenience as being self-contained and contiguous, but which may be scattered across the disk 17 and be non-contiguous. The disk 17 includes a first storage segment 17a storing an operating system for the computer system 10 as well as an application program stored in segment 17b.
The application program stored in segment 17b is a non-native executable image. That is, the application program is comprised of instructions from a different instruction set than that used in the computer system 10 (i.e. a different computer architecture). Also the application program could have been written for a different operating system than that stored in 17a. Since the instructions provided in the program stored in segment 17b are different from the instruction set executed on the microprocessor 12a the program in segment 17b can not be directly executed on the system 10.
The disk also includes a storage segment 17c which here represents an native executable image of the application program stored in segment 17b. This native image is generated in the computer system via a binary image conversion system (16, FIG. 2) which is here stored with the operating system in the segment 17a as will be described. The image stored in segment 17c corresponds to instructions which can be executed on the microprocessor 12a and thus conforms to the architecture of the computer system 10.
Also stored in a segment 17d are profile statistics which are collected during execution of a portion of the non-native application program stored in 17b. The profile statistics are provided by execution of a run-time routine which converts non-native instructions into native instructions. These profile statistics are used in a background process to convert portions of the non-native image into a native image corresponding to the operation and function of those portions of the non-native application program. In addition, data which are used for the particular application program are also be stored on the disk in segment 17e.
The computer system 10 further includes an I/O user interface 18 which is here an interface used to couple a mouse 19a, for example, to the system bus 20 as well as a monitor 19.
The computer system 10 operates in a generally conventional manner. That is, at "power on", selected portions (not numbered) of the operating system stored in segment 17a are loaded into main memory 14 and occupy a particular address space in main memory 14, such as, address space 14a. As a user of the computer system 10 executes application programs on the system 10, the application programs are run under the control of the operating system.
A typical operating system represented by that stored in 17a is the so-called Windows NT.RTM. operating system of Microsoft Corporation Redmond, Wash. In Windows NT.RTM. or other window type operating systems, displayable images called "icons" are presented to a user on the monitor 19. These icons represent an executable command to initiate execution of a program. When pointed to by a cursor controlled by a mouse, for example, and clicked on this user action activates the command and causes the represented computer program to execute.
Here, however, the application program stored in segment 17b is written in a non native instruction set. That is, the instruction set of the application program is not the same as the instruction set of the computer system 10. Thus, the executable image of the application program stored in segment 17b is comprised of non-native instructions which can not be directly executed on the computer system 10. Nevertheless, the non-native application has a corresponding icon (not shown) which is represented in the window provided by the operating system.
Each non-native application image has a unique identification name (ID) or image key. The identification name or image key is included in the non-native image file and is a unique identifier for the non-native application image. During installation of the file containing the image, typically a server process portion of the operating system determines the unique ID or key to the non-native application image. The ID number is generally assigned by concatenating together unique information of the file. Examples of the types of information include, the time stamp of the file, the file name, the file size and the date that the file was originally produced. Thus, the same non-native image if loaded a multiplicity of times on the computer system will have the same I.D. number. The statistics as well as the translated code associated with each one of the non-native images will be the union of all prior executions of the non-native images for each instance of the non-native application. Other arrangements are of course possible.
When the user clicks on the icon for the program stored in 17b, a portion of the operating system recognizes the ID of the executable image represented by that icon as being comprised of instructions that are non-native to the instruction set and architecture of computer system 10. In general a software module called a loader in the operating system will recognize that the identification name (ID) of the file represented by the selected icon as being non-native to the architecture. Thus, the operating system initiates the execution of an instruction conversion program 16 or feeds the file instruction by an instruction to an instruction pre-processor. Alternatively, a loader can be provided which handles the non-native image by examining the image to determine all files, libraries and resources needed by the image. The loader will thus prepare the non-native image for execution. Part of the preparation is the initiation of the instruction conversion program 16 or alternatively instruction pre-processor, as will now be described.
Binary Image Conversion System
Referring now to FIG. 2, the binary image conversion system 16 is shown to include a run-time system 32 which is responsive to instructions provided from the disk segment 17b. As mentioned, t he run-time system 32 can be implemented as software to emulate the non-native architecture or as a hardware preprocessor to convert the non-native instructions into native instructions. When implemented as software, the run time system 32 consumes more disk space on disk 17 and occupies more main memory storage in main memory 14. Whereas, when implemented in hardware, the run time system 32 requires more chip space in the high performance microprocessor 12a. Here the run-time system will be described as a software implementation which operates in an execution address space 20 of the computer system 10.
As mentioned above, disk segment 17b stores instructions of an application program complied and/or written for an instruction set which is different from the instruction set of system 10. The run-time system 32 receives portions of a non-native executable image from segment 17b comprised of the non-native instructions. The run-time system 32 provides a native instruction or a native instruction routine comprised of a plurality of instructions which are executed by the computer system 10 to provide the same functionality as the non-native image. That is, the functionality called for in the instruction in the executable image of the non-native instruction set is equivalently provided by the routines determined by the run-time system 32. The run-time system executes the equivalent routines on the computer system 10. This provides the equivalent function to provide the same result in computer system 10 which implements the new architecture as would occur in a new or old computer system (not shown) implementing the non-native architecture.
In a preferred embodiment of the run time system 32, the run-time system 32 examines and tests the code from the segment 17b to determine what resources are used by the instruction and the function of the instruction. The run-time system 32 provides the equivalent instructions corresponding to the architecture of the computer system 10.
As the equivalent instructions are determined they are executed in the system 10 and profile data or statistics, as will be described, are collected in response to their execution. The profile statistics describe various execution characteristics of the instruction sequence. These profile data are fed to a server process 36 via a datapath 32b.
Prior to performing a conversion by the run time system 32, the run-time system 32 interrogates the server process 36 via a path 32a to determine from the server process whether there is a native image corresponding to the routine of the application program stored in segment 17b whose execution has just been requested by a user. If a native image does not exist (as would occur the first time the non-native image is executed), the run-time system initiates an interpretation process. If there is code in existence for the particular instruction reached in the application program, due to a prior execution in the run-time system and subsequent conversion by a background system, the run-time system 32 will request and execute the native code.
As mentioned, in general, the first time the application program 17b is executed by a user there will be no native image code in existence. As the program executes, however, native code will be generated by the background process in a manner to be described, and over time as substantial portions of the non-native image are executed, convertible portions of the non-native image will be converted by the background process into native image code. As native image code is generated, it is also stored in segment 17c in a manner that is transparent to the user.
In addition the native image file 17c contains an address correlation table which is used to track the segments of native code corresponding to segments of non-native code. This table is used at run time of the program in segment 17b to determine whether and which non-native segments have equivalent translated native segments.
Translation into the native image is provided via a background system 34 which operates in one embodiment after the interpreter has finished execution of the instructions to provide translated code dependant upon the execution characteristics of the run-time converted instructions. Alternatively, the background system operates while there is a pause in CPU utilization by the run-time system 32. Alternatively, the background system can make translated code available to the run-time system 32 during execution to permit substitution of translated code for a subsequent occurrence of the non-native image during the current execution of the application program. Further still, the run-time system can be implemented as a preprocessor which provides the profile statistics for use by the background process. The background process can be implemented in hardware or software or a combination of both.
The background system 34 receives the profile data generated by the run-time system 32. In accordance with the characteristics of the profile data, the background system 34 forms a native image of at least portions of the instructions of the non-native image stored in segment 17b of disk 17. A preferred arrangement is to have the background system implemented as a binary translator to produce translated code. The native image portions are stored in logical disk drive 17' for use if needed in subsequent executions of the application program from segment 17b. Here it should be understood that the logical disk drive 17' is a logical partition of the disk drive 17 and is here referred to as being a logical disk drive, because in general, it is transparent to the user, but it physically represents space storage such as segment 17c on the actual disk drive 17. Alternatively, the logical disk drive 17 could be a separate disk drive.
The run-time system 32 and the background system are each under the control of the server process 36. The server process 36 is present throughout the operation of the computer system 10. The server process 36 is a software service process which, amongst other things, is used to schedule various transactions within and between the run-time 32 and background systems 34.
After generation of native image code such as by the binary translator, the image translated code is stored on logical disk drive 17' in logical segment 17c' with the profile statistics being stored in logical segment 17d'. These locations correspond to segments 17c and 17d in FIG. 2.
Each time there is a new execution of the application program stored in segment 17b, the run-time system will send a request to the server process 36 for native code corresponding to the non-native code currently in the run-time system 32. The translated code is code which was generated by a previous execution of the background system 34 in accordance with the profile statistics collected by execution of the routines furnished by the run-time system 32. The server process 36 supplies corresponding translated code (if any) to the run-time system 32. If there is translated code, the run-time system 32 will have the translated code execute in place of interpreting the code. Otherwise if there is no translated code, the run-time system 32 will interpret, translate, or otherwise convert the relevant portions of the non-native code currently executed in the computer system 10.
As more code of the program stored in segment 17b is executed, more sections of the program are interpreted producing as a result of the execution, profile statistics which are fed to the server process 36.
The server process 36 controls inter alia the storage of the profile statistics. That is, the server process 36 will merge new (raw) statistics with previously stored merged statistics to provide a new merged profile. The server process will compare the new merged profile with the stored merger profile and will initiate a translation process in the background system 34 when there is a difference between the two statistics. The degree of difference needed to initiate execution is selectable. Such a difference indicates that heretofore never executed code was interpreted and executed in the run-time system. This process will be ongoing until all portions of the non-native image have been encountered by the user and all of the portions which can be translated by the background system 34 have been translated.
The server process also determines the unique key or I.D. number to uniquely identify the non-native image stored in segment 17b. As mentioned above, the attributes of the image comprising the I.D. include the file size, the date of creation of the image, the time stamp and so forth. This key is also used to identify the profile statistics with the non-native program.
The background system 34 will, in general, translate nearly all instructions provided from the non-native applications stored in 17b. Certain types of instructions are preferably not translated. In general those instructions which are not translated are ones in which the execution of the instruction is not predictable. For example, instructions which are self modifying (i.e. are not in read only sections, that is, are on a writtable page) will not be translated. For these instructions the run-time system will execute them via the interpretation routines. Further, instructions for which in the non-native architecture there is no easily produced analog in the native architecture will not be translated. For example, in the X86 architecture of Intel, floating point instructions use a floating point control register to determine inter. alia. rounding modes etc. Although for many executions of the instructions the contents of the register may be in a normal state, this can not be guaranteed. Rather than have the translator determine the state it is more economical to handle these instructions in the interpreter.
Since execution or profile statistics in part determines what code is translated by the background translator non-instruction code is not mistaken for instructions by the translator. Therefore, the translated code can be optimized without fear of optimizing non-instructions.
Referring now to FIG. 3, the run-time system 32 is shown to include an execution address space containing run-time system 32 which includes a run-time interpreter 44, a non-native loader 42 which is fed the ID corresponding to the non-native application image provided from segment 17b of the disk 17, a native image loader 43, native operating system dll's (dynamic link libraries) 45 and a return address stack management arrangement 20 (FIG. 22). The non-native loader 42 is similar to the native image loader 43 except it is capable of handling non-native images and interrogates the server process to determine whether there is any native code corresponding to the non-native code awaiting execution. The non-native loader 42 receives instructions corresponding to a non-native image of the application segment 46a and a native image of the application 46b corresponding to translated instructions provided from the background translator 34, and segment 46c corresponding to data. The non-native loader 42 is used to initially load the non-native file. The native loader 43 is used to initially load the native file if any.
Referring now also to FIG. 3A, at the initiation of an execution of the program stored in segment 17b, (via selection of the appropriate icon) (step 50a) the native loader 43 determines whether an architecture number associated with the non-native image is a native or a non-native image. If the image is a native image execution continues as normal. If however the image is a non-native image, the native loader 43 calls the non-native loader 42 at step 50b. The non-native loader 42 loads the non-native image at step 50c and also recognizes that this architecture number associated with the program represents an application program written for a non-native instruction set. The non-native loader starts the binary image conversion system 16. The non-native loader 42 initially queries the server 36 at step 50d to respond with native code to accelerate execution of the image represented by the code stored in 17b. It should be appreciated that the function of the native loader 43 and the non-native loader 42 can be combined into a single loader.
If this is the first time running the application, the server 36 responds at step 50e by indicating that there is no corresponding native image to execute in place of the non-native image. Therefore, the non-native loader 42 instructs the interpreter 44 to begin an interpretation at step 50f of the instructions from the non-native image. The interpreter 44, for each instruction, determines the length or number of bytes comprising the instruction, identifies the opcode portion of the instruction, and determines the resources needed by the instruction. The interpreter maps the non-native instruction to a native instruction or a native sequence of instructions based upon inter alia the opcode. These instructions are executed by the computer system 10 in the address space 20 (FIG. 3). The run-time interpreter 44 collects data resulting from the execution of the instructions as will be described in conjunction with FIG. 6. These "profile statistics" are stored by the server 36 on the logical disk drive 17'.
The run-time interpreter 44 examines and analyzes the instructions to determine the proper native instruction sequence to replace for the non-native instructions provided from the executable image 46a. These native instructions as they are executed continue to generate profile statistics which are collected and stored in logical disk drive storage 17c'. This process continues until execution of the program 17b is terminated by the user.
After termination of the execution of the non-native program, a background process 34 is initiated (not shown). Alternatively, the background process 34 could be initiated to steal execution cycles from the run-time process 32 or alternatively could be used to substitute into the run-time process translated native image code for routines which are subsequently called during execution of the program 17b, as explained above. The exact sequence of which the background processor is used in conjunction with the run-time processor is an implementation detail.
For subsequent executions of the program the interpreter 44 will only provide interpreter code if the server process 36 does not return a native image equivalent of the sequence which is provided from the background process 34 as will be described.
Thus, if at step 50e the server responds with native code, the native image loader 42 at step 50g loads the native code. After the native image code is loaded, the non-native image loader 42 is called at step 50h to fix up the image. In general the non-native image will provide address tables corresponding to inter alia variables in the non-native image which are needed in the execution of the native image. That is, at step 50h the native and non-native images are stitched together to enable the native image to use information in the non-native image. At step 50i the native code is executed. In general, the native code that is executed corresponds to one or more basic blocks or routines of instruction which terminate by a return statement. After execution, a determination is made based upon characteristics of the return instruction execution and by use of a shadow stack as will be described, whether native image code can continue to be executed. If not then control is transferred to the interpreter. The interpreter continues to interpret and execute until it determines as at step 50k that it can resume using native code.
As also shown in FIG. 3, a jacketing routine 48 is used to jacket functions leaving the execution address space 20 to the native execution space of the computer process of computer system 10 as well as those arising from the native execution space of the computer processor 10 into the execution address space 20 as will be further described in conjunction with FIGS. 27-40.
Referring now to FIG. 4, a preferred embodiment of the background system 34 is shown under the control of the server 36 (FIG. 1). The server 36 determines, responsive to the profile statistics data provided from the server 36, via logical disk drive 17', whether to initiate a translation process in the background. Preferably, the background system 34 translates only portions of the non-native instructions of the application program which were actually executed (via the interpreter 32) in responsive to a session invoking the program.
The non-native image code is examined at 52 in the server and if the code is the type that should be translated, it is fed to the translator 54. In a preferred environment, the translated code 54 is also fed to an optimizer 58, and again, if the type of code is of a type which can be optimized, it is fed through to the optimizer 58 or else, the process exits or terminates to await the submission of new code from executed portions of the non-native image stored at 17b. Other, techniques for performing translation and translation/optimization will be described. After the translator process 54 and/or the optimization processor 58, either translated code is stored in segment 17b' or optimized translated code is stored in segment 17b'.
Profile File Data Structure
Referring now to FIG. 5, a profile file data structure 60 used to store information gathered at execution time by instructions in the interpreter 34 is shown. The data structure 60 has records which contain information about the execution of a non-native architecture program when the program executes control transfer instructions. The profile record can include other information. That is, the profile records contain information about a target address encountered in the non-native image.
The data structure 60 is shown to include two principal sections. The first section is a profile header section 62 which comprises an image key field 62a. The image key field 62a is used to store information regarding the ID or identification of the profile record. The information in this field 62a is used to associate the profile statistics with a corresponding non-native image and its associated translated code, if any. Thus, the image key field 62a corresponds to the image ID or key field as mentioned above. The profile header 62 also includes a version field 62b comprised of a major version field 62b' and a minor version field 62b". The major version field 62b' and minor version field 62b" are each here 16 bit or 2 bytes in length and their union provides a resulting 32 bit version field 62b. The version fields are used to keep track of which version of the interpreter was used to generate the profile statistics in the table and the profile file format.
The profile file 60 also includes a plurality of raw profile records, here 64.sub.a -64.sub.n. Each of the profile records 64.sub.a -64.sub.n maintains information about run-time execution of control transfer instructions in the non-native image. Each of these records are variable length records as is each of the unique profile files 60. Thus, for each control transfer encountered during execution of the non-native image in the interpreter 34 a raw profile record is produced. The interpreter 34 will place into the raw profile record information regarding the execution of the control transfer instruction. The information which is included in the raw profile record is as described below. Suffice it here to say, however, that the raw profile records are used by the server process to provide a profile record which is then used during translation of the associated routines in the background system.
Referring now to FIG. 6, an exemplary one of the raw profile records here 64.sub.n is shown. The raw profile record 64.sub.n includes a profile record structure 66 including an address field 66a, a flag field 66b and a count field which tracks the number of indirect targets of control transfer 66c. The address field 66a contains the actual target address in the non-native image, as determined by the interpreter 44. This address is the actual target address of the instruction that caused a control transfer during execution of the non-native image. The address field 66a is generally the address length of the non-native architecture or here 32 bits or 4 bytes long. The flags field 66b contains the states of the flags at the target address. The flags field 66b is here 2 bytes or 16 bits long. The n.sub.-- direct field 66c is a counter field which keeps track of the number of indirect target or computed target addresses contained in the remainder of the profile record 64.sub.n as will be described below.
There are additional optional fields 70 which comprise the record. One field is a count field 70a which corresponds to either the number of times a control transfer occurred to the address contained in field 66a or a count branch taken field counter which keeps track of the number of times a branch was taken by the instruction corresponding to the address contained in field 66a. Fields 70b.sub.0 -70b.sub.n correspond to addresses which are the targets of the control transfer and are cumulatively maintained in the profile record structure.
The optional fields 70 are used to keep track or maintain a count of the targets of the control transfer instruction in the image. The count field 70a is either a control transfer field count of the number of times control was transferred to the target address or a branch taken field corresponding to the count of the number of times a conditional control transfer of a branch instruction was taken. The type of field 70a is determined by the flags field 66b being "ANDED" or masked with a value which tests the state of the associated flag. This test determines whether the target address was a result of a control transfer instruction or a branch instruction. This optional field is also a long word.
The target of control transfer fields 70b.sub.1 -70b.sub.n are the target addresses of the control transfer which occurred at the control transfer instruction. These fields keep track of the addresses for indirect transfers, that is, transfers to a run-time computed target address.
The profile statistics are managed by the server process 36. The profile statistics are collected by the interpreter 44 during the course of execution of the emulated code. For each execution the server 36 searches for a profile record corresponding the target address. The server 36 merges the new run-time statistics with the existing statistics to produce a new profile file.
The server 36 makes use of a software cache and hash table (not shown) to keep track of the profile records. For an address which needs to be looked up, the address is looked up in the cache in 4 different locations that is by using a four way associative cache (not shown). If the address is not there it is looked up in a conventional hash table. The information in the hash table is the count values for the fields.
Run-Time Interpreter
Details of an interpreter used to convert non-native instructions to native instructions and provide profile or run-time statistics will now be described. In particular the interpreter 44 interprets instructions of the so-called X86 architecture by Intel Corporation San Francisco, Calif.) into ALPHA instructions by Digital Equipment Corp. will be described.
Referring now to FIG. 7, an X86 instruction 100 is shown to include as many as six different fields. These fields are an opcode 100a, an rm byte 100b, a scaled index and base (sib) byte 100c, a displacement 100d, any immediate data 100e, and any one of six types of prefixes 100f.
The opcode 100a defines the task or operation which will be executed by the instruction 100. The rm byte 100b is an effective address specification and is used in conjunction with the opcode 100a to specify a general operand as a memory or register location and, in some cases, also participates in defining the operation. The sib byte 100c is used in conjunction with the rm byte 100b to provide additional flexibility in addressing memory locations. The displacement field 100d provides a displacement from the base register or from virtual zero of a segment. The immediate data field 100e provides immediate data to the opcode 100a.
The prefixes 100f are located before the opcode 100a in the instruction 100. Possible prefixes 100f are a segment override which implements a second (or multiple) addressing space, a repeat specifier value to repeat a specific instruction n times, a lock assertion for synchronization in multiple CPU environments, an address size prefix which selects between 16 and 32 bit addressing, an operand size prefix which selects between 16 and 32 bit operands, and an opcode prefix which selects an alternative opcode set.
From the opcode 100a it can be determined whether an rm byte 100b, an unconditional displacement, or the immediate data field is provided in the instruction 100. It can be determined from the rm byte 100b whether a sib byte 100c and/or a conditional displacement field 100d is included in the instruction 100. As all fields are not required by each Intel instruction 100, Intel instructions are not of a fixed length, but rather are of varying lengths.
The run-time interpreter 44 (FIG. 3) is, in the preferred embodiment, implemented on a computer system 10 (FIG. 1) which conforms to the Alpha architecture. An Alpha architecture computer system operates using the Alpha instruction set which is comprised of fixed length instructions. The run-time interpreter 44 operates on a single Intel instruction at a time. For each Intel instruction a single Alpha instruction or multiple Alpha instructions forming a corresponding Alpha routine, is provided which is an operational equivalent to the Intel instruction.
To transparently emulate the execution of an Intel or other non-native instruction 100 the run-time interpreter 44 should be capable of emulating the operation of the Intel or non-native memory, registers, condition codes and a program counter which, on a 32 bit Intel machine is referred to as an extended instruction pointer, EIP. In this way, a result of the execution of the instruction 100 is recorded accurately.
The run-time interpreter 44 uses the same memory space for data while executing Alpha routines corresponding to Intel instructions as is used when executing native Alpha instructions. This is possible because the strict standards to which Win32 software applications adhere allow for differences in calling conventions but not in the representation of the data. The maintenance of the Intel registers, condition codes and EIP are discussed below.
Referring now to FIG. 8, a table 101 depicting Intel or non-native values assigned to the registers of computer system 10 is shown to include eight registers which are assigned to emulate the operation of the eight Intel integer registers, EAX 104a, EBX 104b, ECX 104c, EDX 104d, EDI 104e, ESI 104f, EBP 104g, and ESP 104h. A single register, CONTEXT 105, is assigned to serve as a pointer to the emulator state context maintained in memory which is used to manage each thread executing in a multitasking environment. An additional register, FSP 106, stores a floating point stack pointer for addressing an eight entry stack of floating operands.
Three registers, CCR 107a, CCS 107b, and CCD 107c are assigned to store information which allow condition code bits to be maintained in an unevaluated state by the on-line interpreter 44. The SHADOW 108 register provides a pointer to the shadow stack (as will be described) which maintains activation records for translated code. The SEGOFF 109 register maintains an offset from address zero in the native architecture memory permitting the native architecture to emulate multiple addressing spaces which are possible in the Intel architecture and other non-native architectures. Four additional registers T0 110a, T1 110b, T2 110c and T3 110d are assigned as temporary registers.
The frame 112 register identifies the activation record at the most recent activation of the run-time interpreter 44. The Emulator's Return Address, ERA 114, register stores the return address when the run-time interpreter 44 calls a private sub-routine. The Effective Address, EA 116, register stores the result of evaluating an RM byte 100b and to specify a memory address to a memory access routine.
Seven of the remaining registers, NXTEIP 118a, NXTQ.sub.-- LO 118b, NXTQ.sub.-- HI 118c, NXTJMP 118d, Q0 118e, Q1 118f and QUAD 120 retain values which are used by the interpreter 44 to identify a complete Intel instruction 100 from the instruction stream and to provide pipelining capabilities.
To identify an Intel instruction 100, the run-time interpreter 44 assembles an eight byte (64 bit) snapshot of the instruction stream beginning at the start of the current Intel instruction number. This quadword is retained in QUAD 120.
To assemble QUAD 120, the run-time interpreter 44 captures two quadwords of information from the instruction stream. The run-time interpreter 44 uses the address in the instruction stream identified by the next extended instruction pointer, NXTEIP 118a, as the starting address for the first quadword. NXTEIP 118a identifies a random byte in the instruction stream at which the next instruction to be executed begins. Here, computer system 10 (FIG. 1) requires a quadword aligned address for this initial capture. Accordingly, if NXTEIP 118a is not a quadword aligned address, the three low order bits are first zeroed thus forcing the capture to occur beginning at a quadword boundary. The quadword captured beginning at this quadword aligned address is stored in register Q0 118e. By executing the capture in this manner, the quadword stored in register Q0 118e will at least provide the low byte of the next instruction.
The second quadword capture occurs at an address identified by NXTEIP 118a incremented by seven bytes. Here again, computer system 10 requires a quadword aligned address for this second capture. If the address identified by NXTEIP 118a incremented by seven bytes is not quadword aligned, the run-time interpreter 44 forces the three low order bits to zero thus forcing the address to be quadword aligned. From this quadword aligned address, the capture is performed and the quadword is stored in register Q1 118f. Here, the quadword stored in register Q1 118f contains at least the high order byte of the quadword beginning at the next instruction as identified by NXTEIP 118a.
To extract the low order bytes of the quadword beginning at NXTEIP 118a, the run-time interpreter 44 executes an instruction which, using the three low bits of NXTEIP 118a, determines a byte in register Q0 118e which is identified by NXTEIP 118a, whether or not this byte is quadword aligned. The data in register Q0 118e is copied to register NXTQ.sub.-- LO 118b and shifted right to locate the byte identified by NXTEIP 118a in the low order byte register NXTQ.sub.-- LO 118b. The high order bytes of NXTQ.sub.-- LO 118b which, after the shift, no longer contain valid information are zeroed.
The three low bits of the address identified by NXTEIP 118a incremented by seven bytes is used to determine the high order byte of the quadword beginning at NXTEIP 118a. Here, the data in register Q1 118f is copied to register NXTQ.sub.-- HI 118c shifted left to locate the byte identified by NXTEIP 118a incremented by seven bytes in the high order byte of register NXTQ.sub.-- HI 118c. Here, the low order bytes of NXTQ.sub.-- HI 118c which no longer contain valid information as a result of the shift are zeroed. The result of ORing the contents of registers NXTQ.sub.-- LO 118b and NXTQ.sub.-- HI 118c is stored in QUAD 120.
Referring now to FIG. 9, the low bit of QUAD 120 is shown to be aligned with an Extended Instruction Pointer, EIP 121. In an Intel machine, the EIP 121 identifies a location in the instruction stream which corresponds to the beginning of the current instruction. As each instruction in the instruction stream is executed, the EIP 121 is incremented in the instruction stream to point to the beginning of the next instruction. QUAD 120, therefore, holds a quadword of information beginning at the byte identified by EIP 121.
To determine the operation of the Intel instruction 100 and a corresponding Alpha routine which performs the operational equivalent of the Intel instruction 100, the interpreter uses the information contained in QUAD 120. Typically, the first byte of an Intel instruction is the opcode 100a as shown in FIG. A. The run-time interpreter 44 extracts the first and second low bytes 120a, 120b of QUAD 1002 to provide a two byte instruction fragment 122. From this two byte instruction fragment 122, a corresponding Alpha routine and the length of the instruction 100 are determined.
Referring now to FIG. 10, an arrangement 130 to determine the length of the Intel instruction 100 and the corresponding Alpha routine which implements the operational equivalent of the Intel instruction 100, is shown. The arrangement 130 extracts the two low bytes 120a, 120b from QUAD 120 to provide the two byte instruction fragment 122. This two byte instruction fragment 122 is used as an index into a dispatch table 131 which resides in system memory 14 (FIG. 1).
The dispatch table 131 includes 2.sup.16 =64K (65536), 32 bit entries of which entry 131i is representative. Each entry corresponds to each instruction in a set of instructions available in the Intel instruction set. The contents of these 32 bit entries 131i include a field 131a containing an address at which the corresponding Alpha routine resides in system memory 14 as well as a field 131b containing the length of the instruction.
The dispatch table 131 is generated by a tool which identifies each instruction in the Intel instruction set such that the two byte instruction fragment 122 is sufficient information to identify the proper entry which corresponds to the current Intel instruction 100. The tool also provides the complete length of the Intel instruction 100 and includes this information in the dispatch table in the length field 131b along with the location of the Alpha routine which will provide the functional equivalent of the Intel instruction 100 in the address field 131a. The run-time interpreter 44 chooses among eight dispatch tables based upon the sequence of prefix elements 100f preceding the actual opcode 100a.
As discussed above in conjunction with FIG. 7, an Intel instruction 100 may be comprised of multiple elements 100a-100f. Multiple dispatch tables are provided by run-time interpreter 44 to handle the different values and combination of values which can be selected by the prefix element 100f. As discussed above, three possible prefixes 100f are addressing size (16 or 32 bits), operand size (16 or 32 bits) and two byte opcode, which selects an alternative opcode set. Any one or combination of these prefixes 100f may be present in an Intel instruction 100.
The addressing size prefix toggles between an addressing size for the Intel system which truncates address arithmetic to 16 bits or to 32 bits. Typically, the address size is 32 bits. The operand size prefix is similar wherein an operand expected by the system is 16 bits under a 16 bit operand size or 32 bits when the operand size is set for 32 bits. Here again, the typical operand size is 32 bits. The final prefix toggles between two alternative opcode sets. The first is a one byte opcode set and the second is a two byte opcode set. Here, a one byte opcode set is typically selected. A dispatch table similar to the dispatch table 131 in FIG. 10 is provided in system memory 14 for each of the eight possible combinations of prefixes 100f, the default dispatch table is dispatch table 131 having a 1 byte opcode with a 32 bit addressing size and a 32 bit operand size.
In addition to an entry for each instruction, also included in dispatch table 131 is an entry for each prefix 100f and prefix 100f combination. The 32 bit entry 131j, corresponding to a prefix 100f, activates a different dispatch table in memory 14 in which the subsequent opcode 100a in the instruction stream and its corresponding two byte instruction fragment 122 may be used to index the proper 32 bit entry 131i.
Referring now to FIG. 11, a process for activating an alternate dispatch table 131' is shown to include extracting a two byte instruction fragment 122 from QUAD 120. The two byte instruction fragment 122 is used as an index into the dispatch table 131.
Here, the two byte instruction fragment 122 identifies an entry in the dispatch table 131j. The dispatch table entry 131j includes a native routine address 131a in memory 14 and the length 131b of the Intel instruction 100 which here, is 001 or one byte. The first byte of the two byte instruction fragment 122 is a prefix 100f to instruction 100 which selects 16 bit addressing. Accordingly, the native routine 132 identified by the native routine address 131a, instructs the run-time interpreter 44 to activate the dispatch table 131' which corresponds to an instruction set implementing 16 bit addressing.
The length 131b of the Intel instruction 100 is provided to the run-time interpreter 44 which increments EIP 121 one byte in QUAD 120 to identify the beginning of the next instruction. A new two byte instruction fragment 122' is extracted from QUAD beginning at the new location identified by EIP 121. This two byte instruction fragment 122' identifies an entry 131i' in dispatch table 131'. Again, the two portions of the dispatch table entry 131i' identify the native routine address 131a' in memory 14 of the native routine 134 which is the operational equivalent of the Intel instruction 100 and the length 131b' of instruction 100.
The run-time interpreter 44 executes the native routine 134 which provides the operational equivalent of Intel instruction 100. Once complete, the on-line interpreter activates the default dispatch table 131 for 32 bit addressing and operands and one byte opcodes. While the run-time interpreter 44 is executing the native routine 134 for Intel instruction 100, the process just described allows the run-time interpreter 44 to identify the beginning of the subsequent instruction by incrementing EIP 121. In addition, the entry in the active dispatch table 131 which corresponds to the subsequent instruction is also identified. From this entry 131n, the address of the native routine 131na corresponding to the subsequent instruction as well as the length 131nb of the subsequent instruction are determined. This arrangement allows the on-line interpreter to operate in a pipelined fashion, executing multiple instructions in parallel.
Referring now to FIG. 12, a 32 bit entry 131i from dispatch table 131 is shown to be divided into two sections, the first section 131a corresponding to bits 3-31 of the 32 bit entry 1012 and the second section 131b corresponding to bits 0-2 of the 32 bit entry. Bits 3-31, section 131a are used to address the Alpha routines which execute the operational equivalent of the Intel instruction 100 and bits 0-2 131b signify the length of the Intel instruction 100.
The dispatch table targets are aligned on quadword boundaries. That is, the Alpha instructions which the entries in the dispatch table 131 point to and execute the operational equivalent of Intel instruction 100, are located in system memory 14 on quadword boundaries. In this way, bits 0-2 of the address of the Alpha instructions are always zero. As a result, bits 0-2 131b' may be used to convey additional information about the instruction as here, where these bits are used to signify the length of the instruction. As the addresses of the Alpha routines are always 000 in bits 0-2 field 131b', a full 32 bit address is recreated by appending these zeros to bits 3-31 1012a to provide a complete 32 bit address.
As control is passed to the Alpha routine identified by the 32 bit address, bits 0-2 are used to increment EIP 121 so that EIP 121 is pointing to the beginning of the next instruction. Here, if the length of the Intel instruction 100 is from 1-6 bytes in length, QUAD 120 contains sufficient information to form a second, two byte instruction fragment 122 which may be used to index the current dispatch table to determine the corresponding Alpha routine for the next Intel instruction. This arrangement allows the run-time interpreter 44 to pipeline instructions and thus execute the application program more quickly and efficiently. While an Alpha routine is being accessed corresponding to a current instruction, the run-time interpreter 44 is able to determine the address and length of the next Intel instruction 100 in the instruction stream. A value of zero returned from bits 0-2 field 131b of the 32 bit entry 131i for the length of the Intel instruction 100 however, indicates that the instruction was longer than 6 bytes and hence, pipelining is not possible for this Intel instruction and accordingly, the EIP 121 is not incremented. It is then the responsibility of the Alpha routine to increment EIP 121 and to refill the pipeline.
Condition Code Processing
Referring now to FIG. 13, general purpose registers 135 of an Intel X86 machine are shown to include a single register, EFLAGS 135a, in which condition codes are maintained. This register, EFLAGS 135a, maintains the six condition code bits, the Carry bit 136a (C), the Negative bit 136b (N), the Zero bit 136c (Z), the Overflow bit 136d (O), the Parity bit 136e (P), and the Low Nibble Carry bit 136f (A). Each of these bits may be cleared or set as a result of the execution of an Intel instruction 100. To completely emulate the operation of the Intel application, the run-time interpreter 44 also maintains, in an unevaluated state, the current state of the condition codes resulting from the execution of an Alpha routine which corresponds to the Intel instruction 100.
As is often the case in systems which maintain condition codes, a subsequent condition code modifying instruction may be executed, thus overwriting the changes made to the condition code bits by a prior condition code modifying instruction, before the state of the condition codes is required by a subsequent instruction. In addition, many of the condition code modifying instructions effect only a partial set of the condition code bits. Accordingly, a complete evaluation of the condition code bits after execution of every condition code modifying instruction would be wasteful at CPU time. Nevertheless, the state of the condition code bits needs to be readily ascertainable throughout the execution of the X86 image should the current state of the condition codes be required.
Referring now to FIG. 14, the run-time interpreter 44 is shown to include a set of data storage locations 138, a table of methods 139, and evaluation routines 140 which are used to emulate the X86 condition codes during execution of an X86 image in computer system 10.
The set of data storage locations 138 is shown to include three locations 138a, 138b, 138c which are updated upon execution of an instruction which would have modified the condition codes in an X86 system. The first location, data1 138a, and the second location, data2 138b, store data used in the execution of the instruction, for example, an operand and a result of the instruction. This information is used later during execution of the application program should it become necessary to evaluate the condition codes.
The third location, pointer 138c, contains a pointer to the table of methods 139 which is a dispatch table used to evaluate the condition codes should the system require the current value of the condition codes. The table of methods 1022 contains an entry for each of the eight predicates available in X86 conditional branches (and equivalent SETcc instructions), an entry to obtain the nibble carry, A 136f, bit and an entry to obtain a complete image at the EFLAGS 135a register. The set of methods includes one for each of the six condition codes.
Each entry in the table of methods 139, identifies an evaluation routine 140 which evaluates the condition described in the method table entry. Data1 138a and data2 138b are provided to the evaluation routines to determine the state of the condition code bits should a subsequent instruction require the current state of the condition codes.
When an Alpha routine is executed for an Intel instruction which would have modified one or more of the condition codes, the run-time interpreter 44 stores zero to two pieces of information from the instruction in the first two storage locations, data1 138a and data2 138b. These pieces of information, possibly an operand and a result of the operation, are used by the evaluation routines to compute the condition codes. In the third storage location, pointer 138c, a pointer is placed which, in accordance with the type of instruction which was executed, identifies the entry in the table of methods 139 which will identify the evaluation routines 140 which are to be called if and when the condition codes are evaluated.
The table of methods 139 is specific to the type of instruction executed. That is, if the instruction modifies all of the condition codes, the table of methods includes an entry pointing to a routine for each of the six condition codes. If the instruction modifies only the C bit, the only entry in the table of methods 138 is a entry pointing to an evaluation routine which will evaluate the C bit. Other possibilities include instructions which modify all of the condition code bits except for the C bit (ALL.sub.-- BUT.sub.-- C) instructions which modify only the Z bit (ONLY.sub.-- Z) and instructions which modify only the C and O bits (C.sub.-- AND.sub.-- O). The table of methods 139 for instructions of these types would include entries pointing to routines which correspond to all but the C bit, only the Z bit and only the C and O bits respectively.
Each entry in the table of methods 138 identifies a separate evaluation routine 140 which computes that specific condition code predicate or image of EFLAGS 135. Because these routines are only executed when necessary, the condition codes are maintained in an unevaluated state and accordingly, only minimally effect the execution speed of the application. Data1 138a and data2 138b are provided to the evaluation routine 1024 to determine the effect the instruction had, or should have had, on the condition codes. Later, when a subsequent instruction is encountered by the run-time interpreter 44 which requires the current value of one or all of the condition code bits as input to the instruction, for example, as a condition in a conditional instruction, the run-time interpreter 44 uses the information provided in the data storage locations 138a and 138b, the table of methods 139 and the evaluation routines 140 to determine the current values of the condition code bits.
As discussed above, an Intel instruction can modify all condition code bits, or a subset of those bits. If the current instruction which modified the condition code bits modifies only the C bit and the previous instruction modified all of the condition code bits it would be wasteful to gather the data necessary to evaluate all but the C bit and copy it into the table of methods 139 which is provided for the current C bit modifying instruction. As a result, the run-time interpreter 44 maintains information to evaluate the previous state of the condition code bits based upon a previous condition code modifying instruction as well as the current condition code modifying instruction.
Referring now to FIG. 15, the interpreter is shown to include two sets of data storage locations 138 and 138', two corresponding tables of methods 139 and 139' and corresponding evaluation routines 140 and 140'. A first condition code evaluation grouping 137 corresponds to a current condition code modifying instruction and a second condition code evaluation grouping 137' corresponds to a previously executed condition code modifying instruction. Further, a finite state machine (FSM) is provided which determines how the previous and current states of the condition codes are maintained. The states and transitions of the FSM are the five types of condition code updates: ALL.sub.-- BUT.sub.-- C, ONLY.sub.-- C, C.sub.-- AND.sub.-- O, ONLY.sub.-- Z and ALL. Each transition has associated with it one of three actions: replace, push or resolve.
Provided below is a table, TABLE 1, which describes the action taken to maintain the condition code bits. The action is contingent upon which condition code bits the current instruction will modify as well as which condition code bits were modified by a previously executed condition code modifying instruction. In addition, the actions have been carefully selected to provide an action for the transition which entails a minimal amount of work yet still provides the run-time interpreter 44 a complete up-to-date set of condition code bits at any time.
In a replace action, the contents of the current condition code evaluation grouping are replaced by the values resulting from the next instruction. That is, the contents of the data storage locations 138, the corresponding table of methods 139 and the evaluation routines 140 are replaced with values which will enable the run-time interpreter 44 to evaluate the condition codes modified as a result of the next instruction. A replace action does not modify the contents of the previous condition code evaluation grouping. A replace action is appropriate when the set of condition code bits modified by the next condition code modifying instruction includes at least all of the condition code bits in the set of condition code bits modified by the most recent condition code modifying instruction.
A push action however, replaces the contents of the previous condition code evaluation grouping 137' with the contents of the current condition code evaluation grouping 137. The current condition code evaluation grouping 137 is used to provide the necessary information to evaluate the condition code bits modified by the next instruction. A push action is appropriate when the set of condition code bits modified by the next condition code modifying instruction does not include all of the condition code bits in the set of condition code bits modified by the most recent condition code modifying instruction. In addition, a union of the two condition code bit sets results in a complete set of condition code bits.
The final action is a resolve. The resolve is the most complicated of all the actions. In a resolve, the state of the condition codes, as represented by the current and previous condition code evaluation groupings 137 and 137', is evaluated resulting in a complete set of condition code bits, or an ALL, in the current condition code evaluation grouping 137. A push is then performed for the next instruction. A resolve action is appropriate when more than two condition code evaluation groupings would be necessary to maintain a complete set of condition code bits.
TABLE I
______________________________________
Most Recent CC State
Next CC ALL.sub.--
State BUT.sub.-- C
ONLY.sub.-- C
C.sub.-- AND.sub.-- O
ONLY.sub.-- Z
ALL
______________________________________
ALL.sub.--
replace push push replace
push
BUT.sub.-- C
ONLY.sub.-- C
push replace resolve resolve
push
C.sub.-- AND.sub.--
push replace replace resolve
push
ONLY.sub.--
resolve resolve resolve replace
push
Z
ALL replace replace replace replace
replace
______________________________________
As mentioned above, the first condition code evaluation grouping 137 maintains in an unevaluated state the state of the condition codes corresponding to the execution of a current instruction. The second condition code evaluation grouping 138 maintains in an unevaluated state the state of the condition codes corresponding to the execution of a previous instruction.
The first set of data storage locations 138 here, registers CCR 107a, CCS 107b and CCD 107c retain three values. CCR 107a and CCS 107b contain data used by the current, non-native instruction such as an operand and a result of the instruction. CCD 107c contains a pointer to the dispatch table 139 provided to evaluate the state of the condition codes which are modified as a result of the execution of the current instruction. The second set of data storage locations 138' retain similar values corresponding to a previous condition code modifying instruction.
Here, each condition code evaluation grouping 137, 137' is shown to include a location in the respective table of methods 139, 139' which indicates the category of instruction which was executed. That is, whether the instruction modifies all of the condition code bits or a subset of the condition code bits. Using this value and the information in the FSM of TABLE I, the run-time interpreter 44 maintains in an unevaluated state, the complete set of condition code bits.
To illustrate how this works, an example is provided in conjunction with FIG. 15, in which a current instruction modifies all of the condition code bits (ALL) and a next instruction modifies only the C bit (ONLY.sub.-- C). In this simple example, the contents of the second condition code evaluation grouping 137', which provides the previous condition code state, is immaterial as will be shown.
As the current instruction modifies all of the condition code bits, the category location 139a of dispatch table 139 would indicate an ALL value. Accordingly, an entry for each of the six condition code bits is provided in dispatch table 139a to access evaluation routines 140 for each condition code bit.
When the corresponding Alpha routine for the next instruction is executed, the category location 139a of the current dispatch table is accessed to determine the category of the previous instruction. Using the category information provided and the information contained in TABLE 1 the run-time interpreter 44 manipulates the contents of each condition code evaluation grouping 137, 137' accordingly.
Here, the category of the most recently executed instruction is ALL while the category of the next instruction is ONLY.sub.-- C. As shown in TABLE I, when the most recent condition code state is an ALL and the next instruction is an ONLY.sub.-- C, the action which is to be taken is a push. Here, a push is an appropriate action because the set of bits modified by the next condition code modifying instruction, {C}, does not include all of the bits modified by the most recently executed condition code modifying instruction, {C, N, O, P, A}. Moreover, a union at the two condition code bit sets results in a complete set of condition code bits, {C, N, Z, O, P, A}.
The information retained in the current condition code evaluation grouping 137 is pushed or copied into the storage locations for the previous condition code evaluation grouping 137'. That is, the data in CCR 138a and CCS 138b are copied to pdata1 138a' and pdata2 138b' respectively and CCD 138c is copied to pptr 138c'. The current condition code evaluation grouping 137 is then used to store the data used to evaluate the C bit which is the only condition code bit modified by the next instruction. An example is provided below in conjunction with FIGS. 16 and 17 which describes a resolve action.
Referring now to FIG. 16, a set of condition code state diagrams 150 includes a condition code state 152 diagram for a previously executed condition code modifying instruction, a condition code state 154 diagram for a most recently executed condition code modifying instruction and a condition code state 156 diagram for a next condition code modifying instruction. Here, the previous condition code state 152 is ALL.sub.-- BUT.sub.-- C in which all but the C bit is modified. The most recent condition code state 154 is C.sub.-- AND.sub.-- O in which only the C and O bits are modified as a result of the execution of the most recently executed condition code modifying instruction. The next condition code state 156 is ONLY.sub.-- C in which only the C bit is modified.
Referring back to TABLE 1, it may be seen that when the most recent state is C.sub.-- AND.sub.-- O and the next state is ONLY.sub.-- C the appropriate action to be taken is a resolve action. It can be seen from FIG. H a replace action would not preserve the most recent state of the O bit as the current condition code state would be overwritten by information only capable of determining the C bit. A push however would lose the information necessary to determine the most recent values of the N, Z, P and A bits. As discussed above, more than two condition code evaluation groupings would be required to fully preserve the current states of each of the condition code bits. Accordingly, the information stored in the first and second condition code evaluation groupings 137, 137' is resolved resulting in a complete set of condition code bits.
Referring now to FIG. 17, the most recent condition code state 154' diagram is shown to contain a complete set of condition code bits. As a result of the resolve action, the most recent condition code state 154' is ALL and the next condition code state 156' is an ONLY.sub.-- C. Referring again to TABLE 1, the appropriate action to be taken is a push when the most recent condition code state is ALL and the next condition code state is ONLY.sub.-- C. Accordingly, the run-time interpreter 44 can push the condition code information resulting from execution of the next instruction without losing any condition code bit information.
Referring now to FIG. 18, the previous condition code state 152" diagram is shown to indicate a complete set of condition code bits which was pushed from the most recent condition code state 154' in FIG. 17. The most recent condition code state 154" diagram of FIG. 18 now indicates execution of a condition code modifying instruction which modified only the C bit. As may be seen, all information relating to the most current state of each of the condition code bits has been preserved.
Multiple Address Spaces
Referring now to FIG. 19, an implementation of multiple address spaces on an Intel machine is shown to include segments CS 160, DS 162, and SS 164 identifying address 0 166 of a first address space 168 and segment FS 170 identifying address 0 172 of a second address space 174. Data X 168i is located within the first address space 168 and data Y 174i is located within the second address space 174.
It should be noted that the first address space 168 and the second address space 174 exist independently from each other. Accordingly, there is no relationship between the location identified by segments CS 160, DS 162, and SS 164 and segment FS 170. Nor is there any relationship between the address of the location of data X 168i in the first address space 168 and address of the location of data Y 174i in the second address space 174.
Referring now to FIG. 20, emulation of multiple address spaces on a native architecture is shown to include segments CS 160', DS 162', and SS 164' identifying address 0 166' of a first address space 168' and segment FS 170' identifying address 0 172' of a second address space 174' where segment FS 170' has an offset 175 from address 0 166' of the first address space 168'. The value of the offset 175 is stored in SEGOFF 109 (FIG. 8).
Context Data Structure
Referring now to FIG. 21, a context data structure 180 which resides in memory is shown. The context data structure 180 is used by the on-line interpreter 44 to handle multitasking capabilities of the non-native software application. When, due to multitasking, an additional thread is executed during operation of the non-native software application, a snap-shot of the current state of the run-time interpreter 44 is saved in context data structure 180. The context data structure 180 is used by the new thread to provide the run-time interpreter 44 executing in the new thread the state of the run-time interpreter 44 executing in the thread which initialized the new thread.
Values which are saved in the context data structure 180 include the current condition code state in field 181. Thus, this field includes subfields (not shown) to provide copies of the values stored in registers CCR 138a, CCS 138b and CCD and 138c. Values are provided in field 182 to store the previous state of the condition code bits. The context data structure also includes copies of the integer registers EAX 104a, EBX 104b, ECS 104c, EDX 104d EDI 104e, ESI 104f, EBP 104g and ESP 104h in field 183.
In field 183 values for the six segments (seldomly used in WIN32 applications) are provided. The six segments, four of which are depicted in FIGS. 19 and 20 are cs, ds, es, fs, gs and ss. A copy of the floating stack pointer 106 (FIG. 8) is also provided in field 185 in addition to a starting value for the floating stack pointer as well as the floating stack entries.
Field 186 of the context data structure 180 provides pointers to each of the eight possible dispatch tables. Exemplary dispatch tables 131 and 131' are depicted in FIGS. 10 and 11. The context data structure 180 also provides in field 187 the Extended Instruction Pointer, EIP 121.
A repeat specifier value, as designated by one of the possible prefixes 100f (FIG. 8), is provided in field 188. Values relating to the Emulator Return Address, ERA 114, register are stored in field 189. In fields 190 and 191 pointers used to maintain the profile table as well as pointers to portable math routines are also provided respectively. Values of selected constants are also provided in the context data structure 180 in field 192 while pointers to maintain a linked list of context data structures is provided in field 193.
An additional aspect of a preferred embodiment includes structuring the order of the software which implements the run-time interpreter 44 such that critical blocks of the software code exist in a single cache block. In this way, the run-time interpreter 44 is able to execute more efficiently as the portions of the interpreter 44 which are executed most often are resident in the cache.
Non-Native Return Address Stack and Shadow Stack
Referring now to FIG. 22, a return address stack arrangement 210 is shown to include a non-native return address stack 211 and a shadow stack 212. The non-native return address stack 211 is an address stack which is produced as if the non-native image were executing in the non-native environment. The non-native return address stack 211 comprises a plurality of frames 219, each of said frames including a corresponding one of non-native return address fields 213a-213c, as well as fields 215a-215c for local storage, as shown. The non-native return address stored in locations 213a-213c corresponds to the routine return address that is pushed onto the stack by the program when it executes a call instruction. That is, the non-native program when executing in a native environment would place on the stack 211 a particular return address corresponding to the address space as if the non-native program was executing in its native environment.
As also mentioned, the return stack arrangement 210 also includes a shadow stack 212. The shadow stack 212 likewise is comprised of a plurality of frames 214, each of said frames 214 comprising a header field 216a-216c and corresponding or associated local storage fields 218a-218c.
The return address arrangement 210 also includes a pair of stack pointers, one for the non-native return stack 211 and one for the shadow stack 212. The non-native return address stack pointer 217 also referred to as SP points to the bottom or most recent entry in the non-native return address stack. Here the non-native return address stack 211 has an initial address A.sub.0 of < 7FFFFFFF >. The initial address of < 7FFFFFFF > insures that as the stack pointer SP is decremented, the largest stack pointer value will not be sign extended by an LDL instruction as will be described. Likewise, the shadow stack 212 has a stack pointer 221 referred to as SSP and has an initial address A.sub.0 =< 0000000077FFFFFFF >.
The header portion 216a-216c of the shadow stack frames 214 here comprises four sub-fields. The first sub-field 220a also referred to as SP is the contents of the non-native stack pointer 17 corresponding to the return address in the non-native stack pointer for the particular shadow stack frame 214. Here the non-native stack pointer corresponds to the size of the emulated operating system. Thus, for a 32 bit operating system, the non-native stack pointer 220a would comprise four bytes.
The second entry 220b in the header 216a-216c is the non-native instruction pointer value 220b. The non-native instruction pointer is the address that is pushed onto the non-native return address stack 211. This address also comprises the same number of bytes as the number of bytes supported in the operating system. Thus, again for a 32 bit operating system, the number of bytes is 4.
The third entry 20c in the header portion 216a-216c is a native return address field 220c. The native return address field 220c comprises the native return address which is placed on the shadow stack if a translated routine executes a call instruction. This corresponds to the address of the native instruction which is to resume execution in the translated routine after the called routine has completed.
The fourth entry in the header 216a-216c is the native dynamic link 220d. The native dynamic link field is a pointer to the previous shadow frame header 214. Thus, in FIG. 22, the value stored in the field "dylnk" corresponds to the location of the next shadow frame header 216b. This value is preferably included in the shadow stack 212 to allow the shadow stack 212 to make provisions for a variable amount of local storage in fields 218a-218c. In situations where the local storage fields are not provided or their size is fixed, it is not necessary to have a dynamic link field.
The local storage fields 215a-215c in the non-native register stack 211 comprises routine calls and routine arguments of the non-native system and is provided to faithfully replicate that which would occur in the non-native system were it being executed on its native architecture. The routine locals and routine arguments stored in the non-native return stack are passed to translated routines via the translation process described above and as will be further described in detail below. In the shadow stack 212, however, provision is also provided for local storage in fields 218a-218c. For example, often when a compiler is used to compile a program, the actual instructions of the program use more logical registers than physically exist in the machine on which the program is to be executed. Accordingly, the compiler often provides temporary storage for logical register manipulations and uses the program stack to store these registers.
Non-Native Return Stack and Shadow Stack Management
The non-native return address stack 211 is managed exactly as dictated by the non-native code being emulated in the interpreter 44. When the interpreter 44 is executing the non-native or non-native code of a particular thread, there is only one native frame on the shadow stack 212 for the interpreter. This permits the interpreter to transfer execution into translated code in the event that there is corresponding translated code to be executed. The interpreter does not push frames onto the shadow stack 212. Further, when transferring into and out of translated routines, the interpreter does not push data onto the native system stack. Rather, when transferring into and out of translated routines, shadow frames 214 are pushed onto the shadow stack 212 to record the state associated with the translated routines.
The shadow stack 212 tends to be synchronous with the routine frames on the non-native return stack. Although calling jackets (48 FIG. 3) may cause another instance of the interpreter 44 to be produced if a callback is performed, and thus push another interpreter frame onto the non-native return address stack 211, once the jacketed operation has been completed this extra frame is removed from the non-native or non-native stack 211.
With a translated routine, however, a shadow frame 214 is pushed onto the shadow stack 212 each time a translated routine is called. The shadow frame 214 includes the space necessary for the translated routine's locals such as the spilled registers mentioned above, and the shadow frame header.
Referring now to FIG. 23, an example of the operation of the shadow stack 212 is shown. The program 230 includes a routine A which has a plurality of instructions, one of which is a call to a routine B (call B) at 233. Routine B, likewise, has a plurality of instructions with the last instruction being a return instruction RET. Program flow 230 represents a program flow for the non-native program executing in its native environment. In routine A, when the Call B instruction 233 is executed, it causes the next instruction at address A.sub.N to be pushed onto the non-native return address stack 211, as shown. The stack pointer for the non-native instruction stack 211 is incremented to the next value, thus pointing to the entry for A.sub.N. Routine B is called by routine A and executes its instructions causing at the last instruction (RET) a return which causes a pop from the non-native return address stack 211. The pop delivers the address A.sub.N on the location of the next instruction to be loaded into the program counter for execution.
Were routine A and routine B translated as mentioned above to provide corresponding translated routines A' and B' (242 and 245) during execution of translated code in the native architecture, an instruction Call B' would be encountered at 243. The shadow frame is allocated at the beginning of a routine for all calls that the routine can make. The instruction Call B' causes the shadow stack to be provided with a shadow stack frame 14 which comprises the four above-mentioned fields 20a-20d and the optional fields for local storage. Thus, in field 20a is provided the contents A.sub.N of the stack pointer (SP) 17 of the non-native return stack 11. This value corresponds to the location where the return address stored in the non-native return address stack 211 for the corresponding native instruction execution will be found.
Likewise, stored in field 220b is a copy of the non-native return address that was pushed on the non-native stack by the execution of the call instruction. The non-native return address is provided by the translated image and corresponds to the non-native call for the particular call in the native or translated image. Here the non-native extended instruction pointer has a value corresponding to A.sub.N. Likewise, stored in field 220c is the value of the native return address A.sub.N '. The dynamic link is stored in field 220d which corresponds to the address of a preceding shadow stack frame header. A new dynamic link is produced by saving the value of the shadow stack pointer prior to allocating a new frame. In location 218 is provided local storage for allocated variables provided during the translation of the corresponding routines A' and B' from the translator as mentioned above.
Both the interpreter 44 (FIG. 3) and the translator 54 (FIG. 4) use the shadow stack 212 for determining the next instruction to be executed upon the processing of a return instruction. When translated code is executed in the computer system and a return instruction is encountered, a check is made to determine whether the code that followed the native call in the translator routine was well behaved.
That is, two assumptions are tested. The first is that the non-native code was well behaved with respect to the depth in the non-native return address stack 211. The second assumption is that the code was well behaved with respect to the return address. If both of these conditions are not satisfied then the code following the translated call cannot be executed and the instruction flow has to revert back to the interpreter for continuing execution until such time as it encounters another call or return instruction or possibly a computed jump instruction.
These two conditions are determined by examining the value of the contents of the non-native stack pointer SP as stored in location 220a to determine whether it is equal to the contents of the non-native stack pointer 217. As mentioned above the non-native stack pointer 217 corresponds to the current location on the non-native return address stack 211. Thus this test is a measure of whether the non-native stack 211 and the shadow stack 212 are at the same depth. The second check is to determine whether the return address stored in location 220b corresponds to the return address stored in the location in the non-native return address stack 211 pointed to by the value of the SP pointer 217.
This check thus determines that the return address for the non-native instruction is the same in the non-native stack 211 as well as the shadow stack 212. If this condition is not satisfied then the interpreter changed the value of the return address. If either condition is not satisfied, then execution is continued in the run-time interpreter 44 until such time as another call or return or computed jump instruction is encountered.
Call Address Translation Table
Referring now to FIG. 24, a call address translation table 222 is produced during translation of non-native code. As shown the call translation table is appended to the translated code as in field 221. The translated code 221 and the call address translation table 222 provide the image 17c referred to in FIG. 3. The table 222 includes a pair fields one field 223a corresponds to addresses or more particularly to address offsets from the starting address of calls for translated code routines and the other field 223b corresponds to address offsets to the corresponding starting address in the non-native architecture. The table 222 is here appended to the end of the translated image 221 as mentioned above.
Referring now to FIG. 25, the use of the shadow stack 212 as well as a call address translation table as mentioned above is illustrated. As shown in FIG. 25, both table look-ups and shadow stack manipulations are used in the run-time interpreter 44 or a run-time translation system as well as in the execution of translated code. Table look-ups are used for each instance of a call instruction by the interpreter 44 or for each instance of execution of translated code. The shadow stack 212 is used during the processing of return instructions for the interpreter 44 as well as during execution of calls in the translated code.
During execution of translated code there are two possibilities resulting from execution of a return instruction (RET). The first possibility shown as path 256b is that the afore-mentioned test or check is passed and thus the return instruction can safely return and continue execution of translated code. The second possibility shown as path 256a is that if either one of the two checks fails, then execution returns to the possibly updated address in the non-native stack and execution continues or proceeds within the interpreter 44 until such time as a call, computed jump or a second return instruction is encountered.
Similarly, when the interpreter is executing native code in emulation mode, the interpreter likewise performs a check. A first path 258a would be if there is no corresponding translated code available to be used by the interpreter. The second path 258b would be taken if the interpreter encounters a return address in which there is a valid corresponding translated routine. Thus, the shadow stack 212 permits the interpreter to return to execution of translated code without requiring any corruptive or invasive modification of the non-native return address stack 211.
Similarly, with table look-ups when a call 252 is encountered, the interpreter 44 will perform a table look-up which, if there is a corresponding translated routine, will permit the translated code to execute via path 252b. Otherwise, the interpreter 44 will continue execution via path 252a. Similarly, the translated code when it performs a call 254 will determine if there is a corresponding translated routine for the call and, if so, will permit execution via path 254b. Otherwise, control will be transferred back to the interpreter via path 254a.
By providing a shadow stack 212 which runs synchronous to the non-native return address stack 211, several advantages are provided. The first advantage is that since the shadow stack 212 provides storage for native return addresses and other information required in the native system, it is not necessary to place this information on the non-native return address stack 211. Thus, the non-native return address stack 211 is not violated or remains true to that which would occur during normal execution of the non-native program in the non-native architecture. Amongst other things maintaining a true uninterrupted non-native stack 211 permits a non-native exception handler to execute without any complex manipulation to remove native return addresses. In general, when an exception occurs during execution of the native instructions the exception handler in the native architecture only expects to encounter native architecture instruction addresses. And similarly a non-native exception handler only expects to encounter non-native instruction addresses.
Moreover, the shadow stack 212 being accessible to both the translated code and the interpreter 44 permits the interpreter to return control back to translated code since the interpreter can use the shadow stack to determine a valid native return address which will continue execution of translated code. Without the shadow stack 212, therefore, it would be necessary either to place the native return addresses onto the non-native return stack which is undesirable as mentioned above or to make the unit of translation be limited to a basic block. As will be described below this latter option is undesirable since it limits the opportunities for optimization of the translated code. Further, by having a non-native stack 211 and shadow stack 212, non-native return addresses can be separately managed from the native return addresses. This permits exception handlers for each image to properly handle problems which caused an exception since the exception handlers do not have to deal with return addresses associated with foreign code.
Referring now to FIG. 26, a translated routine 260 can have a call 260a which in turn has other calls 261a to 261c to other translated routines such as 262a. Also in a translated routine 264, the routine can encounter a switch/jump instruction 264a which is a computed branch or jump to another routine such as routines 265a to 265c. Management of the shadow stack 212 in conjunction with execution of translated code, execution in an original interpreter and activation of a new interpreter will now be described.
Sentinel Shadow Stack Frame
When a new interpreter activation initializes its native frame for the shadow stack, it pushes a sentinel shadow stack frame header onto the shadow stack 212. The stack pointer address is set at 7FFFFFFF, the largest stack pointer possible, a value which will not be extended by an LDL instruction. This frame is needed for interpreter processing of return instructions. The shadow stack frame return address field 220c is set equal to 1 (a non-zero value) but is never used. The shadow dynamic link field 220d is set equal to 0 to indicate that this is the initial or sentinel frame on the shadow stack. The shadow stack extended instruction pointer is set to 0 and is never used.
During normal interpreter operation, that is, while the interpreter is executing instructions, it does not follow the stack pointer for the shadow stack. Thus, it does not push or place shadow frame entries onto the shadow stack 212 even if the interpreter interprets non-native calls that modify the non-native return address stack 211. If the interpreter encounters a non-native instruction call that calls a non-native instruction routine that has been translated, however, then the interpreter stores the instruction program counter onto the non-native return address stack 211 as in normal operation and into the shadow stack 212. The interpreter 44 also performs a jump to the translated routine's interpreter entry point. The translated routine returns to the interpreter 44 by jumping through one of its entry points as will be described below.
Every translated routine has two entry points. One entry point is called when the interpreter calls it and the other one is called when another translated routine calls it. The entry points only differ in the additional prologue or preparation that is performed when the routine is entered from another translated routine. When a translated routine is entered from another translated routine, the following occurs: The register which contains the native return address is stored into the return address field in the shadow stack for the particular shadow frame header by executing an instruction
STL R26, 4(sp)
This instruction is executed before the shadow stack 212 is extended so that the return address in the shadow stack 212 is always valid for all shadow frames 214 except the top one. This arrangement is required when the shadow frames 214 are discarded as a result of an exception or because execution had to resume in the interpreter. Next the execution falls through to the interpreter entry point.
Translated Routine Entered from Interpreter
When a translated routine is entered from the interpreter, the following happens: A shadow frame is produced for the translated routine. The size of the frame is 16 plus bytes where 16 is the number of bytes needed to represent the header and the additional number of bytes are those used to represent the local storage associated with the translated routine. The shadow frame header dylink field 220d is set to the original stack pointer. The following instructions are executed:
______________________________________
MOV SP, T1
SUB SP, #<16+size>,sp
STQ T1, (sp)
______________________________________
The shadow stack frame is produced using the above sequence.
When a translated routine executes a return instruction to return control to its caller routine, the following occurs. Noting that the current value of the non-native stack pointer points to the non-native return address, the non-native return address is popped off of the non-native return stack 211 into the non-native instruction pointer. If a "Return N` instruction is being performed then also a pop of N argument bytes from the non-native return stack is performed. The following instructions are used to execute these routines
______________________________________
MOV ESP, T1
LDL EIP, (esp)
ADDL ESP, #<4+arg.sub.-- bytes>, ESP
______________________________________
The previous shadow stack frame is located and the contents of the dynamic link are evaluated. Next the native code determines whether the non-native stack pointer and the instruction pointer are the same as expected by the caller. That is, the native code determines that the value of SP is equal to the contents of SP in the stack pointer 17 and the value of IP is equal to the value of the return address stored at the location pointed to by the stack pointer 17.
If these values are correct then the translated routine can return control to the return address stored in the caller's shadow frame (i.e., return control to another translated routine). If either of these checks fail however, then either the call was from the interpreter or the non-native stack has been modified. In either case, execution is resumed in the interpreter after a potential clean-up of the shadow stack 212. The following instructions are used to perform the two checks:
______________________________________
LDQ T2, 8 (T.sub.0)
Loads both gEIP and gESP
SLL T1, #32, T1 The actual ESP before popping
the non-native return address
OR EIP, T1, T1 The actual EIP and ESP in a
quad word
SUBQ T1, T2, T1
LDL T3, 4 (T.sub.0)
Load the native return
address in case it is needed
BNE T1, $1
MOV TO, SP Actual discarded shadow frame
RET (T3,)
______________________________________
where T0, T1, T2 and T3 are available registers in the native architecture which would not interfere with the state of registers in the non-native system.
Translated Routine Calls Another Translated Routine
When the translated routine calls another translated routine, the following occurs. The non-native return address is loaded into a register and the register is pushed onto the non-native return stack 211 and the non-native stack pointer is loaded into the non-native stack pointer field in the shadow stack 212. A jump to subroutine instruction is executed to the translated routine entry point placing the native return address in a register. The translated routine executes until the routine returns to its caller.
It is possible that the translated routine may never return to its caller, for example, if the translated routine detects that the non-native stack 211 has been modified. In this case, if the non-native stack 211 has been modified the interpreter 44 will be entered to clean up the shadow stack 212 and resume execution as mentioned above. If, however, the translated routine does return to its caller, the translated routine will have left the non-native state valid including the non-native stack pointer and will also have left the shadow stack 212 valid insuring that it is in synchronization with the non-native stack 211. Thus, the called translated routine can continue executing.
If a translated routine calls a routine that has not been translated, it then enters the interpreter. The non-native return address is passed to a register in the interpreter 44 and the contents of the register are pushed onto the non-native return address stack 211. This corresponds to the non-native return address. The contents of the register are also loaded into the non-native extended instruction pointer field in the shadow stack 212. The extended stack pointer 217 which points to the non-native return address just pushed onto the non-native return stack is itself loaded into the non-native extended stack pointer field 20a in the shadow stack 212. The non-native address of the routine being called is then loaded into the non-native instruction pointer and a jump to subroutine instruction is executed to the interpreter entry point. A look-up call entry is performed placing the native return address in stack pointer 217. The interpreter stores the stack pointer 217 in the native return address field 220c of the shadow stack 212 and executes until the interpreter 44 interprets a return instruction.
Translated Routine Calls Jacketed Routine
If a translated routine calls a jacketed routine, the following occurs. A jump to subroutine instruction to the jacketed routine entry point is performed placing the non-native return address in the non-native stack pointer 217. The jacketed routine produces a native frame and executes the native routine. Since only operating system supplied entry points are jacketed, these are known to be well-behaved and thus will not alter their return address. Therefore, the non-native stack pointer or the non-native instruction pointer in the shadow stack are not saved and there is no check performed on them before returning from the jacketed routine.
If the jacketed routine performs a call back, then another interpreter activation native frame will be produced and a separate shadow stack will be managed. When the call back returns, the interpreter activation native frame will be removed together with the now empty shadow stack. When the jacketed call returns, it will remove its native frame leaving the stack frame pointing again to the top shadow frame of the previous interpreter activation. As with the above, the jacketed routine may never return to its caller. For example, an exception may occur that causes the call back interpreter to be exited and non-native frames discarded. This will cause the shadow stack 212 to be cleaned up. If, however, it does return to its caller the jacketed routine will have left the non-native state valid including the non-native stack pointer 217. It will also have left the shadow stack 212 valid insuring that it is in sync with the non-native stack 211. Therefore, the caller translated routine can continue executing. Entry to Interpreter Due to Indirect Jump or Switch
A translated routine can also enter the interpreter due to an unknown indirect jump. If translated code performs a jump to a target that is not statically known, for example, indirect jump to a target not listed in the pro |