Method and apparatus for forming a translation unit5842017
Abstract
A computer system for executing a binary image conversion system which converts instructions from a instruction set of a first, non native computer system to a second, different, native computer system, includes an run-time system which in response to a non-native image of an application program written for a non-native instruction set provides an native instruction or a native instruction routine. The run-time system collects profile data in response to execution of the native instructions to determine execution characteristics of the non-native instruction. Thereafter, the non-native instructions and the profile statistics are fed to a binary translator operating in a background mode and which is responsive to the profile data generated by the run-time system to form a translated native image. The run-time system and the binary translator are under the control of a server process. The non-native image is executed in two different enviroments with first portion executed as an interpreted image and remaining portions as a translated image. The run-time system includes an interpreter which is capable of handling condition codes corresponding to the non-native architecute. A technique is also provided to jacket calls between the two execution enviroments and to support object based services. Preferred techniques are also provide to determine interprocedural translation units.
Claims
What is claimed is:
1. A method executed in a computer system for forming a translation unit from a binary image, the method comprising the steps of:
gathering, in response to execution of instructions included in said binary image, profile statistics including runtime information from executing said instructions using a runtime interpreter; and
determining, using said profile statistics, said translation unit comprising one or more regions, each region representing an area of contiguous instruction addresses in said binary image in which there are no breaks in the instruction addresses of the area of each region, and wherein said one or more regions in combination are substantially equivalent to a programming routine, said determining step further comprising the steps of:
tracing execution paths of code included in said binary image by using said profile statistics; and
merging a first and a second region into a third combined region when said first and second regions have overlapping or adjacent boundaries.
2. The method of claim 1 further comprising the step of:
optimizing said translation unit wherein said optimizing includes performing procedural and interprocedural optimizations.
3. The method of claim 2 wherein said step of gathering profile statistics is performed by a foreground binary translation system, and said steps of determining and optimizing said translation unit are performed by a background binary translation system executing as a background task.
4. The method of claim 1 wherein said runtime interpreter translates first instructions included in said binary image from a first computer instruction set associated with a first computer system to second instructions from a second computer instruction set associated with a second computer system, said profile statistics including one or more CALL entry-point addresses,
said step of gathering profile statistics including the step of:
gathering said one or more CALL entry-point addresses, each CALL entry-point address corresponding to a target address in said first computer system to which control is transferred during execution of said binary image through a routine invocation; and
said step of determining said translation unit further including the steps of:
tracing flow paths originating from said CALL entry-points; and
determining, in response to said tracing step, said regions.
5. The method of claim 4 wherein said profile statistics include an indirect transfer entry corresponding to an indirect transfer instruction and wherein an indirect transfer entry includes:
a first address identifying the location of an indirect transfer instruction, said indirect transfer instruction transferring control to an address only determinable at runtime using the content of a location that dynamically changes during execution of said binary image; and
a target address list of entries, each entry in said target address list representing a runtime value corresponding to an address in said first computer system and identifying said address to which said indirect transfer instruction transferred control at runtime, said target address list being associated with said first address;
and wherein said step of tracing flow paths originating from CALL entry-points uses said first address and said target address list associated therewith.
6. The method of claim 5 wherein said step of tracing flow paths originating from CALL entry-points terminates a flow path when a routine return is detected or when a determination is made using said profile statistics that an indirect transfer instruction transfers control to a null target wherein said associated target address list has no entry comprising a runtime value corresponding to an address to which said indirect transfer instruction transferred control during execution.
7. The method of claim 5 wherein said step of tracing flow paths originating from CALL entry-points further includes the steps of:
classifying instructions included in said binary image as one of a straight line execution instruction or a flow alteration instruction and wherein, upon executing a straight line execution instruction, the next instruction immediately executed is stored in at an address location in said binary image contiguous to said straight line execution instruction, and wherein, upon executing a flow alteration instruction, the next instruction immediately executed is not guaranteed to be stored at an address location contiguous to said flow alteration instruction causing alteration of straight line execution flow control;
determining, for each of said instructions classified as a flow alteration instruction, whether said each instruction is one of an indirect transfer instruction, or a direct program-counter relative transfer instruction, said indirect transfer instruction using runtime values to determine target addresses, said direct program-counter relative transfer instruction using offsets relative to a current address in a program counter in said computer system, said program counter including the address identifying an instruction in said binary image;
performing, for an instruction classified as an indirect transfer instruction, the steps of:
determining targets of said indirect transfer instruction using an indirect transfer entry corresponding to said indirect transfer instruction and included in said profile statistics, said indirect transfer entry including an address list identifying one or more first branch target locations of said indirect transfer instruction; and
tracing the control flow of each of said first branch target locations included in said address list; and
performing, for an instruction classified as a direct program-counter relative transfer instruction, the steps of:
determining second branch target locations of said direct program-counter relative transfer instruction using an offset stored within said binary image; and
tracing the control flow of each of said second branch target locations.
8. The method of claim 7 wherein said step of tracing said flow paths originating from CALL entry-points produces an instruction list and wherein an instruction having a corresponding instruction address in said binary image is added to said instruction list after being classified in said classifying step, and wherein said step of determining said regions uses said instruction list to determine address boundaries of each of said one or more regions comprising said translation unit using said instruction addresses included in said instruction list to identify areas of contiguous instruction addresses.
9. The method of claim 8, wherein said step of determining regions includes the step of:
merging a first and second translation unit into a third combined translation unit when said first and second translation units each include a common region.
10. The method of claim 7 wherein said step of determining targets of said indirect transfer instruction obtains said indirect transfer entry from a hash table wherein a location of an indirect transfer entry in said hash table corresponding to an indirect transfer instruction is dependent upon an address identifying a location of said indirect transfer instruction in said binary image.
11. The method of claim 4 wherein said profile statistics include a target address entry that includes:
a target address identifying a unique address in said binary image to which control is transferred;
a call flag indicating whether the target address has been the target transfer of control of a routine call; and
a count field being an integer quantity indicating the number of times the target address has been the target of a transfer of control as determined by the runtime interpreter; and wherein each of said one or more CALL entry-point addresses is a target address included in a target address entry with a corresponding call flag indicating that the target address is the target of a routine call.
12. An apparatus that forms a translation unit from a binary image, the apparatus comprising:
a profile statistics gatherer for gathering, in response to execution of instructions included in said binary image, profile statistics including runtime information from executing said instructions using a runtime interpreter; and
a determiner for determining, using said profile statistics, said translation unit comprising one or more regions, each region representing an area of contiguous instruction addresses in said binary image in which there are no breaks in the instruction addresses of the area of each region, and wherein said one or more regions in combination are substantially equivalent to a programming routine, said determiner further comprising:
an execution path tracer for tracing execution paths of code included in said binary image by using said profile statistics; and
a region merger for merging a first and a second region into a third combined region when said first and second regions have overlapping or adjacent boundaries.
13. The apparatus of claim 12 further comprising:
an optimizer for optimizing said translation unit wherein said optimizer includes means for performing procedural and interprocedural optimizations.
14. The apparatus of claim 13, wherein said profile statistics gatherer is included in a foreground binary translation system that gathers said profile statistics, and said determiner and said optimizer are included in a background binary translation system executing as a background task.
15. A memory comprising:
a profile statistics gatherer for gathering, in response to execution of instructions included in a binary image, profile statistics including runtime information from executing said instructions using a runtime interpreter; and
a determiner for determining, using said profile statistics, a translation unit comprising one or more region, each region representing an area of contiguous instruction addresses in said binary image in which there are no breaks in the instruction addresses of the area of each region, and wherein said one or more regions in combination are substantially equivalent to a programming routine, said determiner further comprising:
an execution path tracer for tracing execution paths of code included in said binary image by using said profile statistics; and
a region merger for merging a first and a second region into a third combined region when said first and second regions have overlapping or adjacent boundaries.
16. The memory of claim 15, wherein said runtime interpreter translates first instructions included in said binary image from a first computer instruction set associated with a first computer system to second instructions from a second computer instruction set associated with a second computer system, said profile statistics including one or more CALL entry-point addresses, said profile statistics gatherer further comprising:
an entry-point gatherer for gathering said one or more CALL entry-point addresses, each CALL entry-point address corresponding to a target address in said first computer system to which control is transferred during execution of said binary image through a routine invocation; and
said determiner further comprising:
a region determiner coupled to said tracer for determining said regions.
17. The memory of claim 16 wherein said profile statistics include an indirect transfer entry corresponding to an indirect transfer instruction and wherein an indirect transfer entry includes:
a first address identifying the location of an indirect transfer instruction, said indirect transfer instruction transferring control to an address only determinable at runtime using the content of a location that dynamically changes during execution of said binary image; and a target address list of entries, each entry in said target address list representing a runtime value corresponding to an address in said first computer system and identifying said address to which said indirect transfer instruction transferred control at runtime, said target address list being associated with said first address;
and wherein said execution path tracer traces flow paths originating from CALL entry-points and uses said first address and said target address list associated therewith.
18. An apparatus for forming a translation unit from a binary image, the apparatus comprising:
a foreground binary translation system for gathering, in response to execution of instructions included in said binary image, profile statistics including runtime information from executing said instructions using a runtime interpreter; and
a background binary translation system for determining, using profile statistics, said translation unit comprising one or more regions, each region representing an area of contiguous instruction addresses in said binary image in which there are no breaks in the instruction addresses of the area of each region, and wherein said one or more regions in combination are substantially equivalent to a programming routine, said background binary translation system further including:
an execution path tracer for tracing execution paths of code included in said binary image by using said profile statistics; and
a region merger for merging a first and second region into a third combined region when said first and second regions have overlapping or adjacent boundaries.
19. The apparatus of claim 18, wherein said background binary translation system includes and optimizer that performs procedural and interprocedural optimizations.
20. The apparatus of claim 18, wherein said foreground binary translation system includes a runtime interpreter which gathers said profile information.
21. A computer program product for forming a translation unit from a binary image, the computer program product comprising:
first program code for gathering, in response to execution of instructions included in said binary image, profile statistics including runtime information from executing said instructions using a runtime interpreter; and
second program code for determining, using profile statistics, said translation unit comprising one or more regions, each region representing an area of contiguous instruction addresses in said binary image in which there are no breaks in the instruction addresses of the area of each region, and wherein said one or more regions in combination are substantially equivalent to a programming routine, said second program code further including:
third program code for tracing execution paths of code included in said binary image by using said profile statistics; and
fourth program code for merging a first and second region into a third combined region when said first and second regions have overlapping or adjacent boundaries.
22. The computer program product of claim 21, wherein said second program code for gathering said profile statistics includes code that performs procedural and interprocedural optimizations.
23. The computer program product of claim 21, wherein said first program code includes a runtime interpreter which gathers said profile information .
Description
BACKGROUND OF THE INVENTION
This invention relates generally to computer systems, and more particularly to systems which translate computer programs.
As it is known in the art, computer systems which generally include a central processing unit (CPU), a main memory, and an input-output device interconnected by a system bus are used to execute computer programs to perform a useful task. One type of computer program is an operating system which is used to interface the CPU to an application program. The aforementioned application program is used by a user of the computer system to perform the useful task. The operating system includes the software resources needed by the computer system to interface hardware elements to the computer system as well as to interface the application program and other programs to the computer system.
Application programs typically include programs, such as word processors, which execute on the computer system under the control of the operating system.
Generally, a compiler transforms source code written in a programming language into object code which is an input to a linker producing a binary image or machine executable program. The binary image is usually produced for execution in a particular computer system and generally comprises machine instructions and data. The machine instructions are executed by a computer processor (CPU) in the computer system.
A compiler typically includes a front end, which generally performs language-specific tasks such as syntactic and semantic processing of the source code, and a back end which generally performs tasks including code optimizations in an optimizing compiler, and generation of object code including machine instructions and data.
The compiler converts a unit of source code corresponding to a routine or procedure into object code. In one technique used to produce object code from source code, the front end generates a representation of the source code in an intermediate language which is processed by the back end to optimize the intermediate language representation and produce object code.
The compiler front end generally "filters" the source code by only allowing correctly formed source programs to be processed by the compiler back end. Thus, the compiler back end generally processes, e.g., performs optimizations upon, correctly formed programs having a predefined structure. Components included in a compiler back end, such as an optimizer, generally make assumptions regarding their inputs such that compiler back end optimizations are typically performed on complete and correctly formed routines. As an example, a series of machine instructions includes a return statement in a routine at the end of the body of code associated with the routine. The optimizer, upon detecting the series of instructions, makes an assumption that the series of instructions indicates the beginning or ending of the body of code associated with the routine.
Generally, such assumptions cannot be made when the input to a component of the back end, such as the optimizer, is not guaranteed to have a particular structure, such as input "filtered" by a front-end. Basic assumptions, such as a structurally well-defined routine common to traditional programming languages such as "C" and "Fortran", cannot be made by a back end component using input not having a specified structure or predefined properties.
Optimizations, as performed on source code to generally reduce execution time and reduce system resource requirements, are typically classified into the following four levels: local or peephole optimization, basic block optimizations, procedural or global optimizations, and interprocedural optimizations. The number of assumptions regarding program structure generally increases with each level of optimization, peephole optimization assuming the least and interprocedural optimizations assuming the most regarding program structure.
A peephole optimization uses a window of several instructions and tries to substitute a more optimal sequence of equivalent instructions. A basic block optimization is performed within a basic block of instructions. A basic block is defined as a sequence of instructions with no intervening entry or exit point. A procedural or global optimization is performed upon a group of instructions forming a procedure or routine. An interprocedural optimization is performed amongst or between procedures.
Due to the assumptions regarding program structure, some existing binary image translators have generally employed only peephole and basic block level optimization techniques.
It is generally known that binary images, or machine executable programs, comprising application programs are made for execution in a computer system of a particular computer architecture or instruction set as well as a particular operating system. Computer architectures and operating systems are varied and generally, binary images made for one particular architecture and operating system cannot execute on a different architecture and/or operating system.
New architectures are developed in order to provide significant performance improvements for the hardware associated with the architecture. For example, the so-called Alpha architecture of Digital Equipment Corporation is based upon a 64-bit RISC (Reduced Instruction Set) architecture. On such an architecture, a binary image compiled for that architecture executes much faster with higher performance than a corresponding binary image on other lower performance architectures.
One drawback to a new architecture is that existing binary images comprising an application that executes on an older architecture cannot directly run on the new architecture due to the different instruction sets of the new and older architectures. While it is desirable to migrate to a new architecture, one of the most significant drawbacks for a user is that existing applications and data files are usually not directly transferrable to the new architecture.
As a result, techniques have been developed to assist users in migrating the applications and data from an older architecture to a new architecture. One such technique includes translating a binary image designed for execution in an older computer architecture to another binary image for execution in a new computer architecture.
It would be desirable to apply an efficient binary translation technique producing a translated binary image that executes correctly. Additionally, it would be desirable to improve upon the efficiency and performance of all resulting translated binary images, such as by optimizing to decrease execution time. However, difficulties arise when performing a binary translation due to the lack of information and the inability to make structural assumptions about a binary image being translated. When performing a binary translation, the source code originally compiled to produce the binary image, or other information describing the structure of the binary image, may not be available. Further, the binary image may not have been produced using high-level language source code. For a binary translator to successfully and efficiently translate any binary image, the binary translator cannot generally presume certain properties about a binary image, or that a binary image comprises only "filtered" input, as produced by a compiler. For example, a binary image can be produced using a low-level unstructured programming language such as assembly language source code processed by an assembler. The assembly language source code typically provides minimal information about program organization and routine structure because a low level programming language, like an assembly language, is generally very unstructured imposing few programming restrictions and including few programming language semantics delineating a routine and data.
As a result, problems are generally encountered when performing a binary translation. It can be difficult to distinguish between executable machine instructions and data if a binary image intermixes the two, such as a binary image including instructions and data produced from low-level assembly language programming. When examining a binary image, it is also difficult to define routine boundaries and identify machine instructions corresponding to a particular routine due to the lack of structural program information. A binary translator, therefore, cannot generally make assumptions as to the structure and properties of a binary image undergoing translation.
As a consequence of the foregoing difficulties, performing binary translations are typically difficult, often imperfect, and generally include optimization techniques, if any, which are much less robust and less aggressive than those of traditional compilers. The result of a binary translation is typically a translated binary image which does not perform efficiently when executed.
As a result, although it is desirable to increase the efficiency and performance of a resulting translated binary image, existing techniques of improving the performance of a binary image cannot readily be employed in binary image translation.
SUMMARY OF THE INVENTION
In accordance with the present invention is a method executed in a computer system for forming a translation unit from a binary image. The method includes gathering profile statistics including runtime information from executing instruction using a runtime interpreter. Using the profile statistics, the translation unit is determined. The translation unit includes one or more regions, each region representing an area of contiguous instruction addresses in the binary image.
Further, in accordance with the invention is a memory comprising means for gathering profile statistics that include runtime information from executing instructions included in a binary image as executed by a runtime interpreter. The memory also includes a means for determining a translation unit using the profile statistics. The translation unit includes one or more regions, each region representing an area of contiguous instruction addresses in the binary image.
With such an arrangement, a translation unit analogous to a routine is determined from a binary image, as used during a binary image translation. Local and global optimizations can be performed as part of the binary translation process. In accordance with the invention is a technique for forming translation units of a binary image that affords a new and flexible way to determine a translation unit analogous to a routine enabling components of a background system, such as the background optimizer, to perform procedural and interprocedural optimizations in binary image translations.
The invention affords flexible techniques for forming translation units in that the techniques can be used when performing a binary translation without placing restrictions and making undue assumptions regarding a binary image being translated. This flexibility allows the invention to be applied to generally all binary images rather than restricting application of the translation unit determination technique of the invention to a small subset of binary images, such as those binary images satisfying a particular set of conditions or properties.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing features and other aspects of the invention will now become apparent when the accompanying description is read in conjunction with the following drawings in which:
FIG. 1 is a block diagram of a computer system;
FIG. 2 is a block diagram of a dual stage instruction conversion system including a run-time system and a background system;
FIG. 3 is a block diagram of the run-time system portion of the instruction conversion system of FIG. 2;
FIG. 3A is a flow chart depicting the steps performed at run-time to execute a non-native image on the system of FIG. 1;
FIG. 4 is a more detailed block diagram of a binary translator used in the background system portion of the conversion system of FIG. 2;
FIG. 5 is block diagram of a data structure representing a profile record structure;
FIG. 6 is a block diagram of a representative profile record of the profile record structure of FIG. 5;
FIG. 7 a diagram showing a typical arrangement for a instruction for a complex instruction set computer (CISC);
FIG. 8 is block diagram of a register file in the computer system of FIG. 1 showing assignment of registers corresponding to the non-native architecture;
FIG. 9 is a diagram showing a typical construct for one of the registers in the register file of FIG. 8
FIG. 10 is a pictorial representation of connections of various data structures including a dispatch table to determine an equivalent routine for the interpreter;
FIG. 11 is a pictorial representation of the process for activating an alternate dispatch table;
FIG. 12 is a diagram showing an arrangement of an entry from the dispatch table of FIG. 10;
FIG. 13 is diagram showing a typical arrangement of condition codes of a CISC architecture which implements condition codes;
FIG. 14 is a block diagram of an arrangement to determine evaluation routines for condition codes;
FIG. 15 is a block diagram of an arrangement to determine evaluation routines for current and previous values of condition codes;
FIGS. 16-18 are a series of diagrams useful in understanding how condition codes are handled in the run-time system of FIG. 3;
FIGS. 19 and 20 are diagrams showing relationship between address spaces;
FIG. 21 is a diagram of a context data structure used in the interpreter of FIG. 4;
FIG. 22 is a block diagram of a pair of data structures stored in memory which represents a return address stack for a non-native image of a program as well as shadow stack for a native image of the program;
FIG. 23 is a diagram showing the relationship between the data structures of FIG. 22 and execution of non-native and native routines with calls into corresponding non-native and native routines;
FIG. 24 is a diagram of a data structure including translated or native image routines and call address translation table;
FIG. 25 is a diagram depicting the relationship of the routine call tables in the translated image and the shadow stack to the on-line and background systems;
FIG. 26 is a flow diagram of a typical application program instruction sequence used to illustrate aspects of the invention;
FIG. 27 is a block diagram showing an example of an object;
FIG. 28 is a block diagram showing an example of cross process calling of object methods;
FIG. 29 is a block diagram showing an example of an interface structure;
FIG. 30 is a flow chart showing an example of steps leading to the use of an object in an object oriented service system;
FIG. 31 is a flow chart showing steps in an example embodiment of a method for intercepting functions to perform interface structure replacement;
FIG. 32 is a flow chart showing an example replacement interface structure;
FIG. 33 shows an example embodiment of a template for a jacket function;
FIG. 34 is a flow chart showing steps performed in an example embodiment of a PBJA jacket function when called from non-native code;
FIG. 35 is a flow chart showing steps performed by an example embodiment of a PBJA jacket function when called from native code;
FIG. 36 is a flow chart showing steps performed by an example embodiment of a PAJB jacket function when called from native code;
FIG. 37 is a flow chart showing steps performed by an example embodiment of a PAJB jacket function when called from non-native code;
FIG. 38 is a block diagram showing an example of a system for load time processing to support interception of functions which take a pointer to an object as a parameter;
FIG. 39 is a flow chart showing an example of steps performed at run time to support interception of functions which take a pointer to an object as a parameter;
FIG. 40 is a flow chart showing an example embodiment of steps performed during general function jacketing;
FIG. 41 is a flow chart showing steps to determine and use translation units when performing a binary translation;
FIG. 41A a flow chart showing steps to form translation units of a non-native binary image;
FIG. 42 is a flow chart showing steps of flow path determination;
FIG. 42A is a flow chart showing steps to determine transfer of control target locations for an indirect transfer instruction;
FIG. 43 is a block diagram showing two types of entries included in the profile statistics;
FIG. 44 is a flow chart showing steps for determining regions;
FIG. 45 is a block diagram of a list of code cells;
FIG. 46 is a diagram which shows the relationship between FIGS. 47 and 48;
FIGS. 47 and 48 are block diagrams which illustrate an arrangement of local data flow analysis information;
FIG. 49 is a block diagram of an opcode table;
FIG. 50 is a block diagram of a data flow analysis arrangement illustrating the use of read-modify and modify-write fields of the basic block value (BBV) data structure of FIG. 47;
FIG. 51 is a block diagram which depicts the BBSC summary information field of FIG. 48;
FIG. 52 is a block diagram of an arrangement comprising global data flow analysis information;
FIG. 53 is a more detailed block diagram of the global data flow connections of FIG. 52;
FIG. 54 is a block diagram of the control flow edge (CFE) data structure;
FIG. 55 is a flowchart that sets forth steps of performing a global data flow analysis;
FIGS. 56A and 56B are flowcharts that set forth method steps for determining merge points during global data flow analysis;
FIG. 57 is a block diagram of a global data flow analysis arrangement illustrating a merge point.
FIG. 58A-58D are block diagrams depicting different variations of the binary image transformer;
FIG. 59 is a flow chart of steps of translating the binary image;
FIG. 60 is a flow chart of the step for one method for selecting the translation unit to be processed;
FIG. 60A is a representation of a call graph used in the method steps of FIG. 60;
FIG. 61 is a flow chart depicting an alternative method for selecting a translation unit to be processed;
FIG. 62A is a flow chart listing steps for forming an initial intermediate representation (IR) of a binary image;
FIG. 62B is a block diagram of a data structure illustrating a transformation of a source instruction to an IR with memory operands removed;
FIG. 62C is a block diagram of a data structure used to indicate whether an IR instruction corresponds to a machine instruction which can generate an exception;
FIG. 63 is a flow chart showing steps for translating and optimizing an initial IR to produce the final IR for a given translation unit;
FIG. 64 is a flow chart showing steps for performing condition code processing;
FIG. 65A is a block diagram of a bit mask associated with an IR instruction code cell used to represent condition codes that can be affected by the corresponding IR instruction code cell;
FIG. 65B is a block diagram which depicts an example transformation from source instructions comprising the first binary image as affected by condition code processing;
FIG. 66 is a flow chart depicting steps for register processing;
FIG. 67A is a block diagram which depicts a 32 bit register in an architecture which has partial register operands;
FIG. 67B is a block diagram which depicts a transformation of an initial IR as a result of register processing;
FIG. 68A is a block diagram which depicts a code pattern which is detected by early floating point optimization pressing;
FIG. 68B is a block diagram which is a table indicating a replacement instruction for a specific code pattern detected in early floating point optimization processing;
FIG. 69 is a flow chart depicting steps for local basic block and global routine optimization processing;
FIG. 70 is a flow chart depicting steps of code selection and operand processing which place the IR in final form;
FIG. 70A is a flow chart depicting steps of intra image call processing;
FIG. 71A is a block diagram depicting a translated image comprising tables used in exception handling;
FIG. 71B is a block diagram depicting a table entry in a translator exception table; and
FIG. 71C is a block diagram depicting run time transfer of control when a translated image is executed and an exception occurs.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
COMPUTER SYSTEM
Referring now to FIG. 1, a computer system 10, is shown to include a processor module 12 which has a high performance processor 12a. The computer system 10 further includes, in addition to the processor module 12, a main memory 14, an disk adaptor 15 and an I/O user interface 18, as well as a monitor 19 all coupled by a system bus 20, as shown. Here the processor 12a is a high performance microprocessor such as an Alpha.RTM. microprocessor manufactured by Digital Equipment Corporation, assignee of the present invention, or other high performance processor.
The main memory 14 is comprised of dynamic random access memory and is used to store instructions and data for use by the microprocessor 12a on the processor module 12. The disk adaptor 15 is used to couple the system bus 20 to a disk bus which itself is coupled to disk storage device 17.
The disk storage device 17 is here illustratively partitioned into a plurality of segments or blocks of data which are here represented for convenience as being self-contained and contiguous, but which may be scattered across the disk 17 and be non-contiguous. The disk 17 includes a first storage segment 17a storing an operating system for the computer system 10 as well as an application program stored in segment 17b.
The application program stored in segment 17b is a non-native executable image. That is, the application program is comprised of instructions from a different instruction set than that used in the computer system 10 (i.e. a different computer architecture). Also the application program could have been written for a different operating system than that stored in 17a. Since the instructions provided in the program stored in segment 17b are different from the instruction set executed on the microprocessor 12a the program in segment 17b can not be directly executed on the system 10.
The disk also includes a storage segment 17c which here represents an native executable image of the application program stored in segment 17b. This native image is generated in the computer system via a binary image conversion system (16, FIG. 2) which is here stored with the operating system in the segment 17a as will be described. The image stored in segment 17c corresponds to instructions which can be executed on the microprocessor 12a and thus conforms to the architecture of the computer system 10.
Also stored in a segment 17d are profile statistics which are collected during execution of a portion of the non-native application program stored in 17b. The profile statistics are provided by execution of a run-time routine which converts non-native instructions into native instructions. These profile statistics are used in a background process to convert portions of the non-native image into a native image corresponding to the operation and function of those portions of the non-native application program. In addition, data which are used for the particular application program are also be stored on the disk in segment 17e.
The computer system 10 further includes an I/O user interface 18 which is here an interface used to couple a mouse 19a, for example, to the system bus 20 as well as a monitor 19.
The computer system 10 operates in a generally conventional manner. That is, at "power on", selected portions (not numbered) of the operating system stored in segment 17a are loaded into main memory 14 and occupy a particular address space in main memory 14, such as, address space 14a. As a user of the computer system 10 executes application programs on the system 10, the application programs are run under the control of the operating system.
A typical operating system represented by that stored in 17a is the so-called Windows NT.RTM. operating system of Microsoft Corporation Redmond, Wash. In Windows NT.RTM. or other window type operating systems, displayable images called "icons" are presented to a user on the monitor 19. These icons represent an executable command to initiate execution of a program. When pointed to by a cursor controlled by a mouse, for example, and clicked on this user action activates the command and causes the represented computer program to execute.
Here, however, the application program stored in segment 17b is written in a non native instruction set. That is, the instruction set of the application program is not the same as the instruction set of the computer system 10. Thus, the executable image of the application program stored in segment 17b is comprised of non-native instructions which can not be directly executed on the computer system 10. Nevertheless, the non-native application has a corresponding icon (not shown) which is represented in the window provided by the operating system.
Each non-native application image has a unique identification name (ID) or image key. The identification name or image key is included in the non-native image file and is a unique identifier for the non-native application image. During installation of the file containing the image, typically a server process portion of the operating system determines the unique ID or key to the non-native application image. The ID number is generally assigned by concatenating together unique information of the file. Examples of the types of information include, the time stamp of the file, the file name, the file size and the date that the file was originally produced. Thus, the same non-native image if loaded a multiplicity of times on the computer system will have the same I.D. number. The statistics as well as the translated code associated with each one of the non-native images will be the union of all prior executions of the non-native images for each instance of the non-native application. Other arrangements are of course possible.
When the user clicks on the icon for the program stored in 17b, a portion of the operating system recognizes the ID of the executable image represented by that icon as being comprised of instructions that are non-native to the instruction set and architecture of computer system 10. In general a software module called a loader in the operating system will recognize that the identification name (ID) of the file represented by the selected icon as being non-native to the architecture. Thus, the operating system initiates the execution of an instruction conversion program 16 or feeds the file instruction by an instruction to an instruction pre-processor. Alternatively, a loader can be provided which handles the non-native image by examining the image to determine all files, libraries and resources needed by the image. The loader will thus prepare the non-native image for execution. Part of the preparation is the initiation of the instruction conversion program 16 or alternatively instruction pre-processor, as will now be described.
BINARY IMAGE CONVERSION SYSTEM
Referring now to FIG. 2, the binary image conversion system 16 is shown to include a run-time system 32 which is responsive to instructions provided from the disk segment 17b. As mentioned, the run-time system 32 can be implemented as software to emulate the non-native architecture or as a hardware preprocessor to convert the non-native instructions into native instructions. When implemented as software, the run time system 32 consumes more disk space on disk 17 and occupies more main memory storage in main memory 14. Whereas, when implemented in hardware, the run time system 32 requires more chip space in the high performance microprocessor 12a. Here the run-time system will be described as a software implementation which operates in an execution address space 20 of the computer system 10.
As mentioned above, disk segment 17b stores instructions of an application program complied and/or written for an instruction set which is different from the instruction set of system 10. The run-time system 32 recieves portions of a non-native executable image from segment 17b comprised of the non-native instructions. The run-time system 32 provides a native instruction or a native instruction routine comprised of a plurality of instructions which are executed by the computer system 10 to provide the same functionality as the non-native image. That is, the functionality called for in the instruction in the executable image of the non-native instruction set is equivalently provided by the routines determined by the run-time system 32. The run-time system executes the equivalent routines on the computer system 10. This provides the equivalent function to provide the same result in computer system 10 which implements the new architecture as would occur in a new or old computer system (not shown) implementing the non-native architecture.
In a preferred embodiment of the run time system 32, the run-time system 32 examines and tests the code from the segment 17b to determine what resources are used by the instruction and the function of the instruction. The run-time system 32 provides the equivalent instructions corresponding to the architecture of the computer system 10.
As the equivalent instructions are determined they are executed in the system 10 and profile data or statistics, as will be described, are collected in response to their execution. The profile statistics describe various execution characteristics of the instruction sequence. These profile data are fed to a server process 36 via a datapath 32b.
Prior to performing a conversion by the run time system 32, the run-time system 32 interrogates the server process 36 via a path 32a to determine from the server process whether there is a native image corresponding to the routine of the application program stored in segment 17b whose execution has just been requested by a user. If a native image does not exist (as would occur the first time the non-native image is executed), the run-time system initiates an interpretation process. If there is code in existence for the particular instruction reached in the application program, due to a prior execution in the run-time system and subsequent conversion by a background system, the run-time system 32 will request and execute the native code.
As mentioned, in general, the first time the application program 17b is executed by a user there will be no native image code in existence. As the program executes, however, native code will be generated by the background process in a manner to be described, and over time as substantial portions of the non-native image are executed, convertible portions of the non-native image will be converted by the background process into native image code. As native image code is generated, it is also stored in segment 17c in a manner that is transparent to the user.
In addition the native image file 17c contains an address correlation table which is used to track the segments of native code corresponding to segments of non-native code. This table is used at run time of the program in segment 17b to determine whether and which non-native segments have equivalent translated native segments.
Translation into the native image is provided via a background system 34 which operates in one embodiment after the interpreter has finished execution of the instructions to provide translated code dependant upon the execution characteristics of the run-time converted instructions. Alternatively, the background system operates while there is a pause in CPU utilization by the run-time system 32. Alternatively, the background system can make translated code available to the run-time system 32 during execution to permit substitution of translated code for a subsequent occurrence of the non-native image during the current execution of the application program. Further still, the run-time system can be implemented as a preprocessor which provides the profile statistics for use by the background process. The background process can be implemented in hardware or software or a combination of both.
The background system 34 receives the profile data generated by the run-time system 32. In accordance with the characteristics of the profile data, the background system 34 forms a native image of at least portions of the instructions of the non-native image stored in segment 17b of disk 17. A preferred arrangement is to have the background system implemented as a binary translator to produce translated code. The native image portions are stored in logical disk drive 17' for use if needed in subsequent executions of the application program from segment 17b. Here it should be understood that the logical disk drive 17' is a logical partition of the disk drive 17 and is here referred to as being a logical disk drive, because in general, it is transparent to the user, but it physically represents space storage such as segment 17c on the actual disk drive 17. Alternatively, the logical disk drive 17 could be a separate disk drive.
The run-time system 32 and the background system are each under the control of the server process 36. The server process 36 is present throughout the operation of the computer system 10. The server process 36 is a software service process which, amongst other things, is used to schedule various transactions within and between the run-time 32 and background systems 34.
After generation of native image code such as by the binary translator, the image translated code is stored on logical disk drive 17' in logical segment 17c' with the profile statistics being stored in logical segment 17d'. These locations correspond to segments 17c and 17d in FIG. 2.
Each time there is a new execution of the application program stored in segment 17b, the run-time system will send a request to the server process 36 for native code corresponding to the non-native code currently in the run-time system 32. The translated code is code which was generated by a previous execution of the background system 34 in accordance with the profile statistics collected by execution of the routines furnished by the run-time system 32. The server process 36 supplies corresponding translated code (if any) to the run-time system 32. If there is translated code, the run-time system 32 will have the translated code execute in place of interpreting the code. Otherwise if there is no translated code, the run-time system 32 will interpret, translate, or otherwise convert the relevant portions of the non-native code currently executed in the computer system 10.
As more code of the program stored in segment 17b is executed, more sections of the program are interpreted producing as a result of the execution, profile statistics which are fed to the server process 36.
The server process 36 controls inter alia the storage of the profile statistics. That is, the server process 36 will merge new (raw) statistics with previously stored merged statistics to provide a new merged profile. The server process will compare the new merged profile with the stored merger profile and will initiate a translation process in the background system 34 when there is a difference between the two statistics. The degree of difference needed to initiate execution is selectable. Such a difference indicates that heretofore never executed code was interpreted and executed in the run-time system. This process will be ongoing until all portions of the non-native image have been encountered by the user and all of the portions which can be translated by the background system 34 have been translated.
The server process also determines the unique key or I.D. number to uniquely identify the non-native image stored in segment 17b. As mentioned above, the attributes of the image comprising the I.D. include the file size, the date of creation of the image, the time stamp and so forth. This key is also used to identify the profile statistics with the non-native program.
The background system 34 will, in general, translate nearly all instructions provided from the non-native applications stored in 17b. Certain types of instructions are preferably not translated. In general those instructions which are not translated are ones in which the execution of the instruction is not predictable. For example, instructions which are self modifying (i.e. are not in read only sections, that is, are on a writtable page) will not be translated. For these instructions the run-time system will execute them via the interpretation routines. Further, instructions for which in the non-native architecture there is no easily produced analog in the native architecture will not be translated. For example, in the X86 architecture of Intel, floating point instructions use a floating point control register to determine inter. alia. rounding modes etc. Although for many executions of the instructions the contents of the register may be in a normal state, this can not be guaranteed. Rather than have the translator determine the state it is more economical to handle these instructions in the interpreter.
Since execution or profile statistics in part determines what code is translated by the background translator non-instruction code is not mistaken for instructions by the translator. Therefore, the translated code can be optimized without fear of optimizing noninstructions.
Referring now to FIG. 3, the run-time system 32 is shown to include an execution address space containing run-time system 32 which includes a run-time interpreter 44, a non-native loader 42 which is fed the ID corresponding to the non-native application image provided from segment 17b of the disk 17, a native image loader 43, native operating system dll's (dynamic link libraries) 45 and a return address stack management arrangement 20 (FIG. 22). The non-native loader 42 is similar to the native image loader 43 except it is capable of handling non-native images and interrogates the server process to determine whether there is any native code corresponding to the non-native code awaiting execution. The non-native loader 42 receives instructions corresponding to a non-native image of the application segment 46a and a native image of the application 46b corresponding to translated instructions provided from the background translator 34, and segment 46c corresponding to data. The non-native loader 42 is used to initially load the non-native file. The native loader 43 is used to initially load the native file if any.
Referring now also to FIG. 3A, at the initiation of an execution of the program stored in segment 17b, (via selection of the appropriate icon) (step 50a) the native loader 43 determines whether an architecture number associated with the non-native image is a native or a non-native image. If the image is a native image execution continues as normal. If however the image is a non-native image, the native loader 43 calls the non-native loader 42 at step 50b. The non-native loader 42 loads the non-native image at step 50c and also recognizes that this architecture number associated with the program represents an application program written for a non-native instruction set. The non-native loader starts the binary image conversion system 16. The non-native loader 42 initially queries the server 36 at step 50d to respond with native code to accelerate execution of the image represented by the code stored in 17b. It should be appreciated that the function of the native loader 43 and the non-native loader 42 can be combined into a single loader.
If this is the first time running the application, the server 36 responds at step 50e by indicating that there is no corresponding native image to execute in place of the non-native image. Therefore, the non-native loader 42 instructs the interpreter 44 to begin an interpretation at step 50f of the instructions from the non-native image. The interpreter 44, for each instruction, determines the length or number of bytes comprising the instruction, identifies the opcode portion of the instruction, and determines the resources needed by the instruction. The interpreter maps the non-native instruction to a native instruction or a native sequence of instructions based upon inter alia the opcode. These instructions are executed by the computer system 10 in the address space 20 (FIG. 3). The run-time interpreter 44 collects data resulting from the execution of the instructions as will be described in conjunction with FIG. 6. These "profile statistics" are stored by the server 36 on the logical disk drive 17'.
The run-time interpreter 44 examines and analyzes the instructions to determine the proper native instruction sequence to replace for the non-native instructions provided from the executable image 46a. These native instructions as they are executed continue to generate profile statistics which are collected and stored in logical disk drive storage 17c'. This process continues until execution of the program 17b is terminated by the user.
After termination of the execution of the non-native program, a background process 34 is initiated (not shown). Alternatively, the background process 34 could be initiated to steal execution cycles from the run-time process 32 or alternatively could be used to substitute into the run-time process translated native image code for routines which are subsequently called during execution of the program 17b, as explained above. The exact sequence of which the background processor is used in conjunction with the run-time processor is an implementation detail.
For subsequent executions of the program the interpreter 44 will only provide interpreter code if the server process 36 does not return a native image equivalent of the sequence which is provided from the background process 34 as will be described.
Thus, if at step 50e the server responds with native code, the native image loader 42 at step 50g loads the native code. After the native image code is loaded, the non-native image loader 42 is called at step 50h to fix up the image. In general the non-native image will provide address tables corresponding to inter alia variables in the non-native image which are needed in the execution of the native image. That is, at step 50h the native and non-native images are stitched together to enable the native image to use information in the non-native image. At step 50i the native code is executed. In general, the native code that is executed corresponds to one or more basic blocks or routines of instruction which terminate by a return statement. After execution, a determination is made based upon characteristics of the return instruction execution and by use of a shadow stack as will be described, whether native image code can continue to be executed. If not then control is transferred to the interpreter. The interpreter continues to interpret and execute until it determines as at step 50k that it can resume using native code.
As also shown in FIG. 3, a jacketing routine 48 is used to jacket functions leaving the execution address space 20 to the native execution space of the computer process of computer system 10 as well as those arising from the native execution space of the computer processor 10 into the execution address space 20 as will be further described in conjunction with FIGS. 27-40.
Referring now to FIG. 4, a preferred embodiment of the background system 34 is shown under the control of the server 36 (FIG.1). The server 36 determines, responsive to the profile statistics data provided from the server 36, via logical disk drive 17', whether to initiate a translation process in the background. Preferably, the background system 34 translates only portions of the non-native instructions of the application program which were actually executed (via the interpreter 32) in responsive to a session invoking the program.
The non-native image code is examined at 52 in the server and if the code is the type that should be translated, it is fed to the translator 54. In a preferred environment, the translated code 54 is also fed to an optimizer 58, and again, if the type of code is of a type which can be optimized, it is fed through to the optimizer 58 or else, the process exits or terminates to await the submission of new code from executed portions of the non-native image stored at 17b. Other, techniques for performing translation and translation/optimization will be described. After the translator process 54 and/or the optimization processor 58, either translated code is stored in segment 17b' or optimized translated code is stored in segment 17b'.
PROFILE FILE DATA STRUCTURE
Referring now to FIG. 5, a profile file data structure 60 used to store information gathered at execution time by instructions in the interpreter 34 is shown. The data structure 60 has records which contain information about the execution of a non-native architecture program when the program executes control transfer instructions. The profile record can include other information. That is, the profile records contain information about a target address encountered in the non-native image.
The data structure 60 is shown to include two principal sections. The first section is a profile header section 62 which comprises an image key field 62a. The image key field 62a is used to store information regarding the ID or identification of the profile record. The information in this field 62a is used to associate the profile statistics with a corresponding non-native image and its associated translated code, if any. Thus, the image key field 62a corresponds to the image ID or key field as mentioned above. The profile header 62 also includes a version field 62b comprised of a major version field 62b' and a minor version field 62b". The major version field 62b' and minor version field 62b" are each here 16 bit or 2 bytes in length and their union provides a resulting 32 bit version field 62b. The version fields are used to keep track of which version of the interpreter was used to generate the profile statistics in the table and the profile file format.
The profile file 60 also includes a plurality of raw profile records, here 64.sub.a -64.sub.n. Each of the profile records 64.sub.a -64.sub.n maintains information about run-time execution of control transfer instructions in the non-native image. Each of these records are variable length records as is each of the unique profile files 60. Thus, for each control transfer encountered during execution of the non-native image in the interpreter 34 a raw profile record is produced. The interpreter 34 will place into the raw profile record information regarding the execution of the control transfer instruction. The information which is included in the raw profile record is as described below. Suffice it here to say, however, that the raw profile records are used by the server process to provide a profile record which is then used during translation of the associated routines in the background system.
Referring now to FIG. 6, an exemplary one of the raw profile records here 64.sub.n is shown. The raw profile record 64.sub.n includes a profile record structure 66 including an address field 66a, a flag field 66b and a count field which tracks the number of indirect targets of control transfer 66c. The address field 66a contains the actual target address in the non-native image, as determined by the interpreter 44. This address is the actual target address of the instruction that caused a control transfer during execution of the non-native image. The address field 66a is generally the address length of the non-native architecture or here 32 bits or 4 bytes long. The flags field 66b contains the states of the flags at the target address. The flags field 66b is here 2 bytes or 16 bits long. The n.sub.-- direct field 66c is a counter field which keeps track of the number of indirect target or computed target addresses contained in the remainder of the profile record 64.sub.n as will be described below.
There are additional optional fields 70 which comprise the record. One field is a count field 70a which corresponds to either the number of times a control transfer occurred to the address contained in field 66a or a count branch taken field counter which keeps track of the number of times a branch was taken by the instruction corresponding to the address contained in field 66a. Fields 70b.sub.0 -70b.sub.n correspond to addresses which are the targets of the control transfer and are cumulatively maintained in the profile record structure.
The optional fields 70 are used to keep track or maintain a count of the targets of the control transfer instruction in the image. The count field 70a is either a control transfer field count of the number of times control was transferred to the target address or a branch taken field corresponding to the count of the number of times a conditional control transfer of a branch instruction was taken. The type of field 70a is determined by the flags field 66b being "ANDED" or masked with a value which tests the state of the associated flag. This test determines whether the target address was a result of a control transfer instruction or a branch instruction. This optional field is also a long word.
The target of control transfer fields 70b.sub.1 -70b.sub.n are the target addresses of the control transfer which occurred at the control transfer instruction. These fields keep track of the addresses for indirect transfers, that is, transfers to a run-time computed target address.
The profile statistics are managed by the server process 36. The profile statistics are collected by the interpreter 44 during the course of execution of the emulated code. For each execution the server 36 searches for a profile record corresponding the target address. The server 36 merges the new run-time statistics with the existing statistics to produce a new profile file.
The server 36 makes use of a software cache and hash table (not shown) to keep track of the profile records. For an address which need to be looked up, the address is looked up in the cache in 4 different locations that is by using a four way associative cache (not shown). If the address is not there it is looked up in a conventional hash table. The information in the hash table is the count values for the fields.
RUN-TIME INTERPRETER
Details of an interpreter used to convert non-native instructions to native instructions and provide profile or run-time statistics will now be described. In particular the interpreter 44 interprets instructions of the so-called X86 architecture by Intel Corporation San Francisco, Calif.) into ALPHA instructions by Digital Equipment Corp. will be described.
Referring now to FIG. 7, an X86 instruction 100 is shown to include as many as six different fields. These fields are an opcode 100a, an rm byte 100b, a scaled index and base (sib) byte 100c, a displacement 100d, any immediate data 100e, and any one of six types of prefixes 100f.
The opcode 100a defines the task or operation which will be executed by the instruction 100. The rm byte 100b is an effective address specification and is used in conjunction with the opcode 100a to specify a general operand as a memory or register location and, in some cases, also participates in defining the operation. The sib byte 100c is used in conjunction with the rm byte 100b to provide additional flexibility in addressing memory locations. The displacement field 100d provides a displacement from the base register or from virtual zero of a segment. The immediate data field 100e provides immediate data to the opcode 100a.
The prefixes 100f are located before the opcode 100a in the instruction 100. Possible prefixes 100f are a segment override which implements a second (or multiple) addressing space, a repeat specifier value to repeat a specific instruction n times, a lock assertion for synchronization in multiple CPU environments, an address size prefix which selects between 16 and 32 bit addressing, an operand size prefix which selects between 16 and 32 bit operands, and an opcode prefix which selects an alternative opcode set.
From the opcode 100a it can be determined whether an rm byte 100b, an unconditional displacement, or the immediate data field is provided in the instruction 100. It can be determined from the rm byte 100b whether a sib byte 100c and/or a conditional displacement field 100d is included in the instruction 100. As all fields are not required by each Intel instruction 100, Intel instructions are not of a fixed length, but rather are of varying lengths.
The run-time interpreter 44 (FIG. 3) is, in the preferred embodiment, implemented on a computer system 10 (FIG. 1) which conforms to the Alpha architecture, An Alpha architecture computer system operates using the Alpha instruction set which is comprised of fixed length instructions. The run-time interpreter 44 operates on a single Intel instruction at a time. For each Intel instruction a single Alpha instruction or multiple Alpha instructions forming a corresponding Alpha routine, is provided which is an operational equivalent to the Intel instruction.
To transparently emulate the execution of an Intel or other non-native instruction 100 the run-time interpreter 44 should be capable of emulating the operation of the Intel or non-native memory, registers, condition codes and a program counter which, on a 32 bit Intel machine is referred to as an extended instruction pointer, EIP. In this way, a result of the execution of the instruction 100 is recorded accurately.
The interpreter 44 uses the same memory space for data while executing Alpha routines corresponding to Intel instructions as is used when executing native Alpha instructions. This is possible because the strict standards to which Win32 software applications adhere allow for differences in calling conventions but not in the representation of the data. The maintenance of the Intel registers, condition codes and EIP are discussed below.
Referring now to FIG. 8, a table 101 depicting Intel or non-native values assigned to the registers of computer system 10 is shown to include eight registers which are assigned to emulate the operation of the eight Intel integer registers, EAX 104a, EBX 104b, ECX 104c, EDX 104d, EDI 104e, ESI 104f, EBP 104g, and ESP 104h. A single register, CONTEXT 105, is assigned to serve as a pointer to the emulator state context maintained in memory which is used to manage each thread executing in a multitasking environment. An additional register, FSP 106, stores a floating point stack pointer for addressing an eight entry stack of floating operands.
Three registers, CCR 107a, CCS 107b, and CCD 107c are assigned to store information which allow condition code bits to be maintained in an unevaluated state by the on-line interpreter 44. The SHADOW 108 register provides a pointer to the shadow stack (as will be described) which maintains activation records for translated code. The SEGOFF 109 register maintains an offset from address zero in the native architecture memory permitting the native architecture to emulate multiple addressing spaces which are possible in the Intel architecture and other non-native architectures. Four additional registers T0 110a, T1 110b, T2 110c and T3 1101d are assigned as temporary registers.
The frame 112 register identifies the activation record at the most recent activation of the run-time interpreter 44. The Emulator's Return Address, ERA 114, register stores the return address when the run-time interpreter 44 calls a private sub-routine. The Effective Address, EA 116, register stores the result of evaluating an RM byte 100b and to specify a memory address to a memory access routine.
Seven of the remaining registers, NXTEIP 118a, NXTQ.sub.-- LO 118b, NXTQ.sub.-- HI 118c, NXTJMP 118d, Q0 118e, Q1 118f and QUAD 120 retain values which are used by the interpreter 44 to identify a complete Intel instruction 100 from the instruction stream and to provide pipelining capabilities.
To identify an Intel instruction 100, the run-time interpreter 44 assembles an eight byte (64 bit) snapshot of the instruction stream beginning at the start of the current Intel instruction number. This quadword is retained in QUAD 120.
To assemble QUAD 120, the run-time interpreter 44 captures two quadwords of information from the instruction stream. The run-time interpreter 44 uses the address in the instruction stream identified by the next extended instruction pointer, NXTEIP 118a, as the starting address for the first quadword. NXTEIP 118a identifies a random byte in the instruction stream at which the next instruction to be executed begins. Here, computer system 10 (FIG. 1) requires a quadword aligned address for this initial capture. Accordingly, if NXTEIP 118a is not a quadword aligned address, the three low order bits are first zeroed thus forcing the capture to occur beginning at a quadword boundary. The quadword captured beginning at this quadword aligned address is stored in register Q0 118e. By executing the capture in this manner, the quadword stored in register Q0 118e will at least provide the low byte of the next instruction.
The second quadword capture occurs at an address identified by NXTEIP 118a incremented by seven bytes. Here again, computer system 10 requires a quadword aligned address for this second capture. If the address identified by NXTEIP 118a incremented by seven bytes is not quadword aligned, the run-time interpreter 44 forces the three low order bits to zero thus forcing the address to be quadword aligned. From this quadword aligned address, the capture is performed and the quadword is stored in register Q1 118f. Here, the quadword stored in register Q1 118f contains at least the high order byte of the quadword beginning at the next instruction as identified by NXTEIP 118a.
To extract the low order bytes of the quadword beginning at NXTEIP 118a, the run-time interpreter 44 executes an instruction which, using the three low bits of NXTEIP 118a, determines a byte in register Q0 118e which is identified by NXTEIP 118a, whether or not this byte is quadword aligned. The data in register Q0 118e is copied to register NXTQ.sub.-- LO 118b and shifted right to locate the byte identified by NXTEIP 118a in the low order byte register NXTQ.sub.-- LO 118b. The high order bytes of NXTQ.sub.-- LO 118b which, after the shift, no longer contain valid information are zeroed.
The three low bits of the address identified by NXTEIP 118a incremented by seven bytes is used to determine the high order byte of the quadword beginning at NXTEIP 118a. Here, the data in register Q1 118f is copied to register NXTQ.sub.-- HI 118c shifted left to locate the byte identified by NXTEIP 118a incremented by seven bytes in the high order byte of register NXTQ.sub.-- HI 118c. Here, the low order bytes of NXTQ.sub.-- HI 118c which no longer contain valid information as a result of the shift are zeroed. The result of ORing the contents of registers NXTQ.sub.-- LO 118b and NXTQ.sub.-- HI 118c is stored in QUAD 120.
Referring now to FIG. 9, the low bit of QUAD 120 is shown to be aligned with an Extended Instruction Pointer, EIP 121. In an Intel machine, the EIP 121 identifies a location in the instruction stream which corresponds to the beginning of the current instruction. As each instruction in the instruction stream is executed, the EIP 121 is incremented in the instruction stream to point to the beginning of the next instruction. QUAD 120, therefore, holds a quadword of information beginning at the byte identified by EIP 121.
To determine the operation of the Intel instruction 100 and a corresponding Alpha routine which performs the operational equivalent of the Intel instruction 100, the interpreter uses the information contained in QUAD 120. Typically, the first byte of an Intel instruction is the opcode 100a as shown in FIG. A. The run-time interpreter 44 extracts the first and second low bytes 120a, 120b of QUAD 1002 to provide a two byte instruction fragment 122. From this two byte instruction fragment 122, a corresponding Alpha routine and the length of the instruction 100 are determined.
Referring now to FIG. 10, an arrangement 130 to determine the length of the Intel instruction 100 and the corresponding Alpha routine which implements the operational equivalent of the Intel instruction 100, is shown the arrangement 130 extracts the two low bytes 120a, 120b from QUAD 120 to provide the two byte instruction fragment 122. This two byte instruction fragment 122 is used as an index into a dispatch table 131 which resides in system memory 14 (FIG. 1).
The dispatch table 131 includes 2.sup.16 =64K (65536), 32 bit entries of which entry 131i is representative. Each entry corresponds to each instruction in a set of instructions available in the Intel instruction set. The contents of these 32 bit entries 131i include a field 131a containing an address at which the corresponding Alpha routine resides in system memory 14 as well as a field 131b containing the length of the instruction.
The dispatch table 131 is generated by a tool which identifies each instruction in the Intel instruction set such that the two byte instruction fragment 122 is sufficient information to identify the proper entry which corresponds to the current Intel instruction 100. The tool also provides the complete length of the Intel instruction 100 and includes this information in the dispatch table in the length field 131b along with the location of the Alpha routine which will provide the functional equivalent of the Intel instruction 100 in the address field 131a. The run-time interpreter 44 chooses among eight dispatch tables based upon the sequence of prefix elements 100f preceding the actual opcode 100a.
As discussed above in conjunction with FIG. 7, an Intel instruction 100 may be comprised of multiple elements 100a-100f. Multiple dispatch tables are provided by run-time interpreter 44 to handle the different values and combination of values which can be selected by the prefix element 100f. As discussed above, three possible prefixes 100f are addressing size (16 or 32 bits), operand size (16 or 32 bits) and two byte opcode, which selects an alternative opcode set. Any one or combination of these prefixes 100f may be present in an Intel instruction 100.
The addressing size prefix toggles between an addressing size for the Intel system which truncates address arithmetic to 16 bits or to 32 bits. Typically, the address size is 32 bits. The operand size prefix is similar wherein an operand expected by the system is 16 bits under a 16 bit operand size or 32 bits when the operand size is set for 32 bits. Here again, the typical operand size is 32 bits. The final prefix toggles between two alternative opcode sets. The first is a one byte opcode set and the second is a two byte opcode set. Here, a one byte opcode set is typically selected. A dispatch table similar to the dispatch table 131 in FIG. 10 is provided in system memory 14 for each of the eight possible combinations of prefixes 100f, the default dispatch table is dispatch table 131 having a 1 byte opcode with a 32 bit addressing size and a 32 bit operand size.
In addition to an entry for each instruction, also included in dispatch table 131 is an entry for each prefix 100f and prefix 100f combination. The 32 bit entry 131j, corresponding to a prefix 100f, activates a different dispatch table in memory 14 in which the subsequent opcode 100a in the instruction stream and its corresponding two byte instruction fragment 122 may be used to index the proper 32 bit entry 131i.
Referring now to FIG. 11, a process for activating an alternate dispatch table 131' is shown to include extracting a two byte instruction fragment 122 from QUAD 120. The two byte instruction fragment 122 is used as an index into the dispatch table 131.
Here, the two byte instruction fragment 122 identifies an entry in the dispatch table 131j. The dispatch table entry 131j includes a native routine address 131a in memory 14 and the length 131b of the Intel instruction 100 which here, is 001 or one byte. The first byte of the two byte instruction fragment 122 is a prefix 100f to instruction 100 which selects 16 bit addressing. Accordingly, the native routine 132 identified by the native routine address 131a, instructs the run-time interpreter 44 to activate the dispatch table 131' which corresponds to an instruction set implementing 16 bit addressing.
The length 131b of the Intel instruction 100 is provided to the run-time interpreter 44 which increments EIP 121 one byte in QUAD 120 to identify the beginning of the next instruction. A new two byte instruction fragment 122' is extracted from QUAD beginning at the new location identified by EIP 121. This two byte instruction fragment 122' identifies an entry 131i' in dispatch table 131'. Again, the two portions of the dispatch table entry 131i' identify the native routine address 131a' in memory 14 of the native routine 134 which is the operational equivalent of the Intel instruction 100 and the length 131b' of instruction 100.
The run-time interpreter 44 executes the native routine 134 which provides the operational equivalent of Intel instruction 100. Once complete, the on-line interpreter activates the default dispatch table 131 for 32 bit addressing and operands and one byte opcodes. While the run-time interpreter 44 is executing the native routine 134 for Intel instruction 100, the process just described allows the run-time interpreter 44 to identify the beginning of the subsequent instruction by incrementing EIP 121. In addition, the entry in the active dispatch table 131 which corresponds to the subsequent instruction is also identified. From this entry 131n, the address of the native routine 131na corresponding to the subsequent instruction as well as the length 131nb of the subsequent instruction are determined. This arrangement allows the on-line interpreter to operate in a pipelined fashion, executing multiple instructions in parallel.
Referring now to FIG. 12, a 32 bit entry 131i from dispatch table 131 is shown to be divided into two sections, the first section 131a corresponding to bits 3-31 of the 32 bit entry 1012 and the second section 131b corresponding to bits 0-2 of the 32 bit entry. Bits 3-31, section 131a are used to address the Alpha routines which execute the operational equivalent of the Intel instruction 100 and bits 0-2 131b signify the length of the Intel instruction 100.
The dispatch table targets are aligned on quadword boundaries. That is, the Alpha instructions which the entries in the dispatch table 131 point to and execute the operational equivalent of Intel instruction 100, are located in system memory 14 on quadword boundaries. In this way, bits 0-2 of the address of the Alpha instructions are always zero. As a result, bits 0-2 131b' may be used to convey additional information about the instruction as here, where these bits are used to signify the length of the instruction. As the addresses of the Alpha routines are always 000 in bits 0-2 field 131b', a full 32 bit address is recreated by appending these zeros to bits 3-31 1012a to provide a complete 32 bit address.
As control is passed to the Alpha routine identified by the 32 bit address, bits 0-2 are used to increment EIP 121 so that EIP 121 is pointing to the beginning of the next instruction. Here, if the length of the Intel instruction 100 is from 1-6 bytes in length, QUAD 120 contains sufficient information to form a second, two byte instruction fragment 122 which may be used to index the current dispatch table to determine the corresponding Alpha routine for the next Intel instruction. This arrangement allows the run-time interpreter 44 to pipeline instructions and thus execute the application program more quickly and efficiently. While an Alpha routine is being accessed corresponding to a current instruction, the run-time interpreter 44 is able to determine the address and length of the next Intel instruction 100 in the instruction stream. A value of zero returned from bits 0-2 field 131b of the 32 bit entry 131i for the length of the Intel instruction 100 however, indicates that the instruction was longer than 6 bytes and hence, pipelining is not possible for this Intel instruction and accordingly, the EIP 121 is not incremented. It is then the responsibility of the Alpha routine to increment EIP 121 and to refill the pipeline.
CONDITION CODE PROCESSING
Referring now to FIG. 13, general purpose registers 135 of an Intel X86 machine are shown to include a single register, EFLAGS 135a, in which condition codes are maintained. This register, EFLAGS 135a, maintains the six condition code bits, the Carry bit 136a (C), the Negative bit 136b (N), the Zero bit 136c (Z), the Overflow bit 136d (O), the Parity bit 136e (P), and the Low Nibble Carry bit 136f (A). Each of these bits may be cleared or set as a result of the execution of an Intel instruction 100. To completely emulate the operation of the Intel application, the run-time interpreter 44 also maintains, in an unevaluated state, the current state of the condition codes resulting from the execution of an Alpha routine which corresponds to the Intel instruction 100.
As is often the case in systems which maintain condition codes, a subsequent condition code modifying instruction may be executed, thus overwriting the changes made to the condition code bits by a prior condition code modifying instruction, before the state of the condition codes is required by a subsequent instruction. In addition, many of the condition code modifying instructions effect only a partial set of the condition code bits. Accordingly, a complete evaluation of the condition code bits after execution of every condition code modifying instruction would be wasteful at CPU time. Nevertheless, the state of the condition code bits needs to be readily ascertainable throughout the execution of the X86 image should the current state of the condition codes be required.
Referring now to FIG. 14, the run-time interpreter 44 is shown to include a set of data storage locations 138, a table of methods 139, and evaluation routines 140 which are used to emulate the X86 condition codes during execution of an X86 image in computer system 10.
The set of data storage locations 138 is shown to include three locations 138a, 138b, 138c which are updated upon execution of an instruction which would have modified the condition codes in an X86 system. The first location, data1 138a, and the second location, data2 138b, store data used in the execution of the instruction, for example, an operand and a result of the instruction. This information is used later during execution of the application program should it become necessary to evaluate the condition codes.
The third location, pointer 138c, contains a pointer to the table of methods 139 which is a dispatch table used to evaluate the condition codes should the system require the current value of the condition codes. The table of methods 1022 contains an entry for each of the eight predicates available in X86 conditional branches (and equivalent SETcc instructions), an entry to obtain the nibble carry, A 136f, bit and an entry to obtain a complete image at the EFLAGS 135a register. The set of methods includes one for each of the six condition codes.
Each entry in the table of methods 139, identifies an evaluation routine 140 which evaluates the condition described in the method table entry. Data1 138a and data2 138b are provided to the evaluation routines to determine the state of the condition code bits should a subsequent instruction require the current state of the condition codes.
When an Alpha routine is executed for an Intel instruction which would have modified one or more of the condition codes, the run-time interpreter 44 stores zero to two pieces of information from the instruction in the first two storage locations, data1 138a and data2 138b. These pieces of information, possibly an operand and a result of the operation, are used by the evaluation routines to compute the condition codes. In the third storage location, pointer 138c, a pointer is placed which, in accordance with the type of instruction which was executed, identifies the entry in the table of methods 139 which will identify the evaluation routines 140 which are to be called if and when the condition codes are evaluated.
The table of methods 139 is specific to the type of instruction executed. That is, if the instruction modifies all of the condition codes, the table of methods includes an entry pointing to a routine for each of the six condition codes. If the instruction modifies only the C bit, the only entry in the table of methods 138 is a entry pointing to an evaluation routine which will evaluate the C bit. Other possibilities include instructions which modify all of the condition code bits except for the C bit (ALL.sub.-- BUT.sub.-- C) instructions which modify only the Z bit (ONLY.sub.-- Z) and instructions which modify only the C and O bits (C.sub.-- AND.sub.-- O). The table of methods 139 for instructions of these types would include entries pointing to routines which correspond to all but the C bit, only the Z bit and only the C and O bits respectively.
Each entry in the table of methods 138 identifies a separate evaluation routine 140 which computes that specific condition code predicate or image of EFLAGS 135. Because these routines are only executed when necessary, the condition codes are maintained in an unevaluated state and accordingly, only minimally effect the execution speed of the application. Data1 138a and data2 138b are provided to the evaluation routine 1024 to determine the effect the instruction had, or should have had, on the condition codes. Later, when a subsequent instruction is encountered by the run-time interpreter 44 which requires the current value of one or all of the condition code bits as input to the instruction, for example, as a condition in a conditional instruction, the run-time interpreter 44 uses the information provided in the data storage locations 138a and 138b, the table of methods 139 and the evaluation routines 140 to determine the current values of the condition code bits.
As discussed above, an Intel instruction can modify all condition code bits, or a subset of those bits. If the current instruction which modified the condition code bits modifies only the C bit and the previous instruction modified all of the condition code bits it would be wasteful to gather the data necessary to evaluate all but the C bit and copy it into the table of methods 139 which is provided for the current C bit modifying instruction. As a result, the run-time interpreter 44 maintains information to evaluate the previous state of the condition code bits based upon a previous condition code modifying instruction as well as the current condition code modifying instruction.
Referring now to FIG. 15, the interpreter is shown to include two sets of data storage locations 138 and 138', two corresponding tables of methods 139 and 139' and corresponding evaluation routines 140 and 140'. A first condition code evaluation grouping 137 corresponds to a current condition code modifying instruction and a second condition code evaluation grouping 137' corresponds to a previously executed condition code modifying instruction. Further, a finite state machine (FSM) is provided which determines how the previous and current states of the condition codes are maintained. The states and transitions of the FSM are the five types of condition code updates: ALL.sub.-- BUT.sub.-- C, ONLY.sub.-- C, C.sub.-- AND.sub.-- O, ONLY.sub.-- Z and ALL. Each transition has associated with it one of three actions: replace, push or resolve.
Provided below is a table, TABLE 1, which describes the action taken to maintain the condition code bits. The action is contingent upon which condition code bits the current instruction will modify as well as which condition code bits were modified by a previously executed condition code modifying instruction. In addition, the actions have been carefully selected to provide an action for the transition which entails a minimal amount of work yet still provides the run-time interpreter 44 a complete up-to-date set of condition code bits at any time.
In a replace action, the contents of the current condition code evaluation grouping are replaced by the values resulting from the next instruction. That is, the contents of the data storage locations 138, the corresponding table of methods 139 and the evaluation routines 140 are replaced with values which will enable the run-time interpreter 44 to evaluate the condition codes modified as a result of the next instruction. A replace action does not modify the contents of the previous condition code evaluation grouping. A replace action is appropriate when the set of condition code bits modified by the next condition code modifying instruction includes at least all of the condition code bits in the set of condition code bits modified by the most recent condition code modifying instruction.
A push action however, replaces the contents of the previous condition code evaluation grouping 137' with the contents of the current condition code evaluation grouping 137. The current condition code evaluation grouping 137 is used to provide the necessary information to evaluate the condition code bits modified by the next instruction. A push action is appropriate when the set of condition code bits modified by the next condition code modifying instruction does not include all of the condition code bits in the set of condition code bits modified by the most recent condition code modifying instruction. In addition, a union of the two condition code bit sets results in a complete set of condition code bits.
The final action is a resolve. The resolve is the most complicated of all the actions. In a resolve, the state of the condition codes, as represented by the current and previous condition code evaluation groupings 137 and 137', is evaluated resulting in a complete set of condition code bits, or an ALL, in the current condition code evaluation grouping 137. A push is then performed for the next instruction. A resolve action is appropriate when more than two condition code evaluation groupings would be necessary to maintain a complete set of condition code bits.
TABLE I
__________________________________________________________________________
Next CC Most Recent CC State
State ALL.sub.-- BUT.sub.-- C
ONLY.sub.-- C
C.sub.-- AND.sub.-- O
ONLY.sub.-- Z
ALL
__________________________________________________________________________
ALL.sub.-- BUT.sub.-- C
replace
push push replace
push
ONLY.sub.-- C
push replace
resolve
resolve
push
C.sub.-- AND.sub.-- O
push replace
replace
resolve
push
ONLY.sub.-- Z
resolve
resolve
resolve
replace
push
ALL replace
replace
replace
replace
replace
__________________________________________________________________________
As mentioned above, the first condition code evaluation grouping 137 maintains in an unevaluated state the state of the condition codes corresponding to the execution of a current instruction. The second condition code evaluation grouping 138 maintains in an unevaluated state the state of the condition codes corresponding to the execution of a previous instruction.
The first set of data storage locations 138 here, registers CCR 107a, CCS 107b and CCD 107c retain three values. CCR 107a and CCS 107b contain data used by the current, non-native instruction such as an operand and a result of the instruction. CCD 107c contains a pointer to the dispatch table 139 provided to evaluate the state of the condition codes which are modified as a result of the execution of the current instruction. The second set of data storage locations 138' retain similar values corresponding to a previous condition code modifying instruction.
Here, each condition code evaluation grouping 137, 137' is shown to include a location in the respective table of methods 139, 139' which indicates the category of instruction which was executed. That is, whether the instruction modifies all of the condition code bits or a subset of the condition code bits. Using this value and the information in the FSM of TABLE I, the run-time interpreter 44 maintains in an unevaluated state, the complete set of condition code bits.
To illustrate how this works, an example is provided in conjunction with FIG. 15, in which a current instruction modifies all of the condition code bits (ALL) and a next instruction modifies only the C bit (ONLY.sub.-- C) . In this simple example, the contents of the second condition code evaluation grouping 137', which provides the previous condition code state, is immaterial as will be shown.
As the current instruction modifies all of the condition code bits, the category location 139a of dispatch table 139 would indicate an ALL value. Accordingly, an entry for each of the six condition code bits is provided in dispatch table 139a to access evaluation routines 140 for each condition code bit.
When the corresponding Alpha routine for the next instruction is executed, the category location 139a of the current dispatch table is accessed to determine the category of the previous instruction. Using the category information provided and the information contained in TABLE 1 the run-time interpreter 44 manipulates the contents of each condition code evaluation grouping 137, 137' accordingly.
Here, the category of the most recently executed instruction is ALL while the category of the next instruction is ONLY.sub.-- C. As shown in TABLE I, when the most recent condition code state is an ALL and the next instruction is an ONLY.sub.-- C, the action which is to be taken is a push. Here, a push is an appropriate action because the set of bits modified by the next condition code modifying instruction, {C}, does not include all of the bits modified by the most recently executed condition code modifying instruction, {C, N, O, P, A}. Moreover, a union at the two condition code bit sets results in a complete set of condition code bits, {C, N, Z, O, P, A}.
The information retained in the current condition code evaluation grouping 137 is pushed or copied into the storage locations for the previous condition code evaluation grouping 137'. That is, the data in CCR 138a and CCS 138b are copied to pdata1 138a' and pdata2 138b' respectively and CCD 138c is copied to pptr 138c'. The current condition code evaluation grouping 137 is then used to store the data used to evaluate the C bit which is the only condition code bit modified by the next instruction. An example is provided below in conjunction with FIGS. 16 and 17 which describes a resolve action.
Referring now to FIG. 16, a set of condition code state diagrams 150 includes a condition code state 152 diagram for a previously executed condition code modifying instruction, a condition code state 154 diagram for a most recently executed condition code modifying instruction and a condition code state 156 diagram for a next condition code modifying instruction. Here, the previous condition code state 152 is ALL.sub.-- BUT.sub.-- C in which all but the C bit is modified. The most recent condition code state 154 is C.sub.-- AND.sub.-- O in which only the C and O bits are modified as a result of the execution of the most recently executed condition code modifying instruction. The next condition code state 156 is ONLY.sub.-- C in which only the C bit is modified.
Referring back to TABLE 1, it may be seen that when the most recent state is C.sub.-- AND.sub.-- O and the next state is ONLY.sub.-- C the appropriate action to be taken is a resolve action. It can be seen from FIG. H a replace action would not preserve the most recent state of the O bit as the current condition code state would be overwritten by information only capable of determining the C bit. A push however would lose the information necessary to determine the most recent values of the N, Z, P and A bits. As discussed above, more than two condition code evaluation groupings would be required to fully preserve the current states of each of the condition code bits. Accordingly, the information stored in the first and second condition code evaluation groupings 137, 137' is resolved resulting in a complete set of condition code bits.
Referring now to FIG. 17, the most recent condition code state 154' diagram is shown to contain a complete set of condition code bits. As a result of the resolve action, the most recent condition code state 154' is ALL and the next condition code state 156' is an ONLY.sub.-- C. Referring again to TABLE 1, the appropriate action to be taken is a push when the most recent condition code state is ALL and the next condition code state is ONLY.sub.-- C. Accordingly, the run-time interpreter 44 can push the condition code information resulting from execution of the next instruction without losing any condition code bit information.
Referring now to FIG. 18, the previous condition code state 152" diagram is shown to indicate a complete set of condition code bits which was pushed from the most recent condition code state 154' in FIG. 17. The most recent condition code state 154" diagram of FIG. 18 now indicates execution of a condition code modifying instruction which modified only the C bit. As may be seen, all information relating to the most current state of each of the condition code bits has been preserved.
MULTIPLE ADDRESS SPACES
Referring now to FIG. 19, an implementation of multiple address spaces on an Intel machine is shown to include segments CS 160, DS 162, and SS 164 identifying address 0 166 of a first address space 168 and segment FS 170 identifying address 0 172 of a second address space 174. Data X 168i is located within the first address space 168 and data Y 174i is located within the second address space 174.
It should be noted that the first address space 168 and the second address space 174 exist independently from each other. Accordingly, there is no relationship between the location identified by segments CS 160, DS 162, and SS 164 and segment FS 170. Nor is there any relationship between the address of the location of data X 168i in the first address space 168 and address of the location of data Y 174i in the second address space 174.
Referring now to FIG. 20, emulation of multiple address spaces on a native architecture is shown to include segments CS 160', DS 162', and SS 164' identifying address 0 166' of a first address space 168' and segment FS 170' identifying address 0 172' of a second address space 174' where segment FS 170' has an offset 175 from address 0 166' of the first address space 168'. The value of the offset 175 is stored in SEGOFF 109 (FIG. 8).
CONTEXT DATA STRUCTURE
Referring now to FIG. 21, a context data structure 180 which resides in memory is shown. The context data structure 180 is used by the on-line interpreter 44 to handle multitasking capabilities of the non-native software application. When, due to multitasking, an additional thread is executed during operation of the non-native software application, a snap-shot of the current state of the run-time interpreter 44 is saved in context data structure 180. The context data structure 180 is used by the new thread to provide the run-time interpreter 44 executing in the new thread the state of the run-time interpreter 44 executing in the thread which initialized the new thread.
Values which are saved in the context data structure 180 include the current condition code state in field 181. Thus, this field includes subfields (not shown) to provide copies of the values stored in registers CCR 138a, CCS 138b and CCD and 138c. Values are provided in field 182 to store the previous state of the condition code bits. The context data structure also includes copies of the integer registers EAX 104a, EBX 104b, ECS 104c, EDX 104d EDI 104e, ESI 104f, EBP 104g and ESP 104h in field 183.
In field 183 values for the six segments (seldomly used in WIN32 applications) are provided. The six segments, four of which are depicted in FIGS. 19 and 20 are cs, ds, es, fs, gs and ss. A copy of the floating stack pointer 106 (FIG. 8) is also provided in field 185 in addition to a starting value for the floating stack pointer as well as the floating stack entries.
Field 186 of the context data structure 180 provides pointers to each of the eight possible dispatch tables. Exemplary dispatch tables 131 and 131' are depicted in FIGS. 10 and 11. The context data structure 180 also provides in field 187 the Extended Instruction Pointer, EIP 121.
A repeat specifier value, as designated by one of the possible prefixes 100f (FIG. 8), is provided in field 188. Values relating to the Emulator Return Address, ERA 114, register are stored in field 189. In fields 190 and 191 pointers used to maintain the profile table as well as pointers to portable math routines are also provided respectively. Values of selected constants are also provided in the context data structure 180 in field 192 while pointers to maintain a linked list of context data structures is provided in field 193.
An additional aspect of a preferred embodiment includes structuring the order of the software which implements the run-time interpreter 44 such that critical blocks of the software code exist in a single cache block. In this way, the run-time interpreter 44 is able to execute more efficiently as the portions of the interpreter 44 which are executed most often are resident in the cache.
NON-NATIVE RETURN ADDRESS STACK AND SHADOW STACK
Referring now to FIG. 22, a return address stack arrangement 210 is shown to include a non-native return address stack 211 and a shadow stack 212. The non-native return address stack 211 is an address stack which is produced as if the non-native image were executing in the non-native environment. The non-native return address stack 211 comprises a plurality of frames 219, each of said frames including a corresponding one of non-native return address fields 213a-213c, as well as fields 215a-215c for local storage, as shown. The non-native return address stored in locations 213a-213c corresponds to the routine return address that is pushed onto the stack by the program when it executes a call instruction. That is, the non-native program when executing in a native environment would place on the stack 211 a particular return address corresponding to the address space as if the non-native program was executing in its native environment.
As also mentioned, the return stack arrangement 210 also includes a shadow stack 212. The shadow stack 212 likewise is comprised of a plurality of frames 214, each of said frames 214 comprising a header field 216a-216c and corresponding or associated local storage fields 218a-218c.
The return address arrangement 210 also includes a pair of stack pointers, one for the non-native return stack 211 and one for the shadow frames 214. The non-native return address stack pointer 217 also referred to as SP points to the bottom or most recent entry in the non-native return address stack. Here the non-native return address stack 211 has an initial address A.sub.0 of <7FFFFFFF>. The initial address of <7FFFFFFF> insures that as the stack pointer SP is decremented, the largest stack pointer value will not be sign extended by an LDL instruction as will be described. Likewise, the shadow stack 212 has a stack pointer 221 referred to as SSP and has an initial address A.sub.0 =<0000000077FFFFFFF>.
The header portion 216a-216c of the shadow stack 212 here comprises four sub-fields. The first sub-field 220a also referred to as SP is the contents of the non-native stack pointer 17 corresponding to the return address in the non-native stack pointer for the particular shadow stack frame 214. Here the non-native stack pointer corresponds to the size of the emulated operating system. Thus, for a 32 bit operating system, the non-native stack pointer 220a would comprise four bytes.
The second entry 220b in the header 216a-216c is the non-native instruction pointer value 220b. The non-native instruction pointer is the address that is pushed onto the non-native return address stack 211. This address also comprises the same number of bytes as the number of bytes supported in the operating system. Thus, again for a 32 bit operating system, the number of bytes is 4.
The third entry 20c in the header portion 216a-216c is a native return address field 220c. The native return address field 220c comprises the native return address which is placed on the shadow stack if a translated routine executes a call instruction. This corresponds to the address of the native instruction which is to resume execution in the translated routine after the called routine has completed.
The fourth entry in the header 216a-216c is the native dynamic link 220d. The native dynamic link field is a pointer to the previous shadow frame header 214. Thus, in FIG. 22, the value stored in the field "dylnk" corresponds to the location of the next shadow frame header 216b. This value is preferably included in the shadow stack to allow the shadow stack 212 to make provisions for a variable amount of local storage in fields 218a-218c. In situations where the local storage fields are not provided or their size is fixed, it is not necessary to have a dynamic link field.
The local storage fields 215a-215c in the non-native register stack 211 comprises routine calls and routine arguments of the non-native system and is provided to faithfully replicate that which would occur in the non-native system were it being executed on its native architecture. The routine locals and routine arguments stored in the non-native return stack are passed to translated routines via the translation process described above and as will be further described in detail below. In the shadow stack 212, however, provision is also provided for local storage in fields 218a-218c. For example, often when a compiler is used to compile a program, the actual instructions of the program use more logical registers than physically exist in the machine on which the program is to be executed. Accordingly, the compiler often provides temporary storage for logical register manipulations and uses the program stack to store these registers.
NON-NATIVE RETURN STACK AND SHADOW STACK MANAGEMENT
The non-native return address stack 211 is managed exactly as dictated by the non-native code being emulated in the interpreter 44. When the interpreter 44 is executing the non-native or non-native code of a particular thread, there is only one native frame on the shadow stack 212 for the interpreter. This permits the interpreter to transfer execution into translated code in the event that there is corresponding translated code to be executed. The interpreter does not push frames onto the shadow stack 212. Further, when transferring into and out of translated routines, the interpreter does not push data onto the native system stack. Rather, when transferring into and out of translated routines, shadow frames 214 are pushed onto the shadow stack 212 to record the state associated with the translated routines.
The shadow stack 212 tends to be synchronous with the routine frames on the non-native return stack. Although calling jackets (48 FIG. 3) may cause another instance of the interpreter 44 to be produced if a callback is performed, and thus push another interpreter frame onto the non-native return address stack 211, once the jacketed operation has been completed this extra frame is removed from the non-native or non-native stack 211.
With a translated routine, however, a shadow frame 214 is pushed onto the shadow stack 212 each time a translated routine is called. The shadow frame 214 includes the space necessary for the translated routine's locals such as the spilled registers mentioned above, and the shadow frame header.
Referring now to FIG. 23, an example of the operation of the shadow stack 212 is shown. The program 230 includes a routine A which has a plurality of instructions, one of which is a call to a routine B (call B) at 233. Routine B, likewise, has a plurality of instructions with the last instruction being a return instruction RET. Program flow 230 represents a program flow for the non-native program executing in its native environment. In routine A, when the Call B instruction 233 is executed, it causes the next instruction at address A.sub.N to be pushed onto the non-native return address stack 211, as shown. The stack pointer for the non-native instruction stack 211 is incremented to the next value, thus pointing to the entry for A.sub.N. Routine B is called by routine A and executes its instructions causing at the last instruction (RET) a return which causes a pop from the non-native return address stack 211. The pop delivers the address A.sub.N on the location of the next instruction to be loaded into the program counter for execution.
Were routine A and routine B translated as mentioned above to provide corresponding translated routines A' and B' (242 and 245) during execution of translated code in the native architecture, an instruction Call B' would be encountered at 243. The shadow frame is allocated at the beginning of a routine for all calls that the routine can make. The instruction Call B' causes the shadow stack to be provided with a shadow stack frame 14 which comprises the four above-mentioned fields 20a-20d and the optional fields for local storage. Thus, in field 20a is provided the contents A.sub.N of the stack pointer (SP) 17 of the non-native return stack 11. This value corresponds to the location where the return address stored in the non-native return address stack 211 for the corresponding native instruction execution will be found.
Likewise, stored in field 220b is a copy of the non-native return address that was pushed on the non-native stack by the execution of the call instruction. The non-native return address is provided by the translated image and corresponds to the non-native call for the particular call in the native or translated image. Here the non-native extended instruction pointer has a value corresponding to A.sub.N. Likewise, stored in field 220c is the value of the native return address A.sub.N '. The dynamic link is stored in field 220d which corresponds to the address of a preceding shadow stack frame header. A new dynamic link is produced by saving the value of the shadow stack pointer prior to allocating a new frame. In location 218 is provided local storage for allocated variables provided during the translation of the corresponding routines A' and B' from the translator as mentioned above.
Both the interpreter 44 (FIG. 3) and the translator 54 (FIG. 4) use the shadow stack 212 for determining the next instruction to be executed upon the processing of a return instruction. When translated code is executed in the computer system and a return instruction is encountered, a check is made to determine whether the code that followed the native call in the translator routine was well behaved.
That is, two assumptions are tested. The first is that the non-native code was well behaved with respect to the depth in the non-native return address stack 211. The second assumption is that the code was well behaved with respect to the return address. If both of these conditions are not satisfied then the code following the translated call cannot be executed and the instruction flow has to revert back to the interpreter for continuing execution until such time as it encounters another call or return instruction or possibly a computed jump instruction.
These two conditions are determined by examining the value of the contents of the non-native stack pointer SP as stored in location 220a to determine whether it is equal to the contents of the non-native stack pointer 217. As mentioned above the non-native stack pointer 217 corresponds to the current location on the non-native return address stack 211. Thus this test is a measure of whether the non-native stack 211 and the shadow stack 212 are at the same depth. The second check is to determine whether the return address stored in location 220b corresponds to the return address stored in the location in the non-native return address stack 211 pointed to by the value of the SP pointer 217.
This check thus determines that the return address for the non-native instruction is the same in the non-native stack 211 as well as the shadow stack 212. If this condition is not satisfied then the interpreter changed the value of the return address. If either condition is not satisfied, then execution is continued in the run-time interpreter 44 until such time as another call or return or computed jump instruction is encountered.
CALL ADDRESS TRANSLATION TABLE
Referring now to FIG. 24, a call address translation table 222 is produced during translation of non-native code. As shown the call translation table 222 is appended to the translated code as in field 221. The translated code 221 and the call address translation table 222 provide the image 17c referred to in FIG. 3. The table 222 includes a pair fields one field 223a corresponds to addresses or more particularly to address offsets from the starting address of calls for translated code routines and the other field 223b corresponds to address offsets to the corresponding starting address in the non-native architecture. The table 222 is here appended to the end of the translated image 221 as mentioned above.
Referring now to FIG. 25, the use of the shadow stack 212 as well as a call address translation table as mentioned above is illustrated. As shown in FIG. 25, both table look-ups and shadow stack manipulations are used in the run-time interpreter 44 or a run-time translation system as well as in the execution of translated code. Table look-ups are used for each instance of a call instruction by the interpreter 44 or for each instance of execution of translated code. The shadow stack 212 is used during the processing of return instructions for the interpreter 44 as well as during execution of calls in the translated code.
During execution of translated code there are two possibilities resulting from execution of a return instruction (RET). The first possibility shown as path 256b is that the afore-mentioned test or check is passed and thus the return instruction can safely return and continue execution of translated code. The second possibility shown as path 256a is that if either one of the two checks fails, then execution returns to the possibly updated address in the non-native stack and execution continues or proceeds within the interpreter 44 until such time as a call, computed jump or a second return instruction is encountered.
Similarly, when the interpreter is executing native code in emulation mode, the interpreter likewise performs a check. A first path 258a would be if there is no corresponding translated code available to be used by the interpreter. The second path 258b would be taken if the interpreter encounters a return address in which there is a valid corresponding translated routine. Thus, the shadow stack 212 permits the interpreter to return to execution of translated code without requiring any corruptive or invasive modification of the non-native return address stack 211.
Similarly, with table look-ups when a call 252 is encountered, the interpreter 44 will perform a table look-up which, if there is a corresponding translated routine, will permit the translated code to execute via path 252b. Otherwise, the interpreter 44 will continue execution via path 252a. Similarly, the translated code when it performs a call 254 will determine if there is a corresponding translated routine for the call and, if so, will permit execution via path 254b. Otherwise, control will be transferred back to the interpreter via path 254a.
By providing a shadow stack 212 which runs synchronous to the non-native return address stack 211, several advantages are provided. The first advantage is that since the shadow stack 212 provides storage for native return addresses and other information required in the native system, it is not necessary to place this information on the non-native return address stack 211. Thus, the non-native return address stack 211 is not violated or remains true to that which would occur during normal execution of the non-native program in the non-native architecture. Amongst other things maintaining a true uninterrupted non-native stack 211 permits a non-native exception handler to execute without any complex manipulation to remove native return addresses. In general, when an exception occurs during execution of the native instructions the exception handler in the native architecture only expects to encounter native architecture instruction addresses. And similarly a non-native exception handler only expects to encounter non-native instruction addresses.
Moreover, the shadow stack 212 being accessible to both the translated code and the interpreter 44 permits the interpreter to return control back to translated code since the interpreter can use the shadow stack to determine a valid native return address which will continue execution of translated code. Without the shadow stack 212, therefore, it would be necessary either to place the native return addresses onto the non-native return stack which is undesirable as mentioned above or to make the unit of translation be limited to a basic block. As will be described below this latter option is undesirable since it limits the opportunities for optimization of the translated code. Further, by having a non-native stack 211 and shadow stack 212, non-native return addresses can be separately managed from the native return addresses. This permits exception handlers for each image to properly handle problems which caused an exception since the exception handlers do not have to deal with return addresses associated with foreign code.
Referring now to FIG. 26, a translated routine 260 can have a call 260a which in turn has other calls 261a to 261c to other translated routines such as 262a. Also in a translated routine 264, the routine can encounter a switch/jump instruction 264a which is a computed branch or jump to another routine such as routines 265a to 265c. Management of the shadow stack 212 in conjunction with execution of translated code, execution in an original interpreter and activation of a new interpreter will now be describe
SENTINEL SHADOW STACK FRAME
When a new interpreter activation initializes its native frame for the shadow stack, it pushes a sentinel shadow stack frame header onto the shadow stack 212. The stack pointer address is set at 7FFFFFFF, the largest stack pointer possible, a value which will not be extended by an LDL instruction. This frame is needed for interpreter processing of return instructions. The shadow stack frame return address field 220c is set equal to 1 (a non-zero value) but is never used. The shadow dynamic link field 220d is set equal to 0 to indicate that this is the initial or sentinel frame on the shadow stack. The shadow stack extended instruction pointer is set to 0 and is never used.
During normal interpreter operation, that is, while the interpreter is executing instructions, it does not follow the stack pointer for the shadow stack. Thus, it does not push or place shadow frame entries onto the shadow stack 212 even if the interpreter interprets non-native calls that modify the non-native return address stack 211. If the interpreter encounters a non-native instruction call that calls a non-native instruction routine that has been translated, however, then the interpreter stores the instruction program counter onto the non-native return address stack 211 as in normal operation and into the shadow stack 212. The interpreter 44 also performs a jump to the translated routine's interpreter entry point. The translated routine returns to the interpreter 44 by jumping through one of its entry points as will be described below.
Every translated routine has two entry points. One entry point is called when the interpreter calls it and the other one is called when anoth |