Fast translation and execution of a computer program on a non-native architecture by use of background translator6091897
Abstract
A computer system for executing a binary image conversion system which converts instructions from a instruction set of a first, non native computer system to a second, different native computer system, includes an run-time system which in response to a non-native image of an application program written for a non-native instruction set provides an native instruction or a native instruction routine. The run-time system collects profile data in response to execution of the native instructions to determine execution characteristics of the non-native instruction. Thereafter, the non-native instructions and the profile statistics are fed to a binary translator operating in a background mode and which is responsive to the profile data generated by the run-time system to form a translated native image. The run-time system and the binary translator are under the control of a server process. The non-native image is executed in two different environments with first portion executed as an interpreted image and remaining portions as a translated image. The run-time system includes an interpreter which is capable of handling condition codes corresponding to the non-native architecture. A technique is also provided to jacket calls between the two execution environments and to support object based services. Preferred techniques are also provided to determine interprocedural translation units. Further, intermixed translation/optimization techniques are discussed.
Claims
What is claimed is:
1. A memory storing as binary image conversion system which converts instructions from a non-native instruction set to a native instruction set, said binary image conversion system comprising:
a server;
a run-time system transparently initiated upon invocation of an application program, said application program comprising non-native instructions, the run-time system interpreting untranslated non-native instructions of a non-native binary image corresponding to the application program by providing and executing a native instruction or a native instruction routine comprised of a plurality of native instructions, the run-time system receiving from the server and executing previously translated portions of native code, said translated portions of native code corresponding to non-native code portions of the non-native binary image, said run-time system further comprising:
a collector for collecting profile data in response to interpretation of the non-native instructions to determine execution characteristics of the non-native instructions; and
a background system invoked by the server, responsive to the profile data generated by the run-time system, for forming translated portions of native code from and corresponding to portions of the non-native instructions of the application program, wherein said background system translates only portions of the non-native instructions of the application program which have actually been interpreted in response to invoking the application program.
2. The memory of claim 1, wherein said collector for collecting profile data maintains information about a control transfer in the non-native image.
3. The memory of claim 2, wherein said run-time system further comprises:
an interpreter which examines a single non-native instruction of the application program at a time to provide and execute the native instruction or a native instruction routine comprised of a plurality of native instructions.
4. The memory of claim 3, wherein said interpreter comprises:
an examiner for examining the non-native instruction to determine resources needed by the instruction and an equivalent native instruction routine to perform the function called for by the non-native instruction.
5. The memory of claim 4 further comprising:
a generator for producing a data structure, said data structure comprising:
a store of equivalent translated native code segments; and
an address correlation table used to track said native code segments corresponding to non-native code segments, with entries in said table corresponding to addresses of entries of the equivalent translated native code segments of the store.
6. The memory of claim 5, wherein said background system translates into a native image only those non-native instructions of the application program whose execution is predictable.
7. The memory of claim 6, wherein said only those non-native instructions of the application program whose execution is predictable excludes self modifying code.
8. The memory of claim 7 further comprising:
an invoker for invoking the server to provide translated portions of the application program for those portions of the application program which have said translated portions; and
an invoker for invoking the server to obtain from the run-time system a native instruction or a native instruction routine comprised of a plurality of native instructions for those non-native instructions of the application program for which there is no translated native instruction or translated native instruction routine.
9. A code conversion system stored in memory, said memory comprising:
a run-time system initiated transparently upon invocation of an application program, said application program comprising non-native instructions, said run-time system further comprising:
means, in response to a non-native image corresponding to the application program, for providing and executing native code instructions corresponding to untranslated non-native instructions of the non-native image, and
means for collecting profile statistics characterizing targets of executed control transfer instructions;
a server means which provides, to the run-time system, translated native instructions corresponding to non-native instructions of the non-native image; and
a binary translator system invoked by the server means, said binary translator system comprising:
means, responsive to said profile statistics, for translating non-native instructions of an application program into native instructions included in a native image of the application program; and
means for storing the translated native image of the application program on a persistent storage device, wherein
said means for translating translates only portions of non-native instructions of a non-native image whose execution does not cause machine exceptions as determined by the profile statistics.
10. The code conversion system of claim 9 further comprising:
means for invoking the translating means while there is a pause the CPU utilization by the run-time system.
11. The code conversion system of claim 9 further comprising:
means for invoking the translating means after the run-time system completes execution of the program.
12. The code conversion system of claim 9 further comprising:
means for invoking the translating means during execution of the run-time system.
13. A computer system for executing a binary image conversion system which converts instructions from a non-native instruction set to a native instruction set, comprising:
a run-time system transparently initiated upon invocation of an application program, said application program comprising non-native instructions, the run-time system, in response to a non-native image having instructions of the application program, providing and executing a native instruction or a native instruction routine comprised of a plurality of native instructions, the run-time system receiving from a server and executing previously translated portions of native code corresponding to non-native code portions of the non-native binary image, said run-time system further comprising:
a collector for collecting profile data in response to execution of the native instructions and the native instruction routines to determine execution characteristics of the non-native instruction; and
a background system, invoked by said server, responsive to the profile data generated by the runtime system to form translated portions of a native image of the non-native instructions of the application program, wherein the background system translates only portions of non-native images comprising instructions which have been executed; and
said server, further comprising:
a determiner for determining whether there is any translated native image code for the application program, and
a scheduler for scheduling transactions within and between the run-time and background systems.
14. The computer system of claim 13 further comprising:
a producer for producing a data structure, said data structure comprising:
a store of equivalent translated native code segments; and
an address correlation table used to track said translated native code segments corresponding to segments of non-native code, with entries in said table corresponding to addresses of entries of the equivalent translated native code segments of the store.
15. The computer system of claim 14, wherein said collector for collecting profile data maintains information about a control transfer in the non-native image.
16. A method of converting a binary executable compiled for a first non-native architecture into a translated native image for a second native architecture comprising the steps of:
transparently executing the binary executable compiled for said first non-native architecture using an interpreter;
collecting profile data in response to the execution of the binary executable compiled for said first non-native architecture to determine various execution characteristics of instruction sequences; and
in response to the profile data generated by the interpreter, transparently forming said translated native image for at least portions of the binary executable in the non-native image in accordance with the characteristics of the profile data, and wherein, the first time the binary executable is invoked for execution, said step of executing using an interpreter is performed without performing a binary translation of any instructions included in the binary executable, and, upon subsequent invocations for execution, executing previously translated segments of native corresponding to non-native code segments, such that said step of executing using an interpreter is performed in subsequent conversions of the binary executable only for those portions of the binary executable which have not been translated.
17. The method of claim 16 further comprising the step of:
interpreting a single non-native instruction of the machine executable to provide one or more equivalent native instructions, and wherein said step of interpreting occurs each time previously unexecuted portions of the binary executable are executed.
18. The method of claim 17, wherein a server process supplies translated code to the run-time system corresponding to a portion of the binary executable; and
if there is translated code corresponding to said portion of the binary executable, the server process has the translated code executed;
if there is no translated code corresponding to said portion of the binary executable, the server process has an interpreter perform said interpreting step to convert the portion of the binary executable currently executed in the computer system while producing profile statistics which are supplied to the server process, and wherein said server process operates at invocation time of said binary executable.
19. The method of claim 18, further including the step of:
storing he profile statistics with the server process.
20. The method of claim 19, wherein said step of storing the profile statistics further includes the steps of:
merging new profile statistics with existing profile statistics producing merged profile statistics;
comparing the merged profile statistics to the existing profile statistics;
determining, in response to said comparing step, if there are differences between the merged and existing profile statistics; and
initiating a translation of the binary executable by performing said forming step if said determining step determines there are differences.
21. The method of claim 20, wherein said step of determining if there are differences between the merged and existing profile statistics is performed in accordance with a threshold value establishing a level of difference required to be determined prior to performing said initiating step.
22. The method of claim 21, wherein said level of difference is selectable.
23. The method of claim 18, further including the step of:
forming a unique key associated with the binary executable compiled for a first non-native architecture.
24. The method of claim 23, wherein said server process performs said step of determining said unique key.
25. The method of claim 23, wherein said step of forming said unique key uses attributes of the binary executable to form said unique key.
26. The method of claim 25, wherein said attributes include one of: a file size, a creation date of said binary executable, and a time-stamp.
27. The method of claim 25, wherein said unique key is used by the server process to associate the profile statistics with the binary executable.
28. The memory of claim 7, further including:
a determiner for determining if there are any translated portions of the application program;
a native loader for loading the translated portions for execution;
a non-native loader for loading other portions of the application program including non-native instructions, said non-native loader including a connecting means for connecting the translated portions to the other portions that are untranslated enabling the other portions and the translated portions to be executed in the second native computer system.
29. A binary image conversion system which converts instructions from an instruction set of a first, non native computer system to a second, different, native computer system, said binary image conversion system comprising:
a server;
a run-time system transparently initiated upon invocation of an application program comprising non-native instructions which, in response to the non-native instructions, provides a native instruction or a native instruction routine comprised of a plurality of native instructions, said run-time system further comprising:
a component for collecting profile data in response to execution of the native instructions and the native instruction routines to determine execution characteristics of the non-native instruction;
a component for querying the server to determine whether there exists any native image code for the non-native instructions of the application program; and
an interpreter which examines a single non-native instruction of the application program at a time to provide the native instruction or a native instruction routine comprised of a plurality of native instructions, said server invoking said interpreter upon making a determination while executing said application program that there is no native image code corresponding to the single non-native instruction; and
a background system invoked by the server, responsive to the profile data generated by the run-time system for forming translated portions of a native image corresponding to portions of the non-native instructions of the application program which have already been interpreted in response to invoking the application program.
30. The binary conversion system of claim 29, wherein said component for collecting profile data maintains information about a control transfer in the non-native image.
31. The binary conversion system of claim 30, wherein said interpreter comprises:
an instruction examiner for examining the non-native instruction to determine resources needed by the instruction and an equivalent native instruction routine to perform the function called for by the non-native instruction.
32. The binary conversion system of claim 31 further comprising:
a component for producing a data structure, said data structure comprising:
a store of equivalent translated native code segments; and
an address correlation table used to track said native code segments corresponding to non-native code segments with entries in said table corresponding to addresses of entries of the equivalent translated native code segments of the store.
33. The binary conversion system of claim 32, wherein said background system translates into a native image only those non-native instructions of the application program whose execution is predictable.
34. The binary conversion system of claim 33, wherein said only those non-native instructions of the application program whose execution is predictable excludes self modifying code.
35. The binary conversion system of claim 34 further comprising:
an invoker for invoking the server to provide translated portions of the application program for those portions of the application program which have said translated portions; and
an invoker for invoking the server to provide from the run-time system a native instruction or a native instruction routine comprised of a plurality of native instructions for those non-native instructions of the application program for which there is no translated native instruction or translated native instruction routine.
36. The binary conversion system of claim 34, further including:
a component for determining if there are any translated portions of the application program;
a native loader for loading the translated portions for execution;
a non-native loader for loading other portions of the application program including non-native instructions, said non-native loader including a connecting means for connecting the translated portions to the other portions that are untranslated enabling the other portions and the translated portions to be executed in the second native computer system.
37. The binary conversion system of claim 36, wherein the runtime system is implemented in software and the background system is implemented in hardware.
38. The binary conversion system of claim 29, wherein the runtime system is implemented in hardware and the background system is implemented in software.
39. The binary conversion system of claim 29, wherein the runtime system and the background system are implemented in hardware.
40. The binary conversion system of claim 29, wherein the runtime system and the background system are implemented in software.
41. The binary conversion system of claim 29, wherein the runtime system and the background system have portions which are implemented in both hardware and software.
42. A binary translator system comprising:
a runtime system transparently initiated upon invocation of an application program comprising non-native instructions, for generating profile data during execution of an application program describing the execution of the application program;
a component, invoked by a server component, responsive to profile data generated by the run time system, for transparently translating non-native instructions of the application program into native instructions included in a translated native image of the application program wherein said non-native instructions have previously been executed upon invocation of the application program and wherein said non-native instructions have not yet been translated;
a runtime interpreter for interpreting non-native instructions at runtime of said applications program; and
said server component for determining at runtime of said application program whether to invoke said component for translating or said runtime interpreter to execute a current non-native instruction of said application program.
43. The binary translator system of claim 42, wherein said run time system is implemented in software, and said component for translating non-native instructions is implemented in hardware.
44. The binary translator system of claim 42, wherein said run time system is implemented in hardware, and said component for translating non-native instructions is implemented in software.
45. The binary translator system of claim 42, wherein said run time system and said component for translating non-native instructions are implemented in hardware.
46. The binary translator system of claim 42, wherein said run time system and said component for translating non-native instructions are implemented in software.
47. The binary translator system of claim 42, wherein said run time system and said component for translating non-native instructions have portions which are implemented in both hardware and software.
48. The binary translator system of claim 42 further comprising:
said run-time system which, responsive to a non-native image, converts instructions of the non-native image into native code; and
a component for collecting run-time profile statistics characterizing targets of control transfer instructions.
49. The binary translator system of claim 48, wherein said component for translating translates only portions of non-native instructions of a non-native image whose execution does not cause machine exceptions as determined by the profile statistics.
50. The binary translator system of claim 49 further comprising:
a component for invoking said component for translating while there is a pause in CPU utilization by the run-time system.
51. The binary translator system of claim 49 further comprising:
a component for invoking said component for translating after the run-time system completes execution of the program.
52. The binary translator system of claim 49 further comprising:
a component for invoking said component for translating during execution of the program by the run-time system.
53. A binary image conversion system which converts instructions from an instruction set of a non native computer system to an instruction set of a native computer system, comprising:
a run-time system transparently initiated upon invocation of an application program comprising non-native instructions, which, in response to a non-native image having instructions of the application program, provides and executes a native instruction or a native instruction routine comprised of a plurality of native instruction, wherein the run-time system executes previously translated portions of native code corresponding to non-native code portions of the non-native binary image, said run-time system further comprising:
a component for collecting profile data in response to execution of the native instructions and the native instruction routines to determine execution characteristics of the non-native instruction;
a background system responsive to the profile data generated by the runtime system to form translated portions of a native image of the non-native instructions of the application program, wherein the background system translates only portions of non-native images comprising instructions which have been executed;
a component for invoking the background system;
a component for determining whether there is any translated native image code for the application program; and
a component for scheduling transactions within and between the run-time and background systems.
54. The binary image conversion system of claim 53, wherein the background system is implemented in software, and the run-time system is implemented in hardware.
55. The binary image conversion system of claim 53, wherein the background system is implemented in hardware, and the run-time system is implemented in software.
56. The binary image conversion system of claim 53, wherein the background system and the run-time system are implemented in hardware.
57. The binary image conversion system of claim 53, wherein the background system and the rum-time system are implemented in software.
58. The binary image conversion system of claim 53, wherein the background system and the run-time system have portions which are implemented in both hardware and software.
59. The binary image conversion system of claim 53, wherein the background system translates only portions of non-native images comprised of instructions whose execution is predictable.
60. The binary image conversion system of claim 59 further comprising:
a component for producing a data structure, said data structure comprising:
a store of equivalent translated native code segments; and
an address correlation table used to track said translated native code segments corresponding to segments of non-native code, with entries in said table corresponding to addresses of entries of the equivalent translated native code segments of the store.
61. The binary image conversion system of claim 60, wherein said means for collecting profile data maintains information about a control transfer in the non-native image.
62. The memory of claim 1 wherein the server invokes the background system in response to a comparison of recent profile statistics with previously stored profile statistics.
Description
This invention relates generally to computer systems and more particularly to execution of computer programs on non-native computer system architectures.
As is known in the art, computer systems which generally include a central processing unit, a main memory, and input-output device interconnected by a system bus are used to execute computer programs to perform some useful task. One type of computer program is an operating system which is used to interface the central processing unit to an application program. The aforementioned application program is a program used by a user of the computer system to perform the useful task. The operating system includes all of the software resources needed by the computer system to interface each of the hardware elements to the computer system as well as to interface the application program and other programs on the computer system.
The application programs which can include programs such as spreadsheets, wordprocessors, electronic mail executes on the computer system under the control of the operating system. In addition, the operating system often includes routines or libraries which the application program uses during execution.
It is generally known that application programs are written for a particular computer architecture that is, computer instruction set, as well as a particular operating system. Computer architectures are varied, but common architectures include the so-called Alpha.RTM. architecture by Digital Equipment Corporation's assignee of the present invention, the so called X86 architecture which is based upon the family of microprocessors designed and built by Intel Corp., as well as others such as the Power PC.RTM. architecture designed and built by Motorola, IBM and Apple. Other architectures include the VAX.RTM. architecture by Digital Equipment Corporation and the PARISC.RTM. architecture by Hewlett Packard.
Generally, programs written for one architecture are also written for a particular operating system which is supported on the architecture. Thus, for the aforementioned Alpha architecture, the Alpha architecture supports the Windows NT.RTM. operating system by Microsoft Corporation, the Open VMS.RTM. operating system, by Digital Equipment Corporation, and the OSF-UNIX operating system by Digital Equipment Corporation.
These application programs written for one architecture and a particular operating system can not directly execute on a different architecture and/or different operating system. In general a so-called source file (which is a non-executable listing of the program) is fed to a complier and linker to convert the source listings into executable code which a computer can execute.
New computer architectures are developed in order to provide significant performance improvements for the hardware associated with the architecture. For example, the so-called Alpha.RTM. architecture based upon the 64 bit Alpha microprocessor is a RISC type architecture (Reduced Instruction Set Computer). On such an architecture, an application program executes much faster with higher performance and thus provides to a user significant performance advantages.
One drawback to a new architecture, however, is that often application programs written for older architectures can not run directly on the new architecture. This occurs because the instruction sets of the new architecture and the old architecture are different and since different instructions are used in each architecture, the programs are not directly transferable.
While it is desirable for a user to migrate to a new higher performance architecture, one of the most significant drawbacks for a user to migrate to a new architecture is the user's pre-existing investment in software applications as well as data files resulting from application programs run on the old architecture. That is, an architecture which has been in existence for a number of years and used by a large number of users will represent a substantial cumulative financial investment. If that investment is not transferable to a new architecture users will be highly reluctant to migrate to the new architecture.
There are several approaches which have been developed over the years to assist users to migrate from an old architecture to a new architecture. One approach is the so-called "porting approach". With the porting approach a software vendor agrees with the owner of the architecture to take its source code (i.e. the non-executable listing) for a particular application program and run the source code through a compiler developed for the new architecture and to perform some sort of checking such as a quality assurance check at the end of the compilation.
Thus, with porting the source code of application is converted into an executable image which can run on the new architecture. Porting is a viable solution for many applications. In particular applications involved in the so-called enterprise computing field are excellent candidates for porting since the nature of the program and the nature of the customer are such that there is an expectation of a high level of support from the software vendor as well as from the owner of the architecture.
However, porting is not always a viable option for certain segments of the computer market. In particular, in the so-called shrinkwrap software market porting is not a viable option. In the shrinkwrap market computer programs are bought relatively inexpensively and are used in personal computers. Often in this market there is not sufficient support for the user.
A further problem with porting is that a relatively large amount of time is required to port an application program to a new architecture, and it requires positive, active intervention of a human in order to port the application program. Further, the concept of porting assumes the existence or availability of source code which is not always the case.
Another technique commonly employed is so-called "on-line interpretation". In on-line interpretation, a software module called an "interpreter" interprets instructions from an executable version of the application program written in the non-native instruction set and chooses instructions or routines to execute in the new architecture. The interpreter is an emulator which responsive to a executable file corresponding to the old architecture provides a converted executable file that is executed on the new architecture.
The interpreter for each instruction tests the instruction to determine the resources needed by the instruction and analyzes the instruction to determine the function of the instruction. From this testing the interpreter maps the instruction to a routine that performs the same function only written in instructions executable on the new architecture. After the routine is identified the routine is executed in the computer system to provide the equivalent function called for in the application program written in the non-native instruction set. After execution a new instruction in the non-native architecture is provided to the interpreter.
Although interpreters are useful to convert application programs between architectures at run time, one of the significant drawbacks of interpreters is that they are exceedingly slow. Thus, performance advantages of the faster, high-performance architectures are lost with using an interpreter to convert instructions from an old architecture to a new architecture.
On-line translators have been used to aleviate this problem somewhat. In a on-line translator, small amounts of code that appears to the run-time system to be repetitive, is translated to improve the efficiency of the interpretation process. Generally a small buffer of the translated code is maintained and is used to execute until it is determined to translated code which has a higher usage rate. In that case the old translated code is overwritten. There are several problems with this approach. One problem is that there are very limited opportunities for optimization since the translation process can not hold up the interpretation process. A second problem is that translated code is saved only for short periods of time. Therefore repeated translations of the same code must be endured while the results of code are ultimately wasted.
A second technique commonly used is a so-called "static translator". In a static translator, a software program called a translator receives an instruction sequence in the old architecture, and for each instruction sequence provides an instruction sequence in the new architecture to accomplish what the original instruction does in the old architecture. While a translator is generally faster than an interpreter, one problem with a translator is that often times non-native statements in the executable image are translated assuming that the statements were instructions when in fact the statements were not instructions but rather data or some other non-instruction information or heuristics.
This drawback may not cause problems in the execution of an application program in translated code since the translated non-insruction is never reached. Nevertheless, the use of this type of a translator increases the size of the memory file required to store the translated copy since it produces a translation of the entire file including non-code portions. Furthermore, the noncode translation minimizes the opportunities to run the translated code though an optimizer.
An optimizer is a compiler type of routine which operates on executable code to determine opportunities for reordering and otherwise, optimizing the executable code to provide improved performance. Optimizers are not generally useful with such online translators because the online translator invariably translates non-instruction code into instructions in the new architecture. Thus, any optimization which takes into consideration translated non instructions will most likely provide an optimized executable image which would have significant and fatal flaws.
Another problem with static translation is that it requires direct human intervention to invoke the translation process. While this may not be a concern for programs provided to the so-called enterprise market where there is a lot of system support for the user from the independent software vendor as well as the owner of the target architecture, this is a major concern for programs provided as so-called shrinkwrap where the user typically has little or no support.
In the shrinkwrap market, the typical user desires to load the application program and immediately begins execution of the program as desired without the need for intervention with a translation or interpretation process. Thus, the online interpreter and the static translator are not adequate solutions for converting programs from one architecture to another particularity for the so-called shrinkwrap market.
SUMMARY OF THE INVENTION
In accordance with the present invention, a memory storing a binary image conversion system which converts instructions from an instruction set of a first, non native computer system to a second, different, native computer system, said binary image conversion system includes a run-time system which in response to non-native instructions of an application program written for a non-native instruction set provides an native instruction or a native instruction routine comprised of a plurality of native instructions. The run-time system also includes means for collecting profile data in response to execution of the native instructions and the native instruction routines to determine execution characteristics of the non-native instruction. The binary conversion system also includes a background system responsive to the profile data generated by the run-time system to form a translated native image of the non-native instructions of the application program. With such an arrangement, a system is provided to convert non-native images of programs into images that can be executed on a computer system. This permits relatively easy migration to new architectures. The run-time system by collecting data corresponding to execution of the non-native image in the runtime system permits the behavior of the image to be observed through the profile statistics. Also with the profile statistics a process can determine which code to translated in the background system. This eliminates the possibility that non-instruction code is mistaken for instructions by the background system. Therefore, the translated code can be optimized without fear of including non-instructions as part of the optimized code. This provides significant opportunities to use new and more powerful optimization techniques.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing features and other aspects of the invention will now become apparent when the accompanying description is read in conjunction with the following drawings in which:
FIG. 1 is a block diagram of a computer system;
FIG. 2 is a block diagram of a dual stage instruction conversion system including a run-time system and a background system;
FIG. 3 is a block diagram of the run-time system portion of the instruction conversion system of FIG. 2;
FIG. 3A is a flow chart depicting the steps performed at run-time to execute a non-native image on the system of FIG. 1;
FIG. 4 is a more detailed block diagram of a binary translator used in the background system portion of the conversion system of FIG. 2;
FIG. 5 is a block diagram of a data structure representing a profile record structure;
FIG. 6 is a block diagram of a representative profile record of the profile record structure of FIG. 5;
FIG. 7 a diagram showing a typical arrangement for a instruction for a complex instruction set computer (CISC);
FIG. 8 is block diagram of a register file in the computer system of FIG. 1 showing assignment of registers corresponding to the non-native architecture;
FIG. 9 is a diagram showing a typical construct for one of the registers in the register file of FIG. 8
FIG. 10 is a pictorial representation of connections of various data structures including a dispatch table to determine an equivalent routine for the interpreter;
FIG. 11 is a pictorial representation of the process for activating an alternate dispatch table;
FIG. 12 is a diagram showing an arrangement of an entry from the dispatch table of FIG. 10;
FIG. 13 is diagram showing a typical arrangement of condition codes of a CISC architecture which implements condition codes;
FIG. 14 is a block diagram of an arrangement to determine evaluation routines for condition codes;
FIG. 15 is a block diagram of an arrangement to determine evaluation routines for current and previous values of condition codes;
FIGS. 16-18 are a series of diagrams useful in understanding how condition codes are handled in the interpreter of FIG. 4;
FIGS. 19 and 20 are diagrams showing relationship between address spaces;
FIG. 21 is a diagram of a context data structure used in the interpreter of FIG. 4;
FIG. 22 is a block diagram of a pair of data structures stored in memory which represents a return address stack for a non-native image of a program as well as shadow stack for a native image of the program;
FIG. 23 is a diagram showing the relationship between the data structures of FIG. 22 and execution of non-native and native routines with calls into corresponding non-native and native routines;
FIG. 24 is a diagram of a data structure including translated routines and call address translation table;
FIG. 25 is a diagram depicting the relationship of the routine call tables in the translated image and the shadow stack to the on-line and background systems;
FIG. 26 is a flow diagram of a typical application program instruction sequence used to illustrate aspects of the invention;
FIG. 27 is a block diagram showing an example of an object;
FIG. 28 is a block diagram showing an example of cross process calling of object methods;
FIG. 29 is a block diagram showing an example of an interface structure;
FIG. 30 is a flow chart showing an example of steps leading to the use of an object in an object oriented service system;
FIG. 31 is a flow chart showing steps in an example embodiment of a method for intercepting functions to perform interface structure replacement;
FIG. 32 is a flow chart showing an example replacement interface structure;
FIG. 33 shows an example embodiment of a template for a jacket function;
FIG. 34 is a flow chart showing steps performed in an example embodiment for a PBJA jacket function when called from non-native code;
FIG. 35 is a flow chart showing steps performed by an example embodiment of a PBJA jacket function when called from native code;
FIG. 36 is a flow chart showing steps performed by an example embodiment of a PAJB jacket function when called from native code;
FIG. 37 is a flow chart showing steps performed by an example embodiment of a PAJB jacket function when called from non-native code;
FIG. 38 is a block diagram showing an example of a system for load time processing to support interception of functions which take a pointer to an object as a parameter;
FIG. 39 is a flow chart showing an example of steps performed at run time to support interception of functions which take a pointer to an object as a parameter;
FIG. 40 is a flow chart showing an example embodiment of steps performed during general function jacketing;
FIG. 41 is a flow chart showing steps to determine and use translation units when performing a binary translation;
FIG. 41A is a flow chart showing steps to form translation units of a non-native binary image;
FIG. 42 is a flow chart showing steps of flow path determination;
FIG. 42A is a flow chart showing steps to determine transfer of control target locations for an indirect transfer instruction;
FIG. 43 is a block diagram showing two types of entries included in the profile statistics;
FIG. 44 is a flow chart showing steps for determining regions;
FIG. 45 is a block diagram of a list of code cells;
FIG. 46 is a diagram which shows the relationship between FIGS. 47 and 48;
FIGS. 47 and 48 are block diagrams which illustrate an arrangement of local data flow analysis information;
FIG. 49 is a block diagram of an opcode table;
FIG. 50 is a block diagram of a data flow analysis arrangement illustrating the use of read-modify and modify-write fields of the basic block value (BBV) data structure of FIG. 47;
FIG. 51 is a block diagram which depicts the BBSC summary information field of FIG. 48;
FIG. 52 is a block diagram of an arrangement comprising global data flow analysis information;
FIG. 53 is a more detailed block diagram of the global data flow connections of FIG. 52;
FIG. 54 is a block diagram of the control flow edge (CFE) data structure;
FIG. 55 is a flowchart that sets forth steps of performing a global data flow analysis;
FIGS. 56A and 56B are flowcharts that set forth method steps for determining merge points during global data flow analysis;
FIG. 57 is a block diagram of a global data flow analysis arrangement illustrating a merge point.
FIGS. 58A-58D are block diagrams depicting different variations of the binary image transformer;
FIG. 59 is a flow chart of steps of translating the binary image;
FIG. 60 is a flow chart of the step for one method for selecting the translation unit to be processed;
FIG. 60A is a representation of a call graph used in the method steps of FIG. 60;
FIG. 61 is a flow chart depicting an alternative method for selecting a translation unit to be processed;
FIG. 62A is a flow chart listing steps for forming an initial intermediate representation (IR) of a binary image;
FIG. 62B is a block diagram of a data structure illustrating a transformation of a source instruction to an IR with memory operands removed;
FIG. 62C is a block diagram of a data structure used to indicate whether an IR instruction corresponds to a machine instruction which can generate an exception;
FIG. 63 is a flow chart showing steps for translating and optimizing an initial IR to produce the final IR for a given translation unit;
FIG. 64 is a flow chart showing steps for performing condition code processing;
FIG. 65A is a block diagram of a bit mask associated with an IR instruction code cell used to represent condition codes that can be affected by the corresponding IR instruction code cell;
FIG. 65B is a block diagram which depicts an example transformation from source instructions comprising the first binary image as affected by condition code processing;
FIG. 66 is a flow chart depicting steps for register processing;
FIG. 67A is a block diagram which depicts a 32 bit register in an architecture which has partial register operands;
FIG. 67B is a block diagram which depicts a transformation of an initial IR as a result of register processing;
FIG. 68A is a block diagram which depicts a code pattern which is detected by early floating point optimization processing;
FIG. 68B is a block diagram which is a table indicating a replacement instruction for a specific code pattern detected in early floating point optimization processing;
FIG. 69 is a flow chart depicting steps for local basic block and global routine optimization processing;
FIG. 70 is a flow chart depicting steps of code selection and operand processing which place the IR in final form;
FIG. 70A is a flow chart depicting steps of intra image call processing;
FIG. 71A is a block diagram depicting a translated image comprising tables used in exception handling;
FIG. 71B is a block diagram depicting a table entry in a translator exception table; and
FIG. 71C is a block diagram depicting run time transfer of control when a translated image is executed and an exception occurs.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Computer System
Referring now to FIG. 1, a computer system 10, is shown to include a processor module 12 which has a high performance processor 12a. The computer system 10 further includes, in addition to the processor module 12, a main memory 14, an disk adaptor 15 and an I/O user interface 18, as well as a monitor 19 all coupled by a system bus 13, as shown. Here the processor 12a is a high performance microprocessor such as an Alpha.RTM. microprocessor manufactured by Digital Equipment Corporation, assignee of the present invention, or other high performance processor.
The main memory 14 is comprised of dynamic random access memory and is used to store instructions and data for use by the microprocessor 12a on the processor module 12. The disk adaptor 15 is used to couple the system bus 20 to a disk bus which itself is coupled to disk storage device 17.
The disk storage device 17 is here illustratively partitioned into a plurality of segments or blocks of data which are here represented for convenience as being self-contained and contiguous, but which may be scattered across the disk 17 and be non-contiguous. The disk 17 includes a first storage segment 17a storing an operating system for the computer system 10 as well as an application program stored in segment 17b.
The application program stored in segment 17b is a non-native executable image. That is, the application program is comprised of instructions from a different instruction set than that used in the computer system 10 (i.e. a different computer architecture). Also the application program could have been written for a different operating system than that stored in 17a. Since the instructions provided in the program stored in segment 17b are different from the instruction set executed on the microprocessor 12a the program in segment 17b can not be directly executed on the system 10.
The disk also includes a storage segment 17c which here represents an native executable image of the application program stored in segment 17b. This native image is generated in the computer system via an instruction conversion system (16, FIG. 2) which is here stored with the operating system in the segment 17a as will be described. The image stored in segment 17c corresponds to instructions which can be executed on the microprocessor 12a and thus conforms to the architecture of the computer system 10.
Also stored in a segment 17d are profile statistics which are collected during execution of a portion of the non-native application program stored in 17b. The profile statistics are provided by execution of a run-time routine which converts non-native instructions into native instructions. These profile statistics are used in a background process to convert portions of the non-native image into a native image corresponding to the operation and function of those portions of the non-native application program. In addition, data which are used for the particular application program are also be stored on the disk in segment 17e.
The computer system 10 further includes an I/O user interface 18 which is here an interface used to couple a mouse 19a, for example, to the system bus 20 as well as a monitor 19.
The computer system 10 operates in a generally conventional manner. That is, at "power on", selected portions (not numbered) of the operating system stored in segment 17a are loaded into main memory 14 and occupy a particular address space in main memory 14, such as, address space 14a. As a user of the computer system 10 executes application program on the system 10, the application programs are run under the control of the operating system.
A typical operating system represented by that stored in 17a is the so-called Windows NT.RTM. operating system of Microsoft Corporation Remond, Wash. In Windows NT.RTM. or other window type operating systems, displayable images called "icons" are presented to a user on the monitor 19. These icons represent an executable command to initiate execution of a program. When pointed to by a cursor controlled by a mouse, for example, and clicked on this user action activates the command and causes the represented computer program to execute.
Here, however, the application program stored in segment 17b is written in a non native instruction set. That is, the instruction set of the application program is not the same as the instruction set of the computer system 10. Thus, the executable image of the application program stored in segment 17b is comprised of non-native instructions which can not be directly executed on the computer system 10. Nevertheless, the non-native application has a corresponding icon (not shown) which is represented in the window provided by the operating system.
Each non-native application image has a unique identification name (ID) or image key. The identification name or image key is included in the non-native image file and is unique identifier for the non-native application image. During installation of the file containing the image, typically a server process portion of the operating system determines the unique ID or key to the non-native application image. The ID number is generally assigned by concatenating together unique information of the file. Examples of the types of information include, the time stamp of the file, the file name, the file size and the date that the file was originally produced. Thus, the same non-native image if loaded a multiplicity of times on the computer system will have the same I.D. number. The statistics as well as the translated code associated with each one of the non-native images will be the union of all prior executions of the non-native images for each instance of the non-native application. Other arrangements are of course possible.
When the user clicks on the icon for the program stored in 17b, a portion of the operating system recognizes the ID of the executable image represented by that icon as being comprised of instructions that are non-native to the instruction set and architecture of computer system 10. In general, a software module called a loader in the operating system will recognize the identification name (ID) of the file represented by the selected icon as being non-native to the architecture. Thus, the operating system initiates the execution of an instruction conversion program 16 or feeds the file instruction by instruction to an instruction pre-processor. Alternatively, a loader can be provided which handles the non-native image by examining the image to determine all files, libraries and resources needed by the image. The loader will thus prepare the non-native image for execution. Part of the preparation is the initiation of the instruction conversion program 16 or alternatively the instruction pre-processor, as will now be described.
Binary Image Conversion System
Referring now to FIG. 2, the binary image conversion system 16 is shown to include a run-time system 32 which is responsive to instructions provided from the disk segment 17b. As mentioned, the run-time system 32 can be implemented as software to emulate the non-native architecture or as a hardware preprocessor to convert the non-native instructions into native instructions. When implemented as software, the run time system 32 consumes more disk space on disk 17 and occupies more main memory storage in may memory 14. Whereas, when implemented in hardware, the run time system 32 requires more chip space in the high performance microprocessor 12a. Here the run-time system will be described as a software implementation which operates in an execution address space 20 of the computer system 10.
As mentioned above, disk segment 17b stores instructions of an application program complied and/or written for an instruction set which is different from the instruction set of system 10. The run-time system 32 receives porrtions of a non-native executable image from segment 17b comprised of the non-native instructions. The run-time system 32 provides a native instruction or a native instruction routine comprised of a plurality of instructions which are executed by the computer system 10 to provide the same functionality as the non-native image. That is, the functionality called for in the instruction in the executable image of the non-native instruction set is equivalently provided by the routines determined by the run-time system 32. The run-time system executes the equivalent routines on the computer system 10. This provides the equivalent function to provide the same result in computer system 10 which implements the new architecture as would occur in a new or old computer system (not shown) implementing the non-native architecture.
In a preferred embodiment of the run time system 32, the run-time system 32 examines and tests the code from the segment 17b to determine what resources are used by the instruction and the function of the instruction. The run-time system 32 provides the equivalent instructions corresponding to the architecture of the computer system 10.
As the equivalent instructions are determined they are executed in the system 10 and profile data or statistics, as will be described, are collected in response to their execution. The profile statistics describe various execution characteristics of the instruction sequence. These profile data are fed to a sever process 36 via a datapath 32b.
Prior to performing a conversion by the run time system 32, the run-time system 32 interrogates the server process 36 via a path 32a to determine from the server process whether there is a native image corresponding to the routine of the application program stored in segment 17b whose execution has just been requested by a user. If a native image does not exist (as would occur the first time the non-native image is executed), the run-time system initiates an interpretation process. If there is code in existence for the particular instruction reached in the application program, due to a prior execution in the run-time system and subsequent conversion by a background system, the run-time system 32 will request and execute the native code.
As mentioned, in general, the first time the application program 17b is executed by a user there will be no native image code in existence. As the program executes, however, native code will be generated by the background process in a manner to be described, and over time as substantial portions of the non-native image are executed, convertible portions of the non-native image will be converted by the background process into native image code. As native image code is generated, it is also stored in segment 17c in a manner that is transparent to the user.
In addition the native image file 17c contains an address correlation table which is used to track the segments of native code corresponding to segments of non-native code. This table is used at run time of the program in segment 17b to determine whether and which non-native segments have equivalent translated native segments.
Translation into the native image is provided via a background system 34 which operates in one embodiment after the interpreter has finished execution of the instructions to provide translated code dependant upon the execution characteristics of the run-time converted instructions. Alternatively, the background system operates while there is a pause in CPU utilization by the run-time system 32. Alternatively, the background system can make translated code available to the run-time system 32 during execution to permit substitution of translated code for a subsequent occurrence of the non-native image during the current execution of the application program. Further still, the run-time system can be implemented as a preprocessor which provides the profile statistics for use by the background process. The background process can be implemented in hardware or software or a combination of both.
The background system 34 receives the profile data generated by the run-time system 32. In accordance with the characteristics of the profile data, the background system 34 forms a native image of at least portions of the instructions of the non-native image stored in segment 17b of disk 17. A preferred arrangement is to have the background system implemented as a binary translator to produce translated code. The native image portions are stored in logical disk drive 17' for use if needed in subsequent executions of the application program from segment 17b. Here it should be understood that the logical disk drive 17' is a logical partition of the disk drive 17 and is here referred to as being a logical disk drive, because in general, it is transparent to the user, but it physically represents space storage such as segment 17c on the actual disk drive 17. Alternatively, the logical disk drive 17 could be a separate disk drive.
The run-time system 32 and the background system are each under the control of the server process 36. The server process 36 is present throughout the operation of the computer system 10. The server process 36 is a software service process which, amongst other things, is used to schedule various transactions within and between the run-time 32 and background systems 34.
After generation of native image code such as by the binary translator, the image translated code is stored on logical disk drive 17' in logical segment 17d with the profile statistics being stored in logical segment 17d. These locations correspond to segments 17c and 17d in FIG. 2.
Each time there is a new execution of the application program stored in segment 17b, the run-time system will send a request to the server process 36 for native code corresponding to the non-native code currently in the run-time system 32. The translated code is code which was generated by a previous execution of the background system 34 in accordance with the profile statistics collected by execution of the routines furnished by the run-time system 32. The server process 36 supplies corresponding translator code (if any) to the run-time system 32. If there is translated code, the run-time system 32 will have the translated code execute in place of interpreting the code. Otherwise if there is no translated code, the run-time system 32 will interpret, translate, or otherwise converted the relevant portions of the non-native code currently executed in the computer system 10.
As more code of the program stored in segment 17b is executed, more sections of the program are interpreted producing as a result of the execution, profile statistics which are fed to the server process 36.
The server process 36 controls inter alia the storage of the profile statistics. That is, the server process 36 will merge new (raw) statistics with previously stored merged statistics to provide a new merged profile. The server process will compare the new merged profile with the stored merged profile and will initiate a translation process in the background system 34 when there is a difference between the two statistics. The degree of difference needed to initiate execution is selectable. Such a difference indicates that heretofore never executed code was interpreted and executed in the run-time system. This process will be ongoing until all portions of the non-native image have been encountered by the user and all of the portions which can be translated by the background system 34 have been translated.
The server process also determines the unique key or I.D. number to uniquely identify the non-native image stored in segment 17b. As mentioned above, the attributes of the image comprising the I.D. include the file size, the date of creation of the image, the time stamp and so forth. This key is also used to identify the profile statistics with the non-native program.
The background system 34 will, in general, translate nearly all instructions provided from the non-native applications stored in 17b. Certain types of instructions are preferably not translated. In general those instructions which are not translated are ones in which the execution of the instruction is not predictable. For example, instructions which are self modifying (i.e. are not in read only sections, that is, are on a writtable page) will not be translated. For these instructions the run-time system will execute them via the interpretation routines. Further, instructions for which in the non-native architecture there is no easily produced analog in the native architecture will not be translated. For example, in the X86 architecture of Intel, floating point instructions use a floating point control register to determine inter. alia. rounding modes etc. Although for many executions of the instructions the contents of the register may be in a normal state, this can not be guaranteed. Rather than have the translator determine the state it is more economical to handle these instructions.
Since execution or profile statistics in part determines what code is translated by the background translator, non-instruction code is not mistaken for instructions by the translator. Therefore, the translated code can be optimized without fear of optimizing non-instructions.
Referring now to FIG. 3, the run-time system 32 is shown to include an execution address space containing run-time system 32 which includes a run-time interpreter 44, a non-native loader 42 which is fed the ID corresponding to the non-native application image provided from segment 17b of the disk 17, a native image loader 43, native operating system dll's (dynamic link libraries) 45 and a return address stock management arrangement 210. The non-native loader 42 is similar to the native image loader 43 except it is capable of handling non-native images and interrogates the server process to determine whether there is any native code corresponding to the non-native code awaiting execution. The non-native loader 42 receives instructions corresponding to a non-native image of the application segment 46a and a native image of the application 46b corresponding to translated instructions provided from the background translator 34, and segment 46c corresponding to data. The non-native loader 42 is used to initially load the non-native file. The native loader 43 is used to initially load the native file if any.
Referring now also to FIG. 3A, at the initiation of an execution of the program stored in segment 17b, (via selection of the appropriate icon) (step 50a) the native loader 43 determines whether an architecture number associated with the non-native image indicates a native or a non-native image. If the image is a native image execution continues as normal. If however the image is a non-native image, the native loader 43 calls the non-native loader 42 at step 50b. The non-native loader 42 loads the non-native image at step 50c and also recognizes that this architecture number associated with the program represents an application program written for a non-native instruction set. The non-native loader starts the code conversion program 16. The non-native loader 42 initially queries the server 36 at step 50d to respond with native code to accelerate execution of the image represented by the code stored in 17b. It should be appreciated that the function of the native loader 43 and the non-native loader 42 can be combined into a single loader.
If this is the first time running the application, the server 36 responds at step 50e by indicating that there is no corresponding native image to execute in place of the non-native image. Therefore, the non-native loader 42 instructs the interpreter 44 to begin an interpretation at step 50f of the instructions from the non-native image. The interpreter 44, for each instruction, determines the length or number of bytes comprising the instruction, identifies the opcode portion of the instruction, and determines the resources needed by the instruction. The interpreter maps the non-native instruction to a native instruction or a native sequence of instructions based upon inter alia the opcode. These instructions are executed by the computer system 10 in the address space 20. The run-time interpreter 44 collects data resulting from the execution of the instructions as will be described in conjunction with FIG. 6. These "profile statistics" are stored by the server 36 on the logical disk drive 17'.
The run-time interpreter 44 examines and analyzes the instructions to determine the proper native instruction sequence to replace for the non-native instructions provided from the executable image 46a. These native instructions as they are executed continue to generate profile statistics which are collected and stored in logical disk drive storage 17d. This process continues until execution of the program 17b is terminated by the user.
After termination of the execution of the non-native program, a background process 34 is initiated (not shown). Alternatively, the background process 34 could be initiated to steal execution cycles from the run-time process 32 or alternatively could be used to substitute into the run-time process translated native image code for routines which are subsequently called during execution of the program 17b, as explained above. The exact sequence of which the background processor is used in conjunction with the run-time processor is an implementation detail.
For subsequent executions of the program the interpreter 44 will only provide interpreter code if the server process 36 does not return a native image equivalent of the sequence which is provided from the background process 34 as will be described.
Thus, if at step 50e the server responds with native code, the native image loader 42 at step 50g loads the native code. After the native image code is loaded, the non-native image loader 42 is called at step 50h to fix up the image. In general the non-native image will provide address tables corresponding to inter alia variables in the non-native image which are needed in the execution of the native image. That is, at step 50h the native and non-native images are stitched together to enable the native image to use information in the non-native image. At step 50i the native code is executed. In general, the native code that is executed corresponds to one or more basic blocks or routines of instruction which terminate by a return statement. After execution, a determination is made based upon characteristics of the return instruction execution and by use of a shadow stack as will be described, whether native image code can continue to be executed. If not then control is transferred to the interpreter. The interpreter continues to interpret and execute until it determines as at step 50k that it can resume using native code.
As also shown in FIG. 3, a jacketing routine 48 is used to jacket functions leaving the execution address space 20 to the native execution space of the computer process of computer system 10 as well as those arising from the native execution space of the computer processor 10 into the execution address space 20 as will be further described in conjunction with FIGS. 27-40.
Referring now to FIG. 4, a preferred embodiment of the background system 34 is shown under the control of the server 36 (FIG. 2). The server 36 determines, responsive to the profile statistics data provided from the server 36, via logical disk drive 17', whether to initiate a translation process in the background. Preferably, the background system 34 translates only portions of the non-native instructions of the application program which were actually executed (via the interpreter 32) in responsive to a session invoking the program.
The non-native image code is examined at 52 in the server and if the code is the type that should be translated, it is fed to the translator 54. In a preferred environment, the translated code 54 is also fed to an optimizer 58, and again, if the type of code is of a type which can be optimized, it is fed through to the optimizer 58 or else, the process exits or terminates to await the submission of the new code from 17b. Other, techniques for performing translation and translation/optimization will be described. After the translator process 54 and/or the optimization process 58, either translated code is stored in segment 17b' or optimized translated code is stored in segment 17b'.
Profile File Data Structure
Referring now to FIG. 5, a profile file data structure 60 used to store information gathered at execution time by instructions in the interpreter 34 is shown. The data structure 60 has records which contain information about the execution of a non-native architecture program when the program executes control transfer instructions. The profile record can include other information. That is, the profile records contain information about a target address encountered in the non-native image.
The data structure 60 is shown to include two principal sections. The first section is a profile header section 62 which comprises an image key field 62a. The image key field 62a is used to store information regarding the ID or identification of the profile record. The information in this field 62a is used to associate the profile statistics with a corresponding non-native image and its associated translated code, if any. Thus, the image key field 62a corresponds to the image ID or key field as mentioned above. The profile header 62 also includes a version field 63 comprised of a major version field 63a and a minor version field 63b". The major version field 63a and minor version field 63a" are each here 16 bit or 2 bytes in length and their union provides a resulting 32 bit version field 63b. The version fields are used to keep track of which version of the interpreter was used to generate the profile statistics in the table and the profile file format.
The profile file 60 also includes a plurality of raw profile records, here 64.sub.a -64n. Each of the profile records 64.sub.a -64n maintains information about run-time execution of control transfer instructions in the non-native image. Each of these records are variable length records as is each of the unique profile files 60. Thus, for each control transfer encountered during execution of the non-native image in the interpreter 34 a raw profile record is produced. The interpreter 34 will place into the raw profile record information regarding the execution of the control transfer instruction. The information which is included in the raw profile record is as described below. Suffice it here to say, however, that the raw profile records are used by the server process to provide a profile record which is then used during translation of the associated routines in the background system.
Referring now to FIG. 6, an exemplary one of the raw profile records here 64.sub.n is shown. The raw profile record 64.sub.n includes a profile record structure 66 including an address field 66a, a flag field 66b and a count field which tracks the number of indirect targets of control transfer 66c. The address field 66a contains the actual target address in the non-native image, as determined by the interpreter 44. This address is the actual target address of the instruction that caused a control transfer during execution of the non-native image. The address field 66a is generally the address length of the non-native architecture or here 32 bits or 4 bytes long. The flags field 66b contains the states of the flags at the target address. The flags field 66b is here 2 bytes or 16 bits long. The n.sub.-- direct field 66c is a counter field which keeps track of the number of indirect target or computed target addresses contained in the remainder of the profile record 64.sub.n as will be described below.
There are additional optional fields 70 which comprise the record. One field is a count field 70a which corresponds to either the number of times a control transfer occurred to the address contained in field 66a or a count branch taken field counter which keeps track of the number of times a branch was taken by the instruction corresponding to the address contained in field 66a. Fields 70b.sub.0 -70b.sub.n correspond to addresses which are the targets of the control transfer and are cumulatively maintained in the profile record structure.
The optional fields 70 are used to keep track or maintain a count of the targets of the control transfer instruction in the image. The count field 70a is either a control transfer field count of the number of times control was transferred to the target address or a branch taken field corresponding to the count of the number of times a conditional control transfer of a branch instruction was taken. The type of field 70a is determined by the flags field 66b being "ANDED" or masked with a value which tests the state of the associated flag. This test determines whether the target address was a result of control transfer instruction or a branch instruction. This optional field is also a long word.
The target of control transfer fields 70b.sub.1 -70b.sub.n are the target addresses of the control transfer which occurred at the control transfer instruction. These fields keep track of the addresses for indirect transfers, that is, transfers to a run-time computed target address.
The profile statistics are managed by the server process 36. The profile statistics are collected by the interpreter 44 during the course of execution of the emulated code. For each execution the server 36 searches for a profile record corresponding the target address. The server 36 merges the new run-time statistics with the existing statistics to produce a new profile file.
The server 36 makes use of a software cache and hash table (not shown) to keep track of the profile records. For an address which need to be looked up the address is looked up in the cache in 4 different locations that is by using a four way associative cache (not shown). If the address is not there it is looked up in a conventional hash table. The information in the hash table is the count values for the fields.
Run-Time Interpreter
Details of an interpreter used to convert non-native instructions to native instructions and provide profile or run-time statistics will now be described. In particular, how the interpreter 44 interprets instructions of the so-called X86 architecture by Intel Corporation, San Francisco, Calif. into ALPHA instructions by Digital Equipment Corp. will be described.
Referring now to FIG. 7, and X86 instruction 100 is shown to include as many as six different fields. These fields are an opcode 100a, an rm byte 100b, a scaled index and base (sib) byte 100c, a displacement 100d, any immediate data 100e, and any one of six types of prefixes 100f.
The opcode 100a defines the task or operation which will be executed by the instruction 100. The rm byte 100b is an effective address specification and is used in conjunction with the opcode 100a to specify a general operand as a memory or register location and, in some cases, also participates in defining the operation. The sib byte 100c is used in conjunction with the rm byte 100b to provide additional flexibility in addressing memory locations. The displacement field 100d provides a displacement from the base register or from virtual zero of a segment. The immediate data field 100e provides immediate data to the opcode 100a.
The prefixes 100f are located before the opcode 100a in the instruction 100. Possible prefixes 100f are a segment override which implements a second (or multiple) addressing space, a repeat specifier value to repeat a specific instruction n times, a lock assertion for synchronization in multiple CPU environments, an address size prefix which selects between 16 and 32 bit addressing, an operand size prefix which selects between 16 and 32 bit operands, and an opcode prefix which selects an alternative opcode set.
From the opcode 100a it can be determined whether an rm byte 100b, an unconditional displacement, or the immediate data field is provided in the instruction 100. It can be determined from the rm byte 100b whether a sib byte 100c and/or a conditional displacement field 100d is included in the instruction 100. As all fields are not required by each Intel instruction 100, Intel instructions are not of a fixed length, but rather are of varying lengths.
The run-time interpreter 44 (FIG. 3) is, in the preferred embodiment, implemented on a computer system 10 (FIG. 1) which conforms to the Alpha architecture. An Alpha architecture computer system operates using the Alpha instruction set which is comprised of fixed length instructions. The run-time interpreter 44 operates on a single Intel instruction at a time. For each Intel instruction a single Alpha instruction or multiple Alpha instructions forming a corresponding Alpha routine, is provided which is an operational equivalent to the Intel instruction.
To transparently emulate the execution of an Intel or other non-native instruction 100 the on-line interpreter 44 should be capable of emulating the operation of the Intel or non-native memory, registers, condition codes and a program counter which, on a 32 bit Intel machine is referred to as an extended instruction pointer, EIP. In this way, a result of the execution of the instruction 100 is recorded accurately.
The on-line interpreter 44 used the same memory space for data while executing Alpha routines corresponding to Intel instructions as is used when executing native Alpha instructions. This is possible because the strict standards to which Win32 software applications adhere allow for differences in calling conventions but not in the representation of the data. The maintenance of the Intel registers, condition codes and EIP are discussed below.
Referring now to FIG. 8, a table 101 depicting Intel or non-native values assigned to the registers of computer system 10 is shown to include either registers which are assigned to emulate the operation of the eight Intel integer registers, EAX 104a, EBX 104b, ECX 104c, EDX 104d, EDI 104e, ESI 104f, EBP 104g, and ESP 104h. A single register, CONTEXT 105, is assigned to serve as a pointer to the emulator state context maintained in memory which is used to manage each thread executing in a multitasking environment. An additional register, FSP 106, stores a floating point stack pointer for addressing an eight entry stack of floating operands.
Three registers CCR 107a, CCS 107b, and CCD 107c are assigned to store information which allow condition code bits to be maintained in an unevaluated state by the on-line interpreter 44. The SHADOW 108 register provides a pointer to the shadow stack (as will be described) which maintains activation records for translated code. The SEGOFF 109 register maintains an offset from address zero in the native architecture memory permitting the native architecture to emulate multiple addressing spaces which are possible in the Intel architecture and other non-native architectures. Four additional registers T0 110a, T1 110b, T2 110c and T3 110d are assigned as temporary registers.
The frame 112 register identifies the activation record at the most recent activation of the run-time interpreter 44. The Emulator's Return Address, ERA 114, register stores the return address when the run-time interpreter 44 calls a private sub-routine. The Effective Address, EA 116, register stores the result of evaluating an RM byte 100b and to specify a memory address to a memory access routine.
Seven of the remaining registers, NXTEIP 118a, NXTQ.sub.-- LO 118b, NXTQ.sub.-- HI 118c, NXTJMP 118d, Q0 118e, Q1 118f and QUAD 120 retain values which are used by the on-line interpreter 44 to identify a complete Intel instruction 100 from the instruction stream and to provide pipelining capabilities.
To identify an Intel instruction 100, the run-time interpreter 44 assembles an eight byte (64 bit) snapshot of the instruction stream beginning at the start of the current Intel instruction number. This quadword is retained in QUAD 120.
To assemble QUAD 120, the run-time interpreter 44 captures two quadwords of information from the instruction stream. The run-time interpreter 44 uses the address in the instruction stream identified by the next extended instruction pointer, NXTEIP 118a, as the starting address for the first quadword. NXTEIP 118a identifies a random byte in the instruction stream at which the next instruction to be executed begins. Here, computer system 10 (FIG. 1) requires a quadword aligned address for this initial capture. Accordingly, if NXTEIP 118a is not a quadword aligned address, the three low order bits are first zeroed thus forcing the capture to occur beginning at a quadword boundary. The quadword captures beginning at this quadword aligned address is stored in register Q0 118e. By executing the capture in this manner, the quadword stored in register Q0 118e will at least provide the low byte of the next instruction.
The second quadword capture occurs at an address identified by NXTEIP 118a incremented by seven bytes. Here again, computer system 10 requires a quadword aligned address for this second capture. If the address identified by NXTEIP 118a incremented by seven bytes is not quadword aligned, the run-time interpreter 44 forces the three low order bits to zero thus forcing the address to be quadword aligned. From this quadword aligned address, the capture is performed and the quadword is stored in register Q1 118f. Here, the quadword stored in register Q1 118f contains at least the high order byte of the quadword beginning at the next instruction as identified by NXTEIP 118a.
To extract the low order bytes of the quadword beginning at NXTEIP 118a, the run-time interpreter 44 executes an instruction which, using the three low bits of NXTEIP 118a, determines a byte in register Q0 118e which is identified by NXTEIP 118a, whether or not this byte is quadword aligned. The data in register Q0 118e is copied to register NXTQ.sub.-- LO 118b and shifted right to locate the byte identified by NXTEIP 118a in the low order byte register NXTQ.sub.-- LO 118b. The high order bytes of NXTQ.sub.-- LO 118b which, after the shift, no longer contain valid information are zeroed.
The three low bits of the address identified by NXTEIP 118a incremented by seven bytes is used to determine the high order byte of the quadword beginning at NXTEIP 118a. Here, the data in register Q1 118f is copied by register NXTQ.sub.-- HI 118c shifted left to locate the byte identified by NXTEIP 118a incremented by seven bytes in the high order byte of register NXTQ.sub.-- HI 118c. Here, the low order bytes of NXTQ.sub.-- HI 118c which no longer contain valid information as a result of the shift are zeroed. The result of ORing the contents of registers NXTQ.sub.-- LO 118b and NXTQ.sub.-- HI 118c is stored in QUAD 120.
Referring now to FIG. 9, the low bit of QUAD 120 is shown to be aligned with an Extended Instruction Pointer, EIP 121. In an Intel machine, the EIP 121 identifies a location in the instruction stream which corresponds to the beginning of the current instruction. As each instruction in the instruction stream is executed, the EIP 121 is incremented in the instruction stream to point to the beginning of the next instruction. QUAD 120, therefore, holds a quadword of information beginning at the byte identified by EIP 121.
To determine the operation of the Intel instruction 100 and a corresponding Alpha routine which performs the operational equivalent of the Intel instruction 100, the interpreter uses the information contained in QUAD 120. Typically, the first byte of an Intel instruction is the opcode 100a as shown in FIG. 7. The run-time interpreter 44 extracts the first and second low bytes 120a, 120b of QUAD 120 to provide a two byte instruction fragment 122. From this two byte instruction fragment 122 (FIG. 10), a corresponding Alpha routine and the length of the instruction 100 are determined.
Referring now to FIG. 10, a process 130 to determine the length of the Intel instruction 100 and the corresponding Alpha routine which implements the operational equivalent of the Intel instruction 100, is shown to include extracting the two low bytes 120a, 120b from QUAD 120 to provide the two byte instruction fragment 122. This two byte instruction fragment 122 is used as an index into a dispatch table 131 which resides in system memory 14 (FIG. 1).
The dispatch table 131 includes 2.sup.16 =64K (65536), 32 bit entries of which entry 131i is representative. Each entry corresponds to each instruction in a set of instructions available in the Intel instruction set. The contents of these 32 bit entries 131i include a field 131a containing an address at which the corresponding Alpha routine resides in system memory 14 as well as a field 131b containing the length of the instruction.
The dispatch table 131 is generated by a tool which identifies each instruction in the Intel instruction set such that the two byte instruction fragment 122 is sufficient information to identify the proper entry which corresponds to the current Intel instruction 100. The tool also provides the complete length of the Intel instruction 100 and includes this information in the dispatch table in the length field 131b along with the location of the Alpha routine which will provide the functional equivalent of the Intel instruction 1000 in the address field 131a. The run-time interpreter 44 chooses among either dispatch tables based upon the sequence of prefix elements 100f preceding the actual opcode 100a.
As discussed above in conjunction with FIG. 7, an Intel instruction 100 may be comprised of multiple elements 100a-100f. Multiple dispatch tables are provided by run-time interpreter 44 to handle the different values and combination of values which can be selected by the prefix element 100f. As discussed above, three possible prefixes 100f are addressing size (16 or 32 bits), operand size (16 or 32 bits) and two byte opcode, which selects an alternative opcode set. Any one or combination of these prefixes 100f may be present in an Intel instruction 100.
The addressing size prefix toggles between an addressing size for the Intel system which truncates address arithmetic to 16 bits or to 32 bits. Typically, the address size is 32 bits. The operand size prefix is similar wherein an operand expected by the system is 16 bits under a 16 bit operand size or 32 bits when the operand size is set for 32 bits. Here again, the typical operand size is 32 bits. The final prefix toggles between two alternative opcode sets. The first is a one byte opcode set and the second is a two byte opcode set. Here, a one byte opcode set is typically selected. A dispatch table similar to the dispatch table 131 in FIG. 10 is provided in system memory 14 for each of the eight possible combinations of prefixes 100f, the default dispatch table is dispatch table 131 having a 1 byte opcode with a 32 bit addressing size and a 32 bit operand size.
In addition to an entry for each instruction, also included in dispatch table 131 is an entry for each prefix 100f and prefix 100f combination. The 32 bit entry 131j, corresponding to a prefix 100f, activates a different dispatch table in memory 14 in which the subsequent opcode 100a in the instruction stream and its corresponding two byte instruction fragment 122 may be used to index the proper 32 bit entry 131i.
Referring now to FIG. 11, a process for activating an alternate dispatch table 131' is shown to include extracting a two byte instruction fragment 122 from QUAD 120. The two byte instruction fragment 122 is used as an index into the dispatch table 131.
Here, the two byte instruction fragment 122 identifies an entry in the dispatch table 131j. The dispatch table entry 131j includes a native routine address 131a in memory 14 and the length 131b of the Intel instruction 100 which here, is 001 or one byte. The first byte of the two byte instruction fragment 122 is a prefix 100f to instruction 100 which selects 16 bit addressing. Accordingly, the native routine 132 identified by the native routine address 131a, instructs the run-time interpreter 44 to activate the dispatch table 131' which corresponds to an instruction set implementing 16 bit addressing.
The length 131b of the Intel instruction 100 is provided to the run-time interpreter 44 which increments EIP 121 one byte in QUAD 120 to identify the beginning of the next instruction. A new two byte instruction fragment 122' is extracted from QUAD beginning at the new location identified by EIP 121. This two byte instruction fragment 122' identifies an entry 131i' in dispatch table 131'. Again, the two portions of the dispatch table entry 131i' identify the native routine address 131a' in memory 14 of the native routine 134 which is the operational equivalent of the Intel instruction 100 and the length 131b' of instruction 100.
The run-time interpreter 44 executes the native routine 134 which provides the operational equivalent of Intel instruction 100. Once complete, the on-line interpreter activates the default dispatch table 131 for 32 bit addressing and operands and one byte opcodes. While the run-time interpreter 44 is executing the native routine 134 for Intel instruction 100, the process just described allows the run-time interpreter 44 to identify the beginning of the subsequent instruction by incrementing EIP 121 (FIG. 9). In addition, the entry in the active dispatch table 131 which corresponds to the subsequent instruction is also identified. From this entry 131n, the address of the native routine 131na corresponding to the subsequent instruction as well as the length 131nb of the subsequent instruction are determined. This arrangement allows the on-line interpreter to operate in a pipelined fashion, executing multiple instructions in parallel.
Referring now to FIG. 12, a 32 bit entry 131i from dispatch table 131 is shown to be divided into two sections, the first section 131a corresponding to bits 3-31 of the 32 bit entry and the second section 131b corresponding to bits 0-2 of the 32 bit entry. Bits 3-31, section 131a are used to address the Alpha routines which execute the operational equivalent of the Intel instruction 100 and bits 0-2 131b signify the length of the Intel instruction 100.
The dispatch table targets are aligned on quadword boundaries. That is, the Alpha instructions which the entries in the dispatch table 131 point to and execute the operational equivalent of Intel instruction 1000, are located in system memory 14 on quadword boundaries. In this way, bits 0-2 of the address of the Alpha instructions are always zero. As a result, bits 0-2 131b' may be used to convey additional information about the instruction as here, where these bits are used to signify the length of the instruction. As the addresses of the Alpha routines are always 000 in bits 0-2 field 131b', a full 32 bit address is recreated by appending these zeros to bits 3-31 1012a to provide a complete 32 bit address.
As control is passed to the Alpha routine identified by the 32 bit address, bits 0-2 are used to increment EIP 121 so that EIP 121 is pointing to the beginning of the next instruction. Here, if the length of the Intel instruction 100 is from 1-6 bytes in length, QUAD 120 contains sufficient information to form a second, two byte instruction fragment 122 which may be used to index the current dispatch table to determine the corresponding Alpha routine for the next Intel instruction. This arrangement allows the run-time interpreter 44 to pipeline instructions and thus execute the application program more quickly and efficiently. While an Alpha routine is being accessed corresponding to a current instruction, the run-time interpreter 44 is able to determine the address and length of the next Intel instruction 100 in the instruction stream. A value of zero returned from bits 0-2 field 131b of the 32 bit entry 131i for the length of the Intel instruction 100 however, indicates that the instruction was longer than 6 bytes and hence, pipelining is not possible for this Intel instruction and accordingly, the EIP 121 is not incremented. It is then the responsibility of the Alpha routine to increment EIP 121 and to refill the pipeline.
Condition Code Processing
Referring now to FIG. 13, general purpose registers 135 of an Intel X86 machine are shown to include a single register, EFLAGS 135a, in which condition codes are maintained. This register, EFLAGS 135a, maintains the six condition code bits, the Carry bit 136a (C), the Negative bit 136b (N), the Zero bit 136c (Z), the Overflow bit 136d (O), the Parity bit 136e (P), and the Low Nibble Carry bit 136f (A). Each of these bits may be cleared or set as a result of the execution of an Intel instruction 100. To completely emulate the operation of the Intel application, the run-time interpreter 44 also maintains, in an unevaluated state, the current state of the condition codes resulting from the execution of an Alpha routine which corresponds to the Intel instruction 100.
As is often the case in systems which maintain condition codes, a subsequent condition code modifying instruction may be executed, thus overwriting the changes made to the condition code bits by a prior condition code modifying instruction, before the state of the condition codes is required by a subsequent instruction. In addition, many of the condition code modifying instructions effect only a partial set of the condition code bits. Accordingly, a complete evaluation of the condition code bits after execution of every condition code modifying instruction would be wasteful at CUP time. Nevertheless, the state of the condition code bits needs to be readily ascertainable throughout the execution of the X86 image should the current state of the condition codes be required.
Referring now to FIG. 14, the run-time interpreter 44 is shown to include a set of data storage locations 138, a table of methods 139, and evaluation routines 140 which are used to emulate the X86 condition codes during execution of an X86 image in computer system 10.
The set of data storage locations 138 is shown to include three locations 138a, 138b, 138c which are updated upon execution of an instruction which would have modified the condition codes in an X86 system. The first location, data1 138a, and the second location, data2 138b, store data used in the execution of the instruction, for example, an operand and a result of the instruction. This information is used later during execution of the application program should it become necessary to evaluate the condition codes.
The third location, pointer 138c, contains a pointer to the table of methods 139 which is a dispatch table used to evaluate the condition codes should the system require the current value of the condition codes. The table of methods 139 contains an entry for each of the eight predicates available in X86 conditional branches (and equivalent SETcc instructions), an entry to obtain the nibble carry, A 136f, bit and an entry to obtain a complete image at the EFLAGS 135a register. The set of methods includes one for each of the six condition codes.
Each entry in the table of methods 139, identifies an evaluation routine 140 which evaluates the condition described in the method table entry. Data1 138a and data2 138b are provided to the evaluation routines to determine the state of the condition code bits should a subsequent instruction require the current state of the condition codes.
When an Alpha routine is executed for an Intel instruction which would have modified one or more of the condition codes, the run-time interpreter 44 stores zero to two pieces of information from the instruction in the first two storage locations, data1 138a and data2 138b. These pieces of information, possibly an operand and a result of the operation, are used by the evaluation routines to compute the condition codes. In the third storage location, pointer 138c, a pointer is placed which, in accordance with the type of instruction which was executed, identifies the entry in the table of methods 139 which will identify the evaluation routines 140 which are to be called if and when the condition codes are evaluated.
The table of methods 139 is specific to the type of instruction executed. That is, if the instruction modifies all of the condition codes, the table of methods includes an entry pointing to a routine for each of the six condition codes. If the instruction modifies only the C bit, the only entry in the table of methods 139 is a entry pointing to an evaluation routine which will evaluate the C bit. Other possibilities include instructions which modify all of the condition code bits except for the C bit (ALL.sub.-- BUT.sub.-- C) instructions which modify only the Z bit (ONLY.sub.-- Z) and instructions which modify only the C and O bits (C.sub.-- AND.sub.-- O). The table of methods 139 for instructions of these types would include entries pointing to routines which correspond to all but the C bit, only the Z bit and only the C and O bits respectively.
Each entry in the table of methods 139 identifies a separate evaluation routine 140 which computes that specific condition code predicate or image of EFLAGS 135. Because these routines are only executed when necessary, the condition codes are maintained in an unevaluated state and accordingly, only minimally effect the execution speed of the application. Data1 138a and data2 138b are provided to the evaluation routine 140 to determine the effect the instruction had, or should have had, on the condition codes. Later, when a subsequent instruction is encountered by the run-time interpreter 44 which requires the current value of one or all of the condition code bits as input to the instruction, for example, as a condition in a conditional instruction, the run-time interpreter 44 uses the information provided in the data storage locations 138a and 138b, the table of methods 139 and the evaluation routines 140 to determine the current values of the condition code bits.
As discussed above, an Intel instruction can modify all condition code bits, or a subset of those bits. If the current instruction which modified the condition code bits modifies only the C bit and the previous instruction modified all of the condition code bits it would be wasteful to gather the data necessary to evaluate all but the C bit and copy it into the table of methods 139 which is provided for the current C bit modifying instruction. As a result, the run-time interpreter 44 maintains information to evaluate the previous state of the condition code bits based upon a previous condition code modifying instruction as well as the current condition code modifying instruction.
Referring now to FIG. 15, the interpreter is shown to include two sets of data storage locations 138 and 138', two corresponding tables of methods 139 and 139' and corresponding evaluation routines 140 and 140'. A first condition code evaluation grouping 137 corresponds to a current condition code modifying instruction and a second condition code evaluation grouping 137' corresponds to a previously executed condition code modifying instruction. Further, a finite state machine (FSM) is provided which determines how the previous and current states of the condition codes are maintained. The states and transitions of the FSM are the five types of condition code updates: ALL.sub.-- BUT.sub.-- C, ONLY.sub.-- C, C.sub.-- AND.sub.-- O, ONLY.sub.-- Z and ALL. Each transition has associated with it one of three actions: replace, push or resolve.
Provided below is a table, TABLE 1, which describers the action taken to maintain the condition code bits. The action is contingent upon which condition code bits the current instruction will modify as well as which condition code bits were modified by a previously executed condition code modifying instruction. In addition, the actions have been carefully selected to provide an action for the transition which entails a minimal amount of work yet still provides the run-time interpreter 44 a complete up-to-date set of condition code bits at any time.
In a replace action, the contents of the current condition code evaluation grouping are replaced by the values resulting from the next instruction. That is, the contents of the data storage locations 138, the corresponding table of methods 139 and the evaluation routines 140 are replaced with values which will enable the run-time interpreter 44 to evaluate the condition codes modified as a result of the next instruction. A replace action does not modify the contents of the previous condition code evaluation grouping. A replace action is appropriate when the set of condition code bits modified by the next condition code modifying instruction includes at least all of the condition code bits in the set of condition code bits modified by the most recent condition code modifying instruction.
A push action however, replaces the contents of the previous condition code evaluation grouping 137' with the contents of the current condition code evaluation grouping 137. The current condition code evaluation grouping 137 is used to provide the necessary information to evaluate the condition code bits modified by the next instruction. A push action is appropriate when the set of condition code bits modified by the next condition code modifying instruction does not include all of the condition code bits in the set of condition code bits modified by the most recent condition code modifying instruction. In addition, a union of the two condition code bit sets results in a complete set of condition code bits.
The final action is a resolve. The resolve is the most complicated of all the actions. In a resolve, the state of the condition codes, as represented by the current and previous condition code evaluation groupings 137 and 137', is evaluated resulting in a complete set of condition code bits, or an ALL, in the current condition code evaluation grouping 137. A push is then performed for the next instruction. A resolve action is appropriate when more than two condition code evaluation groupings would be necessary to maintain a complete set of condition code bits.
TABLE I
__________________________________________________________________________
Next CC Most Recent CC State
State ALL.sub.-- BUT.sub.-- C
ONLY.sub.-- C
C.sub.-- AND.sub.-- O
ONLY.sub.-- Z
ALL
__________________________________________________________________________
ALL.sub.-- BUT.sub.-- C
replace push push replace
push
ONLY.sub.-- C
push replace
resolve
resolve
push
C.sub.-- AND.sub.-- O
push replace
replace
resolve
push
ONLY.sub.-- Z
resolve resolve
resolve
replace
push
ALL replace replace
replace
replace
replace
__________________________________________________________________________
As mentioned above, the first condition code evaluation grouping 137 maintains in an unevaluated state the state of the condition codes corresponding to the execution of a current instruction. The second condition code evaluation grouping 138 maintains in an unevaluated state the state of the condition codes corresponding to the execution of a previous instruction.
The first set of data storage locations 138 here, registers CCR 107a, CCS 107b and CCD 107c retain three values. CCR 107a and CCS 107b contain data used by the current, non-native instruction such as an operand and a result of the instruction. CCD 107c contains a pointer to the dispatch table 139 provided to evaluate the state of the condition codes which are modified as a result of the execution of the current instruction. The second set of data storage locations 138' retain similar values corresponding to a previous condition code modifying instruction.
Here, each condition code evaluation grouping 137, 137' is shown to include a location in the respective table of methods 139, 139' which indicates the category of instruction which was executed. That is, whether the instruction modifies all of the condition code bits or a subset of the condition code bits. Using this value and the information in the FSM of TABLE I, the run-time interpreter 44 maintains in an unevaluated state, the complete set of condition code bits.
To illustrate how this works, an example is provided in conjunction with FIG. 15, in which a current instruction modifies all of the condition code bits (ALL) and a next instruction modifies only the C bit (ONLY.sub.-- C). In this simple example, the contents of the second condition code evaluation grouping 137', which provides the previous condition code state, is immaterial as will be shown.
As the current instruction modifies all of the condition code bits, the category location 139a of dispatch table 139 would indicate an ALL value. Accordingly, an entry for each of the six condition code bits is provided in dispatch table 139a to access evaluation routines 140 for each condition code bit.
When the corresponding Alpha routine for the next instruction is executed, the category location 139a of the current dispatch table is accessed to determine the category of the previous instruction. Using the category information provided and the information contained in TABLE 1 the run-time interpreter 44 manipulates the contents of each condition code evaluation grouping 137, 137' accordingly.
Here, the category of the most recently executed instruction is ALL while the category of the next instruction is ONLY.sub.13 C. As shown in TABLE I, when the most recent condition code state is an ALL and the next instruction is an ONLY.sub.13 C, the action which is to be taken is a push. Here, a push is an appropriate action because the set of bits modified by the next condition code modifying instruction, {C}, does not include all of the bits modified by the most recently executed condition code modifying instruction, {C, N, O, P, A}. Moreover, a union at the two condition code bit sets results in a complete set of condition code bits, {C, N, Z, O, P, A}.
The information retained in the current condition code evaluation grouping 137 is pushed or copied into the storage locations for the previous condition code evaluation grouping 137'. That is, the data in CCR 138a and CCS 138b are copied to pdata1 138a' and pdata2 138b' respectively and CCD 138c is copied to pptr 138c'. The current condition code evaluation grouping 137 is then used to store the data used to evaluate the C bit which is the only condition code bit modified by the next instruction. An example is provided below in conjunction with FIGS. 16 and 17 which describes a resolve action.
Referring now to FIG. 16, a set of condition code state diagrams 150 includes a condition code state 152 diagram for a previously executed condition code modifying instruction, a condition code state 154 diagram for a most recently executed condition code modifying instruction and a condition code state 156 diagram for a next condition code modifying instruction. Here, the previous condition code state 152 is ALL.sub.-- BUT.sub.-- C in which all but the C bit is modified. The most recent condition code state 154 is C.sub.-- AND.sub.-- O in which only the C and O bits are modified as a result of the execution of the most recently executed condition code modifying instruction. The next condition code state 156 is ONLY.sub.-- C in which only the C bit is modified.
Referring back to TABLE 1, it may be seen that when the most recent state is C.sub.-- AND.sub.-- O and the next state is ONLY.sub.-- C the appropriate action to be taken is a resolve action. It can be seen from FIG. 16 a replace action would not preserve the most recent state of the O bit as the current condition code state would be overwritten by information only capable of determining the C bit. A push however would lose the information necessary to determine the most recent values of the N, Z, P and A bits. As discussed above, more than two condition code evaluation groupings would be required to fully preserve the current states of each of the condition code bits. Accordingly, the information stored in the first and second condition code evaluation groupings 137, 137' is resolved resulting in a complete set of condition code bits.
Referring now to FIG. 17, the most recent condition code state 154' diagram is shown to contain a complete set of condition code bits. As a result of the resolve action, the most recent condition code state 154' is ALL and the next condition code state 156' is an ONLY.sub.-- C. Referring again to TABLE 1, the appropriate action to be taken is a push when the most recent condition code state is ALL and the next condition code state is ONLY.sub.-- C. Accordingly, the run-time interpreter 44 can push the condition code information resulting from execution of the next instruction without losing any condition code bit information.
Referring now to FIG. 18, the previous condition code state 152" diagram is shown to indicate a complete set of condition code bits which was pushed from the most recent condition code state 154' in FIG. 17. The most recent condition code state 154" diagram of FIG. 18 now indicates execution of a condition code modifying instruction which modified only the C bit. As may be seen, all information relating to the most current state of each of the condition code bits has been preserved.
Multiple Address Spaces
Referring now to FIG. 19, an implementation of multiple address spaces on an Intel machine is shown to include segments CS 160, DS 162, and SS 164 identifying address 0 166 of a first address space 168 and segment FS 170 identifying address 0 172 of a second address space 174. Data X 168i is located within the first address space 168 and data Y 174i is located within the second address space 174.
It should be noted that the first address space 168 and the second address space 174 exist independently from each other. Accordingly, there is no relationship between the location identified by segments CS 160, DS 162, and SS 164 and segment FS 170. Nor is there any relationship between the address of the location of data X 168i in the first address space 168 and address of the location of data Y 174i in the second address space 174.
Referring now to FIG. 20, emulation of multiple address spaces on a native architecture is shown to include segments CS 160', DS 162', and SS 164' identifying address 0 166' of a first address space 168' and segment FS 170' identifying address 0 172' of a second address space 174' where segment FS 170' has an offset 175 from address 0 166' of the first address space 168'. The value of the offset 175 is stored in SEGOFF 109 (FIG. 8).
Context Data Structure
Referring now to FIG. 21, a context data structure 180 which resides in memory is shown. The context data structure 180 is used by the run-time interpreter 44 to handle multitasking capabilities of the non-native software application. When, due to multitasking, an additional thread is executed during operation of the non-native software application, a snap-shot of the current state of the run-time interpreter 44 is saved in context data structure 180. The context data structure 180 is used by the new thread to provide the run-time interpreter 44 executing in the new thread the state of the run-time interpreter 44 executing in the thread which initialized the new thread.
Values which are saved in the context data structure 180 include the current condition code state in field 181. Thus, this field includes subfields (not shown) to provide copies of the values stored in registers CCR 138a, CCS 138b and CCD and 138c. Values are provided in field 182 to store the previous state of the condition code bits. The context data structure also includes copies of the integer registers EAX 104a, EBX 104b, ECS 104c, EDX 104d EDI 104e, ESI 104f, EBP 104g and ESP 104h in field 183.
In field 183 values for the six segments (seldomly used in WIN32 applications) are provided. The six segments, four of which are depicted in FIGS. 19 and 20 are cs, ds, es, fs, gs and ss. A copy of the floating stack pointer 106 (FIG. 7) is also provided in field 185 in addition to a starting value for the floating stack pointer as well as the floating stack entries.
Field 186 of the context data structure 180 provides pointers to each of the eight possible dispatch tables. Exemplary dispatch tables 131 and 131' are depicted in FIGS. 10 and 11. The context data structure 180 also provides in field 187 the Extended Instruction Pointer, EIP 121.
A repeat specifier value, as designated by one of the possible prefixes 100f (FIG. 8), is provided in field 188. Values relating to the Emulator Return Address, ERA 114, register are stored in field 189. In fields 190 and 191 pointers used to maintain the profile table as well as pointers to portable math routines are also provided respectively. Values of selected constants are also provided in the context data structure 180 in field 192 while pointers to maintain a linked list of context data structures is provided in field 193.
An additional aspect of a preferred embodiment includes structuring the order of the software which implements the on-line interpreter 44 such that critical blocks of the software code exist in a single cache block. In this way, the on-line interpreter 44 is able to execute more efficiently as the portions of the interpreter 44 which are executed most often are resident in the cache.
Non-Native Return Address Stack and Shadow Stack
Referring now to FIG. 22, a return address stack arrangement 210 is shown to include a non-native return address stack 211 and a shadow stack 212. The non-native return address stack 211 is an address stack which is produced as if the non-native image were executing in the non-native environment. The non-native return address stack 211 comprises a plurality of frames 219, each of said frames including a corresponding one of non-native return address fields 213a-213c, as well as fields 215a-215c for local storage, as shown. The non-native return address stored in locations 213a-213c corresponds to the routine return address that is pushed onto the stack by the program when it executes a call instruction. That is, the non-native program when executing in a native environment would place on the stack 211 a particular return address corresponding to the address space as if the non-native program was executing in its native environment.
As also mentioned, the return stack arrangement 210 also includes a shadow stack 212. The shadow stack 212 likewise is comprised of a plurality of frames 214, each of said frames 214 comprising a header field 216a-216c and corresponding or associated local storage fields 218a-218c.
The return address arrangement 210 also includes a pair of stack pointers, one for the non-native return stack 211 and one for the shadow stack 212. The non-native return address stack pointer 217 also referred to as SP points to the bottom or most recent entry in the non-native return address stack. Here the non-native return address stack 211 has an initial address A.sub.0 of <7FFFFFFF>. The initial address of <7FFFFFFF> insures that as the stack pointer SP is decremented, the largest stack pointer value will not be sign extended by an LDL instruction as will be described. Likewise, the shadow stack 212 has a stack pointer 221 referred to as SSP and has an initial address A.sub.0 =<0000000077FFFFFFF>.
The header portion 216a-216c of the shadow stack 212 here comprises four sub-fields. The first sub-field 220a also referred to as SP is the contents of the non-native stack pointer 217 corresponding to the return address in the non-native stack pointer for the particular shadow stack frame 214. Here the non-native stack pointer corresponds to the size of the emulated operating system. Thus, for a 32 bit operating system, the non-native stack pointer 220a would comprise four bytes.
The second entry 20b in the header 216a-216c is the non-native instruction pointer value 220b. The non-native instruction pointer is the address that is pushed onto the non-native return address stack 211. This address also comprises the same number of bytes as the number of bytes supported in the operating system. Thus, again for a 32 bit operating system, the number of bytes is 4.
The third entry 220 in the header portion 216a-216c is a native return address field 220c. The native return address field 220c comprises the native return address which is placed on the shadow stack if a translated routine executes a call instruction. This corresponds to the address of the native instruction which is to resume execution in the translated routine after the called routine has completed.
The fourth entry in the header 216a-216c is the native dynamic link 220d. The native dynamic link field is a pointer to the previous shadow frame header 214. Thus, in FIG. 22, the value stored in the field "dylnk" corresponds to the location of the next shadow frame header 216b. This value is preferably included in the shadow stack 212 to allow the shadow stack 212 to make provisions for a variable amount of local storage in fields 218a-218c. In situations where the local storage fields are not provided or their size is fixed, it is not necessary to have a dynamic link field.
The local storage fields 215a-215c in the non-native register stack 211 comprise routine calls and routine arguments of the non-native system and is provided to faithfully replicate that which would occur in the non-native system were it being executed on its native architecture. The routine locals and routine arguments stored in the non-native return stack are passed to translated routines via the translation process described above and as will be further described in detail below. In the shadow stack 212, however, provision is also provided for local storage in fields 218a-218c. For example, often when a compiler is used to compile a program, the actual instructions of the program use more logical registers than physically exist in the machine on which the program is to be executed. Accordingly, the compiler often provides temporary storage for logical register manipulations and uses the program stack to store these registers.
Non-Native Return Stack and Shadow Stack Management
The non-native return address stack 211 is managed exactly as dictated by the non-native code being emulated in the interpreter 44. When the interpreter 44 is executing the non-native or non-native code of a particular thread, there is only one native frame on the shadow stack 212 for the interpreter. This permits the interpreter to transfer execution into translated code in the event that there is corresponding translated code to be executed. The interpreter does not push frames onto the shadow stack 212. Further, when transferring into and out of translated routines, the interpreter does not push data onto the native system stack. Rather, when transferring into and out of translated routines, shadow frames 214 are pushed onto the shadow stack 212 to record the state associated with the translated routines.
The shadow stack 212 tends to be synchronous with the routine frames on the non-native return stack. Although calling jackets (48 FIG. 3) may cause another instance of the interpreter 44 to be produced if a callback is performed, and thus push another interpreter frame onto the non-native return address stack 211, once the jacketed operation has been completed this extra frame is removed from the non-native or non-native stack 211.
With a translated routine, however, a shadow frame 214 is pushed onto the shadow stack 212 each time a translated routine is called. The shadow frame 214 includes the space necessary for the translated routine's locals such as the spilled registers mentioned above, and the shadow frame header.
Referring now to FIG. 23, an example of the operation of the shadow stack 212 is shown. The program 230 includes a routine A which has a plurality of instructions, one of which is a call to a routine B (call B) at 233. Routine B, likewise, has a plurality of instructions with the last instruction being a return instruction RET. Program flow 230 represents a program flow for the non-native program executing in its native environment. In routine A, when the Call B instruction 233 is executed, it causes the next instruction at address A.sub.N to be pushed onto the non-native return address stack 211, as shown. The stack pointer for the non-native instruction stack 211 is incremented to the next value, thus pointing to the entry for A.sub.N. Routine B is called by routine A and executes its instructions causing at the last instruction (RET) a return which causes a pop from the non-native return address stack 211. The pop delivers the address A.sub.N on the location of the next instruction to be loaded into the program counter for execution.
Were routine A and routine B translated as mentioned above to provide corresponding translated routines A' and B' (242 and 245) during execution of translated code in the native architecture, an instruction Call B' would be encountered at 243. The shadow frame is allocated at the beginning of a routine for all calls that the routine can make. The instruction Call B' causes the shadow stack to be populated with a shadow stack frame 214 which comprises the four above-mentioned fields 220a-202d and the optional fields for local storage. Thus, in field 220a is provided the contents A.sub.N of the stack pointer (SP) 217 of the non-native return stack 211. This value corresponds to the location where the return address stored in the non-native return address stack 211 for the corresponding native instruction execution will be found.
Likewise, stored in field 20b is a copy of the non-native return address that was pushed on the non-native stack by the execution of the call instruction. The non-native return address is provided by the translated image and corresponds to the |