Analysis of code form

Method and system of cache management using spatial separation of outliers

6901587

Abstract

A method and a system of cache management using spatial separation of outliers. The system includes a dynamic compiler arranged to create compiled fragments of code having dominant code blocks and outliers. Memory coupled to the dynamic compiler is managed by a compiler manager such that dominant code blocks are stored in one portion of the memory and the outliers are stored in another portion of the memory. Storing the dominant path code separate from the outliers increases efficiency of the system.


Claims

1. A computer system comprising:

a dynamic compiler; the dynamic compiler including:

an execution history recorder configured to record the number of times a fragment of code is executed, the execution history recorder having a threshold;

a code-executing interpreter for executing the code coupled to the execution history recorder;

a compiler manager coupled to the execution history recorder;

a compiler coupled to the compiler manager, the compiler arranged to create compiled fragments of code having dominant code blocks and at least one outlier; and

wherein memory is coupled to the dynamic compiler, the memory managed by the compiler manager such that dominant code blocks are stored in one portion of the memory and the at least one outlier is stored in another portion of the memory.

2. A system as claimed in claim 1, the outlier having instructions to synchronize states.

3. A system as claimed in claim 2, wherein the outlier is operable to pass control to a block of glue code.

4. A system as claimed in claim 3, wherein the glue code instructs the code executing interpreter to interpret a non-dominant fragment of code.

5. A system as claimed in claim 4, wherein the at least one outlier is patched so as to access a compiled version of the non-dominant fragment of code.

6. A system as claimed in claim 1, the dynamic compiler further including at least two outliers, and wherein dominant fragments of code and the at least two outliers are stored in a code buffer in the memory, the dominant fragments of code filled from one end of the code buffer and the at least two outliers filled from the other end of the code buffer.

7. A system as claimed in claim 1, the dynamic compiler further including at least two outliers, and wherein dominant fragments of code and the at least two outliers are stored in a code buffer in the memory, the dominant code fragments and at least two outliers filled from the same end of the code buffer.

8. A system as claimed in claim 1, the dynamic compiler further including a threshold tuner coupled to the execution history recorder, the threshold tuner operable to adjust the threshold of the execution history recorder.

9. A system as claimed in claim 1, the dynamic compiler further including:

a memory searcher coupled to the code-executing interpreter;

a converter device coupled to the interpreter; and

an execution device coupled to the converter device.

10. A system as claimed in claim 1, the dynamic compiler further including a queue coupled to the compiler.

11. A system as claimed in claim 1, the compiler manager having a memory manager that monitors memory available to the compiler.

12. A system as claimed in claim 11, wherein a deleter is coupled to the memory manager.

13. A system as claimed in claim 1, wherein the number of times the fragment of code is executed is recorded when the fragment of code is executed by the code-executing interpreter.

14. A system as claimed in claim 1, wherein the dynamic compiler is a multi-threaded system and the compiler runs on a separate thread so the progress of code execution is not blocked.

15. A system as claimed in claim 1, wherein the execution history recorder is further configured to record from where a transfer of control into the fragment of code came and to where control is transferred out of the fragment of code.

16. A system as claimed in claim 1, wherein the execution history recorder is further configured to alert the compiler manager when the fragment of code has been executed the threshold number of times.

17. A system as claimed in claim 16, wherein the compiler manager administers the queue of frequently executed fragments of code for compilation.

18. A system comprising:

a code-executing interpreter;

a memory searcher coupled to the code-executing interpreter;

a converter device coupled to the code-executing interpreter;

an execution device coupled to the converter device;

an execution history recorder configured to record the number of times a fragment of code is compiled, the execution history recorder coupled to the code-executing interpreter and having a threshold;

a threshold tuner coupled to the execution history recorder, the threshold tuner operable to adjust the threshold of the execution history recorder;

a compiler manager coupled to the execution history recorder;

a compiler coupled to the compiler manager; and

memory coupled to the compiler and managed by the compiler manager such that dominant blocks of code are stored in one portion of the memory and at least one outlier is stored in another portion of the memory.

19. A system as claimed in claim 18, further comprising a queue coupled to the compiler.

20. A system as claimed in claim 19, wherein the compiler manager further includes a memory manager that monitors memory available to the compiler.

21. A system as claimed in claim 20, further comprising a deleter coupled to the memory manager.

22. A system as claimed in claim 18, wherein the compiler generates compiled fragments of code with only one entry point.

23. A method of compiling computer code, the method comprising:

establishing an execution threshold within a code-executing interpreter;

executing a number of fragments of the computer code;

recording the number of times each of the fragments of code is executed;

queuing one fragment of code for compilation when the number of times the one fragment of code has been executed matches a threshold;

compiling the one fragment of code;

generating outliers related to fragments of code that have not been executed the threshold number of times; and

storing the one compiled fragment of code and the outliers in separate portions of memory.

24. A method as claimed in claim 23, further comprising adjusting the threshold after it is established.

25. A method as claimed in claim 23, further comprising monitoring memory available to the compiler.

26. A method as claimed in claim 25, further comprising deleting code from memory to meet the requirements of the compiler.

27. A method as claimed in claim 23, further comprising running the compiler on a thread that is separate from a thread of the code-executing interpreter.

28. A method as claimed in claim 23, further comprising recording a transfer of control into one fragment of code and a transfer out of the one fragment of code.

29. A method as claimed in claim 23, further comprising searching memory for preexisting compiled versions of fragments of code.

30. A method as claimed in claim 23, wherein the computer code includes at least one Method, and at least one of the fragments of code includes less than the entire at least one Method.

31. A method as claimed in claim 23, further comprising performing an exception check.

32. A method claimed as claim 31, further comprising performing a code optimization.

33. A method as claimed in claim 31, further comprising interpreting exception code when an exception occurs.

34. A method as claimed in claim 31, further comprising establishing a link to a bailout device.

35. A method as claimed in claim 31, further comprising passing control to the code-executing interpreter.

36. A method as claimed in claim 31, further comprising updating condition states.

37. A method as claimed in claim 36, further comprising interpreting exception code after updating condition states.


Description

SPECIFICATION

BACKGROUND OF THE INVENTION

1. Field of Invention

This invention relates, in its most general aspects, to a computer system and to a method of operating that system, and to improvements in the performance of various operations within such a system. It also relates to a computer-readable storage medium. The computer system may be, may include, or may be part of, a virtual machine. The computer-readable storage medium may contain executable code or other instructions for programming the computer system/virtual machine. In particular, the invention relates to the management of memory in a computer system, and in particular to the management of cache memory in a computer system. In a preferred embodiment, the invention relates to outliers for spatial separation of infrequent code, etc.

2. Description of Related Art

In recent years, there have been developments in programming languages towards what is known as an object-oriented language. In these developments, concepts are regarded as ‘objects’, each carrying with it a set of data, or attributes, pertinent to that object, as well as information relating to so-called ‘methods’, that is functions or sub-routines, that can be performed on that object and its data. This is well known to those skilled in the art of computing and/or programming.

The advent and rapid advancement in the spread and availability of computers has led to the independent development of different types of systems, such as the IBM and IBM-compatible PC running IBM-DOS or MS-DOS or MS-Windows applications, the Apple Macintosh machines running their own Apple System operating system, or various Unix machines running their own Unix operating systems. This proliferation of independent systems has led to useful applications being available only in one format and not being capable of running on a machine for which the application was not designed.

Under such circumstances, programmers have devised software which ‘emulates’ the host computer's operating system so that a ‘foreign’ application can be made to run successfully in such a way that, as far as the user is concerned, the emulation is invisible. In other words, the user can perform all of the normal functions of say a Windows-based application on a Unix machine using a Unix-based operating system without noticing that he is doing so.

A particularly notable product of this type is that developed by Insignia Solutions of High Wycombe, GB and Santa Clara, Calif., USA and known under the name ‘SoftWindows 2.0 for Powermac’. This software enables a physical Macintosh computer to emulate a PC having an Intel 80486DX processor and 80487 maths co-processor plus memory, two hard disks, IBM-style keyboard, colour display and other features normally found on recent versions of the PC-type of computer.

Furthermore, there is an ever-increasing demand by the consumer for electronics gadgetry, communications and control systems which, like computers, have developed independently of one another and have led to incompatibility between operating systems and protocols. For example, remote-control devices for video players, tape players and CD players have similar functions, analogous to ‘play,’ ‘forward, ’ reverse, ‘pause,’ etc, but the codes for transmission between the remote control, or commander, operated by the user may not be compatible either between different types of equipment made by the same manufacturer or between the same types of equipment made by different manufacturers. There would be clear benefits of having software within the equipment which can produce for example the correct ‘play’ code based upon a ‘play’ command regardless of the specific hardware used in the equipment. Such software is commonly known as a ‘Virtual Machine.’

Other uses and applications are legion: for example, set-top boxes for decoding television transmissions, remote diagnostic equipment, in-car navigation systems and so-called ‘Personal Digital Assistants.’ Mobile telephones, for instance, can have a system upgrade downloaded to them from any service provider.

Emulation software packages tend to have certain features in common, notably that they are not general purpose but are dedicated. They are of most benefit in rapid development areas and have a distinct advantage in enabling manufacturers to cut costs. In particular, they can divorce software from the physical machine, i.e., the effect of the software in the physical machine can be altered by the emulating software without having to go into the machine's native software to implement those changes.

The specific object-oriented language used in some of the implementations described later is that known as Java (registered trade mark to Sun Microsystems Corporation). Some of the following implementations will enable Java to be used in smaller devices than is currently possible because of the improved performance and/or reduced memory footprint. Future uses projected for embedded software (virtual machines) include computers worn on the body, office equipment, household appliances, and intelligent houses and cars.

While it is recognised that there are clear advantages in the use of virtual machines, especially those using object-oriented languages, there are naturally areas where it is important and/or beneficial for some of the operations that are carried out within the system to be optimised. These may include reducing the memory requirement, increasing the speed of operation, and improving the ‘transparency’ of the system when embedded in another system. One of the principal aims of the inventions described herein is to provide a Virtual Machine which is optimised to work as quickly as possible within a memory constraint of, for example, less than 10, 5, 2 or even 1 Mbyte. Such a constraint is likely to be applicable, for example, to electronics gadgetry and other equipment where cost (or size) is a major constraint.

In a computer system there are various levels of cache memory. It is of benefit to the system, in terms of improved efficiency and therefore speed, if the caches themselves can be operated efficiently. It has been appreciated pursuant to the present invention that it would be advantageous to have code which is likely to be executed frequently located in the caches and in particular in the fastest cache. In the embodiment of the invention described below, Java code is compiled for faster execution at run-time using a dynamic compiler. In order to improve cache density of useful code (density), as one of the aims of the invention, it would be beneficial to have in the fastest of the caches the compiled code that the dynamic compiler has produced.

Prior art solutions do not maximise the density of cache memory. For example, as is discussed in more detail below, it has been appreciated that the fast caches of prior art systems are often occupied by large amounts of infrequently accessed code reducing the density of frequently accessed code in the cache which may lead to more cache misses. The present invention seeks to mitigate this and/or other problems.

BRIEF SUMMARY OF THE INVENTION

According to a first aspect of the present invention, there is provided a computer system including a compiler, the compiler being arranged to compile dominant path fragments of code.

A dominant path represents a frequently executed path of execution through the code and may include a large number of individual blocks of code. By arranging for the dominant path to be compiled (and preferably only the dominant path to be compiled), the density of useful code in the compiled version of the code is increased since the compiled version includes only code which is executed frequently. Thus the density of useful code in the cache can be increased.

By arranging for the dominant path to be compiled, it is possible to arrange for blocks of code including the most frequently executed paths through the code to be more likely to be stored in the cache, and more likely to be stored in the same (L1) cache as other blocks of the dominant path code. Thus the run-time execution of the dominant path can be faster.

Preferably, the system further includes an execution history recorder for recording information about the dominant path. Preferably, an on-line record of the dominant path is made during the execution run. Preferably, therefore, the system includes means for determining the dominant path fragment during the execution of the code.

Preferably, the system further includes a compiler for compiling code and, preferably, the compiler is arranged to compile a dominant path fragment. Preferably, the compiler is an on-line compiler. Preferably, the dominant path fragment does not include infrequently executed code. Thus, if the dominant path fragments of code are arranged separately from infrequently executed fragments of code, management of the memory of the system can be improved.

Further discussion of preferred features in the compilation of the dominant path can be found in Appendix 1 of this specification.

Preferably, the system further includes an outlier for use where a path of execution leaves the dominant path.

According to a second aspect of the invention, there is provided a computer system including outliers for use in the execution of infrequently executed code.

Where the path of execution would leave the dominant path, for example, due to a conditional transfer to a non-dominant location of the code or due to an exception condition being detected, control is passed to the outlier. Preferably the outlier is in the same code buffer as the fragment of dominant path from which control is transferred.

The dominant path is a ‘best guess’ of the likely path of execution through the code based on current behaviour. It will sometimes prove to be inapplicable for a particular execution of the code. The outliers are used to deal with the situation.

Preferably, the system further includes an interpreter. Preferably, the interpreter is used to execute at least some of the infrequently executed code. Preferably, the system further includes a converter for converting between the execution of compiled code and non-compiled code. The converter preferably includes outliers.

Where the execution has left the dominant path due to a conditional transfer, preferably, the outlier is adapted to effect transfer of control to the interpreter.

Where execution has left the dominant path due to an exception being encountered, preferably, the outlier is adapted to transfer control to an exception handler.

Preferably, the outlier is adapted to update states before execution of infrequently executed code. For example, where control is being passed to the new non-dominant path, which is typically interpreted until that new section warrants compilation, the updating may be required, for example, where optimisations have been used in the compilation of the dominant path code.

Preferably, the code includes a conditional branch to the outlier, the conditional branch including a conditional test and being such that execution follows the dominant path if the conditional test fails. Processors often predict that forward branches will fail and will carry out various checks before the branch is carried out. If the condition of the branch occurs rarely so that usually the execution falls through (in the dominant path), when the code for the condition is compiled, the code is arranged so that if the condition is true, the control passes to the outlier. Thus the forward branch occurs only rarely and thus the processor checks are only carried out on the rarely executed jump to the outlier. Thus, processor time can be reduced because the condition is usually not true and the execution simply drops through to follow the dominant path.

Preferably, the system includes means for separating frequently executed code from infrequently executed code.

That is a particular important feature of the present invention which may be provided independently, thus the invention further provides a computer system including means for separating frequently executed code and infrequently executed code.

By separating the frequently executed code from the infrequently executed code, it is made possible for memory of the system to be managed more efficiently. For example, it makes it possible to arrange for less of the infrequently executed code to be pulled into the cache. That can give improved execution speed of the frequently executed code at runtime by reducing the cache misses. The means for separating the code may be provided by a compiler which compiles the code in a particular way as described in more detail below. The separation may be effected by arranging that certain types of code are stored in one memory area and other types of code are stored in a different memory location.

Preferably, the system further includes an outlier, and means for separating dominant path fragments from the outlier.

Thus the system preferably includes means for storing the frequently executed code in a first memory region and means for storing infrequently executed code in a second memory region. Preferably, the system includes means for storing the dominant path fragments in a first memory region and means for storing outliers in a second memory region. Preferably, the first memory region and the second memory region are regions of a code buffer.

Preferably the frequently executed code and infrequently executed code are generated in different areas of the code buffer. For example, the system may include means for storing the infrequently executed code "backwards" in the buffer. Preferably, the system includes means for storing the dominant path fragments and the outlier at opposite ends of the code buffer.

By storing the code in that way, it is possible to arrange the code so that frequently executed code is likely to be drawn into a code cache while infrequently executed code is unlikely to be pulled into the cache. Therefore, preferably the code is stored so that infrequently executed code is unlikely to be pulled into a cache.

That is a particularly important feature of the present invention, and can be provided independently. Thus the invention further provides a computer system including a code cache, the system being arranged so that infrequently executed code is unlikely to be stored in the cache.

Preferably, in the compilation of the dominant path, the frequently executed code includes the compiled dominant path fragments. Those fragments are preferably generated forwards in the code buffer. The outliers are preferably generated backwards in the code buffer, thus spatially separated from the dominant path fragments. Thus the memory occupied by the outliers in the code buffer can be much less than a compiled version of the original portion of infrequently executed code fragment of the uncompiled code.

The present invention further provides a computer system including means for storing substantially all of (and preferably only) the dominant path compiled code together in one memory region. Preferably, the system further includes means for storing code for dealing with the non-dominant cases in spatially separate regions.

The present invention also provides a method of operating a computer system, the method including compiling dominant path fragments of code. Preferably, the method includes determining the dominant path during the execution of the code.

Preferably, an outlier is used when a path of execution leaves the dominant path, and preferably the outlier effects transfer of control to the interpreter and/or to an exception handler.

Preferably, the outlier updates states before execution of infrequently executed code.

Preferably, where the code includes a conditional branch to the outlier, the conditional branch includes a conditional test such that execution follows the dominant path if the conditional test fails.

Preferably the method includes separating frequently executed code from infrequently executed code.

Also provided by the invention is a method of operating a computer system, including separating frequently executed code and infrequently executed code.

Preferably, the method includes separating dominant path fragments from outliers and preferably storing the dominant path fragments in a first memory region and storing outliers in a second memory region. Preferably, the first memory region and the second memory region are regions of a code buffer. Preferably the method includes storing the dominant path fragments and the outliers at opposite ends of the code buffer.

Preferably the method includes storing the code so that infrequently executed code is unlikely to be pulled into a cache.

The invention also provides a method of storing code in a computer system including a code cache, the method being such that infrequently executed code is unlikely to be stored in the cache.

According to the present invention, there is further provided a method of operating a computer system including the steps of: compiling dominant path code, and storing the compiled code in one memory region. Preferably, the method includes storing outliers in a separate memory region.

Also provided by the invention is a method of compiling code, the compilation being effected so that frequently executed code is separate from outliers.

The invention also provides code stored in a computer system by a method described herein and provides a compiler for compiling code in accordance with the invention.

The invention further provides a computer-readable storage medium having a programme recorded thereon for carrying out a method according to the invention.

The invention also provides a computer-readable storage medium having a programme recorded thereon for compiling code, the compilation being effected so that frequently executed code is separate from outliers.

The invention further provides a computer programmed according to a method as aforesaid.

The invention also provides a computer programmed for compiling code, the compilation being effected so that frequently executed code is separate from outliers.

Accordingly, the invention provides a computer system including means for storing substantially all of (and preferably only) the dominant path compiled code together in one memory region, whilst, preferably, any outlier is only stored in spatially separate regions. Such a memory layout typically maximises the amount of useful code loaded into the cache.

The invention also provides a method of operating a computer system including the steps of: compiling all of the dominant path code; and storing substantially all of the compiled code in one memory region, while preferably storing outliers in a separate region.

An ‘outlier’ is so called since it lies out of the normal memory region for predominantly executed code. In this way the infrequent, by which may be meant the non-dominant path, code is separated from the more frequently used dominant path code, and so does not get loaded into the cache as long as the dominant path is executing.

Any, some or all of the features of any aspect of the invention may be applied to any other aspect.

The following considerations apply to any and all the inventions and aspects of the inventions described above.

BRIEF DESCRIPTION OF SEVERAL VIEWS OF THE DRAWINGS

The invention will be described in conjunction with the following drawings in which like reference numerals designate like elements and wherein:

FIG. 1 is a block diagram a virtual machine of the present invention;

FIG. 1A shows a section of code before compilation;

FIG. 1B shows a standard compilation of the code of FIG. 1A;

FIG. 1C shows compilation of code in accordance with a preferred embodiment;

FIG. 1D shows a code buffer;

FIG. 1E shows the memory arrangement in a computer system;

FIG. 1F shows apparatus for carrying out the method of the invention;

FIG. A1-1 shows paths of execution;

FIG. A1-2 shows the comparative costs of compiling dominant paths;

FIG. A1-3 shows a dispatch table;

FIG. A1-4 is a schematic representation of apparatus for carrying out the invention;

FIG. A1-5 shows paths of execution through code;

FIGS. A2-1 to A2-4 illustrate the use of patches in compiled code;

FIG. A2-5 is a flow diagram of a preferred method embodiment;

FIG. A2-6 illustrates the use of patches with potentially polymorphic methods; and

FIG. A2-7 is a block diagram of a preferred apparatus embodiment.

FIG. A3-1 illustrates the principle of a virtual machine;

FIG. A3-2 illustrates the operation of an emulator stack;

FIG. A3-3 illustrates the operation of a unified stack;

FIG. A3-4 shows an embodiment of the present invention;

FIG. A3-5 shows an apparatus embodiment of the present invention;

FIG. A4-1 shows apparatus for carrying out the method of the invention;

FIG. A4-2 shows a fragment of code including an exception; and

FIG. A4-3 shows a compiled fragment of code in accordance with the present invention.

FIG. A5-1 shows a flow diagram illustrating a class loader that checks for instance Methods of a new class being loaded;

FIG. A5-2 shows a section of compiled code;

FIG. A5-3 shows a different section of compiled code;

FIG. A5-4 shows apparatus for carrying out a preferred embodiment;

FIG. A6-1 is a schematic illustration of data storage in a stack;

FIG. A6-2 shows an activation stack;

FIG. A6-3 illustrates how checks are made on references in a frame;

FIG. A6-4 shows the arrangement of data in a procedure call frame;

FIG. A6-5 shows the execution of a procedure;

FIG. A6-6 shows the arrangement of the contents of a barrier descriptor block;

FIG. A7-1 illustrates a hierarchical structure in object-oriented programming;

FIG. A7-2 shows the arrangement of data stored in dispatch tables;

FIG. A7-3 shows the application of an interface hash table to a dispatch table;

FIG. A7-4 is a hierarchical structure of a domestic equipment system;

FIG. A7-5 shows dispatch tables used in operating devices in the domestic system of FIG. A7-4; and

FIG. A7-6 shows a controller program with driver devices for operating the devices in the domestic system of FIG. A7-4.

DETAILED DESCRIPTION OF THE INVENTION

The invention will be illustrated in more detail with reference to the following Examples, but it should be understood that the present invention is not deemed to be limited thereto.

A specific example of a preferred embodiment of virtual machine is now described with reference to FIG. 1.

The virtual machine 20 is an executable code installed in the particular item of equipment 22. It can provide a degree of independence from the hardware and operating system. The virtual machine may typically include any, some, or all of the following features: an operating engine, a library of routines, one or more interpreters, one or more compilers, storage means for storing a plurality of instruction sequences, queue management means, and buffer management means.

The virtual machine is coupled to one or more applications 24 on one side (the "high level" side), and, on the other side (the "low level" side), perhaps via various intermediate logical units, to the hardware 26 of the item of equipment. The hardware can be regarded as including various ports or interfaces 28 (perhaps an interface for accepting user input); the virtual machine receives events from those ports or interfaces. The hardware also includes one or more processors/control means 30 and memory 32.

FIG. 1A shows a section of Java bytecode including blocks B1, B2, B3, B4 and B5 which carry out calculations 1, 2, 3, 4 and 5, respectively. B4 is code which deals with exceptions which may occur in B1, B2 or B3 (see paths 9000, 9002 and 9004 to B4). The dominant path through the blocks is found to be such that control (almost) always passes from B1 to B3 (path 9006) at the conditional transfer of control at the end of B1, and B3 passes control to B5 (path 9008). The paths 9000, 9002 and 9004 are hardly ever taken.

An outline of the original Java source for the example of FIG. 1A is

void method () {
try {
calculations 1 // calculations 1 and if (condition) if (condition) { translates to block B1
calculations 2 // calculations 2 translates to block B2
} calculations 3 // translates to block B3 and a jump to B5
}
} catch () {
calculations 4 // translates to block B4
} calculations 5 // translates to block B5 }

Suppose that predominantly the condition is false, and none of the calculations 1, 2 or 3 encountered an exception which would be caught by the catch clause (block B4). Therefore, the useful code based on this dynamic behaviour consists solely of blocks B1, B3 and B5.

Standard compilation techniques for this code (especially in the case of compilation at runtime) would be to emit code for all five blocks, to allow for all eventualities in the subsequent execution of the compiled code. Thus the compiled versions of B2 and B4 potentially waste memory space, and as detailed below can lead to reduced cache density of useful code compared to preferred embodiments. If many such methods are compiled in this standard manner, the wider range of address space used to encompass the compiled code can lead to control transfers crossing address space page boundaries more frequently, with ensuing higher frequency of page faults (if virtual memory is enabled on the computer system), compared to preferred embodiments.

As a program runs, the processor picks up instructions from the memory. When the instructions for the program run over the end of a page, the memory manager must be interrogated to find and check the next page if that next page is not in main memory. That is time consuming. Crossing a page boundary is therefore time consuming.

A standard compilation of the code is shown in FIG. 1B. Blocks B1, B2, B3, B4 and B5 are set out sequentially.

FIG. 1C shows compiled code according to a preferred embodiment. Note that the dominant path includes blocks B1, B3 and B5.

The compilation of the code has inverted the logic of the condition test in block B1, so that the predicted fall through case is to block B3, and the unpredicted flow of control is to an outlier OL1. Note that the code for the blocks B1 and B3 are spatially contiguous despite not being contiguous at the source and bytecode levels. This is advantageous to modern processors with branch prediction hardware. Note also that this contiguity by definition occupies a smaller range of the memory address space than if block B2 had been inserted in between.

Blocks B2 and B4 do not exist in the compiled versions of the code because they were found not to be a part of the dominant path.

B5 is also spatially contiguous with block B3, and the original unconditional control transfer present in the bytecode for jumping over the exception handler B4 requires no corresponding host instruction. Block B3 simply drops though into block B5 in terms of control flow. Thus blocks B1, B3 and B5 are spatially contiguous, and hence occupy a smaller range of the memory address space in total than if they were interspersed with blocks B2 and B4. These blocks (B1, B3 and B5) have been packed to model the current execution characteristics of the Java method.

When B1 first receives control, requiring loading of a cache line into the processor, better cache density ensues in the immediately affected cache line. Code infrequently executed (in blocks B2 and B4) does not get pulled into the cache.

Now consider several methods (or dominant paths thereof) compiled in a similar manner, and into a given code buffer. As these pass control amongst each other, the cache perturbations will be reduced by having a greater cache density of useful code. Low cache density can lead more frequently to cache-collisions and cache-misses. Also, with computer systems employing virtual memory, preferred embodiments can give a reduction in page faults, as a consequence of the reduction in address space usage for frequently executable code. A page fault occurs when the processor tries to execute an instruction which is not in memory. When a page fault occurs, the page in which the instructions to be executed are located are loaded into memory from the permanent storage device that is being used for the virtual memory. This is a time consuming operation which slows down the speed of execution.

FIG. 1C shows outliers OL1 and OL2 for use if the execution leaves the dominant path. If the conditional test passes at the end of B1, control will pass to OL1. OL1 synchronises states (that is, ensures that register-cached values are spilt back to their corresponding memory locations) and then passes control to a piece of glue code to effect resumption of the unpredicted (non-dominant) path corresponding to calculations 2 via a fall back interpreter. Until such time as the corresponding bytecodes of the non-dominant path execute frequently enough to warrant dynamic compilation, these continue to be interpreted, thus saving space in the code buffers (which are limited resources) for more important paths of bytecode execution. Thus outliers of the type of OL1 handle the case where normal control flow takes an unpredicted path away from the dominant path, such as needing to execute calculations 2.

An example of code of an outlier such as OL1 is as follows:
    • a=rn //update states and restore memory locations for a, b, c
    • b=rm
    • c=rs
    • callglue (3000)// calls the glue code and tells it to interpret uncompiled code from bytecode address 3000
      The interpreter will start execution at the beginning of block B2. If the bytecode at 3000 is executed enough, it will later be compiled. The next time the glue is told to interpret from 3000, it will recognise that there is a compiled version of B2. It will amend the ‘callglue’ line of OL1 (automatically) to ‘goto . . . ’ to direct control to the compiled version. This is known as "patching" (see Appendix 2 of this specification). Thus, the next time the outlier OL1 is called, the control will be transferred directly to B2, without the glue being used. (See also Appendix 1 of this specification).


  • A different type of outlier OL2, deals with the situation in which an exception condition is recognised within the dominant path (for example, block B1 attempts to access an array outside of its legal bounds). The dominant path passes control to an outlier (OL2) to deal with the exception. Here, the outlier synchronises state as usual, and then passes control to the glue code to raise the exception within the virtual machine.

    An example of code of an outlier such as OL2 is as follows:
    • a=rn //update states, restore memory locations for a, b and c
    • b=rm
    • c=rs
    • callglue raise exception X //tell glue to transfer control to an execution handler for dealing with an exception of type X


  • Further discussion of the use of glue code and the transfer of control to the interpreter can be found in the section Appendix 1 of this specification.

    Only two outliers have been shown in FIG. 1C for clarity. In practice, separate outliers would be provided to deal with each exception and each deviation from the dominant path which could occur.

    Outliers are spatially far separated from those blocks of code corresponding to their associated dominant paths. A given compilation produces a set of blocks for the dominant path and another set of blocks of outliers used by the dominant path when unpredicted or exceptional behaviour is encountered during execution.

    FIG. 1D shows the blocks of compiled code and the outliers filled into a code buffer 9054. The dominant path blocks are filled into the buffer in the direction 9056 and the outliers are filled in the direction 9058. The dominant path blocks occupy one end of the code buffer, and its outliers the other end. Each compilation of a new fragment of code produces new sets of dominant path blocks and outliers and the code buffer is laid out so that the outliers and dominant path blocks grow towards each other. Hence it can be seen that in the normal course of execution, where outliers are not executed, their presence is transparent in the system with respect to the processor cache behaviour. Thus maximum cache density of useful code, and maximum address space density of useful code is possible.

    The code buffer is managed by the compiler manager which indicates where the pointers are at the high memory and low memory ends of the buffer. As the compiled code is generated for a block, the compiled version of the block will be entered in the buffer, followed by the block of code for the outlier(s). The code for the outlier is then moved to the opposite end of the buffer. Thus the dominant path blocks and outlier blocks fill the buffer from separate ends. This improves cache density and reduces paging problems.

    In an alternative embodiment, the blocks of dominant path code and outliers can be filled from the same end of the buffer, but in blocks for each fragment of code. In the example above, the buffer would include (in order) B1, B3, B5, OL1, OL2, OL3. . . . The next fragment to be compiled would also lay down the code for the dominant path blocks followed by that for the outliers. That arrangement is, however, less preferred since address space is being used up by the outliers and there is a greater chance that code of the outliers will be pulled into the cache.

    As FIG. 1E of the drawings indicates, a processor chip 9200 may operate at a speed of 400 MHz and be associated with an on-board, first level memory cache 9202 of 16K. A second level cache 9204 of say 512K would be associated with the chip 9206. These are in addition to the normal RAM 9208 of perhaps 32 MB operating at a speed considerably less than the 400 MHz of the first and second level cache memories. In operation, the processor would pull instructions in from the cache a line at a time (32 bytes). By ensuring that the most frequently used code, that is, the compiled dominant path code, is stored in a separate memory region from the less frequently used code, the density of the most frequently used instructions in the cache can be increased. In the process, less frequently used instructions will also be stored together but in non-cache memory and will thus not pollute the cache.

    Identification of the Frequently Executed Fragments

    In order to separate the frequently executed fragments from infrequently executed fragments of a section of code, it is necessary first to identify those fragments which are frequently executed. This can be accomplished by analysing an execution run of the code and identifying the most frequently executed paths though the code (the dominant path). The dominant path can be determined from a previous run of the code. In the present embodiment of the invention, the dominant path is determined dynamically on line during a run. Detailed discussion of the determination of the dominant path can be found in Appendix 1 of this specification. In summary, the number of times each block of code is executed is recorded by an execution history recorder. The execution history recorder notes that the block has been executed and also notes from where the control has passed into the block and also notes the successor of the block (to where the control passes from the block). From that information, the most popular successors of each block can be determined and thus the dominant path can be found.

    In the case where the code is code of a Java application, the code is first translated by an interpreter. The execution history recorder is run by the interpreter and records information about the interpretation of each block. Once a block has been executed a threshold number of times by the interpreter, the interpreter passes details of the block to a queue for compilation which is managed by a compiler manager. The threshold number of times may be 5. When the compiler manager inspects the queue and takes the block for compilation, it traces the dominant path from the block using the information recorded by the execution history recorder regarding the interpretation of the block and its most popular successors. The compiler then produces a compiled version of the dominant path fragment of code as described in more detail below.

    For example, for a section of non-compiled code having a general structure as that shown schematically in FIG. 1A, a path of execution through the blocks of code is usually B1, B3, B5. When the block B1 has been executed 5 times, it is queued for compilation. The compiler traces the dominant path from B1 and finds that, although the exceptions sometimes occurred, the most popular successor of B1 was B3, and the most popular successor of B3 was B5. Thus the dominant path from B1 is B1, B3, B5. The compiler then proceeds to produce a compiled version of the dominant path.

    Compilation of the Dominant Path

    Full compiled versions of the infrequently executed pieces of code B2 and B4 are not prepared. In an alternative embodiment, compiled versions of the code could be prepared but compilation of those sections would take time and the compiled versions would occupy memory space and thus this alternative embodiment is not attractive where there is limited memory, for example in a virtual machine.

    The fragments B1, B3, B5 are laid out sequentially (see fragments B1, B3, B5 of FIG. 1C). Optimisations are made in the compilation of the code, for example using known optimisation techniques. Exception checks are inserted at relevant positions in the compiled code, the exception checks corresponding to the checks originally in the blocks B1, B3, B5 of the non-compiled code. The exception checks each include a jump to a relevant piece of code called an outlier (OL2 is shown for the exception in B1). As indicated above, it is preferred that the outlier does not just contain a compiled version of the code B4 for handling the exceptions. The outliers include code for updating any necessary states and registers before transfer of control out of the compiled version of code.

    For example, where the compiled code has been optimised, at the time of the conditional transfer corresponding to that at the end of block B1, some states may not yet have been updated at the end of block b1. Also, the compiled version of the code may hold states in different memory locations to those of the original code. The outlier OL1 updates all of the states and registers to what they would have been at the transfer of control out of the block B1 into B2. The outlier OL1 then transfers control to a conversion device which transfers control to the interpreter which then proceeds to interpret the code for B2. Once the exception has been handled, if appropriate, the control can be passed back, via the glue code, to the outlier, which reinstates the states which had been updated and execution of the compiled code can resume at block B3. See Appendix 1 of this specification for a further discussion of the role of the conversion device and the glue code.

    It will be appreciated that, in most cases, an exception will not occur and the execution will simply pass through the blocks B1, B3, B5.

    As indicated above, the compiled code is generated in the code buffer forwards and the outliers are generated in the code buffer backwards so that they are spatially separated in the buffer. Thus the outliers are less likely to be pulled into a cache. Although the execution of the exceptions (via the outliers) might be slower than for the case where the infrequently executed code was cached with the dominant path code, that decrease in speed is more than compensated for by the increased speed of execution of the dominant path, especially where the infrequently executed code is very rarely executed.

    Apparatus for carrying out the method of the present invention is shown schematically in FIG. 1F. The apparatus includes an interpreter 9300 for interpreting code. An execution history recorder 9302 records details of the execution of the code by the interpreter 9300. When a block is executed the predetermined number of times, the execution history recorder 9302 notifies the compiler manager 9304 which administers a queue of blocks for compilation. The compiler 9306 consults the queue and takes blocks for compilation, determines the dominant path from the records of the execution history recorder 9302 and compiles the dominant path fragment and prepares any necessary outliers for the fragment. The compiled fragments are loaded into the code buffer 9308. The dominant path fragments are loaded forwards in the buffer 9308 and the outliers are loaded backwards in the buffer 9308. At some time, lines of the compiled code in the buffer 9308 are pulled into the cache 9310. Compiled code is executed from the buffer 9308 or from the cache 9310 by the execution device 9312. If an exception is encountered which cannot be handled by the dominant path code, the outlier 9314 updates any necessary states and transfers to the glue code 9316 which transfers control to the interpreter 9300 which proceeds to interpret code for the handling of the exception.

    In any or all of the aforementioned, certain features of the present invention have been implemented using computer software. However, it will of course be clear to the skilled man that any of these features may be implemented using hardware or a combination of hardware and software. Furthermore, it will be readily understood that the functions performed by the hardware, the computer software, and such like are performed on or using electrical and like signals.

    Features which relate to the storage of information may be implemented by suitable memory locations or stores. Features which relate to the processing of information may be implemented by a suitable processor or control means, either in software or in hardware or in a combination of the two.

    In any or all of the aforementioned, the invention may be embodied in any, some or all of the following forms: it may be embodied in a method of operating a computer system; it may be embodied in the computer system itself; it may be embodied in a computer system when programmed with of adapted or arranged to execute the method of operating that system; and/or it may be embodied in a computer-readable storage medium having a program recorded thereon which is adapted to operate according to the method of operating the system.

    As used herein throughout the term ‘computer system’ may be interchanged for ‘computer,’ ‘system,’ ‘equipment,’ ‘apparatus,’ ‘machine’ and like terms. The computer system may be or may include a virtual machine.

    In any or all of the aforementioned, different features and aspects described above, including method and apparatus features and aspects, may be combined in any appropriate fashion.

    It will be understood that the present invention(s) has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.

    Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

    APPENDIX 1

    The present invention relates to a computer system and to a method of operating a computer system. In particular, the invention relates to computer systems including a compiler for compiling code for execution. In a preferred embodiment, the invention relates to Dynamic Compilation of the Dominant Path.

    This invention is preferably related to the optimisation of the runtime representation of object-oriented computer languages by means of runtime compilation technology and preferably to the optimisation of the runtime representation of object-oriented computer languages by means of runtime compilation technology. Aspects of the invention are related to optimised execution of virtual machines, and in particular Java virtual machines.

    The invention relates in particular to trace scheduling, optimising compilers, dynamic compilation, profile guided optimisations, just in time compilers and the Java VM specification.

    In some applications, for example using the Java language, code may be interpreted directly using an interpreter. The interpreter translates the code during execution and thus, the interpretation of code can be very slow. The execution of compiled code is therefore preferred since such execution is generally significantly faster than interpretation.

    Standard compilers translate all of the code of an application to give a complete compiled runtime representation of the code for execution. Such standard compilation is time consuming, especially where optimisation of the compiled code is desired, and is usually carried out off-line before execution of the code.

    The Just-in-Time (JIT) compiler provides on-line compilation of code. For example, using a JIT compiler, when a method is first encountered in the execution of the code, the execution is stopped and the JIT compiler compiles the whole of the method, optimising where possible. Thus the JIT compiler compiles the whole method, including parts of the method which are unlikely to be used. Such compilation wastes time in compilation and the compiled version of the code takes up space in the memory. This can present a particular problem for an embedded application where minimising the use of memory is of importance.

    Generally, compilers of the runtime representation of computer languages and in particular so-called Just-in-time (JIT) compilers, compile the representation of a whole method a time, or a larger unit (for example, a file or one of many classes at a time). Often a significant portion of an application relates to handling exceptional situations, or rarely executed code. Typically, the compiler blocks any further progress of the application until the compilation completes.

    The conventional compilation approach therefore spends time compiling code which is rarely executed, and the compiled result occupies space which would have not been needed if the rarely executed code were not present. Optimisation opportunities are often reduced by having to cater for control paths through the rarely executed code.

    Offline compilers which use profile input from a previous run of the application can often optimise the frequently executed paths of an application to mitigate the latter problem. However they still must compile every path through the application, and cannot easily react when an application exhibits different behaviour, to that of the profile run.

    For the JIT compiler, when the ‘invoke’ instruction for a method is encountered, control is passed to the JIT compiler and, if the method has not previously been compiled, a compiled version is created. The compiled version is then used for the subsequent execution of the method. Once the budgeted memory available to the JIT compiler is used, the compilation of new methods is not possible and the use of the JIT compiler ceases. Methods subsequently found will be interpreted, thus slowing subsequent execution of the non-compiled code.

    The amount of memory available to the compiler varies depending on the computer system used. The overall memory allocated to the compiler includes the code buffer space, the space allocated to the compiler for building required internal data structures and for register allocation. That memory is usually set aside for the compiler prior to compilation.

    JIT compilers were designed for use on desktop computer systems having plenty of memory. The memory allocated to the compiler is generally so great that the amount of buffer space available to the compiler is, in practice, unlimited.

    For embedded systems, however, the amount of memory allocated to the compiler might be 70 or 80K. Clearly, that imposes constraints on the amount of code that may be compiled.

    In summary, the Invention described in this application involves any, some or all of the following features, in any combination:
    • 1. Compile fragments of code for the dominant path rather than whole methods.
    • 2. Use execution history to determine which paths through the application are the dominant ones.
    • 3. Use a fallback interpreter to interpret infrequently executed code.
    • 4. Have an online compilation system which can compile code on demand as the application executes. This system does not block progress of the application. The system runs as a separate thread, whose priority is adaptive.
    • 5. Have the ability to incorporate new fragments of code into a running multi-threaded system.
    • 6. Support removal of fragments of code from a running multi-threaded system.
    • 7. Constrain the amount of memory used by the dynamic compiler during its execution at any time.


  • The invention described in this application aims to, among other things, reduce the performance impact of online compilation, generate code which is optimised for the dominant paths through an application, allow better optimisation of code, within time and memory constraints, reduce the storage overhead of compiled code which is rarely executed, improve application responsiveness in a multi-threaded computer system, and reduce the amount of memory used by the compiler itself.

    According to the present invention, there is provided a computer system including a compiler for compiling the code of an application, wherein the compiler is arranged to compile a fragment of the code.

    By compiling only fragments of code rather than whole methods, it is made possible only to compile the most desirable sections of code, leaving the less desirable fragments uncompiled.

    By this method, the compilation may be made more efficient as only those fragments required are compiled. Also, the memory of the system need not be filled with compiled versions of rarely executed code.

    Where reference is made to a fragment of code, it preferably refers to a section of code which represents less than a whole method. Preferably the fragment of code includes one or more blocks of code. It is preferred that the smallest unit of compilation of the code is a block.

    A particularly preferred feature of the invention is that the fragment of code is a dominant path fragment of the code.

    It will be understood that a dominant path fragment includes a fragment including a number of blocks of code which represents a preferred execution route through the relevant code. For example, where a section of code includes a conditional branch, on repeated execution of code through the branch, one path through the branch is likely to be preferred over another path through the branch. The fragment of code associated with the preferred route through the branch is preferably considered to be a dominant path fragment.

    As indicated below, in some cases, another less preferred route through the branch may also be a dominant path.

    In a preferred embodiments of the present invention, the dominant path fragments of code include code which is frequently executed. Preferably, the dominant path does not include infrequently executed code. Such infrequently executed code may include, for example, code for handling infrequently encountered exceptions.

    By compiling only the dominant path, in accordance with a preferred embodiment of the invention, the storage overhead of storing compiled code which is rarely executed can be minimised. Further, optimisation techniques can be used to optimise the execution of the dominant path code thus increasing the speed of execution of the dominant path code. Further, the compiler need not waste time on-line in compiling rarely executed code and so the overall speed of execution in the system can be improved.

    In a preferred embodiment of the invention, a fragment of code is considered to be part of a dominant path if it is executed more than a predetermined number of times.

    Preferably, the computer system further includes an execution history recorder for recording the number of times a fragment of code is executed, preferably interpreted.

    Preferably the execution history recorder records the number of times a block of code is interpreted.

    In preferred embodiments of the invention, as well as recording how many times a particular block has been interpreted, the execution history recorder also records further information regarding the execution of the block, for example, from where the transfer of control into the block came and to where control was transferred out of the block. The recorder preferably also records what type of code was executed in the block.

    Preferably a fragment which has been interpreted a number of times which is equal to or greater than a threshold is able to be compiled. Preferably, the threshold is greater than or equal to 2, 5 or even 10.

    Thus the frequently executed blocks of code are compiled. It is generally unpreferable for unexecuted blocks to be compiled. In preferred embodiments of the invention, no unexecuted blocks are compiled.

    Preferably the system further includes a compiler manager and the execution history recorder is arranged to alert the compiler manager when a fragment of code has been interpreted the threshold number of times. In preferred embodiments of the invention, the compiler manager administers a queue of frequently executed blocks for compilation. Preferably the queue is managed in such a way that only the more frequently executed blocks are chosen from the queue for compilation by the compiler.

    Preferably, the threshold is able to be dynamically tuned. For the example above, in which the compiler manage administers a queue, if the queue is persistently long, the threshold is preferably raised so that fewer blocks are sent to the queue for compilation.

    It is highly preferable for the execution history recorder to be arranged to record during the execution of the application. It is preferred for the execution history to be collected on-line so that a representation of the dominant path for the particular execution of the application by the system may be determined and used to generate the compiled code. In the alternative, when information regarding the dominant path is captured from a previous run, there is a risk that conditions may have changed from the previous run and the dominant path of the previous run is not a representation of the dominant path of the present run. Furthermore, the dominant path may change during a run.

    Preferably, the system further includes an interpreter for interpreting the code of the application and the execution history recorder is arranged to record the interpretation of fragments of code. It is more efficient for the interpreter to manage the execution history recordal. It is envisaged that the recordal of execution of compiled fragments of code could be carried out but in many cases it is thought that it would not be worthwhile having regard to the time and memory required to do so.

    Most preferably, the execution history recorder is arranged to record a path of execution from a first fragment to a second fragment. Preferably, the path of execution from a first block to a second block is recorded. In a preferred embodiment, the execution history recorder records, for the execution of a particular block, to where control was transferred from the block. Thus, for a particular block, the most likely successor block can be determined. Thus a dominant path from the particular block can be determined. If the particular block passes the threshold number of executions and is compiled, a dominant path from that particular block through the most likely successors can be compiled.

    Thus, preferably, the compiler is arranged to compile a path of fragments.

    Preferably, the system is arranged so that only fragments in which all of the code has been executed are able to be compiled. Some sections of code are not always suitable for compilation. If sections of the code have not been executed, the unexecuted portions might include "hidden" code which is unsuitable for compilation. Compilation of such unexecuted code is avoided in preferred embodiments of the invention.

    In embodiments of the present invention, a block of code is unsuitable for compilation if it has not executed all the way to a control transfer. As a result, there may still be symbolic resolution required—a job left for the interpreter to implement.

    Preferably, the compiled version of the dominant path exposes only one external entry point to the rest of the system. Therefore, assumptions may be made in the compilation of the code. Thus the compiler is preferably arranged to create compiled fragments having only one external entry point.

    Where the fragments of code are compiled, preferably the compiler is able to optimise the compiled code. Such optimisations might include inlining. Where compiled code has been optimised, in particular where assumptions have been made when optimising the code which might later prove to be untrue or too limiting, preferably the compiled code is associated with a marker to indicate that a particular optimisation or assumption has been made.

    In preferred embodiments of the invention, several optimisations are made, in many cases using various assumptions, to produce particularly efficient compiled code for the dominant path.

    Preferably, the system includes a fallback interpreter. Preferably the fallback interpreter is not used when a compiled version of code is available, but is used when no compiled version is available, an exception occurs, or an assumption proves false during execution.

    Preferably the system includes an interpreter and at least one portion of compiled code wherein, on execution of the code, at least a first portion of the code is executed from compiled code and at least a second portion of the code is executed from non-compiled code by the interpreter. Preferably, the system uses a fall back interpreter.

    This feature is of particular importance and may be provided independently. Thus, a further aspect of the invention provides a computer system including an interpreter and further including the code of an application, the code including at least one portion of compiled code, wherein, on execution of the code, at least a first portion of the code is executed from the compiled code and at least a second portion of the code is executed by the interpreter.

    The interpreter can be used where there is no compiled versions of the code available or, for example, where assumptions made in the compilation of the code are found to be untrue. Thus more aggressive optimisation is thus made possible to produce optimised code which might not be ‘safe’ to use in all cases. Where a case is identified in which the compiled version is not safe to use, the fallback interpreter can complete the execution of the necessary code without excessive disruption to the execution and without the need to cease execution while a fresh compiled version of the section of code is produced.

    Preferably the system further includes a searching device for determining whether there is a compiled version of a fragment of code. Thus, the possibility of time being wasted when an interpreter interprets a section of compiled code is available, is reduced. Preferably, the compiler is able to compile on-line. Thus the compiler is able to create compiled versions for any new dominant path fragments which may appear during a run.

    In a preferred system, the system is multi-threaded. Preferably the compiler runs on a separate thread to the thread executing code.

    Preferably, the compiler is able to limit the memory which is used by itself and by the compiled fragments. Thus the compiler preferably has a memory management policy enforced by the compiler to limit the memory used by compilation. This is of particular importance for virtual machines which have limited memory. Preferably the system also includes a deletion device for deletion of compiled code. Thus compiled versions of less frequently used code are able to be deleted to release memory for new compiled code.

    The present invention finds particular application for virtual machines, in particular in embedded systems. It is envisaged that the invention could also find general use in systems for which there is the choice of executing compiled code and interpreting code. The invention is of particular use in systems having memory constraints.

    The invention also provides a compiler for compiling code in a computer system, the compiler being arranged for the compilation of a fragment of code. Preferably the compiler is arranged for the compilation of a dominant path fragment of the code.

    Accordingly, the invention provides a computer system containing a compiler for compiling the operating code of an application, in which only dominant path (or near dominant path) fragments of the code are compiled.

    This technique can afford the primary advantage of enhancing performance and reducing compiled space. It is important for a small memory application and involves a mixture of trade offs between memory size, compilation time and performance.

    In its preferred form, it also enables the use of key optimisation techniques, involving loops and inlining, without the overhead of global dataflow analysis, and hence allows the compiler itself to execute much faster than compilers that do perform global dataflow analysis. The memory usage of the compiler itself is also much lower.

    In the system as defined, advantageously only the dominant path of execution is compiled, rather than all the paths through the code, while the remaining paths are interpreted.

    It is a particularly preferred feature that the compiler is operating on-line, in the sense that as the operating code is running parts of it are being compiled; what is termed the dominant path may be constantly changing as execution of the code progresses.

    The invention further provides a method of operating a computer system, the computer system including a compiler for compiling the code of an application, wherein a fragment of the code is compiled.

    Preferably, the number of times a fragment of code is executed is recorded by an execution history recorder.

    In a preferred embodiment wherein the system further includes a compiler manager and the execution history recorder alerts the compiler manager when a fragment of code has been executed a threshold number of times, and preferable wherein the execution history recorder records during the execution of the application.

    The invention provides in a further aspect a method of operating a computer system including an interpreter and further including the code of an application, the code including at least one portion of compiled code, wherein the method includes executing at least a first portion of the code from the compiled code and executing at least a second portion of the code using the interpreter.

    Preferably the compiler compiles on line. Preferably the memory available to the compiler is limited and preferably the method further includes the step of deleting compiled code.

    Also, according to the invention, there is provided a method of operating a computer system containing a compiler for compiling the operating code of an application, the method including compiling only the dominant path fragments of the code.

    The method can enhance the performance and reduce the compiled space requirement of the computer system and the memory space requirements of the compiler itself.

    Advantageously, information identifying the dominant path is provided from the execution history of the code. The execution history information is preferably derived dynamically as the program runs. The execution history information is advantageously captured from a previous run of the code.

    In its preferred embodiment, infrequently executed code is interpreted in a fallback interpreter, whereby preferably execution of the code can continue without the need for compiled code for the infrequently executed code.

    Advantageously, an online compilation system is provided which can compile code on demand as the application/program executes whereby compilation information can be generated in response to the appearance of a new frequently executed path.

    When the computer system is operating in a multi-threaded system, new fragments of code are preferably incorporated into the multi-threaded system, whereby preferably to achieve smoother operation without stopping running threads.

    The invention further provides a method of operating a computer system containing a compiler for compiling the operating code of an application, the method including compiling only the dominant path fragments of the code.

    Preferably the method includes compiling a fragment of the code and preferably includes compiling a dominant path fragment of the code.

    The invention also provides the use of a fall back interpreter to execute infrequently executed code.

    Further provided by the invention is code for a computer system, the code including compiled code produced by a method as aforesaid.

    Any, some, or all of the features of any of the aspects of the invention may be applied to any other aspect.

    The following considerations apply to any and all the inventions and aspects of the inventions described above.

    1. Compile Fragments of Code for the Dominant Path rather than Whole Methods.

    A summary of a preferred embodiment is as follows:

    The compiler takes as input the runtime representation of the source program, and execution history information (which may be obtained as described below). The execution history information could be live (that is, dynamically changing as the program runs), or captured from a previous run of the program.

    Execution history information is combined with structural information determined from the runtime representation of the program source, to establish what is the dominant path of the program the compiler should compile. Unexecuted code is preferably never included in the dominant path.

    The compiler treats the dominant path as a super-block fragment, laying the code out sequentially, even though the program source may not be. Branches and tests are adjusted where necessary to make the dominant path fall-through. Code and registers are optimised with the assumption that the dominant path will be followed to the end. This improves performance on modern processor architectures. Critically, the dominant path only exposes one external entry point. This greatly simplifies and enhances optimisations.

    As shown in FIG. 1A, where the path of execution would leave the dominant path, the appropriate run-time tests are inserted with a forward branch 1000 to some stub code referred to as an "Outlier" 1002. The outlier stub updates any state which the dominant path has not written back yet, before transferring control out of the fragment. The mainline code of dominant paths are generally kept together, as are the outlier stubs as shown at 1002. This improves performance on modern processors, especially where branch prediction software/hardware initially assumes that forward branches are less likely. It also provides better instruction cache behaviour.

    Compiling dominant paths of execution allows loop optimisations and inlining to be performed, while simplifying the analysis required for many optimisations. It obviates the need for the compiler to have to resolve symbolic references. That is left to the fallback interpreter.

    For example, when loading a new class symbolic references are used, for example, for fields so that when the first time the reference is seen it is necessary to load the class hierarchy satisfying the symbolic references. Where, in a preferred embodiment of the invention, all of the relevant code has been interpreted at least once, the symbolic references have already been resolved before the code is compiled.

    Often exceptions need to be recognised in the middle of a loop after some global state has changed. The exception check can be performed early outside the loop, forcing the code into the fallback interpreter, thus allowing the check to be removed from the loop, and code motion to be performed in the presence of those exceptions.

    The fallback interpreter will execute the loop and recognise the exception at the right time, albeit more slowly. It is assumed that exceptions rarely occur, and therefore the benefits of the optimised loop will outweigh the disadvantages.

    Various optimisations can be made in compiling the code. The optimisations may be made at block level or may be more widespread, in particular where several blocks are involved. An advantage of the preferred embodiments of the invention is that flow analysis need not be carried out. Registers are preferably used for the compiled code to give faster execution of the compiled code.

    Where the fall back interpreter is available for use, it is possible to make various assumptions when compiling the code and to omit several safety checks which might otherwise have been required if no fallback interpreter were available. If later any of the assumptions is proved wrong, or if the lack of safety checks would cause something to go wrong, the fallback interpreter can be used to interpret the relevant non-compiled code.

    When the compiler is being executed online as the application is executed, the compilation overheads are often critical. By only compiling the dominant path, the compiler is simpler, quicker, and uses less memory for its analysis and therefore can afford to perform more optimisations than would otherwise be feasible, especially in a small memory system.

    2. Use execution History to Determine Which Paths Through the Application are the Dominant Ones.

    Execution history is captured as the application executes. It is maintained at the block level, when a transfer of control occurs. It is preferred for the execution history recorder to record when a block is entered (when the transfer of control into the block occurs). The execution history recorder may also record other details relating to the execution of the block, for example which is the next block (successor) that was executed after the block in question. Thus information about the preferred route of execution through the blocks of code may be obtained rather than only information about individual blocks.

    For each block an entry count and list of successors is kept with a count associated with each. These counts act as an indicator of popularity. Execution history records also contain an indication of what instruction caused the transfer of control which ends the block. Only blocks that have executed up to the transfer of control are candidates. For blocks which have not executed all of the way through, it is not known what type of code is ‘hidden’ in that part of the block which has not been executed. Such hidden code might contain code which requires symbolic resolution. It is therefore preferred that such blocks are not compiled. Where the count of the block is made in the execution history recorder as the control is transferred from the block, only blocks which have executed to the end will be counted. Alternatively, or in addition, checks can be carried out prior to compilation to check whether the block has executed to the end.

    When memory is constrained, execution history records are recycled in two ways. Firstly, the list of successors is limited to a small number, and when a new successor is encountered the least popular existing successor is replaced with the new one. When there are no free execution history records, all of the history records associated with the least frequently used method are moved to the free list.

    In summary, compilation of a fragment is triggered by the entry count of a block exceeding a given threshold. The threshold may be fixed, or dynamically tuned. However, if the state of the history block indicates that the block is already queued to be compiled, or is not compilable, it is ignored. Such a block may not be queued for compilation.

    In a preferred embodiment, when the code is first executed, none of the code is compiled. Execution is initially carried out by the interpreter. As each block is interpreted, the count of the block held by the execution history is increased by one. The execution history recorder records, for each block, from where the transfer of control into the block came and to where the control was transferred from the block. The execution history may also contain further information about the execution of the block, for example the type of code executed in the block. A threshold is set and when the count for a particular block reaches the threshold value, the block is entered on the queue for compilation. The threshold may be 5; when a particular block has been executed 5 times, it is entered on the queue.

    The compiler is associated with a compiler manager which manages the queue of blocks for compilation. When a particular block reaches the threshold number of executions, the execution history recorder sends a message to the compiler manager to enter the block on the queue for compilation. The compiler is running on a separate thread and checks at intervals to see whether there is an item for compilation in the queue and, at some time, the compiler will start to compile the block referred to at the top of the queue.

    In a preferred embodiment, the queue is managed so that new entries onto the queue are entered at the top of the queue and are therefore most likely to be compiled. When the queue is managed in that way, blocks which reach the threshold many times are more likely to be compiled than blocks which reach the threshold only a few times, or once. So that the queue does not become unmanageable, the compiler manager may delete part or all of the queue from time to time.

    If it is found that too many blocks are being queued for compilation, the threshold can be raised. Equally, if few, or no, blocks are being queued for compilation, the threshold can be lowered. This can be carried out dynamically during the execution of the application. The compiler manager can monitor the length of the queue and, if desired, send a message to the execution history recorder to increase or decrease the threshold.

    When the compiler compiles a block which is queued by the compiler manager, it may proceed to compile just that single block. It is preferred, however, that the compiler uses the information gathered by the execution history recorder regarding the successors of the block and compiles not only the single block which has reached the threshold but also the most popular successors of the block, thus compiling the most popular path from the block (the dominant path). It will be appreciated that the successors of the block may or may not have been executed the threshold number of times to be eligible for compilation in their own right but, nevertheless, are compiled as a part of the dominant path from a block which has been executed the threshold number of times.

    When the compiler takes a block for compilation, it carries out checks to determine whether the block is one which is desirable to compile, for example, if it is able to be compiled, and whether there is already a compiled version of the block available.

    The compiler then traces the dominant path (though the most popular successors of the block) until it gets to the end of the method or comes across a piece of code which it is not desirable to compile, for example because a compiled version already exists. Other code which is not desirable to compile would be code which merges back into the dominant path other than at the original block that triggered compilation. Flow analysis would be required for optimal compilation otherwise. The compiler detects and prevents such control flow merges from occurring (having determined the likely flow at a branch, the unlikely flow is handled by generating code to exit the fragment). It will not pass beyond the end of the method but it will follow, for example, invokes to follow the dominant path. When the compiler stops in its tracing of the dominant path, it starts to compile the code, starting at the beginning of the dominant path.

    When a compilation triggers, the dominant path can be determined by following the most popular successors a block at a time, including following method calls.

    Generally speaking, execution history of the running application is a good indicator of which paths are the dominant ones.

    It will be appreciated that, where there are two or more paths through a method, both or all of the paths through the method may be dominant paths and be compiled if the relevant blocks are executed sufficient times.

    Execution history does not need to be accurate, and can be updated in a number of ways. Rather than track execution history in compiled code, which would slow execution down significantly, execution history is maintained by the fallback interpreter.

    3. Have a Fallback Interpreter Which Interprets Infrequently Executed Code.

    Having a fallback interpreter means that when infrequent or exceptional code is executed, execution can continue without the presence of compiled code for it. The fallback interpreter maintains execution history. It also means that all issues to do with class resolution can be solely handled by the fallback interpreter.

    Where only the dominant path of the code is compiled, where the path of execution leaves the dominant path, interpretation of non-compiled code will be necessary. Furthermore, optimisations may have been carried out in the compilation of the compiled code and, if it is discovered at a later stage that assumptions which were made in the optimisations were incorrect, the fallback interpreter is used to interpret the relevant section of code. Also, the run starts execution using the interpreter before any compiled versions of the code have been created.

    It will be seen, therefore, that there are many occasions where it might be necessary to pass control of execution from the compiled version to the interpreter and away from the interpreter when compiled code is available.

    As is described in more detail below for a particular embodiment, while the interpreter is translating code, checks are carried out to see if there is a compiled version of the code next to be executed. Thus unnecessary interpretation can be avoided.

    Again, as discussed in more detail below, when control is passed to and from the interpreter and between separate pieces of compiled code, special conversion devices are provided. Examples of such devices are "glue code" and "outliers". The conversion devices help to ensure the smooth transfer of execution between compiled versions of the code. They hold, for example, information regarding the address of code to be interpreted at the end of a compiled section and are of particular importance where optimisations have been made in the compiled version to ensure that the variables are up to date and are stored on the correct registers, for example, when the execution is transferred.

    For example, when a jump is made from the compiled code to the interpreter, the interpreter expects memory state to be current, so if a memory location has been put into a register for the compiled version, it needs to be returned to the correct memory location before the interpreter proceeds.

    4. Have an Online Compilation System Which can Compile Code on Demand as the Application Executes.

    As and when application behaviour changes, a dynamic compiler can generate optimised code for any new frequently executed paths which show up. By running as a separate thread, this allows the application to continue useful work via the fallback interpreter.

    5. Have the Ability to Incorporate New Fragments of Code into a Running Multi-threaded System.

    Smoother operation is obtained if a new fragment of code can be incorporated without stopping running threads.

    Once the compiler has completed the compilation of the dominant path for a particular block, it sends a message to the compiler manager that the compilation has been completed. Until complete, the compiled code is kept from the executable code. The compiler manager loads the compiled code in the executable code. The necessary changes are made in the dispatch tables and code cache to indicate that the compiled code is available for the relevant block and where the compiled code is.

    The introduction of the compiled code is carried out atomically so that the stopping of running threads is not required.

    6. Support Removal of Fragments of Code from a Running Multi-threaded System.

    Removal of code fragments is also key to being able to operate in restricted memory environments. It also allows code which was optimised for one dominant path to be replaced with different code when new dominant paths appear. Code can be compiled with optimistic optimisations on the basis that they can be deleted if the optimistic assumptions under which the code was compiled are broken.

    As indicated above, where assumptions made about the dominant path are found to be incorrect for subsequent execution of the code, the fallback interpreter can be used to interpret a non-dominant path through the code. However, if a dominant path which has been compiled is subsequently executed infrequently, it would be desirable to remove the compiled version of the code to release the memory used by the compiled version.

    In some embodiments, the number of times of execution of each piece of compiled code is monitored and, if it is executed infrequently, can be marked as suitable for deletion.

    In a preferred embodiment, the number of times a code buffer is accessed is recorded. Before passing control into a buffer, its execution count is increased. The least popular buffer may be deleted when desirable.

    For example, at a certain point, the compiler may run out of code buffer space. A buffer is then deleted. If a count has been made of the number of times control has been passed into the various buffers, the least popular buffer may be deleted. Alternatively, the oldest buffer may be deleted.

    It will be appreciated that various checks will usually be carried out before the deletion of the buffer to reduce the risk of disruption to the system. See, for example, Appendix 3 of this specification.

    The fact that compilation costs can be radically reduced is illustrated by the schematic diagram in FIG. A1-2 in which the comparative time taken up in profiling, compiling and executing at full speed for the invention 1020 and the typical prior art 1022 are shown as a proportion of a 10-second time slot.

    Use of the dominant path also allows the dynamic compiler to be memory constrained by truncating a fragment some way along the path when the compiler reaches its budgeted memory limit. This is impossible in prior-art compilers.

    Thus, when the compiler has used all of its allocated memory, the compilation of a fragment can be terminated. It will be understood that suitable steps would usually need to be taken so that at the end of the truncated compiled fragment, control can be passed back to the interpreter so that execution can continue at the correct byte code address and with the correct updated parameters and register structures, where required.

    It is crucial in small memory computer systems that the compiler adheres to a memory budget. Prior art compilers typically view memory as an unlimited resource. Hence they may consume large amounts of memory during compilation, to build internal representations of its input program, and to hold results of dataflow analysis and the like.

    In contrast, the dynamic compiler works within external configurable constraints imposed upon it at system start up or build time. It then compiles as much of a fragment as it can within these constraints. If necessary, it truncates the fragment, by relying on the feedback interpreter to receive control at the truncation point. This is impossible in prior art compilers, where the unit of compilation is a method or greater, and where no interaction with a fallback interpreter is available.

    There now follows an example of a run in which execution history is used to determine a dominant path, the dominant path fragment is compiled and execution switches between compiled and non-compiled code.

    The system described includes a virtual machine (VM) and includes an interpreter (in C language) and a Java application. The system is multithreaded and includes a Java main thread, a Compiler Manager thread and a compiler thread.

    For example, the Java application includes Class A:

    Class A
    static main ()
    { for (i=f; i<100; i++) Aa=newA(); a.method(i); }

    The Java thread is started:
    • Java A
    • class load A


  • Class A is loaded and A's dispatch table is loaded. The dispatch table is shown schematically in FIG. A1-3. FIG. A1-3 shows A's dispatch table 1030 having various address entries 1032. For example, the main method is located at address 4000.

    The main program of the VM identifies the address of the method main A at 4000 and calls glue code:
    • call glue (4000)


  • Glue code is a part of the conversion device which enables the execution to switch between the use of the interpreter and the execution of compiled code. Glue code includes several devices for effecting smooth transfer between the execution of compiled code and non-compiled code. Glue code includes sections for one or more of:
    • 1. updating states of memory locations and register states.
    • 2. passing control to the interpreter when no compiled version of code is available or optimisations made in compiling code are found to be inappropriate.
    • 3. passing control away from the interpreter when a compiled version of code for execution is available.


  • The conversion device may include outliers as described above for updating the states. For example, when an exception is encountered in execution of compiled code, control may pass first to an outlier for states to be updated before passing to the glue code for instructing the interpreter to begin executing the code for dealing with the exception.

    The glue code then calls the interpreter to start to execute code beginning at address 4000:
    • call interpreter (4000)


  • The interpreter starts at address 4000 and executes the byte code until it reaches the invoke instruction. The interpreter returns to the glue code which determines that the interpreter is trying to perform the invoke. The interpreter knows where the invoke is in the dispatch table, and tells the glue code.

    The glue code takes the object reference for the method off the stack and looks at the dispatch table to get the address for the method.

    If a compiled version of the start of the method is available, the address of the compiled version will be entered in the dispatch table, and the compiled version is executed.

    If there is no reference to a compiled version of the start of the method, the dispatch table includes an entry for "invoke glue" and a return is effected to a separate section of the glue code which starts interpretation of the method at the relevant address:
    • call interpreter (5000)


  • When the interpreter jumps into the method, it sends a message to the execution history recorder that the method is about to be executed.

    At the end of the method, there is a return, and the interpreter returns to the glue code which returns the execution to the previous method for interpretation or execution of a compiled version as indicated above.

    The glue code includes a dedicated portion for handling returns which ensures that the register, stacks, and so on are correct for the execution of the next piece of code. For example, where the method has been executed from a compiled version and the next piece of code is to be interpreted, anything put onto registers for the compiled version has to be restored into the correct memory location before the next section of code is generated. Thus the return handling glue code restores any states which have been altered as a result of the use of the compiled code.

    Thus the return to the glue code further returns to the return handling glue code before execution passes to the next portion of code.

    The various portions of glue code described above may all be a part of the same piece of glue code, or may be separate glue code pieces. The updating of the states may be carried out by outliers as described above and in the "Detailed Description of the Invention" of this specification.

    A further example below describes the action of the interpreter for a transfer of control other than an invoke.

    In this embodiment, the following method has just been invoked and is to be executed using the interpreter:

    void func (int p, int a) {
    int x=p; for (int i=a; i<p; i++)
    { x=x/i; }
    }

    The interpreter executes the method in byte code, symbolised in numbered lines as follows:

    Bytecode: Java:
    Ø iload_1 x = p; 1 istore_3 2 iload_2 i = a; 3 istore 4 5 goto 16 8 iload_3 x = x/i; 9 iload 4 11 idiv 12 istore_3 13 i inc 4 1 i++; 16 iload 4 i < p?-reiterate if true 18 iload_1 19 if_icmplt 8 22 return

    The method void func is called for the first time. There is no compiled version so the method starts execution by the interpreter. At execution time, the following blocks (groups of lines of code) are recognised by the interpreter:
    • b1={0-5}
    • b2={19}
    • b3={8-19} (not a basic block)
    • b4={22}


  • The interpreter executes the first block b1. The interpreter runs an execution history recorder in which it records that b1, has been executed once and has a count of 1. (Preferably, it also records that the successor of b1, is b2 and that b1was executed all of the way through. For simplicity, references to the recordal of such extra information is omitted below).

    At the end of the block, the interpreter consults the code cache to see if there is a compiled version of the next block b2. (Note that in this example, while there is a transfer of control from one block to another, there is not an invoke and thus there is no return to the glue code. In an alternative embodiment, the interpreter might return to the glue code after every block, but that is likely to be time consuming. In the preferred embodiments described herein, the interpreter only returns to the glue code when
    • a. it encounters an invoke,
    • b. it encounters a return,
    • C. it finds from the code cache that there is a compiled version of the next block, or
    • d. via an exception.


  • In this case there is no compiled version, so the interpreter proceeds to execute b2, giving b2 a count of 1 in the execution history recorder. The interpreter consults the cache again and, finding no compiled version of b3, proceeds to execute b3. For the present example, the loop is repeated 3 times so when a return is made from the method by block b4 (going through the return handler glue code as described above), the counts of the blocks in the execution history recorder are as follows:
    • b1=1
    • b2=1
    • b3=3
    • b4=1


  • If the threshold for compilation is 5, none of the blocks b1, b2 or b3 will be queued for compilation.

    After the next time the method void func is called, the counts will be as follows:
    • b2=2
    • b2=2
    • b3=6
    • b4=2


  • Thus the execution history recorder sends a message to the Compiler Manager to queue b3 for compilation. At some later time, the compiler will consult the queue, and compile b3. Before compilation, the compiler determines the dominant path from b3 using the record for b3 in the execution history recorder which indicates the successors of b3. In this simple case, the most popular successor of b3 is b3 so that only the single block b3 representing the loop is compiled. The compilation of b3 may be optimised for example by using registers to store the values of p, x, i and a. A pre-exception condition check could be inserted for an i=0 check (division by zero) (see Appendix 4 of this specification). When the compiler has completed the compilation, it notifies the Compiler Manager what compilation has been done, where the compiled version is and whether it includes a method entry point or not. The compiled version is not available for execution at this time.

    In due course, the compiler manager will load the compiled version of b3. The code cache is updated so that the host code address for that part of the method now points to where the compiled code is.

    At a later time when the method func is called, the interpreter consults the code cache after execution of b2 and finds that a compiled version of b3 is available.

    The interpreter returns to the glue code which, as described above, effects the execution of the compiled version of b3.

    At a later time still, the method func will have been executed 5 times so that b1 and b2 are queued for compilation.

    When b1, is taken for compilation, the compiler will determine the dominant path from b1. The successor of b1 is b2 (the compiler does not consider b3 for compilation as part of the dominant path on this occasion because there is already a compiled version).

    The fragment b1 and b2 is compiled and the dispatch table is updated.

    On a subsequent execution, the compiled code for b1/b2 is executed, a return is made to the glue code, which effects execution of the b3 compiled code. If the path from compiled b1/b2 to the glue to the compiled b3 is effected a sufficient number of times, a patch connecting the compiled b1/b2 to compiled b3 may be made. (Patching is described in more detail under Appendix 2 of this specification). Thus the execution can be made more efficient because the step through the glue is no longer required.

    At a later time, a memory manager associated with the compiler manager decides that memory for the compiler should be freed. The oldest buffer chosen for deletion includes the compiled version of b3. The compiler manager calls the deleter to delete the buffer. Certain checks have to be carried out before deletion (see for example Appendix 3 of this specification). In the example given above, there is a particular problem because a patch was inserted between the compiled code for b1/b2 (which is not deleted) and the compiled code for b3 (which will be deleted). For a discussion of how this problem may be overcome, see Appendix 2 of this specification).

    FIG. A1-4 shows apparatus 1040 suitable for carrying out the embodiment described above.

    The apparatus 1040 includes an interpreter 1042 for interpreting Java code 1043 in the computer system. When the interpreter reaches the end of a block of code, unless there is an invoke or a return, it consults the code cache using the code cache searcher 1044 to see if a compiled version of the next block is available. If there is, the converter device 1046 (which includes the glue code referred to above) carries out the necessary changes and alterations before passing control to an executer 1048 for executing the compiled version 1049 of the code.

    As interpreter 1042 executes, it records in the execution history recorder 1050 which blocks of code have been executed as well as further details about the execution of the block, for example which blocks were executed before and after the block and what type of code was executed.

    The execution history recorder 1050 notifies the compiler manager 1052 when a block is executed a threshold number of times. The block is held in a queue 1054 managed by the compiler manager 1052. A threshold tuner 1056 monitors the length of the queue from information from the compiler manager 1052. Based on information regarding the length of the queue, the threshold tuner 1056 alters the threshold for the execution history recorder 1050 to send a block to the compiler manager.

    A compiler 1058 compiles blocks referred to in the queue 1054. The compiler 1058 uses information from the execution history recorder 1050 regarding the execution of the block to determine the dominant path from the block and prepares a complied version of the code. When the compiled version is complete, the compiler 1058 notifies the compiler manager 1052 which updates the necessary dispatch tables and code caches and loads the compiled version.

    The compiler manager 1052 includes a memory manager 1060 which monitors the memory available to the compiler 1058. If memory available becomes low, the memory manager 1060 instructs a deleter 1062 which is coupled to the memory manager 1060 to delete some of the compiled code. Also, if the queue 1054 becomes too long, the compiler manager 1052 instructs the deleter 1062 to delete some or all of the queue 1054.

    FIG. A1-5 shows paths of execution through code of a method generally referred to as 1066.

    The figure shows schematically various fragments of code, for example 1068, 1070, 1072. Such fragments of code may each represent a block of code.

    The code shown in the Figure has one external entry point 1074. After block 1072, there is a conditional branch 1076, for example an exception check. If an exception occurs, the execution passes along path A to code 1078 to handle the exception. Otherwise, code passes along path B to code block 1080 at which point there may be a call (path C to block 1082) or the execution may follow path D to code sections 1083, 1084. Execution may pass along path E to block 1085 or path F to block 1086.

    Information about execution runs through the code 1066 is recorded on the execution history recorder 1050 run by the interpreter 1042.

    If block 1068 is found to have been executed by the interpreter the threshold number of times, it is passed to the queue 1054. The compiler 1058 consults the execution history in the recorder 1050 and finds that:
    • 1. The more popular successor of 1072 is 1080 (that is, execution passed along path B more often than along path A);
    • 2. The more popular successor of 1080 is 1083 (that is, execution passed along D more often than along C); and
    • 3. The more popular successor of 1084 is 1085 (that is, execution passed along D more often than along C).


  • The compiler 1058 determines that the dominant path is therefore 1068, 1070, 1072, 1080, 1083, 1084, 1085 through the code. The dominant path is indicated as 1088.

    While the compiler 1058 was tracing the dominant path 1088, it noted that fragment 1084 was never executed all the way through (path F was never followed). Thus, 1084 is not a suitable candidate for compilation and the dominant path fragment for compilation does not include fragments 1084 or 1085.

    Thus the compiled dominant path fragment includes fragments 1068, 1070, 1072, 1080 and 1083.

    In any or all of the aforementioned, certain features of the present invention have been implemented using computer software. However, it will of course be clear to the skilled man that many of these features may be implemented using hardware or a combination of hardware and software. Furthermore, it will be readily understood that the functions performed by the hardware, the computer software, and such like are performed on or using electrical and like signals.

    Features which relate to the storage of information may be implemented by suitable memory locations or stores. Features which relate to the processing of information may be implemented by a suitable processor or control means, either in software or in hardware or in a combination of the two.

    APPENDIX 2

    The present invention relates in one aspect to a method of creating a link from a first piece of compiled code to a second piece of compiled code, and to a method of compiling code. It relates in another aspect to methods of and apparatus for examining memory in a computer system to allow a section of compiled code to be deleted, and to a method of and apparatus for deleting compiled code in a computer system, in particular where there may be a link between sections of compiled code. The invention has particular (but not exclusive) application to a self-modifying multi-threaded environment. In a preferred embodiment, the invention relates to multi-threaded fragment patching.

    A self-modifying environment may be one in which sections of compiled code are created and deleted dynamically during execution. Such an environment is described in Appendix 1 of this specification. A multi-threaded environment is one in which several processes, or threads, operate asynchronously in the same workspace.

    In a self-modifying environment there may be situations in which a link must be made between a first section of compiled code and a second section of compiled code that is located elsewhere in the workspace, to enable execution to transfer between the two sections of code. The process of transferring execution from one piece of code to the other generally involves a number of steps, including putting the address of the first piece of code on the stack, together with register values, transferring execution to an intermediate piece of code that identifies the location of the second piece of code, and then transferring execution to the second piece of code. A problem with transferring execution in this way is that a relatively large amount of time is spent in making the transfer.

    In a first aspect of the present invention there is provided a method of providing a link between two pieces of compiled code in a self-modifying. multi-threaded computer system, including inserting a patch from one piece of compiled code to the other.

    By providing patches from one piece of compiled code to another, execution may transfer more quickly than if the patches were not made.

    The step of inserting a patch may include changing a control transfer instruction within the compiled code. The control transfer instruction may be any instruction which causes execution to transfer to another address, such as a jump instruction or a call instruction. The control transfer instruction may be changed to point to the address of the piece of code to which a patch is made.

    The step of changing a control transfer instruction may be carried out atomically. By atomically it is preferably meant that the other threads cannot view the area being changed in a partially changed form. This may be achieved for a single processor system by ensuring that the step of inserting a patch is carried out as a single write operation. Alternatively, some processors provide one or more special instructions or sequences of special instructions which are defined to ensure atomicity; such instructions may be used to ensure atomicity in single processor and multi-processor systems. In this way it can be ensured that patch manipulation is completed before any other operations which may influence the work space are carried out.

    The first aspect of the invention also provides an apparatus for providing a link between two pieces of compiled code in a self-modifying multi-threaded computer system, including means for inserting a patch from one piece of compiled code to the other.

    The first aspect of the invention also provides a method of compiling code, the code including two possible paths of execution, the method including compiling the code corresponding to one of the paths of execution, and including in the compiled code a control transfer instruction which is capable of being changed atomically to point to the address of another piece of code.

    In this way, the compiled code can be arranged so that a patch to another piece of code can be made after the code has been compiled, in particular, to enable the other path to be executed.

    Preferably, the control transfer instruction is of a type which can point to an address which is further from its own address than if the most optimum form of the control transfer instruction were used. This feature can allow the patch to be to a more distant piece of code than would otherwise be the case.

    The method may include forming an outlying section of code which includes the control transfer instruction.

    The first aspect of the invention also provides a compiler adapted to carry out any of the above methods of compiling code.

    In some circumstances it may be desirable or necessary to remove the patches which have been made, for example, because a code buffer containing a section of compiled code is to be deleted, or because assumptions which where made about a piece of compiled code prove not to be valid.

    Thus, in a second aspect of the invention there is provided a method of examining memory containing a section of compiled code in a self-modifying multi-threaded computer system, including identifying any patches into the section of compiled code, and redirecting any such patches. The method may be carried out, for example, because a section of compiled code is to be deleted, or because the section of compiled code is no longer to be used. The redirection of the patch enables execution to continue at the patch without the section of compiled code.

    The second aspect of the invention further provides a method of deleting compiled code in a self-modifying multi-threaded computer system, including selecting a section of compiled code to be deleted, identifying any patches into the section of compiled code, redirecting any such patches, and deleting the section of compiled code.

    Preferably, any such patches are directed to the address of a continuation code. The continuation code enables execution to continue without the section of code. The continuation code may be arranged to effect interpretation of subsequent instructions, or it may be arranged to perform a dispatch table transfer.

    Preferably, the step of redirecting a patch is done atomically, to ensure that other threads cannot access the location being patched when the patch operation is only partially completed. An alternative solution would be to stop all executing threads while the patch was redirected, but that is less preferred due to the execution time lost while the threads are stopped.

    In order to identify patches going into the section of compiled code, the method may include calculating a hash value of the address of the section of compiled code, and examining a hash table of patch blocks to identify any patches into the section of compiled code.

    In the interests of efficient memory usage, any unused patches (such as patches out of the code buffer) should be deleted, so that the overhead associated with the patch can be reclaimed. Therefore, the method preferably further includes identifying any patches out of the section of compiled code, and removing any such patches.

    Thus, the second aspect of the present invention also provides a method of examining memory in a self-modifying multi-threaded computer system when a section of compiled code is to be deleted, including identifying any patches out of the section of compiled code and removing any such patches.

    Preferably the method of examining memory further includes the steps of:
    • examining a frame of a stack in the computer system;
    • identifying whether the frame contains a return address which is in the range of addresses of the section of compiled code to be deleted;
    • and altering the contents of the frame when such a return address is identified.


  • Thus, the second aspect of the invention also provides a method of examining memory in a self-modifying multi-threaded computer system to allow a section of compiled code to be deleted, the method including the steps of:
    • examining a frame of a stack in the computer system;
    • identifying whether the frame contains a return address which is in the range of addresses of the section of compiled code;
    • altering the contents of the frame when such a return address is found;
    • identifying any patches into the section of compiled code; and
    • redirecting any such patch.


  • Thus the second aspect of the invention preferably includes one or more of the features of one or more aspects of the invention described in Appendix of this specification.

    Preferably, the method further includes identifying any patches out of the section of compiled code and removing any such patches.

    Preferably, the alteration of the contents of the frame and/or the redirecting of the patch are carried out at the time of deletion of the section of compiled code rather than, for example, as patches or returns into the deleted code are found during execution.

    The second aspect of the invention also provides apparatus for examining memory in a self-modifying multi-threaded computer system to allow a section of compiled code to be deleted, including means for identifying any patches into the section of compiled code, and means for redirecting any such patches. Thus, execution may continue at the patch without the section of compiled code.

    The second aspect of the invention also provides an apparatus for deleting compiled code in a self-modifying multi-threaded computer system, including means for selecting a section of compiled code to be deleted, means for identifying any patches into the section of compiled code, means for redirecting any such patches, and means for deleting the section of compiled code.

    Preferably, the apparatus includes means for calculating a hash value of the address of the section of compiled code, and means for examining a hash table of patch blocks to identify any patches into the section of compiled code.

    Preferably, the apparatus further includes means for identifying any patches out of the section of compiled code, and means for removing any such patches.

    The second aspect of the invention also provides apparatus for examining memory in a self-modifying multi-threaded computer system to allow a section of compiled code to be deleted including means for identifying any patches out of the section of compiled code and means for removing any such patches.

    Features of one aspect may be applied to other aspects; similarly, method features may be applied to the apparatus and vice versa.

    The following considerations apply to any and all the inventions and aspects of the inventions described above.

    As described above in Appendix 1 of this specification, dynamic compilation may result in fragments of code in a method being compiled, rather than the whole method. The fragments that are compiled correspond to the dominant path, as determined, for example, from the run time representation of the source program and execution history information. At a later stage, other fragments of code may be compiled, for example, where the original assumptions that were made about the dominant path prove to be incorrect.

    As an example, if the code contains a conditional control transfer instruction (such as a conditional branch instruction or a conditional call instruction), the compiler decides whether or not the transfer is likely to be made, and then compiles the code corresponding to the path that is most likely to be followed (the dominant path). However, during execution, it may be decided that in fact the other path should be followed. In such circumstances, when the transfer instruction is encountered, execution transfers to a piece of code known as ‘glue code.’ If the path that is to be followed has not been compiled, then the glue code causes interpretation of subsequent instructions in the path to be followed. If the interpreted path is followed a certain number of times, the compiler may decide that it is worthwhile compiling that section of code, and will then produce a compiled version of the code.

    A self-modifying environment is thereby created, in which sections of compiled code are created (and possibly deleted) dynamically during execution. Such an environment is typically multi-threaded, with several processes operating in the same work space concurrently.

    According to a preferred embodiment, in such a situation, a patch is made from the transfer instruction in the original section of code to the newly compiled section of code. The patch modifies the transfer instruction so as to cause execution to transfer directly to the address of the newly compiled section of code. In order to allow the patch to be made, at the time of compilation the compiled code is arranged so that a patch can be inserted at a later stage, should this be required. This is done, for example, by compiling a longer form of the transfer instruction than is necessary for the original compiled code, to allow a transfer to a more distance piece of code to be made at a later stage.

    A patch may also be made from the newly compiled section of code back to the original section of code, if necessary.

    It should be noted that in a multi-threaded environment, patching such as that described above needs to be done atomically, that is, as a single instruction, so that other threads cannot view the area being changed in a partially changed form. Therefore, the code is arranged so that the patch can be made atomically. To retain atomicity, the patching could be done as a single write operation. Alternatively, some processors provide one or more special instructions or sequences of special instructions which ensure atomicity. In a multi-processor environment the address of the location being patched will probably, for many processors, need to be aligned according to the size of the patch data (such that the address is an integer multiple of the size of the operation).

    A first example will now be described with reference to FIGS. A2-1 and A2-2. This example concerns the case where the non-native code contained a call instruction.

    Referring to FIG. A2-1, a first code fragment 23002 has a call instruction 23003 at address aaa. In the original non-native code this call instruction called the subroutine ‘bar’. During compilation the subroutine bar was not compiled (for example, because it was not certain which version of bar would be used), but instead a piece of outlying code 23004 was created to deal with the situation where bar is called. Call instruction 23003 points to