Method and apparatus for supporting multiple processor-specific code segments in a single executable
6049668

Abstract
A computer-implemented method identifies a code segment which is to be customized to a plurality of different processor types. The method generates object code for the code segment, including generating a plurality of sections for the code segment, each of the plurality of sections being object code for the code segment customized for one of the plurality of different processor types, and generating a control section that causes a selected one of the plurality of sections to be called during execution of the object code in accordance with an executing processor's processor type.

Claims
What is claimed is:

Description
BACKGROUND OF THE INVENTION
TABLE 1
______________________________________
cpu_specifier   Processor Type
______________________________________
pentium_ii      Pentium® II processor
pentium_pro     Pentium® Pro processor
pentium_mmx     Pentium® processor with MMX™ technology
pentium         Pentium® processor
generic         A "generic" processor, other than one of the Pentium®
                processor family or Pentium® Pro processor family.
______________________________________
The second construct is a "dispatch" construct which is used during compilation to identify the processor-specific constructs and the different processor types to which they correspond. The syntax of this dispatch construct is:

cpu_dispatch (cpu_specifier [, cpu_specifier [ . . . ]]) empty_function_definition

The "empty_function_definition" is an empty function (no code) having the same name as the function_definition. Multiple cpu_specifier identifiers may be included in the cpu_dispatch construct, one for each cpu_specific construct for the function_definition.

According to one embodiment of the present invention, the cpu_specific and cpu_dispatch constructs are implemented as an "extension" to the C and C++ programming languages. Although these extension constructs are not part of the original programming language, they can be added to the language and used as if they were a part of the original language, such as by using the Microsoft™ "_declspec" keyword. The "_declspec" keyword can be used to identify a function as an extension to the language. According to one implementation, the syntax for doing so is as follows:

_declspec (cpu_specific (cpu_specifier)) function_definition
_declspec (cpu_dispatch (cpu_specifier [, cpu_specifier [ . . . ]])) empty_function_definition

The cpu_specifier, function_definition, and empty_function_definition are the same as discussed above.

FIG. 2 illustrates multiple code segments written in the C++ programming language incorporating the cpu_specific and cpu_dispatch constructs. As illustrated, software program 200 includes a first cpu_specific construct 201 which identifies a print_cpu function 205 customized to the Pentium® II processor (as indicated by cpu_specifier identifier 208). Similarly, program 200 also includes a second cpu_specific construct 221 which identifies a print_cpu function 225 customized to the Pentium® Pro processor, a third cpu_specific construct 241 which identifies a print_cpu function 245 customized to the Pentium® processor with MMX™ technology, and a fourth cpu_specific construct 261 which identifies a print_cpu function 265 customized to the Pentium® processor. As illustrated, each of the four print_cpu functions has the same function name but different instructions which are customized to particular processor types.

Software program 200 also includes a cpu_dispatch construct 281. The cpu_dispatch construct 281 includes a list of identifiers which includes each of the processor types listed in the cpu_specific constructs 201, 221, 241, and 261. The cpu_dispatch construct 281 identifies a print_cpu function 287, the name of which is the same as the function names in the cpu_specific constructs 201, 221, 241, and 261.

The cpu_specific constructs and the cpu_dispatch constructs allow the present invention to be used multiple times within the same program on different function names. Thus, other processor-specific functions (not shown) can be included along with constructs 201, 221, 241, 261, and 281 in program 200. It should be noted that the cpu_specific constructs 201, 221, 241, and 261 of FIG. 2 may be located adjacent one another as illustrated in program 200, or alternatively may be distributed throughout different locations of program 200.

FIG. 3 is a flowchart illustrating the steps followed in compiling the high-level language according to one embodiment of the present invention. During compilation, a dispatch construct is first identified, step 305. In the illustrated embodiment, this is the cpu_dispatch construct 281 of FIG. 2. Processor-specific constructs corresponding to the dispatch construct are then identified, step 310. In the illustrated embodiment, the empty_function_definition of the cpu_dispatch construct has the same name as the function_definition of the cpu_specific construct. Thus, the compiler is able to search through the high-level program to identify each of the different constructs which correspond to the dispatch construct, which are constructs 201, 221, 241, and 261 of FIG. 2.

The compiler then modifies the names of each of the processor-specific functions, step 315. This is done in order for the assembler to distinguish between each of the different functions. However, this step is done by the compiler and is not visible to the high-level language programmer, who views each of the functions as having the same name. In one embodiment, this is accomplished by a "name mangling" algorithm, which modifies function names as necessary during compilation. In this embodiment, the compiler is pre-programmed with possible processor types and an appropriate modification for each function name based on processor type. By way of example, the characters "$B" can be added to the end of a function name for a Pentium® processor type, while the characters "$E" can be added to the end of a function name for a Pentium® II processor type. In the illustrated embodiment, at least one character which is an invalid character for a function name in the high-level language is added to the function name in the object code. This use of an invalid high-level language character in the object code ensures that the compiler does not modify the name to be the same as another function name created by the programmer.

The compiler then generates multiple source code tests corresponding to the dispatch construct, step 320. These multiple source code tests access an intel_cpu_indicator variable to identify the processor type. According to one embodiment of the present invention, the intel_cpu_indicator is a bit vector which encodes the processor type. The bit vectors and their corresponding processor types according to one embodiment of the present invention are illustrated in Table II below. Alternate embodiments can include a lesser or greater number of bits.
TABLE II
______________________________________
Bit Vector                          Processor Type
______________________________________
00000000000000000000000000000001    generic
00000000000000000000000000000010    Pentium® processor
00000000000000000000000000000100    Pentium® Pro processor
00000000000000000000000000001000    Pentium® processor with MMX™ technology
00000000000000000000000000010000    Pentium® II processor
______________________________________
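As a minimal sketch of the encoding in Table II, the bit vectors can be written as one-hot constants in C. The enumerator names and the three helper functions below are hypothetical and chosen only for illustration; the patent text gives the bit patterns and the cpu_specifier spellings but no such helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One-hot processor-type values taken from Table II.
   The enumerator names are illustrative assumptions. */
enum cpu_type {
    CPU_GENERIC     = 0x01,  /* ...00001 */
    CPU_PENTIUM     = 0x02,  /* ...00010 */
    CPU_PENTIUM_PRO = 0x04,  /* ...00100 */
    CPU_PENTIUM_MMX = 0x08,  /* ...01000 */
    CPU_PENTIUM_II  = 0x10   /* ...10000 */
};

/* Returns nonzero when exactly one bit of the indicator is set. */
static int is_single_type(uint32_t indicator)
{
    return indicator != 0 && (indicator & (indicator - 1)) == 0;
}

/* Returns the position of the single set bit (0 for generic, 4 for Pentium II). */
static unsigned bit_position(uint32_t indicator)
{
    unsigned pos = 0;
    assert(is_single_type(indicator));
    while ((indicator >>= 1) != 0)
        pos++;
    return pos;
}

/* Maps an indicator value to the matching cpu_specifier spelling,
   or NULL when the value is not one of the five table entries. */
static const char *cpu_specifier_name(uint32_t indicator)
{
    switch (indicator) {
    case CPU_GENERIC:     return "generic";
    case CPU_PENTIUM:     return "pentium";
    case CPU_PENTIUM_PRO: return "pentium_pro";
    case CPU_PENTIUM_MMX: return "pentium_mmx";
    case CPU_PENTIUM_II:  return "pentium_ii";
    default:              return NULL;
    }
}
```

Because each processor type owns its own bit, a test for a group of processor types can be a single bitwise AND against a mask, and unused higher bits remain available for types added later.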
The compiler then adds a dispatch fail instruction to the assembly code, step 325. This dispatch fail instruction identifies a course of action to take when the processor type is not identifiable. In the illustrated embodiment, the dispatch fail instruction is a jump instruction to a dispatch fail function which is a library function that is programmer-replaceable. Thus, the programmer is able to display, for example, an error message indicating the program cannot be executed by the current processor, or alternatively provide a "bare minimum" amount of code which will allow the program to continue running.

The compiler then adds a processor identification instruction to the assembly code, step 330. The processor identification instruction identifies a course of action to take when the processor type has not yet been identified. In the illustrated embodiment, the processor identification instruction is a call to a cpu_indicator initialization function which loads the processor type information into the intel_cpu_indicator variable. Thus, once the processor type is loaded into the intel_cpu_indicator variable, the code will be able to access and identify the processor type.

The cpu_indicator initialization function obtains the processor type information using the CPUID instruction, supported by many Intel processors. The CPUID instruction identifies the processor family (e.g., Pentium® processor family or Pentium® Pro processor family), as well as whether the processor is enabled with MMX™ technology (e.g., the Pentium® processor with MMX™ technology or the Pentium® II processor), thereby indicating whether the processor type is a Pentium® processor, Pentium® II processor, Pentium® Pro processor, or Pentium® processor with MMX™ technology. Additional information may also be returned by the CPUID instruction, such as the stepping of the processor. This additional information can be used in alternate embodiments of the present invention to distinguish between different processor types. By way of example, a particular stepping of a processor may have a "bug" which is not present in subsequent steppings, and thus different code segments can be written customized to the different steppings.

During initialization of the program, the intel_cpu_indicator variable is initialized to zero. The processor type is then stored in the intel_cpu_indicator variable when the cpu_indicator initialization function is called. Thus, in the illustrated embodiment the cpu_indicator initialization function need not be called more than once during program execution.

FIG. 4 illustrates sample assembly code generated according to one embodiment of the present invention by a compiler from the program code 200 of FIG. 2. The assembly code provides a series of tests for processor types. The tests are performed during execution by checking a value stored at the memory location identified by _intel_cpu_indicator (i.e., the intel_cpu_indicator variable). If a test succeeds, then the code jumps to the appropriate address for the beginning of the function for the identified processor type. However, if a test fails, then the code checks for another processor type. As illustrated, the code initially checks with test 402 whether the processor type is a Pentium® II processor. If the processor type is a Pentium® II processor, then the jump instruction 404 transfers program execution to the memory location indicated by _print_cpu$E, which is the memory address of the section of code for the print_cpu function customized to the Pentium® II processor (function 205 of FIG. 2). Similar tests are made for the Pentium® Pro processor, Pentium® processor with MMX™ technology, and the Pentium® processor.

The final test 412 checks whether there is a non-zero value stored in the intel_cpu_indicator variable. If there is a non-zero value, then jump instruction 414 transfers program execution to a dispatch fail function located at the address intel_cpu_dispatch_fail. However, if the value stored in the intel_cpu_indicator variable is zero, then a call 416 to the cpu_indicator initialization function located at the address intel cpu_indicator is made. Upon return from the cpu_indicator initialization function, program execution continues with a jump back to test instruction 402, thereby repeating the process. However, now that the intel_cpu_indicator variable has been initialized, one of the tests for processor type will be successful, indicating either a particular processor type or a dispatch fail.

In the illustrated embodiment, the compiler orders the test instructions so that the program execution jumps to the most "advanced" function (that is, the function customized to the most advanced processor architecture) which can be executed by the processor executing the program. By way of example, if two customized functions are generated, one for a Pentium® processor and one for a Pentium® processor with MMX™ technology, and if the processor executing the program is a Pentium® II processor, then the test for the Pentium® processor with MMX™ technology is successful, thereby causing program execution to jump to the function customized for the Pentium® processor with MMX™ technology. Also in the illustrated embodiment, the compiler orders the test instructions in the assembly code such that the highest performance processor is tested for first. This reduces the overhead (the additional tests) of the present invention for higher performance processors. However, alternate embodiments can use different orderings.
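The control flow described above (ordered tests, dispatch fail, and one-time initialization) can be sketched in portable C. This is an illustration under stated assumptions rather than the code of FIG. 4: the function names, the `simulated_cpu` stand-in for CPUID-based detection, the integer variant identifiers, and the pairing of each mask with a processor type are all hypothetical. The masks are derived from Table II by setting the bit of every listed processor type assumed able to run the variant, plus all higher bits. Read as signed 32-bit integers they are -8 and -2, values that also appear among the sample test values listed for FIG. 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One-hot processor-type values from Table II (names are assumptions). */
enum {
    CPU_GENERIC     = 0x01,
    CPU_PENTIUM     = 0x02,
    CPU_PENTIUM_PRO = 0x04,
    CPU_PENTIUM_MMX = 0x08,
    CPU_PENTIUM_II  = 0x10
};

/* Identifiers returned by each variant so callers can see which one ran. */
enum { VARIANT_FAIL = 0, VARIANT_PENTIUM = 1, VARIANT_MMX = 2 };

/* Zero means "processor type not yet identified", as in the description. */
static uint32_t cpu_indicator = 0;
static int init_calls = 0;

/* Hypothetical stand-in for CPUID-based detection, so the sketch stays portable. */
static uint32_t simulated_cpu = CPU_PENTIUM_II;

static void cpu_indicator_init(void)
{
    assert(simulated_cpu != 0);  /* detection must yield a non-zero type */
    init_calls++;
    cpu_indicator = simulated_cpu;
}

/* Two customized variants, mirroring the Pentium / MMX example in the text. */
static int print_cpu_pentium(void) { return VARIANT_PENTIUM; }
static int print_cpu_mmx(void)     { return VARIANT_MMX; }

/* Stand-in for the programmer-replaceable dispatch fail function. */
static int cpu_dispatch_fail(void) { return VARIANT_FAIL; }

struct dispatch_entry {
    uint32_t mask;     /* processor-type bits allowed to run this variant */
    int (*fn)(void);
};

/* Most advanced variant first. Mask values are assumptions (see lead-in). */
static const struct dispatch_entry dispatch_table[] = {
    { (uint32_t)-8, print_cpu_mmx },      /* bits 3 and up: MMX, Pentium II */
    { (uint32_t)-2, print_cpu_pentium },  /* bits 1 and up: any Pentium     */
};

/* Plays the role of the control section generated for print_cpu. */
static int print_cpu(void)
{
    for (;;) {
        size_t i;
        for (i = 0; i < sizeof dispatch_table / sizeof dispatch_table[0]; i++) {
            if (cpu_indicator & dispatch_table[i].mask)
                return dispatch_table[i].fn();
        }
        if (cpu_indicator != 0)      /* identified, but no variant fits */
            return cpu_dispatch_fail();
        cpu_indicator_init();        /* first call only, then retest */
    }
}
```

With `simulated_cpu` set to the Pentium® II value, the first call to `print_cpu` initializes the indicator once and selects the MMX variant; a Pentium® Pro value selects the Pentium variant, and the generic value reaches the dispatch fail path. Because the first matching row wins, changing the test order in this sketch would also require revisiting the masks.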
In one such alternate embodiment, the test instructions are ordered so that the most likely processor to be executing the program is tested for first.

The assembly code illustrated in FIG. 4 is a sample of assembly code which is generated according to one embodiment of the present invention. Alternate embodiments can generate different assembly code. By way of example, the ordering of the test instructions can be changed, the test values (-16, -8, -12, and -2) can be changed, different types of testing or comparing instructions can be used, etc.

For ease of explanation, the present invention has been described in terms of the assembly code generated by the compiler. Those skilled in the art will appreciate that this assembly code is subsequently converted to object code which is executed by the processor.

FIG. 5 illustrates an example hardware system suitable for use with one embodiment of the present invention. Hardware system 500 includes processor 502 and cache memory 504 coupled to each other as shown. Additionally, hardware system 500 includes high performance input/output (I/O) bus 506 and standard I/O bus 508. Host bridge 510 couples processor 502 to high performance I/O bus 506, whereas I/O bus bridge 512 couples the two buses 506 and 508 to each other. Coupled to bus 506 are network/communication interface 524, system memory 514, and video memory 516. In turn, display device 518 is coupled to video memory 516. Coupled to bus 508 are mass storage 520, keyboard and pointing device 522, and I/O ports 526. Collectively, these elements are intended to represent a broad category of hardware systems, including but not limited to general purpose computer systems based on the Pentium® processor, Pentium® Pro processor, Pentium® II processor, or Pentium® processor with MMX™ technology, available from Intel Corporation of Santa Clara, Calif.

These elements 502-524 perform their conventional functions known in the art. In particular, network/communication interface 524 is used to provide communication between system 500 and any of a wide range of conventional networks, such as an Ethernet, token ring, the Internet, etc. It is to be appreciated that the circuitry of interface 524 is dependent on the type of network the system 500 is being coupled to.

Mass storage 520 is used to provide permanent storage for the data and programming instructions to implement the above described functions, whereas system memory 514 is used to provide temporary storage for the data and programming instructions when executed by processor 502. I/O ports 526 are one or more serial and/or parallel communication ports used to provide communication between additional peripheral devices which may be coupled to hardware system 500.

It is to be appreciated that various components of hardware system 500 may be rearranged. For example, cache 504 may be on-chip with processor 502. Alternatively, cache 504 and processor 502 may be packed together as a "processor module", with processor 502 being referred to as the "processor core". Furthermore, certain implementations of the present invention may neither require nor include all of the above components. For example, mass storage 520, keyboard and pointing device 522, and/or display device 518 and video memory 516 may not be included in system 500. Additionally, the peripheral devices shown coupled to standard I/O bus 508 may be coupled to high performance I/O bus 506; in addition, in some implementations only a single bus may exist with the components of hardware system 500 being coupled to the single bus. Furthermore, additional components may be included in system 500, such as additional processors, storage devices, or memories.

In one embodiment, the compiling and assembling of instructions according to the present invention is implemented as a series of software routines run by hardware system 500 of FIG. 5. In this embodiment, compiler 110 and assembler 120 of FIG. 1 are each implemented as a series of software routines. These software routines comprise a plurality or series of instructions to be executed by a processor in a hardware system, such as processor 502 of FIG. 5. Initially, the series of instructions are stored on a storage device, such as mass storage 520. It is to be appreciated that the series of instructions can be stored using any conventional storage medium, such as a diskette, CD-ROM, magnetic tape, DVD, laser disk, ROM, Flash memory, etc. It is also to be appreciated that the series of instructions need not be stored locally, and could be received from a remote storage device, such as a server on a network, via network/communication interface 524. The instructions are copied from the storage device, such as mass storage 520, into memory 514 and then accessed and executed by processor 502. In one implementation, these software routines are written in the C++ programming language. It is to be appreciated, however, that these routines may be implemented in any of a wide variety of programming languages.

FIG. 6 is a block diagram illustrating a device on which one embodiment of the present invention can be implemented. The device 600 is meant to represent a wide variety of devices in which the present invention can be implemented, including conventional storage media (such as a floppy disk, hard disk, or a random access memory), as well as discrete hardware or firmware. The device 600 includes a compiler portion 602 and an assembler portion 604. Compiler portion 602 includes the instructions, to be executed by a processor, for carrying out the process of compiling a high-level language into assembly code, whereas assembler portion 604 includes the instructions, to be executed by a processor, for carrying out the process of converting the assembly code into object code.
It should be noted that, although specific syntax for the present invention is discussed above, alternate embodiments can use variations on this syntax. According to one such alternate embodiment, the empty_function_definition of the cpu_dispatch construct is not empty; rather, it contains the code the user wishes for the compiler to make processor-specific. The compiler generates a different piece of object code for each of the different processors, based on the code of the cpu_dispatch construct. Each of these different pieces of code is optimized by the compiler for the particular processor type (e.g., by setting specific compiler switches).

Various examples of processor types are given in the discussions above. Although different Intel-architecture processors are discussed, the present invention may also be used to customize code to different manufacturers or different processor types of another manufacturer.

Additionally, the present invention is discussed above with reference to the C or C++ programming language. In alternate embodiments, the processor-specific and dispatch constructs are provided in other programming languages, such as PASCAL, Fortran, Java, etc.

Furthermore, other modifications can be made by compiler 110 to further enhance the processor-specific customization of the present invention. In an alternate embodiment, one such customization is the setting and clearing of particular processor optimization switches. In this embodiment, when compiling the cpu_dispatch and cpu_specific constructs, additional switches or compiler options are set by the compiler which correspond to the processor type of the function being compiled (as identified by the cpu_specific construct). These additional switches and/or compiler options cause the compiler to further customize the source code generated for the particular processor type.

According to another alternate embodiment, the compiler automatically and dynamically customizes the source code for particular processor types. In this alternate embodiment, the compiler analyzes the source code on a code segment by code segment basis to determine whether a performance advantage can be obtained over the non-customized version of the function by customizing the function to a particular processor type. If greater than a threshold performance advantage can be obtained, then the compiler compiles the source code customized for particular processor types in addition to compiling the source code for a "generic" processor. Otherwise, a "generic", non-processor-specific compilation is performed.

Thus, the present invention supports multiple processor-specific code segments in a single executable. The present invention allows a programmer to write multiple different code segments, each customized to a particular type of processor, yet each advantageously having the same identifier. Additionally, the present invention also allows a programmer to write a single code segment and advantageously have that single code segment customized to different processor types automatically by the compiler. Subsequently, during program execution, the proper customized code segment is advantageously selected based on the processor which is executing the program.

Thus, a method and apparatus for supporting multiple processor-specific code segments in a single executable has been described. Whereas many alterations and modifications of the present invention will be comprehended by a person skilled in the art after having read the foregoing description, it is to be understood that the particular embodiments shown and described by way of illustration are in no way intended to be considered limiting. References to details of particular embodiments are not intended to limit the scope of the claims.
