Apparatus and method for cross-compiling source code
5946489
Abstract
A method of cross-compiling computer programs includes the step of extracting constants from an inheriting computer program written in a first computer language. The extracted constants refer to a generating computer program written in a second computer language. A new program in the second computer language is then created using the constants. The new program is then compiled for a target computer to ascertain compiled constant values. The compiled constant values are then substituted into the inheriting computer program to produce a final inheriting computer program.
Claims
We claim:
Description
BRIEF DESCRIPTION OF THE INVENTION
______________________________________
(1) $(OBJDIR)/make_struct_offsets.s: extract_offsets.nawk opcodes.wide opcodes.h
(2) $(OBJDIR)/make_struct_offsets.s: executeJava_sparc.m4.s
(3)     $(M4) -DJAVAOS -DEXTRACT_OFFSETS $< | sort -u | nawk -f extract_offsets.nawk > $(OBJDIR)/make_struct_offsets.c
(4)     $(CC) $(CFLAGS) $(INCLUDES) -S $(OBJDIR)/make_struct_offsets.c -o $@
______________________________________
The foregoing and following computer code of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction of the computer code, as it appears in the Patent and Trademark Office files or records, but otherwise reserves all rights established under applicable copyright laws.
Lines (1) and (2) of the code indicate that if any of the four files (1) "extract_offsets.nawk", (2) "opcodes.wide", (3) "opcodes.h", or (4) "executeJava_sparc.m4.s" changes, then the commands beginning at line (3) should be executed. In other words, the software system description 30 will result in system construction commands 36 that are processed by the tool controller 38. If the tool controller 38 identifies that any of the four files has changed, it will invoke the cross compiler 46.
The file "extract_offsets.nawk" contains portions of the cross compiler 46 of the invention. In particular, it contains instructions to implement steps 62 and 66 of FIG. 4. If this file changes, then the result should be re-computed. If the files "opcodes.wide" or "opcodes.h" have changed, then it is possible that some of the constants have changed value. Any change to the file "executeJava_sparc.m4.s" means that there may be new constants or field offsets of interest or that some previously interesting constants or field offsets are no longer important. If one of the four enumerated files has changed, the following code is executed:
______________________________________
m4 -DJAVAOS -DEXTRACT_OFFSETS executeJava_sparc.m4.s | sort -u | nawk -f extract_offsets.nawk > $(OBJDIR)/make_struct_offsets.c
cc <various flags> -S $(OBJDIR)/make_struct_offsets.c -o $@
______________________________________
This code corresponds to the operations performed by the cross compiler 46. The code causes "executeJava_sparc.m4.s" to run through processor m4 with the two flags JAVAOS and EXTRACT_OFFSETS defined. This causes the processor m4 to modify the inheriting language program to identify generating program references (step 60). Thus, the processor m4 can be thought of as the constants and fields locator module 50. For example, all references to constants are written as "DEFINED_CONSTANT(<constant>)". Further, all references and updates to fields are written in such a way that it is unnecessary to know the size, signedness, or offset of the particular field in the record. For example, each "GET_FIELD(<reg>, <structure>, <field>, <result>)" term is converted into a line of the form "LOAD_STORE <structure> <field>".
These operations are implemented as follows. The file executeJava_sparc.m4.s is written in a highly stylized format, suitable for m4. The first line of the file is "ifdef(`EXTRACT_OFFSETS', `divert(-1)')", which says that if the flag EXTRACT_OFFSETS is defined, then throw out all input unless otherwise instructed. Later in the file, there are m4 directives such that if EXTRACT_OFFSETS is defined, the entire contents of the file are ignored except for occurrences of DEFINED_CONSTANT(<baz>), EXTRACT_OFFSET(<mystruct>,<myfield>), GET_FIELD(<base>,<mystruct>,<myfield>,<reg>), and SET_FIELD(<value>,<base>,<mystruct>,<myfield>), where each of the lower-case strings is, in fact, a text string. Each occurrence of DEFINED_CONSTANT(<baz>) is output as "CONST <baz>". Each occurrence of EXTRACT_OFFSET(<mystruct>,<myfield>) is output as "FIELD <mystruct>,<myfield>". Each occurrence of GET_FIELD(<base>,<mystruct>,<myfield>,<reg>) and SET_FIELD(<value>,<base>,<mystruct>,<myfield>) is output as "LDST <mystruct>,<myfield>".
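By way of illustration only, the extraction pass described above can be modeled outside of m4. The following Python sketch is not the m4 code of the embodiment; it is a hypothetical stand-in that scans program text for the four macro forms with a regular expression and emits the corresponding CONST, FIELD, and LDST lines. The function name extract_references and the regular expression are assumptions introduced for this example.

```python
import re

# Hypothetical stand-in for the m4 extraction pass: find each occurrence of the
# four macro forms and emit one CONST / FIELD / LDST line per occurrence.
_MACRO = re.compile(
    r"\b(DEFINED_CONSTANT|EXTRACT_OFFSET|GET_FIELD|SET_FIELD)\s*\(([^()]*)\)"
)


def extract_references(source: str) -> list[str]:
    """Return one output line per macro occurrence, in order of appearance."""
    lines = []
    for match in _MACRO.finditer(source):
        macro = match.group(1)
        args = [arg.strip() for arg in match.group(2).split(",")]
        if macro == "DEFINED_CONSTANT":
            lines.append(f"CONST {args[0]}")
        elif macro == "EXTRACT_OFFSET":
            structure, field = args
            lines.append(f"FIELD {structure},{field}")
        elif macro == "GET_FIELD":
            _base, structure, field, _reg = args
            lines.append(f"LDST {structure},{field}")
        else:  # SET_FIELD(value, base, structure, field)
            _value, _base, structure, field = args
            lines.append(f"LDST {structure},{field}")
    return lines
```

Applied to text containing one GET_FIELD and one SET_FIELD on the same field, the sketch produces two identical LDST lines, which is why a later duplicate-removal step is convenient.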
The code in the file is written such that it makes use of these macros. For example, if reg1 contains a pointer to a methodblock structure, and its ClassClass field is required, "GET_FIELD(reg1, methodblock, ClassClass, reg2)" is written. If reg3 is to be compared to the constant "opc_wide", "cmp reg3, DEFINED_CONSTANT(opc_wide)" is written. In particular, all references to constants are wrapped inside DEFINED_CONSTANT(..). To extract the field from a pointer, the GET_FIELD instruction is used. To set the field of a pointer, the SET_FIELD instruction is used. To get a pointer to the field of a pointer, "STRUCTURE_OFFSET(..)" is added to the pointer.
The foregoing processing associated with steps 60 and 62 results in a line for each reference to a constant and a line for each reference to a field. If there are multiple references to a field, each of those references will generate a separate line. It is easier, although not necessary, to process each defined constant and each field only once. Thus, preferably, duplicate lines are deleted to create a constants and fields file (step 64). This may be accomplished by passing the result through "sort -u", a standard utility, which sorts lines and deletes duplicate lines. The sorting of lines is an artifact of the "sort" utility and therefore is unimportant.
A new generating language program with a stylized header and footer is then created (step 66). Each line is associated with a macro call, which is defined in the header. For example, the line "LOAD_STORE <structure> <field>" becomes "LOAD_STORE(<structure>, <field>)". The specific header and trailer information and the exact code generated for each line is dependent on the generating language. This operation is more fully appreciated with the following example. The constants and fields file created by step 64 is processed by a text processor called "nawk".
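The duplicate removal of step 64 and the line-to-macro-call conversion of step 66 can be sketched as follows. This Python sketch is illustrative only and is not the nawk program of the embodiment. The helper names unique_sorted, to_macro_call, and build_program are hypothetical, the header and footer are passed in as opaque strings, and any statement terminator the real generator may emit is omitted because the text does not show one.

```python
def unique_sorted(lines: list[str]) -> list[str]:
    """Approximate `sort -u`: drop duplicate lines and return them sorted.

    Python orders by code point, which matches `sort` for ASCII input in the
    C locale; the ordering itself is not significant to the method.
    """
    return sorted(set(lines))


def to_macro_call(line: str) -> str:
    """Turn 'KEYWORD a b' or 'KEYWORD a,b' into 'KEYWORD(a, b)'."""
    keyword, _, rest = line.strip().partition(" ")
    args = rest.replace(",", " ").split()
    return f"{keyword}({', '.join(args)})"


def build_program(lines: list[str], header: str, footer: str) -> str:
    """Wrap one macro call per unique input line between a header and a footer."""
    body = [to_macro_call(line) for line in unique_sorted(lines)]
    return "\n".join([header, *body, footer])
```

For example, the line "LOAD_STORE methodblock ClassClass" is converted by to_macro_call into "LOAD_STORE(methodblock, ClassClass)", and two identical input lines yield a single macro call in the generated program.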
The program that the text processor runs is contained in the file "extract_offsets.nawk". The "nawk" program generates a C program (a generating language program) that has three parts. The first part is a stylized header, for example:
______________________________________
#include "oobj.h"
#include "interpreter.h"
#include "opcodes.h"
#include "tree.h"
#include "typecodes.h"
#include "stddef.h"
#define SHOWME(structure, name) \
    { struct structure *temp; \
      asm("SET_STRUCTURE_INFO(`" #structure "',`" #name "',`%0',`%1',`%2')" \
          :: "n" (sizeof(temp -> name)), \
             "n" (offsetof(struct structure, name)), \
             "n" ((typeof(temp -> name))(~0) < 0)); \
    }
#define FIELDOFF(structure, name) \
    asm("SET_FIELD_OFFSET(`" #structure "',`" #name "',`%0')" \
        :: "n" (offsetof(struct structure, name)) )
#define CONSTANT(name) \
    asm("SET_VALUE(`" #name "',`%0')" :: "n" ((int)name) )
main(int argc, char ** argv){
asm("! File automatically generated by m4 and nawk" );
asm("! Do not bother editing this! Find the source!" );
______________________________________
The processing of this header information is discussed below. After this stylized header is created, the following transformations are made for the constants and fields: "CONST <baz>" is converted to "CONSTANT(<baz>)", "LDST <mystruct>,<myfield>" is converted to "SHOWME(<mystruct>,<myfield>)", and "FIELD <mystruct>,<myfield>" is converted to "FIELDOFF(<mystruct>,<myfield>)". A final closing "}" is then appended as a footer. The result is a program "make_struct_offsets.c", which can be compiled by a C compiler. In other words, at this point, step 66 of FIG. 4 is completed.
The next processing step is to compile the program for the target machine (step 68). In accordance with the invention, the compiler is given specific switches to tell it to generate assembly language for the target machine, rather than to generate a binary file. In particular, the stylized generating language file from step 66 is specifically designed so that it generates highly stylized assembly-language code. The code may be stylized to the point that it cannot be actually assembled into machine code. All that is necessary is that the resulting assembly language code be machine-parseable so that one can determine (1) the value that the compiler gave to each constant and (2) the size (number of bytes), offset, and signedness of each of the fields. The result of this step will always be assembly language, regardless of the inheriting language. It is coincidental that, in the present example, the original inheriting language is also assembly language.
At this point, the .s file has the following form for each of the SHOWME, CONSTANT, and FIELDOFF items, respectively: SET_STRUCTURE_INFO(<structure>,<field>,<size>,<offset>,<signedness>), SET_VALUE(<name>,<value>), and SET_FIELD_OFFSET(<structure>,<field>,<offset>). The process by which this transformation takes place is as follows. Because of the #define, each line of the form CONSTANT(<baz>) gets turned into:
______________________________________
"asm ( "SET_VALUE(`" "baz" "',`%0')" : : "n" ( ( int ) baz ) )".
______________________________________
Similarly, each line of the form FIELDOFF(mystruct,myfield) gets turned into:
______________________________________
"asm ( "SET_FIELD_OFFSET(`" "mystruct" "',`" "myfield" "',`%0')" : : "n" ( offsetof ( struct mystruct , myfield ) ) )"
______________________________________
Each line of the form SHOWME(mystruct,myfield) gets turned into:
______________________________________
"{ struct mystruct * temp ;
asm ( "SET_STRUCTURE_INFO(`" "mystruct" "',`" "myfield" "',`%0',`%1',`%2')" : : "n" ( sizeof( temp -> myfield ) ) ,
"n" ( offsetof( struct mystruct, myfield ) ) ,
"n" ( ( typeof( temp -> myfield ) ) ( ~ 0 ) < 0 ) ); }"
______________________________________
These instructions make use of the GNU C compiler (or other similar C compiler) facility for generating specific assembly language. However, since the compilation to assembly code is solely for the purpose of defining constants and field offsets, the inline assembly code that is generated does not have to be executable. In particular, the inline assembly code:
______________________________________
"asm ( "SET_VALUE(`" "baz" "',`%0')" : : "n" ( ( int ) baz ) )"
______________________________________
generates, in assembly language:
______________________________________
"SET_VALUE(`baz',`23')"
______________________________________
where 23, in this case, happens to be the value of baz. The inline assembly code:
______________________________________
"asm ( "SET_FIELD_OFFSET(`" "mystruct" "',`" "myfield" "',`%0')" : : "n" ( offsetof( struct mystruct , myfield ) ) )"
______________________________________
generates:
______________________________________
"SET_FIELD_OFFSET(`mystruct',`myfield',`16')"
______________________________________
where 16, for example, is the offset of "myfield" in the "mystruct" structure. The inline assembly code:
______________________________________
"{ struct mystruct * temp ;
asm ( "SET_STRUCTURE_INFO(`" "mystruct" "',`" "myfield" "',`%0',`%1',`%2')" : : "n" ( sizeof( temp -> myfield ) ) ,
"n" ( offsetof( struct mystruct, myfield ) ) ,
"n" ( ( typeof( temp -> myfield ) ) ( ~ 0 ) < 0 ) ) ; }"
______________________________________
turns into:
______________________________________
"SET_STRUCTURE_INFO(`mystruct',`myfield',`2',`16',`1')".
______________________________________
In this case, the 16 is again the offset of the field. The "2" indicates that this is a 2-byte quantity. The "1" indicates that this is a signed value (an unsigned value would assign a "0" to this field). Observe that the .s file generated will have additional information. It will specifically have the lines:
______________________________________
"! File automatically generated by m4 and nawk"
"! Do not bother editing this! Find the source!"
______________________________________
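The three numeric values carried by a SET_STRUCTURE_INFO line (size, offset, and signedness) can be illustrated with a small Python sketch using the standard ctypes module. This is an illustration under stated assumptions and not the compile-to-assembly mechanism of the embodiment: ctypes reports the layout of the host on which the sketch runs rather than that of a cross-compilation target, and the structure layout below is hypothetical, chosen so that "myfield" is a signed 2-byte value at offset 16. The is_signed helper mirrors the C expression ((type)(~0) < 0) used in the header.

```python
import ctypes


class mystruct(ctypes.Structure):
    # Hypothetical layout: 16 bytes of padding, then a signed 2-byte field.
    _fields_ = [
        ("padding", ctypes.c_uint8 * 16),
        ("myfield", ctypes.c_int16),
    ]


def is_signed(ctype) -> bool:
    """Mirror the C test ((type)(~0) < 0): all-ones is negative only if signed."""
    return ctype(~0).value < 0


def structure_info(struct, field: str) -> str:
    """Format size, offset, and signedness of a field in the m4-quoted form."""
    descriptor = getattr(struct, field)      # ctypes field descriptor
    ctype = dict(struct._fields_)[field]     # declared type of the field
    signed = 1 if is_signed(ctype) else 0
    return (
        f"SET_STRUCTURE_INFO(`{struct.__name__}',`{field}',"
        f"`{descriptor.size}',`{descriptor.offset}',`{signed}')"
    )
```

With the hypothetical layout above, structure_info(mystruct, "myfield") returns the same text as the example line, with size 2, offset 16, and signedness 1.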
The two comment lines are created by the asm declarations at the start of the main routine defined in step 66. The code will also include function prologues and other information. If one intends to cross compile for a different platform, then the .c file is compiled into assembly for the target platform. The macros are written in such a way that all the constant values, field offsets, etc., are generated for the target machine.
The .s file is processed by extracting lines that contain the word SET. For example, one can create the file "executeJava_sparc.include" by extracting lines containing the word SET. This can be done using the UNIX command "grep", e.g., "grep SET extractoffsets.s > executeJava_sparc.include".
The next processing step is to combine the compiled constants and field parameters with the original inheriting language program. That is, for each occurrence of DEFINED_CONSTANT(<name>), replace it with the exact value of the constant. For each occurrence of GET_FIELD(<reg>,<structure>,<field>,<result>), convert it into the specific inheriting language instructions necessary to access the value. Thus, the final inheriting language program 47 is a combination of the original program text with the result of the processing of the cross compiler 46.
These operations can be performed as follows. Run "executeJava_sparc.m4" through the processor m4 a second time. However, this time the EXTRACT_OFFSETS flag is not turned on. This causes m4 macros called SET_FIELD_OFFSET, SET_STRUCTURE_INFO, and SET_VALUE to be defined. In addition, the previously created file "ExecuteJavaStructOffsets.include" is read in. Each line of the file is interpreted using the macro definitions defined in step 60. The remainder of the file "executeJava_sparc.m4" is handled normally by the processor m4.
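The extraction of SET lines and the second substitution pass, which the following paragraphs describe in terms of m4 macros, can be modeled by the Python sketch below. It is illustrative only and does not reproduce the m4 definitions of the embodiment. The names parse_set_lines and expand are hypothetical, SET_FIELD_OFFSET lines are skipped for brevity, and the tables mapping a field's size and signedness to SPARC load and store mnemonics are assumptions that follow the "ldsh" example given in the text.

```python
import re

# Assumed SPARC mnemonics keyed by (size in bytes, signed flag) for loads and
# by size for stores; only the (2, 1) -> "ldsh" entry comes from the text.
LOAD = {(1, 0): "ldub", (1, 1): "ldsb", (2, 0): "lduh",
        (2, 1): "ldsh", (4, 0): "ld", (4, 1): "ld"}
STORE = {1: "stb", 2: "sth", 4: "st"}

_SET_LINE = re.compile(r"(SET_VALUE|SET_STRUCTURE_INFO)\(([^()]*)\)")


def parse_set_lines(assembly: str):
    """Keep only SET_* lines (the `grep SET` step) and index their values."""
    constants, fields = {}, {}
    for line in assembly.splitlines():
        match = _SET_LINE.search(line)
        if not match:
            continue
        # Strip m4 quote characters and whitespace from each argument.
        args = [arg.strip(" `'") for arg in match.group(2).split(",")]
        if match.group(1) == "SET_VALUE":
            constants[args[0]] = args[1]
        else:
            structure, field, size, offset, signed = args
            fields[(structure, field)] = (int(size), int(offset), int(signed))
    return constants, fields


def expand(source: str, constants, fields) -> str:
    """Replace constant references and field accessors with concrete text."""
    def constant(m):
        return constants[m.group(1).strip()]

    def get_field(m):
        base, structure, field, reg = [a.strip() for a in m.group(1).split(",")]
        size, offset, signed = fields[(structure, field)]
        return f"{LOAD[(size, signed)]} [{base}+{offset}], {reg}"

    def set_field(m):
        value, base, structure, field = [a.strip() for a in m.group(1).split(",")]
        size, offset, _signed = fields[(structure, field)]
        return f"{STORE[size]} {value}, [{base}+{offset}]"

    source = re.sub(r"\bDEFINED_CONSTANT\s*\(([^()]*)\)", constant, source)
    source = re.sub(r"\bGET_FIELD\s*\(([^()]*)\)", get_field, source)
    source = re.sub(r"\bSET_FIELD\s*\(([^()]*)\)", set_field, source)
    return source
```

Given the example lines for "baz" and "mystruct.myfield", the sketch rewrites "cmp reg3, DEFINED_CONSTANT(baz)" as "cmp reg3, 23" and "GET_FIELD(base, mystruct, myfield, reg)" as "ldsh [base+16], reg".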
In particular, the four macros DEFINED_CONSTANT(baz), EXTRACT_OFFSET(mystruct,myfield), GET_FIELD(base,mystruct,myfield,reg), and SET_FIELD(value,base,mystruct,myfield) have completely different meanings than they did in step 60.
For constants, SET_VALUE(`name',`value') is turned into define(`name',`value') and DEFINED_CONSTANT(name) is turned into "name". Hence, if one has SET_VALUE(`baz',`23') from the include file, and the occurrence of DEFINED_CONSTANT(baz) somewhere in the text, the macro pre-processor will turn DEFINED_CONSTANT(baz) into baz and then that into 23.
For field offsets, SET_FIELD_OFFSET(struct,field,offset) is turned into the assembler directive "struct.field=offset". Similarly, "STRUCT_OFFSET(struct,field)" is turned into "struct.field". The net effect is that every occurrence of "STRUCT_OFFSET(struct,field)" becomes "struct.field", which the assembler can then replace with the correct value.
For field accessors and setters, "SET_STRUCTURE_INFO(`mystruct',`myfield',`2',`16',`1')" defines two macros: "GET_FIELD.mystruct.myfield(base, result)" and "SET_FIELD.mystruct.myfield(base, value)". The definition of:
______________________________________
"GET_FIELD.mystruct.myfield(base, result)" is "ld.2.1 [base+16], result"
"SET_FIELD.mystruct.myfield(base, value)" is "st.2.1 value, [base+16]"
______________________________________
where the "2", "16", and "1" are extracted from the fields. Separately, the opcodes st.1.0, st.1.1, st.2.0, st.2.1, st.4.0, st.4.1, ld.1.0, ld.1.1, ld.2.0, ld.2.1, ld.4.0, ld.4.1 are defined to be the appropriate opcodes for storing and loading the appropriately sized field of the appropriate sign. For example, on SPARC, "ld.2.1" is defined to be "ldsh" (load a signed half-word). Similarly,
______________________________________
GET_FIELD(base,mystruct,myfield,reg)
SET_FIELD(value,base,mystruct,myfield)
______________________________________
would be defined to be
______________________________________
GET_FIELD.mystruct.myfield(base,reg)
SET_FIELD.mystruct.myfield(base,value)
______________________________________
respectively. Hence
______________________________________
GET_FIELD(base,mystruct,myfield,reg)
______________________________________
turns into
______________________________________
GET_FIELD.mystruct.myfield(base,reg)
______________________________________
which turns into
______________________________________
ld.2.1 [base+16],reg
______________________________________
which turns into
______________________________________
ldsh [base+16], reg
______________________________________
which is the desired result of an inheriting language program instruction that has utilized information from a generating language program.
FIG. 5 illustrates an alternate apparatus for practicing the invention. FIG. 5 corresponds to FIG. 1, but includes a storage device 80, a communications interface 82, a network link 84, and a network 86. The programs stored in the memory 24 may be downloaded from a computer-readable medium associated with the storage device 80, or alternately, may be executed from the computer-readable medium associated with the storage device 80. The term "computer-readable medium" refers to any medium that participates in providing instructions to the processor 22 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, associated with the storage device 80. Volatile media includes dynamic memory. Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 28. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, a RAM, a PROM, EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described below, or any other medium from which a computer can read.
Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 22 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer.
The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to the computer system 20 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector coupled to the bus 28 can receive the data carried in the infra-red signal and place the data on bus 28. The bus 28 then carries the data to the memory 24, from which the processor 22 retrieves and executes the instructions. The instructions received by the memory 24 may optionally be stored on the storage device 80 either before or after execution by the processor 22. The computer system 20 also includes a communication interface 82 coupled to the bus 28. The communication interface 82 provides a two-way data communication coupled to a network link 84 that is connected to a network 86. For example, the communication interface 82 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, the communication interface 82 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, the communication interface 82 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information. The network link 84 typically provides data communication through one or more networks, represented by the network 86. For example, the network link 84 may provide a connection to a network 88 that includes a host computer operated as an Internet Service Provider (ISP). The ISP in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet". 
The network 86 uses electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on the network link 84 and through the communication interface 82, which carry the digital data to and from the computer system 20, are exemplary forms of carrier waves transporting the information. The computer system 20 can send messages and receive data, including program code, through the network 86, the network link 84, and the communication interface 82. In the Internet example, a server on the network 86 may transmit a requested code for an application program through the network 86, the network link 84, and the communication interface 82. The received code may be executed by the processor 22 as it is received and/or stored in the storage device 80, or other non-volatile storage for subsequent execution. In this manner, the computer system 20 may obtain application code in the form of a carrier wave.
The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that the specific details are not required in order to practice the invention. In other instances, well known circuits and devices are shown in block diagram form in order to avoid unnecessary distraction from the underlying invention. Thus, the foregoing descriptions of specific embodiments of the present invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in view of the above teachings.
The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.
