Relocation format for linking
6859932
Abstract
An executable program is prepared from a plurality of object code modules, each object code module including section data and associated relocations and at least one of the object code modules further including code sequences at least some of which are likely to be repeatedly included in the executable program. Wherever a code sequence is to be inserted, a relocation instruction specifies the location of the code sequence and the code sequence is inserted into the section data at the appropriate point. A linker, a method for assembling, and a computer program product support these operations.
Claims
What is claimed is:
Description
FIELD OF THE INVENTION
TABLE 1
Name RC_ARG1 Meaning
RC_PARAM 3 r_arg1 is param
RC_VAL 2 r_arg1 is value
RC_SYM 1 r_arg1 is symbol
RC_UNUSED 0 r_arg1 is unused
The above described new type of relocation section supports a number of special relocations which allow a number of different functions to be performed by the linker. FIG. 3 is a block diagram of components of the linker which will be used to describe these additional functions. It will be appreciated that in practice the linker can be constituted by a suitably programmed microprocessor. It will be understood therefore that the schematic blocks shown in FIG. 3 are for the purposes of explaining the functionality of the linker. The linker comprises a module reader 10 which reads a set of incoming object files as user written code modules and library object files from the library 6. A relocation module 12 reads the relocations in the object code module. A section data module 14 holds section data from the object code module and allows patching to take place in response to relocation instructions in the object code module interpreted by the relocation module 12. The relocation module can also interpret special relocations and apply these to the section data held in the section data module 14. A program former 20 receives sequences from the section data module 14 and/or the library 18 depending on the actions taken by the relocation module 12 and forms the executable program 5 which is output from the linker 4. The linker also includes a condition evaluator 22 which operates in conjunction with a stack-type store 24. The condition evaluator reads the value of the top entry of the stack 24. The linker also implements three arrays or tables as follows, a parameter array 16, a symbol table 17 and a condition array 26. Before describing more specifically each of the above new relocations, the basic operation of forming an executable by a linker is summarised below. The basic operation comprises: 1. copying sections from input modules to same-name sections in the output executable, and 2. 
patching sections following the relocations in their corresponding relocation sections. This includes deleting code sequences from the module (caused by an assembler directive LT_IF, discussed later) and inserting code sequences (caused by a macro call, also discussed later). After step 1, all the branches of the LT_IF . . . LT_ENDIF assembler directives are present in the executable, and the linker is only concerned with deleting unwanted sequences. In the case of link time macro calls, at step 2, it inserts section data from the macro section (discussed later), deleting the requisite marker bytes. The macro section will itself be subject to the same step 2, each time a macro insertion is required. Link Time Calculations The first special relocation type which will be described allows arbitrary calculations to be passed to the linker by way of a number of special relocations which are defined by the reltype field of the new relocation format ELF32_relo. These relocations are numbered 6-29 in Annexe 2. The set of the special relocation types listed in Annexe 2 allow the linker to support a general purpose stack based calculator. These relocations allow the value of symbols and constants to be pushed on the stack 24 and a designated manipulation to be performed. With the bits RC_ARG1 in the class field CF set to RC_UNUSED (see Table 1), binary operators act on the top two stack entries. Otherwise, the value passed and the top of stack (tos) entry are used. Unary operators operate on the top of the stack 24 (tos). Both pop their operands and place the result on the top of the stack. The full definition of the relocation types to support this is given in Annexe 2. There follows examples of their use. Patch symbol plus addend in 16 bit target integer This could be accomplished by the following ordered sequence of relocations. The effect of the sequence is illustrated schematically in FIG. 4. FIG. 
4 illustrates section data and its accompanying set of relocations forming part of an object code module 3. The relocations will be read in order from the bottom in FIG. 4. The listed relocations are:
R_PUSH symbol /* relocation to push value of symbol on stack */
R_PUSH value /* relocation to push constant value on stack */
R_ADD /* pop top two values off stack, add them and push result back */
R_b16s0B2 /* patch the value popped from the top of stack into the section data, 16 bits are to be patched, starting at bit 0, in target object two bytes wide */
all with the same offset (the offset of the integer to be patched in the section). The result of the patch is shown in the section data which forms part of the executable program 5. The above relocations are implemented as described in the following with reference to FIGS. 3 and 4. The section data and relocations are read by the module reader 10. The section data is applied to the section data module 14 and the relocations are applied to the relocation module 12. The relocation module considers the first relocation, in this case R_PUSH symbol, and acts accordingly to read the required value of the identified symbol from the symbol table 17 and push it onto the stack 24. The subsequent relocations are read, and the necessary action taken with respect to the stack as defined above. Finally, the last bit relocation R_b16s0B2 patches the final result value from the stack 24 into the 16 bit target integer. This patched section data is held in the section data module 14 ready for inclusion in the final program at the program former 20 unless, of course, some later relocations make further modifications prior to completion of linking. As a short-hand any operator can be accompanied by a symbol as one of its operands (the left hand operand in the case of binary operators).
In that case the following sequence could be used:
R_PUSH value /* relocation to push value on stack */
R_ADD symbol /* pop top, add the value of the symbol and push back the result */
R_b16s0B4 /* patch section data, 16 bits, starting at bit 0, in target object four bytes wide */
Although the above are given as examples of use of the stack calculator in the linker, the stack calculator is not actually needed for this calculation since both a symbol and a value could be passed in one normal bit relocation. All that is needed in this case is: R_b16s0B2 symbol value. Nevertheless the example illustrates how the special relocations support a stack based calculator at the linker. The top of stack can also be used for conditional linker relocations as described later. For example, to include section bytes if a symbol has more than 8 bits we could use:
R_PUSH symbol
R_PUSH 0xffff_ff00
R_AND
(the above relocations all have the address field r_offset set equal to the start of the section bytes to be conditionally included)
R_ENDIF (with the address field r_offset set equal to end of section bytes to be included+1)
(R_ENDIF is discussed later)
The relocation R_PUSH can have a number of different effects. With the bits RC_ARG1 set to RC_SYM (i.e. the r_arg1 field acts as a symbol index), the field s1 holds a value to indicate what part of symbol information is to be pushed on the stack. The value held in the s1 field is indicated in Table 2.
TABLE 2
Name Meaning Value
SF_NAME st_name 1
SF_VALUE st_value 2
SF_SIZE st_size 3
SF_INFO st_info 4
SF_OTHER st_other 5
SF_INDEX st_shndx 6
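The stack calculator described above, together with the RC_ARG1 classes of Table 1 and the symbol field selectors of Table 2, can be illustrated with a short Python sketch. It is only a sketch under stated assumptions and not the linker of the embodiment: the tuple encoding of a relocation, the function name apply_relocs, the dictionary used as a symbol table and the little-endian byte order of the patch are all hypothetical choices made for the illustration.

```python
RC_SYM, RC_VAL = 1, 2   # RC_ARG1 classes taken from Table 1
SF_VALUE = 2            # symbol field selector taken from Table 2


def apply_relocs(relocs, symbols, section):
    """Run stack-calculator relocations against a bytearray of section data.

    relocs  : list of tuples (hypothetical encoding, not the ELF32_relo layout)
    symbols : dict mapping symbol name -> {field selector: value}
    Returns the value stack left after the last relocation.
    """
    stack = []
    for op, *args in relocs:
        if op == "R_PUSH":
            arg_class, arg1, *rest = args
            if arg_class == RC_SYM:
                # r_arg1 names a symbol; s1 selects which field is pushed
                stack.append(symbols[arg1][rest[0]])
            else:
                # r_arg1 is the constant itself
                stack.append(arg1)
        elif op == "R_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "R_AND":
            b, a = stack.pop(), stack.pop()
            stack.append(a & b)
        elif op == "R_b16s0B2":
            # Patch 16 bits, starting at bit 0, into a two byte wide object.
            # Little-endian storage is an assumption of this sketch.
            offset = args[0]
            value = stack.pop() & 0xFFFF
            section[offset:offset + 2] = value.to_bytes(2, "little")
        else:
            raise ValueError("unsupported relocation: " + op)
    return stack


symbols = {"fred": {SF_VALUE: 0x1000}}
section = bytearray(4)

# "Patch symbol plus addend in 16 bit target integer" at offset 2
leftover = apply_relocs(
    [("R_PUSH", RC_SYM, "fred", SF_VALUE),
     ("R_PUSH", RC_VAL, 4),
     ("R_ADD",),
     ("R_b16s0B2", 2)],
    symbols, section)

# Condition "symbol has more than 8 bits": non-zero top of stack means true
condition = apply_relocs(
    [("R_PUSH", RC_SYM, "fred", SF_VALUE),
     ("R_PUSH", RC_VAL, 0xFFFFFF00),
     ("R_AND",)],
    symbols, bytearray())
```

Only commutative operators are modelled here, so the sketch does not depend on the operand order that Annexe 2 defines for the non-commutative relocations.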
Different macro parameter types (MPT) can be passed with the R_PUT_PARAM and R_GET_PARAM relocations. They enable type-checking of the macro call parameters, and allow the linker to distinguish symbol indexes from values. MPT_VALUE denotes a constant value and is denoted by value 0 in the s2 field. MPT_SYMBOL denotes a symbol index and is denoted by value 1 in the s2 field. For a processor having two instruction modes, this artefact can be used to denote the mode of instruction that the symbol labels. Thus, the R_PUSH_ST_OTHER is used to detect at link time if a symbol is mode A or mode B code. The assembler sets s1 to mask off the STO_MODE_A bit in the symbol's st_other field. The linker pushes the bitwise AND of s1 and the st_other field on the internal linker stack. This can be used by the linker conditional relocations to insert the appropriate calling sequences.
Conditional Section Data
Another set of the special relocations allows code sequences to be conditionally included in a final executable program. For now, it is assumed that all the possible alternative sequences are included in the section data of the object code module which the linker is currently examining. It will become clear later, however, that other possibilities exist for the alternative sequences. A method of conditionally including one sequence out of a number of alternatives in the section data will now be described with reference to FIGS. 3 and 5. The assembler 2 acts on Conditional Assembler directives to generate special relocations which instruct the linker to conditionally delete unwanted section data. FIG. 5 shows how a resulting object module comprises a set of sections, each section comprising a plurality of code sequences O1,O2,O3 each having a relocation section R1,R2,R3 generated by the assembler. The section data .xxx is shown in FIG. 5 with its relocations R1,R2,R3 in the relocation section .relo.xxx.
The relocations bracket the code sequences between R_IF and R_ENDIF relocations, which denote the respective offsets defining the code sequences in the section data. An example sequence is illustrated in FIG. 5. The relocation sections are read by the relocation module 12 of the linker 4 to determine how to patch the section data to form a program. According to this embodiment relocation sequences are included in the relocation section associated with each code sequence in the section data to denote that a sequence may be conditionally deleted in the program depending on the top of stack value determined by the previous stack manipulations done by the linker. These relocations compute the conditions to be evaluated, using the symbols or values in the section data. In FIG. 5, code sequences O1,O2,O3 are alternative sequences for possible deletion in the final module. Thus, the final executable program 5 might include sequence O2 only, sequences O1,O3 having been deleted by the linker because of the relocations R1,R3. In that case, sequence O2 has been "patched" (i.e. not deleted) using relocations in R2. At link time the relocation module 12 makes multiple passes over the section's relocations recording which conditional passages are included. These are held in the section data module 14 while the condition evaluator 22 evaluates the condition by examining the top of stack. The conditions for inclusion are based on the values of symbols and, since some of these will be forward references to labels in the same section, the result of a given conditional expression may change on the next pass. For this reason multiple passes are required until no more changes are needed. In order to support the conditional section relocation, a number of new Assembler Directives are required as follows. These cause certain special relocations to be issued as described later:
LT_IF expr Marks the start of a block of section data to be conditionally deleted.
The condition is that expr should evaluate non-zero. The assembler issues the stack manipulation relocations 6-29 in Annexe 2 to push expr on the linker stack 24 and an R_IF relocation.
LT_ELSE Marks the start of a block of section data to be conditionally inserted/deleted. The condition is the previous LT_IF at the same level of nesting evaluated as zero. The assembler issues an R_ELSE relocation.
LT_CONDITION condition_name expr The assembler issues the relocations to calculate the expr (that is, expr is on top of the stack). If condition_name has already appeared in an LT_CONDITION directive then the index associated with it is re-used. Otherwise the next unused index is chosen (initially 0). The assembler then issues R_STORE with that index. In this way, the condition array 26 can be constructed. After the condition_name has been associated with an index in this way it can be used in an expression in place of a constant or symbol. When used, the assembler issues R_FETCH with the index associated with condition_name. That index is used to address the condition array 26. The scope of condition_name is the section where the LT_CONDITION directive occurs, from its point of first occurrence.
LT_ENDIF Marks where normal linker processing re-starts after an LT_IF/LT_ELSE/LT_IF_FIXED (described later) directive. The assembler issues an R_ENDIF relocation.
The following are the special relocations used to support conditional section data deletions, which are issued by the assembler responsive to the conditional Assembler Directives.
R_IF Causes the top entry to be popped from the linker's stack of values. If the value is zero then section data is skipped and the succeeding relocations are ignored until R_ELSE/R_ENDIF is encountered. If the value is non-zero then relocations are processed and instructions are not deleted until R_ELSE/R_ENDIF is encountered.
R_ENDIF Defines the end of the relocations subject to the R_IF relocation, and of section data to be conditionally deleted subject to the R_IF relocation.
R_ELSE If this is encountered while section data is being taken then section data is skipped and the succeeding relocations are ignored until R_ENDIF is encountered. If encountered while skipping due to R_IF then relocations are processed and instructions are no longer deleted until R_ENDIF is encountered.
R_STORE index A value is popped from the linker's stack of values. It is put in the condition array 26 kept by the linker for this purpose. The value is stored at the index passed with the relocation (in the nonbit.subtype field). This relocation avoids the overhead of passing the same calculation to the linker many times over.
R_FETCH index A value is pushed on the linker's stack of values. The value pushed is the value in the condition array 26 at the index passed with the relocation.
Link Time (LT) Macros
Reference will now be made to FIGS. 3 and 6 to describe link time macros. Link time macros contain parameterizable code sequences M1,M2 etc. that are presented to the linker just once, in a section of the object code module reserved for this purpose. This section has the name .macro pre-defined for it. Code for the macro section is created by the assembler exactly as for other sections from user written source code. The macro section provides code sequences which may optionally be included in the final program. As mentioned earlier, the most useful optimizations may be stored in macro sections in object files in the standard library 6 delivered with the toolchain. The macro code extends the possibilities for optimization. Associated with each macro section .macro is a relocation section (.relo.macro) MR which contains the relocations generated by the assembler for the macro section. A .relo.macro section can contain relocations that patch in parameters to its macro section.
It also contains relocations which determine conditions to establish which macro code sequences are included in the final executable program. The object code module includes a symbol section holding symbols which allow values to be accessed by relocations. As a matter of terminology we will call relocatable sections which are not the .macro section ordinary sections. One such section is labelled section .xxx in FIG. 6. It includes alternative code sequences labelled O1,O2 in FIG. 6, each with an associated relocation R1,R2,R3 in the relocation section .relo.xxx. Link time macros are created by a programmer and included in the source code module. A link time macro is invoked by naming a symbol defined in the .macro section at the inserting location IL in the ordinary section .xxx where the optimizable sequence is required. The parameters are also specified. These are done by two relocations R_PUT_PARAM and R_MACRO_CALL, discussed later, which are generated by the assembler. Invocation of a macro section by the assembler is achieved by generating the macro call relocation R_MACRO_CALL <symbol> in the ordinary section relocations, e.g. before R1 in FIG. 6. In one embodiment, the assembler also plants a marker byte MB at the insertion location IL in the section data thus ensuring that the inserted code sequences have a distinct address. The linker 4 implements a macro call relocation by opening the macro section M and its related .relo.macro section MR. The symbol identified in the macro call relocation accesses the symbol section which holds at that symbol an offset identifying a location in the macro section. The relocation module 12 first locates this offset in the object code module 3 and verifies that there is a link time macro starting at that offset with the correct macro name. In FIG. 6, M1 is specified. The relocation module 12 then traverses the .relo.macro section starting at the R_START_MACRO until it encounters the end of macro relocation R_EXIT_MACRO.
The macro section includes a number of alternative code sequences, each associated with conditional expressions embodied in the relocations in the MR section. The linker skips over any code sequences (and associated relocations) for which conditional linker expressions evaluate as false (as described earlier). Code sequences not skipped are to be inserted in the ordinary section replacing the marker byte(s) MB. Before being inserted these macro section bytes will be relocated themselves, taking into account their destination address in the ordinary section. If the same link time macro is invoked at multiple locations in the ordinary section then that part of the .macro section will be relocated multiple times with different values for the program counter at the start of the macro sequence depending on where it is being inserted in the ordinary section. Linker optimization involves multiple passes over the relocations since the values of symbols change as code is inserted, and some symbols will be forward references. Those that are forward references will change, and so invalidate any uses of that symbol earlier in the same pass. For this reason it is necessary to continue making passes through the ordinary section applying relocations until the values of the symbols have stabilized. The effect of this after linking is to provide in the final executable program 5 at the marked location IL in the ordinary section data .xxx a set of the macro code sequences (e.g. M1 in FIG. 6) drawn from the macro in the macro section between the offset identified in an R_START_MACRO relocation and that specified in the R_EXIT_MACRO relocation. In order to support link time macros, a number of new Assembler Directives are required as follows. These cause macro sections and macro relocations to be invoked as described later.
In the macro section.
LT_IF_FIXED As LT_IF except that instead of passing a boolean expression expr, the condition is internal to the linker optimization process.
The condition is normally false but becomes true when the linker requires a fixed length insert. The assembler issues an R_IF_FIXED relocation.
LT_DEFMACRO macro_name(<param_type>param_name[,<param_type>param_name]) [:maxbytes[:align[:sched_info]]] This directive introduces a link time macro definition. The macro_name should be the label at the first instruction of the macro body. The param_names are the formal parameters used in the body of the macro. The assembler emits R_GET_PARAM for each occurrence of a formal parameter in an expression in the body of the macro. The param_type associated with the formal parameter is passed with the relocation R_GET_PARAM. The assembler emits R_START_MACRO at this point. The integers maxbytes and align (or zero if they are absent) are encoded in the subtype fields of the R_START_MACRO relocation. The sched_info field is used by the assembler for optimizing. This value is passed in the r_arg1 field and any value mismatch between the call and caller is reported by the linker unless sched_info is zero.
LT_ENDMACRO Marks the end of the macro body. The assembler emits R_EXIT_MACRO at this point.
In ordinary sections.
LT_DECLMACRO macro_name(<param_type>[,<param_type>]) [:sched_info] The name of the macro and the types of the parameters that it expects are given in the directive. The link time macro name hides any mnemonic of the same name and is hidden by any assembler macro of the same name. The R_MACRO_CALL relocation is issued. The value sched_info is passed in the r_arg2 field of the macro call relocation (0 if not specified). As an alternative to macro code being written in the object code module itself, it can be supplied in an object file within the toolchain library 6. A link time (LT) macro invocation is signalled to the assembler by the syntax: macro_name[param[,param] . . . ] [:maxbytes[:align[:sched_info]]] For each parameter the assembler emits a relocation R_PUT_PARAM for the parameter with index values 0,1, . . . etc.
The assembler then emits the R_MACRO_CALL relocation with the symbol macro_name. The meaning of the macro invocation is that a LT selected sequence of instructions is to be inserted by the linker at this point in the code. LT macro invocation is allowed only in ordinary sections. The integers maxbytes, align, optionally passed in the macro call, enable error checking between the macro call and its instantiation. They are encoded into the subtype fields of the R_MACRO_CALL relocation. They are also used by the assembler to determine the maximum number of bytes that the macro call will generate, and the alignment (i.e. any guarantees about the low order bits of the macro length being zero). The integer sched_info must match any value given in the corresponding declaration. It is passed to the linker in the r_arg2 field. It contains architecture specific information about the kind of instructions contained in the macro (used by the assembler for scheduling). A value of zero for any of these means no information is provided, and link time checking is turned off. Relocations for .macro Sections R_IF_FIXED This is like R_IF except that instead of popping a value from the stack, the condition is whether the linker is attempting to optimize. The linker will not be attempting to optimize if the code is marked as not optimizable, or if after several passes the macro is oscillating in size. For this purpose the linker maintains a condition flag. R_START_MACRO The linker seeks this relocation at the offset labelled by the macro name (relocations prior to this one are not processed) It is an error for this macro to appear more than once at one offset in a .macro section. R_GET_PARAM index This relocation conveys in its r.nonbit.subtype1 field s1 an index for accessing the parameter array 16. The linker reads the index'th parameter from its parameter array 16. The interpretation of this parameter depends on the RC_ARG1 bit in the r_class field (see Table 3). 
If this is set then the parameter is an index into the symbol table 17 and the symbol's value is pushed on to the linker's stack 24 of values. Otherwise the value itself is pushed. In all cases the nonbit.subtype2 field s2 is checked for type mis-match with the value stored in the parameter array at the index passed. R_EXIT_MACRO The linker stops inserting bytes/processing relocations from the .macro section. It discards the parameter array and then the macro invocation terminates. Relocations for Ordinary Sections R_PUT_PARAM index An index is passed in the r.nonbit.subtype1 field s1. The value in the r_arg1 field is stored by the linker in the parameter array 16 at this index. The linker also stores the value of the r.nonbit.subtype2 field s2 of this relocation along with the parameter. This enables the linker to perform type checking when R_GET_PARAM is encountered. R_MACRO_CALL symbol The symbol specifies an offset in the macro section. The relocations in .relo.macro are traversed from the R_START_MACRO at that offset until R_EXIT_MACRO is processed. Section data from the macro section are inserted in the section at the location of the R_MACRO_CALL relocation. This relocation is only found inside relocation sections of ordinary sections. Generally multiple passes are required through the relocations for values to stabilize. The linker will store the current number of bytes patched by the R_MACRO_CALL relocation with that relocation. There may be circumstances where the optimization would not terminate because of a macro relocation oscillating in size indefinitely. If this happens the linker will start patching such macros with the condition "fixed size" true, so that the number of bytes patched-in stays constant from one pass to the next. The fixed size condition is checked for by the R_IF_FIXED relocation. There follows an example of how to write a link time macro. The parts in the FIXED FONT are the actual sample assembler file for a link time macro. 
In between is commentary in normal font.
SECTION .macro
A link time macro is defined by the directive LT_DEFMACRO, for example a macro with a symbol parameter would be defined:
LT_DEFMACRO const_load(.SYM s)
The name of the macro must label the start of the sequence of instructions to be inserted and be exported, thus:
EXPORT const_load
const_load:
Directives are written to instruct the linker to insert some of the subsequent instructions until the LT_ENDMACRO directive is reached. The alternatives are selected by expressions involving the parameters to the macro. For example:
LT_IF (s =<0xFFFF)
MOVI s, R0
LT_IF_FIXED
NOP ; to pad out the code to a fixed length when not optimizing
LT_ENDIF
LT_ENDIF
LT_IF (s>0xFFFF)
MOVI (s>>16), R0
SHORI (s&0xFFFF), R0
LT_ENDIF
LT_ENDMACRO
From an ordinary section the link time macro would be declared to the assembler and then invoked as follows:
SECTION .text, AX
LT_DECLMACRO const_load(.SYM) ; declaration of the macro and its parameter type. For a symbol the type is MPT_SYMBOL.
IMPORT fred ; fred is unknown until link time
. . .
const_load fred ; call link time macro to load value of symbol fred into register R0.
The assembler emits a single marker byte into the section data. It is instructive to write out this example with the assembler generated Elf side by side, see Tables 3 and 4.
TABLE 3
Assembler source                Relocations generated in .relo.macro
SECTION .macro                  .macro and .relo.macro sections are created
LT_DEFMACRO const_load(.SYM s)  R_START_MACRO
EXPORT const_load               const_load is put in the Elf symbol table as global
const_load:
LT_IF (s =<0xFFFF)              R_PUSH 0xFFFF
                                R_GET_PARAM index=0 type=MPT_SYMBOL
                                R_LE
                                R_IF
MOVI s, R0                      (program counter advances)
                                R_GET_PARAM index=0 type=MPT_SYMBOL
                                R_b16s5B4
LT_IF_FIXED                     R_IF_FIXED
NOP                             (program counter advances)
LT_ENDIF                        R_ENDIF
LT_ENDIF                        R_ENDIF
LT_IF (s > 0xFFFF)              R_PUSH 0xFFFF
                                R_GET_PARAM index=0 type=MPT_SYMBOL
                                R_GT
                                R_IF
MOVI (s>>16), R0                (program counter advances)
                                R_GET_PARAM index=0 type=MPT_SYMBOL
                                R_PUSH 16
                                R_SHR
                                R_b16s5B4
SHORI (s&0xFFFF), R0            (program counter advances)
                                R_GET_PARAM index=0 type=MPT_SYMBOL
                                R_PUSH 0xFFFF
                                R_AND
                                R_b16s5B4
LT_ENDIF                        R_ENDIF
LT_ENDMACRO                     R_EXIT_MACRO
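The way the conditional relocations of Table 3 select instructions can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the linker of the embodiment: each relocation is a hypothetical tuple, the pseudo-entry "DATA" stands for the section bytes of one instruction, the functions expand and const_load are invented names, and each condition is pushed as an already computed 0 or 1 rather than being built from R_GET_PARAM, R_LE and R_GT, because the operand order of those operators is defined in Annexe 2 and not reproduced here.

```python
def expand(relocs, fixed=False):
    """Return the section items kept after processing conditional relocations.

    fixed models the linker's "fixed size" condition tested by R_IF_FIXED.
    """
    values = []       # linker value stack (stack 24 in the description)
    taking = [True]   # one flag per open R_IF level; True means data is taken
    kept = []
    for op, *args in relocs:
        active = all(taking)          # False while any enclosing level skips
        if op == "R_PUSH":
            if active:                # relocations are ignored while skipping
                values.append(args[0])
        elif op == "R_IF":
            # Pop the condition only when this relocation is really processed.
            taking.append(bool(values.pop()) if active else False)
        elif op == "R_IF_FIXED":
            # No value is popped; the condition is the linker's own flag.
            taking.append(fixed if active else False)
        elif op == "R_ELSE":
            taking[-1] = not taking[-1]
        elif op == "R_ENDIF":
            taking.pop()
        elif op == "DATA":            # hypothetical stand-in for section bytes
            if active:
                kept.append(args[0])
        else:
            raise ValueError("unsupported relocation: " + op)
    return kept


def const_load(s, fixed=False):
    """Model the body of the const_load macro for a symbol value s."""
    relocs = [
        ("R_PUSH", int(s <= 0xFFFF)), ("R_IF",),
        ("DATA", "MOVI s, R0"),
        ("R_IF_FIXED",), ("DATA", "NOP"), ("R_ENDIF",),
        ("R_ENDIF",),
        ("R_PUSH", int(s > 0xFFFF)), ("R_IF",),
        ("DATA", "MOVI (s>>16), R0"),
        ("DATA", "SHORI (s&0xFFFF), R0"),
        ("R_ENDIF",),
    ]
    return expand(relocs, fixed)


short_form = const_load(0x1234)               # symbol fits in 16 bits
padded_form = const_load(0x1234, fixed=True)  # linker requires fixed length
long_form = const_load(0x12345)               # symbol needs two instructions
else_branch = expand([("R_PUSH", 0), ("R_IF",), ("DATA", "a"),
                      ("R_ELSE",), ("DATA", "b"), ("R_ENDIF",)])
```

With the fixed size condition true, the short form is padded with a NOP so that both alternatives occupy two instructions, which is what allows the number of bytes patched in to stay constant from one pass to the next.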
