Object-oriented symbolic debugger using a compiler driven database and state modeling to control program execution5812850Abstract A human oriented object programming system provides an interactive and dynamic process for debugging computer programs which facilitates the development of complex computer programs such as operating systems and large applications with graphic user interfaces (GUIs). The debugging system uses a database of information relating machine executable code to source code. The database is developed during the compilation process using an extensible object-oriented set of tools. The tools standardize the information developed during compilation into an information format which the debugging system can utilize to provide the user with a powerful source code view of the corresponding executing code. Claims Having thus described our invention, what we claim as new, and desire to secure by Letters Patent is as follows: Description COPYRIGHT NOTIFICATION
______________________________________
int Foo(char a)
bool yes = a== `Y`;
if (yes) {
printf ("%c = OK\n", a);
return yes;
}
else {
printf ("%c = NO\n", a);
return yes;
}
}
______________________________________
A sample program which returns a "1" if the parameter is equal to a "Y" and a "0" otherwise. This program is utilized to elaborate on how maps are used in a preferred embodiment. A user may set a breakpoint on the first "printf" statement, but not the second to determine flow of control through the program as it executes. Maps are used to manage the breakpoint execution as the code is optimized by the compiler. INSTRUCTION MAP FIG. 7 is a block diagram providing an overview of the Instruction Map 62 generated by the disclosed system, and the Instruction Map entries 64 which make up the Instruction Map 62. The Instruction Map 62 provides detailed knowledge of how each executable machine instruction relates to the original source code which caused the instruction to be created. The following describes, from the provider's perspective, a PCSourceMapElement entry 64 in the map 62.
______________________________________
struct PCSourceMapElement
TPCLocation
fPcLocation;
TSourceReference*
fSourceReference;
TOptimizations*
fOptimization;
PCAttributes
fPCAttributes;
PCSourceMapElement();
.about.PCSourceMapElement();
};
______________________________________
FIG. 7 shows each element of the map. Each map entry 64 in the map 62, consists of the following: 1. An instruction index range, of type TPCLocation: This indicates the set of instructions being described by the entry. The instruction index range typically consists of two numbers: a start and a length. The start indicates the first instruction covered by the map entry. The length indicates how many instructions, beginning with the starting instruction, are covered by the map entry. It is not necessary that a start and length be used. For instance, a set could be used, where each member of the set corresponds to an instruction in the program. It is also contemplated that the compressed forms of the start/length information could be used. For instance, the start could be implied to be the first instruction following the previous map entry, and only the length would be encoded. The particular start/length form could be chosen in order to complement the particular characteristics of the information provider (e.g. the compiler). 2. A source reference, of type TSourceReference: This describes a selection of text which, according to the provider, was responsible for the generation of the instructions covered by this entry. Source references are described in further detail below, but a wide variety of methods of referencing source are provided. For instance, the developer can reference a contiguous set of characters from some text, or have a set of character ranges, or use a token reference (see below--"The Token Map"), and so forth. 3. A set of attributes, of type PCAttributes: The current set of attributes defined includes: In.sub.-- Prologue, In.sub.-- Epilogue, Frame.sub.-- Not.sub.-- Constructed, Frame.sub.-- Destructed, and Characteristic.sub.-- Location. Instructions marked with In.sub.-- Prologue are involved in setting up a function for execution, such as saving registers, establishing a frame, or making copies of actual parameters. They usually map to some predefined source position, such as the opening brace in a C or C++ function. Instructions marked with In.sub.-- Epilogue are involved in finishing up a function's execution, such as restoring registers, destroying a frame, or copying a result to the caller. They usually map to some predefined source position, such as the closing brace in a C or C++ function. Instructions marked with Frame.sub.-- Not.sub.-- Constructed are usually part of the prologue sequence. The marking indicates that any frame for the function has not been set up yet. The instruction(s) that actually set up the frame also will be marked with this attribute. Keep in mind that instruction attributes describe the state of things if one were to stop on that particular instruction, before that instruction was executed. So, Frame.sub.-- Not.sub.-- Constructed would be reset (i.e., not set) beginning with the instruction after the frame is constructed. Instructions marked with Frame.sub.-- Destructed are usually part of the epilogue sequence. The marking indicates that any frame for the function has been thrown away. The instruction(s) that actually throw away the frame (e.g. restore a frame pointer to the caller's value) will not be marked with this attribute, but the instructions after would. The Characteristic.sub.-- Location attribute is used to note that the instruction corresponds to an interesting breakpoint location for some statement. It is one way of making such an association. The preferred way of noting breakpoint locations for a statement is to make the annotation directly in the Statement Map 68. 4. A list of Optimization Annotations, derived from type TOptimization: The section below, entitled "Code Optimization Annotations" describes the specific annotation entries that can appear in this list. If an instruction maps to several different source references, then a separate PCSourceMapElement entry could be created for each mapping, but use of Optimization Annotations for this case would probably be more desirable. It is important to keep in mind that the discussion herein is describing the information provider's interface to the symbolic information component of the Debugger. The Debugger reformats any provided information into a form more suitable and possibly more compact, but this is all hidden from the provider. In a sample Instruction Map associated with the sample program, as shown in FIG. 8, common subexpression elimination and common tail merge optimization by the compiler make it difficult to set a breakpoint on one of the "printf" statements, but not the other "printf" statement. FIG. 8 is an Instruction Map in accordance with a preferred embodiment of the invention. The location 501 is a number representative of the op code of the machine instruction corresponding to an instruction in the sample program set forth above. The attributes 502 defined in a preferred embodiment includes: In.sub.-- Prologue, In.sub.-- Epilogue, Frame.sub.-- Not.sub.-- Constructed and Frame.sub.-- Destructed. Instructions marked with In.sub.-- Prologue are involved in setting up a function for execution, such as saving registers, establishing a frame, or making copies of actual parameters. The instructions map to some predefined source position, such as the opening brace in a C or C++ function. Instructions marked with In.sub.-- Epilogue are involved in finishing up a function's execution, such as restoring registers, destroying a frame, or copying a result to the caller. The instructions map to some predefined source position, such as the closing brace in a C or C++ function. Instructions marked with Frame.sub.-- Not.sub.-- Constructed are usually part of the prologue sequence. The marking indicates that any frame for the function has not been set up yet. The instruction(s) that actually set up the frame also will be marked with this attribute. Instruction attributes describe the state of things if stopped on that particular instruction, before that instruction was executed. So, Frame.sub.-- Not.sub.-- Constructed would be unset (i.e., not set) beginning with the instruction after the frame is constructed. Instructions marked with Frame.sub.-- Destructed are usually part of the epilogue sequence. The marking indicates that any frame for the function has been thrown away. The instruction(s) that actually throw away the frame (e.g., restore a frame pointer to the caller's value) will not be marked with this attribute, but the instructions after would. In FIG. 8, PF 506 the P is shorthand for the In.sub.-- Prologue attribute as discussed above and the F is shorthand for a Frame.sub.-- Not.sub.-- Constructed attribute. In ED 507, the E is shorthand for In.sub.-- Epilogue attribute and the D is shorthand for the Frame.sub.-- Destructed attribute. An Optimization Annotation 503 describes if the instruction is affected by optimization performed by the compiler. Source 504 corresponds to the source code related to the instruction according to the compiler. The instruction area 505 is not part of the actual map in accordance with a preferred embodiment, but is included as an aid for clarifying the invention to assist in understanding the preferred embodiment. The instructions 505 are for a hypothetical machine and loosely correspond to the sample program set forth above. The remaining elements of FIG. 8 will be discussed below in the discussion of FIGS. 9 and 10. FIG. 9 illustrates a frame in accordance with a preferred embodiment at the time the sample program is invoked. A frame, such as Frame 521, is a range of memory locations in a computer containing parameters passed to a function, return addresses, saved register values, values for program variables and locations to store expression values as they are computed. A frame is typically accessed via one or two logical registers: a StackPointer 522 pointing to the beginning of the frame's stack in memory and a FramePointer 523 pointing to the beginning of the frame in memory. In FIG. 10 (corresponding to label 511 of FIG. 8), the old FramePointer 523 value is pushed onto the stack, and at label 512, the FramePointer 523 is set equal to the StackPointer 522 to create a new frame. FIG. 10 corresponds to the creation of a frame after execution of instruction 512 in FIG. 8. As shown in FIG. 10, Frame Pointer 523 points to Stackpointer 522, which points to a saved Framepointer. As discussed above, instructions 511 and 512 (FIG. 8) are prologue code that initialize the environment for the sample program's execution. This corresponds to the "P" attributes shown at 506 as discussed above. Until the instruction at 512 has been executed, the frame for the sample program has not been initialized, which is captured in the map with the "F" attributes 506. When a function is about to end, it must restore the caller's frame and return any resulting value. These functions are typically performed in the epilogue, which correspond to the instructions at 513 and 514 of FIG. 8, which both have the "E" attribute 507. Also, the instruction at label 513 restores the caller's frame value. So at label 514, the sample program's frame has been destroyed as indicated by the "D" attribute 507. The compiler indicates the sample program's source which best motivates the generation of the instructions 505. The Source 504 could correspond to a character range in the actual program listing or could refer to a set of tokens in a Token Map. One skilled in the art could envision many other ways to map instructions to source without departing from the claimed invention. The Code Optimization Annotation 503 contains zero or more Code Optimization Annotations for each instruction. The Code Optimization Annotation 515 indicates that the common operation corresponding to passing a variable "a" to a "printf" routine in the sample program is performed only once even though there are two statements in the sample program that execute this code. Code Optimization Annotation 516 indicates that many of the instructions necessary for invoking the first "printf" routine corresponds to the second "printf" invocation. Likewise Code Optimization 517 and 518 indicates that much of the code in the second "printf" invocation corresponds to the first. ADOPTPCSOURCEMAP FIG. 11 is a block diagram of the function AdoptPCSourceMap 129 for asking the information reader to adopt a particular format of information from a provider. The provider must put together a list of PCSourceMapElement entries 130 and ask the Debugger to adopt the information which describes a component in the Debugger Database. The following shows the Adopt function interface:
______________________________________
void AdoptPCSourceMap (const THoopsPropertyName name,
TPCSourceMapList*
pcSourceMap);
______________________________________
There is a different map kept to describe the Interface v. the Implementation properties of a component, so the developer must supply a "name" parameter to indicate which property the map describes. THE TOKEN MAP FIG. 12 is a block diagram showing a general overview of Token Map 66, and details of the Token Map entry objects 78. The Token Map 66 provides an indication of contiguous strings of characters in the source program. The entries 78 correspond to "tokens" in the source program's language, like identifier names, integer values, strings, keywords, arithmetic operators, etc. The following describes, from the provider's perspective, an entry in the Token Map 66:
______________________________________
TLanguageToken (const Language TokenID
id,
const TSourceElement&
sourceElement,
const ComponentIdentifier
component,
ELanguageTokenKind kind=
kNoTokenKind,
LanguageTypeID typeHandle=
kNoSemanticType,
const LanguageTokenID
dataMapHandle=
KNoDataMapEntry);
______________________________________
Each entry 78 in the Token Map 66, consists of the following, shown descriptively in FIG. 12: 1. An identifier, of type LanguageTokenID: This is a positive number, a key, which uniquely defines an entry in the Token Map 66. 2. A source reference, of type TSourceElement: This describes a selection of text which, according to the provider, corresponds to a single interesting token. A TSourceElement provides a character range, an indication of the text for which the range applies. There are also TSourceElement objects available for implicit text which may not actually appear in the source, such as the C++ implicit variable "this". 3. A kind, of type ELanguageTokenKind. A provider-specific enumeration which categorizes the token. It might break out tokens into, say, identifiers, literals, keywords, operators, etc. 4. A Type Map handle, of type LanguageTypeID: For tokens which correspond to objects which can take on a value, or which represent a type name in the source language, this is a handle or key to an entry in the Type Map 72 (see the section regarding "The Type Map") describing that type of the token. 5. A Data Map handle, of type LanguageTokenID: For tokens which correspond to objects which can take on a value, this is a handle, or key, to an entry in the Data Map 70 describing how to read and write that value. Ideally, all characters which are not spaces, tabs, linefeeds or similar positioning characters should be covered by a Token Map entry. Tools like a pretty printer could conceivably then operate on the displayed text of a component by simply adjusting the source position of elements in the Token Map 66 without affecting any other clients of the symbolic data for a component. Other tools and maps should then reference elements of the source component by entries in the Token Map 66. The provider must put together a list of TLanguageToken entries and ask the Debugger to adopt the information which describes a component in the cpProfessional Database. The following shows the adopt function interface:
______________________________________
void AdoptTokenMap (const THoopsPropertyName name
TLanguageTokenSetList*
tokenMap);
______________________________________
There is a different map kept to describe the interface v. the implementation properties of a component, so the developer must supply a "name" parameter to indicate which property the map describes. FIG. 13 is an example Token Map in accordance with a preferred embodiment. The Key 610 is a unique number corresponding to each entry in the Token Map. The Source 620 references characters described by the Token Map entry uniquely identified by the Key 610. The Kind 630 is a language specific enumeration for each possible token in the computer language. The Type Map Key 640 value corresponds to a Key in the Type Map if there is an entry in the Type Map. Each entry in a Type Map describes everything necessary to interpret a value for the program. The number of bits of storage required to store a value whether a scalar, array or other language type. If a scalar, whether signed or unsigned. A Data Map Key 650 corresponds to the unique identifier Key in the Data Map which is illustrated in FIG. 15 at 651. THE TYPE MAP The Type Map 72 will not be discussed in detail. Entries in the Token Map 66 contain a handle, or key, which correspond to an entry in the corresponding Type Map 72. The provider must create a Type Map entry to satisfy each handle referenced in the Token Map 66 except for references to standard, or "built-in" types, such as various kinds of integer or floating point types. These built-in types are referenced by special, reserved Type Map handles. Each entry in the Type Map 72 describes, in a language-neutral way, a value type. There are subclasses which describe integer, floating point, pointer, reference, array, subprogram, etc. types. Each entry describes the bit size of an object of the type. Scalar type entries describe if and how the value sign is formed, the minimum and maximum values for the type, and so forth. Enumeration types include a list mapping each valid value to a value name. Array types indicate whether the index bounds are static or dynamic, and how to get the bounds. This is just a slice of the kind of information contained for each entry in the Type Map 72. DATA MAP FIG. 14 is a block diagram showing a general overview of the Data Map 70 component of the Debugger Database in accordance with the principles of the present invention. Anything that can take on a value in a program might be described by an entry in the Data Map 70. Entries might correspond to variables and constants in a program, parameters of a function, or implicit value-taking objects, such as the implicit "this" variable in C++ objects. The following describes, from the provider's perspective, an entry in the Data Map 70:
______________________________________
struct PCDataMapElement
TPCLocation FPcLocation;
TSourceReference*
fSourceReference;
TDataLocation*
fDataLocation;
TAccessFormula*
fReadFormula;
TAccessFormula*
fWriteFormula
TOptimizationList*
fOptimization;
PCDataMapE1ement();
.about.PCDataMapElement();
};
______________________________________
Each entry 82 in the Data Map 70, consists of the following: 1. An instruction index range, of type TPCLocation: This indicates the set of instructions being described by this entry. The indicated variable is live whenever the program is about to execute any of the instructions in this range. The value typically consists of two numbers: a start and a length. The start indicates the first instruction covered by the map entry. The length indicates how many instructions, beginning with the starting instruction, are covered by the map entry. See the above discussion regarding "The Instruction Map" for possible encodings of this field. 2. A source reference, of type TSourceReference: This describes a selection of text corresponding to the data object described by this entry. Source references are further described below under "Source references." The preferred source reference method is to name an entry in the Token Map 66 corresponding to the data object described. 3. A description of the live location of the variable, of type TDataLocation: A data location provides a formula which is to be interpreted in the context of the executing program when it has paused at any instruction in the instruction range covered by this map entry. It can refer to absolute addresses, or registers in some frame context. It can have additional operators and operands to add offsets to an address, get the contents of memory pointed-at by an address, get a range of bits from a value, etc. Data locations are further described below in "Data locations." 4. A description of how to read the value, of type TAccessFormula: In most cases, a value is simply contained in the bits described by the data location just described. With some types of optimization, however, the value might be part of an expression which can be decomposed to extract the desired value. For instance, a loop control variable might be embedded in an induction variable, a variable which points at some array element indexed by the loop control variable. The induction variable then contains the address of the "i-th" array element, where "i" is the loop control variable. The read access formula could then indicate that to get the value of "i" from the induction variable, one must first subtract the base address of the array, and then divide by the array element size. Access formulas are further described below in "Access formulas." 5. A description of how to write the value, of type TAccessFormula: Like the read access formula just described, but this describes how to create the needed expression value to store when it is desired to update the value of the data object corresponding to this map entry. For the induction variable example above, the formula would indicate that the new value for the data object would be multiplied by the size of an array element, and added in the base address of the array. The result is the real value that must be written to the data location. See "Access formulas" for further description of access formulas. 6. A list of Optimization Annotations, of type TOptimization: The section below entitled "Data Optimization Annotations" describes the specific annotation entries that can appear in this list. The provider must put together a list of PCDataMapElement entries and ask the Debugger to adopt the information which describes a component in the Debugger Database. The following shows the adopt function interface:
______________________________________
void AdoptPCDataMap (const THoopsPropertyName name,
TPCDataMapList*
pcDataMap);
______________________________________
There is a different map kept to describe the interface v. the implementation properties of a component, so the developer must supply a "name" parameter to indicate which property the map describes. FIG. 15 is an example of a Data Map in accordance with a preferred embodiment. The Key in the Data Map 651 is a unique number corresponding to each entry in the Data map. The Variable 652 corresponds to a variable in the source program and although not required, it typically references an entry in the Token Map. The PC Location 653 indicates a range of instructions in the Instruction Map (FIG. 8) at label 501 for which the Variable 652 is live. Live refers to a variable that has storage allocated for its use. Location 657 is an address or register corresponding to the storage allocated for a variable's use. Read Formula 654 defines how a variable is read from location 657. Write Formula 655 defines how a variable is written to a location 657. Optimization Annotation 656 corresponds to a data Optimization Annotation describing any optimizations performed by the compiler. In the sample program, the functions return value is always equal to the variable "yes", so Data Optimization 658 indicates that the function's return value has a shared lifetime with variable "yes". By shared lifetime, changing the value of one variable necessarily entails a mirrored change to the other variable. Similarly, data optimization 659 indicates that the "yes" variable has a shared lifetime with return value for function "Foo". STATEMENT MAP FIG. 16 is a block diagram showing a general overview of the Statement Map 68 and details of the Statement Map entries which make up the Statement Map 68. The term "statement" should in this context should be explained to avoid misinterpretation. Each entry in this map corresponds to something that the provider considers to be an interesting statement-like entity. It might not only be the usual notion of a statement in the source language, such as a switch statement, an assignment statement, or even a complete function, but it might correspond to interesting elements, like an expression, a label, etc. Statements are related to each other such that they form a tree description of the source program. The root node of the tree corresponds to a complete compilation unit. Each child of the root represents statements Immediately contained in that root. Each of those children can also have children corresponding to, say, statements contained within a block statement. The following describes, from a provider's perspective, an entry 80 in the Statement Map 68:
______________________________________
TLanguageStatement
(const LanguageStatementID id,
const TSourceReference& source,
ELanguageStatementKind
kind= kNoStatementKind,
LanguageStatementID
parentStatement=
kNoStatementID,
TBreakpointList*
breakpointList=NULL);
______________________________________
Each entry 80 in the Statement Map 68 is comprised of the following objects (shown descriptively by number in FIG. 16): 1. An identifier, of type LanguageStatementID: This is a positive number, or key, which uniquely identifies an entry in the Statement Map 68. 2. A source reference, of type TSourceReference: This describes a selection of text which, according to the provider, corresponded to a single interesting "statement". See "Source references" for a description of all the ways the provider can provide a source reference. 3. A kind, of type ELanguageStatementKind: A provider-specific enumeration which breaks out the various kinds of "statements." For example, one language might provide an enumeration which categorizes assignment statements, break statements, goto labels, switch labels, functions names, etc. 4. A parent statement, of type LanguageStatementID: This is a positive number which references another statement which represents the parent in a tree-form view of statements, or is kNoStatementID which indicates that this Statement Map entry is the root of the statement tree. 5. A list of breakpoint locations for the statement, of type TBreakpointList*: Because of certain code optimizations, such as loop unrolling, a single statement may have more than one breakpoint location associated with it. FIG. 17 is an example of a Statement Map in accordance with a preferred embodiment. The Key 670 in the Statement Map is a unique number corresponding to each entry in the Statement Map. The Source 674 corresponds to the text of the Statement in the source program. The Kind 671 is a language sensitive enumeration describing the statement. The Parent 672 refers to an entry in Key 670 or has a value zero assigned to it. The Parent relationship describes the statements in a program in a tree relationship. So, for example, a Statement Map entry whose parent value is zero is the root of the tree. The Breakpoint 673 refers to a Location value 501 in the Instruction Map set forth in FIG. 8 and corresponds to the compiler's advice on the best instruction where a breakpoint should be installed for this statement. A list of breakpoint locations is permissible, depending on optimizations performed by the Compiler. SOURCE REFERENCES FIG. 18 shows the base class Source Reference 84 which provides a variety of methods for referencing text in a program. As shown in FIG. 18, it's possible to describe source text as: 1. A range of characters in some property of a program component: The source reference consists of a reference to a program component, a property within that component, and some indication of a range of characters within that property. Typically, the range is indicated by a start character number and a character string length, but other encodings are possible. 2. A set of character ranges in some property of a program component: The source reference consists of a set of the character ranges just described. This allows the developer to reference noncontiguous strings of characters. 3. An entry in some Token Map 66: The source reference consists of a reference to a program component, and an indication of whether to use the Token Map 66 corresponding to the interface or implementation property of the component, and a key corresponding to the identifier of some entry in the Token Map 66. 4. A range of entries in some Token Map 66: The source reference consists of a reference to a program component, and an indication of whether to use the Token Map 66 corresponding to the interface or implementation property of the component, and a range of keys corresponding to identifiers of entries in the Token Map 66. Typically, the range is indicated by a starting token identifier and a number to tokens in the range, but other encodings are possible. 5. A set of entries in some Token Map 66: The source reference consists of a reference to a program component, and an indication of whether to use the Token Map corresponding to the interface or implementation property of the component, and a set of keys corresponding to identifiers of entries in the Token Map 66. DATA LOCATIONS FIG. 19 is a diagram showing the objects for conveying data locations. Data Locations 86 describe 1) the address of a data value, 2) in some particular context. The context is provided by a Data Map entry which contains the data location description. For most architectures, data locations can be described by one abstract base class and three concrete derived classes. The concrete classes are: 1. TAbsoluteAddressDataLocation: Which is constructed simply by providing an address value. 2. TExternalSymbolDataLocation: Which is constructed simply by providing a reference to a program component. The referenced component will provide a name and any other information, such as a package or library name, necessary for the Debugger to resolve the address of the referenced external symbol. 3. TRegisterIDDataLocation: Which is constructed simply by providing a target architecture-dependent register name. Keep in mind that data locations are interpreted in some context. A context is conceptually equivalent to a list of stack frames. Each frame in the stack consists of a set of registers--including a program counter register. To find the address of an object, a frame in the list is found which contains a program counter value which corresponds to a program counter in the program counter range for a Data Map entry. A global variable is described using a program counter range which encompasses all addresses. The appropriate contextual frame is needed only in the case of a TRegisterIDDataLocation data location, since the value for the register named for that concrete class is the value of a register associated with the contextual frame. For absolute address and external symbol data locations, the contextual frame specifies a process or thread address space. The base TDataLocation class allows the provider to add additional steps which must be taken, starting with the address implied by one of the three concrete classes, to form the final live address of a data object. The base class allows the provider to append a list of offset and indirect operators to the address implied by the base class. So, the provider might say something like: 1) Given the address provided by the base class, add 12. 2) Given the address so far, read an address value from memory and use that new value as the address so far (i.e. perform an "indirect" operation). 3) Add 18 to the address developed so far. 4) Perform another indirect operation. Finally, the provider can specify a bit offset to be applied to the final address. This then defines the bit address of the start of the value of the data object described. ACCESS FORMULAS FIG. 20 shows the objects necessary for conveying access formulas. As described in "The Data Map" section above, the value of a data object may or may not exist simply at the address given for that data object. Instead of the value for object "a", we might find that the value of "a+3" is stored in the location, or the value of an induction variable, etc. An access formula describes the steps to be performed to either extract the data object value from an address, or to build a new value from a data object to be stored at an address. A few predefined access formulas are provided to describe common situations. The first of these is the kAccessFormulaDirect formula 108. This describes the common condition where the value is simply stored at the indicated address. The second of these is the kAccessFormulaTooComplex formula 110. This describes the situation where a variable is live, and at a known address, but the value is embedded in the value at the address in such a way that it is not practical for the provider to describe how to extract the value. For instance, the value, "a+Foo(13)" might be stored in a location described by the Data Map entry for variable "a". The Debugger couldn't determine the value of "a" in the location unless it could determine the value of the function call result, and it may not be possible for the Debugger to call the function "Foo" as the call may have side effects. In all other cases, the access formula is built from combinations of the derived classes which include, but are not limited to: 1. TAccessFormulaBinaryOperator 100: An operator is applied to a read and a write access formula operand to derive a new access formula. Operators include the expected addition, subtraction, multiplication, division, etc. 2. TAccessFormulaUnaryOperator 102: An operator is applied to an access formula operand to derive a new access formula. Operators include the expected negate, logical not, etc. as well as an "address of" operator. 3. TAccessFormulaToken 104: A reference to the value of a variable implied by the Token Map 66 entry, or the address of the variable, if an "address of" unary operator is applied to this kind of access formula. 4. TAccessFormulaLiteral 106: A reference to a self defining, literal value, such as an address, an integer, etc. As an example, an access formula to describe how to extract the value of a loop variable from an induction variable might be constructed as follows: 1. Begin with a kAccessFormulaDirect formula 108. 2. Create a TAccessFormulaToken 104 access formula referencing the array. 3. Create a TAccessFormulaUnaryOperator 102 access formula whose operand is the access formula from step 2 above and whose operator is "address of". 4. Create a TAccessFormulaBinaryOperator 100 whose left operand is the direct access value from step 1 above, whose right operand is the value from step 3 above, and whose operator is subtract. 5. Create a TAccessFormulaLiteral 106 access formula describing the element size of the array. 6. Create a TAccessFormulaBinaryOperator 100 whose left operand is the access formula from step 4 above, whose right operand is the access formula from step 5 above, and whose operator is divide. The access formula resulting from step 6 above would be a possible read access formula for a Data Map entry of a variable whose value was part of an induction variable. CODE OPTIMIZATION ANNOTATIONS FIG. 21 is a diagram showing a code optimization annotation typically used for conveying annotation information in the Database. A single Optimization Annotation class, TCodeOptimization 120, is sufficient to provide information for most common code optimizations. That class encapsulates a source reference (see"Source references" above), a cause or kind indication, and a PCAttributes value. The following shows the class constructor and the needed optimization kind enumeration:
______________________________________
enum ECodeOptimizationKind {
kAlgebraicIdentity,
kCommonSubExpression,
kCommonTailMerge,
kInlineExpansion,
kTailRecursion,
kTemplateInstantiation
};
TCodeOptimization
(ECodeOptimizationKind kind,
TSourceReference sourceReference,
PCAttributes pcAttributes);
______________________________________
The PCAttributes value is required for optimizations such as inline expansion, where there may be a new prologue and epilogue which is related to the expansion. With this Optimization Annotation and by marking the epilogue of the expansion, for instance, it becomes possible, for instance, to show the inline source equivalent when stepping into the expansion code and to allow a "run until about to return" command equivalent to occur. The Debugger would then execute until the epilogue for the expansion was about to be executed. For the constructor shown above, if, for instance, code is merged due to a common subexpression elimination optimization, the Instruction Map entry for the resulting code sequence would have a source reference which would indicate any one of the common subexpressions which were merged, and the list of Optimization Annotations for the instruction would have source references to the other common subexpressions which were merged, with an indication that the optimization was because of common subexpression elimination (i.e., kCommonSubexpression). For another example, if the tails of, say, the true and false parts of an if-statement were merged, then the Instruction Map entry for the resulting code sequence would have a source reference which would indicate one of the two tails which were merged, and the Optimization Annotation for the instruction would have a source reference which would indicate the other tail which was merged, and the annotation would have an indication that the optimization was because of common tail merging (i.e. kCommonTailMerge). As a final example, if a function were inline-expanded at the point of a call, the resulting instructions would all have a source reference to the original call text, and each instruction's Optimization Annotation would have a source reference back to some statement in the inline-expanded function for which the instruction applies. The annotation would indicate an inline expansion optimization (i.e. klnlineExpansion). See"Code optimization and annotations" below for more discussion of how this annotation is used. DATA OPTIMIZATION ANNOTATIONS FIG. 22 is a diagram showing Data Optimization Annotation classes. Two Optimization Annotation classes are sufficient to provide information for most common data optimizations. The classes are: 1. TDataSharedOptimization 122: This describes how two or more data objects share a single live location. So, any attempt to change the value of one must change the value of the others. The class encapsulates a list of references to the other shared live location data objects and an indicator as to the type of optimization which caused the shared live location. 2. TDataConstraintOptimization 124: This describes how the value of a data object must be constrained for proper code execution. For instance, if a compiler asserts that a value is positive, and so can perform a strength reduction operation whereby a divide operator is converted to an arithmetic shiftright operator, then this class can indicate that the data value is constrained to be positive. The class encapsulates an indication of the valid range of values for the data object, and an indicator as to the type of optimization which caused the constraint. The following shows the constructors for the various data Optimization Annotations:
______________________________________
enum EDataOptimizationKind {
kEvenConstraint,
kOddConstraint,
kRangeConstraint,
kSharedLocation
};
TDataSharedOptimization
(TSourceReference sourcereference) ;
TDataConstraintOptimization
(EConstraintKind predefinedKind) ;
TDataConstraintOptimization
(signed long minConstraint, signed long
maxConstraint) ;
TDataConstraintOptimization
(unsigned long minConstraint, unsigned long
maxConstraint) ;
TDataConstraintOptimization
(double minConstraint, double maxConstraint) ;
______________________________________
CODE OPTIMIZATION AND ANNOTATIONS Following is a brief description of several common compiler optimizations and how the Debugger information format deals with them. 1. Algebraic Identity: If a program were to contain, say, the following 4 statements: 1: a=b+c; 2: d=b-d; 3: e=c+d; 4: f=b+c; and, ignoring, for the sake of argument, issues like possible side effects from arithmetic evaluation or operator overloading, then statements 1 and 4 are algebraically identical and no instructions for statement 4 need to be generated. The Instruction Map for the code for statement 1 would have TCodeOptimization 120 Annotations to indicate that the code also was for statement 4. 2. Algebraic simplification: When expressions are factored, such as by eliminated useless "add zero" operations, or applying associative properties to simplify an expression, no Optimization Annotation is needed. The Instruction Map is capable of describing the correct mapping without annotations. 3. Branch merging: When a branch instruction which branches to a second branch instruction is replaced with a branch instruction which branches to the target of the second branch instruction, no Optimization Annotation is needed. The Instruction Map is capable of describing the correct mapping without annotations. 4. Canonical reordering: Canonical reordering of expressions affects only the pattern matching used internally by the compiler. For instance, recognizing that the subexpression "b+e" is identical to the subexpression "e+b" for purposes of detecting common subexpressions is a form of canonical reordering. No Optimization Annotation is needed. 5. Code motion: Code motion usually involves one of two processes. In most cases, loop-invariant code is moved from within a loop to outside. In other cases, code is moved to achieve better instruction scheduling. In both cases, no Optimization Annotations are needed and the normal Instruction Map can describe the effects. 6. Common subexpression elimination: When common subexpressions are detected, a single expression result is computed and the other expression computations are replaced with a reference to the single result. Each of the instructions in the single computation would have a list of TCodeOptimization 120 Annotations which references the source of the other computations which were eliminated. 7. Common tail merging: When the tail statements of, say, the true and false parts of an if-statement are merged as being common code, a TCodeOptimization 120 Annotation for one of the tails will reference the source that is not referenced in the Instruction Map. 8. Constant folding: No Optimization Annotations are needed for constant folding. The Instruction Map is capable of mapping the resulting constant to all the text pieces that went into its computation. 9. Current value: The value for a data object may not be computed at the expected place in a code stream. Instruction scheduling may move the computation to an early position in the code stream. Use/definition analysis may allow the computation to take place in a later position. No Optimization Annotations are required to reflect this state. The normal instruction and Data Maps are sufficient to describe this affect. 10. Data constraint: If the compiler generates code based upon knowledge about a constraint in a data object's value beyond that implied by the object's type, it notes the value constraint via a TDataConstraintOptimization 124 Annotation. 11. Dead code elimination: When unreachable code is removed by the compiler, no annotations are needed. The fact that source exists which is not covered by entries in the Instruction Map is sufficient to imply that dead code was eliminated. 12. Inline expansion: When function calls are inline expanded, both code and data Optimization Annotations may be needed. The resulting code is mapped to the call in the Instruction Map, and TCodeOptimization 120 Annotations map those same instructions to statements in the inline-expanded function. Also, entries in the Data Map 70 will reference source from the inline-expanded function as function variables are mapped into the expansion. In some cases, variables in the caller code will carry over into the expansion, as when local variables are mapped into formal parameters before the expansion. In this case, a TDataSharedOptimization 122 Annotation will be needed for the Data Map 70 to indicate the other variable which is live in the location. 13. Induction variable: Sometime the value of a variable need not be kept explicitly in some memory location or a register. Instead, an expression involving the variable is kept. The actual value of the variable must be "inferred" from the expression value. The effects of this optimization are handled by read/write access formulas in the Data Map 70. 14. Instruction scheduling: This is covered by the discussion of code motion above. 15. Loop unrolling: In loop unrolling, a loop which would normally be executed "n" times is converted into a loop which is executed fewer times by concatenating the loop body multiple times. For instance, instead of executing a loop body 8 times, and doing 8 branches back to the beginning of the loop, the code could be replaced with a loop that executed 4 copies of the body and execute that code only twice--eliminating costly branches. No Optimization Annotations are required to describe this optimization. The Instruction Map is allowed to map a multiple instructions to the same source reference. Note that the statement map for a statement which has been copied may need multiple breakpoint locations to indicate that multiple copies of the statement exist. 16. Reduction in strength: Sometimes a compiler will replace relatively expensive instructions, such as a divide, by less expensive instructions, such as a shift. No Optimization Annotations are needed to deal with this optimization unless data constraints drove the optimization, as discussed in earlier examples. In that case a TDataConstraintOptimization annotation is needed. 17. Shared location: Sometimes a compiler will determine that two or more data objects are semantically equivalent over some range of code and use a single live location to contain both objects. The TDataSharedOptimization 122 annotation describes the effect of this sharing. 18. Tail recursion elimination: When the last statement in the some control flow for a function recursively calls the same function, the compiler will sometimes replace that call with a less expensive branch which will then reuse the same stack frame. Some instruction in the code sequence which set's up the frame for reuse should have a TCodeOptimization 120 Annotation which maps to the function name and indicates that tail recursion optimization has been applied. The Debugger can use this information to create virtual stack frames to fill in for the reused frames when viewing the list of stack frames for a program. 19. Template instantiation: The effects of template instantiation are not handled by Optimization Annotations. The affected maps merely reference the appropriate source text, which may or may not be in the same component as the map itself. EXAMPLES The following examples provide a few specific failings of popular debugging information formats. Consider Dwarf2 and XCOFF. Variable Definitions Variable definitions describe the variable's location in an executing program. Dwarf2 references variables by pointing at their first character position in a source component. There is no additional information to aid in name resolution. For instance, using just debugging information from Dwarf2, a developer could not point at an arbitrary token in a source program and determine its semantic resolution. A developer could not determine if two variables having the same string name were semantically the same variable. This is even more problematic when dealing with templates, where a token can have more than one semantic resolution (i.e. the local resolution in the template body, and a resolution due to parameter substitution when the template is instantiated). A C++ template is a generalized pattern for producing code, like a cookie mold in a kitchen. A user instantiates a template by providing details for turning a generalized class into a specific object tailored to the user's specifications. With XCOFF, the problem is even greater as only the string name of a variable is captured, with no indication of where that string comes from in the source program. The information used by the Debugger of the present invention includes Token Maps, which show semantic resolution information for variables, and Statement Maps, which show how variables are used in declarations and statements. The data base used by the present invention also provides Usage and Definition Links, allowing a developer to find all places where a variable is used, or where the definition point is for a variable. Variable Access Variable access is a description of where a variable's value is located and how to extract it from a register value or a value in memory. In XCOFF, a variable must be live in a single site (e.g. a register or memory location) through a whole function. A variable's value must be solely contained in the site--it can't be part of an expression value in the site. No optimization information can be used to describe any optimization information know to the compiler. In Dwarf2, variables can be live in multiple sites, but they still must be solely contained in the site and no optimization information can be described. The Debugging System disclosed herein provides information in the Data Map 70 which instantiates variables to be in multiple sites, for the value to be embedded in an expression, and for optimization information to be supplied. For example, for the code fragment:
______________________________________
int a›13! ;
for (int i = 0; i < 13; i++)
a›i! = a›i! +i ;
______________________________________
the value of variable "i" may not exist by itself. With the optimization of induction variable generation, a register may contain the value of "address of a ›i!" instead. The value of "i" could be determined by taking the value of that register, subtracting the base address of array "a", and dividing by the element size of array "a". The Data Map 70 utilized as part of the present invention allows such a description. The Debugger Data Map 70 of the present invention provides for Optimization Annotations to describe how two variables in a source program are treated as semantically equivalent by the compiler for some range of code, so any change in the value of one necessarily changes the value of the other. The map also provides for the compiler to describe value constraints of a variable. For instance, a compiler might have "proved" that a variable's value in the source is positive at some point, and convert a divide by 4 operation into an arithmetic shift right operation (a reduction in strength optimization). The compiler can indicate that the variable's value for some code range must be positive for the generated code to work correctly. Instruction to Source Mapping Both XCOFF, and Dwarf2 use source to instruction maps. That is, a line in the source program is mapped to an instruction in the code. XCOFF does not describe the possibility of more than one statement on a line. Neither format allows for describing the effects of optimizations, such as common subexpression elimination. Both formats assume a single return point for a function. The Debugger information format of the present invention maps instructions to source code. The source code reference can be in many forms, such as a range of characters, a reference to a token, a set of character ranges, a set of tokens, or a range of tokens. So, optimizations such as common subexpression elimination, where a single instruction may relate to several text strings, are easily described. Both XCOFF, and Dwarf2 assume a contiguous set of instructions in the prologue and epilogue of a function. The Debugger information format of the present invention allows each instruction to be marked if it is part of the prologue or epilogue. Further, each instruction can be marked to indicate whether or not a frame exists at that location--information useful in stack walking and determining the context of a user's request. XCOFF cannot deal with the effects of instruction scheduling in any useful way. Dwarf2 allows for a line to map to multiple discontiguous sets of instructions, as a developer would see with instruction scheduling, but it has no way of indicating which instruction corresponds to the "breakpoint" location equivalent of any particular statement. The Debugger information format of the present invention allows the compiler to specify the "best" breakpoint location for a statement or even parts of statements, such as expressions and subexpressions. XCOFF deals only with lines of source code. Dwarf2 can, by implication, deal with smaller pieces of a line, but there is no way to indicate what semantic component the piece represents, such as a statement, expression, or subexpression. Further, Dwarf2 does not describe which pieces of source text correspond to the line piece. The Debugger Statement Map 68 of the present invention allows any piece or set of pieces of source text to be described. Further, the Statement Map entries can correspond to subexpressions, expressions, statements, functions, or any other language-extensible component. Templates and Generics A generic is another way of referring to a template. XCOFF has no provisions for describing the effects of template or generic instantiation. Dwarf2 has a basic capability to describe the instantiation, but has no provision for allowing, say, setting a breakpoint in a template function and having it apply to all instantiations of the template. The Debugger Project Database of the present invention coupled with the Debugger Information allows debugging templates in both their abstract and instantiated forms. Inlines Inlines improve execution by copying a subroutine's logic (source statements) inline to reduce the overhead associated with branching to the subroutine and returning from the subroutine. XCOFF has no ability to describe inline expansions. Dwarf2 has some ability to describe the effects of inline expansions, but cannot deal with most code optimizations. With the Debugger information format of the present invention, the maps can describe code generation and the effects of optimization from more than one viewpoint. For instance, if a developer has a function call like, "i=Foo (3)", an instruction involved in the call can be mapped both to the "Foo (3)" text at the call site, and to one or more pieces of text in Foo's body which is inlined expanded. Further, with the Debugger Project Database of the present invention, it is possible, say, to set a breakpoint for a statement in a function and have that breakpoint apply to all inline expansions of that function. While the invention has been described in terms of a preferred embodiment in a specific system environment, those skilled in the art recognize that the invention can be practiced, with modification, in other and different hardware and software environments within the spirit and scope of the appended claims.
|
Same subclass Same class Consider this |
||||||||||
