Method and apparatus for analyzing computer code using weakest precondition6029002Abstract An analyzer for maintaining and analyzing source code is disclosed. The analyzer includes a software translator for converting conventional source code into an intermediate language, slicing capability based upon weakest precondition determination, dual direction flow analysis and incorporation of a computational model to facilitate iterative code. Claims What is claimed is: Description COPYRIGHT NOTICE
TABLE 1
______________________________________
Fields of Typical IL Statement
______________________________________
23 0.0.1.2 assign x = y + 3 (1)
23: Line number within source file 210.
0.0.1.2: The source block. A block identifier uniquely
identifying a block of code in the source file defining
a procedure, such that "if" and "do" statements
define blocks within which there may be sub-blocks.
In analyzer 200 this number identifies a node con-
taining a single IL statement in the form of a text
string. In analyzer 400, this number identifies a
basic block containing at least one IL statement in the
form of a tree data structure.
assign x = Y + 3
Instruction corresponding to the source code in the
form of modified or extended DGC.
______________________________________
Referring to "source block" in Table 1, block numbers are sequences of decimal numbers separated by periods. The block numbers of two different source code blocks are always different (that is, block numbers must be unique). Block numbers which represent sub-blocks of a given block have block numbers with the same leading fields as the parent block. Blocks which are siblings differ only in the last field. This permits block parent, child and sibling relationships to be quickly tested using string compare operations. Blocks are also used to delimit variable scopes (see, Appendix B). New block numbers are created at "if" statements, "do" statements, source language blocks and procedures. "if" statements and "do" statements always create new blocks with unique numbers (within the parent block). Specifically, if the current block level is "0.1.2" and an "if" is encountered, the condition and the code in the "then" clause will get a block number of "0.1.2.n+1", where n is a previously unused decimal number at that level. Procedures create new blocks. A "." is appended followed by a previously unused decimal number at that level. Unnamed blocks within the code (such as C's `{. . . }`) are equivalent to blocks and can be implemented as a pair of proc statements.
TABLE 2
______________________________________
IL Statement Types
______________________________________
assign: basic assignment (DGC based)
if: corresponding to conventional if-then-else
do: standard iteration
call: standard call
goto: standard goto
return: classic return
LABEL: defines a label in the intermediate language
L-entry:
to introduce labeled procedure entries
L-procedure:
to introduce labeled procedures
file: to introduce new source files analyzed
null: nop (no operation)
proc: general purpose no-op
setRet mark the end of a paragraph as the end of a perform state-
ment
SetNull:
clear the end of a paragraph
______________________________________
The IL described above is processed within engine 250 as a text string contained within a statement link list, as described below. The source block number identified in Table 1 identifies a specific node for a specific IL statement (e.g., node 1200 in FIG. 7). 2. Translator 410 Referring to FIG. 5a, translator 410 includes the same functional units as translator 220, i.e., filter 320, scanner 330 and parser/code generator 340. Accordingly, the foregoing discussion applies equally to these components as used within analyzer 400. However, in this embodiment, parser/code generator 340 outputs a single file 420 containing translated source code and a symbol table. An exemplary set of rules and definitions necessary to carry out this operation for a "C" source language on a scanner and parser/code generator (i.e., Flex and BISON, respectively, as described above) are provided in the files listed in Table 3 below, and contained in Appendix A attached hereto.
TABLE 3
______________________________________
Exemplary rules and definitions for translator 410
File Description
______________________________________
parser.y State definition rule 520
scanner.1 Token definitions 530
parcer.y BNF 540
fns.sub.-- : a2d.c, e2h.c, i2l.c, m2p.c, q2t.c, u2z.c
Semantic Rules 550
Same as Semantic Rules
Symbol table
______________________________________
The foregoing rules and definitions generate IL statements that correspond with Tables 1 and 2 above. Further, translator 410 also generates statement types identified in Table 4 below.
TABLE 4
______________________________________
Additional IL Statement Types
______________________________________
var: to declare variables
type: to specify type information
member: to declare a member of a structured type or field
Version: provides information about version of language
______________________________________
Moreover, the IL statement produced by translator 410 includes an extra field to identify the source file from which it comes. This field is illustrated in equation (2) below and Table 5.
TABLE 5
______________________________________
Extra IL Statement Field
______________________________________
0 23 0.0.1.2 assign x = y + 3 (2)
0: The zero based file number of source file 210 (FIGS. 2-4)
The remaining fields are described in Table 1.
______________________________________
The symbol table contained in file 420 is encoded in the var, member and type statements identified above. The IL described above (i.e., in Tables 1, 2, 4 and 5) is more accurately referred to as "external IL" since the operator can access this during code analysis. A specification setting out the parameters of the external IL language upon which files 230 and 420 are based is attached as Appendix B hereto (i.e., "Specification of PIL" ). As Tables 2 and 4 make clear, the IL of analyzer 200 is essentially a subset of the IL for analyzer 400. Translator 410 also generates a second representation of IL to facilitate processing and analysis of source code by engine 440. This second type of IL, which is also included in IL file 420, is called "internal IL". Internal IL is represented in an Abstract Syntax Tree (AST) data structure. This structure is invisible to the user and is accessed by engine 440 to carry out its operations. The AST structure is described in A. V. Aho, et al., Compilers: Principles, Techniques, and Tools, Addison-Wesley (1986) (hereinafter "Aho"), which is hereby incorporated by reference in its entirety for all purposes. A graphical representation of an AST 500 for the instruction in equation (1) is provided in FIG. 5b. Further, a specification setting out the parameters of the internal IL is attached as Appendix B hereto (i.e., "Internal PIL and Symbol Table Design"). C. UNITY After IL conversion, the operator in analyzer 200 (via interface 130) or the UNITY format generator 430 in analyzer 400 (FIGS. 2 and 4) may organize the resulting IL file in accordance with the UNITY computational model. In analyzer 200, this operation is performed manually. Conversely, in analyzer 400, it is anticipated that this operation will be automated. The UNITY model intended to be used in these embodiments is described in J. Misra, A Discipline of Multi-Programming, University of Texas Austin Report (1994), which is hereby incorporated by reference in its entirety for all purposes. A copy of this report is available from the University of Texas upon request. Code analyzers 200 and 400 may employ the UNITY computational model to represent any program that is analyzed. Through UNITY, no distinction need be made between sequential code and (1) iterating code in a sequential program or (2) the repeated non-deterministic execution of interacting sequential programs in a system. The same computational model and formalism may be used for analyzing parallel programs, multitasking and other forms of concurrence. With the incorporation of the UNITY model, translation of source code by analyzers 200 and 400 requires two general steps: (1) The entire source code is translated into sequences of assignments and alternation statements (i.e., intermediate language). Iterative statements (do while's and any other code that can be iterated) are isolated and presented as separate sequences. (2) These isolated sequences are organized as non-deterministically executing portions of a UNITY program. The execution of the sequences, however, depends on some conditions for their execution. Some of these conditions may contain artificially created variables that have the sole purpose of forcing the original sequence of execution of the source program (e.g., the variable "c" in the example of Table 7 below). These sequences represent a natural form programmers use for an informal understanding of the code. They essentially are ordered in the same way programmers mentally organize program statements when reading code with the purpose of obtaining information about its meaning. Specifically, the UNITY model places iterating code in a more accommodating format for code analysis. Some code that iterates is contained within a program that executes sequentially. The iterating portion of this program (e.g., a loop) must ultimately end; therefore, this type of code is called "terminating." The entire program, in turn, may be iterating in the sense that it can be used repeatedly against some permanent data (i.e., information held in a data base). The iterations of the entire program can be any number (i.e., there is no coded mechanism to terminate the repeated execution of a program), therefore, this type of code is called "non-terminating." It is anticipated the UNITY model will be applied to a subset of all iterating code. Specifically, UNITY will be applied when iterations are caused by a set of cooperating processes executing concurrently (e.g., repeated and simultaneous access to a common database) or by a program whose global framework is based upon one or more large loops. In the context of the UNITY model, small, deterministic loops presented in a sequential program are typically of simple construction and therefore can be manually substituted with specification (i.e., preconditions and postconditions) that define their meaning without being separated from the main program (i.e., the sequential code). The study of iterating code, both terminating and non-terminating, involves the use of "invariants," as discussed below. The partitioning of sequential code into the UNITY computational model is one way to permit the verification of invariants for the entire program or its relevant portions (i.e., a loop body) using only "weakest precondition calculation" and expression simplification, as discussed below. The UNITY computational model may be generally represented in the expression of Table 6 below.
TABLE 6
______________________________________
UNITY Computational Model
______________________________________
initially x, y, z, ... = e.sub.1, e.sub.2, e.sub.3, ...
<B1 --> P1 [ ] B2 --> P2 [ ] ...>
______________________________________
Referring to Table 6, x, y, z, etc. are program variables that assume the initial values of the expressions e.sub.1, e.sub.2, e.sub.3, etc. Further, P1, P2, etc. are sequential programs executing non-deterministically and repeatedly when the leading conditions (i.e., "guards") B1, B2, etc. hold (any of B1, B2, etc. may be the constant predicate true: in that case the corresponding program would execute at any time). A fat bar ("[]") separates each B-P pair. The UNITY model also requires that the non-deterministic execution is chosen fairly (this can be assumed in all practical cases; i.e., there exists no biasing for any particular P) and that P1, P2, etc. execute as a whole piece of non-interrupted code. The UNITY logic supplies effective ways to specify and prove properties, and is a convenient way to present code to an operator for informal understanding. Therefore, it is useful to transform sequential programs that have significant portions of their code that can be iterated into a UNITY form. Such iterated code, be it a large loop body in a sequential program or a set of cooperating processes that can even execute concurrently, can be treated exactly in the same way as sequential code after transformation into the UNITY model. The transformation of sequential code into UNITY requires (in general) the use of auxiliary variables for forcing the sequential execution of these code portions that are represented as non-deterministically executed sequences. Exemplary cases are provided below. These cases represent most of the situations found in practice. The approach can easily be generalized and extended to other more unusual situations. FIG. 6 provides a graphical representation of a single "do while" loop, which in UNITY form is represented as equation (3). When B holds, the UNITY program reaches a Fixed Point (FP) that corresponds to the loop termination. (A FP is a state where the execution of any of the sequences P.sub.i in the program is either impossible, or does not produce any state change.) <B.fwdarw.S> (3) FIG. 7 provides a graphical representation of a single "do until" loop, which in UNITY form is shown in Table 7. In this table, "c" is an auxiliary variable (set initially to zero) which forces the execution sequence S; B.fwdarw.S; etc. The program reaches an FP when B holds.
TABLE 7
______________________________________
"do until" Loop
______________________________________
initially c=0
<c=0 --> S; c:=1 [ ] c=1 B --> S>
______________________________________
FIG. 8 provides a graphical representation of a "nested do while" loop, which in UNITY form is shown in Table 8. The two loops identified in Table 8 terminate at an FP when c=0 B holds.
TABLE 8
______________________________________
"nested do while" Loop
______________________________________
initially c=0
<c=0 B1-->S1; c:=1[ ]
c=1 B2-->S2[ ]
c=1 B2-->S3; c:=0>
______________________________________
The mechanics for converting sequential programs into UNITY form, such as the foregoing, is relatively straight forward. The starting point is the sequential program in BLOCK form. Each block is a sequence of assign, alternation, loop, abort, and skip statements. Here are the block rules; the blocks should be created as a result of the flow analysis: Blocks cannot be contained within blocks; A GOTO, if any, must occur at the end of a block (this defines a block boundary, if fact); Any label that is the target of a GOTO must be on the first statement in a block (defining the other block boundary). This is because the other basic rule for a block is that the only entrance is at the beginning; The start of a loop body is a block; The start of each alternation is a block; The blocks are numbered sequentially from 1 to N as they occur in the code text. Converting a sequential program in block form into UNITY involves the following steps: 1. Create a global variable called BlockNumber, which is initialized to 1; 2. The last statement of each block must set the next block number; For a GOTO, set BlockNumber to correspond to the label that was the target of the GOTO (the GOTO does not appear in the UNITY program); If the block appears at the end of a loop (immediately before the od), set BlockNumber to be the block for the beginning of the loop body; If the block appears at the end of a guarded statement as part of an alternation, the set BlockNumber to number the block immediately following the alternation; If the block is followed immediately by another, possibly within a loop or alternation, set BlockNumber to the number of the following block. 3. Every guard in the UNITY program tests the value of BlockNumber for equality to the guarded block, and this test is conjoined with whatever other guard (from an alternation or loop) might be associated with the block. 4. Every guarded statement in the UNITY program is the code of the block with successor BlockNumber computations added (as specified above), GOTOs removed, and all do, od, if, and fi control statements removed. This process is graphically displayed in FIGS. 8a-8c. Specifically, FIG. 8a illustrates original sequential code, FIG. 8b shows the same code augmented with blocks, and FIG. 8c illustrates a UNITY program created by the foregoing rules. An operator of code analyzer 200 or UNITY format generator 430 of code analyzer 400, breaks a program into segments as required by the UNITY model. The operator may do this through a manual review and identification of iterating portions of code. Conversely, generator 430 may carry out this operation through the use of internal IL. (Although generator 430 is shown as a separate block in FIG. 4, it may be embodied as a module in engine 440.) This representation as noted above, is in the form of an AST data structure, such as that shown in FIG. 5b. Generator 430 utilizes this data structure to construct a program dependency graph, a construct that is commonly used for data flow analysis of programs and is well known to those having ordinary skill in the art. The program dependency graph contains the dependency relationships among various statements within a program. These dependencies are of three types: (1) control dependencies, (2) data flow dependencies, and (3) declaration dependencies. Control dependencies indicate the order that statements may be executed. Data flow dependencies indicate how data is propagated through a dependency graph. Declaration dependencies indicate the dependency relationship between data declarations and the statements where their states are changed. FIG. 9 illustrates an example of a program dependency graph constructed from the C- language program fragment provided below in Table 9.
TABLE 9
______________________________________
C-Language Program Fragment
______________________________________
main ( )
{
[1] int fact
[2] int n
[3] n=4
[4] fact=1
[5] While (n!=0) {
[6] fact = fact * n
[7] n = n-1
}
}
______________________________________
In the fragment of Table 9, the two declarations in lines [1] and [2] are related to equations in lines [3], [4], [6], and [7]. This relationship is graphically represented by lines 936-942 in FIG. 9. Regarding control flow, "main ()" precedes every statement while the inequality in line [5] must precede the statements of the loop body (i.e., [6] and [7]). This relationship is represented by lines 944-956 in FIG. 9. Regarding data flow, the equations in lines [6], [7], and the inequality in line [5] (i.e., n!=0) initially use the value of n set in line [3] and fact in line [4]. These equations also use the value of fact set in the previous cycle of the while loop. These relationships are represented by lines 958-966 in FIG. 9. The use of a program dependency graph permits the separation of program segments as required by the UNITY model. In the foregoing example, the loop (lines [5], [7]) should be a separate segment because the loop iterates to termination for each use of "main ()." Moreover, this loop can be separated since it is an isolated sub-tree in the dependence graph of FIG. 9. Referring to this figure, if node 920 were removed from dependence graph 900, then the entire loop would be removed from the program fragment. Significantly, only one statement in "main ()" must precede the loop. However, all "main ()" declarations must be carried in the separate segment because of declaration dependencies. Similarly, statements [3] and [4] just prior to the loop also depend on these declarations. This is an observation necessary for the propagation of weakest precondition calculation from one segment to another (i.e., "backwards" analysis, as discussed below). Code analyzer 400 uses the program dependency graph data structure to segment a sequential program into UNITY segments, as illustrated in FIG. 10. Referring to FIGS. 4 and 10, IL file 420 (containing "internal IL" in AST form and a symbol table) is received by code analysis engine 440, and combined to form a program dependency graph, in accordance with block 1010 of FIG. 10. Engine 440 then searches the newly created dependency graph for statements that define iterations (e.g., "do while," "goto," etc.), as indicated in block 1020. Thereafter, engine 440 creates alternate UNITY statements for each interaction directed to repeating execution and exiting operation, pursuant to block 1025. The engine also introduces sequencing variables (e.g., variable "c", in Tables 7 and 8) where necessary. Finally, pursuant to block 1030, engine 440 utilizes the control flow dependencies of the subject dependence graph for isolating the subgraph associated with each of the iterations found in block 1020. The operations carried out in blocks 1010, 1020, and 1030 are well known to a person having ordinary skill in the art since such operations are also performed by any commercially-available compiler that optimizes program loops. These operations are also discussed in Aho. As the foregoing illustrates, control flow dependencies are essential for isolating iterative segments. Conversely, declaration and data flow dependencies supply useful information for efficient execution of forward and backward analyses. D. Analysis Sequence: Link Lists FIG. 11 illustrates the software analysis sequence of code analysis engine 250 under the control of user interface 130 (FIG. 2). As a general summary of operation, engine 250 initially generates a statement link list from translated code, in accordance with block 1110. Thereafter, an operator designates the relevant code fragment to be analyzed in block 1112. Finally, flow analysis (either forward or backward) is carried out in blocks 1114-1134. Source code representing this embodiment of the invention is provided in a file entitled "PCA2 main routine," located in the Provisional Appendix. 1. Statement Link List Analysis engine 250 builds a link list from IL File 230 and a corresponding symbol table in SYM file 240. This operation is well known to those having ordinary skill in the art. An overview of the resulting data structure is provided in FIG. 12. FIG. 12a illustrates a link list node 1200. Every statement in source code file 210 is represented by a text string disposed within a node (i.e., "struct line" in the source code). A text string simply represents a statement (i.e., "y:=x+1") as a string of characters. Node 700 includes four sets of pointers (i.e., addresses to other nodes in memory) to other structures; i.e., previous node ("prev"), next node ("next"), a parent of the node ("parent"), and children ("children"). The children structure is made up of an array to facilitate any type of branching flow in the source code. FIG. 12b illustrates a basic data structure of analyzer 300 using link list nodes. More specifically, FIG. 12b illustrates a double-link list 1210 including three nodes; 1202, 1204 and 1206. As illustrated by this data structure, code analysis engine 250 can traverse a link list of nodes representing a portion of source code by moving down a "next" pointer or up a "prev" pointer. An illustration of this data structure within a more complex source code construct is shown in FIG. 12c. FIG. 12c illustrates a branch construct having multiple levels to accommodate an if-then-else statement. Referring to this figure, fork node 1212 is coupled to "end-if-node" 1220 via previous pointer 1252 and next pointer 1254. Fork node 1212 is also coupled to "if-condition" node 1214 via child pointer 1224, previous pointer 1222, and parent pointer 1226. Control flows to node 1214 when the subject if-then-else condition is true. Node 1214 is coupled to node 1216 via previous pointer 1256 and next pointer 1258. Node 1216 is also coupled to fork node 1212 via parent pointer 1228. An identical data structure follows from node 1216 to node 1218 with any number of nodes in between. Node 1218 is the last node of this sequence, being coupled to fork node 1212 via parent pointer 1230 and to end-if-node 1220 via child pointer 1234 and next pointer 1232. An identical data structure exists on the "else" (i.e., left-hand) side of the figure wherein nodes 1222, 1224, 1226, 1212 and 1220 are coupled to each other via previous pointers 1250, 1260; next pointers 1262, 1236; child pointers 1248, 1238; and parent pointers 1240, 1242, and 1246 as shown therein. Control flows to node 1222 when the subject if-then-else condition is false. Engine 250 will traverse an if-then-else statement in IL file 230 on the data structure shown in FIG. 12c. Fork node 1212 is a "virtual" node supported by the data structure but not existing in the original source code file 210. Similarly, end-if-node 1220 is a virtual node (signifying the end of an if-then-else statement) supported by the data structure but not existing in the source code file 210. Referring to FIG. 12c, engine 250 can traverse the branch statement represented by this structure simply moving down next pointer 754 or up previous pointer 752. Next or previous pointers allow for movement to the same or higher level. Conversely, child pointers allow for movements to lower levels. The left-hand side of FIG. 12c (i.e., the "else" side) will be created by engine 250 even if there is no else statement within the source code file 210. Parent pointers allow for immediate movement to fork node 1212, as shown in FIG. 12c. Alternatively, engine 250 may traverse one or more previous pointers to return to fork node 1212. If any child node (i.e., 1214-1218 and 1222-1226) includes an if statement, the same structure as 7c may be found within that node. FIG. 12d illustrates the components of node 1200. Referring to FIG. 12d, node 1200 includes a type field to identify the type of intermediate language statement (i.e., if, assign, goto, etc.). This node also includes an "LHS" field to identify the left-hand side of an assignment statement and an "RHS" field to identify a right-hand side of an assignment statement. (The LHS and RHS fields are ASCII format for display to a user on monitor 120.) Finally, node 1200 includes a hash field which is an array of integers, wherein each integer is an index to a symbol table (i.e., in SYM file 240) containing attributes of the subject variable. As noted above, an exemplary symbol table may be found in co-pending application Ser. No. 08/550,869, (attorney docket no. 5773-3(P1)). The link list data structure disclosed in FIGS. 12a-12d is of conventional type well known to those having ordinary skill in the art. Accordingly, it will be apparent to those of skill in the art that this structure may be readily applied to other source code constructs (i.e., assign, goto, etc.). Similarly, the text string data structure disposed in each node is a standard structure well known to those having ordinary skill in the art. Such structure may consist of a string of ASCII characters. Returning to FIG. 11, once a statement link list has been generated, the code-analyzer operator may designate a relevant code fragment within that list pursuant to block 1112. Specifically, the operator may construct a desired fragment by selecting elements from an existing link list, or directly from the corresponding source or IL files. Responding to operator instructions, engine 250 constructs a code fragment by selecting a (1) starting statement and (2) ending statement. Upon designating a relevant code fragment, the operator of code analyzer 200 has the option of performing a forward or backward analysis pursuant to blocks 1114-1134 of FIG. 11. 2. Flow Analysis Referring to FIG. 11, upon selection of a program fragment in block 1112, the operator may now begin flow analysis pursuant to block 1114 in FIG. 11. A flow analysis begins with the operator selecting either a forward analysis in block 1116 or a backward analysis in block 1118. If forward analysis is chosen, control flows to block 1120. Conceptually, forward analysis is a classical application of data flow analysis. Forward analysis is the process of following the execution path of the code fragment being analyzed and either displaying information to the operator or building a data base described in the flow of control. In this process, engine 250 traverses a link list of nodes defining translated source code in a forward direction. At each node, engine 250 textually identifies a statement of the source code represented by a text string to determine the next node in the execution path. More specifically, engine 250 processes each node in forward analysis by interpreting associated IL statements through character matching. To begin forward analysis, the operator must select corresponding starting and ending points pursuant to block 1120. Thereafter, control flows to block 1122 which represents forward analysis operation. A forward analysis may be used to determine the various paths of operation to get to some point downstream in the code. Variations on this operation include a display of all intervening branching conditions, a display of multiple end points and a display of the entire code and all paths between selected starting and ending points. Forward analysis may also be used to show only those statements that are involved in the manipulation of one or more variables. This is a slice of the code that represents the path. More specifically, a slice is obtained by forward analysis under control of the operator. It begins at the starting point and continues until it reaches the ending point designated by the user, or the end of the relevant code fragment or the code fragment being analyzed. The resulting forward slice is saved for use as a relevant code fragment for slicing or weakest precondition analysis. A top level flow diagram illustrating forward analysis in code analyzer 200 is provided in FIG. 13a. Referring to this figure, engine 250 begins forward analysis at the first selected statement pursuant to block 1336. This selected statement is defined by the starting point identified by the operator in block 1120. Engine 250 places the initial statement number (i.e., starting point) in a LIST in memory and initializes additional space in memory (pathstorage) for the first execution path to be analyzed. Control then flows to block 1338 where engine 250 produces a list of facts (i.e., data utilized by CLIPS commands), each fact being a list of node identities representing a possible path. This process is described in more detail in the pseudo code of Table 10.
TABLE 10
______________________________________
enter initial and end statement number;
put initial statement number in a LIST,
initialize storage for the first path;
position at the initial statement number in the
source file -- this is now the current
statement;
repeat as long as there is a statement number
in LIST -->
repeat as long as next statement is not the
end point -->
if current statement branches -->
store the statement number where flow
of control goes in LIST;
[ ] current statement does not branch -->
skip
fi;
put current statement in pathstorage;
get next statement;
check if next statement closes a loop
(isolate the whole loop in this case)
end repeat.
Store path in file (path is a flat list with
condition statements linked together).
Top element in LIST being initial statement number
initializes storage for another path; remove top
element from LIST;
end repeat.
______________________________________
Referring to Table 10, the initial statement number (selected by the operator) is initially treated as the "current statement" for purposes of analysis. Engine 250 then enters two repeating or looping operations. The outer loop enables engine 250 to enter multiple paths, while the inner loop enables the engine to traverse a current path undergoing analysis. Starting with the initial statement number, engine 250 processes the first path within the inner loop. If the current statement branches, the statement number is stored in the LIST. Otherwise, this step is skipped. Thereafter, the current statement is stored in memory (i.e., pathstorage) with the corresponding "condition" for continuing down the current path. Engine 250 retrieves the next statement (i.e., link list node) along the current path and checks to determine if the statement closes a loop. If yes, the entire loop is isolated for purposes of imposing a UNITY model or for reviewing manually. The path analysis within the inner loop is then repeated, so long as the next statement is not an end point. As noted above, the IL statements are not executed in forward analysis but rather are interpreted through character matching. Upon completing a path (i.e., reaching an end point), the entire path is stored in pathstorage as a flat list with condition statements linked together. Engine 250 then returns to the LIST and removes the top element. The top element, representing the statement number of the first branch in the previous path, is used to initialize analysis and storage for another path. Starting at this new statement number, engine 250 repeats inner loop operations; i.e., storing corresponding conditions in pathstorage and recording branches in LIST. These inner and outer loop operations are repeated until every statement and every path is processed by engine 250. After a list of facts are produced pursuant to block 1338, control flows to block 1340 in FIG. 13a where the user may filter or manipulate the fact contents through the use of CLIPS commands. More specifically, for statements disposed between starting and ending points, the operator may retrieve (1) the stored conditions and corresponding IL code (linked by statement number), (2) all the statements and (3) all the corresponding paths. Additionally, specific variables may be identified for purposes of textually slicing the code based upon direct textual matching. This is a simple form of textual slicing --accommodating only direct hits--and therefore different from the more complex textual slicing described in FIG. 27, below. If no end point is specified for forward analysis, analyzer 200 will continue to process code until a return (i.e., if in a procedure), the end of the program, or a termination/exit statement is reached. Analyzer 200 also facilitates a "suppress" feature whereby all calls are ignored when performing forward analysis. Accordingly, only a single, primary path is processed. Control then flows to block 1342 where the operator may selectively display information on monitor 120. As noted above, such display includes all intervening branching conditions, multiple end points and display of the entire code and all paths between selected starting and ending points. Examples of forward analysis output are provided in FIGS. 13d-13i, based upon source code in FIGS. 13b and 13c. In the examples, a forward analysis is performed on a routine in "samplel.cob" called P010-CALC-LEAP-YEAR (FIG. 13b) which is called from another routine in the same program called 0300-INITIALIZE-RTN (FIG. 13c). These examples were generated on a early prototype to analyzer 200. FIG. 13d illustrates forward analysis to determine the ability to reach one point from another. This example shows the conditions under which point 2226 is reached and in which lines those conditions appear in the source code. "ENDLine 1" indicates that line 2226 has been reached. Note that two paths are available to reach this point, denoted by "[1]" and "[2]." FIG. 13e demonstrates that more than one end point may be identified; "ENDLine 1" indicates that a first end point is reached and "ENDLine 2" indicates a second end point is reached. FIG. 13f illustrates the display of all code between the start and end points for four different execution paths. FIG. 13g illustrates the textual, forward slicing of a portion of the code based upon variables "C-FEB" and "C-TEMP-1." FIG. 13h illustrates the extraction of variables upon which C-TEMP1 is dependent. This extraction is achieved through reliance on a data dependency graph such as shown in FIG. 9, below. Significantly, the results of FIG. 13h may then be used in FIG. 13i to perform a slice of all such variables from the subject code. Returning to FIG. 11, in addition to forward analysis, an operator may choose to perform backward analysis pursuant to block 1118. If backward analysis is selected, control flows to block 1126 wherein the operator must choose starting and ending points in the same manner as for forward analysis. Once valid starting and ending statements have been identified, engine 250 prompts the operator to enter a postcondition pursuant to block 1128. Upon presentation of the postcondition, backward analysis may begin pursuant to block 1130, wherein engine 250 traverses a link list of nodes defining translated source code in a backward direction. Backward analysis is used in two different operations: one for calculating the "weakest precondition" (wp) of a non-iterating code segment and the other for verifying invariants for one or more iterating segments of the code. In backward analysis, each statement is processed by using the content of the statement to modify a postcondition to obtain a weakest precondition (such operation is described below). Engine 250 interprets each statement though character matching. All program constructs included in the intermediate language are formed by a variety of statement types including alternation statements, assignments, skips, aborts and iterations connected together by the sequencing s operator (";"). Iterating code segments associated with large loops or cooperating processes executing concurrently will not be processed within the text of the main program because of the manual transformation of sequential programs into the UNITY model. Small, terminating loops, will typically be left within the main program. However, these segments may be analyzed separately requiring some operator input, such as an invariant. This determination is made through visual inspection of the code. Verification of invariants for iterating statements uses the weakest precondition calculation code. This code represents most of the work for invariant verification. A discussion of weakest precondition computation follows. 3. wp Computation The concept of the weakest precondition was introduced by E. W. Dijkstra (see, Dijkstra) and so far has been used primarily for theoretical work on the semantics of programming languages. Weakest precondition ("wp") may be defined as follows: given a program S and a condition P (i.e., a logical predicate which the program must satisfy when it terminates), the weakest precondition of S for P (i.e., "wp.S.P.") is the least restrictive condition Q that must hold before the execution of S to guarantee its termination in one of the states defined by P. This relationship is succinctly expressed in equation (4) below. wp.S.P.=Q (4) The weakest precondition of many useful programming language statements is calculated by a simple syntactic manipulation of the programming language statements themselves and of the formula describing the postcondition. Several examples are provided below to illustrate the rules for wp computation of six basic statement types defined in DGC language (i.e., assignment, composition, alternation, skip, abort and repetition statements). In these examples, the definitions set out in Table 11 apply.
TABLE 11
______________________________________
S, T:
denotes programs (statements), possibly compound statements.
P, Q:
denote predicates, often in simple propositional calculus and
occasionally in first order predicate calculus with quantifiers.
These
predicates are not part of a program; rather, they are generated by
the operator and used to express the state of program variables.
B: denotes a predicate, or logical expression, that is part of a pro-
gram and is evaluated as part of the execution of alternation and
repetition statements.
______________________________________
As a first example, S1 is a single assignment statement provided as equation (5) below. x:=x+1 (5) Weakest preconditions of S.sub.1 for two predicates are shown in equations (6) and (7) below. wp.S.sub.1 .multidot.(x=2)=(x=1) (6) wp.S.sub.1.(x>0)=(x>-1) (7) In words, program S will terminate in the state x=2 if and only if it starts in the state x=1, and it will terminate with x>0 if and only if it starts with x>-1. As a general rule, the semantics of the assignment statement x:=e, for any expression e, is defined in equation (8) where the notation P=P(x(-e) means that all occurrences of the variable x in predicate P are textually replaced by "e". If the variable does not occur in the predicate, then the predicate remains unchanged and, referring to equation (4), predicate Q=predicate P. wp.(x:=e).P=P(x.fwdarw.e) (8) As a second example, S.sub.2 is the composition of two assignment statements provided as equations (9) and (10) below. x:=x+1; (9) y:=y*y (10) Examples of weakest preconditions of S.sub.2 for several predicates are shown in equations (11) through (13) below. ##EQU1## As a general rule, for any two statements S.sub.a and S.sub.b, the semantics of the composition of the two statements in connection with weakest precondition is shown in equation (14). Wp.S.sub.a ;S.sub.b.P=wp. S.sub.a.(wp.S .sub.b.P) (14) This is a "backward computation." The wp of the last statement is the predicate used in the wp computation of the first statement. As a third example, S.sub.3 is a guarded alternation statement provided as equation (15) below. if x>0.fwdarw.x:=x-1; []x.ltoreq.0.fwdarw.skip; fi (15) Equation (15) is written in DGC form and in words means if x is greater than zero, x is set equal to x-1, and if x is less than or equal to zero, no operation is performed and control moves to the next statement (i.e., "skip"). Weakest precondition of S.sub.3 for an exemplary predicate is provided in equation (16). wp.S.sub.3 .multidot.(x>2)=(x>3) (16) Generally, the semantics of an alternation statement in the form of "if B then S.sub.a else S.sub.b" in connection with weakest precondition is shown in equations (17) (disjunctive form) and (18) (conjunctive form). ##EQU2## As a fourth example, S.sub.4 is an abort statement--whose wp for any predicate is always "false" (i.e., results in program failure) because the program never reaches the end. The weakest precondition semantic is provided in equation (19). wp.abort.P=false (19) As a fifth example, S.sub.5 is a skip statement--which is always equal to the predicate "P" of equation (4) because the skip does not change anything. The corresponding weakest precondition semantic is provided in equation (20). wp.skip.P=P (20) Finally, a repetition statement in DGC form (i.e., "do loop") is provided in equation (21) below. do B.fwdarw.S od (21) Referring to equation (21), "B" guards the loop body "S," which is repeated so long as B evaluates to true. If B is initially false, then S is never executed. To be useful, S must have "progress properties" that assure that B will eventually be false (although many systems, such as an operating system, loop forever). Repetition semantics must be defined in terms of a "loop invariant" which is a predicate that holds before and after S; that is, S does not transform the invariant predicate. A necessary and sufficient condition for invariant predicate P.sub.I is shown in equation (22). ##EQU3## Equation (22) states that once past the guard, both B and the invariant hold and if the invariant is to hold after the loop body, B and P.sub.I must imply the weakest precondition. When the loop statement terminates, the guard will be false but the invariant must still hold. Furthermore, the invariant must hold initially. These observations lead to the weakest precondition semantics of the repetition statement as shown in equation (23). ##EQU4## FIG. 14a provides a top-level flow chart of backward analysis in analyzer 200. At block 1430, engine 250 performs a forward analysis between the starting and ending points selected in block 1126 of FIG. 11 to create path fact(s) (see, FIGS. 13a-13i and related discussion). In block 1432, engine 250 massages the resulting path fact(s) by performing such operations as textual or transitive-closure slicing. Such slicing of linked list data structures is well known in the art. An analogous operation for basic block data structures is provided below (i.e., FIGS. 27, 28a, 28b). Control then flows to block 1434 where wp computation is performed on each selected path. This operation is described in FIG. 14b below. Finally, the wp of every two paths newly calculated are ORed together in block 1436 and then simplified pursuant to FIG. 17a (discussed below). FIG. 14b illustrates flow chart 1400 which describes weakest precondition calculation in analyzer 200 for each program statement (S) of interest within a path defined by an initial line number and final line number. This operation may be applied to a fragment or the entire original program. In block 1410 of flow chart 1400, the operator identifies initial and final line numbers and a postcondition P. Engine 250 responds by identifying the last statement (S) contained between the initial and final line numbers. After these initial values are established, control flows to block 1412 where engine 250 determines whether S is an alternation. If S is not an alternation, control flows immediately to block 1416 where a weakest precondition is calculated based on postcondition P and statement S. Alternatively, if S is an alternation, then control flows to block 1414 where engine 250 determines whether any assignment in statement S has a left-hand side (LHS) variable in an assignment statement that is contained in the variables of P. If there is, control flows to block 1416 for weakest precondition calculation. Alternatively, if there is not, control flows to block 1415 where statement S is substituted with a skip statement and then control flows to block 1416. In block 1416, weakest precondition is calculated based on statement S and postcondition P. Upon calculating a new precondition (which will be a postcondition for the previous statement), a new set of variables are identified based on the latest postcondition. (A flow chart describing the operation in block 1416 is provided in FIG. 15.) Control then flows to block 1418 where engine 250 determines whether P contains only constants and free variables (i.e., variables that do not occur in the program). If this is answered in the affirmative, control flows to block 1422 where expression simplification is performed and the process ends pursuant to block 1426 since any remaining statements will have no effect on P. However, if block 1418 is answered in the negative, control then flows to block 1420 where engine 250 inquires whether S is the first statement between initial and final line numbers. If yes, control again flows to block 1422. If no, control flows to block 1424 where S is decremented by 1, thereby identifying the next statement to be processed. This operation continues until block 1418 or block 1420 is answered in the affirmative. Referring again to FIG. 14b, the expression simplification for block 1422 is optional and described in greater detail below. Rather than repeatedly inspect the postcondition P for variables of interest during wp computation, engine 250 may alternatively employ a separate list of variables ("V") in accordance with the program variable list shown in FIG. 25. If such a list were employed, the process in FIG. 14b would be modified only slightly. In block 1410, engine 250 would generate the list based on the initial contents of P. In block 1414, the variable list V rather than the postcondition P would be searched for matching variables with a new statement. Use of a variable list could be significantly more efficient than a postcondition text string when performing the searching operation in block 1414, especially if the latter has a significant number of terms. For example, a statement is shown in FIG. 14c containing 21 terms but only 7 variables. Presumably, the time required to search a list with 7 entries is less than searching a 21-term text string. Returning to FIG. 14b, upon computing a weakest precondition in block 1416, variable list V is updated by adding and deleting variables based upon the latest postcondition. In summary, incorporating a variable list such as shown in FIG. 25 would affect operations in blocks 1410, 1414 and 1416 of FIG. 14b. FIG. 15 illustrates the wp computation carried out in block 1416 of FIG. 14b. In this analysis, it is presumed that Hoare's axiom holds which, in a practical sense, means variable aliasing is not accommodated. Turning to FIG. 15, engine 250 determines whether S is an alternation. If yes, control flow to block 1514 where a result (i.e., weakest precondition) is equal to a guard (i.e., B1 or B2) logically ANDed with the weakest precondition of a corresponding statement (i.e., S1 or S2) based on postcondition P. As discussed above, B's are guards to their corresponding statements. The result in block 1514 assumes that only two statements are present in the associated alternation statement. In this instance, engine 250 calls the code recursively to evaluate the weakest precondition for each branch of the if (i.e., alternation). If S is not an alternation, control flows to block 1512 to determine whether S is an assignment. If yes, engine 250 performs a textual substitution of the predicate P (i.e., postcondition), replacing all occurrences of the subject variable in the predicate with the expression on the right-hand side of the assignment. If statement is not an assignment, control flows to blocks 1518 where engine 250 determines whether statement S is a procedure call. If yes, control flows to block 1520 where the result is a product of recursively performing the weakest precondition computation with the subject procedure as its argument. If S is not a procedure call, control flows to block 1522 where engine 250 determines whether S is an iteration. If yes, control flows to block 1524 where weakest precondition is computed for the loop body. In block 1524, this computation is performed by engine 250 processing the loop body. After which, control flows to block 1526 where it is determined whether the precondition of iteration S equals the postcondition. If yes, control flows to block 1528 where the result is set equal to P (i.e., P is a true invariant). If no, control flows to block 1530 where engine 250 asks the operator for the weakest precondition of S because the value cannot be computed. In responding to analyzer 200 in block 1530, the operator will also determine manually--by inspection--whether P is false. If S is none of the above, it is treated as a skip and the routine returns to the process in FIG. 14b pursuant to block 1538. After computation is performed in blocks 1514, 1516, 1520, and 1528, expression simplification is performed pursuant to block 1534 to simplify the result before returning to block 1416 in FIG. 14b pursuant to block 1536. Although not shown in FIG. 15, if S is a "goto" statement, wp calculation is performed on each associated path in a backwards flow--ultimately generating a value in accordance with the other statement types. More specifically, engine 250 performs a separate wp computation on each path associated with the goto. The individual results are ORed together (and simplified) thereby generating a single wp for the statement. Invariant verification, an operation performed on iterating statements, is illustrated in FIG. 16. Referring to FIG. 16, invariant verification begins at block 1610 with the operator providing an invariant. Control flows to block 1612 where weakest precondition calculation for a first segment S using the invariant as postcondition P is initialized. In block 1614, engine 250 applies the relationship for invariant calculation as shown in equation (22) below: ##EQU5## In equation (24), P is the supplied invariant and B is the condition (i.e., guard) for the segment execution. After performing the calculation in block 1614 (pursuant to the wp computation process of FIG. 15), engine 250 simplifies the result in block 1616. The simplification process may require user interaction as shown in block 1618. (See Library of Lemmas discussion, below) Simplification is described in greater detail below. In block 1620, engine 250 determines whether the resulting formula is reduced to "true". If no, control flows to block 1622 where the user may introduce another variant and begin the process again at block 1612. However, if the formula is reduced to true, the invariant is verified and the operator is free to consider the next segment pursuant to block 1624. 4. Simplification FIG. 17a illustrates the expression simplification process of blocks 1436 in FIG. 14a, 1422 in FIG. 14b, 1534 in FIG. 15, and 1616 in FIG. 16. The process of FIG. 17a is applicable to any predicate (i.e., precondition, postcondition, etc.) and is embodied in a simplifier module contained within engine 250. The source code for an exemplary simplifier is included in a file entitled "PCA2 simplifier," included in the Provisional Appendix. Referring to FIG. 17a, engine 250 initially accepts an input string in block 1710. This input string may represent, for example, a postcondition. The string is processed in block 1720 for ultimate reduction to a single OR chain 1773 shown in FIG. 17f. Processing is performed by a parser disposed within engine 250. The steps carried out in block 1720 are illustrated in FIGS. 17b-17f. At the outset, engine 250 parses an input string into tokens; each token holding a number or a variable as shown in FIG. 17b. Parsing a string into tokens enables engine 250 to "understand" the expression; i.e., learn what variables are present. Engine 250 next combines tokens into "terms;" each term data structure 1768 holding a number and/or variable (i.e., a term of a polynominal). As shown in FIG. 17c, a sequence of terms 1768 added together forms polynominal 1769. Engine 250 reads the input string at block 1710 sequentially and, therefore, can identify and situate operators (i.e., <, >, .ltoreq., .gtoreq., =, .noteq., etc.) correctly between polynominals. As shown in FIG. 17d, when two polynominals 1769 are disposed on either side of an operator 1770, the result is an equation or "assign" 1771. Upon identifying and assembling assigns, engine 250 concurrently places these data structures in a standardized format by moving integers to the right hand side (RHS) of the operator and variables to the left hand side (LHS) pursuant to block 1730. Engine 250 also conducts some rudimentary simplification by combining terms (i.e., same variable) and determining whether the newly created assign may logically reduce to a "TRUE" (i.e., y=y) or "FALSE" (i.e., 1=0) for all cases. Once the current assign has been reduced to its simplest form, engine 250 applies standard YACC Grammar (See Aho) in block 1720 to link this assign with other assigns which are related via a logical AND (i.e., an ampersand). As a result, engine 250 creates an AND chain 1772 made up of two or more assigns 1771, as shown in FIG. 17e. Upon creation of AND chain(s) 1772, engine 250 will begin the next level of simplification: (1) removal of equivalent assigns in each chain pursuant to block 1740 (i.e., eliminate redundant assigns), and (2) application of logical rules to each chain for possible reduction to false, pursuant to block 1750. As an example of the latter operation, if any assign reduced to false in block 1730, its corresponding AND chain (if any) would also reduce to false. Concurrent with the operations in blocks 1740 and 1750, engine 250 also assembles AND chains 1772 into a single logical OR chain 1773, again pursuant to standard YACC Grammar. After AND chains 1772 are simplified, OR chain 1773 is evaluated with logical rules in block 1760 to determine whether further simplification is possible. For example, if any AND chain reduced to true, then OR chain 1773 would also reduce to true. Four fundamental boolean relationships useful for reduction are shown below in Table 12.
TABLE 12
______________________________________
True Anything = Anything
True Anything = True
False Anything = False
False Anything = Anything
______________________________________
The logical simplification applied to OR chains in block 1760 is further illustrated in FIG. 17g. Referring to FIG. 17g, engine 250 builds a table of assignments in block 1761, reducing each statement (i.e., assign) to a single variable. In this way, relationships between different statements may be tracked (i.e., x=a and x.noteq.y NOT a). Engine 250 then builds logic vectors in block 1762 based on the relationship between variables (i.e., statements). Karnaugh Maps may be applied here to simplify relationships. Referring to block 1762 in FIG. 17g, the relationship "b AND c OR a AND b AND c OR NOT a AND b AND c" is ultimately reduced to a series of 1's, 0's and "don't cares" (i.e., "X"). These vectors are simplified in block 1763 resulting in the vector "X11." Finally, this simplified vector is changed back to a statement (i.e., input string) through the variables b and c, pursuant to block 1764. In the example of FIG. 17g, "X11" converts to b AND c, which becomes "x=yx>y." Since the relationships on either side of the conjunction cannot both be true, this statement simplifies to "false." Alternatively, if the vectors simplified to a relationship that was true under all circumstances (i.e., x=xy=y), then the statement would reduce to "true." If the input string cannot be simplified to True or False, some valid, logical relationship is output from the simplifier. The rules applied in this simplification operation include the boolean relationships provided in FIGS. 43a-43c. How this value is subsequently used will depend upon the on-going operation. For example, if the string is simplified during the course of a wp computation pursuant to block 1532 of FIG. 15, the simplified value serves as a postcondition for the next (i.e., upstream) statement. Alternatively, if the result is a final wp computation in a backward analysis pursuant to block 1422 of FIG. 14, the simplified expression is displayed to the operator on monitor 120. Finally, if the result is a simplified invariance computation pursuant to block 1616 of FIG. 16, the expression is displayed to the operator on monitor 120 for consideration pursuant to block 1620, as described therein. Improvements to the simplification process of FIG. 17a include inputting tokens rather than strings. Such a configuration would eliminate two transformation steps (i.e., from and to strings) within the process. A second improvement is creating a symbol table (i.e., in block 1761 of FIG. 17g) that is permanent. Current operation requires the creation of a new table for each statement that is simplified. However, it would be more efficient to maintain the table in memory and simply update when new relationships are produced through simplification of new statements. Such improvements are currently being implemented in the most recent version of analyzer 200. Referring again to the exemplary source code in FIGS. 13b and 13c, several examples of backward analysis output are provided in FIGS. 42a-42c. In the example of FIG. 42a, the postcondition chosen was "1" (i.e., TRUE; a constant without any dependency on the source code). Accordingly, the weakest precondition is TRUE; i.e., line 2217 will always be reached from line 2207. In the example of FIG. 42b, the variable C-FEB is assigned a free variable "n" as the postcondition. The weakest precondition tells that line 2217 can be reached from line 2207 by two different paths. One path reports that if C-YY is divisible by 4, then C-FEB will have a value of 29. The other path reports that if C-YY is not divisible by 4, the C-FEB will have a value of 28. Finally, in the example of FIG. 42c, the postcondition is used to check at what condition C-FEB will be assigned a value greater than 29. The weakest precondition result reports that C-FEB cannot be assigned a value greater than 29 from line 2207 to line 2217. Accordingly, the weakest precondition is FALSE. E. Analysis Sequence: Flow Graph FIGS. 18a and 18b illustrate the software analysis sequence of code analysis engine 440 under the control of user interface 130 (FIG. 4). As a general summary of operation, engine 440 initially generates a flow graph from translated code, in accordance with block 1810. Thereafter, an operator designates the relevant code fragment to be analyzed in block 1812. The operator must then decide whether to slice the subject fragment and, if so, under what criteria in accordance with blocks 1814-1826. Finally, flow analysis (either forward or backward) is carried out in blocks 1828-1846. 1. Flow Graph Generation Analysis engine 440 builds a flow graph from the internal IL and a corresponding symbol table. A flow graph is a tuple (B, E) where B is a set of "basic blocks" representing a procedure P, and E is a set of directed arcs between the basic blocks. A basic block is a data structure constructed to represent a list of statements in P such that the flow of control of P enters at the beginning of a basic block and leaves at the end without halting or branching until the last statement of the block. A basic block is represented as a structure containing a linked list of statements. A basic block, within a flow graph, has successors and predecessors. For example, if a basic block B ends in an unconditional branch, then the basic block B2 containing the target of the branch is its only successor, and B1 will be a predecessor of B2. If B1 ends in a conditional branch then it will have two successors. A directed arc exists from any basic block B1 to any basic block B2, if B2 is a successor of B1. An example of a procedure (i.e., procedure "foo") and its corresponding flow graph is shown in FIGS. 19a and 19b. Referring to FIG. 19b, boxes 1910 to 1940 represent the basic blocks of procedure foo while arrows 1950 to 1980 represent the arcs between the basic blocks. The construction of basic blocks in a flow graph architecture is standard compiler technology and, therefore, well known to those having ordinary skill in the art. Additional description may be found in Aho. FIGS. 20a-20c illustrate how a second sample C function "foo" is modified and configured within engine 440. FIG. 20 a shows foo in its native source code; C. FIG. 20b illustrates this function in external IL format. Finally, FIG. 20c shows a foo in flow graph format. The flow graph of FIG. 20c is made up of a number of blocks or "graph nodes." For example, graph node 2010 is a basic block, as described above, containing a block header 2012, an internal IL statement 2014, identification of block predecessors 2016 and identification of block successors 2018. Block header consists of a block number (integer obtained by enumerating the number of basic blocks in the flow graph) and remarks (information regarding the basic block). Internal IL statement is a sequence of internal IL statements that make up the block. Block predecessors and successors are lists of block numbers of the predecessors and successors of the current basic block. An additional datastructure that is created and is used in backward analysis is a control flow chart (i.e., control tree). This is generated using the mechanism described in Shirir, "Structual Analysis: A New Approach to Flow Analysis in Optimizing Compilers", Computer Languages, Vol. 5, pp. 151-153 (1980). This structure creates superblocks (Shirir refers to them as "intervals") which contain lists of basic blocks. These represent sequential (i.e., non-iterative) segments of code. These are the same segments of code described in the UNITY discussion. The control tree is further discussed in Appendix C. Engine 440 will display the information in FIGS. 20a-20c on monitor 120 in accordance with display 2100 of FIG. 21. To help the operator correlate flow graph information with source code and IL code information, a graph window 2110 will be aligned with a source code window 2120 and an external IL window 2130 automatically, as shown in FIG. 21. Selections made between the source, IL and flow graph elements are always "in sync." Returning to FIG. 18a, once a flow graph has been generated, the code-analyzer operator may designate a relevant code fragment within that flow graph pursuant to block 1812. Specifically, the operator may construct a desired fragment by selecting elements from an existing flow graph, or directly from the corresponding source or external IL files, as shown in FIG. 21. Responding to operator instructions, engine 440 constructs a code fragment by selecting a (1) starting basic block, (2) starting statement, (3) ending basic block, and (4) ending statement. Upon designating a relevant code fragment, the operator of code analyzer 400 has the option of "slicing" the fragment, pursuant to block 1814 of FIG. 18a. 2. Slicing The translation of source code into intermediate language helps disentangle the possible complex flow of control of sequential programs. In fact, the presentation of a program in intermediate-language form (i.e., in such formats as assignments, alternations, iterations, etc.) is by itself sufficient to clarify its meaning in many cases. The creation of sequences of condition-actions (i.e., B.fwdarw.P) as required by the UNITY computation model is a further aid to informal understanding of the code. The latter, however, has a drawback. The number of sequences as well as their length is, in most practical cases, very large. This is of course also true for the source code of any large programs, but the condition/action sequences that can be extracted from any program are a much less compact presentation (albeit more suitable for human understanding) than source code. For example, in a program made up only by one set of nested if statements, the number of sequences that can be extracted may only be equal to the number of conditions existing in the code itself. However, if two program fragments are connected in sequence (via the ";" operator), the number of sequences to examine for the composite program equals the product of the number of sequences of the two components. In fact, the sequential composition of program statements can produce an explosive growth on the number of sequences that need to be examined during program reading. To cope with large size, code analyzer 400 employs a slicing function, which enables the operator to extract portions of a subject program relevant to the analysis. Condition/action sequences extracted from a slice of the program, are usually much smaller and fewer in number than those extracted from the original source code. Referring again to FIG. 18a, the operator has an option of slicing a relevant code fragment at block 1814. If no slicing is requested, control flows to block 1828. However, if slicing is requested, the operator must then select starting and ending points in block 1816 (i.e., define the scope of the slicing operation). The ending point is the last statement of the relevant code fragment considered in the computation of a slice. It is designated by the operator and must lie within the relevant code fragment. It is the statement at which backward analysis begins or forward analysis ends. The starting point is a statement lying in the relevant code fragment and preceding the ending point in the flow of execution. It is the statement at which forward analysis begins or backward analysis ends. Engine 440 can define a slice by selecting an appropriate (1) starting basic block, (2) starting statement, (3) ending basic block, and (4) ending statement, as designated by the operator. After the initial slice has been defined, the operator selects desired slicing criteria in block 1818 of FIG. 18a. Engine 440 is capable of performing three different types of slicing operations: (1) semantic slicing using weakest precondition computation, (2) simple textural slicing, and (3) transitive closure slicing pursuant to blocks 1820, 1822, and 1824, respectively. a. Semantic Slicing The slicing of a program using the weakest precondition is called "semantic slicing." The precondition for slicing need not be a complex relation. It is possible to use free variables (i.e., those that do not occur in the program at issue) in the form x=m (x is a program variable) to obtain the semantic slice for any state involving x. In fact, by calculating backward the weakest precondition of all statements preceding the starting point, all statements that do not produce a weakest precondition different from the postcondition and the calculated weakest precondition can be safely eliminated. The remaining ones are those constituting the semantic slice for x, starting at the chosen point where the precondition x=m has been assumed. The slice thus calculated is (with the exception of a pathological case of almost no practical interest) the smallest program that shows all statements contributing to the state of x at the point where the postcondition has been assumed. The pathological case is that of programs containing statements changing the state of a variable and then changing it back to the previous one. For example, statements (25) and (26) provided below would be left in the slice, although they are useless: x:=x+1 (25) ;x:=x-1 (26) In practice, the calculation of semantic slices does not require the full evaluation of the weakest precondition. This is because real world programming languages use as alternation only "if-then-else" statements. For these statements, the weakest precondition would be different only if there is an assignment to a variable named in the precondition. This may be explained by reference to the proof shown in Table 13.
TABLE 13
______________________________________
Example of Alternation Statement Not Affecting Postcondition
______________________________________
wp. (if B then S1 else S2). Q
(B wp.S1.Q) ( B wp.S2.Q)
= {S2 does not name any variable in Q}
(B wp.S1.Q) ( B Q)
= {S1 does not change the state of any variable
named in Q}
(B B) Q
=
Q
______________________________________
Consequently, semantic slices can be effectively calculated by automatic inspection of the intermediate-language code. Semantic slices, when applied in large programs, appear to be very effective because in many programs, those portions that are connected sequentially usually deal with different variables. These portions may contain several alternation blocks. Consequently, the number of interesting sequences is reduced, via the elimination of whole alternation blocks. Furthermore, it is likely the length of these sequences is shorter because of the elimination of many statements that do not affect the states of the variables of interest. (1) Semantic Slice Examples Consider the program fragment provided in Table 14 below.
TABLE 14
______________________________________
Exemplary Program Fragment
______________________________________
y:=x
;if x = 0 then (x:=x+1; z:=0) else z:=y end
;z:=z+1
;y:=0
______________________________________
The semantic slices for x, y and z (i.e., those portions of the program which affect the states of x, y and z, respectively) are provided in statements (27), (28) and (29), respectively. ;if x=0 then x:=x+1 else skip end (27) y:=0 (28) ;if x=0 then z:=0 else z:=y end ;z:=z+1 (29) The semantic slice for a pair of variables (i.e., x, y) is provided in statement (30). ;if x==0 then x:=x+1 else skip end ;y:=0 (30) Note that some statements containing y are not in the semantic slice because they do not effect the states of y at the point of interest (i.e., the end of the fragment under consideration). (2) Program slicing operation Code analysis engine 440 may carry out semantic slicing operations pursuant to two methods. In either case, the body of a program intended to be searched is defined as "S", where S is a composition of individual statements (i.e., S.sub.1, S.sub.2 . . . S.sub.N), each of which (Si) is an assignment, skip, abort, alternation, composition or repetition. Further, the goal to both methods is slicing S with respect to the slicing predicate P. In other words, engine 440 determines a member of the set "PSlice (S,P)." In a first method, the steps carried out by engine 440 are described generally below. 1. Compute P.sub.i =wp.(S.sub.i S.sub.i+1 ; . . . ; S.sub.N).P for 1.ltoreq.i.ltoreq.N. 2. Determine all pairs <i,j> with i<j such that P.sub.i .ident.P.sub.j. 3. Starting with j=N, find the smallest i so that <i,j> is one of these pairs and slice all the statements from S.sub.i to S.sub.j inclusive. 4. Set j=i-1 and repeat step (3) while j is positive. 5. Next, examine all remaining alternation statements, performing steps (1) to (5) to slice each guarded statement and loop body. If examining statement S.sub.i (in the original labeling), then P.sub.i plays the role of P (the slicing predicate). In simplifying alternations, combine guard predicates where the guarded statements are the same. This first method, discussed in detail in FIGS. 22 and 23, has proven effective for moderately-sized code sequences, although its time complexity is O(N.sup.2). A second method, described in FIGS. 23 and 24, is faster although less comprehensive since step 2 only uses pairs where i=j-1 and step 3 is omitted. Both methods are described in detail below. The second method will probably be incorporated in analyzer 400. FIGS. 22a and 22b disclose flow chart 2200 which describes a first method of semantic slicing of program segment S with respect to predicate P. Each statement S.sub.i in segment S is a skip, abort, composition, assignment, alternation, or repetition. The weakest precondition for each statement within segment S is calculated based upon predicate P. The rules defined above for the various statement types are followed and P.sub.N+1 is set equal to slicing predicate P. The method begins at block 2214 where a counter i is set equal to N. Pursuant to blocks 2214 through 2220, engine 440 calculates the precondition predicate P.sub.i for each statement S.sub.i based upon a postcondition of P.sub.i+1 (i.e., backwards computation). As shown in FIG. 22a, this operation iterates through i=N to i=1. Referring to block 2218, if a statement S.sub.i is an alternation statement, control flows to block 5100 (i.e., block 2324) in FIG. 23b which will perform a slicing operation in an attempt to simplify this statement. This operation is discussed further below. After the attempted simplification is complete, control returns to block 2218 where the predicate P.sub.i is calculated for the current statement. If a statement Si is an iterative statement, and wp.(loop body).P.sub.i+1 =P.sub.i+1, then this statement has no effect on the variables in the slicing predicate and the entire loop may be sliced. Accordingly, P.sub.i may be set equal to P.sub.i+1. However, if this first equality does not hold, then the operator searches for a logical conjunction of the P.sub.i+1 such that no variables are shared between that conjunction and the body of the loop or the guard of the loop. These conjunctions contribute directly to P.sub.i. For each variable in the loop or in the guard, a condition of the form is added: variable=free.sub.-- variable. All statements of the loop are left unsliced. For example, if P.sub.i+1 =x=7y=23 and x does not appear in the body of the loop, and the only variables which appear in the loop and guard are y and z, P.sub.i would be set equal to x=7=lambdaz, =mu. This approach results in a conservative slice. The assumption is made that any of the named variables set equal to a free variable could potentially be present in the correct weakest precondition. Therefore, slicing with this precondition results in a superset of the correct slice. After calculating the predicate P.sub.i for each statement S.sub.i in blocks 2214-2220, control flows to blocks 2222-2232 wherein slicing operations are performed. Referring to FIG. 22b, all statements S.sub.i contained within segment S are set as "unmarked" for slicing in block 2224. Blocks 2226-2232 attempt to identify equivalent predicate values (P.sub.i and P.sub.j) that will establish a sequence of statements that have no effect on the variables making up the predicates. Blocks 2226 and 2232 represent a first outer loop used to sequence through values of P.sub.j where j is initially set equal to N. Blocks 2228 and 2230 represent a second inner loop where predicate values P.sub.i are sequenced through and i is initially set equal to 1. Starting with the smallest i value and largest j value, P.sub.i is compared with P.sub.j in block 2230. If there is no match, i is incremented and the equivalence check is performed again. However, if there is a match, then control flows to block 2232 which marks all statements from S.sub.i to S.sub.j for slicing. After this sequence of statements have been identified, j is decremented 1 and flow returns to block 2226 to repeat this operation. Alternatively, if there is no corresponding equivalent P.sub.i for a particular P.sub.j, then j is decremented by 1 in block 2228 and the operation repeats itself for the next j value starting at block 2226. After the j values are exhausted, (i.e., j is decremented to 1) control flows from 2226 to block 2234 where every marked statement in segment S is replaced with a skip statement. By this operation, marked statements are sliced from segment S. Upon completing the slicing operation in segment S, a secondary slicing operation may be performed on each alternation statement. This operation is carried out according to flow chart 2300 in FIGS. 23a and 23b. In block 2310, counter i is said equal to 1 and in 2312 this value is checked to determine if all S.sub.i values have been considered. If they have, then slicing is complete and the routine may be exited via block 2320. If not, control flows to block 2314 where all skip, abort, repetition and assignment statements are passed over for secondary slicing. In which case, control flows to block 2318 where i is incremented and the exercise is repeated for the next S.sub.i. However, if S.sub.i is an alternation statement, then engine 440 will apply an alternation slicing theorem, as shown in FIG. 23b, to further simplify this statement pursuant to block 2316. More specifically, as shown in blocks 2316 and FIG. 23a, control flows to address 5100 (i.e., block 2324) to carry out alternation slicing theorem. The alternation slicing theorem provides that for any alternation statement (i.e., "if B then S1 else S2") where none of the alternative statements (i.e., neither S1 nor S2) contain any assignment to a variable listed in the postcondition predicate, its weakest precondition is equal to its postcondition (i.e., the effect of this statement is the equivalent of a skip). Referring to FIG. 23b, block 2324 represents the first step in the alternation slicing theorem. As indicated in this block, the weakest precondition of S.sub.i based upon a post condition of P.sub.i+1 does not equal P.sub.i+1 because the alternation statement is not a skip. Had this statement been true, alternation statement S.sub.i would have been converted into a skip in block 2234 as discussed above. In blocks 2326-2332, the weakest precondition of each statement associated with a particular guard (B) is evaluated against predicate P.sub.1+1. If the predicate is true, then statement T.sub.j is replaced with a skip as indicated in block 2328. However, if T.sub.j is not a skip statement, then statement T.sub.j is sliced with respect to predicate P.sub.i+1 by invoking a slow complete slice starting at block 2210 (i.e., address 2000) in FIG. 22a or invoking a fast slice at block 2210 (i.e., address 1000) in FIG. 24. After this operation is complete, control flows to block 2332 where j is incremented by 1 and the analysis is repeated for the next guard and statement pair. Upon completion of the operation over all, control is returned via block 2334 to block 2318 in FIG. 23a. In block 2318, i is incremented by 1 and the next S.sub.i statement is evaluated pursuant to the foregoing discussion. Once i exceeds the value N (i.e., the number of statements S.sub.i) control flows to block 2320 which results in an exit of this routine with segment S now being sliced with respect to predicate P. Flow chart 2400 in FIG. 24 illustrates a portion of a fast slicing operation which is similar to the first half of the slow slicing operation shown in flow chart 2200 of FIG. 22a. Referring to FIGS. 22 and 24, blocks 2210-2218 and block 2222 in both figures have the same meaning and function. In FIG. 24, however, block 2410 replaces block 2220 from FIG. 22a. In block 2410, predicate P.sub.i (representing a precondition) is compared with predicate P.sub.i+1 (representing a postcondition). If these values are equal, then the corresponding statement has no effect on the variables defining these predicates and the statement may be sliced from the corresponding segment S. This operation is repeated for each P.sub.i value within the segment. As shown in FIG. 24, once i is decremented to a value less than 1, control flows from block 2216 to block 2222 which then passes to block 2310 at address 4100 in FIG. 23a. The remaining operation is then identical to that described above for the slow slicing operation. As an alternative embodiment, engine 440 may create a variable list "V" to carry out the foregoing alternation slicing theorem, pursuant to FIGS. 25 and 26. Referring again to flow charts 2200 and 2400 in FIGS. 22a and 24, respectively, engine 440 may create a variable list V in block 2210. During this operation, the engine places variables named in predicate P into the list. In block 2212 of FIGS. 22a and 24, engine 440 adds new variables and deletes irrelevant variables from the list after a new computation of predicate P.sub.i. This list is updated again in block 2218 of FIGS. 22a and 24. An exemplary variable list and its association with a program under analysis is shown schematically in FIG. 25. Referring to FIG. 25, statement 2530 is associated with variables listed in program variable list 2510 through postcondition 2520. The variables themselves are linked to each other within variable list 2510 as shown by arrows 2540 and 2550. Referring to FIG. 26, an exemplary postcondition 2610 is shown associated with the variables contained therein; x and y. These variables are shown listed in variable list 2510. Significantly, postcondition contains four terms but only two variables. Engine 440 can analyze the previous statement S by verifying whether S contains on any left-hand side ("LHS") of a statement the variables x or y. The placement of these variables on the left-hand side of a statement indicates a new value is being assigned to them. The names of the variables are linked to the postcondition as shown in FIG. 26. It is currently anticipated that engine 440 will not use a variable list "V" to carry out the foregoing alternation slicing theorem. Rather, engine 440 will check for the variables involved in a change of state for the s alternation slicing theorem directly from the postcondition b. Textual Slicing Returning to FIG. 18a, the foregoing discussion on semantic slicing covers blocks 1820 and 1826. Alternatively, the operator may choose to textual slice a selected code fragment pursuant to blocks 1822 and 1826. This operation is straightforward requiring simple textual matching of the slice criteria (i.e., "seed variables") and the program fragment. Those statements that cannot assign a value to a seed variable will simply be extracted from the slice by engine 440, as described below. This operation is analogous to a word search in a conventional word processing program. More specifically, the operator specifies a simple textual slice by naming a set of program variables called "seed variables" and designating a code fragment as described above. The result consists of every source or IL statement lying within the designated code fragment that can assign to one or more of the seed variables. Assignments made by called procedures are included. A flow chart describing this operation is provided in FIG. 27. Flow chart 2700 in FIG. 27 details the process of textual slicing. In block 2710, a list of all basic blocks in any path between a starting and ending statement (identified in block 1816 of FIG. 18a) is created by engine 440. This operation is a simple graph coloring procedure described in Aho. Upon creation of the list in block 2710, control flows to block 2712 to determine whether all basic blocks have been considered. If yes, the procedure is terminated at block 2714, resulting in a sliced fragment. This slice will then be the subject of flow analysis pursuant to blocks 1828-1846 in FIGS. 18a and 18b. If all basic blocks are not done, control flows to block 2716 in FIG. 27 to determine whether all statements in a basic block have been considered. If yes, control flows to block 2718 where the next basic block is selected from the list created in block 2710. Thereafter, control returns to block 2712 to repeat the foregoing process. If there is a statement in a basic block under consideration that has not yet been reviewed, control flows to block 2720 to determine whether such statement could possibly assign a value to one of the seed variables. The analysis in block 2720 is carried out in a conservative manner. As such, if an assignment is identified through a pointer, engine 440 assumes it can change one of the seed variables and, therefore, is not removed from the slice. Similarly, for a basic block in a subroutine, any statement that assigns to a formal parameter (i.e., identifier appearing in a procedure definition; called a "formal argument" in C and a "dummy argument" in Fortran) that can correspond to an actual parameter (i.e., argument passed to a called procedure) which is one of the seed variables is kept. Based on this conservative approach, if the subject statement can possibly assign to one of the seed variables, the statement remains in the slice and control flows to block 2724 where the next statement is selected. Alternatively, if no assignment can be made by the subject statement, control flows to block 2722 where the subject statement is removed from the slice and control then flows to block 2724. As shown in FIG. 27, this operation is carried out repeatedly until all statements within all blocks of a selected list are reviewed and either kept within the slice or discarded. c. Transitive Closure Referring again to FIG. 18a, the operator may select yet a third choice for slicing criteria in the form of transitive closure as shown in block 1824. The operator specifies a slice for transitive closure by naming a set of program variables called seed variables. The result consists of every source or PIL statement, lying within the designated code fragment, which would be obtained by the series of steps outlined in FIGS. 28a and 28b. Flow chart 2800 in FIGS. 28a and 28b describe a slicing operating based upon transitive closure. In block 2810, engine 440 creates a list of all basic blocks on any path between a starting and ending statement that was designated in block 1816 of FIG. 18a. This operation is the same as block 2710 in FIG. 27. In block 2812, engine 440 creates a list of variables representing an initial seed list as defined by the operator. Every seed variable is a current working variable. Control then flows to block 2814 where engine 440 determines whether all basic blocks have been reviewed. If yes, control flows to blocks 2816-2820, which are described below. If no, control flows to block 2822 where engine 440 determines whether all statements in the current basic block undergoing review have been considered. If yes, the next basic block is selected in block 2824 and control returns to block 2814. If no, control flows to block 2826 to determine whether the subject statement could possibly assign a value to one of the seed variables. If no, the statement is removed from the slice in block 2834 and control thereafter flows to block 2832 to select the next statement. If yes, the subject statement is maintained within the slice. In brief, every source or IL statement lying within the designated slice which causes an assignment to one or more of the seed variables is added to the current slice. Assignments made by called procedures, including their side effects, are included. Control next flows to block 2830 where all referenced variables in the statement being kept within a slice are added to the seed list initially generated in block 2812. Additionally, a flag is set indicating that a variable has been added if at least one was not in the previous list. As a result, every program variable which appears in a statement to be included in the slice and which is not a current seed variable becomes a new seed variable. Global variables and variables declared in called procedures are included. Engine 440 selects the next statement pursuant to block 2832 and control returns to block 2822, as described above. After all statements in all blocks have been reviewed, control flows to block 2816 to determine whether any seed variables were added to the slicing criteria. If there are no new seed variables, the result is the current slice and control flows to block 2818 which terminates the procedure and returns control to block 1828 in FIG. 18a. Alternatively, if new seed variables have been added to the seed list of block 2812, control flows to block 2820 where the list of basic blocks is reset (i.e., markers indicating that a basic block has been reviewed are cleared) and the flag indicating new seed variables have been added to the seed list is reset. In so doing, the new seed variables are converted to current seed variables and subsequently used as slicing criteria. Control then flows to point A where the slicing operation with the newly converted variables is repeated starting at block 2814. As the foregoing illustrates, slicing is repeatedly executed under transitive closure until all possible variables, included within the affected statements, are used as slicing criteria. In so doing, a "closure" is obtained with respect to the affected variables. The operation as described in flow chart 2800 cover blocks 1824 and 1826 in flow chart 1800 of FIG. 18a. Upon completion of transitive closure sli | ||||||
