Method and system for determining source code location5822592Abstract A first improved method is disclosed for constructing a parse tree. The parse tree includes parse tree node records including a child pointer field, a sibling pointer field and an end position field. The improvement comprises storing information in the end position field. The information being selected from the group of an end position value and a parent node address. A second improved method is disclosed for identifying a source code segment associated with a first parse tree node record. The improvement comprises the step of determining the start position of the source code segment based on a variant field of the first node record and a variant field of a second node record. The improvement further comprises the step of determining the end position of the source code segment based on the variant field of the first node record. Claims What is claimed is: Description TECHNICAL FIELD
______________________________________
(1) Nonterminal: TerminalOrNonterminal.sub.-- 1
. . .
TerminalOrNonTerminal.sub.-- k
{ action code }
______________________________________
Using YACC, action code can be embedded in grammars describing target languages. Such action code is often written in C and can be used to manipulate symbol tables and construct parse trees when dealing with non-trivial target languages. The action code can also be used for code generation or for directly interpreting target language constructs when dealing with relatively simple languages. Consider a simple grammar, G, with nonterminals {S,B}, terminals {a,b}, and three production rules as follows: (2.1) G : S B (2.2) S : a (2.3) B : b Given input steam 116 containing "a b b b b", FIG. 1 illustrates the state of parsing operation after the following sequence of operations: shift reduce (2.2) shift reduce (2.3) reduce (2.1) shift The next parser operation is a reduction with rule 2.3, and currently, cursor 118 of lexical analyzer 112 points to the end position of the second "b", the terminal symbol to be consumed by the next reduction. Generally, in order to apply a YACC production rule such as production rule (1) in a reduction, parser 110 repetitively invokes lexical analyzer 112 and performs shift operations. During each iteration, parser 110 determines whether the top-most k symbols on stack 114 match the right-hand side of production rule (1) by examining the current parser state and the next input token. When the top-most k symbols on stack 114 match the right-hand side of production rule (1) parser 110 performs a reduction based on production rule (1). Parser 110 replaces the top-most k symbols with the left-hand side symbol of (1). After the reduction, any associated action code is executed. At this point, cursor 118 of lexical analyzer 112 stays at the ending position of the scanned input stream 116, or the ending position of a subsentence derived from production rule (1)'s left-hand symbol, in this case, the construct Nonterminal. In the event that parse tree construction code is embedded in the associated action code, a node for the construct Nonterminal would be created. This node represents the derivation of a subsentence by Nonterminal. A partial correspondence from the parse tree node to the source code location of its subsentence can also be established by storing the position of cursor 118 of the lexical analyzer 112 in the node. The correspondence is partial since only the end position of the subsentence is available. A novice YACC developer might suggest establishing a full correspondence by associating action code with the production rule. Action code could be added to the beginning of production rule (1) for creating the parse tree node and setting the start position of a subsentence. While action code could be added at the end of production rule (1) for setting the end position. Unfortunately, the applicability of this straightforward solution is rather limited. Most target languages include empty production rules --rules with an empty right-hand side. This solution will only work when the target languages are simple enough that additional empty production rules would not introduce ambiguities into the grammar. YACC introduces a new empty rule for action code inserted in the middle of a production rule. Thus, if action code was added at the beginning of the rules, the reduction of the corresponding empty rules and the subsequent execution of the action code would occur with cursor 118 of lexical analyzer 112 pointing at the starting position of a subsentence. For a complex grammar, this approach would greatly increase the number of actual grammar rules. It is quite likely that the large number of empty rules would alter the definition of the original language. Prior Art Parse Trees Referring now to FIG. 2, there is illustrated a prior art parse tree node data structure. Parse trees are represented as binary trees using linked lists. As the number of offsprings of a node in a parse tree varies from node to node, these parse trees are actually generalized binary trees. A typical prior art parse tree node is represented using data structure 210. Each node includes two pointers. Child pointer 216 associates a node with its first offspring. Sibling pointer 214 associates a node with its next sibling. Thus, all offsprings of a node can be found by traversing its child pointer 216 followed by sibling pointers 214. In addition to pointers for traversing the parse tree, data structure 210 also includes StartPosition field 211 and EndPosition field 212. StartPosition field 211 is used to keep track of the first position of the symbol sequence associated with a node. Likewise, EndPosition field 212 is used to keep track of the last position of the symbol sequence associated with a node. Each node in a parse tree represents a sequence of terminal symbols. StartPosition field 211 establishes a partial correspondence between the node and the sequence of terminal symbols the node derives. A complete correspondence is established when StartPosition field 211 is used in conjunction with EndPosition field 212. Without StartPosition field 211, determining the first position of a node's symbol sequence would require access to its parent node. There are two alternatives for deriving this information. The first alternative is to add a third pointer to node data structure 210 which points to a node's parent. The second alternative is to perform a tree traversal to find a node's parent when needed. The first approach is unsatisfactory as it provides no benefit. The memory space saved by eliminating StartPosition 211 would be used by the new pointer. The drawback of the second approach is that it would result in considerable performance deterioration. Consider the following simple grammar:
______________________________________
S .fwdarw. aPb
P .fwdarw. cQd
Q .fwdarw. Pe .linevert split. e
______________________________________
Further, consider a sentence, x, in the language of the grammar:
______________________________________
accededb
12345678
______________________________________
The numbers beneath the individual symbols indicate their positions. Thus, the position of the symbol "a" is 1; the position of the first "e" is 4; and the location of the substring "ced" is ›3,5!. The canonical derivation sequence of x is as follows:
______________________________________
S .fwdarw.
aPb .fwdarw.
acQdb .fwdarw.
acPedb .fwdarw.
accQdedb .fwdarw.
accededb
______________________________________
Referring now to FIG. 3, there is illustrated a representation of this derivation sequence as a parse tree using data structure 210. The directions of the arcs are assumed left-to-right and top-down, respectively. The grammar symbol each node represents is also noted in the diagram for easy identification. Node 310 is representative of the nodes of the parse tree illustrated in FIG. 3. Node 310 includes child pointer 316, sibling pointer 314 and EndPosition field 312. Child pointer 316 points to node 318. Node 318 is the first child of node 310. Sibling pointer 314 contains nil because node 310 has no siblings. Start-Position field 311 contains the value 1 indicating that the first position of the code segment associated with node 310 is 1. EndPosition field 312 contains the value 8 indicating that the last position of the source code segment associated with node 310 is 8. Using this prior art parse tree, identifying the location of a source code segment associated with a node requires lexical analyzer 112 to derive of the segment's first position which requires a parse tree traversal. A parse tree traversal for this simple sentence results in minor performance degradation when the parse tree is generated. As the complexity of either the language or the sentence increases, however, so does the level of performance degradation. In addition, this prior art parse tree reserves memory for the StartPosition field associated with each node. The Data Structure Of The Present Invention Referring now to FIG. 4, there is illustrated a parse tree node data structure 410 in accordance with the present invention. Each node includes two pointers. Child pointer 416 associates a node with its first offspring. Sibling pointer 414 associates a node with its next sibling. These pointers perform substantially the same functions as child pointer 216 and sibling pointer 214, respectively. Data structure 410 also includes EndPosition/ParentPointer field 412. Field 412 is a variant field which is used in performing two functions. For nodes which are a rightmost child node, the EndPosition/ParentPointer field 411 contains a pointer to the node's parent. For all other nodes, the EndPosition/ParentPointer field 412 contains the last position occupied by the symbol sequence associated with the node. Using data structure 410, the present invention overcomes the deficiencies of the prior art without requiring additional memory space and without incurring unacceptable performance degradation. An observation that makes the method of the present invention feasible is that in a parse tree representation, a node corresponding to a non-terminal symbol always has the same end position as its right-most child node. To take the advantage of this observation, the method of the present invention eliminates StartPosition field 211 from the prior art data structure. Further, the present invention utilizes the fields of the prior art data structure differently. In particular, the method of the present invention interprets EndPosition/ParentPointer field 412 of a right-most child node as a pointer to its parent node. This interpretation is possible because the end position of a right-most child node is always the same as that of its parent node. Note that the rightmost child nodes are those whose sibling pointer contains nil. FIG. 5 illustrates the parse tree for the derivation sequence in accordance with the present invention. The revised interpretation of the EndPosition/ParentPointer field as a variant field is illustrated by reference numeral 524. The EndPosition/ParentPointer field of node 522 contains a pointer 524 to its parent node 520. Node 522 is the right-most child of node 520. Finding a parent node in accordance with the method of the present invention is a matter of following the parent pointer contained in the EndPosition/ParentPointer field of one's right-most sibling node. The performance cost of such an operation typically involves a couple to half a dozen pointer traversals, as is the case of this example. The control logic for determining source code locations of parse tree nodes is outlined below. Using a functional style, the control logic is presented as a collection of functions. Control Logic: Determining source locations of parse tree nodes.
______________________________________
Node = childPointer, siblingPointer, endPosition .linevert split.
parentPointer
startposition =
.sup. .lambda.n: Node if (p = nil)
.sup. then 1
.sup. else (1 +
.sup. endposition (leftsibling
p.childPointer n))
.sup. where p = parent n
endposition =
.sup. .lambda.p: Node if (p< >nil)
then if (rightmostsibling n)
.sup. then endposition p
.sup. else n.endPosition
else n.endPosition
.sup. where p = parent n
leftmostsibling =
.sup. .lambda.p, n: Node if (p.childPointer = n)
then True
else False
rightmostsibling =
.sup. .lambda.n: Node if (n.siblingPointer = nil)
then True
else False
leftsibling =
.sup. .lambda.s, n: Node if (s.siblingPointer = n)
then s
else (leftsibling s.siblingPointer n)
parent =
.sup. .lambda.n: Node if (rightmostsibling n)
then n.parentPointer
else (parent n.siblingPointer)
______________________________________
Note that the "parent" function described above returns a nil for the root node. The root node is the node pointed to by a global pointer to a parse tree. Such a conditional test, however, is omitted from the "parent" function. Some example computations of the two functions, "startposition" and "endposition" are shown below, based on the parse tree of FIG. 4. The notation Node›P! represents the node corresponding to grammar symbol "P". Multiple occurrences of a symbol are further differentiated with a number i, indicating the ith occurrence of a symbol from left-to-right. startposition(Node›P!) =2 endposition(Node›P!) =7 startposition(Node›d!) =7 endposition(Node›d!) =7 startposition(Node›c, 2!) =3 endposition(Node›c, 2!) =3 While the best mode for carrying out the invention has been described in detail, those familiar with the art to which this invention relates will recognize various alternative designs and embodiments for practicing the invention as defined by the following claims.
|
Same subclass Same class Consider this |
||||||||||
