Per-keystroke incremental lexing using a conventional batch lexer5737608Abstract A system and method are disclosed that enable a batch lexer to be used to incrementally update a token stream representation of a computer program maintained in an editor as the computer program is being edited. A keystroke executive interprets editing inputs and dispatches editing events to a lexical analyzer. The lexical analyzer converts a range of the tokens likely to be affected to an equivalent old textual stream that preserves whitespace implied by but not represented within the token stream. A new text stream is generated from the old text stream by carrying out the current editing event. I.e., insertion of text is handled by the insertion of the relevant text into the old text stream (now the new stream) and deletion of a character is handled by deleting the appropriate character from the old text stream. The batch lexer is then invoked on the new text stream and as a result returns a new token stream. The fewest possible tokens from the new token stream that reflect the entire impact of the current editing event are returned to the keystroke executive along with an updated editing point within the new token stream as the suggested token stream update. The keystroke executive is free to ignore or accept the suggested update. The new token stream and the new text stream can be generated lazily, respectively one token and one character at a time. Claims What is claimed is: Description BACKGROUND OF THE INVENTION
TABLE 1
______________________________________
Event stream
Position Content Lexical Class
113 158a 158b 158c
______________________________________
0 1 0.vertline.
Integer literal
0x 1 0x.vertline.
Incomplete integer literal
0x5 1 0x5.vertline.
Integer literal
0x5H 1 0x5H.vertline.
Illegal integer literal
______________________________________
In this example, note how the lexical class changes as the user types. This is because the keystroke executive 130 lexically analyzes the event stream 113 after each user keystroke. Also note that the token stream position 158a does not change as the user's keystrokes have the effect of modifying a single token, which, in this example, is the first token. Rather, what does change after each token stream 158 update is the insertion point 157, which marks both the keystroke executive's work position in the token stream 158 and the position of the cursor (displayed to the user on the display 118), with which the user interacts with the program being edited. B. Token Boundaries In another marked contrast to text editors, the keystroke executive 130 does not represent in the token stream 158 spaces typed by the user between program tokens (however, as will be discussed later, the keystroke executive 130 does represent in the token stream 158 typed spaces in string and character literals). Instead, the textual role played by whitespace in conventional text editors is replaced in the token stream 158 by token boundaries maintained by the keystroke executive 130 as the user types. Most token boundaries are implicit in what the user types and are recognized by the keystroke executive 130 and tokenizer 132 through lexical analysis. At such implicit boundaries there is no editable element between tokens. For example, recall that the tokens 158-1 through 158-13 in the token stream 158 shown in FIG. 3 correspond to the event stream "for(i=0; i<1024;i++)". This expression could have been entered from the input device 112 in many ways, all semantically equivalent; e.g., the user might have typed the event stream 113 as "for.sub.-- (i.sub.-- =.sub.-- 0;.sub.-- i<1024;.sub.-- i++)" or as "for(i=0;.sub.-- i<1024;.sub.-- i++)", where the ".sub.-- " symbol corresponds to a typed space in the event stream 113. To the user entering this text, these spaces play an important role. For example, they provide visual separation between adjacent program words and, as they are directly displayed by text editors, are an integral component in rendering a pleasing display of the program 124. However, none of these spaces are necessary in a semantic sense; i.e., their presence or absence does nothing to change the meaning of the for-loop expression (although in some situations, covered below, spaces do provide semantic information). Consequently, the keystroke executive 130, in tokenizing these event streams, discards any spaces entered by the user between the tokens 158-1 through 158-13. That is, the boundaries between the tokens 158-1 through 158-13 are implicit, being lexically inferable from the lexical classes of the adjacent tokens. At such implicit boundaries there are no editable elements between the tokens. 1. Separators There are also ambiguous token boundaries, where the program language would require a space in a purely textual representation (i.e., ambiguous boundaries are token boundaries where a space would be required in the textual equivalent of the token stream to preserve the boundary). In these cases, the keystroke executive 130 creates a separator, an editable element that acts something like a space. More specifically, in the present editor 122, a separator will only be associated with adjacent tokens in the token stream 158 that could legitimately be merged into one token. Generally, this determination is made by comparing the content of what the user is typing to the language definition of the program such as defined in the extended lexeme table 168a-2. For example, no separator is needed between the two tokens "for" and "("in the token stream" for("because those two tokens cannot legitimately be merged. In other words, no space is required in the textual equivalent of the token stream to preserve the boundary between the program words "for" and "(", which is not an ambiguous boundary. As a result, regardless of whether the use types "for.sub.-- ("or "for(", the keystroke executive 130 always gives the same token stream without a separator: "for", "(". In contrast, if the user types two adjacent "+" characters without any intervening spaces, the keystroke executive 130 generates a single corresponding plus-plus token "++", but if the user types a plus character "+" followed by a space ".sub.-- " then another plus character "+", the keystroke executive 130 returns a different token stream with two tokens and an intervening separator: "+.quadrature.+". Note that the separator in the two token example is not required because the token stream representation of the event stream "+.sub.-- +" would be ambiguous without it; in fact, the event stream "+.sub.-- +" could be unambiguously represented as two adjacent plus ("+") tokens having an implicit token boundary. This separator is also not required to create a pleasing display of the program 124 as the typographical display processor 170 displays visual whitespace between adjacent tokens based on their respective lexical classes, not user-entered spaces. Rather, the separator in the token stream 158 corresponding to the event stream "+.sub.-- +" is required so that the user can edit the corresponding stream as if it were text. I.e., by providing an editable separator corresponding to the displayed whitespace between the two plus tokens, the editor 122 gives the user an object to delete, whose deletion instructs the keystroke executive 130 to join the separated tokens into the single token, "++". If this separator were not present, there would be no simple, text-like way in which a user could join the two tokens. For example, consider the editor's behavior if a necessary separator were missing. If there were only an implicit boundary between the tokens (i.e., if the event stream "+.sub.-- +" were internally represented as the two adjacent tokens "+" and "+"), the user might place the cursor ("1") in the whitespace between the displayed tokens as follows: "+.vertline.+", and hit the rubout key, thinking this action should delete the space between the tokens, thus textually joining them. However, because the only editable element to the left of the cursor is the leftmost plus token, this action would only result in the keystroke executive 130 deleting that leftmost token. Thus, adding a separator between the two plus tokens allows the user to join the two tokens in a familiar, text-like manner (i.e., by putting the cursor between the separator and the rightmost plus token, then deleting backwards). Moreover, because a separator serves only an editing purpose and not a visual (whitespace) purpose, the keystroke executive 130 will allow only one separator between tokens, no matter how many spaces a user enters. In addition to serving as an editable element, a separator acts as an immutable token boundary; i.e., adjacent tokens with an intervening separator may never be joined as a side effect of other operations. The preceding discussion set out what separators are; i.e. they are editable elements that act like textual space and define immutable token boundaries. The following discussion sets out various ways in which separators are implemented in the present editor 122. a. Separators-Implementation: As stated above, the tokenizer 132 or the keystroke executive 130 determines whether a separator is required in each token boundary based on lexical properties of the two surrounding tokens. In the preferred embodiment of the present invention, the keystroke executive 130 does not actually represent any required separators in the token stream 158, but decides, on the fly, whether a separator exists or should exist between two tokens. Alternatively, based on the same type of decision procedure, the keystroke executive could actually add an explicit separator to the token stream 158. This decision procedure can be implemented by the editor 122 in two different ways, one approximate but conservative (meaning that some unnecessary separators are introduced into the token stream, but that no necessary separators are omitted), the other exact (meaning that only necessary separators are introduced into the token stream). The approximate procedure must be conservative as it is ok with respect to editing behavior to have additional separators but absolutely critical not to omit necessary ones. The preferred embodiment of the present invention implements the approximate but conservative approach. In this approach, the keystroke executive 130 or tokenizer 132 determines the need for a separator between two tokens by referring to the separator table 168a-1 on the basis of the lexical classes 158c of the adjacent tokens. The separator table 168a-1, as is shown in FIG. 3, consists of a grid of boolean separator values (0/1>no separator/separator) indexed by the lexical class of two adjacent tokens in the token stream 158. For example, if the leftmost of the two adjacent tokens has a class 158b of "101" (reading the vertical axis of the table 168a-1 ), and the rightmost has a class 158b of "100" (reading the horizontal axis of the table 168a-2), they would be separated by a separator as their intersection in the separator table 168a-1 has a boolean separator value of "1". This is an approximate approach because it is based on the lexical classes of the adjacent tokens and not the particular tokens being lexically analyzed. For example, class-oriented rules in the separator table 168a-1 would indicate that a separator should always be included between an identifier on the left and a floating point number on the right. This rule is required because a user might occasionally want to join these types of tokens; e.g., where the left token is "A" and the right token is the floating point number "1E5", these two tokens could legitimately be merged into the single token identifier "A1E5". However, because this rule is conservative, it introduces an unnecessary separator in other situations where the textual equivalent of adjacent tokens of the same respective classes have an unambiguous boundary; e.g., where the left token is "A" and the right token is "0.1E5", the tokens cannot legitimately be joined. Using this approximate approach it is ok to add unnecessary separators (although this does require some special editing approaches), but it is not ok to eliminate necessary separators. For an example of such a case where the separator is critical, see the "++" example set out above. Thus, the approximate approach embodied in the extended lexeme table 168a-2 errs on the side of adding unnecessary separators. As intimated above, this approximate and conservative approach does cause some editing problems, such as when a user tries to delete an unnecessary separator such as the one associated with the boundary between the token "A" on the left and the token "0.1 E5" to the right (in this case the editor 122 simple moves the cursor over the unnecessary separator). However, this table lookup method is a fast, resource-efficient approach that can be executed on the fly as the user of the editor 122 generates the event stream 113. Moreover, it should be noted that unnecessary separators occur rarely in practice in languages such as C and C++ because the adjacent tokens that would give rise to an unnecessary separator never legitimately occur next to each other. Like the approximate approach, the exact decision procedure is rule-based, but rather than being token class-oriented, this latter approach evaluates the specific adjacent tokens according to a set of lexical rules to determine whether adjacent tokens could legitimately be joined. For example, the efficient approach would prescribe a separator between the tokens "A" and "1 E5" but not between the tokens "A" and "0.1 E5". This approach eliminates editing side effects as described above, but is more computationally intensive. 2. Provisional Separators Provisional separators are a temporary record that the user has just hit the space bar in a place, such as between two tokens, where there is no need of an additional separator. The keystroke executive 130 or tokenizer 132 makes use of these provisional separators based on the assumption that the user must have typed the space for a good reason, thus, a record of the space should be preserved, if only for a short time. This allows the user to enter a new token whose boundary with the previous token is ambiguous, in which event the keystroke executive 130 eventually converts the provisional separator to an ordinary separator. If the user types a new token whose boundary with the previous token is not ambiguous, the keystroke executive 130 merely eliminates the provisional separator. In the preferred embodiment of the editor 122, there can be only one provisional separator in the token stream 158, and it only exists immediately to the left of the insertion point (described later). For example, assume that the user wishes to modify the text expression "a=b" to read "a.sub.-- c=b". In a text editor, a typical user might accomplish this change by placing the cursor to the left of the equals sign in the first expression, hitting the spacebar and typing a "c" following the space. However, in the editor 122, the event stream "a=b" is actually represented as the token stream 158 shown in Table 2 consisting of three tokens with no intervening separators.
TABLE 2
______________________________________
position content class p/s/bit
158a 158b 158c 158d
______________________________________
i a variable/61 0
i+1 = equals- 0
arith.op./71
i+2 b variable/61 0
______________________________________
Thus, in the absence of a provisional separator, when the user places the insertion point 157 between the tokens "a" and "=" and strikes the spacebar, the keystroke analyzer 130 would make no corresponding change in the token stream 158. I.e., the space typed by the user would not be recorded or recoverable. This is because the insertion point 157 already corresponds to an implicit token boundary between unambiguous tokens. Thus, when the user goes on to type the "c", which textually has an ambiguous boundary with the token "a" (i.e., "a.sub.-- c" has a different textual meaning than "ac"), the keystroke executive 130 would not introduce the necessary separator at the boundary between the "a" and "c" tokens and instead would form the updated token stream "ac=b". However, as the present editor 122 assumes that the user must have typed the seemingly superfluous space for a reason, it provides a means (the provisional separator) for representing this kind of editing activity in the internal token stream 158, which allows the editor to change the provisional separator to an ordinary separator when the user enters the "c". In the preferred embodiment of the present invention, the provisional separator is represented as a boolean provisional separator bit 158d (0/1>set, not set) in the insertion point data structure 157. Thus, in the present editor 122, there is only one provisional separator permitted in the token stream 158, and, if present, it is always understood to exist only to the left of the insertion point. This is because a provisional separator is an ephemeral object created by the editor 122 as a result of an immediately preceding input event. In another preferred embodiment of the present invention, the provisional separator is represented as an additional bit each token in the token stream 158 indicating whether the token is followed by a provisional separator. To allow the kind of editing behavior expected by users of text editors, the keystroke executive 130 or tokenizer 132 sets the provisional separator bit 158d whenever the user types a space at an existing token boundary where no separator is present. Thus, in the above example, the tokenizer 132 sets the provisional separator bit 158b when the user types a space between the "a" and "=" tokens; and the resulting token stream would read: "a.box-solid.=b" where ".box-solid." represents the provisional separator 158b at the insertion point 157). After the tokenizer 132 inserts the provisional separator in the token stream, the typographical display processor 170 will display the provisional separator as a space, but possibly in a way that is distinct from a displayed ordinary separator. Generally, a provisional separator 158b is ephemeral, being reset to 0 by the keystroke executive 130 and possibly replaced by an ordinary separator depending on the next user keystroke. In the example above, assuming that the user types a "c" at the provisional separator ".box-solid.", the tokenizer 132 will appropriately replace the provisional separator with an ordinary separator, meaning that the token stream 158 now reads "a.quadrature.c=b". The tokenizer uses the ordinary separator in this case as the token boundary between the tokens "a" and "b", which are both of the lexical class "variable/61", is ambiguous. If the user typed a plus sign "+" at the provisional separator instead of "c", the tokenizer 132 would remove the provisional separator completely as the resulting boundary between "a" and "=" in the expression "a+=b" would be unambiguous. Note that the tokenizer 132 will never enter more than one provisional separator between adjacent tokens into a flow. The insertion point 157 can occupy many possible positions with respect to an individual token and the aforementioned separators. However, of these myriad insertion point positions, only the six basic positions shown in Table 3 are necessary to the following discussion of the token stream and insertion point updating performed by the keystroke executive 130 (i.e., Table 3 is not exhaustive). Each row of Table 3 describes the insertion point, shows its associated index, which will frequently be referred to in the following discussions, and also includes an example of a token stream segment ("n: in int") showing the respective insertion point 157. In these examples, "(", "n", ":", "in", "int" and ")" are all tokens and the vertical bar (".vertline.") marks the insertion position 157 and the corresponding cursor position. Ordinary separators and provisional separators are shown as a hollow box ".quadrature." and a filled box ".box-solid." respectively. Table 3: Basic Insertion Point Positions
______________________________________
Position Example
______________________________________
1: Within a token. (n: i.vertline.n.quadrature.int)
2: Between tokens on a line,
(n: .vertline.in.quadrature.int)
no separator.
3. Between tokens on a line,
(n: in.vertline..quadrature.int)
at left of a separator.
4. Between tokens on a line,
(n: in.quadrature..vertline.int)
at right of a separator.
5. Between tokens on a line,
at right of provisional separator.
(n: .box-solid..vertline.in.quadrature.int)
6. Between tokens on a line, at left
(n: in.box-solid..vertline..quadrature.int)
of a sep, at right of prov. sep.
______________________________________
Thus, the keystroke executive 130 must have access to the following information to unambiguously locate the cursor in the token stream 158: (1) the token-oriented position (one of the six values from above), (2) if the token-oriented position is "1", the token-stream position of the token in which the insertion point is positioned, or, if the token-oriented position is not "1", the token-stream position of the token to the right of the insertion position; (3) the specific editable element to the right of the insertion point; and (4) the provisional separator bit 158d. Except for the provisional separator bit 158d, this cursor positioning information can be stored in a data structure accessible to the keystroke executive 130 or can be determined on the fly by the keystroke executive 130 from the editing context. The preferred embodiment of the present invention adopts the latter approach; however, for the purposes of this application, the four values (1) through (4) are illustrated in FIG. 3 as the lexical.sub.-- position 157b, the token.sub.-- position 157a, the editable.sub.-- element.sub.-- position 157c, and the provisional separator bit 158d of the hypothetical insertion point data structure 157. C. Lexical Analysis Process As set out above, maintaining the token stream 158 on the fly requires the keystroke executive 130, with the assistance of the tokenizer 132, to recognize all of the lexemes allowed by the language in which the program is being entered. Lexical analysis to this extent is well-known in batch-oriented tokenizer's, such as "Lex", which are executed on a program after the program has been entered. However, to maintain the token stream 158 on the fly completely and accurately, the keystroke executive 130 and the tokenizer 132 must also recognize the ill-formed and incomplete lexemes that are formed while a user of the editor 122 is typing. Maintaining the token stream 158 in this manner requires the keystroke executive 132 to analyze the impact of each user keystroke in the context of a small portion of the existing token stream 158 (called a token stream segment 138), insertion point 157, extended lexeme table 168a-2 and separator table 168a-1. Of course, the number of tokens in the token stream segment 138, and their positions relative to the current insertion point depends on the lexical characteristics of the programming language in which the program is being entered. For example, in C or C++, the keystroke analyzer 130 generally needs to analyze no more than the tokens adjacent to the token containing the insertion point. While the keystroke executive 130 can sometimes update the insertion point 156 and token stream 158 by itself (e.g., when a user deletes the single character of a single character token), in most situations it must consult with the tokenizer 132, which will suggest an update to the token stream segment 138 based on the extended lexical rules 168a-2. In such a consulting situation, the keystroke executive 130 queries the tokenizer 132 differently based on whether the current editing operation is an insertion or a deletion. In the case of an insertion, the keystroke executive 130 passes the tokenizer 132 a token stream segment 138 consisting of the token to the left of the insertion point and all of the tokens to the end of the current line of text, the character to be inserted, and the position at which that character is to be inserted, and asks the tokenizer 132 to recommend minimal changes to the token stream segment 138 that reflect the effects of the textual editing operation. In response, the tokenizer 132, which is a narrow expert in lexical analysis, knowing nothing of the insertion point 157, visual whitespace, or cursor behavior, performs an incremental lexical analysis on the token stream segment 138 by inserting the new character into the textual equivalent of the token stream (treating all separators as spaces) in the place indicated by the keystroke executive 130, and then analyzing the result as if it were text. Because the tokenizer 132 tries to make minimal changes to the token stream 138, in most situations changes to the token stream will not go beyond the token to the right of the insertion point. The tokenizer 132 then returns its suggested token stream update 138' to the keystroke executive 130. It must also return the position of the inserted character relative to the suggested update to the token stream segment 138' so that the keystroke executive 130 can update the insertion point appropriately. When the editing action is a delete, the keystroke executive 130 identifies the editable element in the token stream segment 138 the user wants to delete and asks the tokenizer 132, "if this element were to be deleted, how should the token stream segment 138 be updated?" As in the case of an insertion, the tokenizer 132 computes the minimal change to the token stream segment 138 and returns its suggested update and the offset of the deleted character relative to the suggested update to the token stream segment 138' to the keystroke executive 130. The keystroke executive 130 is then free to accept or reject the suggestion based on higher level information to which only it has access. The keystroke executive 130 then applies the suggested update 138' back to the token stream 158 and updates the insertion point 157. As an example of case where the keystroke executive 130 rejects the suggestion of the tokenizer 132 assume that a user about to create a comment has just typed the first character of a C-style comment beginning delimiter (i.e., "/"). The keystroke executive 130 passes this keystroke to tokenizer 132, which concludes that this keystroke is a C division token; therefore it suggests that the keystroke executive 130 replace the token stream segment 138 with an updated token segment 138' including the new division token. At this point, the keystroke executive 130 has no way of knowing that the user is typing a comment; therefore, it accepts the tokenizer's suggestion. Now assume that the user has just typed the second character of the comment beginning delimiter (i.e., "*"), which would look to an ordinary lexer like a C multiplication token. In the preferred embodiment of the present invention, the tokenizer 132 using extended lexical analysis rules is aware that it should join these two tokens together to form a special token ("/*") that is not in the language, This causes no confusion because neither C nor C++ allow a division token to be meaningfully adjacent to a multiplication token. Thus, the tokenizer 132 suggests that the keystroke executive update the token stream 158 by replacing the division token "/" with the special token "/*". However, and this is an area that will be covered in depth later, a basic tenet of the editor 122 is that comments and any other non-program information be kept separate from the token stream 158, simplifying lexical analysis of the program 124. Consequently, the keystroke executive 130 rejects the replace suggestion made by the tokenizer 132 and instead deletes the "/" from the token stream 158 and calls the comment processor 136 to create a new comment. This commenting process is described in greater detail in a related application by the same inventor, "Token-Based Computer Program Editor with Program Comment Management," Ser. No. 08/499,088. Having generally described how the tokenizer 132 and the keystroke executive 130 cooperate to incrementally update the token stream representation 158 of a computer program being edited, further details of the tokenizer's preferred structure and method of operation will now be set out. D. Tokenizer/Lexer Implementation In the preferred embodiment, the Tokenizer 132 and the objects with which it interacts are units of an object-oriented system. As is well known, object-oriented systems provide a set of inter-related classes, each of which can specify methods and data structures associated with objects that are instances of that particular class. Classes can also have subclasses, which inherit the data structures and methods of one or more parent classes and which can include additional data structures and methods specific to the subclass. Additionally, object instances of one class can encapsulate, or "wrap" objects of another class so as to provide a seamless external interface to the wrapped object. For example, a wrapper object could act as an intermediary between a design program and a graphics object storing graphical data that is incomprehensible to the design program. Such a wrapper object would provide methods on the graphics object that translate the graphical data to a form that is compatible with the design program. FIG. 4 shows six classes that cooperatively compute a token stream update while performing minimal lexical analysis. These classes, which include TokenEdit 200, IncrLexemeStream 300, InsTextTokenIstream 310, DelCharTokenIstream 320, LexicalAnalyzer 330 and TokenIstream 340, are now briefly described. The TokenEdit class 200 provides a set of methods that compose the Tokenizer 132 functions described above and the data structures used by those methods to return a suggested update to the token stream segment 138' to the keystroke executive 130 following an input event. The IncrLexemeStream class 300 encapsulates a text stream and lexically analyzes (using a batch lexer, such as "Lex", encapsulated by the LexicalAnalyzer class 330) the wrapped text stream. The Tokenlsteam class 340 encapsulates a TokenRange structure (a sequence of tokens, for example, the token stream segment 138) and permits the TokenRange to be read, one character at a time using the method get.sub.-- char() 342.1, as if the TokenRange were a simple text stream. The classes InsTextTokenIstream 310 and DelCharTokenIstream 320 are subclasses of the TokenIstream class 340, and each specializes the behavior of class TokenIstream 340 by permitting a single (per instance) textual change (insertion or deletion respectively) to be applied to the textual equivalent of the TokenRange. Thus, a client that creates an instance of class InsTextTokenIstream 310 and then reads the text stream produced by method get.sub.-- char() 312.1 does not see the exact textual equivalent of the encapsulated TokenRange, but instead sees the textual equivalent after it has been modified by the insertion of text at a particular location (both specified when the instance of InsTextTokenIstream is created). Taken together, the classes shown in FIG. 4 enable a conventional batch lexer to be invoked on a token stream segment 138 wrapped by one of the "istream" classes 310, 320. These classes also allow the resulting tokens to be returned to the keystroke executive 130 as a suggested update to the token stream segment 138' including an updated insertion point. Thus, the present invention makes possible the use of existing, text-oriented batch lexers to perform incremental lexical analysis on data streams with which they would otherwise be completely incompatible. Before the classes are described in depth, their initialization and interactions following an insertion event are described in reference to FIGS. 5A and 5B. Referring to FIG. 5A, there is shown is a process flow diagram illustrating the steps by which object instances of the classes shown in FIG. 4 are created and initialized following an input event. In this and subsequent figures, objects are shown as boxes with rounded corners, and classes and data are shown as boxes with squared corners. Messages passing between objects or actions performed by one object on another object are indicated by arrows whose direction shows the direction of the method or action. In most cases, the arrows are labelled with an index, such as "(5-0)", associated with a particular message. Input events 113 are received by the editor's keystroke executive 130, which dispatches those events according to category. In most cases, the keystroke executive 130 creates a TokenEdit object 132 by calling the method TokenEdit::insertText() 212 (in the case of an insertion event) or the TokenEdit::deleteChar() method 232 (in the case of a deletion) (5-1). Due to the similarity in the way insertions and deletions are handled by the present invention, the rest of this description assumes that the keystroke executive 130 called the insertText() method 212. The insertText() method 21 2 creates an instance (InsTextTokenIstreamObj 410) of class InsTextTokenIstream 310 by issuing (5-2) an InsTextTokenIstream message to the class 310. An InsTextTokenIstreamObj 410 produces on demand (lazily) a stream of characters that represent the hypothetical text stream, called "new text." Once the InsTextTokenIstreamObj 410 has been instantiated, the insertText() method 212 issues an IncrLexemeStream message (5-3) to the IncrLexemeClass 300, resulting in the creation of an object instance ("LexStream") 400 of the class IncrLexemeStream 300. An IncrLexemeStream object 400 produces on demand (lazily) a stream of new tokens that represents the hypothetical state of the token stream after the desired insertion (deletion) has been made. In this case, the LexStream object 400 is initialized to produce the new token stream from the "virtual text stream" produced by the InsTextTokenIstreamObj 410. In turn, the LexStream object 400 creates an object instance (LexObj) 430 of the Lexical Analyzer class 330, which encapsulates the extended batch lexer that will actually generate the lexemes returned by the LexStream object 430 (5-3a). Referring to FIG. 5B, there is shown a flow chart illustrating the post-initialization interactions of the objects described in reference to FIG. 5A as they incrementally compute the token stream modification of a computer program following an input event. Following the initialization steps (5-1), (5-2), (5-3) and (5-3a), the tokenizer method TokenEdit::insertText() 212 (or, in the case of a deletion event, TokenEdit::deleteChar() 232) enters a synchronization loop that compares the existing token stream segment 138 with the hypothetical token stream produced by the IncrLexemeStreamObj 410. This loop calls (5-4) the get.sub.-- LexData() method 302.1, whose function is to return (5-9) the next token from the new token stream. The get.sub.13 LexData() method 302.1 accomplishes this by invoking (5-5) the Lexer() method 332.1 of the LexStream object 430, whose role is to lexically analyze the textual equivalent of the new token stream and return token information (5-8) for the next token. The Lexer 332.1 obtains the textual basis for the new token stream one character at a time from the get.sub.-- char() method 312.1 of the InsTextTokenIstreamObj 410. Each time it is invoked (5-6), the get.sub.-- char() method 312.1 returns (5-7) the next character of the new text stream, including any inserted characters, in the appropriate order. Based on the new tokens received (5-9) from the LexStream object 400 and the old tokens in the token stream segment 138, the insertText() method 212 computes the minimum token-based changes necessary to update the current token stream 138 and returns (5-10) information about the change to the keystroke executive 130, which decides whether to make the change. Importantly, the tokenizer 132 takes care to compute where, relative to the hypothetical new token stream, the inserted (deleted) character is (was); this information is used by the keystroke executive 130 to place the insertion point after it carries out the change. The classes are now described in detail in reference to FIG. 4. TextRef and TokenRange Classes 202, 246 An aspect of the present invention that is central to the operation of the classes shown in FIG. 4 is the way the present invention represents text coordinates, token positions in the token stream 158, ranges of tokens (such as a token stream segment 138), and the location of the insertion point 157 within or between tokens. Text Coordinates Text coordinates are conventional. A text location is the position between two adjacent characters in a text stream, represented as an integer index from 0 through M, where the stream contains M characters. The value of the index corresponds to the number of characters to the left of the position. Thus, the beginning of a stream is position 0, and the end is position M. In the present invention, text streams are generated as textual equivalents of the token stream segment 138 before and after the the current editing event. The character at a particular position is the character (if any) to the immediate right of the designated position. Thus, the first character in a text stream is at position 0, the last is at position M-1, and there is no character at position M. Token Coordinates A simple token location is the position between two adjacent tokens in a a token stream (e.g., the token stream 158), represented as an integer index from 0 through N, where the stream contains N tokens. The value of the index corresponds to the number of tokens to the left of the position. Thus, the beginning of a token stream is position 0, and the end is position N. The token at a particular position is the token (if any) to the immediate right of the designated position. Thus, the first token in a stream is at position 0, the last is at position N-1, and there is no token at position N. Token Ranges A token range (expressed in the present invention as the type "TokenRange" 246) is a pair of simple token coordinates (a,b), where 0<=a<=b<=N, that identifies the possibly empty sub-sequence of 5 tokens in a stream between the two positions. In the preferred embodiment, the coordinate "a" is referred to as "firstTokIndex" and the coordinate "b" as "lastTokIndex". Thus, token range (0,1) includes exactly one token, the first in the sequence, and the range (1,3) contains two tokens. The range (0,0) is empty. The range (1,1) is also empty, although at a different position. The range (0,N) contains the entire stream. Note that a token range does not contain tokens but simply holds references to tokens. Edit Points An editing point, or simply "point", (expressed in the present invention as type "TextRef" 202) is a pair consisting of a simple token location ("token.sub.-- position") and a text location ("element.sub.-- position"). The text location refers to a position in the text contained by the token at the token location. Thus, point (0,0) is just before the first character of the first token, and the point (0,1) is just after the first character of the first token. Point (0,M), where the first token contains M characters, is immediately after the final character of the first token. When there is no separator present at a token position, the following two points are logically equivalent and indistinguishable to a user: (n,m) at end of token n, which contains m characters, and (n+1,0) at beginning of token n+1. TokenEdit Class 200 Referring to FIG. 4, the tokenizer 132 is implemented by the TokenEdit class 200, which incorporates data structures 240 and methods 210 that pertain to high-level tokenizer functions. The methods 210 include insertText() 212 and deleteChar() 232. Each of the methods 210 allows the tokenizer 132 to compute the minimal incremental change (i.e., the suggested update 138') to the token stream segment 138 that carries out the current editing operation from the event stream 113. These methods are called by the keystroke executive 130 with appropriate arguments as dictated by the editing situation, and each call returns a new instance of TokenEdit that represents the suggested change 138'. For example, when the event stream 113 produces a text insertion operation, the keystroke executive 130 calls the Tokenizer's insertText() method 212 with a pointer to the inserted text 113 ("insertText"), the current token stream segment 138 ("TSS"), represented as a TokenRange structure 246, and an instance of the TextRef structure 202 ("editPoint"), which includes the token.sub.-- position 157a and the element.sub.-- position 157c from the insertion point data structure 157. Similarly, when the event stream 113 produces a character deletion operation, the keystroke executive 130 calls the Tokenizer's deleteChar() method 232 with the edit point and TSS arguments. As the operation of the deleteChar method() 232 differs from that of the insertText() method 212 in only a few details that are readily inferred from the inherent differences between text insertion and character deletion, the remainder of this discussion will focus on the operation of the insertText() method 212. Once called, the appropriate tokenizer method 210 (i.e., insertText() 212 or deleteChar() 232) computes, with the assistance of methods and data encompassed by the IncrLexemeStream class 300,a suggested update to the token stream segment 138' that represents a minimal update to the token stream segment 138 consistent with the editing action and the editPoint passed by the keystroke executive 130. The suggested update 138' is represented by data stored in a new instance 132 of class TokenEdit, including newPoint 242, replaceRange 244 and the list of new tokens 250, which together define the tokenizer's suggested update to the token stream segment 138 in such a way that the keystroke executive 130 can (1) replace existing tokens in the token stream 158 with new tokens that reflect the editing action and (2) correctly update the insertion point 157. The keystroke executive 130 accesses the updated token stream information by accessing the pointer to the TokenEdit object instance that is returned by the called tokenizer method. The replaceRange 244 specifies a range of tokens in the token stream 158 that is suggested to be replaced. As shown in FIG. 4, the replaceRange 244 is a structure of the type TokenRange 246. The new tokens suggested to replace the tokens specified in the replaceRange 244 are included in a list of tokens called new list 250. As with tokens in the token stream 158, a token in the new list 250 is an instance of a token class 248 that specifies a token's stream.sub.-- position 248.1, content 248.2, lexical class 248.3 and, optionally, a provisional separator bit 248.4. The newPoint structure 242 stores the location of the textual change that led to the suggested update 138'. This location is stored as coordinates that become meaningful relative to the token stream 158 after the suggested update is actually performed. The newPoint structure 242 is an instance of the TextRef class 202. Consequently, the newPoint 242 structure has two elements: (1) newPoint.token.sub.-- position (token position of the inserted text relative to the token stream 158 after the change is made); and (2) newPoint.element.sub.-- position (an offset from the beginning of the token indicated in the token.sub.-- position) that identifies the position to the right of the inserted text. The InsertText () method 212 computes the minimal update by comparing the existing token stream segment 138 before the change with a hypothetical token stream that reflects the result of the text insertion. In order to do this, the comparison conceptually operates over 4 different streams of: old tokens (the token stream segment 138), old text, new text, and new tokens. These streams are related by the following conceptual transformations: 1. old tokens to old text; convert old tokens to a text stream, translating separators to "strong" whitespace characters that preserve lexical boundaries; 2. old text to new text: insert text at point; and 3. new text to new tokens: conventional lexical analysis. These transformations are implemented lazily in order to minimize computation, meaning that the newly created streams are never represented fully but are instead computed in small pieces on demand, as needed by the comparison loop. The insertText() method 212 sets up this comparison by first invoking the InsTextTokenIstream() method 312.2 of the InsTextTokenIstream class, giving the method as arguments the old token stream 138, the text to be inserted, and the point at which it should be inserted. This creates an instance InsTextTokenIstreamObj 410 that produces the "new text" stream, which can be read a character at a time using its get char() method 312.1. The insertText() method 21 2 then invokes the IncrLexemeStream() method 302.2 of the IncrLexemeStream class, giving it as an argument the newly created "new text" stream. This creates an instance, LexStream 400, of the IncrLexemeStream class 300, that produces the "new token" stream. The new token stream can be examined one lexeme at a time by invoking its get LexData() method 302.1, which returns the next token in the new token stream each time it is executed. The insertText() method 212 then compares the existing token stream segment 138 with the new tokens produced by repeatedly invoking the get.sub.-- lexData() method 302.1 until it is able to discern no further difference between the two token streams. At that point, the insertText() method 212 returns the information describing the suggested update 138'. The operations of the get.sub.-- lexData() and IncrLexemeStream() methods 302.1, 302.2 and the methods of the "istream" and LexicalAnalyzer classes 310, 320, 330, 340 that cooperate with the methods 302.1,302.2 are now described. IncrLexemeStream and Lexical Analyzer Classes 300, 330 The IncrLexemeStream class 300 and Lexical Analyzer class 330 lexically analyze the hypothetical new text stream and generates a sequence of "new tokens" that is intelligible to the keystroke executive 130 and compatible with the existing token stream 158. The IncrLexemeStream class 300 has methods 302 that include get.sub.-- lexData() 302.1 and IncrLexemeStream() 302.2. The get.sub.-- lexData() method 302.1 is called by the insertText() method 212 whenever it needs another token from the "new token" stream. In response, the get.sub.-- lexData() method 302.1 calls the Lexer() method 332.1 of the Lexical Analyzer class 330, which produces, in cooperation with the get.sub.-- char() method 312.1 of the InsTextTokenIstream class 310, the next token. The get.sub.-- lexData() method 302.1 returns to the calling tokenizer method 210 (e.g., insertText() 212) three pieces of data: "lVal" the lexical class of the returned token, "tokLength" the number of characters in the token, and "tokText", a pointer to the characters composing the token. As described in reference to FIG. 5A the IncrLexemeStream() method 302.2, which is a constructor for the class IncrLexemeStream 302, is called by insertText() 212 at the beginning of every editing operation to create and initialize an object instance of the class IncrLexemeStream 300. Once created, the instance of the class IncrLexemeStream 300 subsequently executes all get.sub.-- lexData messages from the insertText() method 212. The IncrLexemeStream() method 302.2 also sets up an instance of the LexicalAnalyzer class 330 that lexically analyzes the text stream. The Lexical Analyzer class 330 encapsulates a conventional batch lexer ("lex" in this case) whose tables 168a are computed offline (as part of the construction of the program) from a description of the lexical rules of the language to be analyzed. The tables 168a-2 in this case are derived from descriptions that are extended in two significant ways from common practice. First, the lexical rules of the language are expanded to recognize and categorize appropriately the incomplete and illegal tokens that can be formed as a user of the editor writes and modifies a program. Second, the rules of the language are extended so that whitespace characters, instead of being ignored as is customary, are analyzed and reported as quasi-tokens (not part of the language proper). This lexer is represented in FIG. 4 as the Lexer() method 332.1. Data 334 used by the Lexer() method 332.1 is also encapsulated in the LexicalAnalyzer class 330. Thus, the lexeme.sub.-- table 168a-2 is also included in the class 330. The Lexer() 332.1 reads from a text stream that is generated one character at a time by one of the "istream" methods, which include TokenIstream, InsTextTokenIstream and DelCharTokenIstream. These classes are now described. "Istream" Base Class 340 (TokenIstream) The TokenIstream class 340 encapsulates a TokenRange structure (a sequence of tokens, for example, the token stream segment 138) and uses the lexical rules of the language to permit the tokens identified by the TokenRange to be read, one character at a time, using method get.sub.-- char() 342.1, exactly as if the TokenRange were a simple text stream. The class TokenIstream is a subclass of the "istream" class supplied with standard C++ implementations, and by re-implementing appropriate virtual functions such as get.sub.-- char() is able to produce a text stream in a form suitable for textual input to the conventional batch lexer encapsulated by the LexicalAnalyzer class 330. The implementation is lazy, computing the text stream only on demand. Method get.sub.-- char() 342.1 in most cases simply returns the next unseen character of the token currently being read. When the last character of a token is reached, however, the separator table for the language 168a-1 is consulted to decide what character will be returned next. If there is no separator present between the current token and its successor, then the first character of the following token will be returned next. If there is a separator present, then a designated separator substitute character is returned next, followed by the first character of the following token. The separator substitute character could be a simple space character but, for the purposes of the present invention, it is a character (a newline character in conventional language) that will preserve the lexical boundary in certain unusual situations where a space might not. FIG. 6A represents an instructive example of how the TokenIstream class 340 can be combined with the Lexical Analyzer class 330 to translate a sequence of tokens to text and then back into tokens. In this instructive example, where the contents of the token stream are not being modified, the output will be equivalent to the input. In this and subsequent figures objects are represented by boxes with rounded corners and the data they interact with are represented by boxes with square corners. In FIG. 6A, the input token stream segment 138 consists of four tokens, "a", "+" "b" and "c". Because the boundary between the tokens "b" and "c" is ambiguous, that boundary includes a separator, shown as an asterisk, "*". As described above, the separator "*" is not actually present in the token stream segment 138 but is inferred by the get.sub.-- char() method 342.1 based on the classes of the tokens on either side of a token boundary and information in the separator table 168a-1 (see FIG. 3). In response to repeated get.sub.-- char() calls to a TokenIstreamObj 440 the get.sub.-- char() method 342.1 outputs (one character at a time) a text stream 413 that comprises the text: "a+b.rarw.c". The symbol ".rarw." in the text stream 413 represents a language-specific, separator substitution character ("substChar") 168-a3, which is inserted by the TokenIstreamObj 440 into the text stream whenever a separator "*" is encountered in the token stream segment 138. For example, in the preferred embodiment, which is directed to an editor for C++ programs, the substChar 168-a3 could be a space for ordinary conditions but, in order to deal with unusual situations, is a newline character. The presence of the substChar 168-a3 ensures that the Lexer() 332.1 stops scanning the text stream 413 when it reaches a token boundary occupied by a separator, such as the boundary between the tokens "b" and "c". For example, in the preferred embodiment, the C++ batch lexer that is the core of the Lexer() 332.1 always stops scanning the text stream 413 when it encounters a newline character. Thus, instead of combining the two tokens "b" and "c" into a single token "bc" , the Lexer() 332.1 returns two separate tokens "b" and "c". Additionally, the Lexer() 332.1 emits a whitespace quasi token 335, denoted by the hollow box symbol "58 ", to mark the position of the separator substitution character in the new token stream. This whitespace quasi token 335 is not part of the language proper, but is returned by the Lexical Analyzer class 330 in order to permit accurate accounting of all characters seen by the Lexer. As for the remainder of the text stream 413, because the lexical rules of C++ always imply a token boundary between arithmetic symbols (e.g., "+") and adjacent variables (e.g., "a" and "b"), the get.sub.-- char() method 342.1 does not insert any separator substitute 168-3 between those characters. Based on the text stream 413 and information in the extended lexeme table 168-a2, the Lexer() 332.1 is thus able to return a new token stream 431 that (with the exception of the whitespace quasi-token) is equivalent to the input token stream segment 138. "Istream" classes 310, 320 (InsTextTokenIstream, DelCharTokenIstream) The classes InsTextTokenIstream 310 and DelCharTokenIstream 320 are subclasses of the TokenIstream class 340 and inherit the behavior described above. However, each specializes (alters) the behavior of class TokenIstream 340 by permitting a single, per-instance textual change (text insertion and deletion respectively) to be applied in the process of computing the text stream. Because the two subclasses perform similar functions, the emphasis in the following descriptions is on the InsTextTokenIstream class 310. A client that creates an instance of class InsTextTokenIstream 310 and then reads the text stream produced by method get.sub.-- char() 312.1 does not see the exact textual equivalent of the encapsulated TokenRange (as would be produced by an instance of class TokenIstream 340) but instead sees the textual equivalent after modification by the insertion of text at a particular location, both text and location having been specified when the instance of InsTextTokenIstream is created. The InsTextTokenIstream class 310 includes two methods 312, get.sub.-- char() 312.1 and InsTextTokenIstream() 312.2. The DelCharTokenIstream class 320 includes similar methods, get.sub.-- char() 322.1 and DelCharTokenIstream() 322.2, whose respective behavior is inferable from the following descriptions of the methods 312.1 and 312.2. The DelCharTokenIstream class 320 includes data 324 which, except for lacking an analogue to the insertText variable 314.3, are similar to the data 314 for the InsTextTokenIstream class 310. The data 324 do not include an analogue to the insertText variable 314.3 as the DelCharTokenIstream class 320 does not handle text insertions, only deletions. The InsTextTokenIstream() method 312.2, which is the constructor for the InsTextTokenIstream class 310, is called every time the IncrLexemeStream() method 302.2 is executed (this occurs every time an insertion event occurs). When executed, the InsTextTokenIstream() method 312.2 creates a new instance ("lnsTextTokenIstreamObj") 410 of the class InsTextTokenIstream 310, which subsequently responds to all get.sub.-- char() messages issued by the Lexer() method 332.1. The get.sub.-- char() method 312.1 is invoked by the Lexer() 332.1 every time it needs another character from the new text stream 413. Based on the character returned by get.sub.-- char() 312, the Lexer() 332.1 either adds the character to a token being formed, which it reanalyzes, or, if the character completes a token, breaks off lexical analysis and returns a completed token. Data 314 used by the get.sub.-- char() method 312.1 includes TSS 314.1 (corresponding to the token stream segment 138), editPoint 314.2, and insertText 314.3, which are passed by the insertText( ) method 212 when it initializes an InsTextTokenIstreamObj 410 (see FIG. 5A). The get.sub.-- char() method 31 2.1 also has access to a separator substitute character (substChar) 168a-3 (see FIG. 6B), which is a character that acts as a language-specific lexical barrier, and a scanIndex 314.6 that holds the method's working position in the token stream segment 138. In addition to these InsTextTokenIstream data 314, the get.sub.-- char() method also makes use of the token class information 158c of each token in the token stream segment 138 and the separator table 168a-1, which indicates, based on the classes of adjacent tokens and language rules, which adjacent tokens require a separator. Each time it is called, the get.sub.-- char() 312.1 method consults the aforementioned data and determines whether it needs to: (1) return the next character from the textual equivalent of the token stream segment 138 (when the value of scanIndex 314.6 is not equivalent to editPoint 314.2 and is not between tokens whose boundary includes a separator); (2) return insertText 314.3 (when the value of scanIndex 314.6 is equivalent to editPoint 314.2); and (3) return substChar 168a-3 (when the value of scanIndex 314.6 is between adjacent tokens whose boundary includes a separator). Referring to FIG. 6B, there is shown a data flow diagram that illustrates some of the steps by which the InsTextTokenlStreamObj 410 generates a text stream 413 ("new text") that reflects an insertion operation and which, when processed by the LexObj 430, results in a suggested update 138' that represents the minimum change to the token stream segment 138 consistent with the insertion. Specifically, in the example of FIG. 6B, an insertion event has just occurred in which a user has placed the cursor (represented in FIG. 6B with the symbol "?") between the "b" and "c" of the token 158-3 ("abc") and typed a "+" symbol. Reflecting this input event, the keystroke executive 130 sets the insertText 314.3 to "+", generates a token stream segment 138 specifying the tokens from the leftmost token touching the insertion point (i.e., the token 158-3) through the end of the line 158-7 and sets editPoint.token.sub.-- position ("TP") to the token position of the first token of the token stream segment 138 and editPoint.element.sub.-- position ("EP") to 2, indicating that, within the token indicated by "TP", exactly two characters are to the left of the insertion point. This information is provided to the insertText() method 212 (not shown in FIG. 6B), which uses the information to initialize an InsTextTokenIstreamObj 410. The object instance 410 also has access to the separator table 168a-1 and the substChar 168-a3. Note that the token stream segment 138 is actually passed to the InsTextTokenIstreamObj 410 as a TokenRange specifying the tokens contained in the segment. Based on these data, in the course of repeated get.sub.-- char() calls from the LexObj 430, the get.sub.-- char() method 312.1 constructs the text stream 413, "ab+c", the new text that reflects the current insertion. Note that the get.sub.-- char() method 312.1 does not insert any substChars 168-a3 in this text stream 413 as there were no separators in the original token stream segment 138. Based on this text stream 413, the LexObj 430 constructs the new token steam 431 that includes three new tokens: "ab", "+" and "c" and the other tokens from the token stream segment 138 (represented as with the ellipsis ". . . "). Using a process described below in reference to FIG. 7, the insertText() method 212 observes that no further changes are necessary and sets newPoint 242 at the beginning of the "c" token, the position immediately following the newly inserted text (this position is equivalent to the end of the "+" token, since there is no separator present). As with all other instances of an editPoint, the newPoint is represented as a coordinate pair. In this case the coordinates are "(27, 0)", where the token position 27 refers to the location that will be occupied by new token "c" in the token stream 158 if the change is actually made; this information enables the keystroke executive 130 to place the insertion point properly in the modified token stream. The insertText() method 212 returns to the keystroke executive 130 the resulting suggested update 138', which includes three data members in the TokenEdit instance corresponding to the editing action. The first data item is the newPoint 242, which is "(27, 0)" as described above. The second data item is a replaceRange 244 that indicates which tokens of the token stream 158 are to be replaced, which in the example of FIG. 6B would be the range "(25,26)" that includes the single token "abc". The third is a list of new tokens to be inserted 250, which in the example would contain the three new tokens "ab", "+" and "c". Assuming the keystroke executive 130 decides to accept the suggested update 138', it would produce the updated token stream segment 158' where the original token 158-3 ("abc") has been replaced with tokens 158-3.1 ("ab"), 158-3.2 ("+") and 158-3.3 ("c"). The operations of the insertText() method 212 are now described with reference to FIGS. 7 and 8. Referring to FIG. 7, there is shown a block diagram of the insertText() method 212, which includes an insertText program 214, which actually implements that actions ascribed herein to the insertText() method 212, and insertText variables 216. As already described in reference to FIG. 4, the insertText() method 212 takes as inputs the token stream segment ("TSS") 138, the editPoint 314.2 and the insertText 314.3. Based on these inputs, the insertText() method 212 computes: 1. the range (replaceRange 244) of tokens in the token stream segment 158 to be replaced with tokens returned by the Lexer() 332.1, 2. a list (new.sub.-- list 250) of the new replacement tokens 248, and 3. the location (newPoint 242) just after the inserted text expressed in coordinates that are meaningful after the suggested updated 138' is made to the token stream 158. The insertText() method 212 accomplishes this by performing a parallel walk over the old token stream (the token stream segment 138 before the insertion) and the new token stream 431 generated by the Lexer() method 332.1 from the text stream 413, recording details of the needed changes and stopping when no further change is needed. This requires the insertText() method 212 to operate conceptually over four different streams: 1. old tokens (the token stream segment 138), 2. old text 411 (the textual equivalent of the token stream segment 138, which is implicitly generated by the get.sub.-- char() methods of the istream classes 310, 320, 340), 3. new text (the text stream 413 resulting from inserting the specified text into the old text stream 411), and 4. new tokens 431 (generated by the get.sub.-- lexData() method 302.1, which lexically analyzes the new text stream 413). The conceptual transformations performed on these streams by the insertText() method 212 and the methods it calls is are as follows: 1. old tokens 138>old text 411 (convert old tokens to conceptual text, translating separators to "strong" whitespace characters that preserve lexical boundaries); 2. old text 411>new text 413 (insert "insertText" at the "editPoint"); 3. new text 413>new tokens 431 (lexically analyze the new text, counting whitespace). In the preferred embodiment, these transformations are implemented lazily, meaning that they are computed on demand, token by token, as the insertText() method 212 walks the streams. The insertText() method 212 stops its parallel walk when it reaches a token boundary after the insertion point (editPoint) 314.2 where the (conceptually) same character is the first character of a token in both the old and new token streams 138, 431. From this point on there can be no further lexical effects of the textual editing operation. The insertText() method 212 determines whether the stopping criterion for the parallel walk is met (i.e., reaching a token boundary after the editPoint 242 where the same character is the first character of tokens in both the old and new token streams) by counting characters in the conceptual old and new text streams, which differ only by the presence in the new text stream of the "insertText" 314.3. This requires the insertText() method 212 to maintain several internal variables 216 that it uses when comparing the old token stream 138 to the new token stream 431 returned by the Lexer() 332.1. These positional variables 216 include: insertTextPosition 221 the offset in the first token of the TSS 314.1 where the insertText 314.3 should be inserted (this is the same as editPoint.element.sub.-- position); insertLength 222 the number of characters in insertText 314.3; newPointTokenPos 223 the token that includes newPoint 242 (this variable is updated during the parallel walk and is used to set newPoint.token.sub.-- position); newPointCharPos 224 the number of characters in the token identified by newPointTokenPos 223 to the left of newPoint 242 (this variable is updated during the parallel walk and is used to set newPoint.element.sub.-- position); prevOldToken 225 the old token previously seen during the walk; oldCharIndex 226 the number of characters walked so far in the old token stream (separators count as 1 character); and newCharIndex 227 the number of characters walked so far in the new token stream, including whitespace characters; nextOldToken 228 the current old token being examined; nextNewToken 229 the current new token being examined; and nextNewTokenLength 230 the length of the nextNewToken 229. Note that each iteration of the insertText() method 212 described in reference to FIG. 7 walks over a single discrete chunk of either the old token stream 138 or new token stream 250, attempting to locate a position where the stopping condition holds, For the purpose of this loop, a chunk of the old token stream 138 can be either a token or a separator. A chunk of the new token stream 250 can be either a token or a whitespace quasi-token 335 returned by the lexer. The following are loop invariants (i.e., conditions that hold true at the beginning of the loop's body as long as the loop is being iterated): 1. The tokenRange replaceRange 244 includes those tokens in the old token stream 138 that have already been examined. If there is a separator between the last examined token and its successor, then the separator has been examined when prevOldToken 225 is NULL, otherwise it has not yet been examined; 2. Characters in the old text stream examined so far (0 through position oldCharIndex 226) correspond to a conversion of the old token stream examined so far (the tokens in replaceRange) to text, with each separator in the old token stream converted to a single whitespace character; 3. Characters in the new text stream (which is equivalent to the old text stream with new text inserted) examined so far (0 through position newCharIndex 227) correspond to characters that have been converted by the lexer to tokens, either ordinary tokens or quasi-tokens whose only role is to report whitespace seen by the lexer; 4. The list of tokens new list 250 includes all ordinary tokens that have been produced the by the lexer so far. It does not include the quasi-tokens whose only role is to report whitespace characters; 5. newPoint 242 is non-NULL if we have walked at least to the end of the newly inserted text; i.e., newCharIndex 227>=newPointCharPos 224; and 6. When newPoint 242 is non-NULL, it refers to the position just past the inserted text, represented in new token coordinates. Referring to FIG. 8A, there is shown a first of two flowcharts (the other is FIG. 8B) depicting the steps of the insertText() method 212 as it walks over the two token streams 138, 431. This figure omits description of the insertText() method 212 initialization steps, some of which have been described in reference to FIG. 5A. As part of the initialization steps, a subset of the insertText variables 216 are set as follows: insertTextPosition 221=editPoint.element.sub.-- position; (in the preferred embodiment, the insertion point is always in the first token of the TSS 314.1); insertLength 222=length of insertText 314.1; newPointTokenPos 223=firstTokIndex of the token stream segment 138; newPointCharPos 224=insertTextPosition+insertLength; prevOldToken 225=NULL; oldCharIndex 226=0; newCharIndex 227=0; replaceRange 244=an empty token range positioned at the beginning of the token segment 138; and new.sub.-- list 250=an empty list of tokens. Once these variables are initialized, the insertText() method 212 initiates a loop in which it performs the parallel walk over the old and new token streams 138, 431. As the first step in the loop, the insertText() method 212 determines whether newCharIndex 227 is less than oldCharIndex 226+insertLength 222 (702). If this condition is true (702-Y), then the walk of the new token stream is behind the old stream and insertText 212 gets another new token (nextNewToken 229) by calling get.sub.-- LexData() 302.1 (704) and setting the variable nextNewTokenLength 230 to the length of the returned token (704). If the nextNewToken 229 returned by get.sub.L-- exData() 302.1 is a whitespace quasi token (706), the insertText() method 212 determines whether the point just after the newly inserted text falls within the range of characters just lexed as whitespace (708). This condition is represented in C++ pseudocode as: "if (newPoint 242 is NULL && newCharlindex 227<=newPointCharPos 224 && newPointCharPos 224<=newCharIndex 227+nextNewTokenLength 230)." If this condition is true (708-Y), the insertText() method 212 records newPoint 242 as being at the beginning of the next token (710) and proceeds to step (712). If this condition is false (708-N), proceed to step (712), where insertText() 212 adds the length of the whitespace token just examined to newCharIndex 227 (712) and proceeds to step (734), where the Iccp end condition is evaluated. If the condition (706-N) is false, then nextNewToken 229 is not a whitespace quasi token and the insertText() method 212 begins to process nextNewToken 229 accordingly. As the first step in processing a non-whitespace token, the insertText() method 212 determines whether the point just after the newly inserted text falls within the range of characters just lexed (corresponding to the nextNewToken 229) (714). A | ||||||
