System and method for horizontal alignment of tokens in a structural representation program editor5857212Abstract An editor for structurally represented computer programs transforms user-entered text on-the-fly into a stream of tokens that constitute words of the program under edit. Each token is classified as one of group of extended lexemes, and based upon token stream information the editor prettyprint displays the program as the user types. Prettyprinting involves typesetting each token in a visually distinct manner and displaying a varying amount of visual inter-token whitespace between the tokens, based upon token lexical type. The program may be user-edited from the prettyprinted display as though the program were internally represented as text. Cursor position and display appearance depend on the lexical types of tokens adjacent the cursor. To improve aesthetics of the prettyprinted display, a user may insert one or more alignment markers into lines of associated text. The presence of such marker(s) forces horizontal alignment between associated text lines containing such markers. The presence, number, and occurrence of such markers in associated lines of text is noted, and the pixel distance from a boundary edge to the first occurring marker in each line is calculated. The maximum such distance determines relative position of the first alignment marker. Pixel units are added to the whitespace gap preceding the first marker in the other associated lines to force such markers into alignment with the marker whose position represented the maximum distance. This process is then repeated for second alignment markers in each line, third alignment markers, and so on. Claims What is claimed is: Description FIELD OF THE INVENTION
TABLE 1
______________________________________
Event stream
Position Content Lexical Class
113 158a 158b 158c
______________________________________
0 1 0.linevert split.
Interger literal
0x 1 0x.linevert split.
Incomplete integer literal
0x5 1 0x5.linevert split.
Integer literal
0x5H 1 0x5H.linevert split.
Illegal integer literal
______________________________________
In this example, note how the lexical class changes as the user types. This is because the keystroke executive 130 lexically analyzes the event stream 113 after each user keystroke. Also note that the token stream position 158a does not change as the user's keystrokes have the effect of modifying a single token, which, in this example, is the first token. Rather, what does change after each token stream 158 update is the insertion point 157, which marks both the keystroke executive's work position in the token stream 158 and the position of the cursor (displayed to the user on the display 118), with which the user interacts with the program being edited. B. Token Boundaries In another marked contrast to text editors, the keystroke executive 130 does not represent in the token stream 158 spaces typed by the user between program tokens (however, as will be discussed later, the keystroke executive 130 does represent in the token stream 158 typed spaces in string and character literals). Instead, the textual role played by whitespace in conventional text editors is replaced in the token stream 158 by token boundaries maintained by the keystroke executive 130 as the user types. Most token boundaries are implicit in what the user types and are recognized by the keystroke executive 130 and tokenizer 132 through lexical analysis. At such implicit boundaries there is no editable element between tokens. For example, recall that the tokens 158-1 through 158-13 in the token stream 158 shown in FIG. 3 correspond to the event stream "for(i=0;i<1024;i++)". This expression could have been entered from the input device 112 in many ways, all semantically equivalent; e.g., the user might have typed the event stream 113 as "for.sub.-- (i.sub.-- =.sub.-- 0;.sub.-- i<1024;.sub.-- i++)" or as "for(i=0;.sub.-- i<1024;.sub.-- i++)", where the ".sub.-- " symbol corresponds to a typed space in the event stream 113. To the user entering this text, these spaces play an important role. For example, they provide visual separation between adjacent program words and, as they are directly displayed by text editors, are an integral component in rendering a pleasing display of the program 124. However, none of these spaces are necessary in a semantic sense; i.e., their presence or absence does nothing to change the meaning of the for-loop expression (although in some situations, covered below, spaces do provide semantic information). Consequently, the keystroke executive 130, in tokenizing these event streams, discards any spaces entered by the user between the tokens 158-1 through 158-13. That is, the boundaries between the tokens 158-1 through 158-13 are implicit, being lexically inferable from the lexical classes of the adjacent tokens. At such implicit boundaries there are no editable elements between the tokens. 1. Separators There are also ambiguous token boundaries, where the program language would require a space in a purely textual representation (i.e., ambiguous boundaries are token boundaries where a space would be required in the textual equivalent of the token stream to preserve the boundary). In these cases, the keystroke executive 130 creates a separator, an editable element that acts something like a space. More specifically, in the present editor 122, a separator will only be associated with adjacent tokens in the token stream 158 that could legitimately be merged into one token. Generally, this determination is made by comparing the content of what the user is typing to the language definition of the program such as defined in the extended lexeme table 168a-2. For example, no separator is needed between the two tokens "for" and "(" in the token stream "for(" because those two tokens cannot legitimately be merged. In other words, no space is required in the textual equivalent of the token stream to preserve the boundary between the program words "for" and "("; which is not an ambiguous boundary. As a result, regardless of whether the use types "for.sub.-- (" or "for(", the keystroke executive 130 always gives the same token stream without a separator: "for", "(". In contrast, if the user types two adjacent "+"characters without any intervening spaces, the keystroke executive 130 generates a single corresponding plus-plus token "++", but if the user types a plus character "+" followed by a space ".sub.-- " then another plus character "+" the keystroke executive 130 returns a different token stream with two tokens and an intervening separator: "+.quadrature.+". Note that the separator in the two token example is not required because the token stream representation of the event stream "+.sub.-- +" would be ambiguous without it; in fact, the event stream "+.sub.-- +" could be unambiguously represented as two adjacent plus ("+") tokens having an implicit token boundary. This separator is also not required to create a pleasing display of the program 124 as the typographical display processor 170 displays visual whitespace between adjacent tokens based on their respective lexical classes, not user-entered spaces. Rather, the separator in the token stream 158 corresponding to the event stream "+.sub.-- +" is required so that the user can edit the corresponding stream as if it were text. I.e., by providing an editable separator corresponding to the displayed whitespace between the two plus tokens, the editor 122 gives the user an object to delete, whose deletion instructs the keystroke executive 130 to join the separated tokens into the single token, "++". If this separator were not present, there would be no simple, text-like way in which a user could join the two tokens. For example, consider the editor's behavior if a necessary separator were missing. If there were only an implicit boundary between the tokens (i.e., if the event stream "+.sub.-- +" were internally represented as the two adjacent tokens "+" and "+" ), the user might place the cursor (".linevert split.") in the whitespace between the displayed tokens as follows: "+.linevert split.+", and hit the rubout key, thinking this action should delete the space between the tokens, thus textually joining them. However, because the only editable element to the left of the cursor is the leftmost plus token, this action would only result in the keystroke executive 130 deleting that leftmost token. Thus, adding a separator between the two plus tokens allows the user to join the two tokens in a familiar, text-like manner (i.e., by putting the cursor between the separator and the rightmost plus token, then deleting backwards). Moreover, because a separator serves only an editing purpose and not a visual (whitespace) purpose, the keystroke executive 130 will allow only one separator between tokens, no matter how many spaces a user enters. In addition to serving as an editable element, a separator acts as an immutable token boundary; i.e., adjacent tokens with an intervening separator may never be joined as a side effect of other operations. The preceding discussion set out what separators are; i.e. they are editable elements that act like textual space and define immutable token boundaries. The following discussion sets out various ways in which separators are implemented in the present editor 122. a. Separators-Implementation: As stated above, the tokenizer 132 or the keystroke executive 130 determines whether a separator is required in each token boundary based on lexical properties of the two surrounding tokens. In the preferred embodiment of the present invention, the keystroke executive 130 does not actually represent any required separators in the token stream 158, but decides, on the fly, whether a separator exists or should exist between two tokens. Alternatively, based on the same type of decision procedure, the keystroke executive could actually add an explicit separator to the token stream 158. This decision procedure can be implemented by the editor 122 in two different ways, one approximate but conservative (meaning that some unnecessary separators are introduced into the token stream, but that no necessary separators are omitted), the other exact (meaning that only necessary separators are introduced into the token stream). The approximate procedure must be conservative as it is ok with respect to editing behavior to have additional separators but absolutely critical not to omit necessary ones. The preferred embodiment of the present invention implements the approximate but conservative approach. In this approach, the keystroke executive 130 or tokenizer 132 determines the need for a separator between two tokens by referring to the separator table 168a-1 on the basis of the lexical classes 158c of the adjacent tokens. The separator table 168a-1, as is shown in FIG. 3, consists of a grid of boolean separator values (0/1.fwdarw.no separator/separator) indexed by the lexical class of two adjacent tokens in the token stream 158. For example, if the leftmost of the two adjacent tokens has a class 158b of "101" (reading the vertical axis of the table 168a-1), and the rightmost has a class 158b of "100" (reading the horizontal axis of the table 168a-2), they would be separated by a separator as their intersection in the separator table 168a-1 has a boolean separator value of "1". This is an approximate approach because it is based on the lexical classes of the adjacent tokens and not the particular tokens being lexically analyzed. For example, class-oriented rules in the separator table 168a-1 would indicate that a separator should always be included between an identifier on the left and a floating point number on the right. This rule is required because a user might occasionally want to join these types of tokens; e.g., where the left token is "A" and the right token is the floating point number "1E5", these two tokens could legitimately be merged into the single token identifier "A1E5". However, because this rule is conservative, it introduces an unnecessary separator in other situations where the textual equivalent of adjacent tokens of the same respective classes have an unambiguous boundary; e.g., where the left token is "A" and the right token is ".1E5", the tokens cannot legitimately be joined. Using this approximate approach it is ok to add unnecessary separators (although this does require some special editing approaches), but it is not ok to eliminate necessary separators. For an example of such a case where the separator is critical, see the "++" example set out above. Thus, the approximate approach embodied in the extended lexeme table 168a-2 errs on the side of adding unnecessary separators. As intimated above, this approximate and conservative approach does cause some editing problems, such as when a user tries to delete an unnecessary separator such as the one associated with the boundary between the token "A" on the left and the token ".1E5" to the right (in this case the editor 122 simple moves the cursor over the unnecessary separator). However, this table lookup method is a fast, resource-efficient approach that can be executed on the fly as the user of the editor 122 generates the event stream 113. Moreover, it should be noted that unnecessary separators occur rarely in practice in languages such as C and C++ because the adjacent tokens that would give rise to an unnecessary separator never legitimately occur next to each other. Like the approximate approach, the exact decision procedure is rule-based, but rather than being token class-oriented, this latter approach evaluates the specific adjacent tokens according to a set of lexical rules to determine whether adjacent tokens could legitimately be joined. For example, the efficient approach would prescribe a separator between the tokens "A" and "1E5" but not between the tokens "A" and ".1E5". This approach eliminates editing side effects as described above, but is more computationally intensive. 2. Provisional Separators Provisional separators are a temporary record that the user has just hit the space bar in a place, such as between two tokens, where there is no need of an additional separator. The keystroke executive 130 or tokenizer 132 makes use of these provisional separators based on the assumption that the user must have typed the space for a good reason, thus, a record of the space should be preserved, if only for a short time. This allows the user to enter a new token whose boundary with the previous token is ambiguous, in which event the keystroke executive 130 eventually converts the provisional separator to an ordinary separator. If the user types a new token whose boundary with the previous token is not ambiguous, the keystroke executive 130 merely eliminates the provisional separator. In the preferred embodiment of the editor 122, there can be only one provisional separator in the token stream 158, and it only exists immediately to the left of the insertion point (described later). For example, assume that the user wishes to modify the text expression "a=b" to read "a.sub.-- c=b". In a text editor, a typical user might accomplish this change by placing the cursor to the left of the equals sign in the first expression, hitting the spacebar and typing a "c" following the space. However, in the editor 122, the event stream "a=b" is actually represented as the token stream 158 shown in Table 2 consisting of three tokens with no intervening separators.
TABLE 2
______________________________________
position content class p/s/bit
158a 158b 158c 158d
______________________________________
i a variable/61 0
i + 1 = equals- 0
arith.op./71
i + 2 b variable/61 0
______________________________________
Thus, in the absence of a provisional separator, when the user places the insertion point 157 between the tokens "a" and "=" and strikes the spacebar, the keystroke analyzer 130 would make no corresponding change in the token stream 158. I.e., the space typed by the user would not be recorded or recoverable. This is because the insertion point 157 already corresponds to an implicit token boundary between unambiguous tokens. Thus, when the user goes on to type the "c", which textually has an ambiguous boundary with the token "a" (i.e., "a.sub.-- c" has a different textual meaning than "ac"), the keystroke executive 130 would not introduce the necessary separator at the boundary between the "a" and "c" tokens and instead would form the updated token stream "ac=b". However, as the present editor 122 assumes that the user must have typed the seemingly superfluous space for a reason, it provides a means (the provisional separator) for representing this kind of editing activity in the internal token stream 158, which allows the editor to change the provisional separator to an ordinary separator when the user enters the "c". In the preferred embodiment of the present invention, the provisional separator is represented as a boolean provisional separator bit 158d (0/1.fwdarw.set, not set) in the insertion point data structure 157. Thus, in the present editor 122, there is only one provisional separator permitted in the token stream 158, and, if present, it is always understood to exist only to the left of the insertion point. This is because a provisional separator is an ephemeral object created by the editor 122 as a result of an immediately preceding input event. In another preferred embodiment of the present invention, the provisional separator is represented as an additional bit each token in the token stream 158 indicating whether the token is followed by a provisional separator. To allow the kind of editing behavior expected by users of text editors, the keystroke executive 130 or tokenizer 132 sets the provisional separator bit 158d whenever the user types a space at an existing token boundary where no separator is present. Thus, in the above example, the tokenizer 132 sets the provisional separator bit 158b when the user types a space between the "a" and "=" tokens; and the resulting token stream would read: "a.box-solid.=b" where ".box-solid." represents the provisional separator 158b at the insertion point 157). After the tokenizer 132 inserts the provisional separator in the token stream, the typographical display processor 170 will display the provisional separator as a space, but possibly in a way that is distinct from a displayed ordinary separator. Generally, a provisional separator 158b is ephemeral, being reset to 0 by the keystroke executive 130 and possibly replaced by an ordinary separator depending on the next user keystroke. In the example above, assuming that the user types a "c" at the provisional separator ".box-solid.", the tokenizer 132 will appropriately replace the provisional separator with an ordinary separator, meaning that the token stream 158 now reads "a.quadrature.c=b". The tokenizer uses the ordinary separator in this case as the token boundary between the tokens "a" and "b", which are both of the lexical class "variable/61", is ambiguous. If the user typed a plus sign "+" at the provisional separator instead of "c", the tokenizer 132 would remove the provisional separator completely as the resulting boundary between "a" and "=" in the expression "a+=b" would be unambiguous. Note that the tokenizer 132 will never enter more than one provisional separator between adjacent tokens into a flow. C. Lexical Analysis Process As set out above, maintaining the token stream 158 on the fly requires the keystroke executive 132, with the assistance of the tokenizer 132, to recognize all of the lexemes allowed by the language in which the program is being entered. Lexical analysis to this extent is well-known in batch-oriented tokenizer's, such as "Lex", which are executed on a program after the program has been entered. However, to maintain the token stream 158 on the fly completely and accurately, the keystroke executive 130 and the tokenizer 132 must also recognize the ill-formed and incomplete lexemes that are formed while a user of the editor 122 is typing. Generating this kind of extended lexical information, e.g., whether a token is a legal or incomplete floating point expression, requires the keystroke executive 132 to analyze the impact of each user keystroke in the context of a small portion of the existing token stream 158 (called a token stream segment 138), insertion point 157, extended lexeme table 168a-2 and separator table 168a-1. Of course, the number of tokens in the token stream segment 138, and their positions relative to the current insertion point depends on the lexical characteristics of the programming language in which the program is being entered. For example, in C or C++, the keystroke analyzer 130 generally needs to analyze no more than the tokens adjacent the token containing the insertion point. While the keystroke executive 130 can sometimes update the insertion point 156 and token stream 158 by itself (e.g., when a user deletes the single character of a single character token), in most situations it must consult with the tokenizer 132, which will suggest an update to the token stream segment 138 based on the extended lexical rules 168a-2. In such a consulting situation, the keystroke executive 130 queries the tokenizer 132 differently based on whether the current editing operation is an insertion or a deletion. In the case of an insertion, the keystroke executive 130 passes the tokenizer 132 a token stream segment 138 consisting of the token to the left of the insertion point and all of the tokens to the end of the current line of text, the character to be inserted, and the position at which that character is to be inserted, and asks the tokenizer 132 to recommend minimal changes to the token stream segment 138 that reflect the textual modification. In response, the tokenizer 132, which is a narrow expert in lexical analysis, knowing nothing of the insertion point 157, visual whitespace, or cursor behavior, performs an incremental lexical analysis on the token stream segment 138 by inserting the new character into the textual equivalent of the token stream (treating all separators as spaces) in the place indicated by the keystroke executive 130, and then analyzing the result as if it were text. Because the tokenizer 132 tries to make minimal changes to the token stream 138, in most situations changes to the token stream will not go beyond the token to the right of the insertion point. The tokenizer 132 then returns its suggested token stream update to the keystroke executive 130. It must also return the position of the inserted character relative to the updated token stream segment 138' so that the keystroke executive 130 can update the insertion point appropriately. When the editing action is a delete, the keystroke executive 130 identifies the editable element in the token stream segment 138 the user wants to delete and asks the tokenizer 132, "if this element were to be deleted, how should the token stream segment 138 be updated?" As in the case of an insertion, the tokenizer 132 computes the minimal change to the token stream segment 138 and returns its suggested update and the offset of the deleted character relative to the updated token stream segment 138' to the keystroke executive 130. The keystroke executive 130 is then free to accept or reject the suggestion based on higher level information to which only it has access. The keystroke executive 130 then writes the updated token stream 138' back to the token stream 158 and updates the insertion point 157. This process is described in greater detail in a pseudocode fragment in appendix A. As an example of a case where the keystroke executive 130 rejects the suggestion of the tokenizer 132 assume that a user about to create a comment has just typed the first character of a C-style comment beginning delimiter (i.e., "/"). The keystroke executive 130 passes this keystroke to tokenizer 132, which concludes that this keystroke is a C division token; therefore it suggests that the keystroke executive 130 replace the token stream segment 138 with an updated token segment 138' including the new division token. At this point, the keystroke executive 130 has no way of knowing that the user is typing a comment; therefore, it accepts the tokenizer's suggestion. Now assume that the user has just typed the second character of the comment beginning delimiter (i.e., "*"), which would look to an ordinary lexer like a C multiplication token. In the preferred embodiment of the present invention, the tokenizer 132 using extended lexical analysis rules is aware that it should join these two tokens together to form a special token ("/*") that is not in the language. This causes no confusion because neither C nor C++ allow a division token to be meaningfully adjacent to a multiplication token. Thus, the tokenizer 132 suggests that the keystroke executive update the token stream 158 by replacing the division token "/" with the special token "/*". However, and this is an area that will be covered in depth later, a basic tenet of the editor 122 is that comments and any other non-program information be kept separate from the token stream 158, simplifying lexical analysis of the program 124. Consequently, the keystroke executive 130 rejects the replace suggestion made by the tokenizer 132 and instead deletes the "/" from the token stream 158 and calls the comment processor 136 to create a new comment. This commenting process is described below in greater detail. The extended lexical type data generated by the tokenizer 132 are critical to the success of the present editor 122 as they provide information to the typographical display processor 170 that allows it to typeset the line being edited before it has been completed by the user and before it has been syntactically analyzed by the structural analyzer 150. For example, based on extended lexical analysis results, the prettyprinter 170 is able to typeset the incomplete floating point token "1.0E" or the illegal token "1.0E@" according to the display specification for a completed floating point expression, while displaying the extended lexical type information (e.g., that the token is incomplete or ill-formed) in such a manner as to assist the user in completing or correction the token (e.g., by highlighting the illegal token in red). This would not be possible in a conventional batch processor, which could not handle the torrent of incomplete tokens generated as the user enters the program. Of course, the fact that the editor 122 can provide lexical information that is displayed by the prettyprinter as the user types would be of no value to users unless they could easily interact with and edit the program being displayed. Thus, to achieve a high level of user acceptance, users should be able to interact with the displayed program as if it were textually, not structurally, represented. This implies that cursor movement and whitespace representation in the present editor should approximate that in conventional text based editors for the most common editing operations. How the present editor 122 does this, in the context of updating the token stream 158, is set out below. 1. Token Stream Updating Methods for Text-Like Cursor Control In the present editor 122, a user edits a computer program 124 by interacting with the displayed program through a cursor whose cursor position is defined by the current insertion point 157 in the token stream 158. The current insertion point 157 is maintained by the keystroke executive 130, which intercepts all of the user keystrokes and transforms them into the token stream 158; the typographical display processor 170 then communicates the current insertion point 157 to the user via a cursor at the appropriate place within the prettyprinted token stream 158 on the display 118. In a text-oriented editor, such editing behaviors are a simple matter--the editor simply moves the cursor from character to character as the user edits the underlying text, which has an internal representation that, generally, corresponds one-to-one with the displayed program. This simple heuristic also applies to the situation when the user moves the cursor from one program word to another across one or more spaces, as spaces are editable elements in conventional text editors. However, the present editor 122 internally represents the program 124 as a token stream 158, small bits of which (i.e., token stream segments 138) are updated on the fly by the keystroke executive 130. Moreover, there are no inter-token spaces in the token stream representation 158; instead, the tokens are separated either implicitly or by separators. Consequently, when the user edits the event stream 158, the keystroke executive 130 or the tokenizer 132, in determining how to update the token stream segment 138, must take into account (1) the user's keystroke/editing action (2) the position of the insertion point/cursor relative to the tokens and, possibly, separators, composing the token stream segment 138, and (3) the content 158b and lexical class 158c of the tokens in the token stream segment 138. Additionally, the keystroke executive 130 must update the insertion point appropriately, and the typographical display processor 170 must display the cursor in a screen position that (1) corresponds to the insertion point 157 and (2) is intuitive to the user. As lexical classes have been discussed already, this section will explain the methods and rules by which the keystroke executive 130 and/or the tokenizer 132 computes an updated token stream segment 138' and insertion point 157. The sole purpose of these methods and rules is to give the user the impression that they are editing a stream of editable elements in a conventional text editor. The first issue in defining updating rules for the keystroke executive 130 is to consider the range of editing operations permitted by the editor 122. The present editor 122 supports three basic text-oriented editing operations. These three basic editing operations are: (1) inserting a non-space character; (2) deleting an editable element and (3) inserting a space. The editor 122 also supports basic navigating operations such as mouse clicks and cursor forward and cursor backward keystrokes. These navigating operations are handled in the obvious way, by moving the insertion point over the editable elements. Given that the token stream of the present invention includes tokens and separators, and that no two separators can be adjacent except in the situation where one of the separators is a provisional separator, the insertion point 157 can occupy many possible positions with respect to an individual token and the aforementioned separators. However, of these myriad insertion point positions, only the six basic positions shown in Table 3 are necessary to the following discussion of the token stream and insertion point updating performed by the keystroke executive 130 (i.e., Table 3 is not exhaustive). Each row of Table 3 describes the insertion point, shows its associated index, which will frequently be referred to in the following discussions, and also includes an example of a token stream segment ("n: in int") showing the respective insertion point 157. In these examples, "(", "n", ":", "in", "int" and ")" are all tokens and the vertical bar (".linevert split.") marks the insertion position 157 and the corresponding cursor position. Ordinary separators and provisional separators are shown as a hollow box ".quadrature." and a filled box ".box-solid." respectively.
TABLE 3
______________________________________
Basic Insertion Point Positions
Position Example
______________________________________
1: Within a token. (n: i.vertline.n.quadrature.int)
2: Between tokens on a line,
(n: .vertline. in.quadrature.int)
no separator.
3. Between tokens on a line,
(n: in.vertline..quadrature.int)
at left of a separator.
4. Between tokens on a line,
(n: in.quadrature..vertline.int)
at right of a separator.
5. Between tokens on a line,
(n:.box-solid..vertline.in.quadrature.int)
at right of provisional separator.
6. Between tokens on a line, at left
(n: in.box-solid..vertline..quadrature.int)
of a sep, at right of prov. sep.
______________________________________
Thus, the keystroke executive 130 must have access to the following information to unambiguously locate the cursor in the token stream 158: (1) the token-oriented position (one of the six values from above), (2) if the token-oriented position is "1", the token-stream position of the token in which the insertion point is positioned, or, if the token-oriented position is not "1", the token-stream position of the token to the right of the insertion position; (3) the specific editable element to the right of the insertion point; and (4) the provisional separator bit 158d. Except for the provisional separator bit 158d, this cursor positioning information can be stored in a data structure accessible to the keystroke executive 130 or can be determined on the fly by the keystroke executive 130 from the editing context. The preferred embodiment of the present invention adopts the latter approach; however, for the purposes of this application, the four values (1) through (4) are illustrated as the token.sub.-- position 157b, the text.sub.-- stream.sub.-- position 157a, the editable.sub.-- element.sub.-- position 157c, and the provisional separator bit 158d of the hypothetical insertion point data structure 157. Having discussed the basic insertion point positions, the three methods by which the editor 122 executes the three basic editing operations are now set out. When reviewing the tables illustrating these various methods, please note that the editing outcomes shown in the tables reflect notions of what is natural cursor and editing behavior in text editing (the goal of the editor 122 being to provide text-like editing of structurally-represented programs). In most cases, the outcomes follow obviously from the editing operation. In a very small percentage of the cases, the outcomes reflect presumed user preferences which could be resolved differently depending on the user. Thus, these outcomes should be understood to reflect design choices made in the invention, other outcomes being equally consistent with the teachings of the present editor 122. Moreover, not all possible outcomes are shown, or if shown, discussed as the following discussions will focus on those editing outcomes that provide insight into important aspects of the keystroke executive's behavior. These editing methods are also described by the pseudocode included herein as Appendix B. The keystroke executive 130 also includes heuristics that determine how the cursor position/insertion point should be updated following editing actions that cause a separator to be added or removed from the token stream 158. One such heuristic follows: when an editing operation results in the addition of a separator in the token stream, place the cursor behind the new separator, where "behind" means to the right of separators added to the left of a deleted character, and to the left of separators added to the right of an inserted character. The idea behind this heuristic is to keep the cursor from hopping past new separators to render the behavior of the cursor familiar to users of the editor. Of course, this and all other heuristics might not seem natural to users who do not share the same set of expectations about "natural" editor performance. Consequently, any number of cursor control heuristics can be written into the keystroke executive 130. a. Inserting a Character When the user types a non-space character, the keystroke executive 130 or tokenizer 132 inserts the character into the token stream segment 138 at a position corresponding to the insertion point 157; lexically analyzes the results, treating separators as immutable token boundaries; and computes changes to token boundaries and classes. The resulting updated token stream segment 138' is then written back to the token stream by the keystroke executive 130. In this process, the lexical analysis step proceeds left to right, starting at the beginning of the left-most token touching the insertion point. For example, Table 4 shows how the tokenizer 132 responds to the insertion of the character "x" in representative flows at the various insertion points. Note that in every case shown, the insertion point is displayed to the right of the new character "x" when the operation has been completed.
TABLE 4
______________________________________
Insert an "x"
Position Example Editor Response
Final Position
______________________________________
1: Within a (n: i.vertline.n.quadrature.int)
Change "in" to "ixn";
1:
token. reanalyze. "ix.vertline.n.quadrature.int"
2: Between, (n: .vertline. in.quadrature.int)
Change ":in" to
1:
no sep. ":xin"; reanalyze.
"n: x.vertline.in.quadrature.int"
3. At left of a
(n: in.vertline..quadrature.int)
Change "in" to "inx";
3:
separator. reanalyze. "inx.vertline..quadrature.int"
4. At right of a
(n: in.quadrature..vertline.int)
Change "int" to
1:
separator "xint"; reanalyze.
"in.quadrature..vertline.xint"
5. At right of a
(n: .box-solid..vertline.in int)
Change "in to "xin";
1:
prov. sep. reanalyze. "n: x.vertline.in.quadrature.int"
6. At left of a
(n: in.box-solid..vertline..quadrature.int)
Analyze "x" by itself;
3:
sep., insert new token
"n: in x.vertline..quadrature.int"
right of a
prov. sep.
______________________________________
Referring to Table 4, the "Final Position" in each case results from the example shown. Other final positions are also possible; for example, assuming that the cursor is in position 1, e.g., "i.linevert split.n", and the user types the token "+" instead of "x", lexical analysis would result in an updated token stream segment 138' of "i+.linevert split.n", consisting of three tokens with only implicit boundaries, where the final insertion point, or cursor position, is position 2 (between tokens--no separator). Table 5 presents various examples in which the keystroke executive 130 would place the final cursor position in different final positions than those shown in Table 4. Unlike Table 4, these examples involve the insertion of different characters in different illustrative token stream segments. Thus, the "Example" column shows the existing token stream (bolded) and the user keystroke (in italics). The "Final Position" column shows the final cursor/insertion point position and the corresponding updated token stream segment, including the new insertion point marked with the ".linevert split." symbol. Ordinary and provisional separators are shown as in Tables 3 and 4.
TABLE 5
______________________________________
additional insert examples
Final
Position Example Editor Response
Position
______________________________________
1: Within a a.vertline.b;
change "ab" to
2:
token. type "+" "a+b"; "a+.vertline.b"
reanalyze.
2: Between, a.vertline.);
Change "a)" to
2:
no sep. type "+" "a+)"; "a+.vertline.)"
reanalyze.
3. At left of a
+.vertline..quadrature.+;
Change "+.quadrature.+" to
2:
separator. type "+" "++.quadrature.+";
"++.vertline.+"
reanalyze.
4. At right of a
a.quadrature..vertline.b;
Change "a.quadrature.b" to
2:
separator. type "+" "a.quadrature.+b";
"a+.vertline.b"
reanalyze.
4. At right of a
a.quadrature..vertline..1;
Change "a.quadrature..1" to
3:
separator. type "a" "a.quadrature.a.quadrature..1";
"a.quadrature.a.vertline..quadrature..
1"
reanalyze.
5. At right of a
a.box-solid..vertline.);
Change "a.box-solid.)" to
2:
______________________________________
In the first of the examples ("position"=1) in FIG. 5, the user has typed a plus token "+" between two editable elements of the token "ab". Consequently, when this token stream segment 138 is evaluated by the tokenizer 132, it is split into three tokens with implicit token boundaries (implicit because the equivalent text stream "a+b" is unambiguous regardless of intervening spaces). Thus, the tokenizer 132 suggests the updated token stream segment 138', where the insertion point is in position 2, between the "+" and "b" tokens with no separator. The second example ("position"=2) is similar, except in this case, the initial token stream consisted of two tokens "a" and ")" with no separator. Because the updated token stream 138' is equally unambiguous after insertion of the "+" token, no separator is required between the "+" and ")" tokens, meaning the final insertion point and cursor are still in position 2. The third example ("position"=3) is interesting because it results in the elimination of a separator. In this case the user has placed the insertion point between the first "+" token and the separator (".quadrature.") and entered another plus "+". The tokenizer 132 recognizes the two plus characters "++" as a legitimate C/C++ arithmetic operator, so it joins the two into a single "++" token. The tokenizer 132 also recognizes, based on the contents of the separator table 168a-1, that no separator is necessary between the "++" token and the adjacent arithmetic operator "+". Thus, the tokenizer 132 returns an updated token stream segment 138' in which the separator has been removed and the insertion point is now between the "++" and "+" tokens; this is position 2. In the fourth example ("position"=4), the separator also has disappeared after the insertion. This is because, the ambiguity, thus the need for a separator, between the adjacent variables "a" and "b" is eliminated by the insertion of the plus token "+". Thus, the final insertion position in the updated token stream segment 138' is position 2, following the just-inserted plus token. In the fifth example (also with "position"=4), the tokenizer 132 inserts the character "a" into the token stream segment "a.quadrature..linevert split..1" and reanalyzes the resulting text string. Because, in C/C++, it might be possible to join the inserted identifier "a" with either an identifier (such as "a", to the left of the insertion point) or a floating point literal (such as ".1" to the right of the insertion point), in the updated token stream segment 138', a separator is associated with the token boundary between the tokens "a" and ".1". However, in this particular example the new separator is unnecessary as the particular tokens "a" and ".1" could never legitimately be joined. In the sixth and final example, the provisional separator is present because, before the current insertion event, the user typed a space between the tokens "a" and ")", where no separator was otherwise required. When the user types the character "+" the tokenizer 132 deletes the provisional separator, which is no longer necessary given the unambiguous boundaries between the tokens "a", "+" and ")", and the insertion point lies in position 2, between the "+" and ")" tokens. b. Deleting a Character When the user deletes an editable element in the token stream 158, the action taken by the keystroke executive 130 depends on the editing context. When the user deletes the only character of a single character token, the keystroke executive 130 deletes the token from the token stream segment 138. When the user deletes the separator between two tokens, the keystroke executive 130 joins the adjacent tokens of the token stream segment 138 into a single token. When the user deletes a character from a multi-character token, the keystroke executive 130 removes the character from the token. In most cases, after deleting the character the keystroke executive 130 reanalyzes the resulting flow in the same fashion as for insertions, then writes the updated token stream segment 138' back to the token stream 158. Table 6 shows how the keystroke executive 130 and tokenizer 132 respond to a "delete next character" command issued by the user when the cursor is at a position in the flow corresponding to the various insertion positions. A similar table is not shown for the "delete previous character" command as those results are nearly symmetrical to those shown below.
TABLE 6
______________________________________
Delete Next Character
Final
Position Example Editor Response Position
______________________________________
1. Within a token.
A.vertline.BC
Change "ABC" to "AC";
1:
reanalyze. "A.vertline.C"
1. Within a token.
A.vertline.B+
Change "AB+" to "A+";
2:
reanalyze. "A.vertline.+"
1. Within a token.
A.vertline.B.quadrature.C
Change "AB.quadrature.C" to "A.quadrature.C";
3:
reanalyze. "A.vertline..quadrature.C"
2. Between, no
+.vertline.AB
Change "+AB" to "+B";
2:
sep. reanalyze. "+.vertline.B"
2. Between, no
A.vertline.+B
Change "A+B" to "AB";
3:
sep. reanalyze. "A.vertline..quadrature.B"
3. At left of a
A.vertline..quadrature.B
Change "A.quadrature.B" to "AB";
1:
separator. reanalyze. "A.vertline.B"
3. At left of a
A.vertline..quadrature..1
Change "A.quadrature..1" to "A.1";
4:
separator. reanalyze "A.quadrature..vertline..1"
4. At right of a
A.quadrature..vertline.B+
Change "A.quadrature.B+" to
2:
separator. "A.quadrature.+"; reanalyze
"A.vertline.+"
4. At right of a
A.quadrature..vertline.BC
Change "A.quadrature.BC" to "A.quadrature.C";
4:
separator. reanalyze. "A.quadrature..vertline.C"
5. At right of a
+.box-solid..vertline.AB
Change "+.box-solid.AB" to "+.box-solid.B";
2:
separator. reanalyze. "+.vertline.B"
5. At right of a
A.box-solid..vertline.+B
Change "A.box-solid.+B" to "A.box-solid.B";
4:
separator. reanalyze. "A.quadrature..vertline.B"
6. At left of a
A.box-solid..vertline..quadrature.B
Change "A.box-solid..quadrature.B" to
"A.box-solid.B";
4:
sep. right of reanalyze. "A.quadrature..vertline.B"
a prov. sep.
______________________________________
In example 6 (starting position=3, final position=1), the user is attempting to delete the separator between the adjacent tokens "A" and "B". The keystroke executive 130 treats this as a request by the user to join the tokens adjacent the separator. Thus, in example 6, the result of the deletion operation is the single, two character token "AB", where the final insertion point position lies between the "A" and the "B". However, in some cases, as in example 7, where an unnecessary separator has previously been inserted between two tokens, the user's attempt to delete that separator will result in a no-operation. In this example (starting position=3, final position=4), the user has attempted to delete the separator between the tokens "A" and ".1". As a result of this editing operation, the keystroke executive 130 asks the tokenizer 132 to reanalyze the textual equivalent of the starting token stream as if the space corresponding the separator were not present (i.e., the text stream "A.1"). Because this textual equivalent clearly comprises an adjacent identifier and floating point literal, the tokenizer 132 does not join the three characters "A", "." and "1" into a single token, but returns two tokens "A" and ".1" to the keystroke executive 130. The keystroke executive 130, seeing that the tokenizer 132 did not comply with the request to join the tokens, leaves the separator in place and merely moves the insertion point over the separator. This is why, in example 7, the updated token stream segment 138' is identical to the starting token stream segment 138, but the insertion point is changed to position 4, to the right of the separator. The remaining examples in Table 6 follow in a straightforward manner from these examples and other descriptions of the various insertion points and token stream updating methods of the present invention. c. Inserting a Space In the present editor 122 the user is encouraged to think of the spacebar textually (i.e., as in a traditional text editor where the user enters a space by hitting the space bar); however, in the present editor 122, by typing a space the user actually tells the keystroke executive 130 that there should be a token boundary at the insertion point. The keystroke executive's response to the spacebar depends on where in the token flow the user attempted to enter the space. For example, the keystroke executive 130 will not create a boundary where one already exists. The possible responses of the keystroke executive 130 to the user's attempt to insert a space are illustrated in Table 8 for the six basic insertion point positions.
TABLE 7
______________________________________
Press Space Bar
Position Example Editor Response
Final Position
______________________________________
1. Within a (n: i.vertline.n.quadrature. int)
Split "in" into "i"
4:
token. and "n"; reanalyze
"i.quadrature..vertline.n int"
2. Between, (n: .vertline. in.quadrature. int)
Add provisional
5:
no sep. separator. "n: .box-solid..vertline.in.quadrature
. int"
3. At left of a
(n: in.vertline..quadrature. int)
Move insertion point
4:
separator. over separator.
"in .quadrature..vertline.int"
4. At right (n: in.quadrature. .vertline.int)
No-op (blink
4:
of a insertion point).
"in.quadrature. .vertline.int"
separator.
5. At right of
(n: .box-solid..vertline.in.quadrature. int)
No-op (blink
5:
a prov. insertion point).
"n: .box-solid..vertline.in.quadrature
. int"
sep.
6. At left of
(n: in.box-solid. .vertline..quadrature. int)
No-op (blink
6:
a sep., insertion point).
"in.box-solid..vertline..quadrature.
int"
right of a
prov.
sep.
______________________________________
As an alternative to the third example (starting position=3) in Table 7, the keystroke executive 130 could place the insertion point in position "6" after the user types a space at position "3". This outcome would reflect a belief that the user is typing the space as the first step in adding a new token between the tokens "in" and "int". The third example in Table 7 reflects a different belief, that the user expects the insertion point to progress past the space just typed. Each of these implementation alternatives are motivated by a desire to provide program editing behavior that users accustomed to text editors would consider normal. Other alternatives consistent with the design of the editor 122 could also be proposed. d. String Literals The present editor 122 treats strings as tokens of the lexical class "string literal" which are included in the token stream 158. However, none of the preceding rules regarding token formation and token stream updating are applicable to the editing of string literals. This is because string literals do not have to observe the same lexical rules as program statement, apart from using the appropriate begin string and end string delimiters, such as the double quotes (") required in C and C++. For example, whereas there is a general rule in the editor 122 that no spaces are allowed in the token stream 158, string tokens may contain spaces. Thus, the keystroke executive 130 handles strings exceptionally, by forming a single token of class "string literal" containing all of the keystrokes, including characters and spaces, entered by the user, and storing that token in the token stream 158 as any other token would be stored. The editor 122 also provides a lexical class of "incomplete string literal", to allow the proper treatment and typographical representation of strings as they are being entered or edited. For example, consider the normal method by which a new string token is created in the editor 122. First, the user types a double quote (") to start the string. The keystroke executive 130 then passes the double quote and a token stream segment 138 to the tokenizer 132, which recognizes the double quote as belonging to the lexical class of "incomplete string literal" and suggests that the keystroke executive insert a new token (") of class "incomplete string literal" into the token stream 158. Assuming that the user continues to type string characters after the opening double quote, the keystroke executive 130 does not call the tokenizer to analyze these subsequent keystrokes, but simply appends them to the same incomplete string literal. In other words, when the user is editing a string, the editor 122 behaves like a normal text editor. When the user ends the string by typing the closing double quote the keystroke executive 130 appends that final character to the string token and changes the lexical class of the token to "string literal." This process can reversed by the user typing backwards starting with the closing double quote. However, users often depart from this simple method of entering a string literal. For example, a user wishing to turn part of a line of code into a string might first type the entire line, then move the cursor back to a point in the middle of the line and enter the opening double quote, perhaps followed by some additional characters, then move the cursor to the end of the line and type the closing double quote. The following series of statements illustrate such an editing sequence, the cursors position after each step being indicated by the ".linevert split." character: 1. a=b+c*d.linevert split. 2. a=".linevert split.b+c*d 3. a="e.linevert split.b+c*d 4. a="e/.linevert split.b+c*d 5. a="e/b+c*d" At step 1, the keystroke executive 130 and the tokenizer 132 will have generated a token stream 158 including 7 tokens (a, =, b, +, c, *, and d), all separated by implicit token boundaries. At step 2, the keystroke executive 130 inserts a new, incomplete string literal, including only the double quote, between the equals and the "b" tokens. In steps 3 and 4, the keystroke executive appends the two additional characters, "e" and "/" to the incomplete string literal, so that the token string now includes 8 tokens (a, =, "e/, b, +, c, *, and d). So far, the actions of the keystroke executive 130 and tokenizer 132 in response to these first four steps are no different from those described for the normal case; however, in step 5, when the user types the closing double quote, the tokenizer 132, which only analyzes a small portion of the token stream, for example, the final "d", is not aware of the opening quote some 6 tokens. Thus, instead of including all of the tokens from "+" to "d" in the string literal beginning "e/, as intended by the user, the tokenizer 132 suggests that the keystroke executive 130 add another new, incomplete string literal to the token stream 158. However, as the keystroke executive 130 is aware of contextual information not provided to the tokenizer 132, e.g., the fact that the line on which the user is typing includes another incomplete string literal, the keystroke executive 130 will reject the tokenizer's suggestion, and treat the second double quote as the closing double quote of the first incomplete string literal, meaning the updated token stream 158 will include three tokens, (a, =, and "e/b+c*d"). Note that the keystroke executive 130 can perform the same kind of on-the-fly, contextual matching for other bracketing characters such as parentheses, square brackets and curly braces. In an alternative preferred embodiment of the present editor 122, a second string literal management model is provided that does not permit unmatched string literal delimiters to be entered by users. In this model, whenever the user types an unmatched string delimiter, the keystroke editor 130 causes a complete string literal object to be created, including the closing quote, which is displayed by the typographical display processor 170 in a distinctive manner, such as within a box with visible boundaries. As long as the user keeps the cursor and insertion point within the string literal borders, the keystroke executive 130 simply passes all user keystrokes to the appropriate string literal object. When the user types the ending quote symbol, the keystroke executive 130 moves the insertion point across the closing quote. Once a string literal object has been created, the user is not allowed to delete either of the quotes until all of the characters in the string are deleted. Thus, there are no unmatched string literals in this second string management model. Unlike the first model, discussed above, there is also no way for strings to become program tokens and vice versa as a side-effect of normal editing actions. This concludes the detailed description of the lexical analysis performed by the keystroke executive 130 and/or the tokenizer 132. We will now proceed to a discussion of the other functional block included within the keystroke executive 130, the comment processor 136. COMMENT PROCESSOR 136 The comment processor 136 processes program comments entered by a user. Generally, a user has wide latitude as to where the comments are placed and what purpose they serve. For example, a comment might be a line of typographical characters used to separate two blocks of code, a text paragraph explaining the function of the preceding or following block of code, or a brief description of a variable being declared on the same line. Referring to FIG. 4, there is shown a block diagram of the editor 122 that emphasizes the functional blocks and data structures related to the comment processor 136. The elements in FIG. 4 not shown in prior figures include comment list elements 160b-1, 160b-2. As described in reference to FIG. 2, the comment processor 136, which is called by the keystroke executive 130, is responsible for managing the comment list 160b. In the preferred embodiment of the present editor 122, this comment list is implemented as a C++ object that inherits certain of its properties from the Annotation List 160b, which is also implemented as a C++ object. The comment list 160b comprises a list of comment elements 160b-i, where i is an integer between 0 to N. Each of these comment elements is an instance of the C++ class "CLComment", which is a language specific type (i.e., its attributes are specific to a specific language such as C++) that inherits from the language independent class "Annotation", which defines the general attributes of annotations, such as comments. This structure, where the comments are completely separated from the program is an important element of the editor 122. This is because, only by splitting the program representation in this way can the language comments be separated from the program tokens, allowing the keystroke executive 130 and the tokenizer 132 to analyze the progam tokens on the fly. As is shown in FIG. 4, the attributes associated with the class `ClComment" and with each comment list element 160b-i include a comment identifier 182a, the anchor position 182b of a comment within the token list 158, lexical.sub.-- position 182d, vertical.sub.-- space 182e and a text pointer (text.ptr) 182f, which addresses one or more strings containing the text of the comment as entered by the user. These attributes will be described below. 1. Method of Operation--Comment Processor 136 The keystroke executive 130 initially calls the comment processor 136 whenever the insertion point 157 is between tokens and the user types a comment beginning delimiter, such as "/*", used in C, or "//", used in C++. This process was described above in the context of the discussion of the keystroke executive's lexical analysis process. The comment processor 136 then records every keystroke of the comment until the user types the comment ending delimiter (e.g, "*/") or the keystroke executive 130 ceases to pass on the keystrokes from the event stream 113, such as when the user moves the cursor outside the comment boundary. While a comment is being entered, the keystroke-by-keystroke lexical analysis described above is disabled. This is because the content of a comment is not subject to lexical rules like regular program statements. In addition to recording the text of each comment, the comment processor 136 also records a small amount of textual positioning information about each comment. This information is necessary so that the comment can be correctly positioned by the typographical display 170 processor relative to the text (tokens) with which the comment was originally associated. By storing this information outside the program syntax tree 162, two problems associated with comment storage in prior art tree-based structure editors are avoided. First, by associating the comments with the token stream 158, which is not affected by program compilation, it is easy to keep comments correctly positioned in the structured program representation 156, and therefore the program display 118, from one compile to the next. This is not possible in the prior art editors, which attach the comments to specific nodes of the syntax tree 162, which is modified by the compiler/parser. Second, given that the present editor 122 maintains the token stream 158 on the fly, it always possible to associate the comments with the structured representation 156, even when the syntax tree 1 62 has not been generated or when there is no structural analyzer 150 at all. In contrast, the prior art structural editors cannot associate comments with the syntax tree for parts of the program containing syntax errors preventing the structural representation of those parts. Finally, this representation which stores minimal comment information is far more compact than structural editors which solve the problem of locating comments tied to a syntax tree by storing with those comments copious amounts of information allowing the comments original textual position to be recovered. First, as was mentioned above, the comment processor 136 records the complete text of each comment. This text is stored in a form of representation that is suitable for textual editing. For example, the preferred embodiment of the present invention stores comment texts as strings 184-i that can be accessed by a text editor. In an alternative embodiment of the editor 122, comments are stored as word processor documents, which can then be edited as embedded documents within the program 124. Second, the comment processor 136 records the exact position of each comment with respect to the lexemes/tokens of the token stream, or flow, 158 in which the comment was entered. In the preferred embodiment of the editor 122, this information is represented as the variable "position" 182b, within a comment list element 160b, where the position 182b is set by the comment processor 136 to the integer position of the token in the token stream 158 immediately to the right of the comment. This encoding implies the following about the representation of comments by the comment processor 136: 1) there may be any number of comments at a single token position; 2) the ordering of comments at a single token position must be recorded externally; e.g., by order of appearance in the comment list 160b; 3) there can be any number of comments at position 0 (before all language tokens); and 4) there can be any number of comments at position N (after all language tokens). Third, the comment processor 136 stores information regarding the positioning of the comment with respect to adjacent lines of the program 124 as it was entered by the program author. This type of information is represented in terms that correspond to the ways in which authors typically enter comments; e.g., whether the comment was embedded in a line of text or on a separate line, and how many blank lines the author entered around the comment. The comment processor 122 stores this information in one of two ways: symbolically or in terms of adjacent line breaks. The symbolic storage technique is employed by the preferred embodiment of the present invention and is the technique illustrated in FIG. 4. This symbolic information consists of two enumerated variables, "lexical.sub.-- position" 182d and "vertical.sub.-- space" 182e. The first variable, lexical.sub.-- position 182d, reflects the comment's position with respect to the nearest language tokens and is selected from the set of values that includes: "Line", "Trailing" and "Embedded", where "Line" indicates that the comment is on a line or lines by itself; "Trailing" means that language tokens appear only to the left of the comment and on the same line, and "Embedded" indicates that language tokens are to the right of the comment and on the same line. The second variable, vertical.sub.-- space 182e, is related to the existence of vertical space above and/or below the comment and has a value that can be selected from: "noVerticalSpace", "beforeOnly", "afteronly" and "beforeAfter". These values have the following meanings: "noVerticalSpace" means that there are no blank lines above or below the comment; "beforeOnly" and "afteronly" indicate that there are blank lines only preceding/following the comment; and "beforeAfter" indicates that there are blank lines both before and after the comment. These two enumerated variables 182d and 182e capture the placement of comments in an way that will prove most useful to the prettyprinter 170. To generate this information, the comment processor 136 accesses and analyzes the parts of the token stream 158 adjacent to the comment's anchor spot (i.e., stream position 182b). The second storage technique, used in an alternative embodiment of the present invention, is simpler than the first. The information stored in this second technique is actually computed in the same manner as the information generated in the first storage technique, but is stored without being abstracted. This information consists of two integer | ||||||
