Brazilian Portuguese grammar checker
5870700

Abstract

A process for checking the Brazilian Portuguese grammar of original text provided by a user. For each sentence, a token is generated for each word in the sentence. For each word, an orthographic dictionary is searched to find possible variations in spelling, meaning and grammatical classification. Each token is expanded to include the variations in spelling, meaning and grammatical classification of its corresponding word. Each token is submitted to a heuristic process that excludes any variation incompatible with the token's position within the sentence. Pre-analysis rules are applied to each token to identify select pre-analysis errors in grammar, and to determine positionally correct alternatives for them, based on the position of each word within the sentence. An analysis tree, containing all of the tokens for the sentence and recording the syntactic classification of the tokens and groups of tokens, is generated for the sentence. A set of grammar rules having terminal elements, special predicates, and non-terminal elements is generated as well. The set of grammar rules is applied to each token of the analysis tree by following analysis paths, including a so-called low-cost analysis path, in order to find cost errors in the analysis tree. It is determined which of the analysis paths is the low-cost analysis path, that is, the path with the fewest cost errors for the sentence. Any cost errors found in the analysis tree via the low-cost analysis path are reported to the user.

Description
______________________________________
Sample text:
    Pedro, o meu vizinho, nao esta em casa. Ele foi ao jogo de futebol.
Analysis:
    Pedro      WORD
    ,          PUNCTUATION
    o          WORD
    meu        WORD
    vizinho    WORD
    ,          PUNCTUATION
    nao        WORD
    esta       WORD
    em         WORD
    casa       WORD
    .          PUNCTUATION (actually, the `.` is consumed, and only the EOL is kept)
               END-OF-LINE
______________________________________
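The token list shown in the table above can be reproduced with a short Python sketch. This is only an illustration under stated assumptions: the function name `tokenize_sentence`, the regular expression, and the `(text, kind)` tuple layout are hypothetical and are not taken from the patent. Hyphenated forms and numbers are not given any special treatment here.

```python
import re

def tokenize_sentence(sentence):
    """Split one sentence into (text, kind) pairs, ending with END-OF-LINE.

    Assumption: a WORD is a run of word characters and any other
    non-space character is PUNCTUATION.
    """
    body = sentence.strip()
    # As in the sample analysis, the final period is consumed
    # and only the END-OF-LINE token is kept.
    if body.endswith("."):
        body = body[:-1]
    tokens = []
    for piece in re.findall(r"\w+|[^\w\s]", body):
        kind = "WORD" if re.fullmatch(r"\w+", piece) else "PUNCTUATION"
        tokens.append((piece, kind))
    tokens.append(("", "END-OF-LINE"))
    return tokens

if __name__ == "__main__":
    for text, kind in tokenize_sentence("Pedro, o meu vizinho, nao esta em casa."):
        print(f"{text:10s}{kind}")
```

Each later stage (dictionary lookup, pre-analysis, grammatical analysis) would then attach further information to these tokens.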
During the stage of orthographic analysis, each WORD included in the TOKEN list is looked up in the ORTHOGRAPHIC DICTIONARY file. This dictionary stores words in the following forms: radical, inflection ending, grammar information, and accent information. "Radicals" are stored without any accents so that all imperfect homographs corresponding to the word in the text can be found and used in the analysis. The following events are identified at this stage: imperfect homographs (different accent), accentuation errors, and unlisted words. Any such errors are reported to the error diagnosis module and then transmitted to the user interface module. After the user chooses the measure to be taken (i.e., correct or ignore the reported error), the action is recorded and the process continues with the next word in the list. The TOKEN corresponding to each WORD found in the dictionary is expanded with a list of possible meanings, each including information on grammar class and properties (plural or singular, gender, person, mood, etc.).

Here is an example of the orthographic analysis:

Sample text: Pedro, o meu vizinho, nao esta em casa. Ele foi ao jogo de futebol.

Analysis:
esta WORD
    esta [this]: demonstrative pronoun, singular, feminine
    esta [is]: linking verb, third person, singular, indicative mood
casa WORD
    casa [home]: noun, feminine, singular
    casa [marries]: direct and indirect transitive verb, third person, singular, indicative
    casa [be married]: direct and indirect transitive verb, second person, singular, imperative

Upon generation of the list of expanded TOKENS, each TOKEN is subject to preliminary processing. A heuristic process excludes any meanings that are incompatible with each TOKEN, based upon its placement within the sentence and the classes of the surrounding words. This elimination of incompatible classes limits the search and saves time.
At this stage, combinations and contractions are identified, as are adverbial phrases, dates, special uses of the accent grave (crasis), errors in use of the verb `haver`, use of `meia` and `muita` in lieu of paronymous adverbs, `mim` versus `eu`, and others.

Here is an example of preliminary processing of the TOKEN list:

Sample text: Vende-se uma casa.

Analysis: The particle `se` can generally be any of the following: objective case pronoun, subordinating conjunction, or adverbial conjunction. However, the preceding word in this case ends in a hyphen, making it impossible for the `se` to be a conjunction.

Next, the list of expanded TOKENS is submitted to a process of grammatical analysis by means of a "generative grammar" that has been modified by special predicates for grammatical events. At this stage, the following "cost errors" are found: verb and noun agreement errors, improper homographs, excess or missing commas, and improper use of pronouns (i.e., nominative versus objective case). These errors are counted continuously without interrupting the analysis. During the analysis process, an analysis tree is generated to record the syntactic classification of the various TOKENS and groups identified. Here is an example of an analysis tree:
______________________________________
Sample tree:
    sentence
        subject
            noun phrase
                article              (O)
                noun                 (gato)
                adjective            (preto)
        predicate
            verbal predicate
                verbal form
                    adverb               (nao)
                    intransitive verb    (morreu)
                direct object
                    EOL
Sample text:
    O gato preto nao morreu
______________________________________
If the analysis of a sentence succeeds with no error recorded, the sentence is deemed correct, and its grammatical analysis is complete. The client process of the analyzer is responsible for preparing and submitting a new sentence for analysis. In the case of a successful analysis with a non-zero error count, the analysis tree with the lowest cost is kept, and the analysis continues until all rules are exhausted or an analysis tree without error is found. At the end of the analysis, if errors are recorded, these are forwarded to the diagnosis and suggestions module. According to the error, the module creates suggestions and forwards the error and suggestion list to the user interface module.

Some steps of error processing occur after the construction of the analysis tree. To deal with errors concerning misplacement of atonic pronouns in the objective case, attracting words, such as negative adverbs or pronouns, are considered. If an oblique pronoun follows a verb, which in turn follows one such attracting word, a suggestion is made to change the placement of the pronoun. Two other situations may demand the placement of an oblique (objective) pronoun after the verb: verbs in the infinitive mood and the beginning of sentences.

Comma misplacements are detected by using terminal symbols that accept a null TOKEN. Upon construction of the tree, such terminals are identified, generating errors whenever they do not correspond to an expected event. For example, the grammar permits two types of such terminals: NO-COMMA and RECOMMENDED-COMMA. If, within the text, a comma corresponds to the terminal NO-COMMA, or there is no comma corresponding to the terminal RECOMMENDED-COMMA, an error is indicated.

A similar method is used for the various forms of `porque` (por que, porque, and porquê, meaning roughly: why, because, and the reason why, respectively). All TOKENS that fit any of the forms of this word receive all three classifications.
At the end of the analysis, if the expected form for the TOKEN is not found, an error is indicated.

Almost all errors generate suggestions to be presented to the user. Most suggestions are generated as the errors are found, when all the elements required are available. The most complex process is that of generating suggestions for agreement errors, in which case it is necessary to identify which words require inflection and which may remain unchanged. Here is an example of the grammatical analysis for agreement errors:

Sample text: O homem de cabelos grisalhos é engraçado.

Suggestions:
O homem de cabelos grisalhos é engraçado.
Os homens de cabelos grisalhos são engraçados.
A mulher de cabelos grisalhos é engraçada.
As mulheres de cabelos grisalhos são engraçadas.

Note that the phrase "de cabelos grisalhos" remains unaltered in all suggestions.

B. DETAILS OF GRAMMATICAL ANALYSIS

The grammar rules used in the analysis define the nonterminal elements as a function of terminals and other nonterminal elements. Grammar rules are expressed as follows:
______________________________________
prefix                 : left side                  | right side
______________________________________
(term to be defined)     (rule representing term)     (alternate definitions)
______________________________________
Not every rule contains a `right side` in its expression. In the expressions of the rules, terminal elements are written in capital letters, while nonterminal elements are written in lower case. Here is an example:

subject: PERSONAL_PRONOUN | noun_phrase

`subject` is the term being defined. `PERSONAL_PRONOUN` is a terminal. The two clauses defining `subject` are separated by `|`.

The evaluation of terminal elements is embedded in the grammatical analysis mechanism; terminals need not (and should not) be defined as a function of other elements. Terminals such as NOUN, ADJECTIVE, ADVERB, etc. represent the grammatical class of a word. In addition, terminals are provided to identify subclasses (PERSONAL_PRONOUN, INTENSITY_ADVERB, NEGATIVE_ADVERB, etc.) or characteristics (COMPARATIVE, MORE_THAN, LESS_THAN, etc.). Additional terminals are provided which act as controls, allowing a certain degree of programming.

Each grammatical line may contain special predicates controlling the agreement rules. Additional special predicates may be included among the objectives of a rule to allow for rule analysis control. The following table shows a list of special predicates with an explanation of the usefulness of each:

Here is a simplified example:

sentence: subject predicate CVN(1,2)

Interpretation: A sentence is made up of a subject and a predicate. The subject and predicate must agree in gender, number and person.

Note that each of the non-terminal elements "subject" and "predicate" must also be defined for the above definition of sentence to be complete. Therefore, the definition continues as follows:

subject: PERSONAL_PRON CVN(1) | ARTICLE NOUN CN(1,2) PES(3)

Interpretation: A subject is made up of a personal pronoun, assuming its gender, number and person. A subject may also be made up of an article followed by a noun. These two elements must agree in gender and number.
The gender and number of the noun must be transmitted to the subject, as well as the information that it is a third person.

predicate: VERB complement CV(1) GEN(0)

Interpretation: A predicate is made up of a verb followed by a complement. The number and person of the verb must be transmitted to the predicate. The predicate accepts either gender (masculine or feminine).

The grammatical analysis process is carried out by an interpreter program installed in the memory of the computer and adapted to interpret grammar rules with backtracking and cutting resources. Within a grammar rule, a non-terminal symbol causes the analysis state to be piled (saved on a stack) and a search to begin for the rules that define the non-terminal to be expanded. Each terminal symbol encountered in a grammar rule is checked against the current token in the token list, looking for a match of attributes. If the token and the symbol are compatible, both are accepted and the interpreter advances to the next symbol of the grammar rule and to the next token of the list. When the match fails, the interpreter steps back one symbol in the grammar rule and one token in the list, searching for the next possible match (backtracking). If an attempt is made to step back past the beginning of a rule, the rule fails, triggering the search for a new rule for the non-terminal symbol. When the interpreter reaches the end of a rule without faults, the non-terminal symbol that the rule defines has been identified successfully. The analysis state is unpiled and the non-terminal symbol of the previous rule is accepted. If all the rules for a non-terminal symbol are tested without a match being found, the identification of the symbol has failed. The analysis state is unpiled and the previous rule fails, triggering the search for a new rule.

The CUT symbol controls the backtracking mechanism by marking a point in the rule that, if reached during backtracking, causes the rule to fail without stepping back to the beginning of the rule.
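The rule interpretation just described (expand a non-terminal, match terminals against tokens, fall back to the next subrule on failure) can be illustrated with a minimal Python sketch. It is not the patented interpreter: the `GRAMMAR` table, the definition of `complement`, and the function names are assumptions made for illustration, and the CUT symbol, the special predicates and the cost accounting are all left out. Python generators stand in for the explicit pile of analysis states.

```python
# Hypothetical rule table: lower case = non-terminal, upper case = terminal.
GRAMMAR = {
    "sentence": [["subject", "predicate"]],
    "subject": [["PERSONAL_PRONOUN"], ["ARTICLE", "NOUN"]],
    "predicate": [["VERB", "complement"]],
    "complement": [["ARTICLE", "NOUN"], []],  # [] is a blank subrule
}

def parse(symbol, tokens, pos):
    """Yield every token position reachable after matching `symbol` at `pos`.

    `tokens` is a list of sets; each set holds the grammatical classes
    that the dictionary lookup left open for that word.
    """
    if symbol.isupper():  # terminal: compare with the current token
        if pos < len(tokens) and symbol in tokens[pos]:
            yield pos + 1
        return
    for subrule in GRAMMAR[symbol]:  # non-terminal: try each subrule in turn
        yield from match_sequence(subrule, tokens, pos)

def match_sequence(symbols, tokens, pos):
    """Match the symbols of one subrule left to right, backtracking on failure."""
    if not symbols:
        yield pos
        return
    for next_pos in parse(symbols[0], tokens, pos):
        yield from match_sequence(symbols[1:], tokens, next_pos)

def accepts(tokens):
    """True if some analysis path consumes the whole token list."""
    return any(end == len(tokens) for end in parse("sentence", tokens, 0))

if __name__ == "__main__":
    # "Ele comprou a casa": the last two words keep two classes each.
    tokens = [{"PERSONAL_PRONOUN"}, {"VERB"},
              {"ARTICLE", "PREPOSITION"}, {"NOUN", "VERB"}]
    print(accepts(tokens))
```

When a later symbol fails, the enclosing `for` loop simply asks the generator for its next result, which plays the role of stepping back one symbol and one token.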
When the terminal symbols match the tokens in the list, some parameters are copied from the tokens to the analysis tree. In this way, gender, number, grammatical order and other parameters of the non-terminal symbols are defined. According to the rules defined by special predicates (e.g., CN(1,2), which tests noun agreement), some of these parameters are propagated through the tree and tested against the parameters of other symbols.

Special predicates for processing grammatical events have been added to the basic mechanism. For example, the CN(i,j) predicate tests noun agreement (gender and number) between the terms of order i and j in the rule under analysis.

In analyzing the grammar of a sentence, some errors do not entail a rule fault, but instead increase the "cost" stored in the analysis structure. Spelling errors, agreement errors, comma misplacement, and pronoun misplacement each have a cost. Under the analysis process, a sentence is deemed correct whenever an analysis is successfully completed (no "faults") and the accrued cost for errors is zero. If cost has accrued, the analysis tree and its associated cost are saved (provided the final cost is lower than the stored cost), and the analysis continues as if a fault had occurred (this is done by the EOL predicate). Any errors detected during this stage of the grammatical analysis are reported at its end.

The tree resulting from the analysis is the raw material for the suggestion maker. Several routines perform a post-analysis of the tree and point out errors and suggestions.

Analysis involving a "generative grammar" can result in exponentially increasing complexity. In order to speed up the analysis, several techniques are employed. Once the list of expanded TOKENS (with accompanying grammar information) is generated, and prior to the grammatical analysis, a heuristic process eliminates any meanings that are incompatible with each TOKEN, based upon the placement of each TOKEN within the sentence and the classes of the surrounding words.
The elimination of incompatible classes limits the search and saves time. Here is an example:

Sample text: Vende-se uma casa.

Analysis: In general, the particle `se` can be any of the following: objective case pronoun, subordinating conjunction, or adverbial conjunction. However, in this case, the preceding word ends in a hyphen, making it impossible for the `se` to be a conjunction.

Other specific events are also dealt with prior to the grammatical analysis. These specific "pre-analysis errors", as well as a summary of the rules for simplifying their analysis, are discussed in the following paragraphs.

The terms "no," "na," "nos," or "nas" can signal either a combination of the preposition "em" plus a definite article or an oblique pronoun. If the preceding word does not end with a hyphen, the particles "no", "na" and "nas" cannot be oblique pronouns (objective case).

"O", "a", "os", or "as" can be any of the following: an article, an oblique pronoun, or a demonstrative pronoun. If neither the preceding nor the following word is a verb, "o", "a", "os" and "as" cannot be oblique pronouns (objective case).

Further, "a" can be a preposition, an article, an oblique pronoun, or a demonstrative pronoun. If the preceding word ends with a hyphen, or if "a" is the last word of a sentence, "a" cannot be a preposition. If the following word is a demonstrative or oblique pronoun, "a" is a preposition. If the following word is an infinitive verb, "a" is a preposition or an oblique pronoun.

"Um" or "uma" can be an article, a numeral, or an indefinite pronoun. Before an impersonal infinitive, "uma" and "um" are not articles. Whenever the following word is not an adjective, noun, indefinite adjective pronoun, or infinitive, the current word is not an article.

Some nouns are identical to verbal forms. For example, "canto" can be either a noun or the indicative of the verb "cantar."
Whenever the preceding word is a definite article but not an oblique pronoun, or if it is an indefinite article, possessive pronoun, or adjective demonstrative pronoun, the current word is not a verb.

Any occurrence of crasis before masculine or infinitive words is incorrect.

Repeated words within a sentence are shown, and the user is given the option of removing duplicates.

When several proper names occur sequentially in the list of tokens, without any other token between them, the analyzer assumes that the whole sequence designates the same individual and converts the sequence into a single token with the attributes inherent to a proper name. Examples:

Fernando Henrique Cardoso foi eleito presidente. (Fernando Henrique Cardoso was elected President.)
A costa do Estado do Rio de Janeiro e muito bonita. (The State of Rio de Janeiro seashore is very beautiful.)

The same agglutination mechanism identifies words composed by hyphenation, converting them into a single token with noun attributes. Examples:

Hoje eu nao esquecerei o guarda-chuva. (I will not forget the umbrella today.)
O pao-de-lo que Maria fez esta delicioso. (The cake Mary made is delicious.)

Several measures can be employed to reduce the size and complexity of the sentence being analyzed. A table of adverbial and prepositional idioms (a medida que, a despeito de, etc.) is created. Such idioms are identified in a sentence by comparison with the entries in the table. Once identified, they are removed from the text and replaced with a dummy word of the proper grammatical class (adverb or preposition).

Date expressions are identified by their standard format (e.g., 4 de maio de 1995: number "de" month "de" year). Once identified, they are removed from the text and replaced with a generic adverb.

Inessential wording such as "por exemplo" [for instance] and "isto e" [to wit] is eliminated to reduce the size and complexity of the sentence under analysis.
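The three simplification measures above (idiom table, date recognition, removal of inessential wording) can be sketched with regular expressions. This is a hedged illustration only: the placeholder names `ADV_DATE` and `PREP_IDIOM`, the single-entry idiom table, the unaccented month list and the function `simplify` are assumptions, not details given in the patent.

```python
import re

# Assumption: months are written without accents, as in the sample texts.
MONTHS = ("janeiro|fevereiro|marco|abril|maio|junho|julho|"
          "agosto|setembro|outubro|novembro|dezembro")
# number "de" month "de" year, e.g. "4 de maio de 1995"
DATE_RE = re.compile(rf"\b\d{{1,2}} de (?:{MONTHS}) de \d{{4}}\b", re.IGNORECASE)

# Hypothetical idiom table: idiom -> dummy word standing for its class.
IDIOMS = {"a despeito de": "PREP_IDIOM"}

# Inessential wording that is dropped outright.
FILLERS = ["por exemplo", "isto e"]

def simplify(sentence):
    """Return the sentence with dates, idioms and fillers replaced or removed."""
    text = DATE_RE.sub("ADV_DATE", sentence)  # a date becomes a generic adverb
    for idiom, dummy in IDIOMS.items():
        text = re.sub(rf"\b{re.escape(idiom)}\b", dummy, text, flags=re.IGNORECASE)
    for filler in FILLERS:
        # Also swallow the commas that usually set the filler apart.
        text = re.sub(rf",?\s*\b{re.escape(filler)}\b\s*,?", "", text,
                      flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

if __name__ == "__main__":
    print(simplify("Ele saiu em 4 de maio de 1995 a despeito de tudo."))
    print(simplify("Maria gosta, por exemplo, de futebol."))
```

A production table would need to guard against false matches (for example "isto e aquilo"), which this sketch does not attempt.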
For some words with paronymous equivalents in different grammatical classes (mau vs. mal; mas vs. mais), the paronymous form of the word is included in the list of word meanings prior to the grammatical analysis. At the conclusion of the analysis, the routine for pinpointing spelling errors detects whether the word should have been written differently.

Whenever "para mim" followed by an impersonal infinitive is detected, whether or not preceded by a negative adverb, "para eu" is suggested as an alternative.

In a case such as "A porta esta meia aberta," the analysis detects any use of "meia" when the following word is a participle or adjective and the preceding word is not an article, possessive pronoun, or adjective demonstrative pronoun. The adverb "meio" is suggested as the correction.

Situations are identified in which the word "muita" (a series of) is used in place of the intensity adverb "muito" (much). The error is pointed out and the correct word is suggested. In a case such as "Eu tenho muita pouca paciencia", the analysis detects the use of the word "muita", an indefinite pronoun, instead of the intensity adverb "muito."

Each grammar symbol represents a set of subrules, all with the same preceding term, united by the "|" symbol, which represents an "OR" connection. During the grammatical analysis, the first subrule is tried first. If it fails, the next subrule is tried, and so on. The price, in terms of elapsed time, of trying inadequate subrules can be high. The analysis mechanism may have to go far, through a succession of failures, before the subrule is deemed inapplicable to the sentence. Therefore, whether and when a subrule is tried can have great impact on the speed of the analysis.

In order to optimize the speed of analysis, the basic table of rules is pre-processed to determine, for each subrule, a set of properties called FIRST. FIRST is the set of all properties that a TOKEN may possess in order for that subrule to be feasible.
Upon definition of the set FIRST, a mask of bits mapping the presence of each property in the set is built and added to the subrule. Each TOKEN has a map of properties with the same format as the map of the subrules. Before processing a subrule, the grammatical analysis mechanism checks whether the current TOKEN satisfies any of the properties of FIRST. A simple logical AND operation is sufficient to determine whether the subrule should be tried. If the result is zero, the analysis proceeds directly to the next subrule. The elimination of fruitless rules greatly improves the speed of analysis. Here is an example:

subject: PERSONAL_PRONOUN | article adjective NOUN
article: DEFINITE_ARTICLE | INDEFINITE_ARTICLE | ""
adjective: ADJECTIVE | ""
(Note: "" denotes a blank subrule)

The FIRST for the first subrule for "subject" is {PERSONAL_PRONOUN}, and the FIRST for the second subrule is {DEFINITE_ARTICLE, INDEFINITE_ARTICLE, ADJECTIVE, NOUN}. The occurrence of ADJECTIVE is due to the possibility of "article" being blank. The occurrence of NOUN is due to the possibility of "adjective" being blank.

A hypothetical TOKEN with properties (ADVERB, INTENSITY_ADVERB, INDEFINITE_PRONOUN, ADJECTIVE) would inhibit trying the first subrule for subject, but would activate the trial of the second subrule. This occurs because the intersection of FIRST for the first subrule with the classes of the TOKEN is empty, whereas the intersection of FIRST for the second subrule with the classes of the TOKEN is not empty: (ADJECTIVE).

Each term of a subrule causes the interpreter to go forward if there is a match between the grammar and the sequence of TOKENS being considered. Otherwise, backtracking occurs. This is true only for normal predicates, that is, those referring to properties of TOKENS.
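The FIRST pre-processing and the bit-mask test from the example above can be worked through in a few lines of Python. The grammar is the three-rule example from the text; the function names, the property ordering and the bit assignment are assumptions made for this sketch, which also assumes the grammar has no left recursion.

```python
# Example grammar from the text: lower case = non-terminal, [] = blank subrule.
GRAMMAR = {
    "subject": [["PERSONAL_PRONOUN"], ["article", "adjective", "NOUN"]],
    "article": [["DEFINITE_ARTICLE"], ["INDEFINITE_ARTICLE"], []],
    "adjective": [["ADJECTIVE"], []],
}

def first_of_sequence(symbols):
    """Return (FIRST set of the symbol sequence, True if it may be blank)."""
    first = set()
    for sym in symbols:
        if sym.isupper():  # a terminal always ends the scan
            first.add(sym)
            return first, False
        sub_first, nullable = first_of_symbol(sym)
        first |= sub_first
        if not nullable:  # this non-terminal must consume a token
            return first, False
    return first, True

def first_of_symbol(nonterminal):
    """Union of FIRST over all subrules of a non-terminal."""
    first, nullable = set(), False
    for subrule in GRAMMAR[nonterminal]:
        sub_first, sub_nullable = first_of_sequence(subrule)
        first |= sub_first
        nullable = nullable or sub_nullable
    return first, nullable

# Assumed bit assignment: one bit per property, in this arbitrary order.
PROPERTIES = ["PERSONAL_PRONOUN", "DEFINITE_ARTICLE", "INDEFINITE_ARTICLE",
              "ADJECTIVE", "NOUN", "ADVERB", "INTENSITY_ADVERB",
              "INDEFINITE_PRONOUN"]
BIT = {name: 1 << i for i, name in enumerate(PROPERTIES)}

def to_mask(properties):
    """Pack a collection of property names into an integer bit mask."""
    mask = 0
    for name in properties:
        mask |= BIT[name]
    return mask

SUBRULE_MASKS = [to_mask(first_of_sequence(sub)[0]) for sub in GRAMMAR["subject"]]

def worth_trying(token_properties, subrule_index):
    """A single AND decides whether the subrule can start with this token."""
    return (to_mask(token_properties) & SUBRULE_MASKS[subrule_index]) != 0

if __name__ == "__main__":
    token = ["ADVERB", "INTENSITY_ADVERB", "INDEFINITE_PRONOUN", "ADJECTIVE"]
    print(worth_trying(token, 0), worth_trying(token, 1))
```

Because the masks are computed once when the rule table is loaded, the per-token check during analysis costs only one integer AND per subrule.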
When the predicates for verbal and noun agreement detect an error, they add to the agreement error counter and yield a return message of success. This is because the path being followed may not be the correct path. Any errors found may be due to bad path selection. The cost is a measure of how mistaken the path is. In this case, the analysis tree and the cost of the analysis, which grows with the number of errors found, are saved in a result buffer. The EOL predicate yields a return message of FAULT so that the analysis mechanism is deceived into choosing another path. The purpose is to find less expensive paths (with fewer errors), which are intended to reflect a more faithful interpretation of the phrase. During this subsequent analysis, a path is added to storage only if its cost is lower than that of the paths already stored.

There is no sense in proceeding with a path whose accrued cost is equal to or higher than the cost of a previously saved path. In order to speed up the analysis, whenever the accrued cost equals or exceeds the cost of a previously saved path, all predicates yield FAULT return messages in order to trigger backtracking of the analysis to a point of lower accrued cost.

The invention operates under the presumption that the text is intended to be correct. Therefore, when an analysis path encounters a great many errors (more than a predetermined threshold), the path is deemed incorrect. At that point, all predicates start yielding FAULT return messages in order to trigger backtracking of the analysis to a point where the accrued cost is below the limit.

The interpreter keeps each decision point in a stack, in order to be able to backtrack to that point if later symbols or predicates fail. Besides decision points, the present invention also keeps the points where spelling and agreement errors are found.
Thus, when the grammar checker finds errors in excess of the threshold or of the previously saved cost, it may backtrack directly to the point where the last error was found.

The CUT predicate inhibits the re-evaluation of predicates within a subrule, returning a FAULT if any backtracking is needed to its left. The CUTR predicate is similar to CUT: it inhibits the re-evaluation of anything to its left, including all the other subrules of the rule. For example, if the first subrule for predicate has been satisfied (a valid linking verb and predicative were found) and a FAULT occurs later on, the second subrule will not be analyzed, saving time.

At the end of a successful analysis in which the cost is not zero, the analysis tree is searched for terminal elements whose spelling differs from the one originally supplied. This stage detects accent errors and paronymous words. Here is an example:

homem publico vs. home publico

The suggestion is direct, and the expected form (homem publico) is presented.

Agreement errors are checked as follows. During the analysis, the CN, CV, and CVN predicates check the agreement of the elements of the subrule specified by their arguments. If an error is found, an additional value is added to the path cost and the analysis goes on. Agreement is checked as shown in this example:

term: article noun CN(1,2)

GEN and NUM are initialized with values that accept every gender and number. For each CN predicate argument (in this case, 1 and 2), the properties of the corresponding element in the subrule are obtained (1=article, 2=noun), and GEN and NUM are each combined by a logical AND with the corresponding properties of the element. At the end, GEN and NUM are copied to the head of the list. If either of the properties is zero, there is an agreement error.

The tree is searched for CN, CV, and CVN predicates. If any of these predicates has yielded no error, it has been removed from the tree prior to this stage.
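The AND-based agreement test for `term: article noun CN(1,2)` can be sketched as follows. The bit encodings for gender and number, the tuple layout of each element, and the function name `check_cn` are assumptions for illustration; the patent does not specify them. The predicate never fails here: it only reports a cost, matching the description that agreement errors raise the path cost without causing a fault.

```python
# Assumed encodings: one bit per value, so a word valid for both has both bits.
MASC, FEM = 0b01, 0b10
SING, PLUR = 0b01, 0b10
ANY = 0b11  # initial value: every gender (or number) is still acceptable

def check_cn(elements, args):
    """Sketch of a CN(i, j, ...) predicate over the elements of one subrule.

    `elements` holds a (gender, number) pair per subrule element and
    `args` holds the 1-based positions named by the predicate.
    Returns (GEN, NUM, cost); cost is 1 when agreement is violated.
    """
    gen, num = ANY, ANY
    for position in args:
        elem_gen, elem_num = elements[position - 1]
        gen &= elem_gen  # keep only the genders every element allows
        num &= elem_num  # keep only the numbers every element allows
    cost = 1 if gen == 0 or num == 0 else 0
    return gen, num, cost

if __name__ == "__main__":
    do, da = (MASC, SING), (FEM, SING)
    menina = (FEM, SING)
    print(check_cn([do, menina], (1, 2)))  # "do menina": gender clash
    print(check_cn([da, menina], (1, 2)))  # "da menina": agreement holds
```

The surviving GEN and NUM values are what would be copied to the head of the rule, so that an enclosing rule can test them in turn.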
Therefore, the predicates found at the suggestion stage are those that have detected an error. From the information stored in each tree node, including the predicate, it is determined to which subrule the node belongs. Going up the tree hierarchy, the path that has been followed is examined to determine whether the error still affects this part of the tree. When the error no longer affects the mother rule, this stage of the analysis is complete. Here is an example:

Sample text: Eu gosto do menina.

The error is limited to the indirect object. Because the indirect object does not require agreement with the rest of the sentence, the analysis does not need to continue up the tree to the "verb" and "subject" level. It is sufficient to analyze only as far up as the "indirect object" nonterminal. Thus, when an error is pinpointed, it can be pointed out more specifically: Eu gosto do menina.

Any properties that have not been affected by the agreement error (in the example above, number and person) are maintained. For properties that have been affected (gender in the example), all possible variations in each domain (masculine and feminine) are generated. For each combination of gender, number and person, the analysis proceeds down the tree, with inflection of all terminals. Here is an example:

Sample text: Eu gosto do menina.

For the error "do menina," the suggestions are "do menino" and "da menina". If any of the inflections is impossible, it is not suggested. For example:

Sample text: Comprei uma carro . . .

The suggestion is limited to "um carro" because there is no feminine inflection for "carro."

The generated suggestions are arranged in ascending order of the number of altered words. If two suggestions have the same number of altered words, the one with the fewest altered letters is presented first.

At the end of the grammatical analysis, the placement of pronouns is tested. The tree is searched for objective pronouns.
For each objective pronoun found, the position of its main and auxiliary verb is determined. It is also determined whether there are any attracting elements (i.e., negative adverbs, personal pronouns, relative pronouns) for the oblique pronoun. After these determinations are made, the routine judges whether the pronoun is correctly placed. If not, a correct placement is suggested.

For simple verbs (without auxiliary), the following rules apply. Attracting elements require proximity of the pronoun ("Eu te amo" vs. "eu amo-te"; "Nao te amo" vs. "Nao amo-te"). With a verb in the future present or future anterior, the pronoun must precede the verb ("Joao te chamara" vs. "Joao chamara-te" vs. "Joao chamar-te-a"; placement within the verb is not yet suggested). Participles do not accept placement of oblique pronouns after the verb. An oblique pronoun should not be placed at the beginning of a sentence.

For verbal phrases, the dividing line between right and wrong is even thinner, depending upon the pronoun and euphony, as illustrated by returning to the first of the above examples:

"Eu sei que vou te amar" vs. "Eu sei que te vou amar" (Portugal) vs. "Eu sei que vou amar-te"

Non-euphonic: "Eu sei que vou a amar" vs. "Eu sei que a vou amar" (Portugal) vs. "Eu sei que vou ama-la"

The list of suggestions is ordered according to commonality of use in Brazilian Portuguese, omitting or leaving for last archaic and Portugal uses.

To check comma usage, the grammar checker uses special predicates (RECOMMENDED_COMMA and NO-COMMA) to mark where a comma is expected and where there should be no comma, respectively. In a manner similar to that of the agreement predicates, these special "comma" predicates do not result in faults if not satisfied. They only add to the error counter and proceed, and the grammar checker seeks the least expensive path (the one having the fewest errors). Upon completion of the analysis, the tree is searched for predicates which have not been satisfied.
Then suggestions (or orders, where applicable) are made for removal or addition of commas.

A special procedure is followed to deal with the choice between "por que" and "porque." Before the grammatical analysis, the sentence is searched for all variations of "porque," which are then converted to a new "porque" uniting the classes of "porque" and "por que." This new union is used during the grammatical analysis. At the end of the analysis, the tree is searched for occurrences of "porque," and the class used in each one is compared to the class of its spelling form (joint or separate). In the case of disagreement, the error is reported and the correct form is suggested. The analysis is based on the following criteria: (1) if a conjunction, it should be "porque"; and (2) if an interrogative adverb, it should be "por que".

In many cases, the sentence is unrecognizable due to an excess of errors or due to its complexity. In addition, there are structures which are not identified by the grammar checker. In such cases, a partial analysis of noun phrases is conducted. The sentence is searched for any elements that could indicate the beginning of a noun phrase (article, adjective, pronoun, noun). If any such element is found, the grammar checker attempts to identify the noun phrase. This analysis is similar to that for a normal sentence, except that the nonterminal that is sought is "simplified_noun_phrase" instead of "sentence". The rules for "simplified_noun_phrase" do not accept sentences as noun phrases or as adjectives. This partial analysis locates agreement errors that would otherwise go undetected in an unrecognizable sentence.

Although multiple embodiments are described herein, it should be understood that the inventor intends for this invention to cover other grammar checkers not particularly described herein. For instance, the steps could be performed in a different order, or certain steps could be eliminated altogether.
