Digital data processing method and means for word classification by pattern analysis4471459
Abstract
Digital data processing means employing a method for separating acceptable spellings of words from nonacceptable spellings wherein each word comprises characters assigned positions. The character positions in each word are assigned increasing values from one end of the word to the other, positions of the same number of characters from the same end of each word being assigned the same value. Two words to be compared are called a query word and a candidate word. The method includes the following steps. A representation of a character in a given position of the query word is compared with representations of characters in the next lower position, the same position, and the next higher position in the candidate word and a compare type indication is formed representing a match or a mismatch between each query word character and the candidate word characters under comparison. The given position of the query word under comparison is changed to the next higher valued position of the query word and the preceding step is repeated to form another compare type indication. The compare type indications are processed to thereby form a spelling classification indication for one of the words under comparison representing an acceptable spelling or a nonacceptable spelling with respect to the other.
Claims
What is claim is:
1. A method using a digital data processing means for separating words with acceptable spellings from words with nonacceptable spellings wherein each word comprises characters assigned character positions, character position in each word being assigned increasing values from one end of the word to the other, characters at the same number of positions from one end of each word being assigned the same value, the two words to be compared being called a query word and a candidate word, the method comprising the steps of:
(a) comparing representation of a character in a given character position of the query word with representations of characters in the next lower character position, the same character position, and the next higher character position in the candidate word, and forming a compare type indication representing a match or a mismatch between such query word character and each of such candidate word characters under comparison;
(b) changing the given character position of the query word under comparison in the preceding step of comparing to the next higher valued character position of the query word and repeating the preceding step of comparing at least once to form another compare type indication; and
(c) processing and utilizing said compare type indications to thereby form a spelling classification indication for one of the words under comparison representing an acceptable spelling or a nonacceptable spelling.
2. A method according to claim 1 comprising the steps of:
repeating the preceding step of comparing, the step of changing and the step of processing and utilizing, for each of a plurality of different candidate words, to thereby form a spelling classification indication, for each of said plurality of candidate words; and
utilizing the spelling classification indication for each of said plurality of candidate words for physically separating representations of those candidate words indicated to be acceptable spellings from representations of those candidate words indicated to be nonacceptable spellings of the query word.
3. A method according to claim 2 comprising the step of combining a representation of each of said plurality of candidate words which are indicated to be an acceptable spelling with a representation of the corresponding spelling classification indication.
4. A method according to claim 3 wherein the step of physically separating comprises the step of physically transferring the combined representations of candidate words and acceptable spelling classifications.
5. A method according to claim 1 wherein the step of comparing and forming a compare type indication comprises the additional steps of:
forming for a query character under comparison a compare type representation comprising a lower storage position, a same storage position, and a next higher storage position which correspond respectively to such next lower character position, the same character position, and the next higher character position in the candidate word,
forming in each binary storage position a binary representation of a first value when the comparison with the candidate character in the corresponding character position of the candidate word results in an equality, and
forming in each storage position a binary representation of a second value when the comparison with the candidate character in the corresponding character position of the candidate word results in an inequality.
6. A method according to claim 1 wherein first and second compare type indications are formed using, respectively, a query character in a first character position and a second query character in a second higher valued character position of the query word, the steps of processing and utilizing comprising the steps of:
utilizing at least the first compare type indication for forming a first tentative spelling classification indication; and
utilizing at least the second compare type indication and the first tentative spelling classification indication for forming a second tentative spelling classification indication.
7. A method according to claim 6 including the step of utilizing the second tentative spelling classification indication in forming said spelling classification indication.
8. A method according to claim 1 comprising the step of forming a digitally coded signal representative of the spelling classification indication.
9. A method according to claim 1 comprising the step of forming a spelling classification indication comprising a digitally coded signal identifying any one of at least the following acceptable spelling errors between the query word and the candidate word:
a character insertion error,
a character deletion error, and
a character transposition error.
10. A method according to claim 9 wherein the acceptable spelling errors include equality and inequality between the characters and their order in the candidate word and the query word.
11. A method according to claim 1 wherein the step of processing and utilizing said compare type indications to form a spelling classification indication is partially performed following the formation of each compare type indication for each query character.
12. A method according to claim 1 comprising the additional step of accumulating the compare type indications in a store, and wherein the step of processing and utilizing said compare type indications comprises the step of processing and utilizing the compare type indications after they have been accumulated in the store.
13. A method using a digital data processing means for separating words with acceptable spellings from words with nonacceptable spellings wherein each word comprises characters assigned character positions, character positions in each word being assigned increasing values from one end of the word to the other, character at the same number of positions from one end of each word being assigned the same value, two words to be compared being called a query word and a candidate word, the method comprising the steps of:
(a) comparing representations of each of plural given characters in each of different given character positions of the query word with representations of characters in the candidate word;
(b) forming representations of a plurality of compare type indications, a different compare type indication for each of said plural given characters of the query word, each compare type indication representing whether a match was found in the step of comparing between one of the given characters of the query word and the three characters of the candidate word located in the next lower character position than, the same character position as, and the next higher character position than, such given character of the query word; and
(c) processing and utilizing said compare type indications in order of the corresponding given characters of the query to form a spelling classification indication for one of the words under comparison representing an acceptable spelling or a nonacceptable spelling.
14. A method according to claim 13 wherein the digital data processing means is programmable and comprising the steps of:
initiating the execution of a compare procedure enabling the data processing means to perform such steps of comparing representations and forming representations of a plurality of compare type indications.
15. A method according to claim 14 comprising the steps of initiating the operation of a branching procedure for performing the step of processing and utilizing said compare type indications, said branching procedure having a plurality of nodal junctions, each nodal junction having plural branches therefrom to one or more other nodal junctions, a different one of the plurality of nodal junctions being reached for each of said compare type indications, the particular branch which is taken from one nodal junction to another being conditioned upon the value represented by a compare type indication and the one nodal junction.
16. Digital data processing means for separating words with acceptable spellings from words with nonacceptable spellings wherein each word comprises characters assigned character positions, character positions in each word being assigned increasing values from one end of the word to the other, characters at the same number of positions from one end of each word being assigned the same value, two words to be compared being called a query word and a candidate word, comprising:
(a) means operative for comparing a representation of a character in a given character position of the query word with representations of characters in the next lower character position, the same character position, and the next higher character position in the candidate word, and for forming a compare type indication representing a match or a mismatch between such query word character and each of such candidate word characters under comparison;
(b) means for changing the given character position of the query word under comparison by the means for comparing to the next higher valued character position of the query word, the means for comparing comprising means operative for repeating the recited operation using the next higher valued character position of the query word to form another compare type indication; and
(c) means for processing and utilizing said compare type indications to thereby form a spelling classification indication for one of the words under comparison representing an acceptable spelling or a nonacceptable spelling.
17. Digital data processing means according to claim 16 comprising:
means for enabling a repeat of the recited operation for the comparing means, for the means for changing and for the means for processing for each of a plurality of different candidate words to thereby form a spelling classification indication for each of said plurality of candidate words; and
means for utilizing the spelling classification indication for each of said plurality of candidate words for physically separating representations of those candidate words indicated to be acceptable spellings from representations of those candidate words indicated to be nonacceptable spellings of the query word.
18. Digital data processing means according to claim 17 comprising means for combining a representation of each of said plurality of candidate words which are indicated to be an acceptable spelling with a representation of the corresponding spelling classification indication.
19. Digital data processing means according to claim 18 wherein the means for physically separating comprises means for physically transferring the combined representations of candidate words and acceptable spelling classifications.
20. Digital data processing means according to claim 16 wherein the means for comparing and forming a compare type indication comprises:
means for forming for a query character under comparison a compare type representation comprising a lower storage position, a same storage position, and a next higher storage position corresponding, respectively, to such next lower character position, the same character position, and the next higher character position in the candidate word, and
means for forming in each storage position a binary representation of a first value when the comparison with the candidate character in the corresponding character position of the candidate word results in an equality, and
means for forming in each storage position a binary representation of a second value when the comparison with the candidate character in the corresponding character position of the candidate word results in an inequality.
21. Means according to claim 16 wherein the first and second compare type indications are formed using a query character in a first character position and a second query character in a second higher valued character position of the query word, the means for processing and utilizing comprising:
means for utilizing at least the first compare type indication for forming a first tentative spelling classification indication; and
means for utilizing at least the second compare type indication and the first tentative spelling classification indication for forming a second tentative spelling classification indication.
22. Means according to claim 21 comprising means for utilizing the second tentative spelling classification indication in forming said spelling classification indication.
23. Means according to claim 16 comprising means for forming a digitally coded signal representative of the spelling classification indication.
24. Means according to claim 16 comprising means for forming a spelling classification indication comprising a digitally coded signal identifying any one of at least the following acceptable spelling errors between the query word and the candidate word:
a character insertion error,
a character deletion error, and
a character transposition error.
25. Digital data processing means according to claim 34 wherein the acceptable spelling errors include equality between the characters and their order in the candidate word and the query word.
26. Digital data processing means according to claim 16 wherein the operation of the means for processing and utilizing said compare type indications to form spelling classification indications is partially performed following the formation of each compare type indication for each query character.
27. Digital data processing means according to claim 16 comprising a store, means for accumulating the compare type indications in the store, and wherein the means for processing and utilizing said compare type indications comprises means for processing and utilizing the compare type indications after they have been accumulated in the store.
28. Digital data processing means for separating words with acceptable spellings from words with nonacceptable spellings wherein each word comprises characters assigned character positions, character positions in each word being assigned increasing values from one end of the word to the other, characters at the same number of positions from one end of each word being assigned the same value, two words to be compared being called a query word and a candidate word, comprising:
(a) means operative for comparing representations of each of plural given characters in each of different given character positions of the query word with representations of characters in the candidate word;
(b) means for forming representations of a plurality of compare type indications, a different compare type indication for each of said plural given characters of the query word, each compare type indication representing whether a match was found by the comparing means between one of the given characters of the query word and the three characters of the candidate word located in the next lower character position than, the same character position as, and the next higher character position than, such given character of the query word; and
(c) means for processing and utilizing said compare type indications in order of the corresponding given characters of the query to form a spelling classification indication for one of the words under comparison representing an acceptable spelling or a nonacceptable spelling.
29. Means according to claim 28 wherein the digital data processing means is programmable and comprising:
means for initiating the execution of a compare procedure enabling the digital data processing means to perform such steps of comparing representations and forming representations of a plurality of compare type indications.
30. Means according to claim 29 comprising means for initiating the operation of a branching procedure for performing the step of processing and utilizing said compare type indications, said branching procedure having a plurality of nodal junctions, each nodal junction having plural branches therefrom to one or more other nodal junctions, a different one of the plurality of nodal junctions being reached for each of said compare type indications, the particular branch which is taken from one nodal junction to another being conditioned upon the value represented by a compare type indication and the one nodal junction.
31. A method using a digital data processing means for separating words with acceptable spellings from words with nonacceptable spellings wherein each word comprises characters assigned character positions, character positions in each word being assigned increasing values from one end of the word to the other, characters at the same number of positions from one end of each word being assigned the same value, the two words to be compared being called a query word and a candidate word, the method comprising the steps of:
(a) storing the query word in a store in the processing means;
(b) storing candidate words in a store in the processing means;
(c) enabling the data processing means to operatively compare a representation of a character in a given character position of the stored query word with representations of characters in the next lower character position, the same character position, and the next higher character position in each of the stored candidate words, and form a compare type indication representing a match or a mismatch between such query word character and each of such candidate word characters under comparison;
(d) enabling the processing means to change the given character position of the stored query word in the preceding operation of comparing in sequence to higher valued character positions of the stored query word and for each such character position repeat the preceding step of comparing to form a plurality of compare type indications for each of the stored candidate words; and
(e) enabling the processing means to process and utilize the plurality of compare type indications formed for each of the stored candidate words to thereby form a spelling classification indication for each of the candidate words representing an acceptable spelling or a nonacceptable spelling.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This invention is related to the subject of U.S. patent application Ser. No. 307,571 filed Sept. 30, 1981 entitled LOCATING DIGITAL CODED WORDS WHICH ARE BOTH ACCEPTABLE MISSPELLINGS AND ACCEPTABLE INFLECTIONS OF DIGITAL CODED QUERY WORDS, and to the subject of U.S. patent application Ser. No. 307,093 filed Sept. 30, 1981 entitled METHOD AND MEANS USING DIGITAL DATA PROCESSING MEANS FOR LOCATING REPRESENTATIONS IN A STORED TEXTUAL DATA BASE.
BACKGROUND OF THE INVENTION
This invention relates to digital data processing methods and means for separating acceptable spellings of words from nonacceptable spellings.
Methods have been proposed for using a programmed digital computer for correcting spelling errors. Efficiency is obtained by limiting the class of spelling errors to be considered. In this regard it has been found that over 80% of spelling errors fall into one of four classes of single error, namely, substitution of a character, deletion of a character, insertion of a character, or transposition of adjacent characters.
A specific computer program implemented method for correcting spelling errors is discussed in the Communications of the ACM, Volume 13, No, 2, February 1970, in an article entitled "Spelling Correction in Systems Programs" by Howard L. Morgan, pp. 90-94. This article discusses a first case where the entry word and the dictionary word are the same length. The method involves comparing the word in question against a dictionary of correctly spelled words. In this situation the method involves taking an exclusive OR of the word in question and the correctly spelled words and examining the nonzero postions in the result. If a character position is the same in both the word in question and the dictionary word, an exclusive OR will produce a zero result. If there is exactly one nonzero position, a substitution error has occurred. If there are more than two nonzero positions, the two positions are checked for equality. If they are equal, the presence of a transposition is checked in the word in question and in the dictionary word, at these positions, and the correction is made.
A second case occurs when the length of the two words differs by one character. In this situation an exclusive OR is used to find the first nonequal position, starting at the left. Subsequently the remaining parts of the word in question and the dictionary are aligned from the right and checked again by using the exclusive OR. If the two words match from the right down to the same unequal position, a single missing or added letter misspelling has been found at the position in question.
The aforementioned method discussed in the February 1970 Communications of the ACM suffers from the following disadvantages:
It must search all words in the dictionary.
It does not cope with candidates larger than query size plus or minus one, which can be valid misspellings of inflected forms of the query word.
It does not classify multiple errors.
Another technique is disclosed in Communications of the ACM, Volume 7, No. 3, March 1964, in an article entitled "A Technique for Computer Detection and Correction of Spelling Errors" by Fred J. Damerau, pp. 171-176. Again the word in question is compared against a dictionary of correctly spelled words. The search is accelerated by performing a comparison of words only when the character counts of the word in question and the word from the dictionary are the same length. If the word is found the processing is terminated and the word in question is assumed to be correctly spelled.
If the word in question is not found, the program again searches the dictionary this time using spelling correction rules. If the difference between the two words is greater than 1, the word in question cannot be an acceptable misspelling of the dictionary word. If the two words are equal or differ by only one character or if the character register is different in more than 2 bit positions, no comparison is possible and no further comparison is required.
If after the foregoing a match is still considered possible, the two words are compared position-for-position If the number of characters in the two words are the same and the words differ in only one character position, it is assumed that the two words are the same. If the two words differ in two adjacent character positions, the two characters of the word in question are interchanged and compared to the same two characters of the dictionary word and, if a match results, the two words are assumed to be the same. For all other cases of equal character length, a no-match condition is assumed. The next dictionary entry is compared to the word in question.
If the word in question is a character longer than the dictionary word, the first difference character of the word in question is discarded and the remaining characters are shifted left one position. If a match in all positions occurs, it is assumed that the words are the same. If the dictionary word is a character longer, the first difference character of the dictionary word is discarded and the words are compared the same as above.
The method is repeated until a match of the word in question is found or until all entries in the dictionary have been tested.
The method disclosed in the Communications of the ACM of March 1964 suffers from the following disadvantages:
It must search the entire dictionary.
It cannot handle candidate words larger than +1 greater than the query word.
The Communications of the ACM, December 1980, Volume 23, No. 12, contains an article entitled "Computer Programs for Detecting and Correcting Spelling Errors". This also makes reference to the fact that 80% of all spelling errors are of one of four classes. The article goes on to discuss a method whereby misspelled words in a dictionary may be matched against a token (i.e., word) which is being searched in the dictionary. In this regard, transpositions are detected by transposing each pair of adjacent characters in the token, one at a time, and searching for the resultant token. One extra letter is handled by deleting each character in the token, one at a time, and searching for the resultant tokens. One missing letter and one wrong letter are handled by one of two strategies. One of the approaches suggested is to substitute each potential character in each character position and search for the token in the dictionary. The problem with this approach is that a very large number of repeated searches is required. Another suggested approach is to create a table of tokens which are, in effect, acceptable misspellings of words. When a questionable word is encountered, one must search the table of acceptable misspellings to locate a token which matches the one in question. Obviously where a very large number of acceptable misspellings is involved, the search can be quite tedious.
Another approach discussed in that article involves the frequency of two letter pairs and three letter triples to detect potential misspellings in order to form an index into a table of acceptable misspellings.
Other approaches that have been taken involve socalled interactive spelling checkers. In this regard, each word is checked against a dictionary of correctly spelled words and, if the word is not in the dictionary, the user is asked what to do. This approach obviously does not provide any type of automatic matching of misspelled words. Another technique employed is to take tokens and convert them into standard phonetic spelling and to find similar sounding words in a dictionary. This, for example, works well with double errors using, for example, "f" for "ph" or "k" for "qu."
The problem of programming a computer to determine whether or not a string of characters is an acceptable misspelling of a given word has been widely considered. See for example "String Similarity and Misspellings" by Cyril N. Alberga, Communications of the ACM, Volume 10, No. 5, May 1967, pp. 302-313; "Approximate String Matching" by Patrick A. V. Hall and Geoff R. Dowling, Computing Surveys, Volume 12, No. 4, December 1980, pp. 381-402; and "Computer Programs for Detecting and Correcting Spelling Errors" by James L. Peterson, Communications of ACM, Volume 23, No. 12, December 1980, pp. 676-687.
These prior art approaches to handling misspelling are generally slow and inefficient for large data bases.
SUMMARY OF THE INVENTION
Briefly, a method is disclosed herein using a digital data processing means for separating acceptable spellings of words from nonacceptable spellings wherein each word comprises characters assigned imaginary positions. Character positions in each word are assigned increasing values from one of the words to the other, positions the same number of characters from one end of each word being assigned the same value. Two words to be compared are called a query word and a candidate word. The method involves the steps of comparing a representation of a character in a given position of the query word with representations of characters in the next lower position, the same position, and the next higher position in the candidate word, and forming a compare type indication representing a match or a mismatch between each query word character and each of such candidate word characters under comparison, changing the given position of the query word under comparison to the next higher value position of the query word and repeating the preceding step to form another compare type indication, and processing and utilizing the compare type indications to thereby form a spelling classification indication for one of the words under comparison representing an acceptable spelling or a nonacceptable spelling.
Preferably the method includes the step of causing a plurality of said candidate words to be compared against the query word to form a plurality of spelling classification indications, at least one for each of a plurality of candidate words. Also included is the step of utilizing spelling classification indications for each of the plurality of candidate words for physically separating representations of those candidate words indicated to be acceptable spellings from representations of those candidate words indicated to be nonacceptable spellings of the received query word.
Preferably a representation of each of the candidate words which are indicated to be acceptable spellings are combined with the representation of the corresponding spelling classification indication.
The step of physically separating preferably comprises the step of physically transferring the combined representations of candidate words and acceptable spelling classifications.
Also preferably the step of comparing and forming a compare type indication comprises the step of forming a compare type indication having first, second and third binary positions corresponding, respectively, to said next lower position, the same position, and the next higher position in the candidate word. Also preferably included are the steps of forming for a query character under comparison a compare type signal comprising a lower storage position, a same storage position, a next higher position corresponding, respectively, to such next lower position, the same position, and the next higher position in the candidate word, and forming in each binary storage position a binary representation of a first value when the comparison with the candidate character in the corresponding position of the candidate word results in an equality, and forming in each binary storage position a binary representation of a second value when the comparison with the candidate character in the corresponding position of the candidate word results in an inequality.
According to a preferred arrangement first and second compare type indications are formed using a query character in a first position and a second query character in a second higher valued position of the query word. The step of processing and utilizing includes the steps of utilizing at least the first compare type indication for forming a first tentative spelling classification indication and utilizes at least the second compare type indication and the first tentative spelling classification indication for forming a second tentative spelling classification.
Preferably the second tentative spelling classification indication is used in forming the final spelling classification indication.
According to a preferred embodiment a digitally coded signal is formed representative of the spelling classification indication.
Preferably the step of forming a spelling classification indication forms a digitally coded signal identifying any one of at least the following acceptable spelling errors between the query word and the candidate word: a character insertion error, a character deletion error, and a character transposition error as well as equality between the characters and their order in the candidate word and the query word.
Preferably the step of processing and utilizing the compare type indications to form a spelling classification indication is partially performed following the formation of each compare type indication for each query character.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A is a schematic and block diagram of a digital data processing system and embodying the present invention;
FIG. 1B is an overall flow diagram of a method of operation of the system of FIG. 1A according to the present invention;
FIG. 1 is a more detailed schematic and block diagram of the digital data processing system of FIG. 1A and embodying the present invention;
FIG. 2 is a block diagram of the programming utilized in the digital data processing system of FIG. 1 indicating the portions of the system in which the various programs are executed;
FIG. 3 comprises FIGS. 3A through 3H and comprises a flow diagram disclosing the sequence of operation of the digital data processing system of FIG. 1 and the method created by the program PQAPCNTRL stored in internal ROM 1122,1124 as it is executed by the system of FIG. 1;
FIG. 4A is a pictorial view illustrating the form and organization of the CLASSIFY.sub.-- TABLE -1200 contained in ROM 1122,1124;
FIG. 4B is a pictorial view illustrating the form and organization of the SUFFIX.sub.-- STRIP.sub.-- STATE.sub.-- TABLE -1201 contained in the ROM 1122,1124;
FIG. 4 is a pictorial view illustrating the form and organization of the ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE stored in the internal ROM 1122,1124 of the system of FIG. 1;
FIG. 5 is a pictorial view depicting the form and organization of the SUFFIX.sub.-- TABLE contained in the internal ROM 1122,1124 of the system of FIG. 1;
FIG. 6 is a schematic and block diagram embodying an alternate method and means to that implemented using the program PQAPCNTRL and the digital data processing system of FIG. 1;
FIG. 7 depicts the format of a family of entries as they are stored in external RAM 1104; FIG. 7 also contains a pictorial view of the packets buffer and the results buffer in external RAM 1104; the results buffer contains the results from having determined entry words containing acceptable misspellings and acceptable inflections of an original query word; indicated in the Figure are the various PARM structure variables (see Table 4) and what they point to or indicate;
FIG. 8 is a pictorial view depicting the word entries formed by the microprocessor 1118 in the input portion of FIFO 1130 in preparation for the operation of the MCS 1114;
FIG. 9 is a pictorial view depicting the form of the same entry words as in FIG. 8 as they are selected by the MCS 1114 after having determined those entries which are acceptable misspellings of the original query word;
FIG. 10 comprises FIGS. 10A through 10D and is a flow diagram disclosing the sequence of operation of the digital data processing system of FIG. 1 and the method created by the PSUFIX program as it is executed by the system of FIG. 1;
FIG. 11 is a schematic and block diagram of the misspelling classification system (MCS) 1114 as well as the input and output portions of FIFO 1130, the control and data buses 1218 and 1216, the microprocessor 1118, and the internal RAM 1126,1128 of FIG. 1;
FIG. 12 comprises FIGS. 12A through 12L and is a flow diagram disclosing the sequence of operation of the MCS system 1114 of FIG. 11 and the method created by the MCS program as it is executed by the MCS system;
FIG. 13 is a schematic and block diagram depicting an alternate embodiment of a portion of the misspelling classification system of FIG. 11;
FIG. 14 is a schematic and block diagram depicting an alternate embodiment of a portion of the misspelling classification system of FIG. 11;
FIGS. 15 and 16 form a schematic and block diagram disclosing an alternate embodiment of a portion of the digital data processing system of FIG. 11;
FIG. 17 is a schematic and block diagram disclosing an alternate embodiment of a portion of the digital data processing system of FIG. 11; and
FIG. 18 is a schematic and block diagram disclosing an alternate embodiment of a portion of the digital data processing system of FIG. 11.
DETAILED DESCRIPTION
INDEX
I. GENERAL DESCRIPTION
A. COMPUTER PROGRAM METHOD AND MEANS
1. INTRODUCTION
2. SUMMARY OF METHOD
3. PROGRAMS
II. HARDWARE--QAP CONTROL BOARD AND EXTERNAL CIRCUITS OF FIG. 1
III. HARDWARE--MCS OF FIG. 11
IV. METHOD AND MEANS EMPLOYING QAP CONTROL PROGRAM (PQAPCNTRL)
A. CONSTRUCTION AND USE OF CLASSIFY.sub.-- TABLE 1200, SUFFIX.sub.-- STRIP.sub.-- STATE.sub.-- TABLE 1201, ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE 1202, AND SUFFIX.sub.-- TABLE 1204
B. OPERATION
C. ALTERNATE EMBODIMENT OF FIG. 6
D. METHOD AND MEANS EMPLOYING SUFFIX STRIPPER PROGRAM (PSUFIX)
E. MISSPELLING CLASSIFICATION SYSTEM
1. INTRODUCTION
2. DIGITAL DATA PROCESSING METHOD AND MEANS OF FIGS. 11 AND 12A-12L
3. ALTERNATE EMBODIMENT OF FIG. 14
4. ALTERNATE EMBODIMENT OF FIGS. 15 AND 16
5. ALTERNATE EMBODIMENT OF FIG. 17
6. ALTERNATE EMBODIMENT OF FIG. 18
V. TABLES
TABLE 1--WORD FORMAT AND SYNC SIGNALS
TABLE 2--ACCEPTABLE AND NONACCEPTABLE MISSPELLING CLASSES
TABLE 3--QRIO STRUCTURE IN EXTERNAL RAM 1104
TABLE 4--PARM STRUCTURE; QUERY INFORMATION TRANSFERRED FROM EXTERNAL RAM 1104 IN INTERNAL RAM 1126,1128
TABLE 5--VARIABLES/TABLES USED BY PQAPCNTRL
TABLE 6--PSUFIX RELATED DATA ITEMS INCLUDING PARAMETERS,VARIABLES AND TABLES
TABLE 7--ENTIN STRUCTURE; ENTRY WORD FORMAT RETURNED BY MCS 1114 TO RAM 1126,1128 ON QAP BOARD 1109
TABLE 8--CLASSIFY.sub.-- TABLE
TABLE 9--SUFFIX.sub.-- STRIP.sub.-- STATE.sub.-- TABLE
TABLE 10--ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE
TABLE 11--SUFFIX.sub.-- TABLE
TABLE 12--STRIPPING RULES, ACCEPTABLE SUFFIXES AND EXAMPLES
TABLE 13--not used
TABLE 14--EXAMPLE
TABLE 15--not used
TABLE 16--COMPARE TYPE INDICATION
TABLE 17--EXAMPLE OF MISSPELLING CLASS DETERMINATION
TABLE 18--TRIPLET COMPARE TYPE INDICATION
TABLE 19--EXAMPLE OF MISSPELLING CLASS DETERMINATION FOR WORD WITH NON UNIQUE CHARACTERS
TABLE 20--SEQUENCE OF OPERATION FIGS. 7, 8
I. GENERAL DESCRIPTION
A. Computer Program Method and Means
1. Introduction
FIG. 1 is a schematic and block diagram of a programmable digital data processing system. Included are hardware and computer programs, the latter being stored in read only memory, for locating and determining candidate (also called entry) words contained in a stored data base which are both acceptable misspellings and acceptable inflections of query words. The data base preferably is a textual data base arranged into paragraphs and records. Hardware and software are also included that use the entry words which are acceptable misspellings and acceptable inflections of the words of the query and scores the paragraphs of the data base according to how well the paragraphs match the acceptable entry words. Representations of the paragraphs of the data base are returned to the user in decreasing order by score, the best scored paragraph being returned first.
Referring to FIG. 1A, a user using terminal 1102 enters query words, each word composed of one or more characters, into the system. External circuits including interface 1103, microprocessor 1108, random access memory (RAM) 1104, and read only memory (ROM) 1106 then parse the words of the query and throw away those words which have little or no significance to the query, called stop words. The remaining query words are referred to as the significant words of the query. The significant words of the query after parsing are stored in RAM 1104 and are then taken one by one and used to interrogate entries in a stored data base to locate those data base entry words which are both an acceptable misspelling and an acceptable inflection of the significant query words. Representations of the entry words of the data base are stored in various forms in a memory, namely, external disk storage device 1107, and as required are transferred through a disk controller 1105 to a random access memory (RAM) 1104 for processing.
QAP control board 1109 is a programmable microprocessor system. More particularly the QAP control board 1109 contains a microprocessor 1118 and a misspelling classification system 1114 which in turn also contains a programmable microprocessor. Also included in the system are two read only memories which for convenience are shown as one and is designated herein as read only memory (ROM) 1122,1124, two random access memories which for convenience are shown as one and is designated herein as random access memory (RAM) 1126,1128, and a first-in first-out (FIFO) memory 1130. An interface and control system designated generally at 1115 provides an interface between the microprocessor 1118 and the bus 1110 and hence the rest of the external circuits to the right of bus 1110. The interface control system 1115 also provides an interface between microprocessor 1118 and ROM 1122,1124, RAM 1126,1128, and FIFO 1130. The FIFO 1130 provides the main communication for transferring data between the MCS 1114 and the microprocessor 1118.
The programs which control the operation of the microprocessor 1118 are stored in the ROM 1122,1124. RAM 1126,1128 provides a scratch pad memory as well as a storage for various values utilized by the microprocessor 1118 in its operation.
Briefly, representations. of each significant query word are transferred from RAM 1104 to RAM 1126,1128. There the microprocessor 1118 takes the query words, one at a time, strips the suffix from the query words, leaving a stem, and forms a suffix classification indication for the query words. The query word is then passed through the FIFO 1130 to the MCS 1114. In addition the family of entry words for the query word are transferred from RAM 1104 through the FIFO 1130 to the MCS 1114. The microprocessor in the MCS 1114 then takes the stem of the query word, compares it against the beginning characters of the entry words, and, for each entry word, determines a misspelling classification. Those entry words determined to have an acceptable misspelling classification as compared to the query word stem are then transferred back along with the misspelling classification for the entry word to the FIFO 1130 and from there to the RAM 1126,1128. At this point then the RAM 1126,1128 contains the entry words which are acceptable misspellings of the corresponding query word, the length of the stem of the corresponding query word, and the suffix classification indication for the corresponding query word. The microprocessor 1118 then utilizes the length of the stem and the misspelling classification value to determine the position of the suffix in each entry word and further uses the suffix classification indication to determine if each of the entry words is an acceptable inflection of the original query word. Those entry words which are acceptable misspellings and further are acceptable inflections of the original query word are called equivalent words to the query and representations of those equivalent words are then transferred back to the RAM 1104 where they are used to form packages for scoring and output to the user, as explained hereinafter in more detail.
2. Summary of Method
With the overall block diagram of the system of FIG. 1A in mind, consider now the overall flow diagram of FIG. 1B. Initially as depicted at 3008, the user, using the keyboard 1102A of the operator console 1102 (FIG. 1A) forms a query. The query consists of one or more query words which the user would like to find in combination in a paragraph of the textual data base stored in the disk 1107. By way of example, the query words may be "RATES OF INTEREST".
As depicted at block 3012 the data processing system of FIG. 1A then uses a table of stop words 3010 to identify and remove the stop words from the query, leaving significant query words. The significant query words are then passed to the rest of the flow, one by one.
Each query word is processed as depicted at 3015 by determining the stem of the query word and the length of the stem of the query word. Additionally the suffix of the query word is stripped from the query word, leaving only the stem. The suffix of the query word alone or in combination with the adjacent portion of the stem is used to determine a class of acceptable suffixes for the stem of the query word. To be explained in more detail, the class of acceptable suffixes will be used to determine whether entry (candidate) words whose beginning characters are acceptable misspellings of the stem of the query word have an acceptable suffix and therefore the entry word is both an acceptable misspelling and an acceptable inflection of the query word. Therefore, after block 3015 of the flow, the system will have determined, for each query word of the query, the following: a query stem 3016, which is the original query word with the suffix removed (i.e., for the word "RATES", the suffix "ES" is stripped leaving the query stem RAT); a stem length indication 3018 which indicates the length of the stem (i.e., for the query word "RATES" the stem length will be 3); and a suffix class indication 3020 indicating the class in which the suffix of the query word is contained.
The textual data base contains entry words. One portion of the textual data base is a dictionary of entry words which are stored and are accessible by the first two letters. All of the words having the same first two letters are stored together. For example, representations of significant words beginning with the letters AA are arranged together, representations of significant words beginning with the letters AB are arranged together, etc. The data base entry words which have the same first two characters are called a family of data base entry words. Such an arrangement of the entry words is a preferred arrangement of the data base but is not essential to the present invention.
The family of entry words corresponding to one of the query words is first processed at block 3024 of the flow by comparing the query stem 3016 with the beginning characters of each of the entry words in the corresponding family of query words, thereby forming a set 3026 of entry words which are acceptable misspellings of the query word stem 3016. Therefore the family of entry words corresponding to the query word is reduced to a set of entry words whose beginnings are acceptable misspellings of the query word stem.
During block 3030 of the flow the entry words in set 3026 whose beginning characters are acceptable misspellings of the query word stem 3016 are then checked to determine if they are acceptable inflections of the original query word. To this end the suffix class indication 3020 for the query word is used to access an acceptable suffix table 3028 from which acceptable suffixes are obtained and compared against the entry words in set 3026 to determine those which have acceptable suffixes and are therefore acceptable inflections. Following block 3030 there is a set 3031 of entry words which are acceptable misspellings and acceptable inflections (i.e., equivalents) of the query word.
During block 3032 of the flow the system forms a package 3038 for the query word which has a packet for each entry word of set 3031. Each packet contains a set of coded information or indications which may be used to locate information about the corresponding entry word in the stored data base. Of interest to the present invention is that each packet has indications which as described below are used to locate each of the documents and/or paragraphs within the document in which the corresponding entry word is contained in the textual data base.
The steps of the method in blocks 3015, 3024, 3030 and 3034 are then repeated for another significant query word to thereby locate those entry words which are both acceptable misspellings and acceptable inflections (of the type discussed above) for the next query word. A package 3038 is formed for the next query word and contains a packet for each of the entry words which is an acceptable misspelling and an acceptable inflection of the corresponding query word. This process is repeated for each of the significant words of the query with a package being formed for each of the query words in the manner discussed above.
Using the packages 3038, one for each of the query words, during block 3036 of the flow, paragraph references are obtained which identify the actual paragraphs (and, if desired, documents) in which each of the equivalent words is contained. The paragraph references for each paragraph in the textual data base are then scored according to how well they match the equivalent words and finally, at block 3044, paragraphs of the data base corresponding to the paragraph references which have been scored will be output for visual display on the CRT of the operator console 1102. The textual data base containing the actual paragraphs of text is generally depicted at 3034 in the flow and is accessed and read out using the paragraph references and employing techniques well known in the data processing art.
3. Programs
FIG. 2 is a schematic and block diagram of the various computer programs arranged with reference to the portion of the digital data processing means of FIGS. 1 and 1A which executes each program. As indicated toward the top of FIG. 2, external microprocessor 1108 and RAM 1104 execute programs referred to as QPCNTL, QDETWD, QSS, QDPCMD, and QFLPKG, these computer programs preferably being in the form of firmware stored in ROM 1106. The QAP control board 1109 executes the computer programs PQAPCNTRL and PSUFIX which are stored in ROM 1122,1124. The microprocessor in the MCS 1114 executes the program MCS. The program MCS is also stored in a ROM (not shown) in FIG 1A. Therefore the various programs being stored in ROMs are firmware. To be explained in more detail, all of the computer programs depicted in FIG. 2 play a part in controlling the sequence of operation and thus the method whereby the digital data processing means of FIG. 1 processes the data.
Considering the various programs and their operation, the user first enters the query using terminal 1102. The external circuits 1104-1108 and in particular microprocessor 1108, using conventional programming (details not disclosed), strip the stop words from the query, leaving the significant query words in RAM 1104 which are to be processed by the system of FIG. 1. Stop words are nonsignificant words or words with little meaning such as "a", "the", "of", etc. Word stripping programs of this type are well known and need not be discussed further herein.
The query is a set of words such as a sentence or other group of words which the user wants to locate in the data base. By way of example, the user may want to locate a query containing such words as "HELP", "INTEREST", "RATES", after stripping of stop words. These query words are used by way of reference in subsequent discussions.
The query process control (QPCNTL) program controls the processing of all queries. The significant query words are first passed to the determine word candidate (QDETWD) program. The QDETWD program is a control program module responsible for controlling the looping through all significant query words to determine the acceptable inflections and misspellings of the words in the data base for each query word. To this end the program QDETWD passes the words of the query, one query word at a time, to the fill word packages program QFLPKG.
The textual data base (entry words) is arranged, stored and accessible in the disk storage device 1107, also called a secondary storage, by the first two letters of the word (i.e., in "families"). By calling a data base services program, the program QFLPKG obtains the family of entry words for each query word from the disk storage device 1107 and stores a representation thereof into an ENTRIES buffer (see FIG. 7) in external RAM 1104 and at the same time stores a packet corresponding to each of the entry words into a PACKETS buffer (see FIG. 4). To be explained, a packet is a unique fixed length representation of a word within the data base, as opposed to the actual variable length character stem. Using the query word "HELPS" as an example, all data base entry words having the first two letters HE are put into the ENTRIES buffer of RAM 1104 for processing against the query word.
The storage and retrieval of entry words in families, i.e., by the first two characters, is done for convenience and is therefore preferred but is not essential to the invention.
Each query word, the family of entry words for the query word, and the corresponding packets are passed to the program PQAPCNTRL located in the ROM 1122,1124 on the QAP control board 1109. The program PQAPCNTRL is responsible for reducing the family of entry words to a set of acceptable misspellings and acceptable inflections of the query word.
To this end the program PQAPCNTRL passes a query word to the suffix stripper program (PSUFIX). The PSUFIX program determines a stem of the query word by stripping off zero or more characters (i.e., a suffix), using certain tables and predetermined rules for acceptable suffixes. To be explained in more detail, the PSUFIX program determines the suffix and the stem portions of the query word and then goes on to form a suffix class indication having a value corresponding to any one of a plurality of different classes in which the identified suffix of the query word may be included. For example the query word "HELPS" would have the suffix "S" stripped therefrom. The program PQAPCNTRL then sends the determined query stem and the family of entry words to the misspelling classification subsystem (MCS) 1114.
The programmable microprocessor in the MCS 1114 using its own firmware computer program determines and returns to the program PQAPCNTRL those entry words (in the family) which have beginning characters that are acceptable misspellings of the stem (previously determined) of the query word. The MCS compares the identified query word stem with beginning characters of the entry words and, for a particular entry word, determines whether it is an acceptable or a nonacceptable match with the query word. The MCS also forms a match class indication which represents any one of a plurality of classes resulting from the comparison between the stems of the query and entry words.
After receiving the entry words which are indicated by match class indications to have beginning characters which are acceptable misspellings of the query word stem, the program PQAPCNTRL determines if such entry words have acceptable suffixes for the given query word. This determination is made using the suffix class indication (previously determined) for the query word to locate acceptable suffixes for the entry words. Those entry words which have acceptable suffixes are acceptable inflections of the query word. All of the family of entry words which are both acceptable misspellings and acceptable inflections of a particular query word are preferably separated from the other entry words in the family by placing into the RESULTS buffer (FIG. 7) in RAM 1104 a packet or representation for each acceptable entry word. The packet or representation consists of pointers or other coded information which are used to locate data concerning each of these entry words in the data base on disk storage device 1107. All packets for one query word are arranged in a package. A different package of packets is formed for each query word. Each word package is returned to the program QDETWD via the program QFLPKG in the external circuits.
After all the query words and the corresponding family of entry words for each query word of a particular query are processed, as discussed above, and a package for each query word is formed and returned to the program QDETWD via QFLPKG, the program QPCNTL passes the packages to the program QDPCMD which determines the set of paragraphs in the data base that contain the words of the packages. These paragraphs are then scored according to the number of packages represented in the paragraphs. Representations of the matching paragraphs are then sent to and visibly displayed to the user on the cathode ray tube (CRT) 1102b in terminal 1102 for use in locating desired data in the textual data base. Preferably the paragraphs are returned to the user and displayed on the CRT 1102b in decreasing value order by score, the paragraph with the highest score being returned first.
Although the details thereof are not disclosed herein, preferably the program QPCNTL calls a program QSS which expands the packages for each query word into synonyms and their acceptable misspellings and inflections using the techniques disclosed herein. The program QSS uses the program QFLPKG to perform the expansion. The program QPCNTL then passes the expanded packages to the program QDPCMD.
II. HARDWARE--QAP CONTROL BOARD AND EXTERNAL CIRCUITS OF FIG. 1
FIG. 1 depicts the overall digital data processing system in which the method and means according to the present invention is embodied. The query assist processor board (QAP control board) 1109 includes a microprocessor 1118 along with associated memory and a microprocessor based misspelling classification system (MCS) 1114 and a first-in first-out (FIFO) memory 1130 which serves as a buffer to pass data between the MCS 1114 and other circuits in the QAP control board 1109. Bus 1110 is provided for communication between the QAP control board 1109 and external circuits depicted to the right of the broken line in FIG. 1, including interface 1103 to operator console 1102, microprocessor 1108, random access memory 1104, read only memory 1106, and disk controller 1105 which in turn interfaces with disk storage device 1107. The QAP control board 1109 is a master on the bus and includes arbitration circuits 1172 which interface the microprocessor 1118 to the bus 1110.
The QAP control board 1109 provides the inflection and misspelling tolerance methods and means employed in connection with textual query functions. The microprocessor 1118 executes computer programs stored in read only memories to determine acceptable inflection forms of data base words for a given query word. The MCS 1114 classifies data base entry words into one of the above mentioned six misspelling classes or categories with respect to any significant query word stem.
Consider the circuits of the QAP control board 1109 in more detail. Clock signals for the microprocessor 1118 are derived from a conventional clock generator and driver 1150. The clock driver 1150 additionally cooperates with wait-state logic or generator 1154 to provide a ready signal to the microprocessor 1118 for slow memory and peripheral access.
The microprocessor 1118 is a conventional programmed microprocessor of the type 8086 manufactured by the Intel Corporation, the details of which are disclosed in the MCS-86 User's Manual published by the Intel Corporation and dated February 1979, the contents of which are incorporated by reference herein. Microprocessor 1118 performs the processing functions and generates the address and control signals required to access read only memories hereinafter referred to as ROM 1122,1124 and random access memories hereinafter referred to as RAM 1126,1128 and its I/O devices including the MCS 1114 and the external circuits.
Status lines S0, S1 and S2 out of microprocessor 1118 are driven by the microprocessor and are connected to a control unit 1176, arbitration circuit 1172 and a control circuit 1156. Address and data bus lines from microprocessor 1118 are coupled by way of address and data bus 1160 to a transceiver 1178, onboard address latch 1182, address latch 1158, and data transceiver 1162. I/O QAP decoder 1180 and RAM/ROM decoder 1186 are connected by control bus 1183 to on/off decoder 1184. The output of on/off decoder 1184 is connected to the input of arbitration circuit 1172. An address bus 1168 couples address lines to the address lines of the interrupt controller 1132, timer 1131, ROM 1122,1124, RAM 1126,1128, I/O QAP decoder 1180, RAM/ROM decoder 1186, and onboard address latch 1182. Data bus 1170 is coupled to the data lines of interrupt controller 1132, timer 1131, FIFO 1130, ROM 1122,1124, RAM 1126,1128, and transceiver 1178. A control bus 1188 is coupled to the control circuits of interrupt controller 1132, timer 1131, FIFO 1130, ROM 1122,1124, RAM 1126,1128, RAM/ROM decoder 1186, I/O QAP decoder 1180, wait-state generator 1154, and controller 1176.
Data bus 1216 couples data between the output portion of FIFO 1130 and the circuits in the MCS 1114. Address bus 1190 couples the output of address latch 1158 to the input of QAP attention circuit 1140 and to bus 1110.
Control unit 1156 is connected to bus 1110 by control bus 1166. Arbitration bus 1192 couples the bus 1110 to arbitration circuit 1172. Data bus 1165 couples the data transceiver 1162 to bus 1110. QAP swap byte logic 1174 has its control inputs connected to the address bus 1190 and an output of control unit 1156 and its output is connected to data transceiver 1162.
Briefly, the QAP control board 1109 operates as follows: The microprocessor has two phases of operation. One is to apply an address on data bus 1160 and the second phase is to either apply or receive data on data bus line 1160. By way of explanation, the data on data bus 1160 is 16 bits in length. The microprocessor 1118 initially puts out an address that is latched by the address latch 1158. At the same time the microprocessor 1118 applies status signals at the S0, S1 and S2 outputs which indicate whether an address is being provided and is a part of a memory cycle or is part of an I/O cycle, or whether there in fact is no memory or I/O cycle but an interrupt acknowledge cycle. The status signals enable the controllers 1176 and 1156 to take control later in the sequence, depending on the type of status signal and the cycle required. The onboard address latch 1182 will enable the address provided by the microprocessor 1118 through to the decoders 1180 and 1186. The decoders 1180 and 1186 will determine if the address is a valid onboard address (i.e., an address to be used within QAP control board 1109) and if so will select the addressed onboard circuit for subsequent transfer. The circuits which are addressed and selected are the following: RAM 1126, ROM 1124, FIFO 1130, timer 1131, and interrupt controller 1132. The decoders 1180 and 1186 also apply signals on control bus 1183 to on/off decoder 1184, enabling it to make a decision as to whether the address is an onboard or an offboard address. If decoder 1184 determines that this is not an onboard address, the address is used for addressing the external circuits bus 1110.
The decoder 1184 applies a signal to the input of arbitration circuit 1172, notifying it that it is responsible for obtaining control of bus 1110. In this regard there are multiple masters on bus 1110 whereas only one master can be putting addresses and data on the bus 1110 at any given time. The arbitration circuit 1172 via arbitration bus 1192 applies a signal on bus 1110 indicating that it is now in control and that none of the other circuits are to use the bus. This signal is sensed by the other circuits which demand access to bus 1110 and accordingly the other circuits do not attempt to apply conflicting signals on the bus.
Once the arbitration circuit 1172 has assumed control over the bus 1110, it passes signals along to control circuit 1156, causing it to apply control signals on the bus 1110 through bus 1116. Arbitration circuit 1172 also applies a signal to address latch 1158 causing it to put the address that was previously latched into the address latch 1158, onto bus 1110.
The control circuit 1156 additionally passes a signal along to data transceiver 1162 and QAP swap byte logic 1174, indicating the type of data transfer now in progress and whether it is data that is moving from address and data bus 1160 to bus 1110 or from bus 1110 to the address and data bus 1160. The controller 1156 in addition passes signals along the circuits 1162 and 1174 indicating whether the data transfer involves 16 bits or 8 bits and, if 8 bits are being transferred, where the 8 bits should be located on the 16 bit bus, i.e., whether it is in the low or high position on the bus.
Any transfer that is either onboard or offboard the QAP control board 1109 must be terminated through the wait state generator 1154. Accordingly the wait state generator receives a signal from bus 1110, causing the wait state generator 1154 in turn to control the clock driver 1150, putting the microprocessor 1118 into a wait state, waiting for data to be transferred.
The wait state generator determines the end of a data transfer when the microprocessor puts an address out to any device whether it is in the external circuits or onboard in the QAP control board 1109. As to data transfers to the ROM 1122,1124 and the RAM 1126,1128, the wait is terminated after a fixed time period. As to transfers over bus 1110, the wait state is terminated by the device with which the communication is taking place. For example if communication is taking place with the external RAM 1104, RAM 1104 generates a signal on bus 1110 indicating that it has put its data on bus 1110 and the wait state may now be terminated. The wait state generator 1154 senses the signal and enables the microprocessor 1118 to continue.
The operation is repeated again by the microprocessor 1118 when it applies an address on the address and data bus 1160.
The QAP attention circuit 1140 may be addressed by any of the circuits connected to bus 1110 by putting a special address on the bus 1110. The QAP attention circuit 1140 merely monitors the bus 1190 for a unique address and for a control signal which are applied thereto through busses 1190 and 1166, and responds to the unique address and the control signal to then apply an interrupt signal (called interrupt-3 signal) on line 1142. The interrupt controller 1132 in turn applies an interrupt to the microprocessor 1118. This then will cause the microprocessor 1118 to interrupt the processing on the QAP control board 1109 so that attention can be given to the requesting device. A program contained in ROM 1122,1124 will then determine the proper course of action to be taken. To this end RAM 1104 has a special buffer storage location similar to a mail box. When an interrupt is applied on the interrupt-3 line 1142, the QAP control board 1109 will interrogate the content of the special buffer storage location to determine the device that is desiring attention and to determine what is to be done. More specifically the microprocessor 1118 goes through a procedure whereby the controller 1132 is caused to pass data on bus 1170 to transceiver 1178 which in turn applies signals to the microprocessor 1118, telling the microprocessor 1118 how to handle the interrupt.
Consider now communication with the MCS 1114. An address, unique to the FIFO 1130, from microprocessor 1118 is placed on bus 1160 to indicate that a transfer is to take place to the FIFO 1130. The microprocessor 1118 must communicate with the MCS through FIFO 1130. The FIFO 1130 has an address 3ust like any other memory (i.e., ROM 1122, 1124, RAM 1126,1128). However it is considered an I/O device rather than a memory device. The microprocessor 1118 forms the address of FIFO 1130 and status signals are formed on lines SO, S1 and S2 indicating that this is to be an I/O write cycle. The address is latched by the onboard address latch 1182 and is applied on address bus 1168. The I/O QAP decoder 1180 decodes the address, determines that it is for FIFO 1130, and signals the FIFO 1130 to receive data. The data is subsequently applied on the address and data bus 1160 and the transceiver 1178 applies the data onto bus 1170 from which the data is stored into the input portion of FIFO 1130.
The data applied into the input portion of FIFO 1130 directs the operation of MCS 1114. The MCS 1114 goes through its operation as described in detail hereinafter and, assuming misspellings are found, the information is stored in the output portion of FIFO 1130. The MCS 1114 then applies an interrupt signal on line 1133 to the interrupt controller 1132 which in turn will interrupt the processing by microprocessor 1118 and provide signals to the microprocessor 1118 which represent that data is available in the output portion of FIFO 1130 from the MCS. The microprocessor 1118 then inspects the data in the output portion of FIFO 1130 and determines what action is required.
The circuits hereinabove described may be any one of a number well known to those in the computer art. Examples of some of these circuits will now be given.
The control unit 1156 may be a bus controller of type 8288 manufactured by the Intel Corporation and disclosed in the above referenced MCS-86 User's Manual. The programmable timer 1131 may be the programmable interval timer type 8253 manufactured by the Intel Corporation and disclosed in the above referenced MCS-86 User's Manual. The interrupt controller 1132 may be the type 8259A manufactured by the Intel Corporation and disclosed in the above referenced MCS-86 User's Manual. RAMs 1126 and 1128 are each preferably composed of static integrated circuit chip circuits well known in the computer art which are accessed via address bus 1168 only by the QAP conrol board 1109 and not by the external circuits. Variables and constants used by the various programs which are executed by the microprocessor 1118 are stored in predetermined locations in RAM 1126,1128 as will be described.
The external circuits include a standard operator terminal 1102 having a keyboard 1102A on which an operator composes textual information including queries. The terminal decodes the keyed textual information and provides machine readable binary coded output signals representing the text. Terminal 1102 also includes a cathode ray tube (CRT) display 1102B on which the results of textual queries are displayed for observation and use by the operator. A conventional interface 1103 connects the terminal 1102 to the bus 1110.
The microprocessor 1108 may be any one of a number of types of computer program controlled microprocessors well known in the computer art and for purposes of illustration is the type 86/12 manufactured by the Intel Corporation and disclosed in the above referenced MCS-86 User's Manual. The computer program for controlling the operation of microprocessor 1108 is stored in external ROM 1106. ROM 1106 is a conventional integrated circuit read only memory well known in the computer art. The random access memory 1104 is the scratch pad memory for the microprocessor 1108. Program variables used by microprocessor 1108 are stored in RAM 1104. Also, the family of entry words from the data base in disk storage device 1107 which correspond to each signficant word of the query are stored in RAM 1104 by microprocessor 1108 before being transferred to QAP control board 1109. After QAP control board 1109 determines acceptable misspellings and acceptable inflections from among the family of entry words, packets are stored in a RESULTS buffer in RAM 1104 which in turn can be used to locate these entry words. Other aspects of the operation of the system will be evident from the above identified referenced manual.
III. HARDWARE--MCS OF FIG. 11
The misspelling classification system (MCS) 1114 embodies means for performing the misspelling classification method according to the present invention. The MCS 1114 receives queries, entries and related parameters through the input portion of first-in first-out buffer (FIFO) 1130. The input portion (I) of FIFO 1130 is the interface of the MCS 1114 with the microprocessor 1118 (FIG. 1). The MCS 1114 performs special misspelling classification methods and returns results to the microprocessor 1118 through the output portion (O) of FIFO 1130 and generates an interrupt signal to notify microprocessor 1118 of completion of its operation.
FIG. 11 is a block diagram of MCS 1114 and also shows the FIFO buffer 1130 and buses 1216 and 1218 (see FIG. 1). Control bus 1218 and data bus 1216 to which the FIFO 1130 is coupled provide the main communication with the rest of the QAP control board 1109 (see FIG. 1). The heart of MCS 1114 is a program control microprocessor 1240. Although the microprocessor 1240 may be any one of a number well known in the computer art, preferably it is the 8.times.300 Bipolar Microcontroller manufactured by the Signetics Corporation and described in Signetics 8.times.300 Reference Manual printed in the United States in October 1977 by the Signetics Corporation, the contents of which are incorporated by reference herein.
The microprocessor 1240 is controlled by a computer program in the form of firmware stored in a programmable read only memory (PROM) 1236. In a preferred embodiment, two PROMs are employed. However the number is of no special significance to the present invention. A random access memory (RAM) 1242 is a high speed static RAM used as a scratch pad memory by microprocessor 1240. Again the number of RAMs may vary depending on the particular application.
I/O port 1250 is a conventional input/output port used to provide interrupts back to the microprocessor 1118 to control the operation of the input and output portions of FIFO 1130 and to monitor the status of the input and output portions of FIFO 1130. The output port 1250 generates an interrupt signal on line 1133 which is connected to the interrupt controller 1132 in QAP control board 1109 (FIG. 1) when a result is ready in the output portion of FIFO 1130.
I/O ports 1152 and 1154 are connected between address and data bus 1156 and the input and output portions of FIFO. The I/O port 1252 inputs 16 bits of data from the input portion of FIFO 1130 via data bus 1156 to the microprocessor 1240. The I/O port 1254 forms an output port which outputs 8 bit results from the microprocessor 1240 to the output portion of FIFO 1130. 16 bits at a time are input to FIFO and only 8 bits at a time are output because a greater amount of information is sent to the MCS than is returned. By way of example, not all entries that are sent to the MCS are returned as acceptable misspellings to the microprocessor 1118 in the QAP control board. The details of suitable I/O ports will be evident to those skilled in the art. However for purposes of explanation the I/O ports may be of the type 8T32 manufactured by Signetics Corporation and disclosed in the above referenced Signetics 8.times.300 Reference Manual.
Address latch 1158 is provided between the address and data bus 1156 and RAM 1242. The address latch 1158 is an output address latch to latch a specific address into RAM 1242 when microprocessor 1240 accesses the random access memory 1242. Although the address latch 1158 may be any one of a number of types known to those skilled in the art, preferably it is of the type 8T31 manufactured by the Signetics Corporation and disclosed in the above referenced Signetics 8.times.300 Reference Manual.
A typical flow for a query an entry through the QAP control board 1109 will now be described making reference to FIGS. 1 and 14. When the microprocessor 1118 on the QAP control board 1109 has an input for the MCS 1114, it deposits a query, entry, or control sequence into the input portion (I) of FIFO 1130 by doing an I/O output instruction. To the microprocessor 1118 the input portion of FIFO is just another I/O port, by way of example 64.times.16 bits deep, at the same I/O port address. Each time the microprocessor 1118 does an output instruction to the FIFO, a 16 bit word will ripple into the input portion of FIFO until it becomes full. At this time if the MCS 1114 does not empty the input portion of FIFO and microprocessor 1118 tries to do another output to FIFO, wait generator 1154 (FIG. 1) is activated, generating wait cycles to microprocessor 1118 until the MCS starts removing data from the input portion of FIFO. This situation, however, seldom occurs because the MCS unloads the input portion of FIFO at a much faster rate than microprocessor 1118 writes into the FIFO. The first priority for the MCS is to keep scanning the input portion of FIFO to see if there is any data there to be loaded. To this end the MCS 1114 tries to unload data from the input FIFO into its scratch pad in RAM 1242 as fast as it can until the whole entry, query, or control sequence is loaded. The MCS 1114 then performs the method controlled by the misspelling classification program (MCS) 1234 discussed in connection with FIG. 2. The MCS program as indicated in FIG. 11 is stored in PROM 1236 based on the data in RAM 1242 and sends the results back to the microprocessor 1118 through the output portion of FIFO 1130. When the whole result is deposited into the output portion of FIFO, the MCS will generate an interrupt on line 1133 to notify microprocessor 1118. Subsequently microprocessor 1118 starts unloading the output FIFO by doing an I/O input instruction.
IV. METHOD AND MEANS EMPLOYING QAP CONTROL PROGRAM (PQAPCNTRL)
A. Construction and use of Classify.sub.-- Table 1200, Suffix.sub.-- Strip.sub.13 State.sub.13 Table 1201, Acceptable.sub.-- Suffix.sub.-- Table 1202 and Suffix.sub.-- Table 1204.
Four tables are used in the process and method for stripping suffixes from query words, for classifying the suffix of the query word, and for determining whether entry words (from a family of entry words) are acceptable inflections of the corresponding query word. These tables are CLASSIFY.sub.-- TABLE 1200 depicted in FIG. 4A, the SUFFIX.sub.-- STRIP.sub.-- STATE.sub.-- TABLE 1201 depicted in FIG. 4B, the ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE 1202 depicted in FIG. 4, and the SUFFIX.sub.-- TABLE 1204 depicted in FIG. 5. Each of these tables is stored, is addressable and is accessible by the data processing system in a different prefixed location in the ROM 1122,1124 of the system of FIG. 1. The CLASSIFY.sub.-- TABLE 1200 and the SUFFIX.sub.-- STRIP.sub.-- STATE.sub.-- TABLE 1201 are local variables used by the PSUFIX program (hereinafter described in detail) (see Table 6). The ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE 1202 and the SUFFIX.sub.-- TABLE 1204 are local variables used by the PQAPCNTRL program (see Table 5).
Before constructing the above tables the designer must create a set of stripping rules and corresponding acceptable suffix lists. These rules and lists are created by observing the particular written language of interest and noting character patterns that appear at the end of related words. By way of example, the following are four groups of words:
Group 1. RAT/E, RAT/ES, RAT/ED, RAT/ING
Group 2. INTEREST, INTEREST/ED
Group 3. HELP, HELP/S, HELP/ED, HELP/ING
Group 4. COMPUT/E, COMPUT/ES, COMPUT/ING.
Each of the words in each group is considered by the designer as an acceptable inflection of the other words within the same group. Also the designer considers which portion of each word is to be the stem and which is to be the suffix for the word, again in accordance with the language of interest. The above examples are given considering the characters to the left of the slash as the stem, and the characters to the right of the slash as the suffix.
The designer must consider what endings or the lack of an ending (i.e., a null) for a word are acceptable suffixes in forming acceptable inflections. For example the word RAT would not be an acceptable inflection of any of the words in Group 1. Similarly, the word HELPMATE would not be an acceptable inflection for the words in Group 3; the word COMPUTER would not be an acceptable inflection for the words in Group 4.
Accordingly the designer of the tables creates a set of stripping rules and corresponding acceptable suffix lists for use in designing the four tables.
FIG. 12 is one example of such a table including stripping rules, acceptable suffix lists, and examples of words with acceptable suffixes. Certain symbols are used for convenience. For example, the symbol "-" means that any character can be in that position; the symbol "/" means that this is the strip point between the characters of the stem and the suffix; and "null" means a suffix of zero or no characters. Each numbered row corresponds to a different rule. The rules and rows are given the same number for convenience.
Consider by way of example rule 1 in the row No. 1, Table 12. The symbol -/E indicates that the letter E is a suffix when it follows any letter of the stem. To the right in row No. 1, acceptable suffixes for the suffix E are E, ED, ES, and ING, and examples of words employing these acceptable suffixes are RATE and TABLE. In most cases the stem created can have any one of the acceptable suffixes added to the stem to create words that are similar enough to the original entry word to be returned for processing as an acceptable inflection of the original query word. As explained above, if the original word were RATE, then the stem would be RAT and the list of acceptable inflections for that stem would be RATE, RATED, RATES, and RATING.
Referring to rule 2, row No. 2, the symbol -/ES means that the letters ES form an acceptable suffix following any letter in the stem. To the right in row No. 2 it will be noted that acceptable suffixes for the suffix ES include E, ED, ES, and ING. Examples of words employing those suffixes are STATES and COMPUTES. Similar analysis may be used for examining the rules of acceptable suffixes and examples for the rules in rows Nos. 3, 4 and 5. Row No. 6 shows as a rule -/null. This means that any character, or combination of characters, not included by the rules in rows Nos. 1 through 5 is considered as a default or nonstrippable ending and therefore not a suffix. To the right in row No. 6 where there is no suffix on a word, acceptable suffixes would be null, S, ES, ED, and ING. Examples of such words which do not have a strippable suffix are HEBREW, CREDIT, and INTEREST.
When creating the stripping rules and acceptable suffixes, certain exceptions may be determined for simplicity and ease of implementation. Examples of such exceptions for Table 12 are noted toward the bottom of the table. For example if the stem length is less than two characters long, or there are no vowels in the stem and the suffix is not an "S", then no characters will be stripped and the default nonstrip rule indicated at row 6 of Table 12 is used in determining the acceptable suffixes. A further exception occurs if the stem is all numerals. If the stem is all numerals then the acceptable suffixes are only null and "S".
Considering the design of Table 12 in more detail, each rule (i.e., each row) is formed recognizing one particular character pattern class at the end of a word and is intended to be used to strip off zero or more characters. Additionally a suffix classification value (herein sometimes called a suffix classification indication) is assigned to the character pattern class which has associated with it a set of acceptable suffixes. The set of acceptable suffixes for each different valued suffix classification indication when added to the stem created from the query word, will produce words considered similar enough to the original query word to be acceptable inflections of the query word. To be explained in more detail, a suffix classification indication is formed for each query word and is later used to determine whether entry words (whose beginning characters have been determined to be acceptable misspellings of the query word) are proper inflections of the query word. This is done by comparing the suffix of the entry word with the acceptable suffixes in the class designated by the suffix classification indication.
Once a set of rules is created, such as in Table 12, it is important to remove any ambiguity within them so that no word can be stripped in more than one way. The rules should be placed in the table in the order in which they are to be applied in stripping suffixes from the query word. Note that in Table 12, rules 2 and 5 are not ambiguous. The reason is that rule 2 is applied first and if it fails then rule 5 is applied.
Once the rules have been created, an inventory is made of all possible characters that must be recognized in the rules. This inventory of characters is then used in creating the CLASSIFY.sub.13 TABLE (FIG. 4A). The CLASSIFY.sub.13 TABLE maps all possible characters to be encountered in the data base into classes of characters that can be treated as equivalents. With reference to Table 12, there are seven character classes, as follows: D, E, G, I, N, S, and all other characters. Each class of characters is then assigned a value called a character type. The characters of the query are represented in ASCII coded characters. The CLASSIFY.sub.13 TABLE is used to convert each ASCII coded character to its character type. In the example, seven different character classes and hence character types are employed. Table 8 is included as one example of the values of the character types assigned to the letters of the English alphabet. By way of example the seven character classes are assigned decimal character types 0 to 6. As generally illustrated in Table 8 and in FIG. 4A, the character types are stored in sequential addressable memory locations of the CLASSIFY.sub.-- TABLE 1200 and are accessible using the ASCII coded characters as addresses. In this manner an ASCII coded character for a particular character of the query word can be used to address the CLASSIFY.sub.13 TABLE 1200 and read out the corresponding character type from the corresponding location of the table. By way of example the character types for the ASCII coded characters A, B, D, and Z are 0, 0, 1 and 0, and are stored at locations 1200a, 1200b, and 1200c of the CLASSIFY.sub.-- TABLE 1200 in FIG. 4A.
Once the CLASSIFY.sub.13 TABLE 1200 is constructed, the SUFFIX.sub.-- STRIP.sub.13 STATE.sub.-- TABLE 1201 depicted in FIG. 4B is constructed. By way of example the SUFFIX.sub.-- STRIP.sub.-- STATE.sub.-- TABLE 1201 is shown arranged into rows and columns with a node at the intersection of each row and column. Each column of the table is assigned a value corresponding to a different one of the character types contained in the CLASSIFY.sub.-- TABLE 1200. By way of example, the columns of the table are numbered from left to right, 0 through 6 (FIG. 4B) corresponding to the different possible character types. The table further has rows corresponding to states. Each row or state of the table represents a step in the process of examining characters at the ends of the query words. Each state represents a particular character context present at the end of a query word at a point in processing. Thus each row or state of the table has as many nodes as there are character types.
Stored at each node are two values, an "ACTION CODE" value (hereinafter sometimes referred to as ACTION or A), and a "NEXT state/selection code" (hereinafter sometimes referred to as NEXT or N). ACTION (A) and NEXT (N) for node 1201A are depicted by the symbols A/N in FIG. 4B by way of example. It will be understood that values for ACTION (A) and NEXT (N) are also contained in other nodes scattered throughout the table of FIG. 4B as required for the particular language and acceptable suffixes to be stripped.
The value of ACTION at each node represents the performance of specific actions that will occur. The value of NEXT at each node is either the next state (or row) of the SUFFIX.sub.-- STRIP.sub.-- STATE.sub.-- TABLE 1201 which the machine will access, or else it will be a selection code (or suffix classification indication) representing a list of acceptable suffixes (and implying that no further state table processing is to be performed).
Table 9 gives an example of the SUFFIX.sub.-- STRIP.sub.-- STATE.sub.13 TABLE for rules and acceptable suffixes depicted in Table 12. The right side of Table 9 provides a state description for ease of understanding. The action to be taken for each ACTION (A) value in the suffix table is indicated at the bottom of the table. Some of the values corresponding to NEXT (N) have an "S" preceding the value. This means that it is a state value and is to be used to select a row of the SUFFIX.sub.-- STRIP.sub.-- STATE.sub.-- TABLE. A value for NEXT (N) without an S means a selection code and is to be used as a suffix class indication. By way of example, NEXT (N) value S4 means state 4 corresponding to row 4 of the SUFFIX.sub.-- STRIP.sub.-- STATE.sub.-- TABLE whereas the NEXT (N) value 4 alone means a selection code or suffix class indication of 4.
The SUFFIX.sub.-- TABLE 1204 (FIG. 5) is constructed by creating a unique list of all possible suffixes from all the acceptable suffix lists in Table 12. This list is then ordered and structured in the most convenient manner for accessing. Preferably this is in order by length of the suffixes. Referring to FIG. 5, the table is arranged into rows and columns with a node at the intersection of each row and column. The node in the first column 1210 of each row is a character count which specifies the number of characters in the suffix represented to the right. The nodes in the columns 1212 to the right of the first column 1210 are numbered consecutively starting with 0. Each node is a position for storing an ASCII coded character representing a character of the suffix. Thus each row contains one or more characters making up the character or string of characters of a different acceptable suffix contained in the list of acceptable suffixes in Table 12. The rows in the SUFFIX.sub.-- TABLE 1204 are numbered consecutively beginning with 0. To be explained in more detail, the number for each row is called a suffix index. Suffix indications are stored in the ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE 1202 and are used for addressing the corresponding numbered rows of the SUFFIX.sub.-- TABLE 1204 to thereby read out therefrom the character or string of characters for an acceptable suffix.
The ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE 1202 (FIG. 4) is used to locate a list of all of the acceptable suffixes for a particular suffix class indication (WDSELECT) or class of suffixes. To this end the designer creates a list of all of the unique acceptable suffixes for each different suffix class indication (WDSELECT) and in place of the actual suffixes, places the corresponding suffix indication. All of the resultant suffix indications for each suffix class indication in the list then become pointers to the rows of the SUFFIX.sub.-- TABLE 1204 where the actual suffixes can be located and read out.
The ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE 1202 (FIG. 4) is then constructed, by way of example, in rows and columns with a node at their intersections. The rows are numbered consecutively starting with 0 corresponding to the suffix class indications (WDSELECT). Therefore the rows in the tables are addressable using the suffix class indications. All of the suffix indications for a particular suffix classification indication are stored in the corresponding rows at the nodes in columns 1208 of Table 4. The nodes in each row in columns 1206 (which is to the left of columns 1208) store a count value which gives the number of suffix indices in the corresponding row.
Tables 11 and 10 depict examples of the content of the SUFFIX.sub.-- TABLE 1204 and the ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE 1202 for the rules and acceptable suffixes depicted in Table 12. Each row of the ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE 1202 corresponds to a different valued acceptable suffix indication. By way of example, the acceptable suffix indication 0 corresponds to row 0 of Table 10 (and hence row 0 of FIG. 4) and the acceptable suffix indices are 0 and 2. With reference to Table 11, an acceptable suffix index of 0 has zero or no acceptable suffixes. An acceptable suffix index of 2 contains one acceptable suffix character, namely, an S.
Refer now to Tables 9 through 12 and consider how they implement the rules and acceptable suffixes represented in the design Table 12. Consider by way of example the query word RATE and the entry word RATES, RATED, RAT, and RAPTING. The stem of the query word is RAT. The beginning letters of the entry word are all acceptable misspellings (where that term includes an exact match) of the stem RAT. This includes the word RAPTING which has a single character insertion error and is therefore also an acceptable misspelling. Consider how the digital data processing system will use the tables to first strip the suffix from the query word RATE, derive a suffix class indication and later select the acceptable suffixes corresponding to the suffix class indication, and compare them with the suffixes of the entry words to determine those entry words which are acceptable inflections of the query word.
The data processing system first strips the suffix from the query word RATE leaving the suffix "E". To this end the ASCII coded representation of the character E (in the query word RATE) is used to address the CLASSIFY.sub.-- TABLE 1200 (Table 8, FIG. 4A) and to read out from the corresponding location the character type value 2. The SUFFIX.sub.-- STRIP.sub.-- STATE.sub.-- TABLE 1201 (Table 9, FIG. 4B) is then accessed. The node at the intersection of row 0 column 2 corresponding to an initial state of 0 and character type 2 is addressed (using the values 0 and 2) and the following is read out:
ACTION(A)=1, NEXT(N)=3.
The ACTION (A) value 1 causes the SIZE value for the query word to be decremented by 1 thereby going from 4 to 3 characters in length, stripping off the character E, leaving the stem RAT. Since NEXT (N) does not include an "S", the suffix has been completely stripped and the value in NEXT (N) is a suffix classification indication, and no further states or operations using Table 1202 are required. The suffix classification indication 3 is then stored as a variable called WDSELECT. The system then compares the stem RAT with the beginning characters of the entry words RATES, RATED, RAT and RAETING and determines that all of the entry words are acceptable misspellings of the stem and therefore returns them as acceptable misspellings. To be explained in another section, a misspelling class indication is also formed indicating the class of misspelling of each entry word which is returned. This indication is used to determine the positions of the suffices in the entry words.
The returned entry words are now checked for acceptable inflections of the query word. Specifically the suffix classification indication (WDSELECT=3) is used to form an address into the ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE 1202 (FIG. 4, Table 10) and accordingly the suffix indices 1, 3, 4 and 5 are read out. These suffix indices form the variable SUFF.sub.-- IX which is used to form and address the correspondingly numbered rows of the SUFFIX.sub.-- TABLE 1204. The correspondingly numbered rows 1, 3, 4 and 5 are read out causing the following list of acceptable suffixes (in ASCII code) to be formed: E, ED, ES, and ING. The position of the suffixes in each of the returned entry words is determined using the size value (SIZE) for the stem of the query word and the misspelling class indication. The list of acceptable suffixes is then compared with the suffixes determined in each of the returned entry words and equality is detected between the suffix list and the suffixes in the entry words RATES, RATED and RAPTING. Therefore these words are determined to be acceptable inflections as well as acceptable misspellings of the query word. Lack of equality is detected with the null suffix in the word RAT and therefore the word RAT is rejected as a nonacceptable inflection. Thus rule 1 (row 1) of Table 12 is implemented or defined in the four tables.
Rule 5 in Table 12 is -/S. Although this rule appears similar to rule 1, i.e., -/E, it cannot be handled in the initial state as was the case with rule 1. This is because rule 2, which is -/ES, also has an S at the end following the letter E. This means that, during stripping of the suffix from the query word, rule 2 must be checked first, using Table 1201. If rule 2 is not satisfied upon examining the character preceding the letter S in the query word, then rule 5 will be in effect. This process is implemented in Table 1201 of Table 9 in the column corresponding to character type 6 for an S in row 0. At the intersection of these two columns the ACTION (A) is 0 and the NEXT (N) value is S3. The ACTION (A) value means that no action is to be performed and the value S3 means that the value 3 is to be used as the NEXT value for addressing and accessing the correspondingly numbered row of the SUFFIX.sub.-- STRIP.sub.-- STATE.sub.-- TABLE. In the state row corresponding to the NEXT value 3, each of the nodes is the same except for that under character type 2 for an E. When the ACTION (A) and NEXT (N) values are read out from the table at this node, the ACTION (A) value 2 causes the stem length value to be decremented by 2, thereby stripping off the letters ES and the NEXT (N) value 3 (not having an S preceding it) is a selection code or suffix classification indication. The NEXT (N) value 3 will subsequently be used for accessing row 3 of the ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE 1202 (Table 10) for reading out the corresponding acceptable suffix indices which in turn will be used for accessing the SUFFIX.sub.-- TABLE 1204 (Table 11) to derive the list of acceptable suffixes. Referring back to Table 9 it will be noted that the rest of state row 3, other than that under character type 2 for an E, has an ACTION (A) value of 1 and therefore will only cause the stem length value to be decremented by 1, thereby stripping only the S at the end of the word and hence satisfying rule 5 (Table 12).
Rule 4 of Table 12 is -/ING and is handled in Table 1201 (Table 9) by three states, namely, 0, 2 and 4. The character G encountered in initial state 0 causes no stripping, and processing continues at state 2. If an N is encountered in the query word during state 2, the G is stripped by decrementing the stem length by 1, and state 4 is next entered. If in state 4 the letter I is encountered, then using the character type 4 and the state row value 4, the corresponding node is read out causing an ACTION (A) value 2 and a NEXT (N) value 4 to be read out. The ACTION (A) value 2 will cause the stem length to be decremented by 2, thereby stripping the N and I from the query word, and the value 3 is used as a suffix classification indication for accessing the ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE 1202 and subsequently the SUFFIX.sub.-- TABLE 1204 (Tables 10 and 11). Returning to Table 9, if in state 2 anything other than the character N is encountered in the query word, then ACTION (A) value 0 and NEXT (N) value 1 are read out causing a DO NOTHING condition (as far as stripping is concerned) and an acceptable suffix indication of 1, which with reference to Tables 10 and 11 will cause the acceptable suffixes for rule 6 to be obtained from Table 11. If in state 4 anything other than an I is encountered then, with reference to Table 9, ACTION (A) value 3 and NEXT (N) value 1 are read, causing the stem length to be increased by 1, adding back on the stem that was stripped during the preceding state. Again the NEXT (N) value 1 then forms an acceptable suffix indication corresponding to row 1 of Table 1202 (Table 10) which in turn causes the acceptable suffixes at row 6 of Table 1204 (Table 11)to be read out. This then effectively carries out rule 6.
Different methods may be used for implementing the rules of Table 12 and creating the acceptable suffixes. For example by creating more ACTION logic for state 0,column 3, the digital data processing system may automatically determine if the letters "IN" precede the current character G. If so the value NEXT could automatically be set to 4 and the stem length decremented by 3. If the letters "IN" do not precede the current character G, then the data processing system could be arranged for automatically setting the NEXT value to the default selection code (i.e., the suffix classification indication) of 1 which in turn would cause the acceptable suffixes indicated at rule 6 to be read out. For example with the rules of Table 12, the following states are required:
1. Always have an initial state.
2. Rule 1 is satisfied within initial state.
3. Rules 2 and 5 are satisfied with one extra "S" state.
4. Rule 3 is satisfied by one extra "D" state.
5. Rule 4 is satisfied by "G" state and "NG" state.
6. Rule 6 is satisfied by all default nodes of above states.
ACTION codes are created as needed. ACTION codes 0, 1, 2, are more common. ACTION code 3 is required for Rule 4 to counteract unnecessary stripping of "G".
Which implementation is better is sometimes a matter of choice and can effect the simplicity or complexity of the implementation. For example a trade-off is found between the number of ACTION code types and the number of states. For complex sets of rules, more ACTION code types may be a necessity because state proliferation can cause code space problems especially for rule sets that require a large number of character classes.
When a new rule is created in Table 12 there are several ways to integrate or implement the rule in the existing tables. If the rule deals with a new character not dealt with before, it can require the creation of another character type and hence another column in the SUFFIX.sub.-- STRIP.sub.-- STATE.sub.-- TABLE 1201 (Table 9). If there are new suffixes in the acceptable suffix list, then there must be insertions where appropriate within the SUFFIX.sub.-- TABLE 1204 and the ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE 1202 needs to be modified where insertions cause index vales to change for old suffixes. An example is now given of how the designer would go about adding a rule to the Tables 8-12. Tables 8A, 9A, 10A, 11A and 12A show the changes to Tables 8-12 for adding the following:
______________________________________
Rule Acceptable Suffix List
Example
______________________________________
L/Y E, Y, IES ASSEMBLY, PROBABLY
______________________________________
Adding this rule to Table 12 as depicted in Table 12D would require the following:
1. Creating two new character types 7 and 8, respectively, for L and Y in CLASSIFY.sub.-- TABLE 1200 (see Table 8A).
2. Adding two new columns (7,8) to the SUFFIX.sub.-- STATE.sub.-- TABLE 1201 for each of the new character types as well as a new "Y" state row (5) to implement the rule (see Table 9A).
The node in state 0 for the character type 8 ##EQU1## would be accessed if a query word ended in Y. The new rule only applies if the Y is preceded by an L. For this reason, another state (row 5) must be added. The node of ##EQU2## specifies: "Do nothing and go to state 5 to examine the next S5 character in from the end". Since the only interest is in stripping a Y if preceded by an L, only the 7th column of state 5 has an ACTION/NEXT node other than default values. Its value pair of ##EQU3## specifies: "Strip a character (Y) off the end of the word and stop with an acceptable suffix list to be found in row 5 of the ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE 1202" (see Table 10A and 4B below).
All other nodes added by the new row and two new columns are default values to implement the other rules of Table 12.
3. Since the acceptable suffixes Y and IES are new suffixes, both must be inserted at the appropriate locations within the SUFFIX.sub.-- TABLE 1204 to ensure ordering by size and alphabetical order within suffixes of the same size (see Table 11A).
4. As a result of the new rule, the ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE 1202 requires two types of changes:
A. All suffix string indices that were changed by the insertion of the new acceptable suffixes into the SUFFIX.sub.-- TABLE; for example, ED changed from 3 to 4, ES from 4 to 5, ING from 5 to 7.
B. A sixth row (index of 5) must be added to the table to represent this new unique acceptable suffix list of E, Y, IES.
It should be noted that this rule is an example of a situation where the end of the stem as well as the suffix of a query word contributes to the determination of a suffix class indication.
Consider how the suffix classification is determined and how the suffix is stripped for the query word "ASSEMBLY". The character Y causes row 0 column 8 of the SUFFIX.sub.-- STRIP.sub.-- STATE.sub.-- TABLE 1201 (Table 9A) to be accessed, reading out ACTION(A)=0 NEXT(N)=S5. NEXT(N)=S5 is a state code and causes row (or state) 5 of the same table to be accessed. ACTION(A)=0 means that nothing is done to decrease the SIZE value for the query word and therefore no characters are as yet to be stripped.
The next character of the query is L corresponding to a character type of 7. Therefore the node at row 5 column 7 corresponding to NEXT(N) state code of 5 and character type 7 is read out resulting in ACTION(A)=1 NEXT(N)=5. ACTION (A)=1 causes the SIZE value to be decreased by one thereby stripping off the Y from ASSEMBLY. The NEXT(N) value of 5 is a selection code and therefore becomes the suffix classification indication WDSELECT for subsequent use in reading out the acceptable suffix list from row 5 of the ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE 1202 (Table 10A).
Assume instead that the query word were CRAFTY. Row 0 column 8 of the SUFFIX.sub.-- STRIP.sub.-- STATE.sub.-- TABLE 1201 (Table 9A) would have been accessed for the character type 8 for the character Y, resulting in ACTION(A)=0 and NEXT(N)=S5. However, the next character is a T which is a character type of 0. Therefore the system would have accessed row (state) 5 column 0 of the SUFFIX.sub.-- STRIP.sub.-- STATE.sub.-- TABLE 1201 resulting in ACTION(A)=0 and NEXT(N)=1. ACTION(A)=0 is the "do nothing" condition meaning that no characters are stripped. As a result the SIZE value for CRAFTY would remain at 7 and no characters would be stripped. The NEXT(N) value 1 is a selection code and is therefore a suffix classification corresponding to the default, non strip or -/null rule in the ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE 1202 (Table 10A).
B. Operation
Consider briefly the various data structures used by the programs. The data structures are the QRIO structure (Table 3), PARM structure (Table 4), the QA structure (Table 6), and the ENTIN structure (Table 7). Each of these structures is a block of consecutive memory locations. The structures are at fixed locations and each item in the structure is at a prefixed offset in the corresponding block. All of the structures are in internal RAM 1126,1128 except for the QRIO structure which is stored in external RAM 1104.
A shorthand notation sometimes used to refer to the individual items in each structure is by the name of the structure followed by the name of the individual item. For example for the QRIO structure: QRIO.ENTRIES, QRIO.NUMENT, QRIO.PACKETS, etc.; for the PARM structure: PARM.ENTRIES, PARM.NUMENT, PARM PACKETS, etc.; for the QS structure: QS.SIZE and QS.STEM(58). The computer program PQAPCNTRL is executed by the system of FIG. 1 in the QAP control board 1109 and provides the sequencing of operations to determine the acceptable misspellings and inflections of a significant query word from among the family of significant entry words of the data base which begin with the same first two letters as the query word. It is to be noted that the invention is not limited to requirements for a match on the first two characters. For example a match might be required on only the first character or more than two characters. The PQAPCNTRL program is stored in ROM 1122,1124 and is executed out of these memories by the microprocessor 1118.
FIG. 3, composed of FIGS. 3A to 3H, forms a flow diagram illustrating the computer program PQAPCNTRL as well as the resultant method and sequence of operation of the digital data processing means of FIG. 1. FIGS. 3A through 3H use symbolic notations and symbols illustrating the sequence of operation, the meaning of which will become evident from the following discussion. The PQAPCNTRL flow diagrams of FIGS. 3A through 3H are arranged into blocks and are numbered 1-82 for convenience.
Referring to FIGS. 1 and 3, during block 1 of the flow power is turned on in the system of FIG. 1. During block 2 the QAP control board 1109 resets the MCS 1114 to an initial condition, clearing the first-in first-out (FIFO) buffer 1130 to zeros. Additionally, buffers including a variable M8612 (to be explained hereinafter) in RAM 1126,1128 are reset to zero. The variable M8612 is an interrupt flag (see Table 5) and is set to zero in preparation for entering the wait loop in block 3.
All communications between the QAP control board 1109 and the microprocessor 1108 are through a buffer at a predefined location depicted at 1137 in external RAM 1104 (FIG. 1). All information (query word information and entry word information) and status information are passed through buffer 1137. All communications between MCS 1114 and the rest of the QAP control board 1109 are through FIFO 1130. Inputs to the MCS are through the input section of the FIFO whereas outputs from the MCS to the rest of the QAP control board 1109 are through the output section of FIFO 1130.
Table 1 depicts the format of various word and synchronization signals. Five different items are depicted and for brevity are referred to by the table number followed by the number of the item, i.e., item 1 is Table 1-1, item 2 is Table 1-2, etc.
Referring to Table 1-1, following the resets, a 2 byte synchronization pattern consisting of the four hexadecimal digits "AAAA" (AAAAH) is sent by the microprocessor 1118 in the QAP control board 1109 to the input (I) portion of the FIFO 1130 from which it is read by the MCS 1114 to establish communication with the rest of the QAP control board 1109. "H" following a series of characters is used herein to denote a hexadecimal number. The MCS 1114 responds to the synchronization pattern by sending the same 2 byte synchronization pattern AAAAH back to the output (0) portion of FIFO 1130 and by forming an interrupt signal on line 1133 to interrupt controller 1132 which in turn applies an interrupt signal on line 1134. This interrupt causes the computer program PQAPCNTRL to cause the 2 byte synchronization pattern to be removed from the output portion of FIFO 1130.
Block 3 of the flow is a condition where the QAP control board 1109 goes into a command wait loop, waiting for an interrupt from the external microprocessor 1108, which signifies a request for service.
Deviating from the flow for a moment, the QAP control board 1109 communicates with the external microprocessor 1108 through communication buffer 137 in the external RAM 1104 in FIG. 1. A special QAP driver program (not shown) located in ROM 1106 causes command and parameters to be set into the communication buffer in the external RAM 1104, moving them from a buffer where information was set by the program QFLPKG. When the QAP control board 1109 detects an input because of an interrupt from the external microprocessor 1108, it moves pertinent parameters from a QRIO structure of data shown in Table 3 in the communication buffer to a PARM structure, shown in Table 4, located in a predetermined buffer 1125 in internal RAM 1126,1128. When the QAP control board 1109 finishes at the predefined buffer in external RAM 1104 with the results of the request and issues an input/output command which in turn causes an interrupt to the external processor 1108 causing it to process the result.
The QRIO structure (Table 3) includes a command byte. The only commands of interest are a command byte of 0, indicating an initiate query, and a command byte of 1 indicating a continue query. The operations responsive to these commands are discussed hereinafter. The structure also includes a value called ENTRIES which is a pointer to the location in external RAM 1104 where the family of entry words, beginning with the same two letters as the query word, is located. NUMENT is a word value giving the number of entry words in the RAM buffer pointed to by ENTRIES. The value PACKETS in the QRIO structure is a pointer to the beginning of a buffer of packets located in RAM 1104. A PACKET is a fixed length entry which is a complete representation of an entry word in the data base. The packets within this buffer correspond in a one-to-one fashion with the entry words within the buffer pointed to by ENTRIES (i.e., the first packet corresponds to the first entry word). RESULTS is a value in the QRIO structure which is a pointer to the beginning of a buffer in external RAM 1104 to receive those packets corresponding to entry words which are determined to be acceptable misspellings and inflections by the PQAPCNTRL program. NUMAVAIL is a word in the QRIO structure which identifies the maximum number of PACKETS which the RESULTS buffer will hold. A more complete discussion of these items and their use will be given in connection with block 28 of the flow of FIG. 3. QCHARS is a byte which identifies the total number of characters in the query word. QUERY(58) is an array of representations of the actual characters of the query character string which are being processed. The symbol (58) is used herein to indicate that the corresponding field of characters may be up to 58 characters or bytes long. However the invention is not limited to any particular length. A byte of information has 8 bits of information. A character is a byte of information.
Returning to block 3 of the flow, assume that the microprocessor 1108 generates an interrupt to the QAP, signifying a request. The interrupt causes an interrupt handling routine (not shown) on ROM 1122,1124 to store a nonzero value in RAM 1126,1128 in variable M8612 (Table 5). This causes the PQAPCNTRL program during block 4 to clear M8612 to zero and enter blocks 6 through 8. Here PQAPCNTRL causes the command byte in the QRIO structure located in external RAM to be checked. Assume now that during block 8, either an initiate command (0) or a continue command (2) is detected in the QRIO structure in external RAM 1104. This causes the operation of the QAP control board under control of the PQAPCNTRL program to branch through bullet Bl of the flow to block 15A of the flow.
If block 15A of the flow is reached it is because the QAP control board 1109 has received a request from microprocessor 1108 to initiate or continue a query. During block 15A a copy of the QRIO structure in external RAM 1104 (Table 3) is transferred to an identical PARM structure (Table 4) at a fixed location in the buffer in internal RAM 1126,1128. Block 16 of the flow is then entered. Certain initializing steps are now taken (see details in block 16 and Table 5). Representations of all of the items listed in Table 5, with the exception of the ACCEPTABLE.sub.-- SUFFIX TABLE and SUFFIX.sub.-- TABLE, are stored or declared to be in fixed locations of RAM 1126,1128. Representations of the ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE and SUFFIX.sub.-- TABLE are stored in ROM 1122,1124. Block 17 is then entered.
During block 17 the PARM values QCHARS and NUMENT stored in internal RAM 1126,1128 are checked to determine whether they are valid. If QCHARS is greater than or equal to 2, then the query word contains at least two characters and is acceptable. If NUMENT is greater than zero, it means that there are some entry words to compare to the query stem and NUMENT is acceptable. Block 18 is entered and if either of these PARM values is unacceptable, the NO route out of block 18 is followed and block 23 will be entered where a parameter error is logged and the system exits through bullet H2, returning eventually back to block 3 of the flow. Assuming the parameters are acceptable (OK), the YES route out of block 18 is followed to block 19 where a check is made to see whether the command provided in the PARM structure is an initiate command. If the command is an initiate command, then block 24 of the flow is entered where a check is made to see whether a query is already in progress.
The variables depicted in Table 5 include a QUERY.sub.-- IN.sub.-- PROGRESS flag. This flag is set true on the first call to the PSUFIX routine which will occur during the subsequent block 26. During block 24 the QUERY.sub.-- IN.sub.-- PROGRESS flag is checked to see whether it is already true. Assuming this is the first call on the PSUFIX program, the flag will be false and block 26 will be entered. If for some reason the flag is true during block 24, then block 25 is entered where the MCS 1114 and the QAP control board 1109 are reset and a QAP control board error is logged and then the operation returns through bullet H2 eventually back to block 3 where the PQAPCNTRL enters its wait loop.
Assuming no error and block 26 is entered, the PSUFIX program (FIG. 10) which is stored in internal ROM 1122,1124 is now called and executed by the microprocessor 1118. The details of the operation while executing the PSUFIX will be discussed in detail in connection with FIG. 10. However a brief summary will be given at this point.
When PSUFIX is activated the system takes the PARM structure items (Table 4) QCHARS and QUERY(58) and, using the CLASSIFY.sub.-- TABLE (Table 8) and the SUFFIX.sub.-- STRIP.sub.-- STATE TABLE (Table 9) stored in internal ROM 1122,1124, locates the stem and suffix portions of the query word represented by QCHARS and QUERY(58) and forms a suffix class indication (WDSELECT) in the query (QS) structure indicating the one of a plurality of classes in which the identified suffix of the query is included
Table 6 depicts the QS structure which is the structure of information obtained by the call to the PSUFIX program. Referring to Table 6, the system under control of the PSUFIX program returns the values FFH representing a query command, a value SIZE which designates the number of characters in the query word stem, and a string of characters in STEM which are the actual characters in the stem of the query word. In addition a QTYPE value is returned which indicates whether the query is an alphanumeric word (0) or is an all numeric word (1) and the value WDSELECT is returned. WDSELECT is an index into a row of the ACCEPTABLE.sub.-- SUFFIX.sub.-- TABLE and is an indication of a class of suffixes on entry words that are acceptable, given the suffix stripped from the given query word.
Additionally during block 26 the QUERY.sub.-- IN.sub.-- PROGRESS flag (Table 4) is set true as discussed above.
Return now to the flow of FIG. 3. During block 27 the QUERY command portion of the QS structure of Table 6 is transferred through the input portion (I) of the FIFO 1130 to the MCS 1114. The query command flag (FFH) indicates to MCS 1114 that what follows is a query stem and MCS 1114 saves the SIZE and STEM characters in its internal RAM (FIG. 11).
During block 20 of the flow the command in the PARM structure (Table 4) is checked and if the command under execution is not a CONTINUE command, the program continues through bullet Cl to block 28. If on the other hand the command under execution is a CONTINUE command, then block 21 is entered in order to check to see whether a query stem word had been sent to the MCS 1114.
A CONTINUE command is one which indicates that the operation is to continue using a QUERY command previously provided to the MCS 1114. This situation occurs when the number of entry words in the family to be compared to the query word is too large to fit in the buffer in external RAM 1104 provided by the program QFLPKG. An initiate request and possibly one or more CONTINUE requests are then sent to QAP control board 1109 until either all entry words are sent or the result buffer space in external RAM 1104 allotted for misspellings and inflections of the original query word is filled with PACKETS. If a query had not been sent, then block 22 is entered where a continue error is logged and the operation returns via bullet H2 eventually back to block 3 where the system under control of PQAPCNTRL enters its command wait loop.
Assuming that a query word has already been sent to the MCS, block 28 is entered following block 21 or block 20. At this point a program loop is entered to process all of the entry words pointed to by the values ENTRIES and NUMENT in the PARM structure (Table 4). In block 28 an entry word in external RAM 1114 is transferred under control of PQAPCNTRL to the MCS 1114 through the input section of FIFO 1130.
The MCS 1114 in an asynchronous operation compares each of the received entry words against the query stem (STEM, Table 6) returned by a call on the PSUFIX program. Each of the entry words as sent to MCS 1114 has a format of the type depicted in Table 1-4. The parts of the entry word format are as follows: First is an unused byte. The first bit of the unused byte is strapped or set to zero under software control to avoid conflict with other control bytes in the MCS program such as AAH and FFH. Second is a packet index value which is the index of the packet in the PACKET buffer residing in the external RAM 1104 that corresponds to an entry word. Third is a SIZE byte which designates the number of bytes (i.e., characters) in the entry string. ENTRY(58) is the actual characters of the entry, the actual number of which is designated by SIZE. The actual number of entry characters sent is always an even number. For example, if SIZE=8, then 8 characters are sent. If SIZE=9 then 10 characters are sent. The MCS 1114 disregards the added (e.g., 10th) character. This is necessitated by the fact that the input portion of FIFO is 16 bits wide (2 bytes or characters).
Assume that MCS 1114 now has a query stem word and an entry word (see Table 7). The MCS then goes about the operation of comparing the query stem with the characters at the left end of the entry word. The MCS 1114 will only return those entry words which have acceptable misspellings (as defined above) with respect to the query word. It should be noted that the comparison made is only up to the length of the query stem, indicated by the SIZE byte in the query structure QS (Table 6), so that an exact match for instance is returned for a query stem of CAT and an entry word of CATATONIC. In the example for the entry word CATATONIC, ATONIC is considered to be the suffix in later processing.
As indicated in Table 1-5, the format of the entry word returned by the MCS 1114 is essentially the same as the format of the entry word (Table 1-4) which was provided to the MCS except that an MTYPE value is added at the beginning, indicating the type of misspelling classification between the query stem and entry word. Table 2-6 contains a listing of the different misspelling classifications which may be designated by the MTYPE value of the entry word. Although mismatch and substitution are included, they actually are not returned by the MCS to the external microprocessor. It should be understood that the invention is not limited to this arrangement. For example, all of the words could be returned with the MTYPE value indicating the type of error and subsequent processing could be used to eliminate the nonacceptable misspelling classifications.
With reference to Table 5, a variable MCS will be noted. The variable MCS is a location in the buffer in internal RAM 1126,1128 used to store a value which indicates the number of acceptable entries in the output portion of FIFO 1130. When the MCS1114 finds an acceptable entry it forms an interrupt signal on line 1133 which in turn causes interrupt controller 1132 to provide an interrupt on line 1134 and the microprocessor 1118 in turn increments the variable MCS so that the value of the variable MCS will reflect that another entry has been placed in FIFO. To be explained in connection with block 32, the MCS variable is decremented by 1 when an entry is read out of the output portion of FIFO.
The operation during block 28 will become more clear during the following discussion with reference to the example of FIGS. 7, 8 and 9. FIG. 7 depicts a buffer located in external RAM 1104 containing the following entry words: HEBREW, HELP, HELPS, and HEPLS which are to be compared against the query word HELPS. It will be understood that the query word HELPS has been processed, stripping the suffix S, leaving the stem HELP and the stem HELP is to be compared against the entry words in buffer 1402. In the actual operation the stem word has already been sent to MCS 1114 at this point in the operation.
Consider the format of the entry in buffer 1402 in more detail. Each of the entry words is preceded by a length value which gives the number of characters in the corresponding entry word. For example, the first value of 6 indicates a length in characters of the word HEBREW. The buffer 1402 contains all of the family of entry words for the query word HELP (by way of example, HEBREW, HELP, HELPS, and HEPLS), that is, all words beginning with the same first two letters as the query word HELP. It should also be noted that the number of entries in buffer 1402 is indicated by the value NUMENT contained in the PARM structure in internal RAM 1126,1128 (see Table 4).
At the lower part of FIG. 7 buffer areas 1404 and 1406 are shown. Buffer area 1404 is depicted having the same n |