Search improvements for electronic spelling machine5113340Abstract Improvements in a hand-held spelling machine increase the speed with which a query word is compared against the words in memory. One technique is to provide a look up table to encode all character sets of one or two letters into a coded string. Where the set of letters is three or more characters, a previously known algorithm is employed. Search of the memory is limited to only a few main branches of the tree. The limitation is a function of the first query word letter. The time it takes to calculate the similarity function is saved in two circumstances. When a similarity function is calculated at a particular level of the tree and found to be great enough so that there is no prune of the tree, then that decision not to prune is carried forward for other tree branches having the same letters prior to the level involved. Claims What is claimed is: Description A Microfiche Appendix consisting of five sheets having 320 frames plus five test frames is included.
______________________________________
Group Letters
______________________________________
1 G, T, C, D, J, K, Q, and X
2 B and P
3 S and Z
4 F and V
5 M and N
6 W, R, and a leading L
7 L
8 All vowels: A, E, I, O, U,
and Y
______________________________________
A leading L is simply an L as the first letter of the word. An example of an encode string is set forth below. All of the vowel sounds are treated as identical. Using as an example, PHONETIC as the input word (be it a query word or selected word), the encode string would contain the following:
______________________________________
P the ASCII character
TC code 224 transition code, from no
sound to P sound
Group 4 code F sound
H the silent ASCII H character
O the ASCII character
Group 8 code vowel sound
N the ASCII character
TC code 236 transition code, from P
sound to N sound
Group 5 code N sound
E the ASCII character
Group 8 code vowel sound
T the ASCII character
TC code 205 transition code, from N
sound to T sound
Group 1 code T sound
I the ASCII character
Group 8 code vowel sound
C the ASCII character
-- no transition code, since
T and C are in same group
Group 1 code C sound
______________________________________
The phonetic comparison module 34 compares the phonetically encoded query word 16 with the phonetically encoded selection word 18 to provide a similarity characteristic theta. This similarity characteristic is the theta calculation disclosed in the referenced '811 patent and thus no detailed disclosure of such is required herein Suffice it to say that a two pass algorithm is employed in calculating theta. A first pass is a comparison of the characters in a forward direction and the second pass is a comparison of the characters in a backward direction. LOOK UP TABLE The phonetic encode module PEM 22 combines a look up table and an algorithm for providing the encode string 26. The look up table is employed when the branch segment being encoded is one or two letters. The algorithm is employed where the branch segment is three or more letters. This provides an optimum trade-off of software capacity and encoding speed. Capacity is limited in a hand-held device and encoding speed is important for the utility of a hand-held device. It should be recognized that the significant value of the look up table in the PEM 22 is that when the tree is traversed, branch segments are often added to words or branch segments that have already been encoded. For example, with reference to FIG. 2, if ABACUS has been encoded, that encoding would have been by the algorithm. But the encoding of ABACUSES would constitute adding the encoding of ES to what has already been encoded. The encoding of ES would be done through the look up table. Branch 7 shows another situation where the first word encoded BABE would be by the algorithm but the additional encoding necessary to provide the word BABEL would be by the look up table. FIRST CHARACTER PRUNE The table below labelled First Character Prune indicates the specific operation of this first character pruning technique. The first letter of the query word is identified. The word is then compared only with the words in the tree that begin with one of the letters indicated in the second column of the table. All the branches of the tree that start with all other letters are pruned and thus the search time is materially reduced. For example, the query word "GRAET" would be compared only against words in the dictionary which start with one of the four letters G, J, N or K. It will also be noted from the table that the vowels A, E, I, O and U are compared only against all of the words starting with a vowel including the vowel Y. The vowel Y is also compared against all words starting with a vowel and in addition is also compared against words starting with J. Other pruning techniques may result in further pruning of the traverse. However, the prune based on the first character is the significant prune added by the improvement of this invention in saving time.
______________________________________
First Character Based Prune
Can Be Matched With
Words Having The
First Letter of Query Word
Following First Letters
______________________________________
A A E I O U Y
B B
C C K S Q
D D
E A E I O U Y
F F P V
G G J N K
H H W
I A E I O U Y
J J G H Y
K K C Q N G
L L R
M M N
N M N P G K
O A E I O U Y
P P F T N S
Q Q C K
R R L W
S S C Z P
T T P
U A E I O U Y
V V F
W W H R
X X Z S
Y A E I O U Y J
Z Z S X
______________________________________
The N Level Processing Simplification The tree branch pruning that is normally undertaken is a function of the theta value calculations of the sort disclosed in the '811 patent. What has been found is that there are two circumstances in which by foregoing pruning, time can be saved on the theta calculation that on the average more than makes up for the additional time involved in going through the branches of the tree that were not pruned. The pruning of the search routine normally saves time. But there are particular circumstances where the amount of time saved by pruning (and thus avoiding traversing a portion of the tree) does not warrant the time it takes to make the calculation that supports the prune. A first one of these situations is where a theta calculation determines that one does not prune at a particular tree level. That determination not to prune is carried down at that tree level without requiring that the theta calculation be remade for letter changes at that level. This can best be understood with reference to a particular example. The theta calculation is not made at levels 1, 2 or 3 because the dissimilarity factor is inadequate to justify the pruning of branches. The theta calculation is made at level 4 and above. When the theta calculation is less than a particular threshold, then all of the branches which stem from that level 4 character are pruned and no further comparison is made of the query word against words in those branches. But if theta is greater than the threshold, the indication of similarity is sufficiently great so that no pruning is involved. What has been recognized in this invention is that the calculation of theta being greater than such a threshold at level 4 can be used to make the non-pruning decision at level 4 for all tree words having the same letters at the first three levels. For example, with reference to FIG. 1, if ABAC from the tree is compared with the query word and the theta calculation is greater than the threshold, then a decision is made to proceed to level 5 and come up with words such as ABACI, ABACK, ABACUS and ABACUSES. Now, rather than recalculate theta for ABAF, the determination that pruning would not be undertaken for branches stemming from ABAC is automatically applied to branches stemming from ABAF as well as branches stemming from ABAL. This means that two theta calculations are avoided and that the time consumed in those two theta calculations is saved. Experience shows that traversing those branches for comparison against the query word has sufficient likelihood of providing valid words so that there is a reasonable trade off of time saved in calculating theta against time that might be saved in pruning these branches. This particular theta calculation saving routine does not affect the calculation of theta at level 5. Such calculations proceed regardless of the decision at level 4. Furthermore, theta calculations are made at level 4 where the first 32 letters are not identical. Thus, a theta calculation for the branch segment ABAC which results in the decision not to prune is a decision that would not be applied in branch 1 because the first three letters AAR differ from the first three letters ABA. LARGE QUERY WORD PROCESSING SIMPLIFICATION On a query word with seven or more characters, the division operation employed to provide the theta value becomes cumbersome. As described in the '811 patent, there are actually two comparison passes made, the first one of which informs the second one and both of which provide theta values that can be evaluated against different thresholds. The normal theta calculation and evaluation scheme is simplified and thus time is saved in this invention by eliminating one of two division operations. As disclosed in the '811 patent, the theta calculation or similarity function is calculated as a forward similarity function between the query word and each stored data word and is also separately calculated as a reverse similarity function between the query word and each stored data word. This dual similarity function calculation is the preferred mode for determining the degree of similarity and determining whether or not pruning is called for. What has been determined in connection with the improvements of this invention is that the calculation of theta after the forward comparison can be eliminated for words having seven or more characters. Pruning decisions on such words are made after the final theta calculation is provided subsequent to the completion of both the forward and reverse comparisons. Avoiding the calculation of the theta value after the forward comparison saves more time than the increased tree traversal time that may be incurred by virtue of not pruning in response to the first theta calculation. FIG. 3 is a flow chart representing a significant portion of the phonetic correction routine. The process indicated at the function box 40 refers to the encoding function which provides the query word encode string 24. Function box 41 refers to the next step in the traverse of the tree so as to provide the next string of word letters. The function box 42 refers to providing the encode string 26 for the set of current word letters provided by the function box 41. The function box 43 is the first character prune. The function box 44 refers to the theta value calculations of the sort described in the reference to the '811 patent. The decision box 45 provides a prune command as indicated at function box 46 if the theta value is below a particular threshold. However, if the theta value is above a particular threshold then a pruning command is not generated. It might be noted that the threshold has to be determined experimentally for each level of the tree. Different threshold values have to be established and, for any given memory entry format, the device has to be exercised and the tree traversed for a large number of query words in order to determine the optimum threshold value so that a minimum number of desirable words are eliminated yet a reasonable amount of pruning occurs so that the traversal time is reduced. The decision boxes 47 and 48 are pretty much self evident and provide the technique for eliminating certain valid words based on the theta value being below a threshold. The decision box 49 provides the determination for ranking the words in the result list provided by the function box 50. The decision box 51 simply provides an indication that the tree has been completely traversed so that the result list can be put out in the ranking established by the function box 50. This output is indicated by the function box 52 and may be provided to a screen on the keyboard of the machine dictionary. The device 60 shown in FIG. 4 illustrates a self-contained, battery operated, readily portable, hand-holdable product having a one line LCD character display 61 and a keyboard 62. The keyboard 62 includes keys for the 26 letters of the alphabet. In addition, it has an on switch 63 and a backspace key 64. The clear key 65 clears the display and permits the user to initiate another query word. The two keys 66 are scroll keys permitting the user to scroll back and forth through the word list that has been developed by the process illustrated in FIG. 3 in response to a query word. The enter key 67 is actuated after a query word has been completed so that the device will perform the tree traverse functions described herein. Microfiche attached Appendix A is a presently preferred listing in Z80 assembly source code together with commentary in Source Code C. This listing is by way of an example of the routines for accessing and operating an electronic dictionary to implement the combinations of routines of this invention. A skilled programmer may implement the invention by means of a different code listing. With respect to the file listing Appendix A, the 26 page file labelled with the file name: Zcorrect is of particular pertinence to the subject matter of this invention. Roughly speaking, material on pages 4 and 5 of that listing are relevant to the first character prune, the material on pages 5 and 6 are relevant to the N level processing simplification. The material on pages 7 and 8 are relevant to the large query word processing simplification. The contents of the Microfiche Appendix in sequence are as follows:
______________________________________
Starts on Microfiche
File Name Frame Nos.
Sheet Number
______________________________________
SpHI.asm 003-030 1
CUtils.asm 031-036 1
Io.asm 037-069 1
HelpText.asm 070-072 1
Help.asm 073-075 2
Main.asm 076-081 2
Data.asm 082-091* 2
Flags.asm 092-116 2
Spell.asm 117-131 2
Select.asm 132-136 2
PDict.asm 137-156 2
List.asm 157-165 3
Zcorrect.asm 166-191 3
Subs.asm 192-194 3
Numerics.asm 195-210 3
NumTable.asm 211-214 4
Correct.asm 215-229 4
CharSet.asm 230-233 4
Encode.asm 234-266 4
American GetTrie.asm
267-319** 4
Index 320 5
______________________________________
*Includes SpellEqu.asm file on frames 083-089
**Includes TrieEqu.asm file on frames 268-269
|
Same subclass Same class Consider this |
||||||||||
