Procedure for fining all words contained within any given word including creation of a dictionary4882703Abstract A procedure for finding words within words as implemented on a programmable digital computer first alphabetizes all letters in the given word then computes permutations of the alphabetized letters and compares them to a special dictionary created so that when a match is found in it this refers to dictionary words that are anagrams of the permutation of letters. The special dictionary is created by first preprocessing each word into an alphabetic concatenation of the letters in it, then appending the word to this anagram. This list is separated by word length, alphabetized and stored in random files for fast table look up. A finger index is created and used in the procedure to further speed execution of the process. Claims What is claimed is: Description BACKGROUND AND SUMMARY OF THE PROCESS
TABLE 1
______________________________________
BAKE
BAKER
BAKERY
BALD
BALE
BALEEN
BALK
BALKY
BALL
BALLAD
BALLET
BALLOT
BALM
BALMY
BALSA
BALSAM
BAMBOO
BANAL
BAND
BANDIT
BANDY
BANE
BANG
BANGLE
BANISH
BANJO
BANK
BANKER
BANNER
BANTAM
______________________________________
Next the letters of each individual word in the list are alphabetized and associated with the dictionary listing of the word. Table 2 shows how this would look for the sample words shown in Table 1. Then each word length is separated so that all four letter words are in one list, five letter words in another list and six letter words in yet another list. This is shown for the sample words of Tables 1 and 2 in Table 3. Finally, the alphabetized letters shown for each different word length are alphabetized and any anagrams are collected under the same main listing in each word length. Table 4 shows how the words from the sample listing picked from a standard dictionary in Table 1 would combine with words not shown in Table 1, but assumed to be contained in the same list, to form the special dictionary of words for this example.
TABLE 4
______________________________________
ABDL
BALD ABES
BASE ABIT
BAIT
ABDN
BAND ABET
ABET ABJM
JAMB
BATE
ABDR
BARD BEAT ABKL
BALK
BRAD
DRAB ABEU
BEAU ABKN
BANK
ABDU
DAUB ABGN
BANG ABKR
BARK
ABEK
BAKE ABGR
BRAG ABSK
BASK
BEAK GRAB ABLL
BALL
ABEL
ABLE
BALE ABHS
BASH ABLM
BALM
LAMB
ABEM
BEAM ABHT
BAHT
BATH ABLS
SLAB
ABEN
BANE
BEAN ABIL
BAIL ABLW
BAWL
ABER
BARE ABIM
IAMB ABNR
BARN
BEAR BRAN
BRAE ABIS
BAIS
______________________________________
This exemplefies the process of creating a special dictionary to clarify the steps necessary to create it on a small computer. Referring to FIG. 1, in that procedure a list of words from a standard dictionary would be entered into a string variable on the computer for which the process will be implemented. For recognition purposes, the word will be input using all upper case letters. A second variable will be created from this one by converting the letters in it to lower case and alphabetizing them then the original variable will be appended to this one and this variable will be stored in one of several sequential disk files according to its length (1). Table 5 depicts how this file would look for the four letter words listed in Table 1. When it has been decided that all words which should be in the data base have been entered a routine will be called to place the records in each sequential file in alphabetic order (2).
TABLE 5
______________________________________
abdlBALD abesBASE abitBAIT
abdnBAND abetABET abjmJAMB
abdrBARD abetBATE abklBALK
abdrBRAD abetBEAT abknBANK
abdrDRAB abeuBEAU abkrBARK
abduDAUB abngBANG abskBASK
abdkBAKE abgrBRAG abllBALL
abdkBEAK abrgGARB ablmBALM
abelABLE abrgGARB ablmLAMB
abelBALE abshBASH ablsSLAB
abemBEAM abhtBAHT ablwSLAB
abenBANE abhtBATH ablwBAWL
abenBEAN abilBAIL abnrBARN
aberBARE abimiamb abnrBRAN
aberBEAR abisBIAS
aberBRAE
______________________________________
For each sequential file on disk a corresponding random file will be created wherein the length of a record is equal to the length of the dictionary word in each sequential file. The first two record positions of this file shall be reserved for placement of the finger index which is inserted after the file has been written (3). Each sequential file is then read and both the lower and upper case words are written in that order to the corresponding random file but if a succeeding record carries the same lower case prefix as the one just written then only the upper case word following it is written for this record and any similar succeeding records (4). Table 6 shows how stored data would look in the random file for the words shown in Table 5.
TABLE 6
__________________________________________________________________________
abdlBALDabdnBAND abdrBARDBRADDRAB abduDAUBabekBAKE BEAKabelABLEBALE
abemBEAMabenBANE BEANaberBAREBEAR BRAEabesBASEabet ABETBATEBEATabeu
BEAUabngBANGabgr BRAGGARBGRABabhs BASHabhtBAHTBATH abilBAILabimIAMB
abisBIASabitBAIT abjmJAMBabklBALK abknBANKabkrBARK abskBASKabllBALL
ablmBALMLAMBabls SLABablwBAWLabnr BARNBRAN
__________________________________________________________________________
CREATING THE FINGER INDEX At this point the creation of the random file dictionary is complete. However, in order to speed execution of the process, a finger index must be created for each random file to indicate the starting point of certain letter combinations. Referring now to FIG. 2, this will be done by reserving the first two word positions in each random file for the pointer, or finger index, of the file. Therefore, when the sequential file is written to any random file it will start at record number 3. Since two characters are necessary to store an integer number the four and five letter word files may contain no more than two pointers per word. Six and seven letter words may have three pointers and so on up to the longest word we could have. The first pointer of the first word (6) will be used to mark the place at which the second character in the lower case word changes from "aa" to "ab", the next pointer will mark the change to "ac", then "ae" and so on. The first pointer of the second word (7) will mark the place at which the first character in the lower case word changes from "a" to "b", the next pointer for the change to "c", then "d" etc. for each letter in our dictionary. The manner in which this information is obtained is by scanning each file after the file has been written to it and storing the record number at each break point. After scanning the file, each point is recorded at the place that has been reserved for it in either the first or second word of the file [FIG. 1(5)]. IMPLEMENTATION OF THE PROCESS The following example outlines the operations that a computer must perform in order to execute the process for finding all words in any given word. The letters in the input word are first alphabetized then all permutations of those letters are listed for each successive word length to the minimum length required. Each of these permutations is then checked against the alphabetic listing in the special dictionary. If a match is found then all words associated with that permutation of letters is recorded. The process ends when all permutations listed have been checked in the dictionary. Any words that have been recorded as a match will constitute the solution. To exemplefy this, the word BAKERY will be used. The special dictionary shows no anagrams for ABEKRY. Table 7 and 8 list the permutations of these letters and words found in the dictionary.
TABLE 7
______________________________________
Five letter permutations of ABEKRY:
______________________________________
ABEKR.fwdarw.BAKER ABKRY.fwdarw.none
ABEKY.fwdarw.none AERKY.fwdarw.none
ABERY.fwdarw.none BEKRY.fwdarw.none
______________________________________
Implementation of this process on a small computer would follow the preceeding description except that, in order to speed up the process, the program would use the finger index that was created to search the dictionary from the starting point indicated to the last permutation of letters for that length word instead of searching it from the beginning of the file to the end. Referring now to FIGS. 3A and 3B, after a word has been input to the program for analysis (8) it will be stored in upper case letters (9). The program will then create a lower case variable W by converting and alphabetizing these letters and further store each individual letter in an indexed variable (10). A search is first made for the letters in alphabetical order to check for anagrams of the given word (11). If found (12) all but the given word is saved (13). The search for other words contained in the given word continues for words shorter than the given word at 14. A file is opened and the finger index checked to find a point to start the search (15). Permutations are computed in a nested loop operation (16) for the subscript of the indexed variable which when concatenated will form the W variable (17). The file search loop (18) reads every other record excluding any upper case words found (19,20,21) to the next lower case word greater than or equal to W (22). Each time this loop is executed for any word length the search picks up from the point last referenced in the file so will continue no further than to the last permutation of letters generated for any file. Matches to the W variable (24) would cause any upper case words following it in the file up to the next lower case word to be placed in the solution queue (25,26,27,28). Then permutations of the letters in the given word would be taken for successively shorter word lengths until the lower limit of word length is reached (30) after which the words in the solution queue may be alphabetized before being output in a readable format (31). As an example of the nature of a program which may be used to achieve this result, assume that the dictionary has been created as described and illustrated by Table 6 for the four, five and six letter words and stored in files named $4, $5 and $6 respectively. The following program exemplefies the coding necessary to implement this process in the BASIC language for just one type of computer. Similar programs in other languages and for other computers may be readily derived by persons of reasonable skill.
__________________________________________________________________________
100
DEFINT B-T: DEFSTR U-Z 'Defines integers & strings
105
DIM V(6),WORD(100) 'WORD is queue for solution
V( ) holds alphabetized letters
110
DEF FNW(Y)=CHR$(ASC(Y)-32*(ASC(Y)<97)) 'Make lower case
120
Z="BAKERY" 'This is the input word
130
FOR J=1 TO 6: V(J)=FNW(MID$(Z,J)) : NEXT
'Takes input and
makes lower case letters then places them in V( )
140
FOR I=1 TO 6: FOR J=I.dbd.1 TO 6 'Alphabetizes V( ) then
145
IF V(I)>V(J) THEN SWAP V(I), V(J) checks for anagrams
150
NEXT J: W=W+V(I): NEXT I: D=6: GOSUB 300
'of input word
160
GOSUB 500: FOR N=1 TO NUMBER: IF Z=WORD(N) THEN WORD(N)="")
170
NEXT: D=5: GOSUB 300 'Opens 5 letter word file
180
FOR i=1 TO 2: FOR J=I+1 TO 3'Sets up permutation of V( )
190
FOR K=J+1 TO 4: FOR L=K+1 TO 5: FOR M=L+1 TO 6
200
W-32 V(I)+V(J)+V(K)+V(L)+V(M): GOSUB 500 'Puts permutation in W
210
NEXT M,L,K.J,I
220
D=4: GOSUB 300 'Opens 4 letter word file
230
FOR I=1 TO 3: FOR J=I+1 TO 4 'Sets up permutation of V( )
240
FOR K=J+1 TO 5: FOR L=K+1 TO 6
250
W=V(I)+V(J)+V(K)+V(L): GOSUB 500 'Puts permutation in W
260
NEXT L,K,J,I: RESET
270
FOR N=1 TO NUMBER: PRINT WORD(N): NEXT 'Prints words found
280
END
300
X=STR$(D): MID$(X,1)="#": RESET 'Creates a file name
310
OPEN "R",1,X,D: FIELD #1,D AS Y: MAX=LOF(1)/D
'aand opens it
320
REM
330
REM The following coding picks a starting point for the
search using the FINGER INDEX.
340
REM
350
K=1: E=3: IF V(1)>"a"THEN GET 1,2: ON ASC(V)(1)) -97 GOTO 390,
380: GOTO 370
360
GET 1,1: ON ASC(V(2)) -96 GOTO 560,390,380
370
K=5: GOTO 390
380
K=3
390
E=CVI(MID$(Y,K,2)): RETURN 'E is the starting point
400
REM
410
REM
420
REM The following coding searches each file for the
430
REM alphabetic permutation of letters stored in W. The word
440
REM read from disk is stored in Y. If Y= W then all upper
450
REM case words following Y are stored in a subscripted
460
REM variaable named WORD. The number of the last word stored
470
REM here may be found in a variable called NUMBER.
480
REM
500
FOR E=E TO MAX STEP 2: GET 1,E 'Read every other word
510
IF ASC(Y)<91 THEN E=E+1: GET 1,E: GOTO 510
'Skip upper case
520
IF Y=W GOTO 540 'If word is less than
permutation
sought continue search.
530
NEXT: RETURN
540
IF W<>Y THEN RETURN 'Check if word is permutation
sought
550
E=E+1: GET 1,E: IF 91 >ASC(Y) THEN NUMBER=NUMBER +1:
WORD (NUMBER)=Y: GOTO 550 'if so store all upper case
words
following it
560
RETURN
__________________________________________________________________________
UPDATING THE DICTIONARY From time to time the need may arise to update the special dictionary made up to work with this process by either adding or deleting words. If the dictionary were a hard copy on paper it would be simple to either pencil in a new word or scratch out a word that is not necessary. However, an electronic media can be changed only by a programmed set of instructions. Referring to FIGS. 4A, 4B and 4C, when either adding or deleting words to this dictionary the word must first be input to the computer (32) either from the keyboard or by another input device. If the word is not entered in upper case the program will convert it to upper case and store it in a variable called V1. It will then create a second variable from the letters by alphabetizing and converting them to lower case then storing the result in a variable called V2 (33). Next, the file for words the same length as V1 will be opened and the finger index checked to extract the starting point of the record closest to the V2 variable. This file will be searched until a lower case word either greater than or equal to V2 is found (34). The procedure for either adding or deleting words differs from this point on so each will be discussed separately (35). ADDING WORDS TO THE DICTIONARY If the end of the file has been reached or if a lower case word is found that is greater than V2 (36) then V2 would be inserted in the list at that position followed by V1 and the finger index would be incremented by two for all letter combinations following the location of this word (37). If a lower case word is found that is equal to V2 then all upper case words following V2 are compared to V1 (42,43). If an upper case word is found equal to V1 then a message would be printed saying V1 "is already in the list" (45) and the program would return to the start of this process for input of another word or to end the routine (41). If an upper case word if found that is greater than V1 or if another lower case word is found then (44) only V1 would be inserted in the list immediately preceeding it. The finger index would be incremented by one for all letter combinations following the location of this word and the program would return to the start of the process to either end or process another word (41). A sample of the coding necessary to provide this function for a single word is shown below:
__________________________________________________________________________
100
DEFINT B-S: DEFSTR T-Z 'Defines integers and strings
110
DIM V(5): V1="BAKER": V2=V1 'Input word is BAKER
120
FOR J=1 TO 5: W=CHR$(ASC(MID$(V1,J)) XOR 32)
'Make lower case
130
MID$(V2,J)=W: V(J)=W: NEXT 'Store in V2 and V( )
140
FOR I=1 TO 4: FOR J=I+1 TO 5: IF V(I)>V(J) THEN SWAP V(I), V(J)
150
NEXT J,I 'Alphabetize letters in V( )
160
OPEN "R",1,"$5",5: FIELD #1,5 AS Y: MAX=LOF(1)/5
170
K=1: E=3: IF V(1)>"a" THEN GET 1,2: IF V(1)="b" GOTO 190
ELSE K=3: GOTO 190
180
GET 1,1: IF V(2)="a" GOTO 200
190
E=CVI(MID$(Y,K,2)) 'Use FINGER INDEX to get
starting point
200
FOR J=E TO MAX: GET 1,J: IF V2>Y THEN NEXT: J=J-1
'Searches
file til alphabetized input
word
exceeds word read from disk
300
IF J=MAX OR V2<>Y THEN H=2: GOSUB 450: LSET Y=V2: PUT 1,J:
J=J+ 1: GOTO 330 'If input is not alphabetized
word then
insert both alphabetized
letters and word
310
J=J+1: GET 1,J: IF V1>Y GOTO 310 'Check words following match
320
IF V1=Y THEN PRINT V1 "is already in the list":END ELSE H=1:
GOSUB 450 'If a match is found the word
is already in list
330
LSET Y=V1: PUT 1,J 'If not, then insert it in
list
400
IF V(1)>"a" GOTO 430 'Adjust FINGER INDEX
410
GET 1,1: ON ASC(V(2))-92 GOSUB 460,470
420
PUT 1,1: GET 1,2: GOSUB 460: GOTO 440
430
GET 1,2: IF V(1) ="b" THEN GOSUB 470 ELSE END
440
PUT 1,2: END
450
FOR L=MAX TO J STEP -1: GET 1,L: PUT 1,L+H: NEXT: MAX=MAX+H:
RETURN
460
MID$(Y,1)=MKI$(CVI(MID$(Y,1))+H)
470
MID$(Y,3)=MKI$(CVI(MID$(Y,3))+H)
480
RETURN
__________________________________________________________________________
DELETING WORDS FROM THE DICTIONARY If this is the end of the file or if a lower case is found that is greater than V2 (38) then a message would be printed saying V1 "is not in the list" (40) and the program would return to the start of the process for input of another word or the process would end (41). If V2 is found then all upper case words following it would be compared to V1. If V1 is not found (39) then the message would be printed saying V1 "is not in the list" (40) and the program would return to the beginning for input of another word or the process would end if there were no more updates (41). If V1 is found then the words immediately preceeding and following it would be checked (47,48). If both words are lower case then V1 and the word immediately preceeding it would be deleted and two would be subtracted from the finger index of all letter combinations following the location of this word (49). However, if either adjacent word is upper case then only V1 will be deleted and the finger index would be decremented by one for all letter combinations following the location of this word (50). A sample of the coding necessary to implement this function for a single word follows:
__________________________________________________________________________
100
DEFINT B-S: DEFSTR T-Z 'Defines integers and strings
110
DIM V(5): V1="BAKER": V2=V1 'Input word is BAKER
120
FOR J=1 TO 5: W=CHR$(ASC(MID$(V1,J)) XOR 32)
'Make lower case
130
MID$(V2,J)=W: V(J)=W: NEXT 'Store in V and V( )
140
FOR I=1 TO 4: FOR J=I+1 TO 5: IF V(I)>V(J) THEN SWAP V(I), V(J)
150
NEXT J,I 'Alphabetize letters in V( )
160
OPEN "R",1,"$5",5: FIELD #1,5 AS Y: MAX=LOF(1)/5
170
K=1: E=3: IF V(1)>"a" THEN GET 1,2: IF V(1)="b" GOTO 190
ELSE K=3: GOTO 190
180
GET 1,1: IF V(2)="a" GOTO 200
190
E=CVI(MID$(Y,K,2)) 'Use FINGER INDEX to get
starting point
200
FOR J=E TO MAX: GET 1,J: IF V2>Y THEN NEXT: J=J-1
'Searches
file til alphabetized input
word
exceeds word read from disk
300
IF J=MAX OR V2<>Y GOTO 450 ELSE H=-1 'If input is not word
sought (V2) or if end of list
then word is not in list
310
J=J+1: GET 1,J: IF ASC(Y)>90 GOTO 340 ELSE IF V1>Y GOTO 310
'Searches for input word V1
320
IF V1<>Y GOTO 450 ELSE GET 1,J+1: IF ASC(Y)>96 THEN GET 1,J-1:
IF ASC(Y)>96 THEN J=J-1: H=-2 'If not found then say so
otherwise check for single or
double deletion
340
FOR I=J TO MAX: GET 1,I-H: PUT 1,I: NEXT: MAX=MAX+H
'Delete
400
IF V(1)>"a" GOTO 430
410
GET 1,1: ON ASC(V(2))-96 GOSUB 460,470
420
PUT 1,1: GET 1,2: GOSUB 460: GOTO 440
430
GET 1,2: IF V(1)="b" THEN GOSUB 470 ELSE END
440
PUT 1,2: END
450
PRINT V1 "is not in the list": END
460
MID$(Y,1)=MKI$(CVI(MID$(Y,1))+H)
470
MID$(Y,3)=MKI$(CVI(MID$(Y,3))+H)
480
RETURN
__________________________________________________________________________
|
Same subclass Same class Consider this |
||||||||||
