Keyword search automatic limiting method4554631Abstract Database locations for user-supplied query keywords are individually determined. Those locations for the full set of query keywords and for successively smaller groups of such query keywords are screened for locations common to all words of a group. The screening is stopped after all query keywords have been considered and found to have been included in at least one of the tested groups of query keywords having a common database location. Claims What is claimed is: Description This invention relates to a method for computer-assisted keyword searching of textual database information; and it relates, in particular, to such a searching method which is useful in computer-assisted instruction systems.
TABLE 1
______________________________________
Query Keyword Tag List
Tag No. buffer edit file mm text data
______________________________________
1 1.2.1.2 1.2 1.1 3 1.2 4.4.3
2 2.1.2 2 1.2.2 4.4.3 1.4 4.5
3 2.4.1 1.4 4.6 3.1
4 2.6.3 1.5.1 3.2.5
5 1.5.2 3.2.7
6 1.5.3 3.3.3
7 1.5.4.1 3.4.4
8 1.5.5 3.4.5
9 2.3.7 4.1
10 2.4.1 4.5.1
11 3.1.1 4.5.5
12 3.3
13 4.6
______________________________________
Each keyword entered by the user as a part of the user's query is considered to be the name of a lexicographically ordered set of hierarchical tags representing topical locations where that query keyword is found in the database. Referring to Table 1, for example, the keyword "edit" is the name of a set of tags which contains two hierarchical tags, namely, 1.2 and 2, as shown in the column of Table 1 under the word "edit." Following the consideration of any particular set of query keyword tags in connection with the universal list, the process in FIG. 3 then, once more, checks to determine whether or not the last query keyword of the set has been sought in the database keyword list. If not, the process loops back to select and search for a new query keyword. If the last keyword of a set of query keywords has been considered, the process then tests the keyword-found counter in field 26 in the memory to determine whether or not it is null and thus indicating that no keywords were found. A non-null state means that at least one of the query keywords has been found in the course of the search. If no query keywords were found in the database keyword tag list, the process then ends by displaying to the user a notice indicating that none of the query keywords has been found. Alternatively, if at least one query keyword was found in the database keyword tag list, the process then calls a BITSTRING subroutine, illustrated in, and to be discussed in connection with, FIG. 5 for completing the construction of the bitstring list of memory field 23 and the Table 2 previously mentioned. That subroutine produces the Table 2 bitstrings which correlate the query keywords there with the universal list tags. Following completion of the bitstring list, the process then proceeds to select one or more subsets of the list by identifying therein only those sets of keywords which have the maximum desirable utility as measured by the full set of query keywords. This elimination process for less useful keyword sets is sometimes called pruning; and it is illustrated in, and will be described in connection with, a PRUNING subroutine illustrated in FIG. 6. Upon completion of that subroutine, the most relevant keyword combination locations will have been identified for the user and are displayed in a manner to be subsequently indicated. The FIG. 3 process is then at an end. FIG. 4 illustrates detail of the FIG. 3 step of adding appropriate ones of the tags of a found query keyword of the universal list. The FIG. 4 process constructs the universal list of tags in lexicographical order of their topics in accordance with three rules which are applied to each tag of a newly found query keyword. Those rules are based upon the fact that tags of a query keyword are considered in lexicographical order, and the fact that the universal list construction process inherently causes that list to be formed in lexicographical order. The rules are based upon both hierarchical and lexicographic comparisons of keyword tags indicating topical positions of the keywords in the database. In a lexicographical comparison, one tag is greater than another if, in the most significant order in which they differ, its number is greater. In a hierarchical comparison, one tag is greater than another if it has fewer orders of numbers; and its numbers in those orders are the same as the numbers in corresponding orders of the longer tag. A lexicographic comparison involves a character-by-character ordering as in the making of a dictionary. In the present context, it is used for a number-by-number series of comparisons of tags, starting at the hierarchically most significant number, i.e., the left-most number, and ending when a mismatch occurs. At the point of mismatch, the relative magnitudes determine the lexicographical order. Although lexicographic comparison makes the operation efficient, its use is not essential to the invention. The first rule for universal list construction is that if a tag of a found query keyword is not included already on the universal list (and, of course, no tags would be included for the first found keyword of a query), and if the tag is not hierarchically greater than another universal list tag in the same branch, or sequence, of the hierarchy, or less than any other tag on the same branch, or sequence, of the hierarchy, then that tag is added to the universal list in a position determined by a lexicographic comparison function. A second rule for the construction of the universal list is that, if the tag of the found query keyword is not already found on the universal list, and if it is hierarchically less than a tag which is already on the list in the same hierarchical branch, then, the latter tag is replaced on the universal list with the tag under consideration of the most recently found query keyword. Thirdly, if the most recently found query keyword tag under consideration is found on the universal list already, or if that tag under consideration is hierarchically greater than a tag which is already on the universal list, then this tag under consideration is not again added to the universal list. The universal list, constructed in the manner outlined, is included as a column (UTAG LIST) of data in a bitstring list which is constructed and stored in the field 23 of the FIG. 2 memory map. One bitstring list comprising the aforementioned six illustrative query keywords, and a universal list for those keywords, is included in the following Table 2:
TABLE 2
______________________________________
Bitstring List
Bit universal
No. list buffer edit file mm text data
______________________________________
1 1.1 0 0 1 0 0 0
2 1.2.1.2 1 1 0 0 1 0
3 1.2.2 0 1 1 0 1 0
4 1.4 0 0 1 0 1 0
5 1.5.1 0 0 1 0 0 0
6 1.5.2 0 0 1 0 0 0
7 1.5.3 0 0 1 0 0 0
8 1.5.4.1 0 0 1 0 0 0
9 1.5.5 0 0 1 0 0 0
10 2.1.2 1 1 0 0 0 0
11 2.3.7 0 1 1 0 0 0
12 2.4.1 1 1 1 0 0 0
13 2.6.3 1 1 0 0 0 0
14 3.1.1 0 0 1 1 1 0
15 3.2.5 0 0 0 1 1 0
16 3.2.7 0 0 0 1 1 0
17 3.3.3 0 0 1 1 1 0
18 3.4.4 0 0 0 1 1 0
19 3.4.5 0 0 0 1 1 0
20 4.1 0 0 0 0 1 0
21 4.4.3 0 0 0 1 0 1
22 4.5.1 0 0 0 0 1 1
23 4.5.5 0 0 0 0 1 1
24 4.6 0 0 1 1 0 0
______________________________________
Now, the universal list construction process U-LIST, using the aforementioned three rules, will be described in connection with FIG. 4. A first step is to determine whether or not there are any unconsidered tags associated with the query keyword. If none, e.g., as for the case where all tags of a found query keyword have been considered, the process returns to the FIG. 3 search routine at the last keyword test. However, on the initial call of this universal list routine for a query keyword, there will be a keyword tag left; and the first, or next, such tag (ktag) is selected as appropriate. A test is made for any remaining universal list tag (utag) which has not been considered with the ktag. If there is none, the ktag just selected is added to the universal list after the most recently added, if any, utag; and the process loops back to test for further ktags. However, assuming a remaining utag, the first, or next, such utag is selected as appropriate. Now, the last selected ktag is tested to determine whether or not it is hierarchically at least equal to the last selected utag. In a hierarchical comparison, a series of number comparisons are made, starting from the most significant number position, as in a lexicographic comparison, and ending again when a mismatch occurs at a number position in which there is no ktag number. In the latter case, the ktag can never appear in a display to a user, so the process loops back to select a new ktag without considering any other utags; and, therefore, the third rule has been satisifed. In the former case, a lexicographic comparison is required to determine whether or not the current utag is required for further comparison. If the ktag is lexicographically at least equal to the utag, another hierarchical test is made to determine whether or not the current ktag is hierarchically less than the current utag. If the test result is negative, the first rule has been satisfied, and the ktag is added to the universal list after the most recently added utag; and the process loops back to test for further ktags. If the test is affirmative, the utag can never appear in a display to the user, so it is replaced by the ktag on the universal list; and the process loops to test for further ktags. FIG. 5 depicts the BITSTRING subroutine process mentioned in connection with FIG. 3 and relating to the bitstring list of Table 2. Table 2 includes what might be considered ordinates in the form of bit numbers and the universal list of keyword tags; and it includes what might be considered abscissas in the form of the keywords of the user query. It remains, then, to secure the necessary binary ONE and ZERO information for filling in the columns of Table 2 to indicate whether or not a particular keyword is associated with any particular tag on the universal list. In the bitstring list of Table 2, a string of binary ONE and ZERO bits appears in the column beneath any particular keyword and represents the bitstring for that keyword. The bitstrings are also indicated in memory field 23 in FIG. 2 where they are stored with their respective keywords and associated utags. Each bit of the bitstring is set to a binary ONE or ZERO, depending upon whether or not an element of the universal list in Table 2 is also an element, or is hierarchically less than an element, of the particular keyword set of tags in Table 1. The process indicated in FIG. 5 depicts the method for determining and setting the binary ONE and ZERO bits of a bitstring for a simple keyword in Table 2. The first step in the process of FIG. 5 is to select the first, or next, found query keyword in the query keyword tag list field 22. The next step is to select the first ktag for that query keyword. Then, the first utag is selected from the universal list of Table 2. A bit pointer in memory field 26 is initialized to the bit number, in Table 2 and field 23, of the universal list tag just selected to provide a marker in field 23 for return when it is time to set a bit state. Now, the selected ktag for the Table 1 query keyword is hierarchically compared with the utag selected from the Table 2 universal list to determine whether or not the ktag is at least equal to the utag. This is the same type of comparison previously carried out in constructing the universal list, and which stops at the number order of the first mismatch or at the number position in the ktag where there is no number. In the latter case, the ktag is hierarchically greater than or equal to the utag, and the bitstring bit corresponding to both that tag and the ktag is set to the binary ONE condition. Next in the process of FIG. 5, a test is made to determine whether or not the last utag has been considered for the current keyword. If there are more, the process selects a new utag, increments the bit pointer, and loops back to compare the new utag with the same ktag for the Table 1 query keyword. However, if the last utag had been considered, a test is made for more found query keywords; and, if none, the process ends for this subroutine; and control is returned to the FIG. 3 keyword search process to initiate the pruning operation previously mentioned. However, if there are more query keywords to be considered, the process loops back to select the next query keyword. Returning to the FIG. 5 hierarchical tag comparison decision step, if the comparison revealed that the Table 1 query keyword list ktag is not greater than or equal to the Table 2 universal list utag, and if the query keyword list ktag is lexicographically not less than the universal list utag, the pointed bit in the keyword bitstring is set to ZERO. That setting indicates the absence of the universal list tag from the query keyword tag list, either directly, or as an hierarchical subordinate. Returning to the FIG. 5 lexicographical tag comparison decision step, if the comparison produces an affirmative result, a test is made to determine whether or not the last ktag of the set for the keyword has been considered. If it has, a test is made for the next query keyword as before. However, if the last ktag has not been considered, a new ktag is selected from the query keyword list in Table 1 for the same keyboard. The process then loops back to the hierarchical tag comparison test with the same utag. The retionale for the bitstring representation is that intersections among bitstrings can be performed by bit operations, which are expected to be fast on a computer. An "intersection" in this context is an additional bitstring which indicates, for a given set of query keywords, whether or not they all share any of the utags, and which ones, if any. The term is also used to indicate the process of finding that sharing state. The result of an intersection of a number n of bitstring sets will determine which utags have the corresponding keywords in common. In the illustrative example, to determine if any topic (represented by a utag) has six keywords in common, an intersection is done among the six bitstrings; but it can be seen in Table 2 that such an intersection would show that no topic has six keywords in common. An intersection of one set of three-keyword bitstrings is shown below in Table 3. As can be seen in the column on the right labeled "AND," bit number 12 is the only nonzero bit, i.e., the only bit which is in the ONE state for every one of the query keywords of the three-keyword set listed across the top of Table 3. This corresponds to the fact that the tag 2.4.1 (bit No. 12 in Table 2) has the three keywords (buffer, edit, and file) shown at the top of Table 3 in common.
TABLE 3
______________________________________
Intersection of Bitstrings
Bit No. buffer edit file AND
______________________________________
1 0 0 1 0
2 1 1 0 0
3 0 1 1 0
4 0 0 1 0
5 0 0 1 0
6 0 0 1 0
7 0 0 1 0
8 0 0 1 0
9 0 0 1 0
10 1 1 0 0
11 0 1 1 0
12 1 1 1 1
13 1 1 0 0
14 0 0 1 0
15 0 0 0 0
16 0 0 0 0
17 0 0 1 0
18 0 0 0 0
19 0 0 0 0
20 0 0 0 0
21 0 0 0 0
22 0 0 0 0
23 0 0 0 0
24 0 0 1 0
______________________________________
FIG. 6 depicts the subroutine, hereinbefore mentioned in connection with FIG. 3, for the so-called PRUNING operation. In this subroutine, a collector register 27 and two flag registers 29 and 30 are employed; and each includes a binary bit flag in a predetermined bit position for each query keyword. The universal list construction is complete, and a bitstring has been formed for every query keyword found in the database. Now, in the pruning subroutine, systematic intersections will be performed on the bitstring to focus the search results on one or more topics (represented by utags) of narrowest scope and greatest apparent user interest, as represented by the largest numbers of found query keywords contained therein. Potentially, there can be many topic locations, i.e., tags, on the universal list. To minimize the display of topic titles in the ultimate search result in some meaningful manner, without user intervention, combinations of different orders of intersections are performed on the keyword sets in Table 2 to reduce the number of topics that must be displayed to those which have the most keywords in common. For the purpose of this description, an n-combination is a selection of any number n of distinct objects from N objects in a total. An nth order intersection is the intersection together of all bitstrings in a particular n-combination of the N objects. If n/N is small, the number of nth order intersections possible is large; and if n=N, the number is unity. The first step in the PRUNING process of FIG. 6 is to set an indexing counter in field 26 of the FIG. 2 memory map to the number of keywords on Table 2, i.e., n=N. The next step is to initialize all flags in the cumulative flags register 29 to indicate that none of the query keywords has been employed in an intersection at a prior nth order. All flags in the current flags register 30 are similarly initialized to indicate no prior intersections in the current nth order. The next step is to select an n-combination of the keywords; and, at this initial stage, only one is possible. That selection is achieved by employing n pointers which point to the query keywords, respectively, of the selected n-combination. That is, n, multibit, binary coded words are the pointers. Initially, all of the query keywords are pointed at once. At lower orders, i.e., values of n, there are fewer pointers for any combination of keywords, but more combinations. Accordingly, if the keywords are assumed to be in a numerical sequence of locations, and the pointers are systematically varied to select all of the possible n-combinations of the nth order, the nth, or least significant, pointer varies most rapidly from set n to set N. Similarly, the first, or most significant, pointer varies most slowly from set 1 to position (N-n+1). Next, in FIG. 6, the process checks whether or not the selected n-combination of keywords is superfluous, that is, to see if all keywords corresponding to the selected n-combination have already been included as a subset of an n-combination for a previous, i.e., larger, value of n found to include an intersection. That determination is made by checking the flags for each of the query keywords for the current n-combination in the cumulative flags register 29. If all are set, the combination is superfluous. If an n-combination under consideration is superfluous, i.e., all of its words have been previously found, the process jumps ahead to test whether or not the current n-combination is the last n-combination of the nth order; and, if it is not, the process loops back to select a new n-combination. However, if the current n-combination is not superfluous, the process proceeds to calculate the intersection of the corresponding bitstrings in a manner which is hereinafter considered in connection with FIG. 7. The output of the intersection operation is either high or null to indicate that either the intersection found at least one coincidence of binary ONE bits in a corresponding bit position in all of the columns corresponding to the keywords of the selected n-combination; or it is null, indicating that, for every utag, at least one of the keywords in the n-combination set had a binary ZERO in its bitstring at the bit position for the universal list tag under consideration. If a null is found, the process jumps ahead, as previously mentioned, to test whether or not the last of the n-combinations has been considered at this nth order. However, if no null is found, i.e., if an intersection is found, for each query keyword in the n-combination, a corresponding flag in the current flags register 30 is set to record the keywords for that n-combination. Also, the same keywords, along with the universal list topic title tags associated therewith through the one or more sets of all-ONE bits identified by the nth order non-null intersection just detected, are loaded into a collector register 27. Next, the process tests to determine whether or not the last n-combination at this order of n has been considered, as indicated in FIG. 6. If the last has not been considered, the process loops back to select a new n-combination and repeat the intervening steps to collect in register 27 any additional n-combinations of keywords and their associated universal list tags which are identified in the inersection. However, if the last n-combination has been considered, the process updates register 30 by setting all those flags in the cumulative flags register 29 that are set in the current flags register 30, and then clears register 30. Then, if all flags of register 29 are set, all keywords have been used. If they have not been so used, the value of n is decremented; and the process loops back to select a new set of keywords using the new value of n. This looping continues until all of the keywords have been found in at least one intersection which is not superfluous. When all keywords have been so used, the subroutine of FIG. 6 ends; and control is returned to the main program in FIG. 5 to indicate the end of the keyword search. At that point, the search results, comprising the n-combinations of keywords and associated utags from collector register 27, are displayed. In the illustrative example, which started with six keywords, the intersection of all bitstrings for all six keywords is null, as can be seen in Table 2. This means that there is no topic in the courseware which is common to all six of the keywords. This is also true for the six subsets of those six keywords at the nth order where n=5, that is, for the six subsets of five keywords; and it is further true of the thirty subsets of n=4 keywords. Not until combinations including only n=3 keywords are formed does an intersection appear which is not null. Such an example is illustrated in Table 3, and the topics from that intersection are indicated below: buffer editor text 1.2.1.2 editor files text 1.2.2 buffer editor files 2.4.1 files mm text 3.1.1 3.3.3 The PRUNING process then continues with the intersections between nonsuperfluous n-combinations of two bitstrings. When this level is completed, the foregoing n=3 combinations in register 27 are augmented by the n=2 combinations shown below: mm data 4.4.3 data test 4.5.1 4.5.5 It will be noted from the foregoing that the only intersections involving n=2 keywords that are presented are those due to the combinations of the keywords "data," "mm," and "text." All other combinations involving two keywords are not displayed, since they are subsets of previous combinations involving three or more keywords that had non-null intersections. The search process need not continue in the illustrative case to the state of n=1, since any other combinations of keywords will be subsets of prior combinations. This state is reached when all keywords have been used in some combination involving non-null intersections. Usually, the number of universal list topic location tags displayed to the user can be significantly reduced from the original number of query keyword tags received from the database and placed in the query keyword tag list of register 22. In the foregoing example, a total of eight utags for six n-combinations were presented to the user out of the original 29 ktags associated in Table 1 with the query keywords which the user had offered. Keywords that refer to a single topic, i.e., have but one ktag, will return fewer utag topic locations than keywords that have multiple ktags. Thus, the keyword information retrieval mechanism is biased towards the person looking for a specific topic who supplies words designed to narrow the alternatives. On the other hand, the system also responds to the user who enters words that have no topics in common. In FIG. 7 is shown the intersection subroutine utilized in the FIG. 6 pruning process. This subroutine takes each newly identified, nonsuperfluous, n-combination of query keywords, and identifies any utags common to them all. The first step in the intersection subroutine is to initialize, i.e., set to all ONEs, an intersection result register 28 in the FIG. 2 memory map. That register has enough bit locations to equal the maximum number of utags that might be expected to be used for a single query from a user. A first, or next, bitstring for a keyword n-combination is then selected. This selected bitstring is then ANDed, e.g., in a bitwise fashion, with the corresponding bit contents of the aforementioned result register. The output of each AND operation is written back into the result register. At the end of the ANDing, there are ONEs in the result register for only the bit positions also having ONEs in the current bitstring used. An all-ZERO test of the result register contents will reveal a null result. If the result is null, there is no need to continue further, because there can be no non-null intersection with this n-combination; and the process jumps ahead to return control to the process of FIG. 6 from which it was called. On the other hand, if test outcome is negative, a test is made to determine whether or not the last bitstring of the n-combination has been considered. If it has not been considered, the process loops back to select a new bitstring and repeat the intervening operations. However, if the last bitstring has been considered, the subroutine ends; and control is returned to the FIG. 6 PRUNING subroutine. As previously indicated, in connection with FIG. 3, the final function in the search process is not actually indicated in the FIG. 3 process, but involves the display of the search results to the user. FIG. 8 illustrates an enlarged screen 17 with text therein illustrating the typical result, e.g., for the illustrative example, that is produced for the benefit of a user of the computer-assisted instruction system. Although the present invention has been described in connection with a particular application and embodiment thereof, additional applications, embodiments, and modifications which will be obvious to those skilled in the art are included within the spirit and scope of the invention.
|
Same subclass Same class Consider this |
||||||||||
