Method and system for efficiently searching a master database for a desired target without accessing the master database6119113Abstract A method and system for efficiently searching a master database includes a memory device for storing a subset of the master database in an auxiliary data-base. A computer program receives an input signal corresponding to the search and processes the input signal with the auxiliary database to determine the presence or absence of qualifying search targets. Thus, target information is determined prior to actually accessing the database. Claims What is claimed is: Description TECHNICAL FIELD
TABLE I
______________________________________
1. QUINE
2. QUINLAN
3. QUINLAN
4. QUINLAN
5. QUINLAN
6. QUINLAN
7. QUINLAN
8. QUINLAN
9. QUINLAN
10. QUINLAN
11. QUINLAN
12. QUINLAN
13. QUINLAN
14. QUINLAN
15. QUINN
16. QUINN
17. QUINN
18. QUINN
19. QUINN
20. QUINN
21. QUINN
22. QUINN
23. QUINN
24. QUINN
25. QUINN
26. QUINN
27. QUINN
28. QUINN
29. QUINN
30. QUINN
31. QUINN
32. QUINN
33. QUINN
34. QUINN
35. QUINN
36. QUINN
37. QUINN
38. QUINN
39. QUINNEY
40. QUINT
41. QUINT
42. QUINT
43. QUINT
44. QUINT
45. QUINTANA
46. QUINTANA
47. QUINTANA
48. QUINTER
49. QUINTIN
50. QUIRK
51. QUIST
52. QUIST
53. QUIST
54. QUIZAR
55. QUOCK
56. RAAB
57. RAAB
58. RAAB
59. RAABE
60. RAABE
61. RABAGO
.
.
.
.
166. RAINFORD
167. RAINFORD
168. RAINS
169. RAINS
170. RAINS
171. RAINS
.
.
.
______________________________________
The highly simplified example uses 171 consecutive LastName fields from a White Pages directory to illustrate the method. The 171 entries are numbered for convenience. In practical applications, the number of records of the database may be much larger and consist of multiple fields such as, for example, FirstName and StreetAddress fields associated with each of the LastNames. There may be hundreds of thousands, or millions of records distributed among processors for rapid, reliable access. Database size and representation have no bearing on the present invention as long as the individual records may be targeted for retrieval by a series of inputs from an end user or application. These inputs may also be used to narrow the search by being directed toward a sequence of fields within records using repeated applications of the method described herein. In such a case, a search may proceed sequentially, according to one or more predetermined field search sequences, but in the same manner described herein. The method of the present invention compresses the master database into a series of small trees that may be more easily stored as an auxiliary database in a memory device. The series of small trees may also be more efficiently searched for target records. To prepare the master database for efficient search according to the present method, the allowable sequences of fields to be targeted must first be determined. In the example, only a single target field is illustrated. However, in practice, a limited number of characters might be accepted for each field before moving on to the next target field. In the example database given, the usefulness of more than three or four characters of the LastName field is quite limited. Thus, in practice, the implementation of this method might accept only three or four characters of the LastName field before moving to the FirstName or Street-Address fields to further narrow the search. FIG. 1 illustrates a partial search tree representation of the White Pages database example shown in Table I. The tree itself has two major branches, "Q" and "R," representing the two possible first characters of the LastName field. These are shown in the first row of the tree-table. Below each major branch label are the number of records in the database remaining as search targets following the corresponding entries. For example, entering a search string beginning with "Q" leaves 55 candidates in the database that match the target string--in this case, the first 55 LastNames. Likewise, entering a search string beginning with "R" leaves 116 candidates. The search of the tree proceeds down one level with each new search string entry. If the character entered does not appear in the corresponding row (to the left of the vertical border partitioning the branches), the search fails, i.e., there are no corresponding records in the database. In the example presented, the search string Q-U-A would fail immediately because there is no "A" in the third row. In fact, inspection of the tree reveals that all entries in the database beginning with "Q" begin with Q-U-I (54 records) or Q-U-O (1 record, QUOCK). A search of the derived trees yields convergence information as each alphanumeric character in the search string is entered, consisting of the number of entries found--from zero up to a settable maximum number. To illustrate, assume the search target is "QUINTER," a LastName that does, in fact, appear in the example database. Upon entering "Q" the search is narrowed from the total 171 records to 55, as shown below the "Q" in the first row. Entering "U" still leaves 55 candidates, i.e., all records beginning with "Q" begin with "QU." Entering "I" leaves 54; "N," 49; "T," 10; "E," 1; and "R," 1. While simple, this example illustrates how the progress of the search is contained in the tree representation of the original database. Furthermore, in the particular application, if it is desirable to display as many as 10 records (but fewer than 49), the search can be terminated and the record retrieval and display begun after the entry of "T" in this example. That is, the tree contains sufficient information to assure that all remaining candidates can be displayed. FIG. 2 illustrates how compression of the search tree begins. Using a maximum candidate record criterion of 10, as in the example of the previous paragraph, the search tree representation in FIG. 2 replaces the remaining records count at each node with either a "c" for "continue" accepting target string inputs or a "t" for "terminate" searching and begin retrieving 10 or fewer records from the full master database. Though not explicitly represented, the case of zero candidates is implicitly handled by showing no path continuation for an entry. For example, as mentioned above, Q-U-A has no path and would terminate the search with no corresponding record found. The key property of this method is that, once the criterion candidate maximum is determined, the number of candidates can be represented by one of only three possibilities: terminate with no candidate records; terminate with the number of candidates not to exceed the predetermined criterion (10 in the example); and continue accepting target string characters. These three possibilities can be represented with only two bits; e.g., "00", "01", and "10", respectively. The two-bit "11" representation can be reserved to indicate an error in the application. Once the search is terminated, if retrieval of the full record is desired, retrieval is effected from the original database using any conventional practice. Depending on the sophistication of search tree representation, the size and target distribution of the database, the characteristics of the input and output devices, and the goals of the database search--all of which depend on the particular application being implemented--the achievable compression due to the method described herein may be considerable. In the present case, using only the "Q" branch of the search tree as an example, FIG. 3 illustrates the data structure that could be compiled for the search. It assumes that a maximum of five entries will be accepted to narrow the search with the LastName field. The structure shown assumes full 8-bit ASCII input, data representation, and output capabilities for the input, storage, and display devices. Neglecting the addressing overhead (which could be considerable, depending on the search machine hardware and software), the "Q" branch of the example database, consisting of 55 entries totaling 317 ASCII characters, is represented with 12 characters and 12.times.2 additional bits to store the status information, i.e., "terminate, not found," "terminate, found," and "continue." Thus, the achieved compression ratio is {12.times.(8+2)}:(317.times.8), or more than 20:1. Turning now to FIG. 4, there is shown a simplified block diagram of the system of the present invention, denoted generally by reference numeral 10. The search tree is stored as an auxiliary database in memory device 12. The search tree includes information associated with the master database, such as the number of entries associated with each possible input signal, as shown in FIG. 1. Each one of these entries is then coded to indicate whether or not the search should continue based on the settable system criterion, or threshold. An input signal, such as a search string, is entered via an input device 14. Input device 14 may be any conventional input device such as, for example, a terminal keyboard, a keypad of a touch-tone telephone, an application, or any other similar devices. Computer program 16 receives the input signal and processes the input signal with the search tree in memory device 12 to determine the presence or absence of qualifying search targets, and their number, if any, relative to the settable threshold prior to actually searching the database. That is, computer program 16 determines whether or not a target record has been found and if so, whether or not an acceptable number of target records have been found. If retrieval of the full record is desired, retrieval is effected from the original master database 18 using any conventional practice. The present invention reduces the access time to the database and provides immediate feedback of target information without actually accessing the database. Thus, the search attempts that result in no matching records do not require a database access and also shorten the search string entry to the minimum number necessary to determine that no matches exist. Similarly, the search attempts that result in more than the system criterion number of matching records do not require a database access. The degree of increase in effective system capacity depends on the proportion of such cases as well as other user and performance characteristics. While the best modes for carrying out the invention have been described in detail, those familiar with the art to which this invention relates will recognize various alternative designs and embodiments for practicing the invention as defined by the following claims.
|
Same subclass Same class Consider this |
||||||||||
