Data ranking with a Lorentzian fuzzy score
6701312

Abstract

The present mechanism relates to a method for searching a document database such as the Internet and ranking the results obtained from such a search. The mechanism also relates to ranking a set of numerical data according to a set of user-specified preferences, including target range, fuzziness and bias. A fuzzy score is calculated for each database record satisfying a query, and the results are ranked according to fuzzy score. The fuzzy score is calculated using a Lorentzian fuzzy score formula.

Claims

What is claimed is:

Description

BACKGROUND OF THE INVENTION
Name                    Symbols                            Notes
Data Value              x.sub.1, x.sub.2, . . . , x.sub.k  The maximal and minimal data values are Max(x.sub.k) and Min(x.sub.k), respectively
Target Range            (x.sub.min, x.sub.max)             x.sub.max .gtoreq. x.sub.min
Bias                    .beta.                             .beta. may be greater or smaller than zero
Fuzzy Parameters        ##EQU2##                           .DELTA. = Max(x.sub.k) - Min(x.sub.k)
Fuzziness or Closeness  .alpha.1 and .alpha.2              .alpha.1 .gtoreq. 0 and .alpha.2 .gtoreq. 0
According to an embodiment, the present invention calculates a fuzzy score for a user-defined database query and ranks the results of the query. The fuzzy score is calculated based on user-specified criteria including, but not limited to, target range (x.sub.min, x.sub.max), fuzziness (.alpha.1 and .alpha.2) and bias (.beta.). In another embodiment, the fuzziness and/or bias is static and set by the software/system performing the search.

While the parameters for fuzziness and bias are numeric, it is contemplated in at least one embodiment that these parameters be translated into more easily understood terminology for user selection. For example, instead of specifying a numeric value for each of the fuzziness parameters, .alpha.1 and .alpha.2, the user may select from a list of fuzziness categories (e.g. small, medium, large). Each category equates to specific values of .alpha.1 and .alpha.2, which are used to calculate a fuzzy score as described herein. The same holds true for the bias parameter, .beta.. Instead of specifying a numeric bias value, the user may select from a list of bias categories (e.g. toward the lower bound, toward the upper bound). These categories are likewise equated to specific numeric values and used to calculate a fuzzy score as described herein.

In order to illustrate the application of the Lorentzian function to calculate a fuzzy score, several examples will be given. Each example illustrates how variations in the user input parameters (i.e. target values, fuzziness, bias) affect the Lorentzian function as it is used to calculate a fuzzy score. For consistency, the examples are based on a user who is shopping for a new home via a web site containing new home data. The web site allows the user to search a database of new homes based on the selling price of the home.
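The translation from category labels to numeric parameters described above can be sketched as a simple lookup. This is a minimal illustration, not the patented implementation: the names `FUZZINESS_LEVELS`, `BIAS_LEVELS` and `resolve_parameters` are hypothetical, and the values for "small" and "large" are assumptions. Only the "medium" values (.alpha.1=2, .alpha.2=2) and the bias values of -0.1 and 0.1 are taken from the multiple-field example later in this description.

```python
# Hypothetical lookup tables. Only "medium" and the two bias values come from
# the description; the "small" and "large" numbers are assumed for illustration.
FUZZINESS_LEVELS = {
    "small": (1.0, 1.0),   # assumed
    "medium": (2.0, 2.0),  # from the multiple-field example
    "large": (4.0, 4.0),   # assumed
}
BIAS_LEVELS = {
    "toward the lower bound": -0.1,
    "toward the upper bound": 0.1,
}


def resolve_parameters(fuzziness, bias):
    """Return (alpha1, alpha2, beta) for the chosen category labels."""
    alpha1, alpha2 = FUZZINESS_LEVELS[fuzziness.lower()]
    beta = BIAS_LEVELS[bias.lower()]
    return alpha1, alpha2, beta


print(resolve_parameters("Medium", "toward the lower bound"))
```

Keeping the mapping in plain dictionaries means the numeric values behind each label can be tuned without touching the scoring code.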
For the following examples it is assumed the user is interested in houses between $200,000 and $250,000. Thus, the target range is defined as x.sub.min =200000 and x.sub.max =250000. In addition, the web site allows the user to input fuzziness values, .alpha.1 and .alpha.2, and a bias value, .beta..

For bias (.beta.) values less than zero (i.e. biased toward the lower bound of the target range), the Lorentzian fuzzy score has the following formula:

For .beta.<0, ##EQU3##

where x represents any data value, i.e., x.di-elect cons.{x.sub.1, . . . , x.sub.k }.

FIG. 4 shows a sample plot 400 of S(x) with x.sub.min =200000, x.sub.max =250000, .alpha.1=2, .alpha.2=1, and .beta.=-0.1. The negative slope of the graph between the target values $200,000 and $250,000 is the result of the negative bias. The negative bias affects the fuzzy score calculated for data values within the target range by favoring those data values closer to the lower end of the target range. Also worth noting are the calculated fuzzy scores for the data points that lie outside the target range. The non-zero scores for those data points are a direct result of the incorporation of the fuzziness parameters into the Lorentzian function. Consistent with the Lorentzian function is the rapid drop-off of the fuzzy scores for the data points that lie farthest from the target range. In this example, and as illustrated in the table of result rankings 410, the query would have returned records for each of the 12 homes in the database. The addition of the fuzzy scores, however, makes it easy for the user to visualize the records that best conform to the original search parameters.

For bias (.beta.) values greater than zero (i.e. biased toward the upper bound of the target range), the Lorentzian fuzzy score has the following formula:

For .beta.>0, ##EQU4##

where x represents any data value, i.e., x.di-elect cons.{x.sub.1, . . . , x.sub.k }.

FIG. 5 shows a sample plot 500 of S(x) with x.sub.min =200000, x.sub.max =250000, .alpha.1=2, .alpha.2=1, and .beta.=0.1. The positive slope of the graph between the target values $200,000 and $250,000 is the result of the positive bias. The positive bias affects the fuzzy score calculated for data values within the target range by favoring those data values closer to the upper end of the target range. As in FIG. 4, the data points that lie outside the target range receive non-zero scores as a direct result of the incorporation of the fuzziness parameters into the Lorentzian function, and the fuzzy scores drop off rapidly for the data points that lie farthest from the target range. In this example, and as illustrated in the table of result rankings 510, the query would have returned records for each of the 12 homes in the database. The addition of the fuzzy scores, however, makes it easy for the user to visualize the records that best conform to the original search parameters.

In the case of no fuzziness, .alpha.1=0, .alpha.2=0, and where the bias (.beta.) value is less than zero (i.e. biased toward the lower bound of the target range), the Lorentzian fuzzy score has the following formula:

For .beta.<0, .alpha.1=0, and .alpha.2=0, ##EQU5##

where x represents any data value, i.e., x.di-elect cons.{x.sub.1, . . . , x.sub.k }.

In the case of no fuzziness, .alpha.1=0, .alpha.2=0, and where the bias (.beta.) value is greater than zero (i.e. biased toward the upper bound of the target range), the Lorentzian fuzzy score has the following formula:

For .beta.>0, .alpha.1=0, and .alpha.2=0, ##EQU6##

where x represents any data value, i.e., x.di-elect cons.{x.sub.1, . . . , x.sub.k }.

FIG. 6 shows a sample plot 600 of S(x) with x.sub.min =200000, x.sub.max =250000, .alpha.1=0, .alpha.2=0, and .beta.=-0.1. FIG. 6 also shows another sample plot 610 of S(x) with x.sub.min =200000, x.sub.max =250000, .alpha.1=0, .alpha.2=0, and .beta.=0.1. As with plots 400 and 500, plots 600 and 610 illustrate the effect that a negative and a positive bias have on the calculated fuzzy scores. As previously stated, a negative bias results in the negative slope of plot 600 between the target values, while a positive bias results in the positive slope of plot 610 between the target values. Since the fuzziness parameters, .alpha.1 and .alpha.2, were set to zero in both plot 600 and plot 610, all data values lying outside of the user-defined target range are given a fuzzy score of zero. While the same query produced 12 records with fuzziness (as illustrated in FIGS. 4 and 5), without fuzziness only 4 records are retrieved. Moreover, records that would probably be of interest to the user, e.g. the $252,000 home, are never retrieved when the query does not incorporate the fuzziness parameters.

In the special case where the target range degenerates into a single value (i.e. x.sub.min =x.sub.max =x) and where the bias (.beta.) value is less than zero (i.e. biased toward the lower bound of the target range), the Lorentzian fuzzy score has the following formula:

For .beta.<0 and x.sub.min =x.sub.max =x, ##EQU7##

where x represents any data value, i.e., x.di-elect cons.{x.sub.1, . . . , x.sub.k }.

In the special case where the target range degenerates into a single value (i.e. x.sub.min =x.sub.max =x) and where the bias (.beta.) value is greater than zero (i.e. biased toward the upper bound of the target range), the Lorentzian fuzzy score has the following formula:

For .beta.>0 and x.sub.min =x.sub.max =x, ##EQU8##

where x represents any data value, i.e., x.di-elect cons.{x.sub.1, . . . , x.sub.k }.

FIG. 7 shows a sample plot 700 of S(x) with x=225000, .alpha.1=1, .alpha.2=1, and .beta.=-0.1. FIG. 7 also shows another sample plot 710 of S(x) with x=225000, .alpha.1=1, .alpha.2=1, and .beta.=0.1.
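The non-zero scores outside the target range in the plots described above reflect the long tail of a Lorentzian curve. As a rough orientation only, the sketch below evaluates the generic textbook Lorentzian, 1 / (1 + (d/w)^2), at a few distances d from a target. It is not the fuzzy score formula of this description, whose equations are not reproduced in this text; the function name `lorentzian` and the width w of $25,000 are assumptions chosen purely for illustration.

```python
def lorentzian(distance, width):
    """Generic Lorentzian profile: 1 at distance 0, 0.5 at distance == width."""
    if width <= 0:
        raise ValueError("width must be positive")
    return 1.0 / (1.0 + (distance / width) ** 2)


# Hypothetical width, chosen only to keep the numbers easy to read.
width = 25_000
for distance in (0, 25_000, 50_000, 100_000):
    value = lorentzian(distance, width)
    print(f"{distance:>7} away from the target -> {value:.3f}")
```

The value never reaches exactly zero, which matches the observation that records far from the target range still receive a small score whenever fuzziness is enabled.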
The effect of the bias in plots 700 and 710 is more difficult to discern since there is no target range. However, a close inspection of plot 700 shows that the negative bias does influence the calculated fuzzy scores for those data values less than the target value, and a close inspection of plot 710 shows that the positive bias does influence the calculated fuzzy scores for those data values greater than the target value.

While each of the above examples has dealt with single-term queries (i.e. where the user is searching based only on the selling price of a home), the present invention is easily extended to include queries with multiple search terms. To illustrate how the Lorentzian function can be used in a multiple-field query, a more sophisticated home buying example is explored. The home buying database in this example is on the Internet, and the user is given the option to search for new homes based on selling price, number of rooms, and number of baths. For this example, it is assumed the database contains the following records:
Selling Price     $150,000  $170,000  $195,000  $200,000  $225,000  $230,000  $235,000  $260,000  $280,000
Number of Rooms   3         2         3         3         5         3         4         4         3
Number of Baths   2.5       2         4         3         3.5       2         3         3         1.5
It is further assumed that the user has entered the following query:
Fuzziness: Medium
Selling Price: $200,000-$230,000 Bias: Lower Priced Homes
Number of Rooms: 3 Bias: More Rooms
Number of Baths: 3 Bias: More Baths
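For comparison with the fuzzy ranking discussed in this section, an exact-match search over the records above using this query's target values can be sketched as follows. The tuple layout and the function name `exact_match` are hypothetical; the data values are those of the records table, and the bias and fuzziness settings play no part in an exact-match search.

```python
# Records from the table above: (selling price, number of rooms, number of baths).
homes = [
    (150_000, 3, 2.5),
    (170_000, 2, 2),
    (195_000, 3, 4),
    (200_000, 3, 3),
    (225_000, 5, 3.5),
    (230_000, 3, 2),
    (235_000, 4, 3),
    (260_000, 4, 3),
    (280_000, 3, 1.5),
]


def exact_match(home, price_range=(200_000, 230_000), rooms=3, baths=3):
    """True only if every field satisfies the query exactly (no fuzziness)."""
    price, n_rooms, n_baths = home
    low, high = price_range
    return low <= price <= high and n_rooms == rooms and n_baths == baths


matches = [home for home in homes if exact_match(home)]
print(matches)  # [(200000, 3, 3)]
```

Only one of the nine records survives the strict filter, so near misses such as the $195,000 home with 3 rooms and 4 baths are dropped entirely.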
As previously discussed, and as illustrated here, the database or search tool/engine may be set up to allow the user to input non-numeric bias and fuzziness parameters that are more intuitive and user friendly. These non-numeric parameters are then equated to specific numeric parameters by the database application, search tool/engine or other system/software. For this example, the non-numeric bias selections are equated to the following numeric values: Lower Priced Homes = -0.1, More Rooms = 0.1, More Baths = 0.1. In addition, the non-numeric fuzziness selection is equated as follows: Medium .fwdarw. .alpha.1=2, .alpha.2=2. While the user, in this example, was only allowed to enter a single fuzziness parameter for the entire query, it is contemplated in another embodiment that each query field (i.e. selling price, number of rooms, number of baths) could have its own separate fuzziness parameter.

The process for searching the database and ranking the results of the query is straightforward. First, a fuzzy score is calculated for each query field and for each record using the appropriate Lorentzian formula. Second, an aggregate fuzzy score is calculated for each record using the fuzzy scores for each query field. Finally, the results are ranked according to the aggregate fuzzy score calculated for each record.

FIG. 8 shows the fuzzy score 800 for each record based only on selling price. Since the user has specified a bias toward the lower bound of the target range (as indicated by the user's desire for lower priced homes), the following Lorentzian fuzzy score formula is used:

For .beta.<0, ##EQU9##

where x represents any data value, i.e., x.di-elect cons.{x.sub.1, . . . , x.sub.k }, x.sub.min =200000, x.sub.max =230000, .alpha.1=2, .alpha.2=2, and .beta.=-0.1.

FIG. 9 shows the fuzzy score 900 for each record based only on number of rooms. Since the user has specified a bias toward the upper bound of the target range (as indicated by the user's desire for more rooms), the following Lorentzian fuzzy score formula is used:

For .beta.>0 and x.sub.min =x.sub.max =x, ##EQU10##

where x represents any data value, i.e., x.di-elect cons.{x.sub.1, . . . , x.sub.k }, x=3, .alpha.1=2, .alpha.2=2, and .beta.=0.1.

FIG. 10 shows the fuzzy score 1000 for each record based only on number of baths. Since the user has specified a bias toward the upper bound of the target range (as indicated by the user's desire for more baths), and since a single target value is also specified, the same Lorentzian fuzzy score formula that was used for number of rooms is used here.

Once a fuzzy score is calculated for each query field for each record, an aggregate fuzzy score is calculated for each record. In one embodiment, this is accomplished by simply adding the fuzzy scores of each query field together. In another embodiment, an aggregate fuzzy score is calculated by using a weighted sum, so that the results of one or more specific query fields can be given more weight. For example, a user may consider the selling price of a home more important than any of the other query fields. In order to incorporate this into the ranking methodology, the fuzzy scores calculated based only on the selling price of the home are multiplied by some factor so that the selling price of the home has more influence on the aggregate fuzzy scores.

The aggregate fuzzy scores 1100 for each record in this example are shown in FIG. 11. In this example the fuzzy scores of each query field have simply been added together. Once the aggregate fuzzy scores are calculated, the database records are ranked according to aggregate fuzzy score. The ranked database records 1110 are also shown in FIG. 11. With the homes ranked, the user can easily identify those homes that best match the user's query.
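The aggregation and ranking steps described above (plain sum or weighted sum, then sort by aggregate score) can be sketched as follows. The function names `aggregate` and `rank`, the record identifiers, and all per-field score values are hypothetical; in particular, the numbers are not the scores plotted in FIGS. 8 through 11.

```python
def aggregate(field_scores, weights=None):
    """Combine per-field fuzzy scores into one score (plain or weighted sum)."""
    if weights is None:
        weights = {field: 1.0 for field in field_scores}
    return sum(weights[field] * score for field, score in field_scores.items())


def rank(records, weights=None):
    """Return (record id, aggregate score) pairs, best match first."""
    totals = {rid: aggregate(scores, weights) for rid, scores in records.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up per-field scores for three made-up records (illustration only).
records = {
    "A": {"price": 0.9, "rooms": 1.0, "baths": 0.4},
    "B": {"price": 1.0, "rooms": 0.5, "baths": 0.5},
    "C": {"price": 0.2, "rooms": 1.0, "baths": 1.0},
}

plain = rank(records)
price_heavy = rank(records, {"price": 3.0, "rooms": 1.0, "baths": 1.0})
print([rid for rid, _ in plain])
print([rid for rid, _ in price_heavy])
```

With the plain sum the order is A, C, B; tripling the weight on price moves B ahead of C, which shows how a weighted sum lets one query field dominate the ranking.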
As would be expected based on the input query, the home selling for $200,000 with 3 rooms and 3 baths is ranked the highest. While a traditional database search would not have retrieved any records outside the user's input targets of $200,000-$230,000, 3 rooms and 3 baths, the present invention has returned all 9 records. Worth noting are the home selling for $195,000 with 3 rooms and 4 baths and the home selling for $235,000 with 4 rooms and 3 baths. Neither of these homes would have been included in the results of a traditional database search, but in the present invention both are ranked high due to the incorporation of the fuzziness parameters in the Lorentzian fuzzy score formulas.

Other embodiments and uses of the present invention will be apparent to those skilled in the art from consideration of this application and practice of the invention disclosed herein. The present description and examples should be considered exemplary only, with the true scope and spirit of the invention being indicated by the following claims. As will be understood by those of ordinary skill in the art, variations and modifications of each of the disclosed embodiments, including combinations thereof, can be made within the scope of this invention as defined by the following claims.
