Method for predicting ratings6249785Abstract Predictions are based primarily upon similarities in pairs of ratings, irrespective of the actual value of the ratings. A table is used to translate each pair of ratings into rankings that are used to make predictions of future ratings. Similar ratings are ranked higher than dissimilar ratings. The prediction is based upon the average ({character pullout}) of the books linked to the book of interest, as rated by the user, plus the difference (.delta.) between the average rating of the book of interest, as rated by all users, and the average ratings of the linked books, as rated by the user. The averages may be weighted by the rankings. Alternatively, the prediction is based upon the cumulative values applied to books linked to the books rated by the user, where the values are based upon the user's ratings of the rated books. Claims What is claimed is: Description COPYRIGHT AUTHORIZATION
p W
1 0
2 0.23
3 0.44
4 0.60
etc.
In FIG. 10, table 49 is a map of the average positive coefficients multiplied by the weighting function. If one compares FIG. 10 with FIG. 8, it is clear that there are no coefficients remaining that are based upon a single pair of rated books. In FIG. 11, table 51 represents the weighted, average, positive coefficients above a predetermined threshold. In one embodiment of the invention, the threshold was 0.36. For the sake of this example, with such a small population of books, the threshold is 0.30. What are left in table 51 are the coefficients and identities of links (pairs) that can reliably be used to predict the response to an unrated book by a given reader. FIGS. 8, 9, 10, and 11 represent a winnowing process for finding the desired coefficients. The winnowing process looks at similarity of rating and the numbers of ratings for the desired coefficients. Once found, the coefficients are used to predict a reader's response to a book that he has not read. FIG. 12 is the same as FIG. 1 with the addition of two predictions. In accordance with the invention, any book that was not rated by a reader can be the subject of a prediction if there are at least two links between that book and other books. Looking at book B in FIG. 12, three people have not read the book. In table 51 (FIG. 11), neither row B nor column B has any links. Therefore, there is insufficient information to make a prediction about book B. Reader #2 has not read book G. According to table 51 (FIG. 11), book A is linked to book G (column A) and book H is linked to book G (row H). Reader #2 has read book A and book H. Therefore, there is sufficient information to make a prediction about book G for reader #2. Because the reader has read the linked books, the reader's own scale of evaluation is used to provide a predicted value. Thus, it does not matter whether a reader tends to rate high, low, or in a condensed range somewhere in the scale of one to ten. The predicted rating will track the reader's rating technique. The coefficients shown in table 51 (FIG. 11) are quantized to one of eight levels. The quantizing steps are not linearly related and are adjusted to minimize the difference between predicted and actual ratings as the database is developed. At present, the quantizing table is as follows.
TABLE 55
from to .rho.
0.00 0.10 0.00
>0.10 0.20 0.10
>0.20 0.30 0.20
>0.30 0.40 0.35
>0.40 0.45 0.50
>0.45 0.50 0.65
>0.50 0.55 0.80
>0.55 1.00 1.00
Rho (.rho.) is determined by quantizing the coefficients from table 51 (FIG. 11) in accordance with table 55 above. .rho. is applied to the average ratings of the linked books, i.e. their popularity, to make a prediction. EXAMPLE 1 Reader #2 wants a predicted rating for book G, for which links exist in table 51 (FIG. 11) to books A and H. Link A,G has a .rho. of 0.35 (see FIG. 11 and table 55), as does link G,H. Reader #2 has rated book A as 5 and rated book H as 8, for a weighted average rating, {character pullout}, of 6.5 calculated as follows. ##EQU1## The predicted rating is this weighted average rating of the linked books (6.5) plus or minus the difference, .delta., between the average rating of the subject book, i.e. the popularity of book, G, and the weighted averages of the ratings of the linking books. The averages are also weighted in accordance with the quantized data from table 55 and, therefore, in accordance with the similarities of the ratings. ##EQU2## Substituting the values from table 55 (FIG. 11) and from FIG. 12 produces the following. (The average rating of book A by all readers is 6.5, the average rating of book H by all readers is 7.25, and the average rating of book G by all readers is 7.33.) ##EQU3## 7.33-6.88=0.45=.delta. {character pullout}+.delta.=rating 6.50-0.45=6.95 Thus, the predicted rating, after rounding, of book G for reader #2 is 7. EXAMPLE 2 For reader #3, sufficient data is available to predict a rating for book D. From table 51 (FIG. 11), book D is seen to be linked to book C (see row D) and book K (see column D). Reader #3 read both of these books and their weighted average rating is 8.5. The average rating of book C by all readers is 6.75, the average rating of book D by all readers is 6.0, and the average rating of book K by all readers is 6.75. ##EQU4## The difference, .delta., is arbitrarily limited to 0.5. Thus, the predicted rating of book D for reader #3 is 8. From the foregoing examples based upon ten books and four readers, it is believed clear how the process operates with a great many more books and a great many more readers. In one embodiment of the invention, in which the number of books exceeded one thousand, it appears that a user who rates fewer than twenty-five books generally receives either fewer predictions or less accurate predictions than users who have rated more books. This is believed due to the fact that too few books of interest are linked to two or more books rated by that user. Expanding the search to books that are indirectly linked to books rated by that user could improve the situation. The process can be implemented in a computer program written completely from scratch, e.g. in the language known as "C" ("ANSI C", "C", "C+", "C++" etc.), which is particularly well suited to handling arrays of data. Other languages could be used instead. If implemented as a new computer program, look-up tables rather than mathematical expressions are used to implement weighting functions for maximum computing speed. The invention has been implemented in a relational database program known as "Access". Other relational database programs could be used instead. In a relational database, not all data is stored in memory or on disk as a single file but is stored in several, related files. This is particularly advantageous to the invention because the file containing the rating pairs can grow exponentially and it is preferred to have as little other data in the file as possible. FIG. 13 illustrates a memory map of a file containing a plurality of records in which each record included a reader number, a book number, and a rating. A second file, illustrated in FIG. 14, contains a plurality of records including reader number, reader name, demographics data (e.g. address, age, gender), and flags for rapidly grouping data (e.g. inactive reader). The data in the two files is related by the reader number, which is common to records in both files. Similarly, a separate file contains book numbers and book names, as shown in FIG. 15. This data is related to the data in FIG. 13 by book number, which is common to records in both files. Thus, one can display the data from the file illustrated in FIG. 13 but substitute reader's names and book's names for numbers, making the display easily intelligible while minimizing the amount of data in the files. Appendix A is a script for creating the rating pairs as described above, including some debugging statements. Appendix B is a script for predicting ratings. Both of these scripts are for the "Access" database program. The invention can be implemented in a computer programmed as a neural network. The books having a rating are represented by a layer of interconnected input units, wherein the input is the rating. The input units are interconnected by units in a hidden layer to provide the weighting function. Output units are connected to the units in the hidden layer to provide a predicted rating for a book. FIG. 16 is a flowchart of the rating process in accordance with the invention. Step 101 presents the user with a suitable screen listing the books currently known to the database. The books can be divided into groups as desired for ease in presentation and searching. Step 102 asks for additional titles. If the response is positive, a new screen is provided with the appropriate blanks for adequately identifying the book. Step 102 is optional and can be omitted if a user wishes to rate books already listed in the database. Step 103 elicits the ratings by a reader of the particular books identified or selected. The ratings are stored in memory in association with the reader's name and in association with the book's name. Step 104 ends the reader's input and begins processing the data to create the book pairs, e.g. as illustrated in FIG. 3. Table of coefficients 105 is the same as illustrated in either FIG. 2 or FIG. 22. The coefficients favor similar ratings, as described above. Steps 107, 108, and 109 cover calculations for all books for all users and may take a considerable length of time for a large number of books, e.g. over one thousand. Preferably, the steps are executed "off-line", i.e. while the system is not being accessed by users. One could, after the program had reached a reasonable size, e.g. more than 1,000 books, make predictions based on old data while new data was being processed. This would give the appearance of immediacy and satisfy an anxious user but is not preferred. An alternative embodiment of the invention, described in connection with FIGS. 19, 20, and 21, provides immediate predictions based upon the user's ratings of some books and is the preferred way to provide immediate predictions. Greater breadth and more accurate predictions result from the off-line process illustrated in FIG. 17. Step 107 and weighting function 108 summarize the process illustrated in FIGS. 7-11. Step 109 summarizes the steps described above in Examples 1 and 2. FIG. 18 is a flowchart for making predictions for a user on-line after the ratings for all books by all users have been processed off-line. A user is asked to enter a request in step 111. The request can be more detailed that simply asking, for example, for the five most likely books. One can refine the search by including other criteria, such as the type of book, e.g. fiction or nonfiction, or further divisions within a type, or one can restrict the search by author or some other criteria. The additional criteria are used with the calculated predictions to reduce the number of titles suggested in a typical fashion for a Boolean search (AND, OR, NOT, etc.). In step 112, the computer performs the search according to the criteria provided. In step 113, one can optionally omit certain predictions, e.g. based upon gender. In one embodiment of the invention, it turned out that there were a disproportionate number of female readers in the database, with the result that a book like "Little Women" would be recommended to a sixty year old male reader. While the book is a classic and the gentleman enjoys classics generally, the prediction was not seen as appropriate. Step 113 allows one to accommodate age, gender, or other criteria automatically from demographic information on the reader (FIG. 14). In step 114, the titles that have survived the screening process are presented to the reader with the anticipated rating. If, since last using the program, a reader may have read one or more of the books, an opportunity is provided to enter the data in steps 116 and 117. The new data has no immediate effect until the data is processed, preferably off-line (FIG. 17). Step 120 enables the user to refine the search or to end the session, step 123. A first time user desiring immediate results can be accommodated by an alternative embodiment of the invention, illustrated in FIG. 19. This process is somewhat the reverse of the preferred embodiment in that the process starts with the user's rating of books in the database rather than upon linkages established by all the readers. Step 131 calls for the creation of a table containing all permutations of pairs of books in the database having a weighted coefficient above the predetermined threshold. Continuing the example of ten books and four readers, FIG. 20 illustrates such a table based upon table 51 (FIG. 11). The user is then asked to rate whatever books in the database are of interest to the user, step 132. The ratings are stored and a table from the ratings as illustrated in FIG. 21. In FIG. 21, table 140 represents the ratings of four books by a first time user. In table 140, the first column is the name of the book, the second column is the rating of the book by the user on a scale of 1-10, and the third column is an assigned value based upon the rating as follows.
Rating Value
10 +1.0
9 +0.8
8 0.0
7 0.0
6 -0.5
5 -0.8
4 -1.0
3 -1.0
2 -1.0
1 -1.0
The ratings generally reflect a tendency to rate a large number of books either a 7 or an 8. Thus, only ratings above or below this value are given significance. The values table is created as follows. In FIG. 21, the rating of book C has a value of +0.8. This value is the applied to every book linked to book C in the permutation table. Thus, as indicated by table 141, book D i s assigned a value of 0.8 and book K is assigned a value of 0.8. In the next line, book D was rated 5, which has a value of -0.5. This value is assigned to every book linked to book D in the permutation table. Book C is linked to books C and K. Book C is added to the table. Book K is already in the table; therefore, the value for book K is the sum of the values of the ratings (0.8-0.5=0.3). Book G is linked to three books that are not in the table, A, F, and H. These books are added to the table with the rating of book G, producing table 143. Book H is linked to books A and G (FIG. 20) but was rated 7. In this case, the rating is ignored because its value is zero, producing table 144, which is the same as table 143. This completes step 133 (FIG. 19). As indicated by bracket 134, steps 131, 132, 133 can be combined into a single step wherein a complete table of permutations is not first generated. Instead, the appropriate lines of the values table are generated as needed, depending upon which books are rated by the user. The values in the resulting table are then sorted, preferably in descending order, step 137, and the top n books are presented to the user, excluding the ones already read by the user. Alternatively, or in combination, books with a rating value above a predetermined threshold can be recommended to the user. As illustrated in FIG. 21, books A and F are recommended to the user. FIG. 22 is a table of coefficients for use in step 104 and is the actual table used in a database of 1,139 books and 80 readers. Although lacking the symmetry of the table in FIG. 2, table 150 provides a prediction within 1 of an actual rating ninety-four percent of the time. This is far better than other schemes in the prior art, many of which are not even quantitative. The system was tested by deleting the rating of a book by one user, recalculating coefficients, and then asking for a prediction of a rating of that book by the user. The invention thus provides an apparatus and method for accurately predicting a person's subjective choices based upon accurate estimates that are unaffected by individual rating patterns. Having thus described the invention, it will be apparent to those of skill in the art that various modifications can be made within the scope of the invention. For example, it is possible to use secondary links for consistent, frequently occurring pairs of ratings. That is, one may not find more than one book rated by a user that is linked to a book in question but may find several books rated by the user that are linked to the one or more books that are linked to the book in question. Prediction could proceed on the basis of the several links, perhaps with adjustment to the weighting values. The adjustments provided by the tables described above enable one to refine the predictions significantly using real rather than test data. Although described in the context of books, the invention can be applied to many other areas of predicting behavior. Rounding off the final number for the predicted rating can be performed differently at the extremes of the ratings than in the middle. Low numbers are rounded up and high numbers are rounded down for any fractional portion. This is to accomodate the rating patterns of the actual users of the database. The sorting step in FIG. 19 can be omitted by having the computer search for the n highest values.
|
Same subclass Same class |
||||||||||
