Automated content and collaboration-based system and methods for determining and providing content recommendations
6438579
Abstract
A content and collaborative filtering system for recommending entertainment oriented content items, such as music and video, and other media content items to a user based on similarity in profile between the user and other users and between the content indexed in the user's profile and other content in the database. The system stores implicit and explicit ratings data for such content items provided by the users. Upon request of the user, the system accesses the user's profile and corresponding content interests database. The system uses the relationships between the content items to determine a subset of the content items to be referred to the user. The system also correlates a similarity between the user's ratings of the content items and other users' ratings. Based on the correlations, a subset of users is selected that is then used to provide recommendations to the user. The recommended items have a high probability of being subjectively appreciated by the user. The recommendations produced by the system will be represented to the user using a visual representation of the relationships between the content items allowing the user to explore the items related to the recommended items.
Claims
What is claimed is:
Description
BACKGROUND OF THE INVENTION
TABLE I
Data Tables
Table                  Description
1. Favorites           Internally stores identifying information about media
                       content items selected and/or rated by a user, including
                       rating weight and rating confidence information. For
                       example, an external representation of this table or a set
                       of such tables may be used to store particular media
                       content items, user lists of artists and collections
                       of interest, and other tables generally organized by
                       the user to reflect characterizing attributes of media
                       content items and sets of such items that are of some
                       particular interest to the user. Preferably, an in-memory
                       temporary table with persistent database storage.
2. Target Clusters     Internally contains categorization details of user groups,
                       preferably on the basis of the strength of interest
                       relative to some distinguishing characterizing attributes.
                       This information can be used as an index to improve the
                       performance of collaborative-oriented intermediary
                       production of media content item recommendations.
                       Preferably, an in-memory temporary table with persistent
                       database storage.
3. User Profile        Internally contains identifying information reflecting the
                       information contained in user profiles, including
                       characterizing attribute and media content ratings, for the
                       media content and media content items linked to a user.
                       The information in this table is preferably derived from
                       explicit rating information provided by the user and
                       through implicit observations performed by the system
                       against user browsing actions. Preferably, an in-memory
                       temporary table with persistent database storage.
4. Results             Internally contains favorite, target cluster, user
                       correlation, collaborative, and content results data
                       generated in the process of identifying potential
                       recommendations, and final results data determining a
                       recommendation set that may be presented to a user.
                       Preferably, an in-memory temporary table with persistent
                       database storage.
5. Content Relations   Contains linking and weighting information for
                       characterizing attributes of media content items,
                       including, for example, in the case of music oriented
                       media content items, artists, tracks, collections, and
                       genres. Preferably, an in-memory temporary table with
                       persistent database storage.
When a user provides a request to the processing system 42, the working in-memory tables, including the final result tables, are cleared. Recommendations are then determined based on the Favorite media content items identified by the user, employing the content database and the user profiles tables. The recommendations are then stored in a final results table. The media content items in the final results table are then preferably sorted using the weights of the characterizing attributes as a key, with the output list of sorted items being displayed to the user. In a preferred embodiment of the invention, the output list of media content items is presented on a user accessible display, such as a personal computer monitor, kiosk display, personal organizer touch pad screen, mobile phone display, or other communications connected informational screen.
A detailed view of the structure of a preferred embodiment of the present invention is presented in FIG. 2. The logical system architecture 50 of the referral system preferably operates from one or more industry databases 52 that contain lists and information regarding available media content items. A music-oriented industry database would typically contain lists of singles, albums, and CDs produced by artists and other organizations. Biographical information on listed artists and liner notes and pictures for particular media content items may be included. Media clips and content samples, tour poster images, and other images and documentation may also be included or referenced in other, potentially third-party databases.
An expert weighting filter 54 provides a logical map of the various items listed in the industry database 52, relating and providing weighting factors for those items that share characterizing attributes. The map data is preferably stored as sets of one or more binary relations qualified by weighting values. The map data for a music track might include:
{Artist^1 <=> Track^A (1.0)}      {Track^A <=> Movie^5 (1.0)}
{CD^X <=> Track^A (1.0)}          {Artist^1 <=> Collection^M (1.0)}
{Orchestral <=> Track^A (0.6)}    {Artist^1 <=> Artist^2 (0.4)}
{Artist^1 <=> Orchestral (0.1)}   {Orchestral <=> Track^A (0.6)}
{Artist^1 <=> 1990 Blues (0.8)}   {Artist^1 <=> 1980 Soul (0.7)}
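
The relations listed above could be held in memory as weighted pairs and queried by characterizing attribute. The following is a minimal sketch in Python, not the patented implementation: the `Relation` type, the `related` function, and the `min_weight` parameter are hypothetical names introduced only for illustration, and names such as "Artist 1" and "Track A" stand in for the superscripted entries of the listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One symmetric relation between two attributes, qualified by a weight."""
    left: str
    right: str
    weight: float


# Illustrative copy of the example map data (the duplicate entry is listed once).
RELATIONS = [
    Relation("Artist 1", "Track A", 1.0),
    Relation("Track A", "Movie 5", 1.0),
    Relation("CD X", "Track A", 1.0),
    Relation("Artist 1", "Collection M", 1.0),
    Relation("Orchestral", "Track A", 0.6),
    Relation("Artist 1", "Artist 2", 0.4),
    Relation("Artist 1", "Orchestral", 0.1),
    Relation("Artist 1", "1990 Blues", 0.8),
    Relation("Artist 1", "1980 Soul", 0.7),
]


def related(attribute, relations, min_weight=0.0):
    """Return (other_attribute, weight) pairs linked to `attribute`.

    Relations weaker than `min_weight` are dropped; the result is ordered
    from the strongest relation to the weakest.
    """
    hits = []
    for r in relations:
        if r.weight < min_weight:
            continue
        if r.left == attribute:
            hits.append((r.right, r.weight))
        elif r.right == attribute:
            hits.append((r.left, r.weight))
    return sorted(hits, key=lambda hit: -hit[1])
```

Calling `related("Artist 1", RELATIONS, min_weight=0.5)` keeps only the stronger relations for that artist, which is one way a map of this kind could act as a filter over the industry database.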
A map weighting value of 1.0 preferably indicates a fixed relationship: a particular artist recorded a particular track. Although a weighting value is assigned, such relations are definitionally just binary relations. Lesser weighting values, which characterize weighted relations, represent a subjective expert opinion on similarity. This map can therefore be used to selectively filter media content items listed in the industry database 52 based on particular characterizing attributes and further qualified, optionally, by a minimum weighting value.
A final weighting filter 56 is preferably used to combine the product of the expert weighting filter 54 with group behaviors 60 that are collected from the users of the system 50 and behaviors obtained from other sources. In the preferred embodiment of the present invention, the group behaviors reflect the consideration and review of different media content items collectively by the users. Preferably, these behaviors are represented in the final weighting filter 56 again as a map constructed of binary relationships between characterizing attributes of media content items qualified by weighting values. The further filtering performed through the final weighting filter 56 thus effectively implements a collaborative function reflecting the values and interests of the user community, resulting in a desirable, though preferably attenuated, bias in the selection of the recommendation set produced by the system 50.
The other behaviors collected as part of the group behaviors 60, when used, are preferably derived from externally generated polls, rankings and ratings of different media content items. These external behavior sources may include weekly and other top-ten lists of popular media content items, published reviews of significant new artistic works, current sponsorship programs, and advertising.
In each of these cases, the information provided by these sources is externally resolved into a form that, processed through the collected group behaviors 60, may be applied as a filtering map with the collected user behaviors against the product of the expert weighting filter 54. Over time, the final weighting filter 56 may therefore effectively subsume the function and even operation of the expert weighting filter 54.
The product of the final weighting filter 56 is prepared and provided as one of two information inputs into a referral system 62. The second input is obtained from a current user profile 64 created by the current user. That is, whenever a user logs in or is otherwise identified to the system 50, the user profile 64 is accessed from the profiles database 24. The user profile preferably contains data representing the characterizing attributes of media content items that are personally of interest to the user. This data is used to define the subjective relative rankings of particular artists, genres, and media content items as viewed by the user. Preferably, however, this data is ultimately stored in the user profile 64 again as binary media content relations.
While some aspects of the user profile 64 may be edited directly 66, including specific statements of user identity and specific interests in different media content and content items, much of the user profile 64 is preferably obtained from a user behaviors 68 analysis. User actions, obtained by monitoring direct user input actions 66 and user browse actions 70 in navigating recommendation sets 72, are preferably examined to identify general and particular interests of the user and gauge the relative strengths of these interests. As generally shown in FIG. 3, a behavior analysis system 78 collects explicit and implied behaviors from the user.
Explicit behaviors are defined as direct actions taken by a user that directly identify a level of interest relationship between characterizing attributes of media content items. As such, an empirical selection of explicit behaviors can be identified as reflecting the most direct indications of user interest. In a preferred embodiment of the present invention, the explicit behaviors monitored and analyzed in connection with the development of user profiles are listed in Table II.
TABLE II
Explicit User Behaviors
Activity                  Description
1. Interviews             On initial establishment of a user profile, a user
                          may elect to be interviewed or surveyed to collect
                          information helpful in constructing an initial
                          profile.
2. Ongoing Ratings        A specific ratings request presented whenever a
                          particular content item is considered by a user;
                          provided to allow the user to continually update
                          and refine the user's profile.
3. Rating of Specific     Spot-light type quick rating poll presented to user
   New Content            regarding new or special content items.
4. Review Ratings         Rating of perceived value of opinions expressed
                          by particular analysts, periodicals, and other
                          information resources.
5. Post Purchase Ratings  Prompted rating of prior content purchases.
6. Profile Changes        Edits and specific changes made to the user
                          profile.
As indicated in FIG. 3, the user may be interviewed, surveyed, and variously questioned initially and on an ongoing basis 84 to obtain direct statements of user interest in specific media content attributes and content items and the relative strength of these interests. Preferably, user directed edits of the user profile are also supported 86. To support these edits, the user is preferably presented with a representation of the interest relationships stored in the user profile 64, allowing the user to adjust the relations including, in particular, the weighting values of the displayed relations. Implicit behaviors monitored and analyzed are likewise identified empirically, though primarily from the actions a user makes in navigating a recommendation set 88, including reviewing and considering individual and groups of the media content items recommended. The implicit behaviors 88 recognized from the monitored and analyzed browsing actions 88, in a preferred embodiment of the present invention, in support of the ongoing development of user profiles, are listed in Table III.
TABLE III
Implicit User Behaviors
Activity              Description
1. Searches           Items and criteria specified as search parameters
2. Pre-Screening      Items listened to or viewed.
3. Document Review    Artist and Collection descriptions reviewed
4. Content Reviews    Viewing of reviews by analysts
5. Purchase Actions   Items added to a purchase list, gift list, and
                      actually purchased.
6. Adding to List     Items added to wish and reminder lists
7. Browsing Time      Time spent in connection with particular Items,
                      collections and genres
These implicit user behaviors 80 are analyzed to identify media content attribute and media content item interests implicitly expressed by the user through browsing activities. Preferably, the result of this analysis is again a set of binary relations between characterizing attributes of media content items and a relative weighting of the relations representing the strength of the interests.
The binary relations and weightings produced from the explicit behaviors 82 and implied behaviors 80 are represented as ratings 90 that are then stored in the user profile 64 within the profile store 24. In a preferred embodiment of the present invention, the ratings are represented as normalized values within a range of 1.0 to -1.0, inclusive. Preferably, confidence levels 92 are also produced with the ratings 90. For explicit behaviors 82, the confidence level is generally represented as a normalized 1.0 value. For implied behaviors 80, the confidence level is itself a product of the user action analysis. Thus, confidence levels for implied behaviors are empirically-based reflections of the certainty that the monitored user actions represent an interest, and the determined strength of that interest, by a user. Normalized, the confidence levels for implicit ratings are preferably in the range between 1.0 and 0.0.
Confidence levels are preferably produced and used subsequently in connection with references to the user profile 64 by the referral system 62. These confidence levels are preferably also maintained for general use as part of the collected group behaviors 60. While the expert weighting filter 54 may also include confidence level data for the relationships established as part of the filter 54, preferred embodiments of the present invention do not utilize such confidence data.
Rather, the weighted relations data provided by the expert weighting filter 54 is accepted as provided, with any subsequent modifications by whatever party maintains and updates the expert weighting filter 54 data taken as representing any change in the weighted relations over time. Conversely, the present invention recognizes the potential for change in user interests over time by progressively reducing the confidence levels associated with at least the user and group implied behavior ratings. The rate of progression is again empirical, though subject to testing based on the variance in user actions that support the continued rating and confidence level of particular evidenced interests. Consequently, the active use of the system 50 over time enables the user profile 64 to remain a close reflection of the user interests, even as those interests may change over time. Equally, interests incorrectly presumed to exist through implied behavior analysis are unlikely to be repeated or to be repeated frequently, so that the confidence levels associated with those ratings are downgraded over time.
Referring to FIG. 4, the resulting set of user profiles 24 can be viewed as a pool or sparse matrix 96 of interrelated characterizing attributes derived from explicit, implicit and other direct rating information sources 100, 102, 104. Preferably, the relations are further separately identifiable by identifications of the individual users who have profiles 64. Another view of the sparse matrix 96 is shown in FIG. 5. The cells of the matrix 96 store data for combinations of users and particular characterizing attributes. Here, specific instances of collections (Co_N), tracks (Tr_X), artists (Ar_Y) and genres (Ge_Z) are correlated against the ratings and confidence levels of individual users.
Again referring to FIG. 2, the portion of the sparse matrix 96 corresponding to the current user of the system 50 is presented as the current user profile 64 to the referral system 62. In response to a user request action, the referral system 62 produces a recommendation set 72. The form of these requests may be varied. Each request as made presentable by a current user, however, preferably identifies some basis or starting point for the media content items 52 known to the system 50 to be refined into a recommendation set 72. Preferably, a user request action identifies a media content item to the system 50. Other request types are listed in Table IV.
TABLE IV
Request Types
Basis                    Description
1. Media Content Item    Provide a set of recommendations based on or
                         similar to a particular media content item
                         consistent with the user profile.
2. New Dance             Provide recommendations of new releases in the
                         Dance genre consistent with the user profile.
3. Top Ten Pop Tracks    Provide recommendations of media content
                         items similar to the current top ten pop tracks
                         that are consistent with the user profile.
4. Re-Releases           Provide recommendations of recent re-released
                         collections based on or consistent with the user
                         profile.
As illustrated in FIG. 6, the referral system 62 operates preferably as a graph traversal system over a data set collectively constructed from the user profile 64 and the product of the final weighting filter 56. Specific relations that are fixed as one-to-one, such as between an artist and a particular track, are defined as binary relations. Other, in effect subjective, relations are weighted relations. The values of the weighted relations are specific to the particular characterizing attributes related, such as between different tracks and between an artist and a specific genre.
Based on the weighted relations between different characterizing attributes of the known media content items, a traversal of the data set can be made from any request identified starting point to a set of the most strongly related other media content items. Thus, a request to identify a media content collection similar to a given track may result in a graph traversal:
Track_1 → Artist_1 → Genre_A → Artist_2 → Tracks_2 → Collection_X
where each step of the traversal is qualified by the weighted rating and confidence level of the step. Each completed traversal therefore has a final computed rating and confidence level. The weighting values accumulated for traversal steps are derived from the corresponding weighted relations, if any, given in the sparse matrix of the user profile 64 and the relation weightings provided from the final weighting filter 56. Preferably, an empirical normalization is applied by the referral system to correspondingly balance the relative significance that is placed on weightings provided separately by the user profile 64 and final weighting filter 56. In similar manner, an empirical normalization is applied by the final weighting filter 56 relative to the weightings received from the collected group behaviors and the expert weighting filter 54.
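
A weighted graph traversal of this kind might be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the disclosed implementation: the source says only that each step is qualified by a rating and a confidence level, so multiplying them along the path is an assumption, and the names `build_graph`, `traverse`, `min_score`, and `max_steps`, as well as the example weights, are hypothetical.

```python
from collections import defaultdict

# Hypothetical relations: (attribute, attribute, weight, confidence).
RELATIONS = [
    ("Track 1", "Artist 1", 1.0, 1.0),
    ("Artist 1", "Genre A", 0.8, 1.0),
    ("Genre A", "Artist 2", 0.7, 0.9),
    ("Artist 2", "Track 2", 1.0, 1.0),
    ("Track 2", "Collection X", 1.0, 1.0),
]


def build_graph(relations):
    """Index symmetric relations so each attribute lists its neighbours."""
    graph = defaultdict(list)
    for a, b, weight, confidence in relations:
        graph[a].append((b, weight, confidence))
        graph[b].append((a, weight, confidence))
    return graph


def traverse(graph, start, is_target, min_score=0.2, max_steps=5):
    """Walk outward from `start`, collecting target nodes with their scores.

    Assumption: rating and confidence are each multiplied along the path.
    A path is abandoned when rating * confidence drops below `min_score`
    or when it already has `max_steps` steps.
    """
    results = {}
    stack = [(start, (start,), 1.0, 1.0)]
    while stack:
        node, path, rating, confidence = stack.pop()
        if node != start and is_target(node):
            best = results.get(node)
            if best is None or rating * confidence > best[0] * best[1]:
                results[node] = (rating, confidence, path)
            continue
        if len(path) - 1 >= max_steps:
            continue
        for nxt, weight, conf in graph[node]:
            if nxt in path:  # never revisit an attribute on the same path
                continue
            new_rating, new_conf = rating * weight, confidence * conf
            if new_rating * new_conf < min_score:
                continue
            stack.append((nxt, path + (nxt,), new_rating, new_conf))
    # Strongest completed traversals first.
    return sorted(results.items(), key=lambda kv: -(kv[1][0] * kv[1][1]))
```

With the example data, `traverse(build_graph(RELATIONS), "Track 1", lambda n: n.startswith("Collection"))` completes a single traversal ending at "Collection X"; lowering `max_steps` or raising `min_score` causes that traversal to be abandoned before it completes.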
Thus, normalized, traversals that complete may then be ranked and sorted based on whatever criteria are selected by the user, whether alphabetically by artist, total strength rating, or level of confidence. Traversals that are not completed can be the result of an accumulated rating and confidence level falling below a defined threshold. Other traversals may not be completed once they exceed a defined limit in the number of steps.
Based on the presented recommendation set 72, the user is enabled to browse 70 the identified set of media content items and present a new request to the system 50. This new request may be independent of the recommendation set 72, though preferably is based on some characterizing attribute of the set 72. Thus, in reviewing and considering the individual merits of the media content items within a recommendation set 72, the user may reference a media content item or characterizing attribute of the item for use as the basis for a new request and generation of a corresponding recommendation set 72. This cycle may be repeated, each time providing additional information refining the user profile 64 as to the interests of the user and deepening the search for media content items that are of particular interest to the user.
Finally, FIGS. 7A, 7B, and 7C present a flowchart for the overall system 50 for making recommendations to a user according to a preferred embodiment of the invention. FIGS. 7A, 7B, and 7C show the logic flow that the system 50 follows to accept input from and provide results to the user of the system. The examples below illustrate the operation of a preferred embodiment of the present invention to provide a music recommendation service.
EXAMPLES
The preferred uses of the system can be grouped into two main classes.
The first class relates to uses where the system assists the user in narrowing down the number of choices that the user is faced with, at which point the user begins exploring the recommendations and related items using the navigation aids provided by the system before selecting an item to purchase or consume. An example of a use of this first class would be to help users identify compact discs that they may be interested in purchasing. The system would suggest a list of compact discs, and the user would then look at the details of the albums individually, and may listen to some preview samples of the tracks on the album. Alternatively, the user may navigate to related items (albums, artists, genres, etc.) using the relationship navigation tools. Either way, the user would eventually decide which item to purchase based on the information provided.
The second class of possible uses is where the user makes a purchase (or consumption) of a media item recommended by the system based solely on the system's recommendation. In this scenario, the user demonstrates enough trust in the service to accept the automatically generated recommendations. A further example of this class of use is the recommendation of music content that would be automatically purchased and played to the user on a track-by-track basis. Unlike pure collaborative-based systems, the invention described would bootstrap its knowledge with minimal preference information (a single album liked) using the relationships inherent between content items based on prescribed attributes such as genre, artist, and year.
In both classes of examples, the user would be able to provide feedback to the system regarding the recommendations, indicating the degree to which the user liked the recommended item.
The following is a demonstration of the operation of the system as it might be implemented for a database of music albums that are available for purchase from an online fulfiller.
It tracks the operation of the system from the initial user preference input through to the recommendation of music items being provided to the user.
First, the intermediate tables are cleared of data relating to the user recommendations. This occurs automatically and requires no input or intervention by the user. The user is then prompted to enter their favorite music items into the system. The user may have previously entered the information into the system. One or more items may be entered. The user may explicitly enter music items and ratings using a form style interface, or the system may derive implicit ratings of music items based on system-based observations of user actions. The music items may include, but are not limited to, artists, albums, genres, and tracks.
The music items may be specific to a group of related styles of music such as pop, easy listening and dance/pop, or they may span unrelated styles such as pop and death metal. In the first case, the system will recommend music items in the group of related styles. In the second case, the system will prompt the user for the style of music to be used as the basis of the recommendations. The system would then restrict the recommendations to styles closely related to the user's selected style. The ability of the system to automatically detect diverse musical tastes will enable the system to support the delivery of recommendations based on the user's currently desired style. For example, the user may request different music for a dinner party than when playing video games.
The system checks if a restriction is required. If it is, the system loops through the favorites table and removes any items that do not relate to the style selected by the user. The system splits the processing into two streams, the content recommendation and the collaborative recommendation streams.
Content Recommendation
The system accesses the first item in the user's favorites table, in this case the artist Pet Shop Boys. This item has a rating of 9 out of a possible 10. The system compares the item's rating with the current best, set initially to zero as no items have been processed. As it has a higher rating than the current highest rating, the system adds the music item to the content inputs table. It then checks if there are more items in the user's favorites table. There are two remaining items, New Order (6) and The Cranberries (6). Both have a lower rating than the first item and thus would not be added to the content inputs table. It is possible that the restriction of the number of items in the inputs table could be increased from one to more items (possibly three to five) to give the system a larger range of items to base the recommendations on. After processing the remaining two items, the system determines that no more items remain in the favorites table. At this point, the system has a content inputs table containing Pet Shop Boys with a rating of 9.
The system then accesses the first item in the content inputs table. This item is the Pet Shop Boys with a rating of 9. The system searches the artist, artist association, album and genre tables, retrieving music items that are related to the content inputs music item. Each item found is added to the content result table with the associated relationship weight. In this case, Pet Shop Boys belongs to the genres British Pop and 1980s Dance. These two genres are added to the content result table. The Pet Shop Boys are like the group New Order using the Artist Association table. The Pet Shop Boys have an album that has been rated highly called Bilingual. These items are all related to the Pet Shop Boys and are added to the Content Result Table.
Collaborative Recommendation
The system accesses the items in the favorites input table and converts them into a vector stored in memory.
The vector is an array-based representation of the favorites input table that contains the item and user rating for that item. In this case the vector contains, as shown in FIG. 5, Pet Shop Boys, New Order, and the Cranberries. The system accesses the Cluster table. The cluster table contains a finite number of vectors that represent predefined clusters of users. The cluster table is used to improve the performance of the system by not requiring the system to compare the user vector with every other user vector.
The system accesses the first vector in the Cluster table and performs a correlation algorithm, detailed above, to determine the correlation between the cluster and the user. In this case the cluster is Dance and the correlation is 94%. If the correlation is higher than the current highest, which it is here because the highest is initially set to zero, the cluster is added to the target cluster table. The system then checks if there are more cluster vectors in the cluster table. If there are, it loops over the remaining clusters, in this case Heavy Metal and Rock and Roll, calculating each cluster vector's correlation with the user vector, comparing the correlation to the highest correlation, and replacing the vector in the target cluster table with the cluster vector being processed when its correlation is higher (if the correlation is equal to the highest correlation, then the target cluster table will contain two or more vectors). In this case, neither of the remaining vectors correlates better with the user's vector than the Dance cluster. The system has now determined that the user best fits into the cluster called Dance and can narrow down the search for matching users to a subset of the users very quickly. Users can be related to many clusters if they have diverse musical tastes.
Once the system has determined the target cluster for the current user, it updates the user profile table, linking the determined target clusters to the user and time stamping the entries.
This updating of the user profile allows the system, for performance benefits, to skip the cluster determination steps if the request falls within the time-frame for the user's cluster link time-stamps to be valid. In this example case, the skip has been ignored to provide a more detailed example of the process.
The system now accesses the first vector in the target cluster table, Dance. The system then searches the user profile table for the first user(s) linked to the Dance cluster, in this case John and David. The system then calculates the correlation between the selected user profile and the current user's profile: John (89%), David (75%). If the correlation meets the correlation threshold, indicating similar tastes, the system would compare the two user profiles, identifying any items contained in the selected user profile vector that were not present in the current user profile. A weight for each item would be determined by multiplying the correlation with the rating to give the correlated rating weight. If the correlated rating weight is above the correlation weighted rating threshold, indicating the user is highly likely to like the item, the item would be added to the collaborative result table. If the item already exists in the table, the stored rating would be replaced by the average of the two ratings.
The system would then check if there were more user profiles linked to the target cluster table. If there were, the system would access the remaining user vectors, processing each as above. Once the system has completed processing each user linked to the target cluster, the system would check if any target clusters remained in the target cluster table. If there were, the system would process each cluster, correlating the users in the cluster with the current user and determining items which are highly likely to be liked by the current user. At this point, the system has completed the collaborative filtering process.
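
The collaborative stream described above might be sketched as follows. This is a hedged Python illustration only: the correlation algorithm itself is not reproduced in this section, so a Pearson correlation over commonly rated items is assumed here as a stand-in, and the function names, thresholds (`min_correlation`, `min_weight`), and sample ratings are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over items rated in both vectors (assumed metric)."""
    common = sorted(set(a) & set(b))
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def best_clusters(user, clusters):
    """Return the cluster name(s) with the highest correlation to the user.

    The running highest starts at zero; ties keep every tied cluster.
    """
    best, highest = [], 0.0
    for name, vector in clusters.items():
        c = pearson(user, vector)
        if c > highest:
            best, highest = [name], c
        elif c == highest and best:
            best.append(name)
    return best, highest


def collaborative_results(user, neighbours, min_correlation=0.7, min_weight=5.0):
    """Build the collaborative result table as {item: correlated rating weight}."""
    results = {}
    for profile in neighbours:
        c = pearson(user, profile)
        if c < min_correlation:
            continue  # tastes not similar enough
        for item, rating in profile.items():
            if item in user:
                continue  # only items the current user has not rated
            weight = c * rating
            if weight < min_weight:
                continue
            if item in results:
                # Item already present: store the average of the two weights.
                results[item] = (results[item] + weight) / 2
            else:
                results[item] = weight
    return results
```

In this sketch the target cluster would be chosen first with `best_clusters`, and only the profiles linked to that cluster would be passed to `collaborative_results` as `neighbours`, which mirrors the performance argument made for the cluster table.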
A collaborative result table exists which contains music items and their associated correlation weighted rating, which is an indicator of how likely the user is to like the item.
Sort and Display
Once the content related items have been added to the content result table and the collaborative filtering generated items have been added to the collaborative result table, the system combines the two tables together, removing duplicates (averaging the rating weights). The items in the result table are then compared with the user's favorite items table, with any duplicates removed from the result table. This ensures that the system does not display items the user has already rated in the recommendations. If there are no items in the results table, the system displays a "no results found" page. If there are items in the result table, the system sorts these items in descending order using the weight as the key and displays the results to the user. The items with the highest weight are the items most strongly recommended to the user, hence the sort key. The user will be able to explore the recommendations further by providing additional rating information to the system. The user can use the list of results as the basis for subsequent decisions as to which items to sample, purchase, or consume.
While the embodiments set forth above are discussed in terms of recommending musical compact discs, the present invention can also be used to recommend other items, such as videos, digital music (e.g. MP3 files), television shows, books and other consumer entertainment media content.
In view of the above description of the preferred embodiments of the present invention, many modifications and variations of the disclosed embodiments will be readily appreciated by those of skill in the art.
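
The sort-and-display step described above, in which the two result tables are combined, duplicates are averaged, already-rated favorites are removed, and the remainder is sorted by weight, might look like the following Python sketch. The function name `merge_results` and the weights in the usage note are hypothetical and chosen only to make the example easy to follow.

```python
def merge_results(content, collaborative, favorites):
    """Combine both result tables into one ranked recommendation list.

    `content` and `collaborative` map item -> weight; `favorites` is the
    set of items the user has already rated. Returns (item, weight) pairs
    sorted from the highest weight to the lowest. An empty list corresponds
    to the "no results found" case.
    """
    merged = dict(content)
    for item, weight in collaborative.items():
        if item in merged:
            # Duplicate across the two tables: average the rating weights.
            merged[item] = (merged[item] + weight) / 2
        else:
            merged[item] = weight
    for item in favorites:
        # Never recommend something the user has already rated.
        merged.pop(item, None)
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
```

For instance, with illustrative content weights `{"British Pop": 0.5, "1980s Dance": 0.25, "New Order": 0.75, "Bilingual": 1.0}`, collaborative weights `{"Erasure": 0.875, "Bilingual": 0.5}`, and the favorites Pet Shop Boys, New Order, and The Cranberries, New Order is dropped, Bilingual is averaged to 0.75, and Erasure is ranked first.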
