Market analysis, demand forecasting or surveying
Predictive modeling of consumer financial behavior using supervised segmentation and nearest-neighbor matching
6839682
Abstract
Predictive modeling of consumer financial behavior, including determination of likely responses to particular marketing efforts, is provided by application of consumer transaction data to predictive models associated with merchant segments. The merchant segments are derived from the consumer transaction data based on co-occurrences of merchants in sequences of transactions. Merchant vectors represent specific merchants, and are aligned in a vector space as a function of the degree to which the merchants co-occur more or less frequently than expected. Consumer vectors are developed within the vector space, to represent interests of particular consumers by virtue of relative vector positions of consumer and merchant vectors. Various techniques, including clustering, supervised segmentation, and nearest-neighbor analysis, are applied separately or in combination to generate improved predictions of consumer behavior.
Claims
We claim:
1. A method of predicting financial behavior of consumers, comprising:
obtaining a set of input transactions for a plurality of consumers with respect to a plurality of merchants;
defining at least one merchant segment, each merchant being associated with at least one of the defined merchant segments; and
for at least one consumer, applying the input transactions of the consumer by a computer to each of at least one merchant segment predictive model, each merchant segment predictive model defining for a merchant segment a prediction function between input transactions in a past time interval and financial behavior in a subsequent time interval, to produce for each consumer a predicted behavior in each of at least a subset of the merchant segments.
2. The method of claim 1, wherein the predicted behavior comprises a likelihood of positive response to an offer.
3. The method of claim 1, wherein the predicted behavior comprises a spending level with respect to a merchant.
4. The method of claim 1, further comprising:
generating a consumer vector for each of at least a subset of the consumers;
generating a merchant vector for each of at least a subset of the merchants;
wherein defining at least one merchant segment comprises performing supervised segmentation on the merchant vectors.
5. The method of claim 4, wherein performing supervised segmentation comprises applying a learning vector quantization algorithm to the merchant vectors.
6. The method of claim 4, wherein performing supervised segmentation comprises:
initializing a set of segment vectors;
accepting at least one segment label for at least one of the merchants; and
for each of at least a subset of the labeled merchants:
selecting at least one segment vector for a merchant having a merchant vector;
determining whether the selected segment vector matches the segment label for the merchant; and
responsive to the determination, adjusting zero or more of the segment vectors.
7. The method of claim 6, wherein selecting at least one segment vector for a merchant comprises selecting a segment vector that is closest to the merchant vector corresponding to the merchant.
8. The method of claim 6, wherein selecting at least one segment vector for a merchant comprises selecting at least one segment vector having a tolerance range that includes the value of the merchant vector.
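Claims 5 through 8 name learning vector quantization and describe its core loop: pick a segment vector for a labeled merchant, check whether its label matches, and adjust. A minimal sketch of the classic LVQ1 rule follows, assuming squared Euclidean distance for "closest" (claim 7) and a linearly decaying learning rate; the function names `nearest` and `lvq1` and all parameters are hypothetical, and the tolerance-range variant of claim 8 is not shown.

```python
import random


def nearest(segments, x):
    """Index of the segment vector closest to x (squared Euclidean distance)."""
    return min(range(len(segments)),
               key=lambda k: sum((s - v) ** 2 for s, v in zip(segments[k][0], x)))


def lvq1(merchant_vecs, labels, segments, lr=0.1, epochs=20, seed=0):
    """Supervised segmentation with the LVQ1 update rule.

    merchant_vecs: dict merchant -> list[float]
    labels: dict merchant -> segment label (only labeled merchants train the model)
    segments: list of (vector, label) pairs, updated and returned
    """
    rng = random.Random(seed)
    labeled = [m for m in labels if m in merchant_vecs]
    for epoch in range(epochs):
        rng.shuffle(labeled)
        alpha = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for m in labeled:
            x = merchant_vecs[m]
            k = nearest(segments, x)  # the winning segment vector
            vec, seg_label = segments[k]
            # Move the winner toward x if its label matches, away from x otherwise.
            sign = 1.0 if seg_label == labels[m] else -1.0
            segments[k] = ([s + sign * alpha * (v - s) for s, v in zip(vec, x)], seg_label)
    return segments
```

After training, an unlabeled merchant can be placed in the segment of `nearest(segments, its_vector)`.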
9. The method of claim 1, further comprising:
training a predictive model using the predicted behavior of a plurality of consumers in at least a subset of the merchant segments, and additional observed behavior for the plurality of consumers with regard to a target segment not included in the subset of merchant segments; and
for at least one target consumer:
providing as input to the trained predictive model predicted behavior of the target consumer in at least a subset of the merchant segments; and
obtaining from the trained predictive model a predicted behavior of the target consumer with respect to the target segment.
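Claim 9 stacks models: per-segment predictions become the inputs of a second model trained against observed behavior in a target segment. The claim does not fix the model family, so the sketch below assumes, purely for illustration, a linear model fitted by batch gradient descent on squared error; `train_target_model` and `predict_target` are hypothetical names.

```python
def train_target_model(segment_preds, target_obs, lr=0.01, epochs=2000):
    """Fit target ~ w . x + b by batch gradient descent on mean squared error.

    segment_preds: list of per-consumer feature rows (predicted spend per segment)
    target_obs: observed behavior in the target segment, one value per consumer
    """
    n_feat = len(segment_preds[0])
    n = len(segment_preds)
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        gw = [0.0] * n_feat
        gb = 0.0
        for x, y in zip(segment_preds, target_obs):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for i in range(n_feat):
                gw[i] += err * x[i] / n
            gb += err / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict_target(model, x):
    """Predicted target-segment behavior for one consumer's segment predictions."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b
```

Any regressor or classifier could replace the linear model; a linear one simply keeps the sketch dependency-free.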
10. The method of claim 1, further comprising:
for at least one consumer, associating the consumer with the merchant segment for which the consumer had the highest predicted spending relative to other merchant segments.
11. The method of claim 1, further comprising:
generating a consumer vector for each of at least a subset of the consumers;
generating a merchant vector for each of at least a subset of the merchants;
for at least one merchant segment, determining a segment vector as a summary vector of merchant vectors of merchants associated with the segment; and
for at least one consumer, associating the consumer with the merchant segment having the greatest dot product between the segment vector of the segment and a consumer vector of the consumer.
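Claim 11 summarizes each segment by a single vector and assigns a consumer to the segment with the largest dot product. The sketch below assumes the "summary vector" is the mean of member merchant vectors and, as a hypothetical choice not stated in the claim, builds the consumer vector as a spend-weighted sum of the vectors of merchants the consumer used.

```python
def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def segment_vectors(merchant_vecs, segment_of):
    """segment_of: dict merchant -> segment id. Returns segment id -> centroid."""
    members = {}
    for m, seg in segment_of.items():
        members.setdefault(seg, []).append(merchant_vecs[m])
    return {seg: mean_vector(vs) for seg, vs in members.items()}


def consumer_vector(spend_by_merchant, merchant_vecs):
    """Hypothetical: spend-weighted sum of the merchant vectors a consumer used."""
    dim = len(next(iter(merchant_vecs.values())))
    v = [0.0] * dim
    for m, amount in spend_by_merchant.items():
        v = [vi + amount * mi for vi, mi in zip(v, merchant_vecs[m])]
    return v


def assign_segment(consumer_vec, seg_vecs):
    """Segment whose summary vector has the greatest dot product with the consumer."""
    return max(seg_vecs, key=lambda s: dot(consumer_vec, seg_vecs[s]))
```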
12. The method of claim 1, further comprising:
for at least one merchant segment:
ranking the consumers by their predicted spending in the merchant segment; and
determining for at least one consumer a percentile ranking in the merchant segment; and
for each consumer:
determining the merchant segment in which the consumer's percentile ranking is the highest, to uniquely associate each consumer with one merchant segment; and
for at least one merchant segment, determining summary transaction statistics for the consumers uniquely associated with the merchant segment.
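The ranking steps of claim 12 can be sketched as follows, assuming percentile rank is position in ascending order scaled to (0, 100] and that summary statistics are a count and mean spend; ties are broken arbitrarily by sort order. All names are hypothetical.

```python
def percentile_ranks(scores):
    """scores: dict consumer -> predicted spend. Returns consumer -> percentile in (0, 100]."""
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    return {c: 100.0 * (i + 1) / n for i, c in enumerate(ordered)}


def unique_segment_assignment(predicted):
    """predicted: dict segment -> dict consumer -> predicted spend.

    Each consumer goes to the one segment where their percentile rank is highest.
    """
    pct = {seg: percentile_ranks(scores) for seg, scores in predicted.items()}
    consumers = set().union(*(scores.keys() for scores in predicted.values()))
    return {c: max(pct, key=lambda seg: pct[seg].get(c, 0.0)) for c in consumers}


def segment_summary(assignment, spend):
    """Count and mean spend of the consumers uniquely associated with each segment."""
    groups = {}
    for c, seg in assignment.items():
        groups.setdefault(seg, []).append(spend[c])
    return {seg: {"count": len(v), "mean_spend": sum(v) / len(v)} for seg, v in groups.items()}
```

Ranking by percentile rather than raw predicted spend keeps high-ticket segments such as travel from absorbing every consumer.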
13. The method of claim 1, further comprising:
for at least one merchant segment:
ranking the consumers by their predicted spending in the merchant segment;
determining for at least one consumer a percentile ranking in the merchant segment;
selecting, as a population, the consumers having a percentile ranking in excess of a predetermined percentile threshold; and
determining summary transaction statistics for the selected population of consumers.
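Claim 13 instead selects everyone above a percentile cutoff in one segment. A minimal sketch, assuming the same ascending percentile convention as above and count/total/mean as the summary statistics (both assumptions); names are hypothetical.

```python
def top_percentile_population(scores, threshold):
    """Consumers whose percentile rank in a segment exceeds `threshold` (0-100)."""
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    return [c for i, c in enumerate(ordered) if 100.0 * (i + 1) / n > threshold]


def summary_stats(population, spend):
    """Count, total and mean spend for the selected population."""
    values = [spend[c] for c in population]
    mean = sum(values) / len(values) if values else 0.0
    return {"count": len(values), "total": sum(values), "mean": mean}
```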
14. The method of claim 1, further comprising:
establishing for at least one merchant in the transaction data a merchant vector; and
updating the merchant vector of at least one merchant relative to the merchant vectors of other merchants according to co-occurrences of each merchant in the transaction data.
15. The method of claim 14, further comprising:
updating the merchant vector of at least one merchant based upon an unexpected amount of deviation in a frequency of co-occurrence of the merchant with other merchants.
16. The method of claim 14, further comprising:
determining a co-occurrence frequency for at least one merchant with at least one other merchant in the transaction data;
determining for at least one pair of merchants, a relationship strength between the pair of merchants based on how much the determined co-occurrence frequency deviates from an expected co-occurrence frequency;
for at least one pair of merchant vectors, mapping the relationship strength into a vector space as a desired dot product between the respective merchant vectors of the merchants in the pair; and
updating at least one merchant vector so that the actual dot products between each pair of merchant vectors approximate the desired dot product between the merchant vectors.
17. The method of claim 16, wherein determining for at least one pair of merchants a relationship strength between the pair of merchants further comprises:
determining the relationship strength by ##EQU23##
where
$r_{ij}$ is the relationship strength between merchant$_i$ and merchant$_j$ in a pair of merchants;
$T_{ij}$ is the actual co-occurrence frequency of merchant$_i$ and merchant$_j$ in the transaction data; and
$\hat{T}_{ij}$ is the expected co-occurrence frequency of merchant$_i$ and merchant$_j$ in the transaction data.
18. The method of claim 16, wherein determining for at least one pair of merchants a relationship strength between the pair of merchants further comprises:
determining the relationship strength by
$r_{ij} = \operatorname{sign}(T_{ij} - \hat{T}_{ij}) \cdot \sqrt{2 \ln \lambda}$
where
$r_{ij}$ is the relationship strength between merchant$_i$ and merchant$_j$ in a pair of merchants;
$\lambda$ is a log-likelihood ratio;
$T_{ij}$ is the actual co-occurrence frequency of merchant$_i$ and merchant$_j$ in the transaction data; and
$\hat{T}_{ij}$ is the expected co-occurrence frequency of merchant$_i$ and merchant$_j$ in the transaction data.
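Claim 18 leaves the computation of λ implicit. One plausible reading, which is an assumption here, treats λ as the likelihood ratio of a dependent versus an independent model of a 2x2 co-occurrence table, so that 2 ln λ equals Dunning's G² statistic and the formula becomes a signed root log-likelihood. The table layout (k11 = windows containing both merchants, k12 and k21 = windows containing only one, k22 = neither) and the function names are hypothetical.

```python
import math


def g2_statistic(k11, k12, k21, k22):
    """Dunning's G^2 = 2 * sum(O * ln(O / E)) over a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for r in range(2):
        for c in range(2):
            o = observed[r][c]
            if o > 0:  # 0 * ln(0) is taken as 0
                e = rows[r] * cols[c] / n
                g2 += o * math.log(o / e)
    return 2.0 * g2


def relationship_strength(k11, k12, k21, k22):
    """sign(T - T_hat) * sqrt(2 ln lambda), with T_hat from the independence model."""
    n = k11 + k12 + k21 + k22
    expected = (k11 + k12) * (k11 + k21) / n  # expected co-occurrence count
    sign = 1.0 if k11 > expected else -1.0 if k11 < expected else 0.0
    # max() guards against tiny negative values from floating-point rounding.
    return sign * math.sqrt(max(g2_statistic(k11, k12, k21, k22), 0.0))
```

Unlike a raw ratio of actual to expected counts, this statistic grows with the amount of evidence, so high transaction count merchants are not downplayed.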
19. The method of claim 16, wherein determining for at least one pair of merchants a relationship strength between the pair of merchants further comprises:
determining the relationship strength by ##EQU24##
where
$r_{ij}$ is the relationship strength between merchant$_i$ and merchant$_j$ in a pair of merchants;
$\lambda$ is a log-likelihood ratio;
$T_{ij}$ is the actual co-occurrence frequency of merchant$_i$ and merchant$_j$ in the transaction data; and
$\hat{T}_{ij}$ is the expected co-occurrence frequency of merchant$_i$ and merchant$_j$ in the transaction data.
20. The method of claim 16, wherein updating at least one merchant vector so that the actual dot products between the at least one pair of merchant vectors approximate the desired dot product between the merchant vectors comprises a gradient descent update that updates the merchant vectors according to whether the actual dot product between them is greater or less than the desired dot product.
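One way to realize the gradient descent update of claim 20 is stochastic gradient descent on the squared error between actual and desired dot products: when a pair's dot product is too large the two vectors are pushed apart, and when it is too small they are pulled together. The loss function, learning rate, random initialization and the name `fit_vectors` are assumptions for illustration.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_vectors(desired, dim=2, lr=0.05, epochs=500, seed=0):
    """desired: dict (merchant_i, merchant_j) -> desired dot product.

    Minimizes the sum over pairs of (v_i . v_j - d_ij)^2 / 2 by stochastic gradient descent.
    """
    rng = random.Random(seed)
    names = sorted({m for pair in desired for m in pair})
    vecs = {m: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for m in names}
    pairs = list(desired)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for i, j in pairs:
            vi, vj = vecs[i], vecs[j]
            err = dot(vi, vj) - desired[(i, j)]  # > 0: too similar, < 0: not similar enough
            # Both updates use the pre-update vectors vi and vj.
            vecs[i] = [a - lr * err * b for a, b in zip(vi, vj)]
            vecs[j] = [b - lr * err * a for a, b in zip(vi, vj)]
    return vecs
```

In practice the desired dot products would come from the relationship strengths $r_{ij}$ of claims 17 to 19, rescaled into a range the vector space can represent.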
21. The method of claim 16, wherein updating at least one merchant vector so that the actual dot products between the at least one pair of merchant vectors approximate the desired dot product between the merchant vectors comprises determining, for at least one merchant vector, an error-weighted average of the desired positions of the merchant vector, as determined from the current position of at least one other merchant vector and the desired dot product between the merchant vector and the at least one other merchant vector.
22. The method of claim 1, further comprising:
generating a consumer vector for each of at least a subset of the consumers;
generating a merchant vector for each of at least a subset of the merchants;
determining for at least one merchant name in the transaction data a merchant vector;
clustering the merchant vectors to form a plurality of merchant segments, wherein at least one merchant vector is associated with one and only one merchant segment; and
for at least one merchant segment, determining, from the transactions of consumers at the merchants associated with the merchant segment, statistical measures of consumer transactions in the segment.
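Claim 22 clusters merchant vectors into segments with each merchant in exactly one segment. The claim does not name a clustering algorithm; the sketch below swaps in plain k-means (Lloyd's algorithm) as one common choice, then aggregates per-segment transaction count, total amount and distinct consumers. All names and the statistics chosen are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(merchant_vecs, k, iters=50, seed=0):
    """Plain k-means; returns merchant -> segment index (one segment per merchant)."""
    rng = random.Random(seed)
    names = sorted(merchant_vecs)
    centroids = [list(merchant_vecs[m]) for m in rng.sample(names, k)]
    assign = {}
    for _ in range(iters):
        assign = {m: min(range(k), key=lambda c: sq_dist(merchant_vecs[m], centroids[c]))
                  for m in names}
        for c in range(k):
            members = [merchant_vecs[m] for m in names if assign[m] == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def segment_stats(assign, transactions):
    """transactions: list of (consumer, merchant, amount). Per-segment measures."""
    stats = {}
    for consumer, merchant, amount in transactions:
        seg = assign.get(merchant)
        if seg is None:
            continue
        s = stats.setdefault(seg, {"count": 0, "total": 0.0, "consumers": set()})
        s["count"] += 1
        s["total"] += amount
        s["consumers"].add(consumer)
    return stats
```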
23. The method of claim 1, further comprising:
selecting a plurality of consumers associated with at least one merchant segment, the selected plurality selected according to their predicted spending in the merchant segment; and
providing promotional offers to the selected plurality of consumers.
24. The method of claim 1, further comprising:
training at least one of the merchant segment predictive models to predict spending in a predicted time period based upon transaction statistics of the consumer's transactions in a past time period.
25. The method of claim 24, wherein the transaction statistics comprise variables describing the recency of the consumer's transactions in one or more merchant segments, the frequency of the consumer's transactions in one or more merchant segments, and the amount of the consumer's transactions in one or more merchant segments.
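The recency, frequency and amount variables of claim 25 can be computed per segment from one consumer's past-window transactions. A minimal sketch, assuming recency is measured in days before a reference date `as_of` and that transactions at unsegmented merchants are skipped; the function name and record layout are hypothetical.

```python
from datetime import date


def rfm_features(transactions, segment_of, as_of):
    """transactions: list of (date, merchant, amount) for one consumer.

    Returns segment -> {"recency_days", "frequency", "amount"}.
    """
    feats = {}
    for day, merchant, amount in transactions:
        seg = segment_of.get(merchant)
        if seg is None:
            continue  # merchant not assigned to any segment
        f = feats.setdefault(seg, {"recency_days": None, "frequency": 0, "amount": 0.0})
        gap = (as_of - day).days
        if f["recency_days"] is None or gap < f["recency_days"]:
            f["recency_days"] = gap  # days since the most recent transaction
        f["frequency"] += 1
        f["amount"] += amount
    return feats
```

These per-segment features would form the input row of a segment predictive model trained on a past interval and scored against spending in the subsequent one.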
26. A system for predicting financial behavior of consumers, comprising:
a database for storing a set of input transactions for a plurality of consumers with respect to a plurality of merchants;
at least one merchant segment, each merchant being associated with at least one of the defined merchant segments;
at least one merchant segment predictive model, for defining for a merchant segment a prediction function between input transactions in a past time interval and financial behavior in a subsequent time interval, to produce for each consumer a predicted behavior in each of at least a subset of the merchant segments.
27. The system of claim 26, wherein the predicted behavior comprises a likelihood of positive response to an offer.
28. The system of claim 26, wherein the predicted behavior comprises a spending level with respect to a merchant.
29. The system of claim 26, further comprising:
a merchant vector build module, coupled to the database, for generating a merchant vector for each of at least a subset of the merchants;
a consumer vector build module, coupled to the database, for generating a consumer vector for each of at least a subset of the consumers; and
a segmentation module, coupled to the merchant vector build module, for performing supervised segmentation on the merchant vectors.
30. The system of claim 29, wherein the at least one merchant segment predictive model applies a learning vector quantization algorithm to the merchant vectors.
31. The system of claim 26, wherein:
the merchant vector build module determines, for at least one merchant segment, a segment vector as a summary vector of merchant vectors of merchants associated with the segment; and
the consumer vector build module associates at least one consumer with the merchant segment having the greatest dot product between the segment vector of the segment and a consumer vector of the consumer.
32. A computer-readable medium comprising computer-readable code for predicting financial behavior of consumers, the computer-readable medium comprising:
computer-readable code adapted to obtain a set of input transactions for a plurality of consumers with respect to a plurality of merchants;
computer-readable code adapted to define at least one merchant segment, each merchant being associated with at least one of the defined merchant segments; and
computer-readable code adapted to, for at least one consumer, apply the input transactions of the consumer to each of at least one merchant segment predictive model, each merchant segment predictive model defining for a merchant segment a prediction function between input transactions in a past time interval and financial behavior in a subsequent time interval, to produce for each consumer a predicted behavior in each of at least a subset of the merchant segments.
33. The computer-readable medium of claim 32, wherein the predicted behavior comprises a likelihood of positive response to an offer.
34. The computer-readable medium of claim 32, wherein the predicted behavior comprises a spending level with respect to a merchant.
35. The computer-readable medium of claim 32, further comprising:
computer-readable code adapted to generate a consumer vector for each of at least a subset of the consumers;
computer-readable code adapted to generate a merchant vector for each of at least a subset of the merchants;
wherein the computer-readable code adapted to define at least one merchant segment comprises computer-readable code adapted to perform supervised segmentation on the merchant vectors.
36. The computer-readable medium of claim 35, wherein the computer-readable code adapted to perform supervised segmentation comprises computer-readable code adapted to apply a learning vector quantization algorithm to the merchant vectors.
37. The computer-readable medium of claim 35, wherein the computer-readable code adapted to perform supervised segmentation comprises:
computer-readable code adapted to initialize a set of segment vectors;
computer-readable code adapted to accept at least one segment label for at least one of the merchants; and
computer-readable code adapted to, for each of at least a subset of the labeled merchants:
select at least one segment vector for a merchant having a merchant vector;
determine whether the selected segment vector matches the segment label for the merchant; and
responsive to the determination, adjust zero or more of the segment vectors.
38. The computer-readable medium of claim 37, wherein the computer-readable code adapted to select at least one segment vector for a merchant comprises computer-readable code adapted to select a segment vector that is closest to the merchant vector corresponding to the merchant.
39. The computer-readable medium of claim 37, wherein the computer-readable code adapted to select at least one segment vector for a merchant comprises computer-readable code adapted to select at least one segment vector having a tolerance range that includes the value of the merchant vector.
40. The computer-readable medium of claim 32, further comprising:
computer-readable code adapted to train a predictive model using the predicted behavior of a plurality of consumers in at least a subset of the merchant segments, and additional observed behavior for the plurality of consumers with regard to a target segment not included in the subset of merchant segments; and
computer-readable code adapted to, for at least one target consumer:
provide as input to the trained predictive model predicted behavior of the target consumer in at least a subset of the merchant segments; and
obtain from the trained predictive model a predicted behavior of the target consumer with respect to the target segment.
41. The computer-readable medium of claim 32, further comprising:
computer-readable code adapted to, for at least one consumer, associate the consumer with the merchant segment for which the consumer had the highest predicted spending relative to other merchant segments.
42. The computer-readable medium of claim 32, further comprising:
computer-readable code adapted to generate a consumer vector for each of at least a subset of the consumers;
computer-readable code adapted to generate a merchant vector for each of at least a subset of the merchants;
computer-readable code adapted to, for at least one merchant segment, determine a segment vector as a summary vector of merchant vectors of merchants associated with the segment; and
computer-readable code adapted to, for at least one consumer, associate the consumer with the merchant segment having the greatest dot product between the segment vector of the segment and a consumer vector of the consumer.
43. The computer-readable medium of claim 32, further comprising:
computer-readable code adapted to, for at least one merchant segment:
rank the consumers by their predicted spending in the merchant segment; and
determine for at least one consumer a percentile ranking in the merchant segment; and
computer-readable code adapted to, for each consumer:
determine the merchant segment in which the consumer's percentile ranking is the highest, to uniquely associate each consumer with one merchant segment; and
for at least one merchant segment, determine summary transaction statistics for the consumers uniquely associated with the merchant segment.
44. The computer-readable medium of claim 32, further comprising:
computer-readable code adapted to, for at least one merchant segment:
rank the consumers by their predicted spending in the merchant segment;
determine for at least one consumer a percentile ranking in the merchant segment;
select, as a population, the consumers having a percentile ranking in excess of a predetermined percentile threshold; and
determine summary transaction statistics for the selected population of consumers.
45. The computer-readable medium of claim 32, further comprising:
computer-readable code adapted to establish for at least one merchant in the transaction data a merchant vector; and
computer-readable code adapted to update the merchant vector of at least one merchant relative to the merchant vectors of other merchants according to co-occurrences of each merchant in the transaction data.
46. The computer-readable medium of claim 45, further comprising:
computer-readable code adapted to update the merchant vector of at least one merchant based upon an unexpected amount of deviation in a frequency of co-occurrence of the merchant with other merchants.
47. The computer-readable medium of claim 45, further comprising:
computer-readable code adapted to determine a co-occurrence frequency for at least one merchant with at least one other merchant in the transaction data;
computer-readable code adapted to determine for at least one pair of merchants, a relationship strength between the pair of merchants based on how much the determined co-occurrence frequency deviates from an expected co-occurrence frequency;
computer-readable code adapted to, for at least one pair of merchant vectors, map the relationship strength into a vector space as a desired dot product between the respective merchant vectors of the merchants in the pair; and
computer-readable code adapted to update at least one merchant vector so that the actual dot products between each pair of merchant vectors approximate the desired dot product between the merchant vectors.
48. The computer-readable medium of claim 47, wherein the computer-readable code adapted to determine for at least one pair of merchants a relationship strength between the pair of merchants further comprises:
computer-readable code adapted to determine the relationship strength by ##EQU25##
where
$r_{ij}$ is the relationship strength between merchant$_i$ and merchant$_j$ in a pair of merchants;
$T_{ij}$ is the actual co-occurrence frequency of merchant$_i$ and merchant$_j$ in the transaction data; and
$\hat{T}_{ij}$ is the expected co-occurrence frequency of merchant$_i$ and merchant$_j$ in the transaction data.
49. The computer-readable medium of claim 47, wherein the computer-readable code adapted to determine for at least one pair of merchants a relationship strength between the pair of merchants further comprises:
computer-readable code adapted to determine the relationship strength by
$r_{ij} = \operatorname{sign}(T_{ij} - \hat{T}_{ij}) \cdot \sqrt{2 \ln \lambda}$
where
$r_{ij}$ is the relationship strength between merchant$_i$ and merchant$_j$ in a pair of merchants;
$\lambda$ is a log-likelihood ratio;
$T_{ij}$ is the actual co-occurrence frequency of merchant$_i$ and merchant$_j$ in the transaction data; and
$\hat{T}_{ij}$ is the expected co-occurrence frequency of merchant$_i$ and merchant$_j$ in the transaction data.
50. The computer-readable medium of claim 47, wherein the computer-readable code adapted to determine for at least one pair of merchants a relationship strength between the pair of merchants further comprises:
computer-readable code adapted to determine the relationship strength by ##EQU26##
where
$r_{ij}$ is the relationship strength between merchant$_i$ and merchant$_j$ in a pair of merchants;
$\lambda$ is a log-likelihood ratio;
$T_{ij}$ is the actual co-occurrence frequency of merchant$_i$ and merchant$_j$ in the transaction data; and
$\hat{T}_{ij}$ is the expected co-occurrence frequency of merchant$_i$ and merchant$_j$ in the transaction data.
51. The computer-readable medium of claim 47, wherein the computer-readable code adapted to update at least one merchant vector so that the actual dot products between the at least one pair of merchant vectors approximate the desired dot product between the merchant vectors comprises computer-readable code adapted to perform a gradient descent update that updates the merchant vectors according to whether the actual dot product between them is greater or less than the desired dot product.
52. The computer-readable medium of claim 47, wherein the computer-readable code adapted to update at least one merchant vector so that the actual dot products between the at least one pair of merchant vectors approximate the desired dot product between the merchant vectors comprises computer-readable code adapted to determine, for at least one merchant vector, an error-weighted average of the desired positions of the merchant vector, as determined from the current position of at least one other merchant vector and the desired dot product between the merchant vector and the at least one other merchant vector.
53. The computer-readable medium of claim 32, further comprising:
computer-readable code adapted to generate a consumer vector for each of at least a subset of the consumers;
computer-readable code adapted to generate a merchant vector for each of at least a subset of the merchants;
computer-readable code adapted to determine for at least one merchant name in the transaction data a merchant vector;
computer-readable code adapted to cluster the merchant vectors to form a plurality of merchant segments, wherein at least one merchant vector is associated with one and only one merchant segment; and
computer-readable code adapted to, for at least one merchant segment, determine, from the transactions of consumers at the merchants associated with the merchant segment, statistical measures of consumer transactions in the segment.
54. The computer-readable medium of claim 32, further comprising:
computer-readable code adapted to select a plurality of consumers associated with at least one merchant segment, the selected plurality selected according to their predicted spending in the merchant segment; and
computer-readable code adapted to provide promotional offers to the selected plurality of consumers.
55. The computer-readable medium of claim 32, further comprising:
computer-readable code adapted to train at least one of the merchant segment predictive models to predict spending in a predicted time period based upon transaction statistics of the consumer's transactions in a past time period.
56. The computer-readable medium of claim 55, wherein the transaction statistics comprise variables describing the recency of the consumer's transactions in one or more merchant segments, the frequency of the consumer's transactions in one or more merchant segments, and the amount of the consumer's transactions in one or more merchant segments.
Description
BACKGROUND
1. Field of Invention
The present invention relates generally to analysis of consumer financial behavior, and more particularly to analyzing historical consumer financial behavior to accurately predict future spending behavior and likely responses to particular marketing efforts, in specifically identified data-driven industry segments.
2. Background of Invention
Retailers, advertisers, and many other institutions are keenly interested in understanding consumer spending habits. These companies invest tremendous resources to identify and categorize consumer interests, in order to learn how consumers spend money and how they are likely to respond to various marketing methods and channels. If the interests of an individual consumer can be determined, then it is believed that advertising and promotions related to these interests will be more successful in obtaining a positive consumer response, such as purchases of the advertised products or services.
Conventional means of determining consumer interests have generally relied on collecting demographic information about consumers, such as income, age, place of residence, occupation, and so forth, and associating various demographic categories with various categories of interests and merchants. Interest information may be collected from surveys, publication subscription lists, product warranty cards, and myriad other sources. Complex data processing is then applied to these data sources, resulting in a demographic and interest description of each of a number of consumers.
This approach to understanding consumer behavior often misses the mark. The ultimate goal of this type of approach, whether acknowledged or not, is to predict consumer spending in the future. The assumption is that consumers will spend money on their interests, as expressed by things like their subscription lists and their demographics. Yet, the data on which the determination of interests is made is typically only indirectly related to the actual spending patterns of the consumer. For example, most publications have developed demographic models of their readership, and offer their subscription lists for sale to others interested in the particular demographics of the publication's readers. But subscription to a particular publication is a relatively poor indicator of what the consumer's spending patterns will be in the future.
Even combining multiple sources of data, such as subscription lists and warranty registration cards, still yields only an incomplete collection of unrelated data about a consumer.
One of the problems in these conventional approaches is that spending patterns are time based. That is, consumers typically spend money at merchants of interest to them in a time-related manner. For example, a business traveler spends money on plane tickets, car rentals, hotel accommodations, restaurants, and entertainment, all during a single business trip. Together, these purchases describe the consumer's true interests and preferences more strongly than any one of them alone. Yet conventional approaches to consumer analysis typically treat these purchases individually and as unrelated in time.
Yet another problem with conventional approaches is that categorization of purchases is often based on standardized industry classifications of merchants and businesses, such as SIC codes. This classification scheme is entirely arbitrary and has little to do with actual consumer behavior: consumers do not decide which merchants to purchase from based on merchant SIC codes. Thus, the use of such arbitrary classifications to predict financial behavior is doomed to failure, since the classifications have little meaning in the actual data of consumer spending.
A third problem is that different groups of consumers spend money in different ways. For example, consumers who frequent high-end retailers have entirely different spending habits than consumers who are bargain shoppers. To deal with this problem, most systems focus exclusively on very specific, predefined types of consumers, in effect assuming that the interests or types of consumers are known, and targeting these consumers with advertisements or promotions believed to interest them. However, this approach puts the cart before the proverbial horse: it assumes the interests and spending patterns of a particular group of consumers rather than discovering them from actual spending data. It thus leaves open the question of whether the assumed group of consumers even exists, or has the interests assumed for it.
Existing approaches also fail to take into account the degree of success of marketing efforts, with respect to customers that are similar to a target customer of a marketing effort.
Accordingly, what is needed is the ability to model consumer financial behavior based on actual historical spending patterns that reflect the time-related nature of each consumer's purchases. Further, it is desirable to extract meaningful classifications of merchants based on the actual spending patterns and, from the combination of these, to predict the future spending of an individual consumer in specific, meaningful merchant groupings. Finally, it is desirable to provide recommendations based on analysis of customers that are similar to the target customer, and in particular to take into account the observed degree of success of particular marketing efforts with respect to such similar customers.
In the application domain of information retrieval, and particularly text retrieval, vector-based representations of documents and words are known. Vector space representations of documents are described in U.S. Pat. No. 5,619,709 issued to Caid et al., and in U.S. Pat. No. 5,325,298 issued to Gallant. Generally, vectors are used to represent words or documents, and the relationships between words and between documents are learned and encoded in the vectors by a learning law. However, because these uses of vector space representations, including the context vectors of Caid, are designed primarily for information retrieval, they are not effective for predictive analysis of behavior when applied to documents such as credit card statements. When the techniques of Caid were applied to such prediction problems, they exhibited numerous shortcomings. First, they had difficulty with high transaction count merchants, that is, merchants whose names appear very frequently in the collection of transaction statements. Because Caid's system downplays the significance of frequently appearing terms, these high transaction frequency merchants were not accurately represented, yet excluding them from the data set undermines the system's ability to predict transactions at these important merchants. Second, it was discovered that beyond two iterations of training, the performance of Caid's system declined instead of converging. This indicates that the learning law learns information that is only coincidental to transaction prediction, rather than information specific to transaction prediction. Accordingly, it is desirable to provide a new methodology for learning the relationships between merchants and consumers that properly reflects the significance of the frequency with which merchants appear in the transaction data.
SUMMARY OF THE INVENTION
The present invention overcomes the limitations of conventional approaches to consumer analysis by providing a system and method of analyzing and predicting consumer financial behavior that uses historical, and time-sensitive, spending patterns of individual consumers. In one aspect, the invention generates groupings (segments) of merchants, which accurately reflect underlying consumer interests, and a predictive model of consumer spending patterns for each of the merchant segments. In another aspect, a supervised segmentation technique is employed to develop merchant segments that are of interest to the user. In yet another aspect, a "nearest neighbor" technique is employed, so as to identify those customers that are most similar to the target customer and to make predictions regarding the target customer based on observed behavior of the nearest neighbors. Current spending data of an individual consumer or groups of consumers can then be applied to the predictive models to predict future spending of the consumers in each of the merchant clusters, and/or marketing success data with respect to nearest neighbors can be applied to predict likelihood of success in promoting particular products to particular customers.
In one aspect, the present invention includes the creation of data-driven groupings of merchants, based essentially on the actual spending patterns of a group of consumers. Spending data of each consumer is obtained, which describes the spending patterns of the consumers in a time-related fashion. For example, credit card data shows not merely the merchants and amounts spent, but also the sequence in which purchases were made. One of the features of the invention is its ability to use the co-occurrence of purchases at different merchants to group merchants into meaningful merchant segments. That is, merchants at which consumers frequently shop within some number of transactions, or within some time period, of each other form a meaningful cluster. This data-driven clustering of merchants more accurately describes the interests or preferences of consumers.
Merchants may also be segmented according to a supervised segmentation technique, such as Kohonen's Learning Vector Quantization (LVQ) algorithm, as described in T. Kohonen, "Improved Versions of Learning Vector Quantization," in IJCNN San Diego, 1990; and T. Kohonen, Self-Organizing Maps, 2d ed., Springer-Verlag, 1997. Supervised learning allows characteristics of segments to be directly specified, so that segments may be defined, for example, as "art museums," "book stores," "Internet merchants," and the like. Segment boundaries can be defined by the training algorithm based on training exemplars with known membership in classes. Segments may be overlapping or mutually exclusive, as desired.
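As an illustration of how supervised segmentation might operate on merchant vectors, the following is a minimal sketch of Kohonen's basic LVQ1 update rule, assuming one labeled prototype per segment, Euclidean distance, and a fixed learning rate (Kohonen's formulation decreases the rate over time). The function names, the segment labels, and the toy vectors are hypothetical and are not part of the described system.

```python
def nearest(prototypes, x):
    """Index of the prototype closest to x by squared Euclidean distance."""
    return min(range(len(prototypes)),
               key=lambda i: sum((p - v) ** 2 for p, v in zip(prototypes[i][0], x)))


def lvq1_train(prototypes, exemplars, alpha=0.1, epochs=10):
    """LVQ1 with a fixed learning rate (a simplification).

    prototypes: list of [vector, label], updated in place.
    exemplars: list of (merchant_vector, known_segment_label).
    """
    for _ in range(epochs):
        for x, label in exemplars:
            w, w_label = prototypes[nearest(prototypes, x)]
            sign = 1.0 if w_label == label else -1.0  # attract if labels match, else repel
            for k in range(len(w)):
                w[k] += sign * alpha * (x[k] - w[k])
    return prototypes


def classify(prototypes, x):
    """Assign a merchant vector to the segment of its nearest prototype."""
    return prototypes[nearest(prototypes, x)][1]


# Toy example with two hypothetical labeled segments.
protos = [[[0.0, 0.0], "book stores"], [[1.0, 1.0], "art museums"]]
exemplars = [([0.1, 0.2], "book stores"), ([0.9, 0.8], "art museums")]
lvq1_train(protos, exemplars)
```

Because each prototype is pulled toward exemplars of its own label and pushed away from exemplars of other labels, the segment boundaries are shaped by the training exemplars rather than by unsupervised clustering alone.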
In a preferred embodiment, the analysis of consumer spending uses spending data, such as credit card statements, retail data, or any other transaction data, and processes that data to identify co-occurrences of purchases within defined co-occurrence windows, which may be based on either a number of transactions, a time interval, or other sequence related criteria. Each merchant is associated with a vector representation; in one embodiment, the initial vectors for all of the merchants are randomized to present a quasi-orthogonal set of vectors in a merchant vector space.
Each consumer's transaction data reflecting his or her purchases (e.g. credit card statements, bank statements, and the like) is chronologically organized to reflect the general order in which purchases were made at the merchants. Analysis of each consumer's transaction data in various co-occurrence windows identifies which merchants co-occur. For each pair of merchants, their respective merchant vectors are updated in the vector space as a function of the frequency of their co-occurrence. After processing of the spending data, the merchant vectors of merchants that are frequented together are generally aligned in the same direction in the merchant vector space.
In one embodiment, clustering techniques or supervised segmentation techniques are then applied to define merchant segments. Each merchant segment yields useful information about the type of merchants associated with it, their average purchase and transaction rates, and other statistical information. (Merchant "segments" and merchant "clusters" are used interchangeably herein.)
In another embodiment, such segmentation is not performed. Rather, a "nearest neighbor" approach is adopted, in order to identify merchants, offers, promotions, and the like that were most successful in connection with the consumers determined to be the nearest neighbors of the target consumer.
Preferably, each consumer is also given a profile that includes various demographic data, and summary data on spending habits. In addition, each consumer is preferably given a consumer vector. From the spending data, the merchants from whom the consumer has most frequently or recently purchased are determined. The consumer vector is then the summation of these merchant vectors. As new purchases are made, the consumer vector is updated, preferably decaying the influence of older purchases. In essence, like the expression "you are what you eat," the present invention reveals, "you are whom you shop at," since the vectors of the merchants are used to construct the vectors of the consumers.
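A minimal sketch of the decayed consumer vector follows, assuming an exponential decay applied to the existing vector each time a new purchase is added. The `decay` and `weight` parameters are hypothetical; the text states only that the influence of older purchases is preferably decayed.

```python
def update_consumer_vector(consumer_vec, merchant_vec, decay=0.9, weight=1.0):
    """Decay the existing consumer vector, then add the (optionally weighted)
    vector of the merchant just purchased from."""
    return [decay * c + weight * m for c, m in zip(consumer_vec, merchant_vec)]


def consumer_vector(purchases, merchant_vectors, dim, decay=0.9):
    """purchases: merchant names in chronological order (oldest first)."""
    vec = [0.0] * dim
    for name in purchases:
        vec = update_consumer_vector(vec, merchant_vectors[name], decay)
    return vec


merchant_vectors = {"A": [1.0, 0.0], "C": [0.0, 1.0]}
c1 = consumer_vector(["A", "C"], merchant_vectors, dim=2, decay=0.5)  # [0.5, 1.0]
```

The older purchase at merchant A contributes half as much as the newer purchase at merchant C, so the consumer vector leans toward the merchants the consumer has visited most recently.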
An advantage of this approach is that both consumers and merchants are represented in a common vector space. This means that given a consumer vector, the merchant vectors that are "similar" to this consumer vector can be readily determined (that is, they point in generally the same direction in the merchant vector space), for example using dot product analysis. Thus, merchants who are "similar" to the consumer can be easily determined, these being merchants who would likely be of interest to the consumer, even if the consumer has never purchased from these merchants before.
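The similarity lookup can be sketched as follows. This version uses the normalized dot product (cosine), which is an assumption chosen so that the magnitude of a consumer vector does not dominate the ranking. The `exclude` argument is a hypothetical convenience for surfacing merchants the consumer has not yet used.

```python
import math


def cosine(u, v):
    """Dot product of u and v after scaling both to unit length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_merchants(consumer_vec, merchant_vectors, k=3, exclude=()):
    """The k merchants whose vectors point most nearly in the consumer's direction,
    skipping any in `exclude` (e.g. merchants the consumer already shops at)."""
    scored = [(cosine(consumer_vec, vec), name)
              for name, vec in merchant_vectors.items() if name not in exclude]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


merchants = {"A": [1.0, 0.1], "B": [-1.0, 0.2], "E": [0.9, 0.3]}
top_two = similar_merchants([1.0, 0.0], merchants, k=2)                     # ["A", "E"]
new_only = similar_merchants([1.0, 0.0], merchants, k=1, exclude=("A",))  # ["E"]
```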
Given the merchant segments, the present invention then creates a predictive model of future spending in each merchant segment, based on statistics of the historical spending of those consumers who have purchased from merchants in the segment, including their spending in that segment, in other segments, and overall. In one embodiment, each predictive model predicts spending in a merchant cluster in a predicted time interval, such as 3 months, based on historical spending in the cluster in a prior time interval, such as the previous 6 months. During model training, the historical transactions in the merchant cluster of consumers who spent in the cluster are summarized as summary statistics in each consumer's profile, and input into the predictive model along with actual spending in a predicted time interval. Validation of the predicted spending against actual spending is used to confirm model performance. The predictive models may be neural networks or other multivariate statistical models.
This modeling approach is advantageous for two reasons. First, the predictive models are specific to merchant clusters that actually appear in the underlying spending data, instead of to arbitrary classifications of merchants such as SIC classes. Second, because the models are built from the spending data of those consumers who actually purchased at the merchants in the merchant clusters, they most accurately reflect how these consumers have spent and will spend at these merchants.
To predict financial behavior, the consumer profile of a consumer, using preferably the same type of summary statistics for a recent, past time period, is input into the predictive models for the different merchant clusters. The result is a prediction of the amount of money that the consumer is likely to spend in each merchant cluster in a future time interval, for which no actual spending data may yet be available.
For each consumer, a membership function may be defined which describes how strongly the consumer is associated with each merchant segment. (Preferably, the membership function outputs a membership value for each merchant segment.) The membership function may be the predicted future spending in each merchant segment, or it may be a function of the consumer vector for the consumer and a merchant segment vector (e.g. the centroid of each merchant segment). The membership function can be weighted by the amount spent by the consumer in each merchant segment, or by other factors. Given the membership function, the merchant clusters for which the consumer has the highest membership values are of particular interest: they are the clusters in which the consumer will spend the most money in the future, or the clusters whose merchants are most similar to the consumer's spending habits. This allows very specific and accurate targeting of promotions, advertising and the like to these consumers. A financial institution using the predicted spending information can direct promotional offers to consumers who are predicted to spend heavily in a merchant segment, with the promotional offers associated with merchants in the merchant segment.
Also, given the membership values, changes in the membership values can be readily determined over time, to identify transitions by the consumer between merchant segments of interest. For example, each month (e.g. after a new credit card billing period or bank statement), the membership function is determined for a consumer, resulting in a new membership value for each merchant cluster. The new membership values can be compared with the previous month's membership values to identify the largest increases and decreases, revealing the consumer's changing purchasing habits. Positive changes reflect purchasing interest in new merchant clusters; negative changes reflect the consumer's lack of interest in a merchant cluster in the past month. Segment transitions such as these further enable a financial institution to target consumers with promotions for merchants in the segments in which the consumers show significant increases in membership values.
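One of the membership functions described above, the dot product of the consumer vector with each segment centroid, optionally weighted by spend, might be sketched as follows. The segment names and vectors are hypothetical, and multiplying by spend is only one possible form of weighting.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def membership(consumer_vec, segment_vectors, spend=None):
    """Membership value of one consumer in each segment: the dot product of the
    consumer vector with the segment vector, optionally scaled by spend."""
    values = {}
    for seg, sv in segment_vectors.items():
        m = sum(c * s for c, s in zip(consumer_vec, sv))
        if spend is not None:
            m *= spend.get(seg, 0.0)
        values[seg] = m
    return values


segments = {
    "upscale": centroid([[1.0, 0.0], [0.8, 0.2]]),
    "discount": centroid([[0.0, 1.0], [0.2, 0.8]]),
}
values = membership([1.0, 0.0], segments)
best = max(values, key=values.get)  # "upscale"
```

The segment with the highest membership value is the natural target for promotions to this consumer.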
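The month-over-month comparison can be sketched as a simple difference of membership values. The segment names and values below are hypothetical, and reporting only the single largest increase and decrease (`top=1`) is an illustrative choice.

```python
def membership_changes(previous, current, top=1):
    """Compare two periods' membership values (segment -> value) and return the
    segments with the largest increases and the largest decreases."""
    deltas = {seg: current.get(seg, 0.0) - previous.get(seg, 0.0)
              for seg in set(previous) | set(current)}
    ranked = sorted(deltas.items(), key=lambda kv: kv[1])
    increases = [kv for kv in reversed(ranked) if kv[1] > 0][:top]
    decreases = [kv for kv in ranked if kv[1] < 0][:top]
    return increases, decreases


last_month = {"travel": 0.2, "home furnishings": 0.6, "dining": 0.4}
this_month = {"travel": 0.7, "home furnishings": 0.3, "dining": 0.4}
up, down = membership_changes(last_month, this_month)
```

Here the consumer's membership in "travel" has risen and "home furnishings" has fallen, which would flag the consumer for travel-related promotions.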
In another aspect, the present invention provides an improved methodology for learning the relationships between merchants in transaction data, and defining vectors that represent the merchants. More particularly, this aspect of the invention accurately identifies and captures the patterns of spending behavior that result in the co-occurrence of transactions at different merchants. The methodology is generally as follows:
First, the number of times that each pair of merchants co-occurs in the transaction data is determined. The underlying intuition is that merchants that consumers' behavior indicates are related will occur together often, whereas unrelated merchants will not. For example, a new mother will likely shop at children's clothing stores, toy stores, and similar merchants, whereas a single young male will likely not shop at these types of merchants. Merchants are identified, and their occurrences counted, by their names in the transaction data. The merchants' names may be normalized to reduce variations and to equate different versions of a merchant's name to a single common name.
Next, a relationship strength between each pair of merchants is determined based on how much the observed co-occurrence of the merchants deviates from the expected co-occurrence of the merchant pair. The expected co-occurrence is based on statistical measures of how frequently the individual merchants appear in the transaction data or in co-occurrence events. Various relationship strength measures may be used, based, for example, on standard deviations of predicted co-occurrence, or on log-likelihood ratios.
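The counting step might be sketched as follows, assuming a forward co-occurrence window measured in transactions (each transaction is paired with the next `window` transactions of the same consumer) and merchant names that have already been normalized. The function name and the window size are hypothetical.

```python
from collections import Counter


def count_cooccurrences(accounts, window=3):
    """accounts: one list of merchant names per consumer, in chronological order.

    Each transaction is paired with the next `window` transactions of the same
    consumer. Pairs are stored as sorted tuples, so (A, C) and (C, A) are
    counted together.
    """
    pair_counts = Counter()
    merchant_counts = Counter()
    for seq in accounts:
        merchant_counts.update(seq)
        for i, merchant in enumerate(seq):
            for other in seq[i + 1:i + 1 + window]:
                if other != merchant:  # repeat visits to one merchant are not a pair
                    pair_counts[tuple(sorted((merchant, other)))] += 1
    return pair_counts, merchant_counts


# Consumers C1 and C2 from FIG. 1c, with a one-transaction window.
pairs, counts = count_cooccurrences([["A", "C", "E"], ["B", "D"]], window=1)
```

With a window of one, C1 yields the pairs (A, C) and (C, E) but not (A, E), which matches the two separate co-occurrence windows described for FIG. 1c.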
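The following is a minimal sketch of a standard-deviation style relationship strength. It assumes an independence model in which each merchant's probability is its share of all pair endpoints, so that the expected count for a pair is 2·p_a·p_b·T for T total co-occurrences, and the strength is (observed − expected)/√expected (a Poisson approximation). This particular formula is an illustrative assumption rather than the measure the text prescribes.

```python
import math
from collections import Counter


def relationship_strengths(pair_counts):
    """pair_counts: Counter mapping sorted merchant pairs to observed co-occurrences.

    Returns (observed - expected) / sqrt(expected) for every pair of merchants
    that appear in any co-occurrence event, with `expected` from an
    independence model over pair endpoints.
    """
    total = sum(pair_counts.values())
    ends = Counter()
    for (a, b), n in pair_counts.items():
        ends[a] += n
        ends[b] += n
    merchants = sorted(ends)
    strengths = {}
    for i, a in enumerate(merchants):
        for b in merchants[i + 1:]:
            p_a, p_b = ends[a] / (2 * total), ends[b] / (2 * total)
            expected = 2 * p_a * p_b * total
            observed = pair_counts.get((a, b), 0)
            strengths[(a, b)] = (observed - expected) / math.sqrt(expected)
    return strengths


pairs = Counter({("A", "C"): 10, ("B", "D"): 10, ("A", "B"): 1})
strength = relationship_strengths(pairs)
```

Pairs that co-occur more often than expected, such as (A, C), receive positive strengths. Pairs that never co-occur, such as (A, D), or that co-occur less often than expected, such as (A, B), receive negative strengths, consistent with the properties described next.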
The relationship strength measure has the features that two merchants that co-occur significantly more often than expected are positively related to one another; two merchants that co-occur significantly less often than expected are negatively related to one another, and two merchants that co-occur about the number of times expected are not related.
The relationship strength between each pair of merchants is then mapped into the vector space. This is done by determining the desired dot product between each pair of merchant vectors as a function of the relationship strength between the pair of merchants. This step has the feature that merchant vectors for positively related merchants have a positive dot product, the merchant vectors for negatively related merchants have a negative dot product, and the merchant vectors for unrelated merchants have a zero dot product.
Finally, given the determined dot products for merchant vector pairs, the locations of the merchant vectors are updated so that actual dot products between them at least closely approximate the desired dot products previously determined.
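The two steps above, mapping strengths to desired dot products and then moving the vectors toward them, might be sketched as follows. The tanh mapping (which sends zero strength to a zero dot product and bounds targets to [-1, 1]), the unit-length constraint, and the gradient-style update are all hypothetical choices made for illustration. The dimension, learning rate, and epoch count are likewise arbitrary.

```python
import math
import random


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def fit_vectors(strengths, dim=50, lr=0.05, epochs=200, scale=3.0, seed=0):
    """Place unit merchant vectors so that dot(a, b) approximates a desired
    value derived from each pair's relationship strength."""
    rng = random.Random(seed)
    names = sorted({m for pair in strengths for m in pair})
    vecs = {m: normalize([rng.gauss(0.0, 1.0) for _ in range(dim)]) for m in names}
    # Hypothetical mapping: positive -> positive, negative -> negative, 0 -> 0.
    desired = {pair: math.tanh(s / scale) for pair, s in strengths.items()}
    for _ in range(epochs):
        for (a, b), target in desired.items():
            va, vb = vecs[a], vecs[b]
            err = target - dot(va, vb)
            # Descent step on 0.5 * (target - va.vb)^2 for each vector, then renormalize.
            vecs[a] = normalize([x + lr * err * y for x, y in zip(va, vb)])
            vecs[b] = normalize([y + lr * err * x for x, y in zip(va, vb)])
    return vecs


strengths = {("A", "C"): 5.0, ("A", "D"): -5.0, ("C", "D"): -5.0}
vecs = fit_vectors(strengths)
```

Positively related merchants end up with a positive dot product and negatively related merchants with a negative one, as required.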
The present invention also includes a method for determining whether any two strings represent the same thing, such as variant spellings of a merchant name. This aspect of the invention is beneficially used to identify and normalize merchant names, given that large quantities of transaction data typically contain a variety of different spellings or forms of the same merchant name. In this aspect of the invention, the frequency of individual trigrams (more generally, n-grams) in a set of strings, such as merchant names in transaction data, is determined. Each trigram is given a weight based on its frequency. Preferably, frequently occurring trigrams are assigned low weights, while rare trigrams are assigned high weights. A high dimensional vector space is defined, with one dimension for each trigram, and an orthogonal unit vector is defined for each trigram. Each string (e.g. merchant name) to be compared is given a vector in the trigram vector space. This vector is defined as the sum of the unit vectors for each trigram in the string, each weighted by the trigram weight. Any two strings, such as merchant names, can then be compared by taking the dot product of their vectors. If the dot product is above a threshold (determined from analysis of the data set), the strings are deemed to be equivalents of each other. Normalizing the string vectors to unit length makes the comparison insensitive to the lengths of the original strings. With partial normalization (normalizing one string vector but not the other) or with no normalization, string length influences the comparison; this can be used to match part of one string against the entirety of another string. This methodology provides a fast and accurate mechanism for string matching. The matching process may be used to determine, for example, whether two merchant names, two company names, two personal names, or the like refer to the same entity.
This is useful in applications that need to reconcile divergent sources or types of data containing strings that refer to a common group of entities (e.g. transaction records from many transaction sources containing names of merchants).
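A minimal sketch of the trigram comparison follows. It uses a sparse dictionary in place of explicit orthogonal unit vectors, which is equivalent for computing dot products. Several details are assumptions, not part of the text: uppercasing, dropping punctuation, padding with spaces, and an inverse-document-frequency weight log(1 + N/df). The sample names are hypothetical, and any equivalence threshold would be chosen from the data as the text describes.

```python
import math
import re
from collections import Counter


def trigrams(s):
    """Trigrams of an uppercased, punctuation-free, space-padded string."""
    s = " " + re.sub(r"[^A-Z0-9 ]+", "", s.upper()).strip() + " "
    return [s[i:i + 3] for i in range(len(s) - 2)]


def trigram_weights(strings):
    """Rare trigrams get high weight, common ones low (inverse document frequency)."""
    df = Counter(t for s in strings for t in set(trigrams(s)))
    n = len(strings)
    return {t: math.log(1 + n / c) for t, c in df.items()}


def string_vector(s, weights):
    """Sum of the trigram unit vectors in s, each scaled by its weight (sparse dict)."""
    vec = Counter()
    for t in trigrams(s):
        vec[t] += weights.get(t, 0.0)
    return vec


def similarity(a, b, weights, normalize=True):
    """Dot product of two string vectors, optionally normalized to unit length."""
    va, vb = string_vector(a, weights), string_vector(b, weights)
    dot = sum(w * vb[t] for t, w in va.items())
    if normalize:
        norm = math.sqrt(sum(w * w for w in va.values()) * sum(w * w for w in vb.values()))
        dot = dot / norm if norm else 0.0
    return dot


names = ["SEARS ROEBUCK #123", "SEARS ROEBUCK 0456", "SEARS ROEBCK", "MACYS WEST", "MACY'S #12"]
weights = trigram_weights(names)
same = similarity("SEARS ROEBUCK #123", "SEARS ROEBCK", weights)
different = similarity("SEARS ROEBUCK #123", "MACYS WEST", weights)
```

Calling `similarity(..., normalize=False)` gives the length-sensitive variant described above.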
In another aspect, the present invention employs nearest-neighbor techniques to predict responses to offers or other marketing-related value. Once consumer vectors have been developed as discussed above, a reference set of consumers is selected, having known response rates to offers (or having other characteristics that are known to be related to or good predictors of response rates). Each consumer in the reference set has a vector and a value describing the known or predicted response rate relevant to the offer being analyzed. The consumer vector for a proposed target consumer is obtained, and the nearest neighbors in the reference set are identified. The response rate among the nearest neighbors is aggregated and used as a predictor of the likely response rate for the target consumer. Based on this score for a number of potential target consumers, the marketing effort can be targeted at those consumers most likely to respond favorably, thus improving the efficiency of the marketing campaign.
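The nearest-neighbor scoring might be sketched as follows, assuming cosine similarity between consumer vectors and an unweighted mean of the neighbors' response rates as the aggregation. Both choices, the value of `k`, and the reference data are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_response(target_vec, reference, k=3):
    """reference: list of (consumer_vector, known_response_rate).
    Returns the mean response rate of the k most similar reference consumers."""
    neighbors = sorted(reference, key=lambda r: cosine(target_vec, r[0]), reverse=True)[:k]
    return sum(rate for _, rate in neighbors) / len(neighbors)


reference = [([1.0, 0.0], 1.0), ([0.9, 0.1], 1.0), ([0.0, 1.0], 0.0), ([0.1, 0.9], 0.0)]
likely = predict_response([1.0, 0.05], reference, k=2)    # 1.0
unlikely = predict_response([0.05, 1.0], reference, k=2)  # 0.0
```

Scoring every candidate consumer this way and sorting by score gives the ranked target list for the campaign.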
In yet another embodiment, the present invention employs supervised segmentation of consumer vectors, based on manually applied labels for a reference population, in order to generate predictions of response rates.
The present invention may be embodied in various forms. As a computer program product, the present invention includes a data preprocessing module that takes consumer spending data and processes it into organized files of account-related, time-ordered purchases. Processing of merchant names in the spending data is provided to normalize variant names of individual merchants. A data post-processing module generates consumer profiles of summary statistics in selected time intervals, for use in training the predictive model. A predictive model generation system creates merchant vectors, clusters them into merchant clusters, and trains the predictive model of each merchant segment using the consumer profiles and transaction data. Merchant vectors and consumer profiles are stored in databases. A profiling engine applies consumer profiles and consumer transaction data to the predictive models to provide predicted spending in each merchant segment, and to compute membership functions of the consumers for the merchant segments. A reporting engine outputs reports in various formats regarding the predicted spending and membership information. A segment transition detection engine computes changes in each consumer's membership values to identify significant transitions of the consumer between merchant clusters. The present invention may also be embodied as a system, with the above program product elements cooperating with computer hardware components, and as a computer-implemented method.
DESCRIPTION OF THE DRAWINGS
FIGS. 1a-1c are illustrations of merchant and consumer vector representations.
FIG. 2 is a sample list of merchant segments.
FIG. 3 is a flowchart of the overall process of the present invention.
FIG. 4a is an illustration of the system architecture of one embodiment of the present invention during operation.
FIG. 4b is an illustration of the system architecture of the present invention during development and training of merchant vectors, and merchant segment predictive models.
FIG. 5 is an illustration of the functional components of the predictive model generation system.
FIGS. 6a and 6b are illustrations of forward and backward co-occurrence windows.
FIG. 7a is an illustration of the master file data prior to stemming and equivalencing, and FIG. 7b is an illustration of a forward co-occurrence window in this portion of the master file after stemming and equivalencing.
FIG. 8 is an illustration of the various types of observations during model training.
FIG. 9 is an illustration of the application of multiple consumer account data to the multiple segment predictive models.
FIG. 10 is a flowchart of a process of supervised segmentation according to one embodiment of the present invention.
FIGS. 11A through 11C show an example of segment vector adjustment.
FIGS. 12A through 12C show a second example of segment vector adjustment.
FIG. 13 is a block diagram showing an example of response prediction using a predictive model.
FIG. 14 is a flow chart depicting a nearest-neighbor response prediction technique according to one embodiment of the present invention.
FIG. 15 is a flow chart depicting a technique of supervised segmentation of consumer vectors for predicting a response rate for a consumer with regard to a particular offer.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
A. Overview of Consumer and Merchant Vector Representation and the Co-occurrence of Merchant Purchases
One feature of the present invention that enables prediction of consumer spending levels at specific merchants and prediction of response rates to marketing offers is the ability to represent both consumers and merchants in the same modeling representation. A conventional approach attempts to classify both consumers and merchants with demographic labels (e.g. "baby boomers" or "empty-nesters"). This conventional approach is arbitrary, and does not provide any mechanism for directly quantifying how similar a consumer is to various merchants. The present invention, however, does provide such a quantifiable analysis, based on high-dimensional vector representations of both consumers and merchants, and the co-occurrence of merchants in the spending data of individual consumers.
Referring now to FIGS. 1a and 1b, there is shown a simplified model of the vector space representation of merchants and consumers. The vector space 100 is shown here with only three axes, but in practice is a high dimensional hyper-sphere, typically having 100-300 components. In this vector space 100, each merchant is assigned a merchant vector. Preferably, the initial assignment of each merchant's vector contains essentially randomly valued vector components, to provide for a quasi-orthogonal distribution of the merchant vectors. This means that initially, the merchant vectors are essentially perpendicular to each other, so that there is no predetermined or assumed association or similarity between merchants.
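The quasi-orthogonality of randomly initialized vectors can be checked numerically. The sketch below assumes Gaussian random components normalized to unit length, which is one common way to randomize; the text does not specify the distribution. With 300 components, the cosine between two such vectors has a standard deviation of roughly 1/√300 ≈ 0.058, so even the largest pairwise value among a few dozen vectors stays small.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Vector of independent Gaussian components, scaled to unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


rng = random.Random(42)
dim = 300
vectors = [random_unit_vector(dim, rng) for _ in range(20)]
max_abs_dot = max(abs(sum(a * b for a, b in zip(u, v)))
                  for i, u in enumerate(vectors) for v in vectors[i + 1:])
print(f"largest |cos| among {len(vectors)} random {dim}-d vectors: {max_abs_dot:.3f}")
```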
In FIG. 1a, there are shown merchant vectors for five merchants, A, B, C, D, and E, after initialization and prior to being updated. Merchant A is an upscale clothing store, merchant B is a discount furniture store, merchant C is an upscale furniture store, merchant D is a discount clothing catalog outlet, and merchant E is an online store for fashion jewelry. As shown in FIG. 1c, merchants A and D have the same SIC code because they are both clothing stores, and merchants B and C have the same SIC code because they are both furniture stores. In other words, the SIC codes do not distinguish between the types of consumers who frequent these stores.
In FIG. 1b, there is shown the same vector space 100 after consumer spending data has been processed according to the present invention to train the merchant vectors. The training of merchant vectors is based on co-occurrence of merchants in each consumer's transaction data. FIG. 1c illustrates consumer transaction data 104 for two consumers, C1 and C2. The transaction data for C1 includes transactions 110 at merchants A, C, and E. In this example, the transaction at merchants A and C co-occur within a co-occurrence window 108; likewise the transactions at merchants C and E co-occur within a separate co-occurrence window 108. The transaction data for C2 includes transactions 110 at merchants B and D, which also form a co-occurrence event.
Merchants for whom transactions co-occur in a consumer's spending data have their vectors updated to point more in the same direction in the vector space, that is, making their respective vector component values more similar.
Thus, in FIG. 1b, following processing of the consumer transaction data, the merchant vectors for merchants A, C, and E have been updated, based on actual spending data such as C1's transactions, to point generally in the same direction, as have the merchant vectors for merchants B and D, based on C2's transactions. Clustering techniques are then used to identify clusters or segments of merchants based on their merchant vectors 402. In the example of FIG. 1b, a merchant segment is defined to include merchants A, C, and E, such as "upscale-technology_savvy." Note that, as defined above, the SIC codes of these merchants are entirely unrelated, and so SIC code analysis would not reveal this group of merchants. Further, a different segment with merchants B and D is identified, even though these merchants share the same SIC codes with the merchants in the first segment, as shown in the transaction data 104.
Each merchant segment is associated with a merchant segment vector 105, preferably the centroid of the merchant cluster. Based on the types of merchants in the merchant segment, and the consumers who have purchased in the segment, a segment name can be defined, and may express the industry, sub-industry, geography, and/or consumer demographics.
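The text leaves the choice of clustering technique open. The sketch below uses plain k-means as one common choice (an assumption), with each segment vector taken as the centroid of its member merchant vectors. The two-dimensional vectors are hypothetical stand-ins for the trained vectors of merchants A through E in FIG. 1b.

```python
import random


def kmeans(vectors, k, iterations=20, seed=0):
    """Cluster merchant vectors (dict name -> list) into k segments. Returns
    (assignments: name -> segment index, centroids: list of segment vectors)."""
    rng = random.Random(seed)
    names = list(vectors)
    centroids = [list(vectors[n]) for n in rng.sample(names, k)]
    assign = {}
    for _ in range(iterations):
        for n in names:  # assign each merchant to its nearest centroid
            assign[n] = min(range(k), key=lambda j: sum(
                (a - c) ** 2 for a, c in zip(vectors[n], centroids[j])))
        for j in range(k):  # recompute each segment vector as its members' centroid
            members = [vectors[n] for n in names if assign[n] == j]
            if members:  # keep the old centroid if a segment empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


merchant_vecs = {"A": [1.0, 0.1], "C": [0.9, 0.2], "E": [0.95, 0.0],
                 "B": [0.1, 1.0], "D": [0.0, 0.9]}
assign, segment_vectors = kmeans(merchant_vecs, k=2)
```

On these vectors the procedure recovers the two segments of FIG. 1b, {A, C, E} and {B, D}.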
The merchant segments provide very useful information about the consumers. In FIG. 1b there are shown the consumer vectors 106 for consumers C1 and C2. Each consumer's vector is a summary vector of the merchants at which the consumer shops. This summary is preferably the vector sum of the merchant vectors of merchants at which the consumer has shopped in a defined recent time interval. The vector sum can be weighted by the recency of the purchases, their dollar amount, or other factors.
Being in the same vector space as the merchant vectors, the consumer vectors 106 reveal the consumers' interests in terms of their actual spending behavior. This information is a far better basis for predicting consumer spending at merchants, and likely response rates to offers, than superficial demographic labels or categories. Thus, consumer C1's vector is very strongly aligned with the merchant vectors of merchants A, C, and E, indicating C1 is likely to be interested in the products and services of these merchants. C1's vector can be aligned with these merchants, even if C1 never purchased at any of them before. Thus, merchants A, C, and E have a clear means for identifying consumers who may be interested in purchasing from them.
Which consumers are associated with which merchant segments can also be determined by a membership function. This function can be based entirely on the merchant segment vectors and the consumer vectors (e.g. dot product), or on other quantifiable data, such as the amount spent by a consumer in each merchant segment, or a predicted amount to be spent.
Given the consumers who are members of a segment, useful statistics can be generated for the segment, such as average amount spent, spending rate, ratios of how much these consumers spend in the segment compared with the population average, response rates to offers, and so forth. This information enables merchants to finely target and promote their products to the appropriate consumers.
FIG. 2 illustrates portions of a sample index of merchant segments, as may be produced by the present invention. Segments are named by assigning each segment a unique segment number 200 between 1 and M, the total number of segments. In addition, each segment has a description field 210, which describes the merchant segment. A preferred description field is of the form:
Major Categories: Minor Categories: Demographics: Geography
Major categories 202 describe how the customers in a merchant segment typically use their accounts. Uses include retail purchases, direct marketing purchases, and where this type cannot be determined, then other major categories, such as travel uses, educational uses, services, and the like. Minor categories 204 describe both a subtype of the major category (e.g. subscriptions being a subtype of direct marketing) or the products or services purchased in the transactions (e.g. housewares, sporting goods, furniture) commonly purchased in the segment. Demographics information 206 uses account data from the consumers who frequent this segment to describe the most frequent or average demographic features, such as age range or gender, of the consumers. Geographic information 208 uses the account data to describe the most common geographic location of transactions in the segment. In each portion of the segment description 210 one or more descriptors may be used (i.e. multiple major, minor, demographic, or geographic descriptors). This naming convention is much more powerful and fine-grained than conventional SIC classifications, and provides insights into not just the industries of different merchants (as in SIC) but more importantly, into the geographic, approximate age or gender, and lifestyle choices of consumers in each segment.
The various types of segment reports are further described in section I. Reporting Engine, below.
B. System Overview
Turning now to FIG. 4a, there is shown an illustration of a system architecture of one embodiment of the present invention during operation in a mode for predicting consumer spending. System 400 includes a data preprocessing module 402, a data postprocessing module 410, a profiling engine 412, and a reporting engine 426. Optional elements include a segment transition detection engine 420 and a targeting engine 422. System 400 operates on different types of data as inputs, including consumer summary file 404 and consumer transaction file 406; generates interim models and data, including the consumer profiles in profile database 414, merchant vectors 416, and merchant segment predictive models 418; and produces various useful outputs, including various segment reports 428-432.
FIG. 4b illustrates system 400 during operation in a training mode, and here additionally includes predictive model generation system 440.
C. Functional Overview
Referring now to FIG. 3, there is shown a functional overview of the processes supported by the present invention. The process flow illustrated and described here is exemplary of how the present invention may be used, but does not limit the present invention to this exact process flow, as variants may be easily devised.
Generally then, master files 408 are created or updated 300 from account transaction data for a large collection of consumers (account holders) of a financial institution, as may be stored in the consumer summary files 404 and the consumer transaction files 406. The master files 408 collect and organize the transactions of each consumer from different statement periods into a date ordered sequence of transaction data for each consumer. Processing of the master files 408 normalizes merchant names in the transaction data, and generates frequency statistics on the frequency of occurrence of merchant names.
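The master file step might be sketched as follows. The record layout `(account_id, date, merchant_name, amount)` is an assumption, and the default normalizer (strip and uppercase) is a hypothetical placeholder for the fuller name equivalencing described elsewhere in this document.

```python
from collections import Counter, defaultdict
from datetime import date


def build_master_file(transactions, normalize=lambda name: name.strip().upper()):
    """transactions: iterable of (account_id, date, merchant_name, amount) drawn from
    any statement period. Returns per-account, date-ordered transaction lists and
    frequency counts of the normalized merchant names."""
    master = defaultdict(list)
    merchant_freq = Counter()
    for account, when, merchant, amount in transactions:
        name = normalize(merchant)
        master[account].append((when, name, amount))
        merchant_freq[name] += 1
    for account in master:
        master[account].sort(key=lambda t: t[0])  # chronological order per consumer
    return dict(master), merchant_freq


txns = [("acct1", date(1999, 3, 2), "Sears ", 40.0),
        ("acct1", date(1999, 1, 15), "SEARS", 25.0),
        ("acct2", date(1999, 2, 1), "Macys", 60.0)]
master, freq = build_master_file(txns)
```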
In a training mode, the present invention creates or updates 302 merchant vectors associated with the merchant names. The merchant vectors are based on the co-occurrence of merchants' names in defined co-occurrence windows (such as a number of transactions or period of time). Co-occurrence statistics are used to derive measures of how closely related any two merchants are based on their frequencies of co-occurrence with each other, and with other merchants. The relationship measures in turn influence the positioning of merchant vectors in the vector space so that merchants who frequently co-occur have vectors that are similarly oriented in the vector space, and the degree of similarity of the merchant vectors is a function of their co-occurrence rate.
The merchant vectors are then clustered 304 into merchant segments. The merchant segments generally describe groups of merchants that are naturally (in the data) shopped at "together" based on the transactions of the many consumers. Each merchant segment has a segment vector computed for it, which is a summary (e.g. centroid) of the merchant vectors in the merchant segment. Merchant segments provide very rich information about the merchants that are members of the segments, including statistics on rates and volumes of transactions, purchases, and the like.
With the merchant segments now defined, a predictive model of spending behavior is created 306 for each merchant segment. The predictive model for each segment is derived from observations of consumer transactions in two time periods: an input time window and a subsequent prediction time window. Data from transactions in the input time window for each consumer (including both segment specific and cross-segment) is used to extract independent variables, and actual spending in the prediction window provides the dependent variable. The independent variables typically describe the rate, frequency, and monetary amounts of spending in all segments and in the segment being modeled. A consumer vector derived from the consumer's transactions may also be used. Validation and analysis of the segment predictive models may be done to confirm the performance of the models.
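A minimal sketch of how one training observation for a single segment model might be assembled follows. Independent variables come from the input window and the dependent variable is the consumer's actual spending in that segment during the prediction window. The feature names are hypothetical, and the small recency, frequency, and monetary set stands in for the richer variables the text describes.

```python
from datetime import date


def segment_training_row(transactions, segment_of, segment, input_window, predict_window):
    """transactions: list of (date, merchant, amount) for one consumer.
    segment_of: merchant -> segment id. Windows are (start, end) date pairs with
    the end excluded. Returns (independent variables, dependent variable)."""
    def in_window(d, w):
        return w[0] <= d < w[1]

    past = [(d, m, a) for d, m, a in transactions if in_window(d, input_window)]
    past_seg = [(d, m, a) for d, m, a in past if segment_of.get(m) == segment]
    days_since = [(input_window[1] - d).days for d, _, _ in past_seg]
    features = {
        "all_count": len(past),                          # frequency, all segments
        "all_amount": sum(a for _, _, a in past),        # monetary, all segments
        "seg_count": len(past_seg),                      # frequency, this segment
        "seg_amount": sum(a for _, _, a in past_seg),    # monetary, this segment
        "seg_recency_days": min(days_since) if days_since else None,  # recency
    }
    target = sum(a for d, m, a in transactions
                 if in_window(d, predict_window) and segment_of.get(m) == segment)
    return features, target


txns = [(date(1999, 1, 10), "A", 50.0), (date(1999, 3, 1), "B", 20.0),
        (date(1999, 5, 20), "A", 30.0), (date(1999, 8, 5), "C", 70.0)]
segment_of = {"A": 1, "C": 1, "B": 2}
features, target = segment_training_row(
    txns, segment_of, segment=1,
    input_window=(date(1999, 1, 1), date(1999, 7, 1)),     # six-month input window
    predict_window=(date(1999, 7, 1), date(1999, 10, 1)))  # three-month prediction window
```

Rows like this, collected over many consumers, form the training set for that segment's neural network or other multivariate model.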
In one embodiment, a predictive model may also be developed to predict spending at vendors, responses to particular offers or other marketing schemes, and the like, that are not associated with a particular market segment. The predictive model is trained using vector values of a number of customers with respect to a number of market segments. The customers' known spending behavior and/or responses to offers (both positive and negative exemplars) are provided as training data for the predictive model. Based on these data items, the model is trained using known techniques such as neural network backpropagation, linear regression, and the like. A predicted response or spending behavior estimate can then be generated from a customer's vector values with respect to the market segments, even when the behavior being predicted does not correspond to any of the known market segments.
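A cross-segment response model of this kind can be sketched with logistic regression trained by batch gradient descent. Logistic regression is a stand-in chosen for brevity; the text names neural networks and linear regression as examples. Each row of `xs` is assumed to hold one customer's values with respect to the market segments, and `ys` holds 1 for a positive exemplar and 0 for a negative one. All function names and the learning-rate and epoch defaults are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def _logit(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def response_probability(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability of a positive response for one customer."""
    return 1.0 / (1.0 + math.exp(-_logit(w, b, x)))

def train_response_model(xs: Sequence[Sequence[float]], ys: Sequence[int],
                         lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit logistic-regression weights to positive (1) and negative (0) exemplars."""
    dim = len(xs[0])
    w = [0.0] * dim
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = response_probability(w, b, x) - y   # d(log loss)/d(logit)
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

Because the inputs are segment values rather than membership in a single segment, the trained model can score behavior that does not itself correspond to any known segment.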
In the production phase, the system is used to predict spending, either in future time periods for which there is no actual data as of yet, or in a recent past time period for which data is available and which is used for retrospective analysis. Generally, each account (or consumer) has a profile summarizing the transactional behavior of the account holder. This information is created, or updated 308 with recent transaction data if present, to generate the appropriate variables for input into the predictive models for the segments. (Generation of the independent variables for model generation may also involve updating 308 of account profiles.)
Each account further includes a consumer vector which is derived, e.g. as a summary vector, from the merchant vectors of the merchants at which the consumer has purchased in a defined time period, such as the last three months. Each merchant vector's contribution to the consumer vector can be weighted by the consumer's transactions at the merchant, such as by transaction amounts, rates, or recency. The consumer vectors, in conjunction with the merchant segment vectors, provide an initial level of predictive power: each consumer can be associated with the merchant segment whose segment vector is closest to the consumer vector.
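A consumer vector and its nearest segment can be sketched as follows. This is a minimal sketch, assuming weighting by transaction amount (one of the options named above), unit-length normalization, and dot product as the closeness measure; merchants without a trained vector are skipped. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def consumer_vector(spend_by_merchant: Dict[str, float],
                    merchant_vectors: Dict[str, Sequence[float]]) -> List[float]:
    """Spend-weighted sum of the consumer's merchant vectors, scaled to unit length."""
    dim = len(next(iter(merchant_vectors.values())))
    acc = [0.0] * dim
    for merchant, amount in spend_by_merchant.items():
        vec = merchant_vectors.get(merchant)
        if vec is None:          # merchant has no trained vector; ignore it
            continue
        for k in range(dim):
            acc[k] += amount * vec[k]
    norm = math.sqrt(sum(x * x for x in acc))
    return [x / norm for x in acc] if norm > 0 else acc

def nearest_segment(cv: Sequence[float], segment_vectors: Dict[str, Sequence[float]]) -> str:
    """Segment whose vector has the largest dot product with the consumer vector."""
    return max(segment_vectors,
               key=lambda s: sum(a * b for a, b in zip(cv, segment_vectors[s])))
```

A consumer who spends three times as much at a bookstore as at a grocer ends up with a consumer vector oriented mostly toward the bookstore vector, and is therefore associated with the bookstore segment.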
The updated account profiles are then input into the set of predictive models to generate 310, for each consumer, an amount of predicted spending in each merchant segment over a desired prediction time period. For example, the predictive models may be trained on a six-month input window to predict spending in a subsequent three-month prediction window. The predicted period may be an actual future period or a current (e.g. recently ended) period for which actual spending is available.
The predicted spending levels and consumer profiles allow for various levels and types of account and segment analysis 312. First, each account may be analyzed to determine which segment (or segments) the account is a member of, based on various membership functions. A preferred membership function is the predicted spending value, so that each consumer is a member of the segment for which they have the highest predicted spending. Other measures of association between accounts and segments may be based on percentile rankings of each consumer's predicted spending across the various merchant segments. With any of these (or similar) methods of determining which consumers are associated with which segments, an analysis of the rates and volumes of different types of transactions by consumers in each segment can be generated. Further, targeting of accounts in one or more segments may be used to selectively identify populations of consumers with predicted high dollar amount or transaction rates. Account analysis also identifies consumers who have transitioned between segments as indicated by increased or decreased membership values.
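Two of the membership functions described above can be sketched directly: assignment to the segment with the highest predicted spending, and percentile ranking of each account's predicted spending within a segment. This is a minimal sketch; the function names and the tie rule (tied accounts share the highest percentile) are assumptions.

```python
from bisect import bisect_right
from typing import Dict

def primary_segment(predicted: Dict[str, float]) -> str:
    """Membership by highest predicted spending across segments."""
    return max(predicted, key=lambda seg: predicted[seg])

def percentile_ranks(pred_by_account: Dict[str, float]) -> Dict[str, float]:
    """Percentile rank (0-100] of each account's predicted spending in one segment.

    Computed as the share of accounts predicted to spend no more than the
    account itself, so tied accounts receive the same (highest) rank.
    """
    values = sorted(pred_by_account.values())
    n = len(values)
    return {acct: 100.0 * bisect_right(values, v) / n for acct, v in pred_by_account.items()}
```

Comparing an account's ranks between two prediction periods is one way to detect the segment transitions mentioned above.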
Using targeting criteria, promotions for the merchants in specific segments can be directed 314 to specific consumers associated with those segments. For example, given a merchant segment, the consumers with the highest levels (or rankings) of predicted spending in the segment may be identified, or the consumers having consumer vectors closest to the segment vector may be selected. Alternatively, the consumers with the largest increases in membership in a segment may be selected. The merchants that make up the segment are known from the segment clustering 304. One or more promotional offers specific to merchants in the segment, such as discounts and incentives, can be created. The merchant-specific promotional offers are then directed to the selected consumers. Since these account holders have been identified as having the greatest likelihood of spending in the segment, the promotional offers coincide with their predicted spending behavior, which is expected to increase the rate at which the offers are redeemed.
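Each of the three selection rules above reduces to ranking accounts by a per-segment score: predicted spending, the dot product of consumer and segment vectors, or the change in membership between periods. A minimal sketch, with the hypothetical function name `select_targets`:

```python
import heapq
from typing import Dict, List

def select_targets(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k accounts with the highest score for one segment, best first."""
    return [acct for acct, _ in heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])]
```

Using `heapq.nlargest` avoids sorting the full account population when only a short list of targets is needed.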
In an alternative embodiment, supervised segmentation is performed in place of the data-driven segmentation approach described above. Supervised segmentation allows a user to specify particular merchant segments that are of interest, so that data about those segments can be extracted in a relevant and usable form. Examples of user-defined merchant segments include "art museums," "book stores," and "Internet merchants." Supervised segmentation thus directs the system to provide predictive and analytical data concerning the particular segments in which the user is interested.
The technique of supervised segmentation, as employed by one embodiment of the present invention, determines segment boundaries and segment membership for merchants. Segment vectors are initialized, and are then iteratively adjusted using a training algorithm, until the segment vectors represent a meaningful summary of merchants belonging to the corresponding segment. The basis for the training algorithm is a Learning Vector Quantization (LVQ) technique, as described, for example, in T. Kohonen, "Improved Versions of Learning Vector Quantization," in IJCNN San Diego, 1990. According to the techniques of the system, segments may overlap or they may be mutually exclusive, depending on user preference and the particular application. For example, with overlapping segments, a particular merchant (such as an Internet bookstore) might be a member of two or more merchant segments (e.g. "book stores" and "Internet merchants"). If mutually exclusive segments are used, the merchant will be assigned to only one segment, based on the learning algorithm's determination as to which segment is most suitable for the merchant.
Referring now to FIG. 10, there is shown a flowchart of an example of a supervised segmentation technique as may be used in connection with the present invention. According to the flowchart of FIG. 10, the system accepts user input specifying segments, and further specifying segment labels for a subset of merchants. Segment vectors are then iteratively adjusted based on the assigned segment labels, until segment vectors accurately represent an aggregation of the members of the respective segments.
A user specifies 1001 a set of merchant segments. A set of segment vectors is initialized 1002 for the specified merchant segments. The initial segment vectors may be orthogonal or nearly orthogonal to one another; for example, randomly assigned vectors in a high-dimensional space are nearly orthogonal. Typically, the segment vectors occupy the same space as do merchant vectors, so that memberships, degrees of similarity, and affinities between merchants and segments can be defined and quantified.
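Random initialization can be sketched by drawing each coordinate from a standard normal distribution and normalizing. Independent random unit vectors in d dimensions have dot products spread roughly as 1/sqrt(d) around zero, so in a few hundred dimensions they are nearly, though not exactly, orthogonal. The function name, the Gaussian draw, and the seed parameter are assumptions.

```python
import math
import random
from typing import Dict, List

def init_segment_vectors(labels: List[str], dim: int, seed: int = 0) -> Dict[str, List[float]]:
    """One random unit vector per user-specified segment (step 1002)."""
    rng = random.Random(seed)            # seeded for reproducible training runs
    vectors = {}
    for label in labels:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        vectors[label] = [x / norm for x in v]
    return vectors
```

With `dim=300`, the pairwise dot products are typically within about 0.06 of zero.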
For at least a subset of merchants, the user provides 1003 segment labels. In other words, the user labels the merchant with one (or more) of the specified merchant segments. These manually applied segment labels are then used by the system to train and refine segment vectors, as follows.
A labeled merchant is selected 1004 for processing. Based on the merchant vector for the selected merchant (derived previously from step 302 of FIG. 3, as described above), a segment is determined 1005 for the merchant. In one embodiment, this is the segment having a segment vector that is most closely aligned with the merchant vector (this may be determined, for example, by calculating the dot-product of the segment vector and merchant vector).
The segment specified by the manually applied segment label is compared 1006 with the segment determined in step 1005. If these are not the same segment, one or more segment vectors are adjusted 1008 in an effort to "train" the segment vectors. Either the segment vector determined in step 1005 is moved farther from the merchant vector, or the segment vector specified by the label is moved closer to the merchant vector, or both vectors are adjusted.
For example, suppose we have a merchant such as Barnes & Noble. The user provides a segment label identifying the merchant as a bookstore. In step 1005, the system determines a segment for the merchant based on vector positioning. If the determined segment is, for example, "grocery store," which does not match the segment label, segment vectors would be adjusted accordingly. The segment vector for grocery stores might be moved farther away from the Barnes & Noble merchant vector, or the segment vector for bookstores might be moved closer to the Barnes & Noble merchant vector, or both adjustments might be made.
Referring to FIGS. 11A through 11C, there are shown examples of segment vector adjustments that may be performed when the selected segment does not correspond to the segment label manually applied to the merchant vector. FIG. 11A depicts a starting position for a merchant vector MV and three segment vectors SV_1, SV_2, and SV_3. For illustrative purposes, vector space 100 is depicted as having three dimensions, though in practice it may have any number of dimensions, with the vectors lying on a unit hypersphere. MV is assumed to have been manually labeled with segment 1, corresponding to segment vector SV_1. It can be seen from the starting positions shown in FIG. 11A that the segment vector closest to merchant vector MV is SV_2, which does not correspond to the label assigned to MV. Accordingly, one or both of segment vectors SV_1 and SV_2 are adjusted.
FIG. 11B depicts an adjustment that may be performed on the segment vector SV_2 that is closest to the merchant vector MV. Segment vector SV_2 is moved away from MV, reflecting the fact that MV was not labeled with segment 2. FIG. 11C depicts another adjustment that may be performed: segment vector SV_1 is moved closer to MV, reflecting the fact that MV was labeled with segment 1. In an alternative embodiment, both adjustments depicted in FIGS. 11B and 11C may be performed.
The degree and direction of adjustment may be determined by any desired means. For example, as described in Kohonen (1990), adjustment of SV_2 as shown in FIG. 11B may be described as
$$SV_2(t+1) = SV_2(t) - \alpha(t)\,\bigl[MV(t) - SV_2(t)\bigr]$$
where $0 < \alpha(t) < 1$ and $\alpha(t)$ decreases monotonically with time (e.g. linearly, starting from a small value such as 0.01 or 0.02).
Meanwhile, adjustment of SV_1 as shown in FIG. 11C may be described as
$$SV_1(t+1) = SV_1(t) + \alpha(t)\,\bigl[MV(t) - SV_1(t)\bigr]$$
where $0 < \alpha(t) < 1$ and $\alpha(t)$ decreases monotonically with time (e.g. linearly, starting from a small value such as 0.01 or 0.02).
If, in 1006, the selected segment does correspond to the segment label that has been assigned to the merchant, zero or more segment vectors are adjusted 1010. Either the segment vectors are left unchanged, or in an alternative embodiment, the assigned segment vector is moved closer to the merchant vector.
Thus, continuing the Barnes & Noble example, if the determined segment is "bookstore," which does match the segment label, segment vectors may be left unchanged, or the segment vector for bookstores might be moved closer to the Barnes & Noble merchant vector.
Referring to FIGS. 12A through 12C, there is shown an example of a segment vector adjustment that may be performed when the selected segment does correspond to the segment label manually applied to the merchant vector. FIG. 12A depicts a starting position for a merchant vector MV and three segment vectors SV_1, SV_2, and SV_3. MV is assumed to have been manually labeled with segment 1, corresponding to segment vector SV_1. It can be seen from the starting positions shown in FIG. 12A that the segment vector closest to merchant vector MV is SV_1, which does correspond to the label assigned to MV. Accordingly, either the vectors are left unchanged as shown in FIG. 12B, or, as shown in FIG. 12C, segment vector SV_1 is moved closer to MV, so as to reflect the fact that MV was correctly assigned to SV_1.
The degree and direction of adjustment may be determined by any desired means. For example, as described in Kohonen (1990), adjustment of SV_1 as shown in FIG. 12C may be described as
$$SV_1(t+1) = SV_1(t) + \alpha(t)\,\bigl[MV(t) - SV_1(t)\bigr]$$
where $0 < \alpha(t) < 1$ and $\alpha(t)$ decreases monotonically with time (e.g. linearly, starting from a small value such as 0.01 or 0.02).
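Steps 1005 through 1010 for a single labeled merchant can be sketched as one LVQ-style update. This is a minimal sketch, assuming dot product as the closeness measure and in-place updates of plain Python lists; on a mismatch it applies both adjustments (FIGS. 11B and 11C), and on a match it optionally applies the FIG. 12C adjustment. The function names and the `reinforce_match` flag are hypothetical, and the sketch does not renormalize vectors after updating.

```python
from typing import Dict, List, Sequence

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def lvq_step(segments: Dict[str, List[float]], mv: Sequence[float], label: str,
             alpha: float, reinforce_match: bool = True) -> str:
    """Update segment vectors for one labeled merchant vector; return the step-1005 segment."""
    winner = max(segments, key=lambda s: dot(segments[s], mv))   # step 1005

    def move(name: str, sign: float) -> None:
        # sign=+1: SV(t+1) = SV(t) + alpha*[MV - SV(t)]  (pull toward MV)
        # sign=-1: SV(t+1) = SV(t) - alpha*[MV - SV(t)]  (push away from MV)
        sv = segments[name]
        for k in range(len(sv)):
            sv[k] += sign * alpha * (mv[k] - sv[k])

    if winner != label:          # step 1006: determined segment differs from the label
        move(winner, -1.0)       # FIG. 11B
        move(label, +1.0)        # FIG. 11C
    elif reinforce_match:        # FIG. 12C; pass reinforce_match=False for FIG. 12B
        move(label, +1.0)
    return winner
```

In practice `alpha` would be supplied from a decreasing schedule, as the equations above require.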
In yet another embodiment, segment membership is nonexclusive, so that a merchant may be a member of more than one segment. A tolerance radius is established around the endpoint of each segment vector along the surface of a unit sphere; this tolerance radius represents a maximum allowable distance from the vector endpoint to the endpoint of a merchant vector, along the surface of the sphere. The tolerance radius may also be expressed as a minimum value resulting from a dot-product operation on the segment vector and a merchant vector; if the dot-product value exceeds this threshold value, the merchant is designated a member of the segment. Either technique may be used, as can any other method of defining a threshold value for segment membership.
Rather than adjusting segment vectors based on a determination of which segment vector is closest to the merchant vector, in this embodiment segment vectors are adjusted based on a determination of the merchant vector falling within the tolerance radius for one or more segment vectors. Adjustment of segment vectors may be performed as follows. Segment labels are manually applied to a merchant as described above in step 1003 of FIG. 10. The merchant vector is compared with segment vectors in order to determine whether the merchant vector falls within the predefined tolerance radius for any segment vectors. For each segment for which the merchant vector falls within the tolerance radius of the segment vector:
If the segment is one whose label was not manually applied to the merchant, adjust the segment vector to be farther from the merchant vector (FIG. 11B) and/or adjust other segment vectors whose labels were manually applied to the merchant to be closer to the merchant vector (FIG. 11C).
If the segment is one whose label was manually applied to the merchant, either do nothing (FIG. 12B) or adjust the segment vector to be closer to the merchant vector (FIG. 12C).
If the merchant vector does not fall within the tolerance radius of any segment vector, the system adjusts the segment vectors whose labels were manually applied to the merchant, to be closer to the merchant vector.
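The non-exclusive variant can be sketched using the dot-product form of the tolerance radius. For unit vectors, the arc between endpoints equals acos(dot), so "within radius r" is equivalent to dot >= cos(r); the sketch assumes the vectors are (approximately) unit length. Where the text allows "and/or", the sketch takes both actions: segments that claim the merchant without being in its labels are pushed away, and the labeled segments are pulled closer; labeled segments that already contain the merchant are otherwise left unchanged (FIG. 12B). The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def tolerance_step(segments: Dict[str, List[float]], mv: Sequence[float],
                   labels: Set[str], radius: float, alpha: float) -> Set[str]:
    """One non-exclusive update; returns the segments whose radius contained MV."""
    threshold = math.cos(radius)     # arc length <= radius  <=>  dot >= cos(radius)
    inside = {s for s, sv in segments.items() if dot(sv, mv) >= threshold}

    def move(name: str, sign: float) -> None:
        sv = segments[name]
        for k in range(len(sv)):
            sv[k] += sign * alpha * (mv[k] - sv[k])

    intruders = inside - labels
    for s in intruders:
        move(s, -1.0)                # unlabeled segment claims MV: push away (FIG. 11B)
    if intruders or not inside:
        for lab in labels:
            move(lab, +1.0)          # pull labeled segments closer (FIG. 11C / no match)
    return inside
```

A merchant labeled with both "book stores" and "Internet merchants" can therefore end up inside both tolerance radii once training settles.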
Once segments have been adjusted (if appropriate), a determination is made 1007 as to whether more training is required. This determination is made based on known convergence determination methods, or by reference to a predefined count of training iterations, or by other appropriate means. One advantage of the present invention is that not all merchants need be manually labeled in order to effectively train the vector set; once the segment vectors are sufficiently trained, merchants will automatically become associated with appropriate segments based on the positioning of their vectors.
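The outer loop of FIG. 10 can be sketched with both stopping rules mentioned above: stop when a full pass over the labeled merchants produces no misassignments, or after a fixed maximum number of passes. The learning rate decays linearly, as described for the update equations. This is a minimal sketch; the function name, the shuffling, and the defaults are assumptions, and for brevity only the mismatch adjustment (both FIG. 11B and 11C moves) is applied.

```python
import random
from typing import Dict, List, Sequence, Tuple

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def train_segments(segments: Dict[str, List[float]],
                   labeled: List[Tuple[Sequence[float], str]],
                   alpha0: float = 0.02, max_epochs: int = 200, seed: int = 0) -> int:
    """Train segment vectors on labeled merchants; return the number of passes run."""
    rng = random.Random(seed)
    order = list(labeled)
    for epoch in range(max_epochs):
        alpha = alpha0 * (1.0 - epoch / max_epochs)    # linear decay toward zero
        rng.shuffle(order)
        errors = 0
        for mv, label in order:                         # step 1004
            winner = max(segments, key=lambda s: dot(segments[s], mv))   # step 1005
            if winner != label:                         # step 1006
                errors += 1
                for name, sign in ((winner, -1.0), (label, 1.0)):        # step 1008
                    sv = segments[name]
                    for k in range(len(sv)):
                        sv[k] += sign * alpha * (mv[k] - sv[k])
        if errors == 0:                                 # step 1007: converged
            return epoch + 1
    return max_epochs
```

After training, unlabeled merchants can be assigned to the segment whose vector has the largest dot product with their merchant vector.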
As will be apparent to one skilled in the art, the supervised segmentation approach provides an alternative to unsupervised data-driven segmentation methods, and facilitates analysis of particular market segments or merchant types that are of interest. Thus, the above-described approach may be employed in place of the clustering methods previously described.
As indicated above, the various techniques of the present invention can be applied to other domains and environments. Thus references to "merchants," "accounts," and "customers" are merely exemplary, and are not intended to limit the scope of the invention.
D. Data Preprocessing Module
The data preprocessing module 402 (DPM) does initial processing of consumer data received from a source of consumer accounts and transactions, such as a credit card issuer, in preparation for creating the merchant vectors, consumer vectors, and merchant segment predictive models. DPM 402 is used in both production and training modes. (In this disclosure, the terms "consumer," "customer," and "account holder" are used interchangeably).
The inputs for the DPM are the consumer summary file 404 and the consumer transaction file 406. Generally, the consumer summary file 404 provides account data on each consumer whose transaction data is to be processed, such as account number and other account identifying and descriptive information. The consumer transaction file 406 provides details of each consumer's transactions. The DPM 402 processes these files to organize both sets of data by account identifiers of the consumer accounts, and merges the data files so that each consumer's summary data is available with their transactions.
Customer summary file 404: The customer summary file 404 contains one record for each customer that is profiled by the system, and includes account information of the customer's account, and optionally includes demographic information about the customer. The consumer summary file 404 is typically one that a financial institution, such as a bank, credit card issuer, department store, or the like, maintains on each consumer. The customer or the financial institution may supply additional demographic fields that are deemed to be of informational or predictive value. Examples of demographic fields include age, gender and income; other demographic fields may be provided, as desired by the financial institution.
Table 1 describes one set of fields for the customer summary file 404 for a preferred embodiment. Most fields are self-explanatory. The only required field is an account identifier that uniquely identifies each consumer account and its transactions. This account identifier may be the same as the consumer's account number; however, it is preferable to have a different identifier used, since a consumer may have multiple account relationships with the financial institution (e.g. multiple credit cards or bank accounts), and all transactions of the consumer should be dealt with together. The account identifier is preferably derived from the account number, such as by a one-way hash or encrypted value, such that each account identifier is uniquely associated with an account number. The pop_id field is optionally used to segment the population of customers into arbitrary distinct populations as specified by the financial institution, for example by payment history, account type, geographic region, etc.
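Deriving the account identifier by a one-way hash can be sketched with Python's standard `hmac` and `hashlib` modules. A keyed hash (HMAC-SHA256) is used here rather than a bare hash because account numbers have little entropy, so an unkeyed hash could be reversed by enumerating candidate numbers. The secret key, the function name, and truncation to the 24 characters allowed by Table 1 are assumptions; truncating to 96 bits makes collisions astronomically unlikely but not impossible.

```python
import hashlib
import hmac

def account_identifier(account_number: str, secret_key: bytes, length: int = 24) -> str:
    """Stable pseudonymous Account_id derived from an account number (hypothetical helper)."""
    digest = hmac.new(secret_key, account_number.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]           # 24 hex characters fit Char[max 24]
```

The same key must be kept for every monthly update so that incoming transactions map to existing master-file records.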
TABLE 1
Customer Summary File
Description                    Sample Format
Account_id                     Char[max 24]
Pop_id                         Char ('1'-'N')
Account_number                 Char[max 16]
Credit bureau score            Short int as string
Internal credit risk score     Short int as string
Ytd purchases                  Int as string
Ytd_cash_adv                   Int as string
Ytd_int_purchases              Int as string
Ytd_int_cash_adv               Int as string
State_code                     Char[max 2]
Zip_code                       Char[max 5]
Demographic_1                  Int as string
...
Demographic_N                  Int as string
Note the additional, optional demographic fields for containing demographic information about each consumer. In addition to demographic information, various summary statistics of the consumer's account may be included. These include any of the following:
TABLE 2
Example Demographic Fields for Customer Summary File
Description                        Explanation
Cardholder zip code
Months on books or open date
Number of people on the account    Equivalent to number of plastics
Credit risk score
Cycles delinquent
Credit line
Open to buy
Initial month statement balance    Balance on the account prior to the first month of transaction data pull
Last month statement balance       Balance on the account at the end of the transaction data pulled
Monthly payment amount             For each month of transaction data contributed, or the average over the last year
Monthly cash advance amount        For each month of transaction data contributed, or the average over the last year
Monthly cash advance count         For each month of transaction data contributed, or the average over the last year
Monthly purchase amount            For each month of transaction data contributed, or the average over the last year
Monthly purchase count             For each month of transaction data contributed, or the average over the last year
Monthly cash advance interest      For each month of transaction data contributed, or the average over the last year
Monthly purchase interest          For each month of transaction data contributed, or the average over the last year
Monthly late charge                For each month of transaction data contributed, or the average over the last year
Consumer transaction file 406. The consumer transaction file 406 contains transaction level data for the consumers in the consumer summary file. The shared key is the account_id. In a preferred embodiment, the transaction file has the following description.
TABLE 3
Consumer Transaction File
Description            Sample Format
Account_id             Quoted char(24) - [0-9]
Account_number         Quoted char(16) - [0-9]
Pop_id                 Quoted char(1) - [0-128]
Transaction_code       Integer
Transaction_amount     Float
Transaction_time       HH:MM:SS
Transaction_date       YYYYMMDD
Transaction_type       Char(5)
SIC_code               Char(5) - [0-9]
Merchant_descriptor    Char(25)
SKU Number             Variable length list
Merchant zip code      Char[max 5]
The output for the DPM is the collection of master files 408 containing a merged file of the account information and transaction information for each consumer. The master file is generated as a preprocessing step before inputting data to the profiling engine 412. The master file 408 is essentially the customer summary file 404 with the consumer's transactions appended to the end of each consumer's account record. Hence the master file has variable length records. The master files 408 are preferably stored in a database format allowing for SQL querying. There is one record per account identifier.
In a preferred embodiment, the master files 408 have the following information:
TABLE 4
Master File 408
Description                    Sample Format
Account_id                     Char[max 24]
Pop_id                         Char ('1'-'N')
Account_number                 Char[max 16]
Credit bureau score            Short int as string
Ytd purchases                  Int as string
Ytd_cash_advances              Int as string
Ytd_interest_on_purchases      Int as string
Ytd_interest_on_cash_advs      Int as string
State_code                     Char[max 2]
Demographic_1                  Int as string
...
Demographic_N                  Int as string
<transactions>
The transactions included for each consumer include the various data fields described above, and any other per-transaction optional data that the financial institution desires to track.
The master file 408 preferably includes a header that indicates last update and number of updates. The master file may be incrementally updated with new customers and new transactions for existing customers. The master file database is preferably updated on a monthly basis to capture new transactions by the financial institution's consumers.
The DPM 402 creates the master file 408 from the consumer summary file 404 and consumer transaction file 406 by the following process:
a) Verify minimum data requirements. The DPM 402 determines the number of data files it is handling (since there may be many physical media sources), and the length of the files to determine the number of accounts and transactions. Preferably, a minimum of 12 months of transactions for a minimum of 2 million accounts is used to provide fully robust models of merchants and segments. However, there is no formal lower bound to the amount of data on which system 400 may operate.
b) Data cleaning. The DPM 402 verifies valid data fields and discards invalid records. Invalid records are records that are missing any of the required fields of the customer summary file or the transaction file. The DPM 402 also flags missing values for optional fields that have corrupt or missing data. Duplicate transactions are eliminated using account ID, account number, transaction code, transaction amount, date, and merchant description as a key.
c) Sort and merge files. The consumer summary file 404 and the consumer transaction file 406 are both sorted by account ID; the consumer transaction file 406 is further sorted by transaction date. Additional sorting of the transaction file, for example on time, type of transaction, merchant zip code, may be applied to further influence the determination of merchant co-occurrence. The sorted files are merged into the master file 408, with one record per account, as described above.
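Steps b) and c) can be sketched over in-memory records: drop transactions missing a required field, remove duplicates using the key listed in step b), sort by account ID and then date, and append each account's transactions to its summary record. This is a minimal sketch; the lower-case field names follow Tables 1 and 3, while the choice of which transaction fields are "required" and the function name `build_master` are assumptions. Because Transaction_date is YYYYMMDD, string order equals chronological order.

```python
from itertools import groupby
from typing import Dict, List

# Assumption: the transaction fields treated as required in this sketch.
REQUIRED_TXN_FIELDS = ("account_id", "transaction_date", "transaction_amount",
                       "merchant_descriptor")

def build_master(summaries: List[dict], txns: List[dict]) -> List[dict]:
    # b) Drop records missing a required field.
    valid = [t for t in txns if all(t.get(f) not in (None, "") for f in REQUIRED_TXN_FIELDS)]
    # b) Eliminate duplicates keyed on the fields named in the text.
    seen = set()
    unique = []
    for t in valid:
        key = (t["account_id"], t.get("account_number"), t.get("transaction_code"),
               t["transaction_amount"], t["transaction_date"], t["merchant_descriptor"])
        if key not in seen:
            seen.add(key)
            unique.append(t)
    # c) Sort by account ID, then by date (YYYYMMDD strings sort chronologically).
    unique.sort(key=lambda t: (t["account_id"], t["transaction_date"]))
    by_account: Dict[str, List[dict]] = {
        acct: list(group) for acct, group in groupby(unique, key=lambda t: t["account_id"])
    }
    # c) Merge: one master record per account, with transactions appended.
    master = []
    for rec in sorted((r for r in summaries if r.get("account_id")),
                      key=lambda r: r["account_id"]):
        master.append({**rec, "transactions": by_account.get(rec["account_id"], [])})
    return master
```

Additional sort keys, such as transaction time or merchant zip code, could be appended to the sort tuple to influence co-occurrence ordering as described in step c).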
Due to the large volume of data involved in this stage, compression of the master files 408 is preferred, where on-the-fly compression and decompression is supported. This often improves system performance due to decreased I/O. In addition, as illustrated in FIG. 4a, the master file 408 may be split into multiple sub-files, such as splitting by population ID, or other variable, again to reduce the amount of data being handled at any one time.
E. Predictive Model Generation System
Referring to FIG. 4b, the predictive model generation system 440 takes as its inputs the master file 408 and creates the consumer profiles and consumer vectors, the merchant vectors and merchant segments, and the segment predictive models. This data is used by the profiling engine to generate predictions of future spending by a consumer in each merchant segment using inputs from the data postprocessing module 410.
FIG. 5 illustrates one embodiment of the predictive model generation system 440 that includes three modules: a merchant vector generation module 510, a clustering module 520, and a predictive model generation module 530.
1. Merchant Vector Generation
Merchant vector generation is the application of a context vector type analysis to the account data of the consumers, and more particularly to the master files 408. The operations for merchant vector generation are managed by the merchant vector generation module 510.
In order to obtain the initial merchant vectors, additional processing of the master files 408 precedes the analysis of which merchants co-occur in the master files 408. Two sequential processes are applied to the merchant descriptions: stemming and equivalencing. These operations normalize variations of individual merchants' names to a single common merchant name, allowing consistent identification of transactions at each merchant. This processing is managed by the vector generation module 510.
Stemming is the process of removing extraneous characters from the merchant descriptions. Examples of extraneous characters include punctuation and trailing numbers. Trailing numbers are removed because they usually indicate the particular store in a large chain (e.g. Wal-Mart #12345). It is preferable to identify all the outlets of a particular chain of stores as a single merchant description. Stemming optionally converts all letters to lower case, and replaces all space characters with a dash. This causes all merchant descriptions to be an unbroken string of non-space characters. The lower case constraint has the advantage of making it easy to distinguish non-stemmed merchant descriptions from stemmed descriptions.
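The stemming steps can be sketched with regular expressions. This is a minimal sketch, assuming that hyphens are kept as part of the name (so "Wal-Mart" stays one token) while other punctuation such as '#', '&' and '.' is removed, and that only a single trailing store number is stripped. The function name `stem_merchant` is hypothetical.

```python
import re

def stem_merchant(descriptor: str) -> str:
    """Normalize a raw merchant description as described above (illustrative only)."""
    s = re.sub(r"[^A-Za-z0-9\- ]+", " ", descriptor)   # drop punctuation other than hyphens
    s = re.sub(r"[\s\-]*\d+[\s\-]*$", "", s)           # drop a trailing store number
    s = s.strip().lower()                               # optional lower-casing
    return re.sub(r"\s+", "-", s)                       # spaces -> a single dash
```

For example, "Wal-Mart #12345" and "Wal-Mart #678" both stem to "wal-mart", so all outlets of the chain share one description.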
Equivalencing is applied after stemming, and identifies various different spellings of a particular merchant's description as being associated with a single merchant description. For example, the "Roto-Rooter" company may occur in the transaction data with the following three stemmed merchant descriptions: "ROTO-ROOTER-SEWER-SERV", "ROTO-ROOTER-SERVICE", and "ROTO-ROOTER-SEWER-DR". An equivalence table is set up containing a root name and a list of all equivalent names. In this example, ROTO-ROOTER-SEWER-SERV becomes the root name, and the latter two of these descriptions are listed as equivalents. During operation, such as generation of subsequent master files 408 (e.g. the next monthly update), an identified equivalenced name is replaced with its root name from the equivalence table.
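Applying an equivalence table can be sketched by inverting the root-name table into a lookup from every known spelling to its root. This is a minimal sketch; the dictionary layout `{root name: [equivalent names]}` and the function names are assumptions. Descriptions not found in the table pass through unchanged.

```python
from typing import Dict, Iterable, List

def build_lookup(equivalence_table: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert {root name: [equivalent names]} into {any name: root name}."""
    lookup = {}
    for root, equivalents in equivalence_table.items():
        lookup[root] = root
        for name in equivalents:
            lookup[name] = root
    return lookup

def apply_equivalences(descriptors: Iterable[str], lookup: Dict[str, str]) -> List[str]:
    """Replace each stemmed description by its root name, if it has one."""
    return [lookup.get(d, d) for d in descriptors]
```

With the Roto-Rooter example, both "ROTO-ROOTER-SERVICE" and "ROTO-ROOTER-SEWER-DR" map to the root "ROTO-ROOTER-SEWER-SERV".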
In one embodiment, equivalencing proceeds in two steps, with an optional third step. The first equivalencing step uses a fuzzy trigram-matching algorithm that attempts to find merchant descriptions with nearly identical spellings. This method collects statistics on all the trigrams (sets of three consecutive letters in a word) in all the merchant descriptions, and maintains a list of the trigrams in each merchant description. The method then determines a closeness score for any two merchant names that are supplied for comparison, based on the number of trigrams the merchant names have in common. If the two merchant names are scored as being sufficiently close, they are equivalenced. Appendix I, below, provides a novel trigram-matching algorithm useful for equivalencing merchant names (and other strings). This algorithm uses a vector representation of each trigram, based on trigram frequency in the data set, to construct trigram vectors, and judges closeness based on vector dot products.
Preferably, equivalencing is applied only to merchants that are assigned the same SIC code. This constraint is useful since two merchants may have a similar name, but if they are in different SIC classifications there is a good chance that they are, in fact, different businesses.
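The basic trigram-overlap idea, together with the same-SIC-code constraint, can be sketched as follows. This is a simpler stand-in, not the trigram-vector algorithm of Appendix I: closeness is the Dice coefficient of the two names' trigram sets, computed per dash-separated word. The function names and the 0.7 threshold are assumptions.

```python
from typing import Set

def trigrams(name: str) -> Set[str]:
    """Three-letter substrings of each dash-separated word in a stemmed name."""
    grams: Set[str] = set()
    for word in name.lower().split("-"):
        grams.update(word[i:i + 3] for i in range(len(word) - 2))
    return grams

def trigram_closeness(a: str, b: str) -> float:
    """Dice coefficient of shared trigrams, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return 2.0 * len(ta & tb) / (len(ta) + len(tb))

def should_equivalence(a: str, sic_a: str, b: str, sic_b: str,
                       threshold: float = 0.7) -> bool:
    """Candidates must share an SIC code and be sufficiently close in spelling."""
    return sic_a == sic_b and trigram_closeness(a, b) >= threshold
```

For instance, "ROTO-ROOTER-SEWER-SERV" and "ROTO-ROOTER-SEWER-DR" share 9 of their 11 and 9 trigrams, giving a closeness of 0.9.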
The second equivalencing step consists of fixing a group of special cases. These special cases are identified as experience is gained with the particular set of transaction data being processed. There are two broad classes that cover most of these special cases: a place name is used instead of a number to identify specific outlets in a chain of stores, and some department stores append the name of the specific department to the name of the chain. An example of the first case is U-Haul, where stemmed descriptions look like U-HAUL-SAN-DIEGO, U-HAUL-ATLANTA, and the like. An example of the second case is Robinsons-May department stores, with stemmed descriptions like ROBINSONMAY-LEE-WOMEN, ROBINSONMAY-LEVI-SHORT, ROBINSONMAY-TRIFARI-CO, and ROBINSONMAY-JANE-ASHLE. In both cases, any merchant description in the correct SIC codes that contains the root name (e.g. U-HAUL or ROBINSONMAY) is equivalenced to the root name.
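The special-case rule can be sketched as a containment test gated by SIC code. This is a minimal sketch: the mapping from each special-case root name to the SIC codes for which the rule applies, the first-match policy, and the function name are assumptions.

```python
from typing import Dict, Set

def special_case_root(descriptor: str, sic_code: str, roots: Dict[str, Set[str]]) -> str:
    """Map a description to a special-case root name when the rule applies.

    `roots` maps each root name (e.g. "U-HAUL") to the SIC codes in which the
    containment rule is allowed to fire; otherwise the description is unchanged.
    """
    for root, codes in roots.items():
        if sic_code in codes and root in descriptor:
            return root
    return descriptor
```

Gating by SIC code keeps an unrelated business whose name happens to contain a root string from being merged into the chain.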
A third, optional step includes a manual inspection and correction of the descriptions for the highest frequency merchants. The number of merchants subjected to this inspection varies, depending upon the time constraints in the processing stream. This step catches the cases that are not amenable to the two previous steps. An example is Microsoft Network, with merchant descriptions like MICROSOFT-NET and MSN-BILLING. With enough examples from the transaction data, these merchant descriptors can also be added to the special cases in step two, above.
Preferably, at least one set of master files 408 is generated before the equivalencing is determined. This is desirable in order to compile statistics on frequencies of each merchant description within each SIC code before the equivalencing is started.
Once the equivalencing table is constructed, the original master files 408 are re-built using the equivalenced merchant descriptions. This step replaces all equivalenced merchant descriptors with their associated root names, thereby ensuring that all transactions for the merchant are associated with the same merchant descriptor. Subsequent incoming transaction data can be equivalenced before it is added to the master files, using the original equivalence table.
Given the equivalence table, a merchant descriptor frequency list can be determined describing the frequency of occurrence of each merchant descriptor (including its equivalents).
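Rebuilding the master-file records with root names and then counting descriptors can be sketched as below. The record layout `(account_id, date, descriptor)`, the function names, and the sample table entries are hypothetical and serve only to illustrate the two steps.

```python
from collections import Counter


def apply_equivalences(transactions, equivalence_table):
    """Replace each merchant descriptor by its root name.

    transactions: list of (account_id, date, descriptor) tuples (hypothetical layout)
    equivalence_table: {descriptor: root name}; unlisted descriptors are kept as-is
    """
    return [(account, when, equivalence_table.get(desc, desc))
            for account, when, desc in transactions]


def descriptor_frequencies(transactions):
    """Frequency of each merchant descriptor, with its equivalents folded in."""
    return Counter(desc for _, _, desc in transactions)


table = {"U-HAUL-SAN-DIEGO": "U-HAUL", "U-HAUL-ATLANTA": "U-HAUL"}
rows = [
    ("A1", "2000-01-03", "U-HAUL-SAN-DIEGO"),
    ("A2", "2000-01-04", "U-HAUL-ATLANTA"),
    ("A2", "2000-01-09", "STAPLES"),
]
freq = descriptor_frequencies(apply_equivalences(rows, table))
print(freq.most_common())
```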
Once the equivalence table is defined, an initial merchant vector is assigned to each root name. Merchant vector training based on co-occurrence is then performed, processing the master files by account ID and then by date, as described above.
2. Training of Merchant Vectors: The UDL Algorithm
As noted above, the merchant vectors are based on the co-occurrence of merchants in each consumer's transaction data. The master files 408, which are ordered by account and within account by transaction date, are processed by account, and then in date order to identify groups of co-occurring merchants. The co-occurrence of merchant names (once equivalenced) is the basis of updating the values of the merchant vectors.
The training of merchant vectors is based upon the unexpected deviation of co-occurrences of merchants in transactions. More particularly, an expected rate at which any pair of merchants co-occur in the transaction data is estimated based upon the frequency with which each individual merchant appears in co-occurrence with any other merchants, and a total number of co-occurrence events. The actual number of co-occurrences of a pair of merchants is determined. If a pair of merchants co-occurs more frequently than expected, then the merchants are positively related, and the strength of that relationship is a function of the "unexpected" amount of co-occurrence. If the pair of merchants co-occurs less frequently than expected, then the merchants are negatively related. If a pair of merchants co-occurs in the data about as often as expected, then there is generally no relationship between them. Using the relationship strength of each pair of merchants as the desired dot product between their merchant vectors, the values of the merchant vectors can be determined in the vector space. This process is the basis of the Unexpected Deviation Learning algorithm, or "UDL".
This approach overcomes the problems associated with conventional vector-based models of representation, which tend to be based on overall frequencies of terms relative to the database as a whole. Specifically, in a conventional model, high-frequency merchants, that is, merchants at which there are very many purchases, would co-occur with many other merchants, and would either falsely suggest that these other merchants are related to them, or simply be so heavily down-weighted as to have very little influence at all. That is, high-frequency merchant names would be treated like high-frequency English words such as "the" and "and", which are given very low weights in conventional vector systems precisely because of their high frequency.
However, the present invention takes account of the high frequency presence of individual merchants, and instead analyses the expected rate at which merchants, including high frequency merchants, co-occur with other merchants. High frequency merchants are expected to co-occur more frequently. If a high frequency merchant and another merchant co-occur even more frequently than expected, then there is a positive correlation between them. The present invention thus accounts for the high frequency merchants in a manner that conventional methodologies cannot.
The overall process of modeling the merchant vectors using unexpected deviation is as follows:
1. First, count the number of times that the merchants co-occur with one another in the transaction data. The intuition is that related merchants occur together often, whereas unrelated merchants do not occur together often.
2. Next, calculate the relationship strength between merchants based on how much the observed co-occurrence deviated from the expected co-occurrence.
The relationship strength has the following characteristics:
Two merchants that co-occur significantly more often than expected are positively related to one another.
Two merchants that co-occur significantly less often than expected are negatively related to one another.
Two merchants that co-occur about the number of times expected are not related.
3. Map the relationship strength onto vector space; that is, determine the desired dot product between the merchant vectors for all pairs of items given their relationship strength. The mapping results in the following characteristics:
The merchant vectors for positively related merchants have a positive dot product.
The merchant vectors for negatively related merchants have a negative dot product.
The merchant vectors for unrelated merchants have a zero dot product.
4. Update the merchant vectors from their initial assignments, so that the dot products between them at least closely approximate the desired dot products.
The next sections explain this process in further detail.
a) Co-occurrence Counting
Co-occurrence counting is the procedure of counting the number of times that two items, here merchant descriptions, co-occur within a fixed size co-occurrence window in some set of data, here the transactions of the consumers. Counting can be done forwards, backwards, or bi-directionally. The best way to illustrate co-occurrence counting is to give an example for each type of co-occurrence count:
Example:
Consider the sequence of merchant names:
M1 M3 M1 M3 M3 M2 M3
where M1, M2 and M3 stand for arbitrary merchant names as they might appear in a sequence of transactions by a consumer. For the purposes of this example, intervening data, such as dates of transactions, amounts, transaction identifiers, and the like, are ignored. Further, assume a co-occurrence window of size 3. Here, the co-occurrence window is based on a simple count of items or transactions, and thus the co-occurrence window represents a group of three transactions in sequence.
i) Forward Co-occurrence Counting
The first step in the counting process is to set up the forward co-occurrence windows. FIG. 6a illustrates the co-occurrence windows 602 for forward co-occurrence counting of this sequence of merchant names. By definition, each merchant name is a target 604, indicated by an arrow, for one and only one co-occurrence window 602. Therefore, in this example there are seven forward co-occurrence windows 602, labeled 1 through 7. The other merchant names within a given co-occurrence window 602 are called the neighbors 606. In forward co-occurrence counting, the neighbors occur after the target. For window size=3 there can be at most three neighbors 606 within a given co-occurrence window 602. Obviously, the larger the window size, the more merchants (and transactions) are deemed to co-occur at a time.
The next step is to build a table containing all co-occurrence events. A co-occurrence event is simply a pairing of a target 604 with a neighbor 606. For co-occurrence window #1 in FIG. 6a, the target is M1 and the neighbors are M3, M1, and M3. Therefore, the co-occurrence events in this window are (M1, M3), (M1, M1), and (M1, M3). Table 5 contains the complete listing of co-occurrence events for every co-occurrence window in this example.
TABLE 5
Forward co-occurrence event table
Co-occurrence window   Target   Neighbor
1 M1 M3
1 M1 M1
1 M1 M3
2 M3 M1
2 M3 M3
2 M3 M3
3 M1 M3
3 M1 M3
3 M1 M2
4 M3 M3
4 M3 M2
4 M3 M3
5 M3 M2
5 M3 M3
6 M2 M3
The last step is to tabulate the number of times that each unique co-occurrence event occurred. A unique co-occurrence event is an ordered (target, neighbor) pair of merchant names. Table 6 shows this tabulation in matrix form. The rows indicate the targets and the columns indicate the neighbors; the last column and last row give the totals. For future reference, this matrix will be called the forward co-occurrence matrix.
TABLE 6
Forward co-occurrence matrix (rows: target; columns: neighbor)
Target    M1   M2   M3   Total
M1         1    1    4       6
M2         0    0    1       1
M3         1    2    5       8
Total      2    3   10      15
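Forward co-occurrence counting with a count-based window can be sketched in a few lines. This is a minimal illustration using the example sequence above and a window of 3; the function names and the nested-dictionary matrix layout are hypothetical choices for the sketch.

```python
from collections import defaultdict


def forward_events(seq, window=3):
    """List forward co-occurrence events as (window_number, target, neighbor).

    Each position is the target of exactly one window; its neighbors are the
    next `window` items in the sequence (fewer near the end of the sequence).
    """
    events = []
    for t, target in enumerate(seq):
        for neighbor in seq[t + 1 : t + 1 + window]:
            events.append((t + 1, target, neighbor))
    return events


def tabulate(events):
    """Forward co-occurrence matrix as matrix[target][neighbor] -> count."""
    matrix = defaultdict(lambda: defaultdict(int))
    for _, target, neighbor in events:
        matrix[target][neighbor] += 1
    return matrix


seq = ["M1", "M3", "M1", "M3", "M3", "M2", "M3"]
events = forward_events(seq)
forward = tabulate(events)
for target in ("M1", "M2", "M3"):
    print(target, [forward[target][n] for n in ("M1", "M2", "M3")])
```

The event list reproduces Table 5 (window 7 produces no events because M3 is the last item), and the printed rows reproduce the body of Table 6.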
ii) Backward Co-occurrence Counting
Backward co-occurrence counting is done in the same manner as forward co-occurrence counting, except that the neighbors precede the target in the co-occurrence windows. FIG. 6b illustrates the co-occurrence windows for the same sequence of merchant names for backward co-occurrence counting.
Once the co-occurrence windows are specified, the co-occurrence events can be identified and counted.
TABLE 7
Backward co-occurrence event table
Co-occurrence window   Target   Neighbor
1 M3 M2
1 M3 M3
1 M3 M3
2 M2 M3
2 M2 M3
2 M2 M1
3 M3 M3
3 M3 M1
3 M3 M3
4 M3 M1
4 M3 M3
4 M3 M1
5 M1 M3
5 M1 M1
6 M3 M1
The number of times that each unique co-occurrence event occurred is then recorded in the backward co-occurrence matrix.
TABLE 8
Backward co-occurrence matrix (rows: target; columns: neighbor)
Target    M1   M2   M3   Total
M1         1    0    1       2
M2         1    0    2       3
M3         4    1    5      10
Total      6    1    8      15
Note that the forward co-occurrence matrix and the backward co-occurrence matrix are transposes of one another. This relationship is intuitive, because backward co-occurrence counting is the same as forward co-occurrence counting with the transaction stream reversed. Thus, there is no need to perform both counts; either count can be computed, and the transpose of the resulting co-occurrence matrix taken to obtain the other.
iii) Bi-directional Co-occurrence Counting
The bi-directional co-occurrence matrix is just the sum of the forward co-occurrence matrix and the backward co-occurrence matrix. The resulting matrix will always be symmetric. In other words, the co-occurrence between merchant names A and B is the same as the co-occurrence between merchant names B and A. This property is desirable because the same symmetry is inherent in vector space; that is, for merchant vectors $V_A$ and $V_B$ for merchants A and B, $V_A \cdot V_B = V_B \cdot V_A$. For this reason, the preferred embodiment uses the bi-directional co-occurrence matrix.
TABLE 9
Bi-directional co-occurrence matrix (rows: target; columns: neighbor)
Target    M1   M2   M3   Total
M1         2    1    5       8
M2         1    0    3       4
M3         5    3   10      18
Total      8    4   18      30
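The relationships among the three matrices can be checked with a short sketch: backward counting is forward counting on the reversed stream, the backward matrix equals the transpose of the forward matrix, and the bi-directional matrix is their sum. Counts are stored in a `Counter` keyed by `(target, neighbor)`; this layout and the function names are illustrative assumptions.

```python
from collections import Counter


def forward_counts(seq, window=3):
    """Forward co-occurrence counts keyed by (target, neighbor)."""
    return Counter((target, neighbor)
                   for t, target in enumerate(seq)
                   for neighbor in seq[t + 1 : t + 1 + window])


def backward_counts(seq, window=3):
    """Backward counting is forward counting on the reversed stream."""
    return forward_counts(seq[::-1], window)


def transpose(counts):
    """Swap target and neighbor in every key."""
    return Counter({(b, a): c for (a, b), c in counts.items()})


def bidirectional_counts(seq, window=3):
    """Forward matrix plus its transpose (the backward matrix); symmetric."""
    fwd = forward_counts(seq, window)
    return fwd + transpose(fwd)


seq = ["M1", "M3", "M1", "M3", "M3", "M2", "M3"]
bi = bidirectional_counts(seq)
print(sorted(bi.items()))
```

Computing only the forward count and transposing it, as in `bidirectional_counts`, is the shortcut the text describes; the explicit `backward_counts` is included only to confirm the equivalence.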
FIGS. 7a and 7b illustrate the above concepts in the context of consumer transaction data in the master files 408. In FIG. 7a there is shown a portion of the master file 408 containing transactions of a particular customer. This data is prior to the stemming and equivalencing steps described above, and so includes the original names of the merchants with spaces, store numbers and locations and other extraneous data.
FIG. 7b illustrates the same data after stemming and equivalencing. Notice that the two transactions at STAPLES that previously identified a store number are now equivalenced. The two car rental transactions at ALAMO, which previously included the location, are equivalenced to ALAMO, as are two hotel stays at HILTON that also previously included the hotel location. Further note that the HILTON transactions specified the location before the hotel name. Finally, the two transactions at NORDSTROMS that previously identified a department have been equivalenced to the store name itself.
Further, a single forward co-occurrence window 700 is shown with the target 702 being the first transaction at the HILTON, and the next three transactions being neighbors 704.
Accordingly, following the updating of the master files 408 with the stemmed and equivalenced names, the merchant vector generation module 510 performs the following steps for each consumer account:
1. Read the transaction data in date order.
2. Forward count the co-occurrences of merchant names in the transaction data, using a predetermined co-occurrence window.
3. Generate the forward, backward, and bi-directional co-occurrence matrices.
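The per-account steps above can be sketched as a single pass that groups transactions by account, reads each account in date order, and accumulates bi-directional counts (each forward event plus its transpose) over all accounts. The `(account_id, date, merchant)` record layout, the sortable date strings, and the function name are hypothetical.

```python
from collections import Counter
from itertools import groupby


def account_cooccurrences(transactions, window=3):
    """Accumulate bi-directional co-occurrence counts over all accounts.

    transactions: iterable of (account_id, date, merchant) tuples with merchants
    already stemmed and equivalenced, and dates in a sortable form such as
    ISO strings (hypothetical layout). Windows never span two accounts.
    """
    counts = Counter()
    ordered = sorted(transactions, key=lambda row: (row[0], row[1]))
    for _, rows in groupby(ordered, key=lambda row: row[0]):
        merchants = [merchant for _, _, merchant in rows]
        for t, target in enumerate(merchants):
            for neighbor in merchants[t + 1 : t + 1 + window]:
                counts[(target, neighbor)] += 1  # forward event
                counts[(neighbor, target)] += 1  # its transpose (backward event)
    return counts


txns = [
    ("A", "2000-01-02", "HILTON"),
    ("A", "2000-01-01", "ALAMO"),
    ("B", "2000-01-01", "STAPLES"),
    ("B", "2000-01-05", "HILTON"),
]
counts = account_cooccurrences(txns)
print(sorted(counts.items()))
```

Grouping before counting matters: without it, the last transactions of one account would fall into the same window as the first transactions of the next.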
One preferred embodiment uses a co-occurrence window size of three transactions. This treats whole transactions as the co-occurring events, based only on their sequence, rather than looking for merchant names within three words of each other. In an alternate embodiment, the co-occurrence window is time-based, using a date range to identify co-occurring events. For example, with a co-occurrence window of one week, a neighbor transaction co-occurs with a given target transaction if it occurs within one week of the target transaction. Yet another date-based approach is to define the target not as a transaction but as a target time period, and the co-occurrence window as another time period. For example, the target period can be a three-month block, so that all transactions within the block are targets, and the co-occurrence window may be all transactions in the two months following the target period. Thus, each merchant having a transaction in the target period co-occurs with each merchant (same or other) having a transaction in the co-occurrence period. Those of skill in the art can readily devise alternate co-occurrence definitions that capture the sequence-related and/or time-related principles of co-occurrence in accordance with the present invention.
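A time-based window can be sketched by replacing the count limit with a date limit. This illustration assumes a forward-only variant (neighbors on or after the target date, at most seven days later) for a single account; the direction, the inclusive boundary, and the function name are assumptions made for the sketch.

```python
from collections import Counter
from datetime import date, timedelta


def time_window_counts(transactions, window=timedelta(days=7)):
    """Forward co-occurrence counts with a date-based window for one account.

    transactions: list of (date, merchant) tuples. A later transaction is a
    neighbor of a target if it falls no more than `window` after the target's
    date (same-day transactions included).
    """
    ordered = sorted(transactions)
    counts = Counter()
    for t, (target_date, target) in enumerate(ordered):
        for neighbor_date, neighbor in ordered[t + 1:]:
            if neighbor_date - target_date > window:
                break  # rows are in date order, so all later rows are farther away
            counts[(target, neighbor)] += 1
    return counts


txns = [
    (date(2000, 1, 1), "ALAMO"),
    (date(2000, 1, 5), "HILTON"),
    (date(2000, 1, 20), "STAPLES"),
]
counts = time_window_counts(txns)
print(dict(counts))
```

Unlike the count-based window, the number of neighbors per target now varies with how densely the consumer transacts.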
b) Estimating Expected Co-occurrence Counts
In order to determine whether two merchants are related, the UDL algorithm uses an estimate of the number of times transactions at such merchants would be expected to co-occur. Suppose the only information known about the transaction data is the number of times that each merchant name appeared in co-occurrence events. Given no additional information, the correlation between any two merchant names, that is, how strongly they are related, cannot be determined. In other words, we would be unable to determine whether the occurrence of a transaction at one merchant increases or decreases the likelihood of the occurrence of a transaction at another merchant.
Now suppose that it is desired to predict the number of times two arbitrary merchants, merchant $i$ and merchant $j$, co-occur. In the absence of any additional information, we would have to assume that merchant $i$ and merchant $j$ are not correlated. In terms of probability theory, this means that the occurrence of a transaction at merchant $i$ does not affect the probability of the occurrence of a transaction at merchant $j$:
$$P_{j \mid i} = P_j \qquad [1]$$
The joint probability of merchant $i$ and merchant $j$ is given by
$$P_{ij} = P_i \, P_{j \mid i} \qquad [2]$$
Substituting $P_j$ for $P_{j \mid i}$ in equation [2] gives
$$P_{ij} = P_i \, P_{j \mid i} = P_i \, P_j \qquad [3]$$
However, the true probabilities $P_i$ and $P_j$ are unknown, so they must be estimated from the limited information available about the data. In this scenario, the maximum likelihood estimates of $P_i$ and $P_j$ are
$$\hat{P}_i = \frac{T_i}{T}, \qquad \hat{P}_j = \frac{T_j}{T}$$
where
$T_i$ is the number of co-occurrence events in which merchant $i$ appeared,
$T_j$ is the number of co-occurrence events in which merchant $j$ appeared, and
$T$ is the total number of co-occurrence events.
These data values are taken from the bi-directional co-occurrence matrix: $T_i$ is the row total for merchant $i$, and $T$ is the grand total. Substituting these estimates into equation [3] produces
$$\hat{P}_{ij} = \hat{P}_i \, \hat{P}_j = \frac{T_i T_j}{T^2}$$
which is the estimate for $P_{ij}$.
Since there are a total of $T$ independent co-occurrence events in the transaction data, the expected number of co-occurrences of merchant $i$ and merchant $j$ is
$$\hat{T}_{ij} = T \, \hat{P}_{ij} = \frac{T_i T_j}{T}$$
This expected value serves as a reference point for determining the correlation between any two merchants in the transaction data. If two merchants co-occur significantly more often than the expected count $\hat{T}_{ij}$, the two merchants are positively related. Similarly, if two merchants co-occur significantly less often than expected, they are negatively related. Otherwise, the two merchants are practically unrelated.
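As a worked illustration, the bi-directional counts of Table 9 can serve as a toy data set: merchant M1 appears in $T_1 = 8$ co-occurrence events, M2 in $T_2 = 4$, M3 in $T_3 = 18$, and there are $T = 30$ events in total. With the expected count $\hat{T}_{ij} = T_i T_j / T$:
$$\hat{T}_{13} = \frac{8 \cdot 18}{30} = 4.8, \qquad T_{13} - \hat{T}_{13} = 5 - 4.8 = 0.2$$
$$\hat{T}_{23} = \frac{4 \cdot 18}{30} = 2.4, \qquad T_{23} - \hat{T}_{23} = 3 - 2.4 = 0.6$$
In this small example both pairs co-occur slightly more often than expected; whether such a deviation is significant depends on its size relative to the spread of the count, discussed next.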
Also, given the joint probability estimate $\hat{P}_{ij}$ and the number of independent co-occurrence events $T$, the estimated probability distribution of the number of times $t_{ij}$ that merchant $i$ and merchant $j$ co-occur can be determined. It is well known from probability theory that an experiment having $T$ independent trials (here, co-occurrence events) and a probability of success $\hat{P}_{ij}$ for each trial (success here being a co-occurrence of merchant $i$ and merchant $j$) can be modeled using the binomial distribution. The total number of successes $k$, which in this case represents the number of co-occurrences of the two merchants, has the following probability distribution:
$$P(t_{ij} = k) = \binom{T}{k} \hat{P}_{ij}^{\,k} \left(1 - \hat{P}_{ij}\right)^{T-k}$$
This distribution has mean
$$E[t_{ij}] = T \, \hat{P}_{ij} = \frac{T_i T_j}{T}$$
which is the same value as was previously estimated using a different approach.
The distribution has variance
$$\mathrm{Var}[t_{ij}] = T \, \hat{P}_{ij} \left(1 - \hat{P}_{ij}\right)$$
The variance is used indirectly in UDL1, below. The standard deviation of $t_{ij}$, $\sigma_{ij}$, is the square root of the variance $\mathrm{Var}[t_{ij}]$. If merchant $i$ and merchant $j$ are not related, the difference between the actual and expected co-occurrence counts, $T_{ij} - \hat{T}_{ij}$, should not be much larger than $\sigma_{ij}$.
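The estimates in this subsection can be computed directly from a bi-directional co-occurrence matrix. The sketch below assumes the matrix is stored as a dictionary keyed by `(merchant, merchant)` pairs with zero entries omitted, a hypothetical layout; row totals serve as $T_i$ and the grand total as $T$.

```python
import math


def expected_stats(matrix):
    """Expected counts, variances and standard deviations for every merchant pair.

    matrix: bi-directional co-occurrence counts as {(i, j): T_ij}
            (hypothetical layout; zero entries may be omitted).
    Returns {(i, j): (T_hat_ij, Var[t_ij], sigma_ij)}.
    """
    totals = {}
    for (i, _), count in matrix.items():
        totals[i] = totals.get(i, 0) + count        # T_i: row total
    T = sum(totals.values())                         # grand total of events
    stats = {}
    for i in totals:
        for j in totals:
            p_ij = (totals[i] / T) * (totals[j] / T)  # P_ij = P_i * P_j
            t_hat = T * p_ij                          # T_i * T_j / T
            var = T * p_ij * (1 - p_ij)               # binomial variance
            stats[(i, j)] = (t_hat, var, math.sqrt(var))
    return stats


# Bi-directional counts from Table 9 (the zero M2-M2 entry is omitted).
table9 = {
    ("M1", "M1"): 2, ("M1", "M2"): 1, ("M1", "M3"): 5,
    ("M2", "M1"): 1, ("M2", "M3"): 3,
    ("M3", "M1"): 5, ("M3", "M2"): 3, ("M3", "M3"): 10,
}
stats = expected_stats(table9)
print(stats[("M1", "M3")])
```

For the pair (M1, M3), the expected count is 4.8 and the variance is 30 × 0.16 × 0.84 = 4.032, so the observed deviation of 0.2 is well within one standard deviation (about 2.0).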
c) Desired Dot-Products between Merchant Vectors
To calculate the desired dot product (d.sub.ij) between two merchants' vectors, the UDL algorithm compares the number of observed co-occurrences (found in the bi-directional co-occurrence matrix) to the number of expected co-occurrences. First, it calculates a raw relationship measure (r.sub.ij) from the co-occurrence counts, and then it calculates a desired dot product d.sub.ij from r.sub.ij. There are at least three different ways that the relationship strength and desired dot product can be calculated from the co-occurrence data:
Method: UDL1 ##EQU7##
Method: UDL2
$$r_{ij} = \operatorname{sign}\left(T_{ij} - \hat{T}_{ij}\right) \cdot \sqrt{2 \ln \lambda} \qquad [13]$$
##EQU8##
Method: UDL3 ##EQU9##
where $T_{ij}$ is the actual number of co-occurrence events for merchant $i$ and merchant $j$, and $\sigma_r$ is the standard deviation of all the $r_{ij}$.
In UDL2 and UDL3, the log-likelihood ratio, $\ln \lambda$, is given by: ##EQU10##
Each technique calculates the unexpected deviation, that is, the deviation of the actual co-occurrence count from the expected co-occurrence count. In terms of the previously defined variables, the unexpected deviation is:
$$D_{ij} = T_{ij} - \hat{T}_{ij} \qquad [16]$$
Thus, $D_{ij}$ may be understood as a raw measure of unexpected deviation.
Because all three methods use the same unexpected deviation measure, they differ only in the formulas used to calculate $r_{ij}$ from $D_{ij}$. (Note that other calculations of the dot product may be used.)
The first technique, UDL1, defines $r_{ij}$ as the unexpected deviation $D_{ij}$ divided by the standard deviation $\sigma_{ij}$ of the predicted co-occurrence count. This formula for the relationship measure is closely related to chi-squared ($\chi^2$), a significance measure commonly used by statisticians. In fact ##EQU11##
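The UDL1 raw relationship measure $r_{ij} = D_{ij} / \sigma_{ij}$ can be sketched from the four counts it depends on. Only $r_{ij}$ is computed here; the subsequent mapping to a desired dot product $d_{ij}$ is not modeled. The function name is hypothetical.

```python
import math


def udl1_relationship(t_ij, t_i, t_j, total):
    """UDL1 raw relationship measure r_ij = D_ij / sigma_ij.

    t_ij:  observed co-occurrence count of merchants i and j
    t_i:   co-occurrence events involving merchant i
    t_j:   co-occurrence events involving merchant j
    total: total number of co-occurrence events T
    """
    p_ij = (t_i / total) * (t_j / total)          # estimated joint probability
    expected = total * p_ij                       # expected count T_i * T_j / T
    sigma = math.sqrt(total * p_ij * (1 - p_ij))  # binomial standard deviation
    return (t_ij - expected) / sigma              # unexpected deviation / sigma


print(udl1_relationship(5, 8, 18, 30))    # pair (M1, M3) from Table 9
print(udl1_relationship(1, 1, 1, 10**6))  # a small-count pair
```

For the pair (M1, M3) of Table 9 the measure is about 0.1. For a pair with $T_{ij} = 1$, $T_i = T_j = 1$ and $T = 10^6$, the expected count is $10^{-6}$, the standard deviation is about $10^{-3}$, and the measure is roughly 1000, a single co-occurrence producing a very large relationship value.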
For small-count situations, i.e., when $\hat{T}_{ij} \ll 1$, UDL1 gives overly large values for $r_{ij}$. For example, in a typical retail transaction data set, in which more than 90% of the counts are small, values of $r_{ij}$ on the order of $10^9$ have been seen. Data sets having such a high percentage of large relationship measures can be problematic, because in these cases $\sigma_r$ also becomes very large. Since the same $\sigma_r$ is used by all co-occurrence pairs, a large value of $\sigma_r$ causes $r_{ij}/\sigma_r$ to become very small for pairs that do not suffer from small counts. Therefore, in these cases $d_{ij}$ becomes approximately zero.
This property is not desirable, because it forces the merchant vectors of two merchants to be orthogonal, even when the two merchants co-occur significantly more often than expected.
The second technique, UDL2, overcomes the small-count problem by using log-likelihood ratio estimates to calculate $r_{ij}$. It has been shown that log-likelihood ratios have much better small-count behavior than $\chi^2$, while retaining the same behavior as $\chi^2$ in the non-small-count regions.
The third technique, UDL3, is a slightly modified version of UDL2. The only difference is that the log-likelihood ratio estimate is scaled by ##EQU13##
This scaling removes the $\sqrt{\hat{T}_{ij}}$ bias from the log-likelihood ratio estimate. The preferred embodiment uses UDL2 in most cases.
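A UDL2-style relationship measure $r_{ij} = \operatorname{sign}(T_{ij} - \hat{T}_{ij}) \sqrt{2 \ln \lambda}$ can be sketched as below. This sketch assumes that $\ln \lambda$ is the standard binomial log-likelihood ratio of the observed co-occurrence rate $T_{ij}/T$ against the expected rate $\hat{P}_{ij}$; that choice is an assumption, and the document's own expression may differ. UDL3's scaling is not modeled, and the helper names are hypothetical.

```python
import math


def _xlogy(x, y):
    """Return x * ln(y), treating 0 * ln(0) as 0."""
    return 0.0 if x == 0 else x * math.log(y)


def udl2_relationship(t_ij, t_i, t_j, total):
    """UDL2-style r_ij = sign(D_ij) * sqrt(2 ln(lambda)).

    ASSUMPTION: ln(lambda) is the binomial log-likelihood ratio of the
    observed rate t_ij / total against the expected rate P_ij.
    """
    p = (t_i / total) * (t_j / total)  # expected joint probability P_ij
    expected = total * p               # expected count T_hat_ij
    k, n = t_ij, total
    ln_lambda = (_xlogy(k, k / (n * p))
                 + _xlogy(n - k, (n - k) / (n * (1 - p))))
    # Guard against tiny negative values caused by floating-point rounding.
    magnitude = math.sqrt(max(2.0 * ln_lambda, 0.0))
    return math.copysign(magnitude, k - expected)


print(udl2_relationship(5, 8, 18, 30))    # pair (M1, M3) from Table 9
print(udl2_relationship(1, 1, 1, 10**6))  # a small-count pair
```

Under this assumption, the pair (M1, M3) of Table 9 gets about 0.1, essentially the same as $D_{ij}/\sigma_{ij}$ for that pair, while the small-count pair with $T_{ij} = 1$, $T_i = T_j = 1$ and $T = 10^6$ gets about 5 instead of roughly 1000, which is the kind of small-count damping the text attributes to log-likelihood ratios.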
Accordingly, the present invention generally proceeds as follows:
1. For each pair of root merchant names, determine the expected number of co-occurrences of the pair from the total number of co-occurrence transactions involving each merchant name (with any merchant) and the total number of co-occurrence transactions.
2. For each pair of root merchant names, determine a relationship strength measure based on the difference between the expected number of co-occurrences and the actual number of co-occurrences.
3. For each pair of root merchant names, determine a desired dot product between the merchant vectors from the relationship strength measure.
d) Merchant Vector Training
The goal of vector training is to position the merchant vectors in a high-dimensional vector space such that the dot products between them closely approximate their desired dot products. (In a preferred embodiment, the vector space has 280 dimensions, though more or fewer could be used.) Stated more formally:
Given a set of merchant vectors $V = \{V_1, V_2, \ldots, V_N\}$ and the set of desired dot products for each pair of vectors $D = \{d_{12}, d_{13}, \ldots, d_{1N}, d_{21}, d_{23}, \ldots, d_{2N}, d_{31}, \ldots, d_{N(N-1)}\}$, position each merchant vector such that a cost function is minimized, e.g.: ##EQU14##
In a typical master file 408 of transaction data, the set of merchant vectors contains ten thousand or more vectors. Finding the optimal solution directly would therefore require solving a system of ten thousand or more high-dimensional equations, a calculation that is normally prohibitive given the time frames in which the information is desired. Therefore, alternative techniques for minimizing the cost function are preferred.
One such approach is based on gradient descent. In this technique, the desired dot product is compared to the actual dot product for each pair of merchant vectors. If the dot product between a pair of vectors is less than desired, the two vectors are moved closer together. If the dot product between a pair of vectors is greater than desired, the two vectors are moved farther apart. Written in terms of vector equations, this update rule is:
$$V_i(n+1) = V_i(n) + \alpha \left(d_{ij} - V_i \cdot V_j\right) V_j \qquad [20]$$
##EQU15##
$$V_j(n+1) = V_j(n) + \alpha \left(d_{ij} - V_i \cdot V_j\right) V_i \qquad [22]$$
##EQU16##
This technique converges as long as the learning rate $\alpha$ is sufficiently small; the rate is determined by analysis of the particular transaction data being used and is typically in the range 0.1 to 0.5. However, convergence may be very slow.
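The pairwise gradient-descent update of equations [20] and [22] can be sketched as follows. This is a minimal illustration under several assumptions: a squared-error cost summed once over each unordered pair, a 4-dimensional space instead of 280 dimensions, random initial vectors, and hypothetical desired dot products. Only the two update rules shown are implemented; whatever the equations following each rule add is not modeled.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cost(vectors, desired):
    """Assumed cost: squared error between desired and actual dot products,
    summed once over each unordered pair listed in `desired`."""
    return sum((d - dot(vectors[i], vectors[j])) ** 2
               for (i, j), d in desired.items())


def train(vectors, desired, alpha=0.1, epochs=500):
    """Apply update rules [20] and [22] to every pair, repeatedly.

    Both vectors are updated from their values before the step, using the
    same error term (d_ij - V_i . V_j).
    """
    for _ in range(epochs):
        for (i, j), d in desired.items():
            vi, vj = vectors[i], vectors[j]
            err = d - dot(vi, vj)
            vectors[i] = [a + alpha * err * b for a, b in zip(vi, vj)]
            vectors[j] = [b + alpha * err * a for a, b in zip(vi, vj)]
    return vectors


rng = random.Random(0)
vectors = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(3)]
desired = {(0, 1): 0.5, (0, 2): -0.3, (1, 2): 0.0}  # hypothetical targets
before = cost(vectors, desired)
train(vectors, desired)
after = cost(vectors, desired)
print(before, after)
```

A positive error pulls the two vectors toward each other and a negative error pushes them apart, matching the informal description of the rule; the learning rate of 0.1 is at the low end of the range quoted above.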
An alternative methodology uses averages of merchant vectors. In this embodiment, the desired position of a current merchant vector is determined with respect to each other merchant vector given the current position of the other merchant vector