Apparatus and accompanying methods for visualizing clusters of data and hierarchical cluster classifications6742003Abstract A system that incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, a depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank ordered based on normalized scores, and then graphically displayed. The system permits a user to browse through the hierarchy, and, to readily comprehend segment inter-relationships, selectively expand and contract the displayed hierarchy, as desired, as well as to compare two selected segments or segment groups together and graphically display the results of that comparison. An alternative discriminant-based cluster scoring technique is also presented. Claims We claim: Description BACKGROUND OF THE DISCLOSURE
TABLE 1
Number
of
Category Name Records
broad 18
web tools 15789
developer 6632
advanced office 3868
office 12085
ie 22621
enterprise 10162
office support 9516
ie support 6687
windows support 12618
Window 200 contains a display area 201 and a slider 202. The similarity network 220 within the display area contains a node for each category and an arc for each pair of categories whose similarity is above the similarity threshold. For example, node 203 representing category "ie support" and node 204 representing category "windows support" have a similarity that is above the similarity threshold and are thus connected by arc 206. However, the similarity between category "ie support" and category "enterprise" is below the similarity threshold. Therefore, the similarity network has no arc between node 205 representing category "enterprise" and node 203 representing category "ie support." The shading within the nodes of the similarity graph indicate the size (i.e., number of records) of the category that the node represents relative to the category with the most number of records. Since category "ie" contains more records than any other category, the CV system shades the entire node representing category "ie." Since category "windows support" has a number of records that is approximately one-half the number of records in category "ie," the CV system shades approximately one-half of the node representing category "windows support." Alternatively, the shading of the nodes can represent the number of records in the category in relation to a total number of records in the collection. In such a case, the CV system would shade approximately 10% of the node representing a category that contains 10% of the records of the collection. The nodes of a category graph can also have various graphic shapes. The nodes of the similarity graph in this example are displayed as an oval containing the name of the category that the node represents. Alternatively, the nodes may be any shape such as a circle or a rectangle. FIG. 2B illustrates a sample rectangular node. The node contains the name of the category and the number of records in the category. The node also contains a shaded portion, the size of which indicates a proportion of the number of records in that category to the total number of records in the collection. Alternatively, the node might also display other statistical information such as an average value of an attribute (e.g., age) for records in the category or the mode of an attribute (e.g., color). The CV system provides vertical slider 202, which alternatively may be displayed as a horizontal slider, to allow the data analyst to set the similarity threshold. As the data analyst moves the slider up and down, the similarity threshold increases or decreases, respectively. FIG. 2C illustrates an exemplary similarity graph after the data analyst has decreased the similarity threshold by moving the slider down. In this example, the similarity between category "enterprise" and category "ie support" is now greater than the similarity threshold. Thus, the CV system displays an arc 207 between node 205 representing category "enterprise" and node 203 representing category "ie support." If the data analyst then increases the similarity threshold by moving the slider to where it was previously positioned, then the CV system would remove arc 207. Although the arcs of FIG. 2C indicate categories whose similarity is above the similarity threshold, the arcs do not indicate relative similarity between categories. FIG. 2D illustrates an exemplary similarity graph indicating relative similarity. The CV system indicates the relative similarity of two categories by thickness of the arcs connecting the nodes. That is, the CV system displays a thick arc to connect nodes representing categories that are similar, and displays a thin arc to connect nodes representing categories that are not similar or not as similar. In this example, since category "ie support" and category "windows support" are the most similar categories, the CV system has drawn arc 206 connecting node 203 representing category "ie support" and node 204 representing category "windows support" with a thickest width. The CV system may alternatively use various graphic representations as indications of similarity between categories. For example, proximity of the nodes to one another may indicate the similarity. That is, nodes that are displayed closest to each other are most similar. Also, the similarity of nodes may be indicated by color of the arcs. For example, a green arc may indicate a high degree of similarity, whereas a red arc may indicate a low degree of similarity, and so forth with other colors and similarity differences. Through manipulation of slider 202, the number of similarity arcs shown in a similarity network can range, with the slider at one end of its travel, from no arcs being shown to, with the slider at an opposing end of its travel, all pair-wise connections being shown. In practice, however, it is sometimes useful to limit an upper range of the slider so that not all arcs are shown. A useful upper limit for the slider is a point at which (1) the similarity network is connected (one can travel from any one node to any other) and (2) the number of arcs shown is a minimum. Furthermore, in practice, it is also useful to layout the similarity network in a manner that is pleasing to the eye. One preferred mode for doing so is to use a spring model as described in T. M. J. Fruchtermann et al, "Graph drawing by force-directed placement", Software Practice and Experience, Vol. 21, No. 11, 1991, pages 1129-1164 (which is incorporated by reference herein), where an apparent attractive force between nodes depends on the similarity measure between those nodes and particularly is proportional to a similarity score between those nodes. The CV system allows the data analyst to control combining and splitting of categories. In particular, the CV system allows the data analyst to combine categories that are most similar and to split categories that have been combined. The combining and splitting of categories allows the data analyst to focus on more or fewer categories, as needed. FIG. 2E illustrates the combining of the most similar categories. Here, slider 202 may be used to control the combining and splitting of categories. As the user moves the slider up an increment, the CV system selects the two categories represented by displayed nodes that are most similar and combines those categories into a single category. The CV system then removes the node for each of the categories to be combined along with arcs connected to those categories and displays a new single node representing the combined category. In this example, categories "ie support" and "windows support" are most similar. Therefore, nodes 203 and 204 and arcs connected to those nodes have been removed and node 210 representing the combined category "ie and windows support" has been added. As the user incrementally moves the slider down, the CV system splits the categories that were last combined. Thus, when the slider is moved down an increment after being incrementally moved up, then the CV system displays the same similarity graph that was displayed before the data analyst moved the slider. The CV system may animate combining and splitting of categories. That is, the CV system shows the two nodes representing categories to be combined moving towards each other to form a single node representing the combined categories. The CV system animates the splitting of nodes by showing the reverse process. To further help a data analyst focus on certain categories, the CV system allows a data analyst to de-emphasize a category. FIG. 2F illustrates the de-emphasizing of categories. When the data analyst instructs the system to de-emphasize a category, the system either removes the node representing that category and all connecting arcs from the similarity graph or displays that node and connecting arcs in a dimmed manner. For example, if the data analyst instructs the system to de-emphasize category "windows support," then the CV system removes node 204 representing category "windows support" and connecting arcs 206 and 212. FIGS. 3A-3K and 4A-4B illustrate the display of a hierarchical map. The CV system creates a hierarchical map by starting with the base categories, and successively and iteratively combining the most similar categories in order to generate combined categories until a single combined category contains all the records of the collection. The construction of the hierarchy can be guided by an automated procedure (e.g., as described herein), by direct input from a user providing guidance as to which nodes should be merged or split next, or by a combination of both using occasional user interaction. The hierarchical map can be displayed in either tree format or circular format. With tree format selected, the CV system displays the hierarchical map in a standard tree data structure layout with the root node at a top of the display and the leaf nodes at the bottom of the display. Alternatively, the CV system may display the tree data structure upside-down with the root node at the bottom of the display and leaf nodes at the top of the display or sideways with the root node at one side of the display and the leaf nodes at an opposing side of the display. With circular format selected, the CV system displays the hierarchical map in a circular layout with the leaf nodes at the perimeter of a circle and the root node at the center. FIGS. 3A-3K illustrate display of a hierarchical map in a tree format. FIG. 3A illustrates the display of a hierarchical map in the tree format with leaf nodes horizontally aligned. The hierarchical map 300 contains leaf nodes 301-310 for corresponding base categories. The non-leaf nodes represent combined categories. For example, node 311 represents a combined category "support" that is a combination of category "office support" and category "windows support." Thus, the category represented by node 311 contains the records of the categories "office support" and "windows support." Root node 319 of the hierarchical map represents a category that contains all the records in the collection. In FIG. 3A, all the leaf nodes are displayed horizontally aligned. In contrast, FIG. 3B illustrates a hierarchical map in which the leaf nodes are not horizontally aligned. The CV system allows a data analyst to select whether to display the leaf nodes horizontally aligned. When the leaf nodes are horizontally aligned, it may be easier for the data analyst to visually identify the base categories, but more difficult for the data analyst to identify the sub-categories of a combined category. Many of the user interface features of the similarity network have analogous features in the hierarchical map. For example, FIG. 3C illustrates the de-emphasizing a base category. In this example, the data analyst has selected to de-emphasize node 301 representing base category "office support." The CV system de-emphasizes node 301 by dimming or removing it. FIG. 3D illustrates de-emphasizing a combined category. In this example, the data analyst has selected to de-emphasize node 316 representing the combined category "support/enterprise." The data analyst can select to de-emphasize both the selected node and all its descendent nodes (i.e., the subtree with the selected node as its root) or only the descendent nodes. If a data analyst selects to de-emphasize a subtree, then the CV system can represent the subtree as a single node or can dim or remove the subtree. When a data analyst moves a cursor over the nodes of a category graph, the CV system displays additional information for the node. FIG. 3E illustrates movement of the cursor over a node of a hierarchical map. In this example, the data analyst has moved a cursor over node 309 representing category "office advanced." In this example, the complete name of the category is displayed. Alternatively, additional information about the node could be displayed, such as the number of records in the category. The CV system allows a data analyst to browse through a hierarchical map in either a top-down or bottom-up manner. The browsing displays the base categories and combined categories based on similarity. When browsing from the bottom up, the CV system displays nodes representing combined categories (along with child nodes) in the same order as those combined categories were generated when the hierarchical map was created. When browsing from the top down, the CV system displays the nodes representing combined categories in the reverse order. When browsing in a top-down manner, the CV system first displays the root node and its two child nodes because the root node represents the combined category that was generated last. The CV system displays "next" and "previous" buttons for browsing down and up the nodes in the hierarchy. Alternatively, the CV system provides a slider that allows the data analyst to move forward ("next") and backward ("previous") for browsing up and down the hierarchy of nodes. In response to the data analyst selecting the "next" button, the CV system displays child nodes representing the sub-categories of the displayed node but in a reverse order to that which the combined categories were generated. Also, in response to a data analyst selection of the "previous" button, the CV system removes the last child nodes displayed. When browsing in a bottom-up manner, the CV system first displays the node (and its child nodes) representing the combined category that was generated first. In response to the data analyst selection of "next node," the CV system displays the node (and child nodes if not already displayed) representing the combined category that was next generated. Also, in response to a data analyst selection of the "previous" button, the CV system removes the node(s) displayed most recently. The CV system supports browsing a hierarchical map that is displayed in either tree or circular format. FIGS. 3F-3K illustrate the browsing features of the CV system. The browsing features allow the user to incrementally display the hierarchical map in either a top-down or a bottom-up manner. When the user selects a top-down browse, root node 319 and its two child nodes 310 and 318 are displayed initially. At each request to browse down, additional child nodes are displayed in the reverse order in which the child nodes were combined to generate combined categories. As shown in FIG. 3G, as the data analyst first requests to browse down, the CV system displays node 316 representing the combined category "support/enterprise" and node 317 representing category "other." When the data analyst next requests to browse down, the CV system displays node 312 representing category "novice" and node 315 representing category "advanced," which are child nodes of node 317 representing category "other." When the data analyst then requests to browse down, the CV system displays nodes 307 representing category "web tools" and node 313 representing category "miscellaneous," which are child nodes of node 315 representing category "advanced." In this example, the data analyst has selected to re-center the node that is being browsed down in the center of the display. Thus, node 315 is shown in the center of the display. When in browsing mode, the data analyst may select a node to display a list of various options for displaying information relating to the nodes. FIG. 3H illustrates the list of options for a selected node. In this example, the data analyst has selected node 315 representing category "advanced." When the node is selected, the CV system displays a pop-up window indicating the various options that may be selected by the user. Table 2 below lists the options.
TABLE 2
Node summary
Compare this node with parent
Compare this node with sibling
Compare this node to rest of the
world
Compare this node with left child
Compare this node with right child
Compare the children of this node
A "node summary" includes more detailed information about the category that the node represents. For example, the node summary may include the number of records in the category and the percentage of the records that have various attribute values, which is referred to as "characteristic information". The "compare" options display similarity and discriminating information between the selected category and other categories. The discriminating information indicates which attributes distinguish the record in the selected category from records in other categories. FIGS. 3I-3K illustrate browsing in a bottom-up manner. Specifically, FIG. 3I depicts an initial display in a bottom-up browse. In this example, node 313 representing combined category "miscellaneous" is displayed along with its child node 308 representing category "developer" and child node 309 representing category "office advanced", because the combined category "miscellaneous" was the first combined category generated when generating the hierarchical map. Each time the user selects the "next" button, an additional combined category is displayed in the order that the combined categories was generated. FIG. 3J illustrates a display of the hierarchical map after the user has selected the "next" button three times. When the data analyst selects "next" button the first time, then the CV system displays node 311 representing the "support" category plus its child node 301 representing category "office support" and child node 302 representing category "windows support." When the data analyst selects the "next" button for the second time, then the CV system displays node 312 representing category "novice" and its child node 305 representing category "office" and child node 306 representing category "ie." When the data analyst selects the "next" button for the third time, the CV system displays node 314 representing category "support" along with its child node 303 representing the category "ie support." The other child node 311 representing combined category "support" is already displayed. FIG. 3K depicts selection of node 314 representing the category "support." The data analyst may also use a slider to browse the hierarchy up or down rather than use the "previous" and "next" buttons. The CV system can also animate the browsing of the hierarchical maps. When animating the browsing in a bottom-up manner, the CV system progressively displays the nodes from the bottom of the hierarchy towards the top at, for example, periodic time intervals. When animating browsing in a top-down manner, the CV system displays the root node first and then displays additional nodes periodically until the leaf nodes are displayed. FIG. 4 illustrates a hierarchical map displayed in a circular format. The leaf nodes of the hierarchy are displayed in a circle. The root node of the hierarchy is displayed in a center of the circle. The other non-leaf nodes are displayed between the root node and the surrounding leaf nodes. The same visualization features (e.g., browsing and de-emphasizing) that are used with the tree format can be used with the circular format of the hierarchical map. Also, similarity information can be displayed along with a hierarchical map by, for example, using different color arcs to connect nodes representing the different categories. Thus, a similarity graph is effectively superimposed on a hierarchical map. The CV system displays additional information about categories when requested by a data analyst. This additional information includes characteristic and discriminating information. FIGS. 5A-5C illustrate weights of evidence information that may be displayed when a data analyst selects a node of a category graph. The weights of evidence information includes identification of discriminating pages and characteristic pages. FIG. 5A illustrates the display of the characteristics pages of category "enterprise." The characteristic pages list the web pages that are accessed by the users in a category in order based on a corresponding probability that a user in the category accesses each such web page. The probability for any such page is equal to the number of users in the category who access the web page divided by the number of users in the category. The characteristic pages of category "enterprise" indicates that a user in that category has 0.915 probability of accessing the "windows" web page. Also, a user in that category has a 0.62 probability of accessing the "products" web page. FIG. 5B illustrates the discriminating pages for the category "enterprise." The top panel illustrates the web pages that discriminate the category "enterprise" from all other categories. The web pages are listed in order based on their ability to discriminate all other categories. Web pages that tend to be accessed by the users of a category and not accessed by users of the other categories are likely to be most discriminating. In this example, the "windows" web page, the "ntserver" web page, the "products" web page, and so on serve to discriminate users in category "enterprise" from all others. A bottom panel indicates the web pages that discriminate all other categories from "enterprise" category. Web pages accessed by users of the other categories and not accessed by users of a selected category tend to be most discriminating. In this example, the "workshop" web page, the "ie" web page, and so on are used to discriminate all of the categories from the category "enterprise." An example mathematical basis for discrimination is provided below. FIG. 5C illustrates the display of pair-wise discrimination for two categories. In this example, the user has selected to display information that tends to discriminate category "office support" from category "ie support." As shown by a top panel, the users of the category "office support" tend to use the "office" web page, whereas users of category "ie support" tend not to use the "office" web page. In contrast, the users of the category "ie support" tend to use the "ie" web page, whereas users of category "office support" tend not to use that particular web page. The CV system provides for displaying certain information in a 3-D graphical form. FIG. 6A illustrates a 3-D graph of probability that each attribute equals one for each binary attribute. The x-axis represents the categories (clusters), the y-axis represents the attributes, and the z-axis represents the probabilities. For example, the height of bar 601 represents the probability (of approximately 0.1) that a record in category 1 will have a value of one. In this example, indicator bars for a given attribute are shown in the same color or shade. FIG. 6B illustrates a 3-D graph of the same information as the graph of FIG. 6A except that the bars for a given category, rather than a given attribute, are shown in the same color or shade. These graphs therefore allow a data analyst to focus on attributes or categories. The CV system also provides for displaying categories in a decision tree format. FIG. 7 illustrates a decision tree format for displaying the categories of a collection. Decision tree 700 contains nodes corresponding to attributes and arcs corresponding to values of that attribute. The decision tree has node 701 corresponding to the attribute indicating whether a user accessed the "workshop" web page and arcs 701a and 701b indicating the values of zero and non-zero for that attribute. Node 702 corresponds to the attribute indicating whether a user accessed the "intdev" web page and arcs 702a and 702b indicating the values of 2 and not 2. Thus, each node, except the root node, represents a setting of attribute values as indicated by the arcs in the path from that node to the root node. When a data analyst selects a node, the CV system displays a probability for each category that a record in that category will have the attribute settings that are represented by the path. For example, when the data analyst selects node 703 representing the attribute setting of accessing the "workshop" web page at least once and accessing the "intdev" web page twice, the CV system displays table 704. The table identifies the categories, the number of records in each category that matches those attribute settings, and the probabilities. For example, the first line "0 5 0.0039" indicates that category 0 has 5 records that match the attribute settings and that the probability for category 0 is 0.0039. The CV system generates the decision tree by adding a column to a collection of records that contains the category of record. The CV system then applies a decision tree algorithm (see, e.g., D. Chickering, et al, "A Bayesian Approach to Learning Bayesian Networks with Local Structure," Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, 1997; which is incorporated by reference herein) to build a decision tree (or graph) in which the category column represents the target variable. Similarity, as used in the present invention, corresponds to "distance" between the records (cases) in two categories (clusters). We will now present a mathematical basis for calculating such a distance. In the following, X.sub.1, . . . , X.sub.m refers to the variables representing the attributes and x.sub.1, . . . , x.sub.m refers to the state of a variable, that is, the attribute values. First, however, various probabilities are defined that are used to calculate the distance. The probability of a record in a collection having attribute values x.sub.1, . . . , x.sub.m is represented by a joint probability density function given by the following equation: ##EQU1## where: h.sub.j represents category j, where p(h.sub.j) represents the probability that any record is in category j; p(x.sub.1, . . . , x.sub.m /h.sub.j) represents a conditional probability that a record has attribute values x.sub.1, . . . , x.sub.m given that it is a record from category j. The probability that a record is in category j is given by the following equation: ##EQU2## where: size (h.sub.j) is a count of a total number of records in category j, and the .alpha..sub.j are hyper-parameters (e.g., .alpha..sub.j 1 for all j). For example, if category j contains 10,000 records and the collection contains 100,1000 records, then p(h.sub.j)=0.1. It may be assumed that the probability, that a record with attribute values x.sub.1, . . . , x.sub.m is in category j, is the product of the probabilities for each attribute value that a record in category j has that attribute value and is given by the following equation: ##EQU3## where: p(x.sub.i /h.sub.j) is the conditional probability that a record has the attribute value x.sub.i for attribute i given that it is in category j. This probability is given by the following equation: ##EQU4## where: size(x.sub.i,h.sub.j) is the number of records in category j with a value for attribute i that equals the attribute value x.sub.i, where the summation is over all values of attribute i and where .alpha..sub.ij are hyper-parameters (e.g., .alpha..sub.ij =1, for all i and j). For example, if category j contains 10,000 records and 100 of those records have a value of 1 for attribute i, then p(1/h.sub.j)=0.01. Equation (1a) can be re-written by substituting Equation (1c) as the following equation: ##EQU5## Through a first technique, distance, i.e., similarity, between two categories is given by the sum of the Kullback-Leibler (KL) distance between the records in the first category and the records in the second category and the KL distance between the records in the second category and the records in the first category. This distance is given by the symmetric divergence (see H. Jefferys, Theory of Probability, (.COPYRGT. 1939, Oxford University Press)) as indicated in Equation 2(a) as follows: dist(h.sub.1,h.sub.2)=KL(p(X.sub.1, . . . , X.sub.m.vertline.h.sub.1),p(X.sub.1, . . . , X.sub.m.vertline.h.sub.2))+KL(p(X.sub.1, . . . X.sub.m.vertline.h.sub.2), p(X.sub.1, . . . , X.sub.m.vertline.h.sub.1)) (2a) Equation (2a) reduces to the following equation: ##EQU6## Thus, the distance between the first and second categories is the sum, for all possible combinations, of attribute values, of a first probability that a record with that combination of attribute values is in the first category minus a second probability that a record with that combination of attribute values is in the second category multiplied by a logarithm of the first probability divided by the second probability. Since Equation (2b) requires a summation over all possible combinations of attribute values, the determination of the similarity using this formula is computationally expensive. When Equation (1c) is substituted into Equation (2d), the result is the following equation: ##EQU7## Advantageously, Equation (2c) requires only the summation over all possible values of each attribute, and not over all possible combinations of attributes, and is thus computationally much more efficient than Equation (2b). Equation (2c) or, alternatively, Equation (2b) provides a way to calculate the similarity for a pair of base categories. Several different equations can be used to calculate the similarity between two combined categories. For example, when two categories are combined into a combined category, then the similarity between the combined category and every other category (combined or not combined) needs to be calculated for the display of a similarity graph. Equations (3a), (3b), and (3c) provide three different techniques for calculating the similarities with combined categories. The first technique averages the similarity between each pair of categories of the first and second combined categories and is given by the following equation: ##EQU8## where: G.sub.1 represents the first combined category and G.sub.2 represents the second combined category. Thus, the distance is the summation of the distances between each pair of categories multiplied by the probabilities (the latter being given by Equation (1b)) that a record is in each of the categories. The second and third techniques calculate the distance as either the minimum or maximum distance between any two pairs of categories in the first and second combined categories and are given by the following equations: dist(G.sub.1,G.sub.2)=min{dist(h.sub.j,h.sub.k).vertline.h.sub.j.epsilon.G. sub.1,h.sub.k.epsilon.G.sub.2 } (3b) dist(G.sub.1,G.sub.2)=max{dist(h.sub.j,h.sub.k).vertline.h.sub.j.epsilon.G. sub.1,h.sub.k.epsilon.G.sub.2 } (3c) Another technique for calculating the distance is by treating a combined category as a non-combined category having the records of the corresponding sub-categories. This technique results in Equation (4a) as follows: ##EQU9## where: p(x.sub.1, . . . , x.sub.m.vertline.G) is the conditional probability that a record has attribute values x.sub.1, . . . , x.sub.m given that it is a record from the combined category G. This probability is given by the following equation: ##EQU10## where: the denominator is the sum of the probabilities that any record is in each category G and the numerator is the sum for each category j in G of the probability that the record with attribute values x.sub.1, . . . , x.sub.m is in category j multiplied by the probability that a record in the collection is in category j. Equation (4a), however, cannot be factored in the same way as Equation (2b). Hence, determining the distance between combined categories G.sub.1 and G.sub.2 is computationally expensive because a summation over all possible combinations of attribute values is needed. For example, if there are 10 attributes with approximately 5 possible attribute values each, then there are approximately 10.sup.7 possible combinations of attribute values. Therefore, as one technique, the CV system approximates the distance using a Monte Carlo method such as simple sampling from G.sub.1 and G.sub.2 where S.sub.1, . . . , S.sub.r denote the samples from G.sub.1, and where t.sub.1, . . . , t.sub.s denote the samples from G.sub.2 (each s.sub.i and t.sub.i correspond to the observations x.sub.1, . . . , X.sub.n for all attributes). See, e.g., Shachter et al, "Simulation Approaches to General Probabilistic Inference in Belief Networks", Uncertainty in Artificial Intelligence, 1990, Vol. 5, pp. 221-231--which is incorporated by reference herein. The CV system approximates the distance between two combined categories by taking the sample data sets and applying them to the following: ##EQU11## where: p(s.sub.i.vertline.G.sub.j) and p(t.sub.i.vertline.G.sub.j) are computed using Equation (4b). The number of samples from G.sub.1 and G.sub.2 is taken in proportion to p(G.sub.1) and p(G.sub.2), where p(G.sub.j) is the probability that a record is in the set of categories defined by G.sub.j. This Monte Carlo method can be used to calculate the distance between both base and combined categories when Equation (2b), without an independence assumption, is used to determine distance. Another technique for calculating distance is to assume that the individual attributes are conditionally independent given G.sub.1, G.sub.2 and the set of clusters not in a union of G.sub.1 and G.sub.2, yielding Equation (5b) as follows: ##EQU12## As discussed above, attribute-value discrimination refers to how well the value of an attribute distinguishes the records of one category from the records of another category. One technique for calculating attribute-value discrimination is given by Equation (6a) as follows: ##EQU13## where: the probability that a record with a value of x.sub.i for attributes in combined category G.sub.1 is given by the following equation: ##EQU14## Attribute-value discrimination scores can be positive, negative or zero. If score discrim(x.sub.i.vertline.G.sub.1,G.sub.2) is positive, then an observation of the attribute value x.sub.i makes G.sub.1 more likely than G.sub.2. If the score discrim(x.sub.i.vertline.G.sub.1, G.sub.2) is negative, then the observation of the attribute-value x.sub.i makes G.sub.1 less likely than G.sub.2. If the score discrim(x.sub.i.vertline.G.sub.1, G.sub.2) is zero, then the observation of the attribute-value x.sub.i leaves the relative probabilities of G.sub.1 and G.sub.2 the same. The last case almost never occurs. There are several possibilities for displaying the attribute values and their corresponding discrimination scores. For example, in one instance, all attribute values are displayed such that: (1) the attribute values with positive and negative scores appear in separate areas of the screen, and (2) the attribute values with the largest scores (in absolute value) appear higher in the list. Alternatively, the discrimination scores for all attribute values except distinguished values (e.g., x.sub.i =0) are displayed. Also, non-binary attributes may be binarized into attributes that have only values zero and non-zero before being displayed. The homogeneity of a category indicates how similar the records of the category are to one another. The homogeneity is given by Equation (7) as follows: ##EQU15## where: G represents a category or a combined category and where p(G.vertline.x.sub.1, . . . , x.sub.m) is the probability that category G contains the record with attribute values x.sub.1, . . . , x.sub.m (obtainable from Bayes rule). FIG. 8 depicts, in high level form, implementational components of the first embodiment of the inventive CV system. As shown, the CV system executes on computer system 800 which includes a central processing unit, memory, and input/output devices. The CV system includes collection storage component 801, categorizer component 802, category storage component 803, user interface component 804 and analysis component 805. The collection storage component contains the attribute value for each attribute of each record in the collection. The categorizer component inputs the records of the collection storage component and identifies the various categories and stores the identification of the categories in the category storage component. The user interface component inputs data from the collection storage component and the category storage component and generates the various category graphs which are displayed on display 806. The user interface component invokes the analysis component to process the category storage information. The layout of the nodes can be determined by a variety of standard techniques for rendering graphs, including planar layouts, or any other scheme for minimizing edge crossings at display time. FIG. 9 depicts a flow diagram of routine 900, executed by computer system 800 shown in FIG. 8, for calculating the similarity of base categories. This routine, implemented through looping, selects each possible pair of base categories and calculates the similarity in accordance with Equation (2c) or Equation (2b) without the independence assumption. Clearly, many other distances can be used for calculating the similarity of categories in lieu of that specified in either of these two equations. For example, one could use an average hamming distance between records in each category. Specifically, through execution of step 901 shown in FIG. 9, routine 900 selects a first category h.sub.1. In step 902, if all the categories have already been selected as the first category, routine 900 terminates, else the routine continues at step 903. In step 903, this routine selects a second category h.sub.2 for which the similarity between the first and second categories has not yet been calculated. In step 904, if all such categories have already been selected, then routine 900 loops back to step 901 to select another first category, else the routine continues at step 905. In step 905, this routine calculates the similarity between the selected first and second categories and loops to step 903 to select another second category, and so forth. FIG. 10 depicts a flow diagram of routine 1000, executed by computer system 800 (shown in FIG. 8) for displaying a similarity graph. In particular, routine 1000 (shown in FIG. 10) displays a node for each base category and then displays an arc between those nodes representing categories whose similarity is above the similarity threshold. Specifically, through steps 1001-1003, routine 1000, using looping, displays nodes for the categories. In step 1001, the routine selects a category that has not yet been selected. In step 1002, if all the categories have already been selected, then routine 1000 continues at step 1004, else this routine continues at step 1003. In step 1003, routine 1000 displays a node representing the selected category and loops to step 1001 to select the next category. In steps 1004-1007, this routine loops displaying the arcs. In step 1004, the routine selects a pair of categories with a similarity above the similarity threshold. In step 1005, if all such pairs of categories have already been selected, then routine 1000 terminates, else this routine continues at step 1006. In step 1006, routine 1000 determines the thickness of the arc to be displayed between the selected pair of categories. In step 1007, the routine displays an arc of the determined thickness between the nodes representing the selected categories and loops to step 1004 to select another pair of categories. FIG. 11 depicts a flow diagram of routine 1100, executed by computer system 800 (see FIG. 8), for generating a hierarchical map. As shown in FIG. 11, routine 1100 starts with the base categories and successively combines categories that are most similar. In step 1101, this routine initializes a set of categories to contain each base category. In step 1102, if the set contains only one category, then the hierarchical map is complete and routine 1100 simply terminates, else this routine continues at step 1103. In step 1103, this routine selects the next pair of categories in the set that are most similar. Initially, the similarities of the base categories are calculated in accordance with routine 900 shown in FIG. 9. Through step 1104 (see FIG. 11), routine 1100 removes the selected pair of categories from the set. In step 1105, routine 1100 adds a combined category formed by the selected pair of categories to the set. In step 1106, routine 1100 calculates the similarity between the combined category and every other category in the set according to Equation (5) and loops back to step 1102 to determine whether the set contains only one category. FIG. 12 depicts a flow diagram of routine 1200, executed by computer system 800 (see FIG. 8), which displays a hierarchical map. Specifically, as shown in FIG. 12, in step 1201, routine 1200 selects a combined category starting with the last combined category that was generated. In step 1202, if all the combined categories have already been selected, then routine 1200 terminates, else routine 1200 continues at step 1203. In step 1203, this routine displays a node representing the selected combined category. In step 1204, routine 1200 displays an arc between the displayed node and its parent node. In step 1205, this routine displays a node representing any base sub-category of the combined category along with connecting arcs. Routine 1200 then loops back to step 1201 to select the next combined category, and so forth. FIG. 13 depicts, at a very high level, a block diagram of networked system 1300, that implements a second embodiment of the present invention, to provide clustering, cluster summarization, segment scoring, segment comparison and interactive hierarchical display of segments of cases that illustratively occur in an Internet-based electronic commerce environment. As shown, an Internet user stationed at client PC 1305 communicates through Internet 1320, via network connections 1315 and 1325, with server computer 1400 at a remote web site. This server implements, through Commerce Server system 1330, electronic commerce. Commerce Server system 1330 provides various functions that collectively implement infrastructure necessary to provide a comprehensive scalable, robust electronic business-to-business or business-to-consumer commerce web site; namely, user profiling, product cataloguing and content management, transaction processing, targeted marketing and merchandizing functionality, and analysis of consumer buying activities. These functions are provided, within system 1330, through web server 1340, transaction processor 1345, store 1350, which contains database 1360, and segment viewer 1500. Web server 1340 directly interacts, via Internet 1320 and network connections 1315 and 1325, with web browser 1307 situated within client PC 1307. Server 1340, as instructed by web browser 1307, downloads appropriate HTML web pages, stored in illustratively store 1350 (typically hard disk storage) and as symbolized by line 1343, to the browser for local display to the user situated at the client PC. This server also obtains responding cgi (common gateway interchange) messages sent by the browser and containing user-provided information of one sort or another in response to any of the displayed pages. Web server 1340 also specifies the pages accessed by the user to transaction processor 1345 and provides the transaction processor with the cgi responding messages it receives from the client PC. The transaction processor appropriately processes each transaction initiated by the user. In addition, the transaction processor updates database 1360 (also known as a "data warehouse") situate within store 1350 to reflect each user that visited the site served by server 1340, which may include not only those that completed a transaction, including storing the transaction details, but also those that did not, as well as with any user information (such as age, gender, income, preferences, etc.) entered by that user in response to a web page provided by server 1340. For each such user, database 1360 contains dataset 100 that contains a record for each such user along with predefined attributes (illustratively numbered 1 through j) for that user, and the class (category or cluster) to which that record is categorized. As noted, each such record together with all its attributes is commonly referred to as a "case". In addition, database 1360 also contains cluster data 1355 which specifies, e.g., clusters, segment and segment hierarchies. In accordance with our invention, segment viewer 1500, which operates on case and cluster data stored within database 1360, automatically generates appropriate clusters of cases and associated segments; and in response to user commands provided over line 1367 from a user, such as a business manager or data analyst that accesses commerce server 1330, compares user selected segments, and generates, on line 1363, a graphical display, based on calculated scored similarity values, of segment hierarchy. Segments are clusters of cases that exhibit similar behavior, such as users on a given site, and have similar properties, such as age or gender. A segment consists of a summary of the database records (cases) that belong to it. The summary, for which a mathematical basis is described hereinbelow, is derived from properties in database 1360. Segment groups are collections of similar segments or other segment groups. Furthermore, in accordance with our inventive teachings, similar segment groups can be merged together to form higher-level segment groups, with this operation iteratively continuing until a single, high-level segment group is formed representing all the cases in a dataset (an entire population). The segment groups form a hierarchy from which a user, such as a business manager or data analyst, can analyze trends and discover correlations within the case data at different levels of the segment hierarchy. Segment viewer 1500 graphically presents segments in hierarchical order, with a top-level segment group summarizing the entire population and lower-level groups and segments summarizing smaller and smaller subsets of the population. A percentage of the entire population contained within any given segment is also displayed in parentheses after a segment name. Viewer 1500 also permits, through the commands received over line 1367, the segment hierarchy to be expanded or contracted to facilitate understanding the depicted relationships among the displayed clusters. Further, viewer 1500 also scores the displayed segments, based on similarity measures, and ranks and displays those segments based on normalized scores. FIG. 18 depicts illustrative graphical display 1800 provided by segment viewer 1500 for exemplary case data, here illustratively being "Nielsen" television show ratings for a collection of previously aired television shows rather than e-commerce data (though e-commerce data would be very similarly displayed). Display 1800 shows segment hierarchy 1810 in a left portion of the display. A user, such as a business manager or data analyst, by clicking on a down arrow displayed within hierarchy 1810 can expand a segment group to expose its constituent segments, as shown. Each segment and group are listed along with their corresponding percentages of an entire population. In that regard, segment 5 represents 10% of the entire population, segment group 6 represents 27% of the entire population, and so forth. As depicted, segment group 6 also contains segments 3 and 4. In the absence of specifically naming any segment or group, segment viewer 1500 assigns generic names, such as segment 1, segment 2 and so forth, to the various segments. If a user selects a particular segment, here segment 4, in the hierarchy--that selection being signified by a black background, segment viewer will display summary 1820 of the selected segment in an upper right portion of the display. Summary 1820 contains a table having various columns of displayed data. Property (attribute) column 1825 which lists various attributes 1827 for the cases in the segment; value column 1830 provides the value of each of those attributes. The property/value pairs that are displayed are those that best summarize, in ranked order, the cases in the selected segment or segment group, here segment 4. Here, for the cases in segment 4, the property/value pair indicative of users who watched (attribute value=1; this value would equal zero for those that did not watch) the television show "MAD ABOUT YOU SPECIAL" best characterizes segment 4, followed by other property/value pairs accordingly. Score column 1840 provides a bar, such as bar 1840.sub.1, the length of which is a relative (normalized) indicator of just how well the property/value pair summarizes the cases in the selected segment or segment group. the longest length bar specifies the corresponding property/value that best summarizes the category. As illustratively shown in summary 1820, the top two entries appear to summarize, on an approximately equal basis, the cases that form segment 4. Through our present invention, a user of segment viewer 1500 can compare two segments or segment groups. In the context of electronic commerce, illustratively, one segment may correspond to those users who frequently visited a site implemented by Commerce Server 1330 (see FIG. 13), while another group may those user who infrequently visit that same site. With a given segment, here being segment 4, being selected by a user of segment viewer 1500, that user, through selection through use of pull down menu 1850, can select any other segment (or segment group) in the hierarchy to compare against the previously selected category. In the context of exemplary display 1800 shown in FIG. 18, the user, having selected segment 4 for display, has then selected segment group 8 to compare, as a comparison segment or segment group, against segment 4. Here, segment group 8 comprises one segment (segment 0) which contains another cluster of television viewers, though characterized by having, in some respects, preferred television programs in a slightly different order than that of segment 4. Once a segment is chosen for comparison that segment is shown in comparison area 1860. Area 1860 is also formed of a table which in column 1865 lists the attributes (properties) of that segment and in column 1870 the values of those attributes. These property/value pairs, as with summary 1820, are those that best summarize the selected segment or segment group for comparison, here segment group 8. Columns 1880 and 1890 provide visual, ranked, normalized results of that comparison through the use of displayed bar indicators. Specifically, the bars in column 1880, of which bar 1880.sub.1 is illustrative, indicate those attributes that which tend to favor the selected segment or segment group, with the length of each such bar indicating a relative degree to which a corresponding property/value pair, is likely to be seen more in the selected segment or segment group, here segment 4, than in the comparison segment or segment group, here segment group 8. Of the attributes shown, the property/value pair (user who watched the "JOHN LARROQUETTE SHOW") having the longest bar in column 1880, i.e., bar 1880.sub.1, most favors the selected segment or segment group, i.e., segment 4. Correlatively, the bars in column 1890, of which bar 1890.sub.1 is illustrative, indicate those attributes that which tend to favor the comparison segment or segment group, with the length of each such bar indicating a relative degree to which a corresponding property/value pair, is likely to be seen more in the comparison segment or segment group, here segment 8, than in the selected segment or segment group, here segment 4. Of the attributes shown, the property/value pair (users that specify they are part of a family unit of 5 people "related to the head of household") having the longest bar in column 1890, i.e., bar 1890.sub.1, most favors the comparison segment or segment group, i.e., segment group 8. Since only the cluster visualization aspect, i.e., the system components that form segment viewer 1500 and produce display 1800, is germane to the present invention, we will omit any further discussion of any of the other functionality provided by Commerce Server system 1330. FIG. 14 depicts a block diagram of server computer 1400 that forms a portion of networked system 1300 shown in FIG. 13. As shown in FIG. 14, server computer 1400, at a high level, comprises input interfaces (I/F) 1410, processor 1420, communications interfaces 1430, memory 1450 and output interfaces 1440, all conventionally interconnected by bus 1460. Memory 1450, which generally includes different modalities, such as illustratively: random access memory (RAM) 1452 for temporary data and instruction store; diskette drive(s) 1454 for exchanging information, as per user command, with floppy diskettes; and non-volatile mass store 1456 that is implemented through a hard disk(s), typically magnetic in nature. Mass store 1456 also stores executable instructions and associated data for server operating system (O/S) 1457 and application programs 1458. Programs 1458 include Commerce Server system 1330. O/S 1457 may be implemented by a conventional server operating system, such as the WINDOWS 2000 Server operating system commercially available from Microsoft Corporation of Redmond, Wash. ("WINDOWS 2000" is a trademark of Microsoft Corporation). Given that, we will not discuss any components of O/S 1457 as they are all irrelevant. Suffice it to say, that Commerce Server system 1330, being one of application programs 1458, executes under control of the O/S. Incoming information can arise from two illustrative external sources: network supplied information, from the Internet (and/or other networked facility) through network connection 1325 to communications interfaces 1430, or from a dedicated input source, via path(es) 1405, to input interfaces 1410. Dedicated input can originate from a wide variety of data sources, none of which is particularly relevant here. Input interfaces 1410 contain appropriate circuitry to provide necessary and corresponding electrical connections required to physically connect and interface each differing dedicated source of input information to server computer 1400. Under control of the operating system, application programs 1458 exchange commands and data with the external sources, such as web browser 1305 in client PC 1307 (see FIG. 13), via network connection 1325 or path(es) 1405, to transmit and receive information typically requested during program execution at the server. In addition, server computer 1400 communicates, via communication interfaces 1430 and communications link 1435, which may constitute, e.g., a link to a local area network, with transaction processor 1345 (see FIG. 13). Furthermore, input interfaces 1410 also electrically connect and interface user input device 1490, such as a keyboard and mouse, to server computer 1400. Display 1470, such as a conventional color monitor, and printer 1480, such as a conventional laser printer, are connected, via leads 1463 and 1467, respectively, to output interfaces 1440. The output interfaces provide requisite circuitry to electrically connect and interface the display and printer to the computer system. Through use of printer 1480, a user, e.g., data analyst or business manager, who can access the server computer can generate local hardcopy reports. Alternatively, this printer can be situated on, e.g., a local area network (not shown) to which server computer 1400 is also connected, via communication interfaces 1430. Since the specific hardware components of server computer 1400 as well as all aspects of the software stored within memory 1456, apart from the modules that implement the present invention, are conventional and well-known, they will not be discussed in any further detail. Generally speaking, client PC 1305 has an architecture that is similar, at the high level depicted in FIG. 14, to that of server computer 1400. With this in mind, we will now turn to discussing the components of segment viewer 1500 and then provide the mathematical basis which underlies the hierarchical tree construction, segment summary, comparison and scoring operations performed by the segment viewer. FIG. 15 depicts a block diagram of segment viewer 1500. As shown, the segment viewer contains clustering process 1510, cluster hierarchy generation process 1520 which contains inter-segment distance determination process 1525 and segment scoring process 1530; segment comparison process 1540 and graphics interface 1550; and operates in conjunction with data stored within database 1360 residing in store 1350. Specifically, transaction processor 1345 writes event data into database 1360. This data, in conjunction with its attributes, forms case data 100. As noted, data for each event together with its attributes forms a separate record (case) within the database, and specifically within case data 100. Clustering process 1510 automatically, and using a conventional clustering process, such as "EM" clustering, reads, as symbolized by lines 1503, data for the cases, in a dataset (population or collection) stored within case data 100 and automatically determines applicable mutually exclusive categories for these cases and then categorizes (classifies) each of those cases into those categories. This process stores the category for each case within case data 100 and specifically within a field associated with each corresponding record. As each case is categorized, i.e., placed into a corresponding cluster, process 1510 also forms a segment for each ensuing cluster. Alternatively, process 1520, rather than clustering process 1510, may form a segment from a corresponding cluster. As previously noted, a segment is a cluster of cases (having one or more cases) that exhibit similar behavior and have similar properties, and consists of a summary of the case(s) that belong to it. Process 1510 then stores, as symbolized by line 1507, the cluster and segment data, as data 1555, within database 1360. Cluster hierarchy generation process 1520 determines inter-segment similarity, scores the similarity measures and implements hierarchical agglomerative clustering (HAC). In particular, similarity between each pair of segments is mathematically determined through inter-segment distances calculated by inter-segment distance determination process 1525; the mathematical details of which will be specifically addressed later. Segment hierarchies are then formed based on scored similarity measures. To do so, process 1520 first considers all segments to be located at a common lowest hierarchical level and then automatically and selectively merges the segments, based on their scored similarity measures, through hierarchical agglomerative clustering to form a segment hierarchy. In particular, the segment similarity measures determined through distance determination process 1525 are applied, as symbolized by line 1527, to segment scoring process 1530 which, in turn, scores each segment (or segment group), here too the specific mathematical details of the scoring will be discussed later. Thereafter, process 1520 then causes those segments that have the closest similarity measures to be merged together to form a next higher-level group. To do this, process 1520 instructs, as symbolized by line 1543, clustering process 1510 to re-cluster those segments into a single segment group and apply the results, as symbolized by line 1515 back to process 1520. Process 1520 then calculates, through distance determination process 1525, the similarity between this new segment group and all the remaining segments. This HAC operation iteratively continues until a single, high-level segment group, i.e., a root node, is formed that represents all the cases in the entire data population. HAC can be readily understood by defining as "horizon" (cluster set) and how HAC changes that horizon. Initially, all singleton clusters reside in a current horizon. After merging any two nodes in that horizon into a merged node, the merged node is added to the horizon and the two original, now merged, nodes are removed. Hence, the only pairs of nodes that are eligible for merging are those then remaining in the horizon. As symbolized by line 1535, segment scoring process 1530 writes the scores of all segments and segment groups within data 1555 situated within database 1360. Once this process is completed, segment and segment group information is provided to graphics interface 1550 which forms a graphical display, of the form illustratively given by display 1800 shown in FIG. 18, that visually depicts the segment hierarchy. Once the hierarchy has been established and displayed, a user of segment viewer 1500 can compare two segments or segment groups. To do so, the user selects a segment through appropriate interaction with the displayed graphical interface provided by process 1550. In response to user commands on line 1367 that specify such a selection, process 1550, as symbolized by line 1547, identifies both the selected segment or segment group and the comparison segment or segment group to segment comparison process 1540. As symbolized by line 1539, comparison process 1540 specifies the segments or segment groups to be compared to segment scoring process 1530. Process 1530, in turn, causes segment hierarchy generation process 1520 to provide data for these segments and segment groups, including summarized data, as symbolized by line 1529, to graphics interface process 1550 for display, within display 1800 as shown in FIG. 18, as selected segment/segment group 1820 (specifically paired attributes/values in columns 1825 and 1830, and normalized ranked scores in column 1840) and as comparison segment/segment group 1860 (specifically paired attributes/values in columns 1865 and 1870). In addition, process 1520, as shown in FIG. 15, provides normalized scores for those segments and/or segment groups, as symbolized by line 1537, back to segment comparison process 1540. With this scoring information, comparison process 1540 compares the two selected and comparison segments and/or segment groups against each other with the results of that comparison being passed, also symbolized by line 1547, to graphics interface 1550 for graphical display (as columns 1880 and 1890 in display 1800 shown in FIG. 18). Graphics interface 1550 provides appropriate data and instructions, as symbolized by line 1363, to O/S 1457 (see FIG. 14) to generate visualized display 1800 on a monitor (not shown). As noted, through graphics interface 1550, the user of segment viewer 1500 can selectively expand or contract the displayed hierarchy to gain a better appreciation of the inter-relationships among the individual segments and segment groups that occupy the hierarchy. Furthermore, as noted above, some clustering distinctions, which are the product of mathematical clustering techniques, may be rather fine-grained from a quantitative perspective but are essentially meaningless, from a qualitative standpoint; hence, yielding an excessive number of segments. As such, the invention, through HAC process 1520 automatically and dynamically changes the hierarchy by eliminating appropriate numbers of node(s) and inter-segment links to reduce the number of levels (depth) in the hierarchy. To appreciate this feature, consider FIG. 16 which depicts an illustrative, though graphical, example of hierarchical level reduction provided through the present invention. Assume for the moment that a data population has been categorized into segments 1610 formed of k individual segments: C.sub.1, C.sub.2, . . . , C.sub.k being represented by leaf nodes 1610.sub.1, 1610.sub.2, 1610.sub.3, . . . , 1610.sub.k, respectively. As a result of HAC, a four-level segment hierarchy represented by tree 1600 results. Further, suppose that the tree is deeper, i.e., has an excess number of levels, than desired. This could be caused by one or more segments situated at intermediate levels in the hierarchy that represent unnecessary or immaterial distinctions. For example, in Commerce Server 1330, displayed hierarchical segment trees are limited, to simplify understanding, to a depth of three levels. If a resulting tree produced through HAC contains more than three levels, certain levels need to be removed and the tree appropriately re-arranged. Such is the situation illustratively shown in FIG. 16. The segment group of each parent node in tree 1600 is formed, through HAC, as a result of the union of the segments or segment groups associated with the two nodes situated immediately below it. The latter two nodes are viewed as child nodes, the child nodes situated to the lower left and right of a parent node, such as nodes 1610.sub.1 and 1610.sub.2, respectively, for parent node 1620 are correspondingly referred to as left and right child nodes. Hence, as symbolized by inter-nodal links 1611 and 1613, segments C.sub.1 and C.sub.2 (associated with nodes 1610.sub.1, and 1610.sub.2, respectively) have been merged through HAC to form parent node 1620. Parent node 1630 has been illustratively formed through HAC by merging, as represented by inter-nodal links 1623 and 1615, segment group associated with parent node 1620 and segment C.sub.3 associated with node 1610.sub.3. Root node 1640 has been formed, at least in part, through the merger, as symbolized by inter-nodal link 1635, of the segment group associated with parent node 1630. In order to convert four-level tree 1600 to its proper size of three levels, node(s) at one level and associated inter-nodal links must be removed; hence, segment and segment groups associated with those nodes merged into parent nodes at a next higher level, with the hierarchy being re-arranged accordingly. To determine which nodes to remove, the distances between the segments associated with the child nodes (e.g., nodes 1610.sub.1 and 1610.sub.2 for segments C.sub.1 and C.sub.2, respectively) for the first level of parent nodes are first determined. Then, the score for the second level of parent nodes (e.g., node 1630) is similarly determined based on its child nodes. After scores for two parent levels are so determined, the parent nodes with maximum scores are deleted. Links are connected between the child nodes of each removed parent node and the remaining node situated above the deleted parent node. For example, as shown in FIG. 16, assume parent node 1630 is to be removed. In this case, inter-nodal links 1623 and 1615 from its child nodes 1620 and 1610.sub.3 are deleted from the hierarchy as is inter-nodal link 1635 from deleted noted 1630 to its parent (root) node 1640. Root node 1640 is then connected, via new inter-nodal link 1627, to parent node 1620. Child node 1610.sub.3 is then connected to its appropriate remaining parent node (not shown) by new inter-nodal link 1617. This process continues until the tree becomes the proper depth; though for the example shown in FIG. 16 this process occurs just once to reduce the tree by just one level. Once appropriate level(s) are eliminated in the tree, the similarity measures for all remaining nodes are updated through a weighted average, as given by Equation (18) below, of the updated distances associated with its child nodes. Having now described the implementational and associated display aspects of segment viewer 1500, we will now describe the specific mathematical basis which underlies the various operations performed by the segment viewer. The basis will be separately described for each of the basic operations provided by the segment viewer: segment tree construction, segment set summary and segment set comparison. We will then provide a mathematical basis for our inventive alternate discriminant-based scoring technique. First, assume each case has n attributes. A. Hierarchical Tree Construction Given a set of segments C.sub.1, C.sub.2, . . . , C.sub.k and desired tree depth t, a hierarchical tree is constructed on top of these k segments as follows: 1. Construct an initial tree of arbitrary depth on top of segments C.sub.1, . . . , C.sub.k via Hierarchical Agglomerative Clustering (HAC). a. Compute distance, d.sub.clust, between every pair of segments as follows: ##EQU16## where: d.sub.j (C.sub.i,j,C.sub.h,j) is the distance between the distributions modeling attribute j in clusters i and h, respectively. If attribute j is modeled as BinGaussian, BinMultinomial or Binomial distribution, then: d.sub.j (C.sub.i,j,C.sub.h,j)=KL(p(x.sub.j.noteq.NULL.vertline.C.sub.i), p(x.sub.j.noteq.NULL.vertline.C.sub.h))+KL(p(x.sub.j =NULL.vertline.C.sub.i),p(x.sub.j =NULL.vertline.C.sub.h)) (9) where: KL is computed through Equation (10) as follows (assuming p.sub.1 >p.sub.2): KL(p.sub.1,p.sub.2)=(p.sub.1 -p.sub.2) log(p.sub.1 /p.sub.2) (10) Alternatively, if attribute j is modeled as a Gaussian distribution, which can occur if the attribute is "age" of an Internet site user, then: d.sub.j (C.sub.i,j,C.sub.h,j)=KL(p(x.sub.j =NULL.vertline.C.sub.i),p(x.sub.j =NULL.vertline.C.sub.h))0.5(KL(p.sub.11,p.sub.12)+KL(p.sub.21,p.sub.22) (11) where: .mu..sup.i.sub.j, .sigma..sup.i.sub.j are mean and standard deviation of attribute j in cluster C.sub.i, respectively; and .mu..sup.h.sub.j, .sigma..sup.h.sub.j are mean and standard deviation of attribute j in cluster C.sub.h, respectively. Here, null represents no available data for an attribute for a given user, e.g., "no response" provided by that user to a question in a site that requests his(her) age. The values p.sub.1i, p.sub.1h, p.sub.2i, p.sub.2h are given by the following Equations (12-15): ##EQU17## Here, G(t;0,1) is a value of a normal Gaussian function (with mean=0, and standard deviation=1) at t. ##EQU18## p.sub.21 =p(x.sub.j.noteq.NULL.vertline.C.sub.h)(0.685) (14) ##EQU19## FIG. 17 illustratively depicts two of these probability density functions for attribute j for two different segments Ci and C.sub.h as corresponding functions C.sub.i,j and C.sub.h,j also represented as functions 1710 and 1720, respectively. The mean of each function (.mu..sub.i and .mu..sub.h, respectively) is shown along with a location one standard deviation, .sigma., on either side of each mean. The integrated areas under the Gaussian functions for each of the values p.sub.11, p.sub.12, p.sub.21 and p.sub.22 are respectively shown as areas 1713, 1717, 1723 and 1727 in FIG. 17. If attribute j is modeled by a Multinomial distribution, let s.sub.j be the number of possible states for attribute j: ##EQU20## b. Merge the nearest pair of segments to produce a parent node. c. Compute the distance from the parent node to the other nodes in the tree. Let left denote the left child of the parent node and right denote the right child of the parent node. The distance from the parent node to the cluster represented by node c in the tree is given by Equation (17) as follows: ##EQU21## Here, w(left) and w(right) are a number of data points represented by the left and right child nodes, respectively. d. Continue HAC until a root node representing the entire data population (dataset) is generated. If the resulting tree has depth.ltoreq.t, then stop. Otherwise, proceed to step 2 below. 2. Remove internal nodes of the tree so that resulting tree has depth.ltoreq.t and the leaf nodes correspond to segments. a. Let node i denote a leaf in the tree that has | ||||||
