Methods and apparatus for using attribute transition probability models for pre-fetching resources6154767Abstract Building resource (e.g., Internet content) and attribute transition probability models and using such models for pre-fetching resources, editing resource link topology, building resource link topology templates, and collaborative filtering. Claims What is claimed is: Description BACKGROUND OF THE INVENTION
______________________________________
A B C D
______________________________________
A 0.0 0.4 0.5 0.1
B 0.6 0.0 0.3 0.1
C 0.2 0.1 0.0 0.7
D 0.8 0.1 0.1 0.0
______________________________________
As shown above, if the user last requested resource B, then the probability that the user will next request resource A is 0.6, the probability that the user will next request resource C is 0.3, and the probability that the user will next request resource D is 0.1. Each of the rows in the matrix must sum to one. The values of the diagonal of the matrix are set to zero because the resulting models are used for prefetching and caching resources. Thus, even if usage logs indicate that users do repeatedly request the same resource, such a resource would already have been cached. Since, most Internet sites have much more than four (4) resources, in practice, the resource transition probability matrix will be much larger (e.g., on the order of 100 by 100). As discussed above, the elements of the matrix are determined by (i) counting the number of times users request a first resource to generate a first count, (ii) counting the number of times users request a second resource (immediately) after requesting the first resource to generate a second count, and (iii) dividing the second count by the first. Again, this model is fairly simple because it assumes that all users are the same. Again, in the refined methods, such as the process depicted in FIG. 35, to account for the diversity of users, a number of user types is specified. Each of these user types will have an associated resource transition probability matrix. Under this modeling framework, parameter estimation is much more challenging because an unobserved quantity, i.e., a cluster identifier, exists for each sequence of resource requests. The table below shows an example of data that may be observed from users traversing an Internet site.
______________________________________
SEQUENCE OF
CLUSTER USER/SESSION
RESOURCE TRANSITIONS
______________________________________
? 1 ABDBCEFA
? 2 DBCEFA
? 3 BDFECAE
? 4 FEACEBD
? 5 FAEDABCEAFEDC
. . .
. . .
. . .
______________________________________
As discussed above in step 3520 of FIG. 35, free parameters of a probabilistic model that might have generated the usage log data are estimated. These free parameters may be used to infer the cluster identifiers and the associated resource transition probability matrices. The following likelihood function is a mathematical expression for the probability that the actual usage data would be observed given the parameters of the function. ##EQU1## where i.ident.Origin resource index. j.ident.Destination resource index. k.ident.Observed processes index. l.ident.Cluster index. N.ident.The number of observed processes. i.sub.0 .ident.The initial state of the process. n.sub.ij .ident.The number of times a process transitioned from resource i to resource j. m.ident.The number of clusters. s.ident.The number of resources. p.sub.l .ident.A probability vector of length s which specifies an initial state distribution of cluster l. P.sub.l .ident.An s by s matrix of transition probabilities for cluster l. .alpha..ident.A probability vector of length m that contains the proportion of processes coming from any particular cluster. Basically, the term within the parentheses computes the probability that user k made the transitions that they did assuming that they are from cluster l. The term in the parentheses before the double product is called "the initial state distribution" and specifies the probability that user k started his(her) traversal through the Internet site from the resource from which he(she) started. The double product term is a product of all the probabilities of transitions that user k made. The (l)P.sub.ij term is element i,j in the resource transition probability matrix for cluster l. The exponent is an indicator of the cluster identifier and is 1 if user k is a member of cluster l and is 0 otherwise. Finally, the double product preceding the parentheses indicates that the above calculations are performed over all clusters and all users. The free parameters are p, P and .delta.. The refined methods of the present invention employ Bayesian inference and maximum likelihood inference approaches for estimating the free parameters. More specifically, regarding the Bayesian inference approach, applying Bayes theorem provides: ##EQU2## where P(A/B).ident.The probability of A given B. The probability of the assumed parameters given the usage data (P(assumed parameters.vertline.usage data)) is known as the "posterior". Finally, the probability of the usage data given the assumed parameters is known as the likelihood. Thus, the likelihood (P(usage data.vertline.assumed parameters) may be expressed as shown in equation (1). The probability of the assumed parameters (P(assumed parameters)) is a prior distribution which represents beliefs about the parameters before observing the data. In one implementation, non-informative (or "flat") priors are assumed to represent ambivalence toward the parameter values. Accordingly, a non-informative (or uninformative) Dirichlet hyperprior is used as a prior distribution function for parameters of the model. Then .delta. will be a distributed multinomial (1,.alpha.). A non-informative Dirichlet (1) hyperprior for the hyperparameter .alpha. corresponds to a uniform prior distribution over the m-dimensional simplex. Similarly, every row in every transition matrix will also have a non-informative Dirichlet prior distribution over the s-dimensional simplex. To reiterate, the prior distribution functions of the free parameters of the likelihood function are as follows: .delta..sup.(k) .apprxeq.Mult(1,.alpha.) .alpha..apprxeq.Dirichlet(1.sub.m) (l)p.apprxeq.Dirichlet(1.sub.s) (3) (l)P.sub.i,all j .apprxeq.Dirichlet(1.sub.s), where (l)P.sub.i,all j is the i.sup.th row of (l)P The joint distribution is proportional to the likelihood multiplied by the prior densities and therefore may be represented as: ##EQU3## Assuming that the first order Markov assumption is correct, this joint distribution captures all of the information about the process clustering that is contained in the data. However, this distribution is rather complex and all of the usual distribution summary values (mean, variance, etc.) are extremely difficult to extract. Using a Markov Chain Monte Carlo ("MCMC") approach to sample from this distribution avoids this problem with a degree of computational cost. Markov Chain Monte Carlo algorithms provide a method for drawing from complicated distribution functions. The form of the posterior distribution lends itself to a class of MCMC algorithms known as Gibbs samplers. Implementations of a Gibbs sampler partition the parameter space into "blocks" or "sets" of parameters where drawing from the distribution of the block given all of the other blocks is simple. Iterations of the Gibbs sampler in turn draw new values for each block of parameters from these block conditional distributions. The parameter space may be partitioned as follows. The rows of every transition matrix, the vector .alpha., and each .delta. will be block updated. The block conditionals are found from the above posterior. ##EQU4## The row updates are drawn from a distribution where the expected value is approximately the maximum likelihood estimator (or "MLE") for the row if the cluster assignments, .delta., were known. The vector .alpha. is drawn from a distribution where the expected value is approximately the mixture proportions if, again, the cluster assignments were known. Lastly, the cluster assignments are drawn such that probability of each cluster is proportional to the mixture probability times the likelihood of the observation coming from the associated transition matrix. The implementation of this algorithm initially fills in all of the transition matrices with s.sup.-1 and the vector .alpha. with m.sup.-1 and randomly assigns the .delta. to one of the m clusters. The algorithm proceeds by first updating all of the rows of P, then updates .alpha., and lastly updates .delta.. This constitutes one iteration. After a large number of iterations (approximately 10,000, but this depends on the data and dimension of the problem), the sequence of parameter values will approximate the joint posterior distribution and hence, arbitrary functionals of the posterior distribution may be computed. Regarding the maximum likelihood inference approach for estimating the free parameters, an Expectation Maximization (or "EM") algorithm may be used. EM algorithms iterate between obtaining maximum likelihood estimates for the unknown parameters given the complete data and computing the expected value of the missing data given the parameters. In this implementation, the algorithm iterates between computing maximum likelihood estimates for the transition matrices and reevaluating the cluster assignments. In the Gibbs sampling algorithm discussed above, the .delta..sup.(k) 's were coerced to put probability one (1) on one cluster and zero (0) on all of the others. Then assessment of Pr(l.fwdarw.k) (=.alpha..sub.L) comes directly from the distribution of the Monte Carlo sample of .delta..sup.(k). As opposed to the Gibbs sampling algorithm, the .delta.'s now represent a probability vector where .delta..sub.l indicates the probability that the process was generated from cluster l. Despite this difference, similarities between the Gibbs sampling algorithm and the EM algorithm will be evident. The likelihood function has to be modified to adapt to this alternate interpretation of .delta.. This version of the likelihood has the same meaning as that discussed above but its mathematical form would have been much more difficult to handle in the Bayesian framework. ##EQU5## To initialize the algorithm, the processes are randomly assigned to the m clusters. That is, the .delta.'s are randomly selected to represent assignment to one of the m clusters and .alpha. is the mean of the .delta.'s. With this complete data, maximum likelihood estimators (or "MLEs") for the initial state distribution and the transitions matrices may be determined as follows: ##EQU6## This equation is similar to equation 5 set forth above. Conditioning on the values of p and P, the cluster probabilities can be computed similar to equation 7 set forth above. ##EQU7## Each vector .delta..sup.(k) is then normalized to sum to unity. Lastly, the mixture probability vector .alpha. is updated as the mean of the .delta.'s. The EM algorithm is known to converge slowly in some situations. An alternative algorithm is proposed here. The algorithm is to force the .delta.'s to assign probability one to one of the clusters and zero to the remaining. Hartigan's k-means algorithm is an example of this type of constrained EM algorithm for multivariate normal data. To make this modification, in lieu of equation 12 set forth above, .delta..sup.(k) is assigned to the cluster from which has the highest probability of generating process k. The algorithm converges when an entire iteration is completed with no processes being reassigned. A major drawback to the EM approach is the lack of standard errors. Gibbs sampling produces the estimates of the standard deviation of the marginal posterior density for any parameter of interest. EM, on the other hand, is solely a maximization method. Variants of the EM algorithm like the SEM algorithm (Supplemented EM) rely on normal approximations to the sampling distribution of the parameter estimates. In practice, these estimates are often quite reasonable. For the case at hand, however, the observed information matrix can be quite difficult to calculate. The "label switching problem" does not exist for EM algorithms. The constrained EM algorithm lacks accuracy and detail but has the advantage of speed. The Gibbs sampler on the other hand can be used to compute arbitrary functionals of the distribution quite easily but takes several orders of magnitude longer to iterate to reasonable accuracy. Thus, a hybrid algorithm may be useful to borrow from the strengths and diminish the effect of the weaknesses of both algorithms. In a further implementation used for applied process cluster problems, the constrained EM algorithm is iterated to convergence. The cluster assignments from the constrained EM algorithm provide initial assignments for the Gibbs sampler. Then, with a relatively short burn-in period (i.e., less iterations needed), the Gibbs algorithm runs until it obtains decent estimates for the posterior means and variance of the parameters. Of course, other clustering methods and likelihood functions may be used. Having described examples of resource transition probability model building processes, the use of such processes in a networked client-server environment is now described below. FIG. 5 is a high level block diagram of a network environment 500 in which the server-side resource transition probability model building system 100 of the present invention may operate. The environment 500 includes, inter alia, a client (e.g., a personal computer) 502 which may communicate data via a network (e.g., the Internet) 506, and a server (e.g., a personal computer) 504 which may also communicate data via the network 506. The client 502 may include processor(s) 522, storage device(s) 524, and input/output interface(s) 526, which share a system bus 528. The storage device(s) 524 stores program instructions for implementing at least a portion of the process of the present invention. At least a portion of the process of the present invention may be effected when the processor(s) 522 executes the stored (and/or downloaded) program instructions. The input/output interface(s) 526 permits communication with the network 506, for example via an ISDN (or Integrated Services Digital Network) line termination device. The input/output interface(s) 526 further functions to condition inputs provided via an input device(s) 520 (e.g., a keyboard, mouse, and/or other man-machine interface) and to condition outputs provided to an output device(s) 521 (e.g., a video display, audio speakers, etc.). Similarly, the server (e.g., a personal computer) 504 may include a processors) 532, storage device(s) 534, and input/output interface(s) 536, which share a system bus 538. The storage device(s) 534 stores program instructions for implementing at least a portion of the process of the present invention. At least a portion of the process of the present invention may be effected when the processor(s) 532 executes the stored (and/or downloaded) program instructions. The input/output interfaces) 536 permits communication with the network 506, for example via a modem bank. The input/output interface(s) 536 (e.g., a Small Computer System Interface (or "SCSI") protocol unit) may also permit records, such as usage log records, and data, such as resource data, to be written to and read from a database stored on a storage device (e.g., a magnetic or optical disk) 540. The network 506 may include, inter alia, bridges, routers, switching systems, multiplexers, etc., to forward data to an addressed (e.g., in accordance with the TCP/IP (Transmission Control Protocol/Internet Protocol) protocol) destination. FIG. 6 is a high level process diagram of a networked client 602 and server 604 in which the client 602 may browse resources 634 of the server 604. The client 602 may include a resource browser process (or more generally, a resource requester) 620. When a resource is requested, the resource browser process 620 first checks a local resource cache 624 to determine if the resource is available at the client 602. If the requested resource is available, it is retrieved and rendered. If, on the other hand, the requested resource is not available locally at the client 602, the resource browser process 620 will submit a request for the resource, via an input/output interface process 610, possibly a proxy 630 such as America Online or a local Internet service provider, a networking process 640, and an input/output interface process 650 of a server 604, to a resource retrieval process (or more generally, a resource retriever) 660 of the server 604. The resource retrieval process 660 may first check a high speed memory resource cache 635 to determine whether the requested resource is available. If the requested resource is not available at the resource cache 635, the resource retrieval process 660 may request, via the input/output interface process 650 (e.g., a SCSI card) of the server 604, the resource from a larger, slower speed, storage device 634. In either case, the resource retrieval process 660 returns the requested resource, for example, via the input/output interface process 650 of the server 604, the networking process 640, possibly a proxy 630, and the input/output interface process 610 of the client 602, to the resource browser process 620 of the client 602. These processes may be used in known systems, such as those that manage client resource caches 624 in accordance with a least recently used ("LRU") replacement algorithm. FIG. 7a is a process diagram of a system 700 which may be used to effect exemplary server-side resource transition probability model building and pre-fetching processes of the present invention. Basically, the system 700 includes a client 702, a networking process 640, a resource server 704, and an analysis server 750. Although shown separately, the processes of the resource server 704 and the analysis server 750 may be carried out by a single server. Basically, the client 702 functions to (a) accept user selections for resources, (b) request resources from its resource cache or a server, (c) download and render resources, (d) download and store lists of resource transition probabilities, (e) manage cached resources, and (f) pre-fetch and cache resources based on a list of resource transition probabilities. Basically, the resource server 704 functions to (a) service requests for resources, whether the requests are in response to a user selection or pre-fetch, and (b) logging usage when appropriate. Finally, the analysis server 750 basically functions to generate resource transition probability models based on usage logs. The client 702 includes a storage area 732 for storing a small (resource transition probability) model list and a storage area 624' for caching resources. The client also includes an input/output interface process 610' and a browser process (or more generally, a resource requester) 620'. The input/output interface process 610' may include, for example, video driver protocols, audio driver protocols, networking layer protocols, and input device interfaces. The browser process 620' may include a user interface process (or more generally, a user interface) 722, a navigation process (or more generally, a navigator) 724, a resource rendering process (or more generally, a resource renderer) 726, a cache management process (or more generally, a cache manager) 728, and a resource pre-fetch process (or more generally, a resource pre-fetcher) 730. As shown in FIG. 7a, the user interface process 722 can interact and exchange data with the input/output interface process 610' and the navigation process 724. The navigation process 724 may further interact and exchange data with the input/output interface process 610', the cache management process 728, and the pre-fetch process 730. The resource rendering process 726 may interact and exchange data with the input/output interface process 610' and may receive data from the cache management process 728. The cache management process 728 may further interact and exchange data with the pre-fetch process 730 and the resource cache 624'. The pre-fetch process 730 may further interact and exchange data with the input/output interface process 610' and the small model list 732. FIG. 7b is a process diagram of an alternative client 702'. The alternative client 702' is similar to the client 702 of FIG. 7a, but differs in that a process management process (or more generally, a process manager) 790 is added. The process management process 790 provides a centralized control of the input/output interface process 610', the user interface process (or more generally, a user interface) 722, the navigation process (or more generally, a navigator) 724, the resource rendering process (or more generally, a resource renderer) 726, the cache management process (or more generally, a cache manager) 728, and the pre-fetch process (or more generally, a pre-fetcher) 730. Further, the process management process 790 may facilitate inter-process communications. The resource server 704, shown in FIG. 7a, includes a storage area 635' for storing cached resources, a storage area 734 for storing resources, a storage area 746 for storing usage log information, an input/output interface process 650', a resource retrieval process (or more generally, a resource retriever) 660', a usage log building process (or more generally, a usage recorder) 740, a parameter selection process (or more generally, a parameter selector) 742, and a user interface process (or more generally, a user interface) 744. The input/output interface process 650' of the resource server may interact and exchange data with a networking process 640 of the network 506, an input/output interface process 752 of the analysis server 750, resource storage area 734, the resource retrieval process 660', and the usage log storage area 746. The resource retrieval process 660' may further interact and exchange data with the usage log building process 740 and the resource cache storage area 635'. The usage log building process 740 may further interact with and provide data to the usage log storage area 746. The user interface process 744 may interact with and provide data to the parameter selection process 742, which may interact with and provide data to the usage log building process 740. The analysis server 750 includes an input/output interface process 752, a filter and merge process (or more generally, a filter/merger) 754 (optional), a resource transition probability model generation process (or more generally, a resource transition probability model generator) 756, and a storage area for storing resource transition probability models 758. .sctn.2.3 OPERATION OF SERVER-SIDE MODEL BUILDING SYSTEM The operation of the exemplary server-side resource transition probability model building system 100 will now be described with reference to FIGS. 8 through 12. FIG. 8 is a flow diagram of processing 800, carried out by a client 702 in response to a user resource selection (or "resource request"), in the exemplary server-side model building process of the present invention. First, as shown in step 802, the resource is requested from the resource cache 624' of the client 702. Referring back to FIG. 7a, this step may be carried out by navigation process 724 and cache management process 728. If, as shown in steps 804, 806 and 808 in FIG. 8, the requested resource is available from the resource cache 624' (i.e., a "hit"), the resource is rendered (may be carried out by resource rendering process 726) and the hit is reported to the resource transition model builder (may be carried out by navigation process 724). In a modified embodiment, the server will send a new table to the client (not previously sent with a pre-fetched resource) in response to the cache hit. If, on the other hand, the requested resource is not available from the resource cache 624', the client 702 requests the resource from the server 704 as shown in step 812. This step may be carried out by the cache management process 728, the navigation process 724, and the input/output interface process 610'. Skipping ahead to FIG. 9a, which is a flow diagram of processing 900, carried out by the resource server 704, in response to the client resource request, the resource server 704 first requests the resource from its resource cache 635' as shown in step 902. Referring back to FIG. 7a, this step may be carried out by the resource retrieval process 660'. If, as shown in steps 904 and 908 in FIG. 9a, the resource is not available from the resource cache 635', the resource is requested from the resource storage area 734. This step may be carried out by the resource retrieval process 660' also. Thereafter, as shown in step 906, the resource, whether obtained from the resource cache 635' or the resource storage area 734, is returned to the requesting client 702. Again, this step may be carried out by the resource retrieval process 660' and the input/output inter face process 650'. Before, after, or concurrently with steps 902, 904, 906, and 908, as shown in steps 910 and 912, a short list of resource transition probabilities is also returned to the requesting client 702. These steps may be carried out by the input/output interface process 752. Finally, as shown in step 914, if the requested resource was requested in response to a pre-fetch request, processing continues at return node 918. If, on the other hand, the requested resource was not requested in response to a pre-fetch request (e.g., if the request was in response to a user selection), the usage log 746 is updated as shown in steps 914 and 916. This step may be carried out by the usage log building process 740. The above described server processing 900 may be modified or refined as follows. First, if the request is a pre-fetch request, the server will only process such a request if it is sufficiently idle. That is, the resource server 704 will first serve explicit resource requests before serving pre-fetch requests for a resource that a user "might" want. Second, again, if the request is a pre-fetch request, the server might only send certain types of resources (e.g., non-image resources). Finally, if the client 702 submitting the pre-fetch resource request subsequently submits a resource request pursuant to a user selection processing, by resource server 704 of the pre-fetch resource request may be aborted. Recall from FIG. 8 that if a requested resource is available from the client's resource cache 635', such a hit (if the resource was pre-fetched) is reported to the resource server 704. As shown in FIG. 9b, the resource server processes such a hit report by updating the usage log 746 as shown in steps 950 and 952. Processing continues from return node 954. Returning now to FIG. 8, as shown in step 814, the small resource transition probability model list 732 of the client 702 is updated based on the returned list. This step may be carried out by the pre-fetch process 730. Before, after or concurrently with step 814, the returned resource is rendered by the client 702 as shown in step 816. This step may be carried out by the resource rendering process 726. FIG. 10 is a flow diagram of processing 1000, carried out by the analysis server 750, in an exemplary model building processes of the present invention. First, as shown in decision step 1002, it is determined whether it is time to update (or create a new or replace) a resource transition model. The data collection time period (or "sample period") is predetermined and will depend on the type of resources and the interrelationship between resources. For example, an Internet site having relatively static content, such as a site with resources related to movie reviews, may have a resource transition model which, once created, is updated weekly. On the other hand, an Internet site having relatively dynamic content, such as a site with resources related to daily news stories or even financial market information, may have a resource transition model with is replaced daily or hourly. Alternatively, the sample period may be defined by the filtering process discussed above with reference to FIG. 1. In any event, once it is determined that it is time to update, generate, or replace a resource transition model, as shown in step 1004, if necessary, usage logs are merged and filtered as discussed above with reference to FIG. 1. These steps may be carried out by filter and merge process 754. Next, as shown in step 1006 in FIG. 10, resource transition probability models are generated as discussed above with reference to FIGS. 2 through 4. This step may be carried out by the resource transition probability model generation process 756. Finally, the generated resource transition probability models are stored as shown in step 1008. Processing continues as shown in FIG. 10, at return node 1010. FIG. 11 is a high level messaging diagram of an exemplary server-side resource transition probability model building process carried out by the exemplary system 700. FIG. 12 is a more detailed messaging diagram of an exemplary server-side resource transition probability model building process of the exemplary system 700. FIG. 17 depicts an exemplary data structure for communicating a resource request, which may be used in the exemplary system 700 of FIG. 7a. FIG. 18 depicts an exemplary data structure for returning a resource, which may be used in the exemplary system 700 of FIG. 7a. Finally, FIG. 19 depicts an exemplary data structure for reporting a client resource cache hit (of a pre-fetched resource), which may be used in the exemplary system 700 of FIG. 7a. At a high level, FIG. 17 depicts an exemplary data structure 1700 for communicating a resource request from a client 702 to a resource server 704. As shown in FIG. 17, the resource request data structure 1700 may include a request type ID field 1710, a resource name field 1720, a resource location field 1730, a return (client) address field 1740, a selection and/or request time stamp field 1750, and an optional resource size field 1760. The request type ID field will include data to indicate whether the request is the result of a user selection or a pre-fetch determination. The resource name field 1720 and/or the resource location field 1730 serve to identify the requested resource. The resource name field 1720 may be a URL file name which includes directories and sub-directories at which the resource is stored. The resource location field 1730 may be the Internet address of the resource server 704 at which the resource is stored. The return address field 1740 includes information (e.g., an Internet address) of a client 702 making the request so that the resource server knows where to return the requested resource. The return address field 1740 may also be the Internet address and a node of a proxy 630 through which the client 702 access the Internet. The time stamp field 1750 includes time at which the user selection, or resource request was made. Alternatively, this information is not needed if the resource server time stamps resource requests when they are received or returned. (However, as will be discussed below, if the resource is requested pursuant to a "pre-fetch" request, this field is not needed or is not used.) Finally, the optional resource size field 1760 may be provided to express the size (e.g., in bytes) of the requested resource. Such information may be used when determining whether sufficient bandwidth is available to pre-fetch the resource and/or whether sufficient idle processing time is available to pre-render the resource. A field including user identification information (not shown), such as a cookie or a global unique identifier (or "GUID") for example, may also be included in the data structure 1700. At a high level, FIG. 18 depicts an exemplary data structure 1800 for communicating a resource or other data, such as a resource transition probability list, from a resource server 704 to a requesting client 702. As shown, the data structure 1800 includes a data type ID field 1810, a return (client) address field 1820, an optional resource size field 1830, and a payload section 1840. The data type ID field 1810 may be used to identify the type of data carried on the payload 1840. For example, the data may be a selected resource, a pre-fetch resource, or a resource transition probability list. The return address field 1820 includes address information (such as the Internet address of a client 702 or proxy 630) which permits the data to be forwarded to the appropriate entity. The optional resource size field 1830 includes information regarding the size (e.g., number of bytes) of the data carried in the payload 1840. If the payload includes a resource, it should also include the address of the resource. At a high level, FIG. 19 depicts an exemplary data structure 1900 for reporting a client resource cache hit (of a pre-fetch resource) from a client 702 to a resource server 704. The data hit report data structure 1900 may include a hit ID field 1910, a resource name field 1920, a resource location field 1930, and an optional selection time stamp field 1940. The hit ID field 1910 identifies the message as a resource cache hit report message. The resource name and location fields 1920 and 1930, respectively, correspond to the resource name and location fields 1720 and 1730, respectively, of resource request data structure 1700 discussed above with reference to FIG. 17. The optional selection time stamp field 1940 includes information which indicates a time at which a user selected a resource which was found at the client resource cache. This field is not needed if the resource server 704 time stamps the message 1900. A field including user identification information (not shown), such as a cookie or global unique identifier (or "GUID") for example, may also be included in the data structure 1900. Referring first to FIGS. 7, 9a and 9b, and 11, the client 702 submits a resource request 1102 to the resource server 704. Referring back to FIG. 17, the request 1102 may have data structure 1700. The resource server relays a request 1104 for the resource, first to the resource cache 635', and then, in the event of a cache "miss", to the resource storage area 734. The resource 1106 is returned to the server 704 which, in turn, returns the resource 1107 to the client 702. Referring back to FIG. 18, the returned resource 1106 may be in the payload 1840 of data structure 1800. The further processing of the resource at the client 702 is irrelevant for purposes of describing the server-side resource transition probability model. If the request 1102 was the result of a user selection, not a pre-fetch determination, the resource server 704 then sends a log 1108 of the request and provision of the resource to usage log 746. At some predetermined time, the analysis server 750 submits a request 1110 for the usage logs 746. The requested logs 1112 are returned in response. After the usage logs are merged, filtered, and provided to a resource transition model generation process, the resource transition probabilities 1114 are provided to the resource transition probability model storage area 758. Referring now to FIGS. 7 and 12, the flow of data and messages between the processes of system 700 is now described. In the following description, for purposes of simplicity, the input/output interface processes 610', 650' and 752 of the client 702, resource server 704, and analysis server 750, respectively, and the networking process 640 are not shown in FIG. 12. First, the user interface process 722 provides a user selection message 1202 to the navigation process 724. The user selection message may be generated by the user interface process 722 based on a user input, such as a mouse click on a hyper-text link of an HTML page. The navigation process 724 forms a resource selection request 1204 which is forwarded, via the input/output interface process 610', optional proxy 630, networking process 640, and input/output interface process 650', to the resource retrieval process 660'. Referring back to FIG. 17, the resource request communication 1204 may be in the form of data structure 1700. The request type ID field 1710 will indicate that the request is pursuant to a user selection. Information in the other fields will be as discussed above with reference to FIG. 17. In response, the resource retrieval process 660' first forms a resource request 1206 to the server's resource cache 635'. If the resource is available from the resource cache 635', it is returned in communication 1208. If, on the other hand, the resource is not available from the resource cache 635', it is returned as a miss in communication 1208. Further, if the resource was not available from the resource cache 635', the resource retrieval process 660' submits a request 1210 for the resource, via the input/output interface process 650', to the resource storage area 734, and the requested resource is returned in communication 1212. Whether the resource is obtained from the resource cache 635' or the resource storage area 734, it is returned to the navigation process 724 of the requesting client 702 in communication 1214. Referring back to FIG. 18, the communication 1214 may be in the form of the data structure 1800. The data type ID field 1810 of the data structure will indicate that the payload 1840 contains a selected resource. Before, after, or concurrently with the communication 1214, the resource retrieval process 660' reports the access of the resource in communication 1216 to the usage log building process 740. The usage log building process 740 provides an update 1218 to the usage logs stored in storage area 746. At a predetermined time, user logs are transmitted, via input/output interface process 650', and input/output interface process 752, to resource model transition generating process 756. Although not shown in FIG. 12, these logs may first be provided to the filter and merge process 754. The provision of the usage logs may be in response to a request generated at the resource server 704 or in response to a request (not shown) generated by the analysis server 750. Finally, the resource model transition generation process 756 provides an updated model (or new or replacement model), in communication 1222, to the storage area 758 for the resource transition probability models. Having described the function, structure, and operation of an exemplary system for building a server-side resource transition probability model(s), the use of such models, for example to pre-fetch resources or to edit the topology of a resource site, will be discussed below. The source of the server-side resource transition probability model is not particularly relevant for purposes of the pre-fetching and editing applications; the models may be generated internally (as described) or purchased or accessed from an independent entity. .sctn.3. PRE-FETCHING USING SERVER-SIDE MODEL As discussed above, resource pre-fetching can be used to better utilize processing resources and bandwidth of communications channels. In general, resource pre-fetching by the client utilizes idle bandwidth, and resource pre-fetching by the resource server utilizes idle processing and/or data bus resources of the server. Although resource pre-fetching may occur at both the client and the server, each type of pre-fetching will be separately described. .sctn.3.1 CLIENT PRE-FETCHING .sctn.3.1.1 FUNCTION OF PRE-FETCHING USING SERVER SIDE MODEL Basically, after a client 702 receives a requested resource, bandwidth on a communications path between the client 702 and the server 704 is available, while the resource is being rendered by the resource rendering process 726 or while a user is sensing and/or interpreting the rendered resource. The present invention permits this idle communications bandwidth to be exploited. More specifically, based on the previously requested resource (or based on previously requested resources), a list of transitions to other resources, in descending order of probability, is used to pre-fetch other resources. Such pre-fetched resources are stored at a client resource cache 624'. .sctn.3.1.2 STRUCTURE OF PRE-FETCHING USING SERVER-SIDE MODEL The structure of the pre-fetching system 700 is similar to that described above with reference to FIG. 7a. However, if the resource transition probability models are purchased from a third party, the processes 752, 754 and 756 of the analysis server 750 are not needed. .sctn.3.1.3 OPERATION OF PRE-FETCHING USING SERVER-SIDE MODEL In many instances, particularly with modem-based communications, a communication channel is maintained between the client and the server. While the client is rendering resources or a user is sensing (e.g., viewing, reading, and/or listening to) the rendered resource, the maintained communications channel is idle. Similarly, when the user is sensing the rendered resource, processing resources of the client may be relatively idle. Further, the processing resources of the server may be idle at times. The pre-fetching aspect of the present invention exploits such idle communications and processing resources. The operation of resource pre-fetching using a server-side resource transition probability model will now be described with reference to FIGS. 7, 13, 14, 15, 16a and 16b. Basically, when a client 702 requests a resource in response to a user selection, the server 704 returns the requested resource and a resource transition probability list to the client 702. Under appropriate conditions (e.g., idle bandwidth on a communications channel between the client 702 and server 704), the client will pre-fetch a resource based on the list. FIG. 13 is a flow diagram of processing, carried out by a client, in a pre-fetching process 1300 of the present invention. First, as shown in decision step 1302, the communications path between the client 702 and the resource server 704 is monitored, in a known way, to determine whether or not idle bandwidth is available. If idle bandwidth is available, then, as shown in steps 1302 and 1304, a resource is requested based on the resource transition probability list 732. More specifically, the most probable transition from the last requested resource is determined based on the ordered list from the resource transition probability model. The resource associated with the most probable transition is then pre-fetched. These steps may be carried out by pre-fetch process 730. FIG. 14 is a high level messaging diagram of an exemplary process for pre-fetching resources based on a resource transition probability model. FIG. 15 is a high level messaging diagram of an exemplary process of logging resource transitions to cached resources. FIGS. 16a and 16b, collectively, are a messaging diagram of an exemplary process for pre-fetching resources based on a resource transition probability model. In the following description, for purposes of clarity, the input/output interface processes 610' and 650' and 752 of the client 702, resource server 704 and analysis server 950, respectively, and the networking process 640 are not shown in FIGS. 14, 15, 16a, and 16b. Referring first to FIGS. 7 and 14, a client 702 desires a resource to render. The client 702 first submits a request 1402 to its own resource cache 624' to determine whether or not the resource is available at its resource cache 624'. If the resource is available at its resource cache 624', the resource is returned and rendered. However, in this example, it is assumed that the resource is not available from the resource cache 624'. Accordingly, a cache miss message 1404 is returned. In response, the client 702 then submits a request 1406 for the resource to the resource server 704. Referring back to FIG. 17, the request 1406 may be in the form of data structure 1700. In this case, the request type ID field 1710 will have data which indicates that the request was made pursuant to a user selection. The resource server 704 submits as shown in FIG. 14, a request 1408 for the resource. The requested resource is returned, either from the resource cache 635' or the resource storage area 734, in communication 1410. A log 1412 of the request and provision of the resource is provided to a usage log storage area 746. Before, after, or concurrently with communications 1408, 1410, and 1412, the server 704 submits a request 1414 for a rank ordered list of transition probabilities from the requested resource to other resources. In response, such a rank ordered transition probability list 1416 is returned. The server 704 then returns the requested resource and the rank ordered transition probability list in communication 1418 to the client 702. Referring back to FIG. 18, the communication 1418 may be in the form of one or more data structures 1800. In a first data structure 1800, information in the data type ID field 1810 will indicate that the payload 1840 includes selected resource data. In a second data structure 1800, information in the data type ID field 1810 will indicate that the payload 1840 includes a resource transition probability list. The client 702 renders the resource and provides the list to the small model list storage area 732 in communication 1420, shown in FIG. 14. Under certain circumstances (e.g., idle bandwidth available), the client 702 will submit a query 1422 for the most probable resource transition. In response, an identification of a resource to be pre-fetched is returned in communication 1424. The client 702 then submits a request 1426 for the pre-fetch resource to the resource server 704. Referring again to FIG. 17, if the communication 1426 is in the form of data structure 1700, the request type ID field 1710 will include data which identifies the resource request as being pursuant to a pre-fetch determination. In one embodiment, the resource server will only service a pre-fetch request if it has sufficient idle processing and/or data bus resources; the pre-fetch request has a lower priority than requests for resources resulting from a user selection. The resource server 704 then submits a request 1428 for the requested pre-fetch resource. The requested pre-fetch resource is returned, either from the resource cache 635' or the resource storage area 734, in communication 1430. Note that the resource server 704 does not, at this time, log the requested pre-fetch resource. This prevents the model building process of the present invention from creating a "self fulfilling prophecy". That is, the resource transition probability model should not be updated merely on the basis of its own predictions. The user of the client 702 must actually request rendering of the pre-fetched resource. The resource server 704 then communicates the pre-fetched resource, in communication 1432, to the client 702. If the communication 1432 is in the form of data structure 1800, the data type ID field 1810 will include information which indicates that the payload 1840 has pre-fetch resource data. The client 702 then sends the pre-fetched resource, in communication 1434, to the resource cache 624'. The pre-fetched resource is now available at the resource cache 624' of the client 702 should it be requested. In a modified embodiment, the pre-fetched resource is marked as a "low priority" resource for purposes of cache flushing and cache replacement algorithms. That is, if the cache becomes full and more space is needed, pre-fetched resources are more likely to be removed from the cache 624' than other resources. In addition to being cached, if processing resources of the client 702 are sufficiently idle, then the client 702 may begin pre-rendering processing of the pre-fetched resource. Referring now to FIG. 15, data communications, which occur when a pre-fetched resource is requested to be rendered, are shown. Recall from the discussion of FIG. 14 above that the return of a requested pre-fetch resource is not logged when retrieved by the resource server 704 in order to prevent the predictions from reinforcing themselves. As shown in FIG. 15, a client 702 first requests a resource to be rendered. A request 1502 is first submitted to the resource cache 624' of the client 702. In this instance, it is assumed that the requested resource had been pre-fetched and stored at the client's resource cache 624'. Accordingly, a cache hit and the requested resource are returned in communication 1504. In order to permit the resource transition probability model to reflect this, the cache hit is reported in message 1506 from the client to the resource server 704. Referring to FIG. 19, the report hit message 1506 may be in the form of data structure 1900. In response to the hit message, the server 704 submits a log 1508 to the usage log storage area 746. In one embodiment, the resource server 704 will also return a resource transition probability list for the pre-fetched and rendered resource as shown in communications 1510, 1512, 1514 and 1516. FIGS. 16a and 16b, collectively, are a messaging diagram of an exemplary process for pre-fetching resources based on a resource transition probability model. Referring now to FIGS. 7, 16a, and 16b, the operation of the exemplary system, in which resources are pre-fetched based on a resource transition probability model, is described. The client 702 processes a user resource selection as follows. A user selection is made (e.g., via a graphic user interface by double clicking a mouse when an arrow is on a hyper-text link) and the user interface process 722 communicates the user selection, in communication 1602, to the navigation process 724. In response, the navigation process 724 submits a resource selection request 1604 (e.g., via input/output interface process 610', networking process 640, and input/output interface process 650') to the resource retrieval process 660' of the resource server 704. Referring again to FIG. 17, the resource selection 1604 may be in the form of data structure 1700. If so, the request type ID field 1710 should have information which identifies the resource request as being made pursuant to a user selection. The server 704 services the resource selection 1604 as follows. The resource retrieval process 660' will submit a request 1606 for the selected resource to its resource cache 635'. If the selected resource is available from the resource cache 635', it is returned to the resource retrieval process 660' in communication 1608. If, on the other hand, the selected resource is not available from the resource cache 635', a cache miss indication is returned to the resource retrieval process 660' in communication 1608. In this latter case, the resource retrieval process 660' will submit a request 1610 for the selected resource (e.g., via input/output interface process 650') to the resource storage area 734. The requested resource is then returned to the resource retrieval process 660' in communication 1612. Thus, the resource retrieval process 660' will obtain the selected resource, either from the resource cache 635' or from the resource storage area 734. The server 704 will also log the returned resource as follows. The resource retrieval process 660' will then report the accessed resource, as well as the user accessing the resource and time of the selection by the user and/or of the retrieval, to the usage log building process 740 via communication 1614. In response, the usage log building process 740 will update the usage logs 746 via communication 1616. Before, after or concurrently with the communication 1616, the resource retrieval process 660' will return the requested resource (e.g., via input/output interface process 650', networking process 640, and input/output interface process 610'), in communication 1618, to the resource rendering process 726 of the browser process 620' of the client 702. Referring again to FIG. 18, the communication 1618 may be in the form of data structure 1800. In this case, the data type ID field 1810 will have information which indicates that the payload 1840 includes a selected resource. The resource is then rendered by the client 702. Based on the selected resource retrieved, the resource retrieval process 660' will submit, as shown in FIG. 16a, a request 1620 for a small (ordered) transition probability list (e.g., via input/output interface processes 650' and 752, assuming separate resource and analysis servers) to the resource transition probability model storage area 758. The requested list is returned to the resource retrieval process 660' in communication 1622. The resource retrieval process then communicates the list (e.g., via input/output interface process 650', networking process 640, and input/output interface process 610') to the pre-fetch process 730 of the browser process 620' of the client 702. Alternatively, the request 1620 for the small list may include the resource and the network address of the client 702. In this case, the analysis server 750 can communicate the small list directly to the pre-fetch process 730 of the client 702. Naturally, the communication 1618 of the requested resource and the communication 1624 of the small list can be combined into one communication. Furthermore, if separate communications are made, the temporal order of the communications should not matter. Resource pre-fetching may occur as follows. Thereafter, if idle bandwidth exists on the communications path between the client 702 and the resource server 704, the pre-fetch process 730 will formulate a pre-fetch resource request based on the small list storage at storage area 732. This pre-fetch resource request is communicated, as communication 1626, to the resource retrieval process 660'. Referring again to FIG. 17, the communication 1626 may be in the form of data structure 1700. In this case, the request type ID field 1710 will indicate that the resource request was made pursuant to a pre-fetch operation. The resource server 704 may service the pre-fetch request as follows. As was the case with communications 1606, 1608, 1610, and 1612, discussed above, the resource retrieval process 660' will submit, as shown in FIG. 16b, a request 1628 for the pre-fetch resource to its resource cache 635'. If the pre-fetch resource is available from the resource cache 635', it is returned to the resource retrieval process 660' in communication 1630. If, on the other hand, the pre-fetch resource is not available from the resource cache 635', a cache miss indication is returned to the resource retrieval process 660' in communication 1630. In this latter case, the resource retrieval process 660' will submit a request 1632 for the pre-fetch resource (e.g., via input/output interface process 650') to the resource storage area 734. The requested resource is then returned to the resource retrieval process 660' in communication 1634. Thus, the resource retrieval process 660' will obtain the pre-fetch resource, either from the resource cache 635' or from the resource storage area 734. To reiterate, pre-fetch requests may be given low priority by the resource server 704. That is, resource requests resulting from user selections may be given higher priority than those resulting from pre-fetch determinations. Since the rendering of the pre-fetch resource is merely a prediction at this point, rather than being provided to the resource rendering process 726 of the browser process 620' of the client 702, the pre-fetch resource is communicated, in communication 1636, (e.g., via input/output interface process 650', networking process 640, input/output interface process 610' and pre-fetch process 730) to the cache management process 728 (not shown in FIG. 16b) which stores the pre-fetched resource in resource cache 624'. Referring to FIG. 18, the communication 1636 may be in the form of data structure 1800. In this case, the data type ID field 1810 will indicate that the payload 1840 includes a pre-fetch resource. The pre-fetch resource may be (a) an entire HTML page with all associated resources, (b) resources, represented by large data files (e.g., large images), associated with the HTML page but not the page itself, or (c) the HTML page only. Thus, if a user selects a pre-fetch resource, other related resources may be needed. In such cases, the address of the pre-fetch resource must be stored so that the other related resources, which might, for example, only be addressed by a sub-directory, may be accessed. Notice also that the usage logs are not updated merely on the basis of the return of the requested pre-fetch resource. To reiterate, the usage logs are not updated at this time so that the resource transition prediction model will not be self-reinforcing. Rendering of pre-fetched and cached resources may occur as follows. Later, the user at the client 702 may request another resource. This selection is indicated by communication 1638 from the user interface process 722 to the navigation process 724. In response the resource selection, the navigation process 724 will first want to check the resource cache 624' of the client 702. This check is made, in communication 1640, to the cache management process 728 (not shown in FIG. 16b). Assuming that the user has selected a resource that had been pre-fetched (see e.g., communication 1636), the cached and pre-fetched resource is provided, in communication 1642, to the resource rendering process 726 which renders the selected resource to the user. If only a portion of the selected resource was pre-fetched and cached, requests for other related resources may be issued to the server 704. The address information of the pre-fetched and cached resource and the address information (which might be only a partial address) of the related resource(s) are combined (e.g., concatenated) so that the related resource(s) may be accessed. In further response to the resource cache hit, the cache management process 728 (not shown in FIG. 16b) reports the cache hit, in communication 1644, to the user log building process 740. Referring back to FIG. 19, the communication 1644 may be in the form of data structure 1900. Only at this time does the usage log building process 740 update the user logs 746 via communication 1646. Recall from communications 1510, 1512, 1514 and 1516, that the resource server 704 may communicate a resource transition probability list to the client when the pre-fetched and cached resource is rendered. As discussed above with reference to building server-side resource transition probability models, different resource transition probability models may be built based on different "clusters" of similar users. A user accessing the resources of the server may initially use weighted resource transition probability models (built from usage logs of clusters of similar users) based on a prior distribution of all users for pre-fetching resources. As more information is gathered about the user, the weighting is updated. .sctn.3.2 SERVER PRE-FETCHING .sctn.3.2.1 FUNCTION OF PRE-FETCHING USING SERVER SIDE MODEL Referring to FIG. 7a, recall that the resource server 704 may also be provided with a resource cache 635'. During times when the server 504 has available (or idle) processing resources, the server may load resources into its resource cache 635' based on the resource transition model and based on the resource(s) most recently requested by a server. Whether or not data bus (e.g., a SCSI bus) resources are available may also be checked. In this way, resources likely to be requested are made available in faster cache memory. .sctn.3.2.2 STRUCTURE OF PRE-FETCHING USING SERVER-SIDE MODEL The present invention may operate in a system 500 shown in FIG. 5 when the processor(s) 532 execute appropriate program instructions. The storage devices(s) 534 should include a resource cache 635', a section 534'a for storing name(s) of resource(s) most recently requested by server(s), and a section 534'b for storing resource transition probability lists. The resource cache 635' and storage sections 534'a and 534'b (see FIG. 22) may be logically or physically segmented such that a logically or physically separate memory area is available for each of a number of clients 502 accessing the server 504. .sctn.3.2.3 OPERATION OF PRE-FETCHING USING SERVER-SIDE MODEL An example of the operation of server pre-fetching using a server-side resource transition probability model is described with reference to FIGS. 21 and 22. FIG. 21 is a flow diagram of a server pre-fetch process 2100 which utilizes the above discussed server-side resource transition probability model. First, as shown in step 2102, system status is checked. More specifically, whether or not processing (and/or data bus) resources are available (i.e., idle processing resources) is determined. Pre-fetch cache space availability may also be checked. The pre-fetch cache may: (a) be a predetermined size, or (b) share memory space, in which case such shared memory space is rationed based on hit-to-miss ratios of the pre-fetched resources. Referring now to decision step 2104 and step 2106, if idle processing (and/or data bus) resources are available, a resource is cached based on a resource transition probability list for a resource most recently requested by a client. Note that the step 2106 may be carried out for individual clients or for all clients collectively. Operation continues as shown by return step 2108. FIG. 22 is a message flow diagram of a server pre-fetch process which uses the above discussed server-side resource transition probability model. In this example, also referring to FIGS. 6 and 7, it is assumed that the resource retrieval process 660/660' includes a pre-fetch process 2250. In addition, a system monitor process 2290, which may be carried out in a known way, is available. For example, an operating system may carry out system monitoring functions. First, as shown in communication 2202, the pre-fetch process 2250 queries the system monitor process 2290 regarding the system status, and in particular, whether or not idle processing (and/or data bus) resources are available. In response to this query, the system monitor process 2290 returns a status message which may include information which indicates whether or nor, or to what degree, idle processing (and/or data bus) resources are available. In the following, it is assumed that idle processing (and/or data bus) resources are available to such an extent that resources may be cached. Since idle processing (and/or data bus) resources are available, the pre-fetch process 2250 will take this opportunity to pre-fetch resources likely to be requested. Note that the following pre-fetch processing may take place for clients on an individual basis or on a collective basis. More specifically, the pre-fetch process 2250 submits a request 2206 to storage section 534'a, for name(s) of resource(s) most recently requested by server(s). The requested resource name(s) are returned in communication 2208. The pre-fetch process then submits to storage section 534'b, a request 2210 for list(s) associated with the resource name(s) returned in communication 2208. The requested list(s) is return in communication 2212. As discussed above, a resource transition probability list may be a rank ordered list of the probabilities of transiting from a given resource to other resources. The pre-fetch process 2250 uses this list to request, via request 2214, the resource most likely to be requested. This request 2214 is submitted to the resource storage area 734. The requested resource is returned in communication 2216. The | ||||||
