User configurable prefetch control system for enabling client to prefetch documents from a network server
6023726
Abstract
A prefetching and control system for a computer network environment. The user configures the client's prefetch parameters which are: enabling/disabling prefetching, prefetch threshold value, and the maximum number of documents to prefetch. A prefetch value or weight is contained in the Hypertext Markup Language (HTML) page or prefetch file, called a pathfile, for each link. The HTML page contains the prefetch values for each of its links, while pathfile contains the weights for every link on the HTML page associated with the Universal Resource Locator (URL). The client compares the prefetch or weight values of each link with its threshold value to decide if the link should be prefetched and placed in the local cache as long as the maximum number of documents to prefetch is not exceeded. Pathfiles reside on the server and are created by the server or web administrator/author. The server automatically creates the pathfiles from its log files which are filtered to retain all of the valid document requests and average paths are derived from the filtered results. Weights are assigned to each path in the URL by the server and inserted into the pathfile along with the associated paths. If no log files exist on the server, then the web administrator/author may manually enter in the weights for each path.
Claims
I claim:
Description
BACKGROUND OF THE INVENTION
______________________________________
Subtag    Description
______________________________________
NAME      Named link end.
HREF      URL for link resource.
REL       Relationship forward link type.
REV       Reverse link.
TITLE     Advisory title string.
PRE       Prefetch value.
______________________________________
The ANCHOR tag remains backward compatible. It defaults to no prefetching. The PRE subtag of the ANCHOR tag holds a prefetch value which has the following properties:
It typically ranges from 0.0 to 1.0. This value indicates the relevance of retrieving the link before the user asks for it. The higher this value, the more likely the link is to be retrieved.
These values will be normalized for a page.
The PRE value is ignored for non-relevant protocols such as telnet:, mailto:, news:, etc.
After normalizing, two links of similar values will be pulled in the order they appear in the document.
A zero value indicates that this link is never prefetched, even if the user's threshold value is set to zero.
Although the invention is described as having prefetch values ranging from 0.0 to 1.0, one skilled in the art can readily appreciate that there are many other ways to achieve this operation using other enumerated values, for example: low, medium, and high.
The client has a very simple design because it interacts with and uses the functionality of already existing modules. It reads the user's preferences and, if prefetching is turned on, builds a list of documents to prefetch from the current page based on each ANCHOR tag's PRE value. The client skips those links which have a PRE value less than the threshold specified by the user. It then removes those links which are already in the cache or currently being downloaded (possibly in a GETLIST). The client then proceeds to download these documents at a lower priority. The speed of the network connection is a useful factor in determining what and how much to prefetch. Links that have been prefetched may be displayed in a different color to convey to the user which documents have been prefetched.
As mentioned above, the user configures the following parameters on the client:
Enable/disable prefetching.
Prefetch threshold, ranging from 0.0 to 1.0. This setting trades good performance against low disk use.
Maximum number of documents to prefetch.
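The selection rules described above for PRE values can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the described embodiment: the names Link, select_links, and PREFETCHABLE_SCHEMES are hypothetical, the set of prefetchable schemes is an assumption, and normalization of PRE values is not modeled because the text does not say how it is performed.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Assumption: only these schemes are prefetched. The text names telnet:,
# mailto:, and news: as examples of protocols for which PRE is ignored.
PREFETCHABLE_SCHEMES = {"http", "https", "ftp"}


@dataclass
class Link:
    href: str
    pre: float = 0.0  # PRE subtag value; the ANCHOR tag defaults to no prefetching


def select_links(links, threshold, max_docs, cached=frozenset()):
    """Return hrefs to prefetch: highest PRE first, ties in document order."""
    candidates = []
    for position, link in enumerate(links):
        scheme = urlparse(link.href).scheme
        if scheme and scheme not in PREFETCHABLE_SCHEMES:
            continue  # PRE is ignored for non-relevant protocols
        if link.pre <= 0.0:
            continue  # zero is never prefetched, even with a zero threshold
        if link.pre < threshold:
            continue  # below the user's prefetch threshold
        if link.href in cached:
            continue  # already in the cache (or being downloaded)
        candidates.append((-link.pre, position, link.href))
    candidates.sort()  # by descending PRE, then by position in the document
    return [href for _, _, href in candidates[:max_docs]]
```

A caller would pass the links of the current page in document order, the user's threshold, and the user's maximum number of documents, then download the returned links at low priority.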
Pathfiles:
Pathfiles are used to predict the access patterns of web surfers. The web administrator/author of the document suggests certain paths that a user may follow from a given page. Pathfiles are created by web administrators/authors to improve the perceived access time of their web sites. The client makes a best guess using the pathfile to prefetch the most likely next choice or path of the user. Pathfiles are also used by the server to tell the client that it does not support prefetching.
The following is an example of a specification for the format of a pathfile:
If for some reason the client does not or cannot (as in the case of a proxy) parse the content of a document for prefetch information, it should then use the mechanism described below to access the pathfile for a given document.
Mechanism
To assist a prefetching client, the server creates pathfiles for each resident URL. When the client sends a GET request for the document, the server replies with an additional LINK header referencing "prefetching" and an HREF pointing to the location of the pathfile. If prefetching is enabled on the client side, and the client does not already have the pathfile for this document, it requests the corresponding pathfile from the server. The client then parses the pathfile to confirm that a valid entry for the current document is available. It then locates this entry and reads the links listed under it. Based on the weights allocated in the pathfile, a client may decide to pull all, some, or none of the links in that list. These are the "prefetched" documents. They are held in a prefetch cache until the user of that client makes an explicit request for them or they expire.
Syntax
The semantics of the pathfile are as follows:
Naming: The pathfiles may be named anything that may be considered an acceptable filename. This approach ensures consistency, the ability to be used on most of the common operating systems, and requires no server configuration.
Location: The location of the pathfile is completely flexible. A server may decide to keep all of the pathfiles related to its site in a particular directory. However, the pathfile must not span servers. A pathfile relating to documents on a server must be kept on the same server to enable the client to generate fully qualified and valid URLs for the documents listed in the pathfile.
Content: The pathfile contains the list of documents recommended by the author for prefetching. The first entry in the pathfile is the "Realm", which indicates the relative path for use with documents within that Realm. Realm has the following construct:
______________________________________
Realm:<CR|CRLF|LF>
/[<relative path name>]<CR|CRLF|LF>
______________________________________
A pathfile may have more than one Realm. If the Realm entry does not match the document being analyzed, the client should check the other Realms listed in the pathfile. If none of the Realm entries match the document, the client must abort prefetching on the current document. The other entries are organized in records, each consisting of a filename followed by a list of documents that may be prefetched after this file is fetched by the client. These records are separated by one or more blank lines. Each record has the following construct:
______________________________________
filename:<CR|CRLF|LF>
<prefetch document URL>[<space><weight>]<CR|CRLF|LF>
[<prefetch document URL>[<space><weight>]<CR|CRLF|LF>]
<CR|CRLF|LF>
______________________________________
In this example, the filename is the name of the document that the client wants to analyze. Each record is terminated by an empty line. The client ignores empty lists. If no weights are assigned, then all of the links in the list have equal weights. Weights typically range from 0.0 to 1.0. This value indicates the relevance of retrieving the link before the user asks for it. The higher this value, the more likely the link is to be retrieved. Weights will be normalized for a page. Although the invention is described as having weight values ranging from 0.0 to 1.0, one skilled in the art can readily appreciate that there are many other ways to achieve this operation using other enumerated values, for example: low, medium, and high.
Comments: Comments are added to a pathfile using the # syntax. Any line whose first non-blank character is a "#" is treated as a comment line and is not used for processing.
EXAMPLES
The following is a simple example of a pathfile:
______________________________________
# Pathfile for http://dummy.server.com/
# Generated manually by Gagan Saksena on Aug. 1st. '97
# Other comments here.
# Realm has the relative path from the server name
Realm:
# The index.html file has links to several files.
# It lists here all the relevant ones. Note that there
# may be other links/files not mentioned here.
index.html:
/whatsnew.html
/help.html
/contents.html
/chap1/first.html

whatsnew.html:
/contents.html 0.7
/help.html 0.3

# Note implicit values - here 0.5
contents.html:
/index.html
/chap1/first.html

# Note a completely different file that may not even be
# linked by any of the existing ones but has
# links back to them.
secret.html:
/index.html
/pics/family.html
______________________________________
The following is another example of a pathfile kept in a different directory:
______________________________________
# Pathfile for http://gagan/
# Generated manually by Gagan Saksena on Aug. 1st. '97
# Other comments here.
Realm:
/chap1

index.html:
/index.html 0.2
/contents.html 0.2
first.html 0.6

first.html:
second.html 0.9
/contents.html 0.1

second.html:
first.html
/chap2/first.html
______________________________________
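The pathfile syntax described above can be read with a short parser. The following Python sketch is illustrative only and is not part of the described embodiment. The names parse_pathfile and links_for are hypothetical. Several behaviors are assumptions, because the text leaves them open: the Realm defaults to "/" when no path line follows "Realm:", a new "name:" line also ends the previous record, an entry without a weight receives an equal share 1/n of its record, a document matches a Realm when its directory equals the Realm path, and relative entries are resolved against the Realm.

```python
import posixpath


def parse_pathfile(text):
    """Return {realm: {filename: [(url, weight), ...]}} for a pathfile."""
    realms = {}
    realm = "/"            # assumption: default Realm when no path line is given
    expect_realm = False   # True right after a "Realm:" line
    entries = None         # list being filled for the current record
    for raw in text.splitlines():      # splitlines accepts CR, CRLF, and LF
        line = raw.strip()
        if line.startswith("#"):
            continue                   # comment line
        if not line:
            entries = None             # an empty line terminates the record
            continue
        if expect_realm and not line.endswith(":"):
            realm, expect_realm = line.rstrip("/") or "/", False
            continue
        expect_realm = False
        if line.endswith(":"):
            name = line[:-1]
            if name == "Realm":
                realm, expect_realm, entries = "/", True, None
            else:
                entries = realms.setdefault(realm, {}).setdefault(name, [])
            continue
        if entries is not None:
            parts = line.split()
            weight = float(parts[1]) if len(parts) > 1 else None
            entries.append((parts[0], weight))
    result = {}
    for realm_path, records in realms.items():
        for name, items in records.items():
            if not items:
                continue               # the client ignores empty lists
            default = 1.0 / len(items)  # assumption: equal share when unweighted
            filled = [(u, default if w is None else w) for u, w in items]
            result.setdefault(realm_path, {})[name] = filled
    return result


def links_for(realms, document_path, threshold):
    """List the URLs to prefetch for a document, or [] if no Realm matches."""
    realm, filename = posixpath.split(document_path)
    realm = realm or "/"
    records = realms.get(realm)
    if records is None or filename not in records:
        return []                      # abort prefetching for this document
    return [posixpath.join(realm, url)
            for url, weight in records[filename]
            if weight > 0 and weight >= threshold]


SAMPLE = """\
# Pathfile kept under /chap1
Realm:
/chap1

index.html:
/index.html 0.2
/contents.html 0.2
first.html 0.6

first.html:
second.html 0.9
/contents.html 0.1

second.html:
first.html
/chap2/first.html
"""

realms = parse_pathfile(SAMPLE)
print(links_for(realms, "/chap1/first.html", 0.5))
```

With a threshold of 0.5, only second.html qualifies for /chap1/first.html, so the sketch prints a single URL resolved against the Realm.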
The following pseudo-code illustrates a typical prefetching client.
______________________________________
if (prefetching.enabled)
{
    //Construct the corresponding pathfile URL from thisURL
    String pathfileLocation = extractPathfileLocationFromHeaders(thisURL);
    //If the constructed pathfileLocation is not in cache
    if (!cache.inCache(pathfileLocation))
    {
        //Download and add to cache
        cache.add(pathfileLocation, fetch(pathfileLocation));
    }
    //Read the cached copy so that pathfile is set on both branches
    pathfile = cache.get(pathfileLocation);
    //Parse the pathfile
    PathfileParser parser = new PathfileParser(pathfile);
    //If valid
    if (parser.contains(thisURL))
    {
        //Construct the list to be prefetched for thisURL
        ListToPrefetch list = parser.list(thisURL, prefetchThreshold);
        //Retrieve all of them
        while ((link = list.getNext()) != null)
        {
            //Add to cache
            cache.add(link, fetch(link));
        }
    }
}
______________________________________
The user configures the client's prefetch characteristics at any time. The prefetch characteristics include: enable/disable prefetching, prefetch threshold value, and document prefetch limit.
Referring to FIGS. 4, 5, and 6, during normal operation, the client checks if the user has configured it for prefetching 401. If not, then the prefetch process is skipped. If prefetch has been enabled, then the client looks at the current page for prefetch values 402. If there are prefetch values in the page, then the client enters a prefetch loop and checks if the user-configured prefetch limit has been reached 501. If the limit has been reached, the client ends the prefetch sequence 507. Otherwise, the client searches through the document for the link with the highest PRE value that is greater than the threshold value and has not been previously checked 502. If such a link is not found 503, then the prefetch sequence is ended 507. If a link is found 503, then the client checks if the link is already in the cache 504. If so, then the next iteration of the loop is performed 501. If the link is not in the cache, then the client retrieves the document from the server 505 and places the document in the cache 506. The loop then continues 501.
If the client is unable to parse the document for prefetch values, then it checks for a pathfile path in the LINK tag 403. If a path exists, the client checks if the pathfile is already in the cache 601. If it is not, then the client will get the pathfile from the server 602. The client parses the pathfile and creates the path that has the highest prefetch value above the threshold value 603. If the document prefetch limit has been reached 604, the prefetch process ends 605. Otherwise, the next link in the path is selected 606. If the link does not exist 607, then the path has ended and the prefetch process ends 605. If there is a link, then the link is checked to see if it is already in the cache 608.
If so, then the next iteration of the loop is performed 604. If the link is not in the cache, then the client retrieves the document from the server 609 and places the document in the cache 610. The loop then continues 604. If there is no pathfile path in the LINK tag, then the prefetch sequence ends 404.
Generating pathfiles:
Pathfiles reside on the server. In addition to being created manually by the web administrator/author, pathfiles are created automatically by the server using log files. Servers generate log files daily. Log files are created because system administrators want to keep track of what is happening in the server in case of a problem. These log files contain information such as the time a client connects to the server, the client's Internet Protocol (IP) address, and what file was accessed.
Referring to FIG. 7, the server begins the pathfile creation process by opening the log file 701. The server keeps any normal GET requests and filters out of the log file all PREFETCH requests, which do not count as a normal fetch 702. Next, proxy requests are filtered out 703 because whenever a client makes a request through a proxy server, what gets reflected in the server logs is the proxy, not the individual client. False correlations on pages resulting from, e.g., random accesses and browser reloads are also filtered out of the log file. This is done by correlating the time gaps between logical requests 704. For example, if the time between two requests is greater than 10 minutes, then there is no relationship. If the gap is less than or equal to 10 minutes, then the path sequence is valid. Weights are then assigned to each link 705. Finally, each path with its associated weights is placed in the pathfile 706. The generated pathfile is the aggregate or sum of all of the average paths that were found of clients traversing the web. If there are no log files available, then the web author may assign the weights.
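The log filtering steps of FIG. 7 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions and not the method of the described embodiment. The name build_weights and the tuple layout of a log entry are hypothetical, proxies are identified here by a caller-supplied set of IP addresses, a repeated request for the same page is treated as a reload, and the weight of a link is assumed to be the fraction of observed transitions from its source page, since the text does not state how weights are computed. Writing the resulting weights into pathfile records is left out.

```python
from collections import Counter, defaultdict

MAX_GAP_SECONDS = 10 * 60  # the 10 minute gap given in the text as an example


def build_weights(entries, proxy_ips=frozenset()):
    """Derive link weights from log entries.

    entries: list of (timestamp_seconds, client_ip, method, path) tuples.
    Returns {source_path: {next_path: weight}}.
    """
    last_seen = {}                      # client_ip -> (timestamp, path)
    transitions = defaultdict(Counter)  # source path -> counts of next paths
    for timestamp, ip, method, path in sorted(entries):
        if method != "GET":
            continue                    # drop PREFETCH and other requests
        if ip in proxy_ips:
            continue                    # a proxy hides the individual client
        previous = last_seen.get(ip)
        if (previous is not None
                and timestamp - previous[0] <= MAX_GAP_SECONDS
                and previous[1] != path):   # assumption: same page is a reload
            transitions[previous[1]][path] += 1
        last_seen[ip] = (timestamp, path)
    weights = {}
    for source, counts in transitions.items():
        total = sum(counts.values())
        # Assumption: weight is the share of transitions leaving the source page.
        weights[source] = {target: n / total for target, n in counts.items()}
    return weights
```

Under these assumptions the weights for each source page sum to 1.0, which is consistent with the 0.0 to 1.0 range used for pathfile weights.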
In another preferred embodiment, the client tells the server that it is prefetch enabled and the server can decide whether it sends the pathfile path to the client. The server sends a multi-part message which includes the pathfile.
Further, prefetching is also predictive through user or website patterns. The following are examples of predictive prefetching:
Predictive prefetching based on mouse position. The client gives greater weight to the links that are closer to the user's mouse position on the page.
Predictive prefetching based on keyword indices. Users have a tendency to visit pages that have a common theme. The client queries each link on the page for keywords and follows a keyword pattern, e.g., sport, for document prefetching, giving higher weight to the links with the appropriate keyword matches.
Sequential patterns. Sequential patterns occur when the user traverses a list of links connected by ANCHOR tags, for example, when the pages represent a book where the user reads each page in succession and in order. The next logical page is given higher weight than the back link or home link.
Hub and spoke patterns. Hub and spoke patterns occur when there is a main page that is the central hub for all of the links. For example, the user returns to the main page each time a link on the page is visited in order to follow other links on the main page. In this case, the prefetch would be weighted toward all of the links on the main page.
Advantages:
The server controls how much access is provided to the clients for prefetching. The server can switch off or control the amount of access a particular client has, thereby allowing the server to keep a tab on its performance and not become flooded by several extra document requests.
Prefetch is controlled on both the client and server side (the client through the prefetch threshold level).
Although the invention is described as being applied to Internet browsers, one skilled in the art can readily appreciate that the invention has many other applications. Although the invention is described herein with reference to the preferred embodiment, one skilled in the art will readily appreciate that other applications may be substituted for those set forth herein without departing from the spirit and scope of the present invention. Accordingly, the invention should only be limited by the claims included below.
