Blocking saves to web browser cache based on content rating
Patent 6510458

Abstract

A user sets preference parameters that filter web page contents from being stored in the cache. The preferences relate to the web page's contents and attributes. Before caching the web page, the contents and attributes of the web page are filtered solely as a function of the web browser. Cache filters take a variety of forms, such as ratings filters, web page identifier filters, and key word filters, which scan accessed contents of a web page for user selected terms. The filtered web page is then blocked from entry in the browser's cache based on the filtering process. Conversely, a user sets preference parameters that filter web page contents to override the block from cache preferences. The browser responds by storing the filtered web pages that were previously designated as web pages not to be cached.

Description

BACKGROUND OF THE INVENTION
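The cache filtering summarized in the abstract can be illustrated with a short Python sketch. Everything in it is hypothetical: the `CachePreferences` fields, the `should_cache` function, and the rule that an override list of URL prefixes wins over every block filter are assumptions chosen for illustration, not the patent's implementation. Ratings are assumed to arrive as a mapping from category name to integer level.

```python
from dataclasses import dataclass, field


@dataclass
class CachePreferences:
    """Hypothetical user preferences controlling what may enter the browser cache."""
    max_ratings: dict[str, int] = field(default_factory=dict)        # ratings filter
    blocked_url_prefixes: list[str] = field(default_factory=list)    # web page identifier filter
    blocked_keywords: list[str] = field(default_factory=list)        # key word filter
    always_cache_prefixes: list[str] = field(default_factory=list)   # override (save) preferences


def should_cache(url: str, body: str, ratings: dict[str, int],
                 prefs: CachePreferences) -> bool:
    """Return True if the page may be saved to the browser cache (sketch)."""
    # Assumption: an override preference beats every block-from-cache filter.
    if any(url.startswith(p) for p in prefs.always_cache_prefixes):
        return True
    # Ratings filter: block if any rated category exceeds the user's limit.
    for category, level in ratings.items():
        limit = prefs.max_ratings.get(category)
        if limit is not None and level > limit:
            return False
    # Web page identifier filter: block pages under listed URL prefixes.
    if any(url.startswith(p) for p in prefs.blocked_url_prefixes):
        return False
    # Key word filter: case-insensitive scan of the page contents.
    text = body.lower()
    return not any(word.lower() in text for word in prefs.blocked_keywords)
```

For example, with a limit of 1 on category "v", a page rated `{"v": 3}` is kept out of the cache, while a page under an override prefix is cached even if a block filter also matches it. Checking the override first is only one possible ordering; the source does not specify one.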
Request          ::= SimpleRequest | FullRequest
SimpleRequest    ::= GET URI CrLf
FullRequest      ::= Method URI ProtocolVersion CrLf
                     [*<HTRQ Header>]
                     [<CrLf> <data>]
Method           ::= method identifier, usually alphanumeric
ProtocolVersion  ::= HTTP/N.N, where N is a valid number
URI              ::= defined in standard specifications
<HTRQ Header>    ::= <Fieldname> : <Value> <CrLf>
<data>           ::= MIME-conforming message
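The request grammar above can be read as a small parser. The following Python sketch is illustrative only: `Request` and `parse_request` are hypothetical names, lines are assumed to end in CRLF, and header lines wrapped onto continuation lines are not handled. Repeated fields such as several Accept: lines are merged into one comma-separated value.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Request:
    method: str
    uri: str
    version: Optional[str]                  # None for a SimpleRequest
    headers: dict[str, str] = field(default_factory=dict)
    data: str = ""


def parse_request(raw: str) -> Request:
    """Parse a request according to the simplified grammar above (sketch)."""
    head, _, data = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    parts = lines[0].split()
    if len(parts) == 2 and parts[0] == "GET":
        # SimpleRequest: GET URI CrLf, with no version, headers, or data.
        return Request("GET", parts[1], None)
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        raise ValueError(f"malformed request line: {lines[0]!r}")
    method, uri, version = parts
    headers: dict[str, str] = {}
    for line in lines[1:]:
        if not line:
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header: {line!r}")
        name, value = name.strip(), value.strip()
        # Repeated fields (e.g. several Accept: lines) are merged into one.
        headers[name] = f"{headers[name]}, {value}" if name in headers else value
    return Request(method, uri, version, headers, data)
```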
The ProtocolVersion field defines the format of the rest of the request. Currently, this field usually contains HTTP/1.0 or HTTP/1.1.

The Web is an information space. Human beings have a lot of mental machinery for manipulating, imagining, and finding their way in spaces, and Uniform Resource Identifiers (URIs) are the points in that space. Unlike web data formats, such as HTML or Extensible Markup Language (XML), and web protocols, such as HTTP/1.0 or HTTP/1.1, there is only one web naming/addressing technology: URIs. URIs are the generic set of all names and addresses, specified by short strings, that identify resources on the Web: documents, images, downloadable files, services, electronic mailboxes, and other resources. They make resources available and addressable in the same simple way under a variety of naming schemes and access methods, such as HTTP, File Transfer Protocol (FTP), and Internet mail. URIs are an extensible technology; there are a number of existing addressing schemes, and more may be incorporated over time.

Uniform Resource Locators (URLs) are the set of URI schemes that have explicit instructions on how to access the resource on the Internet. Uniform Resource Names (URNs) are URIs that have an institutional commitment to persistence, availability, etc. Uniform Resource Citations, or Uniform Resource Characteristics (URCs), are sets of attribute/value pairs describing a resource. Some of the values may be URIs of various kinds. Others may include, for example, authorship, publisher, datatype, date, copyright status, and shoe size. URCs are not normally defined as a short string but as a set of fields and values with some free formatting.

The Method field indicates the method to be performed on the object identified by the URI. The GET method is always supported; many other methods are specified in the HTTP standard, and the set may be extended over time.
GET retrieves whatever data is identified by the URI; where the URI refers to a data-producing process or a script, it is the data produced that is returned, not the source text of the script or process. HEAD is the same as GET but returns only the HTTP headers and no document body.

Request headers consist of format headers with special field names, together with any other HTTP object headers or MIME headers. "Data" is the content of an object that is sent, depending on the method, with the request and/or the reply. MIME stands for Multipurpose Internet Mail Extensions and is the standard for specifying the type of a document.

HTTP request fields are header lines that are sent by the client in an HTTP protocol transaction. All lines are format headers, and the list of headers is terminated by an empty line. Many different fields are specified in the HTTP standard.

The Accept: field contains a comma-separated list of representation schemes (Content-Type metainformation values) that will be accepted in the response to this request; each entry may carry semicolon-separated parameters. The set given may, of course, vary across requests from the same user. This field may be wrapped onto several lines, and more than one occurrence of the field is allowed, with the same significance as if all the entries had been in one field. The format of each entry in the list is:

<field> = Accept: <entry> *[, <entry>]
<entry> = <content type> *[; <parameters>]

An example might be:

Accept: text/plain, text/html
Accept: text/x-dvi; q=.8; mxb=100000; mxt=5.0, text/x-c

for specifying plain text, HTML, and DVI files. In order to save time, and also to allow clients to receive content types of which they may not be aware, an asterisk "*" may be used in place of either the second half of the Content-Type value or both halves.
An example might be:

Accept: */*; q=0.1
Accept: audio/*; q=0.2
Accept: audio/basic; q=1

which may be interpreted as "if you have basic audio, send it; otherwise, send me some other audio; or, failing that, just give me what you've got."

Parameters on the content type are useful for describing resolution, color depth, etc. They allow a client to specify the resolution of its device in the Accept: field. This may allow the server to economize greatly on transmission time by reducing the resolution of an image, for example, or to select a more appropriate custom-designed black-and-white image rather than giving the client a color image to convert into monochrome. Examples may include dots per inch (DPI), color/monochrome, etc.

The Accept-Encoding: field is similar to Accept: but lists the content-encoding types that are acceptable in the response. Its format is:

<field> = Accept-Encoding: <entry> *[, <entry>]
<entry> = <content transfer encoding> *[, <param>]

An example would be Accept-Encoding: x-compress; x-zip for compressed and zip files.

The User-Agent: field specifies the software program used by the original client. The format of the field is:

<field> = User-Agent: <product>+
<product> = <word> [/<version>]
<version> = <word>

An example might be User-Agent: WebBrowser/1.0 for the browser program used to generate the request.

There are several header fields given with, or in relation to, objects in HTTP, all of which are optional. These headers specify metainformation, that is, information about the object, not the information contained in the object. Any header fields that are not understood should be ignored. The order of header lines within the HTTP header has no significance. The Content-Type: field differs slightly from MIME types. It is reasonable to put strict limits on transfer formats for mail, where there is no guarantee that the receiver will understand an obscure format.
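The wildcard and q-value rules for Accept: can be sketched as follows. This Python sketch is illustrative and not part of the source: `parse_accept` and `quality` are hypothetical names, q is assumed to default to 1 when absent, and the most specific matching range decides the quality of a content type. Other parameters such as mxb and mxt are parsed past but ignored.

```python
def parse_accept(value: str) -> list[tuple[str, float]]:
    """Split an Accept value into (media range, q) pairs; q defaults to 1."""
    ranges: list[tuple[str, float]] = []
    for entry in value.split(","):
        media, *params = [p.strip() for p in entry.split(";")]
        if not media:
            continue
        q = 1.0
        for param in params:
            name, _, val = param.partition("=")
            if name.strip() == "q":
                q = float(val)
        ranges.append((media, q))
    return ranges


def quality(content_type: str, ranges: list[tuple[str, float]]) -> float:
    """Return the q-value of the most specific range matching content_type."""
    ctype, _, csub = content_type.partition("/")
    best_specificity, best_q = -1, 0.0
    for media, q in ranges:
        mtype, _, msub = media.partition("/")
        if media == "*/*":
            specificity = 0          # matches anything
        elif mtype == ctype and msub == "*":
            specificity = 1          # matches the whole type, e.g. audio/*
        elif mtype == ctype and msub == csub:
            specificity = 2          # exact match
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q
```

With the audio example, `audio/basic` scores 1.0, any other audio type scores 0.2, and everything else falls back to 0.1, which matches the interpretation given in the text.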
Unlike mail, however, in HTTP the sender knows that the receiver will be able to accept a given format, because the receiver will have listed it in the Accept: field. There are therefore several advantages to a very complete registry of well-defined types for HTTP, which would not be recommended for mail. In this case, the Content-Type list for HTTP may be a superset of the MIME list. MIME provides for a number of "multipart" types. These are encapsulations of several body parts in one message. In HTTP, multipart types may be returned on the condition that the client has indicated acceptability (using Accept:) of the multipart type and also of the content types of each constituent body part.

The URI field gives the identifier with which the object may be found within the Web. There is no guarantee that the object can be retrieved using the URI specified. However, it is guaranteed that if an object is successfully retrieved using the URI, it will be, to a certain given degree, the object that the URI specified. If the URI is used to refer to a set of variants, then the dimensions in which the variants may differ must be given with the "vary" parameter:

URI = <uri> [; vary=dimension [, dimension]*]
dimension = Content-Type | language | version

If no "vary" parameters are given, then the URI may not return anything other than the same bit stream as the specialized object. Multiple occurrences of this field give alternative access names or addresses for the object. One example of this field is:

URI: http://www.ibm.com/products/product1.multi; vary=content-type

This indicates that retrieval using the URI will return the same document, never an updated version, but optionally in a different rendition. Another example of this field is:

URI: http://www.ibm.com/products/products.multi; vary=content-type, language, version

This indicates that the URI will return the same document, possibly in a different rendition, possibly updated, and without excluding the provision of translations into different languages.
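A URI field carrying the "vary" parameter could be split as in the following sketch. The function name `parse_uri_field` is hypothetical, dimension names are compared case-insensitively as an assumption, and the sketch assumes the URI itself contains no semicolon. An empty dimension list means the URI must always return the same bit stream.

```python
def parse_uri_field(value: str) -> tuple[str, list[str]]:
    """Split a URI field value into the URI and its list of vary dimensions."""
    uri, _, params = value.partition(";")
    dimensions: list[str] = []
    for param in params.split(";"):
        name, _, val = param.partition("=")
        if name.strip().lower() == "vary":
            # Dimensions are comma-separated, e.g. content-type, language, version.
            dimensions += [d.strip().lower() for d in val.split(",") if d.strip()]
    return uri.strip(), dimensions
```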
HTTP Response

An HTTP response message provides one possible message format, among others, for structuring a response to a rating-sensitive request according to the present invention. The response from the server starts with the following syntax:
<status line>  ::= <http version> <status code> <reason line> <CRLF>
<http version> ::= 3*<digit>
<status code>  ::= 3*<digit>
<digit>        ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<reason line>  ::= *<printable>
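A status line can be parsed and its code classified as in this Python sketch. `parse_status_line`, `status_class`, and the class labels are hypothetical; the version is taken as the first whitespace-delimited token (e.g. HTTP/1.0, as in the prose), and, following the advice that parsers accept any amount of white space between fields, any run of whitespace is treated as a separator.

```python
STATUS_CLASSES = {2: "Success", 3: "Redirection", 4: "Client error", 5: "Server error"}


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a status line into (version, code, reason)."""
    parts = line.strip().split(None, 2)
    code = parts[1] if len(parts) >= 2 else ""
    # The status code must be exactly three ASCII decimal digits.
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"malformed status line: {line!r}")
    reason = parts[2] if len(parts) == 3 else ""
    return parts[0], int(code), reason


def status_class(code: int) -> str:
    """Map a status code to its class by its first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")
```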
<http version> identifies the HyperText Transfer Protocol version being used by the server, e.g., HTTP/1.0. <status code> gives the coded result of the attempt to understand and satisfy the request as a three-digit ASCII decimal number. <reason line> gives an explanation for a human reader, except where necessary for particular status codes. Fields on the status line are delimited by a single blank, but parsers should accept any amount of white space. The response headers on returned objects are similar to the request headers, together with any MIME-conforming headers, notably the Content-Type field. Additional information may follow as response data in the format of a MIME message body. The significance of the data depends on the status code. The Content-Type used for the data may be any Content-Type that the client has expressed its ability to accept, e.g., text/plain or text/html.

There are several values of the numeric status code returned to HTTP requests, some of which are described here. The data sections of Error, Forward, and Redirection responses may be used to contain human-readable diagnostic information. "Success 2xx" codes indicate success. If present, the body section is the object returned by the request, and it is a MIME-format object, i.e., text/plain, text/html, or one of the formats specified as acceptable in the request. An "OK 200" code indicates that the request was fulfilled. An "Accepted 202" status code indicates that the request has been accepted for processing, but the processing has not been completed. When a "Partial Information 203" status code is received in response to a GET command, it indicates that the returned metainformation is not a definitive set for the object from a server holding a copy of the object, but comes from a private overlaid web. This may include annotation information about the object, for example.
A "No Response 204" status code indicates that the server has received the request, but there is no information to send back and the client should stay in the same document view. This is mainly to allow input for scripts without changing the document at the same time. The "Error 4xx" status codes are intended for cases in which the client seems to have erred, and the "Error 5xx" codes are for cases in which the server is aware that it has erred. The body section may contain a document describing the error in human-readable form. The document is in MIME format and may only be in text/plain, text/html, or one of the formats specified as acceptable in the request. The "Bad Request 400" status code indicates that the request had bad syntax or was inherently impossible to satisfy. The "Forbidden 403" status code indicates that the request is for something forbidden. The "Not Found 404" status code indicates that the server has not found anything matching the URI given. The "Not Implemented 501" status code indicates that the server does not support the facility required. The "Redirection 3xx" codes indicate an action to be taken, normally automatically, by the client in order to fulfill the request.

Rating Systems

As previously described, rated content is currently transmitted across the Web in a manner that allows a browser, in conjunction with a filter application, to screen objectionable content. Part of the Web infrastructure that allows a browser to screen content consists of a content label mechanism in conjunction with a rating system and a rating service. The current web infrastructure that supports rating systems may also be used in association with the rating-sensitive requests of the present invention. The Platform for Internet Content Selection (PICS™) specification enables labels (metadata) to be associated with Internet content.
The specification was originally designed to help parents and teachers control what children access on the Internet, but it also facilitates other uses for labels, including code signing and privacy. The PICS platform is one on which other rating services and filtering software have been built. Many authors and web site operators offer materials that they realize will not be appropriate for all audiences. They may label their materials to make it easier for filtering software to block access. PICS does not endorse any particular labeling vocabulary. The goal of the PICS effort is to enable a marketplace in which many different products and services will be developed, tested, and compared. Some organizations may rate items on well-known dimensions, using their own techniques and viewpoints to determine actual ratings. Other organizations may choose to develop their own dimensions for rating. This motivates the distinction between a rating system and a rating service. Some services may provide on-line access to their ratings from an HTTP server, while others may ship them in batches or distribute them on floppy disks or CD-ROMs. At the core of the PICS infrastructure is the rating service. The rating service either chooses an existing rating system or develops a new one to use in labeling content. The rating system, which is described in human-readable form at the rating system URL, specifies the range of statements that can be made. The rating service establishes criteria for determining who can label content using its name and how the labels must be applied. This combination of criteria and rating service is uniquely identified by a particular service URL. This service URL becomes the "brand" of the rating service. At a minimum, the service URL will return a human-readable form of the rating criteria and a link to the description of the rating system. The rating service is also responsible for delivering a service description file.
This is a machine-readable version of the rating system with pointers to the rating system URL and the rating service URL. While not required, it is recommended that this be available automatically at the service URL. A labeler, given authority by the rating service, uses the criteria established by the rating service, along with the rating system, to label content. These content labels contain a statement about the content of the resource being labeled and contain a link back to the service URL. Content labels can come in the content itself, with the content, or from a trusted third party, such as a label bureau. Policies determine what actions are taken based on the specific statements in the content label. If a content label is based on an unknown service URL, it is a simple (and potentially automatable) task to retrieve the appropriate service description file to understand what statements are being made in the label. A rating service is an individual, group, organization, or company that provides content labels for information on the Internet. The labels it provides are based on a rating system. Each rating service must describe itself using a newly created MIME type, application/pics-service. Selection software that relies on ratings from a PICS rating service can first load the application/pics-service description. This description allows the software to tailor its user interface to reflect the details of a particular rating service, rather than providing a "one design fits all rating service" interface. Each rating service selects a URL as its unique identifier. It is included in all content labels that the service produces to identify their source. To ensure that no other service uses the same identifier, it must be a valid URL. In addition, the URL, when used within a query, serves as a default location for a label bureau that dispenses this service's labels. 
A rating system specifies the dimensions used for labeling, the scale of allowable values on each dimension, and a description of the criteria used in assigning values. For example, the MPAA rates movies in the United States based on a single dimension with allowable values G, PG, PG-13, R, and NC-17. Each rating system is identified by a valid URL. This enables several services to use the same rating system and refer to it by its identifier. The URL that names a rating system can be accessed to obtain a human-readable description of the rating system. The format of that description is not specified as a standard. A content label, or rating, contains information about a document. A content label, or rating, has three parts: (1) the URL naming the rating service that produced the label; (2) a set of PICS-defined or extensible attribute-value pairs that provide information about the rating, such as the date that the rating was assigned; and (3) a set of rating-system-defined attribute-value pairs, which actually rate the item along various dimensions, also called categories. As previously described, rated content is currently transmitted across the Web in a manner that allows a browser, in conjunction with a filter application, to screen objectionable content. The current rating systems may also be used with the present invention to rate the content that is to be blocked from transmission. The Recreational Software Advisory Council (RSAC) is an independent, non-profit organization that empowers the public, especially parents, to make informed decisions about electronic media by means of an open, objective, content advisory system. The RSACi (RSAC on the Internet) system provides consumers with information about the level of sex, nudity, violence, offensive language (vulgar or hate-motivated) in software games and Web sites. To date, the RSACi system has been integrated into Microsoft's browser, Internet Explorer. 
The RSACi system provides a simple yet effective rating system for Web sites, which protects both children and the free-speech rights of everyone who publishes on the World Wide Web. When parents set the levels for their child within a Web browser, they may be offered an option that says, "Do not go to unrated sites." RSAC works closely with the PICS standard. FIG. 4 shows the four categories of the RSACi system, with the five levels and their descriptors. It is these levels that parents and other interested individuals set within a browser or filtering software using a document of type application/pics-service. A rating service is defined by a document of type application/pics-service. FIGS. 5A and 5B provide an example of a rating service described by a document of MIME type application/pics-service in accordance with the RSACi rating system. The MIME type application/pics-service is intended to describe a particular rating service in sufficient detail to automatically generate a user interface for configuring content selection software that relies on the rating service. FIGS. 6 through 9 show the steps in which user rating preferences are set as rating level parameters within Content Advisor, a user interface of Microsoft's Internet Explorer, using the rating service document from RSAC. FIG. 6 is a dialog box showing the Internet Properties dialog for entry into the Content Advisor dialog. FIG. 7 is a dialog box showing the Ratings tab, in which a user may select rating levels according to the RSACi rating system. FIG. 8 is a dialog box showing the General tab for setting general properties within the Content Advisor dialog. FIG. 9 is a dialog box showing the Advanced tab for setting rating system and rating bureau options. The quoted URL in the rating service field identifies the rating service. This identifier is included in all of the labels provided by the rating service.
Dereferencing the URL yields a human-readable description of the service. If the optional URL of an icon for the rating service is supplied, it is dereferenced relative to the rating service URL. The name of the rating system is intended to be short and human-readable, while the description is longer, suitable perhaps for a pop-up box. A complete human-readable description is available from the rating service's URL. The quoted URL in the rating system field identifies the rating system used by this service. Dereferencing the URL yields a human-readable description of the rating system. All remaining relative URLs in the application/pics-service description are dereferenced relative to the rating system URL, since they describe features of the rating system. The machine-readable description also describes the categories used in the rating system. There may be one or more categories for a given rating system. A single document may have a rating on any or all of these categories. A category has a "transmission name," which is used in the actual label for a document. Transmission names should be as short as reasonable, but they may be complete URLs if desired. They must be unique within a given rating system, i.e., two categories in the same rating system may not have the same transmission name. Unlike the name and the description strings, transmission names must be the same in all of them. Transmission names are case-sensitive to allow URLs to be used as transmission names. In addition to the transmission name, which is required, a category may optionally have an icon and a human-readable description. Categories may be nested within one another. In this case, the transmission name is created by starting with the outermost category's transmit-as string, adding a "/", and proceeding inward through the nesting.
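The nesting rule for transmission names, and the requirement that they be unique within a rating system, can be sketched in Python. The nested-dictionary representation of categories and the function names are hypothetical; comparison is deliberately case-sensitive, since transmission names are.

```python
def transmission_names(categories: dict[str, dict], prefix: str = "") -> list[str]:
    """Flatten nested categories into full transmission names joined by '/'."""
    names: list[str] = []
    for transmit_as, subcategories in categories.items():
        # Outermost transmit-as string first, then "/" and the inner names.
        full = f"{prefix}/{transmit_as}" if prefix else transmit_as
        names.append(full)
        names.extend(transmission_names(subcategories, full))
    return names


def validated_names(categories: dict[str, dict]) -> list[str]:
    """Return the names, rejecting duplicates (case-sensitive comparison)."""
    names = transmission_names(categories)
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate transmission names: {duplicates}")
    return names
```

For a hypothetical system with categories `{"violence": {"blood": {}}, "language": {}}`, the transmission names are "violence", "violence/blood", and "language".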
Values in PICS labels may be integers or fractions, with no greater range or precision than that provided by IEEE single-precision floating-point numbers. Names may be given to values by using the label attribute. When a value is given a name, it may optionally have an attached icon and a human-readable description. The description for each category can specify a restriction on the range of permissible values for certain named attributes. Once a label is created, it is distributed along with documents in one of several ways. The recommended method, if an HTTP server allows it, is to insert an extra header in the HTTP header stream that precedes the contents of documents sent to web browsers. The correct format, as documented in the specifications, is to include two headers: Protocol and PICS-Label. FIG. 10 shows an example of an HTTP response with the Protocol header and the PICS-Label header. The next best method is to distribute and obtain labels through a label bureau running on a server according to the PICS specification. If neither of these methods is available, a simpler but more limited method is to embed labels in HTML documents; this works only for HTML, not for images, video, or other content, and it is cumbersome to insert labels into every HTML document. Some browsers, notably Microsoft's Internet Explorer versions 3 and 4, will download the root document for a web server and look for a generic label there. A label consists of a service identifier, label options, and a rating. The service identifier is the URL chosen by the rating service as its unique identifier. Label options give additional properties of the document being rated, as well as properties of the rating itself, such as the time the document was rated. The rating itself is a set of attribute-value pairs that describe a document along one or more dimensions. One or more labels may be distributed together as a list. A specific label applies to a single document.
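Extracting the rating part of a label and comparing it with a user's limits might look like the following sketch. It assumes the rating appears as `r (name value ...)` or `ratings (...)` inside the label, with single-letter category names (assumed here to stand for nudity, sex, violence, and language); the example label, service URL, and limits are hypothetical. The regular expression is a simplification that does not handle parentheses inside quoted strings or multi-valued ratings.

```python
import re

# Hypothetical simplification: find "r (" or "ratings (" and capture up to ")".
RATING_RE = re.compile(r"\br(?:atings)?\s*\(([^)]*)\)")


def parse_rating(label: str) -> dict[str, float]:
    """Extract the attribute-value pairs from the rating part of a label."""
    match = RATING_RE.search(label)
    if match is None:
        return {}
    tokens = match.group(1).split()
    if len(tokens) % 2:
        raise ValueError("rating must contain attribute-value pairs")
    return {tokens[i]: float(tokens[i + 1]) for i in range(0, len(tokens), 2)}


def exceeds(rating: dict[str, float], limits: dict[str, float]) -> list[str]:
    """Return the categories whose value is above the user's limit."""
    return [cat for cat, value in rating.items() if cat in limits and value > limits[cat]]
```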
If a labeled document is in HTML format, it may refer to other documents, either by external reference (for example, using the <A HREF=...> tag) or by requesting that they be displayed in-line (for example, using the <img...> or <object...> tag). A label applies to the given document only, not to the referenced documents. A generic label (identified by the use of the "generic" option) applies to every document whose URL begins with a specific string of characters (specified using the "for" option). A generic label does not have the expected semantics of a "default" label that can be overridden by more specific labels. While a specific label does override a generic label when a client has access to both, the two labels may be distributed separately, so a client may have access to only the generic label. A server can keep track of defaults and overrides and generate a specific label based on a default that is not overridden in its local database. However, a generic label for a site or directory should only be distributed if it applies to all the documents in that site or directory. A rating service may provide a generic label for any or all prefixes of a given URL but should provide only one specific label for that URL. When the specific label for a document can be found, it should be used in preference to any generic label. Lacking a specific label, any generic label may be substituted, but preference should be given to the generic label with the longest "for" string. Some PICS client software may impose restrictions on the use of generic labels. For example, a client may choose to ignore a generic label that applies to a node in the URL tree more than two levels above the node where the document is located. PICS labels can also be retrieved separately from the documents to which they refer. To request labels in this way, a client contacts a label bureau or rating bureau. A label bureau is an HTTP server that understands a particular query syntax.
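The preference order for specific and generic labels can be sketched as a lookup. In this hypothetical Python sketch, labels are opaque strings, specific labels are keyed by exact URL, and generic labels are keyed by their "for" prefix; the optional client restriction on distant generic labels is not modeled.

```python
from typing import Optional


def choose_label(url: str, specific: dict[str, str],
                 generic: dict[str, str]) -> Optional[str]:
    """Prefer the specific label; otherwise the generic label with the longest 'for' prefix."""
    if url in specific:
        return specific[url]
    matches = [prefix for prefix in generic if url.startswith(prefix)]
    if not matches:
        return None  # the document is unlabeled
    return generic[max(matches, key=len)]
```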
A label bureau can provide labels for documents that reside on other servers and for documents available through protocols other than HTTP. Rating services have been encouraged to act as label bureaus, providing on-line access to their own labels. By default, the URL that identifies a rating service also identifies its label bureau. If a client requests the URL that identifies a rating service, a human-readable description of the service is returned. If, on the other hand, a client requests the same URL and includes query parameters, the request should be interpreted as a request for labels. A rating service, however, is not required to act as a label bureau. A rating service may choose a different URL, perhaps even on a different HTTP server, to act as its label bureau. If a label bureau is used, additional time is needed to retrieve labels before the documents are retrieved. A document server may wait for a response from a label bureau server before responding to the original document request, and this additional retrieval time can create significant overhead. Two problems arise while using the PICS system. The first is a synchronization problem, which arises when a document has been examined and a label generated, but the document is later modified without the label being updated. This can happen legitimately when someone updates a document but forgets to update its label. It can also happen as a result of tampering with the document by an unauthorized party. PICS labels contain three option fields intended to help deter this kind of problem: a creation date, an expiration date, and a checksum. If the objective is simply to detect accidental changes, then the date of last modification of the document can be determined when the label is created and stored in the "at" field. Assuming that the last modification time is accurately maintained, this will detect updates to the document made after the label was created.
If the document is expected to be updated infrequently or periodically, the "until" or "exp" option can contain an expiration date that causes the label to become invalid before the document is next updated. Neither date check guards against a concerted, malicious attack. If the label is intended to apply only to the data that was actually rated, then a form of checksum, called a "message digest," can be computed over the data when the label is created. The message digest is converted into US-ASCII characters using MIME base-64 encoding and stored in the mic-md5 field (also called md5). When the document is later retrieved, the same algorithm can be used to recompute the message digest, and the two digests can be compared. The MD5 algorithm is designed so that it is extremely unlikely that the two message digests will be the same if the document has been tampered with in any way. The second problem is that of tampering with or forging labels. The end user needs some way of being reassured that the label received was created by the expected rating service and has not been altered since it was created. PICS addresses this problem by allowing labels to be "digitally signed." A digital signature, while not currently legally recognized, is a cryptographic technique that provides exactly this assurance. However, the use of digital signatures greatly increases the complexity of implementing the rating system. Regardless of the rating system or the means for implementing it on the Internet, a serious security problem occurs when rated material, or any sensitive information, is downloaded to a user's computer. The problem occurs because the user's computer caches each downloaded item in a memory cache. The caching is performed automatically, without regard to whether the information being downloaded is of a sensitive or rated nature.
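The three synchronization checks described earlier in this section (the "at" date, the "until"/"exp" expiration, and the md5 digest) can be combined into one validity test. This Python sketch is hypothetical: `md5_field` and `label_is_valid` are illustrative names, and the dates are assumed to have already been parsed from the label into timezone-aware `datetime` values.

```python
import base64
import hashlib
from datetime import datetime, timezone
from typing import Optional


def md5_field(document: bytes) -> str:
    """Base64-encoded MD5 digest, the form stored in a label's mic-md5 (md5) option."""
    return base64.b64encode(hashlib.md5(document).digest()).decode("ascii")


def label_is_valid(document: bytes, last_modified: datetime, *,
                   at: Optional[datetime] = None,
                   until: Optional[datetime] = None,
                   md5: Optional[str] = None,
                   now: Optional[datetime] = None) -> bool:
    """Apply whichever of the three checks the label provides."""
    now = now or datetime.now(timezone.utc)
    if until is not None and now > until:
        return False  # the label has expired
    if at is not None and last_modified > at:
        return False  # the document changed after it was labeled
    if md5 is not None and md5_field(document) != md5:
        return False  # the content differs from what was rated
    return True
```

MD5 is no longer considered resistant to deliberately constructed collisions, so in this sketch the digest check mainly detects accidental or casual changes; forged labels are the concern addressed by the digital signatures discussed above.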
A memory cache is most often defined as a small, fast memory holding recently accessed data, designed to speed up subsequent access to the same data. It is most often applied to processor-memory access but is also used for a local copy of data accessible over a network or the Internet using a web browser. When data is read from or written to main memory, a copy is also saved in the cache, along with the associated main memory address. The web browser monitors the addresses of subsequent reads to see if the required data is already in the cache. If it is already in the memory cache (a cache hit), the data is returned immediately from the memory cache, and the main memory read is aborted or not started. If the data is not cached (a cache miss), it is fetched from the network connection and also saved in the cache. Generally, conventional browsers define the cache as an allocation of memory addresses. Recently visited web pages are cached in the allocated addresses of the computer's memory. In addition, the browser cache may include an allocation of space on a hard disk drive, set aside as a cache folder for storing recently visited web pages. Upon receiving a web page request, the browser first checks the memory cache for the requested web page. If the requested web page has been visited during the current session, and that visit occurred recently enough that the page has not been overwritten by more recently visited web pages, a cache hit occurs, and the web page contents are loaded from the memory cache. If a memory cache hit does not occur, the browser then checks the disk cache for the requested web page. If a disk cache hit occurs, the page information is loaded from the disk cache; if not, the browser loads the web page from the requested web site on the Internet. Browser cache management is crucial to the effective operation of a browser.
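The lookup order just described (memory cache, then disk cache, then the Internet) can be sketched as a small class. `BrowserCache` is hypothetical, dictionaries stand in for both cache levels, and saving a fetched page into both levels at once is an assumption made for brevity.

```python
from typing import Callable


class BrowserCache:
    """Hypothetical two-level cache: memory first, then disk, then the network."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self.memory: dict[str, bytes] = {}
        self.disk: dict[str, bytes] = {}   # stands in for the on-disk cache folder
        self.fetch = fetch                 # loads a page from its web site
        self.log: list[str] = []           # records how each request was served

    def get(self, url: str) -> bytes:
        if url in self.memory:             # memory cache hit
            self.log.append("memory hit")
            return self.memory[url]
        if url in self.disk:               # disk cache hit
            self.log.append("disk hit")
            return self.disk[url]
        self.log.append("miss")            # miss: load from the Internet
        page = self.fetch(url)
        self.memory[url] = page            # save a copy in both cache levels
        self.disk[url] = page
        return page
```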
Because of the difference in transfer rates between the browser and the Internet connection on one hand, and between the browser and the onboard memory or hard drive on the other, the most effective browser operation is to load the contents of requested web pages directly into the browser-defined cache. Subsequent requests for the same web pages are then handled by loading them from the cache rather than from the web site on the Internet. Thus, the user is not subjected to long idle periods waiting for the web page to be loaded over an Internet connection. However, even moderate browsing generates vast amounts of data associated with recently visited web pages. The operation of a browser may be impeded if the browser must search reams of cache memory for a cache hit after each web page request. Therefore, most conventional browsers allow users to select predefined cache limits for both the memory cache and the hard disk cache. Once these limits are reached, the oldest data is overwritten with current web page contents, so the amount of memory and hard disk space allocated to the cache remains manageable. Nonetheless, the caching of sensitive data by the web browser remains a problem in current browser technologies. Users often request sensitive or private information from web sites. With the advent of more secure encryption, the Internet is quickly becoming the distributed network of choice for financial institutions, government agencies, and professional groups. When a user accesses a web site that supports sensitive data, the user generally must present a valid user identification and password before being granted access to the requested data. The data is then usually encrypted and sent to the user's browser. Once the requested page is loaded onto the user's computer by the browser, a breakdown in security occurs.
This happens because the requested data, which was handled as privileged data by the web server, is treated by the web browser like any other data, without regard to its sensitive nature. Sensitive data, or rated data, is given no more consideration by the web browser than any other type of data. Therefore, anyone having access to the user's browser may access the entire contents of the browser's cache. Any sensitive, important, rated, business, or technical data stored in browser cache may be accessed without a valid user identification or password. In a preferred embodiment of the present invention, however, the content of a web page is filtered solely as a function of the web browser. In the present invention, a user sets preference parameters that filter web page contents from being stored in the cache. Cache filters may take a variety of forms, such as ratings filters, web page identifier filters, and key word filters; a key word filter scans the accessed contents of a web page for user-selected terms. Conversely, in another preferred embodiment of the present invention, the user sets preference parameters that filter web page contents to override the block-from-cache preferences and to store filtered web pages that were previously designated as web pages not to be cached. Preferred embodiments of the present invention are now described with respect to FIGS. 11 and 12. With reference now to FIG. 11, a flowchart depicts client-side processing for setting cache preference options, which generate the cache save and cache block parameters used for storing web page content in, or blocking it from, a browser-defined cache. In this example, the client computer may be similar to data processing system 300 previously described with respect to FIG. 3, which is similar to clients 108, 112, and 110 previously described with respect to FIG. 1. Client 108 may be configured with browsing software similar to Microsoft Internet Explorer, available from Microsoft Corporation.
FIG. 11 is a flowchart depicting the process of setting user-defined filter preferences for blocking web page contents from a browser-defined cache in accordance with a preferred embodiment of the present invention. Initially, it should be noted that the process is contained and performed entirely within a web browser. Thus, a conventional web browser requires modification in order to perform the steps described herein. The process of setting filter preference options for filtering a browser cache begins by opening a web browser and then opening the browser preference option menu (step 1102). The cache preference options for defining filter preferences are of two types: block-from-cache filter options and save-to-cache filter options. The web page filters work identically for both the block-from-cache and save-to-cache options, in that each filter merely identifies a criterion set by the user. The criterion might be the identity of the web page, the nature of its contents, or the rating of the web page. At step 1104, the user selects the block-from-cache preferences option. This preference option filters web pages from being written to the browser-allocated cache according to the criteria discussed above. Thus, when a web page is opened and the browser determines that the web page will not be cached, once that page is closed, the browser must again read the web page through the Internet or distributed network connection rather than finding the web page contents in browser cache. In accordance with a preferred embodiment of the present invention, cache content filter preferences may take one of three possible forms. The first form is the rating filter, as discussed above with respect to FIGS. 6 through 10. At step 1106, the user sets the filter rating preferences, which determine which web pages will be blocked from entry into the cache and not saved. Next, the user sets the site filter preferences (step 1108).
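A rating filter of the kind set at step 1106 might compare each category in a page's rating label with a user-chosen ceiling. In this hypothetical sketch, a rating is a mapping from category name to numeric level, the user's preference gives the highest level that may still be cached, and categories the user has not limited are ignored; the category names and the function `blocked_by_rating` are illustrative assumptions, not part of any rating service's definition.

```python
from typing import Mapping


def blocked_by_rating(page_rating: Mapping[str, int],
                      max_cacheable: Mapping[str, int]) -> bool:
    """Return True if any rated category exceeds the user's cacheable limit.

    Categories without a user limit are not filtered, and categories absent
    from the page's rating label are treated as unrated.
    """
    return any(
        category in max_cacheable and level > max_cacheable[category]
        for category, level in page_rating.items()
    )


# Hypothetical step 1106 preferences: highest level per category that may be cached.
prefs = {"violence": 1, "language": 2}
print(blocked_by_rating({"violence": 0, "language": 2}, prefs))  # False
print(blocked_by_rating({"violence": 3}, prefs))                 # True
```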
Site filter preferences are merely a list of web sites selected by the user that, when accessed by the browser, will be filtered from cache. Thus, the web site will not be saved to cache. Examples of such web sites might be a user's online brokerage house or bank, or perhaps web sites associated with the user's profession. Once a site is selected in the block-from-cache preferences, that site will no longer be stored in cache when it is accessed by the browser. The final form of cache filter is a content or key word filter. The user selects content or key word filter preferences in the same manner as for the previously mentioned filter types (step 1110). Here, the user selects key words or groups of key words; any accessed web page containing the selected words is filtered from being stored in browser cache. In accordance with another preferred embodiment of the present invention, the key word filter function of the browser performs similarly to a search engine, in that the browser searches an open page for key words or groups of key words. Upon finding the key words in the page content, the browser denies that page entry into browser cache. Thus, as with a conventional search engine, the user may select groups of words connected by logical operators, such as AND, OR, or NOR. In one example, a user may be engaged in a profession, such as intellectual property, and may prefer that web pages related to this profession not be stored in cache, so that anyone examining the user's browser cache could not determine the subject matter of a case the user might be pursuing. In order to keep pages relevant to the practice of intellectual property out of cache, the user may select filter terms, such as "intellectual property," "patents," and "trademark," as key word preferences.
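The search-engine-style key word filter described above can be sketched as a small recursive evaluator. This is an illustration under stated assumptions: the nested-tuple expression format and the function name `matches` are hypothetical, since no particular query syntax is specified, and each term is matched as a case-insensitive substring of the page text rather than as a whole word.

```python
from typing import Tuple, Union

# A key word expression is a phrase or a tuple (operator, operand, operand, ...).
Expr = Union[str, Tuple]


def matches(expr: Expr, text: str) -> bool:
    """Evaluate an AND/OR/NOR key word expression against a page's text."""
    if isinstance(expr, str):
        return expr.lower() in text.lower()  # case-insensitive phrase search
    op, *operands = expr
    results = [matches(operand, text) for operand in operands]
    if op == "AND":
        return all(results)
    if op == "OR":
        return any(results)
    if op == "NOR":
        return not any(results)  # true only when none of the operands appear
    raise ValueError(f"unknown operator: {op}")


block_words = ("OR", "intellectual property", "patents", "trademark")
page = "Reminder: filing deadlines for patents in the pending case"
print(matches(block_words, page))  # True, so the page would be kept out of cache
```

Because matching is by substring, "trademark" also matches "trademarks"; a whole-word match would be a stricter design choice.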
The Boolean expression entered by the user may look something like "patents OR trademarks OR intellectual property." However, these are extremely broad terms and might very well cause an inordinate amount of unrelated data to be blocked from entry in the browser's cache. To the possible detriment of the user, the key words in the above example would also block from entry in browser cache any general-information web pages related to the U.S. Patent and Trademark Office, or even pages concerning actions taken by the Commissioner. These web pages would be excluded from browser cache even though generic or political articles about the Patent and Trademark Office have little to do with the user's particular assignment or case. Returning to the flowchart depicted in FIG. 11, the user sets save-to-cache preferences (step 1112). The save-to-cache filter works in the opposite manner from the block-from-cache filter discussed above with respect to step 1104. That is, once the content filter determines that a web page meets save-to-cache criteria, the browser saves that web page to cache. Clearly, the normal operating condition for a web browser is to save to cache all web pages that are opened by the browser. However, the present invention blocks certain web pages that are opened with the browser from being cached based on the web page rating, site identifier, or contents. Therefore, the save-to-cache filter works in opposition to the block-from-cache filter, in that certain web pages that would ordinarily be excluded from the browser cache by the block-from-cache filter are instead saved to browser cache by the save-to-cache filter. At step 1114, the user selects rating filter preferences that determine which web pages are to be saved to cache based on the web page rating. Clearly, this step could be performed in conjunction with step 1106 by merely not specifying certain parameters to be blocked from cache.
However, in the interest of expedience, the user may make a blanket selection of ratings to be filtered from being cached and, at step 1114, merely specify certain ratings that the user intends to be saved. At step 1116, the user sets the web site filter preferences for saving selected web sites. The site preference operates similarly to the rating preference: at step 1108, the user may select an entire site, including all of its related sites, to be blocked from cache. For instance, the user may select USPTO.gov to be blocked from cache. Thus, no web page with a USPTO.gov path would be written to cache. Within the USPTO.gov site, however, the user may have access to important material that is not of a private or sensitive nature. An example of this might be the USPTO's edition of the Manual of Patent Examining Procedure (MPEP). If the user sets a save-to-cache site preference for that material at step 1116, then even though the user previously selected block-from-cache preferences that would block all site paths containing "USPTO.gov," the web browser would still allow any path including the /mpep extension to be stored in cache. At step 1118, the user selects key word filter preferences that allow web pages containing the selected key words to be stored in cache. A specific example here might be using the name of the acting Commissioner of Patents, Q. Todd Dickinson, as a key word filter. As a result, even though a page may contain the word patent, trademark, or intellectual property, which would otherwise block the web page from cache as discussed above with respect to step 1110, the words "Q. Todd Dickinson" would force the web page to be written to cache, contrary to the filter preference set at step 1110. Finally, the user determines whether to give priority to the block-from-cache preferences or the save-to-cache preferences (step 1120).
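The USPTO.gov and Q. Todd Dickinson examples above can be traced with a short sketch in which save-to-cache site and key word preferences override block-from-cache ones. The functions `site_matches`, `contains_any`, and `should_cache`, the "domain/path" pattern format, and the sample URLs are all hypothetical illustrations. In this sketch save-to-cache preferences always win, which corresponds to only one of the two priority settings the user can choose at step 1120.

```python
from typing import Iterable
from urllib.parse import urlsplit


def site_matches(url: str, pattern: str) -> bool:
    """True if url falls under pattern, e.g. "uspto.gov" or "uspto.gov/mpep"."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    domain, _, path_prefix = pattern.lower().partition("/")
    if not (host == domain or host.endswith("." + domain)):
        return False
    # A bare domain covers every path; otherwise the URL path must start with the prefix.
    return not path_prefix or parts.path.lower().startswith("/" + path_prefix)


def contains_any(text: str, words: Iterable[str]) -> bool:
    """Case-insensitive check for any of the key words in the page text."""
    lowered = text.lower()
    return any(word.lower() in lowered for word in words)


def should_cache(url: str, text: str, block_sites: Iterable[str],
                 save_sites: Iterable[str], block_words: Iterable[str],
                 save_words: Iterable[str]) -> bool:
    """Save-to-cache site and key word preferences override block-from-cache ones."""
    blocked = any(site_matches(url, s) for s in block_sites) or contains_any(text, block_words)
    saved = any(site_matches(url, s) for s in save_sites) or contains_any(text, save_words)
    return saved or not blocked


prefs = dict(
    block_sites=["uspto.gov"],                                     # step 1108
    save_sites=["uspto.gov/mpep"],                                 # step 1116
    block_words=["patent", "trademark", "intellectual property"],  # step 1110
    save_words=["Q. Todd Dickinson"],                              # step 1118
)
print(should_cache("https://www.uspto.gov/news/", "Office news", **prefs))          # False
print(should_cache("https://www.uspto.gov/mpep/", "Examining procedure", **prefs))  # True
print(should_cache("https://news.example.com/1",
                   "Q. Todd Dickinson on patent reform", **prefs))                  # True
```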
In the present invention, browser operations differ from those of a conventional browser in that not every web page accessed is stored in cache. In an alternative embodiment of the present invention, the user selects priorities for implementing the filter preferences. Depending on the sequence in which the filters are applied, a web page may or may not be saved to cache. Step 1120 gives the user the option of determining which set of preferences has priority over the other, because performing block-from-cache filtering and then save-to-cache filtering can produce a different outcome from performing save-to-cache filtering and then block-from-cache filtering. If, for example, the user chooses to give priority to block-from-cache preferences, the browser sets the block-from-cache filter preferences as dominant (step 1122). Alternatively, if the user chooses not to give priority to the block-from-cache preferences, the browser sets the save-to-cache preferences as dominant (step 1124). The process then ends.
FIG. 12 is a flowchart depicting a process for blocking a web page from a browser-defined cache. Initially, a user makes a request for a web page (step 1202). After the web browser has sent the request to the appropriate web server, the browser receives the web page, which includes the web page identifier, the web page contents, and the web page rating (step 1204). Ordinarily, a conventional browser saves the requested web page to cache without regard to the web page contents, rating, or identity. In the present invention, however, rather than automatically caching the requested web page, the browser filters the web page identifier, contents, and rating using the block-from-cache preferences (step 1206). Note that the logical flow of the process depicted in FIG. 12 reflects a preferred embodiment of the present invention, in which block-from-cache preferences are given dominance over the conventional automatic caching of all requested web pages. In addition, save-to-cache preferences are then given dominance over the block-from-cache preferences in this preferred embodiment. In practice, the browser determines which web pages are to be saved to cache based on save-to-cache preferences, and these web pages are automatically saved. All remaining web pages are also saved to cache, unless they meet a block-from-cache criterion. After step 1206, the browser checks the web page identifier, contents, and rating against the save-to-cache preferences (step 1208). This step is performed only for web pages that were determined in step 1206 to be blocked from cache. Thus, a web page that step 1206 selected not to be written to cache may ultimately be written to cache anyway if the user has selected the appropriate save-to-cache filter preferences. In another embodiment of the present invention, after the browser filters a web page's identifier, contents, and rating against the block-from-cache option (step 1206) and the save-to-cache option (step 1208), the browser checks whether the user has activated the manual block-from-cache override for the specific web page being viewed (step 1210). This further refinement enables a user to override filter preference options that are set in advance of web browsing sessions and may not be entirely applicable to certain accessed web pages under unique circumstances. Therefore, in an alternative embodiment of the present invention, a hot key, or hot spot, is provided on the browser which, when activated, overrides the block-from-cache option set by the user. The user may thereby force into cache web pages that would otherwise be blocked from cache by filter preference settings.
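The checks described for steps 1206 through 1210, together with the priority chosen at steps 1120 through 1124 and the per-page hot key, reduce to a small save-or-block decision. This hypothetical sketch assumes the individual rating, site, and key word filters have already produced two flags, `blocked` and `saved`; the function name and parameters are illustrative, not an actual browser API.

```python
def write_to_cache(blocked: bool, saved: bool, save_dominant: bool = True,
                   manual_override: bool = False) -> bool:
    """Decide whether an open page is written to the browser cache.

    blocked: the page met some block-from-cache criterion (step 1206).
    saved: the page met some save-to-cache criterion (step 1208).
    save_dominant: True for the step 1124 setting, False for step 1122.
    manual_override: the user pressed the save-to-cache hot key (step 1210).
    """
    if manual_override:
        return True   # the hot key forces only this page into cache
    if not blocked:
        return True   # no block criterion: cache as a conventional browser would
    if save_dominant:
        return saved  # a save-to-cache match rescues a blocked page
    return False      # block-from-cache preferences take priority


print(write_to_cache(blocked=True, saved=False))                        # False
print(write_to_cache(blocked=True, saved=True))                         # True
print(write_to_cache(blocked=True, saved=True, save_dominant=False))    # False
print(write_to_cache(blocked=True, saved=False, manual_override=True))  # True
```

The default `save_dominant=True` mirrors the preferred embodiment of FIG. 12, in which save-to-cache preferences dominate block-from-cache preferences.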
With this override, rather than resetting filter preferences to accommodate a single web page, the user may activate a save-to-cache hot spot or hot key to save only the currently open web page to cache. Upon the opening of another web page, the browser resumes reliance on the preference settings, unless those settings are again circumvented by use of the hot key. After comparing both the block-from-cache preferences and the save-to-cache preferences to the identity, content, and rating of the open web page, the browser determines whether to block the requested web page from cache (step 1212). At step 1212, if the browser determines that some block-from-cache criterion is met and no save-to-cache criterion is met, the web page is not written to cache, and the process ends. If, on the other hand, the web page meets no block-from-cache criterion, or if a save-to-cache criterion is met, the web page is saved to cache (step 1214). The process then ends.
While the descriptions of the preferred embodiments of the present invention relate to accessing and downloading web pages, other embodiments relate to alternative browser functionality. Browsers also provide a user with a tool for invoking applications and routines not related to the Internet or web page access. For instance, browsers are used as viewers to examine the contents of documents and files. Once these documents and files are opened, the information in them is cached in browser cache. Additionally, browsers are used to invoke specific application programs and applets for performing application functions. The information displayed in the browser viewer is normally cached in the browser cache, similar to a web page. In an alternative embodiment of the present invention, a user may select block-from-cache and save-to-cache preferences related to any sensitive documents or applications.
Thus, files and documents that are opened, created or viewed by the browser may be blocked from entry in browser cache. It is important to note that, while the present invention has been described in the context of a fully functioning data processing system, those of ordinary skill in the art will appreciate that the processes of the present invention are capable of being distributed in a form of a computer readable medium of instructions and a variety of forms, and that the present invention applies equally regardless of the particular type of signal bearing media actually used to carry out the distribution. Examples of computer readable media include recordable-type media, such as floppy discs, hard disk drives, RAM, and CD-ROMs and transmission-type media, such as digital and analog communications links. The description of the present invention has been presented for purposes of illustration and description but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.