Method and system for consistent update and retrieval of document in a WWW server
6510439
Abstract
A system for providing coherent access to different versions of a group of documents stored in a file system and retrievable over the Internet from an HTTP server includes a state management server which stores registration data indicating the file paths of the documents in each version of the group and a set of index paths used by clients to reference documents in the group. State information identifying the version of said group previously accessed by a client is stored in a cookie which is associated with the domain of the state management server and the path of the group. A client requests a document from the group by issuing a request to an HTTP server including the index path of the desired document. The request and associated cookie, which is automatically transmitted by the client Internet software, is forwarded to the state management server. The state information stored in the cookie is extracted and used to determine which version of the group of documents should be accessed. The index path is then mapped to the file path for the appropriate version of the requested document and the data information is updated to reflect the present access. The mapped file path and cookie are then returned to the HTTP server.
Claims
We claim:
Description
TECHNICAL FIELD
INDEX PATH                               FILE PATH
/usr/httpd/docs/john/home.html           /usr/httpd/docs/john/home.html
/usr/httpd/docs/john/thesis/toc.html     /usr/httpd/docs/john/thesis/v1/toc.html
/usr/httpd/docs/john/thesis/ch1.html     /usr/httpd/docs/john/thesis/v1/ch1.html
/usr/httpd/docs/john/thesis/ch2.html     /usr/httpd/docs/john/thesis/v1/ch2.html
When a new version of the group is created, i.e., in response to revising one or more of the documents in the group, the new version is recorded in the registration table. With reference to FIG. 3, the registration table 14 has been updated to identify a version 2 of the john_thesis group 30a. The record containing the file path set 36b for version 2 is identified by its group version number (2). This version contains 5 documents. Because an additional document has been added to version 2 of the group, as indicated by file path 40 in file path set 36b, the index path set is also updated to include a path reference 42 to the new file. The index paths and corresponding physical locations of the files in version two of group 30a are therefore:
INDEX PATH                               FILE PATH
/usr/httpd/docs/john/home.html           /usr/httpd/docs/john/home.html
/usr/httpd/docs/john/thesis/toc.html     /usr/httpd/docs/john/thesis/v2/toc.html
/usr/httpd/docs/john/thesis/ch1.html     /usr/httpd/docs/john/thesis/v2/ch1.html
/usr/httpd/docs/john/thesis/ch2.html     /usr/httpd/docs/john/thesis/v2/ch2.html
/usr/httpd/docs/john/thesis/ch3.html     /usr/httpd/docs/john/thesis/v2/ch3.html
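By way of illustration only, the position-based mapping between the index path set and the file path sets of versions 1 and 2 listed above might be sketched as follows. The names `INDEX_PATHS`, `FILE_PATHS` and `map_index_path` are hypothetical, and the use of `None` for a position that has no document in a given version is an assumed representation rather than the table layout of the figures.

```python
from typing import Dict, List, Optional

# Index path set of the john_thesis group, as listed in the tables above.
INDEX_PATHS: List[str] = [
    "/usr/httpd/docs/john/home.html",
    "/usr/httpd/docs/john/thesis/toc.html",
    "/usr/httpd/docs/john/thesis/ch1.html",
    "/usr/httpd/docs/john/thesis/ch2.html",
    "/usr/httpd/docs/john/thesis/ch3.html",
]

# File path sets keyed by group version number. Entries line up with
# INDEX_PATHS by position. None (an assumption of this sketch) marks a
# position that has no document in that version.
FILE_PATHS: Dict[int, List[Optional[str]]] = {
    1: [
        "/usr/httpd/docs/john/home.html",
        "/usr/httpd/docs/john/thesis/v1/toc.html",
        "/usr/httpd/docs/john/thesis/v1/ch1.html",
        "/usr/httpd/docs/john/thesis/v1/ch2.html",
        None,
    ],
    2: [
        "/usr/httpd/docs/john/home.html",
        "/usr/httpd/docs/john/thesis/v2/toc.html",
        "/usr/httpd/docs/john/thesis/v2/ch1.html",
        "/usr/httpd/docs/john/thesis/v2/ch2.html",
        "/usr/httpd/docs/john/thesis/v2/ch3.html",
    ],
}


def map_index_path(index_path: str, version: int) -> Optional[str]:
    """Map an index path to the file path at the same position in a version."""
    position = INDEX_PATHS.index(index_path)
    return FILE_PATHS[version][position]


def latest_version() -> int:
    """The most recent version is the one with the largest version number."""
    return max(FILE_PATHS)
```

Because the lookup is purely positional, an entry can be blanked out in one version without disturbing the mapping for the other versions.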
When a group is revised to delete a document, care must be taken to preserve the mapping of the paths in the index path group since the deleted document is still a part of one or more earlier versions of the group. As shown in FIG. 4, the document "toc.html" has been removed from file path set 36c representing version 3 of group 30a. Note that in this particular table embodiment, the mapping between the paths in the index path set 34a and the paths in the file path sets 36a-36c are determined by the order paths are listed. To preserve the mapping between the paths in the index path set 34a and the prior versions of the group, the deleted file path is replaced with a place holder 44. The corresponding index path 46 in index path set 34a still maps to the appropriate file paths 48, 50 in the file path sets 36a, 36b for the previous versions of the group. It should be noted that a new version does not need to add additional files. Instead, the number and designation of the files can remain the same with variations in the file contents. When a client requests a document through the HTTP server 16, the client identifies the document by its index path and the index path is mapped to the appropriate file path version. In order to determine which is the appropriate group version to map a client request to, some memory regarding prior accesses to the group by the client must be maintained. According to an aspect of the invention, this "state" information is maintained in the form of a "cookie" which is stored on the client's computer system and which is automatically forwarded to the HTTP server 16 by the client's Internet software. A cookie is a small data structure used by a web server to deliver state data to a web client user and request that the client store the information. The HTTP server supplying the cookie also adds the information about the domain and the subset of URLs for which the cookie is applicable. 
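By way of illustration only, the notion of a cookie being applicable to a domain and a subset of URLs might be sketched as follows. This is a simplified sketch: the `Cookie` record, the function names, the host name `www.example.com` and the VALUE string are hypothetical, and domain applicability is assumed to be an exact match, whereas actual client software may apply looser domain matching rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cookie:
    name: str    # data element name, e.g. a group ID
    value: str   # state data delivered by the server
    domain: str  # server-side domain for which the cookie is valid
    path: str    # URL prefix for which the cookie is valid


def applies_to(cookie: Cookie, request_domain: str, request_path: str) -> bool:
    """True if the cookie is valid for the requested domain and path.

    Assumes exact domain equality and a plain string-prefix test on the path.
    """
    return cookie.domain == request_domain and request_path.startswith(cookie.path)


def cookies_to_send(jar: List[Cookie], request_domain: str, request_path: str) -> List[Cookie]:
    """Select the stored cookies that accompany a request."""
    return [c for c in jar if applies_to(c, request_domain, request_path)]


# Hypothetical cookie jar holding one cookie scoped to the group path.
jar = [Cookie("john_thesis", "state", "www.example.com", "/usr/httpd/docs/john")]
sent = cookies_to_send(jar, "www.example.com", "/usr/httpd/docs/john/thesis/ch1.html")
```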
A typical cookie contains a NAME and VALUE pair which is used to define a data element and an associated value. A cookie also contains a DOMAIN field which stores data indicating the server-side domain for which the cookie is valid and a PATH field which stores data indicating the subset of URLs on the specified domain for which the cookie is valid. When a client makes a request to a server in a given domain, the client software matches the server's domain against the domain attribute of each cookie in its cookie list and determines which of the matching cookies have a path attribute which is a prefix of the requested URL. All the cookies that match both domain and path attributes are sent to the server along with the URL request. A cookie can also include an EXPIRES field which specifies the date and/or time at which the cookie will expire. After a cookie has expired, it is discarded by the client.
Through the use of the DOMAIN and PATH fields, a cookie can be configured to be valid for a specific Group Path 32 associated with a group of documents so that when a client attempts to access a document by referencing its index path, the cookie is forwarded to the HTTP server 16 along with the document request. According to the invention, the VALUE field of the cookie is used to store information indicating which version of the group was most recently accessed by the client and which documents in that group have already been accessed.
The specific operation of each of the elements illustrated in FIG. 1 and the use of cookies to store state information indicating the correct version of documents to be retrieved will now be discussed. The HTTP server 16 is the server-side front end which interacts with the clients 24. The HTTP server 16 receives a client request and parses it to extract the URL of the requested document. HTTP servers are designed to serve documents and in most cases do not process data sent from a client, such as data in the form of a cookie.
In such a situation, a gateway program is used to process the client data on the server end. In the Internet environment, the Common Gateway Interface ("CGI") is the mechanism which controls the flow of data from the HTTP server to the gateway program. According to the CGI specification, data is sent to the gateway programs through environment variables and read by the program from standard input. To return data back to the HTTP server, the gateway program writes out the data to its standard output, which is then read by the HTTP server and, after proper modifications to the data headers, returned to the client. In the present invention, a CGI script 18 is used as an interface between the HTTP server 16 and the State Management Server 12. When a client request is received, the HTTP server 16 sets the CGI environment variables to reflect the full URL of the requested document and the cookie(s) accompanying the client's HTTP request. The CGI script 18 is then executed. The script 18 is configured to establish an Internet socket connection with the State Management Server 12 and then forward the URL and any received cookies to the SMS 12. The particular implementation of such a CGI script will be apparent to one of skill in the art and is therefore not discussed in detail herein. The SMS 12 is configured to retain a copy of the most recent Registration Table 14 in memory and to use the data in the Registration Table to map client requests to the proper version of documents. Preferably, client updates to the Registration table are managed by a Group Specification User Interface ("GSUI") program 22. After an update to the Registration Table is made, the GSUI 22 sends an interrupt to the SMS 12 indicating that the Registration Table should be reloaded. Alternatively, the SMS 12 can load the Registration Table only on an as-needed basis. 
In a further embodiment, the Registration Table maintained by the GSUI 22 can be stored in memory which is shared by the SMS 12 such that the most recent version is automatically available.
When the SMS 12 receives a forwarded URL and cookie from the CGI program, it accesses the Registration Table data and determines the file path of the appropriate document for the client to receive according to the data contained in the cookie. The decision as to what version of a document should be provided to a client is made based on state information that is stored in the cookie, i.e., the last version of the group that was accessed and which documents in that group have already been requested. The cookie state information is then revised to indicate the new reference, and the determined file path and a new cookie are returned to the HTTP server 16 via the CGI script 18. The HTTP server then retrieves the identified document and returns it and the modified cookie to the client 24.
In a particular embodiment of the invention, a separate cookie is used for each group of documents. The group ID of the group with which a cookie is associated is encoded in the NAME field, e.g., NAME="john_thesis". The PATH is set to the group path of this group, e.g., PATH="/usr/httpd/docs/john". The DOMAIN field is set to equal the Internet domain address of the HTTP server 16. In conventional Internet Browser software, cookies are implemented with only one VALUE field. Thus, to store both the group version number and the document file access history, these two informational values must be combined.
In the preferred embodiment, a first portion of the VALUE field is used to store a group version number GVN and a second portion of the VALUE field is used to store information about what documents in that version have already been accessed by the client during the current logical session, i.e., in the form of a bit vector BV, where each bit corresponds to a path in the index path set 34 and the bit value indicates whether the client has accessed the corresponding file in that version or not. Alternatively, the values can be stored in separate cookies. A logical session begins when a client makes a request for a document in a document group without an accompanying cookie. When a request is received by the SMS without a cookie, the SMS accesses the Registration table and maps the received index path to the corresponding file path in the file path set of the most recent version of the group (i.e., the one with the largest group version number). The file path is then returned to the HTTP server 16 through the CGI script 18. The SMS 12 also creates a new cookie associated with the group and which contains a VALUE field identifying the most recent group version and the index path requested. If the client request is accompanied by a cookie, the SMS recognizes it as an ongoing session, updates the state information on the cookie and returns this cookie together with the appropriate file path of the version to be returned. A variety of techniques can be used to encode state information in a cookie. A particular implementation is now discussed through the following examples. For this example, the Value information in a cookie is limited to a 32 bit segment. If the maximum number of versions of a group that can be maintained is limited to 4, then the GVN portion of the VALUE field only needs to be 2 bits long. The remaining 30 bits comprise the bit vector BV. 
Each bit of BV represents a document in the group and the value of the bit specifies if a document of this particular version has previously been accessed by the client in the current logical session. If the bit is 1, then the document was previously accessed, otherwise not. In this example, given that BV is 30 bits long, a group can consist of at most 30 documents. It is apparent that for a fixed-length VALUE field, the number of documents in a group and the number of supported versions are related. However, this is not a major concern when the VALUE field is long or of unlimited length, or if more than one VALUE field is available.
For example, with reference to FIG. 3, if a client access received at HTTP server 16 for document "/usr/httpd/docs/john/thesis/toc.html" does not include an accompanying cookie from the client, the request is considered to be the start of a logical session. The SMS determines the position of the requested document in the index path set 34a and then maps this index path to the corresponding file path in the file path set 36b for the most recent version (here version 2). The SMS returns the file path "/usr/httpd/docs/john/thesis/v2/toc.html" as the file path and also creates and returns a cookie. The cookie NAME is "john_thesis", thus associating the cookie with the referenced group. The DOMAIN field is set to the domain of the HTTP server 16 and the PATH field in the cookie is set to the Group Path, "/usr/httpd/docs/john". To record the initial client access, the GVN portion of the VALUE field is set to 2, since version 2 is the most recent version accessed, and the BV portion of the VALUE field is set to the binary value "01000...", where the first five bits indicate that after this request is served, the client will have accessed only the second document in the group of five documents in this version. Because the group only contains 5 documents, any additional bits are don't cares.
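By way of illustration only, the packing of a 2-bit GVN and a 30-bit BV into a 32-bit VALUE might be sketched as follows. The function names are hypothetical. The sketch assumes that the GVN occupies the two most significant bits, that the leftmost BV bit corresponds to the first path in the index path set, and that the version number is stored directly (so only the values 0 to 3 fit); none of these layout choices is dictated by the description.

```python
from typing import Set, Tuple

GVN_BITS = 2   # supports at most 4 distinct version numbers
BV_BITS = 30   # supports at most 30 documents per group


def encode_value(gvn: int, accessed: Set[int]) -> int:
    """Pack a group version number and accessed positions into 32 bits."""
    if not 0 <= gvn < (1 << GVN_BITS):
        raise ValueError("group version number does not fit in the GVN portion")
    bv = 0
    for position in accessed:
        if not 0 <= position < BV_BITS:
            raise ValueError("document position does not fit in the bit vector")
        # Position 0 is the leftmost (most significant) bit of BV.
        bv |= 1 << (BV_BITS - 1 - position)
    return (gvn << BV_BITS) | bv


def decode_value(value: int) -> Tuple[int, Set[int]]:
    """Recover the group version number and the accessed positions."""
    gvn = value >> BV_BITS
    bv = value & ((1 << BV_BITS) - 1)
    accessed = {p for p in range(BV_BITS) if bv & (1 << (BV_BITS - 1 - p))}
    return gvn, accessed


def bv_prefix(value: int, n_docs: int) -> str:
    """Render the first n_docs bits of BV as a string such as '01000'."""
    bv = value & ((1 << BV_BITS) - 1)
    return format(bv, "0{}b".format(BV_BITS))[:n_docs]


# Example from the text: version 2, only the second document (position 1) accessed.
value = encode_value(2, {1})
```

For the example above, `bv_prefix(value, 5)` yields the five significant bits of the bit vector and `decode_value(value)` recovers the version number and the set of accessed positions.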
Subsequently, the client requests the document "/usr/httpd/docs/john/thesis/ch1.html" from the HTTP server 16. The prefix of this request ("/usr/httpd/docs/john") matches the PATH information in the cookie and the DOMAIN is also the same. Thus, the client will return the cookie to the HTTP server 16 along with the document request. When the request and cookie are forwarded to the SMS, the SMS reads the NAME field in the cookie to determine the Group ID of the group being accessed. The GVN and BV portions of the VALUE field are extracted. In this example, the GVN value indicates that a logical session has previously been established with access to version 2 of this document group. The index path in the request is mapped to the corresponding file path in the version 2 file path set 36b to identify the file path of the correct version 2 document to return, here "/usr/httpd/docs/john/thesis/v2/ch1.html". The SMS also revises the BV data field to be "01100", reflecting the fact that after the request is serviced, the second and third documents of this version will have been accessed by the client.
As a second example, another client issues a request to the HTTP server 16 for document "/usr/httpd/docs/john/thesis/ch1.html". A cookie is provided with this request having a NAME of "john_thesis" and where the GVN and BV components of the VALUE field are "1" and "1101x", respectively, where "x" is a don't care. The receipt of this cookie by the SMS indicates that the second client has already established a logical session with respect to version 1 of this group. Thus, the index path will be mapped to a file path in the file path set 36a corresponding to group version 1. Here, the returned file path is "/usr/httpd/docs/john/thesis/v1/ch1.html". In addition, based on the cookie BV field, after this request is serviced, the second client will have accessed all documents in version 1 of the group.
If this condition is defined to indicate the end of a logical session, the returned cookie is modified to indicate that a subsequent access to this group is the start of a new logical session. In one implementation, the BV is returned with all relevant bits set to "1", thus indicating that all documents in the version have been accessed. When the SMS receives a cookie with a BV having all document bits set to "1", it treats the situation essentially as if no cookie had been returned. Alternatively, if supported by the client software, the cookie can be returned with an EXPIRES field set to the current time. This indicates to the client system that the cookie has expired and should be deleted. Thus, in a subsequent access by the client, no cookie will be returned.
Another possible terminating event occurs when the client requests a document which has previously been requested, even if all the documents in the group have not yet been requested. In such a case, the SMS can consider the present logical session terminated and open a new session by returning the file path of the most recent version of the document and indicating the most recent version in the GVN portion of the VALUE field. The rationale behind this terminating event is the concept that if an already accessed document is being requested again, then the client does not appear to have a need for unrequested documents of the old version and hence it is reasonable to start a new logical session. It should be noted that if this end condition is implemented, setting all document bits in a BV to "1", i.e., after all documents have been accessed, will automatically initiate a new session on a subsequent access by the client.
A further mechanism through which a logical session can be terminated is the elapsing of a defined time-out period in the Registration Table.
If a client establishes a logical session but does not access a document in the group within the time out period, the logical session expires and the next access to the group will be considered a new logical session. A time-out ("TO") period can be defined globally, on a group basis, or separately for each version of a group. Various time-out periods 50 are defined in the Registration Tables illustrated in FIGS. 2-4. When a client makes a request for a document which has an associated time out period, the cookie returned by the SMS has its EXPIRES field set to be the present time plus the time out period. If the client makes subsequent requests for documents in this version, then the EXPIRES field in the cookie will be appropriately updated with each request. However, if the time between two requests to the group exceeds the time-out period, the expiration time for the cookie will elapse and the client Internet software will automatically discard the cookie. A subsequent request by the client to the group of documents will not be accompanied by a cookie and thus, a new logical session will be started and the latest version of the requested document will be returned. This use of the Time-Out period also serves as a guide to determine when the SMS can remove an old version of a group, i.e., perform garbage collection. According to one method, the SMS maintains a table of expiration times for each group version defined in the Registration Table. Each time the SMS receives a request for a document in version v of group G, the SMS updates the expiration time for that version to equal the current time plus the time out period. The passing of the expiration time indicates that there should be no clients with a logical session that is open to the associated version. Thus, on a periodic basis, the SMS can delete from the registration table stored in internal memory those versions with expiration times that have elapsed without interfering with any open logical sessions. 
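By way of illustration only, the table of expiration times and the periodic garbage collection described above might be sketched as follows. The class and method names are hypothetical, the current time is passed in explicitly so that the behavior is deterministic, and a single time-out period is assumed for all groups. The guard that never collects the most recent version of a group is an added assumption of this sketch, not a stated requirement.

```python
from typing import Dict, List, Tuple

VersionKey = Tuple[str, int]  # (group ID, group version number)


class VersionExpiry:
    """Tracks, per group version, the time after which no session can be open."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout = timeout_seconds
        self.expires_at: Dict[VersionKey, float] = {}

    def touch(self, group_id: str, version: int, now: float) -> float:
        """Record a request for version v of group G.

        Returns the new expiration time, which is also the value the
        cookie's EXPIRES field would carry (present time plus time-out).
        """
        deadline = now + self.timeout
        self.expires_at[(group_id, version)] = deadline
        return deadline

    def collect(self, now: float, latest: Dict[str, int]) -> List[VersionKey]:
        """Remove and return the versions whose expiration time has elapsed.

        Assumption: the most recent version of each group is kept even if
        its expiration time has passed, since new sessions map to it.
        """
        expired = [
            key for key, deadline in self.expires_at.items()
            if deadline < now and latest.get(key[0]) != key[1]
        ]
        for key in expired:
            del self.expires_at[key]
        return expired


expiry = VersionExpiry(timeout_seconds=60)
first_deadline = expiry.touch("john_thesis", 1, now=0)
expiry.touch("john_thesis", 2, now=100)
removed = expiry.collect(now=120, latest={"john_thesis": 2})
```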
The SMS 12 can also forward such expiration time information to the GSUI 22 to permit a similar garbage collection in the master copy of the Registration Table if desired. The method for consistent update and retrieval of documents in a WWW server performed by the SMS 12 will now be summarized with reference to the flow chart of FIG. 5. Initially, the SMS receives an index URL (referencing an index path from the Registration Table) from the client via HTTP server 16 and the CGI program 18 (step 60). The SMS examines the request and determines whether a cookie (containing state information) has been forwarded along with the client request (step 62). If no cookie is present, a new logical session is started. The SMS extracts a Group path and index path from the URL (step 64) and cross-references the Group information to the Registration table to determine the Group ID of the group referenced by the index URL (step 66). After the group is identified, the index path is mapped to the corresponding file path in the most recent version of the identified group (step 68). A new cookie is then generated containing state data which associates the cookie with the determined group and indicates the most recent group version along with information indicating the particular document entry which has been accessed (step 70). Finally, the mapped document URL and new cookie are returned to the client via the CGI script 18 and HTTP Server 16 (step 72). If a cookie is included with the client request, a logical session has previously been initiated. The Group ID and version for the session, as well as the group access history (e.g., in the form of a bit vector) are extracted from the cookie data (step 74) and the index path is extracted from the index URL (step 76). Based on the group access history, a determination is made as to whether a new logical session should be started (step 78). Various conditions for triggering a new logical session are discussed above. 
If a new logical session is warranted, the SMS 12 proceeds in a manner similar to the case where no cookie is present. The index URL is mapped to the corresponding file path in the most recent version of the group (step 80) and the cookie information is modified to reflect the new version number and the accessed document (step 82). The modified cookie and mapped URL are then returned to the client (step 72). If an existing logical session is continued, the SMS 12 maps the index URL to the corresponding file path in the group version indicated by the cookie data (step 84). The cookie information is modified to record the document access (step 86). Finally, the modified cookie and mapped document URL are returned to the client (step 72). While the invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.
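By way of illustration only, the flow of FIG. 5 summarized above might be sketched as follows, using the john_thesis group with versions 1 and 2. All names are hypothetical, and the cookie state is represented as a plain dictionary rather than an encoded VALUE field. The new-session test implements two of the terminating events discussed earlier (all documents accessed, or a repeated request). Treating a request for a document absent from the session's version as a new session is an added assumption, and the time-out mechanism is omitted.

```python
from typing import Dict, List, Optional, Set, Tuple

GROUP_ID = "john_thesis"
ROOT = "/usr/httpd/docs/john"
DOCS = ("toc.html", "ch1.html", "ch2.html", "ch3.html")
INDEX_PATHS: List[str] = [ROOT + "/home.html"] + [ROOT + "/thesis/" + d for d in DOCS]


def _paths(version: int, n_docs: int) -> List[Optional[str]]:
    """File path set for a version; None marks positions with no document."""
    paths = [ROOT + "/home.html"] + [
        "{}/thesis/v{}/{}".format(ROOT, version, d) for d in DOCS
    ]
    return [p if i < n_docs else None for i, p in enumerate(paths)]


# Version 1 holds 4 documents and version 2 holds 5, as in the tables.
FILE_PATHS: Dict[int, List[Optional[str]]] = {1: _paths(1, 4), 2: _paths(2, 5)}


def needs_new_session(position: int, version: int, accessed: Set[int]) -> bool:
    """Step 78: decide from the access history whether to start a new session."""
    docs = {i for i, p in enumerate(FILE_PATHS[version]) if p is not None}
    repeated = position in accessed   # document was already requested
    exhausted = docs <= accessed      # every document of the version accessed
    missing = position not in docs    # assumption: not present in that version
    return repeated or exhausted or missing


def handle_request(index_path: str, cookie: Optional[dict]) -> Tuple[str, dict]:
    """Map an index path to a file path and produce the updated cookie state."""
    position = INDEX_PATHS.index(index_path)              # steps 64 and 76
    latest = max(FILE_PATHS)
    if cookie is None:                                    # step 62: new session
        version, accessed = latest, set()                 # steps 66 and 68
    else:
        version = cookie["gvn"]                           # step 74
        accessed = set(cookie["accessed"])
        if needs_new_session(position, version, accessed):
            version, accessed = latest, set()             # step 80
    accessed.add(position)                                # steps 70, 82, 86
    file_path = FILE_PATHS[version][position]             # steps 68, 80, 84
    new_cookie = {"name": GROUP_ID, "gvn": version, "accessed": accessed}
    return file_path, new_cookie                          # step 72


# First example from the text: no cookie, then a follow-up request.
path1, state = handle_request(ROOT + "/thesis/toc.html", None)
path2, state = handle_request(ROOT + "/thesis/ch1.html", state)
```

In this sketch a client holding a version 1 cookie continues to receive version 1 documents until one of the terminating conditions is met, at which point the next request is mapped to version 2.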
