Maintaining document identity across hierarchy and non-hierarchy file systems
6330573

Abstract
A mechanism and method for translating between two incompatible document management systems whereby the identity of a document is maintained. The mechanism and method allow information related to an original document to be maintained so that the original document can be reconstructed after it has been deleted. The maintained information includes name information, location information and characteristic information. The characteristic information comprises properties which are attached to a document in a document management system that separates the content of the document from its properties.

Claims
Having thus described the present invention, we now claim:

Description
BACKGROUND OF THE INVENTION
Action: The behavior part of a property.
Active Property: A property in which code allows the use of
computational power to either alter the document or
effect another change within the document
management system.
Arbitrary: Ability to provide any property onto a document.
Base Document: Corresponds to the essential bits of a document. There
is only one Base Document per document. It is
responsible for determining a document's content and
may contain properties of the document, and it is part
of every principal's view of the document.
Base Properties: Inherent document properties that are associated with a
Base Document.
Bit Provider: A special property of the base document. It provides
the content for the document by offering read and
write operations. It can also offer additional operations
such as fetching various versions of the document, or
the encrypted version of the content.
Browser: A user interface which allows a user to locate and
organize documents.
Collection: A type of document that contains other documents as
its content.
Combined Document: A document which includes members of a collection
and content.
Content: This is the core information contained within a
document, such as the words in a letter, or the body of
an e-mail message.
Content Document: A document which has content.
Distributed: Capability of the system to control storage of
documents in different systems (i.e., file systems,
www, e-mail servers, etc.) in a manner invisible to a
user. The system allows for documents located in
multi-repositories to be provided to a principal without
requiring the principal to have knowledge as to where
any of the document's content is stored.
DMS: Document Management System
Document: This refers to a particular content and to any properties
attached to the content. The content referred to may be
a direct referral or an indirect referral. A document is the
smallest element of the DMS. There are four types of
documents: Collection, Content Document,
No-Content Document and Combined Document.
Document Handle: Corresponds to a particular view on a document, either
the universal view, or that of one principal.
DocumentID: A unique identifier for each Base Document. A
Reference Document inherits the DocumentID from its
referent. Document identity is thus established via the
connections between Reference Document References
and Base Documents. Logically, a single document is
a Base Document and any Reference Documents that
refer to it.
Kernel: Manages all operations on a document. A principal
may have more than one kernel.
Multi-Principal: Ability for multiple principals to have their own set of
properties on a Base Document wherein the properties
of each principal may be different.
Notification: Allows properties and external devices to find out
about operations and events that occur elsewhere in
DMS.
No-Content Document: A document which contains only properties.
Off-the-Shelf Applications: Existing applications that use protocols and document
storage mechanisms provided by currently existing
operating systems.
Principal: A "User" of the document management system. Each
person or thing that uses the document management
system is a principal. A group of people can also be a
principal. Principals are central because each property
on a document can be associated with a principal. This
allows different principals to have different
perspectives on the same document.
Property: Some bit of information or behavior that can be
attached to content. Adding properties to content does
not change the content's identity. Properties are tags
that can be placed on documents, each property has a
name and a value (and optionally a set of methods that
can be invoked).
Property Generator: Special case application to extract properties from the
content of a document.
Reference Document: Corresponds to one principal's view of a document. It
contains a reference to a Base Document (Reference
Document A refers to Base Document B) and
generally also contains additional properties. Properties
added by a Reference Document belong only to that
reference; for another principal to see these properties,
it must explicitly request them. Thus, the view seen by
a principal through his Reference Document is the
document's content (through the Base Document), and
a set of properties (both in the reference and on the
Base Document). An owner of a Base Document
can also have a Reference Document to that base, in
which he places personal properties of the document
that should not be considered an essential part of the
document and placed in every other principal's view.
Space: The set of documents (base or references) owned by a
principal.
Static Property: A name-value pair associated with the document.
Unlike active properties, static properties have no
behavior. Provides searchable meta-data information
about a document.
INTRODUCTION

As discussed in the background of the invention, the structure that file systems provide for managing files becomes the structure by which users organize and interact with documents. However, documents and files are not the same thing. The present invention has as an immediate goal the separation of the management of properties related to or concerning a document from the management of the document content. Therefore, user-specific document properties are managed close to the document consumer or user of the document rather than where the document is stored. Separating the management of user properties from the document content itself makes it possible to move control of document management from a closed file system concept to a user-based methodology.

FIG. 1 illustrates the distinction between hierarchical storage systems, whose documents are organized according to their location as described by a hierarchical structure, and the present invention, where documents are organized according to their properties (e.g. author=dourish, type=paper, status=draft, etc.). This means documents retain their properties even when moved from one location to another, and that property assignment can have a fine granularity.

To integrate properties within the document management system of the present invention, the properties need to be presented within the content and/or property read/write path of a computer system, with the ability both to change the results of an operation and to take other actions. The outline of the concept is shown in FIG. 2: once user (U) issues an operation request (O), and before that operation is performed by operating system (OS), a call is made to document management system (DMS) A of the present invention, which allows DMS A to function so as to achieve the intended concepts of the present invention. This includes having DMS A interact with operating system (OS) through its own operation request (O').
Once operation request (O') is completed, the results are returned (R) to DMS A, which in turn presents results (R') to user (U).

With these basic concepts having been presented, a more detailed discussion of the invention is set forth below.

Document Management System (DMS) Architecture

FIG. 3 sets forth the architecture of a document management system (DMS) A of the present invention in greater detail. Document management system (DMS) A is shown configured for operation with front-end components B and back-end components C. Front-end components B include applications 10a-10n and 11a-11n, such as word processing applications and mail applications, among others. Some of the applications are considered DMS aware (10a-10n), which means these applications understand DMS protocols for storing, retrieving and otherwise interacting with DMS A. Other components are considered non-DMS aware (11a-11n). Browsers 12a (DMS aware) and 12b (non-DMS aware) are considered specialized forms of applications. In order for the non-DMS-aware applications 11a-11n and 12b to be able to communicate with DMS A, front-end translator 13 is provided.

Similarly, back-end components C can include a plurality of repositories 14a-14n, where the content of documents is stored. Such repositories can include the hard disc of a principal's computer, a file system server, a web page, a dynamic real-time data transmission source, as well as other data repositories. To retrieve data content from repositories 14a-14n, bit providers, such as bit provider 16, are used. These bit providers are provided with the capability to translate appropriate storage protocols.

Principals 1-n each have their own kernel 18a-18n for managing documents, such as documents 20a-20n. Documents 20a-20n are considered to be documents the corresponding principal 1-n has brought into its document management space.
Particularly, they are documents that a principal considers to be of value and therefore has in some manner marked as a document of the principal. The document may, for example, be a document which the principal created, an e-mail sent or received by the principal, a web page found by the principal, a real-time data input such as an electronic camera forwarding a continuous stream of images, or any other form of electronic data (including video, audio, text, etc.) brought into the DMS document space.

Each of the documents 20a-20n has static properties 22 and/or active properties 24 placed thereon. Document 20a is considered to be a base document and is referenced by reference documents 20b-20c. As will be discussed in greater detail below, in addition to base document 20a having static properties 22 and/or active properties 24, base document 20a will also carry base properties 26, which can be static properties 22 and/or active properties 24 (static properties are shown with a -- and active properties are shown with a --o).

Reference documents 20b-20c are configured to interact with base document 20a. Both base documents and reference documents can also hold static properties 22 and/or active properties 24. When principals 2, 3 access base document 20a for the first time, corresponding reference documents 20b-20c are created under kernels 18b-18c, respectively. Reference documents 20b-20c store links 28 and 30 to unambiguously identify their base document 20a. In particular, in the present invention each base document is stored with a document ID, which is a unique identifier for that document. When reference documents 20b-20c are created, they generate links to the specific document ID of their base document. Alternatively, if principal n references reference document 20c, reference document 20n is created with a link 32 to reference document 20c of principal 3. By this link principal n will be able to view (i.e.
its document handle) the public properties principal 3 has attached to its reference document 20c as well as the base properties and public reference properties of base document 20a. This illustrates the concept of chaining.

The above described architecture allows for sharing and transmission of documents between principals and provides the flexibility needed for organizing documents. With continuing attention to FIG. 3, it is to be noted at this point that while links 28-30 are shown from one document to another, communication within DMS A is normally achieved by communication between kernels 18a-18n. Therefore, when DMS A communicates with either front-end components B or back-end components C, or when communication occurs between principals within DMS A, this communication occurs through kernels 18a-18n. It is, however, appreciated that the invention will work with other communication configurations as well.

Using the described architecture, DMS A of the present invention does not require the principal to operate within a strict hierarchy such as in file or folder-type environments. Rather, properties 22, 24 which are attached to documents allow a principal to search and organize documents in accordance with how the principal finds it most useful. For instance, if principal 1 (owner of kernel 18a) creates a base document with content and stores it within DMS A, and principal 2 (owner of kernel 18b) wishes to use that document and organize it in accordance with its own needs, principal 2 can place properties on reference document 20b. By placement of these properties, principal 2 can retrieve the base document in a manner different from that envisioned by principal 1. Further, by interacting with browser 12, a principal may run a query requesting all documents having a selected property.
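The relationship between base documents, reference documents and property queries described above can be illustrated with a small sketch. It is written in Python purely for illustration and is not the DMS document interface: the class names `BaseDocument` and `ReferenceDocument`, the `visible_properties` method and the `query` helper are all hypothetical, the document ID 101 is an arbitrary value, and every property is treated as public, which simplifies the public/private distinction drawn in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class BaseDocument:
    """Owns the content, the base properties and the document ID."""
    document_id: int
    content: str
    properties: Dict[str, str] = field(default_factory=dict)

    def visible_properties(self) -> Dict[str, str]:
        return dict(self.properties)


@dataclass
class ReferenceDocument:
    """One principal's view; refers to a base or to another reference."""
    principal: str
    referent: Union[BaseDocument, "ReferenceDocument"]
    properties: Dict[str, str] = field(default_factory=dict)

    @property
    def document_id(self) -> int:
        # A reference inherits the document ID of its referent (chaining).
        return self.referent.document_id

    def visible_properties(self) -> Dict[str, str]:
        # Assumption: all properties are public, so the view is the
        # referent's view plus this principal's own additions.
        merged = self.referent.visible_properties()
        merged.update(self.properties)
        return merged


def query(space, **wanted):
    """Return the documents in `space` whose visible properties match."""
    return [doc for doc in space
            if all(doc.visible_properties().get(k) == v
                   for k, v in wanted.items())]


base = BaseDocument(101, "draft text", {"author": "dourish", "type": "paper"})
ref_3 = ReferenceDocument("principal 3", base, {"status": "draft"})
ref_n = ReferenceDocument("principal n", ref_3, {"interesting": "yes"})

print(ref_n.document_id)                                  # 101
print(len(query([base, ref_3, ref_n], status="draft")))   # 2
```

Principal n sees the `status` property only because it chains through principal 3's reference; the base document itself does not match the query, which reflects the point that organization belongs to each principal's view rather than to the stored content.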
Specifically, a user may run query language requests over existing properties.

Therefore, a point of the present invention is that DMS A manages a document space where properties are attached by different principals such that actions occur which are appropriate for a particular principal, and are not necessarily equivalent to the organizational structure of the original author of a document or even to that of other principals.

Another noted aspect of the present invention is that, since the use of properties separates a document's inherent identity from its properties, from a principal's perspective documents can in essence reside on multiple machines instead of being required to reside on a single machine (base document 20a can reside on all or any one of kernels 18a-18n). Further, since properties associated with a document follow the document created by a principal (for example, properties on document 20b of kernel 18b may reference base document 20a), properties of document 20b will run on kernel 18b, even though the properties of document 20b are logically associated with base document 20a. Therefore, if a property associated with document 20b (which references base document 20a) incurs any costs due to its operation, those costs are borne by kernel 18b (i.e. principal 2), since properties are maintained with the principal who put the properties onto a document.

Support for Native Applications

A DMS document interface provides access to documents as Java objects. Applications can make use of this interface by importing the relevant package in their Java code and coding to the API provided for accessing documents, collections and properties. This is the standard means to build new DMS-aware applications and to experiment with new interaction models. DMS Browser 12 (of FIG. 3) can be regarded as a DMS application and is built at this level.
The DMS document interface provides Document and Property classes, with specialized subclasses supporting all the functionality described here (such as collections, access to WWW documents, etc.). Applications can provide a direct view of DMS documents, perhaps with a content-specific visualization, or can provide a wholly different interface, using DMS as a property-based document service back-end.

Support for Off-the-Shelf Applications

Another level of access is through translators (such as translator 13 of FIG. 3). In an existing embodiment, a server implementing the NFS protocol is used as the translator. This is a native NFS server implementation in pure Java. The translator (or DMS NFS server) provides access to the DMS document space to any NFS client; the server is used to allow existing off-the-shelf applications such as Microsoft Word to make use of DMS documents. On PCs, DMS A simply looks like another disk to these applications, while on UNIX machines, DMS A looks like part of the standard network filesystem.

Critically, though, what is achieved through this translator is that DMS A is directly in the content and property read/write path for existing or off-the-shelf applications. The alternative approach would be to attempt to post-process files written to a traditional filesystem by applications, such as Word, that could not be changed to accommodate DMS A. Providing a filesystem interface directly to these applications instead makes it possible to execute relevant properties on the content and property read/write path. Furthermore, it ensures that relevant properties (such as ones which record when the document was last used or modified) are kept up to date. Even though the application is written to use filesystem information, the DMS database remains up to date, because DMS A is the filesystem.

As part of its interface to the DMS database layer, NFS provides access to the query mechanism.
Appropriately formatted directory names are interpreted as queries, which appear to "contain" the documents returned by the query.

Although DMS provides this NFS service, DMS is not a storage layer. Documents actually live in other repositories. However, using the NFS layer provides uniform access to a variety of other repositories (so that documents available over the Web appear in the same space as documents in a networked file system). The combination of this uniformity with the ability to update document properties by being in the read and write path makes the NFS service a valuable component for the desired level of integration with familiar applications.

It is to be appreciated that while a Java implementation and a server implementing the NFS protocol are discussed, these are only potential mechanisms for implementing the present invention, and other options are also available.

Maintaining Document Identity During Conversion of Off-the-Shelf Application Instructions to DMS Protocol

As has been previously discussed, translators (e.g. translator 13 of FIG. 3) are provided as part of DMS A to allow interaction with off-the-shelf applications. Translators allow not only existing off-the-shelf applications to interact with DMS A, but will also allow yet-to-be-built applications to interface when a corresponding translation mechanism is added.

A particular aspect of the translation procedure is the need to maintain a consistent document identity and persistent properties within DMS A when DMS A is accessed by off-the-shelf applications through the translator interface. Existing applications, including but not limited to word processing, email and WWW-based applications, have an awareness of directories which employ standard hierarchical file systems.
When a request is made by one of the existing applications, it is expected that operations will be based on a simple, straightforward model of file identity, where the file is identified by its name, coded in a form such as :\dir1\dir2\filename. Applications that use this type of file system format frequently exploit this method of identification, wherein the file name also identifies its location, when saving new versions of a file. For instance, to protect against write failures, it is common to save a new version to a new file with a different name and, after that save has been deemed successful, erase the original file and rename the new file to the same name as the original.

The above operation is depicted in FIG. 4. In an existing file system a file (i.e. \dir1\dir2\filename) is stored in a data storage repository. A user edits the content of this document with the intention of storing the edited content in place of the existing content (40). Upon the issuance of a write instruction, the computer system creates a new file "temp" (42). The new version of the content is then stored into this temporary file under the newly created name (i.e. "temp") (44). Once the system determines that the write process has been successful and the new version is stored in the "temp" file, the original "filename" file is deleted (46). The "temp" file is then renamed as "filename" (48). Following this operation, any subsequent operation request for the original file (i.e., by use of the name \dir1\dir2\filename) will receive the newly saved content, as intended by the user.

While variations of this procedure exist depending upon the particular application, the concept of ensuring that the new file has the same name (including the directory location) as the original file is commonly provided for in off-the-shelf applications. The reason the procedures of FIG.
4, and other similar procedures, are successful is that these systems assume a file having the name "filename" within a specific directory path (dir1, dir2) is the same file, since it is in the same location. Particularly, there is no consistent mechanism which allows a user to differentiate between the original "filename" file and the "temp" file once "temp" has been given its new name (i.e. "filename"). Thus, it is not possible to distinguish between writing over the existing "filename" file and replacing it with the "temp" file.

DMS A offers capabilities for interacting with document spaces that are not possible with existing hierarchical file systems. In order to work with existing applications, DMS A provides interfaces to off-the-shelf file systems. In this case, the existing applications will not be able to use the new features, but will still be able to read and write content. The new information recorded by DMS A changes the system's notion of the file's "identity." The file name and location no longer uniquely identify a particular file. This is a basic distinction between a "document" in DMS A and a file in an off-the-shelf file system.

Separating the inherent identity of the document from its location, as done in DMS A, creates a potential problem in that the state of the document depends not only on its name, location and file contents, but also on the properties which are attached to the document. Thus, an attempt to manage a DMS-type document by existing applications which do not understand properties is unreliable. DMS A documents which are moved around via off-the-shelf file system interfaces should correspondingly move all of the additional information, such as properties, so as to maintain the properties in association with the content. However, under existing processes there is no procedure to ensure this outcome.

An example of this problem is illustrated in connection with FIGS. 5a-5c. As depicted in FIG.
5a, a document 50 has attached thereto active and static properties 52 and 54, and its content 56 is stored separately from the properties. A user edits document (i.e. "filename") 50 through one of various existing file system interfaces (i.e. 11a-11n of FIG. 3) via a translator (i.e. translator 13 of FIG. 3). Once editing is complete, as shown in FIG. 5b, the application issues a write instruction 60 to write the new content 56' to another file (i.e. "temp") 62. Once it is verified that the content is stored in the "temp" file, the application acts to delete the original document (i.e. "filename") 64 and renames "temp" as "filename" 66. FIG. 5c illustrates that, following this procedure, new content 56' has lost its connection to properties 52, 54 which were attached to original document 50. This is true even though all the user did was save the document. In addition, if the original file actually lived somewhere other than on the user's machine, that version was not updated, since the new content was written to a different file altogether.

The source of the problem is that the application that saved the new version of the document was not aware that any other information was attached to it or that it was stored someplace else entirely and, as a result, believed that it could completely reconstruct the document by creating a document of the same name in its place and saving only its content.

As a further explanation, in DMS A each document has a single unique identifier, i.e. a document id. For the following example it is assumed that "filename" has a DMS document id of "101." Under the scenario of writing a new document, an existing file system procedure creates "temp", which in DMS A is document "102." The new content is then saved to "temp." The system deletes document "101" and then renames document "temp" to "filename." However, this is still document "102" in DMS A. It has simply been provided with a different name.
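The loss of identity described above can be reproduced in a few lines. The following is a minimal sketch, assuming a hypothetical in-memory `NaiveStore` that maps names to documents and takes no steps to preserve identity; it is not the DMS translator. The names `Document`, `NaiveStore` and `save_via_temp` are illustrative, and the ids 101 and 102 and the "interesting" property follow the example in the text.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Document:
    document_id: int
    content: str = ""
    properties: Dict[str, str] = field(default_factory=dict)


class NaiveStore:
    """Hypothetical name-to-document mapping with no identity preservation."""

    def __init__(self, first_id: int = 101) -> None:
        self._ids = itertools.count(first_id)
        self.by_name: Dict[str, Document] = {}

    def create(self, name: str) -> Document:
        doc = Document(next(self._ids))
        self.by_name[name] = doc
        return doc

    def delete(self, name: str) -> None:
        # The document, and every property attached to it, is discarded.
        del self.by_name[name]

    def rename(self, old: str, new: str) -> None:
        self.by_name[new] = self.by_name.pop(old)


def save_via_temp(store: NaiveStore, name: str, new_content: str) -> None:
    """The save sequence of FIG. 4 as an off-the-shelf application issues it."""
    temp = store.create("temp")    # step 42: create "temp"
    temp.content = new_content     # step 44: write the new content
    store.delete(name)             # step 46: delete the original
    store.rename("temp", name)     # step 48: rename "temp" to the old name


store = NaiveStore()
original = store.create("filename")            # document 101
original.content = "old text"
original.properties["interesting"] = "yes"

save_via_temp(store, "filename", "new text")
saved = store.by_name["filename"]
print(saved.document_id, saved.content, saved.properties)
# prints: 102 new text {}
```

From the application's point of view nothing is wrong, since "filename" now holds the new text. From the point of view of a system keyed on document id, document 101 and its properties are gone.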
Again, the problem is that DMS A identifies files by their document id, and names are simply another property which may be attached to a document. For example, one document can be given a plurality of names by different users, or one user may have a plurality of names for a single document. Thus, execution of write procedures by existing non-DMS-aware applications lowers the reliability of interactions between those applications and DMS A.

A specific detrimental outcome of the above example is that a user may have attached properties to the original document "filename" (for example, a property such as "interesting", which would indicate to that particular user that the document is part of an "interesting" collection of documents). When the property "interesting" was attached, it was attached to document "101." However, when the new version of the document is saved, the process ends up deleting document "101" and, with it, the "interesting" property. Therefore, the new document "102" will have the revised content but will not have the "interesting" or other properties attached to document "101." This occurs because existing hierarchical file systems assume that, since it is the same file name, it is in the same location. However, as previously noted, the inherent relationship between file name and location does not exist in DMS A.

In consideration of the above problems, a mechanism has been developed for maintaining the additional information, in the form of properties, attached to documents in DMS A when access is made through existing off-the-shelf file system interfaces that assume name equivalence for file identity. Using this mechanism, properties attached to documents will be maintained when DMS A interacts with off-the-shelf file system interfaces. The foregoing problem is especially prevalent during delete and rename operations. The solution presented is directed to the situation where a document is to be deleted.
As part of the solution, the document is not actually deleted. Rather, it is made "invisible" to the user. For example, if all the documents are listed, the deleted document will not appear. However, it is still maintained within the system, which also remembers its name. Then, if a user attempts to create a document with that name or rename a document to that same name, the present invention interprets this as an attempt to recreate the original document.

The two main instances in which this situation arises are when an existing document is renamed to the same name, i.e. some editing of the content has occurred, and when a new document is created with the previously existing name. For explanation purposes, the following will discuss a situation where a document is to be renamed, as this is the more complicated of the two situations, and the instance of creating a new document involves the same general concepts.

For purposes of the following discussion, it will be assumed that "rename" is a procedure by which an existing application alters the content of the document. To accomplish this, an existing document (document "101") is resurrected with all of its existing properties attached, and the contents of a "temp" document (document "102") are copied into document "101". Thereafter, document "102" is deleted.

Attention is directed to FIG. 6. When a document (document "101") is deleted or renamed by an off-the-shelf system interface 80, the present invention maintains a copy of the properties that were attached to the document (document "101") 82. The system further maintains the name and location of the original document 84. The information in steps 82 and 84 is maintained for a predetermined amount of time 86. If no further instructions are received during the predetermined amount of time, the information maintained in steps 82 and 84 is deleted 88 and the newly designated document is maintained.
If, however, an application attempts to create a document with the same name, or tries to rename a document to that same name, additional steps are undertaken 90. It is to be noted that programs do not take long to replace the deleted document with a new document. Therefore, a preferred embodiment would have a delay time in step 86 of approximately 10 seconds; however, other times may be more appropriate depending on the use of the system and the programs involved.

Next, upon sensing the application interaction in step 90, the present invention "resurrects" the original document (document "101"), still in its original location 92. Then, the new content (from document "102") is copied into the original document (document "101") 94, and the new document ("102") is deleted 96.

By the procedures shown in FIG. 6, the present invention maintains the original properties and identity of the original document (document "101"), and the new content (i.e., the content from document "102") is written to the original document (document "101") so that other users referring to that document will find it in the same place and with the same name. Thus, when accessed through existing off-the-shelf interfaces, the present invention offers the same name-equivalence semantics as the existing document systems. Any content that shows up with the same name as a recently deleted file acquires all of the additional properties that the original document included.

It is understood that there are other processes by which the results of the present invention may be obtained. It would be possible to copy the properties or copy the file contents in order to maintain this integrity over existing off-the-shelf file system interfaces. However, the present invention addresses and solves the problem so that when an existing application performs a deletion followed by a re-creation within a certain time period, the document will have had its properties preserved.
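The steps of FIG. 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class `IdentityPreservingStore`, its method names and the injectable `clock` parameter are hypothetical, the 10-second grace period is the "approximately 10 seconds" of the preferred embodiment, and only the delete-then-recreate and delete-then-rename sequences are modeled (renaming a document away from its name is not tracked here).

```python
import itertools
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Document:
    document_id: int
    content: str = ""
    properties: Dict[str, str] = field(default_factory=dict)


class IdentityPreservingStore:
    GRACE_SECONDS = 10.0  # assumed value for the delay of step 86

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 first_id: int = 101) -> None:
        self._clock = clock
        self._ids = itertools.count(first_id)
        self.visible: Dict[str, Document] = {}
        # name -> (hidden document, time at which it was hidden)
        self._hidden: Dict[str, Tuple[Document, float]] = {}

    def _purge_expired(self) -> None:
        now = self._clock()
        expired = [name for name, (_, hidden_at) in self._hidden.items()
                   if now - hidden_at > self.GRACE_SECONDS]
        for name in expired:
            del self._hidden[name]                 # step 88

    def _claim(self, name: str, incoming: Document) -> None:
        """Bind `incoming` to `name`, resurrecting a hidden original if any."""
        self._purge_expired()
        if name in self._hidden:                   # step 90
            original, _ = self._hidden.pop(name)   # step 92: resurrect
            original.content = incoming.content    # step 94: copy new content
            self.visible[name] = original          # step 96: incoming dropped
        else:
            self.visible[name] = incoming

    def create(self, name: str, content: str = "") -> Document:
        self._claim(name, Document(next(self._ids), content))
        return self.visible[name]

    def delete(self, name: str) -> None:
        # Steps 80-84: hide the document and remember its name and properties.
        self._hidden[name] = (self.visible.pop(name), self._clock())

    def rename(self, old: str, new: str) -> None:
        self._claim(new, self.visible.pop(old))


now = [0.0]                                        # fake clock for the demo
store = IdentityPreservingStore(clock=lambda: now[0])
doc = store.create("filename", "old text")         # document 101
doc.properties["interesting"] = "yes"

store.create("temp", "new text")                   # document 102
store.delete("filename")                           # hidden, not destroyed
now[0] += 1.0                                      # inside the grace period
store.rename("temp", "filename")

saved = store.visible["filename"]
print(saved.document_id, saved.content, saved.properties)
# prints: 101 new text {'interesting': 'yes'}
```

Passing the clock in as a parameter is a design convenience for this sketch: it lets the expiry of step 88 be exercised without waiting in real time.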
Additionally, the document id is also preserved. Therefore, accesses by document id will continue to function even after the operation.

Thus, the present invention overcomes the problems associated with interfacing between existing off-the-shelf file systems (which operate under the assumption of inherent name-location equivalence) and DMS A, where the location of the document content is separated from the properties of the document, and where those properties are attached by a user to the document.

One example where the value of the present invention is seen is in connection with collections. Particularly, assume a DMS A user has generated a collection of documents, and the manner in which the user remembers the documents of the collection is by writing down the document id (i.e. the user has document 101, document 102, document 103 . . . document 10n). Then if the user makes a new document (i.e. document 10x) and copies document 101 over to document 10x, without this mechanism document 10x would simply disappear from the user's collection, as the properties would not be transferred. However, using the described mechanism, the "new" document will remain in the collection. So the present invention is concerned with inheriting properties on newly created documents.

The present invention looks to existing file systems to detect when the existing file system is attempting to take an action which is intended to maintain document identity. When this is sensed, DMS A will apply the mechanism described above to maintain identity within DMS A.

The steps for actually storing the content (i.e. from document 102 to document 101) are well known in the art. Particularly, one manner of obtaining this outcome is by adjusting a pointer to the area of the contents.

The invention has been described with reference to the preferred embodiment. Obviously, modifications and alterations will occur to others upon reading and understanding this specification.
It is intended to include all such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
