Hierarchical document cross-reference system and method

6978420

Abstract

A hierarchical document cross-reference system comprises a document server computer remotely accessible by a user computer. The document server computer includes a database which stores the contents of a first document and a second document. The first document contains one or more segments and the second document contains one or more segments. Each segment is identified by a segment identifier. The document server computer also includes a module executable in the document server computer. The module is configured to receive a request to cross-reference the first document and the second document on a key phrase. The module searches the first document and the second document for the key phrase and identifies the segments in the documents containing the key phrase. The module then displays on the user computer a side-by-side display listing the segment identifiers for the identified segments. The segment identifiers for the identified segments in the first document containing the key phrase are displayed in a first list and the segment identifiers for the identified segments in the second document containing the key phrase are displayed in a second list. The user can subsequently select a segment identifier from each list and submit the segment identifiers for display. The module then displays the contents of each segment, with the key phrase highlighted in a distinct color or by other means.

Claims

1. A hierarchical document cross-referencing system comprising:

Description

BACKGROUND
Continuing the FAA example, the FAA official may advantageously select the "Standards and Recommended Practices" and "Digital Voice Recorder" documents and submit a key phrase for cross-reference within the documents by the document server computer 104. Subsequently, the document server computer 104 searches the two documents for the key phrase on a segment by segment basis. For example, the "Standards and Recommended Practices" document may include thousands of segments including two segments entitled "Routing of Messages" and "Security on the Internet," and the "Digital Voice Recorder" document may have segments such as "File Formatting" and "Digital-to-Analog Conversion." If the key phrase is found in the segment, the segment is identified as one containing the key phrase. Once all the segments in the documents have been searched, the identified segments are listed in a side-by-side display. For example, both the "Routing of Messages" segment and "File Formatting" segments may be listed in the side-by-side display as containing the key phrase. One display lists the identified segment(s) from the first document and the other display lists the identified segment(s) from the second document. If the key phrase is not found within a segment, that segment is not listed. Subsequently, the user may advantageously select and submit one segment from the first list and another segment from the second list in the side-by-side display for cross-reference by the document server computer 104. The document server computer 104 then advantageously searches the contents of the submitted segments for the key phrase and identifies components, such as, by way of example, a paragraph or a sentence, within the segments that contain the key phrase. The contents of the submitted segments are presented to the user in a side-by-side display. One display shows the contents of one segment and the other display shows the contents of the other segment. 
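The segment-by-segment search and the two side-by-side lists described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each document is held in memory as a mapping from segment identifier to segment text, it assumes case-insensitive matching, and the function names and sample segment text are hypothetical.

```python
def find_segments(document, key_phrase):
    """Return the identifiers of the segments whose text contains key_phrase.

    `document` is assumed to be a dict mapping segment identifier -> text.
    Matching is case-insensitive, which is an assumption of this sketch.
    """
    needle = key_phrase.lower()
    return [seg_id for seg_id, text in document.items()
            if needle in text.lower()]


def cross_reference(first_doc, second_doc, key_phrase):
    """Build the two lists shown side by side: one per document."""
    return (find_segments(first_doc, key_phrase),
            find_segments(second_doc, key_phrase))


# Illustrative data only; the segment text is invented for this example.
SARP = {
    "Routing of Messages": "Each message header follows the file format rules.",
    "Security on the Internet": "Encryption is applied to all links.",
}
DVR = {
    "File Formatting": "The file format stores audio in fixed blocks.",
    "Digital-to-Analog Conversion": "Samples are converted at 8 kHz.",
}

if __name__ == "__main__":
    left, right = cross_reference(SARP, DVR, "file format")
    print(left, right)
```

Segments that do not contain the key phrase are simply absent from the returned lists, matching the behavior described in the text.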
Within each displayed segment, the key phrase is presented in a manner which distinguishes it from the rest of the contents of the displayed segment. For example, the key phrase can be presented in one color, such as blue, while the balance of the contents of the segment can be presented in a different color, such as red. In another embodiment, the contents of one submitted segment may be presented in a first color, such as green, while the contents of the other submitted segment may be presented in a second color, such as white, to further distinguish the contents of one segment from the contents of the other segment. The key phrase may alternatively be identified or distinguished in the display of the segment contents by means such as underlining, bolding, a change in font, etc. As a further alternative, these highlighting techniques may be used in combination, such as both underlining the key phrase and displaying it in a contrasting color. Continuing the FAA example, the official may select and submit the "Routing of Messages" and the "File Formatting" segments (from the "Standards and Recommended Practices" and "Digital Voice Recorder" documents, respectively) for cross-reference by a key phrase, from the first and second lists in the side-by-side display. Subsequently, the document server computer searches the "Routing of Messages" and the "File Formatting" segments for the key phrase. The components of the segments containing the key phrase are identified. The contents of the "Routing of Messages" segment are displayed in one side of the side-by-side display. The contents of the "File Formatting" segment are displayed in the other side of the side-by-side display. The key phrase may advantageously appear in a different color or font than the rest of the contents of the respective segments. One benefit of the invention is that it permits users to cross-reference very large documents in stages or hierarchies.
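One way to present the key phrase in a distinguishing color is to wrap each occurrence in markup before the segment is sent to the browser. The sketch below assumes HTML output with an inline `style` attribute and case-insensitive matching; the function name `highlight` and the choice of a `span` element are illustrative assumptions, not details taken from the specification.

```python
import html
import re


def highlight(text, key_phrase, color="blue"):
    """Return `text` as HTML with every occurrence of key_phrase colored.

    The surrounding text is HTML-escaped so that characters such as '<'
    in the segment contents cannot be mistaken for markup.
    """
    if not key_phrase:
        return html.escape(text)
    pattern = re.compile(re.escape(key_phrase), re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Text between the previous match and this one is left unstyled.
        parts.append(html.escape(text[last:match.start()]))
        parts.append('<span style="color:{}">{}</span>'.format(
            color, html.escape(match.group(0))))
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(highlight("Keep software documentation current.",
                    "software documentation"))
```

Underlining or bolding could be obtained the same way by changing the style string, and the two techniques can be combined in a single style.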
At the initial or first hierarchy is the type of documents or category of documents. Each type or category of document is an element or node in the particular hierarchy. In the FAA example above, the FAA category of documents can advantageously be one node in the top hierarchy. Another node in the top hierarchy would be, for example, the DOI documents. The documents contained in the category of documents compose the next or second hierarchy. Each document is an element or node in this particular hierarchy. In the FAA example, each one of the hundreds of documents, including the "Standards and Recommended Practices" Document and the "Digital Voice Recorder" document, can advantageously be one node in the second hierarchy. The segments contained in a document compose the third hierarchy. In the FAA example, each one of the thousands of segments, including the "Routing of Messages" segment and the "Security on the Internet" segment, can advantageously be one node in the third hierarchy. The components contained in the segments are elements in a fourth hierarchy. For example, each sentence (component) identified as containing the key phrase can advantageously be considered one node in the fourth hierarchy. Finally, subcomponents (i.e., the word or words comprising the key phrase) of the segments may be considered a fifth hierarchy. Thus, the invention advantageously cross-references documents in a layered or hierarchical manner. The documents can be very large, as large as computer memory may allow. In one embodiment, requesting a cross-reference of a particular category of documents (an element in the first hierarchy), which may comprise a simple search of the documents in the category for the key phrase, results in a listing of the documents that contain the key phrase, or, alternatively, a listing of all of the documents in the category (the listing of elements in the second hierarchy). 
Requesting a cross-reference by the key phrase of two or more listed documents (elements in the second hierarchy) results in a listing of the segments contained in the selected documents (elements in the third hierarchy) containing the key phrase. Requesting a cross-reference of two or more segments (two or more elements in the third hierarchy) results in the display of segment contents with the components (elements in the fourth hierarchy) and the subcomponents (elements in the fifth hierarchy) being further distinguished in the display for ease of identification. Those of ordinary skill in the art will realize that there may be additional hierarchies or fewer hierarchies without detracting from the hierarchical cross-referencing element of the present invention. FIG. 2 illustrates in more detail selected components of the user computer 102 and the document server computer 104 of FIG. 1 suitable to implement one embodiment of the present invention. The user computer 102 includes a browser 202. The document server computer 104 includes a web server 204, an interface module 206, and a document database 208. The depicted components may advantageously communicate with each other and other components comprising the respective computers through mechanisms such as, by way of example, interprocess communication, remote procedure call, and other various program interfaces. Furthermore, the functionality provided for in the components, modules, and databases may be combined into fewer components, modules, or databases or further separated into additional components, modules, or databases. Additionally, the components, modules, and databases may advantageously be implemented on one or more computers. The browser 202 is a software program which allows a user to access different computers, including the document server computer 104, through the communication medium 106. 
In one preferred embodiment, the browser 202 may be a standard browser such as the Netscape® Navigator developed by Netscape, Inc. or the Microsoft® Internet Explorer developed by Microsoft Corporation. One of ordinary skill in the art will realize that other types of access software could also be used to implement the browser 202. The other types of access software could be, by way of example, other types of Internet browsers, custom network browsers, two-way communications software, cable modem software, point-to-point software, custom emulation programs, and the like. A user employs the browser 202 to access the document server computer 104, and more particularly, the web pages which facilitate the hierarchical side-by-side cross-reference of documents, and requests that a document cross-reference be performed by the document server computer 104. One embodiment of a process by which a user requests a cross-reference of two documents is illustrated by the flow chart in FIG. 4. Beginning in a start state 400, the user initiates the execution of the browser 202 on his or her user computer 102. The user directs the user computer 102, utilizing the browser 202, to establish a communications link or network connection to the document server computer 104 through the communication medium 106. Having successfully established the network connection, the user is presented with a first web page stored on the document server computer 104 in state 402. In particular, the browser 202 displays the first web page which prompts the user for a password, and may, in addition, provide the user the capability to select a category of document to be cross-referenced by the document server computer 104. In one embodiment, a list of categories of documents stored on the document server computer 104 that can be cross-referenced may advantageously be presented to the user through a pull-down menu. 
From the list of the categories of documents, the user can use a pointing device, such as a mouse or the like, and select the desired category of document to cross-reference. In another embodiment, the first web page may contain a data entry field conducive to accepting input from the user. The user may then use an input device, such as a keyboard, microphone, and the like, and specify the desired category of document. In state 402, the user provides a password, or other identifying information, to the document server computer 104. In one embodiment, the password is entered through the same web page through which the user may specify the category of document. In another embodiment, the document server computer 104 may present the user a different web page which is to be used to provide the password. In still another embodiment, the user may not be required to provide a password to utilize the document cross-reference facility. In yet another embodiment, certain selected categories of documents, such as, by way of example, categories of documents containing confidential, classified, or sensitive documents, may require a password from the user. In states 404-408, the user, in response to a web page 210 (see FIG. 2) displaying a list of documents available for cross-reference, selects two or more documents in which the cross-reference is to be performed. In state 410, the user enters a key phrase by which the selected documents are to be cross-referenced. The key-phrase request may be performed through either the same web page 210 through which the user selected the documents, the same web page through which the user provided the password, or through still another web page altogether. Likewise, each document may be selected in individual web pages. 
Those of ordinary skill in the art will realize that the particular ordering of the states 402-410 is not critical, and that the aforementioned states may be rearranged in a different order, or even possibly omitted, without detracting from the scope of the invention. For example, the entry of the key phrase (state 410) may occur before or during the selection of documents from the list (states 404-408). In state 412 the user submits the selected documents and key phrase to the document server for cross-reference, via a mouse click on an appropriate screen "button," keystroke, etc. Subsequently, the user is presented a web page containing a side-by-side list 212 (see FIG. 2) of the segments from each selected document that contain the key phrase. One list identifies the segments from the first selected document, and a second list identifies the segments from the second selected document. Additional lists may be presented so that the overall number of lists corresponds to the number of documents selected for cross-reference; again, the invention is not limited to the selection or cross-reference of only two documents. Proceeding to states 414-418, the user selects two or more segments from the list for further cross-reference. The user then submits the selected segments to the document server computer 104 for cross-reference in state 420. In one embodiment, the user may select any one segment from the list corresponding to the first document and any one segment from the list corresponding to the second document. In another embodiment, the user may select any two or more of any of the segments displayed in any of the lists. This may include, for example, the selection of two or more segments from the same document (and none from the other document(s) displayed) for cross-reference. 
The document server computer 104 advantageously searches the submitted segments for the key phrase and appropriately identifies the components contained within the submitted segments containing the key phrase. Subsequently, the user is presented a web page containing a side-by-side display 214 (see FIG. 2) of some or all of the contents of the selected segments. The contents of the first selected segment are displayed in one display window or portion while the contents of the second selected segment are displayed in a second display window or portion adjacent the first. Of course, the actual number of side-by-side display portions corresponds to the number of selected segments, the user not being limited to the selection of two. In each display portion, the components of the segment containing the key phrase are appropriately identified. In the case of a textual document, the component may advantageously be a sentence or a paragraph. In the case of a graphical document, the component may be a number of pixels or lines of display. In the case of a video display, the component may be a number of frames. Those of ordinary skill in the art will realize that the division of documents into components is frequently a matter of choice for a document author and may differ depending on the document type. The components may be appropriately distinguished in the display by a difference in color, font, type size, intensity, contrast, and the like. Within the components, the actual key phrase, such as, by way of example, a word, letter, byte, bit, or pixel, may be further distinguished. Once the user has viewed the contents of the segments (state 422), the document cross-reference process proceeds to end state 424. With further reference to FIG. 2, the web server 204 provides access to the communication medium 106 and delivers the plurality of web pages stored on the document server computer 104 to the one or more user computers 102. 
The plurality of web pages facilitate the cross-reference of documents stored in the document database 208. It is contemplated that the web server 204 uses standard web server software applications such as, by way of example, public domain software from NCSA and Apache, and commercial packages such as Netscape's Internet Server software, Microsoft's Internet Server software, and the like. These web pages are accessible by users executing a standard browser on the user computer 102. In another embodiment, a proprietary or non-standard software application is employed to provide access to, and delivery of, the plurality of web pages. In this case, the user can execute a comparable software program, capable of interfacing to the proprietary or non-standard software executing on the document server computer 104, on the user computer 102 to access the web pages on the document server computer 104. The interface module 206 performs the requested document cross-reference and facilitates communication between the web server 204 and the document database 208. For example, the document server computer 104, in processing a request to cross-reference documents, receives the request through the web server 204. The web server 204 extracts the necessary information, and this information is advantageously processed by the interface module 206. As part of the processing, the interface module 206 accesses the documents and other data stored in the document database 208. Furthermore, the interface module 206 performs necessary operations, such as, by way of example, searching the document contents and presenting some or all of the document contents to the user through the web server 204, on the document contents and other data retrieved from the document database 208 in the manner disclosed herein. 
In one embodiment, the interface module 206 uses the Common Gateway Interface (CGI) protocol to process the information gathered from, and presented to, the user through the web server 204. The other operations performed by the interface module 206 as disclosed herein, such as, by way of example, searching the document contents and identifying the segments containing the key phrase, may advantageously be implemented using scripting languages, such as Unix/Linux shell (sh, ksh, or bash), Perl and JavaScript, and other standard programming languages such as Java, C and C++. Furthermore, the interface functionality enabling the interface module 206 to access the document database 208 may be implemented using the application language suited for the particular document database 208, such as the various standard and scripting languages mentioned above. Those of ordinary skill in the art will realize that the selection of the particular software language is not critical, and that any software language capable of implementing the functions and features described herein may be used without detracting from the scope of the invention. The document database 208 is a repository for the documents stored on the document server computer 104. In one embodiment, the document database 208 utilizes a hierarchical file system, such as the Unix/Linux file system, in implementing the document repository. The structure of the hierarchical file system facilitates the storage of the electronic contents in one or more hierarchies or levels. As is generally illustrated by the document tree 90 in FIG. 9, at the top level is the root directory. Below the root directory is a directory containing the one or more categories of documents. Each category of document is an element or node in this level. Below each category of document may be one or more directories representing the documents contained within the particular category of documents. Each document is an element or node in this level.
For example, as is illustrated in FIG. 9, the category of document "FAA" may be one node in the categories-of-documents level and may contain hundreds of documents titled "DOC1" through "DOCN," including the "Standards and Recommended Practices" document and the "Digital Voice Recorder" document. As is also illustrated in FIG. 9, the category of document "DOD" may be another node in the categories-of-documents level and may include the documents "DOC1" to "DOCX." Below each document directory are one or more files containing the contents of the respective document. In one embodiment, each file may advantageously correspond to a segment contained in a document. Certain documents contain segment delimiters such as chapters, sections, and subsections. In one embodiment, the smallest unit, such as a section or subsection, containing text may be considered a segment and stored in a separate file. In another embodiment, a section may be considered a segment, and each section of the document, including all subsections contained within the section, may be stored in separate files. Other documents may not contain segment delimiters, but may be continuous in form. In this instance, the document server computer 104 may create artificial segments in the process of storing the document in the document database 208. For example, for a text document, a selected number of lines of text may be considered a segment. For a video document, a selected number of frames may be considered a segment. In still another embodiment, the document server computer 104 may advantageously contain program logic capable of parsing the document contents and subsequently generating segments, and titles or segment headings for the created segments, based on the program's interpretation of the document contents. 
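The creation of artificial segments for a continuous text document, and their storage as one file per segment beneath a category and document directory, might look like the sketch below. The file naming scheme (`SEG0001.txt`), the default of 50 lines per segment, and the function names are assumptions made for illustration; the specification does not prescribe them.

```python
import tempfile
from pathlib import Path


def split_into_segments(text, lines_per_segment):
    """Cut a continuous text into segments of a selected number of lines."""
    if lines_per_segment < 1:
        raise ValueError("lines_per_segment must be at least 1")
    lines = text.splitlines()
    return ["\n".join(lines[i:i + lines_per_segment])
            for i in range(0, len(lines), lines_per_segment)]


def store_document(root, category, document, text, lines_per_segment=50):
    """Write each segment to its own file under root/category/document."""
    doc_dir = Path(root) / category / document
    doc_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    segments = split_into_segments(text, lines_per_segment)
    for number, segment in enumerate(segments, start=1):
        path = doc_dir / "SEG{:04d}.txt".format(number)  # hypothetical naming
        path.write_text(segment, encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for p in store_document(root, "FAA", "DOC1", "a\nb\nc", 2):
            print(p.relative_to(root))
```

A document that already carries section or subsection delimiters would be split on those delimiters instead of on a line count.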
As an example, the program logic may advantageously parse the contents of a video document and create segments and segment headings based upon the program's perceived interpretation of the video document by, for example, detection of differences in image, color, patterns, contrast, etc. between various series of frames. Those of ordinary skill in the art will realize that the actual number of files containing the document contents can vary based on the determination of what a segment is for a particular document during the storing of the document in the document database 208. In another embodiment, more layers may be present in the directory tree 90. For example, the "Standards and Recommended Practices" document may be composed of five volumes, "VOL1" through "VOL5." In this instance, the node "Standards and Recommended Practices" may advantageously contain five directories or nodes representing the five volumes. In still another embodiment, more directories may be present depending on the number of versions of the document that are stored on the document database 208. As an example, the document server computer 104 may advantageously store all the PTO documents as one of the category of documents in the document database 208. The "PTO" category of documents may consist of hundreds of documents. These documents may include, for example, the following documents: "Manual of Patent Examining Procedure" ("MPEP") and "Trademark Manual of Examining Procedure" ("TMEP"). Furthermore, each document may be very large, consisting of hundreds or thousands of pages. Additionally, the three most recent editions of the MPEP may be stored in the document database 208. The document server computer 104 may advantageously permit users, such as patent attorneys, to cross-reference specific topics among the three stored editions of the MPEP.
The edition identifier, such as, by way of example, the edition number, the month and year designation, or a combination of both, may be used to distinguish the document versions. For example, the three editions of the documents may be appropriately identified by the text strings "/PTO/MPEP/FIFTHEDITION," "/PTO/MPEP/SIXTHEDITION," and "/PTO/MPEP/SEVENTHEDITION." Segments of each document may advantageously correspond to the numbered sections and subsections appearing in the respective MPEP edition. A patent attorney interested in quickly cross-referencing a topic among the three editions of the MPEP may then use a user computer 102 and remotely access the document server computer 104. The patent attorney can then request to cross-reference the "PTO" category of document. Upon receiving a listing of documents contained in the requested "PTO" category of document, the patent attorney may advantageously select one or more versions of the MPEP document for cross-reference by the document server computer 104. The document server computer 104 may advantageously list the segments contained in each of the three editions of the MPEP which contain a key phrase (entered by the attorney and corresponding to a topic of interest) in a side-by-side-by-side display. In one embodiment, if one segment in one edition of the MPEP is found to contain the key phrase, that segment's segment identifier is listed in that portion of the side-by-side-by-side display corresponding to the selected MPEP edition. The patent attorney may then advantageously select one segment identifier from each of the three lists for cross-reference by the document server computer 104. The document server computer 104 may then display the segment contents in a side-by-side-by-side display appropriately distinguishing the key phrase. 
In another alternative embodiment, the document server computer 104 may list the three editions of the MPEP in a web page and request the user to select two of the three editions of the MPEP for cross-reference. The patent attorney may then select two editions for cross-reference by the document server computer 104. Subsequently, the two specified editions of the MPEP can be cross-referenced by the document server computer 104. In still another alternative embodiment, the MPEP document may advantageously include both editions and revisions of the MPEP. In this instance, differing versions of the document may be identified by, for example, a combination of the edition number, the month and year designation, and the revision number. Those of ordinary skill in the art will realize that the version indicators, such as "OLD" and "NEW" directories, may be located in another hierarchy or level in the document tree 90 without detracting from the scope of the invention. In another embodiment, the document database 208 may be implemented with Structured Query Language (SQL) code. SQL is a relational database language standardized by the International Organization for Standardization (ISO). The document database 208 can be implemented utilizing any number of commercially available database products such as, by way of example, Microsoft® Access and the like. In still another embodiment, the document database 208 may conform to any database standard, or may even conform to a non-standard, private specification. The hierarchical structure of the document database 208 may be implemented using the selected database. In still another embodiment, the documents may be stored in the document database 208 in units of storage recognized by the particular database, and the contents of the units may be identified, retrieved, compared, modified, and listed in order to facilitate the hierarchical cross-reference of the electronic contents as disclosed herein.
One embodiment of the interaction between the components of the document server computer 104, in particular the web server 204, the interface module 206, and the document database 208, in processing a document cross-reference request is generally illustrated in FIG. 5. Beginning in a start state 500, the document server computer 104 receives a user request to perform a document cross-reference in state 502. The document server computer 104 receives the user's identifying information such as, by way of example, a password, and may also receive the requested category of document to cross-reference. Proceeding to state 504, the user's identifying information is validated to ensure that the user is authorized to access the information contained in the specified category of document. In one embodiment, a data record may advantageously be used to maintain a list of users authorized to access the one or more categories of documents stored in the document server computer 104. The document server computer 104 can locate the data record for the particular category of document specified by the user and verify that the received user identifying information is found in the list of authorized users. By way of example, the user may specify "DOD" as the category of document and submit a password. Upon receipt of this request, the document server computer 104 can locate the "DOD" data record and determine if a password is required for access. If no password is required, then access is granted. If a password is required, then the "DOD" data record is searched to locate the submitted password. If the submitted password is not found in the "DOD" data record, an error message is displayed on the user computer 102 in state 506 and the document server computer 104 proceeds to end state 522. Alternatively, the user may be redirected to the previous page and prompted for correction of the identifying information. 
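The validation performed in state 504 can be sketched as a lookup in a per-category data record. The record layout, the sample password, and the function name below are hypothetical, and storing passwords in plain text is a simplification made only to mirror the description; a deployed system would normally store salted hashes instead.

```python
import hmac

# Hypothetical per-category data records. Plain-text passwords are used
# here only to mirror the description above.
CATEGORY_RECORDS = {
    "FAA": {"password_required": False, "passwords": set()},
    "DOD": {"password_required": True, "passwords": {"example-password"}},
}


def is_authorized(category, password, records=CATEGORY_RECORDS):
    """Decide whether access to `category` is granted (state 504)."""
    record = records.get(category)
    if record is None:
        return False                      # unknown category
    if not record["password_required"]:
        return True                       # category is not protected
    if password is None:
        return False
    supplied = password.encode("utf-8")
    # compare_digest compares in constant time for equal-length inputs.
    return any(hmac.compare_digest(supplied, stored.encode("utf-8"))
               for stored in record["passwords"])


if __name__ == "__main__":
    print(is_authorized("DOD", "example-password"))
    print(is_authorized("DOD", "wrong"))
```

A `False` result corresponds to the error message of state 506, or alternatively to redirecting the user to the previous page.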
If the selected category of document is not password protected, or the user submitted password is found in the selected category's data record, the document server computer 104 displays a list of the documents contained in the category in state 508. Each document name is displayed alongside a check box or is otherwise associated with a "toggle" indicator for identifying the selected and non-selected documents. The display in state 508 also contains a prompt for the key phrase on which the cross-reference is to be performed. One embodiment of the display of the list of documents and key-phrase prompt is generally illustrated in FIG. 6. The display can list the documents "VSCS" through "SARP Vol5" in an appropriate format, with each document name adjacent to or associated with a check box as detailed above. From this screen, the user may advantageously enter the key phrase "software documentation" in the key-phrase prompt and use a pointing device, such as a mouse or the like to select the documents "VSCS" and "Emails—Faa," and subsequently submit the documents and key phrase for cross-reference. Proceeding to state 510, the document server computer 104 receives the user submitted key phrase and documents for cross-reference. The selected documents are searched for the key phrase in state 512. In one embodiment, each document segment in the selected documents is advantageously searched for the key phrase, and the particular segments are identified accordingly. In performing the search, the computer 104 advantageously stops searching a particular segment once it has found one instance of the key phrase in the segment (and then flags that segment as containing the key phrase); thus the search may proceed more quickly through all of the segments. Alternatively, the computer 104 may search the entire segment and locate all instances of the key phrase, in order to rank the segments by the number of key-phrase "hits" in the segment. 
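The two search strategies just described, stopping at the first hit versus counting every hit in order to rank, can be contrasted in a short sketch. Documents are again assumed to be mappings from segment identifier to text, matching is case-sensitive here for simplicity, and all function names are illustrative.

```python
def contains_phrase(text, key_phrase):
    """Early-stop test: the substring search ends at the first occurrence."""
    return key_phrase in text


def count_hits(text, key_phrase):
    """Count all non-overlapping occurrences of key_phrase in text."""
    return text.count(key_phrase)


def flag_segments(document, key_phrase):
    """Segments containing the phrase, in document order (fast path)."""
    return [seg_id for seg_id, text in document.items()
            if contains_phrase(text, key_phrase)]


def rank_segments(document, key_phrase):
    """Segments containing the phrase, most hits first.

    sorted() is stable, so segments with equal hit counts keep their
    document order.
    """
    hits = {seg_id: count_hits(text, key_phrase)
            for seg_id, text in document.items()}
    matching = [seg_id for seg_id in document if hits[seg_id] > 0]
    return sorted(matching, key=lambda seg_id: hits[seg_id], reverse=True)
```

The fast path does less work per segment, while the ranking path must scan every segment completely before any list can be shown.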
As a further alternative, the computer 104 may locate all instances of the key phrase in one pass through the document and prepare a log of all key-phrase hits in the document and the location (segment, sub-segment, sub-sub-segment, down to the lowest level of the hierarchy) of each hit. The log obviates the need for some or all of the subsequent searches at lower levels in the hierarchy by providing a complete and accessible record of all instances of the key phrase in the document. The document segments containing the key phrase are presented to the user in a side-by-side display in state 514. One embodiment of the side-by-side display of the list of segments containing the key phrase is generally illustrated in FIG. 7. In a first list are presented segment identifiers corresponding to the identified segments from the first document submitted, and in a second list are presented segment identifiers corresponding to the identified segments from the second document submitted. The segment lists may be presented in a scrollable display; in one embodiment the segment identifiers are listed or ranked according to the number of instances of the key phrase within the corresponding segment. In another embodiment, the segment identifiers are listed according to the numerical or other order in which the corresponding segments appear in the underlying document. Continuing our example, the cross-reference of the requested documents "VSCS" and "Emails—Faa" on the key phrase "software documentation" may have identified the segments listed in the side-by-side segment list illustrated in FIG. 7. Utilizing this list, the user can quickly identify the document segments containing the key phrase. From this screen, the user can identify and submit one or more segments to the document server computer 104 for further cross-reference.
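A hit log of the kind described at the start of this passage, together with the two listing orders (by hit count and by document order), could be sketched as follows. The log here records only the segment identifier and the character offset of each hit; recording deeper levels of the hierarchy is omitted. The names and the log format are assumptions of this sketch.

```python
import re
from collections import Counter


def build_hit_log(document, key_phrase):
    """One pass over the document: record (segment_id, offset) per hit."""
    if not key_phrase:
        return []
    pattern = re.compile(re.escape(key_phrase), re.IGNORECASE)
    log = []
    for seg_id, text in document.items():
        for match in pattern.finditer(text):
            log.append((seg_id, match.start()))
    return log


def segments_in_document_order(log):
    """Distinct segment identifiers in the order they appear in the log."""
    return list(dict.fromkeys(seg_id for seg_id, _ in log))


def segments_by_hit_count(log):
    """Distinct segment identifiers, most key-phrase hits first."""
    counts = Counter(seg_id for seg_id, _ in log)
    return [seg_id for seg_id, _ in counts.most_common()]
```

Because the log already holds the offset of every hit, a later request to display a segment can reuse those offsets instead of searching the segment again.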
For example, the user can select and submit the segments "4.2.2" and "Sat03Jan98155908" to the document server computer 104, whereupon the contents of the submitted segments are advantageously displayed to the user. Note that the labels or segment identifiers for the document segments could be any label useful to the user. Thus, for example, the text of the section heading could be included in the label/identifier. Proceeding to state 516, the document server computer 104 receives the user submitted segments for further cross-reference. In state 518, the specified segments are searched for the key phrase (previously entered in state 502). In other words, the key phrase search or cross-reference is now performed at the segment level (the next level in the hierarchy), rather than at the document level as was done at the previous step in our example. Segment components, such as, by way of example, sentences, paragraphs, images, photographs, and video frames, containing the key phrase are identified. The subcomponents comprising the key phrase, such as, by way of example, words, letters, pixels, and frames, are further identified. Proceeding to state 520, the document server computer 104 displays the contents of the particular segments in a side-by-side display as generally illustrated in FIG. 8. Each display is clearly identified to indicate the segment being displayed. The contents of one requested segment are displayed on one side and the contents of the other requested segment are displayed on the other side. If only one segment was submitted for further cross-reference in state 516, then only that segment's contents are displayed, and one of the displays may advantageously be empty, or simply not shown. Furthermore, the displays may be scrollable to provide the user ease of navigation in viewing the displayed contents. In another embodiment, the contents of the particular segments may be displayed in one display or screen. 
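The segment-level search of state 518 can be illustrated for text segments with a short Python sketch. It assumes that the components are sentences, split naively on terminal punctuation, and that the subcomponents are the character spans of the key phrase within each sentence. The function name and the sample text are hypothetical.

```python
import re

def find_components(segment_text, key_phrase):
    """Return (sentence_index, sentence, spans) for each matching sentence.

    Sentence splitting here is deliberately naive: it breaks after '.',
    '!' or '?' followed by whitespace. `spans` lists the (start, end)
    character offsets of each key-phrase instance within the sentence.
    """
    pattern = re.compile(re.escape(key_phrase), re.IGNORECASE)
    sentences = re.split(r"(?<=[.!?])\s+", segment_text.strip())
    results = []
    for index, sentence in enumerate(sentences):
        spans = [match.span() for match in pattern.finditer(sentence)]
        if spans:
            results.append((index, sentence, spans))
    return results

# Hypothetical segment text, invented for this example.
text = ("The contractor shall deliver software documentation. "
        "Hardware is out of scope. "
        "Software documentation must be current.")
components = find_components(text, "software documentation")
```

The recorded spans are what a display layer would need in order to mark the key phrase itself, while the sentence index identifies the enclosing component.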
In the display, the key phrase itself and/or the components containing the key phrase may advantageously be indicated by methods such as, by way of example, underlining, "redlining," or the use of differing colors. Once the user has viewed the side-by-side display of the contents of the selected segments, the user may be directed to end state 522 or redirected to any of the previous states or pages, to perform further document cross-referencing. Continuing the example from above, the contents of "VSCS/4.2.2" are displayed on the left in the side-by-side display (see FIG. 8). The contents of "Emails—Faa/Sat03Jan98155908" are displayed on the right in the side-by-side display. The key phrase "software documentation" is highlighted in the text comprising the contents of the respective document segments. In another embodiment, the electronic contents stored in the document database 208 may include voice information. In cross-referencing and presenting the relevant segments of the voice information, the document server computer 104 can transform the voice information into textual form and present the textual form of the voice information, with the appropriate segments indicated, to the user in the side-by-side display. Furthermore, the voice information may advantageously be stored in the textual form in the document database 208. Thus, the voice information may advantageously be separated and stored in logical segments. These segments may comprise, for example, divisions by topic, or by time (30-second segments, one-minute segments, etc.), or by speaker. In still another embodiment, the voice information may be stored in the document database 208 as sound signals, and these signals can subsequently be separated into logical segments, cross-referenced on an orally spoken key phrase, and presented to the user through a speaker attached to the user computer 102. 
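The time-based segmentation of voice information stored in textual form can be sketched as below. The sketch assumes that a transcript is available as a list of (start time in seconds, word) pairs. That input format, the function names, and the sample transcript are assumptions made for illustration only.

```python
def segment_by_time(timed_words, window_seconds=30):
    """Group (start_seconds, word) pairs into fixed-length time segments.

    Returns a dict mapping a label such as "0-30s" to the segment text,
    in chronological order.
    """
    buckets = {}
    for start, word in timed_words:
        index = int(start // window_seconds)
        buckets.setdefault(index, []).append(word)
    return {
        f"{i * window_seconds}-{(i + 1) * window_seconds}s": " ".join(words)
        for i, words in sorted(buckets.items())
    }

def matching_segments(segments, key_phrase):
    """Labels of the time segments whose text contains the key phrase."""
    return [label for label, text in segments.items()
            if key_phrase.lower() in text.lower()]

# Hypothetical timestamped transcript, invented for this example.
timed_words = [
    (0.0, "the"), (1.0, "software"), (2.0, "documentation"),
    (31.0, "hardware"),
    (62.0, "software"), (63.0, "documentation"),
]
segments = segment_by_time(timed_words, window_seconds=30)
hits = matching_segments(segments, "software documentation")
```

One limitation of fixed windows is that a phrase spoken across a window boundary falls into two segments and is not matched by this sketch; segmentation by topic or by speaker would avoid that case.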
In still another embodiment, the electronic contents stored in the document database 208 may include video information. The video information may be separated into logical segments such as, by way of example, different scenes, different half-hour or one-hour TV shows, different topics, a predetermined length of time, or a predetermined number of frames. In cross-referencing various documents comprising the video information, the document server computer 104 can search the appropriate segments of the video and determine the segments containing the key phrase. In one embodiment, the document server searches the actual digitized video information to detect instances of the key phrase, such as a pattern, image, frame, scene, or a series of patterns, images, frames, scenes, etc. representing a specific event, person, object, motion, etc., by detecting specific values or patterns of values which correspond to the key phrase in the data comprising the digitized video. In another embodiment, the document server searches a text summary of what is being shown in the video. The segments containing the key phrase can be presented to the user in a side-by-side display as disclosed herein. The user can then select one or more video segments for further cross-reference by the document server computer 104. The document server computer 104 can identify components within the video segments that contain the key phrase. The submitted video segments can then be played in the side-by-side display with one video segment playing in one display and the other video segment playing in the other display. The video segments can be played simultaneously in the side-by-side display. Alternatively, the user may control the playing of the individual video segments. 
Furthermore, when a component of the video segment containing the key phrase is playing, an appropriate indicator, such as a light or banner message, may appear on the display alerting the user to the fact that the video segment being displayed on one or both of the side-by-side displays is a component of the video segment containing the key phrase. It is contemplated that a similar approach would be taken for segments of a recorded sound document such as music. The music may advantageously be segmented by song, by album, by artist, by time (such as 8-minute segments of a desired work of classical music or a 10-second segment of a 3-minute pop song), by subject, or by genre. FIG. 3 illustrates one embodiment of the flow of information between a user computer 102 and the document server computer 104 when the user accesses the web pages stored on the document server computer 104 in requesting a document cross-reference. In event A, the user utilizes a browser 202 executing on his or her user computer 102 and accesses the document server computer 104 through the communication medium 106. In particular, through a web page stored on the document server computer 104, the user submits information including a user password and, if applicable, a category of document to cross-reference. In event B, the document server computer 104 verifies the user submitted information and displays a list of documents available for cross-reference in a web page. Through this web page, the user can select two or more documents for cross-reference by the document server computer 104 in event C. In event D, the document server computer 104 cross-references the user specified documents on the key phrase. In particular, the cross-reference is performed by identifying segments within the documents that contain the key phrase. 
Segments containing the key phrase are appropriately identified and their segment identifiers are displayed to the user in a side-by-side display through a web page displayed on the user computer 102. For example, the identified segments from the first specified document may advantageously be listed in one of the side-by-side displays. The identified segments from the second specified document may advantageously be listed in the other of the side-by-side displays. Through this web page, the user advantageously selects two or more identified segments for further cross-reference by the document server computer 104 in event E. For example, the user may select a segment from the first specified document and a segment from the second specified document. In another embodiment, the user can select two segments from the same display or more than two segments in total from one or both displays. In event F, the document server computer 104 searches the selected segments and identifies the area or region of the segment containing the key phrase. The document server computer 104 then displays the contents of the user selected segments in a side-by-side display through a web page displayed on the user computer 102. In the side-by-side display, the key phrase itself and, optionally, the identified region or area of the segment containing the key phrase, are displayed in a contrasting manner for easy identification by the user. Subsequent to viewing the contents of the requested segments, the user can reaccess the side-by-side list of segments containing the key phrase and select one or more different segments from the side-by-side display for cross-reference by the document server computer 104. In one embodiment, the plurality of web pages facilitating the cross-reference of documents can be implemented with a "previous page" button well known to those of ordinary skill in the art. 
The user can use a pointing device, such as a mouse or the like, and click on the "previous page" button in the web page displaying the contents of the selected segments to access the side-by-side list of segments containing the key phrase. The user may then advantageously select one or more segments for cross-reference by the document server computer 104. This process may be repeated until the user has selected and viewed the appropriate portions of the desired segments. Thus, the user may, but is not required to, re-specify the category of document, or the documents of interest, after each cross-reference of the requested segment(s) by the document server computer 104. In another embodiment, the web pages facilitating the cross-reference of documents as disclosed herein may advantageously include a text entry area. The user may then specify one or more documents in the text entry area. Alternatively, the user may also specify one or more segments in the text entry area. The document server computer 104 may advantageously receive the one or more documents or the one or more sections entered by the user in the text entry area (as well as a key phrase as disclosed above) and subsequently perform the appropriate cross-reference. Thus, a more knowledgeable user may, but is not obliged to, make selections from the sequence of side-by-side displays in performing a document cross-reference. The more knowledgeable user may circumvent the sequence of making selections through the side-by-side displays by specifying the desired documents or segments for cross-reference through the text entry area. The invention advantageously performs an efficient cross-reference of two or more documents contained within a category of documents. The documents are searched in hierarchies or stages. In the first stage, the documents contained in the requested category of documents are presented. 
Subsequently, if particular documents are selected for further cross-reference, the documents' segments are searched for a key phrase input by the user. Each segment is searched until all instances of the key phrase are detected. If the key phrase is detected, the segment is identified as containing it. The identified segments are subsequently presented to the user. The invention affords the user an efficient cross-reference utility. In performing a cross-reference of a category of document, the user is first presented with documents contained in a specified category of document. Upon designating two or more documents for cross-reference on a given key phrase, the user is presented with side-by-side lists of segments within the documents that contain the key phrase. Upon selecting two or more segments for further cross-reference on the key phrase, the user is presented with the contents of the segments with the key phrase appropriately distinguished for ease of identification. Thus, the user is able to cross-reference the discussion of a given topic associated with the key phrase in multiple documents in an efficient and hierarchical manner. As an example, a very large specification, such as one setting forth the engineering requirements for a certain type of battle tank, can be stored on the document server computer 104. The specification may contain thousands of pages and may additionally comprise a number of volumes. Furthermore, each volume can contain thousands of sections. Multiple authorized users may advantageously be permitted to draft, and/or later amend, diverse segments of the stored specification that nonetheless relate to a common topic. In one embodiment, authorized users provide a password to the document server computer 104. If the password is authenticated, the user is permitted to draft or amend various segments of the specification stored on the document server computer 104. 
During various phases of the preparation of the specification, it may become necessary to visually compare all of the segments of the specification that relate to, for example, the tank's cannon. This may be done in order to ensure that the contents of the segments do not contradict one another or that they are not redundant. This invention allows users to remotely access the stored specification and quickly and efficiently cross-reference all of the segments relating to the tank cannon without having to view the contents of the entire specification, or to select between views of the various documents/segments/volumes in a sequential or "back-and-forth" fashion. For example, a user can request the document server computer 104 to perform a cross-reference of the specification segments. (The stored specification may advantageously be considered a category of document.) The user can be presented with the one or more volumes contained in the specification. The volumes may advantageously be considered the documents contained in the particular category of document. The document server computer 104 advantageously directs the user to the volumes that actually contain references to the tank cannon. In like fashion, the user can request a cross-reference of a specific volume and a specific segment contained in the specified volume. At each phase of the cross-reference, the user is presented with a segment list identifying the segments of the volume that refer to the tank cannon. The user can then specify one or more segments for the document server computer 104 to cross-reference. The contents of the requested segments are displayed and the references to the tank cannon are further distinguished for identification. Consequently, the user does not have to browse the thousands of pages contained in the specification to find each discussion of the tank cannon. 
Moreover, the user is able to focus the cross-reference to one of the multiple volumes, and one of the thousands of sections making up a volume, in observing, analyzing and comparing references to the cannon made in multiple volumes, documents, segments, etc. Similar advantages may be obtained if a user needs to compare references to the tank cannon in the specification, to references to the tank cannon in an archive of Department of Defense e-mails, and/or to references to the tank cannon in an archive of Department of Defense press releases. The user first selects the specification, e-mail archive and press release archive for cross-reference on a key phrase corresponding to the tank cannon, such as "120 mm." The system of the present invention advantageously searches the specification, e-mail archive and press-release archive for the key phrase "120 mm" and presents a side-by-side display of lists of segment identifiers corresponding to the segments of each of the three documents containing instances of the key phrase "120 mm." A first window of this display contains a list of specification segment identifiers (such as titles of volumes of the specification) corresponding to specification segments that contain the key phrase. A second window of this display contains a list of segment identifiers from the e-mail archive (such as titles of electronic folders containing all of the emails for a given date or the "RE:" text from individual emails) corresponding to segments of the e-mail archive that contain the key phrase. A third window of this display (advantageously located furthest to the right on the screen) contains a list of segment identifiers from the press-release archive (such as titles of folders containing press releases from a particular office within the Department of Defense) corresponding to segments of the press-release archive that contain the key phrase. 
Upon viewing this display, the user can select one or more segments from each window for further cross-reference. Advantageously, the present invention permits the user to save time by selecting only those segments which appear likely to contain relevant references to the tank cannon. In other words, the user can exercise his or her judgment as to whether a particular segment that is found to contain one or more instances of the key phrase is likely to contain information about the tank cannon that interests the user. For example, the user may see that certain segment identifiers in the first window (in our example, specification volume titles) represent specification volumes that are unlikely to contain information about the tank cannon that the user would be interested in, even though the volumes contain one or more instances of the key phrase. The user can thus focus on only the more relevant segments/volumes without wasting time on further investigation of the contents of the less relevant segments/volumes. Or, when reviewing the segment identifiers in the second window (in our example, the daily e-mail folders), the user may recognize that many of the e-mail folders are from dates too far in the past to be of any relevance. Thus the user saves time by selecting only the more recent, more relevant e-mail folders for further searching/cross-reference, and avoids further investigation of the older, less relevant folders. Likewise, when reviewing the segment identifiers in the third window, the user can select the press-release folders from the more relevant DoD offices and avoid the folders from the less relevant offices. In sum, the hierarchical cross-reference facilitated by the present invention permits the user, at each level in the hierarchy, to exercise judgment to steer the search toward the more relevant portions of a large document or data compilation, and to avoid time-consuming sequential or "page-by-page" review of these documents. 
The side-by-side display permits quick and easy comparison of multiple documents at each level in the hierarchy. Continuing the example, after the user selects a number of displayed segments for further cross-reference, the user is presented with another side-by-side display of lists of the subsections, etc. from each selected segment which contain instances of the key phrase. In the case of the specification, the user is presented with a list of chapters from the previously-selected volume which contain the key phrase. The lists of subsections are shown in a series of windows, one for each of the selected segments from the previous display, and the user is prompted to select one or more subsections for further cross-reference. As disclosed above, this process continues until the user reaches the lowest level of the hierarchy. At this point the user can easily compare instances of the key phrase in the specification, e-mail archive, and press-release archive, displayed in side-by-side windows. The user may then verify that the various discussions of the topic relating to the key phrase are consistent, identify where changes may be needed, or otherwise compare the treatment of the topic in diverse locations in multiple large documents. It is contemplated that the present invention can be used to cross-reference many different types of documents. For example, a user may wish to cross-reference the tank specification with an audio file that contains a compilation of speeches made by the Secretary of Defense, and with a video file containing footage of tests of various Army vehicles and equipment. The user's purpose may be to compare the discussion of the tank cannon in the specification with the Secretary's statements about the cannon in his speeches, and/or with the characteristics and performance of the cannon that may be observed in the test footage. 
The user is prompted to enter a key phrase suitable for searching the text of the specification, such as "120 mm," and to identify a key phrase suitable for searching the audio file, such as an audio clip of the Secretary or someone else saying "120 millimeter" or the user's own voice (transmitted through a microphone attached to the user's computer) saying "120 millimeter." The user is also prompted to specify a key phrase suitable for searching the video file, such as a frame or series of frames depicting the cannon. Upon entry of the key phrase(s) the user is presented a side-by-side display of lists of segment identifiers from each of the specification, audio file and video file corresponding to segments containing the appropriate key phrase. The segment identifiers may be, for example, the volume titles of the specification, the titles of individual speeches, and vehicle-test categories. As disclosed above, the present invention permits the user to search at each successive level in the hierarchy of each of the specification, audio file and video file, until actual instances of the key phrase in each of the files are selected and shown in side-by-side windows. In the case of an audio file, the window may advantageously contain a matrix of buttons that are used to play/pause/rewind an audio clip containing the key phrase, as well as a progress bar, timer, waveform display, etc. In the case of a video file, the window may advantageously contain an inset window through which the video is displayed, along with a set of buttons, progress bar, timer, etc. Thus the user in our example may read the text from the specification containing the key phrase "120 mm" in a first window, control the playback of an audio clip containing the Secretary's utterance of "120 millimeter" in a second window, and view a video clip of a test of the tank cannon in a third window, so as to quickly and easily compare information about the cannon from each of these three sources. 
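The level-by-level search described above can be sketched for a single text document as follows. The sketch assumes a nested-dictionary representation (volumes containing chapters containing text) and models the user's selection at each level as a callback. The function names, the `choose` callback, and the sample content are all hypothetical, and audio or video matching is outside its scope.

```python
def contains_phrase(node, key_phrase):
    """True if the key phrase occurs anywhere at or beneath this node."""
    if isinstance(node, str):
        return key_phrase.lower() in node.lower()
    return any(contains_phrase(child, key_phrase) for child in node.values())

def list_matching_children(node, key_phrase):
    """Labels of the child segments that contain the key phrase."""
    return [label for label, child in node.items()
            if contains_phrase(child, key_phrase)]

def drill_down(document, key_phrase, choose):
    """Descend one level at a time until a text leaf is reached.

    `choose` stands in for the user's selection from the displayed list:
    it receives the matching labels and returns one of them.
    Returns (path, text), or (path, None) if no segment matches.
    """
    node, path = document, []
    while not isinstance(node, str):
        options = list_matching_children(node, key_phrase)
        if not options:
            return path, None
        label = choose(options)
        path.append(label)
        node = node[label]
    return path, node

# Hypothetical specification content, invented for this example.
spec = {
    "Volume 1": {"Chapter 1": "General requirements.",
                 "Chapter 2": "The 120 mm cannon shall be stabilized."},
    "Volume 2": {"Chapter 1": "Engine requirements."},
}
path, text = drill_down(spec, "120 mm", lambda options: options[0])
```

In a side-by-side arrangement, one such descent would run per document, with the lists returned by `list_matching_children` shown in adjacent windows at each level.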
In one embodiment, the audio file is advantageously translated into text to facilitate searching. This could be done by a speech-to-text conversion program or by use of the actual speech copy received from the person who wrote the speech, or by a stenographer typing in the speech text as the person is speaking. It has been found that the present invention is especially useful when the user needs to cross-reference large documents, for example documents of 50-100 pages or more, or when the user must cross-reference a large number (5 or more) of smaller documents which together total over 50-100 pages. (One example of a large document is the International Civil Aviation Organization's Standards & Recommended Practices, a multivolume document that includes over 1500 pages.) It has also been found that the present invention is particularly useful when it is necessary to achieve precise consistency or correctness of wording or meaning among a number of documents, regardless of their length. This is often true when dealing with legal documents or documents intended for wide dissemination among the public, such as advertising materials. Of course, those of ordinary skill in the art will realize that the present invention may also be beneficially used to rapidly and hierarchically cross-reference smaller or less numerous documents in a variety of situations. An additional advantage of the present invention is the ability to search and cross-reference multiple documents that are of different types. As disclosed above, the system can cross-reference an archive of e-mails against a tank specification, an example of two documents that are of different types even though they are both text documents. In addition, the documents submitted for cross-reference may be a mix of text, audio, graphic, video and other types of documents. This invention may be embodied in other specific forms without departing from the essential characteristics as described herein. 
The embodiments described above are to be considered in all respects as illustrative only and not restrictive in any manner. The scope of the invention is indicated by the following claims rather than the foregoing description.
