System and method for comparing and representing similarity between documents using a drag and drop GUI within a dynamically generated list of document identifiers

6938034

Abstract

The present invention relates to the field of data processing, and particularly to a software system and associated method for use with a search engine. The engine searches data maintained in systems that are linked together over an associated network such as the Internet. More specifically, this invention pertains to a computer software product for determining, comparing, and representing the similarity between documents using a drag and drop Graphical User Interface (GUI) within a dynamically generated list of document identifiers. The invention uses this drag and drop GUI for convenient selection of document identifiers for further comparison. A similarity analysis request is then processed using a configurable similarity algorithm; this processing can be done on the client, proxy or server side. When the comparison process is completed, the GUI presents the similarity result of the comparison process as a Venn Diagram to show the level of similarity between the selected documents.

Claims

1. A computer readable program product for comparison of documents found on a network interconnected with a plurality of information processing units and hub processing units, the computer readable program product comprising instructions for:

Description

PARTIAL WAIVER OF COPYRIGHT

Overview of the Current Invention

The present invention provides a software system and associated method for use with a search engine. The engine searches data maintained in systems that are linked together over an associated network such as the Internet.
More specifically, this invention pertains to a computer software product for determining, comparing, and representing the similarity between documents using a drag and drop Graphical User Interface (GUI) within a dynamically generated list of document identifiers. The invention uses this drag and drop GUI interface for convenient selection of document identifiers for further comparison. Then processing of a similarity analysis request using a configurable similarity algorithm is executed. One such similarity algorithm is disclosed in U.S. patent application Ser. No. 09/543,230 filed on Apr. 5, 2000, with inventors Reiner Kraft, Qi Lu, and Shang-Hua Teng, entitled "Method and Apparatus for Determining the Similarity of Complex Designs" which is hereby incorporated in its entirety by reference. The processing of similarity analysis can be done on the client, proxy or server side. When the comparison process is completed, the GUI presents the similarity result of the comparison process as a Venn Diagram to show the level of similarity between the selected documents. The following example will illustrate how the invention works using a search result set as a preferred embodiment. Consider for instance that a user knows the content of a document A, and that user is generally satisfied with the overall content in relation to the issued search query. Another document B, displayed on the same search result page, has a promising title and abstract. However, there is no further information available from the search result page. Instead of loading document B into a document viewer, reading through the content and determining whether or not the document itself has similar properties as document A, which is a time-consuming process, the user actuates the present invention to perform this function. In particular, the user issues a selection request utilizing a pointing device like a mouse by clicking and holding the left mouse button to select the document link of document B. 
Then the user is able to drag and drop the document B identifier to the document A identifier, thereby starting the comparison process of the two selected documents. As a result, a GUI will be presented using a Venn diagram to show the similarity of the two documents. One embodiment of the invention integrates it within the Grandcentral Station site of portals (jCentral®, xCentral).

System Level Overview

FIG. 2 is a system level overview (200) of the components with which the invention accomplishes the comparison and representation of similarity between selected documents. The invention (204) resides within a web browser environment (202). The system architecture for the invention includes the following components, each described in its own section below: the GUI/Event Manager (206), the Downloader Component (208), the Comparison Unit (210), and the Result Set Manager (212).
FIG. 3a is a Graphical User Interface (300a) showing the drag and drop feature as practiced by this invention. The following example will illustrate how the invention works using a search result set as a preferred embodiment. Consider for instance that a user knows the content of a document A (302a) relating to a search query, in this example documents relating to the Mars Observer, and is generally satisfied with the overall content in relation to the issued search query. Another document B (304a), displayed on the same search result page, has a promising title and abstract. However, there is no further information available from the search result page. Instead of loading document B into a document viewer, reading through the content and determining whether or not the document itself has similar properties as document A, which is a time-consuming process, the user actuates the invention to perform the same function. In particular, the user issues a selection request utilizing a pointing device like a mouse by clicking and holding the left mouse button to select the document link of document B. Then the user is able to drag and drop the document B (306a) identifier to the document A identifier, thereby starting the comparison process of the two selected documents. As a result, a GUI will be presented using a Venn diagram to show the similarity of the two documents. FIG. 3b is a Venn Diagram (300b) showing the percentage similarity between the two documents as practiced by this invention.

GUI/Event Manager (206) Functional Overview

FIG. 4 illustrates a functional overview (400) of a Graphical User Interface/Event Manager module as practiced by the invention. It acts as an interface between the web browser environment and the invention. The GUI/Event Manager receives GUI events (402) from the web browser, such as mouse movements, user selections or the equivalents for further processing.
In addition, it will format result data received from the Result Set Manager for graphical representation. Before the GUI/Event Manager processes the result set, the search result set from an Internet search engine is received by the Result Set Manager. The search result items are marked there, so that the GUI/Event Manager knows how to represent them, and it associates appropriate event handlers with them (404). For each search result item there will be an event handler, which will listen to particular mouse events (e.g., mouse click, drag, drop). Mouse events are received from the web browser environment and are interpreted as a selection of one search result item. Once the source search result item and its associated target search result item are identified, both are forwarded to the Downloader Component (406). At the end, a comparison result representing the similarity of the source and target search result items will be received (408) and visually represented (410).

Downloader Component (208) Functional Overview

FIG. 5 is a functional overview (500) of a Downloader Component module as practiced by this invention. The Downloader Component receives as input the source and target search result items (502) from the GUI/Event Manager. A search result item is uniquely identified (504) using a URL or a similar document identifier. Then the Downloader Component selects (506) the appropriate transport and access protocol for the requested resources, and initiates a download (508) for both documents. Then a determination is made as to whether or not the download process is successful (510). In a web-based environment the URLs are downloaded using the HTTP protocol. If the retrieval was successful, the Downloader Component passes (512) the content of the two search result items to the Comparison Unit for further processing.
If a document cannot be successfully accessed or retrieved because of an expired or invalid URL or because of a similar problem, the Downloader Component sends an error notification to the GUI/Event Manager, in order to notify the user (514) of the failure.

Result Set Manager (212) Functional Overview

FIG. 6 is a functional overview (600) of a Result Set Manager module as practiced by this invention. The Result Set Manager identifies the appropriate time to activate the invention. It accomplishes this by intercepting all the data from a user's web browser session (602). The Result Set Manager will parse the URL to identify a supported search engine (604), that is, a search engine for which a DTD (data type descriptor) scheme is available in the Scheme DTD DB (database). A check is made to determine whether or not a given page is supported (606). If a page is not supported, then a determination is made as to whether or not the user session has terminated (616). If it has terminated, the process ends; conversely, if the session has not ended, more data is intercepted from the web session. Once a search result page from a supported search engine is detected, the actual work of the invention begins, parsing the result set data as described below.

Comparison Unit (210)

FIG. 7 is a functional overview (700) of a Comparison Unit module as practiced by this invention. First, the Comparison Unit receives the content of two search result items (702). To effect the comparison, this invention could make use of a comparison method as described in the U.S. patent application incorporated above by reference entitled "Method and Apparatus for Determining the Similarity of Complex Designs." In addition, any other comparison method or algorithm which is appropriate for the document type can be used.
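By way of illustration only, one simple comparison method appropriate for plain-text documents can be sketched as follows. This is not the method of the incorporated application; it is a swapped-in technique (Jaccard similarity over word shingles) chosen for brevity. The function names `shingles` and `similarity_percent` and the shingle length `k` are assumptions made for this sketch.

```python
import re
from typing import Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-word shingles of a text (lower-cased word tokens)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts contribute a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity_percent(doc_a: str, doc_b: str, k: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, expressed as a percentage."""
    a, b = shingles(doc_a, k), shingles(doc_b, k)
    if not a and not b:
        return 100.0  # assumption: two empty documents count as identical
    return 100.0 * len(a & b) / len(a | b)
```

A percentage of this kind is the sort of value that could be handed to the GUI/Event Manager for display; a content-only measure such as this one does not take document structure into account.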
Further, because the two documents may be of different types, for example, one search result item can be a PDF document and the second one an MS Word document, they each have to be converted (704) to the same document type before the actual comparison can occur. Companies such as INSO (http://www.inso.com) deliver document conversion filters, which can be used to facilitate the conversion of the two documents. Then the comparison of the two documents begins (706). The comparison algorithm itself will compare the structure as well as the content of the documents (708). Then the comparison algorithm will compute a value such as a percentage (710), which represents the similarity of the two documents. This value will be forwarded to the GUI/Event Manager component (712), which in turn displays the GUI representation of the similarity result for the user.

Comparing and Representing Similarity Between Documents

FIG. 8 illustrates the entire process (800) for Comparing and Representing the Similarity between Documents as practiced by this invention. First, a user enters a search query in the web browser (802) and a search result set will be returned (804) from an Internet search engine. A check is made to determine if the search engine is supported by the invention (806); if it is not supported, then the process ends; otherwise, the invention is activated by the Result Set Manager once a supported search engine is successfully identified. Then the Result Set Manager parses the search result set data (808). Part of the parsing process is to identify the search result items, and to mark them (810). Knowledge of the structure and content of the search result set data is retrieved from the Scheme DTD Database (812). Once the search result set data is parsed and the search result items are properly marked, this marked data is passed to the GUI/Event Manager (814).
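As a non-limiting sketch of the support check and the parsing and marking steps (806 through 814), the following assumes an HTML result page and uses a small in-memory table in place of the Scheme DTD Database. The host `search.example.com`, the CSS class `result-link`, the `item-N` identifiers, and all function and class names are hypothetical and serve only to illustrate the idea.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical stand-in for the Scheme DTD Database: for each supported
# search-engine host, the CSS class carried by its result links.
SCHEMES = {"search.example.com": "result-link"}


def find_scheme(page_url):
    """Return the scheme entry for a supported search engine, else None."""
    return SCHEMES.get(urlparse(page_url).hostname or "")


class ResultItemMarker(HTMLParser):
    """Collects result links and assigns each one a marker identifier."""

    def __init__(self, link_class):
        super().__init__()
        self.link_class = link_class
        self.items = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "a" and self.link_class in classes and attr.get("href"):
            self.items.append(
                {"id": "item-%d" % len(self.items), "url": attr["href"]}
            )


def mark_result_items(page_url, html):
    """Return the marked result items, or None if the page is unsupported."""
    link_class = find_scheme(page_url)
    if link_class is None:
        return None
    marker = ResultItemMarker(link_class)
    marker.feed(html)
    marker.close()
    return marker.items
```

Each returned entry pairs a marker identifier with the document URL, which is the information a GUI layer would need in order to attach handlers to individual search result items.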
The GUI/Event Manager will then associate proper event handlers (816) with each search result item so that user interaction with search result items can be detected. In a preferred embodiment the document is represented in HTML. The search result items could then be marked using some special tags, and event handlers can be represented as JavaScript code (client side scripting). The enhanced search result set page will be displayed in the user's browser (818), waiting for the user to start a selection process of a search result item. When the user starts a selection process, for example, by clicking on a search result item, the GUI/Event Manager receives this notification (820). The user uses a drag and drop mechanism to drag the selected search result item to another target search result item (822). As a result, the GUI/Event Manager will receive a drag and drop event notification from the web browser environment, along with the selected source and destination search result items (824). These two search result items will be forwarded to the Downloader component (826). The Downloader component tries to access and retrieve the selected documents (828). Next, a check is made (830) to determine whether one or both documents cannot be downloaded; if so, an error message (840) will be sent to the GUI/Event Manager and the process terminates at this time. If the Downloader component was able to successfully download both documents, this downloaded data will be forwarded to the Comparison Unit (832). Then the Comparison Unit receives the document data of the two documents and starts a comparison process (834). As a result of this process a similarity result will be computed and this result is forwarded to the GUI/Event Manager (836). Finally the GUI/Event Manager will generate a graphical display to show the similarity between the two selected documents (838).
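The portion of the process that follows a drop event (steps 826 through 840) can be sketched, under stated assumptions, as a single function that downloads both documents, reports an error if either download fails, and otherwise returns a similarity value. The names `http_fetch`, `word_overlap_percent`, and `compare_dropped_items` are hypothetical, the word-overlap measure is only a placeholder for whatever configurable similarity algorithm is used, and the sketch runs outside a browser for simplicity.

```python
import urllib.request


def http_fetch(url, timeout=10.0):
    """Download a document over HTTP and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def word_overlap_percent(doc_a, doc_b):
    """Placeholder similarity: Jaccard overlap of the two word sets, in percent."""
    a, b = set(doc_a.lower().split()), set(doc_b.lower().split())
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 100.0


def compare_dropped_items(source_url, target_url,
                          fetch=http_fetch, compare=word_overlap_percent):
    """Return ("ok", percent) on success or ("error", message) on failure."""
    try:
        source_doc = fetch(source_url)
        target_doc = fetch(target_url)
    except (OSError, ValueError) as exc:
        # urllib's URLError and HTTPError derive from OSError; a malformed
        # URL raises ValueError. Either case corresponds to the error path.
        return ("error", "download failed: %s" % exc)
    return ("ok", compare(source_doc, target_doc))
```

Passing `fetch` and `compare` as parameters keeps the sketch testable without network access and reflects the point that the similarity algorithm is configurable.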
With existing technology there are several different ways to implement the invention. The implementation above uses client side scripting with HTML pages, based on a plug-in architecture. Other ways of implementing this should be obvious to someone skilled in the art after this detailed discussion of the proposed system architecture.

Discussion of Hardware and Software Implementation Options

The present invention, as would be known to one of ordinary skill in the art, could be produced in hardware or software, or in a combination of hardware and software. The system, or method, according to the inventive principles as disclosed in connection with the preferred embodiment, may be produced in a single computer system having separate elements or means for performing the individual functions or steps described or claimed or one or more elements or means combining the performance of any of the functions or steps disclosed or claimed, or may be arranged in a distributed computer system or information processing system or information processing unit, interconnected by any suitable means as would be known by one of ordinary skill in the art. According to the inventive principles as disclosed in connection with the preferred embodiment, the invention and the inventive principles are not limited to any particular kind of computer system but may be used with any general purpose computer, as would be known to one of ordinary skill in the art, arranged to perform the functions described and the method steps described. The operations of such a computer, as described above, may be according to a computer program contained on a medium for use in the operation or control of the computer, as would be known to one of ordinary skill in the art.
The computer medium which may be used to hold or contain the computer program product, may be a fixture of the computer such as an embedded memory or may be on a transportable medium such as a disk, as would be known to one of ordinary skill in the art. The invention is not limited to any particular computer program or logic or language, or instruction but may be practiced with any such suitable program, logic or language, or instructions as would be known to one of ordinary skill in the art. Without limiting the principles of the disclosed invention any such computing system can include, inter alia, at least a computer readable medium or product allowing a computer to read data, instructions, messages or message packets, and other computer readable information from the computer readable medium. The computer readable medium may include non-volatile memory, such as ROM, Flash memory, floppy disk, Disk drive memory, CD-ROM, and other permanent storage. Additionally, a computer readable medium may include, for example, volatile storage such as RAM, buffers, cache memory, and network circuits. Furthermore, the computer readable medium may include computer readable information in a transitory state medium such as a network link and/or a network interface, including a wired network or a wireless network, that allow a computer to read such computer readable information.
