Method and apparatus for assigning keywords to documents6625335Abstract A keyword assignment system is provided to assign keywords when a digitized image of a document is created. The keyword assignment system includes a digitizer to generate the digitized image from the input document. A keyword entry system determines a keyword to be associated with the digitized image. A linker generates linking information that associates the keyword with the digitized image. A database is provided to store the digitized image and linking information. Claims What is claimed is: Description BACKGROUND AND SUMMARY OF THE INVENTION
Secretary A - Expense Report -- Purpose
.vertline. .sup. .vertline.- Date or Period
.vertline. .sup. .sup. Account
Purchase Order -- Requested by (or Project)
.sup. .vertline.- Vendor
.sup. .vertline.- Date
.sup. - Account
In the above example, assume that Secretary A enters his or her identification: "Secretary A" through suitable means such as by using a magnetic card inserted into a card reader on the imaging device. The keyword retrieval subsystem 106 and its associated keyword entry system 108 will retrieve the information stored in memory 120 associated with that user I.D. All or a portion of the contents of memory 120 would then be displayed on the touch screen or on another suitable display screen. While the presently preferred embodiment uses an inexpensive touch screen to display keyword choices and to receive the user's input, a larger system might employ a CRT or flat panel LCD display screen. Such display screen may include an associated computer subsystem for running a client application or browser for display of the information stored in memory 120. In such a configuration, memory 120 and the associated submodules 106 and 108 would function as an information server, supplying information to the browser's client application for display to the user. Depending on the quantity of information to be displayed, only the top level data elements may be displayed: "Expense Report" and "Purchase Order". Alternatively, if a larger display is available, lower level choices may also be displayed: "Purpose" "Requested by (or Project)". The user interface may include a suitable selection mechanism in addition to or as a substitute for the touch screen. Such selection mechanism may be a pointing device (track ball, touch pad, mouse, joy stick) or keyboard for selecting keywords from the displayed list and for selecting lower level data structures based on a displayed data element. In some applications, it may be possible to dispense with the user identification process. Thus the user identification subsystem 104 is optional in those applications. Instead of requiring the user to identify himself or herself, an alternate embodiment would allow the user to simply choose directly from an appropriate list of keywords. This would be done by displaying in hierarchical fashion a selection of keywords from which the user may choose. Alternatively, the user can enter the keyword directly using the keyword entry subsystem 108. As noted above, the keyword entry subsystem 108 has a variety of different user interface options. These include, numerical keys 110 and character keys 112, which may be provided using a physical keyboard or a virtual key board displayed on a computer touch screen. An on-line handwriting recognition module 114 allows the user the interact with the system through handwritten messages and a voice recognition system 116 allows the user to interact with the system using speech. As an alternate to on-line handwriting recognition, an electronic ink matching system 118 may be used. The electronic ink matching system may be implemented as described in U.S. Pat. No. 5,832,474, Lopresti, et. al. to retrieve information by conducting a partial match search against user-drawn annotations. Associated with the keyword entry module 108 is the keyword merging/linking module 124. Module 124 is responsible for merging or linking the user-selected keywords to the image data stored in memory 102. The keywords can be merged into the data structure of the image data file, such as within the tag of a TIFF file. Alternatively, the keywords can be linked to the image data file using suitable database software, or linked to a file as the file name (such as PO100.tif generated from the keyword "Purchase Order 100"), or used to specify a directory used to store an image, such as saving an image related to "Project-A" in the directory called Project-A. The resultant merged/linked image data and keywords are then supplied to a storage device 126 which can be either a locally attached device or a remote storage device accessed through a suitable network connection. The second alternate embodiment of the invention dispenses with the keyword list. Thus subsystem 106 is not required. The keyword entry subsystem 108 is still used, however. The user evokes one of the keyword input mechanisms 109 to enter keywords directly into module 108. Module 108, in turn, supplies the user selected keywords to the merging and linking module 124. If desired, the computationally expensive components, such as the handwriting and voice recognition components and/or the circle extraction and/or electronic ink processing components may be implemented on a remote server, such as the server associated with the image storage system. A third embodiment of the invention is illustrated in FIG. 3. This embodiment employs a keyword extraction technique that can be used to eliminate the need for the keyboard, handwriting, voice and electronic ink input mechanisms. Instead, the embodiment uses the scanning subsystem 100 and its associated memory 102 for input of the desired keywords. The user handwrites keywords on the first page of a document or on a cover sheet and those handwritten keywords are scanned and stored as images in memory 102. The keyword extraction module 134 and keyword recognition module 136 then analyzed the users handwritten keywords to extract them and convert them into suitable alphanumeric text that may be then merged or linked with the remaining image data by module 124. As an alternate to user handwritten entry of keywords, a circled region extraction technology may be employed. The details of the presently preferred extraction technology are described below. Instead of handwriting the keywords, the user simply draws circles around keywords that already appear in the text of the document being scanned. The circled region extraction technology identifies the user-drawn circles, then extracts the keywords within those identified circles. Preferably, the extracted keywords are then recognized. In some instances it may be useful to attach some form of user identification to the images being stored. The user identification module 104 provides this information directly to the merging/linking module 124, as illustrated. The user identification module 104 can be implemented in any of the ways described above in connection with FIG. 2. In all embodiments it may also be beneficial to add date and time information to the images being stored. Such information can be input by the user identification module 104. Of course, the date and time information may be added using other modules as well. In this regard, the merging and linking module 124 might be provided with the date and timestamp capability, for example. Circled Region Extraction The presently-preferred user-circled region extraction technology uses a contour analysis technique. The technique is based on identifying the contour of a candidate object. The candidate object may include user-drawn circles, machine printed frames or other non-text materials on the page. By analyzing the contour using a feature extraction module, the algorithm extracts the user-drawn circle. The contour is represented by a sequence of points with different curvature values. This makes it possible to distinguish a user-drawn circle from a machine-printed frame through a post processing analytical technique. Referring to FIG. 4, the first step (step 250) is to find the connected components within a given page. In this step the image is scanned line-by-line and each pixel is labeled if it is connected with its neighboring pixels. After labeling the pixels for connectivity, a bounding box is calculated for each connected component. These bounding boxes are used to extract the candidate area in the image. Next (at step 252) connected components representing text are eliminated. This is accomplished by analyzing the size of the bounding box and eliminating those connected components that have bounding boxes below a predetermined size. Next, halftone images are eliminated at step 254. Halftone images compose large connected components. The algorithm detects halftone images by assessing the black-to-white pixel ratio within the bounding box associated with the connected component in question. Halftone images tend to have more black pixels than areas containing text. To speed up the algorithm, an optional pre-processing step may be performed at 256. The contour analysis technique may generate more feature points than are actually required when there are extraneous characters touching the contour in question. This can occur, for example, when the user draws a circle that happens to intersect with other characters on the page. The optional pre-processing step eliminates these touching characters by performing a morphological operation to simplify the shape of the contour in the region where the user-drawn circle and extraneous character intersect. Next, the user-drawn circle is identified by examining different candidate areas on the page. The first step in this procedure, depicted at 258, involves generating the contour of the candidate object. This is performed by tracing the outline of the object. The contour is represented in computer memory as an ordered set of points (coordinates of the boundary pixels). The tracing scheme first scans for the starting point (a border pixel that is not previously traced). Then the trace starts in a clockwise direction along the convex outline of the object. When the trace goes back to the starting point, or to a point where no more black pixels can be found around the current one, the trace stops and the scanning process to find the next starting point continues. Next the contours obtained during step 258 are analyzed by calculating feature points associated with each contour (step 260). Contours obtained from step 258 can be closed curves or broken curves, due to noise in the image. Feature points are defined as high curvature points, including junctions of circles with other objects. Curvature can be calculated using re-sampling techniques, however this may not be reliable when noise is generated during the digitization process. By observation, the feature points can be detected approximately at either the local maxima or local minima on the x and y directions, even though not all maxima and minima are feature points. In the preferred implementation the starting point and ending point of each contour are treated as feature points. After feature points have been identified, the circled region is reconstructed using the feature points. This is illustrated at step 262. In essence, each contour generated at step 258 is broken into segments at the feature points. These segments are examined and reconnected such that segments belonging to different objects are separated and those belonging to the same object are connected. The main criterion for reconnecting the contour segments is to check the smoothness when making the transition between neighboring segments. For each contour segment, a small area around its starting and ending points is examined. The points on the two ends are fitted into lines so that the slope (angles coming and going out of the segment) can be estimated. These estimated angles are used to assess whether two line segments are approaching one another such that they should be connected as belonging to the same contour or are crossing one another such that they belong to unrelated contours. Using the circles reconstructed at step 262, the user-drawn circles are identified at step 264 through a series of tests. The first test is based on the size of the contour as well as the area the contour covers. If the length of the contour exceeds a predetermined threshold and the bounding box of the contour covers a predetermined area, the algorithm considers the contour to be a user-drawn circle. However, in order to discriminate between user-drawn circles and machine-printed frames, such as rectangular frames, machine-printed circles or tables, the smoothness of the connected contour is examined. One way to measure smoothness is to calculate the average curvature along the connected contour. If smoothness exceeds a predetermined threshold, the contour is considered to be machine-printed. Because the algorithm reconstructs circled regions from the calculated feature points, it is able to identify user-drawn circles even if they contain small gaps or breaks. If desired, the user-drawn circle candidates can be further evaluated to determine if any gaps are sufficiently large to warrant rejection as user-drawn circles. The analysis (depicted at step 266) involves assessing the distant between the starting point and ending point of a contour. Distance can be assessed in a variety of different ways. One technique for assessing distance is to determine whether one end point is within a predetermined radial distance from the other end point. We refer to this as a circular distance function. Another technique is to define a square bounding box of predetermined size around one end point and to determine whether the other end point is within that bounding box. We refer to this as the square distance function. A third technique is to define a square bounding box of predetermined size around one end point and then to rotate the bounding box around that end point to determine if at any rotational orientation the second end point falls within the bounding box. This will occur, if at all, when one comer of the bounding box lies on a line between the two end points. We call this the Manhattan distance function. If the contour fails to pass any of the above tests, then it is considered to be an open arc, as opposed to a user-drawn circle. Once the user-drawn circle is identified as described above, the bounding box around its contour is used to delimit the region that will be extracted for subsequent optical character recognition analysis. This is illustrated at step 268. The image inside the bounding box of the contour is extracted and optical character recognition is performed on the extracted image to ascertain the keyword or keywords for database lookup. If no good circle is identified, the system can be configured to attempt to extract a fax number from within a "circle" previously rejected as an open arc or poorly drawn circle. The extracted number is presented to the user to confirm or correct, as described above. Operation Keywords may be entered by the user under a variety of different circumstances. Keywords can be entered as part of a task specification, as when specifying the number of copies to be made by a copier prior to digitizing the scanned document. If the electronic imaging apparatus supports multi-tasking operations, the user can enter keywords while the apparatus is scanning the document. The scanning mechanism 100 may have an automated page feeding capability and the user may enter keywords while the feeding mechanism is in operation. Keywords may also be entered as part of a task specification such as identifying the image processing and/or image routing to be performed by the imaging device. In this regard, the imaging device may perform additional processes on the scanned image once it is loaded into memory 102. Additionally, the imaging device may have an associated telecommunications component (not shown) for faxing the image or sending it via a computer network to a remote location. Furthermore, the imaging device may have an associated printing mechanism to allow hard copies of the image in memory 102 to be printed. As noted above, the system can be used without first providing the user identification. This is done by entry into a keyword mode, which can be selected by selecting a suitable touchpad "button" on the display screen. When the user selects the keyword mode, a predetermined set of keywords is displayed on the touch screen. The keywords can be hierarchically organized. The user simply selects appropriate keywords (one or more keywords may be selected). When a number has to be entered as part of the keyword, such as a purchase order number, the numerical keyboard 110 is used. All selected keywords are stored in memory 120 so that the same keywords can be reused for a series of similar documents. Stored keywords can be recalled and modified by the user. Thus a keyword containing an associated purchase order number may be modified by recalling a previously stored keyword with number and then editing the number using the numerical keyboard. When keywords are not found in the keyword list, the user may add them. The presently preferred embodiment employs another operational mode, referred to herein as "Other Mode". When the user selects the Other Mode the numerical keyboard 110, character keyboard 112, handwriting recognition module 114, voice recognition module 116 and electronic ink module 118 may be used to input the keyword being added. Note that when the electronic ink module 118 is used, the user-entered annotation may not necessarily represent a keyword in the conventional sense. Rather, the user-drawn annotation can be any combination of user-drawn strokes that the user wishes to serve as a keyword to be linked with the image data being scanned. For example, the user could draw a simple picture and that picture would then serve as a keyword for later document retrieval. The electronic ink matching would thus also be useful when entering characters in a language that is not supported by the native system. For example, the user could draw Kanji characters or Chinese characters to represent words in an Asian language such as Japanese or Chinese, without requiring the system to perform recognition of those characters. As the user enters new keywords into the system, the user can also optionally store the logical meaning of those keywords. Thus the user could specify that a certain Vendor keyword is associated with the Purchase Order keyword. The keyword and its logical meaning could thus be stored using XML, for example. From the foregoing it will be seen that the invention provides a system for conveniently assigning a keyword to a digitized image. The user simply enters the keyword information at the digitizing equipment. The system formats the keyword information and assigns the formatted information to the digitized image. Because users are allowed to assign appropriate keywords to documents when they digitize them, they do not need to bother with the additional steps required by conventional software or hardware document management systems. While the invention has been described in its presently preferred embodiment in order to afford an enlightened understanding of the invention, and to describe its principles, it will be understood that the invention can be implemented in different ways without departing from the spirit of the invention as set forth in the appended claims.
|
Same subclass Same class Consider this |
||||||||||
