Visual dictionary5983237Abstract A system and method for improving the retrieval performance of a query engine in a visual information retrieval (VIR) system by encoding domain-specific knowledge into the VIR system through a visual dictionary or "victionary". The victionary is a dictionary-like information-mapping module that is used to retrieve visual information at a "semantic" level. A VIR system that performs generic image processing is enhanced by adding a query transformation unit and a query expansion unit, i.e., the victionary. With these additional components, a user may present a query either as a text term (such as a keyword or phrase), or as an image (with weights) and execute a "semantic query". During semantic query processing, the victionary-enhanced system transforms the user's original term (or image query) to a set of equivalent queries, and internally executes all the equivalent queries before presenting the results to the user. The victionary unit is responsible for taking the term (or image query) and finding the equivalent feature vectors (and weights). A result processor accumulates the score sheets of each equivalent query and presents a composite ranking that reflects a faithful representation of each equivalent query to the user. The architecture of the victionary-enhanced system is open and extensible, so that one or more domain-specific victionary modules can be plugged into the system. The plug-in architecture of the victionary module is effected through an application programming interface (API). Claims What is claimed is: Description BACKGROUND OF THE INVENTION
TABLE 1
______________________________________
QUERY STRUCTURE
Field Type
______________________________________
QUERY-TYPE IMAGE or TERM
BATCH-SIZE Integer
NOMINAL-FLAG
Boolean
COMBINATOR OR or AND
TERM-ARRAY Array of String
QUERY-FEATURE
Feature Vector, represented as packed binary data
QUERY-WEIGHT
Array of Float
EQUIV Array of Synonym Data Structure
______________________________________
The Query Structure is representative of the query image or query term. The Query Transformer 314 passes the Query Structure to a Visual Dictionary (Victionary) 316, and receives an equivalent query synonym structure, comprising feature vectors, in return. The equivalent Query Structure is then sent to a Query Processor 318. The Query Processor 318 executes the equivalent queries by retrieving feature vectors from the database 112 (using the database interface 208), comparing them against the equivalent feature vectors and creating a ranked score sheet indicating the similarity of each retrieved image to the query image or query term. The Query Processor 318 sends the ranked results back to the Query Transformer 314. The Query Transformer 314 then sends the ranked results along with portions of the Query Structure to the Result Processor 210. The Result Processor 210 uses the ordered results and the Query Structure, including user defined weights if the query utilized an image, to compute a final ranked list. Images (or their thumbnail representations) corresponding to the ranked list are preferably obtained through the Query Processor 318 from the database interface 208 and presented to the user through the Results Interface 212. The Result Processor 210 may optionally obtain the images (or their thumbnail representations) directly from the database interface 208. The VIR system 310 may include the Query Refinement Interface 214 that sends refinement parameters based on user input to the query refinement processor 216 for modification of a query. The Query Refinement Processor 216 may send a new Query Structure for refinement of the database query to the Query Processor 318 to produce a new set of query results and/or may send a refinement request to the Result Processor 210 to refine the prior set of query results. Query refinement preferably comprises weight manipulation, although other types of query refinement are envisioned. Referring to FIGS. 9-12, the victionary-enhanced VIR system 310 depicted in FIG. 8 executes the following steps during query processing. The specific steps performed within the Victionary unit 316 and within the Result Processor 210 are detailed separately below. Referring now to FIG. 9, the Query Parser process 312, previously shown in FIG. 8, will be described. Beginning at state 352, a user of the victionary-enhanced VIR system 310 provides a user query. Moving to state 312, process 312 gets the user query through a specified Graphical User Interface (GUI) operation, such as clicking a button. The process 312 also obtains a BATCH-SIZE of the number of result images desired to be seen by the user at a time. The BATCH-SIZE is used to present a desired number of images at one time on the visual display. Continuing to a decision state 356, process 312 determines if the user query is a term or an image query. If query is an image, process 312 sets a variable QUERY-TYPE=IMAGE, and proceeds to state 358. At state 358, process 312 uses a VIR engine to compute the feature vector of the image query and store it in a vector-type variable QUERY-FEATURE. Computation of feature vectors is described in Applicant's U.S. patent application Ser. No. 08/829,791, filed Mar. 28, 1997, for "SIMILARITY ENGINE FOR CONTENT-BASED RETRIEVAL OF OBJECTS". Proceeding to state 360, process 312 places the user-supplied weights from the GUI in a vector-type variable QUERY-WEIGHT. The weights are preferably part of the user query for an image type query. Continuing at state 362, process 312 sets a variable array TERM-ARRAY=NULL, and a variable COMBINATOR=NULL. The variable COMBINATOR is used to designate how terms are to be combined in a term type query. Moving to state 364, process 312 creates a variable QUERY-STRUCTURE object using the previously defined variables QUERY-TYPE, BATCH-SIZE, QUERY-FEATURE, NOMINAL-FLAG, QUERY-WEIGHT, TERM-ARRAY, and COMBINATOR. An additional field EQUIV of QUERY-STRUCTURE is unfilled at this time. Normally, NOMINAL FLAG is clear (OFF), but application logic, such as during the Query Transformer, may set the flag. If set (ON), NOMINAL FLAG only retrieves a nominal vector (described hereinbelow). Process 312 completes at state 366 by passing the QUERY-STRUCTURE created at state 364 to the Query Transformer 314 (FIG. 8). Returning to decision state 356, if the query type is term, process 312 sets the variable QUERY-TYPE=TERM, and advances to state 370. At state 370, process 312 preferably uses a standard text parser, such as described by Allen I. Holub, Compiler design in C, Englewood Cliffs, N.J., Prentice Hall, .COPYRGT.1990, to isolate individual phrases of the query. Continuing at state 372, process 312 places each phrase in a row of the array called TERM-ARRAY. Moving to a decision state 374, process 312 determines the type of a user-supplied combination condition, i.e., AND or OR. The variable COMBINATOR is set to a value of "OR" at state 376 if the combination condition is OR, or is set to a value of "AND" at state 378 if the combination condition is AND. Continuing at state 380, process 312 sets the variable QUERY-FEATURE=NULL, and the variable QUERY-WEIGHT=NULL. Proceeding to state 364, process 312 creates the variable QUERY-STRUCTURE object with QUERY-TYPE, BATCH-SIZE, NOMINAL-FLAG, COMBINATOR, TERM-ARRAY, QUERY-FEATURE, and QUERY-WEIGHT. Referring now to FIG. 10a, 10b and 10c, the Query Transformer process 314, previously shown in FIG. 8, will be described. Beginning at state 402, process 314 receives the QUERY-STRUCTURE generated by the Query Parser 312. Moving to a decision state 404, process 314 determines the type of query by examining the variable QUERY-TYPE. If the type of query is Image, process 314 continues at state 406 wherein the QUERY-STRUCTURE is passed to the Victionary 316 (FIG. 8). The structure and operation of the Victionary 316 will be described in conjunction with FIGS. 14-16 below. Moving to state 408, process 314 receives an equivalence structure SYNONYM from the Victionary (see details hereinbelow). Advancing to state 410, process 314 stores the SYNONYM structure in array variable EQUIV (part of QUERY-STRUCTURE). Continuing at state 412, process 314 initializes a variable ROW to a value of zero and a variable COLUMN to a value of zero. Moving to state 414, process 314 sends the feature vector at EQUIV [ROW,COLUMN] to the Query Processor 318 to execute VIR query through database interface 208. Process 314 receives a ranked result of the query back from the Query Processor 318 at state 416. Continuing at state 418, process 314 passes ROW, COLUMN, RESULT and the variables QUERY-TYPE, QUERY-WEIGHT, and BATCH-SIZE to the Result Processor 210. The Result Processor will be further described in conjunction with FIG. 12. Proceeding to a decision state 420, process 314 determines if all the columns in the current row of EQUIV have been processed. If not, process 314 moves to state 422, increments the value of variable COLUMN by one, and moves back to state 414 to send the next feature vector at EQUIV [ROW,COLUMN] to the Query Processor 318. If all the columns in the current row of EQUIV have been processed, as determined at state 420, process 314 advances to a decision state 424 and determines if all the rows in EQUIV have been processed. If not, process 314 moves to state 426, increments the value of variable ROW by one, reinitializes the value of COLUMN to zero, and moves back to state 414 to send the next feature vector at EQUIV [ROW,COLUMN] to the Query Processor 318. If all the rows of EQUIV have been processed, as determined at state 424, the Query Transformer process 314 completes at an end state 428. Returning to decision state 404, if the type of query is determined to be Term, process 314 proceeds to state 430 wherein a variable I, used as an array index, is initialized to zero. Continuing at state 432, process 432 passes the term at TERM-ARRAY [I] to the Victionary 316 (FIG. 8). The structure and operation of the Victionary 316 will be described in conjunction with FIGS. 14-16 below. Moving to state 434, process 314 receives the equivalence structure SYNONYM from the Victionary 316. Advancing to state 436, process 314 stores the SYNONYM structure in variable EQUIV [I]. Proceeding to a decision state 438, process 314 determines if all terms in array TERM-ARRAY have processed. If not, process 314 increments the value if I by one and moves back to state 432 to send the next term to the Victionary 316. When all the terms in array TERM-ARRAY have processed, as determined at decision state 438, process 314 moves a decision state 444 (FIG. 10b) through off-page connector A 442. At decision state 444, process 314 determines the value of variable COMBINATOR. Variable COMBINATOR was previously set at state 376 or 378 of the Query Parser process 312 (FIG. 9). If the value of COMBINATOR is OR, process 314 proceeds to state 446. At state 446, process 314 initializes a variable T to zero, variable ROW to a value of zero, variable COLUMN to a value of zero, and variable TERMROW to a value of zero. The variable T is used to index the current term of array EQUIV. Moving to state 448, process 314 chooses the feature vector at [ROW, COLUMN] of array EQUIV [T]. Proceeding to state 450, process 314 sends the feature vector selected at state 448 to the Query Processor 318 to execute a VIR query through database interface 208. Process 314 receives a ranked result of the query back from the Query Processor at state 452. Continuing at state 454, process 314 passes TERMROW, COLUMN, RESULT and the variables QUERY-TYPE and BATCH-SIZE to the Result Processor 210. Proceeding to a decision state 456, process 314 determines if all the columns in the current row of EQUIV [T] have been processed. If not, process 314 moves to state 458, increments the value of variable column by one, and moves back to state 448 to select the next feature vector of EQUIV [T]. If all the columns in the current row of EQUIV [T] have been processed, as determined at state 456, process 314 advances to a decision state 460 and determines if all the rows of EQUIV [T] have been processed. If not, process 314 moves to state 462, increments the value of variable ROW by one, reinitializes variable COLUMN to zero, increments the value of TERMROW by one, and moves back to state 448 as previously described. If all the rows of EQUIV [T] have been processed, as determined at state 460, process 314 proceeds to a decision state 464 and determines if all terms T of EQUIV have been processed. If not, process 314 moves to state 466, reinitializes the variable ROW to zero, reinitializes the variable COLUMN to zero increments the value of T by one, and moves to state 448 as previously described. However, if all terms of EQUIV have been processed, as determined at state 464, the Query Transformer process 314 completes at an end state 468. If there is a single term in the user's query, the variable COMBINATOR will be an OR. In this situation, there will be a single ROW and a single term T in EQUIV, so both decision states 460 and 464 will evaluate to the Yes branch. Returning to decision state 444, if the value of COMBINATOR is determined to be AND, process 314 proceeds to state 472 (FIG. 10C) through off page connector B 470. At state 472, process 314 creates a structure NETEQUIV. NETEQUIV is an array variable that is the same as array EQUIV but is initially empty. When the value of COMBINATOR is AND, NETEQUIV is used to store the intersection of terms in EQUIV that are in the user query. Proceeding to state 474, process 314 chooses index I such that EQUIV [I] identifies the term of EQUIV having the least number of equivalent feature vectors. This is to reduce processing time, because the result of intersecting sets of equivalent feature vectors will not have more members than the number of equivalent feature vectors of the smaller set. Advancing to state 476, process 314 sets NETEQUIV equal to EQUIV [1]. Continuing at state 478, process 314 marks EQUIV [I] to note which term in EQUIV is being processed so as not to repeat the same term at a later time. Proceeding to state 480, process 314 chooses index J such that EQUIV [J] is not marked. Moving to a function 482, process 314 performs a term intersection of NETEQUIV and EQUIV [I]. The tenn intersection function 482 is described in conjunction with FIG. 11. Continuing at state 484, process 314 sends NETEQUIV [ROW, COLUMN] to the Query Processor 318 to execute a VIR query through the database interface 208. Process 314 receives a ranked result of the query back from the Query Processor 318 at state 486. Continuing at state 488, process 314 passes ROW, COLUMN, RESULT, and the variables QUERY-TYPE and BATCH-SIZE to the Result Processor 210. Proceeding to a decision state 490, process 314 determines if all the columns in the current row of NETEQUIV have been processed. If not, process 314 moves to state 492, increments the value of variable COLUMN by one, and moves back to state 484 to send the next feature vector in NETEQUIV to the Query Processor 318. If all the columns in the current row of NETEQUIV have been processed, as determined at state 490, process 314 advances to a decision state 494 and determines if all the rows in NETEQUIV have been processed. If not, process 314 moves to state 496, increments the value of variable ROW by one, reinitializes the variable COLUMN to a value of zero, and moves back to state 484 as previously described. If all the rows of NETEQUIV have been processed, as determined at state 494, process 314 advances to state 498. At state 498, process 314 marks the term at EQUIV [J]. Moving to a decision state 500, process 314 determines if there are any unmarked terms in EQUIV, i.e., more feature vectors to be processed. If so, process 314 moves back to state 480 to choose a new index value J as previously described. However, if there are no unmarked terms in EQUIV, as determined at decision state 500, the Query Transformer process 314 completes at a done state 502. Referring now to FIG. 11, the Term Intersection function 482, previously defined in FIG. 10c will now be described. This function 482 is used by the Query Transformer process 314 when the value of COMBINATOR is AND. Function 482 begins at state 522 by generating a new array structure called NEWROW. The function 482 then accesses the feature vector at row I1 in NETEQUIV and column element J1 of row I1 in array NETEQUIV. Function 482 also accesses the feature vector at row I2 and column element J2 of row I2 of the term at EQUIV [I]. Next, at state 526, function 482 computes a distance D between the accessed feature vector of EQUIV [I] and the accessed feature vector of NETEQUIV. If the distance is less than a designer-defined threshold T, as determined at a decision state 526, function 482 computes the weighted average of the accessed feature vector of EQUIV [I] and the accessed feature vector of NETEQUIV at state 532, and places the computed average into the structure NEWROW at state 534. Next, function 482 loops over all column elements of row I1 of NETEQUIV and then replaces row I1 of NETEQUIV with NEWROW at state 546. Finally, function 482 loops over all the rows of NETEQUIV and deletes structure NEWROW at state 550 when all the rows of NETEQUIV have been processed. Referring to FIG. 12, the Result Processor process 210, previously shown in FIG. 8, will now be described. Beginning at a state 570, process 210 receives ROW, COLUMN, and RESULT from the Query Transformer 314, along with the variables QUERY-TYPE, QUERY-WEIGHT (for an image query), and BATCH-SIZE. Proceeding to state 572, process 210 stores RESULT into the [ROW,COLUMN] position of a RAW-RESULT matrix. Moving to a Diversity Maximization function 574, process 210 processes the RAW-RESULT matrix to produce a BATCH-SIZE number of ranked results to be displayed to the user. Function 574 ensures that the user is presented with the results in user-selectable batches such that for each batch, the diversity of result images is as broad as possible. The Diversity Maximization function 574 rearranges the results of the user query to show a proportion of the results corresponding to each equivalent feature vector. For example, if one user query has five equivalent feature vector queries, the results displayed to the user will have a portion from each equivalent feature vector query as determined by function 574. At the completion of the Diversity Maximization function 574, process 210 moves to state 576 and sends the ranked results of function 574 to the Results Interface 212 (FIG. 8). The Result Processor process 210 completes at an end state 578. IV. Diversity Maximizing Function for the Result Processor Referring to FIG. 13, the Diversity Maximization function 574 defined in FIG. 12 of the query processing process will now be described. The input to the flnction 574 is the RAW-RESULTS matrix 590 corresponding to the SYNONYM structure received by the Query Transformer 314 from the Victionary 316. An exemplary SYNONYM structure 840 is shown in FIG. 17. As previously discussed, the "visual sense" of a term is used to refer to a specific visual appearance that the term may take in a picture. The rows 842 of the matrix 840 correspond to individual visual senses and the columns 844 correspond to the variety of visual categories within each visual sense. For example, if a visual sense (row) is "cloudy sky", visual categories may include red clouds, white clouds, dark clouds, and so forth. Each entry of the RAW-RESULTS matrix is a ranked list of results returned from the Query Transformer 314. The steps herein are enumerated for a single batch (determined by BATCH-SIZE) and are repeated for every batch of results that the user wishes to view. A variable QUOTA is used to indicate how many images to display per visual sense (row) of the matrix, and a variable ENTRYQUOTA is used to indicate how many images to display per visual category (column) of the matrix. Each step below corresponds to one or more states in FIG. 13.
______________________________________
Step 1:
(a) For each row of the RAW-RESULTS matrix 590
(b) For each column entry
(c) compute COUNT(I,J) = the number of elements in the
ranked list
Step 2:
(a) For each row of the RAW-RESULTS matrix
(b) compute ROWSUM(I) = sum of COUNT(I,J) over its
column elements
Step 3:
compute TOTAL = sum of ROWSUM(I) over all rows
Step 4:
For each row of the RAW-RESULTS matrix
(a) compute QUOTA(I) =
TOTAL * ROWSUM(I) / BATCH-SIZE
Step 5:
Reorder the rows of the RAW-RESULTS matrix such that rows
are sorted in decreasing order of QUOTA (I)
Step 6:
For each row of the RAW-RESULTS matrix
(a) compute ENTRYQUOTA(I,J) =
QUOTA(I) * COUNT(I,J) / ROWSUM(I)
(b) Reorder column entries of the row such that columns are
sorted in decreasing order of ENTRYQUOTA(I,J)
Step 7:
(a) Create array FINAL-RESULT of BATCH-SIZE
(b) For each row I of the reorganized RAW-RESULTS matrix
(c) set variable ROWPICK(I) = 0
(d) For each column entry J
(e) set variable COLUMNPICK(I,J) = 0
Step 8:
(a) For each row I of the reorganized RAW-RESULTS matrix
(b) unless ROWPICK (I) >= ROWSUM(I)
(c) For each column entry J
(d) unless COLUMNPICK (I,J) >= ENTRYQUOTA(I,J)
1. pick the first unmarked result from the ranked list of
column entry J and place in FINAL-RESULT
2. mark the picked result
3. increment COLUMNPICK(I,J)
4. increment ROWPICK(I)
(e) if COLUMNPICK (I,J) >= ENTRYQUOTA(I,J) go to next
column
(f) if ROWPICK (I) >= ROWSUM(I) go to next row
Step 9
Return FINAL-RESULT to calling routine
______________________________________
V. Architecture of the Victionary Referring to FIG. 14, the Victionary 316 has five major components, as described below: 1. Concept Manager 682: The Concept Manager module 682 controls a central component of the Victionary 316 referred to as a Concept Lexicon 684, which is a list of phrases that the dictionary is capable of handling, together with an internal identifier, referred to as a concept ID (CID), which an Association Manager 686 (described hereinbelow) uses. The content of the Concept Lexicon 684 depends completely on the domain of application of the Victionary 316, and is outside the scope of this invention. Concepts can be categorized into several types, designated by a list of type identifiers (TIDs). The embodiment described in this specification distinguishes between two distinct categories of visual concepts: descriptor concepts that roughly correspond to noun phrases, and qualifier concepts, that roughly correspond to adjective phrases (e.g., "red"). Other embodiments may use other types of concepts, such as image-centric concepts and object-centric concepts, surface boundary-centric concept or surface texture-centric concept, and so on. The Concept Lexicon 684 can be preferably accessed in four different ways: a) Direct Access: In this situation the Victionary 316 is invoked with a term and the Concept Manager 682 uses a Lexicon traversal mechanism to identify the concept and returns the CID. The simplest embodiment of the Victionary 316 only has this option. b) Access through Concept Types: An index of concept types is locally maintained within the Concept Manager 682. Using this index, the Concept Manager 682 returns the TID, given the CID of a concept. For the purpose of simplicity, a CID has a single TID in this embodiment, although specific domains need not conform to this restriction. c) Access through a internal Concept Association Network: A Concept Association Network has the same purpose as a thesaurus, namely, relating different concepts. However, while a thesaurus associates a concept with its synonyms, homonyms, hyponyms and so on, the Concept Association Network relates concepts by what should be retrieved. Optionally, it also labels an association with a strength of relevance. Thus, although as per Roget's Thesaurus, the concepts "leaf", "turf" and "tree" are all hyponyms of the concept "plant", "turf" should be far less relevant than "tree" for semantic co-retrieval with "leaf" in the Victionary 316. d) Access through an external Concept Management Software: In this case the user's concept is passed through a Victionary interface to an external software program such as a thesaurus engine. The thesaurus engine retrieves equivalent concepts, which are then used by the Concept Manager for Direct Access. 2. Definitional Feature Lexicon (DFL) 690: This specification refers to a feature as a data unit having four components: a feature identifier (FID), a feature vector produced by a known feature schema, a weight vector which defines a hyper-ellipsoid in the feature space, and possibly an additional set of parameters used during computation or comparison of the features. These parameters are taken to have been produced by a media engine that was used to create the DFL. In this document we refer to every member of the DFL as an example feature. The lexicon is called definitional because it covers the examples provided by the user for every CID. The technical rational behind the example feature is explained below. The Victionary 316, as mentioned before, is example-based rather than model based. A model-based approach identifies a concept with a single region of definition in the feature space, while an example-based approach identifies several disjoint regions in the feature space that are perceptually or semantically associated with the same concept. FIG. 6 shows how these different regions may look like in a hypothetical two dimensional feature space. Each of the disjoint regions in the feature space is called a "prototype". The concept is represented as a disjoint collection of prototypes. Once the designer of the Victionary has identified the prototypes, the designer provides a number of examples from each prototype, so that in the feature space, these models approximately define the region covered, as shown in the example of FIG. 7. Since a feature vector defines a point in the feature space, and the weight vector defines a hyper-ellipsoid around it, one will need several smaller hyper-ellipsoids to span the region of definition and minimal additional regions. The DFL is preferably accessed through the Association Manager. 3. Nominal Feature Lexicon 688: Distinct from the DFL 690, the Victionary 316 also stores a Nominal Feature Lexicon (NFL), also accessed through the Association Manager 686. A nominal feature is computed after the DFL 690 is generated by a media engine, such as the VIR engine. The Query Processor 318 (FIG. 8) utilizes the VIR engine. Generation of the DFL is further described in the Creating Synonyms section hereinbelow. The purpose of a nominal feature is to represent a single "ballpark region" in feature space for each prototype (designated by a PID) of every CID. This is achieved by a process that computes one hyper-ellipsoid that covers the definitional region. In all likelihood, this nominal feature will also cover additional regions in the feature space that are near but not included in the definitional region. This computation may be conducted either inside the Victionary in one embodiment, or by an outside agent while creating the Victionary in another embodiment. 4. Association Manager 686: The Association Manager 686 has three functions. The Association Manager 686 maintains a data structure that indexes a concept to the NFL 688 and DFL 690. The data structure preferably is a B-tree indexing structure, although other indexing structures, such as a hash table, may be used. This data structure allows the user to perform the following operations: a) NFL 688 i) Function 1: Given a CID, it returns the nominal features of its prototypes. ii) Function 2: Given a feature vector, it returns a list of concept-prototype (CID-PID) pairs. The list represents all nominal features that the input feature vector is included in. Optionally, this list may be ranked in order of the distance of the input feature vector from a Nominal Feature Vector (NFV). This computation may be performed either within the Victionary, or by using the media engine of the system 310 where the Victionary is implemented. iii) Function 3: A variant of the second function, given a feature vector and a weight vector as input, it returns a list of (CID-PID) pairs. The list represents all nominal features the input feature region intersects. b) DFL 690 i) Function 1: Given a (CID-PID) pair, it gets all its example features. A variant is to just use the CID to get the examples. ii) Function 2: Given a feature vector and a CID, it returns a ranked list of PIDs. This computation may be performed either within the Victionary, or by using the media engine of the system where the Victionary is implemented. iii) Function 3: Given a qualifier concept and an example feature of descriptor concept, it modifies the example feature vector of the descriptor concept by the qualifier, and returns the modified feature vector. For example, a concept "yellow flower" can be generated by modifying all example features of the concept "flower" by changing the color primitive, as described in A. Gupta, "Visual information retrieval: a VIRAGE perspective", White Paper, Virage, Inc., 1995, of the example features of flower. 5. Controller Unit 680: The Victionary 316 mainly communicates with the Query Transformer 314 through an API, which reflects the functionality described above. In another embodiment, the Victionary also interacts with a Content Management Engine or a Text Engine (not shown). In a still further embodiment, if the specific Victionary is very large, it exchanges information with a back-end storage or database server (not shown). VI. Determining Equivalent Features within a Visual Dictionary Referring to FIG. 15, one embodiment of the Victionary 316 (FIG. 8 and 14) uses the following steps to determine the feature vectors equivalent to a term or an image query. These steps assume that the Victionary 316 has received the variable called QUERY-STRUCTURE from the Query Transformer 314 as discussed above. The Victionary 316 returns the variable SYNONYM to the Query Transformer 314 after executing the following steps of a Control Unit and Concept Manager process 700. Process 700 begins with receiving the QUERY-STRUCTURE 702 from the Query Transformer 314 (FIG. 8). Moving to a decision state 704, the Control Unit portion of process 700 determines whether the type of user query by checking the variable QUERY-TYPE. If the type of query is Image, process 700 proceeds to state 706 and sends QUERY-TYPE, QUERY-FEATURE and QUERY-WEIGHT to the Association Manager process 686. Next, process 700 invokes the Association Manager 686 to index a concept to the NFL 688 (FIG. 14) and DFL 690. The Association Manager process 686 will be further described in conjunction with FIG. 16 below. At the completion of the Association Manager 686, process 700 moves to state 710 wherein the Concept Manager portion of process 700 receives the SYNONYM structure returned by the Association Manager 686. Continuing at state 712, process 700 passes the SYNONYM structure to the Query Transformer 314 and then completes process 700 at an end state 714. Returning the discussion to decision state 704, if the type of query is determined to be Term, the Control Unit portion of process 700 passes the term to the Concept Manager portion of process 700 and moves to state 720. At state 720, process 700 parses the term to isolate individual words and places the words the words into a list called WORD. Proceeding to state 722, process 700 initializes a variable I, used to index the list WORD, to zero. Advancing to a decision state 724, process 724 determines if the word query WORD [I] is in the concept lexicon 684 (FIG. 14). If the indexed word query is not known to the concept lexicon, process 700 proceeds to state 726 and returns with a status of UNKNOWN to the Query Transformer 314 (FIG. 8). In another embodiment, the system 310 may use a text thesaurus to find the synonym of the unknown term. If the indexed word query is known to the concept lexicon, as determined at decision state 724, process 700 continues at a decision state 728 to determine if the indexed word query is a known terminal concept. A terminal concept is a concept that has not been further divided, e.g., "clear sky". If so, the Concept Manager portion of process 700 provides a CID, and a DESCRIPTOR at state 730. The CID and the DESCRIPTOR are saved in temporary storage. The DESCRIPTOR specifies whether the term is a nominative or a qualifier term. At the completion of state 730, process 700 invokes the Association Manager process 686 with QUERY-TYPE, CID and DESCRIPTOR. At the completion of the Association Manager 686, process 700 proceeds to a decision state 734 to determine if all the words of the parsed query term (from state 720) have been processed. If so, process moves to state 710 and receives the SYNONYM structure returned by the Association Manager 686. However, if all the words of the parsed query are not done, as determined at decision state 734, process 700 proceeds to state 736 wherein the value of index I is incremented by one to access the next word in the parsed term at state 724. Returning to decision state 728, if the indexed word query is not a known terminal concept, but is a known abstract concept, process 700 moves to state 740. An abstract concept is a concept that has been further divided, e.g., "sky" is further divided into "clear sky" and "cloudy sky". At state 740, the Concept Manager portion of process 700 provides a TID, and variable DESCRIPTOR is set to a value of NOMINATIVE. The Concept Manager accesses an indexing structure, such as a B-tree or hash table with the indexed word query to obtain the TID. Continuing at state 742, process 700 utilizes the TID obtained at state 740 to search another internal index or table to find all CIDs corresponding to the TID. For example, if the TID corresponds with "sky", the CIDs may include "clear sky" and "cloudy sky". The index search can utilize any search method, such as a B-tree method. The CIDs temporarily stored in memory. At the completion of state 742, process 700 invokes the Association Manager process 686 with QUERY-TYPE, CID(s) and DESCRIPTOR. At the completion of the Association Manager 686, process 700 proceeds to state 710 to receive the SYNONYM structure as previously described. Referring to FIG. 16, the Association Manager process 686, shown in FIGS. 14 and 15, will now be described. Beginning at a decision state 760, process 686 determines the type of user query. If QUERY-TYPE is Image, process 686 moves to state 762 and initializes an index variable I to zero. Proceeding to state 764, process 686 obtains PID [I] from the NFL 688 (FIG. 14). The PID is obtained by querying the NFL 688, wherein the NFL indexes on CID values using a standard structure such as a B-tree. Note that each PID corresponds to a visual sense (row) and that a NFV has an associated PID to uniquely identify the NFV. Proceeding to state 766, process 686 invokes the Query Processor 318 (FIG. 8) with the Nominal Feature Vector obtained at state 764, and the QUERY-FEATURE and QUERY-WEIGHT received from process 700 (FIG. 15). The Query Processor 318 computes a similarity distance between the NFV of PID [I] and the QUERY-FEATURE (i.e., the Feature Vector) representing the query image. At the completion of state 766, process 686 advances to state 768 to determine if all PIDs have been processed. If not, process 686 proceeds to state 770 to increment the value of index I by one and moves back to state 764 to get the next PID [I] from the NFL 688. When all PIDs have been processed, as determined at state 768, each PID has a computed similarity distance in terms of their similarity with QUERY-FEATURE. Moving to state 772, process 686 ranks the PIDs in a list by their similarity distance. Continuing at state 774, process 686 uses a predetermined threshold to identify the "K" most similar PIDs, and places each of these PIDs, and the associated Nominal Feature Vector (NFV) and Nominal Feature Weight (NFW) for the PID, in an array called NOMINAL. Continuing at state 776, process 686 generates a data structure called SYNONYM. An example SYNONYM data structure having rows and columns is shown in FIG. 17. At this point in the process 686, the features for each CID are populated by the corresponding NOMINAL array, i.e., the Nominal Feature Vector and Nominal Feature Weight of each CID-PID pair are entered into the SYNONYM structure. CIDs are utilized for Term type queries not for Image type queries. However, if the type of query is Image, a default CID is assigned for the image query to generalize process 686 for all query types. Advancing to a decision state 780, process 686 determines if the NOMINAL-FLAG is set (ON) in QUERY-STRUCTURE. As previously described, NOMINAL FLAG is normally clear (OFF), but application logic may set the flag. If set (ON), NOMINAL FLAG only retrieves a nominal vector. If the NOMINAL FLAG is set, process 686 returns the SYNONYM structure at state 782 and completes process 686. If the NOMINAL FLAG is clear (OFF), as determined at decision state 780, process 686 advances to state 784 wherein index variables I and J are both initialized to a value of zero. Process 686 then advances to a loop at states 786-794 to populate the SYNONYM structure. At state 786, for CID [I] and PID [J], process 686 queries the DFL 690 (FIG. 14) to get back an array called DEFINITIONAL with columns: FID, Definitional Feature Vector (DFV) and Definitional Feature Weight (DFW). Process 686 moves to a decision state 788 to determine if all PIDs for the current CID are done. If not, process 686 proceeds to state 790 to increment the value of index J by one and move back to state 786 to query DFL 690 again. When all PIDs for the current CID are done, as determined at state 788, process 686 advances to a decision state 792 to determine if all CIDs are processed. If not, process 686 proceeds to state 794 to increment the value of index I by one, reinitialize the value of index J to zero, and move back to state 786 to query DFL 690 again. When all CIDs are done, as determined at state 792, process 686 updates the SYNONYM structure by replacing the array NOMINAL with the array DEFINITIONAL. Continuing at state 796, process 686 checks each PID of SYNONYM to determine if QUAL (Qualifier is further described hereinbelow) is TRUE. If so, process 686 replaces the Definitional Feature Vector of the PID with the Nominal Feature Vector of the Qualifier. This is accomplished by unpacking the Definitional Feature Vector in the SYNONYM structure to extract the components of the DFV and replacing the active component of the Definitional Feature Vector with the active component from the Qualifier. Process 686 then repacks the modified Definitional Feature Vector. Moving to state 798, process 686 returns the SYNONYM structure to the Control Unit portion of process 700 (FIG. 15) and completes at an end state 800. Returning to the decision state 760, if QUERY-TYPE is Term, process 686 moves to state 810 and accesses the temporary storage to get the first CID. The CID(s) were previously obtained and stored at state 730 or state 742 (FIG. 15). Proceeding to a decision state 812, process 686 determines if the variable DESCRIPTOR has a value of NOMINATIVE. If so, process 686 advances to state 814 and queries the NFL 688 (FIG. 14) to retrieve a list containing a PID, a Nominal Feature Vector, and a Nominal Feature Weight. The NFL 688 indexes on CID values using a standard structure such as a B-tree. Note that each PID corresponds to a visual sense. Moving to a decision state 816, process 686 determines if all CIDs are done. If so, process 686 proceeds through connector A 818 to state 776 wherein the SYNONYM data structure is generated, as previously described. However, if all CIDs are not done, as determined at decision state 816, process 686 moves back to state 810 to access the next CID. Returning to the decision state 812, if the variable DESCRIPTOR does not have a value of NOMINATIVE, i.e., DESCRIPTOR=QUAL (Qualifier) for the current word, e.g., "blue", of the Term query, e.g., "blue sky", process 686 continues at state 820. At state 820, process 686 queries the NFL 688 (FIG. 14) to get back a list having a PID, a Nominal Feature Vector, and a Nominal Feature Weight. The NFL indexes on CID values using a standard structure such as a B-tree. The Nominal Feature Vector for a Qualifier is structurally the same as that of a Nominative term, but preferably differs in the content of the component. For example, if the Qualifier pertains to color, only the color component of the Feature Vector is meaningful, while the other components are set to NULL. The meaningful component is referred to as the active component. At the completion of state 820, process 686 advances to state 822 wherein variable QUAL is set to TRUE for each Term having a Qualifier. Moving to a decision state 824, process 686 determines if all CIDs are done. If so, process 686 proceeds through connector A 818 to state 776, as previously described. However, if all CIDs are not done, as determined at decision state 824, process 686 moves back to state 810 to access the next CID. VII. Creating Synonyms in the Victionary Several methods can be used in creating the associations between concepts and their feature equivalents in the Victionary 316. In one embodiment, the process may be manual whereby a system designer decides which terms should be included in the Victionary and for each term, picks the example set to create the DFL. In another embodiment, the process can be semi-automatic, where the designer picks a number of pre-annotated images. The system reads the annotation of each and automatically creates an association list, thereby relating each annotation word with a list images. Those annotation words that have less than a predefined list of images can be dropped. After this construction, the system designer manually reviews whether each word has a proper representative set of image examples. Still more sophisticated systems can employ pattern classification techniques to automatically group example images into visual classes, which can then be used as an automatic visual sense determination mechanism in the Victionary. As indicated earlier, a large body of literature exists in Computer Vision and Pattern Recognition research that uses machine learning techniques toward grouping examples into categories and several of these techniques can also be used in creating the synonyms for the Victionary. VIII. Extensibility of the Victionary There are two primary approaches to extend the scope of the Victionary 316 (FIG. 8). Extensibility of the query system is discussed in Applicant's U.S. patent application Ser. No. 08/829,791, filed Mar. 28, 1997, for "SIMILARITY ENGINE FOR CONTENT-BASED RETRIEVAL OF OBJECTS". The first primary approach is designing the Victionary to be a plug-in module, such that any number of victionaries can be associated with a single query system. The query system can then select one or more victionaries at query time and combine their results. With this flexibility, the query system 310 gains the previously unattained capability of adding domain knowledge to customize the similarity retrieval task for a specific problem In one embodiment, the plug-in architecture has the following application programming interface (API) functions: 1. RegisterVictionary a) Input: victionary name, feature schema used by victionary, pointers to the functions GetSynonyms, AddSynonyms and DeleteSynonyms, identifier of the text thesaurus used by the feature lexicon if any, and identifier of the database used by the feature vector, if any. b) Output: a victionary id c) Function: The system verifies that the function pointers of the input, the thesaurus and the database used are valid. 2. UnRegisterVictionary a) Input: victionary id b) Output: a success/failure notification c) Function: The system removes all internal references to the victionary and any space allocated to it. 3. ConfigureVictionary a) Input: victionary id, the element to be changed and the modified element b) Output: a success/failure notification c) Function: This is used, for example to change any of the initial parameters of a registered victionary such as the victionary name, feature schema used by victionary, pointers to the functions GetSynonyms, AddSynonyms and DeleteSynonyms, identifier of the text thesaurus used by the feature lexicon, and identifier of the database used by the feature vector. d) Example Usage: IF (ConfigureVictionary(v35, TextThesaurus, "/usr/local/lib/thes2.db") NOT-EQUAL-TO "ok") PRINT "Configuration Error" 4. CreateVictionaryInstance a) Input: victionary id b) Output: a handle to a newly created instance of the victionary c) Function: It creates a new victionary process and returns a handle to that process to the user. The user can then use the handle to make further operations on the victionary 5. DeleteVictionaryInstance a) Input: victionary instance handle b) Output: a success/failure notification c) Function: The system removes all internal references to the referred instance of the victionary and any space allocated to it. 6. AddConcept a) Input: a victionary id, a concept word b) Output: a concept handle c) Function: It creates an entry in the concept lexicon 7. DefineConcept a) Input: a victionary id, a concept handle, an example feature vector b) Output: a success/failure notification c) Function: It creates an entry in the DFL with the feature vector, and updates the NFL with the new entry 8. DeleteConcept a) Input: a victionary id, a concept handle b) Output: a success/failure notification c) Function: The system removes all internal references to the concept and its definitions within the victionrary and any space allocated to it. 9. GetSynonyms a) Input: a victionary instance handle, a QUERY-STRUCTURE as described before b) Output: the SYNONYM structure as described before c) Function: get synonyms as shown in FIG. 17 and described in the text A second approach to extending the Victionary is to generalize the victionary concept to other media types such as video and audio. It has been reported how to characterize audio segments into feature vectors for content-based retrieval. The commentators also reported that perceptual audio concepts such as "laughter", "ringing bells" can be acoustically shown to have diverse feature characteristics. Following this example, having an audio equivalent of the Victionary involves two changes to the embodiment described above: 1. The first change involves replacing the visual feature-based query processing engine with an audio feature-based engine. Alternatively, a multimedia engine can be used such that a single engine computes feature vectors from multiple media types and compares features created from any registered media. 2. The second change involves providing the query transformer with the power to identify whether a term should be expanded using an audio dictionary (rather than the visual dictionary). This can be easily accomplished by first querying each dictionary to enquire if the term is in its concept lexicon, and then either sending it to the dictionary that responds positively, or in case of multiple responses, by requesting user intervention. While the above detailed description has shown, described, and pointed out the fundamental novel features of the invention as applied to various embodiments, it will be understood that various omissions and substitutions and changes in the form and details of the system illustrated may be made by those skilled in the art, without departing from the intent of the invention.
|
Same subclass Same class Consider this |
||||||||||
