Apparatus and a method for retrieving image objects based on correlation with natural language sentence parameters
5684999
Abstract
An apparatus for retrieving image objects in accordance with a natural language sentence according to the present invention includes: an input section for receiving a natural language sentence; a language processing section for parsing the natural language sentence by referring to a dictionary for language analysis so as to obtain a syntactic structure of the natural language sentence; a situation element division section for converting the syntactic structure of the natural language sentence into a semantic structure of the natural language sentence and for dividing a situation represented by the semantic structure of the natural language sentence into at least one situation element by referring to a situation element division knowledge base; an image database for storing at least one image object corresponding to the at least one situation element; and a retrieval section for retrieving at least one image object from the image database by using the situation element as an image retrieval key.
Claims
What is claimed is:
Description
BACKGROUND OF THE INVENTION
______________________________________
if (a BE verb or a conjugate of the BE verb is
followed by a present participle of a verb)
then {remove the BE verb or the conjugate of the
BE verb.
convert the present participle of the verb
into the base form of the verb.
}
______________________________________
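The preprocessing rule above can be sketched in Python as follows. This is a minimal illustration only, not the implementation of the image-retrieval preprocessing section 203: the names `BE_FORMS`, `to_base_form`, and `remove_progressive` are hypothetical, and the suffix-stripping heuristic is a stand-in for a real dictionary lookup (it mishandles verbs such as "making" and nouns ending in "-ing").

```python
# Hypothetical sketch of the "BE verb + present participle" rule.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}


def to_base_form(participle):
    """Naively convert a present participle to a base form (assumption)."""
    stem = participle[:-3]  # drop the trailing "ing"
    # Undo consonant doubling, e.g. "running" -> "runn" -> "run".
    if len(stem) >= 3 and stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
        stem = stem[:-1]
    return stem


def remove_progressive(words):
    """Drop a BE verb followed by an -ing word and keep the verb's base form."""
    out = []
    i = 0
    while i < len(words):
        nxt = words[i + 1] if i + 1 < len(words) else ""
        is_be = words[i].lower() in BE_FORMS
        is_participle = nxt.lower().endswith("ing") and len(nxt) > 4
        if is_be and is_participle:
            out.append(to_base_form(nxt))  # keep only the base form
            i += 2                         # skip both the BE verb and the participle
        else:
            out.append(words[i])
            i += 1
    return out


print(remove_progressive("Tom is running with John".split()))
```

A production system would presumably consult the dictionary for language analysis rather than strip suffixes.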
A plurality of image-retrieval preprocessing knowledge pieces can be expressed as follows by using a plurality of if-then type condition branches. Such if-then type condition branches are sequentially processed by the image-retrieval preprocessing section 203.
______________________________________
if (condition for performing an image-retrieval
preprocess)
then {
perform the image-retrieval preprocess.
}
.
.
.
if (condition for performing an image-retrieval
preprocess)
then {
perform the image-retrieval preprocess.
}
______________________________________
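The sequential processing of if-then condition branches can be sketched as an ordered list of (condition, action) pairs. The sketch below is an assumption-laden illustration: `apply_rules`, both example rules, and the lookup table `PARTICIPLE_TO_BASE` are hypothetical and are not taken from the specification.

```python
# Hypothetical sketch: knowledge pieces applied one after another, in order.
BE_FORMS = {"am", "is", "are", "was", "were"}
PARTICIPLE_TO_BASE = {"running": "run", "walking": "walk"}  # stand-in dictionary


def ends_with_period(words):
    return bool(words) and words[-1].endswith(".")


def strip_period(words):
    last = words[-1].rstrip(".")
    return words[:-1] + ([last] if last else [])


def has_progressive(words):
    return any(a.lower() in BE_FORMS and b.lower() in PARTICIPLE_TO_BASE
               for a, b in zip(words, words[1:]))


def fix_progressive(words):
    out, i = [], 0
    while i < len(words):
        if (i + 1 < len(words) and words[i].lower() in BE_FORMS
                and words[i + 1].lower() in PARTICIPLE_TO_BASE):
            out.append(PARTICIPLE_TO_BASE[words[i + 1].lower()])
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out


RULES = [
    (ends_with_period, strip_period),    # illustrative rule, not from the text
    (has_progressive, fix_progressive),  # the BE + present participle rule
]


def apply_rules(words, rules=RULES):
    """Evaluate each condition in sequence and apply its action when it holds."""
    for condition, action in rules:
        if condition(words):
            words = action(words)
    return words


print(apply_rules("Tom is running with John.".split()))
```

Keeping the rules in a list makes the processing order explicit, which matters because a later rule sees the output of every earlier one.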
The parsing section 204 parses a natural language sentence by referring to the grammar dictionary 209. The grammar dictionary 209 includes the following rules ER1) to ER10) of English grammar, for example:
ER1) S → NP VP
ER2) VP → VP PP
ER3) VP → V NP
ER4) VP → V
ER5) NP → NP PP
ER6) NP → D NP
ER7) NP → A NP
ER8) NP → N NP
ER9) NP → N
ER10) PP → P NP
The symbols used in the above rules ER1) to ER10) represent the following items:
S: sentence
NP: noun phrase
N: noun
VP: verb phrase
V: verb
PP: preposition phrase
P: preposition
A: adjective
D: article (determiner)
Each rule defines that "a symbol on the left of → is a sequence of symbols on the right of → in that order". For example, rule ER5) defines that "a noun phrase is a sequence of a noun phrase and a preposition phrase in this order". FIG. 4 shows a parsing result of the natural language sentence (1') by the parsing section 204 in the form of a syntax tree. The syntax tree has a structure in which a plurality of nodes are combined in a hierarchical manner. One of rules ER1) to ER10) is applied in order to generate one node. For example, rule ER1) is applied to the uppermost node S in FIG. 4. This is because the node S coincides with the symbol "S" on the left of "→" in rule ER1), and the sequence of subordinate nodes of the node S coincides with the sequence of symbols on the right of "→" in rule ER1). As a result, nodes NP and VP are generated as subordinate nodes of the node S.
The situation element division section 205 converts the syntactic structure of a natural language sentence into a semantic structure by referring to the situation element division knowledge base 210, and divides a situation represented by the semantic structure of the natural language sentence into at least one situation element, so as to generate at least one image retrieval key corresponding to the at least one situation element.
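To make the role of rules ER1) to ER10) concrete, the following sketch encodes them as data and checks whether a word sequence can be derived from S. It uses a CYK-style chart with unary closure, which is a technique chosen here for illustration; the specification does not say which parsing algorithm the parsing section 204 uses. The `LEXICON` and the sample sentences are assumptions.

```python
# Hypothetical recognizer for grammar rules ER1) to ER10).
BINARY_RULES = [
    ("S", "NP", "VP"),   # ER1
    ("VP", "VP", "PP"),  # ER2
    ("VP", "V", "NP"),   # ER3
    ("NP", "NP", "PP"),  # ER5
    ("NP", "D", "NP"),   # ER6
    ("NP", "A", "NP"),   # ER7
    ("NP", "N", "NP"),   # ER8
    ("PP", "P", "NP"),   # ER10
]
UNARY_RULES = [("VP", "V"), ("NP", "N")]  # ER4, ER9

# Assumed part-of-speech lexicon; a real system would use its dictionary.
LEXICON = {
    "tom": {"N"}, "john": {"N"}, "group": {"N"}, "run": {"V"},
    "with": {"P"}, "in": {"P"}, "the": {"D"}, "first": {"A"},
}


def close_unary(symbols):
    """Add every symbol reachable through the unary rules ER4) and ER9)."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in UNARY_RULES:
            if rhs in symbols and lhs not in symbols:
                symbols.add(lhs)
                changed = True
    return symbols


def recognize(words):
    """Return True if the words can be derived from S under ER1) to ER10)."""
    n = len(words)
    if n == 0:
        return False
    chart = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = close_unary(set(LEXICON.get(word.lower(), ())))
    for length in range(2, n + 1):
        for start in range(0, n - length + 1):
            end = start + length
            cell = set()
            for mid in range(start + 1, end):
                for lhs, left, right in BINARY_RULES:
                    if left in chart[(start, mid)] and right in chart[(mid, end)]:
                        cell.add(lhs)
            chart[(start, end)] = close_unary(cell)
    return "S" in chart[(0, n)]


print(recognize("Tom run with John in the first group".split()))
```

The recognizer only answers yes or no; building the syntax tree of FIG. 4 would additionally require storing back-pointers in each chart cell.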
The situation element division knowledge base 210 includes a plurality of case frames. The case frames are used for defining the semantic structure of a given natural language sentence. FIG. 5 shows the structure of one case frame included in the situation element division knowledge base 210. This case frame concerns a predicate "run". As shown in FIG. 5, the case frame includes a prospective word 501 to become a predicate for a natural language sentence, a slot 502 describing a pair consisting of the name of a case element related to the word and a constraint(s) which the case element must satisfy, and situation division data 503 describing knowledge for dividing a situation represented by the semantic structure of the natural language sentence into at least one situation element. Examples of predicates of natural language sentences include predicate verbs and predicate adjectives. The case frame of the predicate "run" includes slots corresponding to at least four following case elements: an agent case (AGENT) indicating an actor of an act, a participant case (PARTICIPANT) indicating where the act is directed, a locative case (LOCATIVE) indicating a location where the act is performed, and a purpose case (PURPOSE) indicating a purpose of the act. This indicates that any natural language sentence including "run" as a predicate may include noun phrases or preposition phrases indicating an agent, a participant, a purpose, and a location of the act "run". The constraints for these four slots to satisfy are as follows: The agent case (AGENT) must be a noun phrase representing an animal (ANIMATE). The participant case (PARTICIPANT) must be a preposition phrase consisting of the preposition "with" and a noun phrase representing an animal (ANIMATE). The locative case (LOCATIVE) must be a preposition phrase consisting of the preposition "on" or "at" and a noun phrase representing a location (LOC) or a race (RACE). 
The purpose case (PURPOSE) must be a preposition phrase consisting of the preposition "in" and any noun phrase. (In the case frame shown in FIG. 5, * indicates that any noun phrase can be used.) The process for converting the syntactic structure of the natural language sentence shown in FIG. 4 into a semantic structure will be described by using the case frame shown in FIG. 5. This process is performed by the situation element division section 205. First, preposition phrases are extracted from the natural language sentence in accordance with the parsing results. A preposition phrase consists of a preposition and a noun phrase. Then, noun phrases are extracted from the remainder of the natural language sentence after the preposition phrases are extracted. For example, three preposition phrases "with+John", "at+XX Marathon", "in+the first group" are extracted from the natural language sentence having the syntactic structure shown in FIG. 4, after which a noun phrase "Tom" is extracted. The symbol "+" represents a partition between a preposition and a noun phrase in a preposition phrase. Next, it is determined which slot of the case frame has its constraint(s) satisfied by the preposition phrases and noun phrases thus extracted. For example, the determination as to whether or not the noun phrase "Tom" satisfies the constraint "the agent case must be an animal (ANIMATE)" of the case frame (FIG. 5) having "run" as a predicate is conducted as follows: First, it is determined whether "Tom" coincides with "animal" by referring to the image object retrieval dictionary 213. Since "Tom" does not coincide with "animal" on the character string level, it is determined whether "animal" is of an "Upper Class" relation with respect to "Tom" in the following manner: By conducting a retrieval from the image object retrieval dictionary 213 with "Tom" as a keyword, image object retrieval data concerning "Tom" can be obtained. 
By tracing pointers to other image object retrieval data of an "Upper Class" relation with respect to "Tom", image object retrieval data which are of an "Upper Class" relation with respect to "Tom" are obtained. If any of the image retrieval data thus obtained is related to "animal", then "Tom" is a member of "animal", so that the above-mentioned constraint for the agent is satisfied. Moreover, even in cases where no image object retrieval data concerning "animal" is found by only once tracing the pointers to other image object retrieval data, if image object retrieval data concerning "animal" is found by repeatedly tracing the pointers, then "Tom" is a member of "animal", so that the above-mentioned constraint is satisfied. Furthermore, in cases where a given set of image object retrieval data includes two or more pointers to other image object retrieval data which are of an "Upper Class" relation with respect to that image object retrieval data, and if no image object retrieval data concerning "animal" is found by repeatedly tracing one of the pointers, then the process may go back to the first image object retrieval data and start repeatedly tracing the other pointer. The case element "Tom" is determined not to belong to "animal" (i.e., not to satisfy the above-mentioned constraint) only if no image object retrieval data concerning "animal" is found by examining every image retrieval data traceable from the image object retrieval data concerning "Tom" in the above-mentioned manner. For example, consider the case where "human" is of an "Upper Class" relation with respect to "Tom" and "animal" is of an "Upper Class" relation with respect to "human" (although not shown in FIG. 3). In this case, "animal" is reached by tracing "Upper Class" relations twice from "Tom", so that the above-mentioned constraint "the agent case must be a noun phrase representing an animal" for the agent is satisfied. 
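The "Upper Class" pointer tracing described above amounts to a reachability search over the image object retrieval data. The sketch below is a minimal illustration: the table `UPPER_CLASS` and the function `belongs_to` are hypothetical, the Tom → human → animal chain follows the example in the text, and the remaining entries are assumptions.

```python
# Hypothetical "Upper Class" pointers between image object retrieval data.
UPPER_CLASS = {
    "Tom": ["human"],          # from the example in the text
    "human": ["animal"],       # from the example in the text
    "John": ["human"],         # assumption
    "XX Marathon": ["race"],   # assumption
}


def belongs_to(term, target, upper=UPPER_CLASS):
    """Return True if `target` equals `term` or is reachable via Upper Class pointers."""
    if term == target:          # character-string level coincidence
        return True
    seen = set()
    stack = [term]
    while stack:                # try every pointer, backtracking as needed
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for parent in upper.get(current, []):
            if parent == target:
                return True
            stack.append(parent)
    return False                # every traceable entry has been examined


print(belongs_to("Tom", "animal"))          # reached after two steps
print(belongs_to("XX Marathon", "animal"))  # no path in this table
```

The `seen` set guards against cycles in the pointer structure, a precaution the text does not discuss.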
Next, the other constraints for the agent case are examined, which reveals that "Tom" satisfies the constraint "the agent case must be a noun phrase". Therefore, the noun phrase "Tom" satisfies the constraints for the slot corresponding to the agent case of the predicate "run", so that it is confirmed that the noun phrase "Tom" is an agent of the predicate "run". Similarly, it is examined which slot has its constraint(s) satisfied by each of the preposition phrases "with+John", "at+XX Marathon", and "in+the first group". As a result of the examination, "with+John", "at+XX Marathon", and "in+the first group" satisfy the constraints for the slots corresponding to the participant case, the locative case, and the purpose case, respectively. Thus, the cases of the respective preposition phrases are confirmed. In the above-mentioned manner, the case frame structure representing the semantic structure of the natural language sentence (1') is confirmed as shown in FIG. 6. Referring back to FIG. 5, the situation division data 503 describes knowledge for dividing a situation represented by the case frame structure into at least one situation element. Herein, a situation element refers to a minimum sentence unit that describes a situation. Labels P1, P2, BG1, and BG2 of the situation division data 503 indicate that a situation expressed by the case frame having "run" as a predicate can be divided into two kinds of situation elements corresponding to the foreground and two kinds of situation elements corresponding to the background. Labels P1 and P2 correspond to the foreground, while labels BG1 and BG2 correspond to the background. Each circle shown below labels P1, P2, BG1, and BG2 of the situation division data 503 defines the correlation between the predicate and at least one case element related to the predicate. 
For example, the two circles shown below label P1 indicate that a combination of the predicate "run" and the case element "AGENT" defines a situation element corresponding to the foreground. This is based on the knowledge that a scene in which the case element "AGENT" is performing the act "run" is regarded as a situation element. The two circles shown below label P2 indicate that a combination of the predicate "run" and the case element "PARTICIPANT" defines a situation element corresponding to the foreground. This is based on the knowledge that a scene in which the case element "PARTICIPANT" is performing the act "run" is regarded as a situation element. The circle shown below label BG1 indicates that the case element "LOCATIVE" defines a situation element corresponding to the background. This is based on the knowledge that a scene representing a location is regarded as a situation element. The circle shown below label BG2 indicates that the case element "PURPOSE" defines a situation element corresponding to the background. This is based on the knowledge that a scene representing a purpose is regarded as a situation element. Thus, according to the situation element division data 503, a predicate and at least one case element related to the predicate are correlated with each other within one case frame. A single case element, or a combination of case elements, defines a situation element. Moreover, situation elements are classified into situation elements corresponding to the foreground and situation elements corresponding to the background by labels P1, P2, BG1, and BG2. Next, the process for dividing a situation represented by the semantic structure of a natural language sentence into at least one situation element by referring to the situation division data 503 so as to generate image retrieval keys corresponding to the situation elements will be described. This process is performed by the situation element division section 205. 
As described above, the semantic structure of a natural language sentence is expressed by a case frame. Hereinafter, this process will be described with respect to, as an example, the case frame shown in FIG. 6, which expresses the semantic structure of the natural language sentence (1'). For conciseness, the case frame shown in FIG. 5 is referred to as the "dictionary frame", while the case frame shown in FIG. 6 is referred to as the "analysis frame". The dictionary frame is stored in the situation element division knowledge base 210. As described above, in the analysis frame, the value of the case element "AGENT" is confirmed to be "Tom"; the value of the case element "PARTICIPANT" is confirmed to be "John"; the value of the case element "LOCATIVE" is confirmed to be "XX Marathon"; and the value of the case element "PURPOSE" is confirmed to be "the first group". Thus, an analysis frame includes a predicate, case elements related to the predicate, and the confirmed values of the case elements. The above-mentioned analysis frame expresses the meaning of the natural language sentence (1'). In order to define situation elements corresponding to the analysis frame, the predicate "run" and the case elements related to the predicate "run" must be correlated with one another by referring to the situation division data 503 of the dictionary frame shown in FIG. 5. For example, in connection with label P1 of the situation division data 503, a value "Tom run", which is a combination of "run" and "Tom", is defined as a situation element corresponding to the foreground. In connection with label P2 of the situation division data 503, a value "John run", which is a combination of "run" and "John", is defined as a situation element corresponding to the foreground. Similarly, in connection with label BG1 of the situation division data 503, a value "XX Marathon" is defined as a situation element corresponding to the background. 
In connection with label BG2 of the situation division data 503, a value "the first group" is defined as a situation element corresponding to the background. Thus, the situation represented by the analysis frame shown in FIG. 6 is divided into four situation elements. Each situation element has a semantic structure different from the structure of the dictionary frame or the analysis frame. The semantic structure of a situation element is expressed in the following format:
((case element (value of case element), . . . , case element (value of case element)), "foreground" or "background"); or
((predicate, case element (value of case element), . . . , case element (value of case element)), "foreground" or "background")
In the present specification, a situation element expressed in the above format is referred to as "image retrieval key". An image retrieval key represents a partial meaning of an input natural language sentence. Image retrieval keys are used for retrieving image objects from the image database 212, as described later. For example, the following image retrieval keys (2-1) to (2-4) are obtained from the analysis frame shown in FIG. 6. Hereinafter, with reference to an image retrieval key, any data of the form "case element (value of case element)" will be referred to as "case element data".
((run, AGENT (Tom)), "foreground"): (2-1)
((run, PARTICIPANT (John)), "foreground"): (2-2)
(RACE (XX Marathon), "background"): (2-3)
(* (the first group), "background"): (2-4)
The image retrieval section 211 retrieves image objects related to an image retrieval key from the image database 212. FIG. 7 shows a process for retrieving image objects related to an image retrieval key from the image database 212. This retrieval process is performed by the image retrieval section 211. Hereinafter, the retrieval process will be described with respect to each step, with reference to FIG. 7. 
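Before turning to the retrieval steps, the division of an analysis frame into image retrieval keys can be sketched as follows. This is an illustrative sketch under stated assumptions: the tuple layout `(predicate, case_data, ground)`, the table `DIVISION_DATA_RUN`, and the function `make_keys` are hypothetical. Note also that the sketch labels background case element data with the case element names LOCATIVE and PURPOSE, whereas keys (2-3) and (2-4) in the text carry the labels RACE and *.

```python
# Hypothetical encoding of the situation division data 503 for "run":
# (label, include predicate?, case elements, foreground/background).
DIVISION_DATA_RUN = [
    ("P1", True, ["AGENT"], "foreground"),
    ("P2", True, ["PARTICIPANT"], "foreground"),
    ("BG1", False, ["LOCATIVE"], "background"),
    ("BG2", False, ["PURPOSE"], "background"),
]

# Analysis frame of FIG. 6 as described in the text.
ANALYSIS_FRAME = {
    "predicate": "run",
    "cases": {
        "AGENT": "Tom",
        "PARTICIPANT": "John",
        "LOCATIVE": "XX Marathon",
        "PURPOSE": "the first group",
    },
}


def make_keys(frame, division):
    """Build one image retrieval key per situation element present in the frame."""
    keys = []
    for _label, with_predicate, case_names, ground in division:
        # Skip a situation element whose case elements are missing from the frame.
        if not all(name in frame["cases"] for name in case_names):
            continue
        case_data = tuple((name, frame["cases"][name]) for name in case_names)
        predicate = frame["predicate"] if with_predicate else None
        keys.append((predicate, case_data, ground))
    return keys


for key in make_keys(ANALYSIS_FRAME, DIVISION_DATA_RUN):
    print(key)
```

Using `None` for a missing predicate keeps both key formats in a single tuple shape, which simplifies the later retrieval and similarity steps.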
Step 701: Image retrieval keys generated by the situation element division section 205 are input to the image retrieval section 211. An IMAGE OBJECT RETRIEVAL ROUTINE (to be described later) is performed for each of the input image retrieval keys.
Step 702: Pairs of each image retrieval key and one or more image objects obtained through the IMAGE OBJECT RETRIEVAL ROUTINE are input to an IMAGE OBJECT SELECTION ROUTINE (to be described later). The IMAGE OBJECT SELECTION ROUTINE is performed for each of the input pairs.
Step 703: Among the image objects classified into a background group, those which are included in most pairs are selected.
Step 704: Among the image objects classified into a foreground group, those which are included in most pairs are selected.
Step 705: The image objects selected in step 703 are output as image objects corresponding to the background. The image objects selected in step 704 are output as image objects corresponding to the foreground.
[IMAGE OBJECT RETRIEVAL ROUTINE]
FIG. 8 shows a process performed in the IMAGE OBJECT RETRIEVAL ROUTINE. The IMAGE OBJECT RETRIEVAL ROUTINE is a routine for receiving an image retrieval key and obtaining one or more image objects related to the image retrieval key from the image database 212. Hereinafter, this process will be described with respect to each step, with reference to FIG. 8.
Step 801: An image retrieval key is input to the IMAGE OBJECT RETRIEVAL ROUTINE.
Step 802: It is determined whether or not a predicate describing an act of a case element is included in the input image retrieval key.
Step 803: If a predicate is included in the input image retrieval key, image object retrieval data having that predicate as a keyword is obtained from the image object retrieval dictionary 213. One or more image objects pointed to by the pointers of the image object retrieval data are obtained. If no predicate is included in the input image retrieval key, step 803 is skipped. 
Step 804: One of the sets of case element data contained in the input image retrieval key is taken out. Image object retrieval data having the value of the case element of the case element data as a keyword is obtained from the image object retrieval dictionary 213. One or more image objects that are pointed to by the pointers of the image object retrieval data are obtained.
Step 805: It is determined whether or not a retrieval of image objects has been performed for every case element data contained in the input image retrieval key. If the result is "No", the process goes back to step 802. If the result is "Yes", the process proceeds to step 806.
Step 806: One or more image objects commonly included in the image objects obtained based on the predicate and the image objects obtained based on each case element data are selected. The selected one or more image objects are returned as a retrieval result of the IMAGE OBJECT RETRIEVAL ROUTINE.
[IMAGE OBJECT SELECTION ROUTINE]
FIG. 9 shows a process performed in the IMAGE OBJECT SELECTION ROUTINE. The IMAGE OBJECT SELECTION ROUTINE is a routine for subjecting one or more image objects selected through the IMAGE OBJECT RETRIEVAL ROUTINE to a further screening. Hereinafter, this process will be described with respect to each step, with reference to FIG. 9.
Step 901: A pair of an image retrieval key and one or more image objects selected for the image retrieval key through the IMAGE OBJECT RETRIEVAL ROUTINE is input to the IMAGE OBJECT SELECTION ROUTINE. The pair is classified into either a background group or a foreground group depending on the value of the last term (i.e., "background" or "foreground") included in the input image retrieval key.
Step 902: For each of the one or more image objects included in the pair, the information 101 representing the meaning of the whole or part of the pixel data included in the image attribute data of that image object is obtained. As has been described with reference to FIG. 
1C, in image attribute data, the information 101 is represented by a natural language sentence. The natural language sentence included in the image attribute data is converted into an image retrieval key by the language processing section 202 and the situation element division section 205. This conversion is achieved by performing the same process as the process for converting a natural language sentence input to the image retrieval apparatus into an image retrieval key. In order to improve the efficiency of the above conversion, it is preferable to convert into image retrieval keys only those natural language sentences of the image attribute data which are classified into the same group that the pair is classified into. In other words, it is preferable to convert only the natural language sentences which belong to the background group into image retrieval keys if the pair belongs to the background group; and it is preferable to convert only the natural language sentences which belong to the foreground group into image retrieval keys if the pair belongs to the foreground group. Whether a given set of image attribute data belongs to the background or the foreground depends on whether or not region data is described in the region section 102 of the image attribute data. If some region data is described in the region section 102 of the image attribute data, that image attribute data belongs to the foreground group. If no region data is described in the region section 102 of the image attribute data, that image attribute data belongs to the background group. Moreover, in order to improve the efficiency of the above conversion, it is preferable, when registering image retrieval data in the image object database 214, to convert the natural language sentence contained in the image retrieval data into an image retrieval key. 
In this case, the image attribute data of the image objects selected through the IMAGE OBJECT RETRIEVAL ROUTINE have previously converted image retrieval keys. Accordingly, the above conversion process in step 902 can be omitted, thereby enhancing the retrieval speed.
Step 903: Similarity between each of the image retrieval keys obtained from the image attribute data and the input image retrieval key is calculated. This calculation is performed by the image retrieval key similarity calculation algorithm shown below, for example. Other algorithms can be adopted, however.
[Image retrieval key similarity calculation algorithm]
A score indicating the similarity is initialized at zero points before the calculation of the similarity.
One point is added to the score if the predicates of both image retrieval keys coincide with each other.
One point is added to the score if any case element data of the image retrieval keys coincide with each other.
In cases where the number of the case element data contained in each image retrieval key is one, one point is added to the score if only the values of the case elements of the case element data coincide with each other.
In the above algorithm, the similarity can be calculated even more accurately by adding a point equivalent to the correlation (e.g., upper-lower or synonymous correlation) between the predicates and/or case element data of both image retrieval keys even in cases where the predicates and/or case element data of both image retrieval keys do not completely coincide with each other. It can be easily determined whether or not an upper-lower type correlation or synonymous correlation holds by tracing the pointers to other image object retrieval data related to the image object retrieval data, as has been described with reference to FIG. 3. 
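The scoring rules above can be sketched in Python as follows. The tuple layout `(predicate, case_data, ground)` and the function `similarity` are hypothetical. One interpretive assumption is made loudly here: the second rule is read as "the case element data of the two keys coincide as a whole". That reading reproduces the scores of 1, 2, and 0 reported in the worked example later in the text, whereas a literal "any single case element data coincides" reading would give 1 point instead of 0 for key (4-3).

```python
# Hypothetical sketch of the image retrieval key similarity calculation.
def similarity(key_a, key_b):
    """Score two image retrieval keys of the form (predicate, case_data, ground)."""
    pred_a, data_a, _ground_a = key_a
    pred_b, data_b, _ground_b = key_b
    score = 0
    # Rule 1: the predicates coincide.
    if pred_a is not None and pred_a == pred_b:
        score += 1
    # Rule 2 (assumed reading): the case element data coincide as a whole.
    if set(data_a) == set(data_b):
        score += 1
    # Rule 3: each key has one case element data and only the values coincide.
    elif len(data_a) == 1 and len(data_b) == 1 and data_a[0][1] == data_b[0][1]:
        score += 1
    return score


key_2_1 = ("run", (("AGENT", "Tom"),), "foreground")
key_4_1 = ("run", (("AGENT", "John"),), "foreground")
key_4_2 = ("run", (("PARTICIPANT", "Tom"),), "foreground")
key_4_3 = ("run a race", (("AGENT", "Tom"), ("PARTICIPANT", "John")), "foreground")

for other in (key_4_1, key_4_2, key_4_3):
    print(similarity(key_2_1, other))
```

The partial credit for upper-lower or synonymous correlation mentioned above is not modeled in this sketch.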
Step 904: The region availability information of the image attribute data corresponding to the image retrieval keys that acquired a score higher than a predetermined number of points is made available, and the region availability information for the other image retrieval keys is made not available.
Step 905: It is determined whether or not the process from step 902 to step 904 is complete for every pair that has been input. If the result is "No", the process goes back to step 902. If the result is "Yes", image objects containing image attribute data are returned as results of the IMAGE OBJECT SELECTION ROUTINE, and this routine is finished.
The display section 215 displays the image objects obtained by the image retrieval section 211 by, if necessary, using region data. The display section 215 also has functions necessary for editing the image objects. FIG. 10 shows a process for receiving the image objects obtained by the image retrieval section 211 and displaying necessary portions of the pixel data of those image objects. This process is performed by the display section 215. Hereinafter, this process will be described with respect to each step, with reference to FIG. 10.
Step 1001: It is determined whether or not a plurality of image objects have been input. If the result is "Yes", then each image object is displayed in accordance with the following steps 1003 to 1007. If the result is "No", the process proceeds to step 1002.
Step 1002: The entire pixel data of the input image object is displayed. Thereafter, the process proceeds to step 1009.
Step 1003: One of the plurality of input image objects that has not been displayed is selected. By referring to the region availability information of the image retrieval data of the selected image object, image attribute data that is capable of being displayed is obtained.
Step 1004: Region data is obtained by referring to the region section 102 of the image attribute data. 
Step 1005: If no region data is described in the region section 102 of the image attribute data, the image object is judged to be related to the background, and the process proceeds to step 1006. If some region data is described in the region section 102 of the image attribute data, the image object is judged to be related to the foreground, and the process proceeds to step 1007.
Step 1006: The entire pixel data of the image object is displayed as the background.
Step 1007: Only a portion of the pixel data of the image object that corresponds to the region data of the image attribute data is displayed as the foreground.
Step 1008: It is determined whether or not the process is complete for every image object that has been input. If the result is "No", the process goes back to step 1003. If the result is "Yes", the process proceeds to step 1009.
Step 1009: If necessary, the displayed image objects are edited. Thereafter, the process is finished.
Hereinafter, the entire procedure for retrieving image objects by using the image retrieval apparatus according to the present invention will be described in more detail. It is assumed that the above-mentioned natural language sentence (1) is input to the image retrieval apparatus. In addition, the following images 1 and 2 are used as examples of image objects in the description of the retrieval process:
image 1: an image corresponding to FIGS. 1A, 1B, and 1C.
image 2: an image corresponding to FIGS. 11A, 11B, and 11C.
The natural language sentence (1) is input to the image retrieval apparatus by the input section 201. As the input section 201, a keyboard is typically used. However, a tablet or the like may also be used instead of a keyboard. The parsing section 204 parses the input natural language sentence (1). The details of the process by the parsing section 204 have been described earlier and therefore are omitted here. 
The results of the parsing by the parsing section 204 are expressed in the form of a syntax tree shown in FIG. 4, for example. The situation element division section 205 converts the syntactic structure of the input natural language sentence (1) into a semantic structure, and divides a situation represented by the semantic structure of the natural language sentence into at least one situation element, so as to generate at least one image retrieval key corresponding to the at least one situation element. The details of the process by the situation element division section 205 have been described earlier and therefore are omitted here. The results of semantic analysis by the situation element division section 205 are expressed in the form of the analysis frame shown in FIG. 6, for example. In accordance with the analysis frame shown in FIG. 6, four image retrieval keys (2-1) to (2-4) are obtained. The image retrieval keys represent the semantic structure of the respective situation elements. The image retrieval section 211 retrieves image objects from the image database 212 based on the image retrieval keys. The retrieval process includes a process by the IMAGE OBJECT RETRIEVAL ROUTINE and a process by the IMAGE OBJECT SELECTION ROUTINE. The IMAGE OBJECT RETRIEVAL ROUTINE obtains image objects related to the natural language sentence (1) from the image database 212 by using the image retrieval keys (2-1) to (2-4). The IMAGE OBJECT SELECTION ROUTINE subjects the image objects retrieved based on the image retrieval keys and the image attribute data of the image objects obtained through the IMAGE OBJECT RETRIEVAL ROUTINE to a further screening. First, the process by the IMAGE OBJECT RETRIEVAL ROUTINE will be described with respect to a case where the above-mentioned image retrieval key (2-1) is input to the IMAGE OBJECT RETRIEVAL ROUTINE. 
Since the image retrieval key (2-1) contains the predicate "run", image object retrieval data including the predicate "run" as a keyword are obtained from the image object retrieval dictionary 213 shown in FIG. 3 in accordance with step 803 shown in FIG. 8. The image object retrieval data includes "Image 1" and "Image 2" as pointers to the related image objects, as shown in FIG. 3. In the remaining portion of the present specification, it is assumed, for conciseness, that the only image objects that can be selected by the IMAGE OBJECT RETRIEVAL ROUTINE are the specific image objects shown in FIG. 3. Accordingly, image 1 and image 2 are obtained by tracing the above pointers. Since the image retrieval key (2-1) includes the case element data "Agent (Tom)", image object retrieval data including the case element data value "Tom" as a keyword is obtained from the image object retrieval dictionary 213 shown in FIG. 3, in accordance with step 804 shown in FIG. 8. The image object retrieval data includes "Image 2" and "Image 9" as pointers to the related image objects, as shown in FIG. 3. Accordingly, image 2 and image 9 are obtained by tracing these pointers. Next, image objects commonly included in the image objects obtained based on the predicate "run" (i.e., image 1 and image 2) and the image objects obtained based on the case element data "AGENT (Tom)" (i.e., image 2 and image 9) are selected. As a result, image 2 is selected as the common image object. This selection result is output as a result of the IMAGE OBJECT RETRIEVAL ROUTINE. Similarly, the image retrieval keys (2-2) to (2-4) are consecutively input to the IMAGE OBJECT RETRIEVAL ROUTINE. 
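The walk-through above for image retrieval key (2-1) can be sketched as a set intersection. The function `retrieve` and the tuple layout `(predicate, case_data, ground)` are hypothetical; the two dictionary entries mirror the pointers that the text attributes to FIG. 3 for "run" and "Tom", and the dictionary is otherwise left empty.

```python
# Pointers from keywords to image objects, as described for FIG. 3 in the text.
RETRIEVAL_DICTIONARY = {
    "run": {"Image 1", "Image 2"},
    "Tom": {"Image 2", "Image 9"},
}


def retrieve(key, dictionary=RETRIEVAL_DICTIONARY):
    """Sketch of steps 801 to 806: intersect the image objects found per keyword."""
    predicate, case_data, _ground = key
    candidate_sets = []
    if predicate is not None:                      # steps 802 and 803
        candidate_sets.append(set(dictionary.get(predicate, ())))
    for _case_name, value in case_data:            # steps 804 and 805
        candidate_sets.append(set(dictionary.get(value, ())))
    if not candidate_sets:
        return set()
    return set.intersection(*candidate_sets)       # step 806


key_2_1 = ("run", (("AGENT", "Tom"),), "foreground")
print(retrieve(key_2_1))
```

A keyword that is absent from the dictionary contributes an empty set, so the intersection is empty in that case; whether the apparatus behaves this way is not stated in the text.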
As a result, the following image objects are selected, in accordance with the input image retrieval key, through the IMAGE OBJECT RETRIEVAL ROUTINE:
image retrieval key (2-1): "Image 2"
image retrieval key (2-2): "Image 1" and "Image 2"
image retrieval key (2-3): "Image 2" and "Image 10"
image retrieval key (2-4): "Image 2" and "Image 12"
Next, the process by the IMAGE OBJECT SELECTION ROUTINE will be described. As described above, image 2 has been selected for the image retrieval key (2-1), so that a pair consisting of the image retrieval key (2-1) and the image object "Image 2" is input to the IMAGE OBJECT SELECTION ROUTINE. The last term of the image retrieval key (2-1) is "foreground". Therefore, the pair of the image retrieval key (2-1) and the image object "Image 2" is classified into the foreground group, in accordance with step 901 shown in FIG. 9. Similarly, the respective pairs of image retrieval keys and image objects are classified into either the foreground group or the background group as follows:
Foreground group:
the pair of the image retrieval key (2-1) and the image object "Image 2": (3-1)
the pair of the image retrieval key (2-2) and the image objects "Image 1" and "Image 2": (3-2)
Background group:
the pair of the image retrieval key (2-3) and the image objects "Image 2" and "Image 10": (3-3)
the pair of the image retrieval key (2-4) and the image objects "Image 2" and "Image 12": (3-4)
The pair (3-1) of the image retrieval key (2-1) and the image object "Image 2" belongs to the foreground group. The image object "Image 2" has image attribute data of the structure shown in FIG. 11C. Among the image attribute data of the image object "Image 2", only the three natural language sentences ("John is running", "Tom is running", and "Tom is running a race with John") of the image attribute data corresponding to the foreground are converted into image retrieval keys, in accordance with step 902 shown in FIG. 9.
As a result, the following three image retrieval keys are obtained:
((run, AGENT (John)), "foreground"): (4-1)
((run, AGENT (Tom)), "foreground"): (4-2)
((run a race, AGENT (Tom), PARTICIPANT (John)), "foreground"): (4-3)
The similarity between the image retrieval key (2-1) and each of the three image retrieval keys (4-1) to (4-3) is calculated in accordance with step 903 shown in FIG. 9. The following results are obtained by calculating the respective similarities in accordance with the image retrieval key similarity calculation algorithm:
Similarity between the image retrieval key (2-1) and the image retrieval key (4-1): 1 point because the predicates of the image retrieval keys coincide with each other.
Similarity between the image retrieval key (2-1) and the image retrieval key (4-2): 2 points because the predicates of the image retrieval keys coincide with each other; the number of the case element data of each image retrieval key is one; and the values of the case element data of the image retrieval keys coincide with each other.
Similarity between the image retrieval key (2-1) and the image retrieval key (4-3): 0 points.
Now, it is assumed that the region availability information of the image attribute data corresponding to the image retrieval keys which scored more than zero points is made available. In this case, the region availability information of the image attribute data corresponding to the image retrieval key (4-1) (i.e., the image attribute data in the first line of FIG. 11C) and the region availability information of the image attribute data corresponding to the image retrieval key (4-2) (i.e., the image attribute data in the second line of FIG. 11C) are made available in accordance with step 904 shown in FIG. 9. The process from steps 902 to 904 in FIG. 9 is also performed for the pairs (3-2), (3-3), and (3-4). The image object "Image 2" is included in all of the pairs (3-1) to (3-4).
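The scoring in step 903 and the enabling of region availability information in step 904 can be sketched in Python as follows. This is a hedged illustration, not the similarity calculation algorithm of the specification: the function name `similarity`, the tuple layout of a key, and the numbering of the third attribute sentence as line 3 of FIG. 11C are assumptions, and the scoring rule is inferred only from the three results quoted above. The case names attached to each attribute sentence are illustrative; only the predicate and the case element values enter the comparison.

```python
def similarity(query_key, attribute_key):
    """Score two keys, each given as (predicate, [(case, value), ...])."""
    query_predicate, query_cases = query_key
    attr_predicate, attr_cases = attribute_key
    if query_predicate != attr_predicate:
        return 0  # predicates do not coincide
    # Predicates coincide; a second point is given when each key has exactly
    # one case element and the values of those case elements coincide.
    if (len(query_cases) == 1 and len(attr_cases) == 1
            and query_cases[0][1] == attr_cases[0][1]):
        return 2
    return 1


# Image retrieval key (2-1), without its trailing "foreground" term.
KEY_2_1 = ("run", [("AGENT", "Tom")])

# Keys obtained from the foreground attribute data of "Image 2",
# indexed by the (assumed) line number in FIG. 11C.
ATTRIBUTE_KEYS = {
    1: ("run", [("AGENT", "John")]),  # "John is running"
    2: ("run", [("AGENT", "Tom")]),   # "Tom is running"
    3: ("run a race", [("AGENT", "Tom"), ("PARTICIPANT", "John")]),
}

scores = {line: similarity(KEY_2_1, key) for line, key in ATTRIBUTE_KEYS.items()}

# Step 904: attribute data with a non-zero score have their region
# availability information made available; list them by descending score.
available = sorted(
    ((line, score) for line, score in scores.items() if score > 0),
    key=lambda pair: pair[1],
    reverse=True,
)
print(scores)     # {1: 1, 2: 2, 3: 0}
print(available)  # [(2, 2), (1, 1)]
```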
The following results are obtained by sorting the image attribute data whose region availability information is available, among all the image attribute data of the image object "Image 2", in descending order of the scores indicating similarities (where the first term in each parenthesis indicates the line number of the corresponding image attribute data shown in FIG. 11C, and the second term in each parenthesis indicates the points of the score indicating similarities):
Foreground group:
image retrieval key (2-1): (2,2), (1,1)
image retrieval key (2-2): (1,2), (2,1)
Background group:
image retrieval key (2-3): (4,1)
image retrieval key (2-4): (4,1)
Similarly, the similarities are calculated for the image objects, other than "Image 2", included in each pair. For conciseness, it is assumed here that the similarities of the image objects, other than "Image 2", included in each pair are zero. Among the image objects classified into the background group, those which are included in the most pairs are selected in accordance with step 703 shown in FIG. 7. In this exemplary case, the image object "Image 2" is the one that is included in the most pairs among those classified into the background group because the image object "Image 2" is included in both pairs (3-3) and (3-4) classified into the background group. Accordingly, the image object "Image 2" is selected in step 703 shown in FIG. 7. Among the image objects classified into the foreground group, those which are included in the most pairs are selected in accordance with step 704 shown in FIG. 7. In this exemplary case, the image object "Image 2" is the one that is included in the most pairs among those classified into the foreground group because the image object "Image 2" is included in both pairs (3-1) and (3-2) classified into the foreground group. Accordingly, the image object "Image 2" is selected in step 704 shown in FIG. 7.
The image object "Image 2" is output as an image object corresponding to the background, and the image object "Image 2" is also output as an image object corresponding to the foreground in accordance with step 705 shown in FIG. 7. These outputs are the retrieval results, for the image retrieval keys (2-1) to (2-4), of the image retrieval section 211. The display section 215 displays the image objects that have been retrieved by the image retrieval section 211. In this exemplary case, the image object "Image 2" is the only image object that has been retrieved by the image retrieval section 211. Accordingly, the entire pixel data of the image object "Image 2" (i.e., the pixel data of FIG. 11A) is displayed by the display section 215 in accordance with steps 1001 and 1002 shown in FIG. 10. The above-described image retrieval process corresponds to a case where an image object that completely coincides with the content of the input natural language sentence is found. However, in accordance with the image retrieval process of the present invention, it is possible to retrieve a plurality of image objects that are necessary for expressing the content of the input natural language sentence even in cases where no image object that completely coincides with the content of the input natural language sentence is found. For example, a case will be described where image objects are retrieved from the image database 212 based on the following natural language sentence (5):
"John is running with Tom at ○○ Park": (5)
The natural language sentence (5) is input to the image retrieval apparatus by the input section 201. An analysis frame shown in FIG. 12 representing the semantic structure of the input natural language sentence (5) is obtained by means of the parsing section 204 and the situation element division section 205. Next, the situation element division section 205 divides a situation represented by the analysis frame (FIG.
12) into three situation elements, so as to generate the following image retrieval keys corresponding to the three situation elements:
((run, AGENT (John)), "foreground"): (6-1)
((run, PARTICIPANT (Tom)), "foreground"): (6-2)
(LOCATIVE (○○ Park), "background"): (6-3)
The image retrieval section 211 performs an image retrieval process through the IMAGE OBJECT RETRIEVAL ROUTINE and a process by the IMAGE OBJECT SELECTION ROUTINE. In the IMAGE OBJECT RETRIEVAL ROUTINE, image objects related to the input natural language sentence (5) are obtained from the image database 212 based on the image retrieval keys (6-1) to (6-3). In the IMAGE OBJECT SELECTION ROUTINE, pairs of image retrieval keys and image objects are classified into either the background group or the foreground group. The results will be as follows, for example:
Foreground group:
the pair of the image retrieval key (6-1) and the image objects "Image 1" and "Image 2": (7-1)
the pair of the image retrieval key (6-2) and the image objects "Image 2" and "Image 9": (7-2)
Background group:
the pair of the image retrieval key (6-3) and the image objects "Image 1" and "Image 6": (7-3)
With respect to the image object "Image 1", it is assumed that, as a result of the above-mentioned similarity calculation, the value of the region availability information of the image attribute data 103 shown in the first line of FIG. 1C, which corresponds to the image retrieval key (6-1), is made available, and that the value of the region availability information of the image attribute data 107 shown in the eighth line of FIG. 1C, which corresponds to the image retrieval key (6-3), is made available. With respect to the image object "Image 2", it is assumed that the value of the region availability information of the image attribute data shown in the second line of FIG. 11C, which corresponds to the image retrieval key (6-2), is made available.
It is assumed that the values of region availability information of the other image retrieval data are not made available. The image objects "Image 1" and "Image 6" are selected as image objects corresponding to the background in accordance with step 703 shown in FIG. 7, and the image object "Image 2" is selected as an image object corresponding to the foreground in accordance with step 704 shown in FIG. 7. The image objects "Image 1" and "Image 6" are output as image objects corresponding to the background, and the image object "Image 2" is output as an image object corresponding to the foreground in accordance with step 705 shown in FIG. 7. These outputs are the retrieval results, for the image retrieval keys (6-1) to (6-3), of the image retrieval section 211. In this exemplary case, the image objects "Image 1", "Image 6", and "Image 2" are the image objects that have been retrieved by the image retrieval section 211. Accordingly, the entire pixel data of the image object "Image 1" (i.e., the pixel data of FIG. 1A) is displayed by the display section 215 in accordance with steps 1005 and 1006 shown in FIG. 10. Moreover, the pixel data of region X.sub.2 corresponding to the image attribute data of the second line of FIG. 11C of the image object "Image 2" (i.e., the pixel data of region X.sub.2 of FIG. 11B) is displayed by the display section 215 in accordance with steps 1005 and 1007 shown in FIG. 10. Since none of the region availability information of the image attribute data contained in the image object "Image 6" is available, the inputting of the image object "Image 6" to the display section 215 may be omitted. The image objects displayed by the display section 215 are edited if necessary. A combination of the pixel data and further pixel data is a typical example of such editing. FIG. 13 shows an example of a combination result of the pixel data of FIG. 1A and the pixel data of region X.sub.2 of FIG. 11B.
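The selection of steps 703 and 704, in which the image objects included in the most pairs of each group are chosen, can be sketched in Python for both worked examples as follows. This is a simplified illustration under stated assumptions: the name `select_most_shared` and the triple layout (pair label, group, image objects) are hypothetical, the sketch ignores similarity scores and region availability information, and it treats image objects tied for the highest count as all being selected, which happens to reproduce the results stated in the text.

```python
from collections import Counter


def select_most_shared(pairs):
    """For each group, return the image objects that occur in the most pairs."""
    counts = {}
    for _label, group, image_objects in pairs:
        # Count each image object once per pair, separately for each group.
        counts.setdefault(group, Counter()).update(set(image_objects))
    selected = {}
    for group, counter in counts.items():
        top = max(counter.values())
        # Assumption: ties for the highest count are all kept.
        selected[group] = sorted(name for name, n in counter.items() if n == top)
    return selected


# Pairs (3-1) to (3-4), obtained for the natural language sentence (1).
PAIRS_SENTENCE_1 = [
    ("3-1", "foreground", ["Image 2"]),
    ("3-2", "foreground", ["Image 1", "Image 2"]),
    ("3-3", "background", ["Image 2", "Image 10"]),
    ("3-4", "background", ["Image 2", "Image 12"]),
]

# Pairs (7-1) to (7-3), obtained for the natural language sentence (5).
PAIRS_SENTENCE_5 = [
    ("7-1", "foreground", ["Image 1", "Image 2"]),
    ("7-2", "foreground", ["Image 2", "Image 9"]),
    ("7-3", "background", ["Image 1", "Image 6"]),
]

print(select_most_shared(PAIRS_SENTENCE_1))
# {'foreground': ['Image 2'], 'background': ['Image 2']}
print(select_most_shared(PAIRS_SENTENCE_5))
# {'foreground': ['Image 2'], 'background': ['Image 1', 'Image 6']}
```

In the second example the background group contains a single pair, so "Image 1" and "Image 6" are each included in one pair and both are kept, whereas "Image 2" is the only foreground image object included in two pairs.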
The image retrieval apparatus according to the present invention utilizes a natural language as an inquiry language for an image database, as opposed to the conventional retrieval methods using keywords. This makes it possible to accurately convey the meaning of a desired image to the image retrieval apparatus. The present invention has a particular significance in that, even if a user does not know the content of the desired image, the user can still retrieve the image from the image database. Moreover, according to the present invention, each image object has a semantic structure representing the whole or part of an image. The meaning of the entire image corresponds to a situation represented by the entire image. The meaning of a part of the image (i.e. a region included in the image) corresponds to a situation element(s) constituting the situation. In the image retrieval process according to the present invention, the syntactic structure of an input natural language sentence is converted into a semantic structure. The situation represented by the semantic structure of the natural language sentence is divided into at least one situation element, and an image retrieval key corresponding to the at least one situation element is defined. By referring to the respective situation elements corresponding to image objects based on the image retrieval key, image objects having meanings most similar to the meanings of the image retrieval keys are retrieved from the image object database. Accordingly, image objects can be retrieved for every situation element representing partial meanings of the input natural language sentence even in cases where no image object that completely coincides with the meaning of the input natural language sentence is present in the image object database. This facilitates displaying of combined image objects and editing such image objects. As a result, the user can easily obtain the desired image. 
Various other modifications will be apparent to and can be readily made by those skilled in the art without departing from the scope and spirit of this invention. Accordingly, it is not intended that the scope of the claims appended hereto be limited to the description as set forth herein, but rather that the claims be broadly construed.
