Video database indexing and query method and system5819286Abstract A video indexing and query execution system includes a processor which indexes video clips by: (a) identifying each symbol of one or more graphical icons in each frame of each video clip, (b) determining the horizontal, vertical and temporal coordinates of each symbol of the identified graphical icons, and (c) constructing a database for each identified symbol of the graphical icons. The processor converts a video query from graphical form to string form by: (a) receiving a video query specifying the vertical, horizontal and temporal coordinates of a graphical icon to be matched in at least one frame to be retrieved, and (b) constructing a normal 3-D string from the video query indicating the distance between each symbol of each icon in the video query in each direction. The processor also executes a video query on a video database by: (a) identifying only those video clips of the database whose signatures contain the signature of the executed video query, (b) for each of the identified video clips: (b1) constructing a 1-D list for each of the horizontal, vertical and temporal directions, comprising a plurality of sets of symbols of icons contained in video query, each set containing a permutation of symbols of the icons which satisfy the video query in the respective direction of the 1-D list, and (b2) forming the intersection of the three 1-D lists, and (c) identifying the portions of the video clips, indicated by a corresponding set contained in an intersection set of at least one of the identified video clips, as satisfying the video query. Claims The claimed invention is: Description FIELD OF THE INVENTION
TABLE 1
______________________________________
Video OID Icon OID Symbol OID
Position
______________________________________
V001 A a1 (2, 3, 1)
V001 B b1 (1, 4, 1)
V001 C c1 (3, 4, 1)
V001 D d1 (5, 1, 2)
V001 D d2 (5, 3, 2)
V001 D d3 (3, 2, 2)
______________________________________
Each row of the index base table contains information of a symbol identified in step 220 but not discarded in step 230. In Table 1 "OID" stands for object identifier. The video OID attribute (or column) indicates the identifier of the clip in which the symbol is located. Illustratively, the same symbol may occur in more than one clip and may be entered in the video index base once for each clip (step 230 is performed on a clip-by-clip basis). The icon OID attribute indicates the identifier of the icon of the symbol. Likewise, the symbol OID attribute indicates the identifier of the symbol. Finally, the position attribute indicates the (x,y,t) coordinates in the horizontal (x), vertical (y) and temporal (t) directions. The temporal coordinate t illustratively is simply the frame number, e.g., in order of play, of the frame containing the symbol. As an example, consider the symbol a1 of clip V001. This symbol is contained in frame 1 and frame 2 as shown in FIG. 7. As shown in FIG. 8, the identification of symbol a1 in frame 2 is discarded. The symbol a1 is located in the second column, third row and first frame. Therefore the symbol has (x,y,t) coordinates (2,3,1). Illustratively, the processor 110 stores the video index base thus generated in main memory 120 or disk memory 130. The processor 110 may perform step 250 contemporaneously with step 240. In step 240, the processor 110 forms the signature of each video clip. The signature is the set of all icons which occur in at least one frame of the clip (i.e., all icons of which at least one symbol is contained in at least one frame of the clip). In the case of the video clip shown in FIG. 6, the signature is the set {A,B,D,C}. Illustratively, the processor 110 stores the video clip signatures in main memory 120 or disk memory 130. Steps 270-300 are for transforming video queries into suitable strings for query execution. In step 270, the processor 110 receives the video query. Illustratively, the video query is specified in graphical form. The video query may be generated using the manual input device 160 and display device 150. This is illustrated in FIGS. 9 and 10. FIGS. 9 and 10 illustrate part of an image displayed on the display device 150 during query generation. As shown, the image is in the form of a control panel with a time diagram 410, frame window 420, icon list 430 and "query type" indicators 440 (see Table 2 discussed below). The time diagram 410 is a "meter" which graphically indicates the relative position of the frame displayed in the frame window in the sequence of frames that make up a clip. The frame window 420 displays a single query frame which is divided into two-dimensional rectangular regions. Using, for example, a pointer device, the user may select copies of icons in the icon list 430 and position them in particular rectangular regions of the frame window. (Note that each positioned "copy" of an icon is a symbol.) The icon list 430 may be a predefined list of icons which are contained in the video clips of the video database. The user may input the "query type" at particular vertical, horizontal and temporal coordinates using the "query type" indicators 440. The placement of symbols in the frame window 420 designates criteria of a particular frame of the video query. Naturally, the user can designate criteria for multiple query frames. For instance, FIG. 9 illustrates the placement of symbols in a first frame t=1 of the video query and FIG. 10 illustrates the placement of symbols in a second subsequent frame t=2 of the video query. Thus, the video query designates criteria, namely, symbols at particular vertical and horizontal coordinates, in each of two successive frames with temporal coordinates t=1 and t=2, respectively. After receiving the video query, the processor 110 executes step 270. In step 270, the processor 110 receives a matching criterion for purposes of determining how to compare the video query symbols to the video database. Four types of matching criterion may be specified, namely, "containment," "precedent," "equal," and "distance between 2 symbols." The "containment" criterion requires that the symbols of the video query merely be contained in the matching video. That is, the relative position of each symbol in video clip does not matter. Typically, if this is the matching criterion, then the time coordinates of the symbols are not utilized in the query execution discussed below. The "precedent" criterion requires that the symbols of the matching video must exhibit the same precedential relationship as in the video query, although the distance between the symbols in the matching video need not be strictly the same as in the video query. The "equal" criterion requires that the symbols specified in the query must also appear in the matching video such that each symbol has an identical coordinate value (for one or more coordinate directions specified in the query) vis-a-vis the matching video. Note that the specified coordinate directions of the symbols in the query may have different coordinate values than the symbols in the matching video. For example, suppose the video query specifies a bird icon at horizontal coordinate x=4, a pig icon at horizontal coordinate x=4 and specifies the equal criterion (for the specified coordinate direction horizontal). The video with a pig icon at horizontal coordinate x=4 and a bird icon at an identical horizontal coordinate x=4 matches the query. However, the video with a pig icon at horizontal coordinate x=3 and a bird at horizontal coordinate x=3 also matches the query since the horizontal coordinates of the pig and bird are identical within the matching video (even though they are different from the horizontal coordinate x=4 in the query). The "distance between 2 symbols" criterion requires that the "rank" between the symbols in the matching video must be precisely the same as in the video query. In other words, the "distance between two symbols" criterion requires that symbols must precede each other in the matching video identically as in the video query--no extraneous intervening symbols may be present in the identified video clip frames between each adjacent pair of symbols specified by the video query. As discussed in greater detail below, 3-D strings are used to represent the video query. Different types of 3-D strings may be constructed for each video query. The type of 3-D string constructed depends on the matching criterion specified in step 270. This is summarized in Table 2 below:
TABLE 2
______________________________________
Matching criterion
Type-0 Type-1 Type-2
Type-3
______________________________________
Containment .check mark.
.check mark.
.check mark.
.check mark.
Precedent X .check mark.
.check mark.
.check mark.
Equal X X .check mark.
.check mark.
Distance between 2 symbols
X X X .check mark.
______________________________________
Above, ".check mark." means that a 3-D string of the type of the respective column can be used to perform the kind matching. An "X" indicates that a 3-D string of the type of the respective column is not sufficient to perform the kind of matching. Thus, a type-1 3-D string can be used to perform "containment" or "precedent" matching but not "equal" or "distance between 2 symbols" matching. Next the processor 110 executes step 280. In step 280, the processor 110 builds a normal 3-D string representation of the video query. A normal 3-D string is a string (X,Y,T) where X is a normal 1-D string in the horizontal direction, Y is a normal 1-D string in the vertical direction and T is a normal 1-D string in the temporal direction. Each 1-D string is a sequence of alternating icons and relational operators "?", ".vertline.", ".ident." and "" indicating the relative order of precedence between adjacent pairs of symbols of the icons in the corresponding video query frames. "?" indicates that the order of precedence is undefined. "" indicates that the symbol to the left precedes the symbol to the right by an unspecified distance. ".ident." indicates that the symbol to the left and right are at an equal distance. ".vertline..sub.n " indicates that the symbol to the left precedes the symbol to the right by a distance of n. Illustratively, only the operators ".vertline." and ".ident." which indicate a precise distance between symbols, are permitted in the normal 1-D string formed in step 280. Consider the video query depicted in the frame windows 420 of FIGS. 9 and 10. The strings X, Y and T for this video query are: X:B.vertline..sub.1 A.vertline..sub.1 C.ident.D.vertline..sub.1 D Y:D.vertline..sub.1 D.vertline..sub.1 A.vertline..sub.1 B.ident.C T:A.ident.B.ident.C.vertline..sub.1 D.ident.D (X,Y,T):(B.vertline..sub.1 A.vertline..sub.1 C.ident.D.vertline..sub.1 D, D.vertline..sub.1 D.vertline..sub.1 A.vertline..sub.1 B.ident.C, A.ident.B.ident.C.vertline..sub.1 D.ident.D) The processor 110 illustratively stores the normal 3-D string in the main memory 120 or disk memory 130. Next the processor 110 executes step 290. In step 290, the processor 110 converts the normal 3-D string of the video query to an extended 3-D string of a particular query type, i.e., type-0, type-1, type-2 or type-3. The conversion process accords with the matching criterion specified in step 270. Illustratively, the processor 110 uses table 3 to effect the conversion.
TABLE 3
__________________________________________________________________________
Normal string
Type-0 string
Type-1 string
Type-2 string
Type-3 string
indicator
indicator
indicator
indicator
indicator
__________________________________________________________________________
a .vertline..sub.n b
a ? b a b a b a .vertline..sub.n b
a .tbd. b
a ? b a ? b a .tbd. b
a .tbd. b
__________________________________________________________________________
Continuing with the above example, suppose the equal matching criteria is specified. Thus, a type-2 3-D string must be constructed. The string (X,Y,T) above would therefore be converted to: (X',Y',T'):(BAC.ident.DD, DDAB.ident.C, A.ident.B.ident.CD.ident.D) The processor 110 illustratively stores the extended 3-D string in the main memory 120 or disk memory 130. After performing step 290, the processor 110 performs step 300. In step 300, the processor 110 forms the video query signature which is the set containing the icon of which at least one symbol is present in the video query. In the example above, the video query signature would be {A,B,C,D}. Illustratively, the processor 110 stores the video query signature in the main memory 120 or disk memory 130. The execution of the video query is performed in steps 310-330. The processor 110 may execute these steps after executing step 300. In step 310, the processor 110 compares the video query signature to the signature of each video clip in the video database. (To do so, the processor 110 illustratively retrieves the video query signature and video clip signatures from the main memory 120 or disk memory 130.) In particular, the processor 110 determines if the video query signature is a subset of the video signature of the clip. If so, then there is a possibility that the video clip can satisfy the video query. If the video query signature is not a subset of the video clip signature, then the video query signature contains at least one icon not also contained in the video clip. Such a video clip cannot possibly satisfy the video query. The processor 110 forms a set containing each clip which the processor 110 determines can possibly satisfy the video query as a result of this comparison. For example, consider the video index base shown in FIG. 11. The signature of clip V001 is {A,B,C,D} and the signature of clip V002 is {A,B,C,D,E}. Suppose the video query has the extended 3-D string (X,Y,T) where: X:A.ident.BCD.ident.E Y:EDCB.ident.A T:AB.ident.C.ident.ED The signature of the video query is {A,B,C,D,E}. The video query signature is a subset of the signature of clip V002 but not a subset of the clip V001. Therefore, no subsequence of frames of clip V001 can satisfy the video query. However, it is possible that a subsequence of frames of clip V002 satisfies the video query. Next, the processor 110 executes step 320. In step 320, the processor 110 constructs a 1-D list in the horizontal direction, a 1-D list in the vertical direction and a 1-D list in the temporal direction for each clip in the set of clips that can possibly satisfy the video query. This is achieved as follows. Consider as an example the horizontal (X) direction. The processor 110 utilizes the video index base to construct a sequence of sets of symbols of a video clip. Each set of symbols contains all of the symbols of a video clip of a particular corresponding icon (as indicated in the index base). One set of symbols is provided which corresponds to each icon of the 1-D string X (in the horizontal direction) of the video query. This is illustrated in FIG. 12. As shown, for the video clip V002, five sets 501, 502, 503, 504 and 505 of symbols are constructed including, a set 501 of A icon symbols {a1, a2, a3, a4} a set 502 of B icon symbols {b1, b2, b3}, a set 503 of C icon symbols {c1, c2, c3, c4, c5, c6}, a set 504 of D icon symbols {d1, d2, d3, d4, d5, d6} and a set 505 of E icon symbols {e1, e2, e3, e4}. The sets 501-505 are arranged in sequence in the same order as the icons appear in the respective extended 1-D string of the video query 555, i.e., {a1, a2, a3, a4},{b1, b2, b3},{c1, c2, c3, c4, c5, c6},{d1, d2, d3, d4, d5, d6},{e1, e2, e3, e4}. Next, the processor 110 determines the rank of each symbol in the 1-D list 500 of the video clip as follows. Consider that the resolution in each direction of the video query need not be the same as that used to index the video clips. For example, the query resolution is R.sub.x =9 in the horizontal direction, R.sub.y =9 in the vertical direction and R.sub.t =9 in the temporal direction (where R.sub.t is the number of frames of the video query). On the other hand, the resolution used to index the video clip may be x.sub.max =1024 in the horizontal direction, y.sub.max =768 in the vertical direction and t.sub.max =7 in the temporal direction (where t.sub.max is the total number of frames in the video clip). If a symbol of the video clip has the position (x,y,t) then the rank r.sub.x, r.sub.y, r.sub.t in each of the X, Y and T direction is given by: ##EQU1## where .left brkt-top.z.right brkt-top. denotes the ceiling of, or largest integer that is less than or equal to, z. The processor 110 determines the rank of each symbol by retrieving the appropriate position information of each symbol from the index base. Next, the processor 110 sequentially orders (e.g., sorts) the symbols in each of the sets depending on their rank in the direction of the 1-D list (horizontal, vertical or temporal). FIG. 13 illustrates the sequential ordering of the symbols of the horizontal 1-D list 500. For example, a.sub.3 has lower rank r.sub.x (a.sub.3 =3) than a.sub.2 (r.sub.x (a.sub.2)=4) but a higher rank than a.sub.1 (r.sub.x (a.sub.1)=2) and is therefore ordered after a.sub.1 but before a.sub.2. This produces the ordered sequences 506 {a.sub.1,a.sub.3,a.sub.2,a.sub.4 }, 507 {b.sub.1,b.sub.2,b.sub.3 }, 508 {c.sub.1,c.sub.3,c.sub.4,c.sub.2,c.sub.6,c.sub.5 }, 509 {d.sub.1,d.sub.5,d.sub.2,d.sub.3,d.sub.4,d.sub.6 } and 510 {e.sub.1, e.sub.3, e.sub.2, e.sub.4 }. Next, the processor 110 segregates each sequence 506-510 of the 1-D list 500 into one or more classes of equivalently ranked symbols. The following rules are applied to determine how to segregate symbols into equivalent classes: Case 1: If, in the extended 1-D string of the video query 555, the icon corresponding to the respective sequence is preceded by the operator "" or "?" and followed by the operator "" or "?" then all symbols in the respective sequence are in the same equivalent class Case 2: Otherwise, those symbols with equal ranks are in an equivalent class; those symbols with different ranks are in different classes. For purposes of applying these rules, the 1-D extended string of the video query X is presumed to begin and end with the "" operators as follows: X:A.ident.BCD.ident.E FIG. 14 illustrates the segregation of the sequences 506-510 into classes 511-525. Consider first the sequence 506 for the icon A. The icon A is preceded by the operator "" but followed by the operator ".ident." in the extended 1-D string of the video query 555. Thus, case 1 does not apply but case 2 does. One class is formed for each different rank of the symbols in the sequence 506. Since all symbols are of different ranks, each symbol is in a different class 511, 512, 513 or 514. The class 511 is the set of {a.sub.4 }. The class 512 is the set of {a.sub.2). The class 513 is the set of {a.sub.3 }. The class 514 is the set of {a.sub.1 }. Now consider the sequence 507 corresponding to the icon B. In the extended 1-D string of the video query 555, icon B is followed by the operator "" but preceded by the operator ".ident.". Therefore, case 1 does not apply but case 2 does. One class is provided for each different rank. The sequence 507 has two symbols, namely, b.sub.2 and b.sub.3 which have the same rank (i.e., both have rank 4). These symbols are in the same equivalent class 516, i.e., the set of {b.sub.2,b.sub.3 }. The symbol b.sub.1 is placed in a separate class 515 which is the set of {b.sub.1 }. Now consider the sequence 508 for the icon C. The icon C is both preceded and followed by the operator "" in the extended 1-D string of the video query 555. Thus, case 1 applies. One equivalent rank class 517 is provided for all the symbols. The class 517 is thus the set of {c.sub.1,c.sub.3,c.sub.4,c.sub.2,c.sub.6,c.sub.5 }. The rules are applied in this fashion to construct classes 518-525 in a like fashion. Note that each class 511-525 is a set of symbols that are sequentially ordered according to their rank. Each class 511-525 is therefore a subsequence of the symbols of the sequence 506-510. After segregating the sequences 506-510 into classes 511-525, the processor 110 links adjacent symbols in each class. The link is directed from a first symbol of the class, e.g., the symbol c.sub.4 of the class 517, to the next symbol in the class according to the sequential ordering within the class, e.g., the symbol c.sub.2. This is shown in FIG. 14 by arrows. Next, the processor 110 links symbols associated with a first icon with symbols associated with a following, adjacent icon. This is illustrated in FIG. 15. Illustratively, this is achieved by applying the following linking rules for each class of symbols 511-525. In the rules below, the following notation is used: G.sub.a is the a.sup.th icon in the 1-D extended string 555 of the video query, G.sub.b is the b.sup.th icon in the 1-D extended string 555 of the video query, where b=a+1, G.sub.a,p designates a p.sup.th class of G.sub.a, 1.ltoreq.p.ltoreq.last class index of G.sub.a, G.sub.b,k designates a k.sup.th class of G.sub.b, 1.ltoreq.k.ltoreq.last class index of G.sub.b, s.sub.i is any symbol in G.sub.a,1, the first (lowest ranked) class of G.sub.a, s.sub.i+1 is the symbol which follows s.sub.i in G.sub.a,1, s.sub.imax is the last symbol in the p.sup.th class G.sub.a,p, S.sub.j is any symbol in G.sub.b,1, the first (lowest ranked) class of G.sub.b, s.sub.j-1 is the symbol which precedes s.sub.j in G.sub.b,1, s.sub.jmin is the first symbol in the k.sup.th class G.sub.b,k, r(s) is the rank of s, a.sub.1 is the operator which precedes G.sub.a in the extended 1-D string of the video query 555, a.sub.2 is the operator between G.sub.a and G.sub.b in the 1-D extended string of the video query 555, and a.sub.3 is the operator which follows G.sub.b in the 1-D extended string of the video query 555. The linking rules are as follows: (i) s.sub.i is linked with s.sub.j, if: (1) a.sub.1 is or ?, (2) a.sub.2 is , (3) a.sub.3 is or ?, (4) r(s.sub.i)<r(s.sub.j), (5) r(s.sub.i+1).gtoreq.r(s.sub.j), and (6) r(s.sub.i).gtoreq.r(s.sub.j-1). (ii) s.sub.i is linked with s.sub.jmin if: (1) a.sub.1 is or ?, (2) a.sub.2 is , (3) a.sub.3 is .ident. or .vertline..sub.n, (4) r(s.sub.i)<r(s.sub.jmin), and (5) r(s.sub.i+1).gtoreq.r(s.sub.jmin). (iii) s.sub.imax is linked with s.sub.j if: (1) a.sub.1 is .ident. or .vertline..sub.n, (2) a.sub.2 is , (3) a.sub.3 is or ?, (4) r(s.sub.imax)<r(s.sub.j), and (5) r(s.sub.imax).gtoreq.r(s.sub.j-1) (iv) s.sub.imax is linked with s.sub.jmin if: (1) a.sub.1 is .ident. or .vertline..sub.n, (2) a.sub.2 is , (3) a.sub.3 is .ident. or .vertline..sub.n, (4) r(s.sub.imax)<r(s.sub.jmin). (v) s.sub.imax is linked with s.sub.jmin if: (1) a.sub.2 is .ident., and (2) r(s.sub.imax)=r(s.sub.jmin). (vi) s.sub.imax is linked with s.sub.jmin if: (1) a.sub.2 is .vertline..sub.n, and (2) r(s.sub.jmin)-r(s.sub.imax)=n. (vii) s.sub.imax is linked with s.sub.jmin if: (1) a.sub.2 is ?. The link is always directed from symbol s.sub.i or s.sub.imax associated with G.sub.a, to symbol s.sub.j or s.sub.jmin associated with G.sub.b, wherein G.sub.a precedes G.sub.b in the extended 1-D string of the video query. As before, for purposes of applying these rules, the 1-D string is presumed to begin and end with the operator "". Consider first the class 511. The icon of this a=1.sup.st class G.sub.1 is "A". The adjacent b=2.sup.nd icon G.sub.2, in the 1-D extended string of the video query 555 (corresponding to the symbols with which symbols of class 511 may be linked), is B. In this case, a.sub.1 ="", a.sub.2 =".ident." and a.sub.3 ="". Rules (i)-(iv) do not apply because conditions (i)(2), (ii)(2), (iii)(2) and (iv)(2) (wherein a.sub.2 =) are not satisfied. Rule (vi) does not apply because condition (vi)(1) a.sub.2 is ? is not satisfied. Rule (vii) does not apply because condition (vi) (1) a.sub.2 is ? is not satisfied. However, rule (v) may apply. According to rule (v), the last symbol in class 511 may be linked with the first symbol in one of the classes 515 or 516 if the rank of these two symbols are equal, as per condition (v)(2). The rank of the last symbol in class 511 r.sub.x (a.sub.4)=5. The rank of the first symbol in class 515 r.sub.x (b.sub.1)=2 and the rank of the first symbol in class 516 r.sub.x (b.sub.2)=4. Since condition (v)(2) cannot be satisfied, no symbol in class 511 is linked to a symbol in classes 515 or 516. Consider now class 512. As before, only rule (v) may possibly apply (because G.sub.1 ="A", G.sub.2 ="B", a.sub.1 ="", a.sub.2 =".ident." and a.sub.3 =""). In this case r.sub.x (a.sub.2)=4 is the same as r.sub.x (b.sub.2). Thus, the conditions (v)(1)-(2) are satisfied. As such, symbol a.sub.2 is linked with symbol b.sub.2. The link is directed from the symbol a.sub.2 to the symbol b.sub.2 because the symbol a.sub.2 is associated with the icon A which precedes the icon B, associated with the other linked symbol b.sub.2, in the extended 1-D string of the video query 555. The application of the above rules to class 513 is identical to that of class 511. The application of the rules to class 514 is identical to that of class 512. In this case, the symbol a.sub.1 is linked to symbol b.sub.1 because r.sub.x (a.sub.1)=r.sub.x (b.sub.1)=2. The symbol is directed from the symbol a.sub.2 to the symbol b.sub.2 because the icon A, associated with the symbol a.sub.2, precedes the icon B, associated with the symbol b.sub.2 in the extended 1-D list of the video query 555. The linking of symbols in classes 515 and 516 to symbols in class 517 is now described. For both classes 515, 516, G.sub.2 ="B" and G.sub.3 ="C" (where a=2 and b=3), a.sub.1 =".ident.", a.sub.2 ="" and a.sub.3 ="". Only rule (iii) can possibly apply. Consider first the case of class 515 in which the last symbol is b.sub.1. Rule (iii) can apply if two conditions are true: the rank of the symbol b.sub.1 is less than the rank of any given symbol in class 517, e.g., r.sub.x (c.sub.3), but greater than or equal to the rank of the symbol which precedes the given symbol in class 517, e.g., r.sub.x (c.sub.1). The symbol c.sub.3 has a rank r.sub.x (c.sub.3)=3 which is greater than the rank r.sub.x (b.sub.1)=2. Furthermore, the symbol c.sub.1 which precedes c.sub.3 has a rank r.sub.x (c.sub.1)=2 which is equal to the rank of symbol b.sub.1. Therefore, the processor 110 links symbols b.sub.1 and c.sub.3. The link is directed from b.sub.1 to c.sub.3 because icon B, corresponding to symbol b.sub.1, precedes icon C, corresponding to symbol c.sub.3, in the extended 1-D string of the video query 555. The application of the rules is very similar for class 516 in that rule (iii) applies. In this case, the last symbol b.sub.3 of the class 516 is linked with the symbol c.sub.2. Consider now the application of the rules for linking the symbols of class 517 with the symbol of classes 518-522. In this case G.sub.3 =C, G.sub.4 =D (for a=3 and b=4), a.sub.1 ="", a.sub.2 ="" and a.sub.3 =".ident.". Rule (ii) can possibly apply to the symbols of class 517. Rule (ii) can apply to any symbol in class 517 but only to the first symbol of the classes 518, 519, 520, 521 or 522. For rule (ii) to apply to a symbol in class 517, e.g., symbol c.sub.3, and a first symbol of one of the classes, e.g., the symbol d.sub.1 of class 518, the rank of the symbol c.sub.3 must be less than the rank of the symbol d.sub.1 and the rank of the symbol in class 517 following c.sub.3, namely, symbol c.sub.4, must be greater than or equal to the rank of the symbol d.sub.1. Since this is true, the symbol c.sub.3 is linked to the symbol d.sub.1. Note that r.sub.x (c.sub.1)<r.sub.x (d.sub.1). However, r.sub.x (c.sub.3) (the symbol which follows c.sub.1) is not greater than or equal to r.sub.x (d.sub.1). Therefore, symbol c.sub.1 is not linked to d.sub.1. Note also that rule (ii) is satisfied for linking symbol c.sub.6 to symbol d.sub.4 and to symbol d.sub.6. Finally, note that symbol c.sub.5 cannot satisfy rule (ii). Rule (v) may then be applied for linking the symbols in classes 518-522 with the symbols in classes 523-525. After linking the symbols in the classes 511-525, the processor 110 then discards certain symbols from the 1-D list of the video clip 500. In particular, the processor 110 discards all symbols not on a continuously directed link from a symbol in the sequence 506 associated with the first icon A to a symbol in the sequence 510 associated with the last icon E, of the 1-D extended string of the video query 555. This is illustrated in comparing FIGS. 15 and 16. An example of a continuously directed link is: a.sub.1 .fwdarw.b.sub.1 .fwdarw.c.sub.3 .fwdarw.c.sub.4 .fwdarw.c.sub.2 .fwdarw.d.sub.3 .fwdarw.e.sub.2 Note that symbols a.sub.3, a.sub.4, c.sub.1, c.sub.5, c.sub.6, d.sub.2, d.sub.4, d.sub.6 and e.sub.4 are not on a continuously directed link. Symbols c.sub.1 and e.sub.4 cannot be reached by traversing the links of the 1-D list of the video clip 500 (in the direction indicated by the links) from a symbol associated with the first icon A of the 1-D extended string of the video query. On the other hand, it is not possible to traverse links in the 1-D list of the video clip 500 from the symbols a.sub.3, a.sub.4, c.sub.5, c.sub.6, d.sub.2, d.sub.4 and d.sub.6 to a symbol associated with the last icon E, of the 1-D extended string of the video query 555. After producing the extended 1-D list 500, the processor 110 executes step 330. In step 330, the processor 110 determines which subsequence of frames of the video clip satisfy the video query. This is achieved by forming each possible permutation of symbols from each 1-D list of the video clip (i.e., from the 1-D list in the horizontal direction, the 1-D list in the vertical direction and the 1-D list in the temporal direction). One set of permutations (wherein each permutation is a set, itself) is formed for each of the 1-D lists of the video clip. This is illustrated in connection with FIG. 16 for the 1-D list of the video clip 500 in the horizontal (X) direction. Each permutation of symbols includes one symbol corresponding to each icon of the extended 1-D string of the video query 555. However, the selection of symbols is governed by the following rules: (1) The first symbol of the permutation (corresponding to the first icon A of the extended 1-D string of the video clip 555) may be selected from any one of the classes 512, 514 in the extended 1-D list of the video clip 500. (2) Each permutation symbol other than the first permutation symbol is selected by the following rule. The next symbol of the permutation, in the next class (associated with the next icon in the extended 1-D string of the video query 555) must be on a continuously linked path of the extended 1-D list of the video clip 500 extending from the current selected symbol (in a current class associated with the current icon in the extended 1-D string of the video query). The following are possibilities for selecting the next symbol: (a) The current symbol is directly linked to the next symbol. For example, if the current symbol is a.sub.1, the next symbol can be b.sub.1 which is directly linked to the symbol a.sub.1. (b) A symbol which follows the current symbol in the current class is directly linked to the next symbol. For example, if the current symbol is b.sub.2, the next symbol can be c.sub.2 which is directly linked to the symbol b.sub.3 which follows the current symbol b.sub.2 in the current class 516. (c) The current symbol is directly linked to a symbol of the next class which precedes the next symbol in the next class. For example, if the current symbol is b.sub.1, the next can be c.sub.2. Symbol c.sub.2 is in the same next class 517 as, and is preceded in the next class by, the symbol c.sub.3 which is directly linked to the symbol b.sub.1. (d) A symbol which follows the current symbol in the current class is directly linked to a symbol of the next class which precedes the next symbol in the next class. For example, if the current symbol is d.sub.1, the next symbol can be e.sub.3. Symbol e.sub.3 is preceded in the same next class 523 by the symbol e.sub.1. Symbol e.sub.1 is directly linked to the symbol d.sub.5. The symbol d.sub.5 follows the symbol d.sub.5 in the same current class 518. According to these rules, the following set S.sub.X of permutations can be formed for the horizontal (X) direction: S.sub.X ={{a.sub.1,b.sub.1,c.sub.3,d.sub.1,e.sub.3 },{a.sub.1,b.sub.1,c.sub.3,d.sub.1,e.sub.1 },{a.sub.1,b.sub.1,c.sub.3,d.sub.5,e.sub.3 },{a.sub.1,b.sub.1,c.sub.3,d.sub.5,e.sub.1 },{a.sub.1,b.sub.1,c.sub.3,d.sub.3,e.sub.2 },{a.sub.1,b.sub.1,c.sub.4,d.sub.3,e.sub.2 },{a.sub.1,b.sub.1,c.sub.2,d.sub.3,e.sub.2 },{a.sub.2,b.sub.2,c.sub.2,d.sub.3,e.sub.2 },{a.sub.2,b.sub.3,c.sub.2,d.sub.3,e.sub.2 }}. FIGS. 17 and 18 show extended 1-D lists 500', 500" of the video clip formed for the vertical (Y) and temporal (T) directions. Sets of permutations S.sub.Y and S.sub.T are formed for these list as well as follows: ##EQU2## Given the sets of permutations for each of the extended 1-D lists 500, 500', 500" of the video clip, it is only necessary to intersect them to determine which frames of the video clip satisfy the video query. Thus the intersection is: S=S.sub.X .andgate.S.sub.Y .andgate.S.sub.T ={{a.sub.1,b.sub.1,c.sub.3,d.sub.1,e.sub.1 },{a.sub.1,b.sub.1,c.sub.3,d.sub.5,e.sub.1 }} The intersection set can then be used to identify the matching video as follows. Each permutation of the intersection set refers to a unique match in the video clip. The processor 110 accesses the index base stored in memory 120 or 130 and retrieves the exact position of each symbol in each permutation. The processor 110 can then retrieve the video frames of the clips which contain the symbols in a respective permutation set of the intersection. (Note that the frame numbers are indicated by the positions of the symbols of the permutations recorded in the index base). The frames corresponding to one of these permutations may then be displayed on the display device 150, for example, in real time or otherwise via the graphical user interface shown in FIGS. 9 and 10. Since the exact position is known for each icon that satisfies the video query, these icons may optionally be displayed in highlighted form in the retrieved frames. The user may then perform further processing on the retrieved frames such as, displaying the frames, compressing the frames, filtering the frames, etc. Thus, the invention avoids the necessity of performing a frame-by-frame search of each video clip which, depending on the matching criterion, can have a very high processing time. Note also that it is not necessary to directly compare strings representing the video query to strings representing each frame of each video clip. Rather, 3-D lists are constructed for each video clip utilizing a lower processing overhead, and the intersection of each of the lists need only be formed. Note that the inventive indexing and query process and system may be used in a video file server, for example in a video on demand system. The invention may also be employed in an electronic library or in conjunction with remote video education systems. In short, a video indexing and query execution method and system are provided. The method may be performed by a processor which performs the following steps to index the video clips: (a) identifying each symbol of one or more graphical icons in each frame of each video clip, (b) determining the horizontal, vertical and temporal coordinates of each symbol of the one or more identified graphical icons, and (c) constructing an index base for each identified symbol of the one or more graphical icons which includes its coordinates. In converting a video query from graphical form to string form, the processor may execute the following steps: (a) receiving a video query specifying the vertical, horizontal and temporal coordinates of at least one graphical icon in at least one frame, and (b) constructing a 3-D string from the video query indicating the distance between each symbol of each icon in the video query in the vertical, horizontal and temporal directions. In executing a video query on a database containing at least one video clip, the processor may perform the following steps: (a) identifying only those video clips of the database whose signatures contain the signature of the executed video query, (b) for each of the identified video clips: (b1) constructing a 1-D list for each of the horizontal, vertical and temporal directions, comprising a plurality of sets of symbols of icons contained in video query, each set containing a permutation of symbols of the icons which satisfy the video query in the respective direction of the 1-D list, (b2) forming the intersection of the three 1-D lists, and (c) identifying the portions of the video clips, indicated by a corresponding set contained in an intersection set of at least one of the identified sequences, as satisfying the video query. Finally, the above discussion is intended to be merely illustrative. Those having ordinary skill in the art may devise numerous alternative embodiments without departing from the spirit and scope of the following claims.
|
Same subclass Same class Consider this |
||||||||||
