System and method for defining shapes with which to mine time sequences in computerized databases5878413Abstract A system and method including a computer shape definition language are disclosed for defining shapes and mining time sequences that resemble the shapes. The system and method include provisions for establishing a user-defined alphabet that in turn establishes a set of elemental shapes. The system also includes simple yet powerful operators for combining the elemental shapes to define a desired time sequence shape. Moreover, intervals of actual time sequences are mapped into corresponding transition sequences using the alphabet, and the transition sequences are stored in a hierarchical index structure for easily accessing the transition sequences. The index structure is entered with the desired time sequence shape, and the index structure is traversed to identify maximal actual transition sequences which conform to the desired time sequence shape, within user-definable blurry criteria. Claims We claim: Description BACKGROUND OF THE INVENTION
______________________________________
Interval .DELTA. value iv fv
______________________________________
I1 0 0 0
I2 +.02 0 .02
I3 +.15 .02 .17
I4 +.18 .17 .35
I5 +.15 .35 .50
I6 -.05 .50 .45
I7 -.02 .45 .43
I8 -.28 .43 .15
I9 -.12 .15 .03
I10 -.03 .03 0
______________________________________
These intervals are mapped to elemental shapes at block 42 using an alphabet of elemental shapes, such as the alphabet shown in FIG. 4 to establish a set of transition sequences which describe the time sequence "S". Accordingly, after mapping using the alphabet shown in FIG. 4, the sequence "S" shown in FIG. 5 is transformed to the following transition sequences: (zero appears/stable up up up down stable Down down disappears/stable). As stated above, the ambiguities between appears and stable and between disappears and stable are resolved at query time in favor of the particular elemental shape desired by the user. From block 42, the shape definition and locator module 16 of the system 10 proceeds to block 44 to store the mapped intervals of the time sequence "S" into a hierarchical index/storage structure. As intended by the present invention, blocks 42 and 44 together establish a mapping device. FIG. 6 shows the hierarchical index structure of the presently preferred embodiment. As shown, the index structure has a first, i.e., top, level L1, a second level L2, a third level L3, and a fourth, or bottom, level L4. FIG. 6 shows that each level is an array that includes a plurality of instances I.sub.ij, wherein i=level number and j=instance number. It can be appreciated in reference to FIG. 6 that instances I.sub.1j in the first level L1 are indexed to the elemental shapes which are defined at block 36 and which appear in one or more of the stored sequence shapes. Further, each instance I.sub.1j in the first level L1 points to an instance I.sub.2j in the second level L2. Each instance I.sub.2j in the second level L2 is a data array indexed by a start period of the first occurrence of the corresponding elemental shape. The size of an array in the second level L2 is equal to np, where np equals the maximum number of time sequence intervals in some time sequence. The dots shown in instances of the second level represent further entries in accordance with the principles discussed above. Additionally, each instance I.sub.2j in the second level L2 points to an instance I.sub.3j in the third level L3, and each instance I.sub.3j in the third level L3 is a data array indexed to a maximum number of times the corresponding elemental shape appears consecutively. The size of each array in an instance in the third layer L3 which is pointed to from the k.sup.th element of a second level L2 array is np-k elements. Moreover, each instance I.sub.3j of the third level L3 points to one or more instances I.sub.4j in the fourth level L4, each of which is representative of a transition sequence "H". More specifically, each transition sequence "H" is stored as a series of tuples which include the object identification (referred to herein as "o.id") of the shape "H", including an identification of the time sequence of which the particular shape "H" is a part, and the number of contiguous occurrences of the shape "H" in its associated time sequence. In the exemplar of the present figures, each transition sequence "H" represents a respective one of the intervals I of the sequence "S" shown in FIG. 5. While FIG. 6 shows that each instance in the third level L3 points to a single transition sequence "H", because the specific exemplary entries shown are for the sequence "S" shown in FIG. 5, it is to be understood that each instance in the third level L3 points to all transition sequences "H" which correspond to time sequences in the database 20 to which the conditions imposed by the appropriate ancestors in the top three levels L1-L3 apply. The entry "NULL" indicates elements corresponding to empty combinations, e.g., when an elemental shape does not start at a specific position in any of the time sequences in the database 20. When each time sequence interval stored in the database 20 can be mapped to one and only one transition sequence, instances in the first level L1 correspond to elemental shapes. On the other hand, the present invention recognizes that it may happen that one or more time sequence intervals in the database 20 can be mapped to more than one transition sequence. For example, in the exemplary time sequence and elemental shapes shown herein, the time sequence (0 0 0) can be mapped either to (zero zero) or (zero stable). In the case of large databases, storing all such multiple mappings in the index structure shown in FIG. 6 can result in an exponential growth of the index structure. To avoid this, in applications wherein all elementary shapes are not disjoint from each other, the present invention contemplates breaking down non-disjoint elementary shapes into primitive elementary shapes that are disjoint from each other, and the first level L1 shown in FIG. 6 would then include such primitive elementary shapes. For example, the shapes "zero" and "stable" could be broken down into three primitive elemental shapes, the first one of which would represent the portion of "zero" which does not overlap with "stable". In such a circumstance, the second primitive elemental shape would represent the portion of "stable" which does not overlap with "zero", while the third primitive elemental shape would represent the overlapping portions of "zero" and "stable". To limit the number of primitive elementary shapes, each primitive elementary shape is associated with an interval of real numbers. Now addressing in detail the desired shape generation process represented at block 40 of FIG. 3, a user may define a desired time sequence shape for retrieval from the data structure shown in FIG. 6. In defining a desired shape at block 40, the user selectively combines elementary shapes defined at block 36 and parameterized shapes, if any, that are defined at block 38 to generate a desired shape. The present invention provides a multiplicity of operators for this purpose. The first group of operators that will be discussed are what are termed herein "multiple choice operators" ("MCO"), and the multiple choice operators of the present invention include the operators "any" and "concat". It is to be understood that while the particular syntax of the present invention described herein is simple and straightforward, other syntax may be used within the spirit of the present invention. At block 40, a user may specify that a desired shape have any one of a number of user-specified shapes by defining the shape to have (any P.sub.1, P.sub.2, . . . P.sub.n) where P.sub.i is a shape descriptor. In other words, as more fully disclosed below the shape definition and locator module 16 of the system 10 can be caused to traverse the index structure shown in FIG. 6 to retrieve all transition sequences having any of the desired shapes P.sub.i, by using the "any" operator. Also at block 40, a user may specify that a desired shape be characterized by a concatenation of any two or more of a number of user-specified shapes by defining the shape to have (concat P.sub.1, P.sub.2, . . . P.sub.n) where P.sub.i is a shape descriptor. In other words, as more fully disclosed below the shape definition and locator module 16 of the system 10 can be caused to traverse the index structure shown in FIG. 6 to retrieve all transition sequences in which a concatenation of strictly contiguous shapes P.sub.1 -P.sub.n are found, by using the "concat" operator. Thus, the only transition sequences that will be retrieved in response to the "concat" operator are those characterized by the shape P.sub.1 which is immediately followed by the shape P.sub.2 which is in turn immediately followed in order by the remainder of the shapes, if any, that are specified in the "concat" definition. In addition to the multiple choice operators discussed above, the present invention provides so-called "multiple occurrence operators" ("MOO") which include the operators "exact", "atleast", and "atmost". More specifically, at block 40, a user may specify that a desired shape have exactly n contiguous occurrences of a shape P by defining the shape to have (exact n P), where n is an integer. In other words, as more fully disclosed below the shape definition and locator module 16 of the system 10 can be caused to traverse the index structure shown in FIG. 6 to retrieve all transition sequences having exactly n contiguous occurrences of a shape P. As an example, for the time sequence "S" shown in FIG. 5, if a user specified "exact 2 up" the shape definition and locator module 16 of the system 10 would return the null set .PHI., because no subsequence of "S" is exactly two intervals in length consisting entirely of "up" elemental shapes which is neither preceded nor followed by an "up" interval. Additionally, at block 40 a user may specify that a desired shape have at least n contiguous occurrences of a shape P by defining the shape to have (atleast n P), where n is an integer. In other words, as more fully disclosed below the shape definition and locator module 16 of the system 10 can be caused to traverse the index structure shown in FIG. 6 to retrieve all transition sequences having at least n contiguous occurrences of a shape P. Importantly, the "atleast" operator is maximal, in that the shape definition and locator module 16 of the system 10 returns only subsequences which have at least n contiguous occurrences of P and which are neither preceded nor followed by a subsequence that matches P. As an example, for the time sequence "S" shown in FIG. 5, if a user specified "atleast 2 up" the shape definition and locator module 16 of the system 10 would return the answer {S›2,5!} where {S›2,5!}=(0.02 0.17 0.35 0.50). Still further, at block 40 a user may specify that a desired shape have at most, i.e., no more than, n contiguous occurrences of a shape P by defining the shape to have (atmost n P), where n is an integer. In other words, as more fully disclosed below the shape definition and locator module 16 of the system 10 can be caused to traverse the index structure shown in FIG. 6 to retrieve all transition sequences having at most n contiguous occurrences of a shape P. As an example, for the time sequence "S" shown in FIG. 5, if a user specified "atmost 2 up" the shape definition and locator module 16 of the system 10 would determine the answer to be {›k,k!} where 0.ltoreq.k.ltoreq.1 or 6.ltoreq.k.ltoreq.10. This is because the null sequence is matched at those positions of "S" that do not participate in an "up" transition. The other null sequences are not determined to be in the answer because they participate in a sequence of three consecutive up transitions. Consequently, the final answer is a set of null sequences which are not returned in accordance with the present invention. It will be appreciated, however, that allowing a null sequence to match "atmost n P" enables a user to, e.g., specify (concat (at least 2 up)(atmost 1 Down)) and match it to {S›2,5!} corresponding to the sequence up up up. Thus, MCO and MOO operators can be combined to define a desired shape at block 40, and as more fully disclosed below, can further be combined with other operators to define relatively complex desired shapes. In addition to the MCO and MOO operators, the shape definition and locator module 16 of the system 10 includes a bounded occurrence operator ("BOO") "in" and associated shape-occurrence specifiers "precisely", "noless", "nomore" (in a first format) and "inorder" (in a second format) which can appear only with the BOO "in". As discussed more fully below, the BOO facilitate imposing arbitrary conditions on so-called "blurry" matching, wherein a time sequence can be matched to a desired shape despite the time sequence not exactly matching the desired shape. Specifically, the syntax for effecting blurry matches using the BOO "in" is (in ›length! ›shape-occurrences!), wherein ›length! establishes an integer number of intervals in which a defined number of user-specified shapes occur. The first of two formats the ›shape-occurrences! can assume is one of (precisely n P), (noless n Q), and (nomore n R), or a combination thereof using the logical operators "and" and "or". In response, the shape definition and locator module 16 of the system 10 will retrieve all subsequences from the index structure shown in FIG. 6 which are characterized by subsequences ›length! long and which contain precisely n shapes P, or no less than n shapes Q, or no more than n shapes R, as appropriate for the particular shape-occurrence specifier used. Importantly, unlike the MOO operators "exact", "atleast" and "atmost", the shape-occurrence specifiers are used only in conjunction with the BOO "in", and unlike the MOO operators the n occurrences of the shape-occurrence specifiers may overlap and need not be contiguous. As an example, a user may specify (in 5 (and (noless 2 (any up Up))(nomore 1 (any down Down)))). This defines a desired shape five intervals long that is characterized by at least two ups (either "up" or "Up") and at most one down (either "down" or "Down"). When applied to the sequence "S" shown in FIG. 5, the shape definition and locator module 16 of the system 10 will return {S›2,7!} corresponding to the transition sequence (up up up down stable). In contrast, {S›3,8!}.ident. (up up down stable Down) is not returned because it contains two downs. The second format of the ›shape-occurrences! uses the "inorder" shape occurrence specifier, in which subsequences ›length! long contain, in order of specification, a plurality of shapes P.sub.1, P.sub.2, . . . P.sub.n. P.sub.i and P.sub.i+1 may not overlap, but need not be contiguous. As an example, a user may specify (in 7(inorder (atleast 2 (any up Up))(in 4((noless 3(any down Down))))). To match, a sequence shape in the fourth level L4 of the index structure shown in FIG. 6 must be 7 intervals long, and must contain at least two ups followed by a subsequence four intervals long that contains at least three downs. When applied to the sequence "S" shown in FIG. 3, the shape definition and locator module 16 of the system 10 will return {S›2,9!}.ident.(up up up down stable Down down). As recognized by the present invention, the user may select a non-optimal way to define a desired shape at block 40 in FIG. 3. For this reason, the shape definition and locator module 16 of the system 10 proceeds to block 46 from block 40, wherein the user's definition of a desired shape (also referred to herein as a "query") may be automatically rewritten by the shape definition and locator module 16, as more fully described below in reference to FIGS. 7A-7C. Then, from block 46 the shape definition and locator module 16 proceeds to block 48 to execute the query, i.e., to identify sets of transition sequences that match the desired transition sequence. Accordingly, block 48 essentially is a query executor, and the operation of block 48 is discussed more fully below in reference to FIGS. 8-13. From block 48, the shape definition and locator module 16 proceeds to output state 50 to cause the output device 28 (FIG. 1) to output sequences that satisfy the user's query, i.e., to output parameters that are representative of transition sequences that match the desired time sequence shape. Now referring to FIG. 7A, in undertaking the query rewriting step of block 46 in FIG. 3, the shape definition and locator module 16 begins at decision block 52 in FIG. 7A and determines whether the user has defined, for the user's desired time sequence shape, identical shapes inside the "concat" operator. Although for illustration purposes FIGS. 7A-7C are in the form of flow charts in which decisions are made as to the nature of the user's query prior to rewriting the query, it is to be understood that the module 16 need not actually decide the nature of the query prior to rewriting it. Rather, the module 16 need only receive the query and route directly to the appropriate operational step for rewriting the particular query. If the user has defined, for the user's desired time sequence shape, identical shapes inside the "concat" operator, the module 16 proceeds to block 54 to "fold" the query using the "exact" operator. This rewriting operation may be expressed as follows:
______________________________________
User Input Query
Rewritten Form
______________________________________
concat (P.sub.1 P.sub.2 P.sub.2 . . . P.sub.2 P.sub.3)
concat (P.sub.1 (exact n P.sub.2) P.sub.3), wherein n
=
number of occurrences of P.sub.2 in the
user input query.
______________________________________
From block 54 or decision block 52 if the test there was negative the module 16 proceeds to decision block 56, wherein the module 16 determines whether first and second MOO, designated M.sub.1 and M.sub.2, have been composed together. If not, the module 16 proceeds to FIG. 7C. On the other hand, if two MOO have been composed together in the form M.sub.1 n (M.sub.2 m P), wherein n and m are integers and P is a desired shape, the module 16 proceeds to one of three decision blocks 58, 60, 62 to determine the nature of the composition. At block 58, the module 16 determines whether the first MOO M, has been composed with "n" number of the second MOO M.sub.2 outside the "concat" and "any" operators. If not, the module 16 proceeds with the remaining tests at decision blocks 60 and 62, but otherwise proceeds to decision block 64 to determine whether the first MOO M.sub.1 is either "exact" or "atleast". If not (meaning that the first MOO M.sub.1 is "atmost"), the module rewrites the input query, at block 66, as "(any(exact 0 (M.sub.2 m P))(M.sub.2 m P))", wherein the shape arguments to "any" in the right hand side correspond to 0 and 1 occurrences of the atmost argument. Then, the module 16 ends at state 68. The rewriting operation at block 66 may be expressed as follows:
______________________________________
User Input Query Rewritten Form
______________________________________
atmost n (M.sub.2 m P)
(any(exact 0 (M.sub.2 m P))(M.sub.2 m
______________________________________
P))
In contrast, if the test at decision block 64 is positive, the module 16 rewrites the input query as the null set .phi. at block 70 if n>1, or "M.sub.2 m P" if n=1, and then ends at state 68. The rewriting operation at block 70 may be expressed as follows:
______________________________________
User Input Query Rewritten Form
______________________________________
({exact .vertline. atleast} n (M.sub.2 m P))
(M.sub.2 m P), n = 1, .phi., n
______________________________________
> 1
At decision block 60, the module 16 determines whether the "concat" operator has been applied to the first and second MOO M.sub.1, M.sub.2. If not, the module 16 proceeds to decision block 62, but if so, the module 16 proceeds to decision block 72, wherein the module 16 determines whether the second MOO M.sub.2 is "atmost" or m=0, in which case the module 16 rewrites the input query as M.sub.2 mP at block 76. Otherwise, the module 16 rewrites the input query as the null set .phi. at block 74, and then ends at state 68. The rewriting operations at blocks 74 and 76 may be expressed as follows:
______________________________________
User IpDut Query
Rewritten Form
______________________________________
(concat (M.sub.1 n P)(M.sub.2 m P))
(M.sub.1 n P), if (M.sub.2 = atmost or m = 0),
and .phi. otherwise
______________________________________
At decision block 62, the module 16 determines whether the "any" operator has been applied to the first and second MOO M.sub.1, M.sub.2, provided the first and second MOO M.sub.1, M.sub.2 are equal to each other. If either condition is not met, the module 16 proceeds to FIG. 7B, but if both conditions are met, the module 16 proceeds to blocks 80, 82, and 84. At block 80, if both the first and second MOO are the "exact" operator and the specified number of occurrences for both are equal, the module 16 rewrites the user input query as a single "exact". The rewriting operation at block 80 may be expressed as follows:
______________________________________
User Input Query Rewritten Form
______________________________________
(any(exact nP)(exact mP)) (exact nP), n = m
______________________________________
At block 82, if both the first and second MOO are the "atleast" operator, the module 16 rewrites the user input query as a single "atleast" specifying the minimum of "n" or "m" as the desired number of occurrences. The rewriting operation at block 82 may be expressed as follows:
______________________________________
User Input Query Rewritten Form
______________________________________
(any
(atleast nP)(atleast mP)) (atleast min (n,m) P)
______________________________________
At block 84, if both the first and second MOO are the "atmost" operator, the module 16 rewrites the user input query as a single "atmost" specifying the maximum of "n" or "m" as the desired number of occurrences. The rewriting operation at block 84 may be expressed as follows:
______________________________________
User Input Query Rewritten Form
______________________________________
(any
(atmost nP)(atmost mP)) atmost max (n,m) P
______________________________________
As shown in FIG. 7A, the module 16 ends from blocks 80, 82, 84. Now referring to FIG. 7B, at decision block 86 the module 16 determines whether first and second MOO M.sub.1, M.sub.2 that are different from each other have been specified by the user in any order inside the "any" operator. If not, the module 16 proceeds to FIG. 7C. If the module 16 determines that the first and second MOO M.sub.1, M.sub.2 have been specified by the user inside the "any" operator, the module 16 proceeds to blocks 88, 90, 92. At block 88, if the first MOO is the "exact" operator and the second MOO is the "atleast" operator, the module 16 rewrites the user's input query as atleast m P if m.ltoreq.n, and as atleast n P if n=m-1. The rewriting operation at block 88 may be expressed as follows:
______________________________________
User Input Query
Rewritten Form
______________________________________
(any
(exact nP)(atleast mP))
atleast mP, m .ltoreq. n, atleast nP, n = m
______________________________________
- 1
In contrast, at block 90, if the first MOO is the "exact" operator and the second MOO is the "atmost" operator, the module 16 rewrites the user's input query as atmost m P if m.gtoreq.n, and as atmost n P if m=n-1. The rewriting operation at block 90 may be expressed as follows:
______________________________________
User Input Query
Rewritten Form
______________________________________
(any
(exact nP)(atmost mP))
atmost mP, m .gtoreq. n, atmost nP, m = n
______________________________________
- 1
On the other hand, at block 92, if the first MOO is the "atmost" operator and the second MOO is the "atleast" operator, the module 16 rewrites the user's input query as atleast 0 P if m.ltoreq.n+1. The rewriting operation at block 92 may be expressed as follows:
______________________________________
User Input Query Rewritten Form
______________________________________
(any
(atmost nP)(atleast mP)) atleast 0 P, m .ltoreq. n + 1
______________________________________
Now referring to FIG. 7C, the module 16 determines whether to rewrite user input queries that use the "in" operator. Stated differently, FIG. 7C shows how the module 16 rewrites user-defined desired shapes which are to be blurry matched with transition sequences in the index structure shown in FIG. 6. Starting at decision block 94 in FIG. 7C, the module 16 determines whether two shape occurrence specifiers (SOS) have been used in a query inside the BOO "in". If not, the module 16 proceeds to decision block 96, wherein the module 16 determines whether the length specified by the BOO "in" operator is less than the guaranteed minimum length of the transition sequence to be matched, as specified by the numeral immediately following the SOS. If not, the module 16 ends at state 98, but if the length specified by the BOO "in" operator is less than the guaranteed minimum length of the transition sequence to be matched, the module 16 proceeds to block 100 and rewrites the user's query as the null set .phi.. On the other hand, if the module 16 determines, at decision block 94, that two shape occurrence specifiers (SOS) have been used in a query inside the BOO "in", the module 16 proceeds to decision block 102. At decision block 102, the module 16 determines whether first and second SOS S.sub.1, S.sub.2 have been composed together. If so, the module 16 proceeds to block 104 to rewrite the user's query using the principles discussed above in regard to two MOO that are composed together. If, at decision block 102, the module 16 determines that first and second SOS S.sub.1, S.sub.2 have not been composed together, the module 16 moves to decision block 106 to determine whether the SOS have been used inside the "and" or "or" operators. If not, the module 16 ends at state 98, but if so, the module 16 rewrites the user's input query in accordance with principles discussed above with regard to MOO. More specifically, if the SOS have indeed been used inside the "and" operator, the module 16 rewrites the query using the principles discussed above for the case wherein two MOO have been used inside the "concat" operator. And, if the SOS have indeed been used inside the "or" operator, the module 16 rewrites the query using the principles discussed above for the case wherein two MOO have been used inside the "any" operator. Now referring to FIGS. 8-13, the operation of the query execution engine represented at block 48 in FIG. 3 can be appreciated. As intended by the present invention, the module 16 receives the user-defined desired shape, possibly rewritten as discussed above, and then undertakes the operations shown in FIGS. 8-13, as appropriate, to traverse the index structure shown in FIG. 6 and match transition sequences that are stored therein with the desired shapes. Such index traversal is referred to herein as "evaluation", in that a desired shape is evaluated in terms of the stored transition sequences. As further intended by the present invention, along with the desired shape configuration, the user also inputs a start point "s" and an end point "e" of an interval to be evaluated. (Note that "s" and "e" also may be computed by the procedure of the invention from the results of previous calculations.) The result of the evaluation command (termed herein "eval") is a set of tuples ›o.id, start, length!, wherein "o.id" is the object identification retrieved from the bottom level L4 of FIG. 6, "start" is the start period as retrieved during index traversal from the second level L2 of FIG. 6, and "length" is the length of the matched subsequence retrieved from the third level L3 of FIG. 6 during index traversal. Alternatively, for certain desired shapes (e.g., shapes defined by the concat operator), "length" is the combined third level L3 lengths of two or more transition sequences. It is to be further understood that as intended herein, the notation shape›P!.start›x!.occur›y! means "retrieve object identifiers that have y occurrences of the shape P starting from x". Beginning at block 110 in FIG. 8, a user may desire to match, i.e., to evaluate, an elemental shape P in the interval having start at "s" and end at "e", and block 110 represents the receiving of the user's evaluate query. Proceeding to block 112, the module 16 addresses the user's query by traversing the index structure shown in FIG. 6 by first entering the top level L1 with the particular elemental shape P as the entering argument. Then, the module 16 traverses the lower levels L2, L3, L4 as appropriate to retrieve o.id's that have a start at some "x" and that have "y" consecutive occurrences, within the constraints of block 114. Specifically, as shown at block 114, the module 16 retrieves only those o.id's in which the maximum of either "s" or "x" is less than or equal to "i", and "i" is less than the minimum value of either "e" or the sum of "x" and "y". From the index traversal and o.id retrieval steps of blocks 112, 114, the module 16 proceeds to output state 116 to output a set of matching elemental shapes in the form {o.id, start›i!, length›l!}. As the skilled artisan will appreciate, the length of the satisfying o.id's is unity, because only a single elemental shape was desired to be evaluated, i.e., matched to subsequences in the index structure of FIG. 6. The operation of the module 16 in FIG. 8 can be expressed as follows: eval (P,›s,e!)={›oid:o; start:i; length:l!.vertline..E-backward.x,y(o .epsilon. shape P.start›x!.occur›y! max (s,x).ltoreq.i<min(x+y,e))} FIG. 9 shows the index traversal of the module 16 to satisfy an "exact" query (at block 118) for "n" occurrences of an elemental shape P, n>0. At block 120, the module 16 undertakes index traversal to retrieve o.id's that have a start at some "x" and that have "y" consecutive occurrences, within the constraints of blocks 122, 124, 126. More particularly, at block 122 "x" is constrained to be no more than the difference between "e" and "n". Also, at block 124 the sum of "s" and "n" is constrained to be no more than the sum of "x" and "y". Additionally, at block 126 the difference between the minimum of either "e" or the sum of "x" and "y" and the maximum of either "s" or "x" is constrained to equal "n". If a particular o.id meets all three of the above criteria, it is output in a set of matching transition sequences at output state 128 as {o.id, start›max (s,x)!, length›n!}. It is to be understood that "start›max (s,x)!" indicates that the starting point which is output for an o.id is the maximum of either "s" or "x". The operation of the module 16 in FIG. 9 can be expressed as follows: ##EQU1## Now referring to FIG. 10, the index traversal of the module 16 to satisfy an "atmost" query (at block 130) for "n" occurrences of an elemental shape P is shown. At block 132, the module 16 undertakes index traversal to retrieve o.id's that have a start at some "x" and 20 that have "y" consecutive occurrences, within the constraints of blocks 134, 136, 138. More particularly, at block 134 "x" is constrained to be less than "n". Also, at block 136 "s" is constrained to be less than the sum of "x" and "y". Furthermore, at block 138 the difference between the minimum of either "e" or the sum of "x" and "y" and the maximum of either "s" or "x" is constrained to be no more than "n". If a particular o.id meets all three of the above criteria, it is output in a set of matching transition sequences at output state 140 as {o.id, start›max (s,x)!, length›min (e, x+y)-max (s, x)!}. It is to be understood that "length›min (e, x+y)-max (s, x)!" indicates that the length that is output for an o.id is the minimum of either "e" or the sum of "x" and "y" minus the maximum of either s or x. The operation of the module 16 in FIG. 10 can be expressed as follows: ##EQU2## In addition, the set of matching transition sequences at output state 140 includes the results of "eval exact 0 P›s,e!," which operation is disclosed below in reference to FIG. 12. FIG. 11 shows the index traversal of the module 16 to satisfy an "atleast" query (at block 142) for "n" occurrences of an elemental shape P is shown. At block 144, the module 16 undertakes index traversal to retrieve o.id's that have a start at some "x" and that have "y" consecutive occurrences, within the constraints of blocks 146, 148, 150. More particularly, at block 146 "x" is constrained to be less than or equal to "e-n". Also, at block 148 the sum of "s" and "n" is constrained to be no more than the sum of "x" and "y". Furthermore, at block 150 the difference between the minimum of either "e" or the sum of "x" and "y" and the maximum of either "s" or "x" is constrained to be no less than "n". If a particular o.id meets all three of the above criteria, it is output in a set of matching transition sequences at output state 152 as {o.id, start›max (s,x)!, length›min (e, x+y)-max (s,x)!}. It is to be understood that the length that is output for an o.id is the difference between the minimum of either "e" or the sum of "x" and "y" and the maximum of either "s" or "x". From output state 152, the module 16 proceeds to decision block 154, wherein it is determined whether n=0. If not, the module 16 ends at end state 156. Otherwise, the module 16 proceeds to block 158 to union the results of "eval exact 0 P" (FIG. 12) to the set of matching transition sequences that are output at output state 152. The operation of the module 16 in blocks 142-152 of FIG. 11 can be expressed as follows: ##EQU3## In reference to FIG. 12, the means by which the module 16 evaluates "eval exact 0 P›s,e!" is shown. From block 160, the module 16 proceeds to block 162 to evaluate the query "eval atleast 1 P›s,e!" using the process shown in FIG. 11. Then, the module 16 outputs a set of matching transition sequences at output state 164 which are those o.id's that are not included in the set of o.id's determined at block 162. It is to be understood that to evaluate queries using one of the shape occurrence specifiers "precisely", "nomore", and "noless", the module 16 undertakes the operational steps shown in FIGS. 9, 10, and 11, respectively, with the following exceptions. When "nomore" is being evaluated, "n" must be greater than the sum of all P occurrences in ›s,e!. Likewise, when "noless" is being evaluated, "n" must be smaller than the sum of all P occurrences in ›s,e!. Also, when "precisely" is being evaluated, "n" must be equal to the sum of all P occurrences in ›s,e!. When the user desires to evaluate, i.e., match a complex shape (i.e., a shape established by combining two or more elementary shapes) with a transition sequence in the index structure, the above operations are recursively performed as appropriate, and the resulting output sets combined in accordance with principles set forth above. Stated differently, the evaluation of complex shapes is inductively performed using the index structure and steps shown above. FIG. 13 shows an example of such an inductive evaluation for an input query of "eval concat D.sub.1, D.sub.2 ›s,e!" at block 166, wherein D.sub.1 and D.sub.2 are first and second complex shapes each of which is composed of elemental shapes "P". At block 168, the index structure is traversed as appropriate to output a first set S.sub.1 of o.id's that match the constraints of the first complex shape D.sub.1, using the principles discussed above for elemental shapes to inductively determine S.sub.1. Then, an interval I.sub.1 is determined. As intended by the present invention, the interval I.sub.1 is the interval in which the matching of the second complex shape D.sub.2 with the first complex shape D.sub.1 must begin. At block 170, the interval I.sub.1 is determined to be bounded on one end by the minimum start plus length of an element in the first S.sub.1. Moreover, as shown in FIG. 13 the interval I.sub.1 is determined to be bounded on its other end by the maximum start plus length of an element in the first set S.sub.1. After determining the interval I.sub.1 at block 170, the module 16 moves to block 172 to traverse the index to evaluate the second complex shape D.sub.2 using a start "t" that is within the interval I.sub.1. A second set S.sub.2 is the result of the step at block 172. Next, at block 174, the first and second complex shapes D.sub.1, D.sub.2 are joined within the predicate constraints of blocks 176 and 178. It is to be understood that the second set S.sub.2 is the union of all such evaluation results for each start "t". More specifically, block 176 requires that, in joining an o.id of the first set S.sub.1 with an o.id of the second set S.sub.2, the o.id's must match. In other words, to concatenate an element from the first set with an element from the second set, the elements must be derived from the same time sequence. Furthermore, block 178 requires that the start of the o.id in the second set S.sub.2 that is to be matched with an o.id in the first set S.sub.1 must equal the start of the first o.id plus the length of the first o.id. Thus, the constraint set forth above in relation to the "concat" operator, namely, that concatenated shapes be contiguous, is preserved. Sets of matches (i.e., joined o.id's) are output at output state 180 with a projection on each match that requires that the start of the match is equal to the start of the o.id from the first set S.sub.1 and the length of the match is equal to the sum of the lengths of the joined o.id's. FIG. 13 shows the steps for concatenating two complex shapes D.sub.1 and D.sub.2. As stated above, the concatenation of "n" complex shapes is performed inductively. In other words, the first two shapes are concatenated as shown, then the third shape is concatenated to the concatenation of the first two shapes, and so on. The inductive evaluation stops when the result of a join is empty, or when all joins have been performed. With the above detailed disclosure in mind, the evaluation of other operators may be simply explained in relation to the processes discussed previously. To evaluate the MOO operators as used on complex shapes, the inductive process of FIG. 13 using FIGS. 9-12, as appropriate, is used (recall that, like "concat", the MOO require contiguous, non-overlapping shapes). The exact and atmost operators have the same stopping condition as concat. In keeping with principles discussed above (i.e., that "exact" return sequences containing only exactly the specified number of shapes), the exact operator returns the result of step "n" if the result of step "n+1" is empty, and the null result otherwise. In further keeping with principles discussed above, the atmost operator returns the result of step "i" if i.ltoreq.n and the result of step i+1 is empty. In the case of the atleast operator, the inductive evaluation stops when a join returns an empty set. The result of step "i" in such a case is returned if i.gtoreq.n and the step i+1 returned the empty set. In addition, other operators may be evaluated in accordance with the principles set forth above. For example, to evaluate the operator "any" as used on D.sub.1 . . . D.sub.n, each D.sub.i, 1.ltoreq.i.ltoreq.n, is separately evaluated and the answer output as the union of all "eval D.sub.i ›s,e!". The evaluation of the operator "and", because it allows gaps and overlaps between shapes inside it, is implemented as a join of the respective sets of subsequences that match the various shapes within the "and" operator. When the "or" operator is applied to two or more desired shapes, the answer is the union of the respective sets of subsequences matching the shapes. With respect to the "in" operator, the length parameter essentially defines a family of intervals inside ›s,e! where matches are to be performed. Consequently, "in" is evaluated as follows: eval((in n D›s,e!)=.orgate..sub.s.ltoreq.i.ltoreq.e-n (eval(D›i,i+n!)) When used inside the "in" operator, as they must be, with complex shapes "D", the precisely, nomore, and noless operators are evaluated as are exact, atmost, and atleast are with regard to complex shapes, with the following exception. Recall that the precisely, nomore, and noless operators allow gaps and overlaps between shapes, unlike exact, atmost, and atleast. Accordingly, the definitions for the interval I.sub.1 and the predicate constraints of blocks 176, 178 and projection requirement of output state 180 in FIG. 13 require offsets of at least one time period to allow for this. Additionally, the "inorder" operator, when used as it must be inside the "in" operator, does not allow overlap but does allow gaps between successive shapes. Consequently, it's evaluation schemata is the same as for "concat", except that the predicate constraints of blocks 176, 178 and projection requirement of output state 180 in FIG. 13 require that the start period of a subsequence be separated from the start period of its immediately preceding subsequence by at least the length of the preceding subsequence. While the particular system and method for defining shapes with which to mine time sequences as herein shown and described in detail is fully capable of attaining the above-described objects of the invention, it is to be understood that it is the presently preferred embodiment of the present invention and is thus representative of the subject matter which is broadly contemplated by the present invention, that the scope of the present invention fully encompasses other embodiments which may become obvious to those skilled in the art, and that the scope of the present invention is accordingly to be limited by nothing other than the appended claims.
|
Same subclass Same class Consider this |
||||||||||
