Wildcards in radix- search tree structures6594655Abstract A system for storing and retrieving data using a radix-search tree having a plurality of sub-trees containing nodes and leaves, the system including: (a) a data storage module designed and configured for storing the plurality of sub-trees, wherein at least one of the leaves contains at least one entry having at least one wildcard in a primary position, and (b) a processor that is operative to perform operations including: (i) building the radix-search tree in the data storage module, and (ii) retrieving data from the radix-search tree in the data storage module. Claims What is claimed is: Description FIELD AND BACKGROUND OF THE INVENTION
TABLE 1
Bit# 0 1 2 3 4 5
A 0 0 1 * * 1
B 0 1 * * 1 *
C 0 1 * * 1 0
D 0 1 * 1 1 1
E 0 1 0 1 1 0
F 0 1 0 0 1 1
A radix-search tree embodying the entries of Table 1 is provided in FIG. 1. In Patricia methodology, the bits are examined in order, such that bit 0 would be expected to be the root of the tree. However, a glance at FIG. 1 reveals not only that the root is a node in which bit 1 is checked, but that bit 0 is nowhere to be found. This technique, commonly referred to as "skipping", is based on the observation that when all entries have an identical value at a certain bit, that bit cannot distinguish between the entries. In this case, Bit 0 is skipped because all six entries have an identical value of "0", and the tree-building procedure proceeds to bit 1. Because bit 1 distinguishes between entry A (bit 1=0) and entries B-F (bit 1=1), bit 1 becomes the first node ("root") of the tree. If the value of bit 1 equals 0, we move left from the root. Only entry A appears on the left-hand side of the tree. From Table 2, we see that Entry A contains 4 keys: A.sub.1, A.sub.2, A.sub.3, A.sub.4, all of which equal 1 at bit 2. Thus, bit 2 is skipped, and we proceed to examine bit 3, which contains a wildcard. Bit 3 becomes a node. If the value of bit 3 equals 0, we move left from the node, and examine the next bit. Bit 4 is also a wildcard, and is incorporated as a node on the tree. If the value of bit 4 equals 0, we move left from the node, and arrive at a leaf containing key A.sub.1 : 001001. If the value of bit 4 equals 1, we move right from the node, and arrive at a leaf containing key A.sub.2 : 001011. Similarly, if the value of bit 3 equals 1, we move right from the node, and proceed to examine bit 4. If the value of bit 4 equals 0, we move left from the node, and arrive at a leaf containing key A.sub.3 : 001101. If the value of bit 4 equals 1, we move right from the node, and arrive at a leaf containing key A.sub.4 : 001111.
TABLE 2
Bit# 0 1 2 3 4 5 Bit # 0 1 2 3 4 5 Refined by
left
A 0 0 1 * * 1 A1 0 0 1 0 0 1 +
A2 0 0 1 0 1 1 +
A3 0 0 1 1 0 1 +
A4 0 0 1 1 1 1 +
B 0 1 * * 1 * B1 0 1 0 0 1 0 C1
B2 0 1 0 0 1 1 F
B3 0 1 0 1 1 0 C2 = E
B4 0 1 0 1 1 1 D1
B5 0 1 1 0 1 0 C3
B6 0 1 0 1 1 +
B7 0 1 1 1 1 0 C4
B8 0 1 1 1 1 1 D2
C 0 1 * * 1 0 C1 0 1 0 0 1 0 +
C2 0 1 0 1 1 0 E
C3 0 1 1 0 1 0 +
C4 0 1 1 1 1 0 +
D 0 1 * 1 1 1 D1 0 1 0 1 1 1 +
D2 0 1 1 1 1 1 +
E 0 1 0 1 1 0 E 0 1 0 1 1 0 +
F 0 1 0 0 1 1 F 0 1 0 0 1 1 +
Continuing now to the right side of the root (bit 1=1), we need to distinguish between the other remaining keys: B.sub.6, C.sub.1, C.sub.3, C.sub.4, D.sub.1, D.sub.2, E, and F. Prior-art methods cannot handle entries such as B, C, and D, all of which have wildcards in the middle. Hence, there is a need to expand those entries, such that only the expanded single entries are stored in the tree. Proceeding to bit 2, if the value of bit 2 equals 0, we move left from the node. Entry E and entry F both equal 0 at bit 2, and are incorporated in the left-hand side of node 2. B.sub.1 -B.sub.4 from entry B and C.sub.2 from entry C have been overridden by more specific keys (C.sub.1, D.sub.1, E, and F), as shown in Table 2. Proceeding on the left-hand side of node 2, we arrive at bit 3. If the value of bit 3 equals 0, we move left from the node, along with key C.sub.1 and entry F. Since bit 4 does not distinguish between key C.sub.1 and entry F, we proceed directly to bit 5. If the value of bit 5 equals 0, we move left from the node, arriving at a leaf containing solely C.sub.1. If the value of bit 5 equals 1, we move right from the node, arriving at a leaf containing solely entry F. The explanation of the rest of the tree follows in a straightforward manner from the explanation above. The tree of FIG. 1 is particularly cumbersome, containing 4 levels of nodes for most leaves, and at least 3 levels of nodes for the remainder. It would be greatly advantageous to incorporate the entries into a tree which is more shallow and has less branching. It must be emphasized that leaves of the prior art typically contain a single key. More advanced leaves may contain more than one key, provided that an additional node cannot distinguish between the keys, e.g., in the case of the following keys:
H: 010111
I: 010111010
J: 0101110
K: 0101110101
Key H has only 6 bits. After the first six bits (which behave as a "prefix"), a node cannot distinguish between key H and keys I, J and K, which have the identical first six bits followed by additional bits. Before describing various aspects of the present invention, it is important to understand the relationships between the entries in Table 1 From the Venn diagram representation provided in FIG. 2, it can be seen that entry B is the most general string, including within it strings C, D, E, and F. This is because string B is identical to strings C, D, E, and F in all the defined bits (Columns 0, 1, and 4). For the same reason, string C includes string E. Unlike strings C, D, E, and F, String A is represented by a circle outside of the circle representing string B, because string A has a value of 1 in column 1, as opposed to string B (and strings C, D, E, and F), which has a value of 0 in column 1. String A, which contains wildcards at bit 3 and bit 4, can be specified by 2.sup.2 =4 keys. Because the wildcard can equal zero or one, bit 3 and bit 4 do not have a singular value, and require nodes to distinguish between them according to the prior art. It is observed, however, that because keys A.sub.1, A.sub.2, A.sub.3, and A.sub.4 are all included within entry A, it is possible to reach keys A.sub.1, A.sub.2, A.sub.3, and A.sub.4 in a deterministic manner, even without additional nodes. Thus, we can expand the heretofore known duty of a leaf to include entries containing wildcards at the beginning or in the middle of the entry, and not just for the more restricted duties described above. Within the broader leaf of the present invention, keys A.sub.1, A.sub.2, A.sub.3, and A.sub.4 are all included, in list form, within one leaf. The use of the inventive leaf in the radix-search tree of FIG. 1 is demonstrated in FIG. 3. The nodes on the left-hand side of the tree that distinguish between the four keys in entry A have been eliminated in favor of one broad leaf of the present invention. This idea can be further broadened to apply to what we term a "pseudo-node", which, as used herein in the specification and in the claims section that follows, refers to a bit within the set of entries being examined having at least one wildcard and not more than one defined value (i.e., either 0 or 1, but not both). According to the present invention, the bit can be eliminated as a node in such a case, with any "lost" information being stored in the leaves. Looking, for example, at bit 2 of Table 1, we observe that entries B-F either equal 0, in the case of entries E and F, or contain a wildcard, in the case of entries B, C, and D. None of the entries being examined have a value of 1 (Note that entry A was already distinguished back at node 1). Thus, even though bit 2 does not have a singular value for entries B-F, there is no need to use bit 2 to distinguish between any of the entries, because at bit 2, all entries having a defined value (entries E and F) are included within entries having a wildcard (entries B, C, and D), and can be found in a deterministic fashion without establishing the pseudo-nodes as actual nodes in the tree structure. As part of the inventive method, therefore, bit 2 is eliminated as a node. Instead of forcing the tree to distinguish between such entries, we simply store these entries together in one leaf. This is demonstrated in FIG. 4, in which is provided a radix-search tree having an inventive leaf that additionally can contain multiple entries belonging to a pseudo-node. Thus, from the root, we proceed to the right (bit 1=1) directly to bit 3, without examining bit 2. For a bit 3 value of 1, we move right from node 3. The only entry that cannot equal 1 is entry F, such that entries B, C, D and E are represented. For a bit 3 value of 0, we move left from node 3. Entries D and E cannot equal 0, thus, entries B, C and F are represented. Note that because entries B and C are wildcards, they are represented both for the case of bit 3 equals 1 and for the case of bit 3 equals 0. Proceeding now to bit 5, we further distinguish between entries B-F. For bit 3 equals 1, we must distinguish between entries B, C, D and E. For bit 3 equals 0, we must distinguish between entries B, C and F. For the case of (bit 3 equals 1 and) bit 5 equals 1, only entries B and D comply. It should be noted that at this point in the path (i.e., bit 1=1, bit 3=1, bit 5=1), entries B and D are indistinguishable. The selection of entry B or entry D must be based on predefined rules. As mentioned above, in the vast majority of networking applications, when a packet matches multiple prefixes, it is intuitively clear that the packet should be forwarded according to the most specific prefix or longest matching prefix. Here too, the more specific entry--entry D--is selected. Proceeding along the same level of the tree, we examine the case of bit 3 equals 1 and bit 5 equals 0. As can be seen from Table 1, entries B, C and E comply. However, at this point in the path (and particularly after moving left at node 5), entries B and C have become indistinguishable, hence entry B, the less specific of the two entries, is eliminated. Note that we cannot eliminate entry C, even though entry E is included in entry C, because the two entries are distinguishable at bit 2, where entry E is specified, but entry C has a wildcard. If, by way of example, our search key is 010110, we would like entry E to be chosen, entry E being more specific than entry C. However, if our search key is 011110, entry E does not match, such that we want entry C to be selected. As mentioned above, theoretically, we can distinguish between entries C and E by the addition of another node for bit 2: For (bit 3=1, bit 5=1, and) bit 2 equals 1, entry C is isolated; for bit 2 equals zero, entries C and E are indistinguishable, such that the more specific entry--entry E--would be selected. If the value of bit 3 equals 0, we move left from the node, along with entries B, C and F. Bit 4 does not distinguish between entries B, C and F, as described above, hence, we proceed directly to bit 5. If the value of bit 5 equals 0, we move left from the node, arriving at a leaf. The leaf contains entry B and entry C, or more specifically, B.sub.1, C.sub.1 and C.sub.3. However, because entry B is more general than entry C, and given the assumption that the more specific string is preferred, entry B is eliminated, leaving solely C (C.sub.1 and C.sub.3) in the leaf. If the value of bit 5 equals 1, we move right from the node, arriving at a leaf. The leaf contains entry B and entry F, or more specifically, B.sub.2 and F. However, entry B is clearly more general than key F, hence, entry B is eliminated, leaving solely key F in the leaf. Thus, according to the present invention, the leaf can store, not only entries containing multiple keys having the same "prefix", but families of entries that are sub-sets of one another. It has thus been shown that in addition to the above-described keys H, I, J, and K, a typical leaf according to the present invention can store keys of the form:
L: 01*11101
M: 01*11*01**0*1
N: 01**110*
O: 01**1101
It may readily be seen that while L is a subset of O, and O is a sub-set of N, M is not related, since M is of a different length. It has further been discovered by the inventors that the tree can be further compacted by choosing the order in which bits are incorporated into the tree. In Patricia-type methods, bits are checked in the order of appearance, as demonstrated in the prior art of FIG. 1 and in the inventive-leaf containing trees in FIG. 3 and FIG. 4. In the search tree according to one aspect of the present invention, however bits can be checked out of (i.e., not in) the order of appearance. It has been observed by the inventors that the tree can be simplified by selecting bits according to the bit having the minimum number of entries that contain a wildcard in the location of that particular bit. Thus, bit 1 is selected as the root (the Patricia trie of FIG. 1 and the search trees of FIGS. 3 and 4 also had bit 1 as the root, by coincidence: skipping bit 0, a non-distinguishing bit, bit 1 was simply the first distinguishing bit). FIG. 5 provides a radix-search tree having the inventive leaf of FIG. 4, but in which the bits are checked according to the minimum number of wildcards. Thus, the first bit to examine is bit 1, because bit 1 has zero wildcards (and not by coincidence, as above). For the case of bit 1 equals 1, we proceed along the right-hand side of the tree. Upon examination of bits 2-5, we find that bit 4 is now non-distinguishing. Of the remaining bits, bit 5 has the least number of wildcards, i.e., 1, hence bit 5 is selected as the next node. (Bit 2 is a pseudo-node, and is discussed below.) If bit 5 equals 0, we move left from the node. Only entry B, entry C and entry E comply with the conditions bit 1 equals 1 and bit 5 equals 0. Entry B is eliminated from this part of the tree, since at this point in the path, it is indistinguishable from entry C and also more general. Thus, entry C and entry E remain, or more specifically, keys C.sub.1, E, C.sub.3, and C.sub.4 (see expansion provided in Table 2). It can be seen that entry C and entry E are identical, except for bit 2 and bit 3. We observe, however, that regarding entry C and entry E, both bit 2 and bit 3 are pseudo-nodes and were skipped, hence, the entry C and entry E are placed together in one inventive leaf. If bit 5 equals 1, we move right from the node with entries B, D and F. The only remaining bit that has not been skipped (bit 0, bit 4) and is not a pseudo-node (bit 2) is bit 3. If the value of bit 3 equals 0, we move left from the node, along with entry B and entry F. From the expansion provided in Table 2, only key B.sub.6 and key F comply, such that only these two keys are stored in the leaf. If the value of bit 3 equals 1, we move right from the node, along with entry B and entry D. >From the expansion provided in Table 2, key B.sub.4 equals key D.sub.1 and B.sub.8 equals key D.sub.2. Being more specific, entry D, containing keys D.sub.1 and D.sub.2, is retained and stored in the leaf. Thus, according to this aspect of the invention, the tree contains 3 nodes and 4 leaves. The tree has a maximum depth of 3 nodes, with 2 leaves having a depth of 3, 1 leaf having a depth of 2, and 1 leaf having a depth of 1. This compares favorably with the inventive tree of FIG. 4, which has 4 nodes and 5 leaves, and wherein 4 of the leaves have a depth of 3. Compared with trees constructed on the basis of prior art, the reduction in tree depth and the reduction in the number of branches is significantly more pronounced. According to another aspect of the present invention, the bits are selected in an order that provides the best balance of the total number of single keys in both sub-trees. This principle is demonstrated in FIG. 6 and FIG. 7, using the entries provided in Table 4 below.
TABLE 4
0 1 2 3 4 5 6 7 8
A 0 0 0 1 1 * 1
B 1 0 0 * 1 * 0
C 1 1 * 0 1 * * *
D 1 1 * 1 0 * * * *
E 1 1 1 0 1 * * 0
FIG. 6 provides a radix-search tree according to the present invention, in which the bits are checked according to the minimum number of wildcards (which, coincidentally, is also the natural order). In the event that two or more bits have the same number of minimum wildcards, those bits are checked according to the order of appearance. Thus, bit 0, bit 1, and bit 4 all are distinguishing and have zero wildcards, and bit 0 is selected as the root of the sub-tree. For the case of bit 0 equals 1, we proceed along the right-hand side of the tree. Upon examination of bits 1-5, we find that bit 1 and bit 4 are both distinguishing and have zero wildcards, leading to the selection of bit 1. For the case of bit 1 equals 1, we proceed along the right-hand side of the tree. Upon examination of the remaining bits, we find that, upon separation of entry B, bit 3 remains distinguishing, but now has zero wildcards, such that bit 3 is selected. At this point, bit 4 is now non-distinguishing, and in fact, entry C and entry E can be inserted into the same leaf, according to an above-described facet of the invention. In computing the average search length, we note that 2 single keys (corresponding to entry A) are found at a search depth of 1; 4 single keys (corresponding to entry B) are found at a search depth of 2; 48 single keys (16 corresponding to entry C and entry E, and 32 corresponding to entry D) are found at a search depth of 3. The average search depth (assuming uniform distribution upon the keys) is approximately 2.85. FIG. 7 provides a sub-tree according to another aspect of the present invention, in which bits are checked out of the order of appearance, according to the optimal balancing of single keys within the sub-tree structure. There is an advantage to accessing larger leaves, i.e., leaves containing larger pluralities of single keys, as early as possible in the search routine. Thus the inventive algorithm discerns that by choosing bit 4 as the root, entry D, containing 32 single keys, is distinguished from all the other entries, and is inserted into a single leaf having a search depth of 1. For the case of bit 4 equals 1, we proceed along the right-hand side of the tree. Upon examination of bits 0, 1, 2, 3, and 5, we find that bit 0 and bit 1 are both distinguishing, and both have zero wildcards. However, by choosing bit 1, we can isolate 16 single keys corresponding to entry C and entry E (entry E being a subset of entry C), whereas the choice of bit 0 would isolate only 2 single keys corresponding to entry A. Thus, bit 1 is selected. We observe that the sub-trees in FIG. 6 and FIG. 7 both have 4 leaves and 3 nodes in series (including the root as a node). However, from a computation of the average search length, we note that 32 single keys (corresponding to entry D) are found at a search depth of 1; 16 single keys (corresponding to entry C and entry E) are found at a search depth of 2; 6 single keys (corresponding to entry A and entry B) are found at a search depth of 3. The average search depth is approximately 1.5, close to 1/2 the value of the average search depth of the sub-tree of FIG. 6. In the updating of a tree, there is a tradeoff between the speed of the updating and the balance of the tree. If, for example, the tree is completely rebuilt each time that an insertion or deletion is performed, the time invested in the updating/rebuilding is prohibitively long. The balance of the tree will, however, be close to optimal at all times. If, however, many insertions and deletions are performed without rebuilding the tree (or one or more of the affected sub-trees), the updating process is fast, but subsequent retrieval times will be severely compromised by the imbalances in the tree structure. An updating procedure according to the present invention is illustrated in FIG. 8 and in FIG. 9, wherein the basic tree structure is that of FIG. 7. In FIG. 8, we add new entry F to the existing tree structure; in FIG. 9, we add new entry G to the existing tree structure. New entry F and new entry G are provided in Table 5 below, along with original entries A-E.
TABLE 5
0 1 2 3 4 5 6 7 8
A 0 0 0 1 1 * 1
B 1 0 0 * 1 * 0
C 1 1 * 0 1 * * *
D 1 1 * 1 0 * * * *
E 1 1 1 0 1 * * 0
F * 1 1 0 1 1
G * 0 0 0 1 1
Referring now to FIG. 8, the insertion of new entry F into the tree structure is performed as follows: looking at the first node (bit 4) of the existing tree structure, we see that for entry F, bit 4 equals 1. Hence, we proceed to the right at bit 4, as if performing a retrieval operation. At the next node, bit 1 of entry F equals 1, hence we again proceed to the right, arriving at a leaf containing entry C and entry E. As is evident from the miniature Venn diagram representing this leaf (an 18-key leaf shown in FIG. 8), entry F does not intersect entry C and entry E, as entry F has a different number of bits. Since there is no node that can distinguish between entry F and entries C and E, entry F is incorporated into the existing leaf. The updating operation described above is extremely fast, being comparable to a simple retrieving operation. However, the balance of the tree has been slightly compromised, which slightly reduces the average retrieval efficiency. In FIG. 9, the insertion of new entry G into the tree structure is demonstrated. Examining the first node (bit 4) of the existing tree structure, we see that for entry G, bit 4 equals 1. Hence, we proceed to the right at bit 4. At the next node, bit 1 of entry G equals 0, hence we proceed to the left, arriving at a node corresponding to bit 0. Since for entry G, bit 0 is a wildcard, entry G must be recursively inserted on both sides of the node. For bit 0 equals 1, we proceed to the right, arriving at a leaf containing entry G. As is evident from the miniature Venn diagram representing this leaf (a 6-key leaf shown in FIG. 9), entry G does not intersect entry B, as entry G has a different number of bits. Since there is no node that can distinguish between entry G and entry B (note that at bit 2, the entries are indistinguishable, and bit 3 and bit 5 are pseudo-nodes), entry G is incorporated into the existing leaf containing entry B. For bit 0 equals 0, we proceed to the left, arriving at a leaf containing entry A. Entry A and entry G are distinguishable at bit 3, hence, a new node is created in the tree structure, using the sub-tree building algorithm, in which bit 3 is examined. For bit 3 equals zero, we move to the left child, arriving at a new leaf containing solely entry G. For bit 3 equals one, we move to the right child, arriving at an old leaf containing solely entry A. It must be emphasized, based on this particular example, that the inventive tree structure and method successfully address both the problem of incorporation of wildcard-containing entries (irrespective of wildcard location) and the incorporation of entries of varying length, into the tree structure. This is also true for matching and retrieval operations, as has been discussed above. Based on various trade-off considerations that may vary considerably from application to application, a restructuring of the tree or a portion thereof will be called for from time to time. In most cases, the restructuring will be a local restructuring, for example, a restructuring of sub-trees attached to node 0 using the sub-tree building algorithm that will be described in greater detail below. FIG. 10 is a logical flow diagram of a tree-building algorithm 200 that combines several aspects of the present invention. In the algorithm, sub-trees are built 202 one at a time, in a recursive fashion. The input to the sub-tree building routine includes a set of the appropriate entries, i.e., the set of all entries that hit this node on the lookup path. As described above, the entries are examined and compared in an algorithm 204 in which more general entries, which are indistinguishable from one or more of the other entries according to the lookup path, ("overridden entries") are eliminated. In the next step 206, an algorithm searches for one or more bits having different specified values, i.e., bits in which at least one entry has a value of 1 and at least one entry has a value of 0. If this condition is met, the algorithm proceeds to step 208, where the bit having the least number of wildcards is selected. Often, several bits will have the identical number of minimum wildcards. In such a case, the bit selection can be performed based on various criteria. In step 210 of the logical flow diagram, the criterion used, by way of example, is optimal balancing of single keys within the sub-tree structure, as described above and as shown in FIG. 7. The algorithm then proceeds to step 214. If at step 208 there is only one bit having the minimum number of wildcards, no additional criteria are necessary, and the algorithm proceeds directly to step 214, where entries for the left sub-tree are selected. Subsequently, the left sub-tree is built 216, entries for the right sub-tree are selected 218, and the right sub-tree is built 220. If the condition at step 206 is not met, i.e., no bits have different specified values, the algorithm proceeds to step 230. At step 230, the algorithm verifies that the current set of entries can be inserted into one leaf. If the current set of entries can be inserted into one leaf, the algorithm proceeds to step 232, wherein the leaf is built. In step 234, the node is subsequently defined as a leaf, and the building of the sub-tree is completed. It is possible that the current set of entries cannot be inserted into one leaf. For example, if the current set of entries contains 1,000 single keys, hardware limitations such as leaf size may prevent the inclusion of all the entries into a single leaf. In such a case, the algorithm proceeds to step 240, wherein the bits are re-examined. The selected bit in step 240 is the one which, for a maximal number of pairs, is the only different bit in the pair (i.e., in one entry of the pair, the bit is specified, and in the other entry in the pair, that bit is a wildcard, and all other bits in the pair are identical). This approach eliminates the maximal number of entries in sub-trees. There are two sub-trees for this node. In one of the sub-trees, the specified entry of the pair will not appear, and in the other sub-tree, the unspecified entry (i.e., the entry having a wildcard at the selected bit) will be overridden by the specified entry that is more specific. Consequently, the algorithm proceeds directly to step 214, and from there to step 216, as described above. Other aspects of the logical flow diagram are understood by those skilled in the art without further elaboration. The algorithm is treated in further detail, by way of example, in the Appendix provided below. FIG. 11 is a block diagram of a system 50 according to the present invention. System 50 includes a processor 20, a memory 60, and an I/O block 40. Memory 60 includes an instruction storage area 70 and a data storage area 90. This general system architecture allows the processor 20 to access the data storage area 90 and retrieve the requisite information from the radix-search tree structure of the present invention. As used herein in the specification and in the claims section that follows, the term "wildcard" refers to a bit located at a position that is within the length of an entry, in which the bit value is allowed to equal either 0 or 1, such that the wildcard-containing entry is actually a set containing subsets in which the wildcard bit is represented as 2 specified bits. As used herein in the specification and in the claims section that follows, the term "defined bit" or "specified bit" are used interchangeably to refer to a bit having a definite value, i.e., either 0 or 1. As used herein in the specification and in the claims section that follows, the term "leaf" refers to a group of one or more entries that need not be further divided. As used herein in the specification and in the claims section that follows, the term "entry" refers to a set of one or more keys represented by a string of the following alphabet--{0,1,*}, where "*" signifies a wildcard (i.e., either bit equals 0 or bit equals 1 in the specified bit position). As used herein in the specification and in the claims section that follows, the term "order of appearance" refers to the order in which bits appear in a key or entry. Examining bits in order of appearance belongs to conventional Patricia-trie methodology. As used herein in the specification and in the claims section that follows, the term "primary position" refers to a position at the beginning or in the middle of an entry, after which at least one specified bit is present. Thus, the wildcard positioned in the 7th bit of the string 011101*0 is in a primary position, whereas the wildcard positioned in the 5th bit of the string 0111**** is part of a suffix. As used herein in the specification and in the claims section that follows, the term "middle position" refers to a position in a string or entry that is both a non-prefix position and a non-suffix position, i.e., a bit preceded by a specified bit and followed by a specified bit. Thus, in the following string: string I=00110011***, bits 1-7 are middle position bits; bit 0 and bits 8-10 are prefix and suffix positions, respectively. As used herein in the specification and in the claims section that follows, the term "sub-tree" refers to at least one node within a radix-search tree, which may be a leaf node. As used herein in the specification and in the claims section that follows, the expression "best-balanced split" or "best-balanced split of single keys" refers to an ordering of bits in a sub-tree wherein the greatest number of single keys can be accessed as early as possible in the search routine. Such a split minimizes the average search depth and can significantly improve performance. As used herein in the specification and in the claims section that follows, the expression "split bit" refers to a bit pertaining to a set of entries, in which the bit has different specified values (i.e., both 0 and 1). A split bit cannot be skipped according to classic Patricia-trie methodology. It will be appreciated that the above descriptions are intended only to serve as examples, and that many other embodiments are possible within the spirit and the scope of the present invention.
APPENDIX
Algorithms
Overview
Lookup and SubTreeLookup are the algorithms that perform lookup for leafs
that can contain the
specified entry. Lookup is just the initialization part, SubTreeLookup is
the recursive mechanism of
looking in sub-tree.
BuildTree and BuildSubTree are the algorithms that build the tree
statically, i.e. when all the entries
are known at pre-build phase. BuildTree is just the initialization part.
BuildSubTree is the algorithm
that recursively builds the tree by building its sub-trees. It is the main
algorithm of the suite, which is
used both for static building of the tree and its dynamic rebuild.
UpdateEntry and RebuildSubTree are the algorithms that dynamically update
an existing tree.
UpdateEntry is just initialization and state-analysis phase. RebuildSubTree
is the algorithm that
recursively rebuilds the sub-trees of the tree.
POL_FindSplit is the algorithm that encapsulates the balancing
considerations in building/rebuilding
of the tree.
POL_RebuildHere is the algorithm that encapsulates the rebuild/lookup
trade-off of the dynamic
update of the tree.
Miscellaneous not detailed algorithms:
EliminateOverridenEntries is the algorithm that marks the entries that are
overridden by finer
entries after checking the given path. It is quite obvious and thus is not
detailed.
SelectEntriesByBit is the algorithm that selects from the given entries the
entries that will hit the
sub-tree associated with the specified bit value. It is quite obvious and
thus is not detailed.
Lookup
Synopsis Lookup( Tree, Entry )
Input Tree - the tree to start the lookup at (tree = root node).
Entry - the entry to lookup for.
Output A list of leaves the entry hits on lookup at Tree.
Initialization LeafList = empty.
Algorithm
SubTreeLookup( Tree, Entry, LeafList )
Output LeafList
SubTreeLookup
Synopsis SubTreeLookup( Node, Entry, LeafList )
Input Node - node to start the lookup at.
Entry - the entry to lookup for.
Output LeafList - A list of leafs the entry hits on lookup from
Node.
Complexity Time O(sub-tree depth) .about.= O(logN)
Space O(1)
Initialization None
Algorithm
If Node is a leaf
Then LeafList += Node.
Stop.
If the bit of Entry at Node.SplitPosition is specified
Then If this bit is 0
Then SubTreeLookup( Node.LeftSon, Entry,
LeafList )
Else SubTreeLookup( Node.RightSon, Entry,
LeafList ).
Else (bit is undefined or wildcarded).
SubTreeLookup( Node.LeftSon, Entry, LeafList ),
And SubTreeLookup( Node.RightSon, Entry, LeafList ).
Output LeafList
BuildTree
Synopsis BuildTree( AllEntries )
Input AllEntries - the list of entries to build the tree from.
Output The built patricia tree (root node).
Initialization Tree = new Node.
Path = EmptyPath.
Algorithm
BuildSubTree( Tree, AllEntries, Path ).
Output Tree.
BuildSubTree
Synopsis BuildSubTree (Node, EntriesSet, Path )
Input Node - the root of the sub-tree to be built.
EntriesSet - the set of entries which hit this node on the
lookup path.
Path - the set of indexes of bits that were checked in the
path to Node.
Output Node - the built sub-tree.
Complexity Time > O(N*logN*L)
Space > O(N*L)
Initialization None.
Algorithm
EliminateOverridenEntries( EntriesSet, Path ).
SplitLocation = POL_FindSplit( EntriesSet, Path ).
If SplitLocation ==NO_NEED_TO_SPLIT
< all the entries form leaf >
Then FormLeaf( Node, EntriesSet ). Stop.
Path += SplitLocation.
Node.LeftSon = new Node;
SonEntriesSet = SelectEntriesByBit( EntriesSet, SplitLocation, 0 ).
BuildSubTree ( Node.LeftSon, SonEntriesSet, Path ).
Node.RightSon = new Node;
SonEntriesSet = SelectEntriesByBit( EntriesSet, SplitLocation, 1 ).
BuildSubTree( Node.RightSon, SonEntriesSet, Path ).
POL FindSplit
Synopsis POL_findSplit( EntriesSet, Path )
Input EntriesSet - the set of entries to be split (if needed)
Output Split location, or NO_SPLIT in case all of them form a
leaf.
Path - the set of indexes of bits that were checked in the
path to Node.
Initialization None.
Algorithm
<
PHASE 1 - Splitting by abstract tree restrictions (split while there are
different bits).
Find bit which is in different specified state. (0 or 1). In this case we
have to split.
Simple case (equals classic patricia): find first bit in location order.
Balancing consideration 1: find bit number such that minimal number of
entries contain WC in it.
This approach will minimize the number of leafs in tree.
Balancing consideration 2: find bit number that divides the group of
entries (or group of all the
keys) into most equal groups, i.e., the bit that divides the entries into
two groups (for left and right
sub-trees) in such way that divides the total amount of single keys into
two mostly equal groups among
the sub-trees.
This approach will minimize the expected length of search for single key
(where the single key
distribution is assumed to be uniform).
Balancing consideration 3:
The final algorithm will be a certain incorporation of the above
considerations.
If no splitting needed at this phase (i.e., by tree restrictions the
entries may form group), go to phase 2.
PHASE 2 - Splitting by hardware leaf representation restrictions (split
further, when no different bits).
Check if a leaf can be formed from the entries. If yes, i.e., no splitting
needed, return NO_SPLIT.
If a leaf cannot be formed, the entries set it must be split.
Algorithm for optimal splitting: find bit that in most pairs of entries
(each for different length) it is
the only difference between the entries in pair (i.e. in one entry in pair
this bit is specified, in other is
WC and the others are the same).
This approach will eliminate maximal number of entries in sub-trees. For
each such pair, in one
sub-tree the corresponding entry in pair will not appear, and in other
sub-tree the other entry in pair will
be overridden by the former entry.
>
UpdateEntry
Synopsis UpdateEntry( Tree, Entry )
Input Tree - the root node of the existing tree.
Entry - the entry to update.
Output The updated tree.
Initialization None.
Algorithm
EntryLeafs = Lookup( Tree, Entiy ).
If EntryLeafs is empty <entry not found>
Then If ADD_ENTRY
Then Output RebuildSubTree( Tree, ADD, Entry
).
Stop.
Else <entry is found, EntryLeafs contain all leafs that
logically contain the entry>
If CHANGE_ENTRY
Then For each Leaf in EntryLeafs do:
ChangeResult(Leaf, Entry).
USR_TraceChange (Leaf)
Else If DELETE_ENTRY
Then
Stop.
RebuildSubTree
Synopsis RebuildSubTree( Node, OPERATION, Entry )
Input Node - the node whose sub-tree should be updated.
OPERATION - ADD/DELETE;
Entry - the entry to update.
Output Node - The updated sub-tree.
Complexity Time O(logN) [in leaf] - O(N*logN*L) [whole tree]
Space O(1) [in leaf] - O(N*L) [whole tree]
Initialization None.
Algorithm
If POL_RebuildHere( Node, Entry )
Then < update by rebuilding the sub-tree from this Node >
SubTreeEntries = CollectEntries( Node ).
If OPERATION == ADD
Then SubTreeEntries += Entry.
Else SubTreeEntries -= Entry.
DeleteSubTree( Node )
BuildSubTree( Node, SubTreeEntries ).
USR_TraceChange( Node )
Output Node. Stop.
< simulate lookup >
If the bit of Entry at Node.SplitPosition is specified
Then If this bit is 0
Then RebuildSubTree ( Node.LeftSon,
OPERATION, Entry )
Else RebuildSubTree ( Node.RightSon,
OPERATION, Entry ).
Else <bit is undefined or wildcarded>
RebuildSubTree ( Node.LeftSon, OPERATION, Entry )
RebuildSubTree ( Node.RightSon, OPERATION, Entry ).
Output Node.
POL RebuildHere
Synopsis POL_RebuildHere( Node, Entry )
Input Node - the node where the update may start
Entry - to update
Output YES to update at the node, NO to go down.
Initialization None.
Algorithm
<
Key ideas:
Simple case 1: return YES for the root node, i.e. rebuild the whole tree.
This approach will minimize the lookup time and sacrifice update time.
Simple case 2: return YES for the leaf node.
This approach will minimize the update time and sacrifice the lookup time
since some building
considerations wouldn't be considered.
The final algorithm will be a certain compromise between the above simple
cases.
The considerations may be the tree balancing state, system resources
(memory, processing power), user
wishes, etc.
Simple compromising approach: return YES for the first node where the
node's split bit is
wildcarded.
This approach will compromise between rebuild time and optimality of the
rebuilt balancing state.
>
|
Same subclass Same class Consider this |
||||||||||
