|
|
|
DATABASE SCHEMA OR DATA STRUCTURE |
Technique for efficiently classifying packets using a trie-indexed hierarchy forest that accommodates wildcards6041053
Abstract
A technique, specifically apparatus and accompanying methods, which utilizes a trie-indexed hierarchy forest ("rhizome") that accommodates wildcards for retrieving, given a specific input key, a pattern stored in the forest that is identical to or subsumes the key. The rhizome contains a binary search trie and a hierarchy forest. The search trie provides an indexed path to each unique, most specific, pattern stored in a lowest level of the hierarchy forest and also possibly to increasingly general patterns at higher levels in the pattern hierarchy. The hierarchy forest organizes the patterns into nodal hierarchies of strictly increasing generality. For use as a packet classifier, the rhizome stores wildcard-based packet classification patterns at separate corresponding pattern nodes, along with, corresponding "reference" fields associated therewith. Operationally, as each different queue is established or removed, a corresponding classification pattern is either inserted into or removed from the rhizome. A search key is formed for each packet, typically by concatenating classification fields, e.g. source and destination addresses and source and destination port designations, appearing in a header of the packet. The search key is then applied to the rhizome to determine whether that key exists therein, by virtue of either matching an identical classification pattern or being completely subsumed within a more general pattern stored therein. When such a classification is found, the classifier returns the contents of the associated reference field, which for scheduling, is a designation of a transmission queue to which the packet is to be directed.
Claims
We claim:
1. A method for use in a packet classifier, the classifier having a data structure with hierarchically-related pattern values stored therein, said pattern values having wildcards therein; wherein the data structure has a search trie and a hierarchy forest, the search trie being formed of a plurality of branch nodes organized into a binary search trie extending from a root node therein to said hierarchy forest, the hierarchy forest having a plurality of pattern nodes, each having a different corresponding one of the pattern values associated therewith, and at least one hierarchy of pattern nodes with associated pattern values of increasing generality; the method comprising the steps of:
forming a key in response to information contained in a portion of the packet;
traversing along a path through the search trie in order to reach one of the pattern nodes in said hierarchy forest, the path being defined by the key and pivot bit values associated with ones of said branch nodes;
determining whether the key either matches or is subsumed by a pattern value stored in said one pattern node or in a different one of pattern nodes situated at an increasingly higher level of a hierarchy containing said one pattern node, so as to locate a desired one of the pattern values, stored in the hierarchy, that either identically matches or most specifically subsumes the key; and
if the key identically matches or is subsumed by the desired one pattern value:
accessing a stored classification associated with said desired one pattern value; and
returning the stored classification for the packet.
2. The method in claim 1 wherein the classification is a designation of a transmission queue.
3. The method in claim 2 wherein the accessing step further comprises the steps of:
accessing a reference field associated with a pattern node storing said desired one pattern value so as to yield a stored designation of an associated transmission queue as the classification; and
returning the designation of the associated transmission queue for the packet.
4. The method in claim 3 wherein the portion of the packet is a packet header, and the information comprises source and destination addresses and source and destination port designations.
5. The method in claim 1 wherein said traversing step comprises the step of tracing along the path by branching from each successive branch node encountered along said path, starting at the root node, in a direction therefrom as specified by a bit of the key indexed by the pivot bit of said each branch node until said one pattern node is encountered.
6. The method in claim 5 wherein the tracing step comprises the steps of:
examining, when the path reaches said successive branch node, a particular bit in the key indexed by a value of the pivot bit associated with said successive branch node; and
branching, as specified by a value of the particular bit along either a child zero branch or a child one branch, both branches emanating from the successive branch node, to reach a child one or child zero node as a next node along said path, said next node being either a different one of the branch nodes, having a higher pivot bit associated therewith than that of the successive branch node, or the one pattern node.
7. The method in claim 6 further comprising the step of:
returning a message, indicative of the no match condition, in the event said key did not match nor was subsumed by the pattern value stored in said one pattern node nor in any of the pattern values stored at all increasingly higher levels of the hierarchy.
8. The method in claim 7 further comprising the step of inserting, in response to a new transmission queue being established, a new pattern node, into said hierarchy forest, for storing a new pattern value in the data structure, said new pattern node having a designation of the new queue associated therewith.
9. The method in claim 8 wherein the classification is a designation of a transmission queue.
10. The method in claim 9 wherein the accessing step further comprises the steps of:
accessing a reference field associated with a pattern node storing said desired one pattern value so as to yield a stored designation of an associated transmission queue as the classification; and
returning the designation of the associated transmission queue for the packet.
11. The method in claim 10 wherein the portion of the packet is a packet header and the information comprises source and destination addresses.
12. The method in claim 8 wherein said inserting step further comprises the steps of:
inserting said new pattern node at a proper hierarchical location within said hierarchy forest, said inserting comprises updating stored parameter values associated with existing ones of the nodes in the data structure such that said new pattern node is situated properly and is fully reachable therethrough; and
if said hierarchy forest contains a wildcard-based pattern, replicating sufficient structure in said search trie, where necessary, based on bit positions where bit disagreements occur among bits in corresponding bit positions in said new pattern and existing pattern values stored in the data structure, such that the new pattern node is reachable over all different paths that, as a result of a wildcard, can be extended through the search trie to the new pattern node.
13. The method in claim 12 wherein each of said branch nodes in said search trie has a corresponding data structure associated therewith, the replicating step comprises the step of defining a data structure for each new branch node formed in said search trie, wherein the defining step comprises the step of:
storing, in corresponding memory locations in said data structure for said new branch node:
a corresponding one of the pivot bits associated with said new branch node;
child zero and child one pointers to child one and child zero nodes; and
values for VALUE, MASK and IMASK parameters.
14. The method in claim 13 wherein said storing step further comprises the steps of:
forming the VALUE parameter as a first predefined function of the VALUE parameters of the child one and zero nodes emanating from said new branch node;
forming the MASK parameter as a second predefined function of MASK parameters of the child one and child zero nodes emanating from said new branch node; and
forming the IMASK parameter as a third predefined function of IMASK parameters of the child one and child zero nodes emanating from said new branch node.
15. The method in claim 14 wherein said first and second functions are each a disjunction function, and said third function is a conjunction function.
16. The method in claim 15 wherein said first and second functions is the disjunction function taken up to the pivot bit position for said one branch node, and the third function is the conjunction function taken up to the pivot bit position for said one branch node.
17. The method in claim 12 wherein each of said pattern nodes in said hierarchy forest has a corresponding data structure associated therewith, the inserting step comprises the step of defining a data structure for each new pattern node formed in said hierarchy forest, wherein the defining step comprises the step of storing, in corresponding memory locations in said data structure:
a reference value;
a pointer to a next higher pattern node in a hierarchy containing said new pattern node; and
values for VALUE, MASK and IMASK parameters.
18. The method in claim 17 wherein said storing step further comprises the steps of:
setting the VALUE parameter equal to a new pattern value to be stored in said each new pattern node;
forming the MASK parameter to specify a wildcard in any bit position in said VALUE parameter for said each new pattern node; and
forming the IMASK parameter as a fourth predefined function of the MASK parameter of said each new pattern node and the IMASK parameter of a godparent to said each new pattern node.
19. The method in claim 18 wherein said fourth function is a conjunction function.
20. The method in claim 7 further comprising the step of removing an existing pattern node, from said hierarchy forest, so as to remove a pattern value, and a flow, associated therewith from said data structure.
21. The method in claim 20 wherein the removing step comprises the steps of:
eliminating structure from the search trie solely needed to previously support said existing pattern node; and
updating stored parameter values associated with remaining ones of the nodes in the data structure such that the existing pattern node is fully removed from the data structure and is not reachable therein.
22. The method in claim 21 wherein each branch node in the search trie has a corresponding data structure associated therewith, the data structure storing in separate fields therein: a corresponding one of the pivot bits associated with said each branch node; child zero and child one pointers to child one and child zero nodes; and values for VALUE, MASK and IMASK parameters; and wherein said updating step for, a remaining one of the branch nodes in the forest, comprises the steps of:
forming the VALUE parameter as a first predefined function of the VALUE parameters of the child one and zero nodes emanating from said remaining one branch node;
forming the MASK parameter as a second predefined function of MASK parameters of the child one and child zero nodes emanating from said remaining one branch node; and
forming the IMASK parameter as a third predefined function of IMASK parameters of the child one and child zero nodes emanating from said remaining one branch node.
23. The method in claim 22 wherein said first and second functions are each a disjunction function, and said third function is a conjunction function.
24. The method in claim 23 wherein said first and second functions is the disjunction function taken up to the pivot bit position for said remaining one branch node, and the third function is the conjunction function taken up to the pivot bit position for said remaining one branch node.
25. The method in claim 21 wherein each pattern node in the hierarchy forest has a corresponding data structure associated therewith, the data structure storing in separate fields therein: a reference value; a pointer to a next higher pattern node in a hierarchy containing said new pattern node; and values for VALUE, MASK and IMASK parameters, wherein said updating step for, a remaining one of the pattern nodes in the data structure, comprises the step of forming the IMASK parameter as a fourth predefined function of the MASK parameter of said remaining one pattern node and the IMASK parameter of a godparent to said remaining one pattern node.
26. The method in claim 25 wherein said fourth function is a conjunction function.
27. A computer readable medium having computer executable instructions stored therein for performing the steps recited in claim 1.
28. Apparatus for a packet classifier comprising:
a processor;
a memory having executable instructions and a data structure stored therein, the data structure having a search trie and a hierarchy forest, said search trie being formed of a plurality of branch nodes organized into a binary search trie extending from a root node therein to said hierarchy forest, said hierarchy forest having a plurality of pattern nodes, each of said pattern nodes having a different corresponding one of the pattern values associated therewith, organized into at least one hierarchy of pattern nodes of increasing generality wherein at least one of the pattern values in the hierarchy contains a wildcard;
wherein the processor, in response to the stored instructions:
forms a key in response to information contained in a portion of the packet
traverses along a path through the search trie in order to reach one of the pattern nodes in said hierarchy forest, the path being defined by the key and pivot bit values associated with ones of said branch nodes;
determines whether the key either matches or is subsumed by a pattern value stored in said one pattern node or in a different one of pattern nodes situated at an increasingly higher level of a hierarchy containing said one pattern node, so as to locate a desired one of the pattern values, stored in the hierarchy, that either identically matches or most specifically subsumes the key; and
if the key identically matches or is subsumed by the desired one pattern value:
accesses a stored classification associated with said desired one pattern value; and
returns the stored classification for the packet.
29. The apparatus in claim 28 wherein the classification is a designation of a transmission queue.
30. The apparatus in claim 29 wherein the processor, in response to the stored instructions:
accesses a reference field associated with a pattern node storing said desired one pattern value so as to yield a stored designation of an associated transmission queue as the classification; and
returns the designation of the associated transmission queue for the packet.
31. The apparatus in claim 30 wherein the portion of the packet is a packet header, and the information comprises source and destination addresses and source and destination port designations.
32. The apparatus in claim 28 wherein the processor, in response to the stored instructions, traces along the path by branching from each successive branch node encountered along said path, starting at the root node, in a direction therefrom as specified by a bit of the key indexed by the pivot bit of said each branch node until said one pattern node is encountered.
33. The apparatus in claim 32 wherein the processor, in response to the stored instructions:
examines, when the path reaches said successive branch node, a particular bit in the key indexed by a value of the pivot bit associated with said successive branch node; and
branches, as specified by a value of the particular bit along either a child zero branch or a child one branch, both branches emanating from the successive branch node, to reach a child one or child zero node as a next node along said path, said next node being either a different one of the branch nodes, having a higher pivot bit associated therewith than that of the successive branch node, or the one pattern node.
34. The apparatus in claim 33 wherein the processor, in response to the stored instructions, returns a message, indicative of the no match condition, in the event said key did not match nor was subsumed by the pattern value stored in said one pattern node nor in any of the pattern values stored at all increasingly higher levels of the hierarchy.
35. The apparatus in claim 34 wherein the processor, in response to the stored instructions, inserts, in response to a new transmission queue being established, a new pattern node, into said hierarchy forest, for storing a new pattern value in the data structure, said new pattern node having a designation of the new transmission queue associated therewith.
36. The apparatus in claim 35 wherein the classification is a designation of a transmission queue.
37. The apparatus in claim 36 wherein the processor, in response to the stored instructions:
accesses a reference field associated with a pattern node storing said desired one pattern value so as to yield a stored designation of an associated transmission queue as the classification; and
returns the designation of the associated transmission queue for the packet.
38. The apparatus in claim 37 wherein the portion of the packet is a packet header, and the information comprises source and destination addresses and source and destination port designations.
39. The apparatus in claim 35 wherein the processor, in response to the stored instructions:
inserts said new pattern node at a proper hierarchical location within said hierarchy forest, said inserting comprises updating stored parameter values associated with existing ones of the nodes in the data structure such that said new pattern node is situated properly and is fully reachable therethrough; and
if said hierarchy forest contains a wildcard-based pattern, replicates sufficient structure in said search trie, where necessary, based on bit positions where bit disagreements occur among bits in corresponding bit positions in said new pattern and existing pattern values stored in the data structure, such that the new pattern node is reachable over all different paths that, as a result of a wildcard, can be extended through the search trie to the new pattern node.
40. The apparatus in claim 39 wherein each of said branch nodes in said search trie has a corresponding data structure associated therewith, and wherein the processor, in response to the stored instructions, defines a data structure for each new branch node formed in said search trie, and through so doing, stores in corresponding memory locations in said data structure for said new branch node:
a corresponding one of the pivot bits associated with said new branch node;
child zero and child one pointers to child one and child zero nodes; and
values for VALUE, MASK and IMASK parameters.
41. The apparatus in claim 40 wherein the processor, in response to the stored instructions:
forms the VALUE parameter as a first predefined function of the VALUE parameters of the child one and zero nodes emanating from said new branch node;
forms the MASK parameter as a second predefined function of MASK parameters of the child one and child zero nodes emanating from said new branch node; and
forms the IMASK parameter as a third predefined function of IMASK parameters of the child one and child zero nodes emanating from said new branch node.
42. The apparatus in claim 41 wherein said first and second functions are each a disjunction function, and said third function is a conjunction function.
43. The apparatus in claim 42 wherein said first and second functions is the disjunction function taken up to the pivot bit position for said one branch node, and the third function is the conjunction function taken up to the pivot bit position for said one branch node.
44. The apparatus in claim 39 wherein each of said pattern nodes in said hierarchy forest has a corresponding data structure associated therewith, and the processor, in response to the stored instructions, defines a data structure for each new pattern node formed in said hierarchy forest, and, through so doing, stores, in corresponding memory locations in said data structure:
a reference value;
a pointer to a next higher pattern node in a hierarchy containing said new pattern node; and
values for VALUE, MASK and IMASK parameters.
45. The apparatus in claim 44 wherein the processor, in response to the stored instructions:
sets the VALUE parameter equal to a new pattern value to be stored in said each new pattern node;
forms the MASK parameter to specify a wildcard in any bit position in said VALUE parameter for said each new pattern node; and
forms the IMASK parameter as a fourth predefined function of the MASK parameter of said each new pattern node and the IMASK parameter of a godparent to said each new pattern node.
46. The apparatus in claim 45 wherein said fourth function is a conjunction function.
47. The apparatus in claim 34 wherein the processor, in response to the stored instructions, removes an existing pattern node, from said hierarchy forest, so as to remove a pattern value, and a flow, associated therewith from said data structure.
48. The apparatus in claim 47 wherein the processor, in response to the stored instructions:
eliminates structure from the search trie solely needed to previously support said existing pattern node; and
updates stored parameter values associated with remaining ones of the nodes in the data structure such that the existing pattern node is fully removed from the data structure and is not reachable therein.
49. The apparatus in claim 48 wherein each branch node in the search trie has a corresponding data structure associated therewith, the data structure storing in separate fields therein: a corresponding one of the pivot bits associated with said each branch node; child zero and child one pointers to child one and child zero nodes; and values for VALUE, MASK and IMASK parameters; and wherein the processor, in response to the stored instructions, for, a remaining one of the branch nodes in the forest:
forms the VALUE parameter as a first predefined function of the VALUE parameters of the child one and zero nodes emanating from said remaining one branch node;
forms the MASK parameter as a second predefined function of MASK parameters of the child one and child zero nodes emanating from said remaining one branch node; and
forms the IMASK parameter as a third predefined function of IMASK parameters of the child one and child zero nodes emanating from said remaining one branch node.
50. The apparatus in claim 49 wherein said first and second functions are each a disjunction function, and said third function is a conjunction function.
51. The apparatus in claim 50 wherein said first and second functions is the disjunction function taken up to the pivot bit position for said remaining one branch node, and the third function is the conjunction function taken up to the pivot bit position for said remaining one branch node.
52. The apparatus in claim 48 wherein each pattern node in the hierarchy forest has a corresponding data structure associated therewith, the data structure storing in separate fields therein: a reference value; a pointer to a next higher pattern node in a hierarchy containing said new pattern node; and values for VALUE, MASK and IMASK parameters, wherein the processor, in response to the stored instructions and for a remaining one of the pattern nodes in the data structure, forms the IMASK parameter as a fourth predefined function of the MASK parameter of said remaining one pattern node and the IMASK parameter of a godparent to said remaining one pattern node.
53. The apparatus in claim 52 wherein said fourth function is a conjunction function.
Description
BACKGROUND OF THE DISCLOSURE
1. Field of the Invention
The invention relates to a technique, specifically apparatus and accompanying methods, which utilizes a trie-indexed hierarchy forest that accommodates wildcards for, inter alia, retrieving, given a specific input key, a pattern stored in the forest that is identical to or subsumes the key. This technique finds particular, though not exclusive, use in a computer-based network, for efficiently classifying packets in order to apply any network related processing applicable to the packet, e.g., packet scheduling. Thus, this invention also relates to apparatus for a packet classifier and accompanying methods for use therein that embody this technique.
2. Description of the Prior Art
Packets are routinely used within a computer network to carry digital communication between multiple nodes on the network. Any such network possesses a finite number of different connections among the nodes therein. Generally speaking, whenever a computer stationed at one such node is executing an application wherein the results of that application will be sent to a computer at another network node, the former computer will establish a connection, through the network, to the latter computer and then send these results, in packetized form, over that connection.
A network interface for a personal computer contains both specialized hardware, as required to physically and electrically interface that computer to the network itself, and associated software which controls the interface and governs packetized communication therethrough. Inasmuch as the interface hardware is irrelevant to the present invention, we will not discuss it in any detail. The software is often part of the operating system, such as exists in, e.g., the Windows NT operating system currently available from the Microsoft Corporation of Redmond, Washington (which also owns the registered trademark "Windows NT"). In particular, the software implements, inter alia, various processes that rely on classifying each packet and processing the packet accordingly. One such process is packet scheduling. While several other such packet-related processes, such as routing, security and encryption, are also employed in the software, for purposes of brevity and illustration, we will confine the ensuing discussion to scheduling.
In particular, while a physical network interface to a computer operates at a single speed, e.g., 10 or 100 Mb/second, several different streams of packetized traffic, depending on their data content, can be interleaved together by the computer for simultaneous transmission at different rates through that interface. To accommodate this, a packet scheduler directs individual packets in each such stream to a corresponding software transmission queue (also referred to hereinafter as simply a "queue") from which packets in each such stream will be dispatched, in proper order, for network transmission at a corresponding rate--regardless of the network destinations of these streams. Given the substantially increased data rate required for, e.g., video data over textual data, the scheduler encountering both video and textual packetized data will ensure that each type of data packet is directed to the proper queue such that a significantly greater number of video-carrying packets than text-carrying packets will subsequently be transmitted per unit time through the interface. Other software implemented processes successively pull individual packets from the queues, at the corresponding data rates associated therewith, for multiplexing and subsequent transmission through the interface to the network. Inasmuch as these other processes are not relevant to the present invention, they will not be discussed in any further detail.
Packet classification, for purposes of packet scheduling, will be performed, by constructing a so-called "key" from select fields, contained in a packet in order to associate the packet with a corresponding queue. These fields are illustratively source and destination addresses (typically IP addresses) and corresponding port designations (all of which are collectively referred to herein as "classification fields"). This association typically occurs by consistently concatenating the classification fields, in the packet being classified, into a key which, in turn, is used to access a data structure in order to retrieve therefrom an identification of a corresponding queue for that packet. Since a group of packets that have differing values for their classification fields can nevertheless be transmitted at the same rate and hence should be directed to the same queue, a mask field containing one or more so-called "wildcard" (oftentimes referred to as "don't care") values is often used, through logical combination with the classification fields, to yield an identification associated with a single queue. Generally speaking, this identification is viewed as a "classification pattern", i.e., a bit field having a length equal to the total length of the concatenated classification fields wherein any bit in the pattern can have a value of "1", "0" or "X", where "X" is a wildcard. As a result, a single pattern having a wildcard(s) therein can serve to classify an entire group of such packets. If a match occurs between the non-wildcard bits of the pattern (i.e., taking the wildcard value(s) into account) and corresponding bits of the classification fields for the packet being classified, then an associated queue designation for that pattern is accessed from the data structure.
By virtue of permitting wildcards within packet classifications (i.e., patterns), the complexity associated with searching the data structure, for a pattern given the classification fields for a packet, as well as that of the structure itself, increases considerably. Furthermore, the process of classifying packets lies directly within the data flow and adds another layer of processing thereto, i.e., each outgoing packet must be classified before it is launched into the network. Consequently, any added complexity necessitated by accommodating wildcards will require additional processing. Since only a finite amount of processing time can be allocated to classify each packet and packet classification tends to be processor intensive, such classification needs to be rather efficient--particular for use in a computer that is to experience significant packet traffic.
A principal way to increase classification efficiency is to utilize a process that retrieves stored classification information as fast as possible from a data structure.
While the art teaches various approaches for classifying packets, such as that typified in M. L. Bailey et al, "Pathfinder: A Pattern-Based Packet Classifier", Proceedings of First Symposium on Operating Systems Design and Implementation (OSDI), USENIX Assoc., 14-17 November 1994, pages 115-123 (hereinafter the "Bailey et al" paper), and W. Doeringer et al "Routing on Longest-matching Prefixes", IEEE/ACM Transactions on Networking, Vol. 4, No. 1, February 1996, pages 86-97 (hereinafter the "Doeringer et al" paper), these approaches, while efficient and effective in their target environments, are limited. In that regard, the techniques described therein exhibit retrieval times that are generally linearly related to the number of elements (n) in a classification database. A large network will have a substantial number of different patterns. Hence, the size of a classification data structure for such a network can be considerable which, in turn, will engender linearly increasing and hence relatively long retrieval times as the database expands. For packetized payload data that requires a high data rate, such retrieval times may be inordinately long and cause excessive delay to a recipient user or process.
Furthermore, packet classifiers fall into two distinct types: declarative and imperative. Basically, a declarative classifier stores a pattern as a filter while the filter, used in an imperative classifier, contains a small segment of executable code. In particular, a declarative classifier relies on embedding a description, such as a key, in each packet for which a classification is sought and then matching the key to a pattern in a stored classification and retrieving an associated value, such as a queue designation, therefrom. An imperative classifier, on the other hand, executes, possibly on an interpretive basis, the code segment in each and every stored classification, in seriatim, against the header of an incoming packet to compute a result. The result specifies a corresponding pattern for that packet. Where the number of patterns is rather small, an imperative classifier executes a relatively small number of code segments against each packet. Inasmuch as each code segment is implemented through a highly simplified instruction set, the classification delay for small networks, with relatively few patterns, tends to be tolerable. However, both processing complexity and associated delay become intolerable for packet classification in a large network that has an extensive number of different patterns. Declarative classifiers eliminate the need to process each incoming packet through a separate code segment for each and every classification, and hence provide significantly shorter response times. However, prior art declarative classifiers, such as that typified by the methodology described in the Bailey et al paper, exhibit classification delay that is linear in the number of stored patterns and hence can be intolerably long in a large network which is expected to carry packetized payload data, such as video, that requires a high data rate.
While the Doeringer et al paper describes a declarative classification methodology that can handle a pattern containing wildcards, this methodology requires that all wildcards be located in a contiguous group at the end of the pattern. Inasmuch as this methodology is directed to IP (Internet Protocol) routing where arbitrarily placed wildcards do not occur, this methodology is acceptable in that application. However, for packet classification, where classification can occur for various different purposes, such as scheduling, and a wildcard can arbitrarily occur anywhere in a pattern, the methodology taught by the Doeringer et al paper is simply unsuited for such broad-based classification.
Therefore, a need exists in the art for a fast and versatile packet classifier that incorporates a search technique capable of rapidly retrieving stored information, from a data structure, given a specific value in a group of classification fields in the packet. Furthermore, this technique should accommodate a wildcard(s) located in any arbitrary position within a stored pattern and, to reduce delay, should preferably exhibit retrieval times that increase less than a linear function of the number of stored patterns.
Moreover, and apart from use in packet classification, a broad need continues to exist in the art for a generalized technique that, given specific input data (in the form of, e.g., a key), can rapidly retrieve a stored pattern, containing a wildcard(s) at any location therein, from a data structure--regardless of what the input data and patterns specifically represent. Such a technique would likely find widespread use including, but not limited, to packet classification. While the opposite problem of how to rapidly find a specific datum stored in a database given a generalized input pattern has been extensively investigated in the art, the converse problem, inherent in, e.g., a packet classifier, of how to swiftly retrieve a generalized stored pattern from a data structure given specific input data, e.g., classification fields, has apparently received scant attention in the art, which thus far, as discussed above, has yielded rather limited and disappointing results.
SUMMARY OF THE INVENTION
Our present invention satisfies these needs and overcomes the deficiencies inherent in the art through a stored data structure predicated on our inventive rhizome-indexed hierarchy forest, specifically our inventive modified hierarchical Patricia tree. In that regard, by structuring stored pattern data through a binary search trie, i.e., a search structure, that indexes into pattern hierarchies, i.e., a pattern hierarchy structure, wherein the hierarchies accommodate wildcard(s), the structure can be used to retrieve a pattern stored in the forest, given a specific input key, that is identical to or subsumes the key. Advantageously, our inventive structure permits a wildcard to be stored at any arbitrary bit position in a pattern.
In accordance with our broad teachings, our inventive data structure (also, for simplicity, referred to herein as a "rhizome"), typically stored in computer memory and/or on a computer-readable storage medium, contains two interconnected data structures: a binary search trie (forming a "search structure" when stored in memory and/or on a readable storage medium) and a hierarchy forest (forming a "hierarchy structure" when stored in memory and/or on a readable storage medium). The search trie, implemented through, e.g., a generalization of a Patricia tree, provides an indexed path to each unique, most specific, pattern stored in a lowest level of the hierarchy forest. The hierarchy forest, which extends beyond the search trie, organizes the patterns into nodal hierarchies of strictly increasing generality, starting with those that are most specific. Each pattern is stored in a separate pattern node. Each hierarchy of pattern nodes, at its lowest level, contains a node with a specific lowest level pattern. That node is then followed in the hierarchy by nodes, to the extent they exist, with increasingly general patterns. Each of these latter patterns, owing to an included wildcard(s), completely subsumes, by virtue of strict hierarchy, the pattern in the node situated immediately therebelow. By virtue of accommodating hierarchically related wildcard-based patterns, our inventive rhizome permits multiple nodes, either branch or pattern nodes, to point to a common pattern node.
By virtue of our specific inventive teachings, given a specific input ("search") key, a stored pattern, that provides the most specific match, either identically or which completely subsumes the key, is retrieved from the stored database through a two-fold retrieval process. First, the search trie, having a topology of branch nodes, is traversed by branching, via child paths, at successive branch nodes therein as determined by corresponding bits in the key, each such bit being designated by a so-called "pivot bit" associated with each such branch node, until a lowest-level pattern node is reached. The pivot bit defines a bit position at which at least a pair of stored patterns in the rhizome disagree with each other. The branch nodes are organized in terms of increasing pivot bit value. Second, the pattern hierarchy, containing that lowest-level pattern node, is itself searched until the most specific pattern therein is found which matches or completely subsumes the key. A search ultimately succeeds if the retrieval process locates a stored pattern that most specifically matches, either identically or in terms of subsuming, the search key. If a search succeeds, then contents of a so-called "reference field" associated with the stored pattern are returned for subsequent use.
Our inventive rhizome exhibits the advantageous characteristic that the key will match either a specific pattern stored in the rhizome or an increasingly general pattern stored above it in a hierarchical chain, or that key will not have a matching pattern stored anywhere in the rhizome. Hence, our inventive rhizome effectively narrows down a search for a pattern that most specifically matches a key from an entire database of stored patterns to just a linear chain of patterns in order of strict hierarchy, the latter being usually substantially smaller in size than the former. Specifically, through our inventive rhizome, the maximum number of bit comparisons that needs to be made to retrieve a pattern for a given search key generally equals the number of bits in the key itself plus the number of patterns, in the longest hierarchy chain in the hierarchy forest, less one. Moreover, owing to the strict hierarchy inherent in every hierarchical chain in our inventive rhizome, i.e., where patterns at higher levels in a given pattern hierarchy completely subsume those at lower levels therein, then, if a bit-by-bit comparison is to be made for patterns in a pattern hierarchical chain against an input key, then as increasingly general patterns are encountered in that hierarchy, there is no need to compare a bit in any bit position of the key that has previously been compared against and found to match that in any lower level pattern in that chain. This, in turn, further reduces processing time required to find a desired pattern in the rhizome.
Advantageously, our present invention finds particular, though not exclusive, use in a packet classifier for efficiently, i.e., very rapidly, classifying packets for scheduling. In such an application, our inventive rhizome stores packet classifications as individual classification patterns therein, wherein each classification (i.e., pattern) is typically formed of, e.g., source-destination addresses and source-destination port designations (all collectively referred to as "classification fields"). Depending upon a specific implementation, the classification field may also contain: other fields from a packet in addition to the addresses and port designations, fewer fields than these addresses and port designations, or even other fields from a packet. Wildcard-based classifications, i.e., patterns containing one or more wildcards, occur inasmuch as packets with different classification fields can be directed to a common transmission queue; hence, having a common classification. Each different pattern is stored at a separate corresponding pattern node in the inventive rhizome, along with, as the corresponding reference field therewith, a queue designation. Packets are dispatched from each queue for network transport at a particular data rate associated with that queue. In operation, as each different queue is established or removed, a corresponding classification pattern is either inserted into or removed from the classifier, specifically our inventive rhizome therein, respectively. Our inventive packet classifier forms a search key for each packet, typically by consistently concatenating the classification fields appearing in a header of the packet. The search key is then applied to our inventive rhizome which determines whether that key exists in the rhizome, by virtue of either matching an identical classification pattern, i.e., a match of all non-wildcard bits in the key with corresponding bits in the pattern, stored therein or being completely subsumed by a stored classification pattern. In either case, if such a classification pattern is found for the key, the search succeeds. Hence, the classifier then returns the contents of associated reference field therefor to specify the queue to which the packet is then directed.
Our inventive rhizome provides an advantageous feature that, for large databases, asymptotically, average search delay through the rhizome is a logarithmic function of a number of patterns stored in the rhizome rather than linear in the number (n) of patterns, as occurs with conventional databases of patterns containing wildcards. Hence, use of our inventive rhizome, particularly with a large and increasing classification database--as frequently occurs in packet classifiers, appears to provide significant time savings over conventional packet classification techniques.
Another advantageous feature provided by our inventive rhizome is backward compatibility with a conventional Patricia tree. In particular, if no wildcard-based patterns exist within our inventive rhizome or are to be inserted therein, then the structure of this rhizome would be identical to that of a conventional Patricia tree containing these patterns thereby assuring full backward compatibility. Furthermore, our inventive retrieval, insertion and removal processes would each operate on this rhizome, in the absence of any wildcard-based patterns, with substantially, if not totally, the same degree of computational complexity as would the corresponding processes in a conventional Patricia tree.
BRIEF DESCRIPTION OF THE DRAWINGS
The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
FIG. 1 depicts a high-level block diagram of two illustrative networked computers which each utilizes the teachings of the present invention;
FIGS. 2A and 2B depict typical key 200, and pattern 250 and its accompanying mask field 255, respectively, used in classifying a packet for scheduling;
FIG. 3 depicts a block-diagram of illustrative computer system 100, shown in FIG. 1, which utilizes the teachings of the present invention;
FIG. 4 depicts a simplified block diagram of network support software 112 that forms part of operating system (O/S) 110 shown in FIGS. 1 and 3;
FIG. 5A depicts a simplified high-level diagram of a packet scheduler that utilizes our inventive packet classifier;
FIG. 5B depicts a high-level flowchart of our inventive packet classification process 550 performed by packet classifier 410 shown in FIGS. 4 and 5A;
FIGS. 6A and 6B graphically depict a conventional database retrieval problem and an inverse problem inherent in a packet classifier, respectively;
FIGS. 7A and 7B depict single pattern 710, containing so-called wildcards, and resulting patterns 750 all subsumed therein, respectively;
FIGS. 8A, 8B and 8C respectively depict a group of wildcard-containing patterns, the same group hierarchically organized as a tree, and four-bit keys that are mapped into the patterns;
FIGS. 9A and 9B depict illustrative conventional Patricia trees 900 and 960, respectively, with exemplary data retrieval operations shown therefor;
FIG. 9C depicts data stored at each node, either branch or data node, of a conventional Patricia tree, such as that illustratively shown in either FIG. 9A or FIG. 9B;
FIG. 9D depicts data stored at each node of conventional Patricia tree 900 shown in FIG. 9A;
FIG. 9E depicts a conventional node insertion operation performed on sub-tree portion 902 shown in FIG. 9D;
FIG. 9F depicts a conventional node removal operation performed on sub-tree portion 902 shown in FIG. 9D;
FIG. 10 depicts illustrative rhizome 1000 which embodies the teachings of our present invention;
FIGS. 11A and 11B diagrammatically depict, through corresponding Venn diagrams, permitted hierarchical relationships between a pair of patterns stored in our inventive rhizome and specifically within hierarchy forest 1000.sub.2 shown in FIG. 10;
FIG. 11C diagrammatically depicts, through a Venn diagram, disallowed relationships between a pair of patterns stored in our inventive structure;
FIG. 12A depicts sub-structure portion 1002, as shown in FIG. 10, prior to a node insertion operation;
FIG. 12B depicts sub-structure portion 1002' which results after a node has been inserted into sub-structure 1002 shown in FIG. 12A, along with corresponding contents of VALUE, MASK and IMASK fields of each of the nodes shown in sub-structure portion 1002' both before and after the insertion;
FIG. 12C depicts sub-structure portion 1002' prior to removal of node 1220 therefrom;
FIG. 12D depicts sub-structure portion 1004 of rhizome 1000 shown in FIG. 10;
FIG. 12E depicts sub-structure portion 1004 in a re-oriented position prior to insertion of a new pattern node therein;
FIG. 12F depicts sub-structure portion 1004' which results after insertion of new pattern node 1230 into sub-structure portion 1004 shown in FIG. 12E;
FIG. 13 graphically depicts data fields associated with each node in our inventive rhizome;
FIG. 14 depicts a flowchart of Retrieval routine 1400 which executes within our inventive pattern classifier 410 shown in FIG. 4;
FIG. 15 depicts a flowchart of Insertion routine 1500 which is executed by our inventive pattern classifier 410 shown in FIG. 4;
FIG. 16 depicts the correct alignment of the drawing sheets for FIGS. 16A and 16B;
FIGS. 16A and 16B collectively depict a flowchart of Node Insert routine 1600 that is executed by, e.g., Insertion routine 1500 shown in FIG. 15;
FIG. 17 depicts a flowchart of Removal routine 1700 which is also executed by our inventive pattern classifier 410 shown in FIG. 4;
FIG. 18 depicts a flowchart of Node Remove routine 1800 that is executed by, e.g., Removal routine 1700 shown in FIG. 17;
FIG. 19 depicts a flowchart of Replicate routine 1900 that is executed by, e.g., Node Insert routine 1600 shown in FIGS. 16A and 16B; and
FIG. 20 depicts Eliminate routine 2000 that is executed by, e.g., Node Remove routine 1800 shown in FIG. 18.
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.
DETAILED DESCRIPTION
After considering the following description, those skilled in the art will clearly realize that the teachings of our present invention can be readily utilized in substantially any environment that processes packets to expedite packet classification. Our present invention will advantageously function not only within a computer system but also, generally speaking, within any apparatus that applies substantially any network-related processing to packets. Furthermore, though the broad teachings of our invention, particularly our inventive rhizome, are suitable for advantageous use within a wide variety of computer-based data classification processes to locate a stored generalized pattern given specific input data, to facilitate understanding and for brevity, we will describe our invention in the context of illustratively and predominantly a computer system which, as part of its operating system, classifies packets, for scheduling.
A. Packet Classification in a Network Environment
FIG. 1 depicts a high-level block diagram of networked environment 5 which illustratively contains computers 80 and 100 inter-connected through network 60. Inasmuch as the specific components that constitute network 60 are irrelevant to the present invention, these aspects will not be addressed hereinafter. Computers 100 and 80 are connected, via appropriate electrical connections 50 and 70, to network 60. These connections can be, e.g., wired, i.e., physical, or wireless links, the latter including radio and/or optical links; the actual form of the link being immaterial to the present invention.
Suffice it to say and to the extent relevant, computer system 80 located at a receiving location (henceforth a "receiver") can obtain data, such as, e.g., video or text data, from, e.g., application programs 130 located at a sending location (henceforth a "sender")--the actual mechanism through which the data is requested being irrelevant to the present invention. To provide network connections, computer system 100 also provides within its executing operating system 110 appropriate network support software 112. This network support software is often part of operating system 110, which is typified by, e.g., the Windows NT operating system currently available from the Microsoft Corporation of Redmond, Washington (which also owns the registered trademark "Windows NT"). For a clear understanding, the reader should also simultaneously refer, throughout the following discussion, to both FIGS. 1 and 4, the latter depicting a simplified block diagram (to the extent relevant) of network support software 112.
Network support software 112 provides, inter alia, packet classification and packet scheduling functionality, implemented through packet classifier 410 and packet scheduler 450, respectively. In operation and in essence, the network support software receives, as symbolized by path 120, a stream of packets from, e.g., application programs 130, and then, for purpose of scheduling, classifies, through packet classifier 410, each such packet as to its corresponding transmission queue (also referred to herein as simply a "queue"). The scheduler then directs each such packet to its corresponding queue. Thereafter, as symbolized by dashed line 140, the packets in each queue (not specifically shown in either FIGS. 1 or 4, but will be discussed in detail below in conjunction with FIG. 5A) are collectively interleaved together and transmitted onto network 60 at a corresponding data rate for each queue. Thus, through packet classification, scheduler 450 assures that the individual packets are to be launched into network 60 at a rate commensurate with a data rate of their payload data, Each queue may contain packets destined for different corresponding networked computers, e.g., computer 80 and/or any other computer(s) connected to network 60.
In essence, packet classification will be performed, for purposes of scheduling, by using values of specific classification fields contained in header of a packet, and specifically concatenating these fields together, in a consistent manner, to form a so-called "key" which, in turn, is used to access a data structure to retrieve a designation of a proper transmission queue. These classification fields that form illustrative key 200 are graphically shown in FIG. 2A.
In particular, each packet contains source and destination IP (Internet Protocol) addresses, i.e., an network address of a computer or other network device ("source device") at which that packet was created and a network address of a computer or other network device ("destination device") at which that packet is to be delivered. The packet also contains designations of source and destination ports, i.e., specific ports at the source and destination devices at which the packet originated and at which the packet is to be received, respectively. The source and destination addresses, and source and destination port designations collectively form the "classification fields". In other implementations, the classification fields may contain: other fields from a packet in addition to these addresses and port designations, fewer fields than these addresses and port designations, or even other fields from a packet. In any event, for purposes of the present embodiment, these addresses and port designations, when appropriately concatenated to form key 200, are used to access a so-called "classification pattern" from a stored database. As shown, key 200 contains four successive contiguous fields as the classification fields: source port designation field 210, destination port designation field 220, source address field 230 and destination address field 240. Though classification patterns are a specific type of pattern, for brevity, we will use these two terms synonymously during the course of describing our invention in the context of a packet classifier.
For each different transmission queue established at a computer, a separate corresponding pattern exists and is stored within a database. Moreover, multiple stored patterns may often correspond to the same queue. In any event, complications arise inasmuch as packets with different values for their classification fields are often to be transmitted at a common data rate, and hence are to be directed to a common transmission queue. In order to direct packets with different values of the classification fields to a common queue, each pattern, such as pattern 250 shown in FIG. 2B, has mask field 255 associated therewith, wherein a separate mask sub-field is provided for each of the individual classification fields, i.e., for each of the source and destination address fields and the source and destination port designation fields. As shown, pattern 250 is formed of: source port designation field 264, destination port designation field 274, source address field 284 and destination address field 294; with mask field 255 being formed, with corresponding concatenated sub-fields, of: source port designation mask sub-field 267, destination port mask sub-field 277, source address mask sub-field 287 and destination address mask sub-field 297.
Each mask sub-field identifies, by a one in a given bit position, each bit that is important (i.e., must be matched in value) in a corresponding classification field, (i.e., an address or port designation), and, by a zero in a given bit position, which bit in that classification field can be ignored, i.e., that which is a so-called "don't care" bit or (as will be equivalently referred to hereinafter) "wildcard". For an example shown in FIG. 2B, consider, as example 285, an address in source address field 284 to be the value 101001010 and a corresponding mask in mask sub-field 287 to be the value 111001110. What this means is that the source address bit-values in address bit positions (starting at MSB (most significant bit) position zero on the left and counting to the right) three, four and eight can be ignored. Hence, the source address becomes 101XX101X where "X" represents a "wildcard". Hence, any packet having a source address that contains the values "101" in bit positions zero through two and again in bit positions five through seven, regardless of its bit values in bit positions three, four and eight, will match the source address in pattern 250. By virtue of having a wildcard(s), a single source address in a pattern, such as that in pattern 250, can match against multiple different source addresses in packets. To specify just one particular source address rather than a group, all bits in the mask sub-field for the source address in a pattern would be set to one, i.e., denoting no wildcards; hence requiring a match over all bits in the address field with a corresponding source address in a key for a packet. The source port designation, destination port designation and destination address are all identically specified for a pattern through by a combination of an address or port designation in fields 264, 274 and 294, respectively, with a corresponding value in mask sub-fields 267, 277 and 297, respectively, of mask field 255.
As noted above, once the database search successfully retrieves a pattern that matches or subsumes within it the key for a packet, the queue designation associated with that pattern specifies a particular transmission queue, within the scheduler, to which that packet will be directed. The queue designations are simply predefined and stored in the database. Since the manner through which the bits in the queue designations are set to specify an associated transmission queue is irrelevant to the present invention, we will omit all those details from the ensuing discussion.
By virtue of permitting wildcards within patterns, the complexity associated with searching the data structure, for a pattern given a key, as well as that of the structure itself increases considerably. Furthermore, the process of classifying packets lies directly within the data flow and adds another layer of processing thereto, i.e., each outgoing packet must be classified before it is launched into the network. Consequently, since packet classification is processor intensive, then any added complexity necessitated by accommodating wildcards will require additional processing. Since only a finite amount of processing time can be allocated to process each packet, packet classification should proceed as efficiently as possible--particularly for computers that are expected to handle significant packet traffic.
As discussed in much greater detail below and in accordance with our inventive teachings, we teach an inventive data structure which can rapidly classify specific data to associated wildcard-based patterns. This structure is used illustratively in a packet classifier, but it can be employed in any application that requires storage and rapid retrieval of any patterns containing wildcards. However, prior to discussing specific details of our invention, we will digress somewhat to describe the salient aspects of a typical computer system, such as system 100 shown in FIG. 1, which utilizes our invention.
In that regard, FIG. 3 depicts a block-diagram of illustrative computer system 100, shown in FIG. 1, which utilizes the teachings of the present invention.
As shown, this system, illustratively a personal computer, comprises input interfaces (I/F) 310, processor 320, communications interface 330, memory 340 and output interfaces 370, all conventionally interconnected by bus 380. Memory 340, which generally includes different modalities (all of which are not specifically shown for simplicity), illustratively random access memory (RAM) and hard disk storage, stores operating system (O/S) 110 and application programs 130. As noted above, the specific software modules that implement our inventive teachings are incorporated within O/S 110. This operating system, apart from these modules, may be implemented by any conventional operating system, such as the WINDOWS NT operating system. Given that, we will not discuss any components of O/S 110 other than those needed to specifically implement our invention, inasmuch as the rest are irrelevant.
As shown in FIG. 3, incoming information can arise from two illustrative external sources: network supplied information, e.g., from the Internet and/or other networked facility such as an intra-net (all generally shown as network 60 in FIG. 1), through network connection 50 to communications interface 330 (shown in FIG. 3), or from a dedicated input source via path(s) 305 to input interfaces 310. Dedicated input can originate from a wide variety of sources, e.g., an external database, video feed, scanner or other input source. Input interfaces 310 are connected to path(s) 305 and contain appropriate circuitry to provide the necessary and corresponding electrical connections required to physically connect and interface each differing dedicated source of input information to computer system 100. Under control of the operating system, application programs 130 exchange commands and data with the external sources, via network connection 50 or path(s) 305, to transmit and receive information typically requested by a user during program execution.
Input interfaces 310 can also electrically connect, via leads 382, and interface user input device 385, such as a keyboard and mouse, to computer system 100. Display 394, such as a conventional color monitor, and printer 398, such as a conventional laser printer, can be connected, via leads 392 and 396, respectively, to output interfaces 370. The output interfaces provide requisite circuitry to electrically connect and interface the display and printer to the computer system. As one can appreciate, the particular type of input and output information and a specific modality through which that information is applied to or produced by system 100 are both immaterial for purposes of the present invention and thus will also not be discussed in any detail hereinafter.
In operation, typically one or more application programs 130, such as a browser or other application that requests data, such as here video data, from a remote server, execute under control of O/S 110. For each executing application program, one or more separate task instances are invoked by a user in response to each user specified command, typically entered interactively through appropriate manipulation of user input device 385 given available command choices, such as in a menu or icons in a toolbar, and accompanying information then presented on display 394. Hardcopy output information from an executing application is provided to the user through printer 398.
Furthermore, since the specific hardware components of computer system 100 as well as all aspects of the software stored within memory 340, apart from the modules that implement the present invention, are conventional and well-known, they will not be discussed in any further detail.
FIG. 5A depicts a simplified high-level diagram of packet scheduler 450 that utilizes our inventive packet classifier 410 and forms part of network support software 112 shown in FIG. 4. For ease of understanding, the reader should also simultaneously refer to FIGS. 1, 3, 4 and 5A during the following discussion.
Packets destined for transmission over network 60 from computer 100 are applied, as symbolized by line 405, to scheduler 450 located within network support software 112. These packets can contain multiple streams of different packetized data, such as, for example, separate streams of packetized video data or packetized textual data. These streams, generated by a common computer and regardless of their data content, are transmitted through a common network interface, via network 60, to the same or different network destinations. Though a physical networked connection operates at a single speed, several different streams of packetized traffic, depending on their data content, are interleaved together by network support software 112 for simultaneous transmission at different rates through the network interface--with the rate provided to each stream being governed by its data content. To accomplish this, scheduler 450 contains internal transmission queues 470, composed of individual queues 470.sub.1, 470.sub.2, . . . , 470.sub.n. Each queue is associated with a different transmission rate. For example, packets of, for example, video data, to be transmitted at 3 Mb/second would be directed to queue 470.sub.1 ; those packets to be transmitted at 2 Mb/second would be directed to queue 470.sub.2, and so forth for transmission speeds associated with the other queues. Packets to be transmitted at a relatively low data rate, such as packetized textual data, would be directed to queue 470.sub.n. To properly direct each packet appearing on line 405, scheduler 450 reads the classification fields from each such packet and forms a key therefor. This key is then passed, as symbolized by line 415, to classifier 410. As discussed above, the classifier, in turn, searches for a pattern stored therein which either matches or subsumes the key. If such a key is found, classifier 410 accesses the queue designation associated with that pattern and returns that designation, as symbolized by line 420, to the scheduler. In response to this designation, demultiplexor 460 contained within the scheduler directs the packet to the specific queue, i.e., one of queues 470, specified by the designation. By appropriately classifying each packet in succession based on its classification fields and directing that packet to the proper queue therefor, scheduler 410 and classifier 450 collectively ensure that each stream of packets will be scheduled for transmission over network 60 at a rate commensurate with the data carried in that stream. Other software implemented processes (not specifically shown but well-known) successively pull individual packets from the queues, at the corresponding data rates associated therewith, for multiplexing (as collectively symbolized by dashed lines 480) and subsequent transmission, via line 50 and through a network interface adapter (contained within communication interfaces 330) and accompanying network card driver 540, onto network 60. Inasmuch as these other processes are not relevant to the present invention, they will not be discussed any further.
FIG. 5B depicts a high-level flowchart of our inventive Packet Classification process 550 performed by our inventive packet classifier 410 shown in FIG. 4.
Upon entry to this process--which is separately performed for each packet, block 555 is executed which extracts the fields of interest, i.e., the classification fields, from a header of that packet. Thereafter, the extracted fields are consistently concatenated, by block 560, into a single key, i.e., a current search key, to form key 220 shown in FIG. 2. To minimize search delay, recently used keys and their associated flows are cached within memory 340 (see FIG. 3) so as to advantageously eliminate the need to conduct a complete database retrieval operation on each and every search key. Hence, once the current search key is formed, block 565, as shown in FIG. 5B, examines a cache to locate that key. Next, decision block 570 determines whether the cache contains the current search key. If it does, then execution proceeds, via YES path 572, to block 575 which retrieves the pattern, and its associated queue designation, for this packet from the cache. Alternatively, if the cache does not contain the current search key, then this decision block routes execution, via NO path 574, to block 580. This latter block, when executed, retrieves the pattern for the current block by searching our inventive data structure, also referred to herein as a "rhizome" and which will be discussed in considerable detail below, for that pattern. Once this pattern is located, either by retrieving it from the cache or from our inventive rhizome, the queue designation therefor is then returned, through execution of block 590, to a requesting client, thereby completing execution of process 550 for the packet. Though not specifically shown, once the pattern and the queue designation are retrieved from our inventive rhizome, these items are then cached for subsequent short-term retrieval.
B. Conventional Database Searching vis-a-vis Packet Classification Searching
Our inventive data structure and retrieval algorithms, particularly for use in packet classification, are directed to an entirely opposite problem to that routinely encountered with conventional databases.
In particular, and as shown in FIG. 6A, a conventional database, illustrated by database 630, contains specific pieces of information, such as entries 630.sub.1, 630.sub.2, 630.sub.3 and 630.sub.4, each of which stores a different numeric value. These values are illustratively 1011, 1010, 1111 and 0110 for entries 630.sub.1, 630.sub.2, 630.sub.3 and 630.sub.4, respectively. With such a conventional database, a generalized input pattern, such as pattern 610 here having an illustrative value 1X11 with a wildcard in bit position one, is applied; the database manager is instructed to return all stored data values that match, i.e., are identical to or are subsumed by, the pattern. Accordingly, entries 630.sub.1 and 630.sub.3 will be returned as containing values 1011 and 1111 that are both subsumed within input pattern 1X11.
In sharp contrast, we are faced with the problem, inherent in a packet classifier and as illustrated in FIG. 6B, of searching a database of generalized patterns for a classification pattern that matches, either identically or subsumes within it, a specific input value, i.e., a key. Here, database 670 illustratively contains stored values "1XX10" in entry 670.sub.1, "0X101" in entry 670.sub.2, "01101" in entry 670.sub.3 and "01001" in entry 670.sub.4. Illustrative input key 650, containing a value "10110", is applied to a retrieval process in an attempt to return a stored classification pattern that is either identical to or contains the key. Here, the retrieval succeeds in returning the stored classification pattern "1XX10" in entry 670.sub.1. As one can appreciate, the stored and returned classification pattern "1XX10", containing wildcards in bit positions one and two, subsumes within it the input key "10110".
As noted, a specific input key can identically match or be subsumed within a generalized stored classification pattern. The number of different keys that will be subsumed within any given classification pattern will be dictated by the number of wildcards contained within that pattern. If a stored classification pattern contains no wildcards, then only one key, i.e., that which identically matches the stored classification pattern, will result in that pattern being returned. Alternatively, if a stored four-bit classification pattern contains all wildcards, i.e., the value "XXXX", then any possible four-bit key, for a total of 16 different four-bit keys, will result in that stored classification pattern being returned. In that regard, consider FIG. 7A which depicts stored eight-bit classification pattern 710 having three wildcards, in bit positions two, three and seven. As a result, eight different eight-bit keys 750, as shown in FIG. 7B, when applied to a database containing stored classification pattern 710 will result in that particular classification pattern being returned.
Classification patterns can also be arranged hierarchically as trees with arbitrary depths and in terms of increasing generality. For example, FIG. 8A depicts six illustrative four-bit classification patterns 810. Three of these classification patterns do not contain any wildcards, while the remaining three contain one, two or three wildcards. These classification patterns result in a pattern hierarchy, i.e., a tree, shown in FIG. 8B having three levels; namely, level 832 being the topmost, level 834 being an intermediate level and level 836 being a lowest level. The lowest level in the tree contains those classification patterns, here 1101, 0111 and 0101, that are the most specific, i.e., contain the least number of wildcards, i.e., here zero. Inasmuch as classification patterns 0111 and 0101 are both subsumed within pattern 01XX, and pattern 1101 is subsumed within pattern 10X, intermediate level 834 contains higher-level patterns 110X and 01XX. Because both of these higher-level patterns are themselves subsumed within classification pattern X1XX, the latter pattern is situated at the highest level. Following a retrieval strategy that if any two or more patterns match (or subsume) a key, the most specific pattern is selected first and applied as output, then for set 850 of sixteen possible four-bit input keys, as shown in FIG. 8C, resulting patterns 870 would be retrieved for those keys. A hierarchy tree need not be binary or even regular; any shape at all can occur provided the tree conforms to the actual hierarchical relationship among the stored classification patterns. Furthermore, a classification pattern does not need to have a hierarchical connection to all or any other such patterns; therefore, a group of classification patterns may include several different hierarchy trees. We will hereinafter refer to a group of trees as a "forest" and a structure realized by a group of classification patterns as a pattern "hierarchy forest".
A pattern classifier functions by retrieving, for a given specific input key, the most specifically matching stored classification pattern from among all the classification patterns stored in a pattern hierarchy forest in a database.
C. Classification Data Structures
We have discovered that a very efficient data structure for pattern classification results from appropriately modifying, in accordance with our inventive teachings, a conventional trie, specifically a Patricia tree, to accommodate hierarchically-related classification patterns that contain wildcards. A trie is a search tree in which branches are followed, during retrieval, based upon a digital value of a search key and wherein the data against which a key is ultimately matched is stored in the leaves of the tree. A Patricia tree is such a binary trie in which each internal node has only two children emanating therefrom and no one-way branching. A superset of a conventional Patricia tree, the superset which we have developed in accordance with our invention and which we will refer to hereinafter as a "rhizome", advantageously accommodates wildcards in stored patterns--an accommodation wholly absent from a conventional Patricia tree.
To simplify reader understanding, we will proceed from here by first discussing, through examples, a conventional Patricia tree; followed by discussing our inventive rhizome; and finally discussing the various inventive software routines that, in the context of use in illustratively a packet classifier for use in scheduling, execute within network support software 112 (see FIGS. 1 and 4) and implement retrieval, insertion and removal operations on our inventive rhizome. Our discussions of the conventional Patricia tree and of our inventive rhizome will also include graphical-based descriptions of the associated retrieval, insertion and removal operations thereupon.
1. Conventional Patricia Tree
a. Overview
Any data structure designed for use in packet classification must support retrieval of a stored classification pattern therefrom, insertion of a new classification pattern therein and removal of a stored classification pattern therefrom. One data structure that provides relatively fast retrieval, insertion and removal operations is a Patricia tree.
As noted above, a Patricia tree is a binary trie. Each internal node in this tree has exactly two branches emanating therefrom and contains a bit index value. When a given node is reached, movement therefrom can occur in either of two directions, namely along one branch to a child or along the other branch to another child. The child can be either another internal node or a leaf node. Each leaf in the tree contains a data value. Nodes extend outward from a root of the tree in a hierarchical, typically multi-level, fashion as one traverses a path through the tree. Each internal node contains a numeric bit index value which specifies the specific bit in an input key that governs movement from that node: if that bit has a one value, one branch is taken; else if it has a zero value, the other branch is taken, and so forth through successive nodes along the path until a leaf node is reached. While the bit index for any child is greater than that of its parent, the bit indices encountered along any such path need not be consecutive, merely ascending.
The manner through which nodes are interconnected to form a Patricia tree and the values of their associated bit indices are strictly determined by the value that is to be stored in each leaf node and by disagreements, where they occur starting at the root and continuing at each successively higher bit position, among all such leaf values. Each bit index specifies a bit position at which such a disagreement exists.
With this in mind, consider FIG. 9A which depicts illustrative conventional Patricia tree 900. This tree stores seven data values, labeled for simplicity as A (value 0000001000), B (value 0000101000), C (value 0000111000), D (value 0010001000), E (value 0010001100), F (value 0010001101) and G (value 0011001000) stored in corresponding leaves. This tree, which is relatively simple, contains six internal nodes, hierarchically arranged in at most four levels, specifically nodes 905, 915, 920, 935, 940 and 945. Each node has two branches, specifically a "zero" branch and a "one" branch, emanating therefrom, such as branches 907 and 910 respectively emanating from node 905, and so forth for all the other nodes. As noted above, the nodal pattern is determined by the bit positions at which disagreements exist among the bits of all the stored values, starting at the root and continuing throughout the remaining bit positions.
In that regard, considering values A-G, starting with the highest order bit position (with the most significant bit (bit position zero) situated at the far left) and progressing incrementally in a step-wise fashion throughout the other bit positions from left to right, a disagreement among bits for these stored values first occurs in bit position two. This position yields node 905, i.e., the root of the tree with a bit index of two. Two branches emanate from node 905: zero branch 907 and one branch 910. For those values that have a zero in bit position two (i.e., values A, B and C), the next disagreement occurs in bit position four. Hence, a second node, here node 915, terminates zero branch 907. This second node, assigned a bit index of four, also has zero and one branches emanating from it: here zero and one branches 917 and 919, respectively. Value A, being the only value with a zero in bit positions two and four, terminates zero branch 917 and hence is child zero of node 915. Now, for those values that have a zero in bit position two and a one in bit position four, a further disagreement exists with respect to bit position five, i.e., values B and C respectively have a zero and one at this bit position. Hence, a third node, here node 920 with a bit index of five terminates one branch 919 (and becomes child one of node 915) with value B terminating zero branch 922 (hence being a child zero of node 920) and value C terminating one branch 924 (hence being a child one of node 924), and so forth.
Hence, given the differences among values A-G in bit positions 2, 3, 4, 5, 7 and 9, conventional Patricia tree 900 results.
Another illustrative example of a Patricia tree is tree 960 depicted in FIG. 9B. Here, tree 960 contains two internal nodes: specifically node 961, being the root; and node 967. This tree stores three values L (value 011011), M (value 011101) and N (value 111011). Inasmuch as these three values, L, M and N, have bit disagreements at bit position zero and next at bit position three, nodes 961 and 967 have bit indices of zero and three, respectively, thus forming tree 960 with three leaves sufficient to store the three values L, M and N. In that regard, root node 961 has zero branch 965 and one branch 963 emanating therefrom, with node 967 terminating the zero branch and a leaf containing value N terminating one branch 963. Node 967 has zero branch 969 and one branch 968 emanating therefrom which terminate at corresponding leaf nodes storing values L and M, respectively. While disagreements exist among these values at higher-order bits, such disagreements can be ignored since no further decision nodes are needed to provide a distinct path through tree 960 to each stored value.
b. Retrieval
Retrieving data from a Patricia tree involves searching the tree through use of a so-called key. Bit values of the key define a search path that will be traversed through the tree. Advantageously, Patricia trees exhibit a property that either the key will be identically stored in a leaf of the tree that terminates the path designated by the key or the key will not exist in the tree at all. In that regard, a key defines one and only one path through a conventional Patricia tree. Consequently, if a value specified by a key is stored in a Patricia tree, then that value can only be accessed through one unique path, as defined by the key, through the tree. Each leaf stores not only such a value but also, as a data item, a reference field, wherein information stored in the reference field is returned if the search is successful. Usually, the reference field is the object of the search with the key being a mechanism needed to uniquely traverse the tree. As we will describe below in conjunction with our inventive rhizome, the reference field illustratively stores a designation of a transmission queue while the key is correspondingly formed by appropriately concatenating classification fields.
Returning to FIG. 9A, assume that a search key of 0110010110 is provided; the goal is to determine whether this key is stored in this conventional Patricia tree. To determine this, one starts at the root of tree 900 and traverses along a search path therethrough. The search path incrementally progresses through the tree in a branch-wise fashion with the branch taken at each node being specified by a value of a bit in the key, that bit being designated by the bit index of that node. The retrieval operation ends when a leaf is reached. Specifically, given this search key, the root node of tree 900 carries a bit index of two, which, as symbolized by dashed line 903, designates bit two of the key. This particular bit is one which, in turn, causes the search path to then traverse one branch 910 of tree 900. The entire search path for this key is shown by arrows. The next node encountered is node 935 which has a bit index of three. Hence, bit three of the key is examined next. This particular bit in the key carries a zero value thereby causing the search path to next traverse zero branch 937 emanating from node 935. Next, node 940 is encountered which specifies a bit index of seven. Accordingly, bit position seven of the search key is examined. Inasmuch as the key has a one at this bit position, the search path traverses along one branch 944 to node 945. This particular node carries a bit index of nine. Bit position nine of the key contains a zero-bit; hence causing the search path to traverse along zero branch 947 emanating from node 945. Branch 947 leads to a leaf node at which value E is stored. Once this leaf node is encountered, the search key is compared to the value, here E (value 0010001100), stored thereat. Inasmuch as the search key, having a value of 0110010110, does not identically match the stored value at this leaf (i.e., search key # E), the retrieval operation terminates with a result that the key is not present in the structure. Hence, the search was unsuccessful.
As one can clearly appreciate, many different search keys can often produce the same search path through a conventional Patricia tree. However, the search will succeed if and only if, as a result of the comparison operation, the stored value ultimately accessed at a leaf identically matches the key being searched. This is evident in FIG. 9B. Here, two search keys, i.e., keys 011011 (search key 1) and 010010 (search key 2), are applied to tree 960. Both keys result in a common search path, as given by arrows, through the tree, i.e., from root node 961, along zero path 965 (zero in bit zero of the key), to node 967 and along zero path 969 (zero in bit three of the key) emanating therefrom. This search terminates at a leaf node storing value L which is illustratively 011011. Inasmuch as value L matches search key 1 (011011), a search on key 1 succeeds with the reference field (not shown) associated with value L being returned as a result. In contrast, the search on search key 2, which traverses the same search path as search key 1, does not succeed. Specifically, this search progresses from root node 961, along zero path 965 (zero in bit zero of search key 2), to node 967 and along zero path 969 (zero in bit three of the this key) emanating therefrom. This search also terminates at the leaf node having stored value L. However, inasmuch as search key 2 (010010) does not identically match stored value L (011011), this search is unsuccessful.
c. Insertion
To sufficiently elucidate insertion and, as will be shortly discussed below, removal operations, we will digress slightly to first describe, prior to discussing insertion, the data stored at each node in a Patricia tree--inasmuch as this data is used and appropriately modified during each of these operations.
FIG. 9C depicts this data as collectively data structure 970, arranged, for ease of understanding, in columnar form. In particular, data structure 970 contains data 972 which is stored at a branch node, and data 974 which is stored at each leaf (data node). Branch node data includes, in three separate fields, the pivot bit, and separate pointers (addresses) to child zero and child one of that node. The pivot bit is the bit index value associated with a corresponding branch node. Each data node stores two fields: a value, e.g., a pattern, which is ultimately matched against a search key, and a reference. The reference field is immaterial for search purposes and can store any item. As noted above, when a search of a key succeeds, the content of the reference field, stored at a corresponding data node at which the search terminates, is simply returned for subsequent use.
With the above in mind, FIG. 9D depicts tree 900 shown in FIG. 9A and specifically the data stored at each node in that tree. A starting address at which the data structure for each node is stored in memory is designated by a lower case letter. Specifically, letters a, b, c, d, e, f and g designate these addresses for data structures 918, 923, 925, 943, 948, 953 and 951 which store data for the leaves that store values A, B, C, D, E, F and G, respectively. For example, inasmuch as data structure 918 exists for a leaf having stored pattern A, this structure stores value A, in lieu of a pivot bit, in a first field in this structure, and so forth for the data structures for the other leaves of the tree. To simplify presentation, data has been omitted from the reference field for all the data structures for the leaves of tree 900. Letters m, h, i, j, k and l designate starting addresses for data structures 906, 916, 921, 941, 946 and 936, that store data for branch nodes 905 (root), 915, 920, 940, 945 and 935, respectively. Data structure 906 stores data for the root node of the tree wherein the pivot bit contains the value two, and the following two fields in structure 906 contains addresses h and l as pointers to the child zero an d child one nodes, respectively; address h being that for data structure 916 associated with node 915, and address l being that for data structure 936 associated with node 935; and so forth for all the other branch nodes in the tree. Inasmuch as the data structures themselves contain an implicit representation of the tree hierarchy, only the se structures themselves are stored in O/S 110 (see FIG. 3) within memory 340.
We will now turn our attention to the insertion operation. For simplicity of illustration, we will discuss both the insertion and removal operations with specific reference to such operations performed on sub-tree portion 902, depicted in FIG. 9D, of conventional Patricia tree 900. Essentially, insertion involves creating a new data structure f or each new node and altering the contents, specifically the pointers, in appropriate data structures then existing within the sub-tree portion, as well as storing appropriate pointers in the new data structure, to properly designate paths to and from the new node and existing nodes impacted thereby. Each time a new value is to be added to a Patricia tree exactly one new branch node and one new leaf node are created. The number of different values stored in a Patricia tree, and hence the number of its leaf nodes, always equals one more than the total number of its branch nodes.
FIG. 9E depicts a conventional insertion operation performed on sub-tree portion 902. Here, illustratively, a node with a pivot bit of six is to be inserted into sub-tree portion 902. This node, given the hierarchy within portion 902, is to be inserted on the child zero branch 937 between node 935, having a pivot bit of three, and node 940 having a pivot bit of seven. As shown, prior to insertion, the child zero pointer for node 935, specifically that stored within data structure 936, points to address j which is the starting address for data structure 941 for node 940; hence, as shown by arrows, linking the data structures for these two nodes together.
Inserting a node arises whenever a new and different value, here designated by H, is to be inserted within sub-tree portion 902. A new branch node is created with a pivot bit designating a lowest bit position at which a disagreement exists between the bits of the values presently stored in the sub-tree portion and the new value H. Here, such a disagreement illustratively occurs at bit position six. Hence, new node 982 (indicated by a dashed box) is created; memory is allocated for new data structure 983 at an illustrative starting address n with this new structure being stored thereat. The pivot bit for this node is set to six, as indicated within structure 983. A new leaf node (also indicated by a separate dashed box), specifically its associated data structure 989 containing value H, is also created with this data structure being assigned illustrative starting address p. The pointers of the existing nodes and new node are now set appropriately to modify the sub-tree to include the new node. In that regard, since new node 982 is reachable through the child zero branch of node 935, then the child zero pointer field for node 935, specifically that stored within data structure 936, is changed to point to new node 982, i.e., address n. The child zero and one branches for new node 982 and specified by addresses j and p are stored within data structure 983. As a result, sub-tree portion 902 is transformed into sub-tree portion 902' containing new nodes 982 and 987. Hence, rather than merely linking node 935 (pivot bit three) directly to branch node 940 (pivot bit seven) as in sub-tree portion 902, this linkage, in sub-tree portion 902' now extends through new node 982, as shown by the arrows therein.
d. Removal
Essentially, removal involves deleting desired nodal structure, specifically desired leaf nodes and associated branch nodes that provide unique paths to these leaf nodes, from the sub-tree portion and modifying the pointers of remaining affected nodes to exclude the deleted structure. For every stored value that is to removed from a Patricia tree only one leaf node and one corresponding branch node will be eliminated.
FIG. 9F depicts a conventional removal opertion perfomed on sub-tree portion 902. Here, illustratively, a leaf node storing value D and associated internal node 940 (with a pivot bit of 7)--as indicated by dashed X's--are to be deleted from sub-tree portion 902. As shown, prior to deletion, the child zero pointer for node 935, specifically that stored within data structure 936, points to address j which is the starting address for data structure 941 for node 940. Node 940, in turn, points through its child one branch to node 945 (having a pivot bit of 9) having data structure 946 with a starting address of k. The linkages among the data structures for these three nodes is shown by arrows.
To remove a leaf node and its associated branch node--the illustrative structure to be removed here being indicated by dashed X's, pointers in remaining nodes are modified to appropriately point around the n ode s to be removed followed by deleting the data structures, for these nodes, from memory and finally de-allocating the memory locations previously used thereby. In that regard, if illustrative value D is to be removed from sub-tree portion 902, then node 940 which provides a unique path to this leaf node is removed as well. To effectuate this, the child zero pointer in data structure 936, for node 935, is modified to point directly to node 945, i.e., to address k, rather than to address j. As such, sub-tree portion 902" results wherein node 935 (pivot bit three) directly links, via child zero branch 937', to branch node 945 (pivot bit nine), specifically address k for data structure 946, as shown by the arrow in this sub-tree portion. Once this occurs, leaf node D and branch node 940 are deleted from the tree; the memory locations used thereby are then appropriately de-allocated and made available for subsequent use.
For further information on a conventional Patricia tree, the reader is referred to: D. Morrison, "PATRICIA--Practical Algorithm to Retrieve Information Coded in Alphanumeric", Journal of the ACM, Issue 15, No. 4, October 1968, pages 514-534; D. Knuth, The Art of Computer Programming, Vol. 3--Sorting and Searching (.COPYRGT.1973, Addison-Wesley Publishing Company), pages 490-499; and, for a well-known extension to a conventional Patricia tree: T. Merrett et al, "Dynamic Patricia", Proceedings of the International Conference on Foundations of Data Organization, 1985, pages 19-29--all of these references being incorporated by reference herein.
2. Our Inventive Rhizome
a. Overview
As noted above, conventional Patricia trees do not accommodate wildcards at all. In sharp contrast, our inventive rhizome does. By doing so, our invention achieves retrieval efficiencies typically associated with conventional Patricia trees and with, though as not heretofore provided by the art, wildcard capability which advantageously and collectively renders our inventive rhizome particularly well suited for use in, e.g., a packet classifier. Through use of such a classifier, network packets can be classified very rapidly. Furthermore, not only can our inventive structure accommodate wildcards, but also this structure can retrieve, for an input key, stored wildcard-based pattern values in an increasing order of generality, i.e., for such stored values that are hierarchically related the most specific such stored pattern value will be returned first, as output, followed by each of those pattern values situated at a correspondingly increasing level of generality. As noted above, such a strategy is required of a packet classifier inasmuch as two classification patterns can be identical to each other or one such pattern can be subsumed within the other.
Essentially, through our inventive teachings, we have appropriately modified a Patricia tree, both in terms of its structure as well as the retrieval, insertion and deletion operations used in conjunction therewith, to provide a trie-indexed forest that accommodates hierarchically related wildcard-based stored values. Our inventive rhizome contains two interconnected data structures: a binary search trie (being a "search structure" once stored in memory or and/on a readable storage medium) and a hierarchy forest (being a "hierarchy structure" once stored in memory and/or on a readable storage medium)--the latter required to accommodate wildcard-based patterns and being totally absent from a conventional Patricia tree. Inasmuch as our inventive rhizome is particularly well suited for use in packet classification, for ease of understanding, we will discuss our inventive rhizome in the context of use with packet classification, for use in, e.g., scheduling, in which packet classification patterns, that contain wildcards, are stored in the structure.
In particular, FIG. 10 depicts illustrative rhizome 1000 which embodies the teachings of our present invention. As shown, rhizome 1000, which for enhanced readability is drawn upside-down with a root at the bottom, is organized into binary search trie 1000.sub.1 (comprising branch nodes and child links), which feeds, rather than just individual leaves as in a conventional Patricia tree, hierarchically-arranged wildcard-based patterns in hierarchy forest 1000.sub.2 (comprising pattern nodes and hierarchical links). A subset of nodes and their interconnecting links in our inventive rhizome can be regarded as a sub-structure, of which two are illustratively shown as 1002 and 1004. Each pattern node, such as node 1031, is indicated by a rectangle with a corresponding stored pattern associated with that node being indicated therein; each branch node, such as node 1010, is indicated by a circle with the value of the pivot bit for that node being specified therein.
Rhizome 1000 contains a forest of ten hierarchy trees. Generally speaking, a forest contains one or more trees. One such tree comprises pattern node 1045 at a root, pattern node 1041, pattern node 1031, and pattern node 1033. A second such tree comprises pattern node 1068 at a root and pattern node 1062. The remaining eight hierarchy trees are degenerate, each containing a single pattern node, those pattern nodes being 1018, 1065, 1072, 1078, 1082, 1088, 1092 and 1094. The thick lines in this figure, that connect pairs of individual pattern nodes, indicate a hierarchical relationship between the patterns stored within those nodes. In particular, for the hierarchy tree rooted at pattern node 1045, the lowest-level hierarchically related patterns stored therein, i.e., the most specific, are 00101100XX001011 and 0010110011010011 stored at level 1030 in pattern nodes 1031 and 1033, respectively. At the next higher level, i.e., level 1040, pattern 00101100XX0XXX11, by virtue of its wildcards, stored within pattern node 1041 subsumes patterns 00101100XX001011 and 0010110011010011. Hence, this hierarchical relation between node 1041 and each of nodes 1031 and 1033 is indicated by darkened lines 1037 and 1039, respectively. A similar hierarchical relationship exists between the patterns, i.e., 0011010001110100, and 00X101X00X110100, stored within nodes 1062 and 1068, and is so indicated by a darkened line as well. Furthermore, given the patterns in node 1041 in view of the pattern, i.e., 0010XX00XXXXXX11, stored within node 1045, the latter pattern subsumes the former pattern. Hence, a hierarchical relationship also exists between the latter node and the former node, and is so indicated by again a darkened line. Inasmuch as rhizome 1000 does not store any patterns that are increasingly more general, e.g., 0010XXXXXXXXXX11, than that in node 1045, node 1045, being the most general god pattern in this hierarchy, occupies the highest level in this particular hierarchy tree.
For the hierarchy tree rooted at pattern node 1068, the most specific pattern is 0011010001110100 in node 1062. At a higher level, pattern 00X111X00X110100, by virtue of its increased generality, is stored in node 1068.
To generalize the relationships in a hierarchy forest, FIGS. 11A and 11B diagrammatically depict, through a Venn diagram, permitted hierarchical relationships among any pair of patterns stored in a rhizome, and specifically within hierarchy forest 1000.sub.2 shown in FIG. 10. To fully appreciate this figure, the reader should simultaneously refer to FIGS. 10, 11A and 11B throughout the following discussion.
For any two stored patterns, apart from a trivial case of their identicality (which is not shown and will result in only one stored pattern in our inventive rhizome), two stored patterns can either be unique or subsumed one by the other, as represented by depictions 1110 and 1120, respectively, shown in FIGS. 11A and 11B, respectively.
In particular, if both patterns are unique, i.e., one is no more general than the other and neither subsumes the other, as is the case for illustrative patterns 00101100XX001011 and 0010110011010011, stored within nodes 1031 and 1033 at level 1030, then, as represented by corresponding circles 1113 and 1117 in depiction 1100, these patterns are stored independently of each other. The nodes may be indirectly interconnected, as nodes 1031 and 1033 happen to be by node 1041, or they may be completely separate, as is the case for most of the unrelated patterns in structure 1000.
Alternatively, if one pattern is subsumed within the other, as is the case for illustrative patterns 0010110011010011 and 00101100XX0XXX11 stored at nodes 1033 and 1041, then, as represented by circles 1127 and 1123 in depiction 1120, the more general pattern, here 00101100XX0XXX11, completely contains the more specific pattern, here 0010110011010011, and is stored at a higher hierarchical level, here level 1040, in the hierarchy tree than is the latter pattern, here at level 1030.
In addition to identity, independence, and subsumption, another possible relationship exists between two patterns as illustrated by depiction 1130 in FIG. 1C. In this depiction, pattern 00XXXX10 defines a set of sixteen values collectively represented by circle 1133; similarly, pattern 0011XXXX defines a set of sixteen values collectively represented by circle 1137. There are some values, such as 00000010, that fall within circle 1133 but do not fall within circle 1137; there are other values, such as 0010000, that fall within circle 1137 but do not fall within circle 1133; and there are still other values, such as 00110010, that fall within both circles. Therefore, these two patterns are not independent, but neither does either one of them subsume the other. Patterns that partially overlap in this manner may not be stored in our inventive rhizome, since no defined hierarchical relationship holds for these patterns. Such patterns are said to "conflict".
Now, returning to FIG. 10, search trie 1000.sub.1 is situated immediately below and is connected to hierarchy forest 1000.sub.2. The former implements a binary search trie which provides a unique path to each of those patterns situated at the lowest level hierarchy forest, and possibly to those at higher levels. This trie, often unbalanced as is the case here, is formed of branch nodes, each of which, as noted above, is indicated by a circle and contains a value of the corresponding pivot bit, i.e., the bit index. Also, for enhanced readability, zero and one branches project to the left and right, respectively, from each branch node. As shown, search trie 1000.sub.1 contains branch nodes 1010, 1015, 1020, 1050, 1055 and 1060 all located on a left side of root node 1005; and branch nodes 1070, 1075, 1080, 1085 and 1090 all situated on a right side of root node 1005.
Generally speaking, the organization of the branches in the search trie, such as search trie 1000.sub.1, and a corresponding pivot bit value at each branch node therein are both governed, in a similar fashion as with a conventional Patricia tree, by bit locations at which bits, in the patterns stored in the hierarchy forest, disagree. A unique search path is defined through the search trie to each pattern stored at the lowest level, and possibly to those at higher levels, of the hierarchy forest. As discussed above, the ordering in the hierarchy forest, such as forest 1000.sub.2, is governed by whatever order of increasing generality, if any and by virtue of wildcards, exists among pairs of the stored patterns.
As one can see, by virtue of accommodating hierarchically related patterns, our inventive modified Patricia tree permits multiple nodes, either branch or pattern nodes and here illustratively branch nodes 1015 and 1050, to point to a common leaf, here illustratively pattern node 1018. In contrast, a conventional Patricia does not permit this condition inasmuch as only one node therein can point to any one leaf. However, as with a conventional Patricia tree, our inventive rhizome does not permit one-way branching.
b. Retrieval
As with a conventional Patricia tree, a search of our inventive rhizome is conducted on the basis of an input search key. In a conventional Patricia tree as discussed above, either the search key will match a value found by tracing through the tree or that key will not have a matching value anywhere in the tree. With our inventive rhizome, the search key will match either a specific pattern stored in the rhizome or a increasingly general pattern stored above it in a hierarchical chain, or that key will not have a matching pattern stored anywhere in the rhizome. Hence, the binary search trie effectively narrows down a search from an entire database of stored patterns to just a linear chain of patterns in order of strict hierarchy, the latter being usually substantially smaller in size than the former. A search succeeds if the retrieval process locates a stored pattern that most specifically matches, either identically or in terms of subsuming, the search key.
In particular, to determine whether an input search key exists in rhizome 1000, search trie 1000.sub.1 is searched by first tracing through this trie and specifically branching at each node based on the value of a bit in a search key. Once the search reaches a lowest-level pattern node, the pattern stored at that node is tested against the search key. Inasmuch as the most specific pattern is to be returned for any search key, then if this lowest-level pattern does not match the search key, testing for a match continues, through hierarchy forest 1000.sub.2, with the next higher pattern node in succession until all pattern nodes in a pattern hierarchy chain, that includes and extends from the lowest-level pattern node, have been tested. As soon as a pattern is found in the chain that subsumes the search key, the search is successful and the reference value associated with that particular pattern is returned for subsequent use. In the event no pattern in the hierarchy chain subsumes the search key within it or no such pattern hierarchy chain exists, then the search does not succeed and an appropriate indication therefor is provided by a retrieval routine.
To fully appreciate how a stored wildcard-based pattern is retrieved, let us discuss retrieval in the context of a few examples for rhizome 1000. First, consider the search key 0010110010001011. Searching begins at root node 1005, which has a pivot bit (bit index) of zero. Since the value of bit zero for this search key is zero, the search path traverses a zero branch, emanating from this node, to node 1010. This latter node has a pivot bit of three which for this particular key necessitates that the zero branch emanating from this node is traversed to node 1015. Node 1015 carries a pivot bit of seven, which for key 0010110010001011, identifies bit position seven therein here being a zero, thereby causing the search to traverse the zero path emanating from branch node 1015 to branch node 1020. This latter branch node carries a pivot bit of eleven, which, designates bit eleven in the key. Inasmuch as bit eleven of the exemplary key contains a zero, a zero branch emanating from node 1020 is then traversed to pattern node 1031. This pattern node contains the stored pattern 00101100XX001011. Since, owing to the wildcards within this pattern, the search key is subsumed within the pattern, the search is successful and the reference value associated with this particular pattern node is returned. No further matching occurs, inasmuch as any other higher-level pattern nodes, such as 00101100XX0XXX11 and 0010XX00XXXXXX11 in nodes 1041 and 1045, respectively, that subsumes the search key, owing to strict hierarchy in the rhizome, will be more general than 00101100XX001011.
As a further example, consider the search key 1001111101010101. Searching begins at root node 1005, which has a pivot bit (bit index) of zero. Since the value of bit zero of this particular search key is one, the search path traverses a one branch, emanating from this node, to node 1070. This latter node has a pivot bit of two which for this key necessitates that the zero branch emanating from this node is traversed to branch node 1075. Given this key, the search traverses, as a result of indexed bit positions in the key, through a one path emanating from node 1075 to node 1080, from a one path emanating from node 1080 to node 1085, from a zero path emanating from node 1085 to node 1090, and finally from a zero path from node 1090 to pattern node 1092. This pattern node contains the stored pattern 1X0XX01101010X01. This pattern, even with its various wildcards, does not match the search key of 1001111101010101 inasmuch as a bit disagreement occurs in bit position five (the search key having a one at this bit position; the stored pattern containing a zero thereat). Since node 1092 has no pattern hierarchy chain, the search is unsuccessful.
Lastly, consider the search key 0010110011001111. Beginning at root node 1005, the search path then traverses through nodes 1010, 1015, and 1020 and finally reaches pattern node 1031 which stores pattern 00101100XX001011. Though this particular pattern does not match the key--owing to a bit disagreement in bit position thirteen (here, a one in the key, a zero in the stored pattern), a pattern hierarchy exists above node 1031 in which patterns may be stored, specifically those within nodes 1041 and 1045, that might be sufficiently general to encompass this key. Consequently, the search successively traverses a hierarchical path emanating upward from node 1031 to locate the least general (most specific) pattern, if any exist, that subsumes the key. Pattern node 1041, which contains pattern 00101100XX0XXX11 that does subsume the key, is encountered. Hence, the search is successful with the reference value for node 1041 being returned for subsequent use. If, alternatively, the search key was not subsumed by pattern node 1041 but was subsumed by node 1045, then the search would still have succeeded; however, the reference field for node 1045 would have been returned instead of that for node 1041.
As one can appreciate, the search trie tends to drastically reduce the number of patterns that need to be examined to locate one that most specifically matches a given input search key. Moreover, owing to the strict hierarchy in which patterns at higher levels in a given hierarchy are strictly more general and hence completely subsume those at lower levels therein, then, if a bit-by-bit comparison is made of successive patterns in a pattern hierarchy chain against a search key, then as increasingly general patterns are encountered in that chain, there is no need to compare a bit in any bit position of the search key that has previously been compared against and found to match that in any lower level pattern in that chain. In particular, with the last example given above, bit position zero in the search key was found to match bit position zero in pattern 00101100XX001011 stored in node 1031. Hence, when a next higher pattern, i.e., that being 00101100XX0XXX11 in node 1041, is encountered, there is no need to recheck bit zero for the pattern stored in 1041 as that bit was already found to match, via pattern node 1031, that in the search key. In view of the strict hierarchy, re-checking bit positions that have already been found to match are redundant and hence can be advantageously eliminated to reduce retrieval time. Generally, the maximum number of bit comparisons that needs to be made to retrieve a given search key equals the number of bits in the key itself plus the number of patterns, in the longest hierarchy chain in the hierarchy forest, less one. In practice, further efficiencies might be gained by comparing bits in groups, such as bytes or nibbles, rather than singly since available processors can simultaneously compare certain-sized groups of bits just as fast as they can compare single bits.
c. Insertion
For insertion, the presence of a wildcard(s) in a pattern can cause complications. First, inserting a new pattern containing a wildcard(s) into our inventive rhizome is not as simple, as with a conventional Patricia tree, as inserting a new branch node, a new pattern node and a new path--as previously discussed in the context of FIG. 9E. A single wildcard in a pattern permits a corresponding pattern node to be reached by two different paths. Hence, insertion for our inventive rhizome, based on the bit position of the wildcard(s) in the new pattern, may require inserting multiple branch nodes, some with identical pivot bits, and multiple paths therefrom to reach the new corresponding pattern node. Second, if a pattern is to be inserted into our inventive rhizome that already contains one or more pattern nodes that each carries a wildcard(s), then existing portions of the rhizome may need to be fully or partially replicated to assure that existing pattern nodes can be reached through multiple paths, specifically via both sides of a new branch node that will be created for that new pattern node.
To sufficiently elucidate, for our inventive modified Patricia tree, insertion and, as will be shortly discussed below, removal operations, we will digress slightly to first describe, prior to discussing insertion, the data stored in memory at each node in our inventive rhizome--inasmuch as this data is used and appropriately modified during each of these operations. As will be seen, this data includes additional fields for handling these complications.
FIG. 13 depicts this data as collectively data structure 1300, arranged, for ease of understanding, in columnar form. This data structure, which, when stored in memory and/or on a computer readable medium, forms a "memory structure", encompasses all the fields for a conventional Patricia tree, as shown in structure 970 in FIG. 9C but with additional fields for both a branch node and a pattern node in order to accommodate wildcards and accompanying pattern hierarchy.
In particular, data structure 1300 shown in FIG. 13 contains data 1310 which is stored at a branch node, and data 1350 which is stored at each pattern node. Data for VALUE, MASK and IMASK fields 1315, 1320 and 1325, are stored for each node, regardless of whether that node is a branch or pattern node. The MASK and IMASK fields, which will shortly be defined and discussed in considerable detail below, are necessary to specify a wildcard(s) at a given pattern node and to propagate the existence of that wildcard(s), at that node, downward into underlying structure of the rhizome. The reason, as will be described in detail below, supporting the need for such propagation is to provide a mechanism at each branch node that indicates whether a section of the structure, starting at that node and extending upward, needs to be replicated (i.e., to provide alternate search paths) during insertion. This propagated information is also necessary, as will be discussed below, for determining, during removal, just how much structure in the rhizome is redundant, in view of a pattern to be removed.
Data specific to a branch node encompasses a pivot bit stored in field 1330, and pointers to child zero and one, of that branch node, stored in fields 1335 and 1340, respectively. Inasmuch as this particular data is used in exactly the same manner as in a conventional Patricia tree and has been discussed above, we will not dwell on that data any further.
Data specific to a pattern node encompasses a reference value and a pointer to a godparent node, as stored in respective fields 1360 and 1365. Reference field 1360 can store any arbitrary data and is used in much the same manner as the reference field in a conventional Patricia tree, i.e., the value of that field is merely returned for subsequent use in the event of a successful search terminating at that node. For use in a pattern classifier, the reference field usually stores a class designation for a packet, which for packet scheduling is a designation of a transmission queue. Generally speaking, whenever a new pattern is to be inserted into the rhizome, then a given unique designation is assigned to that pattern and stored in the reference field associated therewith. For any pattern node, field 1365 stores a pointer (address) to a "godparent" node. A godparent for a given pattern node is defined as a next higher-level hierarchically related pattern node. In particular, for a hierarchy chain of pattern nodes, the godparent of a given pattern node in that chain is the next higher pattern node in that chain that is more general that the given pattern node. The pattern node situated at the top of the hierarchy chain has no godparent. If no godparent exists for a pattern node, then a pre-defined null value is stored in godparent field 1365 for that pattern node. Further, by way of definition, for any pattern node that is a godparent, the node situated immediately below it in a hierarchical pattern chain and containing the next more specific pattern, is a "godchild" of that pattern node.
As with a conventional Patricia tree, pivot bit field 1330 stores the pivot bit, i.e., bit index, for a branch node associated therewith. Now, apart from this function, the data stored in pivot bit field 1330 also identifies the type of node, i.e., branch or pattern, for which structure 1300 stores data--this being necessary since, as noted above, the same data structure is used to store data for each of these two different nodes. In particular, for a branch node, the pivot bit field stores a bit index which has a value between zero and one less than the total number of bits in a stored pattern, inclusive. If a value stored in the pivot bit field equals the length of a stored pattern (all the patterns in a rhizome have the same length), then this indicates structure 1300 is storing data for a pattern node. In particular, for a rhizome containing eight-bit patterns, the pivot bit will store values from zero to seven, inclusive, for a branch node, but will be set equal to eight for each pattern node; and so forth for rhizomes storing patterns of other sizes.
Now, as to the VALUE, MASK and IMASK (being an abbreviation for an "inverse mask") fields, the purpose of each of these fields differs between a pattern node and a branch node. To facilitate understanding, we will discuss each of these fields, first in terms of its use in a pattern node and then in terms of its use in a branch node.
Simply stated, for any pattern node, VALUE field 1315 stores the actual binary pattern therefor.
With respect to the MASK field used in any pattern node, this field specifies the existence of a wildcard in the corresponding pattern (stored in the associated VALUE field). In that regard, if a given bit in the MASK field is a zero, then the corresponding bit in a pattern is a wildcard (denoted by an "X" in the patterns shown, for example, in FIG. 10); if the given bit in the MASK field is a one, then the corresponding bit of the pattern equals the corresponding bit in the VALUE field. Thus, the pattern 00101100XX101011 (stored in pattern node 1031 shown in FIG. 10) is represented by a MASK field of 1111111100111111 and a VALUE field 0010110000001011. The two adjacent bits in the eighth and ninth bit positions in this pattern, given the wildcards situated there, are irrelevant and could equally be 00, 01, 01 or 11. However, to simplify the logic and testing used in our inventive insertion routine for our inventive rhizome, as discussed in detail below in conjunction with FIGS. 15, 16A and 16B, we set any bit position in the VALUE field, at which a wildcard exists at that bit position in the pattern, to zero. For a shorthand notation, though each of the pattern nodes situated in rhizome 1000 shown in FIG. 10 is formed of a combination of the VALUE and MASK fields, for this and other figures, a wildcard at any bit position in the VALUE field (identified by a corresponding zero bit in MASK field) is set to "X" to denote that wildcard. As shown in FIG. 2B and discussed above in conjunction therewith, a pattern, for use in packet scheduling, such as pattern 250, is formed by consistently concatenating the classification fields; a MASK field therefor, such as mask field 255, is formed by consistently concatenating the corresponding mask sub-fields for each of the individual classification fields.
As another example, consider the pattern 00101100XX0XXX11 stored in pattern node 1041 shown in FIG. 10, this pattern is represented by a VALUE field of 0010110000000011 and a MASK field of 1111111100100011. Likewise, consider the pattern 0010XX00XXXXXX11 stored in pattern node 1045. This pattern is represented by a VALUE field of 0010000000000011 and a MASK field of 1111001100000011.
For each pattern node, the IMASK field of that node is set equal to the IMASK field of its godparent, if it exists, of that node. For any pattern node that does not have a godparent, the IMASK field of that particular pattern node is set equal to its own MASK field. Inasmuch as pattern hierarchies in any hierarchy chain follow strict hierarchical rules as the chain is incrementally traversed upward to successively more general patterns, then, setting the IMASK field of a pattern node to that of its godparent is equivalent to stating that the IMASK field of that pattern node is equal to a conjunction (i.e., logical AND combination) of the MASK field of that node and the IMASK field of its godparent. This occurs because, as a result of the strict hierarchy in a pattern chain, the MASK field of the godparent is guaranteed not to have any bits set therein that are not set in the MASK field of the pattern node immediately below it (i.e., a godchild) in the chain.
Furthermore, in each instance where hierarchy is irrelevant or nonexistent, such as for a hierarchy chain that contains just one pattern node, then the MASK and IMASK fields of that pattern node are set equal to each other.
As noted above, the MASK, IMASK and VALUE fields serve a different purpose for any branch node than for any pattern node. Specifically, the fields of any given branch node specify, in condensed form, information about the fields of all patterns reachable from that branch node. There are five possible states maintained for each bit position: (a) all descendent patterns require that the bit be zero; (b) all descendent patterns require that the bit be one: (c) some but not all descendent patterns require that the bit be zero; (d) some but not all descendent patterns require that the bit be one; and (e) no descendent patterns care about the value of the bit. This information is necessary, as stated above, during: (a) insertion, for determining just how what portion of the rhizome needs to be replicated in order to insert a new pattern, and (b) removal, for determining, just how much structure in the rhizome is redundant, in view of a pattern to be removed.
In particular, the MASK field for any branch node is set to the disjunction (logical OR combination) of the MASK fields of its two child nodes, up to the bit position specified by the pivot bit of this branch node. Since this is a recursive definition extending upward through both the search and hierarchy structures, a bit value of one in a MASK field of a branch node indicates that at least one pattern node, reachable from (i.e., a lineal descendant of) that branch node, cares about the value of that bit. If a bit position in a MASK field of a branch node is a zero, then this means that no descendent pattern, i.e., any of those in a pattern hierarchy chain reachable from this branch node, cares about this bit position; hence, the value at this bit position is irrelevant to all those patterns.
The IMASK field for any branch node is set to the conjunction (logical AND combination) of the IMASK fields of its two child nodes, up to the bit position specified by the pivot bit of this branch node. Since this is also a recursive definition extending upward through both the search and hierarchy structures, a bit value of zero in an IMASK field of a branch node indica |