Distributed evalulation of directory queries using a topology cache6980985Abstract A technique for performing query evaluation on distributed directories utilizes the creation of a "topology cache" defining the hierarchical relationship between the various directory servers (i.e., identifying "subordinate" and "superior" knowledge references associated with each directory server in the system). The created topology cache is then stored at each directory server, and forwarded to a client upon submitting a query to the system. Using the topology cache information at the client, a distributed query evaluation plan can be developed for use with complex queries, such as hierarchical and aggregate queries. Claims 1. A method of searching a query through distributed directory servers organized as a forest having a plurality of entries, each entry represented by a unique identified, comprising the steps of: Description TECHNICAL FIELD
LDAP queries are evaluated in a distributed directory by "walking" up and down the partitions managed by the physical directory servers, and evaluating the filter against each relevant partition. To answer the query submitted by an LDAP client, the distributed directory uses "superior" and "subordinate" knowledge references to assist the client in determining all of the servers that need to be contacted to fulfill this request. The referral entries to these other servers that need to be contacted are returned to the client, which then resubmits the original query to these other servers. Thus, the distribution is exposed to the LDAP client that decides whether or not to continue the processing of a query that spans multiple servers. This mechanism is known as "distributed evaluation by referrals", where an example is discussed below, illustrating the steps involved in the distributed evaluation by referrals of the LDAP query Q, defined as follows:
The mechanism of distributed evaluation by referrals, as discussed above, is based on the LDAP philosophy that the distribution logic must be provided by the LDAP client. This avoids inter-server communication during query processing. One advantage of this mechanism is that each directory server needs only to be aware of the servers in its local vicinity: its parent server and its children servers. However, there are some major disadvantages of utilizing "distributed evaluation by referrals". Indeed, if a directory server becomes unavailable, it can create "islands" of directory servers that are disconnected from each other. For example, assume that server S1 in FIG. 1(b) is unavailable, while all other servers are available. Then, when the above-described LDAP query Q is submitted to server S4, the results of the query cannot be returned to the client, even though both servers S2 and S3, which contain these results, are available. The problem is that server S4 does not "know" about server S2, but only the (unavailable) server S1 on the path from server S4 to server S2. Additionally, distributed query evaluation, even when all of the servers are available, requires a series of client-sequence interactions, involving multiple servers, which can be quite inefficient, especially when the directory server topology has long paths. Looking at the above example, the original submission of the LDAP query Q to server S4 requires the client to contact, in turn, server S1, followed by server S2, followed by server S3. Of these, the interaction with server S1 is essentially useless, since it cannot contain any answer to the query. If the base-entry-DN had been r3, managed by server S3, the interactions with both servers S1 and S2 would also have been useless. Again, the problem is that server S4 does not "know" about the existence of servers S2 and S3, which manage the base-entry-DNs of query Q. With this background and a full understanding of distributed query evaluation using referrals, it is now possible to better understand the subject matter of the present invention and the utilization of a topology cache to improve the efficiency of a distributed query evaluation. In accordance with the present invention, a "topology cache" describes the forest-structured topology of the directory servers, and comprises a set of knowledge references, one for each physical server in the distributed topology of directory servers. It is an aspect of the present invention that each directory server is required to store a copy of the topology cache. Thus, for an LDAP query, the evaluation of a query Q using the topology cache method of the present invention would proceed as follows:
Thus, referring to FIG. 1(b), a topology cache would contain information about all four servers, and the above-described query Q would proceed through the topology cache method as follows:
As will be discussed below, these steps as outlined above are defined as a "distributed query evaluation plan", PQ, which is expressed algebraically as follows: An advantage of using the topology cache system of the present invention is that the non-availability of one or more servers (as in the above example when server S1 is unavailable) does not affect the answer to the query. Further, no irrelevant servers are contacted to answer the user query (except for S4, which is the only server known to the client). Thus, for simple LDAP queries, the mechanism of distributed evaluation using the topology cache mechanism of the present invention allows for only relevant servers to be contacted, and exploits the maximum parallelism among the servers. Further, this mechanism adheres to the LDAP philosophy that the distribution logic must be provided entirely by the LDAP client, without burdening the servers with managing inter-server communication during query processing. There are two main concerns associated with the concept of a topology cache that have heretofore negated its use in association with database queries: "consistency" and "cost". In particular, maintaining consistent caches in a distributed environment is considered expensive in terms of the constant need for update and control. However, this in only true when the frequency of change propagation is high, that is, either the cached data changes frequently, or the number of (slowly changing) entries in each cache is very large. In the present context of directory queries, neither of these conditions is valid. That is, while the data in each in directory may change often, the topology of the directory servers rarely changes; the servers are linked, created and/or removed only occasionally. Since the topology cache as used in the present invention only contains the topology information in terms of subordinate and superior knowledge references (and not detailed information about the contents of each directory server), the cached data changes infrequently, too. Further, the number of directory servers is likely to be in the range of tens or hundreds, hence the number of entries in the topology cache will be in the same range. For user queries whose answer is small (one, or a few directory entries), fetching and examining the entire topology cache is a considerable overhead. However, these costs can be amortized over multiple queries in a single client-server session. That is, the topology cache needs to be fetched only once at the beginning of each session, and all of the user queries in that session can use the fetched topology. Further, the knowledge references in the topology cache can be pre-processed into a trie structure, mirroring the forest of physical directory servers. This enables very efficient determination of the particular servers that are relevant to the current query, based on the observations that: (a) the task of finding the server that manages the base entry of the query is akin to finding the string (in a database of strings) that is the longest prefix of a given query string; and (b) all other relevant servers are located in the subtree below the server that manages the base entry of the query. FIG. 2 illustrates experimental results from utilizing the topology cache of the present invention, when compared to a prior art distributed directory query using the "referral" method discussed above. In each diagram, the prior art is defined by the term "nocache" in the legend. The experiments were carried out using the OpenLDAP directory server (whose source code is publicly available), on a single, lightly-loaded host machine. Multiple directory servers, each with 100 entries, were created on different ports of the same host. In the prior art case (i.e., where no topology cache is used), there is a first phase during which the server that manages the base-entry-DN of the submitted LDAP query has to be located. The optimal situation in this case for the prior art approach of distributed evaluation by referrals is to hit the "right" server (i.e., the one that contains the query base) in the beginning. To ensure a fair comparison of the evaluation strategy of the present invention with the prior art approach, the experiment was configured to ensure that the query was submitted to the server that managed its base. The topology cache method of the present invention requires the fetching of the entire topology no matter where the query base is located. As discussed above, this fetching is performed only once during a session to amortize the cost of fetching the topology cache. In the experiments, four different topology configurations were used: (1) left-deep skinny, where each server has two children servers (only one of which may have children), varying the depth of the tree from 0 (one server) to 10 (21 servers); (2) left-deep bushy, where each server has five children servers (only one of which may have children), varying the depth of the tree from 0 (one server) to four (31 servers); (3) balanced skinny, which is a complete balanced binary tree of servers, varying the depth from 0 (one server) to four (31 servers); and (4) balanced bushy, which is a complete balanced 5-ary tree of servers, varying the depth from 0 (one server) to two (31 servers). The results associated with configurations (1) and (2) are shown in FIG. 2(a), and the results associated with configurations (3) and (4) are shown in FIG. 2(b). Since the experiment's purpose is to quantify the difference between the techniques of the prior art and the present invention, the impact of the query answer network traffic had to be controlled, since the traffic itself could mask the differences between the two techniques. Therefore, particular LDAP queries were used that were known to have no matching answers. In either case, however, the evaluation techniques would still have to search the relevant sub-topology to determine that result. Thus, the time taken for communicating the results from the server to the client can be attributed only to the distribution overheads. The experimental results as illustrated in FIG. 2 clearly show the benefit of using the topology cache method of the present invention, since in both FIGS. 2(a) and 2(b), the "nocache" approach curves are above the "cache" approach curves, indicating a longer time period required to respond to the query. Indeed, the benefits only increase with a larger number of servers, where this is emphasized by the fact that in both graphs the absolute value of the different between the nocache and cache curves grows linearly with the number of servers. Further, the graphs show that the "nocache" approach is highly sensitive to the distribution of the servers. That is, as the number of servers increase, the evaluation time for skinny and bushy approaches begin to diverge, demonstrating the sensitivity to the depth of the topology. In contrast, using the topology cache method of the present invention makes the evaluation cost of LDAP queries insensitive to the topology; as the number of servers increases, the evaluation times for the skinny and bushy approaches begin to converge. This is a significant measure of the robustness of the topology cache technique of the present invention. As mentioned above, the utilization of a topology cache in performing queries in distributed directories is also useful with queries that are more complex than LDAP queries. For example, queries with multiple base-entry-DNs, hierarchical queries, and aggregate selection queries also benefit from using the topology cache methodology to generate efficient distributed plans. The utilization of the topology cache technique will be described below as used in association with a hierarchical query with multiple-base-entry DNs. The evaluation of such a query necessarily begins with a definition of distributed plans, and a method for generating the plans. Recall that the directory client obtains the entire topology cache T at the beginning of a session. For the purposes of discussion, it is presumed that the first step is to produce a distributed query plan for evaluating a given query plan Q. The process begins with the simple observation that no matter how complex Q is, its "answer" is always a set of directory entries, each of which belongs to some directory server. In other words, the plan PQ for answering Q can be expressed as a union: where S1, S2, . . . , Sk are the directory servers in the distributed directory (they can be extracted from the topology cache T), and each QSi is a query identifying the contribution of server Si to the global result. The relation PQ is thus defined as the "distributed query evaluation plan" for query Q, where each element QSi is defined as a separate "server query". It is to be noted that the relation PQ does not hold for SQL queries on distributed database. In that case, an answer consists of multiple records, and for each record the different components may come from different servers. This is a critical distinction, enabling the derivation of much more efficient distributed query evaluation plans than in traditional distributed relational databases. When using a distributed query evaluation plan (PQ) with relatively simple LDAP queries, the directory client sends each server query QSi to its associated server Si and unions all of the results. The client needs to choose a "schedule", that is, an order in which to contact the servers. For LDAP queries, possible schedules are: send all queries in parallel, send only a limited number in parallel, or send all queries sequentially. Any schedule is a legal schedule, but, given a particular DIF, some schedules are more efficient than others. For hierarchical and aggregate selection queries, the distributed query evaluation plan PQ still holds, but the difficulty lies in determining the definition of each server query QSi. Consider the following hierarchical query Q for the directory structure of FIG. 1:
A distributed query plan P can be diagrammed as a tree, as shown in FIG. 3(a) for the example discussed above. The distributed query plan tree PT contains nodes that correspond to each subexpression in P, where the root node in PT corresponds to the entire expression P. Non-root nodes correspond either to server queries, or to conditionals. Edges correspond to the (expression, subexpression) relationship. A fragment of the query plan tree for a more complex example is illustrated in FIG. 3(b), for the query: (d Q′ (a Q′ Q2′)), where the expression (a Q′ Q2′) is represented by Q" in FIG. 3(b). There is an essential distinction between nodes corresponding to server queries and those corresponding to conditionals. In particular, each server query node is eventually sent to a directory server for evaluation, but it has to wait for all of its children to be evaluated before the server query expression can be computed. Therefore, the server query nodes can be considered as AND nodes. In contrast, conditional nodes are defined as OR nodes, and computation of the server query expression may proceed if any one of the children evaluates to TRUE. Once the plan P is generated, the directory client has to choose a "schedule", that is, an order in which to send the server queries to the individual servers. Unlike the LDAP queries discussed above, not every "hierarchical" query is legal. Indeed, a schedule for a hierarchical query is legal only if all subplans of the query are evaluated before the query. FIG. 4 contains an exemplary generic scheduling algorithm, which can be instantiated to generate a variety of specific policies. A schedule essentially "evaluates" the plan tree PT and determines the different types of nodes within the tree. The value of the root node and its immediate children are sets of directory entries. The value of all other nodes are Boolean values: TRUE or FALSE. Values are computed in a bottom-up fashion, starting from the leaves. The main loop of the algorithm of FIG. 4 consists in issuing queries to directory servers, then waiting for result events. The functions computeQueryNode(n) and computeConditionalNode(n) are called on the two types of nodes (server queries and conditionals), and attempt to compute the node's value. Before the computation can proceed, the child nodes beneath them must first be computed. Query nodes are AND nodes: all of their children must be computed first. Conditional nodes are OR nodes: if some child is computed and has value TRUE, then the process does not wait for the other children to complete computation; otherwise, if all children have the value FALSE, then the node has the value FALSE. In all other cases, i.e., when some of the children's values are still missing, both computeQueryNode(n) and computeConditionalNode(n) defer the computation. The computation of the node proceeds differently for server query nodes and for conditional nodes. For conditional nodes, the value is computed immediately. For server query nodes, the query expression Q is first generated by expanding all of its "if" macros. Expansion is possible at this point, since all of the values of the Boolean conditions have been obtained. Once the expansion has been prepared, the existential query cache needs to be tested and a scheduling policy determined. With the scheduling policy in place, each separate server query QSi may then be sent to its associate server Si. The rationale behind forming an existential query cache is that multiple nodes in the plan tree PT may have the same (query, server)-pair. For example, in FIG. 4(a), there are two nodes labeled (Exists Q")@S3. Referring to FIG. 4(b), an even greater number of duplicate nodes are shown. It is to be noted that this duplication occurs for a single user query; such duplication can then be presumed to be even more significantly multiplied across a number of different queries. Thus, to avoid repeated computations, a cache of such query results can be maintained at the client. It is important to remember that only the results of existential queries can be cached, where each cache entry contains the following information: (1) the (query, server) pair (QS); (2) its value (TRUE, FALSE, or PENDING); and (3) a list of nodes in PT waiting for that result. The set of such results is thus referred to as the "existential query cache". Once this existential query cache is determined, the scheduling policy is considered. Initially, the cache is empty, and when a pair (QS) needs to be computed, an entry with the value PENDING is created, and that creation results in the query Q being issued to server S. Subsequent results for the same pair (QS) have the effect that the corresponding node is added to the list of nodes waiting for the result of Q@S, and no query will be issued at that time. When the Boolean value of Q@S is then obtained from the server, all of the waiting nodes' values are updated and the query can be issued. In order to issue the query Q, the pair (QS) is entered in the Enabled list. As a result of the query cache, each such pair is entered in the Enabled list at most once. The function chooseFor Schedule(Enabled) selects some of the enabled queries to issue to the directory servers. This is the function that implements a particular scheduling policy. For example, a "greedy" policy returns all nodes in Enabled, and empties the list. A more conservative policy may restrict the number of concurrent requests below a certain limit (i.e., the number of outstanding requests is given by the length of Pending), and perhaps favor nodes lower in the query plan tree. The function LDAP—issueQuery(Q,S) sends the query asynchronously to the server (that is, it does not wait for an answer). Answers are expected by the function LDAP—waitForEvent(e). The event e contains information about the (query, server) pair (Q,S) for which it was issued, as well as a result value. For Boolean queries, the result value is either TRUE or FALSE. For non-Boolean queries, the result value is one directory entry in the answer set: when that answer set has been exhausted, a special End-of-Entries value is returned. The utility of an existential cache for achieving a scalable, distributed evaluation of hierarchical queries is illustrated by way of example in FIGS. 6 and 7. FIG. 7 contains a simple graph containing two curves, representing the performance of the topology cache of the present invention, with and without the use of an existential cache, as used with a hierarchical query. In generating the data for FIG. 7, the OpenLDAP server is extended to deal with hierarchical queries, using the same data sets used for the experiments described hereinabove. Since the goal of this experiment is to measure the distribution overheads of evaluating hierarchical queries (and not the performance of a stand-alone directory server for these queries), a descendant query is used, where each of its sub-queries has an answer in each directory server, but the hierarchical query as a whole has an empty result. As a consequence, the algorithms need to construct complex server queries for each non-leaf directory server in the distributed directory. The results as shown in FIG. 6 were generated for the "left-deep skinny" topology, varying the depth of the tree from 0 to 10 (the number of servers varies from 1 to 21). Without the use of the existential cache, the same query can be posed multiple times to a given server without changing the time associated with developing the answer. These two curves show the exponential nature of the "no cache" approach, as compared to the linear nature of the approach using the existential cache, as a function of the maximum path length between the server that manages the base-entry-DN of the query, and any other server that needs to be contacted. FIG. 7(a) shows the performance results related to the evaluation of the hierarchical selection query without using an existential cache. Essentially, the costs of generating and evaluating a distributed hierarchical query without using an existential cache are proportional to the total number of relevant servers in the topology, as well as exponentially dependent on the depth of the topology. FIG. 7(b) shows the performance of the same query, with the use of an existential cache for the four studied topologies. Two observations can be made from these results: (1) distributed evaluation of hierarchical queries is robust (i.e., the cost is independent of the specific topology and depends only on the number of relevant servers); and (2) the evaluation strategy is scalable with respect to the number of servers (as evident from the linear nature of the curves). Beyond hierarchical queries, the utilization of caches can be extended to aggregate selection queries. As one example, the query Q=(d Q1 Q2 count>=20) returns all directory entries satisfying Q1 that have at least 20 descendants satisfying Q2. In order to evaluate such directory queries efficiently on a DIF, the LDAP servers need to be extended with a new query functionality: computing "aggregate-value queries". In particular, aggregate-value queries have the form (Agg Q), where Agg has the value of: count, sum, min, max or exists. An example of an aggregate-value query is: (count Q2), which returns the number of directory entries satisfying Q2. A distributed evaluation plan for this aggregate-value query on the example of FIG. 1 can thus be developed. As before, the query plan has the form of: where the server query QS2 has to return all entries in S2 satisfying Q1, and that either have at least 20 descendants in S2 satisfying Q2, or they have x such descendants in S2, y such descendants in S3, and x+y≧20. This can be expressed concisely as: Here, [20-((count Q2)@S3)] is a macro that results in a number, once the aggregate-value query (count Q2) is evaluated at server S3. In general, therefore, for aggregate selection queries, numeric macros need to be introduced into the query plans. Scheduling proceeds as with hierarchical and LDAP queries, except that now numeric macros are all AND nodes (i.e., all of their children have to be computed before the node's value can be determined). In general, the use of directories as a key component of the network infrastructure, in applications such as the DEN initiative, has been shown to benefit from the efficient, distributed evaluation of hierarchical and aggregate selection queries, as discussed above. Moreover, the use of the topology cache of the present invention can quickly identify directory servers that are relevant to a query, enhancing the functionality of current day directory servers to be able to support existential and aggregate-value queries. By maintaining and taking advantage of a small existential/aggregate-value cache value at the directory client, both LDAP queries and the richer hierarchical and aggregate selection queries can be evaluated using the topology cache of the present invention, where the topology cache has been found to be both scalable and robust, in that the evaluation cost grows linearly with the number of query-relevant directory servers, and that this cost is independent of the specific topology of the servers.
|
Same subclass Same class Consider this |
||||||||||
