Parallel bottom-up construction of radix trees5826262Abstract A method for partitioning keys onto radix tree logical pages and a parallel index page build algorithm in order to provide radix tree build speedup proportional to the number of processors on the system and controlled efficient page utilization. Also, since keys are intelligently partitioned so that a complete set of keys is inserted into a logical page, there is no page overflow during the tree construction and thus page splitting is eliminated. Since radix index trees are really groups of logical pages in which each logical page contains a small tree, the tree is built (with respect to the logical pages) from the bottom up, while within each individual logical page the tree is constructed from the top down. The space required for a logical page is pre-allocated to allow construction of limbs to begin without waiting for the build of their underlying pages to complete. Claims What is claimed is: Description FIELD OF THE INVENTION
TABLE 1
______________________________________
ARF
BAA
BARK
CHEEP
CHIRP
CRY
GROWL
HOOT
HOWL
MEOW
MOO
PEEP
PLOP
QUACK
ROAR
SQUAWK
SQUEAK
SQUEAL
TWEET
TWITTER
WHINNEY
______________________________________
To build the binary radix tree for keys 118 as shown in table 1, TOD values 116 are computed for each key list entry. Since the TOD value is defined as the value between that entry and its preceding entry, TOD value 0 is specially assigned to the first entry ARF. The other TOD values are calculated, and the contents of key list 114 becomes as shown in table 2.
TABLE 2
______________________________________
TOD Key
______________________________________
0 ARF
6 BAA
19 BARK
7 CHEEP
20 CHIRP
11 CRY
5 GROWL
4 HOOT
18 HOWL
3 MEOW
11 MOO
6 PEEP
11 PLOP
4 QUACK
7 ROAR
2 SQUAWK
29 SQUEAK
47 SQUEAL
7 TWEET
21 TWITTER
5 WHINNEY
______________________________________
Client task 108 starts leaf page level processing by scanning key list 114 from the beginning and looking for the first TOD drop. The TOD decline from 19 to 7 between BARK and CHEEP indicates that BAA and BARK can be grouped to form a starting partition with a resulting TOD of 6. The contents of key list 114 now becomes as show in table 3.
TABLE 3
______________________________________
TOD Key
______________________________________
0 ARF
6 (BAA BARK)
7 CHEEP
20 CHIRP
11 CRY
5 GROWL
______________________________________
Then, the TOD decline from 20 to 11 between CHIRP and CRY indicates that CHEEP and CHIRP can be grouped with a resulting TOD of 7. The contents of key list 114 now becomes as shown in table 4.
TABLE 4
______________________________________
TOD Key
______________________________________
0 ARF
6 (BAA BARK)
7 (CHEEP CHIRP)
11 CRY
5 GROWL
4 HOOT
18 HOWL
3 MEOW
______________________________________
Next, the TOD decline from 11 to 5 between CRY and GROWL indicates the group (CHEEP CHIRP) and CRY could be grouped. However, in this example a logical leaf page can fit only 2 keys, so that CRY cannot be acquired into (CHEEP CHIRP). At this point, groups (BAA, BARK) and (CHEEP, CHIRP) have been formed and will be placed on the server queue for logical page construction. Since C is the leading common text between group (CHEEP, CHIRP) and CRY, C is excluded in that data group to be sent off for logical page construction. The data would be (HEEP, HIRP). Since full page is detected at CRY, the process is continued forward at the next entry, GROWL. The contents of key list 114 now becomes as shown in table 5.
TABLE 5
______________________________________
TOD Key
______________________________________
5 GROWL
4 HOOT
18 HOWL
3 MEOW
______________________________________
Although there is a TOD decline from 5 to 4 between GROWL and HOOT, no grouping can be done because GROWL has no leading entity. Instead, client task 108 moves to the next TOD decline between HOWL(18) and MEOW(3), which indicates HOOT and HOWL can be grouped with a resulting TOD of 4. The contents of key list 114 now becomes as shown in table 6.
TABLE 6
______________________________________
TOD Key
______________________________________
5 GROWL
4 (HOOT HOWL)
3 MEOW
______________________________________
The TOD decline from 4 to 3 between (HOOT HOWL) and MEOW indicates GROWL could be grouped with (HOOT HOWL). However, in this example due to the page being full, GROWL can't be acquired into (HOOT HOWL). At this point, only 1 group (HOOT HOWL) is formed. The process is continued forward from MEOW. Eventually, when the last list entry is processed, the text grouping as shown in table 7 is formed for logical leaf page construction.
TABLE 7
______________________________________
Logical Page Address
Key
______________________________________
p1 BAA BARK
p2 HEEP HIRP
p3 HOOT HOWL
p4 MEOW MOO
p5 PEEP PLOP
p6 QUACK ROAR
p7 EAK EAL
p8 TWEET TWITTER
______________________________________
Key list 114 as shown in table 8 has been formed for prior level processing where the text groupings formed for logical pages are represented with addresses from p1-p8. Any stripped common text that is associated with a logical page is shown as a prefix separated by a dash from the page pointer in table 8.
TABLE 8
______________________________________
TOD Key
______________________________________
0 ARF
6 p1
7 C-p2
11 CRY
5 GROWL
4 p3
3 p4
6 p5
4 p6
2 SQUAWK
29 SQU-p7
7 p8
5 WHINNEY
______________________________________
For this limb level processing, the first TOD decline from 11 to 5 between CRY and GROWL indicates that p2 and CRY can be grouped to form a starting partition with a resulting TOD of 7. The contents of key list 114 becomes as shown in table 9.
TABLE 9
______________________________________
TOD Key
______________________________________
0 ARF
6 p1
7 (C-p2 CRY)
5 GROWL
4 p3
______________________________________
Next, the TOD decline from 7 to 5 between (C-p2 CRY) and GROWL indicates that p1 and (C-p2 CRY) can be grouped with a resulting TOD of 6. The contents of key list 114 becomes as shown in table 10.
TABLE 10
______________________________________
TOD Key
______________________________________
0 ARF
6 (p1 C-p2 CRY)
5 GROWL
4 p3
______________________________________
Then, the TOD decline from 6 to 5 between (p1 C-p2 CRY) and GROWL indicates that ARF and (p1 C-p2 CRY) can be grouped with a resulting TOD of 0. The contents of key list 114 becomes as shown in table 11.
TABLE 11
______________________________________
TOD Key
______________________________________
0 (ARF p1 C-p2 CRY)
5 GROWL
4 p3
3 p4
______________________________________
Then, the TOD decline from 5 to 4 between GROWL and p3 indicates that (ARF p1 C-p2 CRY) and GROWL can be grouped with a resulting TOD of 0. The contents of key list 114 becomes as shown in table 12:
TABLE 12
______________________________________
TOD Key
______________________________________
0 (ARF p1 C-p2 CRY GROWL)
4 p3
3 p4
______________________________________
The TOD decline from 4 to 3 between p3 and p4 indicates that (ARF p1 C-p2 CRY) and GROWL can be grouped with a resulting TOD of 0. The contents of key list 114 becomes as shown in table 13.
TABLE 13
______________________________________
TOD Key
______________________________________
0 (ARF p1 C-p2 CRY GROWL p3)
3 p4
6 p5
4 p6
2 SQUAWK
______________________________________
Then, the TOD decline from 6 to 4 between p5 and p6 indicates that p4 and p5 can be grouped with a resulting TOD of 3. The contents of key list 114 becomes as shown in table 14.
TABLE 14
______________________________________
TOD Key
______________________________________
0 (ARF p1 C-p2 CRY GROWL p3)
3 (p4 p5)
4 p6
2 SQUAWK
______________________________________
The TOD decline from 4 to 2 between p6 and SQUAWK indicates that (p4 p5) and p6 can be grouped with a resulting TOD of 3. The contents of key list 114 becomes as shown in table 15.
TABLE 15
______________________________________
TOD Key
______________________________________
0 (ARF p1 C-p2 CRY GROWL p3)
3 (p4 p5 p6)
2 SQUAWK
. . .
______________________________________
The TOD decline from 3 to 2 between (p4 p5 p6) and SQUAWK indicates that (ARF p1 C-p2 CRY GROWL p3) and (p4 p5 p6) can be grouped. However, in this example, this acquisition can't be done due to page size. At this point, 2 groups are formed and sent to the page construction server queue. The process is continued forward from SQUAWK. Eventually, the groups as shown in table 16 would be formed for limb page level construction.
TABLE 16
______________________________________
ARF p1 C-p2 CRY GROWL p3
p4 p5 p6
SQUAWK SQU-p7 p8 WHINNEY
______________________________________
The list as shown in table 17 would be created for the final trunk page level processing where the above groupings are represented with logical page addresses from p9-p11.
TABLE 17
______________________________________
0 p9
3 p10
2 p11
______________________________________
FIG. 4 shows the resulting tree after this build. The dashed lines depict pointer connections between tree partitions. FIG. 5 is a flow chart that depicts the operation of the control program as it does a parallel index build, according to the preferred embodiment. At block 500, client task 108 partitions the data space records of radix tree 112 into p lists where p is the number of processors 102 available to perform the parallel tree build and dispatches p server tasks 110 to build the keys associated with each list of data space records. At blocks 505, 510, 515, through block 520 server tasks 110 build the key lists. At block 525, client task 108 sorts the p lists of resulting keys resulting in single, sorted key list 114. An example of the processing depicted by block 525 is taught in U.S. Pat. No. 5,179,699, which is hereby incorporated by reference. At block 530, the client task 108 partitions the key list, as further described in the following details. At block 540, client task 108 checks whether the next permissible combination of key partitions fit on the current logical page. If true, client task 108 combines the key partitions at block 545 and continues back to block 540. When next permissible combination no longer fits on the current logical page, client task 108 continues to block 547 where it allocates a logical page from radix tree 112. Then, at block 550, client task 108 builds a logical page request 575 for the current page using the current key partition and sends the request to request queue 570. At block 555, client task 108 checks whether the current logical page is the trunk page of the tree. If the current page is the trunk page, then client task 108 has completed its portion of the tree build. If the trunk page has not yet been reached, at block 560, client task 108 adds an entry representing the logical page whose build request was just issued (in block 550) to key list 114 for the prior page level of the tree. Client task 108 then continues to block 540 to process the next logical page. In this way, client task 108 continues to issue logical page build request 580, 585, through logical page build request 590 to request queue 570, which will be processed by server tasks 110a, 110b, 110c, through 110d, as further described under the description for FIG. 6. FIG. 6 is a flow chart that depicts the operation of server tasks 110 as they perform a top down logical page index construction. At block 655, server task 110 accepts as input the list of sorted keys on a logical page 650, which is part of one of the logical page build requests 575, 580, 585, through 590. At block 660, server task 110 checks whether there are more keys to insert in tree 112. If there are more keys to insert, at block 665 server task 110 gets the next key to insert. At block 670, server task 110 checks whether the node that is the insert point for the key has been reached within tree 112. If not, at block 675, server task 110 advances to the next node. Server task 110 continues checking for the correct insert point and advancing through the nodes until the insert point is reached, at which time server task 110 continues to block 680. At block 680, server task 110 checks whether the key contains a pointer to a subsequent page in tree 112. If the key does not contain a pointer to a subsequent page, then the key is comprised of entirely text, so server task 110 inserts the key in tree 112 at block 685 and continues back to block 660. If the key does contain a pointer to a subsequent page, then server task 110 inserts the lowest byte of text on the page that this key represents as a text node at block 690. At block 695, server task 110 saves the position of the text node for later replacement with a pointer node to the page. At block 700, server task 110 checks whether the text node associated with a previously inserted page pointer key is no longer needed for tree navigation. If true, server task 110 replaces the text node with a pointer node to its associated page at block 705 and continues back to block 660. If false, server task 110 continues directly to block 660. When, at block 660, there are no more keys to insert in tree 112, server task 110 replaces any lingering text nodes with their associated page pointer nodes at block 710 and exits. FIG. 7 shows an article of manufacture or a computer program product including a storage medium for storing thereon program means for carrying out the method of this invention in the system of FIG. 1. It is important to note that while the present invention has been described in the context of a computer system, that those skilled in the art will appreciate that the mechanisms of the present invention are capable of being distributed as a program product in a variety of forms, and that the present invention applies equally regardless of the particular type of signal bearing media used to actually carry out the distribution. Examples of signal bearing media include: recordable type media such as floppy disks and CD ROMs and transmission type media such as digital and analog communications links. An example of such an article of manufacture is illustrated in FIG. 11 as pre-record floppy disk 1002. Floppy disk 1002 is intended for use with a data processing system, and includes magnetic storage medium 1004, and program means 1006, 1008, 1010, and 1012 recorded thereon, for directing control program 110 to facilitate the practice of this invention. It will be understood that such apparatus and articles of manufacture also fall within the spirit and scope of this invention. While this invention has been described with respect to the preferred and alternative embodiments, it will be understood by those skilled in the art that various changes in detail may be made therein without departing from the spirit, scope, and teaching of the invention. For example, although the examples have shown a binary radix tree, the tree could be other types of trees such as an M-ary tree with corresponding changes to the key partition algorithm. A M-ary tree is a tree where a parent node has a maximum of M children. Accordingly, the herein disclosed invention is to be limited only as specified in the claims below. The following is an example of pseudo-code implementation for control program 107.
______________________________________
Main --
/* Process the key list from start to end. Group keys into partitions.
*/
/* For each partition, send off a request for logical leaf page
construction. */
/* Also, build another list for next prior limb level processing. */
Call build.sub.-- a.sub.-- level;
Do while not reaching trunk level
/* Now a new list for prior limb level has already been created. */
/* Proeess the list from start to end. Group entries into partitions. */
/* For each partition, allocate a page and send off a request to build
a limb page.*/
/* Build another list for prior level processing. */
Call build.sub.-- a.sub.-- level;
End; /* end do while
*/
/* Wait for all logical page construction requests to be completed. */
end; /* main */
procedure build.sub.-- a.sub.-- level
build.sub.-- a.sub.-- level:proc;
/* Initialization */
last.sub.-- point := 1;
n :=1;
/* Rolling forward to find a proper split point so that preceding
consecutive keys could be combined to form a group */
Do while (n <= #.sub.-- of.sub.-- keys.sub.-- in.sub.-- list)
If /* there is an increase in the score between consecutive keys or this
is the first processing entry */ (score{n} < score{n+1 }) (n =
last.sub.-- point)
then
;/* No decision can be made. Keep on rolling forward */
else /* Since this is the greatest score from the last starting point,
this
is the proper split point for a group's upper bound. It's proper to
combine Nth key with its preceding entity. However, that can be done
only if the new group could be fitted onto a logical page. */
If (key n and preceding entity would not fit on a logical.page) /* This
implies that preceding entity is a group since at least 2 entries must
be
fitted on a given logical page. Also, this is a proper spot for a new
starting point. */
then
Send off current group(s) for processing;
/* There could be more than one groups pending for
expansion. */
For each group formed, create and add an entry to the new list;
Copy any entry not selected (from the last starting point) to
the new list for next level processing;
/* Remember that all entries up to this point have been
processed */
last.sub.-- point := n + 1;
else /* key n and the preceding entity could be fitted onto a logical
page */
Group Nth key with its preceding entity;
Mark the Nth key and its preceding entity to indicate that they
are already selected/acquired into a partition;
Update left.sub.-- bound accordingly if preceding entity is a key;
/* Now we need to expand this new group by acquiring either its
preceding entity or succeeding key. The adjacent entity with a
higher score will be the candidate for acquisition */
spin.sub.-- back := on;
Roll backward up to the last processing point to find the proper
split point;
Set left.sub.-- bound to the entry preceding the first entry of current
group;
Do while not passing last processing point and page is not full
and backward score is > forward score
(left.sub.-- bound > last.sub.-- point) until (spin.sub.-- back=off);
If TOD between first entry of the group and its preceding
entity is higher than TOD between last entry of the group
and its succeeding entity (score{left.sub.-- bound+1} >
score{n+1}) then
/* this is not a split point */
/* It's proper to combine the preceding entity into the
group. However, that could be done only if the new
group could be fiffed onto a logical page. */
If preceding entity and current group could be fitted
onto a logical page then
Group preceding entity with current group;
If necessary, mark entries to indicate that they are
already selected/acquired into a partition;
Update left.sub.-- bound (index to previous entry) to
continue rolling backward;
If preceding entity is a group then
Skip other entries (if any) already
acquired to this group and directly get to
the entry prior to the first entry of the
current group;
else
Roll back one more entry left.sub.-- bound :=
left.sub.-- bound - 1;
else /* preceding entity and current group would not fit
onto a logical page. It implies this is a valid split point
*/
Send off current group(s) for processing;
For each group formed, create and add an entry
to the new list;
Copy entries not selected to the new list;
/* Remember this is the last processing point so
that is point will not be passed when rolling
backward later */
last.sub.-- point := n +1;
/* Stop rolling backward
spin.sub.-- back := off;
else
/* Stop rolling backward. */
spin.sub.-- back := off;
end; /* rolling backward */
/* Continue rolling forward
n := n + 1;
End; /* rolling forward */
n := n - 1;
/* Process any remaining entries and group. */
/* For each group formed, create and add an entry to the new list. */
/* Copy entries not selected to the new list. */
end; /* procedure build.sub.-- a.sub.-- level */
______________________________________
|
Same subclass Same class Consider this |
||||||||||
