Threaded, height-balanced binary tree data structure5557786Abstract A hierarchical data structure is provided in the form of a tree for storing key-indexed entries. Each entry of the tree includes a balance bias indicator and pointer fields for maintaining (i) link pointers to successive entries in the hierarchy or (ii) thread pointers to preceding entries in the hierarchy. Entries are added to the tree, or removed from the tree, while the tree is maintained in a height-balanced condition. During insertion of a new entry, the tree is traversed along a search path to determine an insertion point for the entry and to determine a potential rebalancing point in the tree. The potential rebalancing point in the tree is identified on the basis of the balance bias indicators of the entries in the search path. The tree is rebalanced, if necessary, about the potential rebalancing point determined during the insertion traversal. Claims What is claimed is: Description FIELD OF THE INVENTION
______________________________________
1. LOCK TREE
CURR <---- HEAD
IF LINK(CURR,1) IS A THREAD;
RETURN ERROR
GOTO 3.
ELSE
CURR <---- LINK(CURR,1)
ENDIF
2. IF LINK(CURR, -1) IS A THREAD;
RETURN SUCCESS
RETURN DATA(CURR)
GOTO 3.
ENDIF
CURR <---- LINK(CURR, -1)
GOTO 2.
3. UNLOCK TREE
______________________________________
The Next Entry Procedure The Next Entry procedure returns the data and key values of a tree entry with a key entry equal to or successive to the input, KEY. If there are no entries of equal or greater value, or if the tree is empty, an error is returned. In step 1, the routine is initialized and the pointer CURR is set to the tree header address. Steps 2 and 3 form a finding loop which exits from step 3 to step 5 if an entry having a key equal to KEY is found or from step 2 to step 4 if the finding path terminates in a thread. In step 4, the thread found in step 2 is followed and the procedure proceeds to step 7. In step 5, the pointer CURR is assigned to indicated the rightward child of the current entry (i.e. one entry to the right of the identically-keyed entry). If the right pointer of the current entry field is a thread, control passes to step 7. Otherwise control passes to step 6. In step 6, the directional indicator is set to follow leftward links, and a loop is executed to follow leftward links until a thread is encountered. In general, following left links from a right-linked child of a given entry results in finding the next greater-keyed entry relative to the given entry. When a thread is encountered, control passes to step 7. In step 7, the current entry passed from step 4, step 5, or step 6, is tested to determine whether the current entry is the head entry. If such is the case, an error is returned indicating that no greater-keyed entries were found. If the passed entry is not the head entry, the key field and data field contents of the current entry are returned. Otherwise, an error indication is returned. The Next Entry procedure may easily be changed to a Previous Entry procedure by reversing the values of the directional vector in step 4, step 5, and step 6. The Next Entry and Previous Entry procedures are used as subroutines of procedures described hereinafter and are referenced as NEXT(KEY) and PREV(KEY) respectively.
______________________________________
1. LOCK TREE
CURR <---- HEAD
DIR <---- 1
2. IF LINK(CURR,DIR) IS A THREAD;
GOTO 4.
ENDIF
3. CURR <---- LINK(CURR,DIR)
DIR <---- SIGN(KEY - KEY(CURR))
IF DIR = 0;
GOTO 5.
ENDIF
GOTO 2.
4. IF DIR = 1;
CURR <---- LINK(CURR,DIR)
ENDIF
GOTO 7.
5. CURR <---- LINK(CURR,1)
IF CURR IS A THREAD;
GOTO 7.
ENDIF
6. IF LINK(CURR,-1) IS A THREAD;
GOTO 7.
ENDIF
CURR <---- LINK(CURR,-1)
GOTO 6.
7. IF CURR=HEAD;
RETURN ERROR
ELSE
RETURN KEY(CURR)
RETURN DATA(CURR)
ENDIF
UNLOCK TREE
______________________________________
The Traversal Procedure The Traversal procedure visits each tree entry in forward sequential order according to the key values of the tree entries. The Postorder Traversal procedure in effect locates the first sequential entry, if any, in a tree and iteratively performs Next Entry searching. A "visit" may include manipulation of the entry or associated fields such as data alteration or retrieval. In step 1, the tree is locked and a test is made to determine whether the tree is empty. If the tree is not empty, the right link to the root entry is followed. Step 2 is a loop in which left links are followed until a thread is reached. When a thread is reached, control passes to step 3. In step 3, a test is made to determine whether the current entry is the tree header. If such is the case, control passes to step 8. Otherwise control passes to step 4. In step 4, the current entry is visited and the desired visitation procedure is performed. The right link/thread field is then followed. If the right link/thread field is a thread, control passes to step 3. If the right link/thread field is a link, control passes to step 5. In step 5, left links are followed proceeding from the rightward child of the most recently visited entry in order to find the entry having the next higher key. When a thread is encountered in the leftward direction, control passes to step 3. In step 6, the tree is unlocked and the program exits. Step 6 is reached by passage of the CURR=HEAD test in step 3 or by detection of an empty tree in step 1. For the condition CURR=HEAD to be true in step 3, the tree header must have been reached by following the right link/thread field from a visited entry in step 4. The only entry from which the header can be reached by following a right link/thread field is the entry having the highest key value in the tree. Hence, when the header has been reached in such a fashion, postorder traversal is completed.
______________________________________
1. LOCK TREE
IF LINK(HEAD,1) IS A THREAD;
GOTO 6.
ELSE
CURR <---- LINK(HEAD,1)
ENDIF
2. IF LINK(CURR,-1) IS A THREAD;
GOTO 3.
ELSE
CURR <---- LINK(CURR,-1)
GOTO 2.
ENDIF
3. IF CURR=HEAD;
GOTO 6.
ENDIF
4. VISIT NODE
CURR <---- LINK(CURR,1)
IF CURR IS A THREAD;
GOTO 3.
ENDIF
5. IF LINK(CURR,-1) IS A THREAD;
GOTO 3.
ENDIF
CURR <---- LINK(CURR,-1)
6. UNLOCK TREE
______________________________________
The Insertion Procedure The insertion procedure finds the insertion point in the tree for a new entry having a memory location of NEW; a key, KEY; and a data field, NEW-DATA. If an entry having KEY already exists in the tree, an error flag is returned. The insertion procedure monitors the balance bits of entries visited during the process of finding the insertion point in order to facilitate restoration of tree balance in the event that insertion causes the tree to become unbalanced. The location of a potential rebalancing point is recorded, updated and maintained during the finding process by the use of variables RP and PRP if the key fields have numeric values. If the key fields have alphanumeric values, a path buffer is employed to store a sequence, or map, of the entries visited during the finding phase and to record the location of potential rebalancing points. By recording the location of potential rebalancing points during the finding phase of the insertion procedure before new entries are inserted, considerable computational overhead is reduced relative to a tree maintenance system employing separate insertion and rebalancing procedures. The insertion procedure anticipates whether rebalancing will be necessitated by insertion of a new entry and determines the location about which such rebalancing will occur. For trees having numeric keys, the variable RP is used to hold the location of the entry which would head an unbalanced subtree subsequent to insertion. The variable PRP is used to hold the location of the entry which was visited prior to RP during the finding process so that the proper link/thread fields may be re-connected after rotation of an unbalanced subtree. For trees having alphanumeric keys, a stack, or path buffer, is maintained to hold the entry locations and directional vectors (corresponding in value to balance bias indicators) representing the path followed during the finding routine. Use of the path buffer obviates the need for repetitive and time-consuming alphanumeric key comparisons during rebalancing. The variable BT is used to point to the top of the path buffer. BT is incremented by the path buffer item length, PBIL, each time that an item is placed onto the path buffer by the PUSH(n,DIR) function. The path buffer item length is typically two--one for the entry location and one for the directional vector. Items are read from the path buffer by the function NODE(n) which returns the entry location of the nth item on the path buffer, and by DIR(n) which returns the directional vector associated with NODE(n). The rebalancing point is recorded in the path buffer item which is pointed to by the variable BB. The function of path buffer pointer BB corresponds to the function provided by the variable RP used for numerically-keyed trees. A variable corresponding to PRP in the number case is not needed since the entry in the finding path prior to the rebalancing point is simply the path buffer item prior to that indicated by BB (i.e. the item indexed from BB by -PBIL). The functional operations of the insertion procedure for both numeric and alphanumeric keys are as follows: The tree is locked, the insertion point is found, and the balance condition in an entry above the insertion point corresponding to the potential rebalancing point, if any, is stored in steps 1-4. If the input entry was not found to be already in the tree during steps 1-4, the new entry is linked into the tree in step 5. In steps 6-9, it is determined whether the inserted entry was the root entry and, if not, balance factors above the inserted entry are adjusted. In step 10, it is determined whether no rotation, single rotation, or double rotation is needed in order to rebalance the tree. Step 11 adjusts link/thread fields for the first phase of double rotation. Step 12 adjusts the link/thread fields for the second phase of double rotation which is a single rotation. Step 13 links the rotated subtree back to the entry visited immediately prior to the rebalancing point in the finding path. In step 14, the tree is unlocked and the error flag, if any, is returned.
______________________________________
1. LOCK TREE
If the keys are numbers, go to 2a.
If the keys are strings, go to 2b.
2a. DIR <---- 1
CURR <---- PRP <---- HEAD
RP <---- LINK(CURR,DIR)
3a. IF LINK(CURR,DIR) IS A THREAD;
GOTO 5.
ENDIF
RC <---- CURR
CURR <---- LINK(CURR,DIR)
DIR <---- SIGN(KEY - KEY(CURR))
IF DIR = 0;
ERRFLAG <---- 1
GOTO 14.
ENDIF
4a. IF BAL(CURR)<>0;
PRP <---- RC
RP <---- CURR
ENDIF
GOTO 3a.
2b. DIR <---- 1
BB <---- RB
BT <---- RB - PBIL
CURR <---- HEAD
PUSH(CURR,DIR)
3b. IF LINK(CURR,DIR) IS A THREAD;
GOTO 5.
ENDIF
CURR <---- LINK(CURR,DIR)
DIR <---- SIGN(KEY - KEY(CURR))
IF DIR = 0;
ERRFLAG <---- 1
GOTO 14.
ENDIF
4b. PUSH(CURR,DIR)
IF BAL(CURR)<>0;
BB <---- BT
ENDIF
GOTO 3b.
5. NODES <---- NODES +1
LINK(NEW,DIR) <---- LINK(CURR,DIR)
LINK(CURR,DIR) <---- NEW
LINK(NEW,-DIR) <---- THREAD(CURR)
BAL(NEW) <---- 0
KEY(NEW) <---- KEY
DATA(NEW) <---- NEW-DATA
ERRFLAG <---- 0
6. IF CURR = HEAD;
HEIGHT <---- HEIGHT + 1
GOTO 14.
ENDIF
If keys are numbers, goto 7a.
If keys are strings, goto 7b.
7a. DIR <---- SIGN(KEY - KEY(RP))
RC <---- CURR <---- LINK(RP,DIR)
SD <---- DIR
8a. IF CURR = NEW;
GOTO 10a.
ENDIF
9a. DIR <---- SIGN(KEY - KEY(CURR))
BAL(CURR) <---- DIR
CURR <---- LINK(CURR,DIR)
GOTO 8a.
10a. IF BAL(RP) = 0;
BAL(RP) <---- SD
HEIGHT <---- HEIGHT + 1
GOTO 14.
ENDIF
IF BAL(RP) = -SD;
BAL(RP) <---- 0
GOTO 14.
ENDIF
IF BAL(RP) = SD;
SB <---- 0
CURR <---- RC
IF BAL(RC) = SD;
GOTO 12.
ELSE
GOTO 11.
ENDIF
ENDIF
7b. RC <---- BB + PBIL
8b. IF RC > BT;
GOTO 10b.
ENDIF
9b. BAL(NODE(RC)) <---- DIR(RC)
RC <---- RC + PBIL
GOTO 8b.
10b. CURR <---- NODE(BB)
SD <---- DIR(BB)
IF BAL(CURR) = 0;
BAL(CURR) <---- SD
HEIGHT <---- HEIGHT + 1
GOTO 14.
ENDIF
IF BAL(CURR) = -SD;
BAL(CURR) <---- 0
GOTO 14.
ENDIF
IF BAL(CURR) = SD;
SB <---- 0
CURR <---- NODE(BB + PBIL)
RP <---- NODE(BB)
IF BAL(CURR) = SD;
GOTO 12.
ELSE
GOTO 11.
ENDIF
ENDIF
11. RC <---- LINK(CURR,-SD)
IF LINK(RC,SD) IS A THREAD;
LINK(CURR,-SD) <---- THREAD(RC)
ELSE
LINK(CURR,-SD) <---- LINK(RC,SD)
ENDIF
LINK(RC,SD) <---- CURR
IF BAL(RC) = SD;
SB <---- -SD
ENDIF
IF BAL(RC) = -SD;
BAL(CURR) <---- SD
ELSE
BAL(CURR) <---- 0
ENDIF
CURR <---- RC
12. IF LINK(CURR,-SD) IS A THREAD;
LINK(RP,SD) <---- THREAD(CURR)
ELSE
LINK(RP,SD) <---- LINK(CURR,-SD)
ENDIF
LINK(CURR,-SD) <---- RP
BAL(CURR) <---- 0
If keys are numbers, go to 13a.
If keys are strings, go to 13b.
13a. IF RP = LINK(PRP,-1);
LINK(PRP,-1) <---- CURR
ELSE
LINK(PRP,1) <---- CURR
ENDIF
BAL(RP) <---- SB
GOTO 14.
13b. IF NODE(BB) = LINK(NODE(BB-PBIL),-1);
LINK(NODE(BB - PBIL),-1) <---- CURR
ENDIF
IF NODE(BB) = LINK(NODE(BB - PBIL),1);
LINK(NODE(BB - PBIL),1) <---- CURR
ENDIF
BAL(NODE(BB)) <---- SB
14. RETURN ERRFLAG
If keys are strings, deallocate path buffer
UNLOCK TREE
______________________________________
The Deletion Procedure The entry deletion procedure is more case-intensive than the insertion procedure due to the many permutations of link/thread fields that may be affected by deletion of a particular entry and the fact that several rebalancing operations may be required. During insertion, entries are added to the bottom of the tree and then rotated into position during rebalancing if necessary. Deletion of an entry, however, may be operative at any location with a tree thus requiring a more complicated readjustment procedure. Since there may be more than one rebalancing point subsequent to a deletion, a path buffer is used for deleting and rebalancing entries from both numerically and alphanumerically keyed trees. In steps 1-3, the tree is locked and the target entry is located by a finding procedure. If the finding procedure reaches a thread, an error flag is set indicating that the target entry was not found in the tree. In step 4, the links and/or threads to the target entry are altered to exclude the target entry from the tree. Links and threads associating neighboring entries are re-adjusted to maintain the integrity of the remaining tree. The detailed deletion procedure is divided into the four cases indicated in the deletion procedure by asterisks. It is noted that these four cases are tested in the sequence shown and thus the defining conditions of the cases are logically cumulative. The cases may be re-ordered as desired depending upon the relative likelihood of their occurrence within a given application as long as the defining conditional tests are modified appropriately. The first case, characterized by the target entry having a right thread, is further divided into two separate situations in which the left link/thread field of the target entry is (i) a thread or (ii) a link. Deletion of such a target entry and adjustment of the neighboring pointers in both situations (i) and (ii) are illustrated in FIGS. 6A-6B. In FIG. 6A, the target entry is respectively shown in situations (i) and (ii) as the entry marked CURR within a portion of a tree having neighboring entries A, B, and C. In FIG. 6B, the relationship of the neighboring entries is respectively shown in situations (i) and (ii) subsequent to deletion of the target entry and adjustment of the pointer fields of the neighboring entries. The specific pointers requiring adjustment in these situations are set forth in the procedure beginning at the step marked *1. The second case is characterized by the target entry having a left thread. The right pointer field in this case contains a link to a subsequent entry since the possibility of a right thread was eliminated in the first case. Deletion and pointer adjustment in the second case are illustrated in FIGS. 7A-7B. In FIG. 7A, there is shown the target entry, CURR, having neighboring entries A and B. In FIG. 7B, there is shown the relationship of the neighboring entries subsequent to deletion of the target entry and adjustment of the neighboring entries' pointer fields. The specific pointers requiring adjustment are set forth in the procedure beginning at the step marked *2. The third case is characterized by LINK(LINK(CURR, 1),-1) which is the situation where the left pointer field of the entry indicated by the right pointer field of the target entry is a thread. In this case, both of the pointer fields of the target entry contain links since all cases involving threads from the target entry have been previously eliminated. Deletion and reconstruction of link/thread fields in the third case is shown in FIGS. 8A-8B. In FIG. 8A, there is shown the target entry, CURR, having neighboring entries A, B, and C. In FIG. 8B, there is shown the relationship of the neighboring entries subsequent to deletion of the target entry and adjustment of the neighboring link/thread fields. The specific link/thread pointers requiring adjustment are set forth in the procedure beginning at the step marked, *3. In the fourth case, the target entry is replaced by its NEXT entry, or the entry of the tree having the next higher key value. Two situations arise in this case depending upon whether the NEXT entry has (i) a right thread or (ii) a right link. Deletion of the target entry and neighboring pointer adjustment in these situations are shown in FIGS. 9A-9B. In FIG. 9A, there is respectively shown the target entry CURR having neighboring entries as defined by situations (i) and (ii). In FIG. 9B, there is respectively shown the same portion of the trees defined by situations (i) and (ii) subsequent to deletion of the target entry and adjustment of the neighboring pointer fields. The specific pointers requiring adjustment are set forth in the procedure beginning at the step marked, *4. In step 5, a loop is started to locate rebalancing points, if any, and to make any necessary adjustments to the balance bits of the entries in the search path to the deletion point. The search path is followed in reverse by pulling items off of the path buffer that was formed during the finding phase. In step 5, the top item in the path buffer is removed from stack. If the resulting entry is the HEAD entry (indicating that the rebalancing loop has reached the beginning of the path buffer), then the deletion procedure proceeds to step 8. Otherwise, the procedure proceeds to step 6. In step 6, the balance bias of the current entry removed from the stack in step 5 is tested to compared to the search path direction taken from the current entry to determine whether rebalancing is required and to adjust the balance bits of the current entry. If the balance bias of the current entry is zero, the bias is reset to the opposite of the search path direction from the current entry and the deletion procedure proceeds to exit at step 8. No further reverse traversal of the search path is needed since a properly maintained tree would have been balanced above a zero-biased entry prior to deletion of the target. Hence, the deletion in this case causes the zero-biased entry to become biased toward the opposite side of the search direction taken from the entry. If in step 6, the balance bias of the current entry is in the same direction as the path vector (BAL(CURR)=DIR ), then the balance bits of the current entry are set to indicate a zero-bias since the deletion caused the subtree below the current entry to become more balanced. The deletion routine then returns to step 5 to obtain the previous path buffer entry. Rebalancing is required whenever it is found in step 6 that the deletion occurred on the side of the current entries subtree that is opposite to the current balance bias (this third conditional test is shown in the routine for purposes of better understanding the routine, but such an explicit statement is superfluous since it is the only remaining condition possible). The appropriate rotation of the subtree is then performed. A loop control variable, CNT, is set during rotation to indicate whether the required rotation eliminated the need to continue following the search path in reverse for further rebalancing or balance bias correction. The procedure then proceeds to step 7. In step 7, the head of the rotated subtree is linked into the rest of the tree at the location preceding the current rebalancing point. The loop control variable, CNT, is tested to determine whether more rotations may be required. If such is the case, the procedure returns to step 5. Otherwise control passes to step 8. In step 8, the memory space previously occupied by the now-deleted entry is released to the memory manager, the NODES or Number of Entries field 41 of the tree header is decremented, the error flag is returned to the invoking process, and the tree is unlocked.
__________________________________________________________________________
1 LOCK TREE
BT <---- RB - PBIL
DIR <---- 1
CURR <---- HEAD
2. IF LINK(CURR,DIR) IS A THREAD;
ERRFLAG <---- 1
GOTO 9.
ENDIF
3. PUSH(CURR,DIR)
CURR <---- LINK(CURR,DIR)
DIR <---- SIGN(KEY - KEY(CURR))
IF DIR = 0;
ERRFLAG <---- 0
GOTO 4.
ENDIF
GOTO 2.
*1
IF LINK(CURR, 1) IS A THREAD;
IF LINK(CURR,-1) IS A THREAD;
LINK(NODE(BT),DIR(BT)) <---- LINK(CURR,DIR(BT))
GOTO 5.
ELSE
LINK(LINK(CURR,-1),1) <---- LINK(CURR,1)
LINK(NODE(BT),DIR(BT)) <---- LINK(CURR,-1)
GOTO 5.
ENDIF
ENDIF
*2
IF LINK(CURR,-1) IS A THREAD;
LINK(LINK(CURR,1),-1) <---- LINK(CURR,-1)
LINK(NODE(BT),DIR(BT)) <---- LINK(CURR,1)
GOTO 5.
ENDIF
*3
IF LINK(LINK(CURR,1),-1) IS A THREAD;
LINK(LINK(CURR,1),-1) <---- LINK(CURR,-1)
LINK(NODE(BT),DIR(BT)) <---- LINK(CURR,1)
BAL(LINK(CURR,1)) <---- BAL(CURR)
PUSH(LINK(CURR,1),1)
LINK(PREV(CURR),1) <---- THREAD(LINK(CURR,1))
GOTO 5.
ENDIF
*4
PUSH(CURR, 1)
SBT <---- BT
CURR <---- LINK(CURR,1)
4a. IF LINK(CURR,-1) IS A THREAD;
GOTO 4b.
ENDIF
PUSH(CURR,-1)
CURR <---- LINK(CURR,-1)
GOTO 4a.
4b. IF LINK(CURR,1) IS A THREAD;
LINK(NODE(BT),-1) <---- THREAD(CURR)
ELSE
LINK(NODE(BT),-1) <---- LINK(CURR,1)
ENDIF
SBT1 <---- BT
BT <---- SBT
LINK(CURR,-1) <---- LINK(NODE(BT),-1)
LINK(CURR,1) <---- LINK(NODE(BT),1)
BAL(CURR) <---- BAL(NODE(BT))
LINK(PREV(NODE(BT)),1) <---- THREAD(CURR)
BT <---- BT - PBIL
LINK(NODE(BT),DIR(BT)) <---- CURR
PUSH(CURR,1)
BT <---- SBT1
5. CURR <---- NODE(BT)
DIR <---- DIR(BT)
BT <---- BT - PBIL
IF CURR = HEAD;
HEIGHT <---- HEIGHT - 1
GOTO 8.
ENDIF
6. IF BAL(CURR) = 0;
BAL(CURR) <---- -DIR
GOTO 8.
ENDIF
IF BAL(CURR) = DIR;
BAL(CURR) <---- 0
GOTO 5.
ENDIF
IF BAL(CURR) = -DIR;
RC <---- LINK(CURR,-DIR)
IF BAL(RC) = BAL(CURR);
BAL(CURR) <---- 0
BAL(RC) <---- 0
CNT <---- 1
GOTO 7.
ENDIF
IF BAL(RC) = -BAL(CURR);
IF BAL(CURR) = BAL(LINK(RC,DIR));
BAL(CURR) <---- -BAL(LINK(RC,DIR))
BAL(RC) <---- 0
ELSE
BAL(CURR) <---- 0
BAL(RC) <---- -BAL(LINK(RC,DIR))
ENDIF
BAL(LINK(RC,DIR)) <---- 0
RC1 <---- LINK(RC,DIR)
IF LINK(RC1,-DIR) IS A THREAD;
LINK(RC,DIR) <---- THREAD(RC1)
ELSE
LINK(RC,DIR) <---- LINK(RC1,-DIR)
ENDIF
LINK(RC1,-DIR) <---- RC
RC <---- RC1
CNT <---- 1
GOTO 7.
ENDIF
IF BAL(RC) = 0
BAL(RC) <---- -BAL(CURR)
CNT <---- 0
ENDIF
ENDIF
7. IF LINK(RC,DIR) IS A THREAD;
LINK(CURR,-DIR) <---- THREAD(RC)
ELSE
LINK(CURR,-DIR) <---- LINK(RC,DIR)
ENDIF
LINK(RC,DIR) <---- CURR
LINK(NODE(BT),DIR(BT)) <---- RC
IF CNT = 1
GOTO 5.
ENDIF
8. Deallocate Memory
NODES <---- NODES - 1
RETURN ERRFLAG
UNLOCK TREE
__________________________________________________________________________
The terms and expressions which have been employed are used as terms of description and not of limitation and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described, or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed.
|
Same subclass Same class Consider this |
||||||||||
