Client/server database system with methods for providing clients with server-based bi-directional scrolling at the server5918224Abstract System and methods are described for integrating the navigational semantic model of PC DBMS environments into the set-oriented model of SQL database environments. More particularly, a Client/Server system of the present invention provides native navigational support on the Server side, without emulation through a multitude of SQL commands from the Client. The Database Server surfaces to the Client an API (Application Programming Interface) having calls corresponding to each verb taken from a navigational context. The Server, in turn, is modified to include knowledge of how to process each navigational verb. To support this functionality, the indexing of the Server is modified to provide bi-directional scrolling. In this manner, the system is able to provide more efficient, higher performance navigational support in an SQL database context. Claims What is claimed is: Description COPYRIGHT NOTICE
______________________________________
blr.sub.-- begin
blr.sub.-- message
.
.
.
blr.sub.-- loop
blr.sub.-- assign
blr.sub.-- eoc
______________________________________
As shown, tt includes variable declarations, assignments, loop constructs, block delimiters (e.g., begin and end), messages, and the like. The "messages" defined within the blr.sub.-- string are the messages passed back and forth, between the Client and the Server. A blr.sub.-- string defines one or more actions to be carried out at the Server (e.g., conditional logic). Since the blr.sub.-- string already represents the low-level native language of the Server, the Server may map it (i.e., compile it) into a parse tree structure suitable for conversion into machine language instructions (for the target processor). In other words, the string is compiled into a tree which the SQL Engine knows how to execute. In a preferred embodiment, using Interbase from Borland, the Engine itself structured as a series of actual machine instructions (conceptually) arranged as a large switch or case statement, for execution upon receipt of a particular node of the parse tree. Particular blr low-level verbs are added to the system. For instance, the navigational command of "SEEK" (corresponding, for example, to the dBASE command of SEEK), is implemented as a blr.sub.-- seek command. This will now be examined in further detail. C. Providing navigational capability to a set-oriented (SQL) database server According to the present invention, the database Server allows the Client to define a navigational string. Since the Server supports the underlying navigational semantics of the Client, no semantic mapping is necessary. In an exemplary embodiment, the following navigational capabilities are supported: Set current index or set to natural order; go to next record; go to previous record; seek forward or backward a specified number of records; go to the beginning or end of the table; search for a key value, including a partial key; and find a particular record in a navigational stream. The following describes the BLR to perform these operations. (1) BLR streams To support navigational semantics, a stream expression is introduced. This expression is similar to a record selection expression (RSE) used by SQL-type requests. Only a subset of the capabilities is supported, in order to reduce the amount of error checking needed. The stream expression is used to specify a single table to use in retrieving records. The "blr.sub.-- stream" construct is a limited version of "blr.sub.-- rse," in that joins, sorts and aggregates are not permitted. The syntax for the blr.sub.-- stream expression is: blr.sub.-- stream {blr.sub.-- relation relation.sub.-- name.vertline.blr.sub.-- rid id}stream ›boolean-expression! blr.sub.-- end Stream numbers used in blr.sub.-- stream and blr.sub.-- rse expressions are unique. The boolean-expression is a restriction on the records that will be retrieved. Its syntax is identical to the boolean-expression for the blr.sub.-- rse expression, and it is used to implement filters and other restrictions on records in a navigational stream. (2) Current index By default, navigation on a stream is done in natural order. The request may change that order dynamically by resetting the current index. This is done with the following syntax: blr.sub.-- set.sub.-- index stream ›index-id! where stream refers to a previously declared blr.sub.-- stream number, and index-id is a value expression evaluating to the index ID to use. All subsequent seeks on this stream are done with respect to this index. An index-id of 0 indicates that the index should be reset to natural order. (3) Seeking through a stream To move through an RSE, the blr.sub.-- for construct is used to automatically loop forward through the records. The blr.sub.-- stream statement is used to declare a stream anywhere within the request (provided it is declared before it is referenced). The new statement "blr.sub.-- seek" is used to move forward and backward through the stream. The syntax for the seek statement is: blr.sub.-- seek stream direction offset where stream refers to a blr.sub.-- stream number, offset is an expression evaluating to an unsigned long, and direction is a single byte indicating: (1) blr.sub.-- forward: move offset records forward from the current record; (2) blr.sub.-- backward: move offset records backward from the current record; (3) blr.sub.-- bof.sub.-- forward: start at the beginning of the stream and move forward offset records; and (4) blr.sub.-- eof.sub.-- backward: start at the end of the stream and move backward offset records. (4) Finding a Key Value The statement "blr.sub.-- fmd" sets the current record in a stream to a record of a given key value. The syntax for the find statement is: blr.sub.-- fmd stream operator backwards arg.sub.-- count value-expression1, . . . , value-expressionN where stream is defined as a previous blr.sub.-- stream number and operator is one of the following values: blr.sub.-- lss, blr.sub.-- leq, blr.sub.-- eq, blr.sub.-- geq, blr.sub.-- gtr. These indicate the comparison operation to perform on the record stream. For example, if a "blr.sub.-- geq" is specified, the Server will find the first record whose key value is greater than or equal to the specified key. The backwards argument is a byte of value 0 to indicate that the search direction is forwards, or value 1 to indicate that the search direction is backwards; arg.sub.-- count is a count of the number of value-expressions to follow, where value-expression is compared with a single segment of the key in the current index. The value-expression may be a partial key operation, such as searching for last name on an index defined on last name, first name, in which case arg.sub.-- count is less than the actual number of keys in the index. (5) Finding a Record To find a particular record when the dbkey is known, the following statement was added: blr.sub.-- find.sub.-- dbkey stream dbkey where stream is a previously defined stream number, and dbkey is an eight-byte quantity evaluating to a dbkey previously retrieved via the blr.sub.-- dbkey expression. Whether the table is specified in indexed or natural order, the find.sub.-- dbkey statement fetches the record specified by the dbkey, and sets the current position of the stream to the position of that record. Whether the stream was set to the same index when the dbkey was fetched is not important. Internal operation A. Scrollable cursors FIGS. 3A and 3B are block diagrams contrasting the approach of the present invention for providing scrollable cursors (FIG. 3B) with a conventional approach (FIG. 3A). In FIG. 3A, the Client 310, which is connected to a Database Server 350 via a network, emulates the scrolling locally, by using a multitude of SQL commands together with a local cache. There are numerous disadvantages to this approach. For one, issuing a large number of SQL commands greatly increases network traffic, thereby degrading the performance of the network. Moreover, the multitude of SQL commands places a tremendous burden on the Database Server 350, as it must expend precious resources parsing and processing each one of the many commands. To make the approach of FIG. 3A function in real-time, a cache memory is required, for offsetting the otherwise poor performance of the arrangement. Employing a cache, particularly a large one, is not without difficulties, however. In particular, additional methods (and associated overhead) are required for maintaining concurrency between the records in the cache and corresponding records resident on the Database Server. In this regard, the cache is employed more for optimizing network I/O, instead of optimizing server load. As an additional disadvantage, a large cache further complicates the task of the client (i.e., specifically the local engine and SQL link or interface) in the instance of updated records. Since the copy of records in the cache may be stale, the client must undertake additional steps to ensure that a record which has been updated locally has not, in fact, already been changed at the server. FIG. 3B is a block diagram illustrating modification of the System 300, now System 300a, in accordance with the present invention. As shown, the Client (now Client 310a) includes a driver which formulates a single BLR statement, instead of a multitude of SQL statements. Further, the Database Server (now Database Server 350a) is modified as follows. First, native navigational support is added to the Server, as shown at 351. In other words, the Server, which is ordinarily set-based, is "taught" navigational techniques. In operation, the Database Server 350a receives the BLR statement from the Client. This is processed by the Server to generate a node tree of executable commands. The node tree is in turn executed by the Server, specifically by a looper routine using recursive descent methodology. The looper routine, which may be viewed as a large "switch" statement, is modified to provide direct navigational support. Looper includes, for example, case arms for seek, find, skip, and the like. In this regard, the looper routine functions as a dispatcher, and the navigational routines function as handlers. These handlers or routines perform their methodology in the context of a cursor. Each of these methods, in turn, "know about" the different way in which the system retrieves data (i.e., traverses indexes). In an exemplary embodiment, the looper method may be constructed in the C programming language as follows.
__________________________________________________________________________
1:
static NOD (request, in.sub.-- node)
2:
{
3:
/**************************************
4:
*
5:
* l o o p e r
6:
*
7:
**************************************
8:
*
9:
* Functional description
10:
* Cycle thru request execution tree. Return next node for
11:
* execution on stall or request complete.
12:
*
13:
*************************************/
14:
STA impure;
15:
NOD *ptr, *end, handlers, node, top.sub.-- node = NULL, prev.sub.--
node;
16:
TRA transaction;
17:
18:
// . . .
19:
20:
/* This is the main execution loop in the engine; switch based
21:
on the current node type in the execution tree. */
22:
23:
while (noe && |(request->req.sub.-- flags & req.sub.-- stall))
24:
{
25:
switch (node->nod.sub.-- type)
26:
{
27:
// . . .
28:
case nod.sub.-- stream:
29:
node = find (node);
30:
break;
31:
32:
case nod.sub.-- find:
33:
node = find (node);
34:
break;
35:
36:
case nod.sub.-- find.sub.-- dbkey:
37:
case nod.sub.-- find.sub.-- dbkey.sub.-- version:
38:
node = find.sub.-- dbkey (node);
39:
break;
40:
41:
case nod.sub.-- set.sub.-- index:
42:
node = set.sub.-- index (node);
43:
break;
44:
45:
case nod.sub.-- set.sub.-- bookmark:
46:
node = set.sub.-- bookmark (node);
47:
break;
48:
49:
case nod.sub.-- release.sub.-- bookmark:
50:
node = set.sub.-- bookmark (node);
51:
break;
52:
53:
case nod.sub.-- end.sub.-- range:
54:
node = RNG.sub.-- end (node);
55:
break;
56:
57:
case nod.sub.-- delete.sub.-- range:
58:
ndoe = RNG.sub.-- delete (node);
59:
break;
60:
61:
case nod.sub.-- delete.sub.-- range:
62:
if (request->req.sub.-- operation == req.sub.-- evaluate)
63:
{
64:
RNG.sub.-- delete.sub.-- ranges (request);
65:
request->req.sub.-- operation = req.sub.-- return;
66:
}
67:
node = node->nod.sub.-- parent;
68:
break;
69:
70:
case nod.sub.-- range.sub.-- relation:
71:
node = RNG.sub.-- add.sub.-- relation (node);
72:
break;
73:
74:
case nod.sub.-- release.sub.-- lock:
75:
if (request->req.sub.-- operation == req.sub.-- evaluate)
76:
{
77:
desc = EVL.sub.-- expr (node->nod.sub.-- arg ›e.sub.-- rellock.sub.--
lock!);
78:
RLCK.sub.-- release.sub.-- lock (* (LCK*) desc->desc.sub.-- address);
2
79:
request->req.sub.-- operation = req.sub.-- return;
80:
}
81:
node = node->nod.sub.-- parent;
82:
break;
83:
84:
case nod.sub.-- release.sub.-- locks:
85:
if (request->req.sub.-- operation == req.sub.`3 evaluate)
86:
{
87:
RLCK.sub.-- release.sub.-- locks (request->req.sub.-- attachment);
88:
request->req.sub.-- operation = req.sub.-- return;
89:
}
90:
node = node->nod.sub.-- parent;
91:
break;
92:
93:
case nod.sub.-- force.sub.-- crack:
94:
if (request->req.sub.-- operation == req.sub.-- evaluate)
95:
{
96:
RSE.sub.-- MARK.sub.-- CRACK (* (RSB*)node->nod.sub.-- arg
›1!,rsb.sub.-- crack.vertline.rsb.sub.-- forced.sub.-- crack);
97:
request->req.sub.-- operation = req.sub.-- return;
98:
}
99:
node = node->nod.sub.-- parent;
100:
break;
101:
102:
case nod.sub.-- reset.sub.-- stream:
103:
if (request->req.sub.-- operation == req.sub.-- evaluate)
104:
{
105:
RSE.sub.-- reset.sub.-- position (* (RSB*) node->nod.sub.-- arg
›e.sub.-- reset.sub.-- from.sub.-- rsb!,
106:
request->req.sub.-- rpb+(USHORT)node->nod.sub.-- arg ›e.sub.--
reset.sub.-- to .sub.-- stream!);
107:
request->req.sub.-- operation = req.sub.-- return;
108:
}
109:
node = node->nod.sub.-- parent;
110:
break;
111:
112:
default:
113:
BUGCHECK (168); /* action not yet implemented */
114:
}
115:
}
116:
117:
// . . .
118:
119:
return node;
120:
}
__________________________________________________________________________
The above case arms show the processing associated with particular verbs. The nod.sub.-- find (line 32) case arm finds a record based on a key value. The nod.sub.-- find.sub.-- dbkey (line 36) case arm, on the other hand, finds a record based on a dbkey. The nod.sub.-- set.sub.-- index (line 41) case arm "sets" a particular index, such as done in dBASE, for establishing a particular order of the table. The nod.sub.-- set.sub.-- bookmark and nod.sub.-- release.sub.-- bookmark case arms (lines 45 and 49) sets and releases bookmarks, respectively. Finally, a nod.sub.-- stream is what a BLR stream maps to; it opens a stream. B. Modification of B-tree indexes 1. Overview Providing native navigational support alone is not sufficient for creating scrollable cursors at the Server level. Instead, the basic indexes which support logical views of the Server's relational database tables must themselves be modified to support navigational capability. Moreover, new index traversal methods are required for processing the modified indexes. Before describing these in detail, it is first helpful to examine the detailed structure of indexes employed by the Database Server. 2. Index structure In the system of the present invention, an index is employed to store pointers to all the records in a relation (i.e., table), sorted on the fields for which the index is defined. As shown in FIG. 4A, an index is organized as a tree 400 of pages, collectively known as a B-tree. Trees are organized into levels, numbered from 0 at the lowest level and increasing towards the root of the tree. The tree 400, for example, is organized into three levels. Pages at levels 1 and higher contain pointers to pages at the next lower level. Each page has a sibling pointer which points to the next page in the same level. As shown in FIG. 4B, a B-tree page layout, such as page layout 430, specifies that each page hold a variable number of variable-length nodes. The page is organized as follows. (1) page header; flags for a B-tree page could be: page is first one in its level page has not been propagated to a higher level page is marked for deletion page is part of a descending index (2) page number of the next B-tree page on the same level (3) relation id; redundant info for consistency check (4) index id; redundant info for consistency check (5) length of data in the page (6) level number of the page in the index, starting with 0 for the root page (7) an array of leaf nodes (followed by end-of-page marker) (8) total length of all the nodes (for ease in copying) The end-of-page marker is used to mark the end of the array. In an exemplary embodiment, this is an empty node with the record number set to -2. In addition, an end-of-level marker is used to mark the end of the last page in the level. This is an array node with a record number of -1. The B-tree nodes, which are stored within the B-tree pages, have the following format: (1) record number or page number (pointer): points to record or B-tree page (2) length of prefix: the number of bytes that the key value has in common with the previous record's key value (3) length of data: the number of bytes left after the prefix (4) data: the bytes remaining after the prefix Each B-tree node contains a pointer, either to a record or to another B-tree page. Nodes at level 0 are called leaf nodes; each contains a record number. Nodes at higher levels contain the page number of the B-tree page at the next level. The next three fields in the B-tree node describe the key value of a particular record. The key value is a concatenation of the values of the indexed fields for that record. Key values stored at levels 1 or higher contain the first key value stored on the lower level page, making it possible to step down the tree to find a particular record or range of records. Key values are stored in the index using "prefix compression," which is illustrated by the prefix compression table 440 in FIG. 4C. Since nodes are arranged in sorted order, it is often the case that characters in common exist between successive key values (stored in previous nodes). Prefix compression takes advantage of these characters in common. Table 440 illustrates the approach for an index defined on a character field. Here, the leftmost column describes the value of the indexed field for the first few records in the index. The next three fields contain the values which are stored in the corresponding B-tree node. When a particular page is required at run time, it is read in from disk and "expanded" in system memory: its key values are dynamically expanded into their uncompressed state. When an index is defined, during system operation, the records in the relation or table are sorted on the key field(s) of the index. For each record, the key value and the record number are both inserted in the index. At the outset, only one B-tree page in the index, with a level of 0, exists in the index. Nodes are added successively to the end of the page, using prefix compression to store the key values. When that page is filled, a sibling page is created at level 0. A new root page is allocated at level 1, which points to both level 0 pages. Subsequent records are then added to the sibling page. As new pages are created at level 0, they are added to the root page at level 1. When a level 1 page is filled, a sibling page is allocated. A new root page at level 2 is created, which points to both level 1 pages. The process continues up to a maximum number of levels (e.g., 16 levels). FIG. 4D illustrates a simple B-tree index page 460 with one level, defined on an EMPLOYEES relation. 3. B-Tree modifications for bi-directional navigation (a) Overview Conventionally, SQL database systems only include methods for traversing forward through leaf-level pages of B-tree indexes. One reason that SQL database servers are limited to this forward traversal is that the key value information stored in the pages is compressed using the above-described prefix compression. Since each successive key value eliminates information in common with its predecessor key value, the task of traversing both within a leaf page and between leaf pages is complicated. One cannot simply pick a starting node and then traverse backwards. The actual key value stored at that point cannot be determined as the information in common with prior nodes is not known, since no previous nodes have been visited in this instance. In order to guarantee that one can read any node on the page, it may be necessary to go back to the very beginning of the page (or find the last node which is fully described). Unlike B-tree indexes in a conventional SQL (i.e., set-oriented) database server, the indexes in the system of the present invention are modified to support bi-directional navigation through a cursor, including both intra-page and inter-page navigation. The former is traversing forward and backward within a leaf page; the latter is traversing forward and backward along the leaf pages. Each will be described in turn. (b) Modification for intra-page navigation The indexes are modified so that the Database Server can navigate both forward and backward within a particular leaf-level page of the B-tree indexes themselves. As shown in FIG. 5, each of the B-tree leaf pages 530, upon expansion into expanded leaf pages 540, stores not only uncompressed key values but also the length of the prior key (L.sub.1) and the length of the current key (L.sub.2). This allows the system to go forward to the next node or backward to the previous node, even though the key values at the nodes are prefix compressed. The lengths may be expressed in terms of relative (or even absolute) offsets. The expanded pages themselves are maintained in a buffer cache, which is located (conceptually) between the Database Server and the database table. When a page is expanded, therefore, it is assumed that it will be needed again (i.e., by the current process or by another process). When navigating back and forth in an index, the system can utilize a previously-expanded page, so long as another process is not updating that specific page. When the original B-tree page is updated, the expanded page is released (i.e., freed from memory). Also shown in FIG. 5, each of the leaf pages 530 stores the amount of prefix compression for the page so that the memory buffer required for expansion of that page is known before the actual expansion (i.e., into an expanded index page) is performed. Since the memory buffer required for holding the largest "theoretical" index page which has expanded can be quite large, on the order of 100 megabytes or more, the system maintains a running count of expanded (uncompressed) space required by each node. As a new node is added, the running count is updated accordingly. In this manner, pages can be expanded in an efficient manner. (c) Modification for inter-page navigation The indexes are further modified so that the Database Server can navigate both forward and backward between leaf-level pages of the B-tree indexes. FIG. 6 illustrates inter-page navigation 600 for a tree comprising a root page 610 and leaf page(s) 620. The leaf pages are modified to add, in addition to forward page links, backward page links. For the two-level index 600 in FIG. 6, each leaf page on disk stores two page numbers: one pointing forward (pointer 601) and another pointing backward (pointer 603). Thus, each page stores a page pointer or ID for identifying both the previous and the next (logical) page (i.e., in chain of pages). The pages themselves need not be stored contiguously on disk. Instead, the system, using the page identifiers or numbers stored on disk, loads relevant pages on an "as needed" basis. Also shown, the root page 610 stores page numbers so that the root page stores key/page number combinations. Thus, the root page resembles the leaf pages in that it stores key values which are prefix compressed; instead of storing record numbers, however, the root page stores page numbers. The page numbers, in turn, refer to the pages as they are stored on disk. To access page number 101, for instance, the system would seek to a location which is the value of the page number (e.g., 101) multiplied by the size of the page (e.g., 1024 bytes). The page numbers, therefore, refer to offsets within the particular file set which comprises a single logical file (which may be made up of one or more physical files). Besides adding bi-directional page pointers, the system also modifies the "handoff" of pages, when traversing from one page to another. Consider a scan of an index. Such an operation would occur, for instance, in the following SQL query: SELECT* FROM employees ORDER BY emp# Here, the emp# field represents the field which the table is indexed on. The query would be implemented using the index, since it contains the result, already pre-sorted. Particularly, the query would be satisfied by scanning the emp# index: starting at the beginning of the index and traversing the sequence of leaf pages (as specified by the page pointers). The record numbers would be read from the nodes of the leaf pages and the corresponding records would be retrieved and returned to the user. When traversing the leaf pages, the currently-held leaf page is not released until the system successfully obtains a lock on the next leaf page. The process of locking the next page and releasing the previous page is the "handoff." This is required to ensure concurrency--that another process does not split a page while it is being viewed (i.e., by the current process). If a lock were not obtained on the next page before traversing, it is possible that the next page might be split, thereby generating a new page between the two pages (i.e., between the two pages which the current process is traversing). As part of concurrency control, therefore, pages are handed off in a left-to-right fashion. Consider a query which scans the index in an ascending fashion executed simultaneously with a query which scans the index in a descending fashion. In such an instance, it is possible to trigger a "deadlock." A deadlock occurs when a first process (e.g., ascending scan) waits for a page locked by a second process (e.g., descending scan), which itself is waiting for a page locked by the first process. If no other precautions are undertaken, deadlocks will occur when pages are handed off not only from left-to-right, but also from right-to-left. C. Navigation methods 1. Intrapage Intrapage navigation is the process of moving among records on a single page. The workhorse for doing the intrapage navigation is a NA.sub.-- get.sub.-- record function which, in an exemplary embodiment, may be constructed as follows (using C programming language):
__________________________________________________________________________
1:
BOOLEAN NAV.sub.-- get.sub.-- record (
2: RSB rsb,
3: IRSB.sub.-- NAV
impure,
4: RPB *rpb,
5: RSE.sub.-- GET.sub.-- MODE direction)
6:
{
7:
/****************************************
8:
*
9:
* N A V .sub.-- g e t .sub.-- r e c o r d
10:
*
11:
*****************************************
12:
*
13:
* Functional description
14:
* Get a record from a stream, either in
15:
* the forward or backward direction, or the
16:
* current record. This routine must set
17:
* BOF, EOF, or CRACK properly.
18:
*
19:
****************************************/
20:
IDX *idx;
21:
IRB retrieval;
22:
NOD retrieval.sub.-- node;
23:
BTR page;
24:
BTN node, next;
25:
EXP expanded.sub.-- page;
26:
BTX expanded.sub.-- node, expanded.sub.-- next = NULL;
27:
WIN window;
28:
SLONG
number;
29:
USHORT
l;
30:
KEY key, upper, lower;
31:
UCHAR
*p, *q;
32:
33:
idx = (IDX*) ((SCHAR*) impure
34:
+ (SLONG) rsb->rsb.sub.-- arg ›RSB.sub.-- NAV.sub.-- idx.sub.`3
offset!);
35:
36:
// ...
37:
38:
/* find the last fetched position from the index */
39:
window.win.sub.-- page = impure->irsb.sub.-- nav.sub.-- page;
40:
next =
get.sub.-- position (rsb, impure, &window, direction,
41: &expanded.sub.-- next);
42:
MOVE.sub.-- FAST
(impure->irsb.sub.-- nav.sub.-- data, key.key.sub.-- data,
43: impure->irsb.sub.-- nav.sub.-- length);
44:
retrieval.sub.-- node
45:
= (NOD) rsb->rsb.sub.-- arg ›RSB.sub.-- NAV.sub.-- index!; retrieval
46:
47:
/* set the upper (or lower) limit for navigational retrieval */
48:
49:
if ((direction == RSE.sub.-- get.sub.-- forward) && retrieval->irb.sub.
-- upper.sub.-- count)
50:
{
51:
upper.key.sub.-- length = impure->irsb.sub.-- nav.sub.-- upper.sub.--
length;
52:
MOVE.sub.-- FAST ((impure->irsb.sub.-- nav.sub.-- data
53:
+ (SLONG) rsb->rsb.sub.-- arg ›RSB.sub.-- NAV.sub.-- key.sub.--
length!),
54:
upper.key.sub.-- data, upper.key.sub.-- length);
55:
} else if ((direction == RSE.sub.-- get.sub.-- backward)
56: && retrieval->irb.sub.-- lower.sub.-- count)
57:
{
58:
lower.key.sub.-- length = impure->irsb.sub.-- nav.sub.-- lower.sub.--
length;
59:
MOVE.sub.-- FAST ((impure->irsb.sub.-- nav.sub.-- data +
60:
(SLONG) rsb->rsb.sub.-- arg ›RSB.sub.-- NAV.sub.-- key.sub.--
length!),
61:
lower.key.sub.-- data, *lower.key.sub.-- length);
62:
}
63:
/* Find the next interesting node.
64:
If necessary, skip to the next page */
65:
66:
for (;;)
67:
}
68:
node = next;
69:
expanded.sub.-- node = expanded.sub.-- next;
70:
if (node)
71:
number = BTR.sub.-- get.sub.-- quad (BTN.sub.-- NUMBER (node));
72:
page = (BTR) window.win.sub.-- buffer;
74:
/* in the backwards case, check to make sure we haven't hit the
75:
beginning of a page, and if so fetch the left sibling page.*/
76:
77:
if (direction == RSE.sub.-- get.sub.-- backward)
78:
{
79:
if (node < (BTN) page->btr.sub.-- nodes)
80:
{
81:
expanded.sub.-- page = window.win.sub.-- expanded.sub.-- buffer;
82:
83:
/* if there is no left sibling, we are at bof */
84:
85:
if (|page->btr.sub.-- left.sub.-- sibling)
86:
{
87:
RSE.sub.-- MARK.sub.-- CRACK (rsb, rsb.sub.-- bof);
88:
break;
89:
}
90:
91:
/* otherwise, find the page to the left
92:
and go to the end of it */
93:
94:
page = BTR.sub.-- left.sub.-- handoff (&window, page, LCK.sub.--
read);
95:
expanded.sub.-- page = NAV.sub.-- expand.sub.-- index (&window,
NULL.sub.-- PTR);
96:
next = BTR.sub.-- last.sub.-- node (page, expanded.sub.-- page,
&expanded.sub.-- next);
97:
continue;
98:
}
99:
}
100:
else
101:
/* In the forwards case, check for end of page. If we find
102:
it, do a simpie handoff to the right sibling page. */
103:
104:
{
105:
if (number == END.sub.-- LEVEL)
106:
{
107:
RSE.sub.-- MARK.sub.-- CRACK (rsb, rsb.sub.-- eof);
108:
break;
109:
}
110:
111:
/* if find the end of the page, go to the page on the right */
112:
113:
if (number == END.sub.-- BUCKET)
114:
{
115:
page = (BTR) window.win.sub.-- buffer;
116:
page = (BTR) HAN.DOFF (&window, page->btr.sub.-- sibling,
117:
LCK.sub.-- read, pag.sub.-- index);
118:
119:
next = (BTN) page->btr.sub.-- nodes;
120:
If (expanded.sub.-- age = window.win.sub.-- expanded.sub.-- buffer)
121:
expanded.sub.-- next = (BTX) expanded.sub.-- age->exp.sub.--
nodes;
122:
continue;
123:
}
124:
}
125:
126:
/* Build the current key value from the prefix
127:
and current node data. */
128:
129:
if (expanded.sub.-- node)
130:
{
131:
if (1 = BTN.sub.-- LENGTH (node) + BTN.sub.-- PREFIX (node))
132:
{
133:
p = key.key.sub.-- data;
134:
q = expanded.sub.-- node->btx.sub.-- data;
135:
do *p++ = *q++; while (--1);
136:
}
137:
}
138:
139:
else
140:
{
141:
if (1 = BTN.sub.-- LENGTH (node))
142:
{
143:
As shown (at lines 2-5), the function is invoked with four parameters: rsb, impure, rpb, and direction. Direction simply indicates whether the method is to operate in a forward or backward direction. Impure is a pointer to an area which is the variable-link tail of a request--that is, the data area of a request. In the context of this routine, the area is used for storing index descriptions, for example. The rsb and rpb parameters require further explanation. The rsb parameter defines a Record Source Block. This is a data member describing the record stream and, thus, acts as a descriptor for the record. In an exemplary embodiment, the record source block may be constructed as follows:
______________________________________
/* Record Source Block */
typedef struct rsb {
struct blk rsb.sub.-- header;
RSB.sub.-- T
rsb.sub.-- type;
UCHAR rsb.sub.-- stream;
USHORT rsb.sub.-- count;
USHORT rsb.sub.-- flags;
ULONG rsb.sub.-- impure;
ULONG rsb.sub.-- cardinality;
struct rsb
*rsb.sub.-- next;
struct rel
*rsb.sub.-- relation;
struct str
*rsb.sub.-- alias;
struct prc
*rsb.sub.-- procedure;
struct fmt
*rsb.sub.-- format;
struct nod
*rsb.sub.-- any.sub.-- boolean;
struct rsb
*rsb.sub.-- arg ›1!;
} *RSB;
______________________________________
The data members function as follows: (a) rsb.sub.-- header: rsb header (b) rsb.sub.-- type: type of rsb (c) rsb.sub.-- stream: stream, if appropriate (d) rsb.sub.-- count: number of sub arguments (e) rsb.sub.-- flags: flags (f) rsb.sub.-- impure: offset to impure area (g) rsb.sub.-- cardinality: estimated cardinality of stream (h) *rsb.sub.-- next: (pointer to) next rsb, if appropriate (i) *rsb.sub.-- relation: (pointer to) relation, if appropriate (j) *rsb.sub.-- alias: (pointer to) SQL alias for relation (k) *rsb.sub.-- procedure: (pointer to) procedure, if appropriate (l) *rsb.sub.-- format: (pointer to) format, if appropriate (m) *rsb.sub.-- any.sub.-- boolean: (pointer to) any/all boolean The other parameter, rpb, describes a Record Parameter Block or RPB data structure:
______________________________________
/* Record Parameter Block */
typedef struct rpb {
SLONG rpb.sub.-- number;
SLONG rpb.sub.-- transaction;
struct rel
*rpb.sub.-- relation;
struct rec
*rpb.sub.-- record;
struct rec
*rpb.sub.-- prior;
struct srpb
*rpb.sub.-- copy;
struct rec
*rpb.sub.-- undo;
USHORT rpb.sub.-- format.sub.-- number;
SLONG rpb.sub.-- page;
USHORT rpb.sub.-- line;
SLONG rpb.sub.-- f.sub.-- page;
USHORT rpb.sub.-- f.sub.-- line;
SLONG rpb.sub.-- b.sub.-- page;
USHORT rpb.sub.-- b.sub.-- line;
UCHAR *rpb.sub.-- address;
USHORT rpb.sub.-- length;
USHORT rpb.sub.-- flags;
struct win rpb.sub.-- window;
} RPB;
______________________________________
The data members function as follows: (a) rpb.sub.-- number: record number in relation (b) rpb.sub.-- transaction: transaction number (c) *rph.sub.-- relation: (pointer to) relation of record (d) *rpb.sub.-- record: (pointer to) final record block (e) *rpb.sub.-- prior: (pointer to) prior record block if this is a delta record (f) *rpb.sub.-- copy: (pointer to) rpb copy for singleton verification (g) *rpb.sub.-- undo: (pointer to) the first version of data if this is a second modification (h) rpb.sub.-- format.sub.-- number: format number in relation (i) rpb.sub.-- page: page number (j) rpb.sub.-- line: line number on page (k) rpb.sub.-- f.sub.-- page: fragment page number (l) rpb.sub.-- f.sub.-- line: fragment line number on page (m) rpb.sub.-- b.sub.-- page: back page (n) rpb.sub.-- b.sub.-- line: back line (o) *rpb.sub.-- address: address of record without header (p) rpb.sub.-- length: length of record (q) rpb.sub.-- flags: housekeeping flags (r) win: window into table When a record is read in by the system, the information for the record (e.g., stored in the record header) is copied to the Record Parameter Block. This, in turn, stores a pointer to the actual record (i.e., data) itself. In general, the NAV.sub.-- get.sub.-- record method operates by retrieving a record from the record stream, either in a forward or backward direction (or the current record). The routine then sets BOF (Beginning of File), EOF (End of File), and Crack (i.e., current crack--position before or after a record). Collectively, these are logical constructs which assist in navigation. The record source block (rsb) stores a flag indicating a current state of the record stream--that is, whether at BOF, EOF, or crack. The record stream description, in this context, is a description of the table (i.e., stream of records). Thus, rsb describes that stream. The impure area, for this routine, stores all the indexes for this record stream (table). Also, recall that there is a node in the prefix compressed page and a corresponding node on the expanded page (i.e., the "expanded node"). Those two pointers move in lockstep during the method; if one is updated, so is the other. The method begins by setting up state information about the current location (i.e., where the user is currently positioned, logically, in the table). The method sets up a "window block" which is a logical data structure into which a data page may be retrieved. This creates a "window" into the database file. At this point, the window block is set to a particular page. The method proceeds to retrieve the corresponding BTN--the B-tree node for that page which is at the current position. In other words, at this point the method gets the page number from the impure area and then fetches that page (i.e., B-tree leaf page); this tells the system the current position in the table. Next, the method extracts the data from the B-tree node and stores it in a local key variable. Specifically at this point, the method reads the key from the impure area and copies it into a local key variable. This is done in preparation for retrieving data via an index. The method then sets the end condition: the key value at which the method will stop retrieving data. The end condition sets an upperbound (i.e., a value which is not to be exceeded) when traversing in a forward direction, and sets a lowerbound (i.e., a value below which further retrieval is not desired) for backward traversal. At this point, the method has performed initialization and found the previous node. The method proceeds to find the next node as follows. A loop is established to look for the next node, either forward or backward (depending on the input parameter). If necessary, the method will cross a page boundary--skipping to the next page. In the backward case, the method fetches the left sibling page. If there is no left sibling, then the method has reached the beginning of the file (BOF). If, however, a left sibling is present, the method traverses backwards by performing a left handoff. Thus, intrapage navigation switches to interpage navigation once the end of the current page (i.e., being navigated) has been reached. In the instance where the beginning of file is reached, the BOF flag is set accordingly. Similarly, the EOF flag is set for the instance where the method has reached the end of the file. In the instance where interpage navigation is required, the handoff is performed using a fault-tolerant method described below (under interpage navigation). In a forward traversal, the method performs a locked handoff to a right sibling page, upon reaching the end of a current page. If the method reaches the end of file, it sets the EOF flag. During the traversal, the method keeps the current key value (i.e., the one stored in the local variable) up to date with the new node. If the method, for instance, has traversed backwards, it must go to the expanded node for retrieving the current key value (because the key values are prefix compressed). For going forward, however, the method need not expand the page. Instead, the key value is set to the current key data plus the prefix. In other words, when traversing forward, the prefix is already known, without having to access the expanded node. The method next checks the boundary limit (i.e., upper or lower limit), for determining whether the boundary has been reached. This is done by comparing the current key against the upper (or lower) value limit. Upon reaching the limit, the crack is set accordingly and the method breaks out of the loop. The method then calls a set.sub.-- position routine for setting the current position to the one found. Recall that when the method was invoked, the current position was determined from the impure area. The call to set.sub.-- position now sets the position of the impure area to reflect the node found. The method then concludes by releasing the logical window (i.e., releasing the lock from the B-tree page). The method may now retrieve the physical data record, which is stored on a data page. If a physical data record is retrieved, the method is done. If not, however, the method continues in the loop, for getting the next node. If the method is successful in finding a record, it returns a Boolean value of true; otherwise, it returns a Boolean value of false. 2. Interpage Left-to-right interpage navigation may be done in a conventional manner, by using a locked handoff. The right-to-left navigation cannot be done in the same conventional manner, however, as deadlocks will result. A fault-tolerant method for traversing prior pages (i.e., right-to-left traversal), yet avoiding deadlocks, is as follows. The current page is released and the system proceeds to the previous page. Once at the previous page, the system checks whether that page's forward pointer points to the page which was just left. If the two are not identical, the system reads the forward pointer of the then-current page and then traverses to the right. It continues this rightward traversal until it reaches the page which it originally departed from. Once there, the system repeats the process of going left until it reaches a page whose forward pointer agrees with the page which was just departed from. Instead of handing off, therefore, the system just assumes at the outset that the new page is correct. That assumption is then verified to confirm that the new page's forward pointer agrees with the page number of the page which was departed from. If the validation does not hold true, the system steps right until it returns to the original page, whereupon it repeats the process until a new page is found whose forward pointer agrees with the page number of the original page. FIG. 7 illustrates the effect of page splitting on the page pointers. Initially, page #100 (shown at 701) stores a Page Pointer 702 pointing to page #144 (shown at 711). Page #100 splits into two pages: page #100 (shown at 701a) and page #205 (shown at 705). As a result of the split, the forward page pointer for record #100 is set to the value of 205 (shown at 702a); page #205 has its forward page pointer set to the value of 144, thereby linking these two pages into the page chain. At this point, however, backward pointers (e.g., backward pointer of page #205) are not updated. Preferably, the first process which traverses backwards and finds an incorrect backward-pointing pointer corrects it. In this manner, the overhead for updating backward pointers is shifted to those clients requiring bi-directional scrollable cursors. For a client or user which does not require bi-directional scrollable cursors, therefore, there is little or no impact on performance since those clients employ a left-to-right traversal methodology which does not include any overhead for backwards traversal. In an exemplary embodiment, a left handoff method may be constructed in the C programming language as follows:
__________________________________________________________________________
1:
BTR BTR.sub.-- left.sub.-- handoff (window, page, lock.sub.-- level)
2:
{
3:
/**************************************
4:
*
5:
* B T R .sub.-- l e f t .sub.-- h a n d o f f
6:
*
7:
**************************************
8:
*
9:
* Functional description
10:
* Handoff a btree page to the left. This is more difficult than
11:
* right handoff because we have to traverse pages without handing
12:
* off locks. (A lock handoff to the left while someone was handing
13:
* off to the right could result in deadlock.)
14:
*
15:
*************************************/
16:
SLONG original.sub.-- age, sibling, left sibling;
17:
WIN fix.sub.-- win;
18:
BTR fix.sub.-- page;
19:
DBB dbb;
20:
21:
/* set local variables to original page and its left sibling */
22:
23:
original.sub.-- page = window->win.sub.-- page;
24:
left.sub.-- sibling = page->btr.sub.-- left.sub.-- sibling;
25:
26:
/* release the current window first; opens a race condition
27:
in which someone else could update the page so we need to be able to
28:
handle a page which is inserted to the left of us */
29:
30:
RELEASE (window);
31:
32:
/* fetch the page to the left */
33:
34:
window->win.sub.-- page = left.sub.-- sibling;
35:
page = (BTR) FETCH (window, lock.sub.-- level, .pag.sub.-- index);
36:
37:
/* if the page is the one we were expecting based on the page number
38:
cached on the original page, we're done */
39:
40:
if ((sibling = page->btr.sub.-- sibling) == original.sub.-- page)
41:
return page;
42:
43:
/* Since not handing off pages, a page could split before
44:
* we get to it. To detect, fetch the left sibling pointer and
45:
* then handoff right sibling pointers until reach the page
46:
* to the left of the page passed to us.
47:
*/
48:
49:
//while no match
50:
51:
while (sibling |= original.sub.-- page)
52:
{
53:
/* do handoff to right */
54:
page =(BTR)HANDOFF(window,page->btr.sub.-- sibling,lock.sub.-- level,
pag.sub.-- index);
55:
sibling = page->btr.sub.-- sibling;
56:
}
57:
58:
fix.sub.-- win.win.sub.-- page = original.sub.-- page;
59:
fix.sub.-- page = (BTR) FETCH (&fix.sub.-- win, LCK.sub.-- write,
pag.sub.-- index);
60:
61:
/* Check to see whether someone already pointed the original
62:
page to point to the new left page, if so just return. */
63:
64:
if (fix.sub.-- page->btr.sub.-- left.sub.-- sibling == window->win.sub.-
- page)
65:
{
66:
RELEASE (&fix.sub.-- win);
67:
return page;
68:
}
69:
70:
/* fix up the original page to point to the new left sibling */
71:
72:
MARK (&fix.sub.-- win);
73:
fix.sub.-- page->btr.sub.-- left.sub.-- sibling = window->win.sub.--
page;
74:
RELEASE (&fix.sub.-- win);
75:
76:
return page;
77:
}
__________________________________________________________________________
The method is invoked with a window, which is a logical window into the current physical page. Page is the page number. Lock level is the level at which the new page is to be locked. In a scan through the table, for instance, the page may be locked at the level of "read." The left handoff method hands off a B-tree page to the left. As noted above, this task is more difficult than a right handoff since pages are traversed without handing off locks. The method initializes local variables to store the original page and its left sibling (lines 21-24). Next, the method releases the current window (line 30). Note in particular that this creates a race condition in which another task could insert a page to the left of the original page. Therefore, additional logic (described below) is required for detecting and handling this condition. After releasing the window, the method fetches the page to the left (lines 32-35). Page is set to point to the B-tree page (line 35). If the page just fetched is the one expected by the method, the method has completed its task and may return page. This determination is made by comparing the page number of the original page to the forward page pointer of the left sibling page (lines 40-41). Otherwise, the page has been split and requires additional processing. This case is handled by fetching the left page (i.e., the next left page) and then continue a right handoff until the method reaches the page to the left of the page passed in. This is done by the "while" loop of lines 51-56. The actual right handoff is done at line 54, by calling a HANDOFF routine. At line 58, the method fixes up the original page to point to the new left page. The method then checks to see whether another process has already pointed the original page to point to the new left page (lines 61-64). In such a case, the method releases the logical window and returns the page. If this has not happened, however, then the method fixes up the original page to point to the next left sibling and thereafter releases the fixed up page and returns page (lines 70-76). (d) Natural order navigation FIG. 8 illustrates traversal of records in natural order (i.e., without an index). The figure shows a sequence of on-disk data pages (DPG constructs) 810 which comprise a table. Each data page comprises a header followed by a plurality of slots (e.g., slot 820), each slot storing an offset/length pair. The slots, in effect, function as a directory to the actual data records themselves (which can be stored any other place on the data page). The slots grow up from the beginning of the data page, and the data records grow down from the end of the data page. The data pages themselves, however, can be physically located anywhere in the database. As a record is deleted, the space occupied by the record is freed (i.e., available for storage of another record). As a new record is added, it is placed on the first free space. As a result, the natural order of the table is random. Because of the semantics of PC DBMS systems, users expect natural order to imply something other than a random arrangement. SQL users, on the other hand, expect no such ordering, unless they explicitly request ordering by using the "ORDER BY" SQL command. To accommodate navigational users, therefore, the Database Server is modified to provide a unique identifier--dbkey--for identifying a particular record. In essence, it serves as a handle for fetching a particular record. In an exemplary embodiment, the dbkey comprises two long words storing a relation (table) ID and a record number. The record number here is a logical construct, derived from the page number combined with the slot number for the data record. To navigate forward through database pages in natural order, the system increments a record number variable and attempts to fetch the record with that number (i.e., page/slot combination). If the record number does not exist, the system simply continues on to the next one. In a similar manner, navigating in reverse in a natural order table comprises decrementing the record number (variable), fetching the record associated with that record number (if any), and continuing on in that fashion. (e) "Bookmark" The Database Server is also modified to provide a "bookmark"--an indication of where the user is currently located in the table. Once a key is found for a particular record, the database server maintains the location as a bookmark. This, in turn, is surfaced to the Client. To explain the concept of a bookmark, it is helpful to understand how records are stored by the Database Server. The actual data for a relation is stored on data pages. Each data page is essentially an array of records with the following format: page header; available data page flags: data page is orphaned--it does not appear in any pointer page page is full a blob or array is on page sequence number of this data page in the relation; for integrity checking relation id of this relation; used for integrity checking number of records or record fragments on this data page an array of record descriptors, each of the format: offset of record or record fragment length of record or record fragment records and record fragments Thus, a data page may be viewed as an array of pointers to records, together with the records themselves. A record descriptor describes the location and size of records stored on a given page. For each record, the page stores a record descriptor in an array at the top of the page. As records are stored, the array grows down the page and the records are stored backwards from the end of the page until they meet in the middle. Records vary in size, so that the number of records that can fit on a data page varies. Records can be erased, leaving holes on the page. Calculation of the amount of free space on page is done by looking at the size of all records on page. If necessary to make room for a larger record, the page win be compressed to remove all holes. When the free space is less than the size of the smallest possible record fragment, the page is considered full. A record is uniquely identified by its record number, known outside the engine (i.e., surfaced to clients) as the dbkey. Conceptually, the record number identifies the offset from the beginning of the relation. The record number generally does not reflect the number of records stored. The record number is calculated by determining how many records could potentially be stored between the beginning of a relation and the current record. This will be explained by way of example. Consider a database with page size of 1024 bytes and a pointer page size of 1000 bytes. For a slot size of 4 bytes, 250 slots are available on the pointer page, and a total of 1008 bytes available on a data page. If the smallest possible record used 22 bytes (e.g., minimum record data of 6 bytes (using padding if necessary), record header of 12 bytes, and record descriptor of 4 bytes), a maximum of 45 records can be stored on a data page. The record number, in turn, is calculated as: <max number of records on previous data pages>+<line number on data page>, where the first value is: <number of data pages stored on previous pages>* 45 and the first value is: <pointer page number>* 250+<data page slot>. The formula for determining record number is thus: <pointer page number>* 250+<data page slot>* 45+<line number>. Note that record numbers are not necessarily sequential. Unused record numbers may arise since most relations have more than 6 bytes of data per record. In addition, records can be modified and erased, leaving extra space on data pages. Using the above-described fault-tolerant method, the system can retrieve the data record corresponding to a given record number (i.e., in natural order). In general, the approach is as follows. First, the system assumes that the key value stored at that offset indicated by the record number has not been updated. Next, the system verifies this assumption by comparing the key value found with the one which is expected. The page number, the offset on the page, and the key value are all cached, so that the information can be readily retrieved. When it is retrieved, however, the information is verified by comparing the two key values. In the instance where the key value has changed (i.e., another client has updated the page), the system goes back to the beginning of the page to again look for the key. If the key cannot be located on that page (e.g., the page has been split), the system continues in a left-to-right direction until the key is located. If the key value is found, then the bookmark is reset to that new position. If the key value is not found (i.e., a key value is located which is past the one sought), on the other hand, the record for that bookmark has been deleted by another client. In that case, the bookmark is set to point to the position (i.e., the "crack") just before the key value located which is greater than the key value sought. The crack is, therefore, set to the position where the prior record was deleted. In an exemplary embodiment, a get.sub.-- bookmark method may be constructed as follows (using the C programming language).
__________________________________________________________________________
1:
void NAV.sub.-- get.sub.-- bookmark (rsb, impure, bookmark)
2:
{
3:
/*************************************
4:
*
5:
* N A V .sub.-- g e t .sub.-- b o o k m a r k
6:
*
7:
*************************************
8:
*
9:
* Functional description
10:
* Return a descriptor containing a pointer
11:
* to a bookmark data structure which describes
12:
* the current location of a navigational stream.
13:
*
14:
************************************/
15:
16:
/* store info necessary to return to this location in the index */
17:
18:
bookmark->bkm.sub.-- number = impure->irsb.sub.-- nav.sub.-- number;
19:
bookmark->bkm.sub.-- age = impure->irsb.sub.-- nav.sub.-- page;
20:
bookmark->bkm.sub.-- incarnation = impure->irsb.sub.-- nav.sub.--
incarn;
21:
22:
/* store current key value, set the key descriptor to point at it */
23:
24:
bookmark->bkm.sub.-- key.sub.-- desc.dsc.sub.-- dtype = dtype.sub.--
text;
25:
bookmark->bkm.sub.-- key.sub.-- dec.dsc.sub.-- length
= impure->irsb.sub.-- nav.sub.-- length;
26:
27:
MOVE.sub.-- FAST (impure->irsb.sub.-- nav.sub.-- data,
bookmark->bkm.sub.-- key.sub.-- data,
28:
impure->irsb.sub.-- nav.sub.-- length);
29:
}
__________________________________________________________________________
The method is invoked with three parameters: rsb, impure, and bookmark. The Record Source Block or rsb (previously-described) identifies the current record stream. The impure area (previously-described) represents state information of a request (i.e., "where" a request is in its current state of execution). Bookmark points to a storage block--a bookmark block--for storing bookmark data. In an exemplary embodiment, a bookmark data structure, BKM, may be constructed as follows:
______________________________________
/* bookmark block, used to hold information about the current position
within an index; a pointer to this block is passed to the user as a
handle to facilitate returning to this position */
typedef struct bkm {
struct blk
bkm.sub.-- header;
struct bkm
*bkm.sub.-- next;
struct dsc
bkm.sub.-- desc;
ULONG bkm.sub.-- handle;
SLONG bkm.sub.-- number;
SLONG bkm.sub.-- page;
SLONG bkm.sub.-- incarnation;
SLONG bkm.sub.-- expanded.sub.-- offset;
USHORT bkm.sub.-- offset;
USHORT bkm.sub.-- flags;
struct dsc
bkm.sub.-- key.sub.-- desc;
UCHAR bkm.sub.-- key.sub.-- data ›1!;
} *BKM;
______________________________________
The data members function as follows: (a) bkm.sub.-- header: bookmark header (b) *bkm.sub.-- next: pointer to next bookmark (c) bkm.sub.-- desc: bookmark descriptor describing the bookmark handle (d) bkm.sub.-- handle: bookmark handle containing pointer to this block (e) bkm.sub.-- number: current record number (f) bkm.sub.-- page: current btree page (g) bkm.sub.-- incarnation: last known incarnation number of current btree page (h) bkm.sub.-- expanded.sub.-- offset: offset into expanded index page (if it exists) (i) bkm.sub.-- offset: offset into current btree page (j) bkm.sub.-- flags: flag values (k) bkm.sub.-- key.sub.-- desc: descriptor containing current key value (l) bkm.sub.-- key.sub.-- data ›1!: current key value The set.sub.-- bookmark method, at lines 18-20, stores information which enables the system to return to this location (i.e., the current location) in the index. This is done by storing the page number, the page offset (i.e., into the actual B-tree page itself), and "incarnation" (i.e., the number of times which it has been updated). Next, the method stores the current key value, setting a key descriptor to point to it (lines 22-25). This indicates what the fully-expanded key value is at the current location. This information is copied into the impure data area (line 28). The set.sub.-- bookmark routine is the "flipside" of the get.sub.-- bookmark routine. In an exemplary embodiment, a set.sub.-- bookmark method may be constructed as follows (using the C programming language).
__________________________________________________________________________
1:
BOOLEAN NAV.sub.-- set.sub.-- bookmark (rsb, impure, rpb, bookmark)
2:
{
3:
/*************************************
4:
*
5:
* N A V .sub.-- s e t .sub.-- b o o k m a r k
6:
*
7:
*************************************
8:
*
9:
* Functional description
10:
* Set up the impure area so that the current
11:
* position of the stream is that of the
12:
* stored bookmark.
13:
*
14:
**************************************/
15:
16:
/* save the bookmark state in the impure area for the stream */
17:
18:
impure->irsb.sub.-- nav.sub.-- number = bookmark->bkm.sub.-- number;
19:
impure->irsb.sub.-- nav.sub.-- incarnation = bookmark->bkm.sub.--
incarnation;
20:
impure->irsb.sub.-- nav.sub.-- offset = bookmark->b
21:
22:
/* store current key value, set key descriptor to point at it */
23:
impure->irsb.sub.-- nav.sub.-- length = bookmark->bkm.sub.-- key.sub.--
desc.dsc.sub.-- length;
24:
MOVE.sub.-- FAST (bookmark->bkm.sub.-- key.sub.-- data,
impure->irsb.sub.-- nav.sub.-- data,
25:
bookmark->bkm.sub.-- key.sub.-- desc.dsc.sub.-- length);
26:
27:
/* if on a crack, BOF or EOF, don't bother to retrieve a record */
28:
29:
if (rsb->rsb.sub.-- flags & (rsb.sub.-- bof .vertline. rsb.sub.-- eof
.vertline. rsb.sub.-- crack))
30:
return FALSE;
31:
32:
/* go ahead and fetch the physical record into the rph area
33:
return NAV.sub.-- get.sub.-- record (rsb, impure, rpb, RSE.sub.--
get.sub.-- current);
34:
}
__________________________________________________________________________
As before, the method is passed parameters of rsb, impure, and bookmark. In addition, however, it is also passed a record parameter block, rpb. This is so that the current position in the bookmark may be used to set the current record position (as expressed by rpb). At the conclusion of the method, rpb will point to the new record. At lines 16-20, the method stores into the impure area the same data values which were retrieved with the get.sub.-- bookmark method (i.e., number, incarnation, and offset). At lines 23-25, the method stores into the impure area the key value from the bookmark. At lines 27-30, if the current position lies on a crack, BOF, or EOF, the method will not bother retrieving a record and will simply return false (line 30). Otherwise, the method will fetch the physical record into the rpb area and return that record as its result (lines 32-33). A stream may be set to the location of the specified bookmark:
__________________________________________________________________________
1:
static NOD set.sub.-- bookmark (node)
2:
{
3:
/*************************************
4:
*
5:
* s e t .sub.-- b o o k m a r k
6:
*
7:
**************************************
8:
*
9:
* Functional description
10:
* Set a stream to the location of the
11:
* specified bookmark.
12:
*
13:
**************************************/
14:
TDBB tdbb;
15:
REQ request;
16:
BKM bookmark;
17:
USHORT stream;
18:
RPB *rpb;
19:
RSB rsb;
20:
21:
tdbb = GET THREAD.sub.-- DATA; request = tdbb->tdbb.sub.-- request;
22:
BLKCHK (node, type.sub.-- nod);
23:
24:
if (request->req.sub.-- operation == req.sub.-- evaluate)
25:
{
26:
bookmark = BKM.sub.-- lookup (node->nod.sub.-- arg ›e.sub.-- setmark.su
b.-- id!);
27:
stream = (USHORT) node->nod.sub.-- arg ›e.sub.-- setmark.sub.--
stream!;
28:
rpb = &request->req.sub.-- rpb ›stream!;
29:
rsb = *((RSB*) node->nod.sub.-- arg ›e.sub.-- setmark.sub.-- rsb!);
30:
31:
/* check if the bookmark was at beginning or end of file
32:
and flag the rsb accordingly */
33:
34:
RSE.sub.-- MARK.sub.-- CRACK(rsb, 0);
35:
if (bookmark->bkm.sub.-- flags & bkm.sub.-- bof)
36:
RSE.sub.-- MARKCRACK (rsb, rsb.sub.-- bof);
37:
else if (bookmark->bkm.sub.-- flags & bkm.sub.-- eof)
38:
RSE.sub.-- MARK.sub.-- CRACK (rsb, rsb.sub.-- eof);
39:
else if (bookmark->bkm.sub.-- flags & bkm.sub.-- crack)
40:
{
41:
RSE.sub.-- MARK.sub.-- CRACK (rsb, rsb.sub.-- crack);
42:
if (bookmark->bkm.sub.-- flags & bkm.sub.-- forced.sub.-- crack)
43:
RSE.sub.-- MARK.sub.-- CRACK (rsb, rsb.sub.-- forced.sub.-- crack);
44:
}
45:
46:
if (|RSE.sub.-- set.sub.-- bookmark (rsb, rpb, bookmark))
47:
EXE.sub.-- mark.sub.-- crack(rsb,rsb->rsb.sub.-- flags&(rsb.sub.--
crack.vertline.rsb.sub.-- eof.vertline.rsb.sub.-- bof));
48:
49:
request->req.sub.-- operation = req.sub.-- return;
50:
}
51:
52:
return node->nod.sub.-- parent;
53:
}
__________________________________________________________________________
Finally, the bookmark is released by:
______________________________________
1: static NOD release.sub.-- bookmark (node)
2: {
3: /**************************************
4: *
5: * r e l e a s e .sub.-- b o o k m a r k
6: *
7: **************************************
8: *
9: * Functional description
10: * Deallocate the passed bookmark.
11: *
12: **************************************/
13: TDBB tdbb;
14: REQ request;
15:
16: tdbb = GET.sub.-- THREAD.sub.-- DATA; request = tdbb->tdbb.sub.--
request;
17: BLKCHK (node, type.sub.-- nod);
18:
19: if (request->req.sub.-- operation == req.sub.-- evaluate)
20: {
21: BKM.sub.-- release (node->nod.sub.-- arg ›e.sub.-- relmark.sub.--
id!);
22: request->req.sub.-- operation = req.sub.-- return;
23: }
24:
25: return node->nod.sub.-- parent;
26: }
______________________________________
While the invention is described in some detail with specific reference to a single preferred embodiment and certain alternatives, there is no intent to limit the invention to that particular embodiment or those specific alternatives. Thus, the true scope of the present invention is not limited to any one of the foregoing exemplary embodiments but is instead defined by the appended claims.
|
Same subclass | ||||||||||
