Database system with methods for appending data records by partitioning an object into multiple page chains5717919Abstract A Client/Server Database System with improved methods for appending items to an object, such as appending data records to a database table, in the context of a multi-user environment is described. The system includes one or more Clients (e.g., Terminals or PCs) connected via a Network to a Server. The Clients store data in and retrieve data from one or more database tables resident on the Server by submitting SQL commands, some of which specify insert or append operations, for appending records to a table. For enhancing the speed in which multiple appenders (i.e., Clients) can append records, the operation of the server is modified to store an object (e.g., table) as multiple (physical) page chains. From the logical viewpoint, a single (logical) page chain of data pages is presented to each client or user. From the perspective of inserting records, however, the system has multiple page chains to insert into, thereby removing contention among multiple appenders for the last page. Claims What is claimed is: Description COPYRIGHT NOTICE
__________________________________________________________________________
1:
/* XDES: */
2:
/*
3:
** The XDES has xslgroup field whose purpose is to temporarily
4:
** hold any SLGROUPs to be freed till eot. When table unsliced, the
5:
** SLGROUPs to be released are moved from the DES to XDES.xslgroup.
6:
** If the transaction commits, move this list of released SLGROUPs
7:
** to the global free list. If the transaction aborts, we move the
8:
** SLGROUPs back to the DES. This provides convenient and safe way
9:
** of managing the SLGROUPs.
10:
*/
11:
12:
typedef struct xdes
13:
{
14:
15: XLRMARKER
xindir; /* ptr for indirect updating */
16:
17: struct sdes
*xupdate;
/* sdes pointer for updated table */
18: int32 xstate; /* state */
19: short xstatus;
/* status bits */
20: short xmode; /* type of update */
21:
22: /* relevant to slices. */
23: int xpost.sub.-- commit.sub.-- work;
24: /* Is there post commit work? */
25:
26: /*
27: . . .
28: */
29:
30:
31: /* relevant to slices. */
32: int32 xslicehashval;
/* used to choose a slice number
33: ** while inserting into sliced
34: ** table. See hsm.sub.-- strategy().
35: */
36: struct xxs
*xxs; /* XLS state information */
37:
38:
39: /* relevant to slices. */
40: struct slgroup
*xslgroup;
/* SLGROUPs to be freed at eot */
41:
42: /*
43: . . .
44: */
45:
46: DATE xstarttime;
47: spid.sub.-- t
xspid; /* spid of user */
48: short xnlen;
/* length of the transaction name */
49: BYTE xname›MAXXACTNAME!;
/* the name of transaction. */
50:
51:
}XDES;
__________________________________________________________________________
Of the data members contained within the xdes data structure, two are of particular interest: xpost.sub.-- commit.sub.-- work and xslgroup. When a table is unpartitioned, some work remains to be done (e.g., memory de-allocation) after the transaction has committed. Accordingly, xpost.sub.-- commit.sub.-- work is employed as a boolean for indicating whether there is any post-commit work remaining to be done. If the boolean is set, at the end of the transaction, the system will examine the field xslgroup for determining a list of SLGROUPS (i.e., cache groups) which can be placed on the global free list. Another data member relevant to Dataslices is xslicehashval, which is a 32-bit integer used to choose a random slice number for insertion. This is a property of the transaction (i.e., stored in xdes) since it is desired to repeatedly use the same slice throughout the duration of the transaction. In other words, the value is set at the beginning of the transaction and then repeatedly used throughout the duration of the transaction. The next data structure of interest is sdes. The sdes data member serves as a session descriptor, for example, employed when accessing a database table. In an exemplary embodiment, the sdes data structure may be constructed as follows:
__________________________________________________________________________
1:
/* SDES: */
2:
typedef
struct
sdes
3:
{
4: /*
5: . . .
6: */
7:
8: struct lockrec
*sscanlock;
/* lock on sscanbuf */
9: struct lockrec
*sdatalock;
/* lock on sdatabuf */
10: struct
des
*sdesp;
/* pointer to DES */
11:
12: /*
13: . . .
14: */
15:
16: /* relevant to slices. */
17: SLCACHE
*sslcacheptr;
/* current slice's cache descriptor */
18: slnum.sub.-- t
scurslice;
/* current slice. Used for scanning and
19: ** inserting into a sliced table.
20: */
21:
22: struct sdes
*read.sub.-- sdes;/* read session descriptor */
23: int32 srestarts; /* # of times scan has restarted */
24: BYTE sstat2.sub.-- vec›2!; /* more status bits */
25: uint16 sshort.sub.-- spare; /* short spare */
26: OBJ.sub.-- STATS
sdes.sub.-- stats;/* Structure to collect statistics */
27: VS.sub.-- STRATEGY
svs.sub.-- strategy; /* virtual space strategy for scan */
28: int32 sindlevel; /* leaf or non-leaf level of current scan */
29: int slkprom.sub.-- lvl; /* Lock promotion info */
30:
} SDES;
__________________________________________________________________________
The sdes includes a pointer to des which is a descriptor for a particular table object. The sdes stores housekeeping information, such as which record is currently locked, whether a scan is open, and the like. Of particular relevance to Dataslices is SLCACHE which serves as a descriptor to the current slice's cache. Stored in conjunction with this is slnum.sub.-- t, which is the number of the slice this transaction is currently positioned on. With an understanding of the relevant data structures, the methods of the Dataslices manager may now be understood. The Dataslices manager includes a method, dsm.sub.-- slice.sub.-- table, which slices a table by creating (N-1) new slices, where N is the number of slices for the table. The existing page chain of the table is assigned to the first slice. The page chain for each slice, except for the first slice, contains only one page which is the first and last page of the chain. All existing rows of the table belong to the first slice. In addition, a control page is created for each slice. In an exemplary embodiment, the dsm.sub.-- slice.sub.-- table method may be constructed as follows:
__________________________________________________________________________
1:
/*
2:
**
dsm.sub.-- slice.sub.-- table
3:
**
4:
**
Parameters:
5:
** xdes
(I/O) Transaction descriptor
6:
** sdes
(I/O) Session descriptor of the table
7:
** slicenum
(I/P) # of slices to create. Must be > 1
8:
**
9:
**
Returns:
10:
** SM.sub.-- SUCCESS
Successfully sliced the table.
11:
** other status
Failure status (see slice.h for values)
12:
**
13:
**
Side Effects:
14:
** A new row for each slice is inserted into syspartitions
15:
** system table.
16:
**
17:
**
Assumptions:
18:
** 1.
The transaction must have taken an exclusive lock on the
19:
** table.
20:
** 2.
Caller must have checked if it is legal to slice the table.
21:
** (for instance, this function should not be called to
22:
** slice a work table)
23:
**
24:
*/
25:
26:
smstatus.sub.-- t
27:
dsm.sub.-- slice.sub.-- table (XDES *xdes, SDES *sdes, slnum.sub.-- t
slicenum)
28:
{
29: INDEX indstruct;
/* sysindexes row for the table */
30: int16 segid; /* table's segment id */
31: int32 devcount;
/* # devices segment mapped to */
32: uint32 devmap ›MAXVIRT!;
33: smstatus.sub.-- t
status; /* of slicing operation */
34: slnum.sub.-- t
i; /* slice number */
35:
36: /*
37: ** First, determine the segment number of this object by
38: ** obtaining the sysindexes row (indid = 0) for the table.
39: */
40: ind.sub.-- rowcopy (sdes, 0, &indstruct) ;
41: segid = GETSHORT (&indstruct.indsegment) ;
42:
43: /* Build the device map for the segment */
44: devcount = hsm.sub.-- map.sub.-- objdev (segid,
45: sdes->sdesp->ddbtable, devmap) ;
46: /* Create slices */
47: for (i = 1; i <= slicenum; i++)
48: {
49: if ((status = hsm.sub.-- create.sub.-- slice (xdes, sdes, i,
slicenum,
50: devcount, devmap) ) |= SM.sub.-- SUCCESS)
51: {
52: return status;
53: }
54: }
55:
56: /* Created all slices successfully */
57: return SM.sub.-- SUCCESS;
58:
}
__________________________________________________________________________
As shown, the method is invoked with the previously-described data members, xdes and sdes; slicenum is simply the number of slices to create. The steps of the method are as follows. First, the method determines the segment number of the object by obtaining the sysindexes row for the table (lines 36-41). Each table is mapped to a certain segment in the database. Accordingly, all of the partitions for the table must be mapped to the same segment. The method begins, therefore, by determining which segment the table is mapped to. The information which is returned, a segment ID (segid), is then used to construct a device affinity map (lines 43-44). A device count is established at this point. Now, the method is ready to create slices. This is done at steps 46-54, by establishing a "for" loop for processing each slice in turn (i.e., from the number 1 to the number of slices); the method calls hsm.sub.-- create.sub.-- slice to create the Nth slice. As shown, the information required to create a slice includes the previously-described xdes and sdes descriptors, the slice ID (i), the total number of slices (slicenum), total number of devices (devcount), and a list of devices or "device map" (devmap). A request to slice a table is, therefore, carried out by repeatedly invoking the create slice method. If the process succeeds, the method returns a success value. The create slice method, hsm.sub.-- create.sub.-- slice, establishes an association between a heap object and a slice, and creates a page chain for the slice. Generally, the heap object (e.g., table) already exists (i.e., has at least one slice) before this method is invoked.
__________________________________________________________________________
1:
/*
2:
**
HSM.sub.-- CREATE.sub.-- SLICE
3:
**
Parameters:
4:
** xdes -- ptr to the transaction descriptor.
5:
** sdes -- a session descriptor ptr to the heap object
6:
** for which the slice is being created.
7:
** slnum -- the slice number to create.
8:
** totalslices -- total number of slices being created for
9:
** this object
10:
** totaldevs -- total number IO devices on which the
11:
** the object can exist.
12:
** devmap -- ptr to a bit map of valid virtual devices
13:
** for the object's segment.
14:
**
15:
**
Returns:
16:
** Status.
17:
**
18:
**
Synchronization:
19:
** The caller must have a table level lock.
20:
**
21:
**
Side Effects:
22:
**
23:
*/
24:
smstatus.sub.-- t
25:
hsm.sub.-- create.sub.-- slice(XDES *xdes,
26: SDES *sdes,
27: slnum.sub.-- t
slnum,
28: int32 totalslices,
29: int32 totaldevs,
30: uint32
*devmap)
31:
{
32:
SLCACHE
*slc;
33:
DES *obj;
34:
INDEX indstruct;
35:
INDEX *indp;
36:
BUF *bp;
37:
SLINFO
*info;
38:
SLGROUP
*head;
39:
pgid.sub.-- t
firstpg;
40:
pgid.sub.-- t
ctrlpg;
41:
pgid.sub.-- t
lastpg;
42:
SLGROUP
*slg;
43:
int slgcount; /* Number of SLGROUPs needed. */
44:
VOLATILE SLICE.sub.-- CRT.sub.-- CPY copy;
45:
SLICE.sub.-- CTRLBLK
ctrlblk;
46:
SLICE.sub.-- CTRLBLK
*ctrlblk.sub.-- ptr; /* points into control page */
47:
SYBPARTITION
slice;
48:
SYBPARTITION
*slptr;
49:
BYTE rowbuf›sizeof(SYBPARTITION)!;
50:
SLICE.sub.-- AFFTBL
afftbl;
51:
int32
offset;
52:
TALLY
tally;
53:
54:
/*
55:
** Step 1: Allocate a first page if necessary. This requires
56:
** the affinity map, so set it up here.
57:
*/
58:
59:
/* Zero out the control block. */
60:
MEMZERO((BYTE *)&ctrlblk, sizeof(struct slice.sub.-- ctrlblk));
61:
62:
/* Build the control page affinity map for this slice. */
63:
ctrlblk.scp.sub.-- afftbl.sub.-- count = hsm.sub.-- build.sub.--
slicemap(sdes, slnum,
64: totalslices, totaldevs,
65: devmap,
66: &afftbl);
67:
68:
/*
69:
** If this is the first slice then count up the number of
70:
** pages already allocated to this page chain. There is ALREADY
71:
** a first page, so there is no need to allocate one.
72:
*/
73:
if (slnum == 1)
74:
{
75: /* Get the used page count for the heap (indid = 0) */
76: tally.sub.-- fetch(sdes, 0, &tally);
77: ctrlblk.scp.sub.-- alloccount = tally.t.sub.-- usedpgcnt;
78:
79:
/*
Remember first and last page of the original page chain */
80: firstpg = obj-->dslinfo.sd.sub.-- slice1.sc.sub.-- first;
81: lastpg = obj-->dslinfo.sd.sub.-- slice1.sc.sub.-- root;
82:
}
83:
84:
else /* If not first slice, we need to allocate a first page. */
85:
{
86: if ((bp = hsm.sub.-- .sub.-- allocate(sdes, indp, 0, FALSE,
&afftbl,
87: ctrlblk.scp.sub.-- afftbl.sub.-- count,
88: (int32 *)&ctrlblk.scp.sub.-- afftbl.sub.-- offset)) == (BUF
*)NULL)
89: {
90: ex.sub.-- raise(SLICEMGR, 0, EX.sub.-- CONTROL, 1);
91: }
92:
93: firstpg = bp-->bpage-->ppageno;
94: lastpg = bp-->bpage-->ppageno;
95:
96: ctrlblk.scp.sub.-- alloccount = 1;
97:
98: /* unkeep the first page buffer */
99: bufunkeep(bp, sdes);
100:
}
101:
102:
/* Step 2: Allocate a control page. */
103:
104:
if ((bp = hsm.sub.-- .sub.-- allocate(sdes, indp, firstpg, TRUE,
&afftbl,
105: ctrlblk.scp.sub.-- afftbl.sub.-- count,
106: (int32 *)&ctrlblk.scp.sub.-- afftbl.sub.-- offset)) == (BUF
*)NULL)
107:
{
108: ex.sub.-- raise(SLICEMGR, 0, EX.sub.-- CONTROL, 1);
109:
}
110:
111:
/* Remember the control page id */
112:
ctrlpg = bp-->bpage-->ppageno;
113:
114:
copy.bp = bp;
115:
116:
if (|endupdate(xdes))
117:
{
118: ex.sub.-- raise(SLICEMGR, 0, EX.sub.-- CONTROL, 1);
119:
}
120:
121:
/* Step 3: Insert a new row in SYSPARTITIONS */
122:
123:
/* Zero out the SYBPARTITION structure. */
124:
slptr = (SYBPARTITION *)&slice;
125:
MEMZERO((BYTE *)slptr, sizeof (SYBPARTITION));
126:
127:
/* format a row for syspartitions table. */
128:
slptr-->sl.sub.-- slicenum = slnum;
129:
slptr-->sl.sub.-- objid = sdes-->sdesp-->dobjectc.objostat.objid;
130:
slptr-->sl.sub.-- first = firstpg;
131:
slptr-->sl.sub.-- ctrlpg = ctrlpg;
132:
133:
fmtrow((int)SYSPARTITIONS, (BYTE *)slptr, (int *)0, rowbuf);
134:
135:
/* insert the row */
136:
sl.sub.-- row.sub.-- insert(xdes, rowbuf);
137:
138:
The steps involved are as follows. As the first step, the method allocates and initializes the first page (lines 55-101). In particular, at line 63, the method invokes hsm.sub.-- build.sub.-- slicemap which builds a control page affinity map for the particular slice being constructed. This information indicates for the current slice what are the available devices. Since the available devices might not equal the number of slices, the method determines at this point the mapping between devices and slices. It is preferable to do this up front since the first page as well as the control page are allocated from that particular set of devices. At lines 68-82, the method tests whether the current slice is the first slice. Recall that in order to partition a table, the table must have already been created (i.e., already has a first slice and already has a first page). The method tests at this point, therefore, if the current slice is the first slice; if so, then the first page has already been created (by create table routines). Instead at this point, the method simply keeps a count of how many pages exist for the first slice (i.e., the original unsliced table which is now going to be sliced). The call to tally.sub.-- fetch, at line 76, gets the number of pages which are in the first slice. Also at this point (specifically lines 79-81), the method remembers the first and last pages of the original page chain. At a later point, the system will place that information in the control page. If the current slice is not the first slice (i.e., false at line 73), then the method needs to allocate a first page for the current slice. This is done at lines 84-101. Specifically, this is done by the hsm.sub.-- allocate method, at line 86, using the affinity map (afftbl) which was built by the call to hsm.sub.-- build.sub.-- slicemap. Specifically at this point, the method attempts allocation of the first page, with an affinity to the device(s) specified in the map. Again, the method remembers the first page and last page (lines 94-95) which here will be the same (since a brand new first page has just been created). At this point, there is exactly one page (noted at line 97). Once a first page is allocated, the method proceeds to the second step to allocate a control page (i.e., a page which is to contain this information), shown at lines 103-120. Regardless of whether the table object is previously partitioned or unpartitioned, a new control page is allocated at this point for storing information about the slice being created. The particular allocation is done at line 105, with a call to hsm.sub.-- allocate. At the completion of this second step, the method is done with allocation of the two pages--a first page and a control page. Now, the method is ready to add a new entry to the SYSPARTITIONS table, a catalog for tracking this information, as its third step. The SYSPARTITIONS table will contain, in particular, the first page ID as well as the control page ID. From the control page, the system can determine the page ID for the last page. The particular row entry is formatted at lines 128-134 and inserted at lines 136-137. In particular at line 137, the row is inserted into the SYSPARTITIONS table by a call to sl.sub.-- row.sub.-- insert. Since this is an insert operation, it is a logged operation. At this point, the first page and the control page have been created, an entry has been made in SYSPARTITIONS, and the operation has been logged. The method is now ready to create the in-memory (cache) structures for managing this slice. Allocation for the cache for all of the slices is done up front, that is during creation of the first slice, as the fourth step (lines 139-170). The fifth step is to get the cache entry and update it with correct values (lines 172-189). At line 172, the call to slcache.sub.-- des.sub.-- get returns a pointer to the Nth slice. After initializing the entry to zero (line 176), a spin lock is assigned (line 179) and the remaining fields are updated with values for the slice (namely, slice number, control page ID, last page ID, and first page ID). Finally, at the sixth step, the same information is installed on the control page (lines 191-252). Recall that this is done because the cache and the control page are maintained in sync. This information, as well as the device affinity map (i.e., which storage devices are mapped to this partition) are copied to the control page, particularly at lines 223-224. Once the information has been copied, the control page is flushed to disk (by marking it dirty at line 238). The method concludes by updating the number of slices in the des descriptor. Finally, it returns, with a status flag indicating "success" (at line 259). Complementing dsm.sub.-- slice.sub.-- table is dsm.sub.-- unslice.sub.-- table, the method for unpartitioning a table. The primary functionality achieved by this method is to unslice the table by concatenating the page chains into one page chain. The method starts at the end: for N number of slices, the method begins at the Nth slice. The Nth slice is concatenated with the N-1 slice, then with the N-2 slice, and so forth and so on until a single page chain view of the table is completed. In an exemplary embodiment, the dsm.sub.-- unslice.sub.-- table may be constructed (using the C programming language) as follows:
__________________________________________________________________________
1:
/*
2:
**
dsm.sub.-- unslice.sub.-- table
3:
**
4:
**
Parameters:
5:
** xdes
(I/O) Transaction descriptor
6:
** sdes
(I/O) Session descriptor of the table
7:
**
8:
**
Returns:
9:
** SM.sub.-- SUCCESS
Successfully unsliced the table
10:
** other status
Failure status (see slice.h for values)
11:
**
12:
**
Side Effects:
13:
** Rows of the slices of the tbale are deleted from syspartitions
14:
** system table.
15:
**
16:
**
Assumptions:
17:
** 1.
The transaction must have taken an exclusive lock on the
18:
** table.
19:
** 2.
The table must be a sliced table.
20:
**
21:
*/
22:
23:
smstatus.sub.-- t
24:
dsm.sub.-- unslice.sub.-- table (XDES *xdes, SDES *sdes)
25:
{
26: slnum.sub.-- t
slicenum;
/* number of slices */
27: slnum.sub.-- t
i; /* slice number */
28: smstatus.sub.-- t
status;
/* of unslicing operation */
29:
30: /* Get the total number of slices for the table */
31: slicenum = SDES.sub.-- NUMSLICES (sdes) ;
32:
33: /*
34: ** Destroy slices starting from last to second. When a slice
35: ** is destroyed, its page chain is concatenated with the
36: ** previous page chain.
37: */
38: for (i = slicenum; i > 1; i--)
39: {
40: if ((status = dsm.sub.-- destroy.sub.-- slice (xdes, sdes, i) ) |=
SM.sub.-- SUCCESS)
41: {
42: return status;
43: }
44: }
45:
46: /*
47: ** Now, the first slice's control page needs to be freed. Call
48: ** HSM to destroy the first slice. The page chain is not really
49: ** destroyed. HSM treats first slice as a special case.
50: */
51: if ((status = hsm.sub.-- destroy.sub.-- slice (xdes, sdes, 1) ) |=
SM.sub.-- SUCCESS)
52: {
53: return status;
54: }
55:
56: /* Table has been successfully unsliced */
57: return SM.sub.-- SUCCESS;
58:
}
__________________________________________________________________________
The specific steps are as follows. First, the method gets the total number of slices for the table (lines 30-31). Next, for each slice (starting at the end), the method invokes a dsm.sub.-- destroy.sub.-- slice method for destroying the then-current Nth slice (lines 33-44). In other words, at this point, the system determines how many slices it has and then repeatedly calls dsm.sub.-- destroy.sub.-- slice to destroy each slice. Note that the call includes an xdes parameter, as this is a logged operation. The dsm.sub.-- destroy.sub.-- slice method is continually invoked as long as there is greater than one slice. After the slices have been destroyed, the method next invokes hsm.sub.-- destroy.sub.-- slice for removing the control page (lines 46-54). After this is done, the table has been successfully "unsliced" and the method may return "success" (at lines 55-56). As shown above, there are two "destroy slice" methods. dsm.sub.-- destroy.sub.-- slice performs the concatenation of page chains. hsm.sub.-- destroy.sub.-- slice, on the other hand, simply removes the control page. The dsm.sub.-- destroy.sub.-- slice destroys the Nth slice by concatenating the page chains. In an exemplary embodiment, the dsm.sub.-- destroy.sub.-- slice method may be constructed as follows:
__________________________________________________________________________
1:
/*
2:
**
dsm.sub.-- destroy.sub.-- slice
3:
**
4:
**
Parameters:
5:
** xdes
(I/O) Transaction descriptor
6:
** sdes
(I/O) Session descriptor of the table
7:
** slicenum
(I/P) Slice number to destroy. Must be > 1
8:
**
9:
**
Returns:
10:
** SM.sub.-- SUCCESS
Successfully destroyed the slice
11:
** other status
Failure status (see slice.h for values)
12:
**
13:
**
Side Effects:
14:
** The row for the slice in syspartitions system table is deleted.
15:
**
16:
**
Assumptions:
17:
** 1.
The transaction must have taken an exclusive lock on the
18:
** table.
19:
**
20:
**
21:
*/
22:
23:
smstatus.sub.-- t
24:
dsm.sub.-- destroy.sub.-- slice (XDES
*xdes, SDES *sdes, slnum.sub.-- t slicenum)
25:
{
26: pgid.sub.-- t
lastpage;
/* of a slice's page chain */
27: pgid.sub.-- t
firstpage;
/* of a slice's page chain */
28: BUF *bp; /* buffer pointer */
29: SLCACHE *cur; /* slice to destroy */
30: SLCACHE *prev;
/* previous/preceding slice */
31: smstatus.sub.-- t status;
/* of unslicing operation */
32: SCAN.sub.-- CONTEXT scan context;
/* needed to get the first page */
33:
34: /* Get slcache pointers for current and previous slices */
35: cur = slcache.sub.-- des.sub.-- get (sdes->sdesp, slicenum) ;
36: prev = slcache.sub.-- des.sub.-- get (sdes->sdesp, (slicenum -1) )
;
37:
38: /* Housekeeping:
39: ** Call beginupdate () before making any updates -
40: ** concatenation, page removal, etc. causes updates to the
41: ** pages. Last arg is used only in deferred mode and we are not
42: ** in deferred mode.
43: */
44: if (|beginupdate(xdes, sdes, XMOD.sub.-- DIRECT, 0) )
45: {
46: return SM.sub.-- FAILURE;
47: }
48:
49: /*
50: ** Concatenate page chains.
51: */
52:
53: /* Find last page id for the previous slice and get it */
54: (void) slcache.sub.-- getval (prev, SLC.sub.-- LASTPAGE, (BYTE *)
&lastpage) ;
55: sdes->scur.pageid = lastpage;
56: bp = getpage.sub.-- noscan (sdes, &scan.sub.-- context) ;
57:
58: /* Find first page id for the current slice */
59: (void) slcache.sub.-- getval (cur, SLC.sub.-- FIRSTPAGE, (BYTE *)
&firstpage) ;
60:
61: /*
62: ** Update prev slice's last page's next page to current
63: ** slice's first page.
64: */
65:
66: if (|new.sub.-- replace (sdes, bp, OFFSETOF (PAGE, pnextpg),
67: (BYTE *) &firstpage, sizeof (firstpage), XREC.sub.-- MODIFY) )
68: {
69: bufunkeep (bp, sdes) ;
70: return SM.sub.-- FAILURE;
71: }
72: bufunkeep (bp, sdes) ;
73:
74: /*
75: ** Update current slice's first page's prev page to prev
76: ** slice's last page.
77: */
78:
79: /* Get first page of the current slice */
80: sdes->scur.pageid = firstpage;
81: bp = getpage.sub.-- noscan (sdes, &scan.sub.-- context) ;
82:
83: if (|new.sub.-- replace (sdes, bp, OFFSETOF (PAGE, pprevpg),
84: (BYTE *) &lastpage, sizeof (lastpage), XREC.sub.-- MODIFY) )
85: {
86: bufunkeep (bp, sdes) ;
87: return SM.sub.-- FAILURE;
88: }
89:
90: /* Get last page id of the current slice */
91: (void) slcache.sub.-- getval (cur, SLC.sub.-- LASTPAGE, (BYTE *)
&lastpage) ;
92:
93: /*
94: ** If current slice's first page has no rows, free that page
95: ** to ensure there are no empty pages in the chain.
96: **/
97: if (bp->bpage->pfreeoff == PAGEHEADSIZE)
98: {
99: if (|bp->bpage->pnextpg)
100: {
101: /*
102: ** This is the last page of chain and we are
103: ** freeing it. Its prev page becomes the last
104: ** page of the previous slice's chain.
105: */
106: lastpage = bp->bpage->pprevpg;
107: }
108: /*
109: ** Remove the page. neighbors.sub.-- locked param is TRUE
110: ** because of table lock. prev and next pointers are
111: ** null because we have not kept those buffers.
112: **/
113: if (|removepage (sdes, bp->bpage, NULL, NULL,
114: TRUE, bp->bcache.sub.-- desc->cid) )
115: {
116: bufunkeep (bp, sdes) ;
117: return (SM.sub.-- FAILURE) ;
118: }
119: }
120:
121: /* We are done with the buffer of the first page */
122: bufunkeep (bp; sdes) ;
123: /* Change last page id of
124: prev slice to concatenated chain's last*/
125: if (|sl.sub.-- chgvalue (xdes, sdes, slicenum-1, SLC.sub.--
LASTPAGE,
126: (BYTE *) &lastpage) )
127: {
128: return SM.sub.-- FAILURE;
129: }
130:
131: /* Done with the update */
132: if (|endupdate (xdes) )
133: {
134: return SM.sub.-- FAILURE;
135: }
136:
137: /* Destroy the current slice. */
138: if ((status=hsm.sub.-- destroy.sub.-- slice (xdes, sdes, slicenum) )
|= SM.sub.-- SUCCESS)
139: {
140: return status;
141: }
142:
143: return SM.sub.-- SUCCESS;
144:
}
__________________________________________________________________________
The method is invoked with three parameters: transaction descriptor (xdes), session descriptor (sdes), and slice ID or number (slicenum) of the slice to destroy. At lines 34-36, the method uses the in-memory structures (i.e., cache) to gain access to the N (current) and N-1 (previous) slices; these are the slices which the method operates on. At lines 38-47, the method performs housekeeping, calling beginupdate method. The page chains are concatenated at lines 49-72. This is done by finding the last page of the previous slice (line 54) and the first page for the current slice (line 59), and then merging the two together. Recall that the last page of the previous slice was pointing to NULL, because that was the end of that chain. Now, the pointer is updated to point to the first page of the next (i.e., current) slice (lines 61-72). Similarly, the current slice should point back to the previous slice: the backward link has to be set up as well as the forward link. This is done at lines 74-88. In particular, the method gets the first page of the current slice (lines 80-81). Then it updates the previous pointer of the current slice's first page to point to the last page of the previous slice (lines 83-88). Once that is done, the method gets the last page (ID) of the current slice, that is the slice currently being destroyed (lines 90-91). The reason for doing this is as follows. If there exists no rows for the current (Nth) slice, then there is no point in keeping an empty page around. In other words, when at the last page, if it is empty, the method prefers to free up the page. This is done at lines 93-119. Specifically, at line 97, the method tests whether no rows exist in the last page. In the event that none are found, the method then proceeds to free up the last page (particularly lines 108-118). This is followed by release of the buffer for the first page, at lines 121-122. At lines 124-129, the method performs a cache update by changing the last page ID of the previous slice to that of the concatenated page chain's last page. After ending the (logged) update at lines 131-135, the method invokes the hsm.sub.-- destroy.sub.-- slice method for destroying the current slice (lines 137-141). Upon completion of that call, the method is done and may return a "success" result (line 143). The hsm.sub.-- destroy.sub.-- slice method destroys a partition for an object by deallocating the control page and deleting the row entry from the SYSPARTITIONS table. As with the dsm.sub.-- destroy.sub.-- slice method, the hsm.sub.-- destroy.sub.-- slice method is invoked with parameters of a transaction descriptor (xdes), a session descriptor (sdes), and a slice ID or number (slnum). In an exemplary embodiment, the hsm.sub.-- destroy.sub.-- slice method may be constructed as follows:
__________________________________________________________________________
1:
/*
2:
**
hsm.sub.-- destroy.sub.-- slice
3:
**
4:
**
Parameters:
5:
** xdes
Ptr to the transaction descriptor
6:
** sdes
SDES ptr for the heap object
7:
** slnum
the slice number to deactivate
8:
**
9:
**
Returns:
10:
** status
11:
**
12:
**
Side Effects:
13:
** None.
14:
**
15:
**
Synchronizations:
16:
** A table level lock must have been acquired by the caller.
17:
**
18:
*/
19:
20:
smstatus.sub.-- t
21:
hsm.sub.-- destroy.sub.-- slice (XDES *xdes, SDES *sdes, slnum.sub.--
t slnum)
22:
{
23: SLCACHE *slc;
24: BUF *bp;
25: objid.sub.-- t
objid;
26: DES *des;
27: PGALLOC.sub.-- CTRL
pgc; /* Page deallocation control record */
28: VOLATILE SLICE.sub.-- CRT.sub.-- CPY
copy;
29: SCAN.sub.-- CONTEXT scan.sub.-- context;
/* needed to get control page */
30:
31:
32: objid = SDES.sub.-- OBJID (sdes) ;
33:
34: /* get slcache for this particular partition number */
35: if (slnum == 1)
36: slc = & (sdes->sdesp->dslinfo.sd.sub.-- slice1) ;
37: else
38: slc = slcache.sub.-- des.sub.-- get (sdes->sdesp, slnum) ;
39:
40: /* get the pageid of the control page so can deallocate it */
41: sdes->scur.pageid = slc->sc.sub.-- ctrlpg;
42:
43: /* get scan.sub.-- context for control page */
44: getpage.sub.-- init.sub.-- scan.sub.-- context (&scan.sub.--
context, sdes, objid,
45: TABENTRY, NULL.sub.-- CACHEID) ;
46: bp = getpage.sub.-- noscan (sdes, &scan.sub.-- context) ;
47:
48: if (|beginupdate (xdes, sdes, XMOD.sub.-- DIRECT, 0) )
49: {
50: ex.sub.-- raise (SLICEMGR, 0, EX.sub.-- CONTROL, 1) ;
51: }
52:
53: /* pg.sub.-- deallpage () requires a tidy struct */
54: MEMZERO (&pgc, sizeof (pgc) ) ;
55:
56: /* log page header contents */
57: MOVE.sub.-- FIXED (bp->bpage, pgc.pga.sub.-- pinfo.pg1.sub.-- pghdr,
PAGEHEADSIZE) ;
58:
59: pgc.pga.sub.-- pinfo.pg1.sub.-- cid = (odcid.sub.-- t) scan.sub.--
context.cid;
60:
61: /*
62: ** Deallocate the slice's control page by a direct call to
63: ** pg.sub.-- deallpage ().
64: */
65:
66: if (|pg.sub.-- deallpage (sdes, &pgc) )
67: {
68: /* Insufficient log space */
69: ex.sub.-- raise (SLICEMGR, 0, EX.sub.-- CONTROL, 1) ;
70: }
71:
72: bufunkeep (bp, sdes) ;
73:
74: if (slnum == 1)
75: /* If this is the first slice then
76: ** we are no longer a sliced object
77: ** so update sysindexes. */
78: if (|ind.sub.-- chgvalue (sdes, xdes,
79: 0, LASTPAGE, (BYTE *) &slc->sc.sub.-- root) )
80: {
81: return (SM.sub.-- FAILURE) ;
82: }
83: if (|endupdate (xdes) )
84: {
85: ex.sub.-- raise (SLICEMGR, 0, EX.sub.-- CONTROL, 1) ;
86: }
87:
88: /* At this point, the control page has been
89: ** deallocated from the slice. However, we will
90: ** not erase the control page info from the in-memory
91: ** structure (as we might rollback).
92: ** By definition, we only update the last page.
93: */
94:
95: des = sdes->sdesp;
96:
97: /* delete the slice row from Syspartitions */
98: sl.sub.-- row.sub.-- delete (xdes, objid, slnum) ;
99:
100: /* don't decrement the count if we are the first
101: ** slice because an unsliced object has one slice.
102: */
103: if (slnum > 1)
104: des->dslinfo.sd.sub.-- numslices--;
105:
106: else
107: {
108: /*
109: ** If we are destroying the first slice, move the SLGROUP
110: ** list from the DES to the xdes. When this transaction
111: ** ends, we will release this list.
112: */
113: xdes->xpost.sub.-- commit.sub.-- work = TRUE;
114: xdes->xslgroup = sdes->sdesp->dslinfo.next;
115: sdes->sdesp->dslinfo.next = (SLGROUP *) NULL;
116: }
117:
118: return (SM.sub.-- SUCCESS) ;
119:
120:
}/* end hsm.sub.-- destroy.sub.-- slice */
__________________________________________________________________________
At line 32, the method obtains an object ID for the current object (table); this is obtained from the sdes descriptor. Next, at lines 34-38, the method gets the cache information (slcache) for this particular slice or partition number. The cache information for the first slice is treated as a special case. In particular, the information is maintained in the sdes for the object, thereby optimizing its storage for "unsliced" tables (i.e., tables with only one slice). Next, the method gets the page ID for the control page (lines 40-41) and then fetches the control page (lines 43-46). After some housekeeping steps (lines 48-58), the method deallocates the page by invoking pg.sub.-- deallpage (lines 61-70). After this is done, the buffer can be freed (line 71). When the method reaches the last slice (slice number equal 1 at line 73), it is no longer dealing with a sliced object and may therefore update sysindexes with the last page of the object. As the control page has now been removed, the method can proceed to delete the corresponding slice row from the SYSPARTITIONS table (lines 87-97). The method decrements the number of slices at lines 102-103. Recall, however, that the number is preferably not decremented below 1, as each table object in the system is defined to have at least one slice. When the table was first partitioned, cache information was stored (i.e., by SLGROUP). Therefore, when the system reaches the last slice it will free up the memory. However, this does not occur until after the transaction has committed. The process of freeing up the memory can be deferred until the transaction has committed, by using the previously-described xpost.sub.-- commit.sub.-- work boolean. In effect, the method at lines 106-115 marks the memory which needs to be freed up. The method concludes by returning a "success" result at line 117. The next method, dsm.sub.-- get.sub.-- next.sub.-- page.sub.-- id, serves to get the ID for the next page in the single logical page chain. Recall that, although there are multiple physical page chains, it is desired to preserve the view of a single logical page chain. It does not suffice, therefore, to get the next page by simply looking at the pointer for the next physical page. Additionally, the system must appropriately handle the traversal of a boundary from the last page of one physical page chain to the first page of another physical page chain. The method dsm.sub.-- getnextpage does exactly that. In an exemplary embodiment, the dsm.sub.-- get.sub.-- next.sub.-- page.sub.-- id method may be constructed as follows (in the C programming language):
______________________________________
/*
** dsm.sub.-- get.sub.-- next.sub.-- page.sub.-- id
**
** Parameters:
** sdes -- (I/O) Session descriptor of the table
** bp -- (I/P) Pointer to current page's buffer
**
** Returns:
** The next page id.
**
** Side Effects:
** sdes structure may get updated to point at the next slice.
**
** Notes:
** It is acceptable to call this function even if the table is
** not sliced, in which case the function behaves as if the table
** has one slice.
**
**
*/
pgid.sub.-- t
dsm.sub.-- get.sub.-- next.sub.-- page.sub.-- id(SDES *sdes, BUF *bp)
pgid.sub.-- t pageid;
slnum.sub.-- t slicenum; /* number of slices */
slnum.sub.-- t cur.sub.-- slicenum; /* current slice number */
/*
** It will be a good idea to make this fucntion a macro so
** that the next statement (which is the common case) gets
** executed without the cost of a procedure call.
*/
/* If not end of chain, return the next page id */
if (bp-->bpage-->pnextpg |= 0)
{
return bp-->bpage-->pnextpg;
}
/* We are at the end of a page chain. */
/*
** For clustered and non-clustered index scans (sindid |= 0),
** and for table scans (sindid == 0) on unsliced tables,
** end of the chain indicates next page is zero.
*/
if ((sdes-->sindid |= 0) .vertline..vertline. |DSM.sub.-- IS.sub.--
TABLE.sub.-- SLICED(sdes))
{
return 0;
}
/* For sliced table, position the scan on the next slice */
/* Get number of slices for the table */
slicenum = SDES.sub.-- NUMSLICES(sdes);
/* Get the slice number currently positioned on */
cur.sub.-- slicenum = SDES.sub.-- CURSLICE(sdes);
/* If already at the last slice, next page id is 0 */
if (cur.sub.-- slicenum == slicenum)
{
return 0;
}
/* Change SDES to point at the next slice */
slcache.sub.-- getnext.sub.-- slice(sdes);
/* Get the first page id of the slice */
(void) slcache.sub.-- getval(sdes-->sslcacheptr, SLC.sub.-- FIRSTPAGE,
(BYTE *)&pageid);
/* That is the next page for the scan */
return pageid;
}
______________________________________
Given an sdes and a buffer which contains the current page, the method will examine the buffer and fetch the next slice, if one exists. If the current slice is the third slice, for instance, and the object contains a fourth slice, the method will fetch the first page of the fourth slice, by returning a page ID. The specific steps of the method are as follows. First, the method examines the buffer passed in to see whether it contains a valid next page pointer. If it does, the method simply returns the ID for that next page and is done. In other words, if the method is not traversing the boundary between two physical page chains (or is not at the end of the last page chain), it may simply return the ID for the next physical page. Otherwise, the method has reached the end of a boundary. It must at this point, therefore, determine whether the table is sliced. If the table is not sliced, then the method has finally reached the end of the last page chain and has really reached the end of the single logical chain; in such a case it may simply return. The ID, for this case, is set to zero, thereby indicating the end of the page chain. If the table is a sliced table, on the other hand, the method must position the scan on the next slice. Before this may be done, the method must first determine how many slices exist for the table. Macros are defined for extracting this information out of the sdes data structure. From this information, the method can determine the next slice. Upon making this determination, the method simply gets the page ID for the first page for this next slice. The next method determines whether a table is empty. Prior to partitioning, it was a simple matter to determine whether a table was empty. If the first page had no rows, then the table was empty, as the first page was the only page in the object. With partitioned tables, the fact that one partition or slice is empty does not necessarily mean all other slices are empty. Therefore, the other slices must be checked. The dsm.sub.-- is.sub.-- table.sub.-- empty method checks exactly this:
______________________________________
/*
** dsm.sub.-- is.sub.-- table.sub.-- empty
** Parameters:
** sdes -- (I/O) Session descriptor of the table
** bp -- (I/P) Pointer to current first page's
** buffer. For a sliced table, it is the
** first page of the first slice.
**
** Returns:
** TRUE if table has zero rows and FALSE otherwise.
**
** Side Effects:
**
**
** Notes:
** It is acceptable to call this function even if the table is
** not sliced, in which case the function behaves as if the table
** has one slice.
**
** Assumptions:
** 1. The transaction must have taken table-level lock in share or
** exclusive mode.
**
**
*/
int
dsm.sub.-- is.sub.-- table.sub.-- empty(SDES *sdes, BUF *bp)
{
slnum.sub.-- t
slicenum;
/* number of slices */
slnum.sub.-- t
i; /* current slice number */
int empty; /* slice has rows? */
pgid.sub.-- t saved.sub.-- pageid;
SLCACHE *saved.sub.-- slcacheptr;
slnum.sub.-- t
saved.sub.-- curslice;
SCAN.sub.-- CONTEXT
scan.sub.-- context;
/* to retrieve first page */
/*
** If the first slice's page chain is not empty, we are done.
** Else, if the table is not sliced, nothing more to check.
*/
if (|DSM.sub.-- IS.sub.-- PAGECHAIN.sub.-- EMPTY(bp))
{
return FALSE;
}
else if (|DSM.sub.-- IS.sub.-- TABLE.sub.-- SLICED(sdes))
{
return TRUE;
}
/*
** It is a sliced table. For the table to be empty, all slices
** must be empty. First slice is known to be empty from above.
*/
/* Save sdes values that may get changed due to slice
processing */
saved.sub.-- pageid = sdes-->scur.pageid;
saved.sub.-- slcacheptr = sdes-->sslcacheptr;
saved.sub.-- curslice = sdes-->scurslice;
/* Get the total number of slices for the object */
slicenum = SDES.sub.-- NUMSLICES(sdes);
/*
** Go through each slice until a non-empty slice is found.
** First slice is already found to be empty.
*/
empty = TRUE;
sdes-->scurslice = 1;
for (i = 2; i <= slicenum && empty; i++)
{
/* Change SDES to point at the next slice */
slcache.sub.-- getnext.sub.-- slice(sdes);
/* Get the first page id of the slice and then the page */
(void) slcache.sub.-- getval(sdes-->sslcacheptr,
SLC.sub.-- FIRSTPAGE, (BYTE *)&sdes-->scur.pageid);
/* get scan context */
getpage.sub.-- init.sub.-- scan.sub.-- context(&scan.sub.-- context,
sdes,
SDES.sub.-- OBJID(sdes),
TABENTRY, NULL.sub.-- CACHEID);
bp = getpage.sub.-- noscan(sdes, &scan.sub.-- context);
/* Determine if slice is empty */
empty = DSM.sub.-- IS.sub.-- PAGECHAIN.sub.-- EMPTY(bp);
/* Done with the page */
bufunkeep(bp, sdes);
}
/*
** Restore the values in SDES that might have been changed
** due to slice processing above.
*/
sdes-->scur.pageid = saved.sub.-- pageid;
sdes-->sslcacheptr = saved.sub.-- slcacheptr;
sdes-->scurslice = saved.sub.-- curslice;
return empty;
}
______________________________________
The method is generic in the sense that the caller does not have to be concerned whether the table is sliced or not; the method does the work for either case. The method operates by reading in the first page of the object (i.e., what the caller thinks is the first page). If that page is not empty, then clearly the object is not empty, in which case the method may return "false." The next case is one in which the first page is empty and the table is not sliced. In such a case, the method may return "true," thereby indicating that the object is in fact empty. The remaining cases are ones in which the method is dealing with a sliced table and the first page is empty. Here, the method must look at the other slices to see whether they are empty. This is done as follows. The method gets the number of slices. Then, it sets up a loop to look at the first page in each slice. For each such page, the method makes a determination of whether the page is empty. A page is empty when it contains no rows. If all pages are empty, the method may return "true," thereby indicating that the object, which is sliced, is empty. If any one of the slices is not empty, however, the method may conclude that the object is not empty and return "false." The next method, dsm.sub.-- change.sub.-- last.sub.-- page.sub.-- id, simply serves to change the last page ID of a page chain. The need for this typically arises when appending a page. Again, the method insulates the caller from concerns about whether the object is sliced or unsliced. In an exemplary embodiment, the method may be constructed as follows (in the C programming language):
______________________________________
/* dsm.sub.-- change.sub.-- last.sub.-- page.sub.-- id
** Changes the last page id of a page chain to the new id.
**
**
** Parameters:
** xdes - (I/O) Transaction descriptor
** sdes - (I/O) Session descriptor of the table
** slicenum - (I/P) Slice number of the page chain. 0 if
** caller does not know the slice number;
** unused if the table is not sliced.
** old.sub.-- lastpage - (I/P) Old last page id
** new.sub.-- lastpage - (I/P) New last page id
** allocated - (I/P) Change due to allocation of a new page
** as last page (TRUE) or deallocation of
** the current last page (FALSE)
**
** Returns:
** TRUE if change is successful and FALSE otherwise.
**
** Notes:
** It is acceptable to call this function even if the table is
** not sliced.
**
**
*/
int
dsm.sub.-- change.sub.-- last.sub.-- page.sub.-- id(XDES *xdes, SDES
*sdes, slnum.sub.-- t
slicenum, pgid.sub.-- t old.sub.-- lastpage, pgid.sub.-- t
new.sub.-- lastpage, int
allocated)
SLCACHE *slc;
int status; /* From sl.sub.-- chgvalue */
/*
** If unsliced table, call sysindexes manager to change last
** page id. 0 is the index id for heap's row in sysindexes.
*/
if (|DSM.sub.-- IS.sub.-- TABLE.sub.-- SLICED(sdes))
{
return ind.sub.-- chgvalue(sdes, sdes->sxdesp, 0, ROOTPAGE,
(BYTE *) &new.sub.-- lastpage);
}
/* Unless given, find slice number old.sub.-- lastpage belongs to
*/
if (slicenum == 0)
{
if ((slc = slcache.sub.-- findslice(sdes->sdesp, old.sub.--
lastpage,
SLC.sub.-- LASTPAGE)) == NULL)
{
return FALSE;
}
slicenum = slc->sc.sub.-- slicenum;
}
/* Change the last page id of the slice */
status = sl.sub.-- chgvalue(xdes, sdes, slicenum,
(allocated) ? SLC.sub.-- LASTPG.sub.-- PLUS : SLC.sub.-- LASTPAGE,
1
(BYTE *) &new.sub.-- lastpage);
return status;
}
______________________________________
The steps of the method are as follows. If the table is not sliced, the method simply changes the page ID in one place, sysindexes, because there is exactly one last page. If, on the other hand, the object is a sliced object, the method must figure out which slice the last page belongs to and then change the last page ID. The change is made by finding the corresponding entry in cache and updating it accordingly. The hsm.sub.-- create.sub.-- slice and hsm.sub.-- destroy.sub.-- slice methods have already been described (during operation of methods of the Dataslices manager). In addition to these methods, the Heapslices manager includes the previously-mentioned methods for "opening" and "closing" a slice. The hsm.sub.-- open method is invoked when the system is ready to perform an INSERT operation. The system must decide which slice to insert into and position the sdes on that particular slice (by setting a pointer to the appropriate cache entry). In an exemplary embodiment, the method may be constructed (using the C programming language) as follows:
______________________________________
/*
** hsm.sub.-- open
**
** Parameters:
** sdes -- SDES ptr for the heap object
** slnum -- the slice number to open
**
** Returns:
** None
**
** Side Effects:
** calls the hsm.sub.-- strategy() to get a slice assignment
** if the slnum = 0.
** Sets the SLCACHE ptr and the current slice number
** in the SDES.
**
** Synchronizations:
**
**
*/
void
hsm.sub.-- open(SDES *sdes,
slnum.sub.-- t slnum)
SLCACHE *slc;
if (slnum == 0)
{
slnum = hsm.sub.-- strategy(sdes);
}
slc = slcache.sub.-- des.sub.-- get(sdes->sdesp, slnum);
sdes->sslcacheptr = slc;
sdes->scurslice = slnum;
/* indicate that the session has opened a slice */
sdes->sstat2 .vertline.= SS2.sub.-- SLICE.sub.-- OPENED;
}/* end hsm.sub.-- open */
______________________________________
The steps of the method are as follows. First, the method looks to whether the caller of the method actually specified a particular slice (i.e., slice number). If the caller has not already chosen a slice, then the method calls on to a subroutine, hsm.sub.-- strategy, which will choose a slice at random. Based on this information, the method gets the pointer to the corresponding cache entry, sets the sdes to point to that cache entry, and sets the slice number. Finally, the method indicates that the slice has been opened, setting an appropriate flag in the sdes. The flag is provided so that this method is not repeatedly called. Complementing the above method is hsm.sub.-- close. It serves to indicate that the system is done with inserts to the partition. In an exemplary embodiment, the method may be constructed (using the C programming language) as follows:
______________________________________
/*
** hsm.sub.-- close
**
** Parameters:
** sdes
SDES ptr for the heap object
**
** Returns:
** None
**
** Side Effects:
** none
**
** Synchronization:
**
**
*/
void
hsm.sub.-- close(SDES
*sdes)
SLCACHE *slc;
slnum.sub.-- t
slnum;
slnum = sdes->scurslice;
slc = slcache.sub.-- des.sub.-- get(sdes->sdesp, slnum);
sdes->sslcacheptr = (SLCACHE *)NULL;
sdes->scruslice = 0;
sdes->sstat2 &= .about.SS2.sub.-- SLICE.sub.-- OPENED;
}/* end hsm.sub.-- close */
______________________________________
The method operates as follows. It sets the cache pointer in the sdes descriptor to NULL, thereby indicating that the system is not accessing a particular slice at that moment. Thereafter, the method is done. The hsm.sub.-- strategy method generates a random number in the method, for randomizing the selection of a partition. It may be constructed as follows:
______________________________________
/*
** hsm.sub.-- strategy
**
** Parameters:
** sdes
SDES ptr to a sliced object.
**
** Returns:
** slice number
**
** Side Effects:
** none.
**
** Synchronizations:
**
**
*/
slnum.sub.-- t
hsm.sub.-- strategy(SDES *sdes)
if (sdes->sxdesp->xslicehashval == 0)
{
/*
** For this transaction, this is the first insert into
** a slice table. Generate a random value and store
** it in xdes. This is used to choose a slice for this
** insert and future inserts by this transaction.
*/
Eresource->erslicerandomval =
(ersource->erslicerandomval + SLICERAND) &
0x7fffffff;
/*
** To reduce the probability of two engines getting
** the same random value, add engine id.
*/
sdes->sxdesp->xslicehashval = Eresource->
erslicerandomval + Engine.sub.-- number;
}
/*
** Choose a slice based on the hash value in xdes.
** With this formula, it is ensured that multiple insert
** statements into the same table within the same transaction
** will use the same slice of the table. If a slice is
** randomly chosen for each insert then we get the following
** undesirable behavior: a transaction with
** multiple insert statements into the same table may end up
** using more than one slice of the table thereby preventing
** other transactions from inserting into the table.
*/
return (sdes->sxdesp->xslicehashval %
SDES.sub.-- NUMSLICES(sdes)) + 1;
}/* end hsm.sub.-- strategy */
______________________________________
Once the number is generated, the method stores it in the xdes descriptor. Recall that the sdes stores a pointer to the xdes. Accordingly, it can be determined from the sdes which xdes is currently being employed (i.e., for the current transaction). The random number is, therefore, stored in the xdes field and is available to the sdes data structure, via a pointer de-reference. The object, in turn, performs a modulus on the random number with the number of partitions, for figuring out which particular partition to go to. The random number itself is refreshed on a per transaction basis. The method for allocating a new page is hsm.sub.-- alloc.sub.-- page. It serves to allocate a new page to a partition. In contrast to a page allocation routine for unsliced objects, this method looks at the device affinity map, for determining the preferred device or devices for the current partition. In an exemplary embodiment, the hsm.sub.-- alloc.sub.-- page method may be constructed (using the C programming language) as follows:
______________________________________
/*
** hsm.sub.-- alloc.sub.-- page
** This function is called to add a new page to the end of the
** current partition's page chain
**
** Parameters:
** sdes
SDES ptr to a partitioned object.
** indptr
ptr to the heap's INDEX row
** targetpg -- page id indicating where to start the search
** (used as a hint for allocation)
** pgc
allocation page context.
**
** Returns:
** buffer containing allocated page or null
**
** Side Effects:
** none.
**
** Syncronizations:
**
**
*/
BUF *
hsm.sub.-- alloc.sub.-- page(SDES
*sdes,
INDEX *indptr,
pgid.sub.-- t
targetpg,
PGALLOC.sub.-- CTRL *pgc)
BUF *newbp;
BUF *bp; /* control page buffer */
DEV.sub.-- INFO.sub.-- ELEM
*dmap;
SLICE.sub.-- CTRLBLK
*cbp;
/*
** Access the control page for the slice to obtain the device
** assignment map.
*/
sdes->scur.pageid = sdes->sslcacheptr->sc.sub.-- ctrlpg;
/*
** The scan context for the control page will be the
** same as that for the page allocation.
*/
bp = getpage.sub.-- noscan(sdes, &pgc->pga.sub.-- scan.sub.-- context);
cbp = (SLICE.sub.-- CTRLBLK *) ((BYTE *)bp->bpage +
PAGEHEADSIZE);
dmap = (DEV.sub.-- INFO.sub.-- ELEM *) ((BYTE *)bp->bpage +
PAGEHEADSIZE + sizeof(SLICE.sub.-- CTRLBLK);
newbp = pg.sub.-- biased.sub.-- allocate(sdes, indptr, targetpg,
pgc, dmap, &cbp->scp.sub.-- aftbl.sub.-- offset,
cbp->scp.sub.-- afftbl.sub.-- count);
/* unkeep the control page buffer */
bufunkeep(bp, sdes);
return(newbp);
}/* end hsm.sub.-- alloc.sub.-- page */
______________________________________
The steps of the method are as follows. Given an sdes (which is essentially a pointer to the object), the method determines the control page ID for the slice. It next reads the control page for accessing the device map. Using the particular device affinity indicated by the map, the method will then call on to the page allocation method, passing a pointer to the device affinity map. The method hsm.sub.-- get.sub.-- slgroup is needed when an object is first partitioned. It sets up the memory for the cache entry and, in an exemplary embodiment, may be constructed (using the C programming language) as follows:
______________________________________
/*
** hsm.sub.-- get.sub.-- slgroup
** This function grabs as many SLGROUP structures from the
** free pool as specified. It returns a pointer to the head
** of the newly allocated null-terminated list of groups.
**
** Parameters:
** count - Number of SLGROUPs to be allocated.
**
** Returns:
** Pointer to the head of the list of SLGROUP structures (or NULL)
**
** Side Effects:
** None.
**
** Synchronizations
** Grabs the slgroup freelist spinlock.
**
*/
SLGROUP *
hsm.sub.-- get.sub.-- slgroup(int count)
struct spinlock *master; /* local copy of freelist spinlock */
SLGROUP *head; /* local copy of head of freelist */
SLGROUP *cur; /* temporary pointer to SLGROUP list */
SLGROUP *prev; /* temporary pointer to SLGROUP list */
int i; /* Counter for allocating SLGROUPS */
master = Resource->rslgroup.sub.-- spin;
cur = head = Resource->rslgroup.sub.-- head;
prev = (SLGROUP *)NULL;
P.sub.-- SPINLOCK(master);
if (head == (SLGROUP *)NULL)
{
V.sub.-- SPINLOCK(master);
return (SLGROUP *)NULL;
}
else /* Attempt the allocation */
{
/* Remove `count` SLGROUP entries from the list */
for (i = 0; i < count; i++)
{
if (cur |= (SLGROUP *)NULL)
{
prev = cur;
cur = cur->next;
}
else
{
/* There aren't enough SLGROUPs. */
V.sub.-- SPINLOCK(master);
return (SLGROUP *)NULL;
}
}
/* We allocated as many SLGROUPS as required. */
Resource->rslgroup.sub.-- head = cur;
}
V.sub.-- SPINLOCK(master);
/* Terminate the list just allocated. */
prev->next = (SLGROUP *)NULL;
return (head);
}/* end hsm.sub.-- get.sub.-- slgroup */
______________________________________
Given a certain number of partitions to create, the method allocates sufficient memory in the cache for supporting that number of partitions. It return a pointer to that memory. The method is typically invoked by the hsm.sub.-- create.sub.-- slice method. In order to minimize pointer overhead, a number of cache entries are packaged into each SLGROUP. In an exemplary embodiment, each SLGROUP stores 16 cache entries. Based on the number of partitions required, the method will allocate one or more groups, as needed. The method also allocates spin locks. Each cache requires a spin lock, in order to lock it from other operators. In operation, the method gets the head of free memory (i.e., free pool) and then computes (outside the caller) how many groups are needed, accounting for the fact that the first group is stored in the sdes. If sufficient memory is available, the method makes the allocation and sets the list head accordingly. Spin locks are employed at this point so other processes do not attempt to allocate the same groups. After the allocation is made, the lock is released. In a complementary manner, the memory is released back to the free list when it is no longer needed, typically when a table is unpartitioned. In such an instance, the post-commit flag is set so that memory will be freed upon completion of the transaction. The typical sequence is as follows. First, a table is unpartitioned. All the memory for the cache is put into the xdes, with the post-commit flag set to true. Upon completion of the transaction, the post-commit flag is examined for determining whether memory is to be freed. When the flag is set to true, the memory pointed to by the xdes is placed back on the free list. The actual task of freeing the memory is done by a free.sub.-- slgroup method. This is done under control of spin locks. The method hsm.sub.-- get.sub.-- slicelk is the particular routine which gets the spin lock for a cache. It may be constructed as follows:
______________________________________
/*
** hsm.sub.-- get.sub.-- slicelk
** This function returns spinlock from the global
** pool. A spinlock can be assigned more than one slice.
**
** Parameters:
** void
**
** Returns:
** A pointer to a spinlock structure
**
** Side Effects:
** None.
**
*/
struct spinlock *
hsm.sub.-- get.sub.-- slicelk(void)
SLICESPIN.sub.-- CB *slicecb;
struct spinlock *master;
int temp;
struct spinlock *spinptr;
master = Resource->rslgroup.sub.-- spin;
slicecb = ((SLICESPIN.sub.-- CB *)Resource->rslicespin.sub.--
cb);
P.sub.-- SPINLOCK(master);
/* get index for next spinlock to be assigned */
if (slicecb->scb.sub.-- spinassign == slicecb->scb.sub.-- spinlk.su
| ||||||
