Method and system for processing queries in a database system using index structures that are not native to the database system5893104Abstract A method and apparatus for processing a query in a database system using index types that are not built into the database system are disclosed. Routines for managing an index structure that is not supported by a database system are generated. Data that identifies the routines are submitted to the database system, thereby "registering" the index types with the database system. In response to statements issued to the database system by a client, the database system calls the routines, causing the routines to create an index structure using data from a data container in the database, and to generate data that indicates which data in the data container satisfies a query issued by the client. The routines of the registered index type extend the indexing capabilities of the database systems and one or more such index types can be registered with the database system. The index structure managed by the routines may be maintained within segments of the database, and the segments may be accessed as index-only tables. Storing a row of data in a database using index-only tables involves storing in a leaf node an index entry that includes a key value along with all other values in the row of data. If the row of data exceeds a predetermined size, then a portion of the row of data is stored in an overflow area. Retrieving a row of data from an index-only table for a user-supplied key involves identifying a leaf node for the key, and reading a row of data from the index entry and any remaining portion from the overflow area when the row exceeds the predetermined size. Claims What is claimed is: Description FIELD OF THE INVENTION
______________________________________
PROCEDUREi.sub.-- create(iargs Iargs, parms VARCHAR2(1024));
PROCEDUREi.sub.-- drop(iname VARCHAR(30));
PROCEDUREi.sub.-- truncate(iname VARCHAR(30));
PROCEDUREi.sub.-- insert(iargs Iargs, rowid ROWID, newval Ivals);
PROCEDUREi.sub.-- delete(iargs Iargs, rowid ROWID, oldval Ivals);
PROCEDUREi.sub.-- update(iargs Iargs, rowid ROWID, oldval Ivals, newval
Ivals);
FUNCTIONi.sub.-- open(iargs Iargs, opinfo Iops)RETURN Ihdl;
PROCEDUREi.sub.-- start(ihdl Ihdl);
FUNCTIONi.sub.-- fetch(ihdl Ihdl, ancdata OUT VARRAY(30) OF
TYPEANY) RETURN ROWID;
FUNCTIONi.sub.-- joinfetch(ihdl Ihdl, ancdata OUT VARRAY(30) OF
TYPEANY) RETURN VARRAY(2) OF ROWID;
PROCEDUREi.sub.-- close(ihdl Ihdl);
______________________________________
In the exemplary index routine definitions listed above, parameters of types Iargs, Ivals, Iops, and Ihdl are used by some of the methods. Those parameter types may be defined as follows:
______________________________________
CREATE TYPE Iargs
Iname VARCHAR(30),--Index Object Name
Iobjnum NUMBER,--Unique Index Instance Number
Isname VARCHAR(30),--Schema containing the index
Tabname VARCHAR(30),--Indexed Table Name
Tabsname VARCHAR(30),--Schema containing the table
Colname VARRAY(30) OF VARCHAR(30),--Names of indexed
columns
Coltype VARRAY(30) OF VARCHAR(30)--Types of indexed columns
);
CREATE TYPE Ivals
(
Numval NUMBER,--Number of values
Val VARRAY(30) OF TYPE ANY--Array of values
);
CREATE TYPE Iops
(
Opname VARCHAR(30),--Indexed Operator name
Opsname VARCHAR(30),--Schema containing the operator
Numargs NUMBER,--Number of arguments to the operator
Opargs VARRAY(30) OF TYPEANY,--Argument values
Opmode NUMBER--Operator mode: 1=selection, 2=join
);
CREATE TYPE Ihdl
(
state NUMBER--user maintained state
);
______________________________________
The exemplary list of routines includes three DDL routines: i.sub.-- create, i.sub.-- drop, and i.sub.-- truncate. The database server calls the i.sub.-- create procedure of an index object when a CREATE INDEX statement is issued that references a non-native index type. Upon invocation, the database instance passes to the appropriate routine any physical parameters that are specified in the related CREATE INDEX . . . PARAMETERS (. . . ) statement. The iargs argument contains the index name, the table name on which the index is defined, and descriptions of the columns or Abstract Data Type (ADT) attributes over which the index is defined. The i.sub.-- create routine can also be invoked to perform bulk loading of index objects. If the table with the indexed column(s) is not empty, the i.sub.-- create routine scans the table and builds the index structures for all the existing rows. The i.sub.-- drop procedure of an index object is invoked by the database server hen an index object is destroyed using a DROP INDEX statement. The iname argument contains the name of the index object to be dropped. The i.sub.-- truncate procedure of an index object is called by the database server when a TRUNCATE statement is issued against a table that contains a column or ADT attribute indexed using a non-native index type. After this procedure executes, the index object definition should be empty but remain intact. The iname argument contains the index object name. The exemplary routines listed above also define three DML routines that are invoked to insert, update, and delete entries from an index object. Specifically, the database server invokes the i.sub.-- insert routine of an index object when a record is inserted in a table that contains columns or ADT attributes indexed by the related index object. Upon invocation, the iargs argument contains the index name, the table name on which the index is defined, and descriptions of the columns or ADT attributes over which the index is defined. The newval argument contains an array of values of indexed columns. The i.sub.-- delete procedure of an index object is called by the database server when a record is deleted that contains columns or ADT attributes indexed by the related index object. The index name, table name, and descriptions of the indexed columns or ADT attributes are contained in the iargs argument. The oldval argument contains an array of values of indexed columns. The i.sub.-- update procedure of an index object is called by the database server when a record is updated that contains columns or ADT attributes indexed by the related index object. Upon invocation, the iargs argument contains the index name, the table name on which the index is defined, and descriptions of the columns or ADT attributes over which the index is defined. The oldval and newval arguments point to old and new column data. The routines i.sub.-- open, i.sub.-- close, i.sub.-- start, and i.sub.-- fetch are routines that are invoked by the database server during query processing. Specifically, i.sub.-- open is called by the database server to begin processing a query on a table that has an index object. The arguments passed to the i.sub.-- open routine contain information related to the index and information about the operator specified in the query. The index related information includes the name of the index and its unique object number. The operator information provides the name of the indexed operator and the values of the arguments with which the operator is being invoked by the system. The i.sub.-- open function returns a handle of type Ihdl, which is subsequently passed in as an argument to the i.sub.-- fetch, i.sub.-- start and i.sub.-- close routines. The handle can be used to store some state defined by the index object. The handle is not interpreted by the database server and is for use only by the software that manages the index object. A typical usage of the handle is to store the memory address for the perinvocation state object maintained by the index object. The database server calls the i.sub.-- close routine at the end of query processing. To execute the same query multiple times without opening and closing, the database server calls the i.sub.-- start routine. The i.sub.-- start routine resets the state of the index object to the start of query processing. The database server calls the i.sub.-- fetch routine to return rows that satisfy the indexed operator. The indexed operator can be supported in either the selection mode or the join mode. The selection mode allows selection of qualifying rows from one base table. The join mode operator performs the join of two base tables. According to one embodiment, the i.sub.-- fetch routine returns unique identifiers ("rowids") of all the rows for which the indexed operator evaluates to TRUE. Each call to i.sub.-- fetch returns a single rowid along with the ancillary data output values. Thus, the i.sub.-- fetch routine does not return the rows for which the indexed operator evaluates to FALSE (or NULL). The i.sub.-- fetch returns a NULL rowid to indicate that all the qualifying rows have been returned. The i.sub.-- joinfetch routine does not have to be specified if the index object does not support the join modes of any of its indexed operators. However, if an operator is supported in the join mode, the i.sub.-- joinfetch routine returns a pair of rowids along with the ancillary data output values. The end of fetch is indicated by returning NULL rowids. The fetch routines may be EXACT or INEXACT with respect to each of the indexed operators. If a fetch routine is EXACT with respect to an indexed operator, then the set of rows returned are exactly those that satisfy the operator. In an INEXACT implementation of an indexed operator, the returned rows are a superset of the actual result. In this case, the system applies a "secondary filter" before returning the rows to the user. This secondary filter is the non-indexed implementation of the operator, which is guaranteed to be exact. For example, the spatial index may support an operator "Overlaps" that determines if two geometries overlap one another. The implementation of the indexed operator may be approximate in that it may sometimes return non-overlapping geometries. The secondary filter is applied by the system to reject such rows. The index routines specified above are merely exemplary of the type of routines that would be supported by non-native index types. The actual routines that the database server expects a domain expert to implement for an index type may vary from implementation to implementation. For example, the index routines may include a fetch routine that returns multiple rows at a time. For example, the i.sub.-- fetch routine may alternatively be defined as: FUNCTION i.sub.-- fetch(ihdl Ihdl, ancdata OUT VARRAY(60) OF TYPEANY, numrows OUT NUMBER) RETURN VARRAY(2) OF ROWID; where the actual number of rows being returned is specified through the numrows parameter. If the i.sub.-- fetch routine returns more than one result row at a time, the number of calls the database server will have to make to the i.sub.-- fetch routine will be significantly reduced. The reduction in the number of invocations can result in considerable performance gains. A similar technique can be used to improve the performance of the i.sub.-- joinfetch routine. QUERY PROCESSING EXAMPLE To illustrate how a database server may process a query, assume that a client has created a table "Emp" that includes a Name column, an Id column, and a Resume column. Assume also that the table Emp has the following rows:
______________________________________
Name Id Resume
Jags 100 Works at Oracle
John 101 Knows Unix
Ravi 102 Experience with Oracle rdbms
______________________________________
Assume that the client has created a text index on the Resume column. The text index is an index object that is accessed through routines supplied by a domain expert. To evaluate the query: SELECT * FROM Emp WHERE Contains(resume, `Oracle`);, the database server may invoke the query processing routines as follows.
______________________________________
hdl=i.sub.-- open();/*Passes in contains operator and arguments*/
/*gets back a handle*/
for (;;){
i.sub.-- start(hdl),/*Starting to access rows*/
for(;;){
rownum=i.sub.-- fetch(hdl);/*returns the rowids*/
if(|rownum) break;/* All rows done*/
/* Process rownum*/
if (|re.sub.-- execute)break; /* Check if the query needs to be
reexecuted?*/
}
i.sub.-- close(hdl);
______________________________________
In the present example, the calls to i.sub.-- fetch will successively return rows 1, 3, NULL. A query can be executed multiple times before closing the handle. THE DOMAIN EXPERT Before a non-native index type can be used, the index routines for the non-native index type must be created, and the database server must be made aware of the non-native index type and the location of its associated routines. The designer of the index routines is referred to herein as a "domain expert". FIG. 2 is a flow chart illustrating the steps performed by the domain expert to design a non-native index type according to one embodiment of the invention. As a preliminary step, the domain expert must decide which operators will be supported by the non-native index type. For example, the text index may support the operator "Contains", while a spatial index may support the operator "Overlaps". The operators supported by a non-native index type are referred to herein as indexed operators. Referring to FIG. 2, at step 202 the domain expert defines and codes functions to support non-indexed implementations of the indexed operators. The non-indexed implementation of the indexed operators can be coded as stand alone functions in the database language supported by the database server (e.g. PL/SQL), as a packaged functions, or as Abstract Data Type ("ADT") methods. According to one embodiment of the invention, the functions that implement the indexed operators must satisfy certain requirements. Specifically, all indexed operators must return Boolean values. The only valid return values are TRUE and FALSE. The parameters to the indexed operator are one or more IN parameters followed by zero or more OUT parameters. When the indexed operator returns FALSE, the values of each of the OUT parameters are set to NULL. The non-indexed implementation of indexed operators is essential because the system will not be able to use the index to evaluate the operator in all cases. For the purposes of explanation, it shall be assumed that a domain expert is developing a client that will use a text-based index that supports an operator "Contains" that takes as parameters a text value and a key and returns a Boolean value indicating whether the text contained the key. Assuming that the database server supports the PL/SQL database language, the non-indexed implementation of the Contains operator may be defined as:
______________________________________
CREATE FUNCTION Contains(Text IN VARCHAR2, Key IN
VARCHAR2)
RETURN Boolean AS
BEGIN
. . .
END Contains;
______________________________________
At step 204, the domain expert defines and codes (1) implementations for the index routines and (2) routines that implement the operator using an index object of the non-native index type. This coding is typically performed in a third generation programming language such as Pascal, C, or C++. These routines are compiled into executable modules that can be statically linked with the database server code, dynamically linked with the database server code, or executed as separate processes when invoked by the database server. For example, the domain expert of the text index may create the DDL routines to build the text index on a text column ("text.sub.-- create"), remove the index information when the index is dropped ("text.sub.-- drop"), and truncate the text index when the base table is truncated ("text.sub.-- truncate"). These implement the i.sub.-- create, i.sub.-- drop, and i.sub.-- truncate index routines for a specific type of non-native index. The DML routines created by the domain expert may include routines to manage the text index when rows are inserted to the base table ("text.sub.-- insert"), deleted from the base table ("text.sub.-- delete") or updated in the base table ("text.sub.-- update"). These routines implement the i.sub.-- insert, i.sub.-- delete, and i.sub.-- update index routines for the non-native index. The QP routines for the text index may include the routines text.sub.-- open, text.sub.-- start, text.sub.-- fetch, and text.sub.-- close which implement the index routines used to access the text index to retrieve rows of the base table that satisfy an operator predicate passed to the routines from the database server. In the present example, the Contains(. . . ) predicate of a query received by the database server is passed from the database server to the query routines which then access the text index and return the qualifying rows to the system. According to one embodiment of the invention, the names of all index routines are fixed, and the domain expert must supply routines which implement all of the index routines except the i.sub.-- joinfetch function. An implementation of the i.sub.-- joinfetch function must be supplied only when the index supports a "join" indexed function. At step 206, the domain expert creates a data type that associates the routines created in step 204 with the index routines to which they correspond. This data type brings together all the routines for managing and accessing the index object. For example, a data type named "OTextIType" which brings together all of the routines for managing and accessing the text index may be created as follows:
______________________________________
CREATE TYPE OTextIType
--DDL Routines
PROCEDUREi.sub.-- create(iargs Iargs, parms VARCHAR2(1024)) AS
EXTERNAL NAME `/text/text.sub.-- create` WITH CONTEXT;
PROCEDUREi.sub.-- drop(iname VARCHAR(30)) AS EXTERNAL NAME
`/text/text.sub.-- drop` WITH CONTEXT;
PROCEDUREi.sub.-- truncate(iname VARCHAR(30)) AS EXTERNAL
NAME `/text/text.sub.-- truncate` WITH CONTEXT;
--DML Routines
PROCEDUREi.sub.-- insert(iargs Iargs, rowid ROWID), newval Ivals) AS
EXTERNAL NAME `/text/text.sub.-- insert` WITH CONTEXT;
PROCEDUREi.sub.-- delete(iargs Iargs, rowid ROWID, oldval Ivals) AS
EXTERNAL NAME `/text/text.sub.-- delete` WITH CONTEXT;
PROCEDUREi.sub.-- update(iargs Iargs, rowid ROWID, oldval Ivals,
newval Ivals) AS EXTERNAL NAME `/text/text.sub.-- update` WITH
CONTEXT;
Query Processing Routines- for operators
FUNCTIONi.sub.-- open(iargs Iargs, opinifo Iops) RETURN Ihdl AS
EXTERNAL NAME `/text/text.sub.--open` WITH CONTEXT RETURN Ihdl;
PROCEDUREi.sub.-- start(ihdl Ihdl) AS EXTERNAL NAME `/text/text.sub.--
start` WITH CONTEXT;
FUNCTIONi.sub.-- fetch(ihdl Ihdl, ancdata OUT VARRAY(30) OF
TYPEANY) RETURN ROWID AS EXTERNAL NAME `/text/text.sub.--
fetch` WITH CONTEXT RETURN ROWID;
PROCEDUREi.sub.-- close(ihdl Ihdl) AS EXTERNAL NAME `text/text.sub.--
close` WITH CONTEXT;
);
______________________________________
The WITH CONTEXT option allows the external routines to rely on execution services provided by the database system that handle errors and allocate memory. At step 208, an Indextype schema object is created for the non-native index type. The Indextype schema object is associated with its implementation provided by the data type declared in step 206. The schema object declaration indicates both the data type created at step 206 and the operator functions created at step 204. For example, a schema object named "OTextI" for a text index that supports the Contains operator may be created by the declaration:
______________________________________
CREATE INDEXTYPE OTextI
OPERATORS Contains(VARCHAR2, VARCHAR2)
AS TYPE OTextIType;
______________________________________
After the statements generated in steps 206 and 208 are submitted to the database server, the non-native index type is considered registered with the database. That is, the database server is aware of the operators that can be used with an OTextI index type and the location of the routines that must be invoked to create and manage an OTextI index object. INHERITANCE According to an alternate embodiment, the domain expert may define the data type for the non-native index using an inheritance mechanism. According to this embodiment, the database system defines a base type, BaseIType with default implementations of all the index routines. The default actions could be, for example, to simply raise an error. Domain experts then derive a data type for their specific index type from the base type, and define routine implementations that override the default implementations of the index routines. The domain expert can override the implementations of any or all of the methods in the BaseIType object class. The use of inheritance enforces the interface for the index routines since the domain expert will derive the interface from the base type. USE OF EXTENSIBLE INDICES Once a non-native index type is registered with the database server, a client can create index objects of the non-native index type and submit queries that use the indexed operators employed by the non-native index type. In the text index example, a user may define text indices on text columns and use the associated Contains operator to query text data. For example, a client may define a table "EMP" using the definition: CREATE TABLE Emp (name VARCHAR2(64), id INTEGER, resume VARCHAR2(2000)); The client may then cause a text index to be created on the resume column by submitting to the database server the statement: CREATE INDEX resumeI ON Emp(resume) INDEXTYPE IS OTextI; During the creation of the text index, the database server invokes the create routine supplied by the domain expert (i.e. "text.sub.-- create"). Once the text index is created on the resume column, the database server may invoke the QP routines provided by the domain expert to use the text index during the processing of queries on the base table. For example, the text data in the resume column can be queried as: SELECT * FROM Emp WHERE Contains(resume, `Oracle`); The text index on the resume column may be used to efficiently process the query. To use the text index, the database server invokes the QP routines associated with the OTextI Indextype. TUPLE-BASED INDEX MODELS As mentioned above, the domain expert is responsible for coding the routines that manage and interpret index objects. It has been discovered that the entries in virtually all types of index structures can be represented by tuples. For example, document indexing involves parsing text and inserting words ("tokens") into an inverted index. Such index entries typically have the following logical form (token, <docid, data>) where "token" is the key, "docid" is a unique identifier for the related document, and "data" is a segment containing Information Retrieval ("IR") specific quantities. For example, a probabilistic IR scheme could have a data segment with token frequency and occurrence list attributes. The occurrence list identifies all locations within the related document where the token appears. Assuming an IR scheme such as this, each index entry would be of the form: (token, <docid, frequency, occlist>. . . ) The following sample index entry for the token Archimedes illustrates the associated logical content. (Archimedes, <5, 3, ›7 62 225 !>, <26, 2, ›33, 49!>, . . . ) In this sample index entry, the token Archimedes appears in document 5 at 3 locations(7, 62, and 225), and in document 26 at 2 locations(33 and 49). Note that the index would contain one entry for every document with the word Archimedes. Because the index entries of an index object can be represented as tuples, the routines implemented by the domain expert to store the index entries may employ the same interface as the routines used to implement the native access methods. According to one embodiment of the invention, the interface of the routines used to access index-only tables is used for the routines that implement the index object tables. INDEX-ONLY TABLES An index-only table is similar to a conventional table with an index on one or more of its columns. However, an index-only table differs from a standard table in that instead of maintaining two separate data containers for the table and its index, the database server only maintains a single index with no actual base table. As with conventional tables, clients manipulate index-only tables by submitting statements to the database server in the database language supported by the database server. However, all operations on the data in the table are performed by manipulating the corresponding index. Each entry in the index for an index-only table contains both the encoded key value and the associated column values for the corresponding row. That is, rather than having row ROWID as the second element of the index entry, the actual data from the corresponding row is stored in the index. Thus, every index entry for an index-only table has the form <primary.sub.-- key.sub.-- value, non.sub.-- primary.sub.-- key.sub.-- column.sub.-- values>. Index-only tables are suitable for accessing data via primary key or via any key which is a valid prefix of the primary key. Also, there is no duplication of key values as only non key column values are stored with the key. ROW OVERFLOW For index-only tables, index entries can become very large, since they are made up of <key, non.sub.-- key.sub.-- column.sub.-- values>tuples. If index entries get very large, then the leaf nodes may have to store one row or row piece, thereby destroying the dense clustering property of the index. To overcome this problem, a Row Overflow Area clause may be used. Specifically, users specify an overflow tablespace as well as a threshold value. The threshold is specified as percentage of the block size. If the row size is greater than the specified threshold value, then the non-key column values for the row that exceeds the threshold are stored in the specified overflow tablespace. When this occurs, the index entry contains <key, rowhead>pair, where the rowhead contains the beginning portion of the rest of the columns. The entry is like a regular row-piece, except that an overflow row-piece contains the remaining column values. INDEX-ONLY TABLE OPERATIONS According to one embodiment, the SQL statement "CREATE TABLE" used to create conventional tables may be used to create index-only tables. However, when used to create index-only TABLES, the following additional information is supplied: An ORGANIZATION INDEXED qualifier is used to indicate this is an index-only table. A primary key is specified through a column constraint clause (for a single column primary key) or a table constraint clause (for a multiple column primary key). A primary key must be specified for index-only tables. An optional row overflow specification clause that specifies overflow area physical attributes may also be supplied. A data row is placed in the overflow tablespace if the size of the data row exceeds the threshold. The threshold is specified as percentage of the block size. For example, an index-only table can be created to model inverted index for an IR client in the following manner:
______________________________________
CREATE TABLE docindex
( token char(20),
doc.sub.-- oid integer,
token.sub.-- frequency smallint,
token.sub.-- occurrence.sub.-- data varchar(512),
CONSTRAINT pk.sub.-- docindex PRIMARY KEY (token,
doc.sub.-- oid))
ORGANIZATION INDEX TABLESPACE text.sub.-- collection
PCTTHRESHOLD 20
OVERFLOW TABLESPACE text.sub.-- collection.sub.-- overflow;
______________________________________
In the above definition, the ORGANIZATION INDEXED qualifier indicates that docindex is an index-only table, where the row data reside in an index defined on columns that designate the primary key (token, doc.sub.-- id) for the table. The overflow clause indicates that pieces of data rows which exceed 20% of the block size, will be placed in the text.sub.-- collection.sub.-- overflow tablespace. No syntax changes are required to manipulate index-only tables. A user can use an index-only table in place of a regular table in SQL INSERT, SELECT, DELETE, and UPDATE statements. However, for index-only tables the rows are stored in the B-tree itself and these rows do not have a row identity (ROWD). Thus, an index-only table does not support implicit ROWID column and hence users cannot perform ROWID based-retrieval on index-only tables. Instead, users access the rows of index-only tables using the primary key. In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
|
Same subclass Same class Consider this |
||||||||||
