O-R gateway: a system for connecting object-oriented application programs and relational databases5504885Abstract This is a method and system of extending the capability of a relational database management system's precompiler to object-oriented languages comprising: embedding SQL statement in an object-oriented program; compiling the object-oriented program with a precompiler from the database management system into a temporary file; compiling the temporary file with an object-oriented precompiler into a file acceptable to an object-oriented compiler; and compiling the file with the object-oriented compiler into an executable file. Claims What is claimed is: Description FIELD OF THE INVENTION
______________________________________
DEPARTMENT (dname, floor, budget)
EMPLOYEE (ename, dept, project)
HOURLY.sub.-- EMP (ename, wage, skills)
EMP.sub.-- CAR (ename, decal.sub.-- date)
______________________________________
Primary keys are underlined. Foreign key constraints on these relations are: dept in EMPLOYEE is a foreign key matching dname in DEPARTMENT and ename is a foreign key in both HOURLY.sub.-- EMP and EMP.sub.-- CAR matching ename in EMPLOYEE. The partial primary key of EMP.sub.-- CAR, carno, in EMP CAR identifies an employee's cars (i.e., first, second, etc.) and is not unique across all tuples of the relation. Connections that exist between these relations based on the rules given above are: 1. Ownership connection. Since ename is the primary key of EMPLOYEE and is part (subset) of the primary key of EMP.sub.-- CAR and given the above foreign key constraints, there is an ownership connection from EMPLOYEE to EMP.sub.-- CAR. EMP.sub.-- CAR tuples are owned by EMPLOYEE tuples in the sense that an EMP.sub.-- CAR tuple cannot exist in the database without being related to an EMPLOYEE tuple. The multiplicity from an owner relation to an owned relation is In; an EMPLOYEE tuple can own several EMP.sub.-- CAR tuples, while an EMP.sub.-- CAR tuple must be owned by only one EMPLOYEE tuple. 2. Subset connection. There is a subset connection from EMPLOYEE to HOURLY.sub.-- EMP based on the connection attribute ename, which is the primary key of both EMPLOYEE and HOURLY.sub.-- EMP and a foreign key in HOURLY.sub.-- EMP. This implies that the set of entities that HOURLY.sub.-- EMP tuples describe is a subset of the set of entities described by the tuples of EMPLOYEE. 3. There is a reference connection from EMPLOYEE to DEPARTMENT based on connection attributes dept and dname, since none of the rules for ownership and subset connections is satisfied. An EMPLOYEE tuple is said to reference a DEPARTMENT tuple. The multiplicity of a reference connection is n:1; an EMPLOYEE tuple can reference only one DEPARTMENT tuple, while a DEPARTMENT tuple can be referenced by several EMPLOYEE tuples. We extend the definition of some of the connection types described above for the structural data model and define some new connection types. Our goal is to cover cases that may exist in a relational schema that are not covered by the above three connection types. In what follows, we first extend the definition of the ownership connection and then define some new connection types. In some cases, a foreign key A2 of a relation A1 (matching A1 of R1) is not part of the primary key of that relation, yet, there is an existence dependency of tuples of A1 on tuples of R1 expressed as a NON-NULL constraint on A2. This case becomes clear if we replace carno of EMP.sub.-- CAR with registrationNo whose values are unique across all tuples. The definition of EMP.sub.-- CAR becomes: EMP CAR (registrationNo, ename, decal.sub.-- date) where registrationNo is the primary key and ename is a foreign key which is constrained to be NON-NULL. An EMP.sub.-- CAR tuple can not exist in the database without being related to an EMPLOYEE tuple, and therefore, there is an ownership connection from EMPLOYEE to EMP.sub.-- CAR. To handle this case, we modify the definition of an ownership connection type to the following: An ownership connection from R1 to A1 exists if O-R NON-NULL(A2). Below, we introduce the definition of two new connection types. 4. A set-equality connection exists between R1 and R2 if AZ PK(A1) and A1=FK(R1) and A1=PK(R1) and A2=FK(R2). The last two predicates in this list of predicates is the common rule that the other three connection types must satisfy (see above). This connection type implies that the set of entities described in R1 must be equal to the set of entities described in A1. A practical example of this case can be illustrated if we define the relation: EMP.sub.-- INSU (ename, policy, date) which adds life insurance information about employees to the above database. To satisfy a requirement that every employee in the company must have life insurance, then ename in EMP.sub.-- INSU should be modeled as a foreign key attribute matching ename in EMPLOYEE, and ename in EMPLOYEE should also be modeled as a foreign key matching ename in EMP.sub.-- INSU. 5. A set-intersection connection exists between R1 and A1 if A1=PK(R1) and A2=PK(A1). As a special case, the common rule that is applicable to the above four connection types, does not apply to this connection type. A set-intersection connection between two relations implies that the two sets of entities described in the two relations do not have to be equal and their intersection may or may not be NULL. A procedure for generating a C++ schema out of a relational schema is discussed next. We use the following relational schema to illustrate this procedure, assuming that the foreign key and NON-NULL constraints described above are applicable.
______________________________________
DEPARTMENT (dname, floor, budget)
EMPLOYEE (ename, dept, project)
HOURLY.sub.-- EMP (ename, wage, skills)
EMP.sub.-- CAR (registrationNo, ename, decal.sub.-- date)
______________________________________
The generated C++ schema is as follows:
______________________________________
/* define a parameterized set class */
template <class type>
class set {
/* . . data members and member functions definitions . . */
class C.sub.-- DEPARTMENT {
private:
char *dname;
int floor;
float budget;
};
class C.sub.-- EMP.sub.-- CAR {
private;
float registrationNo;
char *decal.sub.-- date;
};
class C.sub.-- EMPLOYEE {
private;
char *ename;
C.sub.-- DEPARTMENT *dept;
SET<C.sub.-- EMP.sub.-- CAR>cars;
char *project;
};
class C.sub.-- HOURLY.sub.-- EMP; public C.sub.-- EMPLOYEE {
private:
float wage;
char *skills;
};
______________________________________
An equivalent schema represented graphically is shown in FIG. 5, where small circles denote classes, and thick and thin links denote type-subtype and data member (attribute) relationships, respectively. The following is a procedure that consists of seven steps to map relations and connection types to C++ constructs. Step one is create a class corresponding to each relation. For simplicity, we use relation names preceded by the string "C" to denote class names. Accordingly, the classes corresponding to the above relational schema are C.sub.-- DEPARTMENT C.sub.-- EMPLOYEE, C.sub.-- HOURLY.sub.-- EMP, and C.sub.-- EMP.sub.-- CAR (FIG. 5). Step two is each non-foreign key attribute of a relation becomes an attribute of the corresponding class (e.g., dname, floor, and budget attributes of DEPARTMENT become attributes of C.sub.-- DEPARTMENT). In C++, attributes are referred to as data members. The data types of the data members defined in this step are selected from the built-in C++ types (int, char, float, etc.). If the type of a data member is one of the C++ built-in types, it must be as close as possible to the data type of the corresponding relational attribute. Since, normally, the set of types in a relational DBMS does not exactly match the set of C++ built-in types, type conversion needs to be performed when objects are retrieved. (In case of Oracle, this type conversion is performed by the Oracle C Precompiler.) Step three is we use the terms referencing and referenced class to denote classes corresponding to a referencing and referenced relations, respectively. There is a reference connection from relation EMPLOYEE to relation DEPARTMENT, therefore C.sub.-- EMPLOYEE is a referencing class and C.sub.-- DEPARTMENT is a referenced class. In this step of the procedure, a data member is created in every referencing class. The type of this data member is a pointer to the referenced class. Hence, in FIG. 5, the data member dept of C.sub.-- EMPLOYEE is defined as a pointer to C.sub.-- DEPARTMENT. Step four is we use the terms owner and owned class to refer to classes corresponding to an owner and owned relations, respectively. Therefore, C.sub.-- EMPLOYEE is an owner class and C.sub.-- EMP.sub.-- CAR is an owned class. In this step, a data member is created for every owner class. The type of this data member is "SET of" the owned class. Hence, the data member cars of C.sub.-- EMPLOYEE is defined as SET of C.sub.-- EMP.sub.-- CAR. Step five is a subset connection between two relations maps to a type subtype relationship between the classes corresponding to the two relations. Therefore C.sub.-- HOURLY.sub.-- EMP is defined in FIG. 5 as a subtype of C.sub.-- EMPLOYEE. Step six is for any two classes corresponding to two relations connected by a set-equality connection, a new class is created to act as a super class of these two classes. The data members common to the two classes are moved up to the super class. Step seven is any two classes corresponding to two relations connected by a set-intersection connection are handled in the same way as step six above. Another possible mapping for step six above is to merge the two classes into one class whose set of attributes is the union of the two sets of attributes of the two classes. For example, C.sub.-- EMPLOYEE and C.sub.-- EMP.sub.-- INSU classes can be merged to form one class whose data members are ename, dept, project, policy, and date. However, we keep the schema described in step six as the default option and provide the DBA with the capability of interacting with the SGM module (FIG. 4) to specify other mapping options. The advantage of this is that step seven and the default option in 6 can be treated uniformly, which simplifies the initial prototype implementation. (Note that merging the two classes in step seven will result in loosing some semantic information about objects.) In addition, a database designer input is necessary during the mapping process for the following reasons. The first reason is to chose the names of classes and data members in the C++ schema. By default, data member names are the same as the attribute names, and class names are the same as the relations names preceded by the string "C.sub.-- ". A user (or a DBA) may prefer to chose different names. In this case, a name manager (part of O-R Gateway) needs to store the mappings between the C++ schema names and the underlying relational schema names. Query processing makes use of these name mappings. The second reason is to add the definition of necessary public member functions to the generated C++ schema. These functions define the behavior of the C++ objects. The third reason is to provide any information that is missing from the underlying relational DBMS (not all DBMSs support all functionalities) such as foreign-key relationships. The mapping rules and the DBA guidelines and choices will be recorded in the system since they will also guide the operation of the QTM and OGM modules (see FIG. 4). Note: NON-NULL and primary key constraints are supported by almost all commercial relational database systems. Foreign key constraints are supported by DB2, SYBASE 4.0, and INGRES 6.3. Oracle provides syntax for defining foreign key constraints and stores the definitions in its dictionary but does not enforce them (as of Oracle Version 6.0). Since the problem being considered is sizable, we only focus in this report on retrieval operations (queries). Further work is needed to support transactions that include update operations. One of the key features of object queries as supported by many existing object query languages is the use of path expressions. If Class1 has a data member whose type is Class2 and Class2 has a data member whose type is Class3 in some schema, then "Class1.Class2.Class3" is a path expression that starts at Class1 and ends at Class3. Path expressions enable logical navigation at the schema level and can be used in specifying predicates or identifying the list of attributes to be retrieved. For example, "Class1.Class2.Class3==value " is an associative predicate that identifies all the Class1 objects whose related Class3 objects are equal to the given value. Different query languages may use different syntax to express path expressions. For example, the above path expression is expressed in functional query languages as "Class3(Class2(Class1))." Supporting path expressions in an object query language does not violate the principle of physical data independence since the query optimizer may independently choose the appropriate access paths at the physical level in order to evaluate a given query. Path expressions are used for querying complex objects by specifying predicates on data members that are deeply nested within the structure of these objects. In O-R Gateway, we support the use of object queries in C++ programs that have the following structure: T1 -SELECT <class-name> or - <path expressions rooted at a single class>? -FROM <range variable declarations>? -WHERE <predicates that may involve path expressions>.? - Where the FROM clause is optional. This syntax is similar to that of OQL C++! developed at Texas Instruments (J. Blakeley, C. Thompson, and A. Alashqur, "Strawman Reference Model for Object Query Languages," Proceedings of the X3/SPARC/DBSSG OODB Task Group Workshop on Standardization of Object-Oriented Database Systems, Atlantic City, N.J., May 22, 1990. A revised version is also in the Proceedings of the International Journal on Computer Standards and Interfaces, 1991) for the Zeitgeist Object-Oriented Database System (Steve Ford, et al., "Zeitgeist: Database Support for Object-Oriented Programming," in the Proc. of the 2nd Int'l. Workshop on Object-Oriented Database Systems, 1988.). This syntax is upward compatible with that of the relational language SQL. To translate an object query to an SQL query, we need to translate path expressions and expressions involving inheritance relationships to equivalent SQL joins. These joins are over matching key and foreign key attribute values of the relations corresponding to the classes referenced in the object query. A From clause in the SQL query will list the referenced relations. The Query Translation Mule performs such translations. For illustration, the following are four example object queries and their equivalent SQL queries. The first two of these queries contain path expressions, the third query is to demonstrate how a FROM clause in an object query can be useful, and the fourth query is to demonstrate how inheritance is handled. (Example queries in this section are expressed against the relational schema and its equivalent C++ schema described in a proceeding section.) Query 1:
______________________________________
SELECT C.sub.-- EMPLOYEE.ename, C.sub.-- EMPLOYEE.project
WHERE C.sub.-- EMPLOYEE.dept->floor = 2;
QTM translates this query to the following SQL query:
SELECT EMPLOYEE.ename, EMPLOYEE.project
FROM EMPLOYEE, DEPARTMENT
WHERE EMPLOYEE.dept = DEPARTMENT.dname AND
DEPARTMENT.floor = 2;
______________________________________
The path expression "C.sub.-- EMPLOYEE.dept" in the WHERE clause of the object query is translated to the SQL join predicate "EMPLOYEE.dept=DEPARTMENT.dname". Syntactic simplification can be made in both object and SQL versions of a query by removing redundant information. For example, the attributes need not be qualified by their relation names in the SELECT clause of the above SQL query, therefore, reducing it to "SELECT ename, project". This is because each of ename and project is an attribute of exactly one of the relations referenced in the FROM clause. However, in this report, we use full syntax for the sake of clarity. Query 2 below includes a path expression in the SELECT clause. Query 2:
______________________________________
SELECT C.sub.-- EMPLOYEE.ename,
C.sub.-- EMPLOYEE.dept->budget
WHERE C.sub.-- EMPLOYEE.project = `OODB`;
Translates to the following SQL query:
SELECT EMPLOYEE.ename, DEPARTMENT.budget
FROM EMPLOYEE, DEPARTMENT
WHERE EMPLOYEE.dept = DEPARTMENT.dname AND
EMPLOYEE.project = `OODB`
______________________________________
The following object query demonstrates one of the ways in which the optional FROM clause can be used (note that the FROM clause is not optional in SQL). Query 3:
______________________________________
SELECT E1.ename, E2.ename
FROM C.sub.-- EMPLOYEE, E1, E2
WHERE E1.project = E2.project
______________________________________
This query selects pairs of employee names for employees co-working on the same project. E1 and E2 are two range variables whose type is C.sub.-- EMPLOYEE. This query is translated to the following SQL query:
______________________________________
SELECT E1.ename, E2.ename
FROM EMPLOYEE.E1, EMPLOYEE.E2
WHERE E1.project = E2.project
______________________________________
Since a subclass inherits the members of its superclass, an object query can reference the inherited data members directly. The following is an example query that involves inheritance. Query 4:
______________________________________
SELECT C.sub.-- HOURLY.sub.-- EMP.project, C.sub.-- HOURLY.sub.--
EMP.wage
WHERE C.sub.-- HOURLY.sub.-- EMP.wage > 10k
______________________________________
C.sub.-- HOURLY.sub.-- EMP inherits project form C.sub.-- EMPLOYEE, therefore project is referenced in the SELECT clause as an ordinary data member of C.sub.-- HOURLY.sub.-- EMP. The following SQL query accounts for this by performing an equijoin between the relations EMPLOYEE and HOURLY.sub.-- EMP over ename values.
______________________________________
SELECT EMPLOYEE.project, HOURLY.sub.-- EMP.wage
WHERE EMPLOYEE.ename = HOURLY.sub.-- EMP.ename AND
HOURLY.sub.-- EMP.wage > 10k
______________________________________
QTM will use the schema translation rules and DBA guidelines that are used by the Schema Generation Module (SGM) in the process of translating an object query to SQL. QTM will, based on these transformation rules, identify project as an attribute of EMPLOYEE and not of HOURLY.sub.-- EMP relation even though it is referenced as a data member of the C.sub.-- HOURLY.sub.-- EMP class in the object query. QTM will also, based on these transformation rules, identify ename as the attribute that links the two relations EMPLOYEE and DEPARTMENT, and therefore it generates the predicate "EMPLOYEE.ename-HOURLY.sub.-- EMP.ename" as part of the WHERE clause of the SQL query. If an object query is included in an OQL/C++ program that is processed by O-R Gateway, the translated SQL query will appear in the generated SQL/C++ program preceded by the necessary host variable declarations and Oracle connect statements, and will be followed by code that constructs C++ objects out of the retrieved relational data. This is performed by the OGM, which generates the code that assembles data retrieved from the relational database to construct complex C++ objects. An OQL/C++ application programmer needs to declare an aggregate object type such as an array or a set (or a pointer to an aggregate object) to hold the result of an object query. The data type of the elements of the aggregate object is either the class whose objects are to be retrieved or a pointer to it. Therefore, Query 1 above would be actually written in an OQL/C++ application program as follows (Note: the following syntax may not be very accurate, its goal is to convey the general meaning only):
__________________________________________________________________________
/* EMPAR is declared as a pointer to an array of pointers to C.sub.--
EMPLOYEE
objects */
C.sub.-- EMPLOYEE** EMPAR new C.sub.-- EMPLOYEES* 100!;
EMPAR =
SELECT C.sub.-- EMPLOYEE.ename, C.sub.-- EMPLOYEE.project
WHERE C.sub.-- EMPLOYEE.dept->floor = 2;
__________________________________________________________________________
This query will assign pointers to the first 100 C.sub.-- EMPLOYEE objects retrieved to the cells of the array EMPAR. Using C++ Template Class definition capability C++ Version 3.0 (Stanley Lippman, C++ Primer, Addison-Wesley Publishing Company, 1991)!, one can declare a set whose type is pointer to C.sub.-- EMPLOYEE objects as follows:
__________________________________________________________________________
/* SET is declared somewhere else in the program as a Template Class */
SET <C.sub.-- EMPLOYEE*>*EMPSET = new SET <C.sub.-- EMPLOYEE*>;
__________________________________________________________________________
EMPSET can then be used to hold pointers to the employee objects returned by the query. The Object Generation Module (OGM) will add the necessary code that creates the C++ objects, assigns attribute values retrieved from the database to data members of objects, and adds the necessary CONNECT statements to connect to Oracle. The C++ code generated by the QTM and OGM to implement the above query and assign the result to the array EMPAR is as follows:
__________________________________________________________________________
EXEC SQL BEGIN DECLARE SECTION;
VARCHAR uid 20!;
VARCHAR pwd 20!;
VARCHAR ename.sub.-- var 15!;
VARCHAR project.sub.-- var 15!;
int floor;
EXEC SQL END DECLARE SECTION;
EXEC SQL INCLUDE sqlca.h;
EXEC SQL INCLLTDE oraca.h;
strcpy(uid.arr,"USERNAME");
uid.len = strlen(uid.arr);
strcpy(pwd.arr,"PASSWORD");
pwd.len = strlen(pwd.arr);
EXEC SQL CONNECT :uid IDENTIFIED BY :pwd;
EXEC SQL DECLARE emp.sub.-- objects CURSOR FOR
SELECT EMPLOYEE.ename,EMPLOYEE.project
FROM EMPLOYEE,DEPARTMENT
WHERE EMPLOYEE.dept = DEPARTMENT.dname AND
DEPARTMENT.floor = 2;
EXEC SQL OPEN emp-object;
EXEC SQL WHENEVER NOT FOUND GOTO end.sub.-- of.sub.-- fetch;
/* retrieve data for the first 100 employee objects */
for (int = 0; i < 100; ++i) {
EXEC SQL FETCH emp.sub.-- object INTO :ename.sub.-- var; :project.sub.--
var;
ename.sub.-- var.arr ename.sub.-- var.len! = `/0`;
project.sub.-- var.arr project.sub.-- var.len! = `/0`;
/* create the i'th employee objects */
EMPAR i! = new C.sub.-- EMPLOYEE;
/* assign attribute values to corresponding object data members
*/
EMPAR i!->ename = ename.sub.-- var;
EMPAR i!->project = project.sub.-- var;
}/* end of for */
end.sub.-- of.sub.-- fetch:
EXEC SQL CLOSE emp.sub.-- object;
EXEC SQL COMMIT WORK RELEASE;
/* log off the Oracle database. */
__________________________________________________________________________
In Query 2, several C.sub.-- EMPLOYEE objects may share the same C.sub.-- DEPARTMENT object. In this case, we need to install a C.sub.-- DEPARTMENT object only once in main memory and make all related C.sub.-- EMPLOYEE objects point to it. This is done in O-R Gateway by maintaining an association table(s) that stores the relationship between a pointer to the object as represented in main memory and the key attribute value of the corresponding tuple as stored in the relational database. The association table is maintained as a C++ data structure and has the following format: the C++Pointer field stores a pointer to an object that has been installed in memory. Rel-Key-Attr-Value stores the key attribute value of the relational tuple that corresponds to the C++ object. Counter stores the number of other C++ objects in main memory that share (point to) this object. This information may be useful for garbage collection and for supporting update operations. In order to maintain this table, the generated SQL query needs to retrieve the key attribute value even if it is not referenced in the object query. The following shows Query 2 again, and the SQL query that is actually generated by the QTM.
______________________________________
SELECT C.sub.-- EMPLOYEE.ename,
C.sub.-- EMPLOYEE.dept->budget
WHERE C.sub.-- EMPLOYEE.project = `OODB`;
______________________________________
The generated SQL query:
______________________________________
SELECT EMPLOYEE.ename, DEPARTMENT.budget,
DEPARTMENT.dname
FROM EMPLOYEE, DEPARTMENT
WHERE EMPLOYEE.dept = DEPARTMENT.dname AND
EMPLOYEE.project = `OODB`
______________________________________
Whenever a new <ename, budget, dname> tuple is fetched, O-R Gateway assigns ename to a data member of a newly created C.sub.-- EMPLOYEE object. O-R Gateway then looks up the Rel-Key-Attr-Value column of the association table for a dname entry which has the same value as the one just retrieved. If a dname value exists, this means that the needed C.sub.-- DEPARTMENT object has already been installed in memory. In this case, a pointer from the newly created C.sub.-- EMPLOYEE object to this C.sub.-- DEPARTMENT object will be established by using the C++Pointer field of the association table, the budget data member is assigned the retrieved value, and the Counter field is incremented by one. If on the other hand, there is no dname entry in the Rel-Key-Attr-Value that has the same value, then a new C.sub.-- DEPARTMENT object is created, its budget data member is assigned the value retrieved by the SQL query, the C.sub.-- EMPLOYEE object is made to point to it, and a new tuple is inserted in the association table to related a pointer to this C.sub.-- DEPARTMENT object to the dname key attribute value. The OGM generates code that performs the above functions and maintains the association table up-to-date. The association table idea can be used to enable more than one query in the same OQL/C++ application program to retrieve information about objects of the same class. For example, both Query 1 and Query 2 above retrieve information about objects of the class C.sub.-- EMPLOYEE. These two queries can be used in the same OQL/C++ application program. In this case, an association table will relate pointers to C.sub.-- EMPLOYEE objects retrieved by the first query to their key attribute values. When the second query is executed, the association table is checked for every <ename, budget> tuple retrieved by the SQL query to see if a C.sub.-- EMPLOYEE object corresponding to the ename value has already been installed in memory by the first object query. If so, the C.sub.-- EMPLOYEE.dept->budget data member of this object will be assigned the value retrieved by the second query. Otherwise, a new C.sub.-- EMPLOYEE object is installed and the association table is updated to reflect the new situation. An object faulting mechanism can be implemented as part of O-R Gateway. Whenever a pointer to an object that is not in the C++ environment (as indicated by the association table) is traversed in the C++ program, a query will be generated to retrieve the relevant relational data which will then be used to seamlessly install the referenced object in the C++ environment. The association table will be updated accordingly. Although the present invention and its advantages have been described in detail by way of the preferred embodiment, it is to be understood that this is for example only and that various changes, substitutions and alterations can be made without departing from the spirit and scope of the invention as defined by the appended claims.
|
Same subclass Same class Consider this |
||||||||||
