Method for making data objects having hidden pointers persistent
5590327
Abstract
A method for making data objects having hidden pointers persistent is disclosed. A pre-existing process for creating data objects is modified so as to selectively inhibit both the allocation of memory space for the data object and the initialization of data within the data object. A data object with hidden pointers created by a previous program invocation is retrieved into memory by a subsequent program invocation. The modified object creation process is then advantageously applied to the data object. As a result, the hidden pointers within the data object are initialized without allocating new memory or disturbing the data within the data object. Where the object creation process includes the execution of a user supplied initialization function, the initialization function is modified so as to inhibit the initialization of data within the data object. A global flag is set by the object creation process and tested within the initialization function. If the initialization function finds the flag to be set, execution of the data initialization code is avoided.
Claims
We claim:
Description
FIELD OF THE INVENTION
______________________________________
class person {
public:
char firstname[MAX];
char lastname[MAX];
int age;
virtual void print();
};
class student: virtual public person {
public:
char university[MAX];
virtual void print();
};
class employee: virtual public person {
public:
char company[MAX];
int salary;
virtual void print();
};
class student_employee: public employee,
public student {
public:
int maxhours;
virtual void print();
};
______________________________________
In the above code, the specification for each of the four classes declares one or more data members. Each specification also includes the declaration of a virtual function with the name print. In particular, class person is declared to include three data members (a first name, a last name, and an age), and an as yet unspecified version of a virtual function named print. Class student is declared to include one data member (a university name) in addition to those included in its base class person, as well as an as yet unspecified version of a member function named print. Class employee is declared to include two data members (a company name and a salary) in addition to those included in its base class person, as well as an as yet unspecified version of a member function named print. And class student_employee is declared to include one data member (a maximum number of working hours) in addition to those included in each of its base classes, class employee and class student, as well as an as yet unspecified version of a member function named print.
In the above code, class person is declared to be a virtual base class of both class student and class employee. As described above, this declaration ensures that only one copy of the data members of the base class appears in each instance of any derived class. The virtual base class is shared by all the components of the inheritance hierarchy that specify the given class as a virtual base class. Declaring a base class as virtual has no effect with single inheritance (as in the above examples of class employee and class student), but does make a difference in the case of multiple inheritance (as in the above example of class student_employee).
The following C++ code illustrates definitions of the virtual function print, as might be provided for each of the four classes defined in the above code:
______________________________________
void person::print()
{
cout << firstname << " " << lastname
<< ", age = " << age << endl;
}
void student::print()
{
person::print();
cout << "student at " << university << endl;
}
void employee::print()
{
person::print();
cout << "employed at " << company << endl;
}
void student_employee::print()
{
person::print();
cout << "student at " << university << endl;
cout << "employed at " << company << endl;
}
______________________________________
In the above code, a version of virtual function print is defined for each class which appropriately prints out relevant information regarding the data object which is passed to it as an argument. Such relevant information includes name and age for the class person; name, age and university for the class student; name, age and company for the class employee; and name, age, university and company for the class student_employee. When invoking print through a pointer (or reference) which has been declared to be a pointer to an object of type person, the actual virtual function to be called will be determined at run time according to the actual type of the referenced object. For example, consider the following program code which illustrates the use of the print virtual function:
______________________________________
int main()
{
person *pp = new person;
student *ps = new student;
. . .
pp->print();
ps->print();
. . .
pp = ps;
pp->print();
. . .
}
______________________________________
The first pp->print function call in the above program invokes the print function defined for class person, since pp points to a person object. As a result, name and age information will be printed by the version of the virtual function print defined for class person. Similarly, the ps->print function call invokes the print function of class student. However, the second pp->print function call invokes the print function of class student even though the declared data type of pp is a pointer to person, because pp was assigned a pointer to a student object in the preceding line of code.
The Use of Virtual Base Classes in C++
As described above, the purpose of virtual base classes is to allow the sharing of base class data members in cases of multiple inheritance. In the above example, class person is a virtual base class of both the student class and the employee class, and class student_employee is derived from both the student class and the employee class. For each base class of a derived class object, it is necessary for a portion of the representation of the derived class object to be devoted to the representation of the data associated with that base class. For example, a student object includes a person "sub-object," as does an employee object. Furthermore, a student_employee object includes both a student sub-object and an employee sub-object. However, class person is a virtual base class of both class student and class employee. Therefore, every student_employee object should advantageously contain only one instance of class person instead of two. Both the employee sub-object and the student sub-object will share this instance of person. Consider the following C++ program code:
______________________________________
int main()
{
student_employee *se;
int a, b;
. . .
se->student::age = a;
. . .
se->employee::age = b;
. . .
}
______________________________________
Because se->student and se->employee share the same person sub-object (since the person base class was declared to be a virtual base class of each), se->student::age and se->employee::age both refer to the same data component, i.e., se->person::age. Therefore, the C++ implementation must ensure that these data items are, in fact, one and the same.
C++ Implementation of Virtual Functions and Virtual Base Classes
FIGS. 2-5 illustrate the memory representations of an object of the type of each of the four above-defined classes in a typical C++ implementation. FIG. 2 shows the memory allocation for a person object. FIG. 3 shows the memory allocation for a student object. FIG. 4 shows the memory allocation for an employee object. And FIG. 5 shows the memory allocation for a student_employee object. As can be seen from the illustrations, each object of a class that has virtual functions contains a hidden pointer that points to a virtual function table, called the vtbl. The vtbl contains the addresses of the specific virtual functions to be called for the given object. In the case of derived class objects, the vtbl of a base class sub-object also contains offsets (deltas) that are used to find the address of the derived class object given the address of the base class sub-object.
In FIG. 2, vtbl pointer 22 is the first entry in the memory layout of person object 21. Also included in the memory layout are, of course, the entries for the data members of the class, firstname 23, lastname 24 and age 25. Vtbl pointer 22 points to person vtbl 26, which contains the address of the specific version of the virtual function which is to be invoked when the given named function (i.e., print) is applied to this object, namely &person::print.
In FIG. 3, the memory layout of student object 31 is shown. The first portion of the layout comprises data specific to the class student. It begins with vtbl pointer 32 which points to student vtbl 35.
Next, the layout includes vbase pointer 33. Since person is declared as a virtual base class of student, references to the person component of a student object are resolved by an indirection through a pointer. This pointer is called the vbase pointer. Note that vbase pointer 33 in FIG. 3 points to the second portion of the layout of student object 31, namely person sub-object 36. In particular, person sub-object 36 comprises vtbl pointer 22 which points to person vtbl 28, and the entries for the data members of class person, namely firstname 23, lastname 24 and age 25. After vbase pointer 33 and before person sub-object 36, the memory layout of student object 31 includes the single specific data member of class student, university 34. Student vtbl 35 contains the address of the appropriate version of the print function which is to be called when it is invoked with student object 31 as its argument, namely &student::print. In addition, person vtbl 28 not only contains the address &student::print, but also contains the offset that is used to find the address of the derived class object (student object 31), given the address of the sub-object (person sub-object 36). Specifically, each vtbl for a base class sub-object contains an offset representing the relative location of the sub-object in the memory layout of the derived object. For example, person vtbl 28 includes the value delta (student, person), which is equal to the difference between the addresses of vtbl pointer 22 and vtbl pointer 32 in student object 31.
When a pointer which has been declared to be a pointer to an object of a base class type is assigned to point to an object of a derived class type, the pointer must point to the base class sub-object within the derived class object. Otherwise, the semantics of C++ would be violated.
For example, after the assignment "pp=ps;" in the above illustrated program code, the pointer pp 40 (declared as a pointer to a person object) is pointing to the head of person sub-object 36, while the pointer ps 39 (declared as a pointer to a student object) is pointing to the head of student object 31. Subsequent to the execution of the assignment "pp=ps;" in the illustrated program code, the function call "pp->print();" is executed. Since pointer pp 40 points to person sub-object 36, this call requires an indirection via vtbl pointer 22 to person vtbl 28. The address of the appropriate function, namely &student::print, will be retrieved from person vtbl 28. However, student::print requires that it receive the address of a student object as its argument. This address is therefore calculated by subtracting from pointer pp 40 the value of delta (student, person) as stored in person vtbl 28.
In a similar fashion to the illustration of student object 31 in FIG. 3, FIG. 4 illustrates the memory layout of employee object 41. The first portion of the layout comprises data specific to the class employee. It starts with vtbl pointer 42 which points to employee vtbl 45. Next, the layout includes vbase pointer 33 which points to the second portion of the layout of employee object 41, namely person sub-object 36. Person sub-object 36 comprises vtbl pointer 22 which points to person vtbl 29, and the entries for the data members of class person, namely firstname 23, lastname 24 and age 25. After vbase pointer 33, the memory layout of employee object 41 includes the specific data members of class employee, namely, company 43 and salary 44. Employee vtbl 45 contains the address of the appropriate version of the print function which is to be called when it is invoked with employee object 41 as its argument, namely &employee::print.
In addition, person vtbl 29 not only contains the address &employee::print, but also contains the offset that is used to find the address of the derived class object (employee object 41), given the address of the base class sub-object (person sub-object 36). This value is delta (employee, person), which is equal to the difference between the addresses of vtbl pointer 22 and vtbl pointer 42 in employee object 41.
As pointed out above, because person is declared as a virtual base class of student, references to the person component of a student object require an indirection through a pointer, called the vbase pointer. In the case of student object 31 and employee object 41 as illustrated in FIG. 3 and FIG. 4, respectively, this indirection may seem unnecessary. In these cases, there is only one vbase pointer, which could therefore readily be replaced by a fixed offset. Such indirection is required, however, in order to implement sharing of a virtual base class in objects of types specified using multiple inheritance.
FIG. 5 illustrates the memory layout of student_employee object 51, an object of a class for which such multiple inheritance is specified. In particular, the illustrated memory layout of student_employee object 51 begins with the representation of employee sub-object 57, which appears as it does in FIG. 4. Specifically, it comprises vtbl pointer 42, which points to employee/student_employee vtbl 54. Employee/student_employee vtbl 54 contains the address of the appropriate version of the print function which is to be called when it is invoked with student_employee object 51 as its argument, namely &student_employee::print. Note that the same vtbl is used for the employee sub-object as is used for the student_employee object as a whole. This optimization is utilized in most C++ implementations.
Specifically, it enables the sharing of the vtbl of a derived class object with its first non-virtual base class sub-object, since both objects can be assigned the same address. It is as a result of this optimization that the specific data member for the student_employee object itself is deferred to later in the memory layout. After vtbl pointer 52 there is a first instance of vbase pointer 33 pointing to the portion of student_employee object 51 which represents the person sub-object, namely, person sub-object 36. Following this are data members company 43 and salary 44, which are the data members specific to employee sub-object 57.
Next is the representation of student sub-object 58, which appears as it does in FIG. 3. Specifically, it includes vtbl pointer 32, a second instance of vbase pointer 33, and the specific data member of the student sub-object, university 34. Vtbl pointer 32 points to student vtbl 55 which contains the address of the appropriate version of the print function to be called when it is invoked with student_employee object 51 as its argument, namely &student_employee::print. The second instance of vbase pointer 33 also points to the portion of student_employee object 51 which represents the person sub-object, person sub-object 36. Note that in order to implement virtual base class sharing, it is necessary for both instances of vbase pointer 33 to point to the same person sub-object 36.
Following student sub-object 58 is the specific data member for the student_employee object itself, maxhours 53. Finally, the single representation of the person sub-object, person sub-object 36, appears, including vtbl pointer 22 pointing to person vtbl 56, which once again contains the address of the appropriate version of the print function to be called when it is invoked with student_employee object 51 as its argument, namely &student_employee::print.
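The sharing of the person sub-object described above can be observed from ordinary C++ code. The following is a minimal sketch rather than part of the original disclosure: the four classes are cut down to the members needed here, and the helper names kind, shares_single_person and hidden_overhead are hypothetical. The exact amount of hidden data reported by hidden_overhead depends on the compiler and platform.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified versions of the four classes; only the members needed here.
class person {
public:
    int age = 0;
    virtual std::string kind() const { return "person"; }
};
class student : virtual public person {
public:
    std::string kind() const override { return "student"; }
};
class employee : virtual public person {
public:
    std::string kind() const override { return "employee"; }
};
class student_employee : public employee, public student {
public:
    int maxhours = 0;
    std::string kind() const override { return "student_employee"; }
};

// Hypothetical helper: true when the student path and the employee path
// both lead to the same person sub-object.
bool shares_single_person(student_employee& se) {
    person* via_student = static_cast<student*>(&se);
    person* via_employee = static_cast<employee*>(&se);
    return via_student == via_employee;
}

// Hypothetical helper: bytes in the object beyond its two declared int
// members (hidden pointers plus any padding); implementation-specific.
std::size_t hidden_overhead() {
    return sizeof(student_employee) - 2 * sizeof(int);
}
```

With these definitions, a value written through se.student::age is the value read back through se.employee::age, and a call to kind through a person pointer to a student_employee object is dispatched to the student_employee version.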
Persistence and the Hidden Pointers Problem
As described above, the ability to save data objects, e.g., on disk, and later to retrieve them in a subsequent program invocation requires that the data objects be persistent. In other words, they must remain valid across program invocations. Virtual functions and virtual base classes have an impact on persistence because of the hidden vtbl and vbase pointers generated by C++ compilers in implementing these facilities. As seen above, virtual function invocations involve indirections that use vtbl pointers to access entries in the virtual function tables. References to the components of virtual base classes must follow vbase pointers. The vtbl and vbase pointers are hidden pointers because they represent implementation related information and are invisible to the user. In other words, there is no mechanism in C++ by which the user can directly manipulate these pointers.
Unfortunately, hidden pointers are volatile since they are not valid beyond the invocation of the program that created them. Saving objects containing hidden pointers on disk and subsequently reading these objects back into memory in another program (or in a subsequent invocation of the same program) will result in the hidden pointers becoming invalid. Absent these pointers being "fixed", a reference to a virtual function or a component of a virtual base class will likely lead to an illegal (if not nonsensical) memory reference. Of course, the same observation holds true for the values of data members that are themselves volatile pointers. But in the case of data members, the programmer can ensure that they are not used as pointers with invalid values. For example, alternative representations (e.g., indices) may be used. The programmer is fully aware of the existence of these data members and can directly manipulate them in such a manner as to avoid any memory reference problem.
In the case of hidden pointers, however, the user has no such direct control.
Fixing the Hidden Pointers with a Modified Object Creation Process
In accordance with an illustrative embodiment of the present invention, a method is provided for making data objects with hidden pointers persistent. In particular, each data object of a user-defined class is created in an object oriented language by applying some pre-existing object creation process. In C++ this process is known as the new operator. The method of one illustrative embodiment of the present invention involves the modification of this object creation process to limit the functionality thereof. This modified process is then applied to data objects which have been read into memory (e.g., from disk) by a program invocation subsequent to the program invocation that created the data objects.
The normal operation of the object creation process which results from the invocation of the new operator in C++ is illustrated in FIG. 6. Step 61 comprises the allocation of the appropriate quantity of memory space (e.g., by assigning memory space from free storage areas such as the "heap") as is required to represent the given data object. (See, for example, FIGS. 2-5.) This quantity of memory is readily determined based on the specification of the class as provided by the user. Step 62 comprises initialization of the data members of the class (and of any sub-classes) in the memory space allocated. This step is optional in that the user may or may not provide information as to which data members are to be initialized and to what values, whenever a new data object is created. In C++, such initialization is typically accomplished by invoking a user-supplied initialization routine, known as a "constructor." If the user does not supply any such constructor, the data members will not be initialized to any particular values upon the object's creation.
Finally, step 63 comprises the initialization of all hidden pointers contained in the data object, and, if necessary, the creation of any corresponding virtual function tables. This step requires no information from the user beyond that contained in the specification of the class (and of any sub-classes). The C++ run-time system will know to what address each of these pointers must point, based on the class specification and the system's own arrangement of various information (e.g., the code of the virtual functions) in its own memory space.
The order of performance of step 62 and step 63 is of no importance. The execution of the new operator may perform either step first or may even intertwine them. In particular, the combination of step 62 and step 63 is often considered to be the application of the (overall) constructor for the given data object. Specifically, the C++ compiler adds the code necessary to implement step 63 to the user supplied constructor for performing step 62. If no user supplied constructor exists, the resultant constructor consists only of the added code.
FIG. 7 illustrates a modification to be made to the object creation process (i.e., the functionality of applying the new operator) according to one illustrative embodiment of the present invention. In particular, step 61 and step 62 are avoided, and only step 63 is performed in the modified version of the process. In this manner, the hidden pointers are advantageously initialized to valid address values, and yet no new data object is created (i.e., no memory is allocated). Moreover, the values of the data members of the data object are left intact.
FIG. 8 illustrates the application of the modified object creation process of FIG. 7 to data objects which have been retrieved into memory by a program invocation subsequent to the one that created the data object. In particular, step 71 retrieves the previously created data object into memory.
As a result, the desired object is in memory, even though it contains invalid pointers. Then, modified object creation process 72, which comprises only step 63 (and not step 61 or step 62), is applied to the data object to fix the hidden pointers.
In accordance with another aspect of the present invention, the process illustrated in FIG. 7 may be created directly without modifying an existing object creation process. Specifically, hidden pointers contained in existing data objects are initialized without allocating new memory or disturbing the contents of data members within the objects. In this manner, a process equivalent to that created by the modification procedure described above is produced. That is, the process comprises step 63 as shown in FIG. 7. This directly created process may be applied to existing data objects to fix hidden pointers as illustrated in FIG. 8, in the same manner as described above for the modified object creation process.
C++ Implementation of the Modified Object Creation Process
As pointed out above, the operator new is the C++ object creation mechanism. Therefore, it is necessary to modify the normal process of invoking this operator to avoid the allocation of memory and the initialization of data members. To avoid the allocation of memory, the new operator is overloaded by defining a new version of operator new. As is well known by C++ programmers of ordinary skill, a function name is said to be "overloaded" when it has two or more distinct meanings. Specifically, the intended meaning of any particular use is determined by its context. In C++, two or more functions can be given the same name provided that each signature (argument structure) is unique, in either the number or the data types of their arguments. In particular, the address of the location where the retrieved data object has been stored will be passed to this new version of operator new.
The function will merely return this same address as its result, without allocating any new storage. The function call will, however, cause the appropriate constructor to be invoked. The following C++ code defines the overloaded operator new:
______________________________________
class _ode { };
void* operator new(size_t, _ode *p)
{
return (void *) p;
}
______________________________________
Class _ode is a unique data type defined to ensure that the overloaded definition of new is invoked. Note that C++ requires that the first parameter of an overloaded definition of function operator new be of type size_t and also requires that new return a value of type void *. Suppose, for example, that p points to an employee object that has been retrieved into memory. Then the overloaded definition of operator new is applied to the given employee object by the following line of C++ code:
new ((_ode *) p) employee;
This invocation of operator new will not allocate any new storage, but it will invoke the argumentless constructor for class employee. To avoid the initialization of data members in the data object, therefore, the constructor must be modified so that it will not execute any user specified constructor code (if any exists) when it is called. In this manner it will only initialize the hidden pointers. This is advantageously achieved by defining a global variable _fix_hidden which will operate as a flag to indicate whether or not the constructor is being invoked only to fix hidden pointers. This variable may be declared, for example, by the following line of C++ code:
short _fix_hidden;
The _fix_hidden flag will be used to distinguish between the case where the constructor is being invoked by the modified object creation process (i.e., the overloaded operator new) and where it is being invoked by the original unmodified object creation process. In particular, assume that a class D defines a user specified constructor of the following form:
______________________________________
D::D(parameter-declarations_opt)
{
. . .
}
______________________________________
The subscript opt indicates that the parameter declarations are optional, since a constructor function may or may not have arguments. This constructor, as well as every other user specified constructor, is transformed as follows:
______________________________________
D::D(parameter-declarations_opt)
{
if (!_fix_hidden) {
. . .
}
}
______________________________________
This transformation thereby ensures that if the global variable _fix_hidden is set (i.e., is non-zero), then no user specified code will be executed when the constructor is called. Therefore, none of the data members of the retrieved object will be modified (i.e., they will not be initialized). However, it is also necessary to ensure that any initializers present in a constructor definition do not modify any data members. As is well known by C++ programmers of ordinary skill, initializers are given just before the constructor body as follows:
______________________________________
D::D(parameter-declarations_opt) initializer-list
{
. . .
}
______________________________________
Initializers may be used to initialize the data members of the object as well as base class components (by another constructor call). Initializers that are themselves constructor calls need not be modified, since these constructor functions will themselves have been modified to execute conditionally based on the value of the global variable _fix_hidden. But those initializers which specify an initial value for a data member are modified to change the value of the data member only if the constructor is being called to initialize a newly created object and not to fix hidden pointers for an object that has been retrieved into memory. For example, an initializer of the form
m(initial-value)
where m is a data member, will be transformed to the following initializer:
m(_fix_hidden ? m : initial-value)
In this manner, when the global flag variable _fix_hidden is set, the initializer effectively assigns the member to itself, thus leaving the current value of the data member intact.
The initialization of hidden pointers, that is, the modified object creation process 72 as shown in FIG. 8, may thus be encapsulated in a single member function, reinit, that is generated for each class. For example, the body of the reinit function for class student_employee might comprise the following C++ code:
______________________________________
extern short _fix_hidden;
void student_employee::reinit(void* p)
{
_fix_hidden = 1;
new ((_ode *)p) student_employee;
_fix_hidden = 0;
}
______________________________________
In particular, function reinit sets the global variable _fix_hidden to 1 before invoking the overloaded version of the new operator (which does not allocate any storage). Any constructors that will be invoked as a result of the new operator call will find _fix_hidden set and thus will neither execute any user specified code in the constructor nor modify any data member values through initializers. The only effect will be to fix the hidden pointers by changing their addresses to the appropriate values. Function reinit then sets the global variable _fix_hidden back to 0 before returning.
In the foregoing discussion, primary reference to object-oriented languages and practices has illustratively been through the C++ language and practices, but it will be understood that other object-oriented contexts may be substituted in appropriate cases. In addition, each reference to a compiler should be deemed to include translators, interpreters and any other means by which source code is processed, thereby directly or indirectly resulting in the execution of the specified functionality.
Although a specific embodiment of this invention has been shown and described herein, it is to be understood that this embodiment is merely illustrative of many possible specific arrangements which can be devised to represent application of the principles of the invention. Numerous and varied other arrangements can be devised in accordance with these principles by those of ordinary skill in the art without departing from the spirit and scope of the invention.
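The pieces described above (the overloaded operator new, the global flag, the transformed constructor and the reinit function) can be put together in one compilable sketch. This is an illustration under stated assumptions, not the implementation of the disclosure. The names ode_tag and fix_hidden stand in for _ode and _fix_hidden, because identifiers beginning with an underscore are reserved at global scope in current C++. The functions describe and restore_and_describe are hypothetical, reinit returns the repaired pointer instead of void, and the initializer transformation is left out. A disk read is simulated by a byte image in which only the data members hold valid values. Current ISO C++ does not guarantee that member values survive a second constructor call, so the sketch relies on compiler behavior; GCC's -flifetime-dse optimization, for instance, may discard the earlier stores unless it is disabled with -fno-lifetime-dse or the call is hidden from the optimizer, as is done here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <string>

// Hypothetical stand-ins for the _ode type and the _fix_hidden flag.
struct ode_tag {};
short fix_hidden = 0;

// Overloaded operator new: hands back the address it was given, allocating nothing.
void* operator new(std::size_t, ode_tag* p) { return p; }
// Matching placement delete, called only if a constructor throws.
void operator delete(void*, ode_tag*) {}

class employee {
public:
    char company[16];
    int salary;

    employee() {
        if (!fix_hidden) {  // transformed constructor body: skipped during a fix-up
            std::strcpy(company, "unknown");
            salary = 0;
        }
    }
    virtual std::string describe() const {
        return std::string(company) + ":" + std::to_string(salary);
    }
    static employee* reinit(void* p);
};

// Runs only the compiler-generated part of the constructor on existing storage.
employee* employee::reinit(void* p) {
    fix_hidden = 1;
    employee* fixed = new (static_cast<ode_tag*>(p)) employee;
    fix_hidden = 0;
    return fixed;
}

// Simulates retrieving an object whose hidden pointers are stale, then repairs it.
std::string restore_and_describe(const char* company, int salary) {
    employee original;
    std::strncpy(original.company, company, sizeof original.company - 1);
    original.company[sizeof original.company - 1] = '\0';
    original.salary = salary;

    // Offsets of the data members inside the object.
    const unsigned char* base = reinterpret_cast<const unsigned char*>(&original);
    std::size_t company_off = static_cast<std::size_t>(
        reinterpret_cast<const unsigned char*>(original.company) - base);
    std::size_t salary_off = static_cast<std::size_t>(
        reinterpret_cast<const unsigned char*>(&original.salary) - base);

    // The "retrieved" image: valid data members, garbage everywhere else,
    // including wherever the compiler keeps the hidden pointer.
    alignas(employee) unsigned char image[sizeof(employee)];
    std::memset(image, 0xFF, sizeof image);
    std::memcpy(image + company_off, original.company, sizeof original.company);
    std::memcpy(image + salary_off, &original.salary, sizeof original.salary);

    // Precaution: calling through a volatile pointer keeps an optimizer from
    // treating the stores above as dead once it sees the constructor call.
    employee* (*volatile fix)(void*) = &employee::reinit;
    employee* restored = fix(image);
    return restored->describe();  // virtual call through the repaired hidden pointer
}
```

Calling restore_and_describe with a company name and a salary returns a string built by a virtual function invoked on the repaired object, which shows that the hidden pointer was made valid while the data members kept the values they had in the image.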
