Data collection and restoration for homogeneous or heterogeneous process migration6442663Abstract A technique for process migration between computers is disclosed, particularly for collecting the memory contents of a process on one computer in a machine-independent information stream, and for restoring the data content from the information stream to the memory space of a new process on a different computer. The data collection and restoration method enables sophisticated data structures such as indirect memory references to be migrated appropriately between heterogeneous computer environments. Claims We claim: Description This invention pertains to methods and apparatus for collecting and restoring the data contents of a process in the memory space of a computer, more particularly for data collection and restoration between computers which may or may not have the same computing platform.
TABLE 1
Source Destination Source Destination
EDGE Node Node Address Address
E1 V3 V4 0105 0207
E2 V3 V5 0109 2026
E3 V4 V5 0215 2026
E4 V4 V6 0211 0899
The Type information (TI) table is created to track properties of each data type used in the process. The data type is the basic property that describes the semantic meaning of the object stored in a memory block. The type of a memory block is determined by the type of its contents (or data). A memory block contains an object or an array of objects of a particular type. The data type could be a "primitive" data type such as integer, character, or floating-point. It could also be a complex data type such as an array, and structure, record, pointer, or combination of these different types. The array type is an abstraction of data in which a memory block contains multiple data of the same type. The structure type describes the data when a memory block contains data of different types. The pointer type is the type of a memory block that stores indirect memory references to memory blocks, perhaps including itself. In most of the high-level programming languages such as C, a name can be given to a data type. This name can be used to reference a type of a user-defined object. For example, suppose that TREE is defined by a user as a type of a memory block containing an integer and two pointers to memory blocks of type TREE. In FIG. 4, the type of nodes V1 and V2 is integer; while those of nodes V3, V4, V5, and V6 are TREE. The TI table is used to store information about every data type, as well as basic functions for data collection and restoration. Other parts of the TI table are the type-specific saving and restoring functions. The saving and restoring functions collect the contents of a memory block node in the MSR graph, and restore the collected information back to a memory block node, respectively. To collect the content of a graph node, which is a memory block, the basic saving function associated with the type of the memory block is invoked. The saving function then generates the machine-independent information of a memory block; the restoring function takes the machine-independent representation of the information and restores it in a machine-specific form to the appropriate memory location. To collect a graph edge, which is a pointer, the algorithm translates the machine-specific pointer address to machine-independent pointer information. The MSR Look up Table (MSRLT) data structures are constructed to provide machine-independent locations of memory blocks and a mechanism to access them. The machine-independent locations of memory blocks are the indices of the MSRLT data structure to access those memory blocks. Since there could be a large number of memory blocks allocated in the process memory space, a method is provided to select the memory blocks that need to be registered into the MSRLT data structures. This selection keeps the size of the MSRLT data structures small, and reduces the searching time when the MSRLT is used. According to yet another feature of the invention, algorithms to collect and restore the live data content of the process are provided. Since the data content of the process may be represented in the form of an MSR graph, the data collection algorithm may traverse and collect graph components (nodes and edges) in a depth-first manner, that is, it travels from the "leaves" back up to the "root" of the graph. When a depth-first search algorithm is used to collect graph components, it is assumed that an initial set of nodes to be collected is known, based on live variable analysis or other criteria. For example, in FIG. 5, the nodes V3 and V4 are the initial set. This set of nodes should be known by both the sender and receiver processes before the data collection and restoration mechanisms begin. Since the receiver process does not initially know anything about the MSR nodes of the sender process, the information about the MSR nodes of the sender process must be received by the receiver process before data restoration begins. Assuming that the receiver process knows the information of the MSR graph and the initial set of nodes V3 and V4, FIG. 5 shows an example of the output of data collection on nodes V3 and V4. In collecting V3, recall that V3 has a structure type which is a combination of an integer value and two pointers. First the integer value of V3, which is 30, is collected and saved into the information stream in a machine-independent format. V3 is also "marked" as having been "visited" by the data collection algorithms. Then when the algorithm encounters the first pointer of V3, the information of the edge E1 is collected. The depth-first algorithm follows the pointer E1 to collect the content of V4, and marks V4 as having been visited. The integer part of V4 is saved to the machine-independent stream. Then its first pointer, E4, is collected. By following E4, the data collection algorithm next visits the node V6. The information of node V6, including an integer value and two pointers, is saved and marked as visited. Since there is no other link from V6, the algorithm backtracks to V4, and saves the second pointer of V4, which is E3. By following the link E3, V5 is visited and saved. Since there are no pointers contained in V5, the data collection algorithm backtracks to V4. Since there is no more information to be saved on V4, the algorithm backtracks to V3. At V3, the second pointer, E2, is saved. By following E2, the node V5 would be saved. But V5 has already been visited. Thus, its data is not collected again. The algorithm instead records information saying that V5, which is the destination node of E2, has been saved (or marked). The data collection algorithm backtracks to the node V3. Since there is nothing left to be saved at V3, the data collection operation on V3 is finished. In collecting data of node V4, since its data was already collected in the previous data collection operation, node V4 will not be saved again. The algorithm instead saves information (V4(MARKED)), indicating that V4 was already saved to the information stream. The information to reconstruct the MSRLT data structure and the collected machine-independent information of the memory space contents are placed into the output information stream and sent to another process on the destination computer. During data transmission between two processes on different machines, the information about the structure of the MSRLT data structures is sent to the destination process first, before any other machine-independent information. An identical MSRLT data structure is then created from the transmitted information in the memory space of the destination process to provide an index for restoring the data. At the receiving end, the destination process reads the transmitted data and reconstructs its MSRLT data structure. Then the data restoration algorithm is invoked to extract data from the information stream, and to restore the data to the appropriate memory location in the destination process's memory space. The data will also be transformed from the machine-independent format to the machine-specific one according to their data types. For example, on the receiver process, the data restoration operation restores information of V3 and V4 to its memory space. The data of V3 is extracted from the information stream to a memory space allocated for V3. Then the information of E1 and V4 are extracted from the information stream. The information of V4 is put to an allocated memory space. E1 is also recreated as a pointer from the second component of V3 to the first component of V4. Next the restoration algorithm extracts data of E4 and V6 from the information stream. The data of the node V6 is put in an allocated memory space. The edge E4, which is a pointer from the second component of V4 to the first component of V6, is also reestablished. Since V6 does not have any pointers, the algorithm backtracks to reconstruct the second pointer of V4, which is E3. Then the algorithm extracts E3 and V5 from the transmitted information stream. The data content of V5 is put to an allocated memory space. The pointer represented by an edge E3 is reestablished from V4 to V5. Since the pointer contents of V5 are NULL, the restoration algorithm backtracks to restore the rest of the content of the node V4. But since all components of V4 have been restored, the algorithm backtracks to restore the second component of node V3. At this point, the data restoration algorithm extracts information of E2, and the marking flag indicating that V5 has already been visited (V5(MARKED)) from the information stream. By reading the marking flag, the algorithm knows that data content of V5 is already restored. So the algorithm finds the node V5 in the memory space of the receiver process and reconstructs the edge E2, the pointer from the third component of V3 to the first component of V5. Now the data content of the node V3 is completely restored. Next the algorithm restores V4. Since the information of V4, V4(MARKED), in the transmitted information stream indicates that V4 is already restored, the algorithm finds the node V4 already in the memory space of the receiver process. The invention has been successfully demonstrated with three C programs containing different data structures and execution behaviors including numerical-intensive and pointer-intensive behaviors with recursion. BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 shows examples of storage objects and reference objects. FIG. 2 shows the software modules as attached to the source and destination processes. FIG. 3 shows the content of a memory snapshot of a process. FIG. 4 shows the nodes and edges of an MSR graph. FIG. 5 shows the initial set of nodes V3 and V4. FIG. 6 shows a process model for data transfer in a heterogeneous environment. FIG. 7 shows the, layered design of the data collection and restoration software. FIGS. 8a and 8b show an example program and its memory blocks. FIG. 9 shows an example of the MSR graph. FIG. 10 shows the MSRLT data structures. FIG. 11 shows a representation of a pointer between two nodes of the MSR graph. FIG. 12 shows an example of the machine-independent format of a pointer offset. FIG. 13 shows different formats of the output information stream of the saving function. FIG. 14 shows the three layers of routines for data collection and restoration in heterogeneous environment. FIGS. 15a and 15b show the MSR graphs for (a) test_pointer and (b) bitonic sort programs respectively. FIG. 6 illustrates schematically the novel process for data transmission between two processes on different computing platforms. In preparing a process for data transmission in a heterogeneous environment, type information for all transmitted data is attached to the source code, the data collection and restoration library is also linked to the source code, and a special data structure is used to identify the logical locations of data in the process during execution. The type information and the library are linked with the source code at compile-time. A special data structure to track data in the process memory space, the MSRLT data structure, is dynamically constructed and maintained at run-time. For process migration, the same source code (in migratable format) preferably is distributed to the source and destination machines before the migration decision is made, and the source code on the destination machine is compiled and ready for execution, which is defined as pre-initialization. Since the same source code is used on both the source and destination machines, identical annotated information about data types, the Type Information Table, is attached to the source code on both machines. Also, the data collection and restoration library routines that are linked with both processes during compilation should perform the same functions. During computation, the source process of data transmission, Process A in FIG. 6, generates and maintains a special purpose data structures, the Memory Space Representation Look Up Table (MSRLT), to track allocated memory blocks in the process memory space. It also provides logical machine-independent locations to the memory blocks. The MSRLT box not only contains information about the memory blocks such as types and lengths, but also information about the execution status of the running process. The construction of the MSRLT data structure is conducted via function calls to MSRLT Programming Interfaces inserted at various locations in the source code. The insertions of these operations are preferably performed by a precompiler. Before data transfer, the same MSRLT data structure is created in the memory space of the destination process, Process B in FIG. 6. The new MSRLT data structure in Process B provides basic information for recognition of the logical machine-independent locations, as well as other information attached to the transmitted data during the data restoration operation. The MSRLT data structure is reconstructed by again making function calls to the MSRLT Programming Interfaces. The MSRLT is used for data collection and restoration after the MSRLT data structures are created in the memory spaces of the source and destination machines. The data collection and restoration software may be viewed at three levels as shown in FIG. 7. The first level is the conceptual level. In this level, the memory space of a process is viewed in the form of a graph called the Memory Space Representation (MSR) graph. The data collection operation traverses the MSR graph to collect information and place the data in a stream of machine-independent information. In the data restoration operation, the software transforms the data in the machine-independent information stream to a machine-dependent form in the memory space of the destination process according to the MSR graph. In the second level, the MSRLT data structure is used to recognize memory blocks in the form of the graph structure conceptually defined in the first level. The MSRLT data structure is designed to store memory blocks, to provide machine-independent identification of the memory blocks, and to support the searching of memory blocks during data collection and restoration operations. The MSRLT interfaces are also defined at this level to manipulate the construction and reconstruction of the MSRLT data during execution of source and destination processes, respectively. The Type Information table is a module containing information about the data types used in the program, including (for example) XDR functions to encode the data objects of these types in machine-independent format, and to decode the machine-independent information to machine-specific format. The software runs on the top of the XDR technology. The basic idea is that the logical locations attached to the collected data enable the data to be restored to the desired locations in the memory space of the destination process. The Type Information table and the data collection and restoration modules cooperate during memory space collection and restoration. The data collection and restoration programming interfaces and algorithms are part of a separate programming module in the second level. The function calls of the programming interfaces to save or restore variables are inserted in the source code by the precompiler. The data collection and restoration algorithms for these interfaces examine the MSRLT data structure and then call the saving or restoring function provided in the Type Information Table to encode or decode data. Finally, the XDR utilities (for example), placed in the lowest level, are invoked after the information to be transmitted has been identified from programming modules in the second level. In the destination process, the XDR utilities decode the transmitted data from the machine-independent format to machine-dependent format. A snapshot of the program memory space may be modeled as a graph G, defined by G=(V, E), where V and E are the set of vertices and edges respectively. G is called the Memory Space Representation (MSR) graph. Each vertex in the graph represents a memory block, and each edge represents a relationship between two memory blocks, when a memory block contains pointers. Each memory block is represented by a vertex v in the MSR graph. The following terminology is used: head(v): the starting address of memory block v type(v): the type of object stored in the memory block v elem(v): number of objects of type type(v) in memory block v. The address of the memory block refers to any address within the memory block. Let addr be an address in the program memory space. The predicate Address_of(x, v) is true if and only if head(v) .ltoreq.x.ltoreq.head(v)+((unit_size*elem(v)), where unit_size is the size of an object of type type(v). Given an example program in FIG. 8(a), FIG. 8(b) shows all memory blocks in the snapshot of the program memory space just before execution of the memory allocation instruction at line 17 in the function foo, assuming that the "for loop" at line 13 in the main function has been executed four times before the snapshot was taken. Each memory block in FIG. 8(b) is represented by a vertex name v.sub.i ;, where 1.ltoreq.i.ltoreq.11. The associated variable name for each memory block is also shown in parentheses. We assume that addr.sub.i, where 1.ltoreq.i.ltoreq.4, is the address of the dynamically allocated memory block created at runtime. addr.sub.i represents the name of the memory blocks in this example. In a stack-based language such as C, a process generally manages its memory space in four segments: code, global data, stack, and heap. The code segment contains machine instructions of every function. The global data segment is a memory storage of global variables and statically initialized data. During execution, a block of memory is allocated to each global variable. The size of each memory block could be varied depending on how its associated variable is defined. The lifetime of a global variable is the time from the start to termination of its mother program. The stack segment is a memory space used to store local variables, parameter passing in function calls, and other supporting information to keep track of function calls in the program. An activation record is a representation of a function in the stack segment. It contains memory spaces for parameters and local variables of the function. When a function is called, its associated activation record is pushed to the top of the stack segment. The memory spaces of parameters and local variables, which are a part of the activation record, can then be accessed during execution of the function. When the function terminates, its activation record is popped out of the stack segment. Thus the lifetime of the local and parameter variables of a function is limited to the execution of the function. Finally, the heap segment is a memory space provided for dynamic allocation of memory storage during execution of the program. The heap memory allocation is usually performed by a programming language statement such as a call to a function malloc in C. The allocated space can also be deallocated to give back the memory space to the heap segment. This can be done by a statement such as a call to a function free in C. The lifetime of the allocated memory spaces thus spans the period from creation by the memory allocation statement to destruction by the deallocation statement. For data collection and restoration, the memory blocks can reside in different areas of the program memory space. If a memory block is created in the global data segment, it is called a global memory block. If it is created in the heap segment by dynamic memory allocation instruction, it is called the heap memory block. If the memory block resides in the activation area for a functions in the stack segment of the program, it is called the local memory block of function f. Let Gm, Hm, Lm.sub.main, and Lm.sub.foo represent sets of global memory blocks, heap memory blocks, local memory blocks of function main, and the local memory block of function foo, respectively. From FIG. 8, we can define Gm={v.sub.1 }, Hm={v.sub.6, v.sub.7, v.sub.8, v.sub.9 }, Lm.sub.main ={v.sub.2, v.sub.3, v.sub.4 }, and Lm.sub.foo ={v.sub.10, v.sub.11 }. FIG. 9 shows an MSR graph representing a snapshot of memory space for the example program in FIG. 8(a). The edges e.sub.i, where 1.ltoreq.i.ltoreq.8, represent the relationships from the pointer variables to the addresses of their reference memory blocks. One of the most important properties of a memory block is its data type. A memory block comprises one or more objects of a particular type. Each type describes a set of properties for the same kind of data and associates a set of functions to manipulate the data objects. To support data collection and restoration during process migration, we need to provide certain information and operations to manipulate data of the specific type stored in the memory block. Every type to be used during the execution of the program will be recognized at compile-time. This type information is used to construct the Type Information (TI) table which contain information and functions to save and restore the memory block nodes used for data collection and restoration during process migration. At compile-time, a unique number, the Type Identification (Tid) number, is assigned to every type to be used during the execution of the program. The information of every type used in the program is stored in the TI table. The type-specific saving and restoring functions to encode and decode data are also created to transform the contents of the memory block to and from the machine-independent information stream. This information, along with these functions, is annotated in the global declaration section of the program source code to allow global access during execution of the program. The TI table is indexed by the Tid number. It contains the following fields: (1) unit size (unit_size): size in bytes of the object of a particular type. In the case of an array type, the unit size is the size of the object of the array content. Otherwise, the unit size is the size of the object of the given type. (2) number of elements (elem): If the object has type array, the number of elements in the array is assigned to this field. Otherwise, this field is 1. (3) number of components (compo): In case the type is a structure type, we assign the number of the components of the structure to this field. Otherwise, this field is assigned the value 1. (4) a pointer to component layout table (component_layout): The component_layout table contains information about the format of a structure type. Each record in the table supplies information for a component of the structure. This table is used to translate the offset of any pointer address, relative to the beginning address of the memory block, to machine-independent format and vice versa. The record in the component_layout table contains the following information: (a) The beginning offset of the corresponding component relative to the beginning position of the structure. (b) If the type of the component is not an array, the Tid number of the component is assigned to this field. Otherwise, the Tid number of the array content is assigned. (c) The number of elements is assigned to this field if the component is an array. If not, this field is assigned to 1. (5) Tid number of the content of the pointer or array (Tid_content): In case the type is a pointer or array, the Tid number of the object to which the pointer refers, or the content of the array, is assigned to this field. Otherwise, the field is left empty. (6) saving method (Saving): This field contains a function to save the content of the memory block of a particular type in the program memory space into a stream of machine-independent information. This function is used while collecting information in the memory space of the migrating process. (7) restoring method (Restoring): This field contains a function to rebuild a memory block of a particular type in the appropriate memory location. This function is used during the restoration of the memory space of a new process on a new machine. Note that the name in parenthesis associated with each field is its field_name. Notation TI[tid].<field_name>, where tid represents a type identification (Tid) number, represents the content of the field field_name in the TI table. The following code shows examples of Tid numbers, a saving function, and the programming interface to save a variable of the source code in FIG. 8(a). An example of Type Identification number:
#define TypeStructTree 1
#define TypePtr_to_StructTree 2
#define TypePtr_to_Ptr_to_StructTree 3
An example of the Saving Function for data type "struct tree":
int Pack_StructTree(char *mem_block, int type_id, int length) {
struct tree *target_mem = (struct tree *) mem_block ;
. . .
for(i = 0; i < length; i++) {
pack_float( target_mem->d1, Type_Float, 1 );
pack_Int( target_mem->d2, Type_Int, 1 );
save_pointer( target_mem->d3,
TypePtr_to_Array10_to_Ptr_to_Int ) ;
save_pointer( target_mem->left, TypePtr_to_StructTree ) ;
save_pointer( target_mem->right, TypePtr_to_StructTree ) ;
An example of the Programming Interface to save the pointer variable "p":
save_pointer( p, TypePtr_to_Ptr_to_StructTree ) ;
The Tid number is the index of the TI table. The Tid number for data types struct tree, struct tree*, and struct tree** are defined as 1, 2, and 3, respectively. In the saving function for the data type struct tree, the function save_pointer is called to collect the pointer structure components, pointers which are the components d3, left, and right. An example is the programming interface to save the pointer variable p, defined as a local variable in the function foo in FIG. 8(a). This function call statement is inserted in the body of the function foo to collect value of the variable p. The saving and restoring methods are the most important functions in the TI table to be used during memory block collection and restoration. Once process migration is started on the source machine, the data collection and restoration algorithms identifies the memory blocks to be collected. The saving function encodes the contents of the memory blocks in machine-independent format, and makes them a part of the machine-independent information stream for process migration. After transmitting the information to the new machine, the restoring function extracts the information for the memory block from the stream, decodes it, and stores the results in the appropriate place within the memory space of the new process. When the memory block does not contain any pointers, one can apply techniques to encode and decode contents of a memory block to and from the XDR information stream using the XDR software package to construct the saving and restoring functions. However, if the memory block does contain pointers, one uses functions save pointer(pointer_content, tid) and restore_pointer(tid), where pointer_content represents the memory address stored in a pointer and tid is the Tid number of the pointer, to save and restore the pointers, respectively. The save_pointer function initiates the traversal through the connected components of the MSR graph in a depth-first search manner. It examines the memory block to which the pointer refers and then invokes the appropriate saving function stored in the TI table to save the contents of the memory block. The following code shows a generic example of a saving function: Function pack_td(obj_mem_address, tid, element_num) 1: Cast obj_mem_address to be the pointer to td 2: For i=1 to element_num do 3: For each component of the structure pointed to by obj_mem_address do 4: If the component is not a pointer, the function pack_compTD, where compTD is the type of the component, is called to save the data content. Otherwise, the function save_pointer is called to save the pointer. 5: od; 6: Move obj_mem_address to point to the address of the next memory content of the type td; 7: od; Note that td is the type name of the data to be saved and that tid is the type identification number of the type td. The visited memory blocks are then marked so that the same blocks are not saved again by save pointer. On the other hand, restore_pointer recursively rebuilds the memory blocks in the memory space of the destination machine according to the information saved by save_pointer. To restore the contents of a memory block, restore_pointer invokes an appropriate restoring function stored in the TI table. The following code shows a generic example of a restoring function: Function unpack_td(obj_mem_address, tid, element_num) 1: Cast obj_mem_address to be the pointer to td 2: For i=1 to element_num do 3: For each component of the structure pointed to by obj_mem_address do 4: If the component is not a pointer, the function unpack_compTD, where compTD is the type of the component, is called to restore the data content. Otherwise, the function restore_pointer is called to restore the pointer. 5: od; 6: Move obj_mem_address to point to the address of the next memory content of the type td; 7: od; In conjunction with the TI table, the MSRLT data structures track memory block nodes of the MSR graph. For further efficiency, a practical scheme to reduce the amount of information stored in the MSRLT data structures is used. The functions of MSRLT data structure are the following: (1) to track information to transfer each memory block into machine-independent format. Specific data about memory blocks such as Tid number, size of each element, and number of elements is used when the information stored in those memory blocks is translated to the machine-independent information stream. (2) to support searching of memory blocks during the data collection operation. The saving functions are recursively invoked to collect memory blocks, following the MSR graph nodes, in a depth-first search manner. During the search, the item in the MSRLT data structure corresponding to each memory block is marked when the memory blocks are collected. Marked memory blocks are not collected a second time. Thus, this scheme prevents the duplication of transmitted data during process migration. (3) to provide machine-independent identification to memory blocks. In program memory space, the memory blocks are identified by address. However, since the memory addressing scheme can be different on computers with different architectures, a logical scheme is used to identify the memory blocks during the data transmission between the two processes. In practice, keeping track of all memory blocks is generally unnecessary. Only the memory blocks that are visible to multiple functions, and those that are or that may be pointed to by any pointers during the execution of the program need to be recorded for process migration. In the MSR graph these vertices are called significant nodes, and others are called trivial nodes. The significant nodes and their properties are recorded in the MSRLT data structures. We classify nodes in the MSR graph into two types. During process migration the significant nodes can be "visited" multiple times due to their multiple references; the trivial nodes are visited only once via their variable names or memory addresses. To prevent multiple copies of significant memory blocks from being transmitted during process migration, the significant nodes are registered so that the status of the nodes can be checked. If a memory block is a global or local variable, its properties are recorded in the MSRLT data structure if and only if the following conditions are satisfied: (a) The memory block of a global variable is referred to in multiple functions or in a function with possibility of direct or indirect recursions. During execution, since the memory blocks can be accessed from more than one function in the activation record, the memory blocks can become significant nodes in the MSR graph. (b) The memory block's address (any addresses that satisfy the Address_of predicate) is or could be assigned to any pointer, or could be used in the right hand side of any pointer arithmetic expression during program execution. In C, since the address of the memory block is defined by the `&` operator, the address of the variable that applies `&` is registered to the MSRLT data structure. In the case of an array variable, the variable name represents the starting address of the array memory block. Therefore, if the array name is used in the right hand side of the pointer arithmetic statement, the information of the array memory block should also be registered to the MSRLT data structure. This condition is applied to both global and local variables. These conditions can be verified at compile-time. The compiler scans every programming language statement to detect the use of each variable declared in the program. Instructions to enter the information of the detected global variables to the MSRLT data structure are inserted at the beginning of the main function's body: the detected local variables are inserted at the beginning of the function in which they are declared. If local variables belong to the main function, the MSRLT registration instructions are inserted after those of the global variables. These instructions are called the MSRLT Programming Interfaces. In the case of a dynamically allocated memory block in the heap segment of the program, the starting address of the memory block is assigned to a pointer variable. One can access the memory block either by directly referring to its address, any address that satisfies the Address_of predicate for the particular memory block, or by referring to the content of the pointer. Therefore, one registers information of every dynamically allocated memory block to the MSRLT data structure. In our design, we create a function to wrap up the original memory allocation statement in the program. In our C language prototype, we replaced the original malloc function by the HALLOC function. For example, the statement . . . =(struct tree *) malloc(sizeof(struct tree) ); was replaced by . . . =(struct tree *) HALLOC( tid, sizeof(struct tree) ); where tid is the Tid number of struct tree. The HALLOC function calls malloc and registers information of the allocated memory block to the MSRLT data structure. Let Gs, Hs, Ls.sub.main, and Ls.sub.foo be sets of the significant nodes of the global data segment, heap data segment, local data of function main, and local data of function foo, respectively. From the example program and its memory blocks in FIG. 8, we get Gs={ }, Hs={v.sub.6, v.sub.7, v.sub.8, v.sub.9 }, Ls.sub.main ={v.sub.3, v.sub.4, v.sub.5 }, and Lsfoo={ }. The architecture of the MSRLT data structures gives a logical mechanism for identifying MSR memory block nodes across machines. The MSRLT data structure is used to track information of every significant memory block in the program memory space. It also provides each memory block a logical identification that can be used for reference between two machines during process migration. FIG. 10 shows an example of the MSRLT data structure. The structure comprises two tables: the mem_stack and mem_set tables. The mem_stack table is a table that keeps track of the function calls in the program activation record. Each record in the table represents a particular data segment of the program. The first record, denoted by mem_stack[0], is used for information of a set of significant memory blocks of the global variables. The second record, mem_stack[1], contains information of a set of significant memory blocks in the heap segment. The third record is used for the set of significant memory blocks of local variables of the main function. The rest are used for the significant memory blocks of local variables of each function call in the activation record. A record of the mem_stack table comprises two fields: a pointer to a mem_set table, and the number of records in the pointed mem_set table. The mem_set table is used to store information of every significant memory block of a data segment of the program represented by a record in the mem_stack table. Each record in the mem_set table comprises the following fields: tid, type identification number of the object contained in the memory block. unit_size, size in bytes of each object in the memory block. element_num, the number of objects stored in the memory block. mem_block_pointer, a pointer to the starting address of the memory block. marking_flag, a marker used to flag whether the memory block has been "visited" during the memory collecting operation. When the program begins execution, the first three records of the mem_stack table are created. Then, whenever a function call is made, a mem_stack record is added to the mem_stack table in stack-like manner. If there are any significant local variables in the function, they are added to the mem_set table of the last mem_stack record. After the function finishes execution, the mem_stack record as well as its mem set table are destroyed. In the case of memory blocks in the heap segment, the information in the memory block allocated by the function malloc is added to the mem_set table of the mem_stack[1] record. The information is deleted from the mem_set table when the free operation is called. Each significant memory block is identified by a pair of indices, its mem_stack and mem_set records. This identification scheme is used as a logical identification of significant memory blocks across different machines. Let v, stack_index(v), and set_index(v) be a significant MSR node, the index of its mem_stack record, and the index of its mem_set record, respectively. The logical representation of v is given by (stack_index(v), set_index(v)). Each edge in the MSR graph is represented by a pair of source memory addresses: the memory address of a pointer and the memory address of the object to which the pointer refers. Pointers are represented by edges of the MSR graph, in a machine-independent format. Such a format is illustrated in FIG. 11. There are three edges between nodes v .sub.1 and v.sub.2 in FIG. 11. For example, edge e.sub.1 can be represented in the form of (addr.sub.1, addr.sub.4) where addr.sub.1, and addr.sub.4 are addresses that satisfy the predicate Address_of for node v.sub.1 and v.sub.2, respectively. addr.sub.1 is a pointer object that contains the address addr.sub.4 in its memory space. Therefore, given addr.sub.1 one always obtains addr.sub.4 as its content. By taking a closer look at e.sub.1, one can also write it in the form of (addr.sub.4, head(v.sub.2)+(addr.sub.4 -head(v.sub.2))). The address head(v.sub.2) is called the pointer head, and the number (addr.sub.4 -head(v.sub.2) is called the absolute pointer offset. The representation of a pointer in machine-independent format comprises the machine-independent representations of the pointer head and the pointer offset. Under the definition of significant memory block, the node pointed to is always a significant node. Thus, its properties are stored in the MSRLT data structure. From the example in FIG. 11, the logical identification of v.sub.2 can be represented by (stack_index(v.sub.2), set_index(v.sub.2)). This logical identification is used to represent the pointer head in the machine-independent information stream for process migration. To represent the offset of the pointer in machine-independent format, one transforms the absolute pointer offset into a sequence of (component position, array element position) pairs. The component position is the order of the components in the memory space of a structure to which the pointer refers. The array element position is the index to the array element that has the pointer pointing to its memory space. For example, the absolute pointer offset (&a[i].c1[1].c2[2].vertline.-a) of the memory block of the variable a in FIG. 12 can be translated to <(-,i),(1,1),(2,2)>. Note that indexing starts from zero and that the component position of the first pair was not used. From the first pair, the array position indicates that the pointer points to the (i+1)st element of array a. The component position part of the second pair (1,1) means that the pointer points to the second component of the structure stored in a[i], which is a[i].c1. The array element position part of (1,1) indicates that a[i].c1 is the array, and that the pointer points to the second element of that array, a[i].c1[1]. Finally, the component position of the pair (2,2) means that a[i].c1[1] is the structure, and the pointer falls on the third component of a[i].c1[1], which is a[i].c1[1].c2. The array element position of the pair (2,2) indicates that the component a[i].c1[1].c2 is the array, and that the pointer points to the third element of a[i].c1[1].c2, which is a[i].c1[1].c2[2]. A sequence of machine-independent pointer offsets can be generated from a given absolute pointer offset using the information provided in the component_layout tables. The absolute pointer offset can also be derived from the machine-independent offset using the same information. To represent the transformation between the absolute and machine-independent pointer offsets, let mach_indep of set(x) and abs_offset(y) represent the machine-independent pointer offset of the absolute pointer offset x and the absolute pointer offset of the machine-independent pointer offset y, respectively. The programming interfaces to collect and restore data in the program memory space can be applied to any variable of the source and destination processes. The algorithm for collecting and restoring data in the MSR memory space must be created for the applicable source code chosen. The interfaces of the MSR graph data collection operations are categorized into two types according to the type of the variables. Let x be a variable and tid be the Tid number of x. If tid is a pointer type, the interface used to save x is save_pointer(x, tid). Otherwise, it would be save_var(&x, ptid, TI[tid].<elem>), where ptid is the Tid number of a type of pointer to an object of the type represented by tid. In the case of a pointer, save_pointer is used to save the pointer content, the address stored in a pointer x. The saving function is not applied in the TI table to save pointer x directly, because the memory block to which x points is necessarily a significant memory block. Since the significant memory block is registered to the MSRLT data structures, its contents are collected in an appropriate manner. save_pointer is also used in the saving function in the TI information table if the memory block of a particular data type contains pointers. If tid is not the Tid number of a pointer type, the variable x is viewed as a memory block of type represented by tid and of size TI[tid].<elem >. However, one cannot simply apply the saving function in the TI table to save x, because x might be a significant memory block whose information is stored in the MSRLT data structure. Thus, the memory address &x, the starting memory address of the memory block of x, and ptid are parameters of the interface. These parameters are passed to the function save_variable as shown at line 2 of the algorithm below: Function save_var(start_memory_address, ptid, elem_num) 1: For i=1 to elem_num do 2: Call save_variable(start_memory_address, ptid); 3: Set start_memory_address to point to the address of the next memory content of the memory block; 4: od; Note that ptid is the Tid number of the type of pointer to the type represented by tid. The function of save_variable is similar to that of save_pointer; however, save_variable is designed to save the content of the trivial memory block. Two interfaces are used to rebuild the memory space of the new process by reading the contents of the machine-independent information stream. They are the functions restore_pointer and restore_var. Let x be a variable with a type corresponding to the Tid number tid. If x is a pointer, use x restore_pointer(tid)to restore x; and restore_var(&x, ptid, TI[tid].<elem>) is used to restore the content of x. The function restore_pointer is used to rebuild the program data structures saved by save_pointer. In the memory rebuilding process, restore_pointer may allocate memory space, create the appropriate MSRLT data structure, and restore the contents of the significant memory blocks. When finished, restore_pointer returns the appropriate memory address to x. Likewise, the function restore_var shown below restores the memory contents saved by save_var: Function restore_var(start_memory_address, ptid, elem_num) 1: For i=1 to elem_num do 2: Call restore_variable(start_memory address, ptid); 3: Set start_memory address to point to the address of the next memory content of the memory block; 4: od; Since the variable x could be a significant node in the MSR graph, the starting memory address &x is treated as a pointer of type associated to ptid, pointing to the memory block of x. Also, restore_var calls the function restore_variable, which has a function similar to restore_pointer. The function restore_variable restores the trivial nodes of the MSR graph. In the function save_pointer, the pointer content, which is a memory address to which the pointer refers, and the tdid, the type identification number of the pointer type, are passed as parameters. An algorithm illustrating function save_pointer is shown below: Function save_pointer(pointer_content, ptid) 1: If pointer_content is NULL Then 2: Write a flag FALSE to the information stream and Return; 3: fI 4: Search MSRLT data structure to find a significant node v that the predicate address_of(pointer_content,v) is true; 5: If the predicate is true Then 6: Write the marking_flag of v in the MSRLT data structure to the stream; 7: Write tdid,TI[ptid].<tid_content>, stack_index(v), and set_index(v) to the stream; 8: If the marking_flag=CLEAR Then 9: Set marking_flag to MARKED; 10: Write elem(v) to the stream; 11: Call TI[TI[ptid].<tid_content>].<Saving>(head(v), TI[ptid].<tid_content>, elem(v)); 12: Write mach_indep_offset(pointer content-head(v)) to the information stream; 13: Else { marking flag is MARKED } 14: Write mach_indep_offset(pointer_content-head(v)) to the information stream; 15: fI 16: Else 17: Report "There might be a dangling pointer in the program"; 18: fI First one checks if the pointer content is NULL. If so, the function writes the flag FALSE to the machine-independent information stream, and returns to its caller. Otherwise, it searches the MSRLT data structure to find a significant memory block node v such that the Address_of(pointer content,v) predicate is satisfied. If such a v cannot be found, the pointer could be dangling and is reported. After v is found, the properties of the node in the MSRLT data structure such as marking flag, stack index(v), and set index(v) as well as the type information of the pointer and the pointed node are written to the output information stream. If the marking_flag is CLEAR, the node v in the MSR graph has not been saved before. Thus one saves the number of data objects in the memory block node v, elem(v), and invokes the function TI[TI[ptid].<tid_content>].<Saving> to save its contents. If node v contains any pointers, the function save_pointer is called to traverse and save nodes in the MSR graph. However, in any situation the result of the saving function TI[TI [tdid].<tdid_content>.<Saving> is considered as the contents of node v. Then, at line 12 of save_pointer, the machine-independent pointer offset mach_indep_offset(pointer_content-head(v)) is saved to the stream before the termination of the function. If the marking flag is MARKED, the contents of v are skipped since they will have already been saved to the information stream. Only the machine-independent pointer offset is saved before the function returns to its caller. The save variable function is largely identical to the save_pointer function, except the code on line 17 of save_pointer will be replaced by lines 17 and 17a shown below. Function save variable(pointer_content, ptid) 16: Else 17: Write the TRIVIAL flag to the information stream; 17a: Call TI[TI[ptid].<tid_content>].<Saving>(pointer_content, TI[ptid].<tid_content>, TI[TI[ptid].<tid content>].<elem>); 18: fI Since some of the memory blocks of the global or local variables could possibly be significant memory block nodes, the function save_variable performs most of its operations similarly to those of save pointer. The new lines 17 and 17a are used to save the contents of the trivial nodes in the program memory space. FIG. 13 shows the different output information streams obtained by applying the save_pointer and save_variable functions. Both functions produce the first two formats in FIG. 13. The first format is the output of the functions when the marking flag is CLEAR. The second is the output when marking flag is MARKED. The last is the output of the function save_variable when the memory block of the starting memory address pointer_content is a trivial node. In function restore_pointer shown below, the function reads a flag from the information stream first. Function restore_pointer(ptid) return pointer_content 1: Read temp_flag from the information stream; 2: If temp_flag=FALSE Then 3: Return NULL; 4: fI 5: If temp flag=CLEAR or MARKED Then 6: Read ptid, TI[ptid].<tid_content>, stack_index(v), and set_index(v) from the stream; 7: If the temp_flag=CLEAR Then 8: Read elem(v) from the information stream; 9: Set the contents of the mem stack and mem_set records in the MSRLT data structure for a significant node v; 10: If stack_index(v)=1 Then { Heap } 11: Create a memory block for v in the Heap segment; 12: Call TI[TI[ptid].<tid_content>].<Restoring>(head(v), TI[ptid].<tid_content>, elem(v)); 13: Read the machine-independent pointer offset y from the information stream; 14: Return head(v)+abs_offset(y); 15: Else {v is significant variable memory block} 16: { In this case, the memory space for v is already allocated by the program } 17: Call TI[TI[ptid].<tid_content>].<Restoring >(head(v), TI[ptid].<tid_content>, elem(v)); 18: Read the machine-independent pointer offset y from the information stream; 19: Return head(v)+abs_offsetty); 20: fI 21: Else { temp_flag is MARKED } 22: { In this case v has already been rebuilt } 23: Read the machine-independent pointer offset y from the information stream; 24: Return head(v)+abs offset(y); 25: fI 26: Else 27: Report "Invalid information stream"; 28: fI If the flag is FALSE, the function returns a NULL value to the pointer variable. Otherwise, it reads type information as well as the information needed to restore the MSRLT data structures. If the flag is CLEAR, the function reads the number of data objects to be stored in the memory block node v, elem(v). The information of v in the MSRLT data structure is updated according to this information. If the memory block node v is to be rebuilt in the heap segment of the program, the memory block is allocated, and then the function TI[TI ptid].<tid_content>].<Restoring> at line 12 in restore_pointer is invoked to restore the contents of node v. After that, the machine-independent pointer offset, y, is read from the stream, transformed to abs_offset(y), and then added to the starting address of v. The result is returned to the caller function as the memory address that is stored in a pointer. If v is a significant variable memory block, we assume that the memory space for the block is already allocated. Thus, the instructions in restore_pointer from line 17 to 19 should be performed appropriately. On line 21, if the flag is MARKED, one will just read the machine-independent pointer offset and restore the content of the pointer. The restore_variable function is identical to the restore pointer function except that line 27 of restore_pointer is replaced by line 27 and 27a shown below. Function restore_variable(ptid) return pointer_content 26: Else 27: { this alternative implies that the flag is trivial } 27a: Call TI[TI[ptid].<tid_content>].<Restoring>(pointer content, TI[ptid].<tid_content>, TI[TI[ptid].<tid_content>].<elem>); 28: fI Due to the possibility that some of the memory blocks of the variables could be significant memory block nodes, the function restore_variable performs most operations similar to restore_pointer. The major difference between them is that new lines 27 and 27a of restore_variable are used to restore the contents of the trivial nodes. Software for a heterogeneous data transfer environment includes various functional modules. These modules can be classified into three layers as illustrated in FIG. 14. The first layer uses basic data communication and file system utilities. The migration information can be sent to the new machine using, for example, the TCP protocol, the shared file system, or the remote file transfer if the file systems of two machines involved in the process migration are separate. In the second layer, XDR routines, for example, are used to translate primitive data values such as `char`, `int`, or `float` of a specific architecture into machine-independent format. In the third layer, an MSR Manipulation (MSRM) library routine, comprising the data collection and restoration interfaces and routines as well as the MSRLT interfaces and routines, translates complex data structures such as user-defined types and pointers into a stream of machine-independent migration information. The MSRM routines are called by the migration macros annotated to the program. To verify the correctness of the model and algorithms, we have conducted successful experiments on three programs with different kinds of data structures and execution behaviors. They are the test_pointer, the unpack benchmark, and the bitonic sort programs. These programs are migration-safe. The program analysis and annotated migration operations of MpPVM, adapted for the BDT mechanism, were applied to make them migratable in a heterogeneous environment. First, test_pointer is a program that contains various data structures, including pointer to integer, pointer to array of 10 integers, pointer to array of 10 pointers to integers, and a tree-like data structure. The MSR graph of the tree-like data structure is shown in FIG. 15(a). The root pointer of the tree is a global variable. Two different data sizes were used in our experimental tests. One had 11,325 nodes; the other 31,375 nodes. The program worked in two steps. It first generated and initialized every data structure, and then traversed the tree-like structure. In our experiment, we conducted process migration in the main function after all data structures were generated and initialized. After the migration, the tree-like data structure was traversed on the migrated machine. Second, the linpack benchmark from the netlib repository at ORNL is a computational intensive program with arrays of double and arrays of integer data structures. Pointers are used to pass parameters between functions. The benchmark solves a system of linear equations, Ax=b. Most variables in this program are declared in the main function. We performed two experiments on this program, with two different problem sizes. First, the program solved a matrix problem of order 200. The size of the matrix A in this case was 200*200. In the second test, the order of matrix was increased to 1000. The size of matrix A increased to 1000*1000. At run-time, we forced the program to migrate when the function DAXPY was executing with a function call sequence main( ).fwdarw.DGEFA( ).fwdarw.DAXPY( ), which means that the function main( ) called the function DGEFA( ), which in turn called the function DAXPY( ). Finally, the bitonic sort program was tested. In this program a binary tree, illustrated in FIG. 15(b), is used to store randomly generated integer numbers. The program manipulates the tree so that the numbers are sorted when the tree is traversed. The root of the tree is defined in the main function. Dynamic memory allocation operations and recursions are used extensively in this program. Two different problem sizes were again tested in our experiments. One was a tree with 1,024 nodes; the other was a tree with 4,096 nodes. Process migration was conducted in function bimerge( ) with a sequence of function recursive calls, main( ).fwdarw.bisort( ).fwdarw.bisort( ).fwdarw.bisort( ).fwdarw.bisort( ).fwdarw.bimerge( ).fwdarw.bimerge( ).fwdarw.bimerge( ).fwdarw.bimerge( ). In each experiment, we originally ran the test program on a DEC 5000/120 workstation running Ultrix, and then migrated the processes to a SUN Sparc 20 workstation running Solaris 2.5, so that the migration was truly heterogeneous. Both machines were connected via a 10 Mbit/s Ethernet network. Each machine had its own file system. All test programs were compiled with optimization using gcc on the Sparc 20 workstation, and cc on the DEC workstation. Each experiment was performed in two modes: direct network process migration, and process migration through file systems. The former is the choice of MpPVM and MPVM, which were developed for load balance in a non-dedicated environment. The latter is the method used in checkpoint for fault tolerance. In network migration, migration operations scan a program's data structures and store the machine-independent migration information in a buffer. After that, the buffer contents were sent over the network via the TCP transmission protocol. On the destination machine, migration operations used the received information to restore the execution state and data structures of the program. Thus, we estimated the migration time (Migrate) by summation of memory scan time (Scan), data transmission time (Tx), and memory restoration time (Restore). For migration through file systems, migration operations scanned the program's memory and wrote the migration information to a file. After that, we remotely copied the file to the destination machine using the rcp utility. Then, the new process read migration information from the file and restored the data structure on the destination machine. The migration time (Migrate) was the sum of memory scan and write time (Scan&Write), remote copy time (Rcp), and the file read and memory restoration time (Read&Restore). The objectives of our experiments were: (1) To verify that the proposed data collection and restoration methodology was correct and feasible for different applications; (2) To compare the performance of direct network migration and file system based migration; and (3) To measure the times used in collecting and restoring data for applications with various data structures. The results of our experiments are shown in Table 2.
TABLE 2
Timing results of heterogeneous process migration
test_pointer (nodes) Linpack (matrix)
bitonic (nodes)
Program 11,325 31,375 200 .times. 200 1000 .times.
1000 1,024 4,096
NETWORK
Tx (problem) 1,165,680 3,242,480 325,232 8,021,232
46,704 182,248
Size (bytes)
Scan (sec) 2.678 14.296 0.303 5.591
0.150 0.419
Tx (sec) 1.200 4.296 0.357 9.815
0.053 0.191
Restore (sec) 2.271 4.563 0.095 2.962
0.077 0.278
Migrate (sec) 6.150 23.181 0.756 18.368
0.280 0.889
FILE
Scan & Write (sec) 18.533 69.032 0.997 45.243
0.803 4.896
Rcp (sec) 15.4 20.3 11.9 39.1
10.6 11.5
Read & Restore (sec) 8.693 17.602 0.124 3.654
0.303 1.234
Migrate (sec) 42.626 106.934 13.220 87.998
11.707 17.631
COMPARISON
Diff (sec) 36.48 83.75 12.46 69.63
11.43 16.74
Speed Up 6.93 4.61 17.48 4.79
41.72 19.82
(File/Network)
For correctness, we found that the outputs of the implementation with migration and the implementation without migration were identical, for each testing input and testing program used. Because of the diversity of the test programs used, the process migration approach was shown to be feasible for any migration-safe code. Apparently, direct network process migration is faster than migration through file systems, as shown by the difference in migration time (Diff) in Table 2. Network process migration gained significantly higher performance improvements than did process migration via file systems when the size of data transmitted was small. For example, the improvement ratio (Speed Up) of network migration compared to file migration for sort processes with data transmission sizes of (Tx size) 46,704 and 182,248 bytes, and the Linpack process with data transmission size of 325,232 bytes through file systems calculated at 41.72, 19.82, and 17.48, respectively. On the other hand, the network migration times of processes with large data transmission sizes such as the test_pointer process with data transmission size of 3,242,480 bytes, and the Linpack process with data transmission size of 8,021,232 bytes, appear to have lower performance gains, as indicated by the factors of 4.61 and 4.79 of the migration time through file systems over the network migration time, respectively. When the size of the buffer that holds the migration information becomes large, a number of disk accesses are made implicitly by the memory swapping operations of the operating systems during process migration. Therefore, network process migrations with large data sizes tend to suffer performance degradation caused by a number of inevitable accesses to disks. Network process migrations with small data transmissions benefit most from direct network-to-network migration and obtain high performance improvement. Nevertheless, since the cost of migrating a process with a large data transmission size either via network or through file systems is relatively high, the relatively small change of the comparison factor is significantly large in absolute performance improvement of process migration. As shown in Table 2, the time differences between process migration through file systems and direct network transmission of the test_pointer process with a data transmission size of 3,242,480 bytes and the Linpack process with data transmission size of 8,021,232 bytes were 83.75 and 69.63 seconds, respectively. Although the improvement ratios of these two applications were low, the performance differences (Diff) were larger than those of the processes with smaller data transmission sizes. In direct network process migration, according to our experimental results, the memory scan and restoration times of the pointer-intensive program tended to be higher than those of the program with array data structure, and this factor dominated the total process migration cost. Note that, in our comparison, the pointer-intensive and array-based programs had comparable data transmission sizes. Comparing the timing results of the bitonic sort program, with the data transmission size of 182,248 bytes, and the linpack program with the data transmission size of 325,232 bytes shows the transmission size of the bitonic sort program was smaller, but its memory scan time and restoration time were higher. The same behavior was observed for the timing results of the test_pointer program, with data transmission size of 3,242,480 bytes, compared to the linpack program, with a data transmission size of 8,021,232 bytes. During the scanning operation the MSRM routines searched every memory block, making the cost of scanning operations high for pointer-intensive programs. Also, during restoration many memory allocation operations were used to recreate the data structures. The searching and dynamic allocation costs depended mostly on the number of memory blocks and the number of pointers among them. On the other hand, the linpack program only needed a trivial amount of time to search for memory blocks during the memory scanning operations, since most data were stored in local or global variables. At the receiving end of the migration, the migration information was copied into the reserved memory space of the new process. Dynamic memory allocation was unnecessary for the restoration process of the linpack program. The migration time of the pointer-intensive program was also higher than that of the array-based program for migration through file systems. From Table 2, the data transmission size indicated the size of the file that contains migration information. The relation between the complex data structure and the cost of process migration for file system-based migrations can be explained similarly to that of network-based migration. Those of skill in the art will recognize that the novel methods are implemented by being programmed onto a digital computer; and that the programs may be stored in any short-term or long-term memory storage device, e.g. magnetic tape, floppy disks, hard disks, CD, DVD, RAM, and the like. The novel methods may be used both at runtime or upon retrieval from pre-stored information. The complete disclosures of all references cited in this specification are hereby incorporated by reference, as is the entire disclosure of the following reference (which is not prior art to this application): K. Chanchio and X.-H. Sun, "Data collection and restoration for heterogeneous network process migration," Tech. Rep. 97-017, Louisiana State University, Department of Computer Science, September 1997. In the event of an otherwise irreconcilable conflict, however, the present specification shall control.
|
Same subclass Same class | ||||||||||
