Method and apparatus for identifying dynamic structure and indirect messaging relationships between processes
6233729

Abstract
A system and method for identifying indirect messaging relationships between software processes. Within a source code repository, all occurrences of calls to interface procedures are identified, together with the communications objects to which these calls pertain. Each interface procedure call is categorized as either a "send" or a "receive" interface procedure call. A relationship is identified when a pair of interface procedure calls is located which pertain to the same communications object, one of the pair having a "send" type and the other a "receive" type. Identifying pairs of this type also identifies relationships between the hierarchical software entities which contain the interface procedure calls. Each relationship thus identified is mapped onto one or more inter-process indirect messaging relationships.

Claims
The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:

Description
FIELD OF THE INVENTION
TABLE 1
IDF Record Format and Example IDF Records
Field Name | Example IDF Record 1 | Example IDF Record 2
Procedure name | q_send | q_receive
Procedure type | "s" | "r"
Type of service | queue | queue
Position of COID within procedure call | 1 | 1
The fields in the IDF record are a procedure name field, a procedure type field, a type of service field, and a "position of COID within procedure call" field. The procedure name field contains the symbolic name of an interface procedure as it would appear in source code. The procedure type is "c" for create, "r" for receive, "s" for send, or "o" for other. For example, there may be many different procedures relating to receiving, and all of these would have the same procedure type, namely "r". When called, the interface procedure takes one or more parameters, at least one of which is a communications object having a COID (communications object identifier) of a certain type. The COID is a global entity symbolic name which either directly identifies the communications object appearing in the procedure call, or identifies a global entity ultimately associated with that communications object. The "type of service" identifies the particular messaging service/paradigm used for the communication. For operating systems that support the C language, for example the pSOS operating system environment, these message types include signal, queue, event and semaphore. The "position of COID within procedure call" identifies where, among the parameters of the procedure call, the global entity having that COID is located. There will be a different IDF for each combination of software language and operating system. For a given software language and operating system, the IDF will contain a number of standard entries. In addition, there may be entries relating to interface capabilities provided by application-specific messaging paradigms, implemented by application software used in conjunction with the OS. For example, certain software entities in the source code may be designed to communicate with each other (once compiled, linked and run) through a database application.
In this case, the first entity writes to the database and the second entity reads from it. If this type of relationship between the first and second entities is of interest, entries for these database read and write procedures are included in the IDF. Example IDF record 1 pertains to the C function "q_send", which has procedure type "s" and message type "queue". Example IDF record 2 pertains to the C function "q_receive", which has procedure type "r" and message type "queue". Send interface procedures, namely interface procedures which are assigned interface procedure type "s", include any procedures which operate on a communications object with "write" permission. These include write procedures, modify procedures and delete procedures, to name a few examples. Receive interface procedures, namely interface procedures which are assigned interface procedure type "r", include any procedures which operate on a communications object with "read" permission. These include read and receive procedures, for example. In step two, each record in the set of records in the object files and the DFA files pertaining to the "procedure calls procedure" construct is examined to see if the called procedure is any one of the procedures included in the IDF. A flowchart for this step is shown in FIG. 6. As identified previously, the indirect messaging table is a table in which all occurrences of procedures calling interface procedures are stored. A preferred format for records in the indirect messaging table and two examples of such records are shown in Table 2. Example record A uses the interface procedure defined in example IDF record 1 of Table 1, and example record B uses the interface procedure defined in example IDF record 2 of Table 1.
As was the case for the format of basic relationships in the object file, the format of the records in the indirect messaging table will depend partly on how software entities are grouped within the source code repository. In the examples of Table 2, it is assumed that interface procedures are called from procedures contained in files.
TABLE 2
Indirect Messaging Table Record Format and Example
Field Name | Example Record A | Example Record B
File name | file_A | file_B
Procedure where called | proc_A | proc_B
Interface procedure name | q_send | q_receive
Communications object identifier | GV1 | GV1
Line number | 100 | 200
Type of service | queue | queue
Procedure type {c, r, s, o} | s | r
The fields in the indirect messaging table record include the name of the interface procedure being called, the procedure within which the interface procedure is called, and the file containing that procedure. The interface procedure name must be one of the procedures identified in the "procedure name" field of a record in the IDF. The "position of COID within procedure call" is obtained from the field of the same name in the relevant IDF record. The global entity name directly or ultimately associated with the argument at that position in the procedure call is placed in the "Communications Object Identifier" field. The line number at which the interface procedure is called is placed in the "line number" field. Finally, the "type of service" and "procedure type" fields of the indirect messaging record are copied from the fields of the same names in the relevant IDF record. For constructs in the object files, if the entity in the "position of COID within procedure call" is non-global, DFA may be performed to identify the global entity or entities ultimately associated with the non-global entity. A record is then generated for each global entity so identified. Alternatively, the record can be discarded on the understanding that it will be picked up when processing a DFA file record containing the same construct. The latter approach is preferred, because it avoids duplicate records in the IMT that would otherwise have to be identified and eliminated. In what follows, the latter approach is assumed. For the example parameters appearing in Table 2 above, the interface procedure calls in the source code might look like the following: q_send(GV1,x,y,z) for example record A, and q_receive(GV1,b,c) for example record B.
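A minimal Python sketch of step two may help. It assumes the "procedure calls procedure" constructs have already been parsed into (file, calling procedure, called procedure, argument list, line number) tuples. That tuple shape, the IdfRecord/ImtRecord names and the global_names parameter are illustrative assumptions and are not part of the described RDT. Following the preferred approach above, a call whose COID argument is not a known global entity is skipped and left for the DFA-file pass.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IdfRecord:
    """One IDF entry (Table 1); field names are illustrative."""
    procedure_name: str
    procedure_type: str   # "c", "r", "s" or "o"
    type_of_service: str  # e.g. "queue"
    coid_position: int    # 1-based position of the COID argument

@dataclass(frozen=True)
class ImtRecord:
    """One indirect messaging table record (Table 2)."""
    file_name: str
    procedure_where_called: str
    interface_procedure_name: str
    coid: str
    line_number: int
    type_of_service: str
    procedure_type: str

# Hypothetical IDF holding the two example records of Table 1.
IDF = {
    rec.procedure_name: rec
    for rec in (
        IdfRecord("q_send", "s", "queue", 1),
        IdfRecord("q_receive", "r", "queue", 1),
    )
}

def build_imt(calls, global_names, idf=IDF):
    """calls: iterable of (file, caller, callee, args, line) tuples (assumed shape)."""
    imt = []
    for file_name, caller, callee, args, line in calls:
        entry = idf.get(callee)
        if entry is None or len(args) < entry.coid_position:
            continue  # not an interface procedure, or too few arguments
        coid = args[entry.coid_position - 1]
        if coid not in global_names:
            continue  # non-global COID: left to the DFA-file pass
        imt.append(ImtRecord(file_name, caller, callee, coid, line,
                             entry.type_of_service, entry.procedure_type))
    return imt

# The two calls behind Table 2, plus a call that is not in the IDF.
calls = [
    ("file_A", "proc_A", "q_send", ("GV1", "x", "y", "z"), 100),
    ("file_B", "proc_B", "q_receive", ("GV1", "b", "c"), 200),
    ("file_B", "proc_B", "printf", ("msg",), 210),
]
imt = build_imt(calls, global_names={"GV1"})
```

Here imt holds the two records of Table 2. The printf call is dropped because it has no IDF entry.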
For records in the DFA file, if the entity in the "position of COID within procedure call" is global, a record in the IMT will already have been created while processing a related object file record, and the DFA record may be discarded. For records in the DFA file which have a non-global entity in the position identified by the "position of COID within procedure call", DFA is performed to identify one or more global entities ultimately associated with the entity in that position. A record in the indirect messaging table is created for each such global entity identified, taking the same form as the records described with reference to Table 2 above. DFA in this context is the process of tracking local entity usage back to its global source(s). Techniques for performing DFA to achieve such ends are well known in the art and will not be described further herein.

Step Three: Identify Pairs of Records and Create Indirect Messaging Relationships Table

In this step, the indirect messaging table is examined for pairs of records which together constitute a relationship. A relationship exists when one procedure uses a "send" type interface procedure with respect to a particular communications object, and another procedure uses a "receive" type interface procedure with respect to the same communications object. A table defined as an "Indirect Messaging Relationships Table" is created and is used to store a record for each such relationship. These records are categorized according to the particular message type. An example record format for the C message type "queue" is shown in Table 3. A relationship having the "queue" message type would be identified for the two example indirect messaging table records of Table 2. This is because record A has a "send" type interface procedure (type "s"), record B has a "receive" type interface procedure (type "r"), and both relate to the communications object GV1.
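The pairing rule just stated can be sketched in Python as follows. The FIG. 7 flowchart rescans the table once per COID. This sketch instead first groups the "r" records by COID in a dictionary, which produces the same set of pairs in a single pass over the "s" records. The record types and field names are assumptions modelled on Tables 2 and 3.

```python
from typing import NamedTuple

class ImtRecord(NamedTuple):
    """Indirect messaging table record (Table 2); names are illustrative."""
    file_name: str
    proc: str
    interface_proc: str
    coid: str
    line_no: int
    service: str
    proc_type: str  # "c", "r", "s" or "o"

class Relationship(NamedTuple):
    """Indirect messaging relationships table record (Table 3)."""
    file_sender: str
    proc_sender: str
    sender_line_no: int
    file_receiver: str
    proc_receiver: str
    receiver_line_no: int
    coid: str

def find_relationships(imt):
    """Pair every "s" record with every "r" record naming the same COID."""
    receivers_by_coid = {}
    for rec in imt:
        if rec.proc_type == "r":
            receivers_by_coid.setdefault(rec.coid, []).append(rec)
    relationships = []
    for snd in imt:
        if snd.proc_type != "s":
            continue
        for rcv in receivers_by_coid.get(snd.coid, []):
            relationships.append(Relationship(
                snd.file_name, snd.proc, snd.line_no,
                rcv.file_name, rcv.proc, rcv.line_no, snd.coid))
    return relationships

# Example records A and B of Table 2.
imt = [
    ImtRecord("file_A", "proc_A", "q_send", "GV1", 100, "queue", "s"),
    ImtRecord("file_B", "proc_B", "q_receive", "GV1", 200, "queue", "r"),
]
relationships = find_relationships(imt)
```

For the Table 2 records, this yields exactly one relationship, matching the contents of Table 3.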
Table 3 shows an example indirect messaging relationships record corresponding to the records of Table 2.
TABLE 3
Indirect Messaging Relationships Table Record Format for C Queue Message Type and Example
Field Name | Example Contents
file_sender | file_A
proc_sender | proc_A
sender_line_no | 100
file_receiver | file_B
proc_receiver | proc_B
receiver_line_no | 200
COID | GV1
Since most of the contents of the indirect messaging relationships table are already present in the indirect messaging table, a more concise version of the indirect messaging relationships table may be obtained by including in each record only a record ID together with pointers to the send and receive records in the indirect messaging table. A flowchart for a method of identifying indirect messaging relationships is shown in FIG. 7. First, the records in the indirect messaging table are examined in sequence for the first (next) record having an unprocessed COID and having type "s". Next, the remaining records are examined for any records having the same COID and having procedure type "r". For each such record, an entry in the indirect messaging relationships table is created. After all of the records have been examined, the entire process is repeated for the next and all subsequent unprocessed COIDs having type "s". While a very specific method of identifying indirect messaging relationships from the indirect messaging table records has been described, it is to be understood that other methods may be employed within the scope of the invention.

Notes on Interface Procedures Using Local Variables

In the above described embodiment of the RDT, it was assumed that local variables are in some cases used in interface procedure calls to identify communications objects. Consider the following example. Here q_receive is an "r" type interface procedure, q_send is an "s" type interface procedure, local_1 is a local variable, Q and Q1 are global variables, and RF, RF1, RF2 and SF are procedures defined as follows:
RF(local_1) {
    q_receive(local_1)
}
RF1 {
    RF(Q)
}
RF2 {
    RF(Q1)
}
SF {
    q_send(Q1)
}
In this example, there is a relationship between the procedures SF and RF2, because SF is a send procedure for Q1 and RF2 is indirectly a receive procedure for Q1. In some circumstances, omitting such relationships may not pose a problem, because identifying relationships which directly use global entities will in most cases capture a large percentage of relationships. It is therefore contemplated that, if these more complex relationships are not of interest, neither the creation and processing of the DFA files nor the DFA processing of local entity names within records in the object file is necessary. If these relationships are also to be captured, however, an interface procedure call may contain a local variable where a communications object identified by a global variable symbolic name would otherwise be expected. Such a call may be supplemented by a list of global variables whose values the local variable may be assigned, determined using the above referenced DFA techniques. In the above example, the local variable local_1 may be assigned the value Q, as in the procedure RF1, or Q1, as in the procedure RF2. The list {Q, Q1} is therefore associated with the local variable local_1. This list is preferably generated automatically. In some cases, it may be necessary to follow the local variable through several procedure calls before finding the global variable(s) it actually represents. When the indirect messaging table is created, a separate record is then created for each possible global variable listed in association with the local variable. In the above described example, it has been assumed that the source code consists of files which contain procedures which contain procedure calls. This is shown diagrammatically in FIG. 3, where a file 32 is shown to contain a procedure 34 which contains a procedure call 36.
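To make the local-variable case concrete, the sketch below traces a formal parameter back to the global names passed into it, using the RF/RF1/RF2/SF example above. It is a deliberately simple, flow-insensitive stand-in for the DFA techniques the text refers to, and not the DFA used by the RDT. It assumes that each procedure's formal parameters and each call site's actual arguments have already been extracted into the hypothetical PARAMS and CALLS tables.

```python
# Hypothetical extraction of the example: formal parameters per procedure,
# and call sites as (caller, callee, actual arguments).
PARAMS = {"RF": ["local_1"], "RF1": [], "RF2": [], "SF": []}
CALLS = [("RF1", "RF", ["Q"]), ("RF2", "RF", ["Q1"])]
GLOBALS = {"Q", "Q1"}

def resolve(proc, name, params=PARAMS, calls=CALLS, globals_=GLOBALS, _seen=None):
    """Return the set of global names that `name`, used inside `proc`, may denote."""
    if name in globals_:
        return {name}
    seen = set() if _seen is None else _seen
    if (proc, name) in seen or name not in params.get(proc, []):
        return set()  # already visited (recursion), or not a traceable parameter
    seen.add((proc, name))
    index = params[proc].index(name)
    result = set()
    for caller, callee, args in calls:
        # Follow each call site of `proc` back to the argument in this position.
        if callee == proc and index < len(args):
            result |= resolve(caller, args[index], params, calls, globals_, seen)
    return result
```

For example, resolve("RF", "local_1") returns {"Q", "Q1"}. One IMT record for the q_receive call in RF would then be created per name in that set. The Q1 record pairs with SF's q_send(Q1) call to give the relationship discussed above.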
Applying the system and method provided by the invention to this example, a particular relationship will identify a send interface procedure call by the particular procedure containing it and the particular file containing that procedure. These may be referred to as the send procedure and the send file respectively. Similarly, the receive interface procedure call will be identified in the relationship by the particular procedure containing it and the particular file containing that procedure. These may be referred to as the receive procedure and the receive file respectively. Such a relationship is very rich in information. It identifies a messaging relationship between the send procedure and the receive procedure. It also identifies a messaging relationship between the send file and the receive file. Thus, for the example in Table 3, relationships may be identified between proc_A and proc_B, and between file_A and file_B. These three types of relationships are illustrated in FIGS. 8A-8B respectively. More generally, depending on the nature of the source code contained in the source code repository, procedures may be grouped in many different ways on many different levels. At the very bottom level, there will always be some software entity which contains a call to a "send" interface procedure and another software entity which contains a call to a "receive" interface procedure. These software entities may be referred to as "Level 1" entities (in the above described example, these are the interface procedure calls). Groups of Level 1 entities may be arbitrarily combined to form "Level 2" entities (in the above described example, these are procedures). Groups of Level 2 entities may be arbitrarily combined to form "Level 3" entities (in the above described example, these are files).
Any number N of levels may be defined, and in general, entities may be categorized from Level 1 to Level N. These levels and their contents may be defined manually for a particular source code repository for the purposes of the RDT, and/or may arise from the structure of the source code repository. The methods and systems provided by the invention identify basic indirect messaging relationships between Level 1 entities, and these inherently identify relationships between higher level entities. The relationships need not even involve entities on the same level. For example, it might be of interest to determine all Level 2 entities which interact with a particular Level 3 entity (with the above example level definitions, this means determining all procedures which interact with a particular file). The structure of the basic indirect messaging relationships identified allows such a determination to be made with a simple database query. These levels may also include logical groupings, such as those defined in a library code repository, when present. By associating all logical and physical levels/groupings with each interface procedure call, it is possible to identify relationships between physical entities, between logical entities, or between physical and logical entities. An example of an IDF for the "C" language is given in FIG. 9. In this example, the first twenty-seven entries 60 are specific to the operating system (pSOS), and the remaining two entries 62 are application-based entries. All constructs could be stored in a single file (the object file, for example), with DFA being performed conditionally if the entity in the COID position is non-global. The object files and DFA files are, in any case, merely convenient ways of organizing constructs. Alternatively, the interface procedure calls may be searched for directly in the source code files, in which case the object files and DFA files are not required.
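As an illustration of such a query, the sketch below loads the example relationship of Table 3 into an in-memory SQLite database using Python's standard sqlite3 module. It then asks which procedures (Level 2) interact with a given file (Level 3). The table name imrt, and the choice to count only procedures on the other side of a relationship, are assumptions made for the example.

```python
import sqlite3

def procedures_interacting_with_file(conn, file_name):
    """Level 2 entities (procedures) related to a Level 3 entity (file)."""
    rows = conn.execute(
        """
        SELECT proc_sender FROM imrt WHERE file_receiver = ?
        UNION
        SELECT proc_receiver FROM imrt WHERE file_sender = ?
        """,
        (file_name, file_name),
    ).fetchall()
    return sorted(row[0] for row in rows)

# Hypothetical table holding the Table 3 columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE imrt (
        file_sender TEXT, proc_sender TEXT, sender_line_no INTEGER,
        file_receiver TEXT, proc_receiver TEXT, receiver_line_no INTEGER,
        coid TEXT)"""
)
conn.execute(
    "INSERT INTO imrt VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("file_A", "proc_A", 100, "file_B", "proc_B", 200, "GV1"),
)
```

With the Table 3 row loaded, the query for file_B returns ["proc_A"]. The query for file_A returns ["proc_B"].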
Referring again to FIG. 2, the IPRDT 5 has two inputs. The first is the static structure of the software stored in the source code repository 12. The second is an identification of all indirect messaging relationships existing in the source code repository. Both inputs are as identified by the RDT 10, as described in detail above. For the purpose of example, the static structure will be assumed to contain at least a module_uses_section table, a proc_const table, a proc_uses_proc (procedure uses procedure) table and a proc_calls_proc (procedure calls procedure) table. The module_uses_section table has the following structure: module_id; section_id. Module_id is the name of a module, and section_id is the name of a section used by the module. The proc_const table has the following structure: section_id; id_name; line_no; impl_section_id; impl_line_no. Section_id identifies the section in which the procedure is defined. Id_name is the logical name of the procedure. Line_no is the line within the definition section section_id at which the procedure is defined. Impl_section_id is the section within which the procedure is actually implemented. Impl_line_no is the line number within the implementation section at which the implementation starts. The proc_uses_proc table has the following structure: user_section_id; user_id_name; section_id; id_name; line_no. The field user_section_id is the name of the section of the using procedure. The field user_id_name is the name of the using procedure. The field section_id is the name of the section of the used procedure. The field id_name is the name of the used procedure. The field line_no is the line number within the user section at which the used procedure is used. The proc_calls_proc table has the following structure: caller_section_id; caller_id_name; section_id; id_name; line_no. The field caller_section_id is the name of the section of the calling procedure. The field caller_id_name is the name of the calling procedure.
The field section_id is the name of the section of the called procedure. The field id_name is the name of the called procedure. The field line_no is the line number within the caller section at which the called procedure is called. A very high level flowchart for the functionality implemented by the IPRDT 5 of FIG. 2 is shown in FIG. 10. The steps in this flowchart will first be described briefly by way of overview, and then each step will be described in detail. Step one is to identify all processes and their associated process entry points. Step two is to create a "process_uses_proc" table. Step three is to create an "inter-process indirect messaging relationships table".

Step One: Identify Processes and Process Entry Points

Each process has some sort of process reference which is used globally to identify the process. In addition, each process ultimately has some static software entity which is run first when the process begins, i.e. which is an "entry point" to the process. The purpose of this step is to match up process references with process entry points. The particular nature of this match-up depends heavily on the particular operating system. In the pSOS operating system used in some Northern Telecom switching systems, for example, the process reference is a process name identified in a process create procedure call. The process create procedure call also contains a process identifier. The entry point is identified by an "entry_proc" contained in an associated process start procedure call. The process start procedure call does not include the process name, but it does include the process identifier. For pSOS, the purpose of this step is therefore to match up each process name with an associated entry_proc. A flowchart for step one for the pSOS example is shown in FIG. 11. It contains two substeps, namely step A, which consists of creating an "activation table" containing a list of all process activation procedure calls, i.e.
all procedure calls which relate to the creation or start of a process or processes, and step B, which consists of creating a process entry points table. The process entry points table is created by identifying all pairs of records in the activation table consisting of a first record relating to performing a "create" procedure on a particular process and a second record relating to performing a "start" procedure on the same process. In step A, each record in the set of records in the object files and the DFA files pertaining to the "procedure calls procedure" construct is examined to see if the called procedure is either a process create or a process start procedure, and each such record is stored in the activation table. A preferred format for records in the activation table and two examples of such records are shown in Table 4. Example record A uses the process create procedure call "t_create", and example record B uses the process start procedure call "t_start". It is assumed that these procedures are called from procedures contained in files.
TABLE 4
Activation Table Record Format and Example
Field Name | Example Record A | Example Record B
File name | file_A | file_A
Procedure where called | proc_A | proc_A
Process activation procedure name | t_create | t_start
Line number | 100 | 200
Procedure type {create, start} | create | start
Priority | Pri_1 | -
Process identifier | PID_1 | PID_1
Process name | process_1 | -
Entry_proc name | - | entry_proc_1
The fields in the activation table record include the name of the process activation procedure being called, the procedure within which the process activation procedure is called, and the file containing that procedure. The line number at which the process activation procedure is called is placed in the "line number" field. The "procedure type" field identifies the process activation procedure as either a create or a start procedure. The "priority" field contains the priority of the process as defined in the create statement. The "process identifier" field contains the PID of the process, and is filled in for both the create and the start statements. The "process name" field contains the symbolic name of the process, and is present only for the create statement. The entry_proc field contains the name of the entry procedure, and is present only for the start statement.

Step B: Identify Pairs of Records and Create Process Entry Points Table

In this step, process create and process start procedure calls are paired up so that each process name can be linked with a corresponding entry_proc. A process creation relationship exists when a procedure uses a "create" type process activation procedure with respect to a particular process identifier, and a procedure (possibly the same one) uses a "start" type process activation procedure with respect to the same process identifier. A table defined as a "Process Entry Points table" is created and is used to store a record for each such pair of records. Such matching pairs of activation table records correspond to the creation and activation of the same process. An example record format is shown in Table 5.
TABLE 5
Process Entry Points Table Record Format and Example
Field Name | Example Contents
process_name | process_1
process identifier | PID_1
Priority | Pri_1
entry_proc_name | entry_proc_1
module_id | module_1
section_id | section_1
The process_name is the symbolic name of the process, extracted from the "create" record in the activation table. The priority is also extracted from the "create" record. The entry_proc_name is extracted from the "start" record. Depending on the partitioning of the software, it may be necessary to include additional structural names in this record to identify the entry procedure uniquely. For pSOS, for example, the section (file), module and procedure name together uniquely identify a particular procedure. The record may therefore include a section_id field and a module_id field. These would be filled in on the basis of the static software structure identified by the RDT or the LDT of FIG. 2. The term "static structure", when used in conjunction with a particular procedure call, refers to any and all static software entities required to identify the particular procedure uniquely. A flowchart for a method of matching pairs of activation records is shown in FIG. 12. First, the records in the activation table are examined in sequence for the first (next) record having an unprocessed PID and having type "create". The process name in that record is extracted. Next, the remaining records are examined for any records having the same PID and having type "start". When such a record is identified, its "entry_proc" identifier is extracted. The process name and entry_proc thus extracted are used to create a record in the process entry points table. The static structure needed to identify the entry procedure uniquely is also extracted and included in the corresponding process entry points table record. The entire process is repeated for the next and all subsequent unprocessed PIDs having type "create".
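The FIG. 12 matching can be sketched in Python as follows. Activation table records (Table 4) are represented here as plain dictionaries, and their keys are illustrative assumptions. The lookup of module_id and section_id from the static structure is omitted. Rather than rescanning the table for each "create" record, the sketch indexes the "start" records by PID up front.

```python
def build_process_entry_points(activation_table):
    """Pair each "create" record with every "start" record sharing its PID."""
    starts_by_pid = {}
    for rec in activation_table:
        if rec["procedure_type"] == "start":
            starts_by_pid.setdefault(rec["pid"], []).append(rec)

    entry_points = []
    for create in activation_table:
        if create["procedure_type"] != "create":
            continue
        for start in starts_by_pid.get(create["pid"], []):
            entry_points.append({
                "process_name": create["process_name"],  # only in "create"
                "process_identifier": create["pid"],
                "priority": create["priority"],          # only in "create"
                "entry_proc_name": start["entry_proc"],  # only in "start"
            })
    return entry_points

# The two example records of Table 4 (keys are illustrative).
activation_table = [
    {"file": "file_A", "proc": "proc_A", "name": "t_create", "line": 100,
     "procedure_type": "create", "priority": "Pri_1", "pid": "PID_1",
     "process_name": "process_1"},
    {"file": "file_A", "proc": "proc_A", "name": "t_start", "line": 200,
     "procedure_type": "start", "pid": "PID_1", "entry_proc": "entry_proc_1"},
]
entry_points = build_process_entry_points(activation_table)
```

For the Table 4 records, this produces a single entry point linking process_1 (PID_1, Pri_1) to entry_proc_1, as in Table 5 minus the static-structure fields.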
While a very specific method of identifying process creation relationships from the activation table records has been described, it is to be understood that other methods may be employed within the scope of the invention.

Step Two: Create Process_uses_proc Table

In this step, the complete set of procedures which may be called or used by each process is identified. A table defined as a "Process_uses_Proc" table is created and is used to store this information. An example record format is shown in Table 6. A record corresponding to the example Process Entry Points table record of Table 5 would be created as described in detail below.
TABLE 6
Process_uses_Proc Record Format
Field Name | Example Contents
process_name | process_1
procedure_name | entry_proc_1
module_id | module_1
section_id | section_1
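Step two amounts to recording, for each process, every procedure reachable from its entry procedure through proc_calls_proc and proc_uses_proc. The FIG. 13 procedure does this by appending records to the end of the table and continuing to read. A minimal Python sketch of that scheme follows. It assumes procedure names are unique, so the static structure (module_id, section_id) is left out. It also reduces both static-structure tables to hypothetical (caller, callee) pairs. A set of records already present prevents duplicates, which is what lets recursive call chains terminate.

```python
def build_process_uses_proc(entry_points, proc_calls_proc, proc_uses_proc):
    """entry_points: (process_name, entry_proc) pairs;
    proc_calls_proc / proc_uses_proc: (caller_or_user, callee_or_used) pairs."""
    callees = {}
    for caller, callee in [*proc_calls_proc, *proc_uses_proc]:
        callees.setdefault(caller, set()).add(callee)

    table, present = [], set()

    def add(record):
        if record not in present:  # skip duplicates so recursion terminates
            present.add(record)
            table.append(record)

    for process_name, entry_proc in entry_points:
        add((process_name, entry_proc))

    i = 0
    while i < len(table):  # appended records are read in later iterations
        process_name, procedure = table[i]
        for callee in sorted(callees.get(procedure, ())):
            add((process_name, callee))
        i += 1
    return table

# Hypothetical call graph with a recursive pair proc_A <-> helper.
table = build_process_uses_proc(
    entry_points=[("process_1", "entry_proc_1")],
    proc_calls_proc=[("entry_proc_1", "proc_A"), ("proc_A", "helper"),
                     ("helper", "proc_A")],
    proc_uses_proc=[],
)
```

The resulting table lists entry_proc_1, proc_A and helper against process_1, each exactly once, even though proc_A and helper call each other.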
A flowchart for this step is shown in FIG. 13. To begin, each entry in the process entry points table is examined in sequence. For each entry, a record is created in the process_uses_proc table consisting of the process_name, the entry_proc and any static structure needed to identify the entry_proc procedure uniquely. This records that the process identified by process_name uses the procedure identified by entry_proc. Next, an iterative procedure is performed to identify further records for the process_uses_proc table. Each record in the process_uses_proc table is read in sequence, starting at the beginning of the table. For the ith record, this involves reading process_name_i and procedure_name_i. Next, all the procedures called or used by procedure_name_i are identified. This consists of examining each record in the proc_calls_proc and proc_uses_proc tables relating to procedure_name_i, and extracting records of the form "procedure_name_i uses/calls procedure_name_j" together with the necessary static structure for procedure_name_j. For each such procedure_name_j, a record consisting of process_name_i, procedure_name_j and the static structure for procedure_name_j is created in the process_uses_proc table, unless an identical record is already present. Each new record is added to the end of the table. Because the table is processed in sequence, the new record will itself be processed later, by which time it may no longer be the last record in the table. This is repeated until there are no new records to add to the table. Mathematically, this is referred to as a "transitive closure".

Step Three: Create Inter-Process Indirect Messaging Table

Once the "Process_uses_Proc" table has been completed as described above, step three of the method extracts inter-process indirect messaging relationships from this table. An input to this step is the previously described IMT (indirect messaging relationships table). This may have been produced in accordance with the above described method.
Alternatively, the input may take any form which permits the identification of indirect messaging relationships between static functional entities. A flowchart for this step is shown in FIG. 14. Assuming that the IMT is used, each record in the table is examined in sequence. Recall that each IMT record includes a first procedure using a "send" type procedure and a second procedure using a "receive" type procedure with respect to a particular COID (communications object identifier). Each record thereby identifies an indirect relationship between the first procedure and the second procedure. The procedure using the "send" type procedure is first identified and used to look up, in the "Process_uses_Proc" table, a first set of process names consisting of all processes which use that procedure. Next, the procedure using the "receive" type procedure is identified and used to look up, in the "Process_uses_Proc" table, a second set of process names consisting of all processes which use that procedure. Finally, a set of records is created in the inter-process indirect messaging table consisting of all possible pairs of process names made up of one process name from the first set and one from the second set. Each such record identifies a potential inter-process indirect messaging relationship. Preferably, the COID is also extracted from the relevant IMT record and stored with each record in the inter-process indirect messaging table. An example record format is shown in Table 7.
TABLE 7
Inter-Process Indirect Messaging Relationship Table
Field Name | Example Contents
Process_name_sender | process_1
Process_name_receiver | process_2
COID | COID_1
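The FIG. 14 procedure described above can be sketched in Python as two steps. First, build a lookup from each procedure to the set of processes using it. Then form the Cartesian product of sender and receiver processes for each IMT record, using itertools.product. IMT records are reduced here to hypothetical (proc_sender, proc_receiver, COID) triples, and procedure names are assumed to be unique. If both procedures are used by the same process, the result includes a record whose sender and receiver are identical, which a caller may choose to filter out.

```python
from itertools import product

def inter_process_relationships(imt, process_uses_proc):
    """imt: (proc_sender, proc_receiver, coid) triples (assumed reduced form);
    process_uses_proc: (process_name, procedure_name) records."""
    processes_using = {}
    for process_name, procedure in process_uses_proc:
        processes_using.setdefault(procedure, set()).add(process_name)

    result = []
    for proc_sender, proc_receiver, coid in imt:
        senders = sorted(processes_using.get(proc_sender, ()))
        receivers = sorted(processes_using.get(proc_receiver, ()))
        # Every sender process paired with every receiver process.
        for sender, receiver in product(senders, receivers):
            result.append((sender, receiver, coid))
    return result

# Hypothetical input: proc_B is used by two processes.
records = inter_process_relationships(
    imt=[("proc_A", "proc_B", "COID_1")],
    process_uses_proc=[("process_1", "proc_A"), ("process_2", "proc_B"),
                       ("process_3", "proc_B")],
)
```

Because proc_B is used by two processes here, the single IMT record yields two inter-process records: process_1 to process_2 and process_1 to process_3.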
This procedure is completed for each record in the IMT, thus identifying all possible inter-process indirect messaging relationships. These relationships may then optionally be stored in a database and/or used to generate graphical output, such as the relationship between process_1 and process_2 depicted in FIG. 1D. Alternatively, if the RDT of FIG. 10 described in detail above is used, the inter-process indirect messaging relationships may be captured as an extension to the above described indirect messaging relationships table. This may be done by adding the two fields process_name_sender and process_name_receiver to each record in the IMT. The IMT record format of Table 3 above, thus modified, has the format of Table 8 below:
TABLE 8
Indirect Messaging Relationships Table Record Format for C Queue Message Type and Example Including Inter-Process Parameters
Field Name | Example Contents
file_sender | file_A
proc_sender | proc_A
sender_line_no | 100
file_receiver | file_B
proc_receiver | proc_B
receiver_line_no | 200
COID | GV1
process_name_sender | process_1
process_name_receiver | process_2
In the above described embodiment, it has been assumed that processes are identified by process names and that the first procedure in a process is identified by an entry procedure for that process. It has also been assumed that processes are created and started independently, and that create and start commands must be matched up in order to match process entry points with process references. The entry procedure is then used as a starting point to identify all procedures which may be called/used in that process. More generally, any method may be used which identifies a set of possible called/used procedures associated with some sort of process reference identifiable within the source code. The particular method used may depend on the operating system employed. For example, in the UNIX operating system, processes do not have process names, only PIDs. Processes may be created using the fork() system call, which creates a child process that is a copy of the parent process. In so doing, fork() returns the child's process identifier to the parent, so that the parent process knows the identity of the child process. In another example, processes may be started by referring to file names instead of entry procedures. In such a case, there may be some convention which identifies the first procedure from which to develop a list of procedure names which may be called. Again using the UNIX example, when a program file is named in a process start command, the procedure called "main" in that program is the default first procedure run. In this case, it is the procedure "main" which takes on the role of the entry procedure. Any statement which results in some process being started in some manner will be referred to generally as a process instigation statement.
A display may be created for a set of process references to be displayed, consisting of a subset or all of the process references. The display consists of a process display element representative of each process reference in the set, for example a bubble as depicted in FIG. 1D. It also consists of a connection display element for each pair of process references in an inter-process indirect messaging relationship, where both references are included in the set. Each connection display element represents a connection between the display elements for the pair of process references. It may, for example, be a line connecting the relevant process display elements, labelled with the corresponding communications object identifier. Where multiple relationships exist between two processes, the line connecting the relevant display elements may be labelled with more than one communications object identifier. The invention may be embodied in a processor readable medium containing a software program comprising instructions for a processor to implement any of the above described methods. The invention may also be embodied in a processing platform programmed to implement any of the above described methods. Numerous modifications and variations of the present invention are possible in light of the above teachings. It is therefore to be understood that within the scope of the appended claims, the invention may be practised otherwise than as specifically described herein.
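The process display described above could be realized in several ways. One option, sketched here on the assumption that Graphviz is the rendering tool, is to emit DOT text. The output contains one ellipse node (bubble) per displayed process reference, and one undirected edge per connected pair labelled with every COID that pair shares. The function name, and the choice of an undirected graph that merges relationships in both directions onto one line, are illustrative.

```python
def to_dot(relationships, shown):
    """relationships: (sender, receiver, coid) triples; shown: processes to draw."""
    labels = {}
    for sender, receiver, coid in relationships:
        if sender in shown and receiver in shown:
            key = tuple(sorted((sender, receiver)))  # one line per process pair
            labels.setdefault(key, set()).add(coid)

    lines = ["graph processes {", "  node [shape=ellipse];"]
    for name in sorted(shown):
        lines.append(f'  "{name}";')
    for (a, b), coids in sorted(labels.items(), key=lambda kv: kv[0]):
        label = ", ".join(sorted(coids))  # several COIDs share one line
        lines.append(f'  "{a}" -- "{b}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical relationships; process_3 is not in the displayed set.
dot = to_dot(
    [("process_1", "process_2", "COID_1"), ("process_2", "process_1", "Q2"),
     ("process_1", "process_3", "Q3")],
    shown={"process_1", "process_2"},
)
```

The relationship involving process_3 is omitted because process_3 is not displayed. The two relationships between process_1 and process_2 share a single line labelled "COID_1, Q2".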
