Data structure extraction, conversion and display tool
5432942
Abstract
The present invention relates to a tool, in the form of a computer program, for analyzing computer programs by extracting and converting information about data structures in the program, storing the information about the extracted data structures in a series of random access files forming a relational database, and displaying the stored information as desired. The method for analyzing the computer program using the tool of the present invention includes the steps of inputting a computer program to be analyzed, extracting and converting at least one data structure such as a variable or a table from the program, storing information about the data structure(s) in one or more random access files, and displaying the stored information in either a textual or graphical mode. The program to be analyzed is preferably input into the program of the present invention in the form of one or more source code files. The tool has been successfully applied to the analysis of source code files written in the programming language Compiler Monitor System Version 2Y (CMS-2Y), which is commonly used in military applications.
Claims
What is claimed is:
Description
BACKGROUND OF THE INVENTION
TABLE I
__________________________________________________________________________
VRBL
NAVFLAGA I 32 S P O
"NAVSAT FLAG "$
VRBL
SCRUDANG A 32 S 31 P O
"RUDDER ANGLE IN BAMS
"$
VRBL
SCSTNPLN A 32 S 31 P O
"STERN PLANE ANGLE IN BAMS
"$
VRBL
SCSFTRPM A 32 S 6 P O
"SHAFT RPM (REV/MIN)
"$
__________________________________________________________________________
The interpretation of the input code for variables is based upon the fixed syntax of the language of the compiler. As shown in Table I, the variable definitions include a number of tokens. Preferably, only the following formal constructs or tokens are interpreted: (a) The syntax of the compiler is such that the first token after the keyword VRBL is a character string identifying the variable name, e.g. NAVFLAGA; (b) The next token is a single character identifying the data type of the structure to be created (I=integer or A=real); (c) The next token is an integer that represents the size of the variable in bits; (d) The next token is a single character that identifies the sign of the variable (U=unsigned, S=signed); (e) The interpretation of the next token depends on whether the second identified token identified the type of the variable as integer or real. For variables of type real, the next recognized token is an integer representing the size of the fractional portion of the variable in bits; and (f) The next recognized token is a word beginning with two quote characters (ASCII character 0039) in a row, or the End Of Line (EOL) character "$". The quote character sequence identifies the beginning of a character string representing an inline comment. If the End Of Line character is the next token, the definition of the VRBL is complete. If the beginning of an inline comment was identified during the parsing of the VRBL declaration in the program, then all subsequent tokens are parsed and lexically added together (concatenated) until a token is found that ends with two quote characters in a row. This construct identifies the end of the definition of the inline comment and indicates that the definition of the VRBL is complete. Once the complete variable definition and inline comment have been extracted, the information is stored in the relational database 12 for subsequent use by the display functions.
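Rules (a) through (f) above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Variable` and `parse_vrbl` are hypothetical, the declaration is assumed to arrive as a single string, and the doubled quote character is assumed to appear as `"` in the listings (a literal `''` is normalized to `"` first). Tokens that the rules do not interpret, such as `P` and `O` in Table I, are skipped.

```python
from dataclasses import dataclass
from typing import Optional

QUOTE = '"'  # assumption: the doubled quote ('') is rendered as " in the listings


@dataclass
class Variable:
    name: str
    dtype: str                 # "I" = integer, "A" = real
    size_bits: int
    sign: str                  # "U" = unsigned, "S" = signed
    frac_bits: Optional[int]   # only present for type "A"
    comment: str


def parse_vrbl(text: str) -> Variable:
    """Parse one VRBL declaration following rules (a)-(f)."""
    tokens = text.replace("''", QUOTE).split()
    if len(tokens) < 5 or tokens[0] != "VRBL":
        raise ValueError("not a VRBL declaration")
    name, dtype, size, sign = tokens[1], tokens[2], int(tokens[3]), tokens[4]
    pos = 5
    frac_bits = None
    if dtype == "A":           # rule (e): real types carry a fraction size
        frac_bits = int(tokens[pos])
        pos += 1

    words = []
    in_comment = False
    for tok in tokens[pos:]:
        if not in_comment:
            if tok == "$":     # end of declaration, no comment
                break
            if not tok.startswith(QUOTE):
                continue       # uninterpreted token (e.g. P, O): skip it
            in_comment = True
            tok = tok[1:]      # drop the opening quote
        # rule (f): concatenate tokens until one ends with the closing quote
        closing = tok.endswith(QUOTE) or tok.endswith(QUOTE + "$")
        words.append(tok.rstrip("$").rstrip(QUOTE) if closing else tok)
        if closing:
            break
    comment = " ".join(w for w in words if w)
    return Variable(name, dtype, size, sign, frac_bits, comment)


# Usage with the second declaration of Table I, joined into one string.
rudder = parse_vrbl('VRBL SCRUDANG A 32 S 31 P O "RUDDER ANGLE IN BAMS "$')
```

Splitting on whitespace keeps the sketch close to the token-by-token description in the text; a production tokenizer would also need to handle comments that contain the terminator character.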
For illustrative purposes, it can be assumed that the source code under analysis also uses the keyword "TABLE" to identify a complex data structure for processing. When the keyword "TABLE" is recognized, the program invokes the Process Table function 24, which in turn invokes a Process Field function. This portion of the extraction and conversion process reads the table name and the top-level data structure of the table as defined in the source code. The process table function extracts the table name, type (horizontal or vertical), the number of items in the table, and the number of 32-bit words of memory to be allocated for the table. Additionally, if there is an inline comment on the table keyword, it is extracted as well. Once the top-level structure of the table has been extracted, the process table function invokes the process field function in order to extract the constituent field data structures used in the table. The data structure information extracted for each field includes field name, data type, size, sign, and position in the table (both word position and bit position). An example table definition, complete with field definitions and inline comments, is outlined below in Table II:
TABLE II
__________________________________________________________________________
TABLE CSTSEX H 7 MAXNTGSS "CSTS EXTENSION "$
FIELD RNGRATE
A 16 S 8 0 31
"RANGE RATE OF TGT IN
YARDS/SECOND "$
FIELD TURNAMT
A 16 S 15 0 15
"TURN AMOUNT IN HAMS "$
FIELD TURNAMTB
A 16 U 16 0 15
"TURN AMT IN BAMS "$
FIELD EUA A 16 S 15 1 31
"D/E ANGLE IN HAMS "$
FIELD DE A 16 U 16 1 31
"D/E ANGLE IN BAMS "$
FIELD UCRR A 16 S 8 1 15
"UNCORRECTED RNG-RATE "$
FIELD DISPHIST
B 2 0 "WHEN SET, THIS TARGET HAS
"$
" TRACK HISTORY DISPLAYED
"$
FIELD OPTURN B 2 1 "OPERATOR SELECTED TURNRATE
1 - YES 0 - NO (TRAINER ONLY)
"$
FIELD ZIGZAG B 2 2 "TGT ZIGZAG MANEUVER IN EFFECT
1 - YES 0 - NO (TRAINER ONLY)
"$
FIELD PIDEQUL
B 2 3 "TGT IS PI/DE QUALIFIED
1 - YES 0 - NO "$
FIELD ASSB64 B 2 4 "TGT IS ASSIGNED TO B64
1 - YES 0 - NO "$
FIELD PU B 2 6 " TGT IS A PARTICIPATING
UNIT
1 = YES 0 = NO "$
1 - YES 0 - NO "$
FIELD SODL B 2 8 " TGT IS A SODL TRACK
1 = YES 0 = NO "$
FIELD OTH B 2 5 "TGT IS OTH OR HAS GTE
"$
FIELD OTBITS I 4 U 2 8
"OVERLAY FOR OTH BITS "$
FIELD SNORKEL
B 2 9 "TGT SNORKEL STATUS (0) OFF,
(1) ON "$
FIELD ACTAMP B 2 10 "TGT ACTIVE SONAR LEVEL(0)
LOW, (1) HIGH "$
FIELD ACTSONAR
I 2 U 2 12
"TGT ACTIVE SONAR MODE
"$
FIELD B64INX I 8 U 2 31
"B64INTF1 TABLE INDEX FOR
21B64 INTERFACE "$
FIELD ORDCRS A 32 S 31 3 31
"ORDERED CRS HAMS "$
FIELD ORDCRSB
A 31 U 31 3 31
"ORDERED CRS BAMS "$
FIELD ORDDEPTH
A 16 U 3 4 15
"ORDERED DEPTH "$
FIELD ORDSPD A 16 S 4 4 31
"ORDERED SPEED "$
FIELD CRUMY I 16 S "Y POSITION CRUMBS "$
FIELD CRUMX I 16 S "X POSITION CRUMBS "$
FIELD TRNRATE
A 32 S 31 6 31
"TURNRATE IN HAMS/SEC "$
FIELD TRNRATEB
A 31 U 31 6 31
"TURNRATE IN BAMS/SEC "$
END-TABLE CSTSEX $
__________________________________________________________________________
The interpretation of the input computer code for tables is also based upon the fixed syntax of the language of the compiler. Preferably, the following constructs for tables are interpreted in this invention: (a) The syntax of the compiler is such that the first token after the keyword "TABLE" is a character string identifying the table name (e.g. CSTSEX); (b) The next token is a single character identifying the table type of the structure to be created (H=horizontal, V=vertical); (c) The next token is a variable name or an integer that represents the table size of the data structure (in 32-bit words); (d) The next token is a variable name or an integer that identifies the number of items in the table; (e) The next recognized token is a word beginning with two quote characters (ASCII 0039) in a row, or the End Of Line (EOL) character "$". The quote character sequence identifies the beginning of a character string representing an inline comment. If the End Of Line character is the next token, the definition of the table name and size is complete and the detailed description of the internal structure of the TABLE begins. If the beginning of an inline comment was identified during the parsing of the TABLE declaration in the program, then all subsequent tokens are parsed and lexically added together (concatenated) until a token is found that ends with two quote characters in a row. This construct identifies the end of the definition of the inline comment; and (f) The definition of the internal structure of the TABLE includes the definition of any number of fields within the table. The definition of the structure of each field is sequentially defined until the keyword "END-TABLE" is encountered. The "END-TABLE" keyword signifies the completion of the definition of the TABLE data structure.
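As a rough illustration of rules (a) through (e) for the TABLE header, the following Python sketch parses the top-level declaration only. The names `TableHeader`, `split_comment`, `int_or_name` and `parse_table_header` are hypothetical, and the doubled quote character is again assumed to be rendered as `"`. Field processing up to END-TABLE, rule (f), is not covered here.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class TableHeader:
    name: str
    kind: str                    # "H" = horizontal, "V" = vertical
    words: Union[int, str]       # size in 32-bit words (integer or variable name)
    items: Union[int, str]       # number of items (integer or variable name)
    comment: str


def split_comment(tokens: List[str]) -> Tuple[List[str], str]:
    """Split tokens into (tokens before the comment, comment text)."""
    for i, tok in enumerate(tokens):
        if tok.startswith('"'):
            text = " ".join(tokens[i:])
            end = text.find('"', 1)          # closing quote, if any
            body = text[1:end] if end != -1 else text[1:]
            return tokens[:i], body.strip()
    return tokens, ""


def int_or_name(tok: str) -> Union[int, str]:
    """Rules (c) and (d) allow either an integer or a variable name."""
    return int(tok) if tok.isdigit() else tok


def parse_table_header(text: str) -> TableHeader:
    tokens = text.replace("''", '"').split()
    head, comment = split_comment(tokens)
    head = [t for t in head if t != "$"]     # drop the terminator token
    if len(head) < 5 or head[0] != "TABLE":
        raise ValueError("not a TABLE declaration")
    return TableHeader(head[1], head[2],
                       int_or_name(head[3]), int_or_name(head[4]), comment)


# Usage with a header shaped like the one in Table II.
header = parse_table_header('TABLE CSTSEX H 7 MAXNTGSS "CSTS EXTENSION "$')
```

Keeping the size and item count as either an integer or a name defers symbol resolution, which matches the text's statement that either form may appear in those positions.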
The definition of the language syntax interpreted by the present invention for the constituent fields within a TABLE data structure may be specified as follows: (1) The syntax of the compiler is such that the first token after the keyword FIELD is a character string identifying the field name (e.g. RNGRATE); (2) The next token is a single character identifying the data type of the structure to be created (I=integer, A=real, B=boolean); (3) The interpretation of the next token depends on whether the second identified token identified the variable as an integer, a real or a boolean data type.
(A) For integer and real data types: (i) the next token is an integer that represents the field size of the structure in bits; (ii) the next token is a single character that identifies the sign of the field (U=unsigned, S=signed); (iii) the interpretation of the next token depends on whether the identified field is of the type integer or real. For fields of type real, the next recognized token is an integer representing the size of the fractional portion of the field in bits; (iv) the next recognized token is an integer specifying the word position of the field within the table; (v) the next recognized token is the bit position of the field within the previously identified word in the table; (vi) the next recognized token is a word beginning with two quote characters (ASCII character 0039) in a row, or the End Of Line (EOL) character "$". The quote character sequence identifies the beginning of a character string representing an inline comment. If the End Of Line character is the next token, the definition of the field is complete. If the beginning of an inline comment was identified during the parsing of the FIELD declaration, then all subsequent tokens are parsed and lexically added together (concatenated) until a token is found that ends with two quote characters in a row. This construct identifies the end of the definition of the inline comment and that the definition of the field is complete.
(B) For boolean data types: (i) the next token is an integer representing the word position within the TABLE data structure of the boolean field; (ii) the next token is an integer representing the bit position of the boolean field within the previously identified word in the TABLE data structure; (iii) the next recognized token is a word beginning with two quote characters in a row (ASCII character 0039). This sequence identifies the beginning of a character string representing an inline comment; (iv) the next recognized token is a word beginning with two quote characters (ASCII 0039) in a row, or the End Of Line (EOL) character "$". The quote character sequence identifies the beginning of a character string representing an inline comment. If the End Of Line character is the next token, the definition of the field is complete. If the beginning of an inline comment was identified during the parsing of the FIELD declaration in the program, then all subsequent tokens are parsed and lexically added together (concatenated) until a token is found that ends with two quote characters in a row. This construct identifies the end of the definition of the inline comment and that the definition of the field is complete.
The information produced during this extracting and converting step is then stored in a series of random access files in the relational database 12 for subsequent access by the display functions. Each file in the database can be kept on an individual disk if desired.
The C-Switch Processing Block may be used to identify structures containing the keywords "CSWITCH", "CSWITCH-ON", "CSWITCH-OFF", and "END-CSWITCH" or any other keywords used to identify sections of the source code to be switched on or off for conditional compilation.
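Returning to the FIELD rules (A) and (B) above, the branch on data type can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual code: `Field` and `parse_field` are hypothetical names, the doubled quote character is assumed to be rendered as `"`, and every declaration is assumed to carry explicit word and bit positions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    name: str
    dtype: str                  # "I" integer, "A" real, "B" boolean
    size_bits: Optional[int]    # not given for booleans
    sign: Optional[str]         # "U" or "S"; not given for booleans
    frac_bits: Optional[int]    # real fields only
    word: int                   # word position within the table
    bit: int                    # bit position within that word
    comment: str


def parse_field(text: str) -> Field:
    tokens = text.replace("''", '"').split()
    # Everything from the first token that starts with a quote is the comment.
    start = next((i for i, t in enumerate(tokens) if t.startswith('"')),
                 len(tokens))
    head = [t for t in tokens[:start] if t != "$"]
    raw = " ".join(tokens[start:])
    end = raw.find('"', 1)
    comment = (raw[1:end] if end != -1 else raw[1:]).strip()

    if len(head) < 3 or head[0] != "FIELD":
        raise ValueError("not a FIELD declaration")
    name, dtype = head[1], head[2]
    rest = head[3:]
    if dtype == "B":                       # rule (B): word, bit
        word, bit = int(rest[0]), int(rest[1])
        return Field(name, dtype, None, None, None, word, bit, comment)
    if dtype in ("I", "A"):                # rule (A): size, sign, [frac], word, bit
        size, sign = int(rest[0]), rest[1]
        rest = rest[2:]
        frac = None
        if dtype == "A":                   # real fields carry a fraction size
            frac, rest = int(rest[0]), rest[1:]
        word, bit = int(rest[0]), int(rest[1])
        return Field(name, dtype, size, sign, frac, word, bit, comment)
    raise ValueError(f"unknown data type {dtype!r}")


# Usage with two declarations shaped like those in Table II.
rate = parse_field('FIELD RNGRATE A 16 S 8 0 31 "RANGE RATE OF TGT IN YARDS/SECOND "$')
flag = parse_field('FIELD DISPHIST B 2 0 "WHEN SET, THIS TARGET HAS TRACK HISTORY DISPLAYED "$')
```

Declarations that omit the word and bit positions would raise an error in this sketch; a fuller version would need a policy for such cases.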
The four keywords mentioned above, when found in the source code, invoke the process-cswitch, process-cswitchon, process-cswitchoff and process-cswitchend functions. These functions can be invoked at any time (even in the middle of a table definition) in response to encountering switches in the compiler. The functions add the existence of cswitches and their status (ON or OFF) to the description of the individual data structure records stored during the conversion process.
As can be seen from the foregoing, the processing sections discussed above extract data structure information about variables, tables, fields in the tables, and comments about the foregoing located in the source code. The data structure information, as well as conversion status information, is stored during the extraction process in one or more random access files 12 for subsequent display processing. Preferably, the information is stored on off-line storage devices such as hard disks, floppy disks, tapes, and the like. This off-line storage approach represents a relational database. It has been found that this approach is desirable because it allows for the processing of very large source files (limited only by available disk space, not by available computer memory). This approach also eliminates the need to extract data structure information each time an analysis session is initiated.
In the database, a separate status file is maintained that identifies complete status information logged during the file extraction and conversion process. The detailed data structure information extracted by the process-variable, process-table and process-cswitch functions is preferably stored in four random access data files: the variables file, the tables file, the fields file and the comments file. The status file is produced at the end of the conversion process and contains information concerning the number of extracted variables, tables, fields and comments. It contains the sizes of the files used and created, their file creation dates, and a list of the CSWITCHES found.
The variables file contains a number of fixed length records, each defining the content of the data extracted for the VRBL data type. Information stored in each record includes the variable name, data type, size, sign, data position, and cswitch status. Each of these records also contains pointer information to index into the comments file for any inline comments extracted for the variable.
The tables file contains a number of fixed length records, each defining the content of the data extracted for the TABLE data type. Information stored in each record includes the table name, type (horizontal or vertical), number of items, and length in 32-bit words. Each of these records also contains pointer information to index into the comments and fields files. The pointers for the comments file provide access to any inline comments extracted from the table definition. The pointers for the fields file allow access to the field data that comprise the table definition.
The fields file contains a number of fixed length records, each defining the content of the data extracted for the FIELD data subtype for the TABLE data structure. Information stored in each record includes the field name, data type, size, sign, data location (word and bit positions) and cswitch status. Each of these records also contains pointer information to index into the comments file for any inline comments extracted for the field.
The comments file contains a number of fixed length records, each defining the comment information extracted during variable, table and field processing. Access to the comments records is performed once the file indexing pointers have been obtained from the appropriate variable, table or field file record.
The software tool of the present invention also includes a display segment which provides textual and graphic displays of the extracted data structures.
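The fixed length record scheme described above, in which a variables record holds a pointer into the comments file, can be sketched in Python with the standard `struct` module. The byte widths, the `VAR_REC` and `COM_REC` layouts and the helper names are all assumptions made for illustration; the source lists what each record contains but not how it is laid out. The data position and cswitch status are omitted for brevity, and in-memory `io.BytesIO` objects stand in for the random access disk files.

```python
import io
import struct

# Hypothetical layouts: name, type, size in bits, sign, fraction bits,
# and an index ("pointer") into the comments file.
VAR_REC = struct.Struct("<16s c H c h i")
COM_REC = struct.Struct("<80s")            # one fixed-length comment record


def write_record(f, rec: struct.Struct, index: int, *values) -> None:
    """Write one record at its slot; the offset is index * record size."""
    f.seek(index * rec.size)
    f.write(rec.pack(*values))


def read_record(f, rec: struct.Struct, index: int) -> tuple:
    """Read one record directly, without scanning earlier records."""
    f.seek(index * rec.size)
    return rec.unpack(f.read(rec.size))


def store_variable(var_file, com_file, var_index, com_index,
                   name, dtype, size, sign, frac, comment) -> None:
    write_record(com_file, COM_REC, com_index, comment.encode("ascii"))
    write_record(var_file, VAR_REC, var_index, name.encode("ascii"),
                 dtype.encode("ascii"), size, sign.encode("ascii"),
                 frac, com_index)


def load_variable(var_file, com_file, var_index) -> tuple:
    name, dtype, size, sign, frac, com_index = read_record(
        var_file, VAR_REC, var_index)
    (comment,) = read_record(com_file, COM_REC, com_index)  # follow the pointer
    return (name.rstrip(b"\0").decode(), dtype.decode(), size,
            sign.decode(), frac, comment.rstrip(b"\0").decode())


# Usage: -1 marks "no fractional part" for integer variables (an assumption).
variables, comments = io.BytesIO(), io.BytesIO()
store_variable(variables, comments, 0, 0, "NAVFLAGA", "I", 32, "S", -1, "NAVSAT FLAG")
store_variable(variables, comments, 1, 1, "SCRUDANG", "A", 32, "S", 31, "RUDDER ANGLE IN BAMS")
second = load_variable(variables, comments, 1)
```

Because every record has the same size, any record can be located by multiplying its index by the record size, which is what makes the files randomly accessible and keeps memory use independent of source file size.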
The display segment has a simple printing function embedded therein that provides printouts of the higher level information managed by the software. If desired, the display segment may also include an embedded function that provides visual display of a desired set of information on a monitor associated with the computer on which the analysis tool of the present invention is being run. The textual display mode provides the capability to display the content of a selected data structure stored in the aforementioned files in text format. For table data structures, the individual fields within the table may also be shown (see FIGS. 3 and 4). The graphic display mode of the display segment may be used to provide a schematic representation of a selected data structure as it would be stored in the computer when the program is executed. As shown in FIG. 5, it provides for the display of the content, data type and location of each data structure and its constituent parts. The legend at the bottom of FIG. 5 illustrates the various types of information being displayed. With respect to "bit" information, the following code is applicable with respect to the CMS-2Y source code.
______________________________________
WORD 2 (BIT FIELDS):
BIT POSITION FIELD NAME
______________________________________
0 DISPHIST
1 OPTURN
2 ZIGZAG
3 PIDEQUL
4 ASSB64
5 (BLANK)
6 PU
7 STDL
8 SODL
9 SNORKEL
10 ACTAMP
______________________________________
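The bit listing above can be derived mechanically from the stored word and bit positions of the boolean fields. The Python sketch below shows one way to do so; `BIT_FIELDS`, `word_layout` and `render` are hypothetical names, and the field positions are copied from the listing above rather than read from the fields file.

```python
from typing import Dict, List, Tuple

# (field name, word position, bit position), copied from the listing above.
BIT_FIELDS: List[Tuple[str, int, int]] = [
    ("DISPHIST", 2, 0), ("OPTURN", 2, 1), ("ZIGZAG", 2, 2), ("PIDEQUL", 2, 3),
    ("ASSB64", 2, 4), ("PU", 2, 6), ("STDL", 2, 7), ("SODL", 2, 8),
    ("SNORKEL", 2, 9), ("ACTAMP", 2, 10),
]


def word_layout(fields: List[Tuple[str, int, int]],
                word: int, last_bit: int) -> List[str]:
    """Return one name per bit position 0..last_bit, '(BLANK)' where unused."""
    by_bit: Dict[int, str] = {bit: name for name, w, bit in fields if w == word}
    return [by_bit.get(bit, "(BLANK)") for bit in range(last_bit + 1)]


def render(fields: List[Tuple[str, int, int]], word: int, last_bit: int) -> str:
    """Format the layout as a two-column text listing."""
    lines = [f"WORD {word} (BIT FIELDS):", "BIT POSITION  FIELD NAME"]
    for bit, name in enumerate(word_layout(fields, word, last_bit)):
        lines.append(f"{bit:<14}{name}")
    return "\n".join(lines)


listing = render(BIT_FIELDS, word=2, last_bit=10)
```

Marking unoccupied positions explicitly is what lets a reader judge how tightly a word is packed.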
The menu system 16 provides the man-machine interface and acts as the binder for the other constituent functions. It may contain menu bars, pop-up menus, pop-up windows and context sensitive help. It also includes a pop-up list feature which provides for display of lists of data and for user selection from the displayed lists.
The present invention is most advantageous in that it provides a data structure extraction, conversion and display capability that eliminates the labor intensive process of manually searching for data structure information, interpreting the information, and sketching the architecture of the data storage provided in the operational computer. The present invention facilitates the processing of very large quantities of computer software and the filtering out of data not pertinent to the data structure analysis task. The present invention also provides rapid viewing of data structures, reducing the time required to perform analyses. Still further, the graphic display of the data structures enabled by the present invention provides visualization of many software architecture attributes (table packing efficiency, multiple data references, etc.). The present invention also avoids documentation errors because the data are extracted from the source code itself.
It is apparent that there has been provided in accordance with this invention a data structure extraction, conversion and display tool which fully satisfies the objects, means, and advantages set forth hereinbefore. While the invention has been described in combination with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art in light of the foregoing description. Accordingly, it is intended to embrace all such alternatives, modifications and variations as fall within the spirit and broad scope of the appended claims.
