Computer program analyzer for adapting computer programs to different architectures
5488714
Abstract
An extended mode analyzer (EMA) processes source code modules, detects suspicious instruction patterns and produces recommendations for code modification. The EMA applies knowledge-based technology to the problem of massive source code conversion. The knowledge base component within the EMA models any given source code module using a hierarchical class/attribute structure. All source lines occurring in a given module are partitioned into homogeneous classes characterized by function or instruction type. Higher level programming concepts are abstracted from lower level implementation details by drawing correspondences between class members which constitute instruction sequences related by common elements. When inferencing begins, the existence of class members meeting certain criteria triggers events which change the state of the world as seen by the knowledge base, in turn triggering other state changing events and so on until a state of equilibrium is achieved. The end result of this process is the body of recommendations produced by EMA for source code conversion.
Claims
I claim:
Description
BACKGROUND OF THE INVENTION
______________________________________
Sequence 1    SR   R1,R1
              ICM  R1,3,A
              SLL  R1,8
Sequence 2    L    R1,B
              SLL  R1,8
              SRL  R1,8
Sequence 3    SRL  R1,8
              STH  R1,C
Sequence 4    LH   R3,C
              STH  R3,D
______________________________________
In the following code segments, the lines marked S1, S2, S3, and S4 belong to sequences 1, 2, 3, and 4, respectively:
______________________________________
S1     XR   R1,R2        S2     L    R1,B
S1     ICM  R1,3,A       S1     XR   R1,R1
S2     L    R1,B         S1     ICM  R1,3,A
S1,S2  SLL  R1,8         S4     LH   R3,C
S4     LH   R3,C         S1,S2  SLL  R1,8
S2,S3  SRL  R1,8         S2,S3  SRL  R1,8
S4     STH  R3,D         S4     STH  R3,D
S3     STH  R1,C         S3     STH  R1,C
______________________________________
If the four sequences are interleaved in the manner shown on the left, the function accomplished by each sequence in its consecutive form is negated by intervening instructions. If, however, they are interleaved in the manner shown on the right, sequences 1 and 3 achieve their original functions, while sequences 2 and 4 do not. Rules in the knowledge base distinguish between occurrences of nonadjacent, overlapping sequences such as the ones on the left, which require no modification, and those on the right, for which code modifications are necessary.
On the System 80, certain data structures which are global to all OS/3 modules have traditionally been stored in a low area of memory such that a structure's address fits into just two bytes. Some of these structures are being relocated to higher memory on the new platform (i.e., their addresses are expanding from two to four bytes) so that half word addressing is no longer valid. Addresses of global data structures are equated to well known symbol names which are accessible to all OS/3 modules. Because the low address range of these structures is well known, it is common throughout OS/3 code to store these symbol values in two bytes of a register or into a local variable which is knowingly treated by the programmer as a half word. All occurrences of half word addressing in OS/3 code must be identified and changed.
In some modules, a relocated structure is accessed by its symbol name, providing a way to trace the propagation of its value to registers and local variables. Any instruction manipulating such a register or variable can then be examined for addressing violations. In other modules, a relocated structure is accessed through a register which has been loaded with the structure address by a previously executed module. When values are passed in this way, the symbol name never appears in the code, making it impossible to detect addressing violations through symbol propagation.
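To see why half word addressing breaks once a structure moves to higher memory, the following minimal Python sketch truncates an address to its low-order two bytes, as a half word store would. The addresses are made-up values chosen for illustration, not actual OS/3 locations, and the function name is hypothetical.

```python
HALF_WORD_MASK = 0xFFFF  # a half word holds two bytes (16 bits)


def store_as_half_word(address: int) -> int:
    """Keep only the low-order two bytes, as a half word store would."""
    return address & HALF_WORD_MASK


# Hypothetical addresses, used only for illustration.
low_address = 0x00001F40   # fits in two bytes
high_address = 0x00121F40  # needs more than two bytes after relocation

# True when the address survives a round trip through a half word.
low_ok = store_as_half_word(low_address) == low_address
high_ok = store_as_half_word(high_address) == high_address
```

In this sketch the low address survives the round trip while the relocated one loses its high-order bytes, which is the kind of silent truncation the analyzer's rules are meant to flag.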
In this case, knowledge base rules sensitive to half word data manipulation use the same types of contextual clues mentioned above to distinguish between the manipulation of address versus non-address data. These rules generate recommendations for code conversion for instructions which appear to manipulate half word address data.
The main components in the OS/3 knowledge base 14 are class structures and rules. Collectively, the class structures provide a symbolic framework in which any given source code module can be represented. Knowledge base rules (technically called demons) embody the knowledge used to identify source lines which must be modified for the extended address platform. The class structures defined in the knowledge base remain constant from one execution to the next, while the members of a class vary with each OS/3 module analyzed.
All source lines in a module are treated as members of the class "CODE", which has an associated set of attributes representing all characteristics of a single OS/3 source line. Every member of the class takes on a unique value for each of the attributes, much like every record in a data base takes on its own value for each field associated with a data base table. As shown in FIG. 3, which illustrates in more detail the symbolic source information 13, CODE 30 is a parent class to fifteen subclasses 32 which inherit its attributes, collectively forming a two-level class/attribute hierarchy. Every source line in the current module (other than comments and certain assembler directives) falls into exactly one of the CODE subclasses 32, thereby partitioning the set of all source lines into homogeneous classes characterized by function or instruction type.
All symbols (tags) referenced in the given module are treated as members of the class SYMBOLS 34, with the exception of proc names which belong to the PROC_SYMBOLS class 36.
Both SYMBOLS and PROC_SYMBOLS have associated attribute sets representing characteristics of the individual members in each class.
Class members in the OS/3 knowledge base are logically connected by relation attributes associated with the class CODE. A relation attribute is a pointer to a member of a specified class, through which attribute values of the related member can be reached. Some relation attributes of the CODE class point to members of CODE itself, while others point to members of either SYMBOLS or PROC_SYMBOLS.
FIG. 3 illustrates the design of the class/attribute hierarchies in the OS/3 knowledge base, although not all subclasses of CODE are shown. Relationships between example members in each class are depicted by arrows drawn from one member to another. An example of a CODE attribute which points to another member of CODE is the relation attribute "next_line" 37. For each member of CODE (with the exception of the last line in the module), the attribute "next_line" points to the next significant line of code occurring in the given module. An example of a CODE attribute which points to a member of the SYMBOLS class is the relation attribute "symbol" 38. For each source line whose first operand is a symbol, "symbol" points to the corresponding member of the SYMBOLS class. The path established by "symbol" allows the attribute values of the related SYMBOLS member to be retrieved or modified.
Once they are populated with members and attribute values, the class structures in the knowledge base form a repository for every implementation detail involved in a particular source code module. The information stored in these structures and the connectivity established between their members form a rich framework from which higher level programming concepts can be abstracted.
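The class/attribute hierarchy and its relation attributes can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the subclass names `LoadLine` and `StoreLine` and the symbol `GLOBTAB` are hypothetical, and only a handful of attributes are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Symbol:
    """A member of the SYMBOLS class (only two attributes modeled)."""
    name: str
    stat: Optional[str] = None


@dataclass
class Code:
    """Parent class: attributes declared here are inherited by subclasses."""
    num: int                             # position among significant lines
    name: str                            # opcode name
    symbol: Optional[Symbol] = None      # relation attribute into SYMBOLS
    next_line: Optional["Code"] = None   # relation attribute into CODE


class LoadLine(Code):
    """Hypothetical CODE subclass; adds no attributes of its own."""


class StoreLine(Code):
    """Hypothetical CODE subclass; adds no attributes of its own."""


# Hypothetical two-line module.
glob = Symbol("GLOBTAB")
first = LoadLine(num=1, name="LH", symbol=glob)
second = StoreLine(num=2, name="STH")
first.next_line = second

# Following the "symbol" relation reaches, and can modify, the SYMBOLS member.
first.symbol.stat = "SL"
```

Because `symbol` holds a reference rather than a copy, the status change made through `first` is visible on `glob` itself, which mirrors how a relation attribute lets a rule retrieve or modify the related member.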
Each module analyzed by the EMA is first processed by the parser 12, which converts the contents of a source code module into a form recognizable by the OS/3 knowledge base 14. The output of the parser 12 is a communications file consisting of statements asserting the existence of class members (i.e., source code lines, symbols and proc symbols) and their associated attribute values. Each time the OS/3 knowledge base 14 is invoked to analyze a particular module, the communications file associated with that module is accessed and its contents are read.
Demons are triggered by information contained in the communications file. Each demon in the knowledge base is designed to detect a unique instruction pattern, some consisting of single source lines and others consisting of multiple source lines related by common elements. In the process of reading class and attribute assertions from the communications file, a demon is activated by any member of CODE (i.e., any source line) which qualifies as its target line. Demons searching for single line patterns have only one possible target line. Demons which search for multiple line patterns use the line which is the most probable indicator of the pattern's existence as a target line.
When a demon identifies its target line, related demons are triggered to locate other source lines which complete the instruction pattern being sought. This chain reaction continues until the entire pattern is found or a terminating state is reached. Terminating states can be caused by conditions such as interfering instructions which negate the function of the pattern being sought or a scan which exceeds the maximum number of lines over which a pattern can be reasonably expected to occur. If a pattern is found, recommendations for code conversion are assigned as attribute values of the source lines involved. The cycle of target line identification and associated pattern line search continues until the entire communications file has been read.
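The flow from communications file to demon activation can be sketched roughly as follows. This is a simplified illustration, not the KES mechanism itself: the record layout, the class names `LOAD`, `SHIFT` and `STORE`, and the demon shown are all assumptions made for the example.

```python
# Hypothetical assertion records standing in for a communications file.
assertions = [
    {"cls": "LOAD", "num": 1, "name": "L", "r1": "R1"},
    {"cls": "SHIFT", "num": 2, "name": "SLL", "r1": "R1"},
    {"cls": "STORE", "num": 3, "name": "STH", "r1": "R1"},
]


def half_word_store_demon(line, recommendations):
    """Hypothetical single line demon: flags every half word store."""
    if line["name"] == "STH":
        recommendations.append((line["num"], "check half word store"))


# Each demon is associated with the class of its target line.
demons = {"STORE": [half_word_store_demon]}


def analyze(records, demons_by_class):
    """Read assertions in order and activate demons on matching members."""
    recommendations = []
    for line in records:
        for demon in demons_by_class.get(line["cls"], []):
            demon(line, recommendations)
    return recommendations


recommendations = analyze(assertions, demons)
```

Only the `STORE` member activates a demon here; members of classes with no associated demon pass through without effect.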
The following description provides details of the implementation of class structures in the OS/3 knowledge base 14. Each class and its associated attributes are defined.
Certain information items pertaining to the current module are stored in global attributes defined in the OS/3 knowledge base. Global attributes are not associated with any one class, but act as value holders accessible throughout the entire knowledge base.
file name: Prefix of OS/3 source module to be analyzed (e.g., "TV$MIMM"). All related filenames are formed by appending extensions to this name.
debug: Determines if messages are generated for the terminal and log_file. (See log_file explanation below.)
module_name: Path name of current OS/3 source module relative to the OS3EMA directory. If the module named "TV$MIMM.OS3" is in the directory "OS3EMA/modules", module_name would have the value "OS3EMA/modules/TV$MIMM.OS3".
parser_version: Version of parser used to create communications file.
opcode_version: Version of opcode table used by parser to determine which class an OS/3 opcode belongs to when creating the communications file.
log_file: This attribute is set to the name of the Analyzer Trace file output by the Analyzer. If debug is set to true, a message is generated each time a demon fires. The log_file contains a list of all such messages generated during a single module analysis.
output_file: Contains all source line recommendations generated during a single module analysis. Once analysis is complete, the Report Formatter merges this file with the xxx.jnl file to produce the final output file, xxx.prn.
atl_file: This attribute is set to the name of the Analyzer Cross Reference file output by the Analyzer and used by the Report Formatter to produce the final EMA output.
status_file: This is an empty file created to indicate successful completion of a single module analysis.
version: Analyzer version; appears in the output file.
max: Limit on the number of significant lines to be scanned for pattern components.
opcodes: List of opcodes found in the last max lines of code read from the communications file.
opcode_num: Number of opcode names stored in opcodes. Used in updating the list.
tab, any_message, any_rec, string, double: All are used for formatting purposes.
CODE is the parent class of the fifteen subclasses to which all source code lines belong. Attributes declared at the CODE level are inherited by each of its subclasses. The attributes of the CODE class are as follows:
psym: Proc symbol used in a proc call.
sym1,(sym2): Symbol used in first (second) address expression OR first (second) symbol argument to a proc call. The context of the instruction determines the way in which sym1 (sym2) is used.
sym3: Third symbol argument to a proc call (if one exists).
r1,(r2): First (second) register referenced by this instruction.
op1,(op2): Operator used in first (second) address expression. The operator type is defined as:
PLUS = +
MINUS = -
MULT = *
ERR = Error (expression was not evaluated by parser)
COMPLEX = Complex expression, not fully evaluated. In this case, the parser passes along the first symbol used in the expression as sym1 (sym2) and sets the value of op1 (op2) to COMPLEX to indicate an incomplete evaluation.
dis1,(dis2): Displacement used in first (second) address expression. Note that in cases where a symbol is equated to an integer value and used as the displacement, dis1 (dis2) is undetermined but the displacement value is stored in the initv attribute of the symbol pointed to by sym1 (sym2).
b1,(b2): First (second) base register referenced by this instruction or third (fourth) register referenced by a USING statement.
m: Mask value used in this instruction. For branch instructions, the mask value is the normal OS/3 condition code mask (e.g., m8 for BE).
The mask has special values for specialized types of OS/3 branch instructions:
BCT, BCTR    m16
BC, BCR      m17
BXH, BXLE    m18
SVC          m19
x: Index value used in this instruction or fifth register referenced by a USING statement.
i: Immediate storage value used in this instruction.
len1,(len2): First (second) length value used in this instruction.
prev: Previous significant source code line occurring in module.
temp_prev: Temporary previous source line. Initially, temp_prev is set to prev, but it is constantly reassigned in the process of a backward search. When the search terminates (successfully or not), temp_prev is set back to prev in preparation for the next backward search.
num: Position of this instruction relative to consecutively numbered significant source code lines in the current module.
name: Literal name of opcode used in this instruction.
next: Next significant source code line occurring in module.
temp_next: Same definition as temp_prev, but for the next instead of the previous significant line.
prev(next)_line: Previous (next) pattern line relative to this instruction; used only for a three line pattern search. When the first two lines of a three line pattern are found as the target lines a and b, one of those lines is used as the target for the second part of the search (say line b). Line a needs to be "remembered" by line b before starting the search for line c, so line a is stored in the prev_line attribute of line b.
bstat: Status of a backward search. When searching back from a target line, bstat is set to "changed" if the search terminates successfully (i.e., a valid pattern line is found), or "stopped" if the search terminates unsuccessfully (i.e., the maximum number of lines have been scanned without finding a valid pattern line, an absolute branch is encountered, or an instruction altering the register referenced by the target line is found).
fstat: Same definition as bstat, but for forward searching.
rule: List of rules which apply to this line. Used as a flag to prevent further searching from a target line which has already been identified as part of a valid pattern.
output: Text string containing all recommendations determined for this line.
Each significant line of source code in the current module becomes a member of one of the CODE subclasses. Non-significant lines are comments and certain assembler directives. Certain CODE subclasses include source lines with different opcodes (mixed opcode classes), while others include only source lines with the same opcode (single opcode classes). Each subclass includes only those lines which are potential target lines of a specific pattern. The exceptions are DIR (the class of all compiler directives) and MISC_CLASS, a catch-all class for those instructions which do not occur as target lines of any pattern. All CODE subclasses inherit the attributes of the parent class CODE. There are no new attributes at the subclass level.
Members of the SYMBOLS class are all symbols referenced in the current module with the exception of proc names which belong to the PROC_SYMBOLS class. Since the knowledge base 14 does not allow special characters in member names, the parser 12 generates member names from the OS/3 symbol name using upper case for all alphabetic characters and the following translation for special characters:
______________________________________
*  →  s
@  →  a
$  →  d
?  →  q
#  →  p
&  →  m
______________________________________
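The translation in the table above can be expressed as a short Python function. This is a minimal sketch: the function name `member_name` is hypothetical, and it assumes that alphabetic characters are upper-cased before the special characters are replaced by their lower case stand-ins.

```python
# Special character translation, taken from the table above.
SPECIAL_CHARS = str.maketrans(
    {"*": "s", "@": "a", "$": "d", "?": "q", "#": "p", "&": "m"}
)


def member_name(symbol: str) -> str:
    """Build a knowledge base member name from an OS/3 symbol name."""
    # Upper-case first so the lower case stand-ins remain distinguishable.
    return symbol.upper().translate(SPECIAL_CHARS)
```

Using lower case letters only for the stand-ins keeps a translated character distinguishable from a genuine alphabetic character in the original symbol.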
For example, OS/3 symbols SB$MHI and TO@DEXP translate to member names SBdMHI and TOaDEXP, respectively. Member names are strictly internal to the OS/3 knowledge base 14 and are seen by the user only if he or she wishes to look at the communications file (xxx.kcm). Attributes of the SYMBOLS class are as follows:
name: Actual symbol name used in the OS/3 code.
initv: Initial value of LTORG, DS, DC, or EQU symbols. All numeric values are resolved to decimal.
decl: Line number on which LTORG, DC, DS, or EQU symbol is declared.
stat: Symbol status; can take on one of the following values or remain undetermined if none apply:
CG# → Changed Global (# defined in SYMBOL CODES document)
AL1 → Symbol declared with address length one.
AL3 → Symbol declared with address length three.
SL → Suspect Local.
ref: List of line numbers on which a symbol is referenced. This list is accumulated in the actions section and appears in the Analyzer Trace file (xxx.alt).
ref_cnt: Number of lines on which a symbol is referenced; used to format the output.
Members of the PROC_SYMBOLS class are all the proc names used in the current module. Member names are handled as for symbols in the SYMBOLS class. The attributes of PROC_SYMBOLS are as follows:
name: Literal proc name used in the OS/3 code.
stat: Passed in as "changed proc" if symbol appears on list of changed procs; otherwise, remains undetermined.
The following description provides details of the implementation of demons in the OS/3 knowledge base.
Demons in the OS/3 knowledge base have the following format:
DEMON NAME: [variable declaration statement]
    associates this demon with a class and declares a variable which is used to access the associated class members' attribute values
WHEN [guard]
    conditions involving attributes of some member of the associated class which must be satisfied in order for the demon body to execute
THEN [body]
    commands executed when the guard evaluates as true
ENDWHEN
DEMON NAME is an arbitrary name which uniquely identifies each demon in the knowledge base. The variable declaration statement following the DEMON NAME associates a demon with a specific knowledge base class. WHEN, THEN and ENDWHEN are required KES keywords which break the demon into its guard and body sections.
A demon acts as a process which is invoked for every member of its associated class as soon as that member's attributes are known, provided the attributes satisfy the conditions imposed in the demon guard. A demon is reinvoked on the same class member whenever that member is reassigned new attribute values which satisfy the guard conditions. Since every demon in the OS/3 knowledge base is designed to detect the target line of a particular instruction pattern, the class to which a demon is associated is always the subclass of CODE to which that target line belongs. Hence, the class members evaluated in the demon guard are only those instructions which are potential target lines of the pattern that demon detects. Conversely, the members of each CODE subclass correspond to the group of instructions targeted by a specific demon.
Demons in the OS/3 knowledge base are conceptually grouped into categories based on the number of lines in the pattern detected (one, two, or three), the direction of search for the next line of a multiple line pattern, and in the case of three line patterns, the search step which the demon handles (i.e., step one--have first line, look for second; step two--have second line, look for third).
Based on these criteria, the following demon categories emerge:
1) One line pattern--no search
2) Two line pattern--backward search
3) Two line pattern--forward search
4) Three line pattern--backward search--step one
5) Three line pattern--forward search--step one
6) Three line pattern--backward search--step two
7) Three line pattern--forward search--step two
All demons in the same category perform similar functions and hence share the same general structure. The structure of demons in each of the seven categories is outlined below in pseudocode and illustrated in the corresponding flow charts shown in FIGS. 4 to 10. The following naming conventions are used in all pseudocode and flow chart examples:
LINE refers to the current source line under examination whose attributes are tested in the demon guard. This line is a member of the demon's associated class and hence, a potential target line for the pattern being sought.
TRUE PREVIOUS refers to the source line immediately preceding LINE in the current module. TRUE PREVIOUS is an attribute of LINE.
PREVIOUS refers to a source line occurring somewhere above LINE in the current module. This line is initially set to TRUE PREVIOUS, but is continually reset in the process of a backward search. PREVIOUS is an attribute of LINE.
TRUE NEXT refers to the source line immediately following LINE in the current module. TRUE NEXT is an attribute of LINE.
NEXT refers to a source line occurring somewhere below LINE in the current module. This line is initially set to TRUE NEXT, but is continually reset in the process of a forward search. NEXT is an attribute of LINE.
The pseudocode for the general structure of a demon which identifies a one line pattern is as follows:
______________________________________
ONE LINE PATTERN:
WHEN
    all attributes of LINE are known
THEN
    if LINE is a valid pattern then
        if invalid address symbols found then
            tag the symbols "suspect"
        endif
        recommend code changes for this LINE
    endif
ENDWHEN
______________________________________
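The one line pattern structure above might look like this in Python. It is a sketch under assumptions: the pattern test (an `LH` instruction), the relocated symbol `GLOBTAB`, and the recommendation text are hypothetical placeholders, and only two attributes of LINE are modeled.

```python
from dataclasses import dataclass, field
from typing import Optional

RELOCATED = {"GLOBTAB"}  # hypothetical name of a relocated global structure


@dataclass
class Line:
    name: Optional[str] = None   # opcode name
    sym1: Optional[str] = None   # symbol used in the first address expression
    output: list = field(default_factory=list)

    def all_known(self) -> bool:
        return self.name is not None and self.sym1 is not None


def one_line_demon(line: Line, symbol_status: dict) -> None:
    # WHEN guard: all attributes of LINE are known.
    if not line.all_known():
        return
    # Hypothetical pattern test: a half word load.
    if line.name == "LH":
        if line.sym1 in RELOCATED:
            symbol_status[line.sym1] = "suspect"
        line.output.append("review half word load")


status = {}
target = Line(name="LH", sym1="GLOBTAB")
one_line_demon(target, status)

incomplete = Line(name="LH")  # sym1 not yet known, so the guard fails
one_line_demon(incomplete, status)
```

The early return plays the role of the guard: a line whose attributes are not all known never reaches the body.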
A demon which identifies a one line pattern is invoked for each member of its associated class at the moment all attributes of that member are known. The flow chart of FIG. 4 shows the process implemented by this pseudocode. The process begins at decision block 40 which determines if all attributes of LINE are known. If they are, a test is then made in decision block 41 to determine if LINE is a valid pattern. If either of the tests made in decision blocks 40 or 41 should fail, the process returns. If both are true, a test is made in decision block 42 to determine if LINE contains symbols used as invalid addresses. If this test fails, control goes directly to function block 44. If the test succeeds, the symbols are tagged as "suspect" in function block 43 before code changes are recommended in function block 44. The pseudocode for the general structure of a demon which identifies a two line pattern using a backward search is as follows:
______________________________________
TWO LINE PATTERN - BACKWARD SEARCH:
WHEN
    all attributes of LINE are known and
    PREVIOUS line exists
THEN
    if LINE is not tagged with this pattern and
       LINE and PREVIOUS are a valid pattern then
        if invalid address symbols found then
            tag the symbols "suspect"
        endif
        recommend code changes for one or both lines
        tag LINE with this pattern
        set back search for LINE to success
        set PREVIOUS line to TRUE PREVIOUS line
    else
        if back search for LINE is not set then
            if PREVIOUS negates this pattern then
                set back search for LINE to failure
                set PREVIOUS line to TRUE PREVIOUS line
            else
                set PREVIOUS back one line
            endif
        endif
    endif
ENDWHEN
______________________________________
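The repeated reinvocation of a backward searching demon can be approximated by a loop. The Python sketch below is a simplification under assumptions: a pattern is "valid" when PREVIOUS has a given opcode and uses the same register as LINE, a pattern is "negated" by an opcode from a caller-supplied set that alters that register or by exceeding a hypothetical scan limit, and the pattern name is made up.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_LINES = 10  # hypothetical limit on lines scanned per search


@dataclass
class Line:
    name: str                     # opcode name
    r1: str                       # first register referenced
    rule: list = field(default_factory=list)
    output: list = field(default_factory=list)
    bstat: Optional[str] = None   # "changed" on success, "stopped" on failure


def backward_search(lines, target_idx, first_op, pattern, altering_ops):
    """Move PREVIOUS back one line at a time until the search terminates."""
    target = lines[target_idx]
    prev_idx = target_idx - 1     # PREVIOUS starts at TRUE PREVIOUS
    scanned = 0
    while prev_idx >= 0 and target.bstat is None:
        prev = lines[prev_idx]
        scanned += 1
        same_register = prev.r1 == target.r1
        if pattern not in target.rule and prev.name == first_op and same_register:
            target.output.append(f"{pattern}: review with line {prev_idx + 1}")
            target.rule.append(pattern)
            target.bstat = "changed"
        elif (prev.name in altering_ops and same_register) or scanned >= MAX_LINES:
            target.bstat = "stopped"
        else:
            prev_idx -= 1         # set PREVIOUS back one line
    return target.bstat


ALTERING = {"L", "LH", "SR", "XR", "ICM"}

# Nonadjacent pattern: an unrelated store sits between SRL and STH.
found = [Line("SRL", "R1"), Line("STH", "R3"), Line("STH", "R1")]
found_status = backward_search(found, 2, "SRL", "SRL-STH", ALTERING)

# Negated pattern: R1 is reloaded between SRL and STH.
negated = [Line("SRL", "R1"), Line("L", "R1"), Line("STH", "R1")]
negated_status = backward_search(negated, 2, "SRL", "SRL-STH", ALTERING)
```

If the search runs off the top of the module without a match or a negating line, `bstat` simply stays unset in this sketch.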
A demon which identifies a two line pattern with a backward search is invoked for each member of its associated class when all attributes of that member are known and its PREVIOUS line exists. The demon is reinvoked on a member each time its PREVIOUS line is reset. In effect, the demon is repeatedly called for the same target line with a new value for PREVIOUS until the first line of the pattern is found or a condition exists which terminates the pattern search. The flow chart of FIG. 5 shows the process implemented by the pseudocode for a two line pattern using a backward search. The process begins at decision block 46 which determines if all attributes of LINE are known. If they are, a test is then made in decision block 47 to determine if the PREVIOUS line exists. If either of the tests made in decision blocks 46 or 47 should fail, the process returns. If both tests succeed, a test is made in decision block 48 to determine if LINE is already tagged with this pattern. If LINE is not tagged with this pattern, a further test is made in decision block 49 to determine if LINE and PREVIOUS form a valid pattern. If a valid pattern is found, a test is made in decision block 50 to determine if either line contains symbols used as invalid addresses. If invalid address symbols are not found, control goes directly to function block 52. If invalid address symbols are found, the symbols are tagged as "suspect" in function block 51 before recommending code changes for one or both pattern lines in function block 52. Next in function block 53, LINE is tagged with the current pattern, and in function block 54, back search for LINE is set to success. Finally, in function block 55, PREVIOUS line is set to the TRUE PREVIOUS line and the process returns. 
Returning to decision blocks 48 and 49, if LINE has already been tagged with this pattern or LINE and PREVIOUS line do not form a valid pattern, a test is made in decision block 56 to determine if back search for this LINE has already been set. If it has been set (indicating that a backward search from this LINE terminated in either a successful or unsuccessful state), the process returns. If back search has not been set, a further test is made in decision block 57 to determine if the PREVIOUS line negates the pattern. If the PREVIOUS line does negate the pattern, back search is set to failure in function block 58 before control goes to function block 55. If the PREVIOUS line does not negate the pattern, PREVIOUS is set back one line in function block 59 and the process returns. The pseudocode for the general structure of the demon pair which identifies a two line pattern using a forward search is as follows:
______________________________________
TWO LINE PATTERN - FORWARD SEARCH
IDENTIFICATION PART:
WHEN
    all attributes of LINE are known and
    NEXT line exists
THEN
    if LINE is not tagged with this pattern and
       LINE and NEXT are a valid pattern then
        if invalid address symbols found then
            tag the symbols "suspect"
        endif
        recommend code changes for one or both lines
        tag LINE with this pattern
        set forward search for LINE to success
        set NEXT to TRUE NEXT
    endif
ENDWHEN
SEARCH PART:
WHEN
    all attributes of LINE are known and
    NEXT line exists and
    line after NEXT exists
THEN
    if forward search for LINE is not set then
        if LINE is a valid target line then
            if NEXT line negates pattern then
                set forward search to failure
                set NEXT line to TRUE NEXT line
            else
                set NEXT ahead one line
            endif
        endif
    endif
ENDWHEN
______________________________________
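The cooperation between the identification part and the search part can be sketched in Python with two functions and a driver loop standing in for reinvocation. This is an illustrative simplification: the validity and negation tests compare only opcode and register, the "valid target line" check is omitted, and all opcode sets and pattern names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Line:
    name: str                     # opcode name
    r1: str                       # first register referenced
    rule: list = field(default_factory=list)
    fstat: Optional[str] = None   # "changed" on success, "stopped" on failure


def identification_demon(lines, t, nxt, second_op, pattern) -> bool:
    """Needs only LINE and NEXT; returns True when the pattern is found."""
    if nxt >= len(lines):         # guard: NEXT line exists
        return False
    target, cand = lines[t], lines[nxt]
    if pattern not in target.rule and cand.name == second_op and cand.r1 == target.r1:
        target.rule.append(pattern)
        target.fstat = "changed"
        return True
    return False


def search_demon(lines, t, nxt, negating_ops) -> Optional[int]:
    """Needs the line after NEXT; returns the new NEXT index or None."""
    target = lines[t]
    if nxt + 1 >= len(lines) or target.fstat is not None:
        return None
    cand = lines[nxt]
    if cand.name in negating_ops and cand.r1 == target.r1:
        target.fstat = "stopped"
        return None
    return nxt + 1                # set NEXT ahead one line


def forward_search(lines, t, second_op, pattern, negating_ops):
    nxt: Optional[int] = t + 1    # NEXT starts at TRUE NEXT
    while nxt is not None:
        if identification_demon(lines, t, nxt, second_op, pattern):
            break
        nxt = search_demon(lines, t, nxt, negating_ops)
    return lines[t].fstat


NEGATING = {"L", "LH", "SR", "XR", "ICM"}

found = [Line("SLL", "R1"), Line("LH", "R3"), Line("SRL", "R1")]
found_status = forward_search(found, 0, "SRL", "SLL-SRL", NEGATING)

negated = [Line("SLL", "R1"), Line("L", "R1"), Line("SRL", "R1")]
negated_status = forward_search(negated, 0, "SRL", "SLL-SRL", NEGATING)
```

The identification function never looks past NEXT, while the search function refuses to run unless the line after NEXT exists, matching the different guards of the two parts.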
Patterns requiring a forward search are handled separately by two related demons, one for pattern identification and one for pattern search. The tasks are split for the sake of performance because only LINE (the target line) and NEXT are needed to identify a valid pattern, while LINE, NEXT, and the line after NEXT are needed for a pattern search. The two tasks are handled by a single demon in the backward searching case because the line before PREVIOUS is already known whenever LINE and PREVIOUS are known. Together, each pattern identification and pattern search pair reproduces the structure of a single demon in the two line backward search category. The guard of the pattern identification demon checks for all LINE attributes and the existence of NEXT, while the guard of the pattern search demon checks for the line after NEXT as well to accommodate the forward searching case. The pattern identification demon corresponds to the "WHEN--THEN--IF" part of a two line backward searching demon, while the pattern search demon corresponds to the "WHEN--THEN--ELSE" part. If LINE and NEXT constitute a valid pattern, the body of the pattern identification demon is executed. If they do not constitute a valid pattern and the line after NEXT exists, the body of the pattern search demon is executed. The pattern search demon is reinvoked on the same target line each time NEXT is reset until NEXT is pointing to the second line of the pattern or a condition exists which terminates the search. When NEXT points to the second pattern line, the pattern identification demon executes and the cycle terminates. The two demons effectively produce the same repetitive behavior in the forward direction that is produced by one demon in the backward searching case. The flow chart of FIG. 6 shows the processes implemented by the pseudocode for a two line pattern using a forward search. The process begins with the identification part at decision block 60 which determines if all attributes of LINE are known.
If they are, a test is then made in decision block 61 to determine if the NEXT line exists. If NEXT does exist, a test is made in decision block 62 to determine if LINE is already tagged with this pattern. If LINE is not tagged with this pattern, a further test is made in decision block 63 to determine if LINE and NEXT form a valid pattern. If any of the tests made in decision blocks 60, 61 or 63 should fail or if the test made in decision block 62 succeeds, the process returns. If LINE and NEXT do form a valid pattern, a test is made in decision block 64 to determine if either line contains symbols used as invalid addresses. If invalid address symbols are not found, control goes directly to function block 66. If invalid address symbols are found, the symbols are tagged as "suspect" in function block 65 before recommending code changes for one or both pattern lines in function block 66. Next in function block 67, LINE is tagged with the current pattern, and in function block 68, forward search for LINE is set to success. Finally, in function block 69, NEXT is set to the TRUE NEXT line and the process returns. The search part begins at decision block 70 which determines if all attributes of LINE are known. If they are, a test is made in decision block 71 to determine if the NEXT line exists. If NEXT does exist, a test is made in decision block 72 to determine if the line after NEXT exists. If this test succeeds, a further test is made in decision block 73 to determine if forward search for this LINE has already been set. If forward search has not been set, a test is made in decision block 74 to determine if LINE is a valid target line for this pattern. If any of the tests made in decision blocks 70, 71, 72 or 74 should fail or if the test in decision block 73 succeeds (indicating that a forward search from this LINE terminated in either a successful or unsuccessful state), the process returns. 
Otherwise, a test is made in decision block 75 to determine if the NEXT line negates the pattern. If NEXT does negate the pattern, forward search for this LINE is set to failure in function block 76 and NEXT line is set to the TRUE NEXT line in function block 78 before the process returns. If the NEXT line does not negate the pattern, NEXT is set ahead one line in function block 77 and the process returns. Patterns consisting of three instruction lines are handled by related demons which cooperate to accomplish the two steps needed to find the entire pattern. In step one, the pattern's primary target line is located and a search is conducted (either backward or forward) for the next pattern line. The primary target line (the target line used in step one which is the most probable indicator of the pattern's existence) can be the first, second or third line in the pattern. The position of the target line determines the direction of search for the next pattern line. If a backward search is required, step one is handled by a single demon which is similar in structure to demons in the two line backward search category. If a forward search is required, step one is handled by a demon pair which is similar in structure to demon pairs in the two line forward search category. In either case, if the demon or demon pair handling step one is successful in finding the next pattern line, one of the lines involved in step one is modified to qualify it as the secondary target line (the target line used in step two). In step two, the secondary target line is located and a search ensues (either backward or forward) for the last pattern line. As in step one, a backward search is handled by a single demon and a forward search is handled by a demon pair. 
Patterns which use the same line as the primary and secondary target lines use different search directions in step one and step two, while patterns using different lines as the primary and secondary target lines use the same search direction in step one and step two, as illustrated below.

Group A: Same target line, different search directions ##STR1##

Group B: Different target lines, same search direction ##STR2##

There are minor differences in the way secondary target lines are handled by demons in Group A and those in Group B. For the sake of brevity, pseudocode and flow chart examples are given only for step one and step two demons handling three line patterns of the type in Group B, where the primary target line is different from the secondary target line. The pseudocode for the general structure of a demon which handles step one of a three line pattern identification using a backward search is as follows:
______________________________________
THREE LINE PATTERN - BACKWARD SEARCH - STEP ONE:
WHEN
    all attributes of LINE are known and
    PREVIOUS line exists
THEN
    if LINE is not tagged with this pattern and
       LINE and PREVIOUS are a valid pattern then
        if invalid address symbols found then
            tag the symbols "suspect"
        endif
        recommend code changes for one or both lines
        tag LINE and PREVIOUS with this pattern
        set back search for LINE to success
        set "next pattern line" of PREVIOUS to LINE
        set PREVIOUS to TRUE PREVIOUS
    else
        if back search for LINE is not set then
            if PREVIOUS negates this pattern then
                set back search for LINE to failure
                set PREVIOUS to TRUE PREVIOUS
            else
                set PREVIOUS back one line
            endif
        endif
    endif
ENDWHEN
______________________________________
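As an illustration only, the step one backward search demon can be sketched in Python and fired repeatedly until no attribute changes. The `Line` record, the pattern name, the toy rules `is_valid_pair` and `negates`, and the `run` driver are all hypothetical names introduced for this sketch; they are not part of the EMA knowledge base. The "all attributes known" guard is assumed to be always true, and the tagging of suspect symbols is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

PATTERN = "SRL-STH"  # hypothetical pattern name


@dataclass
class Line:
    text: str
    true_previous: Optional[int]         # TRUE PREVIOUS: line physically above
    previous: Optional[int] = None       # PREVIOUS: backward search cursor
    tags: Set[str] = field(default_factory=set)
    back_search: Optional[str] = None    # None, "success" or "failure"
    next_pattern_line: Optional[int] = None


def is_valid_pair(line: Line, prev: Line) -> bool:
    # Toy rule (assumption): halfword store of R1 preceded by a shift of R1.
    return line.text.startswith("STH R1") and prev.text.startswith("SRL R1")


def negates(prev: Line) -> bool:
    # Toy rule (assumption): reloading R1 destroys the tracked value.
    return prev.text.startswith(("L R1", "LH R1"))


def step_one_backward(lines: List[Line], i: int, recs: List[str]) -> bool:
    """Fire the demon once for line i; return True if any state changed."""
    line = lines[i]
    if line.previous is None:            # guard: PREVIOUS line exists
        return False
    prev = lines[line.previous]
    if PATTERN not in line.tags and is_valid_pair(line, prev):
        recs.append(f"review lines {line.previous} and {i}")
        line.tags.add(PATTERN)
        prev.tags.add(PATTERN)           # qualifies PREVIOUS for step two
        line.back_search = "success"
        prev.next_pattern_line = i       # link used later by step two
        line.previous = line.true_previous
        return True
    if line.back_search is None:
        if negates(prev):
            line.back_search = "failure"
            line.previous = line.true_previous
        else:
            line.previous = prev.true_previous   # move back one line
        return True
    return False


def run(source: List[str]) -> Tuple[List[Line], List[str]]:
    lines = [Line(t, i - 1 if i else None, i - 1 if i else None)
             for i, t in enumerate(source)]
    recs: List[str] = []
    changed = True
    while changed:                       # iterate until equilibrium
        changed = False
        for i in range(len(lines)):
            changed = step_one_backward(lines, i, recs) or changed
    return lines, recs
```

Calling `run(["SRL R1,8", "LH R3,C", "STH R1,C"])` lets the store line skip the unrelated `LH R3,C` and pair with the shift, while replacing the middle line with `L R1,B` ends the search in failure.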
The structure of this demon is essentially the same as a demon in the two line pattern backward search category with two additional functions performed in the demon body. If a valid pattern is found, both PREVIOUS and LINE are tagged with this pattern (rather than just LINE), and the "next pattern line" attribute of PREVIOUS is set to LINE. Tagging PREVIOUS with this pattern qualifies PREVIOUS as the secondary target line for this pattern. Setting the "next pattern line" attribute of PREVIOUS to LINE establishes a connection between these two lines so that attributes of LINE can be accessed and possibly modified by the demon handling step two for this pattern. FIG. 7 is a flow chart of the process implemented by the pseudocode above. A comparison of this flow chart with that of FIG. 5 will demonstrate the similarity between a demon which handles step one of a three line pattern backward search, and one which handles a two line pattern backward search. Because of the similarity, no further discussion will be made of FIG. 7. The pseudocode for the general structure of the demon pair which handles step one of a three line pattern identification using a forward search is as follows:
______________________________________
THREE LINE PATTERN - FORWARD SEARCH - STEP ONE
IDENTIFICATION PART:
WHEN
    all attributes of LINE are known and
    NEXT line exists
THEN
    if LINE is not tagged with this pattern and
       LINE and NEXT are a valid pattern then
        if invalid address symbols found then
            tag the symbols "suspect"
        endif
        recommend code changes for one or both lines
        tag LINE and NEXT with this pattern
        set forward search for LINE to success
        set "previous pattern line" of NEXT to LINE
        set NEXT line to TRUE NEXT line
    endif
ENDWHEN
SEARCH PART:
WHEN
    all attributes of LINE are known and
    NEXT line exists and
    line after NEXT exists
THEN
    if forward search for LINE is not set then
        if LINE is a valid target line then
            if NEXT line negates pattern then
                set forward search to failure
                set NEXT line to TRUE NEXT line
            else
                set NEXT ahead one line
            endif
        endif
    endif
ENDWHEN
______________________________________
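The division of labor between the two parts of the demon pair can be sketched as follows: the identification part only tests the line currently under the NEXT cursor, and the search part only moves the cursor or abandons the search. This is a minimal sketch under stated assumptions. The `Line` record, the pattern name, the toy rules `is_target`, `is_valid_pair` and `negates`, and the `run` driver are hypothetical; suspect-symbol tagging is omitted and all attributes are assumed known.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

PATTERN = "SLL-SRL"  # hypothetical pattern name


@dataclass
class Line:
    text: str
    true_next: Optional[int]             # TRUE NEXT: line physically below
    next: Optional[int] = None           # NEXT: forward search cursor
    tags: Set[str] = field(default_factory=set)
    forward_search: Optional[str] = None  # None, "success" or "failure"
    previous_pattern_line: Optional[int] = None


def is_target(line: Line) -> bool:
    return line.text.startswith("SLL R1")        # toy rule (assumption)


def is_valid_pair(line: Line, nxt: Line) -> bool:
    return is_target(line) and nxt.text.startswith("SRL R1")


def negates(nxt: Line) -> bool:
    return nxt.text.startswith(("L R1", "LH R1"))  # toy rule (assumption)


def identify(lines: List[Line], i: int, recs: List[str]) -> bool:
    line = lines[i]
    if line.next is None:                # guard: NEXT line exists
        return False
    nxt = lines[line.next]
    if PATTERN not in line.tags and is_valid_pair(line, nxt):
        recs.append(f"review lines {i} and {line.next}")
        line.tags.add(PATTERN)
        nxt.tags.add(PATTERN)            # qualifies NEXT for step two
        line.forward_search = "success"
        nxt.previous_pattern_line = i    # link used later by step two
        line.next = line.true_next
        return True
    return False


def search(lines: List[Line], i: int) -> bool:
    line = lines[i]
    # guard: NEXT exists and the line after NEXT exists
    if line.next is None or lines[line.next].true_next is None:
        return False
    if line.forward_search is None and is_target(line):
        nxt = lines[line.next]
        if negates(nxt):
            line.forward_search = "failure"
            line.next = line.true_next
        else:
            line.next = nxt.true_next    # move ahead one line
        return True
    return False


def run(source: List[str]) -> Tuple[List[Line], List[str]]:
    last = len(source) - 1
    lines = [Line(t, i + 1 if i < last else None, i + 1 if i < last else None)
             for i, t in enumerate(source)]
    recs: List[str] = []
    changed = True
    while changed:                       # iterate until equilibrium
        changed = False
        for i in range(len(lines)):
            changed = identify(lines, i, recs) or changed
            changed = search(lines, i) or changed
    return lines, recs
```

In this sketch the driver calls `identify` before `search` for each line, so every cursor position is tested for a match before the cursor advances. The firing order of a real demon scheduler is not specified here and is an assumption of the sketch.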
The structure of the demon pair is essentially the same as the demon pair in the two line pattern forward search category with two additional functions performed in the body of the identification part. If a valid pattern is found, both NEXT and LINE are tagged with this pattern (rather than just LINE), and the "previous pattern line" attribute of NEXT is set to LINE. Tagging NEXT with this pattern qualifies NEXT as the secondary target line for this pattern. Setting the "previous pattern line" attribute of NEXT to LINE establishes a connection between these two lines so that attributes of LINE can be accessed and possibly modified by the demon handling step two for this pattern. FIG. 8 is a flow chart of the process implemented by the pseudocode above. A comparison of this flow chart with that of FIG. 6 will demonstrate the similarity between a demon pair which handles step one of a three line pattern forward search, and a demon pair which handles a two line pattern forward search. Because of the similarity, no further discussion will be made of FIG. 8. The pseudocode for the general structure of a demon which handles step two of a three line pattern identification using a backward search is as follows:
______________________________________
THREE LINE PATTERN - BACKWARD SEARCH - STEP TWO:
WHEN
    all attributes of LINE are known and
    LINE is secondary target line and
    PREVIOUS line exists
THEN
    if LINE is not tagged with this pattern and
       LINE and PREVIOUS are a valid pattern then
        if invalid address symbols found then
            tag the symbols "suspect"
        endif
        recommend code changes for one or both lines
        tag LINE with this pattern
        set back search for LINE to success
        set PREVIOUS to TRUE PREVIOUS
    else
        if back search for LINE is not set then
            if PREVIOUS negates this pattern then
                set back search for LINE to failure
                set PREVIOUS to TRUE PREVIOUS
            else
                set PREVIOUS back one line
            endif
        endif
    endif
ENDWHEN
______________________________________
The structure of this demon is essentially the same as a demon in the two line pattern backward search category with one additional test made in the guard. LINE must qualify as the secondary target line for this pattern, a condition which is true if LINE has been modified by the demon handling step one for this pattern. The flow chart for the process implemented by this pseudocode is shown in FIG. 9. Again, a comparison of this flow chart with that in FIG. 5 will demonstrate the similarity between this demon and that for a two line backward search. The pseudocode for the general structure of a demon which handles step two of a three line pattern identification using a forward search is as follows:
______________________________________
THREE LINE PATTERN - FORWARD SEARCH - STEP TWO
IDENTIFICATION PART:
WHEN
    all attributes of LINE are known and
    LINE is secondary target line and
    NEXT line exists
THEN
    if LINE is not tagged with this pattern and
       LINE and NEXT are a valid pattern then
        if invalid address symbols found then
            tag the symbols "suspect"
        endif
        recommend code changes for one or both lines
        tag LINE with this pattern
        set forward search for LINE to success
        set NEXT line to TRUE NEXT line
    endif
ENDWHEN
SEARCH PART:
WHEN
    all attributes of LINE are known and
    LINE is secondary target line and
    NEXT line exists and
    line after NEXT exists
THEN
    if forward search for LINE is not set then
        if NEXT line negates pattern then
            set forward search to failure
            set NEXT line to TRUE NEXT line
        else
            set NEXT ahead one line
        endif
    endif
ENDWHEN
______________________________________
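A minimal Python sketch of the step two demon pair follows, showing how the secondary target test in both guards restricts firing to lines prepared by step one. Everything named here is an assumption made for illustration: the `Line` record, the two tag names (the sketch represents "LINE is secondary target line" as carrying a step one tag and uses a separate tag for step two), the toy rules, and the `run` driver, which marks the secondary target lines directly instead of running step one.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

STEP_ONE = "seq2-step1"   # hypothetical tag left by the step one demon
STEP_TWO = "seq2-step2"   # hypothetical tag applied by step two


@dataclass
class Line:
    text: str
    true_next: Optional[int]             # TRUE NEXT: line physically below
    next: Optional[int] = None           # NEXT: forward search cursor
    tags: Set[str] = field(default_factory=set)
    forward_search: Optional[str] = None  # None, "success" or "failure"


def is_secondary_target(line: Line) -> bool:
    return STEP_ONE in line.tags


def completes_pattern(nxt: Line) -> bool:
    return nxt.text.startswith("SRL R1")           # toy rule (assumption)


def negates(nxt: Line) -> bool:
    return nxt.text.startswith(("L R1", "LH R1"))  # toy rule (assumption)


def identify(lines: List[Line], i: int, recs: List[str]) -> bool:
    line = lines[i]
    if not is_secondary_target(line) or line.next is None:   # guard
        return False
    nxt = lines[line.next]
    if STEP_TWO not in line.tags and completes_pattern(nxt):
        recs.append(f"review lines {i} and {line.next}")
        line.tags.add(STEP_TWO)          # only LINE is tagged in step two
        line.forward_search = "success"
        line.next = line.true_next
        return True
    return False


def search(lines: List[Line], i: int) -> bool:
    line = lines[i]
    if not is_secondary_target(line) or line.next is None:   # guard
        return False
    nxt = lines[line.next]
    if nxt.true_next is None:            # guard: line after NEXT exists
        return False
    if line.forward_search is None:      # no valid-target test needed here
        if negates(nxt):
            line.forward_search = "failure"
            line.next = line.true_next
        else:
            line.next = nxt.true_next    # move ahead one line
        return True
    return False


def run(source: List[str], secondary: Set[int]) -> Tuple[List[Line], List[str]]:
    last = len(source) - 1
    lines = [Line(t, i + 1 if i < last else None, i + 1 if i < last else None)
             for i, t in enumerate(source)]
    for i in secondary:                  # stand-in for the step one demon
        lines[i].tags.add(STEP_ONE)
    recs: List[str] = []
    changed = True
    while changed:                       # iterate until equilibrium
        changed = False
        for i in range(len(lines)):
            changed = identify(lines, i, recs) or changed
            changed = search(lines, i) or changed
    return lines, recs
```

With the source `["L R1,B", "SLL R1,8", "LH R3,C", "SRL R1,8", "STH R1,C"]` and line 1 marked as the secondary target, only that line searches forward. With an empty `secondary` set, neither part fires for any line.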
The structure of this demon pair is essentially the same as a demon pair in the two line pattern forward search category with two exceptions. An additional test is made in both guards to determine if LINE qualifies as the secondary target line for this pattern, a condition which is true if LINE has been modified by the demon handling step one for this pattern. The addition of this test to the guard eliminates the need to determine if LINE is a valid target line in the body of the search part demon. The flow chart for the processes implemented by this pseudocode is shown in FIG. 10. A comparison of FIG. 10 to FIG. 6 will demonstrate the similarity of the two processes.

Although the EMA was designed specifically to convert OS/3 to an extended memory platform, the underlying concepts used to accomplish this task are language and platform independent. If source code representation and conversion details are parameters to the code conversion problem, then given the proper parameters EMA technology can be applied to a broad range of conversion problems. For example, EMA technology could be applied to porting applications from one operating system to another, porting code from one hardware platform to another, or re-documenting systems which are old and expensive to maintain. The key aspects of the invention which easily extend the applicability of the EMA technology to other language conversion problems are 1) the way in which source code is represented, 2) the way in which conversion knowledge is represented, and 3) the way in which conversion knowledge is applied to case-specific information in order to arrive at a problem solution.

In the preferred embodiment, the EMA system provides an output report of recommended source code modifications; however, those skilled in the art will recognize that other output scenarios exist which may be implemented according to specific application requirements.
For example, rather than providing an output of recommended source code modifications, the EMA system may readily generate the fully modified source code, or the source code may be generated with recommended changes inserted as comments. In either case, the EMA system may be interactive by querying the user for approval of automated source code changes. Thus, while the invention has been described in terms of a single preferred embodiment, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.
