Fault tolerant and combinatorial software environment system, method and medium6973560Abstract A fault tolerant software environment, in which various program components (e.g., portions of computer programs, applications, etc) are objectized into entities represented by "codons." This allows for improper syntax to occur, enabling, for example, combinatorial operations such as genetic programming. The present invention also contemplates such features as the ability to probabilistically execute individual codons, to switch between treating information as executable code or as data (or passing over it), provides that the individual codons can be tagged so that additional information can be associated with them, and provides for tagging of the stack. Claims 1. A computer-based method for processing at least one codon in a construct containing one or more codons, comprising the steps of: Description BACKGROUND OF THE INVENTION
Genetic programming is a biologically inspired general-purpose search technique for discovering solutions to complex problems pioneered by John Koza. Typically implemented in LISP or C/C++, Genetic Programming involves creating a population of syntactically correct programs (in LISP) or data structures (in C), then breeding successive generations of syntactically correct programs or data structures guided by the Darwinian principle of "survival of the fittest." Analogs of naturally occurring operations such as sexual recombination, mutation, and gene deletion are used to generate the programs. Utilizing this approach enables computers to develop solutions to a given problem without advanced knowledge of the form of the solution. For a further description of Genetic Programming see Koza, John R., "Genetic Programming III; Darwinian Invention and Problem Solving" (1999). FIG. 1 demonstrates the concept of breeding programs. In this example, an application is contemplated that creates a picture by invoking two programs (A and B), which create an ellipse and a rectangle respectively. The concepts of "cross-over" and "mutation" are employed to generate variations of the original picture. Referring now to FIG. 1, several pictures are set forth where each picture consists of two components, an ellipse (A) and a rectangle (B). In the original representation 102, the application used coordinate data 50 50 150 150 to produce the ellipse, and 100 100 200 200 to produce the rectangle. Below original representation 102 are four exemplary representations depicting the concept of "cross-over." For example, in representation 104, it can be seen that the data elements after the first position have been exchanged. In other words, the four instructions following the first "50" in the ellipse program have been crossed over (i.e., swapped) with the four instructions following the first "100" in the rectangle program. The remaining three other such representations (106, 108 and 110) have resulted from aspects of original representation 102 being crossed over in a similar fashion. Representation 112 depicts the concept of mutation. Specifically, rather than simply moving around the same data provided by the original representation 102, at least some of the data is mutated (i.e., changed), thus yielding different values, as shown. Thus, from using such concepts as cross-over and mutation, numerous "possibilities" can be created, one or more of which can then be selected by a designer according to his or her personal taste or objectively tested according to a machine-based fitness function such as maximizing the axial symmetry of the picture. A fundamental problem in using genetic programming to evolve solutions to problems is that programs cannot be arbitrarily recombined without producing software with invalid syntax, nonsense logic, overflow errors, etc. Programs must have correct, balanced, "perfect" grammar to compile or even be interpreted. If a semicolon ";" is missing at the end of a line in a program written in C, or a "next" does not follow a "for" in a program written in BASIC, the software will not execute. For example, the code strip below returns the temperature in Fahrenheit given a Celsius value: If a cut is made after the multiply sign and the second part of the piece is recombined in front of the first part, the new code will appear as: If this new code were passed to a conventional computer for execution or a compiler for translation, the procedure would terminate at the second instruction ")" which has no meaning without a prior parentheses. Termination is an unrecoverable error that stops the entire computing process in the middle of execution and typically requires human interaction in order to resume. Traditional computing environments are therefore unsuitable for automatically evolving solutions to problems or designing a robust life-like computing system. As noted above, existing virtual and silicon machines (computers) are not designed to process random code, and will halt execution on unhandled errors. In order to get around this problem, existing implementations of genetic programming impose rigid constraints on program breeding in order to ensure that programs will run. This restricts the space of potential programs to a sub-space of all grammatically correct or "legal" programs. This restriction may hinder the achievement of an optimal solution as all paths must be syntactically correct, thereby prohibiting "short cuts" or "trespasses" through the space of "illegal programs." In addition to syntactical errors, the arbitrary recombination of programs frequently produces programs containing circular logic and infinite loops that cannot be readily terminated by existing genetic programming systems without human intervention. Existing computer architectures do no allow for the inclusion of junk instructions which have the potential of later being executed (silent instructions). It has been estimated that as much as 97% of the human genome is comprised of Junk DNA or "introns". Although the role of introns is not fully understood, no software system presently allows for their inclusion. (Some believe that they provide a pool of genetic material that can be recombined in reproduction to create innovative new genes.) Thus, current software languages prohibit inclusion of "junk", and invalid syntax, both highly desirable capabilities in evolving software if one desires to emulate the processes and capabilities of molecular biology. As the arbitrary combination of elements in the genetic code of DNA produce useful new parts, these parts can be immediately deployed in the cell or used in reproduction. In the latter case, the new genetic units become part of the overall pool from which succeeding generations are created. However, this real-time extensibility is not a feature of existing software languages. Although it may be possible to detach and update object libraries that a program uses through operating system commands, existing computer languages do not provide functions to programmatically edit and update the elements of the libraries, i.e. prior art languages do not provide features for programs written in such languages to modify themselves at run-time. The above-noted shortcoming limits the speed that programs can be evolved, as the process must continually be terminated and then restarted at each generation. It also does not enable online adaptation or cooperative computing in which one personal computer is in production while another one is evolving improvements to the library. In general, it would further be desirable to be able to objectize the constituents of a software program (e.g. its key words, constants, operators, any user-defined functions, and combinations thereof) so that their methods can be executed upon being encountered and that new constituents formed during the execution of the program (and those formed during the genetic algorithm process) can be objectized without terminating the program. The existence of the deficiencies mentioned above has meant that certain other related deficiencies also exist. For example, prior art computer systems are highly deterministic. Once compiled, a program's behavior is set for all time. A more flexible system would allow the execution of code to be dependent on the context (configurable) of the program at run-time, just as the expression of a gene is dependent of the circumstances of the cell at the time of translation. Similarly, for modeling and theoretical purposes, random numbers are often used to, e.g., decide whether a binary state is on or off. Taking that concept a step further, it may be useful under certain circumstances for a specified constituent of a computer program (e.g., a "+" sign) to be randomly executed (i.e., sometimes the constituent is executed when it is encountered in a computer program, and sometimes it is not). Building in the ability to set and reset an instruction's probability of execution would also help facilitate learning and program self-modification, as broken code could be deactivated while good code could be executed with a 100% probability. Were such deactivation of portions of code allowed to occur in current software environments, the program may become syntactically incorrect, thus generating an error. Another deficiency of the prior art is the lack of ability to dynamically switch between treating information as executable code and treating it as data. A remedy to this deficiency would find use in automatic program generation. For example, the ability to switch between data and program modes would provide greater flexibility in creating, inspecting, and testing programs, and passing programs to other programs. For example, in normal execution mode, the program "+-*/" would generate four successive math errors. However, if the constituents of the program were treated as data where each had a corresponding instruction number (e.g., 395, 396, 397, 398), the data could then be manipulated mathematically and utilized as a toolkit for synthesizing mathematical programs, or used as a template for searching programs. Programming languages such as LISP have a limited ability to switch between data and executable code mode in that it can manipulate text-based items rather than processing them. However, LISP does not assign numerical designations to all of its instructions. Thus, it does not manipulate the items on a numerical level, and would be deficient in implementing the concepts mentioned above. Present day microprocessors and virtual machines utilize a set of instructions that point to one precise action or object. Distinguishing between instances of machine instruction is not provided. For example, a particular instruction might point to the "add" operator, but there is no way of tagging one program's "add" operator so that it is distinguishable from another's. This feature would be useful in tracking instruction lineage in genetic programming. In examining the instructions of the offspring after breeding two programs, the determination of what instruction came from which parent would be highly desirable. The ability to tag and thereby individualize instructions would have many uses. For example, the authorship of code segments could be marked and usage statistics kept. Mutation rates could be assigned to each individual instruction. Unutilized features of a program could be identified and spliced out, while highly used areas could be selected for breeding or identified for software pricing. The inability to individualize each instruction in current environments thus limits the possibility to analyze, study, research, modify, and price programs. From the above, it can be appreciated that "prior art" computers (virtual and silicon) and software languages are deficient in emulating many of the characteristics that make up the evolutionary process and do not allow the deliberate or arbitrary recombination of software instructions. Existing computers terminate execution on unhandled errors and are therefore not fault tolerant. Code cannot be arbitrarily reassociated. Unused code (analogous to introns) cannot reside embedded in software programs as the code itself lacks the facility to determine which sections to execute or ignore. Software languages cannot be compiled or interpreted if syntax errors exist. Extant languages do not permit real-time extensibility, the ability to add new functionality without stopping execution of the program and recompiling. Existing machine instructions which make up software languages and are processed by the computer are uni-dimensional and do not provide for individualization (e.g., tagging). These weaknesses demanded the development of a new computer and software environment to overcome the stated shortcomings. SUMMARY OF THE INVENTION The present invention overcomes the deficiencies mentioned above by providing a fault tolerant software environment. In particular, embodiments of the present invention envision that various program components (e.g., portions of computer programs, applications, etc) are objectized into entities represented by (and referred to herein as) "codons," as described herein. Further, it is thus envisioned that a computer program, comprising a plurality of codons, is executed codon by codon utilizing a virtual machine format. This is implemented such that, for example, executing a codon that would not have been expected in view of previously-executed codons (e.g., a codon requiring two inputs is executed without those inputs being available) does not, by itself, cause the program containing the codon to crash or terminate. Instead, the program will continue to run, albeit possibly without generating a meaningful result. This thus allows for improper syntax to occur, allowing for the inclusion of "junk" code. The fault-tolerant aspects of the present invention enable, for example, combinatorial operations such as genetic programming. In the environment as envisioned herein, various portions of a computer program can be separated and spliced together, creating new and potentially different functional portions. As these portions are created, embodiments of the present invention contemplate that they can immediately be used in (and as part of) the program without concern for whether they are syntactically correct, and without having to terminate the execution of the program to re-compile it. Embodiments of the present invention also contemplate the ability to probabilistically execute individual codons. That is, a codon can be selected to, for example, execute only a certain percentage of the time that it is encountered during execution of a program. The fault-tolerant aspects provided by embodiments of the present invention facilitate this, since the non-execution of a codon could otherwise lead to syntactical errors and adverse results. In general, probabilistic execution can allow for, for example, an enhanced capability to facilitate various types of modeling. In addition, embodiments of the present invention also provide the capability of being able to switch between treating information as executable code or as data. An advantage to being able to treat programs as data is, for example, that it allows for a greater ability to execute programs in a genetic programming environment. Furthermore, treating programs as data can also facilitate program self-analysis, self-modification and the ability for programs to readily create other programs. A silent mode, as contemplated by embodiments of the present invention, also facilitates these concepts by allowing specified codons not to be executed. Also, embodiments of the present invention provide that the individual codons can be tagged so that additional information can be associated with them. This information can include any number of attributes, such as the number of times the codon was encountered, its origin, etc. Thus, in one sense, the individual codons become "read-writable," where information pertaining to them can be stored, and then read for subsequent purposes. Further, embodiments of the present invention provide for stack tagging capabilities, which facilitates analysis of functional portions of a program. The present invention also contemplates numerous additional aspects, which will be described further below. BRIEF DESCRIPTION OF THE DRAWINGS Various objects, features, and attendant advantages of the present invention can be more fully appreciated as the same become better understood with reference to the following detailed description of the present invention when considered in connection with the accompanying drawings, in which: FIG. 1 depicts an example of the concept of breeding programs in a genetic programming environment. FIG. 2 depicts two exemplary programs demonstrating fault-tolerant aspects contemplated by the present invention. FIG. 3 depicts an exemplary master codon information structure table, as contemplated by embodiments of the present invention. FIG. 4 depicts an exemplary manner of execution of a program, as contemplated by embodiments of the present invention. FIG. 5 depicts an exemplary stack tag table, as contemplated by embodiments of the present invention. FIG. 6 depicts an exemplary multi-property codon information table, as contemplated by embodiments of the present invention. FIG. 7 depicts a diagram disclosing an exemplary technique for the generation of programs utilizing the treatment of executable code as data, as contemplated by embodiments of the present invention. FIG. 8 depicts a diagram disclosing an example of the genetic programming process. FIG. 9 depicts a diagram disclosing the concept of grammar repair. FIG. 10 depicts a diagram disclosing various features of the present invention implementable by codons, including "silent mode." FIG. 11 is a block diagram disclosing certain aspects of the virtual machine environment, as contemplated by embodiments of the present invention. FIG. 12 is a block diagram of a computer processing system used as part and/or in environments of the present invention. FIGS. 13-16 are flow diagrams disclosing an exemplary method of operation for processing codons, as contemplated by embodiments of the present invention. FIG. 17 is a high-level flow diagram depicting various exemplary aspects of the processing of codons. FIG. 18 is a block-diagram depicting the one-to-one mapping of objectized entities to codon numbers, as contemplated by embodiments of the present invention. FIG. 19 is a flow diagram relating to network and internet distributed environments contemplated by the present invention. FIG. 20 is a diagram depicting an exemplary architecture for routing tasks, as contemplated by embodiments of the present invention. FIG. 21 is a diagram depicting an exemplary genetic algorithm/programming session. FIG. 22 is a diagram depicting the concept of certain machines performing specified tasks within the environment of the present invention. FIG. 23 is a chart depicting an exemplary implementation of probabilistic execution. DETAILED DESCRIPTION The present invention relates to a fault tolerant and combinatorial software environment. More specifically, the present invention relates to a system, method and medium for providing and facilitating a virtual machine and instruction format that facilitates systematic or arbitrary alteration and recombination of portions of code (e.g., programs) at run-time. Instructions that comprise a language that can be used with the present invention can be individualized, allowing for differentiation of each instruction based on, e.g., usage, origin, authorship, or any other user-defined tag. The overall instruction set can be extended to include any object or procedure that can be represented by software. The invention is envisioned for use in applications such as those relating to automatic program creation (e.g. genetic programming), artificial/directed evolution of digital designs (graphic, industrial, mechanical, architectural, engineering, etc.), online adaptive software systems, product and process reliability testing, and simulation of biological, non-linear, and probabilistic processes. The present invention accomplishes this by allowing the various features of a programming language (e.g., operators, constants, variables, text strings, keywords, etc.), combinations thereof (e.g., lines of code, subroutines, functions, entire programs, etc.) and even aspects of other programs (e.g., the functions associated with third party software) to be "objectized" and assigned some unique designator (e.g., a unique number). Consequently, it is envisioned that virtually any software-related entity (or any entity that can be represented by software) can be assigned such a designator, and that the address location of the methods relating to that entity can be stored in, e.g., a master table, where the assigned number can be used as a key to access and execute the methods (and to indicate other attributes of the entity). These assigned, representative designators (and their associated methods) will be referred to hereinafter as "codons." The various entities possibly comprising groups of codons (e.g., lines of code, functions, whole programs, etc.), while potentially being codons, themselves, are often also referred to herein as "agents." The term "codon" itself is drawn from biology, but in many cases the term "instruction" (which also envisions the combination of multiple smaller instructions) can be substituted. Thus, from the above, it is envisioned that the present invention contemplates that virtually any entity that can be represented in software can be objectized and become a codon. A program (which comprises codons, and which itself may be a codon) is executed codon by codon. If the program is syntactically incorrect, the program will still continue to run (i.e., it will not cause the system to crash), but it may produce meaningless results or no tangible result at all. This enables certain advantageous results to occur, including those involving genetic programming. Other advantages and applications will also be discussed herein. A feature contemplated by embodiments of the present invention with relation to the codon scheme mentioned above is the ability to record various specific aspects of individual codons. More specifically, it is envisioned that each instance of a codon has associated with it a number of fields that can indicate any number of properties or characteristics about the codon. For example, in a genetic programming environment, it may be advantageous to keep track of where (i.e., from what piece of code) a newly created (i.e., spliced together) piece of code came from, and in particular, where each codon in the newly formed piece of code originated. Consequently, for each codon in the newly formed code, the codon of the entity from which the newly created codon came will be stored and in some way associated with the codon in the newly formed code. This can be done, e.g., in a table format. Another feature contemplated by embodiments of the present invention includes the ability to treat the codons as data, rather than as just the function that they otherwise represent. Again, this facilitates program self-modification, self-analysis, and genetic programming, and has other advantages that shall be discussed further herein. An illustration of the general nature of the fault tolerance contemplated by the present invention is described with regard to FIG. 2. In this Figure, as well as in the description of subsequent ones, the programs are ordered and explained using the concept of post-fix or "reverse polish notation," wherein, e.g., operators are positioned after the arguments that they will be using (e.g., x+y is represented as xy+). Of course, it should be understood that this is just by way of example, and that the various features and inventive concepts of the present invention are not limited to the use of such notation. Referring now to FIG. 2, two programs (202 and 204) are shown, each having ten steps. The letters shown in each program represent arguments (e.g., constants) that are used as input to, e.g., subsequent operations (such as an addition operation), represented by the numbers in the program. Each number, in addition to merely signifying the existence of an operation, indicates the number of arguments required by the operation. (Of course, the numbers could represent any type of entity requiring arguments.) In the example of FIG. 2, the first program is depicted as a "perfect" program 202, so called because all of the operations have the correct number of arguments associated with them (it is assumed in this example that the operations are subroutines, i.e. they do not return anything). Specifically, in "perfect" program 202, a, b, and c (of steps 1-3, respectively) represent three arguments, while the number "3" of Step 4 represents an operation requiring exactly 3 arguments. Step 6 depicts an operation requiring one argument, which is provided in this example by argument "d" of Step 5. Similarly, the operation of Step 9 requires two arguments, which are provided by arguments e and f of steps 7, and 8, respectively. Consequently, each operation has the number of arguments that it expects. In conventional software environments, "perfect" programs of the nature described above are expected, since conventional programming languages expect that their operations, functions, etc., will receive the anticipated number of arguments. Otherwise, an error will occur and the program will not execute. For example, if the "2" at Step 9 of the "perfect" program 202 represented the "+" operation, then two arguments would be expected. If one of the arguments were not present, then an error would occur and the program would not execute. This would be the case whether the program were compiled (in which case an error message would likely occur during the compilation) or if the program were interpreted. In an interpretive environment, the error would typically occur while typing in the lines of code in the program (if the editor checked for such occurrences) or at run time (e.g., where variables are used in the equations, but at run time no input for them is provided). In a compiler setting, the error would typically occur during compilation or at run time. In most conventional situations, such errors and the cessation of execution resulting there from is the desired result. However, in situations where automation, robustness, and flexibility are desirable (e.g., genetic programming) the conventional methods are inadequate. In addition to the "perfect" program, an "ungrammatical" program 204 is also shown in FIG. 2. Here, the "ungrammatical" program 204 is shown to contain operations that do not have the expected number of arguments associated with them. For example, in Step 3 of "ungrammatical" program 204, the operation requires three arguments, but is only provided with two (specifically, a and b). Other operations in the "ungrammatical" program 204 are similarly mismatched. In conventional programming environments, the ungrammatical program would not run (e.g., it would yield fatal run time and/or compilation errors). However, the present invention will continue to run under such circumstances. While such a situation may ultimately generate nonsense or simply no value at all, the fact that it continues executing as it attempts to yield a meaningful result is of great value in many situations, some of which will be discussed herein. For purposes of illustration herein, various features of the present invention, through several of the following Figures, will be described in the context of an exemplary computer program. The function of the program is to convert a number representing a temperature in degrees Celsius into one representing degrees Fahrenheit. The program contains the following general steps:
3. Multiply the quotient of the previous division step (specifically, 1.8) by the number obtained in the spreadsheet cell. 4. Add a fifth argument (specifically, the number "32") to the product of the previous multiplication step. (This will yield the temperature in Fahrenheit.) While various features of the present invention will be illustrated using this program, it should be understood that the present invention is by no means limited to aspects of this example. As indicated above, embodiments of the present invention contemplate that the meaning (i.e., interpretation) of each codon can be stored in some type of structure such as a master table. That way, when a codon is encountered during execution of the program, its interpretation (and possibly various characteristics of the codon) can be looked up and obtained. An example of such a look-up table is depicted as a Master Codon Information Structure Table (hereafter master codon table) 304 shown in FIG. 3. Referring now to FIG. 3, master codon table 304 is shown containing ten (10) codons (numbered 1 through 10), as indicated by a codon number column 306 in the master codon table 304. The entity that each codon number specifically represents is in the "label" column 308. Thus, codon number 3 represents the "*" operation (i.e., multiplication), codon number 5 represents the constant "9", codon number 10 represents a function called "xlCell.RValue" (which is an abbreviation for Excel read value), etc. (In this example, the Excel program from Microsoft Corporation is mentioned, but any other functionally similar program could also have been used.) The codons shown and described in this Figure were specifically chosen to illustrate the implementation of the above-mentioned temperature conversion program, though it should be clearly understood that the present invention contemplates that the master codon table can contain any number and different types of codons (pre-set or user-defined). The "original program" 302 of FIG. 3 is an exemplary sequence of codons envisioned to implement the temperature conversion program mentioned above. Thus, when this program is executed, the present invention begins by evaluating what each codon represents (using the master codon table 304), executing that codon, and then continuing in that way throughout the entire program. Consequently, the process begins by obtaining the first codon number in the program (in this example, the number "8"). Turning then to the master codon table 304, and particularly codon number column 306 and label column 308, we see that the number 8 represents (i.e., has been assigned to) the constant "1." Continuing with the processing of codons in original program 302, the next codon number encountered is "2" which, when compared against the master codon table 304 (and particularly with reference to codon number column 306 and label column 308), can be seen to have been assigned to the constant "4." Evaluation of the next codon number (codon number "10") indicates that it has been assigned to represent the function "xlCell.R.Value", which obtains the value of an Excel spread sheet at the cell position indicated by a first and second argument (e.g., row 1, column 4). Eventually, all codon numbers in the original program 302 are evaluated against master codon table 304 and executed. In this example, from using label column 308 of master codon table 304, a translation of the original program 302 can be seen as translated program 306. In the example shown in FIG. 3, the "original program" 302 is shown to be in the form of codon numbers, as would be the case for programs that are manipulated in a genetic programming environment in, e.g., the manner discussed further below. However, human programmers typically write programs using programming languages such as C, C++, Visual Basic, JAVA, etc, to produce what is generally referred to as source code. To accommodate human programmers, embodiments of the present invention contemplate the use of a function that takes the source code as input and, using the master codon table 304, converts the source code into corresponding codon numbers. As a result, the source and codon representations of programs are intraconvertable. Programs can be stored, manipulated and executed in either form (i.e., either in codon number form or, e.g., human readable text). Storing programs as text enables the codon numbers to be changed over time and eases sharing of programs among virtual machines which could have different codon numbers for the same operation. From there, the various features of the present invention discussed herein can be implemented. Of course, it should be understood that embodiments of the present invention contemplate that various other types of implementations are also possible. It should also be understood that the present invention contemplates that there are any number of ways for codons to be assigned a codon number and registered in the master codon table 304. Three examples are mentioned here. First, a user can predefine certain codons by embedding their methods at some location within the programming environment, placing that location address in the master codon table 304 (discussed below) and re-compiling the entire compiler or interpreter that is used to facilitate the virtual environment (i.e., to execute the codons). Second, the present invention also contemplates that an agent can be written by a user, randomly or systematically generated or evolved, and immediately added to the virtual machine instructions set (e.g., immediately added to the master codon table) using, e.g., a "DefineAgent" function, mentioned below. In this way, the programming language need not be re-compiled. Doing this allows the newly generated items to be used, immediately facilitating real time extension of the operating environment and the ability to map old agents to newly created agents. In addition, it, e.g., enlarges the overall pool from which succeeding generations can be bred while performing genetic programming. This concept can also be used to automatically and immediately operate on codons in master codon table 304 that are used in a user-created program. Third, it is also contemplated that one can encapsulate third party compiled code (in real time) into a codon provided that a header file and library file (object file) is available. This involves inspecting the header file, and recording the attributes of the desired "extensions" in a data structure maintained by the virtual machine and loading the function's machine instructions into memory. The function (i.e., a function which is part of the third party code) would be assigned a codon in the typical manner, and given a special type (e.g., "user-defined" codon, as mentioned further below). When that codon is encountered, embodiments of the present invention contemplate that the codon is passed to a special wrapper function which checks if the appropriate arguments and types are available. If so, the wrapper function extracts the arguments from the virtual machine's data stack, casts them into the appropriate type/form (data types) and places them on the microprocessor's stack. The virtual machine invokes the third party function by transferring control to the memory address where the function was stored previously. The value returned by the function, if any, is converted into the virtual machine's data representation format and placed on its stack similar to any other intrinsic virtual machine function. Referring still to FIG. 3, in addition to providing a translation for the codon numbers of a program, the present invention also contemplates that various attributes and characteristics of the codons can be recorded and retrieved for a variety of purposes. In the example depicted by master codon table 304, information relating to these attributes and characteristics are recorded in and retrievable from the subsequent columns. Specifically, embodiments of the present invention envision the existence of a "type" column 310 indicating the particular type of codon at issue. For example, the "+" codon is an "arithmetic codon" type, while codon number 2 (which is the number 4) is shown as type "constant codon." In looking at the type column 310 for this exemplary program, it can be seen that most of the codons are either of an arithmetic or constant type. However, codon number 9, which is the "end" statement, is a "control" codon, while codon number 10, which is the function for obtaining the cell in an Excel spread sheet, is a user-defined Visual Basic application input-output codon type ("Visual Basic" is a programming language from Microsoft Corporation of Redmond, Wash.). In general, the present invention contemplates the use of various user-defined functions, as well as contemplates that any number of other types of codons can exist in addition to those specifically listed in the type 310 column. The type field, as envisioned by embodiments of the present invention, allows assigning codons to particular categories (classes) which is highly useful in breeding and analyzing programs and implementing context sensitive behavior. Embodiments of the present invention contemplate that the entries in type column 310 can, e.g., be entered automatically based upon the information in other columns such as the label column 308 (using, e.g., an additional table that matches a label with a type) and/or can be entered manually by a user. Column 312 of master codon table 304 indicates the default value of the codon. In general, it is envisioned that fundamental types (integer, double, etc.) which can be stored within the value field are given their appropriate numeric value, while the codons representing arithmetic operations, which do not need a "value" field, are given a value of zero. For strings, structures, arrays, and other derived data types which are too large to be stored within the value field, the value field contains the address of the object (e.g., data) or the address of an information structure about the object. Different numbering schemes for various purposes are also contemplated. The column designated "Num Args" 314 indicates the number of arguments that a particular codon expects. For example, the "+" operator expects two arguments, as do the "*", "/", and "xlCell.R.Value" codons. However, it should be understood that the present invention contemplates use with operations and functions requiring any number of arguments. Column 316 indicates, e.g., the address where the actual code to implement the operator or function exists (which could be, e.g., a local or remote address). Thus, for the addition operation, a pointer to a virtual machine function for "add" is given. Similarly, the address of a function for handling constants (such as the constant 4, represented by codon number 2) is given. The behavior of each codon is implemented by a user-defined virtual machine function to provide greater flexibility. For instance, constants can be stored in remote databases, hardware, etc. rather than in RAM. This functional design also enables one to remap or replace the behavior of a function at run-time. With regard to, e.g., virtual machine functions, embodiments of the present invention contemplate that they can be written in any number of programming languages, or the like. It is envisioned, for example, that such functions may then be compiled, stored in machine-readable format, and executed when needed. To enhance the interchangeability of functions and to reduce the amount of idiosyncratic information which must be stored in order to call functions, embodiments of the present invention further contemplate that the functions can have a standard interface form so that additional functions can be readily generated and used. Embodiments of the present invention contemplate that, while most codons are associated with a unique function, some codons such as constants, matrices, and arrays are handled by a single function. To implement the latter, the codon number and/or the value field is envisioned to contain the instance or "case" information which the virtual machine function needs to lookup, retrieve, manipulate, and replace the unique data associated with the codon. An exemplary form is as follows: "T—DATA—INDEX CALL—STYLE vmFN(T—VMRECORD*pvm)." In this form, the "pvm" represents the pointer or memory address of a virtual machine state structure, T—VMRecord, allowing full access to virtual machine information. Providing complete access to the virtual machine state(s) allows the programmer to create functions which manipulate the virtual machine itself, such as control flow logic (e.g. do..loop), branching, starting and killing sub-agents, inspecting and modifying programs, and so forth. In situations where this level of visibility is not required, a simplified format is envisioned in which only information pertinent to the evaluation of a codon is provided, such as the codon number, an argument buffer, the number of items in the argument buffer, and an address of a return buffer. "T—DATA—INDEX" is an implementation-defined integral data type such as "long" whose value is set to the number of items in the return buffer or to a negative number to indicate that an error occurred. For example, a "-1" could indicate that an error occurred (usually due to invalid arguments) and the return buffer would be empty. It is further envisioned that there would exist a table of errors which would contain error numbers and error descriptions which could be extended during runtime. A return value of "0" could indicate that the virtual machine function or procedure was successfully executed, but that no values were returned (i.e. the return buffer is empty). Any integer greater than "0" could indicate that the function was successfully executed and the integer returned would indicate the number of items in the return buffer. The value and type of the items could then be accessed in the return buffer. While embodiments of the present invention envision that the argument data types can be determined by the evaluating functions by passing arguments to and from the virtual machines in the form of variants and providing mechanisms for retrieving and converting fundamental and structure types into variants, other possible schemes are also contemplated, including maintaining a table in the virtual machine of the type of arguments each function takes. In the former case, type checking is carried out by the virtual machine functions, in the latter case, the virtual machine itself determines whether the appropriate arguments are available. The following represents an exemplary function for performing addition. The vmfn—ADD( ) is passed a pointer to the virtual machine record that contains the argument and return buffer, vmArgs and vmRets respectively. The vmfn—Add function adds the two passed arguments and if the result is finite loads the result into the return buffer by setting the value field to the result and the type to "double", identifies it as a "double" and returns the value "1" indicating one item in the buffer. If the result is not finite, then an error has occurred. If the arguments are both negative in value then the function returns L—NegOverflow, and if the two arguments are positive then the function returns L—PosOverflow.
The num rets column 318 of FIG. 3 indicates the number of values that can be expected to be returned when a codon is executed. In this particular example, with the exception of the "end" codon, all of the codons will return one value. The errors column 320 indicates the number of errors that occurred due to the execution of a particular codon. Embodiments of the present invention contemplate that various different types of errors can be kept track of using column 320. In addition, embodiments of the present invention also contemplate that multiple error columns can be utilized to keep track of different types of errors separately. One example of a type of error that could be kept track of is where the proper number of arguments for a codon does not exist. Thus, where a "+" codon is encountered and only one argument is available when that codon is executed, an error may be generated (even though as indicated above, the program would continue to run, and as such might not be viewed as a true "error"). Examples of other types of errors would include division by zero, where a division operator is concerned (in this example, codon number 4). Instances column 322 indicates the number of times that a given codon has been executed. This can be kept track of from one or more perspectives. For example, the present invention can keep track of the number of times a codon is encountered for all programs running within a given environment (e.g., it would keep track of the total number of times a codon is encountered in all programs) and/or the total number of times the codon is encountered over some set time period or within certain select units (e.g., within one or more specified programs, subroutines, etc). Various other criteria for tracking the number of instances are also contemplated by embodiments of the present invention, as well. As with errors column 320, it is envisioned that multiple instance columns can also be utilized. As will be discussed further, keeping track of the number of instances allows the present invention to, e.g., determine which portions (e.g., features, subroutines, etc.) of a program are being used. This has numerous potential applications, including those relating to the rental of software (e.g., a user can be charged based upon the features that he or she actually uses). Keeping track of the instances can also be used for debugging purposes, and can be used to differentiate one initially-identical program from another based upon how each user has utilized their copy of the program. The run time total column 324 indicates the total time of execution for a codon during, e.g., the execution of a program. The next three columns indicate whether or not certain types of events (e.g., errors) should be handled for the given codon. TrapGrammar column 326 will be set to "true" or to the value of a grammar correction codon, if it is desired that grammar errors be corrected. An example of a grammar error is where a function that is expecting a numeric argument instead receives something else, such as an operation (e.g., as would be the case for the program "+cos", where the cosine expects a number rather than a "+"). TrapError column 328 relates to whether control should be transferred to a handling function on encountering an error. TrapSuccess column 330 relates to whether control should be transferred to a handling function on encountering a success, which can be useful in the context of, e.g., probabilistic execution, as will be described further below. In this particular example, all codons in that column 330 are set to "false," but if probabilistic execution were implemented with regard to a particular codon, one might set the TrapSuccess entry for that codon to "true" or to a specific handling codon, so that some context specific logic can be executed. Some additional columns (not shown) also contemplated by the present invention include those that would indicate the author/creator of a codon. For example, a program could be assigned a certain identification number, and if that program should create a codon (using, e.g., "define agent" function, discussed below), the identifying number of the program would be placed in the appropriate spot in the author/creator column. Another contemplated column involves access privileges for a codon, where, e.g., only certain programs would have access to certain codons. Thus, to be given access to a codon, a program's identification number, for example, could be placed in the appropriate location in a "privileges" column. Similarly, in multi-tasking environments, it might be useful to include a "Mutex" or mutual exclusion field indicating that a codon was in use, and/or, that is was temporarily locked. The present invention also contemplates the flexibility to allow users to define their own columns that can be associated with the codons as well. This is indicated by column 332. In general, the various columns mentioned above are exemplary in nature, and the present invention contemplates that different columns (and/or the same ones used in somewhat different functional ways) can also be used. An exemplary manner for execution of the program described above for converting Celsius to Fahrenheit will now be described and explained with regard to FIG. 4. Referring to FIG. 4, an Excel spreadsheet 402 is shown, where the number 10 was placed in the cell at the first row, fourth column. Referring to the program depicted at 404, the various steps of the program and results of each step will be explained in conjunction with master codon table 304 of FIG. 3. Referring to the program at 404, the first codon encountered during execution of the program is codon number 8. When this codon is evaluated against master codon table 304, it is determined that it represents the numeric constant "1." The result of encountering and executing a constant is merely to fetch the constant and place it on a stack in memory. Thus, the result of Step 1 is that the number 1 is placed on a stack 406, as shown. Consequently, the stack 406 shows the number of items that are on the stack after the completion of each step (note that the last element added to the stack is at the bottom). (As will be shown with regard to FIG. 5 below, embodiments of the present invention contemplate that the stack consists of a set of uniform "items" or structures which contain value, type, and other tagging information. The items are an extension of the "variant" or "union" data type, as they can encapsulate a multiplicity of fundamental and derived data types.) In Step 2 of program 404, codon number 2 is encountered which, according to the master codon table 304, represents the numeric constant 4. Having thus encountered another constant, the present invention also places this number on the stack 406. Consequently, the result of Step 2 is that the numbers 1 and 4 are on the stack 406, as shown. In Step 3 of program 404, codon number 10 is encountered which, according to the master codon table 304, represents the user-defined function for obtaining the value of a cell of a Microsoft Excel spread sheet. As per this exemplary function, the specific cell location in the spreadsheet at which the value is to be obtained is designated by two arguments, where the first argument represents a row number and the second represents a column number (note that the Num Args columm 314 of master codon table 304 contains a "2" in the "xlCell.R.Value" entry). As embodiments of the present invention contemplate that operations, functions, etc. receive their requested arguments from the stack 406 (either, e.g., directly via a stack pointer, or a by calling a stack pop function, or indirectly via an argument buffer which would contain copies of the pertinent stack items) the Excel function takes the number 4 off of stack 406 as its second argument, and the number 1 off of stack 406 as its first argument. Thus, the value of the cell located at column 4, row 1 is obtained. In this example, the value at that location is the numeric constant 10. Consequently, the result at Step 3 is that the 4 and 1 will have been removed off the stack and replaced by the numeric constant "10." In Steps 4 and 5 of program 404, codons 5 and 6, respectively, represent the numeric constants 9 and 5, respectively, and each of these constants is placed upon the stack 406, as shown. In Step 6, codon number 4 is encountered, which represents a "divide" operation. The divide operation takes two arguments and divides the first by the second. Thus the top and penultimate items on the stack 406 (in this example, the numeric constants "5" and "9") are divided to yield the numeric value "1.8". Consequently, this result is then placed on the stack 406, as shown in Step 6. In Step 7, codon number 3 is encountered, which represents a "multiply" operation. In this example, this operation causes the top and penultimate items on the stack 406 (in this case, 1.8 and 10, respectively) to be multiplied together, resulting in the number 18. Thus, the result of Step 7 is to place "18" on the stack 406. In Step 8, codon number 7 is encountered, which represents the numeric constant "32". Consequently, the result of Step 8 is that "32" is placed on the stack 406, as shown. In Step 9, codon number 1 is encountered, which represents an "add" operation. Consequently, the top and penultimate items on the stack will be added together, resulting in the numeric constant "50," as shown. While the use of stack 406 and its utilization as described above is as contemplated by embodiments of the present invention, it should be understood that the present invention also contemplates other methodologies for executing programs and implementing the virtual machine contemplated herein. This includes, for example, concepts using conventional and pre-fixed notation rather than post-fixed notation. Embodiments of the present invention contemplate that the items that are placed on the stack 406 (or on similar structures) can be "tagged" such that they are associated with various additional items of information. Embodiments of the present invention envisioning the tagging of items on stack 406 are described with regard to FIG. 5. Referring now to FIG. 5, an exemplary stack tag table 502 is shown containing information relating to the stack 406 and values thereon (particularly the value at the top of the stack) at each step in the program. A stack count column 504 indicates, for each step, the number of items on the stack after the completion of each step. Thus, in Step 1, it is shown that there was one item on stack 406, in Step 2 there were two, in Step 3 there was one, etc. A stack value column 506 indicates the value of the top item on the stack 406 at each given step (which, as generally envisioned herein, is considered to be the "value" of the stack for a given step). For example, at Step 5, the stack value column contains a "5" (indicating that the value of the stack 406 at that step was "5"), while at Step 6 (after the division operation) the value of the stack 406 as shown in stack value column 506 is "1.8." The remaining columns in stack tag table 502, discussed below, relate in this example to the value of the stack (i.e., the top-most value) for each given step. The "stack type" column 508 indicates the type of value (e.g., the fundamental storage class, integer, double, etc., or object type, or codon type for a derived or pointed to data type) for each step. Thus, the value of the stack at Step 5 (i.e., the number "5") is shown to be of type "integer," whereas the value at Step 6 (i.e., the number "1.8") is of type "long." The "origin" column 510 indicates the step in the program at which each item on the stack was generated. The "origin codon label" column 512 indicates the codon that generated the value. For example, at Step 3, the number "10" is shown to have been generated by the "xlCell.R.Value" function, while in Step 8, the number "32" is shown to have been generated simply by the encounter of the numeric constant 32. The dependency column 514 indicates the earliest position in the program that each value on the stack relied upon for its calculation. In some cases, (for example, looping logic) it may be useful to also record the step. For example, in Step 2, the value of stack 406 (which is "4") came from the numeric constant 4 encountered at Step 2 of the program at 404 of FIG. 4. Thus, in Step 2, the dependency column 514 has a number 2 (indicating that Step 2 is the earliest position in the program 404 that the value of the stack 406 at that step depends upon for its existence). In Step 6, the value "1.8" came from the division of 9 by the number 5. Thus, since the earliest of the numbers relied on for this calculation was the number 9 of Step 4, a "4" is in the "dependency" column 514 for step 6. In Step 9, the entry in the "dependency" column 514 is a "1," indicating that the first step that was relied upon to produce the number "50" was Step 1. The "dependency codon label" column 516 indicates the value of the stack at the step indicated by the corresponding dependency column 514. It should be understood that the items depicted in the stack tag table 502 are by way of example. Other types of information relating to the stack 406 can also be associated with the various values thereon, and added as columns in stack tag table 502. In addition, other types of structures for recording and retrieving information relating to values on the stack 502 (or similar items) are also contemplated by embodiments of the present invention. Embodiments of the present invention envision that tagging items on the stack is useful to help determine where a particular item came from, the functional unit it is part of, and the instructions and data on which it depends. This information can be used to break a program into functional units for breeding, and for allocation among processors or computers, e.g. run-time parallelization. Potentially, by tracking a particular item's origin, the process of calculation can be reversed. For example, in Step 5 above, the three items on the stack are the values "10," "9," and "5." Thus, if someone were interested in where the number "10" came from, they could find the entry in the "stack value" column 506 containing the value "10." In this case, it can be seen that the value was placed upon the stack at Step 3. Then, looking at, e.g., the "origin codon" column associated with Step 3, one would see that the "xlCell.R.Value" was executed. In addition, one could also see from the "dependency" column 514 at Step 3 that the earliest item relied upon for the value was at Step 1 of the program. When looping and branching logic is involved additional information may be needed to properly tag data such as the step at which an item was placed on the stack, as well as the step at which its dependent items were placed on the stack. In addition to the various items of information in the master codon table 304 of FIG. 3 that relate to the number of errors and instances for, e.g., a given type of codon (such as a "+" operation) that occur over the course of running, e.g., one or more programs, the present invention also contemplates that activities and characteristics of each individual codon in a program can be tracked and recorded. Thus, for example, if a program contained twelve different "+" operations, the activities and characteristics of each of these individual "+" operations could be individually tracked and recorded. This would be in addition to keeping a total of the activities of all "+" operations within, e.g., a given program, groups of programs, time frame, etc., as generally anticipated would be the case with regard to information stored in the instances column 322 of the master codon table 304, as contemplated by embodiments of the present invention. As an exemplary analogy, a conventional program can be considered as a tape of numbers or instructions, while the multi-property instructions contemplated herein would be a fat tape in which the normal tape had been expanded with additional fields for read/writeable information. The program is then streamed and a fat reader head would read the instruction and the properties associated with it each time the program was executed. In addition to simply reading the tape, the fields could be modified as the program was executed. FIG. 6 depicts one manner contemplated by embodiments of the present invention for implementing the tracking and recording of activities and characteristics of individual codons as mentioned above. Referring now to FIG. 6, a multi-property codon information table 602 is shown for keeping track of the various items of information relating to the individual codons of a particular functional entity (e.g., programs, subroutines, etc.) This is evidenced by the existence of a codon number entry 604, containing the codon numbers for each instruction of the functional entity. In the example of FIG. 6, that entity is the exemplary temperature conversion program mentioned previously. The exemplary information that embodiments of the present invention envision can be tracked and recorded in the multi-property codon information table 602 for each codon in, e.g., a program, is now described. An "activity level" entry 606 indicates whether a codon is on or off. A value of 1 indicates "on" and a value of 0 indicates "off." Intermediate values suggest fuzzy or "probabilistic" activation. Specifically, given an activity level, the present invention has the capability of determining whether or not (based upon any number of chosen factors) to execute a particular codon. The activity level number, itself, in the multi-property codon information table 602, is envisioned to be indicative of the level at which the codon would execute (e.g., an activity level of 0.33 might mean that the codon would have a 1 in 3 chance of executing when encountered). It should be realized that the fault-tolerant aspects of the present invention facilitate implementation of this feature, since the act of suppressing the execution of a part of a program may lead to incorrect syntax. The fault tolerant aspect, and the potential encapsulation of all language entities as codons, enables any codon to be remapped to another. To facilitate this feature, a "substitution codon" entry 608 allows one codon to be substituted for another one. Thus, for example, it may be useful in certain modeling situations to change a "+" operation in a given program to a "*" operation. Embodiments of the present invention envision that this could be accomplished with regard to the exemplary temperature conversion program by placing the number "3" at the "Instruction 9" column in the substitute codon entry 608 of multi-property codon information table 602. This is because codon number 3 represents the "*" operation, as per master codon table 304. Thus, this would cause the initial "+" operation of Instruction 9 (note the "1" in codon number entry 604 at Instruction 9, which represents the "+" sign) to be executed as a operation. The present invention also envisions the implementation of global-type changes, where, e.g., all "+" operations in a program are changed to "*" operations. Of course, this feature may cause circularity (e.g., A->B, B->C, C->A). To prevent this from occurring, embodiments of the present invention contemplate checking whether any codons being redirected appear twice. If any do, the redirection is stopped at that point. The instances entry 610 indicates the number of times that a codon has been encountered during, e.g., a single execution of the program. Thus, embodiments of the present invention contemplate that, e.g., the number of times a codon has been encountered in a most recent execution of a program or subroutine (potentially at, e.g., a particular virtual machine level, discussed below) would be recorded by this instances entry 610. The total instances entry 612, on the other hand, is envisioned to keep track of the total number of times that a codon has been encountered during, e.g., the course of multiple executions and/or virtual machine levels of the program. It should be understood that the distinction generally mentioned above as to when an encounter of a codon should be registered in the instances entry 610 and/or total instances entry 612 is by way of example, and that any number or criteria can be used. Also, it is envisioned that additional types of "instances" entries can be added to multi-property codon information table 602 (or the "fat tape") that can be used for recording when certain codons are encountered under any number of different types of circumstances. An example describing a use of recording the "instances" and "total instances" within the multi-property codon information table 602, generally, and as compared with a contemplated use of the instances recorded in the master codon table 304 of FIG. 3 is given as follows. In this example, it is assumed that the master codon table 304 also has two instance-related columns, one called "instances" and the other "total instances." The former is envisioned to record all encounters with a particular type of codon in the course of a single execution of a program, while the latter records all encounters with the type of codon over multiple executions of one or more programs. In the example below, assume the existence of two programs:
The following sequence of the programs mentioned above is executed: GreetingsWorld, Greetings, Greetings, and Greetings. The table below shows how the properties in the multi-property codon information table 602 for "Hello" in Greetings and the master codon 304 table for "Hello" would change with respect to the "instances" entries and "total instances entries":
After the execution of GreetingsWorld, there was no change in the instances entry of the multi-property codon information table for "Hello" in Greetings because the program Greetings was not executed. But the master codon table for the codon "Hello" recorded two instances of "Hello." (Remember, GreetingsWorld's programs is "Hello", PrintText, "World," PrintText, "Hello," PrintText). Still with regard to FIG. 6, the errors and total errors entries (614 and 616, respectively) keep track of errors and total errors that occur with regard to a codon, with an envisioned distinction between the usage of the errors entry 614 versus total errors entry 616 being that as described above with regard to instances. As with the errors column 320 described with regard to the master codon table 304 of FIG. 3, recording any number of different types of errors is contemplated by embodiments of the present invention. The parent agent entry 618 contemplates containing an indication of the agent from which the codon was copied. More specifically, embodiments of the present invention contemplate that parent agent entry 618 can contain the codon number of the agent from which the codon was copied. In particular, this has application in a genetic programming environment where various portions of agents are cut and spliced together to form new agents. In such environments, it may be advantageous to readily determine the agent (i.e., the parent) from which a particular codon came from in the cutting and splicing process. In this way, when a new piece of code is formed in a genetic programming environment, one can readily determine the "parents" of that piece of code (and, in particular, the parent of any given codon). In addition, items such as an author identification number or TCP/IP address can be assigned to indicate who created the program. The parent index entry 620 indicates the position within a particular parent agent from which a codon was copied (in, e.g., a genetic programming environment). Thus, for example, if codon number 8 at Instruction 1 in the multi-property codon information table 602 had been in the fourth position (i.e., the fourth codon) in the program from whence it came, then the number "4" would be put into the parent index entry in the Instruction 1 column. User tag 622 and "other user-defined fields" 624 indicate that embodiments of the present invention contemplate that various other types of entries are envisioned, and thus other types of information can be tracked and recorded. One such additional entry envisioned by the present invention relates to keeping designated instructions together in the event that they achieve (or are dose to) some pre-defined score or threshold level from a fitness function in a genetic programming environment. Thus, for example, if a piece of code comprising Instructions 3-7 is found to have scored very well, an entry in the multi-property codon information table for each of those instructions can indicate that the codon should not be separated from the preceding and/or following codon, thus making sure that those instructions are kept together as the genetic programming process precedes (e.g., keep codons 2 through 6 together). Another use would be to record the IP address of a computer on which the codon was recorded to further help identify the codon's origin. In general, the concept of the multi-property codon information table 602 essentially allows programs to, in effect, become "read/write" in nature, in that the elements of the program (i.e., the codons) can be given dynamically changeable attributes. In this way, various items of information can be associated with each codon in a program and a population of initially identical programs can be individualized. Tracking and storing information about individual codons in the manner described above has numerous advantages. One such advantage relates to debugging programs. A programming language that uses the concepts described above would effectively contain a built-in debugging mechanism, since it can inherently track where the program logic has been (e.g., which codons have been encountered), and thus which aspect of a program (e.g., which subroutines) has been entered. Use of the concepts described above also allow two initially identical programs to differ from one other as they are used by different persons (assuming, of course, that each is not used in an identical manner). Thus, each program will have its own individual footprint. Also, as mentioned above, the parent agent and parent index entries (618 and 620, respectively) allow an added advantageous level of analysis in genetic programming environments. Another advantage of implementing a multi-property codon information table 602 as discussed above relates to protecting aspects of the virtual machine environment from potentially unwanted effects of genetic programming, while at the same time facilitating a reasonable amount of flexibility for the genetic programming process. More specifically, when programs are arbitrarily spliced and recombined in a fault tolerant environment, in which data and instructions are intraconvertible, it can be difficult, if not impossible, to predict the potential outcomes. Consequently, embodiments of the present invention envision that certain sensitive aspects of the virtual environment and host computer system need to be safeguarded and/or restricted to prevent undesirable and potentially destructive effects, and that direct access to the master codon table 304 is not permitted by the code that is being bred and executed. However, the aspects of the multi-property codon information table 602 are accessible, but as indicated above, the table relates only to a particular program and not the entire environment (so a potential negative result would not be as catastrophic). Thus, certain precautions are implemented while preserving much flexibility. Of course, it should be understood that the present invention envisions any number of specific ways and degrees to implement this concept. While the particular mechanism described above for keeping track of codon information utilizes a table format, it should be understood that the present invention contemplates that any number of other different types of structures can be used. For example, in order to stream programs or transfer information from machine-to-machine, certain properties must be attached, and structures are envisioned for use to accommodate this. In addition, any number of different types of information relating to the codons (i.e., in addition to and/or less than and/or different than the entries of multi-property codon table 602 described above) is contemplated by embodiments of the present invention. As indicated above, one of the features contemplated by embodiments of the present invention is the ability to treat programs as data and vice versa. For example, when a codon number is encountered, rather than looking it up in the master codon table 304 and executing it in accordance with its meaning in that table, the present invention has the ability to instead treat the codon number simply as data (i.e., as a numerical constant). As indicated above, an advantage to being able to treat programs as data is the enhanced ability to execute the programs in a genetic programming environment. Furthermore, treating programs as data also facilitates program self-analysis, self-modification and the ability for programs to readily create other programs. An example of this latter advantage is now shown with regard to FIG. 7. The example of FIG. 7 contains the labels "Arg1" and "Arg2". These labels are "keywords" and refer to the parameter space of an agent. (A number of exemplary keyword codons contemplated by embodiments of the present invention will be discussed below in conjunction with various figures and features.) Embodiments of the present invention contemplate that all agents can use these same terms to access and modify their parameters. The maximum number of agent parameters is contemplated to be a user-configurable element of the system. As a default, the size of the parameter space is set to equal the size of the stack space. When an agent that requires n parameters is invoked, the top n items on the active virtual machine stack are copied into a set of private registers, named "Arg1", "Arg2", . . . "ArgN" within the current virtual machine record. This standardization of parameter space and nomenclature is envisioned to enhance the "evolvability" of programs by increasing the overlap of terminology and data structures between two arbitrary programs. Referring now to FIG. 7, an agent called "ProductMaker" 706 is shown. This agent requires three parameters. In effect, the agent takes the first two parameters, both codon numbers of one-parameter functions, and creates a new agent having the name of the third parameter. The resultant one-parameter agent computes the product of the two one-parameter functions evaluated at a given value. In this particular example, the two arguments passed to ProductMaker are the codon numbers of FunctionA and FunctionB. FunctionA 702 multiplies arg1 by the constant "2." Thus, as indicated in 702, the first item in FunctionA is the variable "arg1", the second item is the number "2", and the third is the multiplication sign (thus, the format used is post-fix). Though comprised of codons itself, FunctionA also exists as a codon. Here, FunctionA has been assigned the codon number of 3482. (It is envisioned that such assigned numbers can be arbitrary, or can in some way be representative of the entity that they represent, e.g., all arithmetic operators are three digit codons that begin with the number "1". Similarly, when converting programs from text to codon representations, certain words can automatically be associated with specific codon types, for instance, any word enclosed in brackets is considered a "marker" codon, described below.) Similarly, FunctionB contains the codons arg1, the number "2," and the function "POW". "POW" is an abbreviation for the 2-parameter power operation, which raises the value of its first argument (arg1) to the power of the second argument (which, here, is "2"). FunctionB is, itself, also a codon, and in this example it has been assigned the codon number of 8113. The ProductMaker program 706 in this example thus creates a new program that multiplies FunctionA with FunctionB. The ProductMaker program 706, itself requires three input arguments (which are envisioned to come from an input stack). The first two arguments, each denoted here as "AgentCodon," represent the functions to be multiplied (in this example, Function A and FunctionB). The third argument (called "TextCodon") represents the name that a user wishes to give to the new program that is going to be created. In this particular example, the new program will simply be called "NewFunction." Program stack 710 as shown in FIG. 7 represents the contents of the program stack 710 after the ProductMaker program 706 has executed all its steps just prior to the step of actual defining the contents of a stack as an agent (i.e., objectizing those contents). Those steps that obtained that result will now be described. Referring to the ProductMaker program 706, in the first step, the codon "AsData" is encountered. This codon (which can be thought of as a toggle) indicates to embodiments of the present invention that the subsequently encountered codon should be translated as data rather than executed in normal fashion. Consequently, in Step 2, rather than executing the methods associated with Arg1 (which involves retrieving the current value of an active agent's first parameter register), the codon number of "arg1" is placed on program stack 710 at position 1. As can be seen from a block designated "assigned numbers" 708, "arg1" was assigned codon number 684. In Step 3, the codon "arg1" is again encountered, this time in normal operating mode. The value of the first argument passed to the agent, in this case the codon number of FunctionA (3482), is retrieved and placed on the stack 710 in position 2. Consequently, Steps 2 and 3 illustrate the distinction between treating codons as data and executing them. In this example, when the new program that ProductMaker is creating is executed, arg1 at position 1 will be what will ultimately grab the value necessary for input to FunctionA. In Step 4, again, the "AsData" codon is encountered, thus treating the codon in step 5 (i.e., "arg1") as data. Again, rather than executing "arg1", its codon number (i.e., 684) is placed on the stack 710 at position 3. In Step 6, "arg2" is encountered, and since it is not preceded by an AsData codon, it will be translated (i.e. executed) in normal mode. The execution of "arg2" is similar to "arg1", except that the second (rather than the first) argument is passed to the ProductMaker program from the input stack. Specifically, in this case, the codon number of FunctionB (8113), is placed on the program stack 710 at position 4. At Step 7, the "AsData" codon is again encountered so that, at Step 8, the multiplication operation will be treated as data. Thus, the codon number of the multiplication operation (230), seen from the assigned numbers block 708, will be placed on the program stack 710 at position 5. (It should be understood that the present invention also contemplates the potential toggling in the opposite direction, such that data can be treated as an executable program). At this point, it can be seen that positions 1 through 5 of the data stack contain the makings of a program where FunctionA (as defined above) and FunctionB (also defined above) will both receive a value in the form of arg1, wherein the two functions are then multiplied together. In other words, the new program created by ProductMaker can be interpreted as follows: (arg1*2)*(arg1^2). Referring back to the ProductMaker function of 706, Step 9 contains "arg3," which takes the third argument passed to the ProductMaker function. Here, that third argument is the text "New Function," which was previously encapsulated and assigned the object number 8866. This number is placed on the program stack 710 at position 6. Step 10 of ProductMaker indicates the number of arguments that NewFunction expects to receive (in this case, it expects one argument). Thus, the codon representation for 1 (i.e., codon number 809 in this example, as indicated at assigned numbers block 708) is placed at position 7. It should, of course, be understood that the number representing the number of arguments could itself have been yet another argument (e.g., arg4) in the event that ProductMaker allowed for FunctionA or FunctionB to have more than one input argument. In step 11, a "DefineAgent" codon is encountered (which in embodiments of the present invention is envisioned to have the effect of "encapsulating" the contents of the program stack 710) and assigns a new codon number (not shown) to represent the newly created agent in the master codon table 304 of FIG. 3. The present invention contemplates that there are any number of ways to actually call functions such as the ProductMaker one described. One exemplary syntactical form for doing this is as follows: {[FunctionA FunctionB] "New Function" ProductMaker}. In this example, placing FunctionA and FunctionB within the brackets causes them to be treated as data (i.e., causes their codon representation to be treated as data) which in the example above, enabled the codon number for FunctionA and FunctionB to be placed upon program stack 710. Other syntactical forms contemplated by embodiments of the present invention include the following:
The present invention envisions that agents also can utilize a codon named "pop", which returns the top item from the calling (or parent) agent's input stack. This allows an agent to access more data than it was originally provided and to utilize an indefinite number of arguments. Such functionality is useful for implementing functions such as C's printf( ). The ability to treat programs as data and vice versa allows the present invention the ability to perform many different types of tests and data manipulations not possible using conventional methodologies. Embodiments of the present invention envision the inclusion of a "transform" codon that takes a given number and, if a corresponding codon exists, carries out the procedures associated with that codon (i.e., it executes the operation of the codon associated with that number). The "transform" operator is metaphorical to invoking a function through a pointer. The C/C++ language provides this facility, but not in a fault tolerant fashion; the pointer may refer to an address that does not contain a valid function. To provide pointer safety, a list of all function addresses must be available at run-time (but this is currently not a provision of C/C++). However, the "transform" operator of the present invention is "safe" or fault tolerant, since all prospective transform operations can be checked against the master codon table for validity. Moreover the fault tolerance is such that, in a preferred embodiment, any and all mathematical functions in the environment can be performed on codon numbers and the result transformed into an operation without risk of failure. Additionally the current environment may be extensible at run-time, allowing new instructions to be added. Other key codons similar to "transform" which provide conversion and execution of data and are contemplated by embodiments of the present invention include:
The present invention extends the space of programs that can be evolved by the Genetic Programming process. Conventional environments require that all programs included in a Genetic Programming population be grammatically correct. In effect, this is akin to knowing, in form, what each program is going to do in advance of being run. With the introduction into the environment of codons such as "transform" which convert data to instructions, and generally with the utilization of functions which can, for example, take an indeterminate number of arguments (discussed below) as well as "junk" instructions, the invention facilitates the execution of programs whose output cannot be known in advance. Such ungrammatical, indeterminate programs can be included in the genetic programming process when run on the present invention, thus broadening the space of potential solutions. A simple example of the genetic programming process is now described with regard to FIG. 8. Referring to FIG. 8, two starting programs, one termed a "mother" program 802, and the other a "father" program 806, are shown. Each of these programs has 10 codons associated with them. In this example, during the genetic programming process, a piece is taken from the mother program 802 in the form of piece 804. Similarly, a piece is also taken from father program 806 in the form of piece 808. These two pieces (i.e., constructs, where the term simply refers to one or more codons), 804 and 808, are then spliced together to form a child piece 810. This child piece can then be tested in accordance with a fitness function as per the general principles of genetic programming, or for any other purpose. Although the example manipulates programs in their codon form, as a result of the intraconvertiblity of text and codon forms, it is also possible to perform genetic operations on the programs in their textual representation. In the latter case, the genetic operations are simply text operations (e.g. string concatenations). In either case, during the cutting and splicing phase, the representations are treated as data. After breeding, the resultant data is, if necessary, converted from text to codons and again treated as an executable program. The fault tolerant aspects of the present invention enable innumerable ways of rearranging and recombining pieces of code to create new programs. As indicated previously, it is envisioned that any element (e.g., function, operator, constant, etc.) of a program in the environment can be a codon. Due to fault tolerance, codons can be deleted, inserted, swapped, and re-ordered without impinging the ability to execute. Thus, the invention inherently facilitates Genetic Programming. The multi-property codon information table 602 (discussed above with regard to FIG. 6) automatically tabulates a variety of information that is useful in Genetic Programming and biological simulation. Specifically, as mentioned previously, multi-property codon information table 602 contains a parent agent entry 618 and parent index entry 620 that is available for each codon in a program. Consequently, each codon in the child program 810 is envisioned to have an associated entry indicating where it came from (i.e., indicating it's parent program). Thus, for example, the third codon (e.g., instruction) in child 810 (i.e., the codon whose numeric value is 10) would have in its parent agent entry 618 the codon number associated with the "mother" program 802, since that is where the "10" came from. Then, in parent index entry 620 of the multi-property codon information table 602, that entry for the aforementioned codon in the third position would be 3, since it came from the third position of the "mother" program 802. Thus, the number 3 would be placed in the appropriate parent agent entry for the child program 810. Similarly, for the tenth codon of child program 810 (here, with a value of 1), the parent agent entry 618 for that codon would be the codon number that represented the father program 806, while the parent index number 620 would be the number 6 (since it came from the sixth position of the "father" program 806). While the point in the mother program 802 and father program 806 at which a piece of the program is obtained may be arbitrary (e.g., determined by a random number generator), embodiments of the present invention contemplate that the length of code (i.e., number of codons) taken from the mother program 802 and father program 806 can depend upon the functionality of the code. More specifically, it may be desirable to obtain portions of code that produce some complete (or quasi-complete) functional results, or that contain codons of a specific type (e.g. codons that create computer graphics). This strategy will in many cases increase the chances that the resultant spliced code will score higher in a "fitness" test where a solution to a particular problem is sought, or generate output of interest to a designer conducting directed evolution. In conventional Genetic Programming frameworks, wherein all programs are grammatically correct, determining logical units is simply a matter of parsing. This method is not always effective in the more general and fault tolerant framework provided by the present invention. A more effective procedure for determining logical units when working with ungrammatical, indeterminate, "junky" programs, is to evaluate the program in a step-wise fashion and to note the codons contributing to the result at each step. Besides parsing and step-wise evaluation, embodiments of the present invention provide another facility for obtaining a functional portion of a parent program in a genetic programming environment. The method involves utilizing dependency information such as that found in the stack tag table of FIG. 5, and particularly the dependency column 514. Specifically, this dependency column 514 can be used to indicate how many codons of a parent program to obtain so that a functional unit is procured. As an example, and referring back to FIG. 5, it should first be observed that the program represented in this Figure by the Origin Codon column 512 is the mother program 802 of FIG. 8 (see program 404 of FIG. 4 for the codon correlation). In any event, if through the use of, e.g., a random number generator, an indication exists that the program represented at Origin Codon column 512 should be split (or copied) at the "multiply" operation at Step 7, then the present invention would note that the corresponding entry in the dependency column 514 at Step 7 is a "1." This would be an indication to grab the codons from Step 1 through 7, since they make up a functional unit (and yield a functional result). Similarly, if it is desired to make the split at the "divide" operation of Step 6, then since the dependency column contains a "4" at that step, the codons from Steps 4, 5 and 6 would be taken, and spliced with other codons to form the new child 810. This works, because the dependency column 514 indicates the earliest step from which the value of that current step depended, and thus obtaining the codons from that earliest step forward will produce a unit of codons likely to yield some kind of result. It should be understood that the scheme for determining what codons to use in a genetic algorithm environment as described above is by way of example, and that the present invention contemplates any number of other techniques for determining where in a parent program a cut (or copy) is to be made, and how many codons are to be used from each parent in creating the new child program. Another example of advantages of the fault tolerance of the present invention with relation to genetic programming is that the present invention has the ability to explore and develop solutions from a larger search space than the space permitted by conventional genetic programming environments. The exemplary code written below generates a random integer between 1 and 10 and assigns it to variable n. It then generates n random integers between 0 and 1 and sums the result. An exemplary manner for doing this using the present invention is as follows:
This program could not be executed in conventional genetic programming environments because of the uncertainty of the numbers that should be summed. Specifically, codons such as "sum" would not be possible, since conventional environments would require, e.g., many conditional statements for taking into account the number of possible integers that may need to be summed, depending upon the number generated by the random number generator. In particular, because of the stack-based framework contemplated by embodiments of the present invention, all of the numbers on the stack (no matter how many there may be, within certain physical or architectural limits) can simply be summed. Alternatively, the present invention can also indicate that it will take exactly n numbers from the stack, indicative e.g., of the number of random numbers generated. These types of codons (termed variadic and argomatic, respectively) will be mentioned further below. Thus, the present invention allows for keyword codons such as "Sum" to be variadic (i.e., it is not known how many inputs into Sum will exist at the time of execution). As noted previously, the present invention contemplates a fault tolerant environment, where a program having incorrect syntax will be executed and not crash the system (although such a program may or may not yield a worthwhile result, or even any result at all). However, the present invention does envision instances where it would still be useful for an otherwise meaningless sequence of codons to yield at least some result. Such a situation can be envisioned in a genetic programming environment where, in the course of creating and testing pieces of code that have been spliced together, it is still useful to ensure that there is at least some type of output against which fitness can be tested. In other words, if the results of executing the program were allowed to yield no result at all, then there would (typically) be absolutely no chance at all that the program would score well on the fitness function, whereas forcing at least some result to occur would allow for at least some (albeit typically slim) chance that it may score favorably, or that it's performance can be differentiated from others in a population of programs. In view of the above, the present invention contemplates a scheme for "grammar repair," where the grammar of a program to be executed is repaired. One way that the present invention contemplates implementing this is now described with regard to FIG. 9. Referring to FIG. 9, the four-step program at 902 would, in conventional programming languages, be considered syntactically incorrect. Specifically, Step 1 is an addition operation, which expects to receive two arguments to be added together. However, in program 902, there are no such arguments available. Similarly, the cosine operation of Step 2 would expect one argument (but has none), and the multiplication operation of Step 3 would expect two arguments (and has none). Regarding the exemplary program 902 described above, embodiments of the present invention contemplate that a user can, if desired, initiate a grammar repair feature to allow the program 902 to yield some meaningful output. When this feature is implemented, then upon encountering the addition operation of Step 1, the present invention would recognize an insufficient arguments situation. Embodiments of the present invention then envision that, e.g., a grammar repair codon is used to attempt to generate meaningful arguments. A simple implementation would be to generate the arguments randomly. In the example of FIG. 9, the two numbers generated are 4 and 7, which are placed upon the stack as indicated at 904. These two numbers are then added together in accordance with the addition operation of Step 1 in program 902 (yielding the number 11), which is then placed on the stack at Step 2. Still referring to program 902, Step 2 is the cosine operation which, requiring one argument as input, receives the 11 from stack 904, and yields 0.9816, as shown at Step 3 of stack 904. Then, upon encountering Step 3 of the program 902, which is the multiplication operation, the present invention looks for two arguments. Finding only one (i.e., the 0.9816), the present invention's grammar repair generates a second argument (which in this example is the number 3) and places it on the stack at step 3 as shown. The multiplication can then commence, yielding the number 2.9448. Thus, the program 902 can then be tested for fitness using the numbers generated, as shown on stack 904. It may very well be that the results are unlikely to score well with the fitness function, but at least there will be some result to work with. Embodiments of the current invention contemplate that the specific type of grammar repair function used can differ depending upon the specific codon(s) encountered. Furthermore, it is envisioned that the grammar repair function can vary with each codon and can be set at any time. In addition to the data mode and program mode described above, the present invention also contemplates embodiments where a program can be running in a "silent mode" such that, when the silent mode is turned on, codons that would otherwise be executed are simply skipped over in the sense that no operation is performed, i.e., no result is generated, only the codon status and machine information (e.g., various counters) is updated, and there will be no return values. This feature will be further explained with regard to FIG. 10. Re | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
