Automated bot development system
7039654
Abstract
An automated system (100) and method for developing Complete Context™ Bots (30) for an organization. After extracting data from existing narrowly focused systems, mission measures and organization levels are defined for one or more organizations. The elements, factors and risks that contribute to mission measure performance by organization level and organization are systematically defined and stored in a ContextBase (60) using up to six context layers. ContextBase (60) information is extracted for specified combinations of context layers, organization levels and organizations as required to produce complete context frames that are used to support simulations of bot performance under a variety of scenarios. The program instructions that will maximize bot performance under the forecast scenarios are identified. After this programming is transferred to the Complete Context™ Bot (30), it is activated.
Claims
The invention claimed is:
Description
CROSS REFERENCE TO RELATED APPLICATIONS
In many cases these deficiencies in bot background data are a product of the limitations of the narrowly focused management systems (hereinafter, narrow systems) like customer relationship management and supply chain management systems that most organizations use to manage their day to day operations. The second deficiency of currently available bots is that they fail to strike a balance between optimizing short term impact and long term performance. As Jack Welch, the retired CEO of General Electric said "any fool can optimize short term results and any fool can optimize long term results. The real trick is striking a balance between the two." The third deficiency of all known bots is that they do not have the ability to identify new information that is relevant to the decisions/recommendations being made. Said another way, these bots can optimize their decisions and/or recommendations within the box that has been defined by the user but they cannot change the box as required to improve the value of the decisions/recommendations being made. The shortcomings of existing bots can be summarized by saying that bots do not have the complete context required to optimize short term results, they do not have the complete context required to balance short term results against long term performance and they do not have the ability to independently define the complete context that should guide their performance. It is clear from the preceding discussion that bots need to have the ability to define, obtain and process complete context information if they are ever to achieve the level of market acceptance that has been widely expected for over twenty years (again, using our expanded definition of bots that includes robots). A critical first step in defining a new approach to solving the problem of "getting the complete context to the right bot" is to clearly define the terms: data, information, context and knowledge. Data is anything that is recorded. 
This includes records saved in a digital format and data stored using other means. A subset of the digital data is structured data such as transaction data and data stored in a database for automated retrieval. Data that is not structured is unstructured data. Unstructured data includes data stored in a digital format and data stored in some other format (e.g., paper or microfilm). Information is data plus context of unknown completeness. Knowledge is data plus complete context. Complete context is defined as: all the information relevant to the decision being made using the data at a specific time. If a decision maker has data and the complete context, then providing additional data or information that is available at the time the decision is being made will not change the decision that was made. If additional data or information changes the decision, then the decision maker had "partial context". We will use an example to illustrate the difference between data, partial context, complete context and knowledge. The example is shown in Table 1.
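The distinction between partial and complete context drawn above can be sketched in a few lines of Python. This is only an illustration of the definition, not part of the described system: the function `has_complete_context`, the decision rule `reorder_decision` and all field names (`units_on_hand`, `daily_usage`, `supplier_at_risk`, `forecast_weather`) are hypothetical. As a simplification, the check tries each additional item on its own and then all items together, rather than every possible combination.

```python
def has_complete_context(decide, data, context, extra):
    """Return True if no additional available information changes the decision.

    decide:  function (data, context) -> decision
    context: information the decision maker already has
    extra:   additional information available at decision time
    """
    baseline = decide(data, context)
    # Try each additional item on its own.
    for key, value in extra.items():
        if decide(data, {**context, key: value}) != baseline:
            return False  # the decision changed, so the context was partial
    # Try all additional items together.
    return decide(data, {**context, **extra}) == baseline


def reorder_decision(data, context):
    """Hypothetical rule: order if the supplier is at risk or stock cover is short."""
    if context.get("supplier_at_risk", False):
        return "order now"
    days_of_cover = data["units_on_hand"] / context.get("daily_usage", 1)
    return "order now" if days_of_cover < 10 else "wait"


data = {"units_on_hand": 500}

# Usage information alone is partial context: supplier news changes the decision.
partial = has_complete_context(
    reorder_decision, data,
    context={"daily_usage": 20},
    extra={"supplier_at_risk": True},
)

# Once the supplier risk is known, the remaining extra item changes nothing.
complete = has_complete_context(
    reorder_decision, data,
    context={"daily_usage": 20, "supplier_at_risk": True},
    extra={"forecast_weather": "clear"},
)
```

In the first call the decision flips from "wait" to "order now" when the extra item is added, which is the situation the text calls partial context.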
The example in Table 1 illustrates that there is a clear difference between having data with partial context and having knowledge. Data with partial context leads to one decision while data with complete context creates knowledge and leads to another completely different decision. The example also reinforces the prior discussion regarding the reasons that so many firms are not realizing the return they expect from their investments in bots. Virtually every bot development system being sold today processes and analyzes data within the narrow silo defined by the portion of the organization it supports. As a result, these systems can not provide bots with the complete context required to turn data into knowledge. Another limitation of all known bot development systems is their complete reliance on structured historical data. The problem with this is that not all data are stored and that most of the data that is stored is stored in an unstructured format that is difficult to process. The most common estimate is that 80% of the data that is stored digitally is stored in an unstructured format. A number of products are being developed to help structure unstructured digital data. The system of the present invention is capable of accepting input from these systems. The system of the present invention also has the ability to structure and process unstructured: text data, video data, geo-coded data and web data on its own. This leaves the problem of data that has not been stored in any system as an area needing further development. While much of the data that has not been stored may not be useful for performance management and bot development, the data that resides with subject-matter experts is potentially very valuable. 
In fact, as the world moves into an increasingly uncertain environment with a growing number of non-traditional threats and increasingly volatile weather patterns, the need to rely on information from subject-matter experts is expected to increase dramatically. A method for systematically incorporating data from subject-matter experts into bot development systems is clearly needed. However, to be successful, this method needs to overcome a few potential problems. While subject-matter experts have a great deal of knowledge about a particular field, it is more likely than not that:
In light of the preceding discussion, it is clear that it would be desirable to develop methods and systems that could define the complete context required for effectively and efficiently programming bots. In short, the new methods and systems should help organizations improve their performance by developing, storing, retrieving and applying complete context information for use in developing sophisticated bots to complete tasks and develop recommendations in an automated fashion.
SUMMARY OF THE INVENTION
It is a general object of the present invention to provide a novel, useful system that develops, analyzes, stores and applies complete context information for use in developing robust, productive bots in an automated fashion. This new system overcomes the limitations and drawbacks of the prior art that were described previously. Processing in the automated bot development system (100) is completed in three steps: The first step in the novel method for bot development involves using data provided by existing narrow systems and the nine key terms described previously to define mission measures for each organization level the bots will be supporting. As part of this processing, data from the world wide web, unstructured data, geo-coded data, and video data are processed and made available for analysis. The automated indexation, extraction, aggregation and analysis of data from the existing, narrow computer-based systems significantly increases the scale and scope of the tasks that can be completed by the bots. This innovation also promises to significantly extend the life of the narrow systems that would otherwise become obsolete. The system of the present invention is capable of processing data from the "narrow" systems listed in Table 2.
The extracted narrow system information is identified separately for each of the different subsets of the organization. Unstructured data is also captured for processing and later use in the automated bot development system (100) as part of this process. For simplicity, we will refer to the collection of different subsets of an organization that can be supported by the system for automated bot development as organization levels. Managers use the extracted narrow system data to define quantitative mission measures for each organization level as part of the first step of processing. The quantitative mission measures that are initially created using the extracted narrow system data from each organization can take any form. For many of the lower organization levels (combinations being the highest level and an element being the lowest organization level) the mission measures are simple statistics like the percentage achieving a certain score, the average time to completion and the ratio of successful applicants to failures. At higher levels more complicated mission measures are generally used. For example, Table 4 shows a three part mission measure for a medical organization mission: patient health, patient longevity and financial break even. The system of the present invention provides several other important features, including:
After the user defines the mission measures and the data available for processing is identified, processing advances to the second stage of processing, where mission-oriented context layers for each organization level are developed and stored in a ContextBase (60). In the final processing step the context layers and organization levels are combined as required to develop context frames. The context frames are used to drive simulations that identify the program parameters that will maximize mission measure performance. The system of the present invention is the first known system with the ability to systematically develop the context required to support the comprehensive analysis of mission performance and turn data into knowledge. Before completing the summary of system processing, we will provide more background regarding complete context, context layers and the ContextBase (60). The complete context for evaluating and optimizing performance can contain up to six distinct types of information:
The ability to rapidly create context frames can be used to analyze a number of different operating scenarios including an alliance with another organization or a joint exercise between two organizations. For example, combined context frames could be created to support Company A and Company B in analyzing the short and long term implications of a joint exercise as shown in Table 3. It is worth noting at this point that the development of a combination frame is most effective when the two organizations share the same mission measures.
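As a rough illustration of the combination frame described above, the following Python sketch merges the layer contents of two context frames and records which mission measures the two organizations share. The `ContextFrame` class, the `combine_frames` function and the sample records are hypothetical; the layer names follow the six context layer tables of the ContextBase (60), and the actual frame structure used by the system is not specified here.

```python
from dataclasses import dataclass, field

# Layer names taken from the six context layer tables of the ContextBase (60).
LAYERS = (
    "physical", "tactical", "instant_impact",
    "organization", "mission", "social_environment",
)


@dataclass
class ContextFrame:
    organization: str
    mission_measures: frozenset
    layers: dict = field(default_factory=dict)  # layer name -> list of records


def combine_frames(a, b):
    """Build a combination frame; records stay tagged with their source organization."""
    merged = {}
    for layer in LAYERS:
        merged[layer] = (
            [(a.organization, r) for r in a.layers.get(layer, [])]
            + [(b.organization, r) for r in b.layers.get(layer, [])]
        )
    return ContextFrame(
        organization=f"{a.organization}+{b.organization}",
        # Only measures both organizations use can guide a joint analysis.
        mission_measures=a.mission_measures & b.mission_measures,
        layers=merged,
    )


company_a = ContextFrame("Company A", frozenset({"readiness", "cost"}),
                         {"physical": ["trucks"]})
company_b = ContextFrame("Company B", frozenset({"readiness"}),
                         {"physical": ["ships"]})
combined = combine_frames(company_a, company_b)
```

An empty intersection of mission measures would signal the case the text warns about, where a combination frame is less effective because the organizations do not share a measure.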
Using the context frames from the combined organizations to guide both tactical (short-term) and strategic analysis and decision making would allow each organization to develop plans for achieving a common goal from the same perspective (or context) while still maintaining independence. This capability provides two distinct advantages over traditional bot development applications that:
Before moving on to better define context, it is important to re-emphasize the fact that the six layers of context we have defined can be used to support the development of bots that will support management and analysis in a wide variety of fields. In fact, the system of the present invention will support the development of Complete Context™ Bots (30) for any organization with a quantifiable mission. For example, Table 4 illustrates the use of the six layers in analyzing a sample business context and a sample medical context.
Our next step in completing the background information is to define each context layer in more detail. Before we can do this we need to define nine key terms that we will use in defining the layers: mission, element, resource, asset, agent, action, commitment, priority and factor.
6. Action—consumption, production, acquisition or transfer of resources that support organization mission—examples: sale of products and development of a new product (actions are a subset of events which include anything that is recorded);
The automated bot development system (100) develops a complete picture of how the organization is performing, saves it in the ContextBase (60), divides the picture into frames and then re-combines the frames as required to provide the detailed information regarding the slice of the organization being supported by a given bot. These details are included in the context frames that are produced using information in the ContextBase (60). The context frames are then used in simulations that define the program instructions that will be given to each bot. Developing the complete picture first, before dividing it and recombining it as required to produce context frames, enables the system of the present invention to reduce IT infrastructure complexity by an order of magnitude while dramatically increasing the ability to develop robust, productive bots. Because the ContextBase (60) is continually updated by a "learning system", changes in organization context are automatically captured and incorporated into the processing and analysis completed by the automated bot development system (100). The mission-centric focus of the ContextBase (60) provides four other important benefits. First, by directly supporting mission success the system of the present invention guarantees that the ContextBase (60) will provide a tangible benefit to the organization. Second, the mission focus allows the system to partition the search space into two areas with different levels of processing: data that is known to be relevant to the mission and data that is not thought to be relevant to the mission. The system does not ignore data that is not known to be relevant; however, it is processed less intensely. Third, the processing completed in ContextBase (60) development defines a complete ontology for the organization. 
As detailed later, this ontology can be flexibly matched with other ontologies as required to interact with other bots from organizations that have organized their information using a different ontology. It also gives the bots the ability to extract data from the semantic web in an automated fashion. Finally, the focus on mission also ensures the longevity of bots developed using the context stored in the ContextBase (60), as organization missions rarely change. For example, the primary mission of each branch of the military has changed very little over the last 100 years while the assets, agents, resources and the social environment surrounding that mission have obviously changed a great deal. The same can be said for almost every corporation of any size as almost all of them have a shareholder value maximization mission that has not changed from the day they were founded. The differences between the mission-oriented approach and a more generic approach to knowledge management are summarized in Table 5.
Another benefit of the novel system for automated bot development is that it can be used for creating bots that support the performance of any entity with a quantifiable mission. It is most powerful when used to support an organization with different levels where each of the levels is linked together with mission measures that are in alignment. Before going further it is important to compare the six context layers we have defined for our mission-oriented ContextBase (60) with more traditional designations used for capturing and organizing knowledge. FIG. 13 shows the traditional knowledge classifications. FIG. 14 shows the knowledge classifications used by the system of the present invention. A comparison of the two figures shows that the data classification and storage schema used by the system of the present invention can be readily mapped to the "traditional" classifications. The six layer classification scheme shown in FIG. 14 gives the novel system of the present invention the ability to answer the "why" questions related to mission performance while developing robust, productive bots. In addition to providing the ability to systematically analyze and develop Complete Context™ Bots (30) that will help improve mission performance, the automated bot development system (100) provides the ability to create robust models of the factors that cause action, event and instant impact levels to vary. This capability is useful in developing the programs to improve bot performance. One of the main reasons for this is that many mission measures relate to the long term impact of actions, events and instant impacts on organization performance. This capability also enhances the ability of the system to program bots that focus on optimizing actions and impacts. To facilitate its use as a tool for improving performance, the system of the present invention also produces reports in formats that are graphical and highly intuitive. 
By combining this capability with the previously described capabilities for flexibly defining robust performance measures, identifying complete context information and supporting the development of robust, productive bots, the automated bot development system (100) gives executives and managers the tools they need to dramatically improve the performance of any organization with a quantifiable mission.
BRIEF DESCRIPTION OF DRAWINGS
These and other objects, features and advantages of the present invention will be more readily apparent from the following description of an embodiment of the invention in which:
FIG. 1 is a block diagram showing the major processing steps of the present invention;
FIG. 2 is a diagram showing the application layer portion of the software architecture of the present invention;
FIG. 3 is a diagram showing the tables in the application database (50) of the present invention that are utilized for data storage and retrieval during the processing in the innovative system for automated bot development;
FIG. 4 is a diagram showing the tables in the ContextBase (60) of the present invention that are utilized for data storage and retrieval during the processing in the innovative system for Complete Context™ Bot (30) development;
FIG. 5 is a block diagram of an implementation of the present invention;
FIG. 6A, FIG. 6B, FIG. 6C and FIG. 6D are block diagrams showing the sequence of steps (200) in the present invention used for specifying system settings, preparing data for processing and defining the mission measures;
FIG. 7A, FIG. 7B, FIG. 7C, FIG. 7D and FIG. 7E are block diagrams showing the sequence of steps (300) in the present invention used for creating a mission-oriented ContextBase by organization and organization level;
FIG. 8 is a block diagram showing the sequence of steps (400) in the present invention used in defining context frames, programming bots and printing reports;
FIG. 9 is a diagram showing the data windows that are used for receiving information from and transmitting information via the interface (700);
FIG. 10 is a sample report showing the efficient frontier for organization mission measure and the current position of organization XYZ relative to the efficient frontier;
FIG. 11 is a diagram showing how the automated bot development system (100) can be integrated with a business process integration platform;
FIG. 12 is a block diagram showing the relationship between different organization levels;
FIG. 13 is a diagram showing the traditional classification schema for knowledge management; and
FIG. 14 is a block diagram showing the six layer classification schema of the present invention.
DETAILED DESCRIPTION
FIG. 1 provides an overview of the processing completed by the innovative system for automated bot development. In accordance with the present invention, an automated system (100) and method for developing a mission-oriented ContextBase (60) that contains the six context layers for each mission measure by organization and organization level is provided. Processing starts in this system (100) when the data extraction portion of the application software (200) extracts data from an organization narrow system database (5); optionally, a partner narrow system database (10); an external database (20); and the World Wide Web (25) via a network (45). The processing completed by the system (100) may be influenced by a user (20) or a manager (21) through interaction with a user-interface portion of the application software (700) that mediates the display, transmission and receipt of all information to and from browser software (800) such as the Netscape Navigator® or the Microsoft Internet Explorer® in an access device (90) such as a phone, personal digital assistant or personal computer where data are entered by the user (20). While only one database of each type (5, 10 and 20) is shown in FIG. 
1, it is to be understood that the system (100) can process information from all narrow systems listed in Table 2 for each organization being supported. In the embodiment described below, all functioning narrow systems within each organization will provide data to the system (100) via the network (45). It should also be understood that it is possible to complete a bulk extraction of data from each database (5, 10 and 20) and the World Wide Web (25) via the network (45) using peer to peer networking and data extraction applications. The data extracted in bulk could be stored in a single datamart, a data warehouse or a storage area network where the analysis bots in later stages of processing could operate on the aggregated data. A virtual database could also be used that would leave all data in the original databases where it could be retrieved as needed for calculations by the analysis bots over a network (45). The operation of the system of the present invention is determined by the options the user (20) and manager (21) specify and store in the application database (50) and the ContextBase (60). As shown in FIG. 3, the application database (50) contains a system settings table (140), a bot date table (141) and a Thesaurus table (142). As shown in FIG. 
4, the ContextBase (60) contains tables for storing extracted information by context layer including: a mission measures table (170), a physical layer table (171), a tactical layer table (172), an instant impact layer table (173), an organization layer table (174), a mission layer table (175), a structured data table (176), an internet linkage table (177), a video data table (178), a social environment layer table (179), a text data table (180), a geo data table (181), an ontology table (182), a report table (183), an element definition table (184), a factor definition table (185), an event risk table (186), a scenarios table (187), an event model table (188), an impact model table (189), a context frame table (190) and a simulations table (191). The ContextBase (60) can exist as a database, datamart, data warehouse, a virtual repository or storage area network. The system of the present invention has the ability to accept and store supplemental or primary data directly from user input, a data warehouse or other electronic files in addition to receiving data from the databases described previously. The system of the present invention also has the ability to complete the necessary calculations without receiving data from one or more of the specified databases. However, in the embodiment described herein all required information is obtained from the specified data sources (5, 10, 20 and 25) for each organization, organization level and organization partner. As shown in FIG. 5, an embodiment of the present invention is a computer system (100) illustratively comprised of a user-interface personal computer (110) connected to an application-server personal computer (120) via a network (45). The application-server personal computer (120) is in turn connected via the network (45) to a database-server personal computer (130). 
The user-interface personal computer (110) is also connected via the network (45) to an Internet browser appliance (90) that contains browser software (800) such as Microsoft Internet Explorer or Netscape Navigator. The database-server personal computer (130) has a read/write random access memory (131), a hard drive (132) for storage of the application database (50) and the ContextBase (60), a keyboard (133), a communication bus (134), a display (135), a mouse (136), a CPU (137) and a printer (138). The application-server personal computer (120) has a read/write random access memory (121), a hard drive (122) for storage of the non-user-interface portion of the enterprise section of the application software (200, 300 and 400) of the present invention, a keyboard (123), a communication bus (124), a display (125), a mouse (126), a CPU (127) and a printer (128). While only one client personal computer is shown in FIG. 5, it is to be understood that the application-server personal computer (120) can be networked to fifty or more client, user-interface personal computers (110) via the network (45). The application-server personal computer (120) can also be networked to fifty or more server personal computers (130) via the network (45). It is to be understood that the diagram of FIG. 5 is merely illustrative of one embodiment of the present invention as the system of the present invention could reside in a single computer or be supported by a computer grid. The user-interface personal computer (110) has a read/write random access memory (111), a hard drive (112) for storage of a client database (49) and the user-interface portion of the application software (700), a keyboard (113), a communication bus (114), a display (115), a mouse (116), a CPU (117) and a printer (118). The application software (200, 300 and 400) controls the performance of the central processing unit (127) as it completes the calculations required to support automated bot development. 
In the embodiment illustrated herein, the application software program (200, 300 and 400) is written in a combination of Java and C#. The application software (200, 300 and 400) can use Structured Query Language (SQL) for extracting data from the databases and the World Wide Web (5, 10, 20 and 25). The Complete Context™ Bots (30) can use DAML Query Language (DQL) for interacting with bots from other organizations. The user (20) and manager (21) can optionally interact with the user-interface portion of the application software (700) using the browser software (800) in the browser appliance (90) to provide information to the application software (200, 300 and 400) for use in determining which data will be extracted and transferred to the ContextBase (60) by the data bots. User input is initially saved to the client database (49) before being transmitted to the communication bus (124) and on to the hard drive (122) of the application-server computer via the network (45). Following the program instructions of the application software, the central processing unit (127) accesses the extracted data and user input by retrieving it from the hard drive (122) using the random access memory (121) as computation workspace in a manner that is well known. The computers (110, 120, 130) shown in FIG. 5 illustratively are personal computers or workstations that are widely available. Typical memory configurations for client personal computers (110) used with the present invention should include at least 1028 megabytes of semiconductor random access memory (111) and at least a 200 gigabyte hard drive (112). Typical memory configurations for the application-server personal computer (120) used with the present invention should include at least 5128 megabytes of semiconductor random access memory (121) and at least a 300 gigabyte hard drive (122). 
Typical memory configurations for the database-server personal computer (130) used with the present invention should include at least 5128 megabytes of semiconductor random access memory (131) and at least a 750 gigabyte hard drive (132). Using the system described above, data is extracted from the narrowly focused enterprise systems, external databases and the world wide web as required to develop a ContextBase (60), create context frames and program Complete Context™ Bots (30). Before going further, we need to define a number of terms that will be used throughout the detailed description of an embodiment of the automated bot development system (100):
Real options are generally supported by the elements of performance of an organization;
As discussed previously, the automated bot development system (100) completes processing in three distinct stages. As shown in FIG. 6A, FIG. 6B, FIG. 6C and FIG. 6D the first stage of processing (block 200 from FIG. 1) extracts data, defines mission measures and prepares data for the next stage of processing. As shown in FIG. 7A, FIG. 7B, FIG. 7C, FIG. 7D and FIG. 7E the second stage of processing (block 300 from FIG. 1) develops and then continually updates the mission-oriented ContextBase (60) by organization and organization level. As shown in FIG. 8, the third and final stage of processing (block 400 from FIG. 1) prepares context frames for use in simulations, completes simulations and uses the results of the simulations to define programs for Complete Context™ Bots (30). The third stage of processing can also prepare and print reports. If system processing is continuous, then the processing described above is continuously repeated.
Mission Measure Specification
The flow diagram in FIG. 6A, FIG. 6B, FIG. 6C and FIG. 6D details the processing that is completed by the portion of the application software (200) that establishes a virtual database for data from other systems that is available for processing, prepares unstructured data for processing and accepts user (20) and manager (21) input as required to define the mission measures for each organization level. As discussed previously, the system of the present invention is capable of accepting data from all the narrowly focused systems listed in Table 2. Data extraction, processing and storage are completed by organization and organization level. Operation of the system (100) will be illustrated by describing the extraction and use of structured data from a narrow system database (5) for supply chain management and an external database (20). 
A brief overview of the information typically obtained from these two databases will be presented before reviewing each step of processing completed by this portion (200) of the application software. Supply chain systems are one of the seventy plus narrow systems identified in Table 2. Supply chain databases are a type of narrow system database (5) that contain information that may have been in operation management system databases in the past. These systems provide enhanced visibility into the availability of resources and promote improved coordination between organizations and their suppliers. All supply chain systems would be expected to track all of the resources ordered by an organization after the first purchase. They typically store information similar to that shown below in Table 7.
External databases (20) are used for obtaining information that enables the definition and evaluation of elements of performance, external factors and event risks. In some cases, information from these databases can be used to supplement information obtained from the other databases and the Internet (5 and 10). In the system of the present invention, the information extracted from external databases (20) includes the data listed in Table 8.
System processing of the information from the different databases (5, 10 and 20) and the World Wide Web (25) described above starts in a block 202, FIG. 6A. The software in block 202 prompts the user (20) via the system settings data window (701) to provide system setting information. The system setting information entered by the user (20) is transmitted via the network (45) back to the application-server (120) where it is stored in the system settings table (140) in the application database (50) in a manner that is well known. The specific inputs the user (20) is asked to provide at this point in processing are shown in Table 9.
The system settings data are used by the software in block 202 to establish organization levels and context layers. As described previously, there are six context layers for each organization level. The application of the remaining system settings will be further explained as part of the detailed explanation of the system operation. The software in block 202 also uses the current system date to determine the time periods (generally in months) that require data to complete the calculations. The analysis of organization level performance by the system utilizes data from every data source for the four year period before and the three year forecast period after the date of system calculation. The user (20) also has the option of specifying the data periods that will be used for completing system calculations. After the date range is calculated and stored in the system settings table (140) in the application database (50), processing advances to a software block 203. The software in block 203 prompts the user (20) via the organization layer data window (702) to define the different organization levels, define process maps, identify the elements and factors relevant to each organization level and graphically depict the relationship between the different organization levels that were saved in the system settings (140). For example, an organization could have two enterprises with each enterprise having three departments as shown in FIG. 12. In the case shown in FIG. 12 there would be nine organization levels (one organization, two enterprises and six departments) as shown in Table 10.
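The default data window described above (four years of history and three years of forecast, in monthly periods) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `month_periods` is hypothetical, and the choice to start the forecast window with the calculation month itself is an assumption, since the text does not say which window that month belongs to.

```python
from datetime import date

def month_periods(calc_date, history_years=4, forecast_years=3):
    """Return (year, month) tuples for the history and forecast windows.

    Assumption: history is the months strictly before the calculation
    month; the forecast window starts with the calculation month.
    """
    # Work with a running month index so year roll-over is automatic.
    current = calc_date.year * 12 + (calc_date.month - 1)
    start = current - history_years * 12
    end = current + forecast_years * 12
    return [(m // 12, m % 12 + 1) for m in range(start, end)]

periods = month_periods(date(2024, 3, 15))
```

A user-specified data period, which the text also allows, would simply replace the two default arguments.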
In the system of the present invention an item within an element of performance is the lowest organization level. The organization level and process map relationships identified by the user (20) are stored in the organization layer table (174) in the ContextBase (60). It is also possible to obtain the organization layer information directly from narrow system input. The element and factor definitions by organization level are stored in the element definition table (184) and the factor definition table (185) in the ContextBase (60). After the data is stored, processing advances to a software block 204. The software in block 204 communicates via a network (45) with the different databases (5, 10, and 20) that are providing data to the automated bot development system. As described previously, a number of methods can be used to identify the different data sources and make the information available for processing including bulk data extraction and point to point data extraction using bots or ETL (extract, transform and load) utilities. Data from the lower levels of the hierarchy are automatically included in the context layers for the higher organization levels. In the embodiment being discussed the systems providing data are identified using UDDI protocols. The databases in these systems (5, 10 and 20) use XML tags that identify the organization level, context layer, element assignment and/or factor association for each piece of data. In this stage of processing the software in block 204 stores the location information for the data of interest as required to establish a virtual database for the administrative layers for each organization level that was specified in the system settings table (140). Establishing a virtual database eliminates the latency that can cause problems for real time processing. The virtual database information for the physical layer for each organization level is stored in the physical layer table (171) in the ContextBase (60). 
The virtual database information for the tactical layer for each organization level is stored in the tactical layer table (172) in the ContextBase (60). The virtual database information for the instant layer for each organization level is stored in the instant impact layer table (173) in the ContextBase (60). Structured data that was made available for processing that could not be mapped to an administrative context layer, organization level, factor and/or element is stored in the structured data table (176) in the Context Base (60). World Wide Web data that needs to be processed before being mapped to a context layer, organization level, factor and/or element are identified using a virtual database stored in the Internet data table (177) in the ContextBase (60). Video data that needs to be processed before being mapped to a context layer, organization level, factor and/or element are identified using a virtual database stored in the video data table (178) in the ContextBase (60). Unstructured text data that needs to be processed before being mapped to a context layer, organization level, factor and/or element are identified using a virtual database stored in the text data table (180) in the ContextBase (60). Geo-coded data that needs to be processed before being mapped to a context layer, organization level, factor and/or element are identified using a virtual database stored in the geo data table (181) in the ContextBase (60). In all cases, data from narrow partner system databases (10) can be extracted and stored in a manner similar to that described for organization narrow system data. This data can include feature designations that define the acceptable range for data that are changed during optimization calculations. After virtual databases have been created that fully account for all available data from the databases (5, 10 and 20) and the World Wide Web (25), processing advances to a software block 205 and then on to a software block 210. 
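The tag-driven mapping described above can be sketched as follows. This is an illustration under stated assumptions: the `<record>` element and its `location`, `level`, `layer`, `element` and `factor` attributes are hypothetical, because the source does not give the XML schema. Only the table numbers (171, 172, 173 and 176) come from the text. The sketch stores locations rather than the data itself, in keeping with the virtual database idea.

```python
import xml.etree.ElementTree as ET

# Table numbers as given in the text; the string keys are assumed tag values.
LAYER_TABLES = {"physical": 171, "tactical": 172, "instant": 173}
STRUCTURED_DATA_TABLE = 176  # holds structured data that could not be mapped

def route_records(xml_text):
    """Map each record's source location to a ContextBase table number."""
    virtual_db = {}
    for rec in ET.fromstring(xml_text).iter("record"):
        # A complete assignment needs a level, a layer and an element or factor.
        complete = bool(rec.get("level") and rec.get("layer")
                        and (rec.get("element") or rec.get("factor")))
        table = LAYER_TABLES.get(rec.get("layer")) if complete else None
        if table is None:
            table = STRUCTURED_DATA_TABLE
        virtual_db.setdefault(table, []).append(rec.get("location"))
    return virtual_db

sample = """<feed>
  <record location="scm://orders/1" level="dept-A" layer="tactical" element="supplier"/>
  <record location="scm://orders/2" level="dept-A" layer="physical" factor="weather"/>
  <record location="scm://orders/3" level="dept-A"/>
</feed>"""
routed = route_records(sample)
```

Records left in table 176 are the ones the manager (21) is later asked to classify by hand.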
The software in block 210 prompts the user (20) via the review data window (703) to review the elements and factors by context layer that have been identified in the first few steps of processing. The element—context layer assignments and the factor—context layer assignments were created by mapping data to their "locations" within the ContextBase (60) using xml tag designations. The user (20) has the option of changing these designations on a one time basis or permanently. Any changes the user (20) makes are stored in the table for the corresponding context layer (i.e. tactical layer changes are saved in the tactical layer table (172), etc.). As part of the processing in this block, the user (20) is given the option to establish data categories for each context layer using an interactive GEL algorithm that guides the process of category development. The newly defined categories are mapped to the appropriate data in the appropriate context layer and stored in the organization layer table (174) in the ContextBase (60). The user (20) is also prompted by the review data window (703) to use data and/or the newly created data categories from each context layer to define six of the nine key terms—element, agent, asset, resource, action and commitment (mission measures and priorities will be defined in the next step) for each organization level. The resulting definitions are saved in the key terms table (170) in the ContextBase (60) by organization and organization level. Finally, the user (20) is prompted to define transaction data that do not correspond to one of the six key terms. For example, transaction data may relate to a cell phone call or an email—both events that are not defined as actions for the current organization level. 
The user (20) will define these events using standardized definitions from a Thesaurus table (142) in the application database (50) with synonyms that match business concepts like "transfer", "return" and "expedite" as required to define each transaction. The information from the Thesaurus table (142) can be supplemented from online lexicons like WordNet. In any event, the new definitions are also stored in the key terms table (170) in the ContextBase (60) before processing advances to a software block 215. The software in block 215 prompts the manager (21) via the mission measure data window (704) to use the key term definitions established in the prior processing step to specify one or more mission measures for each organization level. The manager (21) is given the option of using pre-defined mission measures for evaluating the performance of a commercial organization or defining new mission measures using internal and/or external data. If more than one mission measure is defined for a given organization level, then the manager (21) is prompted to assign a weighting or relative priority to the different mission measures that have been defined. The software in this block also prompts the manager (21) to identify keywords that are relevant to mission performance for each organization level in each organization. After the mission measure definitions are completed, the values of the newly defined mission measures are calculated using historical data and forecast data and stored in the mission layer table (175) by organization and organization level. After this has been completed, the mission measure definitions, priorities and keywords are stored in the key terms table (170) in the ContextBase (60) by organization and organization level before processing advances to a software block 231. 
The software in block 231 checks the structured data table (176) in the ContextBase (60) to see if there is any structured data that has not been assigned to an organization level and/or context layer. If there is no structured data without a complete assignment (organization, organization level, context layer and element or factor assignment constitutes a complete assignment), then processing advances to a software block 232. Alternatively, if there are structured data without an assignment, then processing advances to a software block 235. The software in block 235 prompts the manager (21) via the identification and classification data window (705) to identify the context layer, organization level, element assignment or factor assignment for the structured data in the structured data table (176). After assignments have been specified for every data element, the resulting assignments are stored in the appropriate context layer table in the ContextBase (60) by organization and organization level before processing advances to a software block 232. The software in block 232 checks the system settings table (140) in the Application Database (50) to see if video data extraction is going to be used in the current analysis. If video data extraction is not being used, then processing advances to a software block 236. Alternatively, if video data extraction is being used, then processing advances to a software block 233. The software in block 233 extracts text from the video data stored in the video data table (178) and stores the resulting text in the text table (180) in the ContextBase (60). The information in the video comes in two parts, the narrative associated with the image and the image itself. In the preferred embodiment, the narrative portion of the video has been captured in captions. These captions along with information identifying the time of first broadcast are stored in the text table (180). This same procedure can also be used for capturing data from radio broadcasts. 
If captions are not available, then any of a number of commercially available voice recognition programs can be used to create text from the narratives. The image portion of the video requires conversion. The conversion of video into text requires the use of several conversion algorithms and a synthesis of the results from each of the different algorithms using a data fusion algorithm. The algorithms used for video conversion include: coefficient energy block classification, local stroke detection and merge and graphics/text block classification. Again, the resulting text information along with information identifying the time of first broadcast are stored in the text table (180) before processing advances to a software block 236. The software in block 236 checks the system settings table (140) in the Application Database (50) to see if internet data extraction is going to be used in the current analysis. If internet data extraction is not being used, then processing advances to a software block 241. Alternatively, if internet data extraction is being used, then processing advances to a software block 237. The software in block 237 checks the bot date table (141) and deactivates internet text and linkage bots with creation dates before the current system date and retrieves information from the key terms table (170). The software in block 237 then initializes text bots for each keyword stored in the key terms table (170). The bots are programmed to activate with the frequency specified by the user (20) in the system settings table (140). Bots are independent components of the application that have specific tasks to perform. In the case of internet text and linkage bots, their tasks are to locate and extract keyword matches and linkages from the World Wide Web (25) and then store the extracted text in the text data table (180) and the linkages in the internet linkages table (177) in the ContextBase (60). 
Every Internet text and linkage bot contains the information shown in Table 11.
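The bot refresh pattern that block 237 follows (deactivate bots created before the current system date, then initialize one bot per keyword) recurs throughout this description. A minimal sketch is given below. The `TextBot` fields are hypothetical and are not the contents of Table 11, which is not reproduced here.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class TextBot:
    # Hypothetical fields for illustration only.
    keyword: str
    creation_date: date
    active: bool = True

def refresh_text_bots(bots, keywords, system_date):
    """Deactivate stale bots, then create one new bot per keyword."""
    for bot in bots:
        if bot.creation_date < system_date:
            bot.active = False  # created before the current system date
    new_bots = [TextBot(kw, system_date) for kw in keywords]
    return bots + new_bots

old = [TextBot("expedite", date(2024, 1, 1))]
bots = refresh_text_bots(old, ["expedite", "return"], date(2024, 2, 1))
```

Keeping the deactivated bots in the list rather than deleting them is a design choice of this sketch; it preserves a record comparable to the bot date table (141).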
After being initialized, the text and linkage bots locate, extract and store text and linkages from the World Wide Web (25) in accordance with their programmed instructions with the frequency specified by the user (20) in the system settings table (140). These bots will continually extract data as system processing advances to a software block 241. The software in block 241 checks the system settings table (140) to see if text data analysis is being used. If text data analysis is not being used, then processing advances to a block 246. Alternatively, if the software in block 241 determines that text data analysis is being used, processing advances to a software block 242. The software in block 242 checks the bot date table (141) and deactivates text relevance bots with creation dates before the current system date and retrieves information from the system settings table (140), the key terms table (170) and the text data table (180). The software in block 242 then initializes text relevance bots to activate with the frequency specified by the user (20) in the system settings table (140). Bots are independent components of the application that have specific tasks to perform. In the case of text relevance bots, their tasks are to calculate a relevance measure for each word in the text data table (180) and to identify the type of word (Name, Proper Name, Verb, Adjective, Complement, Determinant or Other). The relevance of each word is determined by calculating a relevance measure using the formula shown in Table 12.
One advantage of this approach is that it takes into account the fact that text is generally a sequence of words and not just a "bag of words". The type of word is determined by using a probabilistic part-of-speech tagging algorithm. If the amount of text that needs processing is very large, then a multi-layer neural net can be used to sort the text into blocks that should be processed and those that should not. Every text relevance bot contains the information shown in Table 13.
After being activated, the text relevance bots determine the relevance and type of each word with the frequency specified by the user (20) in the system settings table (140). The relevance of each word is stored in the text data table (180) before processing passes to a software block 244. The software in block 244 checks the bot date table (141) and deactivates text association bots with creation dates before the current system date and retrieves information from the system settings table (140), the tactical layer table (172), the instant impact layer table (173), the mission measure table (175), the text table (180), the element definition table (184) and the factor definition table (185). The software in block 244 then initializes text association bots for the words identified in the prior stage of processing in order of relevance up to the maximum number for each organization (the user (20) specified the maximum number of keywords in the system settings table). Bots are independent components of the application that have specific tasks to perform. In the case of text association bots, their tasks are to determine which element or factor the relevant words are most closely associated with. Every bot initialized by software block 244 will store the association it discovers with the most relevant words stored in the text data table (180). Every text association bot contains the information shown in Table 14.
After being initialized, the bots identify the element or factor that each word is most closely associated with and store the association "assignment" in the text data table (180) and the element definition table (184) or factor definition table (185) in the ContextBase (60) before processing advances to a software block 245. The software in block 245 prompts the user (20) via the review data window (703) to review the associations developed in the prior step in processing. Options the user (20) can choose for modifying the associations include: changing the association to another element or factor, removing the assigned association, or adding an association to one or more other elements or factors. When the user (20) completes the review of the assignments, all changes are stored in the text data table (180), the element definition table (184) and/or the factor definition table (185) before system processing advances to a software block 246. The software in block 246 checks the system settings table (140) in the Application Database (50) to see if geo-coded data is going to be used in the current analysis. If geo-coded data is not being used, then processing advances to a software block 251. Alternatively, if geo-coded data is being used, then processing advances to a software block 247. The software in block 247 retrieves the data stored in the geo table (181), converts the data in accordance with the applicable geo-coding standards, calculates pre-defined attributes and stores the resulting data in the physical context layer table (171) by element or factor in the ContextBase (60) before processing advances to software block 251. The software in block 251 checks each of the administrative context layer tables, namely the physical layer table (171), the tactical layer table (172) and the instant impact layer table (173), as well as the social environment layer table (179) in the ContextBase (60) to see if data is missing for any required time period. 
If data is not missing for any required time period, then processing advances to a software block 256. Alternatively, if data for one or more of the required time periods is missing for one or more of the administrative context layers, then processing advances to a software block 255. The software in block 255 prompts the user (20) via the review data window (703) to specify the method to be used for filling the blanks for each field that is missing data. Options the user (20) can choose for filling the blanks include: the average value for the item over the entire time period, the average value for the item over a specified period, zero, the average of the preceding item and the following item values and direct user input for each missing item. If the user (20) does not provide input within a specified interval, then the default missing data procedure specified in the system settings table (140) is used. When all the blanks have been filled and stored for all of the missing data, system processing advances to a block 256. The software in block 256 calculates pre-defined attributes by item for each numeric, item variable in each of the administrative context layer tables—the physical layer table (171), the tactical layer table (172) or the instant impact layer table (173)—in the ContextBase (60) by element. The attributes calculated in this step include: summary data like cumulative total value; ratios like the period to period rate of change in value; trends like the rolling average value, comparisons to a baseline value like change from a prior years level and time lagged values like the time lagged value of each numeric item variable. The software in block 256 also derives attributes for each item date variable in each of the administrative context layer tables (171, 172 and 173) in the ContextBase (60). 
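The fill options offered in block 255 can be sketched as follows. This is a minimal sketch, with missing values represented as `None`. The method names are hypothetical labels for the five options listed in the text, and the fallback to the nearest observed value when a neighbor is itself missing is an assumption.

```python
def fill_missing(values, method, window=None, user_values=None):
    """Fill None entries using one of the five options from block 255."""
    observed = [v for v in values if v is not None]
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        if method == "mean_all":        # average over the entire time period
            filled[i] = sum(observed) / len(observed)
        elif method == "mean_window":   # average over a specified period
            start, end = window
            sample = [x for x in values[start:end] if x is not None]
            filled[i] = sum(sample) / len(sample)
        elif method == "zero":
            filled[i] = 0.0
        elif method == "neighbors":     # average of preceding and following items
            before = next((x for x in reversed(values[:i]) if x is not None), None)
            after = next((x for x in values[i + 1:] if x is not None), None)
            pair = [x for x in (before, after) if x is not None]
            filled[i] = sum(pair) / len(pair)
        elif method == "user":          # direct user input per missing item
            filled[i] = user_values[i]
        else:
            raise ValueError(f"unknown fill method: {method}")
    return filled

series = [10.0, None, 14.0, 16.0]
```

The default procedure from the system settings table (140) would be passed as `method` when the user (20) does not respond within the specified interval.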
The derived date variables include summary data like time since last occurrence and cumulative time since first occurrence; and trends like average frequency of occurrence and the rolling average frequency of occurrence. The software in block 256 derives similar attributes for the text and geospatial item variables stored in the administrative context layer tables—the physical layer table (171), the tactical layer table (172) or the instant impact layer table (173)—by element. The numbers derived from the item variables are collectively referred to as "item performance indicators". The software in block 256 also calculates pre-specified combinations of variables called composite variables for measuring the strength of the different elements of performance. The item performance indicators and the composite variables are tagged and stored in the appropriate administrative context layer table—the physical layer table (171), the tactical layer table (172) or the instant impact layer table (173)—by element and organization level before processing advances to a software block 257. The software in block 257 uses attribute derivation algorithms such as the AQ program to create combinations of variables from the administrative context layer tables—the physical layer table (171), the tactical layer table (172) or the instant impact layer table (173)—that were not pre-specified for combination in the prior processing step. While the AQ program is used in the preferred embodiment of the present invention, other attribute derivation algorithms, such as the LINUS algorithms, may be used to the same effect. The resulting composite variables are tagged and stored in the appropriate administrative context layer table—the physical layer table (171), the tactical layer table (172) or the instant impact layer table (173)—in the ContextBase (60) by element before processing advances to a software block 260. 
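The numeric item performance indicators that block 256 derives (cumulative totals, period to period rates of change, rolling averages, changes from a baseline and time lagged values) can be sketched as follows. The function name, the default window lengths and the use of `None` for periods where an indicator is undefined are assumptions made for illustration.

```python
def item_indicators(values, window=3, lag=1, baseline_offset=12):
    """Derive per-period indicators from one numeric item variable."""
    out = {"cumulative": [], "rate": [], "rolling": [], "lagged": [], "vs_baseline": []}
    running = 0.0
    for t, v in enumerate(values):
        running += v
        out["cumulative"].append(running)
        prev = values[t - 1] if t > 0 else None
        # Period to period rate of change; undefined at t=0 or after a zero.
        out["rate"].append((v - prev) / prev if prev else None)
        # Rolling average over the last `window` periods (fewer at the start).
        recent = values[max(0, t - window + 1): t + 1]
        out["rolling"].append(sum(recent) / len(recent))
        out["lagged"].append(values[t - lag] if t >= lag else None)
        # Change from the level `baseline_offset` periods earlier
        # (12 monthly periods corresponds to the prior year's level).
        out["vs_baseline"].append(
            v - values[t - baseline_offset] if t >= baseline_offset else None)
    return out

ind = item_indicators([100, 110, 121], baseline_offset=2)
```

The same derivations apply to the factor numeric fields handled in block 260, with the results stored by factor rather than by element.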
The software in block 260 derives external factor indicators for each factor numeric data field stored in the social environment layer table (179). For example, external factors can include: the ratio of organization level earnings to expected earnings, the number and amount of jury awards, commodity prices, the inflation rate, growth in gross domestic product, organization level earnings volatility vs. industry average volatility, short and long term interest rates, increases in interest rates, insider trading direction and levels, industry concentration, consumer confidence and the unemployment rate that have an impact on the market price of the equity for an organization level and/or an industry. The external factor indicators derived in this step include: summary data like cumulative totals, ratios like the period to period rate of change, trends like the rolling average value, comparisons to a baseline value like change from a prior years price and time lagged data like time lagged earnings forecasts. In a similar fashion the software in block 260 calculates external factors for each factor date field in the social environment layer table (179) including summary factors like time since last occurrence and cumulative time since first occurrence; and trends like average frequency of occurrence and the rolling average frequency of occurrence. The numbers derived from numeric and date fields are collectively referred to as "factor performance indicators". The software in block 260 also calculates pre-specified combinations of variables called composite factors for measuring the strength of the different external factors. The factor performance indicators and the composite factors are tagged and stored in the social environment layer table (179) by factor and organization level before processing advances to a block 261. 
The software in block 261 uses attribute derivation algorithms, such as the Linus algorithm, to create combinations of the external factors that were not pre-specified for combination in the prior processing step. While the Linus algorithm is used in the preferred embodiment of the present invention, other attribute derivation algorithms, such as the AQ program, may be used to the same effect. The resulting composite variables are tagged and stored in the social environment layer table (179) by factor and organization level before processing advances to a block 262. The software in block 262 checks the bot date table (141) and deactivates pattern bots with creation dates before the current system date and retrieves information from the system settings table (140), the physical layer table (171), the tactical layer table (172), the instant impact layer table (173) and the social environment layer table (179). The software in block 262 then initializes pattern bots for each layer to identify frequent patterns in each layer. Bots are independent components of the application that have specific tasks to perform. In the case of pattern bots, their task is to identify frequent patterns in the data for each context layer, element, factor and organization level. In the preferred embodiment, pattern bots use the Apriori algorithm to identify patterns including frequent patterns, sequential patterns and multi-dimensional patterns. However, a number of other pattern identification algorithms including the PASCAL algorithm can be used alone or in combination to the same effect. Every pattern bot contains the information shown in Table 15.
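A compact sketch of the Apriori algorithm for frequent itemsets is shown below. It covers only the frequent-pattern case; the sequential and multi-dimensional patterns mentioned above are not handled. The transaction contents are invented for illustration and are not data from the system.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}
    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = support(c)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                   and support(c) >= min_support}
    return frequent

events = [{"late", "expedite"}, {"late", "expedite", "return"},
          {"late"}, {"expedite", "return"}]
patterns = apriori(events, min_support=0.5)
```

Each key in `patterns` would correspond to one pattern identifier, with its support serving as the frequency stored as an item performance indicator.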
After being initialized, the bots identify patterns in the data by element, factor, layer or organization level. Each pattern is given a unique identifier and the frequency and type of each pattern is determined. The numeric values associated with the patterns are item performance indicators. The values are stored in the appropriate context layer table by element or factor. When data storage is complete, processing advances to a software block 303. ContextBase Development The flow diagrams in FIG. 7A, FIG. 7B, FIG. 7C, FIG. 7D and FIG. 7E detail the processing that is completed by the portion of the application software (300) that continually develops a mission oriented ContextBase (60) by creating and activating analysis bots that:
Processing in this portion of the application begins in software block 301. The software in block 301 checks the mission layer table (175) in the ContextBase (60) to determine if there are current models for all mission measures for every organization level. If all the mission measure models are current, then processing advances to a software block 321. Alternatively, if all mission measure models are not current, then the next mission measure for the next organization level is selected and processing advances to a software block 303. The software in block 303 retrieves the previously calculated values for the mission measure from the mission layer table (175) before processing advances to a software block 304. The software in block 304 checks the bot date table (141) and deactivates temporal clustering bots with creation dates before the current system date. The software in block 304 then initializes bots in accordance with the frequency specified by the user (20) in the system settings table (140). The bot retrieves information from the mission layer table (175) for the organization level being analyzed and defines regimes for the mission measure being analyzed before saving the resulting cluster information in the mission layer table (175) in the ContextBase (60). Bots are independent components of the application that have specific tasks to perform. In the case of temporal clustering bots, their primary task is to segment mission measure performance into distinct time regimes that share similar characteristics. The temporal clustering bot assigns a unique identification (id) number to each "regime" it identifies before tagging and storing the unique id numbers in the mission layer table (175). Every time period with data is assigned to one of the regimes. The cluster id for each regime is saved in the data record for the mission measure and organization level being analyzed. 
The time regimes are developed using a competitive regression algorithm that identifies an overall, global model before splitting the data and creating new models for the data in each partition. If the error from the two models is greater than the error from the global model, then there is only one regime in the data. Alternatively, if the two models produce lower error than the global model, then a third model is created. If the error from three models is lower than from two models then a fourth model is added. The process continues until adding a new model does not improve accuracy. Other temporal clustering algorithms may be used to the same effect. Every temporal clustering bot contains the information shown in Table 16.
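The competitive regression loop described above can be sketched as follows. The text does not say how the data are split or what form the models take, so this sketch assumes equal-length contiguous partitions and straight-line least-squares models. The `min_gain` and `min_points` stopping parameters are hypothetical additions that keep the loop from splitting on noise.

```python
def fit_line_sse(xs, ys):
    """Fit a least-squares line and return its sum of squared errors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def split_error(xs, ys, k):
    """Total error of k separate models on k contiguous partitions."""
    n = len(xs)
    bounds = [round(i * n / k) for i in range(k + 1)]
    err = sum(fit_line_sse(xs[a:b], ys[a:b]) for a, b in zip(bounds, bounds[1:]))
    return bounds, err

def find_regimes(xs, ys, min_gain=0.05, min_points=3):
    """Add models until one more model no longer improves accuracy."""
    k = 1
    bounds, best = split_error(xs, ys, 1)      # the overall, global model
    while len(xs) // (k + 1) >= min_points:
        new_bounds, err = split_error(xs, ys, k + 1)
        if err >= best * (1 - min_gain):       # no meaningful improvement
            break
        k, bounds, best = k + 1, new_bounds, err
    regimes = []
    for regime_id, (a, b) in enumerate(zip(bounds, bounds[1:])):
        regimes += [regime_id] * (b - a)       # one regime id per time period
    return regimes

xs = list(range(20))
ys = [x if x < 10 else 30 - 2 * x for x in xs]  # trend reverses at period 10
regime_ids = find_regimes(xs, ys)
```

The returned list assigns every time period to exactly one regime, which mirrors the regime ids stored in the mission layer table (175).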
When bots in block 304 have identified and stored regime assignments for all time periods with mission measure data for the organization being analyzed, processing advances to a software block 305. The software in block 305 checks the bot date table (141) and deactivates variable clustering bots with creation dates before the current system date. The software in block 305 then initializes bots as required for each element of performance and external factor for the current organization level. The bots activate in accordance with the frequency specified by the user (20) in the system settings table (140), retrieve the information from the physical layer table (171), the tactical layer table (172), the instant impact layer table (173), the social environment layer table (179), the element definition table (184) and/or the factor definition table (185) as required and define segments for the element data and factor data before tagging and saving the resulting cluster information in the element definition table (184) or the factor definition table (185). Bots are independent components of the application that have specific tasks to perform. In the case of variable clustering bots, their primary task is to segment the element data and factor data into distinct clusters that share similar characteristics. The clustering bot assigns a unique id number to each "cluster" it identifies, tags and stores the unique id numbers in the element definition table (184) and factor definition table (185). Every item variable for every element of performance is assigned to one of the unique clusters. The cluster id for each variable is saved in the data record for each variable in the table where it resides. In a similar fashion, every factor variable for every external factor is assigned to a unique cluster. The cluster id for each variable is tagged and saved in the data record for the factor variable. 
The element data and factor data are segmented into a number of clusters less than or equal to the maximum specified by the user (20) in the system settings table (140). The data are segmented using the "default" clustering algorithm the user (20) specified in the system settings table (140). The system of the present invention provides the user (20) with the choice of several clustering algorithms including: an unsupervised "Kohonen" neural network, decision tree, support vector method, K-nearest neighbor, expectation maximization (EM) and the segmental K-means algorithm. For algorithms that normally require the number of clusters to be specified, the bot will use the maximum number of clusters specified by the user (20). Every variable clustering bot contains the information shown in Table 17.
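For algorithms that need the number of clusters up front, the bot uses the user-specified maximum, as stated above. The sketch below illustrates that with plain k-means (Lloyd's algorithm), which is a simpler stand-in for the segmental K-means variant named in the text. The deterministic seeding from sorted data and the sample feature vectors are assumptions.

```python
def dist2(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, max_clusters, iterations=20):
    """Assign each point a cluster id, using max_clusters as k."""
    k = min(max_clusters, len(points))
    ordered = sorted(points)
    # Assumed seeding: evenly spaced picks from the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

features = [[1.0, 1.0], [1.2, 0.9], [8.0, 8.1], [7.9, 8.3]]
cluster_ids = kmeans(features, max_clusters=2)
```

Each entry of `cluster_ids` plays the role of the cluster id saved in the data record for the corresponding variable.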
When bots in block 305 have identified, tagged and stored cluster assignments for the data associated with each element of performance or external factor in the element definition table (184) or factor definition table (185), processing advances to a software block 306. The software in block 306 checks the mission layer table (175) in the ContextBase (60) to see if the current mission measure is an options based measure like contingent liabilities, real options or strategic risk. If the current mission measure is not an options based measure, then processing advances to a software block 309. Alternatively, if the current mission measure is an options based measure, then processing advances to a software block 307. The software in block 307 checks the bot date table (141) and deactivates options simulation bots with creation dates before the current system date. The software in block 307 then retrieves the information from the system settings table (140), the element definition table (184) and factor definition table (185) and the scenarios table (152) as required to initialize option simulation bots in accordance with the frequency specified by the user (20) in the system settings table (140). Bots are independent components of the application that have specific tasks to perform. In the case of option simulation bots, their primary task is to determine the impact of each element and factor on an option mission measure under different scenarios. The option simulation bots run a normal scenario, an extreme scenario and a combined scenario. In this embodiment, Monte Carlo models are used to complete the probabilistic simulation; however, other probabilistic simulation models such as Quasi Monte Carlo can be used to the same effect. The element and factor impacts on option mission measures could be determined using the process detailed below for the other types of mission measures; however, in this embodiment a separate procedure is used. 
The models are initialized using the specifications used in the baseline calculations. Every option simulation bot activated in this block contains the information shown in Table 18.
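The option simulation described above can be sketched as a small Monte Carlo experiment. The sketch below is illustrative only: the payoff form max(V - strike, 0), the scenario parameters in `SCENARIOS`, the 50/50 mix used for the combined scenario, and the names `simulate_option_value` and `sensitivities` are all assumptions, not details taken from this specification. Impacts are estimated by bumping one element or factor at a time and re-running the simulation with the same random draws.

```python
import random
import statistics

# Hypothetical scenario parameters: (mean shock, shock volatility).
SCENARIOS = {"normal": (0.0, 0.10), "extreme": (-0.20, 0.30)}


def simulate_option_value(drivers, weights, strike, scenario, trials=5000, seed=1):
    """Monte Carlo estimate of an option-like measure, E[max(V - strike, 0)]."""
    rng = random.Random(seed)  # fixed seed gives common random numbers across runs
    base_value = sum(weights[name] * level for name, level in drivers.items())
    payoffs = []
    for _ in range(trials):
        if scenario == "combined":
            # Assumption: the combined scenario mixes normal and extreme draws 50/50.
            mean, vol = SCENARIOS[rng.choice(["normal", "extreme"])]
        else:
            mean, vol = SCENARIOS[scenario]
        shock = rng.gauss(mean, vol)
        payoffs.append(max(base_value * (1.0 + shock) - strike, 0.0))
    return statistics.fmean(payoffs)


def sensitivities(drivers, weights, strike, scenario, bump=0.01):
    """Finite-difference impact of each element or factor on the measure."""
    base = simulate_option_value(drivers, weights, strike, scenario)
    result = {}
    for name, level in drivers.items():
        bumped = dict(drivers)
        bumped[name] = level * (1.0 + bump)
        shifted = simulate_option_value(bumped, weights, strike, scenario)
        result[name] = (shifted - base) / (level * bump)
    return result
```

Reusing the same seed for the base and bumped runs keeps sampling noise out of the difference, so a driver with zero weight yields a sensitivity of exactly zero.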
After the option simulation bots are initialized, they activate in accordance with the frequency specified by the user (20) in the system settings table (140). Once activated, the bots retrieve the required information and simulate the mission measure over the time periods specified by the user (20) in the system settings table (140) as required to determine the impact of each element and factor on the mission measure. After the option simulation bots complete their calculations, the resulting sensitivities are saved in the element definition table (184) and factor definition table (185) by organization and organization level in the application database (50) and processing advances to a software block 309. The software in block 309 checks the bot date table (141) and deactivates all predictive model bots with creation dates before the current system date. The software in block 309 then retrieves the information from the system settings table (140), the mission layer table (175), the element definition table (184) and the factor definition table (185) as required to initialize predictive model bots for each mission layer. Bots are independent components of the application that have specific tasks to perform. In the case of predictive model bots, their primary task is to determine the relationship between the element and factor data and the mission measure being evaluated. Predictive model bots are initialized for every organization level where the mission measure being evaluated is used. They are also initialized for each cluster and regime of data in accordance with the cluster and regime assignments specified by the bots in blocks 304 and 305 by organization and organization level. A series of predictive model bots is initialized at this stage because it is impossible to know in advance which predictive model type will produce the "best" predictive model for the data from each organization level. 
The series for each model includes 12 predictive model bot types: neural network; CART; GARCH; projection pursuit regression; generalized additive model (GAM); redundant regression network; rough-set analysis; boosted Naive Bayes Regression; MARS; linear regression; support vector method; and stepwise regression. Additional predictive model types can be used to the same effect. Every predictive model bot contains the information shown in Table 19.
After predictive model bots are initialized, the bots activate in accordance with the frequency specified by the user (20) in the system settings table (140). Once activated, the bots retrieve the required data from the appropriate table in the ContextBase (60) and randomly partition the element or factor data into a training set and a test set. The software in block 309 uses "bootstrapping", in which the different training data sets are created by re-sampling with replacement from the original training set, so data records may occur more than once. After the predictive model bots complete their training and testing, the best fit predictive model assessments of element and factor impacts on mission measure performance are saved in the element definition table (184) and the factor definition table (185) before processing advances to a block 310. The software in block 310 determines if clustering improved the accuracy of the predictive models generated by the bots in software block 309 by organization and organization level. The software in block 310 uses a variable selection algorithm such as stepwise regression (other types of variable selection algorithms can be used) to combine the results from the predictive model bot analyses for each type of analysis (with and without clustering) to determine the best set of variables for each type of analysis. The type of analysis having the smallest amount of error, as measured by applying the mean squared error algorithm to the test data, is given preference in determining the best set of variables for use in later analysis. There are four possible outcomes from this analysis as shown in Table 20.
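The training, bootstrapping and comparison steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: a one-variable least squares line stands in for the twelve predictive model types, averaging coefficients across bootstrap resamples is one possible way to combine them, and the names `fit_line`, `bootstrap_fit`, `mse` and `compare_clustering` are hypothetical. Each input row is a tuple (x, y, cluster label).

```python
import random


def fit_line(rows):
    """Ordinary least squares for y = a + b*x on (x, y) rows; returns (a, b)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    b = sum((x - mean_x) * (y - mean_y) for x, y in rows) / sxx if sxx else 0.0
    return mean_y - b * mean_x, b


def bootstrap_fit(train, rounds=25, seed=0):
    """Fit a line to each bootstrap resample (drawn with replacement) and average."""
    rng = random.Random(seed)
    fits = [fit_line([rng.choice(train) for _ in train]) for _ in range(rounds)]
    return (sum(f[0] for f in fits) / rounds, sum(f[1] for f in fits) / rounds)


def mse(model, rows):
    """Mean squared error of a fitted line on (x, y) rows."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in rows) / len(rows)


def compare_clustering(rows, test_fraction=0.3, seed=0):
    """Return (test MSE without clustering, test MSE with per-cluster models)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)  # random partition into training and test sets
    cut = int(len(shuffled) * (1 - test_fraction))
    train, test = shuffled[:cut], shuffled[cut:]
    # Without clustering: one model for all training rows.
    plain_model = bootstrap_fit([(x, y) for x, y, _ in train])
    mse_plain = mse(plain_model, [(x, y) for x, y, _ in test])
    # With clustering: one model per cluster label seen in the training set.
    models = {
        label: bootstrap_fit([(x, y) for x, y, c in train if c == label])
        for label in {c for _, _, c in train}
    }
    errors = [(y - (models[c][0] + models[c][1] * x)) ** 2 for x, y, c in test]
    return mse_plain, sum(errors) / len(errors)
```

Under this sketch, clustering would be preferred for an organization level when the second returned value is smaller than the first.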
If the software in block 310 determines that clustering improves the accuracy of the predictive models for an organization level, then processing advances to a software block 314. Alternatively, if clustering does not improve the overall accuracy of the predictive models for an organization level, then processing advances to a software block 312. The software in block 312 uses a variable selection algorithm
