Allocating resources or scheduling for an administrative function
Advanced information gathering for targeted activities
6845370
Abstract
An agent based system assists in preparing an individual for an upcoming meeting by helping him/her retrieve relevant information about the meeting from various sources based on preexisting information in the system. The system obtains input text in character form indicative of the target meeting from a calendar program that includes the time of the meeting. As the time of the meeting approaches, the calendar program is queried to obtain the text of the target event and that information is utilized as input to the agent system. Then, the agent system parses the input meeting text to extract its various components such as title, body, participants, location, time etc. The system also performs pattern matching to identify particular meeting fields in a meeting text. This information is utilized to query various sources of information on the web and obtain relevant stories about the current meeting to send back to the calendaring system. For example, if an individual has a meeting with Netscape and Microsoft to talk about their disputes, the system obtains this initial information from the calendaring system. It will then parse out the text to realize that the companies in the meeting are "Netscape" and "Microsoft" and the topic is "disputes". It will then surf the web for relevant information concerning the topic. Thus, in accordance with an objective of the invention, the system updates the calendaring system and eventually the user with the best information it can gather to prepare for the target meeting. In accordance with a preferred embodiment, the information is stored in a file that is obtained via selection from a link imbedded in the calendar system.
Claims
What is claimed is:
1. A method for creating an information summary, comprising the steps of:
(a) retrieving a plurality of terms descriptive of an upcoming event;
(b) transmitting the terms to a software agent that autonomously creates a query based on the terms;
(c) querying a network of information utilizing the query; and
(d) updating the information associated with the upcoming event with information from the query, the updating comprising adding information obtained from the query when the information summary has previously been created, or creating the information summary when the information summary has not yet been created,
wherein the information summary is established or updated at a predetermined time before the event.
2. A method for creating an information summary as recited in claim 1, including the step of parsing the terms based on predefined criteria to create the query.
3. A method for creating an information summary as recited in claim 1, including the step of providing constants that are utilized by the system for dynamically configuring the system based on current user inputs.
4. A method for creating an information summary as recited in claim 1, including the step of ranking the results based on relevance to meeting criteria.
5. A method for creating an information summary as recited in claim 1, including the step of utilizing proximity to a meeting date as a filtering device for the information summary.
6. A method for creating an information summary as recited in claim 1, including the step of pattern recognition to enhance the location of pertinent information.
7. A method for creating an information summary as recited in claim 1, including support for querying the Internet to obtain pertinent information.
8. A method for creating an information summary as recited in claim 1, including optimizing the query for a particular engine.
9. A method for creating an information summary as recited in claim 1, including the step of responding to updates of the meeting information to obtain additional summary information pertinent to the updates.
10. The method for creating an information summary as recited in claim 1, wherein the query is created by applying a pattern template to the plurality of terms.
11. The method for creating an information summary as recited in claim 10, wherein the terms include at least one indicator, and wherein the applying a pattern template further includes determining if a part of the terms can be bound to the pattern, binding at least one of the plurality of text strings, and locating an indicator.
12. The method for creating an information summary as recited in claim 1, wherein the information summary is stored and displayed separately from the description of the upcoming event.
13. The method for creating an information summary as recited in claim 1, further comprising inserting a link to the updated information summary in the description of the upcoming event.
14. The method for creating an information summary as recited in claim 1, wherein the step of updating the information summary comprises filtering the information retrieved from the query based on a profile of the user, and adding the filtered information to the information summary.
15. The method for creating an information summary as recited in claim 1, wherein the query seeks to obtain additional information relating to at least one of a person, a topic, a location and a company identified in the description of the upcoming event.
16. The method for creating an information summary as recited in claim 15, further comprising providing access to the information summary by creating a link thereto that is accessible to a user.
17. The method for creating an information summary as recited in claim 15, wherein the information from the query is stored separately from the description of the upcoming event.
18. The method for creating an information summary as recited in claim 15, wherein the terms identified in the description as a name of a company are verified by comparison with a resource containing names of companies.
19. An apparatus that creates an information summary, comprising:
(a) a processor;
(b) a memory that stores information under the control of the processor;
(c) logic that retrieves a plurality of search terms descriptive of an upcoming event;
(d) logic that transmits the terms to a software agent that autonomously creates a query based on the terms;
(e) logic that queries a network of information utilizing the query; and
(f) logic that updates the information associated with the upcoming event with information from the query, the logic that updates comprising adding information obtained from the query when the information summary has previously been created, or creating the information summary when the information summary has not yet been created,
wherein the information summary is established or updated at a predetermined time before the event.
20. A computer program embodied on a computer-readable medium that creates an information summary, comprising:
(a) a code segment that retrieves a plurality of terms descriptive of an upcoming event;
(b) a code segment that transmits the terms to a software agent that autonomously creates a query based on the terms;
(c) a code segment that queries a network of information utilizing the query; and
(d) updating the information associated with the upcoming event with information from the query, the updating comprising adding information obtained from the query when the information summary has previously been created, or creating the information summary when the information summary has not yet been created,
wherein the information summary is established or updated at a predetermined time before the event.
21. A computer program embodied on a computer-readable medium that creates an information summary as recited in claim 20, including logic that parses the terms based on predefined criteria to create the query.
22. A computer program embodied on a computer-readable medium that creates an information summary as recited in claim 20, including logic that modifies constants that are utilized by the system for dynamically configuring the system based on current user inputs.
23. A computer program embodied on a computer-readable medium that creates an information summary as recited in claim 20, including logic that ranks the results based on relevance to meeting criteria.
24. A computer program embodied on a computer-readable medium that creates an information summary as recited in claim 20, including logic that utilizes proximity to a meeting date as a filtering device for the information summary.
25. A computer program embodied on a computer-readable medium that creates an information summary as recited in claim 20, including logic that recognizes patterns to enhance the location of pertinent information.
26. A computer program embodied on a computer-readable medium that creates an information summary as recited in claim 20, including logic that queries the Internet to obtain pertinent information.
27. A computer program embodied on a computer-readable medium that creates an information summary as recited in claim 20, including logic that optimizes the query for a particular engine.
28. A computer program embodied on a computer-readable medium that creates an information summary as recited in claim 20, including logic that responds to updates of the meeting information to obtain additional summary information pertinent to the updates.
29. A method for creating an information summary, comprising the steps of:
(a) retrieving a plurality of terms descriptive of an upcoming event;
(b) transmitting the terms to a software agent that autonomously creates a query based on the terms;
(c) querying a network of information utilizing the query; and
(d) updating the information associated with the upcoming event with information from the query,
wherein the query is created by applying a pattern template to the plurality of terms, and
wherein the pattern template is adapted for identifying words separated by punctuation, identifying full names by finding two capitalized words, parsing out time strings, and identifying continuous phrases of capitalized words as at least one of a company, topic and location.
30. The method for creating an information summary as recited in claim 29, wherein the step of updating the information summary comprises adding information obtained from the query when the information summary has previously been created, or creating the information summary when the information summary has not yet been created.
31. The method for creating an information summary as recited in claim 30, wherein the information summary is established or updated at a predetermined time before the event.
32. Method for creating an information summary regarding an upcoming event comprising:
receiving input regarding the upcoming event;
autonomously creating a query based on the input;
autonomously querying a network utilizing the query to retrieve relevant information;
autonomously deriving background information for the upcoming event from the relevant information, the background information comprising a subset of the relevant information; and
updating the information associated with the upcoming event with the background information, the updating comprising adding information obtained from the query when the information summary has previously been created, or creating the information summary when the information summary has not yet been created,
wherein the information summary is established or updated at a predetermined time before the event.
33. The method for retrieving information as recited in claim 32, wherein deriving background information comprises summarizing at least a part of the relevant information.
34. The method for retrieving information as recited in claim 32, wherein deriving background information comprises filtering at least a part of the relevant information.
35. The method for retrieving information as recited in claim 32, wherein querying a network utilizing the query to retrieve relevant information comprises generating a list of relevant information, and
wherein deriving background information comprises prioritizing at least a part of the list, the step of deriving being performed after the step of generating a list.
36. The method for retrieving information as recited in claim 32, wherein deriving background information is based on a profile of the user.
37. The method for retrieving information as recited in claim 36, wherein the user profile comprises user-specified data.
38. The method for retrieving information as recited in claim 36, wherein the user profile comprises data extrapolated from a user's activities.
39. The method for retrieving information as recited in claim 32, further comprising notifying the user regarding the background information.
40. The method for retrieving information as recited in claim 32, wherein querying a network is performed repeatedly.
41. The method for retrieving information as recited in claim 32, wherein querying a network is performed periodically.
42. The method for retrieving information as recited in claim 32, wherein querying a network utilizing the query to retrieve relevant information comprises querying a search engine on a network utilizing the query to retrieve relevant information.
43. The method for retrieving information as recited in claim 32, further comprising:
determining a commercial service based on the input; and
sending information regarding the commercial service to the user.
44. The method for retrieving information as recited in claim 32, further comprising selecting a device, from a plurality of user devices, and
sending at least a portion of the background information to the selected device.
45. The method of retrieving information as recited in claim 44, wherein selecting a device is based on a user profile.
46. The method of retrieving information as recited in claim 44, wherein deriving background information comprises selecting information, based on the device, for the upcoming event from the relevant information.
47. Method for creating an information summary regarding an upcoming event for a user comprising:
receiving input regarding the upcoming event;
autonomously creating a first query based on the input;
autonomously querying a first network utilizing the first query to retrieve first relevant information;
autonomously creating a second query based on the input, the second query being different from the first query;
autonomously querying a second network utilizing the second query to retrieve second relevant information; and
updating the information associated with the upcoming event with information from the queries, the updating comprising adding at least a portion of the first relevant information and second relevant information when the information summary has previously been created, or creating the information summary when the information summary has not yet been created,
wherein the information summary is established or updated at a predetermined time before the event.
48. The method of retrieving relevant information as recited in claim 47, wherein the first network comprises a first public internet website and the second network comprises a second public internet website.
49. The method of retrieving relevant information as recited in claim 47, further comprising determining whether to query the first network.
50. The method of retrieving relevant information as recited in claim 49, wherein determining whether to query the first network is based on the input.
51. The method of retrieving relevant information as recited in claim 50, wherein determining whether to query the first network comprises determining whether particular information is present in the input.
52. The method of retrieving relevant information as recited in claim 47, wherein the first and second relevant information comprise background information regarding the upcoming event.
53. The method of retrieving relevant information as recited in claim 47, wherein the second relevant information comprises traffic information.
Description
FIELD OF THE INVENTION
The present invention relates to agent based systems and more particularly to an agent based system which automatically creates background information for an upcoming event.
Agent based technology has become increasingly important for use with applications designed to interact with a user for performing various computer based tasks in foreground and background modes. Agent software comprises computer programs that act on behalf of users to perform routine, tedious and time-consuming tasks. To be useful to an individual user, an agent must be personalized to the individual user's goals, habits and preferences. Thus, there exists a substantial requirement for the agent to efficiently and effectively acquire user-specific knowledge from the user and utilize it to perform tasks on behalf of the user.
The concept of agency, or the use of agents, is well established. An agent is a person authorized by another person, typically referred to as a principal, to act on behalf of the principal. In this manner the principal empowers the agent to perform any of the tasks that the principal is unwilling or unable to perform. For example, an insurance agent may handle all of the insurance requirements for a principal, or a talent agent may act on behalf of a performer to arrange concert dates.
With the advent of the computer, a new domain for employing agents has arrived. Significant advances in the realm of expert systems enable computer programs to act on behalf of computer users to perform routine, tedious and other time-consuming tasks. These computer programs are referred to as "software agents."
Moreover, there has been a recent proliferation of computer and communication networks. These networks permit a user to access vast amounts of information and services without, essentially, any geographical boundaries. Thus, a software agent has a rich environment to perform a large number of tasks on behalf of a user. For example, it is now possible for an agent to make an airline reservation, purchase the ticket, and have the ticket delivered directly to a user. Similarly, an agent could scan the Internet and obtain information ranging from the latest sports or news to a particular graduate thesis in applied physics. Current solutions fail to apply agent technology to existing calendar technology to provide targeted acquisition of background information for a user's upcoming events.
SUMMARY OF THE INVENTION
According to a broad aspect of a preferred embodiment of the invention, an agent based system assists in preparing an individual for an upcoming meeting by helping him/her retrieve relevant information about the meeting from various sources. The system obtains input text in character form indicative of the target meeting from a calendar program that includes the time of the meeting. As the time of the meeting approaches, the calendar program is queried to obtain the text of the target event and that information is utilized as input to the agent system. Then, the agent system parses the input meeting text to extract its various components such as title, body, participants, location, time etc. The system also performs pattern matching to identify particular meeting fields in a meeting text. This information is utilized to query various sources of information on the web and obtain relevant stories about the current meeting to send back to the calendaring system. For example, if an individual has a meeting with Netscape and Microsoft to talk about their disputes, the system obtains this initial information from the calendaring system. It will then parse out the text to realize that the companies in the meeting are "Netscape" and "Microsoft" and the topic is "disputes". It will then surf the web for relevant information concerning the topic. Thus, in accordance with an objective of the invention, the system updates the calendaring system and eventually the user with the best information it can gather to prepare for the target meeting. In accordance with a preferred embodiment, the information is stored in a file that is obtained via selection from a link imbedded in the calendar system.
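As a rough illustration of the parsing step in the Netscape and Microsoft example, the following Python sketch pulls runs of capitalized words out of a meeting description and joins them with a topic into a search string. The function names `extract_capitalized_phrases` and `build_query`, the list of ignored words, and the query format are all assumptions made for this sketch; the text above does not specify them. Topic detection is not attempted here, so the topic is supplied by hand.

```python
import re

# Hypothetical list of capitalized words to ignore; not taken from the source.
IGNORED_WORDS = {"Meeting", "Meet", "Discuss", "The", "A", "An", "With"}

# One or more consecutive capitalized words, e.g. "Netscape" or "Sun Microsystems".
CAPITALIZED_RUN = re.compile(r"\b[A-Z][A-Za-z0-9]*(?:\s+[A-Z][A-Za-z0-9]*)*")


def extract_capitalized_phrases(text):
    """Return runs of consecutive capitalized words, in order of appearance."""
    phrases = []
    for match in CAPITALIZED_RUN.finditer(text):
        words = [w for w in match.group(0).split() if w not in IGNORED_WORDS]
        if words:
            phrases.append(" ".join(words))
    return phrases


def build_query(entities, topic):
    """Join quoted entities and a topic into one search string (assumed format)."""
    return " ".join(f'"{e}"' for e in entities) + " " + topic


meeting_text = "Meeting with Netscape and Microsoft to talk about their disputes"
companies = extract_capitalized_phrases(meeting_text)
query = build_query(companies, "disputes")
```

Here `companies` holds the two company names found in the text, and `query` is the string that a search component could submit. A fuller implementation would also need to separate people, locations and topics, which this sketch leaves out.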
DESCRIPTION OF THE DRAWINGS
The foregoing and other objects, aspects and advantages are better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:
FIG. 1 is a block diagram of a representative hardware environment in accordance with a preferred embodiment;
FIG. 2 is a flowchart of the system in accordance with a preferred embodiment;
FIG. 3 is a flowchart of a parsing unit of the system in accordance with a preferred embodiment;
FIG. 4 is a flowchart for pattern matching in accordance with a preferred embodiment;
FIG. 5 is a flowchart for a search unit in accordance with a preferred embodiment;
FIG. 6 is a flowchart for overall system processing in accordance with a preferred embodiment;
FIG. 7 is a flowchart of topic processing in accordance with a preferred embodiment;
FIG. 8 is a flowchart of meeting record processing in accordance with a preferred embodiment;
FIG. 9 is a block diagram of process flow of a pocket bargain finder in accordance with a preferred embodiment;
FIGS. 10A and 10B are a block diagram and flowchart depicting the logic associated with creating a customized content web page in accordance with a preferred embodiment;
FIG. 11 is a flowchart depicting the detailed logic associated with retrieving user-centric content in accordance with a preferred embodiment;
FIG. 12 is a data model of a user profile in accordance with a preferred embodiment;
FIG. 13 is a persona data model in accordance with a preferred embodiment;
FIG. 14 is an intention data model in accordance with a preferred embodiment;
FIG. 15 is a flowchart of the processing for generating an agent's current statistics in accordance with a preferred embodiment;
FIG. 16 is a flowchart of the logic that determines the personalized product rating for a user in accordance with a preferred embodiment;
FIG. 17 is a flowchart of the logic for accessing the centrally stored profile in accordance with a preferred embodiment;
FIG. 18 is a flowchart of the interaction logic between a user and the integrator for a particular supplier in accordance with a preferred embodiment;
FIG. 19 is a flowchart of the agent processing for generating a verbal summary in accordance with a preferred embodiment;
FIG. 20 illustrates a display login in accordance with a preferred embodiment;
FIG. 21 illustrates a managing daily logistics display in accordance with a preferred embodiment;
FIG. 22 illustrates a user main display in accordance with a preferred embodiment;
FIG. 23 illustrates an agent interaction display in accordance with a preferred embodiment;
FIG. 24 is a block diagram of an active knowledge management system in accordance with a preferred embodiment;
FIG. 25 is a block diagram of a back end server in accordance with a preferred embodiment; and
FIG. 26 is a block diagram of a magic wall in accordance with a preferred embodiment.
DETAILED DESCRIPTION
A preferred embodiment of a system in accordance with the present invention is preferably practiced in the context of a personal computer such as an IBM compatible personal computer, Apple MACINTOSH computer or UNIX based workstation. A representative hardware environment is depicted in FIG. 1, which illustrates a typical hardware configuration of a workstation in accordance with a preferred embodiment having a central processing unit 110, such as a microprocessor, and a number of other units interconnected via a system bus 112. The workstation shown in FIG. 1 includes a Random Access Memory (RAM) 114, Read Only Memory (ROM) 116, an I/O adapter 118 for connecting peripheral devices such as disk storage units 120 to the bus 112, a user interface adapter 122 for connecting a keyboard 124, a mouse 126, a speaker 128, a microphone 132, and/or other user interface devices such as a touch screen (not shown) to the bus 112, communication adapter 134 for connecting the workstation to a communication network (e.g., a data processing network) and a display adapter 136 for connecting the bus 112 to a display device 138. The workstation typically has resident thereon an operating system such as the Microsoft Windows NT or Windows/95 Operating System (OS), the IBM OS/2 operating system, the MAC OS, or UNIX operating system. Those skilled in the art will appreciate that the present invention may also be implemented on platforms and operating systems other than those mentioned.
A preferred embodiment is written using JAVA, C, and the C++ language and utilizes object oriented programming methodology. Object oriented programming (OOP) has become increasingly used to develop complex applications. As OOP moves toward the mainstream of software design and development, various software solutions require adaptation to make use of the benefits of OOP. A need exists for these principles of OOP to be applied to a messaging interface of an electronic messaging system such that a set of OOP classes and objects for the messaging interface can be provided.
OOP is a process of developing computer software using objects, including the steps of analyzing the problem, designing the system, and constructing the program. An object is a software package that contains both data and a collection of related structures and procedures. Since it contains both data and a collection of structures and procedures, it can be visualized as a self-sufficient component that does not require other additional structures, procedures or data to perform its specific task. OOP, therefore, views a computer program as a collection of largely autonomous components, called objects, each of which is responsible for a specific task. This concept of packaging data, structures, and procedures together in one component or module is called encapsulation.
In general, OOP components are reusable software modules which present an interface that conforms to an object model and which are accessed at run-time through a component integration architecture. A component integration architecture is a set of architecture mechanisms which allow software modules in different process spaces to utilize each other's capabilities or functions. This is generally done by assuming a common component object model on which to build the architecture.
It is worthwhile to differentiate between an object and a class of objects at this point. An object is a single instance of the class of objects, which is often just called a class. A class of objects can be viewed as a blueprint, from which many objects can be formed.
OOP allows the programmer to create an object that is a part of another object. For example, the object representing a piston engine is said to have a composition-relationship with the object representing a piston. In reality, a piston engine comprises a piston, valves and many other components; the fact that a piston is an element of a piston engine can be logically and semantically represented in OOP by two objects.
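The class/object distinction and the composition-relationship described above can be sketched in Python as follows. The class names `Piston` and `PistonEngine`, the `material` attribute, and the default of four pistons are illustrative assumptions, not details from the text.

```python
class Piston:
    """A single piston; the material attribute is illustrative only."""

    def __init__(self, material="metal"):
        self.material = material


class PistonEngine:
    """An engine that is composed of piston objects (a has-a relationship)."""

    def __init__(self, piston_count=4):
        # The engine holds its own pistons: this is the composition-relationship.
        self.pistons = [Piston() for _ in range(piston_count)]


# The class is the blueprint; each call creates a separate object (instance).
engine_a = PistonEngine()
engine_b = PistonEngine(piston_count=6)
```

Both engines are built from the same class, yet each is a distinct object holding its own list of `Piston` objects.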
OOP also allows creation of an object that "depends from" another object. If there are two objects, one representing a piston engine and the other representing a piston engine wherein the piston is made of ceramic, then the relationship between the two objects is not that of composition. A ceramic piston engine does not make up a piston engine. Rather it is merely one kind of piston engine that has one more limitation than the piston engine; its piston is made of ceramic. In this case, the object representing the ceramic piston engine is called a derived object, and it inherits all of the aspects of the object representing the piston engine and adds further limitation or detail to it. The object representing the ceramic piston engine "depends from" the object representing the piston engine. The relationship between these objects is called inheritance.
When the object or class representing the ceramic piston engine inherits all of the aspects of the objects representing the piston engine, it inherits the thermal characteristics of a standard piston defined in the piston engine class. However, the ceramic piston engine object overrides these with ceramic-specific thermal characteristics, which are typically different from those associated with a metal piston. It skips over the original and uses new functions related to ceramic pistons. Different kinds of piston engines have different characteristics, but may have the same underlying functions associated with them (e.g., how many pistons in the engine, ignition sequences, lubrication, etc.). To access each of these functions in any piston engine object, a programmer would call the same functions with the same names, but each type of piston engine may have different/overriding implementations of functions behind the same name. This ability to hide different implementations of a function behind the same name is called polymorphism and it greatly simplifies communication among objects.
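A minimal Python sketch of the inheritance and polymorphism described above follows. The method names and the temperature figures are placeholders chosen for illustration; they are not engineering data and do not come from the text.

```python
class PistonEngine:
    """Base class with default (metal piston) behaviour."""

    piston_count = 4

    def piston_material(self):
        return "metal"

    def max_piston_temperature_c(self):
        # Placeholder figure for illustration only.
        return 300

    def describe(self):
        # Calls whichever implementation the actual object provides.
        return (f"{self.piston_count} {self.piston_material()} pistons, "
                f"rated to {self.max_piston_temperature_c()} C")


class CeramicPistonEngine(PistonEngine):
    """Derived class: inherits everything, overrides the piston-specific parts."""

    def piston_material(self):
        return "ceramic"

    def max_piston_temperature_c(self):
        # Placeholder figure for illustration only.
        return 1000


# The same call works on either kind of engine (polymorphism).
descriptions = [engine.describe()
                for engine in (PistonEngine(), CeramicPistonEngine())]
```

`CeramicPistonEngine` inherits `piston_count` and `describe` unchanged and replaces only the two methods that differ, so the caller uses one name, `describe`, for both engine types.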
With the concepts of composition-relationship, encapsulation, inheritance and polymorphism, an object can represent just about anything in the real world. In fact, our logical perception of the reality is the only limit on determining the kinds of things that can become objects in object-oriented software. Some typical categories are as follows:
Objects can represent physical objects, such as automobiles in a traffic-flow simulation, electrical components in a circuit-design program, countries in an economics model, or aircraft in an air-traffic-control system.
Objects can represent elements of the computer-user environment such as windows, menus or graphics objects.
An object can represent an inventory, such as a personnel file or a table of the latitudes and longitudes of cities.
An object can represent user-defined data types such as time, angles, and complex numbers, or points on the plane.
With this enormous capability of an object to represent just about any logically separable matters, OOP allows the software developer to design and implement a computer program that is a model of some aspects of reality, whether that reality is a physical entity, a process, a system, or a composition of matter. Since the object can represent anything, the software developer can create an object which can be used as a component in a larger software project in the future.
If 90% of a new OOP software program consists of proven, existing components made from preexisting reusable objects, then only the remaining 10% of the new software project has to be written and tested from scratch. Since 90% already came from an inventory of extensively tested reusable objects, the potential domain from which an error could originate is 10% of the program. As a result, OOP enables software developers to build objects out of other, previously built, objects.
This process closely resembles complex machinery being built out of assemblies and sub-assemblies. OOP technology, therefore, makes software engineering more like hardware engineering in that software is built from existing components, which are available to the developer as objects. All this adds up to an improved quality of the software as well as an increased speed of its development.
Programming languages are beginning to fully support the OOP principles, such as encapsulation, inheritance, polymorphism, and composition-relationship. With the advent of the C++ language, many commercial software developers have embraced OOP. C++ is an OOP language that offers a fast, machine-executable code. Furthermore, C++ is suitable for both commercial-application and systems-programming projects. For now, C++ appears to be the most popular choice among many OOP programmers, but there is a host of other OOP languages, such as Smalltalk, common lisp object system (CLOS), and Eiffel. Additionally, OOP capabilities are being added to more traditional popular computer programming languages such as Pascal.
The benefits of object classes can be summarized, as follows:
Objects and their corresponding classes break down complex programming problems into many smaller, simpler problems.
Encapsulation enforces data abstraction through the organization of data into small, independent objects that can communicate with each other. Encapsulation protects the data in an object from accidental damage, but allows other objects to interact with that data by calling the object's member functions and structures.
Subclassing and inheritance make it possible to extend and modify objects through deriving new kinds of objects from the standard classes available in the system. Thus, new capabilities are created without having to start from scratch.
Polymorphism and multiple inheritance make it possible for different programmers to mix and match characteristics of many different classes and create specialized objects that can still work with related objects in predictable ways.
Class hierarchies and containment hierarchies provide a flexible mechanism for modeling real-world objects and the relationships among them.
Libraries of reusable classes are useful in many situations, but they also have some limitations. For example:
Complexity. In a complex system, the class hierarchies for related classes can become extremely confusing, with many dozens or even hundreds of classes.
Flow of control. A program written with the aid of class libraries is still responsible for the flow of control (i.e., it must control the interactions among all the objects created from a particular library). The programmer has to decide which functions to call at what times for which kinds of objects.
Duplication of effort. Although class libraries allow programmers to use and reuse many small pieces of code, each programmer puts those pieces together in a different way. Two different programmers can use the same set of class libraries to write two programs that do exactly the same thing but whose internal structure (i.e., design) may be quite different, depending on hundreds of small decisions each programmer makes along the way. Inevitably, similar pieces of code end up doing similar things in slightly different ways and do not work as well together as they should.
Class libraries are very flexible. As programs grow more complex, more programmers are forced to reinvent basic solutions to basic problems over and over again. A relatively new extension of the class library concept is to have a framework of class libraries. This framework is more complex and consists of significant collections of collaborating classes that capture both the small scale patterns and major mechanisms that implement the common requirements and design in a specific application domain. Frameworks were first developed to free application programmers from the chores involved in displaying menus, windows, dialog boxes, and other standard user interface elements for personal computers.
Frameworks also represent a change in the way programmers think about the interaction between the code they write and code written by others. In the early days of procedural programming, the programmer called libraries provided by the operating system to perform certain tasks, but basically the program executed down the page from start to finish, and the programmer was solely responsible for the flow of control. This was appropriate for printing out paychecks, calculating a mathematical table, or solving other problems with a program that executed in just one way.
The development of graphical user interfaces began to turn this procedural programming arrangement inside out. These interfaces allow the user, rather than program logic, to drive the program and decide when certain actions should be performed. Today, most personal computer software accomplishes this by means of an event loop which monitors the mouse, keyboard, and other sources of external events and calls the appropriate parts of the programmer's code according to actions that the user performs. The programmer no longer determines the order in which events occur. Instead, a program is divided into separate pieces that are called at unpredictable times and in an unpredictable order. By relinquishing control in this way to users, the developer creates a program that is much easier to use. Nevertheless, individual pieces of the program written by the developer still call libraries provided by the operating system to accomplish certain tasks, and the programmer must still determine the flow of control within each piece after being called by the event loop. Application code still "sits on top of" the system.
Even event loop programs require programmers to write a lot of code that should not need to be written separately for every application. The concept of an application framework carries the event loop concept further. Instead of dealing with all the nuts and bolts of constructing basic menus, windows, and dialog boxes and then making these things all work together, programmers using application frameworks start with working application code and basic user interface elements in place. Subsequently, they build from there by replacing some of the generic capabilities of the framework with the specific capabilities of the intended application.
Application frameworks reduce the total amount of code that a programmer has to write from scratch. However, because the framework is really a generic application that displays windows, supports copy and paste, and so on, the programmer can also relinquish control to a greater degree than event loop programs permit. The framework code takes care of almost all event handling and flow of control, and the programmer's code is called only when the framework needs it (e.g., to create or manipulate a proprietary data structure).
A programmer writing a framework program not only relinquishes control to the user (as is also true for event loop programs), but also relinquishes the detailed flow of control within the program to the framework. This approach allows the creation of more complex systems that work together in interesting ways, as opposed to isolated programs, having custom code, being created over and over again for similar problems.
Thus, as is explained above, a framework basically is a collection of cooperating classes that make up a reusable design solution for a given problem domain. It typically includes objects that provide default behavior (e.g., for menus and windows), and programmers use it by inheriting some of that default behavior and overriding other behavior so that the framework calls application code at the appropriate times.
There are three main differences between frameworks and class libraries:
Behavior versus protocol. Class libraries are essentially collections of behaviors that you can call when you want those individual behaviors in your program. A framework, on the other hand, provides not only behavior but also the protocol or set of rules that govern the ways in which behaviors can be combined, including rules for what a programmer is supposed to provide versus what the framework provides.
Call versus override. With a class library, the code the programmer writes instantiates objects and calls their member functions. It's possible to instantiate and call objects in the same way with a framework (i.e., to treat the framework as a class library), but to take full advantage of a framework's reusable design, a programmer typically writes code that overrides and is called by the framework. The framework manages the flow of control among its objects. Writing a program involves dividing responsibilities among the various pieces of software that are called by the framework rather than specifying how the different pieces should work together.
Implementation versus design. With class libraries, programmers reuse only implementations, whereas with frameworks, they reuse design. A framework embodies the way a family of related programs or pieces of software work. It represents a generic design solution that can be adapted to a variety of specific problems in a given domain. For example, a single framework can embody the way a user interface works, even though two different user interfaces created with the same framework might solve quite different interface problems.
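The call-versus-override distinction described above can be illustrated with a minimal Python sketch. The class and method names (Framework, MyApp, run, handle_event) are hypothetical and are not taken from any real framework; the sketch only shows the framework owning the flow of control and calling application code that overrides a default behavior.

```python
class Framework:
    """A toy framework: it owns the flow of control (hypothetical example)."""

    def run(self, events):
        log = []
        for event in events:
            # The framework, not the application, decides when this is called.
            log.append(self.handle_event(event))
        return log

    def handle_event(self, event):
        # Default behavior supplied by the framework.
        return f"ignored {event}"


class MyApp(Framework):
    """Application code inherits the defaults and overrides only what it needs."""

    def handle_event(self, event):
        if event == "click":
            return "opened menu"
        # Everything else falls back to the framework's default behavior.
        return super().handle_event(event)


if __name__ == "__main__":
    print(MyApp().run(["click", "key"]))
```

With a plain class library, by contrast, the application would contain the loop itself and call library functions from inside it.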
Thus, through the development of frameworks for solutions to various problems and programming tasks, significant reductions in the design and development effort for software can be achieved. A preferred embodiment of the invention utilizes HyperText Markup Language (HTML) to implement documents on the Internet together with a general-purpose secure communication protocol for a transport medium between the client and the Newco. HTTP or other protocols could be readily substituted for HTML without undue experimentation. Information on these products is available in T. Berners-Lee, D. Connolly, "RFC 1866: Hypertext Markup Language-2.0" (Nov. 1995); and R. Fielding, H. Frystyk, T. Berners-Lee, J. Gettys and J.C. Mogul, "Hypertext Transfer Protocol--HTTP/1.1: HTTP Working Group Internet Draft" (May 2, 1996). HTML is a simple data format used to create hypertext documents that are portable from one platform to another. HTML documents are SGML documents with generic semantics that are appropriate for representing information from a wide range of domains. HTML has been in use by the World-Wide Web global information initiative since 1990. HTML is an application of ISO Standard 8879:1986 Information Processing Text and Office Systems; Standard Generalized Markup Language (SGML).
To date, Web development tools have been limited in their ability to create dynamic Web applications which span from client to server and interoperate with existing computing resources. Until recently, HTML has been the dominant technology used in development of Web-based solutions. However, HTML has proven to be inadequate in the following areas:
Poor performance;
Restricted user interface capabilities;
Can only produce static Web pages;
Lack of interoperability with existing applications and data; and
Inability to scale.
Sun Microsystems' Java language solves many of the client-side problems by:
Improving performance on the client side;
Enabling the creation of dynamic, real-time Web applications; and
Providing the ability to create a wide variety of user interface components.
With Java, developers can create robust User Interface (UI) components. Custom "widgets" (e.g. real-time stock tickers, animated icons, etc.) can be created, and client-side performance is improved. Unlike HTML, Java supports the notion of client-side validation, offloading appropriate processing onto the client for improved performance. Dynamic, real-time Web pages can be created. Using the above-mentioned custom UI components, dynamic Web pages can also be created.
Sun's Java language has emerged as an industry-recognized language for "programming the Internet." Sun defines Java as: "a simple, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multithreaded, dynamic, buzzword-compliant, general-purpose programming language. Java supports programming for the Internet in the form of platform-independent Java applets." Java applets are small, specialized applications that comply with Sun's Java Application Programming Interface (API) allowing developers to add "interactive content" to Web documents (e.g. simple animations, page adornments, basic games, etc.). Applets execute within a Java-compatible browser (e.g. Netscape Navigator) by copying code from the server to client. From a language standpoint, Java's core feature set is based on C++. Sun's Java literature states that Java is basically "C++, with extensions from Objective C for more dynamic method resolution".
Another technology that provides similar function to JAVA is provided by Microsoft and ActiveX Technologies, to give developers and Web designers the wherewithal to build dynamic content for the Internet and personal computers. ActiveX includes tools for developing animation, 3-D virtual reality, video and other multimedia content. The tools use Internet standards, work on multiple platforms, and are being supported by over 100 companies. The group's building blocks are called ActiveX Controls, small, fast components that enable developers to embed parts of software in hypertext markup language (HTML) pages. ActiveX Controls work with a variety of programming languages including Microsoft Visual C++, Borland Delphi, Microsoft Visual Basic programming system and, in the future, Microsoft's development tool for Java, code named "Jakarta." ActiveX Technologies also includes ActiveX Server Framework, allowing developers to create server applications. One of ordinary skill in the art readily recognizes that ActiveX could be substituted for JAVA without undue experimentation to practice the invention.
In accordance with a preferred embodiment, BackgroundFinder (BF) is implemented as an agent responsible for preparing an individual for an upcoming meeting by helping him/her retrieve relevant information about the meeting from various sources. BF receives input text in character form indicative of the target meeting. The input text is generated in accordance with a preferred embodiment by a calendar program that includes the time of the meeting. As the time of the meeting approaches, the calendar program is queried to obtain the text of the target event and that information is utilized as input to the agent. Then, the agent parses the input meeting text to extract its various components such as title, body, participants, location, time etc. The system also performs pattern matching to identify particular meeting fields in a meeting text. This information is utilized to query various sources of information on the web and obtain relevant stories about the current meeting to send back to the calendaring system. For example, if an individual has a meeting with Netscape and Microsoft to talk about their disputes, the system obtains this initial information from the calendaring system. It will then parse out the text to realize that the companies in the meeting are "Netscape" and "Microsoft" and the topic is "disputes." Then, the system queries the web for relevant information concerning the topic. Thus, in accordance with an objective of the invention, the system updates the calendaring system and eventually the user with the best information it can gather to prepare the user for the target meeting. In accordance with a preferred embodiment, the information is stored in a file that is obtained via selection from a link embedded in the calendar system.
Program Organization
A computer program in accordance with a preferred embodiment is organized in five distinct modules: BF.Main, BF.Parse, BF.Error, BF.PatternMatching and BF.Search. There is also a formMain which provides a user interface used only for debugging purposes. The executable programs in accordance with a preferred embodiment never execute with the user interface and should only return to the calendaring system through Microsoft's Winsock control. A preferred embodiment of the system executes in two different modes which can be specified on the command line sent to it by the calendaring system. When the system runs in simple mode, it executes a keyword query to submit to external search engines. When executed in complex mode, the system performs pattern matching before it forms a query to be sent to a search engine.
Data Structures
The system in accordance with a preferred embodiment utilizes three user defined structures:
1. tMeetingRecord;
2. tAPatternElement; and
3. tAPatternRecord.
The user-defined structure, tMeetingRecord, is used to store all the pertinent information concerning a single meeting. This info includes userID, an original description of the meeting, the extracted list of keywords from the title and body of meeting etc. It is important to note that only one meeting record is created per instance of the system in accordance with a preferred embodiment. This is because each time the system is spawned to service an upcoming meeting, it is assigned a task to retrieve information for only one meeting. Therefore, the meeting record created corresponds to the current meeting examined. ParseMeetingText populates this meeting record and it is then passed around to provide information about the meeting to other functions.
If GoPatternMatch can bind any values to a particular meeting field, the corresponding entries in the meeting record are also updated. The structure of tMeetingRecord with each field described in parentheses is provided below in accordance with a preferred embodiment.
Public Type tMeetingRecord
sUserID As String (user id given by Munin)
sTitleOrig As String (original non stop listed title we need to keep around to send back to Munin)
sTitleKW As String (stoplisted title with only keywords)
sBodyKW As String (stoplisted body with only keywords)
sCompany() As String (companies identified in title or body through pattern matching)
sTopic() As String (topics identified in title or body through pattern matching)
sPeople() As String (people identified in title or body through pattern matching)
sWhen() As String (time identified in title or body through pattern matching)
sWhere() As String (location identified in title or body through pattern matching)
sLocation As String (location as passed in by Munin)
sTime As String (time as passed in by Munin)
sParticipants() As String (all participants engaged as passed in by Munin)
sMeetingText As String (the original meeting text w/o userid)
End Type
There are two other structures which are created to hold each individual pattern utilized in pattern matching. The record tAPatternRecord is an array containing all the components/elements of a pattern. The type tAPatternElement is an array of strings which represent an element in a pattern. Because there may be many "substitutes" for each element, we need an array of strings to keep track of what all the substitutes are. The structures of tAPatternElement and tAPatternRecord are presented below in accordance with a preferred embodiment.
Public Type tAPatternElement
elementArray( ) As String
End Type
Public Type tAPatternRecord
patternarray( ) As tAPatternElement
End Type
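The nesting of the two pattern structures above can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the type aliases and the helper is_placeholder are hypothetical names, and the convention that a placeholder is a single string wrapped in dollar signs is inferred from the placeholder notation used in this description, not from the original Visual Basic code.

```python
from typing import List

# Mirrors tAPatternElement: a list of interchangeable strings ("substitutes").
PatternElement = List[str]
# Mirrors tAPatternRecord: an ordered list of elements making up one pattern.
PatternRecord = List[PatternElement]


def is_placeholder(element: PatternElement) -> bool:
    """Treat an element as a placeholder if its only entry looks like $NAME$.

    This convention is an assumption made for the sketch.
    """
    if len(element) != 1:
        return False
    text = element[0]
    return len(text) > 2 and text.startswith("$") and text.endswith("$")


# One record can cover both "$PEOPLE$ of $COMPANY$" and
# "$PEOPLE$ from $COMPANY$" because "of" and "from" are substitutes.
people_company: PatternRecord = [
    ["$PEOPLE$"],
    ["of", "from"],
    ["$COMPANY$"],
]

if __name__ == "__main__":
    for element in people_company:
        kind = "placeholder" if is_placeholder(element) else "indicator"
        print(kind, element)
```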
Common User Defined Constants
Many constants are defined in each declaration section of the program which may need to be updated periodically as part of the process of maintaining the system in accordance with a preferred embodiment. The constants are accessible to allow dynamic configuration of the system to occur as updates for maintaining the code.
Included in the following tables are lists of constants from each module which are considered most likely to be modified from time to time. However, there are also other constants used in the code not included in the following list. It does not mean that these non-included constants will never be changed. It means that they will change much less frequently.
For the Main Module (BF.Main):
CONSTANT            PRESET VALUE    USE
MSGTOMUNIN_TYPE     6               Define the message number used to identify messages between BF and Munin.
IP_ADDRESS_MUNIN    "10.2.100.48"   Define the IP address of the machine on which Munin and BF are running so they can transfer data through UDP.
PORT_MUNIN          7777            Define the remote port on which we are operating.
TIMEOUT_AV          60              Define constants for setting time out in inet controls.
TIMEOUT_NP          60              Define constants for setting time out in inet controls.
CMD_SEPARATOR       "\"             Define delimiter to tell which part of Munin's command represents the beginning of our input meeting text.
OUTPARAM_SEPARATOR  "::"            Define delimiter for separating out different portions of the output. The separator is for delimiting the msg type, the user id, the meeting title and the beginning of the actual stories retrieved.
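As a rough illustration of how the BF.Main constants fit together, the following Python sketch assembles a reply string and sends it over UDP. It is a sketch under assumptions, not the original Visual Basic implementation: the function names build_reply and send_reply are hypothetical, the stories are assumed to arrive as one pre-formatted string, and UTF-8 encoding is an arbitrary choice. The field order (message type, user id, meeting title, stories) follows the description of OUTPARAM_SEPARATOR.

```python
import socket

MSGTOMUNIN_TYPE = 6
IP_ADDRESS_MUNIN = "10.2.100.48"
PORT_MUNIN = 7777
OUTPARAM_SEPARATOR = "::"


def build_reply(user_id: str, title: str, stories: str) -> str:
    """Join msg type, user id, meeting title and stories with the separator."""
    parts = [str(MSGTOMUNIN_TYPE), user_id, title, stories]
    return OUTPARAM_SEPARATOR.join(parts)


def send_reply(message: str) -> None:
    """Send the reply to Munin as a single UDP datagram (encoding is assumed)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (IP_ADDRESS_MUNIN, PORT_MUNIN))


if __name__ == "__main__":
    # Only the formatting step is shown; send_reply needs a reachable host.
    print(build_reply("09", "Meet with Chad", "story text"))
```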
For the Search Module (BF.Search):
CONSTANT            CURRENT VALUE   USE
PAST_NDAYS          5               Define number of days you want to look back for AltaVista articles. Doesn't really matter now because we aren't really doing a news search in alta vista. We want all info.
CONNECTOR_AV_URL    "+AND+"         Define how to connect keywords. We want all our keywords in the string so for now use AND. If you want to do an OR or something, just change connector.
CONNECTOR_NP_URL    "+AND+"         Define how to connect keywords. We want all our keywords in the string so for now use AND. If you want to do an OR or something, just change connector.
NUM_NP_STORIES      3               Define the number of stories to return back to Munin from NewsPage.
NUM_AV_STORIES      3               Define the number of stories to return back to Munin from AltaVista.
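The role of the connector and story-count constants can be sketched in a few lines of Python. This is an illustration only: build_query and top_stories are hypothetical helper names, URL-escaping each keyword with quote_plus is an assumption, and no search-engine URL is shown because the description does not give one.

```python
from urllib.parse import quote_plus

CONNECTOR_AV_URL = "+AND+"
NUM_AV_STORIES = 3


def build_query(keywords, connector=CONNECTOR_AV_URL):
    """Join URL-escaped keywords so that every keyword is required."""
    return connector.join(quote_plus(word) for word in keywords)


def top_stories(ranked_stories, limit=NUM_AV_STORIES):
    """Keep only the first `limit` stories of an already ranked list."""
    return ranked_stories[:limit]


if __name__ == "__main__":
    print(build_query(["Netscape", "Microsoft", "disputes"]))
    print(top_stories(["s1", "s2", "s3", "s4"]))
```

Switching to an OR query would, under these assumptions, only require passing a different connector string.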
For the Parse Module (BF.Parse):
CONSTANT               CURRENT VALUE   USE
PORTION_SEPARATOR      "::"            Define the separator between different portions of the meeting text sent in by Munin. For example in "09::Meet with Chad::about life::Chad | Denise::::::" "::" is the separator between different parts of the meeting text.
PARTICIPANT_SEPARATOR  "|"             Define the separator between each participant in the participant list portion of the original meeting text. Refer to example above.
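A minimal Python sketch of splitting the meeting text with these two separators follows. It assumes the participant separator is the vertical bar character and that the first four portions are user id, title, body and participants, an order inferred from the example string and the fields of tMeetingRecord. The meaning of the remaining portions is not stated, so the sketch keeps them as a raw list. The function name parse_meeting_text is hypothetical.

```python
PORTION_SEPARATOR = "::"
PARTICIPANT_SEPARATOR = "|"  # assumed to be the vertical bar character


def parse_meeting_text(text):
    """Split the raw meeting text into its portions (field order is assumed)."""
    portions = text.split(PORTION_SEPARATOR)
    # Pad so that short inputs still unpack into four named portions.
    user_id, title, body, participants = (portions + [""] * 4)[:4]
    return {
        "user_id": user_id,
        "title": title,
        "body": body,
        "participants": [
            name.strip()
            for name in participants.split(PARTICIPANT_SEPARATOR)
            if name.strip()
        ],
        # Remaining portions are kept untouched; their meaning is not stated.
        "rest": portions[4:],
    }


if __name__ == "__main__":
    print(parse_meeting_text("09::Meet with Chad::about life::Chad | Denise::::::"))
```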
For the Pattern Matching Module (BF.PatternMatching): There are no constants in this module which require frequent updates.
General Process Flow
The best way to depict the process flow and the coordination of functions between each other is with the five flowcharts illustrated in FIGS. 2 to 6. FIG. 2 depicts the overall process flow in accordance with a preferred embodiment. Processing commences at the top of the chart at function block 200 which launches when the program starts. Once the application is started, the command line is parsed to remove the appropriate meeting text to initiate the target of the background find operation in accordance with a preferred embodiment as shown in function block 210. A global stop list is generated after the target is determined as shown in function block 220. Then, all the patterns that are utilized for matching operations are generated as illustrated in function block 230. Then, by tracing through the chart, function block 200 invokes GoBF 240 which is responsible for logical processing associated with wrapping the correct search query information for the particular target search engine. For example, function block 240 flows to function block 250 and it then calls GoPatternMatch as shown in function block 260. To see the process flow of GoPatternMatch, we swap to the diagram titled "Process Flow for BF's Pattern Matching Unit."
One key thing to notice is that functions depicted at the same level of the chart are called in sequential order from left to right (or top to bottom) by their common parent function. For example, Main 200 calls ProcessCommandLine 210, then CreateStopList 220, then CreatePatterns 230, then GoBackgroundFinder 240. FIGS. 3 to 6 detail the logic for the entire program, the parsing unit, the pattern matching unit and the search unit respectively. FIG. 6 details the logic determinative of data flow of key information through BackgroundFinder, and shows the functions that are responsible for creating or processing such information.
Detailed Search Architecture Under The Simple Query Mode
Search Alta Vista
(Function block 270 of FIG. 2)
The Alta Vista search identifies and returns general information about topics related to the current meeting as shown in function block 270 of FIG. 2. The system in accordance with a preferred embodiment takes all the keywords from the title portion of the original meeting text and constructs an advanced query to send to Alta Vista. The keywords are logically combined together in the query. The results are also ranked based on the same set of keywords. One of ordinary skill in the art will readily comprehend that a date restriction or publisher criteria could be applied to the articles we want to retrieve. A set of top ranking stories are returned to the calendaring system in accordance with a preferred embodiment.
News Page
(Function block 275 of FIG. 2)
The NewsPage search system is responsible for giving us the latest news topics related to a target meeting. The system takes all of the keywords from the title portion of the original meeting text and constructs a query to send to the NewsPage search engine. The keywords are logically combined together in the query. Only articles published recently are retrieved. The Newspage search system provides a date restriction criteria that is settable by a user according to the user's preference. The top ranking stories are returned to the calendaring system.
FIG. 3 is a user profile data model in accordance with a preferred embodiment. Processing commences at function block 300 which is responsible for invoking the program from the main module. Then, at function block 310, a wrapper function is invoked to prepare for the keyword extraction processing in function block 320. After the keywords are extracted, processing flows to function block 330 to determine if the delimiters are properly positioned. Then, at function block 340, the number of words in a particular string is calculated and the delimiters for the particular field are located, and a particular field from the meeting text is retrieved at function block 350. Then, at function block 380, the delimiters of the string are again checked to assure they are placed appropriately. Finally, at function block 360, the extraction of each word from the title and body of the message is performed a word at a time utilizing the logic in function block 362 which finds the next closest word delimiter in the input phrase, function block 364 which strips unnecessary materials from a word and function block 366 which determines if a word is on the stop list and returns an error if the word is on the stop list.
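The word-at-a-time extraction of function blocks 360 to 366 can be approximated by the Python sketch below. It is a simplification under assumptions: the stop list contents are purely illustrative, whitespace is taken as the only word delimiter, "unnecessary materials" is read as leading and trailing punctuation, and a stop-listed word is simply skipped rather than signalled through an error return as in the described embodiment. The name extract_keywords is hypothetical.

```python
import string

# Illustrative stop list only; the real list is not given in the description.
STOP_LIST = {"meet", "with", "and", "about", "the", "a", "to"}


def extract_keywords(phrase, stop_list=STOP_LIST):
    """Return the words of `phrase` that survive stop-list filtering."""
    keywords = []
    for raw_word in phrase.split():  # next word delimiter: whitespace
        word = raw_word.strip(string.punctuation)  # strip surrounding punctuation
        if word and word.lower() not in stop_list:  # stop-list check
            keywords.append(word)
    return keywords


if __name__ == "__main__":
    print(extract_keywords("Meet with Netscape and Microsoft about disputes."))
```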
Pattern Matching in Accordance With a Preferred Embodiment
The limitations associated with a simple searching method include the following:
1. Because it relies on a stoplist of unwanted words in order to extract from the meeting text a set of keywords, it is limited by how comprehensive the stoplist is. Instead of trying to figure out what parts of the meeting text we should throw away, we should focus on what parts of the meeting text we want.
2. A simple search method in accordance with a preferred embodiment only uses the keywords from a meeting title to form queries to send to Alta Vista and NewsPage. This ignores an alternative source of information for the query, the body of the meeting notice. We cannot include the keywords from the meeting body to form our queries because this often results in queries which are too long and so complex that we often obtain no meaningful results.
3. There is no way for us to tell what each keyword represents. For example, we may extract "Andy" and "Grove" as two keywords. However, a simplistic search has no way of knowing that "Andy Grove" is in fact a person's name. Imagine the possibilities if we could somehow intelligently guess that "Andy Grove" is a person's name. We can find out if he is an Andersen person and if so what kind of projects he's been on before etc. etc.
4. In summary, by relying solely on a stoplist to parse out unnecessary words, we suffer from "information overload".
Pattern Matching Overcomes These Limitations in Accordance With a Preferred Embodiment
Here's how the pattern matching system can address each of the corresponding issues above in accordance with a preferred embodiment.
1. By doing pattern matching, we match up only parts of the meeting text that we want and extract those parts.
2. By performing pattern matching on the meeting body, we extract only the parts of the meeting body that we want, so the meeting body does not go to complete waste.
3. Pattern matching is based on a set of templates that we specify, allowing us to identify people names, company names etc from a meeting text.
4. In summary, with pattern matching, we no longer suffer from information overload. Of course, the big problem is how well our pattern matching works. If we rely exclusively on artificial intelligence processing, we do not have a 100% hit rate. We are able to identify about 20% of all company names presented to us.
Patterns
A pattern in the context of a preferred embodiment is a template specifying the structure of a phrase we are looking for in a meeting text. The patterns supported by a preferred embodiment are selected because they are templates of phrases which have a high probability of appearing in someone's meeting text. For example, when entering a meeting in a calendar, many would write something such as "Meet with Bob Dutton from Stanford University next Tuesday." A common pattern would then be something like the word "with" followed by a person's name (in this example it is Bob Dutton) followed by the word "from" and ending with an organization's name (in this case, it is Stanford University).
Pattern Matching Terminology
The common terminology associated with pattern matching is provided below.
Pattern: a pattern is a template specifying the structure of a phrase we want to bind the meeting text to. It contains sub units.
Element: a pattern can contain many sub-units. These subunits are called elements. For example, in the pattern "with $PEOPLE$ from $COMPANY$", "with" "$PEOPLE$" "from" "$COMPANY$" are all elements.
Placeholder: a placeholder is a special kind of element to which we want to bind a value. Using the above example, "$PEOPLE$" is a placeholder.
Indicator: an indicator is another kind of element which we want to find in a meeting text but no value needs to bind to it. There may be often more than one indicator we are looking for in a certain pattern. That is why an indicator is not an "atomic" type.
Substitute: substitutes are a set of indicators which are all synonyms of each other. Finding any one of them in the input is good.
There are five fields which are identified for each meeting:
Company ($COMPANY$)
People ($PEOPLE$)
Location ($LOCATION$)
Time ($TIME$)
Topic ($TOPIC_UPPER$) or ($TOPIC_ALL$)
In parentheses are the placeholders I used in my code as representation of the corresponding meeting fields.
Each placeholder has the following meaning:
$COMPANY$: binds a string of capitalized words (e.g. Meet with Joe Carter of <Andersen Consulting>)
$PEOPLE$: binds a series of strings of two capitalized words potentially connected by "," "and" or "&" (e.g. Meet with <Joe Carter> of Andersen Consulting, Meet with <Joe Carter and Luke Hughes> of Andersen Consulting)
$LOCATION$: binds a string of capitalized words (e.g. Meet Susan at <Palo Alto Square>)
$TIME$: binds a string containing the format #:## (e.g. Dinner at <6:30 pm>)
$TOPIC_UPPER$: binds a string of capitalized words for our topic (e.g. <Stanford Engineering Recruiting> Meeting to talk about new hires).
$TOPIC_ALL$: binds a string of words without really caring if it's capitalized or not. (e.g. Meet to talk about <ubiquitous computing>)
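The placeholder meanings listed above can be approximated with regular expressions, as in the Python sketch below. This is a substitute technique chosen for brevity, not the word-by-word scanning of the described embodiment. The definition of a capitalized word as an uppercase letter followed by letters, and the optional am/pm suffix on times, are assumptions made for the sketch.

```python
import re

# Assumption: a "capitalized word" is an uppercase letter followed by letters.
CAP_WORD = r"[A-Z][A-Za-z]*"

# $COMPANY$, $LOCATION$, $TOPIC_UPPER$: a run of capitalized words.
CAP_PHRASE = re.compile(rf"{CAP_WORD}(?: {CAP_WORD})*")

# $PEOPLE$: two-word names joined by ", ", " and " or " & ".
FULL_NAME = rf"{CAP_WORD} {CAP_WORD}"
PEOPLE = re.compile(rf"{FULL_NAME}(?:(?:, | and | & ){FULL_NAME})*")

# $TIME$: the #:## format, optionally followed by am or pm (assumed).
TIME = re.compile(r"\d{1,2}:\d{2}(?: ?[ap]m)?", re.IGNORECASE)

if __name__ == "__main__":
    print(PEOPLE.search("Meet with Joe Carter and Luke Hughes of Andersen Consulting").group())
    print(TIME.search("Dinner at 6:30 pm").group())
    print(CAP_PHRASE.match("Palo Alto Square at noon").group())
```

Note that a capitalized-phrase expression applied from the start of a sentence would also pick up the sentence's first word, which is one reason the patterns described below anchor each placeholder next to an indicator word.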
Here is a table representing all the patterns supported by BF. Each pattern belongs to a pattern group. All patterns within a pattern group share a similar format and they only differ from each other in terms of what indicators are used as substitutes. Note that the patterns which are grayed out are also commented in the code. BF has the capability to support these patterns but we decided that matching these patterns is not essential at this point.
PAT GRP | PAT # | PATTERN | EXAMPLE
1 | a | $PEOPLE$ of $COMPANY$ | Paul Maritz of Microsoft
1 | b | $PEOPLE$ from $COMPANY$ | Bill Gates, Paul Allen and Paul Maritz from Microsoft
2 | a | $TOPIC_UPPER$ meeting | Push Technology Meeting
2 | b | $TOPIC_UPPER$ mtg | Push Technology Mtg
2 | c | $TOPIC_UPPER$ demo | Push Technology demo
2 | d | $TOPIC_UPPER$ interview | Push Technology interview
2 | e | $TOPIC_UPPER$ presentation | Push Technology presentation
2 | f | $TOPIC_UPPER$ visit | Push Technology visit
2 | g | $TOPIC_UPPER$ briefing | Push Technology briefing
2 | h | $TOPIC_UPPER$ discussion | Push Technology discussion
2 | i | $TOPIC_UPPER$ workshop | Push Technology workshop
2 | j | $TOPIC_UPPER$ prep | Push Technology prep
2 | k | $TOPIC_UPPER$ review | Push Technology review
2 | l | $TOPIC_UPPER$ lunch | Push Technology lunch
2 | m | $TOPIC_UPPER$ project | Push Technology project
2 | n | $TOPIC_UPPER$ projects | Push Technology projects
3 | a | $COMPANY$ corporation | Intel Corporation
3 | b | $COMPANY$ corp. | IBM Corp.
3 | c | $COMPANY$ systems | Cisco Systems
3 | d | $COMPANY$ limited | IBM limited
3 | e | $COMPANY$ ltd | IBM ltd
4 | a | about $TOPIC_ALL$ | About intelligent agents technology
4 | b | discuss $TOPIC_ALL$ | Discuss intelligent agents technology
4 | c | show $TOPIC_ALL$ | Show the client our intelligent agents technology
4 | d | re: $TOPIC_ALL$ | re: intelligent agents technology
4 | e | review $TOPIC_ALL$ | Review intelligent agents technology
4 | f | agenda | The agenda is as follows: --clean up --clean up --clean up
4 | g | agenda: $TOPIC_ALL$ | Agenda: --demo client intelligent agents technology. --demo ecommerce.
5 | a | w/$PEOPLE$ of $COMPANY$ | Meet w/Joe Carter of Andersen Consulting
5 | b | w/$PEOPLE$ from $COMPANY$ | Meet w/Joe Carter from Andersen Consulting
6 | a | w/$COMPANY$ per $PEOPLE$ | Talk w/Intel per Jason Foster
7 | a | At $TIME$ | at 3:00 pm
7 | b | Around $TIME$ | Around 3:00 pm
8 | a | At $LOCATION$ | At LuLu's restaurant
8 | b | In $LOCATION$ | in Santa Clara
9 | a | Per $PEOPLE$ | per Susan Butler
10 | a | call w/$PEOPLE$ | Conf call w/John Smith
10 | b | call with $PEOPLE$ | Conf call with John Smith
11 | a | prep for $TOPIC_ALL$ | Prep for London meeting
11 | b | preparation for $TOPIC_ALL$ | Preparation for London meeting
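The relationship between a pattern group and its substitute indicators can be illustrated with a small data structure. This is a sketch under stated assumptions: PatternGroup, GROUPS, expand and the "*" slot notation are hypothetical and are not the pattern record or tAPatternElements structures of the preferred embodiment. Only four of the groups from the table are transcribed.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class PatternGroup:
    """A shared pattern format plus the indicators that may substitute for each other."""
    group: int
    template: Tuple[str, ...]     # "*" marks where the indicator goes (assumed notation)
    substitutes: Tuple[str, ...]  # interchangeable indicators


# A few groups transcribed from the table; the representation itself is hypothetical.
GROUPS = [
    PatternGroup(1, ("$PEOPLE$", "*", "$COMPANY$"), ("of", "from")),
    PatternGroup(3, ("$COMPANY$", "*"), ("corporation", "corp.", "systems", "limited", "ltd")),
    PatternGroup(7, ("*", "$TIME$"), ("at", "around")),
    PatternGroup(8, ("*", "$LOCATION$"), ("at", "in")),
]


def expand(group: PatternGroup) -> Iterator[List[str]]:
    """Yield one concrete pattern per substitute indicator."""
    for indicator in group.substitutes:
        yield [indicator if element == "*" else element for element in group.template]


for pattern in expand(GROUPS[0]):
    print(" ".join(pattern))
```

Storing the substitutes once per group keeps the table compact: adding a new synonym such as another company suffix to group 3 adds a pattern without repeating the format.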
FIG. 4 is a detailed flowchart of pattern matching in accordance with a preferred embodiment. Processing commences at function block 400, where the main program invokes the pattern matching application and passes control to function block 410 to commence the pattern match processing. Then, at function block 420, the wrapper function loops through each pattern, which includes determining whether a part of the test string can be bound to a pattern, as shown in function block 430. Then, at function block 440, various placeholders are bound to values if they exist: in function block 441, a list of names separated by punctuation is bound, and at function block 442 a full name is processed by finding two capitalized words as a full name and grabbing the next letter after a space after a word to determine if it is capitalized. Then, at function block 443, time is parsed out of the string in an appropriate manner, and the next word after a blank space is obtained in function block 444. Then, at function block 445, continuous phrases of capitalized words such as company, topic or location are bound, and in function block 446 the next word after the blank is obtained for further processing in accordance with a preferred embodiment. Following the match meeting field processing, function block 450 is utilized to locate an indicator which is the head of a pattern: the next word after the blank is obtained as shown in function block 452, and the word is checked to determine if it is an indicator as shown in function block 454. Then, at function block 460, the string is parsed to locate an indicator which is not at the head of the pattern: the next word after unnecessary white space, such as that following a line feed or a carriage return, is processed as shown in function block 462, and the word is analyzed to determine if it is an indicator as shown in function block 464.
Then, in function block 470, the temporary record is reset to the null set to prepare it for processing the next string and at function block 480, the meeting record is updated and at function block 482 a check is performed to determine if an entry is already made to the meeting record before parsing the meeting record again.
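The flow of FIG. 4 can be sketched for a single pattern as follows. This is an illustrative simplification and not the code of the preferred embodiment: all function names are hypothetical, only pattern 10b ("call with $PEOPLE$") is modeled, treating "call" and "with" as two separate indicators is an assumption, a placeholder at the head of a pattern is not handled, and the retry from a later head is omitted.

```python
from typing import Dict, List, Optional, Set, Tuple

# Pattern 10b from the table, split into elements (an assumed decomposition).
PATTERN_10B = [
    ("indicator", {"call"}),
    ("indicator", {"with"}),
    ("placeholder", "PEOPLE"),
]


def locate_pattern_head(words: List[str], indicators: Set[str]) -> Optional[int]:
    """Scan forward for a head indicator; return the position after it."""
    for i, word in enumerate(words):
        if word.lower() in indicators:
            return i + 1
    return None


def locate_indicator(words: List[str], pos: int, indicators: Set[str]) -> Optional[int]:
    """A non-head indicator must be the very next word, so only one word is tested."""
    if pos < len(words) and words[pos].lower() in indicators:
        return pos + 1
    return None


def bind_capitalized_run(words: List[str], pos: int) -> Tuple[Optional[str], int]:
    """Bind consecutive capitalized words starting at pos."""
    end = pos
    while end < len(words) and words[end][:1].isupper():
        end += 1
    return (" ".join(words[pos:end]), end) if end > pos else (None, pos)


def match_a_pattern(pattern, text: str) -> Optional[Dict[str, str]]:
    """Return the temporary guesses for one pattern, or None if it does not fire."""
    words = text.split()
    guesses: Dict[str, str] = {}
    pos = 0
    for index, (kind, value) in enumerate(pattern):
        if kind == "indicator":
            if index == 0:
                pos = locate_pattern_head(words, value)
            else:
                pos = locate_indicator(words, pos, value)
            if pos is None:
                return None
        else:
            bound, pos = bind_capitalized_run(words, pos)
            if bound is None:
                return None
            guesses[value] = bound
    return guesses


def add_to_meeting_record(record: Dict[str, List[str]], guesses: Dict[str, str]) -> None:
    """Copy guesses into the permanent record, skipping duplicates."""
    for field, value in guesses.items():
        if value not in record.setdefault(field, []):
            record[field].append(value)


record: Dict[str, List[str]] = {}
guesses = match_a_pattern(PATTERN_10B, "Conf call with John Smith")
if guesses is not None:
    add_to_meeting_record(record, guesses)
print(record)
```

The head indicator is searched for anywhere in the text, whereas a later indicator is tested against exactly one word, which mirrors the distinction between function blocks 450 and 460.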
Using the Identified Meeting Fields
Now that we have identified the fields within the meeting text which we consider important, there are quite a few things we can do with them. One of the most important applications of pattern matching is, of course, to improve the query we construct, which eventually gets submitted to Alta Vista and News Page. There are also many other options and enhancements exploiting the results of pattern matching that we can add to BF. These other options will be described in the next section. The goal of this section is to give the reader a good sense of how the results obtained from pattern matching can be used to help us obtain better search results.
FIG. 5 is a flowchart of the detailed processing for preparing a query and obtaining information from the Internet in accordance with a preferred embodiment. Processing commences at function block 500 and immediately flows to function block 510 to process the wrapper functionality to prepare for an Internet search utilizing a web search engine. If the search is to utilize the Alta Vista search engine, then at function block 530, the system takes information from the meeting record and forms a query in function blocks 540 to 560 for submittal to the search engine. If the search is to utilize the NewsPage search engine, then at function block 520, the system takes information from the meeting record and forms a query in function blocks 521 to 528.
Alta Vista Search Engine
The strength of the Alta Vista search engine is that it provides enhanced flexibility. Using its advanced query method, one can construct all sorts of Boolean queries and rank the search results as desired. However, one of the biggest drawbacks of Alta Vista is that it is not very good at handling a large query and is likely to return irrelevant results. If we can identify the topic and the company within a meeting text, we can form a short but comprehensive query which will hopefully yield better results. We also want to focus on the topics found. It may not be of much merit to the user to find information about a company, especially if the user already knows the company well and has had numerous meetings with it. It is the topics that the user wants to research.
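A short query of this kind might be assembled as in the following sketch. The function name build_altavista_query, the quoting of phrases, and the use of AND between fields and OR within a field are all assumptions made for illustration; the actual query construction is the one illustrated in FIG. 7. The fallback to a simple keyword query follows the description of ConstructComplexAVKeyWord in the function table.

```python
from typing import List


def build_altavista_query(
    topics: List[str], companies: List[str], title_keywords: List[str]
) -> str:
    """Build a short Boolean query from matched fields, else from all title keywords."""

    def any_of(phrases: List[str]) -> str:
        # Quote each phrase and OR the alternatives together (assumed syntax).
        return "(" + " OR ".join(f'"{p}"' for p in phrases) + ")"

    parts = [any_of(group) for group in (topics, companies) if group]
    if parts:
        return " AND ".join(parts)
    # Neither field was identified: fall back to the simple keyword query.
    return " OR ".join(title_keywords)


print(build_altavista_query(["disputes"], ["Netscape", "Microsoft"], []))
```

Keeping the query to the matched topic and company phrases is what keeps it short, which addresses the weakness with large queries noted above.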
News Page Search Engine
The strength of the News Page search engine is that it does a great job of searching for the most recent news if you are able to give it a valid company name. Therefore, when we submit a query to the News Page web site, we send whatever company name we can identify, and only if we cannot find one do we use the topics found to form a query. If neither one is found, then no search is performed. The algorithm utilized to form the query to submit to Alta Vista is illustrated in FIG. 7. The algorithm that we will use to form the query to submit to News Page is illustrated in FIG. 8.
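The selection rule for News Page stated above (company name first, topics only as a fallback, otherwise no search) can be written as a short sketch. The function name build_newspage_query and the space-joined query format are hypothetical; the actual algorithm is the one illustrated in FIG. 8.

```python
from typing import List, Optional


def build_newspage_query(companies: List[str], topics: List[str]) -> Optional[str]:
    """Pick the News Page query terms: company names first, topics only as a fallback."""
    if companies:
        return " ".join(companies)
    if topics:
        return " ".join(topics)
    return None  # nothing identified, so no search is performed


print(build_newspage_query(["Microsoft"], ["disputes"]))
print(build_newspage_query([], ["disputes"]))
print(build_newspage_query([], []))
```

Returning None rather than an empty string lets the caller skip the search entirely when neither field was identified.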
The following table describes in detail each function in accordance with a preferred embodiment. The order in which functions appear mimics the process flow as closely as possible. When there are situations in which a function is called several times, this function will be listed after the first function which calls it and its description is not duplicated after every subsequent function which calls it.
Procedure Name | Type | Called By | Description
Main (BF.Main) | Public Sub | None | This is the main function where the program first launches. It initializes BF with the appropriate parameters (e.g. Internet time-out, stoplist . . . ) and calls GoBF to launch the main part of the program.
ProcessCommandLine (BF.Main) | Private Sub | Main | This function parses the command line. It assumes that the delimiter indicating the beginning of input from Munin is stored in the constant CMD_SEPARATOR.
CreateStopList (BF.Main) | Private Function | Main | This function sets up a stop list for future use to parse out unwanted words from the meeting text. There are commas on each side of each word to enable straight checking.
CreatePatterns (BF.PatternMatch) | Public Sub | Main | This procedure is called once when BF is first initialized to create all the potential patterns that portions of the meeting text can bind to. A pattern can contain as many elements as needed. There are two types of elements. The first type of elements are indicators. These are real words which delimit the potential of a meeting field (e.g. company) to follow. Most of these indicators are stop words, as expected, because stop words are words usually common to all meeting text, so it makes sense that they form patterns. The second type of elements are special strings which represent placeholders. A placeholder is always in the form of $*$ where * can be either PEOPLE, COMPANY, TOPIC_UPPER, TIME, LOCATION or TOPIC_ALL. A pattern can begin with either one of the two types of elements and can be of any length, involving any number and type of elements. This procedure dynamically creates a new pattern record for each pattern in the table and it also dynamically creates new tAPatternElements for each element within a pattern. In addition, there is the concept of being able to substitute indicators within a pattern. For example, the pattern $PEOPLE$ of $COMPANY$ is similar to the pattern $PEOPLE$ from $COMPANY$. "from" is a substitute for "of". Our structure should be able to express such a need for substitution.
GoBF (BF.Main) | Public Sub | Main | This is a wrapper procedure that calls both the parsing and the searching subroutines of the BF. It is also responsible for sending data back to Munin.
ParseMeetingText (BF.Parse) | Public Function | GoBackGroundFinder | This function takes the initial meeting text and identifies the userID of the record as well as other parts of the meeting text including the title, body, participant list, location and time. In addition, we call a helper function ProcessStopList to eliminate all the unwanted words from the original meeting title and meeting body so that only keywords are left. The information parsed out is stored in the MeetingRecord structure. Note that this function does no error checking and for the most part assumes that the meeting text string is correctly formatted by Munin. The important variable is thisMeetingRecord, which is the temporary holder for all information regarding the current meeting. It is eventually returned to the caller.
FormatDelimitation (BF.Parse) | Private | ParseMeetingText, DetermineNumWords, GetAWordFromString | There are 4 ways in which the delimiters can be placed. We take care of all these cases by reducing them down to Case 4 in which there are no delimiters around but only between fields in a string (e.g. A::B::C).
DetermineNumWords (BF.Parse) | Public Function | ParseMeetingText, ProcessStopList | This function determines how many words there are in a string (stInEvalString). The function assumes that each word is separated by a designated separator as specified in stSeparator. The return type is an integer that indicates how many words have been found assuming each word in the string is separated by stSeparator. This function is always used along with GetAWordFromString and should be called before calling GetAWordFromString.
GetAWordFromString (BF.Parse) | Public Function | ParseMeetingText, ProcessStopList | This function extracts the ith word of the string (stInEvalString) assuming that each word in the string is separated by a designated separator contained in the variable stSeparator. In most cases, use this function with DetermineNumWords. The function returns the wanted word. This function checks to make sure that iInWordNum is within bounds so that i is not greater than the total number of words in string or less than/equal to zero. If it is out of bounds, we return empty string to indicate we can't get anything. We try to make sure this doesn't happen by calling DetermineNumWords first.
ParseAndCleanPhrase (BF.Parse) | Private Function | ParseMeetingText | This function first grabs the word and sends it to CleanWord in order to strip the stuff that nobody wants. There are things in parse Word that will kill the word, so we need a method of looping through the body and rejecting words without killing the whole function: keep CleanWord and check its return value. Once we have a word, we send it down the parse chain. This chain goes ParseAndCleanPhrase -> CleanWord -> EvaluateWord. If the word gets through the entire chain without being killed, it is added at the end to our keyword string. The first step is the function that checks for "/" as a delimiter and extracts its parts. This I will call "StitchFace" (Denise is more normal and calls it GetAWordFromString). If this finds words, then each of these is sent, in turn, down the chain. If these get through the entire chain without being added or killed, then they are added rather than tossed.
FindMin (BF.Parse) | Private Function | ParseAndCleanPhrase | This function takes in 6 input values and evaluates to see what the minimum non zero value is. It first creates an array as a holder so that we can sort the five input values in ascending order. Thus the minimum value will be the first non zero value element of the array. If we go through entire array without finding a non zero value, we know that there is an error and we exit the function.
CleanWord (BF.Parse) | Private Function | ParseAndCleanPhrase | This function tries to clean up a word in a meeting text. It first determines if the string is of a valid length. It then passes it through a series of tests to see if it is clean and, when needed, it will edit the word and strip unnecessary characters off of it. Such tests include getting rid of file extensions, non chars, numbers etc.
EvaluateWord (BF.Parse) | Private Function | ParseAndCleanPhrase | This function tests to see if this word is in the stop list so it can determine whether to eliminate the word from the original meeting text. If a word is not in the stoplist, it should stay around as a keyword and this function exits beautifully with no errors. However, if the word is a stopword, an error must be returned. We must properly delimit the input test string so we don't accidentally retrieve sub strings.
GoPatternMatch (BF.PatternMatch) | Public Sub | GoBF | This procedure is called when our QueryMethod is set to complex query meaning we do want to do all the pattern matching stuff. It's a simple wrapper function which initializes some arrays and then invokes pattern matching on the title and the body.
MatchPatterns (BF.PatternMatch) | Public Sub | GoPatternMatch | This procedure loops through every pattern in the pattern table and tries to identify different fields within a meeting text specified by sInEvalString. For debugging purposes it also tries to tabulate how many times a certain pattern was triggered and stores it in gTabulateMatches to see which pattern fired the most. gTabulateMatches is stored as a global because we want to be able to run a batch file of 40 or 50 test strings and still be able to know how often a pattern was triggered.
MatchAPattern (BF.PatternMatch) | Private Function | MatchPatterns | This function goes through each element in the current pattern. It first evaluates to determine whether the element is a placeholder or an indicator. If it is a placeholder, then it will try to bind the placeholder with some value. If it is an indicator, then we try to locate it. There is a trick however. Depending on whether the current element is the head of the pattern or not, we want to take different actions. If we are at the head, we want to look for the indicator or the placeholder. If we can't find it, then we know that the current pattern doesn't exist and we quit. However, if it is not the head, then we continue looking, because there may still be a head somewhere. We retry in this case.
MatchMeetingField (BF.PatternMatch) | Private Function | MatchAPattern | This function uses a big switch statement to first determine what kind of placeholder we are talking about and, depending on what type of placeholder, we have specific requirements and different binding criteria as specified in the subsequent functions called such as BindNames, BindTime etc. If binding is successful we add it to our guessing record.
BindNames (BF.PatternMatch) | Private Function | MatchMeetingField | In this function, we try to match names to the corresponding placeholder $PEOPLE$. Names are defined as any consecutive two words which are capitalized. We also want to retrieve a series of names which are connected by ",", "and" or "&", so we look until we don't see any of these 3 separators anymore. Note that we don't want to bind single word names because it is probably too general anyway, so we don't want to produce broad but irrelevant results. This function calls BindAFullName which binds one name, so in a sense BindNames collects all the results from BindAFullName.
BindAFullName (BF.PatternMatch) | Private Function | BindNames | This function tries to bind a full name. If the $PEOPLE$ placeholder is not the head of the pattern, we know that it has to come right at the beginning of the test string because we've been deleting stuff off the head of the string all along. If it is the head, we search until we find something that looks like a full name. If we can't find it, then there's no such pattern in the text entirely and we quit entirely from this pattern. This should eventually return us to the next pattern in MatchPatterns.
GetNextWordAfterWhiteSpace (BF.PatternMatch) | Private Function | BindAFullName, BindTime, BindCompanyTopicLoc | This function grabs the next word in a test string. It looks for the next word after white spaces, @ or /. The word is defined to end when we encounter another one of these white spaces or separators.
BindTime (BF.PatternMatch) | Private Function | MatchMeetingField | Get the immediate next word and see if it looks like a time pattern. If so, we've found a time and so we want to add it to the record. We probably should add more time patterns. But people don't seem to like to enter the time in their titles these days, especially since we now have tools like Outlook.
BindCompanyTopicLoc (BF.PatternMatch) | Private Function | MatchMeetingField | This function finds a continuous capitalized string and binds it to stMatch which is passed by reference from MatchMeetingField. A continuous capitalized string is a sequence of capitalized words which are not interrupted by things like , . etc. There's probably more stuff we can add to the list of interruptions.
LocatePatternHead (BF.PatternMatch) | Private Function | MatchAPattern | This function tries to locate an element which is an indicator. Note that this indicator SHOULD BE AT THE HEAD of the pattern otherwise it would have gone to the function LocateIndicator instead. Therefore, we keep on grabbing the next word until either there's no word for us to grab (quit) or if we find one of the indicators we are looking for.
ContainInArray (BF.PatternMatch) | Private Function | LocatePatternHead, LocateIndicator | This function is really simple. It loops through all the elements in the array to find a matching string.
LocateIndicator (BF.PatternMatch) | Private Function | MatchAPattern | This function tries to locate an element which is an indicator. Note that this indicator is NOT at the head of the pattern otherwise it would have gone to LocatePatternHead instead. Because of this, if our pattern is to be satisfied, the next word we grab HAS to be the indicator or else we would have failed. Thus we only grab one word, test to see if it is a valid indicator and then return result.
InitializeGuessesRecord (BF.PatternMatch) | Private Sub | MatchAPattern | This function reinitializes our temporary test structure. Because we have already transferred the info to the permanent structure, we can reinitialize it so they each have one element.
AddToMeetingRecord (BF.PatternMatch) | Private Sub | MatchAPattern | This function is only called when we know that the information stored in tInCurrGuesses is valid, meaning that it represents legitimate guesses of meeting fields ready to be stored in the permanent record, tInMeetingRecord. We check to make sure that we do not store duplicates and we also want to clean up what we want to store so that there's no clutter such as punctuation, etc. The reason why we don't clean up until now is to save time. We don't waste resources calling ParseAndCleanPhrase until we know for sure that we are going to add it permanently.
NoDuplicateEntry (BF.PatternMatch) | Private Function | AddToMeetingRecord | This function loops through each element in the array to make sure that the test string aString is not the same as any of the strings already stored in the array. Slightly different from ContainInArray.
SearchAltaVista (BF.Search) | Public Function | GoBackGroundFinder | This function prepares a query to be submitted to AltaVista Search engine. It submits it and then parses the returning result in the appropriate format containing the title, URL and body/summary of each story retrieved. The number of stories retrieved is specified by the constant NUM_AV_STORIES. Important variables include stURLAltaVista, used to store the query to submit, and stResultHTML, used to store html from the page specified by stURLAltaVista.
ConstructAltaVistaURL (BF.Search) | Private Function | SearchAltaVista | This function constructs the URL string for the alta vista search engine using the advanced query search mode. It includes the keywords to be used, the language and how we want to rank the search. Depending on whether we want to use the results of our pattern matching unit, we construct our query differently.
ConstructSimpleKeyWord (BF.Search) | Private Function | ConstructAltaVistaURL, ConstructNewsPageURL | This function marches down the list of keywords stored in the stTitleKW or stBodyKW fields of the input meeting record and links them up into one string with each keyword separated by a connector as determined by the input variable stInConnector. Returns this newly constructed string.
ConstructComplexAVKeyWord (BF.Search) | Private Function | ConstructAltaVistaURL | This function constructs the keywords to be sent to the AltaVista site. Unlike ConstructSimpleKeyWord which simply takes all the keywords from the title to form the query, this function will look at the results of BF's pattern matching process and see if we are able to identify any specific company names or topics for constructing the queries. Query will include company and topic identified and default to simple query if we cannot identify either company or
|