Semantic user interface
5974413
Abstract
A system and method that allows a user to use their everyday language or user defined words to operate a computer in a highly efficient way. In short, every word, letter, control character and symbol is potentially actionable. A computer user's productivity is dramatically increased by making available those functions that enable a user to produce most of his work through simple, language-based commands. The present invention provides an intuitive interface, referred to as a semantic user interface (SUI), that enhances the operation of the current standard window-based interface in a manner that is simple, richer and natural. By leveraging all of the richness and power inherent in a user's language, the present invention provides an important tool that allows the personal computer to operate in a manner that is much closer to our natural way of interacting. A user is allowed to enter "commands" in his everyday natural language in order to control the operations of the computer. All commands are language-based and user-defined. These commands can be entered from any context of the user's computer (e.g., any application or operating system workspace). The commands allow a user to launch applications and navigate within applications by using language rather than clicks from a pointing device such as a mouse. They also allow the replacement of keystrokes with stored words or keystrokes. The system also keeps a complete archive record of all the text content the user provides as input, regardless of which application program or operating system window the user is operating in at the time. The combined set of all user defined commands and the memory of all the input text that is stored in the archive constitutes the personality profile and is transportable from one computer to another.
Claims
What is claimed is:
1. A system for permitting a user to implement functionality on a computer, the functionality being provided across a plurality of application programs or within an environment created by an operating system, the computer including a data entry device, comprising:
means for monitoring all data entered or selected by a user within any one of the plurality of application programs or within the environment created by the operating system, said data including one or more alphabetic letters, symbols and/or words, wherein certain combinations of data represent action words;
a wordbase having stored therein a plurality of item records, each item record having an action word and one of a plurality of associated functions;
means for searching said wordbase for a match with an action word entered by said user; and
means for performing said function associated with said action word.
2. The system of claim 1, wherein said data is entered via a microphone, selection device, or keyboard.
3. The system of claim 2, further comprising means for recognizing voice signals input via said microphone to produce recognizable data, wherein said recognizable data is used by said means for searching.
4. The system of claim 1, wherein said data entered by said user may be selected with a selection device by said user.
5. The system of claim 1, wherein a word entered by said user is a dual word, wherein said user disambiguates said dual word to indicate to said means for monitoring that said dual word is an action word.
6. The system of claim 1, wherein said data includes code words, dual words and content words.
7. The system of claim 6, further comprising means for providing feedback to said user when said user enters a dual word.
8. The system of claim 1, further comprising means for providing statistics regarding said user's input activity, including the context said user was operating within at the time of said input activity.
9. The system of claim 1, wherein said data is captured by said means for monitoring prior to the operating system forwarding said data to one of the plurality of application programs.
10. The system of claim 1, further comprising a display device for displaying said data, wherein said data is erased if it represents an action word.
11. The system of claim 1, wherein said data is displayed within the data entry fields of one of the plurality of application programs.
12. The system of claim 1, further comprising means for forwarding said data to one of the plurality of application programs.
13. The system of claim 12, further comprising a state table that stores all information that has been forwarded to said one of the plurality of application programs.
14. The system of claim 1, wherein said action word is a dual word, said system further comprising means for providing a signal that a dual word has just been entered by said user.
15. The system of claim 1, wherein said function can be activated via a plurality of action words.
16. The system of claim 1, wherein said means for monitoring includes a virtual device driver.
17. The system of claim 1, wherein said means for monitoring monitors for a delineator, wherein an action word is always followed by a delineator.
18. The system of claim 17, wherein said delineator is a punctuation mark, a special character, entry of a space bar, or a click of a selection device.
19. The system of claim 17, wherein said delineator includes a context switch between application programs.
20. The system of claim 1, further comprising means for forwarding said data to the operating system.
21. The system of claim 1, wherein said wordbase includes a plurality of folders, wherein each folder has a priority associated therewith.
22. The system of claim 21, further comprising means for changing the priority of said plurality of folders.
23. The system of claim 1, further comprising means for displaying a charm box, said charm box having displayed therein information relating to said data entered by said user.
24. The system of claim 1, further comprising a mathematical application program for performing in-place arithmetic, said mathematical application program being triggered by an action word.
25. The system of claim 1, further comprising means for displaying a monitor, said monitor having a field for displaying said data and a field for displaying said function being performed.
26. The system of claim 1, wherein each item record within said wordbase includes a frequency count of dual word matches and code word matches.
27. The system of claim 1, wherein said function is performed by executing a script.
28. The system of claim 1, wherein said wordbase includes an archive of all data entered by said user.
29. The system of claim 28, wherein said archive includes a 7×7 organization of said data.
30. The system of claim 1, wherein said function includes launching an application program, a file or a folder.
31. The system of claim 1, wherein said function includes text substitution, wherein said text is substituted at the position of a displayed cursor.
32. The system of claim 31, further comprising means for toggling between at least two choices for said text substitution.
33. The system of claim 1, wherein at least a subset of said action words are user defined, wherein said user can add, delete and modify said action words within said wordbase.
34. The system of claim 1, wherein said means for monitoring can be toggled between on and off.
35. The system of claim 1, wherein the environment provides a graphical user interface (GUI).
36. The system of claim 35, further comprising means for selecting a block of data that is displayed via said GUI or one of the plurality of application programs, wherein said data can be entered by selecting said block of data.
37. The system of claim 1, wherein said associated function may include calling an agent.
38. The system of claim 1, wherein said wordbase is located on a server connected to a network.
39. The system of claim 1, further comprising means for providing said user with statistical feedback regarding said data.
40. The system of claim 1, wherein a single action word can activate two or more functions, the system further comprising means for selecting between said two or more functions when said single action word is entered by said user.
41. The system of claim 1, wherein said action word is formed by at least two natural language words.
42. The system of claim 1, further comprising means for generating and displaying statistical data regarding the productive use of said action words by said user.
43. The system of claim 1, further comprising a state table that includes a list of said data most recently entered by said user.
44. The system of claim 1, wherein said action words include code words and dual words, the system further comprising means for allowing said user to turn said code words and said dual words on and off within said wordbase.
45. A method for permitting a user to implement functionality on a computer having a graphical user interface and data entry device, comprising:
1) providing a wordbase having a plurality of item records, each item record having stored therein an action word, organizing said item records to define at least one personal profile, wherein said action word comprises natural language or code word text strings;
2) associating a plurality of agents with said wordbase, wherein each agent performs one of a plurality of functions;
3) associating said action word stored in said wordbase with a function performed by one of said plurality of agents;
4) receiving a data string input by the user within an application program or an operating system environment or data selected by a user within said application program or said operating system environment, wherein said data string selected by the user is displayed on the graphical user interface;
5) determining if said data string input by the user or said data selected by the user is an action word stored in said wordbase; and
6) performing via one of said plurality of agents said function associated with said action word stored in said wordbase.
46. The method of claim 45, wherein performing said function includes launching an application program and passing data to the newly launched application.
47. The method of claim 45, further comprising the step of populating said wordbase with user defined action words.
48. A method for allowing a user to control a computer having an operating system that provides a graphical user interface (GUI), comprising the steps of:
(1) providing a semantic user interface (SUI) that complements the GUI, said semantic user interface seamlessly integrated with the operating system;
(2) allowing a user to enter keystrokes;
(3) monitoring for said keystrokes by said SUI; and
(4) performing an action associated with said keystrokes, wherein said action can be performed within any application program running on the computer or within an environment created by the operating system.
49. A method for allowing a user to control a computer within a network, comprising the steps of:
(1) providing a semantic user interface (SUI);
(2) allowing a user to enter data, wherein said data can be entered via a microphone, selection device or keyboard;
(3) monitoring for said data by said SUI; and
(4) performing an action associated with said keystroke, wherein said action can be performed within any application program running on the computer or within an environment created by an operating system, wherein said action is user definable.
50. The method of claim 49, further comprising the step of populating a wordbase, wherein said wordbase includes a plurality of item records, each item record having one or more keystrokes and an associated action.
51. The method of claim 50, wherein said action is selected from the group of: navigation, information, substitution and control.
52. The method of claim 49, wherein said network includes a server that executes an operating system that provides a graphical user interface (GUI), wherein said SUI complements said GUI.
53. The method of claim 48, wherein said action is selected from the group of: navigation, information, substitution and control.
54. The method of claim 48, further comprising the steps of allowing a user to enter voice data via a speech recognition unit, monitoring for said voice data by said SUI, and performing an action associated with said voice data, wherein said action can be performed within any application program running on the computer or within said environment created by the operating system.
55. The method of claim 48, wherein said keystrokes entered by said user can form a content word, a code word or a dual word, wherein the method further comprises disambiguating a dual word.
56. The method of claim 55, wherein said action is selected from the group of: navigation, information, substitution and control.
57. The method of claim 56, further comprising the step of allowing said user to select said keystrokes that result in said action being performed.
58. The method of claim 55, further comprising the step of providing a signal to said user when said user enters a dual word.
59. The method of claim 55, further comprising the step of erasing a code word or dual word after it is entered by said user.
60. The method of claim 48, wherein said step of monitoring occurs prior to the operating system forwarding said keystrokes to any application program.
61. The method of claim 48, wherein said keystrokes can form two or more words that result in said action being performed.
62. The method of claim 48, wherein the step of monitoring monitors for entry of an action word followed by a delineator.
63. The method of claim 48, further comprising displaying a charm box, said charm box having displayed therein information relating to said keystrokes entered by said user.
64. The method of claim 48, wherein said step of providing said SUI further comprises the step of displaying a monitor, said monitor having a field for displaying said keystrokes and a field for displaying said action being performed.
65. The method of claim 48, wherein said action includes launching an application program, a file or a folder.
66. The method of claim 48, further comprising the steps of:
displaying at least two actions that can be performed;
allowing said user to toggle between said at least two actions; and
allowing said user to select one of said at least two actions.
67. The method of claim 48, wherein the step of monitoring can be toggled between on and off.
68. The method of claim 48, further comprising providing statistics regarding said user's input activity, including the context said user was operating within at the time said keystrokes were entered.
69. The system of claim 1, further comprising means for defining at least two personal profiles, wherein each personal profile controls the computer with a different set of action words.
70. The system of claim 1, wherein said wordbase can be shared by more than one user.
71. The system of claim 1, wherein the computer is connected to a server.
72. The system of claim 1, wherein said associated function is selected from the group of: navigation, information, substitution, and control.
73. The system of claim 1, wherein one of said plurality of associated functions includes launching an application program or Internet site.
74. The system of claim 45, wherein said function includes navigation, information, substitution and control.
75. A system for permitting a user to implement functionality on a computer, the functionality being provided across a plurality of application programs or within an environment created by an operating system, the computer including a data entry device, comprising:
means for monitoring all data entered or selected by a user within any one of the plurality of application programs or within the environment created by the operating system, said data including one or more alphabetic letters, symbols and/or words, wherein certain combinations of data represent action words;
a wordbase having stored therein a plurality of item records, each item record having an action word and one of a plurality of associated functions, wherein said associated functions are selected from the group of: information, navigation, control and substitution;
means for searching said wordbase for a match with an action word entered by said user; and
means for performing said function associated with said action word.
76. The system of claim 75, wherein said data is entered via a microphone, selection device, or keyboard.
77. The system of claim 75, further comprising means for recognizing voice signals input via said microphone to produce recognizable data, wherein said recognizable data is used by said means for searching.
78. The system of claim 75, wherein said data entered by said user may be selected with a selection device by said user.
79. The system of claim 75, wherein a word entered by said user is a dual word, wherein said user disambiguates said dual word to indicate to said means for monitoring that said dual word is an action word.
80. The system of claim 75, wherein said data includes code words, dual words and content words.
81. The system of claim 80, further comprising means for providing feedback to said user when said user enters a dual word.
82. The system of claim 75, wherein said data is captured by said means for monitoring prior to the operating system forwarding said data to one of the plurality of application programs.
83. The system of claim 75, further comprising a display device for displaying said data, wherein said data is erased if it represents an action word.
84. The system of claim 75, wherein said function can be activated via a plurality of action words.
85. The system of claim 75, wherein said means for monitoring monitors for a delineator, wherein an action word is always followed by a delineator.
86. The system of claim 85, wherein said delineator is a punctuation mark, a special character, entry of a space bar, or a click of a selection device.
87. The system of claim 85, wherein said delineator includes a context switch between application programs.
88. The system of claim 75, wherein said function includes launching an application program, a file or a folder.
89. The system of claim 75, wherein said function includes text substitution, wherein said text is substituted at the position of a displayed cursor.
90. The system of claim 89, further comprising means for toggling between at least two choices for said text substitution.
91. The system of claim 75, wherein at least a subset of said action words are user defined, wherein said user can add, delete and modify said action words within said wordbase.
92. The system of claim 75, wherein said means for monitoring can be toggled between on and off.
93. The system of claim 75, wherein said wordbase is located on a server connected to a network.
94. The system of claim 75, wherein a single action word can activate two or more functions, the system further comprising means for selecting between said two or more functions when said single action word is entered by said user.
95. The system of claim 75, wherein said action word is formed by at least two natural language words.
96. The system of claim 75, further comprising means for defining at least two personal profiles, wherein each personal profile controls the computer with a different set of action words.
97. The system of claim 75, wherein said wordbase can be shared by more than one user.
98. The system of claim 75, wherein the computer is connected to a server.
99. The system of claim 75, wherein one of said plurality of associated functions includes launching an application program or Internet site.
100. The system of claim 75, wherein said action words include code words and dual words, the system further comprising means for allowing said user to turn said code words and said dual words on and off within said wordbase.
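By way of illustration only, the elements recited in claim 1 (monitoring entered data, a wordbase of item records, searching for a match, and performing the associated function) might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `WORDBASE`, `DELINEATORS`, `monitor`, and the two example functions are hypothetical, and the delineator set is assumed to be punctuation, the space bar, and a line break.

```python
import string

# Assumed delineators: punctuation marks, a space, or a line break.
DELINEATORS = set(string.punctuation) | {" ", "\n"}


def launch_calculator():
    # Hypothetical function; a real system would launch an application.
    return "launch: calculator"


def open_documents_folder():
    # Hypothetical function; a real system would open a folder.
    return "open: documents folder"


# Hypothetical wordbase: each item record pairs an action word with a function.
WORDBASE = {
    "calc": launch_calculator,
    "docs": open_documents_folder,
}


def monitor(keystrokes, wordbase=WORDBASE):
    """Scan entered data; when a delineator ends a word, search the wordbase
    and perform the associated function if the word is an action word."""
    performed = []
    word = []
    for ch in keystrokes:
        if ch in DELINEATORS:
            candidate = "".join(word)
            word.clear()
            function = wordbase.get(candidate)
            if function is not None:
                performed.append(function())
        else:
            word.append(ch)
    return performed


results = monitor("please calc the docs, budget ")
```

In this sketch a word is only tested against the wordbase once a delineator follows it, so a trailing word with no delineator is not acted upon.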
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to a semantic interface for a computer system, and more particularly, to a system and method of providing a semantic interface that allows a user to access via a set of user defined words a plurality of services, including command, navigation and substitution, within all contexts of his/her computer system.
2. Related Art
Computers have revolutionized the way individuals in all aspects of life perform tasks. A user interface provides a mechanism for individuals to access all the features and functionalities of their computer. Without a user friendly interface, these features and functions are typically inaccessible to the computer operator. The prevalent user interface in the industry today uses windows, icons, menus and pointing devices. The text stream entered by the user, however, has been essentially ignored.
This window-based user interface (also referred to as a graphical user interface or GUI) was first conceived by Xerox, commercialized by Apple Computers (e.g., the Macintosh), and brought to the mainstream by Microsoft Corporation (e.g., Windows 95). The GUI is powerful for organizing the capabilities and resources available in a computer. It enables the user to incrementally explore and discover his computer's capabilities and controls. It keeps everything in a convenient visual context, using helpful metaphors, like desktops and windows.
The GUI provides a menu hierarchy which is accessible via a pointing device or mouse. One of the cornerstones of this interface is the ability for the user to interact directly with objects and elements. This can be a great advantage in some cases, but it can turn simple tasks that are often repeated into tedious chores of navigation through a maze of GUI windows. To provide the ability to directly manipulate the elements or objects, you must enable the user to work at an "atomic" level, losing the ability to group a related series of basic actions into one high level action or to use conditionals.
Under this paradigm, we can no longer access or work with objects that are not visible or unknown to us. This situation is not unlike going to eat in another country where we do not know the language. We are thus forced to go to the kitchen and point to whatever food we want. It is clear that we do not want to go to the kitchen in order to eat; we would rather express ourselves using all the richness that our natural language allows. In order to accomplish this goal, the computer has to respond to human language and not the other way around. In reality, we want to be able to go to our favorite restaurant and say "give me my usual order" and receive exactly what we ordered. This is personal attention and awareness of your eating profile. This is, in essence, what we want from our personal computer.
Advocates of the window-based user interface firmly believe that the user should always be in control. The window-based user interface provides permanent feedback to the user by providing windows with menus. The downside of this is that the user must always be in control even if he does not want to be, or cannot be because of the complexity of the task. Windows 95, for example, has started a trend of allowing the user to delegate his control to an agent, through a concept of smart and thoughtful "agents." Tasks such as uninstalling software are, for example, automatically performed by Windows 95.
There are many tasks that a user must repeat again and again when using a GUI, such as opening certain files and activating certain controls. For such tasks, the GUI presents the user with a single logic set, implemented within the limited screen real estate of his computer monitor. Also, the GUI recognizes none of the user's words. Even the simplest functions require the user to change mode from the keyboard to hand/eye mouse control. To use the GUI, he must lift his hand from the keyboard to the mouse. He must also lift his eyes to the screen to locate the desired graphical element, and then manipulate the mouse while visually monitoring the result. This is like having to look at someone every time you want to say a word to that person.
Computer systems must provide users with a mechanism to undo a previous action when a mistake has been made. These same systems must also provide a strong warning if an intended action will be irreversible. This is not a great concern when you are a novice user, but when you become a more experienced user, this feature will turn against you and unnecessarily increase your workload. Take for instance the simple task of copying a file to a floppy that does not have enough free space. The Macintosh window-based interface provides a warning that you must throw away X Kb in order to make space. The user, in an attempt to make space, discards X Kb of data from the floppy and places it in the trash. The user once again attempts to copy the file onto the floppy, only to be told that there is not enough room, but would you like to throw out the trash. This is what you wanted from the start! What was a good feature quickly becomes a nuisance.
The problem stems from the inability of the computer to fully understand even our simplest intentions; it lacks our personality profile. In order to overcome this problem, the computer needs to build a deeper model of each user's intentions and history in order to better serve the user's needs and to eliminate unnecessarily repetitive activities. The core requirement here is to provide mechanisms to ascertain how users work and to track their activities in an unobtrusive way.
Current wisdom states that the more static and unchanging our environments, the simpler and better it is for us. As we grow in knowledge and understanding of what the computer can do for us, we are willing to accept changes and learn to cope with them in our quest to increase our personal productivity. Unfortunately, current computer user interfaces have limited abilities to allow a user to express themselves. If computers could communicate with a richer language, it would not be so important that everything have a uniform look and feel.
The computer interface should allow the user to perform any task at any time, irrespective of the applications that are currently running. In other words, if a user is working on a word processor and needs to make some calculations, he should not be required to leave his work and open another application via a menu driven user interface to complete the necessary arithmetic operations.
Every computer user has a unique pattern of use. Typically, 80% of a user's work product is accomplished through repeated use of only 20% of his software's available features. This is commonly referred to as Pareto's Rule, and the 20% of the tasks are often referred to as the "vital few." The 80% of available software features and functions that are not needed or used by any particular user must still be available to all other users through the GUI system of menus and windows. Every user's "80/20" profile is unique. Nevertheless, it is the need to organize 100% of the available functionality that necessitates the depth, nesting and complexity of current GUI systems. As a result, the GUI is an inefficient fit, to a greater or lesser degree, for every individual user.
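As a worked illustration of the "80/20" pattern described above, the following minimal sketch picks out one user's "vital few" from a usage log. The function name `vital_few` and the feature counts are hypothetical and made up for illustration; they are not measurements from any real user.

```python
def vital_few(usage_counts, share_percent=80):
    """Return the most-used features that together cover at least
    `share_percent` percent of all recorded use."""
    total = sum(usage_counts.values())
    covered = 0
    selected = []
    ranked = sorted(usage_counts.items(), key=lambda kv: kv[1], reverse=True)
    for feature, count in ranked:
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= share_percent * total:
            break
        selected.append(feature)
        covered += count
    return selected


# Hypothetical usage log for one user (counts invented for illustration).
usage = {"open": 50, "save": 30, "print": 8, "find": 5, "macro": 3,
         "table": 2, "merge": 1, "index": 1, "export": 0, "sort": 0}
few = vital_few(usage)
```

With these invented counts, two of the ten features account for 80 of the 100 recorded uses. A different user's log would generally yield a different set.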
Over the years, a number of approaches have been invented to tackle this problem of inefficient fit. Because of their inherent limitations, none have been successful enough to reach the mainstream user. Software entrepreneurs have developed "shortcut" utilities of various designs. While not specifically marketed as such, the intention of these utilities is to address each user's "80/20" pattern of often repeated tasks. These "shortcut" utilities take two forms: macros triggered by key combinations and icon palette macros.
Macros triggered by key combinations typically take one or both of two forms: macro utilities and text replacement utilities. Macro utility programs provide shortcuts to functions and processes such as opening applications and files, making menu selections, and performing multi-step operations. Macro utilities, such as Tempo, MacroMagic, and Keyboard Express for the "WinTel" platform and QuicKeys for the Macintosh, are all activated by the user via keystroke combinations. Microsoft's Windows interface offers many key combination shortcut macros to operate various controls, menus, etc. To activate these macros, the user must press at least one "control" key (e.g., <alt>), combined with pressing a single "non-control" character (e.g., <x>). Users find it very difficult to develop a mnemonically consistent scheme for remembering such key combinations, for two reasons. First, the combinations are so arbitrary that it is difficult to use mnemonic logic to memorize them. Second, many key combinations only work a given way in specific application programs, further restricting the combinations that are available. The user's limited ability to remember and reflexively recall more than a few cryptic key combinations severely limits the usability of macro utilities. Many people are so intimidated by the cryptic nature of macros that they refuse to even consider their use.
Text substitution utilities provide the ability to replace a short string of typed text with long and/or formatted text. For example, a user may define the code word "evp" to trigger the substitution to "Executive Vice President", or define a short code word like "nad" to be replaced by a series of pre-defined text lines (name and address in this case). There are several utility software products available to do that within single applications. Text replacement utilities for single applications are, for example, included with Word 7.0 for Windows 95. Other examples include ShortKeys for Windows and both SpellCatcher and TypeIt4Me on the Macintosh platform. Recognition of the user's words by these utilities is limited to the purpose of replacing one text string with another. These utilities are writer's aids only. They do not enable the user to also use words for controlling computer processes and functions.
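The text replacement behavior described above can be sketched in a few lines. This is an illustrative sketch only, not the code of any of the named products: the table `SUBSTITUTIONS` and the function `expand` are hypothetical names, the "evp" entry follows the example in the text, and the name-and-address lines for "nad" are placeholders.

```python
# Hypothetical substitution table. "evp" follows the example in the text;
# the lines stored under "nad" are placeholder values.
SUBSTITUTIONS = {
    "evp": "Executive Vice President",
    "nad": "Jane Doe\n123 Example Street\nAnytown",
}


def expand(text, table=SUBSTITUTIONS):
    """Replace each space-separated code word with its stored text.
    Tokens that are not code words pass through unchanged."""
    return " ".join(table.get(token, token) for token in text.split(" "))


expanded = expand("Signed, the evp of sales")
```

Note that the sketch only maps one text string to another; it has no way to trigger a process or function.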
Icon palette utilities are used to give macros a visual presence and context. The macros are activated via mouse clicks. The icon palettes are an attempt to use a visual interface to overcome the cryptic and therefore hard-to-remember keystroke interface for macro utilities. Often, macro utility products offer icon palettes as a second, alternative interface for accessing the macros. In this approach, a computer macro (process or function) is assigned to a graphical icon, which is presented on an icon bar on the user's screen. Examples of such utilities are included in Norton Navigator for Windows 95 and in both QuicKeys and OneClick on the Macintosh platform.
By definition, these icon palette utilities are an extension of the GUI. Screen size, display resolution, and the user's preference in allocating scarce screen real estate limit the number of icons it makes sense for the user to display on his screen. Given that the user's "vital few" can involve scores or hundreds of items, the icon approach is severely limited by the visual real estate available and the amount of visual complexity the user can tolerate. Moreover, the user must memorize the relationship between the graphic depiction of each icon and the function or process each executes. As the user's icon palette population increases, the distinctiveness of each icon is reduced.
The existing shortcut utilities do not offer the user an integrated approach to creating, managing and using shortcuts for content services, retrieval services and command. Their interfaces are inconsistent and far too difficult to organize and remember. Because the user must assemble his shortcuts using a collection of different software products, he loses a lot of his gains in dealing with cumbersome and time-consuming management of his shortcuts.
It is clear from the above, that the current trend to rely solely on window-based user interfaces has seriously constrained a user's ability to fully utilize their computer. Although the window-based user interface has revolutionized the computer system, and has allowed millions of people to use computers, we have reached a point where a user's ability to fully appreciate and utilize all of the features and functionalities of their computer system has been compromised. Thus, what is needed is a system and method that provides a user with an efficient, convenient and natural way to utilize his everyday language to work with applications, files, control commands, and the like, that form his/her "vital few."
SUMMARY OF THE INVENTION
The present invention allows a user to use their everyday language competency or user defined code words to operate a computer in a highly efficient way. In short, every word, letter, control character and symbol is actionable. The present invention is based on Pareto's law, which applies to how people work. Pareto's law states that people use 20% of all available tools and functionality to accomplish 80% of their tasks. Similarly, 80% of people's work is accomplished by repeating 20%, or the vital few, of their tasks. By focusing on those activities that enable us to produce most of our work and making them available through simple, natural language-based commands, the present invention enhances a computer user's productivity dramatically. The present invention provides a more intuitive interface that enhances the operation of the current standard graphical user interface (GUI) in a manner that is simple, richer and natural. By leveraging all of the richness and power inherent in our language, the present invention provides an important tool that allows the personal computer to operate in a manner that is much closer to our natural way of interacting; that is, the way people interact with each other.
The present invention provides a language awareness paradigm, which was born out of a very practical need: to do more with current resources. The basic principles of the language awareness paradigm can be stated very simply:
all commands are natural language-based and/or user-defined.
the basic set of commands are designed to allow users to gain access to their vital few (e.g., applications, documents, controls and functions), which defines each user's "sweet spot" of activity, using a least effort path.
all operations and functionality are unobtrusive.
all of the user's input is recorded in a context rich format for future reference.
the combined set of all user word preferences, defined commands, and the order in which the commands are stored in memory constitutes a personality profile and is transportable from one computer to another.
Based on the above principles, the present invention provides a user environment, referred to as a semantic user interface (SUI), that complements the GUI. Via the SUI, the user is enabled to enter action words and interact with the system to control the operations of the computer. The SUI is always monitoring the user's input text stream in the background.
The SUI thus makes the computer responsive, on a system-wide basis, to the user's every word. Accordingly, the SUI allows a user to enter action words from any context (i.e., any application or operating system workspace). Action words are a new category of words introduced by the present invention. Action words are thus words that users place into the text stream as requests for specific services from the present invention. There are two types of action words: code words and dual words. Code words are action words the user makes up or which are not part of his natural language lexicon (e.g., not in the standard dictionary). For example, typing "msword" to launch Microsoft's Word application is an example of entering a code word. Dual words are utterances that can be either ordinary content words or action words, depending on the user's intention in typing the word. The user may type "excel" because he intends it to be a content word in his application text, or, alternatively, he may type it because he wants to use it as an action word for opening Microsoft Excel.
The action words are then checked against the contents of a wordbase. The wordbase includes a plurality of item records. Each item record includes an action word (i.e., code word and/or dual word) and an associated service script. The service script may perform a content, retrieval, navigation or command service, or a combination of these. If the action word entered by the user is located within the wordbase, the service script associated therewith is executed. Otherwise, the utterance entered by the user is a content word and is ignored by the present invention.
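The lookup described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the preferred embodiment: the names ItemRecord, Wordbase and handle_utterance are hypothetical, service scripts are modeled as plain callables that return a description of the action, and case-insensitive matching is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ItemRecord:
    """One wordbase entry: an action word plus its service script."""
    action_word: str
    kind: str  # "code" or "dual" (labels assumed for this sketch)
    service_script: Callable[[], str]


class Wordbase:
    """Hypothetical in-memory wordbase keyed by action word."""

    def __init__(self) -> None:
        self._items: Dict[str, ItemRecord] = {}

    def add(self, record: ItemRecord) -> None:
        # Case-insensitive matching is an assumption of this sketch.
        self._items[record.action_word.lower()] = record

    def lookup(self, utterance: str) -> Optional[ItemRecord]:
        return self._items.get(utterance.lower())


def handle_utterance(wordbase: Wordbase, utterance: str) -> Optional[str]:
    """Execute the service script if the utterance is an action word.

    Returns the script's result, or None when the utterance is a
    content word and is therefore ignored.
    """
    record = wordbase.lookup(utterance)
    if record is None:
        return None
    return record.service_script()


wordbase = Wordbase()
wordbase.add(ItemRecord("msword", "code", lambda: "launch Microsoft Word"))
wordbase.add(ItemRecord("excel", "dual", lambda: "launch Microsoft Excel"))
```

Calling handle_utterance(wordbase, "msword") runs the associated script, while an utterance that is absent from the wordbase yields None and is left alone as content.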
Action words allow a user to launch applications, navigate within applications and control application functions by using their natural language rather than dragging and clicking with a pointing device such as a mouse. The language used is personalized for each user. That is, the action words can be user defined, thus allowing a user to utilize his own lexicon of words to control his/her computer. The present invention allows the user to identify a variety of repetitive tasks and trigger them via their predefined action words. It also enables new types of computer access, information retrieval, and other services to be performed. The present invention works with, and independently of, any software application (e.g., word processor, spreadsheet, presentation package, Internet navigator, and the like). It is thus a context-free semantic user interface, software tool and an application environment.
The present invention saves all information that is entered by the user, and stores this information in a maintenance free environment, referred to as an ActiveWords archive. The present invention records and archives the user's input text on the fly from whatever application he is working in at that time. The present invention further creates a so-called 7×7 data repository, which is a database that is divided into seven categories, each category having seven subcategories. The 7×7 categorization allows a user to record notes, expenses, to-do lists, and the like. Finally, the present invention is completely portable. It goes wherever the user goes simply by providing for the user's personal profile to be downloadable from one computer to any other computer that has the present invention installed.
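As a rough illustration of the repository of seven categories with seven subcategories each, the following Python sketch enforces that shape and stores free-form entries in each cell. The class name SevenBySeven and the placeholder category names are assumptions made for illustration; the text above does not name the categories.

```python
from typing import Dict, List, Tuple

SIZE = 7  # seven categories, seven subcategories each


class SevenBySeven:
    """Hypothetical store with exactly SIZE x SIZE cells."""

    def __init__(self, layout: Dict[str, List[str]]) -> None:
        if len(layout) != SIZE:
            raise ValueError("exactly seven categories are required")
        for category, subs in layout.items():
            if len(subs) != SIZE or len(set(subs)) != SIZE:
                raise ValueError(
                    f"{category!r} needs seven distinct subcategories"
                )
        self._cells: Dict[Tuple[str, str], List[str]] = {
            (category, sub): []
            for category, subs in layout.items()
            for sub in subs
        }

    def add(self, category: str, subcategory: str, entry: str) -> None:
        # Raises KeyError for a cell that is not part of the layout.
        self._cells[(category, subcategory)].append(entry)

    def entries(self, category: str, subcategory: str) -> List[str]:
        return list(self._cells[(category, subcategory)])

    def cell_count(self) -> int:
        return len(self._cells)


# Placeholder names only; real category names would be chosen by the user.
layout = {
    f"category-{i}": [f"sub-{i}-{j}" for j in range(1, SIZE + 1)]
    for i in range(1, SIZE + 1)
}
repo = SevenBySeven(layout)
repo.add("category-1", "sub-1-1", "lunch with client, $42")
```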
The user can create a user profile to match his unique language personality. The present invention keeps an archive record of the user's language preferences, word frequencies, and his utterance behavior. It provides the user with tools for using that archive, in combination with his user profile, to refine his SUI and tailor it to match his habits and preferences. Using the SUI thus becomes reflexive, like the use of a mouse becomes reflexive, because it is so easy to learn and operate, and because it operates the same way in all contexts. Finally, the SUI establishes a platform others can use to develop and sell application products that leverage the SUI. By linking the SUI via software agents, any software product can become language aware.
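One way to picture a transportable profile that carries both the user's action words and word frequencies is the Python sketch below. It is a sketch under stated assumptions: the function names are hypothetical, service scripts are represented as plain strings, and JSON is chosen here only as a convenient transport format that the text above does not specify.

```python
import json
from collections import Counter
from typing import Dict


def build_profile(action_words: Dict[str, str], archived_text: str) -> dict:
    """Combine action words with word frequencies from archived text."""
    frequencies = Counter(archived_text.lower().split())
    return {
        "action_words": dict(action_words),
        "word_frequencies": dict(frequencies),
    }


def export_profile(profile: dict) -> str:
    """Serialize the profile so it can be moved to another computer."""
    return json.dumps(profile, sort_keys=True)


def import_profile(payload: str) -> dict:
    """Rebuild the profile on the receiving computer."""
    return json.loads(payload)


profile = build_profile(
    {"msword": "launch Microsoft Word"},
    "the budget memo the budget",
)
restored = import_profile(export_profile(profile))
```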
The foregoing and other features and advantages of the invention will be apparent from the following, more particular description of a preferred embodiment of the invention.
BRIEF DESCRIPTION OF THE FIGURES
The present invention will be described with reference to the accompanying figures, wherein:
FIG. 1 illustrates the placement of the present invention within an operating system in order to be able to monitor all user inputs.
FIG. 2 illustrates an archive generated in accordance with the present invention.
FIG. 3 is an architectural block diagram of the present invention.
FIG. 4 is a block diagram of a micro kernel engine (MIKE).
FIG. 5 illustrates the interaction of a control center, which is a central place to manage the present invention, with the other components of the present invention.
FIG. 6 is a block diagram of the control center.
FIG. 7 is a flowchart that illustrates how the present invention checks a wordbase for action words.
FIG. 8 illustrates an exemplary environment for the present invention.
FIG. 9 is a flowchart of the operation of a toggle function and pop-up menu function.
FIG. 10 is a block diagram of the MIKE and a content display system (CDS) in accordance with the present invention.
FIG. 11 is a screen shot of a window that displays Mr. IBeam's corner, which provides feedback to the user regarding their use of the present invention.
FIG. 12 illustrates the concept of multiple personal profiles.
FIG. 13 is a flowchart that illustrates how the ActiveWords archive is populated with data in accordance with the present invention.
FIG. 14 illustrates a screen shot of a monitoring bar in accordance with the present invention.
FIG. 15 is a screen shot of a window that displays a "Tip" that allows a user to become acquainted with the functions of the present invention.
FIG. 16 illustrates a screen shot of the monitoring bar along with a plurality of associated pull-down menus.
FIG. 17 illustrates the launching of Microsoft Word 97.
FIG. 18 is a screen shot of a state table in accordance with the present invention.
FIG. 19 is a screen shot of a LightEditor for adding code words and dual words to the ActiveWords Wordbase in accordance with the present invention.
FIGS. 20, 22 and 23 are screen shots of the control center.
FIG. 21 illustrates a wordbase item record.
FIG. 24 is a screen shot of a window that allows a user to configure the monitor bar.
FIGS. 25 and 26 are screen shots illustrating the Advanced Find and Find functions of the present invention, respectively.
FIG. 27 is a screen shot illustrating a banner that is displayed in a preferred embodiment when a dual word has been entered by the user.
FIGS. 28 and 29 illustrate the concept of a user profile.
FIG. 30 is a screen shot of the ActiveWords ScratchPad.
FIGS. 31A and 31B are screen shots of a window that allows multi-item resolution.
In the figures, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements. The figure in which an element first appears is indicated by the leftmost digit(s) in the reference number.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
I. Overview
A. The ActiveWords System
B. ActiveWord Services
1. Action Services
2. Archive Services
II. Exemplary Environment
III. Capturing Utterances Entered by the User
IV. Architecture and Operation of the Present Invention
A. Action Words and Content Words
B. Runtime Operation
1. Wordbase 340
2. Services performed by the ActiveWords System
3. MIKE 330
4. Monitor 110
5. State Table 450
6. Archiving User Text
7. The Control Center
8. Run-Time Operation of the ActiveWords System
9. The Toggle function, Pop-Up Window, Charm Box
10. Charm Words
C. The Application Programming Interface
D. Agents
E. Multi-Item Resolution
F. Portability
G. Third Party Application Programs
V. Examples of Using the ActiveWords System
VI. Conclusion
I. Overview
A. The ActiveWords System
The present invention, referred to herein as the ActiveWords system, provides a semantic user interface (SUI). The SUI allows a user to use his everyday natural language or user defined words to operate a computer and/or manipulate the user's content in a highly efficient manner. In short, every keystroke, every word, or group of words is actionable. Consequently, a computer user's productivity can be dramatically increased by using action words that the user designates to activate controls and features. This allows the user to produce most of his work through simple, natural language commands. The present invention provides an intuitive interface that enhances the operation of the current standard window-based interface (also referred to herein as a Graphical User Interface (GUI)) in a simple natural manner. By leveraging the richness and power inherent in a user's language, the present invention allows the personal computer to operate in a manner that is much closer to the way people interact with each other using words.
The present invention provides a simpler and more natural way to work with the objects, applications, information requests (i.e., queries), and the like that constitute each user's "vital few." The vital few is each user's unique pattern of using objects (e.g., applications, files, folders) and processes (e.g., computer controls and applications features) that comprise the user's sweet spot. The SUI allows the user to activate his/her vital few much more quickly and efficiently than he can using the GUI. Because the GUI is ideal for organizing the 100% of what is available, the user will continue to rely on the GUI to explore, discover and activate the 80% of things he seldom uses. For example, Windows 95 installs approximately 9000 items (applications, files, parameters, etc.), all of which are accessible via the GUI. However, only a subset of 50-300 of these items comprise the average user's vital few. As such, in accordance with the present invention, the SUI provides a mechanism to access this subset of information, referred to as the vital few, in an effective way.
The present invention is a system that acts upon human language text that arrives at the user's desktop computer. Text can be entered directly by the user via keyboard or voice. In the case of voice, voice-to-text software is provided to translate the voice signals. Alternatively, text may arrive via e-mail text, Internet page text, or other forms of text from other sources. This text, referred to as "given text," can be selected by the user using conventional point and click technology. Once text has been entered or selected in this fashion, the text is passed to the present invention to determine its actionability. If the text is actionable, the present invention executes the designated action.
The present invention uses the same text input stream that the user employs to input data to applications and applications documents. The present invention constantly monitors the text input stream and takes appropriate action when it senses a command from the user. The ActiveWords system works all of the time and in all contexts (i.e., within any application program or within the operating system workspace). The ActiveWords system accesses that text input stream prior to its access by an application the user may be using at any given time.
The ActiveWords system exploits natural language by providing a single-word (or multi-word) logic interface, referred to herein as the SUI. That is, every word (or for that matter keystroke) entered or selected by a user is actionable. The term "single-word" as used in this document means any word that has meaning in the user's natural language (e.g., "word" for wordprocessor) or a set of letters that only has a predefined meaning to the user (e.g., "wp" for wordprocessor). The present invention also provides for multi-word expressions. That is, two or more words may activate a service. Implementation of a multi-word embodiment will be readily apparent to one skilled in the art after reading the detailed description provided below for the single-word embodiment.
As a result of the present invention, the rich naming logic of natural language can be incorporated into a user interface. Computer users can now leverage their natural language abilities to assign names of their choosing for all their computer activities, including launching application programs, controlling application program operations, replacement of text, searching, retrieval of information, and the like.
A user is enabled to enter "utterances." Each utterance has the potential to control the operations of the computer. An "utterance" is any natural language word or group of words, string of letters or symbols, etc. followed by a delineator (e.g., a space bar or punctuation mark). The present invention checks each utterance against a wordbase to determine whether it is an action word (i.e., a word that when entered or selected triggers an action). The present invention thus senses the text stream for action words and automatically erases them when they are encountered. Action words are user defined. The action words allow a user to launch applications and navigate within applications by using language rather than clicks from a pointing device such as a mouse. The present invention, alternatively, replaces an utterance with designated words. The combined set of all user-defined action words, as well as a history of the user's past actions, constitute an ActiveWords user profile. That profile is transportable from one computer to another.
The present invention creates an environment where there are two classes of utterances that users can enter into their computers: content words and action words. Action words are divided into two groups: dual words and code words. Content words are words entered into the text stream that the user intends as input to some document, file, or directory. Examples include word processing text in a memo, file names in the Microsoft Windows directory, and numbers in a spreadsheet.
Action words are a new category of words introduced by the present invention that are actionable within the SUI. Action words are thus words that users place into the text stream as requests for specific services from the present invention. Code words are action words the user makes up or which are not part of his natural language lexicon (e.g., not in the standard dictionary). For example, typing "msword" to launch Microsoft's Word application is an example of entering a code word. Dual words are utterances that can be either ordinary content words or action words, depending on the user's intention in typing the word. The user may type "excel" because he intends it to be a content word in his application text, or, alternatively, he may type it because he wants to use it as an action word for opening Microsoft Excel. Content words are not action words because the user does not intend them to be action words. As will be shown below, the present invention provides a simple mechanism for designating whether an entered word is an action word or a content word.
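The segmentation of a text stream into utterances, and the erasure of action words from that stream, can be illustrated with the following Python sketch. It makes several assumptions that are not stated above: any whitespace or ASCII punctuation character counts as a delineator, the delineator that ends an action word is erased along with the word, and every match is treated as an action (the separate handling of dual words is not modeled). The function name process_stream is hypothetical.

```python
import string
from typing import Iterable, List, Set, Tuple

# Assumption: whitespace and ASCII punctuation both end an utterance.
DELINEATORS = set(string.whitespace) | set(string.punctuation)


def process_stream(
    chars: Iterable[str], action_words: Set[str]
) -> Tuple[str, List[str]]:
    """Split a character stream into utterances and filter action words.

    Returns the text that would be passed on to the application and
    the list of action words that were sensed and erased.
    """
    passed: List[str] = []
    triggered: List[str] = []
    current: List[str] = []
    for ch in chars:
        if ch in DELINEATORS:
            word = "".join(current)
            current = []
            if word and word in action_words:
                # Erase the action word (and its delineator) from the stream.
                triggered.append(word)
            else:
                passed.append(word + ch)
        else:
            current.append(ch)
    # Trailing characters with no delineator are not yet an utterance.
    passed.append("".join(current))
    return "".join(passed), triggered


text, fired = process_stream("open msword now ", {"msword"})
```

In this run, "msword" is removed from the text that reaches the application and is reported as a triggered action word, while "open" and "now" pass through as content.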
For many functions, the SUI offers the user a faster and simpler alternative to reaching for the mouse and using the graphic user interface (GUI). On a case by case basis, the user decides which interface (GUI or SUI) is most convenient for accomplishing his intended result. Typically, the SUI becomes the preferred, least effort path, for accessing the vital few. In a short time, the user settles into an optimum routine that combines his use of the GUI with his use of the SUI.
B. ActiveWord Services
The ActiveWords system provides two types of services: action services and archive services. The action services sense keystrokes, symbols and words within the text stream. If an action word is entered, the ActiveWords system takes whatever action the user has specified for that action word (i.e., each action word has at least one action associated therewith). Action services are divided into five groups: command functions, content functions, navigation functions, information functions and complex functions. The archive services maintain a record of all the text the user enters as input via keyboard or voice. As stated above, these action and archive services are designed to be available at all times and within any context, so long as the computer's operating system is running. Both types of services will be discussed below.
1. Action Services
The present invention can be used to activate command functions. Command functions include, for example, window controls (e.g., resizing a window) and applications controls (e.g., save, print, search, view, open, etc.).
The present invention can also be used to activate content functions. Thus, action words can be used to achieve content results, such as text substitutions, punctuation, text formatting, text content transformation, and the like. In particular, the ActiveWords system can be used to perform text content substitutions, such as the detection and correction of double capitals (e.g., THe becomes The), abbreviations, expansions (e.g., ceo becomes Chief Executive Officer) and large text insertions. The content functions further include insertion of punctuation, such as quotes and contractions. Still further, the content functions include formatting, such as complex formatting for programming, or for names and addresses. Finally, the content functions include content transformations, such as language translations (e.g., English to French), number to text conversion, currency conversion (e.g., dollars to pounds or yen), in-place arithmetic (e.g., replace "100+300" with "400"), date transformations (e.g., 7/1/97 to July 1, 1997), data conversions (e.g., chemistry symbols and acronyms), and the like.
The present invention can further be used to activate navigation functions. Thus, action words can be used to launch application programs and navigate within an application program. For example, a single-word, such as "excel," can be used to launch a spreadsheet program from anywhere within the working environment of a user's computer. The user can use action words to navigate between different views in an application (e.g., navigating between months, dates, weeks in a calendar/planning application, such as Ecco). Documents within a wordprocessor can also be opened via an action word. Accordingly, each of any number of documents or files in a user's computer can be assigned an action word. Furthermore, the user can launch various services that affect her computer (e.g., backup of the hard drive) via an action word. These services can be launched within the user's computer or across a network of computers.
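A few of the content functions listed above (double-capital correction, expansion, in-place arithmetic and date transformation) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the service scripts of the preferred embodiment: all function names are hypothetical, only addition is handled for arithmetic, dates are assumed to be in month/day/two-digit-year order, and two-digit years are assumed to fall in the 1900s.

```python
import re
from typing import Dict

# Hypothetical expansion table; entries would be user defined.
EXPANSIONS: Dict[str, str] = {"ceo": "Chief Executive Officer"}

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]


def fix_double_capitals(word: str) -> str:
    """Turn two leading capitals into one, e.g. 'THe' into 'The'."""
    m = re.fullmatch(r"([A-Z])([A-Z])([a-z]+)", word)
    return m.group(1) + m.group(2).lower() + m.group(3) if m else word


def expand(word: str) -> str:
    """Replace an abbreviation with its stored expansion."""
    return EXPANSIONS.get(word, word)


def add_in_place(word: str) -> str:
    """Replace 'A+B' with the sum (addition only in this sketch)."""
    m = re.fullmatch(r"(\d+)\+(\d+)", word)
    return str(int(m.group(1)) + int(m.group(2))) if m else word


def spell_out_date(word: str) -> str:
    """Turn m/d/yy into 'Month d, 19yy' (1900s assumed)."""
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{2})", word)
    if not m:
        return word
    month, day, year = int(m.group(1)), int(m.group(2)), int(m.group(3))
    if not 1 <= month <= 12:
        return word
    return f"{MONTHS[month - 1]} {day}, {1900 + year}"


def transform(word: str) -> str:
    """Apply the first content function that changes the utterance."""
    for fn in (fix_double_capitals, expand, add_in_place, spell_out_date):
        result = fn(word)
        if result != word:
            return result
    return word
```

Each helper returns its input unchanged when it does not apply, so transform leaves ordinary content words untouched.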
The present invention can also be used to locate information within a user's computer or from external sources. For example, an action word can be used to trigger a directory search or a database search. Another action word may be used to trigger an Internet search (e.g., find "xxxx" at the Wall Street Journal web site). Yet another action word can retrieve a specific file or record available via the Internet, extranet or intranet.
Finally, the ActiveWords system can be used to trigger and/or perform complex functions, such as dialing a person's telephone number or dialing a person's beeper service and sending a message to that beeper. The ActiveWords system also provides four information and software resources, which are described in greater detail below, referred to as the toggle function, pop-up window function, charm-box function and charm-word function.
Note that most of the computer services and functions discussed above are already available within a user's computer (e.g., launching a program) or within a single application program (e.g., text replacement or searching a database). However, access to these services and functions is almost always context dependent in that the user has to leave where she is (e.g., Excel) and navigate to a specific tool or application service (e.g., Windows 95 start find menu) to obtain the service or control she needs. From the perspective of the user, that is a cumbersome and time consuming method. The user must find the service within the GUI's maze of pull-down windows or use difficult to remember keystrokes that include control characters (e.g., ctrl, alt). The present invention allows a user to utilize his everyday language to activate these services, programs, functions, etc. from any context in the computer. The service script will navigate to the appropriate tool or context and perform the designated action.
2. Archive Services
The archive service records and stores all the text a user inputs via keyboard or voice-to-text. The ActiveWords system tags the text with identifying information, such as date, application name and/or document or file name. The archive can thus be searched based on the actual text entered by the user in combination with the identifying information. The present invention further creates a so-called 7×7 data repository, which is a database that is divided into seven categories, each category having seven subcategories.
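The tagging and searching of archived text can be pictured with the following Python sketch. The names ArchiveEntry and Archive, the field names, and the sample data are all hypothetical; the sketch assumes that a search combines a case-insensitive text match with optional filters on the identifying tags.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ArchiveEntry:
    """Archived text plus the identifying tags described above."""
    text: str
    entered_on: date
    application: str
    document: Optional[str] = None


class Archive:
    """Hypothetical in-memory archive of tagged user input."""

    def __init__(self) -> None:
        self._entries: List[ArchiveEntry] = []

    def record(
        self,
        text: str,
        entered_on: date,
        application: str,
        document: Optional[str] = None,
    ) -> None:
        self._entries.append(
            ArchiveEntry(text, entered_on, application, document)
        )

    def search(
        self,
        text: Optional[str] = None,
        application: Optional[str] = None,
        entered_on: Optional[date] = None,
    ) -> List[ArchiveEntry]:
        """Return entries matching every criterion that was supplied."""
        hits = []
        for entry in self._entries:
            if text is not None and text.lower() not in entry.text.lower():
                continue
            if application is not None and entry.application != application:
                continue
            if entered_on is not None and entry.entered_on != entered_on:
                continue
            hits.append(entry)
        return hits


archive = Archive()
archive.record("Quarterly budget draft", date(1997, 7, 1), "Word", "memo.doc")
archive.record("budget totals", date(1997, 7, 2), "Excel", "q3.xls")
```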
Provided below is a detailed description of a system architecture for implementing a preferred embodiment of the ActiveWords system, along with an operational description of the present invention. Finally, this document concludes with a set of examples that illustrate practical applications for the present invention.
II. Exemplary Environment
The present invention may be implemented using hardware, software or a combination thereof and may be implemented in a computer system or other processing system. An example computer system 801, which can be installed with the present invention, is shown in FIG. 8. The computer system 801 includes one or more processors, such as processor 804. The processor 804 is connected to a communication bus 802. Various software embodiments are described in terms of this example computer system. After reading this description, it will become apparent to a person skilled in the relevant art how to implement the invention using other computer systems and/or computer architectures.
Computer system 801 also includes a main memory 806, preferably random access memory (RAM), and can also include a secondary memory 808. The secondary memory 808 can include, for example, a hard disk drive 810 and/or a removable storage drive 812, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, or the like. The removable storage drive 812 reads from and/or writes to a removable storage unit 814 in a well known manner. Removable storage unit 814 represents a floppy disk, magnetic tape, optical disk, or the like, which is read by and written to by removable storage drive 812. As will be appreciated, the removable storage unit 814 is a computer usable storage medium having stored therein computer software and/or data.
In alternative embodiments, secondary memory 808 may include other similar means for allowing computer programs or other instructions to be loaded into computer system 801. Such means can include, for example, a removable storage unit 822 and an interface 820. Examples of such can include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM, or PROM) and associated socket, and other removable storage units 822 and interfaces 820 which allow software and data to be transferred from the removable storage unit 822 to computer system 801.
Computer system 801 can also include a communications interface 824. Communications interface 824 allows software and data to be transferred between computer system 801 and external devices. Examples of communications interface 824 can include a modem, a network interface (such as an Ethernet card), a communications port, a PCMCIA slot and card, etc. Software and data transferred via communications interface 824 are in the form of signals which can be electronic, electromagnetic, optical or other signals capable of being received by communications interface 824. These signals 826 are provided to communications interface 824 via a channel 828. This channel 828 carries signals 826 and can be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link and other communications channels.
In this document, the terms "computer program medium" and "computer usable medium" are used to generally refer to media such as removable storage device 812, a hard disk installed in hard disk drive 810, and signals 826. These computer program products are means for providing software to computer system 801.
Computer programs (also called computer control logic) are stored in main memory and/or secondary memory 808. Computer programs can also be received via communications interface 824. Such computer programs, when executed, enable the computer system 801 to perform the features of the present invention as discussed herein. In particular, the computer programs, when executed, enable the processor 804 to perform the features of the present invention. Accordingly, such computer programs represent controllers of the computer system 801.
In an embodiment where the invention is implemented using software, the software may be stored in a computer program product and loaded into computer system 801 using removable storage drive 812, hard drive 810 or communications interface 824. The control logic (software), when executed by the processor 804, causes the processor 804 to perform the functions of the invention as described herein.
In another embodiment, the invention is implemented primarily in hardware using, for example, hardware components such as application specific integrated circuits (ASICs). Implementation of the hardware state machine so as to perform the functions described herein will be apparent to persons skilled in the relevant art(s).
In yet another embodiment, the invention is implemented using a combination of both hardware and software.
III. Capturing Utterances Entered by the User
A preferred embodiment of the present invention is designed to operate with Windows 95, an operating system designed and distributed by Microsoft Corporation. However, the present invention contemplates operating with any present or future developed operating system, including Windows NT. For convenience, the present invention is described with reference to the Windows 95 Operating System. The present invention is configured to always be active in the background, similar to a real-time monitoring system. Every time a computer implementing the present invention is turned on, the operating system launches the present invention.
FIG. 1 illustrates how the present invention captures the keystrokes (i.e., data entered by a user via a keyboard attached to computer system 801) of the user. The present invention operates with an architecture capable of monitoring for system wide inputs. This broad I/O capability can be provided under the Virtual Machine Manager (VMM 120) that is available under Win32. The VMM 120 is an extensible operating system whose core and standard components are provided by Microsoft Corporation. By writing additional modules called VxDs (virtual device drivers), software and hardware vendors can complement the VMM 120. The core of the present invention, monitor 110, is implemented as a VxD (referred to as a Virtual Input Driver or VID) under the Win32 environment.
The heart of the Windows 95 architecture consists of two features: the dynamic VxD loader (VXDLDR.386) and the layered I/O system provider VxD (IOS.386). It is the main responsibility of the IOS VxD to catch I/O calls that user-mode applications perform to file storage devices and route them to a set of layered VxDs that will cooperatively process the calls.
Under Windows 95, a VxD can be loaded dynamically from another VxD, from a 16-bit user-mode Windows or DOS based application, or from a Win32-based application. To load a VxD from another VxD, the services from the VXDLDR VxD can be used. A 16-bit user-mode application obtains the VXDLDR's entry point and passes the location of the VxD to load to the VxD loader. Once the VxD needs to be unloaded, the application passes the module name of the VxD to unload to the VxD loader. Unfortunately, there is no such thing as a VxD handle that the user-mode application could use for that purpose; either the module name or the VxD ID must be known to the application in order to unload the VxD. A Win32-based application must open the VxD using the CreateFile Win32 API to obtain a handle to the VxD, and use the DeviceIOControl API to communicate with the VxD.
FIG. 1 shows where the monitor VID 110 is placed under the Windows 95 Operating System in order to be able to monitor all inputs (i.e., keystrokes and mousestrokes). Hardware 102 includes a keyboard, mouse, microphone, or the like. The hardware 102 forwards the user's input to a kernel 104. The operating system includes kernel or inner layer 104 and upper layer 106. Both components include a plurality of window components. The components (i.e., VMs, managers, drivers, VxDs, and VPICD) illustrated in FIG. 1 are well known in the art of operating systems, and do not directly affect the present invention. As such, for the sake of brevity, these components will not be explained in detail herein.
The monitor VID 110 is positioned between the two operating system components 104, 106 such that a user's keystrokes or mouse signals are captured prior to being forwarded to application program(s) 118. The present invention requires that the keystrokes entered by the user be captured prior to the operating system forwarding the keystrokes to the foremost application program 118. Voice signals are treated separately since they require additional processing to convert to text, which is done using third party voice-to-text software. The ActiveWords system will capture the text characters from the voice-to-text software before they are provided to the application program 118. Once captured, the source of the text is irrelevant to the present invention.
The Monitor VID 110 is graphically represented to the user in accordance with the present invention via a monitoring bar 315, as shown in FIG. 14. The monitoring bar 315 will be described in greater detail below in Section IV. 4. Generally, the monitoring bar 315 has two data fields, text field 1410 and feedback field 1420, and a number of icons. Icon 1415 provides access to a productivity center. Icon 1430, shown as C.sup.c, provides the user with access to a control center, which is a central place to manage the present invention. Icon 1460 is referred to as a "Mr. IBeam." Icon 1470 allows a user's profile to be changed. Icon 1440 provides access to a LightEditor (FIG. 19). Icon 1450 provides the user with a find function (FIG. 26). Icon 1480 provides an advanced find feature (FIG. 25). Icon 1490 allows a user to select text, e.g., from a notepad, spreadsheet, e-mail, word processing document, etc.
IV. Architecture and Operation of the Present Invention
As discussed above, the present invention provides semantically driven functionality, thereby making the user's computer "language aware." The present invention is responsive to action words, which are the natural language text entered by the user either via keyboard or voice. Additionally, the user can select a word from a document, e-mail, database or Internet via his mouse and submit the word to the ActiveWords system as a potential action word. If the word is an action word, the system will react exactly as if it had been input by the user via a keyboard.
The present invention operates in the background and takes appropriate action when it senses an action word. The present invention is seamlessly integrated with the operating system of the user's computer thereby making it unobtrusive to the user. In an alternate embodiment, the present invention is incorporated into the operating system software. For the user's convenience, the present invention provides a number of user signals and graphical aids that help the user work with the SUI. Described below is the general architecture and operation of the SUI, and its associated components.
The ActiveWords system monitors the user's data input whenever his computer is running, unless the ActiveWords system is turned off by the user. In a preferred embodiment, the user can place the ActiveWords system in "sleep" mode (via, for example, an action word), such that inputted text is not monitored. The services of the present invention are available in all contexts and at all times. Being context free and "aware" of the user's natural language and language(s)-of-art enables the ActiveWords system to assist the user in many useful ways.
Context independence is essential to the effectiveness of the present invention. The present invention works in the same way, no matter what context the user is working in when he requests a service. It makes no difference if the user is working in an application program, a utility program, an Internet browser, or in an operating system work space. The ActiveWords system does not interfere with whatever text services his applications provide. The user can use the full text services of Microsoft Word, for example, along with the full text services of the ActiveWords system. It complements these application text services by providing greater depth of functionality and universal, context free, operation. This context-free operation enables the user to become reflexive in his use of action words.
Reflexive use means that the behavior in question is unconscious on the part of the person that performs that behavior. Stepping on a brake pedal, for example, is reflexive for an experienced driver. Pointing with a mouse or other pointing device is reflexive for an experienced GUI user. These behaviors would not become reflexive if the brake pedal only worked to slow the car on some streets, or if the pointing device only worked to move the cursor in some applications and not in others. Because these devices are reliable and work in the same way all the time and in all contexts, the user can become unmindful of them, thereby entrusting those behaviors to her reflexes. From then on, she performs the behavior automatically whenever she desires the result of that particular behavior.
The ActiveWords system may be viewed as providing a virtual personal computer within the user's actual computer. With ActiveWords, the user can give his own names (i.e., action words) to his computer's objects, processes, and features. He is no longer a captive of the interface and naming choices that others have provided. Every user's natural language vocabulary is unique to some degree. His SUI needs to reflect that uniqueness. The ActiveWords system enables each user to use and leverage his own terminology, his own mnemonic metaphors, and the structure of his personal language profile. It seems obvious that an English metallurgist who is an amateur astronomer should have an SUI that is significantly different from the SUI of a French businessman who is interested in soccer.
A. Action Words and Content Words
There are two types of action words: code words and dual words. A code word is any character string the user reserves for the purpose of signaling the present invention to provide him with a service. By designating a code word, a user is signaling his intention to never use this combination of letters, symbols, etc. as a content word. The ActiveWords system knows, therefore, that whenever it senses a code word, it may immediately erase it from the text stream. After erasing the code word, the present invention executes a service script associated with that code word. In the rare event when the user wants to type the code word as a content word, he simply turns the SUI off temporarily. In a preferred embodiment, an action word is provided to activate a service script that turns the monitor window off until the next word has been input. Alternatively, an icon on the monitoring bar 315, such as Mr. IBeam, can be used to toggle between sleep and awake mode.
A dual word is any word in the English dictionary (e.g., "file") or a word-of-art that has a special meaning in a personal or professional context (e.g., "walkthrough" for programmers). In other words, a user may want a word to have a dual purpose: (1) a content word to be used in an application and (2) an action word to trigger a service. When a dual word is sensed, the present invention recognizes it as an utterance having a dual nature, in that it may be intended either as a content word or action word. Accordingly, when it encounters such an utterance, the present invention must be told by the user that it is an action word (i.e., the user must disambiguate the dual word).
In a preferred embodiment, the present invention provides the user with a simple method for declaring his intention: a double press of the space bar. If the user's intention is to use the entered dual word as a content word, the user does not press the space bar twice. In that event, the present invention ignores the word and continues sensing for the next action word. If his intention is to use it as an action word, the present invention immediately erases the word from the text stream and executes the service script associated with that action word. As should be readily apparent to one skilled in the art, other techniques can be used for disambiguating a dual word.
The present invention is language neutral. In other words, regardless of the user's natural language, English, Spanish, German, French, etc., the present invention operates the same. The user can designate any word(s) as an action word(s). The user can use any nicknaming logic for creating action words. For example, the user might use "ms" as a code word prefix to trigger service scripts related to various Microsoft application programs. Accordingly, "msw" could be the code word used to launch Microsoft Word, "mse" to launch Microsoft Excel, "msp" to launch Microsoft PowerPoint, "msa" to launch Microsoft Access, and so on. Obviously, a suffix can also be used instead of a prefix to trigger service scripts. Alternatively, the user can create code words without mnemonic aids such as suffixes and prefixes.
B. Runtime Operation
FIG. 3 is a block diagram of the present invention during runtime operation. The present invention includes a Virtual Input Driver (VID) 110, a microkernel engine (MIKE) 330, a monitoring bar 315, agents 370, agent registry and services 360, third-party applications 118, a wordbase 340, a profiles registry 350, control center 345 and start-up files 335. Windows applications 118 include word processors, spreadsheets, presentation software, utilities, and the like. The agents 370 are application programs that are dependent upon the present invention (i.e., require input from MIKE 330 to operate), as described in greater detail below. MIKE 330 uses a scripting language to launch an application program(s) 118 or to control functions and features of application program(s) 118. Each function is performed by a service script, which is associated with each action word within the wordbase 340.
MIKE 330 is made up of several components and is shown in further detail in FIG. 4. In operation, a user 310 enters an input via a keyboard or selects text via a mouse. This input is captured by VID 110. All typed keystrokes are received by the VID 110, which extends the functionality of the Win 95 Operating System, before they are dispatched to the applications 118. In other words, the input text stream is "hooked" by the VID 110. In a preferred embodiment, a mouse input is received by both the VID 110 and the Windows applications 118. In other words, the VID 110 only monitors and senses the activity of the mouse. (The present invention monitors the mouse since the clicking of the mouse indicates a change of context or the end of an utterance, which is analogous to pressing the space bar.) In an alternate embodiment, user input 310 is entered via a microphone.
The user input is then forwarded to MIKE 330. When MIKE 330 is inactive, the VID 110 retransmits all user inputs back to the foremost Windows application. The initial settings of MIKE 330 and monitoring bar 315 are stored in the start-up files 335, which are read at start-up and written to after changes or shut-down. Each user has their own start-up files 335.
MIKE 330 displays in the monitoring bar 315 the characters input by the user. It also sends feedback messages and displays activity indicators through monitoring bar 315. The user can interact with MIKE 330 through pop-up menus, as well as via the controls associated with monitoring bar 315. These controls include changing the current user profile, capturing selected text, launching the LightEditor, launching the Control Center, bringing in the Advanced Find from the Control Center, displaying Mr. IBeam's productivity center, turning on/off the monitoring bar 315, and going into "sleep" mode.
The profiles registry 350 is a listing of all available user profiles. The concept of user profiles is discussed in more detail below. All agents are registered in registry 360. The control center, which is a central place to manage the present invention, has access to the wordbase 340, monitoring bar 315, profiles registry 350, agent registry 360 and agents 370. Each major component of the present invention will be described in detail below.
1. Wordbase 340
MIKE 330 searches for action words or dual words stored in the wordbase 340. In a preferred embodiment, wordbase 340 is a relational database that is constructed using Jet Engine.RTM. available from Microsoft Corporation. Wordbase 340 is where all third party applications register their set of action words. The present invention contemplates, for example, a law wordbase, a medical wordbase, a business wordbase, etc. Thus, the medical wordbase, for example, will include a set of dual words, code words and associated scripts that are specific to the practice of medicine. Upon installation, each of these "third-party wordbases" will be seamlessly incorporated into a user's wordbase 340.
Each action word and its associated service script comprise an active wordbase item record. Each wordbase item record includes the code word and/or the dual word that will trigger the execution of the service script. A detailed illustration of each wordbase item record is shown in FIG. 21.
When an action word match is found within wordbase 340, MIKE 330 accesses the wordbase 340 and retrieves the service script associated with the code word or dual word. The service script provides a content, retrieval, navigation, information or command service, or a combination of these. Additionally, the wordbase 340 records statistical information concerning the code word or dual word, such as incrementing a hit count, updating last access time, etc. These counts are recorded in the related wordbase item records and are used by the productivity center (FIG. 11) to provide statistical data to the user. The statistical data is used by the user to leverage the ActiveWords training features and improve his productivity. The operations of add, delete and modify can be performed by a user on wordbase 340 via the control center 345 or via a LightEditor (FIG. 19, which is described in detail below) as should be apparent to a person skilled in the art.
Every time the present invention senses that the user has finished a word, it searches the wordbase 340 to see if that word is in an item record as a code word or dual word. There are three possible outcomes of searching for a word in the wordbase 340:
(1) A matching code word is found in a wordbase item record. In this case, the typed word is immediately erased and the accompanying service script is executed.
(2) A matching dual word is found in an active wordbase item record. In this case, the ActiveWords system immediately gives the user audible and/or visual signals. FIG. 27 illustrates a visual display (i.e., a banner) that can be provided to the user to indicate that a dual word has just been entered. In this example, "Excel" has been typed. The ActiveWords system provides a visual message in the banner--"Dual Word detected. Press SPACE to use it." Additionally, when the present invention senses a dual word it provides an audible signal, such as a bell or whistle. The visible signal can also be provided via a change in where the "eyes" are looking in the Mr. IBeam icon 1460 on the monitoring bar 315. These signals notify the user that he has the option to treat that dual word as either an action word or as a content word. If the user intends the dual word to be an action word, he presses the spacebar a second time. The ActiveWords system immediately erases the word from the application text input stream and executes the accompanying service script provided within the associated wordbase item record. Obviously, keystrokes other than an additional space character can be designated (by the user) to signal the user's choice to treat a dual word as an action word. If, on the other hand, the user intends the dual word to be a content word, he simply continues typing. The ActiveWords system does nothing with respect to that content word, and continues monitoring the text stream for the next action word.
(3) No match is found in the wordbase 340. The word is, therefore, assumed to be a content word. The ActiveWords system takes no action, and continues monitoring the text stream for the next action word.
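The three outcomes above can be sketched in a few lines of Python. This is only an illustrative sketch and not the implementation of MIKE 330: the names `Item`, `Outcome`, `classify` and `confirm_dual_word` are hypothetical, the wordbase is modeled as a plain list rather than a relational database, and exact, case-sensitive matching is an assumption that the text does not state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Outcome(Enum):
    CODE_WORD = "code word"        # outcome (1): erase the word, run the script at once
    DUAL_WORD = "dual word"        # outcome (2): signal the user, wait for confirmation
    CONTENT_WORD = "content word"  # outcome (3): no match, take no action


@dataclass
class Item:
    """Hypothetical, reduced stand-in for a wordbase item record."""
    script: str
    code_word: Optional[str] = None
    dual_word: Optional[str] = None


def classify(word: str, items: List[Item]) -> Tuple[Outcome, Optional[Item]]:
    """Look up a finished word and report which of the three outcomes applies."""
    for item in items:
        if item.code_word == word:          # assumption: exact, case-sensitive match
            return Outcome.CODE_WORD, item
    for item in items:
        if item.dual_word == word:
            return Outcome.DUAL_WORD, item
    return Outcome.CONTENT_WORD, None


def confirm_dual_word(next_key: str) -> bool:
    """A second press of the space bar turns a sensed dual word into an action word."""
    return next_key == " "
```

In this sketch a caller would erase the typed word and run `item.script` immediately for a code word, and for a dual word only after `confirm_dual_word` returns true for the next keystroke.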
Referring to FIG. 21, each record within the active wordbase includes a plurality of fields. Field C indicates the activation state, on/off, of the code word for this record. Field CW is the code word. Field D indicates the activation state, on/off, of the dual word. Field DW is the dual word. The Comment field allows the user to associate a comment with his action words.
The Action field contains the service script that will be executed upon the activation of an action word. The Category field contains information regarding the category/subcategory indicating where the record is registered. The Editing field defines the way the item is going to be edited. The item record can be edited as free text, free substitution, phone number, address, etc. The Action Type field designates the rules the present invention will follow in executing the script for that particular item. The action type can be one of the defaults--substitution, command, navigation--or the name of an external agent that will perform the action. The Extra field allows the user to provide additional information concerning the action word.
The CWCount field keeps track of the number of times the code word has been used. The DWCount field keeps track of the number of times the dual word has been used. The Xid field shows a special action to be performed. For example, it can indicate that the action or replacement is in the Extra field, that the clipboard will be used to make a substitution, that the substitution is a password whose content will not be shown in monitoring bar 315, or that markup language is enabled for this item record. The Modified field shows the last date/time the record was modified. The Accessed field shows the last time the script specified in the Action field was executed. The Signature field indicates the creator of the record. The Flags field is system defined. The present invention is not limited to having only these fields within wordbase 340 and other fields are contemplated (e.g., security, product administration, application priority).
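As a rough illustration of the record layout described with reference to FIG. 21, the fields can be modeled as a simple data structure. The sketch below is an assumption-laden model, not the actual database schema: the Python attribute names, the types, and the empty defaults are all hypothetical, and `record_hit` is an invented helper that mirrors the statistics described above (incrementing a hit count and updating the last access time).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class WordbaseItem:
    """Hypothetical model of one wordbase item record (field names per FIG. 21)."""
    code_word_on: bool = False           # Field C: activation state of the code word
    code_word: str = ""                  # Field CW
    dual_word_on: bool = False           # Field D: activation state of the dual word
    dual_word: str = ""                  # Field DW
    comment: str = ""                    # Comment field
    action: str = ""                     # Action field: the service script
    category: str = ""                   # Category field
    editing: str = ""                    # Editing field, e.g. "free text"
    action_type: str = ""                # Action Type field, e.g. "substitution"
    extra: str = ""                      # Extra field
    cw_count: int = 0                    # CWCount field
    dw_count: int = 0                    # DWCount field
    xid: str = ""                        # Xid field (type is an assumption)
    modified: Optional[datetime] = None  # Modified field
    accessed: Optional[datetime] = None  # Accessed field
    signature: str = ""                  # Signature field
    flags: int = 0                       # Flags field (type is an assumption)


def record_hit(item: WordbaseItem, via_dual_word: bool, now: datetime) -> None:
    """Update the usage statistics after the item's script has been executed."""
    if via_dual_word:
        item.dw_count += 1
    else:
        item.cw_count += 1
    item.accessed = now
```

Counts kept this way are the kind of per-record data a productivity center could later aggregate for the user.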
The user gains tremendously if any word, in any language, can be used to signal the ActiveWords system. By using words and thereby incorporating natural language logic directly into the SUI, the ActiveWords system becomes very powerful. The ActiveWords system achieves this power by allowing the user to associate service scripts with either code words or dual words, whichever is easiest for him to recall.
The service script specifies the service to be performed whenever the action word(s) within the item record is sensed. Service scripts in the ActiveWords system are written in a scripting language. For example, a script for using the previous word a user typed as the find target for a search of a file directory in Windows 95 looks like this:
<erase last word><winstart>f</winstart><delay><last word><enter>
(This script erases the last word typed--activates the winstart key--types the letter "f" that triggers the windows find tool--closes the winstart key--waits for 600 ms--and calls in the last word typed--and presses enter to launch the find operation).
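To make the structure of such a script concrete, the following sketch splits a script into opening tags, closing tags, and literal text. It is a minimal illustration only: the function name `tokenize` and the token labels are hypothetical, the closing tag is assumed to be written as `</winstart>` in line with TABLE 1, and treating tag names as case-insensitive is an assumption, since the example above uses lower case while TABLE 1 uses upper case.

```python
import re
from typing import List, Tuple

# One alternative matches a tag such as <enter> or </winstart>;
# the other matches a run of literal characters between tags.
_TOKEN = re.compile(r"<(/?)([^<>]+)>|([^<>]+)")


def tokenize(script: str) -> List[Tuple[str, str]]:
    """Split a service script into ("tag" | "close" | "text", value) pairs."""
    tokens: List[Tuple[str, str]] = []
    for closing, tag, text in _TOKEN.findall(script):
        if tag:
            kind = "close" if closing else "tag"
            tokens.append((kind, tag.strip().upper()))  # assumption: case-insensitive tags
        else:
            tokens.append(("text", text))               # literal keystrokes to replay
    return tokens


example = "<erase last word><winstart>f</winstart><delay><last word><enter>"
for kind, value in tokenize(example):
    print(kind, value)
```

An interpreter built on such a token list would walk it in order, holding a modifier key down between a tag and its matching close and replaying text tokens as keystrokes.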
Those skilled in the art will readily appreciate that the specific scripting language used is implementation specific. In a preferred embodiment, the scripting language syntax is similar to HTML. An exemplary subset of the scripting language used in the present invention is provided below with reference to TABLE 1.
TABLE 1
__________________________________________________________________________
1 <F1> Function 1 key.
2 <F2> Function 2 key.
3 <F3> Function 3 key.
4 <F4> Function 4 key.
5 <F5> Function 5 key.
6 <F6> Function 6 key.
7 <F7> Function 7 key.
8 <F8> Function 8 key.
9 <F9> Function 9 key.
10 <F10> Function 10 key.
11 <F11> Function 11 key.
12 <F12> Function 12 key.
13 <LT> Less-than character "<".
14 <GT> Greater-than character ">".
15 <ESC> Escape key.
16 <DEL[:##]> Delete key (for deleting) [repeated ## times].
17 <TAB[:##]> Tab key [repeated ## times].
18 <BACK SPACE[:##]> Back space key (for deleting) [repeated ## times]
<BACKSPACE[:##]>
<BS[:##]>
19 <ENTER[:##]> Enter key [repeated ## times].
20 <UP[:##]> Up arrow key [repeated ## times].
21 <DOWN[:##]> Down arrow key [repeated ## times].
22 <LEFT[:##]> Left arrow key [repeated ## times].
23 <RIGHT[:##]> Right arrow key [repeated ## times].
24 <HOME> Home key (goes to beginning of line, or top of a
list).
25 <END> End key (goes to end of line or bottom of a
list).
26 <WINSTART></WINSTART>
Windows95 special key to activate the "START"
button.
27 <WINMENU> Windows95 special key to simulate a right mouse
click.
28 <ALT></ALT> <ALT> simulates the Alt key down, </ALT> simulates
the Alt key up. An <ALT> must always be closed by an </ALT>.
29 <CTRL></CTRL> Same as Alt but with the Control key.
30 <SHIFT></SHIFT> Same as Alt but with the Shift key.
31 <ALTGR></ALTGR> Same as Alt but with the AltGr key. This key is
included
in some keyboards for special characters.
32 <WAIT[:###]> Waits 600 milliseconds (0.6 seconds) [or waits the
<DELAY[:###]> number of milliseconds indicated by the number].
33 <MINIMIZE WINDOW> Minimize window.
34 <MAXIMIZE WINDOW> Maximize window.
35 <RESTORE WINDOW> Restore window.
36 <CLOSE WINDOW> Close window.
37 <NEXT WINDOW> Next window.
38 <PREVIOUS WINDOW> Previous window.
39 <MOVE WINDOW> Moves the window.
40 <SIZE WINDOW> Sizes the window.
41 <MONITOR POWER> Sets the state of the display. This command supports
devices that have power-saving features, such as a
battery-powered personal computer.
42 <SCREEN SAVER> Executes the screen saver application specified in
the
[boot] section of the SYSTEM.INI file.
43 <APP EXIT> Exits the current application.
44 <CLOSE DOCUMENT> Close the current document (only for MDI
Application).
45 <MINIMIZE ALL> Minimize all windows.
46 <CLOSE APP> Close the current application (same as Close
Window).
47 <ActiveWord[:WAIT]> Can be any ActiveWord already existing in any
glossary.
[If AW is an ActiveWord to launch an application,
the
WAIT parameter indicates that ActiveWords should
wait
until the launched app is up and running to
continue
analyzing the rest of the Action]
48 <LAST WORD[:##]> Retrieves the last word from the list of Last Typed
Words (LTW) and places it where the current focus is
[or retrieves the ## word from the list of LTW].
49 <LAST REPLACED WORD[:##]>
Retrieves the last word from the list of Last
Replaced
Words (LRW) and places it where the current focus
is
[or retrieves the ## word from the list of LRW].
50 <ERASE LAST WORD[:##]>
Deletes the last word typed [or deletes the ##
word from
the list of LTW].
51 <ERASE LAST REPLACED WORD[:##]>
Deletes the last word replaced [or deletes the ##
word
from the list of LRW].
52 <LAST LINE[:##]> Retrieves the last line from the list of Last
Typed Line
(LTL) and places it where the current focus is
[or
retrieves the ## line from the list of LTL].
53 <LAST REPLACED LINE[:##]>
Retrieves the last line from the list of Last
Replaced Line
(LRL) and places it where the current focus is
[or
retrieves the ## line from the list of LRL].
54 <LAST APP[:##]> Retrieves the last application name from the list
of Last
Applications Used (LAU) and places it where the
current focus is [or retrieves the ## application
name
from the list of LAU].
55 <LAST AW[:##]> Retrieves the ActiveWord from the list of Last
Typed
ActiveWords (LTAW) and places it where the
current
focus is [or retrieves the ## ActiveWord from the
list of
LTAW].
56 <LAST NESTED AW[:##]>
Retrieves the ActiveWord from the list of Last
Replaced
ActiveWords (LRAW) and places it where the
current
focus is [or retrieves the ## ActiveWord from the
list of
LRAW].
57 <LAST DW[:##]> Retrieves the DualWord from the list of Last
Typed
DualWords (LTDW) and places it where the current
focus is [or retrieves the ## DualWord from the
list of
LTDW].
58 <MORE INFO> Retrieves information related with the last AW
typed,
from the Comments field.
59 <MORE INFO:COMMENTS>
Same as above.
60 <MORE INFO:ACTION> Retrieves information related with the last AW typed, from
the Action field and writes it as a replacement ignoring Type
and MarkUp Language tags.
61 <MORE INFO:COUNT> Retrieves information related with the last AW
typed,
from the Count field.
62 <MORE INFO:NORMAL> Retrieves information related with the last AW
typed,
from the DualWord field.
63 <MORE INFO:EXTRA> Retrieves information related with the last AW
typed,
from the eXtra field.
64 <MORE INFO:MASK> Retrieves information related with the last AW
typed,
from the Mask field.
65 <MORE INFO:CATEGORY>
Retrieves information related with the last AW
typed,
from the Category field.
66 <MORE INFO:XID> Retrieves information related with the last AW
typed,
from the Xid field.
67 <MORE INFO:AWAPP> Retrieves information related with the last AW
typed,
from the AWApp field.
68 <NESTED MORE INFO> Retrieves information related with the last nested
AW,
from the Comments field.
69 <NESTED MORE INFO:COMMENTS>
Same as above.
70 <NESTED MORE INFO:ACTION>
Retrieves information related with the last nested
AW,
from the Action field and writes it as a
replacement
ignoring Type and MarkUp Language tags.
71 <NESTED MORE INFO:COUNT>
Retrieves information related with the last nested
AW,
from the Count field.
72 <NESTED MORE INFO:NORMAL>
Retrieves information related with the last nested
AW,
from the DualWord field.
73 <NESTED MORE INFO:EXTRA>
Retrieves information related with the last nested
AW,
from the eXtra field.
74 <NESTED MORE INFO:MASK>
Retrieves information related with the last nested
AW,
from the Mask field.
75 <NESTED MORE INFO:CATEGORY>
Retrieves information related with the last nested
AW,
from the Category field.
76 <NESTED MORE INFO:XID>
Retrieves information related with the last nested
AW,
from the Xid field.
77 <NESTED MORE INFO:AWAPP>
Retrieves information related with the last nested
AW,
from the AWApp field.
78 <DW MORE INFO> Retrieves information related with the last DW
typed,
from the Comments field.
79 <DW MORE INFO:COMMENTS>
Same as above.
80 <DW MORE INFO:ACTION>
Retrieves information related with the last DW
typed,
from the Action field and writes it as a
replacement
ignoring Type and MarkUp Language tags.
81 <DW MORE INFO:COUNT>
Retrieves information related with the last DW
typed,
from the Count field.
82 <DW MORE INFO:AW> Retrieves information related with the last DW
typed,
from the ActiveWord field.
83 <DW MORE INFO:EXTRA>
Retrieves information related with the last DW
typed,
from the eXtra field.
84 <DW MORE INFO:MASK> Retrieves information related with the last DW
typed,
from the Mask field.
85 <DW MORE INFO:CATEGORY>
Retrieves information related with the last DW
typed,
from the Category field.
86 <DW MORE INFO:XID> Retrieves information related with the last DW typed,
from the Xid field.
87 <DW MORE INFO:AWAPP>
Retrieves information related with the last DW
typed,
from the AWApp field.
88 <UNDO> Undoes the last replacement.
89 <DATE> Inserts the current date.
90 <TIME> Inserts the current time.
91 <SCRATCH PAD> Brings up a text capturing window.
92 <DLL:DllName.dll:Function>
Calls the specified function from a .DLL. The
Function
parameter is case sensitive.
93 <LAST something[:N|LIST][:D]>
Applies to all the "LAST" commands (e.g. word,
replaced word, line, etc). When a number is specified,
the something in the Nth position is returned (normal
behavior). The user can also specify a group of elements
through a LIST. This list may have any of the forms:
1-3
1,2,5
4-8
1,3,5-10
If the last parameter is D, the last something(s) are
returned with their respective delimiters.
94 <NOTIFICATION[:Bannertype][Sound file]>
Indicates that a notification must be presented
when the
term is hit. The Banner Type can be:
GO
FIND
CLOSE
If no Banner Type is specified, the default for
all other
actions is DEFAULT. The user can specify a sound
file
other than the default.
95 <ONLY:App1,App2 ...AppN>
Specifies that the current CW and DW should only
be
executed if they are being called from one of the
specified applications.
96 <NOT:App1,App2 ...AppN>
Specifies that the current CW and DW should not
be
executed if they are being called from one of the
specified applications.
<USER INPUT[:Question]>
Brings up the ScratchPad as a text capturing
window,
with a user definable question or message.
97 <INPUT INFO> Inserts the information captured by the last call
to the
<USER INPUT>tag within the current script.
98 <{VARIABLE}> Replaces the tag for the value specified by
VARIABLE,
where VARIABLE can be other tags, such as LAST
WORD. The result is a new string to be evaluated.
99 <ED:{VARIABLE}[WORD1, WORD2 ... WORDN]:CW1, CW2 ... CWN>
Executes the respective CodeWord in positional order
depending on the number obtained from resolving the
VARIABLE, where if the result is 1 (one) the first CW is
executed, if 2 (two) the second CW is executed and so
on. If the result from resolving the VARIABLE isn't a
number, but instead a word, the tag should supply the
same number of words against which to compare the
VARIABLE's value, and once again, depending on which word
matches, the corresponding CW in positional order is
executed.
100 <WITH: Word15 ... Word2, Word1 | Word1>
Executes the rest of the script associated with the item
containing the DualWord found, only if the previous
words match the parameters, where Word1 should
match the Last Word Typed and so on. Each word
separated by a comma is treated as a Boolean AND.
Each word separated by the | character is treated as a
Boolean OR.
__________________________________________________________________________
Obviously, the present invention contemplates that the service script syntax and content will expand and evolve. The present invention is not limited to the service scripts provided in TABLE 1. Rather, TABLE 1 is merely exemplary, as should be readily apparent to those skilled in the art.
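As an illustration of the LIST parameter described for the "LAST" commands in TABLE 1 (forms such as 1-3, 1,2,5 and 1,3,5-10), the sketch below expands a list specification into individual positions and picks the corresponding entries from a history of typed words. The helper names `parse_positions` and `select_last` are hypothetical, and the sketch assumes that position 1 refers to the most recently typed entry, which TABLE 1 does not spell out.

```python
from typing import List


def parse_positions(spec: str) -> List[int]:
    """Expand a LIST such as "1,3,5-10" into [1, 3, 5, 6, 7, 8, 9, 10]."""
    positions: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(p) for p in part.split("-", 1))
            positions.extend(range(start, end + 1))   # ranges are inclusive
        else:
            positions.append(int(part))
    return positions


def select_last(history: List[str], spec: str) -> List[str]:
    """Return the history entries named by spec.

    Assumption: history is ordered oldest to newest, and position 1
    means the most recent entry, position 2 the one before it, etc.
    Positions beyond the available history are skipped.
    """
    return [history[-p] for p in parse_positions(spec) if 1 <= p <= len(history)]
```

For example, with a history of `["alpha", "beta", "gamma"]`, the specification `1,2` would select `"gamma"` and then `"beta"` under the ordering assumption stated above.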
MIKE 330 supports several users and user profiles. On startup, MIKE 330 checks profiles registry 350. The current user and profile can be changed on-the-fly via either an action word or via an option control associated with monitoring bar 315 (i.e., icon 1470). FIG. 12 is a high level block diagram of a wordbase. It includes two user profiles 1230 and 1240 and a set of shared item records 1220. A list of all the user profiles and shared item records is provided via a master index 1210. The wordbases can be shared among different users on a system. The wordbase 340 may be stored at the network level (e.g., on a server) so that all users can access it on a read-only basis. The present invention contemplates that the wordbase 340 will be accessible over a LAN, WAN, as well as other types of networks. Each user profile is a unique view into the shared wordbase that contains everything the user defines as his profile and the settings for these items. An editor 1235 is provided, which can be accessed via a control center 345, as described below, to edit the items contained in the user's profile.
Referring to FIG. 23, a view of wordbase 340 (as displayed by the control center 345) is shown. The master index 1210 is shown in window 2310. The master index 1210 is divided into drawers (e.g., Hobby, Places, etc.) and folders (e.g., Cities, States, etc.). FIG. 23 illustrates only six of the columns within wordbase 340. The columns of the wordbase 340 have been described with reference to FIG. 21, and for the sake of brevity will not be explained again.
FIG. 28 and FIG. 29 illustrate the concept of user profiles. A user's profile includes a combination of third party applications and wordbase item records, which are located in folders. Different profiles can be created by enabling/disabling the ActiveWords system for certain applications and by turning on/off folders of wordbase item records. Furthermore, drawers and folders can be assigned a priority.
FIG. 28 illustrates a list of applications (e.g., Microsoft Word, Ecco Pro, Internet Explorer, etc.). In a preferred embodiment, the present invention requires a user to configure an application after the user launches the application for the first time. These applications can be configured by the user to be on/off or placed in sleep mode. If an application is on, the ActiveWords system operates as described herein. If the application is off, the ActiveWords system is disabled while the user is using this application, but enabled in other contexts. Sleep mode disables the ActiveWords system, but still allows a user to enter action words via an ActiveWord Scratch Pad (FIG. 30). The Scratch Pad simply provides a text entry field to the user. While in sleep mode, action words entered directly into the application will not be sensed by the ActiveWords system.
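The three per-application settings can be illustrated with a minimal Python sketch. The names AppMode and is_sensed are hypothetical and do not appear in the specification; the sketch also assumes that the Scratch Pad is honored only in sleep mode and while the application is on, which the text does not state explicitly for the off setting.

```python
from enum import Enum


class AppMode(Enum):
    """Hypothetical names for the three per-application settings."""
    ON = "on"
    OFF = "off"
    SLEEP = "sleep"


def is_sensed(mode: AppMode, via_scratch_pad: bool) -> bool:
    """Return True if an action word entered in this context is sensed."""
    if mode is AppMode.ON:
        # The system operates normally in this application.
        return True
    if mode is AppMode.SLEEP:
        # Only words typed into the Scratch Pad are sensed.
        return via_scratch_pad
    # OFF: the system is disabled while the user is in this application
    # (assumption: this includes the Scratch Pad).
    return False
```

For example, is_sensed(AppMode.SLEEP, via_scratch_pad=False) is False, which corresponds to action words typed directly into a sleeping application being ignored.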
FIG. 29 illustrates the drawers and folders that are part of a user profile called "My Profile." Profile names are user assignable. Drawers and folders can be turned on/off. Each folder contains a plurality of wordbase item records. By turning a folder "off," all wordbase item records within the folder are disabled. If a folder is "on" for a given profile, the profile is extended to include the pattern of wordbase item records in the folder that are turned on and off. For example, FIG. 23 illustrates that certain code words and dual words can be disabled (e.g., the dual word "items" is disabled). Each folder is further assigned a priority. As such, if a code word appears in more than one drawer and folder, the service script within the highest priority drawer/folder will be executed. If a dual word appears in more than one drawer/folder, a preferred embodiment of the present invention provides for multi-item resolution, as described below.
A user can thus create multiple profiles by turning applications, drawers and/or folders on/off and by assigning priorities to each of the drawer/folder combinations. Thus, a user may have several user profiles: one for work, one for entertainment use of his computer, and several for each of his community and hobby interests. The windows shown in FIG. 28 and FIG. 29 are available via the control center 345.
The user profile allows, for example, an English speaking metallurgist who is interested in astronomy to share a computer with someone having a very different user profile. His sharing partner may be a French businessman who has an interest in soccer. Their respective user profiles are composed of different selections (items on or off) and precedence-orders for the Applications in the wordbase.
An English or French user of the ActiveWords system will populate his wordbase 340 with code words and dual words that make sense to him as an English or French speaker. An English speaking metallurgist, for example, would have additional "word-of-art" item records (i.e., action words) related to metallurgy. These metallurgy terms enable the ActiveWords system to provide services tailored to the user's needs as a metallurgist. The user would specify that his metallurgy item records must override any item records that he has in the wordbase 340 for Standard English. Therefore, the service script associated with "steel" in his Metallurgy item record would override the service script associated with "steel" in his wordbase item record for Standard English.
An English speaking metallurgist would have service scripts associated with the word "mercury" in both his Standard English and Metallurgy Applications. When he is at work, his user profile priority settings tell the ActiveWords system to override associations for "mercury" in his Standard English item record in favor of his Metallurgy item record for "mercury." If our English metallurgist is also an amateur astronomer, he might have a wordbase item record for "mercury" as part of his ActiveWords Astronomy Application (a hypothetical application). One of his user profiles, which he uses for his hobby activities, would allow him to give the item records associated with his Astronomy Application precedence over the item records associated with his Standard English and Metallurgy Applications. In that case, any service scripts triggered by the planet name "mercury" would take precedence over service scripts triggered by the metal "mercury" in his Metallurgy or Standard English Applications.
In a preferred embodiment, code words and dual words are not sensitive to upper/lower case. As such, "Mercury" and "mercury" are handled in exactly the same manner.
The ActiveWords system leverages the precedence-order of words that appear in two or more wordbase item records (i.e., as part of two or more ActiveWords Applications). The system uses the precedence-order in the user profile to determine which service script should be triggered or otherwise given precedence when an action word matches two or more wordbase item records.
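The precedence rule for a code word that appears in several folders can be sketched as follows. This is an illustration only, written in Python under stated assumptions: the ItemRecord fields, the resolve function, the flat folder names and the script strings are all hypothetical, and dual words (which the specification handles through multi-item resolution) are not covered.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ItemRecord:
    word: str             # the code word that triggers the script
    folder: str           # drawer/folder that owns the record
    script: str           # service script to run (plain text here)
    enabled: bool = True  # individual records can be turned off


def resolve(word: str, records: List[ItemRecord],
            folder_rank: Dict[str, int]) -> Optional[ItemRecord]:
    """Pick the record whose folder has the highest precedence.

    folder_rank maps each folder that is turned on in the active
    profile to a rank (0 = highest precedence). Folders missing
    from the map are treated as turned off.
    """
    key = word.lower()  # matching is not case sensitive
    candidates = [
        r for r in records
        if r.enabled and r.word.lower() == key and r.folder in folder_rank
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: folder_rank[r.folder])


# Hypothetical records and profiles for the "mercury" example.
RECORDS = [
    ItemRecord("mercury", "Standard English", "define mercury"),
    ItemRecord("mercury", "Metallurgy", "open metal data sheet"),
    ItemRecord("mercury", "Astronomy", "open planet page"),
]
WORK = {"Metallurgy": 0, "Standard English": 1}
HOBBY = {"Astronomy": 0, "Metallurgy": 1, "Standard English": 2}
```

With the work profile, resolve("Mercury", RECORDS, WORK) selects the Metallurgy record; with the hobby profile, the same word selects the Astronomy record.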
In this way, ActiveWords takes the user's universe of meanings and contexts into account, at the level of single-word or multi-word expressions. The ActiveWords system allows the user to designate and manage as many ActiveWords applications and user profiles as he requires.
ActiveWords enables the user to manage and organize his action words. The use of "mercury" above is a good example. In addition to managing its use in three contexts (Standard English, Metallurgy and Astronomy), the user may also wish to have "mercury" capitalized when he uses it as a planet's name. He may also want ActiveWords to substitute "Mercury" for "mc." The present invention allows the user to have one place to go and one set of tools for specifying and managing all his uses of a given word or a group of words.
2. Services performed by the ActiveWords System
As discussed above, the ActiveWords system can perform a variety of services in response to an action word. In a preferred embodiment, service scripts are constructed using a combination of these four service types:
(1) Content service--alters the user's text content in some way. Transforming a shorthand word into its longhand form is an example (e.g., typing "ddl" in order to have the ActiveWords system type "due diligence"). The present invention can be set to automatically capitalize the first letter of proper nouns. Hence "tom" is automatically capitalized. Likewise for Washington, January, pluto, easter, lincoln, cobol, etc. From the day the ActiveWords system is installed, the user can forget about capitalizing proper nouns that are common to his natural language and his language(s) of art. Similarly, contractions automatically receive an inserted apostrophe, e.g., can't, won't, couldn't, shouldn't, hadn't, wouldn't, etc. Likewise with hyphenations: user-friendly, client-server, single-keystroke, etc. The ActiveWords system can also automatically correct double caps at the beginning of a word (which occur when the user accidentally stays on the shift key too long), automatically capitalize the first letters in sentences, automatically eliminate double spaces between words (if the user wishes) and automatically correct inadvertent use of the "Caps Lock" key so that "tHIS" is automatically changed to "This".
(2) Information service--assembles and delivers software and information resources to the user's screen (e.g., having the ActiveWords system look up a word in a dictionary, database or at a website via an internet browser).
(3) Command service--causes an operation to be performed by a software application, a utility program, or by the operating system (e.g., opening a word processing document).
(4) Navigation service--causes navigation within an application or launches an application.
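A few of the content-service corrections listed under item (1) can be sketched in Python. The function names and the small SHORTHAND table are hypothetical and serve only to illustrate the idea; the specification does not disclose how these corrections are implemented.

```python
import re

# Hypothetical substitution table: shorthand and unpunctuated contractions.
SHORTHAND = {
    "ddl": "due diligence",
    "cant": "can't",
    "wont": "won't",
}


def fix_case_slips(word: str) -> str:
    """Correct two common shift-key slips in a single word."""
    # Inadvertent Caps Lock: "tHIS" -> "This"
    if len(word) > 1 and word[0].islower() and word[1:].isupper():
        return word[0].upper() + word[1:].lower()
    # Double caps at the start of a word: "THis" -> "This"
    if len(word) > 2 and word[:2].isupper() and word[2:].islower():
        return word[0] + word[1:].lower()
    return word


def apply_content_service(word: str) -> str:
    """Substitute a stored replacement if one exists, else fix case slips."""
    replacement = SHORTHAND.get(word.lower())
    if replacement is not None:
        return replacement
    return fix_case_slips(word)


def collapse_spaces(text: str) -> str:
    """Replace runs of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", text)
```

Words that are entirely upper case, such as acronyms, are left untouched by fix_case_slips, since neither pattern matches them.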
3. MIKE 330
FIG. 4 is a block diagram of MIKE 330. MIKE 330 includes a data manager 410, a fetcher 420, a command interpreter 430, a navigational manager 440, a state table 450 and an agent services module 460. Once the data has been captured by VID 110, it is sent, character by character, to data manager 410. Data can also be entered via a microphone. Three actions occur while the present invention monitors for user input: updating the state table 450, searching for action words and updating an archive (not shown) with the contents of the current text stream.
The data manager 410 is a simple character store that ensures that no character is lost in case the system is busy. It works as a circular (rolling) storage list of 200 bytes under a FIFO protocol. Data manager 410 is independent of the stream of inputs stored in an archive (not shown) by the present invention. The purpose of the data manager 410 is to detect an action word. An utterance is cleared by the data manager 410 upon the activation of a delineator. A delineator is a keystroke or other event that signals data manager 410 that a complete set of keystrokes (e.g., word, group of words, number, etc.) has been entered. Example delineators include the pressing of the space bar, a change of application context, an end of word punctuation, pressing the right or left buttons on the mouse, or the like. Each time data manager 410 is cleared, it begins monitoring for another action word. The types of delineators used in a preferred embodiment of the present invention are user assignable.
The data manager 410 also sends all characters and special keys (re-transmission of typed characters) from the user's data stream to the command interpreter 430. The command interpreter 430 passes each utterance to fetcher 420. Fetcher 420 is responsible for searching within wordbase 340 for action words. The wordbase 340 is searched after each delineator (e.g., space, tab, comma, other punctuation, etc.). Wordbase 340 is searched to determine whether the utterance is actionable. Paired with each action word in every item record of wordbase 340 is a service script, as described above.
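The rolling character store and its delineators can be sketched as follows. This Python illustration rests on assumptions: the class and method names are hypothetical, the store counts characters rather than bytes, and the default delineator set is only an example of what a user might assign.

```python
from collections import deque
from typing import Iterable, Optional

# Example delineators; the specification makes these user assignable.
DEFAULT_DELINEATORS = {" ", "\t", "\n", ",", ".", ";", ":", "!", "?"}


class DataManager:
    """Rolling FIFO character store that emits an utterance per delineator."""

    def __init__(self, capacity: int = 200,
                 delineators: Iterable[str] = DEFAULT_DELINEATORS) -> None:
        # A deque with maxlen silently drops the oldest character when full.
        self._buf = deque(maxlen=capacity)
        self._delineators = set(delineators)

    def feed(self, ch: str) -> Optional[str]:
        """Store one character; return the utterance when a delineator arrives."""
        if ch in self._delineators:
            return self.flush()
        self._buf.append(ch)
        return None

    def flush(self) -> Optional[str]:
        """Clear the store, e.g. on a mouse click or application change."""
        utterance = "".join(self._buf)
        self._buf.clear()
        return utterance or None
```

Feeding the characters of "open word, now " one at a time yields the utterances "open", "word" and "now"; the space that follows the comma produces nothing because the store is already empty.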
When data is entered via a microphone, the voice signals are recognized by voice recognition software and the generated text is provided to the command interpreter 430 via AW services 460. The present invention further contemplates receiving the translated voice signals via other components and/or drivers. Otherwise, operation of the present invention is analogous to when data is entered via a keyboard or selected via a mouse.
In a preferred embodiment, the fetcher 420 uses the Jet Database Engine to look for utterances inside the wordbase 340. That is, fetcher 420 determines whether the utterance matches an item record within the wordbase 340. If the fetcher 420 finds a match, it sends the action (i.e. service script), type, comments, and informational fields associated with the action word to the command interpreter 430.
The command interpreter 430 executes service scripts associated with an action word. The command interpreter 430 sends all keyboard related actions (replacements, special keys, and the like) associated with fetched action words through the VID 110 to the applications 118. For example, when the action word entered by the user requires a substitution (e.g., "june" to "June"), the command interpreter 430 forwards the replacement text to the application program (e.g., wordprocessor) via VID 110.
The data manager 410 can also activate an action box 470. The action box 470 is also referred to herein as a scratch pad (FIG. 30). The action box 470 notifies the data manager 410 when any of its options or related actions are executed. The action box is a dialog feature for general purposes, such as inputting text in response to a request from a service script. In its most common use, the scratch pad is a window that enables a user to enter action words when the user does not want to enter text into his foremost application. The display of a window in which text can be entered and selected is well known in the art.
The data manager 410 sends the typed characters (converted from ScanCodes to characters), feedback messages, control changes, and activity indicators to the monitoring bar 315. The monitoring bar 315 sends notifications of changes to control option settings the user issues to the data manager 410 via the monitoring bar's icons or pull down menu.
The data manager 410 notifies the command interpreter 430 when a delineator is detected. This indicates that the user has finished inputting a complete utterance that needs to be matched against the item records in the wordbase 340 to determine if it is an action word. The command interpreter 430 first compares each word with a list of integrated action words, which are stored locally within the command interpreter 430. Integrated action words are special action words for controlling various functions directly relating to the user's computer or the ActiveWords system, such as temporarily deactivating the present invention for the next word, thus preventing the next word from being matched against the wordbase 340. This is referred to as putting the monitoring bar to sleep. The present invention also contemplates designating certain common spelling mistakes or proper nouns as integrated action words. For example, "tHe" can be automatically replaced with "The" without having to access the wordbase 340. If the word is not an integrated action word, the command interpreter 430 sends it to the fetcher 420, which checks the wordbase 340 for a matching action word.
If a match is detected by the fetcher 420, the command interpreter 430 notifies the data manager 410 of the type of service script (e.g., substitution, control, navigation, in-place transformation) associated with the action word. If the service script calls for a text substitution, the command interpreter 430 also sends the replacement text to the data manager 410 for further processing (e.g., to act on another action word embedded in the script).
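The two-stage lookup (local integrated action words first, then the wordbase via the fetcher) can be sketched in Python. All names and table contents here are hypothetical; in particular, the fetch function merely stands in for fetcher 420, which in the preferred embodiment queries a database rather than a dictionary.

```python
from typing import Optional, Tuple

# Hypothetical local list held by the command interpreter.
INTEGRATED = {"tHe": "The"}

# Hypothetical stand-in for the wordbase: action word -> service script.
WORDBASE = {"ddl": "due diligence"}


def fetch(utterance: str) -> Optional[str]:
    """Stand-in for the fetcher: case-insensitive wordbase lookup."""
    return WORDBASE.get(utterance.lower())


def interpret(utterance: str) -> Optional[Tuple[str, str]]:
    """Return (source, script) for an actionable utterance, else None."""
    # Stage 1: integrated action words are resolved without the wordbase.
    if utterance in INTEGRATED:
        return ("integrated", INTEGRATED[utterance])
    # Stage 2: everything else is handed to the fetcher.
    script = fetch(utterance)
    if script is not None:
        return ("wordbase", script)
    return None
```

The integrated table is matched exactly in this sketch because its example entry is itself a capitalization error; the wordbase lookup ignores case, in line with the handling of code words and dual words.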
Command interpreter 430 receives the service scripts that fetcher 420 locates within wordbase 340 and proceeds to interpret them. A service script is made up of a series of commands which can range from a simple word replacement to a call to an application program. Scripts also allow the present invention to use the functionality included in agents 370 or application programs 118. For example, a third-party PIM application can directly insert, using Microsoft OCX controls, an appointment into its database using its own insertion function by simply making a call from the service script. This is a powerful and simple way, via the ActiveWords system, for the user to leverage the capabilities of third-party functionality.
The state table 450 provides information to application programs 118 and agents 370 via ActiveWords (AW) agent services 460 and Win95 messaging system 405. The information includes data about the user's typed/replaced text stream and the user's foremost environment (e.g., program, window, document) at the time an action word is sensed. It is a circular structure that contains lists of the last W words typed, the last X words replaced, the last Y code words typed or embedded, and the last Z lines typed or replaced. In a preferred embodiment, W, X and Y are set to 15 and Z is set to 1. However, W, X, Y and Z are user configurable. FIG. 18, which is described in greater detail below, illustrates a screen shot of a window that is displaying the state table 450.
The AW agent services are a set of functions and commands offered by MIKE 330 to the application programs 118 and agents 370. These functions can retrieve information from MIKE 330, send information to MIKE 330, and set the behavior, conditions and settings of MIKE 330. The AW agent services 460 and the agents 370 communicate with each other through the Windows 95 messaging system 405. Agents 370 also communicate with applications 118 (e.g., wordprocessors, spreadsheets, etc.) through the Windows 95 messaging system 405. Through this channel the agents 370 can request information or execute an action (such as fetching an action word and executing its associated service script or pasting text to the current application) from MIKE 330. MIKE 330 further uses the Windows 95 messaging system 405 to act on applications, such as minimizing and maximizing windows.
Navigational manager 440 receives commands from command interpreter 430 regarding the launching, closing, and navigation of documents, applications, folders, links, URLs, and the behavior of windows. Navigational manager 440 communicates with applications via the Win95 messaging system 405.
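The circular lists of the state table can be sketched with bounded queues. This Python illustration assumes the attribute names shown, which are hypothetical; only the four sizes W, X, Y and Z and their preferred defaults come from the text.

```python
from collections import deque


class StateTable:
    """Bounded history lists; the oldest entry drops off when a list is full."""

    def __init__(self, w: int = 15, x: int = 15, y: int = 15, z: int = 1) -> None:
        self.words_typed = deque(maxlen=w)     # last W words typed
        self.words_replaced = deque(maxlen=x)  # last X words replaced
        self.code_words = deque(maxlen=y)      # last Y code words typed or embedded
        self.lines = deque(maxlen=z)           # last Z lines typed or replaced


# Usage: with W = 3, only the three most recent words are retained.
table = StateTable(w=3)
for word in "the quick brown fox".split():
    table.words_typed.append(word)
```

After the loop, the typed-word list holds "quick", "brown" and "fox"; the first word has been pushed out, which is the circular behavior the text describes.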
4. Monitor 110
Monitor VID 110 is graphically represented to the user in accordance with the present invention via a monitoring bar 315, as shown in FIG. 14. The monitoring bar 315 has two data fields: text field 1410 and feedback field 1420. The text field 1410 contains the symbol, character or word currently typed (i.e., prior to a space bar being pressed by the user or typing any other user-defined delimiter). The feedback field 1420 provides an indication, when appropriate, of the script being executed. The launching of Microsoft Word 97 is shown in FIG. 17. The feedback field 1420 shows the script that is performed to launch this application program. The feedback field 1420 can also be used to display "hints" to the user for using the present invention. For example, the message "use d for and" can be displayed to tell the user that the letter "d" can be typed and the ActiveWords system will replace the "d" with "and".
The monitoring bar 315 further includes a number of other icons. Icon 1430, shown as a "C" with a superscript "c", provides the user with access to a control center 345, which is described below. Icon 1470 allows a user to change his user profile.
Icon 1440 provides access to a LightEditor, which is shown in FIG. 19. The LightEditor allows a user to make quick, simple changes to a wordbase 340, such as adding or modifying items in the wordbase 340 or consulting an existing item in the wordbase 340. The LightEditor can also be activated by dragging and dropping a shortcut to a file, folder or program onto monitoring bar 315. The user then specifies the action word that, when typed by the user, will launch that file, folder or program.
The LightEditor allows a user to create and add an action word and its service script to the wordbase 340. In a preferred embodiment, the LightEditor is called from the monitoring bar 315, either from icon 1440 or from an option in the pull-down menu. It can also be called via an action word. The LightEditor is similar to the editing mask in the Control Center 345. It contains several fields for user input, as well as buttons for actions related to the contents of the various fields that comprise a wordbase item record. The first two fields 1910 and 1920 are where the user specifies the code word and dual word, respectively, for that item. Each of these fields can be activated or de-activated through check boxes 1915 and 1925, respectively. If either of these fields is empty, the corresponding check box is not checked. As soon as the user types in an empty field, the corresponding check box is automatically checked. The check mark tells the system to monitor the text stream for that word and to perform the service script when that word is encountered.
The user has to specify in field 1930 the Drawer/Folder into which a new item is to be inserted. When editing, the user can modify the destination of the item (i.e., move an item from one drawer/folder to another). The LightEditor pre-selects the action type (also referred to as a service type) within field 1940 associated with a new action word, depending on how the LightEditor was activated. If the LightEditor was activated via a drag and drop operation by the user, navigation is selected. On the other hand, if it is activated through an icon on the monitoring bar 315, text substitution is selected. The user can change the action type by selecting one of the displayed action type categories via a mouse. When the action type is selected, the wordbase browser software enables the user to find the desired application, document, or link to be associated with that action word. The action type also enables the browser to select the correct editing mask for making additions or changes to that wordbase item record at a later time.
Icon 1450 provides the user with a find function that allows a user to find any word or set of words stored within his current wordbase 340. FIG. 26 illustrates a window that allows the user to enter one or more words to be located via a searching algorithm. Searching algorithms are well known in the art, and for the sake of brevity will not be described herein. Icon 1480 provides an advanced find feature (FIG. 25), which is available via the control center 345. The advanced find feature allows a more granular level of searching (e.g., searching between two dates). Activation of icon 1480 also launches the control center 345. Icon 1490 allows a user to select text, e.g., from a notepad, spreadsheet, e-mail, word processing document, etc., and search the wordbase 340 for the selected text to determine whether it is an action word.
Icon 1460 is referred to as a "Mr. IBeam." Mr. IBeam is cartoon character comprised of a vertical line with a pair of graphically displayed eye gla |