Semantic user interface
6438545
Abstract
A system and method that allows a user to use their everyday language or user-defined words to operate a computer in a highly efficient way. In short, every word, letter, control character and symbol is potentially actionable. A computer user's productivity is dramatically increased by making available those functions that enable a user to produce most of his work through simple, language-based commands. The present invention provides an intuitive interface, referred to as a semantic user interface (SUI), that enhances the operation of the current standard window-based interface in a manner that is simple, richer and natural. By leveraging all of the richness and power inherent in a user's language, the present invention provides an important tool that allows the personal computer to operate in a manner that is much closer to our natural way of interacting. A user is allowed to enter "commands" in his everyday natural language in order to control the operations of the computer. All commands are language-based and user-defined. These commands can be entered from any context of the user's computer (e.g., any application or operating system workspace). The commands allow a user to launch applications and navigate within applications by using language rather than clicks from a pointing device such as a mouse. They also allow the replacement of keystrokes with stored words or keystrokes. The system also keeps a complete archive record of all the text content the user provides as input, regardless of which application program or operating system window the user is operating in at the time. The combined set of all user-defined commands and the memory of all the input text that is stored in the archive constitutes the personality profile and is transportable from one computer to another.
Claims
What is claimed is:
1. A system for permitting a user to implement functionality on a computer, the functionality being provided across a plurality of application programs, the computer including a data entry device, comprising:
means for monitoring all data entered by a user within any one of the plurality of application programs, said data including one or more alphabetic letters, symbols and/or words;
a wordbase having stored therein a plurality of item records, each item record having an action word and one of a plurality of associated functions, wherein one of said item records includes a default action word;
means for searching said wordbase for a match with said data entered by said user; and
means for performing said function associated with said data.
2. The system of claim 1, wherein said data is entered via a microphone, selection device, or keyboard.
3. The system of claim 2, further comprising means for recognizing voice signals input via said microphone to produce recognizable data, wherein said recognizable data is used by said means for searching.
4. The system of claim 1, wherein said data entered by said user may be selected with a selection device by said user.
5. The system of claim 1, wherein a word entered by said user is a dual word, wherein said user disambiguates said dual word to indicate to said means for monitoring that said dual word is actionable.
6. The system of claim 1, wherein said function can be activated via entering a plurality of words.
7. The system of claim 1, wherein said means for monitoring monitors for a delineator, wherein data that is actionable is always followed by a delineator.
8. The system of claim 7, wherein said delineator is a punctuation mark, a special character, entry of a space bar, or a click of a selection device.
9. The system of claim 7, wherein said delineator is a predefined key on a keyboard.
10. The system of claim 7, wherein said delineator is a user-defined key.
11. The system of claim 7, wherein said delineator is a button on a point-and-click device.
12. The system of claim 1, further comprising means for displaying a charm box, said charm box having displayed therein information relating to said data entered by said user.
13. The system of claim 1, wherein said function includes launching an application program, a file or a folder.
14. The system of claim 1, wherein said function includes text substitution, wherein said text is substituted at the position of a displayed cursor.
15. The system of claim 1, wherein said means for monitoring can be toggled between on and off.
16. The system of claim 1, wherein said associated function includes calling an agent.
17. The system of claim 1, wherein said wordbase is located on a server connected to a network.
18. The system of claim 1, wherein a single action word can activate two or more functions, the system further comprising means for selecting between said two or more functions when said single action word is entered by said user.
19. The system of claim 1, wherein an action word can be formed by at least two natural language words.
20. The system of claim 1, wherein an action word can be a code word or a dual word, the system further comprising means for allowing said user to turn said action word on and off within said wordbase.
21. The system of claim 1, populating said wordbase with help-desk action words relating to features and/or functions of the computer, whereby a customer service person can request said user to enter said help-desk action words in order to assist said user.
22. The system of claim 1, wherein said wordbase is populated by a third party.
23. The system of claim 1, wherein said means for monitoring monitors said data within all contexts of the computer, said data including one or more alphabetic letters, symbols and/or words.
24. A system for permitting a user to access information across a plurality of application programs, comprising:
means for monitoring data selected by a user within any one of the plurality of application programs;
a wordbase; and
means for displaying information relating to said data selected by said user, wherein said wordbase is accessed prior to displaying said information.
25. The system of claim 24, wherein said wordbase having stored therein a plurality of records, each record having an action word and one of a plurality of associated functions, wherein said action word is formed from data selected by said user; and
means for performing said function associated with said data.
26. The system of claim 25, wherein said means for monitoring also monitors for data entered by said user via a keyboard.
27. The system of claim 26, wherein said data is formed from one or more alphabetic letters, symbols and/or words.
28. The method of claim 26, further comprising the step of establishing a user profile, wherein said information is displayed after consulting said user profile.
29. The method of claim 28, wherein said information is accessed from a web page.
30. The method of claim 28, wherein said information is one of a stock quote, statistics, or a translation of said data into another language.
31. The method of claim 28, wherein said information is from a dictionary or an encyclopedia.
32. The method of claim 26, further comprising storing said wordbase on a server connected to a network, wherein the application programs execute on a computer system, which is also connected to said network.
33. The method of claim 26, wherein said wordbase is populated by a third party.
34. The method of claim 33, wherein said wordbase is further populated by said user.
35. The method of claim 26, wherein one of said item records includes a default action word.
36. The system of claim 25, wherein one of said records includes a default action word.
37. The system of claim 25, wherein a single action word can activate two or more functions, the system further comprising means for selecting between said two or more functions when said single action word is selected by said user.
38. The system of claim 25, wherein said action word is formed by at least two natural language words.
39. The system of claim 25, wherein an action word can be a code word or a dual word, the system further comprising means for allowing said user to turn said code word and said dual word on and off within said wordbase.
40. The system of claim 25, wherein said wordbase is populated by a third party.
41. The system of claim 40, wherein said wordbase is further populated by the user.
42. The system of claim 40, further comprising storing said wordbase on a server connected to a network, wherein the application programs execute on a computer system, which is also connected to said network.
43. The system of claim 24, wherein said data is selected via a mouse.
44. The system of claim 24, wherein said means for displaying is configured to display said information based on a user profile.
45. The system of claim 44, further comprising means for maintaining an archive record of language preferences, word frequencies, and utterance behavior of said user.
46. The system of claim 44, further comprising means for maintaining multiple user profiles.
47. The system of claim 24, wherein said means for displaying includes displaying a number of resources relating to said selected data.
48. The system of claim 24, wherein said means for displaying displays a list of equivalent words in one or more languages related to said selected data.
49. The system of claim 24, wherein said means for displaying displays a list of Internet links (URL's) related to said selected data.
50. The system of claim 24, wherein said means for displaying displays a list of synonyms for said selected data.
51. The system of claim 24, wherein the plurality of application programs include at least one of an e-mail program, a word processing program, or a browser.
52. The system of claim 24, wherein said data is selected by said user by activating a user-defined key.
53. The system of claim 24, wherein said data can be selected from a web page.
54. The system of claim 24, wherein said information is a stock quote.
55. The system of claim 24, wherein said information is from a dictionary or encyclopedia.
56. The system of claim 24, wherein said information is a statistic.
57. The system of claim 24, wherein said information is displayed within a pop-up window.
58. The system of claim 24, further comprising providing an indication that additional information is available relating to said selected data.
59. The system of claim 24, wherein said information is a translation of said data into another language.
60. The system of claim 24, wherein said information includes a weather report for a designated location.
61. The system of claim 24, wherein said data is selected by said user by pressing a button on a mouse.
62. The system of claim 24, wherein said data is selected by pressing a designated key on a keyboard.
63. The system of claim 62, wherein said information is displayed within a pop-up window.
64. The system of claim 62, wherein said information is a web page accessed via the Internet.
65. The system of claim 24, wherein said wordbase is populated by a third party.
66. The system of claim 65, wherein said wordbase is further populated by the user.
67. The system of claim 24, wherein said wordbase is located on a server connected to a network.
68. The system of claim 24, wherein said data can be selected from within an environment created by an operating system.
69. The system of claim 24, wherein said data is selected by entering a delineator.
70. The system of claim 69, wherein said delineator is user-definable.
71. The system of claim 69, wherein said delineator is a button on a point-and-click device.
72. The method of claim 24, wherein said data is in a first language, and further comprising:
means for searching said wordbase to determine a translation of said data into a second language;
means for displaying said translation of said word in said second language within said pop-up window.
73. The method of claim 72, further comprising the step of establishing a user profile, wherein said translation is displayed after accessing said user profile.
74. The method of claim 72, wherein said first language is English and said second language is Spanish or French.
75. The method of claim 72, wherein said data is selected using a mouse and a pre-defined key on a keyboard.
76. The method of claim 72, wherein said step of displaying includes providing a window that includes a translation of said data in a plurality of languages.
77. The method of claim 72, wherein said wordbase is located on a server connected to a network.
78. The system of claim 24, wherein said data including one or more alphabetic letters, symbols and/or words, and said means for displaying accesses said wordbase prior to displaying said information, wherein said wordbase includes other functions that can be performed upon selection of said selected data, wherein each of said functions is associated with one or more of said alphabetic letters, symbols and/or words.
79. The system of claim 24, further comprising:
means for converting voice signals received via microphone from a user to data, wherein said means for monitoring also monitors for said data entered via said microphone; and
wherein said wordbase having stored therein a plurality of item records, each item record having an action word and one of a plurality of associated functions.
80. The system of claim 79, further comprising:
means for establishing a user profile; and
means for displaying information relating to said selected data, wherein said information is based on said user profile.
81. The system of claim 24, wherein said data entered by said user is selected by using a mouse.
82. A method for permitting a user to access information across a plurality of application programs, comprising:
monitoring data entered by a user within any one of the plurality of application programs;
accessing a wordbase for a match with data entered by said user; and
displaying information within a display box relating to said data entered by said user.
83. The method of claim 82, wherein said step of displaying further includes consulting a user profile prior to displaying said information within said display box.
84. The method of claim 82, wherein said wordbase has stored therein a plurality of item records, each item record having an action word and one of a plurality of associated functions; and
activating a function associated with said data if said data matches an action word stored within said wordbase.
85. The method of claim 82, wherein said data is selected via a mouse.
86. The method of claim 82, further comprising selecting said data from at least one of an e-mail program, a word processing program, or a browser.
87. The method of claim 82, wherein said data can be selected from a web page.
88. The method of claim 82, wherein said information is a stock quote or statistic.
89. The method of claim 82, wherein said information is from a dictionary.
90. The method of claim 82, wherein said information is from an encyclopedia.
91. The method of claim 82, further comprising providing an indication that additional information is available relating to said selected data.
92. The method of claim 82, wherein said data is selected by the user activating a user-defined key.
93. The method of claim 82, wherein said data is selected by the user pressing a button on a mouse.
94. The method of claim 82, wherein said data is selected by pressing a designated key on a keyboard.
95. The method of claim 82, wherein said wordbase is populated by a third party.
96. The method of claim 95, wherein said wordbase is further populated by the user.
97. The method of claim 96, wherein said information is one of: a stock quote, a statistic, a translation, or from a dictionary, an encyclopedia, or a web page.
98. The method of claim 95, wherein said wordbase is located on a server connected to a network.
99. The method of claim 98, further comprising entering a user-defined delineator, whereby said data is selected by entering said delineator.
100. The method of claim 82, further comprising entering a user-defined delineator, whereby said data is selected by entering said delineator.
101. The method of claim 82, wherein said data entered by said user is selected by using a mouse.
102. The method of claim 82, wherein said data is monitored within any context of a computer system.
103. A method of accessing information, comprising the steps of:
selecting data via a point-and-click device associated with a computer, wherein said selection can occur within any one of a plurality of application programs executing on the computer;
entering a delineator;
accessing a wordbase using said selected data; and
displaying a window in response to said selection of said data, wherein said window contains information relating to said data.
104. The method of claim 103, further comprising populating said wordbase with a plurality of item records, each item record having an action word and one of a plurality of associated functions.
105. The method of claim 104, wherein one of said item records includes a default action word.
106. The method of claim 104, wherein an action word can be a dual word or a code word, wherein said user disambiguates a dual word to indicate that said dual word is an action word.
107. The method of claim 104, wherein a function can be activated via a plurality of words.
108. The method of claim 104, wherein said functions include launching an application program, a file or a folder.
109. The method of claim 104, wherein said functions include text substitution, wherein said text is substituted at the position of a displayed cursor.
110. The method of claim 103, wherein said delineator is a predefined key on a keyboard.
111. The method of claim 103, wherein said information is real-time information.
112. The method of claim 103, wherein said information includes the correct spelling of said data.
113. The method of claim 103, wherein said information includes a translation of said selected data in a different language.
114. The method of claim 103, wherein entering said delineator includes selecting, pointing, or pressing.
115. The method of claim 103, wherein said plurality of application programs include at least one of an e-mail program, a word processing program, or a browser.
116. The method of claim 103, wherein said wordbase is populated by a third party.
117. The method of claim 103, further comprising providing an indication that additional information is available relating to said selected data.
118. The method of claim 103, wherein said delineator is a user-defined key.
119. The method of claim 103, wherein said wordbase is located on a server connected to a network.
120. The method of claim 103, further comprising the step of establishing a user profile, wherein said information is displayed after accessing said user profile.
121. A method for permitting a user to access information via a computer, comprising the steps of:
monitoring data selected by a user across a plurality of application programs, wherein said data is selected using a point-and-click device attached to the computer, wherein a selection of said data is followed by either pressing a pre-defined key on a keyboard attached to the computer or by pressing a button on said point-and-click device;
populating a wordbase;
accessing said wordbase in response to said user selecting said data; and
displaying information related to said data, based on said step of accessing said wordbase, within a pop-up window.
122. The method of claim 121, further comprising the step of establishing a user profile, wherein said information is displayed after accessing said user profile.
123. The method of claim 122, wherein said information is one of a stock quote, statistic, a translation, or from a dictionary, an encyclopedia, or a web page.
124. The system of claim 22, wherein said wordbase is further populated by the user.
125. The method of claim 121, wherein said wordbase is located on a server connected to a network.
126. The method of claim 125, further comprising entering a user-defined delineator, whereby said data is selected by entering said delineator.
127. The method of claim 126, wherein said wordbase is populated by a third party.
128. The method of claim 121, wherein said information is one of: a stock quote, statistic, a translation, or from a dictionary, an encyclopedia, or a web page.
129. The method of claim 121, wherein said wordbase is located on a server connected to a network.
130. The method of claim 121, further comprising entering a user-defined delineator, whereby said data is selected by entering said delineator.
131. The method of claim 121, further comprising monitoring data entered by said user via keyboard, accessing said wordbase in response to said data entered by said user, and displaying said information based on said data entered by said user.
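The arrangement recited in claims 1, 7, 8 and 15 (monitoring entered data, a wordbase of item records, a search for a match, and performance of the associated function once a delineator is entered) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the patented implementation: the names `ItemRecord`, `Wordbase`, `Monitor`, `DELINEATORS` and `demo` are hypothetical, the functions merely return a description string rather than launching anything, and real keystroke monitoring across applications is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical delineator set; the claims mention punctuation marks, the space bar,
# predefined or user-defined keys, and buttons on a point-and-click device.
DELINEATORS = {" ", ".", ",", ";", "\n"}


@dataclass
class ItemRecord:
    """One wordbase entry: an action word and its associated function."""
    action_word: str
    function: Callable[[], str]  # here the function only returns a description


class Wordbase:
    """Stores item records keyed by action word (case-insensitive, an assumption)."""

    def __init__(self) -> None:
        self._records: Dict[str, ItemRecord] = {}

    def add(self, record: ItemRecord) -> None:
        self._records[record.action_word.lower()] = record

    def search(self, data: str) -> Optional[ItemRecord]:
        return self._records.get(data.lower())


class Monitor:
    """Buffers entered characters and checks each delineated word against the wordbase."""

    def __init__(self, wordbase: Wordbase) -> None:
        self.wordbase = wordbase
        self.buffer: List[str] = []
        self.enabled = True  # monitoring can be toggled on and off
        self.performed: List[str] = []

    def on_char(self, ch: str) -> None:
        if not self.enabled:
            return
        if ch in DELINEATORS:
            word = "".join(self.buffer)
            self.buffer.clear()
            record = self.wordbase.search(word) if word else None
            if record is not None:
                # Perform the function associated with the matched action word.
                self.performed.append(record.function())
        else:
            self.buffer.append(ch)


def demo() -> List[str]:
    wordbase = Wordbase()
    wordbase.add(ItemRecord("mail", lambda: "launch e-mail program"))
    wordbase.add(ItemRecord("budget", lambda: "open budget file"))
    monitor = Monitor(wordbase)
    for ch in "please open mail and budget. ":
        monitor.on_char(ch)
    return monitor.performed
```

In this sketch, `demo()` returns `["launch e-mail program", "open budget file"]`: only the words "mail" and "budget" match item records, and each is acted upon when the following delineator (a space or a period) arrives.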
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to a semantic interface for a computer system, and more particularly, to a system and method of providing a semantic interface that allows a user to access via a set of user defined words a plurality of services, including command, navigation and substitution, within all contexts of his/her computer system.
2. Related Art
Computers have revolutionized the way individuals in all aspects of life perform tasks. A user interface provides a mechanism for individuals to access all the features and functionalities of their computer. Without a user friendly interface, these features and functions are typically inaccessible to the computer operator. The prevalent user interface in the industry today uses windows, icons, menus and pointing devices. The text stream entered by the user, however, has been essentially ignored.
This window-based user interface (also referred to as a graphical user interface or GUI) was first conceived by Xerox, commercialized by Apple Computers (e.g., the Macintosh), and brought to the mainstream by Microsoft Corporation (e.g., Windows 95). The GUI is powerful for organizing the capabilities and resources available in a computer. It enables the user to incrementally explore and discover his computer's capabilities and controls. It keeps everything in a convenient visual context, using helpful metaphors, like desktops and windows.
The GUI provides a menu hierarchy which is accessible via a pointing device or mouse. One of the cornerstones of this interface is the ability for the user to interact directly with objects and elements. This can be a great advantage in some cases, but it can result in making simple tasks that are often repeated into tedious chores of navigation through a maze of GUI windows. To provide the ability to directly manipulate the elements or objects, you must enable the user to work at an "atomic" level, losing the ability to group a related series of basic actions into one high level action or use of conditionals.
Under this paradigm, we can no longer access or work with objects that are not visible or unknown to us. This situation is not unlike going to eat in another country where we do not know the language. We are thus forced to go to the kitchen and point to whatever food we want. It is clear that we do not want to go to the kitchen in order to eat; we would rather express ourselves using all the richness that our natural language allows. In order to accomplish this goal, the computer has to respond to human language and not the other way around. In reality, we want to be able to go to our favorite restaurant and say "give me my usual order" and receive exactly what we ordered. This is personal attention and awareness of your eating profile. This is, in essence, what we want from our personal computer.
Advocates of the window-based user interface firmly believe that the user should always be in control. The window-based user interface provides permanent feedback to the user by providing windows with menus. The down side of this is that the user must always be in control even if he does not want to be or he cannot be because of the complexity of the task. Windows 95, for example, has started a trend in allowing the user to delegate his control through a concept of smart and thoughtful "agents." Tasks such as un-installing software are, for example, automatically performed by Windows 95.
There are many tasks that a user must repeat again and again when using a GUI, such as opening certain files and activating certain controls. For such tasks, the GUI presents the user with a single logic set, implemented within the limited screen real estate of his computer monitor. Also, the GUI recognizes none of the user's words. Even the simplest functions require the user to change mode from the keyboard to hand/eye mouse control. To use the GUI, he must lift his hand from the keyboard to the mouse. He must also lift his eyes to the screen to locate the desired graphical element, and then manipulate the mouse while visually monitoring the result. This is like having to look at someone every time you want to say a word to that person.
Computer systems must provide users with a mechanism to undo a previous action when a mistake has been made. These same systems must also provide a strong warning if an intended action will be irreversible. This is not a great concern when you are a novice user, but when you become a more experienced user, this feature will turn against you and unnecessarily increase your workload. Take for instance the simple task of copying a file to a floppy that does not have enough free space. The Macintosh window-based interface provides a warning that you must throw away X Kb in order to make space. The user, in an attempt to make space, discards X Kb of data from the floppy and places it in the trash. The user once again attempts to copy the file onto the floppy, only to be told that there is not enough room, but would you like to throw out the trash. This is what you wanted from the start! What was a good feature quickly becomes a nuisance.
The problem stems from the inability of the computer to fully understand even our simplest intentions; it lacks our personality profile. In order to overcome this problem, the computer needs to build a deeper model of each user's intentions and history in order to better serve the user's needs and to eliminate unnecessarily repetitive activities. The core requirement here is to provide mechanisms to ascertain how users work and to track their activities in an unobtrusive way.
Current wisdom states that the more static and unchanging our environments, the simpler and better it is for us. As we grow in knowledge and understanding of what the computer can do for us, we are willing to accept changes and learn to cope with them in our quest to increase our personal productivity. Unfortunately, current computer user interfaces have limited abilities to allow a user to express themselves. If computers could communicate with a richer language, it would not be so important that everything have a uniform look and feel.
The computer interface should allow the user to perform any task at any time, irrespective of the applications that are currently running. In other words, if a user is working on a word processor and needs to make some calculations, he should not be required to leave his work and open another application via a menu driven user interface to complete the necessary arithmetic operations.
Every computer user has a unique pattern of use. Typically, 80% of a user's work product is accomplished through repeated use of only 20% of his software's available features. This is commonly referred to as Pareto's Rule, and the 20% of the tasks are often referred to as the "vital few." The 80% of available software features and functions that are not needed or used by any particular user must still be available to all other users through the GUI system of menus and windows. Every user's "80/20" profile is unique. Nevertheless, it is the need to organize 100% of the available functionality that necessitates the depth, nesting and complexity of current GUI systems. As a result, the GUI is an inefficient fit, to a greater or lesser degree, for every individual user.
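As a rough illustration of the "80/20" profile described above, the following Python sketch selects the smallest set of most-used features that accounts for a given share of a user's activity. The function name `vital_few`, the feature names and the usage counts are all hypothetical and chosen only to make the arithmetic easy to follow; no such data appears in this document.

```python
from typing import Dict, List


def vital_few(usage_counts: Dict[str, int], coverage: float = 0.8) -> List[str]:
    """Return the most-used features that together reach `coverage` of all uses."""
    total = sum(usage_counts.values())
    if total == 0:
        return []
    selected: List[str] = []
    running = 0
    # Walk the features from most used to least used, accumulating their counts.
    for feature, count in sorted(usage_counts.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(feature)
        running += count
        if running >= coverage * total:
            break
    return selected


# Hypothetical usage log for ten features, totalling 100 uses.
example_counts = {
    "open_file": 52, "save": 30, "print": 5, "spell_check": 4, "find": 3,
    "mail_merge": 2, "tables": 1, "macros": 1, "footnotes": 1, "thesaurus": 1,
}
```

With these made-up counts, `vital_few(example_counts)` returns `["open_file", "save"]`: two of the ten features (20%) account for 82 of the 100 recorded uses.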
Over the years, a number of approaches have been invented to tackle this problem of inefficient fit. Because of their inherent limitations, none have been successful enough to reach the mainstream user. Software entrepreneurs have developed "shortcut" utilities of various designs. While not specifically marketed as such, the intention of these utilities is to address each user's "80/20" pattern of often repeated tasks. These "shortcut" utilities take two forms: macros triggered by key combinations and icon palette macros.
Macros triggered by key combinations typically take one or both of two forms, macro utilities and text replacement utilities. Macro utility programs provide shortcuts to functions and processes such as opening applications and files, making menu selections, and performing multi-step operations. Macro utilities, such as Tempo, MacroMagic, and Keyboard Express for the "WinTel" platform and QuicKeys for the Macintosh, are all activated by the user via keystroke combinations. Microsoft's Windows interface offers many key combination shortcut macros to operate various controls, menus, etc. To activate these macros, the user must press at least one "control" key (e.g., <alt>), combined with pressing a single "non-control" character (e.g., <x>). Users find it very difficult to develop a mnemonically consistent scheme for remembering such key combinations, for two reasons. First, the combinations are mnemonically so arbitrary that it is difficult to use mnemonic logic to memorize the cryptic key combinations. Second, many key combinations only work a given way in specific application programs, further restricting the combinations that are available. The user's limited ability to remember and reflexively recall more than a few cryptic key combinations severely limits the usability of macro utilities. Many people are so intimidated by the cryptic nature of macros that they refuse to even consider their use.
Text substitution utilities provide the ability to replace a short string of typed text with long and/or formatted text. For example, a user may define the code word "evp" to trigger the substitution to "Executive Vice President", or define a short code word like "nad" to be replaced by a series of pre-defined text lines (name and address in this case). There are several utility software products available to do that within single applications. Text replacement utilities for single applications are, for example, included with Word 7.0 for Windows 95. Other examples include ShortKeys for Windows and both SpellCatcher and TypeIt4Me on the Macintosh platform. Recognition of the user's words by these utilities is limited to the purpose of replacing one text string with another. These utilities are writer's aids only. They do not enable the user to also use words for controlling computer processes and functions.
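The kind of text substitution described above can be sketched in a few lines of Python. This is an illustration only and does not reflect how any of the named utilities are actually implemented; the function name `expand_code_words` and the sample name and address are hypothetical, while the code words "evp" and "nad" come from the example in the text.

```python
import re
from typing import Dict


def expand_code_words(text: str, substitutions: Dict[str, str]) -> str:
    """Replace each whole word found in `substitutions` with its stored text."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        # Words without an entry in the table are left unchanged.
        return substitutions.get(word, word)

    # \b\w+\b matches whole words, so "evp" inside a longer word is not touched.
    return re.sub(r"\b\w+\b", replace, text)


# Code words from the text; the name and address lines are made up.
SUBSTITUTIONS = {
    "evp": "Executive Vice President",
    "nad": "Jane Doe\n1 Example Street",
}
```

For example, `expand_code_words("Contact our evp today.", SUBSTITUTIONS)` yields `"Contact our Executive Vice President today."`. As the paragraph above notes, a utility of this kind only replaces one text string with another; it does not control computer processes or functions.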
Icon palette utilities are used to give macros a visual presence and context. The macros are activated via mouse clicks. The icon palettes are an attempt to use a visual interface to overcome the cryptic and therefore hard-to-remember keystroke interface for macro utilities. Often, macro utility products offer icon palettes as a second, alternative interface for accessing the macros. In this approach, a computer macro (process or function) is assigned to a graphical icon, which is presented on an icon bar on the user's screen. Examples of such utilities are included in Norton Navigator for Windows 95 and in both QuicKeys and OneClick on the Macintosh platform.
By definition, these icon palette utilities are an extension of the GUI. Screen size, display resolution, and the user's preference in allocating scarce screen real estate limit the number of icons it makes sense for the user to display on his screen. Given that the user's "vital few" can involve scores or hundreds of items, the icon approach is severely limited by the visual real estate available and the amount of visual complexity the user can tolerate. Moreover, the user must memorize the relationship between the graphic depiction of each icon and the function or process each executes. As the user's icon palette population increases, the distinctiveness of each icon is reduced.
The existing shortcut utilities do not offer the user an integrated approach to creating, managing and using shortcuts for content services, retrieval services and command services. Their interfaces are inconsistent and far too difficult to organize and remember. Because the user must assemble his shortcuts using a collection of different software products, he loses much of his gain to cumbersome and time-consuming management of those shortcuts.
It is clear from the above that the current trend to rely solely on window-based user interfaces has seriously constrained a user's ability to fully utilize their computer. Although the window-based user interface has revolutionized the computer system, and has allowed millions of people to use computers, we have reached a point where a user's ability to fully appreciate and utilize all of the features and functionalities of their computer system has been compromised. Thus, what is needed is a system and method that provides a user with an efficient, convenient and natural way to utilize his everyday language to work with applications, files, control commands, and the like, that form his/her "vital few."
SUMMARY OF THE INVENTION
The present invention allows a user to use their everyday language competency or user defined code words to operate a computer in a highly efficient way. In short, every word, letter, control character and symbol is actionable. The present invention is based on Pareto's law, which applies to how people work. Pareto's law states that people use 20% of all available tools and functionality to accomplish 80% of their tasks. Similarly, 80% of people's work is accomplished by repeating 20%, or the vital few, of their tasks. By focusing on those activities that enable us to produce most of our work and making them available through simple, natural language-based commands, the present invention enhances a computer user's productivity dramatically. The present invention provides a more intuitive interface that enhances the operation of the current standard graphical user interface (GUI) in a manner that is simple, richer and natural. By leveraging all of the richness and power inherent in our language, the present invention provides an important tool that allows the personal computer to operate in a manner that is much closer to our natural way of interacting; that is, the way people interact with each other.
The present invention provides a language awareness paradigm, which was born out of a very practical need: to do more with current resources. The basic principles of the language awareness paradigm can be stated very simply:
all commands are natural language-based and/or user-defined.
the basic set of commands is designed to allow users to gain access to their vital few (e.g., applications, documents, controls and functions), which defines each user's "sweet spot" of activity, using a least effort path.
all operations and functionality are unobtrusive.
all of the user's input is recorded in a context rich format for future reference.
the combined set of all user word preferences, defined commands, and the order in which the commands are stored in memory constitutes a personality profile and is transportable from one computer to another.
Based on the above principles, the present invention provides a user environment, referred to as a semantic user interface (SUI), that complements the GUI. Via the SUI, the user is enabled to enter action words and interact with the system to control the operations of the computer. The SUI is always monitoring the user's input text stream in the background.
The SUI thus makes the computer responsive, on a system-wide basis, to the user's every word. Accordingly, the SUI allows a user to enter action words from any context (i.e., any application or operating system workspace). Action words are a new category of words introduced by the present invention. Action words are thus words that users place into the text stream as requests for specific services from the present invention. There are two types of action words: code words and dual words. Code words are action words the user makes up or which are not part of his natural language lexicon (e.g., not in the standard dictionary). For example, typing "msword" to launch Microsoft's Word application is an example of entering a code word. Dual words are utterances that can be either ordinary content words or action words, depending on the user's intention in typing the word. The user may type "excel" because he intends it to be a content word in his application text, or, alternatively, he may type it because he wants to use it as an action word for opening Microsoft Excel. An "action word" can be either a single word or a phrase that includes two or more words.
The action words are then checked against the contents of a wordbase. The wordbase includes a plurality of item records. Each item record includes an action word (i.e., code word and/or dual word) and an associated service script. The service script may perform a content, retrieval, navigation or command service, or a combination of these. If the action word entered by the user is located within the wordbase, the service script associated therewith is executed. Otherwise, the utterance entered by the user is a content word and is ignored by the present invention.
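The wordbase check described above can be pictured with a minimal Python sketch. It is an illustration only, not the implementation of the present invention: the names `ItemRecord`, `Wordbase` and `handle_utterance` are hypothetical, the service scripts are stand-in functions that merely return a description of what they would do, and case-insensitive matching is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ItemRecord:
    """One wordbase entry: an action word plus its associated service script."""
    action_word: str
    kind: str                    # "code" or "dual" (hypothetical field)
    script: Callable[[], str]    # stand-in for a real service script


class Wordbase:
    """Hypothetical in-memory wordbase keyed by action word."""

    def __init__(self) -> None:
        self._items: Dict[str, ItemRecord] = {}

    def add(self, record: ItemRecord) -> None:
        # Assumption: matching is case-insensitive.
        self._items[record.action_word.lower()] = record

    def lookup(self, utterance: str) -> Optional[ItemRecord]:
        return self._items.get(utterance.lower())


def handle_utterance(wordbase: Wordbase, utterance: str) -> Optional[str]:
    """Run the service script if the utterance is an action word.

    Returns None for a content word, which is simply ignored.
    """
    record = wordbase.lookup(utterance)
    if record is None:
        return None
    return record.script()


wordbase = Wordbase()
wordbase.add(ItemRecord("msword", "code", lambda: "launch: Microsoft Word"))
wordbase.add(ItemRecord("excel", "dual", lambda: "launch: Microsoft Excel"))
```

In this sketch, `handle_utterance(wordbase, "msword")` runs the stored script, while an utterance that is not in the wordbase returns `None` and is left alone as a content word.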
Action words allow a user to launch applications, navigate within applications and control application functions by using their natural language rather than dragging and clicking with a pointing device such as a mouse. The language used is personalized for each user. That is, the action words can be user defined, thus allowing a user to utilize his own lexicon of words to control his/her computer. The present invention allows the user to identify a variety of repetitive tasks and trigger them via their predefined action words. It also enables new types of computer access, information retrieval, and other services to be performed. The present invention works with, and independently of, any software application (e.g., word processor, spreadsheet, presentation package, Internet navigator, and the like). It is thus a context-free semantic user interface, software tool and an application environment.
The present invention saves all information that is entered by the user, and stores this information in a maintenance free environment, referred to as an ActiveWords archive. The present invention records and archives the user's input text on the fly from whatever application he is working in at that time. The present invention further creates a so called 7×7 data repository, which is a database that is divided into seven categories, each category having seven subcategories. The 7×7 categorization allows a user to record notes, expenses, to-do lists, and the like. Finally, the present invention is completely portable. It goes wherever the user goes simply by providing for the user's personal profile to be downloadable from one computer to any other computer that has the present invention installed.
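One way to picture the portability of a personal profile is as a serialized file that is written on one computer and read on another. The following Python sketch assumes a JSON encoding and the hypothetical field names `action_words` and `word_preferences`; the actual profile format is not specified here.

```python
import json
from typing import Any, Dict, List


def export_profile(action_words: Dict[str, str],
                   word_preferences: List[str]) -> str:
    """Serialize a (hypothetical) personal profile to a JSON string."""
    profile = {
        "version": 1,  # assumed field, useful if the format later changes
        "action_words": action_words,
        "word_preferences": word_preferences,
    }
    return json.dumps(profile, indent=2, sort_keys=True)


def import_profile(payload: str) -> Dict[str, Any]:
    """Load a profile on another computer, checking the expected fields."""
    profile = json.loads(payload)
    if not isinstance(profile, dict):
        raise ValueError("profile must be a JSON object")
    missing = {"action_words", "word_preferences"} - profile.keys()
    if missing:
        raise ValueError(f"profile is missing fields: {sorted(missing)}")
    return profile


payload = export_profile({"msword": "launch Microsoft Word"}, ["evp", "nad"])
restored = import_profile(payload)
```

The string returned by `export_profile` could be carried on removable media or sent over a network; `import_profile` rejects a payload that lacks the expected fields.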
The user can create a user profile to match his unique language personality. The present invention keeps an archive record of the user's language preferences, word frequencies, and his utterance behavior. It provides the user with tools for using that archive, in combination with his user profile, to refine his SUI and tailor it to match his habits and preferences. Using the SUI thus becomes reflexive, like the use of a mouse becomes reflexive, because it is so easy to learn and operate, and because it operates the same way in all contexts. Finally, the SUI establishes a platform others can use to develop and sell application products that leverage the SUI. By linking the SUI via software agents, any software product can become language aware.
The foregoing and other features and advantages of the invention will be apparent from the following, more particular description of a preferred embodiment of the invention.
BRIEF DESCRIPTION OF THE FIGURES
The present invention will be described with reference to the accompanying figures, wherein:
FIG. 1 illustrates the placement of the present invention within an operating system in order to be able to monitor all user inputs.
FIG. 2 illustrates an archive generated in accordance with the present invention.
FIG. 3 is an architectural block diagram of the present invention.
FIG. 4 is a block diagram of a micro kernel engine (MIKE).
FIG. 5 illustrates the interaction of a control center, which is a central place to manage the present invention, with the other components of the present invention.
FIG. 6 is a block diagram of the control center.
FIG. 7 is a flowchart that illustrates how the present invention checks a wordbase for action words.
FIG. 8 illustrates an exemplary environment for the present invention.
FIG. 9 is a flowchart of the operation of a toggle function and pop-up menu function.
FIG. 10 is a block diagram of the MIKE and a content display system (CDS) in accordance with the present invention.
FIG. 11 is a screen shot of a window that displays Mr. IBeam's corner, which provides feedback to the user regarding their use of the present invention.
FIG. 12 illustrates the concept of multiple personal profiles.
FIG. 13 is a flowchart that illustrates how the ActiveWords archive is populated with data in accordance with the present invention.
FIG. 14 illustrates a screen shot of a monitoring bar in accordance with the present invention.
FIG. 15 is a screen shot of a window that displays a "Tip" that allows a user to become acquainted with the functions of the present invention.
FIG. 16 illustrates a screen shot of the monitoring bar along with a plurality of associated pull-down menus.
FIG. 17 illustrates the launching of Microsoft Word 97.
FIG. 18 is a screen shot of a state table in accordance with the present invention.
FIG. 19 is a screen shot of a LightEditor for adding code words and dual words to the ActiveWords Wordbase in accordance with the present invention.
FIGS. 20, 22 and 23 are screen shots of the control center.
FIG. 21 illustrates a wordbase item record.
FIG. 24 is a screen shot of a window that allows a user to configure the monitor bar.
FIGS. 25 and 26 are screen shots illustrating the Advanced Find and Find functions of the present invention, respectively.
FIG. 27 is a screen shot illustrating a banner that is displayed in a preferred embodiment when a dual word has been entered by the user.
FIGS. 28 and 29 illustrate the concept of a user profile.
FIG. 30 is a screen shot of the ActiveWords ScratchPad.
FIGS. 31A and 31B are screen shots of a window that allows multi-item resolution.
In the figures, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements. The figure in which an element first appears is indicated by the leftmost digit(s) in the reference number.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
I. Overview
A. The ActiveWords System
B. ActiveWord Services
1. Action Services
2. Archive Services
II. Exemplary Environment
III. Capturing Utterances Entered by the User
IV. Architecture and Operation of the Present Invention
A. Action Words and Content Words
B. Runtime Operation
1. Wordbase 340
2. Services performed by the ActiveWords System
3. MIKE 330
4. Monitor 110
5. State Table 450
6. Archiving User Text
7. The Control Center
8. Run-Time Operation of the ActiveWords System
9. The Toggle function, Pop-Up Window, Charm Box
10. Charm Words
C. The Application Programming Interface
D. Agents
E. Multi-Item Resolution
F. Portability
G. Third Party Application Programs
V. Examples of Using the ActiveWords System
VI. Conclusion
I. Overview
A. The ActiveWords System
The present invention, referred to herein as the ActiveWords system, provides a semantic user interface (SUI). The SUI allows a user to use his everyday natural language or user defined words to operate a computer and/or manipulate the user's content in a highly efficient manner. In short, every keystroke, every word, or group of words is actionable. Consequently, a computer user's productivity can be dramatically increased by using action words that the user designates to activate controls and features. This allows the user to produce most of his work through simple, natural language commands. The present invention provides an intuitive interface that enhances the operation of the current standard window-based interface (also referred to herein as a Graphical User Interface (GUI)) in a simple natural manner. By leveraging the richness and power inherent in a user's language, the present invention allows the personal computer to operate in a manner that is much closer to the way people interact with each other using words.
The present invention provides a simpler and more natural way to work with the objects, applications, information requests (i.e., queries), and the like that constitute each user's "vital few." The vital few is each user's unique pattern of using objects (e.g., applications, files, folders) and processes (e.g., computer controls and applications features) that comprise the user's sweet spot. The SUI allows the user to activate his/her vital few much more quickly and efficiently than he can using the GUI. Because the GUI is ideal for organizing the 100% of what is available, the user will continue to rely on the GUI to explore, discover and activate the 80% of things he seldom uses. For example, Windows 95 installs approximately 9000 items (applications, files, parameters, etc.), all of which are accessible via the GUI. However, only a subset of 50-300 of these items comprise the average user's vital few. As such, in accordance with the present invention, the SUI provides a mechanism to access this subset of information, referred to as the vital few, in an effective way.
The present invention is a system that acts upon human language text that arrives at the user's desktop computer. Text can be entered directly by the user via keyboard or voice. In the case of voice, voice-to-text software is provided to translate the voice signals. Alternatively, text may arrive via e-mail text, Internet page text, or other forms of text from other sources. This text, referred to as "given text," can be selected by the user using conventional point and click technology. Once text has been entered or selected in this fashion, the text is passed to the present invention to determine its actionability. If the text is actionable, the present invention executes the designated action.
The present invention uses the same text input stream that the user employs to input data to applications and applications documents. The present invention constantly monitors the text input stream and takes appropriate action when it senses a command from the user. The ActiveWords system works all of the time and in all contexts (i.e., within any application program or within the operating system workspace). The ActiveWords system accesses that text input stream prior to its access by an application the user may be using at any given time.
The ActiveWords system exploits natural language by providing a single-word (or multi-word) logic interface, referred to herein as the SUI. That is, every word (or for that matter keystroke) entered or selected by a user is actionable. The term "single-word" as used in this document means any word that has meaning in the user's natural language (e.g., "word" for wordprocessor) or a set of letters that only has a predefined meaning to the user (e.g., "wp" for wordprocessor). The present invention also provides for multi-word expressions. That is, two or more words may activate a service. Implementation of a multi-word embodiment will be readily apparent to one skilled in the art after reading the detailed description provided below for the single-word embodiment.
As a result of the present invention, the rich naming logic of natural language can be incorporated into a user interface. Computer users can now leverage their natural language abilities to assign names of their choosing for all their computer activities, including launching application programs, controlling application program operations, replacement of text, searching, retrieval of information, and the like.
A user is enabled to enter "utterances." Each utterance has the potential to control the operations of the computer. An "utterance" is any natural language word or group of words, string of letters or symbols, etc. followed by a delineator (e.g., a space bar or punctuation mark). The present invention checks each utterance against a wordbase to determine whether it is an action word (i.e., a word that when entered or selected triggers an action). The present invention thus senses the text stream for action words and automatically erases them when they are encountered. Action words are user defined. The action words allow a user to launch applications and navigate within applications by using language rather than clicks from a pointing device such as a mouse. The present invention, alternatively, replaces an utterance with designated words. The combined set of all user-defined action words, as well as a history of the user's past actions, constitute an ActiveWords user profile. That profile is transportable from one computer to another.
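The handling of utterances described above (split the text stream at delineators, erase action words, replace designated words, pass everything else through) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name `process_stream` and its two dictionaries are hypothetical, whitespace and ASCII punctuation are assumed to be the delineators, and the delineator that follows an erased action word is dropped along with it.

```python
import string
from typing import Dict, List, Tuple

# Assumption: whitespace and ASCII punctuation end an utterance.
DELINEATORS = set(" \t\n" + string.punctuation)


def process_stream(keystrokes: str,
                   actions: Dict[str, str],
                   substitutions: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return (text passed on to the application, actions triggered)."""
    passed: List[str] = []
    triggered: List[str] = []
    buffer: List[str] = []
    for ch in keystrokes:
        if ch not in DELINEATORS:
            buffer.append(ch)           # still inside the current utterance
            continue
        word = "".join(buffer)
        buffer.clear()
        if word in actions:
            # Action word: erase it (and its delineator) and note the action.
            triggered.append(actions[word])
        elif word in substitutions:
            # Replace the utterance with its designated words.
            passed.append(substitutions[word] + ch)
        else:
            # Content word: pass through unchanged.
            passed.append(word + ch)
    # Characters after the last delineator are not a complete utterance yet;
    # this sketch simply passes them through.
    passed.append("".join(buffer))
    return "".join(passed), triggered


text, fired = process_stream(
    "Dear evp, msword thanks ",
    actions={"msword": "launch Microsoft Word"},
    substitutions={"evp": "Executive Vice President"},
)
```

Here "evp" is expanded in place, "msword" is removed from the text and recorded as a triggered action, and the remaining words reach the application untouched.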
The present invention creates an environment where there are two classes of utterances that users can enter into their computers: content words and action words. Action words are divided into two groups: dual words and code words. Content words are words entered into the text stream that the user intends as input to some document, file, or directory. Examples include word processing text in a memo, file names in the Microsoft Windows directory, and numbers in a spreadsheet.
Action words are a new category of words introduced by the present invention that are actionable within the SUI. Action words are thus words that users place into the text stream as requests for specific services from the present invention. Code words are action words the user makes up or which are not part of his natural language lexicon (e.g., not in the standard dictionary). For example, typing "msword" to launch Microsoft's Word application is an example of entering a code word. Dual words are utterances that can be either ordinary content words or action words, depending on the user's intention in typing the word. The user may type "excel" because he intends it to be a content word in his application text, or, alternatively, he may type it because he wants to use it as an action word for opening Microsoft Excel. Content words are not action words because the user does not intend them to be action words. As will be shown below, the present invention provides a simple mechanism for designating whether an entered word is an action word or a content word.
For many functions, the SUI offers the user a faster and simpler alternative to reaching for the mouse and using the graphic user interface (GUI). On a case by case basis, the user decides which interface (GUI or SUI) is most convenient for accomplishing his intended result. Typically, the SUI becomes the preferred, least effort path, for accessing the vital few. In a short time, the user settles into an optimum routine that combines his use of the GUI with his use of the SUI.
B. ActiveWord Services
The ActiveWords system provides two types of services: action services and archive services. The action services sense keystrokes, symbols and words within the text stream. If an action word is entered, the ActiveWords system takes whatever action the user has specified for that action word (i.e., each action word has at least one action associated therewith). Action services are divided into five groups: command functions, content functions, navigation functions, information functions and complex functions. The archive services maintain a record of all the text the user enters as input via keyboard or voice. As stated above, these action and archive services are designed to be available at all times and within any context, so long as the computer's operating system is running. Both types of services will be discussed below.
1. Action Services
The present invention can be used to activate command functions. Command functions include, for example, window controls (e.g., resizing a window) and applications controls (e.g., save, print, search, view, open, etc.).
The present invention can also be used to activate content functions. Thus, action words can be used to achieve content results, such as text substitutions, punctuation, text formatting, text content transformation, and the like. In particular, the ActiveWords system can be used to perform text content substitutions, such as the detection and correction of double capitals (e.g., THe becomes The), abbreviations, expansions (e.g., ceo becomes Chief Executive Officer) and large text insertions. The content functions further include insertion of punctuation, such as quotes and contractions. Still further, the content functions include formatting, such as complex formatting for programming, or for names and addresses. Finally, the content functions include content transformations, such as language translations (e.g., English to French), number to text conversion, currency conversion (e.g., dollars to pounds or yen), in-place arithmetic (e.g., replace "100+300" with "400"), date transformations (e.g., 7/1/97 to Jul. 1, 1997), data conversions (e.g., chemistry symbols and acronyms), and the like.
The present invention can further be used to activate navigation functions. Thus, action words can be used to launch application programs and navigate within an application program. For example, a single-word, such as "excel," can be used to launch a spreadsheet program from anywhere within the working environment of a user's computer. The user can use action words to navigate between different views in an application (e.g., navigating between months, dates, weeks in a calendar/planning application, such as Ecco). Documents within a wordprocessor can also be opened via an action word. Accordingly, each of any number of documents or files in a user's computer can be assigned an action word. Furthermore, the user can launch various services that affect her computer (e.g., backup of the hard drive) via an action word. These services can be launched within the user's computer or across a network of computers.
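Three of the content functions listed above (double-capital correction, in-place arithmetic, and date transformation) are simple enough to sketch in Python. The function names are hypothetical, the arithmetic helper is assumed to handle only two non-negative integers joined by +, - or *, and the date helper is assumed to read month/day/two-digit-year input.

```python
import re
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def fix_double_capital(word: str) -> str:
    """Correct two leading capitals followed by lowercase (THe -> The)."""
    if re.fullmatch(r"[A-Z]{2}[a-z]+", word):
        return word[0] + word[1:].lower()
    return word


def evaluate_arithmetic(expr: str) -> str:
    """Replace a simple 'a+b', 'a-b' or 'a*b' expression with its value."""
    match = re.fullmatch(r"(\d+)([+\-*])(\d+)", expr)
    if match is None:
        return expr  # not an expression this sketch understands
    a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
    return str({"+": a + b, "-": a - b, "*": a * b}[op])


def expand_date(text: str) -> str:
    """Turn a month/day/two-digit-year date into its long form."""
    try:
        parsed = datetime.strptime(text, "%m/%d/%y")
    except ValueError:
        return text  # leave anything that is not a date unchanged
    return f"{MONTHS[parsed.month - 1]}. {parsed.day}, {parsed.year}"
```

Each helper returns its input unchanged when the text does not match, so that ordinary content words pass through untouched.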
The present invention can also be used to locate information within a user's computer or from external sources. For example, an action word can be used to trigger a directory search or a database search. Another action word may be used to trigger an Internet search (e.g., find "xxxx" at the Wall Street Journal web site). Yet another action word can retrieve a specific file or record available via the Internet, extranet or intranet.
Finally, the ActiveWords system can be used to trigger and/or perform complex functions, such as dialing a person's telephone number or dialing a person's beeper service and sending a message to that beeper. The ActiveWords system also provides four information and software resources, which are described in greater detail below, referred to as the toggle function, pop-up window function, charm-box function and charm-word function.
Note that most of the computer services and functions discussed above are already available within a user's computer (e.g., launching a program) or within a single application program (e.g., text replacement or searching a database). However, access to these services and functions is almost always context dependent in that the user has to leave where she is (e.g., Excel) and navigate to a specific tool or application service (e.g., Windows 95 start find menu) to obtain the service or control she needs. From the perspective of the user, that is a cumbersome and time consuming method. The user must find the service within the GUI's maze of pull-down windows or use difficult-to-remember keystrokes that include control characters (e.g., ctrl, alt). The present invention allows a user to utilize his everyday language to activate these services, programs, functions, etc. from any context in the computer. The service script will navigate to the appropriate tool or context and perform the designated action.
2. Archive Services
The archive service records and stores all the text a user inputs via keyboard or voice-to-text. The ActiveWords system tags the text with identifying information, such as date, application name and/or document or file name. The archive can thus be searched based on the actual text entered by the user in combination with the identifying information. The present invention further creates a so called 7×7 data repository, which is a database that is divided into seven categories, each category having seven subcategories.
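The tagging and searching of archived text can be illustrated with a small Python sketch. The classes `ArchiveEntry` and `Archive`, the in-memory list, and the sample entries are all hypothetical; the storage format of the ActiveWords archive is not described at this point.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ArchiveEntry:
    """Archived text plus the identifying tags named above."""
    text: str
    entered_on: date
    application: str
    document: Optional[str] = None


class Archive:
    """Hypothetical in-memory archive of everything the user typed."""

    def __init__(self) -> None:
        self._entries: List[ArchiveEntry] = []

    def record(self, entry: ArchiveEntry) -> None:
        self._entries.append(entry)

    def search(self, phrase: str,
               application: Optional[str] = None,
               entered_on: Optional[date] = None) -> List[ArchiveEntry]:
        """Match on the text itself, optionally narrowed by the tags."""
        needle = phrase.lower()
        return [
            e for e in self._entries
            if needle in e.text.lower()
            and (application is None or e.application == application)
            and (entered_on is None or e.entered_on == entered_on)
        ]


archive = Archive()
archive.record(ArchiveEntry("Budget review moved to Friday",
                            date(1997, 7, 1), "Word", "memo.doc"))
archive.record(ArchiveEntry("Q3 budget totals",
                            date(1997, 7, 2), "Excel", "q3.xls"))
```

A search for "budget" alone returns both sample entries; adding `application="Excel"` or a date narrows the result to one.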
Provided below is a detailed description of a system architecture for implementing a preferred embodiment of the ActiveWords system, along with an operational description of the present invention. Finally, this document concludes with a set of examples that illustrate practical applications for the present invention.
II. Exemplary Environment
The present invention may be implemented using hardware, software or a combination thereof and may be implemented in a computer system or other processing system. An example computer system 801, which can be installed with the present invention, is shown in FIG. 8. The computer system 801 includes one or more processors, such as processor 804. The processor 804 is connected to a communication bus 802. Various software embodiments are described in terms of this example computer system. After reading this description, it will become apparent to a person skilled in the relevant art how to implement the invention using other computer systems and/or computer architectures.
Computer system 801 also includes a main memory 806, preferably random access memory (RAM), and can also include a secondary memory 808. The secondary memory 808 can include, for example, a hard disk drive 810 and/or a removable storage drive 812, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, or the like. The removable storage drive 812 reads from and/or writes to a removable storage unit 814 in a well known manner. Removable storage unit 814 represents a floppy disk, magnetic tape, optical disk, or the like, which is read by and written to by removable storage drive 812. As will be appreciated, the removable storage unit 814 is a computer usable storage medium having stored therein computer software and/or data.
In alternative embodiments, secondary memory 808 may include other similar means for allowing computer programs or other instructions to be loaded into computer system 801. Such means can include, for example, a removable storage unit 822 and an interface 820. Examples of such can include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM, or PROM) and associated socket, and other removable storage units 822 and interfaces 820 which allow software and data to be transferred from the removable storage unit 822 to computer system 801.
Computer system 801 can also include a communications interface 824. Communications interface 824 allows software and data to be transferred between computer system 801 and external devices. Examples of communications interface 824 can include a modem, a network interface (such as an Ethernet card), a communications port, a PCMCIA slot and card, etc. Software and data transferred via communications interface 824 are in the form of signals which can be electronic, electromagnetic, optical or other signals capable of being received by communications interface 824. These signals 826 are provided to communications interface 824 via a channel 828. This channel 828 carries signals 826 and can be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link and other communications channels.
In this document, the terms "computer program medium" and "computer usable medium" are used to generally refer to media such as removable storage drive 812, a hard disk installed in hard disk drive 810, and signals 826. These computer program products are means for providing software to computer system 801.
Computer programs (also called computer control logic) are stored in main memory and/or secondary memory 808. Computer programs can also be received via communications interface 824. Such computer programs, when executed, enable the computer system 801 to perform the features of the present invention as discussed herein. In particular, the computer programs, when executed, enable the processor 804 to perform the features of the present invention. Accordingly, such computer programs represent controllers of the computer system 801.
In an embodiment where the invention is implemented using software, the software may be stored in a computer program product and loaded into computer system 801 using removable storage drive 812, hard drive 810 or communications interface 824. The control logic (software), when executed by the processor 804, causes the processor 804 to perform the functions of the invention as described herein.
In another embodiment, the invention is implemented primarily in hardware using, for example, hardware components such as application specific integrated circuits (ASICs). Implementation of the hardware state machine so as to perform the functions described herein will be apparent to persons skilled in the relevant art(s).
In yet another embodiment, the invention is implemented using a combination of both hardware and software.
III. Capturing Utterances Entered by the User
A preferred embodiment of the present invention is designed to operate with Windows 95, an operating system designed and distributed by Microsoft Corporation. However, the present invention contemplates operating with any present or future developed operating system, including Windows NT. For convenience, the present invention is described with reference to the Windows 95 Operating System. The present invention is configured to always be active in the background, similar to a real-time monitoring system. Every time a computer implementing the present invention is turned on, the operating system launches the present invention.
FIG. 1 illustrates how the present invention captures the keystrokes (i.e., data entered by a user via a keyboard attached to computer system 801) of the user. The present invention operates with an architecture capable of monitoring for system wide inputs. This broad I/O capability can be provided under the Virtual Machine Manager (VMM 120) that is available under Win32.The VMM 120 is an extensible operating system whose core and standard components are provided by Microsoft Corporation. By writing additional modules called V.times.Ds (virtual device drivers), software and hardware vendors can complement the VMM 120. The core of the present invention, monitor 110, is implemented as a V.times.D (referred to as a Virtual Input Driver or VID) under the Win32 bit environment.
The heart of the Windows 95 architecture consists of two features: the dynamic V.times.D loader (VXDLDR.386) and the layered I/O system provider V.times.D (IOS.386). It is the main responsibility of the IOS V.times.D to catch I/O calls that user-mode applications perform to file storage devices and route them to a set of layered V.times.Ds that will cooperatively process the calls.
Under Windows 95, a V.times.D can be loaded dynamically from another V.times.D, from a 16-bit user-mode Windows or DOS based application, or from a Win32-based application. To load a V.times.D from another V.times.D, the services from the VXDLDR V.times.D can be used. A 16-bit user-mode application obtains the VXDLDR's entry point and passes the location of the V.times.D to load to the V.times.D loader. Once the V.times.D needs to be unloaded, the application passes the module name of the V.times.D to unload to the V.times.D loader. Unfortunately, there is no such thing as a V.times.D handle that the user-mode application could use for that purpose; either the module name or the V.times.D ID must be known to the application in order to unload the V.times.D. A Win32-based application must open the V.times.D using the CreateFile Win32 API to obtain a handle to the V.times.D, and use the DeviceIOControl API to communicate with the V.times.D.
FIG. 1 shows where the monitor VID 110 is placed under the Windows 95 operating system in order to be able to monitor all inputs (i.e., keystrokes and mousestrokes). Hardware 102 includes a keyboard, mouse, microphone, handwriting tablet or the like. The hardware 102 forwards the user's input to a kernel 104. The operating system includes kernel or inner layer 104 and upper layer 106. Both components include a plurality of window components. The components (i.e., VMs, managers, drivers, VxDs, and VPICD) illustrated in FIG. 1 are well known in the art of operating systems, and do not directly affect the present invention. As such, for the sake of brevity, these components will not be explained in detail herein.
The hardware 102 does not necessarily have to be connected to a conventional personal computer or workstation. It is contemplated that the present invention can be used to control anything, for example, car or home alarms, appliances, audio/visual equipment, cars, etc. Anything that has access to a processor and operating system can utilize the present invention.
The monitor VID 110 is positioned between the two operating system components 104, 106 such that a user's keystrokes or mouse signals are captured prior to being forwarded to application program(s) 118. The present invention requires that the keystrokes entered by the user be captured prior to the operating system forwarding the keystrokes to the foremost application program 118. Voice signals are treated separately since they require additional processing to convert to text, which is done using third party voice-to-text software. The ActiveWords system will capture the text characters from the voice-to-text software before it is provided to the application program 118. Once captured, the source of the text is irrelevant to the present invention.
The monitor VID 110 is graphically represented to the user in accordance with the present invention via a monitoring bar 315, as shown in FIG. 14. The monitoring bar 315 will be described in greater detail below in Section IV. 4. Generally, the monitoring bar 315 has two data fields, text field 1410 and feedback field 1420, and a number of icons. Icon 1415 provides access to a productivity center. Icon 1430, shown as C^c, provides the user with access to a control center, which is a central place to manage the present invention. Icon 1460 is referred to as a "Mr. IBeam." Icon 1470 allows a user's profile to be changed. Icon 1440 provides access to a LightEditor (FIG. 19). Icon 1450 provides the user with a find function (FIG. 26). Icon 1480 provides an advanced find feature (FIG. 25). Icon 1490 allows a user to select text, e.g., from a notepad, spreadsheet, e-mail, word processing document, etc.
IV. Architecture and Operation of the Present Invention
As discussed above, the present invention provides semantically driven functionality, thereby making the user's computer "language aware." The present invention is responsive to action words, which are the natural language text entered by the user either via keyboard or voice. The text can be words or phrases. There is no inherent limit to the size of the phrase entered, although a preferred embodiment limits phrases to 80 characters. Additionally, the user can select a word or phrase from a document, e-mail, database or Internet via his mouse and submit the word or phrase to the ActiveWords system as a potential action word. If the word is an action word, the system will react exactly as if it had been input by the user via a keyboard.
The present invention operates in the background and takes appropriate action when it senses an action word. The present invention is seamlessly integrated with the operating system of the user's computer thereby making it unobtrusive to the user. In an alternate embodiment, the present invention is incorporated into the operating system software. For the user's convenience, the present invention provides a number of user signals and graphical aids that help the user work with the SUI. Described below is the general architecture and operation of the SUI, and its associated components.
The ActiveWords system monitors the user's data input whenever his computer is running, unless the ActiveWords system is turned off by the user. In a preferred embodiment, the user can place the ActiveWords system in "sleep" mode (via, for example, an action word), such that inputted text is not monitored. The services of the present invention are available in all contexts and at all times. Being context free and "aware" of the user's natural language and language(s)-of-art enables the ActiveWords system to assist the user in many useful ways.
Context independence is essential to the effectiveness of the present invention. The present invention works in the same way, no matter what context the user is working in when he requests a service. It makes no difference if the user is working in an application program, a utility program, an Internet browser, or in an operating system work space. The ActiveWords system does not interfere with whatever text services his applications provide. The user can use the full text services of Microsoft Word, for example, along with the full text services of the ActiveWords system. The ActiveWords system complements these application text services by providing greater depth of functionality and universal, context-free operation. This context-free operation enables the user to become reflexive in his use of action words.
Reflexive use means that the behavior in question is unconscious on the part of the person that performs that behavior. Stepping on a brake pedal, for example, is reflexive for an experienced driver. Pointing with a mouse or other pointing device is reflexive for an experienced GUI user. These behaviors would not become reflexive if the brake pedal only worked to slow the car on some streets, or if the pointing device only worked to move the cursor in some applications and not in others. Because these devices are reliable and work in the same way all the time and in all contexts, the user can become unmindful of them, thereby entrusting those behaviors to her reflexes. From then on, she performs the behavior automatically whenever she desires the result of that particular behavior.
The ActiveWords system may be viewed as providing a virtual personal computer within the user's actual computer. With ActiveWords, the user can give his own names (i.e., action words) to his computer's objects, processes, and features. He is no longer a captive of the interface and naming choices that others have provided. Every user's natural language vocabulary is unique to some degree. His SUI needs to reflect that uniqueness. The ActiveWords system enables each user to use and leverage his own terminology, his own mnemonic metaphors, and the structure of his personal language profile. It seems obvious that an English metallurgist who is an amateur astronomer should have an SUI that is significantly different from the SUI of a French businessman who is interested in soccer.
A. Action Words and Content Words
There are two types of action words: code words and dual words. A code word is any character string the user reserves for the purpose of signaling the present invention to provide him with a service. By designating a code word, a user is signaling his intention to never use this combination of letters, symbols, etc. as a content word. The ActiveWords system knows, therefore, that whenever it senses a code word, it may immediately erase it from the text stream. After erasing the code word, the present invention executes a service script associated with that code word. In the rare event when the user wants to type the code word as a content word, he simply turns the SUI off temporarily. In a preferred embodiment, an action word is provided to activate a service script that turns the monitor window off until the next word has been input. Alternatively, an icon on the monitoring bar 315, such as Mr. IBeam, can be used to toggle between sleep and awake mode.
A dual word is any word (or phrase) in the English dictionary (e.g., "file") or a word-of-art that has a special meaning in a personal or professional context (e.g., "walkthrough" for programmers). In other words, a user may want a word to have a dual purpose: (1) a content word to be used in an application and (2) an action word to trigger a service. When a dual word is sensed, the present invention recognizes it as an utterance having a dual nature, in that it may be intended either as a content word or as an action word. Accordingly, when it encounters such an utterance, the present invention must be told by the user that it is an action word (i.e., the user must disambiguate the dual word).
In a preferred embodiment, the present invention provides the user with a simple method for declaring his intention: a double press of the space bar. If the user's intention is to use the entered dual word as a content word, the user does not press the space bar twice. In that event, the present invention ignores the word and continues sensing for the next action word. If his intention is to use it as an action word, the present invention immediately erases the word from the text stream and executes the service script associated with that action word. As should be readily apparent to one skilled in the art, other techniques can be used for disambiguating a dual word.
The present invention is language neutral. In other words, regardless of the user's natural language, English, Spanish, German, French, etc., the present invention operates the same. The user can designate any word(s) as an action word(s). The user can use any nicknaming logic for creating action words. For example, the user might use "ms" as a code word prefix to trigger service scripts related to various Microsoft application programs. Accordingly, "msw" could be the code word used to launch Microsoft Word, "mse" to launch Microsoft Excel, "msp" to launch Microsoft PowerPoint, "msa" to launch Microsoft Access, and so on. Obviously, a suffix can also be used instead of a prefix to trigger service scripts. Alternatively, the user can create code words without mnemonic aids such as suffixes and prefixes.
B. Runtime Operation
FIG. 3 is a block diagram of the present invention during runtime operation. The present invention includes a Virtual Input Driver (VID) 110, a microkernel engine (MIKE) 330, a monitoring bar 315, agents 370, agent registry and services 360, third-party applications 118, a wordbase 340, a profiles registry 350, control center 345 and set-up files 335. Window applications 118 include word processors, spreadsheets, presentation software, utilities, and the like. The agents 370 are application programs that are dependent upon the present invention (i.e., require input from MIKE 330 to operate), as described in greater detail below. MIKE 330 uses a scripting language to launch an application program(s) 118 or to control functions and features of application program(s) 118. Each function is performed by a service script, which is associated with each action word within the wordbase 340.
MIKE 330 is made up of several components and is shown in further detail in FIG. 4. In operation, a user 310 enters an input via a keyboard or selects text via a mouse. This input is captured by VID 110. All typed keystrokes are received by the VID 110, which extends the functionality of the Windows 95 operating system, before they are dispatched to the applications 118. In other words, the input text stream is "hooked" by the VID 110. In a preferred embodiment, a mouse input is received by both the VID 110 and the Windows applications 118. In other words, the VID 110 only monitors and senses the activity of the mouse. (The present invention monitors the mouse since the clicking of the mouse indicates a change of context or the end of an utterance, which is analogous to pressing the space bar.) In an alternate embodiment, user input 310 is entered via a microphone.
The user input is then forwarded to MIKE 330. When MIKE 330 is inactive, the VID 110 retransmits all user inputs back to the foremost Windows application. The initial settings of MIKE 330 and monitoring bar 315 are stored in the start-up files 335, which are read at start-up and written to after changes or shut-down. Each user has their own start-up files 335.
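By way of illustration only, the hand-off described above can be modeled as a small buffer that collects keystrokes into words and reports each completed word when a delimiter or a mouse click is sensed. The class name `InputMonitor`, the `DELIMITERS` set, and the choice to always pass keystrokes through to the application are assumptions made for this sketch and are not taken from the specification; an actual embodiment hooks input at the driver level.

```python
from typing import Callable, List

# Hypothetical delimiters that end an utterance. The text names the space bar
# and a mouse click; newline and tab are assumptions added for this sketch.
DELIMITERS = {" ", "\n", "\t"}


class InputMonitor:
    """Toy stand-in for the VID 110 / MIKE 330 hand-off (illustrative names)."""

    def __init__(self, on_word: Callable[[str], None], active: bool = True):
        self.on_word = on_word          # called with each completed word
        self.active = active            # False models an inactive MIKE
        self.buffer: List[str] = []     # characters of the word in progress
        self.forwarded: List[str] = []  # what the foremost application receives

    def key(self, ch: str) -> None:
        # Simplification: every keystroke is passed on to the application.
        self.forwarded.append(ch)
        if not self.active:
            return                      # inactive: input is only retransmitted
        if ch in DELIMITERS:
            self._flush()
        else:
            self.buffer.append(ch)

    def mouse_click(self) -> None:
        # A click marks a change of context, so it also ends the utterance.
        if self.active:
            self._flush()

    def _flush(self) -> None:
        if self.buffer:
            self.on_word("".join(self.buffer))
            self.buffer.clear()


words: List[str] = []
monitor = InputMonitor(words.append)
for ch in "open msw ":
    monitor.key(ch)
for ch in "half":
    monitor.key(ch)
monitor.mouse_click()   # ends the word "half" without a space
```

Each word reported through `on_word` is what would then be looked up in the wordbase 340.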
MIKE 330 displays in the monitoring bar 315 the characters input by the user. It also sends feedback messages and displays activity indicators through monitoring bar 315. The user can interact with MIKE 330 through pop-up menus, as well as via the controls associated with monitoring bar 315. These controls include changing the current user profile, capturing selected text, launching the LightEditor, launching the Control Center, bringing in the Advanced Find from the Control Center, displaying Mr. IBeam's productivity center, turning on/off the monitoring bar 315, and going into "sleep" mode.
The profiles registry 350 is a listing of all available user profiles. The concept of user profiles is discussed in more detail below. All agents are registered in registry 360. The control center, which is a central place to manage the present invention, has access to the wordbase 340, monitoring bar 315, profiles registry 350, agent registry 360 and agents 370. Each major component of the present invention will be described in detail below.
1. Wordbase 340
MIKE 330 searches for action words or dual words stored in the wordbase 340. In a preferred embodiment, wordbase 340 is a relational database that is constructed using Jet Engine.RTM. available from Microsoft Corporation. Wordbase 340 is where all third party applications register their set of action words. The present invention contemplates, for example, a law wordbase, a medical wordbase, a business wordbase, etc. Thus, the medical wordbase, for example, will include a set of dual words, code words and associated scripts that are specific to the practice of medicine. Upon installation, each of these "third-party wordbases" will be seamlessly incorporated into a user's wordbase 340.
Each action word and its associated service script comprise an active wordbase item record. Each wordbase item record includes the code word and/or the dual word that will trigger the execution of the service script. A detailed illustration of each wordbase item record is shown in FIG. 21.
When an action word match is found within wordbase 340, MIKE 330 accesses the wordbase 340 and retrieves the service script associated with the active word or dual word. The service script provides a content, retrieval, navigation, information or command service, or a combination of these. Additionally, the wordbase 340 records statistical information concerning the code word or dual word, such as incrementing a hit count, updating last access time, etc. These counts are recorded in the related wordbase item records and are used by the productivity center (FIG. 11) to provide statistical data to the user. The statistical data is used by the user to leverage the ActiveWords training features and improve his productivity. The operations of add, delete and modify can be performed by a user on wordbase 340 via the control center 345 or via a light editor (FIG. 19, which is described in detail below) as should be apparent to a person skilled in the art.
Every time the present invention senses that the user has finished a word, it searches the wordbase 340 to see if that word is in an item record as a code word or dual word. There are four possible outcomes of searching for a word (or phrase) in the wordbase 340:
(1) A matching code word is found in a wordbase item record. In this case, the typed word is immediately erased and the accompanying service script is executed.
(2) A matching dual word is found in an active wordbase item record. In this case, the ActiveWords system immediately gives the user audible and/or visual signals. FIG. 27 illustrates a visual display (i.e., a banner) that can be provided to the user to indicate that a dual word has just been entered. In this example, "Excel" has been typed. The ActiveWords system provides a visual message in the banner--"Dual Word detected. Press SPACE to use it." Additionally, when the present invention senses a dual word it provides an audible signal, such as a bell or whistle. The visible signal can also be provided via a change in where the "eyes" are looking in the Mr. IBeam icon 1460 on the monitoring bar 315. These signals notify the user that he has the option to treat that dual word as either an action word or as a content word. If the user intends the dual word to be an action word, he presses the spacebar a second time. The ActiveWords system immediately erases the word from the application text input stream and executes the accompanying service script provided within the associated wordbase item record. Obviously, keystrokes other than an additional space character can be designated (by the user) to signal the user's choice to treat a dual word as an action word. If, on the other hand, the user intends the dual word to be a content word, he simply continues typing. The ActiveWords system does nothing with respect to that content word, and continues monitoring the text stream for the next active word.
(3) No matching dual word is found in the wordbase 340. If one of the dual words in the wordbase has the value "default" (including the quotes as part of the dual word), then this will qualify as a match and the system will perform exactly as it does for number (2) above with respect to the service script accompanying this "default" dual word. This feature of the present invention makes all words actionable. In one embodiment, the "default" script is not activated unless a predefined key (acting as a delineator) is pressed.
(4) No match is found in the wordbase 340 and there is no "default" dual word in the wordbase 340. The word is therefore assumed to be a content word. The system takes no action, and continues to monitor the text stream for the next action word.
The default feature is a very powerful component of the present invention. It allows every word to be actionable. Thus, words that are entered or selected by a user, but do not appear explicitly in the wordbase 340, result in a default script being performed. For example, all words that evoke the default script will trigger the same function to occur (e.g., launch a browser or provide a text entry window).
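The four outcomes above can be summarized in a short sketch. It is offered only as an illustration under stated assumptions: the wordbase is reduced to two in-memory dictionaries, the names `Outcome`, `classify` and `should_execute` are hypothetical, and the script strings are placeholders rather than real service scripts.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Outcome(Enum):
    CODE_WORD = 1      # erase the word and run its script immediately
    DUAL_WORD = 2      # signal the user; run only if the user confirms
    DEFAULT = 3        # no match, but a "default" dual word exists
    CONTENT_WORD = 4   # no match and no default: take no action


# The quotes are part of the dual word, as stated in outcome (3).
DEFAULT_KEY = '"default"'


def classify(word: str,
             code_words: Dict[str, str],
             dual_words: Dict[str, str]) -> Tuple[Outcome, Optional[str]]:
    """Return the outcome for a finished word and the script to consider."""
    if word in code_words:
        return Outcome.CODE_WORD, code_words[word]
    if word in dual_words:
        return Outcome.DUAL_WORD, dual_words[word]
    if DEFAULT_KEY in dual_words:
        return Outcome.DEFAULT, dual_words[DEFAULT_KEY]
    return Outcome.CONTENT_WORD, None


def should_execute(outcome: Outcome, confirmed: bool) -> bool:
    """confirmed models the second press of the space bar (or other key)."""
    if outcome is Outcome.CODE_WORD:
        return True
    if outcome in (Outcome.DUAL_WORD, Outcome.DEFAULT):
        return confirmed
    return False


# Placeholder entries; the script text is illustrative only.
code_words = {"msw": "launch word processor"}
dual_words = {"Excel": "launch spreadsheet", DEFAULT_KEY: "open browser"}

outcome, script = classify("Excel", code_words, dual_words)
run_it = should_execute(outcome, confirmed=True)
```

Treating the default case like a dual word, so that it runs only on confirmation, follows the statement that the system performs "exactly as it does for number (2)".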
Referring to FIG. 21, each record within the active wordbase includes a plurality of fields. Field C indicates the activation state, on/off, of the code word for this record. Field CW is the code word. Field D indicates the activation state, on/off, of the dual word. Field DW is the dual word. The Comment field allows the user to associate a comment with his action words.
The Action field contains the service script that will be executed upon the activation of an action word. The Category field contains information regarding the category/subcategory indicating where the record is registered. The Editing field defines the way the item is going to be edited. The item record can be edited as free text, free substitution, phone number, address, etc. The Action Type field designates the rules the present invention will follow in executing the script for that particular item. The action type can be one of the defaults (substitution, command, navigation) or the name of an external agent that will perform the action. The Extra field allows the user to provide additional information concerning the action word.
The CWCount field keeps track of the number of times the code word has been used. The DWCount field keeps track of the number of times the dual word has been used. The Xid field indicates a special action to be performed: for example, that the action or replacement is in the Extra field; that the clipboard will be used to make a substitution; that the substitution is a password, the content of which will not be shown in monitoring bar 315; or that markup language is enabled for this item record. The Modified field shows the last date/time the record was modified. The Accessed field shows the last time the script specified in the Action field was executed. The Signature field indicates the creator of the record. The Flags field is system defined. The present invention is not limited to having only these fields within wordbase 340 and other fields are contemplated (e.g., security, product administration, application priority).
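As a non-limiting sketch, the item record of FIG. 21 can be pictured as a simple data structure. The attribute names below mirror the field names given in the text, but the Python types, the default values, and the `record_hit` helper are assumptions chosen for illustration; the preferred embodiment stores these records in a relational database.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class WordbaseItem:
    code_word_on: bool = False          # Field C: code word activation state
    code_word: str = ""                 # Field CW
    dual_word_on: bool = False          # Field D: dual word activation state
    dual_word: str = ""                 # Field DW
    comment: str = ""                   # Comment field
    action: str = ""                    # Action field: the service script
    category: str = ""                  # Category field
    editing: str = "free text"          # Editing field (assumed default)
    action_type: str = "substitution"   # Action Type field (assumed default)
    extra: str = ""                     # Extra field
    cw_count: int = 0                   # CWCount field
    dw_count: int = 0                   # DWCount field
    xid: str = ""                       # Xid field
    modified: Optional[datetime] = None  # Modified field
    accessed: Optional[datetime] = None  # Accessed field
    signature: str = ""                 # Signature field
    flags: int = 0                      # Flags field (system defined)

    def record_hit(self, as_dual_word: bool, when: datetime) -> str:
        """Update the usage statistics and return the script to execute."""
        if as_dual_word:
            self.dw_count += 1
        else:
            self.cw_count += 1
        self.accessed = when
        return self.action


# Placeholder record; the script text is illustrative only.
item = WordbaseItem(code_word_on=True, code_word="msw",
                    action="launch word processor")
script = item.record_hit(as_dual_word=False, when=datetime(2000, 1, 1, 9, 30))
```

The counters and the Accessed timestamp updated here are the kind of statistics the productivity center is described as reporting.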
The user gains tremendously if any word, in any language, can be used to signal the ActiveWords system. By using words and thereby incorporating natural language logic directly into the SUI, the ActiveWords system becomes very powerful. The ActiveWords system achieves this power by allowing the user to associate service scripts with either code words or dual words, whichever is easiest for him to recall.
The service script specifies the service to be performed whenever the action word(s) within the item record is sensed. Service scripts in the ActiveWords system are written in a scripting language. For example, a script for using the previous word a user typed as the find target for a search of a file directory in Windows 95 looks like this:
<erase last word><winstart>f</winstart><delay><last word><enter>
(This script erases the last word typed--activates the winstart key--types the letter "f" that triggers the windows find tool--closes the winstart key--waits for 600 ms--and calls in the last word typed--and presses enter to launch the find operation).
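For illustration, a script of this kind can be split into tags and literal text with a few lines of code. The sketch below is not the parser of the present invention; the function name `tokenize`, the token layout, and the decision to treat tag names as case-insensitive are all assumptions.

```python
import re
from typing import List, Optional, Tuple

# A token is ("tag", NAME, argument-or-None) or ("text", literal, None).
Token = Tuple[str, str, Optional[str]]

# Matches <NAME>, </NAME> and <NAME:argument>; the argument may contain colons.
TAG_RE = re.compile(r"<(/?)([^<>:]+)(?::([^<>]*))?>")


def tokenize(script: str) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    for m in TAG_RE.finditer(script):
        if m.start() > pos:
            # Literal characters between tags are typed as-is.
            tokens.append(("text", script[pos:m.start()], None))
        closing, name, arg = m.groups()
        # Assumption: tag names are case-insensitive, so normalize to upper case.
        name = name.strip().upper()
        tokens.append(("tag", ("/" if closing else "") + name, arg))
        pos = m.end()
    if pos < len(script):
        tokens.append(("text", script[pos:], None))
    return tokens


tokens = tokenize("<erase last word><winstart>f</winstart><delay><last word><enter>")
names = [value for _kind, value, _arg in tokens]
```

A paired tag such as winstart yields an opening token and a closing token with the literal "f" between them, which is how the key-down, type, key-up sequence described above would be recovered.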
Those skilled in the art will readily appreciate that the specific scripting language used is implementation specific. In a preferred embodiment, the scripting language syntax is similar to HTML. An exemplary subset of the scripting language used in the present invention is provided below with reference to TABLE 1.
TABLE 1
1 <F1> Function 1 key.
2 <F2> Function 2 key.
3 <F3> Function 3 key.
4 <F4> Function 4 key.
5 <F5> Function 5 key.
6 <F6> Function 6 key.
7 <F7> Function 7 key.
8 <F8> Function 8 key.
9 <F9> Function 9 key.
10 <F10> Function 10 key.
11 <F11> Function 11 key.
12 <F12> Function 12 key.
13 <LT> Less than character "<".
14 <GT> Greater than character ">".
15 <ESC> Escape key.
16 <DEL[:##]> Delete key (for deleting) [repeated ## times].
17 <TAB[:##]> Tab key [repeated ## times].
18 <BACK SPACE[:##]> or <BACKSPACE[:##]> Back space key (for deleting) [repeated ## times].
19 <ENTER[:##]> Enter key [repeated ## times].
20 <UP[:##]> Up arrow key [repeated ## times].
21 <DOWN[:##]> Down arrow key [repeated ## times].
22 <LEFT[:##]> Left arrow key [repeated ## times].
23 <RIGHT[:##]> Right arrow key [repeated ## times].
24 <HOME> Home key (goes to beginning of line, or top of a list).
25 <END> End key (goes to end of line or bottom of a list).
26 <WINSTART></WINSTART> Windows 95 special key to activate the "START" button.
27 <WINMENU> Windows 95 special key to simulate a right mouse click.
28 <ALT></ALT> <ALT> simulates the Alt key down, </ALT> simulates the Alt key up. An <ALT> must always be closed by an </ALT>.
29 <CTRL></CTRL> Same as Alt but with the Control key.
30 <SHIFT></SHIFT> Same as Alt but with the Shift key.
31 <ALTGR></ALTGR> Same as Alt but with the AltGr key. This key is included in some keyboards for special characters.
32 <WAIT[:####]> Waits 600 milliseconds (0.6 seconds) [or waits the number of milliseconds indicated by the number].
33 <MINIMIZE WINDOW> Minimize window.
34 <MAXIMIZE WINDOW> Maximize window.
35 <RESTORE WINDOW> Restore window.
36 <CLOSE WINDOW> Close window.
37 <NEXT WINDOW> Next window.
38 <PREVIOUS WINDOW> Previous window.
39 <MOVE WINDOW> Moves the window.
40 <SIZE WINDOW> Sizes the window.
41 <MONITOR POWER> Sets the state of the display. This command supports devices that have power-saving features, such as a battery-powered personal computer.
42 <SCREEN SAVER> Executes the screen saver application specified in the [boot] section of the SYSTEM.INI file.
43 <APP EXIT> Exits the current application.
44 <CLOSE DOCUMENT> Close the current document (only for MDI Application).
45 <MINIMIZE ALL> Minimize all windows.
46 <CLOSE APP> Close the current application (same as Close Window).
47 <ActiveWord[:WAIT]> Can be any ActiveWord already existing in any glossary. [If AW is an ActiveWord to launch an application, the WAIT parameter indicates that ActiveWords should wait until the launched app is up and running to continue analyzing the rest of the Action]
48 <LAST WORD[:##]> Retrieves the last word from the list of Last Typed Words (LTW) and places it where the current focus is [or retrieves the ## word from the list of LTW].
49 <LAST REPLACED WORD[:##]> Retrieves the last word from the list of Last Replaced Words (LRW) and places it where the current focus is [or retrieves the ## word from the list of LRW].
50 <ERASE LAST WORD[:##]> Deletes the last word typed [or deletes the ## word from the list of LTW].
51 <ERASE LAST REPLACED WORD[:##]> Deletes the last word replaced [or deletes the ## word from the list of LRW].
52 <LAST LINE[:##]> Retrieves the last line from the list of Last Typed Line (LTL) and places it where the current focus is [or retrieves the ## line from the list of LTL].
53 <LAST REPLACED LINE[:##]> Retrieves the last line from the list of Last Replaced Line (LRL) and places it where the current focus is [or retrieves the ## line from the list of LRL].
54 <LAST APP[:##]> Retrieves the last application name from the list of Last Applications Used (LAU) and places it where the current focus is [or retrieves the ## application name from the list of LAU].
55 <LAST AW[:##]> Retrieves the ActiveWord from the list of Last Typed ActiveWords (LTAW) and places it where the current focus is [or retrieves the ## ActiveWord from the list of LTAW].
56 <LAST NESTED AW[:##]> Retrieves the ActiveWord from the list of Last Replaced ActiveWords (LRAW) and places it where the current focus is [or retrieves the ## ActiveWord from the list of LRAW].
57 <LAST DW[:##]> Retrieves the DualWord from the list of Last Typed DualWords (LTDW) and places it where the current focus is [or retrieves the ## DualWord from the list of LTDW].
58 <MORE INFO> Retrieves information related to the last AW typed, from the Comments field.
59 <MORE INFO:COMMENTS> Same as above.
60 <MORE INFO:ACTION> Retrieves information related to the last AW typed, from the Action field and writes it as a replacement ignoring Type and MarkUp Language tags.
61 <MORE INFO:COUNT> Retrieves information related to the last AW typed, from the Count field.
62 <MORE INFO:NORMAL> Retrieves information related to the last AW typed, from the DualWord field.
63 <MORE INFO:EXTRA> Retrieves information related to the last AW typed, from the eXtra field.
64 <MORE INFO:MASK> Retrieves information related to the last AW typed, from the Mask field.
65 <MORE INFO:CATEGORY> Retrieves information related to the last AW typed, from the Category field.
66 <MORE INFO:XID> Retrieves information related to the last AW typed, from the Xid field.
67 <MORE INFO:AWAPP> Retrieves information related to the last AW typed, from the AWApp field.
68 <NESTED MORE INFO> Retrieves information related to the last nested AW, from the Comments field.
69 <NESTED MORE INFO:COMMENTS> Same as above.
70 <NESTED MORE INFO:ACTION> Retrieves information related to the last nested AW, from the Action field and writes it as a replacement ignoring Type and MarkUp Language tags.
71 <NESTED MORE INFO:COUNT> Retrieves information related to the last nested AW, from the Count field.
72 <NESTED MORE INFO:NORMAL> Retrieves information related to the last nested AW, from the DualWord field.
73 <NESTED MORE INFO:EXTRA> Retrieves information related to the last nested AW, from the eXtra field.
74 <NESTED MORE INFO:MASK> Retrieves information related to the last nested AW, from the Mask field.
75 <NESTED MORE INFO:CATEGORY> Retrieves information related to the last nested AW, from the Category field.
76 <NESTED MORE INFO:XID> Retrieves information related to the last nested AW, from the Xid field.
77 <NESTED MORE INFO:AWAPP> Retrieves information related to the last nested AW, from the AWApp field.
78 <DW MORE INFO> Retrieves information related to the last DW typed, from the Comments field.
79 <DW MORE INFO:COMMENTS> Same as above.
80 <DW MORE INFO:ACTION> Retrieves information related to the last DW typed, from the Action field and writes it as a replacement ignoring Type and MarkUp Language tags.
81 <DW MORE INFO:> Retrieves information related to the last DW typed, from the Count field.
82 <DW MORE INFO:AW> Retrieves information related to the last DW typed, from the ActiveWord field.
83 <DW MORE INFO:EXTRA> Retrieves information related to the last DW typed, from the eXtra field.
84 <DW MORE INFO:MASK> Retrieves information related to the last DW typed, from the Mask field.
85 <DW MORE INFO:CATEGORY> Retrieves information related to the last DW typed, from the Category field.
86 <DW MORE INFO:XID> Retrieves information related to the last DW typed, from the Xid field.
87 <DW MORE INFO:AWAPP> Retrieves information related to the last DW typed, from the AWApp field.
88 <UNDO> Undoes the last replacement.
89 <DATE> Inserts the current date.
90 <TIME> Inserts the current time.
91 <SCRATCH PAD> Brings up a text capturing window.
92 <DLL:DllName.dll:Function> Calls the specified function from a .DLL. The Function parameter is case sensitive.
93 <LAST something[:N|LIST][:D]> Applies to all the "LAST" commands (e.g. word, replaced word, line, etc.). When a number is specified, the something in the Nth position is returned (normal behavior). The user can also specify a group of elements through a LIST. This list may have any of the forms: 1-3; 1,2,5; 4-8; 1,3,5-10. If the last parameter is D, the last something(s) are returned with their respective delimiters.
94 <NOTIFICATION[:Bannertype][Sound file]> Indicates that a notification must be presented when the term is hit. The Banner Type can be: GO, FIND, CLOSE. If no Banner Type is specified, the default for all other actions is DEFAULT. The user can specify a sound file other than the default.
95 <ONLY:App1, App2 . . . AppN> Specifies that the current CW and DW should only be executed if they are being called from one of the specified applications.
96 <NOT:App1, App2 . . . AppN> Specifies that the current CW and DW should not be executed if they are being called from one of the specified applications.
<USER INPUT[:Question]> Brings up the ScratchPad as a text capturing window, with a user definable question or message.
97 <INPUT INFO> Inserts the information captured by the last call to the <USER INPUT> tag within the current script.
98 <{VARIABLE}> Replaces the tag with the value specified by VARIABLE, where VARIABLE can be other tags, such as LAST WORD. The result is a new string to be evaluated.
99 <ED:{VARIABLE}[WORD1, WORD2 . . . WORDN]:CW1, CW2 . . . CWN> Executes the respective CodeWord in positional order depending on the number obtained from resolving the VARIABLE, where if the result is 1 (one) the first CW is executed, if 2 (two) the second CW is executed and so on. If the result of resolving the VARIABLE is a word rather than a number, the same number of words must follow, against which the VARIABLE's value is compared, and once again, depending on which word matches, the corresponding CW in positional order is executed.
100 <WITH: Word15 . . . Word2, Word1|Word1> Executes the rest of the script associated with the item containing the DualWord found, only if the previous words match the parameters, where Word1 should match the last word typed and so on. Each word separated by a comma is treated as a Boolean AND. Each word separated by the | character is treated as a Boolean OR.
Obviously, the present invention contemplates that the service script syntax and content will expand and evolve. The present invention is not limited to the service scripts provided in TABLE 1. Rather, TABLE 1 is merely exemplary, as should be readily apparent to those skilled in the art.
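As one example of how a tag parameter from TABLE 1 might be interpreted, the LIST forms given for the "LAST" commands (1-3; 1,2,5; 4-8; 1,3,5-10) can be expanded into individual positions. The sketch below is illustrative only. The function names are hypothetical, and it assumes that position 1 denotes the most recently recorded item, which the table does not state explicitly.

```python
from typing import List


def parse_positions(spec: str) -> List[int]:
    """Expand a LIST such as "1,3,5-10" into individual positions."""
    positions: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(p) for p in part.split("-", 1))
            positions.extend(range(start, end + 1))
        else:
            positions.append(int(part))
    return positions


def last_items(history: List[str], spec: str) -> List[str]:
    """Pick items from a history ordered oldest to newest.

    Assumption: position 1 is the most recent item. Positions beyond the
    length of the history are skipped rather than treated as errors.
    """
    return [history[-n] for n in parse_positions(spec) if 1 <= n <= len(history)]


typed_words = ["alpha", "beta", "gamma", "delta"]   # oldest to newest
picked = last_items(typed_words, "1,2")
```

Skipping out-of-range positions is a design choice made for the sketch; an implementation could equally report them to the user.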
Scripts within wordbase 340 can also be qualified. For example, a script can be designated as "only" if a user wants an action word to cause a function only within a certain environment (e.g., a replacement only in his e-mail application, but nowhere else). A script can also be "contra" indicated if a user does not want an action word to cause a function within certain environments (e.g., perform a replacement of text unless he is in his e-mail application). This feature allows the same action word to produce different functions depending upon the application that is foremost (i.e., the application the user is working on at the time). Thus, once an active word is entered, the present invention checks the wordbase 340 for a match. If it finds a match, but the script indicates that the designated function should not be performed within the present user environment (e.g., e-mail), then the system continues to check the wordbase 340 for another match. If it comes across another match, and there is no contra indicator, then the script is performed.
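The qualified lookup described above can be sketched, under stated assumptions, as follows. The ItemRecord class, its field names, and the functions allowed and find_script are hypothetical names introduced for this sketch only; the wordbase 340 itself is modeled as a simple ordered list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemRecord:
    action_word: str
    script: str
    only: frozenset = frozenset()    # applications named in an "only" qualifier
    not_in: frozenset = frozenset()  # applications named in a "contra" qualifier


def allowed(record, foremost_app):
    """Return True if the record may run in the foremost application."""
    if record.only and foremost_app not in record.only:
        return False
    return foremost_app not in record.not_in


def find_script(records, word, foremost_app) -> Optional[str]:
    """Scan the records in order; skip matches that are ruled out by a
    qualifier and keep checking for another match."""
    for record in records:
        if record.action_word.lower() != word.lower():
            continue
        if allowed(record, foremost_app):
            return record.script
    return None
```

With two records for the same action word, one marked "only" for e-mail and one "contra" indicated for e-mail, the same word yields a different script depending on the foremost application.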
In one embodiment, a script can include an action word. Thus, executing the script will require accessing the wordbase 340 to determine the next action to be performed.
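A minimal sketch of a script that itself contains an action word is given below. It assumes, for illustration only, that every script is plain replacement text and that the wordbase 340 can be modeled as a dictionary; the function name expand and the max_depth guard against circular definitions are assumptions of this sketch, not features recited in the text.

```python
def expand(word, wordbase, max_depth=8):
    """Replace a word by its script, then look up each word of the script
    in turn, so that an action word inside a script is also acted upon."""
    script = wordbase.get(word.lower())
    if script is None:
        return word                      # not an action word: leave as typed
    if max_depth == 0:
        # Assumed safeguard: stop if scripts refer to each other in a cycle.
        raise RecursionError("script expansion is nested too deeply")
    return " ".join(
        expand(token, wordbase, max_depth - 1) for token in script.split()
    )
```

In this sketch, a script "Regards, tj" whose last word is itself an action word for "Tom Jones" expands to "Regards, Tom Jones".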
MIKE 330 supports several users and user profiles. On startup, MIKE 330 checks profiles registry 350. The current user and profile can be changed on-the-fly via either an active word or via an option control associated with monitoring bar 315 (i.e., icon 1470). FIG. 12 is a high level block diagram of a wordbase. It includes two user profiles 1230 and 1240 and a set of shared item records 1220. A list of all the user profiles and shared item records is provided via a master index 1210. The wordbases can be shared among different users on a system. The wordbase 340 may be stored at the network level (e.g., on a server) so that all users can obtain read-only access. The present invention contemplates that the wordbase 340 will be accessible over a LAN, WAN, as well as other types of networks. Each user profile is a unique view into the shared wordbase that contains everything the user defines as his profile and the settings for these items. An editor 1235 is provided, which can be accessed via a control center 345, as described below, to edit the items contained in the user's profile.
Referring to FIG. 23, a view of wordbase 340 (as displayed by the control center 345) is shown. The master index 1210 is shown in window 2310. The master index 1210 is divided into drawers (e.g., Hobby, Places, etc.) and folders (e.g., Cities, States, etc.). FIG. 23 illustrates only six of the columns within wordbase 340. The columns of the wordbase 340 have been described with reference to FIG. 21, and for the sake of brevity will not be explained again.
FIG. 28 and FIG. 29 illustrate the concept of user profiles. A user's profile includes a combination of third party applications and wordbase item records, which are located in folders. Different profiles can be created by enabling/disabling the ActiveWords system for certain applications and by turning on/off folders of wordbase item records. Furthermore, drawers and folders can be assigned a priority.
FIG. 28 illustrates a list of applications (e.g., Microsoft Word, Ecco Pro, Internet Explorer, etc.). In a preferred embodiment, the present invention requires a user to configure an application after a user launches the application for the first time. These applications can be configured by the user to be on/off or placed in sleep mode. If an application is on, the ActiveWords system operates as described herein. If the application is off, the ActiveWords system is disabled while the user is using this application, but enabled in other contexts. Sleep mode disables the ActiveWords system, but still allows a user to enter action words via an ActiveWord Scratch Pad (FIG. 30). The Scratch Pad simply provides a text entry field to the user. While in sleep mode, action words entered directly into the application will not be sensed by the ActiveWords system. In an alternate embodiment, only certain categories of action words are placed in "sleep" mode. This allows a user to still use a subset of his active words under normal use (e.g., entered or selected from any application or within the operating system environment).
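The three per-application settings can be summarized in a short sketch. The names AppMode and should_process are hypothetical, and the treatment of Scratch Pad input while an application is off is an assumption; the text states only that the system is disabled for that application.

```python
from enum import Enum


class AppMode(Enum):
    ON = "on"
    OFF = "off"
    SLEEP = "sleep"


def should_process(mode, from_scratch_pad=False):
    """Decide whether an entered word is checked against the wordbase."""
    if mode is AppMode.ON:
        return True                  # normal operation
    if mode is AppMode.SLEEP:
        return from_scratch_pad      # only Scratch Pad entries are sensed
    return False                     # OFF: assumed disabled for all input
```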
FIG. 29 illustrates the drawers and folders that are part of a user profile called "My Profile." Profile names are user assignable. Drawers and folders can be turned on/off. Each folder contains a plurality of wordbase item records. By turning a folder "off," all wordbase item records within the folder are disabled. If a folder is "on" for a given profile, the profile is extended to include the pattern of wordbase item records in the folder that are turned on and off. For example, FIG. 23 illustrates that certain code words and dual words can be disabled (e.g., the dual word "items" is disabled). Each folder is further assigned a priority. As such, if a code word appears in more than one drawer and folder, the service script within the highest priority drawer/folder will be executed. If a dual word appears in more than one drawer/folder, a preferred embodiment of the present invention provides for multi-item resolution, as described below.
A user can thus create multiple profiles by turning applications, drawers and/or folders on/off and by assigning priorities to each of the drawer/folder combinations. Thus, a user may have several user profiles: one for work, one for entertainment use of his computer, and several for each of his community and hobby interests. The windows shown in FIG. 28 and FIG. 29 are available via the control center 345.
The user profile allows, for example, an English speaking metallurgist who is interested in astronomy to share a computer with someone having a very different user profile. His sharing partner may be a French businessman who has an interest in soccer. Their respective user profiles are comprised of different selections (items on or off) and precedence-orders for the Applications in the Word Base.
An English or French user of the ActiveWords system will populate his wordbase 340 with code words and dual words that make sense to him as an English or French speaker. An English speaking metallurgist, for example, would have additional "word-of-art" item records (i.e., action words) related to metallurgy. These metallurgy terms enable the ActiveWords system to provide services tailored to the user's needs as a metallurgist. The user would specify that his metallurgy item records must override any item records that he has in the wordbase 340 for Standard English. Therefore, the service script associated with "steel" in his Metallurgy item record would override the service script associated with "steel" in his wordbase item record for Standard English.
An English speaking metallurgist would have service scripts associated with the word "mercury" in both his Standard English and Metallurgy Applications. When he is at work, his user profile priority settings tell the ActiveWords system to override associations for "mercury" in his Standard English item record in favor of his Metallurgy item record for "mercury." If our English metallurgist is also an amateur astronomer, he might have a wordbase item record for "mercury" as part of his ActiveWords Astronomy Application (a hypothetical application). One of his user profiles, which he uses for his hobby activities, would allow him to give the item records associated with his Astronomy Application precedence over the item records associated with his Standard English and Metallurgy Applications. In that case, any service scripts triggered by the planet name "mercury" would take precedence over service scripts triggered by the metal "mercury" in his Metallurgy or Standard English Applications.
In a preferred embodiment, code words and dual words are not sensitive to upper/lower case. As such, "Mercury" and "mercury" are handled in exactly the same manner.
The ActiveWords system leverages the precedence-order of words that appear in two or more wordbase item records (i.e., as part of two or more ActiveWords Applications). The system uses the precedence-order in the user profile to determine which service script should be triggered or otherwise given precedence when an action word matches two or more wordbase item records.
In this way, ActiveWords takes the user's universe of meanings and contexts into account, at the level of single-word or multi-word expressions. The ActiveWords system allows the user to designate and manage as many ActiveWords applications and user profiles as he requires.
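The precedence-order resolution described in the "mercury" example can be sketched as follows, for the code word case only (the multi-item resolution for dual words is not modeled). The function name resolve, the dictionary layout, and the sample scripts are assumptions made for illustration.

```python
def resolve(word, folders, priority):
    """Return (folder name, script) from the highest-priority enabled folder
    containing the word, or None if no enabled folder contains it.

    folders  -- maps a folder name to a dict of action word -> script
    priority -- enabled folder names for the current profile, highest first
    """
    key = word.lower()                   # matching is not case sensitive
    for name in priority:
        script = folders.get(name, {}).get(key)
        if script is not None:
            return name, script
    return None


# Hypothetical wordbase contents and two profiles for the same user.
folders = {
    "Standard English": {"mercury": "look up in dictionary", "steel": "define"},
    "Metallurgy": {"mercury": "open metals database", "steel": "open alloys"},
    "Astronomy": {"mercury": "open planet page"},
}
work_profile = ["Metallurgy", "Standard English"]
hobby_profile = ["Astronomy", "Metallurgy", "Standard English"]
```

Under the work profile the Metallurgy script for "mercury" is selected, and under the hobby profile the Astronomy script is selected, without any change to the stored item records.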
ActiveWords enables the user to manage and organize his action words. The use of "mercury" above is a good example. In addition to managing its use in three contexts (Standard English, Metallurgy and Astronomy), the user may also wish to have "mercury" capitalized when he uses it as a planet's name. He may also want ActiveWords to substitute "Mercury" for "mc." The present invention allows the user to have one place to go and one set of tools for specifying and managing all his uses of a given word or a group of words.
The ActiveWords system includes the capability to write small agent applications that will automatically create an active word record (action word and/or code word and the accompanying ActiveWords script) for a field in a database, where the record corresponds to the ASCII value of that field. For example, an ActiveWords agent for Microsoft Outlook can be written to access each of the contact name and company name fields in the Outlook database, to create an ActiveWords item for each first name, middle name, last name and company name in that Outlook database, and to store each item in the ActiveWords WordBase 340. Thereafter all those names are actionable as WORDS. The same approach can be taken with names, part numbers, part names, account numbers or telephone numbers in any database.
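A minimal sketch of such an agent is shown below. It does not use any actual Microsoft Outlook interface; contacts are assumed to have already been exported as dictionaries, and the field names, the function name build_items, and the choice of storing the original spelling as a placeholder script are all assumptions of this sketch.

```python
def build_items(contacts, fields=("first_name", "middle_name", "last_name", "company")):
    """Create one item per distinct field value found in the contact records.

    Returns a dict mapping the lower-cased action word to the value as it
    appears in the database (used here as a placeholder script).
    """
    items = {}
    for contact in contacts:
        for field in fields:
            value = (contact.get(field) or "").strip()
            if value:
                # Keep the first spelling seen for a given word.
                items.setdefault(value.lower(), value)
    return items
```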
2. Services performed by the ActiveWords System
As discussed above, the ActiveWords system can perform a variety of services in response to an action word. In a preferred embodiment, service scripts are constructed using a combination of these four service types:
(1) Content service--alters the user's text content in some way. Transforming a shorthand word into its longhand form is an example (e.g., typing "ddl" in order to have the ActiveWords system type "due diligence"). The present invention can be set to automatically capitalize the first letter of proper nouns. Hence "tom" is automatically capitalized. Likewise for Washington, January, pluto, easter, lincoln, cobol, etc. From the day the ActiveWords system is installed, the user can forget about capitalizing proper nouns that are common to his natural language and his language(s) of art. Similarly, contractions automatically receive an inserted apostrophe, e.g., can't, won't, couldn't, shouldn't, hadn't, wouldn't, etc. Likewise with hyphenations: user-friendly, client-server, single-keystroke, etc. The ActiveWords system can also automatically correct double caps at the beginning of a word (which occur when the user accidentally stays on the shift key too long), automatically capitalize the first letters in sentences, automatically eliminate double spaces between words (if the user wishes) and automatically correct inadvertent use of the "Caps Lock" key so that "tHIS" is automatically changed to "This".
(2) Information service--assembles and delivers software and information resources to the user's screen (e.g., having the ActiveWords system look up a word in a dictionary, database or at a website via an internet browser).
(3) Command service--causes an operation to be performed by a software application, a utility program, or by the operating system (e.g., opening a word processing document).
(4) Navigation service--causes navigation within an application or launches an application.
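Several of the content services listed under item (1) can be illustrated with a short sketch. The CONTRACTIONS table, the function names fix_word and fix_text, and the specific rules are assumptions chosen for illustration; capitalization of proper nouns and of sentence starts is not modeled.

```python
import re

# Hypothetical lookup table; a real wordbase would hold these as item records.
CONTRACTIONS = {
    "couldnt": "couldn't",
    "shouldnt": "shouldn't",
    "hadnt": "hadn't",
    "wouldnt": "wouldn't",
}


def fix_word(word):
    """Apply simple content corrections to a single word."""
    lower = word.lower()
    if lower in CONTRACTIONS:
        fixed = CONTRACTIONS[lower]
        return fixed.capitalize() if word[0].isupper() else fixed
    # Inadvertent Caps Lock: "tHIS" becomes "This".
    if len(word) > 1 and word[0].islower() and word[1:].isupper():
        return word[0].upper() + word[1:].lower()
    # Double capital at the start of a word: "THe" becomes "The".
    if len(word) > 2 and word[:2].isupper() and word[2:].islower():
        return word[0] + word[1:].lower()
    return word


def fix_text(text):
    """Collapse runs of spaces, then correct each word."""
    text = re.sub(r" {2,}", " ", text)
    return " ".join(fix_word(w) for w in text.split(" "))
```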
3. MIKE 330
FIG. 4 is a block diagram of MIKE 330. MIKE 330 includes a data manager 410, a fetcher 420, a command interpreter 430, a navigational manager 440, a state table 450 and an agent services module 460. Once the data has been captured by VID 110, it is sent, character by character, to data manager 410. Data can also be entered via a microphone. Three actions occur while the present invention monitors for user input: updating the state table 450, searching for action words and updating an archive (not shown) with the contents of the current text stream.
The data manager 410 is a simple character store that ensures that no character is lost in case the system is busy. It works as a circular or rolling storage list of 200 bytes under a FIFO protocol. Data manager 410 is independent of the stream of inputs stored in an archive (not shown) by the present invention. The purpose of the data manager 410 is to detect an action word. An utterance is cleared by the data manager 410 upon the activation of a delineator. A delineator is a keystroke (or other indicator) that signals data manager 410 that a complete set of keystrokes (e.g., word, group of words, number, etc.) has been entered. In short, any action by the user that provides an indication to the data manager 410 that entered or selected text is an action word (or phrase) can be a delineator. Example delineators include the pressing of the space bar, change of application context, an end of word punctuation, pressing the right or left buttons on the mouse, a function (or other predefined) key, or the like. A delineator can also be a biometric signal, such as eye movement, hand signal, etc. Time can also be used as a delineator. A predetermined (and/or assignable) time interval could be selected (e.g., two seconds) after which an entered or selected word would be compared against the wordbase 340. Each time data manager 410 is cleared, it begins monitoring for another action word. The type of delineators used in a preferred embodiment of the present invention is user assignable.
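The rolling character store and its clearing on a delineator can be sketched as follows. The class name DataManager, the method name feed, and the particular set of delineator characters are assumptions; only keyboard characters are modeled, and time-based, mouse and biometric delineators are omitted.

```python
from collections import deque

# Assumed delineator set; the text states that delineators are user assignable.
DELINEATORS = {" ", "\t", "\n", ",", ".", ";", "!", "?"}


class DataManager:
    def __init__(self, capacity=200):
        # A deque with maxlen drops the oldest character once it is full,
        # which gives the rolling first-in, first-out behavior.
        self.buffer = deque(maxlen=capacity)

    def feed(self, char):
        """Store one character. On a delineator, return the completed
        utterance (or None if nothing was stored) and clear the store."""
        if char in DELINEATORS:
            utterance = "".join(self.buffer)
            self.buffer.clear()
            return utterance or None
        self.buffer.append(char)
        return None
```

Feeding the characters of "ddl next " one at a time yields the utterances "ddl" and "next", each returned when the following space is received.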
The data manager 410 also sends all characters and special keys (re-transmission of typed characters) from the user's data stream to the command interpreter 430. The command interpreter 430 passes each utterance to fetcher 420. Fetcher 420 is responsible for searching within wordbase 340 for action words. The wordbase 340 is searched after each delineator (e.g., space, tab, comma, other punctuation, etc.). Wordbase 340 is searched to determine whether the utterance is actionable. Paired with each action word in every item record of wordbase 340 is a service script, as described above.
When data is entered via a microphone, the voice signals are recognized by voice recognition software and the generated text is provided to the command interpreter 430 via AW services 460. The present invention further contemplates receiving the translated voice signals via other components and/or drivers. Otherwise, operation of the present invention is analogous to when data is entered via a keyboard or selected via a mouse.
In a preferred embodiment, the fetcher 420 uses the Jet Database Engine to look for utterances inside the wordbase 340. That is, fetcher 420 determines whether the utterance matches an item record within the wordbase 340. If the fetcher 420 finds a match, it sends the action (i.e. service script), type, comments, and informational fields associated with the action word to the command interpreter 430.
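The lookup performed by the fetcher 420 can be sketched with an in-memory SQLite database standing in for the Jet Database Engine named above. The table name, the column names, and the functions open_wordbase and fetch are assumptions of this sketch and are not taken from the wordbase columns of FIG. 21.

```python
import sqlite3


def open_wordbase(rows):
    """Build a small in-memory table of (action_word, script, type, comments)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (action_word TEXT, script TEXT, type TEXT, comments TEXT)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    return conn


def fetch(conn, utterance):
    """Return (script, type, comments) for a matching item record, or None.
    The comparison ignores case, as in the preferred embodiment."""
    return conn.execute(
        "SELECT script, type, comments FROM items "
        "WHERE action_word = ? COLLATE NOCASE LIMIT 1",
        (utterance,),
    ).fetchone()
```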
The command interpreter 430 executes service scripts associated with an action word. The command interpreter 430 sends all keyboard related actions (replacements, special keys, and the like) associated with fetched action words through the VID 110 to the applications 118. For example, when the action word entered by the user requires a substitution (e.g., "June" to "June"), the command interpreter 430 forwards the replacement text to the application program (e.g., word processor) via VID 110.
The data manager 410 can also activate an action box 470. The action box 470 is also referred to herein as a scratch pad (FIG. 30). The action box 470 notifies the data manager 410 when any of its options or related actions are executed. The action box is a dialog feature for general purposes, such as inputting text in response to a request from a service script. In its most common use, the scratch pad is a window that enables a user to enter action words when the user does not want to enter text into his foremost application. The display of a window in which text can be entered and selected is well known in the art.
The data manager 410 sends the typed characters (converted from ScanCodes to characters), feedback messages, control changes, and activity indicators to the monitoring bar 315. The monitoring bar 315 sends notifications of changes to control option settings the user issues to the data manager 410 via the monitoring bar's icons or pull down menu.
The data manager 410 notifies the command interpreter 430 when a delineator is detected. This indicates that the user has completed inputting a complete utterance that needs to be matched against the item records in the wordbase 340 to determine if it is an action word. The command interpreter 430 first compares each word with a list of integrated action words, which are stored locally within the command interpreter 430. Integrated action words are special action words for controlling various functions directly relating to the user's computer or the ActiveWords system, such as temporarily deactivating the present invention for the next word thus preventing the next word from being matched against the wordbase 340. This is referred to as putting the monitoring bar to sleep. The present invention also contemplates designating certain common spelling mistakes or proper nouns as integrated action words. For example "tHe" can be automatically replaced with "The" without having to access the wordbase 340. If the word is not an integrated action word, the command interpreter 430 sends the word to the fetcher 420 so it can check the wordbase 340 for a match with the action word.
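The order of checks described above, integrated action words first and the wordbase 340 second, can be sketched as follows. The class layout, the "SLEEP_NEXT" marker, and the trigger word "awskip" used in the usage lines are hypothetical; the wordbase lookup by the fetcher 420 is modeled as a plain dictionary.

```python
class CommandInterpreter:
    def __init__(self, integrated, wordbase):
        self.integrated = integrated   # small table held locally, checked first
        self.wordbase = wordbase       # stands in for the fetcher's lookup
        self.skip_next = False         # True while "asleep" for one word

    def handle(self, utterance):
        """Return the action for an utterance, or None if nothing applies."""
        if self.skip_next:
            self.skip_next = False     # this word is not matched at all
            return None
        if utterance in self.integrated:
            action = self.integrated[utterance]
            if action == "SLEEP_NEXT":
                self.skip_next = True
                return None
            return action              # handled without touching the wordbase
        return self.wordbase.get(utterance.lower())


ci = CommandInterpreter(
    integrated={"tHe": "The", "awskip": "SLEEP_NEXT"},
    wordbase={"ddl": "due diligence"},
)
```

In this sketch, entering the hypothetical word "awskip" causes the next utterance to pass through unmatched, after which matching resumes.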
If a match is detected by the fetcher 420, the command interpreter 430 notifies the data manager 410 of the type of service script (e.g., substitution, control, navigation, in-place transformation) associated with the action word. If the service script calls for a text substitution, the command interpreter 430 also sends the replacement text to the data manager 410 for further processing (e.g., to act on another action word embedded in the script).
Command interpreter 430 receives the service scripts that fetcher 420 locates within wordbase 340 and proceeds to interpret them. A service script is made up of a series of commands which can range from a simple word replacement to a call to an application program. Scripts also allow the present invention to use the functionality included in agents 370 or application programs 118. For example, a third-party PIM application can directly insert, using Microsoft OCX controls, an appointment into their database using their own insertion function by simply making a call from the service script. This is a powerful and simple way, via the ActiveWords system, for the user to leverage the capabilit |