System and method for dynamically controlling remote processes from a performance monitor
5432932
Abstract
Local and remote processes can be controlled from a data processing system performance monitor. Multiple processes can be controlled concurrently with a single action selected. Processes to be controlled can be ranked when presented to a user, to assist in determining problematic processes that need attention. Process data is captured dynamically at local and remote processes using a daemon to minimize system overhead in monitoring and controlling processes.
Claims
What is claimed is:
Description
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
TABLE 1
______________________________________
Console Information
______________________________________
unsigned short left;    /* geometry: left offset */
unsigned short top;     /* geometry: top offset */
unsigned short w;       /* geometry: window width */
unsigned short h;       /* geometry: window height */
unsigned short count;   /* count of instruments */
unsigned short major;   /* major protocol version */
unsigned short minor;   /* minor protocol version */
______________________________________
For each instrument within the console being recorded, the information of Table 2 is written to the recording file 100, so that the internal control block format stays consistent between the subsystems. This information is written at 184 (FIG. 9) when a new recording file is being created, as determined at 182. Referring to Table 2, the graph type indicates whether this is a monitor-type (i.e. fixed) or skeleton-type (created from a skeleton) instrument. The graph collection name is the name of the console to which this instrument belongs. The subgraph number indicates which positional instrument this is within the console. The offsets are the location of the instrument within the console, specified as a percentage of the height (for top and bottom offsets; top being 0%) and width (for left and right offsets; left being 0%). The number of pixels to shift specifies how many picture elements (pixels) to shift the time graphs between subsequent data recording observations. The space between bars parameter is the number of pixels to space between bar graph elements being displayed. The history parameter specifies the number of observations to save in the display buffer for an instrument. The display history buffer is a cache-like buffer which maintains recent data displayed on the display. The time interval parameter specifies the data recording sampling interval, in milliseconds. This time interval allows the granularity of samples to be varied in real time, and further allows differing instruments to record the same value at different granularities, or frequencies. The index into tile array is a number that identifies a tile "pattern" in an array of tile patterns (e.g. vertical stripes, horizontal stripes, diagonal stripes, checkerboard, cross-hatch, etc.). These patterns can be combined with the foreground and background colors of a statistic being displayed to help differentiate it from other statistics in an instrument.
The style parameter indicates the primary style of the instrument, such as a line graph, area graph, skyline graph, bar graph, state bar graph, state light graph, speedometer graph or pie chart. The stacked parameter specifies whether stacking is to be used for values that use the primary style.
TABLE 2
______________________________________
Instrument Information
______________________________________
char *typ;            /* graph type */
char *id;             /* graph collection name */
unsigned int seq;     /* subgraph number */
unsigned int x;       /* offset from left of Form */
unsigned int y;       /* offset from top of Form */
unsigned int x2;      /* offset from right of Form */
unsigned int y2;      /* offset from bottom of Form */
unsigned int br;      /* no of pixels to shift/obs. */
unsigned int sp;      /* space between bars */
unsigned int hist;    /* history, # of observations */
unsigned int t;       /* time interval, millisecs */
char foregr[64];      /* foreground color name */
char backgr[64];      /* background color name */
short tile_ix;        /* index into tile array */
graph_style style;    /* primary style of graph */
boolean stacked;      /* True if stacking active */
______________________________________
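The percentage offsets of Table 2 can be turned into a pixel rectangle once the console size is known. The following is a minimal sketch only: the PixelRect type and the instrument_rect function are hypothetical names, and it assumes that x and x2 give the positions of the left and right edges measured from the left side of the console, and y and y2 the positions of the top and bottom edges measured from the top, each as a percentage from 0 to 100.

```c
#include <assert.h>

/* Hypothetical pixel rectangle; not a structure from the recording file. */
typedef struct {
    unsigned left, top, width, height;
} PixelRect;

/* Convert edge positions given as percentages (0..100) of the console
 * size into pixels. Assumption: x/x2 are the left/right edges measured
 * from the console's left side, y/y2 the top/bottom edges measured from
 * its top. */
PixelRect instrument_rect(unsigned x, unsigned y, unsigned x2, unsigned y2,
                          unsigned console_w, unsigned console_h)
{
    PixelRect r;

    assert(x <= x2 && x2 <= 100);
    assert(y <= y2 && y2 <= 100);

    r.left   = console_w * x / 100;
    r.top    = console_h * y / 100;
    r.width  = console_w * x2 / 100 - r.left;
    r.height = console_h * y2 / 100 - r.top;
    return r;
}
```

Because the positions are relative, the same recorded values yield a correctly proportioned instrument whatever size the playback console is given.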
The basic description of each of the values of an instrument is stored in a record type shown in Table 3. The svp field identifies the value within the instrument and matches the following two record types (defined in Tables 4 and 5) with this one. The field is interpreted as two 16-bit values, where one identifies the instrument and the other gives the relative value number within the instrument. The same principle is used for the two record types described next. The r1 and r2 values allow the graphs to be scaled to match the data being recorded or displayed. The threshold value triggers an alarm action, as described below. The index into tile array allows differing tile patterns to be used for the graph fill. The graph style saves the style of graph to be displayed on a subsequent playback. Weighting averages the results of multiple samples taken over a period of time, rather than using a single sampled value, thus providing a way to stabilize widely varying data samples. The descending flag indicates that an alarm is to be triggered when the sampled value drops below (as opposed to rising above) the threshold value. The path, label, and color fields are self-explanatory.
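The interpretation of the svp field as two 16-bit values can be sketched as follows. This is an illustration under stated assumptions: the text does not say which half holds the instrument identifier, so placing it in the upper 16 bits is an assumption, as are the names SvpParts, svp_pack, and svp_unpack and the use of a 32-bit integer for the field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded form of the svp field. */
typedef struct {
    uint16_t instrument;   /* which instrument in the console */
    uint16_t value_index;  /* relative value number within the instrument */
} SvpParts;

/* Assumption: instrument identifier in the upper 16 bits. */
uint32_t svp_pack(uint16_t instrument, uint16_t value_index)
{
    return ((uint32_t)instrument << 16) | (uint32_t)value_index;
}

SvpParts svp_unpack(uint32_t svp)
{
    SvpParts p;

    p.instrument  = (uint16_t)(svp >> 16);
    p.value_index = (uint16_t)(svp & 0xFFFFu);
    /* Sanity check: unpacking must be the inverse of packing. */
    assert(svp_pack(p.instrument, p.value_index) == svp);
    return p;
}
```

Using the same key in all three record types lets a reader of the recording file join a data value to its display and detail records without storing path names in every data record.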
TABLE 3
______________________________________
Value Display Information
______________________________________
struct StatVals *svp;   /* Statistics value pointer */
unsigned long r1;       /* scale min value */
unsigned long r2;       /* scale max value */
unsigned long thresh;   /* threshold value for alarm */
short tile_ix;          /* index into tile array */
unsigned style;         /* graph style for this value */
unsigned weighting;     /* true if weighting/averaging */
unsigned descending;    /* true if threshold is descending */
char path[128];         /* path name of statistic */
char label[64];         /* any user-defined label */
char color[64];         /* name of color to plot value */
______________________________________
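The scale and threshold fields of Table 3 lend themselves to a short illustration. The sketch below is not taken from the performance tool; the function names are hypothetical, and clamping out-of-range values to the ends of the scale is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Map a value onto a graph `height` pixels tall using the r1 (min) and
 * r2 (max) scale values. Assumption: values outside the scale are
 * clamped to its ends. */
unsigned scale_to_pixels(unsigned long value, unsigned long r1,
                         unsigned long r2, unsigned height)
{
    assert(r2 > r1);
    if (value <= r1)
        return 0;
    if (value >= r2)
        return height;
    return (unsigned)((value - r1) * height / (r2 - r1));
}

/* An ascending alarm fires when the value rises above the threshold;
 * a descending alarm fires when it drops below the threshold. */
bool alarm_triggered(unsigned long value, unsigned long thresh,
                     bool descending)
{
    return descending ? (value < thresh) : (value > thresh);
}
```

For example, a free-memory statistic would typically use a descending threshold, while a CPU-utilization statistic would use an ascending one.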
The contents of the record type of Table 4 could equally have been included in the record type of Table 3. The current format is chosen because it matches the internal control block format of the performance tool (i.e. the same as blocks created from the configuration file by the configuration subsystem). There is one instance of this record type for each value defined in the console. The name and description of the value are self-explanatory. The value type field specifies the type of data that is recorded (e.g. counter data (counts per time interval) or quantity data (cumulative count)). The data format field (data_type in Table 4) specifies the internal format of the data (e.g. floating point, long word, short word, character, etc.).
TABLE 4
______________________________________
Value Detail Information
______________________________________
unsigned long svp;          /* statistics value pointer */
char name[SI_MAXNAME];      /* name of value */
char descr[SI_MAXLNAME];    /* description of value */
enum ValType value_type;    /* value type */
enum DataType data_type;    /* data format */
______________________________________
Whenever a set of data values is received for an instrument that is currently recording, a record as shown in Table 5 is written to the recording file 100. The svp pointer has previously been described. The actual data and delta values are self-explanatory. The instrument identifier field tells which instrument this array of recorded values belongs to. Count is the number of values contained in this record. The two time fields timestamp the values that were captured: one gives the time of the data reading and the other the elapsed time since the previous reading. The Instr_Def data structure defines an "array of data reads" that has "n" records (as specified by the "count" field), the records being of the format defined by the Per_Val_Rec structure shown in Table 5.
TABLE 5
______________________________________
Data Value Records
______________________________________
typedef struct {
    long svp;                            /* statistics value pointer */
    union Value val;                     /* actual data reading */
    union Value val_change;              /* delta value (value change) */
} Per_Val_Rec;
typedef struct {
    unsigned short ssp;                  /* instrument identifier */
    unsigned short count;                /* count of values in record */
    struct timeval time;                 /* time of data reading */
    struct timeval time_change;          /* elapsed time since previous reading */
    Per_Val_Rec r[MAX_PLOTS_PER_GRAPH];  /* array of data reads */
} Instr_Def;
______________________________________
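Because each record of Table 5 carries both the value change and the elapsed time since the previous reading, a rate can be derived for counter-type values at playback time. The sketch below is illustrative only: the layout of union Value is not given in the text, so the delta is passed as a plain long, and the function name rate_per_second is hypothetical.

```c
#include <assert.h>
#include <sys/time.h>

/* Convert a counter delta and the elapsed time between two readings
 * into a rate per second. Assumption: the delta is available as a
 * long; the real union Value may hold other formats. */
double rate_per_second(long val_change, struct timeval time_change)
{
    double secs = (double)time_change.tv_sec
                + (double)time_change.tv_usec / 1e6;

    assert(secs > 0.0);
    return (double)val_change / secs;
}
```

With the default interval of 2,500 milliseconds, a delta of 500 counts corresponds to a rate of 200 counts per second.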
The recording subsystem interfaces of FIG. 5 are further expanded in FIG. 10. The interface 140 between the GUI 80 and the recording subsystem 20 comprises messages from the GUI 80 to start/stop console recording and to start/stop instrument recording. The recording subsystem 20 can send a message to the GUI 80, to be presented to a user/operator of the performance tool, asking whether the user desires to append to or replace the recording file. The GUI 80 returns the user's yes/no response in a message to the recording subsystem 20. The interface 142 between the data value receiver subsystem 60 and the recording subsystem 20 comprises data blocks. The recording subsystem 20 does not have to track or maintain the origin of the data, as this is done by the data value receiver subsystem 60. The recording subsystem 20 also treats local and remote data statistics identically, which further minimizes overhead delays when recording data. This is because all data, whether local or remote, is treated the same by the data value receiver subsystem 60. Thus, based on this modular design, data can be quickly recorded in real time, as the overhead for receiving a packet of data and storing it in the recording file is minimal. Further, as the overhead is minimal, recording can occur concurrently with the display of data, as will be later described. The interface 141 between the recording subsystem 20 and the recording file 100 comprises the data to be recorded (described above), as well as the console, instrument, value display, and value detail information, thus maintaining the context of the stored recording data. This information (console, instrument, value display, and value detail) is obtained across interface 143 from the configuration subsystem 30, as initiated by a request for configuration information from the recording subsystem 20.
IMPLEMENTATION OF PLAYBACK SUBSYSTEM
Referring initially to FIG.
12a, playback 234 is initiated from the "File" 232 menu of the main window of the performance tool user interface 230. When the "playback" menu item is selected (as determined by the GUI), a list of files 240 available for playback is presented, as shown in FIG. 12b. The file list consists of all files in the "$HOME/XmRec" directory with a prefix of "R.". A user can use the filter selection or button 248 and the filter portion of the file selection box at 242 to look for files matching other masks in any directory. To select a file to replay, a user clicks on it and then on the "OK" button 244, or double-clicks on the file name. The selection box remains open after a user selects a file to replay, which allows the user to select more than one file. To dismiss the selection box, a user clicks on "Cancel" 246. When a user selects a valid playback file, the GUI instructs the playback subsystem to open the file, as detected at 252. The performance tool reads the console configuration from the recording file and creates the console at 254 of FIG. 15. The playback console is constructed from the console, instrument, value display, and value detail information records read in from the start of each recording (the information structures described in Tables 1-4). This data is used to construct the playback console in the same manner that the data display subsystem constructs a regular console from data it reads in from the console configuration file. The main difference is that the creation of the playback console does not allow the normal console command pulldown or popup menus, but instead creates a special set of buttons to control the playback functions (e.g. Eject, Erase, Rewind, Seek, Play, Stop, Slower, and Faster), as shown in FIG. 12c at 250. Playback does not start until a user clicks on the "Play" button.
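The default file list described above (files in the "$HOME/XmRec" directory whose names begin with "R.") amounts to a simple name filter. A minimal sketch follows; the function names are hypothetical, and the actual tool may build its list differently, for example through the file selection box's own filter mechanism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Build the default recording directory path from the user's home
 * directory. Returns 0 on success, -1 if the buffer is too small. */
int recording_dir(const char *home, char *buf, size_t n)
{
    int len;

    assert(home != NULL && buf != NULL);
    len = snprintf(buf, n, "%s/XmRec", home);
    return (len >= 0 && (size_t)len < n) ? 0 : -1;
}

/* A file is offered for playback by default when its name starts
 * with the prefix "R.". */
bool is_playback_candidate(const char *filename)
{
    assert(filename != NULL);
    return strncmp(filename, "R.", 2) == 0;
}
```

A caller would typically obtain the home directory from the HOME environment variable, list the directory, and keep only the names accepted by is_playback_candidate.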
The functions of the buttons as selected by a user, and the resulting operations, are as follows:
Eject
Immediately stops playback, closes the console, and closes the playback file. To restart playback, a user must reselect "playback" from the "File" menu of the main window and reselect the playback file 100. Internal to the performance tool, and in reference to FIG. 4, the GUI 80 component gets notified via user controls 130 that the "Eject" button was depressed and sends a message to the playback subsystem 50 to stop the playback, as detected at 256 (FIG. 15). The playback subsystem 50 then calls the data display subsystem 40 to remove the associated playback console and clean up at 258 of FIG. 15. Next, the playback subsystem 50 closes the associated recording file 100 and exits.
Erase
Allows a user to erase a playback file. When this button is selected, a dialog window pops up. The dialog window warns that the user has selected the erase function, and indicates the name of the file currently being played back. To erase the file and close the playback console, a user selects "OK". To avoid erasure of the file, a user selects "Cancel". Internal to the performance tool, the GUI 80 component gets notified via user controls 130 that the "Erase" button was depressed and sends a message to the playback subsystem 50. The playback subsystem 50 sends a message to the GUI 80 to display a dialog window to inform and solicit a response from the user. The user is prompted to confirm the erasure of the recording file 100 or cancel the request. If the user confirms the desire to erase the recording file (via user controls 130), the playback subsystem 50 deletes the recording file 100 at 282 (FIG. 15) and then calls the data display subsystem 40 to remove the associated playback console from the display and clean up.
Rewind
Resets the console by clearing all instruments and rewinds the recording file 100 to its start. Playback does not start until a user selects "Play".
The "Rewind" button is not active while playback is ongoing. Internal to the performance tool, the GUI component 80 gets notified via user controls 130 that the "Rewind" button was depressed. The GUI sends a message indicating this selection to the playback subsystem 50. The playback subsystem 50 detects this (268 of FIG. 15) and sends a message to the data display subsystem 40 to reset all the console instruments back to their initial state (270 of FIG. 15). The playback subsystem then resets a pointer to the beginning of the recording file 100. The playback of a recording does not start until a user selects the "Play" button.
Seek
Selecting "Seek" pops up a dialog box that allows a user to specify the time to seek to in the playback file 100. The time can be set by clicking on the "Hour" or "Minute" button. Each click advances the hour or minute by one. If a user holds the button down for more than one second, the hour or minute counter advances faster. Once the digital clock shows the desired time, a user clicks on the "Proceed" button. This causes all instruments in the console to be cleared and the playback file to be searched for the specified time. Internal to the performance tool, the GUI 80 component gets notified via user controls 130 that the "Seek" button was depressed and sends a message to the playback subsystem 50. The playback subsystem 50 sends a message to the GUI component 80 to display a dialog box to allow the user to specify a recording time to "seek to". Each data element in a recording has a timestamp that was affixed to the data value when the data was gathered by the data supplier. When the recording subsystem 20 records the data, it preserves the original timestamp. After the user selects the "seek to" time, the GUI 80 passes this parameter to the playback subsystem 50. The playback subsystem 50 detects this at 272 (FIG.
15) and then calls the data display subsystem 40 to reset the recording console instruments to their initial value. The playback subsystem then opens the recording file 100, reads in the graphical context of the recording, passes this data to the data display subsystem 40, and reads recording data from the recording file until it finds the specified "seek to" time. As the playback subsystem 50 reads the recorded data into memory, it checks the timestamp of each data entry and can thus seek to a particular point in time of a recording. The playback subsystem then sets the playback time pointer to this "seeked" data record at 274 (FIG. 15), and then waits for the user to select, via GUI 80, the "Play" button to start the playback from this recording time record. In situations where a playback file spans midnight, so that the same time stamp exists more than once in the playback file, the seek proceeds from the current position in the playback file and wraps to the beginning if the time is not found. Because multiple data records may exist for any hour/minute, a user should either use "Play" to advance to the next minute before doing additional seeks on the same time, or seek for a time one minute earlier than the current playback time. The "Seek" button is not active while playing back.
Play
Starts playing from the current position in the playback file. While playing, the button's text changes to "Stop" to indicate that playing can be stopped by clicking the button again. Immediately after opening the playback console, the current position is the beginning of the playback file. The same is true after a rewind. Internal to the performance tool, the GUI 80 component gets notified via user controls 130 that the "Play" button was depressed and sends a message to the playback subsystem 50.
The playback subsystem 50 detects this at 260 and tells the GUI subsystem 80 to change the "Play" button to a "Stop" button and then starts to feed the data display subsystem 40 with recording data from the current position of the recording file at 262 (FIG. 15). Initially, playing back is attempted at approximately the same speed at which the data was originally recorded. The speed can be changed by using the "Slower" and "Faster" buttons. While playing back, neither the "Rewind" nor the "Seek" button is active. The playback subsystem 50 continues to feed recording data to the data display subsystem 40 until it reaches the end of the recording file 100, or the user presses the "Stop" button via user controls 130. If the user presses the "Stop" button, the GUI 80 is notified and sends a message to the playback subsystem 50. If "Stop" is signalled, the playback subsystem 50 tells the GUI to change the "Stop" button to a "Play" button and then stops feeding the data display subsystem 40 with recording data. The playback subsystem 50 then waits for an indication that the user has selected another action.
Slower
A user clicks on this button to cut the playback speed to half of the current speed. The GUI 80 gets notified that the "Slower" button was depressed via user controls 130, and sends a message to the playback subsystem 50, where it is detected at 276 (FIG. 15). The playback subsystem 50 divides its playback rate parameter in half at 278 (FIG. 15), so that it now feeds the data to the data display subsystem 40 at half its present rate, thus providing a variable playback rate.
Faster
A user clicks on this button to double the playback speed. The GUI 80 gets notified that the "Faster" button was depressed via user controls 130, and sends a message to the playback subsystem 50, where it is detected at 276 (FIG. 15). The playback subsystem 50 doubles its playback rate parameter at 278 (FIG.
15), so that it now feeds the data to the data display subsystem 40 at double its present rate, thus providing a variable playback rate.
00:00:00
At the far right of a console is a digital clock. It shows the time corresponding to the current position in the playback file 100, or zeroes if at the beginning of the file 100. As playback proceeds, the clock is updated. This is done by reading the time stamp associated with the playback data being read from the recording/playback file.
Recordings from instruments contain control blocks describing the instrument and console from which the recording was done, as previously described. There are a few possible surprises that may occur when a user attempts to play back from a file that does not contain valid configuration and/or statistics data.
Playing from saved buffers
When the buffer of an instrument or console is saved, that buffer may not be full because the monitoring has not been going on for a long enough time. If such a recording is replayed, the playback will show values of zero up to the point where real data values are available.
Unsynchronized Instruments
Playback from recordings of multiple data supplier hosts in one console behaves just like the real console in that time stamps are applied to each instrument (where applicable) as they are read from the data. This reflects the differences in time-of-day as set on the data supplier hosts and thus should not be a surprise. However, these "time warps" do influence the "Seek" function and the current position clock.
Recordings from Instantiated Skeleton Consoles
Each time a skeleton console is instantiated, the actual choices made are very likely to vary.
Even when, say, the same three remote hosts are selected in two instantiations, the sequence in which the hosts appear in the instantiated console is very likely to be different, due to various response delays inherent in a multi-computer data processing system interconnected via a communication network. This is no problem as long as new recordings are created for each instantiated console. However, if a recording is appended to a previous one with the same name, problems arise. The reason is that a recording contains the definition of the console only once: at the beginning of the recording. During playback, when the position where a different instantiation was appended to a previous recording is reached, it is assumed that the relative positions of instruments and values are unchanged.
FIG. 13 further expands the interfaces of the playback system shown in FIG. 6. The interface 129 between the GUI 80 and the playback subsystem 50 comprises a message initiated by an operator to open/close a recording. The playback subsystem responds to the GUI with a list of recordings available, and the user's selection is returned from the GUI to the playback subsystem. Further messages from the GUI, as initiated by an operator, are to play/stop, rewind, seek, slower/faster and erase a recording file 100. The interface 126 between the recording file 100 and the playback subsystem 50 provides the actual data to be displayed on a computer console, such as that shown in FIG. 12c. Further information read from the recording file 100 includes console, instrument, value display and value detail information. Again, this information is used to preserve the display context of the data to be presented to a user. All information and data read at interface 126 is passed immediately to the data display subsystem 40 at interface 128.
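The seek message handled by the playback subsystem was described earlier as a search that proceeds from the current position in the playback file and wraps to the beginning if the time is not found. That behavior can be sketched over an in-memory array of record times. The RecTime type and the seek_record function are hypothetical, and a real implementation would read time stamps from the recording file 100 rather than from an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hour/minute of one data record's time stamp. */
typedef struct {
    int hour;
    int minute;
} RecTime;

/* Search for the first record matching hour:minute, starting at index
 * `start` and wrapping to the beginning of the array. Returns the index
 * of the match, or -1 if no record carries that time. */
long seek_record(const RecTime *recs, size_t n, size_t start,
                 int hour, int minute)
{
    size_t step;

    assert(start <= n);
    for (step = 0; step < n; step++) {
        size_t i = (start + step) % n;
        if (recs[i].hour == hour && recs[i].minute == minute)
            return (long)i;
    }
    return -1;
}
```

Starting at the current position is what lets a user reach the second occurrence of a time in a recording that spans midnight: seeking again from a later position finds the later record first.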
Minimal system overhead is required to read the data and display it, allowing other subsystem activities, such as recording the same or other performance data, to occur along with the actual displaying of data.
Concurrency of Playback with Recording
When recording to, and playing back from, a linear medium such as a magnetic tape, one can only have limited playback control while recording because the "read head" follows the "write head" while the tape mechanism is moving. This arrangement is quite inflexible and does not allow functions like rewind, search, faster, slower, or play/stop while recording. In the preferred embodiment disclosed herein, recording is done on a filesystem that allows concurrent reading from and writing to a common file. Therefore, the record and playback functions are more independent in their operations than those of a linear medium. The record function can continuously record prespecified data while a playback function can simultaneously read the data already recorded, up to the currently recorded data record, without disturbing the recording process. If a copy of the recording file is made to another file, then the playback can be done totally concurrently and independently of the original recorded file. Another technique that can be used is to copy the context of the data to be recorded and create two recording sessions for the same data. Then, a playback session can be invoked on one session while the other session continues recording data. As shown in FIG. 14, this technique can similarly be extended to multiple remote machines 218, such that any machine 201 is recording data while any other machine 203 is currently playing back the data from the same data source 210 on remote machine 218. This technique is feasible since a single data source 210 can feed multiple consumer applications concurrently.
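The first technique above, reading a common file up to the currently recorded data record while recording continues, can be illustrated with two independent streams on the same file. This is a sketch under assumptions, not the tool's implementation: the record is reduced to a single long, the function names are hypothetical, and the writer is assumed to flush after every complete record so that a reader never sees a partial one.

```c
#include <assert.h>
#include <stdio.h>

/* Recorder side: append one record and flush it, so that a reader
 * using a separate stream on the same file can see it. */
int append_record(FILE *w, long value)
{
    assert(w != NULL);
    if (fwrite(&value, sizeof value, 1, w) != 1)
        return -1;
    return fflush(w);
}

/* Playback side: read the records written so far, up to `max`.
 * A previous end-of-file condition is cleared first so that data
 * appended since the last call is picked up. Returns the number of
 * records read. */
size_t read_available(FILE *r, long *out, size_t max)
{
    assert(r != NULL && out != NULL);
    clearerr(r);
    return fread(out, sizeof *out, max, r);
}
```

The reader simply stops when it reaches the last flushed record and can resume later, which is what permits rewind, seek, and variable-speed playback without disturbing the recorder.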
Further, a data consumer and a data supplier can coexist on a single machine 219, and similarly supply other data consumers and data suppliers in the network. One example of a combined data consumer and data supplier will be described later in the discussion on filters and alarms.
DATA DISPLAY SUBSYSTEM IMPLEMENTATION
Instruments
An instrument occupies a rectangular area within the window that represents a console. Each instrument may plot up to 24 values simultaneously in the preferred embodiment, with the reading of all values taken at the same time. All values in that instrument must be supplied from the same remote host. This allows for live displays/recordings of statistics from remote hosts, as the processing overhead is minimized by maintaining this restriction. The ability to dynamically add, change, or delete multiple data statistics in a monitoring instrument is a very powerful usability aid in visualizing the correlation between nominally disjoint parameters. This combinatorial feature, coupled with the ability to display each parameter in a different color and presentation style (e.g., line, bar, area, skyline, etc.) in a live time graph, allows a very complex presentation of data that can still be comprehended with a minimum of explanation. In fact, a console of instruments can be constructed to show data from local and remote hosts, including statistics on individual processes. Additionally, custom data from applications that have registered with the Data Server daemon can be added to the viewing instruments that show normal system statistics (also from local or remote hosts). All of these operations can be done while the instruments are receiving data from the data supplier(s) and the display views are updated in real time (live). Data values of the same primary style can also be stacked and unstacked without disturbing the reception of data. A recording graph/instrument shows statistics for a system resource over a period of time, as shown at 249 in FIG.
12d. Recording graphs have a time scale with the current time to the right. The values plotted are moved to the left as new readings are received. A state graph/instrument 251 of FIG. 12e shows the latest statistic for a system resource, optionally as a weighted average. State graphs do not show the statistics over time, but collect this data in case it is desired to change the graph to a recording graph. Instruments can be configured through a menu-based interface. In addition to selecting the values to be monitored with the instrument, the following properties are established for the instrument as a whole:
Style
The primary style of the graph. If the graph is a recording graph, not all values plotted by the graph need to use this graph style. In the case of state graphs, all values will be forced to use the primary style of the instrument. Default=Area graph.
Foreground
The foreground color of the instrument. Most noticeably used to display time stamps and lines to define the graph limits. Default=White.
Background
The background color of the instrument. Default=Black.
Tile
A pattern (pixmap) used to "tile" the background of the instrument. Tiles are ignored for state light type instruments. When tiling is used, it is always done by mixing the foreground color and the background color of the instrument in one out of eleven available patterns.
Interval
The number of milliseconds between observations. Default=2,500 milliseconds.
History
The number of observations to be maintained by the instrument. For example, if the interval between observations is 2,500 milliseconds and the history is specified as 1,000 readings, then the time period covered by the graph will be 1,000 × 2.5 seconds = 2,500 seconds, or approximately 42 minutes. The history property has a meaning for recording graphs only. If the current size of the instrument is too small to show the entire time period defined by the history property, the graph can be scrolled to look at older values.
State graphs show only the latest reading, so the history property does not have a meaning for those. However, since the user can change the primary style of an instrument at any time, the actual readings of data values are still kept according to the history property. This means that data is not lost if the primary style is changed from a state graph to a recording graph. Since the graph image can be bigger than the viewing area (window), scrolling is accomplished by using a Motif scroll-bar widget to make the appropriate part visible. During scrolling, the data display subsystem continues to update the graph image with real time data. Therefore, data integrity is maintained during scrolling. This data is then presented to the user on the display using the GUI. The minimum number of observations is 50 and the maximum number is 5,000. Default=500 readings.
Stacking
The concept of stacking allows data values to be plotted "on top of" each other. Stacking works only for values that use the primary style. To illustrate, think of a bar graph where the kernel-cpu and user-cpu time are plotted as stacked. If at one point in time the kernel-cpu is 15% and the user-cpu is 40%, then the corresponding bar will go from 0-15% in the color of kernel-cpu, and from 16-55% in the color used to draw user-cpu. If it is desired to overlay this graph with the number of page-in requests, one could do so by letting this value use the skyline graph style, for example. It is important to know that values are plotted in the sequence in which they are defined. Thus, if a user wanted to switch the cpu measurements above, they would simply define user-cpu before defining kernel-cpu. Values that overlay graphs in a different style should always be defined last so as not to be obscured by the primary style graphs. Default=No stacking.
Shifting
This property is meaningful for recording graphs only. It determines the number of pixels the graph should move as each reading of values is received.
The size of this property has a dramatic influence on the amount of memory used to display the graph, since the size of the pixmap (image) of the graph is determined by the product: history × shifting × graph height. If the shifting is set to one pixel, a line graph looks the same as a skyline graph, and an area graph looks the same as a bar graph. Maximum shifting is 20 pixels, minimum is 1 pixel. Default=4 pixels.
Spacing
A property used for bar graphs. It defines the number of pixels separating the bar of one reading from the bar of the next. Note that the width of a bar is always (shifting - spacing) pixels. The property must always be from zero to one less than the number of pixels to shift. Default=2 pixels.
In addition to the above properties that can be modified through a menu interface, four properties determine the relative position of an instrument within a console. They describe, as a percentage of the console's width and height, where the top, bottom, left and right sides of the instrument are located. In this way, the size of an instrument is defined as a percentage of the size of the monitor window. The relative position of the instrument can be modified by moving and resizing it as is commonly done in a MOTIF-like user interface, and as described below.
For the state light graph type, foreground and background colors are used in a special way. To understand this, consider that state lights are shown as text labels "stuck" onto a background window area like paper notes attached to a bulletin board. The background window area is painted with the foreground color of the instrument rather than with the background color. The color of the background window area never changes. Each state light may be in one of two states: lighted (on) or dark (off). When the light is "off", the value is shown with the label background in the instrument's background color and the text in the instrument's foreground color.
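The sizing rules given for the history, shifting, and spacing properties reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the limits are those stated in the text, and the pixmap size is expressed in pixels because the number of bytes per pixel is not given.

```c
#include <assert.h>

/* Pixmap size in pixels: history x shifting x graph height.
 * Limits from the text: history 50..5,000, shifting 1..20 pixels. */
unsigned long pixmap_pixels(unsigned hist, unsigned shift, unsigned height)
{
    assert(hist >= 50 && hist <= 5000);
    assert(shift >= 1 && shift <= 20);
    return (unsigned long)hist * shift * height;
}

/* Width of one bar: (shifting - spacing) pixels. Spacing must lie
 * between zero and one less than the shifting value. */
unsigned bar_width(unsigned shift, unsigned spacing)
{
    assert(spacing < shift);
    return shift - spacing;
}

/* Time period covered by a recording graph, in seconds. */
double history_seconds(unsigned hist, unsigned interval_ms)
{
    return hist * (interval_ms / 1000.0);
}
```

With the defaults (500 readings, 4 pixels of shifting, 2 pixels of spacing), a graph 100 pixels high needs a pixmap of 200,000 pixels and draws bars 2 pixels wide.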
Notice that if the instrument's foreground and background colors are the same, one would see only an instrument painted with this color--no text or label outline is visible. If the two instrument colors are different, the labels will be seen against the instrument background and label texts are visible. When the light is on, the instrument's background color is used to paint the text while the value color is used to paint the label background. Thus, the special use of colors for state lights allows for the definition of alarms that are invisible when not triggered--or alarms that are always visible. The colors chosen depend on the selections made during setup.
Skeleton Instruments
Some computer system objects change over time. One prominent example of these changes is the set of processes running on a system. Because process numbers are assigned by the operating system as new processes are started, it is not known in advance what process number an execution of a program will be assigned. Clearly, this makes it difficult to predefine consoles and instruments for processes in the configuration file. To help cope with this situation, a special form of console can be used to define skeleton instruments. Skeleton instruments are defined in the configuration file as having a "wildcard" in place of one of the hierarchical levels in the path that defines a value. For example, a user could specify a skeleton instrument for processes that has the following two values defined:
Proc/*/kern
Proc/*/user
The wildcard is represented by the asterisk. In the above example, it appears in the place where a fully qualified path name would have a process ID. Whenever users try to start a console with such a wildcard, they are presented with a list of processes. From this list, the user can select one or more processes. Each process selected is used to generate a fully qualified path name.
Each path name is then used to define either a value to be plotted or define a new instrument in the console. Skeleton instruments are also useful for handling the problem of varying resource configurations across different systems or over time. A skeleton instrument could be defined in which the disk name was replaced by a wildcard, to permit monitoring of any disk configuration on any system. The type of skeleton defined determines which one is selected. There are two types of skeletons, as described in the following sections. The skeleton type named "All" is so called because an instrument of this type will include all instances of the wildcard which are selected into the instrument. In the case of processes, this would include all selected processes. A skeleton instrument creates one instance of an instrument and this instrument contains values for all selected processes. Consoles may be defined with both skeleton instrument types but any non-skeleton instrument in the same console will be ignored. The relative placement of the defined instruments is kept unchanged. This may result in very crowded instruments when many processes are selected, but it is easy to resize the console. When only the "All" type skeleton instruments are defined, the performance tool will not resize the console. The type of instrument best suited for "All" type skeleton instruments is the state bar, but other graph types may be useful if the user chooses to allow colors to be assigned to the values automatically. To do the latter, the color is specified as "default" by the user when the skeleton instrument is defined. The "Each" skeleton type is so named because each instance of the wildcard object which is selected will create one instance of the instrument. In the case of processes, when five processes are selected by a user, each of the type "Each" skeletons will generate five instruments, one for each process. 
Again, one console may define more than one skeleton instrument and consoles can be defined with both skeleton instrument types, while any non-skeleton instruments in the same console are ignored. The relative placement of the defined instrument is kept unchanged. This may result in very small instruments when many processes are selected, but it's easy to resize the console. If the generated instruments would otherwise become too small, the performance tool will attempt to resize the entire console. The types of instruments best suited for the "Each" type skeleton instruments are the recording instruments (as exemplified in FIG. 12C). This is further emphasized by the way instruments are created from the skeleton: 1. The relative horizontal placement is never changed. 2. The relative vertical position defined by the skeleton is not changed, but the skeleton instrument is subdivided into the number of instruments to be created. 3. Each created instrument will have the full width of the skeleton instrument. 4. Each created instrument will have a height, which is the total height of the skeleton divided by the number of objects (e.g. processes) selected, as shown at 251 of FIG. 12e. Wildcards must represent a section of a value path name which is not the end point of the path. It could represent any other part of the path, but it only makes sense if that part may vary from time to time or between systems. With the standard statistics, the following wildcards are used:
______________________________________
PagSp/*/ . . . Page Spaces
Disk/*/ . . . Physical disks
NetIF/*/ . . . Network (LAN) interfaces
Proc/*/ . . . Processes
hosts/*/ . . . Remote hosts
______________________________________
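The four placement rules given above for "Each" type skeletons can be illustrated with a minimal sketch. The `Rect` type and the `subdivide` function are hypothetical names introduced only for illustration; offsets are percentages of the console, as described for instrument positions, and the sketch makes no claim about how the performance tool itself stores geometry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Instrument position as percentages of the console (0 = top/left)."""
    left: float
    right: float
    top: float
    bottom: float


def subdivide(skeleton: Rect, count: int) -> list:
    """Split a skeleton instrument into `count` instruments stacked vertically.

    Horizontal placement and full width are kept (rules 1 and 3); the
    skeleton's vertical extent is divided evenly (rules 2 and 4).
    """
    if count < 1:
        raise ValueError("at least one object must be selected")
    height = (skeleton.bottom - skeleton.top) / count
    return [
        Rect(skeleton.left, skeleton.right,
             skeleton.top + i * height, skeleton.top + (i + 1) * height)
        for i in range(count)
    ]


# A skeleton occupying 20%-80% of the console height, three processes selected.
instruments = subdivide(Rect(0, 100, 20, 80), 3)
```

Each generated instrument here spans 20% of the console height, which shows why many selected processes lead to very small instruments.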
When a console contains skeleton instruments, all such instruments should use the same wildcard. Mixing wildcards would complicate the selection process beyond the reasonable and the resulting graphical display would be incomprehensible. An extension to the concept of single wildcard notation is to use multiple wildcards to specify all statistics for a class of system objects. This facility permits users to define generic skeleton consoles for monitoring classes of system objects (e.g. disks, processes, paging spaces, network hosts, etc.), without requiring users to identify specific instances of the class. It allows multiple levels of specification of classes of system objects. These skeleton consoles can then be instantiated at run-time to monitor whatever system objects exist on a particular machine or set of network machines at a particular time (e.g., hdisk0, X process,/dev/hd6, abc.aus.ibm.com, etc., for all subnet nodes). These skeleton consoles are defined in a text configuration file by specifying the following information: 1. All display parameters (e.g. colors, locations, sizes, graph styles, etc.) 2. The system object classes (e.g. disks or processes). 3. The particular statistics to be displayed on each graph. For example, a configuration file line that could define a monitor consisting of a skeleton instrument for monitoring memory usage by individual processes on multiple network nodes would be: monitor.Local Processes.1.each.1: */Proc/*/workmem The above line has the information for the performance tool to monitor the working memory use of specific processes on multiple host machines. The host name and process ID are replaced with an asterisk (e.g., */Proc/*/workmem) to indicate to the performance tool that the particular hosts and processes are determined when the monitor is opened. 
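As an illustration of how a path with several wildcards might be resolved when the monitor is opened, the following sketch substitutes each asterisk, left to right, with a name chosen by the operator. The function name `expand_wildcards`, the process IDs, and the representation of a selection as a (host, process ID) tuple are assumptions made for this example, not details taken from the performance tool.

```python
def expand_wildcards(skeleton_path, selection):
    """Replace each '*' path level, left to right, with the next selected name."""
    parts = skeleton_path.split("/")
    if parts.count("*") != len(selection):
        raise ValueError("selection must supply one name per wildcard")
    names = iter(selection)
    return "/".join(next(names) if part == "*" else part for part in parts)


# Hypothetical operator choices: (host name, process ID) pairs.
selected = [("abc.aus.ibm.com", "1234"), ("abc.aus.ibm.com", "5678")]
paths = [expand_wildcards("*/Proc/*/workmem", choice) for choice in selected]
```

Every selected pair yields one fully qualified path, which can then define a value or a new instrument as described for single-wildcard skeletons.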
When all values in an instrument have all or part of the value path name in common, the performance tool determines the common part of the name from the value names displayed in the instrument and displays the common part in a suitable place. In determining how to do this, the performance tool examines the names of all values in the containing console. To illustrate, assume a single instrument is in a console, and that this instrument contains the values: PagSp/paging00/% free PagSp/hd6/% free Names are checked as follows: 1. It is first checked whether all values in a console have any of the beginning of the path name in common. In this case, all values in the console have the part PagSp/ in common. Since this string is common for all instruments in the console, it can conveniently be moved to the title bar of the window containing the console. It is displayed after the name of the console and enclosed in angle brackets like this: <PagSp/> The remainder of the value names left to be displayed in the instrument thus are: paging00/% free hd6/% free 2. Next, each instrument in the console is checked to see if all the value names of the instrument have a common ending. In the example, this is the case, since both values display % free. Consequently, the part of the value names to be displayed in the color of the values is reduced to: paging00 hd6 The common part of the value name (without the separating slash) is displayed within the instrument in reverse video, using the background and foreground colors of the instrument. The actual place used to display the common part depends on the graph type of the instrument. 3. The last type of checking for common parts of the value names is only carried out if the end of the names do not have a common part. Using the example, no such checking would be done. 
When checking is done, it goes like this: If the beginning of the value names (after having been truncated using the checking described in numbered point one above) have a common part, this part is removed from the value path names and displayed in reverse video within the instrument. To illustrate, assume a console with two instruments. The first instrument has the values: Mem/Virt/pagein Mem/Virt/pageout while the second instrument has: Mem/Real/% Work Mem/Real/% free The result of applying the three rules to detect common parts of the value names would cause the title bar of the console window to display <Mem/>. The first instrument would then have the text "Virt" displayed in reverse video and the value names reduced to: pagein pageout The second instrument would display "Real" in reverse video and use the value names: % work % free Consoles Consoles, like instruments, are rectangular areas on a graphical display. They are created in top-level windows of the OSF/Motif ApplicationShell class, which means that each console will have full OSF/Motif window manager decorations. These window decorations allow you to use the mwm window manager functions to resize, move, iconify, and maximize the console. The window manager Close function invokes the Exit xmperf function also available from the File menu. Consoles are useful "containers" for instruments. A user can: 1. Move collections of instruments around in consoles, using the console as a convenient basket. 2. Resize a console and still retain the relative size and position of the instruments it contains. 3. Iconify a group of instruments so that historic data is collected and recording of incoming data continues even when the console is not visible. This also helps to minimize the load on the system. 4. Close a console and free all memory structures allocated to the console, including the historic data. Closed consoles use no system resources other than memory to hold the definition of the console. 
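Returning to the three rules for detecting common parts of value names, the following is a minimal sketch of one way they could be applied. The function names are hypothetical, and the choice to always leave at least one path level in each displayed value name is an assumption, since the text does not say how fully identical names are handled.

```python
def common_prefix(paths):
    """Leading path components shared by every path (paths are lists)."""
    shared = []
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        shared.append(level[0])
    return shared


def split_names(console):
    """console: list of instruments, each a list of value path names.

    Returns (title_part, [(common_part, short_names), ...]).
    """
    split = [[path.split("/") for path in inst] for inst in console]
    all_paths = [p for inst in split for p in inst]
    keep = min(len(p) for p in all_paths) - 1  # assumption: keep one level
    # Rule 1: beginning common to every value in the console (title bar).
    title = common_prefix(all_paths)[:keep]
    result = []
    for inst in split:
        rest = [p[len(title):] for p in inst]
        keep = min(len(p) for p in rest) - 1
        # Rule 2: ending common to all values of this instrument.
        tail = common_prefix([p[::-1] for p in rest])[:keep][::-1]
        if tail:
            common = tail
            rest = [p[:len(p) - len(tail)] for p in rest]
        else:
            # Rule 3: applied only when there is no common ending.
            common = common_prefix(rest)[:keep]
            rest = [p[len(common):] for p in rest]
        result.append(("/".join(common), ["/".join(p) for p in rest]))
    return "/".join(title), result


paging = split_names([["PagSp/paging00/% free", "PagSp/hd6/% free"]])
memory = split_names([["Mem/Virt/pagein", "Mem/Virt/pageout"],
                      ["Mem/Real/% Work", "Mem/Real/% free"]])
```

The two calls reproduce the paging space and memory examples: the first element of each result is the part moved to the title bar, and each instrument entry pairs the reverse-video text with the shortened value names.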
Consoles may contain non-skeleton instruments or skeleton instruments but not both. Consequently, it makes sense to classify consoles as either non-skeleton or skeleton consoles. The two work a little differently, as will now be described. Non-skeleton consoles may be in one of two states: open or closed. A console is opened by a user selecting it from the Monitor Menu. Once the console has been opened, it may be iconified, moved, maximized, and resized through mwm. None of these actions change the status of the console. It may not be visible on the display, but it is still considered open and, if recording has been started, it continues. After one or more non-skeleton consoles have been opened, the name of each open console on the Monitor Menu is preceded by an asterisk. This indicates that the console is open. If a user selects one of the names preceded by an asterisk, the corresponding console is closed. Skeleton consoles themselves can never be opened. When a user selects one from the Monitor Menu, it is not opened, but rather causes the display of a list of names matching the wildcard in the value names for the instruments in the skeleton console. If a user selects one or more from this list, a new non-skeleton console is created and added to the Monitor Menu. This new non-skeleton console is automatically opened, and is given a name constructed from the skeleton console name suffixed with a sequence number. Skeleton consoles are defined like any other console. Neither the keywords defining the console nor those defining the instruments are different. The only difference is in one keyword used to define the values in the instruments of the console. The keyword that is different is the "input" keyword, which must be changed to one of "all" or "each".
The other thing that is different is that the path name of the value must contain one--and only one--wildcard, and that the path of all the "all" and "each" keywords in one console must be the same up to, and including the wildcard. Whether to use one or the other of the keywords depends on what type of skeleton you want. The following are two examples of skeleton definitions:
______________________________________
monitor.Single-host Monitor.3.each.1: hosts/*/CPU/kern
monitor.Single-host Monitor.3.each.2: hosts/*/Syscall/total
monitor.Remote Mini Monitor.1.each.4: NetIf/*/ipacket
monitor.Remote Mini Monitor.1.each.5: NetIf/*/opacket
monitor.Disk Monitor.1.all.1: Disk/*/busy
______________________________________
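The rule that all "all" and "each" paths in one console must agree up to and including the wildcard can be checked mechanically. The sketch below assumes that each definition has the form `monitor.<console>.<instrument>.<keyword>.<value>: <path>` and that console names contain no periods; the regular expression and the function names are illustrative only and are not part of xmperf.

```python
import re
from collections import defaultdict

# Assumed line layout: monitor.<console>.<instrument>.<keyword>.<value>: <path>
LINE = re.compile(
    r"^monitor\.(?P<console>[^.]+)\.(?P<instrument>\d+)\."
    r"(?P<keyword>\w+)\.(?P<value>\d+):\s*(?P<path>\S.*)$"
)


def wildcard_prefix(path):
    """Return the path up to and including its single wildcard level."""
    parts = path.split("/")
    if parts.count("*") != 1:
        raise ValueError(f"{path!r}: expected exactly one wildcard level")
    return "/".join(parts[: parts.index("*") + 1])


def skeleton_prefixes(lines):
    """Map each console name to the distinct wildcard prefixes it uses."""
    prefixes = defaultdict(set)
    for line in lines:
        match = LINE.match(line.strip())
        if match and match["keyword"] in ("all", "each"):
            prefixes[match["console"]].add(wildcard_prefix(match["path"]))
    return {console: sorted(found) for console, found in prefixes.items()}


config = [
    "monitor.Single-host Monitor.3.each.1: hosts/*/CPU/kern",
    "monitor.Single-host Monitor.3.each.2: hosts/*/Syscall/total",
    "monitor.Remote Mini Monitor.1.each.4: NetIf/*/ipacket",
    "monitor.Remote Mini Monitor.1.each.5: NetIf/*/opacket",
    "monitor.Disk Monitor.1.all.1: Disk/*/busy",
]
result = skeleton_prefixes(config)
```

A console satisfies the rule when its list holds exactly one prefix; a console mixing, say, `Disk/*` and `Proc/*` would show two.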
Note that skeleton types within a console can be mixed and that all paths up to the wildcard must be the same, not only in an instrument but for all instruments in a console. Skeleton instruments of type "all" can, as has already been pointed out, only have one value defined. It follows that all values in the instantiated instrument will have the same color, namely as defined for the value in the skeleton instrument. This is rather dull. Worse though, is that it effectively restricts the "all" type skeletons to use the state bar graph type since otherwise you wouldn't be able to tell one value from another. To cope with this, one can define the color for a value in a skeleton instrument of type "all" as "default". This will cause xmperf to allocate colors to the values dynamically as values are inserted during instantiation of the skeleton. Below is an example of a full value definition using this feature:
______________________________________
monitor.Processes.1.all.1: hosts/myhost/Proc/*/kerncpu
monitor.Processes.1.color.1: default
monitor.Processes.1.range.1: 0-100
monitor.Processes.1.label.1: cmd
______________________________________
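The effect of the "default" color keyword during instantiation of an "all" type skeleton can be sketched as follows. The palette contents, the cycling order, and the function name are assumptions chosen for illustration; the text does not state which colors xmperf allocates or in what order.

```python
from itertools import cycle

# Hypothetical palette; the colors actually allocated are not specified.
PALETTE = ["red", "green", "blue", "yellow", "cyan", "magenta"]


def instantiate_all(skeleton_path, color, instances, palette=PALETTE):
    """Build (path, color) pairs for one instrument of an "all" skeleton.

    With color == "default" each value receives the next palette color;
    otherwise every value inherits the single color of the skeleton value.
    """
    colors = cycle(palette)
    values = []
    for name in instances:
        path = skeleton_path.replace("*", name, 1)
        values.append((path, next(colors) if color == "default" else color))
    return values


# Hypothetical process IDs selected by the user.
dynamic = instantiate_all("hosts/myhost/Proc/*/kerncpu", "default",
                          ["101", "202", "303"])
fixed = instantiate_all("hosts/myhost/Proc/*/kerncpu", "white",
                        ["101", "202"])
```

With a fixed color all values look alike, which is the limitation the "default" keyword is meant to remove.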
The non-skeleton console created from the skeleton is said to be an "instance" of the skeleton console; a non-skeleton console has been instantiated from the skeleton. The instantiated non-skeleton console works exactly as any other non-skeleton console, except that changes a user may make to it will never affect the configuration file. A user can close the new console and reopen it as often as desired, as well as move, iconify, maximize, and resize it. Each time a skeleton console is selected from the Monitor Menu, a new instantiation is created, each one with a unique name. For each instantiation, the user is prompted to select values for the wildcard, so each instantiation can be different from all others. To change an instantiated skeleton console into a non-skeleton console and save it in the configuration file, the easiest way is to use the "Copy Console" function from the console menu. This will prompt a user for a name of the new console, and the copy will be a non-skeleton console looking exactly like the instantiated skeleton console it was copied from. Once the console has been copied, a user can delete the instantiated skeleton console and save the changes in the configuration file. All consoles are defined as OSF/Motif widgets of the XmForm class and the placement of instruments within this container widget is done as relative positioning. To add an instrument to a console, a user can choose between adding a new instrument or copying one that is already in the console. If "Add Instrument" is chosen, the following happens: 1. It is checked if there is enough space to create an instrument with a height of 24% of the console. The space must be available in the entire width of the console. If this is the case, a new instrument is created in the space available. 2. If enough space is not available, the existing instruments in the console are resized to provide space for the new instrument.
Then the new instrument is created at the bottom of the console. 3. If the new instrument has a height less than 100 pixels, the console is resized to allow the new instrument to be 100 pixels high. If "Copy Instrument" is chosen, the following happens: 1. It is checked if there is enough space to create an instrument of the same size as the existing one. If this is the case, a new instrument is created in the space available. Unlike what happens when adding a new instrument, copying will use space that is just wide enough to contain the new instrument, as there is no need to have space available in the full console width. 2. If enough space is not available, the existing instruments in the console are resized to provide space for the new instrument. Then the new instrument is created. New space is always created at the bottom of the console, and always in the full width of the console window. 3. If the new instrument has a height less than 100 pixels, the console is resized to allow the new instrument to be 100 pixels high. Once an instrument has been selected and chosen to be resized, the instrument goes away and is replaced by a rubber-band outline of the instrument. A user resizes the instrument by holding mouse button 1 down and moving the mouse. When the user presses the button, the pointer is moved to the lower right corner of the outline and resizing is always done by moving this corner while the upper left corner of the outline stays put. When a user releases the mouse button, the instrument is redrawn in its new size. Note that it is normally a good idea to move the instrument within the console so that the upper left corner is at the desired position before resizing. The position of the resized instrument must be rounded so that it can be expressed in percentage of the console size. This may cause the instrument to change size slightly from what the rubber-band outline showed. Instruments cannot be resized so that they overlap other instruments. 
If this is attempted, the size is reduced so as to eliminate the overlap. When an instrument is selected to be moved, the instrument disappears and is replaced by a rubber-band outline of the instrument. To begin moving the instrument, a user places the mouse cursor within the outline and presses the left mouse button. The button is held down while moving the mouse until the outline is at the desired location. Then, the button is released to redraw the instrument. Instruments can be moved over other instruments, but are never allowed to overlap them when the mouse button is released. If an overlap would occur, the instrument is truncated to eliminate the overlap. Referring initially to FIG. 4, the data display subsystem 40 takes inputs from the GUI subsystem 80, data value receiver subsystem 60, and the playback subsystem 50 and creates the displays needed to show the performance data in the format described by the configuration subsystem 30. It calls the configuration subsystem 30 to get the display format information from the configuration files and also to send requests for system configuration information to Data Supplier daemons 210. Referring now to FIG. 16, the data display subsystem checks if data was received from the data value receiver or playback subsystems at 300. The input data format from each of these subsystems is identical, so no special code is necessary to distinguish where the data originated. It picks up the pointer to the display data and the corresponding console from the input parameters and updates the data in the display console instrument at 302, and then exits. For data received at 302, the data display subsystem was either invoked from the playback subsystem at 262 of FIG. 15, where an operator has requested to "Play" the data, or data is sent from the data value receiver subsystem at 408 of FIG. 18. In this state (302 of FIG.
16), the data display subsystem has minimal overhead to display statistical data, and the data display subsystem is able to present data from either a local or remote host with minimal impact on the system. If the operator had opened a "console monitor" via a graphic button selection as determined at 304, the GUI subsystem would capture that input and pass it to the Display subsystem. If the selection was for a fixed console as determined at 306, then the configuration subsystem would be called at 308 to get the console configuration data to create the display console. The negotiation for statistics with data suppliers at 310 is initiated from the data display subsystem, but uses the configuration subsystem to obtain the data through the network send/receive interface. The Display Subsystem then would create and open the fixed display console at 312. It would then call the Network Send/Receive Interface to start the data feed from the Data Supplier Daemon at 314, and exit. If the operator had opened a "Skeleton Console", as determined at 306, then the Data Display Subsystem would call the Configuration subsystem to get the console configuration data at 316, and then call the Network Send/Receive Interface to get the current skeleton parameters at 318, as specified by the skeleton template in the configuration file. Then it calls the GUI subsystem with the skeleton parameters to allow the operator to select which skeleton consoles they wish to view at 320. After receiving the operator choices from the GUI subsystem, it calls the Configuration subsystem to send a request to instantiate the skeleton console parameters to the Data Supplier Daemon(s) via the Network Send/Receive Interface. After receiving the data from the Data Suppliers, it creates and opens the skeleton display console at 322. Finally it sends a "start data feed" request to the Data Supplier daemon(s) via the Configuration and Network Send/Receive Interface subsystems at 314, and exits. 
If the operator had selected the "close console" option via a graphic button selection as determined at 324, the GUI subsystem would pass that input to the Data Display subsystem. The Data Display subsystem would then send a "stop data feed" request to the Data Supplier Daemon(s) via the Configuration and Network Send/Receive interface subsystems at 326. Finally, it would close the Display Console at 328 and exit. If the operator had selected one of the "change graph style" options as determined at 340, the GUI subsystem would capture that input and pass it to the Data Display subsystem together with the new graph style options selected. The Data Display subsystem would then change the display modes dynamically so that the current console, instrument, and value attributes would be updated and displayed with the new values at 342. The configuration file would not be updated until the operator explicitly requested the configuration file to be saved. If the operator had chosen to open a "Tabulation window" (a numeric display of the graphic data in tabular format) at 344, the GUI subsystem would pass that input to the Data Display subsystem to open a tabulation window for the selected instrument at 346, and set a flag for the data values to be displayed in this window concurrently with the graphic data in the corresponding instrument at 348. If the operator had chosen a button that should execute a command string that is defined in the configuration file as determined at 350, the GUI subsystem calls the Data Display subsystem which then calls the configuration subsystem to get the command string from the configuration file at 352. It also gets user supplied parameters at 354 and then it passes the command string to the host operating system to execute the command string at 356. 
If the operator had chosen a button that should open a playback console as determined at 350, the playback file is opened and the console configuration data that had been previously saved is read at 358. Next, a playback console is opened on the display at 360, using this console configuration data. This action is initiated by the playback subsystem at 254 of FIG. 15, and provides the ability to automatically present the operator with the previously recorded data in the same context in which it was originally recorded, without requiring extensive operator interactions for setup of the context. Finally, data recording feeds are initiated at 362, where a request is sent to data suppliers to start data feeding. Data feeds received as a result of this request are processed at 302.
IMPLEMENTATION OF CONFIGURATION SUBSYSTEM
Referring initially to FIG. 3, the main functions of the Configuration Subsystem 30 are to take requests for data from the configuration file 110 and return requests or data to the caller. It also is the main interface to the Network Send/Receive Interface 70 to route data requests to the proper Data Suppliers. When the xmperf performance tool is initially started, the configuration subsystem parses the configuration file and builds the initial configuration definition control blocks that determine how all the monitors and menus will look when created. Referring now to FIG. 17, if the data display subsystem calls for the configuration data that defines how a console is to look (size, shape, instruments, colors, values, etc.) or what information a skeleton console needs from the operator at 370, the Configuration subsystem retrieves that data from the configuration definition control block and returns that data to the caller at 372. If the operator selects the "save configuration file" option, the GUI subsystem will pass this request to the Configuration subsystem at 360 (FIG.
17), which will then rename the current configuration file with a time-based name and then write the current configuration control block data to a new file that will be the active configuration file at 362, and then exit at 390. If the GUI needs to present the operator with a list of network nodes, it calls the configuration subsystem at 364 to send this request to the Data Supplier daemons via the Network Send/Receive interface at 366. As the Data Suppliers respond to the request, a list of responding nodes is created and returned to the GUI subsystem at 368. If a caller routine requests that a "start", "stop", or "change data feed rate" be sent to the data supplier(s) as determined at 374 and 380, the configuration subsystem sends this request to the Data Supplier(s) via the Network Send/Receive Interface at 386, and then exits at 390. If a caller routine wants to traverse the data hierarchy for data values as determined at 380, the configuration subsystem sends this request to the Data Supplier(s) via the network send/receive interface and returns the data received to the caller at 384. An example of a data context hierarchy is: ##STR1## To traverse the data context hierarchy, a program would call RSiInstantiate to create (instantiate) all subcontexts of a context object. Next, it would call RSiPathGetCx to search the context hierarchy for a context that matches a path name. Then it would call RSiFirstCx to return the first subcontext of a context. RSiNextCx is called to get the next subcontext of a context. Statistics are at the leaf nodes of the context hierarchy. The statistics can be retrieved by calling RSiFirstStat to get the first statistic of a context and RSiNextStat to get the next statistic of a context.
IMPLEMENTATION OF THE DATA VALUE RECEIVER SUBSYSTEM
The data value receiver subsystem 60 of FIG. 7 receives all data feeds at 150 from the Network Send/Receive Interface 70.
This data includes the StatSetID so that the incoming data can be matched to a specific instrument in an active display console. Referring now to FIG. 18, upon receipt of a data feed packet 400, the data value receiver gets the StatSetID and searches the list of active display consoles looking for a matching StatSetID at 402. If the data value receiver does not find a match, it discards the data at 406. If it finds a matching console, as determined at 404, it passes the data to the data display subsystem with a pointer to the console control block that it found at 408. If recording is active for the console or instrument, as determined at 410, then the data is also passed to the recording subsystem with a pointer to the console control block for the data to be saved in the recording file at 412. Because of this single unidirectional flow of statistic data from the network send/receive interface, which contains both local and remote statistics, and because of the minimal amount of processing required by the data value receiver subsystem, real-time performance/statistical data can be sent to both the display and recording subsystems concurrently.
IMPLEMENTATION OF THE NETWORK SEND/RECEIVE INTERFACE
The main functions of the Network Send/Receive Interface 70 of FIG. 8 are to send network requests to Data Suppliers 210, receive their responses, and to pass data feed information to the Data Value Receiver Subsystem 60. Referring now to FIG. 19, if the network send/receive interface receives a call to identify all the data supplier daemons as determined at 420, the network send/receive interface broadcasts an "are_you_there" message to all the hosts in the local subarea network and any hosts specified in the hosts file at 422. The network send/receive interface waits for all the responses and then returns a list of all the responding hosts to the calling routine at 424.
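The data value receiver flow of FIG. 18 described above can be summarized in a short sketch. The dictionary-based packet and console records, the field names, and the callback parameters are all hypothetical stand-ins for the control blocks and subsystem calls; only the sequence of decisions (402 through 412) follows the text.

```python
def receive_feed(packet, consoles, display, record):
    """Route one data feed packet; return True if a console matched.

    packet   -- hypothetical dict with "stat_set_id" and "values"
    consoles -- dict of active console control blocks keyed by StatSetID
    display  -- callable standing in for the data display subsystem
    record   -- callable standing in for the recording subsystem
    """
    console = consoles.get(packet["stat_set_id"])   # 402/404: look for a match
    if console is None:
        return False                                # 406: discard the data
    display(console, packet["values"])              # 408: update the display
    if console["recording"]:                        # 410: recording active?
        record(console, packet["values"])           # 412: save to the file
    return True


shown, saved = [], []
active = {7: {"name": "Disk Monitor", "recording": True},
          9: {"name": "Processes", "recording": False}}

def show(console, values):
    shown.append((console["name"], values))

def save(console, values):
    saved.append((console["name"], values))

matched = [receive_feed({"stat_set_id": sid, "values": [1.0]},
                        active, show, save) for sid in (7, 9, 42)]
```

Because the receiver only looks up an identifier and forwards a reference, the same packet can reach the display and the recording file without being copied or reinterpreted.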
If the Network Send/Receive interface receives a call to traverse a data hierarchy 426, negotiate an instrument data value 428, or start, stop or change the frequency of a data feed 430, it sends a request packet to the data supplier daemon(s) at 432, 434, or 436, respectively, waits for the response(s), and passes the response back to the configuration subsystem at 438 and 440. The data supplier daemon 210 can be at either a local or remote node. The underlying TCP/IP socket communication protocol masks the daemon location from the data requestor. The data requestor only specifies a node name and the communication protocol determines whether it is local or remote. If the network send/receive interface receives a data feed from a data supplier daemon at 442, it passes the data to the Data Value Receiver subsystem at 444.
IMPLEMENTATION OF THE GRAPHICAL USER INTERFACE
As depicted in FIG. 20, the graphical user interface 80 is simply an interface between a user and the performance tool, receiving user input and passing the information on to the appropriate performance tool subsystem. The interface waits to receive user input at 421. A check is then made if the user desires to exit the performance tool at 423. If so, the tool is terminated at 419. Otherwise, processing continues at 425, 429, 433 and 437, where checks are made to determine if the input is destined for the configuration, data display, recording, or playback subsystems, respectively. The appropriate subsystem is called at 427, 431, 435 or 439, based upon the destination of the user input. The particular interfaces between the GUI and the other subsystems are further shown in FIG. 21. The GUI interface to the recording system consists of the following. A user initiates a request to start/stop a console or instrument recording, which the GUI detects at 433 (FIG. 20) and sends to the recording subsystem.
The recording subsystem sends a message to the GUI to ask the user whether to append to or replace an existing recording file, and the user's response to this inquiry is returned to the recording subsystem.
The GUI interface to the configuration subsystem comprises the following messages. First, a user initiated message to create/erase/copy a console can be sent to the configuration subsystem. A request to initiate a console can also be sent, with a response from the configuration subsystem being a message listing possible console instantiations. The user would then select from the list using the GUI, resulting in a message being sent from the GUI to the configuration subsystem indicating the selected instantiation(s). The user can also initiate a message to be sent to add an instrument, or to add/change a value, both of which result in a list of possible values being presented to the user using the GUI. The user selects a value to be sent to the configuration subsystem. Finally, a user can initiate a request to save a configuration file.
The GUI interface to the data display subsystem comprises the following messages. A user can initiate a request to open/close a console. A user can initiate a message to change the instrument style or properties, or to change the value properties. The data display subsystem sends the GUI a message containing a list of possible choices to present to the user, whereupon the user makes a selection to be returned to the data display subsystem.
Finally, the GUI interface to the playback subsystem comprises the following messages. A user can initiate a message to instruct the playback subsystem to open/close a recording, and the playback subsystem responds with a list of recordings to be presented to the user for selection. The selection is returned to the playback subsystem.
The user can also invoke actions which cause the GUI to send the playback subsystem various messages to play/stop, rewind, seek, play slower/faster, and erase a recording/playback file.
MONITORING REMOTE SYSTEMS
Referring to FIG. 22, the concept of separating the data collecting executable 210 from the data display executable 90 led to the concept of using a separate data supplier 220 capable of supplying statistics to data consumers 90 on a local 208 or on remote hosts 218. The performance tool 90 provides true remote monitoring by reducing the executable program on the system to be monitored remotely to a subset of the full performance tool. The subset, called xmservd 210, consists of a data retrieval part 207 and a network interface 205. It is implemented as a daemon that can be started manually, started from one of the system start-up files, or left to be started by the inetd super-daemon when requests from data consumers are received. The obvious advantage of this approach is that it minimizes the impact of the monitoring software on the system to be monitored. Because one host can monitor many remote hosts, larger installations may want to use dedicated hosts to monitor many or all other hosts in a network. Since the xmservd daemon can be ported to multiple (and differing) platforms, provisions are made to allow flexible adaptation to characteristics of each host where the daemon runs. This has several implications. First, the data supplier daemon 210 does not have any system dependent statistics embedded in itself. Second, the system dependent statistics and functions to extract such statistics are provided in external executables 220. A protocol and method of cross-accessing statistics between these external executables and xmservd is defined. Third, an application programming interface is used to generalize the protocol and access mechanism.
Thus, a customized tool similar to the performance tool described herein could be developed and interfaced to the existing xmservd daemon.
The following explains in more detail how monitoring of remote systems takes place. For this discussion, the term data supplier host 218 describes a host that supplies statistics to another host, while a host receiving, processing, and displaying the statistics is called a data consumer 208. The initiative to start remote monitoring always lies with the data consumer program 90. The performance tool will attempt to contact potential suppliers of remote statistics in three situations, namely:
1. When the tool starts executing, it always attempts to identify potential data supplier hosts.
2. When five minutes have passed since the last attempt to contact potential data supplier hosts and the user creates an instrument referencing a data supplier host.
3. When five minutes have passed since the last attempt to contact potential data supplier hosts and the user activates a console containing a remote instrument.
The five-minute limit is implemented to make sure that the data consumer host 208 has an updated list of potential data supplier hosts 218. This is not an unconditional broadcast every five minutes. Rather, the attempt to identify data supplier hosts is restricted to times where a user wants to initiate remote monitoring and more than five minutes have elapsed since this was last done. The five-minute limit not only gets information about potential data supplier hosts that have recently started; it also removes from the list of data suppliers any hosts that are no longer available.
Once the performance tool is aware of the need to identify potential data supplier hosts, it uses one or more of the following methods to obtain the network address(es) where an invitational are_you_there message can be sent. The last two methods depend on the presence of the file /usr/lpp/xmservd/hosts.
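Before turning to those methods, the five-minute rule described above can be sketched in C. This is only an illustration under stated assumptions: the function name should_probe_suppliers, its parameters, and the constant PROBE_INTERVAL_SECONDS are hypothetical and do not come from the performance tool itself. The sketch follows the wording "more than five minutes have elapsed", so an interval of exactly five minutes does not yet trigger a new attempt.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical constant: the five-minute limit expressed in seconds. */
#define PROBE_INTERVAL_SECONDS (5 * 60)

/*
 * Decide whether to look for data supplier hosts again.
 *   at_startup       - true when the tool has just started executing
 *   remote_requested - true when the user creates an instrument that
 *                      references a data supplier host, or activates a
 *                      console containing a remote instrument
 */
bool should_probe_suppliers(time_t now, time_t last_probe,
                            bool at_startup, bool remote_requested)
{
    if (at_startup)
        return true;        /* situation 1: always probe at start-up */

    if (!remote_requested)
        return false;       /* never an unconditional periodic broadcast */

    /* situations 2 and 3: only if the last attempt is old enough */
    return difftime(now, last_probe) > PROBE_INTERVAL_SECONDS;
}
```

Tying the check to a user action rather than to a timer keeps network traffic down while still refreshing the supplier list whenever it is actually needed.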
The three ways to invite data supplier hosts are:
1. Unless prohibited by the user, the performance tool finds the broadcast address corresponding to each of the network interfaces of the host, as described below. The invitational message is sent on each network interface using the corresponding broadcast address. Broadcasts are not attempted on the Localhost (loop-back) interface 202 or on point-to-point interfaces such as X.25 or SLIP (Serial Line Interface Protocol) connections.
2. If a list of Internet broadcast addresses is supplied in the file /usr/lpp/xmservd/hosts, an invitational message is sent on each such broadcast address. Every Internet address given in the file is assumed to be a broadcast address if its last component is the number 255.
3. If a list of hostnames or non-broadcast Internet addresses is supplied in the file /usr/lpp/xmservd/hosts, the host Internet address for each host in the list is looked-up and a message is sent to each host. The look-up is done through a gethostbyname() call, so that whichever name service is active for the host where the performance tool runs is used to find the host address.
The file /usr/lpp/xmservd/hosts has a very simple layout. Only one keyword is
