System and methods for monitoring application server performance
6792460
Abstract
A monitoring system monitors the amount of time spent by specific application components, such as Java components, during execution of specific web site transactions. A probe that runs on an application server initially instruments these components (preferably at component load time) to add code for tracking execution start and stop times. When a monitored transaction is executed by the application server, the probe measures the execution times of the invoked components--preferably at the component method level. The resulting measurement data is reported to a reports server, and is used to provide transaction-specific breakdowns of the amount of time spent by each instrumented component, and optionally each instrumented method within such components. In one embodiment, the probe only monitors transactions initiated by agent-generated transaction request messages that are marked or "colored" for monitoring, and thus ignores transactions initiated by actual users.
Claims
What is claimed is:
Description
BACKGROUND OF THE INVENTION
TABLE 1
Name                     Description
Avg. Servlet Time        The amount of time that the transaction was processed by servlets
Avg. Session EJB Time    The amount of time that the transaction was processed by Session EJBs
Avg. Entity EJB Time     The amount of time that the transaction was processed by Entity EJBs
Avg. Database Time       The amount of time that passes from the moment the application server sends an SQL query to the database server until the database server returns a response to the application server
Avg. App Server Queue    The amount of time that passes from the moment the application server receives a transaction request until the request is allocated a thread
Avg. App Server Logic    The amount of time that passes from the moment the transaction request is allocated a thread until the request is handed off to a servlet
Using the "filters" button in FIG. 2, the user can also limit the report to data associated with a specific agent 110 or set of agents. As described in U.S. Pat. No. 6,449,739, the set of agents may be specified by the user by designating one or more agent attributes, such as agent location, organization, and/or ISP. For example, the user may select the location "New York" to restrict the display to performance data generated in response to transactions executed by agents 110 residing in New York. By viewing the report shown in FIG. 2, monitoring personnel may determine, for example, that the application server 100 spends more time executing session EJBs than the other types of monitored application components. To further analyze this possible performance issue, the user can select the "view method detail" link for "avg. session EJB time," and view a breakdown of the time spent by specific session EJB methods 124. An example method breakdown report is shown in FIG. 4. If necessary, the user can also update the configuration file 125 to designate specific session EJB methods 124 to be monitored. Ultimately, the user may use the application performance data revealed in this and the other reports to improve the design of the application 102. The example performance data shown in FIG. 2 reveals that average session EJB time increased from about 1/2 second to about 1.5 seconds between 4 pm and 5 pm during the selected time window. To assess the impact this increase had on end users, the user can view a report, such as the transaction breakdown report of FIG. 5, that reveals average transaction response times over the same time window. FIG. 3 illustrates an example "breakdown summary" report for a user-specified time period. This report is based on the same six performance metrics as the "breakdown over time" report, but displays the breakdown separately for each monitored transaction. 
The user can thus identify transaction-specific performance problems that reside within the application server 100. To generate the data for a particular transaction (such as "login1"), the reports server 120 queries the database 118 to retrieve the probe-generated measurement data for all instances or "runs" of this transaction over the specified time window, and then uses the retrieved data to calculate the averages of the six constituent time periods. Although the values shown in FIG. 3 are averages, the user may be given the option to drill down to specific instances of the subject transaction. As with the "breakdown over time" report, the user also can limit the breakdown summary report to data associated with a specific agent 110 or set of agents. FIG. 4 illustrates a component breakdown report that shows the average execution time of each of multiple servlet methods 124 over a user-specified time window. This report may be used to effectively drill down to the servlet method level to determine whether a servlet performance problem is being caused by a particular method or set of methods. Similar breakdown reports may be provided for specific components 104. As with the reports in FIGS. 2 and 3, the user may limit the display to performance data associated with a particular agent 110 or set of agents. FIG. 5 illustrates one example of how the application server monitoring reports may be accessed from, and integrated with, other types of reports provided by the reports server 120. The particular report shown in FIG. 5 is a transaction breakdown report that breaks down the total end-user transaction time for each of multiple transactions into the following six categories: DNS (Domain Name Server) resolution, connection, server time, server/network overlap, network time, and client time. A preferred method for generating such a breakdown is disclosed in U.S. patent application Ser. No. 10/038,098, referenced above.
In this example, a "view application server breakdown" link is displayed next to the graphs for the "login_user" and "stock_5day_chart" transactions, indicating that application server monitoring data exists in the database 118 for these two transactions. By selecting one of the "view application server breakdown" links, the user can effectively drill down to further analyze the server time data displayed in the transaction breakdown report. The user may wish to do this if the transaction breakdown report reveals that the average server time for a particular transaction is unusually long, or has increased unexpectedly. The ability to view the probe-generated application server measurements in the context of associated end user and server response times allows administrators to assess the impact specific application components are having on overall server and end user performance. For example, in addition to indicating the value of an entity EJB response time over a particular time period, a report (or set of integrated reports) may reveal that this entity EJB response time contributed to 70% of the overall response time for a specific end-user transaction, and that the recent degradation in response time for this transaction was caused by the entity EJB response time suddenly increasing from 40% to 70% of the overall response time. Where the monitored web site 112 includes multiple application servers 100, the reports server 120 may also provide reports and graphs of the type shown in FIGS. 2-5 separately for each application server 100 of the web site. For example, a "breakdown by server" report may be provided that breaks down the performance of a given component or method by application server. This may be useful, for example, for evaluating whether all of the application servers in a load-balanced environment are functioning properly.
Reports may also be provided that aggregate the data of all, or of a selected group, of the web site's application servers. The performance data generated by the probe 122 and the agents 110 may optionally be analyzed automatically by a root cause analysis (RCA) application of the type described in U.S. patent application Ser. No. 10/038,098, referenced above. As depicted in FIG. 1, the RCA application 140 may run on or in association with the reports server 120 to assist users in efficiently pinpointing root causes of performance problems. The RCA application 140 preferably identifies those application components 104 that are the likely cause of performance degradations by monitoring changes in the probe's execution time measurements over time. For example, the RCA application may detect that the average servlet time over a five minute time window greatly exceeds its historical norm, and based on this fact, notify a user that servlets are the likely cause of an end-user performance degradation that occurred over the same time period. The algorithms applied to the probe's measurements by the RCA application 140 are preferably substantially identical to those described in U.S. patent application Ser. No. 10/038,098.
III. Instrumentation of Code
In one embodiment, the task of monitoring the application components 104 and methods 124 is accomplished using a virtual machine configured to pass the invoked components (classes) to the probe 122 at load time for dynamic instrumentation. The virtual machine may, for example, be a Java™ virtual machine ("JVM"), and may be so configured using the JVMPI API (Java Virtual Machine Profiling Interface Application Program Interface) available from Sun Microsystems. Other APIs that may become available in the future may also be used to configure the virtual machine.
In addition, as described in section VI below, the JVM or other virtual machine may alternatively be modified by adding a patch to the virtual machine's class that is responsible for loading components for execution; with this approach, no special API is needed to configure the virtual machine. In one embodiment, this method is implemented using a utility program that adds a patch to the JVM's ClassLoader class on the application server 100. As described in section VI, two important benefits of this "patched JVM ClassLoader" approach are that (1) it is implemented purely in Java, and is therefore platform independent, and (2) the instrumentation method works regardless of whether the particular JVM installed on the application server 100 supports the Java Virtual Machine Profiling Interface. The probe 122 preferably instruments (adds hooks to) a monitored class by instrumenting some or all of the methods 124 within that class. As described below, a particular method is instrumented by adding a "start" call at the beginning of the method and an "end" call at the end of the method. These calls or "hooks" allow the probe to determine whether a particular invocation of an instrumented method corresponds to a transaction that is colored for monitoring, and if it is, to record the start and stop time of that method. The start and stop times of some or all of the methods invoked by this transaction can thus be recorded. These measurements can then be aggregated at the component level to determine the amount of time spent by each component. Thus, the data collected by the probe for a given transaction execution event may be used to (1) trace the execution of a transaction through all of the application components executed by the virtual machine as part of the transaction, and (2) measure the amount of time spent by each such application component (and specific methods thereof) processing the transaction.
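The effect of adding "start" and "end" hooks to a method can be sketched at the Java source level. This is an illustration only: the probe described here instruments classes at load time rather than editing source, and the names `ProbeHooks`, `LoginBean`, and `authenticate` are hypothetical, not taken from the patent. Placing the end hook in a `finally` block is a design choice of this sketch, made so that the stop time is recorded even if the method throws.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the probe's "start" and "end" hooks; it only logs the calls. */
final class ProbeHooks {
    static final List<String> events = new ArrayList<>();

    static void start(String methodId) { events.add("start:" + methodId); }

    static void end(String methodId) { events.add("end:" + methodId); }
}

/** A monitored class as it might look after instrumentation (source-level view). */
class LoginBean {
    boolean authenticate(String user, String password) {
        ProbeHooks.start("LoginBean.authenticate");   // hook added at method entry
        try {
            // Original method body, unchanged by instrumentation.
            return user != null && !user.isEmpty() && password != null;
        } finally {
            ProbeHooks.end("LoginBean.authenticate"); // hook added at method exit
        }
    }
}
```

Calling `new LoginBean().authenticate("alice", "secret")` leaves one start event and one end event in `ProbeHooks.events`, in that order.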
Although dynamic instrumentation is used in the preferred embodiment, the classes may alternatively be statically instrumented. FIG. 6 illustrates one implementation of the above-described dynamic instrumentation process. FIG. 6 also illustrates a set of components and data structures that may be used to (1) record execution start and stop times when instrumented methods are executed as part of monitored transactions, and (2) report these execution times, in raw and/or aggregated form, to an outside entity. It should be understood that the three processes illustrated in FIG. 6 (instrumentation, execution time monitoring, and reporting) typically occur at different times. Specifically, instrumentation occurs when a class is loaded into the Java or other virtual machine 600; monitoring of execution start and stop times occurs when the instrumented classes are invoked; and the reporting of collected data preferably occurs periodically. Further, although the probe 122 is depicted as being separate from the virtual machine 600 for purposes of illustration, the probe actually runs within the virtual machine in the preferred embodiment. As depicted by FIG. 6, instrumentation occurs as follows. The virtual machine 600 obtains a class source 602 from a storage device 604, such as a disk drive, at run time. An example of a class source is bytecode, a compiled format for Java™ programs. Prior to executing the class source 602, the virtual machine 600 passes the class source 602 to a "code instrumentation" component 610 of the probe 122. This component 610 preferably determines whether the class source 602 is to be instrumented for monitoring based on information contained in the configuration file 125. To instrument the class source, all of its methods are typically instrumented individually, so that each such method may be separately monitored.
In the preferred embodiment, however, a user can deselect one or more of these methods--such as those not believed to be the cause of performance problems--in which case only some of the methods of the class source may be instrumented/monitored. Once instrumented, a particular class typically remains persistent in memory until the application server is restarted; the instrumented class may therefore service many client requests without being loaded/instrumented again. In one embodiment, the configuration file 125 contains rules that are used by the probe 122 to dynamically determine, at load time, which classes (components) and methods should be instrumented for monitoring. The classes can be specified either directly, or by declaring that any class that inherits from a certain class or implements a certain interface should be hooked. Direct inheritance may be supported, as well as indirect inheritance of classes or interfaces, with any level of indirection. Methods to be monitored/hooked can be defined either explicitly or using wildcards. As mentioned above, a utility program and associated user interface 170 may optionally be provided to assist web site operators in creating and editing the configuration files 125 on their respective application servers 100. This utility program may, for example, display a listing of all Java components and methods (and/or types of components and methods) currently installed on the application server 100, together with respective check boxes for indicating which should be monitored. Another approach is for the configuration file 125 to specify heuristics for determining which components and/or methods should be monitored. The use of a configuration file 125 allows administrators to flexibly monitor only those components, and optionally methods, that are the most likely sources of performance problems. 
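The kind of rule such a configuration file might express (hook any class that inherits from a given class or implements a given interface, directly or indirectly, and hook methods named explicitly or by wildcard) can be sketched as follows. The class `HookRule` and its methods are hypothetical, and no format for the configuration file 125 is implied. For brevity the sketch tests inheritance with reflection on already-loaded classes; a load-time probe would more likely have to read superclass and interface names from the class source itself.

```java
import java.util.regex.Pattern;

/** Hypothetical rule: hook classes assignable to a base type, and methods matching a wildcard. */
final class HookRule {
    private final Class<?> baseType;      // class or interface named in the rule
    private final Pattern methodPattern;  // compiled from a wildcard such as "get*"

    HookRule(Class<?> baseType, String methodWildcard) {
        this.baseType = baseType;
        // Quote the literal parts of the wildcard and turn each '*' into ".*".
        String regex = Pattern.quote(methodWildcard).replace("*", "\\E.*\\Q");
        this.methodPattern = Pattern.compile(regex);
    }

    /** True for the base type itself and for direct or indirect subtypes and implementers. */
    boolean matchesClass(Class<?> candidate) {
        return baseType.isAssignableFrom(candidate);
    }

    /** True when the whole method name matches the wildcard; a name without '*' must match exactly. */
    boolean matchesMethod(String methodName) {
        return methodPattern.matcher(methodName).matches();
    }
}
```

For example, `new HookRule(java.util.Collection.class, "get*")` matches `java.util.ArrayList`, which implements `Collection` only indirectly through `List`, and matches a method named `getBalance` but not `setBalance`.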
For example, an administrator may wish to monitor all objects provided by a particular vendor, while refraining from monitoring those provided by a more reputable vendor. Although a configuration file 125 is used in the illustrated embodiment, the configuration information that specifies which components and methods are to be monitored may alternatively be stored in another type of repository, such as an executable file or a database. In addition, some or all of this configuration information could be passed to the probe 122 in HTTP requests from the agents 110. Further, the probe 122 could be designed to monitor all components. If, at load time, the probe 122 (code instrumentation component 610) determines that the class source 602 is to be monitored, the probe instruments the class source by adding calls to the probe's "start" and "end" methods 612, 614 within the class source 602. By default, these calls are added to all of the methods of the class source 602. As mentioned above, however, the configuration file may specify that certain methods are to be excluded--such as those explicitly deselected by the user via the interface of FIG. 1C. As illustrated in FIG. 6, the probe 122 then returns the instrumented class source 602' to the virtual machine 600 for execution. If the probe determines that the class should not be monitored, it simply returns the class source without modification. In the particular example shown in FIG. 6, the virtual machine has loaded two classes, CLASS_A and CLASS_B, and only CLASS_A has been instrumented. Although instrumentation of the class source 602 has advantages, it is not necessary. For example, in one embodiment, the class source 602 contains function calls to methods that are equivalent to the instrumenting methods. These methods are part of the class through inheritance, statically added to the class source 602, or through any other method suitable for adding functionality to a class. 
One skilled in the art will also realize that the embodiments disclosed herein may be practiced within any of a number of suitable environments, including environments that do not use a virtual machine.
IV. Monitoring of Instrumented Classes
The probe's logic for monitoring execution of instrumented classes resides within the "start" and "end" methods 612, 614 to which calls are added during instrumentation. Both of these methods may be implemented within servlet or JSP code executed by the virtual machine 600. As the virtual machine 600 executes an instrumented component's class source code, it also executes the start and end methods 612, 614 of the probe. All of the J2EE components executed by the virtual machine 600 as part of a single transaction/page request are ordinarily executed within a single thread, with the first invoked component usually being a servlet or a JSP. When the start method 612 is first called, it determines whether this thread belongs to a transaction to be monitored. As described above, this may be accomplished by determining whether the associated HTTP request includes a special tag or header inserted by the agent 110. Because the start method 612 is effectively part of the JSP or servlet being executed, it has access to this information. In implementations that support application server monitoring of real user transactions, the start method 612 may monitor the transaction if it corresponds to a particular JSP or servlet page, or based on some other attribute of the transaction/HTTP request. The operation of the "start" method 612 is depicted by FIG. 7A. The first time the start method 612 is called by a given thread, the start method determines whether the thread belongs to a transaction to be monitored (block 712), and terminates processing if it is not.
As described above, in one embodiment, the determination of whether the thread belongs to a monitored transaction involves determining whether the transaction is colored for monitoring. Because only agent-initiated transactions can ordinarily be colored in this embodiment, real user transactions are prevented from being monitored as the result of block 712. Agent-initiated transactions that are not colored are also excluded from monitoring. If the thread belongs to a monitored transaction, the start method 612 marks the thread as "inside transaction" in a global structure (not shown), and allocates a set of data structures to the thread (block 714). As illustrated in FIG. 6, these data structures include a vector 620 or other data structure for collecting the execution times of each method 124, and a stack 622 used to track the execution path and termination point of the thread. The start method 612 also records the execution start time in the vector 620, and places an identifier of the invoked method 124 on the stack 622 (block 716). Thereafter, each time the "start" method 612 is called, it determines whether the thread is inside a monitored transaction by looking at the global structure (block 710). If the thread is inside a monitored transaction, the start method 612 adds the start-time to the vector 620 of this thread, and places an identifier of the starting method 124 on the stack to note entry into the code of this component (block 716). FIG. 7B illustrates the operation of the "end" method 614. Each time the end method 614 is called, it initially checks the global structure to see if the thread has been marked as belonging to a monitored transaction (block 726), and skips over the remaining steps if it has not. If the thread is marked as belonging to a monitored transaction, the vector 620 is updated with the execution end time of the method 124 that just ended (block 728). 
The "end" method also pops the stack 622 (block 730), and then checks the top element of the stack to determine whether the now-ending method 124 had been called by another instrumented method 124 (block 732). If the identifier of another instrumented method 124 exists on the stack (indicating the existence of a nested call to a monitored method), processing is complete; otherwise, the vector 620 is updated to indicate that tracking of the thread is complete (block 738), since the method 124 that just ended is the first monitored method that was called as part of this thread. As described below, the measurements recorded within the vector 620 (including associated method and class identifiers) are preferably reported by the probe 122 asynchronously, rather than upon termination of the thread. At this point, monitoring of the transaction is not necessarily complete (unless the transaction is a real user transaction, in which case it is treated as complete), as the calling agent 110 can, in some embodiments, call other components 104 as part of the same transaction. For example, the agent 110 may, as part of the same transaction, request another servlet/JSP page. In this scenario, the above-described process is repeated to generate a new vector of measurements, which may later be associated or combined with the first vector of measurements by the calling agent 110 or another appropriate component. As depicted by block 740 in FIG. 7B, the method execution times recorded in the vector 620 may optionally be aggregated by the probe 122 upon completion of monitoring of the thread, or at the time of reporting, to calculate component execution times. This aggregation step may alternatively be performed in-whole or in-part outside the probe 122 and application server. 
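The "start" and "end" logic of FIGS. 7A and 7B can be sketched with per-thread state. This is a simplified illustration under stated assumptions, not the probe's actual code: the class `ProbeSketch` is hypothetical, a `ThreadLocal` stands in for the global structure, the coloring check is reduced to a boolean argument (the text derives it from the HTTP request), and the stack holds timing records rather than bare method identifiers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class ProbeSketch {
    /** One timing record: method identifier plus start and end times in nanoseconds. */
    static final class Timing {
        final String methodId;
        final long startNanos;
        long endNanos;

        Timing(String methodId, long startNanos) {
            this.methodId = methodId;
            this.startNanos = startNanos;
        }
    }

    /** Per-thread state, standing in for the vector 620 and the stack 622. */
    static final class ThreadState {
        final List<Timing> vector = new ArrayList<>();
        final Deque<Timing> stack = new ArrayDeque<>();
        boolean complete;
    }

    // Stand-in for the "global structure": a non-null value marks the thread as inside a transaction.
    private static final ThreadLocal<ThreadState> STATE = new ThreadLocal<>();

    /** The colored flag is only consulted on the first call made by a thread. */
    static void start(String methodId, boolean colored) {
        ThreadState s = STATE.get();
        if (s == null) {
            if (!colored) {
                return;                  // block 712: not a monitored transaction
            }
            s = new ThreadState();       // block 714: mark the thread and allocate structures
            STATE.set(s);
        }
        Timing t = new Timing(methodId, System.nanoTime()); // block 716: record start time
        s.vector.add(t);
        s.stack.push(t);
    }

    /** Returns the finished state when the outermost monitored method ends, otherwise null. */
    static ThreadState end() {
        ThreadState s = STATE.get();
        if (s == null) {
            return null;                 // block 726: thread is not being monitored
        }
        s.stack.peek().endNanos = System.nanoTime(); // block 728: record end time
        s.stack.pop();                   // block 730
        if (!s.stack.isEmpty()) {
            return null;                 // block 732: nested call, an outer method is still running
        }
        s.complete = true;               // block 738: tracking of this thread is complete
        STATE.remove();
        return s;
    }
}
```

In this sketch the completed state is handed back to the caller; the text instead leaves the measurements in place for asynchronous reporting.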
Regardless of where and when the method execution times are aggregated, the execution time for each component 104 is preferably calculated as the sum of the execution times of all of its instrumented methods 124 that were invoked by the transaction. Ultimately, the method and component execution times generated over a period of time (and over multiple instances of the particular transaction) may be averaged for purposes of reporting to the user. In addition, the average component execution times may be aggregated by component type to generate data indicative of the amount of time spent by each type of component (EJBs, servlets, etc.) on the particular transaction. Further, data collected by multiple probes 122 (each of which runs on a respective application server 100) may be appropriately aggregated to generate data reflective of how a group of application servers is performing as a whole. The probe 122 reports the captured measurement data asynchronously, preferably but not necessarily via the agent 110 that executed the transaction. The measurements may be reported by the probe 122 in any appropriate form, such as raw method start and stop times, total execution times generated from these start and stop times, and/or aggregated or average execution times for specific components or component types. In one embodiment, the data reported by the probe 122 is transmitted to the corresponding agent 110 as an XML (Extensible Markup Language) file or sequence. The reported measurements associated with a particular transaction are stored in association with that transaction, such that breakdowns can be generated separately for each monitored transaction. The task of reporting the measurement data may be handled by a separate reporting thread 630 (FIG. 6), which may be started when the virtual machine 600 is started. 
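The component-level aggregation described above, in which a component's execution time is the sum of the execution times of its instrumented methods, can be sketched as follows. The class `Aggregation` and the "Component.method" key convention are assumptions made for illustration, and each input value is assumed to be that method's own total execution time in seconds for one transaction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class Aggregation {
    /**
     * Sums method execution times per component.
     * Keys of the input map are assumed to have the form "Component.method".
     */
    static Map<String, Double> componentTimes(Map<String, Double> methodSeconds) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : methodSeconds.entrySet()) {
            String key = e.getKey();
            int dot = key.lastIndexOf('.');
            // Everything before the last dot is treated as the component name.
            String component = dot < 0 ? key : key.substring(0, dot);
            totals.merge(component, e.getValue(), Double::sum);
        }
        return totals;
    }
}
```

With inputs of 0.25 s for `AccountBean.getBalance`, 0.5 s for `AccountBean.debit`, and 0.125 s for `LoginServlet.doGet`, the result maps `AccountBean` to 0.75 s and `LoginServlet` to 0.125 s. The same grouping step could be repeated with a component-to-type mapping to obtain per-type totals such as servlet time or session EJB time.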
This thread 630 may report the collected data at periodic intervals, at the completion of transaction execution, in response to polling messages, or using any other appropriate method. As mentioned above, the measurements may be reported together with associated data (transaction IDs, agent IDs, etc.) extracted from the associated HTTP requests. In one embodiment, the probe 122 reports the execution time measurements at the method level, and these measurements are aggregated outside the probe (e.g., by the reports server 120) as needed to generate component execution times, average execution times, etc. Components are thus used to specify groups of methods to be instrumented and monitored by the probe 122, and also to aggregate method execution times for display. In many cases, a given component or method will start and stop multiple times during execution of the particular thread and transaction. In such cases, the execution times generated through the above process preferably reflect the total execution time of each such component or method. For example, if a session EJB initially executes for 0.25 seconds before calling an entity EJB, and then executes for another 0.35 seconds after completion of the entity EJB before termination of the thread, the execution time for the session EJB would be 0.6 seconds.
V. Tracking Transactions Across Process Boundaries
In some cases, a given J2EE transaction may cross a process boundary. To track such a transaction across the process boundary, an ID of the transaction may be integrated into the native protocol that is being used for inter-process communication. For example, to transfer the transaction ID from a servlet to an EJB that is being called on a remote process/machine, the transaction ID may be added as one of the low-level parameters passed between the two processes.
To accomplish this, the above-described instrumentation process may be appropriately supplemented to cause the monitored classes to pass the transaction ID. For example, for EJBs, the actual proxy/stub objects of the EJBs may be instrumented to add the additional information to the invocation.
VI. Code Instrumentation using Patched ClassLoader Class of JVM
As described in section III above, the Java Virtual Machine (JVM) 600 installed on a given application server 100 may be configured, using the Java Virtual Machine Profiling Interface (JVMPI) provided by Sun Microsystems, to cause the JVM 600 to pass classes to the probe 122 at load time. The probe 122 may then selectively and dynamically instrument those classes that are to be monitored. An alternative method that may be used involves adding a hook or "patch" to the JVM's ClassLoader class, so that the task of dynamically instrumenting those components that are to be monitored is performed by the patched ClassLoader class of the JVM 600. One benefit of this approach is that it is implemented purely in Java, and is thus platform independent. Another benefit is that it works regardless of whether the particular JVM installed on the application server 100 supports the Java Virtual Machine Profiling Interface. In one embodiment, this "patched ClassLoader" method is used as the default method for instrumenting each component, and the JVMPI method is used only if the patched ClassLoader method is unsuccessful. FIG. 8A illustrates how classes are instrumented once the patch has been added to the JVM ClassLoader class 800 of a JVM. The probe components used for recording and reporting execution times are omitted from this drawing, but may be the same as in FIG. 6. As illustrated, the instrumentation process is similar to the process depicted in FIG. 6, except that code instrumentation block 610 now receives the bytecodes of the classes being loaded before these classes are actually loaded.
This occurs as the result of the hook (patch) having been added to the ClassLoader class 800. The task of adding the patch may be performed off-line using a configuration tool that runs on the application server 100 in conjunction with, or as a part of, the probe 122. FIG. 8B illustrates the steps that may be performed by this configuration tool to install the patch. This process only needs to be performed once per JVM installation. As depicted by block 810 in FIG. 8B, the configuration tool initially prompts the user to specify the path to the JVM installation directory used by the particular application server 100. Once this path has been specified by the user, the configuration tool retrieves the ClassLoader class from the specified directory and adds the code instrumentation patch (block 820). The patched ClassLoader class is then stored in a separate directory (block 830), such as a designated subdirectory of the probe's installation directory. Finally, the command line used by the operating system to launch the JVM is modified to cause the JVM to first look for bootclasspath classes in this special directory (block 840), so that the patched ClassLoader class will be loaded in place of the original ClassLoader class provided with the JVM. For example, if the probe is installed on the application server under c:\mercprobe, which includes the subdirectory c:\mercprobe\classes\boot, the configuration tool may store the patched class at C:\mercprobe\classes\boot\java\lang\ClassLoader.class, and modify the command-line parameters for running the application server to include the following flag: "-Xbootclasspath/p:C:\mercprobe\classes\boot". The patched ClassLoader class may instrument J2EE components in the same manner as described above.
Specifically, when a J2EE class is loaded, the patched ClassLoader class may use a configuration file 125 (or configuration information stored in another repository) to determine whether some or all of the methods of that J2EE class are to be monitored, and to instrument those methods that are to be monitored by adding calls to the probe's start and end methods 612, 614.
VII. Monitoring of Additional Performance Parameters
In addition to monitoring colored transactions as set forth above, the probe 122 may be designed to monitor and report certain application server performance parameters without regard to how the monitored components are invoked (e.g., by colored versus uncolored transactions). For example, in one embodiment, the probe 122 also monitors and reports the number of times each component (JSP, Session EJB, Entity EJB, JDBC, JNDI, etc.) is invoked over a given time period, and the average response time of each such component, without regard to how these components are invoked. These non-transaction-specific performance metrics may be reported to the database 118 in substantially the same manner as described above, and may be incorporated into performance reports that provide additional information about how the application server 100 is performing. For instance, these additional performance measurements may be used to provide reports that display the average response time, average number of hits per second, and average load factor of each servlet, session bean, method of a selected object, and entity bean. As with the transaction breakdown data reported by the probe 122, some or all of these non-transaction-specific metrics may be displayed separately for each application server 100 within a given web site system 112, or may be aggregated across multiple application servers.
The load factor for each component or method is preferably calculated as a product of its average response time and its average hits per second values, and is a very useful measure of performance. These non-transaction-specific metrics may also be used as a basis for defining heuristics that specify which components and methods are to be instrumented for transaction-specific monitoring. For example, a heuristic may be defined specifying that all methods of the component having the longest average, non-transaction-specific response time over the last 24 hours are to be instrumented for transaction-specific monitoring. These non-transaction-specific response times may be measured by treating real user hits to specific URLs as implicit transactions. The transaction-specific performance data collected on colored, agent-based (synthetic) transactions may also be used to select implicit transactions (URLs) to monitor for purposes of monitoring real user activity. This may be accomplished by including logic within the probe--or another appropriate component--that identifies the currently worst performing transactions, and associates these with the URLs to which they correspond. Hits to these URLs may thereafter be treated as implicit transactions that are to be monitored, so that component breakdown data is collected by the probe both for agent-based and real user instances of the relevant transactions. Although this invention has been described in terms of certain preferred embodiments and applications, other embodiments and applications that are apparent to those of ordinary skill in the art, including embodiments which do not provide all of the features and advantages set forth herein, are also within the scope of this invention. Accordingly, the scope of the present invention is intended to be defined only by reference to the appended claims.
