Method and system of an integrated simulation tool using business patterns and scripts
6768968
Abstract
A method, system and article of manufacture for estimating the performance of a computer system are provided. Initially, a business pattern representative of the expected usage of the computer system is identified. Then, for each parameter associated with each predefined script, which corresponds to the identified business pattern, a value is established. The computer system hardware characteristics and performance objectives are identified next. The performance estimate is then calculated utilizing the established parameter values, identified hardware characteristics and performance objectives. To calculate the performance estimate, the script measurements data is read from a table of previously measured values, and a weighted average number of page visits per user, a weighted average visit rate and a weighted average service time for each target device in the computer system are calculated. A total response time and a system throughput are calculated by varying each target device queue length and user arrival rate until the performance objectives are reached.
Claims
What is claimed is:
Description
FIELD OF THE INVENTION
User-to-Online-Buying Business Pattern
Script Name  Page Visits per Script                      Frequency of Script
Browse       Visit home page                             85%
             Category display (random department)
             Category display (random category)
             Category display (random subcategory)
             Product display (random product)
             Leave web site
Search       Visit home page                             10%
             Select "Search" to go to Search panel
             Enter keywords and press Find
             Select New Search
             Enter keywords and press Find
             Leave web site
Buy          Visit home page                             5%
             Select Sign In from home page
             Select Sign On from menu
             Enter user ID and select Sign-On
             Go to Specialty Shop
             Choose Shop (category display)
             Product display
             Add to shopping bag
             Select Confirm on shipping info pop-up
             Select Checkout
             Select Continue Checkout
             Enter credit card info and select Buy Now
             Leave web site
Part of the definition of a business pattern is specifying the relative proportion or "frequency" of scripts in the customer mix. A new user may be equally likely to execute any script, or some very common scripts may be executed with a high frequency. Changing the relative frequency of scripts within the mix also allows different browse/buy ratios to be specified. Given a set of predefined scripts, measurements of their path lengths, disk I/O operations, and network transfers can be made on an existing system. The actual content of each script, as well as the measured parameters, may gradually change over time as usage patterns and software characteristics change. Often the exact content of the scripts and their measured parameters will be considered a trade secret.
SUMMARY OF THE INVENTION
The present invention discloses a method, system and article of manufacture for estimating the performance of a computer system. Initially, a business pattern representative of the expected usage of the computer system is identified. Then, for each parameter associated with each predefined script, which corresponds to the identified business pattern, a value is established. The computer system hardware characteristics and performance objectives are identified next. The performance estimate is then calculated utilizing the established parameter values, identified hardware characteristics and performance objectives. To calculate the performance estimate, the script measurements data is read from a table of previously measured values, and a weighted average number of page visits per user, a weighted average visit rate and a weighted average service time for each target device in the computer system are calculated. A total response time and a system throughput are calculated by varying each target device queue length and user arrival rate until the performance objectives are reached.
The present invention provides an integrated simulation tool (i.e. a modeling tool) for projecting system performance without detailed knowledge of the workload characteristics being required. The present invention uses "business patterns" and "scripts" for typical computer installations to define the relevant workload characteristics. The "business patterns" describe the type of work that a computer installation will be used for (e.g. on-line shopping, on-line trading, etc.). The "scripts" describe typical operations within a business pattern (e.g. browse a catalog, buy an item, get a stock quote, etc.). Both the business patterns and the scripts are defined based on detailed studies of actual customer operations.
The user of the modeling tool in accordance with the present invention can define a workload by specifying a business pattern and the relative frequencies of scripts within that pattern that best match the workload on some current or future computer system. The modeling tool will then construct the needed description of a composite workload for the performance estimates based on a weighted average of previous data collected from actual measurements for various scripts on various hardware or software combinations. Abstracted data from previous measurements are kept in database tables within the integrated modeling tool. This information is then used in an integrated analytic simulation model that employs variations of Mean Value Analysis techniques to produce performance estimates for a computer system.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will become more apparent to those of ordinary skill in the art after considering the preferred embodiments described herein with reference to the attached drawings in which like reference numbers represent corresponding elements throughout:
FIG. 1 illustrates an exemplary computer hardware environment that may be used in accordance with the present invention.
FIG. 2 illustrates a flow diagram of the steps performed by a modeling tool in accordance with the present invention to estimate system performance using business patterns.
FIG. 3 illustrates a flow diagram of the steps performed to derive an average composite workload from the scripts in the selected business pattern.
FIG. 4 illustrates a flow diagram of the steps performed to estimate system performance using a standard Mean Value Analysis methodology.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
In the following description of the preferred embodiment, reference is made to the accompanying drawings which form a part thereof, and in which is shown by way of illustration a specific embodiment in which the present invention may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention.
Hardware Environment
FIG. 1 illustrates an exemplary computer hardware environment that may be used in accordance with the present invention. In the exemplary environment, the infrastructure 100 supporting most high volume web sites typically has multiple components which include clients 101, the network 103, a special purpose server called edge server 105, and one or more computer systems with multiple layers of server machines 109, 111 and 113 within the web server 107. These multiple server machine layers are frequently called tiers, with each tier handling a particular set of functions such as serving content (i.e. web presentation servers 109), providing integration business logic (i.e. web application servers 111), or processing database transactions (i.e. database servers 113).
The clients 101 are devices that serve as the interface to the user. For example, the clients may comprise a personal computer running a web browser, or a wireless device for mobile users.
The type of client determines the delay associated with the client software operations for sending and receiving requests to the web server 107. The network 103, for example the Internet, is modeled as a generic delay associated with transferring data between the web server 107 and the client 101. Specific queuing delays along the network 103, and the effects of caching and content serving within the network 103 are not modeled. The edge server 105 is typically a special purpose server acting as an interface between the network 103 and the rest of the web server 107. It can be implemented as a single server or multiple servers acting in parallel. The edge server 105 may implement any or all of the following functions: Firewall--which implements security features, Network Dispatcher--which routes incoming requests to multiple server nodes in the next tier, and Content Caching--which holds cached copies of common content files (e.g. html, jpeg, etc.) and supplies them directly to clients 101 without interacting with the rest of the web server 107. The web presentation servers (i.e. HTTP servers) 109 respond to http requests from clients 101 and either supply static content if available, or pass the request on to the next tier. The presentation servers 109 are typically (but not necessarily) implemented as a number of small servers operating in parallel. The web application servers 111 provide integration business logic needed to execute the actual web application. The web application servers 111 are typically (but not necessarily) implemented as a number of small to medium servers operating in parallel. The database servers 113 are used to process database transactions requiring a high level of reliability, such as financial transactions. The database servers 113 are typically (but not necessarily) implemented as a single large SMP (Symmetric Multi Processor) server. A second SMP server is often configured as a standby backup server. 
Those of ordinary skill in the art will recognize that the present invention is not limited to the web server configuration described above. For example, the three-tier web server of the exemplary environment may be combined into a two-tier or a single-tier structure. In a two-tier structure, the presentation and application tiers are implemented on a single "web tier", and the database server is implemented on a physically separate server.
Those of ordinary skill in the art will further recognize that the computer system of the present invention may be comprised of a computer with one or more computer processors, one or more external storage devices, output devices such as a computer display monitor and a printer, a textual input device such as a computer keyboard, a graphical input device such as a mouse, and a memory unit. The computer processor is connected to the external storage device, the display monitor, the printer, the keyboard, the mouse, and the memory unit. The external storage device and the memory unit may be used for the storage of data and computer program code. The external storage device may be a fixed or hard disk drive, a floppy disk drive, a CDROM drive, a tape drive, or other device locally or remotely (e.g. via Internet) connected. The functions of the present invention are performed by the computer processor executing computer program code, which is stored in the memory unit or the external storage device. The computer system may suitably be any one of the types that are well known in the art such as a mainframe computer, a minicomputer, a workstation, or a personal computer. The computer system may run any of a number of well known computer operating systems including IBM OS/390®, IBM AS/400®, IBM OS/2®, Microsoft Windows NT®, Microsoft Windows 2000®, and many variations of OSF UNIX.
FIG. 2 illustrates a flow diagram of the steps performed using a modeling tool in accordance with the present invention to estimate system performance using business patterns and scripts. In step 201, the business pattern that most resembles the expected usage of the computer system is selected from a supplied list of business patterns. This list may include business patterns such as: user-to-data (e.g. sites that provide information aggregation such as search engines, newspapers and magazines, Olympics and Wimbledon), user-to-business (e.g. sites for self-service system interactions such as online banking, online trading, making travel arrangements, tracking packages), user-to-online-buying (e.g. sites for electronic commerce such as buying books, cars, clothes), business-to-business (e.g. e-Marketplace sites such as those providing supply chain management), and user-to-user (e.g. sites that provide collaboration among individual users such as e-mails and instant messengers), etc. A detailed description of each business pattern is supplied to the modeling tool user.
Within each business pattern, a number of predefined scripts are defined. A predefined script is a complete user session consisting of multiple web interactions to accomplish some task. For example, for the online shopping business pattern, predefined scripts may include: "browse" (i.e. browsing a catalog), "search" (i.e. searching for a specific product), and "buy" (i.e. purchasing a product). In addition, each script is associated with a number of parameters. For example, the "script frequency" parameter is the relative proportion (i.e. percentage) of scripts in a given business pattern. The sum of the percentages for all the scripts within a given business pattern equals 100%. Similarly, the "page visits per script" parameter is the number of different page visits that make up each script. It is a measure of how long the user is interacting with the system.
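The relationship between a business pattern, its predefined scripts and the two script parameters can be illustrated with a minimal Python sketch. The class names, field names and the `validate` method are hypothetical and are not part of the modeling tool; the frequencies are taken from the User-to-Online-Buying table, and the page visit counts (6, 6 and 13) are obtained by counting the steps listed there for each script.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Script:
    """One predefined script and its two user-adjustable parameters."""
    name: str
    frequency: float   # fraction of users executing this script (0.0 to 1.0)
    page_visits: int   # number of page visits that make up the script


@dataclass(frozen=True)
class BusinessPattern:
    """A named collection of predefined scripts (hypothetical structure)."""
    name: str
    scripts: Tuple[Script, ...]

    def validate(self) -> None:
        """Raise ValueError unless the script frequencies sum to 100%."""
        total = sum(s.frequency for s in self.scripts)
        if abs(total - 1.0) > 1e-9:
            raise ValueError(
                f"script frequencies sum to {total:.4f}, expected 1.0")


# Values from the User-to-Online-Buying table; page visits counted per script.
ONLINE_BUYING = BusinessPattern(
    name="user-to-online-buying",
    scripts=(
        Script("browse", 0.85, 6),
        Script("search", 0.10, 6),
        Script("buy", 0.05, 13),
    ),
)
ONLINE_BUYING.validate()  # passes: 85% + 10% + 5% = 100%
```

Checking the frequency sum at definition time is one possible design choice; it catches a mix that does not add up to 100% before any performance calculation is attempted.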
Those of ordinary skill in the art will recognize that various other predefined scripts and associated parameters could be defined within a business pattern to measure a workload. The "script frequency" and the "page visits" are not the sole determining factors of the amount of work done on the system; the script details also play a role. For example, a "buy" script will typically generate much more work on the backend database server than a "browse" script.
In step 203, a predefined script is selected. Next, in step 205, the script frequency (i.e. the percentage of times a user entering the system will execute that script) is specified. Alternatively, a default value may be used. In step 207, the number of page visits per script is specified. Each predefined script is based on a specific number of user steps or actions, each of which corresponds to a single page visit. If it is determined that a typical user interaction for the workload to be modeled has somewhat more (or fewer) page visits than the default page visits for the predefined script, the number of page visits for each predefined script may be modified to reflect this increase (or decrease) in complexity.
If in step 209 it is determined that there are additional predefined scripts in this business pattern, the process returns to step 203 to select another script and specify the associated parameters for that script in steps 205 and 207. Otherwise, the process moves to step 211.
After all the predefined scripts and their associated parameters are specified, the hardware characteristics of the computer system for which performance modeling is to be performed are specified in step 211. This typically involves specifying the number and type of the target devices (i.e. devices being modeled) such as processors and disks comprising the system hardware.
Other devices such as busses and network connections can also be specified depending on the level of detail included in the model. Next, in step 213, the objectives for the performance calculations, such as the user arrival rate objective and the maximum allowed response time objective, are specified. The actual performance calculations are done in step 215. These calculations are described in more detail in FIG. 3 and FIG. 4.
FIG. 3 illustrates a flow diagram of the steps performed to derive an average composite workload from the predefined scripts in the selected business pattern. The algorithm that performs those steps starts with selecting a predefined script in step 301. In step 303, the script measurement data values are read from a table of previously measured (or estimated) data. This includes data values for the processor service time and the number of disk IO operations per page visit for the selected predefined script. Those of ordinary skill in the art will recognize that the present invention is not limited to these measured data. Additional measured data, such as the number of bytes transmitted or received (i.e. communicated) over the network, can also be included depending on the details to be modeled.
In step 305, the visit rates per page and service times per visit for each target device are calculated. For disks, the visit rate per page is given by:
diskVisitRate = (Disk IO per page visit) / (total disks in system);
The disk service time (i.e. diskServTime) is typically given as a fixed average service time in the neighborhood of 10 ms-15 ms.
For processors, each disk IO results in a separate processor service interval. So, the total processor visit rate per page (per device) is given by:
procVisitRate = (1 + (Disk IO per page visit)) / (total processors in the system);
The processor service time per visit is given by:
procServTime = (Processor service time per page visit) / (1 + (Disk IO per page visit));
If in step 307 it is determined that there are additional predefined scripts in this business pattern, the above steps are repeated for the next script starting at step 301. Otherwise, step 309 is performed.
When the calculations have been completed for all scripts, a weighted average composite application workload is calculated for each target device. First, in step 309, the weighted average number of page visits per user is calculated by taking a weighted average across all scripts:
avgPageVisits = 0;
For i = 1 to (number_of_scripts);
    avgPageVisits = avgPageVisits + (pageVisits(i) * freq(i));
Next i;
Where:
pageVisits(i) = number of page visits specified for script "i"
freq(i) = specified fraction of users that execute script "i"
Then in step 311, the average visit rates per page for processors and disks can be calculated by taking a weighted average across all scripts. The same method can be used for the average processor service time:
avgProcServTime = 0;
avgProcVisitRate = 0;
avgDiskVisitRate = 0;
For i = 1 to (number_of_scripts);
    avgProcServTime = avgProcServTime + (procServTime(i) * pageVisits(i) * freq(i));
    avgProcVisitRate = avgProcVisitRate + (procVisitRate(i) * pageVisits(i) * freq(i));
    avgDiskVisitRate = avgDiskVisitRate + (diskVisitRate(i) * pageVisits(i) * freq(i));
Next i;
avgProcServTime = avgProcServTime / avgPageVisits;
avgProcVisitRate = avgProcVisitRate / avgPageVisits;
avgDiskVisitRate = avgDiskVisitRate / avgPageVisits;
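The composite workload calculation of steps 301 through 311 can be sketched in Python as follows. This is an illustrative sketch only: the `ScriptData` class, the function name and the returned dictionary keys are hypothetical, and the processor service times and disk IO counts in the example are invented placeholders, not measured values. Only the frequencies and page visit counts come from the User-to-Online-Buying table.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ScriptData:
    """Per-script inputs (hypothetical container)."""
    freq: float                # fraction of users executing the script
    page_visits: float         # page visits specified for the script
    proc_time_per_page: float  # processor service time per page visit (s)
    disk_io_per_page: float    # disk IO operations per page visit


def composite_workload(scripts: Sequence[ScriptData],
                       n_procs: int, n_disks: int) -> Dict[str, float]:
    """Weighted-average workload across all scripts (steps 301-311)."""
    # Step 309: weighted average number of page visits per user.
    avg_page_visits = sum(s.page_visits * s.freq for s in scripts)

    avg_proc_serv = avg_proc_visit = avg_disk_visit = 0.0
    for s in scripts:
        # Step 305: per-script visit rates and processor service time.
        disk_visit_rate = s.disk_io_per_page / n_disks
        proc_visit_rate = (1 + s.disk_io_per_page) / n_procs
        proc_serv_time = s.proc_time_per_page / (1 + s.disk_io_per_page)
        # Step 311: accumulate, weighted by page visits and frequency.
        weight = s.page_visits * s.freq
        avg_proc_serv += proc_serv_time * weight
        avg_proc_visit += proc_visit_rate * weight
        avg_disk_visit += disk_visit_rate * weight

    return {
        "avg_page_visits": avg_page_visits,
        "avg_proc_serv_time": avg_proc_serv / avg_page_visits,
        "avg_proc_visit_rate": avg_proc_visit / avg_page_visits,
        "avg_disk_visit_rate": avg_disk_visit / avg_page_visits,
    }


# Frequencies and page visits from the table; the last two fields of each
# entry are made-up placeholder measurements.
example_scripts = [
    ScriptData(0.85, 6, 0.020, 2.0),   # browse
    ScriptData(0.10, 6, 0.030, 3.0),   # search
    ScriptData(0.05, 13, 0.050, 8.0),  # buy
]
example_workload = composite_workload(example_scripts, n_procs=4, n_disks=8)
```

For the table's mix, the weighted average number of page visits per user works out to 6 × 0.85 + 6 × 0.10 + 13 × 0.05 = 6.35.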
Note that the average disk service time (i.e. avgDiskServTime) can be assumed to be a fixed value between 10 ms and 15 ms.
FIG. 4 illustrates a flow diagram of the steps performed to estimate system performance using a standard Mean Value Analysis methodology. Given the composite application workload calculated from the weighted averages of the predefined scripts, we can estimate the performance for that application running on a specified hardware configuration. This is done by exploiting Mean Value Analysis techniques, a variation of which is described below.
In step 401, the calculations are started by initializing the queue lengths to zero for the processors, disks, and any other devices which are to be modeled (i.e. target devices). The user arrival rate is also set to zero.
In step 403, the user arrival rate is incremented by a fixed step, such as by 0.1 users per second.
In step 405, the response times are calculated for each type of target device being modeled in the system using the following equations:
procRespTime = avgProcServTime * (1 + procQueLength);
diskRespTime = avgDiskServTime * (1 + diskQueLength);
In step 407, the total response time per page visit is calculated by summing up the response time for all devices, with each device's response time per visit weighted by its average visit rate per page. This can be done with the following equation:
totalRespTime = SUM over all target devices d of (avgVisitRate(d) * respTime(d));
In step 409, the total system throughput in terms of page visits per second can be calculated from the user arrival rate using the following equation:
throughput = avgPageVisits * (user arrival rate);
In step 411, a test is done to determine if the specified performance objectives have been reached. The objectives include a specified user arrival rate, or average response time, or any number of other criteria specified in step 213 of FIG. 2. If the objectives were reached, the calculations are done and the results can be displayed. If the objectives have not been reached yet, new queue lengths are calculated for each type of target device in step 413. The following equations are used:
procQueLength = throughput * avgProcVisitRate * procRespTime;
diskQueLength = throughput * avgDiskVisitRate * diskRespTime;
The calculations are then iterated again starting with step 403 until the objectives are finally reached.
Those of ordinary skill in the art will recognize that the present invention and the algorithms described above are not limited to specific hardware. The above algorithms can be extended to more detailed modeling in a number of other ways. For example, they can be used to model multi-tiered hardware systems, RAID disks, paging operations, or other hardware facilities such as busses, network connections, and edge servers.
Those of ordinary skill in the art will further recognize that although the system simulator of the present invention is originally intended to model complex web sites, the methodology can be equally applied to other computer systems. The applications are defined based on the intended uses of the computer system and detailed knowledge of the workload characteristics is not necessary, although it can be used to increase the accuracy of the simulation. Using this simulator, typical IT infrastructures can be analyzed and related models can be developed to assist in predicting and planning how to meet future requirements.
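The iteration of FIG. 4 (steps 401 through 413) can be sketched in Python as shown below. This is a sketch under stated assumptions, not the tool's implementation: the function name, parameter names and result keys are hypothetical; the total response time is assumed to be the visit-weighted sum of device response times over every processor and disk, which is the usual Mean Value Analysis form; and the objectives are assumed to be a target user arrival rate and a maximum response time, with the loop stopping as soon as either is reached.

```python
from typing import Dict


def estimate_performance(avg_page_visits: float,
                         avg_proc_serv_time: float,
                         avg_proc_visit_rate: float,
                         avg_disk_visit_rate: float,
                         n_procs: int,
                         n_disks: int,
                         target_arrival_rate: float,
                         max_resp_time: float,
                         avg_disk_serv_time: float = 0.012,
                         step: float = 0.1) -> Dict[str, float]:
    """Iterate arrival rate and queue lengths until an objective is reached."""
    # Step 401: queue lengths and user arrival rate start at zero.
    proc_queue = disk_queue = 0.0
    n_steps = 0
    while True:
        # Step 403: increment the user arrival rate by a fixed step.
        n_steps += 1
        arrival_rate = n_steps * step
        # Step 405: response time per visit for each device type.
        proc_resp = avg_proc_serv_time * (1 + proc_queue)
        disk_resp = avg_disk_serv_time * (1 + disk_queue)
        # Step 407 (ASSUMED form): visit-weighted sum over all devices.
        # Visit rates are per device, so multiply by the device counts.
        total_resp = (n_procs * avg_proc_visit_rate * proc_resp
                      + n_disks * avg_disk_visit_rate * disk_resp)
        # Step 409: throughput in page visits per second.
        throughput = avg_page_visits * arrival_rate
        # Step 411 (ASSUMED objectives): stop at the target arrival rate
        # or when the response time limit is reached.
        if (arrival_rate >= target_arrival_rate - 1e-12
                or total_resp >= max_resp_time):
            return {"arrival_rate": arrival_rate,
                    "throughput": throughput,
                    "total_resp_time": total_resp}
        # Step 413: new queue lengths for the next iteration.
        proc_queue = throughput * avg_proc_visit_rate * proc_resp
        disk_queue = throughput * avg_disk_visit_rate * disk_resp


# Example with placeholder workload values (not measured data).
example_result = estimate_performance(
    avg_page_visits=6.35, avg_proc_serv_time=0.005,
    avg_proc_visit_rate=2.0, avg_disk_visit_rate=0.75,
    n_procs=2, n_disks=4,
    target_arrival_rate=1.0, max_resp_time=10.0)
```

The default disk service time of 12 ms is chosen from the 10 ms to 15 ms range given in the description. Computing the arrival rate as a step count times the step size, rather than by repeated addition, avoids accumulating floating-point error in the stopping test.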
