Controlling data collection, manipulation and storage on a network with service assurance capabilities
6985901
Abstract
The present disclosure provides for controlling the collection, manipulation and storage of network performance data and network event data of a network with service assurance capabilities. Upon the receipt of an activation signal, a signal is transmitted to initiate the retrieving of network performance data and network event data generated from at least one network monitor. Such network monitor is adapted for monitoring a network system and the relocating of the data into a common directory. A signal is then transmitted to initiate the manipulation of the data and the loading of the manipulated data into a database.
Claims
What is claimed is:
1. A method for controlling a collection, manipulation and storage of network performance data and network event data of a network with service assurance capabilities, comprising the steps of:
(a) receiving an activation signal;
(b) transmitting a first signal to initiate retrieval of network performance data and network event data, wherein the network performance and network event data are generated from a network-monitor that monitors a network system and then relocates the data into a common directory;
(c) transmitting a second signal to initiate manipulation of the data and loading of the manipulated data into a database;
(d) transmitting a third signal to initiate a clean archive program for deleting files from an archive directory;
(e) transmitting a fourth signal to initiate a table extract program for extracting data from tables stored in the database;
(f) transmitting a fifth signal to initiate a trigger reporting program for generating reports;
(g) transmitting a sixth signal to initiate a purge record program for deleting records from the database; and
(h) transmitting a seventh signal to initiate a database backup program for backing up data stored on the database.
2. A method as recited in claim 1, further comprising: initiating a controller program.
3. A method as recited in claim 2, wherein the controller program transmits an eighth signal to initiate calculation of statistics from the data.
4. A method as recited in claim 3, wherein the controller program transmits a ninth signal to initiate assigning of key codes to the data.
5. A method as recited in claim 4, wherein the controller program transmits a tenth signal to initiate sorting of the data.
6. A method as recited in claim 5, wherein the controller program transmits an eleventh signal to initiate concatenating of the data.
7. A method as recited in claim 6, wherein the controller program transmits a twelfth signal to initiate loading of the data into the database.
8. A computer program embodied on a computer readable medium for controlling a collection, manipulation and storage of network performance data and network event data of a network with service assurance capabilities, comprising:
(a) a code segment for receiving an activation signal;
(b) a code segment for transmitting a first signal to initiate retrieval of network performance data and network event data, wherein the network performance and network event data are generated from a network-monitor that monitors a network system and then relocates the data into a common directory;
(c) a code segment for transmitting a second signal to initiate manipulation of the data and loading of the manipulated data into a database;
(d) a code segment for transmitting a third signal to initiate a clean archive program for deleting files from an archive directory;
(e) a code segment for transmitting a fourth signal to initiate a table extract program for extracting data from tables stored in the database;
(f) a code segment for transmitting a fifth signal to initiate a trigger reporting program for generating reports;
(g) a code segment for transmitting a sixth signal to initiate a purge record program for deleting records from the database; and
(h) a code segment for transmitting a seventh signal to initiate a database backup program for backing up data stored on the database.
9. A computer program as recited in claim 8, further comprising:
initiating a controller program.
10. A computer program as recited in claim 9, wherein the controller program transmits an eighth signal to initiate calculation of statistics from the data.
11. A computer program as recited in claim 10, wherein the controller program transmits a ninth signal to initiate assigning of key codes to the data.
12. A computer program as recited in claim 11, wherein the controller program transmits a tenth signal to initiate sorting of the data.
13. A computer program as recited in claim 12, wherein the controller program transmits an eleventh signal to initiate concatenating of the data.
14. A computer program as recited in claim 13, wherein the controller program transmits a twelfth signal to initiate loading of the data into the database.
15. A system for controlling a collection, manipulation and storage of network performance data and network event data of a network with service assurance capabilities, comprising:
(a) logic for receiving an activation signal;
(b) logic for transmitting a first signal to initiate retrieval of network performance data and network event data, wherein the network performance and network event data are generated from a network-monitor that monitors a network system and then relocates the data into a common directory;
(c) logic for transmitting a second signal to initiate manipulation of the data and loading of the manipulated data into a database;
(d) logic for transmitting a third signal to initiate a clean archive program for deleting files from an archive directory;
(e) logic for transmitting a fourth signal to initiate a table extract program for extracting data from tables stored in the database;
(f) logic for transmitting a fifth signal to initiate a trigger reporting program for generating reports;
(g) logic for transmitting a sixth signal to initiate a purge record program for deleting records from the database; and
(h) logic for transmitting a seventh signal to initiate a database backup program for backing up data stored on the database.
16. A system as recited in claim 15, further comprising: logic for initiating a controller program.
17. A system as recited in claim 16, wherein the controller program transmits an eighth signal to initiate calculation of statistics from the data.
18. A system as recited in claim 17, wherein the controller program transmits a ninth signal to initiate assigning of key codes to the data.
Description
FIELD OF THE INVENTION
The present invention relates to data collection control and more particularly to data collection, manipulation and storage on a network with service assurance capabilities.
BACKGROUND OF THE INVENTION
A network system architecture enables multiple applications like voice, data and video to be mixed over the network that comprises switching nodes and communications links. Networks for these applications generally have been designed for a particular type of application, such as teleconferencing applications. Different types of applications, however, have diverse requirements.
"Quality of Service" or "QoS" parameters in part define these requirements. In the case of ATM networks, such parameters may include errored cell ratio (ECR), cell loss ratio, fixed delay and delay variation parameters. Applications may also indicate a set of parameters called a "traffic contract" that relates to the bandwidth required from the network. The Errored Cell Ratio (ECR) value is the ratio of the ATM cells in error to all the ATM cells sent during some interval. The remaining parameters are described later. Each application may have a maximum or minimum acceptable value for each parameter and a requested value for each parameter.
As previously indicated, networks are implemented with switching nodes and communications links. The communications links typically have been based upon conventional telephony transmission systems and comprised fiber optic, microwave or wireline links. Fiber optic links typically exhibit an error rate of 10^-9; microwave and wireline connections, 10^-7. More recently, communications links have begun to comprise terrestrial and satellite mobile wireless communications links and cable television systems. Each of these communications links is prone to introducing errors at significantly greater rates.
Some present ATM networks attempt to overcome such errors by implementing a transformation process in the form of a correction, or data manipulation, technique at each switching node on a "per-communications link" basis. That is, the network is designed such that certain links operate with a transforming process that is not alterable on a dynamic basis. For example, it is fairly common that satellite systems have a single error correcting code that operates over the entire bit stream passing over the link, notwithstanding the specific application. Using error correcting codes, such as forward error correction (FEC) codes, requires significant amounts of redundant information to be sent with each block of bits wherein a "block" may comprise a partial ATM cell, a single ATM cell or a plurality of ATM cells. This redundant information adds "overhead" to the transfer. This, in turn, reduces the bandwidth available for transporting an ATM cell. To be effective, an error correcting code must also match the anticipated error burst characteristics of the transmission system; systems that are subject to longer bursts require larger codewords or interleaving or both. Interleaving adds significant amounts of fixed delay to that already present in the transmission system. Such fixed delays manifest themselves, for example, as awkward conversation and user discomfort in a teleconferencing application or even as motion sickness and disorientation in a virtual reality application.
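The bandwidth cost of the redundant information described above can be illustrated with a little arithmetic. The following is a minimal sketch, assuming an (n, k) block code in which each block of n transmitted symbols carries k payload symbols. The function names are hypothetical, and the (255, 223) parameters are those of a commonly used Reed-Solomon code, chosen only for illustration; the disclosure does not name a specific code.

```python
def fec_overhead(n: int, k: int) -> float:
    """Fraction of transmitted symbols that is redundancy in an (n, k) block code."""
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    return (n - k) / n


def payload_rate(link_rate_bps: float, n: int, k: int) -> float:
    """Link rate left for payload once the code's redundancy is accounted for."""
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    return link_rate_bps * k / n


# Illustrative assumption: a (255, 223) code on a 2,048,000 bit/s link.
print(f"overhead: {fec_overhead(255, 223):.1%}")
print(f"payload rate: {payload_rate(2_048_000, 255, 223):.0f} bit/s")
```

Because the code is fixed per link in the arrangement described, every application sharing the link pays this overhead whether or not it needs that level of protection.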
Despite the foregoing efforts, network failures are inevitable, and there is a need for monitoring network performance for the purpose of maintaining a predetermined, agreed-upon QoS.
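As an illustration of monitoring one QoS parameter, the following minimal sketch computes the errored cell ratio defined earlier in this background (cells in error divided by all cells sent during an interval) and compares it with an acceptable maximum. The function names, the sample counts and the threshold are hypothetical and are not taken from the disclosure.

```python
def errored_cell_ratio(errored_cells: int, total_cells: int) -> float:
    """Ratio of ATM cells in error to all ATM cells sent during an interval."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    if not 0 <= errored_cells <= total_cells:
        raise ValueError("errored_cells must lie between 0 and total_cells")
    return errored_cells / total_cells


def meets_requirement(measured: float, max_acceptable: float) -> bool:
    """True when the measured value does not exceed the acceptable maximum."""
    return measured <= max_acceptable


# Hypothetical interval: 3 errored cells out of 1,000,000 sent,
# checked against a hypothetical acceptable maximum of 1e-5.
ecr = errored_cell_ratio(3, 1_000_000)
print(ecr, meets_requirement(ecr, 1e-5))
```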
SUMMARY OF THE INVENTION
The present invention provides for controlling the collection, manipulation and storage of network performance data and network event data of a network with service assurance capabilities. Upon the receipt of an activation signal, a signal is transmitted to initiate the retrieving of network performance data and network event data generated from at least one network monitor. Such network monitor is adapted for monitoring a network system and the relocating of the data into a common directory. A signal is then transmitted to initiate the manipulation of the data and the loading of the manipulated data into a database.
In one embodiment, the signal to initiate the manipulating and storing of the data initiates a controller program that transmits a signal to initiate the manipulation of the data. Such controller program transmits a signal to initiate the calculation of statistics from the data, and further transmits a signal to initiate the assigning of key codes to the data.
In one aspect of the present invention, the controller program also transmits a signal to initiate the sorting of the data, and further transmits a signal to initiate the concatenating of the data. Still yet, the controller program transmits a signal to initiate the loading of the data into the database.
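The sequence of steps initiated by the controller program can be sketched as follows. This is a minimal illustration, not the disclosed implementation: each "signal" is modeled as a plain function call, the record layout, table name and key-code mapping are hypothetical, and an in-memory SQLite database stands in for the database.

```python
import sqlite3
from statistics import mean


def calculate_statistics(samples):
    """Summarize raw (element, value) samples as (element, mean, peak) rows."""
    by_element = {}
    for element, value in samples:
        by_element.setdefault(element, []).append(value)
    return [(e, mean(v), max(v)) for e, v in by_element.items()]


def assign_key_codes(rows, key_codes):
    """Prefix each row with the key code assigned to its element."""
    return [(key_codes[e], e, avg, peak) for e, avg, peak in rows]


def sort_rows(rows):
    """Order rows by key code (the first field of each row)."""
    return sorted(rows)


def concatenate(*batches):
    """Merge several prepared batches into one list ready for loading."""
    return [row for batch in batches for row in batch]


def load(conn, rows):
    """Insert the prepared rows into a hypothetical 'perf' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS perf "
        "(key_code INTEGER, element TEXT, mean_value REAL, peak_value REAL)"
    )
    conn.executemany("INSERT INTO perf VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_controller(conn, batches, key_codes):
    """Initiate each step in the order the summary lists them."""
    prepared = []
    for samples in batches:
        rows = calculate_statistics(samples)
        rows = assign_key_codes(rows, key_codes)
        prepared.append(sort_rows(rows))
    load(conn, concatenate(*prepared))
```

A caller would open a connection with sqlite3.connect(":memory:") and pass one list of samples per collected file.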
As an option, the present invention may further transmit a signal to initiate a clean archive program for deleting files from an archive directory, a signal to initiate a table extract program for extracting data from tables stored in the database, a signal to initiate a trigger reporting program for generating reports, a signal to initiate a purge record program for deleting records from the database, and a signal to initiate a database backup program for backing up data stored on the database.
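The overall control flow summarized above, in which an activation signal leads to a series of signals that each initiate one program, can be sketched as a simple dispatcher. The signal names, the Dispatcher class and the fixed ordering are assumptions made for illustration; the disclosure does not prescribe this structure.

```python
from typing import Callable, Dict

# Hypothetical labels for the programs named in the summary, in one possible order.
SIGNAL_ORDER = [
    "retrieve_data",
    "manipulate_and_load",
    "clean_archive",
    "table_extract",
    "trigger_reporting",
    "purge_records",
    "database_backup",
]


class Dispatcher:
    """Maps each signal to the program it initiates."""

    def __init__(self) -> None:
        self._programs: Dict[str, Callable[[], None]] = {}

    def register(self, signal: str, program: Callable[[], None]) -> None:
        """Associate a program with the signal that starts it."""
        self._programs[signal] = program

    def transmit(self, signal: str) -> None:
        """Initiate the program registered for this signal."""
        self._programs[signal]()

    def on_activation(self) -> None:
        """On activation, transmit every registered signal in order."""
        for signal in SIGNAL_ORDER:
            if signal in self._programs:  # optional programs may be absent
                self.transmit(signal)
```

Leaving an optional program unregistered, such as the database backup, simply skips its signal.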
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be better understood when consideration is given to the following detailed description thereof. Such description makes reference to the annexed drawings wherein:
FIG. 1 is a schematic diagram of a hardware implementation of one embodiment of the present invention;
FIG. 2 illustrates an embodiment of the present invention which provides service assurance for a network;
FIG. 3 illustrates one embodiment of the present invention for dispatching network event information of a network with service assurance capabilities;
FIG. 4 illustrates an embodiment of the present invention for assigning time-based identifiers to data stored in a database;
FIG. 5 is a flowchart illustrating an embodiment of the present invention for collecting data;
FIG. 6 is a flowchart that provides an overview of a data collection process of one embodiment of the present invention;
FIG. 7 illustrates an embodiment of the present invention for retrieving and relocating event and performance data of a network with service assurance capabilities;
FIG. 8 illustrates operation of the main data acquisition script (get_data.pl) for each entry in the configuration file (get_data.cfg) during data acquisition in accordance with one embodiment of the present invention;
FIG. 9 illustrates an exemplary configuration of an architecture of the present invention for manipulating data using a Patroller and an Event Controller and Manager;
FIG. 10 depicts an embodiment of the present invention which generates an alarm to indicate a status of a network for service assurance purposes;
FIG. 11 illustrates an embodiment of the present invention used to develop a database;
FIG. 12 illustrates an exemplary database configuration chart of an embodiment of the present invention;
FIG. 13 illustrates an exemplary report indicating required table space allotments in a database;
FIG. 14 depicts an exemplary report which lists statistics relating to the data in the database in accordance with one embodiment of the present invention;
FIG. 15 is an exemplary report indicating times to load various amounts of data in accordance with an exemplary embodiment of the present invention;
FIG. 16 depicts an embodiment of the present invention which graphically conveys availability in a network with service assurance capabilities;
FIG. 17A depicts a main ad hoc screen for navigating when making an ad hoc request for a report in accordance with an embodiment of the present invention;
FIG. 17B illustrates a graphical user interface for selecting a report type in accordance with one embodiment of the present invention;
FIG. 18 illustrates a form which receives report criteria in accordance with one embodiment of the present invention;
FIG. 19 illustrates exemplary batch reporting requirements in accordance with one embodiment of the present invention;
FIG. 20 lists generic report types;
FIG. 21 shows exemplary ad hoc reporting requirements in accordance with one embodiment of the present invention;
FIG. 22 illustrates sample metric information in accordance with an embodiment of the present invention;
FIG. 23 depicts a line graph report in accordance with one embodiment of the present invention;
FIG. 24 shows a line graph report in accordance with one embodiment of the present invention;
FIG. 25 depicts a comparison box-plot report in accordance with an embodiment of the present invention;
FIG. 26 illustrates a trend box-plot report in accordance with an embodiment of the present invention;
FIG. 27 depicts a bar graph report in accordance with an embodiment of the present invention;
FIG. 28 shows an availability spectrum report in accordance with an embodiment of the present invention;
FIG. 29 illustrates an exception spectrum report in accordance with an embodiment of the present invention;
FIG. 30 depicts an exception test report in accordance with an embodiment of the present invention;
FIG. 31 illustrates an embodiment of the present invention which maps events on a network with service assurance capabilities; and
FIG. 32 depicts an exemplary data/directory structure in accordance with one embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
A preferred embodiment of a system in accordance with the present invention is preferably practiced in the context of a personal computer such as an IBM compatible personal computer, Apple Macintosh computer or UNIX based workstation. A representative hardware environment is depicted in FIG. 1, which illustrates a typical hardware configuration of a workstation in accordance with a preferred embodiment having a central processing unit 110, such as a microprocessor, and a number of other units interconnected via a system bus 112. The workstation shown in FIG. 1 includes a Random Access Memory (RAM) 114, Read Only Memory (ROM) 116, an I/O adapter 118 for connecting peripheral devices such as disk storage units 120 to the bus 112, a user interface adapter 122 for connecting a keyboard 124, a mouse 126, a speaker 128, a microphone 132, and/or other user interface devices such as a touch screen (not shown) to the bus 112, a communication adapter 134 for connecting the workstation to a communication network (e.g., a data processing network) and a display adapter 136 for connecting the bus 112 to a display device 138. The workstation typically has resident thereon an operating system such as the Microsoft Windows NT or Windows/95 Operating System (OS), the IBM OS/2 operating system, the MAC OS, or UNIX operating system. Those skilled in the art will appreciate that the present invention may also be implemented on platforms and operating systems other than those mentioned.
A preferred embodiment is written using JAVA, C, and the C++ language and utilizes object oriented programming methodology. Object oriented programming (OOP) has become increasingly used to develop complex applications. As OOP moves toward the mainstream of software design and development, various software solutions require adaptation to make use of the benefits of OOP. A need exists for these principles of OOP to be applied to a messaging interface of an electronic messaging system such that a set of OOP classes and objects for the messaging interface can be provided.
OOP is a process of developing computer software using objects, including the steps of analyzing the problem, designing the system, and constructing the program. An object is a software package that contains both data and a collection of related structures and procedures. Since it contains both data and a collection of structures and procedures, it can be visualized as a self-sufficient component that does not require other additional structures, procedures or data to perform its specific task. OOP, therefore, views a computer program as a collection of largely autonomous components, called objects, each of which is responsible for a specific task. This concept of packaging data, structures, and procedures together in one component or module is called encapsulation.
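Encapsulation as described above can be illustrated with a small object that packages its data together with the procedures that operate on it. The CellCounter class below is a hypothetical example, not part of the disclosure.

```python
class CellCounter:
    """Packages cell counts together with the procedures that use them."""

    def __init__(self) -> None:
        # Data kept inside the object and reached only through its methods.
        self._sent = 0
        self._errored = 0

    def record(self, errored: bool = False) -> None:
        """Count one transmitted cell, noting whether it was in error."""
        self._sent += 1
        if errored:
            self._errored += 1

    def errored_ratio(self) -> float:
        """Errored cells divided by cells sent (0.0 before any cell is sent)."""
        return self._errored / self._sent if self._sent else 0.0


# Two objects formed from the same class keep independent state.
link_a = CellCounter()
link_b = CellCounter()
link_a.record()
link_a.record(errored=True)
print(link_a.errored_ratio(), link_b.errored_ratio())
```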
In general, OOP components are reusable software modules which present an interface that conforms to an object model and which are accessed at run-time through a component integration architecture. A component integration architecture is a set of architecture mechanisms which allow software modules in different process spaces to utilize each other's capabilities or functions. This is generally done by assuming a common component object model on which to build the architecture. It is worthwhile to differentiate between an object and a class of objects at this point. An object is a single instance of the class of objects, which is often just called a class. A class of objects can be viewed as a blueprint, from which many objects can be formed.
OOP allows the programmer to create an object that is a part of another object. For example, the object representing a piston engine is said to have a composition-relationship with the object representing a piston. In reality, a piston engine comprises a piston, valves and many other components; the fact that a piston is an element of a piston engine can be logically and semantically represented in OOP by two objects.
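The composition-relationship in the piston engine example can be sketched as an engine object that holds piston and valve objects. The class names follow the example; the attributes and the choice of two valves per cylinder are assumptions made for illustration.

```python
class Piston:
    """One element of a piston engine."""

    def __init__(self, material: str = "metal") -> None:
        self.material = material


class Valve:
    """Another element of a piston engine."""


class PistonEngine:
    """An engine object composed of piston and valve objects."""

    def __init__(self, cylinders: int) -> None:
        self.pistons = [Piston() for _ in range(cylinders)]
        # Two valves per cylinder is an illustrative assumption.
        self.valves = [Valve() for _ in range(2 * cylinders)]

    def piston_count(self) -> int:
        return len(self.pistons)


engine = PistonEngine(cylinders=4)
print(engine.piston_count(), len(engine.valves))
```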
OOP also allows creation of an object that "depends from" another object. If there are two objects, one representing a piston engine and the other representing a piston engine wherein the piston is made of ceramic, then the relationship between the two objects is not that of composition. A ceramic piston engine does not make up a piston engine. Rather it is merely one kind of piston engine that has one more limitation than the piston engine; its piston is made of ceramic. In this case, the object representing the ceramic piston engine is called a derived object, and it inherits all of the aspects of the object representing the piston engine and adds further limitation or detail to it. The object representing the ceramic piston engine "depends from" the object representing the piston engine. The relationship between these objects is called inheritance.
When the object or class representing the ceramic piston engine inherits all of the aspects of the object representing the piston engine, it inherits the thermal characteristics of a standard piston defined in the piston engine class. However, the ceramic piston engine object overrides these with ceramic-specific thermal characteristics, which are typically different from those associated with a metal piston. It skips over the original functions and uses new functions related to ceramic pistons. Different kinds of piston engines have different characteristics, but may have the same underlying functions associated with them (e.g., how many pistons in the engine, ignition sequences, lubrication, etc.). To access each of these functions in any piston engine object, a programmer would call the same functions with the same names, but each type of piston engine may have different/overriding implementations of functions behind the same name. This ability to hide different implementations of a function behind the same name is called polymorphism and it greatly simplifies communication among objects.
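Inheritance and polymorphism in the ceramic piston engine example can be sketched as follows. The method names and the temperature figures are placeholders invented for illustration and are not taken from the disclosure.

```python
class PistonEngine:
    """Base class: defines functions common to every piston engine."""

    def __init__(self, cylinders: int) -> None:
        self.cylinders = cylinders

    def piston_count(self) -> int:
        return self.cylinders

    def max_piston_temperature_c(self) -> float:
        # Placeholder figure for a metal piston.
        return 300.0


class CeramicPistonEngine(PistonEngine):
    """Derived class: inherits everything and overrides one function."""

    def max_piston_temperature_c(self) -> float:
        # Placeholder figure for a ceramic piston.
        return 1000.0


def describe(engine: PistonEngine) -> str:
    # The same call works for every engine type; the object itself
    # selects which implementation runs.
    return (
        f"{engine.piston_count()} pistons, "
        f"limit {engine.max_piston_temperature_c()} C"
    )


for engine in (PistonEngine(4), CeramicPistonEngine(4)):
    print(describe(engine))
```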
With the concepts of composition-relationship, encapsulation, inheritance and polymorphism, an object can represent just about anything in the real world. In fact, one's logical perception of reality is the only limit on determining the kinds of things that can become objects in object-oriented software. Some typical categories are as follows:
- Objects can represent physical objects, such as automobiles in a traffic-flow simulation, electrical components in a circuit-design program, countries in an economics model, or aircraft in an air-traffic-control system.
- Objects can represent elements of the computer-user environment such as windows, menus or graphics objects.
- An object can represent an inventory, such as a personnel file or a table of the latitudes and longitudes of cities.
- An object can represent user-defined data types such as time, angles, and complex numbers, or points on the plane.
With this enormous capability of an object to represent just about any logically separable matters, OOP allows the software developer to design and implement a computer program that is a model of some aspects of reality, whether that reality is a physical entity, a process, a system, or a composition of matter. Since the object can represent anything, the software developer can create an object which can be used as a component in a larger software project in the future.
If 90% of a new OOP software program consists of proven, existing components made from preexisting reusable objects, then only the remaining 10% of the new software project has to be written and tested from scratch. Since 90% already came from an inventory of extensively tested reusable objects, the potential domain from which an error could originate is 10% of the program. As a result, OOP enables software developers to build objects out of other, previously built objects.
This process closely resembles complex machinery being built out of assemblies and sub-assemblies. OOP technology, therefore, makes software engineering more like hardware engineering in that software is built from existing components, which are available to the developer as objects. All this adds up to an improved quality of the software as well as an increased speed of its development.
Programming languages are beginning to fully support the OOP principles, such as encapsulation, inheritance, polymorphism, and composition-relationship. With the advent of the C++ language, many commercial software developers have embraced OOP. C++ is an OOP language that offers fast, machine-executable code. Furthermore, C++ is suitable for both commercial-application and systems-programming projects. For now, C++ appears to be the most popular choice among many OOP programmers, but there is a host of other OOP languages, such as Smalltalk, Common Lisp Object System (CLOS), and Eiffel. Additionally, OOP capabilities are being added to more traditional popular computer programming languages such as Pascal.
The benefits of object classes can be summarized as follows:
- Objects and their corresponding classes break down complex programming problems into many smaller, simpler problems.
- Encapsulation enforces data abstraction through the organization of data into small, independent objects that can communicate with each other. Encapsulation protects the data in an object from accidental damage, but allows other objects to interact with that data by calling the object's member functions and structures.
- Subclassing and inheritance make it possible to extend and modify objects through deriving new kinds of objects from the standard classes available in the system. Thus, new capabilities are created without having to start from scratch.
- Polymorphism and multiple inheritance make it possible for different programmers to mix and match characteristics of many different classes and create specialized objects that can still work with related objects in predictable ways.
- Class hierarchies and containment hierarchies provide a flexible mechanism for modeling real-world objects and the relationships among them.
Libraries of reusable classes are useful in many situations, but they also have some limitations. For example:
- Complexity. In a complex system, the class hierarchies for related classes can become extremely confusing, with many dozens or even hundreds of classes.
- Flow of control. A program written with the aid of class libraries is still responsible for the flow of control (i.e., it must control the interactions among all the objects created from a particular library). The programmer has to decide which functions to call at what times for which kinds of objects.
- Duplication of effort. Although class libraries allow programmers to use and reuse many small pieces of code, each programmer puts those pieces together in a different way. Two different programmers can use the same set of class libraries to write two programs that do exactly the same thing but whose internal structure (i.e., design) may be quite different, depending on hundreds of small decisions each programmer makes along the way. Inevitably, similar pieces of code end up doing similar things in slightly different ways and do not work as well together as they should.
Class libraries are very flexible. As programs grow more complex, more programmers are forced to reinvent basic solutions to basic problems over and over again. A relatively new extension of the class library concept is to have a framework of class libraries. This framework is more complex and consists of significant collections of collaborating classes that capture both the small scale patterns and major mechanisms that implement the common requirements and design in a specific application domain. Frameworks were first developed to free application programmers from the chores involved in displaying menus, windows, dialog boxes, and other standard user interface elements for personal computers.
Frameworks also represent a change in the way programmers think about the interaction between the code they write and code written by others. In the early days of procedural programming, the programmer called libraries provided by the operating system to perform certain tasks, but basically the program executed down the page from start to finish, and the programmer was solely responsible for the flow of control. This was appropriate for printing out paychecks, calculating a mathematical table, or solving other problems with a program that executed in just one way.
The development of graphical user interfaces began to turn this procedural programming arrangement inside out. These interfaces allow the user, rather than program logic, to drive the program and decide when certain actions should be performed. Today, most personal computer software accomplishes this by means of an event loop which monitors the mouse, keyboard, and other sources of external events and calls the appropriate parts of the programmer's code according to actions that the user performs. The programmer no longer determines the order in which events occur. Instead, a program is divided into separate pieces that are called at unpredictable times and in an unpredictable order. By relinquishing control in this way to users, the developer creates a program that is much easier to use. Nevertheless, individual pieces of the program written by the developer still call libraries provided by the operating system to accomplish certain tasks, and the programmer must still determine the flow of control within each piece after it's called by the event loop. Application code still "sits on top of" the system.
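The event loop arrangement described above can be sketched in a few lines. This is a simplified illustration that reads events from a prepared list rather than from a mouse or keyboard; the event names and handler functions are hypothetical.

```python
from collections import deque


def run_event_loop(events, handlers):
    """Dispatch each queued event to the handler the programmer supplied."""
    queue = deque(events)
    while queue:
        kind, payload = queue.popleft()
        handler = handlers.get(kind)
        if handler is not None:  # events without a handler are ignored
            handler(payload)


# The programmer writes the handlers but does not control their order;
# the order is determined by the events that arrive.
handlers = {
    "key": lambda key: print(f"typed {key}"),
    "click": lambda pos: print(f"clicked at {pos}"),
}
run_event_loop([("click", (10, 20)), ("key", "a"), ("scroll", 3)], handlers)
```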
Even event loop programs require programmers to write a lot of code that should not need to be written separately for every application. The concept of an application framework carries the event loop concept further. Instead of dealing with all the nuts and bolts of constructing basic menus, windows, and dialog boxes and then making these things all work together, programmers using application frameworks start with working application code and basic user interface elements in place. Subsequently, they build from there by replacing some of the generic capabilities of the framework with the specific capabilities of the intended application.
Application frameworks reduce the total amount of code that a programmer has to write from scratch. However, because the framework is really a generic application that displays windows, supports copy and paste, and so on, the programmer can also relinquish control to a greater degree than event loop programs permit. The framework code takes care of almost all event handling and flow of control, and the programmer's code is called only when the framework needs it (e.g., to create or manipulate a proprietary data structure).
A programmer writing a framework program not only relinquishes control to the user (as is also true for event loop programs), but also relinquishes the detailed flow of control within the program to the framework. This approach allows the creation of more complex systems that work together in interesting ways, as opposed to isolated programs, having custom code, being created over and over again for similar problems.
Thus, as is explained above, a framework basically is a collection of cooperating classes that make up a reusable design solution for a given problem domain. It typically includes objects that provide default behavior (e.g., for menus and windows), and programmers use it by inheriting some of that default behavior and overriding other behavior so that the framework calls application code at the appropriate times.
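The relationship between a framework and application code can be sketched as a base class that owns the flow of control and supplies default behavior, with the application overriding only what it needs. The Application and Editor classes and their method names are hypothetical and do not correspond to any particular framework.

```python
class Application:
    """Framework code: owns the flow of control and the default behavior."""

    def run(self, events):
        """Drive the program, calling the hooks at the appropriate times."""
        output = [self.on_start()]
        for event in events:
            output.append(self.handle_event(event))
        output.append(self.on_quit())
        return output

    # Default behavior that applications inherit unless they override it.
    def on_start(self):
        return "window opened"

    def handle_event(self, event):
        return f"ignored {event}"

    def on_quit(self):
        return "window closed"


class Editor(Application):
    """Application code: overrides one hook, which the framework calls."""

    def handle_event(self, event):
        return f"inserted {event!r}"


print(Editor().run(["a", "b"]))
```

The application never calls run's internals itself; it only supplies handle_event, and the inherited defaults cover everything else.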
There are three main differences between frameworks and class libraries:
- Behavior versus protocol. Class libraries are essentially collections of behaviors that you can call when you want those individual behaviors in your program. A framework, on the other hand, provides not only behavior but also the protocol or set of rules that govern the ways in which behaviors can be combined, including rules for what a programmer is supposed to provide versus what the framework provides.
- Call versus override. With a class library, the code the programmer writes instantiates objects and calls their member functions. It's possible to instantiate and call objects in the same way with a framework (i.e., to treat the framework as a class library), but to take full advantage of a framework's reusable design, a programmer typically writes code that overrides and is called by the framework. The framework manages the flow of control among its objects. Writing a program involves dividing responsibilities among the various pieces of software that are called by the framework rather than specifying how the different pieces should work together.
- Implementation versus design. With class libraries, programmers reuse only implementations, whereas with frameworks, they reuse design. A framework embodies the way a family of related programs or pieces of software work. It represents a generic design solution that can be adapted to a variety of specific problems in a given domain. For example, a single framework can embody the way a user interface works, even though two different user interfaces created with the same framework might solve quite different interface problems.
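The "call versus override" distinction can be illustrated with a minimal sketch. It is written in Python purely for illustration; the class and method names (Framework, MyApplication, handle_event) are hypothetical and do not come from any real framework.

```python
class Framework:
    """Stand-in for a framework: it owns the flow of control."""

    def run(self, events):
        # The framework, not the application, decides when each hook runs.
        results = []
        for event in events:
            results.append(self.handle_event(event))
        return results

    def handle_event(self, event):
        # Default behavior supplied by the framework.
        return f"ignored {event}"


class MyApplication(Framework):
    def handle_event(self, event):
        # Application code overrides the hook and is called by the framework.
        if event == "click":
            return "opened document"
        # Anything else falls back to the inherited default behavior.
        return super().handle_event(event)
```

Here the application never calls handle_event itself; it only supplies the override, and Framework.run invokes it at the appropriate time.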
Thus, through the development of frameworks for solutions to various problems and programming tasks, significant reductions in the design and development effort for software can be achieved.
A preferred embodiment of the invention utilizes HyperText Markup Language (HTML) to implement documents on the Internet together with a general-purpose secure communication protocol for a transport medium between the client and the Newco. HTTP or other protocols could be readily substituted for HTML without undue experimentation. Information on these products is available in T. Berners-Lee, D. Connolly, "RFC 1866: Hypertext Markup Language-2.0" (November 1995); and R. Fielding, H. Frystyk, T. Berners-Lee, J. Gettys and J. C. Mogul, "Hypertext Transfer Protocol -- HTTP/1.1: HTTP Working Group Internet Draft" (May 2, 1996). HTML is a simple data format used to create hypertext documents that are portable from one platform to another. HTML documents are SGML documents with generic semantics that are appropriate for representing information from a wide range of domains. HTML has been in use by the World-Wide Web global information initiative since 1990. HTML is an application of ISO Standard 8879:1986, Information Processing, Text and Office Systems: Standard Generalized Markup Language (SGML).
To date, Web development tools have been limited in their ability to create dynamic Web applications which span from client to server and interoperate with existing computing resources. Until recently, HTML has been the dominant technology used in development of Web-based solutions. However, HTML has proven to be inadequate in the following areas:
- Poor performance;
- Restricted user interface capabilities;
- Can only produce static Web pages;
- Lack of interoperability with existing applications and data; and
- Inability to scale.
Sun Microsystems' Java language solves many of the client-side problems by:
- Improving performance on the client side;
- Enabling the creation of dynamic, real-time Web applications; and
- Providing the ability to create a wide variety of user interface components.
With Java, developers can create robust User Interface (UI) components. Custom "widgets" (e.g., real-time stock tickers, animated icons, etc.) can be created, and client-side performance is improved. Unlike HTML, Java supports the notion of client-side validation, offloading appropriate processing onto the client for improved performance. Dynamic, real-time Web pages can be created. Using the above-mentioned custom UI components, dynamic Web pages can also be created.
Sun's Java language has emerged as an industry-recognized language for "programming the Internet." Sun defines Java as: "a simple, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multithreaded, dynamic, buzzword-compliant, general-purpose programming language. Java supports programming for the Internet in the form of platform-independent Java applets." Java applets are small, specialized applications that comply with Sun's Java Application Programming Interface (API) allowing developers to add "interactive content" to Web documents (e.g., simple animations, page adornments, basic games, etc.). Applets execute within a Java-compatible browser (e.g., Netscape Navigator) by copying code from the server to client. From a language standpoint, Java's core feature set is based on C++. Sun's Java literature states that Java is basically, "C++ with extensions from Objective C for more dynamic method resolution."
Another technology that provides similar function to JAVA is provided by Microsoft and ActiveX Technologies, to give developers and Web designers wherewithal to build dynamic content for the Internet and personal computers. ActiveX includes tools for developing animation, 3-D virtual reality, video and other multimedia content. The tools use Internet standards, work on multiple platforms, and are being supported by over 100 companies. The group's building blocks are called ActiveX Controls, small, fast components that enable developers to embed parts of software in hypertext markup language (HTML) pages. ActiveX Controls work with a variety of programming languages including Microsoft Visual C++, Borland Delphi, Microsoft Visual Basic programming system and, in the future, Microsoft's development tool for Java, code named "Jakarta." ActiveX Technologies also includes ActiveX Server Framework, allowing developers to create server applications. One of ordinary skill in the art readily recognizes that ActiveX could be substituted for JAVA without undue experimentation to practice the invention.
Overall Architecture
One embodiment of the present invention is composed of multiple software programs which are linked together to create an architecture which is capable of monitoring a network for events and checking system functions and resources. Such events can include alarms, faults, alerts, etc. Other embodiments of the present invention may each include an individual software program.
Reports on system performance, errors, etc. can be generated and output. For example, the reports may depict to operators/administrators of the network what is happening inside the network in real-time. This allows the administrators to respond to problems before disruptions to service occur.
Accordingly, FIG. 2 illustrates an embodiment of the present invention which provides service assurance for a network. In operation 200, an alarm is generated to indicate a status of a network. Network event information of the network is dispatched in operation 202 upon generation of the alarm after which the network event information is mapped in operation 204. The data collected on the status of the network is manipulated and stored in a database. See operations 206 and 208. In operation 210, availability of the network is conveyed graphically.
Collector
A collector may be used to collect system information including events and performance data and route pertinent portions of the system information to a notifier program which informs a system administrator of the system information. One example of a collector is Netcool® manufactured by Micromuse Inc., whose headquarters are located at 139 Townsend St., San Francisco, Calif. 94107.
Notifier
In one aspect of the present invention, a notifier may form part of the architecture or may stand alone. The notifier is a configurable decision engine that determines how to dispatch system information to users. Notification actions can include initiating alphanumeric pages, trouble tickets, log messages, and external scripts. The notifier is integrated closely with the collector and is able to forward alerts to specific individuals based on the contents of the alert and the time/day it was received. The notifier also allows an alert to trigger more than one type of notification. This enables functionality like creating a trouble ticket before paging so that the trouble ticket number can be sent along with an alphanumeric page.
The notifier creates integration between the collector and any of a number of other programs, which may include an alphanumeric paging program and/or a trouble ticket generation program. Preferably, the notifier can be modified to integrate with existing systems. The notifier may be written in the PERL language.
Accordingly, FIG. 3 illustrates one embodiment of the present invention for dispatching network event information of a network with service assurance capabilities. In operation 300, a network is monitored for an event. Thereafter, in operation 302, at least one notification action is generated based upon the occurrence of the event. The notification action may include an alphanumeric page, an e-mail message, a resolution script, a remedy trouble ticket, and/or a log message. Further, the notification action may be transmitted in operation 304 to notify a recipient about the occurrence of the event.
In one aspect of the present invention, the step of generating the notification action further includes the steps of: determining the type of event that occurred, and generating at least one notification action associated with the determined type of event. Further, the order in which notification actions are generated upon the occurrence of the event may be based upon the determined type of event.
In another aspect of the present invention, the step of transmitting the notification action may further include the steps of: determining characteristics of the occurred event, selecting at least one recipient for receiving the notification action based upon the determined characteristics of the occurred event, and transmitting the notification action to the selected recipient.
In still yet another aspect of the present invention, the step of transmitting the notification action may further include the step of determining the time of occurrence of the occurred event. The selection of the recipient for receiving the notification action may be further based upon the occurrence time of the occurred event. In addition, the occurred event may be compared to a list of scheduled events such that the generation of the notification action may be inhibited when the occurred event matches one of the listed scheduled events.
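The decision logic described in the preceding aspects (actions chosen and ordered by event type, recipient chosen by event characteristics and time, and suppression of scheduled events) might be sketched as follows. This is an illustrative Python sketch only, not the notifier itself, which the text says may be written in PERL. The rule table, the recipient names, and the business-hours policy are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical rule table: event type -> ordered notification actions.
ACTIONS_BY_TYPE = {
    # Ticket first, so the ticket number could be sent along with the page.
    "link_down": ["trouble_ticket", "page"],
    "disk_full": ["email", "log"],
}


def select_recipient(event):
    # Assumed policy: business hours go to the day desk, otherwise on-call.
    hour = event["time"].hour
    return "day-desk" if 8 <= hour < 18 else "on-call"


def is_scheduled(event, scheduled):
    # An event matching a scheduled (e.g. maintenance) entry is suppressed.
    return (event["node"], event["type"]) in scheduled


def notify(event, scheduled):
    """Return the (action, recipient) pairs to send for one event."""
    if is_scheduled(event, scheduled):
        return []
    actions = ACTIONS_BY_TYPE.get(event["type"], ["log"])
    recipient = select_recipient(event)
    return [(action, recipient) for action in actions]
```

Keeping the rules in a table rather than in code mirrors the description of the notifier as a configurable decision engine.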
Automation Programs
Automations are a configurable function in the collector. Automations enhance the collector's functionality by adding intelligence and/or automating tasks. Exemplary automations include:
Fault/Resolution Pairing: each resolution alert received (e.g., Link Up) will trigger an automation that tries to find its corresponding fault notification (Link Down, which would have been generated earlier) by matching the node and the problem uniquely. When an F/R pair is found, the resolution alert is removed and the fault alert is cleared (no longer a "critical" or "warning" status). Additional information can be added to the alert, including the duration of the fault.
Event and Notification Logging: when a notification is sent out by the notifier, an automation can update the alert for which a notification was generated to indicate what action was taken (page, trouble ticket number, logged, etc.).
Maintenance: all cleared alerts can be removed from the event list after a specified period, reducing distractions for the operators.
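As a rough illustration of the Fault/Resolution Pairing and Maintenance automations, the following Python sketch pairs a resolution alert with an open fault on the same node and problem, then ages out cleared alerts. The alert field names (node, problem, status, time) are hypothetical and are not taken from the collector's actual schema.

```python
def pair_resolution(alerts, resolution):
    """Clear the open fault matching a resolution alert; return it or None.

    alerts: list of dicts with keys node, problem, status, time (seconds).
    resolution: dict with keys node, problem, time. It is never added to
    the list, which models "the resolution alert is removed".
    """
    for alert in alerts:
        if (alert["status"] != "cleared"
                and alert["node"] == resolution["node"]
                and alert["problem"] == resolution["problem"]):
            alert["status"] = "cleared"
            # Extra information added to the alert: how long the fault lasted.
            alert["duration"] = resolution["time"] - alert["time"]
            return alert
    return None


def remove_old_cleared(alerts, now, max_age):
    """Maintenance: drop cleared alerts older than max_age seconds."""
    return [a for a in alerts
            if not (a["status"] == "cleared" and now - a["time"] > max_age)]
```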
Data Acquisition
In another embodiment of the present invention, Data Acquisition scripts are programs which coordinate the collection and transfer of application logs to a central location. Data Acquisition can be used so that log files containing performance statistics gathered by a monitoring program can be transferred to a central server for processing by the Performance Data Manipulator (see below). The Data Acquisition scripts may be written in PERL.
Performance Data Manipulator (PDM)
In an embodiment of the present invention, a PDM is a script that processes log files that have been collected by Data Acquisition in order to load the data into a database. The PDM converts the log files from formats specific to a particular monitoring program into a common format. PDM then formats the file based on data warehousing techniques which include converting nodes and performance metrics to key codes which are stored in the database. The coded data file is then bulk loaded into the database. The PDM may be written in PERL.
Database Data Warehouse Design
An embodiment of the present invention includes a database schema for storing and reporting on network element performance data. The schema should be built for the efficient storage, processing, and aging of performance data.
Time Key Preloader
The Time Key Preloader is a script that supports the efficient loading of data in the database. In prior art database systems, due to the use of "key codes" in the design of databases a unique time key needed to be created for each entry stored in the database. The original solution to this was to create a time key every time an entry was received. This is inefficient. One embodiment of the present invention includes the Time Key Preloader which pre-generates time keys for every second, minute, five minutes, and/or hour, etc. of the day all at once using a direct interface to the database. Thus, the time key for the time an alert or metric is received becomes associated with the alert or performance metric. The Time Key Preloader may be written in PERL.
FIG. 4 illustrates an embodiment of the present invention for assigning time-based identifiers to data stored in a database. First, prior to receiving data, a plurality of time-based identifiers are generated in operation 400. Next, in operation 402, the time-based identifiers are saved. Upon receipt of data in operation 404, one of the appropriate time-based identifiers is assigned to the received data in operation 406. In operation 408, the received data is stored with the assigned time-based identifier in a database.
In one aspect of the present invention, the time-based identifiers are generated for every second, minute, and hour of every day at once. The time-based identifiers may be generated using an interface to the database. Further, the time-based identifiers may be stored in a database separate from that in which the data is stored.
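The pre-generation idea can be sketched in a few lines. This Python sketch keeps the keys in an in-memory dictionary as a stand-in for the database table; the function names, the integer key numbering, and the one-minute default step are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def preload_time_keys(day, step_seconds=60):
    """Pre-generate a {timestamp: key} table for one day at a fixed step."""
    start = datetime(day.year, day.month, day.day)
    count = 24 * 60 * 60 // step_seconds
    return {start + timedelta(seconds=i * step_seconds): i + 1
            for i in range(count)}


def assign_time_key(time_keys, received_at, step_seconds=60):
    """Look up the pre-generated key for the interval containing received_at."""
    midnight = received_at.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((received_at - midnight).total_seconds())
    # Round down to the start of the interval, then do a plain lookup.
    bucket = midnight + timedelta(seconds=elapsed - elapsed % step_seconds)
    return time_keys[bucket]
```

Because every key exists before any data arrives, assigning a key at load time is a lookup rather than a create-if-missing operation, which is the efficiency gain the text describes.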
In yet another aspect of the present invention, the received data relates to a parameter of a network. Further, the data may be stored for service assurance purposes on the network.
Event Correlator and Manager (ECM)
One embodiment of the present invention includes an event correlation application which uses behavior models to correlate network conditions, identify critical problems, filter out superfluous events and take appropriate actions. An example of an ECM is NerveCenter manufactured by Seagate Software, 920 Disc Drive, Scotts Valley, Calif. 95067.
Patroller
An embodiment of the present invention includes a patroller which discovers the environment of a network, continuously surveys related systems, and/or initiates alarms based on preset parameters. An example of a Patroller is BMC Patrol manufactured by BMC Software, Inc., 2101 CityWest Blvd., Houston, Tex. 77042-2827.
Alarm Definitions
Alarm Definitions are customizations to an ECM which allow for event correlation and alarm suppression. Preferably, the Alarm Definitions are specific to particular networks in order to add automated intelligence for problem resolution and to reduce unneeded alarm traffic sent to operators.
Automation Scripts and Knowledge Modules
Automation scripts allow the automation of application and system management tasks. Automation scripts may monitor application health and perform corrective actions when the status of an application changes. Other functions may include sending SNMP traps when specific conditions are detected in an application or system.
Reporting Interface Development, Customer Service Integration Module
One embodiment of the present invention provides the ability to correlate network events to individual customers (or providers in a Managed Network Services world) and notify customer service representatives of known outages affecting customers through a web interface. This allows proactive notification to customers of problems that affect them as well as builds confidence in customers calling to report problems that the provider is aware of.
Data Collection
Data Collection Overview
Referring to FIG. 5, in one embodiment of the present invention, an activation signal is received in operation 500. Upon receipt of the activation signal, a signal is transmitted in operation 502 to initiate the retrieving of network performance data and network event data generated from at least one network monitor. Such network monitor is adapted for monitoring a network system and the relocating of the data into a common directory. Then, in operation 504, a signal is transmitted to initiate the manipulation of the data and the loading of the manipulated data into a database.
In one embodiment, the signal to initiate the manipulating and storing of the data initiates a controller program that transmits a signal to initiate the manipulation of the data. Such controller program transmits a signal to initiate the calculation of statistics from the data, and further transmits a signal to initiate the assigning of key codes to the data.
In another aspect of the present invention, the controller program also transmits a signal to initiate the sorting of the data, and further transmits a signal to initiate the concatenating of the data. Still yet, the controller program transmits a signal to initiate the loading of the data into the database.
As an option, the present invention may further transmit a signal to initiate a clean archive program for deleting files from an archive directory, a signal to initiate a table extract program for extracting data from tables stored in the database, a signal to initiate a trigger reporting program for generating reports, a signal to initiate a purge record program for deleting records from the database, and a signal to initiate a database backup program for backing up data stored on the database.
The following subsections describe an embodiment of the present invention that controls the collection, manipulation and storage of network performance data and network event data of a network with service assurance capabilities and provides an exemplary step-by-step overview of the flow of data from collection to when it's loaded into the database. FIG. 6 is a flowchart that provides an overview of a data collection process of one embodiment of the present invention.
In this exemplary embodiment, Seagate NerveCenter will be used as the Event Correlator and Manager 600 and BMC Patrol will be used as the Patroller 602.
Applications
The data collection is started by the network monitoring applications creating their ASCII text data files. These files are generally stored locally on the machines they are running on. Specifics on where these files should be stored are located in the installation & configuration instructions for each application.
Cron
The cron 604 process daily activates the processing scripts. The .crontab file should be modified on the system which will do the actual processing of the files.
The cron process starts one script, sa_master_process_control.pl 606, which then begins the rest of the processes.
sa_master_process_control.pl
The sa_master_process_control.pl script 606 does not actually move or manipulate any of the data files, but instead starts other scripts which move the data to the appropriate locations and manipulate it into the required formats. There are 8 sequential steps to this process, each beginning after the previous one has finished. The steps are as follows:
1. clean.pl 608
2. get_data.pl
The get_data.pl script 610 is designed to move files from the systems where the data is being collected to the /sa directory structure, where it is then further processed. The script supports three options: (a) to move files locally on a Unix system, (b) to move files from a remote Unix system, and (c) to move files from a remote NT system.
The script gets its settings from a configuration file named get_data.cfg 612. The most important part is the configuration section for each application:
#******************** Host Identification Section *******************
#$NodeName~$Location~$SourcePath~$SourceExt~$TargetPath~$TmpExt~$MoveMethod~$Archive~
#$ArchivePath~$ArchiveExt~$TransferType~$Platform~$TargetExt~$DestHost~$LogFile~$UnixRemoteScript
#*********************Example*********************************
nsmmws16~remote~/opt/PATROLLER/Solaris25-sun4/remote/~.dat~/sa/dev/dat/process/~.tmp~move~yes~/opt/PATROLLER3.2/Solaris25-sun4/remote/~.old~ftp~Unix~.pat~twmmdb02~/files0/home/noc/bin/UnixRemote.log~/files0/home/noc/bin/UnixRemote.pl
Use the definition above and the example as a guide on adding more applications and host systems.
For moving files remotely from a Unix system, the UnixRemote.pl script 614 is called using remsh. Once UnixRemote.pl finishes, the verify.cmp file is used to make sure that all the files were properly transferred.
UnixRemote.pl
UnixRemote.pl 614 is called using remsh on the system from which the files are to be moved. It first deletes the *.old files, which are the data files from the previous day. Then the current data files are renamed to $node_$date_*.$extension. A verify.cmp file is created to later verify that all the files are transferred. The *.$extension files and the verify.cmp file are then transferred to the /sa directory structure, usually by ftp. Once that is done, the old files are archived by changing $extension to $extension.old, which flags them for deletion the next day.
3. extract 616
4. data_manip_wrapper.pl
The data_manip_wrapper.pl script 618 controls the actual manipulation of the data by calling a number of subscripts, each performing its own function. Each script is called when the previous script has finished.
data_manip_stage1.pl
The data_manip_stage1.pl script 620 is the main script which massages the data. If new applications are added, this is the script which will need to be modified. This script takes the individual data files in the /sa directory structure, reads in the data_manip_stage1.cfg configuration file, manipulates the data so that it is in the format defined by data_manip_stage1.cfg, and then outputs files in the form of $element.element.
calculate_stats.pl
The calculate_stats.pl script 622 calculates the mean, max, min, sample size, and standard deviation for each element file. The output of the script is $element.element.stage2 and $element.element.stat for each element file.
key_codes.pl
The key_codes.pl script 624 uses perf_metric_tb.ext and network_element_tb.ext to look up the codes associated with the node or element being monitored. It assigns the key code as the name of the file (i.e., keycode.element.stage2 and keycode.element.stat). It also produces perf_metric_time_tb.dat.
sort_elmnt_files.pl
The sort_elmnt_files.pl script 626 sorts all of the keycode.element.stage2 and keycode.element.stat files and renames them keycode.element.stage2.sorted and keycode.element.stat.sorted.
union_all.pl
The union_all.pl script 628 takes the keycode.element.stage2.sorted files and concatenates them into perf_fact_tb.dat, and takes the keycode.element.stat.sorted files and concatenates them into perf_fact_dly_tb.dat.
move_all.pl
The move_all.pl script 630 moves perf_fact_tb.dat, perf_fact_dly_tb.dat and perf_metric_time_tb.dat to /files6/ipsa/data_loads/data_files.
5. data_2_db.pl 632
6. purge_records.pl 634
7. backup.pl 636
8. trigger_reporting.pl 638
Call Script
This script does system calls and utilizes FTP to run other scripts. It calls the scripts in a specific order and logs successes and failures.
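The sequencing behavior described here (run each step in a fixed order, wait for it to finish, and log success or failure) can be sketched as below. The sketch is in Python rather than the Perl of the original scripts. The function name run_steps and the choice to continue after a failed step are assumptions, since the text does not say whether a failure stops the sequence.

```python
import subprocess


def run_steps(steps, log=print):
    """Run each command in order; log the outcome; return [(name, ok), ...].

    steps: list of (name, argv) pairs, e.g. ("clean", ["perl", "clean.pl"]).
    """
    results = []
    for name, command in steps:
        # subprocess.run blocks until the step exits, so each step starts
        # only after the previous one has finished.
        completed = subprocess.run(command)
        ok = completed.returncode == 0  # exit status 0 means success
        log(f"{name} {'was successful' if ok else 'failed to run'}")
        results.append((name, ok))
    return results
```

Passing the logger in as a parameter keeps the sketch testable; in the described system that role is played by the WriteLog and WriteErrorLog sub-routines.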
Scripts on other machines
If scripts need to be run on other machines, FTP should be used. With FTP, a configuration file is needed to house user and password information. Also, an account may have to be set up on that machine to be able to run that script.
NT Account
From the NT host:
Open Microsoft Internet Information Service (IIS), and from there open Internet Services Manager.
Right-click on SA Ftp Site. Click on the home directory tab and make sure the directory is set to a share, not local. Add the path (the Network Share path), and make sure that Write and Log Access are selected.
In the Security Accounts tab, make sure that the radio button for "Not allow anonymous connections" is clicked.
In the FTP Site tab, make sure that the connections are limited to 1.
In the Directory Security tab, make sure the radio button for Denied Access is clicked and add the domain for the server that is attempting the FTP.
UNIX
The user and password being used to FTP must be supported by the system being accessed.
FTP example
#
# Trigger daily reporting
#
use Net::FTP;

# Create $ReportingScript in /sa/dev/dat/tmp with the string
# "This file is used to trigger the daily reporting process"
open(FILE, ">$SACommon::SATempDir${ReportingScript}")
    || die "not able to open $ReportingScript ($!)";
print FILE "This file is used to trigger the daily reporting process";
close(FILE);    # close the file so its contents are written before the transfer

# Transfer the file to $NTHost
$ftp = Net::FTP->new("$NTHost");
$ftp->login("$User", "$Password");
$ftp->cwd("$NTPath");
# The backslash separator is kept from the original; it is escaped here so
# that $ReportingScript is still interpolated.
$Return = $ftp->put("$SACommon::SATempDir${ReportingScript}", "$NTPath\\$ReportingScript");
$ftp->quit;

# Process the return value (put returns a true value on success)
if ($Return) {
    &SACommon::WriteLog("FTP of $ReportingScript was successful");
} else {
    &SACommon::WriteErrorLog("FTP of $ReportingScript failed");
}

# Delete the $ReportingScript file from /sa/dev/dat/tmp
unlink("$SACommon::SATempDir${ReportingScript}");
Scripts on Local machines
The local scripts are called with a system call, and the result is then logged as a success or a failure.
System command example
#
# Call Cleanup Utility
#
# system() returns 0 when the command exits successfully
$Return = system("$SACommon::SABinDir/$CleanupScript");
if (0 == $Return) {
    &SACommon::WriteLog("$CleanupScript was successful");
} else {
    &SACommon::WriteErrorLog("$CleanupScript failed to run");
}
Order of Scripts Called
1. Cleanup Utility 608
2. Data Acquisition Utility 610
3. Table Extract 616
4. Data Manipulation 618
5. Data Loader 632
6. IPSA_ProcessTheBatchQueue.txt (Reporting SPSS) 638
7. Purge Records 634
8. Backup 636
get_data.pl script
The get_data.pl script, along with the get_data.cfg file, retrieves data from both local and remote hosts and relocates it in a common directory. The get_data.pl script also uses the UnixRemote.pl script, which resides on the remote UNIX machines from which data is being acquired. A similar script could also be designed for use with remote NT machines. The get_data.cfg file is the configuration file used to define all program-specific information and parameters. There are additional variables and settings defined in the SACommon.pm Perl module, which is located in /sa/usr/mod/. In general, SACommon.pm contains global variables and sub-routines.
Configuration File (get_data.cfg):
The following is an example of what the configuration file entries look like. Each line is read by get_data.pl and then acted on according to the logic defined in the script.
#******************** Host Identification Section (continuous line) *******************
$NodeName~$Location~$SourcePath~$SourceExt~$TargetPath~$TmpExt~$MoveMethod~$Archive~
$ArchivePath~$ArchiveExt~$TransferType~$Platform~$TargetExt~$DestHost~$LogFile~$UnixRemoteScript
#***********************Example (continuous line)********************************
nsmmws16~remote~/opt/PATROLLER/Solaris25sun4/remote/test/~.dat~/sa/dev/dat/process/~.tmp~
move~yes~/opt/PATROLLER/Solaris25sun4/remote/test/~.old~ftp~Unix~.pat~twmmdb02~/files0/home/noc/bin/UnixRemote.log~/files0/home/noc/bin/UnixRemote.pl
The information contained in each entry is unique to the data being retrieved. The "~" character is used as the field delimiter throughout the file. Following is a description of the fields that must be defined for each entry:
$NodeName: DNS name or alias for the local/remote host from which files need to be collected.
$Location: Options include "local" or "remote".
$SourcePath: Complete source path designating the directory from which data will be retrieved.
$SourceExt: File extension used to designate which files need to be collected.
$TargetPath: Complete target path designating the destination directory, on $DestHost, where files should be transferred to.
$TmpExt
$MoveMethod
$Archive
$ArchivePath
$ArchiveExt
$TransferType
$Platform
$TargetExt
$DestHost: DNS name or alias for the host where files will be transferred to. It is not necessary for the destination host to be the system calling get_data.pl.
$LogFile
$UnixRemoteScript
There is a single variable, $ScriptName, defined locally within get_data.pl. It is recommended for use with the WriteErrorLog and WriteLog sub-routines provided in SACommon.pm. This is a static variable, and should not need to be changed unless the name of the script is changed.
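To make the entry format concrete, the following Python sketch splits one delimited host entry into the sixteen named fields listed above. It is not the parsing logic of the actual script, which is written in Perl and not shown. The sketch assumes the delimiter is the ASCII tilde, and that blank lines and lines starting with "#" are skipped. It handles host entries only, not the user and password settings.

```python
# Field names in the order given by the Host Identification Section header.
FIELDS = [
    "NodeName", "Location", "SourcePath", "SourceExt", "TargetPath", "TmpExt",
    "MoveMethod", "Archive", "ArchivePath", "ArchiveExt", "TransferType",
    "Platform", "TargetExt", "DestHost", "LogFile", "UnixRemoteScript",
]


def parse_host_entries(text):
    """Parse '~'-delimited host entries, skipping blank and '#' comment lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        values = line.split("~")
        if len(values) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
        entries.append(dict(zip(FIELDS, values)))
    return entries
```

Rejecting lines with the wrong field count catches an entry that was accidentally wrapped across two lines, which the "(continuous line)" note in the example warns against.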
Data Acquisition Local and Remote
FIG. 7 illustrates an embodiment of the present invention for retrieving and relocating event and performance data of a network with service assurance capabilities. First, in operation 700, a data file is obtained from a host. The data file includes event data collected on a network and/or performance data collected on the network. In operation 702, a verification control file is created that is associated with the data file. The data file is renamed in operation 704 and copied to a target directory in operation 706. Thereafter, the copying of the renamed data file is verified with the verification control file in operation 708.
In one aspect of the present invention, information is added to the renamed data file relating to host name, date, and application extension. Further, the type of host is determined from which the data file was obtained. The type of host is selected from a local host and/or a remote host.
In another aspect of the present invention, a platform of the determined host is determined when the host is determined to be a remote host. Further, the verification control file may be copied with the renamed data file to the target directory. In addition, the data file may be archived.
In yet another embodiment, previously archived data files may be deleted. Further, a move method may be determined for the data file. A temporary copy of the data file may be created based on the move method. Such move method includes moving the data file and/or copying the data file.
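The rename, copy, and verify flow of FIG. 7 can be sketched for the local case as follows. This Python sketch is illustrative only. The function name relocate, the node_date_name naming pattern, and the contents of the verification control file (one "name size" line per file) are assumptions; the text does not specify what the control file contains.

```python
import shutil
from pathlib import Path


def relocate(source_dir, target_dir, node, date,
             source_ext=".dat", target_ext=".pat"):
    """Copy renamed data files to target_dir; return names that fail verification."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    expected = []
    for data_file in sorted(source_dir.glob(f"*{source_ext}")):
        # Renaming adds host name, date, and application extension.
        new_name = f"{node}_{date}_{data_file.stem}{target_ext}"
        expected.append((new_name, data_file.stat().st_size))
        shutil.copy2(data_file, target_dir / new_name)
    # Verification control file, copied alongside the data files.
    control = source_dir / "verify.cmp"
    control.write_text("".join(f"{name} {size}\n" for name, size in expected))
    shutil.copy2(control, target_dir / control.name)
    # Verify: every listed file must exist in the target with the same size.
    return [name for name, size in expected
            if not (target_dir / name).is_file()
            or (target_dir / name).stat().st_size != size]
```

An empty return value means every file listed in the control file arrived intact; a remote transfer would perform the same check after the FTP step.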
FIG. 8 illustrates operation of the main data acquisition script (get_data.pl) 800 for each entry in the configuration file (get_data.cfg) 802. As shown in FIG. 8, the main data acquisition script (get_data.pl) reads the configuration file (get_data.cfg) once, at the beginning of processing, and stores necessary information in memory. Specifically, user and password are stored as variables, and the host entries are stored as an array. Each host entry is then processed in order:
1. If the data files are local, the get_data.pl script will execute the following steps.
- Remove old archived files. 804
- If the files are to be moved, they are renamed with .tmp. 806
- Parse the system data and reconfigure for file renaming. 808
- Rename the files, add machine name, date and application extension. 810
- Copy the files to the target directory. 812
- Archive the moved files, rename by appending .old to the file name. 814
2. If the data files are remote and Unix, the remote shell command is used to run the UnixRemote.pl script, which resides on the remote machines. UnixRemote.pl will execute and perform the following steps remotely.
- Delete old archived files. 816
- If the files are to be moved, they are renamed with .tmp. 818
- Parse the system data and reconfigure for file renaming. 820
- Rename the files, add machine name, date and application extension. 822
- Ftp the files to the target directory and create a verification file. 824
- Archive the moved files, rename by appending .old to the file name. 826
- It returns to get_data.pl and get_data.pl continues to execute the verification of the FTP. (To be able to execute FTP commands, the Net::FTP Perl module must be used). 828
3. If the data files are remote and the platform is NT, the remote shell command will be used to run a script that will reside on the remote machine. The script will execute the following steps remotely.
- Delete old archived files. 830
- If the files are to be moved, they are renamed with .tmp. 832
- Parse the system data and reconfigure for file renaming. 834
- Rename the files, add machine name, date and application extension. 836
- Ftp the files to the target directory and create a verification file. 838
- Archive the moved files, rename by appending .old to the file name. 840
- It returns to get_data.pl and get_data.pl continues to execute the verification of the FTP. (To be able to execute FTP commands, the Net::FTP Perl module must be used). 842
Data Manipulation Scripts
The scripts contained in the general function of data manipulation are: data_manip_stage1.pl, calculate_stats.pl, key_codes.pl, sort_elmnt_files.pl, union_all.pl and move_all.pl. Together, these scripts take the data from /sa/dev/dat/process, manipulate it, perform mathematical computations on it, and then move it to /files6/ipsa/data_loads/data_files so that the SQL loader can load it into the database.
Detailed Description of Modules
data_manip_stage1.pl
The data_manip_stage1.pl script is the main script which manipulates the data. If new applications are added, this script will need to be modified. It takes the individual data files named in the configuration files, manipulates the data so that it is in the format defined by data_manip_stage1.cfg, and then outputs files in the form of <element>.element. This manipulation of data has several steps:
- Filters out records that contain metrics not used for reporting.
- Filters out fields not used for reporting.
- Reorders the remaining fields into the order <element>, <metric>, <instance>, <date-time stamp>, <value>.
- Converts raw metric names to standard metric names (according to a table).
- Converts date-time stamps to standard.
calculate_stats.pl
The calculate_stats.pl script calculates the mean, maximum, minimum, sample size, and standard deviation for each metric in each *.element file. The output of the script is <element>.element.stage2 and <element>.element.stat for each .element file. The format of the *.stat files is <element>, <metric>, <instance>, <min>, <max>, <mean>, <standard dev>, <sample size>.
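The statistics step above can be illustrated with a short Python sketch. It is not the calculate_stats.pl script; the function name `calculate_stats` is hypothetical, the input is assumed to be in the reordered <element>, <metric>, <instance>, <date-time stamp>, <value> form, and the population standard deviation (dividing by the sample size) is an assumption, since the text does not say which form is used.

```python
import math
from collections import defaultdict


def calculate_stats(lines):
    """Return one (element, metric, instance, min, max, mean, std_dev, n)
    row per metric found in the reordered input lines."""
    groups = defaultdict(list)
    for line in lines:
        element, metric, instance, _stamp, value = line.strip().split(",")
        groups[(element, metric, instance)].append(float(value))

    rows = []
    for (element, metric, instance), values in sorted(groups.items()):
        n = len(values)
        mean = sum(values) / n
        # Assumption: population standard deviation (divide by n).
        std_dev = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
        rows.append((element, metric, instance,
                     min(values), max(values), mean, std_dev, n))
    return rows
```

Each returned row maps directly onto one line of a *.stat file in the field order given above.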
key_codes.pl
The key_codes.pl script uses perf_metric_tb.ext and network_element_tb.ext to look up the codes associated with the node or element being monitored. It assigns the key code to the name of the file in place of the element name (i.e. <keycode>.element.stage2 and <keycode>.element.stat). It also produces perf_metric_time_tb.dat, which is a list of all of the unique date-time stamps appearing in the raw data and their corresponding epoch times (the number of seconds since Jan. 1, 1970).
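The list of unique date-time stamps and epoch times described above could be built along the following lines. This is a Python sketch under stated assumptions: the stamps use the 1999-07-30 15:42:13 style seen in the Patroller samples, they are interpreted as UTC (the text does not state a time zone), and the names `to_epoch` and `build_time_table` are hypothetical.

```python
import calendar
import time


def to_epoch(stamp: str) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS' to seconds since Jan. 1, 1970.

    Assumption: the stamp is interpreted as UTC.
    """
    parsed = time.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return calendar.timegm(parsed)


def build_time_table(stamps):
    """Return sorted (stamp, epoch) pairs with duplicate stamps removed."""
    return [(s, to_epoch(s)) for s in sorted(set(stamps))]
```

Removing duplicates here mirrors the requirement that each date-time stamp appear only once in the time table, even when several elements report at the same instant.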
sort_elmnt_files.pl
The sort_elmnt_files.pl script sorts all of the <keycode>.element.stage2 and <keycode>.element.stat files and renames them <keycode>.element.stage2.sorted and <keycode>.element.stat.sorted.
union_all.pl
The union_all.pl script concatenates the <keycode>.element.stage2.sorted files into perf_fact_tb.dat and the <keycode>.element.stat.sorted files into perf_fact_dly_tb.dat.
move_all.pl
The move_all.pl script moves perf_fact_tb.dat, perf_fact_dly_tb.dat and perf_metric_time_tb.dat to /files6/ipsa/data_loads/data_files.
Data Manipulator Sample Raw Data
Patroller Data
File 1: nsmmpe29_1999_902194516.dat
,,nsmmpe29,HISTORY_Propagator,HISTORY_Propagator,GetHistoryData,1999-08-03 19:35:04,0
,,nsmmpe29,NT_CPU,CPU_0,CPUprcrProcessorTimePercent,1999-08-03 19:42:51,1.83103
,,nsmmpe29,NT_CPU,CPU_0,CPUprcrProcessorTimePercent,1999-08-03 19:57:51,2.2381
,,nsmmpe29,NT_CPU,CPU_0,CPUprcrProcessorTimePercent,1999-08-03 20:12:52,3.98922
,,nsmmpe29,NT_CPU,CPU_0,CPUprcrProcessorTimePercent,1999-08-03 20:27:52,1.52942
,,nsmmpe29,NT_MEMORY,NT_MEMORY,MEMmemAvailableBytes,1999-08-03 19:34:41,4.14453
,,nsmmpe29,NT_MEMORY,NT_MEMORY,MEMmemAvailableBytes,1999-08-03 19:49:41,2.4375
,,nsmmpe29,NT_MEMORY,NT_MEMORY,MEMmemAvailableBytes,1999-08-03 20:04:41,1.85156
,,nsmmpe29,NT_MEMORY,NT_MEMORY,MEMmemAvailableBytes,1999-08-03 20:19:41,1.19531
,,nsmmpe29,NT_MEMORY,NT_MEMORY,MEMmemAvailableBytes,1999-08-03 20:34:41,1.14844
,,nsmmpe29,NT_NETWORK,2,NETniPcktsPerSec,1999-08-03 19:46:55,10.9169
,,nsmmpe29,NT_NETWORK,2,NETniPcktsPerSec,1999-08-03 20:01:55,11.5108
,,nsmmpe29,NT_NETWORK,2,NETniPcktsPerSec,1999-08-03 20:16:55,11.514
,,nsmmpe29,NT_NETWORK,2,NETniPcktsPerSec,1999-08-03 20:31:55,12.0128
File 2: nsmmws09_1999_901881775.dat
,,nsmmws09,CPU,CPU,CPUCpuUtil,1999-07-31 04:46:42,4
,,nsmmws09,CPU,CPU,CPUCpuUtil,1999-07-31 05:01:47,2
,,nsmmws09,CPU,CPU,CPUCpuUtil,1999-07-31 05:16:52,3
,,nsmmws09,CPU,CPU,CPUCpuUtil,1999-07-31 05:31:57,2
,,nsmmws09,DISK,sd4,DSKPercentBusy,1999-07-31 04:42:14,0
,,nsmmws09,DISK,sd4,DSKPercentBusy,1999-07-31 04:57:14,4
,,nsmmws09,DISK,sd4,DSKPercentBusy,1999-07-31 05:12:15,0
,,nsmmws09,DISK,sd4,DSKPercentBusy,1999-07-31 05:27:15,0
,,nsmmws09,MEMORY,MEMORY,MEMFreeMem,1999-07-31 04:46:42,37184
,,nsmmws09,MEMORY,MEMORY,MEMFreeMem,1999-07-31 05:01:47,37172
,,nsmmws09,MEMORY,MEMORY,MEMFreeMem,1999-07-31 05:16:52,37168
,,nsmmws09,MEMORY,MEMORY,MEMFreeMem,1999-07-31 05:31:57,37168
,,nsmmws09,NETWORK,1e0,NETPacketsIn,1999-07-31 04:43:05,6660
,,nsmmws09,NETWORK,1e0,NETPacketsIn,1999-07-31 04:58:06,6423
,,nsmmws09,NETWORK,1e0,NETPacketsIn,1999-07-31 05:13:06,6822
,,nsmmws09,NETWORK,1e0,NETPacketsIn,1999-07-31 05:28:07,6285
,,nsmmws09,NETWORK,1e0,NETPacketsOut,1999-07-31 04:43:05,168
,,nsmmws09,NETWORK,1e0,NETPacketsOut,1999-07-31 04:58:06,118
,,nsmmws09,NETWORK,1e0,NETPacketsOut,1999-07-31 05:13:06,199
,,nsmmws09,NETWORK,1e0,NETPacketsOut,1999-07-31 05:28:07,118
File 3: twmmnt02_1999_901997338.dat
,,twmmnt02,HISTORY_Propagator,HISTORY_Propagator,GetHistoryData,1999-08-01 12:48:56,0
,,twmmnt02,NT_CPU,CPU_0,CPUprcrProcessorTimePercent,1999-08-01 13:02:42,0.835579
,,twmmnt02,NT_CPU,CPU_0,CPUprcrProcessorTimePercent,1999-08-01 13:17:42,0.776552
,,twmmnt02,NT_CPU,CPU_0,CPUprcrProcessorTimePercent,1999-08-01 13:32:42,0.795792
,,twmmnt02,NT_CPU,CPU_0,CPUprcrProcessorTimePercent,1999-08-01 13:47:42,0.786594
,,twmmnt02,NT_MEMORY,NT_MEMORY,MEMmemAvailableBytes,1999-08-01 13:02:49,21.4688
,,twmmnt02,NT_MEMORY,NT_MEMORY,MEMmemAvailableBytes,1999-08-01 13:17:49,21.5703
,,twmmnt02,NT_MEMORY,NT_MEMORY,MEMmemAvailableBytes,1999-08-01 13:32:49,16.7617
,,twmmnt02,NT_MEMORY,NT_MEMORY,MEMmemAvailableBytes,1999-08-01 13:47:49,16.8359
,,twmmnt02,NT_NETWORK,2,NETniPcktsPerSec,1999-08-01 13:02:40,14.9904
,,twmmnt02,NT_NETWORK,2,NETniPcktsPerSec,1999-08-01 13:17:40,15.0483
,,twmmnt02,NT_NETWORK,2,NETniPcktsPerSec,1999-08-01 13:32:40,14.8512
,,twmmnt02,NT_NETWORK,2,NETniPcktsPerSec,1999-08-01 13:47:40,15.0225
,,twmmnt02,NT_PHYSICAL_DISKS,DISK_0,PDpdDiskTimePercent,1999-08-01 13:02:40,0
,,twmmnt02,NT_PHYSICAL_DISKS,DISK_0,PDpdDiskTimePercent,1999-08-01 13:17:40,0
,,twmmnt02,NT_PHYSICAL_DISKS,DISK_0,PDpdDiskTimePercent,1999-08-01 13:32:40,0
,,twmmnt02,NT_PHYSICAL_DISKS,DISK_0,PDpdDiskTimePercent,1999-08-01 13:47:40,0
Event Correlator and Manager Data
File 1: ifdata.log
Time=Aug. 11, 1999 09:39:27 Tue, LogId=22248, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt04e2-1, Alarm=IfDataLogger, Ostate=Ground, Trigger=ifData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=26, Object=ifEntry, Attribute ifSpeed=100000000 Attribute ifInOctets=464859949 Attribute ifOutOctets=312935185
Time=Aug. 11, 1999 09:39:27 Tue, LogId=22249, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt04e2-1, Alarm=IfDataLogger, Ostate=Ground, Trigger=ifData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=27, Object=ifEntry, Attribute ifSpeed=100000000 Attribute ifInOctets=0 Attribute ifOutOctets=0
Time=Aug. 11, 1999 09:39:27 Tue, LogId=22250, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt04e2-1, Alarm=IfDataLogger, Ostate=Ground, Trigger=ifData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=28, Object=ifEntry, Attribute ifSpeed=100000000 Attribute ifInOctets=0 Attribute ifOutOctets=0
Time=Aug. 11, 1999 09:39:27 Tue, LogId=22251, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt04e2-1, Alarm=IfDataLogger, Ostate=Ground, Trigger=ifData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=29, Object=ifEntry, Attribute ifSpeed=3705032704 Attribute ifInOctets=0 Attribute ifOutOctets=0
Time=Aug. 11, 1999 09:39:27 Tue, LogId=22252, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt03, Alarm=IfDataLogger, Ostate=Ground, Trigger=ifData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=1, Object=ifEntry, Attribute ifSpeed=10000000 Attribute ifInOctets=1522731264 Attribute ifOutOctets=2554292181
Time=Aug. 11, 1999 09:39:27 Tue, LogId=22253, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt03, Alarm=IfDataLogger, Ostate=Ground, Trigger=ifData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=2, Object=ifEntry, Attribute ifSpeed=10000000 Attribute ifInOctets=0 Attribute ifOutOctets=0
Time=Aug. 11, 1999 09:39:27 Tue, LogId=22254, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt03, Alarm=IfDataLogger, Ostate=Ground, Trigger=ifData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=3, Object=ifEntry, Attribute ifSpeed=10000000 Attribute ifInOctets=0 Attribute ifOutOctets=0
Time=Aug. 11, 1999 09:39:27 Tue, LogId=22255, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt03, Alarm=IfDataLogger, Ostate=Ground, Trigger=ifData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=4, Object=ifEntry, Attribute ifSpeed=10000000 Attribute ifInOctets=0 Attribute ifOutOctets=0
Time=Aug. 11, 1999 09:39:27 Tue, LogId=22256, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt03, Alarm=IfDataLogger, Ostate=Ground, Trigger=ifData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=5, Object=ifEntry, Attribute ifSpeed=10000000 Attribute ifInOctets=958056 Attribute ifOutOctets=3610555
File 2: lsystemdata.log
Time=Aug. 11, 1999 09:18:26 Tue, LogId=22153, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt04e2-1, Alarm=lsystemDataLogger, Ostate=Ground, Trigger=lsystemData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=0, Object=lsystem, Attribute busyPer=4
Time=Aug. 11, 1999 09:18:28 Tue, LogId=22154, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt03, Alarm=lsystemDataLogger, Ostate=Ground, Trigger=lsystemData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=0, Object=lsystem, Attribute busyPer=4
Time=Aug. 11, 1999 09:33:38 Tue, LogId=22219, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt04e2-1, Alarm=lsystemDataLogger, Ostate=Ground, Trigger=lsystemData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=0, Object=lsystem, Attribute busyPer=2
Time=Aug. 11, 1999 09:33:40 Tue, LogId=22220, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt03, Alarm=lsystemDataLogger, Ostate=Ground, Trigger=lsystemData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=0, Object=lsystem, Attribute busyPer=7
Time=Aug. 11, 1999 09:48:43 Tue, LogId=22283, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt04e2-1, Alarm=lsystemDataLogger, Ostate=Ground, Trigger=lsystemData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=0, Object=lsystem, Attribute busyPer=2
Time=Aug. 11, 1999 09:48:45 Tue, LogId=22284, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt03, Alarm=lsystemDataLogger, Ostate=Ground, Trigger=lsystemData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=0, Object=lsystem, Attribute busyPer=10
Time=Aug. 11, 1999 10:03:49 Tue, LogId=22347, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt04e2-1, Alarm=lsystemDataLogger, Ostate=Ground, Trigger=lsystemData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=0, Object=lsystem, Attribute busyPer=4
Time=Aug. 11, 1999 10:03:51 Tue, LogId=22348, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt03, Alarm=lsystemDataLogger, Ostate=Ground, Trigger=lsystemData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=0, Object=lsystem, Attribute busyPer=6
Time=Aug. 11, 1999 10:18:56 Tue, LogId=22417, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt04e2-1, Alarm=lsystemDataLogger, Ostate=Ground, Trigger=lsystemData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=0, Object=lsystem, Attribute busyPer=3
Time=Aug. 11, 1999 10:18:58 Tue, LogId=22418, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt03, Alarm=lsystemDataLogger, Ostate=Ground, Trigger=lsystemData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=0, Object=lsystem, Attribute busyPer=3
Desired Metrics and their Standard or Derived Metrics
TABLE 1
| SNMP metrics | Patroller metrics | Standard or Derived Metric |
| busyPer | n/a | CpuBusy_Percent |
| ifInOctets, ifOutOctets, ifSpeed | n/a | InterfaceUtilization_Percent |
| ifInOctets, ifOutOctets, ifSpeed | n/a | InterfaceUtilization_BPS |
| n/a | CPUCpuUtil | CpuUtilization_Percent |
| n/a | MEMFreeMem | MemoryFree_Bytes |
| n/a | NETPacketsIn, NETPacketsOut | NetworkTraffic_PPS |
| n/a | DSKPercentBusy | DiskBusy_Percent |
| n/a | CPUprcrProcessorTimePercent | CpuProcessorTime_Percent |
| n/a | MEMmemAvailableBytes | MemoryFree_Bytes |
| n/a | NETniPcktsPerSec | NetworkTraffic_PPS |
| n/a | PDpdDiskTimePercent | DiskTime_Percent |
| n/a | ftpResponseTime | FtpResponseTime_Seconds |
| n/a | smtpResponseTime | SmtpResponseTime_Seconds |
| n/a | nntpResponseTime | NntpResponseTime_Seconds |
| n/a | httpResponseTime | HttpResponseTime_Seconds |
Data Loader
The purpose of the data loader in the SA toolkit is to perform recommended operations on the gathered event data before it is loaded into the database. Primarily, the raw data files need to be concatenated, reformatted, translated to key codes, and sorted. In addition, the data loader will also perform statistical calculations on the data for daily "roll-up" reports to be loaded into the database.
Body Section
Patroller Data
Performance data from Patroller collectors is in the following text log file format:
,,<element name>,<object type>,<instance>,<metric name>,<date time>,<metric value>
A sample file (found in nsmmws16:/opt/PATROLLER3.2/Solaris25-Sun4/remote/):
,,nsmmws09,CPU,CPU,CPUCpuUtil,1999-07-30 14:58:13,25
,,nsmmws09,CPU,CPU,CPUCpuUtil,1999-07-30 15:13:18,52
,,nsmmws09,CPU,CPU,CPUCpuUtil,1999-07-30 15:28:24,32
,,nsmmws09,DISK,fd0,DSKPercentBusy,1999-07-30 15:03:15,0
,,nsmmws09,DISK,sd4,DSKPercentBusy,1999-07-30 15:03:15,0
,,nsmmws09,DISK,sd4,DSKPercentBusy,1999-07-30 15:18:15,4
,,nsmmws09,DISK,sd4,DSKPercentBusy,1999-07-30 15:33:15,0
,,nsmmws09,MEMORY,MEMORY,MEMFreeMem,1999-07-30 14:58:13,24008
,,nsmmws09,MEMORY,MEMORY,MEMFreeMem,1999-07-30 15:13:18,23548
,,nsmmws09,MEMORY,MEMORY,MEMFreeMem,1999-07-30 15:28:24,23544
After parsing and reordering, the format is: <element>, <metric name>, <instancename>, <date time>, <metric value>
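The parsing and reordering of a Patroller record can be sketched in Python as shown here. This is an illustration only: the names `reorder_patroller`, `filter_and_reorder` and `DESIRED_METRICS` are hypothetical, and the set of desired metrics is a small subset taken from the Patroller column of the metrics table.

```python
# Hypothetical subset of the Patroller metrics kept for reporting.
DESIRED_METRICS = {"CPUCpuUtil", "MEMFreeMem", "DSKPercentBusy",
                   "NETPacketsIn", "NETPacketsOut"}


def reorder_patroller(line: str):
    """Turn ',,<element>,<object type>,<instance>,<metric>,<date time>,<value>'
    into (element, metric, instance, date_time, value)."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != 8:
        raise ValueError(f"expected 8 comma-separated fields, got {len(fields)}")
    _, _, element, _object_type, instance, metric, date_time, value = fields
    return element, metric, instance, date_time, value


def filter_and_reorder(lines):
    """Drop records whose metric is not desired, then reorder the rest."""
    records = (reorder_patroller(line) for line in lines if line.strip())
    return [r for r in records if r[1] in DESIRED_METRICS]
```

The two leading empty fields and the object type are dropped, which is why eight input fields become five output fields.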
Finally, the element name and metric name must be translated to their corresponding key codes and the date time must be converted to the format consistent with the database: 16838, 236998718,6293,54
Event Correlator and Manager (ECM) Data
Performance data from ECM collectors is in the following text log file format (note: fields in italics do not have values for poll data):
Time=<datetime>, LogId=<LogID>, Severity=<Severity>, PropertyGroup=<PropertyGroup>, Node=<Node>, Alarm=<Alarm>, Ostate=<Ostate>, Trigger=<Trigger>, Nstate=<Nstate>, TrapTime=<TrapTime>, GenericTrapNumber=<GenericTrapNumber>, Enterprise=<Enterprise>, SpecificTrapNumber=<SpecificTrapNumber>, Instance=<[instance value]>, Object=<[base_object_name]>, Attribute <[attributename]>=value . . .
A sample file (found in /opt/seasoft/userfiles/logs/ifdata.log):
Time=Aug. 11, 1999 09:39:27 Tue, LogId=22248, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt04e2-1, Alarm=IfDataLogger, Ostate=Ground, Trigger=ifData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=26, Object=ifEntry, Attribute ifSpeed=100000000 Attribute ifInOctets=464859949 Attribute ifOutOctets=312935185
Time=Aug. 11, 1999 09:39:27 Tue, LogId=22249, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt04e2-1, Alarm=IfDataLogger, Ostate=Ground, Trigger=ifData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=27, Object=ifEntry, Attribute ifSpeed=100000000 Attribute ifInOctets=0 Attribute ifOutOctets=0
Time=Aug. 11, 1999 09:39:27 Tue, LogId=22250, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt04e2-1, Alarm=IfDataLogger, Ostate=Ground, Trigger=ifData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=28, Object=ifEntry, Attribute ifSpeed=100000000 Attribute ifInOctets=0 Attribute ifOutOctets=0
Time=Aug. 11, 1999 09:39:27 Tue, LogId=22251, Severity=Normal, PropertyGroup=SA_Router_Collection_Group, Node=nsmmrt04e2-1, Alarm=IfDataLogger, Ostate=Ground, Trigger=ifData, Nstate=Ground, TrapTime=, GenericTrapNumber=, Enterprise=, SpecificTrapNumber=,Instance=29, Object=ifEntry, Attribute ifSpeed=3705032704 Attribute ifInOctets=0 Attribute ifOutOctets=0
After parsing and reordering, the format is: <element>, <metric name>, <instance name>, <datetime>, <metric value>
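Because an ECM record carries several "Attribute name=value" pairs and a time stamp that itself contains a comma, it cannot simply be split on commas. A minimal Python sketch of one way to parse and reorder such a record follows; the function name `reorder_ecm` and the use of regular expressions are assumptions for illustration, not the method of the data manipulation scripts.

```python
import re


def reorder_ecm(record: str):
    """Return one (element, metric, instance, datetime, value) tuple
    for every 'Attribute name=value' pair in an ECM log record."""
    time_match = re.search(r"Time=(.*?),\s*LogId=", record)
    node_match = re.search(r"Node=([^,]+)", record)
    instance_match = re.search(r"Instance=([^,]*)", record)
    if not (time_match and node_match and instance_match):
        raise ValueError("record is missing Time, Node or Instance")

    date_time = time_match.group(1).strip()
    element = node_match.group(1).strip()
    instance = instance_match.group(1).strip()
    attributes = re.findall(r"Attribute\s+(\w+)=(\S+)", record)
    return [(element, name, instance, date_time, value)
            for name, value in attributes]
```

A single interface record therefore expands into three reordered lines, one each for ifSpeed, ifInOctets and ifOutOctets.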
Finally, the element name and metric name must be translated to their corresponding key codes and the date time must be converted to the format consistent with the database: 16838, 236998718,6293,54
High Level Architecture Description
FIG. 9 illustrates an exemplary configuration of an architecture 900 of the present invention for manipulating data using a Patroller and an ECM.
Load Config File 902
Inputs: Configuration file.
Description: This subroutine will open the configuration file, read in the required values to three arrays and close the configuration file.
Outputs: Three arrays: @SectionNames, @SectionMarker, @SectionValues. @SectionNames will contain all of the section titles found in the config file (SECTION_GENERAL, SECTION_CONCATENATE_FILES, SECTION_REORDER_FIELDS . . . ); @SectionValues will contain each line from the config file (that isn't a section header, a blank line, or a comment) as a single element; @SectionMarker will contain the index number of the last line of each section. This way, just the elements of the @SectionValues array that are needed for a given functional block can be pulled out.
Concatenate All Files 904
Inputs: All log files in the directory sa/prd/dat/process/ECM and sa/prd/dat/process/Patroller. ECM data will have a .nrc extension, Patroller data will have a .pat extension.
Description: This subroutine will concatenate all files from a given source into one master file. For each file in the directory (/ECM and /Patroller), it will read in each line, and:
- Concatenate to master (actually done last) 906
- Filter to only include desired metrics (delete line if it isn't listed as a desired metric) 908
- Reorder and drop fields (split on "," and put in proper order: <element>, <metric>, <time>, value). 910
- Format non-standard fields into standard comma separated format. For each element file with a nonstandard field format (such as ECM's dual value fields with a metric=value format), this subroutine will convert these nonstandard fields to standard fields. 912
Outputs: The files master.nrc and master.pat.
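The three-array layout produced by the Load Config File subroutine (902) can be illustrated with a short Python sketch. The config file syntax is not given in the text, so two assumptions are made here: section headers are lines beginning with "SECTION_", and comments begin with "#". The function names `load_config` and `section_slice` are hypothetical.

```python
def load_config(lines):
    """Return (section_names, section_marker, section_values).

    section_marker[i] is the index in section_values of the last line
    belonging to section i.
    """
    section_names, section_marker, section_values = [], [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):   # assumed comment syntax
            continue
        if line.startswith("SECTION_"):        # assumed header syntax
            if section_names:
                section_marker.append(len(section_values) - 1)
            section_names.append(line)
        else:
            section_values.append(line)
    if section_names:
        section_marker.append(len(section_values) - 1)
    return section_names, section_marker, section_values


def section_slice(names, marker, values, wanted):
    """Pull out only the values that belong to the section named wanted."""
    i = names.index(wanted)
    start = marker[i - 1] + 1 if i > 0 else 0
    return values[start:marker[i] + 1]
```

Storing only the last index of each section is enough, because each section starts one position after the previous section ends.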
Generate Element Files 914
Inputs: master.nrc and master.pat (proper order, comma separated, all text).
Description: For each master file, this subroutine will read each line, create an element file for each unique element, and write the line to the appropriate file.
Outputs: The files element1.elemnrc, element2.elemnrc, . . . (for all elements found in master.nrc) and the files element1.elempat, element3.elempat (for all elements found in master.pat).
Convert Date-Time Stamps to Standard 916
Inputs: Element files (*.elemnrc, *.elempat, proper order, comma separated, all text).
Description: Convert the current date time stamps to a standard format (e.g. the 1999-07-30 15:42:13 format found in Patroller files would be converted to . . . ).
Outputs: Element files.
Convert to Standard Metric Names 918
Inputs: Element files (*.elemnrc, *.elempat, proper order, comma separated, all text).
Description: For each element file, this function will look up each metric name in a hash or flat file. If found, it will convert it to standard. Otherwise, the name is already standard.
Outputs: (Expanded) element files.
Sort Element Files by Metric Time 920
Inputs: Element files (*.elemnrc, *.elempat, proper order, comma separated, all text).
Description: This subroutine will perform an ordered sort on each element file, sorting first by metric, then by time stamp.
Outputs: Element files.
Calculate Derived Fields 922
Inputs: Element files (*.elemnrc, *.elempat, proper order, comma separated, all text, sorted by metric and timestamp).
Description: For each element file, this subroutine will read each line and search for all metrics (at a given time) required to calculate derived metrics, calculate the given derived metric, create a new line for this derived metric, and append this line to the end of the file.
Outputs: Element files.
Calculate Statistical Data and Create Rollup Files 924
Inputs: Element files (*.elemnrc, *.elempat, proper order, comma separated, all text, sorted by metric and timestamp, derived metrics at end of file).
Description: For each element file, this subroutine will 1) read each line for a given metric, 2) store each value into a temp array, 3) calculate min, max, mean, standard dev., and sample size, 4) create a new line (<element>, <metric>, <time>, <min>, <max>, <mean>, <stddev>, <n>), 5) write this line to a newly created rollup file and 6) repeat for the next metric.
Outputs: Original element files (unchanged), rollup files stats.elemdailypat, stats.elemdailynrc.
Convert all Text Fields to Key Codes 926
Inputs: Element files (*.elemnrc, *.elempat, proper order, comma separated, all text, sorted by metric and timestamp, derived metrics at end of file), and rollup files.
Description: For each file, this subroutine will read each line, convert each text field to a key code (according to a lookup table in a hash or file, except for the time field, which will be calculated), and write the line to a temp file. When all lines have been converted, the source file will be deleted, and the temp file will be moved to the source file.
For each file and for each unique time stamp, create a new line with the new calculated time key cd (<perf_time_key_cd>, <dt>, <day>, <mon>, <yr>, <hour>, <min>, <sec>, <dow>) and write it to the perf_metric_time_tb.dat file. (This file should then be collapsed to remove duplicate time stamps from different elements). 928
Outputs: (Encoded) element files and rollup files.
Union All Files of Same Element 930
Inputs: Element files (*.elemnrc, *.elempat, proper order, comma separated, all key codes, sorted by metric and timestamp, derived metrics at end of file), and rollup files (all key codes).
Description: This subroutine will concatenate all files of the same element, regardless of collector source (*.pat, *.nrc, . . . ), into *.elemall files.
Outputs: All-Element files: element1.elemall, element2.elemall, . . .
Sort Each Element File 932
Inputs: All-Element files (see A.10).
Description: This subroutine will sort each file first by metric, then by time stamp.
Outputs: All-Element files.
Union All Files 934
Inputs: All-Element files, rollup files (see A.10).
Description: This subroutine will concatenate all *.elemall files into perf_fact_tb.dat, and all rollup files into perf_fact_daily_tb.dat.
Outputs: Database load files: perf_fact_tb.dat, perf_fact_daily_tb.dat.
Move Files 936
Inputs: All files ready to be archived or moved to the SQL loader (perf_fact_tb.dat, perf_fact_daily_tb.dat, perf_metric_time_tb.dat).
Description: This subroutine copies and/or moves files to any arbitrary location on a local drive.
Outputs: Copies and/or moves files to /files6/ipsa/data_loads/data_files/ on twmmdb02.
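Two of the subroutines above, Generate Element Files (914) and Sort Element Files by Metric Time (920), can be sketched in Python as follows. The sketch works on in-memory lists rather than files, assumes the five-field <element>, <metric>, <instance>, <date time>, <value> layout with time stamps that sort correctly as text, and uses hypothetical function names.

```python
from collections import defaultdict


def generate_element_files(master_lines):
    """Group master-file lines by element name (the first field)."""
    by_element = defaultdict(list)
    for line in master_lines:
        line = line.strip()
        if line:
            by_element[line.split(",", 1)[0]].append(line)
    return by_element


def sort_element_lines(lines):
    """Sort first by metric, then by time stamp.

    Assumption: stamps such as '1999-07-30 15:42:13' order correctly
    when compared as plain strings.
    """
    def sort_key(line):
        _element, metric, _instance, stamp, _value = line.split(",")
        return (metric, stamp)

    return sorted(lines, key=sort_key)
```

Sorting by (metric, time stamp) keeps all samples of one metric contiguous, which is what the later rollup step relies on when it reads each line for a given metric.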
| SNMP metrics | Patroller metrics | Standard or Derived Metric |
| busyPer | n/a | CpuBusy_Percent |
| ifInOctets, ifOutOctets, ifSpeed | n/a | InterfaceUtilization_Percent |
| ifInOctets, ifOutOctets, ifSpeed | n/a | InterfaceUtilization_BPS |
| n/a | CPUCpuUtil | CpuUtilization_Percent |
| n/a | MEMFreeMem | MemoryFree_Bytes |
| n/a | NETPacketsIn, NETPacketsOut | NetworkTraffic_PPS |
| n/a | DSKPercentBusy | DiskBusy_Percent |
| n/a | CPUprcrProcessorTimePercent | CpuProcessorTime_Percent |
| n/a | MEMmemAvailableBytes | MemoryFree_Bytes |
| n/a | NETniPcktsPerSec | NetworkTraffic_PPS |
| n/a | PDpdDiskTimePercent | DiskTime_Percent |
| n/a | ftpResponseTime | FtpResponseTime_Seconds |
| n/a | smtpResponseTime | SmtpResponseTime_Seconds |
| n/a | nntpResponseTime | NntpResponseTime_Seconds |
| n/a | httpResponseTime | HttpResponseTime_Seconds |
Configuration of Event Correlator and Manager
This section describes how to configure the Event Correlator and Manager (UNIX) to collect performance data. It covers the process of adding property groups, creating polls, and building alarms to collect basic performance statistics from Cisco routers. Although the examples are specific to Cisco routers, the concepts and processes introduced are transferable, and can be used to implement collection for other types of SNMP devices.
Accordingly, FIG. 10 depicts an embodiment of the present invention which generates an alarm to indicate a status of a network for service assurance purposes. Such purposes can include identifying errors and faults, monitoring system resources, anticipating problems, etc. Once a parameter of a network that is to be monitored is selected in operation 1000, a triggering level of the parameter is determined in operation 1002. In operation 1004, the parameter is monitored for an occurrence of the triggering level. If the triggering level is reached, an alarm is initiated in operation 1006.
In one aspect of the present invention, the alarm is named for identification purposes. Optionally, the parameter can be adapted to be monitored for a plurality of components of the network. In such an embodiment, a separate alarm notification may be initiated for each of the components upon the monitored occurrence of the triggering level thereon.
As an option, the alarm notification may be stored upon its initiation. Preferably, the alarm notification is stored until the occurrence of a user action.
Assumptions:
- The ECM MIB definition for your network hardware has been added to the MIB file.
Starting the Client
- Start the ECM Client as 'noc', if it is not already running.
- Using the Server/Connect menu selection, connect to the network management station (e.g. nsmmws16) as 'noc'.
- Using the Server/Resync menu selection, resynchronize the ECM database with the platform database containing managed nodes (e.g. HP NNM).
Create a new 'Property Group'
Property groups allow categorization of managed devices into groups. The groups can be based on device types, location, priority, supported MIBs, business function, or any other useful characteristic. Polls or alarms can be restricted to the members of one or more property groups.
- Select the Admin/Property Group List menu option to open the Property Group List.
- Select the MIB to Group button to open the MIB to Property Group window.
- Select the MIB of interest from those listed in the MIB list. For our example, select MIB-II.
- Type a unique name in the Property Group Name field, and then select OK. For this example, enter "SA_Router_Collection_Group".
Adding a Property to a Property Group
Adding properties to an existing property group enables one to take advantage of the flexibility of property groups. Properties may need to be added if it is desired to:
- Make a property group unique by adding a property that isn't in any other group. By assigning the unique property group to selected devices, you can manage those devices differently from other devices.
- Make polls and/or alarms apply to devices belonging to that property group.
- Monitor a base object on a group of devices when the base object isn't listed as a property in the devices' property group.
For the purposes of this example, the "lsystem" base object property, defined in the cscoe.asn1 MIB definition, will be added, as well as a custom property to make the property group unique.
- From within the Property Group List window, select SA_Router_Collection_Group from the list of groups.
- Click inside the New Property field, type the name of the new property, and select Add. For our example, type "SA_Router_Collection_Property".
- Repeat the previous step to add the "lsystem" property.
- Select Save
- Close the Property Group List window
Assigning Property Groups
In the following example, SA_Router_Collection_Group, the property group created in the previous section, will be assigned to a single router.
- Open the Node Definition window for the device.
- Select the property group you would like associated with this device from the pop-up list in the Group field.
- Select Save, and close the Node Definition window.
Create a new 'Poll'
- Open the 'Polls' list from the Patroller console using the Admin/Polls . . . menu selection.
- Select 'Add' from the Polls window.
- Configure the poll
- Leave 'Object Type' as Poll
- Change 'Name' to a new name
- Set the desired poll rate.
- Type a new name for the 'Poll Trigger'. Use a name similar to the name you gave to this poll.
- Leave False Trigger blank
- Leave Property as 'NO—PROP'
- Select 'Set Poll Condition'
(a) Select the 'Base Object' from which you want to collect data. This base object directly correlates to the base object of the SNMP MIB.
(b) Select the desired 'Attributes' from the base object and drag them into the 'Expression' box using the middle mouse button. For each Attribute selected, change the 'Rel Op' field to 'present' and click the 'Add' button. When all attributes have been selected, click the 'Apply' button.
EXAMPLE POLL
Poll lsystemDataPoll
- Poll rate=15 min
- True Trigger=lsystemDataTrigger
- Property=SA_Router_Collection_Property
- Poll condition:
- Base object=lsystem
- lsystem.avgBusy1 present and
- lsystem.avgBusy5 present and
- lsystem.busyPer present
Create a new "Alarm"
- Open the 'Alarm Definitions' list from the Patroller console using the 'Admin->Alarm Definitions . . . ' selection.
- Select 'Add' from the Alarm window.
- Configure the new Alarm.
- Select a new name for the Logging alarm.
- Select the property of the device group you want to collect data from.
- Select the 'State Diagram' button.
- Using the 'New State' button, create 2 states called 'Ground' and 'Logging'. The 'Severity' of both states should be set to 'normal'.
- Using the 'New Transition' button, create a transition from 'Ground' to 'Logging'. The trigger for this transition should be the name of the 'Poll Trigger' you defined in section 3.1.4. Select 'Apply'. Under 'Action' select 'Log' and type the name of the log file where you want the data stored in the 'Arguments' field. Note: use the name of the trigger with a '.log' extension for the log name. Do not enter a path name as all data is stored in the pre-defined ECM logging directory.
- Create the same transition from 'Logging' to 'Logging'.
- In the 'Alarm Configuration View' select 'On' and 'Apply'.
Note/Caution:
When designing a logging alarm based on two states (i.e., ground and logging), be aware that a new 5K instance of the alarm may be instantiated and remain in memory for each object within the defined scope. For example, a subobject-scoped alarm with two states, ground and logging, created to collect interface data from a router would be instantiated once for each interface on all routers in the target property group. Because such an alarm never returns to the ground state, it remains in memory until manually stopped. One method of avoiding this unnecessary overhead is to create the logging alarm with only one state, ground, and a single transition from ground to ground.
EXAMPLE ALARM
Design is Ground to Ground, with a single transition based on the 'lsystemDataTrigger' defined in the example poll above.
Alarm=lsystemDataLogger
- Property=SA_Router_Collection_Property
- Scope=node
- State List:
- Ground—Normal
- Transition List:
- Ground→Ground Trigger=ifDataTrigger
- Action=Log to file:
- Ifdata.log
- Verbose enabled
Update the Nerve Center Schema Files
The ECM schema file(s) (e.g., ecm_schema.def) are used by the SA data management subsystem to define and upload data into SAS datasets.
- Edit the file '/perf/CONFIGS/NERVEC_schema.def'. Add to the end of this file a section for the new data you are collecting. The section should start with a single line to define the new 'TABLE' name, followed by one line for each attribute being collected, and end with a single line labeled 'ENDTABLE'.
- Add the 'TABLE' line. This line has 5 fields separated by commas.
Field 1: 'TABLE' label.
Field 2: Name of new SAS dataset. This field should be no longer than 6 characters.
Field 3: Name of the 'Poll Trigger'.
Field 4: Name of the Alarm.
Field 5: INDEXED or NONINDEXED. Type INDEXED if this is to be an INDEXED dataset. An INDEXED dataset contains data with multiple instances for a single device. For example, collecting router interface data for each interface on a router would be INDEXED, while collecting CPU data from a single CPU on a router would be NONINDEXED.
EXAMPLE
TABLE,NCSWRMON,RMONData,NC_RMONDataLogger,INDEXED
- Add a line for each attribute collected. These lines have 5 fields separated by commas.
Field 1: Name of the SAS Dataset defined in the 'TABLE' line.
Field 2: New name of the attribute. This name must be 7 characters or less.
Field 3: Name of the 'Poll Trigger'.
Field 4: Name of the Attribute being collected by ECM.
Field 5: Type of data (STRING, C2RATE, GAUGE, INT).
STRING: Alphanumeric string.
C2RATE: If the data is a rising counter, the label will tell SAS to convert it to a rate for the interval in which it was collected.
GAUGE: If the data is already a rate or a percentage. SAS will do no conversion.
INT: If the data is an integer (e.g., interface number).
EXAMPLE
NCSWRMON,INBITS,RMONData,etherStatsDataSource,STRING
- Add a line to signify the end of the configuration section. This line has 3 fields separated by commas.
Field 1: 'ENDTABLE' label.
Field 2: Name of new SAS dataset.
Field 3: Name of the ECM Alarm.
EXAMPLE
ENDTABLE,NCSWRMON,NC_RMONDataLogger
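The three kinds of schema lines described above can be assembled mechanically. The following is a minimal sketch in Python, offered only as an illustration: the function name build_schema_section and its parameters are hypothetical and not part of the SA toolkit, and the alarm name is written with an underscore on the assumption that this is how it appears in the schema file. The sketch checks the 7-character attribute-name limit and the four data types named above; it does not enforce the dataset-name length.

```python
# Hypothetical helper, not part of the SA toolkit: builds one schema-file
# section in the comma-separated layout described in the text.

DATA_TYPES = {"STRING", "C2RATE", "GAUGE", "INT"}


def build_schema_section(dataset, poll_trigger, alarm, indexed, attributes):
    """Return the lines of one TABLE ... ENDTABLE section.

    attributes is a list of (new_name, collected_attribute, data_type) tuples.
    """
    index_flag = "INDEXED" if indexed else "NONINDEXED"
    lines = [",".join(["TABLE", dataset, poll_trigger, alarm, index_flag])]
    for new_name, collected, data_type in attributes:
        if len(new_name) > 7:
            raise ValueError(f"attribute name {new_name!r} exceeds 7 characters")
        if data_type not in DATA_TYPES:
            raise ValueError(f"unknown data type {data_type!r}")
        lines.append(
            ",".join([dataset, new_name, poll_trigger, collected, data_type])
        )
    lines.append(",".join(["ENDTABLE", dataset, alarm]))
    return lines


# Values taken from the examples in the text (alarm-name underscore assumed).
section = build_schema_section(
    "NCSWRMON",
    "RMONData",
    "NC_RMONDataLogger",
    True,
    [("INBITS", "etherStatsDataSource", "STRING")],
)
print("\n".join(section))
```

Generating the section from one set of arguments keeps the dataset name, poll trigger, and alarm name identical across the TABLE, attribute, and ENDTABLE lines, which is the consistency the file format depends on.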
Database
One embodiment of the present invention, shown in FIG. 11, is used to develop a database. Data is extracted from a source system with an application program in operation 1100. The data is translated into a consistent model in operation 1102. The model contains information selected from one or more of performance metric information, time data, geographic location data, and event data. The modeled data is stored in a database in operation 1104. The data is analyzed in operation 1106 based on the information and the results of the analysis are output. It should be noted that the data may be analyzed before or after storing.
In one aspect of the present invention, the data is extracted using a plurality of application programs. Further, the information may include performance metric information, time data, geographic location data, and event data.
In yet another aspect of the present invention, the results may be outputted by way of reports and charts. Further, identifiers may be assigned to groups of the data. The data can also be stored in a plurality of databases including a first database for development purposes, a second database for testing purposes, and a third database for production purposes.
Refer to the following sections for more detailed information and additional embodiments.
Database Design
The database for the Service Assurance Toolkit can be designed as a data warehouse. This design offers greater performance and flexibility, which will enable the database to evolve in future releases of the Service Assurance Toolkit. The architecture for a successful data warehouse, by which we mean the end-to-end tools and processes, will vary from implementation to implementation. A typical data warehousing architecture should include, as a minimum:
- Multiple extract programs from one or more operational systems, to retrieve the source data for the warehouse.
- A data repository ("Operational Data Store") containing the extracted data in an appropriate model.
- A tool to analyze and display the data as reports and charts.
The design of the SA database should also be independent of any relational DBMS. By doing this the SA toolkit will be more portable across platforms and will allow us to better meet client needs. Also, the design of the database may contain all necessary information to meet the requirements laid out in the reporting requirement document. For the SA toolkit, the data model should contain information such as:
- Network element information.
- Performance Metric information.
- Detailed time data
- Network element geographic location data
- Network element event data
The design of the database can utilize surrogate keys to uniquely identify records. This will be done to increase query performance as well as to add flexibility to the database design. However, this adds complexity to the data manipulation because look-ups on the surrogate keys will need to take place.
Physical Database
The SA toolkit may be built on the Oracle RDBMS platform. While this is a specific platform, the physical design of the database can be independent of Oracle and portable to other major RDBMS vendors, e.g., Sybase, Informix, DB2, Microsoft, etc.
Development Environment
The database environment will have one database for the entire development process. However, for larger networks, more databases can be installed. For example, three databases may be installed, one each for the development, test, and production environments.
The installation and file system structure for the database will follow an Optimal Flexible Architecture (OFA). OFA is a set of 13 guidelines or goals to follow in order to standardize the database environment.
Database Backup/Recovery
Backups of the SA database will meet the following requirements:
- The backup script utility may be written in perl. Perl allows scripts to be more portable across hardware platforms.
- The backup utility can be executed by the UNIX cron utility. It could also be executed by the Windows NT AT scheduler if running on the NT platform.
- Cold backups will be run at least once a week. This requires database down time.
- Exports of the database will take place every evening at the end of the nightly batch schedule.
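The backup cadence listed above (an export every evening, a cold backup at least once a week) can be expressed as a small scheduling rule. The sketch below is in Python for illustration only, although the text names Perl for the actual utility. The choice of Sunday for the cold backup and the job names are assumptions, not taken from the source.

```python
import datetime

# Assumption: the weekly cold backup runs on Sunday (Monday is 0, Sunday is 6).
COLD_BACKUP_WEEKDAY = 6


def backup_jobs_for(day):
    """Return the backup jobs due on the given date.

    An export is due every evening; the cold backup, which requires database
    down time, is added once a week.
    """
    jobs = ["export"]
    if day.weekday() == COLD_BACKUP_WEEKDAY:
        jobs.append("cold_backup")
    return jobs


print(backup_jobs_for(datetime.date(2000, 1, 2)))  # a Sunday
print(backup_jobs_for(datetime.date(2000, 1, 3)))  # a Monday
```

A scheduler such as cron would invoke the utility nightly, and a rule of this kind would decide whether the down-time cold backup is also due.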
Database Objects
All database objects (DDL, scripts, code) will be placed under source control. This will be done to allow for versioning when databases for other environments are implemented. This also places these objects in a centralized location for easy access.
Data Management, Retention and Volume
Data Volume
The estimated volume of detailed data is 2,000,000 rows per day. The physical design of the database will be built to handle this volume while remaining flexible enough to handle a large increase in volume.
Data Manipulation
Because surrogate keys will be used to uniquely identify records, a look-up process needs to occur in the data manipulation process. A perl script will be written to extract surrogate key information from the database to flat files. The data manipulation process will use these files to generate flat files in a format that can be loaded into the database tables.
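The surrogate-key look-up can be pictured as a join between two flat files. The following is a minimal sketch in Python rather than the Perl the text calls for. The file layouts (a key file of natural key and surrogate key pairs, and raw records with the natural key in the first column) and all names are assumptions made for illustration.

```python
import csv
import io


def load_key_map(key_file):
    """Read 'natural_key,surrogate_key' rows extracted from the database."""
    return {row[0]: row[1] for row in csv.reader(key_file) if row}


def translate(raw_file, key_map):
    """Swap the natural key in column 0 for its surrogate key.

    Returns (loadable_rows, rejected_rows); rows whose key is unknown are
    rejected rather than guessed at.
    """
    loadable, rejected = [], []
    for row in csv.reader(raw_file):
        if not row:
            continue
        surrogate = key_map.get(row[0])
        if surrogate is None:
            rejected.append(row)
        else:
            loadable.append([surrogate] + row[1:])
    return loadable, rejected


# In-memory stand-ins for the flat files; the contents are invented.
keys = io.StringIO("router-a,101\nrouter-b,102\n")
raw = io.StringIO(
    "router-a,2000-01-01 00:15,12.5\n"
    "router-c,2000-01-01 00:15,7.0\n"
)
loadable, rejected = translate(raw, load_key_map(keys))
print(loadable)
print(rejected)
```

Doing the look-up against an extracted flat file keeps the manipulation outside the RDBMS, at the cost of having to refresh the key extract before each run.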
During the nightly batch window, all the detailed level data will be rolled up into daily summary data.
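A daily roll-up of detail rows might look like the sketch below, written in Python for illustration. The source does not say which aggregates the summary holds, so the minimum, maximum, average, and sample count used here are assumptions, as are the row layout and names.

```python
from collections import defaultdict
from statistics import mean


def daily_rollup(detail_rows):
    """Summarize detail rows per (element_key, metric, day).

    detail_rows: iterable of (element_key, metric, 'YYYY-MM-DD HH:MM', value).
    The aggregates chosen here are illustrative, not taken from the source.
    """
    groups = defaultdict(list)
    for element_key, metric, timestamp, value in detail_rows:
        day = timestamp[:10]  # keep only the date part of the timestamp
        groups[(element_key, metric, day)].append(value)
    return {
        key: {
            "min": min(values),
            "max": max(values),
            "avg": mean(values),
            "samples": len(values),
        }
        for key, values in groups.items()
    }


# Invented sample rows for one network element.
rows = [
    (101, "avgBusy5", "2000-01-01 00:15", 10.0),
    (101, "avgBusy5", "2000-01-01 00:30", 20.0),
    (101, "avgBusy5", "2000-01-02 00:15", 30.0),
]
summary = daily_rollup(rows)
print(summary[(101, "avgBusy5", "2000-01-01")])
```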
Data Loading
The detailed records will be loaded via a perl script which utilizes the SQL*Loader utility. While all of the major RDBMS vendors have some sort of data loading utility, they are all different syntactically. Thus, this script may need to change depending on the type of database. For phase 2, the detailed data may be loaded once a day during the nightly batch run.
Daily rollup data will also be loaded via the load script at the same time the detail records are loaded. This is possible because all of the data manipulation is occurring outside of the RDBMS.
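A load script of this kind typically writes a SQL*Loader control file and then invokes the loader. The sketch below, in Python for illustration rather than Perl, only builds the control-file text and the command line; it does not run anything. The table name, column names, and file names are hypothetical, and real columns such as dates would likely need format masks that are omitted here.

```python
def control_file_text(table, data_file, columns):
    """Build a basic SQL*Loader control file for a comma-separated flat file."""
    column_list = ", ".join(columns)
    return (
        "LOAD DATA\n"
        f"INFILE '{data_file}'\n"
        "APPEND\n"
        f"INTO TABLE {table}\n"
        "FIELDS TERMINATED BY ','\n"
        f"({column_list})\n"
    )


def sqlldr_command(userid, control_path, log_path):
    """Return the argument list for a SQL*Loader run (not executed here)."""
    return [
        "sqlldr",
        f"userid={userid}",
        f"control={control_path}",
        f"log={log_path}",
    ]


# Hypothetical table, columns, and file names.
text = control_file_text(
    "perf_detail", "detail.dat", ["element_key", "metric_key", "metric_value"]
)
command = sqlldr_command("sa_owner/secret", "detail.ctl", "detail.log")
print(text)
print(" ".join(command))
```

Keeping the vendor-specific syntax in two small functions limits what would have to change if the toolkit were moved to another RDBMS with a different loading utility.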
Data Retention
Phase 2 recommends that detailed level data be retained for 40 days and daily rollup data for 13 months. A perl script will be written to purge this data once it has expired. This process will run nightly, before the actual loading of the data occurs.
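The retention rule reduces to two cutoff dates computed each night. The following is a minimal sketch in Python rather than Perl. The function names are hypothetical, and treating "13 months" as calendar months with the day clamped to the end of a shorter month is an assumption.

```python
import datetime

DETAIL_RETENTION_DAYS = 40     # from the text
ROLLUP_RETENTION_MONTHS = 13   # from the text


def months_before(day, months):
    """Return the date `months` calendar months before `day`.

    The day of month is clamped when the target month is shorter.
    """
    index = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    # First day of the following month, then step back one day.
    first_next = datetime.date(year + (month == 12), month % 12 + 1, 1)
    last_day = (first_next - datetime.timedelta(days=1)).day
    return datetime.date(year, month, min(day.day, last_day))


def purge_cutoffs(today):
    """Rows dated before these cutoffs are considered expired."""
    return {
        "detail": today - datetime.timedelta(days=DETAIL_RETENTION_DAYS),
        "daily_rollup": months_before(today, ROLLUP_RETENTION_MONTHS),
    }


cutoffs = purge_cutoffs(datetime.date(2000, 3, 31))
print(cutoffs)
```

Running the purge before the nightly load, as the text specifies, means the load never has to compete for space with rows that are about to be deleted.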
Database Developer
The purpose of this section is to outline the steps taken to install a Database Developer portion of the present invention on a server. The following examples will teach how to install the Database Developer on both a Windows NT Workstation and a HP 9000 Server. This section should also be applicable to install the Database Developer on different servers. "Dbse" throughout the description of the present invention denotes a database program.
Minimum System Requirements
The minimum system requirements should be determined before installation of the Database Developer. Exemplary minimum system requirements are:
Each Client Workstation
- 90 MHz processor
- 32 Mb RAM, 1 Gb hard disk
- 60-680 Mb available disk space (depending on the components you choose to install)
- Windows NT 4.0/95
- SQL*Net Client 2.3.4
- TCP/IP Adapter
- SQL*Plus 3.3.4
- CD-ROM drive
On the Server (for each Repository Created)
- 125 Mb in the SYSTEM tablespace
- 20-330 Mb in other tables for the Repository data
- 16-28 Mb available RAM for the System Global Area
Pre-Installation Steps
- Verify the system requirements are met at each client workstation
- Verify the Server is correctly installed and accessible on the network
- Verify the database instance you wish to use has been created and functions correctly
Client Installation
1. Log on to the client workstation as a user with administrative privileges but not as an administrator
2. Close all other applications
3. Insert the Database Developer media into a media reading device (e.g., insert a CD into the CD-ROM drive)
4. In the Installation Settings dialog box:
- a) Type the name of the license owner for this product
- b) Select DEFAULT_HOME for Home Name
- c) Select English for Home Language
- d) Click OK
5. Select Yes in all Update message boxes to update out-of-date products
6. Select No in all Up to Date message boxes to avoid reinstalling products
7. Select OK in all General Information and System Support Files message boxes to acknowledge the message
8. Select OK to acknowledge the products that will be installed
9. Select OK to acknowledge that Installation is complete
10. Configure SQL*Net as follows:
- a) Click Start→Programs→SQL for Windows NT→SQL Net Easy Configuration
- b) Choose Add Database Alias and click OK
- c) In the Choose Database Alias box, type the database alias you wish to use and click OK
- d) Choose TCP/IP in the Choose Protocol dialog box and click OK
- e) In the Choose Protocol/Server Name and Database Instance dialog box:
- i. Type the name of the server where the database is located
- ii. In the Database Instance field, type the name of the database
- iii. Click OK, click Yes
- f) Select Exit SQL*Net Easy Configuration, click OK
- g) Click OK
Server Installation (Example)
Activities at the Database Server
1. Log on to the UNIX Server that holds the database
2. Start the database instance, if not already running
3. Verify that TNS Listener is running
4. Log onto SQL*Plus as system
5. In SQL*Plus, verify that the procedural option has been installed
- a) Execute the following SQL statement:
SQL> create or replace procedure test as x varchar2(10);
  2  begin
  3  if x is null then null;
  4  end if;
  5  end;
  6  /
Warning: Procedure created with compilation errors.
SQL> drop procedure test;
Procedure dropped.
- b) If the SQL statement returns an error (as above), continue to step c. Otherwise go to step 6.
- c) Run the following SQL script to install the option, where dbse_home is the location where dbse is installed (C:\DBSNT):
SQL> @dbse_home/rdbms/73/admin/catproc.sql
6. Create two tablespaces, one for Repository Data and another for Repository Index, of size 100 Mb each
7. Create one tablespace for small rollback segments (recommended: four rollback segments of 10 Mb each)
8. Create one large tablespace for the rollback segment used during installation. The size of this rollback segment should be as big as possible (recommended at least 100 Mb). In this tablespace, create one rollback segment and put it online (this rollback segment and tablespace can be removed after install)
9. Take all rollback segments offline except the SYSTEM rollback segment and the one created in step 8
10. Check the size of the SYSTEM tablespace. You will need at least 125 Mb of free space for the installation.
11. Create the Repository owner and grant connection and resource roles
12. Check the settings in the INITxxx.DBA file (where xxx is the name of the database) and change if necessary.
13. If any changes were made to INITxxx.DBA in the previous step, shut down and restart the database. Make sure to close any SQL*Plus windows that are open. Once the database is restarted successfully, exit Server Manager
Activities at the Client
1. Log on to the Client Workstation as a user with administrative privileges but not as an administrator
2. Log on to the database using SQL*Plus
3. Create a role called OWNER
- a) Execute the following SQL statement where home is the directory where the Database Developer was installed:
SQL>home\repadm20\utl\ckrorole.sql
- b) Issue all privileges that apply to the Repository owner
4. Grant the role created in step 3 to the Repository owner
5. Close any SQL*Plus windows that are open
6. Start the Repository Administration Utility
7. Connect as the Repository owner
8. Click the View Privileges button and verify the required system privileges and roles are granted for the Repository owner.