APPLICATION PROGRAM INTERFACE (API)

Method and apparatus for administering a server having a subsystem in communication with an event channel

6789112

Abstract

Methods and apparatus for administering a remote server having a subsystem in communication with an event bus. In one aspect of the present invention, an administration tool for administering a server has a subsystem in communication with an event bus. The administration tool includes a graphical user interface communications channel and a graphical user interface module corresponding to the server subsystem, wherein the graphical user interface module is in communication with the channel. The administration tool also includes a transport module in communication with the channel and the graphical user interface module. The graphical user interface module transmits an administration command to the corresponding server subsystem by sending the command to the transport module via the communications channel.


Claims

What is claimed is:

1. An administration tool for administering a server, the server having at least one subsystem in communication with an event bus, the administration tool comprising:

a graphical user interface communications channel;

a graphical user interface module corresponding to a server subsystem, the graphical user interface module in communication with the graphical user interface communications channel;

a transport module in communication with the graphical user interface communications channel and the graphical user interface module; and

a persistent store in communication with the server, the persistent store storing static data associated with the server,

wherein the graphical user interface module transmits an administration command to the corresponding server subsystem by sending the administration command to the transport module via the graphical user interface communications channel, and the corresponding server subsystem communicates with the persistent store in response to the received administration command.

2. The administration tool of claim 1 wherein the administrative command comprises an event.

3. The administration tool of claim 1 wherein the graphical user interface module comprises a loadable module.

4. The administration tool of claim 3 wherein the loadable module comprises a JAVA bean.

5. The administration tool of claim 3 wherein the loadable module comprises a COM object.

6. The administration tool of claim 3 wherein the loadable module comprises an ActiveX control.

7. The administration tool of claim 1 wherein the transport module sends data to the server using TCP/IP.

8. The administration tool of claim 1 comprising a plurality of graphical user interface modules, each of the modules corresponding to a respective subsystem on the server.

9. The administration tool of claim 1 wherein the graphical user interface module corresponds to a plurality of server subsystems.

10. The administration tool of claim 1 wherein the graphical user interface module displays dynamic data associated with the corresponding subsystem.

11. The administration tool of claim 1 wherein the communications channel comprises a data object.

12. A method for administering a server, the server having a subsystem in communication with an event bus, the method comprising:

providing a graphical user interface communications channel;

providing a graphical user interface module corresponding to a server subsystem, the graphical user interface module in communication with the graphical user interface communications channel;

providing a transport module in communication with the graphical user interface communications channel and the graphical user interface module;

providing a persistent store in communication with the server, the persistent store storing static data associated with the server;

transmitting an administration command from the graphical user interface module to the corresponding server subsystem by sending the administrative command to the transport module via the graphical user interface communications channel; and

communicating with the persistent store in response to the received administration command.

13. The method of claim 12 further comprising creating an administrative command using an event format.

14. The method of claim 12 wherein providing a graphical user interface module further comprises providing a loadable module.

15. The method of claim 14 wherein providing a loadable module further comprises providing a JAVA bean.

16. The method of claim 14 wherein providing a loadable module further comprises providing a COM object.

17. The method of claim 14 wherein providing a loadable module further comprises providing an ActiveX control.

18. The method of claim 12 wherein transmitting an administrative command further comprises using TCP/IP.

19. The method of claim 12 further comprising adding an additional graphical user interface module corresponding to a respective subsystem on the server.

20. The method of claim 12 further comprising displaying dynamic data associated with the corresponding subsystem via the graphical user interface module.

21. The method of claim 12 wherein providing a communications channel further comprises using a data object.


Description

FIELD OF THE INVENTION

The invention relates to server systems for use in a network of computers and particularly to administering a server having a subsystem in communication with an event bus.

BACKGROUND OF THE INVENTION

Client/server systems, in which the server executes one or more applications for a client, are similar to traditional multi-user systems such as UNIX. Graphically, these systems behave similarly to X-WINDOWS, a user interface standard for UNIX systems. A client/server system, such as the commercially available WINFRAME system manufactured by Citrix Systems, Inc. of Ft. Lauderdale, Fla., may include a number of application servers. Each application server may support multi-tasking of several applications that may be requested by a user at a remotely located workstation.

In order to minimize response time, maximize system throughput, and generally give the appearance that the user's application program is executing at the client, an administrator will often provide a user with access to a number of application servers that host the desired applications and are capable of servicing the user's requests. However, in order for such a system to operate efficiently, the application servers must dynamically coordinate access to system resources shared among the application servers as well as coordinate access to the application servers by the user. One way in which this is done is selecting one server from the group to act as the "master server." The master server is responsible for keeping track of resource usage both by users and application servers. However, as the number of applications servers grows larger, the administrative burden becomes significant, effectively limiting the size of these networks.

The present invention avoids this potential problem.

SUMMARY OF THE INVENTION

The present invention relates to a method and apparatus for administering a server having a subsystem in communication with an event bus. In one aspect, the invention relates to an administration tool for administering a server having a subsystem in communication with an event bus. The administration tool includes a graphical user interface communications channel and a graphical user interface module corresponding to a server subsystem, wherein the module is in communication with the channel. The administration tool also includes a transport module in communication with the channel. The graphical user interface module transmits an administration command to the corresponding server subsystem by sending the command to the transport module via the communications channel.

In one embodiment, the administrative command includes an event. In another embodiment, the graphical user interface module includes a loadable module. In another embodiment, the loadable module includes a JAVA bean. In another embodiment, the loadable module includes a COM module. In another embodiment, the loadable module includes an ActiveX control. In another embodiment, the transport module sends data to the server using TCP/IP. In another embodiment, the administration tool also includes a plurality of graphical user interface modules, each of the modules corresponding to a respective subsystem on the server. In another embodiment, the graphical user interface module corresponds to a plurality of server subsystems. In another embodiment, the graphical user interface module displays dynamic data associated with the corresponding subsystem. In another embodiment, the communications channel includes a data object.

In another aspect, the invention also relates to a method for administering a server having a subsystem in communication with an event bus. The method includes the steps of providing a graphical user interface communications channel and providing a graphical user interface module corresponding to a server subsystem, the module in communication with the channel. The method also includes the steps of providing a transport module in communication with the channel and transmitting an administration command from the graphical user interface module to the corresponding server subsystem by sending the command to the transport module via the communications channel.

In one embodiment, the method also includes the step of creating an administrative command using an event format. In another embodiment, the method also includes the step of adding an additional graphical user interface module corresponding to a respective subsystem on the server. In another embodiment, the method also includes the step of displaying dynamic data associated with the corresponding subsystem via the graphical user interface module.

In another embodiment, the step of providing a graphical user interface module also includes providing a loadable module. In another embodiment, the step of providing a loadable module also includes providing a JAVA bean. In another embodiment, the step of providing a loadable module also includes providing a COM object. In another embodiment, the step of providing a loadable module also includes providing an ActiveX control. In another embodiment, the step of transmitting also includes using TCP/IP. In another embodiment, the step of providing a communications channel also includes using a data object.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is pointed out with particularity in the appended claims. The advantages of the invention described above, as well as further advantages of the invention, may be better understood by reference to the following description taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a block diagram of an embodiment of an enterprise system architecture comprising multiple server farms;

FIG. 2A is a block diagram of an embodiment of a server farm using the invention;

FIG. 2B is a diagram of an embodiment of the server farm of FIG. 2A logically organized in multiple zones of servers;

FIG. 3 is a block diagram of an embodiment of one server in the server farm of FIG. 2A, the server including a plurality of subsystems in communication with each other over an event bus;

FIG. 4A is a diagram of an embodiment of a dispatch table used by the event bus to route events to subsystems on the server of FIG. 3;

FIG. 4B is a diagram of an embodiment of a subscription table used by the server of FIG. 3 to route events to subsystems on the same server;

FIG. 4C is a diagram of an embodiment of a subscription table used by the server of FIG. 3 to route events to subsystems on other servers in a farm;

FIG. 5A is a flow diagram illustrating an embodiment of a process used to respond to subscription requests;

FIG. 5B is a flow diagram illustrating an embodiment of a process used to respond to notification events;

FIG. 6 is a flow diagram illustrating an embodiment of a process used to initialize servers and subsystems on each server of the server farm;

FIGS. 7A-7B are diagrammatic views of an embodiment of an event that may be transmitted in accordance with the invention;

FIGS. 8A-8B are block diagrams of an embodiment of a process used to issue an event to a destination subsystem using a PostEvent command;

FIGS. 9A, 9B, 9C and 9D are flow and block diagrams of an embodiment of a process used to issue an event to a remote destination subsystem using a SendEventandWait command;

FIG. 10 is a block diagram of an embodiment of a process used to manage run-time data;

FIG. 11 is a block diagram of an embodiment of a server including a license management subsystem;

FIG. 12 is a flow chart of an embodiment of a process used during initialization license management subsystem;

FIG. 13 is a flow chart of an embodiment of a process used by the license management subsystem in response to a license request;

FIGS. 14A-14B are block diagrams of embodiments of a server including a user management subsystem;

FIG. 15 is a flow diagram illustrating an embodiment of a process used by a specialized server subsystem for processing a launch application request;

FIG. 16 is a flow diagram illustrating an embodiment of a process used by an administration tool to obtain data for managing the server farm; and

FIGS. 17, 18, 19 and 20 are exemplary screen shots of graphical user interface displays produced by the administration tool.

INDEX

The index below should help the reader follow the discussion of the invention:

1.0 System Overview

2.0 Server Farm Overview

2.1 Persistent Store

2.2 Dynamic Store

2.3 Collector Points

2.4 Server Zones

3.0 Server Overview

3.1 Common Facilities Module

3.2 Subsystem Communication Using the Event bus

3.2.1 Event bus API

3.2.2 Subsystem API

3.2.3 Dispatch Table

3.3 Direct Subsystem Communication

3.4 Persistent Store System Service Module

3.5 Dynamic Store System Service Module

3.6 Service Locator System Service Module

3.7 Subscription Manager System Service Module

3.7.1 Local Subscription Table

3.7.2 Remote Subscription Table

3.7.3 Subscribe Function

3.7.4 Unsubscribe Function

3.7.5 PostNotificationEvent

3.8 Host Resolution System Service Module

3.9 Zone Manager System Service Module

3.9.1 Assigning Ownership of Distributed Resources

3.9.2 Assigning Ownership of Network Services

3.10 System Module

3.11 Loader

4.0 Server and Subsystem Initialization

5.0 Events

5.1 Event Types

5.1.1 Directed Events

5.1.1.1 Request-And-Reply Events

5.1.1.2 Notification Events

5.2 Event Delivery Commands

6.0 Basic Examples

6.1 PostEvent Command

6.2 SendEventAndWait Command

6.3 Managing Dynamic Data

7.0 Subsystems

7.1 Transport Layer

7.2 Group Subsystem

7.3 Relationship Subsystem

7.4 Load Management Subsystem

7.5 License Management Subsystem

7.6 User Management Subsystem

7.7 ICA Browser Subsystem

7.8 Program Neighborhood Subsystem

7.9 Application and Server Subsystems

7.9.1 Application Subsystems

7.9.1.1 Common Application Subsystem

7.9.1.2 Specialized Application Subsystem

7.9.2 Server Subsystems

7.9.2.1 Common Server Subsystem

7.9.2.2 Specialized Server Subsystem

7.9.3 Application Name Resolution

7.9.4 Application Enumeration

7.9.5 Server Enumeration

7.10 Common Access Point (CAP) Subsystem

7.11 Administration Subsystem

8.0 Administration Tool.

DETAILED DESCRIPTION OF THE INVENTION

1.0 System Overview

Referring now to FIG. 1, one embodiment of a system architecture 100 constructed in accordance with the invention is depicted, which includes four server farms 110, 110', 110", 110'" (generally 110), at least one client 120 in communication with one of the server farms 110, and an administration tool 140. Although only four server farms 110 and one client 120 are shown in FIG. 1, no limitation of the principles of the invention is intended. Such system architecture 100 may include any number of server farms 110 and have any number of client nodes 120 in communication with those farms 110.

Each server farm 110 is a logical group of one or more servers (hereafter referred to generally as server 180 or servers 180) that are administered as a single entity. The servers 180 within each farm 110 can be heterogeneous. That is, one or more of the servers 180 can operate according to one type of operating system platform (e.g., WINDOWS NT, manufactured by Microsoft Corp. of Redmond, Wash.), while one or more of the other servers 180 can operate on according to another type of operating system platform (e.g., Unix or Linux). The servers 180 comprising each server farm 110 do not need to be physically proximate to each other server 180 in its farm 110. Thus, the group of servers 180 logically grouped as a server farm 110 may be interconnected using a wide-area network (WAN) connection or medium-area network (MAN) connection. For example, a server farm 110 may include servers 180 physically located in different regions of a state, city, campus, or room. Data transmission speeds between servers 180 in the server farm 110 can be increased if the servers 180 are connected using a local-area network (LAN) connection or some form of direct connection.

By way of example, the client node 120 communicates with one server 180 in the server farm 110 through a communications link 150. Over the communication link 150, the client node 120 can, for example, request execution of various applications hosted by the servers 180, 180', 180", and 180'" in the server farm 110 and receive output of the results of the application execution for display. The communications link 150 may be synchronous or asynchronous and may be a LAN connection, MAN connection, or a WAN connection. Additionally, communications link 150 may be a wireless link, such as an infrared channel or satellite band.

As a representative example of client nodes 120 and servers 180 in general, the client nodes 120 and server 180 can communicate with each other using a variety of connections including standard telephone lines, LAN or WAN links (e.g., T1, T3, 56 kb, X.25), broad band connections (ISDN, Frame Relay, ATM), and wireless connections. Connections can be established using a variety of lower layer communication protocols (e.g., TCP/IP, IPX, SPX, NetBIOS, Ethernet, RS232, direct asynchronous connections). Higher layer protocols, such as the Independent Computing Architecture protocol (ICA), manufactured by Citrix Systems, Inc. of Ft. Lauderdale, Fla., or the Remote Display Protocol (RDP), manufactured by Microsoft Corporation of Redmond Wash., can be used to allow client 120 access to a server farm 110, such as access to applications residing on the servers 180.

2.0 Server Farm Overview

Referring now to FIG. 2A, the servers 180 comprising a server farm 110 each include a network-side interface 202 and a server farm-side interface 204. The network-side interfaces 202 of the server 180 may be in communication with one or more clients 120 or a network 210. The network 210 can be a WAN, LAN, or international network such as the Internet or the World Wide Web. Clients 120 may establish connections with the servers 180 using the network 210.

The server farm-side interfaces 204 of the servers 180 are interconnected with each over communication links 200 so that the servers may communicate with one another in accordance with the principles of the invention. On each server 180, the server farm-side interface 204 communicates with the network-side interface 202. The server farm-side interfaces 204 also communicate (designated by arrows 220) with a persistent store 230 and, in some embodiments, with a dynamic store 240. The combination of servers 180, the persistent store 230, and the dynamic store 240, when provided, are collectively referred to as a server farm 110.

2.1 Persistent Store

Persistent store 230 may be physically implemented on a disk, disk farm, a redundant array of independent disks (RAID), writeable compact disc, or any other device that allows data to be read and written and that maintains written data if power is removed from the storage device. A single physical device may provide storage for a plurality of persistent stores, i.e., a single physical device may be used to provide the persistent store 230 for more than one server farm 110. The persistent store 230 maintains static data associated with each server 180 in server farm 110 and global data used by all servers 180 within the server farm 110. In one embodiment, the persistent store 230 may maintain the server data in a Lightweight Directory Access Protocol (LDAP) data model. In other embodiments, the persistent store 230 stores server data in an ODBC-compliant database. For the purposes of this description, the term "static data" refers to data that do not change frequently, i.e., data that change only on an hourly, daily, or weekly basis, or data that never change. Each server uses a persistent storage subsystem 300, described in detail in section 7.1 below, to read data from and write data to the persistent store 230.

The data stored by the persistent store 230 may be replicated for reliability purposes physically or logically. For example, physical redundancy may be provided using a set of redundant, mirrored disks, each providing a copy of the data. In other embodiments, the database itself may be replicated using standard database techniques to provide multiple copies of the database. In further embodiments, both physical and logical replication may be used concurrently.

2.2 Dynamic Store

As described above, the servers 180 store "static" data, i.e., data that persist across client sessions, in the persistent store 230. Writing to the persistent store 230 can take relatively long periods of time. To minimize accesses to the persistent store 230, the servers 180 may develop a logical, common database (i.e., the dynamic store 240) that is accessible by all of the servers 180 in the farm 110 for accessing and storing some types of data. The dynamic store 240 may be physically implemented in the local memory of a single or multiple servers 180 in the server farm 110, as described in greater detail below. The local memory can be random access memory, disk, disk farm, a redundant array of independent disks (RAID), or any other memory device that allows data to be read and written.

In general, data stored in the dynamic store 240 are data that are typically queried or changed frequently during runtime. Examples of such data (hereafter referred to as runtime data) are the current workload level for each of the servers 180 in the server farm 110, the status of the servers 180 in the server farm 110, client session data, and licensing information.

In one embodiment, the dynamic store 230 comprises one or more tables, each of which stores records of attribute-value pairs. Any number of tables may exist, but each table stores records of only one type. Tables are, in some embodiments identified by name. Thus, in this embodiment, two servers 180 that use the same name to open a table refer to the same logical table.

In further embodiments, each table record is uniquely identified by name. The name of a record may be one of the attributes of the record. Records may also include a "type" attribute that is unique to the type of record. Records may be created, updated, queried, or deleted by any server 180. An example of a dynamic store record table relating to active client sessions appears below:

Table "Client Sessions"

ID_TYPE "AppName Session"

ID_USER="MarkT"

ID_XXX= . . .

2.3 Collector Points

The dynamic store 240 (i.e., the collection of all record tables) can be embodied in various ways. In one embodiment, the dynamic store 240 is centralized; that is, all runtime data are stored in the memory of one server 180 in the server farm 110. That server operates as a master network node with which all other servers 180 in the farm 110 communicate when seeking access to that runtime data. In another embodiment, each server 180 in the server farm 110 keeps a full copy of the dynamic store 240. Here, each server 180 communicates with every other server 180 to keep its copy of the dynamic store 240 up to date.

In another embodiment, each server 180 maintains its own runtime data and communicates with every other server 180 when seeking to obtain runtime data from them. Thus, for example, a server 180 attempting to find an application program requested by the client 120 may communicate directly with every other server 180 in the farm 110 to find one or more servers hosting the requested application.

For server farms 110 having a large number of servers 180, the network traffic produced by these embodiments can become heavy. One embodiment alleviates heavy network traffic by designating a subset of the servers 180 in a farm 110, typically two or more, as "collector points." Generally, a collector point is a server that collects run-time data. Each collector point stores runtime data collected from certain other servers 180 in the server farm 110. Each server 180 in the server farm 110 is capable of operating as, and consequently is capable of being designated as, a collector point. In one embodiment, each collector point stores a copy of the entire dynamic store 240. In another embodiment, each collector point stores a portion of the dynamic store 240, i.e., it maintains runtime data of a particular data type. The type of data stored by a server 180 may be predetermined according to one or more criteria. For example, servers 180 may store different types of data based on their boot order. Alternatively, the type of data stored by a server 180 may be configured by an administrator using administration tool 140. In these embodiments, the dynamic store 240 is distributed among two or more servers 180 in the farm 110.

Servers 180 not designated as collector points know the servers 180 in a farm 110 that are designated as collector points. As described in more detail below, a server 180 not designated as a collector point communicates with a particular collector point when delivering and requesting runtime data. Consequently, collector points lighten network traffic because each server 180 in the farm 110 communicates with a single collector point server 180, rather than with every other server 180, when seeking to access the runtime data.

2.4 Server Zones

FIG. 2B shows an exemplary server farm 110 including servers 180, 180', 180", and 180'" organized into separate zones 260 and 270. A zone is a logical grouping of servers 180 within a server farm 110. In one embodiment, each zone 260, 270 includes its own dynamic store 240, i.e., the servers in each zone maintain a common database of run-time data. A zone 260, 270 includes a subset of the servers 180 in the server farm 110. In the embodiment shown in FIG. 2B, zone 260 includes servers 180', 180", and 180'", and zone 270 includes server 180.

The formation of each zone 260, 270 within a server farm 110 may be based upon network topology. For example, zone definitions can depend upon the geographic locations of the servers 180. Each server 180 determines the zone 260, 270 to which that server 180 belongs. In one embodiment, each server 180 determines its zone 260, 270 when first added to the server farm 110. In other embodiments, a server 180 may elect to join a different existing zone 260, 270 or start a new zone 260, 270 during run-time. In another embodiment, an administrator can establish and control the establishing of zones 260, 270 as well as assignment of servers 180 to zones 260, 270 through the administration tool 140. In still other embodiments, servers 180 may be logically grouped into zones based on one or more criteria such as IP address or lexical network name.

In one embodiment, each zone 260, 270 includes a server 180 that operates as a collector point for dynamically collecting a predetermined type of data from the other servers 180 in that zone 260, 270. Examples of types of data include licensing information, loading information on that server, load management data, server identification and status, performance metrics, total memory, available memory, subscription data (discussed in Section 3.5) and client session data. In the embodiment shown in FIG. 2B, servers 180" and 180 are the collector points for zones 260 and 270, respectively. In zone 260, for example, the collector point 180" receives run-time data from servers 180'" and 180'. The collected data is stored locally in memory at the collector point 180".

Each server 180 can operate as a collector point for more than one type of data. For example, server 180" can operate as a collector point for licensing information and for loading information. Also, multiple servers 180 may concurrently operate as collector points within a given zone 260, 270. In these embodiments, each collector point may amass a different type of run-time data. For example, to illustrate this case, the server 180'" can collect licensing information, while the server 180" collects loading information.

In some embodiments, each collector point stores data that is shared between all servers 180 in a farm. In these embodiments, each collector point of a particular type of data exchanges the data collected by that collector point with every other collector point for that type of data in the server farm 110. Thus, upon completion of the exchange of such data, each collector point 180" and 180 possesses the same data. Also in these embodiments, each collector point 180 and 180" also keeps every other collector point abreast of any updates to the runtime data. In some embodiments, multiple servers 180 in one zone 260, 270 function as collector points for a particular kind of data. In this embodiment, a server 180 broadcasts each change in the collected data to every other collector point in the farm 110.

In other embodiments, each collector stores information that is shared between servers 180 in a particular zone 260, 270 of a server farm 110. In these embodiments, because only one collector point per zone 260, 270 is necessary, no exchange of collected data occurs. Examples of collected data that are not shared outside of a particular zone 260, 270 include information relating to pooled zone licenses or client session data corresponding to disconnected sessions.

3.0 Server Overview

In brief overview, FIG. 3 shows an embodiment of one of the servers 180 in the server farm 110. The server 180 includes an event bus 310, a system module 320, a loader module 330, a common facilities module 340, a plurality of system service modules 350, and one or more personality subsystems 300. In the embodiment shown in FIG. 3, system service modules 350 are provided as subsystems and include: a persistent store system service module 352; service locator system service module (hereafter, "service locator") 354; a dynamic store system service module 356; a zone manager system service module (hereafter, "zone manager") 358; a host resolution system service module (hereafter, "host resolver") 360; and a subscription manager system service module (hereafter, "subscription manager") 362, all of which are described in more detail below. In other embodiments, system service modules may be provided as WINDOWS NT services or daemons. Server 180 is a representative example of the other servers in the server farm 110 and of other servers in the server farms 110', 110", and 110'".

Each personality subsystem 300 is a software module that provides particular behavior or functionality for the server 180, such as load management services. The particular set of subsystems 300 installed on each of the servers 180 define the behavior of each server 180 and, accordingly, of the server farm 110. Examples of personality subsystems useful in accordance with the present invention are: a group subsystem (described below in section 7.3), a relationship subsystem (described below in section 7.4), a load management subsystem (described below in section 7.5), a license management subsystem (described below in section 7.6), a user management subsystem (described below in section 7.7), an ICA browser subsystem (described below in section 7.8), a program neighborhood subsystem (described below in section 7.9), a specialized application subsystem (described below in section 7.10), a specialized server subsystem (described below in section 7.10), a common application subsystem (described below in section 7.10), and a common server subsystem (described below in section 7.10), a common access point subsystem (described below in section 7.11), and an administration subsystem (described below in section 7.12). The functionality of various subsystems 300, in another embodiment, is combined within a single subsystem. Also, the functionality of the server 180 is not intended to be limited to those subsystems 300 listed.

In general, the subsystems 300 communicate with one another, and with the system service modules 350 when they are provided as subsystems, by generating and transmitting event messages, also referred to throughout this specification as events, over the event bus 310. As used in this specification, the term "event" is broad enough to encompass any sort of message or packet that includes control information (such as the identity of the source subsystem and the destination subsystem) and payload data. Events are described in more detail in connection with FIGS. 7A-7B. Subsystems 300 may also communicate with system service modules 350 without using the event bus 310 using an internal API 302 provided by the system service modules 350. In one embodiment, each subsystem 300 is either a single-threaded or a multi-threaded subsystem. A thread is an independent stream of execution running in a multi-tasking environment. A single-threaded subsystem 300 is capable of executing only one thread at a time. A multi-threaded subsystem 300 can support multiple concurrently executing threads, i.e., a multi-threaded subsystem 300 can perform multiple tasks simultaneously.

3.1 Common Facilities Module

The common facilities module 340 provides common, basic functionality useful to all subsystems 300 and system service modules 350 including, but not limited to, buffer management, multi-threaded framework services, unique identifier management, and data structure management. A multi-threaded framework facility provides services for managing semaphores and synchronizing with: semaphores; operating system events; and critical sections. A multi-threaded framework also provides services for creating, starting, and stopping threads. In one embodiment, the multi-threaded framework facility is provided as a C++ or Java class. The common facilities module 340 may also provide functions allowing subsystems 300 to construct, destroy, and manage common data structures including queues, hash tables, linked lists, tables, security objects, and other standard data objects.

A buffer management facility 345 provides uniform data buffering services that each subsystem 300 uses to store events in event buffers 380, 380' (generally, 380). In one embodiment, the buffer management facility 345 is provided as a C++ base class. In another embodiment, the buffer management facility 345 is provided as a Java class. Examples of services that may be provided by the buffer management facility 345 include initialization, allocation, deallocation, resizing, and duplication of buffers.

In one embodiment, implementation of the event buffer 380 is in local memory 325 of the server 180, accessible by each of the subsystems 300 and the system service modules 350 of the server 180. In this embodiment, when a subsystem 300 generates an event, an event buffer 380 is dynamically created specifically to store that event. Although only two event buffers 380 are shown in FIG. 3, it should be understood that the number of event buffers 380 provided is limited only by the amount of local memory available for storing event buffers 380. Each subsystem 300 using the event maintains a reference pointer to the event buffer 380 storing the event. Pointers to an event buffer 380, rather than the event itself, are delivered from one subsystem 300 to another subsystem 300 to minimize the amount of information passing over the event bus 310. In this embodiment, the event buffer 380 maintains a reference count that allows each receiving subsystem 300 to determine if no other subsystem 300 is referencing the event stored in the event buffer 380. The recipient of an event may delete it from the event buffer 380 if no other subsystem is referencing that event.

In other embodiments, each subsystem 300 maintains its own copy of an event. The event buffer 380 allows its respective subsystems 300 to write the data relating to the event in the event buffer 380 and other subsystems 300 to read such data. When a subsystem 300 generates an event, an event buffer 380 is dynamically created specifically to store that event. In these embodiments, each subsystem 300 deletes an event from the event buffer 380 once it has read the event or when the event, or a pointer to the event, is transmitted to a remote server. This embodiment allows multiple subsystems 300 to access the same event information substantially simultaneously.

3.2 Subsystem Communication Using the Event Bus

The event bus 310 provides a communication path for conveying events between the subsystems 300 of the server 180 and for conveying events to subsystems 300 residing on other servers 180', 180", 180'" in the server farm 110. The event bus 310, in one embodiment, includes an event delivery object 312 and a transport layer 318. The event delivery object 312 delivers events between subsystems 300 on the same server 180 (i.e., local subsystems), and the transport layer 318 delivers events to subsystems on a different server 180', 180", 180'" (i.e., remote subsystems). The transport layer 318 uses a transport mechanism, such as TCP/IP, UDP/IP, HTTP, Ethernet or any other network transport protocol, to transmit or receive events to or from the transport layers of the other servers 180', 180", 180'". In another embodiment, the transport layer 318 is implemented as another subsystem 300 that communicates with the other subsystems 300 of the server 180 over the event bus 310.

In one embodiment each subsystem "type" is assigned a predetermined identifier. In other embodiments, each subsystem generates, or is assigned, a globally unique identifier that uniquely identifies that subsystem zone-wide, farm-wide, enterprise-wide, or world-wide.

In some embodiments, each subsystem 300 has a unique subsystem identifier. In these embodiments, the event delivery object 312 includes a dispatch table 316 binding each subsystem identifier to a respective entry point associated with the subsystem 300. The event delivery object 312 dispatches events to the subsystems 300 using the entry point. In one embodiment, the entry point is an event queue (not shown) associated with the subsystem. In other embodiments, the entry point is a pointer to an API provided by a subsystem 300, described in section 3.2.2. In general, the event delivery object 312 passes an event pointer between subsystems 300 on the same server 180 so that the receiving subsystem(s) can access the location in local memory 325 (i.e., the event buffer 380) where the event is stored.

For embodiments in which event queues are used, events delivered by the event delivery object 312 to the corresponding subsystem 300 are stored in the event queue in the order such events are received from the event delivery object 312. To place events on the event queues, the event delivery object 312 calls a "QueueEvent" function. In one embodiment, the QueueEvent function accepts, as an input parameter, a pointer to the event buffer 380 representing the event to be placed on the event queue. In one embodiment, each event queue holds pointers to the event buffers 380 storing the events. Events (or the pointers to the respective event buffers 380) remain in the event queue until dispatched by the event delivery object 312 to the corresponding subsystem 300. Event queues allow the identity of the thread responsible for delivery of the event to change. That is, the identity of the thread dispatching the event to the subsystem 300 from the event queue can be different from the identity of the thread that originally placed the event on the event queue.

In an alternative set of embodiments, two event queues (not shown in FIG. 3) may be associated with each subsystem 300. In these embodiments, one event queue receives incoming events from the event delivery object 312 and the other receives outgoing events from the subsystem 300 to the event delivery object 312. In these embodiments, the event delivery object 312 retrieves an event from an outgoing event queue associated with a first subsystem and places the event on the incoming event queue associated with a target subsystem identified by the event delivery object.

The event delivery object 312 provides an interface (hereafter, event bus API) 392 (see section 3.2.1) through which each subsystem 300 communicates with the event delivery object 312 using a standard protocol. Each subsystem 300 can "plug-in" to the event bus 310 because such subsystems 300 conform to the standard protocol. Further, this standard protocol permits other subsystems 300 that may not be developed until after the server 180 is deployed in the network, to be readily added to the server 180 as long as those later-developed subsystems 300 adhere to the standard protocol of the event bus API 392. The event bus API 392 may be provided as a C++ class, JAVA class, or shared library.

Each subsystem 300 provides a dynamically linked library (DLL) that implements a subsystem access layer (SAL) 304, 304', 304", 304'" (generally 304). Each SAL 304 defines application program interface (API) commands that may be used by other subsystems 300 to issue events to the subsystem 300 providing the SAL. SAL API functions use the event bus API 392 to create and send events to other subsystems 300 and system service modules 350 using the event delivery object 312. The SALs 304 of other subsystems in the server 180 are linked into the subsystem 300, e.g., using "include" and "library" files (i.e., ".h" files, ".dll" files, and ".lib" files) so that the subsystem 300 "knows" the events needed for interacting with those other subsystems 300.

Each subsystem 300 also includes an event handler table 308, 308', 308", 308'" (generally 308), respectively. Each event handler table 308 maps events directed to that subsystem 300 to an event handler routine that is able to process that received event. These event handler routines provide the core functionality of the subsystem 300 and are implemented in the software of the respective subsystem 300. One of the event handler routines is called upon dispatch of an event to the subsystem 300 (e.g., through the SAL 304 or by the event delivery object 312).

The following are pseudo-code examples of named event handler routines that are called upon the occurrence of particular events. As described in more detail below, event handler routines, when called, always receive a pointer to an event buffer 380 storing the delivered event. In these examples, the name of each handler routine is arbitrary, but suggestive of the function performed by that handler routine.

OnGetSampleData(EventBuffer* pEvent);

OnGetSystemData(EventBuffer* pEvent);

OnSetSampleData(EventBuffer* pEvent);

OnEnumerateAdminToolObjects(EventBuffer* pEvent);

OnHostUp (EventBuffer* pEvent);

OnHostUpReply(EventBuffer* pEvent);

The following is a pseudo-code example of an embodiment of a handler table 308 that maps events to the list of example handler routines above. Each entry of the event handler table 308 is provided by an "EVENT_ENTRY" macro. The EVENT_ENTRY macro takes as parameters an identifier of the source subsystem, an identifier of the event sent by the source subsystem, and identifies the event handler routine that responds to the event. In one embodiment, event identifiers are integers that are assigned constant values in a header file (e.g., (EventID_Defs.h") provided at the time the application is compiled.

BEGIN EVENT HANDLER TABLE

EVENT_ENTRY (SourceSubsystem, Event_GetSampleData,

OnGetSampleData);

EVENT_ENTRY (SourceSubsystem, Event_GetSystemData,

OnGetSystemData);

EVENT_ENTRY (SourceSubsystem, Event_SetSampleData,

OnSetSampleData);

EVENT_ENTRY (Administration Tool,

Event_EnumerateAdminToolObjects,

OnEnumerateAdminToolObjects);

EVENT_ENTRY (SourceSubsystem, Event_HostUp OnHostUp);

EVENT_ENTRY ((SourceSubsystem, Event_HostUpReply,

OnHostUpReply);

END EVENT HANDLER TABLE

3.2.1 Event Bus API

The event delivery object 312 provides an event bus API 392 that enables the subsystems 300 to direct events to the event delivery object 312. The event bus API 392 includes a "PostEvent" function. The PostEvent function permits a source subsystem 300 to send an event to the event delivery object 312 for subsequent delivery to a destination subsystem 300 on the server 180. As an input parameter, the PostEvent function includes a pointer to the event buffer 380 created to store that event. In other embodiments, the PostEvent function includes as other input parameters a source host identifier and a destination host identifier. In some embodiments, the event delivery object adds an event pointer to the event queue of the destination subsystem. In other embodiments, the event pointer bypasses the event queue and is dispatched directly to the destination subsystem 300. In one embodiment, once the event is dispatched to a destination subsystem 300, the PostEvent function immediately returns (i.e., the PostEvent function does not "block").

The PostEvent function may return a status value indicating the status of the event dispatch. For embodiments in which the event is dispatched to an event queue, the status value can indicate a failure to dispatch the event because the event queue associated with the target subsystem 300 is full. In other embodiments, the PostEvent function may accept, as input, a timeout value. The timeout value specifies a period of time that, if it elapses without a successful event delivery, the PostEvent function will indicate that it has failed. For example, the event delivery object 312 may be unable to dispatch an event to an event queue or the event delivery object 312 may dispatch the event to the transport layer 318 for remote transmission. In these embodiments, the thread of execution responsible for dispatching the event suspends execution for an associated timeout period once the event is dispatched. The operating system notifies the thread when the timeout period elapses. If the thread has not been notified that the event has been successfully dispatched before expiration of the timeout period, event dispatch has failed.

In still other embodiments, the PostEvent function may accept as input multiple addresses identifying multiple targets for the event, such as multiple subsystems 300 on the same server 180 or subsystems 300 distributed among several servers 180 in the server farm 110.

In other embodiments, the event delivery object 312 provides an API function that allows a subsystem to "pull" events off an event queue associated with that subsystem 300. In these embodiments, events are delivered to an event queue by the event delivery object 312 for eventual processing by the associated subsystem 300.

3.2.2 Subsystem API

Each subsystem 300 provides an interface 306 that the event delivery object 312 uses to dispatch events to that subsystem 300. The subsystem API 306 for every subsystem 300 includes a "DispatchEvent" function. Using the DispatchEvent function, the event delivery object 312 "pushes" an event to the subsystem 300, i.e, the event delivery object 312 passes an event pointer to the subsystem 300 for processing by an event handler routine. For embodiments in which an event queue is associated with the subsystem 300, the DispatchEvent function pushes the event at the head of the queue to the subsystem 300 for processing by an event handler routine.

3.2.3 Dispatch Table

The dispatch table 316 provides a routing mechanism for the event delivery object 312 to deliver events to the targeted subsystems 300 of the server 180. Referring to FIG. 4A, and in more detail, the dispatch table 316 includes an entry 420 for the system module 320 and each system service module or subsystem 300. Each entry 420 includes a destination subsystem field 424 and a dispatch address field 428 for mapping one of the subsystems 300, one of the system service modules 350, or the system module 320 to a dispatch address associated with that subsystem. In one embodiment, the dispatch address is the address of an event queue associated with the system module 320, one of the subsystems 300, or one of the system service modules 350. In some embodiments, the dispatch table 316 includes further information, such as a flag indicating whether the corresponding subsystem has been implemented to take advantage of multi-threaded execution (not shown). An exemplary mapping of subsystems 300, system module 320, and system service modules 350 to dispatch addresses is illustrated in FIG. 4A.

For purposes of illustrating this mapping, names corresponding to each of the subsystems 300, the system module 320, and the system service modules 350 appear in the destination subsystem field 424 and names corresponding to their associated dispatch addresses appear in the dispatch address field 428. It is to be understood that the implementation of the dispatch table 316 can use pointers to the addresses of the destination subsystems 300, system module 320, system service modules 350, and corresponding dispatch addresses. The event delivery object 312 populates the entries 420 of the dispatch table 316 with mapping information during the initialization of the server 180.

3.3 Direct Subsystem Communication

In some embodiments, subsystems 300 communicate directly with system service modules 350 without using the event bus 310. In these embodiments, the system service modules 350 provide an internal API 302 that may be directly called by subsystems 300 resident locally, i.e., on the same server 180. The internal API 302 may provide the same function calls as the event bus API 392 described above. Alternatively, the internal API 302 may provide a subset or a superset of the functions provided by the event bus API 392.

3.4 Persistent Store System Service Module

As described above in connection with FIGS. 2A and 3, a persistent store 230 is used by servers 180 to maintain static data. A persistent store system service module 352 serves as the mechanism that allows other subsystems 300 to access information from the persistent store 230. The persistent store system service module 352 translates subsystem requests into database requests. In one embodiment, the database merges a plurality of distributed storage systems together to form the persistent store 230. For example, the database may be provided as an ORACLE database, manufactured by Oracle Corporation, of Redwood City, Calif. In other embodiments, the database can be a Microsoft ACCESS database or a Microsoft SQL server database.

The persistent store system service module 352 services data requests or writes to the persistent store 230 that are received from a variety of potentially disparate requesting entities. The requesting entities reside on servers 180 that are part of the same server farm 110 as the persistent store 230. The requesting entities may also reside on platforms that are normally incompatible with that of the database providing the persistent store 230.

In order to service data requests from disparate entities, the persistent store system service module 352 translates requests made using an external data model into a database request using the internal data model used by the database providing the persistent store 230. Each of the requesting entities incorporates their particular external data model in an event that is transmitted to the persistent store system service module 352. In some embodiments, the internal data model closely approximates the external data models so that elements of the internal data model may be used as primitive blocks in building the external data model when responding to a request.

The persistent store system service module 352 essentially converts an event message submitted by the requesting entity in an external data model format into a locally understood internal data model format, and vice versa, in order to service the request. The internal and external data models supported by the persistent store system service module 352 can, for example, correspond to the lightweight directory access protocol (LDAP) data model or other protocol or database formats. The ability to convert external data models from a number of different requesting entities into a single internal data model (and vice versa) enables the persistent store system service module 352 to provide uniform access to data stored on the persistent store 230.

The information typically stored on the persistent store 230 includes, for example, system configuration information, security information, application settings common to a particular server farm 110, application hierarchy, common application objects, and unique identifiers for each stored object. In one embodiment, the stored information can be organized as entries that represent certain objects, such as a server 180, a subsystem 300, or a user. Each entry includes a collection of attributes that contain information about the object. Every attribute has a type and one or more values. The attribute type is associated with a particular syntax that specifies the kind of values that can be stored for that attribute.

In one embodiment, the objects in the persistent store 230 may be stored in a database file and, in this embodiment, the persistent store 230 maybe searched using traditional database requests. In another embodiment, the distinguished name of the requested data as specified by the external data model is mapped to the implicit or pre-defined schema stored on the persistent store 230. The pre-defined schema may include one or more fields that allow the objects within the database to be arranged as a tree data structure (e.g., a binary tree). For example, each entry in the persistent store 230 may include a "ParentID" field, a "NodeID" field, and a "Node Name" field as shown in Table 1 below, which allow the persistent store 230 to be searched as a tree data structure. For this embodiment, every object stored in the persistent store 230 may have an attribute that specifies the location of the object in the tree. This location can be an absolute position in the tree with respect to the root node or relative to the locations of other objects in the tree (e.g., relative to a parent node). Table 1 illustrates an exemplary arrangement of objects in the persistent store 230 that can be traversed like a tree:

           TABLE 1
        Parent Node ID       Node ID        Node Name
            none               0           Root (implied)
              0                1           farm_1
              0                2           farm_2
              1                3           Authorized_users
              2                4           Authorized_users
              3                5           user_1
              4                6           user_1


To avoid having to traverse the entire tree upon each access to an object in the persistent store 230, a requesting subsystem 300 can dynamically bind to a particular node in the tree to serve as a starting point for traversing the tree. The particular node in the tree depends upon the type of subsystem. Generally, each subsystem 300 owns part of the tree, that is, the subsystem owns those objects that it stored in the tree. Thus, the particular node can operate as a root node for objects that the subsystem owns and a starting point from which to traverse those objects. For example using Table 1, a subsystem 300 can bind to the authorized_users node to serve as a starting point for searching for a particular user.

As an illustrative example, consider that the administration tool 140 wants to authenticate whether a remote user of the server farm 110 is authorized to access an application program on a particular server 180 that is part of that server farm 110. The administration tool 140 directs an administration subsystem (not shown) to send an event message to the persistent store system service module 352 via the service locator 354 and the event delivery object 312 to obtain the desired information. The persistent store system service module 352 receives and parses the event message to obtain the distinguished name of the entry (described below) and attributes that are being requested.

The format of the distinguished name corresponds to the external model used by the administration subsystem when forming the event message. An example of such a distinguished name is "root/farm_name/authorized_users/user_1." Assuming that the contents of the persistent store 230 are organized into a single tree, the persistent store system service module 352 traverses the tree to obtain information about the authorized users of that particular application. The persistent store system service module 352 traverses "down" the tree to determine whether the last node traversed matches the distinguished name (in this case, whether the user_1 is included as an authorized user). In this manner, as long as the distinguished name in the external model maintains a hierarchical order that corresponds to a tree structure (internal model) in the persistent store 230, the individual/arbitrary formats of each element of the distinguished name do not need to be analyzed.

Data ownership and security issues are also important considerations when sharing a common persistent storage environment across multiple subsystems (requesting entities). The subsystem 300, which is the source of the data, sets the access restrictions via the SAL API of the persistent store system service module 352 that limit the exposure of the data to an authorized subset of requesting entities via the SAL API.

3.5 The Dynamic Store System Service Module

The dynamic store 240 operates as a global database that stores records accessible by each server 180 in a zone 260, 270. In one embodiment, each stored record is an attribute-value pair. An example of an attribute is subsystem identifier; an example of a value is the actual subsystem ID number. Each subsystem 300 that uses the dynamic store 240 defines the schema of the records that are created and stored for that subsystem type. Different subsystems generally have different schemas. The first call that a subsystem 300 makes to the dynamic store 240 registers the schema that subsystem will use. Subsequently, all subsystems 300 of the same subsystem type that register with the dynamic store 240 can access the records created according to that registered schema. As part of registering the schema, a subsystem 300 can specify which attributes may be used for searching. In one embodiment, a subsystem 300 identifies one or more attributes that will be frequently used to search the record table.

In one embodiment, each record is stored by both the server 180 creating the record as well as the server 180 responsible for storing records of that type. For embodiments in which more than one zone 260, 270 exists in a farm 110, a record is stored on a server 180 in each zone 260, 270 identified by the zone master of each zone as the server 180 that stores records of that type. The server 180 creating the record essentially acts as redundant storage for the table record. In some embodiments, the table owner updates the server 180 creating the record with subsequent changes to the record. Within a zone 260, 270 the definitive authority as to the correct value of a table record is the table owner, i.e., the server 180 chosen by the zone master to store data records of that type. Between zones, the definitive authority as to the correct value of a table record is the table owner in the zone from which the record originated. Although there are definitive authorities as to the correct value for a table record, no definitive authority exists as to the contents of a table--a table's contents are the union of all table records stored throughout the farm 110.

Each server 180 in the server farm 110 has a dynamic store system service module 356 that handles all calls from subsystems 300 to the dynamic store 240. The dynamic store system service module 356 permits each subsystem to perform database operations on the dynamic store 240. The operations are: (1) to insert a record, (2) to delete a record, (3) to search the dynamic store 240 to retrieve all records satisfying certain specified criteria, and (4) to update one or more values for attributes in an existing record.

When a record is inserted into a table or when a record is updated, the server 180 requesting the change locally stores the record and forwards it to the owner of the table. The name of the server changing or creating the record can be added as an attribute to the record to facilitate informing that server of subsequent changes to the record that may be effected by other servers 180 in the farm 110.

The requesting server 180 uses its local copy of the record if the table owner changes unexpectedly, for example, if the table owner crashes. When the zone manager detects this problem and designates a new table owner, the servers 180 in the server farm 110 upload locally-stored table records to the new owner.

Records can be queried based on attribute, and any number of records may be returned from a query. When a server 180 receives a query request, it forwards the request to the table owner, which performs the search and returns the results. The server that originated the query may cache the search results depending on various criteria such as configuration or record consistency parameters.

The delete operation is similar to a query, in that any valid search parameters can be used to specify which records to delete. This allows for operations such as "delete all records from host ABC."

Just as with a query request, the delete request is forwarded to the appropriate table owner. Since some of the records being deleted may have been created on the requesting server, the table owner returns a list of the records that were actually deleted. This allows the local server 180 to delete locally-stored records.

In one embodiment, when a subsystem 300 registers its schema (i.e., defines the data structure) with the dynamic store 240, that subsystem 300 also supplies one or more parameters that specifies usage information about records. One such parameter controls "update latency," that is, the frequency at which the records are updated. Every subsystem 300 on every server 180 can independently determine this frequency and therefore every server 180 in the server farm 110 can see the same information in the records associated with that subsystem 300.

Another parameter is the "time to live after originating host is no longer present." This parameter is useful for maintaining the record although the originator of the record is no longer active. When the time to live is set to zero, the record is deleted immediately after the absence of the originating host is detected by the record owner, i.e., the collector point responsible for collecting records of that type. The record owner is the only subsystem entitled to delete this record. Yet another parameter is a "time to live" parameter that results in automatic deletion of a record by the dynamic store system service module 356 when the "time to live" is exceeded. Time starts from the insertion of that record into the dynamic store 240.

Through communication among the servers in the server farm 110, there is a dynamic election of a master server in every zone defined in the server farm 110. After the master server is elected, all other servers in the zone know the identity of the master server, as described in more detail below.

At least one copy of every record in the dynamic store 240 exists in each zone. In one embodiment, the master server of the zone stores every record in memory local to that master server. In another embodiment, the master server distributes the dynamic store 240 in the local memory 325 of some or all of the servers 180 in the zone based on record type. The determined server is thus designated as the collector point for that record type.

Should one of the servers in the server farm fail, the master server chooses a new server in the zone to hold the type of records that the failed server previously held. This new server requests an update of those records from every other server in that zone to replace the records that became inaccessible when the server failed. Because every server keeps a copy of the records that pertain to that server, the update restores the content of the dynamic store 240. If the master server fails, any server in the zone that detects the absence of the master server initiates an election for a new master server.

In one embodiment, master servers are the only servers that know the master servers of the other zones 260, 270. To obtain this information, each master server queries every server in each other zone 260, 270, seeking a response that identifies the master server of that zone 260, 270. Zones are preconfigured, and the identity of servers associated with zones 260, 270 is stored in the persistent store 230. Periodically, each master server of a zone 260, 270 sends the records in the dynamic store 240 for that zone 260, 270 to the master servers in the other zones 260, 270. In another embodiment, each server that holds the records sends a copy of those records to corresponding servers in the other zones 260, 270. Such servers determine who are the corresponding servers in the other zones 260, 270 from information collected by the master server of its own zone 260, 270.

3.6 Service Locator System Service Module

Referring again to FIG. 3, the service locator 354 is in communication with each subsystem 300 over the event bus 310 (or via its internal API). The service locator 354 identifies a server 180 for servicing events issued to other subsystems 300. The identified server 180 can be local or remote. In brief overview, a source subsystem 300 may create or issue an event for which the host of the destination subsystem is not determined before the source subsystem 300 issues the event. In these cases, the source subsystem 300 uses either a SAL API call or an internal API call provided by the service locator 354 to either (1) obtain the address of the server 180 hosting the destination subsystem 300 or (2) request that the service locator 354 deliver an event to the destination subsystem 300 on behalf of the source subsystem 300.

The service locator 354 identifies a destination host by accessing information maintained in the dynamic store 240 through the dynamic store system service module 356 (see section 3.5). This information provides a zone-wide inventory of the server components in the server farm 110; that is, the information indicates which subsystems (and the versions of those subsystems) are installed on every server 180 in the server zone 260, 270. This information also indicates which of such servers 180 in the zone 260, 270 are currently operating. Thus, through this information, the service locator 354 has knowledge of all available subsystems 300 in the zone 260, 270.

Every server 180 in the server farm 110 has a service locator 354 that contributes to the zone-wide information in the dynamic store 240. For example, when a server 180 becomes operational, each subsystem 300 installed on the server 180 registers with the service locator 354. In one embodiment, the service locator 354 provides a "RegisterService" function that may be called by a subsystem (either through the SAL API or the internal API of the service locator 354) in order to register services that it can provide to other subsystems. In one embodiment, subsystems 300 register with the service locator 354 each version of each event that the subsystem 300 will process. In another embodiment, the RegisterService function also accepts as a parameter a rank value, which indicates the relative importance of the subsystem 300. Upon receipt of the registration message, the service locator 354 makes an entry into the dynamic store 240 for that subsystem 300. The entry includes the information provided by the subsystem, such as its identifier and its rank, when provided. Table 2 below depicts one embodiment of a table stored in the dynamic store 240.

            TABLE 2
            Subsystem
            ID              Rank        Zone       Host ID
            FFFF              1           A          0015
            AAAA              0           A          0012
            FFFF              1           A          0009
            AAAA              0           A          0006


When a server 180 shuts down in a controlled fashion, it is removed from the zone 260, 270, and an "UnregisterService" call is made to the service locator 354 by each subsystem 300 resident on that server 180. This call informs the service locator 354 that those subsystems are no longer present in the zone 260, 270. In some embodiments, the service locator 354 instructs the dynamic store 240 to discard records associated with a server 180 that terminates execution unnaturally, e.g., crashes.

To determine the target host for servicing an event, the service locator 354 determines certain information: (1) which servers 180 host the type of subsystem 300 identified in the event as the destination subsystem, and (2) which of such servers 180 is the target host for handling the event. Upon determining the target host, the service locator 354 either returns the determined address to the requesting subsystem 300 or it modifies a received event to include the determined address as the addressing information for the event and it delivers the modified event to the event bus 310 for delivery to that host.

Referring back to Table 2, an embodiment of a table stored in the dynamic store 240 by service locators 354 is shown that includes entries for two subsystems (having identifiers FFFF and AAAA). Each entry includes a subsystem identifier, a subsystem rank, a zone identifier, and a host identifier. The service locator 354 receives a request for an address (or a request to deliver an event to a host) and accesses the table stored in the dynamic store 240. In some embodiments, the service locator 354 provides two function calls that return a target host identifier to the requesting subsystem 300: "GetBestHost," which returns the host identifier associated with a host that can handle a particular type of event; and "GetBestHostFromList," which returns a target host identifier selected from an input list of hosts. If the table has only one entry for which the subsystem identifier matches the subsystem identifier provided in the API call, the host identifier from that table entry is returned to the requesting subsystem 300. If more than one table entry has a matching subsystem identifier, i.e., there is more than one host in the zone that can process the subject event, a host identifier is selected based using a predetermined rule or set of rules. For example, a host identifier may be selected at random, in round-robin order, based on the rank associated with the table entry, or based on other information that may be stored in table such as network latency to host, available bandwidth of channel between requesting subsystem 300 and target host, or geographic proximity to the requesting subsystem 300.

The service locator 354 may also provide API calls for sending an event to the target host on behalf of the requesting subsystem 300. In these embodiments, if only one of the other servers in the zone can process the identified message, i.e., there is only one entry in the table, then the service locator 354 inserts the host identification of that server into the event and sends the modified event to the event bus 310 for delivery to the target host. If more than one other server in the zone has the destination subsystem, then the service locator 354 chooses one of the servers using any of a variety of criteria as described above, modifies the event as described above, and transmits the modified event to the target host.

Using Table 2 as a specific example, a subsystem 300 may issue a GetBestHost call for a subsystem having an identifier of "FFFF." Two servers host that subsystem, identified by an identifier of 9 and 15. The identifier corresponding to either of these hosts may be returned to the requesting subsystem. In one embodiment, the system administrator can force one of the two subsystems to be elected by changing the "rank" values in the table. For example, if the entry associated with host "15" has a higher rank than the entry associated with host "9," host "15" may always be selected as the target host.

3.7 Subscription Manager System Service Module

The subscription manager 362 manages subscriptions for a server 180. A subscription is a standing request by which a subscribing subsystem 300 publicizes to the subscription manager 362 of the local server and/or to the subscription managers of remote servers that the subscribing subsystem wants to be notified upon the occurrence of an event. The registered subscription identifies the event and the subscribed-to subsystem that produces the event. Upon the occurrence of that event, the subscription manager 362 sends the event to any subsystem that has registered a subscription to that event by way of the event delivery object 312.

The subscription manager 362 uses two tables for managing subscriptions: (1) a local subscription table 450, and (2) a remote subscription table 418.

3.7.1 Local Subscription Table

The local subscription table 450 resides in local server memory 325 and stores subscriptions for which the specified scope is local. Using the local subscription table 450, the subscription manager 362 can alert local subsystems 300 of the occurrence of particular events on the server 180. Any local subsystem 300 on any server 180 can request to be notified when a particular subsystem issues a particular event by posting a subscription for that occurrence in the local subscription table 450.

Referring to FIG. 4B, and in more detail, the local subscription table 450 includes an entry 460 for each posted subscription. In one embodiment, each entry 460 of the local subscription table 450 includes event field 462 identifying a unique event, a subsystem field 464 identifying the subsystem that owns (i.e., generates) the unique event, and a destination subsystem field 468 identifying the subsystem 300 subscribing to the unique event. An exemplary local subscription is illustrated in FIG. 4B in which subsystem 300 seeks to be notified when subsystem 300' posts an "I'm Up" event to the event delivery object 312. For purposes of illustrating this subscription, names corresponding to the subsystem 300' and the service locator 354 appear in the fields 464 and 468, respectively, but the actual implementation of this subscription can use pointers to such subsystem 300' and service locator 354.

3.7.2 Remote Subscription Table

A remote subscription table 480 is stored in the dynamic store 240 and stores subscriptions registered by specific remote servers or having a scope specified as zone or farm-wide. Placing such subscriptions in the dynamic store 240 makes the subscriptions accessible farm-wide by subscription managers 362 of every other server 180 in the server farm 110. In one embodiment, shown in FIG. 4C, the remote subscription table 480 is implemented as three separate tables: a first table 480' stores subscriptions to events that may occur in the same "zone," a second table 480" stores subscriptions to events that may occur anywhere in the server farm 110, and a third table 480" stores subscriptions to events that may occur on a specifically identified remote host.

In more detail, each table 480', 480", and 480'" (generally 480) includes an entry 484 for each posted subscription. In one embodiment, each entry 484 includes an event field 492 identifying a unique event, a subsystem field 494 identifying the subsystem that owns (i.e., generates) the unique event, a destination subsystem field 496 identifying the subsystem 300 subscribing to the unique event, and a subscribing host field 498 identifying the host of the subscribing subsystem. The table 480'" further includes a source host identifier 488 for identifying the specific remote host upon which the subscribed-to subsystem resides. An exemplary subscription is illustrated in FIG. 4C in which subsystem 300 seeks to be notified when subsystem 300' of a particular remote host server 180' posts an "I'm Up" event. For purposes of illustrating this subscription, which is placed in the specific remote table 480'" of the remote subscription table 480, names corresponding to the servers 180, 180' and subsystems 300', 300 appear in the entry 484, but the actual implementation of this subscription can use pointers to such servers 180, 180' and subsystems 300', 300.

The subscription manager 362 provides three functions that can be called by other subsystems 300: (1) Subscribe, (2) Unsubscribe, and (3) PostNotificationEvent. In one embodiment, these functions are called through the SAL 304 associated with the subscription manager 362. In another embodiment, the functions are called through the internal API provided by each subscription manager 362.

3.7.3 Subscribe Function

When a subsystem 300 wants to subscribe to an event of another subsystem 300, the subscribing subsystem 300 calls the Subscribe function (either via a SAL API call or an internal API call) provided by the subscription manager 362. The Subscribe function instructs the subscription manager 362 to register a subscription in either the local subscription table 450 or in the remote subscription table 480 held in the dynamic store 240. The subscribing subsystem 300 specifies the scope of the subscription: local, zone, or farm-wide. In one embodiment, the specific SAL call used by the subscribing subsystem 300 determines the scope of the subscription. In another embodiment, the scope is an input parameter of the SAL call. The event delivery object 312 of the event bus 310 dispatches the Subscribe event to the subscription manager 362.

Typically, those subsystems 300 that are initialized after the subscription manager 362 is initialized call the Subscribe function during the initialization of such subsystems 300. The Subscribe function can also be called anytime during server operation by any subsystem. Input parameters to the Subscribe function uniquely identify the subscribing subsystem, the event for which the subscribing subsystem requests notification, the subscribed subsystem to be monitored, and, optionally, the scope of the subscription.

In one embodiment, the parameters uniquely identifying the subscribing and subscribed subsystems 300 may each be implemented as two separate entities: a value identifying the subsystem 300 and a value identifying the host on which the subsystem 300 resides. In other embodiments, the Subscribe function returns an output value representing the status of the subscription request, such as successfully registered.

Upon receiving the Subscribe function call, the subscription manager 362 determines the scope of the subscription from the type of SAL call 304 used to deliver the Subscribe event. If the scope of the subscription is for a local subsystem, then the subscription manager 362 stores a corresponding subscription entry in the local subscription table 450. If the scope of the subscription is remote, the subscription manager 362 communicates with the dynamic store subsystem 370 over the event bus 310 to register the subscription in the appropriate section of the remote subscription table 480 in dynamic store 240.

3.7.4 Unsubscribe Function

A subscribing system 300 can remove a previously registered subscription from the local and remote subscription tables 450, 480 by issuing an Unsubscribe function to the subscription manager 362. Such subscribing subsystem 300 can unsubscribe to only those subscriptions that the subsystem 300 has previously registered. Input parameters to the Unsubscribe function uniquely identify the subsystem requesting removal of the subscription, the event for which the subscribing subsystem no longer requests notification, and the subsystem having the subscription to be removed. The input parameters that uniquely identify the subscribing and subscribed-to subsystems are implemented in one embodiment as two separate entities: a value identifying the subsystem and a value identifying the host on which that subsystem resides.

In response to an Unsubscribe function call, the subscription manager 362 searches the local subscription table 450 and remote subscription tables 480 and removes every entry corresponding to the subscription to be removed. To remove the subscription from the remote subscription tables 480, the subscription manager 362 sends a delete request to the dynamic store system service module 356 to remove the entries from the dynamic store 240. The Unsubscribe function returns an output value representing the status of the removal of the subscription, such as successfully completed.

3.7.5 PostNotificationEvent

Some subsystems 300 produce events that may be subscribed to by other subsystems that are local and/or remote to these subsystems. Upon issuing such an event, such subsystems 300 also call a PostNotficationEvent function to send a copy of this event to the subscription manager 362. The subscription manager 362 issues a copy of that event to local or remote subscribing subsystems 300. The subsystems 300 call the PostNotificationEvent function regardless of whether any subsystem has actually registered a subscription to that event, because only the subscription manager knows if an event has been subscribed to by another subsystem.

FIG. 5A shows an embodiment of a process used by the subscription manager 362 upon receiving (step 510) a Subscribe function command. From the event type, the subscription manager 362 determines (step 514) whether the scope of the subscription event is remote. If the subscription is not remote in scope, the subscription manager 362 stores (step 518) the subscription in the local subscription table 450. When the scope of the subscription is remote, the subscription manager 362 determines (step 522) whether the subscribed-to event is in the zone, farm-wide, or for a specific remote host. Then the subscription manager 362 inserts (step 526) the subscription into the appropriate table 480', 480", 480'". The inserted subscription (hereafter, a subscription record) follows the particular schema defined by the subscription manager 362. A similar process is used to remove subscriptions from the subscription tables 450 and 480 upon receiving an Unsubscribe call.

FIG. 5B shows an embodiment of a process used by the subscription manager 362 for each PostNotificationEvent received (step 550) by the subscription manager 362. The subscription manager 362 determines (step 554) if the event exists in the local subscription table 450. If the event is subscribed to by one or more local subsystems, then the subscription manager 362 generates (step 558) a copy of the event to be delivered to each subscribing local subsystem. Each copy of the event is placed in its own event buffer 380.

Then the subscription manager 362 checks (step 562) the zone table 480' for any subscribing servers in the same zone. Similarly, the subscription manager 362 requests searches (steps 566 and 570) for subscriptions in the farm-wide section 480" and specific remote host section 480'", respectively, of the remote subscription table 480. In one embodiment, for each access to the remote subscription tables 480, the subscription manager 362 issues an event to the dynamic store system service module 356 that causes the desired search.

Then, in one embodiment, rather than search the local dynamic store 240 directly, the subscription manager 362 sends a copy of the event to a subscription dispatcher. The subscription dispatcher is one of the servers 180 in the server farm 110 that is dedicated for dispatching events to remote subscribers (i.e., another server in the same or different zone). The subscription dispatcher is identified as the target host in the zone for handling subscribed-to events.

For each received event, the subscription dispatcher performs a search operation on the remote subscription tables 480 in the dynamic store 240 and retrieves all subscription records corresponding to subscribers of that event. Each retrieved subscription record corresponds to one subscription. The subscription manager 362 then produces an event for each retrieved record, inserting the identification of the subscribing subsystem into the appropriate field in that event.

3.8 Host Resolution System Service Module

A subsystem 300 may target events to another subsystem residing on a remote server. Parameters associated with issuing such events include a unique host identifier corresponding to the remote server. The host resolver 360 receives such events from these source subsystems 300 (and in other embodiments from other system service modules 350) requesting that a distinguished name be obtained for the remote server. To obtain the distinguished name, the host resolver 360 sends an event that includes the unique host identifier to the persistent store system service module 352. The persistent store system service module 352 uses the unique host identifier to search the persistent store 230 for a corresponding distinguished name, and returns the distinguished name and the port address to the host resolver 360. The host resolver 360 can return the distinguished name and port address to the source subsystem 300 or it may forward the event received from the source subsystem 300 to the host identified by the distinguished name on behalf the source subsystem 300.

3.9 Zone Manager System Service Module

Each server 180 in the server farm 110 includes a zone manager 358 that directs accesses to the dynamic store 240 made by the dynamic store system service module 356 to the server 180 responsible for collecting data of the type identified in the access. One of the zone managers 358 in a server farm 110 is elected by its peers to be the master of the server farm 180. When acting as a master, a zone manager 358 (1) determines which server 180 collects each type of data, (2) designates which servers 180 in the farm 110 are responsible for providing various network services, and (3) identifies the zone master of other zones 260, 270 in the farm 110. As described above, the dynamic store 240 may be distributed among more than one server 180 in a server farm 110.

3.9.1 Assigning Ownership of Distributed Resources

The dynamic store 240, in one embodiment, comprises one or more record tables managed by the dynamic store system service module 356. Record tables store information relating to server farm run-time data, such as dynamic subscription tables and disconnected sessions. The dynamic store system service module 356 queries the zone master to determine which server 180 in the zone 260, 270 stores the various record tables.

The dynamic store system service module 356 can use the services of the zone master through a zone master interface, which in one embodiment provides a service called GetZoneResourceOwner. This service accepts as input a unique string identifier of an arbitrary resource, and returns the identity of the server 180 that should own a given resource. The dynamic store 230 is thus able to call GetZoneResourceOwner, passing the name of the dynamic store record table whose owner is desired, and the zone master will return the identity of the server 180 that owns that resource, i.e., that stores the dynamic store 230 records for that resource.

In further embodiments, the zone master chooses which server 180 in a server farm 110 stores dynamic store record tables. In these embodiments, the zone manager may choose a server 180 based on physical characteristics, such as available memory, or other criteria, such as proximity to (either logically or physically) those entities requesting the dynamic store records. In other of these embodiments, the zone master may change which server 180 stores the record table during server farm operation.

3.9.2 Assignment Ownership of Network Services

In some embodiments, certain services provided by service modules may be centralized, to allow all of the servers 180 in a zone 260, 270 make service request directly to the same zone server. An example of this might be a licensing server. In this example, all requests for a license would be directed to a single server 180 in the zone 180.

The service locator system service module 354 tracks which services are available on which servers 180 in the zone 260, 270. Although in one embodiment the main purpose of the service locator system service module 354 is to find the `best` host for a given service that may be available on many servers 180 in the zone 260, 270, it is also responsible for sending messages to centralized service modules. The determination as to which of the zone's member servers should be responsible for handling a given centralized service is made by the zone master in a similar way to how it assigns ownership of zone resources. Thus, the service locator system service module 354 the zone master to determine where requests for such services should be directed.

A master election can occur when a new server is added to a zone 260, 270. Alternatively, any zone manager 358 can initiate an election if the master fails to respond to a query, i.e., the master has failed.

In one embodiment, any zone manager 358 may force an election at any time by broadcasting a request election event. The election results are determined by a comparison of the set of election criteria which is transmitted within the request election event transmitted by the requesting zone manager 358 with the set of election criteria maintained on each receiving zone manager 358. That is, the first election criterion from the event of the requesting zone manager 358 is compared by the receiving zone manager 358 to the first criterion of the receiving zone manager 358. The highest ranking of the two criteria being compared wins the comparison and the zone manager 358 with that criterion wins the election. If the two criteria tie, then the next criteria are sequentially compared until the tie is broken.

Election criteria may be whether or not the zone manager 358 is statically configured as a master; whether the zone manager 358 is resident on the longest running server; and whether the server on which the zone manager 358 is resident has a lexically lower network name.

The interaction of zone manager system service and the dynamic store system service modules 358, 356 to manage and access the dynamic store 240 is discussed in greater detail below (see section 6.3).

3.10 System Module

The system module 320 is an executable program (.exe) that manages the boot-up of the server 180. Like each subsystem 300, the system module 320 is addressable (i.e., can be the target of an event) and includes an event queue 324 to receive events, such as "SetListeningPort," which sets the transport protocol port address on which the transport layer 260 "listens" for communication events. Another example of an event that can be directed to the system module 320 is "LoadSubsystem," which instructs the system module 320 to load a subsystem. Upon execution, the system module 320 initializes the event delivery object 312, the transport layer 318, and the loader module 330. The system module 320 also binds the transport layer 318 to the event delivery object 312. In one embodiment, the system module is provided as a WINDOWS NT service. In another embodiment, system module 320 is provided as a Unix daemon.

3.11 Loader

The loader module 330 allows for customization of the event bus 310 for different platforms and applications. The loader 330 can be implemented as a C++ class, implemented as static code or as a dynamically linked library. In brief overview, the loader module 330 uses several functions to manage the subsystems 300. In general, the functions performed by the loader module 330 create and destroy subsystems 300. Operation of the loader module 330 is described in more detail in connection with FIG. 4.

The loader module 330 uses a create function, having as input a subsystem identifier, to generate an instance of each subsystem 300. For embodiments in which an event queue is associated with the subsystem 300, the create function invokes an instantiation of an event queue in the event delivery object 312 and the loader 330 binds the event queue to the discovered subsystem 300. In other embodiments, the subsystem 300 is identified by a pointer that is entered in the dispatch table 316 to identify the subsystem 300.

The event delivery object 312 uses the pointer stored in the event delivery object 312 (in some embodiments the pointer identifies an event queue) to send events to the subsystem. The subsystem 300 uses a pointer to the event delivery object 312 to deliver event to the event bus 310. Thus, for example, in embodiments in which the interfaces are provided as C++ classes, the pointers identify the desired classes. In some embodiments, this function can return a status value. The loader module 330 uses a destroy function to delete an instance of the subsystem 300 (together with an event queue, if provided, associated with that deleted subsystem) and the corresponding entry in the dispatch table 316.

4.0 Server and Subsystem Initialization

FIG. 6 illustrates an embodiment of a process used to initialize a server 180, including system service modules 350 and personality subsystems 300. A server 180 executes boot service code (i.e., the system module 320) that creates the event bus 310. In the embodiment shown in FIG. 6, creation of the event bus 310 includes the steps of creating an event delivery object 312 (step 604), creating a transport mechanism 318 (step 608), and binding the event delivery object 312 to the transport layer 318 (step 612).

The system module 320 instantiates a loader module 330 (step 616) and starts (step 620) execution of the loader module 330. The loader module 330 creates and loads (step 624) a specialized subsystem identified by an initialization resource. In some embodiments, the specialized subsystem is identified by an entry in a registry file. For embodiments in which system service modules 350 are provided as subsystems, the specialized subsystem instructs the loader module 330 to create and load all required system service modules 350 (step 628). The specialized subsystem also determines which personality subsystems 300 should be loaded for the server 180 (step 632). In one embodiment, the specialized subsystem accesses a registry file to determine which personality subsystems 300 should be loaded and the registry file specifies an order in which the personality subsystems are loaded. For embodiments in which the system service modules 350 are provided as subsystems, the registry file also specifies the order in which they are initialized. In one particular embodiment, the registry file specifies the following order: the persistent storage system service module 352, the dynamic store system service module 356, the zone manager 358, the host resolver 360, the service locator 354, the subscription manager 362.

In another embodiment, the specialized subsystem accesses an initialization file to determine which subsystems should be loaded. In still other embodiments, the specialized subsystem accesses the persistent store 230 to determine which subsystems should be loaded. As part of loading the subsystems 300, the loader module 330 populates (step 636) the dispatch table 316 with entries 420 that map subsystem entry points to subsystem identifiers associated with the loaded subsystems 300, as shown above in FIG. 4A.

Each subsystem 300 can be represented by an entry in the initialization resource, i.e. installed on the server 180, because (1) the subsystem is necessary to the operation of the server 180, or (2) the subsystem is anticipated to be useful. In one embodiment, another reason for installing a subsystem 300 is that the subsystem 300 is requested by the arrival of an event directed to that subsystem (i.e., on-demand). For such embodiments that implement on-demand loading, the loader module 330 waits until an event is received directed to that subsystem before creating that subsystem. In these embodiments, the loader module 330 provides an API that allows the loader module 330 to be invoked during run-time to create and initialize a personality subsystem 300.

5.0 Events

FIG. 7A depicts an embodiment of an event 700 that includes an event header 710 and event data 730. The event header 710 is sometimes referred to as "control data" and event data 730 may be referred to as "payload data."

Referring now to FIG. 7B, the event header 710 includes one or more data fields that indicate various attributes associated with the event 700. For example, the event header 710 may include: a unique event identifier (event UID) 712; an event header version identifier 714; an event data version identifier 716; an event data size indicator 718; an event data offset identifier 720; a unique identifier (UID) 722 identifying a source subsystem; a destination subsystem UID 724 identifying a destination subsystem; and a channel identifier 726, described in detail below.

In more detail, the event UID 712 uniquely identifies each event produced by the subsystems 300 and system service modules 350. Every subsystem and system service module 350 predefines the event IDs of those events that it accept. The event IDs are hard-coded and unique for each server 180. Uniqueness of an event 700 within a server 180 is established by the combination of the source subsystem UID 722 and the event ID 712, and are used in combination to map events to handler routines as described above.

Identifiers of the source host and the destination host may be passed as parameters in the SAL commands used to issue events 700. In such cases, the source host identifier and the source subsystem UID 722 together uniquely identify the sender (i.e., the source server and subsystem) of the event 700. The destination host identifier and the destination subsystem UID 724 uniquely identify the subsystem or system service module 350 targeted to receive the event.

In one embodiment, the highest order bit of the event UID 712 is a "request bit" and indicates to the receiving subsystem how to map the event to the proper handler routine. All subsystems can optionally choose to handle events of another subsystem through such mechanisms as subscriptions. The event handler routines are mapped according to subsystem UID and event UID 712. Because the event being processed can either be directed or subscribed to, the request bit indicates whether to use the source or destination subsystem UID to map the event to the proper handler routine.

The event header version identifier 714 defines the layout of the event header 710, such as the size and order of fields in the header 710. The event data version identifier 1416 implicitly defines the layout of the event data 730 included in the event 700. The event data offset identifier 720 indicates the offset from the event header 710 at which the event data 730 begins. The event data offset 720 equals the size of the event header 710. The channel identifier 726 is used to match a reply event to a request event.

5.1 Event Types

Events can be one of several types including directed and notification events.

5.1.1 Directed Events

Directed events are events that have a specified destination subsystem 300 when sent to the event delivery object 312. The specified destination includes a unique identification (UID) of the destination subsystem 300 and an identifier of the server 180 hosting the destination subsystem. Examples of directed events include notification events and the request and reply events described below.

5.1.1.1 Request-and-Reply Events

Request events are subsystem specific directed events that send a request for service or functionality to another subsystem on the same server 180 or to a remote server in the server farm 110. Such request events contain codes that the destination subsystem can map onto known interfaces (i.e., event handler routines) to provide that service or functionality. Each request event includes a unique channel ID for use by the destination subsystem when creating a corresponding reply event.

Reply events occur in response to request events. Each reply event is delivered as a directed event to the subsystem from which the corresponding request event originated. The reply event specifies the same channel ID and the same event buffer 380 used by the corresponding request event. The subsystem that sent the request event waits for the reply event from the event delivery object 312. The same channel ID indicates to the event delivery object 312 that the reply event is to pass directly to the destination subsystem rather than be placed in an event queue associated with the destination subsystem.

The following pseudo-code embodies an example of a reply event handler routine that is called in response to receiving a request event. In particular, for the following example, the destination subsystem has a event handler routine, called OnGetSampleData(EventBuffer* pEvent), that is called in response to a GetSampleData request event. This event handler routine places data in the reply event buffer, pointed to by the pointer "pReplyEvent".

RESULT Sample::OnGetSampleData(EventBuffer* pEvent)

{

if (SUCCESS==Create Reply Event(&pReplyEvent, SetSampleDataReply, event version, subsystem, size))

{

put_data_in_event_buffer;

res=PostEvent(pReplyEvent); // send event to the Event bus

}

delete(pEvent); //

return res;

}

The OnGetSampleData reply event handler routine calls a CreateReplyEvent that creates a reply event to the original request event. As noted above, the reply event is placed in the event buffer used to hold the original request event (i.e., pointed to by pEvent), thus overwriting the request event. A new pointer, pReplyEvent, points to the reply event in the event buffer, and the old pointer, pEvent, is deleted.

The Create_Reply_Event, as the name suggests, creates the reply event according to supplied input parameters. One input parameter is the identification of the reply event, here SetSampleDataReply, and the version of the reply event, here 1. All events are associated with an event ID 712, which together with the subsystem ID 722 of the source subsystem produces a unique identifier for that event.

Another feature of the Create_Reply_Event is that this function automatically specifies the destination subsystem of the reply event, namely, the subsystem that originated the request event. The PostEvent command is one of the functions provided by the event bus API 392 for communicating with the event bus 310. Because the Create_Reply_Event function sets the destination subsystem of the event, the PostEvent command indicates where to the deliver the reply event (i.e., using the dispatch table).

5.1.1.2 Notification Event

A notification event is an event that is directed to the subscription manager 362. Such event is dropped (i.e., ignored) by the subscription manager 362 unless there is an entry in the local subscription table 450 or remote subscription table 418 indicating that at least one subsystem 300 is interested in being notified of the occurrence of that event. Each subsystem keeps a list of events that can be subscribed to by other subsystems, and accordingly produces a notification event after issuing one of these potentially subscribed to events.

5.2 Event Delivery Commands

In general, each subsystem 300 issues five types of commands to deliver events to the event bus 310: PostNotificationEvent, PostEvent, SendEventAndWait, Subscribe, and Unsubscribe. In brief overview, a PostNotificationEvent command sends a directed event to the subscription manager 362 as mentioned above. A PostEvent command sends a directed event to a destination subsystem and allows the source subsystem to immediately continue processing other tasks (that is, the PostEvent command immediately "returns"). A SendEventAndWait command sends a directed event to a destination subsystem and waits for a response causing the source subsystem to block until the response is received. A Subscribe command sends a notification event to register a subscription with the local subscription table 450 and/or remote subscription table 418. An Unsubscribe command sends a notification event to remove a previously registered subscription from the local subscription table 450 and/or the remote subscription table 418.

6.0 Basic Examples

Referring back to FIG. 3, the following examples use one particular embodiment to illustrate the principles of the subject matter described above and are not intended to limit the subject matter of the invention in any way whatsoever.

6.1 PostEvent Command

Referring also to FIG. 8A, when a source subsystem 300 seeks to communicate with a destination subsystem 300' on the same or different server, one method of communicating is for the source subsystem 300 to issue a PostEvent command to the event delivery object 312 through the event bus API 392. The source subsystem 300 determines (step 800) whether the identity of a target server hosting the destination subsystem 300' is needed. For example, a subsystem 300 preparing to issue an event to a peer subsystem 300 on another server 180 would need to determine the identity of the target server 180 hosting the peer subsystem.

If the identity of a target server is needed, the source subsystem 300 communicates (step 802) with the service locator 354. In one embodiment, such communication occurs as a directed event to the service locator 354 delivered over the event bus 310. The directed event may request the identity of the target server or request that the service locator 354 forward the event 700 to the destination subsystem 300' on the target server. In the latter case, the event received by the service locator 354 from the source subsystem contains the event 700 that is to be forwarded. The service locator 354 delivers this contained event 700 to the event delivery object 312 with the target server specified as one of the parameters.

In the embodiment shown in FIG. 8A, the source subsystem 300 does not deliver an event, but calls a function of the internal API 302 of the service locator 354. The service locator 354 then determines (step 804) the target server. In one embodiment, the service locator 354 returns (step 806) the identity of the target server to the source subsystem 300 so that the source subsystem 300 can issue (step 810) the PostEvent command to send the event 700 to the destination subsystem 300'. Alternatively, the service locator 354 issues (step 808) the PostEvent command to send the event 700 to the destination subsystem 300' on behalf of the source subsystem 300 over the event bus 310. For this case, the internal API 302 call contains the event 700 that is to be forwarded to the destination subsystem 300' on the target server.

Upon receiving the event 700, the event bus 310 determines (step 812) whether the event 700 is local or remote from any destination host parameter included in the PostEvent command. If the destination subsystem 300' is remote, the event is delivered (step 814) to the transport layer 318 of the event bus 310 for subsequent transmission to the remote server 180' hosting the destination subsystem 300'. The transport layer 318 then transmits (step 816) the event 700 over the network connection 200 to the transport layer 318' on the remote server 180'. Operation of the transport layers 318, 318' is described in more detail in section 7.2.

If the destination subsystem 300' is local, the event delivery object 312 of the event bus 310 determines (step 818) the entry point associated with the destination subsystem 300' and determines (step 820) whether the destination subsystem 300' is a single-threaded or multi-threaded subsystem. To determine the entry point, the event delivery object 312 examines the dispatch table 316 using the destination subsystem UID 724 of the event 700 as an index into the table 316. In embodiments having event queues, the dispatch table 316 identifies the event queue associated with the destination subsystem 300'. In one embodiment, the event queue indicates whether the destination subsystem 300' is multi-threaded.

If the event queue indicates that the destination subsystem 300' is multi-threaded, the event 700 is not queued. The event delivery object 312 calls (step 822) the DispatchEvent of the subsystem API 306 of the destination subsystem 300', which causes execution (step 824) of the appropriate handler routine of the destination subsystem 300' for responding to the event 700. In an alternative embodiment, a thread executed by the destination subsystem 300' retrieves the request event 700 from the event delivery objects 312'.

If the event queue indicates that destination subsystem 300' is single-threaded, the event delivery object 312 places (step 826) the pointer to the event buffer 380 holding the event 700 in the event queue associated with the destination subsystem 300'. The event delivery object 312 then starts (step 828) a new thread of execution that signals the destination subsystem 300', using the DispatchEvent function of the subsystem API 306, and delivers the event 700 from the event queue to the destination subsystem 300'. This new thread executes (step 824) the handler routine appropriate for the event 700. In one embodiment, the event delivery object 312 dispatches the event 700 (using DispatchEvent) to the destination subsystem 300' without placing the event 700 in the event queue if the event queue is empty when the event delivery object 312 is about to place the event 700 in the event queue. Again, in an alternative embodiment, a thread executed by the destination subsystem 300' retrieves the event 700 from the event queue, rather than the event delivery object 312 pushing the event 700 to the destination subsystem 300'.

In one embodiment, the dispatch table 316 indicates whether the destination subsystem 300' has multi-threading capability. If the dispatch table 316 indicates that the destination subsystem 300' is multi-threaded, the event delivery object 312' calls the DispatchEvent function of the subsystem API 306' of the destination subsystem 300' as described above. Using the dispatch table 316 to store information regarding multi-threaded capability of subsystem makes the use of an event queue for a multi-thread capable subsystem unnecessary.

6.2 SendEventandWait Command

Referring to FIGS. 9A-9D, another method for the source subsystem 300 to communicate with the destination subsystem 300' is for the source subsystem 300 to issue a SendEventandWait command to the event delivery object 312 through the event bus API 392. To start the process, subsystem 300 issues (step 902) a request event 700 using the SendEventAndWait command of the SAL 304 of the destination subsystem 300'. This request event 700 uses a channel identification and specifies the destination subsystem 300' in the destination UID 724. Because the request event 700 is an event for which a response is subsequently expected, the source subsystem 300 blocks further execution of the thread that generated the request event 700 until the response from destination subsystem 300' is received. While this thread is blocked, the source subsystem 300 can communicate with other subsystems through other threads.

In this example, that source subsystem 300 seeks (step 904) a target server from the service locator 354. Note that not every event is sent to the service locator 354 for determining a target server; for some events, such as reply events, the source subsystem 300 does not need to use the service locator 354 because the target server is determined from the request event 700. As described above, the service locator 354 determines (step 906) the target server and returns (step 908) the identity of the target server to the source subsystem 300, and the source subsystem 300 sends the request event 700 to the event bus 310. Alternatively, the service locator 354 issues (step 910') the request event 700 to the event bus 310 on the source subsystem's 300 behalf. The specific action taken by the service locator 354 depends upon the actual request from the source subsystem 300.

The request event 700 passes to the event delivery object 312 of the event bus 310. Assume that the service locator 354 determines the target server to be the remote server 180'. The event delivery object 312 then determines (step 912) from the destination host parameter of the SendEventandWait command that the destination subsystem 300' is on the remote server 180'. Because the destination subsystem 300' is remote to the source subsystem 300, the request event 700 passes (step 914) to the transport layer 318 on the server 180. The transport layer 318 then transmits (step 916) the request event over the network connection 200 to the transport layer 318' on the server 180'.

The transport layer 318' passes (step 918) the request event 700 to the event delivery object 312' of the event bus 310'. The event delivery object 312' of the event bus 310' then determines (step 920) the entry point associated with the destination subsystem 300' and determines (step 922) whether the destination subsystem 300' is a single-threaded or multi-threaded subsystem as described above.

If the destination subsystem 300' is multi-threaded, the request event 700 is not queued. The event delivery object 312' calls (step 924) the DispatchEvent of the subsystem API 306 of the destination subsystem 300', which causes execution (step 926) of the appropriate handler routine of the destination subsystem 300' for responding to the request event 700.

If the destination subsystem 300' is single-threaded, the event delivery object 312' places (step 928) the pointer to the event buffer 380 holding the request event 700 in the event queue associated with the destination subsystem 300'. The event delivery object 312 then starts (step 930) a new thread of execution that signals the destination subsystem 300', using the DispatchEvent function of the subsystem API 306, and delivers (step 932) the request event 700 from the event queue to the destination subsystem 300'. This new thread executes (step 926) the handler routine appropriate for the request event 700. In one embodiment, the event delivery object 312' dispatches the request event 700 to the destination subsystem 300', bypassing the event queue if the event queue is empty when the event delivery object 312' is about to place the request event 700 in the event queue.

The handler routine produces (step 934) a reply event 700' that is posted (step 936) by the destination subsystem 300' to the event delivery object 312' of the event bus 310'. The reply event 700' uses the same channel identifier provided by the source subsystem 300 when it issued the request event 700. After determining that the reply event 700' is for a remote server (here server 180), the event delivery object 312' then passes (step 938) the reply event 700' to the transport layer 318' on the server 180'. The transport layer 318' transmits (step 940) the reply event 700' to the transport layer 318 on the server 180 over the network connection 200.

The event delivery object 312 of the event bus 310 receives (step 942) the reply event 700' through the transport layer 318 of the server 180 and delivers (step 944) the reply event 700' to the waiting thread (i.e., the thread that produced the request event 700). Because the reply event 700' uses the same channel identification used by the source subsystem 300 to initially issue the request event 700, the reply event 700' returns to the waiting thread (i.e., the waiting thread unblocks), bypassing the event queue (if any) associated with the source subsystem 300. If the reply event 700' does not return within a specified timeout period specified in the command, the waiting thread is released. The event delivery object 312 ignores the reply if the reply event 700' arrives after the timeout period expires. The source subsystem 300 executes the appropriate handler routine for the reply event 700'.

In an alternative embodiment, a thread executed by the destination subsystem 300' retrieves the request event 700 from the event delivery object 312', and a thread executed by the source subsystem 300 retrieves the reply event 700' from the event delivery object 312. Thus, in this embodiment, the subsystems 300, 300' "pull" the event 700' in contrast to the above described embodiments in which the respective event delivery objects 312', 312 "push" the request event and reply events 700, 700' to the destination subsystems 300' and source subsystem 300, respectively.

6.3 Managing Dynamic Data

Referring to FIG. 10, when a subsystem 300 of a server 180 needs to store or retrieve collector point data stored in dynamic store 240, that subsystem 300 transmits an event to the dynamic store system service module 356 resident on the server 180 (step 1002). The dynamic store system service module 356 determines if it knows which server 180 in the server farm 180' is the collector point of the record type sought by the subsystem 300 (step 1004). For example, the dynamic store system service module 356 may cache associations between the record type and the collector point, and access this cache upon receiving the event from the subsystem 300.

If the dynamic store system service module 356 can determine the server collecting records of the type identified in the event, the dynamic store system service module 356 sends an event to the server 180 responsible for collecting such records (step 1006). If unable to determine the collector point, the dynamic store system service module 356 sends an event to the zone manager 358 seeking the address of the server that collects that record type (step 1008). Upon receiving that event (step 1010), the zone manager 358 determines (step 1012) if it is the master zone manager 358 for the zone. If the zone manager 358 is the zone master, then the zone manager 358 transmits to the dynamic store system service module 356 the identification of the server responsible for collecting events of identified type (step 1014).

If the zone manager 358 is not the master, then the zone manager 358 sends (step 1016) an event to the zone master, which is known as a result of the master election. The zone manager 358 receives the server identification of the zone master (step 1018) and transmits (step 1014) the server identification to the dynamic store system service module 356. Upon receipt of this server identification, the dynamic store system service module 356 accesses the dynamic store 240 according to the event initially received from the subsystem 300. In the event that the zone master does not respond after a predetermined number of requests are sent, the zone manager 358 initiates an election for a new zone master, as described above.

7.0 Subsystems

Whenever a dynamic store table is opened by a server 180 for the first time, the dynamic store contacts the zone master to determine the table owner. A request to the zone master for a table owner always succeeds assuming the requested table name is valid. Even if the table is not known to the zone master, an owner will be designated for it at the time of the request. Any failure to determine the table owner (other than invalid table name) is catastrophic, and will result in an error being propagated back to the component that initiated the connect request.

After the zone master has returned the identity of the server that owns the table in question, the requesting server must contact the owner. If the connection attempt fails after a predetermined number of attempts, the requesting server resets its state and requests the zone master to again identify the table owner. This should eventually result in a new table owner being designated.

After the record table has been successfully opened by contacting the table owner, the communication between requesting server and owning server settles into a set of insert, delete, update, and query requests. If a failure occurs while attempting to perform one of these operations after a predetermined number of attempts, the requesting server will contact the zone master to request a new owner. This process is executed as above. If a new table owner is selected by the zone master, the requesting server will first update the new owner with all local records. Since the new owner will need some time to receive updates from the other hosts in the zone before it will properly be able to deal with the incoming request in some embodiments the requesting server will have to wait for some amount of time before submitting the request.

As described above in section 3.0, each subsystem 300 includes a subsystem access layer (SAL) 304 that defines the application program interface (API) commands to which the subsystem 300 is capable of responding. When one subsystem 300 needs to use the functionality of another subsystem 300', that one subsystem 300 calls the appropriate API command provided by the SAL 304 of that other subsystem 300'. In one embodiment, each SAL 304 is implemented as an object class having data members and member functions. The member functions use the event as a parameter in a command. These command functions include a PostSALEvent function (equivalent to a PostEvent function) and a SendSALEvent function (equivalent to a SendEventAndWait function). The data members include (1) a reference to the subsystem that created the SAL, (2) identification of the subsystem calling the member function using the event as a parameter and (3) identification of the destination subsystem for the event.

When the source subsystem 300 needs to use the functionality of another subsystem 300', the source subsystem 300 creates an instance of the SAL class for that other subsystem 300' and calls the member functions provided by that SAL instance. When called, a SAL member function moves the event into an event buffer 380 and posts the appropriate pointer to the event to the event delivery object 312. For example, the called SAL member function sets the "request bit" in the event ID 712 and issues a SendSALEvent call to post the event and wait for a reply event. As discussed previously, the SendSALEvent call creates a unique channel ID 726 with which the destination subsystem sends a reply for this event to the source subsystem. Upon receiving a reply on the specified channel, the SAL 380 of the source subsystem extracts the data from parameters in the reply event and returns to the blocked thread that called the SAL member function.

If the source subsystem and the destination subsystem are the same type of subsystem, but reside on different hosts, the source subsystem does not need to use the SAL 304 of the receiving subsystem (e.g., the persistent store system service module 352 on one server 180 to the persistent store system service module 352' of another server 180'). In such instances, the source subsystem already knows the events to use to communicate with the destination subsystem without needing to reference the SAL of the destination subsystem. In these embodiments, the source subsystem may directly post an event to the event bus directed to its peer residing on another host.

7.1 Transport Layer

The transport layer 318 serves as the mechanism that allows subsystems 300 on different servers 180 to communicate with each other. The transport layer 318 corresponds to the Open Systems Interconnection (OSI) session and presentation layers in that it sends and receives event messages 700 via the server's network interface, performs encryption/decryption and compression, and manages connections. Connection management involves forming a connection to other servers in its server farm when there is an event message 700 to transmit and dropping the connection after a period of inactivity. The transport layer 318 handles two types of messages--control and event messages.

Control messages are used by the transport layer 318 to determine the compatibility of encryption and compression capabilities (i.e., filters) for servers 180 on each side of the connection. In addition to resolving the transport capabilities of the receiving server during th