Enterprise integration system
6810429
Abstract
An enterprise integration system is coupled to a number of legacy data sources. The data sources each use different data formats and different access methods. The integration system includes a back-end interface configured to convert input data source information to input XML documents and to convert output XML documents to output data source information. A front-end interface converts the output XML documents to output HTML forms and the input HTML forms to the XML documents. A middle tier includes a rules engine and a rules database. Design tools are used to define the conversion and the XML documents. A network couples the back-end interface, the front-end interface, the middle tier, the design tools, and the data sources. Mobile agents are configured to communicate the XML documents over the network and to process the XML documents according to the rules.
Claims
We claim:
Description
FIELD OF THE INVENTION
<customer>
<firstname>John</firstname>
<lastname>Smith</lastname>
</customer>
The HTML style sheet 126 for this document is as follows:
<html>
<h1>`customer.firstname`</h1>
<h2>`customer.lastname`</h2>
</html>
After applying the style sheet to the XML document, the resultant HTML form 121 would appear as:
<html>
<h1>John</h1>
<h2>Smith</h2>
</html>
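The substitution step above can be sketched in a few lines. This is only an illustration, not the patented implementation: it is written in Python for brevity, the function names `lookup` and `apply_style_sheet` are hypothetical, and it assumes that each back-quoted placeholder is a dotted path that starts at the root element of the XML document.

```python
import re
import xml.etree.ElementTree as ET

def lookup(root, path):
    """Resolve a dotted path such as 'customer.firstname' against an XML tree."""
    parts = path.split(".")
    if parts[0] != root.tag:
        raise KeyError(path)
    node = root
    for name in parts[1:]:
        node = node.find(name)          # first child element with this tag
        if node is None:
            raise KeyError(path)
    return node.text or ""

def apply_style_sheet(style_sheet, xml_text):
    """Replace every back-quoted placeholder with the value it names."""
    root = ET.fromstring(xml_text)
    return re.sub(r"`([\w.]+)`", lambda m: lookup(root, m.group(1)), style_sheet)

xml_document = (
    "<customer><firstname>John</firstname>"
    "<lastname>Smith</lastname></customer>"
)
style_sheet = (
    "<html>\n<h1>`customer.firstname`</h1>\n"
    "<h2>`customer.lastname`</h2>\n</html>"
)
html_form = apply_style_sheet(style_sheet, xml_document)
print(html_form)
```

A placeholder that names a missing element raises `KeyError` in this sketch; a production style sheet processor might instead substitute an empty string.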
The style sheet supports accessing all of the elements and attributes in the XML documents, and iteration over groups of repeating elements. For example, an XML document contains:
<customer type="preferred">
<firstname>John</firstname>
<lastname>Smith</lastname>
</customer>
The "type" attribute of the customer is accessed by using a syntax such as the following: `customer.attr[type]` which yields the value "preferred." Given a document containing repeating groups as follows:
<customers>
<customer type="preferred">
<lastname>Smith</lastname>
</customer>
<customer type="standard">
<lastname>Jones</lastname>
</customer>
</customers>
The "lastname" element of the second customer is accessed using a syntax such as `customer[1].lastname`, which yields the value "Jones." To iterate over all of the customers and access their "type" attributes, an expression such as:
`iterate(i=customers.customer) {
i.attr[type]
}`
can be used to produce first the string "preferred," and then "standard."
Validation
The front-end interface also supports the validation of user-entered information. Field validation supplies immediate feedback and interactivity to the user. Field validation also increases application efficiency by detecting common errors within the web browser process before any other network traffic is incurred or application logic is executed. Client-side validation can be broken down into two related levels.
Field-Level
Field-level validation performs simple checks on user-entered data to validate that the information is of the correct format or data type. For example, field-level validation can validate that a user enters numeric values in a particular field, or uses a proper date format. We implement field-level validations with JavaScript. A library of common validations is supplied as a script file on a web server. The library has a ".js" file extension. This script file can be included into HTML forms as desired using the <script> HTML tag. Validation is enabled for a field by indicating the name of an appropriate validation routine within an event handler of the field, e.g., "onChange." The event handler is triggered when an INPUT field changes. Setting up validation for a field requires HTML coding as follows:
<input type="text" name="birthdate" onChange="validateDate(birthdate)">
The validation library provides routines for common data types such as dates, times, currency, etc. The validation library can also provide a pattern matching ability allowing user input to be matched against arbitrary patterns, e.g., a pattern $##.## to match a monetary amount.
Cross-Field Validation
Cross-field validation allows for more complex validations. In this type of validation, the contents of one field depend on the contents of another field. For example, cross-field validation can detect a situation where a telephone number must be entered.
Such validation usually requires a more detailed knowledge of the requirements of the application.
Middle Tier
The middle tier 130 provides the "glue" that links the back-end and the front-end interfaces. The middle tier utilizes the mobile agents 101 to communicate with the interfaces. The middle tier also provides support for disconnected applications and users. In addition, the middle tier customizes the system 100 to the needs of specific enterprise functions without actually having to reprogram the legacy systems. The middle tier supports the automation of complex workflow and complex validations of data that may require access to multiple data sources. As a feature, the middle tier uses a rules engine (RE) 131 operating on rules stored in a database 132. The rules are defined in a rules language, and can be retrieved by the agents 101 as needed. In a typical scenario, the user launches an agent 105 due to interaction with the browser 124. The agent carries an XML document, e.g., a purchase order 106, to the rules database 132. The agent retrieves the appropriate rule for processing the order, such as a purchase order workflow. The agent then interprets the rule to appropriately route the document to the locations in the network specified by the rule. The rule can include a travel itinerary, as well as instructions on how to interact with the data sources. As an advantage, the operation of our system is always current. As rules change, so does the operation of the system. The agents always execute according to the current state of the rules database.
Design Tools
As shown in FIG. 2, the primary purpose of the design tools 140 is to generate 141 XML document type definitions (DTD) 142, to specify 143 data mappings, i.e., RACs 114, to encode 144 rules 132, and to design 145 user interfaces 126.
Document Type Definitions
The step 141 identifies the different types of document information (DTD) 142 that need to be shared by the various data sources 111 of the back-end 110 and the browser 124 of the front-end 120. This information is specified in the DTDs. For example, to share purchase order information between systems, the type of information needed in a purchase order needs to be identified, then that information needs to be encoded in a corresponding DTD. In one embodiment, the design tools use the service bridge to extract schemas from the data sources.
Data Mapping
After a data source independent data format has been generated, the mappings between the XML format and legacy formats for a particular database need to be specified as shown in FIG. 3. A query operation to a relational database 111 involves extracting the schema of the database by generating a SQL runtime access component (RAC) 114 which makes the JDBC calls to the database, converting the resulting data into the XML format, and handing the XML document 113 to an agent 101. The access components can be implemented as Java code. The agent delivers the XML to the front-end 120 for conversion to the HTML form 121 using the style sheet 126 so that the data can be viewed by the user 103 using a standard browser 124. Conversely, the update operation converts the HTML form to the corresponding XML document. The XML document is converted to a legacy format and the RAC modifies the data source using its schema. For other legacy data sources that are not specified by a schema or some other metadata, the mapping may need to be done by means that access the APIs directly.
Rule Encoding
After the data format definition is generated, and the RAC has been specified to access the appropriate data source, the next step is to encode what agents are going to do with the information.
In a simple data replication system, an agent may retrieve modified records from a master database, travel to the location of a backup database, and then update the backup database with a copy of the modified record. This process involves the encoding of a specific rule.
Designing the User Interface
As shown in FIG. 2, generating the user interface requires three steps: manipulating document type definitions (DTD) 145, importing DTD 146, and generating DTD from database schema 147.
Authoring DTD
The design tools 140 allow the system designer to define, design, and manipulate XML and HTML DTDs. A DTD 142 defines the names of the document elements, the content model of each element, how often and in which order elements can appear, whether start or end tags can be omitted, the possible presence of attributes and their default values, and the names of the entities. Because the DTDs represent many different types of documents in the system, this step essentially defines the data types of the enterprise's computerized applications. As an advantage, the resulting DTDs do not directly tie the system to any specific legacy data source, nor do the definitions preclude the integration of other legacy systems in the future.
DTD Import
The tools also allow one to import already existing DTD definitions. Such functionality can be used in environments where DTDs have already been defined for standard document types. These DTDs may have been defined by standards bodies or a designer of the legacy system.
DTD Generation from Database Schema
This part of the tools automatically generates DTDs from existing database schemas.
XML<->SQL Mapping Definition
Given the existence of the DTDs, the system 100 provides tools that map between legacy back-end data formats and XML document formats.
In the case of relational database access, these mappings link tables, columns, and fields from the legacy database to elements and attributes of the XML documents as defined by the DTDs. This also allows the definition of several distinct mappings, each of which involves accessing slightly different information in the data source.
Data Mappings
Query Mapping
A query mapping enables an agent to retrieve information from a legacy data source. In the case of a relational database, this mapping specifies the contents of the SELECT statement, including any information relevant for a table join. A query mapping for a purchase order may involve accessing a purchase order table, a customer table, and a product catalog table.
Update Mapping
An update mapping allows an agent to modify information in the data source. This involves specifying the contents of an UPDATE statement. An update mapping for a purchase order involves updating the purchase order table, but not modifying the customer table or the product catalog table.
Delete Mapping
A delete mapping allows an agent to delete information in the data source. This involves specifying the contents of a DELETE statement. A delete mapping for a purchase order involves deleting a record or records from the purchase order table, but not modifying the customer table or the product catalog table.
Add/Create Mapping
An add/create mapping allows an agent to add information to the data source. This involves specifying the contents of an INSERT statement. An add/create mapping for a purchase order involves adding a record or records to the purchase order table, but not modifying the customer table or the product catalog table.
Schema Extraction and Caching
In order to allow for mapping between a legacy database schema and XML DTD formats, the mapping design tool extracts the schema from legacy databases.
Because schema extraction is an expensive and time-consuming task, the tools allow one to save extracted schemas on a disk for subsequent use.
Form Generation
The tools also allow one to automatically generate a form from a DTD. Such a form may require minor modifications to enhance the physical appearance of the form. For example, color or font size of text can be adjusted to enhance usability.
Embedding Binary Data in XML Documents
Some enterprise applications may need to retrieve arbitrary binary data from the data source 111. For example, a legacy database contains employee information. Included with that information is a picture of the employee in standard JPEG format. The employee information is stored as a single table named "employees," which has a schema as shown in Table 1, where the field <image> represents the picture:
TABLE 1
ID   Name         HireDate   Photo
1    John Smith   1/1/96     <image>
The XML document that retrieves the above table appears as follows:
<employee>
<ID>1</ID>
<name>John Smith</name>
<hiredate>1996-01-01</hiredate>
</employee>
XML, by itself, does not naturally lend itself to the inclusion of binary data. To deliver this information for display in a web page, the service bridge 112 could encode the SQL record in an XML document as follows:
<employee>
<ID>1</ID>
<name>John Smith</name>
<hiredate>1996-01-01</hiredate>
<Photo href="http://server/directory/john.jpeg" />
</employee>
However, there are a number of problems with this type of approach. First, it is the responsibility of the user to issue the proper additional commands to retrieve the linked document before it can be displayed, e.g., the user must click on the URL of the picture. Second, the DTD for the XML document must specify the URL. For most legacy databases, it is unlikely that the records storing the binary data are accessible via an HTTP URL. Furthermore, the binary data is transported through the system by a follow-on transport, such as HTTP. For reliability, security, consistency, and other reasons we prefer to carry all data, including binary data, with the agents. To allow the servlet 123 to generate an agent that can access the binary data, we define a new type of URL. The new URL incorporates the location of the binary data, as well as a unique "name" that can be used to retrieve the binary data. The URL contains the hostname of the data source, a service name, an action name that can be used to perform the retrieval of the binary data, and a document identification referring to the binary data. This still results in a fairly complex URL. Using multiple requests to retrieve the binary data is inconsistent with our agent model. Agents try to use the network effectively by batching data into fairly large self-contained packets. This is very different from the hypertext model used on the web, in which a single page display can lead to multiple network requests.
Compound Documents
In an alternative solution, we define a compound document. In a compound document, the binary data is embedded in the same document as the textual XML data. This approach is consistent with our agent-driven system that attempts to transport data as larger batches. Compound documents can be built in two ways.
Embed Binary Data into XML Text Element
The binary data is embedded directly into an XML text element.
This can be done as long as the binary data is encoded in such a way that the data contains only XML characters. Such an encoding could be based on the Base64 encoding. With Base64, special characters, such as "<" and ">," are replaced with equivalent entities (i.e., &lt; and &gt;). We also can use a character data (CDATA) section to work around the problem of illegal characters within the Base64-encoded data. We may want to prefix the embedded binary data with standard MIME headers that specify content type, encoding, and name. Such a format for the photo element appears as follows:
<Photo>
Content-Type: image/jpeg
Content-Encoding: base64
Content-Name: john.jpeg
9j/4AAQSkZJ......gEASABIAAD/
</Photo>
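A minimal sketch of this embedding follows. It is written in Python purely for illustration; in the system described here the work would be done by the SQL RAC, and the helper names `embed_photo` and `extract_photo` are hypothetical. It assumes the header layout shown above, with the Base64 text on the last line of the element. The standard Base64 alphabet (letters, digits, "+", "/", and "=") needs no XML escaping, so the sketch places the encoded text directly in the element.

```python
import base64
import xml.etree.ElementTree as ET

def embed_photo(employee, name, data, content_type="image/jpeg"):
    """Append a <Photo> element holding MIME-style headers and Base64 data."""
    photo = ET.SubElement(employee, "Photo")
    encoded = base64.b64encode(data).decode("ascii")
    photo.text = (
        f"\nContent-Type: {content_type}\n"
        "Content-Encoding: base64\n"
        f"Content-Name: {name}\n"
        f"{encoded}\n"
    )
    return photo

def extract_photo(photo):
    """Return (headers, raw bytes) recovered from a <Photo> element."""
    lines = photo.text.strip().splitlines()
    headers = dict(line.split(": ", 1) for line in lines[:-1])
    return headers, base64.b64decode(lines[-1])

employee = ET.Element("employee")
ET.SubElement(employee, "ID").text = "1"
ET.SubElement(employee, "name").text = "John Smith"

raw = bytes(range(256)) * 3             # 768 bytes standing in for JPEG data
embed_photo(employee, "john.jpeg", raw)

# Round trip through serialized XML, as an agent carrying the document would.
xml_text = ET.tostring(employee, encoding="unicode")
headers, decoded = extract_photo(ET.fromstring(xml_text).find("Photo"))
encoded_size = len(base64.b64encode(raw))
print(headers["Content-Name"], len(raw), encoded_size)
```

Base64 emits four characters for every three input bytes, so the 768 input bytes here become 1024 characters of text.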
It should be noted that this alternative increases the size of the binary data by about 33%, as well as increasing the overhead to encode and decode the data. This alternative requires that a SQL RAC extracts the binary data, encodes the data into Base64, and then adds the encoded data to the XML document with the proper MIME headers.
Compound Document Encoded as MIME Document
Another alternative embeds both the XML document and the binary data into separate parts of a multipart MIME document. Each part of the overall document has a Content-ID which is referenced from a standard XML link. In part, such a format appears as follows:
Content-Type: multipart/related; boundary="XXXXX"
--XXXXX
Content-Type: text/xml
Content-ID: doc
<Photo href="cid:photo"/>
--XXXXX
Content-Type: image/jpeg
Content-Encoding: base64
Content-Name: john.jpeg
Content-ID: photo
9j/4AAQSkZJ... gEASABIAAD/
--XXXXX--
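The multipart layout above can be approximated with a general-purpose MIME library. The following sketch uses Python's standard `email` package for illustration only; the function names `build_compound` and `find_part` are hypothetical. Note two differences from the listing above: the library generates its own boundary string, and it labels the encoding with the standard `Content-Transfer-Encoding` header and applies Base64 to the image part by default.

```python
from email import message_from_bytes
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_compound(xml_text, photo_bytes):
    """Pack an XML document and a JPEG into one multipart/related document."""
    compound = MIMEMultipart("related")
    doc_part = MIMEText(xml_text, "xml")                  # text/xml part
    doc_part.add_header("Content-ID", "<doc>")
    photo_part = MIMEImage(photo_bytes, _subtype="jpeg")  # image/jpeg part
    photo_part.add_header("Content-ID", "<photo>")
    compound.attach(doc_part)
    compound.attach(photo_part)
    return compound.as_bytes()

def find_part(raw, content_id):
    """Return the decoded payload of the part with the given Content-ID."""
    message = message_from_bytes(raw)
    for part in message.walk():
        if part.get("Content-ID") == f"<{content_id}>":
            return part.get_payload(decode=True)
    return None

photo = b"\xff\xd8\xff\xe0fake-jpeg-bytes"    # placeholder, not a real image
raw_document = build_compound('<Photo href="cid:photo"/>', photo)
recovered = find_part(raw_document, "photo")
print(len(raw_document), recovered == photo)
```

A receiver resolves the `cid:photo` link in the XML part by looking up the part whose Content-ID is `photo`, which is what `find_part` does.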
With this alternative, the binary data may not need to be encoded. However, this requires that agents also retrieve MIME documents via the RAC.
JDBC Service Bridge
FIG. 4 shows details of a preferred embodiment of a service bridge 400 of the back-end interface 110 for accessing a data source. In this embodiment, JDBC is used to access a SQL type of database. The bridge 400 includes a public interface 410, JDBC run-time access component (RAC) 420, XML-SQL data mapping 430, and a document cache 440 as its main components.
Public Interface
As stated above, the public interface 410 provides the means by which agents access the data sources 111. The public interface allows data retrieval, modification, and addition. As an advantage, the public interface 410 makes no assumptions about how data in the legacy database 111 is sourced or maintained. Instead, we make the public interface resemble the GET/PUT model of HTTP.
JDBC Run-Time Access Component
The JDBC access component 420 is responsible for establishing and managing JDBC connections, building and executing SQL statements, and traversing result sets. This component works entirely within the context of JDBC and SQL.
XML-SQL Data Mapping
The XML-SQL data mapping 430 uses the mapping information generated by the design tools 140 to map data between XML and SQL.
Document Cache
The document cache 440 operates entirely with XML documents. XML documents that have been retrieved from the data source can be cached for fast future retrieval. The caching services are configurable so that maximum cache sizes and cache item expiration times can be specified. Caching can be disabled for certain classes of documents which contain highly volatile information.
FIG. 5 shows the public interface 410 in greater detail. The interface supports four basic types of accesses, namely get 510, put 520, add 530, and delete 540. At the heart of the interface is the document id 104.
The document id is a string which uniquely identifies every document instance within the data source. The document id can be thought of as corresponding to the URL of a World Wide Web document, or to the primary key of a record in a database. Although the id has a different format than a URL, it does serve as a document locator. In order to interact with information in the legacy data source, an agent needs to provide the id for the document containing the information. The id contains multiple sections of information and follows the following pattern. The first character of the id string specifies a separator character (S) 501 that is used to separate the different sections that make up the document id, e.g., a colon (:). This character is used in conjunction with a Java StringTokenizer to parse the document id. The subsequent information in the id includes name=value pairs (N, V) 502. One pair 502 specifies a document type, e.g., ":type=cust_list:". In most common cases, the id 104 also contains a key specifying the exact document instance in order to uniquely identify an individual document in a data source. For example, in a document containing customer information, this key contains a data source specific customer number or a customer id. Within the service bridge, this key is mapped to a WHERE clause of a SQL statement. For example, an agent can request customer information for a particular customer by specifying an id string as follows: ":type=customer:key=SMITH:". This request results in a SQL query to the database that appears as follows:
SELECT * FROM Customers WHERE Customers.ID = 'SMITH'
The exact semantics of how the key is mapped into the resultant SQL statement are specified by the design tools 140. The key portion of the id can be composed of multiple pieces of information separated by, for example, commas.
Such a key is used in cases in which the WHERE clause of the corresponding SQL query needs multiple pieces of information to be specified by the agent. An example of this is a document containing a list of customers, where the customers' names are within a certain alphabetic range, for example, "all customers whose last names begin with the letters A or B." Such a document has an id as follows: ":type=cust_list_by_name:key=A,Bzzzz:". In this case, the request would map into a SQL statement resembling the following:
SELECT * FROM Customers
WHERE Customers.LastName BETWEEN 'A' AND 'Bzzzz'
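The parsing and mapping of a document id can be sketched as follows. The sketch is in Python for illustration, whereas the description above refers to a Java StringTokenizer and JDBC. The `QUERY_MAPPINGS` table, the function names, and the in-memory SQLite table are all assumptions made for this example; in the system described, the mapping would come from the design tools. The sketch binds key values as statement parameters rather than splicing them into the SQL text, which is a choice of this example and not something the description specifies.

```python
import sqlite3

def parse_document_id(doc_id):
    """Split an id such as ':type=customer:key=SMITH:' into name/value pairs."""
    separator = doc_id[0]               # the first character names the separator
    fields = {}
    for section in doc_id.split(separator):
        if section:                     # skip empty leading and trailing pieces
            name, _, value = section.partition("=")
            fields[name] = value
    return fields

# Hypothetical mapping from document type to a parameterized SQL statement.
QUERY_MAPPINGS = {
    "customer": "SELECT * FROM Customers WHERE ID = ?",
    "cust_list_by_name": (
        "SELECT * FROM Customers WHERE LastName BETWEEN ? AND ? "
        "ORDER BY LastName"
    ),
}

def build_query(doc_id):
    """Return (sql, parameters) for a document id; key parts split on commas."""
    fields = parse_document_id(doc_id)
    sql = QUERY_MAPPINGS[fields["type"]]
    params = fields["key"].split(",") if "key" in fields else []
    return sql, params

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Customers (ID TEXT, LastName TEXT)")
connection.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("SMITH", "Smith"), ("ADAMS", "Adams"), ("BAKER", "Baker")],
)

sql, params = build_query(":type=cust_list_by_name:key=A,Bzzzz:")
names = [row[1] for row in connection.execute(sql, params)]
print(names)
```

Because the separator is read from the id itself, the same parser accepts an id such as "|type=customer|key=SMITH|" without change.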
Implementation Details of the Service Bridge
Database Access
User Authentication
The service bridge is responsible for performing any authentication necessary in order to establish a database connection. This may involve supplying a database specific username and password or other login information. When a database access (get, put, add, delete) is made by an agent, the bridge examines the agent's runtime context to determine the user identity associated with the agent. After the agent's identity has been ascertained, the service bridge maps the identity into a database-specific user identification using a mapping table generated by the design tools. For example, the mapping maps the user identity "steve@accounting" into an Oracle username "steve." In order to establish a connection to a database on behalf of a user, the service bridge retrieves both the username and clear-text password for the corresponding database user account. In such cases, the clear-text password is stored in the identity-mapping table. For security reasons, the table is encrypted on disk using a public/private key pair.
Connection Management
To enhance performance and scalability, the service bridge supports database connection pools. This means that multiple users share a common pool of JDBC connections. Establishing a database connection can be a slow and relatively expensive operation. The use of shared connection pools decreases this expense. The basis for this connection sharing is "user groups." When an agent attempts an operation which requires a connection to a database, the service bridge performs that operation using a connection established in the context of a special "pseudo-user" account. The pseudo-user is a database system account that represents not an individual user, but instead a particular group of users. A pool of such pseudo-user connections is available for use by all of the agents of the group.
The service bridge generates and maintains a connection pool for each distinct group of users who access the bridge. FIG. 6 shows agents 101 for three users tom, joe and david 601-603 accessing the data source 111. Two of the users, tom@users and joe@users, are members of a "users" group. The third user, david@managers, is a member of a "managers" group. When these agents attempt to access the database, the two members of the users group share a connection pool 610 that was established with the credentials of the "users" pseudo-user. The third agent will communicate with the database using a separate connection pool 620 established with the credentials of the "managers" pseudo-user. A connection pool for a particular group is generated when a member of the group makes the first access request. Connections within the pool are constructed as needed. The service bridge does not pre-allocate connections. After a configurable, and perhaps long, period of inactivity, the connection pool is closed to free database resources. If a connection pool for a particular group has been closed due to inactivity, then any subsequent request by a member of that group results in the generation of a new pool. When a request is completed, the connection allocated for that request is returned to the pool. A maximum number of connections in a pool can be specified. If no connections are available when a request is made, then the request is blocked until a connection becomes available.
Statement Construction and Execution
The actual generation and execution of SQL statements is performed by a separate "modeler" object. The modeler object is generated by the design tools 140. For each type of document used in the system, there is a distinct modeler object. Each modeler knows how to construct exactly one type of document. During the design process, one specifies what information is to be retrieved from the database, and how to map the information into an XML document.
The design tools serialize and save the modeler objects in a ".ser" file. At runtime, the service bridge loads and de-serializes the modeler objects from the ".ser" file. The resultant modeler objects are able to perform all of the data access and mapping functions required to retrieve information from the data sources. As stated above, SQL to XML data mapping is performed by the modeler object designed for a particular document type.
Data Caching
To improve the performance of document retrieval, the data service caches database information as converted XML documents. When a first request is made to retrieve a document, the service performs the SQL access and SQL to XML data mapping as described above. The resultant XML document is added to the cache of documents 440 maintained by the service bridge. Any subsequent request to retrieve the document will be satisfied by retrieving the document from the cache, bypassing the need for an additional expensive database access and mapping. When an update or addition is made to a data source, the cache is updated to reflect the new information. The update to the cache is made only after the SQL statement performing the update of the end database has been completed successfully. This prevents the cache from storing information that has not been committed to the database due to errors or to security restrictions. The XML document cache is configurable to specify a maximum size of the cache, the maximum amount of time a single document can be retained in the cache before it becomes stale, and whether the cache should be persisted to disk, in which case the cache can be re-used after a server restart. One can also customize how different classes of documents are cached. If a document represents highly volatile information, then caching can be disabled for that class of document. If a document class is completely (or virtually) static, then documents of that class can be cached for a very long time.
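The caching behavior described above can be sketched as a small cache with a maximum size and a maximum document age. This is an illustrative Python sketch under stated assumptions, not the service bridge itself: the class name `DocumentCache`, the least-recently-used eviction policy, and the `load` callback standing in for the SQL access and mapping step are all choices made for this example. Disk persistence and per-class cache settings are left out.

```python
import time
from collections import OrderedDict

class DocumentCache:
    """Cache of XML documents bounded by entry count and by document age."""

    def __init__(self, max_size=100, max_age=60.0, clock=time.monotonic):
        self.max_size = max_size
        self.max_age = max_age          # seconds before a document is stale
        self.clock = clock              # injectable so expiry can be tested
        self._items = OrderedDict()     # doc_id -> (stored_at, xml_text)

    def get(self, doc_id, load):
        """Return the cached document, or call load(doc_id) on a miss."""
        entry = self._items.get(doc_id)
        if entry is not None and self.clock() - entry[0] <= self.max_age:
            self._items.move_to_end(doc_id)       # mark as recently used
            return entry[1]
        document = load(doc_id)                   # database access and mapping
        self.put(doc_id, document)
        return document

    def put(self, doc_id, document):
        """Store a document; call only after the database update succeeded."""
        self._items[doc_id] = (self.clock(), document)
        self._items.move_to_end(doc_id)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)       # evict least recently used

now = [0.0]                                       # fake clock for the demo
loads = []

def load_from_database(doc_id):
    loads.append(doc_id)
    return f"<customer id='{doc_id}'/>"

cache = DocumentCache(max_size=2, max_age=30.0, clock=lambda: now[0])
cache.get("A", load_from_database)      # miss: loads from the database
cache.get("A", load_from_database)      # hit: served from the cache
now[0] = 31.0
cache.get("A", load_from_database)      # stale: loads again
print(loads)
```

Passing the clock in as a parameter keeps the expiry logic testable without real waiting; a deployment would simply use the default monotonic clock.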
Execution Flow
The following section describes the execution flow for basic database access requests. FIG. 7 shows the steps 700 of a "get" or retrieval access in greater detail. After the request is received from the agent 710, the caller and document identity are determined 720, 730. The group specific cache is identified 740, and the cache is checked 750. If the cache stores the document, the document is returned in step 755. Otherwise, the XML-SQL mapping is located 760, the SQL SELECT statement is constructed 770, the connection is retrieved 775, and the statement is executed in step 780. Next, the result set is "walked" 785, fields are extracted 790 to build the XML document 794, and the document is cached 796 and returned to the agent in step 798. FIG. 8 shows the steps 800 for the addition (add) and modification (put), which are similar to the get steps. The delete request simply deletes data from the database as shown at 540 in FIG. 5.
Run-time Object Hierarchy
FIG. 9 shows the run-time hierarchy 900 of objects of the service bridge 110. The objects can be classified as data source independent 901, and data source dependent 902. The data source independent objects 901 include data source factory object 910 indexed by group name, group specific data source objects 920, document factory objects 930 (one per document), document cache objects 940, document builder objects 950, connection pool objects 960, mapping table objects 970, document manager objects 980, and the data source manager objects 990. The data source dependent objects 902 include source connection 991, string authentication 992, document map 993, and specific driver objects 994.
Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention.
Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.
