Generating database or data structure (e.g., via user interface)
System and method for providing access to databases via directories and other hierarchical structures and interfaces
6985905
Abstract
A hierarchical/relational translation system is provided for enabling information from unrelated heterogeneous relational computing systems to be accessed, navigated, searched, browsed, and shared over a hierarchical computing system. In one embodiment, the hierarchical/relational translation system includes a virtual directory server for capturing information in the nature of relational database schema and metadata. The captured schema and metadata are then translated into virtual directories that are universally compatible with standard communication protocols used with hierarchical computing systems. A virtual directory of information organizes an index of data records, and a standard addressing schema is provided to enable customizable access to relevant views of relational computing systems. Several embodiments for presenting the virtual directory information tree are included. In one embodiment, the virtual directory is displayed using a browser format. In another embodiment, the virtual directory is presented in electronic mail format. In still another embodiment, the virtual directory is presented over a wireless medium and through portable devices.
Claims
What is claimed is:
1. A method for building standard-based hierarchical view definitions, comprising steps of:
capturing relationships and objects from a first data source and a second data source, each data source having a data model, the first data source having a data model different from the second data source;
mapping the relationships and objects captured into a set of hierarchical paths;
creating a virtual directory of the hierarchical paths; and
forming a combined presentation of the view definitions based on the hierarchical paths.
2. A method for building view definitions according to claim 1, further comprising the step of transferring the combined presentation for display and modification.
3. A system for building view definitions, comprising:
a server for accessing a first data source and a second data source, each data source having a data model, the first data source having a relational data model, the second data source having a data model different from the first data source, in response to an instruction received from a client device communicatively coupled to the server;
coupled to the server, a database for providing the relational data of the first data source;
a module for capturing the relational data from the database;
a module for converting the relational data into a set of hierarchical paths;
a module for creating a virtual directory of the hierarchical paths; and
a module for navigating the hierarchical paths from the client device.
4. A computer program product for building standard-based hierarchical view definitions, the computer program product stored on a computer readable medium, and adapted to perform the operations of:
capturing relationships and objects from a first data source and a second data source, each data source having a data model, the first data source having a data model different from the second data source;
mapping the relationships and objects captured into a set of hierarchical paths;
creating a virtual directory of the hierarchical paths; and
forming a combined presentation of the view definitions based on the hierarchical paths.
5. A system for providing hierarchical representation of a first data source and a second data source, each data source having a data model, the first data source having a relational data model, the second data source having a data model different from the first data source, comprising:
a first server capturing the relational data of the first data source from a database coupled to the server;
coupled to the first server, a virtual server mapping the relational data captured into a virtual directory of hierarchical representations; and
an interface application coupled to the first server for forming a presentation of the virtual directory for display.
6. The method of claim 1 wherein each data source is selected from the group of: a network operating system, a database, an application, a web service protocol, and a directory structure.
7. The method of claim 6 wherein the database is a relational database.
8. The method of claim 2 further comprises: transferring the combined presentation in a form viewable on a device that is one from the group: a browser, an electronic mail display device, and a wireless portable device.
9. The system of claim 5 wherein the presentation display is selected from the group consisting of: a browser, electronic mail display, and on a wireless portable device.
Description
BACKGROUND OF THE INVENTION
1. Technical Field
The invention relates generally to communication network systems, and more specifically, to a method, system and computer medium for locating, extracting and transforming data from unrelated relational network data sources into an integrated format that may be universally addressed and viewed over network systems according to a hierarchical representation.
2. Description of the Related Art
There are conventionally-known ways of indexing and addressing information on the Internet (also referred to interchangeably as the "Net") using an Internet directory. An Internet directory is an application service that generally performs information retrieval based on properties associated with the data of interest. Internet directories can store various types of objects, wherein each object is associated with a type of property or characteristic. For example, one type of Internet directory that provides a standard way of indexing and addressing the computer servers that host Net sites is the Domain Name System (DNS). Typically, a DNS server includes a method of creating a symbolic name for an Internet Protocol numeric address associated with the hardware of the Net server, and provides the .com, .net, .org, etc., domain addresses.
Along with DNS, users are additionally able to determine an address for documents through the HyperText Transfer Protocol (HTTP) that provides a Uniform Resource Locator (URL) for a page formatted with HyperText Markup Language (HTML). This addressing technique provides users a way to access any web page in the world. Although this addressing scheme has worked well to provide a hierarchical addressing scheme during the initial growth of the worldwide web (Web), the amount and importance of the data continue to expand. In particular, an increasing amount and diversity of information relating to a significant portion of the world's economy resides in critical data records inside databases. Yet, there is no simple and effective manner in which to address and reference such data records originating from diverse heterogeneous databases according to context. For example, there is no conventional standard URL for a sales total, inventory, or a customer record in a database. Accordingly, there is a growing need to reach a finer level of granularity of data addressing and management.
A new level of "granularity" is needed in order to locate and distribute information that is increasingly fragmented in its locale, but that potentially gives rise to value-added benefits when integrated with information from other sources. The evolution of the Internet has created an entirely new set of challenges that include dealing with the millions of web sites, billions of documents and trillions of objects that are now available in an increasingly decentralized computer environment. A completely decentralized Net creates a critical need to categorize (i.e., index) information and provide an address (i.e., location) for each piece of data on the Net. If this does not occur, the Net becomes something like a large telephone system without a telephone directory to look up and to locate the numbers of individuals and groups. While developers have standardized techniques to organize and communicate much of this information through the conventional indexing techniques described above, they have not adequately addressed the following problems.
In the past, conventional client-server computing was inward-focused and directed to a tightly controlled environment. More specifically, conventional client-server computing was developed for distributed networks, and in particular, for use inside an enterprise or organization. Frequently, many enterprises store their data in a collection of disparate databases and deploy applications based on their short-term departmental needs. This conventional approach becomes increasingly problematic as an enterprise grows and the information contained in these disparate databases becomes increasingly difficult to integrate. The narrow scope of each application can eventually become a hindrance to the overall needs of the organization as information databases grow and change along with the evolving state of the enterprise.
The difficulties of the inward-focused model are more clearly understood when considered in the context of the future growth pertaining to the Net-based economy, which explodes the conventional inward-focused model into an environment that is highly decentralized and far more open to outward-focused computing. One key problem confronting enterprises that attempt to migrate their businesses onto the Net is how to take advantage of existing lines of business applications that are still bound to the inward-focused client-server model. As such, it would be beneficial to provide enterprises and organizations experiencing this problem with a way to unlock their data for use by other applications and other users. By doing so, these "back office" applications do not risk becoming isolated "islands of automation" in an endless ocean of information. Accordingly, it would be beneficial to be able to access and selectively assemble such data from disparately-located data sources and to automatically manage the data with an integrated view of the network and the application infrastructure. What is needed is an efficient integrated solution to a fragmented and distributed enterprise information system.
Directory services are an established component of the network infrastructure, stemming from the Internet's DNS to electronic mail (email) systems, and to the Operating System (OS) domains of corporate intranets. Applications that can leverage the strength of this infrastructure are on the rise and are placing new demands on the directory architecture. Led by the dramatic growth of e-commerce, it would be desirable to move directory-enabled applications toward a model of centralizing administration. This aspect of centralized administration is beneficial because it would allow tasks to be administered from anywhere in a network. To this end, directory-enabled applications moving towards a model having centralized administration would be better-suited to enable access to a richer set of data than provided by conventional directories.
However, for corporate information technology (IT) staff deploying directories in the past, the process has often proven to be slow and expensive. Conventional Internet directory deployment is slow because the process is complicated, for at least several reasons. First, conventional Internet directories suffer from the "yet another database" syndrome. Because the source of the directory information frequently exists in other parts of the infrastructure, the issues of resolving authoritative ownership of the data can be problematic. Second, the inconsistency amongst the various data sources conventionally requires reconciling the different data formats and data models associated with each disparate data source. Third, synchronizing data from disparate sources into the directory requires extensive and careful planning.
These complexities in turn result in higher costs, which is another problem typically experienced with conventional Internet directory deployment. Interestingly, a leading directory market research firm (e.g., the Burton Group) has estimated that a typical enterprise directory might take a year to deploy and cost up to $2 Million.
The Lightweight Directory Access Protocol (LDAP) is a standard directory protocol that can be used to establish a universal addressing scheme. However, the complexity of deploying LDAP alone is a drawback holding back the development of such an addressing scheme as discussed below. LDAP is an open Internet standard addressing scheme for accessing directories that has been adopted by the Internet Engineering Task Force (IETF) standards regulation organization as well as by leading developers in the computing industry. Generally, LDAP is a type of Internet directory service based on the International Telecommunications Union (ITU) X.500 series of recommendations, and which facilitates property-based information retrieval by using one or more Internet transports as a native means for establishing communication between client and server computers. In particular, LDAP is an object-oriented protocol enabling a client to send a message to a server and to receive a response. The server typically maintains a directory of object entries, and the message sent from the client can request that the server add an object entry to the directory. Those skilled in the art will recognize that adding an object to a directory is accomplished by instantiating the object. The data model associated with LDAP includes entries, each of which has information (e.g., attributes) pertaining to an object. The entries can be represented by a hierarchical tree structure. A third version of LDAP is known by those skilled in the art to be defined in RFC 2251.
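The entry-and-tree data model described above can be illustrated with a minimal sketch. The class name `Entry`, the helper functions, and the sample entries are hypothetical and are not part of any LDAP library; the sketch also assumes that attribute values contain no escaped commas.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One directory entry: a distinguished name plus multi-valued attributes."""
    dn: str
    attributes: dict[str, list[str]] = field(default_factory=dict)


def parent_dn(dn: str) -> str:
    """Return the DN of the parent entry (everything after the first RDN)."""
    # Simplification: assumes no escaped commas inside attribute values.
    _, _, rest = dn.partition(",")
    return rest


def children(tree: dict[str, Entry], dn: str) -> list[str]:
    """List the DNs of the entries directly below `dn`."""
    return sorted(e.dn for e in tree.values() if parent_dn(e.dn) == dn)


# Hypothetical entries; the names and values are illustrative only.
entries = [
    Entry("dc=example", {"objectClass": ["domain"]}),
    Entry("ou=people,dc=example", {"objectClass": ["organizationalUnit"]}),
    Entry("cn=Ann,ou=people,dc=example",
          {"objectClass": ["person"],
           "mail": ["ann@example.com", "a@example.com"]}),
]
tree = {e.dn: e for e in entries}
```

In this sketch the hierarchy is implied by the distinguished names alone, and a single attribute such as `mail` may carry several values.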
Although LDAP can be used to enable queries and updates to be made to a directory structure, the LDAP implementation alone does not and has not conventionally provided a reliable and scalable enterprise directory primarily because recursive inquiries are required to accommodate the disparate syntax and semantics used by various database providers. The recursive inquiries involve re-synchronizing information existing in unrelated data sources on an ongoing basis due to the incompatibilities introduced by the disparate data models of each data source. Furthermore, as the number of records in the relational table increases, the need for additional recursive inquiries impedes the reliability, efficiency and scalability of the directory.
In order to take advantage of the features of an LDAP directory, this directory must be first created and populated. Since most of the data that would become the source for this directory resides essentially in RDBMS, the complexity of converting the relational data model to the hierarchical data model is problematic. Conventional directory technology can be built on top of an RDBMS engine, but the internal logic and data model of an LDAP directory is so different from an RDBMS, that this conversion is always required. The internal logic of the RDBMS is typically irrelevant from the perspective of the directory, since the entire schema and organization of the directory is based on LDAP, which is modeled as an object-oriented database with inheritance, object class, attributes, and entries. This difference in data representation and data model is problematic because it forces the directory-implementer through a complex and lengthy data modeling and conversion effort. For example, in conventional directory implementations, the data that resides in the RDBMS must be extracted, and converted into a different information model and format (e.g., LDIF as is known in the art) as an intermediate form, and then imported into an LDAP-based directory. To maintain current information in the directory, this process must be repeated on a regular basis, which brings about re-synchronization.
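The conventional extract-and-convert step described above can be sketched as follows. This is a minimal illustration rather than a complete exporter: the function name `rows_to_ldif`, the object class `customerRecord`, and the sample row are hypothetical, column names are assumed to be usable directly as attribute names, and values are assumed to be plain ASCII that needs neither base64 encoding nor DN escaping.

```python
def rows_to_ldif(rows, rdn_attr, base_dn, object_class):
    """Render relational rows (dicts) as LDIF text for import into a directory."""
    blocks = []
    for row in rows:
        lines = [
            f"dn: {rdn_attr}={row[rdn_attr]},{base_dn}",
            f"objectClass: {object_class}",
        ]
        for column, value in row.items():
            if value is None:
                continue  # a NULL column becomes an absent attribute
            lines.append(f"{column}: {value}")
        blocks.append("\n".join(lines))
    # LDIF separates entries with a blank line.
    return "\n\n".join(blocks) + "\n"


# Hypothetical row, as it might be read from a relational table.
sample_rows = [{"cn": "Ann", "mail": "ann@example.com", "fax": None}]
ldif_text = rows_to_ldif(sample_rows, "cn", "ou=people,dc=example", "customerRecord")
```

Because the output is a static snapshot, any later change to the source rows requires the export and import to be run again, which is the re-synchronization burden noted above.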
There are other problems associated with this conventional process. First, translating RDBMS logic into an LDAP-based directory is not a lossless process. For example, data types commonly used by RDBMS applications do not exist in the LDAP model. Such data types include, but are not limited to, date and floating-number fields. Some LDAP features, such as multivalue attributes, have no exact equivalent in an RDBMS. Additionally, the lack of transaction support afforded by LDAP directories means that the success of a "batched import" is not always guaranteed.
The LDAP directories are based on a domain- and attribute-oriented data model, while RDBMS are based on an entity- and relationship-oriented data model. From a theoretical perspective, it can be shown that the two models are equivalent in expressiveness as is understood by those skilled in the art of data modeling. For example, one piece of information represented in one model may be translated without loss into the other model. However, conventional directory implementations have not successfully realized a full implementation of the features of the domain and attribute data model, hence, destroying the possibility for lossless automatic translation from one data model to another.
Having mismatched data models also results in lengthy and costly deployment for an essential infrastructure function. Nevertheless, LDAP is beneficial for several reasons. For example, LDAP is well-suited for use with directories, as compared to databases, particularly for enabling ubiquitous look-up over a network. Also, the LDAP API is supported by many conventional client computers having, for example, email or web browser functionality, so that virtually any user connected to a network may gain access to directories given the appropriate security clearance. Although the database access API structured query language (SQL) provides rich access capabilities when the data is needed locally, it alone inadequately provides secure data access over a network. In order to provide network access to database data, application programmers must use vendor-specific software drivers to enable secure data access over a network.
Accordingly, there is a need for the deployment of Internet directory services that follows a simpler and more flexible approach with consideration that a significant hurdle to overcome entails the mismatch between the hierarchical data structure of a directory and the more complex relational data models supported by the databases that house the data needed for the directory. What is needed is a way to unite "back office" applications (i.e., those applications distinctive to an enterprise and its corresponding proprietary syntax, semantics, logical information modeling, physical data modeling and other mechanisms) so as to seamlessly gain access to data from these divergent sources, and to integrate the data for value-added applications over computer networks outside each of the specific enterprises. Additionally, it is desirable to provide directory-enabled applications that rely upon a model of centralized administration. By doing so, the directory-enabled applications would allow the inclusion of richer, more complex data and data relationships in the directory than has been conventionally known. It would be beneficial if there were a standard addressing scheme for indexing each data record on the Net. With such a universal addressing scheme, a finer level of granularity of data addressing and management can be achieved, thereby enabling end-users improved access to data content.
SUMMARY OF THE INVENTION
A computer system having a hierarchical/relational translation system is provided for enabling information from unrelated heterogeneous relational computing systems to be accessed, navigated, searched, browsed, and shared over hierarchical computing systems. In one embodiment of the present invention, the relational computing system comprises unrelated heterogeneous relational databases, and the hierarchical computing system comprises a client computer coupled to a communications network. In the same embodiment, the hierarchical/relational translation system includes a virtual directory server for capturing information in the nature of relational database schema and metadata, and for communicating with the client application over the network.
The hierarchical/relational translation system of the present invention includes a method for bridging the mismatched and disparate data models used by the database and hierarchical-directory worlds. The method includes accessing and capturing the database schema and metadata from various relational databases. The captured schema and metadata are then translated into virtual directories that are universally compatible with standard communication protocols used with hierarchical computing systems. To do so, the method includes mapping relational database objects and logical relationships to virtual directory entries that are configured to communicate all aspects of the virtual directory structure over the network to the client application.
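As a rough illustration of capturing schema and mapping relationships to hierarchical paths, the following sketch uses an in-memory SQLite database. It is not the disclosed implementation: the function names, the `dc=vd` base, the `ou=` naming, and the rule of nesting a table under the first table it references are all assumptions made for the example.

```python
import sqlite3


def capture_schema(conn):
    """Capture tables, primary keys, and foreign-key relationships from SQLite."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    schema = {}
    for table in tables:
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        schema[table] = {
            # Column 5 of table_info is non-zero for primary-key columns.
            "pk": [c[1] for c in cols if c[5]],
            # Column 2 of foreign_key_list is the referenced (parent) table.
            "parents": sorted({fk[2] for fk in fks if fk[2] != table}),
        }
    return schema


def hierarchical_paths(schema, base="dc=vd"):
    """Map each table to a directory path, nesting child tables under parents."""
    def path(table, seen=()):
        parents = schema[table]["parents"]
        if not parents or table in seen:  # 'seen' only guards against cycles
            return f"ou={table},{base}"
        return f"ou={table},{path(parents[0], seen + (table,))}"
    return {t: path(t) for t in schema}


# Hypothetical two-table schema with one relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
""")
schema = capture_schema(conn)
paths = hierarchical_paths(schema)
```

Here the foreign key from `orders` to `customers` becomes a parent-child relationship in the resulting paths, so the relational relationship is expressed as directory hierarchy.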
In the described embodiments, users can search and/or browse the virtual directory to find the data needed or they can query the directory with simple commands to search for the information needed. The present invention also enables the ability to select either default or customized views of the virtual directory.
In accordance with one aspect of the present invention, a standard addressing schema is provided to enable customizable access to relevant views of relational computing systems. In one embodiment of the present invention, the virtual directory server provides the standard addressing schema in the nature of an Information Resource Locator (IRL). The IRL is defined to mean an LDAP URL and is used as an address locator for any type of data record. In particular, the IRL enables data to be indexed and addressed through an industry standard representation by the hierarchical computing system. Thus, the system of the present invention provides access to all data through the Internet in a logical and powerful manner.
Another aspect of the present invention comprises distributing the information on the virtual directory server to the hierarchical computing systems with an industry standard communication scheme. With this standard communication scheme used to address data, mission critical databases can be unlocked for a variety of uses. The data can be used to drive e-commerce and e-business applications, thereby being opened for use to far more people than with conventional client-server techniques, while at the same time maintaining proper access control levels. Accordingly, a method is provided for translating the address of any structured data into the structured format of the industry standard representation. In one embodiment of the present invention, an Internet standard known as the Lightweight Directory Access Protocol (LDAP) is used.
With the same embodiment, the present invention is designed to map structured data into an LDAP URL in order to provide an Internet address for data records. In particular, structured data indexes are stored in a virtual directory of information (VDI) and are expressed using an LDAP address, which can be presented as a directory for use by end-users (users). By associating an address for each data record using an industry standard method, the present invention enables individual data records to be accessed over the Internet using a directory environment that users will already be familiar with. The VDI organizes an index of the data records into a directory, and the directory provides a logical organization of the repository of data records. In particular, the data records comprise the address location of the particular records. With the address of a specific data record, a user can locate a very specific piece of information, for example, a sales total, an inventory level, or a price point. In accordance with the present invention, this is beneficial because a virtual directory distribution system creates a new level of data access and granularity for locating and accessing data over networks.
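To make the idea of an LDAP address for a single data record concrete, the following sketch builds an LDAP URL of the general form `ldap://host:port/dn?attributes?scope`. The function name `record_url`, the host name, the `dc=vd` base, and the table and column names are hypothetical; the sketch percent-encodes the DN for use in a URL but does not perform DN-level escaping of special characters in values.

```python
from urllib.parse import quote


def record_url(host, table, key_attr, key_value, attributes=(), port=389,
               base="dc=vd"):
    """Build an LDAP URL that addresses one record of one table."""
    dn = f"{key_attr}={key_value},ou={table},{base}"
    # Keep '=' and ',' readable; percent-encode everything else that is unsafe.
    url = f"ldap://{host}:{port}/{quote(dn, safe='=,')}"
    if attributes:
        # Request only the named attributes of the addressed entry itself.
        url += "?" + ",".join(attributes) + "?base"
    return url


# Hypothetical address for the 'total' field of order 1042.
sales_total_url = record_url("vds.example.com", "orders", "id", 1042, ["total"])
customer_url = record_url("vds.example.com", "customers", "name", "Acme Corp")
```

Under these assumptions, a sales total or a customer record each receives its own address, which is the finer granularity the passage describes.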
According to another aspect of the present invention, the structured data indexes stored in a VDI and expressed using an LDAP address can be presented as a directory for use by other computers. When the data is referenced using a standardized address, other computer applications may use the data retrieved to drive a process or trigger an event. In accordance with the present invention, the data addresses can be routed for use by such computer applications. To this end, the present invention also introduces a system having a VDI "hub and router" which is used to combine data records located amongst disparate data sources for access in a virtually seamless and transparent manner to a user or computer application. The hub creates a consistent organization of the data records, and the router ensures the query is directed to the source data and back to the user or application invoking the query. Additionally, because the data addresses are expressed using the industry standard LDAP, multiple VDI hub and router combinations can be deployed within single or multiple enterprises and linked together.
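One way to picture the routing role is a table that maps directory suffixes to data sources and selects the longest matching suffix for each address. The sketch below is only an assumption about how such routing could be organized; the class `Hub`, its methods, and the source labels are hypothetical.

```python
def normalize(dn):
    """Lower-case a DN and strip spaces around its components."""
    return ",".join(part.strip().lower() for part in dn.split(","))


class Hub:
    """Keeps one consistent namespace and routes each DN to its data source."""

    def __init__(self):
        self._routes = {}

    def mount(self, suffix, source):
        """Attach a data source under a directory suffix."""
        self._routes[normalize(suffix)] = source

    def route(self, dn):
        """Return the source whose suffix is the longest match for `dn`."""
        target = normalize(dn)
        matches = [s for s in self._routes
                   if target == s or target.endswith("," + s)]
        if not matches:
            raise LookupError(f"no source mounted for {dn!r}")
        return self._routes[max(matches, key=len)]


# Hypothetical sources; in practice these would be database connections.
hub = Hub()
hub.mount("dc=vd", "default_db")
hub.mount("ou=orders,dc=vd", "sales_db")
```

For example, `hub.route("id=7,ou=Orders,dc=vd")` selects `sales_db`, while an address under any other branch of `dc=vd` falls back to `default_db`.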
The virtual directory of information organizes an index of data records. According to one aspect of the present invention, a virtual directory server enables the dynamic reconfiguration of a virtual directory information tree and associated content. The dynamic reconfiguration is advantageous because it removes the necessity to replicate database data into the virtual directory. With dynamic reconfiguration, the routing of queries to extract database schema in the source database is returned back to the user or application making the query. In one embodiment of the present invention, the routing of the data records can be implemented automatically through a computer program. In an alternative embodiment, the routing of the data records can be implemented on demand from an end-user.
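The benefit of not replicating database data can be sketched by resolving a directory address against the source database at the moment it is requested. The function `resolve`, the address layout `key=value,ou=table,dc=vd`, and the sample table are assumptions for illustration, using SQLite as a stand-in source.

```python
import sqlite3


def resolve(conn, dn):
    """Fetch the row addressed by a DN like 'id=2,ou=customers,dc=vd' on demand."""
    rdn, container = dn.split(",")[:2]
    key, _, value = rdn.partition("=")
    table = container.partition("=")[2]
    # Identifiers cannot be bound as parameters, so check them against the
    # live schema before interpolating them into SQL.
    tables = {r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")}
    if table not in tables:
        raise LookupError(f"unknown table {table!r}")
    columns = {r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')}
    if key not in columns:
        raise LookupError(f"unknown column {key!r}")
    cur = conn.execute(f'SELECT * FROM "{table}" WHERE "{key}" = ?', (value,))
    row = cur.fetchone()
    if row is None:
        return None
    return dict(zip((d[0] for d in cur.description), row))


# Hypothetical source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
before = resolve(conn, "id=2,ou=customers,dc=vd")
conn.execute("UPDATE customers SET name = 'Initech' WHERE id = 2")
after = resolve(conn, "id=2,ou=customers,dc=vd")
```

Because nothing is copied into the directory, the second lookup reflects the updated row without any re-synchronization step.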
Another advantage of the present invention is that directory deployment is neither costly nor complicated as with conventional techniques.
In accordance with the present invention, several embodiments for presenting the data records of the virtual directory server are disclosed. In one embodiment, the virtual directory is displayed using a browser format. For example, the virtual directory may be presented to a client application as part of a Windows Explorer page. In another embodiment, the virtual directory is displayed using an electronic mail format at a client application. Still, in another embodiment, the virtual directory is presented over a wireless medium and through portable devices.
Advantages of the invention will be set forth in part in the description which follows and in part will be apparent from the description or may be learned by practice of the invention. The objects and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims and equivalents.
BRIEF DESCRIPTION OF THE DRAWINGS
The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings.
FIG. 1 is a high-level block diagram of a communication system including the hierarchical/relational translation system in accordance with the present invention.
FIG. 2 is a block diagram of a first embodiment of the hierarchical/relational translation system of the present invention.
FIG. 3A is a block diagram of a first embodiment of a forward translation unit in accordance with the present invention; and FIG. 3B is a block diagram of a first embodiment of a return translation unit in accordance with the present invention.
FIG. 4 is a block diagram of a second embodiment of the communication system of FIG. 1.
FIGS. 5A-5C are block diagrams of exemplary embodiments of the communication system of FIG. 4.
FIG. 6A is a block diagram of a first embodiment for the server of the communication system of FIG. 5A; and FIG. 6B is a block diagram of one embodiment for a return translation unit of FIG. 6A.
FIG. 7A is a block diagram of a second embodiment for the server of the communication system of FIG. 5B; and FIG. 7B is a block diagram of one embodiment for the VDAP plug-in of FIG. 7A.
FIG. 8A is a block diagram of a third embodiment for the server of the communication system of FIG. 5C; and FIG. 8B is a block diagram of one embodiment for the ASP vdWap of FIG. 8A.
FIG. 9 is an exemplary graphical representation of a user interface for displaying directory view definitions in accordance with the present invention.
FIG. 10A is a block diagram of the hardware for the server (or virtual directory server) according to the present invention; and FIG. 10B is a block diagram of the memory unit for the hardware of FIG. 10A.
FIG. 11 is a high-level flowchart of a preferred method for creating and deploying a virtual directory system in accordance with the present invention.
FIG. 12 is a flowchart of a preferred method for operating a virtual directory system at run-time in accordance with the present invention.
FIG. 13 is a flowchart of a preferred method for creating a directory view from extracted schema data in accordance with the present invention.
FIGS. 14A-C are flowcharts of preferred methods for schema extraction, for mapping objects to an LDAP schema, and for schema mapping, respectively.
FIG. 15 is a flowchart of a preferred method for generating a default directory view from schema data in accordance with the present invention.
FIG. 16A is a diagram of an embodiment of a hub and router system; FIG. 16B illustrates one manner for using LDAP to uniquely address database records at a "finer" level of granularity than permitted by conventional DNS namespace; and FIG. 16C illustrates structured data indexes being stored in a hub and expressed as an LDAP address.
FIG. 17 is a block diagram of the hardware for the client computer according to the present invention.
FIG. 18 is a data-flow diagram of the schema capture process according to one embodiment of the present invention.
FIG. 19A illustrates an exemplary graphical representation of a user interface for displaying a representation of the objects and relationships resulting from a schema being captured in accordance with the present invention; FIG. 19B illustrates an exemplary shortcut menu; and FIG. 19C illustrates an exemplary toolbar, both of which can be used to provide command selection to the user interface of FIG. 19A.
FIG. 20 illustrates an exemplary graphical representation of a user interface for selecting a candidate key name in accordance with the present invention.
FIG. 21 illustrates an exemplary graphical representation of a user interface for a derived view according to one example of the present invention.
FIG. 22 illustrates an exemplary graphical representation of a user interface for enabling a user to select a directory view type in accordance with the present invention.
FIG. 23A illustrates an exemplary graphical representation of a user interface for displaying a default flat view in accordance with the present invention; and FIG. 23B illustrates an exemplary graphical representation of a user interface for displaying a default indexed view in accordance with the present invention.
FIG. 24 is a block diagram of one embodiment for extracting information from a relational database in accordance with the present invention.
FIG. 25 illustrates an exemplary graphical representation of a user interface for selecting data link properties in accordance with the present invention.
FIG. 26 is a high-level block diagram of a schema showing entities and relationships that have been defined when the schema is captured in accordance with one example of the present invention.
FIG. 27 illustrates an exemplary graphical representation of a user interface for defining relationships in accordance with the present invention.
FIG. 28 illustrates an exemplary graphical representation of a user interface for determining the primary keys in accordance with the present invention.
FIG. 29A illustrates an exemplary graphical representation of a user interface for declaring display names; FIG. 29B illustrates an example of the display name functioning as the default name; and FIG. 29C illustrates an example of another interface for declaring the display names.
FIG. 30 illustrates an exemplary graphical representation of a user interface for creating derived views in accordance with the present invention.
FIG. 31 illustrates an exemplary graphical representation of a user interface for editing connection strings in accordance with the present invention.
FIG. 32A is a block diagram indicating an example of the relationships between four entities; and FIG. 32B is a directory tree according to an exemplary namespace of FIG. 32A.
FIGS. 33A-D are exemplary diagrams of the link mechanism utilized for various purposes in accordance with the present invention.
FIG. 34A illustrates an exemplary graphical presentation of a user interface for determining the options to be selected for objects in accordance with the present invention; FIG. 34B illustrates an exemplary shortcut menu; and FIG. 34C illustrates an exemplary toolbar which can be used for command selection within the user interface of FIG. 34A.
FIG. 35 is a table illustrating a 1×n, and an n×1 default representation in accordance with the present invention.
FIG. 36 illustrates an exemplary graphical representation of a user interface for changing a selected icon.
FIG. 37 illustrates an exemplary graphical representation of a user interface for indicating a default comparison operator in accordance with the present invention.
FIG. 38 illustrates an exemplary graphical representation of a user interface for selecting the join feature in accordance with the present invention.
FIG. 39 illustrates an exemplary graphical representation of a user interface for selectively adding, deleting or removing columns in accordance with the present invention.
FIG. 40 is a data-flow block diagram of the schema manager application in accordance with the present invention.
FIG. 41 is a data-flow block diagram of the default view builder wizard in accordance with the present invention.
FIG. 42 is a data-flow block diagram of the DirectoryView Designer for enabling hierarchical views to be built and managed in accordance with the present invention.
FIG. 43 is a data-flow block diagram of the DirectoryView Designer for managing an existing directory view that has been modified in accordance with the present invention.
FIG. 44 illustrates an exemplary graphical representation of a user interface for selecting paths in accordance with the present invention.
FIG. 45 illustrates an exemplary graphical representation of a user interface for selecting or modifying Content output in accordance with the present invention.
The figures depict a preferred embodiment of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
DETAILED DESCRIPTION OF THE EMBODIMENTS
A system, method, computer medium and other embodiments for locating, extracting and transforming data from unrelated sources of information into an integrated format that may be universally addressed over network systems are described. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention can be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to avoid obscuring the invention.
Reference in the specification to "one embodiment" or to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.
Some portions of the detailed description that follows are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps (instructions) leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical, magnetic or optical signals capable of being stored, transferred, combined, compared and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. Furthermore, it has also proven convenient at times to refer to certain arrangements of steps requiring physical manipulations of physical quantities as modules or code devices, without loss of generality.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.
One aspect of the present invention includes an embodiment of the process steps and instructions described herein in the form of a computer program. Alternatively, the process steps and instructions of the present invention could be embodied in firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by real time network operating systems.
The present invention also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any references below to specific languages are provided for disclosure of enablement and best mode of the present invention.
Moreover, the present invention is claimed below as operating on or working in conjunction with an information system. Such an information system as claimed may be the entire information system for providing a virtual directory of information as detailed below in the described embodiments or only portions of such a system. For example, the present invention can operate with an information system that need only be a communications network in the simplest sense to catalog information. At the other extreme, the present invention can operate with an information system that locates, extracts and transforms data from a variety of unrelated relational network data sources into a hierarchical network data model through the dynamic reconfiguration of the Directory Information Tree (DIT) and contents without the necessity of replicating information from the relational data sources into the virtual directory as detailed below in the described embodiments or only portions of such a system. Thus, the present invention is capable of operating with any information system from those with minimal functionality, to those providing all of the functionality disclosed herein.
Reference will now be made in detail to several embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever practicable, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
Bridging the Gap between Databases and Directories with Virtual Directories
There is an ongoing debate regarding the differences between databases and directories. Accordingly, the differences between directories and databases are now discussed so as to clarify how the virtual directories of the present invention bridge the gap between them.
A. Comparison of Databases and Directories
One position in this debate holds that directories are best-suited for applications whose data is stable and that require information to be read quickly and frequently but written slowly and infrequently. This particular view contends that conventional Relational DataBase Management Systems (RDBMS) technology does not yield adequate speed and performance results for such applications. Instead, it is believed by some that in cases where information is rewritten frequently, and where relational data hierarchies and an object model are necessary, databases are best-suited to the task. Consideration of the above-mentioned opinion regarding the correct use of directories must be viewed in its appropriate context, namely where databases are intended only for the storage of very specific types of information that must be propelled by a different kind of engine, which is typically proprietary. This reasoning is based on the assumption that because the directory data is not "relational," RDBMS technology is inappropriate as an engine. Although the usage of directories has been conventionally restricted to a limited type of processing, the present inventors have realized that directories can be considered to be a special case database.
Additionally, such conventional assumptions may not be entirely accurate. Although speed and performance benefits associated with directories are highly attractive features of directories, there are a few situations that contradict the conventional view of choosing RDBMS technology versus directory technology for specific purposes. To say that directories excel in areas where it is obvious that databases do a fine job is misleading. A couple of arguments have been made regarding: (1) the ability of directories to out-perform relational databases; and (2) the specific abilities of directories to be beneficial over databases when data is predominantly read-oriented. However, neither of these arguments appears to be credible upon close scrutiny for the following reasons.
First, regarding relational databases, performance is virtually the highest priority. For example, those in doubt of performance being of highest priority need only review the amount of time database vendors spend on TPC benchmarks in attempting to woo customers by proving split-second differences in performance over the competition.
Second, the argument for better treatment of read-only data does a disservice to database vendors. Business-critical applications deployed in separate enterprises around the world rely upon responses at sub-second precision to read-only database queries; therefore, to suggest that a directory could better serve the need for very quick access of data is misleading. Additionally, if it were the case that directories could better serve the need for quick access of data, then application architects would have turned to directories many years ago in their quest to constantly provide better performing applications for end-users. A high read-to-write ratio is certainly a valid justification for the use of directory technology. However, if there actually is a tradeoff between the read-to-write ratio and performance, then enterprises that use RDBMS technology to create a database with information that changes hundreds of times per day and that is read millions of times per minute, would have supplanted RDBMS technology with the directory technology. Instead, the fastest and most heavily-used information-distribution systems presently are based on RDBMS technology.
The hierarchical nature of the directory provides another aspect in which to differentiate directories from databases. For example, the directory hierarchy allows users and applications (i.e., application programs or software packages) to discover the relationships between directory objects as they progress further into the directory structure. Generally, the architecture of the directory is self-disclosing. This means that each object clearly shows its relationship to its parents above in the hierarchy and to its children below in the hierarchy. By comparison, the objects in a relational database can have a much more complex web of interactions, although they are hidden from view. All logical relationships in a relational database are implicit and cannot be viewed by those who do not have any previous knowledge of the database schema.
The high read-to-write ratio and the hierarchical self-disclosing criteria make directories an ideal mechanism for sharing data across a network, including those embodiments where the network comprises the Internet. When business partners share data, they do not necessarily know the intricacies of each other's database environments and may not have access to the appropriate third party software driver to access a database. Problems arise when the data being shared falls outside of the bounds of what is traditionally considered appropriate for storage in a directory. Conventionally, directories have been thought of as a source for relatively static data. This thought comes from problems associated with synchronization and replication between the unrelated sources of the relational data and the directory. Furthermore, source data is often stored in the core operational databases used by the enterprise. This data is extracted and copied into the directory using a file format called LDAP Data Interchange Format (LDIF). When directories are populated in this way on a nightly, or even weekly, basis, the value of the data diminishes the older it becomes.
The need for hierarchies, an object model, and some form of inheritance in LDAP justifies the use of an object-oriented relational database system for the purposes of data storage and access. However, this justification for relational databases is contradicted by products that rely on both hierarchical and relational aspects, such as, for example: Oracle Internet Directory (OID), IBM SecureWay, and Microsoft Active Directory, which are implemented on top of Oracle 8i, IBM DB2, and the Microsoft Jet database engine, respectively. Accordingly, there is support for the position that the notion of a flat data hierarchy guaranteeing maximum directory performance is not entirely valid, since the fact that these proprietary directory technologies use a relational engine implies that relationships are just as important in a directory as they are in a database.
B. The Role of Directories Abstracting Information from Databases
Based upon the above discussion, a conclusion might be drawn that because RDBMS technology offers power and speed and because a directory can be implemented on top of an RDBMS, there is no difference between the two technologies. However, directories and relational databases are not interchangeable.
The relational model is defined to mean a set of logical concepts, and, as such, is true or false in the limit of its definitions. A relational view is a virtual relation derived from base relations by applying relational algebraic operations. This requires selecting one or more tables that are stored in a database, and combining the tables using any valid sequence of relational operations to obtain a view. Examples of relational operations include selection, projection, join, etc. The result of applying the relational operations typically embodies a table having properties of relational algebra. A view is defined to mean a result of a series of relational operations performed on one or more tables. Accordingly, a view can be the result of very complex operations. For example, a view can be established from a series of join operations followed by a projection operation. Additionally, a view can be characterized as a "virtual" table, meaning that the view is a "derived" table as opposed to being a "base" table.
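By way of illustration only, the notion of a view as a derived table can be sketched with the Python standard-library sqlite3 module. The table names (customer, orders), the columns, and the sample rows are hypothetical and do not come from the described embodiments; the view below is obtained by a join followed by a selection and a projection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Acme', 'Boston'), (2, 'Globex', 'Denver');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 75.0), (12, 2, 40.0);

    -- A derived ("virtual") table built from two base tables.
    CREATE VIEW large_orders AS
        SELECT c.name, o.total                    -- projection
        FROM customer AS c
        JOIN orders AS o ON o.customer_id = c.id  -- join
        WHERE o.total >= 50;                      -- selection
""")

def large_orders(connection):
    """Query the view exactly as if it were a base table."""
    return connection.execute(
        "SELECT name, total FROM large_orders ORDER BY total"
    ).fetchall()

rows = large_orders(conn)
```

Because the view stores no rows of its own, a row later inserted into the hypothetical orders table appears in the view the next time it is queried.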
There is a need for data abstraction because even though a directory can be implemented on top of an RDBMS, an RDBMS cannot take the place of a directory. Even in the situation when the RDBMS is used as the engine for a directory, the RDBMS must be programmed to provide a set of services that are characteristic of a directory. Directories have their own value, that is, they are ubiquitous in all sorts of applications such as email and groupware, network operating systems, and centralized Internet directories. Besides the significant difference between databases and directories being that directories support a ubiquitous Internet access standard, directories also have the ability to provide a self-disclosing schema. Although this look-up and discovery specialty distinctive to directories may sound minor to database adherents, it provides critical features that cannot be matched by relational databases.
Furthermore, many types of RDBMS technology conventionally use a data dictionary and a data catalog of some sort. The data dictionary comprises a directory of tables and their component fields, while the data catalog is a summarized abstract of a database's content. It is often the case in distributed computing that each enterprise has many disparate databases, each with its own directory. It thus remains a challenge as to how all of this information can be managed so as to facilitate analytical business processes without the need to abstract the information across all of these databases.
Directories provide a type of data-abstraction mechanism by acting as a central point for data management. Each database's data dictionary and data catalog are useful tools for managing and abstracting its data. Although each database can have its own internal directories, this does not change the fact that an enterprise-wide directory requires the implementation of a specific set of services that are directory-specific. Accordingly, a summary layer would be advantageous in providing the level of abstraction needed to maximize the productivity of data-storage and information-analysis activities across disparate databases at least at the enterprise level.
C. Using the Directory as a Tool to Manage Information Aggregation amongst Databases having the Same Implicit Scope
A directory can help to manage the scope of diverse information and to facilitate the search for information via the abstraction of aggregated data. There are at least two significant ways to use a directory, namely for searching and browsing, each of which will now be discussed as having a strong and distinct relationship with the way that users access information and with the access paths that are used to obtain the data that is needed.
With the model of searching, the user either knows precisely or can ascertain via the use of attributes and keywords the item of interest. With either technique, the user generally provides a filter to find a specific object that meets the particular criteria by searching according to attributes. This approach provides a pattern of direct access to data and favors a flat hierarchy, an example of which is the White Pages.
With the model of browsing, the user has an approximate idea of the item of interest based on a broader criterion of the relationships between different types of information. This in turn facilitates category- and taxonomy-based navigation, which can be conveniently described as searching according to relationships. This approach provides a pattern of indirect access to data and favors a complex hierarchy with well-defined relationships between objects. A corresponding data structure allows the creation of a set of views that facilitates navigation, such as a categorized list driven by relationships between objects, an example of which is the Yellow Pages.
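The two usage models can be contrasted in a minimal Python sketch. The entries, their path strings, and the attribute names are hypothetical, and splitting paths on commas is a simplification that ignores escaped characters; the sketch is not the access mechanism of the described embodiments.

```python
# Hypothetical directory entries: each has a distinguished-name-like path
# (most specific component first) and one attribute.
ENTRIES = [
    {"dn": "cn=Ann Lee,ou=Plumbers,ou=Services,o=Pages", "phone": "555-0101"},
    {"dn": "cn=Bob Roy,ou=Plumbers,ou=Services,o=Pages", "phone": "555-0102"},
    {"dn": "cn=Cy Diaz,ou=Florists,ou=Retail,o=Pages", "phone": "555-0103"},
]

def search(entries, **criteria):
    """Search model: direct access by attribute filter (White Pages style)."""
    return [e for e in entries
            if all(e.get(k) == v for k, v in criteria.items())]

def browse(entries, base):
    """Browse model: list the immediate children of `base` (Yellow Pages style)."""
    children = set()
    suffix = "," + base
    for e in entries:
        if e["dn"].endswith(suffix):
            remainder = e["dn"][: -len(suffix)]
            # The last component of the remainder is the direct child of base.
            children.add(remainder.split(",")[-1] + suffix)
    return sorted(children)
```

A caller using search supplies a filter and ignores the hierarchy, whereas a caller using browse starts at a known node such as "o=Pages" and descends one level at a time along the relationships.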
In general, directories can support information retrieval in an easy manner because the scope of an RDBMS is limited to objects therewithin. Metadata is not included, which is why data dictionaries and data catalogs are so heavily used for this purpose. Considering the many distributed systems and different information models used in databases, the maintenance of these varying scopes of information becomes unwieldy without a repository of "supertools" to aggregate data. In particular, a directory can be used to manage a group of databases, each pertaining to a different scope of information and containing different objects with unique definitions. When the objects in each database have commonality despite their differing granularity and information focus, directories can help facilitate information retrieval across an enterprise.
A directory is a system that can reconcile the divergent scope of information amongst unrelated databases. Directory technology provides an easy way to solve the problem of how to integrate fragmented information, that is, information spread amongst individual databases each having a narrow scope of content. As will be described in greater detail herein, the present invention provides a method to enumerate objects and their attributes, to build relationships and taxonomies based on this enumeration, and to aggregate data according to principles of generalization and specialization. While database technology uses container aggregation, in which an object is defined according to what it contains or includes rather than by categories and supercategories into which its component attributes can be classified, the data can be organized into a hierarchical model with a change made to the semantics. The directory is a hierarchical model that is well-suited for aggregating relational hierarchies. As will become evident in the description to follow, when information is retrieved either by searching or browsing a directory according to relationships, the relationships between objects in a directory become meaningful.
D. Defining and Modeling Virtual Directories
Although a search by attribute in a flat directory structure by convention works well, a search by relationship typically is problematic for the reasons already described. To overcome this hurdle, one aspect of the present invention involves mapping relationships that have already been defined within existing databases into a centralized set of hierarchical access paths that permit search and navigation. As such, the virtual directories described herein provide an alternative to large-scale data extraction and aggregation that supports both the search and browse usage models.
An aspect in accordance with the present invention directed towards the search model enables one-to-one relationships supported by a set of pointers to individual objects in the schema. This particular implementation is well-suited for a flat data hierarchy. Another aspect of the present invention which is directed towards the browse model translates the one-to-one object relationships into two hierarchies. Doing so keeps the mapping rules straightforward, so that existing relationships can be used to construct an access path to the individual database objects. Additionally, the translation of objects accounts for the fact that relationships between objects cannot be duplicated in a flat data structure, which in turn can result in the loss of valuable context that provides the ability to access different views.
It thus follows that the virtual directories of the present invention use schema-based data extraction to create a hierarchical object model. One benefit of this approach is that information does not need to be extracted, aggregated and synchronized with existing data sources on an ongoing basis, as compared with conventional approaches.
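One way to picture the mapping of existing database relationships into hierarchical access paths is the following Python sketch using sqlite3. It assumes a hypothetical schema (department, employee), assumes that every table has an integer key column named id, and uses an invented path syntax; it illustrates the idea of schema-based extraction and is not the capture process of the described embodiments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        dept_id INTEGER REFERENCES department(id)
    );
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Support');
    INSERT INTO employee VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 2);
""")

def capture_relationships(connection):
    """Return (child_table, fk_column, parent_table, parent_key) tuples."""
    tables = [r[0] for r in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    rels = []
    for table in tables:
        # Result columns: id, seq, table, from, to, on_update, on_delete, match
        for fk in connection.execute(f"PRAGMA foreign_key_list({table})"):
            rels.append((table, fk[3], fk[2], fk[4]))
    return rels

def children(connection, rel, parent_value):
    """Resolve one level of the hierarchy at request time; no data is copied."""
    child, fk_col, parent, _parent_key = rel
    # Identifiers come from the captured schema, not from user input.
    rows = connection.execute(
        f"SELECT id FROM {child} WHERE {fk_col} = ? ORDER BY id",
        (parent_value,))
    return [f"{child}={row_id},{parent}={parent_value}" for (row_id,) in rows]

relationships = capture_relationships(conn)
paths = children(conn, relationships[0], 1)
```

Only the relationship metadata is captured ahead of time; the rows beneath a given parent are fetched from the source when the path is requested.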
E. Illustrating the Benefits of Virtual Directories
To further clarify the benefits of the virtual directories in accordance with the present invention, an example will now be discussed. An enterprise software company uses: (1) an accounting software package to track customer and vendor receivables and payables; and (2) a sales support software package to track purchases by existing customers, prospective customers and their needs, and sales volume. The accounting package contains tables representing customers and vendors. The sales support package contains tables representing existing customers, potential customers, and sales representatives. Customers whose information is stored in the accounting package are tracked by their payment; however, the customers whose information is stored in the sales support package are tracked by their purchase history. The company's sales representatives have a need to access data on existing customers' overall expenditures in order to determine what level of pricing is compatible with their financial needs, and additionally to determine their credit-worthiness.
To perform this analysis, the representatives require the ability to quickly check the customer views in both the accounting package and their own sales support package. Because the customer records in each database contain different data types and are therefore not totally reconcilable, the representatives are best-served by a method of data access that allows them to navigate across schemas through directory layers in order to quickly check both views.
In accordance with the virtual directory server of the present invention, there is provided a method to access customer data stored in both databases. The virtual directory establishes a link between the two types of customer records and aggregates their data without changing the view. The aggregated records in the virtual directory constitute a "supercategory" of customers, which automates the process of searching for information in both source databases, and provides a unique way to index and address the data. In particular, the link between the two types of customer records is an ad hoc join. Using a standard Application Programming Interface (API) facilitates the mapping that allows navigation between the two unrelated databases. More importantly, the same mechanism is able to operate on different schema to aggregate data and to provide a simple way to deliver a choice of views. As subsequently described, one embodiment of the API that is well-suited for these purposes is LDAP.
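The ad hoc join between the two kinds of customer records can be sketched as follows. The two in-memory SQLite databases stand in for the accounting and sales support packages; all table names, column names, sample rows, and the choice of the company name as the linking attribute are assumptions made for illustration only.

```python
import sqlite3

def make_db(script):
    """Create an independent in-memory database from a SQL script."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn

accounting = make_db("""
    CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT, balance_due REAL);
    INSERT INTO customer VALUES (100, 'Acme', 1200.0), (101, 'Globex', 0.0);
""")
sales = make_db("""
    CREATE TABLE client (client_id INTEGER PRIMARY KEY, name TEXT, last_purchase TEXT);
    INSERT INTO client VALUES (7, 'Acme', '2001-03-14'), (8, 'Initech', '2001-02-02');
""")

def customer_entry(name):
    """Ad hoc join on the company name across two unrelated databases.

    Each source keeps its own schema; the combined entry only links them.
    """
    acct = accounting.execute(
        "SELECT cust_no, balance_due FROM customer WHERE name = ?", (name,)
    ).fetchone()
    sale = sales.execute(
        "SELECT client_id, last_purchase FROM client WHERE name = ?", (name,)
    ).fetchone()
    entry = {"cn": name}
    if acct is not None:
        entry["accounting"] = {"cust_no": acct[0], "balance_due": acct[1]}
    if sale is not None:
        entry["sales"] = {"client_id": sale[0], "last_purchase": sale[1]}
    return entry
```

A customer known to only one source still yields an entry, with only that source's view attached, so neither schema has to be altered to accommodate the other.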
The use of virtual directories in accordance with the present invention also offers advantages to directory administrators. These advantages are best appreciated by discussing how the VDS 408 solves many common problems being experienced by administrators deploying LDAP directories. For example, data replication and synchronization issues are eliminated with the VDS. Furthermore, the VDS enables dynamic reconfiguration of the LDAP namespace and schema. With the VDS, rapid deployment of LDAP namespaces can be established. Also, the VDS provides unlimited extensibility to existing LDAP structures.
In accordance with the present invention, the VDS eliminates data replication and synchronization issues by not requiring that any data be held within the directory itself. Requests from LDAP clients return live data from the authoritative source, and the VDS handles schema transformation automatically. This is contrasted with conventional LDAP directories, which require data to be extracted from the authoritative source of the information and transformed into a format matching the LDAP schema of the directory. With past methods, the data had to be loaded into the directory using LDIF, and in order to maintain current information in the directory, this process had to be repeated on a regular basis.
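The difference between a periodically loaded directory and a directory that holds no data can be illustrated with a short Python sketch. The person table, the function names, and the sample values are hypothetical; the snapshot function merely stands in for an export-and-load cycle and does not produce LDIF.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE person (uid TEXT PRIMARY KEY, phone TEXT);
    INSERT INTO person VALUES ('alee', '555-0101');
""")

def export_snapshot(connection):
    """Conventional approach: copy every row into the directory's own store."""
    return {uid: {"uid": uid, "telephoneNumber": phone}
            for uid, phone in connection.execute("SELECT uid, phone FROM person")}

def virtual_lookup(connection, uid):
    """Virtual approach: translate the request into a query at request time."""
    row = connection.execute(
        "SELECT uid, phone FROM person WHERE uid = ?", (uid,)).fetchone()
    if row is None:
        return None
    # Schema transformation: relational column names become directory attributes.
    return {"uid": row[0], "telephoneNumber": row[1]}

snapshot = export_snapshot(source)
# The authoritative source changes after the snapshot was taken.
source.execute("UPDATE person SET phone = '555-0199' WHERE uid = 'alee'")
live = virtual_lookup(source, "alee")
```

After the update, the snapshot still carries the old telephone number until the next load, while the virtual lookup reflects the current state of the source.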
In one aspect of the present invention, the VDS enables dynamic LDAP namespace configuration by separating the data structure mapping and LDAP namespace creation into two distinct processes. More details about this process are described subsequently. Furthermore, relationships in back-end databases are initially mapped into the VDS 408 using an automated database schema discovery mechanism. LDAP namespace hierarchies are then built on top of this mapping. As new LDAP attributes and objects are required in the namespace, they can be added using an interface that will be described subsequently as the DirectoryView Designer™ interface and corresponding module. The interface includes a familiar point-and-click control input enabling changes to the directory structure to take effect immediately.
Having mapped one or more relational database structures into the VDS, multiple directory hierarchies can be created based on the same data mapping to provide rapid LDAP namespace deployment. This enables the instantaneous deployment of new directory namespace structures, as the need arises. Unlike traditional LDAP implementations, where a new mapping requires either a redesign of the existing directory or a new directory structure, the present invention enables directory administrators to respond immediately to new application requests for directory data.
The VDS provides unlimited LDAP extensibility to any existing LDAP directory implementation using the object referral mechanism. Object referral allows one LDAP directory to make reference to another LDAP directory when clients request objects or attributes that are not stored in the primary directory. Using object referral, the VDS enables the extension of an existing LDAP structure without the necessity for directory redesign. With the present invention, objects and attributes can be added to an existing directory structure quickly to accommodate the changing needs of the client applications.
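The referral behavior described above can be approximated by the following Python sketch. The primary directory contents, the referral table, and the host name vds.example.com are all hypothetical, and the dictionaries stand in for real directory servers; no LDAP library is used.

```python
from urllib.parse import quote

# Hypothetical primary directory holding only "people" entries.
PRIMARY = {"cn=Ann Lee,ou=People,o=Example": {"mail": "alee@example.com"}}

# Hypothetical referral table: subtree suffix -> URL of the directory serving it.
REFERRALS = {"ou=Orders,o=Example": "ldap://vds.example.com:389"}

def lookup(dn):
    """Return the entry if held locally, else a referral URL, else an error."""
    if dn in PRIMARY:
        return ("entry", PRIMARY[dn])
    for suffix, url in REFERRALS.items():
        if dn == suffix or dn.endswith("," + suffix):
            # Percent-encode the name for use in a URL, keeping '=' and ','.
            return ("referral", f"{url}/{quote(dn, safe='=,')}")
    return ("noSuchObject", None)
```

A client that receives a referral result would repeat its request against the returned URL, which is how an existing directory can be extended with a new subtree without being redesigned.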
There are several advantages that the virtual directory server of the present invention provides to an application architect. As will be discussed in further detail below, the VDS provides an innovative way of addressing legacy application databases. For example, the VDS provides a single, industry standard API to all database data. Additionally, the VDS enables the aggregation of data from diverse heterogeneous databases. Also, the VDS allows the rapid deployment of collaborative business-to-business (B2B) applications. Finally, the VDS enables business processes to move into the network.
The VDS provides a single industry standard API by using an LDAP proxy layer to access one or more heterogeneous relational databases. Doing so allows application developers to use a single, open standard API to access any relational data source. The VDS provides a self-describing schema eliminating the need for application developers and users to understand the internal organization of each relational database being accessed. As users navigate through successive levels in the virtual directory structure, context is retained from one level to the next. This combination of a single API, self-describing schema, and the preservation of context dramatically simplifies database navigation for both application programmers and end users.
The VDS provides aggregate data from unrelated heterogeneous databases. As will be discussed herein, the term "unrelated" is defined to mean proprietary ownership stemming from various vendors, and the term "heterogeneous" is defined to mean diverse scope of content and/or context. The DirectoryView Designer™ interface is used to construct the objects in the virtual directory tree structure. Each object can represent a call to a relational database system table or view. By using container objects, that is, objects that do nothing themselves but contain references to other objects, a group of calls to related and/or unrelated heterogeneous databases that contain related data can be aggregated.
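The role of container objects can be illustrated by the following Python sketch, which is provided by way of example only. Two in-memory SQLite databases stand in for unrelated databases; the classes `ContentObject` and `ContainerObject`, the table names, and the sample rows are hypothetical and are assumed only for this illustration.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ContentObject:
    """Directory object that stands for a call to one table or view."""
    name: str
    conn: sqlite3.Connection
    sql: str

    def fetch(self) -> List[Tuple]:
        return self.conn.execute(self.sql).fetchall()


@dataclass
class ContainerObject:
    """Holds references to other objects and carries no data of its own."""
    name: str
    children: List[ContentObject] = field(default_factory=list)

    def fetch(self) -> Dict[str, List[Tuple]]:
        # Aggregate the results of every child call under the child's name.
        return {child.name: child.fetch() for child in self.children}


# Two independent databases (hypothetical sample data).
hr = sqlite3.connect(":memory:")
hr.execute("CREATE TABLE employee (id INTEGER, name TEXT)")
hr.execute("INSERT INTO employee VALUES (1, 'Ann')")

sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE orders (id INTEGER, rep_id INTEGER, total REAL)")
sales.execute("INSERT INTO orders VALUES (100, 1, 250.0)")

view = ContainerObject("ann_overview", [
    ContentObject("employee", hr, "SELECT id, name FROM employee WHERE id = 1"),
    ContentObject("orders", sales, "SELECT id, total FROM orders WHERE rep_id = 1"),
])
```

Calling `view.fetch()` in this sketch returns the related records from both databases grouped under a single container.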
The VDS allows rapid deployment of collaborative B2B applications. The DirectoryView Designer™ interface is used to construct customized views of data in the field of corporate relational databases. The deployment of customized views is fast and simple, and does not require a great deal of technical sophistication. This means that business users can utilize the present invention to deploy customized views of real-time operational data as the needs of business partners arise. Additionally, role-based security provides for very granular authorization to view objects, assuring complete confidentiality to business partners accessing data over a network such as the Internet. Business partners also have the flexibility to use customized LDAP applications and/or a plug-in (e.g., SmartBrowser™ application) to a web browser, such as Internet Explorer or Netscape Navigator.
The VDS enables business processes to move into the network. The relationships between tables in a relational database system enumerate the business processes acting upon the corporate data and together build an interrelated sequence of hierarchical connections. These hierarchical connections represent how the work of the business is done. In accordance with the present invention, the VDS enables the enumeration of these business processes to be moved out of the proprietary bounds of each unique database management system and into the network where they can be operated upon by the individuals and applications that can make best use of them.
Virtual Directory System Overview
Referring now to the high-level block diagram of FIG. 1, there is shown an example of a system 100a that implements the virtual directory system for locating, extracting and translating relational data objects and relationships into a representation that is useable with hierarchical data models in accordance with the present invention. In the example of FIG. 1, system 100a includes a hierarchical computing system 102 coupled to a hierarchical/relational translation system 104, which in turn, is communicatively coupled to a relational computing system 106. In general, hierarchical computing system 102 is based upon a top-down hierarchical data model, where information is navigable and ordered pursuant to predefined relationships being either one-to-one or one-to-many. The hierarchical network data models within system 102 are closely tied to their physical data storage since the data structures representing relationships are a part of the storage system.
By contrast, relational computing system 106 provides the unrelated heterogeneous sources of information, which can be based upon simple to more complex network data relational models that house the data but not necessarily the corresponding relationships amongst the data. Instead of relationships becoming inherently a part of the structure of system 106, logical relationships are represented by primary key matches that are connected as needed according to various relational operations. To this extent, the structure of relational computing system 106 alone typically lacks a pre-established path of navigation, unlike hierarchical computing system 102. In the hierarchical system 102, the paths are explicit, thereby allowing navigation and data discovery to be generally simple because up-front knowledge about particular paths is not required. By contrast, relational computing system 106 includes implicit paths, which are dynamic in nature. This provides greater flexibility in terms of path navigation and information discovery, but requires advance knowledge of the objects and relationships (i.e., the schema). Moreover, for clarity, further references made to "relationships" in the context of relational computing system 106 and corresponding embodiments disclosed shall refer to the "logical relationships."
In between systems 102 and 106, hierarchical/relational translation system 104 bridges the mismatch in data models between the hierarchical data structures in system 102 and the relational data structures in system 106. In general, system 104 provides the mapping from relational to hierarchical systems so that data may be shared across systems, and between unrelated sources of relational information. In doing so, translation system 104 allows the explicit definition of implicit relationships inherent to the relational computing system 106. The information within the relational computing system 106 can then be navigated and discovered in a manner that is substantially similar to navigating and discovering information in the hierarchical computing system 102.
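As a minimal sketch of making an implicit relationship explicit, and by way of example only, the following Python code joins two tables on a key match and emits one hierarchical path per record. The tables `customer` and `orders`, the suffix `o=sales`, and the function `hierarchical_paths` are hypothetical and are assumed solely for this illustration.

```python
import sqlite3
from typing import List


def hierarchical_paths(conn: sqlite3.Connection) -> List[str]:
    """Follow the key match between orders and customers and emit one
    explicit, navigable path per order."""
    rows = conn.execute(
        "SELECT c.name, o.id FROM customer AS c "
        "JOIN orders AS o ON o.customer_id = c.id "
        "ORDER BY c.name, o.id"
    ).fetchall()
    return [f"order={order_id},customer={name},o=sales"
            for name, order_id in rows]


# Hypothetical relational data; the relationship exists only as a key match.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(id));
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Bolt');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")
```

In the relational form, the connection between an order and its customer must be known in advance in order to write the join; in the resulting paths, that connection is visible in the name of each entry.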
FIG. 2 shows further details of one embodiment for a hierarchical/relational translation system 104a. In particular, a forward translation unit 202 receives requests 201 from hierarchical computing system 102, and provides a request to a query generator 206 (also referred to herein as query unit 206). In one embodiment to be described subsequently, this request 201 will be an Information Resource Locator (IRL, that is, an LDAP URL). Query generator 206 formulates the request into a format in which relational computing system 106 can be queried for the requested information. The extracted relational information from relational computing system 106 is received by a result storage unit 208, which transfers the extracted information to a return translation unit 210. Return translation unit 210 converts the data received in a relational format to a hierarchical format compatible with hierarchical computing system 102. Return translation unit 210 then passes the converted data to hierarchical computing system 102 for review or further selection.
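The flow of FIG. 2 can be sketched, by way of example only, in Python as shown below. The sketch assumes a simplified LDAP URL of the form `ldap://host/base??scope?(attr=value)`, a single hypothetical table `employee`, and a hypothetical mapping `CONTAINER_TO_TABLE`; none of these names is taken from the disclosed embodiment, and a practical implementation would handle the full LDAP URL and filter syntax.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical mapping from a directory container to a relational table.
CONTAINER_TO_TABLE = {"ou=employee,o=hr": "employee"}
ALLOWED_ATTRS = ("id", "name", "dept")


def forward_translate(url: str) -> Tuple[str, str, str]:
    """Split ldap://host/base??scope?(attr=value) into (base, attr, value)."""
    rest = url.split("://", 1)[1]
    base_and_query = rest.split("/", 1)[1]
    base, _attrs, _scope, flt = base_and_query.split("?")
    attr, value = flt.strip("()").split("=", 1)
    return base, attr, value


def generate_query(base: str, attr: str) -> str:
    """Build a parameterized SELECT; identifiers come only from the mapping."""
    table = CONTAINER_TO_TABLE[base]
    if attr not in ALLOWED_ATTRS:
        raise ValueError(f"unknown attribute: {attr}")
    return f"SELECT id, name, dept FROM {table} WHERE {attr} = ?"


def return_translate(base: str, rows: List[Tuple]) -> List[Dict[str, str]]:
    """Convert relational rows into directory-style entries."""
    return [{"dn": f"id={r[0]},{base}", "name": r[1], "dept": r[2]}
            for r in rows]


def handle_request(conn: sqlite3.Connection, url: str) -> List[Dict[str, str]]:
    base, attr, value = forward_translate(url)
    # The fetched rows play the role of the result storage unit.
    rows = conn.execute(generate_query(base, attr), (value,)).fetchall()
    return return_translate(base, rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, dept TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Ann', 'Engineering')")
```

For instance, under these assumptions `handle_request(conn, "ldap://vds.example.com/ou=employee,o=hr??one?(name=Ann)")` yields a single entry whose name is `id=1,ou=employee,o=hr`. No directory data is stored by the sketch itself; each request is answered from the relational source.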
Turning to FIG. 3A, there is shown an embodiment of the forward translation unit 202 of FIG. 2. Unit 202 includes a command parser 302 for receiving requests from the hierarchical computing system 102 and for breaking down (i.e., decomposing) any commands embedded within the requests. The commands are forwarded to mapping unit 304. Unit 304 includes information about the metadata previously captured from the relational computing system 106 along with the pre-defined virtual directory definitions as previously established by a directory designer. Unit 304 uses this information to interpret the command and calls the query generator 206 with the appropriate information.
Reference is now made to FIG. 3B to describe one embodiment of the return translation unit 210 of FIG. 2. Unit 210 includes a result parser 310 for receiving responses from the result storage unit 208 which are received from relational computing system 106 in response to the queries sent from query unit 206. Result parser 310 breaks down relational data from the results received from result storage unit 208. This decomposed data is forwarded to a result formatting unit 312. Unit 312 formats the results received from parser 310 into a form compatible with the hierarchical computing system 102, and transmits the results to hierarchical computing system 102 through result transmission unit 314.
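By way of illustration only, the parsing and formatting steps of FIG. 3B might be sketched in Python as follows. The output imitates the general layout of LDIF text but is deliberately simplified: the value escaping and base64 encoding that real LDIF requires are omitted. The functions `parse_result` and `format_entries` and the object class name `employeeRecord` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def parse_result(columns: Sequence[str],
                 rows: Sequence[Tuple]) -> List[Dict[str, Optional[str]]]:
    """Decompose each relational row into column/value pairs."""
    return [dict(zip(columns, row)) for row in rows]


def format_entries(records: List[Dict[str, Optional[str]]], rdn_attr: str,
                   base_dn: str, object_class: str) -> str:
    """Render the records as simplified LDIF-style text, one block per entry."""
    blocks = []
    for rec in records:
        lines = [f"dn: {rdn_attr}={rec[rdn_attr]},{base_dn}",
                 f"objectClass: {object_class}"]
        # NULL columns are omitted, since an entry lists only present values.
        lines += [f"{key}: {value}" for key, value in rec.items()
                  if value is not None]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


records = parse_result(["cn", "mail"],
                       [("Ann", "ann@example.com"), ("Bob", None)])
text = format_entries(records, "cn", "ou=people,o=acme", "employeeRecord")
```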
FIG. 4 shows a block diagram of a second embodiment 100b of communication system 100a, namely having more details for the hierarchical computing system 102b, the hierarchical/relational translation system 104b, and the relational computing system 106b. In the embodiment shown in FIG. 4, network communication system 100b enables the translation of relational database objects and (logical) relationships to virtual directory entries that are useable with hierarchical network data models in accordance with the present invention. Hierarchical computing system 102b includes one or more client computers 402 (used interchangeably herein with "user stations," "workstations" and "clients") that communicate over a network 404 with the translation system 104b. Translation system 104b includes at least one server computer (used interchangeably with "server") 406 having a virtual directory server 408. It is noted that reference made herein to a virtual directory server 408 refers to an application program for creating and "serving" virtual directories. By contrast, server 406 is a computer-based device having an operating system for executing the virtual directory server (application) 408. Accordingly, virtual directory 408 is referred to interchangeably herein as a "virtual directory", and VDS 408, and can be implemented by including software on server 406 for maintaining a virtual representation of directory information as described herein. The embodiment of the system 100b also illustrates that the relational computing system 106b can be a relational database.
Alternatively, virtual directory 408 can be implemented as a separate server computer from server 406. Accordingly, reference is made to an alternative embodiment for VDS 408 when implemented as a separate physical server from server 406.
One embodiment of network 404 in accordance with the present invention includes the Internet. However, it will be appreciated by those skilled in the art that the present invention works suitably-well with a wide variety of computer networks over numerous topologies, so long as network 404 connects the distributed user stations 402 to server 406. It is noted that the present invention is not limited by the type of physical connections that client and server devices make to attach to the network. Thus, to the extent the discussion herein identifies a particular type of network, such description is purely illustrative and is not intended to limit the applicability of the present invention to a specific type of network. For example, other public or private communication networks that can be used for network 404 include Local Area Networks (LANs), Wide Area Networks (WANs), intranets, extranets, Virtual Private Networks (VPNs), and wireless networks (i.e., with the appropriate wireless interfaces as known in the industry substituted for the hard-wired communication links). Generally, these types of communication networks can in turn be communicatively coupled to other networks comprising storage devices, server computers, databases, and client computers that are communicatively coupled to other computers and storage devices.
Client 402 and server 406 may beneficially utilize the present invention, and may contain an embodiment of the process steps and modules of the present invention in the form of a computer program. Alternatively, the process steps and modules of the present invention could be embodied in firmware, or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by real time network operating systems. FIGS. 11-15 will thus be discussed accordingly for such process steps.
A. Exemplary Embodiment for Client Computer
Each user at client 402 works with system 100b to seamlessly access server 406 through network 404. Referring now to the block diagram of FIG. 17, a first embodiment for the client computer 402 is shown. The workstation 402 comprises a control unit 1702 coupled to a display device 1704, a keyboard 1706, a control input device 1708, a network controller 1710, and an Input/Output (I/O) device 1712 by a bus 1714.
Control unit 1702 may comprise an arithmetic logic unit, a microprocessor, a general purpose computer, a personal digital assistant or some other information appliance equipped to provide electronic display signals to display device 1704. In one embodiment, control unit 1702 comprises a general purpose computer having a graphical user interface, which may be generated by, for example, a program written in the Java language running on top of an operating system like the WINDOWS® or UNIX® based operating systems. In the embodiment of FIG. 17, one or more applications, such as electronic mail applications, spreadsheet applications, database applications, and web browser applications, generate the displays, store information, and retrieve information as part of system 100a, 100b. The control unit 1702 also has other conventional connections to other systems such as a network for the distribution of files (e.g., media objects) using standard network protocols such as TCP/IP, HTTP, LDAP and SMTP as will be understood by those skilled in the art and shown in detail in FIG. 17.
It should be apparent to those skilled in the art that control unit 1702 may include more or fewer components than those shown in FIG. 17, without departing from the spirit and scope of the present invention. For example, control unit 1702 may include additional memory, such as, for example, a first or second level cache, or one or more application specific integrated circuits (ASICs). Similarly, additional components may be coupled to control unit 1702 including, for example, image scanning devices, digital still or video cameras, or other devices that may or may not be equipped to capture and/or download electronic data to control unit 1702.
Also shown in FIG. 17, the control unit 1702 includes a central processing unit (CPU) 1716 (otherwise referred to interchangeably as a processor), a main memory unit 1718, and a data storage device 1720, all of which are communicatively coupled to a system bus 1714.
CPU 1716 processes data signals and may comprise various computing architectures including a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture implementing a combination of instruction sets. Although only a single CPU is shown in FIG. 17, multiple CPUs may be included.
Main memory unit 1718 can generally store instructions and data that may be executed by CPU 1716. FIG. 17 shows further details of main memory unit 1718 for a client computer 402 according to one embodiment. Those skilled in the art will recognize that main memory 1718 may include other features than those illustrated. The instructions and data may comprise code devices for performing any and all of the techniques described herein. Main memory unit 1718 may be a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, or some other memory device known in the art. The memory unit 1718 preferably includes an Internet (web) browser application 1722 being of conventional type that provides access to the Internet and processes HTML, DHTML, XML, XSL, or other mark-up language to generate images on the display device 1704. For example, the web browser application 1722 could be a Netscape Navigator or Microsoft Internet Explorer browser. Alternatively, an LDAP client may be substituted for browser 1722, as will be recognized by those skilled in the art. The main memory unit 1718 also includes an Operating System (OS) 1724, a client program 1726 to enable communication between the client computer 402 and the server 406 for creating, editing, moving, adding, searching, removing or viewing information, including the directory views of the virtual directory system described in accordance with the present invention. For example, OS 1724 may be of conventional type such as WINDOWS® 98/2000 based operating systems. In other embodiments, the present invention may additionally be used in conjunction with any computer network operating system (NOS), which is an operating system used to manage network resources. A NOS may manage multiple inputs or requests concurrently and may provide the security necessary in a multi-user environment. 
An example of an NOS that is completely self-contained includes WINDOWS® NT manufactured by the Microsoft Corporation of Redmond, Wash.
Data storage device 1720 stores data and instructions for CPU 1716 and may comprise one or more devices including a hard disk drive, a floppy disk drive, a CD-ROM device, a DVD-ROM device, a DVD-RAM device, a DVD-RW device, a flash memory device, or some other mass storage device known in the art.
System bus 1714 represents a shared bus for communicating information and data through control unit 1702. System bus 1714 may represent one or more buses including an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, a universal serial bus (USB), or some other bus known in the art to provide similar functionality.
Additional components coupled to control unit 1702 through system bus 1714 will now be described, and which include display device 1704, a keyboard 1706, a control input device 1708, a network controller 1710, and an I/O device 1712. Display device 1704 represents any device equipped to display electronic images and data as described herein. Display device 1704 may be a cathode ray tube (CRT), a liquid crystal display (LCD), or any other similarly equipped display device, screen or monitor. As will be described subsequently with respect to other embodiments of the client computer, display device can be the touch panel LCD screen of a Personal Digital Assistant (PDA) or the LCD screen of a portable hand held device like a cellular phone.
Keyboard 1706 represents an alpha-numeric input device coupled to control unit 1702 to communicate information and command selections to CPU 1716. Control input device 1708 represents a user input device equipped to communicate positional data as well as command selections to CPU 1716. Control input device 1708 may include a mouse, a trackball, a stylus, a pen, a touch screen, cursor direction keys, joystick, touchpad, or other mechanisms to cause movement of a cursor. Network controller 1710 links control unit 1702 to network 404 and may include network I/O adapters for enabling connection to multiple processing systems. The network of processing systems may comprise a LAN, WAN, and any other interconnected data path across which multiple devices may communicate.
One or more input/output devices 1712 are coupled to system bus 1714. For example, I/O device 1712 could be an audio device equipped to receive audio input and transmit audio output. Audio input may be received through various devices including a microphone within I/O device 1712 and network controller 1710. Similarly, audio output may originate from various devices including CPU 1716 and network controller 1710. In one embodiment, I/O device 1712 is a general purpose audio add-in expansion card designed for use within a general purpose computer. Optionally, I/O device 1712 may contain one or more analog-to-digital or digital-to-analog converters, and/or one or more digital signal processors to facilitate audio processing.
B. Exemplary Embodiments for Database
Database 106b represents any relational database system table or view. Preferably, any OLE DB, ODBC or JDBC compliant database is well-suited to work with the present invention. Although a single database 106 is shown in FIG. 4, multiple heterogeneous databases may be included. Examples of such databases include: Microsoft SQL server, Oracle, Informix, DB2, Sybase and Microsoft Access.
C. Exemplary Embodiment for Server Computer
Referring now to the block diagrams of FIGS. 10A-10B, further details of system 104b (including server 406 and VDS 408) are shown, namely through a particular embodiment of hardware as seen in hierarchical/relational translation system 104c. In the example of FIG. 10A, system 104c can include server 406 hosting the virtual directory 408 shown in FIG. 4 (and as will be described in more detail with respect to FIG. 10B). As shown in FIG. 10A, translation system 104c preferably includes a first network controller and interface (I/F) 1002 coupled to a data storage device 1004, a display device 1006, a second network controller and interface (I/F) 1008, a processing unit 1010, a memory unit 1012, and input device 1014 via a bus 1016. As shown in FIG. 10A, the first network controller and I/F 1002 is communicatively coupled via 124 to the hierarchical computing system 102b. In particular, first network controller and I/F 1002 is coupled to network 404 and ultimately to client 402. The second network controller and I/F 1008 is communicatively coupled to relational computing system 106b.
For convenience and ease of understanding the present invention, similar components used in both the client computer 402 (of FIG. 17) and the server 406 will be referenced by comparison. To this end, processing unit 1010 is similar to processor 1716 in terms of functionality. That is, processing unit 1010 processes data signals and may comprise various computing architectures including CISC or RISC architecture, or an architecture implementing a combination of instruction sets. In one embodiment, server 406 includes a multiple processor system which hosts virtual directory 408, as will be described in FIG. 10B with reference to application module 1054. As an example, a WINDOWS® NT/2000 server can be used for server 406, while other multiple processor systems may work suitably well with the present invention, including the Dell 1800 made and sold by Dell Computer Corporation.
Input device 1014 represents, primarily for convenience, the functional combination of devices for receiving control input, keyboard input of data, and I/O input. As such, the block diagram for input device 1014 in FIG. 10A may equivalently represent the functionality of keyboard 1706, control input device 1708 and I/O device 1712 of FIG. 17. Additionally, data storage device 1004 is similar to data storage device 1720, but stores data and instructions for processing unit 1010.
System bus 1016 represents a shared bus for communicating information and data through hierarchical/relational translation system 104c. System bus 1016 may represent one or more buses including an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, a universal serial bus (USB), or some other bus known in the art to provide similar functionality.
Referring now to FIG. 10B, by way of example, portions of the memory unit 1012 needed for the processes of the present invention according to one embodiment of the present invention are shown and will now be described more specifically. In FIG. 10B, the memory unit 1012 preferably comprises an operating system 1050, other applications 1070, an application server program 1052, an LDAP server program 1053, at least one virtual directory server application 1054, a first module 1058, a second module 1060, a third module 1056, a fourth module 1062, a fifth module 1064, and a sixth module 1068, all communicatively coupled together via system bus 1020. As noted above, the memory unit 1012 stores instructions and/or data that may be executed by processing unit 1010. The instructions and/or data may comprise code for performing any and/or all of the techniques described herein. These modules 1050-1070 are coupled by bus 1020 to the processing unit 1010 for communication and cooperation to provide the functionality of the system 100b. Those skilled in the art will recognize that while the present invention will now be described as modules or portions of the memory unit 1012 of a computer system, the module or portions may also be stored in other media such as permanent data storage and may be distributed across a network having a plurality of different computers such as in a client/server environment.
The memory unit 1012 may also include one or more other application programs 1070 including, without limitation, word processing applications, electronic mail applications, and spreadsheet applications.
In accordance with the present invention, network 404 enables the communication between multiple components of server 406 and client 402, as well as other devices, which may or may not be co-located, but may be distributed for convenience, security or other reasons. To facilitate the communication between client 402 and server 406, a client-server computer network operating system (NOS) may be used for operating system 1050 to manage network resources. An NOS can manage multiple inputs or requests concurrently and may provide the security necessary in a multi-user environment. Operating system 1050 can include, for example, an NOS of conventional type such as WINDOWS® NT/2000, or UNIX® used with the Sun Microsystems SOLARIS® computing environment. Another conventional type of operating system that may be used with the present invention includes LINUX® based operating systems.
The virtual directory server (VDS) application 1054 is a procedure or set of routines that controls the processing unit 1010, preferably at run-time on server 406. VDS application 1054 represents server 408 in that embodiment where server 406 hosts VDS 408. Alternatively, VDS application 1054 runs on a separate server similar to server 406 where VDS 408 is embodied as a physical server. Although only a single VDS application 1054 is shown in memory unit 1012 of FIG. 10B for ease of understanding the present invention, the server 406 may typically have several such VDS applications 1054, each application 1054 being used for displaying information aggregated from unrelated heterogeneous sets of relational databases according to context.
In one embodiment, system 100b includes the VDS application 1054 along with six modules of software according to the present invention. These six modules are described below as the first module 1058, second module 1060, third module 1056, fourth module 1062, fifth module 1064, and sixth module 1068. The first module 1058 is embodied as a program for extracting and defining schema from any relational data sources that can be reached using Object Linking and Embedding DataBase (OLE DB), Open DataBase Connectivity (ODBC), and/or Java DataBase Connectivity (JDBC) software drivers. The second module 1060 is a program that includes processes for building virtual directory definitions using an oriented path derived from a schema for relational data sources, and represented by a hierarchical sub-directory of objects in a Directory Information Tree (DIT) structure. The third module 1056 includes a program for enabling browsing of the contents at the client application corresponding to the directory view definitions. The fourth module 1062 includes a program for mapping relational objects, such as tables, columns, attributes, and logical relationships into an external (e.g., XML) format. The fifth module 1064 maps the entities described by the module 1062 into the hierarchical object classes and attributes, which in one embodiment can be for LDAP. The sixth module 1068 includes processes for managing system security using Group access rights, and access control lists for directory entries, which may be implemented by conventionally known techniques. Exemplary functions and implementation for the VDS application 1054, and the first, second, third, fourth, and fifth modules 1056-1064 are described below in more detail.
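As one conventional technique that could serve the purpose of the sixth module, and by way of example only, the following Python sketch checks group access rights against an access control list keyed by directory subtree. The rule that the most specific matching subtree governs, the default of denying access when no subtree matches, the function `can_read`, and the sample groups are all assumptions made for this illustration and are not stated in the disclosure.

```python
from typing import Dict, Set


def can_read(acl: Dict[str, Set[str]], user_groups: Set[str], dn: str) -> bool:
    """Apply the access control list of the most specific (longest) subtree
    that contains the entry; deny access if no subtree matches."""
    matches = [suffix for suffix in acl
               if dn == suffix or dn.endswith("," + suffix)]
    if not matches:
        return False
    governing = max(matches, key=len)
    # Access is granted if the user belongs to at least one permitted group.
    return bool(acl[governing] & user_groups)


# Hypothetical access control list: subtree -> groups permitted to read it.
acl = {
    "o=acme": {"staff", "partners"},
    "ou=payroll,o=acme": {"hr"},
}
```

In this sketch a business partner may read `cn=catalog,ou=products,o=acme` but not entries beneath `ou=payroll,o=acme`, because the narrower subtree carries its own list.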
One Embodiment of the Present Invention
A particular embodiment for implementing system 100b, provided only by way of example, will now be discussed with focus directed to a VDS application 1054 used on server 406 along with a six module, or six-tier Internet application implemented with the Microsoft Development Environment. In this section, more details about the function of application 1054 and the first through fifth modules 1058, 1060, 1056, 1062, and 1064 are discussed, followed by an explanation of a process for using these modules. To add further clarification to particular aspects of the present invention, reference will be made to the flow-charts of FIGS. 11-15 appropriately throughout the discussion.
A. Virtual Directory Server
Reference will now be made to the VDS 408 which is implemented with the virtual directory server (VDS) application 1054 of the present invention as shown in FIG. 10B. The (VDS) application 1054 is implemented with software for accessing and extracting data 1102 from unrelated relational databases, transforming 1104 the extracted information into a representation that is compatible with a hierarchical model, and enabling the representation to be viewed on the client 402 as a virtual directory of information when queried 1108 by client 402. Generally, the VDS application 1054 maps relational database objects into a directory structure and enables users to navigate across diverse unrelated application namespaces. A namespace is the scope of those entities each referenced by some unique "qualified" name and defined by a schema. In particular, the virtual directory server 408 maps database views into a directory structure that is in compliance with LDAP, thereby resulting in LDAP directory structures. The virtual directory server 408 does not necessarily store any information itself, unlike conventional LDAP implementations. In a particular embodiment and as will be described with regard to FIG. 12 subsequently, requests are received from clients having applications operating in compliance with LDAP. The requests received are processed by the virtual directory server 408 and transmitted to the target database 106b hosting the data of interest. To this end, the virtual directory server 408 provides a virtual LDAP directory interface to diverse heterogeneous enterprise databases and allows the dynamic reconfiguration of the Directory Information Tree (DIT) and associated content. 
This aspect of the present invention is beneficial because a representation of complex data relationships is provided to users but without the need for replication of data and synchronization when translating data from a system using a network relational model to a system using a network hierarchical model.
In one embodiment of the present invention, the data source is a relational database 106b which forms the authoritative source of directory information to be viewed with the VDS 408 in accordance with the present invention. For example, the database 106b could be a PeopleSoft® application database having information in the nature of human resources. Alternatively, the database 106b could be an Oracle® database having financial information. In accordance with one aspect of the present invention, the virtual directory server 408 should preferably support, as a source for the directory data, the use of any relational database that can be accessed using OLE DB, ODBC, or JDBC.
According to one aspect of the present invention, the VDS 408 does not eliminate the need for an enterprise directory. Rather, enterprise directories are an integral part of any network infrastructure, and the VDS 408 inter-operates with the enterprise directory to provide even more functionality to directory-enabled applications. Enterprise directories store information from a wide array of sources, including the network operating system (NOS), and are well-suited for hosting the NOS level of data. Instead of supplanting enterprise directories, the VDS 408 in accordance with the present invention enables access to enterprise data that reside in related and unrelated relational databases. As will be described further herein, the VDS 408 is beneficial because of its ability to provide information housed in relational databases to LDAP-enabled applications.
In accordance with another aspect of the present invention, the VDS does not eliminate the need for a metadirectory. Metadirectories consolidate the management of multiple applications and NOS directories, and are a valuable component of any network infrastructure. With one embodiment of the present invention, the VDS 408 provides an LDAP interface to data that already exists in the infrastructure of relational database 106b of an enterprise. Utilizing the VDS 408 of the present invention with an enterprise metadirectory results in a faster directory infrastructure implementation and a more flexible directory design.
To further clarify aspects of the present invention, reference will contemporaneously be made to FIG. 18, while the present invention is described in the context of first, second, third, fourth, fifth, and sixth modules interacting across the relational computing system 106b, hierarchical/relational translation system 104b, and hierarchical computing system 102b. Although the particular modules 1056-1064 are mentioned, it will be appreciated by those skilled in the art that the present invention is applicable to other contexts of communications between multiple users such as users of a main frame computer, and users of other proprietary network systems. As such, the description here of the present invention in this specific context is only by way of example. It should be understood that the processes and method of the present invention are applicable to any relational database being accessed by multiple users.
As shown in the diagram of FIG. 18, a first module 1058 accepts 1802 schema data from OLE DB, ODBC and/or JDBC compliant data sources. These data sources are illustrated by way of example only as Microsoft Access database 1804, SQL Server database 1806, and Oracle database 1808. After the schema is captured 1102, the schema is then encoded in a standard format, such as XML, and stored 1810 in a schema file (as will be described in one embodiment as having a file extension of .orx).
Reference is now made to the flowchart of FIG. 14a to illustrate an example of implementing the accessing of the data sources and the capturing of schema according to step 1102 of FIG. 11. It should be noted that the exact sequence of steps described here is not necessary for the invention to work properly, and that the order of the steps may be modified to produce the equivalent end results and actions. In FIG. 14a, a user working at a client application 402 selects 1402 a relational data source. In response to the selection made, schema extraction of the objects and relationships is made by module 1058. In doing so, the entities in the data source are determined 1404 based upon the selection received. Each entity that is determined is translated 1406 to an object class. For example, step 1406 may in one embodiment generate an Objectclass Name for LDAP mapping. During this process, the primary keys of the corresponding entities are included 1408 as also being the Keys of the object class. Additionally, all attributes and/or columns of all entities selected are translated 1410 into attributes of the object classes. The results of extracting the schema in this example are memorialized 1412, for example, by discerning and defining the relationships between objects from the Primary and/or Foreign Keys information. Once this definition is completed, the Definition may be saved 1414, 1810 in the schema file (i.e., the .orx file) in XML format.
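By way of illustration only, the extraction steps of FIG. 14a might be sketched as follows in Python. The sketch assumes a SQLite data source in place of an OLE DB, ODBC, or JDBC source, and the XML element names (schema, objectclass, attribute, relationship) as well as the function name extract_schema are hypothetical; the actual layout of the .orx file is not specified here.

```python
import sqlite3
import xml.etree.ElementTree as ET


def extract_schema(conn: sqlite3.Connection) -> ET.Element:
    """Capture tables, columns, keys and relationships as an XML tree.

    Element and attribute names are illustrative assumptions, not the
    actual .orx format.
    """
    root = ET.Element("schema")
    # Determine the entities in the data source (compare step 1404).
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # Each entity becomes an object class (compare step 1406).
        obj = ET.SubElement(root, "objectclass", name=table)
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, col, col_type, _notnull, _default, pk in conn.execute(
                f'PRAGMA table_info("{table}")'):
            # Every column becomes an attribute (step 1410); primary key
            # columns are flagged as keys of the object class (step 1408).
            ET.SubElement(obj, "attribute", name=col, type=col_type,
                          key="true" if pk else "false")
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            # Relationships are derived from foreign keys (step 1412).
            ET.SubElement(root, "relationship", child=table, parent=fk[2],
                          child_column=fk[3], parent_column=fk[4] or "")
    return root


# Small in-memory data source used only for demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY,
                            Name TEXT, Country TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY,
                         CustomerID INTEGER REFERENCES Customers(CustomerID));
""")
schema = extract_schema(conn)
# The serialized text could then be written to a schema file (step 1414).
orx_text = ET.tostring(schema, encoding="unicode")
```

In this sketch the relationship between Orders and Customers is recovered solely from the declared foreign key; relationships that exist only in application logic would not be found and would have to be declared separately.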
Frequently, there will be situations where the user will want to modify the structure of the schema in the virtual directory. User input module 1400 in FIG. 11 indicates this option, which is further described in one exemplary implementation referenced in FIG. 14b. In the example of FIG. 14b, a user is permitted to select 1420 a schema file (i.e., the .orx file) which has been output from the schema extraction process. As will be illustrated subsequently in the context of a graphical user interface, the user can provide input information so that the first module 1058 modifies the definition of the schema, by having the fourth module 1062 create new schema mapping, that is, where the VDS 408 maps database objects, such as tables, columns, attributes, and other entities into LDAP object classes and attributes. As shown in FIG. 14b, examples of such input information can comprise: (1) defining and redefining 1422 Object primary keys; (2) defining and redefining 1424 relationships between objects; (3) defining 1426 display attributes and titles for LDAP Distinguished Name (DN), and attributes mapped to LDAP; (4) removing 1428 useless objects; and (5) defining 1430 new Objects from existing, for example, as with the "derived views" option to be subsequently discussed in detail. Once these modifications have been accepted and processed by the VDS 408, 1054, the modified definition can be saved 1432 to overwrite the schema file.
Using the schema captured in the schema file, a second module 1060 is used to create 1104 a description 1812 of the directory views saved in another file, described herein as the directory view file having a .dvx file extension. For example, the creation 1104 of directory views from captured schemas indicated in FIG. 11 is further described in one embodiment exemplified in the flowchart of FIG. 13. In the example of FIG. 13, a new directory view definition is created 1302 by specifying the schema to use. To do so, a default root label is provided 1304. A specific implementation will later be described in the context of a graphical user interface for clarity of the invention. Based on the relationships between objects as described in the schema specified, the user is allowed to build 1306 a hierarchy. The hierarchy should preferably be referenced, and the creation 1308 of a label is a mechanism that works well for this purpose. Input is then received 1310 from the user in order to provide the name of the label. Once the user input is received, the label is created 1312 based on the user input. In response thereto, a new node is added 1314 to the tree that represents the directory view. If there are further levels of the directory views to be built, then control is passed back to step 1306 as indicated by 1316. Otherwise, the directory view definition is saved 1318 in the directory view file (i.e., the .dvx file).
Referring back to step 1306, instead of a label being created, the user can request that a container or content be created 1320. Accordingly, the first module 1058 accepts 1322 user selection of an Object from the corresponding schema previously selected. Furthermore, the user may select 1324 attributes to retain for each Object, and may define other restrictions. This will be subsequently discussed in further detail for one implementation utilizing the "where" clause. Thereafter, the second module 1060 generates 1326 all the information needed to build the SQL query. For example, such information can include the primary key, relationships with ancestors in the hierarchy, attributes to display, and restrictions, among others, as will be described in more detail later. Control then passes to step 1314, which has already been described.
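As a minimal sketch of step 1326, the information held by a container node (primary key, relationship with its ancestor, attributes to display, and restrictions) could be assembled into a parameterized SQL query as shown below. The class ContainerNode, its field names, and the function build_query are hypothetical and are introduced here only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ContainerNode:
    """Hypothetical container node of a directory view."""
    table: str
    primary_key: str
    attributes: list[str]
    parent_link: Optional[str] = None  # foreign-key column pointing to the ancestor
    where: Optional[str] = None        # additional restriction ("where" clause)


def build_query(node: ContainerNode, parent_key=None) -> tuple[str, list]:
    """Return a SQL string and its bound parameters for one node."""
    # Always select the primary key first, then the retained attributes.
    columns = [node.primary_key] + [
        a for a in node.attributes if a != node.primary_key]
    sql = f"SELECT {', '.join(columns)} FROM {node.table}"
    conditions, params = [], []
    # Restrict to children of the ancestor entry, if one is given.
    if node.parent_link is not None and parent_key is not None:
        conditions.append(f"{node.parent_link} = ?")
        params.append(parent_key)
    # Append the restriction defined by the user for this node.
    if node.where:
        conditions.append(f"({node.where})")
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


orders = ContainerNode("Orders", "OrderID", ["OrderDate"],
                       parent_link="CustomerID",
                       where="ShipCountry = 'France'")
sql, params = build_query(orders, parent_key="ALFKI")
```

The ancestor's key is passed as a bound parameter rather than interpolated into the query text, whereas table and column names are taken from the saved view definition and are assumed to be trusted.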
Referring back to step 1104 of FIG. 11, a default directory view may be created automatically, as described in more detail in FIG. 15. As seen in the example of FIG. 15, a schema output as a result of the schema mapping and schema manager modules 1062 and 1058, respectively, as discussed in either FIG. 14a or FIG. 14b, can be selected 1502 by the user. User selection of the objects from the schema (e.g., SQL tables) to include in the directory view is accepted 1504 by the Directory View Generator 2200 as will be described in more detail subsequently. At step 1506, the directory view is generated. In doing so, for each Object selected, a node in the DirectoryView Tree is generated 1508. Each node describes the information needed to query the database 106. Thereafter, the definition is saved 1510 in a directory view file (i.e., the .dvx file).
Throughout the process described in FIG. 11, the mapping of Objects from the relational model into the LDAP model is performed, for example in steps of schema management as described in FIG. 14b, and using the process shown in FIG. 14c. Reference is now made to FIG. 14c to further describe the Objects mapping to the LDAP schema. As shown in FIG. 14c, the schema file (i.e., the .orx file) output from the first module 1058 is obtained 1440 by module 1062. Part of this process involves establishing definitions 1442 for the LDAP Objectclass. For example, mandatory LDAP attributes are established, like the primary key, display attributes, and non-nullable attributes. Other attributes may be established as optional LDAP attributes for the LDAP schema. More details about this process are explained subsequently. Next, the LDAP attributes are added 1444 to the definition. More particularly, in step 1444, all the attributes of all the objects are added to the LDAP schema definition. The LDAP definitions are generated 1446 into files using a format that is specific to each target LDAP server.
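The division into mandatory and optional attributes described for step 1442 can be illustrated with the following sketch. The Column class and the functions to_objectclass and render are hypothetical, the object identifier is a placeholder, and the rendered string merely follows the general object class description syntax of RFC 4512; an actual target LDAP server may require a different file format.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """Hypothetical description of one relational column."""
    name: str
    nullable: bool = True
    primary_key: bool = False
    display: bool = False


def to_objectclass(table: str, columns: list) -> dict:
    """Classify columns as mandatory (MUST) or optional (MAY) attributes."""
    # Primary key, display and non-nullable columns are mandatory.
    must = [c.name for c in columns
            if c.primary_key or c.display or not c.nullable]
    # All remaining columns become optional attributes.
    may = [c.name for c in columns if c.name not in must]
    return {"name": table, "must": must, "may": may}


def render(objectclass: dict, oid: str) -> str:
    """Render one definition in an RFC 4512 style description string."""
    parts = [f"( {oid} NAME '{objectclass['name']}' SUP top STRUCTURAL"]
    if objectclass["must"]:
        parts.append("MUST ( " + " $ ".join(objectclass["must"]) + " )")
    if objectclass["may"]:
        parts.append("MAY ( " + " $ ".join(objectclass["may"]) + " )")
    return " ".join(parts) + " )"


customers = to_objectclass("Customers", [
    Column("CustomerID", nullable=False, primary_key=True),
    Column("CompanyName", display=True),
    Column("Phone"),
])
# "1.1.2.1" is a placeholder identifier used only for this example.
definition = render(customers, "1.1.2.1")
```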
At this stage, the directory view is added to the VDS 408 and is accessible under the control of either the third module 1056, or the LDAP server application 1053 (as seen in FIG. 10B). Additionally and as indicated in FIG. 11, the VDS may be queried 1108 and results generated in response from the VDS. More details about this process 1108 are shown in the exemplary flowchart of FIG. 12. In the example of FIG. 12, data requests are received 1202 by the VDS from the client, along with an IRL. Using the IRL received, a database query is generated 1204 by translating the IRL using the VDS. More specifically, using the input IRL and the corresponding DirectoryView definition, the appropriate database (e.g., SQL) query is generated 1205, for example by mapping generator 304 of FIG. 3A. Thereafter query generator 206 can assert the database query on database 106. In response, the result is received 1208 from database 106, for example at result storage unit 208 in FIG. 2. The data result received is then translated 1210 into a format that is useable by the client 402. In particular, the result is returned 1211, for example, as an SQL result set or LDAP entries. Alternatively, the results can be formatted in HTML, XML, WML, and DSML or other equivalent mark-up language that may be associated with a particular client application. The translated data results can then be sent 1212 to the requesting client 402.
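The request flow of FIG. 12 may be sketched, under several assumptions, as follows. The sketch treats the incoming address as a simple comma-separated name of the form table=Customers,dv=Northwind,o=radiantlogic, uses a SQLite database in place of database 106, and returns plain dictionaries standing in for LDAP entries. The VIEW mapping, the function names, and the form of the entry names returned are all hypothetical.

```python
import sqlite3

# Hypothetical directory view definition: maps the value of the "table"
# name component to the relational object it exposes.
VIEW = {
    "Customers": {"table": "Customers", "key": "CustomerID",
                  "attributes": ["CompanyName", "Country"]},
}


def parse_name(name: str) -> list:
    """Split a simple name into (type, value) pairs; escaping is not handled."""
    return [tuple(part.strip().split("=", 1)) for part in name.split(",")]


def handle_request(conn: sqlite3.Connection, name: str) -> list:
    """Translate a request into SQL, run it, and translate the result."""
    components = dict(parse_name(name))
    node = VIEW[components["table"]]
    columns = [node["key"]] + node["attributes"]
    # Identifiers come from the saved view definition, not from the client.
    sql = (f"SELECT {', '.join(columns)} FROM {node['table']} "
           f"ORDER BY {node['key']}")
    entries = []
    for row in conn.execute(sql):
        record = dict(zip(columns, row))
        # Each row becomes one entry named relative to the requested node.
        entries.append({
            "dn": f"{node['key']}={record[node['key']]},{name}",
            "attributes": record,
        })
    return entries


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, "
             "CompanyName TEXT, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", [
    ("ALFKI", "Alfreds Futterkiste", "Germany"),
    ("ANATR", "Ana Trujillo Emparedados", "Mexico"),
])
entries = handle_request(conn, "table=Customers,dv=Northwind,o=radiantlogic")
```

Because nothing is cached between requests in this sketch, each call reads the current contents of the database, which is consistent with a server that does not itself store directory data.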
B. Schema Manager Application
The concepts and procedures for capturing database schema, and for analyzing and declaring missing attributes will now be discussed with focus being directed to a first module 1058, which is referred to interchangeably herein as the schema manager (application) 1058 for convenience. The schema manager 1058 is preferably a database schema software tool designed for extracting and capturing relational database metadata from a variety of relational databases 106b that can be accessed with OLE DB, ODBC, and/or JDBC software drivers. One type of configuration that works suitably well with the present invention comprises encoding the captured schema with an Internet markup language like, for example, Extensible Mark-up Language (XML). Once the schema is formatted with XML, the encoded metadata is then stored in a schema file. For example, the schema file may be stored with an .orx file extension representing the Objects and Relationships expressed (e.g., encoded) in XML, primarily for convenience and ease of system administration.
Referring to the block diagram of FIG. 40, an aspect of the schema manager module 1058 is shown for the function of managing objects and relationships. In the embodiment of FIG. 40, a schema manager module 530 processes the objects and relationships corresponding to a schema already captured from a database, formatted and saved in the schema file 532. The schema manager module 530 may call upon COM objects associated with the ORGEngine 534 in order to process the contents of the schema file. As will be discussed in more detail subsequently, this processing can include, but is not limited to: (1) adding relationships; (2) defining primary keys; (3) defining those attribute(s) that best describe an object (e.g., a display name); and (4) defining derived views from master objects. Once the original objects and relationships have been modified according to the described processes, the modified objects and relationships can be placed into a modified schema file, as indicated by module 536. As will be described with interface 1900 in FIGS. 19A-C, the modifications made through interface 1900 to effectuate the described processing that produces the modified schema file may be implemented using the functional modules of FIG. 40 to enrich the ORG object.
1. The Schema Manager Process
The schema manager application 1058 provides the following functionality: (1) capturing database schema; (2) declaring implicit relationships; and (3) creating default and derived views.
The schema manager 1058 captures 1802, 1102 database schema from multiple relational data sources, such as the Microsoft Access 1804, Microsoft SQL Server 1806, and Oracle 1808 servers, by way of example. Each of these servers is associated with its own language, and its metadata can be exported 1802 to the schema manager 1058. Upon capturing this metadata, the schema manager 1058 encodes 1810 the database schema in a standard format, for example, XML, which is stored in a schema file with a .orx extension, as described herein. The schema manager also records the different database connections required, and as will be discussed subsequently in detail, manages the mapping of the captured schema to an LDAP schema.
The schema manager 1058 can also declare implicit relationships. After the schema is captured 1802, undocumented primary keys and relationships, that are implicit in the code but not appearing in the data dictionary, can be declared. Since logical relationships between the different tables are the primary support for constructing directory views 1104, it is important to declare any logical relationship not captured by the schema manager 1058.
Additionally, the schema manager 1058 provides the option of using a default view in place of constructing a view by using the second module 1060 (as will be described in the next sub-section). Derived views, which are views based on one attribute in a table (e.g., a postal code) can also be constructed using the schema manager 1058.
2. Using the Schema Manager Interface
When the schema file is opened, a graphical user interface (GUI) 1900 as shown in FIG. 19A is invoked under the control of the schema manager application 1058. Interface 1900 may be used in accordance with one embodiment of the present invention to display the database objects, which can include tables, views and relationships, preferably in alphabetical order. When a database object is selected in the interface 1900, information about the object appears in one portion of the interface. For example, in one embodiment of the interface 1900, the information about the selected object can appear on the right-hand side of the interface (as will be discussed with respect to FIG. 19A). It will be appreciated by those skilled in the art that a user interface, like for example the interface 1900, includes functionality common to conventional database schema managers. For example, such functionality comprises enabling the user to view, browse through, and edit the information.
The schema manager 1058 provides the information and resources to identify and to declare any relationships and primary keys that are not explicit in the database definition. The declaration process is a significant step because the declaration affects the quality of the directory views that will be created using the second module 1060. Any undeclared relationships or primary keys can result in a meaningless path or IRL, the consequence of which directly affects the quality or availability of information displayed using the third module 1056.
For example, FIG. 19A shows one embodiment of a user interface 1900, which illustrates summary information of all of the objects and relationships contained in a sample file, entitled Northwind.orx, having been extracted using the schema manager 1058 of the present invention. As shown in the example of FIG. 19A, a top-level name Objects 1902 is selected, and correspondingly, important summary information is displayed for each of the tables, views and relationships within the virtual directory 1901. A first type of icon 1904 identifies tables, a second type of icon 1906 indicates a view, and third type of icon 1908 identifies a relationship. Those skilled in the art will recognize that such distinctive icons are described by way of example, and that the present invention may be practiced with a variety of distinctive identifiers used for clarifying certain features of the present invention.
Commands available within the schema manager 1058 can be accessed in a variety of ways. For example, pull-down menus are available from the menu bar 1910 at the top of the interface 1900. After using a control input device to direct a cursor to click on a drop-down menu name, e.g., View 1912, a list of commands is displayed from which a selection can be made. Alternatively, schema manager 1058 can also provide command selection through the use of short-cut menus which are provided by the interface 1900. Referring to the particular embodiment of a user interface shown in FIG. 19B, by performing a right-click command on an object (e.g., table, view or relationship) using a mouse, a shortcut menu 1920 appears, from which a command can be selected. Still further, schema manager 1058 can provide further command selection through the use of a toolbar 1930 as shown in the embodiment of FIG. 19A. FIG. 19C illustrates an exemplary toolbar 1930, which those of skill in the art will recognize may be programmed according to conventional techniques. It will also be appreciated that menu bar 1910, shortcut menu 1920, and toolbar 1930 may be used with the present invention either by themselves, or in combination with each other, and that command selection is not limited to any of these techniques.
3. The Schema Manager Basic Terms
Several definitions are introduced as follows to provide clarity and a foundation for the terms used and features described herein.
In a relational database, every table has a column or a combination of columns, known as the primary key of the table. These values uniquely identify each row in a table. At times, tables are found that were created in the database, but whose uniquely identifying column(s) were not documented in the system catalog as the primary key. Declaring implicit primary keys is one of the database refining processes that can be performed with the first module 1058. As seen in the interface of FIG. 19A, a column indicator 1950 identifies those columns being primary keys. Additional details of the primary key are discussed in the section entitled Declaring Primary Keys.
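As a simple illustration of this refining process, a captured schema could be scanned for tables lacking a documented primary key, after which a key is declared explicitly. The in-memory dictionary layout and the function names below are assumptions made only for this sketch and do not reflect the actual schema file format.

```python
def tables_missing_primary_key(schema: dict) -> list:
    """Return the names of tables with no documented primary key."""
    return sorted(name for name, table in schema.items()
                  if not table["primary_key"])


def declare_primary_key(schema: dict, table: str, columns: list) -> None:
    """Declare an implicit primary key on a table of the captured schema."""
    unknown = [c for c in columns if c not in schema[table]["columns"]]
    if unknown:
        raise ValueError(f"unknown column(s) for {table}: {unknown}")
    schema[table]["primary_key"] = list(columns)


# Hypothetical captured schema: Shippers has no key in the system catalog.
captured = {
    "Customers": {"columns": ["CustomerID", "CompanyName"],
                  "primary_key": ["CustomerID"]},
    "Shippers": {"columns": ["ShipperID", "CompanyName"],
                 "primary_key": []},
}
missing_before = tables_missing_primary_key(captured)
declare_primary_key(captured, "Shippers", ["ShipperID"])
missing_after = tables_missing_primary_key(captured)
```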
By using the schema manager 1058, a display name, or alias, can be created for the primary key. The display name allows the user browsing the directory to be shown more useful information. For example, if the primary key of the Customer table is CustID with an integer attribute type, then a list of numbers will be displayed in the directory tree at run time. Frequently, the user who created the directory will be the only person for whom those "numbers" have meaning. To avoid this situation, a display name could be created with the user's first name and last name in accordance with the present invention. Instead of the user seeing a "meaningless" number, the user will be able to discern a customer name that may suggest context and be significant to a larger audience. The display name is typically a combination of the primary key and one or more attributes. For example, the added attributes may be a user's first and last names. An example of a user interface 2000 is shown in FIG. 20 for selecting a display name. Additional details of the display name are discussed in the section entitled Declaring Display Names.
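A display name formed from the primary key and one or more attributes might be computed as in the following sketch. The function name, the record layout, and the use of a single space as separator are assumptions; the description above does not fix a particular format.

```python
def display_name(record: dict, primary_key: str,
                 extra_attributes: list, separator: str = " ") -> str:
    """Combine the primary key value with descriptive attribute values."""
    parts = [str(record[primary_key])]
    parts += [str(record[name]) for name in extra_attributes]
    return separator.join(parts)


customer = {"CustID": 42, "FirstName": "Maria", "LastName": "Anders"}
label = display_name(customer, "CustID", ["FirstName", "LastName"])
```

Keeping the primary key in the display name preserves uniqueness even when two customers share the same first and last names.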
In order to evaluate missing relationships in the schema manager 1058, having a working knowledge of the underlying database application on which the schema is based is essential. Occasionally, the relationships between objects are not captured in the schema, for example, when some links are created implicitly. This means that the logical relationships may be present in the application, but are not recorded within the database dictionary (i.e., system catalog). Once relationships have been determined to be missing, these relationships can be declared from the schema manager 1058. One manner for doing so, for example, is with the Define Relationships command (i.e., button) 1932 of FIG. 19C. Additional details of relationships are discussed in the section entitled Setting Relationships.
A derived view results from queries made to the base table and/or VDS as discussed in the flowchart of FIG. 14b. The derived views are built by promoting one of the attributes of the base table to the entity level. Once the view is created, it can be added to the schema, after which the new relationship can be used to create more detailed and flexible views of information. Referring to the example of FIG. 21, a database includes a table that lists Customers and related attributes, including the attribute for Country. In order to determine a list of all countries having associated customers, the derived view feature of the present invention enables the creation of a view that lists all applicable countries. One advantage of having a derived view is the provision of summary data. For example, as shown in FIG. 21, all occurrences of a particular country are summarized in the derived view, that is, combined into one record for viewing. A derived view can be declared from the schema manager 1058. One manner for doing so, for example, is with the Define Derived Views command (i.e., button) 1934 of FIG. 19C. Additional details of derived views are discussed in the section entitled Creating Derived Views.
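The Country example can be illustrated with a short sketch in which the derived view is realized as a SQL view selecting the distinct values of the promoted attribute. The use of SQLite, the function name create_derived_view, and the view name Countries are assumptions for illustration; the mechanism actually used to store a derived view is not specified here.

```python
import sqlite3


def create_derived_view(conn: sqlite3.Connection, base_table: str,
                        attribute: str, view_name: str) -> None:
    """Promote one attribute of a base table to its own entity.

    Identifiers are assumed to come from a trusted, captured schema.
    """
    conn.execute(
        f'CREATE VIEW "{view_name}" AS '
        f'SELECT DISTINCT "{attribute}" FROM "{base_table}" '
        f'WHERE "{attribute}" IS NOT NULL')


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, "
             "CompanyName TEXT, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", [
    ("ALFKI", "Alfreds Futterkiste", "Germany"),
    ("ANATR", "Ana Trujillo Emparedados", "Mexico"),
    ("ANTON", "Antonio Moreno Taqueria", "Mexico"),
])
create_derived_view(conn, "Customers", "Country", "Countries")
# All occurrences of a country are combined into one record.
countries = [row[0] for row in conn.execute(
    "SELECT Country FROM Countries ORDER BY Country")]
```

The promoted attribute also serves as the link back to the base table, so that the customers of a given country can be listed beneath that country in a hierarchy.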
In FIG. 19C, the Edit Connection String command (i.e., button) 1936 found in interface 1900 can be defined to provide the function of changing the path to a database. The path is defined by OLE DB, ODBC, or JDBC whichever is applicable. Additional details on editing connection strings are discussed in the section entitled Editing Connection Strings.
A default view represents a default namespace, and can be created to be either a flat or an indexed namespace. An example of a user interface referred to herein as the Default Views (DVX) Generator 2200 shown in FIG. 22 allows a user to select a directory view type 2201. For example, if a flat namespace with a simple short Distinguished Name (DN) is desired, the DVX Generator 2200 can be used to select the flat directory view type 2202. As is known in the art, a DN is a compound name that uniquely identifies an entry in an LDAP or X.500 directory. Thereafter, referring to FIG. 23A, the second module 1060 can be used to generate, by way of example, a user interface 2301 to display a DIT 2302 and a corresponding flat default view 2303 corresponding to a DN for the information displayed 2304 using the third module 1056. In the example of FIG. 23A, the DN is comprised of table=Customers 2306, dv=Northwind 2308, and o=radiantlogic. Upon selecting the flat directory view type 2202 from the DVX generator 2200, all of the tables 2310 that are selected are shown in the user interface 2300 of FIG. 23A. In particular, a flat default view 2303 enables a large amount of information to be displayed in view form. Accordingly, it will be appreciated by those skilled in the art that, in general, the flat namespace is well-suited to views that are not complex and do not have a customized DIT. Additional details of default views are discussed in the section entitled Creating Default Views.
By contrast, indexed views permit each record of the table to be an entry in the DIT. Referring to the user interface for the DVX Generator 2200 shown in FIG. 22, if the indexed directory view type 2204 is selected, then in response and referring to FIG. 23B, the second module 1060 is used to generate, by way of example, a user interface 2320 to display attributes of a DIT 2322 in a corresponding default indexed directory view 2324. As seen in FIG. 23B, each customer is an entry in the tree 2326 on the left-hand side of interface 2328 as generated by the third module 1056. Although a longer DN is needed to retrieve the information using the indexed directory view, a comprehensive presentation is made available to users upon browsing the directory view.
4. Using the Schema Manager
In accordance with the particular embodiment described, the discussion will now focus on the process for capturing the database schema, determining the validity of the schema captured, and creating default and derived views.
(a) Capture the Database Schema
A key function of the schema manager 1058 comprises capturing database schema. To describe one manner for performing this function, reference is now made to a block diagram of FIG. 24 having a module 2402 for capturing the database schema. To provide added clarity of the present invention, reference will contemporaneously be made to FIG. 18. Module 2402 is interchangeably referred to as the Schema Extraction Wizard. The primary function of module 2402 is to select 1402 an OLE DB data source 2404 using the Datalink object for dialogs. OLE DB source 2404 can be any OLE DB or ODBC compliant database known in the art. Several examples of such compliant databases include the Microsoft Access Jet, SQL, Oracle 8, and IBM DB2 databases. The database schema, which may comprise tables, views, fields, and logical relationships, is extracted from DB source 2404 with the use of database objects abstraction, such as Active Data Object (ADO) 2406 or JDBC objects. ADO 2406 is a programming interface from Microsoft that is designed to facilitate data access. Typically, an ADO is embodied as a Component Object Model (COM) object, which is called whenever the data access functionality programmed into the object is needed. The database schema extracted 1802 is then stored as an Object and Relationships Graph (ORG) object using an ORG engine COM object 2408. The ORG object 2408 is then serialized and transformed 1404 into an XML format 1810 and saved in a file with a .orx extension as indicated by 2410.
To further illustrate the process of connecting the virtual directory server 408 to a database 1066 and selecting the database from which to capture schema, reference will now be made to a user interface 2500 shown in FIG. 25. The Schema Extraction Wizard 2402 may be stored on server 406 and invoked by the schema manager application 1058. For example, a user at client 402 may invoke the Schema Extraction Wizard 2402 from the desktop application of the Microsoft Windows operating system by selecting from the Start menu, the Programs command, and an application directed to execute the schema manager module 1058. The schema extraction wizard 2402 may be programmed to start upon selecting the New command from the File drop-down menu 1914 in the menu bar 1910 of FIG. 19A. After the schema extraction wizard 2402 is invoked, a user interface in the nature of a Data Link Properties dialog box 2500 is presented to the user. Under the tab labeled Provider 2501, the user selects an OLE DB Provider, like for example, Microsoft OLE DB Provider for ODBC Drivers 2502 (and clicks the Next button 2504). Under the tab labeled Connection 2506 (and shown in more detail in FIG. 31 described subsequently), the appropriate fields are displayed for the OLE DB (ODBC) provider, and the user inputs additional entries into required fields to select the name of the database 2404. An indicator, for example a Test Connection command (i.e., button), can be selected in order to obtain a message as to whether or not the testing of the connection to the database indicated succeeded. Assuming that the test connection succeeded, another selection can be made to invoke the schema extraction process, whereby the schema (.orx) file is generated to hold an XML representation of the schema extracted from the database 2404. The schema extraction wizard preferably allows the user to name and save the schema (.orx) file before completing.
(b) Determining the Validity of the Schema Captured
Once the schema is captured, preferably using the described process, the captured schema should be validated. Referring now to FIG. 26, one example of implementing the validation of the captured schema is illustrated in the block diagram shown. In the example shown in FIG. 26, the validity of the schema is evaluated by verifying that all the relationships and primary keys are defined in the schema |