Apparatus and method for performing data transformations in data warehousing6339775Abstract A transformation description language (TDL) for specifying how data is to be manipulated in a data warehousing application. The TDL is comprised of a source for storing raw data, one or more transformation objects for processing the raw data according to predefined instructions, and a target for storing the processed data. A mapping is used for directing the data flow between the I/O ports corresponding to the source, the plurality of transformation objects, and the target. The mapping specifies the connectivity between the source, transformation, and target objects as well as the order of these connections. There are a number of different transformations which can be performed to manipulate the data. Some such transformations include: an aggregator transformation, an expression transformation, a filter transformation, a lookup transformation, a query transformation, a sequence transformation, a stored procedure transformation, and an update strategy transformation. Claims What is claimed is: Description FIELD OF THE INVENTION
CREATE Source sr_prodsale (store VARCHAR(50) OUT,
month VARCHAR(10) OUT,
product VARCHAR(50) OUT,
cost MONEY OUT,
price MONEY OUT,
unitssold INTEGER OUT)
{ };
CREATE Source sr_storeexp (store VARCHAR(50) OUT,
month VARCHAR(10) OUT,
region VARCHAR(25) OUT,
expense MONEY OUT)
{ };
CREATE Aggregator ag_profprof (store VARCHAR(50) INOUT,
month VARCHAR(10) INOUT,
product VARCHAR(50) IN,
cost MONEY IN,
price MONEY IN,
unitssold INTEGER IN)
prodprof MONEY OUT,
Cachedir "S:.backslash.dw.backslash.system" ATTR)
{
prodprof = SUM(unitssold*(price-cost));
GROUP (store, month);
};
CREATE Join jn_strfinance (store VARCHAR(50) INOUT,
month VARCHAR(10) INOUT,
prodprof MONEY INOUT,
store1 VARCHAR(50) IN,
month1 VARCHAR(10) IN,
region VARCHAR (25) INOUT,
expense MONEY INOUT)
{
store1 = store;
month1 = month;
};
CREATE Aggregator ag_storeprof (store VARCHAR(50) IN,
month VARCHAR(10) INOUT,
prodprof MONEY IN,
region VARCHAR (25) INOUT,
expense MONEY IN,
regprof MONEY OUT)
{
regprof = SUM(prodprof-expense);
GROUP (region, month);
};
CREATE Mapping mp_regionalprofit ( )
{
sr_prodsale.exec ( );
sr_storeexp.exec ( );
ag_prodprof.exec (sr_prodsale.store, sr_prodsale.month,
sr_prodsale.product, sr_prodsale.cost, sr_prodsale.price,
sr_prodsale.unitssold);
jn_strfinance.exec (ag_prodprof.store, ag_prodprof.month,
ag_prodprof.prodprof, ar_storeexp.store, sr_storeexp.month,
sr_storeexp.region, sr_storeexp.expense);
ag_totprof.exec (jn_strfinance.store, jn_strfinance.month,
jn_strfinance.prodprof, jn_strfinance.region,
jn_strfinance.expense);
tg_profits.exec (ag_totprof.month, ag_totprof.region,
ag_totprof.regprof);
};
FIG. 7A shows a model for the components of a mapping. The following sections provide a detailed description as well as a specification and an example of using that specification for the fundamental concepts used in TDL. Specification of each object type has two parts: the first part has the declaration for the ports and attributes of the object, and the second part has the definition of the behavior of the object. Source and target objects do not have any behavior. The standard BNF syntax is used for specification of TDL, however for better readability, language keywords are in bold-face with their first letter capitalized while all pre-defined types and constants are in all capital letters. A variable naming convention, by means of a set of standard prefixes, is also used for quickly identifying the object type associated with a specific variable. For example, the source objects could have sr_ as a prefix while the aggregator objects could use the ag_ prefix. FIG. 7B depicts a UML notation for the components of FIG. 7A. Basically, a box (e.g., Class Name 750) indicates the abstraction for a concept, such as source, target, etc. A line with a diamond at one end (e.g., the line connecting boxes 751 and 752) indicates an aggregation relationship. A line with an arrow at one end (e.g., the line connecting boxes 753 and 754) indicates a specialization and/or generalization. A simple line (e.g., the line connecting boxes 755 and 756) indicates a general association. Each of the components relating to TDL is now described in detail below. Port A port is analogous to a column of a table and provides the primary means of parameterized dataflow between various objects in a mapping. A port must have a name, data type it holds, and its data flow type (i.e., in, out, or in/out). The port may also have precision and scale values for further specifying its data types. A port must always be defined within the definition of a source, target, or transformation object; thus it would be meaningless to have a stand-alone definition of a port. With reference back to FIG. 7A, the Transformation Field 712 corresponds to a port for a transformation object. It may be an input, or output, or an input/output. A ColumnExpression 716 specifies the expression (often defined in SQL) that is used for manipulating the data in the Transformation Field 712. The Dependency 713 captures the dependencies between the source, target, and transformation ports (e.g., the arrows). A TargetField 714 corresponds to the input port of a target table. A SourceField 715 corresponds to the output port of a source. Specification <port_def > ::=<port_name> <data_type_def> <port_type> <port_name> ::=<string> <port_type> ::={IN .vertline. OUT .vertline. INOUT} <data_type_def> ::=<data_type> [(<precision>[,<scale>])] <precision> ::=<integer> <scale> ::=<integer> Example Please see examples below. Source The source object 701 provides purely output information at the beginning of a data-flow pipeline of a mapping, and thus it only has port definitions of type OUT.
Specification
CREATE Source <source_name>(
<ports>
<ports>::=<port_ def>, . . .
)
Example
CREATE Source sr_lineItem(
id INTEGER OUT,
name VARCHAR(20) OUT,
price MONEY OUT,
discount MONEY OUT)
Target Target objects 702 are the final receivers of information at the end of a data-flow pipeline for a mapping. All ports of a target object must be of type IN.
Specification
CREATE Target <target_name>(
<ports>
<ports>::=<port_def>, . . .
)
Example
CREATE Target tg_storeSum(
store_id INTEGER IN,
store_name VARCHAR(20) IN,
num_orders INTEGER IN,
avg_sales MONEY IN,
total_sales MONEY IN)
Transformation Objects As explained above, a number of Transformations 718 are used to transform incoming data according to some predefined behavior or rule. There are a number of different types of transformations possible (e.g., Aggregator 703, Filter 705, Lookup 706, Query 707, etc.). Other different types of transformations can be added. Existing transformation types can be modified or even deleted. Furthermore, a transformation object can be programmed to perform specific functions, depending on its corresponding application. For example, the Aggregator transformation object 703 can be used to perform a summation between two variables in one application. However, a different application can apply the Aggregator transformation object 703 to perform a summation of four variables. Each Transformation 718 is comprised of an Attribute 719 which defines the characteristics of a transformation specified in its declaration part. This could be covered by either the TableExpression 704 or Property 717. The expressions associated with TableExpression 704 are those expression (often in SQL) that may be used at the table level and not at the port level. Property 717 contains the various characteristics of a transformation, such as a cache directory, etc. Aggregator Transformation An aggregator transformation object 703 allows one to perform aggregation functions over one or many of its output or input/output ports. In addition to the port definitions of an aggregator, it also has a Cachedir attribute for its cache directory location. A default value for this attribute may be optionally specified in the declaration part or later when the aggregator is instantiated. The behavior of the aggregation object is specified in the body of its specification, and one can also optionally specify a set of ports in the Group statement for any group-by behavior. However, the SQL rules for using port names in a group-by statement must be observed.
Specification
CREATE Aggregator <aggregator_name>
<ports>,
<ports>::=<port_def>, . . .
Cachedir [<string>]ATTR)
.backslash.{
<aggregation_behavior>;
<aggregation_behavior>::=<port_name>=<aggregate_exp>
<aggregate_exp>::={<port_name> .vertline.
<aggregate.sub.- func>
(aggregator_exp)}
[Group (<port_names>)]
<port_names>::=<port_name>, . . .
Example
CREATE Aggregator ag_salesStats (
store_id INTEGER INOUT,
quantity INTEGER INOUT,
price MONEY IN,
discount MONEY IN,
avg_sales MONEY OUT,
total_sales MONEY OUT,
Cachedir STRING ATTR)
{
avg_sales =AVG(quantity*price_discount);
total_sales =SUM(quantity*price-discount);
GROUP(store_id);
};
Expression Transformation The Expression transformation allows the user to associate an expression statement with any of its output or input/output ports. The expression in turn may contain functions that are applied to input or input/output ports.
Specfication
CREATE Expression <expression_name>(
<ports>
<ports>::=<port_def>, . . .
)
.backslash.{
<expression_binding>; . . .
<expression_binding>::=<port_name>=<expression_exp>
<expression_exp>::={<port_name> .vertline.
<general_func>
(<expression_exp>) .vertline.
<constant>}
.backslash.}
Example
CREATE Expression xp_salesRating(
quantity INTEGER IN,
price MONEY IN,
discount MONEY IN,
store_rating VARCHAR(10) OUT)
{
store_rating = IIF(SUM(quantity*price-discount)>= 10000,
`EXCELLENT`,IIF(SUM(quantity*price-discount)<5000,
`POOR`,`FAIR`));
};
Filter Transformation The filter transformation object 705 applies an expression to all of its ports and outputs all of the ports if the result of the evaluation is TRUE (a non-zero value), otherwise nothing is output. The expression may be a combination of constants and functions applied to the ports.
Specification
CREATE Filter <filter_name>(
<ports>
<ports> ::= <port_def>, . . .
)
.backslash.{
Filterexpr (<filter_exp>);
<filter_exp> ::= {<port_name> .vertline. <general_func>
(<filter_exp>) .vertline.
<constant>}
Example
CREATE Filter fl_storeFilter(
store_id INTEGER INOUT,
store_name VARCHAR(20) INOUT,
store_yr_opened INTEGER INOUT,
stor_sales MONEY INOUT)
{
Filterexpr (IIF(store_sales >= 100000 AND store_yr_opened > 1980));
}
Lookup Transformation The lookup transformation object 706 provides the capability to translate one or a combination of ports into other ports based on some predefined fields that are specified in a relational table in the target database. The fields of this lookup table must be specified as lookup ports in the lookup transformation, in addition to a lookup expression and several other parameters.
Specification
CREATE Lookup <lookup_name>(
<ports>,
<ports> ::= <port_def>, . . .
Lkpfields (<fields>),
<fields> ::= <field_def>, . . .
<field_def> ::= <field_name><data_type>
<field_name> ::= <string>
Lkptable [<table_name>] ATTR,
[Lkpcaching [<boolean>] ATTR,]
Multimatch [{FIRST .vertline. LAST .vertline. ERROR .vertline. ALL}] ATTR,
Dblocation [<string>] ATTR,
Sourcetype [{DB .vertline. FILE}] ATTR,
[Recache [<boolean>] ATTR]
)
.backslash.{
Lkpoverride <sql_exp>,
Lkpcond <lookup_expr>,
<lookup_expr> ::= {<lookup_subexpr> .vertline.
<lookup_expr> AND
<lookup_subexpr>}
<lookup_subexpr> ::= <port_name> <logical_op>
<port_name>
<logical_op> ::= { = .vertline. > .vertline. <> .vertline.
<= .vertline. >= }
.backslash.}
Example
CREATE Lookup lk_customerLookup (
old_cust_id INTEGER IN,
cust_id INTEGER OUT,
cust_name VARCHAR(25) INOUT,
address VARCHAR(20) OUT,
city VARCHAR(15) OUT,
state VARCHAR(2) OUT,
zip VARCHAR(10) OUT
old_cust_id INTEGER) IN,
Lkpfields (lkp_cust_id INTEGER, cust_id INTEGER, cust_name
VARCHAR(25), address VARCHAR(20), city VARCHAR(15),
state VARCHAR(2), zip VARCHAR(10))
{
Lkpcond ("OLD_CUST_ID = LKP_CUST_ID");
};
lk_customerLookup ("CUSTADDLKP", TRUE, FIRST,
"s:.backslash.dw.backslash.targcust",
DB)
Query Transformation The query transformation object 707 allows one to tailor the input data for a mapping from one or more source objects using SQL expressions. Given the fields of one or more source tables as the input ports of this transformation, the user may either provide a complete SQL query to override the default behavior of the transformation, or specify the join (Select) and filter (Where) parameters of the SQL expression that is created by the system.
Specification
CREATE Query <query_name>(
<ports>
<ports>::=<port_def>, . . .
)
.backslash.{
[Sqlquery (<sql_exp>);]
[Userdefjoin (<sql_exp>);]
[Sourcefilter (<sql_exp>)]
.backslash.}
Example
CREATE Query qr_storeQuery(
store_id INTEGER INOUT,
store_type CHAR IN,
store_name VARCHAR(20) INOUT,
store_address VARCHAR(35) INOUT)
}
Sourcefilter ("store_type = 'R'")
}
Sequence Transformation The sequence transformation object 708 is used for creating unique keys for records as they are processed in a mapping. Each instance of a sequence transformation is created with an initial value, which is used at the start of an execution, an increment value to compute the values of subsequent indexes, and an end value. Default values will be used if any of these parameters are omitted. This transformation has two predefined output ports, curval and nextval, that contain the current value and the next value of the sequence index, respectively. If the curval is used, all the rows processed will get the same value corresponding to the curval setting at the time of processing; and when the nextval is used, the row for each row is incremented from the curval by the given increment setting. If the Cycle keyword is specified, the sequence numbers will be reused once a cycle has been completed and Reset allows one to reset the value of the index being used to the starting value. The Cache keyword is provided to allow creating blocks of sequence indexes that could be used by one or more other objects.
Specification
CREATE Sequence <sequence_name> (
curval INTEGER OUT,
nextval INTEGER OUT,
[Startvalue [<integer>] ATTR,]
[Increment [<integer>] ATTR,]
[Endval [<integer>] ATTR,]
[Cycle [<boolean>] ATTR,]
[Cache [<boolean>] ATTR,]
[Reset [<boolean>] ATTR]
)
Example
CREATE Sequence sq_incrBy5Sequence (
curval INTEGER OUT,
nextval INTEGER OUT)
{};
sq_incrBy5Sequence (2,5,92,TRUE);
Stored-Procedure Transformation The stored-procedure transformation object 709 allows one to execute a parameterized function either in a pipeline mode or stand-alone within the expression, lookup, or user-defined transformations. Furthermore, the stored procedure may return one or more values through its output ports, however, no input/output ports may be used (i.e., the procedure parameters are only passed in by value). Procedures that may have a unique return value may also be specified having their return value represented in a specialized return port. For procedures that are processed on a row-basis, the name of the procedure is specified by the Procname statement. The Targetconn will be used for the location of the database in which the procedures will be executed. In addition to the parameterized procedures discussed, pre- and post-processing functions can be executed before and after the row-by-row transformations are processed. The name of such a function is specified by the Calltext statement and its type is defined one of the four types TARGPRE, TARGPOST, SRCPRE, and SRCPOST. NORMAL is used for the row-by-row type procedures.
Specification
CREATE Storedproc <storedproc_name> (
<ports>
<ports>::=<port_def>, . . .
Procname[<string>] ATTR,
Targetconn[<string>] ATTR,
Calltype
[{TARGPRE.vertline.TARGPOST.vertline.NORMAL.vertline.SRCPRE.vertline.SRCPO
ST}]
ATTR,
[Calltext <string>] ATTR]
)
Example
CREATE Storedproc sp_prodDistrProc (
prodcateg VARCHAR(10) IN,
price Port MONEY IN,
salesregion VARCHAR(15) IN,
numprodsold INTGEGER IN,
futuresales MONEY OUT)
)
{};
sp_prodDistrProc ("forecastSales", "S:.backslash.dw.backslash.sales",
NORMAL);
Update Strategy Transformation With the update strategy transformation object 710, one can specify how each individual row, after it is processed by a mapping, will be used to update the target database tables. The various updating options are insert, delete, update, and override. An expression specified by the user is evaluated for each row to determine the update strategy for that row. This expression returns 0 for insert, 1 for update, 2 for delete, and 3 for reject.
Specification
CREATE UpdateStrategy <update_name> (
<ports>
<ports>::= <port_def>, . . .
)
.backslash.{
Updateexpr <update_exp>;
<update_exp> ::= {<port_name> .vertline. <general_func>
(<update_exp>) .vertline.
<constant>}
.backslash.};
Example
CREATE UpdateStrategy us_customerUpdate (
acct_id NUMERIC(10) INOUT,
name VARCHAR(25) INOUT,
address VARCHAR(40) INOUT,
last_activity DATE IN,
account_status CHAR INOUT)
{
Updateexpr(IIF(DATE_DIFF(last_activity,SYSDATE,`MM`)>12, 2, 1));
};
A mapping 711 is a composition of various source, transformation, and target objects that are linked in a directed, acyclic graph. As shown in FIG. 4, the source and target objects are the starting and ending nodes of the graph, respectively, while the transformation objects provide the intermediate nodes. The description of a mapping presently contains the connectivity data for ports of connected objects. The order that the objects are connected is important since it dictates the data flow in the pipeline from sources to targets. Once a mapping is instantiated, each object in that mapping is included by means of its Exec method. Functions and Data Types The functions and data types that are used by TDL is described. The majority of these functions and data types are used in a standard SQL environment, however, additional specific elements are also provided. Specification <general_func> ::={<char_func> .vertline. <conversion_func> .vertline. <date_func> .vertline. <group_func> .vertline. <numeric_func> .vertline. <scientific_func> .vertline. <special_func>} <char_func> ::={ASCII(<char_var>) .vertline. CHR (<num_var) .vertline. CONCAT (<char_var> <char_var>) .vertline. INITCAP (<char_var>) .vertline. INSTR (<char_var>, <char_var>, [<int_var>, [<int_var>]]) .vertline. LENGTH (<expr_var>) .vertline. LOWER (<expr_var>) .vertline. LPAD (<expr_var>, <num_var>, [<char_var>]) .vertline. LTRIM (<expr_var>, [<expr_var>]) .vertline. RPAD (<char_var>, <num_var>, [<char_var>]) .vertline. RTRIM (<char_var>, [<char_var>]) .vertline. SUBSTR (<char_var>, <num_var>, [<num_var>]) .vertline. UPPER (<char_var>) } <conversion_func> ::={TO_CHAR ({<num_var> .vertline. date_var, [<char_var>]}) .vertline. TO_DATE (<char_var>, [<char_var>]) .vertline. TO_NUMBER (<var_char>)} <date_func> ::={ADD_TO_DATE (<date_var>, <char_var>, <int_var>) .vertline. DATE_COMPARE (<date_var>, <date_var>) .vertline. DATE_DIFF (<date_var>, <date_var>, <char_var>) .vertline. GET_DATE_PART (<date_var>, <char_var>) .vertline. LAST_DAY (<expr_var>) .vertline. SER_DATE_PART (<date_var>, <char_var>, <int_var>) } <group_func> ::={AVG (<num_val>, [<expr_val>]) .vertline. COUNT (<expr_var>, [<expr_val>]) .vertline. MAX (<expr_var>, [<expr_var>]), MEDIAN (<num_var>, [<expr_var>]) .vertline. MIN (<expr_var>, [<expr_var>]), PERCENTILE (<num_var>, <num_var>, [<expr_var>]) .vertline. STDDEV (<num_var>, [<expr_var>]) .vertline. SUM (<num_var>, [<expr_var>]) .vertline. VARIANCE (<num_var>, [<expr_var>]) } <numeric_func> ::={ABS (<num_var>) .vertline. CEIL (<num_var>) .vertline. CUME (<num_var>, [<expr_var>]) .vertline. EXP (<num_var>) .vertline. ABS (<num_var>) .vertline. FLOOR (<num_var>) .vertline. ISNULL (<expr_var>) .vertline. LN (<expr_var>) .vertline. LOG (<expr_var>, <expr_var>) .vertline. MOD (<num_var>, <num_var>) .vertline. MOVINGAVG (<num_var>, <int_var>, [<expr_var>]) .vertline. MOVINGSUM (<num_var>, <int_var>, [<expr_var>]) .vertline. POWER (<num_var>, <num_var>) .vertline. ROUND ({<num_var> .vertline. <date_var>), }<num_var> .vertline. <format_var>}) .vertline. SIGN (<num_var>) .vertline. SQRT (<num_var>) .vertline. TRUNC (<num_var>, [<num_var>) } <scientific_func> ::={COS (<num var>) .vertline. COSH (<num_var>) .vertline. SIN (<num_var>) .vertline. SINH (<num_var>) .vertline. TAN (<num_var>) .vertline. TANH (<num_var>)} <special func> ::={DECODE (<expr_var>, <expr_var>, <expr_var>, [<expr_var>]) .vertline. IIF (<expr_var>, <expr_var>, [<expr_var>]) .vertline. LOOKUP (<expr_var>, <expr_var>, <expr_var>) .vertline. PIVOT (<any_var>, <bool_var>) .vertline. TOP (<num_var>, <int_var>, <int_var>) } <data_type> ::={BINARY .vertline. BIT .vertline. CHAR .vertline. DATE .vertline. DECIMAL .vertline. DOUBLE .vertline. FLOAT .vertline. IMAGE .vertline. INTEGER .vertline. MONEY .vertline. NUMERIC .vertline. RAW .vertline. REAL .vertline. SMALLINT .vertline. TEXT .vertline. TIMESTAMP .vertline. TINYINT .vertline. VARCHAR } <constant> ::={<posinteger> .vertline. <integer> .vertline. <real> .vertline. <char> .vertline. <string> .vertline. <boolean>} FIG. 8 illustrates an exemplary computer system 800 upon which the present invention may be implemented or practiced. It is appreciated that the computer system 800 of FIG. 8 is exemplary only and that the present invention can operate within a number of different computer systems. Computer system 800 of FIG. 8 includes an address/data bus 801 for conveying digital information between the various components, a central processor unit (CPU) 802 for processing the digital information and instructions, a main memory 804 comprised of random access memory (RAM) for storing the digital information and instructions, a read only memory (ROM) 811 for storing information and instructions of a more permanent nature. In addition, computer system 800 may also include a data storage device 807 (e.g., a magnetic, optical, floppy, or tape drive) for storing vast amounts of data, and an I/O interface 808 for interfacing with peripheral devices (e.g., computer network, modem, etc.). It should be noted that the client program for performing TDL can be stored either in main memory 804, data storage device 807, or in an external storage device. Devices which may be coupled to computer system 800 include a display device 821 for displaying information to a computer user, an alphanumeric input device 822 (e.g., a keyboard), and a cursor control device 823 (e.g., mouse, trackball, light pen, etc.) for inputting data and selections. Hence, an apparatus and method for transforming data in data warehousing applications has been disclosed. Furthermore, a functional specification for the Transformation Description Language (TDL) has been described in detail. The primary objective for creating TDL is to provide a text-based representation of the definitions of various source, transformation, and target objects used in data warehousing applications. Such textual descriptions can in turn be used for verification of the mappings created through a graphical user interface. A parser can also be potentially developed for TDL so that components of a mappings described in this language can be brought into other systems. Thus, TDL could potentially become a standard for exchanging transformation metadata across various programs. Furthermore, TDL could potentially be extended to capture the complete behavior of a mapping (i.e., the internal dependencies and the data flow across the acyclic, directed graph structure of the mapping). In a sense, TDL may be extended to become a complete transformation programming language to be used on top of the transformation engine to offer a truly software component which in turn could be embedded in various data warehousing systems. In an alternative embodiment, the present invention can be readily encapsulated within an application. The present invention can be extended to cover any analytical application. Furthermore, the present invention can be extended to any application which uses data mart or data warehousing technology whereby data is being aggregated, consolidated or otherwise transformed. Some exemplary applications within which the present invention may be practiced include, but are not limited to: balance scorecard applications, enterprise applications, performance measurement applications, marketing tools, profiling applications, data mining applications, segmentation applications, filtering tools, etc.). FIG. 9 is a block diagram showing an exemplary application having an embedded data transformation engine as contemplated by the present invention. Source data is stored upon in one or more source systems 901-903 (e.g., source tables). The relevant source data is extracted from one of the source systems 901-903 by the data transformation engine 904. In the currently preferred embodiment, data transformation engine 904 performs the transformations described above in reference to the transformation description language. The transformed data is then presented by the analytical application 905 for display to the user. Typically, the analytical application has one or more graphical user interfaces which enables a user to request and obtain desired information. The resulting transformed data can also be stored in one or more data marts or data warehouses 906-907. An example of an analytical application 905 might be customer profiling used for marketing purposes. A luxury car dealership can use a profiling application to analyze the source systems of a credit card company in order to determine those potential customers who live within a certain geographical location, meet a certain income level, fly first class, eat out at upscale restaurants, etc. Complex algorithms and even neural networks can be used to analyze the transformed data obtained from the data transformation engine. The results are then stored within one or more data marts for subsequent retrieval. It should be noted that the source systems 901-903 and/or the data transformation engine 904 can be part of the application 905. Furthermore, data marts/warehouses 906-907 are optional, or they can reside outside of the analytical application 905. In another embodiment of the present invention, the transformation engine can be used to facilitate the transfer of data from one system to another system. Often, data from one system is required to be copied over onto a different system such that both system now have access to that valuable data. Unfortunately, in many instances, the two systems can have conflicting formats, structures, or configurations due to hardware, software, or vendor differences. As such, it may be a rather difficult task trying to reconcile the different formats. The present invention can be used to transform data, not only for analytical purposes, but also to transform the data to provide compatibility, formatting, and interfacing solutions. In other words, the data is transformed to provide data integration amongst two or more systems. Although the purpose has now changed, the process for accomplishing the data integration relies on the same steps of mapping, transforming, and then exporting of the data. FIG. 10 is a block diagram showing how one embodiment of the present invention can be used to transfer data from one system to another system having a different format. Source data is stored in one or more source systems 1001-1002. The relevant source data is extracted from the appropriate source system by the data transformation engine 1003. In the currently preferred embodiment, the TDL as described above is used to accomplish the transformation. The source data is transformed such that its format is now compatible with that of the target system, such as target system 1004 and/or target system 1005. The target system can be a database, data mart, or data warehouse. In yet another embodiment, the present invention covers the case whereby rather than having a physically separate database, the database is embedded within the application itself or otherwise accessible by the application. Thereby, the application now acts as the source system. An example might be a pre-packaged financial system. The user interacts with one or more graphical user interfaces rather than directly with a database. However, a database resides either inside, underneath, or remote to the application. The foregoing descriptions of specific embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the Claims appended hereto and their equivalents.
|
Same subclass Same class Consider this |
||||||||||
