Importing data using metadata6993529Abstract A method and system for importing data into a data store in accordance with metadata. The import system provides metadata that specifies how the import data for various types of import sources is to be imported into the data store. The import sources may be categorized according to the type of data provided by the import sources. When the import system receives the import data from the import source, it identifies the type of import source and retrieves the metadata defined for that type of import source. The import system then imports the received import data into the data store in accordance with the retrieved metadata. Claims What is claimed is: Description TECHNICAL FIELD
The meta—DimProperty table contains an entry for each column of each table in the data warehouse. The combination of the DimName and PropertyName attributes uniquely identifies the entries of this table.
The import data attributes of the meta—DimProperty table describes the data to be imported.
The schema data attributes of the meta—DimProperty table describe how to store the import data in this column in this table.
FIGS. 3-13 are flow diagrams illustrating the processing of the import system in one embodiment. FIG. 3 is a flow diagram of the processing of the import engine in one embodiment. The import engine is passed the import data and the name of the table of the data warehouse into which the data is to be imported. The import engine may be invoked for each set of import data that is to be imported. The processing in each of the blocks of this flow diagram is further described in another flow diagram. In block 301, the engine creates a temporary table to store the import data before it is transferred to the data warehouse. In one embodiment, the engine may store the data directly in the table of the data warehouse based on the setting of the UseWorktables attribute of the meta—Dimension table. In block 302, the engine copies the import data to the temporary table. In block 303, the engine calculates the hash value for each of the rows of the temporary table. The hash values are used to help identify whether one row contains the same data as another row. For example, if a row to be imported contains the same data as a row currently in the table of the data warehouse. In block 304, the engine checks for duplicate data in the import data and removes the duplicate data. In block 305, the engine creates lookup tables for the columns of the table of the data warehouse. In block 306, the engine creates buckets for the columns whose IsContinuous attribute is set. In block 307, the engine resolves the dimensions in the temporary table. For example, if one of the columns of the table of the data warehouse holds a reference to a product table, then the rows of the temporary table are set to include a reference to the product table. In block 308, the engine updates the data warehouse based on the information in the temporary table. In block 309, the engine updates the visit mappings. That is, if the IsAddVisitInformation attribute of the meta—Dimension table is set, then the engine calculates and adds the corresponding visit identifier for each row added to the table of the data warehouse. In block 310, if the UpdateParent attribute of the meta—Dimension table is set, then the engine updates the parent table of this table in the data warehouse. The engine then completes its processing. FIG. 4 is a flow diagram of the processing of the create temporary table function in one embodiment. This function is passed the name of the table and creates a temporary table with a column for each property of that table. In blocks 401-403, the function loops selecting each of the properties of the table and recording the column name for the temporary table. In block 401, the function selects the next property name of the meta—DimProperty table. In decision block 402, if all the property names have already been selected for the passed table, then the function continues at block 404, else the function continues at block 403. In block 403, the function adds a column for the selected property name to the temporary table and then loops to block 401 to select the next property name. In block 404, the function requests the creation of the temporary table with all of the columns identified in block 403. The function then returns. This function may also add other columns corresponding to the columns of the import data. For example, if the IsContinuous attribute is set for a property, then the function may add an additional column to the temporary table for the corresponding import value. FIG. 5 is a flow diagram of the processing of the copy import data to temporary data function in one embodiment. This function copies each row of the import data to the temporary table and generates the derived values as appropriate. In blocks 501-509, the function loops, selecting and processing each row of the import data. In block 501, the function selects the next row of the import data. In decision block 502, if all the rows of the import data have already been selected, then the function returns, else the function continues at block 503. In blocks 503-508, the function loops, selecting each property of the table of the data warehouse and generating a value to store in the temporary table. In block 503, the function selects the next entry for the meta—DimProperty table. In decision block 504, if all the entries of the meta—DimProperty table have already been selected, then the function continues at block 509, else the function continues at block 505. In decision block 505, if the IsDerived flag is set for the selected entry, then the function continues at block 506, else the function continues at block 507. In block 506, the function generates the derived value using the TransformString attribute of the selected entry. In block 507, the function retrieves the value indicated by the ColumnNumber attribute from the selected row of import data. In block 508, the function adds the value to a row to be added to the temporary table and then loops to block 503 to select the next entry for the meta—DimProperty table. In block 509, the function adds the row to the temporary table that contains the values added in block 508. The function then loops to block 501 to select the next row of the import data. One skilled in the art will appreciate that the temporary table may contain a separate column for the value of the import data when the IsDerived attribute is set. FIG. 6 is a flow diagram of the processing of the calculate hash values function in one embodiment. The hash values are generated from the columns of the temporary table whose IsHash attribute is set. In block 601, the function selects the next row of the temporary table. In decision block 602, if all the rows of the temporary table have already been selected, then the function returns, else the function continues at block 603. In block 603, the function creates the hash value for the selected row by combining value of the columns whose IsHash attribute is set in the corresponding entries of the meta—DimProperty table. In block 604, the function stores the created hash value in the selected row of the temporary table. The function then loops to block 601 to select the next row of the temporary table. One skilled in the art will appreciate that the import engine would add a hash column to any table of the data warehouse with at least one property whose IsHash attribute is set. FIG. 7 is a flow diagram of the processing of the check for duplicate import data function. This function checks for rows of the import data that contain the same information and deletes all but the last row. The rows are considered duplicates if the values in their columns with the IsIdentifiable attribute set are the same. That is, the combination of the values from the properties whose IsIdentifiable attribute is set is the key for the row, which may be unique. In block 701, the function selects the next row of the temporary table. In decision block 702, if all the rows of the temporary table have already been selected, then the function returns, else the function continues at block 703. In block 703, the function locates the duplicate rows. In decision block 704, if duplicate rows are located, then the function continues at block 705, else the function loops to block 701 to select the next row of the temporary table. In block 705, the function deletes all but the last duplicate row from the temporary table and then loops to block 701 to select the next row of the temporary table. FIG. 8 is a flow diagram of the processing of the create lookup tables function. This function loops, selecting each entry of the meta—DimProperty table that has its IsLookup attribute set and creating a lookup table for each such entry. The lookup tables may have names derived from the property name (e.g., "lookup—propertyname"). Lookup tables are used to represent import data that may be more compactly represented in an index to a table that contains the import data. For example, a property that contains the URL of accessed web pages (e.g., "www.acme.com/homepage") would be a candidate property for a lookup table. If the web site provides 16 different web pages, then the index into the lookup table can be represented by four bits, rather than, for example, 32 bytes of the URL. In block 801, the function selects the next entry of the meta—DimProperty table. In decision block 802, if all the entries of the meta—DimProperty table have already been selected, then the function returns, else the function continues at block 803. In block 803, the function creates a lookup table for the selected entry. In blocks 804-809, the function loops, selecting each row of the temporary table and adding the appropriate entries to the lookup table. In block 804, the function selects the next row of the temporary table. In decision block 805, if all the rows of the temporary table have already been selected, then the function loops to block 801 to select the next entry of the meta—DimProperty table, else the function continues at block 806. In block 806, the function retrieves the entry from the lookup table for the selected row of the temporary table. In decision block 807, if an entry was retrieved, then the function continues at block 809, else the function continues at block 808. In block 808, the function adds an entry to the lookup table corresponding to the value of the selected property for the selected row. In block 809, the function adds the index of the retrieved or added entry to the selected row and then loops to block 804 to select the next row of the temporary table. One skilled in the art will appreciate that the temporary table may include a column for holding the import value associated with each property whose IsLookup attribute is set. FIG. 9 is a flow diagram of the processing of the create buckets function in one embodiment. This function creates bucket values for each column of the target table whose IsContinuous attribute is set. The bucket tables may be predefined tables with a name that is derived from the name of the selected property (e.g., "bucket—propertyname"). Each entry of a bucket table may specify the range associated with that bucket. For example, if the property corresponds to the salary of employees, then the first bucket may have a range that is less than $25,000, the second bucket may have a range from $25,000 to $50,000 and so on. The function would assign an index of 2 when the salary is $35,000. In blocks 901-906, the function loops, selecting each entry of the meta—DimProperty table whose IsContinuous attribute is set and identifying its bucket index. In block 901, the function selects the next entry of the meta—DimProperty table whose IsContinuous attribute is set. In decision block 902, if all such entries of the meta—DimProperty table have already been selected, then the function returns, else the function continues at block 903. In blocks 903-906, the function loops, selecting each row of the temporary table and updating its bucket index for the selected entry of the meta—DimProperty table. In block 903, the function selects the next row of the temporary table. In decision block 904, if all the rows of the temporary table have already been selected, then the function loops to block 901 to select the next entry of the meta—DimProperty table, else the function continues at block 905. In block 905, the function looks up the bucket index for the value of the selected row. In block 906, the function adds the bucket index to the selected row and loops to block 903 to select the next row of the temporary table. One skilled in the art would appreciate that the temporary table may contain an additional column for holding the import data corresponding to each property whose IsContinuous attribute is set. FIG. 10 is a flow diagram of the processing of the resolve dimensions function in one embodiment. The resolve dimensions function updates the columns of the table of the data warehouse that correspond to a property whose RelDimension attribute indicates a dimension table. The function loops, selecting each entry of the meta—DimProperty table. In block 1001, the function selects the next entry of the meta—DimProperty table. In decision block 1002, if all the entries have already been selected, then the function returns, else the function continues at block 1003. In decision block 1003, if the RelDimension attribute of the selected entry names a dimension table, then the function continues at block 1004, else the function loops to block 1001 to select the next entry of the meta—DimProperty table. In blocks 1004-1009, the function loops retrieving each row of the temporary table and resolving the values for the selected property. In block 1004, the function retrieves the next row of the temporary table. In decision block 1005, if all the rows of the temporary table have already been retrieved, then the function loops to block 1001 to select the next entry of the meta—DimProperty table, else the function continues at block 1006. In block 1006, the function retrieves the entry from the related dimension table as indicated by the Reldimension attribute. In decision block 1007, if an entry was retrieved, then the function continues at block 1009, else the function continues at block 1008. In block 1008, the function adds an entry to the related dimension table. In block 1009, the function updates the column corresponding to the selected property in the selected row and then loops to block 1004 to select the next row of the temporary table. FIG. 11 is a flow diagram of the update data warehouse function in one embodiment. This function copies the data of the temporary table to the table of the data warehouse. In block 1101, the function selects the next row of the temporary table. In decision block 1102, if all the rows of the temporary table have already been selected, then the function returns, else the function continues at block 1103. In decision block 1103, if the DeleteAndlmport attribute of the meta—Dimension table entry is set, then the function continues at block 1104, else the function continues at block 1106. In block 1104, the function retrieves all rows of the data warehouse table with the same key (generated from properties whose IsIdentifiable attribute is set) as the selected row of the temporary table. In block 1105, the function deletes all the retrieved rows of the data warehouse table. In block 1106, the function adds the selected row of the temporary table to the data warehouse table. The function then loops to block 1101 to select the next row of the temporary table. FIG. 12 is a flow diagram of the processing of the update visit mapping function in one embodiment. This function updates the visit reference of the table in the data warehouse if the IsAddVisitInformation attribute of the meta—Dimension table is set. The import engine adds a visit reference column to each table whose IsAddVisitInformation attribute is set. In decision block 1201, if the IsAddVisitInformation attribute of the meta—Dimension table is set, then the function continues at block 1202, else the function returns. In blocks 1202-1205, the function loops selecting each row of the temporary table and updating its visit reference. In block 1202, the function selects the next row of the temporary table. In decision block 1203, if all the rows of the temporary table have already been selected, then the function returns, else the function continues at block 1204. In block 1204, the function identifies the related visit reference. The function may identify related visits based on timing information stored in the visit table of the data warehouse. In block 1205, the function updates the row of the data warehouse table corresponding to the selected row of the temporary table with the visit reference and then loops to block 1202 to select the next row of the temporary table. FIG. 13 is a flow diagram of the processing of the update parent function in one embodiment. This function updates the parent table in the data warehouse if the UpdateParent attribute contains the name of the parent table. The function updates the parent table by executing the code of the Expression attribute on each row of the temporary table. For example, the rows of the temporary table may correspond to line items of a purchase order and the parent table may contain a row for each purchase order. The expression may add the price for each line item to a total price column of the row for the purchase order in the parent table. In decision block 1301, if the UpdateParent attribute in the meta—Dimension table contains the name of parent table, then the function continues at block 1302, else the function returns. In block 1302, the function selects the next row of the temporary table. In decision block 1303, if all the rows of the temporary table have already been selected, then the function returns, else the function continues at block 1304. In block 1304, the function executes the expression on the selected row, which updates the corresponding parent entry in the data warehouse. The expression is specified by the Expression attribute. The function then loops to block 1302 to select the next row of the temporary table. FIG. 14 is a block diagram illustrating the generation of an aggregation and report table. The aggregation table 1401 includes an aggregating property in the first column and the various measure columns. The aggregation engine inputs of the aggregation metadata that defines the aggregating property and the measure columns and creates the aggregation table. The report table 1402 similarly includes an aggregating property in the first column and various measure columns as defined by the aggregation metadata. FIG. 15 is a block diagram illustrating the organization of the aggregation metadata in one embodiment. The aggregation metadata includes a meta—Aggregation table 1501, a meta—AggregationFact table 1502, a meta—AggregationMeasure table 1503, a meta—FactMeasure table 1504, a meta—AggregationProperty table 1505, a meta—ReportMeasure table 1506, and a meta—Fact table 1507. The meta—Aggregation table contains an entry for each aggregation that is defined. The meta—AggregationFact table contains for each aggregation an entry for each fact table that is used to generate that aggregation. For example, a single aggregation may be generated from the data of five different fact tables in which case the meta—AggregationFact table would have five entries for that aggregation. The meta—AggregationMeasure table contains for each entry of the meta—AggregationFact table an entry for each measure of the aggregation that is generated from that fact table. The meta—FactMeasure table has for each fact table an entry for each measure that is generated from the fact table and specifies how that measure is to be generated. The meta—AggregationProperty table specifies the aggregating property for an aggregation. The meta—ReportMeasure table specifies the measure names to be included in the report for the aggregation. The meta—Fact table contains an entry for each fact table that is used in an aggregation. The following tables describe the contents of these tables.
FIG. 16 is a flow diagram of the processing of the aggregation engine in one embodiment. The engine is passed the name of an aggregation and generates the aggregation and report tables as specified by the aggregation metadata. The processing of each of the blocks of this flow diagram is described in detail in a flow chart that follows. In block 1601, the engine creates the aggregation table with the appropriate columns as defined by the aggregation metadata. In block 1602, the function updates the aggregation table to include the aggregating property and measures as defined by the aggregation metadata. In block 1603, the function creates the report table with the appropriate columns as defined by the aggregation metadata. In block 1604, the function updates the report table as defined by the aggregation metadata and then completes. FIG. 17 is a flow diagram of the processing of the create aggregation table function in one embodiment. This function is passed the name of an aggregation. This function identifies the aggregation property and each measure from the aggregation metadata that is to be included in the aggregation table for the aggregation. The function then requests the database system to create the table. In block 1701, the function retrieves the entry from the meta—AggregationProperty A table for the aggregation. In block 1702, the function adds a column for the aggregating property to the aggregation table corresponding to the selected entry. In blocks 1703-1708, the function loops, selecting each measure for each fact table included in the aggregation and adding a corresponding column to the aggregation table. In block 1703, the function selects the next fact table from the meta—AggregationFact table. In decision block 1704, if all the fact tables of the meta—AggregationFact table have already been selected, then the function continues at block 1709, else the function continues at block 1705. In block 1705, the function selects the next measure for the selected fact table from the meta—AggregationMeasure table. In decision block 1706, if all the measures have already been selected for the selected fact table, then the function loops to block 1703 to select the next fact table from the meta—AggregationFact table, else the function continues at block 1707. In block 1707, the function retrieves the entry from the meta—FactMeasure table for the selected fact table and the selected measure. In block 1708, the function adds a column to the aggregation table corresponding to the retrieved entry and then loops to block 1705 to select the next measure. In block 1709, the function requests creation of the aggregation table with the added columns. FIG. 18 is a flow diagram of the processing of the update aggregation table function in one embodiment. This function is passed the name of the aggregation. The function adds a row to the aggregation table that corresponds to each entry of the dimension table corresponding to the aggregating property. In one embodiment, the function might only add to the aggregation table rows corresponding to rows of the dimension table that are pointed to by the fact tables used in the aggregation. The function then selects each fact table and generates for each row of the aggregation table the measures derived from that fact table. In block 1801, the function retrieves the entry from the meta—AggregationProperty table. In blocks 1802-1804, the function loops, adding a row to the aggregation table for each row of the dimension table identified by the retrieved entry of the meta—AggregationProperty table. In block 1802, the function retrieves the next entry from dimension table. In decision block 1803, if all the entries have already been retrieved, then the function continues at block 1805, else the function continues at block 1804. In block 1804, the function adds a row to the aggregation table and then loops to block 1802 to retrieve the next entry from the dimension table. In blocks 1805-1807, the function loops, selecting each fact table that is used in the aggregation and calculating the measures derived from the fact table. In block 1805, the function selects the next fact table from the meta—AggregationFact table. In decision block 1806, if all the fact tables have already been selected, then the function returns, else the function continues at block 1807. In block 1807, the function aggregates the data of the fact table by invoking the aggregate fact table function and then loops to block 1805 to select the next fact table. FIG. 19 is a flow diagram of the processing of the aggregate fact table function in one embodiment. This function is passed the name of an aggregation and the name of a fact table used in that aggregation. The function selects each row of the fact table and aggregates each measure derived from that fact table into the aggregation table. In block 1901, the function selects the next row of the fact table. In decision block 1902, if all the rows of the selected fact table have already been selected, then the function returns, else the function continues at block 1903. In blocks 1903-1906, the function loops, calculating the measure value for the selected row for each measure derived from the fact table. In block 1903, the function selects the next measure from the meta—AggregationMeasure table. In decision block 1904, if all the measures have already been selected, then the function loops to block 1901 to select the next row of the fact table, else the function continues at block 1905. In block 1905, the function retrieves the entry from the meta—FactMeasure table. In block 1906, the function updates the selected measure of a row of the aggregation table based on the selected row of the fact table. The updating is in accordance with the aggregation function of the retrieved entry from the meta—FactMeasure table. The function then loops to block 1903 to select the next measure. FIG. 20 is a flow diagram of the processing of the create report table. This function is passed the name of an aggregation. The function creates a report table that includes a column for the aggregating property and a column for each measure in the report. In block 2001, the function retrieves the entry from the meta—AggregationProperty table for the aggregation. In block 2002, the function adds a column to the report table corresponding to the aggregating property. In blocks 2003-2005, the function loops, adding a column to the report table for each measure to be included in the report. In block 2003, the function selects the next measure from the meta—ReportMeasure table. In decision block 2004, if all the measures have already been selected, then the function continues at block 2006, else the function continues at block 2005. In block 2005, the function adds a column to the report table corresponding to the selected measure and then loops to block 2003 to select the next measure. In block 2006, the function requests the creation of the report table and then returns. FIG. 21 is a flow diagram of the update report table function in one embodiment. This function is passed the name of an aggregation. The function adds a row to the report table corresponding to each row of the aggregation table with measure values calculated in accordance with the meta—ReportMeasure table. In block 2101, the function selects the next row of the aggregation table. In decision block 2102, if all the rows of the aggregation table have already been selected, then the function returns, else the function continues at block 2103. In block 2103, the function selects the next measure of the meta—ReportMeasure table. In block 2104, if all the measures have already been selected, then the function continues at block 2106, else the function continues at block 2105. In block 2105, the function calculates the value for this measure using the MeasureValue attribute of the measure and then loops to block 2103 to select the next measure. In block 2106, the function adds a row to the report table corresponding to the measure values calculated in block 2105 and then loops to block 2101 to select the next row is of the aggregation table. From the above description, it will be appreciated that although specific embodiments of the technology have been described for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention. For example, the import metadata may be used to transform the data of a data store. In such a situation, the data store may be considered to have data that is to be "imported" into other tables of the data store, which may be defined by the metadata itself. Alternatively, the tables defined by the metadata can be part of a separate data store. One skilled in the art will also appreciate that the metadata can be stored in various different ways and need not be stored in separate tables as described above. Accordingly, the invention is not limited except by the appended claims.
|
Same subclass Same class Consider this |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
