Generating one or more XML documents from a single SQL query6636845Abstract A technique is provided for generating one or more XML documents from a single SQL query. Data stored on a data storage device that is connected to a computer is transformed. A query is received that selects data stored in a relational database management system on the data storage device, wherein a data access definition defines: (1) a collection of one or more tables in the relational database management system for storing attributes from an XML document, (2) how data stored in the tables maps to the XML document, (3) a query for mapping the data stored in the tables to the XML document, and (4) a table that will contain the XML document after the XML document is generated. Then, one or more XML documents are generated from the selected data using the data access definition. Claims What is claimed is: Description FIELD OF THE INVENTION
TABLE
Restriction of Location Path Supported
Use of the Location Path Location Path Supported
Extracting UDFs a-j
Text Extender's search UDF a-j
DAD column definition c, f (simple location path)
Note that there is a restriction in the DAD column definition because there is a one-to-one mapping between an element or attribute to a column. The term simple location path refers to the c and f notations in the table for Restriction of Location Path Supported. The simple location path is a sequence of element type names connected by the "/" notation. Each element type may be qualified by its attribute values.
TABLE
Simple Location Path of an Element and an Attribute
Subject location path Description
XML /tag_1/tag_2/ . . . / an element content identified
Element tag_n-1/tag_n by the tag_n and its parents
XML /tag_1/tag_2/ . . . / an attribute with name "attr1"
Attribute tag_n-1/tag_n@attr1 of the element identified by
tag_n and it parents
The location path identifies the structure part that indicates the document context to be found. An empty path signals the structure to search or extract against is the whole document (same effect as if the location path is the root element). The XML System provides users the ability to create SQL queries on XML documents. Based on the nature of XML documents and the functionality of the XML System, the following terminology is used:
Document Access The definition used to enable an XML
Definition(DAD): System column or an XML collection,
which is XML formatted.
Partition: The term partition used means the full
partition. In other words, the union of all
partitioned parts forms the original
document.
Location path: A subset of the abbreviated syntax of the
location path defined by XPath. A
sequence of XML tags to identify an
XML element or attribute. It is used in the
extracting UDFs to identify the subject to
be extracted. The terms of path expression
and location path may be used
interchangeably.
Side table: Additional tables created by the XML
System to store searchable
elements/attributes for an enabled XML
Column.
Valid Document: An XML document that has an associated
DTD. To be valid, the XML document
cannot violate the syntactic rules specified
in its DTD.
Well-formed An XML document that does not contain
document: a DTD. A document with a DTD (valid)
must also be well-formed.
XML Attribute: Any attribute specified by the ATTLIST
under the XML element in the DTD. The
XML System uses the location path to
identify an attribute.
XML Column: A column in the application table of the
XML System UDT type. The term of
XML enabled column and XML column
will be used interchangeably.
XML Collection: A collection of relational tables which
present the data to compose XML
documents or to be decomposed from
XML documents.
XML Element: Any XML tag or ELEMENT as specified
in the XML DTD. The XML System uses
the location path to identify an element.
XML Tag: Any valid XML markup language tag,
mainly the XML element. The term tag
and element are used interchangeably.
XML Table: An XML Table is an application table
which includes XML System column(s).
The terms XML enabled table and XML
table are used interchangeably.
XML Object: The terms XML Object and XML
document are used interchangeably.
XML UDT: User defined type provided by the XML
System.
XML UDF: User defined function provided by the
XML System.
C.4 Example of an XML DTD The following DTD is provided as an example:
LineItem.dtd
<?xml encoding="US-ASCII"?>
<!ELEMENT Order (customer,Part+)>
<!ATTLIST Order Key CDATA #REQUIRED>
<!ELEMENT Customer #PCDATA>
<!ELEMENT Part(Quantity,ExtendedPrice,Tax,Shipment*)>
<!ATTLIST Part Key CDATA>
<!ELEMENT Quantity (#PCDATA)>
<!ELEMENT ExtendedPrice (#PCDATA)>
<!ELEMENT Tax (#PCDATA)>
<!ELEMENT Shipment (ShipDate,ShipMode,Comment)>
<!ELEMENT ShipDate (#PCDATA)>
<!ELEMENT ShipMode (#PCDATA)>
<!ELEMENT Comment (#PCDATA)>
In the above LineItem.dtd, the term LineItem.dtd is the title of the Document Type Definition. The term <?xml encoding="US-ASCII"?> indicates that encoding is in US-ASCII. The terms beginning with ELEMENT refer to elements of an XML document, and the terms beginning with ATTLIST refer to attributes of an XML document. The DTD is used to verify a Document Access Definition. C.5 Example of an XML Document The following is an example of an XML formatted document:
order.xml
<?xml version="1.0"?>
<!DOCTYPE Litem_DTD SYSTEM
"E:.backslash.dxx.backslash.test.backslash.dtd.backslash.LineItem.dtd">
<Order Key="1">
<Customer>General Motor</Customer>
<Part Key="156">
<Quantity>17</Quantity>
<ExtendedPrice>17954.55</ExtendedPrice>
<Tax>0.02</Tax>
<Shipment>
<ShipDate>1998-03-13</ShipDate>
<ShipMode>TRUCK</ShipMode>
<Comment>This is the first shipment to service of
GM</Comment>
</Shipment>
<Shipment>
<ShipDate>1999-01-16</ShipDate>
<ShipMode>FEDEX</ShipMode>
<Comment>This the second shipment to service of
GM.</Comment>
</Shipment>
</Part>
<Part Key="68">
<Quantity>36</Quantity>
<ExtendedPrice>34850.16</ExtendedPrice>
<Tax>0.06</Tax>
<Shipment>
<ShipDate>1996-04-12</ShipDate>
<ShipMode>BOAT</ShipMode>
<Comment>This shipment is requested by a call.
from GM marketing.</Comment>
</Shipment>
<shipment>
<ShipDate>1998-08-19</ShipDate>
<ShipMode>AIR</ShipMode>
<Comment>This shipment is ordered by an email.</Comment>
<Shipment>
</Part>
</Order>
In the above XML document, the term order.xml is the title of the XML document. The term <?xml version="1.0"?> indicates that this document is based on XML Version 1.0. The term <!DOCTYPE Litem_DTD SYSTEM "E:.backslash.dxx.backslash.test.backslash.dtd.backslash.LineItem.dtd"> is text for the XML document type definition and references the example Document Type Definition, entitled LineItem.dtd, in C.4, which is used for validation. The remaining terms define the data in the XML document. For example, the term <Quantity>17</Quantity> indicates that quantity has a value of 17. Also, note that <Quantity> without a slash at the beginning defines a start tag and </Quantity> with a slash at the beginning defines an end tag. Similarly, other terms in the XML document use such tags. C.6 The Document Access Definition (DAD) A user decides how XML document data is to be accessed in a database. That is the 150 user defines a DAD. With the help of a Graphical User Interface (GUI) tool. the user can create a DAD to define a mapping and indexing scheme. A Document Access Definition(DAD) is defined by the following Document Type Definition (DTD):
dad.dtd
<?xml encoding="US-ASCII"?>
<!ELEMENT DAD (dtdid?, validation, (Xcolumn .vertline. Xcollection))>
<!ELEMENT dtdid (#PCDATA)>
<!ELEMENT validation (#PCDATA)>
<!ELEMENT Xcolumn (table*)>
<!ELEMENT table (column*)>
<!ATTLIST table name CDATA #REQUIRED
key CDATA #IMPLIED
orderBy CDATA #IMPLIED>
<!ELEMENT column EMPTY>
<!ATTLIST column
name CDATA #REQUIRED
type CDATA #IMPLIED
path CDATA #IMPLIED
multi_occurrence CDATA #IMPLIED>
<!ELEMENT Xcollection (SQL_stmt*, prolog, doctype, root_node)>
<!ELEMENT SQL_stmt (#PCDATA)>
<!ELEMENT prolog (#PCDATA)>
<!ELEMENT doctype (#PCDATA .vertline. RDB_node)">
<!ELEMENT root_node (element_node)>
<!ELEMENT element_node (RDB_node?,
attribute_node*,
text_node?,
element_node*,
namespace_node*,
process_instruction_node*,
comment_node*)>
<!ATTLIST element_node
name CDATA #REQUIRED
ID CDATA #IMPLIED
multi_occurrence CDATA "NO"
BASE_URI CDATA #IMPLIED>
<!ELEMENT attribute_node (column .vertline. RDB_node)>
<!ATTLIST attribute_node
name CDATA #REQUIRED>
<!ELEMENT text_node (column .vertline. RDB_node)>
<!ELEMENT RDB_node (table+, column?, condition?)>
<!ELEMENT condition (#PCDATA)>
<!ELEMENT comment_node (#PCDATA)>
<!ELEMENT namespace_node EMPTY>
<!ATTLIST namespace_node
name CDATA #IMPLIED
value CDATA #IMPLIED>
<!ELEMENT process_instruction_node (#PCDATA)>
The XML System Administration GUI will provide an interface to create DAD files. The DAD itself is a tree structured XML document. The important elements and attributes of the DAD are: DTDID The identifier of the DTD stored in the dtd_ref table. It represents the DTD which validates the XML documents or guides the mapping between XML collection tables and XML documents. DTDID must be specified for XML collections. For XML columns, it is optional and is only needed if you w ant to create side tables for indexing on elements/attributes or validate net XML documents. The DTDID must be the same as the SYSTEM ID specified in the "doctype" of the XML documents. Validation For validating XML documents with the DTD, and "No" for no validation. If "Yes", then the DTDID must also be specified. Xcolumn An Xcolumn defines the indexing scheme for an XML Column. It is composed by zero or more tables. table: The relational side table(s) created for indexing elements or attributes of documents stored in an XML column. You can have one or more tables. A table is specified by: name: name of the side table. column: The column of the side table, which contains the value of a location path of the specified type. name: name of the column. It is the alias name of the location path which identifies an element or attribute, type: the data type of the column. It can be any SQL data type. path: the location path of an XML element or attribute. Only a simple location path defined in Section C.3 is allowed here. multi_occurrence: "YES" or "NO" to specify whether this element or attribute will have in one XML document. For Xcolumn, if multi_occurrence is specified as "YES", the XML System will add another column "DXX_SEQNO" with type Integer in the side table which this column belong to. This DXX_SEQNO keeps track of the order of elements occurred for the path expression in each inserted XML documents. With DXX_SEQNO, the user can retrieve a list of the elements with the same order as the original XML document using "ORDER BY DXX_SEQNO" in SQL. Xcollection The Xcollection defines mapping between XML documents and an XML collection of relational tables. It is composed by the following elements: SQL_stmt The SQL statement to specify the operation needed to achieve the mapping. It must be a valid SQL statement. It is only needed for composition, and only one SQL_stmt of query is allowed. objids A list of identifiers, each of which conceptually identifies a row object in the database table, so that the row to be selected is ordered by this unique value. It is only needed when SQL_stmt is supplied. The ID can be a column name, or a value generated from the generate_unique( ) function or a UDF. It is recommended but not necessary to be the primary key of the table. prolog Text for the XML prolog. The same prolog is supplied to all documents in tile entire collection. It is a fixed text. This because only XML 1.0 is supported, and UDB.RTM. only supports UTF-8. doctype Text for the XML document type definition. The doctype can be specified in one of the following two ways: The same doctype is supplied to all documents in the entire collection. In this case, it is a fixed text. When decomposing, the doctype can be stored as a column data of a table. In this case, the RDB_node should be specified. root_node The virtual root node which must has one and only one element_node. The element_node under the root_node is actually the root_node of the XML document. RDB_node: The node defines the mapping between an XML element or attribute and relational data. It consists of: table: name: the name of a relational table in which the data of an XML element or attribute reside. key: the primary single key of the table. It must be specified for decomposition. For the root element_node, all tables storing its attribute or all child element data should be specified. orderBy: names of columns that determine the sequence order of multiple-occurring, element text or attribute value when generating XML documents. column: It must be specified for text_node or attribute_node, but not for the element_node. name: name of the column which contains the value of an XML element or attribute. It must be specified for both composition and decomposition. type: the data type of the column. It is needed only for decomposition. path: the location path of the element or attribute. It is not needed for Xcollection, only for Xcolumn. multi_occurrence: multiple occurrence of the element or attribute. condition: the predicate to specify query condition. It serves two purposes: In RDB_node of a text_node or attribute_node: if specified, it qualifies the condition to select the column data to be used to compose or decompose XML element text or attribute value. It is optional. In RDB_node of the root element_node: if more than one tables are supplied, it must be specified as the condition to join tables. element_node Representing an XML element. It must be defined in the specified DTD. For the RDB_node mapping, the root element_node must have a RDB_node to specify all tables containing XML data for itself and all its children nodes. It can have zero or more attribute_nodes and child element_nodes, as well as zero or one text_node. In the next release, an element_node can also contain namespace nodes, process_instruction_nodes and comment_node. An element_node is defined by: Attributes: name: The name of the XML element. It is the tag name. ID: The unique ID. This is adapted from XPTH. BASE_URI: The base URI for the name space. This is also adapted from XPTH. Optional RDB_node: The RDB_node is only needed for the root element_node when using RDB_node mapping. In this case, all tables involved to generate or decompose XML documents must be specified. The column is not needed. The condition must be specified to show the join relationship among tables. Optional child nodes: An element_node can also have the following child nodes: element_node(s): representing child element(s) of this element, attribute_node(s): representing attribute(s) of this element; text_node: represent the CDATA text of this element, comment_node: representing the comment for this element, namespace_node: representing the namespace of this element, process_instruction_node: representing the process instruction, attribute_node: Representing an XML attribute. It is the node defining the mapping between an XML attribute and the column data in a relational table. It must has a name, and a column or a RDB_node. Attribute: name: the name of the attribute. It must be defined in the DTD. Column or RDB_node: Column: needed for the SQL mapping. In this case, the column must be in the SQL_stmt's SELECT clause. RDB_node: needed for the RDB_node mapping. The node defines the mapping between this attribute and the column data in the relational table. The table and column must be specified. The condition is optional. text_node: Representing the text content of an XML element. It is the node defining the mapping between an XML element content and the column data in a relational table. It must be defined by a column or a RDB_node. Column: needed for the SQL mapping. In this case, the column must be in the SQL_stmt's SELECT clause. RDB_node: needed for the RDB_node mapping. The node defines the mapping between this text content and the column data in the relational table. The table and column must be specified. The condition is optional. D. Creating Metadata for Fast Search of XML Documents Stored as Column Data One embodiment of the invention provides an XML System which solves the problem of fast searching and indexing of XML element/attribute values of XML documents when they are stored inside a database as column data. An XML document is a structured document. XML lets a user structure a document by elements or attributes (e.g., title or author). Once a document is structured in this manner, a structured search man be performed based on element or attribute values (or content). The embodiment of the invention converts the characters of element/attribute values to any general SQL data type. Additionally, the embodiment of the invention provides a technique for performing a range search on the data. That means the element or attribute values are converted to SQL types (e.g., number of pages may be an integer). With this embodiment of the invention, indices can be created on XML element/attribute values, thus the search operation is scalable. The embodiment of the invention permits application programmers to define a Data Access Definition (DAD) which identifies the XML elements or attributes that need to be indexed and defines the mapping between XML elements or attributes to columns in one or more side tables. The DAD is an XML formatted document that is used to specify within an XML document which elements or attributes are to be searched. The DAD also provides a location path or XPath. For example, if elements of a book are structured as follows: .vertline.-----Book .vertline.-----Title .vertline.-----Author The location path for the above structure would be: /Book/Title/Author. Additionally, the embodiment of the invention stores XML document data in an application table, while storing particular elements or attributes in side tables. The data stored in the side tables is referred to as "metadata" and is used to search for elements or attributes in the XML documents stored as column data in the application table. During the enabling of a column which contains XML documents, side tables are created (based on the DAD) to store duplicate data of these elements or attributes. Several triggers are created so that values of these elements or attributes are extracted when operations are performed on XML documents in columns of an application table. The operations include, for example, insert operations on the application table, which trigger insert operations to also store the inserted XML data into the side tables. Triggers also manage the synchronization of XML data between the side table data during the deleting and updating operations on the column containing the XML documents in the application table. D.1 Indexing for Searching XML Columns The indexing mechanism is applied on XML columns. In particular, the indexing mechanism discussed here is a technique to create an index on XML element or attribute values when entire XML documents are stored in XML columns. With a large collection of XML documents, search performance is a critical user requirement. Index support provides fast query performance at the cost of slower update performance due to index updates. The XML System provides an indexing mechanism that allows search predicates at query-time to be evaluated through indices, without reading document sources. The XML column indexing mechanism allows frequently queried data of general data types, such as integer, decimal, or date, to be indexed using the native database index supports from the database engine. This is achieved by extracting the values of XML elements or attributes from XML documents, storing them in the side tables, then allowing application programmers to create indices on these side tables. In a DAD, a user can define Xcolumns by specifying each column of a side table with a location path that identifies an XML element or attribute and a desired SQL data type. The XML System then will populate these side tables when data is inserted into the application table. An application can create an index on these columns for fast search, using the database B-tree indexing technology. The technique and options for creating an index may vary across platforms. Application programmers have the freedom to create a desired index as they usually do with a database on their platform. For elements/attributes in an XML document which occur multiple times, a separate table is created for each XML element/attribute with multiple occurrences, due to the complex structure of XML documents. For example, a user may want to create an index on `/Order/Part/ExtendedPrice`, and specify `/Order/Part/ExtendedPrice` to be of data type REAL. In this case, XML System will store the value of `/Order/Part/ExtendedPrice` in the specified column `price` in a side table. Multiple indices on an XML column are allowed. In the example, a user can create two columns in two side tables, one for `ExtendedPrice` and one for "ShipDate". When side tables are created, they are tied together with the main (or application) table through the notion of root_id. A user can decide whether the primary key of the application table is to be the "root_id". If the primary key does not exist in the application table, or for some reason a user doesn't want to use the primary key, then XML System will alter application table to add a column DXXROOT_ID for storing a unique identifier created at insertion time (i.e., when data is inserted into the application or main table). All side tables will have a "DXXROOT_ID" column and have the unique identifiers stored. If the primary key is used as the root_id, then all side tables will have a column with the same name and type as the primary key column in the application table, and the values of the primary keys are stored. D.2 Sample DAD for an XML Column Assuming the XML documents need to be stored are like the one shown in C.5. Example of an XML Document, the following example DAD will store the XML documents in an XML column and create several side tables for indexing.
Litem_DAD1.dad
<?xml version="1.0"?>
<!DOCTYPE Order SYSTEM "E:.backslash.dtd.backslash.dxxdad.dtd">
<DAD>
<dtdid>E:.backslash.dtd.backslash.lineItem.dtd</dtdid>
<validation>YES</validation>
<Xcolumn>
<table name="order_tab">
<column name="order_key"
type="integer"
path="/Order/@Key"
multi_occurrence="NO"/>
<column name="customer"
type="varchar(50)"
path="/Order/Customer"
multi_occurrence="NO"/>
</table>
<table name="part_tab">
<column name="part_key"
type="integer"
path="/Order/Part/@Key"
multi_occurrence="YES"/>
</table>
<table name="price_tab">
<column name="price"
type="double"
path="/Order/Part/ExtendedPrice"
multi_occurrence="YES"/>
</table>
<table name="ship_tab">
<column name="date"
type="date"
path="/Order/Part/Shipment/ShipDate"
multi_occurrence="YES"/>
</table>
</Xcolumn>
</DAD>
In the above DAD, Litem_DAD1.dad is the name of the DAD. The phrase <?xml version="1.0"?> identifies the version, and the phrase <!DOCTYPE Order SYSTEM "E:.backslash.dtd.backslash.dxxdad.dtd"> is text for the XML document type definition. The first DAD and the second DAD tags indicate that the information between these tags comprise the data access definition. The phrase <dtdid>E:.backslash.dtd.backslash.lineItem.dtd</dtdid> identifies the document type definition (DTD) to be used. The phrase <validation>YES</validation> indicates that this DAD is to be validated against the DTD. The four table name terms identify the four side tables to be created. In this example, the four side tables created for indexing are as follows: order_tab: with columns of order_key and customer; representing attribute "/Order/@Key" and element "/Order/Customer". part_tab: with column of part_key, representing attribute "/Order/Part/@Key". price_tab: with column of price, representing element "/Order/Part/Price" ship_tab: with column of date representing element "/Order/Part/Shipment/ShipDate". For this example, it is assumed that the columns in the tables are the elements and attributes which need to be searched frequently. FIG. 3 illustrates an application or main table and its four side tables. The Application table 300 has a root_id in common with each side table 302, 304, 306, and 308. The side tables 302, 304, 306, and 308 correspond to the side tables defined in the DAD above. D.3 XML Column/User Defined Types An XML column is designed to store XML documents in their native format in the database as column data. After a database is enabled, the following user defined types (UDTs) are created: XMLCLOB: XML document content stored as a CLOB inside the database, XMLVarchar: XML document content stored as a VARCHAR inside the database, XMLDBCLOB: XML document content stored as double byte CLOB inside the database, XMLFile: XML document stored in a file on a local file system, XMLURL: XML document stored as a uniform resource locator (URL) via Data Link. A user can use these UDTs as the data type of an XML column. An XML column is created when a user creates or alters an application table. D.4 Creating an XML Table An XML table is a table that includes one or more columns created with the XML System UDT. To create such a table, an XML column is included in the column clause of the CREATE TABLE statement. Consider a line item order book keeping application. The XML formatted line item order is to be stored in a column called "order" of an application table called "sales_tab". The sales_tab table also includes other columns of invoice_number and sales_person. Since the order is not very long, a user may decide to store it in the XMLVarchar type. The user may also decide to let the invoice_number be the primary key. The following create table statement can be used, where XMLVarchar is the XML System UDT:
CREATE TABLE sales_tab
(invoice_number char(6) NOT NULL PRIMARY KEY,
sales_person varchar(20),
order XML Varchar);
D.5 Defining Xcolumn in DAD In order to use an XML column, a DAD needs to be prepared and enabled. In DAD preparation, a user first needs to define an "Xcolumn". The following steps guide a user to define an "Xcolumn", using the examples: XML document order.xml in C.5, DTD LineItem.dtd in C.4, and DAD Litem_DAD1.dad in D.2. Identify the XML elements and attributes which will be frequently searched in the application. In the above examples, the "/Order/@Key". "Order/Customer", "/Order/Part/@Key", "/Order/Part/ExtendedPrice". "/Order/Part/Shipment/ShipDate" are mostly like to be searched and range search is needed for some of them. Decide how many side tables will be created for indexing. This is based on the understanding of the DTD and XML documents. In the above examples, since "/Order" has unique attribute "Key" and only one element "Customer", they are put in the same side table "order_tab". One "Order" can have one or more "Part" items (see DTD definition in C.4), and each "Part" will have unique attribute "Key" and element "ExtendedPrice", and so these are separates into two tables: "part_tab" and "price_tab". Now, since one "Part" can have multiple "Shipment" items and each "Shipment" has one "ShipDate", the "ShipDate" is put into another table "ship_tab". Define the column of each side table by specifying the column name, the matching XML element or attribute by location path, and the data type. In the examples, the ability to perform range search is desired on "ExtendedPrice" and "ShipDate", thus the data type is specified to be double and date respectively. Because there will be multiple occurrences of the /Order/Part/@key, /Order/Part/ExtendedPrice and /Order/Part/Shipment/ShipDate, specify the multi_occurrence="YES" for these elements or attributes. By doing so, the XML System will create an additional column DXX_SEQNO for side table price_tab and ship_tab so that a query can be performed using "order by DXX_SEQNO" to get the element or attribute with the same order as that in the original XML documents. D.6 Enabling Parameters A column can be enabled through the XML System administration GUI or using a dxxadm command with the enable_column option. The syntax of the option is as follows: dxxadm enable_column db_name tab_name column_name DAD_file -t tablespace -v default_view -r root_id] where: db_name: the database name tab_name: table name in which the XML column resides. column_name: name of the XML column. DAD_file: name of the file that contains Data Access Definition(DAD). tablespace: optional, but if specified, a previously created tablespace which will contain side tables created by the XML System. default_view: optional, but if specified, it is the name of the default view created by XML System to join application table and all side tables. root_id: optional, but recommended, and if specified, it is the column name of the primary key in the application table, and XML System will use it as the unique "root_id" to the all side tables with the application table. If not specified, XML System will add the column of DXXROOT_ID in the application table. Note: if the application table happened to have a column name as "DXXROOT_ID", the primary key must be specified as the "root_id", otherwise, an error will be returned. Here is an example for enabling the column order in the table sales_tab in database mydb with the DAD_file DAD_file Litem_DAD1.dad in C.4, default view sales_order_view and root_id invoice_number. /home/u1>dxxadm enable_column mydb sales_tab order Litem_DAD1.dad -v sales_order_view -r invoice_number DXXA007I XML Extender is enabling column order. Please wait. DXXA008I XML Extender has successfully enabled the column order. /home/u1> D.7 Results of the Column Enabling The enabling of an XML column mainly does the following things to a database: Read the DAD_file and do the following: if DTDID is specified, retrieve the DTD from the dtd_ref table. process Xcolumn to create side tables create triggers for insert, update and delete on the XML column so that the side tables will be populated or updated. Create a default_view if specified. If root_id not specified, alter application table to add DXXROOT_ID column. Update the XML_USAGE and dtd_ref table to reflect the enabling of this XML column. Based on the above examples, the user table sales_tab has the following schema: Based on the above examples, the user table sales_tab has the following schema:
User table sales_tab:
Column Name invoice_number sales_person order
Data Type char(6) varchar(20) XMLVarchar
The enabling column operation will create the following side tables based on the DAD:
Side_table order_tab:
Column Name order_key customer invoice_number
Data Type integer varchar(50) char(6)
Location Path /Order/@Key /Order/Customer N/A
Side table part_tab:
Column Name part_key invoice_number
Data Type integer char(6)
Location Path /Order/Part/@Key N/A
Side table price_tab:
Column Name price invoice_number
Data Type double char(6)
Location Path /Order/Part/ExtendedPrice N/A
Side table ship_tab:
Column Name date invoice-number
Data Type date char(6)
Location Path /Order/Part/Shipment/ShipDate N/A
Note that because the root_id is specified by the primary key invoice_number in the application table sales_tab, all side tables have the column invoice_number of the same type. Also, the value of the invoice_number of each row in the sales_tab will be inserted into the side tables. Since the default_view parameter is specified when enabling the XML column order, a default view sales_order_view is created by the XML System. It joins the above five tables by the following statement: CREATE VIEW sales_order_view(invoice_number,sales_person,order, order_key,customer,part_key,price,date) AS SELECT sales_tab.invoice_number, sales_tab.sales_person, sales_tab.order, order_tab.order_key, order_tab.customer, part_tab.part_key, price_tab.price, ship_tab.date) FROM sales_tab, order_tab, part_tab, price_tab, ship_tab WHERE sales_tab.invoice_number=order_tab.invoice_number AND sales_tab.invoice_number=part_tab.innvoice_number AND sales_tab.invoice_number=price_tab.invoice_number AND sales_tab.invoice_number=ship_tab.invoice_number. Because the tablespace in the enable_column command was not enabled, the default tablespace is used to create side tables. If the tablespace is specified and it does exist in the database, then the side tables will be created in the specified side tables. D.8 Inserting XML Documents For XML columns, an entire XML document is always stored as the column data. The insertion can be achieved in the following ways: Using the default cast function: For each UDT, there is a default cast function to convert the SQL base type to the UDT. The following cast functions can be used in a VALUES clause.
Input
Default UDT Parameter Return
Cast Function Type Type Description
db2xml.XMLVarchar() varchar XMLVarchar Input from
memory buffer
of varchar
db2xml.XMLCLOB() clob XMLCLOB Input from
memory buffer
of clob
db2xml.XMLDBCLOB() dbclob XMLDBCLOB Input from
memory buffer
of dbclob
db2xml.XMLFile() varchar XMLFile Only store file
name
db2xml.XMLURL() datalink XMLURL data type
The following SQL statement inserts the casted varchar type in the host variable xml_buff into the XMLVarchar. INSERT INTO sales_tab VALUES(`123456`, `Sriram Srinivasan`, db2xml.XMLVarchar(:xml_buff)) Using the Storage UDF: For each XML System UDT, there is a storage(or import) UDF to import data from a resource other than its base type. For example, if to import an XML document in a file to the database as a XMLVarchar, then the function XMLVarcharFromFile( ) is used. In the example below, a record is inserted into the sales_tab table. The function XMLVarcharFromFile( ) imports the XML document from a file into the database and stores it as a XMLVarchar. EXEC SQL INSERT INTO sales_tab VALUES(`123456`, `Sriram Srinivasan` XMLVarcharFromFile(`/home/u1/xml/order.xml`)) The above example imports the XML object from the file "/home/u1/xml/order.xml" to the column order in the table sales_tab. D.9 Retrieving XML Documents The XML table is ready to use when the XML column is enabled. Retrieving an XML column directly returns the UDT as the column type. A user can always use the default cast function provided by The database for distinct types to convert a UDT to an SQL base type, then operate on it. In addition to that, a user can also use overloaded UDF Content( ) to retrieve document content from a file or URL to a memory buffer. Using the default cast function: The following cast functions, which are automatically created by the database for the XML UDT, may be used in a SELECT statement.
Default Cast Input Return
Function Parameter Type Type Description
db2xml.varchar() XMLVarchar varchar XML document in variable
length of char
db2xml.clob() XMLCLOB clob XML document in CLOB
db2xml.dbclob() XMLDBCLOB dbclob XML in double byte CLOB
db2xml.varchar() XMLFile varchar XML filename in variable
length of char
db2xml.datalink() XMLURL datalink URL of XML document
The following SQL statement shows how to use the default cast function in a simple query. EXEC SQL SELECT db2xml.varchar(order)from sales_tab Using the content( ) UDF: Suppose XML documents are stored as XMLFile or XMLURL, to operate on these XML documents in memory; the UDF content( ), which takes XMLFile or XMLURL as input and returns a varchar or CLOB, is used. In the example below, a small sqc program segment illustrates how an XML document is retrieved from a file to memory. This example assumes that the column order is of XMLFile type.
EXEC SQL BEGIN DECLARE SECTION;
varchar(3k) xml_buff;
EXEC SQL END DECLARE SECTION;
EXEC SQL CONNECT TO mydb
EXEC SQL DECLARE cl CURSOR FOR
SELECT Content(order) from sales_tab
WHERE sales_person = `Sriram Srinivasan`
EXEC SQL OPEN cl;
do {
EXEC SQL FETCH cl INTO :xml_buff;
if(SQLCODE != 0) {
break;
}
else {
/* do whatever is needed to do with the XML doc in buffer */
}
}
EXEC SQL CLOSE cl;
D.10 Updating XML Documents With the XML System, an entire XML document can be updated by replacing the XML column data. The XML System provides two techniques for update: Using cast functions or storage UDFs in the set clause of the SQL update statement: In this case, a cast function or a UDF is used in the Set clause. Here is an example: UPDATE sales_tab set order=XMLVarcharFromFile(`/home/u1/xml/order2.xml`) WHERE sales_person=`Sriram Srinivasan` Using the Update( ) UDF: The XML System provides a UDF Update( ) which allows a user to specify a location path and the value of the element or attribute represented by the location path to be replaced. In this case, a user does not need to retrieve the XML document and use an editor to change the content. The XML System will do it automatically. Here is an example of using the UDF Update( ). In this example, the content of "/Order/Customer" is updated to NewMart". UPDATE sales_tab set order=Update(order,`/Order/Customer`,`NewMart`) WHERE sales_person=`Sriram Srinivasan` For an XML Column, the XML System will update side tables of extracted data when the XML column is updated. However, a user should not update these side tables directly without updating original XML documents stored in the XML column by changing the corresponding XML element or attribute value. Otherwise, there may be data inconsistency problems. D.11 Retrieving XML Element Contents and Attribute Values For XML columns, the XML System provides a UDF to extract element or attribute values from entire XML documents. The retrieval is performed on an XML document. It is a single document search. The XML System provides extracting UDFs to retrieve XML elements or attributes in the SQL select clause. This is very useful after search filtering on a collection of XML documents to further obtain desired elements or attributes. Suppose there are more than 1000 XML documents stored in the column order in the table sales_tab. To find all customers who have ordered items which have the ExtendedPrice greater than $2500.00, the following SQL statement with the extracting UDF in the select clause can be used: SELECT extractVarchar(Order,`/Order/Customer`) from sales_order_view WHERE price>2500.00 where the UDF extractVarchar( ) takes the order as the input, and the location path "/Order/Customer" as the select identifier, and returns the names of the customer. Note in this statement, only the orders with ExtendedPrice greater than $2500, say maybe 11 such orders, will be the input to the extracting function. The WHERE clause did the filtering on the collection of 1000 XML documents already. Again, the sales_order_view is the default view to join the application table sales_tab and all its side tables, where price is the part_tab.price, representing the "/Order/Part/ExtendedPrice". D.12 Searching an XML Document The above sections have described how the XML System may be used as a document repository for storage and retrieval, as well as for element or attribute selection. Here, searching using indices created on side table columns, which contain XML element contents or attribute values extracted from XML documents, is illustrated. Since the data type of an element or attribute can be specified, searches can be performed on SQL general data types and range searches can be performed. D.13 Search from Join View If desired and specified when an XML column is enabled, the XML System provides a default read-only view which joins the application table with all created side tables through the same unique identifier. With the default view, or any view created by the application, a user can search XML documents by a query on the side tables. The above examples have referenced an application table sales_tab and side tables order_tab, part_tab and ship_tab. The name of a default view sales_order_view is specified at the enabling column time. XML System had created a default view sales_order_view which joins these tables by the statement shown in the previous section. The following example SQL statement will return the sales_persons of the sales_tab who have line item orders stored in the column order where the ExtendedPrice is greater than $2500.00. SELECT sales_person FROM sales_order_view WHERE price>$2500.00 The advantage of a query on the join view is that it provides a virtual single view of the application table and side tables. However, when more side tables are created, the more expensive the query will be. Therefore, it is only recommended when the total number of side table columns is small. An application can create a desired view by joining important side table columns for optimization. Note that the root_id. which can be the specified primary key in the application table or the DXXROOT_ID created by the XML System, provides the way to join tables. D.14 Direct Query on Side Tables Since the DAD is specified by the application, the side tables created by the XML System are known to the application programmer. For better performance, an application can do query or sub-query on side tables directly. The following example shows how to do so for the same query stated above: SELECT sales_person from sales_tab WHERE invoice_number in (SELECT invoice_number from part_tab WHERE price>2500.00) Note that the invoice_number is the primary key in the application table sales_tab. The advantage of direct query with sub-query is better performance. When side tables have parent-children relationships, direct query with sub-query often make more sense. D.15 Query Using UDF In one embodiment, the side tables are created by the DAD, and indices are created for columns in the side tables. Therefore, the search will be fast with indexing. In another embodiment, it is not required that a user create side tables or indices on columns of side tables. The application still can use the extracting UDFs to do the query. Since each extracting UDF will do the source scan, it is very expensive. It should be used when other restrictions are applied to the WHERE clause so that the source scan is performed to a limited number of XML documents. Here is an example: SELECT sales_person from sales_tab WHERE extractVarchar(order,`/Order/Customer`) like `%NewMart%` AND invoice_number>100 D.16 Search on an Element or Attribute with Multiple Occurrences In XML documents, one element name type may occur multiple times. Since attributes belong to elements, the same location path of an attribute may often refer to multiple values. The term "multiple occurrence" will be used to specify this case. In the DAD, a user can specify whether the location path will have multiple occurrence. In the above DAD example, the "/Order/Part/price" has multiple occurrence, and the side table price_tab was crated for it. It is possible to have multiple rows in the part_tab table containing the same invoice_number. Therefore, a user should only select the distinct values. The following provides an example of how to do query for this case: SELECT sales_person from sales_tab WHERE invoice_number in (SELECT DISTINCT invoice_number from price_tab WHERE price>2500.00) On the other hand, since XML System provides additional column DXX_SEQNO in the price_tab, a user can select a price and pair it with the corresponding ShipDate. The following is an example: SELECT price_tab.price, ship_tab.date from price_tab, ship_tab WHERE price_tab.invoice_number=ship_tab.invoice_number AND price_tab.DXX_SEQNO ship_tab.DXX_SEQNO A user can also select the price ordered by the sequence number, as illustrated in the following example: SELECT price price_tab ORDER by DXX_SEQNO D.17 Structural-text Search In one embodiment of the invention, the structural-text or full text search is performed after enabling XML columns with Text Extender, a product from International Business Machines, Corporation. In the examples discussed herein, to perform structural-text on the column order, a user can enable the column with the Text Extender, by specifying a text handle name, say "orderHandle". Then with the Text Extender's section search support, the XML document with the word "XYZ" in the section "/Order/Customer" can be found. The following example shows how: SELECT order FROM sales_tab WHERE Contains(orderHandle. `model Order section(/Order/Customer).backslash."XYZ.backslash."`)=1 Where Order is the model name and Order; Customer is the section name. D.18 Deleting XML Documents Deleting a row from an XML table is done with a SQL DELETE statement. A user can use the search technique discussed above to specify the WHERE clause. The following is a simple example: DELETE from sales_tab WHERE invoice_number in (SELECT invoice_number from part_tab WHERE price>2500.00) D.19 Disable Columns The disable_column option disables the XML enabled column. The following is the syntax for disabling a column: dxxadm disable_column db_name tab_name column_name The following are the arguments for disable_column: db_name: the database name, tab_name: the table name in which the XML column resides. column_name: the name of XML column. The following actions are performed by disable_column: Delete the entry from the XML_USAGE table. Decrement the USAGE_COUNT in the DTD_REF table. Drop all triggers created with this column. Drop side table associated with this column. In one embodiment, a user must disable an XML column before dropping an XML table. If an XML table is dropped, but its XML column is not disabled, then all side tables created by the XML System will not be dropped. This may cause problems for the XML system to keep track of the number of enabled XML columns. D.20 Detailed Techniques The server code is the core of XML System. It has several major components, and each one performs a unique role in the product. The admin stored procedures are used to "xmlally" enable and disable the database, columns and indices. For performance and simplicy, these stored procedures were written in the embedded SQL. The XML System provides a number of functions in the server code. The functions are: dxxEnableDB, dxxDisableDB( ), dxxEnableColumn( ), dxxDisableColumn( ), and dxxEnableCollection( ). The dxxEnableDB stored procedure enables a database for XML document access. It uses the DDL statements to create XML System UDTs, a set of external UDFs, a set of internal UDFs, the DTD reference table, and the XML_USAGE table. The implementation of these UDFs are in the UDFs component. The dxxDisableDB( ) stored procedure drops everything created by the dxxEnabeDB( ). It does error checking on DTD_REF and XML_USAGE tables. The dxxEnableColumn( ) stored procedure enables an XML column of the XML System UDT. It parses the input DAD, create side tables, and triggers according to the DAD. It also updates the XML_USAGE table. The dxxDisableColumn( ) stored procedure disables an XML column. It deletes all side tables created by the XML System and updates the XML_USAGE table. The dxxEnableCollection( ) stored procedure enables an XML collection. It inserts a new row in the XML_USAGE table and stores the input DAD there. It checks or creates collection tables according to the DAD. The design description comprises program functions that implement the stored procedures in the source. These are listed below: dxxdb.sqc 1. dxxEnableDB( )
function name caller input output
dxxEnableDB() main() in dbName errCode,
(enable_db) dxxadm.sqc errMsg
The following is a Functional Description: 1. Initialize input and output parameters form sqlda, and sqlca. 2. Loop through enable statement array to execute each DDL 3. Check whether database is DBCS enabled. 4. If DBCS enabled, loop to execute each DBCS enable DDL statement 5. error check. 6. Set output parameter. 2. dxxDisableDB( )
function name caller input output
dxxDisableDB() main() in dbName errCode,
(disable_db) dxxadm.sqc errMsg
The following is a Functional Description: 1) Loop through disable statement array to execute each DDL. 2) Check whether database is DBCS enabled. 3) If DBCS enabled, loop to execute each DBCS disable DDL statement. 4) Error initialize input and output parameters form sqlda, and sqlca check. 5) Set output parameter. 3. IsDBCS_DB( )
function name caller input output
isDBCS_DB() dxxEnableDB(), dbName TRUE or FALSE
dxxDisableDB()
The following is a Functional Description: Check whether database is DBCS enabled, if so, return TRUE, otherwise return FALSE. dxxcol.sqc 1. dxxEnableColumn( )
function name caller input output
dxxEnableColumn main() in 2. dbName, 1. errCode
dxxadm.sqc 3. tabName, 2. errMsg
4. colName,
5. dadBuf,
6. tablespace,
7. defaultView,
8. rootID,
The following is a Functional Description: 1. Initialize host variables. 2. Initialize output parameters. 3. Call getParameter( ) to get input parameters, Call check_table( ) to check table name. Call check_column( ) to check input column name, and Check rootID. 4. Initialize XML4C parser. 5. Parse DAD. 6. Call dad_popu( ) to populate DAD into internal data structure. 7. Error checking on DAD: access_mode, DTDID 8. Get colno of this column from syscat, used as suffix of triggers. 9. Call createSideTables( ) to create side tables. 10. Create triggers on user tables: 1. rootidTrigger_BIT, 2. insertTrigger_AIT, 3. deleteTrigger_ADT, 4. updateTrigger_AUT, 5. validateTrigger_VIT, 6. validateTrigger_VUT 11. Create default view. 12. Insert a row into XML_USAGE table. 13. Update DTD_REF table. 14. Set output error message. 15. Error check on commit. 16. Free DAD structure. 2. dxxDisableColumn( ) function name caller input output dxxDisableColumn main( ) in dxxadm.sqc 1. dbName, 2. tabName, 3. colName, 1. errCode 2. errMsg The following is a Functional Description: 1. Initialize host variables. 2. Initialize output parameters. 3. Call getParameter( ) to get input parameters. 4. Check input parameters: 1. call check_table( ) to check table name, and 2. call check_column( ) to check input column name. 5. Get column name, DAD, DTDID, defaultView, Trigger suffix from XML_USAGE table. 6. Check whether column is XML enabled and trigger suffix exists. 7. Drop default view. 8. Parse DAD and populate DAD data structure. 9. Using DAD structure to delete all side tables. 10. Call createSideTables( ) to create side tables. 11. Drop triggers on user tables: 1. reset DXXROOT_ID, drop rootidTrigger_BIT, 2. insertTrigger_AIT, 3. deleteTrigger_ADT, 4. updateTrigger_AUT, 5. validateTrigger_VIT, and 6. validateTrigger_VUT. 12. Update DTD.sub.REF table. 13. Delete the row into XML_USAGE table. 14. Set output error message. 15. Error check on commit. 16. Free DAD structure. 3. check_table( )
function name caller input output
check_table dxxEnableColumn(), 1. dbName, 1. errCode
dxxDisableColumn() 2. errMsg
The following is a Functional Description: Check whether the table exists in the database by looking at the syscat.columns. 4. check_column( )
function name caller input output
check_column dxxEnableColumn(), 1. table name, 1. errCode
dxxDisableColumn() 2. column 2. errMsg
name
The following is a Functional Description: Check whether the column exists in the right table by looking at the syscat.columns. 5. getParameter( ) function name caller input output getParameter dxxEnableColumn( ), dxxDisableColumn( ) 1. in_sqlvar 1. data, 2. errCode 3. errMsg The following is a Functional Description: Extract parameter data from in_sqlvar, according to SQL_TYPE. 6. createSideTable( )
function name caller input output
createSide dxxEnableColumn() 1. pDAD, 1. errCode
Table() 2. rootid, 2. errMsg
3. rootid
definition
4. tablespace,
The following is a Functional Description: This routine creates all side tables specified in the DAD. It takes the pDAD (pointer to DAD) data structure, looping the list of side tables, and generates the "CREATE TABLE" statement. 1. If the rootid is not specified, then add DXXROOT_ID as a not-null column, else add the primary key in user table as a not-null column. 2. If the tablespace is specified, add "IN" tablespace to the CREATE TABLE statement. 7. rootidTrigger_BIT
function name caller input output
rootidTrigger_BIT dxxEnableColumn() 1. pDAD 1. errCode
2. errMsg
The following is a Functional Description: This routine creates the Before Insert Trigger (BIT) to add the value of DXXROOT_ID, which is generated from the generate_unique( ) function. 1. Check syscat.triggers to see whether the BIT exists or not, if exist, return error. 2. Execute the create trigger statement as: CREATE TRIGGER user_table.BIT BEFORE INSERT ON user_table REFERENCING NEW AS NEW FOR EACH ROW MODE DB2SQL WHEN (xmlcolumn IS NOT NULL) BEGIN ATOMIC SET NEWROW.DXXROOT_ID=generate_unique( ) END where user_table and xmlcolumn are taken from pDAD. 8. insertTrigger_AIT
function name caller input output
insertTrigger_AIT dxxEnable 1. pDAD 1. errCode,
Column(), 2. trigger_suffix, 2. errMsg
3. rootid,
The following is a Functional Description: This routine creates the After Insert Trigger (AIT) to populate the side tables after a row is inserted into the user table with an XML column. Loop through pDAD->s_table, for each s_table, pDADst do: For each column in the pDADst do: 1. If (pDADst->col is not multiple occurred) then set the trigger_stmt to: INSERT INTO side_tab VALUES (NEWROW.rootid, db2xml.extractDataType(xmlcolumn,path)) else set the trigger_stmt to: INSERT INTO side_tab SELECT NEWROW.rootid,db2xml.seqno( ), RETURNED DataType FROM TABLE(db2xml.extractDataTypes(xmlcolumn,path)) x 2. Execute the statement: "CREATE TRIGGER user_tab.AITtrigger_suffix AFTER INSERT ON user_tab REFERENCING NEW AS NEWROW FOR EACH ROW MODE DB2SQL WHEN (xmlcolumn IS NOT NULL) BEGIN ATOMIC trigger_stmt; END" where user_tab, side_tab, xmlcolumn and path are getting from pDADst, and DataType is getting from the call of mapType(pDADst->col->type) Note: the db2xml.seqno( ) is a UDF to generate a sequence number of multiple occurrence. 9. deleteTrigger_ADT
function name caller input output
deleteTrigger_ADT dxxEnableColumn() 1. pDAD 1. errCode
2. trigger_suffix, 2. errMsg
3. rootid,
The following is a Functional Description: This routine creates the After Delete Trigger (ADT) to delete rows in side tables after a row is deleted from the user table with an XML column. Loop through pDAD->s_table, for each s_table, pDADst do: execute the statement: "CREATE TRIGGER user_tab.ADTtrigger_suffix AFTER DELETE ON user_tab REFERENCING OLD AS OLDROW FOR EACH ROW MODE DB2SQL BEGIN ATOMIC DELETE FROM side_tab WHERE OLDROW.rootid=side_tab.rootid END" where user_tab, side_tab, xmlcolumn and path are getting from pDADst 10. updateTrigger_AUT
function name caller input output
updateTrigger.sub.-- dxxEnableColumn() 1. pDAD, 1. data,
AUT 2. trigger_suffix 2. errCode
3. rootid 3. errMsg
The following is a Functional Description: This routine creates the After Update Trigger (AUT) to update rows in side tables after a row is updated in the user table with an XML column. Loop through pDAD->s_table, for each s_table, pDADst do: For each column in the pDADst do: If (pDADst->col is not multiple occurred) then set the trigger_stmt to: UPDATE side_tab SET set_stmt WHERE NEWROW.rootid=side_tab.rootid else set the trigger_stmt to: DELETE FROM side_tab WHERE NEWROW.rootid=side_tab.rootid; INSERT INTO side_tab SELECT NEWROW.rootid, db2xml.seqno( ), RETURNED DataType FROM TABLE(db2xml.extractData Types(xmlcolumn.path)) x Execute the statement: "CREATE TRIGGER user_tab.AUTtrigger_suffix AFTER UPDATE ON user_tab REFERENCING NEW AS NEWROW FOR EACH ROW MODE DB2SQL WHEN (xmlcolumn IS NOT NULL) BEGIN ATOMIC trigger_stmt; END" where user_tab, side_tab, xmlcolumn and path are getting from pDADst, and DataType are getting from the call of mapType(pDADst->col->type) 11. validationTrigger_VBIT
function name caller input output
validationTrigger.sub.-- dxxEnableColumn() 1. pDAD, 1. errCode,
VBIT 2. trigger.sub.--
suffix 2. errMsg
The following is a Functional Description: This routine create a Validation Before Insert Trigger (VBIT) to validate an input XML document before inserting it into a user table. Due to the use of XML4C parser, it retrieves the DTD from dtd_ref table and puts it in an external file, then calls the UDF db2xml.validate in the trigger. It executes the following statement: "CREATE TRIGGER user_tab.VBITtrigger_suffix BEFORE INSERT ON user_tab REFERENCING NEW AS NEWROW FOR EACH ROW MODE DB2SQL WHEN validation!=0 SIGNAL SQLSTATE `DXX_SQLSTATE_INVALID_DOC` (DXX.sub.-- 0000E) where validation= SELECT db2xml.validate(NEWROW.xmlcolumn, db2xml.content(content,tmpfileName), pDAD->dtdid) FROM db2xml.dtd_ref WHERE dtdid=pDAD->dtdid) where_user tab, xmlcolumn are getting from pDAD, tmpefileName is set by this routine, and the "content" is the column name in dtd_ref for DTD.db2xml.content( ) is a UDF. 12. validation Trigger_VBUT
function name caller input output
validationTrigger.sub.-- dxxEnableColumn() 1. pDAD, 1. errCode
VBUT 2. trigger_suffix 2. errMsg
The following is a Functional Description: This routine create a Validation Before Update Trigger (VBUT) to validate an input XML document before updating it in user table. Due to the use of XML4C parser, it retrieves the DTD from dtd_ref table, puts it to an external file, then calls the UDF db2xml.validate in the trigger. It executes the following statement: "CREATE TRIGGER user_tab.VBUTtrigger_suffix BEFORE UPDATE ON user_tab REFERENCING NEW AS NEWROW FOR EACH ROW MODE DB2SQL WHEN validation !=0 SIGNAL SQLSTATE `DXX_SQLSTATE_INVALID_DOC` (DXX.sub.-- 0000E) where validation= SELECT db2xml.validate(NEWROW.xmlcolumn, db2xml.content(content,tmpfile.Name). pDAD->dtdid) FROM db2xml.dtd_ref WHERE dtdid=pDAD->dtdid) where user_tab, xmlcolumn are getting from pDAD, tmpefileName is set by this routine, and the "content" is the column name in dtd_ref for DTD.db2xml.content( ) is a UDF. 13. createDefaultView
function name caller input output
createDefaultView dxxEnableColumn() 1. pDAD, 1. errCode
2. rootid, 2. errMsg
3. tablename,
4. default_view
The following is a Functional Description: This routine creates a default view which joins the user table and XML column side tables together with the name specified as the input parameter default_view. The key here is to join by the rootid, which can be the DXXROOT_ID or the primary key of user table. As the input to this routine, the rootid is used as the column name for join. 1. Declare a cursor on the statement: SELECT colname FROM syscat.column WHERE tabname=tablename execute the statement, open the cursor and tetch on the cursor, while (! end of fetch) { get column and append user_tab.column to string userTableColumns append tablename to string alltablenames, 2. Looping through the pDAD structure, for each side table pDADst do: append side table name pDADst->stbname to alltablenames, append to string join_condition with "tablename.rootid=pDADst->stbname.rootid"; loop through the pDADst->col list, for each column in the side table do: append the pDADst->stbmame.pDAdst->scolname to the string sideTableColumns; 3. execute the create view statement: CREATE VIEW default_view AS SELECT usertableColumns sideTableColumns FROM alltablenames WHEREjoin_condition D.21 Flow Diagrams FIG. 4 is a flow diagram illustrating steps performed by the XML System in creating and maintaining XML document data as column data. In Block 400, the XML System creates a table with an XML column having a XML column type. The table is created in response to a CREATE TABLE statement that specifies the XML column. In Block 402, the XML System enables the XML column. Next, the XML System, in Block 404, creates side tables using a Data Access Definition for the XML column. In Block 406, the XML System creates triggers for Insert, Update, and Delete on the XML column, so that the side tables are populated when the main table is populated and the side tables are modified when the main table is modified. Thus, the main table and side tables are synchronized. In Block 408, when data is inserted into the main table, the XML System inserts data into the side tables. In Block 410, when the main table is modified (i.e., data is updated or deleted), the XML System modifies the side tables. FIGS. 5 and 6 illustrate key aspects of an embodiment of the invention. In particular, these figures illustrate enabling a column and disabling a column. FIG. 5 is a flow diagram of steps performed by the XML System to enable a column. In block 500, the XML System initializes all variables. In block 502, the XML System gets and checks input parameters. In block 504, the XML System calls a XML4C parser to parse a DAD. In block 506, the XML System determines whether the root_id is input by an application. If not, the XML System continues to block 508, otherwise, the XML system continues to block 510. In block 508, the XML System creates side tables with DXXROOT_ID. In block 510, the XML System creates side tables with the user table's primary key as the root_id. In block 512, the XML System creates the root_id, insert, delete, and update triggers on user tables. In block 514, the XML System determines whether the DAD specifies validation. If so, the XML System continues to block 516, otherwise, the XML System continues to block 518. In block 516, the XML System creates validation triggers. In block 518, the XML System determines whether a default view is input by the application. If so, the XML System continues to block 520, otherwise, the XML System continues to block 522. In block 520, the XML System creates a default view. In block 529, the XML System inserts an entry into XML_USAGE TABLE. In block 524, the XML System updates the DTD_REF. FIG. 6 is a flow diagram of steps performed by the XML System to disable a column. In block 600, the XML System initializes all variables. In block 602, the XML System gets and checks input parameters. In block 604, the XML System gets the DAD, DTDID, and default view from the XML_USAGE TABLE. In block 606, the XML System determines whether the default view is null. If si, the XML System continues to block 610, otherwise, the XML system continues to block 608. In block 608, the XML System drops the default view. In block 610, the XML System parses the DAD to get side table names. In block 612, the XML System drops all side tables. In block 614, the XML System drops the root_id, insert, delete, and update triggers on user tables. In block 616, the XML System determines whether the DAD specifies validation. If so, the XML System continues to block 618, otherwise, the XML System continues to block 620. In block 616, the XML System creates validation triggers. In block 620, the XML System deletes the entry from the XML_USAGE TABLE. In block 622, the XML System updates the DTD_REF table. E. Generating One or More XML Documents From a Single SQL Query In one embodiment of the invention, an XML System is provided that generates one or more XML documents from a single SQL query. This technique is referred to as "SQL mapping". The XML System retrieves data in existing relational database tables and forms a set of one or more XML documents. Using the XML System, application programs can turn existing business data into one or more new XML documents to be interchanged from business to business via a network, such as the internet or an intranet. The XML System takes a single SQL query, along with a definition of the data model from which one or more XML documents are to be generated (i.e., a DAD), and forms one or more XML documents using the data in existing database tables which meet the query condition. The XML System is implemented by stored procedures which can be called from the database client code. The stored procedures take a Data Access Definition (DAD), which consists of the SQL query, the Extensible Markup Language Path (XPath) data model based definition of the document structure to be generated, and a table name which will contain the generated one or more XML documents as its row data. The stored procedures use a heuristic technique to eliminate duplication from the SQL query. Additionally, the stored procedure identifies the relational hierarchy of the SQL query and maps the data obtained from the SQL query into elements and attributes of generated one or more XML documents. An Xcollection defines how to compose one or more XML documents from a collection of relational tables. An XML collection is a virtual name of a set of relational tables. Applications can enable an XML collection of any user tables. These user tables can be existing tables of legacy business data or ones newly created by the XML System. A user can access XML collection data through the stored procedures provided by the XML System. An XML collection is used to transform data between database tables and one or more XML documents. An XML collection achieves the goal of data interchange via XML. For applications that want to compose one or more XML documents from a set of relational tables, the XML System offers a technique to enable an XML collection through a Document Access Definition (DAD). In the Document Access Definition, applications can make a custom mapping between database column data in new or existing tables to XML elements or attributes. The access to an XML collection is by calling the XML System's stored procedures or directly querying the tables of the collection. E.1 EXAMPLE The following discussion provides an example of generating one or more XML documents from a relational database using an SQL query and a simple DAD. In particular, a relational database is illustrated. Then, an SQL query is illustrated that is used to retrieve data from the relational database. Next, the results of the SQL query are illustrated. Moreover, the Document Access Definition (DAD), which contains the SQL query is illustrated, along with a Document Type Definition (DTD). After this, one XML document that is generated to contain the data retrieved by the SQL query is illustrated. Relational Database:
order_tab:
order_key customer_name customer_email customer_phone
1 General Motor parts@gm.com 800-GM-PARTS
part_tab:
part_key color qty price tax order_key
156 red 17 17954.55 0.02 1
68 black 36 34850.16 0.06 1
128 red 28 38000.00 0.07 1
ship_tab:
date mode comment part_key
1998-03-13 TRUCK This is the first shipment to 156
service of GM.
1999-01-16 FEDEX This the second shipment to 156
service of GM.
1998-08-19 BOAT This shipment is requested 68
by a call. from GM
marketing.
1998-08-19 AIR This shipment is ordered by 68
an email.
1998-12-30 TRUCK NULL 128
The following is an SQL query. The SELECT term selects columns. The FROM term indicates the tables from which data is to be selected. The WHERE term indicates the conditions for selecting data. This SQL query is defined in a Document Access Definition, which is illustrated below. SELECT o.order_key, customer_name, customer_email, p.part_key, color, qty, price, tax, ship_id, date, mode FROM order_tab o, part_tab p, table(select substr(char(timestamp(generate_unique( ))), 16) as ship_id, date, mode, part_key from ship_tab) WHERE order_key=1 and p.price>20000 and p.order_key=o.order_key and s.part_key=p.part_key The following is a table holding the results of executing the SQL query:
order.sub.-- customer.sub.-- customer.sub.-- part.sub.--
key name email key color qty price tax ship_id
date mode
1 General parts@gm.com 68 red 36 34850 16 0.06
4.58.825484 1998-08-19 BOAT
Motor
1 General parts@gm.com 68 red 36 34850.16 0.06
4.58.825537 1998-08-19 AIR
Motor
1 General parts@gm.com 128 red 28 38000.00 0.07
4.58.825589 1998-12-30 TRUCK
Motor
The data in order_key, customer_name, customer_email, part_key,qty, price, and tax are duplicated for each shipment. The data in order_key, customer_name, and customer_email are duplicated for each part. This issue is addressed by partitioning the columns into equivalence classes that reflect the semantics of the relational data: {order_key, customer_name, customer_email}, {part_key, color, qty, price, tax}, and {ship_id, date, model}. The XML System opens a new cursor only when it crosses a boundary between classes. A user can decide how structured XML documents are to be stored or created through a Document Access Definition(DAD). The DAD itself is an XML formatted document. The DAD associates XML documents to a database by defining an Xcollection. The SQL_stmt in the DAD is an SQL query that specifies how columns of a table are to be mapped to XML elements and attributes. The columns in the SELECT clause are mapped to XML elements or attributes. They will be used to define the value of attribute_nodes or content of text_nodes. The FROM clause defines the tables containing the data, and the WHERE clause specifies the join and search conditions. Assuming the following structure of an XML document will be generated from the data selected by a SQL_stmt, how to use an XML collection to specify the DAD will be illustrated below.
<?xml version="1.0"?>
<!DOCTYPE Order SYSTEM
"E:.backslash.dxx.backslash.test.backslash.dtd.backslash.litem.dtd">
<Order key="1">
<Customer>General Motor</Customer>
<Part key="68">
<Quantity>36</Quantity>
<ExtendedPrice>34850.16</ExtendedPrice>
<Tax>0.06</Tax>
<Shipment>
<ShipDate>1998-04-12</ShipDate>
<ShipMode>BOAT</ShipMode>
<Comment>This shipment is requested by a
call from GM marketing</Comment>
</Shipment>
<Shipment>
<ShipDate>1998-08-19</ShipDate>
<ShipMode>AIR</ShipMode>
<Comment>This shipment is ordered by an
email</Comment>
</Shipment>
</Part>="128">
<Quantity>28</Quantity>
<ExtendedPrice>38000.00</ExtendedPrice>
<Tax>0.07</Tax>
<Shipment>
<ShipDate>1998-12-30</ShipDate>
<ShipMode>TRUCK</ShipMode>
<Comment>This is the first shipment to
service of GM</Comment>
</Shipment>
</Part>
</Order>
The following sample DAD shows how to define the mapping from relational tables to one or more XML documents using SQL mapping. The following sample DAD shows how to specify an SQL query to compose a set of one or more XML documents from data in three relational tables.
Litem_DAD2.dad
<?xml version="1.0"?>
<!DOCTYPE Order SYSTEM "E:.backslash.dtd.backslash.dxxdad.dtd">
<DAD>
<dtdid>E:.backslash.dtd.backslash.lineItem.dtd</dtdid>
<validation>YES</validation>
<Xcollection>
<SQL_stmt>
SELECT o.order_key, customer, p.part_key, qty, price, tax, ship_id,
date, mode,comment
FROM order_tab o, part_tab p,
table(select substr(char(timestamp(generate_unique())).16) as
ship_id,date,mode,comments from ship_tab) as s
WHERE p.price > 2500.00 and s.date > "1996-06-01" AND
p.order_key = o.order_key and s.part_key = p.part_key
</SQL_stmt>
<objids>
<column name="order_key"/>
<column name="part_key"/>
<column name="ship_id"/>
</objids>
<prolog>?xml version="1.0"?</prolog>
<doctype>!DOCTYPE Order SYSTEM
"E:.backslash.dtd.backslash.lineItem.dtd"</doctype>
<root_node>
<element_node name"Order">
<attribute_node name"Key">
<column name="order_key"/>
</attribute_node>
<element_node name="Customer">
<text_node>
<column name="customer"/>
</text_node>
</element_node>
<element_node name="Part">
<attribute_node name="Key">
<column name="part_key"/>
</attribute_node>
<element_node name="Quantity">
<text_node>
<column name="qty"/>
</text_node>
</element_node>
<element_node name="ExtendedPrice">
<text_node>
<column name="price"/>
</text_node>
</element_node>
<element_node name="Tax">
<text_node>
<column name="tax"/>
</text_node>
</element_node>
<element_node name="Shipment">
<element_node name="ShipDate">
<text_node>
<column name="date"/>
</text_node>
</element_node>
<element_node name="ShipMode">
<text_node>
<column name="mode"/>
</text_node>
</element_node>
<element_node name="Comments">
<text_node>
<column name="comment"/>
</text_node>
</element_node>
</element_node> <!-- end Shipment -->
</element_node> <!-- end Part -->
</element_node> <!-- end Order -->
</root_node>
</Xcollection>
</DAD>
The SQL query should be in a top-down order of the relational hierarchy. In the example, it is required to specify the selected columns in the order of 3 levels: order, part and shipment. Within each level, the objid must be the first column. If the order described is not preserved, the generated XML documents may not be correct. E.2 How to Use an XML Collection An XML collection is a set of relational tables which contain XML data. These tables can be new tables generated by the XML System or existing tables which have data to be used by the XML System to generate one or more XML documents. Stored procedures provided by the XML System serve as the access methods. Unlike the XML column, an XML collection does not have to be enabled. The enablement is based on the operations performed. A composition operation of an XML collection is to generate one or more XML documents from data existing in the collection tables. Therefore, for this operation, an XML collection does not need to be enabled, providing all tables already exist in the database. The DAD will be passed to stored procedures. The DAD can be overridden by other XML query parameters as the stored procedure input parameters. This kind of parameter can be obtained from various sources (e.g., dynamically from the web). In the DAD preparation, first "Xcollection" is defined. An Xcollection can be defined for composition or decomposition, in the way of either SQL mapping or RDB_node mapping. In both cases, the following steps should apply: 1. Specify the DTDID. The DTDID must be the same as the system ID in the doctype. 2. Specify the validation. When using DTDID, the validation should be always be "YES". 3. Specify the prolog and doctype. For composition, the same prolog and doctype are allowed. E.2.1 Enabling an XML Collection The purpose of enabling an XML Collection for decomposition is to parse a DAD, create new tables or check the mapping against existing tables. The DAD is stored into the XML_USAGE table when the XML Collection is enabled. When a user prefers to have the XML System create collection tables, the user should enable the XML collection. Additionally, the enablement depends on the stored procedure the user chooses to use. The stored procedure dxxInsertXML( ) will take XML Collection name as the input parameter. In order to use the stored procedure dxxInsertXML( ), the user must enable an XML collection before calling it. The user can call stored procedure dxxShredXML( ) without the enabling of an XML collection by passing a DAD. In the later caPse, all tables specified must exist in the database. E.2.1.1 Enabling XML Collection Option For composition, an XML collection is not required to be enabled. The assumption is that all collection tables already exist in the database. The stored procedure can take a DAD as an input parameter and generate XML documents based on the DAD. On the other hand, the composition is the opposite of the decomposition. For XML collections enabled during the decomposition process, the DAD is likely to be used to compose XM | ||||||
