System and method for generating horizontal view for SQL queries to vertical database6763350Abstract A database including vertical tables useful for storing large numbers of objects having potentially thousands of attributes in, e.g., e-commerce applications. To support querying the vertical database using conventional SQL, a horizontal view over the underlying vertical tables is defined, and then queries are posed against the view. The queries are automatically transformed and executed against the vertical tables. If desired, the query results can be transformed back to a horizontal format. In this way, it appears to the user that a conventional horizontal data format is being used. Claims What is claimed is: Description BACKGROUND OF THE INVENTION
TABLE 1
Horizontal (H)
Oid A1 A2 A3
1 a b .perp.
2 .perp. c d
3 .perp. .perp. a
4 b .perp. d
Vertical (V)
Oid Key Val
1 A1 a
1 A2 b
2 A2 c
2 A3 d
3 A3 a
4 A1 b
4 A3 d
We start with the well-understood algebraic operations: select(.sigma.), project(.pi.), join({character pullout}), outer join ({character pullout}), left outer join ({character pullout}), right outer join ({character pullout}), cross product (.chi.), union (.orgate.), intersection (.andgate.), difference (-), and aggregation (F). We add two operations to this algebra: v2h(.OMEGA.) and h2v({character pullout}). We define the semantics of these operations after briefly introducing a few notations. Notation Assume that the vertical table V has the scheme (Oid, Key, Val) with a non-nullable column Oid and that A1, . . . , An are the key values in V. The equivalent horizontal table H has the scheme (Oid, A1, . . . , An) with the column Oid being non-nullable. We use Ak+m to represent the (k+m).sup.th attribute. The symbol .perp. represents a null value. We will use .THETA..sub.i=1.sup.k.PSI..sub.i as a short hand for .psi..sub.1.theta..psi..sub.2 . . . .theta..psi..sub.k. For a join opera (including its outer species), unless otherwise stated, assume the join predicate to be the equality of Oid. An explicit join predicate .psi. will be specified as {character pullout}. Sometimes we will use S0, S1, etc. to refer to the columns of a result table. For visual clarity, we will sometimes add square brackets in an expression as shown below. [.PSI..sub.0 ].theta.[.THETA..sub.i=1.sup.k.PSI..sub.i ] These square brackets do not affect the order of evaluation in any way; they are there only to enhance readability. v2h Operation Intuitively, the v2h(.OMEGA.) operation takes as input a vertical table and a list of attribute names and returns a horizontal table with those attribute names as the column labels. .OMEGA..sup.k (V) creates a horizontal table of arity k+1 whose first column is Oid and the first k key values form the rest of the k columns. The content of the table is defined by: .OMEGA..sup.k (V)=[.pi..sub.Oid (V)]{character pullout}[{character pullout}.sub.i=1.sup.k.pi..sub.Oid, Val (.sigma..sub.Key=`Ai` (V))] (1) Because of the first term on the right hand side and the use of left outer join, Eq. 1 can yield tuples with nulls in all of the non-Oid columns. For example, .OMEGA..sup.2 applied to a vertical table V (Table 1) results in the table shown in Table 2. For Oid=3, V does not contain tuples corresponding to key values A1 and A2. However, the result table contains a tuple with this Oid and null values for attributes A1 and A2.
TABLE 2
.OMEGA..sup.2 (V)
##STR1##
This semantics is consistent with the null handling in SQL. For instance, a projection of the horizontal table in Table 1 on attributes A1 and A2 will indeed preserve the tuple corresponding to Oid=3 in SQL. If it were desired that the v2h operation does not create such tuples, Eq. 1 can be modified to: .OMEGA..sup.k (V)={character pullout}.sub.i=1.sup.k.pi..sub.Oid, Val ((.sigma..sub.Key=`Ai` (V)) (2) h2v Operation The h2v({character pullout}) operation is the inverse of the v2h operation. Intuitively, it takes as input a horizontal table and converts it into a vertical table where each column label in the horizontal table is converted to a key value in the vertical table. Assume a horizontal table H having the scheme (Oid, A1, . . . , An) with the column Oid being nonnullable. {character pullout}.sup.k (H) creates a vertical table with the scheme (Oid, Key, Val). The content of V is defined by: {character pullout}.sub.k (H)=[.orgate..sub.i=1.sup.k.pi..sub.Oid, `Ai`, Ai (H)].orgate.[.orgate..sub.i=1.sup.k.pi..sub.Oid, `Ai`, Ai ((.sigma..sub..LAMBDA..sub..sub.i=1 .sup.k.sub.Ai=`.perp.`)] (3) For each tuple h in H, the first term in Eq. 3 creates the tuples {<Oid, `Ai`, h.Ai>.vertline.i=1, . . . , n.LAMBDA.h.Ai.noteq..perp.}. The second term is needed to maintain the equivalence of the horizontal and vertical representations. It handles the special case of a horizontal tuple that has null values in all of the non-Oid columns. Table 3 shows the result of applying {character pullout}.sup.2 to the horizontal table H from Table 1. Rewritings We describe now the rewritings of the standard algebraic operations on the horizontal view over a vertical table. We give two forms of rewritings: one with and, the other without using the v2h operation. The former can be used on a SQL-92 system whereas the latter can exploit the implementation of v2h operation using the object-relational extensions.
TABLE 3
{character pullout}.sup.2 (H)
##STR2##
Projection Let the projection be on attributes A1, . . . , Ak. We have, from the definition of the v2h operation: ##EQU1## Selection We discuss the usual case of a selection followed by projection. Let the selection predicate be .LAMBDA..sub.i=1.sup.k (Ai.theta.`ai`) and the projection be on the first k+m attributes, m.gtoreq.0. Table 4 gives an example of this transformation. ##EQU2## A disjunctive selection .pi..sub.A1, . . . , Ak+m ((.sigma..sub.V.sub..sub.i=1 .sub..sup.k .sub.Ai.theta.`ai` (H) can be transformed by simply replacing the intersection .andgate..sub.i=1.sup.k in Eq. 7 with the union .orgate..sub.i=1.sup.k.
TABLE 4
.pi..sub.A2,A3 (.sigma..sub.A1=`a` {character pullout} A2=`b` (H)) =
.pi..sub.A2,A3 ([.pi.Oid((.sigma.Key=`A1` {character pullout} Val=`a` (V))
.andgate.
.sigma..sub.Key=`A2` {character pullout} Val=`b` (V)))] {character pullout}
[.pi..sub.Oid,Val (.sigma..sub.Key=`A1` (V))] {character pullout}
[.pi..sub.Oid,Val (.sigma..sub.Key=`A2` (V
[.pi..sub.Oid,Val (.sigma..sub.Key=`A3` (V))])
##STR3##
Join Take a horizontal table H having the scheme (Oid, A1, . . . , An) which is really a logical view over a vertical table V. Its join with a true horizontal table RH having the scheme (R1, . . . , Rr) is given by: ##EQU3## Table 5 illustrates this transformation.
TABLE 5
##STR4##
##STR5##
Aggregation We use the following notation to specify aggregation: Grouping attributes.sup.F Function list.sup.(Table name) Function list consists of (function, attribute) pairs, where function can be one of the allowed aggregate functions such as SUM, COUNT, AVG, MAX, and MIN. The transformations are: ##EQU4## For aggregate functions, SUM, MIN, and MAX, which are unaffected by null values in the column being aggregated, Eq. 11 can be simplified to: .sub.A1, . . . , Ak F.sub.F Ak+1 (H)=.sub.$1, . . . , $k F.sub.F$k+1 ({character pullout}.sub.i=1.sup.k.pi..sub.Oid, Val (.sigma..sub.Key=`Ai` (V)) Set Operations The set operations cross product(X), union(.orgate.), intersection(.andgate.), and difference(-) can be transformed by first applying the v2h operation on the vertical table(s) and then carrying out the desired operation. Updates Updates are easy. Insertion requires decomposing a data object into a set of attribute name and value pairs and inserting them into V with a common Oid. A predicate-based deletion requires determining the Oid set of objects satisfying the predicate and deleting the corresponding tuples from V. An update results in a change of the value field in some tuples in V. It can also cause some insertions and deletions. Output There may be need for transforming the result of an operation involving a vertical table back into the vertical format (e.g. for storing the result). This can be accomplished by applying the h2v operation on the result table. Implementation With the algebra described above in hand, we are in a position to develop a non-intrusive enablement layer on top of the database engine that hides from the user (application) the vertical table. A horizontal view H is defined for the vertical table V using an extended DDL: CREATE HORIZONTAL VIEW H ON VERTICAL TABLE V USING COLUMNS (A1, A2, . . . , A.sub.n) where A.sub.i=1,n represent attribute names (keys) in the vertical table. The DDL is generated by the enablement layer. The user poses regular SQL queries over the view. The enablement layer parses the SQL query, validates it, and transforms it to another SQL query that runs against the underlying vertical table. It uses a query graph structure to facilitate this translation. We consider two transformation strategies. VerticalSQL This implementation assumes only the SQL-92 level capabilities from the underlying database engine. Above has been two equations for each of the algebraic ops. For this implementation, the enablement layer uses the latter set of equations to generate SQL translations. For example, the projection query: SELECT A1, A2 FROM H is translated into:
SELECT t.Attr1, t.Attr2 FROM
(SELECT t4.Oid, t4.Attr1, t6.Attr2 FROM
(SELECT t0.Oid, t3.Attr1 FROM
(SELECT DISTINCT t2.Oid FROM V AS t2) AS t0(Oid)
LEFT OUTER JOIN
(SELECT t1.Oid, t1.Val FROM V AS t1
WHERE t1.key = `A1`)
AS t3(Oid, Attr1)
ON t0.Oid = 13.Oid
) AS t4(Oid, Attr1)
LEFT OUTER JOIN
(SELECT t5.Oid, t5.Val FROM V AS t5 WHERE t5.Key =
`A2`) AS
t6(Oid, Attr2)
ON t4.Oid = t6.Oid
) AS t(Oid, Attr1, Attr2)
The above query looks quite complex, but the reader may want to verify that it is a direct SQL manifestation of Eq. 5. VerticalUDF This implementation attempts to exploit object-relational extensions to SQL, particularly the user-defined table functions. The underlying engine is extended with the table functions for v2h and h2v operations. The v2h table function reads tuples of vertical table sorted on Oid and outputs a horizontal tuple for each Oid. The h2v table function takes as input column names and a horizontal tuple and splits it into vertical tuples. The enablement layer uses the first set of equations for translating each of the algebraic operations given above for this implementation. For example, the same projection query: SELECT A1, A2 FROM H is translated into: SELECT t.Attr1, t.Attr2 FROM V v, TABLE(v2h(v.Oid, v.Key, v.Val)) AS t(Oid, Attr1, Attr2) WHERE v.Key=`A1` or v.Key=`A2` This query may look awkward to a reader unfamiliar with the table function syntax. The query appears to be a Cartesian product between the vertical relation V and the table function v2h. What happens in effect is that the relevant fields from any qualifying tuple v after applying the select predicates on the V table are passed as parameters into the v2h function, which in turn produces horizontal tuples t from which the fields in the select list are extracted. The v2h table function requires v tuples to be streamed in the Oid order so that it can buffer the key-value pairs until the Oid changes. At that point, it can output the tuple corresponding to the horizontal view. Unfortunately, the current SQL syntax does not allow the specification of the order in which the tuples should be streamed into a table function and that causes problems. Note that a good plan for the above query will push down the Key predicates on A1 and A2 so as to select only the relevant tuples from the V relation. However, the output of this selection would generate tuples in Key or physical row-id order which is different from the Oid order required for v2h. A workaround this problem is to introduce a join of the tuple stream produced by the selection with a table of Oid's and cajole the optimizer to pick a merge sort join plan, thereby forcing a sort on Oid. By introducing this join and adjusting the optimization level for the DB2 query optimizer, we could generate the correct plans. A wild card data type as discussed herein is but one non-limiting data type. Present principles can be applied to other data types such as, without limitation, integers, floats, etc. A separate vertical table can be created for each type. A catalog table could maintain data type information for each attribute, yielding the following scheme:
ATTRIBUTES (KEY CHAR(K) PRIMARY KEY, DATATYPE
CHAR(N));
V_INT(OID INTEGER, KEY CHAR(K), VAL INTEGER);
V_FLOAT(OID INTEGER, KEY CHAR(K), VAL FLOAT);
V_VARCHAR(OID INTEGER, KEY CHAR(K), VAL VARCHAR(X));
While the particular SYSTEM AND METHOD FOR GENERATING HORIZONTAL VIEW FOR SQL QUERIES TO VERTICAL DATABASE as herein shown and described in detail is fully capable of attaining the above-described objects of the invention, it is to be understood that it is the presently preferred embodiment of the present invention and is thus representative of the subject matter which is broadly contemplated by the present invention, that the scope of the present invention fully encompasses other embodiments which may become obvious to those skilled in the art, and that the scope of the present invention is accordingly to be limited by nothing other than the appended claims, in which reference to an element in the singular means "at least one". All structural and functional equivalents to the elements of the above-described preferred embodiment that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the present claims. Moreover, it is not necessary for a device or method to address each and every problem sought to be solved by the present invention, for it to be encompassed by the present claims. Furthermore, no element, component, or method step in the present disclosure is intended to be dedicated to the public regardless of whether the element, component, or method step is explicitly recited in the claims. No claim element herein is to be construed under the provisions of 35 U.S.C. .sctn.112, sixth paragraph, unless the element is expressly recited using the phrase "means for".
|
Same subclass Same class Consider this |
||||||||||
