Apparatus for design of a multilevel secure database management system based on a multilevel logic programming system5481700Abstract Apparatus for designing a multilevel secure database management system based on a multilevel logic programming system. The apparatus includes a multilevel knowledge base which has a multilevel database in which data are classified at different security levels. The multilevel knowledge base also includes schema, which describe the data in the database, and rules, which are used to deduce new data. Also included are integrity constraints, which are constraints enforced on the data, and security constraints, which are rules that assign security levels to the data. The system further includes users cleared to the different security levels for querying the multilevel database, and a multilevel logic programming system is provided for accessing the multilevel knowledge base for processing queries and for processing the integrity and security constraints. The multilevel database management system makes deductions and gives complete answers to queries and prevents certain unauthorized inferences. Claims I claim: Description BACKGROUND OF THE INVENTION
TABLE 1
______________________________________
Truth Table at Security Level L
A B A.LAMBDA.B AVB A .fwdarw. B
______________________________________
T T T T T
T F F T F
F F F F T
F T F T T
______________________________________
2.3 Theory of NPML In this section, we discuss the essential points of an NPML theory. An NPML theory consists of a set a logical axioms, a set of proper axioms, and a set of inference rules. The logical axioms are analogous to those of propositional logic and are given in Table 2. Each logical axiom has a security level associated with it. This is the minimum level in which the axiom evaluates to true. Note that this level is the least upper bound of the inherent security levels of the wffs associated with the axiom. The proper axioms of any NPML theory include the relationships between the security levels supported by the system. These axioms are also shown in Table 2 where we assume that the security levels are L1,L2, . . . Ln. The remaining proper axioms can be formed from a subset {W1, W2, . . . Wn} of wffs as follows. For each Wi (1.ltoreq.i.ltoreq.n), (Wi, L).sup.3 is explicitly asserted to be a proper axiom of the theory if: Wi evaluates to true with respect to L, and It is not the case that (Wi, L*) is a proper axiom where L* is just less than L..sup.4 Note that if L is just less than L+ and if Wi does not evaluate to true with respect to L+, then (Wi, L+) is asserted to be a proper axiom. We make an important assumption that the set of proper axioms of an NPML theory with respect to any security level is consistent. .sup.3 The security level associated with an axiom is not its inherent security level. It is the level at which the axiom is asserted. .sup.4 A level L1 is just less than a level L2 if L2 dominates L1 and there is no security level L* such that L1<L*<L2. For example, the axioms P,P.fwdarw. Q and Q cannot be the proper axioms of any theory with respect to any security level. Table 2 also shows the rules of inference of the NPML theory. The rules of inference are (MP) and Deduction Across Security Levels (DASL). Note that the rule DASL has been introduced specifically to support a multilevel universe. The theorems of an NPML theory are deduced from the axioms using the rules of inference. For example, (F,L) is denoted a theorem of the NPML theory T if either (F,L) is an axiom of T or (F,L) can be derived from the axioms using the rules of inference. That is, there exists a derivation (P1,L1), (P2,L2) . . . (Pn, Ln) such that (Pn,Ln)=(F,L) and for each i (1.ltoreq.i.ltoreq.n) Li.ltoreq.L, (Pi, L) holds, each (Pi,Li) is either an axiom of T or it can be derived from (P1,L1), (P2,L2) . . . (Pi-1,Li-1) using the rules of inference. One can define consistency, soundness, and completeness of an NPML theory as follows. An NPML theory is consistent if there is not a wff F and a security level L such that (F,L) and (F,L) are its theorems. An NPML theory is sound if for every theorem (F,L), F evaluates to true with respect to L. An NPML theory is complete if for every wff F and security level L, if F evaluates to true with respect to L, then (F,L) is a theorem. We obtain the following theorem. Theorem 1 An NPML theory is consistent, sound, and complete. Proof of Theorem 1: The proof uses techniques similar to the ones used to show the consistency, soundness, and completeness of propositional logic given in Mendleson, E. 1979, Introduction to Mathematical Logic, Princeton, N.J.: Van Nostrand.
TABLE 2
__________________________________________________________________________
NPML Theory
__________________________________________________________________________
Logical Axioms
If A, B, and C are NPML wffs whose inherent security levels are L1, L2,
and L3, and
the least upper bound of L1, L2, and 13 is L, and the least upper bound
of L1 and L2 is
L*, then the following are the logical axioms of an NPML theory:
(A .fwdarw. (B .fwdarw. A)), L*
((A .fwdarw. (B .fwdarw. C)) .fwdarw. ((A .fwdarw. B) .fwdarw. (A
.fwdarw. C))), L
(( B .fwdarw. A) .fwdarw.(( B .fwdarw. A) .fwdarw. B)), L*
Part of the Proper Axioms
The security Level 2 dominates the level L1, system-low.
The security level L3 dominates the level L2, system-low.
The security level Ln dominates the level Ln-1, system-low.
Rules of Inference
MP: (B,L) is a direct consequence of (A,L) and (A.sub..fwdarw. G,L),
where L is one of the
security levels L1, L2, L3, . . . Ln, system-low
DASL:
(P,Lj) is a direct consequence of (P,Li), ( P,Lj) and
the security level Lj dominates the level Li, system-low.
__________________________________________________________________________
Section 3 Logic for Multilevel Databases We develop a Nonmonotonic Typed Multilevel Logic, called NTML, to formalize concepts in multilevel databases. We describe NTML language, NTML semantics, and NTML theories in the next three subsections. 3.1 NTML Syntax The syntax of NTML is typed first-order logic with extensions to support multilevel security. We define the primitive symbols, terms, and formulas of the language. Security properties are enforced for each symbol, term, and formula. Any NTML theory based on this syntax must satisfy these properties..sup.5 .sup.5 In section 9 we will formally specify these security properties. Primitive Symbols A type symbol denoted by a string of one or more letters (each letter could be subscripted) has a security level assigned to it. If TS is a type symbol, then its security level (also called the inherent level of TS) is denoted by Level(TS)..sup.6 .sup.6 To simplify the syntax, one possibility would be to assign the lowest security level (i.e., system-low) to all the symbols of the language. Truth values could be assigned with respect to the various security levels. A variable symbol denoted by a string of one or more letters (each letter could be subscripted) has a type and a security level assigned to it. If VS is a variable symbol whose type and level (also called the inherent level of VS) are T and L respectively, then Type(VA)=T and Level(VS)=L. The following security property must be satisfied. SP1: Level(VS).gtoreq.Level(Type(VS)) A constant symbol consists of a type and a security level. If CS is a constant symbol whose type and level (also called the inherent level of CS) are T and L, respectively, then Type(CS)=T and Level(CS)=L. The following security property must be satisfied. SP2: Level(CS).gtoreq.Level(Type(CS)) A function symbol consists of a type and a level. Associated with each function symbol are its specified arguments, the types of its arguments and the type of its value. If a function symbol FS has n arguments, then FS is an n-plane function symbol. If T1, T2, T3 . . . Tn are the types of the arguments and T is the type of the value, then the type of the function symbol FS denoted Type(FS) is T1.times.T2.times.T3.times. . . . Tn.fwdarw.T. The level of FS (also called the inherent level of FS) is denoted by Level(FS). The following security property must be satisfied: SP3: Level(FS).gtoreq.l.u.b.(Level(T1), Level(T2), Level(T3), . . . Level(Tn), Level(T)) A predicate symbol consists of a type and level. Associated with each predicate symbol are the specified arguments and the types of the arguments. If a predicate symbol PS has n arguments, then PS is an n-place predicate symbol. If T1, T2, . . . are the types of arguments, then the type of the predicate symbol denoted by Type(PS) is T1.times.T2.times.T3.times. . . . Tn. The level of PS (also called the inherent level of PS) is denoted by Level (PS). The following security property must be satisfied: SP4: Level(PS).gtoreq.l.u.b. (Level(T1), LevelT2), Level(T3), . . . Level(Tn)) Logical connectives A (AND), V(OR), (NOT),.fwdarw.(IMPLICATION). Quantifiers (FOR EVERY), (THERE EXISTS). Quantification is permitted over variables and types. Terms A variable symbol is a term. The level and type of the term is the same as those of the corresponding variable symbol. A constant symbol is a term. The level and type of the term is the same as those of the corresponding constant symbol. Let F be an n-place function symbol of type T1.times.T2.times.T3.times. . . . Tn.fwdarw.T. If t1, t2, t3, . . . tn are terms of types T1, T2, T3 . . . . Tn, respectively, then F(t1, t2, t3 . . . tn) is a term type T. Let TE be this term. The level of TE also called the inherent level of TE) is denoted by Level (TE). The following security property must be satisfied: SP5: Level(TE).gtoreq.l.u.b.(Level(F), Level(t1), Level(t2), Level(t3), . . . Level(tn)) Atomic Formulas If P is a predicate symbol of type T1.times.T2.times.T3 . . . Tn and t1, t2 t3 . . . tn are terms of type T1, T2, T3, . . . Tn, respectively, then P(t1, t2, . . . .tn) is an atomic formula. Let this formula be denoted by AM. The level of AM (also called the inherent level of AM) us denoted by Level(AM). The following security property must be satisfied: SP6: Level(AM).gtoreq.l.u.b.(Level(P), Level(t1), Level(t2), Level(t3), . . . Level(tn)) Well-Formed Formulas Any atomic formula is a wff. If W is a wff (whose inherent level is Level(W)), then x/t(W) and x/t(W) are also wffs. The following security property must be satisfied: SP7: If L1 and L2 are the security levels (or inherent levels) of the formulas x/t(W) and x/t(W) respectively, then Level(W)=L1=L2. If W1 and W2 are wffs, then W1, W1 .LAMBDA. W2, W1 V W2, W1.fwdarw.W2 are also wffs. Further, if L is the least upper bound of the levels Level(W1) and Level(W2) (where Level(W1) and Level(W2) are the inherent levels of W1 and W2, respectively), then the following security property must be satisfied: SP8: Let W3, W4, W5 and W6 be W1, W1 .LAMBDA. W2m/w1 V W2 and W1.fwdarw.W2 respectively. Let L3, L4, L5 and L6 be the security levels (or inherent levels) of the formulas W3, W4, W5 and W6, respectively. Then L3=Level(W1), L4=L5=L6=L. 3.2 NTML Semantics First we discuss truth assignments to the symbols of NTML, and then we describe interpretations of NTML formulas. Interpretations of NTML Symbols We define truth assignments to NTML symbols, types, constants, functions and predicates under an interpretation with respect to a security level L (note that we do not address variable assignment). Type Symbols: Under and interpretation , associated with each type T with inherent level L*, there is a domain D(T,L) for each security level L.gtoreq.L*. Further, D(T,L) consists of all elements of type T which satisfy the following conditions: (a) L*.ltoreq.Level(x).ltoreq.L. (b) D(T,L') C D(T,L).sup.7 where L' is the security level that is just less than L (that is, them is no security level L* such that L'<L*<L). We denote D(T,L) to be (T,L). .sup.7 C is the subset relationship. Constant Symbols: Let a be a constant symbol of type T. Let Level(a)=L1 and Level(T)=L2 (note that L2.ltoreq.L1). Associated with a is an element a in D(T,L) for each L.gtoreq.L1 (note also that for such an element a to exist L1 must be .ltoreq.L). We denote a by (a, L). Function Symbols: Let F be an n-place function symbol of type T1.times.T2.times.T3.times. . . . Tn.fwdarw.T. Let Level(F)=L'. If L'.ltoreq.L, then associated with F is a mapping F*: D(T1,L).times.D(T2,L).times.D(T3,L).times. . . . D(Tn,L).fwdarw.D(T,:). We denote F* by (f,L). Predicate Symbols: Let P be an n-place predicate symbol of type T1.times.T2.times.T3.times. . . . Tn. Let Level(P)=L'. If L'.ltoreq.L, then associated with P is a relation P* on D(T1,L).times.D(T2,L).times.D(T3,L).times. . . . D(Tn,L) which evaluates to either True or False. We denote P* by (P,L). Interpretations of Terms and Formulas We first define interpretations of variable-free terms with respect to security levels. Next, we define interpretations of variable-free atomic formulas. Finally, we define interpretations of formulas which are either variable-free or closed (note that a formula is closed if it does not have any free variables; a free variable is a variable which is not within the bounds of a quantifier). Let t be a term and Level(t)=L'. If L'.ltoreq.L, then the interpretation of t with respect to security level L is denoted by (t,L). We consider the various possibilities for T. (a) t is the constant symbol a, then (t,L)= (a,L). (b) t is a term F(t1, t2, . . . tn) where t1, t2, . . . tn are variable free terms. Then (t,L)-F*(t1*, t2*, . . . tn*) where F*= (F,L), t1= (t1,L), t2= (t2,L), . . . tn= (tn,L). Next we define interpretations of variable-free atomic formulas and formulas which are either variable free or closed. Let A be the atomic formula with specification P(t1, t2, . . . tn) where P is a predicate symbol and t1, t2, . . . tn are variable free terms. Let Level(A)=L'. Then if L'.ltoreq.L, (A,L)=P*(t1*, . . . tn*) where P*= (P,L), t1*= (t1,L), t2*= (t2,L), . . . tn*= (tn,L). Let F and G be variable-free formulas. The interpretation of F, F.LAMBDA. G, F V G, F.fwdarw.G are defined as follows: (F,L)= (F,L) (F.LAMBDA. G,L)= (F,L) .LAMBDA. (G,L) (F V G,L)= (F,L) V (G,L) (F.fwdarw.G,L)=False if (F,L)=True and (G,L)=False=True if otherwise. Let F be a formula with free variables x1, x2, . . . xn. Let T1, T2, . . . Tn be the types of x1, x2, . . . xn, respectively. Then, the interpretation of x/T(F) and x/T(F) with respect to L (where x is the tuple x1, x2, . . . xn and T is the tuple T1, T2, . . . Tn) is defined as follows: (x/T(F),L)=True if there is an element (a1, a2, . . . an) which is a member of D(T1,L).times.D(T2,L).times. . . . D(Tn,L) such that (F(a1, a2, . . . an),L) evaluates to True. Otherwise (x/T(F),L) evaluates to False. (x/T(F),L)=True if for every element (a1, a2, . . . an) which is a member of D(T1,L).times.D(T2,L).times. . . . D(Tn,L), (F(a1, a2, . . . an),L) evaluates to True. Otherwise (x/T(F),L)evaluates to False. 3.3 NTML Theory As in any logic theory, an NTML theory has a set of logical axioms, a set of proper axioms, and a set of inference rules. The logical axioms of an NTML theory are analogous to those of first-order logic with equality. These axioms are shown in Table 3. Every NTML theory has a distinguished type symbol SL. Associated with the distinguished type symbol SL under the interpretation at level L is the domain D(SL,L). D(SL,L) consists of the security levels which are visible to users at level L. For each L, if L1 and L2 are elements of D(SL,L), then either L1.ltoreq.L2 or L2.ltoreq.L1. Furthermore, the level of the type symbol SL is system-low (which is the lowest security level supported by the system under consideration).
TABLE 3
__________________________________________________________________________
NTML Theory
__________________________________________________________________________
Logical Axioms
If A, B, and C are NPML wffs whose inherent security levels are L1, L2,
and L3, let
the inherent security levels of the variable xi be L4, the term t be L5,
the variable x1 be
L6, the variable x be L7 and the variable y be L8.
Let L' = l.u.b.(L1, L2, L3), L" = l.u.b.(L1,L2), L* = l.u.b.(L1,L4,L5),
L+ =
l.u.b.(L1,L2,L4), L = l.u.b.(L1,L7,L8)
Then, the following are the logical axioms of an NPML theory:
A1:
(A .fwdarw. (B .fwdarw. A)), L"
A2:
((A .fwdarw. (B .fwdarw. C)) .fwdarw. ((A .fwdarw. B) .fwdarw. (A
.fwdarw. C))), L'
A3:
(( B .fwdarw. A) .fwdarw. (( B .fwdarw. A) .fwdarw. B)), L"
A4:
( xi/Ti A(xi) .fwdarw. A(t)), L* where t is a term free for xi in
A(xi)
A5:
( xi/Ti (A .fwdarw. B) .fwdarw. (A .fwdarw. xi/Ti B)), L + if A is a
wff containing no free
occurrences of xi
A6:
( x1/T1 xi = x1) < L6
A7:
(x = y .fwdarw. (A(x,x) .fwdarw. A(x,y)), L
Part of the Proper Axioms
(Dominate(Secret, Unclassified)), Unclassified
(Dominate(TopSecret, Secret)), Unclassified
( L1,L2,L3 ESL (Dominate(L1,L2) .LAMBDA.Dominate(L2,L3)
.fwdarw. Dominate(L1,L3))),
Unclassified
( L1,L2 ESL (Dominate(L1,L2) L3 (L3 .noteq. L1 .LAMBDA.L3 .noteq. L2
.LAMBDA. Dominate(L1,L3)
.LAMBDA.Dominate(L3,L2)) .fwdarw. Just-less-than (L2,L1)), Unclassified
Rules of Inference
MP: (Q,L) is a direct consequence of (P,L) and (P .fwdarw. Q,L)
GEN: ( x/T(P),L) is a direct consequence of (P,L)
DASL: (P,L2) is a direct consequence of (P,L!), Just-less-than (L1,L2)
and ( ,P,L2)
__________________________________________________________________________
Every NTML theory has a set of constant symbols for every security level considered. Each constant is of type SL. Two predicate symbols "Dominate" and "Just-less-than" (both having security level system-low) are defined on the variable and constant symbols of type SL. For examples, let D(SL, Unclassified)--{Unclassified, Secret, and Top Secret}. Let the distinguished individual constants be Unclassified, Secret, and Top Secret whose assignments are the levels Unclassified, Secret, and Top Secret respectively. Then, Table 3 shows part of the proper axioms of the theory. Next to each formula we state a security level. This is the level at Note (a) Next to each formula we state a security level. This is the level at which the formula is true under all interpretations. If we do not specify a security level next to a formula, then the formula is assumed to be true with respect to all security levels under all interpretations (see the ensuing discussion of the proper axioms). (b) For convenience we refer to the constants Unclassified, Secret, and Top Secret by their respective assignments Unclassified, Secret, and Top Secret. The remaining proper axioms of an NTML theory are constructed from a subset of the NTML formulas, as follows: Let {W1, W2, . . . Wn} be a subset of NTML formulas. Let L1, L2, . . . Ln be the (inherent) security levels of W1, W2, . . . Wn respectively. Wi is explicitly asserted to be a proper axiom of an NTML theory at level L, which is denoted by (Wi,L), if (a) Li.ltoreq.L. (b) Wi is true in all interpretations with respect to L. That is, (Wi,L) is true for all interpretations . (c) It is not the case that (Wi,L*) is an axiom where L* is just less than L. Note that if L is just less than L+ and if Wi is not true with respect to L+, then (WI,L+) is a proper axiom of the theory. We assume that a set of proper axioms with respect to any security level is consistent. That is, the formulas (P,L), (P.fwdarw. Q,L) and (Q,L) cannot be the proper axioms of a theory for any security level L. Note that in the discussion on the syntax of NTML, each formula was assigned a security level. This level is the inherent security level of the formula. For example, in the database context the inherent security level of a formula is the security level of the schema which is represented by that formula. So, if (W,L) is a proper axiom of an NTML theory and if the inherent security level of W is L* , then the following security property must be satisfied: SP9: L*.ltoreq.L Note (a) In the database context, this means that the security level of the schema must be dominated by the security level of any tuple in the corresponding relation. (b) The security properties SP1 to SP9 that we have described are not part of the NTML theory. They can be regarded as axioms of a metatheory. That is, the security properties are enforced within the metatheory. This point will be addressed in section 6. (c) Each formula which is an axiom is followed by a security level. This is the level with respect to which formula is true under all interpretations. If no level is specified, then the formula is true with respect to all security levels under all interpretations. The rules of inference of an NTML theory are Modus Ponens (MP), Generalization (GEN) and Deductions Across Security Levels (DASL). These rules are also shown in Table 3. The rule DASL is needed for a multilevel environment. One can define theorems of an NTML theory as in any other logic. For example, (F,L) is denoted a theorem of the NTML theory T if either (F,L) is an axiom of T or (F,L) can be derived from the axioms using the rules of inference. That is, there exists a derivation (P1,L1), (P2,L2) . . . (Pn,Ln) such that (Pn, Ln)=(F,L) and for each i (1.ltoreq.i.ltoreq.n) Li.ltoreq.L, each (Pi,Li) is either an axiom of T or it can be derived from (P1,L1), (P2,L2) . . . (Pi-1,Li-1) from the rules of inference. Note (a) Like an axiom, a theorem is also followed by a security level. (b) In the case of a formula which is either an atom or the negation of an atom, we include the security level as an argument for convenience. For example, (P(t1, t2, . . . tn),L) and (P(t1, t2, . . . tn), L) are denoted by (P(t1, t2, . . . tn, L)) and (P(t1, t2, . . . tn, L)), respectively. One can define consistency, soundness, and completeness of an NTML theory as follows. An NTML theory is consistent if there is not a wff F and a security level L such that (F,L) and (F,L) are its theorems. An NTML theory is sound if for every theorem (F,L), F evaluates to true with respect to L. An NTML theory is complete if for every wff F and security level L, if F evaluates to true with respect to L, the (F,L) is a theorem. We have the following theorems that show the consistency, soundness, and completeness of an NTML theory. Theorem 2 An NTML Theory is Consistent. Proof of Theorem 2: The proof of this theorem uses a technique similar to the proof of the consistency of first-order logic. We refer the reader to Mendleson, E. 1979, Introduction to Mathematical Logic, Princeton, N.J.: Van Nostrand (see proposition 2.2, page 62) for the proof of the consistency of first-order logic. Theorem 3 An NTML Theory is Sound. Proof of Theorem 3: This proof follows from the proofs of the following lemma. The proof of this lemma uses techniques similar to the proof of the soundness of first-order logic given in Mendleson, E. 1979, Introduction to Mathematical Logic, Princeton, N.J.: Van Nostrand (see propositions 2.1-2.7, pages 61-65). Lemma 4 (a) If P and P.fwdarw.Q are NTML wffs which are true under an interpretation with respect to L, then so is Q. (b) If the NTML formula P is true under an interpretation with respect to L, then so is "x/T(P). (c) If the NTML formula P is true under an interpretation with respect to L1 and just-less-than (L1,L2) is true under with respect to all security levels, and it is not the case that ( P) is true under with respect to L2, the P is true under with respect to L2. Theorem 5 An NTML Theory is Complete. Proof of Theorem 5: The proof of this theorem uses techniques similar to the proof of the completeness of first order predicate calculus (see Mendleson, E. 1979, Introduction to Mathematical Logic, Princeton, N.J.: Van Nostrand, see proposition 2.13, page 69). Section 4 Viewing Multilevel Databases Through NTML In this section we provide an overview of the perceived multilevel universe and then describe three approaches to represent it, based on NTML. This work follows from the work reported in Nicolas, J., and H. Gallaire, (Editors: H. Gallaire and J. Minker), 1978, Database: Theory vs Interpretation, New York: Plenum Press, where the first-order logic is used to formalize nonmultilevel databases. 4.1 Multilevel Universe A state of the universe can be regarded as a set of elements linked together by functions or relations. Since functions can be regarded as special kinds of relations, we only consider relations in our discussion. We distinguish between the actual universe and the perceived universe. The actual universe is the real universe. The perceived universe is the part of the universe that is involved in the particular application under consideration. We refer to the perceived universe as the universe. Since elements and relations are all assigned security levels, the universe is multilevel. One can partition this universe into worlds corresponding to each security level. The worlds are the views of the universe by users at corresponding levels. For example, a Secret world in the universe is the world perceived by those cleared at the Secret level. Information in a multilevel universe is the knowledge of the truth value of a statement with respect to some security level. A statement could be an elementary fact such as "John's salary is 30K,.sup.8 it could be a schema such as, "The attributes of employee are SS#, name and salary," or it could be a law such as, "All salary values must be positive." The statement, "John's salary is 30K," could be true with respect to the Unclassified level, and it could be false with respect to the Secret level. Therefore, in the Unclassified world, "John's salary is 30K" represents true information while in the Secret world "John's salary is 30K" represents false information. .sup.8 By elementary fact, we mean a relation applied to the elements of the universe. The elementary facts, the schema and the laws are only a subset of the information of the perceived universe. This is known as explicit information. The other kind of information is implicit information which is the information derived from the explicit information. Note (a) A general law is in fact an integrity constraint. We use the terms general law and integrity constraint interchangeably. (b) Security constraints that assign security levels to the data can also be regarded as general laws, (i.e., security constraints are also integrity constraints). (c) A general law can either be an integrity rule or a derivation rule. An integrity rule must be satisfied by the database. A derivation rule is used to derive new information from the database data and any real-world information. Representing negative information has been a subject of much research. Negative information can also be either explicitly or implicitly specified. For example, the statement, "It is not the case that John's salary is 60K," explicitly states that John's salary is not 60K. Implicit representation of negative information can be derived by using certain rules. These rules include the rules of inference associated with the theory as well as other rules of inference such as the uniqueness axiom and the completeness axiom. For example, if there is a rule that "John's salary is unique" and a fact that "John's salary is not 30K," then one could deduce the negative information that "John's salary is not 60K." Completeness axioms specify all the values that a particular entity can take. For example, from the following axiom, "John's salary is either 30K or 60K," one can deduce that John's salary is not 50K. A multilevel universe can either be finite or infinite depending on whether the elements associated with it are finite or infinite. Further, one can regard a universe to be either open or closed. If the universe is closed, then any fact that does not evaluate to true in the universe is assumed to be false and its negation is assumed to be true. If it is an open universe, then such an assumption is not made. That is, one cannot assume negative information unless it is explicitly stated or one can deduce it. In the next three subsections, we describe three approaches to representing the perceived universe. All approaches use NTML Logic as a framework. 4.2 Perceived Multilevel Universe as an NTML Theory In this approach, called the proof theoretic approach, the perceived multilevel universe is represented as a NTML theory. The NTML theory is defined as follows: Its constants and predicates are respectively the elements and relations associated with the universe. Its proper axioms are defined as follows: If an elementary fact, schema, or a general law is true in all interpretations with respect to a security level L, and it is not the case that this fact or law is true in all interpretations with respect to the security level L* where L* is just less than L, then the fact or law is assigned the security level L and is taken to be a proper axiom of the theory. If it is not the case that an elementary fact, schema, or law is true in all interpretations with respect to a security level L, and this fact or law is true in all interpretations with respect to a security level L* where L* is just less than L, then the negation of this fact or law is either taken to be a proper axiom of the theory, or one should be able to derive the negation from the proper axioms. The implicit information is the theorems of the NTML theory. The actual multilevel universe is an interpretation of the NTML theory. Whether the actual universe is a model of the theory depends on how accurately the actual universe fits the perceived universe. The approach is illustrated in FIG. 1. Query Evaluation A query posed by a user at security level L is expressed as a wff of the NTML theory. There are two types of queries: closed and open. A closed query is a wff which is closed (i.e., with no free variables), and an open query is one with at least one free variable. Query evaluation amounts to theorem proving. For example, let W be the wff which corresponds to a query Q posed by a user at level L. Then evaluating the query Q amounts to proving that (W,L) is a theorem of the NTML theory..sup.9 .sup.9 It should be noted that proof procedures for NTML logic are yet to be developed. For the discussion given in this section we assume that the NTML rules of inference are used for deductions. For example, let A and B be the wffs which correspond to the respective queries Q1 and Q2. Further assume that Q 1 is closed and Q2 is open and that the queries are posed by a user at level L. The solution to Q 1 is a Yes or No answer. The answer is Yes if (A,L) is a theorem of the NTML theory. For the open query Q2, a tuple in the response is obtained as follows: Substitute appropriate elements associated with the NTML theory for the free variables of B. Let the resulting wff be B*. If (B*,L) is a theorem of the theory, then the tuple which is formed from the elements substituted to form B* is included in the response. We illustrate both open and closed query evaluation associated with this first approach with examples. We assume that there are only two security levels: Secret and Unclassified. The Secret level dominates the Unclassified level. The proper axioms of the NTML theory are shown in Table 4. PA1 and PA2 are schemas for the relations EMP and SEN-EMP, respectively. Both schemas have the Unclassified level associated with them as they are both true at the Unclassified level. PA1 states that if EMP(X,Y,Z) is true in an interpretation with respect to L, then the types of X, Y, and Z are SS#, NAME and SALARY, respectively. Note that since EMP(X,Y,Z) is an atomic formula, we include the level L as one of its arguments. PA2 states that if SEN-EMP(X) is true under an interpretation with respect to L, then the type of X is NAME. PA3, PA4, and PA5 are atomic formulas with no variables. So are the axioms PA11-PA14. Although it is not explicitly specified, axioms PA6-PA9 assume universal quantification over all variables X, Y, Z, L (with or without subscripts). PA6 is a law which defines all those who earn more than 30K or more as senior employees. PA7 is a law which states that all salaries must be less than or equal to 40K. This law is true only with respect to the Unclassified level. Note that PA10 negates this law at the Secret level. PA8 is the equivalent to the primary key constraint in relational databases. PA9 states that if two tuples exist with the same SS# at different security levels, then the negation of the tuple at the lower security level is true with respect to the higher level. Queries are expressed as formulas of NTML theories at the level of the user who posed the queries. Consider the following queries: Q1: EMP(000, John, 20K), Unclassified Q2: EMP(111, James, 60K), Unclassified Q3: X EMP(X,Y,Z), Unclassified Q4: EMP(000, John, 20K), Secret Q5: EMP(111, James, 35K), Secret The five queries are expressed as formulas of NTML whose proper axioms are given in Table 4. Query evaluation amounts to theorem proving. Note that the queries Q1, Q2, Q4, and Q5 are closed queries while Q3 is open. Also, the queries Q4 and Q5 are posed by Secret users while Q 1, Q2, and Q3 are posed by Unclassified users. An attempt to prove that (EMP(000, John, 20K), Unclassified) is a theorem of NTML will be successful as PA3 is a proper axiom of the theory. Therefore, the answer to query Q1 is "Yes." For the query Q2, since its negation (i.e. ,(EMP(111, James, 60K), Unclassified) can be proved to be a theorem of NTML, the answer is "No." The answer to query Q3 consists of all the (Y,Z) pairs such that there is some X for which (EMP(X, Y, Z), Unclassified) is a theorem of NTML. From the axioms it can be seen that the answer to Q3 is the set of pairs {(John, 20K), (James, 35K), (Mary, 15K)}. The answer to the query Q4 is, "No." This is because from the axioms PA3, PA9, and PA11, it can be deduced that (EMP(000, James, 20K), Secret) is a theorem. The answer to the query Q5 is "Yes" as it can be derived from the axiom PA4 and the rule of inference DASL.
TABLE 4
______________________________________
Database as an NTML Theory
______________________________________
PA1: ( X/Tl, Y/T2, Z/T3, L/SL EMP(X,Y,Z,L) .fwdarw.
T1 = SS# T2 = NAME T3 = SALARY),
Unclassified
PA2: ( X/T, L/SL SEN-EMP(X,L) .fwdarw. T = NAME),
Unclassified
PA3: .fwdarw. EMP(000, John, 20K, Unclassified)
PA4: .fwdarw. EMP(111, James, 35K, Unclassified)
PA5: .fwdarw. EMP(222, Mary, 15K, Unclassified)
PA6: (EMP(X,Y,Z,Unclassified) Z .gtoreq. 30K .fwdarw.
SEN-EMP(Y, Unclassified)), Unclassified
PA7: (EMP(X,Y,Z,Unclassified) .fwdarw. Z .ltoreq. 40K), Unclassified
PA8: (EMP(X, Y1, Z1, L), EMP(X, Y2, Z2, L)
Y1 = Y2 Z1 = Z2), Unclassified
PA9: (EMP(X, Y1, Z1, L1) EMP(X, Y2, Z2, L2)
(Y1 = Y2 Z1 = Z2) Just-less-than(L1, L2)
.fwdarw. EMP(X,Y1,Z1,L2)), Unclassified
PA10: ( X/SS#, Y/NAME, Z/SALARY, L/SL
(EMP(X,Y,Z,L) .fwdarw. Z .ltoreq. 40K)), Secret
PA11: .fwdarw. EMP(000, John, 50K, Secret)
PA12: .fwdarw. SEN-EMP (George, Secret)
PA13: .fwdarw. SEN-EMP (Harry, Unclassified)
PA14: .fwdarw. SEN(333, Jane, 70K, Secret)
______________________________________
Advantages and Drawbacks This approach permits disjunctive information to be represented. For example, one can express the information that "either the salary of John is less than 50K or John is a senior employee." However, this approach does have certain drawbacks. One is that in the database context, the elementary facts that have to be represented may be very large. Therefore, efficient proof techniques have to be developed to handle the large amounts of facts and rules. Furthermore, negative information in this approach has to be explicitly stated. This makes the database even larger. Another drawback to this approach is the verification of the general laws. That is, the general laws are treated as derivation rules with this approach. That is, they are not treated as integrity rules where they could be validated during database updates. By treating the general laws as derivation rules, each time some information is added at a security level L, the proper axioms of the NTML theory may have to be modified. As a result, a new NTML theory is produced. Therefore, the consistency of the new theory has to be checked. If the new theory is inconsistent, then appropriate actions have to be taken. Possible ways to handle exceptions are: (a) Do not permit the update. (b) Modify the axioms so that consistency is maintained. (c) Adopt the technique proposed in Kowalski, R. A. (Editors: H. Gallaire and J. Minker), 1978, "Logic for Data Description," Logic and Databases, New York: Plenum Press, where a new predicate INCONSISTENT is added to the NTML theory. This new predicate handles the exceptional situations. 4.3 Multilevel Database as a Model In the second approach, called the model theoretic approach, the set of elementary information is considered to be an interpretation of an NTML theory. This is the approach that is implicitly followed in a traditional relational database system. The proper axioms of the NTML theory are the general laws. All general laws are used as integrity rules and not as derivation rules. That is, since certain general laws are the proper axioms of an NTML theory these laws must be satisfied by the multilevel database. The approach is illustrated in FIG. 2. Query Evaluation Query evaluation in this approach amounts to checking the satisfiability of the formulas which represent the multilevel database. That is, for a closed query posed by a user at level L, the truth or falsity of the corresponding formula is checked against the portion of the multilevel database visible at security level L. For an open query at level L, the set of values associated with the database at level L which satisfies the formula, when substituted for the free variables, will form the response. We illustrate both open and closed query evaluation associated with this second approach with examples. Consider the multilevel database shown in Table 5. The schema S1, S2, and the general laws IR1-IR5, expressed as NTML formulas, are treated as integrity rules. That is, whenever the database is updated, the integrity rules must be satisfied. For example, whenever a name is included in the relation SEN-EMP, it must be first verified that the name is an employee and that the corresponding salary is greater than or equal to 30K. Also, whenever an Unclassified tuple is modified or entered into the database, the salary value specified must be less than or equal to 40K. IR3 is the primary key constraint. Also note that all the variables in IR1-IR3 are implicitly assumed to be universally quantified. IR1 is negated at the Secret level in IR4 as IR1 does not hold at the Secret level. For the laws IR2 and IR3, we do not duplicate them at the Secret level. This is because all of the laws are axioms of an NTML theory and therefore the NTML rules of inference can be used to deduce new laws at the Secret level. For example, from IR2 and the rule DASL, we have the following law: (EMP(X,Y,Z,L) Z.gtoreq.30K.fwdarw.SEN-EMP(Y,L)), Secret. IR5 is the integrity rule which ensures that tuples are duplicated at the appropriate levels..sup.10 It should also be noted that since the integrity rules and the schemas are axioms of an NTML theory, the rules of inference could be used to deduce new laws. IR5 states that if there is a tuple in the database at level L1, and if a second tuple in the database with the same primary key as the first one does not exist at a level L2 where L1 is just less than L2, then the first tuple in the database must also exist at level L2. Note that although IR5 is classified at the Unclassified level, the rule DASL will ensure that it is valid at all security levels. .sup.10 Note that, in this approach, NTML rules of inference cannot be used to infer new tuples.
TABLE 5
__________________________________________________________________________
Multilevel Database as a Model
__________________________________________________________________________
S1:
(.sub. X/T1, Y/T2, ZT3, L/SL EMP(X,Y,Z,L) .fwdarw.
T1 = SS#.sub..LAMBDA. T2 = NAME.sub..LAMBDA. T3 = SALARY),
Unclassified
S2:
(.sub. X/T, L/SL SEN-EMP(X,L) .fwdarw. T = NAME), Unclassified
IR1:
(EMP(X,Y,ZL) .fwdarw. Z .gtoreq. 40K), Unclassified
IR2:
EMP(X,Y,Z,L) .sub..LAMBDA. Z .gtoreq. 30K .fwdarw. SEN-EMP(X),
Unclassified
IR3:
(EMP(X,Y1,Z1,L).sub..LAMBDA. EMP(X,Y2,Z2,L) .fwdarw.
Y1 = Y2.sub..LAMBDA. Z1 = Z2), Unclassified
IR4:
(.sub. X/SS#, Y/NAME, Z/SALARY, L/SL (EMP(X,Y,Z,L) .fwdarw.Z
.ltoreq. 40K)), Secret
IR5:
(.sub. X1/SS#, Y1/NAME, Z1/SALARY, L1/SL, L2/SL (EMP(X1,Y1,Z1,L1).sup..
LAMBDA.
.sub. Y2/NAME, Z2/SALARY (EMP(X1,Y2,Z2,L2).sub..LAMBDA. (Y1 .noteq.
Y2 V Z1 .noteq. Z2))
.sup..LAMBDA. Just-less-than(L1, L2) .fwdarw. EMP(XI, Y1, Z1, L2))),
Unclassified
__________________________________________________________________________
Relation: EMP
SS# Name Salary
Level
__________________________________________________________________________
000 John 20K Unclassified
111 James 35K Unclassified
222 Mary 15K Unclassified
000 John 50K Secret
111 James 35K Secret
222 Mary 15K Secret
333 Jane 70K Secret
__________________________________________________________________________
Relation: SEN-EMP
Name
Level
__________________________________________________________________________
James
Unclassified
John
Secret
Jane
Secret
__________________________________________________________________________
We are, however, faced with the problem of finding a mechanism to enforce IR5. For example, consider the situation where the tuple (444, Paul, 30K, Unclassified) is added to the database. All of the integrity rules are satisfied at the Unclassified level and, therefore, the multilevel database is a model of the axioms at this level. However, at the Secret level, the Secret counterpart to IR5 (obtained via DASL) is not satisfied if (444, Paul, 30K, Secret) is not the database. Possible solutions to this problem are: (a) Reject the update if (444, Paul, 30K, Secret) is not in the database; this has a problem in terms of security. That is, if an Unclassified user has requested the update (which will usually be the case), then by rejecting the update this user will know that the tuple does not exist at the higher level since he can read IR5. (b) Enter the tuple (444, Paul, 30K, Secret) into the database. The level of the subject (who acts on behalf of the user) who attempts this update will be determined by the Security Policy enforced by the system (usually this subject will operate at the Secret level). (c) Permit the update, but use the predicate INCONSISTENT (as described in Kowalski, R. A. (Editors: H. Gallaire and J. Minker), 1978, "Logic for Data Description," Logic and Databases, New York: Plenum Press to handle the exceptional situation. That is, some additional laws are introduced at the Secret level in order to handle the exceptional situations (note that if the laws are updated, then a new theory is obtained, which means the consistency of the new theory has to be checked). Next we describe query evaluation with this approach. Consider the following queries posed by a Secret user. Q6: EMP(000, John, 20K) Q7: EMP(000, John, 60K) Q8: SEN-EMP(X) Note that Q6 and Q7 are open queries while Q8 is a closed query. The answer to Q6 is "No" at the tuple (000, John, 20K, Secret) is not specified in the database. The answer to Q7 is "Yes" as the tuple (000, John, 60K, Secret) is specified in the database. The answer to the query Q8 is the set (John, Jane, James). Advantages and Drawbacks By treating all the general laws as integrity rules, the NTML theory does not change unless the general laws themselves evolve. Also negative information does not have to be explicitly specified. Anything that is not specified in the database is assumed to be false. A drawback with this approach is that tuples have to be duplicated at different security levels. This is because none of the general laws are used as derivation rules and, therefore, new tuples cannot be deduced from existing tuples. Therefore, one cannot deduce that (111, James, 35K, Secret) from (111, James, 35K, Unclassified). Therefore, the tuple (111, James, 35K, Secret) has to be explicitly specified in the database (see the discussion on the rule IR5 given earlier). When data is inserted, deleted or modified, the interpretation is also changing. Therefore, one has to verify that the new interpretation is still a model of the NTML theory. That is, when there is some change made by a user at level L, then it has to be ensured that at all security levels L* (L*.gtoreq.L), the multilevel database is still a model of the theory. In general, it is not necessary to verify each law when the database is updated. That is, only certain laws may be falsified as a result of the update. It remains to be investigated as to whether the techniques developed for integrity checking in relational databases (see, for example, the work in Nicolas, J., and K. Yazdania, (Editors: H. Gallaire and J. Minker), 1978, "Integrity Checking in Deductive Databases," Logic and Databases, New York: Plenum Press; Nicolas, J., "Logic for Improved Integrity Checking in Relational Databases," 1982, Acta Informatica, Vol. 18, No. 3, pp. 227-253) can be adapted for multilevel databases..sup.11 .sup.11 Integrity constraints are discussed in section 7. 4.4 Integrated Approach This approach is a mixture of the two previous approaches for handling the general laws. That is, some of the general laws are taken as integrity rules and the others as derivation rules. The database has two components. One is the explicit extension which is a model of an NTML theory whose proper axioms are the general laws which are regarded as integrity vales. The other is the implicit extension which is the set of all tuples which are derived from the explicit extension by virtue of the general laws which are used as derivation rules. Query Evaluation Query evaluation at a level L depends on whether relations are defined explicitly or implicitly. Two ways to treat relation definitions have been identified. They are as follows: (a) A Relation is either defined explicitly or implicitly. If it is defined implicitly, then it is defined in terms of the explicit relations. Note that each definition is a NTML formula and therefore has a security level attached to it. When a query is posed at level L, all references to implicit relations in the query formula are replaced by explicit relations according to the definitions of the implicit relations at level L. The modified query is evaluated against the explicit extension of the multilevel database. Note that the treatment of views in relational database systems takes this approach. (b) In the second approach, it is not necessarily the case that a relation is defined either implicitly or explicitly. That is, it is impossible for part of the relation to be defined explicitly and part of it implicitly. Therefore, evaluating a query at level L amounts to deducing facts which are tuples at level L. That is, evaluating a query R(x1, x2, x3 . . . xn), expressed by (R(x1, x2, . . . xn), L), amounts to obtaining all proofs of the formula ( x1/t1, x2/t2, . . . xn/tn R(x1, x2, . . . xn), L). An advantage of the second treatment of relation definition is that in the real world one could know various relationships without having to know exactly how these relationships were formed. That is, it is possible for people to be senior employees without actually having to make a salary of more than 30K. We illustrate query evaluation with an example: Suppose a Secret user poses a query to retrieve all senior employees. This query is expressed by the NTML formula: SEN-EMP(X), Secret. If the relation SEN-EMP is defined only explicitly (see Table 6), the answer to the query will be the names {John, Jill}. (Note that in Table 6, S1 and S2 are the schema rules, DR1 is a derivation rule which defines JUN-EMP, and IR1 and IR2 are integrity rules.) If the relation SEN-EMP is defined only implicitly (Table 7), the query is then modified to the following formula: ( X,Z EMP(X,Y,Z).LAMBDA. Z>30K, Level=Secret). The answer to the query is the set {John, James, Jane}. (Note that in Table 7, S1 is a schema rule, DR1 is a derivation rule which defines SEN-EMP, and IR1-IR4 are integrity rules.) If, however, a relation could be defined explicitly as well as implicitly (see table 8), then the answer to the query is the set {John, James, Jane, Jill}. (Note that in table 8, S1 and S2 are schema rules, DR1 is a derivation rule, and IR1-IR4 are integrity rules.) Advantages and Drawbacks By using some of the general laws as derivation rules, not all of the information has to be made explicit. This could save storage space. Also, with this approach it is possible to express general information about the perceived world without having to split it into sets of elementary information. Examining Tables 6, 7, and 8 we see that by implicitly defining the relation SEN-EMP, certain tuples need not be explicitly expressed in the database. This could save storage space. As another example, consider the integrity rule IR2 expressed in Table 6 (note that this is in fact the rule IR5 in Table 5). This rule ensures that tuples are duplicated at different security levels. In the third approach, such duplication can be eliminated by using this rule as a derivation rule as given in Table 9. Note that the derivation rule is in fact the inference rule DASL. However, this rule has to be explicitly stated if new tuples are to be deduced from existing tuples. By using IR2 in Table 6 as a derivation rule as shown in Table 9, some tuples in the relation EMP need not be explicitly specified.
TABLE 6
__________________________________________________________________________
Relation Defined Explicitly
__________________________________________________________________________
S1: (.sub. X/T1, Y/T2, Z/T3, L/SL EMP(X,Y,Z,L) .fwdarw.
T1 = SS# .sub..LAMBDA. T2 = NAME .sub..LAMBDA. T3 = SALARY),
Unclassified
S2: (.sub. X/T, L/SL SEN-EMP(X,L) .fwdarw. T = NAME), Unclassified
DR1:
(EMP(X,Y,Z,L).sub..LAMBDA. Z < 20K .fwdarw. JUN-EMP(Y,L)),Unclassified
2
IR1:
(EMP(X,1,Z1,L).sub..LAMBDA. EMP(X,Y2,Z2,L) .fwdarw.
Y1 = Y2.sub..LAMBDA. Z1 = Z2), Unclassified
IR2:
(.sub. X1/SS#,Y1/NAME,Z1/SALARY,L1/SL,L2/SL (EMP(X1,Y1,Z1,L1).sup..LAM
BDA.
Y2/NAME, Z2/SALARY (EMP(X1,Y2,Z2,L2) .sup..LAMBDA. (Y1 .noteq. Y2
V Z1 .noteq. Z2))
.sub..LAMBDA. Just-less-than(L1, L2) .fwdarw. EMP(X1, Y1, Z1, L2))),
Unclassified
__________________________________________________________________________
Relation: EMP
SS# Name Salary
Level
__________________________________________________________________________
000 John 20K Unclassified
111 James 35K Unclassified
222 Mary 15K Unclassified
000 John 50K Secret
111 James 35K Secret
222 Mary 15K Secret
333 Jane 70K Secret
__________________________________________________________________________
Relation: SEN-EMP
Name
Level
__________________________________________________________________________
Jill
Unclassified
John
Secret
David
Secret
__________________________________________________________________________
Negative information is handled by assuming that anything that is not either explicitly or implicitly specified in the database is assumed to be false. However, one has to be careful so as to not introduce wrong information. That is, one has to forbid the derivation of positive information from negative information as negative information can be regarded as information that is not positively sure. That is, the following general law (L) cannot be used as derivation rules: (L): (A1 . . . An.fwdarw.B1 V B2 V B3 . . . V Bp) L; where p>1. Note that the above law can be rewritten as: (A1 A2 A3 . . . An B1 B2 B3 . . . Bp-1.fwdarw.Bp),L. The negative information B1, B2, . . . Bp-1 is included in the derivation of Bp. Since this negative information could be obtained by default, (i.e., the negative information could be obtained by the absence of positive information), one cannot be completely certain of the validity of the negative information. As a result, one cannot be certain of the validity of any information derived from negative information. If p=1 in the general law (L), then all information is deduced from positive information. 4.5 A Note on Element Level Classification In our treatment of multilevel databases we have assumed tuple level classifications. That is, an entire tuple is assigned a security level. In reality, however, it is possible for the elements of a tuple to be classified at different security levels. This type of classification is known as element level classification. Element level classifications can be achieved by the use of security constraints (which are part of the general laws). Table 10 shows examples of security constraints which provide element level classifications. The first constrain SC1 classifies all salary values in a relation EMP at the Secret level. The second constraint SC2 classifies a name value to be Secret if the corresponding salary value is greater than 50K. In the third constraint SC3, the name and salary values taken together are classified at the Secret level..sup.12 We now discuss how NTML syntax and semantics could be extended to accommodate the enforcement of security constraints. .sup.12 Note that the constraints SC1, SC2, and SC3 are not expressed as NTML formulas in table 10. In table 11 they are expressed as NTML formulas. The extensions to NTML that are required are in the area of supporting NULL values. We discuss a possible way to support NULL values. It should be noted that more research needs to be done in this area in order to find sound and complete query evaluation algorithms based on the extended NTML logic. Even in ordinary relational database systems based on horn clause logic, there are many research issues that need to be investigated in supporting NULL values, although the work of Reiter shows much promise, Reiter, R., (Editors: M. Brodie and J. Mylopoulos), 1984, "Towards a Logical Reconstruction of Relational Database Theory," Conceptual Modelling; Perspectives front Artificial Intelligence, Databases and Programming Languages, Heidelberg, Germany: Springer Verlag; Reiter, R., April 1986, "A Sound and Sometimes Complete Query Evaluation Algorithm," Journal of the Association for Computing Machinery, Vol. 33, No. 2, pp. 349-370. For each NTML theory we introduce a distinguished constant symbol NULL which is multi-typed. That is, this symbol can take any type as its value. The security level of this symbol is system-low. Each domain D(T,L)(where T is a type and L is a security level) has an element called NULL which is the assignment of NULL under any interpretation with respect to L.
TABLE 10
______________________________________
Security Constraints
______________________________________
SC1: EMP(X,Y,Z) .fwdarw. Level(Z) = Secret
{All salary values in the relation EMP are Secret.}
SC2: EMP(X,Y,Z) Z .fwdarw. 50K .fwdarw. Level(Y) =
Secret
{A name value in the relation EMP is Secret if the
corresponding salary value is less than or equal to
50K.}
SC3: EMP(X,Y,Z) .fwdarw. Level(Together(Y,Z) =
Secret
{A name value and its corresponding salary value
taken together in the relation EMP are Secret.}
______________________________________
Table 11 shows how the constraints SC1, SC2, and SC3 are expressed as NTML formulas. Note that these constraints are enforced at the Unclassified level. Therefore, the security levels assigned to the corresponding NTML formulas are Unclassified. At the Secret level these formulas are negated as the security constraints do not apply. Note also that we have assumed that there are only two security levels: Unclassified and Secret. Finally for convenience, we use the word NULL to describe both the constant NULL as well as its assignment.
TABLE 11
__________________________________________________________________________
Security Constraints as NTML Formulas
__________________________________________________________________________
Security Constraint SC1 is expressed by the formula R1; its negation
at the Secret level is expressed by R2.
R1: (EMP(X,Y,Z,L) .fwdarw. Z = NULL), Unclassified
R2: ( X/SS#, Y/NAME, Z/SALARY, L/SL
(EMP(X,Y,Z,L) .fwdarw. Z = NULL)), Secret
Security Constraint SC2 is expressed by the formula R3;
its negation at the Secret level is expressed by R4
R3: (EMP(X,Y,Z,L) Z .gtoreq. 50K .fwdarw. Y = NULL), Unclassified
R4: ( ( X/SS#, Y/NAME, Z/SALARY, L/SL
(EMP(X,Y,Z,L) Z .gtoreq. 50K .fwdarw. Y = NULL))), Secret
Security Constraint SC3 is expressed by the formulas R5 and R6; its
negation at the Secret level is expressed by the formulas R7 and R8.
R5: (EMP(X,Y,Z,L) Z .noteq. NULL .fwdarw. Y = NULL), Unclassified
R6: (EMP(X,Y,Z,L) Y .noteq. NULL .fwdarw. Z = NULL), Unclassified
R7: ( ( X/SS#, Y/NAME, Z/SALARY, L/SL
(EMP(X,Y,Z,L) Z .noteq. NULL .fwdarw. Y = NULL))), Secret
R8: ( ( X/SS#, Y/NAME, Z/SALARY, L/SL
(EMP(X,Y,Z,L) Y .noteq. NULL .fwdarw. Z = NULL))),
__________________________________________________________________________
Secret
Next, we briefly discuss how each of the approaches could handle the security constraints. In the first approach, the security constraints which are expressed as NTML formulas are part of the proper axioms of an NTML theory. The multilevel database is also expressed as the proper axioms of the theory. All of the security constraints are treated as derivation rules. When the multilevel database is updated, a new NTML theory is obtained. The consistency of the new theory has to be checked. For example, if the constraints, say, R3 and R4 given in Table 11 are enforced and the NTML axiom .fwdarw.EMP(000, NULL, 60K), Unclassified .fwdarw.EMP(000, John, 60K), Unclassified then the new NTML theory becomes inconsistent. Instead, if the following axiom .fwdarw.EMP(000, John, 60K), Secret is added to the theory, then the new theory remains consistent. In the second approach, all of the security constraints expressed as NTML formulas act as integrity rules. Further, the multilevel database is a model of the theory which consists of the integrity rules. Table 12 shows a multilevel database which is a model of the theory which consists of the axioms R3 and R4 of Table 11. In the third approach, some of the security constraints act as integrity rules and the rest as derivation rules. Since the security constraints specified in Table 10 are not really rules which can be used to derive new data, it seems more appropriate to use them as integrity rules. If so, they are treated as in the case of the second approach. More research needs to be done before it can be determined whether a particular security constraint should be treated as a derivation rule or as an integrity rule.
TABLE 12
______________________________________
Security Constraints as Integrity Rules
Schema S1:
S1:(.sub. X/T1, Y/T2, Z/T3, L/SL EMP(X,Y,Z,L) .fwdarw.
T1 = SS#.sub..LAMBDA. T2 = NAME .sub..LAMBDA. T3 = SALARY), Unclassified
Integrity Rules IR1 and IR2:
IR1:(EMP(X,Y,Z,L) .sub..LAMBDA. Z .gtoreq. 50K .fwdarw. Y = NULL),
Unclassified
IR2:( ( X/SS#, Y/NAME, Z/SALARY, L/SL
(EMP(X,Y,Z,L) Z .gtoreq. 50K .fwdarw. Y = NULL))), Secret
Relation: EMP
SS# Name Salary Level
______________________________________
000 NULL 60K Unclassified
000 John 60K Secret
111 Mary 40K Unclassified
222 NULL 70K Unclassified
222 Jane 70K Secret
333 NULL 55K Unclassified
333 David 55K Secret
______________________________________
4.6 A Note on Data Dependencies Data dependencies that have been studied extensively are the functional dependencies and multilevel valued dependencies, Maier, D., 1983, Theory of Relational Databases, Rockville, Md.: Computer Science Press; Ullman, J., 1988, Principles of Data and Knowledge Base Systems, Computer Science Press, Rockville, Md. Dependency statements can be regarded as general laws which are used in the decomposition of relations. Recently, data dependencies have been studied in multilevel database systems, Denning, D. E., at al., April 1987, "A Multilevel Relational Data Model," Proceedings of the IEEE Symposium on Security and Privacy, Oakland, Calif.; Onuegbe, E., December 1987, "Data Dependencies for Multilevel Databases," Technical Note, Honeywell, Inc.; Dwyer, P., E. Onuegbe, P. Stachour, and M. B. Thuraisingham, December 1988, "Query Processing in LDV--A Secure Database Management System," Proceedings of the 4th Aerospace Security Conference, Orlando, Fla.; Jajodia, S., and R. Sandhu, May 1990, "On Polyinstantiation Integrity," Proceedings of the 1990 IEEE Symposium on Security and Privacy, Oakland, Calif. Subsequently, functional and multivalued dependency statements have been proposed. This paper does not attempt to investigate data dependencies for multilevel relations. Instead it shows that the dependency statements can be expressed as NTML formulas. We state functional and multilevel value dependency statements as given in Onuegbe, E., December 1987, "Data Dependencies for Multilevel Databases," Technical Note, Honeywell, Inc., and specify them in NTML (the limitations of these statements are not addressed here)..sup.13 .sup.13 It should be noted that we have developed a new dependency, which we call association-based dependency, for multilevel databases. Association dependencies handle association-based constraints (which are also referred to as context-based constraints in the literature, Dwyer, P., G. Jelatis, and M. B. Thuraisingham, June 1987, "Multilevel Security for Database Management Systems," Computers and Security, Vol. 6, No. 3). An example of an association-based constraint is SC3, given in table 10. Association dependencies can also be represented as NTML formulas. However, we do not address association dependencies in this paper. They are described elsewhere, Thuraisingham, M. B., 1990, "Association Dependencies for Multilevel Databases," WP-28904, The MITRE Corporation, Bedford, Mass. (not currently in the public domain). Functional Dependencies Let (R(X1, X2, X3, . . . Xn) be an n-ary relation which is visible at a security level L (i.e., the NTML formula that corresponds to R is assigned an inherent security level L or lower). Let Y and Z be two subsets of {X1, X2, . . . Xn}. Then the functional dependency statement Y.fwdarw.Z holds for R at L, if and only if the following two conditions are satisfied: (a) Whenever two tuples at level L have equal values for Y, they also have equal values for Z. (b) Whenever an integrity constraint enforces any component of the value of Y to be NULL at level L, then the value of Z must be NULL (note this second condition will ensure that the security level of the value of a set of attributes will dominate the security level of the value of a second set of attributes if the former is functionally dependent on the latter). Let the function dependency statement be X1, X4.fwdarw.X3, X6. This statement is expressed by the following NTML formulas F1 and F2. F1: ( x1/t1, x2/t2, . . . xn/tn, x2'/t2, x5'/t5, x6'/t6, . . . xn'/tn [R(x1, x2, . . . xn, L) R(x1 x2', x3', x4, x5', x6', . . . xn', L).fwdarw.(x3=x3' x6=x6')]), Unclassified F2: (x1/t1, x2/t2, . . . xn/tn [R(x1, x2, . . . xn, L) (x1=NULL V x4=NULL).fwdarw.(x3=NULL x6=NULL)]), Level=L Multivalued Dependencies Informally multivalued dependency can be defined as follows for multilevel databases: Let R(x1, x2, x3, . . . xn) be an n-ary relation. Let Y and Z be subsets of {X1, X2, . . . Xn} and T be the complement in {X1, X2, . . . Xn} of {Y U Z}. The multivalued dependency statement Y.fwdarw.Z holds for R if and only if the following conditions are satisfied: (a) If y1, t1, z1, and y1, t2, z2 are, respectively, the Y, T and Z values of 2 tuples in R, then y1, t1, z2, and y1, t2, z1 are also, respectively, the Y, T, and Z values of 2 tuples in R. (b) If any component of the value of Y is NULL, then the value of T and Z are NULL. Let the multivalued dependency statement be X1,X4.fwdarw.X3,X6. This statement is expressed by the following NTML formulas F3 and F4. F3: ( x1/t1, x2/t2, . . . xn/tn, x2'/t2, x5'/t5, x7'/t7, . . . xn'/tn [R(x1, x2', x3, x4, x5', x6, x7', . . . xn', L) R(x1, x2, x3', x4, x5, x6', . . . xn, L).fwdarw.R(x1, x2, x3, . . . xn)]), Unclassified F4: ( x1/t1, x2/t2, . . . xn/tn [R(x1, x2, . . . xn, L) (x1=NULL V x4=NULL).fwdarw.x2=NULL x6=NULL . . . xn=NULL)]), Unclassified 4.7 A Note on the Inference Problem The inference problem in database security deals with the situation where users acquire unauthorized information from the information that they legitimately acquire. Various formulations of the inference problem have been given. In Thuraisingham, M. B., December 1987, "Security Checking in Relational Database Management Systems Augmented with Inference Engines," Computers and Security, Vol. 6, No. 6; Keefe, T., M. B. Thuraisingham, and W. T. Tsai, March 1989, "Secure Query Processing Strategies," IEEE Computer, Vol. 22, No. 3; Thuraisingham, M. B., September 1989, "Inference Problem in Database Security," M89-52, Volume 3, The MITRE Corporation, Bedford, Mass. (not currently in the public domain), a formulation of this problem within the context of query processing is described. In Morgenstern, M., May 1987, "Security and Inference in Multilevel Database and Knowledge Base Systems," Proceedings of the ACM SIGMOD Conference, San Francisco, Calif.; Morgenstern, M., April 1988, "Controlling Logical Inference in Multilevel Database Systems," Proceedings of the 1988 IEEE Symposium on Security and Privacy, Oakland, Calif.; Hinke, T., 1988, "Inference Aggregation Detection in Database Management Systems," Proceedings of the IEEE Symposium on Security and Privacy, Oakland, Calif., inference problem is formulated within the context of database design. In this section, we discuss the inference problem within the framework of NTML. We examine each approach for formalizing database concepts and discuss the inference problem associated with each approach. First, note that for security violations via inference to occur, there must be a set of tuples which classify various information. The information includes the database data as well as other information that the users possess. Inference Problem and the Perceived Multilevel Universe as an NTML Theory Approach In the first approach, the database as well as the general laws are represented as axioms of an NTML theory. We now extend the database to include all the information the users possess. Any security constraint which classifies any piece of information is regarded as a general law. Therefore, the proper axioms of the NTML theory are all the general laws and the extended database. We now separate the general laws into two: general laws which are security constraints and general laws which are not security constraints. Inference checking then amounts to checking the consistency of the NTML theory whenever it is updated. If the theory is inconsistent, and if any of the security constraints were used to prove its inconsistency, then there is a potential for security violations via inference. That is, inference checking amounts to checking the consistency of the theory. Inference Problem and the Multilevel Database as a Model Approach In the second approach, the multilevel database is considered to be a model of the NTML theory whose axioms are the general laws. Note that in this approach all of the constraints are treated as integrity rules. This approach is not appropriate for inference checking because the multilevel database only has partial information. That is, it may not be the case that all of the information that a user possesses can be represented in the database. The general laws are enforced on the data in the database. The general laws could be separated into two groups: those that are security constraints and those they are not. Inference checking then amounts to checking the validity of the security constraints whenever the database is updated. If any of the security constraints are violated, then security violation via inference occurs. However, since all real-world information or general laws may not be involved in the inference checking process, all security violations via inference cannot be detected. Inference Problem and the Integrated Approach In the third approach, some of the general laws are treated as integrity rules and some as derivation rules. Since the security constraints are also general laws, one could separate the security constraints into two groups: one group consists of those security constraints that act as derivation rules and the other group consists of those constraints that act as integrity rules. The multilevel database is a model of the general laws which are enforced as integrity rules. The real-world information can be captured as part of the axioms of the NTML theory. Inference checking amounts to the following. Whenever the multilevel database is updated, if any of the security constraints which are integrity rules are violated then security violation via inference could occur (see the discussion for the second approach). Whenever axioms of the NTML theory are modified (depending on the evolution of the real world), the consistency of the resulting theory is checked. If the theory is inconsistent, then security violation via inference could occur (see the discussion for the first approach). Section 5 NTML as a Programming Language In this section we describe our preliminary investigation towards the development of a programming language based on NTML. In section 5.1 we discuss the motivation for such a language and in section 5.2 we describe a resolution rule. 5.1 Motivation The advent of logic programming languages during the 1970s enabled computing systems to exhibit a certain degree of intelligence or deductive reasoning--an important feature which until then, remained relatively elusive. In other words, logic programmers need specify only the logic component of an algorithm without having to write any procedures or sets of instructions for the computer. The latter task is often tedious and time-consuming and can be effectively handled by the deductive reasoning component of the logic programming system. Logic programming languages, such as Prolog, are gaining in popularity and are being heralded as languages of the future. Therefore, it is important and useful that a logic programming language be developed that is appropriate for knowledge-based military applications. That is, one that incorporates constructs for multilevel security. However, current logic programming languages cannot be used for knowledge-based multilevel secure applications as they require the programmer to specify the procedures, thus causing it to lose the main advantages of logic programming. In contrast, we have developed the initial features of a logic programming language, called NTML-Prolog, for multilevel applications. NTML-Prolog is based on a subset of NTML. Such a development will be of significance to multilevel knowledge-based applications in the same way Prolog has been to the development of knowledge-based applications. In section 5.2, we describe the essential points towards the development of a logic programming language that we call NTML-Prolog. In particular, we describe an NTML-Prolog Program and a resolution rule for NTML-Prolog. Much research needs to be done before an interpreter for NTML-Prolog can be implemented. However, producing a workable NTML-Prolog system will be significant in the development of multilevel intelligent data/knowledge base management systems in the same way Prolog has marked a significant milestone in the development of intelligent data/knowledge base management systems. 5.2 NTML-Prolog In this section we discuss the essential points of NTML-Prolog. We assume that the reader is familiar with the programming language Prolog, Clockskin W., and Mellish, C., 1984, Programming in Prolog, Heidelberg, Germany: Springer Verlag. An NTML program consists of a set of program clauses. A program clause is of the form (((A,L).rarw.(B1, L1), (B2, L2), . . . , (Bn, Ln)),L*) or (((NOT ((A,L).rarw.(B1, L1), (B2, L2), . . . , (Bn, Ln)))), L*) where n.gtoreq.0 (note that if n=0, then there are no B1s), L, L1, L2, . . . Ln, L* are security levels L, L1, L2, . . . Ln are all .ltoreq.L* (if the security levels are constants) A, B1, B2, . . . Bn are (positive) atomic NTML formulas. The second form is the negation of the first form. The first form is read as follows: "To show that A is true at level L, we need to show that B1, B2, . . . Bn are all true at levels L1, L2, . . . Ln, respectively. Further, this entire statement is true at the level L*." A goal clause is of the form: .rarw.(B1, L), (B2, L), . . . (Bn, L) where n.gtoreq.1, L1, L2, . . . Lm are security levels and B1, B2, . . . Bn are (positive) atomic NTML formulas. Note that an NTML-Prolog program is based only on a subset of NTML. That is, we involve only the clauses with at most one negative NTML atom..sup.14 .sup.14 If we do not explicitly specify the level of an atomic formula, then we assume that its level is system-low. Next, we describe the algorithm which implements the resolution rule for NTML-Prolog. A query posed by a user at security level L is expressed as a goal associated with level L. Let the goal be .rarw.(A1, L), (A2, L), . . . (Am, L). The resolution principle is implemented as follows. For each i (1.ltoreq.i.ltoreq.m) do the following: Unify (Ai, L) with some program clause (((A*, L').rarw.(B1, L1), (B2, L2), . . . (Bt, Lt)), L*) where the following conditions are satisfied: (a) L*.ltoreq.L if L* is a constant. (Note that L', L1, L2, . . . Lt are all .ltoreq.L*) (b) It is not the case that there is a security level L" where L*.ltoreq.L".ltoreq.L such that (((NOT (A*, L').rarw.(B1, L1), (B2, L2), . . . (Bn, Ln))), L"). Note, by unification we mean find the most general unifier for Ai and A* which satisfies the conditions stated above. If a unifier cannot be found, then there is no solution for the query. Return "failure" as the response. If there is such a unifier, then it is included in the response being assembled. Try the procedure (i.e., unification) for the (Ai+1, L). End (for each response). Return the response to the query. If another solution has to be found, then repeat the same procedure, but exclude the previous responses in the unification process. That is, find only the new solutions. This ends the algorithm for implementing the resolution principle. The resolution principle as described here will ensure that appropriate responses at or below the user's level will be retrieved. Future work will include the investigation of the completeness and soundness of the resolution rule as described here. Section 6 Proposals for Handling Negative Information As stated in Nicolas, J., and H. Gallaire, (Editors: H. Gallaire and J. Minker), 1978, Database: Theory vs Interpretation, New York: Plenum Press, negative information corresponds to the fact that a given tuple does not satisfy a relation and may have to be represented by a negative ground literal. In a logical system, positive and negative information are treated in the same way. That is, both types of information have to be explicitly specified or derived from other information. In database applications, negative information is represented implicitly. That is, the negation of any statement is assumed to be true if the statement is not asserted. Such an implicit representation of negative information has been investigated extensively by researchers of logic programming and databases. The most notable efforts are those of Reiter, Reiter, R., 1978, "On Closed-World Databases," (Editors: H. Gallaire and J. Minker), Logic and Databases, New York: Plenum Press; Clark, Clark, K., (Editors: H. Gallaire and J. Minker), 1978, "Negation as Failure," Logic and Databases, Plenum Press, New York; and Nicolas and Gallaire, Nicolas, J., and H. Gallaire, (Editors: H. Gallaire and J. Minker), 1978, Database: Theory vs Interpretation, New York: Plenum Press. Many of the other proposals for handling negative information, Minker, J., 1982, "On Definite Databases and the Closed World Assumption," Proceedings of the Sixth Conference on Automated Deduction, Springer Verlag, Berlin, Germany; Shepardson, J. C., 1984, "Negation as Failure Rule," Journal of Logic Programming, Vol. 1; Shepardson, J. C., 1985, "Negation as Failure Rule--II," Journal of Logic Progamming, Vol. 2, can be regarded as an evolution of these earlier proposals. In Thuraisingham, M. B., (Editor: T. Lunt), May 1988, "Foundations of Multilevel Databases," Presented at the 1st RADC Database Security Invitational Workshop, Menlo Park, Calif. (Proceedings published by Springer Verlag, Heidelberg, Germany), an attempt was made to extend Reiter and Clark's proposals for multilevel databases. However, as stated earlier; that effort is solely based on first-order logic, and therefore does not capture many of the essential features of multilevel reasoning and data processing. Our discussion on viewing multilevel databases through NTML, given in section 4, showed how Nicolas and Gallaire's proposal could be adapted for a multilevel environment. As part of this discussion, issues on handling negative information were also addressed. In particular, the following conclusions were reached for the three approaches to handle negative information. They are the following: (1) In the proof-theoretic approach, negative information must be made explicit. (2) In the model theoretic approach, it is implicitly assumed that a tuple which does not appear in the extension of a relation does not satisfy it. (3) In the integrated approach it is implicitly assumed that a tuple which does not appear in the extension of a relation does not satisfy it. But there is an additional problem here, as some of the laws are used as derivation rules, and, therefore, one can envisage information to be deduced from negative information implicitly assumed. This problem is overcome by not permitting certain rules to be derivation rules. In section 6, we consider Reiter's and Clark's proposals to handle negative information within the framework of NTML. Reiter's proposal, known as the closed world assumption (CWA), states that negative information does not cause any complications if there are no gaps in the knowledge. Reiter also shows that, when only ground literals are handled, CWA does not produce any inconsistencies. But when general laws are taken into consideration, and when a deduction process is involved, this is not always the case. Clark's proposal is concerned with interpreting negation as failure. He calls such a rule "the negation as failure rule" and develops a proof procedure for a horn clause logic program which handles negation. In section 6.2 we discuss CWA, and in section 6.3 we discuss negation as failure within the framework of NTML. 6.2 Closed World Assumption for Multilevel Databases 6.2.1 Overview Reiter, Reiter, R., 1978, "On Closed-World Databases," (Editors: H. Gallaire and J. Minker), Logic and Databases, New York: Plenum Press, has distinguished two conditions under which queries could be evaluated; the open-world assumption (OWA) and the closed-world assumption (CWA). Under OWA, queries are evaluated using the first-order approach. That is, queries are expressed as formulas of first-order logic, and proofs of the formula are attempted. Each proof will produce an answer to the query. Under CWA, if no proof of a positive ground literal exists, then its negation is assumed to be true. That is, certain answers to a query are permitted as a result of the failure to find proofs. Under CWA, the database constitutes all positive ground literals explicitly specified, plus the negation of the ground literals implicitly assumed to be true. For applications which utilize logic for knowledge representation, it may be necessary to evaluate queries under OWA. But Reiter points out that, for many database applications, it is reasonable to evaluate queries under the CWA. Our intent is to examine the various issues involved in extending CWA to a multilevel environment. A preliminary discussion on this topic will be given in section 6.2.2. First, we discuss some terminology. We also explain OWA for a multilevel environment as it can then be compared with CWA. We assume that queries are expressed as NTML formulas. The level L of the user posing the query is the level at which the formula is specified..sup.15 A multilevel database (MBD) is a NTML-Prolog program (which contains no function signs). An answer to a query Q=(x/T1 .vertline. (y/T2 W(x,y),L).sup.16 is a set of elements (c1, c2, . . . cn) if and only if the following conditions are satisfied: (1) Each ci is of type T1, (2) Level(ci).ltoreq.L, (3) MDB .vertline.-V.sub.1 .ltoreq.i.ltoreq.n (y/T2 W(ci, y)) with respect to L (i.e., the proof of the formula does not involve any information which is not dominated by L). .sup.15 Note that the level at which the formula is asserted dominates the inherent level of the formula. .sup.16 Note that x and y could be tuples instead of individual elements. We assume that they are individual elements for convenience. The query requests to find all x such that there is a y for which W(x,y) holds. The set of all such answers will form the response to a query. The notation {c1+c2+c3+ . . . cn} is also used to denote the answer {c1+c2+c3+ . . . cn}. It should be noted that if {c1, c2, . . . cn} is an answer, then so will {c1, c2, c3, . . . cn, cn+1} be an answer provided cn+1 is of type T1 and has level .ltoreq.L. This suggests the need for a minimal answer. A minimal answer to Q is defined to be an answer such that no proper subset of it will be an answer. If an answer consists of a single element {c}, then it is a definite answer. Otherwise it is an indefinite answer. Evaluating Q under open world assumption amounts to finding all minimal answers to Q. This is denoted by .vertline..vertline.Q.vertline..vertline..sub.OWA. We could also attach the security level L and denote it by .vertline..vertline.Q.vertline..vertline..sub.OWA (L). Consider the following example. We assume that there are only two security levels; Unclassified and Secret. Let MDB consist of the following set E of employees and set D of departments. We use U to denote Unclassified and S to denote Secret. E={e1(U), e2(U), e3(U), e4(S)} D={d1(U), d2(U), d3(S)} Let the integrity constraints enforced at the Unclassified level be the following: "Every employee must work in at least one department" The constraint is expressed by the NTML formula: (x/EMP y/DEPT Works(x,y), U) Additional facts in the database are the following: (Works(e1, d1), U) (Works(e2, d3), S) Suppose Unclassified and Secret users pose queries to find out where the employees are working. The queries will be expressed by the following NTML formulas. (x/EMP y/DEPT Works(x,y), U) (x/EMP y/DEPT Works(x,y), S) The response (under OWA) to the query issued by the Unclassified user will be: {(e1, d1), (e2, d1)+(e2, d2), (e3, d1)+(e3, d2)}. The response (under OWA) to the query issued by the Secret user will be: {(e1, d1), (e2, d3), (e3, d1)+(e3, d2)+(e3, d3), (e4, d1)+(e4, d2)+(e4, d3)}. The specific techniques employed to obtain these answers have been adapted from Reiter's work on query evaluation under OWA described in Reiter, R., (Editors: Gallaire and J. Minker), 1978, "Deductive Question Answering on Relational Databases," Logic and Databases, Plenum Press, New York. 6.2.2 Closed World Assumption We first illustrate query evaluation under CWA using a purely extensional multilevel database MDB. Such a database consists of ground literals of different security levels..sup.17 We assume that there are only two security levels; Unclassified and Secret. .sup.17 In general the extensional portion of a multilevel database MDB is denoted by MEDB. In this example, we assume that MDB and MEDB are the same. Consider the multilevel database shown in Table 13. There are two domains EMP and DEPT. The MDB consists of one relation `Works.` Note that, at the Unclassified level, e1 works in d1. At the Secret level, e1 works in d2. This could mean that, at the Secret level, e1 works in both d1 and d2, or the tuple is polyinstantiated. For this discussion we assume that it is the former. That is, at the Secret level e1 works in both departments..sup.18 Note also that e4 works in both departments d1 and d3 at the Secret level. .sup.18 If we want to assume at the Secret level that e1 does not work in department d1, then, as discussed in the model theoretic approach to multilevel databases, we need to duplicate every tuple that holds in the Unclassified level at the Secret level also. We will see that, if negative information is explicitly specified, then negations at the Secret level can also be explicitly specified. That is, the tuple (e1,d1) can be included in the relation Works, as shown in table 14.
TABLE 13
______________________________________
A Multilevel Purely Extensional Database
EMP = {e1(U), e2(U), e3(U), e4(S)}
DEPT = {d1(U), d2(U), d3(S)}
Works
Ename Dname Level
______________________________________
e1 d1 U
e2 d3 S
e3 d2 U
e4 d1 S
e1 d2 S
e3 d1 U
e4 d3 S
______________________________________
Consider the following query posed at both the Unclassified and Secret levels. (x/EMP .vertline. Works(x,D2), U) (x/EMP .vertline. Works(x,D2), S) The query requests all those employees who do not work in department D2. Under OWA, the response to this query is NULL at both security levels. That is, .vertline..vertline.Q.vertline..vertline..sub.OWA (U)=.vertline..vertline.Q.vertline..vertline..sub.OWA (S)=Empty set. What we actually want is the response {e1,e2} at the Unclassified level and the response {e2,e4} at the Secret level. Under OWA, this response cannot be obtained as first-order logic and does not implicitly assume anything that is not present in the database. Therefore, in order to obtain the responses that we want, we need to explicitly specify the relation Works (usually denoted by Works) in the database. Table 14 shows the relation Works. The relation Works (U) has all the negative information at the Unclassified level and the relation Works (S) has the negative information at the Secret level. .sup.19 The database which consists of the negative information is the complement of MEDB and is denoted by Comp-MEDB.sup.20. Therefore, in order to evaluate a query, the database MDB and Comp-MEDB (the complement of its extensional component) have to be examined. .sup.19 We can also assume that any negative information at the Unclassified level also is valid at the Secret level unless it is stated as positive information at the Secret level. If so, the tuples in Works at the Unclassified level that are valid also at the Secret level are (e2,d1) and (e2,d2). These tuples need not be explicitly specified at the Secret level. .sup.20 Note: Comp-MEDB is the complement of MEDB. We now formally define query evaluation under CWA for a multilevel extensional database MDB and then extend the definition to an arbitrary database.
TABLE 14
______________________________________
Representing Negative Information
Works(U) Works(S)
Ename Dname Ename Dname
______________________________________
e1 d2 e1 d3
e2 d2 e2 d2
e2 d1 e2 d1
e3 d3
e4 d1
e4 d2
______________________________________
Let MDB be a multilevel extensional database. Let Comp-MEDB consist of all negative ground literals of the form (Pc, L) where P is a predicate and c is a constant (could possibly be ample) and (Pc, L) is not a ground literal of MDB. Consider a query of the form: ( x/T1 .vertline. ( y/T2 W(x,y), L). Then c is an answer to this query under CWA if and only if the following conditions are satisfied: (1) c is of type T1, (2) Level(c).ltoreq.L, (3) MDB U Comp-MEDB .vertline.-(y/T2 W(c,y)) with respect to L. Query evaluation under CWA for purely extensional MDBs are straightforward. That is, in order to evaluate a query, only the MDB and the Comp-MEDB are examined. Any tuple that satisfies the query from MDB U Comp-MEDB will be an answer. Next, we examine query evaluation under CWA for an arbitrary multilevel database. Such a database has an extensional component and an intensional component which consist of a set of general laws which are formulas of NTML. The Comp-MEDB in such a database consists of all negative information of the form Pc where P is a predicate, c is a constant (possibly a tuple) and (MDB .vertline.-Pc) with respect to any V security level L. That is, information is assumed to be negative only if its positive counterpart cannot be deduced with respect to any security level from the multilevel database. Consider the query (x/T1 .vertline. ( y/T2 W(x,y)), L). Then (c1+c2+c3+ . . . cn) is an answer under CWA to this query if and only if the following conditions are satisfied: (1) Each ci is of type T1, (2) Each ci has level.ltoreq.L, (3) MDB U Comp-MEDB .vertline.-V.sub.1.ltoreq.i.ltoreq.n (y/T2 W(ci,y)) with respect to L. As in the case of query evaluation under OWA, we can define minimal answers, definite answers, and indefinite answers for query evaluation under CWA. We illustrate query evaluation under CWA for an arbitrary multilevel database with an example. Consider the MDB shown in table 15. There are three domains: EMP (for employees), CSCI (for computer science departments), and DEPT (for departments). The elements of the various domains are also shown in the table. Note that a computer science department is also a department. We assume that there are only two security levels Secret and Unclassified.
TABLE 15
______________________________________
A Multilevel Deductive Database
EMP = {e1(U), e2(U), e3(S)}
CSCI = {cl(U), c2(U), c3(U), c4(S)}
DEPT = {d1(U), d2(U), d3(S), c1(U), c2(U), c3(U), c4(S)}
Law 1: x/EMP y,z/DEPT Works(x, y) .LAMBDA. SubDept
(z,y) .fwdarw. Works(x,z)
Law 2: x/CSCI Works(d2,x)
Law 3: x,y,z/DEPT SubDept(z,y) .LAMBDA. SubDept(y,x) .fwdarw.
SubDept(z,x)
Works SubDept
Ename Dname Level Dname Dname Level
______________________________________
e1 d1 U d2 d1 U
e3 c3 S d3 d2 S
e3 c4 S c1 d1 U
c2 c1 U
______________________________________
The extensional MDB consists of two relations, Works and Subdept (the subdepartment relation). The intensional MDB consists of three general laws, which are represented as NTML formulas at the Unclassified level. These laws state the following: Law 1: Any employer who works in a department works in all its sub-departments. Law 2: Employee e2 works in all computer science departments. Law 3: The `Subdept` relation is transitive.
TABLE 16
______________________________________
Representing Negative Information for a
Multilevel Deductive Database
Works SubDept
Ename Dname Level Dname Dname Level
______________________________________
e1 c3 U d1 d1 U
e1 c4 S d1 d2 U
e3 d1 S d1 d3 S
e3 d2 S d1 c1 U
e3 d3 S d1 c2 U
e3 c1 S d1 c3 U
e3 c2 S d1 c4 S
e2 d1 U d2 d2 U
e2 d2 U d2 d3 S
e2 d3 S d2 c1 U
. . . . . .
. . . . . .
. . . . . .
______________________________________
Part of Comp-MEDB for the MDB under consideration is shown in table 16. Note that in this table we have combined the tuples at various security levels into one relation. We also assume that negative information at the Unclassified level is also valid at the Secret level provided the positive counterpart cannot be derived either implicitly or explicitly at the Secret level. The following comments are in order: (1) Under CWA there are no gaps in our knowledge. That is, nothing is uncertain. This is because, for each predicate P, constant c, and level L, either Pc can be deduced from MDB or Pc can be deduced from Comp-MEDB with respect to L. Further, since the multilevel database is taken to be MDB U Comp-MEDB, it will definitely be the case that either Pc or Pc can be deduced from MDB U Comp-MEDB. Therefore, intuitively, it seems that all answers under CWA with respect to any security level are definite. Reiter has formally proved this for a nonmultilevel database. It appears that this proof can be extended to include a multilevel database also. (2) It can be seen that the negative information that has to be represented is very large. For any practical application it may not be possible to efficiently represent all the negative facts. Reiter has proved that, for many cases, the negative information need not be taken into consideration for query evaluation under CWA. That is, he shows that, for many cases, query evaluation coincides both under OWA and CWA. However, for some cases this does not work, and therefore negative information has to be explicitly specified. Reiter also shows that when the database is a non-Horn database, inconsistencies can arise with respect to CWA. However, if the database is a horn database, then it is consistent under CWA. Future work should include investigating the applicability of Reiter's results for query evaluation under CWA in a multilevel database. 6.3 Negation as Failure Rule for Multilevel Databases 6.3.1 Overview Clark, Clark, K., (Editors: H. Gallaire and J. Minker), 1978, "Negation as Failure," Logic and Databases, Plenum Press, New York, has defined a query evaluation process for a logic database (i.e. a deductive database) consisting of a set of clauses to be essentially Horn Clause theorem proving with a special inference rule which he calls the negation as failure rule in order to deal with negation. The negation as failure rule (denoted NF) simply states that P can be inferred if every proof of P fails. That is: From .vertline.- .vertline.-P infer .vertline.- P. The proof that P is not provable is an exhaustive but unsuccessful search for at least one proof of P. It is useful to compare the proposals of Reiter (CWA) and Clark (NF) to handle negation. The information A can be inferred under CWA if A cannot be proven to be a consequence of the logic database. As stated in Lloyd, J., 1987, Foundations of Logic Programming, (2nd Edition), Heidelberg, Germany: Springer Verlag, in logic programming terminology, this amounts to the following: If A is not in the success set of the database DB, then A is inferred. For A not to be in the success set of DB, either A is in the SLD finite failure set or the SLD-tree has at least one infinite branch. If A is in the SLD-finite failure set, then under CWA, we can assume A. If there is an infinite branch in the SLD-tree, then, unless there is a mechanism for detecting infinite branches, it will not be possible to show that A is not a logical consequence of the database DB. In logic programming terminology, .sup.< if the logic database is a set of Horn clauses, then the NF states that if A is in the SLD finite failure set of a logic database DB, then infer A. We can see that NF is weaker than CWA. However, as stated in Lloyd, J., 1987, Foundations of Logic Programming, (2nd Edition), Heidelberg, Germany: Springer Verlag, in practice, implementing anything beyond NF is difficult. .sup.21 For the logic programming terminology that we have used we refer to Lloyd, J., 1987, Foundations of Logic Programming, (2nd Edition), Heidelberg, Germany: Springer Verlag. In section 6.3.2 we discuss how the NF rule could be applied for multilevel databases. We regard a multilevel database as an NTML-Prolog program. Query evaluation in such a database is resolution theorem proving. The resolution rule is the one that we have described in section 5 with negated literals evaluated by a failure proof. That is, we augment the resolution rule that we have designed with the rule NF for multilevel databases. 6.3.2 NF Rule for Multilevel Databases The multilevel database consists of a set of NTML-Prolog clauses. Note that an NTML-Prolog clause could be a positive clause or it could be the negation of a positive clause at a different security level. We assume that each clause has a distinguished positive literal. The positive literal is what the clause defines and is placed at the head. For example, a clause is either of the form (((R(t1,t2, . . . tn), L*).rarw.(K1, L1) K2, L2) K3, L3) . . . Kn, Ln)), L) or (((R(t1,t2, . . . tn), L*).rarw.(K1, L1) (K2, L2) (K3, L3) . . . (Kn, Ln)) We first describe the query evaluation algorithm, and then illustrate example..sup.22 .sup.22 Compare this algorithm with Clark's SLDNF proof procedure. Resolution Rule Augmented with NF for Multilevel Databases Begin Let the query to be evaluated be (.rarw.(K1, L) (K2, L) (K3, L) . . . Kn, L)). Call it the current query Q. Until an empty query is derived and the evaluation succeeds, do the following: Select a literal from the current query so that a negative literal is selected only if it does not have any variables. Case 1 Ki is a positive literal R(t1,t2, . . . tm). Nondeterministically, choose a database clause (((R(t1', t2', . . . tm'), L+).rarw.(K1', L1) .LAMBDA.(K2', L2) .LAMBDA. . . . (Kn', Ln)), L*) such that the level L* is dominated by L, and the clause is not negated at any level between L* and L. Attempt to unify Ki' with R(t1', t2', . . . tm'). If Ki does not unify with this clause, then try another clause. If there is no other clause, the query fails. If Ki' does unify with R(t1', t2', . . . tm'), find the most general unifier O and replace the current query by ((.rarw.(K1, L) .LAMBDA. (K2, L) .LAMBDA. . . . (Ki-1, L) .LAMBDA. (K1', L) .LAMBDA. (K2', L) .LAMBDA. . . . (Kn',L) .LAMBDA. (Ki+1, L) .LAMBDA. (Ki+2, L) .LAMBDA. . . . (Kn, L)) O. Case 2 Ki is a negative ground literal P. Negative literals are proved by showing that all proofs of its positive counterpart fail. Therefore recursively enter the query evaluation process with (.rarw.P, L) as a query. If the evaluation of this query succeeds, then P has been shown to be a logical consequence of the multilevel database with respect to level L. Therefore P cannot be assumed to be true. This means that the evaluation of the original query Q fails. If the evaluation of (.rarw.P, L) fails for every path of its nondeterministic evaluation, this means that all proofs of P fail, and therefore P can be assumed to be a lemma. Hence, the current query can be replaced by the query: (.rarw.(K1, L) .LAMBDA. (K2, L) .LAMBDA. . . . (Ki-1, L) .LAMBDA. (Ki+1, L) .LAMBDA. . . . (Kn, L)) End. We now illustrate the query evaluation process with an example. Consider the following NTML-Prolog program which consist of the clauses C1-C25. C1: (((Works(e1,d1), U).rarw.), U) C2: (((Works(e3, c3), S).rarw.), S) C3: (((Works(e3, c4), S).rarw.), S) C4: (((Works(x,z), L).rarw.(Works(x,y), L) (SubDept(z,y), L)), U) C5: (((Works(e2, x), L).rarw.(CSCI(x), L)), U) C6: (((EMP(e1), U).rarw.), U) C7: (((EMP(e2), U) .rarw.), U) C8: (((EMP(e3), S) .rarw.), S) C9: (((CSCI(c1), U) .rarw.), U) C10: (((CSCI(c2), U).fwdarw.), | ||||||
