■ 
 
UNIVERSITY OF 
 
 ILLINOIS LIBRARY 
 
 AT URBANA-CHAMPAIGN 
 
The person charging this material is re- 
 sponsible for its return to the library from 
 which it was withdrawn on or before the 
 Latest Date stamped below. 
 
 Theft, mutilation, and underlining of books are reasons 
 for d.scplinary action and may result in dismissal from 
 the University. 
 To renew call Telephone Center, 333-8400 
 
 UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN 
 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/algorithmforinfe954sagi 
 
[* MM 
 
 W Report No. UIUCDCS-R-79-954 
 
 x ^ AN ALGORITHM FOR INFERRING MULTIVALUED DEPENDENCIES 
 
 THAT WORKS ALSO FOR A SUBCLASS OF PROPOSITIONAL LOGIC 
 
 UILU-ENG 79 1714 
 
 by 
 Yehoshua Sagiv 
 
 January 1979 
 
 NSF MCS77-22830 
 
 3ME aBRARX Of. IHE 
 
 JUN 2 91979 
 UNivEwsiry of Illinois 
 
UIUCDCS-R-79-954 
 
 An Algorithm for Inferring Multivalued Dependencies 
 
 that Works Also for a Subclass 
 
 of Propositional Logic 
 
 Yehoshua Sagiv 
 
 Department of Computer Science 
 
 University of Illinois at Urbana-Champaign 
 
 Urbana, Illinois 61801 
 
 January 1979 
 
 This research was supported in part by the National Science Foundation 
 under grant MCS-77-22830 
 
An Algorithm for Inferring Multivalued Dependencies 
 that Works Also for a Subclass 
 of Propositional Logic 
 
 Yehoshua Sagiv 
 
 Department of Computer Science 
 
 University of Illinois at Urbana-Champaign 
 
 Urbana, Illinois 61801 
 
 ABSTRACT 
 
 In this paper we give an algorithm for deciding whether a func- 
 tional or a multivalued dependency o is implied by a set of functional 
 and multivalued dependencies I. The running time of the algorithm is 
 0(|E|-|Y|), where Y is the right-hand side of a. We also investigate 
 the problem of constructing the dependency basis of a set of attributes 
 X. We show that the dependency basis can be found in 0(|Z|-p) time, 
 where p is the number of sets in the dependency basis. Since functional 
 and multivalued dependencies correspond to a subclass of propositional 
 logic (that can be viewed as a generalization of Horn clauses), our 
 algorithm is also an efficient resolution procedure for this subclass of 
 propositional logic. 
 
- 2 - 
 
 _1. Introduction 
 
 The relational model for databases [7] uses dependencies to express 
 constraints that the data must satisfy. Functional and multivalued 
 dependencies are the most common types of dependencies, and their pro- 
 perties have been investigated thoroughly (e.g., [2,5,6,8,9,13,15]). 
 
 In many applications, such as the logical design of relational 
 databases, part of the problem is to determine when a set of dependen- 
 cies Z implies another dependency o. This problem is usually referred 
 to as the mermbe^sJ^^bAem. (1) Both functional and multivalued depen- 
 dencies have inference rules [2,5] that can be used to infer o from Z if 
 (and only if) a is implied by Z. 
 
 Using these rules, [4] gives a linear time algorithm for the 
 membership problem when there are only functional dependencies. In [3] 
 a similar approach is used to develop an 0(|Z| 4 ) time algorithm for the 
 membership problem when there are functional and multivalued dependen- 
 cies. In [11] a refinement of this algorithm (based on an appropriate 
 data structure) is shown to run in 0(min(k 2 . |U| , | Z | 2 ) ) time, where k is 
 the number of dependencies in Z and U is the set of all the attributes. 
 
 Recent results [10,14] show that functional and multivalued 
 depenedencies can be interpreted also as formulas of propositional 
 logic. Functional dependencies correspond to the well known Horn 
 clauses, while multivalued dependencies correspond to a subclass of 
 
 Tl) Tu TZZt^Tlt" fr0m ? 6 f3Ct that We W3nt t0 determine whether 
 o^is a member of the set of all the dependencies that are implied by 
 
- 3 - 
 
 propositional logic that apparently has not been discussed In the 
 literature so far. 
 
 In this paper we give a new membership algorithm for functional and 
 multivalued dependencies. The algorithm is based on the interpretation 
 of these dependencies as formulas of propositional logic. The running 
 time of the algorithm is 0(|£|-|Y|), where Y is the right-hand side of 
 o. This algorithm is not only faster than the algorithm of [11], it 
 also requires a simpler data structure and, hence, has an easier imple- 
 mentation. In addition to that, in [11] it is assumed that all the 
 functional dependencies have only one attribute on the right-hand side. 
 This assumption may increase the size of the input substantially. On 
 the other hand, the algorithm in this paper can handle directly func- 
 tional dependencies that have a set of attributes on the right-hand 
 side. 
 
 2. The Relational Model for Databases 
 
 2.1. Basic Definitions 
 
 The relational model for databases assumes that the data is stored 
 in tables called relations . The columns of a table correspond to 
 attributes , and the rows to records or tuples . Each attribute has an 
 associated domain of values. It is convenient to regard the tuples as 
 mappings from the attributes to thier domains, since no canonical order- 
 ing of the attributes is needed in this way. If u is a tuple of a rela- 
 tion r and A is an attribute, then u(A) is the A-component of U. 
 
- 4 - 
 
 A relation scheme is a set of attributes labeling the columns of a 
 table. We often use the relation scheme itself as the name of the 
 table. A relation can be viewed as the "current value" of a relation 
 
 scheme. 
 
 Bcamnle_l: Suppose that the attributes are P (part-number), Q 
 (quantity), S (supplier), and C (city). These attributes are grouped 
 into one relation scheme which is the set {P,Q,S,C>. We usually denote 
 a set of attributes simply as a string of attributes, e.g., in this case 
 the relation scheme is PQSC. Figure 1 gives a relation that might be a 
 "current value" of the above relation sceme. Note that multiple copies 
 of the same tuple are not allowed. 
 
 This relation represents information about parts, their quantities, 
 the suppliers who supply these parts, and the cities in which a supplier 
 has a distribution center. It is assumed that a supplier may have a 
 distribution center in more than one city, and all these centers supply 
 all the parts that are produced by the supplier. [] 
 
 _2«_2« Dependencies 
 
 In many cases the data must satisfy certain constraints. For exam- 
 ple, consider the relation of Figure 1. Since a part number uniquely 
 identifies a part, a value for P determines a unique value for Q. That 
 is, we cannot have in the relation two distinct tuples with the same 
 value for P and different values for Q. 
 
- 5 - 
 
 p 
 
 c 
 
 S 
 
 1 438 | 
 
 1300 | 
 
 Smith | 
 
 | 34) 
 
 50 j 
 
 Smith | 
 
 | 43P 
 
 1300 
 
 Smith | 
 
 | 341 
 
 50 
 
 Smith 
 
 | 1C8 
 
 200 
 
 Jones 
 
 | 204 
 
 | 150 
 
 Jones 
 
 New York 
 
 Chicago 
 
 | Chicago 
 
 New York 
 
 Boston 
 
 Boston 
 
 Figure 1 
 
 A constraint of another type is imposed by the fact that there is 
 no dependency of a part and its quantity on a particular distribution 
 center (of a supplier who supplies this part). Since the relation must 
 reflect this fact, it is required that for any possible value of S, 
 every combination of a value for C and values for P and Q (for this 
 value of S) will appear in some tuple of the relation. 
 
 Constraints such as these are described by functional [2,7] and 
 multivalued [5,8,9,15] dependencies. A functional dependency is a 
 statement of the form X + Y, where both X and Y are sets of attributes. 
 A relation r satisfies the functional dependency X -> Y (or X -> Y holds 
 in r) if and only if there are no two tuples in r that have the same 
 value for all attributes in X and different values for an attribute in 
 Y. Note that the functional dependency P + Q holds in the relation of 
 
- 6 - 
 Figure 1. 
 
 A multivalued dependency is a statement of the form X ♦♦ Y, where X 
 and Y are sets of attributes. Let r be a relation on the set of attri- 
 butes U, and let Z be the set of all the attributes in U that are nei- 
 ther in X nor in Y. Let p be a tuple of r, and let W be a subset of U. 
 M[W] is a tuple defined on the attributes of W, such that for all A in 
 W, M[W](A) = u(A). That is, u[W] denotes the values of u for the attri- 
 butes of W. The multivalued dependency X +♦ Y holds in r if and only if 
 for all Uj and u 2 in r, if u^X] = y^X], then there are y 3 and y in r 
 such that 
 
 (i) u 3 [X] = u^X], M 3 [Y] = MjtY], and u [Z] - M,[Z] 
 (ii) m 4 [X] = U 2 [X], n 4 [T] = M 2 [Y], and y^fZ] = p^Z]. 
 In other words, X ♦♦ Y means that the set of Y-values associated with a 
 particular X-value must be independent of the values of the other attri- 
 butes. Note that the multivalued dependency S ♦♦ PQ holds in the rela- 
 tion of Figure 1. 
 
 A problem of practical importance is to determine when a dependency 
 is implied by some other dependencies. Formally, we say that a depen- 
 dency a is a consequence of a set of dependencies I if for all relations 
 r, o holds in r if all the dependencies of Z holds in r. Previous works 
 in this area [2,5] showed the existence of inference rules that are 
 instrumental in solving this problem. These rules are complete in the 
 sense that a is a consequence of I if and only if a can be inferred from 
 ^ by a sequence of applications of the rules. The rules are divided 
 into three groups reflecting the fact that the set I U {a} may contain 
 
- 7 - 
 
 only functional dependencies, or only multivalued dependencies, or a 
 mixture of both. Following [5] we now give a list of these rules. 
 
 We use letters from the begining of the alphabet to denote single 
 attributes, and letters from the end of the alphabet to denote sets of 
 attributes. XY, where both X and Y are sets of attributes, denotes the 
 union of X and Y. U is the set of all the attributes. 
 
 FD Rules 
 
 FD1 (Reflexivity) : 
 
 If Y _C X, then X * Y. 
 FD2 (Augmentation) : 
 
 If Z C W and X ■*■ Y, then XW + YZ. 
 FD3 (Transitivity): 
 
 If X + Y and Y -»■ Z, then X * Z. 
 There are three additional rules that are implied by the above rules. 
 FD4 (Pseudotransitivity) : 
 
 If X + Y and YW + Z, then XW * Z. 
 FD5 (Union): 
 
 If X -»• Y and X + Z, then X ■> YZ . 
 FD6 (Decomposition) : 
 
 If X ■*■ YZ, then X + Y and X > Z. 
 
 The rules of Union and Decomposition imply that we can always 
 replace a set of functional dependencies with an equivalent set that has 
 only functional dependencies of the form X + A (i.e., the right-hand 
 side contains a single attribute). 
 
- 8 - 
 
 MVP Rules 
 
 MVDO (Complementation) : 
 
 Let X, Y, and Z be sets of attributes such that their union 
 is U and Y n Z C X. Then X ++ Y if and only if X +* Z. 
 MVD1 (Reflexivity): 
 
 If Y C X, then X ♦+ Y. 
 MVD2 (Augmentation) : 
 
 If Z _C W and X >■> Y, then XW *•»> YZ. 
 MVD3 (Transitivity): 
 
 If X ■«■ Y and Y >-»• Z, then X ++ Z - Y. 
 The following rules follow from the above MVD Rules. 
 MVD4 (Pseudotransitivity) : 
 
 If X -►-► Y and W ^+ Z, then XW +-► Z - YW. 
 MVD5 (Union): 
 
 If X ++ Y and X ++ Z, then X ++ YZ. 
 MVD6 (Decomposition) : 
 
 If X ++ Y and X -»•■► Z, then X ++ Y ft Z, X -»•■»■ Y - Z, and 
 X ♦■»- Z - Y. 
 
 Note that the rules of Transitivity, Pseudotransitivity, and Decom- 
 position are more restricted than the corresponding FD Rules. 
 
 The FD Rules are sufficient when there are only functional depen- 
 dencies, and the MVD Pules are sufficient when there are only mul- 
 tivalued dependencies. When there are both functional and multivalued 
 dependencies, we need the FD Rules and the MVD Rules as well as the fol- 
 lowing rules. 
 
- n 
 
 FD-MVD Rules 
 
 FD-MVD1: If X + Y, then X +♦ Y. 
 
 FD-MVD2: If X ++ Z and Y ♦ Z' where Z' C Z and Y and Z are 
 
 disjoint, then X ->■ Z'. 
 
 FD-MVD 3: If X >+ Y and XY + Z, then X + Z - Y. 
 
 The last two rules are equivalent, i.e., either one of them can be 
 omitted . 
 
 2.3. Dependencies and Propositional Logic 
 
 Functional and multivalued dependencies can be interpreted also as 
 formulas of propositional logic. If attributes are considered to be 
 propositional variables and a set of attributes denotes the conjunction 
 of all the attributes that are contained in it, then a functional depen- 
 dency X ->• Y is also a formula of propositional logic, where + is now the 
 usual logical implication. That is, X -»■ Y is true if and only if some 
 attributes (i.e., propositional variables) in X are false or all the 
 attributes in Y are true. 
 
 Let Z = U - X - Y, where U is the set of all the attributes. A 
 multivalued dependency X ■*■■*■ Y is interpreted in propositional logic as 
 the formula X * Y + Z, where + is the logical implication and + is the 
 logical or. The formula X > Y + Z is true if either 
 
 (i) some attributes in X are false, or 
 
 (ii) all the attributes in Y are true, or 
 
- 10 - 
 
 (ill) all the attributes in Z are true, 
 
 A multivalued dependency X >+ Y is trivial [9] if either 
 
 (i) X y Y = U, or 
 
 (ii) Y is empty. 
 For all relations r, a trivial multivalued dependency always holds in r. 
 Similarly, when a trivial multivalued dependency X ++ Y is interpreted 
 as the formula X + Y + Z (where Z = U - X - Y) , then for all truth 
 assignments ip, X + Y + Z is true under ^ (because an empty conjunction 
 of propositional variables is always true). 
 
 Suppose we view dependencies as formulas of propositional logic. 
 The following theorem provides a new necessary and sufficient condition 
 for a dependency a to be a consequence of a set of dependencies Z. 
 
 Theorem 1 ; Let Z be a set of functional and multivalued dependen- 
 cies and let a be a functional or a multivalued dependency. The depen- 
 dency a is a consequence of Z if and only if for all truth assignments 4> 
 (i.e., assignments of truth values to attributes), if all the dependen- 
 cies of Z are true under ^, then a is also true under i}>. 
 
 Proof : This theorem is proved in [14]. [] 
 
 2^_4. The Dependency Basis and the Closure 
 
 Let M be a set of multivalued dependencies, and let F be a set of 
 functional dependencies. Given a set of attributes X, consider the fol- 
 
- 11 - 
 
 lowing set 
 
 ft = {W | X ■*■-*■ W is a consequence of M U F> 
 Note that the elements of this set are sets of attributes. The rules of 
 Complementation, Union and Decomposition for multivalued dependencies 
 imply that there exists a subset of the above set, called the dependency 
 basis of X [_5,_9] , such that 
 
 (a) the sets of the dependency basis are nonempty and their union 
 is U, 
 
 (b) the sets of the dependency basis are pairwise disjoint, and 
 
 (c) if X -►->- Y is a consequence of M, then Y is a union of some sets 
 from the dependency basis. 
 
 The algorithms of [3,11] can handle only multivalued dependencies. 
 In order to decide whether X ++ Y is a consequence of a set of mul- 
 tivalued dependencies £, these algorithms actually construct the depen- 
 dency basis of X, and check whether Y satisfies condition (c) above. 
 Both algorithms are based on the same idea. Since the dependency basis 
 is a partition of U, they start with a subset of ft that is a partition 
 of U (i.e., it satisfies conditions (a) and (b) above) and iterate until 
 condition (c) is satisfied. An initial partition is simply the one that 
 puts every attribute of X in a set by itself and all the attributes in 
 U - X in one set. As long as there is a dependency S ++ T in M and a 
 set P in the current partition such that 
 
 (1) S and P are disjoint, and 
 
 (2) neither T nor U - T are disjoint from P, 
 
 then the set P is replaced in the current partition with P fi T and 
 
- 12 - 
 
 P - (P fiT). A proof that this procedure computes the dependency basis 
 is essentially a result of [3], whose algorithm works in 0(|E| 4 ) time. 
 
 In [11] a refinement of this algorithm, using an appropriate data struc- 
 
 o 
 ture, is shown to run in 0(111") time. 
 
 The following results make it possible to use the algorithms of 
 [3,11] when there are also functional dependencies. 
 
 Lemma_2: Let F = {S -►-► A | S -»■ T is in F, and A e T>. Then X ■»■»> Y 
 is a consequnce of F U M if and only if X ^ Y is a consequnce of F U M. 
 
 Lemma 3: The functional dependency X ■+ A is a consequence of F tf M 
 if and only if 
 
 (i) X ++ A is a consequence of F y M, and 
 
 (ii) there is a functional dependency V + Z (A t- V) in F with A on 
 the right-hand side (i.e., A e Z). 
 
 Lemma 2 and Lemma 3 were originally proved in [3] using the infer- 
 ence rules, however, Lemma 2 has a much shorter proof based on truth 
 assignments [14]. 
 
 The closure of X, denoted by X*, is the set of all attributes A, 
 such that X + A is a consequence of F y M. When there are only func- 
 tional dependencies the closure can be computed in linear time [4]. 
 However, when there are also multivalued dependencies one has to compute 
 the dependency basis of X (with respect to F U M) in order to get the 
 closure of X. The closure is the set of all attributes A, such that {A} 
 is a set of the dependency basis of X and there is a functional depen- 
 
- 13 - 
 
 dency V + Z in F with A on the right-hand side, 
 
 3. The Algorithm 
 
 In this section we consider a set F of functional dependencies, a 
 set M of multivalued dependencies, and a multivalued dependency X ++ Y. 
 We assume that X and Y are disjoint, since X >->• Y is a consequence of 
 F U M if and only if X -»■-»■ Y - X is a consequence of F \J M [9]. We will 
 give an algorithm for deciding whether X ■*-*■ Y is a consequence of F U M. 
 This algorithm is not based on the construction of the dependency basis 
 of X. Instead, we use truth assignments. Consequently, from now on we 
 view dependencies as formulas of propositional logic. The algorithm is 
 derived from the following lemmas. 
 
 Lemma 
 
 4: Let W be a set in the dependency basis of X such that 
 
 W C U - X . If every attribute in W is false and all the other attri- 
 butes are true, then all the dependencies in F tf M are true. 
 
 Proof : The proof is given in [14]. [] 
 
 Lemma 5 : Let W be a set of attributes such that W fi X = <f>. Con- 
 sider the truth assignment under which every attribute in W is false and 
 all the other attributes are true. If all the dependencies of F y M are 
 true under this truth assignment, then W must be contained in one set of 
 the dependency basis of X. 
 
- 14 - 
 
 Proof: The proof is given in [14]. [] 
 
 In order to decide whether X » Y is a consequence of F y M, we try 
 to find a truth assignment under which every dependency in F y M is true 
 and X ~ Y is false. Suppose that X -- Y is not a consequence of F y M. 
 Thus, there is a set W in the dependency basis of X such that WflYM 
 and W t Y. If we assign false to every attribute in W and true to all 
 the other attributes, then I*mma 4 implies that all the dependencies in 
 F U M are true. However, X ~ Y is certainly false. Our algorithm 
 tries to find a set W such as this. 
 
 We pick an attribute B of Y, and use the procedure FIND(B) to find 
 the set W of the dependency basis that contains B. W is computed as 
 follows. Initially W is equal to U - X. While the procedure iterates, 
 the value of W is changed repeatedly in a way such that 
 (i) X -»•-»■ W is always a consequence of F M, and 
 (ii) W always contains B. 
 The current truth assignment is defined to be the one under which every 
 attribute in the current value of W is false and all the other attri- 
 butes are true. Upon termination W is a set of the dependency basis of 
 X. Since both W and Y contain the attribute B, W n Y * <j>. If w Y, 
 then W c U - X and, hence, all the dependencies in F y M are true under 
 the current truth assignment. But X ~ Y is false and, therefore, it is 
 not a consequence of F y M . If W C Y, then B was a poor choice and we 
 have to pick another attribute of Y and repeat the process. If W C Y 
 for all attributes B in Y, then X -- Y is a consequence of F y M. 
 
- 15 - 
 
 The value of W is modified during the execution of FIND(B) in the 
 following way. Suppose that S ++ T is a multivalued dependency of F y M 
 whose left-hand side is true under the current truth assignment. Since 
 X +-► W is a consequence of F P, so are X +■>• W u T and X ++ W - T (a 
 detailed proof of this fact is given later). If 3 £ W ft T we can 
 replace W with W fi T; otherwise, we replace it with W - T. (Note that B 
 is always contained in the current value of W.) Another case to he con- 
 sidered is a functional dependency S +• T whose left-hand side is true. 
 If B e T, then {B> is a set of the dependency basis of X; otherwise, we 
 replace W with W - T as in the previous case. The procedure FIND is 
 given in Figure 2, and the complete algorithm is given in Figure 3. 
 
 Lemma 6 : Whenever line 4 of the procedure FIND (in Figure 2) is 
 reached the following is true: 
 
 (a) X ■*■-> W is a consequence of F y M, 
 
 (b) every dependency that is not on QUEUE is true under the current 
 truth assignment, and 
 
 (c) every dependency that has been put on QUEUE has a left-hand 
 side that is true under the current truth assignment. 
 
 Proof : Induction on the number of times the loop of lines 4-12 is 
 
 executed. In this proof, W denotes the value of W after k iterations 
 
 of the loop of lines 4-12. The k fc truth assignment is the one under 
 
 which every attribute in W is false and all the other attributes are 
 
 k 
 
 true. 
 
- 16 - 
 
 procedure FIND(B) 
 begin 
 
 (1) make QUEUE empty 
 
 (2) W = U - X 
 
 (3) put every dependency with a left-hand side that is 
 currently true (i.e., disjoint from W) on QUEUE 
 
 (4) while QUEUE is not empty do 
 
 remove a dependency a (with a right-hand side T) from QUEUE 
 (6) if B e T 
 
 then if a is a functional dependency then return {B} 
 
 ( 8 ) else W = W fi T 
 
 (9) else W = W - T 
 
 (10) 
 (11) 
 
 lor every dependency r (with a left-hand side S) do 
 
 _if S is currently true, and x has not been on QUEUE 
 
 ( 12 ) then put T on QUEUE 
 end 
 
 end 
 
 (13) return W 
 end 
 
 Figure 2 
 
- 17 - 
 
 begin 
 
 (1) for every attribute B in Y do 
 
 (2) W = FIND(B) 
 
 (3) if W t Y then return false 
 end 
 
 (4) return true 
 end 
 
 Figure 3 
 
 Basis . Initially X -*■-*■ W is a consequence of F i}M, since 
 
 W = u - X. Every dependency that is not on QUEUE has a left-hand side 
 
 which is false and, hence, it is true. Also, every dependency on QUEUE 
 has a true left-hand side. 
 
 Induction . Suppose that the induction hypothesis is true after 
 (k-1) iterations. Let o be the dependency that is removed from QUEUE in 
 the k iteration. Either o is a functional dependency S ^ T or o is a 
 multivalued dependency S ++ T. Since o is in F I) M, it follows that in 
 either one of these cases S ■*■* T is a consequence of F ]} M. If a is a 
 functional dependency, then we can assume that B |t T; otherwise, line 4 
 is not reached again and the induction hypothesis certainly remains 
 valid. 
 
- 18 - 
 
 At first, we will show that X ♦■»■ W is a consequence of F y M. In 
 the proof we use the inference rules of Section 2.2. According to part 
 (c) of the induction hypothesis, S is true under the (k-l) St truth 
 assignment and, therefore, S C U - W^. By applying augmentation to 
 S ■*-*■ T, we- infer that U - W ♦•> T is a consequence of F y M. Part (a) 
 of the hypothesis implies that X ++ W is a consequence of F y M, and 
 by complementation so is X ++ U - W^. X ++ w DT follows from 
 X ++ U - W k _ x and. U - W^ ++ T by transitivity, and now an application 
 of the decomposition rule shows that X +■»■ W - T is also a consequence 
 of F y M. But W fc is either W^ n T or W - T, and so X ++ W is a 
 consequence of F y M. 
 
 As for part (c) of the hypothesis, the left-hand side of every 
 dependency that has already been on QUEUE (before the k th iteration) 
 remains true, because W fc _C W . The new dependencies that are put on 
 QUEUE in the loop of lines 10-12 (during the k th iteration) certainly 
 have true left-hand sides. Thus, parts (a) and (c) of the induction 
 hypothesis remain valid. 
 
 Finally, if a dependency has not been put on QUEUE during the first 
 k iterations, then its left-hand side is false and, hence, the depen- 
 dency is true. The dependency a (which is the one removed from QUEUE in 
 
 , th 
 the k iteration) is currently true, since either W, fi T = cf> or W CT 
 
 k k — 
 
 (and only W fc tl T = <f> is possible if a is a functional dependency) . Now, 
 let t be a dependency that was removed from QUEUE during one of the 
 first (k-1) iterations. There are two cases to be considered. 
 
- 19 - 
 
 Case 1. Suppose that t is a multivalued dependency P *♦ R. Part 
 
 (c) of the hypothesis implies that P C U - W^ and, hence, it follows 
 
 from part (b) that either R fi W^ = * or W^ C R. Since W k C W^, 
 
 either RtiW - <j) or W, C R and, therefore, P ++ R remains true, 
 k k 
 
 Case 1- Suppose that T is a functional dependency P - R. As was 
 
 . . th 
 shown in Case 1, P C U - W^, and since T must be true before the k 
 
 iteration, R ft W^ = 4>. But W fc C W^ and, therefore, R ^ k = * and 
 P ->- R remains true. 
 
 Thus, part (b) of the hypothesis also holds after the k itera- 
 tion. [] 
 
 Lemma 7 : The procedure FIND returns a set W of the dependency basis 
 of X that contains the attribute B (B t X) . 
 
 Proof : The procedure always terminates, since every dependency is 
 put on QUEUE at most once. Also, W always contains the attribute B. If 
 W is returned in line 13, then Lemma 6 implies that all the dependencies 
 in FP are true under the current truth assignment and X *♦ W is a 
 consequence of F U M. Since W is disjoint from X, it follows from Lemma 
 5 that W must be a set of the dependency basis of X. 
 
 Suppose that W is returned in line 7 at the k iteration. Let 
 S + T be the functional dependency a that is removed from QUEUE in this 
 iterartion. Since B e T, S * B is a consequence of F U M. Lemma 6 
 implies that S and W^ are disjoint, and X ~ W^ is a consequence of 
 F y M. It follows from the inference rule FD-MVD2 that X - B is also a 
 
- 20 - 
 
 consequence of F D M and, therefore, {B> is a set of the dependency 
 basis of X. [] 
 
 Lemma 8 ; The algorithm of Figure 3 correctly decides whether X ■*■-► Y 
 is a consequence of F U M. 
 
 Proof: The algorithm certainly terminates. If we reach line 3 with 
 a set W such that W t Y, then W is a set of the dependency basis of X 
 that contains more than one attribute (because W ft Y * <{>) . Therefore, 
 W _C U - X . Consider the truth assignment under which every attribute 
 in W is false and all the other attributes are true. Lemma 4 implies 
 that all the dependencies in F tf M are true under this truth assignment. 
 However, X ++ Y is false and, hence, X ■*■+ Y cannot be a consequence of 
 
 f y m. 
 
 For every B in Y let W B be the value of W that is obtained for B in 
 
 line 2 of the algorithm (Figure 3). Suppose that for no B, W t Y. 
 
 B 
 
 Then Y = |)W . Since X -►-»• W is a consequence of F y M for all B in Y. 
 
 B£Y ti B 
 
 X +•*■ Y is also a consequence of F U M. [] 
 
 In the following sections we describe an efficient implementation 
 of this algorithm. 
 
 _4. A Simple Data Structure 
 
 We consider a universe with n elements that can be represented by 
 the numbers l,2,...,n. The following two operations are defined on sets 
 
- 21 - 
 
 S and W 
 
 (1) INTERSECT(S.W). Replace W with W fi S. 
 
 (2) DELETE (S,W). Replace W with W - S. 
 
 We assume that a set W is given an initial value, and a sequence of 
 
 m operations with arguments (S.,W) has to be executed on-line. Since 
 
 the second argument is always W it can be omitted. In the sequel we 
 
 m 
 give an implementation that requires 0(n+ Z |S.|) time to execute this 
 
 J-l J 
 
 sequence. The set W is represented as a doubly linked list [12] using 
 
 the arrays U[l:n], LEFT[l:n], and RIGHT[l:n]. U(i) is one if i e W; 
 otherwise, it is zero. If i e W, then LEFT(i) and RIGHT(i) contain the 
 elements that are on the left and on the right of i, respectively. This 
 structure can be initialized in 0(n) time. DELETE(S) is executed by 
 assigning zero to U(i) and updating LEFT(i) and RIGHT(i) for all i in S. 
 To execute INTERSECT(S) an auxilary array AUX[l:n] is used. Initially 
 all the entries of this array are zero. Given a set S, we assign 1 to 
 AUX(i) for all i e S. Next, the list representing W is traversed. For 
 each element on the list we check (using the array AUX) whether it is in 
 S; if it is not, then the element is deleted. Finally, using the origi- 
 nal representation of S we assign zero to AUX(i) for all i e S. 
 
 This scheme requires 0(|S|) time to execute DELETE(S), and 
 0(|W|+|S|) time to execute INTERSECT(S) . The cost of DELETE(S) is dis- 
 tributed between the elements of S. The cost of INTERSECT(S) is divided 
 to 0(|S|)+0(|W-S|). The cost 0(|S|) is distributed between the elements 
 of S, and the cost 0(|W-S|) is distributed between the elements that are 
 deleted from W. Since an element can be deleted from W at most once, 
 
- 22 - 
 
 m 
 the total cost is 0(n+ Z |S |). 
 
 j = l j 
 
 Note that it is possible to get a list of all the elements that are 
 deleted from W in each operation without increasing the time complexity. 
 
 !• Implementation of the Algorithm 
 
 In this section we describe an implementation of the procedure FIND 
 that works in 0(|F»M|) time. We assume that the attributes are 
 A l' A 2""' A n and they are represented by the numbers 1,2,..., n. We also 
 associate numbers l,2,...,k with the dependencies of F U M. 
 
 Consider the procedure FIND in Figure 2. The implementation of 
 line 3 and lines 10-12 requires a linked list LIST(i) for all attributes 
 A ± . LIST(i) contains all the dependencies with a left-hand side that 
 includes A.. There is also a list, denoted as LIST(O) , that contains 
 all the dependencies with the empty set on the left-hand side. (The 
 attribute A Q corresponding to LIST(O) is a dummy attribute, i.e., it is 
 not included in U. ) In addition to these lists there is a counter C(j) 
 for all dependencies j. Initially C( j ) is equal to the number of attri- 
 butes on the left-hand side of j. During the execution of the algo- 
 rithm, C(j) is the number of attributes on the left-hand side of j that 
 ire false. These lists and counters can be initialized in time 
 '(|F UM|). The implemented procedure is given in Figure 4. 
 
 The procedure MAKE_TRUE (given in Figure 5) is used to update the 
 counters whenever some attributes are deleted from W (and, hence, become 
 
- 23 - 
 
 procedure FIND(B) 
 begin 
 
 (1) initialize the counters C(j) and the lists LIST(i) 
 
 (2) make QUEUE empty 
 
 (3) W = U - X 
 
 /* initialize W according to the data structure of Section 4 */ 
 
 (4) MAKE_TRUE(X U {A Q }) 
 
 (5) while QUEUE is not empty do^ 
 
 (6) remove a dependency a (with a right-hand side T) from QUEUE 
 
 (7) if B £ T 
 
 (8) then if a is a functional dependency then return {B} 
 
 ( 9 ) else W = W n T 
 
 (10) else W = W - T 
 
 (11) let R be the set of all the attributes 
 that are deleted from W 
 
 (12) MAKE_TRUE(R) 
 end 
 
 (13) return W 
 
 end 
 
 Figure 4 
 
- 24 - 
 
 procedure MAKE__TRUE(P) 
 begin 
 
 (14) for every attribute A in P do 
 
 i — 
 
 (15) for every dependency j in LIST(i) do 
 C16) C (j) = C(j) - 1 
 
 ( 17) M C (J) = then put j on QUEUE 
 
 end 
 
 end 
 
 end 
 
 Figure 5 
 
 true). When a counter becomes zero its associated dependency is put on 
 QUEUE. 
 
 Lemma_9: The procedure of Figure 4 requires max{0( |U| ) ,0( | F UM|)> 
 time. 
 
 Proof: The initialization step (line 1) requires 0(|F \) M| ) time. 
 Line 2 requires a constant time and line 3 requires 0(jU|) time. The 
 total cost of all the calls to the procedure MAKE_TRUE in lines 4 and 12 
 can be computed as follows. The cost of executing the loop of lines 
 14-17 for A ± is distributed between the dependencies on LIST(i) . Put- 
 ting a dependency j on QUEUE requires a constant time, since only a 
 pointer has to be moved. During the execution of FIND(B) each LIST(i) 
 
- 25 - 
 
 is traversed at most once, and the cumulative cost for each dependency 
 is proportional to the length of its left-hand side. The total cost of 
 all the calls is no more than 0(|F UM|). Lines 9-11 are implemented 
 using the data structure of Section 4. Since each dependency is put on 
 QUEUE no more than once, the total cost of lines 5-12 is 0(|U|+|F \} M|). 
 Thus, the running time of the procedure is max{0( |U | ) ,0( | F UM|)>. [] 
 
 If we assume (1) that |U| = 0(|F U M| ) , then the procedure FIND 
 requires 0(|F tf M| ) time. However, if only m (m<n) attributes from U 
 appear in the input, then their representation by the numbers 1,2,... ,m 
 requires the use of a dictionary [1]. This may take 0(min{ |U | ,mlogm>) 
 
 time. If we assume that the attributes in the input are already 
 
 (2) 
 represented by the numbers 1,2 n and |X tf Y| =0(|F UM|), then 
 
 the following theorem can be proved. 
 
 Theorem 10 : Given a set F of functional dependencies, a set M of 
 multivalued dependencies, and a multivalued dependency X ++ Y, we can 
 decide whether X -►-»■ Y is a consequence of F U M in 0(|F l> M| • |Y| ) time. 
 
 Proof : The proof is an obvious implication of Lemma 9, unless some 
 
 of the attributes in U are neither in F U M nor in X ++ Y. We define U' 
 
 to be the set of all the attributes that appear in F U M or in X ++ Y. 
 
 Let U' = <B ,B„,...,B >. If U' * U, then we add another attribute B 
 12 m 
 
 to IT. This additional attribute represents all the attributes in U 
 that are not included in U' . Now we can use U' instead of U in the pro- 
 
 (1) This assumption is made in [3] and in [11]. 
 
 (2) If this part of the assumption is omitted, then the running time in 
 Theorem 10 should be 0(max{|F U M| , I X U Y| }• | Y | ) . 
 
- 26 - 
 
 cedure FIND. The correctness of this modification follows from the 
 
 observation that all the attributes of U that do not appear in F U M or 
 
 in X ++ Y must be contained (probably with some other attributes) in one 
 set of the dependency basis. [] 
 
 &cample_2: Let F y M = {A+-BC,A—CD>. A ^ C is always a conse- 
 quence of Fi)M regardless of the other attributes in U. a ++ E is a 
 consequence of F U M if and only if A,B,C,D, and E are the only attri- 
 butes in U. In both cases the algorithm works correctly. [] 
 
 In view of the above example it may be unclear where the rule of 
 Complementation is used in the algorithm. Actually it is not. The 
 algorithm uses a weaker version of the rule of Complementation, namely 
 4) ++ U. It should not be surprising that this can be done, since in [6] 
 it has already been shown that if the rule of Complementation (MVDO) is 
 replaced with the rule <j> +> U, then the resulting set of inference rules 
 remains complete. 
 
 S, Modifying the Algorithm for Functional Dependencies 
 
 Suppose that we have to decide whether a functional dependency 
 ' A is a consequence of F tf M. This can be done using Lemma 3. That 
 is, we have to check that X ++ A is a consequence of F y M and there is 
 another functional dependency V + Z (A t V) in F U M with A e Z. Using 
 the algorithm of Figure 3, we can test in linear time whether X -»■•»■ A is 
 a consequence of F y M. Testing whether there is a functional depen- 
 
- 27 - 
 
 dency V + Z in F U M with A e Z is also linear. 
 
 Theorem 11 : Given a set F of functional dependencies, a set M of 
 multivalued dependencies, and a functional dependency X ♦ Y, we can test 
 whether X + Y is a consequence of F UM in 0(|F UM|«|Y|) time. 
 
 Proof: First, we test in linear time whether for all A e Y, there 
 is a functional dependency in F with A on the right-hand side but not on 
 the left-hand side. Next, line 3 of the algorithm of Figure 3 is modi- 
 fied to 
 
 if_ W * {B> then return false 
 and the algorithm is applied to F U M and X ■*■ Y. [] 
 
 7. Construction of the Dependency Basis and the Closure 
 
 The algorithm for deciding whether a multivalued dependency X *+ Y 
 is a consequence of F U M can also be used to construct the dependency 
 basis of X and, in particular, the closure of X. 
 
 The procedure FIND is also the core of the algorithm for the con- 
 struction of the dependency basis. We choose an attribute A ± , and com- 
 pute FIND(A.). The set W obtained in this way is the set of the depen- 
 dency basis that contains A.. We repeat this process with another 
 attribute that is not in W, until we find all the sets of the dependency 
 basis. The algorithm is described in Figure 6. The array A[l:n] indi- 
 cates which attributes are contained in the sets of the dependency basis 
 that have already been found. If A(i) = 1, then a set of the dependency 
 
- 28 - 
 
 begin 
 
 (1) initialize A(i) 
 
 (2) i = 1 
 
 (3) while i < n jio 
 
 (4) while A(i) = 1 and i < n do 
 
 (5) i = i + 1 
 end 
 
 (6) 
 
 
 if i > n then return 
 
 (7) 
 
 
 W = FINDCA.) 
 
 (8) 
 
 
 print W 
 
 (9) 
 
 
 for every A^ in W do 
 
 (10) 
 
 end 
 
 A(k) = 1 
 end 
 
 end 
 
 Figure 6 
 
 basis that contains the attribute A. has already been found; if 
 A(i) = 0, then no such set has yet been found. Initially A(i) is 1 for 
 all attributes A. in X, and for all the other attributes. 
 
 Theorem 12 : Let F be a set of functional dependencies, M a set of 
 multivalued dependencies, and X a set of attributes. The dependency 
 
- 29 - 
 
 basis of X and the closure of X can be found in 0(|F U M| «p) time, where 
 p is the number of sets in the dependency basis of X. 
 
 Proof : Obviously, line 7 of the algorithm is executed no more than 
 p times. The rest of the algorithm requires 0(|F U M| ) time. [] 
 
 Since the running time is proportional to p, it is a good approach 
 
 * 
 to apply the linear time algorithm of [A] to F in order to find ^ (the 
 
 closure of X with respect to F) , and then apply the algorithm of Figure 
 
 * 
 
 6 to X instead of X. 
 
 F 
 
 If we are interested only in X , it can be constructed in 
 0(|F l> M| - |F| > time (since only the attributes that appear in the 
 right-hand side of the functional dependencies in F may be in X ) . 
 
 References 
 [1] Aho, A. V., J. E. Hopcroft, and J. D. Ullman, The Design and 
 Analysis of Computer Algorithms , Addison-Wesley , Reading, Mass., 
 
 1974. 
 
 [2] Armstrong, W. W., "Dependency Structures of Database Relation- 
 ships," Proc . IFIP 74, North Holland, 1974, pp. 580-583. 
 
 [3] Eeeri, C. , "On the Membership Problem for Multivalued Dependencies 
 in Relational Databases," TR-229, Dept. of Elec Eng. and Comp. 
 Sci., Princeton University, Princeton, N. J., Sept., 1977. 
 
 [4] Eeeri, C, and P. A. Bernstein, "Computational Problems Related to 
 the Design of Normal Form Relational Schemas ," ACM Trans, on 
 
- 30 - 
 
 Database Systems , Vol. 4, No. 1 (March, 1979), pp. 30-59. 
 [5] Beeri, C. , R. Fagin, and J. H. Howard, "A Complete Axiomatization 
 for Functional and Multivalued Dependencies in Database Rela- 
 tions," Proc. ACM SIGMOD Int . Conf . on Management of Data , 
 Toronto, Aug., 1977, pp. 47-61. 
 
 [6] Biskup, J., "On the Complementation Rule for Multivalued Dependen- 
 cies in Database Relations," Acta Informatica . Vol. 10, No. 3 
 (1978), pp. 297-305. 
 
 [7] Codd, E. F., "A Relational Model for Large Shared Data Banks," 
 Comm. ACM, Vol. 13, No. 6 (June, 1970), pp. 377-387. 
 
 [8] Delobel, C. , "Contributions Theoretiques a la Conception d'un Sys- 
 teme d'Inf ormations ," Ph.D. thesis, University of Grenoble, Oct., 
 1973. 
 
 [9] Fagin, R. , "Multivalued Dependencies and a New Normal Form for 
 Relational Databases," ACM Trans , on Database Systems , Vol. 2, No. 
 3 (Sept., 1977), pp. 262-278. 
 
 [10] Fagin, R., "Functional Dependencies in a Relational Database and 
 Propositional Logic," IBM J. of Res. and Dev. , Vol. 21, No. 6 
 (Nov., 1977), pp. 534-544. 
 
 [11] Hagihara, K. , M. Ito, K. Taniguchi, and T. Kasami, "Decision Prob- 
 lems for Multivalued Dependencies in Relational Databases," to 
 appear SI AM J. Computing . 
 
 [12] Knuth, D. E., Jhe Art of Computer Programming ; Fundamental 
 Algorithms, Vol. 1, Addison-Wesley, Reading, Mass., 1968. 
 
 [13] Mendelzon, A. 0., "On Axioraatizing Multivalued Dependencies in 
 Relational Databases," J. ACM, Vol. 26, No. 1 (Jan. 1979), pp. 
 
- 31 - 
 
 37-44. 
 
 14] Sagiv, Y. , and R. Fagin, "An Equivalence Between Relational Data- 
 base Dependencies and a Subclass of Propositional Logic," IBM 
 Research Report RJ2500, March, 1979. 
 
 15] Zaniolo, C , "Analysis and Design of Relational Schemata for Data- 
 base Systems," Tech. Rep. UCLA-ENG-7769, Dept. of Comp. Sci., 
 UCLA, July, 1976. 
 
BIBLIOGRAPHIC DATA 
 SHEET 
 
 4. Tit le .inJ Subt it le 
 
 1. Report No. 
 
 UIUCDCS-R-79-954 
 
 An Algorithm for Inferring Multivalued Dependencies that 
 Works Also for a Subclass of Prepositional Logic 
 
 7. Author(s) 
 
 Yehoshua Sagiv 
 
 >. Performing Organization Name and Address 
 
 Department of Computer Science 
 University of Illinois 
 Urbana, Illinois 61801 
 
 !2. Sponsoring Organization Name and Address 
 
 National Science Foundation 
 Washington, D.C. 
 
 !S. Supplementary Notes 
 
 3. Recipient's Accession No. 
 
 5. Report Date 
 
 6. 
 
 January 1979 
 
 8. Performing Organization Rept. 
 No. 
 
 10. Project/Task/Work Unit No. 
 
 11. Contract /Grant No. 
 
 NSF MCS 77-22830 
 
 13. Type of Report & Period 
 Covered 
 
 14. 
 
 16. Abstracts 
 
 P is the number of sets in the deDPnriPnrv h,c c? found in 0(|z|.p) time, where 
 
 procedure for this subclass of prepositional logic. ertlclent resolution 
 
 ■lords and Document Analysis. 17a. Descri 
 
 ptors 
 
 ^ab t ase: 1 p^osn?o^,ta; : c Va,Ued ^"^ ^^ P ™ blem ' "™™« 
 
 r b. Identifiers Open-Ended Te 
 
 -ATI Fie Id /Group 
 • Availability Statement 
 
 RELEASE UNLIMITED 
 
 «M NTIS-35 (10-7C 
 
 19. Security Class (This 
 Report ) 
 
 UNCLASSIFIED 
 
 20. Security Class (This 
 
 Page 
 UNCLASSIFIED 
 
 21. No. of Pages 
 33 
 
 22. Price 
 
 USCOMM-DC 40329-P7 1 
 
' 
 

AUG 1 » i»ap 
 
V-*o5£ 
 
 ! 
 
 r