■ UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN The person charging this material is re- sponsible for its return to the library from which it was withdrawn on or before the Latest Date stamped below. Theft, mutilation, and underlining of books are reasons for d.scplinary action and may result in dismissal from the University. To renew call Telephone Center, 333-8400 UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN Digitized by the Internet Archive in 2013 http://archive.org/details/algorithmforinfe954sagi [* MM W Report No. UIUCDCS-R-79-954 x ^ AN ALGORITHM FOR INFERRING MULTIVALUED DEPENDENCIES THAT WORKS ALSO FOR A SUBCLASS OF PROPOSITIONAL LOGIC UILU-ENG 79 1714 by Yehoshua Sagiv January 1979 NSF MCS77-22830 3ME aBRARX Of. IHE JUN 2 91979 UNivEwsiry of Illinois UIUCDCS-R-79-954 An Algorithm for Inferring Multivalued Dependencies that Works Also for a Subclass of Propositional Logic Yehoshua Sagiv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, Illinois 61801 January 1979 This research was supported in part by the National Science Foundation under grant MCS-77-22830 An Algorithm for Inferring Multivalued Dependencies that Works Also for a Subclass of Propositional Logic Yehoshua Sagiv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, Illinois 61801 ABSTRACT In this paper we give an algorithm for deciding whether a func- tional or a multivalued dependency o is implied by a set of functional and multivalued dependencies I. The running time of the algorithm is 0(|E|-|Y|), where Y is the right-hand side of a. We also investigate the problem of constructing the dependency basis of a set of attributes X. We show that the dependency basis can be found in 0(|Z|-p) time, where p is the number of sets in the dependency basis. Since functional and multivalued dependencies correspond to a subclass of propositional logic (that can be viewed as a generalization of Horn clauses), our algorithm is also an efficient resolution procedure for this subclass of propositional logic. - 2 - _1. Introduction The relational model for databases [7] uses dependencies to express constraints that the data must satisfy. Functional and multivalued dependencies are the most common types of dependencies, and their pro- perties have been investigated thoroughly (e.g., [2,5,6,8,9,13,15]). In many applications, such as the logical design of relational databases, part of the problem is to determine when a set of dependen- cies Z implies another dependency o. This problem is usually referred to as the mermbe^sJ^^bAem. (1) Both functional and multivalued depen- dencies have inference rules [2,5] that can be used to infer o from Z if (and only if) a is implied by Z. Using these rules, [4] gives a linear time algorithm for the membership problem when there are only functional dependencies. In [3] a similar approach is used to develop an 0(|Z| 4 ) time algorithm for the membership problem when there are functional and multivalued dependen- cies. In [11] a refinement of this algorithm (based on an appropriate data structure) is shown to run in 0(min(k 2 . |U| , | Z | 2 ) ) time, where k is the number of dependencies in Z and U is the set of all the attributes. Recent results [10,14] show that functional and multivalued depenedencies can be interpreted also as formulas of propositional logic. Functional dependencies correspond to the well known Horn clauses, while multivalued dependencies correspond to a subclass of Tl) Tu TZZt^Tlt" fr0m ? 6 f3Ct that We W3nt t0 determine whether o^is a member of the set of all the dependencies that are implied by - 3 - propositional logic that apparently has not been discussed In the literature so far. In this paper we give a new membership algorithm for functional and multivalued dependencies. The algorithm is based on the interpretation of these dependencies as formulas of propositional logic. The running time of the algorithm is 0(|£|-|Y|), where Y is the right-hand side of o. This algorithm is not only faster than the algorithm of [11], it also requires a simpler data structure and, hence, has an easier imple- mentation. In addition to that, in [11] it is assumed that all the functional dependencies have only one attribute on the right-hand side. This assumption may increase the size of the input substantially. On the other hand, the algorithm in this paper can handle directly func- tional dependencies that have a set of attributes on the right-hand side. 2. The Relational Model for Databases 2.1. Basic Definitions The relational model for databases assumes that the data is stored in tables called relations . The columns of a table correspond to attributes , and the rows to records or tuples . Each attribute has an associated domain of values. It is convenient to regard the tuples as mappings from the attributes to thier domains, since no canonical order- ing of the attributes is needed in this way. If u is a tuple of a rela- tion r and A is an attribute, then u(A) is the A-component of U. - 4 - A relation scheme is a set of attributes labeling the columns of a table. We often use the relation scheme itself as the name of the table. A relation can be viewed as the "current value" of a relation scheme. Bcamnle_l: Suppose that the attributes are P (part-number), Q (quantity), S (supplier), and C (city). These attributes are grouped into one relation scheme which is the set {P,Q,S,C>. We usually denote a set of attributes simply as a string of attributes, e.g., in this case the relation scheme is PQSC. Figure 1 gives a relation that might be a "current value" of the above relation sceme. Note that multiple copies of the same tuple are not allowed. This relation represents information about parts, their quantities, the suppliers who supply these parts, and the cities in which a supplier has a distribution center. It is assumed that a supplier may have a distribution center in more than one city, and all these centers supply all the parts that are produced by the supplier. [] _2«_2« Dependencies In many cases the data must satisfy certain constraints. For exam- ple, consider the relation of Figure 1. Since a part number uniquely identifies a part, a value for P determines a unique value for Q. That is, we cannot have in the relation two distinct tuples with the same value for P and different values for Q. - 5 - p c S 1 438 | 1300 | Smith | | 34) 50 j Smith | | 43P 1300 Smith | | 341 50 Smith | 1C8 200 Jones | 204 | 150 Jones New York Chicago | Chicago New York Boston Boston Figure 1 A constraint of another type is imposed by the fact that there is no dependency of a part and its quantity on a particular distribution center (of a supplier who supplies this part). Since the relation must reflect this fact, it is required that for any possible value of S, every combination of a value for C and values for P and Q (for this value of S) will appear in some tuple of the relation. Constraints such as these are described by functional [2,7] and multivalued [5,8,9,15] dependencies. A functional dependency is a statement of the form X + Y, where both X and Y are sets of attributes. A relation r satisfies the functional dependency X -> Y (or X -> Y holds in r) if and only if there are no two tuples in r that have the same value for all attributes in X and different values for an attribute in Y. Note that the functional dependency P + Q holds in the relation of - 6 - Figure 1. A multivalued dependency is a statement of the form X ♦♦ Y, where X and Y are sets of attributes. Let r be a relation on the set of attri- butes U, and let Z be the set of all the attributes in U that are nei- ther in X nor in Y. Let p be a tuple of r, and let W be a subset of U. M[W] is a tuple defined on the attributes of W, such that for all A in W, M[W](A) = u(A). That is, u[W] denotes the values of u for the attri- butes of W. The multivalued dependency X +♦ Y holds in r if and only if for all Uj and u 2 in r, if u^X] = y^X], then there are y 3 and y in r such that (i) u 3 [X] = u^X], M 3 [Y] = MjtY], and u [Z] - M,[Z] (ii) m 4 [X] = U 2 [X], n 4 [T] = M 2 [Y], and y^fZ] = p^Z]. In other words, X ♦♦ Y means that the set of Y-values associated with a particular X-value must be independent of the values of the other attri- butes. Note that the multivalued dependency S ♦♦ PQ holds in the rela- tion of Figure 1. A problem of practical importance is to determine when a dependency is implied by some other dependencies. Formally, we say that a depen- dency a is a consequence of a set of dependencies I if for all relations r, o holds in r if all the dependencies of Z holds in r. Previous works in this area [2,5] showed the existence of inference rules that are instrumental in solving this problem. These rules are complete in the sense that a is a consequence of I if and only if a can be inferred from ^ by a sequence of applications of the rules. The rules are divided into three groups reflecting the fact that the set I U {a} may contain - 7 - only functional dependencies, or only multivalued dependencies, or a mixture of both. Following [5] we now give a list of these rules. We use letters from the begining of the alphabet to denote single attributes, and letters from the end of the alphabet to denote sets of attributes. XY, where both X and Y are sets of attributes, denotes the union of X and Y. U is the set of all the attributes. FD Rules FD1 (Reflexivity) : If Y _C X, then X * Y. FD2 (Augmentation) : If Z C W and X ■*■ Y, then XW + YZ. FD3 (Transitivity): If X + Y and Y -»■ Z, then X * Z. There are three additional rules that are implied by the above rules. FD4 (Pseudotransitivity) : If X + Y and YW + Z, then XW * Z. FD5 (Union): If X -»• Y and X + Z, then X ■> YZ . FD6 (Decomposition) : If X ■*■ YZ, then X + Y and X > Z. The rules of Union and Decomposition imply that we can always replace a set of functional dependencies with an equivalent set that has only functional dependencies of the form X + A (i.e., the right-hand side contains a single attribute). - 8 - MVP Rules MVDO (Complementation) : Let X, Y, and Z be sets of attributes such that their union is U and Y n Z C X. Then X ++ Y if and only if X +* Z. MVD1 (Reflexivity): If Y C X, then X ♦+ Y. MVD2 (Augmentation) : If Z _C W and X >■> Y, then XW *•»> YZ. MVD3 (Transitivity): If X ■«■ Y and Y >-»• Z, then X ++ Z - Y. The following rules follow from the above MVD Rules. MVD4 (Pseudotransitivity) : If X -►-► Y and W ^+ Z, then XW +-► Z - YW. MVD5 (Union): If X ++ Y and X ++ Z, then X ++ YZ. MVD6 (Decomposition) : If X ++ Y and X -»•■► Z, then X ++ Y ft Z, X -»•■»■ Y - Z, and X ♦■»- Z - Y. Note that the rules of Transitivity, Pseudotransitivity, and Decom- position are more restricted than the corresponding FD Rules. The FD Rules are sufficient when there are only functional depen- dencies, and the MVD Pules are sufficient when there are only mul- tivalued dependencies. When there are both functional and multivalued dependencies, we need the FD Rules and the MVD Rules as well as the fol- lowing rules. - n FD-MVD Rules FD-MVD1: If X + Y, then X +♦ Y. FD-MVD2: If X ++ Z and Y ♦ Z' where Z' C Z and Y and Z are disjoint, then X ->■ Z'. FD-MVD 3: If X >+ Y and XY + Z, then X + Z - Y. The last two rules are equivalent, i.e., either one of them can be omitted . 2.3. Dependencies and Propositional Logic Functional and multivalued dependencies can be interpreted also as formulas of propositional logic. If attributes are considered to be propositional variables and a set of attributes denotes the conjunction of all the attributes that are contained in it, then a functional depen- dency X ->• Y is also a formula of propositional logic, where + is now the usual logical implication. That is, X -»■ Y is true if and only if some attributes (i.e., propositional variables) in X are false or all the attributes in Y are true. Let Z = U - X - Y, where U is the set of all the attributes. A multivalued dependency X ■*■■*■ Y is interpreted in propositional logic as the formula X * Y + Z, where + is the logical implication and + is the logical or. The formula X > Y + Z is true if either (i) some attributes in X are false, or (ii) all the attributes in Y are true, or - 10 - (ill) all the attributes in Z are true, A multivalued dependency X >+ Y is trivial [9] if either (i) X y Y = U, or (ii) Y is empty. For all relations r, a trivial multivalued dependency always holds in r. Similarly, when a trivial multivalued dependency X ++ Y is interpreted as the formula X + Y + Z (where Z = U - X - Y) , then for all truth assignments ip, X + Y + Z is true under ^ (because an empty conjunction of propositional variables is always true). Suppose we view dependencies as formulas of propositional logic. The following theorem provides a new necessary and sufficient condition for a dependency a to be a consequence of a set of dependencies Z. Theorem 1 ; Let Z be a set of functional and multivalued dependen- cies and let a be a functional or a multivalued dependency. The depen- dency a is a consequence of Z if and only if for all truth assignments 4> (i.e., assignments of truth values to attributes), if all the dependen- cies of Z are true under ^, then a is also true under i}>. Proof : This theorem is proved in [14]. [] 2^_4. The Dependency Basis and the Closure Let M be a set of multivalued dependencies, and let F be a set of functional dependencies. Given a set of attributes X, consider the fol- - 11 - lowing set ft = {W | X ■*■-*■ W is a consequence of M U F> Note that the elements of this set are sets of attributes. The rules of Complementation, Union and Decomposition for multivalued dependencies imply that there exists a subset of the above set, called the dependency basis of X [_5,_9] , such that (a) the sets of the dependency basis are nonempty and their union is U, (b) the sets of the dependency basis are pairwise disjoint, and (c) if X -►->- Y is a consequence of M, then Y is a union of some sets from the dependency basis. The algorithms of [3,11] can handle only multivalued dependencies. In order to decide whether X ++ Y is a consequence of a set of mul- tivalued dependencies £, these algorithms actually construct the depen- dency basis of X, and check whether Y satisfies condition (c) above. Both algorithms are based on the same idea. Since the dependency basis is a partition of U, they start with a subset of ft that is a partition of U (i.e., it satisfies conditions (a) and (b) above) and iterate until condition (c) is satisfied. An initial partition is simply the one that puts every attribute of X in a set by itself and all the attributes in U - X in one set. As long as there is a dependency S ++ T in M and a set P in the current partition such that (1) S and P are disjoint, and (2) neither T nor U - T are disjoint from P, then the set P is replaced in the current partition with P fi T and - 12 - P - (P fiT). A proof that this procedure computes the dependency basis is essentially a result of [3], whose algorithm works in 0(|E| 4 ) time. In [11] a refinement of this algorithm, using an appropriate data struc- o ture, is shown to run in 0(111") time. The following results make it possible to use the algorithms of [3,11] when there are also functional dependencies. Lemma_2: Let F = {S -►-► A | S -»■ T is in F, and A e T>. Then X ■»■»> Y is a consequnce of F U M if and only if X ^ Y is a consequnce of F U M. Lemma 3: The functional dependency X ■+ A is a consequence of F tf M if and only if (i) X ++ A is a consequence of F y M, and (ii) there is a functional dependency V + Z (A t- V) in F with A on the right-hand side (i.e., A e Z). Lemma 2 and Lemma 3 were originally proved in [3] using the infer- ence rules, however, Lemma 2 has a much shorter proof based on truth assignments [14]. The closure of X, denoted by X*, is the set of all attributes A, such that X + A is a consequence of F y M. When there are only func- tional dependencies the closure can be computed in linear time [4]. However, when there are also multivalued dependencies one has to compute the dependency basis of X (with respect to F U M) in order to get the closure of X. The closure is the set of all attributes A, such that {A} is a set of the dependency basis of X and there is a functional depen- - 13 - dency V + Z in F with A on the right-hand side, 3. The Algorithm In this section we consider a set F of functional dependencies, a set M of multivalued dependencies, and a multivalued dependency X ++ Y. We assume that X and Y are disjoint, since X >->• Y is a consequence of F U M if and only if X -»■-»■ Y - X is a consequence of F \J M [9]. We will give an algorithm for deciding whether X ■*-*■ Y is a consequence of F U M. This algorithm is not based on the construction of the dependency basis of X. Instead, we use truth assignments. Consequently, from now on we view dependencies as formulas of propositional logic. The algorithm is derived from the following lemmas. Lemma 4: Let W be a set in the dependency basis of X such that W C U - X . If every attribute in W is false and all the other attri- butes are true, then all the dependencies in F tf M are true. Proof : The proof is given in [14]. [] Lemma 5 : Let W be a set of attributes such that W fi X = . Con- sider the truth assignment under which every attribute in W is false and all the other attributes are true. If all the dependencies of F y M are true under this truth assignment, then W must be contained in one set of the dependency basis of X. - 14 - Proof: The proof is given in [14]. [] In order to decide whether X » Y is a consequence of F y M, we try to find a truth assignment under which every dependency in F y M is true and X ~ Y is false. Suppose that X -- Y is not a consequence of F y M. Thus, there is a set W in the dependency basis of X such that WflYM and W t Y. If we assign false to every attribute in W and true to all the other attributes, then I*mma 4 implies that all the dependencies in F U M are true. However, X ~ Y is certainly false. Our algorithm tries to find a set W such as this. We pick an attribute B of Y, and use the procedure FIND(B) to find the set W of the dependency basis that contains B. W is computed as follows. Initially W is equal to U - X. While the procedure iterates, the value of W is changed repeatedly in a way such that (i) X -»•-»■ W is always a consequence of F M, and (ii) W always contains B. The current truth assignment is defined to be the one under which every attribute in the current value of W is false and all the other attri- butes are true. Upon termination W is a set of the dependency basis of X. Since both W and Y contain the attribute B, W n Y * . If w Y, then W c U - X and, hence, all the dependencies in F y M are true under the current truth assignment. But X ~ Y is false and, therefore, it is not a consequence of F y M . If W C Y, then B was a poor choice and we have to pick another attribute of Y and repeat the process. If W C Y for all attributes B in Y, then X -- Y is a consequence of F y M. - 15 - The value of W is modified during the execution of FIND(B) in the following way. Suppose that S ++ T is a multivalued dependency of F y M whose left-hand side is true under the current truth assignment. Since X +-► W is a consequence of F P, so are X +■>• W u T and X ++ W - T (a detailed proof of this fact is given later). If 3 £ W ft T we can replace W with W fi T; otherwise, we replace it with W - T. (Note that B is always contained in the current value of W.) Another case to he con- sidered is a functional dependency S +• T whose left-hand side is true. If B e T, then {B> is a set of the dependency basis of X; otherwise, we replace W with W - T as in the previous case. The procedure FIND is given in Figure 2, and the complete algorithm is given in Figure 3. Lemma 6 : Whenever line 4 of the procedure FIND (in Figure 2) is reached the following is true: (a) X ■*■-> W is a consequence of F y M, (b) every dependency that is not on QUEUE is true under the current truth assignment, and (c) every dependency that has been put on QUEUE has a left-hand side that is true under the current truth assignment. Proof : Induction on the number of times the loop of lines 4-12 is executed. In this proof, W denotes the value of W after k iterations of the loop of lines 4-12. The k fc truth assignment is the one under which every attribute in W is false and all the other attributes are k true. - 16 - procedure FIND(B) begin (1) make QUEUE empty (2) W = U - X (3) put every dependency with a left-hand side that is currently true (i.e., disjoint from W) on QUEUE (4) while QUEUE is not empty do remove a dependency a (with a right-hand side T) from QUEUE (6) if B e T then if a is a functional dependency then return {B} ( 8 ) else W = W fi T (9) else W = W - T (10) (11) lor every dependency r (with a left-hand side S) do _if S is currently true, and x has not been on QUEUE ( 12 ) then put T on QUEUE end end (13) return W end Figure 2 - 17 - begin (1) for every attribute B in Y do (2) W = FIND(B) (3) if W t Y then return false end (4) return true end Figure 3 Basis . Initially X -*■-*■ W is a consequence of F i}M, since W = u - X. Every dependency that is not on QUEUE has a left-hand side which is false and, hence, it is true. Also, every dependency on QUEUE has a true left-hand side. Induction . Suppose that the induction hypothesis is true after (k-1) iterations. Let o be the dependency that is removed from QUEUE in the k iteration. Either o is a functional dependency S ^ T or o is a multivalued dependency S ++ T. Since o is in F I) M, it follows that in either one of these cases S ■*■* T is a consequence of F ]} M. If a is a functional dependency, then we can assume that B |t T; otherwise, line 4 is not reached again and the induction hypothesis certainly remains valid. - 18 - At first, we will show that X ♦■»■ W is a consequence of F y M. In the proof we use the inference rules of Section 2.2. According to part (c) of the induction hypothesis, S is true under the (k-l) St truth assignment and, therefore, S C U - W^. By applying augmentation to S ■*-*■ T, we- infer that U - W ♦•> T is a consequence of F y M. Part (a) of the hypothesis implies that X ++ W is a consequence of F y M, and by complementation so is X ++ U - W^. X ++ w DT follows from X ++ U - W k _ x and. U - W^ ++ T by transitivity, and now an application of the decomposition rule shows that X +■»■ W - T is also a consequence of F y M. But W fc is either W^ n T or W - T, and so X ++ W is a consequence of F y M. As for part (c) of the hypothesis, the left-hand side of every dependency that has already been on QUEUE (before the k th iteration) remains true, because W fc _C W . The new dependencies that are put on QUEUE in the loop of lines 10-12 (during the k th iteration) certainly have true left-hand sides. Thus, parts (a) and (c) of the induction hypothesis remain valid. Finally, if a dependency has not been put on QUEUE during the first k iterations, then its left-hand side is false and, hence, the depen- dency is true. The dependency a (which is the one removed from QUEUE in , th the k iteration) is currently true, since either W, fi T = cf> or W CT k k — (and only W fc tl T = is possible if a is a functional dependency) . Now, let t be a dependency that was removed from QUEUE during one of the first (k-1) iterations. There are two cases to be considered. - 19 - Case 1. Suppose that t is a multivalued dependency P *♦ R. Part (c) of the hypothesis implies that P C U - W^ and, hence, it follows from part (b) that either R fi W^ = * or W^ C R. Since W k C W^, either RtiW - . But W fc C W^ and, therefore, R ^ k = * and P ->- R remains true. Thus, part (b) of the hypothesis also holds after the k itera- tion. [] Lemma 7 : The procedure FIND returns a set W of the dependency basis of X that contains the attribute B (B t X) . Proof : The procedure always terminates, since every dependency is put on QUEUE at most once. Also, W always contains the attribute B. If W is returned in line 13, then Lemma 6 implies that all the dependencies in FP are true under the current truth assignment and X *♦ W is a consequence of F U M. Since W is disjoint from X, it follows from Lemma 5 that W must be a set of the dependency basis of X. Suppose that W is returned in line 7 at the k iteration. Let S + T be the functional dependency a that is removed from QUEUE in this iterartion. Since B e T, S * B is a consequence of F U M. Lemma 6 implies that S and W^ are disjoint, and X ~ W^ is a consequence of F y M. It follows from the inference rule FD-MVD2 that X - B is also a - 20 - consequence of F D M and, therefore, {B> is a set of the dependency basis of X. [] Lemma 8 ; The algorithm of Figure 3 correctly decides whether X ■*■-► Y is a consequence of F U M. Proof: The algorithm certainly terminates. If we reach line 3 with a set W such that W t Y, then W is a set of the dependency basis of X that contains more than one attribute (because W ft Y * <{>) . Therefore, W _C U - X . Consider the truth assignment under which every attribute in W is false and all the other attributes are true. Lemma 4 implies that all the dependencies in F tf M are true under this truth assignment. However, X ++ Y is false and, hence, X ■*■+ Y cannot be a consequence of f y m. For every B in Y let W B be the value of W that is obtained for B in line 2 of the algorithm (Figure 3). Suppose that for no B, W t Y. B Then Y = |)W . Since X -►-»• W is a consequence of F y M for all B in Y. B£Y ti B X +•*■ Y is also a consequence of F U M. [] In the following sections we describe an efficient implementation of this algorithm. _4. A Simple Data Structure We consider a universe with n elements that can be represented by the numbers l,2,...,n. The following two operations are defined on sets - 21 - S and W (1) INTERSECT(S.W). Replace W with W fi S. (2) DELETE (S,W). Replace W with W - S. We assume that a set W is given an initial value, and a sequence of m operations with arguments (S.,W) has to be executed on-line. Since the second argument is always W it can be omitted. In the sequel we m give an implementation that requires 0(n+ Z |S.|) time to execute this J-l J sequence. The set W is represented as a doubly linked list [12] using the arrays U[l:n], LEFT[l:n], and RIGHT[l:n]. U(i) is one if i e W; otherwise, it is zero. If i e W, then LEFT(i) and RIGHT(i) contain the elements that are on the left and on the right of i, respectively. This structure can be initialized in 0(n) time. DELETE(S) is executed by assigning zero to U(i) and updating LEFT(i) and RIGHT(i) for all i in S. To execute INTERSECT(S) an auxilary array AUX[l:n] is used. Initially all the entries of this array are zero. Given a set S, we assign 1 to AUX(i) for all i e S. Next, the list representing W is traversed. For each element on the list we check (using the array AUX) whether it is in S; if it is not, then the element is deleted. Finally, using the origi- nal representation of S we assign zero to AUX(i) for all i e S. This scheme requires 0(|S|) time to execute DELETE(S), and 0(|W|+|S|) time to execute INTERSECT(S) . The cost of DELETE(S) is dis- tributed between the elements of S. The cost of INTERSECT(S) is divided to 0(|S|)+0(|W-S|). The cost 0(|S|) is distributed between the elements of S, and the cost 0(|W-S|) is distributed between the elements that are deleted from W. Since an element can be deleted from W at most once, - 22 - m the total cost is 0(n+ Z |S |). j = l j Note that it is possible to get a list of all the elements that are deleted from W in each operation without increasing the time complexity. !• Implementation of the Algorithm In this section we describe an implementation of the procedure FIND that works in 0(|F»M|) time. We assume that the attributes are A l' A 2""' A n and they are represented by the numbers 1,2,..., n. We also associate numbers l,2,...,k with the dependencies of F U M. Consider the procedure FIND in Figure 2. The implementation of line 3 and lines 10-12 requires a linked list LIST(i) for all attributes A ± . LIST(i) contains all the dependencies with a left-hand side that includes A.. There is also a list, denoted as LIST(O) , that contains all the dependencies with the empty set on the left-hand side. (The attribute A Q corresponding to LIST(O) is a dummy attribute, i.e., it is not included in U. ) In addition to these lists there is a counter C(j) for all dependencies j. Initially C( j ) is equal to the number of attri- butes on the left-hand side of j. During the execution of the algo- rithm, C(j) is the number of attributes on the left-hand side of j that ire false. These lists and counters can be initialized in time '(|F UM|). The implemented procedure is given in Figure 4. The procedure MAKE_TRUE (given in Figure 5) is used to update the counters whenever some attributes are deleted from W (and, hence, become - 23 - procedure FIND(B) begin (1) initialize the counters C(j) and the lists LIST(i) (2) make QUEUE empty (3) W = U - X /* initialize W according to the data structure of Section 4 */ (4) MAKE_TRUE(X U {A Q }) (5) while QUEUE is not empty do^ (6) remove a dependency a (with a right-hand side T) from QUEUE (7) if B £ T (8) then if a is a functional dependency then return {B} ( 9 ) else W = W n T (10) else W = W - T (11) let R be the set of all the attributes that are deleted from W (12) MAKE_TRUE(R) end (13) return W end Figure 4 - 24 - procedure MAKE__TRUE(P) begin (14) for every attribute A in P do i — (15) for every dependency j in LIST(i) do C16) C (j) = C(j) - 1 ( 17) M C (J) = then put j on QUEUE end end end Figure 5 true). When a counter becomes zero its associated dependency is put on QUEUE. Lemma_9: The procedure of Figure 4 requires max{0( |U| ) ,0( | F UM|)> time. Proof: The initialization step (line 1) requires 0(|F \) M| ) time. Line 2 requires a constant time and line 3 requires 0(jU|) time. The total cost of all the calls to the procedure MAKE_TRUE in lines 4 and 12 can be computed as follows. The cost of executing the loop of lines 14-17 for A ± is distributed between the dependencies on LIST(i) . Put- ting a dependency j on QUEUE requires a constant time, since only a pointer has to be moved. During the execution of FIND(B) each LIST(i) - 25 - is traversed at most once, and the cumulative cost for each dependency is proportional to the length of its left-hand side. The total cost of all the calls is no more than 0(|F UM|). Lines 9-11 are implemented using the data structure of Section 4. Since each dependency is put on QUEUE no more than once, the total cost of lines 5-12 is 0(|U|+|F \} M|). Thus, the running time of the procedure is max{0( |U | ) ,0( | F UM|)>. [] If we assume (1) that |U| = 0(|F U M| ) , then the procedure FIND requires 0(|F tf M| ) time. However, if only m (m) time. If we assume that the attributes in the input are already (2) represented by the numbers 1,2 n and |X tf Y| =0(|F UM|), then the following theorem can be proved. Theorem 10 : Given a set F of functional dependencies, a set M of multivalued dependencies, and a multivalued dependency X ++ Y, we can decide whether X -►-»■ Y is a consequence of F U M in 0(|F l> M| • |Y| ) time. Proof : The proof is an obvious implication of Lemma 9, unless some of the attributes in U are neither in F U M nor in X ++ Y. We define U' to be the set of all the attributes that appear in F U M or in X ++ Y. Let U' = . If U' * U, then we add another attribute B 12 m to IT. This additional attribute represents all the attributes in U that are not included in U' . Now we can use U' instead of U in the pro- (1) This assumption is made in [3] and in [11]. (2) If this part of the assumption is omitted, then the running time in Theorem 10 should be 0(max{|F U M| , I X U Y| }• | Y | ) . - 26 - cedure FIND. The correctness of this modification follows from the observation that all the attributes of U that do not appear in F U M or in X ++ Y must be contained (probably with some other attributes) in one set of the dependency basis. [] &cample_2: Let F y M = {A+-BC,A—CD>. A ^ C is always a conse- quence of Fi)M regardless of the other attributes in U. a ++ E is a consequence of F U M if and only if A,B,C,D, and E are the only attri- butes in U. In both cases the algorithm works correctly. [] In view of the above example it may be unclear where the rule of Complementation is used in the algorithm. Actually it is not. The algorithm uses a weaker version of the rule of Complementation, namely 4) ++ U. It should not be surprising that this can be done, since in [6] it has already been shown that if the rule of Complementation (MVDO) is replaced with the rule +> U, then the resulting set of inference rules remains complete. S, Modifying the Algorithm for Functional Dependencies Suppose that we have to decide whether a functional dependency ' A is a consequence of F tf M. This can be done using Lemma 3. That is, we have to check that X ++ A is a consequence of F y M and there is another functional dependency V + Z (A t V) in F U M with A e Z. Using the algorithm of Figure 3, we can test in linear time whether X -»■•»■ A is a consequence of F y M. Testing whether there is a functional depen- - 27 - dency V + Z in F U M with A e Z is also linear. Theorem 11 : Given a set F of functional dependencies, a set M of multivalued dependencies, and a functional dependency X ♦ Y, we can test whether X + Y is a consequence of F UM in 0(|F UM|«|Y|) time. Proof: First, we test in linear time whether for all A e Y, there is a functional dependency in F with A on the right-hand side but not on the left-hand side. Next, line 3 of the algorithm of Figure 3 is modi- fied to if_ W * {B> then return false and the algorithm is applied to F U M and X ■*■ Y. [] 7. Construction of the Dependency Basis and the Closure The algorithm for deciding whether a multivalued dependency X *+ Y is a consequence of F U M can also be used to construct the dependency basis of X and, in particular, the closure of X. The procedure FIND is also the core of the algorithm for the con- struction of the dependency basis. We choose an attribute A ± , and com- pute FIND(A.). The set W obtained in this way is the set of the depen- dency basis that contains A.. We repeat this process with another attribute that is not in W, until we find all the sets of the dependency basis. The algorithm is described in Figure 6. The array A[l:n] indi- cates which attributes are contained in the sets of the dependency basis that have already been found. If A(i) = 1, then a set of the dependency - 28 - begin (1) initialize A(i) (2) i = 1 (3) while i < n jio (4) while A(i) = 1 and i < n do (5) i = i + 1 end (6) if i > n then return (7) W = FINDCA.) (8) print W (9) for every A^ in W do (10) end A(k) = 1 end end Figure 6 basis that contains the attribute A. has already been found; if A(i) = 0, then no such set has yet been found. Initially A(i) is 1 for all attributes A. in X, and for all the other attributes. Theorem 12 : Let F be a set of functional dependencies, M a set of multivalued dependencies, and X a set of attributes. The dependency - 29 - basis of X and the closure of X can be found in 0(|F U M| «p) time, where p is the number of sets in the dependency basis of X. Proof : Obviously, line 7 of the algorithm is executed no more than p times. The rest of the algorithm requires 0(|F U M| ) time. [] Since the running time is proportional to p, it is a good approach * to apply the linear time algorithm of [A] to F in order to find ^ (the closure of X with respect to F) , and then apply the algorithm of Figure * 6 to X instead of X. F If we are interested only in X , it can be constructed in 0(|F l> M| - |F| > time (since only the attributes that appear in the right-hand side of the functional dependencies in F may be in X ) . References [1] Aho, A. V., J. E. Hopcroft, and J. D. Ullman, The Design and Analysis of Computer Algorithms , Addison-Wesley , Reading, Mass., 1974. [2] Armstrong, W. W., "Dependency Structures of Database Relation- ships," Proc . IFIP 74, North Holland, 1974, pp. 580-583. [3] Eeeri, C. , "On the Membership Problem for Multivalued Dependencies in Relational Databases," TR-229, Dept. of Elec Eng. and Comp. Sci., Princeton University, Princeton, N. J., Sept., 1977. [4] Eeeri, C, and P. A. Bernstein, "Computational Problems Related to the Design of Normal Form Relational Schemas ," ACM Trans, on - 30 - Database Systems , Vol. 4, No. 1 (March, 1979), pp. 30-59. [5] Beeri, C. , R. Fagin, and J. H. Howard, "A Complete Axiomatization for Functional and Multivalued Dependencies in Database Rela- tions," Proc. ACM SIGMOD Int . Conf . on Management of Data , Toronto, Aug., 1977, pp. 47-61. [6] Biskup, J., "On the Complementation Rule for Multivalued Dependen- cies in Database Relations," Acta Informatica . Vol. 10, No. 3 (1978), pp. 297-305. [7] Codd, E. F., "A Relational Model for Large Shared Data Banks," Comm. ACM, Vol. 13, No. 6 (June, 1970), pp. 377-387. [8] Delobel, C. , "Contributions Theoretiques a la Conception d'un Sys- teme d'Inf ormations ," Ph.D. thesis, University of Grenoble, Oct., 1973. [9] Fagin, R. , "Multivalued Dependencies and a New Normal Form for Relational Databases," ACM Trans , on Database Systems , Vol. 2, No. 3 (Sept., 1977), pp. 262-278. [10] Fagin, R., "Functional Dependencies in a Relational Database and Propositional Logic," IBM J. of Res. and Dev. , Vol. 21, No. 6 (Nov., 1977), pp. 534-544. [11] Hagihara, K. , M. Ito, K. Taniguchi, and T. Kasami, "Decision Prob- lems for Multivalued Dependencies in Relational Databases," to appear SI AM J. Computing . [12] Knuth, D. E., Jhe Art of Computer Programming ; Fundamental Algorithms, Vol. 1, Addison-Wesley, Reading, Mass., 1968. [13] Mendelzon, A. 0., "On Axioraatizing Multivalued Dependencies in Relational Databases," J. ACM, Vol. 26, No. 1 (Jan. 1979), pp. - 31 - 37-44. 14] Sagiv, Y. , and R. Fagin, "An Equivalence Between Relational Data- base Dependencies and a Subclass of Propositional Logic," IBM Research Report RJ2500, March, 1979. 15] Zaniolo, C , "Analysis and Design of Relational Schemata for Data- base Systems," Tech. Rep. UCLA-ENG-7769, Dept. of Comp. Sci., UCLA, July, 1976. BIBLIOGRAPHIC DATA SHEET 4. Tit le .inJ Subt it le 1. Report No. UIUCDCS-R-79-954 An Algorithm for Inferring Multivalued Dependencies that Works Also for a Subclass of Prepositional Logic 7. Author(s) Yehoshua Sagiv >. Performing Organization Name and Address Department of Computer Science University of Illinois Urbana, Illinois 61801 !2. Sponsoring Organization Name and Address National Science Foundation Washington, D.C. !S. Supplementary Notes 3. Recipient's Accession No. 5. Report Date 6. January 1979 8. Performing Organization Rept. No. 10. Project/Task/Work Unit No. 11. Contract /Grant No. NSF MCS 77-22830 13. Type of Report & Period Covered 14. 16. Abstracts P is the number of sets in the deDPnriPnrv h,c c? found in 0(|z|.p) time, where procedure for this subclass of prepositional logic. ertlclent resolution ■lords and Document Analysis. 17a. Descri ptors ^ab t ase: 1 p^osn?o^,ta; : c Va,Ued ^"^ ^^ P ™ blem ' "™™« r b. Identifiers Open-Ended Te -ATI Fie Id /Group • Availability Statement RELEASE UNLIMITED «M NTIS-35 (10-7C 19. Security Class (This Report ) UNCLASSIFIED 20. Security Class (This Page UNCLASSIFIED 21. No. of Pages 33 22. Price USCOMM-DC 40329-P7 1 ' AUG 1 » i»ap V-*o5£ ! r