LIBRARY OF THE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAICN 5IO.&4 The person charging this material is re- sponsible for its return to the library from which it was withdrawn on or before the Latest Date stamped below. Theft, mutilation, and underlining of books are reasons for disciplinary action and may result in dismissal from the University. UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN MAR 2 2 1978 DEC 1 4 1976 DEC 12 RECD L161 — O-1096 Digitized by the Internet Archive in 2013 http://archive.org/details/twoupperboundsfo565prad /0. XV If). £>&<> Report No. UIUCDCS-R -73-565 Z*f> TWO UPPER BOUNDS FOR THE WEIGHTED PATH LENGTH OF BINARY TREES by Jean Louis Pradels January 1973 '« * '*$ DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN URBANA, ILLINOIS TH E UtBBM" OF THE MM 2 1973 anas U wvEBsH_onu-iNOVS '..••' Report No. UIUCDCS-R-73-565 TWO UPPER BOUNDS FOR THE WEIGHTED PATH LENGTH OF BINARY TREES* by Jean Louis Pradels January 1973 Department of Computer Science University of Illinois at Urbana-Champaign Urbana, Illinois 61801 *This work was supported in part by the Department of Computer Science at the University of Illinois at Urbana-Champaign and by the Ministry of Foreign Affairs of France. It is submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science in the Graduate College of the University of Illinois, June 1971. m ACKNOWLEDGMENT I wish to express my sincere gratitude and appreciation to Professor Jurg Nievergelt for his helpful suggestions and guidance in this work. Special thanks are also due to Miss Sue Cook for her time and excellent typing. Finally, I wish to express my gratitude to the Department of Computer Science of the University of Illinois and to the Ministry of Foreign Affairs of France for their support. IV TABLE OF CONTENTS INTRODUCTION II. DEFINITIONS AND NOTATIONS III. TWO UPPER BOUNDS FOR THE WEIGHTED PATH LENGTH OF BINARY TREES 1. Inequality Used 2. First Upper Bound " 3- Second Upper Bound IV. REFERENCES Page 1 6 6 6 10 11 I, TiTTRODUCTION Rooted binary trees with weighted nodes are structures encountered in many areas , such as coding theory, searching and sorting, information storage and retrieval. A common quantity of great importance is the weighted path length. An estimate of it gives indications about the expected time of a search or the length of a code, for example. The knowledge of lower and upper bounds would permit such estimates. The noiseless coding theorem in Information Theory provides a lower bound for the weighted root-leaf path length. But/ until recently, upper bounds which are the more meaninfgul for many applications, were still lacking. They were first obtained for unweighted trees, introducing the concept of structural balance of a tree [1]. Then upper bounds for various weighted trees have been derived \ising structural balance and the new concept of weight balance of a tree [2]. In this paper, using a different definition of the weight balance of a tree, we derive two upper bounds for the total path length of general weighted node trees. The first one introduces two parameters y and p which sharpen the bound but complicate its expression. The parameters y will be useful every time we have a certain knowledge of the weight distribution. However, if the estimation of 7 or (3 requires too much computation, we can take them equal to zero and derive from this upper bound a simpler one. The second upper bound, using a different normalization, will be useful when the entropy of the weights of the nodes is not known. II. DEFINITIONS AKD NOTATION A binary tree T is a finite set of n nodes which is either empty (if n = 0) or else is partitioned into the following three classes: A' single node r called the root of T , a binary tree T on the g nodes 1, . . , r-1 called the left subtree of the root and a binary tree T, on the d nodes r+1, . . , n called the right subtree of the root (r - g+1, g + d + 1 = n). The subscripts g and d- will always refer respectively to the left and right subtrees of T . In the above definition they have two meanings. They indicate both subtrees and number of nodes. A weighted binary tree is a binary tree T such that a non-negative real number w , called a weight, is assigned to each node k. of T . We denote a weighted k. n binary tree of n nodes by the (n+l)-tuple (T ; w ., .., w ). W, W , W, will denote respectively the sum of the weights of T , T , ' g d * D n' g' and T,. d Weight distribution . We will restrict the weight distribution to the following case: At least one of the two sons of each non- terminal node must have a strictly positive weight. The weighted path length |t j_ of a tree T n is defined as the sum over all the nodes of the product of the weight of the node and the level of the node. The weighted path length satisfies the following equalities : bil =o It I = It I + It J + w + w. (n > 1) 1 n 1 ' g 1 ' d 1 g d Weig hted root balance p(T ). The two upper bounds depend on a parameter which measures the "balance" of a tree, in the sense of how close the total weights of the left and right subtrees of the root are to each other. The following quantity: 1 Wrr W d P(T ) = — if n - 1; otherwise, p (T ) = min( r-r — , 77 ) serves partially v 11 2 ' n W-w ' W-w ' * J r 1 r this purpose. This definition implies < p (T ) < t. According to the restriction made previously about the weight distribution, p(T ) =0 whenever T has a right or left subtree of one node, the weight of which is equal to zero. (W • W = 0). We will see later that such a value must be avoided because it would give a bad estimate of Ihe weighted balance of T . We notice that the weighted path length It I remains the same if the weights of the two sons of T are interchanged. Let 1 n 1 n ° w denote the positive weight of the two sons of the root of T when the condition W • W , ~ arises, then the following definition: g d ' p(T ) = yz if n = 1, otherwise n 2 W W^ " if W ■ W A f 0, then p(T ) = min(- J " d r ~f — Mv- n / - «-« v w _ w > w _ w / r r Wo- j. W* - w ¥ Wrr + «fl if W • W, - 0, then p(T ) = minC-^—r — = , „ d - «, w „™ M\* n / - —v w _ w ; w _ w r 1 makes the weighted root balance strictly positive, except for the weighted binary trees of 3 nodes when one son of the root has a weight equal to zero. The weighted balance (3(T _)_ is equal to — if n = 1 or n = 3, otherwise p(T ) =min[P(T ), p(T r ), p(T ,)]. Although the weighted root balance can be equal to zero for trees of three nodes, we notice that the weighted balance (3(T ) for n > 1 is always strictly positive. We deduce from this definition the following inequality. for n = 1 or n - 3: < p(T ) < p(T )■ < i . • ' v n — n — 2 Terminal weighted balance (3 r n (T ) : [3 m (T ) = p(T ) if n = 1 or 3, otherwise: T n ii y * P T (T n ) -mnLp T (T ), P T (T d )] this definition implies that < p (T ) < - . This parameter appears in the first upper bound. Its evaluation is easy and it sharpens the bound significantly. In particular, it makes the bound equal to the weighted path length for trees of one or three nodes. However, in any case, if we don't want to estimate it we can take it equal to zero. be I initio a of 7 : This parameter appears also in the first upper bound. It can be verified easily that its value does not matter for trees of one or three nodes. Moreover, the bound is equal to the path length for every value of 7. Therefore, the value of 7 needs only to be estimated when n > 3. if n = 3 7(T 5 ) = 5(Tj otherwise y(l ) = min(5(T ), j (l ), /(T,))- with 5(t ) = (a + 1) log (a + l) - a log a W - w r and where a = w r The assumption made previously about the weight distribtution implies that for n -* 3 j 7(T ) has always a finite value, even if w = 0. The expression of the first upper-bound shows that the parameter 7 will sharpen strongly the bound if its value is high. However, this parameter will be useful only whenever we have a sufficient knowledge of the weight distribution of T , because its estimation implies some computation. Nevertheless, 7 as well as (3 „, can in any case be taken equal to zero. This corresponds to the fact that if n > 1, then W - w > 0. — r — Entropy H(x). We will introduce the following quantity in the two bounds : H(x) = -[x log x + (1-x) log(l-x)] for < x < 1. L or L(T ) will denote the sum of the weights of the terminal nodes of ^— n- All the logarithms in this paper are taken to the "base two. III. TW O UPPFR BO l,;rs FOR T;H T : t-O'I TGTITI'ID 1'ATH LENGT :T OF BINARY TRI'IES 1. Inequality Uscd Let {x, . x_, . . , x } be a set of n non-negative real numbers such 1' 2' ' n n that S = Z x. > 1. Let x, denote any one of these n numbers and a. a real 1=1 positive number such that S > (l + OL )x, . Then, we have the following inequality: (l) (S-x. ) log (S-X. ) < S log S - x^ log x, - 8X where 5 = (o^ + 1) log (c^ + 1) - a k log a-. Proof : Let f (x) = x log x and a and b' be two real numbers' such that 1 < a, < b. Then we have: f(l + b) - f(l) < f(a + b) - f(a). S - xv "^k Applying this relation with a = — " — - , b = , we obtain the previous ^^ w inequality (l). 2. First Upper Bound Let (T j W, f • . , w ) be a weighted binary tree, then the weighted path length |t | satisfies the following inequality: T J < IM [(W - W r } lo s( W -\) + ( £ V°«^ - V°S h ~ 7 (¥ " W r )] •i- (/ + 1 - H(p T ))L where p, p , 7, L have the definitions given in (il) Proof : Case (i). The weight distribution verifies the restriction introduced previously and wc assume that if v / then w, > 1 for all k. ) For n =1, W =w and L(T, ) = 0, the assertion is true. We can a also write : (2) |T n | < ~~ [W log W + w log i- - 7 w . ] + (7 + 1 - H(i)) W, 1 - H( i } 1^1 2 1 r fhis inequality which holds for every value of 7 will he useful in the following part of the proof. b) Assume that the assertion is true for all i < n and let (T ; w' ... w'), (T-,; w" ... w") be the left and right subtrees of T . Hence G J > (J d I d n w' = w, for 1 < k < r-1 and w." = w n for 1 < k < n-r (g = r-1 and d = n-r). k k — — k . k+r — — Let [i rr , Pm ; 7 m \ $ A , Prp , 7 A be respectively the parameters of T„ and T^ defined g >,) Prp ; / • p^, Prp , f , W IC^^Ui»Ci.jf W« pal amc uci fi yjx X CMAU X^ in (il). Then, according to the relation T = T + T, + W + W, we write / ' ' n 1 ' g 1 ' d 1 g d 1 g l 1 |T J ^ htp-j [w e - w P io « (w s - w ;> + 4 k io e ^- - w ; loe sj - W v r> + HIT^T [W d - W P l0E(W d - w r> + * w k l0 « V " W r los 5" - ^Vr )] + (7 + 1 - H(p,_ )) L + {7 , + 1 - H(p m )) L, + W + W . g Using the equality L = L + L , and the three relations p(T n ) =min[p(T n ), p g , p d ], P T (T n ) = min[p T , P T ], 7^) »nln[6(T n ), 7g , 7(i ] defined in (il), we write: . , 1 g 1 T < ttt^n [W - w') log(W - w') + (W. - w") log(W, - w") + Z w/ log -4 1 n 1 - H(p) g r' Z *"' x d r y eN d r ' k & w ' d 1 + £ w£ log ^r - 7(W g - WjJ + W d - wp] + (7 + 1 - H(P T ))L + W g + W^ If we assume that T and T have both more than one node (g > 1 and d > l) then S d W - w' > 1 and W - w" > 1. We can apply the inequality (l) defined at the £> j u. r — beginning of (ill). Moreover, using the relation 7 - min[s(T ), 7 (T ), 7 (T ) ], II £2 U. we obtain the pair of inequalities: (3) 0- r - V) log (V. T - w') < W log W + w' log — 7 w' g r g r - g g r b w' 7 r (W, - w") log (W_ - w") < W. log W. + v" log -4- - yw" d r d r / — d d r b w ' r r We noiA obtain the following inequality: , , 1 n l 1 (1+) T < —^ \M log W + W, log W + Z v, log — - v log — 1 n 1 - H(p) L g to g d °d ,•■ k ve> w, r & w k=l k r - 7(W-W r )l + ( 7 v- 1 - H(p T ))L + W g + W d If, however, T and T are such that g ~ 1 or d = 1, we can't apply inequality g M- (l) because W - w' or W, - w" is equal to zero. We will use instead the g r d r expression (2) derived at the "beginning of the proof when n = 1. Assume that T and T, are such that g > 1 and d = 1, then g d l T dl < STO [w a loe w d + v r l0 4- " V^ + (7 d + 1 - H

1 for all k such that w j= 0. The result obtained in case (i), applied to this new weight distribution, gives immediately the desired result. Remark : Except for particular distributions of the weights like a descending one, for example, it is difficult to obtain an upper bound which is both sharp and simple. Jf we don't want to .introduce the parameters y and (3 , we can set both of them equal to zero., This leads us to the simpler bound: n -y [W-W ) log (W-W ) f Z W log - k r 1-1 n 1 1 V -- HlTiJ D ' M V } ;i °o(W-W r ) f 2 w k log — - w p log ~- ] + L k=i k r 10 3- Second Upper round. "n Although the entropy Z w, log — is a natural quantity, it may not k=l k always be known. For such cases, the following bound would be useful. i I T_ W-Wv> |T nl < H(pT [(W 'V 1CS ( ^ " 2 (W " w r )] + 2L mm W min iG the I,ilnil;;a;i of s11 thcpositive weights of T . The proof is quite similar. We would have the following substitutions in part (ill). Inequality U s e d : (1) (S - x^) log (S - x k ) < S log S - 2 x^ s> V\ =0or \^ Proof: case (i) (2) IT-J < i^i * [W log ¥ - 2 W X J + 2 W (3) (W g - wp log (W g - Wp < W g log W g - 2 w; (W d - wp log (W d - vp < W d log W d - 2 w^ M K\ < |jgj [\ log W g + W d log w d . 2 (W g + W d )] + 2 L g + 2 L d + W g + W d case (ii) remains the same. 11 IV. RKTOKENCES [1] Nievergelt, J. and Wong, C. K. (1970) Upper Bounds for the Total Path Length of Binary Trees, IBM Research Report RC-3075. [2] Wong, C. K. and Yue, P. C. (1971) Upper-Bounds for Various Types of Path Lengths of Binary Trees, IBM Research Report. IBLIOGRAPHIC DATA HEET 1. Report No. UIUCDCS-R -73-565 3. Recipient's Accession No. Title and Subtitle Two Upper Bounds for the Weighted Path Length of Binary Trees 5. Report Date January 1973 Author(s) Jean Louis Pradels 8. Performing Organization Rept. No-UIUCDCS-R-73-565 Performing Organization Name and Address University of Illinois at Urbana-Champaign Department of Computer Science Urbana, Illinois 6l801 10. Project/Task/Work Unit No. 11. Contract / (irant No. 2. Sponsoring Organization Name and Address University of Illinois at Urbana-Champaign Department of Computer Science Urbana, Illinois 6l801 13. Type of Report & Period Covered Master 's Thesis 14. 5. Supplementary Notes S. Abstracts Rooted binary trees with weighted nodes are structures encountered in many areas, such as coding theory, searching and sorting, information storage and retrieval. The path length is a meaningful quantity which gives indication about the expected time of a search or the length of a code for example. In this paper, two sharp bounds for the total path length of general weighted node trees are derived. I. Key Words and Document Analysis. 17o. Descriptors Bound Path-length Binary tree Weight 'b. ldentif icrs/Opcn-Knded Terms '0 ( OS ATI Field/Group 1. Availability Statement Release Unlimited >RM N TIS-38 ( 10-70) 19. Security Class (Tins Report) UNCLASSinLD. 20. Security Class (This Page UNCLASSIFIED 21. No. ,,( Pages Ik 22. P USCOMM-OC 40329-P7I OCT 2 9 1973