J HI mbhbbsbi BSmM ■shshii HSNBB 9 S§§@1 ■ BBS m mm hb&h w& L 1 Iff ■HliBIai bseSg \Wm temmmBm r SSUi WgSSBRm H" MH is ■a Hi mm ffl&wlm ml H WMm 1 WmmmmiugBtiBttM HH iBSVttfl us hw si ■"■MB ■■ ■ &KM ■ I « ■ JafflL. 1 fflfflfiBH w mwm&m BBHBffiffl rafl MS iiojKKKnHKipuciau] rnnliBl LIBRARY OF THE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 510.84 IJHbr "0.69S-7O2, Cot>. £. £ The person charging this material is re- sponsible for its return to the library from which it was withdrawn on or before the Latest Date stamped below. Theft, mutilation, and underlining of books are reasons for disciplinary action and may result in dismissal from the University. UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN ty+t RECTI L161 — O-1096 ^a> uiucdcs-k-y:?-w ON COMPUTING THE RANK FUNCTION FOR A SET OF VECTORS by Andrew Chi -chin Yao Foong Frances Yao February 19 Y 5 DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN MAY 7 1975 URBANA, ILLINOIS UIUCDCS-R-75-699 ON COMPUTING THE RANK FUNCTION FOR A SET OF VECTORS by Andrew Chi -chin Yao Foong Frances Yao February 1975 DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN URBANA, ILLINOIS 6l801 * This research is supported in part by the National Science Foundation under contract NSF GJ U1538. Digitized by the Internet Archive in 2013 http://archive.org/details/oncomputingrankf699yaoa ABSTRACT Let F "be a set of n d-component vectors with the componentwise partial order defined on it. It is desired to compute the rank for each vector in F. We show that this can be done in time O(n(log n) ) for d = 2, 3, and 2 --±- 0(d n ~ (log n) ) for d < log n by employing interesting data structures. 2 When d £ n log n, an algorithm is given which uses the Four Russians' method to perform efficient partial order verification and runs in 0(d n /log n) time, Finally, an algorithm is described which, on the average, runs in time 0(d n d (log n) d_2 ). 1. Introduction 1 A d-component vector (or d-vector ) is a d-tuple of real numbers. The j -th component of a d-vector v is written as v. or (v).. we define a partial order J J < on the set of all d-vectors as follows: v (l) ->(2) ->(k) . _ .. -. +(l) -*(2) +(k) ■* w , w , . .., w in F satisfying w < w < ••• < w = v. Recently, Luccio and Preparata[l] , Kung[2j , and F. Yao[3j have considered the problem of finding the minimal elements for a geven set F of d-vectors, where a minimal element is a vector v in F with r(v) = 1. In this paper, we study the more general problem of determining the rank function for a set of vectors. Let F be a set of n distinct d-vectors. Computing the rank function r of F in a straightforward manner will take 0(dn ) steps. The purpose of this paper is to present several more sophisticated algorithms which compute r in less time, assuming that n and d are restricted to certain regions . Main results are : (A) Using worst-case behavior as the cost measure: (1) For d = 2,3, r can be determined in O(n(log n) ) steps. 2-JL (2) For d _f_ log n, r can be determined in 0(.dn (log n) ) steps. 2 (3) For d 5 n log n, r can be determined in 0(r ) steps. log n (B) Using average behavior as the cost measure: 1+ j d _ 2 For fixed d, r can be determined in 0(n (log n) ) steps on the average if all the (nd)I orderings of the set of numbers {(v ). I 1 <_ i <_n, 1 < j <_d } are equally probable. J The special case d=2 has also been studied independently by Szymanski 2. Fast Algorithms for d S log n 2.1 Introduction A general procedure to compute the rank function is the following: +(1) ->(2) +(n) 1. Arrange the vectors in a sequence v , v , ..., v such that v Xd/ ^v xy ifj>i. 2. Compute the rank function r inductively by r(v (l) )=l r(7 (J) ) = max { r(v (i) ) + 1 | 7 (1) < ^ (J) , i < j } for J > 1. Straightforward implementation of the above prodedure involves 2 "0(dn ) scalar comparisons between components of the vectors. Proper choice 2 of data structures, however, can reduce the running time to o(dn ) when d <_ log n, as will be seen below. 2.2 The Cases d = 2 and 3 In the special cases d = 2, 3, the procedure outlined above can be implemented as follows: a + v + +(!) *(2) -*(n) . . 1. Arrange the vectors as v , v , ..., v i n increasing lexicographic order. 2 . S . = 4> for i = 1, 2, ..., n. (S. will be the set of vectors of rank i found so far in the algorithm. ) 3. For j = 1, 2, ..., n, do the following: Find (by a binary search) the largest number i in the index set {1, 2, . . . , n} for which there exists a vector ■+ ■+ -»■( 1 ) w z S. with w < v . If i A does not exist, set i. to 0. i '0 r(v (j) ) - i Q + 1 vv u ( ' (J>1 Steps 1 and 2 can be done in O(nlog n) operations. To facilitate the computation in step 3, we shall employ a data structure for each S. similar to that used in [l]. Let us consider the data structures for d = 2 and d = 3 separately: (i) d = 2 Suppose r(v ) is being computed. Let s. = min { w | w = (w ,w ) e S. }. ■> ■> -K1) (i) It is easy to see that there exists a w g S. with w < v iff s. < v_ . ^ l i—2 Therefore, to compute r(v ) for any j takes O(log n) operations. The total number of operaiions needed in step 3 is thus O(nlog n) , since the efforts needed to keep the s. 's updated are small, in fact 0(n). (ii) d = 3 Again assume r(v ) is being computed. Consider the set T. of projections of vectors in S. on the last two components. Let S! be the set of minimal elements of T. , i.e. i SJ = {(w 2 ,w ) |v£S., and (w£ w') £ (w 2 ,w ) for all w' e sj. Write S! as an ordered table (x ,y ), (x ,y ), ..., (x ,y ) with x < x <••• < x and y > y_ > • •• > y . To check if w < v for some we S., it is sufficient to find k (by a binary search) such that x. < v^ J < x. ,_ and see whether v_ J > y n . The set S! can be structured k. 2 k+1 3 k i as an AVL tree, the running time of step 3 is then bounded by 2 n(log n) + time needed to maintain AVL trees for all S! The second term is of the order O(nlog n), since at most n insertions and n deletions are performed by the algorithm in total. We have proved the following theorem: Theorem 2.1 Let d = 2 or 3. The rank function for a set of n d-component vectors can be computed in time O(n(log n) ). Corollary 2.2 The longest chain in a set of n d-component vectors can be found in time O(n(log n) ) for d = 2, 3. The proof of Corollary 2.2 is immediate. 2.3 General Case d < log n 2.3.1 The Algorithm For d f_ log n in general, we shall present an algorithm that d+1 computes the rank function in time 0(dn ). Corresponding to the two steps of the general procedure mentioned in Sec. 2.1, this algorithm consists of two phases: phase 1 . Divide the vectors into n/p disjoint sequences F , F , ..., F , , each containing p vectors. If we write F. = (v ' ,v ' , . . . ,v i -*■( i H) -* (k m) this division satisfies the condition that v ' < v ' implies that (i,&) prededes (k,m) in lexicographic order. phase 2 . r(v ' ) is computed in lexicographic order of (k,m) as follows: For each i, 1 <_ i <_ k, find f. = max { r(v) | v < v" (k ' m) , v e F. }. l ' l (If the set on the right hand side is empty, set f. to 0.) Let r(v (k ' m) ) = max { f. + 1 }. The division made in phase 1 will be such that, for each i, the Ki i 2 > • ••' i d 1 P- These sequences are then labelled lexicographically in (i ,i , ...,i ) as F n , F_, .. ., F , . 1 2 n/p begin for j = to d-1 do begin for each i,...,i. (l < i ,...,i, < p) do_ -*■ J -'-J begin perform a stable sort on F(i ,i , . . . ,i . ) by the j+1 st component of the vectors; divide the sorted sequence into p segments of equal lengths {F(i l5 i 2 ,...,i i ) | 1 <_ i < p} so that v. +1 l w J+1 if v e F(i lS .. .,i I), weF(i lt . . . ,i ,m) and £< m; comment for j=0, F(i.,...,i.) is defined to be 1 .1 end end end relabel the sequences { F(i. , i oJ . . . ,i , ) } as F n , F A , .... F , in 1 2 d 12 n/p lexicographic order of (i ,i , ...,i ); end partition The next procedure compare will be used in implementing phase 2 of the algorithm. Given a number f, v, and a set of vectors A, it performs the operation f «- max {f,{r(w)|weA, w < v } } . This procedure acquires efficiency by keeping track of 2d+l numbers high. (A), low. (A), (l <_ j <_ d) and rmax(A) defined by: J J high. (A) = max { w. } 3 W£A J low. (A) = min { w.} 1 <_j <_d J weA J rmax(A) = max { r(w) | w e A } procedure compare (v, A, f); begin compare v. with high. (A ) and low. (A) for j = 1,2,..., d; J J J case 1 v. < low. (A) for some J: J J f retains its original value; case i v. >_high.(A) for all j: J J f = max { f, rmax(A) }; case 3 low (A) <_ v < high (A) for j = k ,k , . . . ,k , and lJ J J J- £— )C v. >_ high (A) for all other j: J J -> begin for each w e A do begin if v >, w for all j e {k ,k , ...,k } then f = max { f, r(w)}; end end ■■■■■■: end compare algorithm rank(F) ; begin call partition (F, F^ F g , ..., F n / p ); compute high. (F. ), low (F ) for 1 <_ j <_ d, 1 <_ i <_ n/p; rmax(F. ) = for 1 1 i 1 n/p; l for k = 1 to n/p do begin for m = 1 to p do begin f = 0; for 1 $ i $ k-1 call compare (v ' , F., f); for I = 1 to m-1 do begin if v ( k 5 m )>_ v ( k ^) for all 1 < j <_d then f = max { f, r(v ' ')}; end end ,+(k,mh _ r(v J = r + 1; rmax(F ) = max { rmax(F k ), r(v ' ) } end end end rank 2.3.2 Analysis of the Algorithm The cost of algorithm rank can be divided into several parts: (i) Cost needed to construct F , ..., F , : 0(dnlog n) operations. (ii) Cost needed to compute the high.(F.)'s and lov.(F.)'s: 0(dn) operations. J i J i / • • • \ r. ^(k,m) . „,(k,m) ,. . ,. /-*-(k,mK (m) £ C where C ' is the cost for computing r(v ). It is clear that C ' <_ 3dn/p + p* (number of (i,j)'s for which low.(F.) •: v! k ' m) < high.(F.)) (1) J i - J J i We will prove in Theorem 2.3 that the parenthesized term on 2 the right hand side of Equ.(l) is bounded by dn/p . Therefore, ,2 C (k ' m) < V E 0(dn/p) k,m — k,m 1 0(dn 2 /p). Thus the running time of the algorithm is 0(dn /p) = 0(dn ), if we can prove the following proposition. Theorem 2 . 3 Let x be any number, and j an integer between 1 and d. 2 There are at most n/p sets F such that low.(F ) £ x < high.(F ) (2) J K J K. Proof For any fixed i , i ,...,i. , let us consider m, the number of sets F such that Equ.(2) is true and F C F(i ,i , . . . ,i . ) (cf. procedure k K- J- <- J - -L partition ) . Now, there is at most one integer & such that low (F(i lt i 2 ,...,i ,£)) ix < high (FU^ig,...,! ,*)) Therefore, if F C F(i ,i ,...,i. ) and Equ.(2) is true, then F C F(i ,i ,...,i. ,Z). Since there are at most n/p F v ' s ^- n F(i 1 ,i 2 ,...i ,£), we have m <_ n/p^ +1 . There are p " distinct (j-l)-tuples (i ,i ,...,i ). Hence there are a at most p J " 1 '(n/p J+1 ) = n/p 2 F 's satisfying Equ.(2). ~\ K. 2- We have proved that 0(dn ) time is sufficient for computing the rank function. The performance of algorithm rank can be further upgraded by the following modifications: (i) We divide F into sets of size n (instead of n ) according to the 2nd, 3rd, ..., d-2nd components of the vectors. (ii) Initially set all high.'s and low.'s to - °° and + °° respectively. J J The vectors are then input in ascending order of their first components. The high 's and low 's are updated each time after a new vector is J J processed. (iii) Each set F. is structured as a set of AVL trees (one for each rank) by the last two components of the vectors as in Sec. 2.2. These modifications lead to an improvement in the cost of the algorithm as can be easily analyzed. As a result, we have Theorem 2.k The rank function of a set of n d-component vectors can be n—P 2 computed in time 0(dn (log n) ). For the longest chain of a set, an analog of Cor. 2.2 is true. 2 3. The Four Russians' Method and the Case d >, n log n 2 For the case d 5 n log n, we shall compute the rank function by first explicitly constructing the partial order matrix for the vectors in F. The procedure we employ to compute the partial order matrix is based on a method for solving the following problem: Given a fixed partial order P, how do we check if an input set of numbers { x , x , . .., x } 2 satisfies P? We shall show that this can be done in time 0(n /log n) in Sec. 3.1. The complete algorithm to compute rank function is studied in Sec. 3.2. 3.1 Partial Order Verification Let p be a fixed partial order on a set of n elements. P is given in the form of an n x n matrix M. For any input set of numbers S = {x n , x^, ..., x } , it is desired to check whether S satisfies the 12 n conditions: x. > x. if M. . = 1 (3) i - J ij 10 We shall prove that, with some preconditioning on the matrix M, this can be 2 done in 0(n /log n) time for any input set S. For a given M, define {B , B , ..., B }, a family of subsets of {1, 2, ..., n} by J e B. iff M. =1 (1+) i ij Condition (3) can then be written as x. 5 max {x. jeB.} i=l,2,...,n (5) In view of (5), we can perform partial order verification by the following procedure adapted from the Four Russians ' Method [5] [6] . (i) (precondition) Partition {1, 2, . . . , n} into n/log n disjoint n/logn segments I n , I_, ..., I ,, each of size log n. Write B. as B. = U C. . 12 n/logn 1 1 lj where C. . c I. for all i, j . (ii) For each i, compute m(D) = max {x. j e D} for all D CI.. (iii) For each i, compute m(B. ) = max {m(C. .)}. 1 . ij J (iv) Test if x. 5 m(B.) for all i. It is not difficult to see that steps (ii), (iii) and (iv) take 2 2 time 0(n /log n). The preconditioning in (i) involves 0(n ) operations. We will use this procedure in the next subsection. 3.2 Computing the Rank Function in 0(dn /log n) Time r ->(l) -+(2) -+(n), Let F = {v v , v v , ..., v v '} be a set of d-vectors . We define (k) n x n Boolean matrices M , 1 <_ k <_ d, by m!V= 1 iff v U) > v (j) for all SL = 1,2, ..., k. (6) Clearly M properly describes the partial order < defined on F (M = 1 -*- J ■*■ ■*■ \ iff v. j v ) . In the following we will describe a method for finding ■*■ J 11 NT , M , . .., M successively. Once M is found, the rank function can then "be computed easily. (k) (k-l) It is clear that M = M iff the following conditions hold: ^>A 3) if m'*- 1 ^ 1 (T) Now, according to the result in Sec. 3.1, condition (7) can he checked in i = 1,2,..., n} assuming that the 2 r (i) 0(n /log n) time for the set {v (k-l) preconditioning on M has "been done. This suggests the following procedure for computing M (d) procedure PO (F, JT d '); begin M = n x n matrix with all entries equal to 1; precondition M (see Sec. 3.1); for k = 1 to d do begin if condition (T) is satisfied then test = true else test = false; (Use the method in Sec. 3.1) if test = true ™(k) . »„(k-l) then M is M else compute M and precondition M (Use the fact that M; . ; = 1 iff ij k - k end (8) (9) end end PO 12 Analysis of Procedure PO 2 (i) Line (8) is executed d times, each taking 0(n /log n) operations for a total of 0(dn /log n). 2 (ii) Each time line (9) is executed, it takes 0(n ) operations. The total running time is therefore 2 2 T = 0(dn /log n + n «L) (10) (k) (k-1) where L is the number of k's for which M ^ M . The next theorem shows that L $ n(n-l), which then implies that T = 0(dn 2 /log n + n ) = 0(dn /log n) if d >, x? log n. Theorem 3.1 There are at most n(n-l) numbers k for which M f M Proof Observe that M.^ = 1 iff M. k-1 ' = 1 and v, X ' > v, J) . It ij 1J k - k (k) follows that the number of zero entries in M is strictly greater than that in NT k_1) if M^ k) t M^ k ~ . Since M (d ^ can have at most n(n-l) zero entries, the theorem follows. h . An Algorithm With Good Average Behavior In the previous two sections we have based our algorithms on two different approaches to compute the rank function. Still a third method is the following: In the first pass, find the set D of minimal elements in F (using the algorithm of Kung [23 for example). Assign rank 1 to the elements in D , and let F + F - D . In the second pass, find the set D of minimal elements in F, and assign them rank 2. The process is repeated until finally F = <)> . 13 We will prove that the preceding procedure runs in time 1+1" d _ 2 0(n (log n) ) on the average for any fixed d >_ 3 . Here we assume that in the input set F = {v ,v ,...,v }, the dn numbers F' = {(v ). 1 Jl i 5. n » 1 ^L J li ^0 are randomly distributed, i.e., all the (dn)! orderings for F' are equally probable. It is shown in 121 that d— 2 finding the minimal elements can be done in O(n(log n) ) time. Therefore, we need only show that on the average, no more than 0(n ) passes are necessary for the procedure. It is easy to see that the number of passes needed on the average is simply the length of the longest chain in the set F on the average. Therefore, it suffices to prove the following theorem: Theorem k .1 Let F = {v , v , ..., v } be a set of d-vectors where all the (dn)! linear orderings of the set F' = {(v ). 1<_ i<_ n, 1<_ j fd) J are equally probable. Then the average length of the longest chain in F is less than c n n for some constant c >0. d d Proof Consider the following: Let \ = (p il' P i2"-" P in ) i = 1 « 2 ' '••' d - 1 be d-1 permutations of ( 1, 2, . . . , n) . A common increasing subsequence with respect to the it. 's is a sequence of indices j < j < ,,# < j such that p.. < p . . <••• < p . . for each j = 1 , 2 , . . . , d-1 . 1J 1 1J 2 1J k Let L -.(n) be the average length of the longest common increasing subsequence when these d-1 permutations are random. It is not difficult to show that L^ -,(n) is the average length d-1 of the longest chain for F. Hence we need only prove that I L, .(n) < c^n. (11) d-1 — d Ill To prove Equ.(ll), consider the set of all (d-l) -tuples of permutations T = {(it , tt , ..., n -,_-,)}• We shall find an upper hound on the number of elements in r which have a common increasing subsequence of length 5 k. The following is such a hound: where the first factor C is the number of possible k positions for a common increasing subsequence j < j <••• < j the second factor (C, ) is the number of possible ways to choose p. . , p. , ..., p.. , k ij x ij 2 ij k and the last factor correspond to the different ways of distributing the remaining numbers. Since |r| = (n!) , the probability P of having a common K. increasing subsequence of length >_ k satisfies ((£)*( (n-k)!)*" 1 c£ P, < (n!)^ 1 (k!)^ 1 k k < - n (k:)d (VW) d e \ n l/d Therefore, for k > (e + e) n , we have P <_ constant • (a ) where 1 > a, > 0. It is easy to derive then that d 1 L, _(n) < (e + 2e)n as n ■+ °° . d-l i This proves that L J ,< en for some c > 0. d-l d d ■ — ' The special case L (n) has been studied by Steckin CT^ • 15 REFERENCES [l] Luccio, F. and Preparata, F. P. , On Finding the Maxima of a Set of Vectors, Instituto di Scienze dell 'Invormazione, Universita di Pisa, Corso Italia UO, 561OO Pisa, Italy, 1973- [2] Kung, H. T. , On the Computational Complexity of Finding the Maxima of A Set of Vectors, 15th Annual Symposium of SWAT, October, 197^. [3] Yao, F. F. , On Finding the Maximal Elements in a Set of Plane Vectors, University of Illinois, Technical Report UIUCDCS-R-7I+-667 , July, 197U. [ U] Szymanski, T. C, A Special Case of the Maximal Common Subsequence Problem, TR #170 Princeton Preprint, January, 1975. [5] Arlazarov, V. L. , Dinic , E. A., Kronrod, M. A. and Faradzev, I. A., On Economical Construction of the Transitive Closure of a Directed Graph, Dokl . Akad. Nauk SSSR 19 1 !, 1+87-^8 (in Russian). English translation in Soviet Math. Dokl. 11:5, 1209-1210. [6] Aho, V. A., Hopcroft, J. E. and Ullman, J. D. , The Design and Analysis of Computer Algorithms, Addison Wesley, 197^, 2U3-2H7. [7] Steckin, B. S., Monotone Sequences in a Permutation of n Natural Numbers, Mat. Zametki 13(1973), 511-51^. (English translation in Math. Notes 13(1973), 310-313.) JIBLIOGRAPHIC DATA ,HEET 1. Report No. UIUCDCS-R-75-699 Title anJ Subtitle ON COMPUTING THE RANK FUNCTION FOR A SET OF VECTORS . Author(s) An drew Chi-chih Yao, Foong Frances Yao Performing Organization Name and Address Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 618OI Sponsoring Organization Name and Address National Science Foundation 1800 G Street N.W. Washington, D.C. 20550 3. Recipient's Accession No. 5. Report Date February 1975 6. 8. Performing Organization Rept. N °- UIUCDCS-R-75-699 10. Project/Task/Work Unit No. 11. Contract/Grant No. NSF GJ U1538 13. Type of Report & Period Covered 14. ipplementary Notes 16. Abstracts Let F be a set of n d-component vectors with the componentwise partial order defined on it. It is desired to compute the rank for each vector in F. We show 2 - -A- that this can be done in time 0(n(log n) ) for d = 2, 3, and 0(d n (log n) ) for d <: log n by employing interesting data structures. When d $ n log n, an algorithm is given which uses the Four-Russian Method to perform efficient partial order verification and runs in 0(d n /log n) time. Finally, an algorithm is 1 + d a-2 describea which, on the average, runs in time 0(a n (log n) ). 17. Key Words and Document Analysis. 17a. Descriptors rank function Four Russians' Methoa 17b. Identifiers /Open-Ended Terms 17c. COSAT1 Field/Group 18. Availability Statement Unlimitea 19. Security Class (This Report) TTNr) ASSIF1ED 20. Security Class (This Page UNCLASSIFIED 21. No. of Pages 18 22. Price FORM NTIc.lF (1H--7T USCOMM-DC 40329-P7 1 ** \