LIBRARY OF THE 
 
 UNIVERSITY OF ILLINOIS 
 
 AT URBANA-CHAMPAIGN 
 
 510.84 
 IJ*6r 
 no .679-6 84- 
 cop. 2. 
 
The person charging this material is re- 
 sponsible for its return to the library from 
 which it was withdrawn on or before the 
 Latest Date stamped below. 
 
 Theft, mutilation, and underlining of books 
 are reasons for disciplinary action and may 
 result in dismissal from the University. 
 
 UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN 
 
 NOV 5 
 
 OCT 13 
 
 -; 
 
 RECD 
 
 ,/.:■:; > ; 
 
 i^4 
 
 L161 — O-1096 
 
NSF - OCA - GJ-36936 - 000006 
 
 PARALLEL DIRECT POISSON AND BIHARMONIC SOLVERS 
 
 by 
 
 A. H. Sameh, S. C. Chen, and D. J. Kuck 
 
 July 1974 
 
 The 
 
 JAM 1 ' 19 75 
 
 ..tiiuis 
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
 
 URBANA, ILLINOIS 
 
Parallel Direct Poisson and Biharmonic Solvers 
 
 by 
 
 * ** ** 
 
 A. H. Sameh , S. C. Chen , and D. J. Kuck 
 
 +m, • 
 
 This work was supported in part by NSF Grant GJ-36936; and in part by the 
 Advanced Research Project Agency of the Department of Defense under Contract 
 No. DAHC04-72-C-0001. 
 
 * 
 
 Department of Computer Science and the Center for Advanced Computation, 
 
 University of Illinois, Urbana, Illinois 61801. 
 
 ** 
 
 Department of Computer Science, University of Illinois, Urbana, Illinois 61801, 
 
 July 19, 1974 
 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/paralleldirectpo684same 
 
1. INTRODUCTION 
 
 Several algorithms have been developed for the direct solution 
 of finite difference approximations of the Poisson and biharmonic equa- 
 tions on a rectangle [l- 3 ], [k,5]. For square meshes of n x n points, the 
 fastest available sequential algorithms require 0( n 2 log 2 n) time steps for 
 the Poisson equation, and 0(n 3 log 2 n) time steps for the biharmonic equa- 
 tion. Using as many parallel processors as necessary, one could hope to 
 achieve a solution to either of these problems in 0(log 2 n) time steps. In 
 fact, it follows from a simple fan in argument that any algorithm which 
 requires the interaction of n 2 data points with each other cannot be com- 
 puted in less than 21og n time steps. 
 
 We are concerned here with parallel computation and shall mea- 
 sure time in unit steps. If p processors are being used, then we denote 
 the computation time as T p unit steps. Thus T ± is the time required by a 
 serial machine. We define the speedup of a computation using p processors 
 over the serial computation time as S = T /T > 1. Further, we define 
 
 the efficiency of a p processor computation as E = S /p < 1. We also 
 
 y p 
 assume that: 
 
 (i) any number of processors may be used at any time, but we 
 will give bounds on this number, 
 
 (ii) each processor may perform any of the four arithmetic 
 operations in one time step, and 
 
 (iii) there are no memory or data alignment time penalties. 
 In this paper we develop a parallel algorithm for the Poisson 
 equation which uses only 0(log 2 n) time steps and 0(n 2 ) processors. Thus 
 
2 
 we achieve a speedup of 0(n ) over the fastest known sequential algorithms, 
 
 at an efficiency which is independent of n. Furthermore, we present a 
 
 parallel algorithm for the biharmonic equation which uses only 0(n) time 
 
 3 2 
 
 steps and 0(n ) processors for a speedup of 0(n log n) over the fastest 
 
 known sequential algorithm. We also present a slower, more efficient 
 
 5 2 
 
 algorithm which uses 0(nlog n) time steps and only 0(n ) processors, for 
 
 2 
 
 a speedup of 0(n ). 
 
 2. PRELIMINARY RESULTS 
 
 Now we introduce several lemmas which will be fundamental in 
 obtaining the main results of this paper. Throughout section 2, we assume 
 n = 2 for a positive integer k. 
 
 Lemma 1 
 
 The coefficients y. , l<i<n, of the Fourier transform of the vec- 
 tor of complex data points x. , l<i<n, can be computed in T =31og n steps 
 using p=2n processors with a speedup S =5n/3 and an efficiency E =5/6. 
 
 The details of the Fast Fourier Transform have been discussed 
 by a number of authors, e.g. [6-8]. Hence, we only sketch a proof that 
 31og n steps are sufficient. Each of the log n stages in the FFT algo- 
 rithm requires the multiplication of n/2 pairs of complex numbers, followed 
 by the addition of n pairs of complex numbers. Thus, 2n processors are 
 required to form the Mn/2) products in one step, and n processors can sum 
 the n/2 real and n/2 imaginary parts simultaneously in another step. The 
 complex addition of n pairs of numbers will require 2n processors for the 
 third step. Since the serial time required for this process is T = 5n log_n 
 steps, we achieve a speedup of S = T /T = 5n/3, with an efficiency of 
 E p = S p /p = 5/6. 
 
Corollary 1 
 
 The coefficients y. , l<i<n, of the Fourier transform of the 
 
 vector of real data points x. , l£i<n, can be computed in T =3 log (n/2) 
 
 steps using p = n processors with a speedup S = 5n/6(l - ), and an 
 
 P l°g n 
 
 efficiency E = 5/6(1 - ). 
 
 P log 2 n 
 
 This follows from Lemma 1 and the fact that n real points may 
 be regarded as n/2 complex points in the FFT algorithm, [9]. 
 
 Lemma 2 
 
 Any triangular system Ax = f of order n, where A is either 
 
 2 
 unit lower or upper triangular, can be solved in T < (log n + 3 log n)/2 
 
 steps using p <_ n /6h processors, with a speedup S = 0(n^/log n) and an 
 
 2 
 efficiency E = 0(l/n log n). 
 
 This is Theorem 1 in [10], hence the proof will be omitted here 
 
 Lemma 3 
 
 Let A in Lemma 2 be a bidiagonal matrix (a. .=0 for li-i >2). 
 Then the system Ax = f may be solved in T =2 log n steps using p = (n-l) 
 processors with a speedup S = (n-l ) /log n and an efficiency E = l/log_n. 
 
 This is a corollary of Theorem 2 in [l0]. 
 
 Lemma k 
 
 Let A be a nonsingular symmetric matrix of order n. Then it can 
 
 be factored in the form A = LDL 11 in T = 3 (n-l) steps using p = n(n-l)/2 
 
 P 
 
 processors, where L is unit lower triangular and D is a diagonal matrix with 
 nonzero elements. Thus, we obtain a speedup S = n(2n+5)/18 with an 
 
 2 
 efficiency -5- < E < 1 . 
 
Proof 
 
 Let L = U r * 2 ,...,* n ], where I* = (0.. .. ,0,1, Vl.k'W.k'" •• l n,k , « 
 
 n t 
 hence A = A - £ d.£.£.. At the beginning of the k-th stage we have 
 
 (d ,d_,...,cL ), [Z- ,Jlp, . . . , JL _ ) , and the symmetric matrix A^ = A - E d. . Jl . Jl . 
 
 i=l 
 
 (k) (k) 
 
 whose first (k-l) rows and columns are zero. Thus cL = a, , and I = a /cL 
 
 (r = k+l,...,n) are obtained by one division using (n-k) processors. We 
 
 then obtain the matrix cL JL JL in one multiplication using (n-k) (n-k+l)/2 
 
 processors, and A = A - till in one subtraction using the same number 
 
 of processors. Therefore, 3 time-steps are required in the k-th stage using 
 
 (n-k) (n-k+l)/2 processors. Hence, the whole factorization requires T = 3(n-l) 
 
 P 
 
 steps with p = n(n-l)/2 processors. 
 
 The sequential algorithm requires n(n-l) (2n+5)/6 steps, thus we 
 obtain the above-mentioned speedup and efficiency. 
 
3. A POISSON SOLVER 
 
 We consider the five-point finite difference approximation of 
 the Poisson equation 
 
 -V 2 !! = f (1) 
 
 on a unit square. We superimpose a square grid over the unit square with 
 a mesh size h = l/n and assume that log n = 2 for a positive integer k. 
 In fact, the results of this paper hold for n = 2 , but for later notational 
 simplicity we make the above restriction. Considering the periodic boundary 
 conditions: 
 
 u(x Q , y) = u(x n , y) 
 u(x s y Q ) = u(x, y n ), 
 
 (2) 
 
 we are led to the linear system, 
 
 where 
 
 and 
 
 M v = w , 
 P 
 
 M = 
 
 A -I 
 -I N A -I 
 
 -I 
 
 -I 
 
 -I A -I 
 
 -I A 
 
 v = 
 
 w 
 
 t t 
 
 = (v , w 2 , 
 
 / t t t> 
 
 (v x , v 2 , ..., v n ), 
 
 . . . , w 
 
 ), 
 
 f 2 2 ^ 
 
 (n x n ), 
 
 (3) 
 
 A = 
 
 k -1 
 
 -1 
 
 \ 
 
 1 h -l 
 \ 
 
 s -i " N u 
 
 S V V 
 
 N S V 
 
 -1 
 
 -1 
 
 (n x n . 
 
The matrix A has the eigenvalues A = k + 2cos— "— , (j = 0, 1, 2, ..., n-1 ) , 
 
 and the corresponding eigenvectors are 
 1 
 
 t 
 
 q o 
 
 ^ 
 
 (1, -1, 1, ..., -1), 
 
 q£ - -~ (1, 1, 1, ..-, 1), 
 2 ^ 
 
 t /2 , 2£tt kin on , 
 
 q„ = /— (cos , cos , .... cos 2£tt)_ 
 
 H V n n n ' 
 
 and 
 
 f2 , . 2£tt . kH-n 
 
 — (sin , sin 
 
 , sin 2&ir)* 
 
 where £ = 1,2, ...,—- 1. Let Q be the eigenvector matrix of A, v. = Q v. , 
 and w. = Q w. (i = 1, 2, . . . , n). Thus, the solution of system (3) can be 
 reduced [3] to that of solving the n systems of linear equations. 
 
 r .v, = v. 
 
 (j = 0, 1, 2, ..., n-1), 
 
 (k) 
 
 where 
 
 and 
 
 r. = 
 J 
 
 A . -1 -1 
 
 -1 A . -1 
 
 \\ \ 
 
 -1 A . -1 
 
 -1 
 
 -1 A. 
 
 J -J 
 
 v r ( v v j2' •■•• V' 
 
 (nxn), 
 
 in which v and w are the j-th components of v and w , respectively, 
 algorithm proceeds as follows in three stages: 
 
 The 
 
 i) Compute w. = Q w. 
 
 (i = 1, 2, ... , n), 
 
ii) Solve the n systems 
 
 r.v, = w 
 
 J J J 
 
 iii) Compute v. = Qv. 
 
 (j = 0, 1, 2, ..., n-1), 
 (i = 1, 2, ... , n). 
 
 With this theoretical "background for the Poisson equation, we 
 now turn to the development of an algorithm which computes a solution in 
 0(log 9 n) time steps. This will be developed by analyzing p and T for 
 stages i - iii above. The result is presented below as Theorem 1. 
 
 We describe an algorithm that capitalizes on Lemmas i and 2 so 
 
 as to require a time within a small multiple of the time required by a fan in 
 
 2 
 argument on n points. We give the number of time steps and processors re- 
 quired in each stage for the major operations whose computation time depends 
 on n. In other words we will ignore operations which can be computed in 
 constant time. Figure 1, below, gives a time-processor chart for the 
 parallel Poisson solver. 
 
 r 
 
 Tp 
 
 3 log 2 n 
 
 2 log 2 n 
 
 jL 
 
 l°g 2 n 
 
 T 
 
 2 log 2 n 
 
 f 
 
 3 log 2 n 
 
 -jn/log n 
 
 
 
 w. = Q w. 
 l l 
 
 
 z j 
 
 
 a. 
 J 
 
 
 
 
 
 v. = Qv. 
 
 1 
 
 
 
 stage i, i=l, 2, ...,n 
 
 stage ii, j = 0, 1, ..., n-1 
 
 stage iii, i = 1, 2, ..., n 
 
 Figure 1 
 
From the eigenvectors q and q , of the matrix A we see that 
 
 v. = Q w. will give rise to discrete cosine and sine transforms each of n/2 
 1 1 
 
 real data points. By Corollary 1, each of these can be computed by an FFT 
 
 algorithm of n/U complex data points; thus we can evaluate 
 
 (w , ..., w lo , w , 9xl) ..., w ) in 3 log (n/U ) steps using n 
 i,l i,n/£!-l i,n/^+± i,n-l 2 
 
 processors. Using (n/log n) additional processors, we can simultaneously 
 
 evaluate w 
 
 1 
 
 n-1 
 
 , n-1 
 
 , = I (-1) J w. . and w 
 
 •n j=0 ' d i,- /n j=0 
 
 Z w. . in 
 
 (3 log p n - log p log p n) steps. Therefore, stages i and iii require 
 
 2 2 
 3 log_n steps each with (n + n /log_n) processors as shown in Fig. 1. 
 
 In stage ii we solve the linear systems (h) using Toeplitz 
 
 factorization in conjunction with the Woodburry formula, [ll]. The matrix T 
 
 can be considered as the sum (B. + SV.), where 
 
 J J 
 
 j 
 
 -1 X . -\ 
 
 B. = \ J \ ! 
 
 \\ \l 
 
 -1 A. -1 
 
 -1 X. (nxn), 
 
 oJ 
 
 S = [e ,e ], and 
 
 lu.-i o- 
 
 t i J 
 
 V. = j 
 
 J -1 o- 
 
 -1 i 
 
 ! (2xn). 
 
 The matrix B can then be factored into the product LU , where 
 j J J 
 
 ^j- 1 
 
 u. =! 
 
 -l 
 
 1 , and L. 
 
 W ■ 
 
 -1 
 
 N 
 S 
 
 N s 
 
 I 
 
 y 
 
 J ! 
 
2 1 /2 
 
 in which y = (X /2) + ((X A) - l) , and for numerical stability of 
 
 the factorization the sign is chosen such that |y.| = |x /2 1 + ((X./1+) -l ) 
 
 J J J 
 
 Thus, 
 
 v. = rT w. 
 
 1/2 
 
 = uT^lT 1 * - l^s (i + v*u: 1 L: 1 sr :L v*u , : 1 L" 1 i,]. 
 
 J J J J 2 J j J J J J j 
 
 Solving the systems L.G. = S and U.H. = V. is rather straightforward; 
 
 J J J J J 
 
 and 
 
 H j = [(P J% - y j le n»' "I lg j ] ' 
 
 where 
 
 t ,_ -1 -2 -n+h 
 
 g = (1, y , y , ... , y ) 
 
 J J J J 
 
 We assume that such a vector may "be computed in constant time. The major 
 
 operations in computing the v.'s are those involved in solving the systems 
 
 J 
 
 L.z. = w., and U.v. = a. , 
 J J J J J J 
 
 (5) 
 
 in which 
 
 a. = z. - G.(I„ + H*0.) 1 H^z. 
 J J J2 jj jj 
 
 (6) 
 
 By Lemma 3 we see that solving each of the systems in (5) requires 2 log p n 
 steps and (n-l) processors. Since g.g. = (l-y. )/(l-y. ), then the only 
 
 J J J J 
 
 major operation in (6) is that of computing the inner product g.z.. This 
 
 J J 
 
 requires log n steps and n processors. Stage ii, therefore, requires a 
 
 2 
 total of 5 log p n steps with n processors as shown in Fig. 1. 
 
10 
 
 Thus the total number of time steps required for this algorithm 
 
 2 2 
 is 11 log n with n + n /log p n processors. Similarly we can show that for 
 
 Neumann boundary conditions the same numbers of steps and processors are 
 
 required, while for the Dirichlet boundary conditions we require only 
 
 2 
 10 log n steps with n processors, since stage ii needs only k log n steps. 
 
 The following theorem is therefore established, 
 
 Theorem 1 
 
 The five-point finite difference approximation of the Poisson 
 equation on the unit square with mesh size h = l/n can be solved in at most 
 
 T = 11 log„n steps using no more than p = n(n+[n/log nl ) processors, with 
 
 P 2 « 
 
 a speedup S = 0(n ) and an efficiency 9/44 <_ E < 9/20. O 
 
 Comparing our parallel algorithm with the odd-even reduction scheme 
 [3] , we obtain, 
 
 S = | n 2 log 2 n/ll log 2 n = 9n 2 /22, 9/44 < E £ 9/20 
 where the upper bound is achieved for the Dirichlet boundary conditions. 
 
 We use [xl to denote the smallest integer greater than or equal to x. 
 
11 
 
 k. A BIHARMONIC SOLVER 
 
 We consider the "biharmonic equation, 
 
 k 
 V u = f 
 
 on a unit square with the boundary conditions u(x, y) =0 and u (x, y) = 
 
 for (x, y) e 9R, where u (x, y) is the outward normal derivative at the 
 
 boundary. Superimposing a square mesh on the unit square with mesh size 
 
 h = l/(n+l), where n = 2 , then a finite difference approximation leads to 
 
 the linear system 
 
 M v = w, 
 
 (T) 
 
 where 
 
 "b 
 
 A+I 2B I 
 
 2B A 2B I 
 
 I 2B A 2B I 
 
 
 
 "1 2B A 2B I 
 I 2B A 2B 
 I 2B A+I 
 
 f 2 2 \ 
 (n x n ) 3 
 
 in which 
 
 B = 
 
 -k 1 
 1 -U 1 
 
 \ s 
 \ \ 
 \ \ 
 
 1-Ul 
 1 -h 
 t 
 
 A = B + 21 + 2EE 
 
 (nxn), 
 
 = V + 2EE , and 
 
 ■*- 
 
 1 o 
 
 o o 1 
 
 (2xn) 
 
12 
 
 The eigenvalues of B are given by 
 
 in 
 
 w. = -h + 2 cos - 
 l n+1 
 
 (i = 1, 2, ..., n), 
 
 and the eigenvector matrix, 
 
 <» -[«„!■ 
 
 n+1 
 
 sin 
 
 1 JTT 
 
 n+1 
 
 System (?) can be expressed [5] as 
 
 where 
 
 (G + 2FF )v = w, 
 
 (i, j = 1, 2, . . . , n). 
 
 G = 
 
 V+I 2B I 
 
 2B V 2B I 
 
 I 2B V 2B 
 
 I 2B V 2B I 
 
 \ \ \ 
 
 I 2B V 2B 
 I 2B V+I 
 
 (n xn ) , 
 
 and F/ 2 \ = diag(E, E, ..., E). Thus by the Woodburry formula, 
 \Zi x c.n ) 
 
 v = G _1 w - 2G~ 1 F(I^ + 2F t G X F) 1 F t G _1 w 
 
 2n 
 
 (8) 
 
 and the algorithm proceeds as follows in three stages: 
 i) Solve the (2n+l) linear systems 
 GY = F and Gv = w, 
 ii) Factor the symmetric matrix {T + 2F Y) in the form LDL 
 
 and solve the linear systems 
 
 t^ t 
 
 Lz„ = F v, Dz = Zp, and L z = z , 
 
 iii) Obtain the solution, 
 
 >\> 
 
 v = v - 2Yz. 
 
13 
 
 
 In stage i, we would like to solve (2n+l) systems of the form 
 
 Gx = b . (9) 
 
 As in the direct Poisson solver, solving the system (9) can be reduced to 
 
 that of solving the n systems, 
 
 fix = b j = 1,2,. ..,n , (10) 
 
 J.J J 
 
 where the positive-definite matrices fi. are given by 
 
 J 
 
 1+A. 
 
 J 
 
 2u> 
 
 fi. = 
 J 
 
 2u>. 1 
 J 
 
 2d) . 1 
 J J 
 
 2w. A. 2o). 1 
 
 \ 
 \ \ \ \ 
 \ \ \ \ 
 \ \ \ \ 
 \ \ \ 
 
 2u 
 
 2w 
 
 2u. A . 
 
 J J 
 
 2u. 
 
 1 2u. l+X 
 
 J J 
 
 (nxn) , 
 
 where 
 
 A. = 2 + id, 
 
 j ; 
 
 c. 
 
 J 
 
 ' ( *J1' 5 J2» 
 
 in which, 
 
 , x. ) , and 
 
 b . = (b , b , . . . , b. ) , 
 J Jl J2' jn 
 
 \ " «*V \ " «\ 
 
 k = 1, 2, . . . , n 
 
 1/2 
 
 Let y = (u /2) - ((oj /If) _ i) ' ; then fi. can be expressed as 
 J J j J 
 
 n. = t 2 + s.v,' 
 
 j j j j 
 
 (ii) 
 
where 
 
 1U 
 
 T. = 
 
 v i 1 
 
 1 u, 1 
 \ ^ N 
 
 \ N * 
 
 N l 0) 1 
 
 (J 
 
 1 OJ, 
 
 (n*n), 
 
 in which 
 
 S j = C(a j e l + 6 j e 2 K V e n ] ' md 
 V j = Ce l' 6 j e 2> e n ] ' 
 
 2 2 
 
 a. = 2 + co. - y . , 6 . = u>. - y . 
 J J J J J J 
 
 T. has the Toeplitz factorization L.U. similar to that of B in Sec. 3, 
 j J J J 
 
 where "both L and U. are diagonally dominant. Therefore the numerical 
 J J 
 
 stability of the process of solving the linear systems (L.U.) y. = f. is 
 
 assured. Thus the solutions to the systems (10) may he written as, 
 
 x = (L U )" 2 h - (L U )~ 2 S.(1„ + V^(L.U.)" 2 S.)"V(L.U J" 2 "b.. (12) 
 j j j j J J j3 J J J J J J J J 
 
 2 
 By Lemma 3, solving the system (L U.) y. = h. requires 8 log n steps 
 
 j j j j <— 
 
 using (n-1) processors. Furthermore, since x in Lx = e. (i=l,2,n), and 
 
 Ux = e can be obtained in constant time, the solution of the system 
 n 
 
 / \2^ ^ / N A 
 
 (L.U.J S. = S. requires only 16 log n steps using (n-1) processors. Thus x. 
 J J J J *- <J 
 
 in (12) is obtained in l6 log n steps with 2(n-l) processors. By virtue of 
 
 Corollary 1, the vectors b. and x. in (ll) are obtained in 6 log (n/2) steps 
 
 J J e - 
 
 using n processors. Therefore, the linear system (9) can be solved in 
 
 2 
 22 log n steps using 2n processors. Consequently, stage i requires 22 log n 
 
 steps provided we use 2n (2n+l) processors. 
 
15 
 
 By Lemma U, the matrix (I + 2F y) can be factored in the form 
 
 LDL in 3(2n-l) steps using n(2n - l) processors. By Lemma 2, we require 
 
 2 
 (log 2 2n + 3 log 2 2n) steps to solve the three linear systems in stage ii using 
 
 3 2 
 
 n /8 processors. Thus, stage ii requires (6n + log n + 5 log n + 1) steps with 
 
 '2 
 
 o 
 
 max (n(2n - l), n /8) processors. Finally, stage iii requires log n steps using 
 
 3 
 2n processors. 
 
 Hence we have established the following theorem: 
 
 Theorem 2 
 
 The finite difference approximation (7) of the biharmonic equation 
 
 2 
 on the unit square can be solved in at most T = 6n + log Q n + 28 log n + 1 steps 
 
 .M *~ 
 
 2 2 
 
 using p = 2n (2n + l) processors, with a speedup S = 0(n log n), and an 
 
 P " ^ 
 
 efficiency E = 0(log n/n).Q 
 
 For large n, T o6n and plk^n . Comparing this with the sequential 
 
 algorithm in [5], we obtain a speedup of no less than, 
 
 3 5 2 
 
 S = 5n log n/6n = — n log n 
 p 2 6 2 
 
 with an efficiency of, 
 
 E p = 5 log 2 n/24n . 
 
 Note that for n^2 , Lemma 2 requires at most i+n processors. 
 
 3 
 For large n, the requirement of n processors may be impractical. If we 
 
 wish to limit the number of processors, the above algorithm can be shown 
 
 2 
 to require 0(n log n) steps with 0(n ) processors. 
 
 This can be explained as follows. If we have, for example 
 
 2 
 n processors, then let us solve the top k equations of the linear sys- 
 
 2 ~i 2/3 
 
 terns in stage ii where by Lemma 2 n = k /6U, i.e. k = kn . We can 
 
 then compute the partial sums of the first k terms of the remaining (n-k) 
 
 equations as a new constant vector for the remaining triangular system. 
 
16 
 
 This process can be repeated 
 and it takes 
 
 2n 
 
 times until the whole system is solved, 
 
 T "o 
 P ~ 
 
 2n 
 
 [(log 2 k + 3 log 2 k)/2 + log 2 k] 
 
 = 0(n 1/3 log^n). 
 
 For a moderate range of n, T = (n). With 0(n ) processors, now, 
 stage i can also be evaluated within 0(n log n) steps by computing the 
 (2n + l) systems sequentially; stage iii will still have T = 0(log ? n). 
 Hence, by this procedure, we can obtain a somewhat slower but more effi- 
 cient algorithm with speedup of 0(n ) and an efficiency again indepen- 
 dent of n. 
 
17 
 
 REFERENCES 
 
 [1] Dorr, F. W. The direct solution of the discrete Poisson equation 
 
 on a rectangle. SI AM Review 12, 2 (1970), 248-263. 
 [2] Buzbee, B. L. A fast Poisson solver amenable to parallel computa- 
 tion. IEEE Trans. Comput . C-22, 8 (1973), 793-796. 
 [3] Buzbee, B. L. , Golub, G. H. , and Neilson, C. W. On direct methods 
 
 for solving Poisson' s equations. SIAM J. Numer. Anal . 
 
 7, 4 (1970), 627-656. 
 [4] Ehrlich, L. W. Solving the biharmonic equation in a square: a 
 
 direct vs. a semidirect method. Comm . ACM 6, 11 (1973), 
 
 711-714. 
 [5] Golub, G. H. An algorithm for the discrete biharmonic equation. 
 
 Private communication. 
 [6] Brigham, E. 0., and Morrow, R. E. The fast Fourier transform. 
 
 IEEE Spectrum , (Dec. 1967), 63-70. 
 [7] Pease, M. C. An adaptation of the fast Fourier transform for parallel 
 
 processing. J. ACM 15, 2 (April 1968), 252-264. 
 [8] Ackins, G. M. , and Kuck, D. J. Seismic signal processing via the 
 
 ILLIAC IV computer. IEEE Trans. Geo. Electron . GE-7, 
 
 (Jan. 1969), 34-41. 
 [9] Cooley, J. W. , Lewis, P. A. W. , and Welch, P. D. The fast Fourier 
 
 transform algorithm: programming considerations in the 
 
 calculation of sine, cosine and Laplace transforms. 
 
 J. Sound Vib. 12, 3 (1970), 315-337. 
 
18 
 
 [10] Chen, S. C. , and Kuck, D. J. Time and parallel processor bounds 
 
 for linear recurrence systems. Submitted to IEEE Trans . 
 
 Comput . 
 [11] Fischer, D. , Golub , G. , Hald, 0., Leiva, C. , and Widlund, 0. On 
 
 Fourier-Toeplitz methods for separable elliptic problems. 
 
 Math. Comp . 28, 126 (1974), 349-368. 
 
BIBLIOGRAPHIC DATA 
 SHEET 
 
 1. Report No. 
 
 UIUCDCS-R-74-684 
 
 2. 
 
 3. Recipient's Accession No. 
 
 4. Title and Subtitle 
 
 PARALLEL DIRECT POISSON AND BIHARMONIC SOLVERS 
 
 5. Report Date 
 
 July 1974 
 
 6. 
 
 7. Author(s) 
 
 A. H. Sameh, S. C. Chen and D. J. Kuck 
 
 8. Performing Organization Rept. 
 
 No - UIUCDCS-R-74-684 
 
 9. Performing Organization Name and Address 
 
 University of Illinois at Urbana-Champaign 
 
 10. Project/Task/Work Unit No. 
 
 Department of Computer Science 
 Urbana, Illinois 61801 
 
 11. Contract/Grant No. 
 
 US NSF-GJ-36936 
 
 12. Sponsoring Organization Name and Address 
 
 National Science Foundation 
 
 13. Type of Report & Period 
 Covered 
 
 Washington, D. C. 20550 
 
 14. 
 
 15. Supplementary Notes 
 
 16. Abstracts 
 
 We consider the direct solution of the Poisson and biharmonic equations using a 
 number of arithmetic units in parallel. Assume an n*n grid of mesh points. We 
 
 show that a finite difference approximation to the Poisson equation can be solved in 
 
 2 
 0(log 9 n) time steps using 0(n ) processors in parallel. This holds for Dirichlet, 
 
 Neumann and periodic boundary conditions. We also show that a finite difference 
 
 approximation to the biharmonic equation can be solved in 0(n) time steps using 
 
 3 
 0(n ) processors in parallel. 
 
 17. Key Words and Documem 
 
 Analysis. 17a. Descriptors 
 
 
 
 Biharmonic Equation 
 Computational Complexity 
 Parallel Computation 
 Poisson Equation 
 
 Computing Reviews Categories 
 5.14, 5.17, 5.25. 
 
 17b. Identifiers/Open-Ended Terms 
 
 17c. COSATI Field/Group 
 
 18. Availability Statement 
 
 Unlimited 
 
 19. Security Class (This 
 Report) 
 
 TlNn ASSIFIED 
 
 20. Security Class (This 
 Page 
 
 UNCLASSIFIED 
 
 21. No. of Pages 
 
 20 
 
 22. Price 
 
 ORM NTIS-35 ( 10-70) 
 
 USCOMM-DC 40329-P7 1 
 

r«-. 
 in 
 
 a. 
 
■ 
 HH H •' 
 
 HHfl 9H 
 
 
 B! 
 
 UNIVERSITY OF ILLINOIS-URBANA 
 JlOMILIRno COO? no 179 6(4(1974 
 
 Riporl / 
 
 
 3 01 
 
 12 0884015 
 
 ■HI '*& 
 
 ■ 
 
 ■ 
 
 
 
 
 
 ■■ *£d 
 
 
 ■ 
 
 ■ V 
 
 ■ 
 
 ■ 
 
 H 
 m ■■■ 
 
 m 
 
 m 
 
 I 
 
 on 
 
 ■ 
 
 ■HE ?> I 
 
 ■H 
 
 
 i 
 
 ■HI