LIBRARY OF THE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAICN 510.8^ ho. 6.I-&0 CO Digitized by the Internet Archive in 2013 http://archive.org/details/generatingpseudo74farr UNIVERSITY OF ILLINOIS GRADUATE COLLEGE DIGITAL COMPUTER LABORATORY REPORT NOo 74 GENERATING PSEUDO-RANDOM NUMBERS IN THE ILLIAC By Carl Co Farrington, Jr. December 7 > 19 56 This work has been supported by Contract N6ori-71 Task XXX United States Navy ONR NR Okh 001 GENERATING PSEUDO-RANDOM NUMBERS IN THE ILLIAC Carl C. Farrington, Jr. , University of Illinois Introduction This report contains results of research directed toward finding a better method of generating pseudo-random numbers in the ILLIAC . When we speak, loosely, of random numbers we mean numbers generated by a random process. We shall call a process random if we cannot predict the next number to be generated. Since very long sequences of random numbers are required in Monte Carlo computations, reading such numbers into the computer is not practical. A possible alternative would be to generate inside the computer numbers by a random physical process. This approach has the disadvantages that additional equipment is required and that the computations are not reproducible . The other alternative is to generate numbers by a determinate arithmetic process. Of course, such numbers will not be "random." However, inasmuch as one is only interested in obtaining trustworthy answers to problems, such "pseudo- random" numbers may be satisfactory. Of course, whether the results of a computation for which a particular sequence of pseudo-random numbers is used should be given as much confidence as those for which a randomly selected sequence of numbers is used, depends upon the particular problem. Accord- ingly, the user must decide for himself how much confidence to place in the numerical results he obtains. These considerations led to placing primary emphasis on obtaining a fast method of generating numbers, one taking only about 1/2 millisecond per number, which generates a very long sequence. This is the sense in which we speak of a "better" method. For the most part this report contains the results of three tests on two methods of generating pseudo-random numbers. These tests each determine whether some particular property of the numbers is statistically predictable. One of the methods tested is that used in the present library routine. The other is a modification of a method proposed by Tocher . Tdcher, K. D„, "The application of automatic computers to sampl- ing experiments," J. Roy. Stat. Soc, ser. B, vol. 16, (1954) 39-61. I» Methods of generating pseudo- random numbers The method used in the present library routine is the "middle-of- (2) —19 —38 the-square" method^ . A certain 38 binary digit number (2 +3X2) proposed by N. Metropolis is taken as the starting point. This number is squared and the middle 38 digits of the product are used as the second mem- ber of the sequence. This is squared to obtain the third member, and so on. The first 1,000 numbers are usually discarded. The sequence is about 700,000 numbers long, after which it rapidly degenerates to a sequence of all zeros. The first objective of this research was to find a faster method, one taking on the order of one-half millisecond to generate one number. Ac- cordingly, processes involving the use of multiplication or division orders had to be ruled out. The method finally arrived at uses five starting numbers. A linear combination of the last four is added to a number obtained by permut- ing the digits of the first number, to obtain the 6th number of the sequence. The same linear combination of the 3rd, 4th, 5th and 6th numbers is then added to the permuted digits of the 2nd number to obtain the 7th number, and so on. The linear combination used has only integral coefficients so the linear combination can be formed by a series of additions, subtractions, and shifts. The shifting operations are used simultaneously to permute the digits of the other number. Secondly, the red tape is reduced by generating five numbers instead of one each time the routine is entered. The overall result is that the routine takes 3 milliseconds to generate five numbers, so that the num- bers are generated at a rate of 0.6 millisecond per number. The method proposed by Tocher was to form a linear combination of a fixed number of the preceding numbers to obtain the next number in the se- quence, without permuting the digits of any of them. He recommended that the sequence of binary digits composing this sequence of numbers be paired and that these pairs be replaced by single digits according to the following code. 00 reject 01 10 1 11 reject (2) v 'Hammer, Preston C, "The Mid-Square Method of Generating Digits," Proceedings of a Symposium July 29, 30 and July 1, 1949, National Bureau of Standards Applied Mathematics Series 12. -2- He demonstrated that this scheme would improve several types of non-random- ness in the original sequence. He recommended that the resulting sequence of digits be regrouped as variables and subjected to the pairing process again to obtain the final sequence of random numbers. Since the ILLIAC is a parallel machine, the time which would be con- sumed in coding digits in the manner described would be absolutely prohibi- tive for obtaining long sequences of numbers. It was thought that a linear combination might be used directly as a method of generating pseudo-random numbers without the use of the improve- ment scheme suggested. One sequence, the Fibonacci numbers, which involves a linear combination of only two of the preceding numbers is rather well known. The Fibonacci numbers are obtained by making each number the sum of the two preceding numbers, using 0, 1 as the starting numbers, so that one obtains 0, 1, 1, 2, 3, 5, ••■> The starting numbers are represented in the computer -39 -39 as * 2 and 1 <* 2 . The sequence increases quite rapidly, and integers larger than 2 are soon obtained. Overflow occurs and the sum must be re- garded as a sum modulo 2 . The Fibonacci numbers and other sequences of numbers obtained as linear coirbinations of preceding numbers have been found to have a periodicity of their terminal digits which makes them unsuitable as sources of random numbers. An analysis showing this result follows.* We consider a sequence of integers modulo 2 given by q A A = (l) n+q , -^- , k n+q-k ' x with starting numbers A„, A, , . . . , A _ . We assume that the last coeffic- ient a is an odd integer. This is a sufficient condition for periodicity of the terminal digits, for if a is odd it is relatively prime to any power s of 2. Hence for any s, a has an inverse modulo 2 , so that the equation q - 1 k = 1 a A = A - > a, A q n n+q *=3- 1 Tc n+q - k 5 has a unique solution modulo 2 for A in terms of its first q successors. ^ n It is clear that if the s terminal digits of q successive numbers in the * The results and proof given here were suggested by those in an article by Vern Hoggatt, "A type of periodicity for Fibonacci numbers," Mathematics Magazine, vol. 28, no. 3, (1955), 139-142, analyzing the Fibonacci numbers in their decimal representation. -3- ii i H i : i U ,il sequence are the same as the s terminal digits of q successive preceding num- bers T (A . ) = T (A . ) , i = 0, 1, ... , q - 1 (2) s n + i + M' s n + i ' ' ' ' n v where T (A) represents the s terminal binary digits of A, then the s terminal s digits will be periodic for i^ of period M. Since there are only a finite number of possibilities for the s terminal digits of any number, the exis- tence of such numbers n and M is assured. But if the coefficient a is odd, then equation (2) will hold for all i^ -n due to the uniqueness of the pre- decessor of any q numbers. Thus the periodicity of the terminal digits will begin at the very beginning of the sequence. This result can be stated as (R) Let A [be a sequence of integers modulo 2 given by (l) with a odd. Then for any integer s $ 40 there exists an integer M such that T (A, __ J = T (A. ) s k + M s k for all k. If we denote the period of the s terminal binary digits by p we can 5 show now that • «s - 1 P s ^ 2 Pl To obtain this result we consider an auxiliary sequence q F = S" a, F , , n £ 0; n + q . ~ , k n + q - k ' ' 1 q-2 ' q - 1 (The Fibonacci numbers, for which q = 2 and a, = a„ = 1 are a special case of this) We shall show by induction that i - 5" y a. , F _ .A , , n, p ^ n+q+p k t- x p J+k p+q-l-j n+q-h » * (3) using the convention that a, = for k > q. For p = we have for the right hand member -4- ll I I II " l" I! I I" ( X a, A . ) F . = A v .-fi k n+q-k' q-1 n+q by equation (l), since F , = 1, which holds for all n ^ 0. The assumption that equation (3) holds for p = r and for all n f? yields n+q+(r+l) (n+l) + q + r q r = *£ 2- a. , F , .A . . k ti jT J k r+ ^- 1_ J n+1+ q~ k q r ' k?i w) a J +1+k "l r+q-(j+l) n+q-(k-l) q-1 r+1 k tb jti a J +k ?r+q -J An+q " k Using the convention that a, = for k > q we may write q r+1 r+1 A / -^ = X ST a. . F .A , + zl a. F .A n+q+(r+l) ^ jT J +k r+q-j n+q-k f^ j r+q-j n+q q ~\ A - F f^ n+q-k r+q If r +1 < q the second summation may be written r+1 q q q, we again ob- tain r+1 < a.F .A =]_ J r+q-j n+q <^ j r+q-j n+q -5- From the defining equation we have r+1 q y a. F .A = ( STa. F .) A = F A TZ-y 3 r+q-j n+q f% j r+q-j' n+q r+q n+q From equation (l) the third summation is q (T A A . ) F - A F v /^, k n+q-k r+q n+q r+q so we obtain q r+1 A / n % = S" £ a. , F .A +F A -A F n+q+(r+l) £. 7Z* J +k r+ 1~3 n+q-k r+q n+q n+q r+q q r+1 j^i jTq J + ^ (r+l)+q-l-j n+q-k which establishes the result for p = r+1. Hence it is true for all p, n 20. If M is the period of the s terminal digits of the F' s, then there are integers B. such that F M+*i = V*' i = °' lj "•" > q " 2 F M . = B n 2 S + 1 M+q-1 q-1 Hence, since M must be greater than q-1, equation (3) implies q q-1 F 2M+ i ^ k jf a j + k F M+q-l-j F M+i-k q-1 q = j?o Si aj+k Fm+c i- 1 -J FM+i " k q-1 q = j| t= f +I a t F M+q-l-j F M+i+j-t q-l-i q q-l-i j K t^l^ FM+i+ 3-t F M+q-l-j " S ^ a t F M+i+j-t F M+q-l-j q-1 q a. F- . . . F„ -q-l_j 5" < a. F„ . . F„ . <=- . L 4- , t M+i+i-t M+q-1-; j=q-i t=j+l -6- II 1 1' I q-l-i q-1 q q-l-i j = ^ F F +*> S~ a F F .< n M+i+j M+q-l-j ■ • * TTi t M+i+j-t M+q-l-j j=0 j i j J=q-i t=j+l J H J j5_ t ? x ^ F M+i+j-t F M+q-l-j (4) We note that for i < q - 1 the first summation is q-l-i q-2 <"" F F = 2F F + ^1. F F ^- Q M+i+j M+q-l-j " M+i M+q-1 21 r M+i+j M+q-l-j g Since each factor in the summation on the right is a multiple of 2 , the 2s > s+1 summati on is a multiple of 2 ^ 2 . The first term on the right is 2F M . F M , = 2B.2 S (B n 2 S +l) = B.2 S+1 + B.B -.2 2s+1 M+i M+q-1 l v q-1 l l q-1 Except for i = when the second summation in equation (4) is not present, all factors in the second and third summations in equation (4) are multiples of 2 so that these summations are multiples of 2 -5^2 . Hence we have expressions of the form s+1 F 2M+i = C i 2 for i = 0, 1, ... , q - 2 For i = q - 1 the third summation in equation (4) is not present, the second s+1 is a multiple of 2 and the first is F* . - B 2 ,2 Z3 .♦ B l2 S+1 ♦ 1 M+q-1 q-1 i q-1 which is of the form s+1 F „ _ = C n 2 +1. 2M+q-l q-1 Hence for the auxiliary sequence P s+1^ 2p s and we notice that if m is a number such that -7- F , - B -.2 m + 1, B - odd P m + q-i q-i ' q-i then p =2 p , s ~ m ^s m Finally, we obtain our result by letting M be the period of the s terminal digits of the F's. Then by equation (3) with n + q = i and p = M q q-i hi+1 = /£-, A a j+k F M+q-l-j A i-k k=l j=0 " n j q-i q = j?o t=f + i &t k± ^ '"^^ ** q ^ ^a A F = R, , A. + r C a, A. x . F M n « - 2 > t i+j-t M+q-l-i M+q-1 1 /■= ,<4r t l+j-t M+q-l-j .*-. *-t ° id n j=l t=l - j=l t=l By our previous results this is of the form A,, . = (B ,2 s + 1) A. + D2 S Tt+i q-1 i Hence T s ( W " T s (A i ) ' 4 * ° Thus the p for the A 1 s must be a divisor of the p for the F's for each s. s ^s The periods for the A' s need not be the same as for the F's. For example, if the starting numbers for the A's are all even numbers then p, = 1. Con- sequently our result should read < s-l P s (A) £ 2^ p x (F) The periodicity just demonstrated of the terminal digits of the numbers obtained by the linear combination method makes these digits too predictable to be regarded as random. For this reason the method was modi- fied to include permutation of the digits of one of the numbers of the combination. If (a Q , a., ... , a_g) denotes a forty binary digit number the permutation operation 77" chosen is given by IT C a Q* a -ij ••• y a 39' = ^ a 0' a L.* a 5 J *** 3 a 39* a l* a 2' a 3 With this modification of the linear combination method the condition for periodicity of the s < 39 terminal digits is no longer met, for the s ter- minal digits of a number no longer depend only on the s terminal digits of the q preceding numbers. The combination investigated uses q = 5 an d is given by A =7A , + A -4A + 3A , + 7T A n+5 n+4 n+3 n+2 "' n+1 n This combination has the advantage that the equation IT k = A r -?A ,-A _ + 4A ^ - 3A . n n+5 n+4 n+3 n+2 •* n+1 has a unique solution for a member of the sequence in terms of its 5 immediate successors, since the permutation operation If has a well-defined inverse. The choice of the other coefficients and of q were rather arbitrary, except that one case of a choice of q = 2 was found to be unsatisfactory. In the machine the number A - is formed by first placing A in the A register and A in the Q register. A left shift of one place is performed to give 2 A . in the A register. To this is added - A _ and + A , giving ° n+4 n+2 n+1 ° ° 2A ,-A _ + A ... Then a left shift of two places is performed, and to n+4 n+2 n+1 r the result is added -A ,+A „ - A n to give 4(2A ,-A _ + A , ) - A . n+4 n+3 n+1 & v n+4 n+2 n+1 n+4 + A „ - A ,=7A ,+A _+3A ... In the meantime the digits q, , q^, n+3 n+1 n+4 n+3 n+1 & ^1 J ^2 f q„ of A have been added to the end of the number in the A register, and the ^3 n digits q. , q^t ... 9 q^g have been shifted left three places, the last three places in the Q register being replaced by zeros. Adding the number in the Q register to the number in the A register then completes the formation of n+5 -9- II. Results of tests We describe next the tests for randomness performed on the two meth- ods of generating random numbers, the "iniddle-of-the-square" method and the modified linear combination method, and the results of these tests. In the main three tests were performed. The first was a chi-squared frequency test. The binary digits of the numbers were grouped in fours , each group of four being regarded as a sexadecimal digit. The test was a test of the frequencies of the sexadecimal digits. For the middle-of-the-square method, since 38 digit numbers are produced, the first 36 binary digits of the first number were used as the first 9 sexadecimal digit sj the last two binary digits of the first number and the first two of the second number were used as the tenth sexadecimal digit, and the remaining 36 binary digits of the second number were used as the next nine sexadecimal digits. All tests of the middle-of-the- square method began with the 1001st number of the sequence. Since the modi- fied linear combination produces numbers 40 binary digits in length, 10 sexadecimal digits were obtained from each such number. For the first 4,800,000 sexadecimal digits, representing 3A °f the sequence, for the middle-of-the-square method the frequency test gave a value of 16.6. The probability of getting a higher value of chi-squared is 35/&o For the first 4,800,000 sexadecimal digits obtained by the modified linear combination method a chi-squared value of 12.8 was obtained, which corresponds to a probability of 62$. The second test performed on the two sequences was a poker test. In this test a "hand" of five sexadecimal digits is classified as a bust, one pair, two pairs, three of a kind, full house, straight, four of a kind, or five of a kind. Hands were tested in groups of 10,000. Testing every 5th group of 10,000 beginning with the first such group making a total of 280,000 hands a chi-squared value of 13«81 was obtained for the middle-of-the-square method. This value corresponds to a probability of 5°6$ of obtaining a higher value of chi-squared. For 280,000 hands selected in the same manner a chi-squared value of 7.46 was obtained for the sequence generated by the modified linear combi- nation method. A chi-squared value of 7° 46 for seven degrees of freedom cor- responds to a probability of 39/°. -10- The third test applied is similar to a run or gap test for columns of binary digits and made use of the logical product instruction of the computer, The logical product of two numbers is formed and tested to determine whether the resulting number contains all zeros. If the probability of any digit of a 40 digit number being zero is one-half, the probability of the logical product of two numbers containing all zeros is (3/4) • If the logical product of the first two numbers did not contain all zeros, the product of it with the third number was formed. If this number was not all zeros its product with the fourth number was formed, and so on. The numbers were divided into groups of 20 and a determination was made of how many factors in the logical product of these num- bers were required before a number containing all zeros was obtained. The number of factors were classified as two or less, three, four, ... , twelve, thirteen or more. The number of groups of 20 falling into each of these class- ifications was compared to the theoretical distribution. The theoretical dis- tribution was slightly different for the 38 digit numbers generated by the middle-of-the-square method than for the 40 digit numbers generated by the modi- fied linear combination method, of course. For the first 700,000 numbers of the sequence obtained by the middle-of-the-square method a chi-squared value of 9.1 was obtained, which corresponds to a probability of 60$. For the first 1,000,000 numbers of the modified linear combination sequence chi-squared was 5.8, for which the corresponding probability is 89$« This third test has the advantage that it is very fast, since opera- tions are performed on whole numbers rather than on sexadecimal digits. For example, testing the first 1,000,000 numbers of the sequence obtained by the modified linear combination method required less than 30 minutes of computing time. For this reason this test was the first applied to two other trial meth- ods. One of these was a modified linear combination method which used only two starting numbers. This was tried because it was felt that a routine which generated only one number each time it was entered would be more convenient, and this feature would not be too costly in time if only two numbers were used in the combination. However, the chi-squared value obtained for the one se- quence was 190.3, which is far beyond the acceptable range. -11- The method of analysis applied to the linear combination method is not applicable to the modified linear combination method, because the permu- tation operation V does not have the necessary algebraic properties. Indeed, there does not appear to be any promise that any method of analysis will yield illuminating results. But perhaps this is not an altogether bad feature of a method for generating random numbers. At any rate, in lieu of an analytic determination of the length of the sequence the first ten million numbers of the sequence were tested for repetition. It was found that the sequence does not repeat within the first ten million numbers. As this test was being made every 100,000th group of five numbers of the sequence was printed out, as an aid to future users of the routine who might wish to check for possible mach- ine error or wish to use different parts of the sequence. This list is given in the appendix. -12- APPENDIX F085N08L+- J25291+706 63F95019F7 1L6-479F+3 662487 -L5 6 500000 7350338N5+ J108-4N+5- L9NOL5456L -L8N0J41N5 77L-77027J 3500000 9-2F917FN3 AN-JJ23L72 5L0F2J8JJ+ -8N8FFF5N1 3J7068NL-9 4000000 F00-80L4+J 0LL58L6177 F5604564J7 43-2660+L3 90NL-0N49+ 7000000 9+F87-LL86 6-F-N15F9L 342+57+9NJ 2F3-F68N+5 417719F9N4 7500000 J928+NN+JF 6191261+J3 1J1N3412+L 068369FN-F -32F9F203J 1000000 4500000 8000000 -2515+6-3F 674J-946-7 148+21JJ5+ 005-1LN5+8 4956J3918J 40JJ3J3+F- +NF5F73779 F5F3888+87 64+7-921+1 N-JNF7613F N57-L7-37N 35364J-N09 F7J129F6FL F5+N3993J8 42704-+916 1500000 -J13L36L51 60360L269- LJJ1082-F0 37473-U24 L99F6NL-79 5000000 NNLJJ263N1 8477-6J-5N LNN7960L82 N05N19J272 0L13FN97F3 8500000 9N3LOFNON4 4+-5FL8950 21L+2L8LF4 1+0L14J277 F-N61715+1 2000000 45FNON57L9 6208J2N18- N+N-2L2N77 L2NJ3+F0J2 4J-F7F8898 2500000 5JJ4+L9331 L3-4340+F+ 0-6498L3N3 +NLL34LLJN +65024+N2L 3000000 5500000 F9F-913F+5 353N4^931N 08JL60J287 3519595J+7 -1471NF0J6 6000000 6L864FLO65 103N 688482 OU60-6L25 -608J1+12J 74+454469L 6500000 9000000 6411+668N8 F535273553 -495N5L987 519+N9+N70 09081N-9F5 9500000 +25F73+L+3 2J-1F4J3+9 9NF6719F45 J34106NN21 915777+34- 10000000 292+805182 87F39N8L31 8838F1F37+ 3181F7JJ44 NF698N+042 7982J-230- 4-3569F605 F3488-J030 -2219FF741 80LJ39-J57 L94-L3N79+ -+59895367 9L2+6L+3L7 2385719LF4 52-7596J44