 Center for Advanced Computation

 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

 URBANA, ILLINOIS 61801
 
 CAC Document No. 56 
 
 AUGMENTED SIGNIFICANCE ROUTINES 
 FOR ILLIAC IV 
 
 By 
 R. J. Lermit and J. M. Randal 
 
 December 1972 
 
Digitized by the Internet Archive 
 
 in 2012 with funding from 
 
 University of Illinois Urbana-Champaign 
 
 http://archive.org/details/augmentedsignifi56lerm 
 
 
 This work was supported in part by the Advanced Research Projects Agency
 of the Department of Defense and was monitored by the U. S. Army Research
 Office-Durham under Contract No. DAHC04 72-C-0001; and under Contract No.
 USAF 30(602)-4144, Order No. 788, implemented initially by the Rome Air
 Development Center, Research and Technology Division, Air Force Systems
 Command, Griffiss Air Force Base, New York, and beginning January 17,
 1972, by the NASA Ames Research Center, Moffett Field, California.
 
 
 ABSTRACT 
 
 A set of macros to enable users to retain numerical significance 
 during critical phases of a calculation is presented along with the 
 philosophy behind their conception. The macros are designed for speed 
 and have an average accuracy of at least 92 binary places. 
 
TABLE OF CONTENTS

 1. INTRODUCTION

 2. REPRESENTATION

 3. USE

 4. FUNCTIONS

 5. ACCURACY

 6. SPACE AND TIMING

 7. CONCLUSION

 REFERENCES

 APPENDIX A

 APPENDIX B

 APPENDIX C

 APPENDIX D
 
1. INTRODUCTION 
 
 These augmented precision algorithms for ILLIAC IV are presented 
 with two aims in mind: 
 
 - To provide the general user with a method of keeping numerical
 significance (and therefore accuracy) even though he is dealing
 with numbers of very disparate size.

 - To record these augmented significance algorithms so that
 they may be modified by future ILLIAC IV users.

 Their function is thus different from the algorithms described by
 Yasui [1] in the sense that they do not maintain 96 binary places of
 accuracy, but only the accuracy desired by the user up to about 92
 places. As a consequence, they are faster and less bulky than full
 double precision routines, and use neither CU instructions nor bits of
 the RGD. The CU and mode bits are thus left free to control the main
 algorithm.
 
 Because these algorithms are directed toward maintaining signifi- 
 cance, routines are also provided for obtaining augmented numbers from 
 sums and products of single precision numbers. A routine is provided 
 for rounding augmented precision numbers and converting them to single 
 precision. 
 
 No facilities are provided for reading or printing augmented preci- 
 sion numbers. Indeed it is envisioned that users will use them only 
 temporarily in their programs to ensure accuracy during certain criti- 
 cal parts of a calculation, and convert augmented precision results to 
 single precision before reaching the end of their algorithm. 
 
 The initiated reader may observe that more elegant macros may be
 achieved by using "conditional compile" facilities, but since the
 paper is written for the general user, our aim is clarity at the
 expense of elegance whenever the latter would make the method less
 comprehensible.
 
2. REPRESENTATION 
 
 A row of augmented precision numbers is represented by two rows of
 ordinary ILLIAC IV 64-bit floating point numbers. One word ($a_h$), the
 more significant part, is not necessarily normalized, but its exponent
 is the true exponent for the whole augmented number. The other word
 ($a_l$) is the less significant or adjustment part of the number. Its
 exponent is usually about 48 less than the exponent of $a_h$; the mantissa
 of $a_l$ may be of different sign from that of $a_h$, but when the mantissa of $a_l$
 is aligned so that it is of the same sign as $a_h$ and its exponent is
 exactly 48 less than the exponent of $a_h$, then $(a_h, a_l) = a_h + a_l$
 represents a double precision number with the exponent of $a_h$.

 However, this "aligned" representation is rarely necessary, and is
 not explicitly maintained by the augmented precision algorithms. It
 is the relaxation of these rules of representation that gives these
 algorithms their speed and compactness with a small cost in accuracy.

 Throughout this document $a_h$ and $a_l$ are represented as AH and AL
 respectively, in program text, the terminal H or L representing the more
 or less significant part of the number respectively. Single precision
 numbers are represented by single letters without an H or L ending.

 A single precision number x is expressed as (x, 0) in augmented
 precision.

 An augmented precision number has a value within a neighborhood $\epsilon$
 of a single precision number if the result of normalizing the difference
 between the augmented and single precision number has a modulus less
 than $\epsilon$. A similar relationship holds between augmented precision
 numbers.
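
 The representation can be illustrated with a small Python sketch. It is only a model: IEEE doubles stand in for ILLIAC IV 64-bit words, and the helper names `from_single`, `value` and `within` are hypothetical and not part of the package.

```python
from fractions import Fraction

def from_single(x):
    """A single precision number x is expressed as the pair (x, 0)."""
    return (x, 0.0)

def value(aug):
    """Exact value a_h + a_l of an augmented pair, as a rational number."""
    hi, lo = aug
    return Fraction(hi) + Fraction(lo)

def within(aug, x, eps):
    """True if the pair lies within a neighborhood eps of the single number x."""
    return abs(value(aug) - Fraction(x)) < Fraction(eps)

# The adjustment part may differ in sign from the more significant part.
a = (1.0, -(2.0 ** -60))
```

 Here `a` represents a value just below 1.0 that no single word could hold; `within(a, 1.0, 2.0 ** -59)` is true.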
 
3. USE 
 
 All augmented arithmetic algorithms presented in this document are
 represented as defines which, except for a few cases, operate on an
 augmented precision accumulator represented by the labels ACC and AUXACC.
 ACC and AUXACC contain the more and less significant parts of the
 augmented result respectively. The accumulator is merely a device to reduce
 the length of parameter lists in the defines and, because the need for
 augmented precision numbers tends to arise from the need to accumulate sums
 and products of single precision numbers, it saves unnecessary access times
 as well.

 The parameters of the defines may be any valid PE address allowable
 in ASK [2] provided they do not conflict with the register requirements
 of the defines being invoked (see Appendix B).
 
 Augmented precision values may be created by: 
 
 1. Assigning ACC and AUXACC the augmented precision value directly:

     LIT(3) = 1.0;
     LDA $C3;
     STA ACC;
     CLAR;
     STA AUXACC;

 (or more directly:

     LIT(3) = 1.0;
     DLOAD $C3 =0; )

 will create the augmented precision number (1.0, 0) in the accumulator.
 
 2. Using the ILLIAC IV extended add (EAD) [3] or extended subtract
 (ESB) instructions,

     DEFINE AUX &A &B =
     LDA &A;
     EAD &B;
     STB ACC;
     STA AUXACC ##;

     DEFINE SUX &A &B =
     LDA &A;
     ESB &B;
     STB ACC;
     STA AUXACC ##;

 will form an augmented precision value in the accumulator from the sum
 or difference respectively of the invocation values of &A and &B.
 
 3. Using the MUX define to form the product of two single precision
 numbers. Because the ILLIAC IV multiply (ML) instruction does
 not produce a true augmented precision product, the define DML is
 provided to do so (see Appendix C). The instructions:

     LDA X;
     DML Y;

 leave the more significant part of the augmented precision product
 of X and Y in RGR and the less significant part in RGA. Thus:

     DEFINE MUX &A &B =
     LDA &A;
     DML &B;
     STR ACC;
     STA AUXACC ##;
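
 The effect of EAD and DML, a result delivered as a more significant word plus a less significant word with nothing lost, has a well known analogue in IEEE double arithmetic. The sketch below uses Knuth's two-sum and Dekker's product, which are substituted here for illustration and are not the ILLIAC IV hardware method; the names `two_sum`, `split` and `two_prod` are assumptions.

```python
from fractions import Fraction

def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def split(a):
    """Veltkamp split of a double into two halves that multiply exactly."""
    c = 134217729.0 * a          # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi

def two_prod(a, b):
    """Return (p, e) with p = fl(a * b) and p + e == a * b exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e

s, e = two_sum(1e16, 1.0)        # the 1.0 is too small to survive in s
p, q = two_prod(0.1, 0.3)
```

 In both cases the pair plays the role of (ACC, AUXACC): the first element is the rounded result and the second recovers what rounding discarded, provided no overflow or underflow occurs.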
 The defines DCLEAR, DLOAD and DSTORE (which respectively clear, load
 and store the augmented precision accumulator) are provided so that the
 user may work with an augmented ILLIAC IV which, in addition to the
 existing ILLIAC IV facilities, has a single augmented precision
 accumulator upon which all augmented precision functions act.
 
 This augmented precision accumulator destroys the symmetric pro- 
 perties of the augmented precision operations in the sense that it 
 alone is normalized during the augmented precision operations. It is 
 truly an accumulator of sums or products, and its use in this role 
 automatically increases the accuracy of the calculation (See Section 5). 
 To dispense with a unique accumulator would either increase the execution 
 times of the augmented arithmetic functions or reduce their accuracy. 
 
 Generally speaking, the results of augmented arithmetic calculation 
 are not normalized after the calculation because they will be normalized 
 during the next. However, they are normalized before being converted 
 back into single precision numbers. 
 
4. FUNCTIONS

 In addition to the basic functions described in the last section,
 the package provides the following facilities:

 4.1 Operations involving single precision numbers.

 DEFINE DOT &A &B
 This operation forms the double length product of &A and &B and
 adds it to the augmented precision accumulator. Its repeated application
 forms the scalar product of its successive arguments.

 DEFINE DPLUS &A
 This operation adds the single precision quantity &A to the
 augmented precision accumulator.

 DEFINE DMINUS &A
 This operation subtracts the single precision quantity &A from the
 augmented precision accumulator.

 DEFINE DTIMES &A
 This function multiplies the augmented precision accumulator by
 the single precision number &A. The result is left in the augmented
 precision accumulator.
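
 The behaviour of the accumulator under DPLUS, DMINUS and DOT can be modelled in a few lines of Python. This is a sketch under stated assumptions: IEEE doubles replace ILLIAC IV words, the exact rounding errors are obtained with rational arithmetic rather than with EAD and DML, and the class `Accumulator` with its method names is hypothetical.

```python
from fractions import Fraction

def exact_sum(a, b):
    """(s, e): rounded sum and its exact rounding error (EAD analogue)."""
    s = a + b
    return s, float(Fraction(a) + Fraction(b) - Fraction(s))

def exact_prod(a, b):
    """(p, e): rounded product and its exact rounding error (DML analogue)."""
    p = a * b
    return p, float(Fraction(a) * Fraction(b) - Fraction(p))

class Accumulator:
    """Hypothetical model of the ACC / AUXACC pair."""

    def __init__(self):
        self.acc, self.auxacc = 0.0, 0.0        # DCLEAR

    def dadd(self, hi, lo):                     # add the pair (hi, lo)
        self.acc, e = exact_sum(self.acc, hi)   # high parts, error kept in e
        self.auxacc = (e + lo) + self.auxacc    # low parts, ordinary rounded adds

    def dplus(self, a):                         # add a single number
        self.dadd(a, 0.0)

    def dminus(self, a):                        # subtract a single number
        self.dadd(-a, 0.0)

    def dot(self, a, b):                        # add the double length product
        self.dadd(*exact_prod(a, b))

    def single(self):                           # round back to one word
        return self.acc + self.auxacc

acc = Accumulator()
for x, y in zip([1e16, 1.0, -1e16], [1.0, 1.0, 1.0]):
    acc.dot(x, y)
naive = (1e16 * 1.0 + 1.0 * 1.0) + (-1e16 * 1.0)
```

 With these operands the naive single precision sum loses the middle term entirely, while the accumulator model returns it intact.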
 
 4.2 Operations on the augmented precision accumulator.

 DEFINE DNEG
 This operation negates the augmented precision accumulator.

 DEFINE DRECIP
 This operation replaces the contents of the augmented precision
 accumulator by its reciprocal. The method used includes finding the
 reciprocal of the more significant part of the accumulator and refining
 this with one application of Newton's method for reciprocation. No
 other facilities for augmented precision division are provided in the
 package.

 DEFINE DNORM
 This operation normalizes the augmented precision accumulator by
 normalizing its more significant half and adding its less significant
 half to the result. This process is repeated once in case the more
 significant part was originally zero and the less significant part was
 unnormalized.

 DEFINE SINGLE
 This operation normalizes the augmented precision accumulator and
 then rounds it by adding a quantity of the correct sign and exponent.
 The single precision result is left in RGA.
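
 The refinement step described for DRECIP is the usual Newton iteration for a reciprocal, which replaces an estimate y of 1/a by 2y - a y^2. The sketch below uses exact rational arithmetic to show why a single step suffices to carry a single precision estimate to roughly twice as many correct places; the function name `newton_reciprocal_step` is an assumption and the code does not reproduce the instruction sequence of the define.

```python
from fractions import Fraction

def newton_reciprocal_step(a, y):
    """One Newton step for the reciprocal of a: returns 2*y - a*y*y."""
    return 2 * y - a * y * y

a = Fraction(3)
y0 = Fraction(333, 1000)            # rough reciprocal, playing the role of 1/a_h
y1 = newton_reciprocal_step(a, y0)
err0 = 1 - a * y0                   # relative error before the step
err1 = 1 - a * y1                   # relative error after the step
```

 The relative error is squared by the step, so an estimate good to about 48 binary places becomes good to about 96 before rounding effects are counted.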
 4.3 Operations involving double precision numbers.

 DEFINE DADD &AH &AL
 The augmented precision number (&AH, &AL) is added to the augmented
 precision accumulator.

 DEFINE DSUB &AH &AL
 The augmented precision number (&AH, &AL) is subtracted from the
 augmented precision accumulator.

 DEFINE DMULT &AH &AL
 The augmented precision accumulator is multiplied by the augmented
 precision number (&AH, &AL). The method used is equivalent to the following
 algorithm, except that only the more significant part of MUX &AL YL
 is generated and used.

     DEFINE DMULT &AH &AL =
     DSTORE YH YL;
     MUX &AL YL;
     DOT &AL YH;
     DOT &AH YL;
     DOT &AH YH ##;
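
 The algorithm above accumulates the four partial products of (&AH, &AL) and (YH, YL), starting with the smallest. A minimal Python check of that decomposition follows; the function name `dmult_terms` and the sample operands are assumptions chosen for illustration.

```python
from fractions import Fraction

def dmult_terms(ah, al, yh, yl):
    """The four partial products, in the order DMULT accumulates them."""
    return [Fraction(al) * Fraction(yl),   # MUX &AL YL (smallest term)
            Fraction(al) * Fraction(yh),   # DOT &AL YH
            Fraction(ah) * Fraction(yl),   # DOT &AH YL
            Fraction(ah) * Fraction(yh)]   # DOT &AH YH (largest term)

ah, al = 3.0, 2.0 ** -50
yh, yl = 5.0, -(2.0 ** -49)
terms = dmult_terms(ah, al, yh, yl)
exact = (Fraction(ah) + Fraction(al)) * (Fraction(yh) + Fraction(yl))
```

 Adding the terms in increasing order of magnitude lets the small contributions combine before they meet the dominant product, which is why only the more significant part of the first term needs to be generated.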
 
5. ACCURACY 
 
 Since the extended precision hardware on ILLIAC IV differs from that
 provided on most machines, it is necessary to consider how this affects
 the accuracy of the result obtained.
 
 Analysis will be confined to the accumulation of the inner product
 of two vectors, $\sum_{i=1}^{n} a_i b_i$. This is, by far, the most common use of double
 precision arithmetic.

 It is assumed that the calculations are being performed serially;
 i.e., partial sums $S_i = \sum_{j=1}^{i} a_j b_j$ are being calculated from
 $S_i = S_{i-1} + a_i b_i$.

 Normally, 64 such inner products would be accumulated simultaneously.
 However, if summation is being carried out across the PEs by a method
 such as the "log sum" [6] technique, the bound on the error is even
 smaller (see Linz [5]). The method analyzed is, therefore, "worst case".
 
 Consider first the accumulation of inner products on a standard 
 computer using a word with t bits mantissa and accumulating products in 
 a double length register of 2t bits. The final sum is then rounded to 
 single precision. 
 
 Let fl(E) represent the evaluation in floating point arithmetic of
 the expression E. If $S_i$ is the $i$th partial sum, then

 $$S_0 = 0,$$

 $$S_i = fl(S_{i-1} + fl(a_i \times b_i)), \qquad i = 1, 2, \ldots, n.$$

 The product of the single precision operands $a_i$, $b_i$ is first formed,
 giving a double length product; this calculation is exact. Adding this
 product to the partial sum $S_{i-1}$, using double length arithmetic, results
 in a rounding error:

 $$S_i = fl(S_{i-1} + fl(a_i \times b_i))$$

 $$= fl(S_{i-1} + a_i b_i)$$

 $$= S_{i-1}(1 + \epsilon_i) + a_i b_i (1 + \delta_i).$$
 
 It may be shown that

 $$|\epsilon_i|,\ |\delta_i| \le \tfrac{3}{2}\, 2^{-2t}$$

 [see for instance Wilkinson [4]], and hence:

 $$fl(a^T b) = fl\Big(\sum_{i=1}^{n} a_i b_i\Big) = S_n$$

 $$= \sum_{i=1}^{n} a_i b_i (1 + \delta_i)(1 + \epsilon_{i+1})(1 + \epsilon_{i+2}) \cdots (1 + \epsilon_n)$$

 $$= \sum_{i=1}^{n} a_i b_i (1 + \gamma_i),$$

 where:

 $$(1 - \tfrac{3}{2}\, 2^{-2t})^{n-i+1} \le 1 + \gamma_i \le (1 + \tfrac{3}{2}\, 2^{-2t})^{n-i+1}.$$

 Since the factor $(1 \pm \tfrac{3}{2}\, 2^{-2t})^{n-i+1}$ is inconvenient, we make the
 assumption that:

 $$\tfrac{3}{2}\, n\, 2^{-2t} < 0.1.$$

 This will be true for any value of n found in practice since $2^{-2t}$ is very
 small. The inequalities for $\gamma_i$ may then be simplified to:

 $$|\gamma_i| < \tfrac{3}{2} (1.06)(n - i + 1)\, 2^{-2t}$$

 $$\le \tfrac{3}{2} (1.06)\, n\, 2^{-2t}.$$
 
 Denoting by $fl_{2,1}(a^T b)$ the result of rounding $fl(a^T b)$ to single
 precision, a bound on the absolute error may be found from:

 $$|fl_{2,1}(a^T b) - a^T b| \le |fl_{2,1}(a^T b) - fl(a^T b)| + |fl(a^T b) - a^T b|$$

 $$\le |fl(a^T b)|\, 2^{-t} + \sum_{i=1}^{n} |a_i b_i \gamma_i|$$

 $$= \Big|\sum_{i=1}^{n} a_i b_i (1 + \gamma_i)\Big|\, 2^{-t} + \sum_{i=1}^{n} |a_i b_i \gamma_i|$$

 $$\le \Big|\sum_{i=1}^{n} a_i b_i\Big|\, 2^{-t} + (1 + 2^{-t}) \sum_{i=1}^{n} |a_i|\,|b_i|\,|\gamma_i|$$

 $$\le 2^{-t} |a^T b| + \tfrac{3}{2} (1.1)\, n\, 2^{-2t} \sum_{i=1}^{n} |a_i|\,|b_i|.$$

 Therefore, using the Schwarz inequality, the absolute error is bounded
 by:

 $$2^{-t} |a^T b| + \tfrac{3}{2} (1.1)\, n\, 2^{-2t}\, \|a\|_2 \|b\|_2.$$
 
 Note that this does not necessarily give a small relative error; however, 
 unless severe cancellation takes place, the second term is negligible 
 compared with the first. 
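
 The constant 1.06 in the simplified bound rests on the fact that, when n x < 0.1, the growth (1 + x)^n - 1 stays below 1.06 n x. A short numerical check in Python is given here as an illustration; the function name `growth_bound_holds` and the sample values of n are assumptions.

```python
import math

def growth_bound_holds(n, x):
    """Check (1 + x)**n - 1 < 1.06 * n * x under the assumption n * x < 0.1."""
    if not n * x < 0.1:
        raise ValueError("the analysis assumes n * x < 0.1")
    growth = math.expm1(n * math.log1p(x))   # (1 + x)**n - 1, computed stably
    return growth < 1.06 * n * x

t = 48
x = 1.5 * 2.0 ** (-2 * t)                    # the factor (3/2) 2^(-2t)
ok_practical = growth_bound_holds(10 ** 6, x)
ok_near_limit = growth_bound_holds(1000, 0.0999 / 1000)
```

 The first call uses the value of x that arises for t = 48; the second sits just inside the assumed limit, where the margin is smallest.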
 
 In performing the analogous calculations using ILLIAC IV hardware, 
 the following steps are performed: 
 
 5.1 At the $i$th step, $a_i$ and $b_i$ are multiplied giving a double length
 product. No round off error is produced.

 $$(p_i^h, p_i^l) = fl(a_i \times b_i) = a_i b_i$$

 It is assumed that the $a_i$ and $b_i$ are normalized. This is essential for
 the error analysis. [Intuitively we want to push as much of the product
 into the high precision part as possible.]

 5.2 The low precision parts of $p_i$ and $S_{i-1}$ are added using an ordinary
 rounded and normalized add. Call the result $u_i$:

 $$u_i = fl(p_i^l + S_{i-1}^l) = p_i^l (1 + \epsilon_{i1}) + S_{i-1}^l (1 + \epsilon_{i2}).$$

 5.3 $S_{i-1}^h$ is added to $p_i^h$, using the EAD instruction, forming $S_i^h$ with
 the overflow going into $v_i$. Normalizing $S_{i-1}^h$ controls the error.
 Intuitively we put as much of the sum in the high order word as possible.
 No round off error is involved in the EAD operation; in fact:

 $$S_i^h + v_i = S_{i-1}^h + p_i^h.$$

 And hence,

 $$S_i^h = \sum_{j=1}^{i} p_j^h - \sum_{j=1}^{i} v_j.$$

 $S_i^l$ is formed from $u_i$ and $v_i$ using a rounded and normalized add; i.e.,

 $$S_i^l = fl(u_i + v_i) = u_i (1 + \epsilon_{i3}) + v_i (1 + \epsilon_{i4})$$

 where $|\epsilon_{i1}|, |\epsilon_{i2}|, |\epsilon_{i3}|, |\epsilon_{i4}| \le 2^{-48}$.
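
 Steps 5.1 to 5.3 can be followed in a Python model. The sketch assumes IEEE doubles (53-bit mantissa) in place of ILLIAC IV words, obtains the exact low parts with rational arithmetic instead of DML and EAD, and uses hypothetical names (`err_free`, `inner_product_pair`); the test vectors are arbitrary.

```python
from fractions import Fraction

def err_free(exact, rounded):
    """Exact rounding error of one double operation, returned as a double."""
    return float(exact - Fraction(rounded))

def inner_product_pair(a, b):
    """Accumulate sum(a_i * b_i) as a (high, low) pair following 5.1 to 5.3."""
    s_hi, s_lo = 0.0, 0.0
    for ai, bi in zip(a, b):
        p_hi = ai * bi                                    # 5.1: product, high part
        p_lo = err_free(Fraction(ai) * Fraction(bi), p_hi)  # 5.1: low part
        u = p_lo + s_lo                                   # 5.2: rounded add of low parts
        new_hi = s_hi + p_hi                              # 5.3: extended add, high part
        v = err_free(Fraction(s_hi) + Fraction(p_hi), new_hi)  # 5.3: overflow v
        s_hi, s_lo = new_hi, u + v                        # low part is fl(u + v)
    return s_hi, s_lo

a = [1.0 / (i + 1) for i in range(50)]
b = [(-1.0) ** i * (i + 1) / 7.0 for i in range(50)]
hi, lo = inner_product_pair(a, b)
exact = sum(Fraction(x) * Fraction(y) for x, y in zip(a, b))
error = abs(Fraction(hi) + Fraction(lo) - exact)
```

 Only the two adds on the low parts introduce rounding error, which is the reason the error of the pair is of second order in the single precision unit.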
 
 Using the above equations, we have

 $$S_i = S_{i-1} + a_i b_i + E_i$$

 where

 $$E_i = [p_i^l (1 + \epsilon_{i1}) + S_{i-1}^l (1 + \epsilon_{i2})](1 + \epsilon_{i3}) + v_i (1 + \epsilon_{i4}) - [p_i^l + S_{i-1}^l + v_i]$$

 $$= p_i^l [(1 + \epsilon_{i1})(1 + \epsilon_{i3}) - 1] + v_i \epsilon_{i4} + S_{i-1}^l [(1 + \epsilon_{i2})(1 + \epsilon_{i3}) - 1].$$

 Bounds must be obtained for $|p_i^l|$, $|v_i|$ and $|S_{i-1}^l|$.

 $$|p_i^l| \le 2^{-46} |p_i^h| \le 2^{-46} |a_i b_i|.$$
 
 This depends on the fact that $a_i$ and $b_i$ are normalized. The mantissa
 part of $p_i^h$ is at least $\tfrac{1}{4}$ since that of both $a_i$ and $b_i$ is at least $\tfrac{1}{2}$.
 The bound then follows because the mantissa part of $p_i^l$ is not greater than 1
 and the exponents of $p_i^h$ and $p_i^l$ differ by exactly 48. Since it is the low
 precision part of an extended add of $S_{i-1}^h$ and $p_i^h$, a bound for $v_i$ is

 $$|v_i| \le 2^{-46} \max\{|S_{i-1}^h|,\ |p_i^h|\}.$$

 This follows since the exponent of $v_i$ is at least 48 less than the larger
 of $S_{i-1}^h$ and $p_i^h$, and the mantissa part of $p_i^h$ is at least $\tfrac{1}{4}$ and that of
 $S_{i-1}^h$ is at least $\tfrac{1}{2}$ (since it is normalized). Thus,

 $$|v_i| \le 2^{-46} \max\Big\{\Big|\sum_{j=1}^{i-1} p_j^h - \sum_{j=1}^{i-1} v_j\Big|,\ |p_i^h|\Big\}$$

 $$\le 2^{-46} \Big\{\sum_{j=1}^{n} |p_j^h| + \sum_{j=1}^{n} |v_j|\Big\}$$

 and

 $$\sum_{i=1}^{n} |v_i| \le 2^{-46}\, n \Big\{\sum_{j=1}^{n} |a_j b_j| + \sum_{j=1}^{n} |v_j|\Big\},$$

 $$\sum_{i=1}^{n} |v_i| \{1 - 2^{-46} n\} \le 2^{-46}\, n\, \|a\|_2 \|b\|_2.$$

 Hence, for any reasonable n,

 $$\sum_{i=1}^{n} |v_i| \le 1.1 \times 2^{-46}\, n\, \|a\|_2 \|b\|_2.$$
 
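 It only remains to find a bound for $S_i^l$.

 $$S_i^l = u_i (1 + \epsilon_{i3}) + v_i (1 + \epsilon_{i4})$$

 $$= p_i^l (1 + \epsilon_{i1})(1 + \epsilon_{i3}) + v_i (1 + \epsilon_{i4}) + S_{i-1}^l (1 + \epsilon_{i2})(1 + \epsilon_{i3})$$

 $$= \sum_{j=1}^{i} p_j^l (1 + \epsilon_{j1})(1 + \epsilon_{j3})(1 + \epsilon_{j+1,2})(1 + \epsilon_{j+1,3}) \cdots (1 + \epsilon_{i2})(1 + \epsilon_{i3})$$

 $$\quad + \sum_{j=1}^{i} v_j (1 + \epsilon_{j4})(1 + \epsilon_{j+1,2})(1 + \epsilon_{j+1,3}) \cdots (1 + \epsilon_{i2})(1 + \epsilon_{i3})$$

 $$= \sum_{j=1}^{i} p_j^l (1 + \gamma_j) + \sum_{j=1}^{i} v_j (1 + \delta_j)$$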
 
 where, as before, it may be shown that

 $$|\gamma_j|,\ |\delta_j| \le \tfrac{3}{2} (1.06)(2n)\, 2^{-48} = A.$$

 However, there may be up to 2n factors in each term and the work is being
 done in single precision. Therefore,

 $$S_i^l = \sum_{j=1}^{i} p_j^l + \sum_{j=1}^{i} v_j + \sum_{j=1}^{i} p_j^l \gamma_j + \sum_{j=1}^{i} v_j \delta_j.$$

 But since

 $$S_i^h = \sum_{j=1}^{i} p_j^h - \sum_{j=1}^{i} v_j,$$

 then

 $$S_i = S_i^h + S_i^l$$

 $$= \sum_{j=1}^{i} p_j^h + \sum_{j=1}^{i} p_j^l + \sum_{j=1}^{i} p_j^l \gamma_j + \sum_{j=1}^{i} v_j \delta_j$$

 $$= \sum_{j=1}^{i} a_j b_j + \sum_{j=1}^{i} p_j^l \gamma_j + \sum_{j=1}^{i} v_j \delta_j.$$

 Putting i = n,

 $$fl(a^T b) = S_n = a^T b + \sum_{i=1}^{n} p_i^l \gamma_i + \sum_{i=1}^{n} v_i \delta_i.$$
 
 The absolute error is, thus:

 $$\Big|\sum_{i=1}^{n} p_i^l \gamma_i + \sum_{i=1}^{n} v_i \delta_i\Big| \le \sum_{i=1}^{n} |p_i^l|\, A + \sum_{i=1}^{n} |v_i|\, A$$

 $$\le 2^{-46} \sum_{i=1}^{n} |a_i b_i|\, A + (1.1 \times 2^{-46}\, n\, \|a\|_2 \|b\|_2)\, A$$

 $$\le 1.1 \times 2^{-46} A\, (n + 1)\, \|a\|_2 \|b\|_2$$

 $$\le 1.1 \times 1.06 \times 3 \times 2^{-46} \times 2^{-48}\, n(n + 1)\, \|a\|_2 \|b\|_2$$

 $$< 2^{-92}\, n(n + 1)\, \|a\|_2 \|b\|_2.$$

 Rounding the result to single precision, as before, and calling the
 result $fl_{2,1}(a^T b)$, the following bound is obtained:

 $$|fl_{2,1}(a^T b) - a^T b| \le 2^{-48} |a^T b| + 2^{-92}\, n(n + 1)\, \|a\|_2 \|b\|_2.$$

 Putting a mantissa size t = 48 into the error bound for a standard
 machine gives, by comparison:

 $$2^{-48} |a^T b| + 1.1\, (\tfrac{3}{2} n)\, 2^{-96}\, \|a\|_2 \|b\|_2.$$
 
 The essential difference between these two results is that the 
 analysis for ILLIAC IV produces a factor of n(n + 1 ) in the second term 
 rather than a factor of n. Even for large n, however, this term should 
 be negligible unless severe cancellation takes place. ILLIAC IV 
 benefits further from a large mantissa size. The method used is 
 therefore fully justified. 
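
 The two bounds may be compared numerically. The Python sketch below evaluates both for t = 48; the function names and the choice of unit norms and n = 1000 are assumptions made only for illustration.

```python
def standard_bound(n, atb, norm_a, norm_b, t=48):
    """2^-t |a^T b| + (3/2)(1.1) n 2^-2t ||a|| ||b|| for a standard machine."""
    return (2.0 ** -t * abs(atb)
            + 1.5 * 1.1 * n * 2.0 ** (-2 * t) * norm_a * norm_b)

def illiac_bound(n, atb, norm_a, norm_b):
    """2^-48 |a^T b| + 2^-92 n (n + 1) ||a|| ||b|| for the augmented routines."""
    return (2.0 ** -48 * abs(atb)
            + 2.0 ** -92 * n * (n + 1) * norm_a * norm_b)

n = 1000
std = standard_bound(n, 1.0, 1.0, 1.0)
ill = illiac_bound(n, 1.0, 1.0, 1.0)
second_term = ill - 2.0 ** -48      # contribution of the n(n + 1) term
```

 For these values the second term of the ILLIAC IV bound is larger than that of the standard machine, yet it remains several orders of magnitude below the single precision rounding term that both bounds share.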
 
 6. SPACE AND TIMING

 Instruction times quoted in the ILLIAC IV Systems Characteristics
 and Programming Manual [3] have been used in estimating the execution
 times of the appropriate functions. No FINST/PE overlap has been
 assumed. An access time of one clock has been added for RGS and 7
 clocks for memory. Each define parameter is assumed to imply a memory
 access except where actual parameters in the package indicate otherwise.
 Space and execution times are presented in the following table:
 
 FUNCTION     SPACE          TIME
              (syllables)    (PE clocks)

 DADD            7              72
 DCLEAR          3              17
 DLOAD           4              32
 DMINUS          6              52
 DML             8              26
 DMULT          60             379
 DNEG            6              36
 DNORM           9              63
 DOT            17             102
 DPLUS           6              52
 DRECIP         84             547
 DSTORE          4              32
 DSUB            7              72
 DTIMES         29             161
 MUX            11              50
 SINGLE         18              98
 
 Table 6.1 Space and Execution Time 
 
 Users wishing to convert any of these defines into subroutines 
 should consult Appendix A. 
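
 The space figures are consistent with the way the defines expand into one another. The following Python check is a sketch under two assumptions that are not stated in the table itself: that each plain instruction occupies one syllable, and that two syllables make one word. The instruction counts are read from the define bodies given in the appendices; the names `SPACE` and `composed` are hypothetical.

```python
# Space in syllables, as listed in Table 6.1.
SPACE = {"DADD": 7, "DML": 8, "DSTORE": 4, "DOT": 17,
         "DTIMES": 29, "DMULT": 60, "MUX": 11, "DRECIP": 84}

def composed(plain_instructions, *invoked):
    """Space of a define: its own instructions plus each define it expands."""
    return plain_instructions + sum(SPACE[name] for name in invoked)

mux = composed(3, "DML")                            # LDA, DML, STR, STA
dot = composed(2, "DML", "DADD")                    # LDA, DML, LDS, DADD
dtimes = composed(6, "DML", "DML", "DADD")          # six plain instructions
dmult = composed(5, "DSTORE", "DOT", "DOT", "DOT")  # five plain instructions
words = {name: SPACE[name] // 2 for name in ("DMULT", "DRECIP")}
```

 The word counts obtained this way, 30 for DMULT and 42 for DRECIP, are the figures relevant when deciding which defines are large enough to be worth converting into subroutines.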
 
 7. CONCLUSION 
 
 These augmented precision routines are designed to help the user 
 retain numerical significance during certain critical parts of a 
 calculation rather than to provide double precision routines per se . 
 The modest sacrifice of accuracy, an economical number representation, 
 and a certain symmetry in otherwise associative operations are, the 
 authors felt, more than amply repaid by increased execution speed. 
 
 It remains to be seen whether the general user agrees with this 
 thesis. 
 
 REFERENCES

 [1] Yasui, T., Double Precision Algorithms for ILLIAC IV,

 [2] ILLIAC IV Assembler, ILLIAC IV Software Reference Manual, Vol. 2,
 Chapter 1.

 [3] ILLIAC IV Programming and Characteristics Manual.

 [4] Wilkinson, J. H., The Algebraic Eigenvalue Problem, Oxford, 1965.

 [5] Linz, Peter, Accurate Floating-Point Summation, Comm. ACM 13, 6
 (June 1970), pp. 361-362.

 [6] Denenberg, S., An Introduction to the ILLIAC IV System.
 
 APPENDIX A. Defines as Subroutines

 Two defines, DMULT (30 words) and DRECIP (42 words), occupy enough
 space and execute long enough to be made subroutines (if they are to be
 invoked more than once) without appreciably degrading their performance.
 For the purposes of this conversion, the augmented precision functions
 fall into two classes.

 A.1 Defines Without Parameters

 Because the augmented precision accumulator is the only operand, the
 conversion is easy and the subroutine becomes:

     RECIPS::DRECIP;
     EXCHL(3) $ICR;

 and may be called by invoking the standard CALL define:

     DEFINE CALL &NAME =
     CLC(3);
     SLIT(3) = &NAME;
     EXCHL(3) $ICR ##;

 thus:

     CALL RECIPS;

 This complies with subroutine standards.
 A.2 Defines With Parameters

 There are two useful methods.

 A.2.1 The user may declare row variables for use when passing parameters
 to the subroutine. If XH and XL are user declared variables, then the
 subroutine may look like the following:

     MULTS::DMULT XH XL;
     EXCHL(3) $ICR;

 (where XH, XL have been declared XH:BLK 1; XL:BLK 1;) and the calling
 sequence would become

     LDA ARGH;
     STA XH;
     LDA ARGL;
     STA XL;
     CALL MULTS;

 That is, the subroutine has been made parameterless.

 A.2.2 A slightly faster method is to use ACAR0 and ACAR1 to "point"
 to the correct arguments. The calling sequence would then be:

     CLC(0);
     SLIT(0) = ARGH;
     CLC(1);
     SLIT(1) = ARGL;
     CALL MULTSS;

 where the subroutine is now:

     MULTSS::DMULT 0(0) 0(1);
     EXCHL(3) $ICR;

 A little more elegance may then be achieved with:

     DEFINE EXECUTE &NAME &AH &AL =
     CLC(0);
     SLIT(0) = &AH;
     CLC(1);
     SLIT(1) = &AL;
     CALL &NAME ##;

 Both methods comply with standard subroutine conventions.
 
 APPENDIX B. Define Dependencies

 Some defines invoke others which, in turn, invoke defines. The list
 below illustrates these dependencies. Invoked defines are followed by
 a list in parentheses of the defines which they invoke.

     DMULT     DSTORE, DOT (DML, DADD)
     DOT       DML, DADD
     DRECIP    DTIMES (DML), DMINUS, DNEG
     DTIMES    DML
     MUX       DML
     SINGLE    DNORM
 
 Two defines use memory locations which must be declared by the 
 user. The memory locations are: 
 
 DTEMPH: BLK 1; 
 
 DTEMPL: BLK 1; 
 The defines using those memory locations are: 
 
 DMULT DTEMPH, DTEMPL 
 
 DRECIP DTEMPH 
 
 If the user has rows X and Y say, which are available for use by DMULT 
 or DRECIP, the define: 
 
 DEFINE DTEMPH = X ##, DTEMPL = Y ##; 
 declared before DMULT or DRECIP is invoked will cause them to use X 
 and Y in place of DTEMPH and DTEMPL. 
 
 P.E. registers RGR and RGS are used by the following defines:

     DMULT     RGR, RGS
     DNORM     RGR
     DOT       RGR, RGS
     DRECIP    RGR, RGS
     DTIMES    RGR, RGS
     MUX       RGR, RGS
     SINGLE    RGR

 All defines use RGA and RGB. None use RGX.
 
 APPENDIX C.

 The annotated bodies of the augmented precision defines appear below.
 Their use supposes that exponent underflow does not cause the F-bit to
 be set.
 
 LI STC J i I'm J 0/01/ 7 I fiTl 1 3 : n 
 
 DEFjmE ! i"L R« = 
 
 ML *HJ 
 
 ASH: 
 
 LnR =, A ; 
 
 Lns hi i 
 
 Li)A =3Fno: ] f j 
 
 S -I A i 4 {* J 
 
 A n M f, r , ; 
 
 m n L / S R » * ; 
 
     DEFINE DTIMES &A =
     LDA &A;
     DML AUXACC;
     LDS ACC;
     STR ACC;
     STA AUXACC;
     LDA &A;
     DML $S;
     LDS $A;
     DADD $R $S ##;

     DEFINE DPLUS &A =
     LDA ACC;
     NORM;
     EAD &A;
     STB ACC;
     ADR AUXACC;
     STA AUXACC ##;

     DEFINE DMINUS &A =
     LDA ACC;
     NORM;
     ESB &A;
     STB ACC;
     ADR AUXACC;
     STA AUXACC ##;

     DEFINE DNORM =
     LDA ACC;
     NORM;
     EAD AUXACC;
     LDR $A;
     LDA $B;
     NORM;
     EAD $R;
     STB ACC;
     STA AUXACC ##;
 
     DEFINE DCLEAR =
     CLAR;
     STA ACC;
     STA AUXACC ##;

     DEFINE DLOAD &AH &AL =
     LDA &AH;
     STA ACC;
     LDA &AL;
     STA AUXACC ##;

     DEFINE DSTORE &AH &AL =
     LDA ACC;
     STA &AH;
     LDA AUXACC;
     STA &AL ##;

     DEFINE DADD &AH &AL =
     LDA ACC;
     NORM;
     EAD &AH;
     STB ACC;
     ADR &AL;
     ADR AUXACC;
     STA AUXACC ##;

     DEFINE DSUB &AH &AL =
     LDA ACC;
     NORM;
     ESB &AH;
     STB ACC;
     SBR &AL;
     ADR AUXACC;
     STA AUXACC ##;

     DEFINE MUX &A &B =
     LDA &A;
     DML &B;
     STR ACC;
     STA AUXACC ##;

     DEFINE DOT &A &B =
     LDA &A;
     DML &B;
     LDS $A;
     DADD $R $S ##;

     DEFINE DMULT &AH &AL =
     DSTORE DTEMPH DTEMPL;
     LDA &AL;
     MLRN DTEMPL;
     STA ACC;
     LDB =0;
     STB AUXACC;
     DOT &AH DTEMPL;
     DOT &AL DTEMPH;
     DOT &AH DTEMPH ##;

     DEFINE DNEG =
     LDA ACC;
     CHSA;
     STA ACC;
     LDA AUXACC;
     CHSA;
     STA AUXACC ##;
 
 F F T J F ) -i r . C T : ' 
 
 L o a a •: c ; 
 
 L'ii< « A j 
 
 L i A = 1 o h : i > : 
 
     STA DTEMPH;
     DTIMES DTEMPH;
     DTIMES DTEMPH;
     DMINUS DTEMPH;
     DMINUS DTEMPH;
     DNEG ##;
 
 OEFiNT SINGLF 
 
 ONUR M 
 
 Lr)A =7F41 : 16: 
 
 ShAl '^ : 
 i.dh aa; 
 L A A c c ; 
 
 AsBt 
 S W A P J 
 
 AHE.X *3J 
 EAO ACC! 
 S* AP **J 
 
 APPENDIX D. The Use of Conditional Definition.

 The reader will notice that the defines DADD and DPLUS differ by
 one parameter and one ASK instruction. By using the conditional
 assembly features of ASK [2], DPLUS and DADD may be combined, as can
 DMINUS and DSUB:
 
     DEFINE DADD &AH &AL =
     LDA ACC;
     NORM;
     EAD &AH;
     STB ACC;
     &IF &EMPTY(&AL) &THEN       % IF SECOND PARAMETER GIVEN
     &ELSE ADR &AL; &FI;         % THEN USE IT
     ADR AUXACC;
     STA AUXACC ##;

     DEFINE DSUB &AH &AL =
     LDA ACC;
     NORM;
     ESB &AH;
     STB ACC;
     &IF &EMPTY(&AL) &THEN       % IF SECOND PARAMETER GIVEN
     &ELSE SBR &AL; &FI;         % THEN USE IT
     ADR AUXACC;
     STA AUXACC ##;
 The invocation: 
 
 DADD A; 
 is now identical with the invocation 
 
 DPLUS A; 
 while the call 
 
 DADD A, B; 
 retains its original meaning. 
 
 The provision of defines with conditional bodies thus makes the 
 macro package easier to use in the sense that fewer define names need 
 to be learned and understood. 
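
 The effect of the conditional body can be mimicked in Python by generating the instruction text for each form of invocation. This is an illustration only: the functions `dadd` and `dplus` are hypothetical stand-ins for the ASK macro expander, and an omitted second argument plays the part of an empty &AL.

```python
def dadd(ah, al=None):
    """Expand the combined DADD / DPLUS define into ASK instruction text."""
    body = ["LDA ACC;", "NORM;", f"EAD {ah};", "STB ACC;"]
    if al is not None:                 # second parameter given: use it
        body.append(f"ADR {al};")
    body += ["ADR AUXACC;", "STA AUXACC;"]
    return body

def dplus(a):
    """The original one-parameter define, written out for comparison."""
    return ["LDA ACC;", "NORM;", f"EAD {a};", "STB ACC;",
            "ADR AUXACC;", "STA AUXACC;"]

one_parameter = dadd("A")
two_parameters = dadd("A", "B")
```

 The one-parameter expansion is instruction for instruction the body of DPLUS, so the combined define costs nothing at execution time.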
 
 It is worth mentioning, however, that combining DTIMES and DMULT 
 is not as neat as the above examples: 
 
     DEFINE DMULT &AH &AL =
     &IF &EMPTY(&AL) &THEN       % IF SECOND PARAMETER ABSENT
                                 % THEN ISSUE CODE FOR DTIMES
     LDA &AH;
     DML AUXACC;
     LDS ACC;
     STR ACC;
     STA AUXACC;
     LDA &AH;
     DML $S;
     LDS $A;
     DADD $R $S
     &ELSE                       % OTHERWISE
     DSTORE DTEMPH DTEMPL;       % ISSUE CODE FOR DMULT
     LDA &AL;
     MLRN DTEMPL;
     STA ACC;
     LDB =0;
     STB AUXACC;
     DOT &AH DTEMPL;
     DOT &AL DTEMPH;
     DOT &AH DTEMPH &FI ##;

 However, the notational economy makes the use of conditional compilation
 worthwhile.
 
 Conditional compilation may also be used to increase the scope of
 the define. For instance, when accumulating sums of positive numbers,
 one knows the more significant word of the augmented precision
 accumulator is non-zero, and thus some of the instructions in DNORM are
 unnecessary. A version of DNORM that may be used to normalize only as
 far as the more significant half of the augmented precision accumulator
 or to completely normalize the accumulator might be written as follows:

     DEFINE DNORM &N =
     LDA ACC;
     NORM;
     EAD AUXACC;
     &IF &N &THEN                % IF &N IS ODD, NORMALIZE TO ACC ONLY
     &ELSE
     LDR $A;                     % OTHERWISE NORMALIZE WHOLE ACCUMULATOR
     LDA $B;
     NORM;
     EAD $R; &FI;
     STB ACC;
     STA AUXACC ##;
 Thus, the invocation 
 
 DNORM 1; 
 
 normalizes only as far as the more significant half of the augmented 
 precision accumulator, while 
 
 DNORM 2; 
 normalizes the whole augmented precision accumulator. 
 
 Use of conditional compilation facilities may thus enhance the 
 efficiency of the compiled program without the notational disadvantage 
 of having to provide a myriad of specific subroutines or macros for 
 every function variant. 
 
 The facilities presented here are mainly illustrative and are not 
 present on the standard library tape. The user, considering the 
 augmented precision macros in the light of his own particular applica- 
 tion, will no doubt devise suitable local enhancements. 
 
UNCLASSIFIED
 Security Classification

 DOCUMENT CONTROL DATA - R & D

 (Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified)

 1. ORIGINATING ACTIVITY (Corporate author)
    Center for Advanced Computation
    University of Illinois at Urbana-Champaign
    Urbana, Illinois 61801

 2a. REPORT SECURITY CLASSIFICATION
    UNCLASSIFIED

 2b. GROUP

 3. REPORT TITLE
    Augmented Significance Routines for ILLIAC IV

 4. DESCRIPTIVE NOTES (Type of report and inclusive dates)
    Research Report

 5. AUTHOR(S) (First name, middle initial, last name)
    J. M. Randal and R. J. Lermit

 6. REPORT DATE
    December 1972

 7a. TOTAL NO. OF PAGES

 7b. NO. OF REFS

 8a. CONTRACT OR GRANT NO.
    DAHC04 72-C-0001 and USAF 30(602)-4144

 b. PROJECT NO.
    ARPA Order No. 1899 and No. 788

 9a. ORIGINATOR'S REPORT NUMBER(S)
    CAC Document No. 56

 9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report)

 10. DISTRIBUTION STATEMENT
    Copies may be requested from the address given in (1) above.

 11. SUPPLEMENTARY NOTES

 12. SPONSORING MILITARY ACTIVITY
    U.S. Army Research Office-Durham, Duke Station, Durham, North Carolina
    and: NASA Ames Research Center, Mail Stop 233-14, Moffett Field, Calif.

 13. ABSTRACT
    A set of macros to enable users to retain numerical significance
    during critical phases of a calculation is presented along with the philosophy
    behind their conception. The macros are designed for speed and have an average
    accuracy of at least 92 binary places.

 KEY WORDS
    Error Analysis, Computer Arithmetic, Macro-Assembler

 DD FORM 1473

 UNCLASSIFIED
 Security Classification