Digitized by the Internet Archive in 2012 with funding from University of Illinois Urbana-Champaign
http://archive.org/details/augmentedsignifi56lerm

CAC Document No. 56

AUGMENTED SIGNIFICANCE ROUTINES FOR ILLIAC IV

By R. J. Lermit and J. M. Randal

Center for Advanced Computation
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

December 1972

This work was supported in part by the Advanced Research Projects Agency of the Department of Defense and was monitored by the U. S. Army Research Office-Durham under Contract No. DAHC04 72-C-0001; and under Contract No. USAF 30(602)-4144, Order No. 788, implemented initially by the Rome Air Development Center, Research and Technology Division, Air Force Systems Command, Griffiss Air Force Base, New York, and beginning January 17, 1972, by the NASA Ames Research Center, Moffett Field, California.

ABSTRACT

A set of macros to enable users to retain numerical significance during critical phases of a calculation is presented along with the philosophy behind their conception. The macros are designed for speed and have an average accuracy of at least 92 binary places.

TABLE OF CONTENTS

1. INTRODUCTION
2. REPRESENTATION
3. USE
4. FUNCTIONS
5. ACCURACY
6. SPACE AND TIMING
7. CONCLUSION
REFERENCES
APPENDIX A
APPENDIX B
APPENDIX C
APPENDIX D

1. INTRODUCTION

These augmented precision algorithms for ILLIAC IV are presented with two aims in mind:

- To provide the general user with a method of keeping numerical significance (and therefore accuracy) even though he is dealing with numbers of very disparate size.

- To record these augmented significance algorithms so that they may be modified by future ILLIAC IV users.

Their function is thus different from the algorithms described by Yasui [1] in the sense that they do not maintain 96 binary places of accuracy, but only the accuracy desired by the user, up to about 92 places. As a consequence, they are faster and less bulky than full double precision routines, and use neither CU instructions nor bits of the RGD. The CU and mode bits are thus left free to control the main algorithm.

Because these algorithms are directed toward maintaining significance, routines are also provided for obtaining augmented numbers from sums and products of single precision numbers. A routine is provided for rounding augmented precision numbers and converting them to single precision.

No facilities are provided for reading or printing augmented precision numbers. Indeed, it is envisioned that users will use them only temporarily in their programs to ensure accuracy during certain critical parts of a calculation, and will convert augmented precision results to single precision before reaching the end of their algorithm.

The initiated reader may observe that more elegant macros may be achieved by using "conditional compile" facilities, but since the paper is written for the general user, our aim is clarity at the expense of elegance whenever the latter would make the method less comprehensible.

2. REPRESENTATION

A row of augmented precision numbers is represented by two rows of ordinary ILLIAC IV 64-bit floating point numbers. One word ($a^h$), the more significant part, is not necessarily normalized, but its exponent is the true exponent for the whole augmented number. The other word ($a^l$) is the less significant or adjustment part of the number. Its exponent is usually about 48 less than the exponent of $a^h$, and its mantissa may be of different sign from that of $a^h$; but when the mantissa of $a^l$ is aligned so that it is of the same sign as $a^h$ and its exponent is exactly 48 less than the exponent of $a^h$, then $(a^h, a^l) = a^h + a^l$ represents a double precision number with the exponent of $a^h$.

However, this "aligned" representation is rarely necessary, and is not explicitly maintained by the augmented precision algorithms. It is the relaxation of these rules of representation that gives these algorithms their speed and compactness with a small cost in accuracy.

Throughout this document $a^h$ and $a^l$ are represented as AH and AL respectively in program text, the terminal H or L representing the more or less significant part of the number respectively. Single precision numbers are represented by single letters without an H or L ending.

A single precision number x is expressed as (x, 0) in augmented precision. An augmented precision number has a value within a neighborhood $\varepsilon$ of a single precision number if the result of normalizing the difference between the augmented and single precision number has a modulus less than $\varepsilon$. A similar relationship holds between augmented precision numbers.

3. USE

All augmented arithmetic algorithms presented in this document are represented as defines which, except for a few cases, operate on an augmented precision accumulator represented by the labels ACC and AUXACC. ACC and AUXACC contain the more and less significant part, respectively, of the augmented result.
The accumulator is merely a device to reduce the length of parameter lists in the defines and, because the need for augmented precision numbers tends to arise from the need to accumulate sums and products of single precision numbers, it saves unnecessary access times as well.

The parameters of the defines may be any valid PE address allowable in ASK [2], provided they do not conflict with the register requirements of the defines being invoked (see Appendix B).

Augmented precision values may be created by:

1. Assigning ACC and AUXACC the augmented precision value directly:

       LIT(3) = 1.0; LDA $C3; STA ACC; CLAR; STA AUXACC;

   (or more directly: LIT(3) = 1.0; DLOAD $C3 =0;) will create the augmented precision number (1.0, 0) in the accumulator.

2. Using the ILLIAC IV extended add (EAD) [3] or extended subtract (ESB) instructions,

       DEFINE AUX &A &B = LDA &A; EAD &B; STB ACC; STA AUXACC ##;
       DEFINE SUX &A &B = LDA &A; ESB &B; STB ACC; STA AUXACC ##;

   will form an augmented precision value in the accumulator from the sum or difference, respectively, of the invocation values of &A and &B.

3. Using the MUX define to form the product of two single precision numbers. Because the ILLIAC IV multiply (ML) instruction does not produce a true augmented precision product, the define DML is provided to do so (see Appendix C). The instructions

       LDA X; DML Y;

   leave the more significant part of the augmented precision product of X and Y in RGR and the less significant part in RGA. Thus:

       DEFINE MUX &A &B = LDA &A; DML &B; STR ACC; STA AUXACC ##;

The defines DCLEAR, DLOAD and DSTORE (which respectively clear, load and store the augmented precision accumulator) are provided so that the user may work with an augmented ILLIAC IV which, in addition to the existing ILLIAC IV facilities, has a single augmented precision accumulator upon which all augmented precision functions act.
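The creation of an augmented value from a sum or a product can be imitated on a conventional machine. The following is a minimal sketch in Python, assuming IEEE 754 double precision (53-bit mantissa) as a stand-in for the 48-bit ILLIAC IV mantissa. The names `two_sum`, `split` and `two_prod` are illustrative only, and the bodies are the well-known Knuth and Dekker error-free transformations, not the EAD or DML hardware algorithms. Unlike the ILLIAC IV representation, the pairs produced here are always normalized.

```python
from fractions import Fraction


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Analogue of AUX: return (hi, lo) with hi + lo == a + b exactly."""
    hi = a + b                        # rounded sum, the "more significant part"
    bv = hi - a                       # the part of b that was absorbed into hi
    lo = (a - (hi - bv)) + (b - bv)   # what rounding discarded
    return hi, lo


def split(a: float) -> tuple[float, float]:
    """Veltkamp split: a == hi + lo, each half short enough to multiply exactly."""
    c = 134217729.0 * a               # 2**27 + 1, valid for 53-bit doubles
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a: float, b: float) -> tuple[float, float]:
    """Analogue of MUX: return (hi, lo) with hi + lo == a * b exactly
    (assuming no overflow or underflow)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    lo = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, lo


def exact_value(hi: float, lo: float) -> Fraction:
    """Exact rational value of the pair (hi, lo)."""
    return Fraction(hi) + Fraction(lo)
```

For instance, `two_sum(1.0, 2.0 ** -60)` keeps the small addend in the low word instead of losing it, which is the role AUXACC plays for ACC.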
This augmented precision accumulator destroys the symmetric properties of the augmented precision operations in the sense that it alone is normalized during the augmented precision operations. It is truly an accumulator of sums or products, and its use in this role automatically increases the accuracy of the calculation (see Section 5). To dispense with a unique accumulator would either increase the execution times of the augmented arithmetic functions or reduce their accuracy.

Generally speaking, the results of an augmented arithmetic calculation are not normalized after the calculation because they will be normalized during the next. However, they are normalized before being converted back into single precision numbers.

4. FUNCTIONS

In addition to the basic functions described in the last section, the package provides the following facilities:

4.1 Operations involving single precision numbers.

DEFINE DOT &A &B

This operation forms the double length product of &A and &B and adds it to the augmented precision accumulator. Its repeated application forms the scalar product of its successive arguments.

DEFINE DPLUS &A

This operation adds the single precision quantity &A to the augmented precision accumulator.

DEFINE DMINUS &A

This operation subtracts the single precision quantity &A from the augmented precision accumulator.

DEFINE DTIMES &A

This function multiplies the augmented precision accumulator by the single precision number &A. The result is left in the augmented precision accumulator.

4.2 Operations on the augmented precision accumulator.

DEFINE DNEG

This operation negates the augmented precision accumulator.

DEFINE DRECIP

This operation replaces the contents of the augmented precision accumulator by its reciprocal. The method used consists of finding the reciprocal of the more significant part of the accumulator and refining this with one application of Newton's method for reciprocation.
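One Newton step roughly doubles the number of correct bits, which is why a single precision reciprocal of the more significant part is an adequate starting value. The following Python sketch is an illustration only: it uses exact rational arithmetic (`fractions.Fraction`) in place of the augmented operations, the function name `newton_reciprocal` is hypothetical, and the iteration is written in the textbook form r1 = r0 (2 - a r0) rather than as the define's instruction sequence.

```python
from fractions import Fraction


def newton_reciprocal(hi: float, lo: float) -> tuple[Fraction, Fraction]:
    """Return (r0, r1): the starting value and its one-step Newton refinement.

    hi, lo : more and less significant parts of the augmented number a.
    r0     : single precision reciprocal of the more significant part only.
    r1     : r0 * (2 - a * r0), evaluated here without rounding.
    """
    a = Fraction(hi) + Fraction(lo)   # exact value of the augmented number
    r0 = Fraction(1.0 / hi)           # ignores lo and carries a rounding error
    r1 = r0 * (2 - a * r0)            # one application of Newton's method
    return r0, r1


def relative_error(a: Fraction, r: Fraction) -> Fraction:
    """|1 - a r|, the relative error of r as an approximation to 1/a."""
    return abs(1 - a * r)
```

If e0 = 1 - a r0, then 1 - a r1 = e0 squared, so a starting value good to about 41 bits yields a result good to about 82 bits in this idealized setting.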
No other facilities for augmented precision division are provided in the package.

DEFINE DNORM

This operation normalizes the augmented precision accumulator by normalizing its more significant half and adding its less significant half to the result. This process is repeated once in case the more significant part was originally zero and the less significant part was unnormalized.

DEFINE SINGLE

This operation normalizes the augmented precision accumulator and then rounds it by adding a quantity of the correct sign and exponent. The single precision result is left in RGA.

4.3 Operations involving double precision numbers.

DEFINE DADD &AH &AL

The augmented precision number (&AH, &AL) is added to the augmented precision accumulator.

DEFINE DSUB &AH &AL

The augmented precision number (&AH, &AL) is subtracted from the augmented precision accumulator.

DEFINE DMULT &AH &AL

The augmented precision accumulator is multiplied by the augmented precision number (&AH, &AL). The method used is equivalent to the following algorithm, except that only the more significant part of MUX &AL YL is generated and used.

    DEFINE DMULT &AH &AL =
        DSTORE YH YL;
        MUX &AL YL;
        DOT &AL YH;
        DOT &AH YL;
        DOT &AH YH ##;

5. ACCURACY

Since the extended precision hardware on ILLIAC IV differs from that provided on most machines, it is necessary to consider how this affects the accuracy of the result obtained. Analysis will be confined to the accumulation of the inner product of two vectors, $\sum_{i=1}^{n} a_i b_i$. This is, by far, the most common use of double precision arithmetic.

It is assumed that the calculations are being performed serially; i.e., partial sums $S_i = \sum_{j=1}^{i} a_j b_j$ are being calculated from $S_i = S_{i-1} + a_i b_i$. Normally, 64 such inner products would be accumulated simultaneously. However, if summation is being carried out across the PEs by a method such as the "log sum" [6] technique, the bound on the error is even smaller (see Linz [5]). The method analyzed is, therefore, "worst case".

Consider first the accumulation of inner products on a standard computer using a word with a mantissa of $t$ bits and accumulating products in a double length register of $2t$ bits. The final sum is then rounded to single precision.

Let $fl_2(E)$ represent the evaluation in floating point arithmetic of the expression $E$. If $S_i$ is the $i$-th partial sum, then

$$S_0 = 0, \qquad S_i = fl_2\bigl(S_{i-1} + fl_2(a_i \times b_i)\bigr), \qquad i = 1, 2, \ldots, n.$$

The product of the single precision operands $a_i$, $b_i$ is first formed, giving a double length product; this calculation is exact. Adding this product to the partial sum $S_{i-1}$, using double length arithmetic, results in a rounding error:

$$S_i = fl_2\bigl(S_{i-1} + fl_2(a_i \times b_i)\bigr) = fl_2(S_{i-1} + a_i b_i) = S_{i-1}(1+\varepsilon_i) + a_i b_i (1+\delta_i).$$

It may be shown that

$$|\varepsilon_i|,\ |\delta_i| \le \tfrac{3}{2}\, 2^{-2t}.$$

Hence

$$S_n = \sum_{i=1}^{n} a_i b_i (1+\delta_i)(1+\varepsilon_{i+1})(1+\varepsilon_{i+2}) \cdots (1+\varepsilon_n) = \sum_{i=1}^{n} a_i b_i (1+\gamma_i),$$

where

$$\bigl(1 - \tfrac{3}{2}\, 2^{-2t}\bigr)^{n-i+1} \le 1 + \gamma_i \le \bigl(1 + \tfrac{3}{2}\, 2^{-2t}\bigr)^{n-i+1}.$$

Since the factor $(1 \pm \tfrac{3}{2}\, 2^{-2t})$ is inconvenient, we make the assumption that

$$\tfrac{3}{2}\, n\, 2^{-2t} < 0.1.$$

This will be true for any value of $n$ found in practice since $2^{-2t}$ is very small. The inequalities for $\gamma_i$ may then be simplified to:

$$|\gamma_i| < \tfrac{3}{2}(1.06)(n-i+1)\, 2^{-2t} \le \tfrac{3}{2}(1.06)\, n\, 2^{-2t}.$$

Denoting by $fl_{2,1}(a^T b)$ the result of rounding $fl_2(a^T b)$ to single precision, a bound on the absolute error may be found from:

$$
\begin{aligned}
|fl_{2,1}(a^T b) - a^T b| &\le |fl_{2,1}(a^T b) - fl_2(a^T b)| + |fl_2(a^T b) - a^T b| \\
&\le |fl_2(a^T b)|\, 2^{-t} + \sum_{i=1}^{n} |a_i b_i \gamma_i| \\
&= \Bigl|\sum_{i=1}^{n} a_i b_i (1+\gamma_i)\Bigr|\, 2^{-t} + \sum_{i=1}^{n} |a_i b_i \gamma_i| \\
&\le \Bigl|\sum_{i=1}^{n} a_i b_i\Bigr|\, 2^{-t} + (1 + 2^{-t}) \sum_{i=1}^{n} |a_i|\,|b_i|\,|\gamma_i| \\
&\le 2^{-t} |a^T b| + \tfrac{3}{2}(1.1)\, n\, 2^{-2t} \sum_{i=1}^{n} |a_i|\,|b_i|.
\end{aligned}
$$

Therefore, using the Schwarz inequality, the absolute error is bounded by:

$$2^{-t} |a^T b| + \tfrac{3}{2}(1.1)\, n\, 2^{-2t}\, \|a\|_2 \|b\|_2.$$

Note that this does not necessarily give a small relative error; however, unless severe cancellation takes place, the second term is negligible compared with the first.
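To see how the two terms of this bound compare in practice, the bound can simply be evaluated. The following Python sketch is illustrative only: the function name `standard_bound` is hypothetical, and the inner product and norms are themselves computed in IEEE double precision, which is more than adequate for comparing orders of magnitude.

```python
import math


def standard_bound(a, b, t):
    """Return the two terms of the absolute error bound
    2^-t |a.b|  +  (3/2)(1.1) n 2^-2t ||a||_2 ||b||_2
    for a machine with a t-bit mantissa and a 2t-bit accumulator."""
    n = len(a)
    # The simplification of the gamma_i inequalities assumed (3/2) n 2^-2t < 0.1.
    if 1.5 * n * 2.0 ** (-2 * t) >= 0.1:
        raise ValueError("n is too large for the simplified bound")
    dot = math.fsum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(math.fsum(x * x for x in a))
    norm_b = math.sqrt(math.fsum(y * y for y in b))
    rounding_term = 2.0 ** (-t) * abs(dot)
    accumulation_term = 1.5 * 1.1 * n * 2.0 ** (-2 * t) * norm_a * norm_b
    return rounding_term, accumulation_term
```

With `a = [1.0, 2.0, 3.0, 4.0]`, `b = [1.0, 1.0, 1.0, 1.0]` and `t = 48`, the second term is many orders of magnitude smaller than the first. With `a = [1.0, -1.0]` and `b = [1.0, 1.0]` the inner product cancels completely, the first term vanishes, and only the second term remains, which is the severe cancellation case mentioned above.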
In performing the analogous calculations using ILLIAC IV hardware, the following steps are performed:

5.1 At the $i$-th step, $a_i$ and $b_i$ are multiplied giving a double length product. No round off error is produced.

$$(p_i^h, p_i^l) = fl_2(a_i \times b_i) = a_i b_i$$

It is assumed that the $a_i$ and $b_i$ are normalized. This is essential for the error analysis. [Intuitively we want to push as much of the product into the high precision part as possible.]

5.2 The low precision parts of $p_i$ and $S_{i-1}$ are added using an ordinary rounded and normalized add. Call the result $u_i$:

$$u_i = fl(p_i^l + S_{i-1}^l) = p_i^l (1+\varepsilon_{i1}) + S_{i-1}^l (1+\varepsilon_{i2}).$$

5.3 $S_{i-1}^h$ is added to $p_i^h$, using the EAD instruction, forming $S_i^h$ with the overflow going into $v_i$. Normalizing $S_{i-1}^h$ controls the error. Intuitively we put as much of the sum in the high order word as possible.

$$(S_i^h, v_i) = S_{i-1}^h \oplus p_i^h$$

where $\oplus$ denotes the EAD operator. No round off error is involved; in fact:

$$S_i^h + v_i = S_{i-1}^h + p_i^h.$$

And hence,

$$S_i^h = \sum_{j=1}^{i} p_j^h - \sum_{j=1}^{i} v_j.$$

$S_i^l$ is formed from $u_i$ and $v_i$ using a rounded and normalized add; i.e.,

$$S_i^l = fl(u_i + v_i) = u_i (1+\varepsilon_{i3}) + v_i (1+\varepsilon_{i4}),$$

where $|\varepsilon_{i1}|, |\varepsilon_{i2}|, |\varepsilon_{i3}|, |\varepsilon_{i4}| \le 2^{-48}$.

Using the above equations, we have

$$S_i = S_{i-1} + a_i b_i + E_i,$$

where

$$
\begin{aligned}
E_i &= \bigl[p_i^l (1+\varepsilon_{i1}) + S_{i-1}^l (1+\varepsilon_{i2})\bigr](1+\varepsilon_{i3}) + v_i (1+\varepsilon_{i4}) - \bigl[p_i^l + S_{i-1}^l + v_i\bigr] \\
&= p_i^l \bigl[(1+\varepsilon_{i1})(1+\varepsilon_{i3}) - 1\bigr] + v_i \varepsilon_{i4} + S_{i-1}^l \bigl[(1+\varepsilon_{i2})(1+\varepsilon_{i3}) - 1\bigr].
\end{aligned}
$$

Bounds must be obtained for $|p_i^l|$, $|v_i|$ and $|S_{i-1}^l|$.

$$|p_i^l| \le 2^{-46} |p_i^h| \le 2^{-46} |a_i b_i|.$$

This depends on the fact that $a_i$ and $b_i$ are normalized. The mantissa part of $p_i^h$ is at least $\tfrac14$, since that of both $a_i$ and $b_i$ is at least $\tfrac12$. The bound follows because the mantissa part of $p_i^l$ is not greater than 1 and the exponents of $p_i^h$ and $p_i^l$ differ by exactly 48.

Since it is the low precision part of an extended add of $S_{i-1}^h$ and $p_i^h$, a bound for $v_i$ is

$$|v_i| \le 2^{-46} \max\{|S_{i-1}^h|, |p_i^h|\}.$$

This follows since the exponent of $v_i$ is at least 48 less than the larger of those of $S_{i-1}^h$ and $p_i^h$, and the mantissa part of $p_i^h$ is at least $\tfrac14$ and that of $S_{i-1}^h$ is at least $\tfrac12$ (since it is normalized). Thus,

$$
|v_i| \le 2^{-46} \max\Bigl\{\Bigl|\sum_{j=1}^{i-1} p_j^h - \sum_{j=1}^{i-1} v_j\Bigr|,\ |p_i^h|\Bigr\}
\le 2^{-46} \Bigl(\sum_{j=1}^{n} |p_j^h| + \sum_{j=1}^{n} |v_j|\Bigr)
$$

and

$$\sum_{i=1}^{n} |v_i| \le 2^{-46}\, n\, \Bigl\{\sum_{j=1}^{n} |a_j b_j| + \sum_{j=1}^{n} |v_j|\Bigr\},$$

$$\sum_{i=1}^{n} |v_i|\, \{1 - 2^{-46} n\} \le 2^{-46}\, n\, \|a\|_2 \|b\|_2.$$

Hence, for any reasonable $n$,

$$\sum_{i=1}^{n} |v_i| \le 1.1 \times 2^{-46}\, n\, \|a\|_2 \|b\|_2.$$

It only remains to find a bound for $S_i^l$.

$$
\begin{aligned}
S_i^l &= u_i (1+\varepsilon_{i3}) + v_i (1+\varepsilon_{i4}) \\
&= p_i^l (1+\varepsilon_{i1})(1+\varepsilon_{i3}) + v_i (1+\varepsilon_{i4}) + S_{i-1}^l (1+\varepsilon_{i2})(1+\varepsilon_{i3}) \\
&= \sum_{j=1}^{i} p_j^l (1+\varepsilon_{j1})(1+\varepsilon_{j3})(1+\varepsilon_{j+1,2})(1+\varepsilon_{j+1,3}) \cdots (1+\varepsilon_{i2})(1+\varepsilon_{i3}) \\
&\quad + \sum_{j=1}^{i} v_j (1+\varepsilon_{j4})(1+\varepsilon_{j+1,2})(1+\varepsilon_{j+1,3}) \cdots (1+\varepsilon_{i2})(1+\varepsilon_{i3}) \\
&= \sum_{j=1}^{i} p_j^l (1+\gamma_j) + \sum_{j=1}^{i} v_j (1+\delta_j),
\end{aligned}
$$

where, as before, it may be shown that

$$|\gamma_j|,\ |\delta_j| \le \tfrac{3}{2}(1.06)(2n)\, 2^{-48} = A.$$

However, there may be up to $2n$ factors in each term, and the work is being done in single precision. Therefore,

$$S_i^l = \sum_{j=1}^{i} p_j^l + \sum_{j=1}^{i} v_j + \sum_{j=1}^{i} p_j^l \gamma_j + \sum_{j=1}^{i} v_j \delta_j.$$

But since

$$S_i^h = \sum_{j=1}^{i} p_j^h - \sum_{j=1}^{i} v_j,$$

then

$$S_i = S_i^h + S_i^l = \sum_{j=1}^{i} p_j^h + \sum_{j=1}^{i} p_j^l + \sum_{j=1}^{i} p_j^l \gamma_j + \sum_{j=1}^{i} v_j \delta_j.$$

Putting $i = n$,

$$fl_2(a^T b) = S_n = a^T b + \sum_{i=1}^{n} p_i^l \gamma_i + \sum_{i=1}^{n} v_i \delta_i.$$

The absolute error is, thus:

$$
\begin{aligned}
\Bigl|\sum_{i=1}^{n} p_i^l \gamma_i + \sum_{i=1}^{n} v_i \delta_i\Bigr|
&\le \sum_{i=1}^{n} |p_i^l|\, A + \sum_{i=1}^{n} |v_i|\, A \\
&\le 2^{-46} \sum_{i=1}^{n} |a_i b_i|\, A + \bigl(1.1 \times 2^{-46}\, n\, \|a\|_2 \|b\|_2\bigr) A \\
&\le 1.1 \times 2^{-46}\, A\, (n+1)\, \|a\|_2 \|b\|_2 \\
&\le 1.1 \times 1.06 \times 3 \times 2^{-46} \times 2^{-48}\, n(n+1)\, \|a\|_2 \|b\|_2 \\
&< 2^{-92}\, n(n+1)\, \|a\|_2 \|b\|_2.
\end{aligned}
$$

Rounding the result to single precision, as before, and calling the result $fl_{2,1}(a^T b)$, the following bound is obtained:

$$|fl_{2,1}(a^T b) - a^T b| \le 2^{-48} |a^T b| + 2^{-92}\, n(n+1)\, \|a\|_2 \|b\|_2.$$
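The accumulation analyzed in steps 5.1 to 5.3 can be sketched on a conventional machine as follows. This is a hedged illustration in Python, assuming IEEE 754 doubles in place of ILLIAC IV words. The helper names (`two_sum`, `split`, `two_prod`, `augmented_dot`) are hypothetical, and the extended add and double length multiply are replaced by the Knuth and Dekker error-free transformations, so the high word here is always normalized.

```python
def two_sum(a, b):
    """Stand-in for EAD: hi + lo == a + b exactly."""
    hi = a + b
    bv = hi - a
    return hi, (a - (hi - bv)) + (b - bv)


def split(a):
    """Veltkamp split of a 53-bit double into two short halves."""
    c = 134217729.0 * a          # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a, b):
    """Stand-in for DML: hi + lo == a * b exactly (no overflow or underflow)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    return p, ((ah * bh - p) + ah * bl + al * bh) + al * bl


def augmented_dot(a, b):
    """Accumulate sum(a_i * b_i) in a (high, low) accumulator pair."""
    s_hi, s_lo = 0.0, 0.0
    for x, y in zip(a, b):
        p_hi, p_lo = two_prod(x, y)      # 5.1: exact double length product
        u = p_lo + s_lo                  # 5.2: rounded add of the low parts
        s_hi, v = two_sum(s_hi, p_hi)    # 5.3: extended add, v is the overflow
        s_lo = u + v                     # rounded add forms the new low part
    return s_hi, s_lo
```

For `a = [1e16, 1.0, -1e16]` and `b = [1.0, 1.0, 1.0]`, a plain double precision accumulation loses the middle term entirely, while the pair returned by `augmented_dot` still carries it in the low word.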
Putting a mantissa size $t = 48$ into the error bound for a standard machine gives, by comparison:

$$2^{-48} |a^T b| + 1.1 \bigl(\tfrac{3}{2} n\bigr) 2^{-96}\, \|a\|_2 \|b\|_2.$$

The essential difference between these two results is that the analysis for ILLIAC IV produces a factor of $n(n+1)$ in the second term rather than a factor of $n$. Even for large $n$, however, this term should be negligible unless severe cancellation takes place. ILLIAC IV benefits further from a large mantissa size. The method used is therefore fully justified.

6. SPACE AND TIMING

Instruction times quoted in the ILLIAC IV Systems Characteristics and Programming Manual [3] have been used in estimating the execution times of the appropriate functions. No FINST/PE overlap has been assumed. An access time of one clock has been added for RGS and 7 clocks for memory. Each define parameter is assumed to imply a memory access except where actual parameters in the package indicate otherwise.

Space and execution times are presented in the following table:

    FUNCTION    SPACE (syllables)    TIME (PE clocks)
    DADD              7                    72
    DCLEAR            3                    17
    DLOAD        [illegible]               32
    DMINUS            6                    52
    DML               8                    26
    DMULT            60                   379
    DNEG              6                    36
    DNORM             9                    63
    DOT              17                   102
    DPLUS             6                    52
    DRECIP           84                   547
    DSTORE            4                    32
    DSUB              7                    72
    DTIMES           29                   161
    MUX              11                    50
    SINGLE           18                    98

    Table 6.1 Space and Execution Time

Users wishing to convert any of these defines into subroutines should consult Appendix A.

7. CONCLUSION

These augmented precision routines are designed to help the user retain numerical significance during certain critical parts of a calculation rather than to provide double precision routines per se. The modest sacrifice of accuracy, an economical number representation, and a certain symmetry in otherwise associative operations are, the authors felt, more than amply repaid by increased execution speed. It remains to be seen whether the general user agrees with this thesis.

REFERENCES

[1] Yasui, T., Double Precision Algorithms for ILLIAC IV.
[2] ILLIAC IV Assembler, ILLIAC IV Software Reference Manual, Vol. 2, Chapter 1.
[3] ILLIAC IV Programming and Characteristics Manual.
[4] Wilkinson, J. H., The Algebraic Eigenvalue Problem, Oxford, 1965.
[5] Linz, Peter, Accurate Floating-Point Summation, Comm. ACM 13, 6 (June 1970), pp. 361-362.
[6] Denenberg, S., An Introduction to the ILLIAC IV System.

APPENDIX A. Defines as Subroutines

Two defines, DMULT (30 words) and DRECIP (42 words), occupy enough space and execute long enough to be made subroutines (if they are to be invoked more than once) without appreciably degrading their performance. For the purposes of this conversion, the augmented precision functions fall into two classes.

A.1 Defines Without Parameters

Because the augmented precision accumulator is the only operand, the conversion is easy and the subroutine becomes:

    RECIPS:: RECIP; EXCHL(3) $ICR;

and may be called by invoking the standard CALL define:

    DEFINE CALL &NAME = CLC(3); SLIT(3) = &NAME; EXCHL(3) $ICR ##;

thus:

    CALL RECIPS;

This complies with subroutine standards.

A.2 Defines With Parameters

There are two useful methods.

A.2.1 The user may declare row variables for use when passing parameters to the subroutine. If XH and XL are user declared variables, then the subroutine may look like the following:

    MULTS:: MULT XH XL; EXCHL(3) $ICR;

(where XH, XL have been declared XH: BLK 1; XL: BLK 1;) and the calling sequence would become

    LDA ARGH; STA XH; LDA ARGL; STA XL; CALL MULTS;

That is, the subroutine has been made parameterless.

A.2.2 A slightly faster method is to use ACAR0 and ACAR1 to "point" to the correct arguments. The calling sequence would then be:

    CLC(0); SLIT(0) = ARGH; CLC(1); SLIT(1) = ARGL; CALL MULTSS;

where the subroutine is now:

    MULTSS:: MULT 0(0) 0(1); EXCHL(3) $ICR;

A little more elegance may then be achieved with:

    DEFINE EXECUTE &NAME &AH &AL =
        CLC(0); SLIT(0) = &AH;
        CLC(1); SLIT(1) = &AL;
        CALL &NAME ##;

Both methods comply with standard subroutine conventions.

APPENDIX B. Define Dependencies

Some defines invoke others which, in turn, invoke defines. The list below illustrates these dependencies. Invoked defines are followed by a list in parentheses of the defines which they invoke.

    DMULT     DSTORE, DOT (DML, DADD)
    DOT       DML, DADD
    DRECIP    DTIMES (DML), DMINUS, DNEG
    DTIMES    DML
    MUX       DML
    SINGLE    DNORM

Two defines use memory locations which must be declared by the user. The memory locations are:

    DTEMPH: BLK 1;
    DTEMPL: BLK 1;

The defines using those memory locations are:

    DMULT     DTEMPH, DTEMPL
    DRECIP    DTEMPH

If the user has rows, X and Y say, which are available for use by DMULT or DRECIP, the define:

    DEFINE DTEMPH = X ##, DTEMPL = Y ##;

declared before DMULT or DRECIP is invoked will cause them to use X and Y in place of DTEMPH and DTEMPL.

PE registers RGR and RGS are used by the following defines:

    DMULT     RGR, RGS
    DNORM     RGR
    DOT       RGR, RGS
    DRECIP    RGR, RGS
    DTIMES    RGR, RGS
    MUX       RGR, RGS
    SINGLE    RGR

All defines use RGA and RGB. None use RGX.

APPENDIX C.

The annotated bodies of the augmented precision defines appear below. Their use supposes that exponent underflow does not cause the F-bit to be set.

[Most of this listing is illegible in the scanned source. Only the bodies that agree with the forms given in Section 3 and Appendix D and with the syllable counts of Table 6.1 are shown; the bodies of DML, DOT, DMULT, DRECIP and SINGLE are not recoverable.]

    DEFINE DTIMES &A =
        LDA &A; DML AUXACC; LDS ACC; STR ACC; STA AUXACC;
        LDA &A; DML $S; LDS $A; DADD $R $S ##;

    DEFINE DPLUS &A =
        LDA ACC; NORM; EAD &A; STB ACC; ADR AUXACC; STA AUXACC ##;

    DEFINE DMINUS &A =
        LDA ACC; NORM; ESB &A; STB ACC; ADR AUXACC; STA AUXACC ##;

    DEFINE DNORM =
        LDA ACC; NORM; EAD AUXACC; LDR $A;
        LDA $B; NORM; EAD $R; STB ACC; STA AUXACC ##;

    DEFINE DNEG =
        LDA ACC; CHSA; STA ACC; LDA AUXACC; CHSA; STA AUXACC ##;

APPENDIX D. The Use of Conditional Definition.

The reader will notice that the defines DADD and DPLUS differ by one parameter and one ASK instruction. By using the conditional assembly features of ASK [2], DPLUS and DADD may be combined, as can DMINUS and DSUB:

    DEFINE DADD &AH &AL =
        LDA ACC;
        NORM;
        EAD &AH;
        STB ACC;
        &IF &EMPTY(&AL) &THEN        % IF SECOND PARAMETER GIVEN
        &ELSE ADR &AL; &FI;          % THEN USE IT
        ADR AUXACC;
        STA AUXACC ##;

    DEFINE DSUB &AH &AL =
        LDA ACC;
        NORM;
        ESB &AH;
        STB ACC;
        &IF &EMPTY(&AL) &THEN        % IF SECOND PARAMETER GIVEN
        &ELSE SBR &AL; &FI;          % THEN USE IT
        ADR AUXACC;
        STA AUXACC ##;

The invocation:

    DADD A;

is now identical with the invocation

    DPLUS A;

while the call

    DADD A, B;

retains its original meaning.

The provision of defines with conditional bodies thus makes the macro package easier to use in the sense that fewer define names need to be learned and understood. It is worth mentioning, however, that combining DTIMES and DMULT is not as neat as the above examples:

    DEFINE DMULT &AH &AL =
        &IF &EMPTY(&AL) &THEN        % IF SECOND PARAMETER ABSENT
                                     % THEN ISSUE CODE FOR DTIMES
            LDA &AH;
            DML AUXACC;
            LDS ACC;
            STR ACC;
            STA AUXACC;
            LDA &AH;
            DML $S;
            LDS $A;
            DADD $R $S
        &ELSE                        % OTHERWISE
            DSTORE DTEMPH DTEMPL;    % ISSUE CODE FOR DMULT
            LDA &AL;
            MLRN DTEMPL;
            STA ACC;
            LDB =0;
            STB AUXACC;
            DOT &AH DTEMPL;
            DOT &AL DTEMPH;
            DOT &AH DTEMPH
        &FI; ##;

However, the notational economy makes the use of conditional compilation worthwhile.

Conditional compilation may also be used to increase the scope of the define. For instance, when accumulating sums of positive numbers, one knows the more significant word of the augmented precision accumulator is non-zero, and thus some of the instructions in DNORM are unnecessary.
A version of DNORM that may be used either to normalize only as far as the more significant half of the augmented precision accumulator or to normalize the accumulator completely might be written as follows:

    DEFINE DNORM &N =
        LDA ACC;
        NORM;
        EAD AUXACC;
        &IF &N &THEN             % IF &N IS ODD, NORMALIZE TO ACC ONLY
        &ELSE LDR $A;            % OTHERWISE NORMALIZE WHOLE ACCUMULATOR
              LDA $B;
              NORM;
              EAD $R;
        &FI;
        STB ACC;
        STA AUXACC ##;

Thus, the invocation

    DNORM 1;

normalizes only as far as the more significant half of the augmented precision accumulator, while

    DNORM 2;

normalizes the whole augmented precision accumulator.

Use of conditional compilation facilities may thus enhance the efficiency of the compiled program without the notational disadvantage of having to provide a myriad of specific subroutines or macros for every function variant.

The facilities presented here are mainly illustrative and are not present on the standard library tape. The user, considering the augmented precision macros in the light of his own particular application, will no doubt devise suitable local enhancements.

Key words: Error Analysis, Computer Arithmetic, Macro-Assembler