UIUCDCS-R-74-622

Transportation of Higher-Level Language Programs:
Exemplified by an ALGOL 68 Transportation Representation

by

Wilfred J. Hansen

January, 1974

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois

This research was supported by the Department of Computer Science.

Abstract

Problem: Modern higher level languages, for example ALGOL 68, must be defined to allow many alternate token representations to accommodate the variety of input/output equipment available. Because of this variety, it may not be possible to transport programs from one hardware representation to another by any simple process of substitution.

Solution: As an example of the proposed approach, this paper defines a six-bit encoding for ALGOL 68 program texts. It is an encoded character prefix representation, so it can be decoded without lookahead. The encoding is defined in detail and a decoder for a specific hardware representation is given.

Acknowledgments

The concept of a syllable encoding was suggested to me by Charles Lindsey when I voiced an intention to produce a hardware representation proposal. (To the extent that his suggestion was intended to dissuade me, it was unsuccessful (2).) This paper would not have been completed without accepting (against inclination) C. A. R. Hoare's belief that "the quickest way to distinguish a good designer from a bad one is whether he prefers to avoid a problem rather than solving it." (5, p. 19) I hope I have avoided enough problems.

Table of Contents

The Problem: Transporting Program Texts
I. What criteria must a transportation representation satisfy?
II. What ALGOL 68 problems must a transportation representation face?
III. Structure of the proposed representation
IV. What special problems may face implementations?
   IV.1. Token separation
   IV.2. Multi-symbol representations
   IV.3. Treatment of nonstandard graphics
   IV.4. Encoders and decoders for implementations of variants
   IV.5. Spaces and visible spaces
   IV.6. What to convert to standard codes
   IV.7. Puns
V. Conclusion
References
Appendix A. Specification of the ALGOL 68 Transportation Representation
   A.1 Syllables and their functions
   A.2 Representations of standard symbols
   A.3 Summary of codes and standard symbols
Appendix B. Decoder from the Transportation Representation to a Hardware Representation
   B.1 The hardware representation
   B.2 Description of the decoder
   B.3 The decoder

The Problem: Transporting Program Texts

One of the advantages originally claimed for higher level languages was machine independence; the "same" program would run on many different machines. Experience has revealed that such languages are even more valuable as programming tools because they enhance communication between programmer, debugger, and modifier. Nonetheless, there is still considerable need for machine independence and the transportation of programs between installations. For example, reference (4) defines a subset of FORTRAN that is accepted by most compilers; references (3) and (2) define ALGOL 68 representations that require only a reasonable, widely available set of graphics. (The transportation representation proposed below in no way depends on the hardware representation suggestions in (2). Those suggestions were made only to define precisely a vehicle for the decoder in Appendix B.)

In many cases, as J. McCarthy (7) has pointed out, the concrete syntax of a language is less critical than the abstract syntax, that is, the inherent expressive features of the language. At least one recent language, Revised ALGOL 68 (1, hereinafter called the "Report"), has reflected this lesson; the defining document specifies the abstract syntax with great precision, but leaves many of the details of token representation to the discretion of implementors. This permits the freedom necessary to implement the language on a wide variety of input/output devices, but raises a new problem: token representations may be so incompatible that programs cannot be transported by any process of simple substitution.

This paper will first explore the dimensions of the problem and then describe a solution for ALGOL 68 in the form of a "transportation representation" text encoding for interchange of programs. Without this representation, program sharing among n implementations would require n^2 transliteration programs; with it, each implementation need have only an encoder and a decoder, for a total of 2n transliterators. (One can conceive of a compiler with numerous pragmat-selected front ends to accept many hardware representations, but the multiplicity of the latter seems to doom the former.) A further advantage of the transportation representation will be the possibility of providing a decoder to translate to a publication language.

Simple substitution may not suffice to transport program texts for several reasons. Sometimes one symbol is used for several functions. For example, in ALGOL 68 the symbol '⌊' must be transliterated to either entier or lwb depending on the context. More serious problems are posed by "stropping," the techniques used to distinguish boldface text from roman text.
Some implementations demand apostrophes around boldface (whence the name stropping); others require backspacing and underlining; still others reduce clutter by specifying that boldface words are restricted and can never be used with a non-bold meaning. Many transformations between stropping conventions require complex analysis, possibly including tables of reserved words.

A third problem occurs if the text contains lines longer than those accepted by the target compiler. Efforts to preserve original indentation are warranted because of its value in reading and modifying the text. Unfortunately, in contexts like bold words and strings it is not possible to intersperse spaces and new lines freely. Proper treatment of overlength lines is, in fact, a very context dependent problem.

I. What criteria must a transportation representation satisfy?

   o It must encode as much information from the original as possible. Where ALGOL 68 provides a choice between brief and bold forms, the programmer's choice should be recorded. When the encoding of '(b|c|d)' is decoded it should not become 'if b then c else d fi'. Original spacing should be recorded so some semblance can be reproduced, depending on the line length of the decoding media.

   o Text "close" to ALGOL 68 ought to be accepted as well as perfect programs. Programs should not be transported with syntax errors, but someone will probably want to do it. Moreover, this provision will make it possible to transport programs written in supersets of ALGOL 68.

   o To be transportable to the maximum number of locations, the representation should be encoded in a small byte size. Six bits (64 unique values) can be represented conveniently on the transput media for almost all machines.

   o The representation design should emphasize simplicity of decoding. Encoders can be based on a compiler's existing token scanner.

   o The representation should not expand the size of the text by any significant factor.
     The proposed representation collapses multiple blanks to a two syllable code, so its output is significantly more compact than, say, cards. For magnetic tape, this factor would be less important.

II. What ALGOL 68 problems must a transportation representation face?

   o Represent two type faces, roman and bold, and two cases, upper and lower.

   o Contend with unusual graphics specified in the Report; consider '⎕', '₁₀', '∘', not to mention the proposal in (2) for '-<'. Contend with graphics not mentioned in the Report but available at particular installations and used as "other monads" or "other string items."

   o Deal with string and pragment tokens, which can contain diverse graphics and even non-graphics.

   o Resolve the ambiguities inherent in using one symbol for several operators. For ease of decoding, '⌊' should be encoded not just as itself but according to whether it represents itself, lwb, or entier.

   o Consider the problems of implementations that allow, for instance, different representations of ':' in its contexts as, say, label-symbol and routine-symbol.

   o Devote some attention to the existence of national variants of ALGOL 68. Should the transportation representation provide for several alphabets? What should be done about national variants of the bold-tags, the tags defined in the standard prelude, and the letters in real and bits denotations and in format texts?

III. Structure of the proposed representation

Three approaches to a transportation representation seem possible: (1) a standard representation in, perhaps, ASCII, (2) an encoded token level transliteration, (3) a syntax oriented representation. The first would seem a chimera, since it would be difficult to get general agreement and because it could not encode all the token structure and unusual graphics that might be available in some implementations. It would have the advantage of being both human readable and machine readable. The third is very attractive from a number of viewpoints.
Only the abstract syntax would be recorded, so a decoder could easily choose whatever label-symbol or routine-symbol was applicable. Unfortunately, a syntactic representation takes more storage (see the Appendices in (6)), is complex to encode and decode, and implies some standard, agreed upon grammar.

Based on these considerations, I chose the second option, a token level prefix code that can be decoded without look-ahead. The encoded form of the text is a sequence of operator, letter, and digit syllables, represented as six-bit quantities. These syllables are specified in detail in Appendix A. Syllables in the code can change the case and the type face of succeeding letters. Special ALGOL 68 symbols and operators defined in the standard prelude are represented by a prefix syllable (#7) followed by a syllable to select the symbol (quote-symbol is thus #7, #2; '↑' as a left-shift-operator is #7, #35). The decoder is simplified by assignment of different codes to each of the possible uses of '↑', '~', and '⌊'. Provision is made for 6 "national letters," 64 other nonstandard letters, and 4160 "other monads." To reduce the size of encoded text, multiple blanks are represented with just two syllables, and there is provision to assign any syllable to invoke a sequence of syllables.

Unlike special character symbols, there is no implied list of bold-tags or standard prelude tags; these are simply spelled out using appropriate type face changes. Consequently, superset languages may use this transportation representation even if they have additional standard bold-tags.

IV. What special problems may face implementations?

IV.1. Token separation

Trivial but annoying problems are posed to encoders and decoders by the need to determine transitions between roman tags, bold tags, and numbers. These problems are made a little more difficult if the hardware representation of the implementation uses reserved word stropping, but they are hardly insurmountable.
To avoid expansion of the text, no codes are transmitted to specify token types. A decoder must keep track of the current token type to provide for proper separation. In fact, this requirement is the reason for inclusion in the decoder below of the non-trivial routine CHANGE_STATE.

Encoders in general may omit face shifts if tags are separated by spaces or special characters. Thus only one face shift at the beginning would be needed for either 'abs long 1 + abs' or 'x + sin(y)/z2 + 7'. The spaces separate the bold tags and the special characters separate the roman tags. The encoder may need to insert separation in several cases:

   a) Stropped bold tags separated only by strop characters: 'ref'real'. The encoder may insert spaces for strop characters, but preferably it will only mark the separation by inserting a bold face shift.

   b) Stropped bold tag followed by a number. Again, a face shift may be inserted.

   c) Number followed by a tag. This is perfectly legal and no separation would be needed.

Some implementations will mark spaces in tags in some way. But, because tags and non-tag letters may appear in formats, it is not always trivial to determine when a space is within a tag. For this reason, encoders must specially mark such spaces (with the token space code, #2).

Encoders for reserved word hardware representations can best be built as an adjunct of the compiler and use the compiler's routines for distinguishing bold and roman.

Decoders can easily distinguish bold from roman, but must properly strop the bold. The three cases discussed for encoders must be considered for the decoder, too. Where a bold tag is separated from a succeeding bold tag or number by no more than a face shift code, the decoder must insert stropping characters or spaces to separate them. If prefix stropping is used, a bold tag immediately after a number must be detected so the stropping can be inserted.
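
The encoder-side rules for face shifts can be illustrated with a minimal sketch. It is not the encoder of any implementation: the token model (a list of (kind, text) pairs), the function name encode_faces, and the choice of a repeated bold shift as the separator in case (b) are all assumptions made for illustration. Only the syllable numbers #11 (boldface) and #13 (roman face) come from Appendix A.

```python
BOLD, ROMAN = 11, 13  # face-shift syllables #11 and #13 (Appendix A.1)

def encode_faces(tokens):
    """Emit tokens with face shifts inserted only where they are needed.

    tokens is a list of (kind, text) pairs; kind is "bold", "roman",
    "number", or "other" (spaces and special characters). This token
    model is hypothetical and exists only for this sketch.
    """
    out = []
    face = None   # assumption: no face is in force before the first shift
    prev = None   # kind of the immediately preceding token
    for kind, text in tokens:
        if kind == "bold":
            # Shift on a change of face, or to separate this tag from a
            # bold tag that directly precedes it (case a).
            if face != BOLD or prev == "bold":
                out.append(BOLD)
                face = BOLD
        elif kind == "roman":
            if face != ROMAN:
                out.append(ROMAN)
                face = ROMAN
        elif kind == "number" and prev == "bold":
            # Case b: assumed here that a repeated bold shift is the separator.
            out.append(BOLD)
        # Case c (number followed by a tag) needs no extra separation.
        out.append(text)
        prev = kind
    return out
```

With this sketch, the tokens of 'abs long 1 + abs' yield a single bold shift at the start, while the adjacent tags of 'ref'real' yield a second bold shift between them.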
To avoid confusion of roman and bold, reserved word implementations must check every roman tag for equality to a bold tag and modify the roman in some way, perhaps by the addition of an "x".

IV.2. Multi-symbol representations

A few implementations may choose distinct representations for distinct symbols that are assigned the same representation by the Report (e.g., ':' suffices for four distinct symbols in the Report, but some implementation may use another representation for one of the four, perhaps isdefinedtoimplyresumptionatthispoint for label-symbol). The encoder for such an implementation will map the distinct representations into the single standard code provided by the transportation representation (#26 represents ':'). The decoder must solve the more context dependent problem of distinguishing among the various meanings of the standard code. This decision to represent only the basic graphics in the transportation representation means that the majority of encoders will be simplified at the expense of a small number of decoders.

IV.3. Treatment of nonstandard graphics

The proposed transportation representation provides for both 'national letter' and 'other monad' special characters. These characters may appear in five different contexts: letters appear in tags and bold tags, monads in TAO's, and both appear in strings and pragments.

An encoder for an implementation that allows special characters must assign one of the appropriate codes to each character and output that code for each occurrence of the character. Accompanying the encoded text should be documentation specifying the characters the encoder has associated with each special code.

Decoders must react appropriately to each type of special character in each of the five contexts. Default actions for each case are specified in Appendix B.
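
As a minimal sketch of the encoder's side of this arrangement, the fragment below assigns monad numbers to nonstandard characters in order of first appearance and produces the syllables for one monad number, using #31 (other monad) and #4 (4096 monads) from Appendix A.1. The function names, the rule that numbers 1 to 63 use #31 and larger numbers use #4, and the most-significant-digit-first order are assumptions made for illustration; the table returned by assign_codes stands in for the documentation that should accompany the encoded text.

```python
OTHER_MONAD = 31   # syllable #31: one-syllable monad number follows
MONADS_4096 = 4    # syllable #4: two base-64 digits follow

def assign_codes(text, standard_chars):
    """Number each nonstandard character 1..N in order of first appearance.

    The returned table is what would be documented alongside the
    encoded text (hypothetical helper).
    """
    table = {}
    for ch in text:
        if ch not in standard_chars and ch not in table:
            table[ch] = len(table) + 1
    return table

def monad_syllables(number):
    """Return the syllable sequence for one "other monad" number.

    Assumption: numbers that fit in one syllable use #31; larger ones
    use #4 with the more significant base-64 digit first.
    """
    if 1 <= number <= 63:
        return [OTHER_MONAD, number]
    if 64 <= number <= 4095:
        return [MONADS_4096, number // 64, number % 64]
    raise ValueError("monad number out of range for this sketch")
```

For instance, an installation whose only nonstandard characters are '¢' and '§' would document the table {'¢': 1, '§': 2} and emit #31, #1 for every '¢'.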
In earlier versions the decoder accepted run-time input for (1) specification of transliterations for all special marks; (2) specification of notes: where the special character occurs, a pointer appears under the transliteration and the text of the note appears in the righthand margin. Another option a decoder might provide would be a specification of whether a TAO containing a special character should be transliterated. The user might not want this to occur if he was able to specify a monad to replace the special character in TAO's.

IV.4. Encoders and decoders for implementations of variants

Like the Report itself, this proposal is slanted toward the English alphabet and English identifiers in the standard prelude. Considerable thought was devoted to allowing a kind of alphabet shift, similar to a case shift, to specify that the text was constructed from characters in some other alphabet. (Possibly even providing a representation for letter-aleph!) With this scheme, however, every decoder would have had large tables of transliterations and considerable code that would in practice have been rarely used. Instead, this extra effort should be expended only by implementers for variant languages. They will transliterate into English to transport programs with the representation proposed here.

Several classes of token must be translated to English: standard bold tags, tags in the standard prelude, and letters in denotations and format texts. Other letters can be strictly transliterated in such a way that when the text is decoded back into its original language, it will be the same. The text will probably be meaningless English, but it could be compiled with an English speaking compiler. If used in this way, incidentally, there is no reason why the transportation representation cannot serve equally well to transport between two variant implementations.

IV.5. Spaces and visible spaces

Some ALGOL 68 implementations will not provide a representation for visible-space. They must still encode spaces within strings to the standard representation for visible-space. Moreover, on decoding they must ignore ordinary spaces in strings and convert visible-spaces to spaces.

IV.6. What to convert to standard codes

Certain symbols are represented by both standard graphics and bold tags; the question arises as to when to encode a bold tag as the corresponding standard symbol. For example, nil in a program may be encoded as either nil or standard code #62. If the latter option were chosen, when it was decoded it might appear as '∘', depending on the character set available. In general it is preferable to encode a bold tag as a bold tag so the decoded text is as close as possible to the original. Appendix A.2 reflects this by enclosing bold tag representations in parentheses in the fourth column. This signifies that any encoder I write will not output the standard code on the left (using the bold tag instead), but the decoder will respond to the standard code by producing the given bold tag.

A few graphics are well defined and widely available but also have bold alternatives. These I would encode as the standard code: NE, LT, LE, GE, GT, EQ, NOT, AND, OR, OF, CO. In addition, I would encode an 'e' in a real denotation as the code for times-ten-to-the-power. To keep the encoder simple I would convert style-i-sub (or bus)-symbol to open (or close)-symbol rather than to brief-sub (or bus)-symbol.

IV.7. Puns

A 'pun' occurs when two different sequences of codes map into the same ALGOL 68 text. For example, the decoder in Appendix B can generate puns six ways:

   1) Appending "x" to tags that would otherwise be mistaken for bold reserved words. (The tags "real" and "realx" both map into "realx".) The generation of the "x" is an attempt to prevent the more serious pun of converting an input roman tag (which must have been stropped in the original text) into a bold tag.

   2) Transliteration of special letters and monads. ("Zy" and code 32 both map into "Zy". The bold tag QO and the first "other monad" both map to "QO'".)

   3) Transliteration for illegal standard code operands. (ERR18 and standard code 18 both map to "ERR18'".)

   4) Transliterations for legal standard codes. (ELEM and standard code 34 (window) both map to "ELEM'".)

   5) Representation of two different nomads, asterisk and times, with "*".

   6) Representation of standard codes with diphthongs. ("+*" (as two standard codes) and standard code 41 (plus-i-times) both map to "+*".)

Puns will seldom occur in practice, but for this reason they are all the more dangerous: their possible existence may be forgotten. When they do occur, they will usually generate syntax errors during a subsequent compilation. If not, the worst has occurred, and lengthy frustration may ensue. To guard against this, the decoder must maintain a symbol table and take steps to correct all possible puns.

V. Conclusion

It should be noted that this proposal is only a small step toward machine independence. The far more crucial (and difficult) problems of arithmetic precision and bits- and bytes-widths still remain. Perhaps these can be remedied by some informal agreement that so many long's specify at least so much precision. The compiler would then give a warning message if it could not satisfy a precision request. It has been suggested that mode definitions can provide machine independence; however, denotations must have their length specified, and there is no coercion of length.

Implementors' efforts to provide encoders and decoders for the transportation representation will permit wide interchange of ALGOL 68 programs. This is one more step toward machine independence and freedom from the dictates of computer manufacturers.
References

(1) van Wijngaarden, A., et al., Almost the Revised Report on the Algorithmic Language ALGOL 68, private communication (Sept., 1973). This version is slightly more recent than the version accepted at Los Angeles and includes most of the corrections agreed on there.

(2) Hansen, W. J., A Revised ALGOL 68 Hardware Representation for ISO-code and EBCDIC, submitted to ALGOL Bulletin (Nov., 1969).

(3) Lindsey, C. H., "An ISO-Code Representation for ALGOL 68," ALGOL Bulletin 31 (March, 1970), pp. 37-60.

(4) Hall, A. D. Jr., "A 'Portable' FORTRAN IV Subset," Bell Telephone Laboratories, Murray Hill, New Jersey, Sept., 1969, unpublished; cited in Ryder, B. G., "The FORTRAN Verifier: User's Guide," Computing Science Tech. Rep. #12, Bell Telephone Laboratories, Murray Hill, New Jersey, March, 1973.

(5) Hoare, C. A. R., "Hints on Programming Language Design." Draft manuscript distributed at SIGPLAN/SIGACT Symposium on Principles of Programming Languages, Boston (Oct., 1973).

(6) Hansen, W. J., "Creation of Hierarchic Text with a Computer Display," Ph.D. Thesis, Stanford University, June, 1971.

(7) McCarthy, John, "Towards a Mathematical Science of Computation," in Information Processing, 1962 (C. M. Popplewell, Ed.), North-Holland Publ. Co., Amsterdam, 1963, pp. 21-28.

Appendix A. Specification of the ALGOL 68 Transportation Representation

An ALGOL 68 program is represented in the Transportation Representation as a sequence of six-bit syllables. Some syllables represent characters directly (#25 represents the character '9'), some sequences represent characters (#7, #33 is '&'), and some syllables only affect the state of the decoder (#11 switches to boldface). To facilitate debugging, the meaning of each code has been assigned with reference to the ISO-codes: code i is related to the ISO character at position i+32 (e.g., #33 is 'A' and ISO character 65 is 'A').

The definitions of the syllables are given in section A.1. Section A.2 specifies the representation of standard symbols and A.3 summarizes both the definitions and the standard symbols.

A.1 Syllables and their functions

a. Syllables that represent standard symbols directly:

      #0  space      #8  (      #12  ,      #27  ;
      #1  |          #9  )      #26  :      #29  :=

b. Syllables that represent digits and letters:

      #16  0         #33  A
       ...            ...
      #25  9         #58  Z

c. Syllables that represent unassigned letters:

      #32, #59, #60, #61, #62, #63

   For each of these, a decoder may substitute letters not used for codes #33 through #58 or some carefully chosen string.

d. Syllables with assigned functions:
   (Parenthesized terms indicate that succeeding input codes serve as arguments to the function syllable.)

   #14 - consecutive spaces (number of spaces)
      The argument syllable is interpreted as a number of spaces to be inserted in the output text.

   #2 - token space
      This syllable encodes a space occurring within a tag or number. It must be the first syllable in any string of typographical display features (codes #14, #15, #5) in a tag or number.

   #15 - newline (number of spaces)
      A new line of output text is begun and started with the indicated number of spaces.

   #5 - new page (number of syllables)
      A new page of output is begun and its first line initially contains the specified number of spaces. (Decoders may treat this function as identical to #15 (new line).)

   #28 - upper case
      Succeeding output letters are upper case. Case does not affect the graphic representation of digits and special characters. Decoders may ignore case shifts if the output representation has only one case or uses case for stropping. To preserve the appearance of the text, encoders for representations that strop with case should shift to bold face and the strop case simultaneously; but case shift is never to be interpreted by decoders as an indication of stropping.

   #30 - lower case
      Shift output letters to lower case. See the remarks for #28 (upper case).
   #11 - boldface
      Succeeding letters and any digits following letters are output as bold face (and stropped as necessary). Type face has no effect on special characters and digits not following letters. The boldface shift may be omitted between two bold tags if they are separated by spaces or special characters. Bold shift may be inserted to separate two bold tags which in the original were separated only by stropping characters. See the discussion in Section IV.1.

   #13 - roman face
      Succeeding characters are to be roman. This code may be omitted between two roman tags if they are separated by special characters. See the discussion in Section IV.1.

   #7 - standard symbol (symbol code)
      The next code byte selects a standard symbol (according to tables A.2 and A.3). For example, a quote-symbol is #7, #2, quote-image-symbol is #7, #7, and a visible-space is #7, #63. #7 (standard symbol) may also select any of the codes noted in section A.1.a which represent standard symbols directly (thus #7, #8 could be used for open-symbol instead of #8 alone).
      The standard symbols represent functions rather than symbols in Chapter 9 of the Report. Thus there are, for example, three standard symbols corresponding to the Report's tilde-symbol: #52 (tilde-symbol), #47 (skip-operator), and #43 (negation-operator). Moreover certain operators are represented, though they appear only in Chapter 10: for example #45 (modulo-operator). Where a symbol only functions as a single operator, only one standard symbol is provided: e.g., #36 (down-symbol) for the shift-right-operator.

   #3 - pragment mark (pragment delimiter, pragment mark, pragment, pragment mark)
      Because there are many pragment delimiters, the job of the decoder is facilitated if there is a unique code to signal the bounds of a pragment. The original pragment delimiter must be preserved, so it is encoded between two pragment marks and they are followed by the pragment and a third pragment mark.
There is a net reduction of text if the delimiter is comment or pragmat , and in other cases the encoded text can be shortened with judicious use of #10 (define). #31 - other monad (monad number) Various character sets support various other monads. If an encoder is written for a character set with N other monads, they will be assigned monad numbers one to N. A decoder can represent these with other monads available in the target character set or with a transliterated bold tag. Other monads may also appear in strings and pragments; here they represent some symbol an explanation of which should accompany the encoded text when it is transported. #U - U096 monads (first digit of monad number, second digit of monad number) Some installations will have more than 6U other monads. The excess may be encoded with this code, where the digits of the monad number are interpreted in base 6k. The remarks with #31 (other monad) apply. #6 - other letter (letter number) If the six codes allocated for other letters are not enough, Gh more letters can be represented with this code. The decoder may replace each such letter with any suitable transliteration. #10 - define (code defined, definition length, definition string) To further shorten the encoded text, single codes may be assigned a string of codes as their meaning. The codes listed in sections A.l.a and A.l.c above are readily available for reassignment, but other codes can be used. If a code is assigned a definition string of length zero, its original interpretation is restored. For example, a comment delimited by 18 hash marks would be encoded as "#3, #7 ', #3, #3, text of comment, #3. " This could be shortened if the unas signed code #3? were defined to have the value #3, #7 ' , #3, #3. The definition would be established by the sequence #10, #32, #U, #3, #7, #3, #3. A. 
2 Representations of standard symbols code representat ions hardware symbol reference notes 11+ point symbol , # 37 times ten to the power symbol 10 e 60 times ten to the power alter. \ (e) 2 quote symbol 1! it 7 quote image symbol If IT it it space symbol space space 7 63 space alternate • (space) 5^ or symbol V or 1+ •6 and symbol /\ & 1+ 33 ampersand symbol & (am) 1+ 58 differs from symbol t /=" 1+ 28 is less than symbol < < 5 6 1+1+ is at most symbol 4 <= 1+ 39 is at least symbol > >= 1+ 30 is greater than symbol > > 5 6 15 divided by symbol / L 5 6 5 over symbol • i 1+ ^ modulo operator fX 1o* 8 25 percent symbol i (es.) 1+ 3h window symbol (elem) 1+ 22 floor symbol L (fl) 1+ 21 entier operator L (entier) 8 20 lower bound operator L (lwb) 8 23 ceiling symbol r (upb) 1+ 1+1 plus i times symbol 1 +* 1+ 1+6 not symbol -1 s or ~ 1+ 52 tilde symbol r+j (tl) 1+ hi skip operator rsj (skip) 8 h3 negation operator f*j (ng) 8 36 down symbol I (shr) 1+ 53 up symbol t (he) 1+ 50 raised to operator t (**) 8 1 This code must follow the function code #7 (standard symbol) to invoke the symbol or operator shown in the second column. 2 See Hardware Representation (2). Parentheses around a graphic indicate the encoder (as discussed in section IV. 6) will never output the corresponding code, but in response to the code the decoder in Appendix B will produce the graphic. More notes on next page, code symbol representations reference hardware 19 notes 35 shift left operator r (shl) 8 kQ power operator X-* -*"* 8 11 plus symbol f + k 13 minus symbol - k 16 equals symbol = 5 6 10 times symbol X •* 5 6 51 asterisk symbol x- (*) 5 6 17 assigns to symbol = : = : 6 29 becomes symbol : = : = 6 7 12 comma symbol i > 7 27 semicolon symbol 5 5 7 26 colon symbol : : 7 8 open symbol ( ( 7 9 close symbol ) 7 1 stick symbol ? 
7 55 again symbol : 9 • 59 brief sub symbol (b 61 brief bus symbol i 0) 32 at symbol @ @ 57 is symbol := : 56 is not symbol £ :/=: 62 nil symbol 3 (nil) 38 of symbol -c -< k formatter symbol $ $ 31 brief comment symbol f. i CO 3 3 style ii comment symbol ? \ r 3 In ISO/ASCII, co maps to co and {. . . ) maps to ^ ... ji. k This standard symbol is a monad defined in the Report. 5 Nomad defined in the Report. 6 Because in a TAO this code may follow a transliterated monad, the following transliterations are sometimes used - code 28 30 15 16 10 51 17 29 symbol < > / = X * =: ;= transliteration LT GT DV EQ TM ST TO AB 7 The code at the left is a function code that will invoke this standard symbol directly. (E.g., #29 alone may be used instead of #7> #29 to encode the becomes- symbol. ) 8 This symbol is not a monad and is so treated by the decoder in Appendix B, A. 3 Summary of codes and standard symbols 20 code ISO function std. symbol code ISO function std. symbol space space space 32 @ @ 1 i 33 A A & 2 n token space n 3^ B B a 3 # pragm* snt # 35 C C t (shl) k $ U096 monads $ 36 D D I 5 i new page • • 37 E E 10 6 & other letter •\ 38 F F -c 7 • std. : symbol IT IT 39 G G ^ 8 ( ( ( ko H H 9 ) ) ) la I I 1 10 * define X k2 J J 11 + bold + h3 K K ~(neg) 12 > > ) kk L L £ 13 - roman - h5 M M ;x Ik • spaces m k6 N N -1 15 l new line / hi ~ ( skip ) 16 - kS P P ■**- 17 1 1 * h9 Q Q 18 2 2 50 R R t (pow) 19 3 3 51 S S * 20 k k L(lwb) 52 T T /Nrf 21 5 5 |_ (entier) 53 U U t 22 6 6 L 5^ V V ss 23 7 7 r 55 w W 1 : 2k 8 8 56 X X -4- 25 9 9 i 57 Y Y 26 : : '. 58 Z Z '7 27 5 > > 59 [ [ 28 < upper case < 60 \ \ 29 = : = : = 61 ] i 30 > lower case > 62 'N 31 9 other monad £ 63 « code ISO function std. symbol value of six-bit code byte to invoke this function and standard symbol. graphic in position code+ 32 in the ISO code. the action or graphic selected by this code as a function. standard symbol selected if this code appears as an operand to code 7. 21 Appendix B. 
Decoder from the Transportation Representation to a Hardware Representation

B.1 The hardware representation

The decoder is written in, and produces text in, the ISO/ASCII/EBCDIC hardware representation defined in (2). The principal differences from the reference language are '-<' for of, '?' for then and else, and in or out for '|' meaning in or out. With three exceptions, bold tags defined in the Report are reserved as recommended in (2). User-defined bold tags and i, im, and re must be marked by a final apostrophe. Case is optional for all tokens, but in writing the program below I exercised this option to make bold tags upper case. (The program output depends on the case specified by its input.) Tags and numbers may contain typographical display features (R9.4d), but each string of them in a tag must contain at least one underline ("_").

As a concession to this decoder, the compiler allows the following diphthongs as monads: +*, >=, <=, and /=. (+* is the representation of the plus-i-times-symbol.) This provision allows operators like >=< and /=/, but no ambiguity arises. Indeed, no ambiguity would arise if any number of nomads were allowed in operators, instead of at most two.

B.2 Description of the decoder

The input is on the file 'CODE_IN', a stream of six-bit code syllables as defined in Appendix A. The output is a "reconstituted" ALGOL 68 program on the file 'RA68_OUT'; a listing is also produced on 'STAND_OUT'.

In general, the main routine of the decoder reads a code from the input and uses its value to select and execute an element of the 'row-proc-void' CODE_FUNCTION. The element is usually one of the procedures whose name begins with 'PUT_'. These in turn append characters to the output stream with the operator '+/:='.

Because typographical display features are illegal in bold tags and because string continuations may not be indented, it is necessary for the decoder to keep track of what type of token it is processing.
At each point that might be the start of a token, CHANGE_STATE is called. If necessary, it terminates the old token and starts the next; it may introduce blanks or stropping characters. These functions are controlled by an automaton with nine states and nine inputs as shown in Figure B.1. Before exiting, CHANGE_STATE usually calls SET_BREAK to store the location of the beginning of the new token. If the old token extends beyond the maximum line length, SET_BREAK calls BREAK_LINE to interrupt the line. In most cases, the entire old token is moved to the new line and the new line is indented seven spaces more than the last indentation specified with a new line code.

Certain inputs are ignored:
- newlines in strings,
- all spaces in a tag following a 'token space' code,
- spaces following a point where a line was broken (if the break was due to spaces on the previous line).

Errors are treated by just counting them. In a production decoder, they would be indicated appropriately on the STAND_OUT listing. I wrote the code to do this, but it added bulk without light and was discarded from the current version. The following errors are detected:
- A 'token space' code used outside a tag or number.
- A 'token space' code in a tag but not followed by a continuation of the tag.
- A 'token space' code in a number, but not followed by a continuation of the number.
- An illegal operand for 'standard code'.
- A TAO that has too many nomads.
- A 'visible space' occurring outside a string.

An appropriate action is taken in each case, and processing continues.
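The dispatch scheme described in this section can be illustrated with a small sketch. It is written in Python rather than ALGOL 68, and it is only an approximation under stated assumptions: the name `decode`, the `STD_SYMBOLS` subset, and the handler names are hypothetical; the token automaton, line breaking, and code definitions (code 10) are left out; and every input value is assumed to lie in the range 0 to 63. It shows one handler per six-bit code, a handler for code 7 that fetches its own operand, and errors that are merely counted.

```python
# Hypothetical sketch of table-driven decoding; not the decoder of Appendix B.
# STD_SYMBOLS is a small illustrative subset of the standard symbols.
STD_SYMBOLS = {8: "(", 9: ")", 11: "+", 12: ",", 13: "-",
               26: ":", 27: ";", 29: ":="}
DIRECT_CODES = (8, 9, 12, 26, 27, 29)   # function codes that emit a symbol directly


def decode(codes):
    """Decode an iterable of six-bit codes; return (text, error_count)."""
    stream = iter(codes)
    out = []                                  # plays the role of the output line
    status = {"upper": False, "errors": 0}

    def error(_code=None):
        status["errors"] += 1                 # errors are only counted

    def put_std(code):
        if code in STD_SYMBOLS:
            out.append(STD_SYMBOLS[code])
        else:
            error()

    def put_std_operand(_code):
        operand = next(stream, None)          # code 7 consumes the next code
        if operand is None:
            error()
        else:
            put_std(operand)

    def put_digit(code):
        out.append(chr(code + 32))            # codes 16..25 give '0'..'9'

    def put_letter(code):
        letter = chr(code + 32)               # codes 33..58 give 'A'..'Z'
        out.append(letter if status["upper"] else letter.lower())

    def set_case(code):
        status["upper"] = (code == 28)        # 28 = upper case, 30 = lower case

    table = [error] * 64                      # one handler per six-bit code
    table[0] = lambda _code: out.append(" ")
    table[7] = put_std_operand
    table[28] = table[30] = set_case
    for code in DIRECT_CODES:
        table[code] = put_std
    for code in range(16, 26):
        table[code] = put_digit
    for code in range(33, 59):
        table[code] = put_letter

    for code in stream:                       # main loop: select and execute
        table[code](code)
    return "".join(out), status["errors"]


print(decode([28, 33, 30, 29, 17, 7, 11, 18]))
```

In this sketch the sequence 28, 33, 30, 29, 17, 7, 11, 18 selects upper case, emits the letter A, selects lower case, emits the becomes symbol directly, emits the digit 1, and then uses code 7 with operand 11 to emit the plus symbol before the digit 2. Unlike the decoder below, it inserts no spaces between tokens.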
Figure B.1. The automaton (nine states, nine inputs) that controls CHANGE_STATE.

B.3 The Decoder

BEGIN COMMENT Decoder for Transportation Representation COMMENT

# The perm_code_function array is the permanent list of procedures
  to be executed for each of the 64 input codes. To provide for
  definitions, the permanent array is copied into the code_function
  array, which is modified when a definition occurs. The main
  program will be
       WHILE decoding DO code_function (getch) OD
  where getch gets the next code from the input file, code_in. #

(0:63) PROC VOID perm_code_function, code_function;
FOR i FROM 16 TO 25 DO perm_code_function (i) := put_dig OD;
FOR i FROM 32 TO 63 DO
   perm_code_function (i) := VOID: put_let (curr_code) OD;
perm_code_function (0:15) := (0:15) PROC VOID
   (#0  space         # VOID: put_spaces (1),
    #1  '|'           # VOID: put_std (1),
    #2  token space   # VOID: put_toksp,
    #3  pragment      # VOID: put_pragment,
    #4  4096 monads   # VOID: put_bold_monad ("Q" + whole (64*getch+getch+64, 0)),
    #5  new page      # VOID: put_line (getch),
    #6  other letters # VOID: put_let (getch+64),
    #7  std. symbol   # VOID: put_std (getch),
    #8  '('           # VOID: put_std (8),
    #9  ')'           # VOID: put_std (9),
    #10 define        # VOID: set_define,
    #11 boldface      # VOID: (curr_face := bold_face; tagstart := TRUE),
    #12 ','           # VOID: put_std (12),
    #13 romanface     # VOID: (curr_face := roman_face; tagstart := TRUE),
    #14 spaces        # VOID: put_spaces (getch),
    #15 newline       # VOID: put_line (getch) );
perm_code_function (26:31) := (26:31) PROC VOID
   (#26 ':'           # VOID: put_std (26),
    #27 ';'           # VOID: put_std (27),
    #28 upper case    # VOID: curr_case := upper_case,
    #29 ':='          # VOID: put_std (29),
    #30 lower case    # VOID: curr_case := lower_case,
    #31 other monads  # VOID: put_bold_monad ("Q" + whole (getch, 0)));
code_function := perm_code_function;

# Line breaks and token delimitation are controlled by a finite state
  automaton which is activated at places of possible start of token.
  The states and input symbols have these mnemonics: #
INT out_st = 1, tao_st = 2, bold_st = 3, tag_st = 4, tagsp_st = 5,
    num_st = 6, numsp_st = 7, pment_st = 8, str_st = 9;
INT space_in = 1, toksp_in = 2, std_in = 3, tao_in = 4, pment_in = 5,
    rom_in = 6, bold_in = 7, dig_in = 8, quot_in = 9;

# These flags also maintain state information #
BOOL unspaced := TRUE,
        # TRUE except during that part of a tag that follows a space
          in the tag (which is replaced by an underbar) #
     tagstart := TRUE,
        # set TRUE whenever the next digit or letter could start a token #
     skipping_spaces := FALSE;
        # TRUE after underbar inserted in tag or sometimes after
          breaking a line. Causes spaces to be ignored.
#
INT curr_state := out_st;   # remember what state we are in #

PROC change_state = (INT input) VOID:
   (INT next_state := (input IN out_st, out_st, out_st, tao_st, pment_st,
                                tag_st, bold_st, num_st, str_st);
    tagstart := input <= pment_in;
    CASE curr_state IN
    #1 out_st #
       IF input /= toksp_in
       THEN set_break;
            IF input /= space_in THEN skipping_spaces := FALSE FI
       FI,
    #2 tao_st #
       (tao_cnt := 0;
        IF input = std_in OR input = tao_in THEN set_break FI),
    #3 bold_st #
       IF input /= dig_in
       THEN IF NOT is_reserved (last_tok_out)
            THEN +/:= "'"
            ELIF (input = bold_in OR input = rom_in) & NOT line_has_left (0)
            THEN +/:= " "
            FI;
            set_break;
            tao_cnt := 0
       FI,
    #4 tag_st #
       IF input /= rom_in AND input /= dig_in
       THEN IF input = toksp_in
            THEN next_state := tagsp_st;
                 skipping_spaces := TRUE;
                 unspaced := FALSE
            ELSE IF unspaced & is_reserved (last_tok_out)
                 THEN +/:= "_" FI;
                 unspaced := TRUE;
                 IF input = bold_in & NOT line_has_left (0)
                 THEN +/:= " " FI
            FI;
            set_break
       FI,
    #5 tagsp_st #
       IF input = space_in
       THEN next_state := tagsp_st
       ELIF input = rom_in OR input = dig_in
       THEN skipping_spaces := FALSE;
            next_state := tag_st
       ELIF input = toksp_in
       THEN next_state := tagsp_st;
            set_break
       ELSE error ("Tag space not followed by more tag.");
            unspaced := TRUE;
            skipping_spaces := FALSE;
            set_break
       FI,
    #6 num_st #
       IF input /= dig_in
       THEN IF input = rom_in & curr_code = 50 # an "r" #
            THEN next_state := num_st
            ELSE IF (input = rom_in OR input = bold_in OR
                     (input = toksp_in ? next_state := numsp_st; TRUE ? FALSE))
                    & NOT line_has_left (0)
                 THEN +/:= " "
                 FI;
                 set_break
            FI
       FI,
    #7 numsp_st #
       IF input = space_in
       THEN next_state := numsp_st
       ELIF input = toksp_in
       THEN next_state := numsp_st;
            set_break
       ELIF input = dig_in
       THEN set_break
       ELIF input = rom_in & curr_code = 50 # an "r" #
       THEN next_state := num_st
       ELSE error ("Space in number not followed by number.");
            set_break
       FI,
    #8 pment_st #
       (set_break;
        next_state := (input = pment_in ? out_st ?
                                          pment_st);
        IF input = quot_in THEN tagstart := TRUE FI),
    #9 str_st #
       IF input = quot_in
       THEN tagstart := TRUE;
            next_state := out_st
       ELSE tagstart := FALSE;
            next_state := str_st
       FI
    ESAC;
    curr_state := next_state);   # end of change_state #

# The following 'put_' routines are called on in response to
  specific codes from the input stream #

PROC put_dig = VOID:
   (IF tagstart THEN change_state (dig_in) FI;
    +/:= "0123456789" (curr_code - 16 # a "0" # + 1));

INT roman_face := rom_in, bold_face := bold_in,
    upper_case = 2, lower_case = 1;
INT curr_face := roman_face, curr_case := lower_case;

PROC put_let = (INT let) VOID:
   (IF tagstart OR curr_state = num_st
    THEN change_state (curr_face) FI;
    +/:= IF let > 63 THEN "Z" + whole (let, 0)
         ELIF let >= 59 THEN "Z" + "aeiou" (let-58)
         ELIF let = 32 THEN "Zy"
         ELIF curr_case = upper_case
         THEN "ABCDEFGHIJKLMNOPQRSTUVWXYZ" (let-32)
         ELSE "abcdefghijklmnopqrstuvwxyz" (let-32)
         FI);

PROC put_toksp = VOID:
   (IF curr_state <= bold_st OR curr_state >= pment_st
    THEN error ("Token space outside tag or number.") FI;
    change_state (toksp_in));

PROC put_spaces = (INT nsp) VOID:
   (change_state (space_in);
    IF NOT skipping_spaces THEN +/:= nsp*" " FI);

PROC put_line = (INT nsp) VOID:
   IF curr_state /= str_st
   THEN change_state (space_in);
        IF outline /= brind
        THEN # there is text # output_string (outline) FI;
        breakpoint := nsp+1;
        brind := outline := indent := nsp*" "
   FI;

BOOL pragbody := FALSE;   # TRUE while outputting pment body #
STRING pragdel;           # save delimiter for end of pment #
PROC put_pragment = VOID:
   IF curr_state /= pment_st
   THEN change_state (pment_in);
        pragbody := FALSE
   ELIF NOT pragbody
   THEN pragbody := TRUE;
        pragdel := last_tok_out
   ELSE +/:= pragdel;
        change_state (pment_in)
   FI;

PROC put_std = (INT stdno) VOID:   # one of the standard symbols #
   CASE std_type (stdno) IN
   #1 monad #  put_monad (std_repr (stdno)),
   #2 nomad #  put_nomad (stdno),
   #3 :=, =: # put_asgn (stdno),
   #4 space #  put_spaces (1),
   #5 " #      (change_state (quot_in);
                skipping_spaces := (curr_state = str_st);
                +/:= std_repr (stdno)),
   #6 error #  (error ("Unknown standard code.");
                change_state (std_in);
                +/:= std_repr (stdno)),
   #7 other #  (change_state (std_in);
                +/:= std_repr (stdno)),
   #8 boldother # (change_state (bold_in);
                +/:= std_repr (stdno);
                tagstart := TRUE),
   #9 boldmonad # put_bold_monad (std_repr (stdno)),
   #10 visible space #
       IF curr_state = str_st
       THEN +/:= " "
       ELSE error ("Visible space outside string.");
            put_spaces (1)
       FI
   ESAC;   # end of put_std #

INT tao_cnt := 0;           # keeps track of how much tao has been put #
BOOL trans_tao := FALSE;    # TRUE if had to transliterate monad #

PROC put_monad = (STRING mon_repr) VOID:
   (change_state (tao_in);
    tao_cnt := 1;
    trans_tao := FALSE;
    +/:= mon_repr);

PROC put_bold_monad = (STRING mon_repr) VOID:
   (change_state (bold_in);
    tao_cnt := 1;
    trans_tao := TRUE;
    +/:= mon_repr);

PROC put_nomad = (INT nomad_num) VOID:
   (CASE tao_cnt + 1 IN
    # 0 # (change_state (tao_in);
           tao_cnt := 1;
           trans_tao := FALSE),
    # 1 # tao_cnt := 2,
    # 2 # error ("TAO too long.")
    ESAC;
    +/:= (NOT trans_tao ? std_repr (nomad_num)
          ? INT x;
            char_in_string (REPR nomad_num, x, nomad_codes);
            "TMDVEQLTGTST" (2*x-1 : 2*x)));

PROC put_asgn = (INT asgn_num) VOID:
   (IF tao_cnt = 0
    THEN change_state (tao_in);
         trans_tao := FALSE
    FI;
    +/:= IF tao_cnt > 0 & trans_tao
         THEN (asgn_num = 17 ? "TO" ? "AB")
         ELSE std_repr (asgn_num)
         FI;
    tao_cnt := 0);

(0:63) STRING std_repr;   # usual text for each standard code #
(0:63) INT std_type;      # for CASE in put_std #
BEGIN # this block is a documentary device. It is used to introduce 't'
        so as to list in parallel the initial values for std_repr
        and std_type #
   INT monad = 1, nomad = 2, asgn = 3, space = 4, quote = 5, error = 6,
       other = 7, bold_other = 8, bold_monad = 9, visible_space = 10;
   (0:63) STRUCT (STRING std_repr, INT std_type) t =
     ((# 0 space    # " ",         space),
      (# 1 stick    # "?",         other),
      (# 2 quote    # """",        quote),
      (# 3 hash     # "#",         other),
      (# 4 dollar   # "$",         other),
      (# 5 over     # "%",         monad),
      (# 6 and      # "&",         monad),
      (# 7 q image  # """""",      other),
      (# 8 l paren  # "(",         other),
      (# 9 r paren  # ")",         other),
      (#10 times    # "*",         nomad),
      (#11 plus     # "+",         monad),
      (#12 comma    # ",",         other),
      (#13 minus    # "-",         monad),
      (#14 point    # ".",         other),
      (#15 slash    # "/",         nomad),
      (#16 equal    # "=",         nomad),
      (#17 assigns  # "=:",        asgn),
      (#18          # " ERR18' ",  error),
      (#19          # " ERR19' ",  error),
      (#20 flr(lwb) # "LWB",       bold_other),
      (#21 flr(ent) # "ENTIER",    bold_other),
      (#22 floor    # "FL",        bold_monad),
      (#23 ceiling  # "UPB",       bold_monad),
      (#24          # " ERR24' ",  error),
      (#25 percent  # "PC",        bold_monad),
      (#26 colon    # ":",         other),
      (#27 semicol  # ";",         other),
      (#28 lessthan # "<",         nomad),
      (#29 becomes  # ":=",        asgn),
      (#30 greater  # ">",         nomad),
      (#31 cent     # "CO",        bold_other),
      (#32 at       # "@",         other),
      (#33 amper    # "AND",       bold_monad),
      (#34 window   # "ELEM",      bold_monad),
      (#35 up(shl)  # "SHL",       bold_other),
      (#36 down     # "SHR",       bold_monad),
      (#37 subten   # "e",         other),
      (#38 of       # "-<",        other),
      (#39 grt eq   # ">=",        monad),
      (#40          # " ERR40' ",  error),
      (#41 plus i   # "+*",        monad),
      (#42          # " ERR42' ",  error),
      (#43 til(not) # "NOT",       bold_other),
      (#44 less eq  # "<=",        monad),
      (#45 mod      # "%*",        other),
      (#46 not      # "¬",         monad),
      (#47 til(skp) # "SKIP",      bold_other),
      (#48 **       # "**",        other),
      (#49          # " ERR49' ",  error),
      (#50 up(pow)  # "**",        other),
      (#51 asterisk # "*",         nomad),
      (#52 tilde    # "TL",        bold_monad),
      (#53 uparrow  # "UP",        bold_monad),
      (#54 or       # "OR",        bold_monad),
      (#55 again    # "?:",        other),
      (#56 isnt     # ":/=:",      other),
      (#57 is       # ":=:",       other),
      (#58 differs  # "/=",        monad),
      (#59 sub      # "(/",        other),
      (#60 backsls  # "\",         other),
      (#61 bus      # "/)",        other),
      (#62 nil      # "NIL",       bold_other),
      (#63 vis spc  # " ",         visible_space));
   std_repr := std_repr -< ...
   ... > reserved_limit
   THEN FALSE
   ELSE (1:number_reserved) STRING reserved_word =
          ("ABS", "AND", "ARG", "AT", "BEGIN", "BITS", "BOOL", "BY",
           "BYTES", "CASE", "CHANNEL", "CHAR", "CO", "COMMENT", "COMPL",
           "CONJ", "DIVAB", "DO", "DOWN", "ELEM", "ELIF", "ELSE", "EMPTY",
           "END", "ENTIER", "EQ", "ESAC", "EXIT", "FALSE", "FI", "FILE",
           "FLEX", "FOR", "FORMAT", "FROM", "GE", "GO", "GOTO", "GT",
           "HEAP", #I# "IF", #IM# "IN", "INT", "IS", "ISNT", "LE", "LENG",
           "LEVEL", "LOC", "LONG", "LT", "LWB", "MINUSAB", "MOD", "MODAB",
           "MODE", "NE", "NIL", "NOT", "OD", "ODD", "OF", "OP", "OR",
           "OUSE", "OUT", "OVER", "OVERAB", "PAR", "PLUSAB", "PLUSTO",
           "POW", "PR", "PRAGMAT", "PRIO", "PROC", #RE# "REAL", "REF",
           "REPR", "ROUND", "SEMA", "SHL", "SHORT", "SHORTEN", "SHR",
           "SIGN", "SKIP", "STRING", "STRUCT", "THEN", "TIMESAB", "TO",
           "TRUE", "UNION", "UP", "UPB", "VOID", "WHILE");
        STRING uctag := tag;
        FOR i TO UPB uctag DO
           INT loc;
           IF char_in_string (uctag (i), loc, "abcdefghijklmnopqrstuvwxyz")
           THEN uctag (i) := "ABCDEFGHIJKLMNOPQRSTUVWXYZ" (loc) FI
        OD;
        INT loc := 64, incr := 32;
        BOOL found := FALSE;
        TO 7 WHILE NOT found DO
           IF reserved_word (loc) > uctag
           THEN loc -:= incr
           ELIF reserved_word (loc) = uctag
           THEN found := TRUE
           ELSE loc +:= incr;
                IF loc > number_reserved
                THEN loc := number_reserved FI
           FI;
           incr %:= 2
        OD;
        found
   FI);   # end of is_reserved #

# input of codes & definition of codes (code 10) #
INT curr_code;     # filled by getch #
CHAR input_char;   # used by getch and logical_file_end #
FILE code_in;
open (code_in);
make_conv (code_in, complete_conv);
on_logical_file_end (code_in, (REF FILE f) BOOL:
   (input_char := REPR 5;   # new page code #
    decoding := FALSE;
    TRUE));

# definitions are stored in the following #
(0:63) FLEX (1:0) INT definition;   # store definition #
FLEX (1:0) INT inputs;   # copy definition for reading by getch #
INT inptr := -1;         # pointer into inputs used by getch #

PROC set_define = VOID:   # called in response to code 10 #
   IF INT code = getch; INT len = getch;   # not collateral #
      len = 0
   THEN # restore prescribed meaning #
        code_function (code) := perm_code_function (code);
        definition (code) := ()
   ELSE # define the code to be the next 'len' codes #
        code_function (code) := put_defn;
        definition (code) := HEAP (1:len) INT;
        FOR i TO len DO definition (code)(i) := getch OD
   FI;

PROC put_defn = VOID:   # code_function (defined code) = put_defn #
   (INT ui = UPB inputs - inptr + 1,
        ud = UPB definition (curr_code);
    (1 : ui+ud) INT tin;
    tin (1:ud) := definition (curr_code)(:);
    tin (ud+1:) := inputs (inptr:);
    inputs := tin;
    inptr := 0);

PROC getch = INT:   # called on to get a code from input #
   IF inptr < 0
   THEN get (code_in, input_char);
        curr_code := ABS input_char
   ELIF inptr >= UPB inputs
   THEN inptr := -1;
        getch
   ELSE curr_code := inputs (inptr +:= 1)
   FI;

# output routines and routines to access output string: outline #
STRING outline := "",   # line being built #
       indent := "",    # blanks put at start of line by new line #
       brind := "";     # blanks put at start of most recent line
                          (as possibly broken) #
INT mxln = 72,          # maximum number of characters in line #
    breakpoint := 1,    # start of most recent token
                          (a potential line break spot) #
    linenumber := 0;    # for numbering output lines #
FILE ra68_out;
open (ra68_out);        # the reconstituted text #

OP +/:= = (STRING s) VOID: outline +:= s;

PROC line_has_left = (INT n) BOOL: mxln - UPB outline >= n;

PROC last_tok_out = STRING: outline (breakpoint:);

PROC set_break = VOID:
   (WHILE UPB outline > mxln DO break_line OD;
    breakpoint := UPB outline + 1);

PROC output_string = (STRING line) VOID:
   (STRING tline = line + (mxln - UPB line + 1)*" "
                   + whole (linenumber +:= 100, 7);
    put (ra68_out, tline);
    put (standout, tline));

PROC break_line = VOID:   # terminate input line longer than mxln #
   IF breakpoint = UPB brind + 1 & brind /= "" & curr_state = bold_st
   THEN # no breakpoint, unindent line #
        outline := last_tok_out;
        breakpoint := 1
   ELSE # move last token to new line rather than split it #
        brind := # indent for rest of line #
           (UPB indent > mxln%2 + 5 ? (mxln%2)*" " ? indent+7*" ");
        IF breakpoint = 1
        THEN # must split token anyway #
             breakpoint := mxln+1;
             IF curr_state = bold_st
             THEN # truncate bold #
                  outline := outline (1:mxln)
             ELIF curr_state = str_st
             THEN brind := ""
             ELIF curr_state = tag_st
             THEN # insert "_" #
                  outline := outline (1:mxln) + "_" + outline (mxln+1:)
             FI
        FI;
        IF curr_state = out_st
        THEN skipping_spaces := TRUE FI;
        output_string (outline (1:mxln));
        outline := brind + last_tok_out;
        breakpoint := UPB brind + 1
   FI;   # end break_line #

# errors #
INT nerrs := 0;
PROC error = (STRING msg) VOID: nerrs +:= 1;
   # in an implementation, the "msg" would be printed with a
     pointer to its location #

# main program #
BOOL decoding := TRUE;   # set FALSE by logical_file_end(code_in) #
WHILE decoding DO code_function (getch) OD;
put (standout, "There were " + whole (nerrs, 0) + " errors.")
END;   # of decoder #

BIBLIOGRAPHIC DATA SHEET
1. Report No.
UIUCDCS-R-74-622
Title and Subtitle: Transportation of Higher-Level Language Programs: Exemplified by an ALGOL 68 Transportation Representation
5. Report Date: January, 1974
Author(s): Wilfred J. Hansen
Performing Organization Name and Address: Department of Computer Science, University of Illinois, Urbana, Illinois 61801
Sponsoring Organization Name and Address: Department of Computer Science, University of Illinois, Urbana, Illinois 61801
19. Security Class (This Report): UNCLASSIFIED
20. Security Class (This Page): UNCLASSIFIED