UNIVERSITVOF The person charging this material is re- sponsible for its return to the library from which it was withdrawn on or before the Latest Date stamped below. Theft, mutilation, and underlining of books are reasons for disciplinary action and may result in dismissal from the University. To renew call Telephone Center, 333-8400 UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN TOR 2 2 1«3 JAN 2 1987 DEC 15 19)7 SEP 8 1 989 L161— O-1096 n% W /I/O, /0 J S " UIUCDCS-R-80-1025 /ricChh UILU-ENG 80 1717 &f-2- APPLICATION OF THE FAULT- TOLERANT DEADLINE MECHANISM TO A SATELLITE ON-BOARD COMPUTER SYSTEM by S June 1980 A. Wei, K. Hiraishl R. Cheng, R. Campbell (auJ iimOIS i«sgs£S§wwa» URB&.NA Digitized by the Internet Archive in 2013 http://archive.org/details/applicationoffau1025weia UIUCDCS-R-80-1025 APPLICATION OF THE FAULT- TOLERANT DEADLINE MECHANISM TO A SATELLITE ON-BOARD COMPUTER SYSTEM by A. Wei, K. Hiraishi R. Cheng, R. Campbell June 1980 DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN URBANA, ILLINOIS 61801 Supported by NASA NSG 1471. Page 1 1 Introduction * A real - time system provides a set of services subject to timing con- straints. The design of such system is often expensive and error prone because of concurrency and timing constraints. Moreover, many highly reliable applications require the use of real-time computer systems; such as fly-by- wire passenger aircraft, nuclear reactor safety systems, etc. While several approaches to reliable hardware exist, there are few methodologies that permit the construction of reliable software. Fault-tolerant software is required to prevent system failure caused by accident, sabotage, or residual hardware/software design faults (that is, faults not detected by proof, inspection, or testing). This paper discusses fault-tolerance in high level programming languages for real-time systems and reviews the motivation and design of the deadline mechanism [1]. An application of the mechanism to a satellite on-board computer system is described which demonstrates the poten- tial of the deadline mechanism for programming fault-tolerant real-time processes. The definitions of concepts involved in real-time fault-tolerance are based on the paper [1]. It is assumed that timing constraints are imposed upon a system by its specification. Interactions between the system and an environment are performed by service components of the system. These service components receive requests from the environment, perform a service, and then respond to the environment. A system clock is assumed to be an essential component of a real-time system. System time is part of the internal state of the real-time system. It is assumed that consistent views of the times in external environments can Page 2 be obtained as functions of the system clock. Associated with each service component is a request time (the value of the system clock when a request is detected) and a response time (the value of the system clock by which a response must be completed). A response period is the maximum allowable amount of system time that can elapse between a request time for a particular service and the corresponding response time for that service. The response period is part of the specification of the system. A request period is the interarrival time of requests for the service measured with respect to the system clock and is also part of the specification. A deadline is the system time by which a system must respond to a request and is the sum of the request time and response period for a given service component. An execution period is the maximum amount of system time required to execute a block of instructions assuming no residual faults and uninterruptability. This is a measurable quantity and is determined for particular blocks of instructions executing on specific processors. The execution period of a ser- vice component is called a service period . 2 Problems Associated with Real-Time Systems . 2 . 1 Reliability Issues in Real-Time Systems . In real-time systems, software is a vital component whose failure may result in catastrophe. For example, in defense missile systems the possibil- ity of software failure can make the whole system unacceptable. In an aircraft control system a failure may lose human lives, and in nuclear reactor safety Page 3 systems a failure may result in a disaster. Hence the requirement that real- time systems meet their specifications is crucial. Reliable software has been traditionally obtained through fault - avoidance techniques. Minimizing system complexity, top-down modular design, structured programming, and proving program correctness are all efforts to achieve this goal. However, a fault-tolerance approach [2] is required because of at least the following reasons: 1) It is always impossible to guarantee the absence of faults, 2) The delays and interruptions of real-time systems (air traffic con- trol, process control, etc. ) caused by manual repair actions is unacceptable, 3) Manual intervention is inaccessible for some systems (space, under- sea, etc.), and 4) High cost of lost time and of manual maintenance is unacceptable in many installations. One critical requirement associated with real-time system reliability is that the timing constraints are met. 2.2 Timing Faults in Real-Time Systems . A real-time system that can always respond to a request before the deadline is characterized as timely . A timing failure occurs when a real-time system violates one of the timing constraints in its specification. A timing fault causes a timing error, and a timing error is defined as information representing an incorrect timing relationship in the internal state of the system. Timing faults may exist because of the following situations: 1) miscalculation of the execution periods of some algorithms, Page 4 2) transient hardware faults, 3) unbounded algorithms due to loops or recursions, 4) system overload, 5) resource contention, either of software resources or hardware resources , and 6) system deadlock. Timing faults may be included in software despite control over the system specification, design, and implementation. Timing errors may also occur because of hardware or some external system failure not documented in the requirement. In order to cope with unexpected timing errors, we need an error recovery mechanism. 2.3 Problems with Assembly Languages . Many real-time systems are implemented in assembly language. Some of the reasons for using assembly language are that the interrupts can be con- trolled explicitly with assembly language, machine-dependent features are easier to program, and the execution time is easy to estimate and usually smaller than that of using high level language. However, the proof of correct- ness of an assembly language program is difficult and the specification of timing constraints in assembly language programs is almost impossible. Simu- lation, repeated testing, and program redundancy can be used to improve system reliability. However, simulations may be lengthy and it is frequently diffi- cult to simulate the system closely. Lack of time may prevent repeated testing from attempting all the possible contingencies and input values. The provi- sion of redundancy in assembly language programs is ad hoc. Furthermore, these methods do not guarantee the timing constraints will be met. Finally, Page 5 systems developed in assembly language are usually very expensive. 2.4 Problems with High Level Languages . With advancing technology in high speed computer hardware and increas- ing complexity of software systems, the current trend is to develop high level programming languages to support real-time systems [3] . HAL/S [4] developed by Intermetric Inc. has been used to program the Space Shuttle control system. Modula [5], Concurrent Pascal [6], ILIAD [7], Ada [8] and Path Pascal [9, 10] are all efforts in this direction. The advantage of high level system program- ming languages is obvious; it facilitates programming, understanding, modifi- cation, and most importantly, maintenance of the system. Other advantages are portability and reliability. Reliability is greatly enhanced by the use of high level languages for real-time systems because program verification is simplified. Timing con- straints are an important aspect of reliability in real-time systems and should be rigorously specified. Verification that these constraints are satisfied is, however, very difficult. It is important to provide a sys- tematic approach to fault-tolerance in highly reliable real-time systems. Most existing high level languages, however, do not specify timing con- straints. Furthermore, there is no generalized methodology in existing high level languages to tolerate timing faults or to recover from timing errors. Page 6 3 Fault-Tolerance in High Level Languages . 3. 1 Fault - Tolerance in Real - Time Systems . 3.1.1 Recovery Block Scheme . Fault-tolerant design techniques have been used for a long time to improve hardware reliability, but only recently have they been proposed to achieve high software reliability [11, 12, 13]. The recovery block scheme [11, 14] has been proposed as a mechanism to support fault-tolerant software. A simple recovery block is shown below. ensure acceptance_test by primary else by alternate else error ; The acceptance_test is an assertion about the results computed by the primary or the alternate. If the acceptance test is false after a primary execution, the alternate is executed. Errors detected implicitly by the hardware or software during the execution of the primary or the alternate automatically set the acceptance test to false. If both the primary and alternate of a recovery block fail to satisfy the acceptance test, a software error is gen- erated. The alternate performs the same function as the primary but may use a different algorithm. A hardware device called the recovery cache [15] is used to restore the state after a primary failure to the state prior to entry to the recovery block. The recovery block scheme has been implemented in Sequen- tial Pascal [16] and hardware caches have been constructed for the PDP-11 [17]. The recovery block scheme is applied by Hecht to real-time applications [18]. Hecht augments the acceptance test with a watchdog timer that monitors the execution within a specified response period. Problems arise, however, if Page 7 recovery blocks are applied to software systems whose services must meet dead- lines. The deadline mechanism [1] , inspired by the recovery block scheme, is proposed as a means to systematically construct fault-tolerant real-time sys- tems with guaranteed response time. 3.1.2 Deadline Mechanism . The deadline mechanism, although similar to the recovery block scheme, performs a different function. Recovery blocks are insensitive to the passage of time and cannot be used to ensure that a response is generated by a specific deadline. (That is, it is impossible to recover from a missed dead- line by resetting the system clock and executing an alternate.) Recovery blocks do provide alternate algorithms to cope with general software faults involving logical errors but can not handle timing errors properly. The dead- line mechanism includes scheduling mechanisms to ensure timely responses. The recovery block and deadline mechanism complement each other and may be nested to provide tolerance to both timing and other faults. The deadline mechanism associates two algorithms with each particular service component. The "primary " algorithm produces a better quality service than the "alternate ". The alternate is a simpler, deterministic algorithm which always produces an acceptable result in a known, fixed length of time. The deadline mechanism provides fault-tolerance in a real-time system by ensuring that either the primary or the alternate will complete before the deadline. A set of simulations demonstrating the feasibility of fault- tolerance in real-time systems is described in [1] . Reliable scheduling of primaries or alternates to meet real-time con- straints requires the calculation of the execution period of each algorithm. Page 8 Accurate determination of execution periods is critical to system performance and reliability. Anderson and Witty [19] proposed placing upper bounds on iterative and recursive program codes. For many desirable algorithms such upper bounds may not be easily established. The "service period" of a com- ponent can then be set, for scheduling purposes, at a figure that is less than the response period, but no less than the execution time of the alternate. Since upper bounds on the execution periods for the alternate algorithms are assumed, the deadline mechanism has been designed so that the scheduler reserves a time for execution of the alternate as requests arrive. Primaries are scheduled in any remaining time which is called slack time . 3.2 Experimental Language Construct for the Deadline Mechanism . The deadline mechanism is a tool for specifying timing constraints explicitly in the program and provides tolerance for timing faults. We have incorporated an experimental language construct for the fault-tolerant dead- line mechanism into an existing programming language, Path Pascal [9, 10] . Path Pascal was chosen as the vehicle for this experiment because of its avai- lability and high level synchronization mechanisms. Path Pascal includes con- current processes, device programming features, a real-time clock, an encapsu- lation mechanism called an "object ", and Open Path Expressions [20] for syn- chronization of processes. Each object consists of a declaration of the data structure together with a set of operations which access the values of that structure. Objects permit synchronization to be associated with shared data and are implemented as an extension of the Pascal structure type facility. Operations accessible outside of the object are specified by prefixing the declaration of an operation with the reserved word "entry ". "Interrupt Page 9 processes " permit device programming by means of a "doio " statement similar to that in Modula [5] . The real-time programming features include three built-in routines "await ", "delay ", and "time "* Await suspends a process until the argument of await corresponds to the system time. Delay suspends a process for the number of time units specified in its argument. Time is a function which returns the current value of the system clock. Path Pascal has been extended to include the fault-tolerant deadline mechanism. The extensions should not be considered as a language design, rather as experimental features which have allowed the programming of example reliable real-time systems. A "deadline process " provides a fault-tolerant deadline service. The response period to a request is a parameter of the deadline process. The deadline process contains one "request " and one "ser- vice " statement. The request period is specified following an "every " phrase in the "request " statement. The service statement specifies the primary and alternate algorithms. A simple example of a deadline process is shown below: deadline process position__update [ response = onesecond] ; begin every fiveseconds request get_data_f rom_sensors; service nextposition. compute else nextposition. approximate; until navigation_terminated; end ; where "nextposition" is an object containing the information of positions to be updated. "Compute" and "approximate" are two entry procedures in the object to do the position update. "Compute" is a primary algorithm and "approximate" an alternate algorithm. In the example above, deadline process "position_update" computes the "nextposition" every five seconds starting from the instantiation of the dead- Page 10 line process. The request statement reads data from sensors and the service statement calculates the next position. Assuming that the execution period of reading sensor data and the approximating algorithm is less than the response period, the deadline mechanism ensures each request will complete before its deadline. The rate-monotonic priority assignment [21] is used to schedule alter- nates using the response period as a static priority, with small values having high priority. At any given instant the alternate with the highest priority (smallest response period) executes. This scheme requires preemption of alternates. It has been shown that such a system will be timely if processor utilization is less than In 2 (approx. 0.69) [21]. One advantage of this method is a reduction in scheduling complexity and overhead, an aspect not considered in [1]. The primaries are scheduled in earliest-deadline-first order. Executions of the primary and alternate within the service period are ordered alternate first (the first-chance scheduling algorithm [1]). If the primary completes within its execution period, its results are used in prefer- ence to those of the alternate. If the primary should fail to complete within its execution period, the results from the alternate are used. A modified recovery cache has been implemented in a software interpreter for the language and is used to hold results from the alternate until the primary either fails or completes successfully. Page 11 4 Application to a_ Satellite On-board System . A satellite on-board system has been selected as an example of a real-time system. The on-board system requires high reliable hardware and software to achieve a successful mission. Software failures, including timing failures, can degrade the system performance and inhibit the success of the mission. In the worst case, it can cause the loss of the satellite costing tens of millions of dollars. We show how the fault-tolerant deadline mechanism can be applied to the on-board system to tolerate timing faults. 4 . 1 The Model of the On-board System . The model of the on-board system is based on Multimission Modular Spacecraft [22] which is the standard satellite for a wide variety of earth orbiting missions in the 1980's. The spacecraft is a zero-momentum, three axis stabilized satellite. The executive program for its on-board system has been written in a low level language [23] . A block diagram of the model is depicted in Fig. 1. The Central Unit (CU) interfaces with the On-Board Computer (OBC), Remote Interface Units [RIUs], Command Demodulator-Decoder and the PCM Modula- tor. From the CU, commands are channeled through the Multiplex Data Bus (MDB) to the RIUs and from there to the various subsystems. Telemetry is sampled in the subsystems, selected by the RIUs, and then passed to the format generator in the CU through the MDB. From the CU, PCM goes to the Modulator. Both commands and PCM are channeled to the OBC. The OBC can communicate with all satellite subsystems through the CU, MDB and RIUs. Page 12 CM) Decoder Modulator <- CU <- ■> OBC PCM Multiplex Data Bus I I RIU-1 RIU-2 RIU-3 RIU-4 RIU-5 Communication Attitude Power & Control Data Handling System Thermal Mission Equipment Fig. 1 Satellite On-Board System Model Though the functions of the OBC generally depend on the mission, the attitude control, delay command process, statistical calculation and limit checking, power management, thermal control, and mission equipment management are functions commonly required by various missions. 4.2 Time Critical Subsystems . Among the functions mentioned above, attitude control is the most com- plicated and the most time critical, especially during the initial acquisition phase when a big attitude maneuver is required. If the satellite is in an incorrect attitude for a long time hardware damage may result from abnormal thermal conditions. Furthermore, the solar panel can not provide power and the battery will be drained. The satellite can not survive without battery power. The satellite has very little momentum to stabilize its attitude and is very sensitive to disturbance torque. The control loop must continuously Page 13 monitor disturbance torque and provide a cancelling control torque when neces- sary. The big attitude maneuver requires a large control torque, and control loop failures during this period may cause the satellite to tumble. If the satellite tumbles, the radio link between the satellite and the ground sta- tions may be lost. (The satellite antenna beam must be aligned with the ground stations.) Should this happen, the ground stations will be unable to transmit suitable recovery commands to the satellite, the satellite will become out of control and the mission will fail. The attitude control subsystem consists of attitude sensors (gyros, sun sensor, and earth sensor), OBC, and actuators (inertial wheels, and gas jets). Gyro signals are read every 125 ms and pre-processed every half second. Sun and earth sensor signals are read every second and sent to the gyro calibration process. The gyro calibration process is executed every two seconds to provide the calibration data to the attitude determination process. The attitude determination process is the most essential and time consuming process within this subsystem. This procedure is executed every second. Actuator control signals are provided by the control process every second and sent to the inertial wheels every 125 ms. The data flow between the processes is shown in the following timing chart (Fig. 2). 4. 3 Application of the Deadline Mechanism to the Attitude Control System . The deadline mechanism has been applied to each process of the atti- tude control system. For example, in the attitude determination process, a linearized Kalman filter is used as the primary procedure. The filter is I Page 14 | Inertial |- |- |- |- |- |- |- |- |- |- |- |- |- |- |- |- |- |. Wheel Gyro Cal. | —■■———--———■-—— =-=«.-..- ..... |... Read |— |-- |-« Sensors Det. Process Read |= |- |= |= |= |= |- |= |= |- |= |= |= |- |- |= |= | Gyro | . 5s 1 . 5s 1 . 5s 1 . 5s- J is j is Fig. 2 Attitude Control Timing Chart designed to determine attitude angles of the satellite and their changing rate. To achieve required accuracy, the primary becomes rather complicated. The alternate is a simple prediction algorithm by a transition matrix. This prediction algorithm is accurate enough when the time duration of the predic- tion is no longer than 4 or 5 seconds. A program example for attitude determination is listed below. The example shows the structure of the program. deadline process attitude_determine [ response = onesecond] ; local declarations procedure filter; begin kalman filter; end; Page 15 procedure prediction; begin prediction by transition matrix; end ; begin (* attitude_determine *) initializations of local data; awalt (specif ied time) ; every onesecond request service filter else prediction; until terminated; end ; (* attitude_determine *) All the other functions mentioned in the previous section are simi- larly programmed as deadline processes. The synchronization among deadline processes is achieved by the "await" function. 4.4 Reliability Improvement by the Deadline Mechanism . The deadline mechanism permits the construction of highly reliable systems from primaries that are orders of magnitude less reliable than the requirements. The following argument is based on several simplifying assump- tions, however, we believe that it demonstrates the potential of the mechanism for programming reliable systems. Let the timing failure probability of the alternate and primary for a unit of time be P[f ] and P[f ], respectively. We a p assume that each f and f are independent and that the deadline mechanism implementation is reliable. A timing failure of an alternate causes a subsys- tem failure. Since the alternate is much simpler than the primary, the results calculated by the alternate includes some tolerable inaccuracy. Consecutive primary failures, although tolerated by timely execution of the alternate, accumulate the inaccuracy and produce unacceptable results. Therefore, m. con- secutive primary timing failures are considered to cause a subsystem failure, "m" is determined by the quality of the alternate algorithm and the system Page 16 requirements. Then the probability of the subsystem timing failure for a unit of time, denoted by P[f ], is as follows: P[f ] « P[f ] + P[s]*P[mf ] (1) S 3 P Where P[mf ] is the probability of m consecutive primary failures for a unit of time and P[s ] is the success probability of the alternate for a unit of time. Since P[mf ] < {P[f ] } m , we can rewrite (1) as: P P P[f ] < P[f ] < P[£ J + P[s ]*{P[f ]> m (2) a s a a p The satellite acquisition maneuver is often designed to tolerate 99.73% of errors occurring in each of the maneuver parameters, such as control torque, disturbance torque, etc., during the maneuver. (The value 99.73% corresponds to an occurrence probability within 3 standard deviations on either side of the nominal under the assumption of normally distributed errors.) Considering the timing constraint of a deadline process as a maneuver parameter, the probability of a timing failure of a deadline process must be less than 0.0027 for the duration of the maneuver. We further assume that the duration of the maneuver is two days. For the attitude determination process, the required timing failure probability for the request period (1 second) is —8 less than 1.56*10 . Here we choose the request period of the attitude deter- mination process as the unit of time and the following discussions on proba- bility are based on this unit of time. Without the benefit of the deadline mechanism, the timing failure pro- —8 bability of the primary, P[f ], must be less than 1.56*10 . To estimate the effectiveness of the deadline mechanism, we will make the following simplifi- Page 17 cations. We can simplify equation (2) by replacing P[s ] by 1 (we assume that P[f ] is very small.) Suppose that only two consecutive failures of the pri- 8 mary can be allowed. Using the required timing failure probability of the subsystem, we can estimate the required timing failure probabilities of the primaries and alternates that will guarantee the reliability of the subsystem from: P[f ] < P[f ] < P[f ] +
2 < 1.56*10~ 8 a s a p The alternate is much simpler than the primary and we assume that it can be constructed with a timing failure probability less than that of the subsystem —8 (in the order of 10 ). The required timing failure probability of the pri- mary can be four order of magnitude greater than that of the alternate or the subsystem. 5 Conclusions . Path Pascal has been extended to include the fault-tolerant deadline mechanism. The satellite on-board system has been simulated using the extended Path Pascal. The deadline mechanism provides a systematic recovery scheme for timing errors. The language has shown its potential to describe real-time systems. The required manpower to develop the simulation program in extended Path Pascal was about six man-months. The development of the executive pro- gram by the 0A0 project using low level language required forty-eight man- months [23] . Though the simulation program is a simplified version of the Page 18 executive, we believe that a tremendous coding and testing time can be saved by the high level programming language approach. Fault-tolerance in software is needed in addition to the fault- avoidance approach. Indeed, tolerance and avoidance are complementary approaches to the fault problems. We believe that theoretically and practi- cally the reliability of a real-time system is considerably improved by apply- ing the deadline mechanism. Even though overhead is incurred for error recovery and executing additional algorithms [24] , the mechanism achieves high reliability for time critical components of real-time systems. 6 Acknowledgements . We would like to thank NASA (Grant NSG 1471) for financing this research. Special thanks also go to Nippon Electric Co., Ltd. for providing valuable information about on-board data handling systems. Page 19 REFERENCES [I] Campbell R. H., K. H. Horton and G. G. Belford, "Simulations of a Fault- Tolerant Deadline Mechanism," Proc. of the 9th International Conf. on Fault-Tolerant Computing, Madison, Wisconsin, June, 1979. [2] Avizienis, A., "Fault-Tolerant Computing - Progress, Problems, and Pros- pects," Information Processing 77, North-Holland Pub. Co., Amsterdam, pp. 405-420, 1977. [3] Wirth, N., "Toward a Discipline of Real-Time Programming," CACM, Vol. 20, No. 8, pp. 577-583, August, 1977. [4] Intermetric Inc., HAL/S Manual, 1975. [5] Wirth N., "Modula: a Language for Modular Multiprogramming," Software- Practice and Experience, 7, pp. 3-84, 1977. [6] Brinch Hansen, P., The Architecture of Concurrent Programs , Prentice- Hall, Englewood Cliffs, New Jersey, 1977. [7] Schutz, H. A., "On the Design of a Language for Programming Real-Time Concurrent Processes," IEEE Trans, on Software Engineering, Vol. SE-5, No. 3, pp. 248-255, May 1979. [8] Wegner, P. , "Programming with ADA: An Introduction by Means of Graduated Examples," SIGPLAN NOTICES, Vol. 14, No. 12, pp. 2-46, Dec. 1979. [9] Campbell R. H. and R. B. Kolstad, "Path Expressions in Pascal," Fourth International Conference on Software Engineering, Munich, Germany, Sept. 1979. [10] Campbell, R. H. and R. B. Kolstad, "Practical Applications of Path Pascal to Systems Programming," ACM79, Detroit, 1979. [II] Randell B., "System Structure for Software Fault Tolerance," IEEE Trans, on Software Engineering, Vol. SE-1, No. 2, 1975, pp. 220-232. [12] Denning, P. J., "Fault Tolerant Operating Systems," Computing Surveys, Vol. 8, No. 4, December, 1976, pp. 359-389. [13] Hecht, H., "Fault-Tolerant Software," IEEE Trans, on Reliability, Vol. R-28, No. 3, August, 1979, pp. 227-232. [14] Horning, J. J., H. C. Lauer, P. M. Melliar-Smith and B. Randell, "A Pro- gram Structure for Error Detection and Recovery," in Proc. Conf. Operat- ing Systems; Theoretical and Practical Aspects. IRIA, 1974, pp. 177-193. (Reprinted in Lecture Notes in Computer Science , Vol. 16, Springer- Verlag, New York.) [15] Anderson, T., and R. Kerr, "Recovery Blocks in Action: A system support- ing high reliability," 2nd International Conference on Software Engineer- ing, pp. 447-457, October, 1976. Page 20 [16] Shrivastava, S. K. , "Sequential Pascal with Recovery Blocks," Software - Practice and Experience, Vol. 8, pp. 177-185, 1978. [17] Lee, P. A., N. Ghanl and K. Heron, "A Recovery Cache for the PDP-11," Proc. of the 9th International Conf. on Fault-Tolerant Computing, Madi- son, Wisconsin, June, 1979. [18] Hecht H., "Fault-Tolerant Software for Real-Time Applications," Computing Surveys, Vol. 8, No. 4, 1976, pp. 391-A07. [19] Anderson, T. and R. W. Witty, "Safe Programming," BIT 18, pp. 1-8, 1978. [20] Campbell R. H., "Path Expressions: A Technique for Specifying Process Synchronization," Ph. D. Thesis, The University of Newcastle Upon Tyne, August, 1976. [21] Liu C. L. and J. W. Layland, "Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment," JACM, Vol. 20, No. 1, 1973, pp. 46-61. [22] NASA Godard Space Flight Center, "Multlmission Modular Spacecraft System Specification," S-700-10, May 1977. [23] Taylor T., "Modular Multi-Mission Spacecraft," Presentation Material for ICASE/NASA Langley Workshop on Real-Time Programming for NASA Flight Pro- jects, October 1979. [24] Liestman, A. L. and R. H. Campbell, "A Fault-tolerant Scheduling Prob- lem," Tech. Report UIUCDCS-R-80-1010, Dept. of Computer Science, Univer- sity of Illinois, Feb. 1980. BIBLIOGRAPHIC DATA SHEET 1. Report No. UIUCDCS-R-80-1025 2. 3. Recipient's Accession No. 4. Title and Subtitle Application of the Fault-Tolerant Deadline Mechanism to a Satellite On-Board Computer System 5. Report Date June 1980 6. 7. Author(s) A. Wei, K. Hiraishi, R. Cheng, R. Campbell 8. Performing Organization Rept. No. r_80-1025 9. Performing Organization Name and Address Department of Computer Science University of Illinois U-C Urbana, IL 61801 10. Project/Task/Work Unit No. 11. Contract/Grant No. NASA NSG 1471 12. Sponsoring Organization Name and Address NASA Langley Research Center Hampton, Virginia 23665 13. Type of Report & Period Covered 14. 15. Supplementary Notes 16. Abstracts Many real-time systems require high reliability both in software and hardware. While several approaches to reliable hardware components exist, there are few methodologies that permit the construction of reliable software. Fault-tolerant software is required to prevent system failure caused by accident, sabotage, or residual hardware/software design faults (that is, faults not detected by proof, inspection, or resting). A deadline mechanism based on recovery blocks is briefly described which permits recovery from timing faults. A language construct for the deadline mechanism is incorporated into the Path Pascal programming language. Application of the fault-tolerant deadline mechanism to a satellite on-board computer system is described using extended Path Pascal as a vehicle. The MMS (Multimission Modular Spacecraft), a satellite designed by NASA for the 1980' s, is used as model. 17. Key Words and Document Analysis. 17a. Descriptors Path Pascal fault- tolerant deadline mechanism 17b. Identifiers/Open-Ended Terms 17c. COSATI Field/Group 18. Availability Statement unlimited 19. Security Class (This Report) UNCLASSIFIED 21. No. of Pages 23 20. Security Class (This Page UNCLASSIFIED 22. Price FORM NTIS-3B ( 10-70) USCOMM-OC 40329-P7! JUN 3 WW