UIUCDCS-R-76-836

A VIDEOTAPE ANALYSIS OF STUDENT PERFORMANCE ON AN INTERACTIVE EXAMINATION

by

Wilfred J. Hansen
Richard Doring
Lawrence R. Whitlock

October 1976

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work supported in part by the National Science Foundation under grant EC41511.

(a) "Representative" Times

                                          PLATO                                 paper
                                Time   Std. Dev.   Subjects            Time   Std. Dev.   Subjects
  1) 4) Average of total        16.5     1.24      (SF,SG,SA,SB,SD)    12.2     1.86      (SE,SF,SB,SC,SD)
        time less Trouble time
  2) 5) Average non-productive   4.3      .84      (SC,SD,SA,SB)         .3      .12      (SC,SD,SA,SB)
        time (Select, Generate,
        Load, Display)
  3) 6) Representative          12.2                                    11.9
        Productive Time

(b) Distribution of Activities, average times

                                      (1)        (2)         (3)            (4)
                                     PLATO      paper     PLATO-paper    (3) as a %
                                                                         of total (2)
  Reducible overhead
    (Load, Select)                    2:32        :18         2:14           20%
  Inelastic overhead
    (Generate, Display)               1:46         --         1:46           16%
  Productive work
    (Think, Answer)                  14:39      10:33         4:06           37%
  Trouble                             2:56        :13         2:43           25%
  Total                              21:53      11:04        10:49           98%

Figure 4.6. Breakdown of the Extra Time Spent on PLATO. For description of the categories see the text. Times are in minutes and seconds. The figures in column (4) express the excess time on PLATO as a percentage overhead beyond the total time required on paper. (For example, the 4:06 minutes of additional productive work on PLATO is 37% of the total 11:04 minutes spent on the paper exam.)
4) The "representative paper time" is total time less Trouble time averaged over subjects SB, SC, SD, SE, and SF, who were all reasonably close to that value: 12.2. (Again we drop the two extreme cases: SA had great Trouble on PLATO but breezed through the paper exam; SG was painstaking enough on paper to achieve a perfect score.)

5) The "representative paper non-productive time" is .3, the average paper problem Selection time for the four taped subjects.

6) The "representative paper productive time" is the difference of the values from (4) and (5): 11.9.

We see that, taking account of total time, Trouble time, and non-productive time, the time spent on each exam was about the same.

The second approach provides insight into the relative influence of the factors responsible for the longer times on PLATO. In this approach the videotape data for the four taped subjects are categorized as shown in figure 4.6b. (These times differ from the "representative times" because they include subjects whose times were not representative of all seven subjects.) In the figure, "Productive work" is the time the user spent working on the problem, independent of the method. "Reducible overhead" is time that is easy to avoid by modification of the system. "Inelastic overhead" is inherent in the PLATO system. "Trouble" is the time when the user was having difficulty understanding the requirements. In column (4) the figure expresses PLATO excess time as a percentage of the time the exam took on paper. This basis will permit valid comparison when we discuss reduction of PLATO overhead in section 5.

Several other interesting observations can be made from the data and figure 4.5:

a) All four taped subjects wrote comments about Trouble on the post-test questionnaire. One even remarked - with cautious ambiguity - that "All of the instructions were not clear."

b) Taped subjects had more "activities" on PLATO than on paper, even when the activities are restricted to those actually performed by the human - Think, Answer, Select, and Trouble. From the data of figure 4.5, we find an average of 65 for PLATO and 57 for paper. This is partly because subjects reviewed more on PLATO and partly because, when a "Trouble" activity occurred, the subject often had to try several times to enter the answer.

c) "Overhead" on the PLATO exam was not negligible; the average time for Display, Generate, and Select per problem page was .36 minutes. Even though comfortingly constant, this time is frustrating to the student because there is little to do but wait. The corresponding value for the paper exam includes only Selection time, an average of .04 minutes. Overhead is especially noticeable on PLATO during review; it averaged over 10% on PLATO and only 3% for those who reviewed on paper. Indeed, for review, Display time is slightly larger than for the initial presentation because the student's prior work must be displayed; this is offset by the fact that no problem Generation time is needed.

4.2.2 Influences on scores

Factors other than ability influenced scores in this experiment to an unusual extent. Section 4.1.3 has mentioned the impact of the experimental situation; section 4.1.4 discussed the variation due to exam order. Other factors can be observed in figure 4.7, which presents scores and Think times for each problem. The figure demonstrates that PLATO itself did not reduce scores.
Instead, the major factor in lower PLATO scores is that the interactive grading algorithm in the DO loop problem subtracted too many points for a wrong answer, so subsequent correct answers got no credit. Indeed, we expect that PLATO will actually lead to higher scores because it aids the student in several ways. For example:

SB tried an invalid answer to the syntax problem and was rejected. The retry was correct, so the subject received full credit. (On paper the invalid answer would simply have been marked down.)

SC missed the first line on the DO problem but was given the correct answer and got the rest correct. (Relative grading could have achieved the same result.)

There does not appear to be any inequity in this approach, since all students stand the same chance of being corrected and because invalid answers are likely to be misconceptions that ought to be cleared up on paper by asking the proctor.

Inequity does occur, however, in the level of difficulty of problems. As shown by the average points per minute (last column in figure 4.7), some problems were fairly easy - problem 2 on PLATO and 4 on paper - while some were too hard - 3 and 4 on PLATO. (Indeed, problem 2 on PLATO was so easy that the only points missed were on the ASSIGN statement, a construct not covered in lecture.) Had there been a time constraint, a student who attempted one of the harder problems would be at a disadvantage.

[Figure 4.7 - scores and Think times for each problem - is not legible in this copy.]

In addition, there were great disparities in difficulty between different instances of the same problem:

- Division problems tend to be harder than other arithmetic expressions, yet a subject might receive 0, 1, or 2 of them.
- The syntax generator sometimes inserted no errors in simple statements and sometimes embedded an error deep in a complex statement. It sometimes generated instances of language features not yet studied (such as the action of a logical IF).
- The format problem required rounding up anywhere from zero to three times.
- Sometimes one iteration of the DO loop would modify an array value used by a later iteration. Few students analyzed this correctly.

Careful attention to relative difficulty is essential in interactive exam design. Generators in our improved system are based on a detailed analysis of the factors which contribute to difficulty.

5. System Improvements

As a consequence of this experiment, other observations, and introspection, the exam system has been improved in many ways. This section discusses these improvements roughly in the order a student encounters them. The percentage overheads are taken from figure 4.6; to compare them with improved versions, the denominator is the total time on the paper exam.
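As a quick check of that basis (the helper below is an editorial illustration, not part of the exam system), the column (4) percentages of figure 4.6b follow from dividing each category's excess PLATO time by the 11:04 total paper time:

    # Each category's excess PLATO time from figure 4.6(b), divided by the total
    # paper-exam time of 11:04.  Editorial sketch only.
    def mmss(s):
        minutes, seconds = s.split(":")
        return int(minutes) * 60 + int(seconds)

    PAPER_TOTAL = mmss("11:04")
    CATEGORIES = {                      # (PLATO, paper) average times from figure 4.6(b)
        "Reducible overhead": ("2:32", "0:18"),
        "Inelastic overhead": ("1:46", "0:00"),
        "Productive work":    ("14:39", "10:33"),
        "Trouble":            ("2:56", "0:13"),
    }
    for name, (plato, paper) in CATEGORIES.items():
        excess = mmss(plato) - mmss(paper)
        print(f"{name:20s} {excess:4d} s  {100 * excess / PAPER_TOTAL:3.0f}%")
    # Prints roughly 20%, 16%, 37%, and 25% - about 98% total overhead, as in the figure.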
5.1 Reducible Overhead

On PLATO, character set Loading and problem Selection time constituted a 20% overhead. Load time has been eliminated by restricting PG/G's to the standard set of 128 characters. It includes almost all characters used in common programming languages, and one or two extra characters can be loaded quickly for special problems (for example, the PL/I logical NOT).

Problem Selection time was bloated by a design that required a return to the cover page after every problem. The system has been modified so that the shifted-NEXT key transfers directly to the next problem page and shifted-BACK goes directly to the previous one. These two possibilities account for the vast majority of interproblem transitions. For random access, the shifted-DATA key returns the student to the cover page.

We have not yet found a satisfactory scheme for problems with multiple pages. Certainly shift-NEXT and shift-BACK should move among the pages of the problem, but should there be a "sub"-cover page for the problem? One pilot PG/G uses a vector of page numbers at the bottom of the page, but this scheme can be a little too confusing. The best interim solution seems to be avoidance of multiple-page problems.

5.2 Inelastic Overhead

A 16% overhead resulted from problem Generation and Display, mostly on the latter. Unfortunately, Display time depends on the communication link technology and cannot be reduced by changes to the exam system. Moreover, problem Generation time has increased slightly as more sophisticated approaches have been used. However, several steps have been taken to reorganize Generation and Display so the time required is more palatable and useful:

- Basic problem statements are presented before captions and descriptions of the control keys.
- Problem details are presented as they are generated, so the user has something to work on and does not feel the system has halted.
- Detailed instructions that follow standard conventions are accessible via the HELP key, but are not displayed as part of the main display.

Though these techniques create the display efficiently, our observations suggest that users often wait for the complete display before starting to work. Possibly more experience and time pressure will encourage productive use of display construction time.

Other exam systems generate the entire exam when the user first enters the system, or even earlier. We have not chosen this approach because it would be expensive in terms of disk accesses. Two (a read and a write) would be required for each PG/G during the generation-only phase. It is our hope that generation for each problem can be short enough to keep disruption to a minimum. (Similarly, grading could be deferred until completion of the entire exam, but is not because of the disk access limitation. Most PG/G's have straightforward, rapid grading algorithms.)

5.3 Productive Time

Although the excess productive time - Thinking and Answering - amounted to 37% of the time to do the exam on paper, there does not seem to be any inherent reason why it should be longer. Indeed, the "representative time" analysis showed similar productive times with both approaches.
In addition to the pressures discussed in 4.1.3, longer Think time on PLATO may result from these factors:

(i) Students are used to having an answer judged immediately on PLATO, so they work very hard to be sure they have the correct answer before entering it on the exam.

(ii) They do not know how to change an answer, or perceive it as a difficult process.

(iii) They suspect that a wrong answer will be counted in the grading algorithm. (The suspicion is not unwarranted; we have considered this approach.)

(iv) They fear "exposure," since their answers are more readable by the proctor (and harder to cover up) than they would be on paper.

Improved system design and more practice will help reduce the magnitude of at least the first two of these problems.

The PLATO time to enter an Answer was 19% longer than that on paper, but still amounted to only a second or two more per page. This is probably not a serious factor. Students did not express difficulty with typing answers, and typing ability was not a factor in our results. Many future students will be even more familiar with the keyboard because they will have been using PLATO for other courses. Nonetheless, the answer entry mechanism is a key element in system design; since no judgment of the answer is expressed, some other action must be taken to show acceptance. In the exam system (even at the time of the experiment) there are two areas for each answer - an entry area and a display area. After entry the answer is moved to the display area, so that even when a new answer is being entered, the old one is still on view in the display area. It is also important that the entry area be adjacent to the display area; this principle is violated by the PRINT FORMAT problem, and that problem is more disconcerting and confusing than the others.

5.4 Trouble Time

Trouble time - a 25% overhead in this experiment - is a very variable quantity and will always exist, even on paper exams; the best remedy is to provide a proctor for every exam. Several of the steps indicated above will also help reduce Trouble time, especially the provision that the shifted control keys for travel between pages must always work. In addition, practice examinations and written pre-test instructions help reduce surprise and confusion.

In view of the above, we can estimate the overhead for doing an exam on PLATO instead of on paper. Reducible time can be eliminated. Inelastic time can be reorganized so the student pays a penalty of perhaps no more than 5% of the paper exam time. Productive time need not be any longer, especially if students are given practice exams; we hope for a penalty of at most 5-10%. Similarly, Trouble time can be reduced by good design and training, so the penalty should also be no more than 5-10%. We conclude that it is reasonable to expect that the PLATO version of an exam should take no more than 20% longer than the same exam on paper.

An important question is whether students should spend any unnecessary time on exams. The benefits to the instructional staff of reduced preparation and grading time are economically tangible, but does a student receive any benefits for the extra time? Among the benefits are unbiased grading, help with getting the answer into the expected format, immediate knowledge of results, and increased flexibility in temporal or physical scheduling (if the student is willing to risk the absence of a proctor and if the obvious user-identity problem can be solved).
We believe these benefits justify the interactive examination approach.

6. Further Analyses

One problem with the videotape technique is the embarrassing richness and variety of data it provides. This section explores a few of the possibilities raised by our data.

6.1 "Context Switch" Time

It is not unlikely that when a subject returns to a familiar page it will take some time to switch mental "context" and recall the material. The experiment offers two types of page return: return to the cover page between problems and return to a problem for review.

The "context switch" time on return to the cover page is not directly available in our data because we encoded the entire time from the end of one problem to the start of the next as Select time. However, total problem Select time is known and has four distinct components: Display time for the cover page, "context switch" time, Think time, and the time to press a single key for the next problem. The computation is summarized in figure 6.1. Column (1) is the average Select time for those occasions (including quitting) where selection was via the cover page. Column (2) is the problem Display time computed by dividing the total PLATO Display time by the number of problem pages viewed. (Display time varies slightly with changes in system load.) The next three columns are estimates of the time to choose the next problem and to type the corresponding problem number. Column (3) estimates this as the average time required to select the next syntax problem by pressing shift-NEXT (the one case of Select time that did not use a cover page). Column (4) is the average time to enter an answer (Answer time divided by the number of activities from figure 4.5). For comparison, column (5) gives Select time on paper. Column (6) is just the maximum value from columns (3) and (4). The minimum context switch time plus think time in column (7) is the cover page Select time (1) less the sum of Display time (2) and minimum one-key Answer time (6).

               (1)      (2)      (3)      (4)      (5)      (6)      (7)
             Select  Display    NEXT   Answer   Select   Max of  Think &
                                                (paper)  (3),(4)  Switch

  SA          15.8      7.0      1.8      3.0      2.6      3.0      5.8
  SB          13.7      5.4      3.0      4.1      1.1      4.1      3.6
  SC          14.6      7.2      4.0      1.7      1.4      4.0      3.4
  SD          15.3      7.2      4.8      3.1      3.3      4.8      3.3

  N             33       45       16       95       35       56
  Avg.       14.88     6.84     3.56     3.15     2.03     4.00      4.0
  Std. Dev.   7.45     2.38     2.37     2.30     1.5      2.34

Figure 6.1. "Context Switch" Time Calculation. All times are in seconds. "N" is the total number of occurrences for these subjects. The averages and standard deviations are with respect to all occurrences. See text for description of columns. (The value of 4.0 for the average of column (7) is close to both its horizontal value - computed from the averages of columns (1), (2), and (6) - and its vertical value - the average of the values in column (7).)

The average of 4.0 seconds for context switch plus think time suggests that subjects spent that long just staring at the cover page and recovering from the previous problem. Since the thought involved is negligible, context switch time must be a large fraction of four seconds.
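The column (7) computation is simple enough to state as a one-line function; the sketch below (an editorial illustration, using the per-subject averages transcribed in figure 6.1) applies it to each subject. Small differences from the tabulated column (7) are expected, since the figure's own entries were averaged over individual occurrences.

    # Column (7) of figure 6.1: cover-page Select time, less Display time, less the
    # larger of the one-key NEXT and Answer times.  Editorial sketch, not the
    # authors' analysis code.
    def context_switch_plus_think(select_cover, display, next_key, answer):
        return select_cover - display - max(next_key, answer)

    figure_6_1 = {          # subject: columns (1), (2), (3), (4), in seconds
        "SA": (15.8, 7.0, 1.8, 3.0),
        "SB": (13.7, 5.4, 3.0, 4.1),
        "SC": (14.6, 7.2, 4.0, 1.7),
        "SD": (15.3, 7.2, 4.8, 3.1),
    }
    for subject, cols in figure_6_1.items():
        print(subject, round(context_switch_plus_think(*cols), 1))
    # Compare with column (7); the occurrence-weighted average reported in the text is about 4.0 s.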
The other approach to context switch time is to consider those occasions when a subject looked at a problem very briefly and went on to another. We can assume they spent that time mostly remembering their work on the problem and deciding that there was little likelihood for improvement. On PLATO, SA had three such occasions, SC had one, and SD had two, for an average time of 6.7 seconds (five were 5 or 7; one was 11). On paper, one subject had four such occasions with an average time of 7.8 seconds (values were 2, 7, 9, and 13).

One implication of "context switch" time is that every time the screen is erased the user pays a time penalty to get reacquainted with the new page. Personal observation suggests that this penalty (though probably less than four seconds) is paid even if the subsequent page has exactly the same format as the former. Thus problem design should emphasize putting a number of similar problems on one display, rather than erasing the screen each time a new question is to be asked.

"Context switch" time suggests that there is some penalty in starting to read any new page and a greater penalty on encountering a new page format. For this reason PG/G's should not use diverse page layouts. In particular, ours all place headings and standard key conventions in the same places.

6.2 PLATO Experience

Intuitively, subjects with more prior exposure to PLATO should do better than others on the automated exam system. However, no hint of this relationship can be derived from our data, whether we consider all seven subjects or only those who were videotaped. For example, we can compare SA and SD, who had a total of thirty hours of prior experience, with SC and SB, who had a total of six hours. It is true that SA and SD spent less time on the PLATO exam, but they also spent less time on the paper exam; moreover, the ratio of PLATO times is less than the ratio of paper times. That is, SA and SD were not as much faster as would be predicted by their speed on the paper exam. For scores we can compare the groups on first hour exam score, PLATO score, and paper score. The ratios are 1.07, 1.24, and 1.15, so those with more PLATO experience scored slightly higher on PLATO than would be predicted by other factors alone. However, the difference is not convincing. It is possible that the learning curve is such that three hours of PLATO exposure was enough to learn the features used by the exam system.

Another possibility is that prior experience on PLATO was actually detrimental. Prior PLATO experience could not have taught the same set of key conventions as the exam system, especially the return to the cover page after each problem. Indeed, some PLATO training - for example, waiting for a NO-or-OK judgment on an answer - is antithetical to the behavior of the exam system. Possibly the higher scores achieved by those with more PLATO experience are simply due to more practice with the course material gained by exploiting the available PLATO lessons.

In addition to exposure to PLATO, we can ask whether exposure to typing affects performance at the terminal. One subject who had little PLATO exposure but had had a typing course did reasonably well. Another who claimed to be a better-than-good typist had little Trouble and entered Answers quickly, but had much more Think time on PLATO.

6.3 Review Behavior

A very simple exam system could be implemented if students simply worked exam problems one after the other. No one does, however, so it is important to study the variety and extent of "review behavior" - that is, of return to problems previously worked on. Such behavior can be a clue to a student's confidence in self and in the answers, so investigation of review behavior can add depth to our knowledge of personality and ability. Two types of review are apparent in our data: "hesitation" - the review of a page prior to going on to another - and "rework" - the return to a page.
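As an editorial illustration only (the record format and the five-second lingering threshold below are our assumptions, not the coding rules used for the tapes), the two behaviors could be extracted from a subject's page-visit records roughly as follows:

    # Sketch of classifying one subject's page visits into the two review behaviors
    # defined above.  Field names and threshold are illustrative assumptions.
    from collections import namedtuple

    Visit = namedtuple("Visit", ["page", "seconds_after_last_answer"])

    def classify_reviews(visits, hesitation_threshold=5.0):
        """Return (hesitated_pages, reworked_pages) for one session."""
        seen, hesitated, reworked = set(), set(), set()
        for v in visits:
            if v.page in seen:                              # came back to an earlier page
                reworked.add(v.page)
            if v.seconds_after_last_answer >= hesitation_threshold:
                hesitated.add(v.page)                       # lingered before moving on
            seen.add(v.page)
        return hesitated, reworked

    # Pages 1-4 worked in order, a pause on page 3, and a later return to page 2.
    session = [Visit(1, 1.0), Visit(2, 2.0), Visit(3, 8.0), Visit(4, 1.5), Visit(2, 3.0)]
    print(classify_reviews(session))                        # ({3}, {2})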
Our subjects exhibited a variety of review behaviors, as indicated in figures 6.2 and 6.3. For example, SD reworked no problems on paper, but all of them on PLATO. Subject SC - displaying either caution or uncertainty - hesitated on half the problems and reworked almost all. Most subjects hesitated on half the problems, but SC spent considerably more time at it. Our limited data did not reveal any relationships between review behavior and any one of score, PLATO satisfaction, or total non-review productive work time.

[Figures 6.2 and 6.3 - review-behavior tabulations by subject and by problem - are not legible in this copy.]

One good technique for examination taking is to scan the entire exam before starting work. Though none of our subjects adopted this strategy, it is important to note its impracticality on PLATO due to the high overhead of page turning. Despite this, our subjects did rework more problems on PLATO. Such extra rework may reflect uncertainty about the answers, but the answers were not changed any more than on paper. More likely, the increased rework was to have the assurance that the system really had retained the student's answers.

The data in figure 6.3 illustrate the detail of analysis that can be accomplished with the videotape approach. In this table each problem for each subject is assigned to a cell according to whether the subject hesitated on or reworked the problem and whether the answer was changed during rework. The data suggest the following hypotheses:

a) If a student hesitates on and later reworks a problem, it is as likely to be changed as not.

b) If there is hesitation, but no rework, the answer is likely to be correct.

c) If there was no hesitation, there is unlikely to be any change during a rework.

Hypothesis (b) might correspond to the subject gaining confidence during the hesitation. Similarly, hypothesis (c) might correspond to complete confidence in the solution. Such cues, if valid, would provide a basis for evaluation of self-confidence. Subsequent comparison with the final score would provide a measure of how realistic the student is. Knowledge of this factor would be invaluable, for example, in consultation with the student concerning study habits.

6.4 Nuisances, Annoyance, and Agitation

Many of our other observations contribute to the theory of "nuisances, annoyance, and agitation": A nuisance is a system action or requirement that bothers a user. Annoyance is the user's immediate response to a nuisance. Agitation is the cumulative effect of multiple annoyances. In this theory, annoyance and agitation decay exponentially, but agitation decays more slowly each time one annoyance follows closely on another.
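To make the shape of this theory concrete, here is a small numerical sketch (ours, not the authors'); the impulse sizes, rate constants, and the rule for slowing the agitation decay are illustrative assumptions, since the report proposes the theory only qualitatively.

    # Toy simulation of the nuisance/annoyance/agitation theory described above.
    # All constants are illustrative assumptions, not measurements from the report.
    import math

    def simulate(nuisance_times, horizon=60, dt=1.0,
                 annoyance_rate=0.5, agitation_rate=0.05, closeness=10.0):
        annoyance = agitation = 0.0
        agitation_decay = agitation_rate
        last = None
        trace = []
        for step in range(int(horizon / dt) + 1):
            t = step * dt
            if any(abs(t - n) < dt / 2 for n in nuisance_times):   # a nuisance occurs
                annoyance += 1.0                                   # immediate response
                agitation += 1.0                                   # cumulative effect
                if last is not None and t - last <= closeness:
                    agitation_decay *= 0.5                         # close together: slower decay
                last = t
            annoyance *= math.exp(-annoyance_rate * dt)            # both decay exponentially,
            agitation *= math.exp(-agitation_decay * dt)           # agitation more slowly
            trace.append((t, round(annoyance, 2), round(agitation, 2)))
        return trace

    # Two closely spaced nuisances: annoyance fades within seconds, agitation lingers.
    for t, annoyance, agitation in simulate([10, 15])[::10]:
        print(t, annoyance, agitation)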
Both decay rates depend on the personalities of the individual users. For example, we can hypothesize someone with a "rigid" personality who would adapt well to the constraints of interactive computing, but would have increased agitation when the system fails to be consistent. Even non-rigid personalities may suffer on exams, where agitation is compounded with tension and ego-involvement.

Quantification of agitation may have a significant impact on system design. If an economical measure can be found, the system could detect problems and modify itself or suggest alternative approaches to the user. If measurement requires extensive subsidiary equipment, it can still be a valuable tool in testing system designs and suggesting training regimens. Ultimately, understanding of the causes of agitation can lead to better methodologies for initial system design. At present, agitation itself can best be measured indirectly, as by the "satisfaction" scale in figure 4.2 or by inference from behavior.

However, direct observation can reveal nuisances such as the three classes we found in our experiment: work habit change, interaction uncertainty, and surprise.

A work habit - for example, underlining key words or lifting the page corner while finishing an answer - may help a worker concentrate and reduce anxiety. When a tool change forces a change in habit, the worker will suffer momentary distraction each time a habitual action is thwarted. Unfortunately, a switch from paper to an interactive system - no matter how valuable otherwise - requires significant habit changes. The biggest changes are the switch from pen or pencil to keyboard, the increased formality of page turning, and the severe constraints on modification of the visible image. Marginal notes are no longer possible; for even a simple manual calculation the entire torso must turn. As evidence of changes in work habits we present in figure 6.4 the times subjects spent calculating with paper and pencil. The first two subjects performed similarly on both media, but the other two did much less hand calculation on PLATO. Curiously, the latter two also had higher "satisfaction" scores. In all cases, the total times and scores were similar for both PLATO and paper. Another variety of change was exhibited by SB, who pointed at the variables and their values on paper, but not on the PLATO display.

                     manual calculations            think and answer
                     PLATO         paper          PLATO          paper
  subject          time    #     time    #      time  score    time  score   satisfaction
  SA   PLA         1:49    5     1:50    5      1:43    15      :23    12        1.7
  SB   paA          :10    1      :23    1      4:56     9     3:38     9        3.2
  SC   PLB          :51    3     2:12    6      5:07    12     2:14    12        3.6
  SD   paB          :55    2     2:46    1      3:18     9      :32     6        3.9

Figure 6.4. Comparison of Manual Calculation Behavior for the Arithmetic Expressions Problem. Times are in minutes and seconds. "#" is the number of instances of manual calculation. The "productive time" in figure 4.7 is the sum of the two times shown here. The "satisfaction" score is copied from figure 4.2.

Interaction uncertainty refers to a little-noticed disadvantage of the very flexibility which is the greatest advantage of interactive systems. Because a system can exhibit so many behaviors, it is far less predictable than a piece of paper; a new user must always wonder, "What will it do now?" In an instructional lesson, imaginative variety can help maintain student interest, but on an exam it only heightens anxiety. Uncertainty arises from the variability of the time to create a screen image and from the fact that images may be constructed in random order.
The student must integrate the image as it forms (and hope to find and read each piece before the next appears), or wait for the full display. Our system may further increase insecurity by moving each answer after it is entered, and by occasionally rejecting an answer - features which have been cited as advantages in other sections of this paper.

The evidence for interaction uncertainty is sketchy. The videotapes seem to show subjects waiting patiently for the full display and then moving to an attitude of attention when the display is complete. In addition, subjects reviewed more on PLATO, both hesitating and reworking longer and more often (see section 6.3), possibly to check that the system had actually retained their answers.

In section 5 we advocated nonlinear image generation to emphasize key points and utilize generation time. If interaction uncertainty is a severe problem, it may be necessary to reconsider this choice. In any case, students must receive considerable exposure to PLATO and the exam system prior to the exam in order to gain confidence in the predictability of the system.

Surprise nuisances are unusual actions which the student will encounter only rarely, whether because they are bugs, are inconsistent with other system behavior, or occur only when the student performs an uncommon sequence of actions.

Several aspects of the exam system surprised our subjects. The return of the arrow to the top of the arithmetic expression page prompted one subject to turn and ask what had happened. Another subject thoroughly reconsidered a syntax problem after answering correctly, possibly because the advance of the arrow suggested there was another syntax error. The lack of immediate judgment of answers was at variance with all prior PLATO experience.

Answer rejection caused many problems since it is completely unlike the behavior of the paper exam. Students had to pause to replace an arithmetic expression answer with an "e" (and "E" would not work either). The DO loop problem not only rejected answers, but gave only one second chance. Worst of all, the syntax problem would reject an answer and then reject all control keys until an acceptable answer was entered. The result of all these answer entry surprises may be a pause by the student after entering each answer. If so, this would be a possible explanation of the hesitation behavior described in section 6.3.

The nuisances of work habit change, interaction uncertainty, and surprise can combine to provoke very negative reactions to a system, as happened to subject SA. Considerable research is needed to learn how to detect and eliminate such problems, preferably before the system is implemented.

7. Conclusions

In videotaping as few as four subjects, it was not our intention to produce statistically defensible results, but rather to discover trends, to help form hypotheses, and to study videotaping as a tool. As a dividend, we were able to find numerous exam system improvements which - while only suggested by the data - were seen to be important by introspection and observation of larger groups of students.

7.1 Results and Action

The primary results in section 4 show that the experiment was not unduly biased by choice of subjects or their assignment to groups.
Two analyses of the time taken on PLATO versus that on paper were made:

a) "Representative" times, constructed by subtracting Trouble time and system overhead from the total time for a representative group, showed that the total time for "normal" exams on PLATO was very close to that for paper (12.2 minutes versus 11.9 minutes; the PLATO group had about three minutes more Trouble time and about four minutes more system overhead).

b) Micro-analysis of the eight categories of activities by the videotaped subjects showed that the PLATO group spent more time in each of four groups of activities - Productive work, Easily reducible, Inelastic, and Trouble.

The first of these analyses demonstrated that, under good conditions, PLATO time can be similar to paper time. The second indicated the benefits to be gained by efforts to reduce times for each group of activities. Among the specific steps we have taken to reduce PLATO time are these, as discussed in section 5:

- Selection time has been eliminated by defining two control keys to move between problem pages (rather than requiring a trip to the index for every Selection).
- Load time for character sets has been eliminated by prohibiting special character sets within Problem-Generator/Graders.
- Display and Generate time has been masked by reordering displays to present crucial information first and to display as much as possible during each step of generation.
- Trouble and Think time may be reduced by a number of other steps taken to simplify and standardize the various PG/G's.

It is our expectation that as a result of these steps a PLATO exam will usually take no more than 20% longer than a similar paper exam.

7.2 General Model

Through this work we have observed a number of variables operating to influence a subject's performance. One way to organize these diverse factors is sketched in figure 7.1. In this diagram, a student's general personality factors are in the box at the top. The three boxes on the left represent groups of parameters related to specific knowledge of the course material; the three at the right contain the groups of variables relating solely to usage of the terminal and PLATO system. On both the left and right we find some factors inherent in the student, some external, and some derived from the interaction of the other two. Finally, at the bottom we find the dependent variables observed in the experiment: score, time, trouble, review behavior, and PLATO preference.

[Figure 7.1 - the diagram of these variable groups - is not legible in this copy.]

The relations between boxes are not simple lines, but sets of lines connecting individual variables, e.g., system clarity to trouble. Similarly, the variables within each box are not independent; for example, probable relations between the dependent variables are shown. The arithmetic signs indicate that trouble may increase time while decreasing score and preference for PLATO. No sign is shown to review because trouble may increase it by increasing uncertainty or decrease it by increasing distaste for the system.
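Since the diagram itself did not survive reproduction, the few relations the text states explicitly can at least be recorded as data. This is an editorial sketch of the model's structure, not a reconstruction of the full figure.

    # Explicitly stated relations of figure 7.1, recorded as (source, target, sign).
    # Editorial sketch only; the full diagram contains many more connections.
    RELATIONS = [
        ("system clarity", "trouble", None),        # example of a box-to-box line
        ("trouble", "time", "+"),                   # trouble may increase time...
        ("trouble", "score", "-"),                  # ...while decreasing score
        ("trouble", "PLATO preference", "-"),       # ...and preference for PLATO
        ("trouble", "review behavior", None),       # no sign shown: effect could go either way
    ]

    for source, target, sign in RELATIONS:
        print(f"{source:15s} --{sign or '?'}--> {target}")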
We present the model not as a program for conscientious measurement, but rather as a means of clarifying relationships and suggesting directions for future work. Even partial understanding of a few of the variables can help us

- understand the impact of changes to the exam system and the relative value of different changes,
- understand how knowledge in one area of interactive system design carries over to other areas, and
- provide better tools for evaluating students, so as to provide information beyond the mere assignment of a single grade.

7.3 Videotape as a Tool for Experiments

Although videotape has proven its value in numerous fields, it was interesting to see how well it worked for our own experiment. Our expectation of precise measurements was borne out, not because we timed the tapes during replay, but because we videotaped a clock. Its face not only showed elapsed time, but also permitted coordination of the tape with the manual log of the session. Even with the recorded clock as an aid, analysis of three hours of tape took more than 15 hours.

We were pleasantly surprised that audio recording - a trivial by-product of the equipment - was valuable. It enabled us to tell exactly when a key was pressed and to distinguish certain forms of Trouble when the student requested help from the proctor.

A shortcoming we found was that - as we knew in advance - the tapes did not have enough resolution to record the text of the problems and answers. Two possible solutions are to record a close-up of the work with another camera or to record more details manually. More valuable, but more expensive, would be a complete record of terminal input/output, including timings. Such a system might permit computer-aided analysis of the terminal session with even more precise timings. In this case videotape analysis would be faster because it would only have to be scanned to observe the user's actions and reactions.

In summary, the videotape technique - when appropriately applied - can be a valuable tool for research into the detailed behavior of interactive computer users.
BIBLIOGRAPHIC DATA SHEET

Report No.: UIUCDCS-R-76-836
Title and Subtitle: A Videotape Analysis of Student Performance on an Interactive Examination
Report Date: October 1976
Authors: Wilfred J. Hansen, Richard Doring, Lawrence R. Whitlock
Performing Organization: Department of Computer Science, University of Illinois, Urbana, IL 61801
Contract/Grant No.: NSF EC41511
Sponsoring Organization: National Science Foundation, Washington, DC
Type of Report: Research
No. of Pages: 57

Abstract: Careful analysis of the behavior of users of interactive systems can yield important insights into the appropriate design of such systems. Because it has not been easy to determine user behavior precisely, we investigated videotaping as a tool. Our analysis of four students taking examinations both interactively and on paper showed that they took considerably longer interactively, primarily due to system overhead and trouble understanding instructions. The experiment revealed a number of important design changes for our system which we expect will reduce the excess time to no more than 10-20 percent. Videotaping led to analysis of many other variables including context switch time, review behavior, and habit changes. These and other observations led us to hypothesize a theory of nuisance, annoyance, and agitation that explains why some students have very negative reactions to interaction.

Key Words and Descriptors: interactive systems, user behavior, videotape, computer aided instruction, nuisance, annoyance, agitation

Availability: Unlimited
Security Class: UNCLASSIFIED