UIUCDCS-R-76-836

A VIDEOTAPE ANALYSIS OF STUDENT PERFORMANCE ON AN INTERACTIVE EXAMINATION

by

Wilfred J. Hansen
Richard Doring
Lawrence R. Whitlock

October 1976

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work supported in part by the National Science Foundation under grant EC41511.

(a) "Representative" Times

                                          PLATO                                 paper
                                Time   Std. Dev.   Subjects            Time   Std. Dev.   Subjects
  1) 4) Average of total        16.5     1.24      (SF,SG,SA,SB,SD)    12.2     1.86      (SE,SF,SB,SC,SD)
        time less Trouble time
  2) 5) Average non-productive   4.3      .84      (SC,SD,SA,SB)         .3      .12      (SC,SD,SA,SB)
        time (Select, Generate,
        Load, Display)
  3) 6) Representative          12.2                                    11.9
        Productive Time

(b) Distribution of Activities, average times

                                      (1)        (2)         (3)            (4)
                                     PLATO      paper     PLATO-paper    (3) as a %
                                                                         of total (2)
  Reducible overhead
    (Load, Select)                    2:32        :18         2:14           20%
  Inelastic overhead
    (Generate, Display)               1:46         --         1:46           16%
  Productive work
    (Think, Answer)                  14:39      10:33         4:06           37%
  Trouble                             2:56        :13         2:43           25%
  Total                              21:53      11:04        10:49           98%

Figure 4.6. Breakdown of the Extra Time Spent on PLATO. For description of the categories see the text. Times are in minutes and seconds. The figures in column (4) express the excess time on PLATO as a percentage overhead beyond the total time required on paper. (For example, the 4:06 minutes of additional productive work on PLATO is 37% of the total 11:04 minutes spent on the paper exam.)
4) The "representative paper time" is total time less Trouble time averaged over subjects SB, SC, SD, SE, and SF, who were all reasonably close to that value: 12.2. (Again we drop the two extreme cases: SA had great Trouble on PLATO but breezed through the paper exam; SG was painstaking enough on paper to achieve a perfect score.)

5) The "representative paper non-productive time" is .3, the average paper problem Selection time for the four taped subjects.

6) The "representative paper productive time" is the difference of the values from (4) and (5): 11.9.

We see that, taking account of total time, Trouble time, and non-productive time, the time spent on each exam was about the same.

The second approach provides insight into the relative influence of the factors responsible for the longer times on PLATO. In this approach the videotape data for the four taped subjects are categorized as shown in figure 4.6b. (These times differ from the "representative times" because they include subjects whose times were not representative of all seven subjects.) In the figure, "Productive work" is the time the user spent working on the problem, independent of the method. "Reducible overhead" is time that is easy to avoid by modification of the system. "Inelastic overhead" is inherent in the PLATO system. "Trouble" is the time when the user was having difficulty understanding the requirements. In column (4) the figure expresses PLATO excess time as a percentage of the time the exam took on paper. This basis will permit valid comparison when we discuss reduction of PLATO overhead in section 5.

Several other interesting observations can be made from the data and figure 4.5:

a) All four taped subjects wrote comments about Trouble on the post-test questionnaire. One even remarked - with cautious ambiguity - that "All of the instructions were not clear."

b) Taped subjects had more "activities" on PLATO than on paper, even when the activities are restricted to those actually performed by the human - Think, Answer, Select, and Trouble. From the data of figure 4.5, we find an average of 65 for PLATO and 57 for paper. This is partly because subjects reviewed more on PLATO and partly because, when a "Trouble" activity occurred, the subject often had to try several times to enter the answer.

c) "Overhead" on the PLATO exam was not negligible; the average time for Display, Generate, and Select per problem page was .36 minutes. Even though comfortingly constant, this time is frustrating to the student because there is little to do but wait. The corresponding value for the paper exam includes only Selection time, an average of .04 minutes. Overhead is especially noticeable on PLATO during review; it averaged over 10% on PLATO and only 3% for those who reviewed on paper. Indeed, for review, Display time is slightly larger than for the initial presentation because the student's prior work must be displayed; this is offset by the fact that no problem Generation time is needed.

4.2.2 Influences on scores

Factors other than ability influenced scores in this experiment to an unusual extent. Section 4.1.3 has mentioned the impact of the experimental situation; section 4.1.4 discussed the variation due to exam order. Other factors can be observed in figure 4.7, which presents scores and Think times for each problem. The figure demonstrates that PLATO itself did not reduce scores.
Instead, the major factor in lower PLATO scores is that the interactive grading algorithm in the DO loop problem subtracted too many points for a wrong answer, so subsequent correct answers got no credit. Indeed, we expect that PLATO will actually lead to higher scores because it aids the student in several ways. For example:

SB tried an invalid answer to the syntax problem and was rejected. The retry was correct, so the subject received full credit. (On paper the invalid answer would simply have been marked down.)

SC missed the first line on the DO problem but was given the correct answer and got the rest correct. (Relative grading could have achieved the same result.)

There does not appear to be any inequity in this approach, since all students stand the same chance of being corrected and because invalid answers are likely to be misconceptions that ought to be cleared up on paper by asking the proctor.

Inequity does occur, however, in the level of difficulty of problems. As shown by the average points per minute (last column in figure 4.7), some problems were fairly easy - problem 2 on PLATO and 4 on paper - while some were too hard - 3 and 4 on PLATO. (Indeed, problem 2 on PLATO was so easy that the only points missed were on the ASSIGN statement, a construct not covered in lecture.) Had there been a time constraint, a student who attempted one of the harder problems would be at a disadvantage.

[Figure 4.7 - scores and Think times for each problem - is not legible in this copy.]

In addition, there were great disparities in difficulty between different instances of the same problem:

- Division problems tend to be harder than other arithmetic expressions, yet a subject might receive 0, 1, or 2 of them.
- The syntax generator sometimes inserted no errors in simple statements and sometimes embedded an error deep in a complex statement. It sometimes generated instances of language features not yet studied (such as the action of a logical IF).
- The format problem required rounding up anywhere from zero to three times.
- Sometimes one iteration of the DO loop would modify an array value used by a later iteration. Few students analyzed this correctly.

Careful attention to relative difficulty is essential in interactive exam design. Generators in our improved system are based on a detailed analysis of the factors which contribute to difficulty.

5. System Improvements

As a consequence of this experiment, other observations, and introspection, the exam system has been improved in many ways. This section discusses these improvements roughly in the order a student encounters them. The percentage overheads are taken from figure 4.6; to compare them with improved versions, the denominator is the total time on the paper exam.
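As a quick check of that basis (the helper below is an editorial illustration, not part of the exam system), the column (4) percentages of figure 4.6b follow from dividing each category's excess PLATO time by the 11:04 total paper time:

    # Each category's excess PLATO time from figure 4.6(b), divided by the total
    # paper-exam time of 11:04.  Editorial sketch only.
    def mmss(s):
        minutes, seconds = s.split(":")
        return int(minutes) * 60 + int(seconds)

    PAPER_TOTAL = mmss("11:04")
    CATEGORIES = {                      # (PLATO, paper) average times from figure 4.6(b)
        "Reducible overhead": ("2:32", "0:18"),
        "Inelastic overhead": ("1:46", "0:00"),
        "Productive work":    ("14:39", "10:33"),
        "Trouble":            ("2:56", "0:13"),
    }
    for name, (plato, paper) in CATEGORIES.items():
        excess = mmss(plato) - mmss(paper)
        print(f"{name:20s} {excess:4d} s  {100 * excess / PAPER_TOTAL:3.0f}%")
    # Prints roughly 20%, 16%, 37%, and 25% - about 98% total overhead, as in the figure.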
5.1 Reducible Overhead

On PLATO, character set Loading and problem Selection time constituted a 20% overhead. Load time has been eliminated by restricting PG/G's to the standard set of 128 characters. It includes almost all characters used in common programming languages, and one or two extra characters can be loaded quickly for special problems (for example, the PL/I logical NOT).

Problem Selection time was bloated by a design that required a return to the cover page after every problem. The system has been modified so that the shifted-NEXT key transfers directly to the next problem page and shifted-BACK goes directly to the previous one. These two possibilities account for the vast majority of interproblem transitions. For random access, the shifted-DATA key returns the student to the cover page.

We have not yet found a satisfactory scheme for problems with multiple pages. Certainly shift-NEXT and shift-BACK should move among the pages of the problem, but should there be a "sub"-cover page for the problem? One pilot PG/G uses a vector of page numbers at the bottom of the page, but this scheme can be a little too confusing. The best interim solution seems to be avoidance of multiple-page problems.

5.2 Inelastic Overhead

A 16% overhead resulted from problem Generation and Display, mostly on the latter. Unfortunately, Display time depends on the communication link technology and cannot be reduced by changes to the exam system. Moreover, problem Generation time has increased slightly as more sophisticated approaches have been used. However, several steps have been taken to reorganize Generation and Display so the time required is more palatable and useful:

- Basic problem statements are presented before captions and descriptions of the control keys.
- Problem details are presented as they are generated, so the user has something to work on and does not feel the system has halted.
- Detailed instructions that follow standard conventions are accessible via the HELP key, but are not displayed as part of the main display.

Though these techniques create the display efficiently, our observations suggest that users often wait for the complete display before starting to work. Possibly more experience and time pressure will encourage productive use of display construction time.

Other exam systems generate the entire exam when the user first enters the system, or even earlier. We have not chosen this approach because it would be expensive in terms of disk accesses. Two (a read and a write) would be required for each PG/G during the generation-only phase. It is our hope that generation for each problem can be short enough to keep disruption to a minimum. (Similarly, grading could be deferred until completion of the entire exam, but is not because of the disk access limitation. Most PG/G's have straightforward, rapid grading algorithms.)

5.3 Productive Time

Although the excess productive time - Thinking and Answering - amounted to 37% of the time to do the exam on paper, there does not seem to be any inherent reason why it should be longer. Indeed, the "representative time" analysis showed similar productive times with both approaches.
In addition to the pressures discussed in 4.1.3, longer Think time on PLATO may result from these factors:

(i) Students are used to having an answer judged immediately on PLATO, so they work very hard to be sure they have the correct answer before entering it on the exam.

(ii) They do not know how to change an answer, or perceive it as a difficult process.

(iii) They suspect that a wrong answer will be counted in the grading algorithm. (The suspicion is not unwarranted; we have considered this approach.)

(iv) They fear "exposure," since their answers are more readable by the proctor (and harder to cover up) than they would be on paper.

Improved system design and more practice will help reduce the magnitude of at least the first two of these problems.

The PLATO time to enter an Answer was 19% longer than that on paper, but still amounted to only a second or two more per page. This is probably not a serious factor. Students did not express difficulty with typing answers, and typing ability was not a factor in our results. Many future students will be even more familiar with the keyboard because they will have been using PLATO for other courses. Nonetheless, the answer entry mechanism is a key element in system design; since no judgment of the answer is expressed, some other action must be taken to show acceptance. In the exam system (even at the time of the experiment) there are two areas for each answer - an entry area and a display area. After entry the answer is moved to the display area, so that even when a new answer is being entered, the old one is still on view in the display area. It is also important that the entry area be adjacent to the display area; this principle is violated by the PRINT FORMAT problem, and that problem is more disconcerting and confusing than the others.

5.4 Trouble Time

Trouble time - a 25% overhead in this experiment - is a very variable quantity and will always exist, even on paper exams; the best remedy is to provide a proctor for every exam. Several of the steps indicated above will also help reduce Trouble time, especially the provision that the shifted control keys for travel between pages must always work. In addition, practice examinations and written pre-test instructions help reduce surprise and confusion.

In view of the above, we can estimate the overhead for doing an exam on PLATO instead of on paper. Reducible time can be eliminated. Inelastic time can be reorganized so the student pays a penalty of perhaps no more than 5% of the paper exam time. Productive time need not be any longer, especially if students are given practice exams; we hope for a penalty of at most 5-10%. Similarly, Trouble time can be reduced by good design and training, so the penalty should also be no more than 5-10%. We conclude that it is reasonable to expect that the PLATO version of an exam should take no more than 20% longer than the same exam on paper.

An important question is whether students should spend any unnecessary time on exams. The benefits to the instructional staff of reduced preparation and grading time are economically tangible, but does a student receive any benefits for the extra time? Among the benefits are unbiased grading, help with getting the answer into the expected format, immediate knowledge of results, and increased flexibility in temporal or physical scheduling (if the student is willing to risk the absence of a proctor and if the obvious user-identity problem can be solved).
We believe these benefits justify the interactive examination approach.

6. Further Analyses

One problem with the videotape technique is the embarrassing richness and variety of data it provides. This section explores a few of the possibilities raised by our data.

6.1 "Context Switch" Time

It is not unlikely that when a subject returns to a familiar page it will take some time to switch mental "context" and recall the material. The experiment offers two types of page return: return to the cover page between problems and return to a problem for review.

The "context switch" time on return to the cover page is not directly available in our data because we encoded the entire time from the end of one problem to the start of the next as Select time. However, total problem Select time is known and has four distinct components: Display time for the cover page, "context switch" time, Think time, and the time to press a single key for the next problem. The computation is summarized in figure 6.1. Column (1) is the average Select time for those occasions (including quitting) where selection was via the cover page. Column (2) is the problem Display time computed by dividing the total PLATO Display time by the number of problem pages viewed. (Display time varies slightly with changes in system load.) The next three columns are estimates of the time to choose the next problem and to type the corresponding problem number. Column (3) estimates this as the average time required to select the next syntax problem by pressing shift-NEXT (the one case of Select time that did not use a cover page). Column (4) is the average time to enter an answer (Answer time divided by the number of activities from figure 4.5). For comparison, column (5) gives Select time on paper. Column (6) is just the maximum value from columns (3) and (4). The minimum context switch time plus think time in column (7) is the cover page Select time (1) less the sum of Display time (2) and minimum one-key Answer time (6).

               (1)      (2)      (3)      (4)      (5)      (6)      (7)
             Select  Display    NEXT   Answer   Select   Max of  Think &
                                                (paper)  (3),(4)  Switch

  SA          15.8      7.0      1.8      3.0      2.6      3.0      5.8
  SB          13.7      5.4      3.0      4.1      1.1      4.1      3.6
  SC          14.6      7.2      4.0      1.7      1.4      4.0      3.4
  SD          15.3      7.2      4.8      3.1      3.3      4.8      3.3

  N             33       45       16       95       35       56
  Avg.       14.88     6.84     3.56     3.15     2.03     4.00      4.0
  Std. Dev.   7.45     2.38     2.37     2.30     1.5      2.34

Figure 6.1. "Context Switch" Time Calculation. All times are in seconds. "N" is the total number of occurrences for these subjects. The averages and standard deviations are with respect to all occurrences. See text for description of columns. (The value of 4.0 for the average of column (7) is close to both its horizontal value - computed from the averages of columns (1), (2), and (6) - and its vertical value - the average of the values in column (7).)

The average of 4.0 seconds for context switch plus think time suggests that subjects spent that long just staring at the cover page and recovering from the previous problem. Since the thought involved is negligible, context switch time must be a large fraction of four seconds.
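The column (7) computation is simple enough to state as a one-line function; the sketch below (an editorial illustration, using the per-subject averages transcribed in figure 6.1) applies it to each subject. Small differences from the tabulated column (7) are expected, since the figure's own entries were averaged over individual occurrences.

    # Column (7) of figure 6.1: cover-page Select time, less Display time, less the
    # larger of the one-key NEXT and Answer times.  Editorial sketch, not the
    # authors' analysis code.
    def context_switch_plus_think(select_cover, display, next_key, answer):
        return select_cover - display - max(next_key, answer)

    figure_6_1 = {          # subject: columns (1), (2), (3), (4), in seconds
        "SA": (15.8, 7.0, 1.8, 3.0),
        "SB": (13.7, 5.4, 3.0, 4.1),
        "SC": (14.6, 7.2, 4.0, 1.7),
        "SD": (15.3, 7.2, 4.8, 3.1),
    }
    for subject, cols in figure_6_1.items():
        print(subject, round(context_switch_plus_think(*cols), 1))
    # Compare with column (7); the occurrence-weighted average reported in the text is about 4.0 s.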
The other approach to context switch time is to consider those occasions when a subject looked at a problem very briefly and went on to another. We can assume they spent that time mostly remembering their work on the problem and deciding that there was little likelihood for improvement. On PLATO, SA had three such occasions, SC had one, and SD had two, for an average time of 6.7 seconds (five were 5 or 7; one was 11). On paper, one subject had four such occasions with an average time of 7.8 seconds (values were 2, 7, 9, and 13).

One implication of "context switch" time is that every time the screen is erased the user pays a time penalty to get reacquainted with the new page. Personal observation suggests that this penalty (though probably less than four seconds) is paid even if the subsequent page has exactly the same format as the former. Thus problem design should emphasize putting a number of similar problems on one display, rather than erasing the screen each time a new question is to be asked.

"Context switch" time suggests that there is some penalty in starting to read any new page and a greater penalty on encountering a new page format. For this reason PG/G's should not use diverse page layouts. In particular, ours all place headings and standard key conventions in the same places.

6.2 PLATO Experience

Intuitively, subjects with more prior exposure to PLATO should do better than others on the automated exam system. However, no hint of this relationship can be derived from our data, whether we consider all seven subjects or only those who were videotaped. For example, we can compare SA and SD, who had a total of thirty hours of prior experience, with SC and SB, who had a total of six hours. It is true that SA and SD spent less time on the PLATO exam, but they also spent less time on the paper exam; moreover, the ratio of PLATO times is less than the ratio of paper times. That is, SA and SD were not as much faster as would be predicted by their speed on the paper exam. For scores we can compare the groups on first hour exam score, PLATO score, and paper score. The ratios are 1.07, 1.24, and 1.15, so those with more PLATO experience scored slightly higher on PLATO than would be predicted by other factors alone. However, the difference is not convincing. It is possible that the learning curve is such that three hours of PLATO exposure was enough to learn the features used by the exam system.

Another possibility is that prior experience on PLATO was actually detrimental. Prior PLATO experience could not have taught the same set of key conventions as the exam system, especially the return to the cover page after each problem. Indeed, some PLATO training - for example, waiting for a NO-or-OK judgment on an answer - is antithetical to the behavior of the exam system. Possibly the higher scores achieved by those with more PLATO experience are simply due to more practice with the course material gained by exploiting the available PLATO lessons.

In addition to exposure to PLATO, we can ask whether exposure to typing affects performance at the terminal. One subject who had little PLATO exposure but had had a typing course did reasonably well. Another who claimed to be a better-than-good typist had little Trouble and entered Answers quickly, but had much more Think time on PLATO.

6.3 Review Behavior

A very simple exam system could be implemented if students simply worked exam problems one after the other. No one does, however, so it is important to study the variety and extent of "review behavior" - that is, of return to problems previously worked on. Such behavior can be a clue to a student's confidence in self and in the answers, so investigation of review behavior can add depth to our knowledge of personality and ability. Two types of review are apparent in our data: "hesitation" - the review of a page prior to going on to another - and "rework" - the return to a page.
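As an editorial illustration only (the record format and the five-second lingering threshold below are our assumptions, not the coding rules used for the tapes), the two behaviors could be extracted from a subject's page-visit records roughly as follows:

    # Sketch of classifying one subject's page visits into the two review behaviors
    # defined above.  Field names and threshold are illustrative assumptions.
    from collections import namedtuple

    Visit = namedtuple("Visit", ["page", "seconds_after_last_answer"])

    def classify_reviews(visits, hesitation_threshold=5.0):
        """Return (hesitated_pages, reworked_pages) for one session."""
        seen, hesitated, reworked = set(), set(), set()
        for v in visits:
            if v.page in seen:                              # came back to an earlier page
                reworked.add(v.page)
            if v.seconds_after_last_answer >= hesitation_threshold:
                hesitated.add(v.page)                       # lingered before moving on
            seen.add(v.page)
        return hesitated, reworked

    # Pages 1-4 worked in order, a pause on page 3, and a later return to page 2.
    session = [Visit(1, 1.0), Visit(2, 2.0), Visit(3, 8.0), Visit(4, 1.5), Visit(2, 3.0)]
    print(classify_reviews(session))                        # ({3}, {2})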
Our subjects exhibited a variety of review behaviors, as indicated in figures 6.2 and 6.3. For example, SD reworked no problems on paper, but all of them on PLATO. Subject SC - displaying either caution or uncertainty - hesitated on half the problems and reworked almost all. Most subjects hesitated on half the problems, but SC spent considerably more time at it. Our limited data did not reveal any relationships between review behavior and any one of score, PLATO satisfaction, or total non-review productive work time.

[Figures 6.2 and 6.3 - review-behavior tabulations by subject and by problem - are not legible in this copy.]

One good technique for examination taking is to scan the entire exam before starting work. Though none of our subjects adopted this strategy, it is important to note its impracticality on PLATO due to the high overhead of page turning. Despite this, our subjects did rework more problems on PLATO. Such extra rework may reflect uncertainty about the answers, but the answers were not changed any more than on paper. More likely, the increased rework was to have the assurance that the system really had retained the student's answers.

The data in figure 6.3 illustrate the detail of analysis that can be accomplished with the videotape approach. In this table each problem for each subject is assigned to a cell according to whether the subject hesitated on or reworked the problem and whether the answer was changed during rework. The data suggest the following hypotheses:

a) If a student hesitates on and later reworks a problem, it is as likely to be changed as not.

b) If there is hesitation, but no rework, the answer is likely to be correct.

c) If there was no hesitation, there is unlikely to be any change during a rework.

Hypothesis (b) might correspond to the subject gaining confidence during the hesitation. Similarly, hypothesis (c) might correspond to complete confidence in the solution. Such cues, if valid, would provide a basis for evaluation of self-confidence. Subsequent comparison with the final score would provide a measure of how realistic the student is. Knowledge of this factor would be invaluable, for example, in consultation with the student concerning study habits.

6.4 Nuisances, Annoyance, and Agitation

Many of our other observations contribute to the theory of "nuisances, annoyance, and agitation": A nuisance is a system action or requirement that bothers a user. Annoyance is the user's immediate response to a nuisance. Agitation is the cumulative effect of multiple annoyances. In this theory, annoyance and agitation decay exponentially, but agitation decays more slowly each time one annoyance follows closely on another.
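To make the shape of this theory concrete, here is a small numerical sketch (ours, not the authors'); the impulse sizes, rate constants, and the rule for slowing the agitation decay are illustrative assumptions, since the report proposes the theory only qualitatively.

    # Toy simulation of the nuisance/annoyance/agitation theory described above.
    # All constants are illustrative assumptions, not measurements from the report.
    import math

    def simulate(nuisance_times, horizon=60, dt=1.0,
                 annoyance_rate=0.5, agitation_rate=0.05, closeness=10.0):
        annoyance = agitation = 0.0
        agitation_decay = agitation_rate
        last = None
        trace = []
        for step in range(int(horizon / dt) + 1):
            t = step * dt
            if any(abs(t - n) < dt / 2 for n in nuisance_times):   # a nuisance occurs
                annoyance += 1.0                                   # immediate response
                agitation += 1.0                                   # cumulative effect
                if last is not None and t - last <= closeness:
                    agitation_decay *= 0.5                         # close together: slower decay
                last = t
            annoyance *= math.exp(-annoyance_rate * dt)            # both decay exponentially,
            agitation *= math.exp(-agitation_decay * dt)           # agitation more slowly
            trace.append((t, round(annoyance, 2), round(agitation, 2)))
        return trace

    # Two closely spaced nuisances: annoyance fades within seconds, agitation lingers.
    for t, annoyance, agitation in simulate([10, 15])[::10]:
        print(t, annoyance, agitation)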
Both decay rates depend on the personalities of the individual users. For example, we can hypothesize someone with a "rigid" personality who would adapt well to the constraints of interactive computing, but would have increased agitation when the system fails to be consistent. Even non-rigid personalities may suffer on exams, where agitation is compounded with tension and ego-involvement.

Quantification of agitation may have a significant impact on system design. If an economical measure can be found, the system could detect problems and modify itself or suggest alternative approaches to the user. If measurement requires extensive subsidiary equipment, it can still be a valuable tool in testing system designs and suggesting training regimens. Ultimately, understanding of the causes of agitation can lead to better methodologies for initial system design. At present, agitation itself can best be measured indirectly, as by the "satisfaction" scale in figure 4.2 or by inference from behavior.

However, direct observation can reveal nuisances such as the three classes we found in our experiment: work habit change, interaction uncertainty, and surprise.

A work habit - for example, underlining key words or lifting the page corner while finishing an answer - may help a worker concentrate and reduce anxiety. When a tool change forces a change in habit, the worker will suffer momentary distraction each time a habitual action is thwarted. Unfortunately, a switch from paper to an interactive system - no matter how valuable otherwise - requires significant habit changes. The biggest changes are the switch from pen or pencil to keyboard, the increased formality of page turning, and the severe constraints on modification of the visible image. Marginal notes are no longer possible; for even a simple manual calculation the entire torso must turn. As evidence of changes in work habits we present in figure 6.4 the times subjects spent calculating with paper and pencil. The first two subjects performed similarly on both media, but the other two did much less hand calculation on PLATO. Curiously, the latter two also had higher "satisfaction" scores. In all cases, the total times and scores were similar for both PLATO and paper. Another variety of change was exhibited by SB, who pointed at the variables and their values on paper, but not on the PLATO display.

                     manual calculations            think and answer
                     PLATO         paper          PLATO          paper
  subject          time    #     time    #      time  score    time  score   satisfaction
  SA   PLA         1:49    5     1:50    5      1:43    15      :23    12        1.7
  SB   paA          :10    1      :23    1      4:56     9     3:38     9        3.2
  SC   PLB          :51    3     2:12    6      5:07    12     2:14    12        3.6
  SD   paB          :55    2     2:46    1      3:18     9      :32     6        3.9

Figure 6.4. Comparison of Manual Calculation Behavior for the Arithmetic Expressions Problem. Times are in minutes and seconds. "#" is the number of instances of manual calculation. The "productive time" in figure 4.7 is the sum of the two times shown here. The "satisfaction" score is copied from figure 4.2.

Interaction uncertainty refers to a little-noticed disadvantage of the very flexibility which is the greatest advantage of interactive systems. Because a system can exhibit so many behaviors, it is far less predictable than a piece of paper; a new user must always wonder, "What will it do now?" In an instructional lesson, imaginative variety can help maintain student interest, but on an exam it only heightens anxiety. Uncertainty arises from the variability of the time to create a screen image and from the fact that images may be constructed in random order.
The student must integrate the image as it forms (and hope to find and read each piece before the next appears), or wait for the full display. Our system may further increase insecurity by moving each answer after it is entered, and by occasionally rejecting an answer - features which have been cited as advantages in other sections of this paper.

The evidence for interaction uncertainty is sketchy. The videotapes seem to show subjects waiting patiently for the full display and then moving to an attitude of attention when the display is complete. In addition, subjects reviewed more on PLATO, both hesitating and reworking longer and more often (see section 6.3), possibly to check that the system had actually retained their answers.

In section 5 we advocated nonlinear image generation to emphasize key points and utilize generation time. If interaction uncertainty is a severe problem, it may be necessary to reconsider this choice. In any case, students must receive considerable exposure to PLATO and the exam system prior to the exam in order to gain confidence in the predictability of the system.

Surprise nuisances are unusual actions which the student will encounter only rarely, whether because they are bugs, are inconsistent with other system behavior, or occur only when the student performs an uncommon sequence of actions.

Several aspects of the exam system surprised our subjects. The return of the arrow to the top of the arithmetic expression page prompted one subject to turn and ask what had happened. Another subject thoroughly reconsidered a syntax problem after answering correctly, possibly because the advance of the arrow suggested there was another syntax error. The lack of immediate judgment of answers was at variance with all prior PLATO experience.

Answer rejection caused many problems since it is completely unlike the behavior of the paper exam. Students had to pause to replace an arithmetic expression answer with an "e" (and "E" would not work either). The DO loop problem not only rejected answers, but gave only one second chance. Worst of all, the syntax problem would reject an answer and then reject all control keys until an acceptable answer was entered. The result of all these answer entry surprises may be a pause by the student after entering each answer. If so, this would be a possible explanation of the hesitation behavior described in section 6.3.

The nuisances of work habit change, interaction uncertainty, and surprise can combine to provoke very negative reactions to a system, as happened to subject SA. Considerable research is needed to learn how to detect and eliminate such problems, preferably before the system is implemented.

7. Conclusions

In videotaping as few as four subjects, it was not our intention to produce statistically defensible results, but rather to discover trends, to help form hypotheses, and to study videotaping as a tool. As a dividend, we were able to find numerous exam system improvements which - while only suggested by the data - were seen to be important by introspection and observation of larger groups of students.

7.1 Results and Action

The primary results in section 4 show that the experiment was not unduly biased by choice of subjects or their assignment to groups.
Two analyses of the time taken on PLATO versus that on paper were made:

a) "Representative" times, constructed by subtracting Trouble time and system overhead from the total time for a representative group, showed that the total time for "normal" exams on PLATO was very close to that for paper (12.2 minutes versus 11.9 minutes; the PLATO group had about three minutes more Trouble time and about four minutes more system overhead).

b) Micro-analysis of the eight categories of activities by the videotaped subjects showed that the PLATO group spent more time in each of four groups of activities - Productive work, Easily reducible, Inelastic, and Trouble.

The first of these analyses demonstrated that, under good conditions, PLATO time can be similar to paper time. The second indicated the benefits to be gained by efforts to reduce times for each group of activities. Among the specific steps we have taken to reduce PLATO time are these, as discussed in section 5:

- Selection time has been eliminated by defining two control keys to move between problem pages (rather than requiring a trip to the index for every Selection).
- Load time for character sets has been eliminated by prohibiting special character sets within Problem-Generator/Graders.
- Display and Generate time has been masked by reordering displays to present crucial information first and to display as much as possible during each step of generation.
- Trouble and Think time may be reduced by a number of other steps taken to simplify and standardize the various PG/G's.

It is our expectation that as a result of these steps a PLATO exam will usually take no more than 20% longer than a similar paper exam.

7.2 General Model

Through this work we have observed a number of variables operating to influence a subject's performance. One way to organize these diverse factors is sketched in figure 7.1. In this diagram, a student's general personality factors are in the box at the top. The three boxes on the left represent groups of parameters related to specific knowledge of the course material; the three at the right contain the groups of variables relating solely to usage of the terminal and PLATO system. On both the left and right we find some factors inherent in the student, some external, and some derived from the interaction of the other two. Finally, at the bottom we find the dependent variables observed in the experiment: score, time, trouble, review behavior, and PLATO preference.

[Figure 7.1 - the diagram of these variable groups - is not legible in this copy.]

The relations between boxes are not simple lines, but sets of lines connecting individual variables, e.g., system clarity to trouble. Similarly, the variables within each box are not independent; for example, probable relations between the dependent variables are shown. The arithmetic signs indicate that trouble may increase time while decreasing score and preference for PLATO. No sign is shown to review because trouble may increase it by increasing uncertainty or decrease it by increasing distaste for the system.
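Since the diagram itself did not survive reproduction, the few relations the text states explicitly can at least be recorded as data. This is an editorial sketch of the model's structure, not a reconstruction of the full figure.

    # Explicitly stated relations of figure 7.1, recorded as (source, target, sign).
    # Editorial sketch only; the full diagram contains many more connections.
    RELATIONS = [
        ("system clarity", "trouble", None),        # example of a box-to-box line
        ("trouble", "time", "+"),                   # trouble may increase time...
        ("trouble", "score", "-"),                  # ...while decreasing score
        ("trouble", "PLATO preference", "-"),       # ...and preference for PLATO
        ("trouble", "review behavior", None),       # no sign shown: effect could go either way
    ]

    for source, target, sign in RELATIONS:
        print(f"{source:15s} --{sign or '?'}--> {target}")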
We present the model not as a program for conscientious measurement, but rather as a means of clarifying relationships and suggesting directions for future work. Even partial understanding of a few of the variables can help us

- understand the impact of changes to the exam system and the relative value of different changes,
- understand how knowledge in one area of interactive system design carries over to other areas, and
- provide better tools for evaluating students, so as to provide information beyond the mere assignment of a single grade.

7.3 Videotape as a Tool for Experiments

Although videotape has proven its value in numerous fields, it was interesting to see how well it worked for our own experiment. Our expectation of precise measurements was borne out, not because we timed the tapes during replay, but because we videotaped a clock. Its face not only showed elapsed time, but also permitted coordination of the tape with the manual log of the session. Even with the recorded clock as an aid, analysis of three hours of tape took more than 15 hours.

We were pleasantly surprised that audio recording - a trivial by-product of the equipment - was valuable. It enabled us to tell exactly when a key was pressed and to distinguish certain forms of Trouble when the student requested help from the proctor.

A shortcoming we found was that - as we knew in advance - the tapes did not have enough resolution to record the text of the problems and answers. Two possible solutions are to record a close-up of the work with another camera or to record more details manually. More valuable, but more expensive, would be a complete record of terminal input/output, including timings. Such a system might permit computer-aided analysis of the terminal session with even more precise timings. In this case videotape analysis would be faster because it would only have to be scanned to observe the user's actions and reactions.

In summary, the videotape technique - when appropriately applied - can be a valuable tool for research into the detailed behavior of interactive computer users.
BIBLIOGRAPHIC DATA SHEET

Report No.: UIUCDCS-R-76-836
Title and Subtitle: A Videotape Analysis of Student Performance on an Interactive Examination
Report Date: October 1976
Authors: Wilfred J. Hansen, Richard Doring, Lawrence R. Whitlock
Performing Organization: Department of Computer Science, University of Illinois, Urbana, IL 61801
Contract/Grant No.: NSF EC41511
Sponsoring Organization: National Science Foundation, Washington, DC
Type of Report: Research
No. of Pages: 57

Abstract: Careful analysis of the behavior of users of interactive systems can yield important insights into the appropriate design of such systems. Because it has not been easy to determine user behavior precisely, we investigated videotaping as a tool. Our analysis of four students taking examinations both interactively and on paper showed that they took considerably longer interactively, primarily due to system overhead and trouble understanding instructions. The experiment revealed a number of important design changes for our system which we expect will reduce the excess time to no more than 10-20 percent. Videotaping led to analysis of many other variables including context switch time, review behavior, and habit changes. These and other observations led us to hypothesize a theory of nuisance, annoyance, and agitation that explains why some students have very negative reactions to interaction.

Key Words and Descriptors: interactive systems, user behavior, videotape, computer aided instruction, nuisance, annoyance, agitation

Availability: Unlimited
Security Class: UNCLASSIFIED