UNIVERSITY OF ILLINOIS BULLETIN
ISSUED WEEKLY
Vol. XVIII    JANUARY 24, 1921    No. 21
(Entered as second-class matter December 11, 1912, at the post office at Urbana, Illinois, under the Act of August 24, 1912. Acceptance for mailing at the special rate of postage provided for in section 1103, Act of October 3, 1917, authorized July 31, 1918.)

BUREAU OF EDUCATIONAL RESEARCH
BULLETIN No. 5

Report of Division of Educational Tests for '19-20

BY WALTER S. MONROE
Assistant Director, Bureau of Educational Research, University of Illinois

PRICE, 25 CENTS

PUBLISHED BY THE UNIVERSITY OF ILLINOIS
URBANA, ILLINOIS

Bulletins of the Bureau of Educational Research
B. R. Buckingham, Editor

EDITORIAL INTRODUCTION

We should like to have the reader consider this monograph as, in a certain sense, "chips from the work-shop." We hold that no organization, such as that from which this bulletin emanates, should collect from users of test materials the results which they have attained in their localities and hoard them in miserly fashion for its own purposes. Moreover, it is not so nominated in the bond. It is understood that when copies of score sheets are mailed to us we are to combine them into master score sheets and to issue tabulations which will indicate, over a wider field than any school system affords, the conditions disclosed by the tests in question.

Although the Bureau of Educational Research of the University of Illinois went out of the business of distributing tests last November, there had been collected up to that time a valuable body of data which was augmented during the succeeding months until it now appears to justify publication.
Aside, therefore, from chapters three and four, which deal with Monroe's Standardized Reasoning Test in Arithmetic and his Timed Sentence Spelling Test, the bulletin is devoted to presenting material which will make it possible to use a number of tests more intelligently. We do not, as Dr. Monroe says, attempt in this bulletin to give directions for administering tests or interpreting results. We are mainly concerned with what the results are.

We are continually receiving questions from practical workers in the field. These questions have led us to believe that such workers are much interested in and perplexed by the question of standards. Realizing this fact we have tried to make our presentation of results as complete and helpful as possible. They are presented substantially in three ways.

First, we show for each test the median scores by grade. In tables devoted to this sort of data we also include the 25- and 75-percentiles. To those who understand the meaning of these latter figures the nature of the distribution out of which the medians arise will be made evident. If the 75- and 25-percentiles are far apart it means that the data are scattered; in other words, that the distribution of scores spreads over a wide range.

In order that there might be no doubt about the nature of the distribution, we have in the second place presented for each test the number of pupils attaining the indicated scores in each grade to which the test was applied. The value of this sort of showing is greater than the practical teacher is likely to realize upon first inspection. Such a table may be converted into a table indicating a distribution in terms of percents by dividing each of the entries by its column total. When the table is thus converted it becomes directly comparable with a similar table which may be computed for a school or school system.
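The conversion just described, in which each entry is divided by its column total, might be sketched as follows. This is only an illustration under stated assumptions: the function name `percent_distribution` and the small frequency table are hypothetical and are not taken from the tables of this bulletin.

```python
def percent_distribution(frequencies):
    """Convert per-grade score frequencies into percents of each grade's total.

    `frequencies` maps a grade label to a {score: number_of_pupils} mapping.
    """
    percents = {}
    for grade, counts in frequencies.items():
        total = sum(counts.values())  # the column total for this grade
        percents[grade] = {
            score: round(100 * n / total, 1) for score, n in counts.items()
        }
    return percents


# Hypothetical frequencies, invented for illustration only.
example = {
    "IV": {0: 5, 5: 20, 10: 50, 15: 25},   # 100 pupils
    "V": {0: 4, 5: 36, 10: 80, 15: 80},    # 200 pupils
}
result = percent_distribution(example)
```

Because each column is reduced to percents of its own total, grades (or school systems) with different numbers of pupils can be set side by side.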
Moreover, since it is customary to give the grade medians in connection with this type of table (and the custom is followed in this bulletin), the teacher may learn from these figures the number and percent of pupils in each grade exceeding or falling short of the median of other grades. A teacher may likewise discover from such tables a number of subordinate facts concerning the test and its applicability to the grades in question: such facts as the number of zero scores or scores in the neighborhood of zero, the number of perfect or nearly perfect scores, and the nature of the distribution of the frequencies throughout.

But it is probable that the greatest usefulness of these distribution tables is of another sort. They are indispensable to those who wish to contribute toward the better standardization of tests. For example, the 3000 pupils, more or less, in each grade, whose scores are shown in Table II for Monroe's Reasoning Test may reasonably be thought to be insufficient for a final standardization. This tabulation provides a form and makes a beginning for a more reliable treatment of the test in question. Any superintendent can place the pupils whom he has tested, be they few or many, in this scheme. Any bureau of research may gather scores from schools and school systems in this manner. After a little it may (and indeed it should) publish its findings in this manner to the end that more reliable standards may be secured.

It is because tables of this sort are costly to print and of little direct school use that they are so seldom seen. They are frequently found after they have been converted into percentage distributions, because the latter are useful in making comparisons. But they are seldom found in mere frequency form. Yet the presentation of such tables is fundamental to cooperative effort. In our judgment every research organization ought to publish material in this form.
Its high value for research purposes should be appreciated in contrast with its low value for immediate practical purposes.

On the other hand, the third form of tabulation is of most value for school uses. We are referring to the percentile tables presented in the appendix. We are convinced that when this type of material is better understood it will be much more widely used. By means of it a teacher may "place" a pupil's score among one hundred scores, arranged from highest to lowest, these one hundred scores being regarded as typical. Thus the percentile table will enable a fifth-grade teacher to state that a pupil is (say) twentieth among one hundred typical children of his grade in speed of reading, that he is thirty-seventh in the operations of arithmetic, fiftieth in spelling, etc. If he is fiftieth in spelling we have the special case of the median, which we ordinarily arrive at from another point of view.

In using percentile tables such as those given in the appendix of this bulletin, regard must be had for the source of the tables. In its ideal form a percentile table is supposed to have been derived from a sufficient random sampling of a total "population," e.g., from the entire fifth grade in American schools, or from the entire number of ten-year-olds in rural schools, or from the entire number of graduates of the Chicago high schools. In ranking a child's performance one must be sure either that he belongs to the population to which the table refers or that the populations of both the child and the table are indicated. Thus if a fifth-grade pupil obtains a score in composition equal to the 80-percentile for his grade, we thereby define his rank as twentieth from the top (or eightieth from the bottom) among one hundred typical fifth-grade children. A pupil thus ranked has evidently done rather well compared with pupils of his own grade. Very appropriately, therefore, we may wish to rank him with reference to the sixth grade.
His score may perhaps equal the 60-percentile of the sixth grade. Accordingly, he would be ranked, on his performance, as fortieth from the top among one hundred typical sixth-grade children. Similarly he may rank as fiftieth (median) for the seventh grade.

We submit these percentile tables for their practical utility. They are, however, based upon a limited number of cases; and they will be somewhat modified when more scores have been made available.

From the above statements it will be clear that the chief purpose of this bulletin is to furnish an accounting of the test results which we have received. Nevertheless, we have included two chapters (III and IV) on the derivation of Monroe's reasoning tests and timed sentence tests. These accounts have been held up a long time. When, therefore, they were released, we took account of the demand that has been made for them (especially the one relative to the reasoning test) and incorporated them into this report of the Division of Educational Tests.

TABLE OF CONTENTS

Chapter
I. Introduction
II. Tentative Grade Norms
    A. Monroe's Standardized Reasoning Tests in Arithmetic
    B. Buckingham's Scale for Problems in Arithmetic
    C. Monroe's Diagnostic Tests in Arithmetic
    D. Monroe's Standardized Silent Reading Tests
    E. Charters' Diagnostic Language, and Language and Grammar Tests
    F. Willing's Scale for Measuring Written Composition
    G. Harlan's Test for Information in American History
    H. Sackett's Scale in United States History
    I. Hotz's First Year Algebra Scale
    J. Minnick's Geometry Tests
    K. Holley's Sentence Vocabulary Scale
    L. Holley's Picture Completion Test for Primary Grades
III. The Derivation of Monroe's Standardized Reasoning Tests in Arithmetic
IV. Monroe's Timed Sentence Spelling Tests and Pupils' Errors
Appendix: Percentile Scores
    1. Monroe's Standardized Reasoning Tests in Arithmetic
    2. Monroe's Standardized Silent Reading Tests
    3. Charters' Diagnostic Language Tests for Grades III to VIII
        A. Pronouns
        B. Verbs
        C. Miscellaneous A
        D. Miscellaneous B
    4. Charters' Diagnostic Language and Grammar Tests for Grades VII to VIII
        A. Pronouns
        B. Verbs
        C. Miscellaneous
    5. Willing's Scale for Measuring Written Composition
    6. Harlan's Test for Information in American History
    7. Hotz's First Year Algebra Scale
    8. Holley's Sentence Vocabulary Scale

CHAPTER I
INTRODUCTION

Source of data. In distributing educational tests the Bureau of Educational Research has always supplied a duplicate class record sheet on which was printed a request that the duplicate be returned after the scores had been entered upon it. There has been no effort to follow up the purchasers of the tests in order to secure complete returns of the scores. Consequently, this report is based upon the scores voluntarily contributed. The bulk of the scores are from medium-sized cities. Reports have been received from a few large cities (population of 100,000 or more) for Monroe's Standardized Silent Reading Tests, but in no other case has more than one such city reported. Practically no scores were reported from rural schools except for Monroe's Standardized Silent Reading Tests. Several tests distributed by the Bureau of Educational Research are not included in this report for the reason that the number of scores reported seemed to be too small to justify any announcement of median scores which would be useful as tentative standards.

Form of the report. The distributions of scores entered upon the class record sheets were combined to form a total distribution of scores for each yearly grade. No attempt was made to keep separate the scores of the A and B sections of the yearly grades. In addition to the median scores for each grade the 25- and 75-percentile scores are also given for several of the tests. In a few cases the total distributions are given because they have a special significance.
In the Appendix of this bulletin the 5-, 10-, 20-, 30-, 40-, 50-, 60-, 70-, 80-, 90-, and 95-percentile scores are given for a number of the tests. The publication of the percentile scores is prompted by the desire to make possible a more accurate interpretation of a pupil's score than merely that it is above or below the median score. Pupils belonging to any grade exhibit large individual differences. For this reason it is frequently desirable to know where the score of a pupil places him in the total distribution from which the median for his grade was calculated. The percentile scores make it possible to ascertain for any pupil his approximate position in this total distribution. For example, if a pupil's score is equal or superior to the 80-percentile score for his grade, he ranks in the upper 20 percent of all pupils to whom the test was given in that grade.

Time of testing. No attempt was made to organize the giving of the tests. Consequently, the scores on which the median and percentile scores are based represent testings all the way from September to June. This condition makes the derived scores somewhat less useful as tentative grade standards than they would be if they were based upon measures obtained at some one fixed time during the school year. The situation is further complicated by reason of the fact that some of the schools reporting have semi-annual promotion while others have annual promotion. Some of those which have semi-annual promotion combined the A and B divisions of each grade in making their reports. In order that the median and percentile scores shall have as definite a meaning as possible we have estimated for each test the month of the school year which the median scores appear to represent. This estimate, however, must be considered only approximate.

An organized, cooperative plan which would have resulted in the tests being given on one or more fixed dates during the school year was not attempted for two reasons.
In the first place, the complete realization of such a plan was impossible because in a large number of school systems the tests were given as a part of a local plan. The Bureau of Educational Research believes it is wise to encourage this use of educational tests. When the tests are given merely as a part of a project originated by a central bureau of research, little use is likely to be made of the results by the schools giving them. Their motive is to cooperate with the central bureau, and when this has been completed there is a tendency for them to feel that all has been accomplished which may be accomplished. This results in a great waste. Information which might be of much value to the local school systems is not used. Furthermore, this practice tends to engender an attitude toward educational tests that they are merely tools of research to be used by central research bureaus and not tools which may be used by a school system, or even a teacher, in improving instruction.

A second reason for not attempting to organize the giving of the tests was that tests were distributed in all sections of the United States and in a few foreign countries. It would have been impossible to solicit the cooperation of all who gave the tests in a plan of organized testing.

The interpretation of scores by comparison with grade norms. It is not the purpose of this bulletin to give detailed suggestions concerning the use of the grade medians and percentile scores in the interpretation of the scores obtained in any school system by giving the tests. In another place¹ the writer has indicated in some detail the general procedure to be followed in interpreting scores for the purpose of improving instruction. Grade norms are also useful in interpreting the scores of pupils for the purpose of classification.² In any interpretation of scores, either individual or group, it is necessary to bear in mind certain limitations.
In the first place, none of our educational tests yield scores which are absolutely accurate. The errors of measurement are large in comparison with the errors made in the measurement of physical objects. Errors larger than the difference between the median scores for successive school grades frequently occur, although in the case of our better tests the majority of the errors are less. These errors of measurement are chance errors and for that reason tend to neutralize each other in the median and average scores of groups. Therefore, the group scores are more accurate than individual scores. However, in interpreting either type of scores one should bear in mind the possible errors of measurement which they may include.

¹ Monroe, Walter S. "Improvement of Instruction Through the Use of Educational Tests," Journal of Educational Research, I (February, 1920), 96-102.
² Buckingham, B. R. "Suggestions for procedure following a testing program: I, Reclassification," Journal of Educational Research, II (December, 1920), 787-801.

In the second place, the score which a pupil makes on any subject-matter test, such as reading, arithmetic, history, or language, depends in part upon his general intelligence. Pupils belonging to any school grade differ widely with respect to their general intelligence and consequently may be expected to differ in their achievements. For this reason some pupils belonging to a given grade should have scores above the median, while others may be expected to have scores below the median because of their differences in capacity to learn. There are also differences in the average general intelligence of pupils belonging to the same school grade. For example, the average general intelligence of the fifth-grade pupils in one school may be a year or more above that of the fifth-grade pupils in another school.
It is unfair to both pupils and teachers to interpret achievement scores without recognizing the differences which may exist in the general intelligence of the pupils. To do so will frequently result in arriving at erroneous conclusions. Hence, grade standards such as are given in this report must be used with due caution.

Chapters III and IV contain reports of studies which were made by the writer during the time he was Director of the Bureau of Educational Measurements and Standards of the Kansas State Normal School, Emporia, Kansas. These reports were originally prepared for publication by that institution. Permission has been obtained to incorporate them in this bulletin. In doing this the manuscript has been only slightly revised. Chapter III gives an account of the derivation of the Monroe Standardized Reasoning Tests in Arithmetic. Chapter IV contains a description of the derivation of Monroe's Timed Sentence Spelling Tests and a report of a study of pupils' errors in spelling based upon them.

CHAPTER II
TENTATIVE GRADE NORMS

The percentile scores as well as the median scores which are given in this chapter should be used only as tentative grade standards. For several of the tests the number of scores on which these are based is so small that the standards can not be thought of as final. When other scores are added to the distributions, it is likely that different medians will be obtained. Furthermore, such standards should always be thought of as representing the average of present conditions and not as being ideal standards or what ought to be.

A. Monroe's Standardized Reasoning Tests in Arithmetic

The derivation of these tests, which consist of a series of one- and two-step problems, is described in Chapter III. For each problem two values were calculated, "correct principle value," or P, and "correct answer value," or C.
These values represent the credit which is to be given for solving the problem correctly in principle and for obtaining the correct answer. Each problem is marked for correct principle. If a problem is solved correctly in principle it is further marked with reference to correct answer. A pupil does not receive credit for a correct answer if the problem was solved by the wrong principle. The directions for administering the tests provide for having the pupils mark the problem on which they are working at the end of ten minutes. In this way a rate score may be obtained. It is the sum of the "principle values" of the problems which are solved correctly in principle within ten minutes. However, the obtaining of the rate score is optional, and it was reported in only about half of the cases.

TABLE I. MONROE'S STANDARDIZED REASONING TESTS IN ARITHMETIC. FORM I. GRADE NORMS FOR APRIL TESTING

Grade                    IV      V     VI    VII   VIII
Correct Principle
  Number of pupils     2932   3027   3498   2796   2472
  25-percentile         6.2   12.1   10.0   13.8   11.5
  Median               11.3   19.2   14.2   19.7   17.2
  75-percentile        16.8   25.9   19.4   24.7   22.8
Rate*
  Number of pupils     1412   1705   1699   1717   1642
  25-percentile         5.2    8.0    6.4    8.0    5.3
  Median                7.8   11.2    8.7   11.2    7.5
  75-percentile         8.1   15.1   12.1   14.5   10.9
Correct Answers
  Number of pupils     2968   2996   3518   2803   2515
  25-percentile         4.1    7.1    6.9    9.4    5.1
  Median                7.0   11.3   10.4   13.4    9.0
  75-percentile        10.7   15.5   14.0   17.4   13.0

* Sum of correct principle values of problems done correctly within ten minutes.

There are two forms of these tests. These forms were constructed so that they were expected to be equivalent. Experience in using them suggests that they are not equivalent, although data are lacking at this time on which a statement concerning their comparability may be based. No scores are reported for Form 2 because the returns received for this form included an insufficient number of cases. Test I is given in Grades IV and V, Test II in Grades VI and VII, and Test III in Grade VIII.
The tests were not constructed so that the scores yielded by the different tests are comparable. Therefore, direct comparisons can not be made between the fifth and sixth grade scores or between the seventh and eighth grade scores.

TABLE II. MONROE'S STANDARDIZED REASONING TESTS IN ARITHMETIC. FORM I, CORRECT PRINCIPLE

Score*     IV      V     VI    VII   VIII
43          3     11
41                 5
39          7     20
37          1     21
35         12     89
33         11     56
31         26    137                   56
29         25    120     47    127     94
27         40    191     93    262     63
25         80    191    131    202    171
23         89    223    135    242    214
21        124    269    248    304    233
19        130    207    280    322    214
17        161    211    306    259    217
15        248    231    328    225    237
13        260    219    470    225    203
11        298    167    425    193    201
9         267    166    374    134    133
7         294    148    276    103    154
5         304    140    191     52    121
3         230     94    123     30     87
1         185     54     49     18     51
0         137     57     22      8     23
Total    2932   3027   3498   2706   2472
Median   11.3   19.2   14.2   19.7   17.2

* In this bulletin all intervals, unless otherwise noted, are expressed in terms of their lower limits.

Table I gives the grade medians, 25-percentile, and 75-percentile scores for correct principle, correct answer, and rate. The distributions of scores for correct principle are given in Table II. These indicate that Test I is too difficult for a number of pupils in the fourth and fifth grades. In the construction of the tests no effort was made to include very easy problems. In fact, as is shown in Chapter III, the difficulty of a problem was not considered as a basis for selection. In none of the other grades do the zero scores amount to as much as one percent of the total. In the seventh grade nearly five percent of the pupils made perfect scores.

References

Willing, M. H. "The Encouragement of Individual Instruction by Means of Standardized Tests," Journal of Educational Research, I (March, 1920), 193-198.
Results from the Monroe Standardized Reasoning Tests are used to illustrate how such work as the title mentions may be carried on. Suggestions for diagnosis of faults, remedial measures, etc. are given.

B. Buckingham's Scale for Problems in Arithmetic

The problems for Buckingham's scale were selected largely on the basis of difficulty. Division One is for Grades III and IV, Division Two for Grades V and VI, and Division Three for Grades VII and VIII. The problems of Division One increase by steps of approximately 0.3 P. E. from 2.7 to 5.3. The problems of Division Two increase by similar steps of difficulty from 5.5 to 7.3, and the problems of Division Three increase from 7.5 to 9.4. In scoring the test papers attention is given only to the numerical accuracy of the answers. A pupil's score is the difficulty value of the hardest problem which he answers correctly, unless he has failed on one or more previous problems. In that case, a correction is made by subtracting from the value of the hardest correctly solved problem 0.3 for each failure in Division One, or 0.2 for each failure in Division Two or Three. Thus, if a pupil solves the first six problems in Division One, his score is 4.2; but if he fails on the 4th and 5th (otherwise succeeding through the 6th), his score is 3.6, i.e., 4.2 - 2 × 0.3.

TABLE III. BUCKINGHAM'S SCALE FOR PROBLEMS IN ARITHMETIC. FORM I. GRADE NORMS FOR JUNE TESTING

Grade             III     IV      V     VI    VII   VIII
No. of pupils    4181   4589   7142   5927   6632   5269
25-percentile     3.4    4.2    5.7    5.9    7.6    7.7
Median            3.8    4.6    5.9    6.4    7.8    8.2
75-percentile     4.3    5.2    6.3    6.8    8.3    8.7

Although the three divisions of the scale were constructed so that it was expected that the scores obtained from the different divisions would be comparable, the grade medians given in Table III clearly indicate that the scores are not comparable. The increase in the median scores from the third grade to the fourth grade is 0.8. The increase from the fourth grade to the fifth grade is 1.3. A similar variation is found in the differences between the subsequent grades. Therefore, the scores obtained by the different divisions of the scale are not comparable.
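The scoring rule for Buckingham's scale stated above might be sketched as follows. This is an illustration only, under stated assumptions: the function name `buckingham_score` is hypothetical, the six difficulty values are taken as exact 0.3 steps from 2.7 (the text says the steps are only approximately 0.3), and a paper with no correct answers is assumed to score zero.

```python
def buckingham_score(values, correct, penalty):
    """Score one paper on a single division of the scale.

    values  -- difficulty values of the problems, in scale order
    correct -- True/False for each problem, in the same order
    penalty -- deduction per earlier failure (0.3 in Division One,
               0.2 in Divisions Two and Three, per the text)
    """
    solved = [i for i, ok in enumerate(correct) if ok]
    if not solved:
        return 0.0  # assumption: no problem right is recorded as a zero score
    hardest = solved[-1]  # index of the hardest problem answered correctly
    failures = sum(1 for ok in correct[:hardest] if not ok)
    return round(values[hardest] - penalty * failures, 1)


# Assumed values for the first six Division One problems (exact 0.3 steps).
division_one = [2.7, 3.0, 3.3, 3.6, 3.9, 4.2]

all_six_right = buckingham_score(division_one, [True] * 6, 0.3)
missed_4th_and_5th = buckingham_score(
    division_one, [True, True, True, False, False, True], 0.3
)
```

With these assumed values the two calls reproduce the worked example in the text: 4.2 for six problems right, and 3.6 when the 4th and 5th are failed.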
The reason for this is that the pupils taking Division Two or Division Three do not have an opportunity to do the problems of the lower divisions. If they did, a number of them would fail to do all of them correctly. Thus, they would receive a score lower than that which they receive when taking only the higher divisions.

TABLE IV. BUCKINGHAM'S SCALE FOR PROBLEMS IN ARITHMETIC. FORM I. GRADE DISTRIBUTIONS FOR JUNE TESTING

Score     III     IV      V     VI    VII   VIII
9.0                                   328    699
8.5                              6    782   1084
8.0                1      4     11   1349   1290
7.5                      14     13   2931   1740
7.0                     240    775     58     44
6.5                2   1012   1886
6.0                6   1540   1504
5.5               21   3663   1569
5.0       131   1069    106     57
4.5       490   1474     14     12
4.0       815    863
3.5      1305    798
3.0       967    255
2.5       298     75
0         173     25    549     94   1184    412
Total    4181   4589   7142   5927   6632   5269
Median    3.8    4.6    5.9    6.4    7.8    8.2

In Table IV the total distributions are given. Evidently a division of the scale higher or lower than that designed for the grade has been used in a few cases. The distributions are significant in that they show that the divisions of the scale are too difficult for the respective grades. The percent of pupils making zero scores in the third, fifth, seventh, and eighth grades is so large that the scale as now published must be considered unsatisfactory for these grades. This condition could be remedied in the case of Division Two and Division Three by giving the next lower division to the pupils who make zero scores. In the case of Division One, the scale will have to be extended downward by adding less difficult problems.

References

First Annual Report, Bureau of Educational Research, University of Illinois, pp. 21-22.
These pages contain a very brief suggestion of what was done along the line of this scale that seemed to justify its construction, also a short description of the scale.

Buckingham, B. R. "Notes on the Derivation of Scales in School Subjects, with Special Application to Arithmetic," Fifteenth Yearbook of National Society for the Study of Education, Part I, pp. 23-40.
This presents a report of a series of problems which was given to a number of school children in New York and other cities. The results are given and discussed, especially with reference to locating the problems on a scale. Although this scale is not the one now in use, it is similar to it.

C. Monroe's Diagnostic Tests in Arithmetic

Monroe's Diagnostic Tests in Arithmetic consist of four parts. Part I includes Tests 1 to 6. Part II includes Tests 7 to 11. These tests involve only integers. Part III includes Tests 12 to 16, which consist of examples involving common fractions. Part IV includes Tests 17 to 21, which consist of examples involving multiplication and division of decimal fractions. Tables V and VI give the median scores for these tests in terms of rate (number of examples attempted) and accuracy (percent of examples done correctly). In order to simplify the administration of these tests the plan of scoring has been changed so that the pupil is now given only one score, the number of examples right. In Table VII tentative grade norms are given in terms of this score.

TABLE V. MONROE'S DIAGNOSTIC TESTS IN ARITHMETIC. GRADE MEDIANS FOR APRIL TESTING. RATE (NUMBER OF EXAMPLES ATTEMPTED)

Grade                                     IV      V     VI    VII   VIII
Part I (approximate number of pupils)    900    480    590    600    600
  Test 1                                 7.2   11.6   13.3   12.6   14.0
  Test 2                                 4.1    7.2    9.3    8.6    9.2
  Test 3                                 3.3    5.0    5.8    5.7    7.2
  Test 4                                 2.0    3.2    4.0    4.7    5.7
  Test 5                                 3.9    4.8    5.6    5.7    6.2
  Test 6                                 1.7    2.7    3.1    3.0    4.0
Part II (approximate number of pupils)   380    760    610    520    460
  Test 7                                 3.8    4.2    5.5    5.3    6.3
  Test 8                                 3.0    4.1    5.5    6.1    6.6
  Test 9                                 4.8    5.9    8.2    8.6    9.8
  Test 10                                2.8    3.2    5.3    5.3    6.7
  Test 11                                1.6    2.2    2.2    2.9    3.7
Part III (approximate number of pupils)         370   1000    580    560
  Test 12                                       5.8    7.6    8.6    9.4
  Test 13                                       4.5    5.4    6.0    6.0
  Test 14                                       5.2    7.1    8.1    8.7
  Test 15                                       6.2    7.1    7.7    8.1
  Test 16                                       5.7    7.4    8.0    9.1
Part IV (approximate number of pupils)                 440    900    660
  Test 17                                              3.6    3.5    4.5
  Test 18                                             11.9   11.5   12.9
  Test 19                                              5.8    4.5    5.3
  Test 20                                             12.5   11.1   13.5
  Test 21                                              5.1    4.3    4.8
In the interest of economy, both of cost of the tests and time required for their administration, most of the tests of this series were made so short that there is a lack of discrimination between pupils. For example, the increase in the number of examples attempted from grade to grade is frequently less than one example. The shortness of the tests also makes the errors of measurement relatively large.

This group of tests was designed for diagnostic purposes, i.e., it was intended to measure separately the abilities of pupils to do the important types of examples in the field of the operations of arithmetic. A weighted sum of a pupil's scores on such a group of tests would yield a general measure of his ability in this field.¹

TABLE VI. MONROE'S DIAGNOSTIC TESTS IN ARITHMETIC. GRADE MEDIANS FOR APRIL TESTING. ACCURACY (PERCENT OF EXAMPLES CORRECT)

Grade                                     IV      V     VI    VII   VIII
Part I (approximate number of pupils)    900    480    590    600    600
  Test 1                                 100    100    100    100    100
  Test 2                                66.6   86.8    100    100    100
  Test 3                                56.5   72.3   80.8   82.0   87.6
  Test 4                                28.0   55.1   71.9   79.8   85.4
  Test 5                                52.5   61.7   66.9   67.9   75.1
  Test 6                                22.4   49.5   64.0   77.5    100
Part II (approximate number of pupils)   380    760    610    520    460
  Test 7                                63.2   65.1   75.9   76.3   81.6
  Test 8                                30.4   52.9   66.9   79.8   78.5
  Test 9                                75.0   86.3   91.2   93.1    100
  Test 10                               35.2   58.4   72.1   72.3   81.9
  Test 11                               22.4   35.5   53.4   65.0   68.2
Part III (approximate number of pupils)         370   1000    580    560
  Test 12                                      35.5   32.0   33.6   49.0
  Test 13                                      38.0   29.2   36.0   53.9
  Test 14                                      57.5   70.3   79.6   86.0
  Test 15                                      37.5   30.0   33.6   45.5
  Test 16                                      38.5   36.8   59.1   70.2
Part IV (approximate number of pupils)                 440    900    660
  Test 17                                             37.6   36.4   53.2
  Test 18                                              100    100    100
  Test 19                                             39.6   47.0   61.7
  Test 20                                              100    100    100
  Test 21                                             35.6   44.0   51.3

TABLE VII. MONROE'S DIAGNOSTIC TESTS IN ARITHMETIC. GRADE MEDIANS FOR APRIL TESTING. NUMBER OF EXAMPLES CORRECT

Grade                                     IV      V     VI    VII   VIII
Part I (approximate number of pupils)    900    480    590    600    600
  Test 1                                 7.2   11.6   13.3   12.6   14.0
  Test 2                                 2.8    6.2    9.3    8.6    9.2
  Test 3                                 1.9    3.6    4.7    4.7    6.4
  Test 4                                  .6    1.7    2.9    3.6    4.8
  Test 5                                 2.0    3.0    3.7    3.8    4.4
  Test 6                                  .4    1.3    2.0    2.3    4.0
Part II (approximate number of pupils)   380    760    610    520    460
  Test 7                                 2.4    2.7    4.2    4.0    5.1
  Test 8                                  .9    2.2    3.7    4.8    5.2
  Test 9                                 3.6    5.1    7.5    8.0    9.8
  Test 10                                1.0    1.8    3.8    3.8    5.5
  Test 11                                 .4     .8    1.2    1.8    2.4
Part III (approximate number of pupils)         370   1000    580    560
  Test 12                                       2.0    2.4    2.9    4.6
  Test 13                                       1.7    1.6    2.2    3.2
  Test 14                                       3.0    5.0    6.5    7.4
  Test 15                                       2.3    2.1    2.6    3.7
  Test 16                                       2.2    2.7    4.7    6.4
Part IV (approximate number of pupils)                 440    900    660
  Test 17                                              1.4    1.3    2.4
  Test 18                                             11.9   11.5   12.9
  Test 19                                              2.4    2.1    3.3
  Test 20                                             12.5   11.1   13.5
  Test 21                                              1.8    1.9    2.5

¹ See the group of tests on the operations of arithmetic included in the Illinois Examination.

References

Finley, G. W. A Comparative Study of Three Diagnostic Arithmetic Tests. Colorado State Teachers College Bulletin, Series XX, No. 4.
This reports a study made of the Cleveland Survey Tests, the Woody Arithmetic Scales, and Monroe's Diagnostic Tests in Arithmetic. The tests were given on six successive days to some 60 eighth-grade pupils. The scores made are given in detail, compared with each other and with scores obtained elsewhere.

Monroe, W. S. "A Series of Diagnostic Tests in Arithmetic," Elementary School Journal, XIX (April, 1919), 585-607.
This article discusses the types of examples in the four fundamental operations, the question of "one dimensional" vs. "two dimensional" tests, and thus establishes the theoretical bases of the tests presented. The series is described, a distribution of scores made thereon is analyzed, and the value of using such tests pointed out.

Uhl, W. L. "The Use of Standardized Materials in Arithmetic for Diagnosing Pupils' Methods of Work," Elementary School Journal, XVIII (November, 1917), 215-218.
This article contains no reference to the Monroe Tests, but describes an experiment in diagnosis similar to that made possible by their use. Both finding specific faults and remedying them are considered briefly.

D. Monroe's Standardized Silent Reading Tests.

Monroe's Standardized Silent Reading Tests have been used so widely that a detailed description here is unnecessary. Each test consists of several exercises, each of which has been assigned a rate value and a comprehension value. The rate value is based upon the number of words in the exercise and the comprehension value is based upon the rate and accuracy with which pupils were found to be able to do the exercise. Test I is for Grades III, IV, and V, Test II is for Grades VI, VII, and VIII, and Test III is for the high school. There are three forms of Tests I and II. There are only two forms of Test III. The different forms of these tests were constructed so that they were expected to be equivalent. The use of the forms, however, indicates that they are not equivalent.

In order to study the degree of equivalence of the three forms, copies of the different forms were arranged in alternate order before the test papers were distributed to the pupils. This plan results in the first, fourth, seventh, tenth, etc. pupil having a copy of Form 1. The second, fifth, eighth, eleventh, etc. pupil would have a copy of Form 2. The third, sixth, ninth, twelfth, etc. pupil would have a copy of Form 3. By this plan each form of the test is given to similar samples of the school population. Test I was given to approximately 775 pupils and Test II was given to approximately 645. The numbers of pupils taking the different forms in each grade differed slightly. This is an accidental result of the way in which the test papers were arranged. The average and the standard deviation have been calculated for each distribution of scores. In general the pupils made higher scores on Forms 2 and 3 than they did on Form 1.
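
The alternate arrangement of the three forms described above may be illustrated by a minimal Python sketch. The function name `assign_forms` and the class sizes are hypothetical and chosen only for illustration; the bulletin does not state how large each bundle of papers was.

```python
from collections import Counter


def assign_forms(n_pupils, n_forms=3):
    """Form number (1-based) handed to each pupil, in the order papers are passed.

    Pupil 1 receives Form 1, pupil 2 Form 2, pupil 3 Form 3, pupil 4 Form 1, etc.
    """
    return [(position % n_forms) + 1 for position in range(n_pupils)]


# Which of the first ten pupils receive Form 1?
forms = assign_forms(10)
form1_pupils = [i + 1 for i, form in enumerate(forms) if form == 1]

# With a number of pupils not divisible by three, the forms are taken by
# slightly unequal numbers of pupils (775 is used here only as an example).
counts = Counter(assign_forms(775))
```

Because the papers are dealt in rotation, the groups taking each form can differ by a pupil or so whenever the number of pupils is not a multiple of three, which agrees with the slight differences noted in the text.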
The standard deviations are also unequal. This suggests that the exercises of the different forms of the tests make somewhat irregular scales. The formula for reducing the scores obtained from one scale to equivalent scores on another scale is as follows:

$$S_1 = \frac{\sigma_1}{\sigma_2} S_2 + \left( \mathrm{Av}_1 - \frac{\sigma_1}{\sigma_2} \mathrm{Av}_2 \right)$$

In this formula $S_1$ is the equivalent score in Form 1 and $S_2$ the obtained score in Form 2. $\mathrm{Av}_1$ refers to the average of the scores obtained from Form 1; $\mathrm{Av}_2$ refers to the average of the scores obtained from Form 2. $\sigma_1$ is the standard deviation of the distribution of the Form 1 scores and $\sigma_2$ is the standard deviation of the distribution of the Form 2 scores. This formula is based upon the usual assumption that the deviations from the average are equal when expressed in terms of the standard deviation of the distribution; in other words, that

$$\frac{S_1 - \mathrm{Av}_1}{\sigma_1} = \frac{S_2 - \mathrm{Av}_2}{\sigma_2}$$

When this equation is solved for $S_1$ we obtain the formula as given above.

Since the scores on Form 1 are in general smaller than the scores on the other two forms it was decided to reduce both the Form 2 and the Form 3 scores to the equivalence of Form 1 scores. The application of the above formula involves the determination of the numerical value of the ratio $\sigma_1/\sigma_2$ by which the Form 2 score is to be multiplied and the determination of the numerical equivalent of the constant terms of the formula (i.e., of the expression in parentheses). This latter numerical equivalent may be plus or minus. When it is positive it is to be added and when negative it is to be subtracted.

[TABLE VIII. Contents not recoverable.]

In Table VIII, the number of pupils, the average score, and the standard deviation are given for each form of each test. In the last two columns the multiplier and the constant term in the above formula are given for Form 2 and Form 3. These can be used in reducing scores obtained from Form 2 or Form 3 to the basis of Form 1. In securing data for these determinations, each test was given in each of the three grades for which it is intended. Except in the eighth grade the number of pupils in the different grades was approximately the same. The correction numbers were calculated for each grade separately. Since they were found to be approximately the same it was decided to combine the scores from the different grades and compute a single set of correction numbers for each test.

The grade medians calculated from the distributions of the scores yielded by the different forms furnished additional information concerning the degree of their equivalence. This information is not in complete agreement with that obtained by the study described. Although it is less accurate it deserves some consideration in the formulation of a set of rules for translating the scores obtained from one form to the basis of another form. In Table IX-A grade medians for all forms are given, and in Table IX-B correction numbers which may be used in reducing scores from Form 2 or Form 3 to Form 1.
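
The reduction of a Form 2 (or Form 3) score to the basis of Form 1, namely multiplying the score by the ratio of the standard deviations and adding the constant term (the Form 1 average less that ratio times the other form's average), may be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the scores are invented and are not data from the bulletin, and the population standard deviation is assumed because the text does not say which divisor was used.

```python
from statistics import mean, pstdev


def reduction_constants(form1_scores, other_scores):
    """Return (multiplier, constant) for reducing another form to Form 1.

    multiplier = sigma_1 / sigma_2
    constant   = Av_1 - multiplier * Av_2
    """
    av1, av2 = mean(form1_scores), mean(other_scores)
    # Assumption: population standard deviation (divisor N).
    sigma1, sigma2 = pstdev(form1_scores), pstdev(other_scores)
    multiplier = sigma1 / sigma2
    constant = av1 - multiplier * av2
    return multiplier, constant


def reduce_to_form1(score, multiplier, constant):
    """Equivalent Form 1 score for a score obtained on another form."""
    return multiplier * score + constant


# Invented scores for illustration only.
form1 = [10, 12, 14, 16, 18]
form2 = [14, 17, 20, 23, 26]

multiplier, constant = reduction_constants(form1, form2)
equivalent = reduce_to_form1(26, multiplier, constant)
```

With these invented figures the highest Form 2 score reduces to the highest Form 1 score, as it should when the two sets of scores differ only in average and spread.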
The correction numbers are based primarily on the results of the study just described but some weight was given to the information furnished by the tabulations of the scores obtained from the different forms. This is the explanation of some apparent inconsistencies in the reductions to the basis of Form 1.

TABLE IX-A. MONROE'S STANDARDIZED SILENT READING TESTS. GRADE MEDIANS FOR JANUARY AND JUNE TESTING, BASED UPON 130,000 SCORES

Grade                    III    IV     V    VI   VII  VIII    IX     X    XI   XII
Form 1
  Rate
    January               52    70    87    90   100   106    83    85    90    96
    June                  60    79    94    96   104   108    86    87    94   100
  Comprehension
    January              6.8  12.7  17.8  18.5  22.8  26.0  23.0  25.4  27.2  30.0
    June                 9.3  15.3  20.8  21.0  24.5  27.3  24.0  26.0  28.6  32.0
Form 2
  Rate
    January               63    77    98   116   130   133    84    90    98   104
    June                  70    88   106   124   132   136    86    92   101   109
  Comprehension
    January              8.3  13.3  17.2  18.1  26.0  28.2  25.4  28.0  31.0  33.1
    June                10.6  15.6  20.5  20.8  27.3  29.4  26.6  29.4  32.2  34.5
Form 3
  Rate
    January               78    92    97   101   109   111
    June                  85    95   104   106   111   114
  Comprehension
    January              9.3  14.8  18.4  22.2  26.5  29.8
    June                11.9  16.8  21.5  24.4  28.2  30.5

TABLE IX-B. APPROXIMATE CORRECTIONS BY WHICH TO MULTIPLY FORM 2 AND FORM 3 SCORES TO REDUCE TO THE BASIS OF FORM 1 SCORES

Rate: Test I, Test II, Test III. Comprehension: Test I, Test II, Test III.
Form 2 .88 Form 3 .78 .80 .93 .94 .95 .94 .93 .86 .90

It should be noted that the scores of the different tests in this series are not comparable. This is to be expected in the case of the rate scores but in the case of the comprehension scores an effort was made to have the different tests yield comparable scores. This attempt was not successful. The grade distributions, which are not published here, show that the tests are too short for the time allowed. In order to secure accurate measures of the abilities of the best readers it will be necessary either to lengthen the test or to shorten the time allowed. The widespread use of these tests has revealed other defects.
Instead of attempting to remedy these defects in the present series it was decided to derive an entirely new series. These have been issued under the title of "Monroe's Standardized Silent Reading Tests, Revised." Three forms of Tests I and II are now available. They were originally published as a part of the Illinois Examination but are now printed separately.

In Tables X-A and X-B we have assembled a miscellaneous collection of grade medians. These are published because a number of requests have been received for just this type of information.

References

Monroe, W. S., "Monroe's Standardized Silent Reading Tests," Journal of Educational Psychology, IX, (June, 1918), 303-312.
The derivation of the tests, more or less based upon the Kansas Silent Reading Test plan, is briefly sketched. The investigation of weighting and timing is outlined, a sample of the tests is given and some few data concerning results from pupils.

Barnes, Harold, "Reorganization of Classes Based on the Monroe Silent Reading Tests," University of Pennsylvania Bulletin, vol. XX, No. 1, 119-123.
This article recounts the use made of these tests in the elementary grades of Girard College. Not only are the scores presented, but also the resulting organization upon the basis of ability as shown on the tests is described.

Kelly, F. J., "Kansas Silent Reading Tests," Journal of Educational Psychology, VII, (February, 1916), 63-80.
The author of these tests, which were the forerunners of Monroe's Standardized Silent Reading Tests, gives a brief statement of the construction, administration, and use of the tests, following it with a more detailed statement of results secured in nineteen Kansas cities.

Lloyd, S. M. and Gray, C. T., "Reading in a Texas City, Diagnosis and Remedy," University of Texas Bulletin, No. 1853.
This bulletin gives an account of a study of the reading situation in Austin. The Monroe tests were given in grades 3-7.
Results obtained are analyzed at considerable length, measures to improve the situation are discussed, and improvement after a period of special emphasis on reading is shown.

Pressey, S. L. and L. W., "The Relative Value of Rate and Comprehension Scores in Monroe's Silent Reading Test, as Measures of Reading Ability," School and Society, (June 19, 1920), 747-49.
In a brief discussion of the above subject, the writers present results of correlating teachers' estimates of reading ability with rate and comprehension scores, also the latter with each other. They conclude that comprehension scores may tell us all the tests can about children's ability in reading.

[TABLES X-A and X-B. Miscellaneous grade medians; contents not recoverable.]

E.
Charters Diagnostic Language Tests, and Diagnostic Language and Grammar Tests.

There are two groups of these tests: (1) The Diagnostic Language Tests, designed for Grades III to VIII inclusive, which include Pronouns, Verbs (formerly Verbs A), Miscellaneous A (formerly Miscellaneous), Miscellaneous B (formerly Verbs B); (2) The Language and Grammar Tests, designed for Grades VII and VIII, which include Pronouns, Verbs (formerly Verbs A) and Miscellaneous A (formerly Miscellaneous). The Language Tests consist of a number of sentences most of which are grammatically incorrect. If a sentence is correct the pupil makes a cross on the dotted line below the sentence. If the sentence is not right the pupil is required to put the correct words on the dotted line below it. In the Language and Grammar Tests the pupil is required in addition to write the rule on which the correction is based. The pupil's score is the number of exercises which he does correctly. Since the sentences which make up the tests were selected as representative of the errors which pupils make, a pupil's performance on the tests gives a diagnosis of his abilities in the field of these tests.

There are two forms of these tests. The second form, however, was not published until September, 1920. Consequently, the scores reported in this bulletin are based on Form 1. Although the two forms were constructed so that Form 2 might be expected to be equivalent to Form 1, there is available at this time no information concerning the degree of their equivalence.

TABLE XI-A. GRADE NORMS FOR CHARTERS' DIAGNOSTIC LANGUAGE TESTS. MARCH TESTING

Grades                    III     IV      V     VI    VII   VIII
*Miscellaneous A
  Number of pupils        386    669    668    845    758    494
  25-percentile           4.0    5.8    8.1   11.8   14.0   16.6
  Median                  6.7    9.3   11.6   16.5   18.9   22.3
  75-percentile          13.3   13.6   16.0   21.7   24.4   27.1
†Miscellaneous B
  Number of pupils        230    430    307    475    412    294
  25-percentile           3.0   10.6   15.7   19.8   23.5   28.7
  Median                  7.9   17.8   22.0   27.3   29.4   32.0
  75-percentile          14.8   24.5   27.6   32.4   33.7   36.8
**Verbs
  Number of pupils        365    403    373    478    539    638
  25-percentile           7.3   12.9   17.2   19.0   22.7   28.6
  Median                 12.6   17.7   22.6   24.3   27.7   32.8
  75-percentile          18.8   22.7   28.4   29.3   31.9   36.1
Pronouns
  Number of pupils        787    864    895   1344   1566   1253
  25-percentile           8.9   11.1   14.2   17.0   19.6   23.1
  Median                 13.6   15.1   18.5   21.4   24.5   29.0
  75-percentile          19.8   20.3   22.6   25.7   29.5   34.0

* Formerly Miscellaneous
† Formerly Verbs B
** Formerly Verbs A

TABLE XI-B. GRADE NORMS FOR CHARTERS' DIAGNOSTIC LANGUAGE AND GRAMMAR TESTS. MARCH TESTING

Grades                    VII   VIII
Miscellaneous
  Number of pupils        332    362
  25-percentile           2.9    6.1
  Median                  6.3   11.9
  75-percentile          11.7   18.7
Verbs
  Number of pupils        434    497
  25-percentile           2.8    6.9
  Median                  7.8   14.0
  75-percentile          22.9   24.1
Pronouns
  Number of pupils        332    362
  25-percentile           4.4    8.5
  Median                  8.0   17.1
  75-percentile          16.7   26.0

References

Charters, W. W., "Minimum Essentials in Elementary Language and Grammar," Sixteenth Yearbook of the National Society for the Study of Education, Part I, 85-110.
This article gives a brief account of a number of studies of language and grammar errors made by school children, with tables of results. These studies were the basis of the content of Charters' tests.

Sixth Conference on Educational Measurements. Bulletin of the Extension Division, Indiana University, Vol. V, No. I, pp. 6-12 and 13-24.
These two discussions by Charters give a rather general discussion leading up to a brief account of the development and form of the tests, followed by some suggestions as to their use.

Charters, W.
W., "Constructing a Language and Grammar Scale," Journal of Educational Research, I (April, 1920), 249-257.
The tests herein considered are a revision of those referred to above. The writer gives a short description of their derivation, use, scoring, etc. The question of weighting is discussed and the reason for its elimination given.

F. Willing's Scale for Measuring Written Composition

The Willing Scale for Measuring Written Composition differs from other composition scales in that an attempt is made to secure separate measures of "form value" and "story value." The "form value" of a pupil's composition is based upon his errors in grammar, punctuation, capitalization, and spelling. In order to make the scores in form value comparable, the number of errors which the pupil makes is multiplied by 100 and divided by the number of words in his composition. The quotient is the number of errors per hundred words. The "story value" of a pupil's composition is its value when errors of grammar, punctuation, capitalization, and spelling are neglected. This value is measured by means of the scale.

TABLE XII. GRADE NORMS FOR WILLING'S SCALE FOR MEASURING WRITTEN COMPOSITION. MARCH TESTING

Grade                     III     IV      V     VI    VII   VIII
Approx. No. of Pupils     325    580    705    695    570    130
Story Value
  25-percentile          30.5   43.4   55.7   60.9   65.9   65.3
  Median                 41.5   58.7   68.1   74.0   76.6   79.0
  75-percentile          54.8   74.5   78.8   85.2   86.6   86.2
Form Value (Errors per 100 Words)
  25-percentile          11.7    6.2    3.6    3.2    2.6    2.3
  Median                 18.5   10.7    6.8    5.8    4.4    4.4
  75-percentile          26.0   17.3   10.9    9.8    6.4    7.0

The grade distributions given in Tables XIII and XIV indicate that the scale needs to be extended at both ends. It does not contain steps low enough in story value to provide adequate measures for many compositions contributed by pupils in the third and fourth grades. Neither does it provide adequate measures for the best compositions in grades beyond the fourth.
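
The form-value computation described earlier in this section (errors multiplied by 100 and divided by the number of words) may be written as a short Python sketch. The function name and the sample figures are hypothetical and serve only to illustrate the arithmetic.

```python
def errors_per_hundred_words(errors, words):
    """Form value: number of errors per hundred words of composition."""
    if words <= 0:
        raise ValueError("composition must contain at least one word")
    return errors * 100 / words


# Invented example: 12 errors in a composition of 150 words.
example = errors_per_hundred_words(12, 150)
```

Expressing errors per hundred words is what allows a short composition and a long one to be compared on the same footing.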
For practical purposes these limitations are not serious because when a pupil's composition is as poor as 20 on this scale the pupil needs special attention. When a pupil writes a composition as good as 90 on this scale it is likely that special instruction is superfluous. In addition it may be pointed out that the median score of a class is probably not affected by these limitations of the scale.

TABLE XIII. WILLING'S SCALE FOR MEASURING WRITTEN COMPOSITION. GRADE DISTRIBUTIONS FOR MARCH TESTING

                                 Grade
Story Value Score*      IV     V    VI   VII  VIII    IX
90                       7    28    52    81    95    14
80                       6    67   105   177   141    48
70                       8    91   168   147   150    25
60                      28    91   146   123   107    21
50                      60    98   110    75    41    12
40                      60    90    77    49    31     4
30                      75    66    39    32     5     4
20                      76    48     8     8     2     1
Total                  320   579   705   692   572   129
Median                41.5  58.7  68.1  74.0  76.6  79.0

*These intervals are expressed in terms of their midpoints.

TABLE XIV. WILLING'S SCALE FOR MEASURING WRITTEN COMPOSITION. GRADE DISTRIBUTIONS FOR MARCH TESTING

                                 Grade
Score*                  IV     V    VI   VII  VIII    IX
30                      50    14     3     5     2
27                      24    17     1     4     3
24                      22    27     8     4     1
21                      32    29     9     4     1
18                      43    38    19    18     4
15                      29    51    45    17    15     2
12                      42    56    52    48    28     4
9                       34   120   105    96    49    12
6                       32    79   149   140   131    21
3                       14    98   170   199   200    49
0                        5    41   141   158   138    41
Total                  327   570   702   693   573   129
Median                18.5  10.7   6.8   5.8   4.5   4.4

* Errors per 100 words.

References

Willing, M. H., "The Measurement of Written Composition in Grades IV to VIII," English Journal, VII (March, 1915), 193-202.
The writer explains and outlines the measurement of written composition especially in connection with the Denver and Grand Rapids surveys. The method of constructing the scale, its use, scoring, results obtained, etc. are discussed, and the scale reproduced.

The Denver Survey, 1916, Part II, pp. 59-63, and the Grand Rapids Survey, 1916, pp. 85-105, give accounts of the use of this scale. The latter contains rather complete tables and graphs of pupil achievement, and comparisons of results with those obtained in Denver.

G.
Harlan's Test for Information in American History

This test consists of ten exercises in the field of American History and is designed for use in the seventh and eighth grades. Each exercise consists of two or more parts. The maximum score which a pupil may receive is 100.

TABLE XV. GRADE NORMS FOR HARLAN'S TEST OF INFORMATION IN AMERICAN HISTORY. MAY TESTING

Grade                 VII    VIII
Number of pupils     1109    1691
25-percentile        30.1    45.7
Median               43.9    68.2
75-percentile        57.3    83.3

In Table XVI the distributions of scores for the seventh and eighth grades are given. These distributions are of interest because of the very great individual differences which they suggest. It is possible that the apparent differences are due in a considerable measure to the errors of measurement. Since there is only one form of the test no measure of reliability is available.

TABLE XVI. HARLAN'S TEST OF INFORMATION IN AMERICAN HISTORY. GRADE DISTRIBUTIONS FOR MAY TESTING

                Grade
Score         VII    VIII
96              6      66
91              3     118
86             13     140
81             18     187
76             35     136
71             35     136
66             54     112
61             65      92
56             65     100
51             99      95
46            113      79
41            115      98
36            100      90
31             93      79
26             94      61
21             79      42
16             63      35
11             37      13
6              19      10
1               3       2
Total        1109    1691
Median       43.9    68.2

References

Harlan, Chas. L., "Educational Measurement in the Field of History," Journal of Educational Research, II (December, 1920), 849-853.
The writer follows a short discussion of tests in the "content" subjects with a brief description of his test and its use in nine cities. The requirements he deems essential to a good test are listed as a basis of his test.

Griffith, G. L., "Harlan's American History Tests in the New Trier Township Schools," School Review (November, 1920), 697-708.
The first half of this article is devoted to a general discussion of history and history testing. This is followed by a description of the test and the results of its use in the eighth grade of this township. Data are given for each of the single exercises of the test.

H.
Sackett's Scale in United States History

This scale, arranged by L. W. Sackett, was originally devised by Bell and McCollum. It consists of seven tests which appear to have been intended for use in secondary schools and colleges. The medians given in Table XVII are for the eighth grade. The number of scores is such that it is doubtful if the median scores have much value for use as tentative standards.

TABLE XVII. GRADE NORMS FOR SACKETT'S SCALE IN UNITED STATES HISTORY. MAY TESTING. EIGHTH GRADE

Tests                     I      II     III      IV       V      VI     VII
Number of pupils        111     101     107      92      93      78      85
25-percentile          62.6    50.8    69.2    37.0    44.7     7.5     9.6
Median                118.7   146.2   115.0   125.0    86.5    46.6    96.5
75-percentile         192.5   273.9   183.1   287.5   193.7   138.1   195.5

References

Bell, J. C. and McCollum, D. F., "A Study of the Attainments of Pupils in United States History," Journal of Educational Psychology, VIII (May, 1917), 257-74.
The writers follow a discussion of historical ability with an account of the use of test material in various schools from Grade V through the senior year of the University of Texas. The results secured are analyzed. The test questions used were in general similar in kind to those of Sackett's Scale in Ancient History, although based upon United States History.

Sackett, L. W., "A Scale in Ancient History," Journal of Educational Psychology, VIII (May, 1917), 284-93.
The test questions are given with a brief statement of their source, use, and scoring. Results are given from almost 1000 papers, and the relative difficulty of the questions computed.

Sackett, L. W., "A Scale in United States History," Journal of Educational Psychology, X (September, 1919), 345-348.
The writer tells of the development of this scale out of the data furnished by Bell and McCollum's work referred to above. The determination of the relative difficulty of the parts is given considerable space.

I. Hotz's First Year Algebra Scale

This scale consists of five separate scales: (1) Addition and subtraction;
(2) Multiplication and division; (3) Equations and formulae; (4) Problems; (5) Graphs. Each sub-scale consists of exercises arranged in order of increasing difficulty.

TABLE XVIII. GRADE NORMS FOR HOTZ'S FIRST YEAR ALGEBRA SCALES. MAY TESTING

Grade                          IX      X
Addition and Subtraction
  Number of pupils            561    390
  25-percentile               5.2    5.8
  Median                      6.9    7.3
  75-percentile               9.1    8.7
Multiplication and Division
  Number of pupils            570    388
  25-percentile               5.7    5.9
  Median                      7.2    7.4
  75-percentile               8.4    8.7
Equations and Formulas
  Number of pupils            478    385
  25-percentile               6.2    6.7
  Median                      7.7    7.9
  75-percentile               9.7    9.1
Problems
  Number of pupils            566    394
  25-percentile               4.5    3.9
  Median                      6.4    5.0
  75-percentile               8.6    6.3
Graphs
  Number of pupils            121    413
  25-percentile               5.2    4.1
  Median                      6.2    5.0
  75-percentile               7.0    6.0

References

Hotz, H. G., First Year Algebra Scales, Teachers College, Columbia University, Contributions to Education No. 90.
The writer gives a history of the derivation of these scales, a complete reproduction of them, and a discussion of their administration and use. The statistical working out of the scales is treated fully for one of them, the procedure for all being the same.

Cawl, F. R., "Practical Uses of an Algebra Standard Scale," School and Society (July, 1919), 89-91.
The results of testing a class in a large private school are here presented. The matter of correlation with English, French, and Latin is considered. A short interpretation of results is given, with suggestions as to the value of using such a scale.

J. Minnick's Geometry Tests

This series of tests is based on the assumption that the demonstration of a geometrical theorem involves the following abilities:
Test A, the ability to draw the figure.
Test B, the ability to state the hypothesis and conclusion.
Test C, the ability to recall the facts concerning the figure.
Test D, the ability to select and organize facts so as to produce the proof.
Test E, the ability to draw auxiliary lines.
The series includes one test for each of these abilities. No report is made for Test E. These tests are unique in that they provide for both positive scores and negative scores. The positive score is the percent of the necessary elements of the proof given correctly by the pupil. The negative score is the number of incorrect and unnecessary elements.

TABLE XIX. GRADE NORMS FOR MINNICK'S GEOMETRY TESTS

                         Positive Scores    Negative Scores
Grade                        X      XI          X      XI
Test A (Ability to draw accurate figures for theorems)
  Number of pupils         126      66        126      60
  25-percentile           53.3    43.8        2.4     1.1
  Median                  63.0    58.0        4.1     2.6
  75-percentile           67.2    69.2        6.6     5.4
Test B (Ability to state hypothesis and conclusion in terms of given figure)
  Number of pupils         167      66        167      66
  25-percentile           55.2    55.5        1.1     1.0
  Median                  69.6    67.1        2.3     2.0
  75-percentile           81.3    83.6        3.9     3.9
Test C (Ability to recall known facts about figures when one or more are given)
  Number of pupils         154      65        154      63
  25-percentile           52.2    55.6        1.9     1.4
  Median                  64.1    64.7        3.8     3.9
  75-percentile           77.2    77.9        7.1     5.7
Test D (Ability to organize and select facts to produce a proof)
  Number of pupils         155      68        155      54
  25-percentile           68.0    75.0         .8      .8
  Median                  85.5    89.2        1.6     1.6
  75-percentile           92.9    98.3        2.3     3.2

References

Minnick, J. H., An Investigation of Certain Abilities Fundamental to the Study of Geometry, University of Pennsylvania.
This monograph gives a synopsis of methods and results used in deriving the tests, followed by a more detailed statement. The latter includes a reproduction of the tests, tables giving data secured from testing, statistical methods of weighting exercises, suggestions as to use, etc.

Minnick, J. H., "A Scale for Measuring Pupil's Ability to Demonstrate Geometrical Theorems," School Review, (Feb., 1919), 101-109.
A brief account of the construction of a scale to measure one definite geometric ability is given.
The scores made upon the first selection of exercises, the resultant weighting and then the selection of those best suited to make up a scale are briefly treated. The exercises chosen are reproduced.

Minnick, J. H., "Certain Abilities Fundamental to the Study of Geometry," Journal of Educational Psychology, (Feb., 1918), 83-90.
Four abilities requisite to formal geometrical demonstration are listed. Their relation to teaching, development by teaching, and diagnosis by tests are discussed. The tests used were those of the author. Correlations with teachers' marks are given.

K. Holley's Sentence Vocabulary Scale

This scale consists of a number of exercises of the following type:
1. Impolite people are   kindly   brave   young   ill-bred.
2. A man is afloat in a   mine   tower   boat   hospital.
The pupil is asked to underline the word which makes the truest sentence. These exercises are arranged in order of increasing difficulty, and a pupil's score is found by subtracting one-third of the number of errors from the number correct. An abbreviated form of this scale has been incorporated in the Illinois General Intelligence Scale. The scale was constructed to provide a suitable means of ascertaining the general intelligence of groups of children. The measure which it yields is not sufficiently accurate to be used as an index of the general intelligence of individual pupils. The scale is also recommended as an instrument for measuring the vocabulary of pupils. The total distributions given in Table XXI indicate that this scale is too difficult for pupils in the third and fourth grades.

TABLE XX. GRADE NORMS FOR HOLLEY'S SENTENCE VOCABULARY SCALE. APRIL TESTING

Grade                  III     IV      V     VI    VII   VIII     IX      X     XI    XII
Number of pupils       406    520    465    450   1188   1047    253    223    155    108
25-percentile          8.4   16.7   25.3   33.4   32.3   40.1   40.9   50.1   52.4   54.5
Median                16.6   25.1   33.0   42.8   41.9   47.7   49.0   56.0   59.9   62.7
75-percentile         28.5   33.6   39.9   51.5   49.7   55.8   57.1   63.5   67.9   70.1

TABLE XXI. HOLLEY'S SENTENCE VOCABULARY SCALE. GRADE DISTRIBUTIONS FOR APRIL TESTING

Grade                  III     IV      V     VI    VII   VIII     IX      X     XI    XII
90                                                   6
80                                                   5                    1      2      7
70                                                   3      1      8     21     27     21
60                              1      5     50     75    126     32     56     48     36
50                      31      4     20     74    199    321     79     85     52     31
40                      18     44     89    141    379    341     78     45     23     13
30                      42    128    170    110    290    153     50     15      3
20                      71    170    137     59    123     73      5
10                     124    130     30     13     74     29      1
0                      120     43     14      3     34      3
Total                  406    520    465    450   1188   1047    253    223    155    108
Median                16.7   25.1   33.0   42.8   41.9   47.8   49.0   56.0   59.9   62.7

References

Terman, L. M. and Childs, H. G., "A Tentative Revision and Extension of the Binet-Simon Measuring Scale of Intelligence," Journal of Educational Psychology, (April, 1912), 205-208.
The basis of Holley's Sentence Vocabulary Scales is the Stanford Revision, 100 word Vocabulary Test, the construction of which is here described. Tentative standards of achievement are also given.

Holley, C. E., Mental Tests for School Use. Bureau of Educational Research, University of Illinois, Bulletin No. 4, pp. 86-91.
This bulletin gives an account of a comparative study of six group intelligence scales, of which the above was one, based on data from the school system of Champaign, Illinois. A brief account of the origin of the Sentence Vocabulary Scales is included (p. 30).

Branson, E. P., "An Experiment in Arranging High-School Sections on the Basis of General Ability," Journal of Educational Research, (Jan., 1921), 53-56.
At Long Beach, California, this scale was given to two groups of high-school entrants who had recently taken, and been grouped by, the Otis Group Intelligence Scale. At the end of the term the test was repeated. A comparison by groups of the scores at the two periods, and correlations with the Otis Scale, are given.

L. Holley's Picture Completion Test for Primary Grades

This test, as its name suggests, consists of a number of pictures which are incomplete. The pupil is expected to add the part which is missing. It was designed as an instrument for measuring the general intelligence of young children.
The total distributions as given in Table XXIII indicate that it is not a good instrument for this purpose. The distributions exhibit unusually high variability. This is much greater than is exhibited by other tests when applied to children in these grades. The median scores given in Table XXII give further indications of the inadequacy of this test, particularly in the grades above the first.

TABLE XXII. GRADE NORMS FOR HOLLEY'S PICTURE COMPLETION TEST FOR PRIMARY GRADES. JANUARY TESTING

Grade               Kindergarten      I     II    III     IV
Number of pupils              75   1438   1233    327    167
25-percentile                1.8    4.4    7.9    9.9    8.8
Median                       5.3    7.8   11.5   13.5   12.3
75-percentile                7.8   12.2   15.1   16.5   15.0

TABLE XXIII. HOLLEY'S PICTURE COMPLETION TEST FOR PRIMARY GRADES. GRADE DISTRIBUTIONS FOR JANUARY TESTING

                     Grade
Score         I     II    III     IV
20           34     31      9      4
19           23     43     13      4
18           33     46     25      9
17           36     59     20     12
16           42     64     27      3
15           52     72     34     10
14           47     76     22     15
13           54     88     26     14
12           48     98     24     17
11           82     77     19     14
10           72     92     24      9
9            81     84     24     12
8            91     89     26      9
7           103     75     11     11
6           122     59      9     13
5            94     50      8      8
4           112     50      4      2
3           108     29      1      1
2            86     21      1
1            77     28
0            41      2
Total      1438   1233    327    167
Median      7.8   11.5   13.5   12.3

References

Holley, C. E., Mental Tests for School Use. Bureau of Educational Research, University of Illinois, Bulletin No. 4, pp. 86-91.
A general discussion of tests of this type is followed by an account of the testing from which this test came. This was done in Champaign, Illinois. Results are merely outlined.

CHAPTER III

THE DERIVATION OF MONROE'S STANDARDIZED REASONING TESTS IN ARITHMETIC¹

The process of problem solving. "Reasoning" as it occurs in the solving of an arithmetical problem involves these steps:
(1) A careful reading of the problem including the association of correct arithmetical meanings with the "technical" terms used in stating the problem.
(2) Recall of facts and principles suggested by the problem and required for its solution.
(3) Formulation of a hypothesis or plan of solution using as data the results of the first two steps.
(4) Verification of this plan of solution. This process of reasoning is usually followed by the calculations outlined in the plan of solution. This additional step, however, is not a part of the reasoning process.

Two kinds of words are used in stating arithmetical problems: (1) The descriptive words give the setting of the problem. Only in an indirect way do these affect the solution. (2) The "technical terms" of an arithmetical problem consist of those words and phrases which define quantities and quantitative relationships. Every problem involves at least three quantities, two given and the third to be found. These quantities are related in a definite way. For example, the sum of the two quantities given equals the third, or the third is the quotient of one divided by the other. In problems involving two or more steps there are more than three quantities and the relationships are more complex. However, in every case there are words or phrases which either directly or indirectly tell what these relationships are, and, consequently, what operations must be performed to obtain the desired answer. This principle may be illustrated by the following problems:

"What are the average daily earnings of a boy who receives $0.88, $0.25, $1.15, $0.75, $0.50, and $0.60 in one week?"

The phrase "average daily earnings" names the quantity to be found and also specifies its relationship with the given quantities. The "average" is the quotient of the sum of the several amounts divided by the number of items. A knowledge of this definite meaning of "average" is necessary if one is formulating a rational plan of solving the problem. If the phrase "average daily" were omitted we would have an entirely different problem.

"How many square yards of linoleum will be required to cover a floor 16 feet by 12 feet?"
"How many square yards" names the third quantity in this problem and in connection with " 15 feet by 12 feet" specifies the relations which exist *A number of considerations on which the derivation of these tests is based are con- tained in an article by the writer in School and Society, Volume VIII, pages 295 and 424. Sample copies of these tests may be obtained from the Public School Publishing Company, Bloomington, Illinois. 37 between the quantities. This third quantity is the product of the dimensions divided by nine.^ In this case the number of square feet in a square yard must be remembered and also the principle that the area of a rectangle (i.e., a figure whose dimensions are given as in the problem) is the product of the length by the width. In many cases when the first two steps of the reasoning process have been completed satisfactorily, the formulation of the plan of solution (the next step in the reasoning process) involves little uncertainty. In fact it is essentially mechanical. This is the case in these illustrations. In the case of very simple problems, or very familiar problems, the reasoning process is usually short- circuited so that there is no explicit association of meaning with the technical terms nor recall of principles. The problem as a whole or some feature of it serves as a cue for the direct association of the plan of solution. In such cases there is strictly speaking, no reflective thinking or reasoning, and the mental process involved is much the same as that which occurs in the operations of arithmetic. The solution of the problem has become automatic. The nature of a reasoning test in arithmetic. A reasoning test in arithmetic is essentially a test of careful reading in a limited field to answer specific questions. In this reading, technical vocabulary is fundamental. The pupil gives evidence of his degree of comprehension by his plan of solution. 
The correctness of the numerical answer to the problem depends upon the accuracy of the pupil's calculations and the recall of denominate number facts as well as upon the plan of solution. The plan, or principle, of the solution and not the accuracy of the numerical answer is, therefore, the measure of the pupil's ability to reason in arithmetic. Thus in describing a pupil's performance on a reasoning test, errors in the recall of facts and in calculation should be disregarded. For the problems which are solved correctly in principle a score based on correct answers may be used as a crude measure of the pupil's ability to perform the operations of arithmetic.

In order that a pupil's score on a reasoning test may be indicative of his ability to solve arithmetical problems in general, the problems must be carefully selected with reference to content (vocabulary). The ideal reasoning test would be one that included all of the technical terms, but this is not possible because the vocabulary of arithmetical problems is extremely varied and voluminous. In another⁶ place the writer has reproduced 28 different forms of statement which were found in the examination of eight text-books for the problem, "Given, $7.50, paid for silk, and price per yard $1.50, to find the number of yards purchased." This condition makes it necessary to select a few problems which will be representative in respect to content, in order to have a test of usable length.

⁵An alternative solution is to reduce each dimension to yards before finding the area.
⁶Monroe, Walter S. Measuring the Results of Teaching, Houghton Mifflin Company (1918), 163.

Method of selecting problems on basis of content. In the case of the series of tests described in this report the representative problems were selected by the following method. The one- and two-step problems appearing in eight widely used texts were classified according to the operation or operations they called for.
This gave in one group all the problems requiring only addition, in another those requiring only subtraction, and so on. The problems in each of these groups were further classified by the writer on the basis of the technical terms used. This was found to be difficult because of the great variety of these terms. Since the classification represents the judgment of only one person, it cannot be considered final in any sense.

The general plan of classification may be illustrated by the types of division problems. In the list on the following pages only those types are given which included problems found in five or more of the eight texts examined. A large number of additional types included problems from less than a majority of the texts. A descriptive statement of the type is followed by a limited number of illustrative problems. It will be noted that for a single type these problems are not identical in vocabulary but it was the judgment of the writer that they were sufficiently similar to justify grouping them together. It has been assumed that the terms used are essentially synonymous. This hypothesis is, of course, subject to experimental verification. Unfortunately, this is lacking at this time. However, the resulting list of type problems is more representative of the vocabulary of arithmetical problems as they occur in our texts than any other available list.

Description of types and illustrative problems:

1. Given a whole to find number of parts of a given size, including to find the number of acres to produce a given yield.
   A baker used three-fifths lbs. of flour to a loaf of bread. How many loaves could he make from a barrel?
   When the average yield per acre is 25 bushels how many acres will yield 925 bushels?
   How many lengths three-fourths yds. long can be cut from 15 yds. of goods?
   How many hens can be properly accommodated in a pen containing 51 square feet, if each hen requires 6 square feet?
   964 marbles are distributed equally among a certain number of boys. Each boy has 82 marbles. How many boys are there?
   At 7½ gallons to the cubic foot, how many cubic feet will 3000 gallons of oil occupy?
   Oats weigh 32 lbs. to a bushel. How many bushels are there in a load weighing 1344 lbs.?

2. Given cost and price to find the number of articles purchased. This includes wages when question is how many days, weeks, etc. to earn a given amount.
   At 16 cents per pound, how many pounds of steak does a woman get if the amount of the purchase is 80 cents?

3. The reverse of No. 1. Given whole and number of parts to find size of each part.
   Three boys buy a rowboat for twelve dollars and seventy-five cents, sharing the expense equally. Find how much each boy has to pay.
   If 54 marbles are divided equally among 6 boys, how many marbles will each receive?
   In 28 days a hotel used 361 lbs. of butter. How many pounds did it use a day?

4. The reverse of No. 2.
   A farmer paid thirty-three dollars and a half for 4 bushels of seed wheat. How much did he pay for a bushel?
   The bill for 58 tons of copper amounted to 612 dollars. What was the price per ton?
   A fowl weighing 4 and one half lbs. sells for $1.00. What is the price per pound?
   A man's wages amounted to 46 dollars for 9 and one-fifth days' work. How much did he receive per day?
   A man works 8 hours a day for 4 dollars and 80 cents. How much does he receive for each hour's work?

5. Given the price for a given denomination to find the price at a lower denomination.
   A boy bought a dozen oranges at the rate of 15 cents a dozen. What did they cost him apiece?
   When milk is 10 cents a quart, how much is a pint worth?

6. Given the whole and the number of parts to find the average (rate, price, yield, etc.)
   A farmer raised 500 bushels of wheat on a field of 40 acres. What was the average yield per acre?
   A fast train runs from Chicago to a station 356.4 mi. distant in exactly 9 hours. What is the average rate of the train?
   A drover paid $1125 for cows, what was the average price if he bought 25?
   A mill employs 600 hands and has a weekly pay roll of $2,000. What is the average weekly wage for each employee?

7. The whole and the rate are given. The question is asked by, "How long?"
   If a horse eats three-eighths bu. of oats a day, how long will 6 bus. last?
   How long will it take to earn 28 dollars at $1.75 a day?

8. Given distance and rate to find how long.
   At 25 miles an hour, how long will it take an automobile to go 160 miles?

9. Given distance and number of units of time to find rate.
   In 3.2 hours a man walks 12.32 mi. How far does he walk in one hour?
   Find the rate of speed per hour made by an airship traveling 218.05 miles in 3.5 hours.

10. A fractional part of a whole is given to find the whole.
   If, when 18 and three eighths mi. of track are laid, one third of the road is completed, how long is the road?
   I sold a bicycle for 18 dollars. This was three sevenths of what I paid for it. How much did I pay for it?

11. A percent of the whole is given to find the whole.
   If 33 and one third percent of a man's loss is 300 dollars, how much does he lose?
   A girl spent 25 cents which was 12½ percent of her monthly allowance, how much was her allowance?
   A clerk had his weekly wages increased 3 dollars, or 16 and two thirds percent. What were his wages before this increase?

12. Given the amount of gain or profit and percent of gain or profit to find the cost or selling price.
   A hardware merchant makes a profit of 25 percent or 32 cents on saws. Find the cost.
   A farmer sold his horse at a gain of 30 dollars, or 25 percent. Find the cost.

13. Given the commission and percent of commission to find amount sold.
   Five percent commission on a certain amount of money was 684.20 dollars. What was the amount?

14. Given two numbers to find what percent one is of the other.
   If 1000 lbs. of potatoes contain 180 lbs. of starch, what percent of potatoes is starch?
   If a man saves 187.50 dollars out of his salary of 1250 dollars, what percent does he save?
   The boys in the Marshall school won 5 of the 8 games of hockey. What percent?
   In his examination in arithmetic a boy had 10 problems out of twelve right. His grade was what percent?

15. Given two numbers to find what part one is of the other.
   The Jackson basket-ball team won 35 out of 56 games. What part did it win?
   A man spends for rent 360 dollars out of an income of 1500 dollars. What part of his income is spent this way?

16. Given the amount of investment, or principal, and the income or interest to find rate.
   Mrs. Lynch received 24 dollars a year interest on 400 dollars loaned Mrs. Burnet. What is the rate?

17. Given an amount in one denomination to reduce to a higher.
   An aviator reaches a height of 11,474 feet. Express this height in miles.
   A milk dealer sells 302 qts. of cream. Express this as gallons and quarts.
   In digging out a cellar 8260 cubic feet of earth were removed. At 27 cubic feet to the cubic yard, how many cubic yds. were removed?

18. Given the value or face of a policy and premium to find the rate.
   Find the rate, given the face of the policy as 1500 dollars and premium 15 dollars.
   A fire insurance company charged 20 dollars for insuring an automobile for 1000 dollars. What was the rate of insurance?

19. Given the premium and rate of insurance to find face of policy.
   A man paid 50 dollars for insuring a house, the rate being 2½ percent. What was the face of the policy?

Table XXIV gives the frequency of occurrence of each type in each of the eight texts examined. This table is to be read as follows: 10 problems classified as belonging to Type 1 were found in Text 1; 30 such problems were found in Text 2; 20 in Text 3; 34 in Text 4, etc. The total number of problems classified under Type 1 is 147. The variations in the frequency of the occurrence of problems belonging to a single type are worthy of notice.
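The reading of Table XXIV described above may be checked mechanically. The following is a minimal sketch in Python, offered only as an illustration and not as part of the original study. It transcribes four rows of the table (Types 1, 2, 3, and 8), treats a blank entry as zero, and recomputes each row total together with the number of texts in which the type occurs. The names TABLE_XXIV_ROWS and summarize are introduced here for convenience.

```python
# Four rows transcribed from Table XXIV. Each list gives the number of
# problems of that type found in Texts 1 through 8; a blank entry in the
# printed table is entered here as 0 (an assumption made for this sketch).
TABLE_XXIV_ROWS = {
    1: [10, 30, 20, 34, 8, 13, 13, 19],
    2: [28, 14, 123, 11, 0, 75, 49, 27],
    3: [0, 7, 6, 2, 4, 11, 3, 15],
    8: [1, 1, 0, 4, 0, 4, 1, 0],
}


def total_frequency(counts):
    """Total number of problems classified under one type."""
    return sum(counts)


def texts_containing(counts):
    """Number of texts in which at least one problem of the type occurs."""
    return sum(1 for count in counts if count > 0)


def summarize(rows):
    """Map each type number to (total frequency, number of texts)."""
    return {
        type_number: (total_frequency(counts), texts_containing(counts))
        for type_number, counts in rows.items()
    }


if __name__ == "__main__":
    for type_number, (total, n_texts) in summarize(TABLE_XXIV_ROWS).items():
        print(f"Type {type_number}: {total} problems in {n_texts} of 8 texts")
```

For Type 1 the recomputed sum agrees with the total of 147 quoted above, and the count of texts shows at a glance which types fail to appear in all eight texts.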
Some types have a high frequency in certain texts while in other texts their frequency is low and in many cases they do not occur at all. This means that different authors have tended to use different vocabularies.

TABLE XXIV. FREQUENCY OF OCCURRENCE OF TYPES OF PROBLEMS IN DIVISION IN EIGHT TEXTS

Type                     Text
number      1    2    3    4    5    6    7    8   Total
1          10   30   20   34    8   13   13   19     147
2          28   14  123   11    -   75   49   27     327
3           -    7    6    2    4   11    3   15      48
4          11   24   24    2   19   20   14   17     131
5           2    4    2    -    3    1    -    1      13
6           5   10    4    9    3   13    4   52     100
7           -    2    2    3    2    1    1    2      13
8           1    1    -    4    -    4    1    -      11
9           2    7    3    -    1    1    2   12      28
10          -    1   13    2    2   13    3    2      36
11          6   19   13    3    -    4   11   16      72
12          1    -    2    -    1    1    3    -       8
13          -    1    1    2    4    1    -    1      10
14          9   42   12   41    3   14   39   30     190
15         12 5 - 1 - 18 4                            40
16          1    2    1    1    1    8    1    8      23
17          6    1    1    1    -    2    2    -      13
18          8 4 - 1 - 3 7                             23
19          9    -    3    1    -    -    1    1      15

Space does not permit the reproduction of similar tables for addition, subtraction, multiplication and the classes of two-step problems. In Table XXV a summary of the frequencies of the occurrence of the several types is given. This table is read as follows: In the case of problems requiring only addition, three types occurred in all eight texts, one occurred in seven and two in six texts. The total number of types occurring in five or more of the texts is six. The total number of problems classified in these six types is 464. The total number of problems is 622.

The reader should bear in mind that no attempt was made in this classification to determine what problems pupils should be asked to solve. The problems have been taken as they occurred in the texts. In effecting the classification, no consideration was given to the question of whether the problem was practical. In fact the purpose was not to obtain a list of practical problems but to secure a list of the forms of statement or language which had been used in the one- and two-step problems by the authors of widely used texts.
Many of the technical terms of arithmetic are used (probably must be used) whether the problems are practical or not.

Experimental selection of problems. In order to secure data for the construction of a series of reasoning tests in arithmetic, about 300 problems were selected out of the total number examined and classified. Out of this number 156 problems were chosen for an experimental series of tests. In making the selections for this purpose the writer considered, in addition to the classification described above, the social importance of the problems. Thus a few types of problems which occur in a majority of the texts and have a high total frequency were not represented. This introduces an additional subjective factor, but in view of the emphasis which is being placed upon the social importance of the subject matter, the writer believes it is better to exercise judgment in this instance rather than to follow blindly statistics based upon the content of our present texts, particularly when it is obviously impossible to include representative problems of all types within a single series of tests of suitable length for classroom use.

TABLE XXV. FREQUENCY OF TYPES OF PROBLEMS

                Number of types occurring in         Total  Frequency of  Frequency
Operation     8 texts  7 texts  6 texts  5 texts    no. of      problems     of all
                                                     types    classified   problems
+                   3        1        2        -         6           464        622
−               1 2 4                                    7           211        456
×                   5        5        2        3        15          1641       1938
÷                   5        6        3        5        19          1248       1610
+ −             1 2 2                                    5           199        346
+ ×                 4        2        3        4        13           472        718
+ ÷             1 1 3                                    5           127        299
− ×             1 7                                      8           166        559
− ÷             1 3                                      4           114        413
× ×                 3        1        2        4        10           429        581
× ÷                 1        4        2        6        13           704       1234
+ +                 -        -        -        -         -             -          7
÷ ÷                 -        -        -        -         -             -         70
Total              25       23       25       32       105          5775       8853

In constructing the experimental series, Test I was designed for grades four and five, Test II for grades six and seven, and Test III for grade eight. Some such division is necessary because certain social situations from which problems are taken are not studied until the later grades although the mathematical relationships are very simple.
Pupils cannot be expected to solve such problems until they are acquainted with the social situations. For this reason all problems involving percentage were placed in Test III. No consideration was given to the relative difficulty of the problems in making this division except that no problems requiring common fractions were placed in Test I and for the most part decimal fractions were confined to Test III.

Each test consisted of sixteen problems printed on a four-page folder with space so that the pupil could do all of his work upon the test paper. The test papers showed that unless the pupil made errors and did his work over or used an elaborate method, ample space was provided except in a very few cases. The directions for administering the preliminary tests were essentially the same as those which now accompany the tests.

A number of cities were invited to cooperate by giving the tests between April 1 and 15, 1918. Fourteen cities responded, nine in Kansas, and one city in each of the following states: Illinois, Ohio, Michigan, New York, and Pennsylvania. Usable returns were received from 12,859 pupils.

When the record sheets and the test papers were returned to the writer it was found that the directions for marking the papers were not sufficiently complete and explicit. Consequently, there was a lack of uniformity in the marking. In order to insure uniformity the writer, assisted by two clerks, rescored the papers. Whenever an unusual or questionable solution was found a record was made and all similar solutions were marked in the same way. In this way a high degree of uniformity in the marking of the papers was secured.

Space does not permit a detailed statement of the plan of scoring of the solution of each problem but the general plan may be indicated. The solution of a problem was considered correct in principle if the pupil's work showed that he had based his solution upon the relationships which exist between the quantities of the problem.
For example, in the problem, "If a man has $275 in the bank and draws out $70, how much has he left in the bank?", there are three quantities: $275, $70, and the amount "left in the bank." These are related so that the difference between $275 and $70 must equal the amount left in the bank. A solution of the problem based upon this relationship must involve the subtraction of $70 from $275, or the finding of a number which added to $70 will make $275.

In the problem, "A house rents for $35 a month. This is how much a year?", the three quantities are $35, 12, or the number of months in a year, and the amount for a year. The relation is that the product of $35 and 12 equals the amount of rent for a year. A solution based upon this relationship would usually be one in which $35 was multiplied by 12. In a few cases the pupil had set down $35 twelve times and added. This solution was counted as correct in principle because it was considered that the pupil had recognized the relationship which existed between the quantities of the problem. Incidentally it should be noted that although such a solution was counted as being correct in scoring the papers of the test, a teacher should not encourage it. In fact the writer believes it should be discouraged, except possibly when the pupil is learning the idea of multiplication, because the method is not efficient. It requires more time and there are more opportunities for error in the mechanical work.

In the case of the above problem, if a multiplier other than 12 was used the solution was counted as correct in principle because it was considered that correct recall of denominate number facts was not a part of the reasoning. In a few cases 35 was multiplied by itself. This was marked incorrect in principle.

Although the pupils were directed to do all work upon the test papers a few gave only the answer in the case of certain problems. They had either solved the problem mentally or on another sheet of paper.
An arbitrary rule was adopted. If the answer was correct the problem was marked correct in principle and answer. If the answer was wrong it was marked incorrect in both principle and answer.

An answer was not marked correct unless the solution of the problem was correct in principle and the answer was numerically correct and in its lowest terms if it contained a fraction. It was not required that the answer be labeled with its denomination.

Weighting the problems. For each problem three records were secured: (1) Number of pupils attempting the problem. (2) Number of solutions correct in principle. (3) Number of correct answers. From these facts the percent of solutions correct in principle and the percent of those solved according to the right principle which had also correct answers were calculated. These percents were translated into sigma values, the former being designated as the "P" value of the problem and the latter as the "C" value. In doing this it was assumed that the ability to solve problems was distributed normally and included between +2.5 sigma and -2.5 sigma. The tables given in Rugg's "Statistical Methods as Applied to Education" were used. The values were calculated to two decimal places but in order to simplify the computation of scores they were expressed in terms of the nearest integer in the tests as now published.

In the case of those problems which were solved by the pupils in two successive grades, the average inter-grade interval was found for each group of problems by taking the average of the differences of the sigma values of the problems of the test. This inter-grade interval was added to the values of the problems for the upper of the two grades to reduce them to the basis of the lower grade. The average of the two values was taken as the final value of the problem.
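The weighting procedure just described may be sketched in Python. This is an illustration under stated assumptions and not a reproduction of the original computation. The tables in Rugg are not reproduced here, so the inverse of the standard normal distribution from Python's statistics module is substituted for them. It is further assumed that a problem solved by a smaller percent of pupils receives a higher sigma value, and that values are limited to the range from -2.5 to +2.5 sigma. The function names and the percents in the example are hypothetical.

```python
from statistics import NormalDist, mean

SIGMA_LIMIT = 2.5  # the text assumes ability lies between -2.5 and +2.5 sigma


def sigma_value(percent_correct):
    """Translate a percent of correct solutions into a sigma value.

    Assumption: a problem solved by p percent of the pupils is placed at the
    point of a standard normal distribution which p percent of the pupils
    exceed, so that a harder problem receives a higher value.
    """
    p = percent_correct / 100.0
    if p <= 0.0:
        return SIGMA_LIMIT
    if p >= 1.0:
        return -SIGMA_LIMIT
    z = NormalDist().inv_cdf(1.0 - p)
    return max(-SIGMA_LIMIT, min(SIGMA_LIMIT, z))


def combine_two_grades(lower_grade_values, upper_grade_values):
    """Reduce upper-grade values to the lower-grade basis and average.

    The inter-grade interval is the average difference between the two
    grades' values for the same problems. It is added to each upper-grade
    value, and the result is averaged with the lower-grade value.
    """
    pairs = list(zip(lower_grade_values, upper_grade_values))
    interval = mean(lower - upper for lower, upper in pairs)
    final_values = [(lower + (upper + interval)) / 2.0 for lower, upper in pairs]
    return interval, final_values


if __name__ == "__main__":
    # Hypothetical percents correct in principle for three problems
    # given in two successive grades.
    lower = [sigma_value(p) for p in (40.0, 55.0, 70.0)]
    upper = [sigma_value(p) for p in (60.0, 72.0, 85.0)]
    interval, final_values = combine_two_grades(lower, upper)
    print(round(interval, 2), [round(v, 2) for v in final_values])
```

Under these assumptions a problem solved by exactly half of the pupils receives the value zero, and the published integer weights would correspond to a rounding of values obtained in some such manner.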
An attempt was made to reduce the sigma values to a common zero point, and thus secure comparable scores, by having a limited number of problems from Test I appear in Test II and also a limited number of problems from Test II appear in Test III. It happened that some of the problems chosen showed inversion and for this reason it was deemed advisable not to attempt to reduce the values to a common zero. Thus the scores obtained from the different tests of the series are not comparable.⁷

Construction of the final tests. Out of the 156 problems included in the preliminary test, 90 which belonged to types occurring in five or more of the eight texts examined were selected for the final tests. Since in the selection on the basis of content there was no effort to include problems which exhibited wide range of difficulty, there was no attempt to construct a difficulty scale. In fact it is the judgment of the writer that the educational objectives implied by such a scale in the field of problem-solving are open to serious criticism. In our schools we should endeavor to instruct pupils to solve problems because they are socially worth while rather than because they exhibit a certain degree of difficulty. The purpose here is to construct a group of tests containing problems that are representative of the language in which problems are stated in our representative text books and which appear to be satisfactory for testing purposes.

The final tests consist of 15 problems each. Test I is for grades four and five, Test II for grades six and seven, and Test III for grade eight. There are two forms of each test. In selecting the 90 problems for these tests those were rejected which were commented on unfavorably by those who gave the preliminary tests. Also those problems were rejected which were found to be particularly confusing to pupils. The arrangement of the order of the problems in a test was made without reference to their difficulty values.
An attempt was made to secure as high a degree of variation in the operations required as possible. In the two forms of each test the corresponding problems are approximately equal in difficulty, and so far as possible, the two forms were made equivalent in other respects.⁸

⁷The method of weighting is open to criticism. It is used in an attempt to give more credit for doing a difficult problem than for doing an easy one. It is not at all certain that such a plan gives the most truthful indication of a pupil's ability. Some recent studies have shown that unweighted scores correlate very highly with the weighted scores obtained by this method. Therefore, it is likely that the tests would have been nearly as accurate measuring instruments without any determination of weights.
⁸No determination of the reliability or validity of these tests was made as a part of the original derivation. Neither is it possible to make a report on these questions at this time. Some work which was done on the question of reliability indicated that the tests were less reliable than tests in the operations of arithmetic and in silent reading. This appeared to be due to the fact that frequently pupils are unable to do certain problems because of a peculiar course of study.

Analysis of errors made by pupils on preliminary test. In the preliminary testing the following six problems were given to 100 fifth grade pupils in one city. The results of an analysis of the test papers are given in Table XXVI.

1. Mrs. Black received $2 a yd. for broadcloth. She sold 78 yds. How much did she receive?
2. At the store a towel roller costs 35c. George made one for his mother. He used 12c worth of lumber, 2c of hardware, and 3c worth of shellac. Find how much George saved his mother.
3. A Kansas farmer bought 80 acres of cheap land for $240. Oil being found on his farm he sold the land for $60,000. What was his profit?
4. A car contains 72,060 lbs. of wheat. How much is it worth at 87c a bushel?
5. A field is 20 rds. long and 12 rds. wide. How many rods of fence are needed to enclose it?
6. What are the average daily earnings of a boy who received 88 cents, 25 cents, $1.15, 75 cents, 50 cents, and 60 cents in one week?

TABLE XXVI. RESULTS OF ANALYZING THE ERRORS OF 100 FIFTH GRADE PUPILS

Problem                                1     2     3     4     5     6   Total
Number of pupils attempting          100   100    92    52    94    94     532
Errors in reasoning                    8    21    39    38    33    67     206
Errors in fundamentals                 1     7    26    16     2    35      87
Omissions and errors in copying        2     -     2     3     -     4      11
Errors in decimals                    13     -    16    25     -    14      68

Two significant facts are shown in this table. First, a majority of the errors (55 percent) are in reasoning. More than one-third of the attempts (39 percent) resulted in faulty reasoning. Second, 41 percent of the errors in calculation were in placing the decimal point. This second fact becomes more significant when we note that these errors occurred in problems involving only United States money and that the first and third problems, which produced 29 of the 68 errors, do not really involve decimal fractions. In these two problems the error consisted in pointing off the answer when it should not have been done.

The widespread use of the Courtis Standard Research Tests, Series B, and other tests upon fundamentals has resulted in increased attention to the fundamental operations with integers. There should be no decrease in the emphasis upon this phase of arithmetic, for one out of six pupils made errors in the simple calculations required in these problems, but the greatest source of error and therefore the greatest need is increased attention to the mental processes involved in reasoning as it occurs in solving arithmetical problems. The necessity for doing this becomes more obvious when we examine the nature of the errors in reasoning.
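The percents quoted from Table XXVI may be recomputed from the column of totals. The following Python sketch is offered only as a check on the arithmetic. It assumes that every error other than an error in reasoning is counted as an error in calculation, which is the reading under which the figure of 41 percent is reproduced. The names ATTEMPTS, ERRORS, and summarize are introduced here for illustration.

```python
# Totals transcribed from the last column of Table XXVI.
ATTEMPTS = 532
ERRORS = {
    "reasoning": 206,
    "fundamentals": 87,
    "omissions and copying": 11,
    "decimals": 68,
}


def percent(part, whole):
    """Express part as a percent of whole."""
    return 100.0 * part / whole


def summarize(attempts, errors):
    """Recompute the percents discussed in the text from the table totals."""
    all_errors = sum(errors.values())
    # Assumption: all errors other than those of reasoning are errors in calculation.
    calculation_errors = all_errors - errors["reasoning"]
    return {
        "reasoning share of all errors": percent(errors["reasoning"], all_errors),
        "attempts with faulty reasoning": percent(errors["reasoning"], attempts),
        "decimal share of calculation errors": percent(errors["decimals"], calculation_errors),
        "attempts with errors in fundamentals": percent(errors["fundamentals"], attempts),
    }


if __name__ == "__main__":
    for label, value in summarize(ATTEMPTS, ERRORS).items():
        print(f"{label}: {value:.1f} percent")
```

The last figure, a little over 16 percent of the attempts, corresponds to the statement that about one pupil in six made errors in the simple calculations.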
In problem 2, 13 of the 21 errors in reasoning were due to adding all of the terms; in problem 4, 33 out of 38 of the errors in reasoning were due to multiplying 72,060 by 87 without attempting to reduce the pounds to bushels; in problem 6, 58 of the 67 pupils who reasoned incorrectly simply added the terms. Each of these errors may be ascribed to inaccurate or incomplete reading of the problem, that is, to a failure in the first step in the rational solution of a problem.

In tabulating the scores of the preliminary test the variation in achievement of different classes was particularly noticeable. It was evident that some teachers were teaching their pupils to solve problems while others, frequently within the same school system, were not doing so. It is also significant that a number of teachers consistently marked as correct solutions which were clearly incorrect. This may have been accidental on the part of the teacher but this is very doubtful. A striking illustration was furnished by the problem: "A baker used 3/5 lbs. of flour to a loaf of bread. How many loaves could he make from a barrel (196 lbs.) of flour?" The correct solution requires that 196 be divided by 3/5, which gives an answer of 326 2/3 loaves. In several instances teachers marked as correct all papers in a class in which 196 was multiplied by 3/5. This latter solution gives an answer of 117 3/5 loaves. The fact that teachers made such errors as this indicates that they are not familiar with solving reasoning problems. Perhaps this is one source of the poor records made by the pupils.

CHAPTER IV
MONROE'S TIMED SENTENCE SPELLING TESTS AND PUPILS' ERRORS

How ability in spelling should be measured. In measuring ability in spelling by having a pupil spell words which are dictated in lists it is clear that the conditions under which the pupil spells the words are not the conditions under which he spells the words which he uses in writing themes, letters, and other school exercises.
As a result we probably fail to obtain a measure of his "true spelling ability." If the test words are embedded in sentences and the sentences written from dictation we approach more nearly normal spelling conditions because the pupil is writing connected words which have meaning. A still closer approximation appears to be secured by dictating the sentences at approximately the rate at which the pupil is accustomed to write. By thus causing the pupil to write at approximately his normal rate of writing, he does not have time to study over the spelling of words, and as a result we secure a record of spelling which is largely automatic. Under such conditions a pupil's attention is centered primarily upon writing and not upon his spelling.

Of course, these conditions are not those under which the pupil normally spells words. The writing from dictation may be an unusual exercise for the pupil. Some pupils will be accustomed to write more slowly than the rate of dictation. This may tend to confuse them. To what extent these and possibly other factors prevent our obtaining a record of the "true spelling ability" of the pupil by a timed-dictation test we do not know. It appears, however, that a Timed Sentence Spelling Test is likely to yield a more valid measure of spelling ability than a list of words dictated separately.

The construction of Monroe's Timed Sentence Spelling Tests. In order to make easily available a timed sentence spelling test, the writer constructed a series of such tests, using test words chosen from appropriate columns of Ayres' Scale for Measuring the Ability in Spelling and basing the rate of dictation upon the measurements of the rate of handwriting of over six thousand Kansas school children. In order that the scores might have a high degree of reliability as measures of the spelling ability of individual pupils, fifty test words were used in each test.
According to one study* the probable error of an individual score for a test of fifty words is less than 1.00 when the score is expressed as the percent of words spelled correctly. For a class of twenty-five or more pupils the probable error of the class score would not exceed 0.2.

*Otis, A. S. "The reliability of spelling scales involving a 'deviation formula' for correlation." School and Society, 4 (November 11, 1916), 716-22. This study deals with the reliability of tests consisting of isolated words, and it is possible that the results might not apply to a timed sentence spelling test.

For grades III and IV the test words were taken from Column M of Ayres' Scale for Measuring Ability in Spelling. For grades V and VI they were taken from Column Q, and for grades VII and VIII and the high school they were taken from Columns S, T, and U. The test for the fifth grade is reproduced below. The sentences are to be dictated when the second hand of the watch reaches the position indicated in the left-hand margin. In this test no test words come at the end of the sentences. Thus, the pupil who writes slowly will be much less likely to receive a low score merely because he does not have time to finish writing the sentences. It should also be noted that all other words found in these sentences are easier to spell, as shown by the Ayres Scale. It was thought advisable to allow time for dictating the sentences in addition to the actual writing. For this reason, the time allowed for each school grade was increased by ten percent, sixty-six seconds instead of sixty being allowed for the number of letters which pupils commonly write in sixty seconds.

Tentative grade norms. This series of timed sentence spelling tests was given in sixteen Kansas cities in April and May, 1917. During the school year of 1919-20 scores were reported from a number of cities. The grade medians for these two groups of cities and the norms given by Ayres are given in Table XXVII.
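The sixty-six second allowance and the class probable error of 0.2, both quoted earlier in this chapter, can be checked with a short sketch. It is an illustration only, resting on two assumptions not stated in the bulletin: the handwriting rate of 60 letters per minute is hypothetical (the grade rates are not listed here), and the class figure is taken to follow the usual rule that the probable error of an average of n independent scores is the individual probable error divided by the square root of n. The function names are likewise invented for this example.

```python
import math


def seconds_allowed(letters, letters_per_minute, allowance_percent=10):
    """Seconds allotted for dictating and writing a sentence.

    `letters_per_minute` is the grade's normal handwriting rate. It is a
    hypothetical input here; the bulletin does not list the rates.
    """
    if letters_per_minute <= 0:
        raise ValueError("letters_per_minute must be positive")
    # Sixty seconds of normal writing is stretched to sixty-six seconds
    # when the allowance is ten percent.
    stretched_minute = 60 * (100 + allowance_percent) / 100
    return letters * stretched_minute / letters_per_minute


def class_probable_error(individual_pe, pupils):
    """Probable error of a class average.

    Assumes independent pupil scores, so the individual probable error
    is divided by the square root of the number of pupils.
    """
    if pupils < 1:
        raise ValueError("pupils must be at least 1")
    return individual_pe / math.sqrt(pupils)


if __name__ == "__main__":
    # Hypothetical rate of 60 letters per minute, chosen only for illustration.
    print(seconds_allowed(letters=60, letters_per_minute=60))
    # Upper bound of 1.00 for one pupil, applied to a class of twenty-five.
    print(class_probable_error(individual_pe=1.00, pupils=25))
```

Under these assumptions a pupil who commonly writes sixty letters in sixty seconds is given sixty-six seconds for a sixty-letter sentence, and an individual probable error of 1.00 shrinks to 0.2 for a class of twenty-five, which agrees with the figures given in the text.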
In comparing the successive grades it must be remembered that the same test words were not used for all grades. One list of test words was used for grades III and IV, another for grades V and VI, and still another for grades VII and VIII and for the high school. The fact that the median scores in Table XXVII are materially below Ayres' norms indicates that the timed sentence spelling test measures a different type of spelling ability from that measured by Ayres in constructing his scale. (Ayres had the words dictated in lists.) This fact becomes more apparent when it is recalled that many of the cities which gave these tests had used Ayres' Scale as a minimum course of study as well as a source of test words. Thus, had the test words been dictated in lists, it is likely that the median scores would have been materially above Ayres' norms.*

*It is possible that the difference between the median scores and Ayres' norms may be due to factors other than the measurement of different types of spelling ability. Many of the pupils probably were not accustomed to writing from dictation, and not all were accustomed to writing at the rate at which these tests were dictated. It is possible that these unusual conditions may have operated to lower materially the scores of a number of pupils or even of most pupils.

Monroe's Timed Sentence Spelling Test Arranged for the Fifth Grade

Seconds
60        The president gave important information to the men.
48        The women were present at the time.
19        The entire region was burned over.
49        The gentlemen declare the result was printed.
30        Suppose a special attempt is made.
60        The final debates were held.
24        The factory employs forty men.
51        Sometimes the connection is not made.
24        I enclose a written statement with the book.
3         Prompt action is needed.
25        It was a wonderful surprise to all.
55        The addition to the property was begun.
31        Remember, Saturday is the day.
57        They