BULLETIN No. 48

BUREAU OF EDUCATIONAL RESEARCH
COLLEGE OF EDUCATION

EXPERIMENTAL RESEARCH IN EDUCATION

By
Walter S. Monroe
Director, Bureau of Educational Research
and
Max D. Engelhart
Assistant, Bureau of Educational Research

PUBLISHED BY THE UNIVERSITY OF ILLINOIS, URBANA

PREFACE

There is urgent need for a comprehensive description of the techniques employed in educational research. There are a large number of texts dealing with statistical methods, especially the more elementary ones, but statistical procedures represent only one group of the techniques of educational research. Among the techniques for which we have no adequate treatment, the need is probably most urgent for those relating to setting up and conducting experiments.

Experimental research is a means for evaluating educational procedures and, hence, occupies a position of importance. In general outline, the procedure is simple, but an analysis reveals its complexity. The idea of "controlled experimentation" is easy to comprehend, but it is not easy to specify precisely what is involved in maintaining a control group. In this bulletin an attempt is made to describe in some detail the procedure of controlled experimentation, and on the basis of the requirements revealed, a small group of experiments is evaluated.

The analysis of the factors affecting pupil achievement and the evaluation of the factors considered are largely subjective. An attempt was made to utilize the best data obtainable, but the supply is inadequate and in some cases the information is not highly dependable. Consequently, both the analysis and the evaluation must be considered tentative and subject to revision in the light of future investigations.
The writers, however, believe that they have succeeded in showing controlled experimentation to be a highly complex and an intricate type of research, rather than one which can be carried out successfully by any novice who is sufficiently interested.

The bulletin should be of interest to teachers, supervisors, and administrators, as well as to research workers. The latter will find it helpful as a guide in planning and conducting an experiment and in interpreting the results. To the others, it should give a set of criteria that may be used in evaluating the experimental investigations reported in our educational literature.

The writers are glad to take this opportunity to express their indebtedness to Dr. C. W. Odell for a careful reading of Chapter III and to Mr. T. T. Hamilton, Jr., for the editing of the entire manuscript.

January, 1930.
Walter S. Monroe
Max D. Engelhart

TABLE OF CONTENTS

Chapter I. Introduction
Chapter II. The Requirements for Controlled Group Experimentation
Chapter III. The Interpretation of Differences in Gains
Chapter IV. A Critical Evaluation of Experimental Studies Relating to Supervised Study
Chapter V. Experimentation as a Procedure in Educational Research

EXPERIMENTAL RESEARCH IN EDUCATION

CHAPTER I
INTRODUCTION

The passing of speculation and authority. Until recently the typical method of answering questions relative to education has been that of speculation, and the pronouncements of those recognized as authorities have been accepted generally as final; but history records a number of attempts to solve thought questions in education by means of trial and observation of results. For example, Vittorino da Feltre (1378-1446) followed this procedure in devising methods of teaching that attracted much attention to his school, the Casa Giocosa at Mantua.
1 Wolfgang von Ratke, or Ratich, (1571-1635) also attempted to prove the value of his method by actual trial in practice. 2 The theories of Comenius and Rousseau found expression in practice through the founding of the Philanthropinum at Dessau by Johann Bernhard Basedow (1723-1790). 3 Johann Heinrich Pestalozzi (1746-1827) put his educational theories into practice in his schools at Stanz, Burgdorf, and Yverdun. 4 Johann Friedrich Herbart (1776-1841) was a firm believer in the value of experimental procedure and inaugurated a practice school along with his pedagogical seminar at the University of Königsberg. 5

1 Woodward, W. H. Vittorino da Feltre and Other Humanist Educators. Cambridge, England: Cambridge University Press, 1905. 261 p.
2 Raumer, Karl von. Geschichte der Pädagogik. Gütersloh: Druck und Verlag von C. Bertelsmann, 1902, p. 27-29. A briefer description of this "experiment" is given in: Graves, F. P. Great Educators of Three Centuries. New York: The Macmillan Company, 1912, p. 20-26.
3 Raumer, op. cit., p. 212-52. Brief descriptions are given in: Graves, op. cit., p. 112-21. Monroe, Paul. A Textbook in the History of Education. New York: The Macmillan Company, 1929, p. 580-83.
4 An account of his visit to Pestalozzi's institution at Yverdun is to be found in: Raumer, op. cit., p. 340-59. Other descriptions of Pestalozzi's work are to be found in: Barnard, Henry. Pestalozzi and his Educational System. Syracuse, New York: C. W. Bardeen Company, 1906. 751 p. Graves, op. cit., p. 122-66. Monroe, op. cit., p. 601-22. Parker, S. C. A Textbook in the History of Modern Elementary Education. Boston: Ginn and Company, 1912, p. 273-74.
5 For discussions of the work of Herbart see: Compayré, Gabriel. Herbart and Education by Instruction. New York: Thomas Y. Crowell and Company, 1907. 142 p. De Garmo, Charles. Herbart and the Herbartians. New York: Charles Scribner's Sons, 1896. 268 p. Graves, op. cit., p. 167-93. Monroe, op. cit., p. 622-39. Parker, op. cit., p. 375-430.

The evaluation of pedagogical theory by trial in practice was the aim of several pioneer experimental schools in the United States. Among the most notable of these were the Oswego Primary Teachers Training School, with its model school for observation, established by Edward A. Sheldon in 1861; 6 the experimental school inaugurated by Francis W. Parker when he assumed the principalship of the Cook County Normal School in 1883; 7 and the Laboratory School at the University of Chicago, established by John Dewey in 1896. 8

Early experimentation handicapped by inadequate conception of control of educative factors and by lack of instruments for measuring pupil material and pupil achievement. The pioneer experimentation in education failed to yield dependable results because of an inadequate conception of control of educative factors. A single group of pupils was subjected to a complex of educative influences, including the novel procedure that was being tried, and after the close of the experiment, the results were ascribed, in many cases erroneously, to the novel procedure alone. A repetition with another group of pupils secured contrary results. This is well illustrated by the success of enthusiastic reformers who, in their own schools, showed an apparent superiority of their methods. Repetition by less enthusiastic schoolmen often failed to substantiate the contentions of the reformers.

A second handicap in these early experiments was the lack of instruments for measuring pupil material and pupil achievement. Measurement is fundamental to experimentation. The investigator must measure the original status of the pupils participating in the experiment, submit them to the experimental procedure, and measure them again.
The pioneer experimenters were handicapped by their inability to secure quantitative measurements of the initial status of their pupils and of their final status after they had been subjected to the experimental procedure.

6 For a description of this school see: Autobiography of Edward Austin Sheldon. New York: Ives-Butler Company, 1911, p. 133-80. Dearborn, N. H. "The Oswego Movement in American Education," Teachers College, Columbia University Contributions to Education, No. 183. New York: Bureau of Publications, Teachers College, Columbia University, 1925. 191 p.
7 Rugg states, ".... he built up in the Cook County Normal School a faculty of experimentalists, of fearless innovators, real students of childhood, and a practice school which proved an influential object lesson for both teachers and the general public." See: Rugg, H. O. "Curriculum-Making in Laboratory Schools," Twenty-Sixth Yearbook of the National Society for the Study of Education, Part I. Bloomington, Illinois: Public School Publishing Company, 1926, p. 87-91.
8 Dewey, John. The School and Society. Chicago: University of Chicago Press, 1900. 129 p. A revised edition of 164 pages was published by the University of Chicago Press in 1915.

The development of the concept of control of experimental conditions. The investigations of Rice, which were made between 1894 and 1897, were transitional in the techniques used. The results obtained with one group of pupils were compared with the results secured from other groups of pupils. Comparison of results obtained by one procedure with results obtained by other procedures is a means of securing a measure of control of experimental conditions whose importance was recognized by Rice in the following statement:

By a comparative study of results, even on a much narrower basis than I have indicated, a great deal might be accomplished in a very brief period toward the solution of the problem of methods.
9

The influence of Rice is evident in the report of an experimental investigation of spelling by Cornman. This research had as its object the determination of the relative effectiveness of formal instruction and incidental teaching in spelling. The results obtained in the two experimental schools were compared with those obtained in schools retaining the formal instruction. 10

Prior to 1910 the use of control groups was most prevalent in learning experiments conducted by psychologists under laboratory conditions, but several notable experiments were carried out with the use of control groups under school conditions. Three may be mentioned from the field of transfer of training.

Bagley, W. C. and Squire, C. R. "Experiment on Transfer of Ideals of Neatness," performed in 1905 and reported in Bagley, W. C. Educational Values. New York: The Macmillan Company, 1911, p. 188-89.
Ruediger, W. C. "The Indirect Improvement of Mental Functions Thru Ideals," Educational Review, 36:364-71, November, 1908.
Winch, W. H. "The Transference of Improvement in Memory in School Children," British Journal of Psychology, 2:284-93, January, 1908; 3:386-405, December, 1910.

The extent to which the use of control groups has been recognized by experimenters in education is indicated by the fact that control groups were employed in thirty-five out of seventy-two experimental investigations reported in the Journal of Educational Research from January, 1920 to June, 1927, 11 and in seventeen out of twenty-six experiments reported as Teachers College Contributions to Education from 1918 to 1926. 12 It is evident that this technique is almost universally recognized as essential, even though a large proportion of contemporary experimenters fail to employ it.

The development of instruments for measuring pupil material. The use of control groups, as an experimental technique, rests on the assumption that equivalent groups can be secured.
In order that equivalence may be secured, it is essential to measure the pupils with respect to characteristics which influence learning in the experiment. Educational experimentation has acquired one of its most important tools in the development of tests to measure the chief of these characteristics, intelligence. The following paragraph briefly traces their development.

9 Rice, J. M. Scientific Management in Education. New York: Hinds, Noble and Eldredge, 1914, p. 51. The chapter from which this quotation was taken was first published in The Forum for January, 1897.
10 Cornman, O. P. Spelling in the Elementary School: An Experimental and Statistical Investigation. Boston: Ginn and Company, 1902, p. 59.
11 Monroe, W. S., et al. "Ten Years of Educational Research, 1918-1927," University of Illinois Bulletin, Vol. 25, No. 51, Bureau of Educational Research Bulletin No. 42. Urbana: University of Illinois, 1928, p. 79-80.
12 Ibid., p. 82.

The work of Galton (1869- ) and Cattell (1890- ) and other American psychologists on the differences in mental abilities of individuals has been said to mark the beginning of modern intelligence testing. 13 In 1905, Binet, in collaboration with Simon, published the first individual intelligence scale. 14 Intelligence testing became fairly common when Terman's revision of the Binet-Simon Scale became generally available in 1916. In 1918 appeared the first group intelligence scale designed for school use, that of Otis, 15 and since 1918, group intelligence tests have been widely used in elementary and secondary schools, and to some extent in colleges and universities. It is estimated that seven to ten million are used annually at present. 16 In 43 per cent of the learning experiments reported in the Journal of Educational Research from January, 1920 to December, 1928, intelligence tests were used to measure pupil material for the purpose of securing equivalent groups.
The development of instruments for measuring pupil achievement. For securing control groups that are equivalent to experimental groups in such an important characteristic as previous school achievement and for measuring the experimental achievement, valid and reliable instruments are essential. In 1908, Stone, under the direction of Thorndike, devised the first standardized achievement test. 17 This was followed in the next few years by Courtis' Arithmetic Tests, Series A (1909), Thorndike's Handwriting Scale (1909), Hillegas' Composition Scale (1912), Buckingham's Spelling Scale (1913), and Ayres' Handwriting and Spelling Scales (1912-15). 18 In more recent years there have been developed a multitude of achievement tests in almost all of the school subjects, both elementary and secondary, and to some extent in subjects of higher education. Some progress is being made at present in the development of measurements of character and personality. It has been estimated that thirty to forty million standardized tests and scales are used annually, of which three-fourths are tests of achievement. 19 In 58.3 per cent of the learning experiments reported in the Journal of Educational Research from January, 1920 to December, 1928, standardized achievement tests were used to measure pupil achievement for the purpose of evaluating the effect of the experimental procedure. 20

13 Monroe, et al., op. cit., p. [page number illegible].
14 Ibid., p. 90.
15 Ibid., p. 98.
16 Ibid., p. 114.
17 Ibid., p. 90.
18 Ibid., p. 91.
19 Ibid., p. 114.

Development of statistical techniques used in securing equivalent groups and in interpreting differences in gains in achievement. The theory of correlation, discovered by Galton about 1875 21 and extended by Pearson, Yule, Spearman, and others, 22 has enabled experimenters to evaluate the validity and reliability of intelligence tests used to secure equivalence and of educational tests used to measure gains in achievement.
Gauss, Encke, Quetelet, Galton, Pearson, Sheppard, Yule, Spearman, Filon, and Kelley should be mentioned for their work in the development of the statistics of errors. The error of a difference formula, particularly useful in the interpretation of differences in gains in experimentation, has evolved as a result of the work of Encke, Airy, Sheppard, and Yule. 23 The suggestion of the "experimental coefficient" by McCall in 1923 has provided experimental workers in education with a criterion for testing the statistical significance of a difference. 24 It would be possible to mention many other statistical devices that have been developed in recent years and that are of service in educational experimentation.

Development of educational tests accompanied by interest in experimentation under school conditions. The development of educational tests was accompanied by increasing interest in experimentation under school conditions. Leaders in the field of education stimulated this interest by speeches at educational meetings and by editorials in educational journals. The following quotation from an editorial in the first number of the Journal of Educational Psychology is characteristic of these utterances.

Educational practice is still very largely based on opinion and hypothesis, and thus will it continue until competent workers in large numbers are enlisted in the application of the experimental method to educational problems. Little more than a beginning has been made in this important movement. 25

20 It is, of course, not essential that an achievement test be standardized for it to be suitable for use in an experiment. Standardized tests are usually better constructed than tests made informally, and as such, are better measures of achievement. See: Odell, C. W. Traditional Examinations and New-Type Tests. New York: The Century Company, 1928, p. 21.
21 Adrain, Laplace, Plana, Gauss, and Bravais had developed some ideas of correlation before Galton, but the first clear statement of the theory and the first use of the term "correlation" in 1888 must be credited to him. See: Walker, H. M. Studies in the History of Statistical Method. Baltimore: The Williams and Wilkins Company, 1929, p. 92-106.
22 Ibid., p. 107-41.
23 Ibid., p. 114-15.
24 Ibid., p. 180. The idea of such a ratio was first developed by De Moivre, Kramp, and McGaughy.
25 Journal of Educational Psychology, 1:2, January, 1910. (An editorial.)

In an article published in this same issue of the journal, which has since published more learning experiments than any other periodical, Thorndike states:

Schoolroom life itself is a vast laboratory in which are made thousands of experiments of the utmost interest to "pure" psychology .... Experts in education studying the responses to school situations for the sake of practical control will advance knowledge not only of the mind as a learner under school conditions but also of the mind for every point of view. 26

Dearborn urged the repetition of laboratory experiments under school conditions. His emphasis on the use of appropriate techniques and his plea for careful work, coming as it did in 1911 long before "mass production" in educational research had been reached, should not fail to be noted. The following quotation illustrates the character of Dearborn's pleading:

If this is to be a serious school experiment, practice should be carried out for months at a time, and longer. The entire subject may be dropped for a year or more from the work of one class or group and carried on with regular and persistent practice in a comparable group. Such an arrangement of the work in the early years of the elementary school in view of the importance of the experiment and in view of the possible flexibility of the elementary-school course would not be an undue interference with the work of the school.
27

In 1914, Whipple urged that learning experiments under laboratory conditions and using adults as subjects be repeated under school conditions with children as the subjects. Such a statement as the following could not help but stimulate research workers in the field of education to undertake investigations of the type advocated.

.... I believe that in one important phase of experimental work — that dealing with the effects of practice and its spread or transfer — experimentation with children has been somewhat neglected, and that most of the conclusions now current upon the nature of formal discipline have been based upon observations carried on with adults .... The whole problem of practice might be recanvassed to advantage with children working under classroom conditions. 28

A sublime faith in the value of experimentation in the solution of educational problems is expressed in the following quotations:

Now comes the experimentalist, and with clear, unfaltering eye and steady, relentless tone, he demands of each subject the justification for its existence. 29

Everywhere there are evidences of an increasing tendency to evaluate educational procedures experimentally .... Scientific organizations, research committees, an institute of educational research, and large educational foundations are lending such impetus as make experimental education the most important current movement in education. 30

It is to the experimental method that education must look for the solution of many of its most vexing problems. It is upon this basis that the ultimate establishment of education as a science must rest. 31

26 Thorndike, E. L. "The Contribution of Psychology to Education," Journal of Educational Psychology, 1:12, January, 1910.
27 Dearborn, W. F. "Experimental Education," School Review Monograph No. 1. Chicago: University of Chicago Press, 1911, p. 10.
28 Whipple, G. M. "Applicability to Children Secured with Adults," Journal of Educational Psychology, 5:362, June, 1914. (An editorial.)
29 Bell, J. C. "A New Humanism Needed," Journal of Educational Psychology, 9:165, March, 1918. (An editorial.)
30 McCall, W. A. How to Experiment in Education. New York: The Macmillan Company, 1923, p. 2.

Thus within a relatively short period, controlled experimentation reached the highest vogue in the repertoire of research workers in education. Through the stimulation of leaders in this field, the multitude engaged in educational experimentation. The following paragraphs portray the awakening of a few to the limitations of present experimental method even with perfected techniques.

Recent criticism of the experimental method in education. In recent years the feeling has arisen on the part of some leaders in the field of education that educational experimentation, as it is carried on at present, is largely futile.

The need for a program of research in teaching becomes more apparent when the nature of the so-called "scientific investigations" in that field is considered. In general, many of the investigations are too limited in duration, involve too few subjects, and are too crudely done to warrant satisfactory conclusions. The topics investigated are unrelated, and many of those attempting research have not been properly trained for such work. 32

A survey of the learning experiments reported in the Journal of Educational Research, Journal of Educational Psychology, and the Teachers College, Columbia University Contributions to Education during the period 1918-27 provided the data on which the following conclusion is based.

Although no systematic survey has been made, it appears that the permanent accomplishments of educational research during this period are much less than the quantity of production would lead one to expect. This is especially true of experimental studies.
33

31 Good, C. V. How to do Research in Education. Baltimore: Warwick and York, 1928, p. 146.
32 Woody, Clifford. "The Values of Educational Research to the Classroom Teacher," Journal of Educational Research, 16:175, October, 1927.
33 Monroe, W. S., et al. "Ten Years of Educational Research, 1918-1927." University of Illinois Bulletin, Vol. 25, No. 51, Bureau of Educational Research Bulletin No. 42. Urbana: University of Illinois, 1928, p. 84.

Henmon, after three years' work with the Modern Foreign Language Study in the production of tests and in the setting up of controlled experiments, reflects as follows on the possibilities and difficulties of experimentation:

We teach our students to be scornful of tradition and mere observation and insist that all things must be subjected to the test of controlled experimentation. This is undoubtedly a healthy attitude to take if education is to become a science but the constant reader of the present day educational literature cannot in his critical moments help but be troubled by the imperfections and ambiguities of our measurements and the inconclusiveness of our sporadic experiments. When, for example, on such an important problem for educational theory and practice as the effect of equal practice on individual differences, whether equal practice increases or decreases them, we find out of twenty-four experimental studies twelve of them leading at least tentatively to the conclusion that differences are increased and twelve to the conclusion that differences are decreased, we cannot help wondering about our experiments and about the conclusions derived from them. 34

The following quotations show that the feeling of distrust for the results of educational experiments reported in the literature is not restricted to the men quoted above.

Or the investigator gives a few standard tests; he finds the pupils very deficient.
He calls the teachers together; he arouses great enthusiasm, doubles the time to be given to the subject, introduces an entirely different method, works up a high degree of skill in the use of it, and after a few months "concludes" that the new method was alone responsible for the improvement observed. Everybody should at once follow suit. 35

Perhaps the extreme case is that of the examination and treatment of a fourth-grade pupil, found to be deficient in reading. After a brief diagnosis and application of "remedial measures," the announcement is gravely made that in the light of this experience we may safely assume that the proper method of dealing with all fourth-grade pupils having similar disabilities is that used in this case. Making a sweeping generalization on the basis of a single instance would seem to exhaust the possibilities of the scientific method in education and leave nothing to be desired in the way of economy, efficiency, and dispatch. Many of the "conclusions" appended to recent "scientific" investigations have little more to support them. We are in a fair way to be able to prove anything. A few figures and a graph will turn the trick. 36

We have observed in many of the practices of educational research workers a tendency to shallowness. We have taken occasion to point out more than once a lack of sustained effort, a willingness to flit from one thing to another, and an unwillingness to stay with a problem until fundamental — the word seems to haunt us — until fundamental results are secured .... We are threatened with becoming mere dabblers in research, foolishly confident of the virtues of a fresh start. 37

Another line of inquiry has to do with the operations of the classroom.
Some of the most influential investigations made in recent years have had to do with the problems of classroom procedure, and yet anyone who contrasts the facts which appear during observation of a good teacher and the recommendations made in even our best textbooks on methods knows that the scientific description of teaching is in its infancy. 38

We must use greater care to make certain that the conclusions we state in our reports follow logically from the data presented. Too many reports state conclusions that are not fully supported by the research data included in them. This association should interest itself in the quality as well as in the quantity of educational research. 39

Nevertheless, I can not evade the conviction that, relatively speaking, the published research in education is, on the whole, inferior in quality, and more especially inferior in ultimate significance, to the published research in other branches of scientific endeavor. Too many contributions seem essentially futile. After you read them, you feel like saying: "Well, suppose it is true; what of it?" 40

34 Henmon, V. A. C. "Measurement and Experimentation in Educational Methods," Journal of Educational Research, 18:185-186, October, 1928.
35 "Assuming the Major Premise," Journal of Educational Method, 2:229, February, 1923. (An editorial.)
36 Loc. cit.
37 "Fundamentalism in Research," Journal of Educational Research, 9:331, April, 1924. (An editorial.)
38 Judd, C. H. "Research in Elementary Education," Journal of Educational Psychology, 17:224-225, April, 1926.
39 Trabue, M. R. "Educational Research in 1925," Journal of Educational Research, 13:344, May, 1926.

It is easily charged, and must be admitted, that initial effort to apply experimental techniques to the intricate problems of human affairs is often a lame and halting procedure, and far too much may easily be claimed by way of fact and inference as forthcoming from first efforts in this direction.
41

40 Whipple, G. M. "The Improvement of Educational Research," School and Society, 26:251, August 27, 1927.
41 Haggerty, M. E. "The Scholarly Study of College Education," Journal of Educational Research, 19:140, February, 1929. (An editorial.)

Educational experimentation in a plateau period. The previous discussion has traced the past of educational experimentation. It has been shown that this method of answering thought questions in education has undergone an evolutionary development over a period of some centuries. The contributions of laboratory experimentation in the field of psychology and the aid rendered by the production of more suitable measuring instruments have been mentioned. Finally, some indication was given of the effect of the writings of prominent leaders in the field. These writings, stimulating and optimistic for the most part a few years ago, have been replaced by others reflecting disillusionment and an attitude of distrust for this method of research in education. However, the feeling seems to be that the fault lies not with the fundamental theory of experimentation nor with the difficulties involved when human beings are the subjects of experiment, but with our present experimental techniques. That is to say, there is a feeling that the mediocre quality of experimental results has been due to the lack of adequate techniques and to the belief that conclusive results will be secured when techniques are perfected. If this is true, possibly an analogy may be drawn with plateaus in learning. The lower order "habits" have been formed; improvement is at a standstill until higher order "habits" have been perfected.

Definition of experiment. The foregoing discussion has been given before defining the term "experiment," since the concept represented by this term has undergone an evolution analogous to the historical development of the method itself. The concept of "educational experimentation" that is expressed in the following paragraphs is the culmination of this development; it is, therefore, appropriately given at this time.

A child's achievement from a period of learning is the resultant of several educative factors. "Experimentation" is the name given to the type of educational research in which the investigator controls the educative factors to which a child or group of children is subjected during the period of inquiry and observes the resulting achievement. The meaning of this definition will be clearer if consideration is given to some of the procedures employed. In the simplest type of educational experiment the investigator seeks to evaluate the influence of some one educative or "experimental" factor on a single group of children. He must start the experiment with some measurement of the initial attainment of the children in the trait or ability to be influenced. He then subjects the group to the experimental factor, such as a particular type of drill material in arithmetic, for the duration of the experiment. At the end, the investigator applies a final test for the purpose of determining the gain in achievement that has resulted from the application of the experimental factor.

This simple type of experiment may be illustrated by describing briefly one reported by Glick. This experiment had as its problem the determination of the effect of practice on intelligence test scores. 42 Students were tested at the start of the experiment with one of the forms of the Army Alpha Intelligence Examination. The experimental factor consisted of practice exercises similar to, but not identical with, the exercises in the sub-tests of the intelligence examination. After certain intervals other forms of the Army Alpha were administered to these same students.
The increase in scores from one application of the intelligence examination to another is a measure of the effect of the experimental factor operating over the interval of practice.

A single group experiment, such as the one just described, is appropriate when it is evident that the effect is to be ascribed to the operation of only one educative factor. In many cases, however, such a situation does not exist. Instead of being influenced by one factor, the subjects are influenced by many. If one were to use a single group of individuals, it would be impossible to say how much of the effect was due to any particular cause. When two or more groups are used, it is possible to subject them to identical conditions with the exception of the experimental factor. The difference in the effect when one group is compared with the other may be ascribed to the operation of this single factor.

42 Glick, H. N. "Effect of Practice on Intelligence Tests," University of Illinois Bulletin, Vol. 23, No. 3, Bureau of Educational Research Bulletin No. 27. Urbana: University of Illinois, 1925, p. 6.
43 Anibel, F. G. "Comparative Effectiveness of the Lecture-Demonstration and Individual Laboratory Method," Journal of Educational Research, 13:355-65, May, 1926.

This method may be illustrated by describing an experiment by Anibel. The problem of this investigation was the determination of the comparative effectiveness of the lecture-demonstration and individual-laboratory methods in chemistry.43 Anibel set up two groups presumably equivalent in intelligence as follows: "A student from the lecture-demonstration or test group was paired, for the purpose of comparing achievement records, with a student from the individual-laboratory class or control group."44 The investigator sought to keep certain educative factors identical in both of these groups.
He states, "all classes in chemistry met five times per week for forty-five minute periods .... The classroom instruction was identical for the two groups, thus equalizing any factors that might be present in classroom instruction."45 After getting these educative factors under control, it was possible for him to use different instructional procedures in the laboratory instruction of the groups. These instructional procedures were demonstrations of chemical experiments before one group of the pupils, while the pupils of the other group were required to perform the experiments for themselves. The difference between the two groups in gains in achievement is to be ascribed, with certain limitations, to the superiority of one of these instructional procedures over the other.

The problem of this investigation. The previous discussion has indicated to the reader that educational experimentation has undergone a long period of development. In the last twenty years, hundreds of learning experiments have been performed under school conditions. Early enthusiasm has been replaced to some extent by expressions of distrust. Hence, there appears to be a need for a critical analysis of experimentation as a procedure in educational research. In the chapters that follow, the present writers attempt:

1. To describe in detail the procedure that should be followed in educational experimentation to arrive at dependable conclusions.
2. To apply the procedure outlined as a means of evaluating a group of experiments.
3. To formulate an appraisal of the present status of experimentation as a procedure in educational research.

44 Anibel, op. cit., p. 356.
45 Ibid., p. 356.

CHAPTER II

THE REQUIREMENTS FOR CONTROLLED GROUP EXPERIMENTATION

The general plan of controlled group experimentation. In a controlled experiment there are two groups of pupils which are equivalent in all respects that affect learning in the field of experimentation.
The instruction and other educative influences to which the two groups are subjected are the same except for one factor. This experimental factor may be an instructional technique, the size of the class, the textbook, or any other educative influence that may be studied experimentally. The difference in the gains in achievement made by the two groups during the period of experimentation is an index of the relative merits of the two forms of the experimental factor.1 This plan may be described more formally as follows:

Let

$E_1$ = mean initial status of experimental group in the abilities that the application of the experimental factor is expected to affect.
$C_1$ = mean initial status of control group in the same abilities.
$E_2$ = mean final status of experimental group in the abilities that the application of the experimental factor is expected to have affected.
$C_2$ = mean final status of the control group in the same abilities.

$E_2 - E_1 = \text{Gain}_E$
$C_2 - C_1 = \text{Gain}_C$

1 An educational experiment may involve more than two groups of pupils and may be more complex in other respects, but the following discussion assumes the simple plan described here. Later, attention will be given to the procedure of the more complex experiments.

The "difference in gain," $D$, equals the result found when $\text{Gain}_C$ is subtracted from $\text{Gain}_E$. If this difference is positive, the status of the experimental factor prevailing in the experimental group represents the more effective instructional conditions. If the difference is negative, the opposite conclusion is indicated. The validity of this interpretation depends upon the satisfaction of three requirements: (1) The two groups of pupils are equivalent at the start of the experiment. (2) All educative factors except the experimental one are the same for both groups. (3) The measures of achievement from which the gains are computed are both valid and accurate. When any one
of these requirements is not fully realized, it becomes necessary to discount the difference in the gains made by the two groups. If the difference is small, and if the departure from the requirement is large, the relative merits of the two procedures compared will not have been determined.

Questions considered in this chapter. In this chapter the following questions are considered:

1. What is required to secure equivalent groups of pupils?
2. What are the important educative factors that affect the achievements of pupils?
3. What is involved in controlling the important educative factors that affect pupil achievements?

The analysis of the causes that affect achievement and the determination of the important educative factors are, of course, only tentative. Although there are a number of causal investigations in which an attempt has been made to determine the contributions of certain factors to achievement and their relative potency, the available evidence is fragmentary, and there is reason to doubt the validity of the findings, at least in a number of cases.2 Consideration of experimentation as a research procedure, however, requires that an attempt be made to identify the more important educative factors. In doing this, the present writers have endeavored to make use of the best data obtainable, but they are not unmindful of the fact that in this case the best data may not be sufficiently valid to accomplish the desired result. Consequently, the conclusions presented in the following pages relating to the factors to be considered in educational experimentation should be thought of as tentative and subject to modification when more dependable data are available.

The significant characteristics of pupil material. In listing the significant characteristics of pupil material, we are concerned only with those that affect achievement in the field of experimentation.
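The difference in gain defined in the general plan at the opening of this chapter reduces to simple arithmetic on four group means. The Python sketch below is offered only as an illustration of that arithmetic: the function name `difference_in_gain` and all of the scores are hypothetical and are not drawn from any study cited in this bulletin.

```python
from statistics import mean


def difference_in_gain(e_initial, e_final, c_initial, c_final):
    """Return (gain_e, gain_c, d), where d is Gain E minus Gain C.

    Each argument is a list of test scores for one group at one testing.
    """
    gain_e = mean(e_final) - mean(e_initial)  # E2 - E1
    gain_c = mean(c_final) - mean(c_initial)  # C2 - C1
    return gain_e, gain_c, gain_e - gain_c


# Hypothetical scores, invented for illustration only.
experimental_initial = [40, 45, 50]
experimental_final = [55, 60, 65]
control_initial = [41, 44, 50]
control_final = [50, 54, 61]

gain_e, gain_c, d = difference_in_gain(
    experimental_initial, experimental_final, control_initial, control_final
)
print(gain_e, gain_c, d)
```

A positive value of `d` favors the experimental form of the factor only to the extent that the three requirements named in the general plan (equivalent groups, control of the other educative factors, and valid and accurate measures) have been met.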
Obviously, such characteristics as color of hair, degree of beauty, and height, do not belong in the list. On the other hand, general intelligence and previous achievement in the field of experimentation must be included. The following characteristics appear to deserve consideration:

1. General intelligence in terms of point scores, or of mental age
2. Chronological age
3. Previous achievement in the field of experimentation
4. Study habits
5. Personality traits (attitudes, ideals, and interests)
6. Physical condition (health)
7. Sex
8. Race

2 Burks, B. S. "On the Inadequacy of the Partial and Multiple Correlation Technique," Journal of Educational Psychology, 17:532-40, 625-30, November, December, 1926.

1. There is abundant evidence that general intelligence, as measured by typical intelligence tests, influences the achievement of children. Many investigators have concluded that it is the most important factor. The following conclusion from the report of a recent investigation by Heilman is indicative of this belief:

Our results also appear to show that under the prevailing conditions of the home and school organization, intellectual endowment, or whatever is measured by the Stanford-Binet test, has by far the most powerful influence in determining differences in achievement in the traditional curriculum. It is not unlikely that a similar statement could be made for achievement in general.3

This may not be true in the case of some pupils, but the general statement appears to be justified. Hence, general intelligence (mental age, or test scores) may be placed at the head of the list of significant characteristics of pupil material.

2. The significance of chronological age4 becomes apparent when a child having a mental age of twelve years and a chronological age of ten years is compared with one whose corresponding ages are twelve and fifteen. The first child has an I. Q. of 120 and the second one, an I. Q. of 80.
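The two quotients just given follow from the customary ratio definition of the I. Q. (mental age divided by chronological age, multiplied by 100), which the passage applies without stating. A minimal Python sketch of that arithmetic follows; the function name `intelligence_quotient` is hypothetical and is used here only for illustration.

```python
def intelligence_quotient(mental_age, chronological_age):
    """Ratio I.Q.: 100 times mental age over chronological age (ages in years)."""
    return 100 * mental_age / chronological_age


# The two children described in the text: equal mental ages, unequal chronological ages.
first_child = intelligence_quotient(12, 10)
second_child = intelligence_quotient(12, 15)
print(first_child, second_child)
```

Because the quotient is fixed once mental age and chronological age are known, recording any two of the three measures determines the third.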
Although the two children have equivalent mental ages, the first one is "bright" and the second is "dull." The significance of chronological age is further shown by a comparison of two children of the same I. Q. but of different chronological ages. Although the children are equally "bright," the difference in mental ages, as well as the differences in physiological and social maturity, emphasizes the importance of chronological age as a factor in school achievement. The importance of chronological age as a factor in school achievement is recognized by those who recommend homogeneous grouping on the two bases, mental age and chronological age.5 An excellent discussion of the influence of chronological age, or the maturity of which it is an index, is to be found in a recent monograph by Commins.6

3 Heilman, J. D. "Factors Determining Achievement and Grade Location," The Pedagogical Seminary and Journal of Genetic Psychology, 36:454, September, 1929.
For a comprehensive account of the influence of general intelligence upon school achievement, see: Terman, L. M., et al. "Nature and Nurture, Their Influence Upon Achievement," Twenty-Seventh Yearbook of the National Society for the Study of Education, Part II. Bloomington, Illinois: Public School Publishing Company, 1928. 397 p.
4 The I. Q. might have been listed as a pupil characteristic instead of chronological age. The measurement of the latter is objective; hence it is to be preferred. When chronological age is included, the I. Q. is superfluous.
5 Freeman, F. N. Mental Tests. Boston: Houghton Mifflin Company, 1926, p. 23.
6 Commins, W. D. "Maturity and Education," Educational Research Bulletin, Vol. 3, No. 7, Catholic University of America. Washington: Catholic University Press, 1928, p. 36.

3. Previous achievement7 is a significant characteristic of the pupil material when it functions as a prerequisite for the learning involved in the experiment.
For example, ability to read functions as a tool in learning arithmetic, geography, history, literature, and the like.8 Certain abilities in arithmetic and algebra function as tools in the study of chemistry, and achievement in chemistry contributes to achievement in physics. Achievement in the first year of a foreign language functions as a tool in the more advanced study of that language. It would be easy to enumerate a large number of cases in which abilities engendered in a school subject function later in the learning of that subject or related subjects. Abilities that function as a prerequisite for learning in one school subject may, or may not, be significant for learning in another school subject. For example, achievement in the first year of a given foreign language would be of more significance in an experiment in the second year of that language than it would be in an experiment in a different language. Achievement in the first year of a foreign language would probably be of least significance in an experiment that involved typewriting. The previous achievement of children becomes of increasing importance as a factor in the achievement of the experiment in proportion to the extent to which the children have experienced subject-matter similar in content to that of the experiment.

4. The term study habits is used to designate a somewhat indefinite group of procedures employed in doing assignments. Their general nature is indicated by samples of the rules proposed by Whipple.9

a. When possible, prepare the advance assignment in a given subject directly after the day's recitation in it.
b. Form a time-study habit.
c. Form a place-study habit.
d. Don't stop work when you have just barely learned the material, but keep on until you have over-learned it.
e. Begin work promptly.
f. Train yourself to ignore distractions from without.
g. Do your work with the intent to learn and to remember.
h. Mentally review every paragraph as soon as you have read it.

7 The total outcome of learning includes general patterns of conduct as well as specific habits and knowledge. Among the possible outcomes are study habits which are not included here under the head of "previous achievement."
8 For a discussion of the contribution of achievement in reading to achievement in arithmetic, see: Lessenger, W. E. "Reading Difficulties in Arithmetical Computation," Journal of Educational Research, 11:287-91, April, 1925.
9 Whipple, G. M. How to Study Effectively. Bloomington, Illinois: Public School Publishing Company, 1927. 96 p. (Revised edition).

It is evident from an examination of this list that study habits vary widely in specificity and in value. Habits with respect to the time and place in which study is carried on are far more specific in nature than those of mentally reviewing a paragraph or studying with the intent to remember. Mentally reviewing a paragraph implies organization of knowledge and as such may be classed as a method of thinking. Studying with the intent to remember is an attitude toward study that may function in all study situations. While the precise effect of conforming to recommended study habits is not known, it is probable that their value is a function of the degree of their generalization. The more specific study habits may, or may not, be useful since the brighter students can get along without them. The more general study habits are indispensable since they are the methods and the driving forces of reflective thinking. The following quotations tend to substantiate the above contentions with respect to the importance of this factor in learning activity:

When the pupil had acquired effective methods of study and observed that he really could learn, a new and happy interest was the common result.
Some of the pupils, for example, began to read books, a thing that had never previously been done because reading was difficult work.10

During the latter half of the period, the methods used in the preparation of actual assignments were given special attention. Certain of the students made noticeable progress; one sophomore made his first "A" since entering high school while receiving training in ancient history, a freshman showed a decided gain in algebra.11

The work of the class was greatly improved through the use of better methods of study. The pupils became more independent and more alert to the importance of the history topics and their relation to our lives today.12

.... that when reasonably effective methods are used to control admission to college, the failure of students subsequently is not commonly due to inadequate intelligence; that, on the contrary, the failures are mainly due to several factors, among which, according to the reports gained from the students themselves, a prominent place is to be assigned to neglect of proper instructions in the art of study.13

What is still more significant, perhaps, is the fact that 87 per cent of the freshman students enrolled in these two "How to Study" sections completed their total enrolment in the university in a way that was satisfactory to all their instructors, while only about half of the freshman students entering the university last year, (46 per cent of the boys and 63 per cent of the girls) completed their total enrolment in a satisfactory manner.14

10 Gates, A. I. "A Study of Reading and Spelling With Special Reference to Disability," Journal of Educational Research, 6:20, June, 1922.
11 Monroe, W. S. and Mohlman, D. K. "Training in the Technique of Study," University of Illinois Bulletin, Vol. 22, No. 2, Bureau of Educational Research Bulletin No. 20. Urbana: University of Illinois, 1924, p. 20.
12 Fisk, E. M.
"An Experiment in How to Study," Elementary School Journal, 27:138, October, 1926.
13 Whipple, G. M. "Experiments in Teaching Students How to Study," Journal of Educational Research, 19:1, January, 1929.
14 Book, W. F. "Results Obtained in a Special 'How to Study' Course Given to College Students," School and Society, 26:534, October 22, 1927.

On the same level of intelligence the methods of study are of great importance. As a rule the students of low intelligence who were successful in college were employing good study technique.15

Symonds, in concluding an excellent review of the research on "How to Study," is probably correct in stating ".... that the commonly accepted rules of study are often non-consequential,"16 but his later statement, "While one would not deny the fact that all these rules are factors in efficient study one may question their relative importance,"17 would cause one to believe that some study habits are important factors in learning activity. While further experimentation is necessary before it may be known definitely which of the more specific study habits are most effective and, therefore, most important to the experimenter, it may be safely stated that the status of the pupils with respect to study habits, particularly the more general ones, should be considered by the experimenter in forming equivalent groups.

5. The term personality traits is used to designate a group of attitudes, interests, ideals, and other reaction tendencies. This group has not been fully analyzed, but several traits have been identified that appear to influence pupil achievement. After canvassing the available literature, Herriott listed five attitudes:

a. Ambitious — Indifferent
b. Cheerful — Despondent
c. Evaluative — Non-evaluative
d. Persevering — Vacillating
e. Self-confident — Dependent18

15 Ross, C. C. and Klise, N. M. "Study Methods of College Students in Relation to Intelligence and Achievement," Educational Administration and Supervision, 13:562, November, 1927.
16 Symonds, P. M. "Methods of Investigation of Study Habits," School and Society, 24:151, July 31, 1926.
17 Ibid., p. 152.
18 Herriott, M. E. "Attitudes as Factors of Scholastic Success," University of Illinois Bulletin, Vol. 27, No. 2, Bureau of Educational Research Bulletin No. 47. Urbana: University of Illinois, 1929, p. 31.
The two words used to designate an attitude represent opposite extremes of what Herriott calls a "single attitude." Thus the pupil who is "ambitious" and the one who is "indifferent" are merely exhibiting extreme differences in the same attitude rather than two different attitudes.

His investigation to determine the importance of these attitudes as factors of scholastic success indicated that the last three are major factors. The third and fourth are related to scholastic success in a positive way. That is to say, the student who has the attitudes of evaluating and persevering is more successful, in general, than the student whose behavior is characterized by their opposites, non-evaluative and vacillating. The fifth is related to success in a negative way. Self-confidence is apt to be dangerous as an attitude of cocksureness, or the instructor is likely to favor an attitude of dependence on himself and the text.19 Herriott concludes with the following statement relative to the significance of these traits: "These data support the belief expressed by many authorities that traits such as study habits and attitudes are factors of success comparable to the seemingly more tangible and more usually measured factors such as intelligence and previous preparation."
20

Statements by several other investigators are reproduced as indicative of the recognition of the importance of personality traits as factors in learning:

"Without doubt, some of the backwardness was due to a lack of interest and effort, ...."21

It is possible to obtain statements of personality traits (moral attitudes, emotional maladjustments, and interests) which give correlations of very appreciable size (about as large as those obtained between tests of intelligence and marks) with academic success.22

The most significant factor next to estimated intelligence in its association with scholarship appeared to be the quality, or composite of qualities, defined as school attitude.23

It would appear from all available data that the relationship between educational interests and abilities as expressed in school grades is represented by average correlations between +.20 and +.40.24

The major groups of causes of scholastic deficiency were found to appear in the following order of significance.25

                                   Significance scores
Motivation and interests                   265
Intellectual factors                       265
Emotional factors                          221
Educational factors                        202
Environmental factors                      148
Study habits and methods                   113
Physical factors                            90
Teaching methods and content                32
Motor factors                               28

The data give a minus third-order correlation between general health and school marks, and a relatively low correlation between preparation and marks; the high correlation of "school attitude" with marks is the striking feature of the situation.26

19 Herriott, op. cit., p. 42-43.
20 Ibid., p. 44.
21 Gates, op. cit., p. 20.
22 Chambers, O. R. "Measurement of Personality Traits," Research Adventures in University Teaching. Bloomington, Illinois: Public School Publishing Company, 1927, p. 76.
23 Fleming, C. W. "A Detailed Analysis of Achievement in the High School," Teachers College, Columbia University Contributions to Education, No. 196. New York: Bureau of Publications, Teachers College, Columbia University, 1925, p.
185.
24 Fryer, Douglas. "Interest and Ability in Educational Guidance," Journal of Educational Research, 16:36, June, 1927.
25 Ohmann, O. A. "A Study of the Causes of Scholastic Deficiencies in Engineering by the Individual Case Method," University of Iowa Studies in Education, Vol. 3, No. 7. Iowa City: University of Iowa, 1927, p. 56-57.
26 Pressey, S. L. "An Attempt to Measure the Comparative Importance of General Intelligence and Certain Character Traits in Contributing to Success in School," Elementary School Journal, 21:229, November, 1920.

In the light of these investigations and several others that might be mentioned, it appears that personality traits form a significant characteristic of pupils when considered as learners in school subjects.

6. Severe illness and certain types of physical defects, such as blindness, or deafness, are handicaps in learning, but the significance of all aspects of a pupil's physical condition is not known. That such physical defects as adenoids, enlarged tonsils, deafness, and poor vision are factors in school achievement is indicated in the following quotations:

In every case, except in that of vision, the children rated as "dull" are found to be suffering from physical defects to greater degree than the "normal" or "bright" children.27

No evidence is apparent that the good or bad condition of the tonsils had any effect on intelligence, but those children who showed improvement in condition of tonsils had the highest rate in school achievement.28

The conclusions of Sandwick,29 Hall and Crosby,30 Sumner,31 and Mallory32 concur with the above in emphasizing the importance of physical condition as a factor in school achievement.
The conclusions of two recent investigations are not in harmony with those just given, since the claim is made that physical defects are a minor factor in scholastic success:

Physical defects were not much more prevalent among the retarded group than among the normally progressing group, and therefore, nonpromotion could not be attributed to that cause.33

It is obvious, of course, that very serious defects will handicap a child in learning. Their influence is probably both direct and indirect. But lesser defects do not appear to have any causal connection with poor scholarship. In fact no association of any kind appears in these data between physical health and achievement. Even comparatively serious defects do not necessarily entail poor achievement.34

The writers are inclined to favor the view expressed in the conclusion just quoted from Westenberger. The experimenter is justified in considering physical condition, so far as the experiment is concerned, a minor factor in learning activity. Extreme cases of ill health, or physical defect, tend to eliminate themselves from the ordinary schoolroom. It is probable that both groups will contain approximately the same number of children with defects, due to the operation of chance, and even if they do not, the inequality would have to be considerable to influence appreciably the mean achievement of the group.

27 Ayres, L. P. Laggards in Our Schools. New York: The Russell Sage Foundation, 1909, p. 125.
28 Hoefer, Carolyn and Hardy, M. C. "The Influence of Improvement in Physical Condition on Intelligence and Educational Achievement," Twenty-Seventh Yearbook of the National Society for the Study of Education, Part I. Bloomington, Illinois: Public School Publishing Company, 1928, p. 387.
29 Sandwick, R. L. "Correlation of Physical Health and Mental Efficiency," Journal of Educational Research, 1:199-203, March, 1920.
30 Hall, Irene, and Crosby, Amy. "A Study of the Causes of Inferior Scholarship of Pupils in Low First Grade," Journal of Educational Research, 14:375-83, December, 1926.
31 Sumner, H. W. "Health and Home Factors in Non-Promotions," Chicago School Journal, 9:101-103, November, 1926.
32 Mallory, J. N. "A Study of the Relation of Some Physical Defects to Achievement in the Elementary School," George Peabody College for Teachers, Contributions to Education, No. 9. Nashville: George Peabody College for Teachers, 1922. 78 p.
33 Stalnaker, E. M. and Roller, R. D., Jr. "A Study of One Hundred Nonpromoted Children," Journal of Educational Research, 16:270, November, 1927.
34 Westenberger, E. J. "A Study of the Influence of Physical Defects upon Intelligence and Achievement," The Catholic University of America, Educational Research Bulletin, Vol. 2, No. 9. Washington: The Catholic Education Press, 1927, p. 45.

7. Both Thorndike35 and Starch36 have concluded on the basis of the findings of several investigations reviewed by them that sex is a very minor factor in learning. The conclusions of Minnick37 and Touton38 are in agreement with those of Thorndike and Starch, but a recent investigation by Webb shows that when boys and girls of the same general intelligence are compared with respect to achievement in geometry, the boys exceed the girls on the lower mental levels but are exceeded by them on the higher.39 He states "that those studies of sex differences, which neglect to take into account the factor of mental ability, fail to discover significant differences between sex groups which may exist at one mental age level, but not at another."40 Fitzgerald and Ludeman have reported the results of an investigation in which it was found that in the sixth and seventh grades boys achieved more in history than girls, but in the eighth grade the greater achievement was shown by the girls.
41 Van Wagenen found sex differences in learning American history to be great enough to warrant establishing two sets of norms for his "American History Scales."42 Fisher discovered that a loss of efficiency in mechanical learning takes place a year earlier in girls than in boys.43 From these more recent studies the conclusion may be drawn that sex is a factor of less importance than those described in the preceding pages, but it should not be neglected by the educational experimenter who seeks highly dependable results.

35 Thorndike, E. L. Educational Psychology, Vol. III. New York: Teachers College, Columbia University, 1914, p. 169-205.
36 Starch, Daniel. Educational Psychology. New York: The Macmillan Company, 1919, p. 63-72.
37 Minnick, J. H. "A Comparative Study of the Mathematical Ability of Boys and Girls," School Review, 23:73-84, February, 1915.
38 Touton, F. C. "Sex Differences in Geometric Abilities," Journal of Educational Psychology, 15:246-47, April, 1924.
39 Webb, P. E. "A Study of Geometric Abilities Among Boys and Girls of Equal Mental Abilities," Journal of Educational Research, 15:256-62, April, 1927.
40 Ibid., p. 262.
41 Fitzgerald, J. A. and Ludeman, W. W. "Sex Differences in History Ability," Peabody Journal of Education, 6:175-81, November, 1928.
42 Van Wagenen, M. J. "Historical Information and Judgment in Pupils of Elementary Schools," Teachers College, Columbia University Contributions to Education, No. 101. New York: Bureau of Publications, Teachers College, Columbia University, 1919. 74 p.
43 Fisher, V. E. "A Few Notes on Age and Sex Differences in Mechanical Learning," Journal of Educational Psychology, 18:562-564, November, 1927.

8. The importance of race as a factor in learning is difficult to determine.44 Various other factors, such as language handicap, social status, parental occupation, and other environmental influences, tend to obscure its significance.
Although it may be impossible to ascribe differences between racial groups to something inherent in their respective races, it is none the less evident that actual differences exist in school achievement. The following are typical of the conclusions reached by investigators.

Statistical data carefully collected and presented in the foregoing study indicate rather conclusively that primary French-speaking children in certain Louisiana parishes are lower in achievement than English-speaking children and are seriously retarded.45

The authors ascribe the lower achievement of the French-speaking children to the language handicap.

The number of months by which the median educational age of the entire group of white children exceeds the median of the negro group was found to be 16.7 months. It was found further that only 14.5 per cent of the negro children reach or exceed the median educational age of the white children.46

A recent study of the retardation of seventeen hundred children of immigrants in two cities of northern Michigan shows that retardation according to nationality follows very closely the median intelligence quotients of the nationalities.47

The conclusion may be expressed that whether there are inherent differences in race or not, characteristics distinctive of racial groups, such as language ability, social status, parental occupations, customs, prejudices, attitudes, and the like, are of enough significance in school work that race must be considered by the educational experimenter who desires to set up equivalent groups.

Significant characteristics of pupil material not independent traits. Several of the characteristics of pupil material described in the preceding pages are not independent. The correlation between general intelligence and previous achievement in any one school subject, for a single school grade, is likely to fall between .11 and .69.48
48 The interdependence of general intelligence and general school achievement is expressed in the following statement from Kelley: "On the average, in the neighborhood of .90 of the capacity measured by an all-round achievement battery score (reading, arithmetic, science, history, etc.) and of the capacity measured by a general intelligence test is one and the same." 49

44 Race may influence achievement indirectly through intelligence since races will differ in capacity to learn as they vary in intelligence. This aspect of the race factor does not concern us here since equating pupils with respect to intelligence will take care of it. We are interested in the more direct influences of racial characteristics on learning.
45 Brouillette, J. W., Foote, I. P., Robert, E. B., and Terrebonne, L. P. A Comparative Study of the School Progress of Foreign-Speaking and English-Speaking Children in the Early Elementary Grades. Chicago: Scott, Foresman and Company, 1928, p. 62-63.
46 Witty, P. A. and Decker, A. I. "A Comparative Study of the Educational Attainment of Negro and White Children," Journal of Educational Psychology, 18:498-99, October, 1927.
47 Brown, G. L. "Intelligence as Related to Nationality," Journal of Educational Research, 5:326, April, 1922.
48 Gates, A. I. "The Correlations of Achievement in School Subjects with Intelligence Tests and Other Variables," Journal of Educational Psychology, 13:280, May, 1922.

General intelligence is also positively related to study habits and personality traits. Butterweck has shown that the brighter pupils in his investigation tended to employ the better study habits. 50 Herriott, in the research already referred to, found a small though significant positive relationship between general intelligence and the personality traits listed on page 23. In typical grade groups the correlation between chronological age and mental age is negative.
Brighter children tend to be accelerated, while duller children are retarded, so that in a given school class, the relatively brighter are the younger, and the relatively duller are the older. Terman reports a high negative correlation (-.74) between the I. Q. and the chronological age of a group of children entering high school as freshmen. 51 Baldwin has shown that, in general, children who are gifted mentally are also superior physically. 52 Hoefer and Hardy state that children whose physical condition is good have a more rapid mental growth than children whose physical condition is fair. 53 Sex and race are not to be thought of as variables that may be correlated with intelligence. They are the least significant of the factors listed as characteristics of pupil material and are the most independent.

The fact that several of the characteristics are positively correlated with general intelligence means that if two groups of pupils are equivalent with respect to mental age, or intelligence-test scores, under typical conditions, they are likely to approach equivalence with respect to previous achievement, study habits, personality traits, and physical condition.

Securing equivalent groups for a controlled experiment. 54 It is relatively easy to assemble two groups that are equivalent with reference to a given characteristic, provided that characteristic can be measured accurately. For example, pupils may be paired on the basis of mental age, or intelligence-test scores, so that for each pupil in one group there will be a mate in the second group having the same mental age or test score.

49 Kelley, T. L. Interpretation of Educational Measurements. Yonkers-on-Hudson, New York: World Book Company, 1927, p. 21.
50 Butterweck, J. S. "The How to Study Problem," Journal of Educational Research, 18:66-76, June, 1928.
51 Terman, L. M. The Intelligence of School Children. Boston: Houghton Mifflin and Company, 1919, p. 82.
52 Baldwin, B. T. "Anthropometric Measurements," Genetic Studies of Genius, Vol. 1. Stanford University, California: Stanford University Press, 1925, p. 135-71.
53 Hoefer and Hardy, op. cit., p. 371-87.
54 One group of pupils is equivalent to another with respect to a given characteristic when for each pupil in one group there is a mate in the second who possesses the same amount of the characteristic. An approach to equivalence is secured when the central tendency and variability of one group with respect to a given characteristic are equal to these measures of the other.

Obviously, it would be difficult, if not impossible, under typical conditions to assemble two groups by locating pairs of pupils that are equivalent in respect to all significant characteristics. Hence, in assembling equivalent groups by pairing, the experimenter usually considers only one or at most two characteristics. When the groups have been assembled, they should be checked for equivalence with respect to the remaining significant characteristics. For example, if two groups have been assembled by pairing pupils having the same mental ages, or intelligence-test scores, the mean and standard deviation of each group should be calculated for chronological age, and previous achievement, when it is significant. If the mean and standard deviation of one group are not approximately equal to those of the other group, adjustments should be made to secure approximate equality or the pair of groups rejected for experimentation. If adequate measuring instruments were available, it would be desirable also to check the equivalence of the groups with respect to study habits and personality traits in the same way. The equivalence of the groups with respect to sex and race should be checked to make certain the groups exhibit no marked differences with respect to these characteristics.
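
The pairing and checking procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch, not a procedure given in the bulletin: the record fields (`mental_age`, `chron_age`, `achievement`, all hypothetical and expressed in arbitrary units), the `tolerance` parameter, and the function names are assumptions introduced for the example. Pupils are paired on one characteristic, and the resulting groups are then compared on the remaining characteristics by mean and standard deviation.

```python
from statistics import mean, pstdev


def pair_pupils(pupils, key, tolerance=0):
    """Pair pupils whose `key` values differ by at most `tolerance`.

    Returns two lists of equal length; pupil i of the first list is the
    mate of pupil i of the second. Pupils without a mate are dropped.
    (Hypothetical helper; a simple greedy pass over the sorted records.)
    """
    ordered = sorted(pupils, key=lambda p: p[key])
    group_a, group_b = [], []
    i = 0
    while i < len(ordered) - 1:
        first, second = ordered[i], ordered[i + 1]
        if abs(first[key] - second[key]) <= tolerance:
            # Alternate which group receives the lower-scoring mate so that
            # neither group is systematically favored when tolerance > 0.
            if len(group_a) % 2 == 0:
                group_a.append(first)
                group_b.append(second)
            else:
                group_a.append(second)
                group_b.append(first)
            i += 2
        else:
            i += 1
    return group_a, group_b


def equivalence_report(group_a, group_b, keys):
    """For each characteristic, return (difference in means, difference in SDs)."""
    report = {}
    for key in keys:
        values_a = [p[key] for p in group_a]
        values_b = [p[key] for p in group_b]
        report[key] = (mean(values_a) - mean(values_b),
                       pstdev(values_a) - pstdev(values_b))
    return report


# Invented records, for illustration only.
pupils = [
    {"name": "P1", "mental_age": 132, "chron_age": 130, "achievement": 41},
    {"name": "P2", "mental_age": 132, "chron_age": 134, "achievement": 43},
    {"name": "P3", "mental_age": 140, "chron_age": 128, "achievement": 50},
    {"name": "P4", "mental_age": 140, "chron_age": 131, "achievement": 47},
    {"name": "P5", "mental_age": 151, "chron_age": 126, "achievement": 58},
    {"name": "P6", "mental_age": 126, "chron_age": 137, "achievement": 36},
    {"name": "P7", "mental_age": 126, "chron_age": 135, "achievement": 38},
]

group_a, group_b = pair_pupils(pupils, "mental_age")
report = equivalence_report(group_a, group_b, ["chron_age", "achievement"])
for characteristic, (mean_diff, sd_diff) in report.items():
    print(characteristic, round(mean_diff, 2), round(sd_diff, 2))
```

How large a difference in means or standard deviations may be tolerated before the groups are adjusted or rejected is left to the experimenter; the sketch only reports the differences.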
The experimenter should also make certain that the two groups involve no serious differences in physical condition.

A technique seems to be evolving for selecting pairs of pupils on the basis of a composite measure of characteristics. According to one technique, if it is desired to pair children on the basis of two characteristics, such as intelligence and previous achievement, a correlation chart of the test scores for intelligence and achievement may be constructed. The position of each child with respect to both of these characteristics is shown by a single dot on the chart. The experimenter selects his pairs by locating dots which are closest together. An illustration of the use of this technique is to be found in the report of an experiment by Butterweck. 55 Another technique is that of combining measures of different characteristics into one composite measure representing all of them. The children are then paired on the basis of the composite scores. The use of this technique is illustrated in the experiment of Douglass on the relative effectiveness of two sequences in supervised study. 56 Before this technique can be commended as the best technique to use, research must determine the weights to be given each characteristic in the composite score.

55 Butterweck, J. S. "The Problem of Teaching High-School Pupils How to Study," Teachers College, Columbia University Contributions to Education, No. 237. New York: Bureau of Publications, Teachers College, Columbia University, 1926. 116 p.
56 Douglass, H. R. "The Experimental Comparison of the Relative Effectiveness of Two Sequences in Supervised Study," University of Oregon Publication, Education Series, Vol. 1, No. 4. Eugene, Oregon: University of Oregon, 1927, p. 173-218.

Melby and Lien 57 have reported on a technique for controlling pupil factors which does not involve the use of pairing procedures.
The initial status of three or more available groups is determined with respect to intelligence and previous achievement. After the order of superiority of the groups has been determined from comparison of their initial status, the experimenter selects one of the average groups as the experimental group. The assumption is made that if the mediocre experimental group exceeds in final achievement the initially superior group, then superiority of the experimental factor is dependably indicated. If, however, the final achievement of the mediocre experimental group falls below that of the initially inferior group, then the inferiority of the experimental factor is shown. The technique is commendable in that it permits the use of ordinary school classes without modification. It cannot be regarded, however, as anything more than a "practicable technique," as it is labeled by Melby and Lien. The technique lacks precision in that the difference in gains in achievement is not ascribable to educative factors alone, as is the case when equivalent groups are used. Since it lacks this precision, it is difficult to see how clear-cut conclusions may be drawn from an experiment in which it is used.

The educative factors that affect pupil achievement. The educative factors that affect pupil achievement are grouped here under the following heads:

I. Teacher factors
II. General school factors
III. Extra-school factors

I. Teacher factors that affect pupil achievement. Amount of training, teaching experience, intellectual status, personality, physical condition, sex, and age are usually listed as important teacher factors, but they influence pupil achievement for the most part indirectly through their contributions to more immediate factors. For example, training, amount of teaching experience, and intellectual status contribute to the teacher's instructional techniques and to his skill in the use of them; hence, these factors influence pupil achievement indirectly.
Our problem here is to determine the teacher factors that influence pupil achievement directly.

57 Melby, E. O. and Lien, Agnes. "A Practicable Technique for Determining the Relative Effectiveness of Different Methods of Teaching," Journal of Educational Research, 19:255-264, April, 1929. Credit is given to Professor John G. Rockwell of the University of Minnesota for devising this technique and using it in an experiment on thyroid deficiency.

In The Commonwealth Teacher-Training Study, Charters and Waples determined a list of twenty-five teacher traits 58 by interviewing a number of persons considered competent and by listing the "trait names" and "trait actions" mentioned by these persons as characteristics of good teachers. In view of the comprehensiveness of this study, it might be argued that the twenty-five traits listed, or at least those of highest rank, should be taken as the teacher factors that affect pupil achievement and, hence, as the teacher factors that should be controlled in an experiment. It does not appear satisfactory to do this. The list is too long and does not include instructional techniques or classroom-management procedures. Consequently, the present writers propose the following list of teacher factors for consideration. Evidence of the potency of each of these factors is presented as a basis for a conclusion in regard to the ones that must be controlled in experimentation in order to avoid introducing a serious error in the results.

1. Instructional techniques
   a. Learning exercises
   b. Motivation procedures
   c. Directive procedures
   d. Diagnostic procedures
2. Classroom-management procedures
3. Skill in carrying out instructional techniques and classroom-management procedures
4. Zeal of the teacher with reference to experimental factor
5. Personality traits
6. Physical condition
7. Sex
8. Age

1.
The more influential instructional techniques may be classified under four heads: (a) learning exercises; (b) motivation procedures; (c) directive procedures; (d) diagnostic procedures. The attention given to methods courses in the professional training of teachers is evidence of the conviction that the instructional techniques employed by a teacher affect the achievements of his pupils. Hence, it is not necessary to present evidence in justification of them as important teacher factors. 59 It should be noted, however, that the influence of a given technique depends on its appropriateness.

58 Charters, W. W. and Waples, Douglas. The Commonwealth Teacher-Training Study. Chicago: University of Chicago Press, 1929, p. 18.
59 Some indirect evidence is afforded by investigations of the relation between achievement in the field of methods of teaching and teaching ability. In one such investigation the partial correlation (between "ability to pass a professional test" and "general teaching ability") was found to be +.570. Knight, F. B. "Qualities Related to Success in Teaching," Teachers College, Columbia University Contributions to Education, No. 120. New York: Bureau of Publications, Teachers College, Columbia University, 1922, p. 42.

In order to be most effective, a given technique must be suited to the pupils, compatible with the objectives to be attained, and supplemented by other techniques. For example, a learning exercise suitable for pupils on the lower levels of intelligence is not likely to be a good one for bright pupils. Certain types of drill exercises in arithmetic have been demonstrated to be effective relative to the attainment of certain objectives, but they are not effective when other objectives are to be attained. A "good" learning exercise is likely to be relatively ineffective unless it is supplemented by appropriate directive and diagnostic procedures.
The rule that practice should be distributed rather than concentrated is further evidence that the influence of a given instructional technique depends upon factors other than its intrinsic character.

2. Classroom-management procedures include such items as taking the roll, distributing and collecting materials, starting the work of the period, dismissing the class in case the pupils go to another room at the end of the period, and dealing with disciplinary cases. The importance of these procedures is generally recognized. In fact, until recent years the teacher's ability as a disciplinarian was considered to be the most important of his qualifications. While other aspects of teaching are now considered of more importance than the mere maintenance of order, adequate attention to routine matters of classroom management, inclusive of discipline, is regarded as essential for the promotion of the most suitable environment for learning. If, however, distinctly undesirable practices are avoided, it appears likely that variations in classroom-management procedures will not affect pupil achievement to a significant extent.

3. The effectiveness of an instructional technique or a classroom-management procedure depends upon the skill with which it is carried out. This factor was implied in the discussion of instructional techniques, but its importance justifies more specific consideration. Although we have no means of securing precise measures of teaching skill, it is obvious that some teachers are more skillful in carrying out certain instructional techniques than are other teachers. When a new technique is being compared with a familiar one, it is likely that the new one will be applied less skillfully. For example, suppose an experiment is devised to determine the effect of supervised study in comparison with study without supervision. Suppose further that the plan of supervising study has been worked out so that the procedure is specified in detail.
If a teacher, who has become a skillful instructor under a plan that does not involve supervised study, but who has not had experience in supervising study, attempts to teach one class employing supervised study and another without supervised study, it is reasonable to expect that he will be considerably more skillful in teaching the second class. If this is the case, the experiment would furnish a comparison between skillful teaching without supervised study and teaching with supervised study somewhat crudely carried out. Hence, the experiment would not yield satisfactory evidence of the relative merits of skillful teaching with supervised study and skillful teaching without supervised study.

An illustration of the recognition of the importance of skill as an educative factor is afforded by the Newark Phonics Experiment. The teachers of the experimental classes, the principals of the schools involved, and the members of the Experimental Committee met together and formulated a detailed working plan. Then the plan was tried out for a semester before the real experiment was begun. 60

4. The zeal that a teacher exhibits in carrying out the instructional techniques he is employing is a subtle factor. It is closely related to the factor of skill, and perhaps the two overlap to some extent, but there is evidence that indicates the presence of an important educative factor that differs in some respects from skill. It is reasonable to expect that a teacher will exhibit greater zeal when employing a method that he believes in than when employing one that he does not like. The influence upon pupil achievement of the teacher's preference in regard to methods is indicated in an unpublished report 61 of an experiment to determine the relative merits of instructional procedures that may be designated as Method A and Method B.
Several teachers cooperated in the experiment, each one teaching one class according to Method A and another class according to Method B. The following results were secured.

                                  Number   Mean Score   Mean Scholastic Grade
Pupils taught by Method A          417       71.5             83.9
Pupils taught by Method B          440       69.5             83.8
Gain in favor of Method A                     2.0

The teachers were asked to indicate which method they preferred. The following results were obtained when the data were tabulated according to the preference of the teachers.

Teachers Preferring Method A
                                  Number   Mean Score   Mean Scholastic Grade
Pupils taught by Method A          131       75.0             84.8
Pupils taught by Method B          140       59.3             82.4
Gain in favor of Method A                    15.7

Teachers Preferring Method B
                                  Number   Mean Score   Mean Scholastic Grade
Pupils taught by Method A          180       67.2             85.4
Pupils taught by Method B          178       72.2             85.2
Gain in favor of Method B                     5.0

Teachers Having No Preference
                                  Number   Mean Score   Mean Scholastic Grade
Pupils taught by Method A           80       67.0             82.7
Pupils taught by Method B           89       67.2             83.0
Gain in favor of Method B                     0.2

60 Sexton, E. K. and Herron, J. S. "The Newark Phonics Experiment," Elementary School Journal, 28:690-701, May, 1928.
61 The writers are indebted to Dr. Rosalie M. Parr, of the University of Illinois, for these data. A report of this study is to be published in the Journal of Chemical Education.

The experiment was carried on during the latter part of the semester and the scholastic grades are probably a fair index of the equivalence of the two groups of pupils. According to this criterion, the paired groups are approximately equivalent except in the case of those taught by teachers preferring Method A. The difference for this pair of groups, however, is small in comparison with the difference between the mean scores. Furthermore, it may be that the scholastic grades of these pupils were influenced by their performances during the experiment.
62 The differences between the mean scores of the several pairs of groups strongly suggest that the preference of the teachers in regard to the method of teaching affected the achievements of the pupils. If it is assumed that the preference in regard to methods affected the zeal of the teachers, it follows that this characteristic of teaching is an important educative factor and, hence, must be controlled in order to secure dependable results.

Douglass 63 has reported data that may appear to be in opposition to the conclusion just stated. At the close of an experiment to determine the relative effectiveness of two sequences of supervised study, all instructors involved in the investigation were asked to express an opinion in regard to the relative merit of the two instructional procedures.

62 The papers of the test given at the close of the experiment were scored by a central committee, and, consequently, the teachers did not know the results until after the grades had been assigned. But it is likely that the teachers had some idea of the achievements of the pupils during the experiment, and since the pupils taught by Method A by teachers preferring this method did much better work than the pupils taught by Method B by the same teachers, it is not unlikely that the difference in mean scholastic grades is due in part to this fact. If this hypothesis is correct, these two groups were more nearly equivalent than the mean semester grades indicate.
63 Douglass, op. cit., p. 173-218.

Nine of the fourteen opinions expressed were contrary to the experimental results in the pair of classes taught by the teacher giving the opinion. The conditions of this investigation differ in certain significant respects from the one described in the preceding paragraphs. In the first place, the Douglass experiment was carried on in the University High School at the University of Oregon.
The other experiment was cooperative and involved about as many different schools as there were teachers. Another difference is that Douglass asked his teachers to express an opinion in regard to the results of the experiment, whereas in the other experiment the teachers were asked to indicate the method they preferred. Finally, the teachers in a University High School are likely to be more scientifically-minded than teachers in typical high schools and, hence, would be less likely to have strong preferences and more likely to be equally zealous in carrying out both of the methods.

The statement is made in the report of the experiment by Melby and Lien that, "The teacher, in fact, was secretly hoping that the results would reflect credit on the experimental method . . . . Yet results favored the control groups." 64 This should not be interpreted to mean that the data of this experiment are such as to minimize the importance of zeal as a factor. It is evident from the description of the procedures employed in the instruction of the control pupils that considerable zeal was exercised in spite of the teacher's dislike for these procedures. The inference that may be drawn from the report of this experiment is that this teacher was also sufficiently scientifically-minded to control the factor of zeal. It, therefore, appears that the Melby and Lien experiment and the one by Douglass are not necessarily in opposition to the one previously described. This conclusion is supported by evidence from other investigations.

The influence of some teacher factor, which probably was zeal, is revealed in the Newark Phonics Experiment.

". . . . the results show conclusively that there is immeasurably less difference between classes taught with and without phonics than between different schools. Where the results were unusually good in a class taught by a teacher using phonics, they were unusually good when the same teacher taught without phonics.
On the other hand, poor results were secured in both phonic and non-phonic groups taught by the same teacher." 65

64 Melby and Lien, op. cit., p. 264.
65 Sexton and Herron, op. cit., p. 701.

In the experiment by Collings 66 the children taught by the project method achieved more than those taught by the traditional method, but it appears from Collings' report that these teachers worked much harder at their task than did the teachers in the control schools. In view of this fact, it does not appear justifiable to ascribe the superior achievement of the project-method group entirely to the method of instruction. The unusual zeal of the teachers undoubtedly contributed a large portion of the superiority in achievement. This conclusion is supported by an investigation reported by Gates. 67 The account of the experiment indicates that the teachers employing the "modern systematic method" exhibited as much zeal as those employing the "opportunistic method." The results favor the former. Although this experiment differs in several respects from the one conducted by Collings, they are sufficiently alike to justify the conclusion that the zeal of the teacher is a potent educative factor.

More direct evidence of the effect of a high degree of zeal upon achievement is furnished by an investigation by Pittman. 68 The description of the activities of the teachers in the experimental group of schools makes it apparent that they exhibited a very high degree of zeal. For example, it is stated: "The teachers under professional supervision did approximately four times as much professional reading as they themselves had done during the previous year, or as the unsupervised group, with which they were compared, did during the year of the experiment." 69 The description of this experiment would not be seriously distorted if the zeal of the teachers was designated as the experimental factor.
Hence, the distinct superiority in achievement of the pupils in the experimental schools was undoubtedly due in a large measure, either directly or indirectly, to the zeal of the teachers.

After a relatively elaborate and careful study of the factors related to teaching success, Knight concluded that ". . . . general factor of interest in one's work becomes the dominant factor in determining one's success in teaching . . . . it is reasonable to suppose that genuine interest in one's work accounts for a large part of teaching success." 70

5. The term personality traits is used here as a name for a complex of subtle teacher factors that are commonly designated by such terms as "general appearance," "voice," "self-control," "tact," "sympathy," "sense of justice," and "loyalty." Such traits have not been defined so that satisfactory measurement is possible, and, consequently, we do not have any definite measure of their effect upon pupil achievement.

66 Collings, Ellsworth. An Experiment with a Project Curriculum. New York: The Macmillan Company, 1923. 346 p.
67 Gates, A. I., et al. "A Modern Systematic Versus an Opportunistic Method of Teaching," Teachers College Record, 27:679-700, April, 1926.
68 Pittman, M. S. The Value of School Supervision. Baltimore: Warwick and York, Inc., 1921. 129 p.
69 Ibid., p. 101.
70 Knight, op. cit., p. 9.

There is, however, a wide-spread conviction 71 that the "personality" 72 of the teacher is an important educative factor. This conviction is supported by some evidence. Morris has reported a partial coefficient of correlation of .463 between success in practice teaching and "trait index." 73 Hence, it seems safe to conclude that "personality traits" do affect pupil achievement to such an extent that they cannot be safely neglected in an experiment.

6, 7, and 8.
The teacher's physical condition, sex, and age have an indirect influence on school achievement in so far as they condition the zeal with which a teacher employs instructional procedures and the skill with which he uses them. The teacher's physical condition, sex, and age may influence directly the achievement of children by engendering attitudes that may be beneficial, or detrimental, to learning. The relation of the teacher's physical condition to teaching efficiency is indicated in studies made of teacher failure. In a study by Buellesfield poor health takes twelfth rank as a chief cause and twentieth rank as a contributory cause. 74 Littler ranks poor health lowest of seven causes of teacher failure. 75 Moses places it eleventh in point of frequency; there are no less frequent causes mentioned. 76 In a recent study of teacher failure Madsen reports physical condition as the cause in only two out of thirty-one cases, in one case deafness and in the other case general physical disability. 77

Some correlation coefficients have been determined in an effort to indicate the relation of health or physical condition to teaching efficiency. Bradley states that the correlation between "general merit" and physical efficiency is .59, the lowest of several listed by him. 78

71 The Commonwealth Teacher-Training Study referred to on page 31 affords conclusive evidence of this statement. For senior high-school teachers the ten traits considered most important are: breadth of interest, self-control, good judgment, leadership, scholarship (intellectual curiosity), forcefulness, honesty, adaptability, enthusiasm, and open-mindedness.
72 Although this term has not been defined with precision, as it is commonly used, it undoubtedly includes zeal (as the term has been used in the preceding pages) and probably overlaps with skill. Hence, the term "personality traits," as it is used here, is not synonymous with "personality."
73 Morris, E. H.
"Personal Traits and Success in Teaching," Teachers College, Columbia University Contributions to Education, No. 342. New York: Bureau of Publications, Teachers College, Columbia University, 1929, p. 49. This "trait index" is defined as a composite of likes and dislikes, resourcefulness and insight, tact, degree of positiveness of judgment, and characteristic feeling-attitudes. Ibid., p. 18.
74 Buellesfield, Henry. "Causes of Failure Among Teachers," Educational Administration and Supervision, 1:451, September, 1915.
75 Littler, Sherman. "Causes of Failure Among Elementary-School Teachers," School and Home Education, 33:255-256, March, 1914.
76 Moses, C. V. "Why High-School Teachers Fail," School and Home Education, 33:166-169, January, 1914.
77 Madsen, I. N. "The Prediction of Teaching Success," Educational Administration and Supervision, 13:44-45, January, 1927.
78 Bradley, J. H. "A Study of the Relative Importance of the Qualities of a Teacher and Her Teaching in Their Relation to General Merit," Educational Administration and Supervision, 4:359, September, 1918.

Boyce gives a smaller correlation coefficient, .18, between health and general merit. 79 Ruediger and Strayer report the correlation to be .04 between general merit and health. 80 The recent study of Whitney gives a coefficient of .124 between physique and teaching success after graduation. 81 However, it should be stated that this is a greater relationship than was found for intelligence and success after graduation. Whitney places physique as the fourth most important item in the prediction of success in teaching. It follows student teaching, professional marks, and academic marks. The weight of the evidence is in favor of regarding the physical condition of the teacher, so long as extremes are avoided, as a minor factor in the achievement of the pupils.
There is no evidence to be found in the literature which would indicate that the teacher's sex is an important factor in the learning activity of the pupils. It is said that the pre-adolescent boy prefers men teachers to women teachers, but it is yet to be proven that this prejudice, even assuming it to be universally existent, is sufficient to decrease his achievement significantly.

The age of the teacher is not usually a significant factor in successful teaching. After stating that the correlation of teaching skill with age was negligible for a group of Massachusetts teachers, Knight goes on to say:

We know there is some correlation between age in general and teaching ability. A five-year-old child could not teach, and excessive old age would no doubt be negatively correlated. But within those age limits during which men and women ordinarily teach, age does not appear to be correlated with teaching skill. The younger teachers are not the best as a current superstition would lead us to think; nor do years of tenure make material additions to skill. 82

This estimate of the importance of age as a factor in teaching is concurred in by Whitney who states, "Age is not a particularly important element in good teaching." 83 It seems safe to assume, therefore, that so long as extremes are avoided, the teacher's age is not a significant factor in the learning activity of the pupils.

The control of teacher factors. The preceding consideration of teacher factors has demonstrated the necessity of controlling, i.e., keeping the same, at least four teacher factors: (1) instructional procedures, (2) skill in carrying out these procedures, (3) zeal of the teacher, and (4) personality traits. Control of instructional techniques may be approached by giving the teachers participating in the experiment detailed instructions with regard to the conduct of their classes during the experiment.

79 Boyce, A. C. "Qualities of Merit in Secondary School Teachers," Journal of Educational Psychology, 3:154, March, 1912.
80 Ruediger, W. C. and Strayer, G. D. "The Qualities of Merit in Teachers," Journal of Educational Psychology, 1:275, March, 1910.
81 Whitney, F. L. "The Prediction of Teaching Success," Journal of Educational Research Monographs, No. 6. Bloomington, Illinois: Public School Publishing Company, 1924, p. 20.
82 Knight, F. B. "Qualities Related to Success in Elementary School Teaching," Journal of Educational Research, 5:212, March, 1922.
83 Whitney, op. cit., p. 63.

Such an attempt to control this factor is illustrated in the following quotation from the report of an experiment by Coryell:

In order to secure uniformity of procedure and the consistent carrying out of the prescribed methods, the teachers who were to collaborate in the experiment met from time to time in conference for devising ways and means. The class work for the first week was planned in the minutest detail. The same questions were actually used by all three teachers and the lesson plans were followed as exactly as was humanly possible while conducting a live recitation. Many other plans were made and used in common, and where no detailed lesson plan was drafted for all three teachers to use, the work to be covered each day was broadly outlined. 84

From one point of view, it is not sufficient to secure equivalence of instructional techniques. They should be representative of sound educational practice. This requires, among other things, that there be adaptation of techniques to the needs and purposes of the pupils as they are revealed in the course of the instruction. Hence, if control of instructional techniques is carried too far, the requirement of sound educational practice may be violated.

The control of skill and of zeal is much more difficult. A specified degree of skill or of zeal cannot be secured by asking teachers to follow certain instructions.
Any direct request to be more skillful, and especially to be more zealous, may produce the opposite result. By exercising care in selecting teachers and by making use of indirect devices, a skillful experimenter may secure approximate equivalence of these two teacher factors, but, since neither can be measured objectively, he cannot be certain that he has done so.
A method frequently employed to control these factors is the rotation of teachers at the mid-point of the experimental period. The teacher who has been teaching the experimental group exchanges with one who has been teaching the control group. 85 Thus, any difference in skill or zeal on the part of two teachers is expected to be corrected by the fact that both the experimental and control pupils have received an equal amount of stimulation from both teachers. The experiment of Douglass is an example of the use of this technique. 86 This procedure will be successful in securing equivalence of these factors when each teacher is equally skillful in carrying out both forms of the experimental factor and is equally zealous in doing so. A teacher might teach with equal skill and zeal in employing two different techniques, but it appears likely that most teachers, because of a lack of familiarity with, or a dislike for, one of the procedures, might teach with less skill and zeal in one of the groups than in the other. When this occurs, the rotation procedure will not succeed in securing control of skill and zeal except by chance.
84 Coryell, N. G. "An Evaluation of Extensive and Intensive Teaching of Literature," Teachers College, Columbia University Contributions to Education, No. 275. New York: Bureau of Publications, Teachers College, Columbia University, 1927, p. 13.
85 This exchange involves also a change of teachers relative to the experimental factor.
86 Douglass, op. cit., p. 187.
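The reasoning about rotation can be made concrete with a small arithmetic sketch. It assumes, purely for illustration, that a group's gain is the sum of a method effect and the skill contributed by whichever teacher instructs it during each half of the period; the additive model, the teacher labels "A" and "B", and all numbers are hypothetical and come from no study cited here.

```python
def rotated_difference(method_effect, skill):
    """Observed experimental-minus-control gain when two teachers exchange groups at the mid-point.

    skill[teacher][method] is that teacher's contribution over a full period
    when using the given method ("exp" or "ctl"). Purely additive toy model.
    """
    # Each group receives half a period from each teacher, always under its own method.
    exp_gain = method_effect + 0.5 * skill["A"]["exp"] + 0.5 * skill["B"]["exp"]
    ctl_gain = 0.5 * skill["B"]["ctl"] + 0.5 * skill["A"]["ctl"]
    return exp_gain - ctl_gain


TRUE_EFFECT = 2.0  # hypothetical true advantage of the experimental method

# Case 1: teachers differ from each other, but each is equally skillful with both methods.
equal_across_methods = {"A": {"exp": 4.0, "ctl": 4.0}, "B": {"exp": 1.0, "ctl": 1.0}}

# Case 2: teacher A is less skillful with the experimental method than with the control method.
unequal_across_methods = {"A": {"exp": 2.0, "ctl": 4.0}, "B": {"exp": 1.0, "ctl": 1.0}}

print(rotated_difference(TRUE_EFFECT, equal_across_methods))    # teacher differences cancel
print(rotated_difference(TRUE_EFFECT, unequal_across_methods))  # teacher-method difference remains
```

Under these assumptions the first case recovers the true effect exactly, whereas in the second the observed difference is reduced by half of teacher A's shortfall with the experimental method. Rotation removes differences between teachers, but not differences within a teacher between the two procedures.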
Another plan for securing control of these factors is to have the same teacher teach an experimental and a control group. The success of this method will depend on the degree to which the teacher carries out both the experimental and control instructional procedures with equal skill and zeal. In order to approach control when a single teacher is used, or when two teachers exchange groups at the mid-point of the experiment, it might be suggested that teachers be practiced in the experimental procedures before the start of the experiment, and enough of the scientific attitude be engendered in them to overcome the operation of prejudice for any particular method. When a teacher develops a preference for one of the procedures, he becomes disqualified for the experiment.
Another difficulty is introduced when we recognize the requirement that the teaching represent sound educational practice. This requirement involves the provision that the teacher believe in the method he is employing rather than be indifferent or even open-minded toward it. In fact, sound educational practice probably requires that the teacher be prejudiced in favor of the method he is employing. If this is admitted, it is apparent that an experimenter should not expect to secure equivalence of skill and zeal by the rotation method or by having a teacher teach both an experimental group and a control group.
Since "personality traits" cannot be measured satisfactorily, it is impossible to determine accurately the status of teachers with reference to this factor. Marked differences probably can be discovered, but in general it is not possible with our present techniques to select teachers who are equivalent with reference to "personality traits." Control may be secured by the rotation method or by having the same teacher instruct both a control group and an experimental group.
Although these procedures will secure control of "personality traits," they are not completely satisfactory for reasons pointed out in the preceding paragraphs.
II. General school factors that affect pupil achievement. Pupil achievement is affected directly or indirectly by several general school factors. For example, it is generally assumed that the textbook used in a course influences the achievement of the pupils. Much of this influence is indirect. The character of the text influences the learning exercises assigned which in turn influence achievement. In the following list of general school factors no attempt is made to indicate whether a factor functions directly or indirectly.
1. Instructional materials (textbooks, library, maps, laboratory apparatus, etc.)
2. Time devoted to learning activity
3. Characteristics of the class as a group
4. Size of class
5. Size of school
6. School organization
7. Administration and supervision
8. School building, especially lighting, heating, and ventilation
1. Instructional materials, such as textbooks, libraries, and other school equipment, influence the learning activity of pupils through the learning exercises that they furnish or make possible. Texts in arithmetic, algebra, language, physics, and most of the other subjects furnish a number of learning exercises. Texts and other books make possible other learning exercises, such as requests to study certain pages or questions whose answers may be found by reading. In a similar way charts, maps, moving picture machines, laboratory apparatus, and the like affect the number and type of learning exercises that may be assigned. Hence, the achievement of the pupils is likely to be affected by the instructional materials used with a class. The intimate relation between instructional materials and learning exercises may make it impossible to have the former constant when the latter are greatly different.
It should be noted, however, that certain types of learning exercises require certain instructional materials. Hence, if the purpose of an experiment is to compare two types of learning exercises, such as the lecture-demonstration method of teaching a science and individual-laboratory work, the materials must differ. In such cases, the difference in instructional materials is essentially a phase of the experimental factor.
2. If the time devoted to learning activity is assumed to be an approximate index of the amount of exercise of modifiable connections, it is apparently an important educative factor. In considering its importance in experimental investigations two cases should be noted: (1) the long-time experiment in which the difference in time spent is due to absences of certain pupils, and (2) the experiment in which the difference in time spent is a difference in the length of the class period or in the total time devoted to study.
In a study reported by Odell, a slight positive correlation was found between average school marks and per cent of time in attendance. 87 The relationship of length of attendance to educational age is shown in the coefficient of correlation of .30 ± 2 reported by Denworth. 88 In view of these relatively low correlations it would seem plausible to say that if there were no extreme cases of non-attendance or irregular attendance, and if the absences were approximately balanced between the groups, attendance would be an insignificant factor.
In the second case, it seems reasonable to expect that the time spent in learning activity is an important educative factor. Experimentation on the distribution of practice in learning has shown that there are optimum lengths of practice periods for different types of learning. Pyle 89 in his substitution experiment used fifteen-, thirty-, forty-five-, and sixty-minute practice periods. His results were in favor of the thirty-minute period.
The experiments of Hahn and Thorndike, 90 Kirby, 91 Starch, 92 and Lyon 93 indicate that the length of the practice period is a factor in learning. The evidence just cited tends to show that more time spent in learning activity does not necessarily imply more learning. Up to a certain point, increase of the learning period may be beneficial to learning; beyond this point, increase may be detrimental. It is probable that this sort of thing operated in the investigations of Rice, 94 Heck, 95 Jones and Ruch, 96 and Barnes and Douglass 97 who found little relation between time spent in learning activity and achievement. It should be noted that "time spent in learning activity" needs definition in this connection. Thinking and talking about an assignment probably should be included as well as formal study, either at school or at home. If this thesis is
87 Odell, C. W. "The Effect of Attendance Upon School Achievement," Journal of Educational Research, 8:422-32, December, 1923.
88 Denworth, K. M. "The Effect of Length of School Attendance Upon Mental and Educational Ages," Twenty-Seventh Yearbook of the National Society for the Study of Education, Part II. Bloomington, Illinois: Public School Publishing Company, 1928, p. 80.
89 Pyle, W. H. The Psychology of Learning. Baltimore: Warwick and York, Inc., 1928, p. 44. (Revised edition.)
90 Hahn, H. H. and Thorndike, E. L. "Some Results of Practice in Addition under School Conditions," Journal of Educational Psychology, 5:65-84, February, 1914.
91 Kirby, T. J. "Practice in the Case of School Children," Teachers College, Columbia University Contributions to Education, No. 58. New York: Bureau of Publications, Teachers College, Columbia University, 1913. 98 p.
92 Starch, Daniel. "Periods of Work in Learning," Journal of Educational Psychology, 3:209-213, April, 1912.
93 Lyon, D. O.
"The Relation of the Length of Material to the Time Taken for Learning and the Optimum Distribution of Time," Journal of Educational Psychology, 5:1-9, 85-91, 155-163, Januarv, February, and March, 1914. ^Rice, J. M. "The Futility of the Spelling Grind," Forum, 23:163-72, 409-19, April, June, 1897. 95 Heck, W. H. "Correlation Between Amounts of Home Study and Class Marks," School Review, 24:533-549, September, 1916. 96 Jones, Lonzo and Ruch, G. M. "Achievement as Affected by Amount of Time Spent in Study," Twenty-Seventh Yearbook of the National Society for the Study of Education, Part II. Bloomington, Illinois: Public School Publishing Company, 1928, p. 131-134. 97 Barnes, D. G. and Douglass, H. R. "The Value of Extra Quiz Sections in the Teaching of History," University of Oregon Publication, Education Series, Vol. 1, Xo. 7. Eugene: University of Oregon, 1929, p. 276-284. Experimental Research in Education 43 accepted, the control of the time spent in learning activity frequently will be difficult. The evidence cited from the experimentation on dis- tribution of practice is sufficient to warrant the assertion that the experimenter should use whatever means are available to secure, so far as possible, an equal amount of time each day to be spent in learn- ing activity, both in recitation and study, by the experimental and control pupils. 3. The phrase characteristics of a class as a group is used to desig- nate a factor that is difficult to define. It includes what is commonly called esprit de corps. It does not include general intelligence and the other factors listed on pages 19 and 20 since the two groups are ex- pected to be equivalent in these respects. Rivalry among certain mem- bers of a class may stimulate the entire group to greater effort. Be- cause the pupils like each other or because of outside associations a teacher may prefer to instruct one class rather than another and, hence, may exhibit unusual zeal in his work. 
On the other hand, the members of a class may not like each other. There may even be petty jealousies and enmities that make the teacher's task unusually difficult. The characteristics of the class as a group constitute a very intangible factor that operates in subtle ways. It is, however, of sufficient importance to warrant the consideration of the careful experimenter.
4. The size of the class disappears as an educative factor in an experiment where equivalent groups are secured by pairing, since this procedure secures classes of equal size. If the two groups are not equal in size, small differences do not appear to be significant because within fairly wide limits size of class does not appear to be an important educative factor. 98
Incidentally, it may be noted that when generalizing from an experiment with classes of a given size, the conclusions may be expected to be applicable to classes of other sizes within a considerable range, provided the size of the class does not affect other educative factors.
5. The size of the school indirectly affects the achievement of its pupils. Larger schools tend to possess superior organizations, better qualified administrative, supervisory, and instructional staffs, and a
98 Breed, F. S. and McCarthy, G. D. "Size of Class and Efficiency of Teaching," School and Society, 4:965-971, December 23, 1916.
Edmonson, J. B. and Mulder, F. J. "Size of Class as a Factor in University Instruction," Journal of Educational Research, 9:1-12, January, 1924.
Hudelson, Earl. Class Size at the College Level. Minneapolis: The University of Minnesota Press, 1928. 300 p.
Stevenson, P. R. "Class-Size in the Elementary School," Bureau of Educational Research Monograph, No. 3. Columbus: Ohio State University, 1925. 35 p.
Stevenson, P. R. "Smaller Classes or Larger: A Study of the Relation of Class-Size to the Efficiency of Teaching," Journal of Educational Research Monographs, No. 4.
Bloomington, Illinois: Public School Publishing Company, 1923. 127 p.
Bureau of Educational Research. "Relation of Size of Class to School Efficiency." University of Illinois Bulletin, Vol. 19, No. 45, Bureau of Educational Research Bulletin No. 10. Urbana: University of Illinois, 1922, p. 39.
greater diversity of school equipment and, hence, probably provide a better environment for learning. Inferiority in these things has caused Rufi 99 to question the efficiency of the small high school. However, in spite of the diversity between small and large schools in these things, it is yet to be proven that school size is anything more than a minor factor in the achievement of the pupils. Gowen and Gooch 100 compared the average college grades of students from large high schools with those of students from small high schools and failed to find a significant difference. Size of school, in itself, does not seem to be anything but a very minor factor in learning activity. As long as small size does not mean different organization, less qualified administrators and teachers, or lack of the materials of instruction prescribed in the experiment, it may be neglected by the experimenter even when the two groups compared are in different schools.
6. It seems reasonable that the organization of the school is important enough to be considered when experimental and control groups are in different schools. For example, schools in which there is individual instruction, ability grouping, or a platoon system are not appropriate environments for experimental groups unless the control groups are subject to the same conditions.
To illustrate the importance of school organization, the conclusion of an investigation in which the achievement of equivalent rural- and city-school children was compared may be given:
The results of this study indicate that the progress of graded-school pupils was approximately one-half school year in advance of that of the pupils with whom they were paired from the rural schools. 101
Since the pupils were equivalent in intelligence and chronological age, the difference in achievement must be ascribed, at least in part, to the superior organization of the graded city schools.
7. The administration and supervision of a school must be an important factor in learning activity if the attention given to these fields in teacher-training institutions is any criterion. However, it is difficult, if not impossible, to find any quantitative evidence in regard to the contribution of this factor to classroom learning. The reason for this seems to lie in the fact that any influence exerted by administration or supervision must be an indirect one operating through the teacher, the course of study, the organization of classes, the provision of school equipment, and so on. It seems logical to assume that the educational experimenter who has controlled these more direct factors will have taken care of administration and supervision.
99 Rufi, John. "The Small High School," Teachers College, Columbia University Contributions to Education, No. 236. New York: Bureau of Publications, Teachers College, Columbia University, 1926, p. 141.
100 Gowen, J. W. and Gooch, Marjorie. "The Mental Attainments of College Students in Relation to the Preparatory School and Heredity," Journal of Educational Psychology, 17:408-418, September, 1926.
101 Stone, C. W. and Curtis, J. W. "Progress of Equivalent One-Room and Graded-School Pupils," Journal of Educational Research, 16:264, November, 1927.
8.
The school building is a factor in school achievement in that it provides an environment that may be beneficial or detrimental to learning. While there is no experimental evidence, it seems logical to assume that learning takes place more readily in the beautiful and appropriate school buildings now in existence than in the ugly and inconvenient structures of a generation ago. The importance of lighting is recognized in the weight given to it in building score cards. 102 While the chemical composition of the air is usually so constant as to be unimportant, its temperature, humidity, and movement spell comfort or discomfort to pupils in a classroom and through this influence achievement. 103 Two investigations in which Thorndike has participated minimize these factors of ventilation. 104 However, lest the inference be made that ventilation is a wholly unimportant factor because of these findings, Sandiford has made the following comment:
Apparently we can, if we will, work as hard under adverse conditions of heat and humidity as under favorable ones. Even summer school in New York or Timbuctoo need not daunt us! What should be noted, however, is that these distressing conditions are uncomfortable, and if we subject children to them the likelihood is that their attention will be distracted from work. 105
It may be stated that light, heat, and ventilation become important factors to the educational experimenter only when grossly abnormal conditions prevail. When such conditions are avoided, they probably are not factors of sufficient importance to warrant the attention of the experimenter.
The control of general school factors. When the two groups of pupils are within the same school, the most significant general school factors appear to be (1) instructional materials, and (2) time devoted to learning activity. The control of instructional materials as a non-experimental factor is accomplished by securing an identity
102 For example see: Strayer, G. D.
Score Card for City School Buildings. Teachers College Bulletin, Seventh Series, No. 12. New York: Teachers College, Columbia University, 1916, p. 6.
103 Burnham, W. H. "The Optimum Temperature for Mental Work," Pedagogical Seminary, 24:69, March, 1917.
Burnham, W. H. "The Optimum Humidity for Mental Work," Pedagogical Seminary, 26:328, December, 1919.
McClure, J. R. "The Ventilation of School Buildings," Teachers College, Columbia University Contributions to Education, No. 157. New York: Bureau of Publications, Teachers College, Columbia University, 1924, p. 109.
104 Thorndike, E. L., McCall, W. A., and Chapman, J. C. "Ventilation in Relation to Mental Work," Teachers College, Columbia University Contributions to Education, No. 78. New York: Bureau of Publications, Teachers College, Columbia University, 1916. 83 p.
Thorndike, E. L. and Kruse, P. J. "The Effect of Humidification of a Room Upon the Intellectual Progress of the Pupils," School and Society, 5:657-660, June 2, 1917.
105 Sandiford, Peter. Educational Psychology. New York: Longmans, Green and Company, 1929, p. 271-72.
of instructional materials for both the experimental and control groups. Reeder 106 secured such identity by using the same textbook in both groups and permitting access to the textbook only during the class period. Pupils should not only have the same opportunities with respect to a textbook, but should have equal access to supplementary material, such as reference books, maps, charts, and museums, as well. This statement, of course, applies only to those instructional materials that are not involved in the experimental factor.
Securing equivalence of time spent in learning activity demands that the length of the class period and the number of periods spent on the experimental learning be the same for both groups.
It necessitates also that the experimental and control pupils spend an equal amount of time in study whether in school or at home. This is probably best accomplished by having all the pupils study the experimental learning for the same length of time under the supervision of the study-hall director, or possibly their classroom teacher, or teachers. If the experimental and control pupils are to engage in the experimental learning at home, all of them should do so. The control of the time factor in this case may be approached by securing the cooperation of the parents of the children. Equivalence with respect to the time factor also demands that at the close of the experiment, the experimenter should check the amount of and regularity of attendance of the pupils in the experimental and the control groups. When the attendance of either pupil of a pair is grossly deficient or irregular, the pair probably should be discarded.
The characteristics of the class as a group should not be neglected, but little can be done to control this factor. If equivalent groups are secured and the teacher factors are adequately controlled, it is likely that the two classes will not differ greatly in their characteristics as a group. For example, if the pupils are equivalent with respect to intelligence and previous achievement, both the experimental and control pupils will have the same advantages in profiting from the recitations of their fellows. If the teacher, or teachers, instruct the two groups with equal skill and zeal, similar group attitudes should be engendered. The experimenter, however, should endeavor to evaluate the two classes with respect to their group characteristics, and, if differences are apparent, the fact should be recognized in interpreting the results of the experiment.
106 Reeder, E. H. "A Method of Directing Children's Study of Geography," Teachers College, Columbia University Contributions to Education, No. 193.
New York: Bureau of Publications, Teachers College, Columbia University, 1925. 98 p.
The size of the class is automatically controlled as a factor when classes are formed by pairing pupils. The presence of pupils not actually included in the experiment, as is sometimes the case when groups are selected without interfering with the composition of regular school classes, will not interfere with the experiment unless the number of such pupils is large. Where a number of paired groups are used, it is probable that none of the pairs should differ greatly in size if the results are to be combined or compared.
The size of the school does not influence one group more than the other, if both are within that school. When experimental and control groups are to be in different schools, the experimenter should select schools that are approximately the same size. In the control of this and other general school factors in cooperative experimentation where several schools participate, a measure of control is attained by having each experimental group paired with a control group in the same school. The cooperative experiment of Breed in which fourteen schools cooperated is an example of this. 107
III. Extra-school factors that affect pupil achievement. Pupil achievement is affected by several factors that have not been included in the preceding lists. The following appear to deserve consideration:
1. Participation in extra-curricular activities
2. The pupil's home life
3. Community interest in and attitude toward the school
1. Carefully supervised participation in extra-curricular activities tends to be beneficial rather than detrimental to learning. 108 The pupil who engages in some such activity frequently becomes more interested in his regular school work.
Dramatic, scientific, technical, literary, and debating clubs not only add interest to the school subjects to which they are related but they also may contribute directly to achievement. It is likely, however, that for each child, there is an optimum amount of participation above which his school achievement will suffer.
107 Breed, F. S. "Measured Results of Supervised Study," School Review, 27:186-204, 262-84, March, April, 1919.
108 This contention is supported in the following reports of research:
Tremper, G. N. "The Effect of Participation in Extra-Curricular Activities on the Scholarship of the Participants in the Kenosha, Wisconsin, Senior High School." A thesis submitted for the degree of Master of Arts in Education. Urbana: University of Illinois, 1928. 63 p.
Crawford, C. E. "The Effect of Participation in Extra-Class-Room Activities on Scholarship of High School Pupils." A thesis submitted for the degree of Master of Arts in Education. Urbana: University of Illinois, 1929. 64 p.
2. The child's home life may influence his school achievement in many ways. Studying at home under parental supervision and with parental sympathy, listening to conversation of parents and other members of the family, reading periodicals and books that the home affords, traveling with members of the family, and the like are activities that sometimes make large contributions in the fields of school achievement. The following quotations indicate that the parents of children have been regarded as an important factor in school achievement, particularly achievement that results from home study:
Home environment is a factor in the formation of study habits. Its influence may be either for good or for bad. Home study is desirable because it acts as a check on the formation of habits out of school that would be negative in their influence on habits in school.
109
The survey of the fourth, fifth, and sixth grades seems to justify this conclusion: Where the parents are capable of guiding the child and are inclined to supervise the home study, their children succeed in school. But where parents are illiterate or for other reasons are unable or unwilling to supervise the home study, their children as a rule either make slow progress or are failures entirely when measured by the progress of their companions in school. 110
It is probable that the conclusion of Brooks exaggerates the importance of supervision of home study by parents since Heck 111 has presented data to show that it is immaterial whether students study at home or at school. The inconclusiveness of the research on supervised study would lead one to question the value of the type of supervision most parents are capable of giving. It is possible that the attitude parents take toward the school as an educative agency is a more potent influence than any supervision they may administer. For example, Hurlock 112 has shown that praise of pupils by teachers is much more beneficial to achievement than reproof or indifference. It is probable that the same is true of praise or reproof on the part of parents relative to the school work of their children. Reavis 113 has described several interesting cases in which failure in school achievement was due to detrimental parental attitudes whose correction, effected by enlisting the cooperation of the parents, resulted in the change from failure to success.
Listening to conversation of parents and other members of the family, reading periodicals and books that the home affords, and traveling with members of the family are activities that may contribute to the experimental learning. These experiences help provide the background of information for the learning that is to take
109 Reavis, W. C. "Some Factors That Determine the Habits of Study of Grade Pupils," Elementary School Teacher, 12:81, October, 1911.
110 Brooks, E. C. "The Value of Home Study Under Parental Supervision," Elementary School Journal, 17:193, November, 1916.
111 Heck, W. H. "Comparative Tests of Home Work and School Work," Journal of Educational Psychology, 10:153-62, March, 1919.
112 Hurlock, E. B. "An Evaluation of Certain Incentives Used in School Work," Journal of Educational Psychology, 16:145-59, March, 1925.
113 Reavis, W. C. "Constructive Student Accounting in the Secondary School, A. Administering the Maladjusted Student," Supplementary Educational Monographs, No. 24. Chicago: University of Chicago, 1923, p. 20-33.
place during the experiment. Topics in history, civics, biology, literature, and economics are more meaningful to the pupil who has had related experiences through conversation with members of his family, or through travel. It is impossible to estimate the extent to which school achievement is influenced by these out-of-school experiences.
Several recent studies have minimized the importance of home environment with respect to school achievement. For example, Heilman's statement that "57% of the variation in educational age was due to mental age or such hereditary factors as had been measured; about 7% of the variation was due to the influences of school training and socio-economic status combined, or such environmental factors as had not been measured" 114 would lead one to believe that home environment is not a significant factor. However, it is yet to be proven that the information acquired and the ideals and attitudes engendered in the home influence school achievement, as represented in the experimental learning, so little that the experimenter is justified in neglecting this factor.
3. School achievement is influenced by community interest in and attitude toward the school.
If the community is high in the socio-economic scale, the members of the community are likely to show much interest in school affairs and to cooperate with the principal and teachers in attaining the best conditions for school work. For example, the parents of such a community may cooperate with the school faculty in providing more adequate library facilities. In other cases the community may be permeated with attitudes antagonistic toward the school administration. Such attitudes among parents tend to be acquired by pupils. Thus, community attitudes and interest may exert a subtle though powerful influence on school learning.

114 Heilman, J. D. "Factors Determining Achievement and Grade Location," The Pedagogical Seminary and Journal of Genetic Psychology, 36:453, September, 1929.

The control of extra-school factors. The participation in extra-curricular activities should be checked by means of information secured from teachers, school records, or the pupils themselves. The experimenter probably has controlled this factor satisfactorily when he has insured that there is no great excess of participating students in either group. This factor appears to be a minor one; therefore small differences in participation may be neglected.

The relationships found to exist between the intelligence of children as measured by typical intelligence tests and parental occupation and social status would lead one to believe that pairing children on the basis of intelligence scores helps to secure equivalence with respect to the home life of the children used in the experiment. 115 Control of this factor is also aided by securing equivalence with respect to previous achievement.
If the initial achievement tests are valid and reliable with respect to the experimental learning, and if the groups are equivalent with respect to the mean scores on the initial achievement test, then it seems probable that the groups will be equivalent so far as influence from information obtained out of school is concerned. The inference should not be made that the child's home life is considered an insignificant factor, or that securing of equivalence with respect to intelligence is all that needs to be done. The experimenter should be on the alert to detect cases in which abnormal home environment, particularly detrimental parental attitudes, is handicapping the learning of individual children in the groups.

The factor of community interest in and attitude toward the school will not usually demand attention if the experiment is confined to one school, or if there is a pair of groups in each one of several schools. If the experimental and control groups are in different schools, in different communities, this factor should receive attention. In such cases, however, it seems probable that pairing pupils on the basis of intelligence will do much to insure control of this factor, since it is likely that community interests and attitudes tend to vary with the intellectual level of the children. It is probably desirable that the presence of the experiment should not be given too much publicity, lest the parents and the other members of the community take an unwelcome interest in the experimental or control procedures. It is desirable, for the sake of generalization, that the communities in which experiments are conducted be typical of communities to which the results are to be applied.

115 Book, W. F. The Intelligence of High-School Seniors. New York: The Macmillan Company, 1922. 371 p.
Bridges, J. W. and Coler, L. E. "The Relation of Intelligence to Social Status," Psychological Review, 24:1-31, January, 1917.
Dexter, E. S. "The Relation Between Occupation of Parent and Intelligence of Children," School and Society, 17:612-14, June 2, 1923.
English, H. B. "An Experimental Study of Mental Capacities of School Children, Correlated With Social Status," (Yale Psychological Studies) Psychological Monographs, 23:266-331, 1917.
Haggerty, M. E. and Nash, H. B. "Mental Capacity of Children and Paternal Occupation," Journal of Educational Psychology, 15:559-72, December, 1924.
Pressey, S. L. and Ralston, Ruth. "The Relation of the General Intelligence of School Children to the Occupation of Their Fathers," Journal of Applied Psychology, 3:366-73, December, 1919.
Terman, L. M., et al. "Racial and Social Origin," Genetic Studies of Genius, Vol. 1. Stanford University, California: Stanford University Press, 1925, p. 55-83.

Summary with reference to control of educative factors. In the light of the preceding discussion and of practical considerations, it appears that equivalence 116 of at least the following educative factors must be secured, or appropriate allowance for non-equivalence must be made in interpreting the difference in gains.

1. Instructional techniques
2. Skill in carrying out instructional techniques
3. Zeal of the teacher
4. Personality traits of teachers
5. Instructional materials
6. Time spent in learning activity

It should be noted that frequently the experimental factor is a phase of instructional techniques. When this is the case, the requirement of equivalence applies only to the remaining phases. A similar comment applies to instructional materials.

116 Control of factors by securing equivalence means that conditions have been so arranged by the experimenter that the factors operate equally in both the experimental and the control groups.

Control of instructional techniques and of instructional materials can be secured by careful planning and by giving attention to details during the experiment.
When the learning is restricted to the classroom, the control of time spent and of the other environmental factors is easily secured, but when the learning involves home study, satisfactory control is more difficult. The greatest difficulty is in securing satisfactory control of skill, zeal, and personality traits of teachers.

In addition to controlling the factors enumerated, the other educative factors should be investigated to make certain that no marked differences exist. If there are significant differences, either the experiment should be organized so that they neutralize each other, or their possible influence must be estimated and corrected for in the interpretation of the data. 117

The problem of controlling educative factors in different types of experimentation. 1. Single group experiments. Since it is impossible to equate non-experimental factors in a single group experiment, control of factors must depend on estimations of their influences on the experimental learning. The effect due to the application of the experimental factor must be singled out from the effect due to all the other factors. This is usually done by comparing the achievement of the pupils under the influence of the experimental factor with their achievement prior to the application of the factor. It is obvious that this procedure will not often secure dependable quantitative results. In addition to the improvement that must be ascribed to the change that has occurred in the intellectual and educational status of the pupils, some of the improvement may be due to the less difficult instructional material or to the greater zeal and effort shown by the teacher because of the novelty of the experimental method, or both. The single group experiment is a desirable activity for the classroom teacher to engage in since it is likely to be stimulating, but in general it cannot be expected to result in dependable answers to educational problems.

117 Factors not equated may be said to be controlled when their variation is determined and the effect recognized in the difference in gains.

2. Experiments in which two equivalent groups are taught by the same teacher. The control of the non-experimental factors in experiments in which two equivalent groups are taught by the same teacher is dependent on the degree to which conditions are arranged so that these factors operate equally in both groups. The procedures to be used for securing such equivalence of factors have been suggested in the preceding pages. Since this type of experiment is conducted by a single teacher in one school, personality traits, sex, age, and physical condition of the teacher, size of school, school organization, administration and supervision, school building, and community attitude and interest in the school do not need to receive the attention of the experimenter, since these are the same for both groups. 118 It is evident, however, that the other factors listed may be of unequal influence on achievement unless conditions are arranged with care.

After equivalent groups of pupils have been secured, control of the teacher factors of skill and zeal is very important. The teacher should be equally familiar with the instructional procedures and materials used in the experimental group and with the instructional procedures and materials used in the control group. The attitudes of the teacher with respect to these instructional procedures and materials should be such that the teaching is done with equal zeal in both groups. In addition, the teacher must exercise constant care throughout the experiment in order to maintain an identity of classroom-management procedures and time spent in learning activity. The teacher must be able to adapt herself readily when teaching one group immediately after the other.
The rotation technique is often employed to secure control of pupil factors when the groups are only approximately equivalent. 119 The instructional procedures and materials of the experimental and control groups are exchanged at the mid-point of the experiment. In computing the results, the gain credited to the experimental factor is the sum of the gains of both groups while under its influence. The gain credited to the control procedures is the sum of the gains of both groups while acting in the capacity of controls. The use of this technique, however, may introduce errors of more significance than those it would seek to eliminate. The group that first receives the benefit of the experimental factor is likely to acquire abilities, such as study habits, that will carry over and function when the group is acting as control.

118 This statement applies only to the control of factors so that the difference may be a significant difference for the groups concerned. If the results are to form the basis of generalizations, these factors must be typical of the schools to which the generalizations are to be applied.
119 If Group A acts first as experimental and second as the control group, while Group B acts first as control and second as experimental group, two hypothetically equivalent groups are secured. Group A (as experimental) plus Group B (as experimental) is equivalent to Group A (as control) plus Group B (as control). In other words, the pupils are equivalent to themselves.

What is likely to happen is shown in the following illustrations in which the true gain is assumed to be 8 when the experimental instructional procedures are used. The true gain is assumed to be 4 when the control procedures are used. The experimental instructional procedure is labeled Method X, and the control procedure, Method Y.
Then for the hypothetical ideal situation the gains are as follows:

                 Gain
    Group A       8            with Method X
    Group B       4            with Method Y
    Group B       8            with Method X (after rotation)
    Group A       4            with Method Y (after rotation)
    Difference = (8 + 8) - (4 + 4) = 8 in favor of Method X

Assume that the carry over of study habits by Group A introduces an error of 3:

                 Gain
    Group A       8            with Method X
    Group B       4            with Method Y
    Group B       8            with Method X
    Group A       7 (4 + 3)    with Method Y plus study habits
    Difference = (8 + 8) - (4 + 7) = 5 in favor of Method X

The effect of this error plus the effects of others combining with it in unknown ways may be sufficient to destroy the significance of results, especially if the computed difference in gains happens to be small. If the teacher varies in skill or zeal, the use of the technique of rotating pupils is not likely to eliminate the errors created. Let us assume that the teacher prefers the experimental factor to the extent that the error introduced is equal to one-half the influence due to the experimental factor, or Method X, and at the same time let us assume that her dislike for the control procedures, or Method Y, is also sufficient to cause an error equal to one-half of the influence due to Method Y.

                 Gain
    Group A      12 (8 + 4)    with Method X plus preference
    Group B       2 (4 - 2)    with Method Y plus dislike
    Group B      12 (8 + 4)    with Method X plus preference
    Group A       2 (4 - 2)    with Method Y plus dislike
    Difference = (12 + 12) - (2 + 2) = 20 in favor of Method X

For the sake of comparison let us assume that instead of preferring the experimental instructional procedure, Method X, the teacher dislikes it and prefers the control procedure, Method Y. Further, let us assume that the dislike of Method X removes half of its effectiveness and the preference for Method Y increases its effectiveness by one-half.
Then:

                 Gain
    Group A       4 (8 - 4)    with Method X plus dislike
    Group B       6 (4 + 2)    with Method Y plus preference
    Group B       4 (8 - 4)    with Method X plus dislike
    Group A       6 (4 + 2)    with Method Y plus preference
    Difference = (4 + 4) - (6 + 6) = -4, that is, 4 in favor of Method Y

Thus the failure of the rotation technique to control the teacher factor may result in exaggerating the influence of the experimental instructional procedure, or it may result in creating an apparent difference in favor of the really less desirable control instructional procedures. When it is remembered that failure to control the teacher factor may be accompanied by error due to carry over of abilities, it will be recognized that the rotation technique does not insure dependable results.

3. Experiments in which two equivalent groups are taught by different teachers in the same school. The control of non-experimental factors when equivalent groups are taught by different teachers in the same school is very similar to the control of factors when both groups are taught by the same teacher. There is no need to give attention to such factors as size of school, school organization, administration and supervision, school building, and community attitude and interest, since these are the same for both groups. 120 The fact that different teachers are used increases the importance of the teacher factors. In order that both teachers will be equal in their influence on achievement, irrespective of the experimental factor, they must teach with equal "skill" and "zeal" in carrying out instructional techniques and classroom-management procedures. The experimenter may seek to secure equality of these teacher factors by selecting teachers who have approximately the same intelligence, training, and experience, and who are not widely different in age or physical condition.
After teachers have been selected on the basis of equality, or similarity, in the above characteristics, more adequate control of skill and zeal may be attempted by practicing the teachers in the instructional procedures and materials of the experiment. In doing this, the experimenter should be especially careful to engender scientific attitudes toward the instructional procedures and materials in an effort to minimize the influence of teacher preferences, or dislikes, for methods or materials. Finally, the use of detailed lesson plans in both groups during the experiment should be effective as an aid in the control of the teacher factors.

120 A partial exception should be noted with reference to the last two factors. Schoolrooms within a building vary, and care should be exercised to insure that the rooms in which the experiment is being carried on are not significantly different. Care should also be exercised to insure that the community attitude toward the experiment is neutral.

"Personality traits" must receive attention; however, it is difficult to select teachers who are equivalent with respect to this factor. A principal or other supervisor who is intimately acquainted with the teachers of a school may select two teachers who are approximately equivalent, but since we have no satisfactory means of measuring this factor, the degree of equivalence cannot be determined.

The rotation of teachers is a technique frequently used to control the teacher factors. At the mid-point of the experiment the teachers exchange groups and procedures. Group A with Method X continues with Method X but with the new teacher. Group B with Method Y continues with Method Y but with a new teacher. 121 It is probable that the use of this technique eliminates such lesser teacher factors as age, sex, physical condition, and personal idiosyncrasies.
It does not seem likely, however, that rotation of teachers adds anything to the control of the important factors, skill and zeal. For example, let 8 be the gain due to Method X for one-half the experimental period, and let 4 be the gain due to Method Y for one-half of the experimental period. Then for the ideal case the gains are as follows:

                 Gain
    Group A       8            with Method X
    Group B       4            with Method Y
    Group B       4            with Method Y (continued)
    Group A       8            with Method X (continued)
    Difference = (8 + 8) - (4 + 4) = 8, the "true" difference due to the superiority of Method X.

Now let us assume that both teachers are unskilled in the use of Method X so that half of its effectiveness is lost.

                 Gain
    Group A       4 (8 - 4)    with Method X plus lack of skill
    Group B       4            with Method Y
    Group B       4            with Method Y
    Group A       4 (8 - 4)    with Method X plus lack of skill
    Difference = (4 + 4) - (4 + 4) = 0

121 Both teachers and pupils might be rotated, in which case the hazards described in the previous discussion of rotation of pupils would be accompanied by those described in the present discussion.

Again, let us assume that both teachers are equally more zealous for the experimental factor, although they are equally skilled in the use of both methods.

                 Gain
    Group A      12 (8 + 4)    with Method X plus zeal
    Group B       4            with Method Y
    Group B       4            with Method Y
    Group A      12 (8 + 4)    with Method X plus zeal
    Difference = (12 + 12) - (4 + 4) = 16

Finally, let us assume that, although the two teachers are equally skilled in both methods, one is zealous for the experimental factor, while the other is equally prejudiced against it.

                 Gain
    Group A      12 (8 + 4)    with Method X plus zeal
    Group B       4            with Method Y
    Group B       4            with Method Y
    Group A       4 (8 - 4)    with Method X plus prejudice
    Difference = (12 + 4) - (4 + 4) = 8. This happens to be the true difference, but notice the conditions necessary for these two factors to eliminate themselves.
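The arithmetic of the rotation illustrations can be retraced in a few lines of Python. This is only a sketch under stated assumptions: the function name `rotation_difference` and the variable names are hypothetical labels introduced here, and the gains of 8 and 4 and the error terms are the assumed values of the illustrations, not experimental data.

```python
def rotation_difference(x_gains, y_gains):
    """Gain credited to Method X minus gain credited to Method Y.

    Each list holds the gains of the two half-periods credited to a method.
    """
    return sum(x_gains) - sum(y_gains)


TRUE_X = 8  # assumed true gain with Method X (experimental procedure)
TRUE_Y = 4  # assumed true gain with Method Y (control procedure)

# Rotation of pupils with a single teacher.
ideal = rotation_difference([TRUE_X, TRUE_X], [TRUE_Y, TRUE_Y])
carry_over = rotation_difference([TRUE_X, TRUE_X], [TRUE_Y, TRUE_Y + 3])
prefers_x = rotation_difference([TRUE_X + 4] * 2, [TRUE_Y - 2] * 2)
prefers_y = rotation_difference([TRUE_X - 4] * 2, [TRUE_Y + 2] * 2)

# Rotation of teachers: each group keeps its method throughout.
unskilled = rotation_difference([TRUE_X - 4] * 2, [TRUE_Y] * 2)
both_zealous = rotation_difference([TRUE_X + 4] * 2, [TRUE_Y] * 2)
opposed = rotation_difference([TRUE_X + 4, TRUE_X - 4], [TRUE_Y] * 2)

for label, value in [
    ("pupils rotated, ideal case", ideal),
    ("pupils rotated, carry over of study habits", carry_over),
    ("pupils rotated, teacher prefers X", prefers_x),
    ("pupils rotated, teacher prefers Y", prefers_y),
    ("teachers rotated, both unskilled in X", unskilled),
    ("teachers rotated, both zealous for X", both_zealous),
    ("teachers rotated, one zealous, one prejudiced", opposed),
]:
    print(f"{label}: difference = {value}")
```

Only the last case of teacher rotation returns the true difference of 8, and it does so because the two assumed errors happen to be equal and opposite.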
Thus the rotation of teachers may fail to eliminate the error due to the teacher factors: skill in the use of instructional procedures and zeal of the teacher with reference to the experimental factor. The rotation technique, whether of pupils, or teachers, or both, is of doubtful desirability since its use does not give more certain control than when rotation is not used. It is a dangerous technique to employ in any form, since it may engender a false idea that by its use non-experimental factors are controlled, and because rotation of pupils or teachers, except at the end of a term or semester, creates an abnormal situation.

4. Experiments in which two equivalent groups are taught by different teachers in different schools. When the experimental group is in one school and the control group is in another under a different teacher, the general school and extra-school factors become significant. 122 The fact that different schools are used introduces possible differences in instructional materials, size of school, time spent in learning activity (class periods and study periods), school organizations, school administration and supervision, school buildings, community interests in and attitudes towards the schools, children's home lives, attitudes of homes toward the schools, home facilities for study, home duties performed by pupils, and the participation in extra-curricular activities. The most effective means of securing control of these factors rests in selecting schools that appear to be as much alike as possible. 123 In other words, schools should be selected that are approximately the same size and in communities of much the same social and economic status. It would not be desirable to select one school that employs ability grouping while the other selected does not. It would not be advisable to select one school that has elaborate library and laboratory facilities, while the other school selected does not have these advantages. The control of instructional materials and time spent in learning activity is most effectively accomplished by preparing detailed lesson plans for both the experimental and control groups. The possible variation in other factors, such as home facilities for study and extra duties performed by the pupils, should be investigated and the differences observed used as the basis of a limitation placed on the experimental conclusions.

122 The control of pupil and teacher factors is no less important than when a single school is used.
123 If the results are to serve later as the basis of generalization the school selected should also be typical of those to which the generalizations are to apply.

5. Cooperative experiments. Cooperative experiments may be conducted in the same, or in different schools. It is considered the most desirable practice to have the cooperating teachers instruct pairs of experimental and control groups. In the interpretation of the data obtained by cooperative experimentation one of two methods may be used. Each pair of groups may be regarded as a sub-experiment, and the conclusions of these sub-experiments may be compared with one another. The other method of interpretation is that which depends on the concept of the cooperative experiment as a single experiment. All of the experimental pupils are regarded as composing one large experimental group; all of the control pupils are considered as a single large control group. 124 The difference in gains obtained will, of course, be the average of the differences in gains for all of the paired groups. The increase in size of the experimental and control groups by this combining of data will reduce considerably the variable errors of measurement, validity, and sampling that existed for the individual pairs of groups. 125 It is probable that systematic errors, since they are likely to vary from one pair of groups to another, will tend to offset each other. In other words, they may become variable errors and, hence, be more easily accounted for in the statistical treatment of results. It is probable that such combination of experimental and control pupils will aid in securing more perfect equivalence with respect to pupil factors and more perfect control of non-experimental factors, since departures from control in the several pairs of groups may tend to balance each other. It is probable, also, that the combined group of experimental pupils and the combined group of control pupils will be more representative of the pupils to whom the generalizations are to apply than one of the small groups would be.

124 It may be desirable to exclude one or more pairs of groups because of gross errors. For an example of such exclusion, see: Douglass, H. R., et al. "The Relative Effectiveness of the Problem and Lecture Methods of Instruction in Principles of Economics," University of Oregon Publication, Vol. 1, No. 7. Eugene, Oregon: University of Oregon, 1929, p. 290.
125 The next chapter describes the interpretation of experimental data with reference to these errors.

The combining of data in this way does not guarantee all this, however, since it is easily possible for a systematic error of measurement, validity, or sampling, or a lack of control of some non-experimental factor to run through the measures of all the groups and thus bias the combined results. For example, all of the teachers might be zealous for the new method of procedure that constitutes the experimental factor, since to be zealous for it is the mode. Again, all of the teachers might be unskilled in the use of the method because of its newness.
If the cooperative group were all selected from rural schools, or all from city schools, representativeness with respect to all children would not be increased by combining results. Experimentation by cooperation of teachers and schools is eminently desirable, but in order to secure dependable results, data from the cooperating groups should be combined with care if summation of faults is to be avoided.

CHAPTER III

THE INTERPRETATION OF DIFFERENCES IN GAINS

The general plan of handling experimental data. The general plan of handling experimental data may be illustrated by considering an experiment involving two groups, one an experimental group and the other a control group. The administration of the achievement test at the beginning of the experimental period 1 yields scores as follows:

For the experimental group $e_1, e_2, e_3, \ldots, e_n$, whose mean is $E_1$.
For the control group $c_1, c_2, c_3, \ldots, c_n$, whose mean is $C_1$.

The administration of the test at the end of the experimental period yields a second set of scores:

For the experimental group $e_1', e_2', e_3', \ldots, e_n'$, whose mean is $E_2$.
For the control group $c_1', c_2', c_3', \ldots, c_n'$, whose mean is $C_2$.

The mean gain in achievement made by the experimental group is $E_2 - E_1$ and is designated by the symbol, "Gain E." 2 The mean gain in achievement made by the control group is $C_2 - C_1$ and is labeled, "Gain C." The difference in gains, D, is equal to Gain E - Gain C.

1 Under certain conditions it is appropriate to omit this initial test.
2 This may also be obtained by calculating the individual gains and averaging them.

The problem of interpretation. The problem of interpretation is to determine the extent to which the difference in gains, D, may be due to imperfections in the experimental procedure and in the measures of achievement and, consequently, to determine the extent to which the experimenter is justified in interpreting D as indicating the merit of the experimental factor. The errors introduced in the measures of achievement by the imperfections of the experimental procedure and of the measuring instruments are of two kinds: variable and systematic. The effect of the variable errors is described in terms of the chances that, if they were eliminated, the difference would have the opposite sign. For example, assume that the obtained difference, D, is equal to 2.5. If the variable errors were eliminated, D would be different: possibly 4.2, possibly 6.4, possibly 0.7, possibly -1.2, possibly other values. The correct value cannot be calculated, but, if we have certain information about the magnitude of the variable errors, we can calculate the chances that the true value of D will fall within any interval. In view of the fact that D is an index of the merit of the experimental factor, it is obvious that we are primarily concerned with the chances that the true D may be negative. If the calculated D is positive, the true D is more likely to be positive than negative; hence, the experimental factor is more likely to be superior than inferior. However, the experimenter cannot make a very strong claim for the superiority of this factor unless the chances for the true D being positive are much greater than for it being negative. How many times greater they should be in order to justify a claim for the superiority is a matter of opinion. The chances are 3 to 1 in favor of the true difference being positive when the obtained difference, D, is equal to the probable error of measurement, $P.E._{\text{Meas. } D}$, and slightly greater than 10 to 1 when D is equal to twice $P.E._{\text{Meas. } D}$. It may seem that these chances, especially 10 to 1, are sufficient betting odds to justify a rather strong claim for the superiority of the experimental factor.
Undoubtedly they do justify some claim for superiority, but it is a common practice to require that they be at least 369 to 1 in order to call the difference statistically significant 3 with reference to the variable errors being considered. This condition is fulfilled when the difference is equal to or greater than 2.78 times the standard error of the difference or approximately 4.1 times the probable error of the difference.

It should be noted that in addition to determining the degree of significance of the obtained difference, D, as indicated in the preceding paragraph, it is necessary to consider the effect of the systematic errors due to imperfections in the experimental procedure and to imperfections in the measuring instruments used. When the experimenter desires to generalize from his results, he must consider also the extent to which the two groups of pupils are representative of the larger group for which he desires to express conclusions.

If the experimental group is assumed to be equivalent to the control group, the specific questions to be considered are:

1. What allowance 4 must be made for variable errors in the measures of achievement?
2. What allowance must be made for systematic errors of measurement not common to all groups of measures of achievement?
3. What allowance must be made for variable errors of validity in the measures of achievement?

3 "Significant" and "significance" are technical terms in the field of statistics.
4 The allowance for variable errors in the measures of achievement will be expressed in terms of a standard error of measurement, or ...

... $\sigma_{\text{Dist. } c}$.

In determining the standard error of measurement of Gain E or Gain C, one should insert the values of the standard errors of measurement of $E_1$ and $E_2$ or $C_1$ and $C_2$, obtained by the use of the above formulae, in the formulae below. 14 If one has determined the probable errors of measurement of $E_1$ and $E_2$, or $C_1$ and $C_2$, the formulae to be used are similar.
In place of each standard error substitute the cor- responding probable error in order to obtain the probable error of measurement of Gain E or Gain C. 15 °"Meas. Gain E = A/^Meas.E + 0"Meas. E ~~ 2r El £ 2 • 0Meas. E ' ^Meas-E \ 1 2 12 °"Meas. GainC — X/'o'Meas.c + °Meas. c — 2l*C 1 C 2 ' °Meas. c * ^Meas. c V 1 2 12 Standard errors of measurement of the mean gains may be com- puted in another way with equivalent results. To do so requires cal- culation of the individual gains by subtracting ei from e/, e 2 from e 2 ', Ci from Ci', and so on for all the individuals participating in the ex- periment. The standard error of the mean gain is then calculated by the appropriate formula below: 16 ^Distribution of Individual Gains V I T] 2 °"Meas. GainEor c = -v/N ^Distribution of Individual Gains V 1*12 ~ 1*12* °"Meas. Gain E or c = /^ "This procedure is justified only when the same test, or equivalent forms, are administered at the beginning and end of the experiment. When different tests are given, there is oppoitunity for difference in units and zero points that prevents computation of gains. When the use of the same test, or equiv- alent forms, is not feasible, comparison must be restricted to the final test means, E2 and C2, and the standard error of difference between these means computed by the formula: Meas. V* 2 M« (E 2 -C 2 ) 15 This statement applies also to formulae given later for cr M 1 °"(m-v) • < a (m v) ■ • a . , Gain C (m+3) D' The coefficients of correlation used in these formulae are, theoretically, those between the mean of initial test scores (Ei) and the mean of final test scores (E2) of a large number of similar experimental groups. The same is true for the control groups. Practically, the coefficient used is obtained by correlating the initial and final test scores of the experimental group to give r E E and the initial and final test scores of the control group to obtain r G c • F° r justification of this, see: 1 2 Kelley, T. L. 
Statistical Method. New York: The Macmillan Company, 1923, p. 178.

¹⁶ $r_{12}$ should be corrected to correspond with the standard deviation of the individual gains used. See page 61.

To determine the standard error of measurement of the difference in gains, D, one should insert the values of the standard errors of measurement of the Gains E and C in the formula below.¹⁷

$$\sigma_{\text{Meas. }D} = \sqrt{\sigma^2_{\text{Meas. Gain }E} + \sigma^2_{\text{Meas. Gain }C}}$$

¹⁷ One step of the total procedure may be eliminated by the use of the following formula:

$$\sigma_{\text{Meas. }D} = \sqrt{\sigma^2_{\text{Meas. }E_1} + \sigma^2_{\text{Meas. }E_2} + \sigma^2_{\text{Meas. }C_1} + \sigma^2_{\text{Meas. }C_2} - 2r_{E_1E_2}\,\sigma_{\text{Meas. }E_1}\,\sigma_{\text{Meas. }E_2} - 2r_{C_1C_2}\,\sigma_{\text{Meas. }C_1}\,\sigma_{\text{Meas. }C_2}}$$

For a derivation of this formula with respect to errors of sampling, see: Lindquist, E. F. and Foster, R. R. "On the Determination of Reliability in Comparing the Final Mean-Scores of Matched Groups," Journal of Educational Psychology, 20:102-106, February, 1929.

The comment might be made that these formulae neglect the correlation that may exist between the gains of the paired pupils. In other words, the expression $-2r_{g_e g_c}\,\sigma_{\text{Meas. Gain }E}\,\sigma_{\text{Meas. Gain }C}$, in which $r_{g_e g_c}$ is the coefficient obtained by correlating the distribution of individual gains of the experimental pupils with the distribution of individual gains of the control pupils, should also be included under the radical of the formula given above, or under the radical in the long formula just given. The authors just referred to justify its exclusion by the statement, "But since there can be no real correlation between the scores of one group and those of another ... may be omitted from the equation," p. 105. Coefficients of correlation are regularly obtained by correlating two distributions of measures of the same individuals. The uncertain conclusions of research on the effect of practice on individual differences would cause one to question the dependability of a coefficient obtained by correlating gains of paired individuals. Owing to the uncertainty of this correlation, the probable and standard errors obtained with the above formula are interpreted as "limits beyond which the true error cannot fall." For arguments in favor of the inclusion of this expression, see: Walker, H. M. "Concerning the Standard Error of a Difference," Journal of Educational Psychology, 20:53-60, January, 1929.

The following hypothetical example illustrates the use of the preceding formulae. It is assumed that equivalent forms of an achievement test were used whose coefficient of reliability, $r_{12}$, is equal to .85. It is also assumed that the correlations between the initial and final test scores have been computed for both groups, and the means and standard deviations of the four distributions obtained. These hypothetical values are:

$r_{12} = .85$        $E_1 = 73.32$    $\sigma_{\text{Dist. }e} = 7.60$
$r_{E_1E_2} = .71$    $E_2 = 76.25$    $\sigma_{\text{Dist. }e'} = 7.44$
$r_{C_1C_2} = .65$    $C_1 = 73.20$    $\sigma_{\text{Dist. }c} = 7.56$
$N = 25$              $C_2 = 74.12$    $\sigma_{\text{Dist. }c'} = 7.84$

Gain E = $E_2 - E_1$ = 76.25 - 73.32 = 2.93
Gain C = $C_2 - C_1$ = 74.12 - 73.20 = .92
D, the difference in gains = Gain E - Gain C = 2.93 - .92 = 2.01

$$\sigma_{\text{Meas. }E_1} = \frac{7.60\sqrt{1 - .85}}{\sqrt{25}} = .5887$$

$$\sigma_{\text{Meas. }E_2} = \frac{7.44\sqrt{1 - .85}}{\sqrt{25}} = .5763$$

$$\sigma_{\text{Meas. }C_1} = \frac{7.56\sqrt{1 - .85}}{\sqrt{25}} = .5856$$

$$\sigma_{\text{Meas. }C_2} = \frac{7.84\sqrt{1 - .85}}{\sqrt{25}} = .6073$$

$$\sigma_{\text{Meas. Gain }E} = \sqrt{(.5887)^2 + (.5763)^2 - 2 \times .71 \times .5887 \times .5763} = .4438$$

$$\sigma_{\text{Meas. Gain }C} = \sqrt{(.5856)^2 + (.6073)^2 - 2 \times .65 \times .5856 \times .6073} = .4994$$

$$\sigma_{\text{Meas. }D} = \sqrt{(.4438)^2 + (.4994)^2} = .6681 \text{ or } .67$$

Since the difference in gains, which is 2.01, is three times as large as the standard error of measurement of the difference, which is .67, the following interpretation is justified.
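Before the interpretation is stated, the arithmetic of the hypothetical example may be checked by a short computation. The Python sketch below is offered only as an illustration: the function and variable names are hypothetical, and the conversion of the ratio of D to its standard error into "chances" rests on the added assumption that the errors of measurement are normally distributed.

```python
import math
from statistics import NormalDist


def se_meas_mean(sd_dist, reliability, n):
    """Standard error of measurement of a mean: sd * sqrt(1 - r12) / sqrt(N)."""
    return sd_dist * math.sqrt(1.0 - reliability) / math.sqrt(n)


def se_meas_gain(se_initial, se_final, r_initial_final):
    """Standard error of measurement of a mean gain (final mean minus initial mean)."""
    return math.sqrt(se_initial ** 2 + se_final ** 2
                     - 2.0 * r_initial_final * se_initial * se_final)


def se_meas_difference(se_gain_e, se_gain_c):
    """Standard error of measurement of D, the difference in gains."""
    return math.sqrt(se_gain_e ** 2 + se_gain_c ** 2)


def odds_same_sign(ratio):
    """Chances (to 1) that the true difference has the sign of the observed one.

    Assumption: normally distributed errors; `ratio` is D divided by its
    standard error of measurement.
    """
    p = NormalDist().cdf(ratio)
    return p / (1.0 - p)


# Hypothetical values taken from the example in the text.
r12, n = 0.85, 25
r_e, r_c = 0.71, 0.65
mean_e1, mean_e2, sd_e1, sd_e2 = 73.32, 76.25, 7.60, 7.44
mean_c1, mean_c2, sd_c1, sd_c2 = 73.20, 74.12, 7.56, 7.84

se_e1 = se_meas_mean(sd_e1, r12, n)
se_e2 = se_meas_mean(sd_e2, r12, n)
se_c1 = se_meas_mean(sd_c1, r12, n)
se_c2 = se_meas_mean(sd_c2, r12, n)

se_gain_e = se_meas_gain(se_e1, se_e2, r_e)
se_gain_c = se_meas_gain(se_c1, se_c2, r_c)
se_d = se_meas_difference(se_gain_e, se_gain_c)

d = (mean_e2 - mean_e1) - (mean_c2 - mean_c1)
ratio = d / se_d

print(f"D = {d:.2f}; standard error of measurement of D = {se_d:.4f}")
print(f"D / SE = {ratio:.2f}; chances of same sign = {odds_same_sign(ratio):.0f} to 1")
```

Carried to four decimals, the standard errors produced by this sketch agree with those of the example, and under the normal assumption a ratio of exactly 3.0 corresponds to chances of about 740 to 1.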
Considering only the variable error of measurement and assuming that errors due to faulty equivalence, failure to control external non-experimental factors, and departure from validity of the measuring instruments have been eliminated, or otherwise accounted for, then for the groups concerned, and only for the groups concerned, the difference in achievement indicates the superiority of the status of the experimental factor prevailing in the experimental group. Subject to the limitations just expressed, the probability that the observed difference has the same sign, or is in the same direction, as the true difference is greater than the ratio 740 to 1.¹⁹ Stated in another way, if the experiment could be repeated with the same groups, under the same conditions, the chances of obtaining another observed difference of the same sign, or in the same direction, are greater than the ratio 740 to 1.²⁰

The example given illustrates the calculation when obtained scores and the standard errors are used. Other examples might have been given using regressed scores with the standard errors, obtained scores with the probable errors, or regressed scores with the probable errors. Although the calculation of these is similar, care must be taken to use the appropriate formulae. The following table gives the chances of statistical significance of differences that are a given number of times larger than the standard or probable error of the differences. The second column gives the chances of the true difference falling within the range, plus and minus, of the probable or standard error of the difference. This interpretation is less applicable to experimentation than that given in the third column. The experimenter is most interested, not in the magnitude of the observed difference, but in the probability that the observed difference has the same sign as the true difference. When these chances are great, 369 to 1 or better, the experimenter is justified in asserting that the variable errors of measurement do not destroy the dependability of a conclusion in favor of the superiority of the experimental factor.²¹

¹⁹ As has already been explained, the standard or probable error obtained by the procedure outlined is regarded as a limit. If it were feasible to obtain a reliable coefficient for the small amount of correlation that may exist between the gains of the paired pupils and thus to arrive at a more accurate and an always smaller standard error, the chances of statistical significance would of course be greater.

²⁰ The comment might be made in regard to this interpretation that repetition under the same experimental conditions with the same groups should secure differences not only of the same sign, but of the same magnitude. Identical differences would be secured with identical conditions and groups, if it were not for the unreliability of the initial and final tests. The standard and probable errors of measurement of a difference allow for this unreliability and nothing else.

Table I. Chances of Statistical Significance of a Difference

Difference               The chances that the true          The chances that the true
                         difference does not differ         difference has the same sign,
                         from the observed difference       or is in the same direction,
                         by more than the indicated         as the observed difference.
                         amount.

D = $\sigma_D$           2.15 to 1                          5.3 to 1
D = 2 $\sigma_D$         ...                                ...
D = 2.78 $\sigma_D$*     ...                                369 to 1
D = 3 $\sigma_D$         ...                                740 to 1
D = 4 $\sigma_D$         15,772 to 1                        31,545 to 1
D = P.E.$_D$             1 to 1                             3 to 1
D = 2 P.E.$_D$           4.6 to 1                           10.3 to 1
D = 3 P.E.$_D$           22 to 1                            45 to 1
D = 4 P.E.$_D$           142 to 1                           286 to 1
D = 5 P.E.$_D$           1,341 to 1                         2,684 to 1

*This multiple of the standard error of difference appears in McCall's formula for the experimental coefficient:

$$\text{E.C.} = \frac{\text{Difference}}{2.78 \times \sigma_{\text{Difference}}}$$

When the expression is equal to 1.0, the chances that the true difference has the same sign are in the ratio of 369 to 1.
McCall uses this as the critical point below which differences should not be recognized as significant. If the chances are greater than 369 to 1 then the difference is to be recognized as significant. The statement, "An experimental coefficient of 1.0 is just exactly practical certainty. An experimental coefficient of .5 means half certainty, one of 2.0 means double certainty and so on," is not very meaningful since it is impossible to multiply certainty. See: McCall, W. A. How to Measure in Education. New York: The Macmillan Company, 1922, pp. 404-405.

²¹ See footnote on page 60.

The allowance for variable errors of validity. Achievement is not a unitary thing. It includes three types of controls of conduct: (1) specific habits; (2) knowledge; (3) general patterns of conduct. In a given case the achievement to be considered may be restricted to only certain elements under one of the rubrics. For example, in an experiment to determine the relative merits of two methods of teaching addition, the achievement to be measured might be restricted to the skills (specific habits) that function in doing examples of addition of a specified type. In an experiment to determine the relative effect of certain methods of teaching English literature in the high school, the achievement to be measured might be restricted to changes in the interest of the pupils in reading literature of a specified type. On the other hand, when the problem of an experiment asks concerning the effect of an educative factor without any restrictions, there is the implied requirement for measuring all elements of the resulting pupil achievement, which may include specific habits, knowledge, and general patterns of conduct.
In order for an achievement test or a group of such tests to yield results that are valid for a given experiment, it must measure, either directly or indirectly, all of the elements of the achievement or a representative sample of all of the achievements specified or implied by the statement of the problem of the experiment.

The allowance for the variable errors of validity can be calculated if the coefficient of validity is known. In order to obtain this coefficient it will, of course, be necessary to have a valid criterion measure of the achievement specified by the problem of the experiment. If such measures were available for the pupils in the experiment, they would be used, and the question of validity would be eliminated. This will seldom be the case, but it may happen that the test used has been validated previous to its use in the experiment by calculating the coefficient of correlation between the scores it yields and the valid criterion measures. If this coefficient, $r_{1C}$, is known and the standard deviations on which it is based are approximately equal to the standard deviation of the obtained scores, then the gross²² allowance for variable errors of validity (validity and measurement) may be calculated by the following formulae:

$$\sigma_{(m+v)\,E_1,\ E_2,\ C_1,\text{ or }C_2} = \frac{\sigma_{\text{Dist.}}\,\sqrt{1 - r_{1C}}}{\sqrt{N}}$$

$$\sigma_{(m+v)\text{ Gain }E} = \sqrt{\sigma^2_{(m+v)E_1} + \sigma^2_{(m+v)E_2} - 2r_{E_1E_2}\,\sigma_{(m+v)E_1}\,\sigma_{(m+v)E_2}}$$

°"(m+v) Gain c \'0"(m+v) c + °"(m+v) c — 2r ClC2