Qualities Related to Success in Teaching

By Frederic Butterfield Knight, Ph.D.

Teachers College, Columbia University, Contributions to Education, No. 120

Published by Teachers College, Columbia University
New York City
1922

Copyright, 1922, by F. B. Knight

ACKNOWLEDGMENTS

The writer congratulates himself upon the fact that for many happy months he was a student of Dr. E. L. Thorndike, of Teachers College, to whose genius and leadership an ever-growing army of students owe their greatest inspiration. He wishes to thank the teachers whose unusual patience and forbearance made the gathering of the data for this study possible, and to express also his deep obligation to Dr. George D. Strayer and Dr. William C. Bagley, of Teachers College.

PREFACE

The only variation from current practice in the organization of the subject matter of this study is the changing of the position of the conclusions. The writer has stated his conclusions at the very beginning of the study. This was done to enable the busy reader to see at a glance the general trend of the study.

As the practical school administrator knows, the gathering of the kind of data upon which this study is based is a matter of some difficulty. It is the kind of data, however, which we must have if science is to aid in the selection of teachers.

Some psychologists may contend that an analysis of teaching, rather than the correlation of observable facts with varying amounts of success of actual teachers, is the only correct method of determining what tests will distinguish good from poor teachers. No one would deny the value of sagacious insight into any problem of human engineering. So far neither the analysis method nor the correlation method has done very well in practice. This study is based on the correlation method.
Its shortcomings should not be confused either with the logical soundness or with the practical superiority of test construction on a basis of correlation between test scores and performance.

Frederic B. Knight.

CONTENTS

Conclusions Based on This Study
I The History of Teacher Rating
II The Method and Data Involved in This Study
III Measurable Facts Related to General Teaching Ability
IV The Relative Significance of the Qualities Measured
V The Relation Between Specific Traits When Several Judges Rate the Same Teachers in Those Traits

CONCLUSIONS BASED ON THIS STUDY

This thesis deals with the problems of isolating the significant and measurable qualities of effective teaching and the methods of measuring these qualities. It is a continuation of similar studies of which the work of Meriam was the first. A rating of 153 high-school and elementary-school teachers was obtained by having the teachers rate each other for the quality of general teaching ability and other traits. While it may be said that the teachers knew each other only in a social way and, therefore, could not rate each other for general teaching ability, the data show that an adequate rating can be procured by this method. The statistical treatment of the data shows that:

1. Chance halves of the mutual ratings of the teachers correlate with each other +.899 ± .01.
2. The mutual ratings of the teachers correlate with the ratings of the supervisors +.962 ± .001.
3. The mutual ratings of the teachers correlate with pupils' estimates +.681 ± .05.
4. There is also substantial agreement among the mutual ratings of teachers, when they rate for specific traits, such as intellectual strength and skill in discipline. The average correlations between chance halves of the ratings are respectively +.879 ± .016 and +.838 ± .023.

These correlations are evidences of the ability of teachers to rate each other.
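The chance-half correlations listed above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the study: each rating sheet is a list of ranks (with `None` for an omitted teacher), the two halves are summarized by simple average ranks rather than by the scaled values developed in Chapter II, and the ± figures are read as probable errors computed by the formula customary at the time, 0.6745(1 - r²)/√n. All function names are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def mean_ranks(sheets):
    """Average rank each teacher received; None marks an omitted rating."""
    n_teachers = len(sheets[0])
    result = []
    for t in range(n_teachers):
        given = [sheet[t] for sheet in sheets if sheet[t] is not None]
        result.append(sum(given) / len(given))
    return result


def chance_half_correlation(sheets, seed=0):
    """Split the sheets into two halves by chance and correlate the halves."""
    shuffled = sheets[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return pearson(mean_ranks(shuffled[:half]), mean_ranks(shuffled[half:]))


def probable_error(r, n):
    """Assumed reading of the +/- figures: probable error of r for n cases."""
    return 0.6745 * (1 - r * r) / sqrt(n)
```

A teacher who happens to receive no rating at all in one half would make `mean_ranks` fail; the sketch leaves that case unhandled on purpose.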
The ratings for general teaching ability, secured in this way, were used as measures of teaching merit, against which objective facts were correlated. The correlations between general teaching ability and age, amount of experience, quality of handwriting, intelligence as measured by test, major academic interests, normal-school scholarship, amount of professional study during active service, and ability to pass a professional test have been secured. The correlations are too low to warrant one in using these factors for prognostic purposes, except ability to pass a professional test (+.541), normal-school scholarship (+.153) and intelligence (+.108). By using the coefficient of partial correlation we find, in the case of elementary-school teachers, that, the factors of intelligence and normal-school scholarship being constant, there is a mutual relationship of +.57 between ability to teach and ability to pass a professional test. Professional tests may be used to estimate probable success in teaching.

The amount of professional study accomplished during active service is also indicative of success in teaching. The number of teachers who had accomplished professional study of this sort was too small in the groups which were studied to allow an accurate determination of the degree of significance that professional study has.

In the case of high-school teachers, intellectual differences, as determined by mental tests, appear to be significant. For the selection of high-school teachers the use of mental tests would be of value.

These data, as a whole, may be interpreted to mean that the general factor of interest in one's work becomes the dominant factor in determining one's success in teaching. The reasoning which leads to this conclusion is not straightaway, for we have not as yet objective tests of interest.
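The partial correlation mentioned in the conclusions above, with two factors held constant, can be sketched as follows. This is the standard recursive formula for partial correlation, not a transcript of the author's computation. The intercorrelations among the three predictors are not given in this part of the text, so the values marked hypothetical below are placeholders, and the printed result is not expected to reproduce the reported +.57.

```python
from math import sqrt


def partial_first_order(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_second_order(r, x, y, z, w):
    """Correlation of x and y with z and w held constant.

    r maps frozenset({a, b}) to the zero-order correlation of a and b.
    """
    def get(a, b):
        return r[frozenset((a, b))]

    r_xy_z = partial_first_order(get(x, y), get(x, z), get(y, z))
    r_xw_z = partial_first_order(get(x, w), get(x, z), get(w, z))
    r_yw_z = partial_first_order(get(y, w), get(y, z), get(w, z))
    return partial_first_order(r_xy_z, r_xw_z, r_yw_z)


if __name__ == "__main__":
    r = {
        frozenset(("teaching", "test")): 0.541,          # reported in the text
        frozenset(("teaching", "scholarship")): 0.153,   # reported in the text
        frozenset(("teaching", "intelligence")): 0.108,  # reported in the text
        frozenset(("test", "scholarship")): 0.30,          # hypothetical
        frozenset(("test", "intelligence")): 0.30,         # hypothetical
        frozenset(("scholarship", "intelligence")): 0.40,  # hypothetical
    }
    print(partial_second_order(r, "teaching", "test",
                               "scholarship", "intelligence"))
```

When the two controlled factors are uncorrelated with everything else, the partial coefficient reduces to the ordinary correlation, which is a convenient check on the arithmetic.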
We do know, however, that other measurable traits, either alone or in combinations, are not adequate explanations of teaching success. With our present knowledge it is reasonable to suppose that genuine interest in one's work accounts for a large part of teaching success.

In the second part of the study data are presented which show the spread of general estimate to particular traits, when judgments or ratings are made. For example, when a judge attempts to rate a teacher in some particular trait, his rating is a defense of his general estimate of that teacher, as well as a rating of the trait under consideration.

The mutual judgments of teachers for the trait, intellectual ability, correlate with their judgments of general teaching ability +.935 ± .014.

The mutual judgments of teachers for the trait, skill in discipline, correlate with their judgments for general teaching ability +.789 ± .001.

The mutual judgments of teachers for the trait, skill in discipline, correlate with their judgments for intellectual ability +.863 ± .080.

It would be difficult to hold that these correlations represent the true relationships which exist between these pairs of traits. The presence of a large factor of spread of general estimate accounts best for the size of these correlations.

A study of the correlations between the ratings of 126 teachers in a New York school system for 15 traits showed that 105 of the 120 correlations studied could be accounted for by chance variation from an average correlation, even if a perfect, or a 100 per cent, spread of general estimate was present.

A study of the correlation between qualities of teaching as presented by Boyce in his work, published in the Fourteenth Yearbook of the National Society for the Study of Education, shows that 85 per cent of the correlations come within a range of ±.150.
These facts can be satisfactorily explained only when a factor of spread of general estimate is allowed. It seems fair to conclude, therefore, that in judging particular traits general estimate influences the particular estimate to such a degree that judgments of particular traits are in themselves of little practical use.

CHAPTER I

INTRODUCTION

This thesis¹ lies in the field of research which is concerned with methods of rating teaching, of determining the significant factors in teaching ability, and of measuring objectively such factors. This field of research is by no means a virgin one, nor is it one of academic interest only. Practical school administrators have no more important and, at present, no more troublesome problems than those which are grouped around the technique of selecting and rating teachers. Actual isolation of such factors as intellect and temperament, which are indispensable to successful teaching, and the discovery of a method of determining whether a prospective teacher possesses the indispensable qualities of a good teacher would be a boon to school administrators.

During the past fifteen years educators and psychologists have given their earnest attention to this series of problems, on which a great deal has been written and on which much research work has been done. Three studies have been selected to show the general development of attempts which have been made to find solutions to different phases of the personnel-management problems of our public schools.

MERIAM'S STUDY

Dr. J. L. Meriam, in a research study, Normal School Education and Efficiency in Teaching, published in 1906, Teachers College Contributions to Education, No. 1, Chapter IV, presented data which were used to discover the correlation between teaching efficiency and scholarship in the normal school. "This is the problem," said Dr. Meriam. "Is the efficient teacher the proficient scholar? To what extent is he so in each of the subjects of the normal-school course?
In other words, does the one who stands high among fellow-teachers stand relatively high among fellow-students in the work preparatory to his teaching? Such a study of mental relationships is in itself a study of causes. If it be found a rule that efficiency in teaching follows proficiency in scholarship, then, other things being equal, the latter may be considered a vital contribution to the former. And this is our present purpose: to discover, so far as possible, what elements enter into the making of a capable teacher. Corollary questions are: To what extent does proficiency in scholarship mean efficiency in teaching? . . ."

¹ All data used in this study are on file at Teachers College, Columbia University, New York City.

In Dr. Meriam's research study an admirable attempt was made to find out the relative teaching ability of a large number (1,185) of normal-school graduates. Equally careful work was done to determine the relative normal-school success of these graduates. Meriam had no accurate measure of teaching efficiency and no reliable measure to equate the amount of success in one school system with the amount of success in another. He encountered the same difficulty in interpreting normal-school marks as measures of scholastic accomplishment. Great statistical ingenuity was shown by Dr. Meriam, and his results were by all odds the most dependable at the time of the publication of his thesis.

The correlation between normal-school standing and ability or success in the field was found to be so surprisingly low that differences in scholarship among students in the normal schools seemed to bear a negligible relation to future differences in teaching ability. Meriam found that practice teaching during normal-school training was slightly prophetic of the quality of teaching which should be expected after graduation.
Examinations containing professional subject-matter did not appear to furnish a significant index of an individual's ability to teach.

The statistical difficulties of Meriam's work should not blind one to its value. It clearly stated the problem of correlating teaching ability with factors which are more or less objective and measurable. It developed a technique of research that was sound in theory. It exercised much influence in taking the problem of teaching efficiency from the field of opinion and discussion and in placing it where it properly belongs, namely, in the field of research and objective measurement.

Meriam's more important findings are expressed as coefficients of correlation between teaching efficiency and scholarship in normal-school studies. These he reports as follows:

Correlation between Teaching Ability and Practice Teaching                        +.39
Correlation between Teaching Ability and Psychology                               +.37
Correlation between Teaching Ability and History and Principles of Education      +.28
Correlation between Teaching Ability and Method Courses                           +.29
Correlation between Teaching Ability and Academic Courses                         +.22

Meriam's data also support his conclusion that, after the first year of teaching, experience, as such, has little if any influence on the improvement of teaching efficiency.

ELLIOTT'S STUDY

In 1910 another treatment of this general subject was published which deserves notice. Dr. Edward C. Elliott presented to the second annual convention of city superintendents in Wisconsin "A Tentative Scheme for the Measurement of Teaching Efficiency." This score card has been revised in detail, but the first scheme included all the essential factors.
Elliott stated these three propositions which were of more than temporary importance:

"Is it possible to devise and to apply to the teaching process, impersonal, quantitative standards, whereby the relative worth and efficiency of teachers may be determined more justly and with greater precision than under the ordinary practices of the day?

"Does not the effective organization, administration, and supervision of public schools require that the conditions and results of the teachers' work be subjected to measurements of a quantitative rather than a qualitative nature?

"Is it possible for the present generation to make any reliable and satisfactory conclusions concerning the direction and rate of educational progress without standards of value resting upon a quantitative basis?"

The scheme divides teaching efficiency into seven sections and to each section assigns a weight or value. The scheme, in summary, is here reproduced:

I.   Physical Efficiency          12 points
II.  Moral-nature Efficiency      14 points
III. Administrative Efficiency    10 points
IV.  Dynamic Efficiency           24 points
V.   Projected Efficiency          6 points
VI.  Achieved Efficiency          24 points
VII. Social Efficiency            10 points
     Total                       100 points

The value of this scheme is that attention is directed to particular traits and that diagnosis of teaching merit is stimulated. The suggested values are, of course, matters of opinion. The assumption that analysis of the teacher and of the judgment of particular qualities, studied in isolation, can be made is highly questionable.

BOYCE'S STUDY

In addition to Meriam's study the only other research of extensive nature is that made by A. C. Boyce¹ and published under the title "Methods of Measuring Teachers' Efficiency," Part II of the Fourteenth Yearbook of the National Society for the Study of Education. Boyce obtained the rating of a great many teachers for general merit and for specific qualities.
Then, by a method of correlation, he worked out the relative significance of the qualities. There are many technical improvements in this study over Meriam's, but the general procedure is the same.

For fifteen years the teaching profession has been sensitive to problems of recruiting new members. As yet, however, no one knows the exact formula for success in teaching. The complexity of personality and character and the many-sidedness of teaching have continually baffled useful analysis. We know that several measurable traits are not essential to successful teaching, but we do not know what traits must be present in superior instructors.

The inspiring advance in the application of psychological methods to the selection of clerks, stenographers, machine operators, and fliers in industry, together with similar success in vocational guidance in professional education, such as engineering and dentistry, increases our confidence in the hope that before long psychology will enable school administrators to select teachers with frequency and size of error far smaller than prevails at present.

¹ For a discussion of Boyce's study, see the last part of Chapter V.

CHAPTER II

METHOD AND DATA INVOLVED IN THIS STUDY

An accurate rating of a sufficiently large number of teachers for general teaching ability must be obtained before any analysis of the significant qualities of teaching is possible. We must know who the good teachers are, who the poor teachers are, and who the fair teachers are, before it is worth while to attempt to find out what facts are pertinent in judging their teaching skill. After we get a group of teachers who we know differ among themselves in general teaching ability, by certain amounts or units, then we may proceed, by a method of correlation, to find out what facts about them are of prognostic or diagnostic value.
Such a rating of general teaching ability for 156 grade and high-school teachers who were at work in the public schools of Towns A, B, and C, in Massachusetts, during the school year 1918-1919, has been obtained. There were six groups of teachers. Three of these groups were the grade teachers in Towns A, B, and C, and three were the high-school faculties in these towns. The number of teachers in each group follows:

          Grade teachers    High-school teachers
Town A          53                  15
Town B          35                  13
Town C          30                  10

Three separate ratings for general ability in teaching were obtained for each group. One rating was secured from the supervisors in each system for their respective teachers. Another was secured by the mutual judgments of the teachers themselves of each group. Another was secured by a consensus of pupils' opinions.

In general the method which was used in deriving the ratings was to have the several judges rate each teacher in the group relative to the other members of the group for the broad quality, general ability as a teacher. The theory which underlies this method is this: Where direct measurement in terms of amount is impossible, measurements by relative position in a series may be so controlled that possibly as exact and as true ratings may be obtained as if units of amount had been used. It is assumed that the amount of difference between two teachers who have been thus judged will depend on the ease with which the differences are observed by competent judges.

In using this general method of rating teachers for teaching ability, we have taken it for granted that the good teacher is the one whom competent judges rate as good. We hold throughout this study that the poor teacher is the one whom the judges have rated as poor. These hypotheses will presumably be acceptable to those who are familiar with the theory and practice of social measurements.
It may be admitted that one could question the final truth that the opinions of any number of judges, however competent and harmonious they might be, necessarily establish the facts of teaching merit. Thus the really good teacher, it might be held, is the one who gets her pupils on fastest. To determine how much the progress of pupils is due to any one teacher is not possible by any method or information that is as yet available. Even to measure a pupil's total progress, much less the total progress of a class, is as yet a little venturesome.

It might also be held that the amount of development of character and morality in the pupils is the only test of good teaching and that what others think about the teacher is really irrelevant.

To hold, on the other hand, that competent judgments of teachers, when properly combined, will give a very useful and approximately true rating, as well as probably the best rating method that is now available, is only common sense. This rating method is entirely defensible. The good lawyer, after all, is the one who is considered a good lawyer by fellow-members of the bar. The poor dentist is the one to whom no other dentist would go or recommend anybody else. The great preacher is the one who attracts visitors. The good teacher is the teacher who is thought to be good.

Where differences in skill among employed people must be determined, judgments in terms of better than the average, poorer than one's associates, and similar expressions, are useful measures of ability. Of course, the final validity of the judgments may be lessened by the presence of constant error in the opinion that is offered, or by the incompetence or paucity of the opinions expressed, or by the failure properly to combine the judgments after they are obtained.
Of the three ratings (by the teachers themselves, which is labeled "A"; by the supervisors, "B"; by the pupils, "C"), we shall take up first the ratings of the teachers which are indicated by the judgments of their fellow-teachers.

PROCESS OF RATING TEACHERS, BASED UPON THE TEACHERS' ESTIMATES

Step 1. Teachers' meetings for each group were called and the teachers were asked to rate each other for general teaching ability, using the relative-position method. The ratings were not in terms of good, fair, poor, because what one teacher might consider good, another teacher who was more critical might consider only fair. This type of difference might run through the series of judgments. The ratings were not secured in terms of how much below the best teacher you have ever known, or the equivalent expressions, for errors of an obvious nature are bound to creep into any such rating system. The ratings were all given in terms of relative position within the group itself. Thus, when the grade teachers of Town A rated each other, every teacher placed in order of merit all the teachers in the Town A group. The amount of difference between the teachers in the final rating was determined by combining all the judgments of the teachers. Each member of the six groups of teachers, while in a teachers' meeting, rated those in the group to which she belonged in a similar fashion and under similar conditions with the same instructions. The instructions which were given to the teachers follow:

INSTRUCTIONS TO TEACHERS

On this sheet you are requested to give certain ratings of each teacher in the list, including yourself. Please rate every teacher and please be absolutely frank in your ratings. You need not sign your name. Nobody will ever know how you or anybody else rated him. No personal use will ever be made of any of these ratings.
They will be used in a purely scientific study to determine the significance of age, education, early interests, etc., for success as a teacher. The names will all be cut off and destroyed as soon as the different items in the inquiry have been numbered to fit the ones to whom they refer.

Also, do not feel disturbed because in each respect somebody has to be rated lowest. These ratings are all relative, and the lowest teacher in the group may well be of very great ability.

Please be sure to record ratings, even if they seem to you to be little better than mere guesses. The opinions of twenty men give a useful rating, even if any one of the twenty taken alone is almost worthless.

On the sheet is a list of the teachers. Choose the teacher of greatest teaching ability and write 1 after his or her name in Column 1. Choose the teacher next below in teaching ability and write 2 after his or her name in Column 1. Write 3 after the name of the one next below in teaching ability, and do so for 4, 5, 6, etc. If two or more seem absolutely equal in teaching ability give them the same rating.¹

After the teachers had read the instructions carefully a few minutes were allowed them for asking any questions that might occur. When it was clear that the teachers understood what was wanted of them, they proceeded with the rating. No names were signed to the rating sheets. It was evident that honest and sincere opinions were expressed.

The resultant ratings of each of the six groups of teachers were then examined. Those sheets which were incomplete or did not sufficiently distribute the ratings were discarded. The amount of unusable material was not at all great. The teachers found that rating each other was a method of polite gossip and was evidently more or less enjoyable. For each set of teachers sufficient material was obtained. The spread or range between the poorest and the best teacher was large.
In many cases it was as great as the number of teachers involved.

The useful ratings (97 in number) were distributed as follows:

          Grade teachers    High-school teachers
Town A      30 ratings          14 ratings
Town B      16 ratings          10 ratings
Town C      18 ratings           9 ratings
Total       97 ratings

¹ The complete instructions are given on pp. 46-48.

Step 2. Each set of ratings was then divided into two halves by chance drawings. Each half has been treated separately throughout this study. These halves will be referred to as Group A and Group B. The carrying of two groups makes possible corrections for attenuation in the correlations and shows also the reliability of the judgments.

A transcript (see Table I) of fifteen of the ratings for general teaching ability of Town A grade teachers has been made. These ratings compose one group (Group B) of the mutual judgments which is treated later to get a single rating of teachers. The columns include the complete ratings of fifteen different judges. Thus, by looking at the first column one will see that one judge rated Ah-- as the 21st best teacher of the group, Dr-- as the 6th best, Sm-- as the best, Hi-- as the 3rd, El-- as the 21st, and so on. The numbers opposite each teacher's name show the ratings received. Thus, Ah-- was rated by one teacher as 21st, by another as 2nd, by another as 1st, by another as 11th, by another as 1st, and so on. In this way, we have approximately 750 ratings of teachers in a group in terms of relative worth. The ratings by fifteen other teachers of the teachers of this group were similarly obtained and transcribed. We have the judgments of every teacher on every teacher.

Occasionally a judge failed to rate some teacher. This was due to the lack of acquaintance with that teacher. This kind of omission is an index of thoughtful estimate, because it indicates the fact that a judge who had no opinion gave no rating rather than record a mere guess.
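The correction for attenuation mentioned in Step 2 can be sketched in its simplest form, Spearman's formula, in which an observed correlation is divided by the square root of the product of the two reliabilities. The sketch assumes, without confirmation from the text, that the chance-half correlation of each measure is taken as its reliability coefficient; the function name and the sample figures are hypothetical.

```python
from math import sqrt


def correct_for_attenuation(r_xy, reliability_x, reliability_y):
    """Estimate the correlation between two measures freed of chance error.

    reliability_x and reliability_y are assumed here to be the chance-half
    correlations of each measure with itself (Group A against Group B).
    """
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / sqrt(reliability_x * reliability_y)


if __name__ == "__main__":
    # Hypothetical figures: an observed correlation of .60 between two
    # ratings whose chance halves correlate .90 and .80 respectively.
    print(round(correct_for_attenuation(0.60, 0.90, 0.80), 3))
```

The corrected value can exceed 1 when the reliabilities are low and the observed correlation is high; such a result signals sampling error rather than a real relationship.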
The trustworthiness of these judgments will be discussed later (see page 17).

Step 3. We now have the relative ranking of each teacher in the opinion of fifteen judges. This is a chance half of all of the ratings; namely, Group B. Group A was similarly obtained. The next step is to combine these two groups of ratings into a single rating. The theory which underlies the procedure requires some explanation. We know several facts about the resultant and combined rating, even before we make it.

First, the final relative arrangement will not be the result of any one individual judgment, since it will be the product of the ratings of fifteen judges. The bias of any single person who has served as a judge will not operate unduly to influence the final result. The fifteen sets of ratings were chosen by chance, and chance errors of overestimation or underestimation of any teacher, because of the particular friendship of or dislike for any teacher by a particular judge, will be offset by opposite chances.

Second, except for a negligible chance in the drawing of the fifteen ratings for Group A or Group B, there will be no constant error due to the fact that the judges may know some of the teachers very well and others only slightly. No one judge will know all the teachers equally well; but the teachers who are well known to some of the judges will be those who are not so well known to others. Then, too, those teachers who are little known to some judges will be well known to others. Thus, intimate knowledge and lack of knowledge on the part of those who are judging the teachers will be somewhat evenly spread over the whole list.

Third, a fairly minute scale will be possible, because the ratings are spread about as widely as the number of teachers who have been judged. Many of the ratings which were presented by the teachers had a spread of over 40, and the number of teachers who had been judged was 52.

In any scale which is based on relative position, it is well known that absence of personal bias, absence of constant error, and a large number of judges are the chief desiderata. In these ratings such desiderata are present.

TABLE I
A Transcript of Fifteen Ratings for General Teaching Ability
Group B, Town A, Elementary Grade Teachers¹

No.   Teacher   Ratings by Judges 1 to 15
1     Ah--      21 2 1 11 1 21 1 1 1 1 2 11 2 7
2     Dr--      6 1 3 1 10 2 2 2 1 3 52 2
3     Sm--      1 3 1 1 2 18 2 2 1 5 46 1
4     Hi--      3 15 16 3 4 2 1 7 19 3
5     El--      211 3 8 25 2 26 1 2 2 4 2 2 15 22 8
6     Co--      23 5 5 1 8 1 6 2 3 22 1 34 32 20
7     Sy--      15 2 13 1 17 2 1 2 4 24 2 21 16 . .
8     Pa--      11 14 6 17 1 13 1 4 2 3 12 5 1 11 6
9     Sp--      10 14 10 14 1 18 11 8 2 2 2 3 6 15 2
10    Le--      28 1 2 1 11 7 2 4 2 1 17 17 15

¹ It was proper to give two teachers the same rating if their ability seemed equal.

Better-Worse Judgments Based on Table I

Read: 1 is judged worse than 2, 7 times with 0 equal or tied votes; 1 is judged worse than 3, 6 times with 1 equal or tied vote, and so on.

Columns a, b, e, f refer to teachers by number. Columns c, g record tied votes. Columns d, h record worse votes.

[The entries of this table are not legible in the source copy.]

THE LOGIC OF THE METHOD

We must have at our command a technique, not only of changing measures of relative position into measures of units of amount, but also of combining incomplete judgments of relative positions into units of amount. Our problem, in simplified form, is to change differences which are noticeable to competent judges into differences of units of amount.
A concrete application follows: If, in the opinion of judges, person A is better than person B, we must find out how much better. Suppose that, in judging A and B, we have ten opinions. If five of the judges think that A is better than B and five think that B is better than A, then we are justified in calling the matter a draw and ruling that A and B are equal. Suppose, however, that six think A better than B, and only four think B better than A; then we are justified in holding that A is better than B. The question now is: By how much is A better than B? The percentage of judges who notice a difference becomes the basis of our procedure. It is reasonable to suppose that differences which are noticed equally often are equal in amount. We assume that all judgments are of equal value. Further, we arbitrarily define as one unit of difference that difference which 75 per cent of the judges notice. Thus, if 100 judgments are made comparing A and B and 75 vote A to be better than B, then A is better than B by one unit.

When the data are incomplete, the only thing to do is to compare those judgments which are complete and disregard the judgments which are not paralleled with similar judgments of the person compared. As an illustration, let us turn back to the judgments of the Town A grade teachers (Table I). In comparing Ah-- with Dr--, we shall neglect the fifth judgment of Ah-- because it is not paralleled with one of Dr--, but we shall not neglect it in comparing Ah-- with El--, for both have ratings given by the same judge.

We also give to the teacher who has been rated the lowest an arbitrary value of one unit. From this as a base we build up the values of the series. It is obvious that the more judgments there are, the better will be the final rating. The more competent the judges are, the more dependable the final rating will be. These marks will be used as measures only in respect to their differences.
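The comparison of two teachers described above can be sketched as follows. The text fixes only two points: an even split means no difference, and 75 per cent of judges in favor means one unit. How other percentages are converted is not stated in this part of the study, so the sketch assumes the normal-curve conversion commonly paired with this definition, in which the unit is the deviation that cuts off 75 per cent of the curve. Splitting tied votes evenly between the two teachers is likewise an assumption. Function names are hypothetical.

```python
from statistics import NormalDist


def paired_votes(ratings_a, ratings_b):
    """Count judges placing A better, worse, or equal to B.

    Ratings are ranks (1 = best); None marks a teacher the judge left
    unrated. Judges who omitted either teacher are disregarded.
    """
    better = worse = tied = 0
    for rank_a, rank_b in zip(ratings_a, ratings_b):
        if rank_a is None or rank_b is None:
            continue
        if rank_a < rank_b:
            better += 1
        elif rank_a > rank_b:
            worse += 1
        else:
            tied += 1
    return better, worse, tied


def units_of_difference(better, worse, tied=0):
    """Convert a vote split into units, where 75 per cent in favor = 1 unit.

    Assumption: normal-curve conversion, with tied votes split evenly.
    """
    total = better + worse + tied
    if total == 0:
        raise ValueError("no paired judgments to compare")
    share = (better + 0.5 * tied) / total
    if share <= 0 or share >= 1:
        raise ValueError("a unanimous vote has no finite scale value")
    curve = NormalDist()
    return curve.inv_cdf(share) / curve.inv_cdf(0.75)
```

With these definitions, `units_of_difference(75, 25)` gives one unit and `units_of_difference(5, 5)` gives zero, matching the two cases worked in the text; the six-to-four case falls between them. The values obtained are meaningful only as differences between teachers.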
Thus there is a difference of 10 units between a teacher rated 15 and one rated 25. But it would not be correct to think of a teacher rated 20 as twice as efficient as one rated 10 or half as good as one rated 40. These quantitative measures of efficiency cannot be compared on a basis of multiplication or division. We are interested in the differences and in no other mathematical relations. Thus, if teachers A, B, C are rated 10, 20, 30, we know that they vary in ability by equal amounts. For our purposes, it would make no difference if the measures were 110, 120, 130, or 610, 620, 630, or 750, 760, 770, as long as the quantitative differences are preserved.

If we knew where the true zero of teaching efficiency is, in relation to the values 10, 20, and 30, then, of course, other mathematical relations could be immediately worked out; but we do not know where the true zero point lies.

APPLICATION OF THE METHOD

In computing the final scale, we begin by making a rough approximation of the probable order by inspection or by computing the median rank of each teacher for her fifteen ratings. From an inspection of the Town A grade ratings, it is easily seen that Ah— will be better than Co—, for example, since Ah—'s median rating is 6, while Co—'s is 11+.

The rough approximation is then refined. The refinement is done by comparing each teacher with the one next above and next below and by continuing the process for several places each way. Usually three places will be sufficient. This refinement is continued by finding the percentages of the judgments which are in favor of each teacher and the percentages of the judgments which rate numerically lower each teacher in comparison with those near her. In every case only those judgments are used in which both teachers who are compared are also rated. The approximate arrangement, in some cases, will be wrong. Then the teachers must be shifted in order.
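The rough first ordering by median rank can be illustrated with a minimal sketch. The ratings below are hypothetical, as are the function name `rough_order` and the use of `None` for a judge who did not rate a teacher; none of these come from the study's data sheets.

```python
from statistics import median


def rough_order(ratings):
    """Order teachers from best (lowest median rank) to worst.

    `ratings` maps a teacher's name to her ranks, one per judge;
    None marks a judge who did not rate her and is ignored.
    """
    medians = {}
    for name, ranks in ratings.items():
        given = [r for r in ranks if r is not None]
        if not given:
            raise ValueError(f"no ratings for {name}")
        medians[name] = median(given)
    # A lower rank number means a better teacher, so sort ascending.
    return sorted(medians, key=medians.get)


# Hypothetical ranks from five judges for three teachers.
example_ratings = {
    "A": [3, 5, None, 6, 8],
    "B": [11, 12, 10, None, 14],
    "C": [6, 7, 9, 8, None],
}

print(rough_order(example_ratings))
```

This gives only the starting order; the pairwise percentages described in the text are still needed to confirm or shift each position.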
By using the unit values which have been calculated from the table that corresponds to the percentage differences, a scale of amount of difference between the teachers can be built up. In actual procedure it is better to begin with the worst teacher and work up. A procedure of trial and success, with frequent shiftings back and forth, will be found more economical of time and patience than a more complicated method.

Referring back to Table I, we find that in comparing Ah— with Dr— the fifth rating must be disregarded, as it is incomplete. In seven cases Ah— is higher numerically, or worse actually, than Dr—. There are twelve usable judgments in all. This gives 7 out of 12 votes, as it were, against Ah—. The percentage is then found and its corresponding unit value. It is well to determine the percentage differences, not only between a teacher and the next one to her, but also for three teachers away. In this way any individual, if rightly placed, will be above not only the one next lower, but also above the three next in order. Where mistakes in order occur, then there must be some shifting.

Two exceptions should be noted: Tie ratings are split, and when there is a 100 per cent agreement that one teacher is better than the next, then there is theoretically an infinite, or at least an unknown, amount of difference between them. In this case common sense is as good a guide as any. Either the statistician can make two or three indirect comparisons through other teachers' ranks in comparison with the two in question, or he can assign a value to 100 per cent a little larger than that assigned to 99 per cent. The latter procedure was followed here, as the amount of difference for a 99 per cent to 1 per cent vote is 3.45 units. We arbitrarily assign 4.00 to a 100 per cent to 0 per cent comparison.

While no doubt the explanation seems complex, the procedure, if followed as described above, will readily yield a well-made scale.
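The comparison of two teachers described above can be sketched in a few lines. This is an illustration under stated assumptions, not the author's worksheet: the ranks are hypothetical, the function names are invented for the example, and the conversion assumes the normal-curve relation implied by the A/P.E. heading of Table III (the difference noticed by 75 per cent of judges equals one unit). Because the sketch converts the exact share of votes rather than a rounded percentage, its result can differ slightly from a table lookup.

```python
from statistics import NormalDist

# One probable error (P.E.): the normal deviate at the 75th percentile.
PROBABLE_ERROR = NormalDist().inv_cdf(0.75)
UNANIMOUS_UNITS = 4.00  # arbitrary value the text assigns to a 100-to-0 vote


def share_rated_worse(ranks_a, ranks_b):
    """Share of usable judgments placing A numerically higher (worse) than B.

    Ranks are aligned judge by judge; None marks a missing rating.
    Judgments lacking either rating are skipped, and ties count half.
    """
    worse = 0.0
    usable = 0
    for a, b in zip(ranks_a, ranks_b):
        if a is None or b is None:
            continue
        usable += 1
        if a > b:
            worse += 1.0
        elif a == b:
            worse += 0.5
    if usable == 0:
        raise ValueError("no judge rated both teachers")
    return worse / usable


def units_of_difference(share):
    """Convert a share of votes into P.E. units (negative below one half)."""
    if share >= 1.0:
        return UNANIMOUS_UNITS
    if share <= 0.0:
        return -UNANIMOUS_UNITS
    return NormalDist().inv_cdf(share) / PROBABLE_ERROR


# Hypothetical ranks from thirteen judges; the fifth judge did not rate B.
ranks_a = [6, 4, 9, 7, 5, 8, 3, 10, 2, 9, 6, 7, 5]
ranks_b = [5, 6, 8, 3, None, 7, 5, 9, 4, 8, 7, 6, 9]

share = share_rated_worse(ranks_a, ranks_b)  # 7 of the 12 usable votes
print(f"{share:.0%} of usable votes rate A worse; "
      f"difference = {units_of_difference(share):.2f} units")
```

Summing such unit differences upward from the lowest teacher, who is given the arbitrary value of one unit, would then build the scale.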
Turning back to the transcript of ratings (see Table I), which were made by the Town A teachers, we find that the "Better-Worse" columns give in detail the comparisons in judgments. It reads: 1 is worse than 2 (with no equal or tied votes) 7 times; 1 is worse than 3 (with one equal or tied vote) 6 times. Of course, when the worse votes are less than the better, the table is used in the same way.

Table II shows, in part, the amount of difference in percentages from the worst to the best, with comparisons two or three places removed, for teachers of Group A, grade teachers, Town A. The numbers down the table and across are key numbers to the teachers' names, as has been noted in Table I. The table reads: Teacher 2 is better than teacher 1 by 58 per cent; teacher 2 is better than teacher 3 by 61 per cent; teacher 1 is better than teacher 3 by 54 per cent; teacher 8 is better than teacher 7 by 57 per cent; and so on.

TABLE II
Amount of Difference in Percentages
Group A, Town A Grade Teachers

1 3 6 4 8 7 5 10 12> 62 58 12' 50 57 12' 54 64 19 73 64 67 58 55 44 54 12^ 57 50 53 162 45 16' 54 57 2 2 58 61 54 73 84 65 58 68 81 75 50 50 57 77 75 73 77 54 54 54 65 1 3 6 4 8 7 5 10 121 122 12' 19

Step 4. The final step in getting a quantitative ranking is to change the percentage differences into amounts of difference. Here we use the table of percentage differences (see Table III). The table shows the unit values that correspond to percentage differences and is reproduced from Thorndike's Mental and Social Measurements, page 123.

TABLE III
The Amounts of Difference (x - y) Corresponding to Given Percentages of Judgments that x > y.
%r = the Percentage of Judgments that x > y. Δ/P.E. = x - y, in Multiples of the Difference such that %r is 75.

%r   Δ/P.E.    %r   Δ/P.E.    %r   Δ/P.E.    %r   Δ/P.E.    %r    Δ/P.E.
50    .00      60    .38      70    .78      80   1.25      90    1.90
51    .04      61    .41      71    .82      81   1.30      91    1.99
52    .07      62    .45      72    .86      82   1.36      92    2.08
53    .11      63    .49      73    .91      83   1.41      93    2.19
54    .15      64    .53      74    .95      84   1.47      94    2.31
55    .19      65    .57      75   1.00      85   1.54      95    2.44
56    .22      66    .61      76   1.05      86   1.60      96    2.60
57    .26      67    .65      77   1.10      87   1.67      97    2.79
58    .30      68    .69      78   1.14      88   1.74      98    3.05
59    .34      69    .74      79   1.20      89   1.82      99    3.45
                                                            100    4.00 (arbitrarily taken)

TABLE IV
Differences in Amount of Difference for the Lowest Thirteen Teachers in Town A, Grades, as Judged by Group A

No.   Name   Difference of Amount   Per Cent   Amount of Per Cent by Table III
49    Cd—          1.00                60              .376
47    Rd—          1.37                50              .000
47    Dn—          1.37                63              .492
43    Rt—          1.86                62              .453
43    De—          2.32                60              .376
43    Cl—          2.69                55              .186
43    Pm—          2.88                54              .149
41    Fm—          3.03                55              .186
37    Jr—          3.21                50              .000
40    By—          3.21                63              .492
37    Me—          3.71                61              .414
31    My—          4.12                56              .224
      Ws—          4.34                50

Table IV shows the differences in amount of difference for the lowest thirteen teachers in the grades of Town A as judged by Group A. In this way we build up a scale of teaching ability in terms of amount, based upon fifteen sets of judgments. In actual practice one need not carry the differences in terms of amount to more than the first decimal place.

This scale may now be used as a rating scale of the teachers with which to correlate any other significant scale of those teachers. Thus we may take their ages and, by correlation, find out whatever influence age may appear to have, or we may take professional training, salary, etc.

POSSIBLE ERRORS

In constructing a scale of this kind there is the possibility that two types of errors may be met; namely, constant errors and variable errors. For constant errors we cannot compensate.
A constant error would be a universal tendency on the part of the judges, for example, to rate high those teachers who were graduated from Harvard and to rate low those teachers who were graduated from Yale, because it might popularly be thought that graduation from Harvard signified something that graduation from Yale did not, when in point of fact it makes no difference which college the teachers had attended. Another more naive type of constant error would be to rate high certain teachers because they are blondes and to rate low other teachers because they are brunettes, when complexion is not a determining factor. If all judges should consistently err in some such ways as these, then we would have a constant error.

Variable errors are entirely taken care of statistically. Errors of this type operate when the judges do not know all teachers equally well. They also occur when clerical mistakes are made in reporting. These variable errors balance each other and by proper treatment are either eradicated or at least exposed.

The most frequent suspicion of the validity of this method of rating is due to the feeling that teachers do not know each other and, therefore, cannot judge each other. Teachers, however, receive a fairly good idea of each other through conversations, the remarks of pupils, general reputation, appearance in the halls, teachers' meetings, and other sources. While it is quite probable that teachers do not know, in minute particulars, whether other teachers are good or poor, it would be unwise to claim that teachers do not know pretty well whether their associates are successful or not.

We have seen how a quantitative ranking of the grade teachers of Town A by a chance half of the judgments was obtained. The quantitative ranking of the same teachers by another chance half of the judgments was similarly made.
For the high-school teachers of Town A, and for the grade and high-school teachers of Town B and Town C, respectively, exactly the same computations were made. We have, then, two sets of rankings for the teachers of these six groups.

THE DEPENDABILITY OF THE DATA

If there is high agreement between two chance halves of the judges, such an agreement is evidence of the reliability of the data. The following paragraphs are concerned with establishing the reliability of the judgments upon which the ratings are based.

AGREEMENT BETWEEN TWO GROUPS OF TEACHERS WHO JUDGE THEMSELVES FOR GENERAL TEACHING ABILITY

The agreement between Groups A and B for general teaching ability is shown by the following correlations:

                                        Correlations
Town A   Grade teachers (53)            +.941, ± .016
         High-school teachers (15)      +.894, ± .05
Town B   Grade teachers (35)            +.906, ± .03
         High-school teachers (13)      +.894, ± .05
Town C   Grade teachers (30)            +.813, ± .06
         High-school teachers (10)      +.664, ± .17

These are raw correlations. The highest, +.941, was from the largest group, the next highest from the next largest group, and the lowest correlation from the smallest group. The average correlation, with weighting for size of group, is +.882, ± .01. While it is a fact that the larger group and the higher correlation go together, this fact should not be taken to mean more than it actually does. We know in general that the smaller the group the more effective becomes the influence of error and that findings for large groups are usually more dependable than those for small.

The average correlation of +.882, ± .01 shows that there is a high degree of resemblance between the findings of one group and those of the other. In other words, if the correlation had been zero, then there would have been nothing but a chance agreement as to who was the good teacher and who was the poor teacher.
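The size-weighted average quoted above can be checked with a short sketch. It assumes a plain arithmetic mean of the six coefficients weighted by the number of teachers in each group; the text does not spell out its weighting formula, so that choice is an assumption, although it does reproduce the reported +.882. The function name is hypothetical.

```python
# (group label, number of teachers, correlation between chance halves),
# copied from the table of correlations in the text.
GROUPS = [
    ("Town A grade", 53, 0.941),
    ("Town A high school", 15, 0.894),
    ("Town B grade", 35, 0.906),
    ("Town B high school", 13, 0.894),
    ("Town C grade", 30, 0.813),
    ("Town C high school", 10, 0.664),
]


def size_weighted_mean(groups):
    """Average the coefficients, weighting each by its group size."""
    total_n = sum(n for _, n, _ in groups)
    return sum(n * r for _, n, r in groups) / total_n


print(f"{size_weighted_mean(GROUPS):+.3f}")
```

The weighting keeps the small high-school groups, whose coefficients are the least stable, from pulling the average as hard as the large grade-teacher groups.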
If there had been a correlation of -1.00, it would have implied that a teacher who was highly esteemed by one group would have been thought meanly of by the other group. If there had been a correlation of +1.0, it would have implied that there was perfect agreement between the two groups in their estimates of teachers. In a range of -1.0 to +1.0, an average correlation of +.899, ± .01 is seen to mean an amount of agreement that is, indeed, very significant. This inner consistency may be taken to connote that teachers' estimates of each other are by no means a hit-and-miss affair, but that there is a practical unanimity of opinion concerning teaching ability. Errors in judgment tend to lower a correlation, and there could not be very much of chance guessing in a set of judgments which correlate with a similar set as highly as +.899, ± .01.

AGREEMENT BETWEEN TWO GROUPS OF TEACHERS AND THE SUPERVISORS' JUDGMENTS WHO JUDGE THE SAME TEACHERS FOR GENERAL TEACHING ABILITY

In the Town A judgments, the supervisory force consisted of the superintendent, the principals who spent full time in supervision, the supervisors of music, drawing, and physical education, health officers, and two school-board members who were especially well informed concerning the teachers. In Towns B and C the judgment of the superintendent alone was available. There is every reason, however, to assume that these judgments were of a high order.

The second method of making an estimate of teaching ability was to have the supervisors rate each teacher. This was done by the relative-position method. The statistical procedure was similar to that which was used in deriving the ranking of teachers from the summation of the ratings of the teachers.¹

¹ These ratings are reported in full in the data sheets which are filed at Teachers College, Columbia University.
By correlation we find the following agreement between the ratings of teachers of each other and corresponding ratings by the supervisors:

                                   r between Supervisors      r between Supervisors
                                   and Group A Teachers       and Group B Teachers
                                   for General Teaching       for General Teaching
                                   Ability                    Ability
Town A   Grade teachers            +.934, ± .01               +.999, ± .00
         High-school teachers      +.930, ± .03               +.606, ± .16
Town B   Grade teachers            +.976, ± .00               +.976, ± .00
         High-school teachers      +.972, ± .01               +.833, ± .08
Town C   Grade teachers            +.959, ± .01               +.913, ± .03
         High-school teachers      +.912, ± .05               +.761, ± .13
Average                            +.974, ± .00               +.951, ± .00

Averaging the correlations +.974 and +.951, for the correlation between the judgments of supervisors and teachers in their rating for general teaching ability, we get the coefficient of correlation +.962. These figures may mean that the teachers judge as they do because they know in a general way what the supervisors think and therefore make their ratings agree as far as they can with those which they think the supervisors will give. Or they may mean just the opposite. More reasonable is the opinion that teachers and supervisors alike have access to the same information and therefore form similar judgments from a consideration of similar data.

Whatever may be the explanation of the high correlation between the judgments of the teachers themselves and their supervisory officers, the important thing is that the correlation is high. If supervisors can form a fair ranking of teachers, then teachers can rank themselves, as is shown by the high correlation (+.962) between the Group A plus Group B teachers and the supervisors who judged for general teaching ability.
AGREEMENT BETWEEN PUPILS' JUDGMENTS OF TEACHERS AND OF TEACHERS AND SUPERVISORS WHO JUDGE THE SAME TEACHERS

There was one group of pupils (nearly 200), in the grades of Town A, who were receiving instruction from eleven different teachers. These pupils rated their teachers under dignified and respectful circumstances. The exact method will be explained in a later connection. The correlation between the scale values which these eleven teachers received in the rating by mutual judgments and the pupils' ratings was +.681. The high-school faculties of Towns A and C were similarly rated, with the resulting coefficients of correlation between mutual judgment rating and pupils' ratings of +.807 for Town A high school, and +.684 for Town C high school.

Dividing the pupils' ratings into two chance groups (A and B) and correlating with the supervisors' estimates, we get the following results:

                                   r Group A Pupils'          r Group B Pupils'
                                   Estimates and              Estimates and
                                   Supervisors' Estimates     Supervisors' Estimates
Town A   Grade teachers            +.875                      +.656
         High-school teachers      +.682                      +.730
Town C   High-school teachers      +.631                      +.738

Note. The pupils' ranking of teachers in the grades of Town C could not be obtained. The grades were not departmentalized and hence pupils were acquainted with too few teachers.

From three separate sources, scales, in units of amount, for the general teaching ability of the teachers, have been obtained. The correlations are of such a nature that one is warranted in assuming that the ratings which have been given by either group of the teachers themselves are dependable ratings of general teaching ability.

CHAPTER III

MEASURABLE FACTS RELATED TO GENERAL TEACHING ABILITY

We have now a rating for general teaching ability for 156 teachers. The data which show the correlation between success in teaching and certain measurable facts concerning teachers will now be presented.
As a measure of teaching ability both the teachers' mutual ratings of Group A and the supervisors' ratings will be used. In all instances the correlations are computed by the Pearson formula.

THE SIGNIFICANCE OF AGE

The coefficients of correlation between teaching ability and age for each of the six groups of teachers have been computed. The age at the last birthday has been taken. Fractional parts of a year have not been used. It has not been possible to check the correctness of the ages of all the teachers, but those teachers who were members of the State Pension Fund, and practically all the teachers were members, were checked from the affidavits which are on file at the office of the Fund. The ages were as of April, 1919. Typical examples of distributions, for Town A grade teachers, are given below.

1. Distribution by Ages in Years:                          No. of Teachers
   25 or under                                             10
   25 to 30                                                18
   30 to 35                                                ..
   35 to 40                                                ..
   40 to 45                                                ..
   45 to 50                                                ..
   50 to 55                                                ..
   55 to 60                                                ..
   60 or over                                              ..

2. Distribution by Years of Experience:
   Less than 5                                             11
   5 to 10                                                 20
   10 to 15                                                ..
   15 to 20                                                ..
   20 to 25                                                ..
   25 to 30                                                ..
   Over 30                                                 ..

These are grouped distributions. In computing the correlations, the actual value was used in each case. Grouping of this kind, however, is sufficient to show the facts of distribution.

3. Distribution by Amount of Professional Study while in Service:
   No professional study                                   33
   Professional study equivalent:
     To one summer-school session of work in education     12
     To two summer-school sessions of work                  6
     To three summer-school sessions of work                1

                                   r of Age with Group A      r of Age with
                                   Rating of Teachers'        Supervisors' Estimates
                                   Judgments for General      of General Teaching
                                   Teaching Ability           Ability
Town A   Grade teachers            +.191                      +.047
         High-school teachers      -.151                      -.001
Town B   Grade teachers            +.050                      -.100
         High-school teachers      +.525                      +.422
Town C   Grade teachers            -.050                      -.108
         High-school teachers      +.604                      +.335
Average (weighted for number
in each group)                     +.135, ± .07               +.0298, ± .07

It would seem that a teacher's age is not a very good index of her general teaching ability. This same negligible and often negative correlation has been found in other studies even when a more lenient method of determining teaching ability has been used.

There is one factor, however, which may have some effect on the correlation. In Town A and in Town B no teacher is at present employed who has not had elsewhere two years of successful experience. The rule has been in force for two years. On account of the exclusion, for even this short time, of very young teachers, the correlation may have been affected as it would not have been affected in other school systems.

The coefficients of correlation, which have been given above, should not be carelessly taken to mean that there is no relationship between general teaching ability and age. Obviously, a child could not teach. Excessive old age, on the other hand, is not a negligible factor in determining general teaching ability. Within the limits of ages at which people actually do teach, age appears to be an irrelevant factor.

THE SIGNIFICANCE OF EXPERIENCE

The factor of experience has been studied in two ways: (1) total experience in teaching, wherever that experience has been gained, and (2) the experience gained in the present school system or position. No significant mutual relationship appeared. These correlations may be affected by the fact that teachers who are fresh from the normal schools are not engaged. In these data experience as such mattered little. While it is clear that a teacher as she becomes older does not necessarily become better by any process of inner growth, it is also clear that the older teacher is not necessarily the poorer one.
As far as these data reveal the true situation, neither amount of experience nor age should be considered factors of large significance in the assurance of teaching success.

                                   r between Total            r between Total
                                   Experience and General     Experience and General
                                   Teaching Ability as        Teaching Ability as
                                   Determined by Group A      Determined by
                                   Teachers' Judgments        Supervisors' Judgments
Town A   Grade teachers            +.018                      +.102
         High-school teachers      -.079                      -.102
Town B   Grade teachers            +.140                      +.135
         High-school teachers      +.531                      +.422
Town C   Grade teachers            -.249                      +.135
         High-school teachers      -.180                      +.340
Average (weighted for number
in each group)                     -.0386, ± .11              +.140, ± .10

                                   r between Local            r between Local
                                   Experience and General     Experience and General
                                   Teaching Ability as        Teaching Ability as
                                   Determined by Group A      Determined by
                                   Teachers' Judgments        Supervisors' Judgments
Town A   Grade teachers            +.047                      -.015
         High-school teachers      -.079                      -.066
Town B   Grade teachers            +.144                      +.089
         High-school teachers      +.364                      +.250
Town C   Grade teachers            -.148                      -.275
         High-school teachers      +.510                      +.416
Average (weighted for number
in each group)                     +.124, ± .73               +.137, ± .73

If there were little difference in the amounts of experience that the teachers possessed, then the coefficients of correlation would be low, not because experience was not a factor in determining general teaching ability, but because all teachers had the same amount of it. The amounts of experience, of age, and of professional study in the case of the grade teachers of Town A have already been shown to illustrate what the differences in experience actually were. The same is true of salary, age, or any other factors. In short, if series A and series B are to be correlated, no distribution in either A or B would mean a zero correlation. But in all of our series distributions do occur. The low correlations cannot be accounted for by absence of distribution.
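The point about absence of distribution can be illustrated with a small sketch. The experience and ability figures are hypothetical and chosen only for the illustration; `pearson_r` is a plain implementation of the product-moment formula, and returning 0.0 for a series with no spread is a convention adopted here to mirror the text (strictly, the coefficient is undefined in that case).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # No distribution in one series: treated as zero correlation here.
        return 0.0
    return sxy / sqrt(sxx * syy)


# Hypothetical years of experience and ability scale values for ten teachers.
experience = list(range(1, 11))
ability = [1, 3, 2, 5, 4, 6, 8, 7, 10, 9]

r_full = pearson_r(experience, ability)

# Keep only teachers with 4 to 7 years of experience (a narrowed range).
narrow = [(x, y) for x, y in zip(experience, ability) if 4 <= x <= 7]
r_narrow = pearson_r([x for x, _ in narrow], [y for _, y in narrow])

# Every teacher has the same experience: no distribution at all.
r_constant = pearson_r([5] * 10, ability)

print(f"full range {r_full:.2f}, narrowed {r_narrow:.2f}, "
      f"constant {r_constant:.2f}")
```

In this made-up case the coefficient shrinks as the spread of experience narrows and vanishes when there is no spread, which is why the text checks that distributions actually occur before reading the low coefficients as evidence.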
CORRELATION OF TEACHING ABILITY AND SALARY RECEIVED

At the time that this study was made there was no adequate salary schedule operative in Town A. Although there were some salary differences, many factors, other than those of services rendered, were effective in determining the salaries which were paid. In Town B and Town C, however, salary schedules, based on merit, were already well started. The correlations, therefore, are of particular interest.

                                   r between Salary and       r between Salary and
                                   General Teaching           General Teaching
                                   Ability as Determined      Ability as Determined
                                   by Group A Teachers'       by Supervisors'
                                   Judgments                  Judgments
Town A   Grade teachers            Not computed               Not computed
         High-school teachers      Not computed               Not computed
Town B   Grade teachers            +.359                      +.410
         High-school teachers      +.130                      +.089
Town C   Grade teachers            +.575                      +.083
         High-school teachers      +.676                      +.515
Average (weighted for number
in each group)¹                    +.434, ± .08               +.263, ± .09

¹ Town A not counted; others weighted for size of groups.

It will be seen that the coefficients of correlation, although they are not high, are at least positive, even in the judgments of the teachers themselves. In view of the fact that men are paid higher than are women, though not primarily for their better service, but because of their sex, it may be a fact that a few men in the grades are receiving relatively high salaries, although they have only moderate ability. If this be so, then the coefficients of correlation will be somewhat lowered. Perhaps we should then more properly compute the correlations for women only. In neither Town B nor Town C was this the case, for there was only one man involved, and he was rated high and paid well. I mention this possible condition to caution those who are making similar studies in which important sex differences occur.
In the case of the high-school teachers, however, a distinction of sex should be made in order to eradicate sex as a factor, and not ability as a factor, in the salaries which teachers receive.

THE RELATION BETWEEN GENERAL TEACHING ABILITY AND SCORES MADE IN TWO PSYCHOLOGICAL TESTS¹

Approximately one hundred teachers were given psychological tests. The first test might be called a test of mental alertness. It has been used sufficiently in many other connections to warrant the placing of considerable confidence in it. This test was divided into two parts. Each part lasted somewhat over thirty minutes. Before the test was given, the teachers had an opportunity to look over a similar test so that unfamiliarity with the material would not be a handicap. The scoring and the methods of computing final ratings are standardized.

                                   r between General          r between General
                                   Teaching Ability as        Teaching Ability as
                                   Determined by Group A      Determined by
                                   Teachers' Judgments        Supervisors' Estimates
                                   and Mental Alertness       and Mental Alertness
Town A   Grade teachers            -.099                      +.115
         High-school teachers      +.381                      +.346
Town [?] Grade teachers            +.306                      +.230
         High-school teachers      +.545                      +.484

A second intelligence test was given to the grade teachers and high-school teachers of Town A. The r's obtained from the second test were:

Town A   Grade teachers            +.060                      +.179
         High-school teachers      +.480                      +.648

¹ The tests used here are known as the first section of the Thorndike College Entrance Examination.

The correlation between the two tests was +.812. It shows to what extent the same ability was measured in both tests. The correlation between general teaching ability in the elementary grades, as estimated by teachers who served as judges, and intellect, as measured by this test, is +.173, ± .10.
The correlation between general teaching ability, as estimated by supervisors who served as judges, and intellect, as measured by psychological tests, is +.156, ± .10.

The correlations are distinctively higher in the case of high-school teachers. The mutual relationship between intellect and teaching ability, as measured by teachers' estimates, is +.446, ± .16. When general teaching ability is estimated by the supervisors the correlation is +.410, ± .16. These correlations are averages which have been weighted for the size of teacher groups.

By using these two tests as separate measures of intellect and by using the Group A mutual judgments and the supervisors' estimates as two separate measures of teaching ability, we can correct for attenuation, and get a final correlation of +.57 between general teaching ability and scores which have been made in psychological tests, in the case of high-school teachers.

The practically zero correlation between the teaching ability of grade teachers and mental alertness, as measured by test, does not mean that intellect is an irrelevant factor in teaching. For there is no occupation in which intellect is not to some extent useful. Even a man with a pick can use his intellect to advantage in deciding where best to grasp the handle of the pick, in determining the distance which one foot should be ahead of the other, and in arriving at other conclusions. Intellectual differences, however, among those who use the pick are not as significant as they would be among surgeons, philosophers, or psychologists. Although brains are of use in picking, it is also true that physical strength, lung expansion, large nasal passages, and other factors are relatively of much greater importance than intellect. In elementary-school teaching, even in the most routine work, intellect can be used and is used, but patience, industry, sympathy, and other qualities are relatively of greater importance than intellect.
The differences in intellect among teachers are, as it were, lost in the complexities of differences in the amount of many other traits which are also important in elementary-school teaching.

For high-school teachers this is not so correspondingly true. There does appear to be some relationship between differences in intellect and differences in teaching ability. High-school pupils are more mature; the content of high-school subjects is less under the spell of method than it is in the elementary-school subjects. Therefore, sheer intellectual ability does operate in a way that it does not seem to operate in elementary-school teaching. We have too few cases, however, from which to generalize.

THE SIGNIFICANCE OF ABILITY TO PASS A PROFESSIONAL TEST IN RELATION TO GENERAL TEACHING ABILITY

Tests of a professional nature are often used as a means of determining a candidate's fitness for election or promotion. It was, therefore, entirely within our province to determine, as accurately as possible, the correlation between the ability to pass a professional test and the ability to teach. An examination,¹ which called more or less definitely for knowledge of the technique of teaching, was given. The time allowed was seventy minutes. While obviously no objective means for correction² were available, due care in the correction work was taken. The names of the teachers were not written on the papers until after the corrections were made. By this method, any weakness on the part of the examiner, to favor some papers and to discriminate against others, was avoided.

For grade teachers in Town A the r between ability to pass a professional test and teaching ability as determined by teachers' ratings was +.450 (number of cases 33). The r between ability to pass a professional test and teaching ability as rated by supervisors was +.767 (number of cases 33).
For high-school teachers in Town A the r between ability to pass a professional test and ability to teach as determined by teachers' ratings was +.147 (number of cases 7). The r between ability to pass a professional test and ability to teach as rated by supervisors was +.001. It is unfortunate that more cases could not have been secured.

¹ A copy of the examination used is filed with original data at Teachers College, Columbia University. Revised copies known as "A Trade Test for Elementary-School Teachers" by Knight and Franzen can be secured from the writer.
² The correction of the examinations was made by the writer.

From the evidence which we have, it would seem that knowledge which is required to pass a test such as the one referred to above is not necessary for a person who wishes to be successful in high-school teaching. For elementary-school teaching such knowledge is much more needed. Variations among teachers in their ability to pass a test such as the one referred to above are more significant than variations in their age, in their experience, or in their salary.

These data strongly suggest the practicability (1) of selecting high-school teachers by psychological test and (2) of selecting elementary-school teachers by a test which involves a knowledge of the technique of teaching.

If professional tests could be made as accurate tests of technical knowledge as psychological tests are made tests of intellect, then the correlation between the achieved scores and success in elementary-school teaching might well be measurably increased. Further, if high-school teachers were given a more extended psychological test, let us say at least three hours instead of one, as here indicated, then the results might be even more indicative of their teaching ability as a qualification for work in the high school.
THE SIGNIFICANCE OF PROFESSIONAL STUDY WHILE IN SERVICE IN RELATION TO GENERAL TEACHING ABILITY

Much stress has lately been placed upon the value of professional study while in service. Many school administrators place a high value upon summer-school and university study in which teachers may engage. In many cases salary adjustments are made in part, at least, upon the fact that a specified teacher has taken professional courses in education.

It is of value to ascertain what effect this professional study has upon the general teaching ability of an individual teacher or group of teachers. We know, in the cases of the teachers whom we have studied closely, how these teachers stand in general teaching merit and also the amount of professional study while in service which they have to their credit.

Is it the case that those teachers who are studying their profession are also the teachers who stand high in general teaching merit? Even if this were the fact, it would not be clear just what the fact might mean. For example, it might mean that, because a teacher studied, she gained power and was, therefore, a better teacher. It might mean that, because she was a good teacher, she was, therefore, deeply interested in the technique of teaching and consequently studied. It might mean that the motive for doing the proper thing is operative and that those teachers take summer-school work who are most easily influenced by the desire to please their administrative officers. It might mean that the correlation between good teaching and professional study is more or less fictitious.

It is also true that many teachers would like to study, but, for domestic and financial reasons, cannot do so. The teacher may be good in her work because she studies. She may study, however, because she is good in her work. On the other hand, whether she studies or not may have an indifferent relation to her merit.
Finally, the true relation between professional study and teaching ability may be a composite of all these possibilities, which is probably the case.

Unfortunately, for this consideration my data are scant. In Towns B and C so few teachers had done any organized study while they were in service that no relation could be established in these school systems between professional study and quality of service. In Town A enough of the teachers had done summer-school and university work while they were in service to make the study worth while.

Six weeks of professional study in one course was counted as the unit of measurement of professional study. Amounts of professional study are not as good measures as amounts of study plus quality, but the quality of professional study during service is a fact too elusive to obtain. Those teachers who undertook no professional study were, of course, rated as having done a zero amount. The correlations follow:

                                  r between General Teaching     r between General Teaching
                                  Ability as Determined by       Ability as Determined by
                                  Group A of Teachers'           Supervisors' Estimates
                                  Judgments and                  and Professional Study
                                  Professional Study
Town A   Grade teachers                  +.275                          +.381
         High-school teachers            +.422                          +.364

Number of cases used in this computation, 52.

In Town A no teacher had been forced to study. While some premium had been placed on professional study, even those who had been given the opportunity of professional study were not chosen from among the ablest teachers. This fact would tend to lower the correlation. We cannot, of course, tell whether some teachers who had done professional study would have been differently rated if they had not done so. We cannot tell, on the other hand, how other teachers would have been rated had they undertaken professional study.
In view of the presence of irrelevant factors which tend to lower the correlation, it seems fair to say that, under ideal conditions where all teachers can study if they wish to do so, the true correlation between professional study while in service and teaching merit will be no lower and, in all probability, will be higher than the correlation which we have obtained.

The effect of professional study is as yet not clear. The fact of a positive though small correlation, in spite of factors which tend to lower it, seems to justify the use of such a factor as the amount of professional study for diagnostic and prognostic purposes.

IS QUALITY OF PENMANSHIP AN INDEX OF TEACHING ABILITY?

In the professional test (see the section on the professional test above) there was a copy of a letter, uncapitalized and unpunctuated, which was to be copied. This, of course, would be interpreted by any teacher who took the examination as an exercise in punctuation and capitalization, and such it was. It might also be used as a very convenient test of the quality of a teacher's handwriting. We get a much truer picture of how teachers write under working conditions by using material of this kind than we could get if we merely asked teachers to furnish a specimen of their handwriting.

This material, used as an index of the handwriting ability of teachers, was scored for legibility by using the Thorndike scale, and quantitative values were assigned to the specimens of handwriting. The scoring was made with the scorer ignorant of the names of the persons who wrote the specimens.
The correlations between the legibility of the teachers' handwriting, as expressed in terms of amount of legibility, and their general teaching ability rating follow:

                                  r between General Teaching     r between General Teaching
                                  Ability as Determined by       Ability as Determined by
                                  Group A of Teachers'           Supervisors' Estimates
                                  Judgments and Legibility       and Legibility of
                                  of Handwriting                 Handwriting
Town A, Grade teachers                   +.001                          +.012

That legibility of penmanship is no index of teaching ability seems clear. It should be added, however, that the variations in the legibility of the handwriting were small. Most of the teachers are so-called "Palmer handwriting certificate holders." The restricted spread in differences in handwriting ability is not needed to explain the zero correlation. As a matter of common sense there is no causal relation between handwriting and ability to teach.

THE MUTUAL RELATION BETWEEN GENERAL TEACHING ABILITY AND NORMAL-SCHOOL SUCCESS

The relationship between general teaching ability and normal-school success has been obtained in two ways. First, a study was made only of those teachers who both came from the same normal school and now teach in the same group. By this rigorous requirement errors have been eliminated which would exist if comparisons were made with the records of teachers who came from different normal schools, or if comparisons were made with the records for teaching ability which would come from equating the relative merits of teachers who work in different systems.

It is exceedingly difficult to get, for any considerable number of teachers, accurate measures of the teachers' standings in normal schools and of the success in teaching of the same group. For, upon leaving the normal schools, the graduates scatter.
In any school system there are teachers who come from so many different schools and at such widely varying times that large error is likely to creep into any investigation of the relation of normal-school success to general teaching ability.

While the procedure adopted in this study was calculated to reduce error and doubtless did reduce the error, it also reduced the number of cases. The correlations, which are positive, are for grade teachers, +.147 (19 cases); for high-school teachers with college records, Town B, +.600 (6 cases).

The standing in normal school or college was determined by a complex process. The grades which determined the total standing were all added together and divided by the number of grades. All grades were not counted as having equal value. Thus a grade "A" in English was not counted as the equal of a grade "A" in history. The values of the grades "A," "B," "C," "D," "E" of each study were determined by taking all the grades of each study and by computing the percentage of each grade of the total. Then a probability curve of form "A" was assumed for them. A computation of their value in terms of the standard deviation (S.D.) distance from the mean was made. Thus the inequalities of grading in each department were to some extent at least neutralized.

As an illustration, let us consider the 194 grades in English, which were distributed as follows: 11 or 5 per cent were "F"; 20 or 10 per cent were "E"; 37 or 19 per cent were "D"; 74 or 38 per cent were "C"; 49 or 25 per cent were "B"; 3 or 1 per cent were "A."

Assuming a normal distribution of ability and using the S.D. values which are found in Thorndike's Mental and Social Measurements, we assign to "E" a value of minus 20, to "D" a value of minus 12, and so on. The values of the several grades in the other subjects were similarly determined.
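The scaling step just described (turning the share of each letter grade into a distance from the mean under an assumed normal curve) can be sketched in Python as follows. This is only an illustrative sketch and not a reproduction of Thorndike's table: the function name `grade_values` is hypothetical, the unit used is one standard deviation rather than the table's own units, and the "D" count is taken as 37 on the assumption that the six counts must total the stated 194. The figures it yields therefore need not agree with the minus 20 and minus 12 quoted above.

```python
from statistics import NormalDist


def grade_values(counts):
    """Mean position, in S.D. units, of each grade band on a unit normal curve.

    `counts` maps grade labels to frequencies, ordered from lowest to highest grade.
    """
    curve = NormalDist()  # mean 0, S.D. 1
    total = sum(counts.values())

    def density(p):
        # Height of the curve at cumulative proportion p; zero in the far tails.
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return curve.pdf(curve.inv_cdf(p))

    values = {}
    running = 0
    for grade, n in counts.items():
        low = running / total
        running += n
        high = running / total
        # Mean of a normal variable confined to the band between low and high.
        values[grade] = (density(low) - density(high)) / (high - low)
    return values


# English grades from the illustration (assumed reading: "D" = 37).
english = {"F": 11, "E": 20, "D": 37, "C": 74, "B": 49, "A": 3}
values = grade_values(english)
for grade, value in values.items():
    print(grade, round(value, 2))
```

Because each grade is placed at the mean of its own band, the values weighted by their frequencies balance about zero, which is what allows an "A" in one subject to be compared with an "A" in another that is graded more or less severely.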
The grades received in practice teaching, English, arithmetic, history, science, and method were used in the computation. We do not know the quantitative value of "F" in terms of "E" or "D." We cannot say that "B" is twice as good as "E," etc. These marks can be used, however, in denoting relative positions in a group or series. These relative positions, in turn, can be translated* into terms of amount. The correlation was obtained between teaching ability and normal-school standing for such persons only as were teaching in the same group and came from the same normal school. This rigorous method of selecting data reduced the number of available cases to 19.

This correlation of teaching ability with normal-school success or standing is dependable, because it uses a very accurate rating of teaching ability. The method of its determination has eliminated errors in obtaining normal-school standing by taking only those teachers from the same normal school and by evaluating the marks which the teachers received in different studies. Variations in the meaning of marks, however, may occur in one subject from year to year as they do from subject to subject. Whether marks actually do or do not vary is a matter of conjecture. The real weakness in this correlation is due to the small number of cases.

* For the process see Thorndike's Mental and Social Measurements, Table 54, p. 221.

Since a correlation using fewer than 50 cases lacks numerical strength, the correlation between normal-school standing and teaching ability has been computed in still another way. The following assumptions have been made:

1.
It is assumed that the median teacher in one system is, to all intents and purposes, equal in teaching ability to the median teacher in the other two systems, and that the summation of the quantitative variations from the median in one system is equivalent to the summation of the variations in either of the other two systems. While this is an assumption, the correctness of which cannot be proved, it is reasonable. The three systems which have been studied are all within metropolitan Boston; they draw teachers from the same normal-school systems; they pay about the same salaries; they fit pupils for the same colleges; and they are not widely dissimilar in size. It is fair to assume that the teaching forces are about the same.

2. It is assumed that the marks in any one subject in a normal school mean about the same as they do in any other subject in that school. This was the fact when the values of marks which were given in the several subjects by the Salem Normal School were computed for the first correlation. If the values of the marks varied to some degree, the final result would not be seriously affected, for the variations would be in all directions and would have the same effect as chance errors.

3. It is assumed that, while the individual marks in one normal school do not mean the same as they do in another, the composite marks are comparable. That is, if we find that all the teachers whom we studied came from the Salem Normal School, then a certain teacher is the median in normal-school standing for that group of teachers and her standing is, for our purpose, the same as the standing of the median teacher from any other normal-school group that we may study. It is necessary to make this assumption in order to get enough cases. It is not an unusual assumption, although objections to it are perfectly allowable on mathematical or theoretical grounds.
The normal schools which are studied are all in the same part of the country; they pay about equal salaries to their faculties; they have similar courses of study; they are in practically equal repute; they draw about the same class of pupils; they are supervised, with one exception, from the same office; and they have such professional relations among themselves that their ideals and standards are largely mutual. Their graduates are attracted to the same school system and are equally well thought of. It seems reasonable to assume a practical identity of work required. This assumption is generally made in practical school administration. It is admitted that it has statistical shortcomings, but its validity for this purpose may be allowed.

Working upon these assumptions, the writer has computed the correlation between the normal-school standing of 53 teachers and their success in teaching out in the field, which is +.333. The ratings given to teachers by their fellow-teachers were used as the quantity to represent teaching ability. The normal-school standing was the numerical value of the average grade received.

THE VALUE OF PUPILS' ESTIMATES OF TEACHERS

In school administration we have never taken into account the fact that the estimates of pupils of their teachers might be valuable. The importance of having content to which pupils would respond and of having methods to which pupils would favorably react has been repeatedly discussed, but we have assumed, on the whole, that pupils' judgments of their teachers were either unobtainable or useless.

We may yet find that there is a closer relationship between pupils' success in school and their reaction to the teacher than there is between their success and the methods of teaching reading, or the size of print in the text-books, or the amount of play space, or any other so-called important factor of school management. Pupils may be as competent judges of good teaching as anyone else.
They are certainly the most concerned. Data will show that it is not the poor teacher in the eyes of the supervisor who is the good teacher in the eyes of the pupils.

The estimates of the pupils were obtained by asking the pupils to write on a sheet of paper the names of all the teachers whom they had ever had. Then it was explained to them that they were going to say which of all these teachers, all things considered, was the best. Reasonable precautions were taken in giving the directions to have the pupils understand the meaning of best and the importance of making the most deliberate ratings that they could. In all cases the pupils were told not to write their own names and not to hurry in their answers. After the names of teachers whom they considered best, they wrote the word best. After the next best teacher, they wrote the word next; and after the third best teacher, they wrote the word third.

In this manner the following groups of pupils judged their teachers: two groups of high-school pupils, one of seventh-grade pupils, and one of eighth-grade pupils. To offset the factor of forgetting on the part of the pupils, in the computation only those teachers whom the pupils were having at the time of their making the judgments were considered.

The two high-school groups were obtained by having the principal call together the forty most dependable pupils in the school. The elementary-school group was composed of 200 pupils in the departmentalized grades of a school in Town A. The teachers who were judged by each group of pupils fell into three groups or sets of cases: 11, 15, 13. The writer is certain that the pupils responded thoughtfully to his request for their judgments and that careful opinion was expressed. Each group of pupils' judgments was then divided into two chance groups and these were treated separately. The fact that the correlations between these groups were +.767, +.517, +.905 respectively for three groups of pupils shows that factors of chance were not operating to any great degree.

The correlations between the pupils' estimates of teachers and the estimates of the fellow-teachers and supervisors follow below.

Using the two halves of the pupils' estimates as two independent measures of the pupils' opinions and the mutual judgments of teachers and the supervisors' estimates as two independent estimates, we correct these correlations for attenuation. The corrected correlation between pupils' estimates and adult estimate of teaching ability is found to be +.784.

                                            r between Teaching Ability     r between Teaching Ability
                                            as Determined by Pupils'       as Determined by Supervisors
                                            Estimates and by Teachers'     and as Determined by Pupils
                                            Estimates
Town A    Group A, Grade teachers                  +.681                          +.875
          Group B, Grade teachers                  +.380                          +.656
          Group A, High-school teachers            +.807                          +.682
          Group B, High-school teachers            +.600                          +.730
Town C    Group A, Grade teachers                  +.684                          +.631
          Group B, Grade teachers                  +.604                          +.738
          Group A, High-school teachers            +.605                          +.806
          Group B, High-school teachers            +.451                          +.743
Average                                            +.578                          +.743

The fact that the correlation between groups of pupils' judgments was so high implies a real relation between those in whom the pupils have confidence and those in whom supervisors and fellow-teachers have confidence for their teaching ability. The weakness in these correlations is due to the large probable error, which is due to the small number (39) of teachers who are studied.

THE SIGNIFICANCE OF INTERESTS

The relation, if any, between success in teaching and interests in the various school subjects was determined by having the teachers fill out blanks which were constructed to reveal the relative amount of interest that each teacher had in mathematics, history, literature, and science.
An examination of the data clearly shows that teachers do not have distinct types of interests. The better teachers showed a slight tendency to prefer what are usually considered the harder subjects.

CHAPTER IV

THE RELATIVE SIGNIFICANCE OF THE QUALITIES MEASURED

We have obtained, in the case of six groups of teachers, a quantitative rating for general teaching ability and we have correlated success in teaching with certain measurable facts about teachers. The number of cases involved was 153. In some of the correlations the total number of cases was not used.

METHOD USED

The process of obtaining the rating for general teaching ability has been explained and the process by which certain significant facts about the teachers were obtained has been explained.

The Pearson formula for computing the coefficient of correlation:

\[ r = \frac{\sum x \cdot y}{\sqrt{\sum x^{2}}\,\sqrt{\sum y^{2}}} \]

was used in all cases. In this formula x is the divergence from the central tendency in one distribution and y is the corresponding divergence in the other.

The weighted average correlation was obtained by weighting the correlation of each group on the basis of the number of cases in that group.

The formula which was used for attenuation follows: [formula not recoverable from this copy]

In all cases reported the coefficients of unreliability of the correlations have been computed. The formula used for this was:

\[ \sigma_{\text{true } r - \text{obt. } r} = \frac{1 - r^{2}}{\sqrt{n}} \]

Where several correlations have been averaged, the weighting has been done on a basis of the number of cases. If, for example, the correlation between teaching ability and age was +.444 for a group of 20 and +.666 for a group of 40, the average correlation would be +.592, counting the 20-group once, the 40-group twice, and then dividing by three.

THE MEANING OF CORRELATION

The correlations vary in size and in significance. The use of correlation in this connection is for diagnostic purposes.
For example, if we knew that the correlation between ability to pass an intelligence test and ability to teach was +.999, all we would have to know about a teacher would be her ability to pass an intelligence test in order to know how good a teacher she would be. Correlations of this sort do not exist. Teaching is not perfectly correlated with any one thing, except teaching ability.

Perfect correlation between general teaching ability and any other single quality would mean for us complete identity between the two traits correlated, and teaching is not identically like any one quality or ability which we can as yet measure. We cannot be sure just what qualities must be possessed, or in what degrees, or in what combinations, for a teacher to be a successful teacher. We do know that more than one quality is needed.

In all probability, we shall know at some time and with scientific precision why the good teacher is good and why the poor teacher is poor. We shall also possess at some time the means of securing a satisfactory measure.

We are all certain that success in teaching does not "just happen," but is due to the possession of certain traits in certain amounts and in many combinations. Some minimum essentials can be stated, but at present we are not certain as to how we can use the knowledge of minimum essentials which we now have. We know, of course, that a stark idiot could not teach; but, on the other hand, we do not know how much intelligence is the ideal amount for the elementary teacher to possess. It is not at all certain that unusual intellectual attainments in a first-grade teacher are, all things being considered, worth paying for. It has never been shown that a teacher with an intelligence quotient of 180 is a better teacher, because of that rating, than a teacher with an intelligence quotient of 120.
It is well within reason to suppose that too much intelligence among those who do some kinds of teaching work is a handicap, just as in a corresponding degree too little intelligence is a handicap to other teachers.

Similarly, a certain amount of health is a minimum essential for teaching, but it has never been shown that the healthiest teachers are the best teachers. After a certain standard of health is reached, more health may not be effective in improving the quality of teaching.

We may yet find that certain ratios between height and weight, certain ranges of body temperature, certain ranges of emotional characteristics, certain qualities of vision and of eyesight, or certain speeds in time reaction, or certain flexibilities of memory, or certain degrees of blood pressure, are present in good teachers and not in poor teachers. It could not, however, be stated as a fundamental hypothesis that, after a certain degree of keenness of vision is reached, still more keenness of vision will correlate with, or bring about, better teaching. Moreover, it is reasonable to assume that, to the extent that we can determine relationships between effective teaching and objective, measurable facts, we shall advance toward skill in the rating and prognosis of teaching ability.

Suppose, for the moment, we found that the older a teacher becomes, the better teacher she also becomes. Suppose, on the other hand, that the more poorly a teacher wrote, the more skillful she was in governing pupils. Although some qualities are not constituents of teaching ability, as are intellect and faithfulness, nevertheless they may still serve usefully as indices of teaching ability. If we could get enough measurable facts about a teacher and then correlate them with teaching ability, we should be able to rate teachers successfully. These measurable facts do exist and our problem is to discover them and correlate them.
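The computations named under METHOD USED (the product-moment coefficient taken from deviations, and the average weighted by number of cases) can be sketched in Python as below. The function names are hypothetical. `correct_for_attenuation` shows one common textbook form of Spearman's correction, in which the observed correlation is divided by the square root of the product of the two reliabilities; whether this is the exact variant applied in the study is an assumption, since the study's own attenuation formula is not reproduced in the text.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation computed from deviations about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def weighted_average_r(groups):
    """Average several correlations, weighting each by its number of cases.

    `groups` is a list of (r, number_of_cases) pairs.
    """
    total_cases = sum(n for _, n in groups)
    return sum(r * n for r, n in groups) / total_cases


def correct_for_attenuation(r_xy, reliability_x, reliability_y):
    """Assumed form: observed r divided by the root of the product of reliabilities."""
    return r_xy / math.sqrt(reliability_x * reliability_y)


# The worked example from the text: +.444 for 20 cases and +.666 for 40 cases.
average = weighted_average_r([(0.444, 20), (0.666, 40)])
print(round(average, 3))
```

Run on the worked example, the weighted average comes to +.592, the figure given in the text. Weighting by cases, rather than averaging the coefficients directly, keeps a small group from counting as heavily as a large one.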
In reviewing the correlations which have been presented, the reader should keep in mind the simple interpretation of the meaning of correlation; namely, (1) that there is perfect correlation between two observable series of facts, if the presence of one fact means the presence also of the other fact in the same way and in the same relative degree; (2) that there is perfect negative correlation, if the presence of one fact means the absence of the other. For example, if we knew that the older a teacher is, the poorer she is, then there would be perfect negative correlation between age and ability to teach.

Zero correlation exists, if the relation between the two facts is such as would be produced by pure chance. Prediction is possible, if the correlations are removed in size from zero. The greater the size of the correlation, other things being equal, the more exact the prediction.

The correlations between ability to teach and other sets of facts, which have been found in this study, after adjustments have been made in order that one correlation may best represent the facts, follow.

Correlations between:
  Ability to Teach and Age                                           +.082
  Ability to Teach and Salary                                        +.348
  Ability to Teach and Experience                                    +.041
  Ability to Teach and Intelligence as measured by test              +.164
  Ability to Teach and Handwriting                                   +.000
  Ability to Teach and Knowledge of Teaching Technique
    as measured by professional test                                 +.608
  Ability to Teach and Study while in service                        +.328
  Ability to Teach and Normal-school Scholarship                     +.147
  Ability to Teach and Pupils' Estimates of Teachers                 +.784

Other significant correlations are:

                                   Test B     Test C     Normal-School Standing
  Test A (First Mental Test)        .812       .470        .559
  Test B (Second Mental Test)                  .584        .536
  Test C (Professional Test)                               .486

General teaching ability and success in normal-school studies were correlated as follows: English, +.040; Arithmetic, +.001; Geography, +.370; Science, +.268; History, +.235; Practice Teaching, +.057.

Intellect, as measured by test, correlates with ability to pass a professional test in about the same degree as it does with normal-school standing, and ability to pass a professional test correlates a little lower with normal-school standing than does intellect, when the results of the two tests are pooled.

Apparently the factor of intellect is quite significant in normal-school study, but, in comparison with other factors, it fades out in class-room work.

Intellect is certainly operative in ability to pass a professional test, but it is uncertain whether the intellectual factors which operate in ability to pass a professional test are those which account for the correlation between ability to teach and ability to pass a professional test, since the correlation between ability to teach and ability as revealed in psychological tests is itself so low.

THE MORE IMPORTANT TRAITS

Some measurable facts do not appear to have prognostic value, while others do. We may now consider the interrelationships of four traits: general ability to teach; ability to pass a professional test; ability to pass a mental test; and standing in normal school or normal-school record.

All of these traits or abilities are interrelated. There is some correlation between a teacher's standing in normal school and her subsequent ability to teach, her ability to pass a professional test, her ability to pass a mental test. Some relation exists between each trait and every other trait. The amount of positive relationship between any two traits which appears in a simple correlation is affected by the influence of the other traits. The interrelationships are exceedingly complex.
The problem may be analyzed as follows:

  Let G represent general teaching ability
  Let I represent intellect as measured by test
  Let P represent ability to pass a professional test
  Let N represent normal-school record

  G is related to I
  G is related to P
  G is related to N
  I is related to P
  I is related to N
  P is related to N

  The GI relationship is related to or affected by P
  The GI relationship is related to or affected by N
  The GI relationship is related to or affected by PN
  The GP relationship is influenced by I and by N
  The GN relationship is influenced by I and by P
  The IP relationship is influenced by G and by N
  The IN relationship is influenced by G and by P
  The PN relationship is influenced by G and by I

The mutual relationship between ability to teach and ability to make scores in mental tests will be affected in some measure by one's standing in normal school, because one's ability to teach is affected by what one did in normal school and one's standing in normal school is also, more or less, a result of intellectual ability.

By a statistical procedure of partial correlation the true relation may be found in the following cases:

  G and I, when factors P and N are neutralized or non-operative
  G and P, when factors I and N are neutralized or non-operative
  G and N, when factors I and P are neutralized or non-operative

To make partial correlations we must have measures in all traits for each person. It is exceedingly difficult to get measures for all traits for many cases. We have, however, satisfactory measures of teaching ability, ability to pass a professional test, normal-school record, and a measure of intellectual keenness for 29 elementary-school teachers. This is too small a number on which to base any sweeping conclusion. The method which has been used is the correct one, however, and is best adapted to find out what relationships exist between teaching ability and certain measurable traits.
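The procedure of partial correlation mentioned above can be sketched in Python. The sketch uses the standard recursive formula, in which a first-order partial removes one factor and a second-order partial is built from three first-order partials; the function names and the dictionary layout are hypothetical, and the six input values are the total correlations as printed in this chapter for the 29 teachers. Because of rounding, and because the printed figures may themselves contain errors, the output should not be expected to reproduce the study's reported partial correlations exactly.

```python
import math


def first_order(r_xy, r_xz, r_yz):
    """Correlation of x and y with a third factor z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def second_order(totals, x, y, z, w):
    """Correlation of x and y with both z and w held constant."""

    def r(a, b):
        return totals[frozenset((a, b))]

    # Remove z from each of the three pairings among x, y, and w ...
    r_xy_z = first_order(r(x, y), r(x, z), r(y, z))
    r_xw_z = first_order(r(x, w), r(x, z), r(w, z))
    r_yw_z = first_order(r(y, w), r(y, z), r(w, z))
    # ... then remove w from the x-y relation that remains.
    return first_order(r_xy_z, r_xw_z, r_yw_z)


# G = teaching ability, I = intellect, P = professional test, N = normal-school record.
totals = {
    frozenset("GI"): 0.000,
    frozenset("GP"): 0.541,
    frozenset("GN"): 0.153,
    frozenset("IP"): 0.108,
    frozenset("IN"): 0.371,
    frozenset("PN"): 0.560,
}

for x, y, z, w in [("G", "I", "P", "N"), ("G", "P", "I", "N"), ("G", "N", "I", "P")]:
    print(x + y, "with", z, "and", w, "held constant:",
          round(second_order(totals, x, y, z, w), 3))
```

The order in which the two factors are removed does not change the result, so `second_order(totals, "G", "I", "P", "N")` and `second_order(totals, "G", "I", "N", "P")` agree; this is a convenient check on any hand computation.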
More cases, other studies, further investigations, must be made before the question can be finally answered.

These total correlations were discovered:

  General Teaching Ability and Intellectual Keenness                        ±.000
  General Teaching Ability and Ability to Pass a Professional Test          +.541
  General Teaching Ability and Normal-school Standing                       +.153
  Intellectual Keenness and Ability to Pass a Professional Test             +.108
  Intellectual Keenness and Normal-school Standing                          +.371
  Ability to Pass a Professional Test and Normal-school Standing            +.560

These partial correlations were discovered:

  General Teaching Ability and Intellectual Keenness                        +.088
  General Teaching Ability and Normal-school Standing                       -.214
  General Teaching Ability and Ability to Pass a Professional Test          +.570

From the partial correlations these deductions seem justifiable:

1. The differences in mental keenness which are revealed in the passing of psychological tests do not correspond with differences in teaching success.

2. The position that a student in normal school holds in her class is not indicative of her subsequent success as a teacher.

3. The relative success achieved in passing a professional test is correlated positively and highly with success in teaching.

4. Matters of such importance as we have been studying cannot be settled without similar investigation of many more cases, although in the present study the correct statistical method has been followed.

This study indicates that age, experience, quality of handwriting, intelligence as measured by tests, normal-school standing, or the expressed interests of teachers are not closely related to success in teaching.

We have, however, an indication of a mutual relationship between teaching and a knowledge of the technique of teaching which challenges attention.
Everyone must interpret this fact in the light of his own experience and judgment, until further data have been analyzed.

If the ability to pass a professional test were an index of teaching ability only because the teacher who teaches a long time learns both how to teach and how to pass a test (that is, if experience or age were the real sine qua non of good teaching), then that fact would have appeared in our correlations of age and experience with general teaching ability. It did not so appear.

Professional preparation, as indicated by normal-school standing, does not appear to account for the +.570 correlation between teaching success and knowledge of technique, because normal-school standing, when correlated directly with teaching ability (50 cases), correlated only +.333. In the partial correlation the relation was even slightly negative.

It is not the case that good teachers uniformly possess relatively large amounts of pure intellectual alertness while poor teachers uniformly lack it. For, with 100 cases, the correlation between success in teaching and intellect, as measured by test, was very low, and in the partial correlation a practically zero relationship appeared.

The most reasonable explanation seems to be along the line of a teacher's interest in her work. No other explanation is apparent, nor is any other perhaps needed. The teacher who has a genuine interest in her profession will learn its technique and hence will pass well in a professional test. Those who have not a real devotion to their art will forget, or never take the trouble to master, the technique of their work. When, therefore, a test which requires technical knowledge is given to them without warning, they will fail. Teaching, especially in the grades, will be well done by those who are sensitive to its problems and thoughtful of their solution.
Interest of a substantial, vital kind will explain the mutual relationship between ability to pass a professional test and success in actual teaching. Moreover, the ability to pass a professional test may be taken as an index of real interest for, and of probable success in, teaching.

If it were within the possibilities of any one study to procure enough cases upon which to base final conclusions, we could take the partial correlations between general teaching ability and several measurable factors and, by combining them in a regression equation, say that teaching is a composite of measurable factor a, taken x times; factor b, taken y times; factor c, taken z times; factor d, taken n times. Then, in order to rate teachers or to select them, the proper procedure would be to procure measures and combine them into a final rating.

This has not been done and will not be done until some provision is made for a competent investigator, with a staff of statisticians at his command, to have access to at least 500 elementary-school teachers, distributed in several types of school systems and studied for a period of years. Until this situation is possible, proper checking of results is impossible. The method which has been used in this study would be in the main a satisfactory procedure for such an elaborate study.

CHAPTER V

THE RELATION BETWEEN SPECIFIC TRAITS WHEN SEVERAL JUDGES RATE THE SAME TEACHERS IN THOSE TRAITS

THE THEORY OF ANALYSIS

Teaching as a whole may be analyzed, for purposes of convenience, into constituent parts, such as ability to ask questions; ability to direct study; ability to govern; ability to stimulate the moral health of the community; and kindred abilities. Theoretically, such an analysis is possible. How much analysis of this kind is real, however, when the analysis is made on a basis of personal judgment, is uncertain.
In the first part of this study emphasis was placed on what many judges thought about a teacher in general. This was taken as an adequate basis of merit. Would it not be better to use analyzed judgments?

School administrators have been using of late a score card which contains many traits of teaching. This score card is used as a basis, or as a method, or as a help, in aiding administrators to form their judgments concerning teachers. So much has been written on the subject of score-card rating, and students of educational theory and practice are already so sufficiently informed concerning the score-card method of rating teaching, that a further review of the literature, other than the brief discussion in the Introduction of this study, is redundant.

Teaching, in a certain sense, is an organic unity, but, in a very useful sense, it is also a composite of faculties, or traits, or functions, all of which are more or less disparate and separable. Accepting teaching in the latter meaning, we get genuine insight into the troublesome problems of estimating teaching ability and of rating teachers, if we list the constituents of teaching ability, assign a value to each ability, measure the amount in which each ability exists in any given teacher, and thus compute a final rating for teachers. The various score cards which have been devised for rating teachers attempt to solve, in various degrees of completeness, just this sort of problem. The following data throw light on what actually happens when analysis by personal judgments is attempted.

DATA ON ANALYSIS

The best way to present the data is to give a running account of how they were obtained. When the teachers in Towns A, B, and C rated each other for general teaching ability, they also rated each other for specific qualities. These qualities are those which could well be considered as significant and analyzed qualities of teaching ability.
The complete instructions which were given to the teachers and the qualities which were to be rated will be seen from an inspection of the original instructions which are here reproduced.

INSTRUCTIONS GIVEN TO TEACHERS

On this sheet you are requested to give certain ratings of each teacher in the list, including yourself. Please rate every teacher, and please be absolutely frank in your ratings. You need not sign your name. Nobody will ever know how you or anybody else rated him. No personal use will ever be made of any of these ratings. They will be used in a purely scientific study to determine the significance of age, education, early interests, etc., etc., for success as a teacher. The names will all be cut off and destroyed as soon as the different items in the inquiry have been numbered to fit the ones to whom they refer.

Also, do not feel disturbed because in each respect somebody has to be rated lowest. These ratings are all relative, and the lowest teacher in the group may well be of very great ability.

Please be sure to record ratings even if they seem to you to be little better than mere guesses. The opinions of twenty men will give a useful rating, even if any one of the twenty taken alone is almost worthless.

On the sheet is a list of the teachers. Choose the teacher of greatest teaching ability in the group and write a figure 1 after his or her name in column 1. Choose the teacher next below in teaching ability and write 2 after his or her name in column 1. Write 3 after the name of the one next in teaching ability and so on with 4, 5, 6, etc. If two or more seem absolutely equal in teaching ability give them the same rating.

Then think of the ability to understand and manage people, to get on with other men, to secure obedience from inferiors, cooperation from equals, and consent and support from superiors in school, business, or other activities. Choose the teacher of greatest ability in the group and write 1 after his or her name in column 2.
Proceed as for teaching ability ranking.

Then think of intellectual ability, the ability to manage ideas, to work with facts, rules, and principles, to learn the science of a thing, to understand explanations and reasons, to think things out. Choose the teacher of greatest intellectual ability in the group and write 1 after his or her name in column 3. Proceed as for teaching ability rating.

Then think of the ability to manage things and mechanisms, the ability to sail a boat, to drive a motor car, to use tools, machines, and instruments of all sorts, to be handy. Choose the teacher of greatest ability to manage things and mechanisms in the group and write 1 after his or her name in column 4. Proceed as for teaching ability rating.

Then think of general scholarship, signs of education, knowledge of literature, etc. Choose the teacher of greatest general scholarship in the group and write 1 after his or her name in column 5. Proceed as for teaching ability ranking.

Then think of skill in government or discipline, ability to control, to keep order, etc. Choose the teacher of greatest skill in government or discipline in the group and write 1 after his or her name in column 6. Proceed as for teaching ability rating.

Then think of instructional skill, pure ability to instruct, correct and effective methods, economy of time and effort, ability to get all pupils to understand the subject-matter. Choose the teacher of greatest ability in instructional skill in the group and write 1 after his or her name in column 7. Proceed as for teaching ability rating.

Then think of initiative, the making of headway, the starting of new means, the stating of new ends. Choose the teacher with the greatest initiative in the group and write 1 after his or her name in column 8. Proceed as for teaching ability rating.

Then think of nervous and physical strength.
Choose the teacher of greatest nervous and physical strength in the group and write 1 after his or her name in column 9. Proceed as for teaching ability rating.

Then think of that teacher who commands the greatest respect of the pupils in the group and write 1 after his or her name in column 10. Proceed as for teaching ability rating.

Finally, think of general ability to get results. Choose the teacher with the greatest ability to get results in the group and write 1 after his or her name in column 11. Proceed as for teaching ability rating.

The actual ratings for eleven traits, made on a prepared sheet (see the illustration below), were secured for the 156 teachers in exactly the same way as the ratings for general teaching ability were secured.

The ratings for general intellectual ability and for skill in discipline were then treated as were the estimates of general teaching ability. For the six groups of teachers, relative ratings for the qualities general intellectual ability and skill in discipline were obtained. These were turned into quantitative ratings as in the case of general teaching ability. As the statistical process was the same as that which was described in Chapter II, under the heading "Process of Rating Teachers," a description of the procedure need not be here repeated.

Names of Teachers     1    2    3    4    5    6    7    8    9    10    11

EXPLANATION OF COLUMNS

Col. 1. General ability as a teacher.
Col. 2. General ability to manage people.
Col. 3. General intellectual ability.
Col. 4. Ability to manage things and mechanism.
Col. 5. General scholarship.
Col. 6. Skill in discipline.
Col. 7. Ability to instruct.
Col. 8. Initiative.
Col. 9. Nervous and physical strength.
Col. 10. Respect of pupils.
Col. 11. General ability to get results.

In all instances half of the ratings were selected by chance and were treated as the data to form one rating, and the other half of the ratings were used to form another rating.
Thus ratings for those two qualities for the six groups could be checked.

AGREEMENT BETWEEN TWO GROUPS OF JUDGES WHO JUDGE THE SAME TEACHERS FOR THE QUALITY GENERAL INTELLECTUAL ABILITY

The correlations between the halves of the judgments for the qualities general intellectual ability and skill in discipline were then computed. One half of the judgments we shall call Group A and the other, Group B. The reader will recall that there was, in the case of mutual judgments for general teaching ability, a high correlation between the two chance halves of the judgments in the case of all six groups. We have the same condition prevailing here.

These correlations are of interest:

r between Two Groups of Judges Who Judge the Same Teachers for General Intellectual Ability

                                          r                No. of Cases
Town A   Grade teachers            +.861, ±.035            53
         High-school teachers      +.967, ±.016            15
Town B   Grade teachers            +.899, ±.031            35
         High-school teachers      +.845, ±.079            13
Town C   Grade teachers            +.958, ±.014            30
         High-school teachers      +.326, ±.279            10
Average (weighted for number in each group)   +.879, ±.016

These correlations show that there is close agreement among the teachers as to the distribution of general intellectual ability among them. In this case, when two groups of judges estimate the differences of intellectual capacity of a corps of teachers, the mutual agreement is on the average +.879, ±.016. This is a weighted average of estimates of six different corps of teachers.

These correlations are also of interest:

AGREEMENT BETWEEN TWO SETS OF JUDGES IN RATING A GROUP OF TEACHERS FOR THE TRAIT SKILL IN DISCIPLINE

r between Two Groups of Judges Who Judge the Same Teachers for Skill in Discipline

                                          r
Town A   Grade teachers            +.943, ±.015
         High-school teachers      +.896, ±.050
Town B   Grade teachers            +.757, ±.071
         High-school teachers      +.581, ±.180
Town C   Grade teachers            +.728, ±.085
         High-school teachers      +.917, ±.049
Average (weighted for size of group)          +.838, ±.023

With skill in discipline, as with general teaching ability and general intellectual ability, we find that the size of the correlation indicates substantial agreement among the judges, very far from the chance agreement that guesses or haphazard opinions would have produced.

This agreement between chance halves of judgments for the qualities intellectual ability and skill in discipline does not prove that intellectual ability, as such, or skill in discipline, as such, were the traits actually rated, although an easy interpretation of the data might lead one to think so. This agreement simply means that on the whole the judges had the same quality or trait in mind and really did agree as to the distribution of amounts of the qualities or traits.

What agreement there would have been concerning other traits we do not know. It is fair to assume, however, that equally high agreement exists. The three traits which we treated statistically show uniformly high agreement, and the enormous amount of time required to work out other ratings seems unnecessary when the first three treated show the amount of agreement that is present.

The important fact is that when teachers rate each other for general teaching ability, or for a specific quality, such as skill in discipline, chance halves of the ratings mutually correlate so highly that substantial agreement is fairly established. The average correlation between chance halves of judges when judging the same group of teachers for the same qualities is +.872. This calculation is based on the average of eighteen sets of judgments.
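The chance-halves check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure of Chapter II: the rankings are invented, the names `pearson`, `mean_ranks`, and `chance_half_correlation` are hypothetical, and each half is summarized by a simple mean rank, whereas the study converted relative ratings into quantitative ones before correlating.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def mean_ranks(rankings):
    """Average rank given to each teacher by one group of judges."""
    n_teachers = len(rankings[0])
    return [sum(r[t] for r in rankings) / len(rankings) for t in range(n_teachers)]


def chance_half_correlation(rankings, seed=0):
    """Split the judges into two chance halves and correlate their ratings."""
    order = list(range(len(rankings)))
    random.Random(seed).shuffle(order)  # the "chance" selection of judges
    half = len(order) // 2
    group_a = [rankings[j] for j in order[:half]]
    group_b = [rankings[j] for j in order[half:]]
    return pearson(mean_ranks(group_a), mean_ranks(group_b))


# Invented data: six judges each rank the same five teachers (1 = best).
RANKINGS = [
    [1, 2, 3, 4, 5],
    [2, 1, 3, 4, 5],
    [1, 2, 4, 3, 5],
    [1, 3, 2, 4, 5],
    [1, 2, 3, 5, 4],
    [2, 1, 3, 4, 5],
]

r_halves = chance_half_correlation(RANKINGS)
print(f"r between chance halves: {r_halves:+.3f}")
```

A high value shows only that the two halves of the judges agree with each other. As the text goes on to argue, it does not show that the named trait was the thing actually judged.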
THE ABSENCE OF ANALYSIS IN RATING

To find out how much actual analysis is made when judgments for specific traits are recorded, we shall correlate the ratings for general teaching ability with general intellectual ability; general teaching ability with skill in discipline; general intellectual ability with skill in discipline; and then interpret the correlations which have thus been obtained.

What relation is there between ability to teach and intellectual ability when both traits are judged by mutual ratings? As we have here two independent measures for each trait, we can correct for attenuation and get a reliable finding. The independent measures are, of course, the two ratings of the two chance groups, "A" and "B," mentioned before.

THE CORRELATIONS BETWEEN (I) GENERAL TEACHING ABILITY AND (II) GENERAL INTELLECTUAL ABILITY WHEN THE SAME JUDGES JUDGE THE SAME TEACHERS FOR TWO TRAITS FOLLOW

                                   r between          r between          r between Traits I and II,
                                   I A and II B (1)   II A and I B (2)   Corrected for Attenuation
Town A   Grade teachers            +.927 ±.019        +.802 ±.049        +.957 ±.011
         High-school teachers      +.925 ±.037        +.822 ±.083        +.937 ±.030
Town B   Grade teachers            +.899 ±.032        +.919 ±.026        +1.000+ (3)
         High-school teachers      +.919 ±.043        +.791 ±.103        +.925 ±.041
Town C   Grade teachers            +.461 ±.143        +.859 ±.047        +.713 ±.089
         High-school teachers      +.944 ±.034        +.260 ±.294        +1.000+ (3)
Average (weighted for size
of group)                          +.847 ±.018        +.819 ±.026        +.935 ±.014

(1) I A and II B is read: the correlation between teaching ability as rated by one group of judges and intellectual capacity as rated by another group of judges.
(2) II A and I B is read the same, except that the groups of judges estimate the traits in reverse order.
(3) The two correlations above +1.0 are of course wrong, in the sense that there cannot be more than perfect correspondence.
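The text does not print the formula used to correct for attenuation. The sketch below assumes it is Spearman's correction: the geometric mean of the two cross-correlations divided by the geometric mean of the two chance-half agreements. The function name `correct_for_attenuation` is hypothetical. The inputs are figures printed in this chapter for the Town A high-school teachers on general intellectual ability and skill in discipline, the one pair of traits for which all four required values appear in this chapter: cross-correlations +.525 and +.805, chance-half agreements +.967 and +.896.

```python
from math import sqrt


def correct_for_attenuation(r_xa_yb, r_xb_ya, r_xx, r_yy):
    """Spearman's correction for attenuation (assumed, not stated in the text).

    r_xa_yb, r_xb_ya: trait X rated by one half of the judges correlated with
                      trait Y rated by the other half, in both directions.
    r_xx, r_yy:       agreement between the two chance halves on each trait.
    All four values are taken to be positive.
    """
    return sqrt(r_xa_yb * r_xb_ya) / sqrt(r_xx * r_yy)


# Town A high-school teachers: intellectual ability (II) and discipline (VI).
corrected = correct_for_attenuation(
    r_xa_yb=0.525,  # II A and VI B
    r_xb_ya=0.805,  # II B and VI A
    r_xx=0.967,     # chance halves, general intellectual ability
    r_yy=0.896,     # chance halves, skill in discipline
)
print(round(corrected, 3))  # 0.698
```

Because the divisor is less than 1, the corrected figure always exceeds the geometric mean of the raw cross-correlations. When chance-half agreement is low, as with the Town C high-school teachers, the quotient can pass +1.0, which is how the impossible entries in the table arise.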
The correlation between general teaching ability and general intellectual ability, when weighted for size of groups, is +.935 ±.014. At first glance, it would seem as if there were an astoundingly high mutual relationship between ability to teach and general intellectual ability. The correlations, however, should be given more than passing notice.

First, let us go back to the eleven traits on the original rating sheets. In a sense, these original rating sheets might be considered score cards extended from one person judged and one person rating to many persons judged and many persons scoring. We might further think that general teaching ability is a composite of the other ten traits which have been mentioned. At least these are among the more important traits mentioned on score cards in general. Our correlation of +.935 ±.014 between general teaching ability and general intellectual ability could be variously interpreted. It is exceedingly important that the interpretation should be correct.

First, we might conclude that the judges kept general teaching ability and general intellectual ability clearly distinct from each other in their minds when they were rating and that they actually found that there was this high mutual relationship between intellect and pedagogic skill. From this reasoning it might fairly be held that the stronger a teacher was the abler she would be mentally, and, conversely, that mental vigor implies a corresponding degree of teaching power.

Second, on the other hand, this correlation of +.935 ±.014 may be interpreted as an inability on the part of the judges to distinguish effectively between teaching strength and intellectual capacity in persons judged by them. In other words, a judge has a certain opinion of a teacher in toto, and his opinion is given according to his general impression in answer to any significant question about that teacher.
Thus, the general estimate may be taken to permeate all particular judgments, and, conversely, particular judgments are simply defenses for, or justifications of, the general opinion which has thus been held.

To make this still clearer, let us assume that a person likes a certain picture. If this liking is strong enough, it will not vary from whatever point of view the picture may appear. Let it stand on the right of the person; he will still like it. Let him see the picture from the left; he will still like it. The total effect being pleasing, it will not be hard so to rationalize his thinking that the background, the middle, and the foreground will all appear to be well painted. The detail will be correct or overlooked, and the main features will be good or easily condoned. We can very well term this process the spreading of a halo of general effect to all particular parts.

So it might well be in judging a teacher. Looked at from the right or the left, from the aspect of intellect or from that of general ability to teach, the general opinion will still be present and will be the basis upon which the judgment is formed. This is apparently the most reasonable interpretation of the correlation.

In many of our school practices we have assumed for ourselves the ability to analyze an organic whole and an ability to judge the parts of a person, irrespective of the whole; but, when we actually check up our mental processes, we see that this ability, if it exists at all, exists in a very small degree. It appears that this spread of the general estimate enters into our particular judgments to a degree little before expected. For it is to be doubted if anyone would seriously hold that a correlation of +.935 ±.014 really existed between general teaching ability and general intellectual ability.

The reader will remember that in about 100 cases we determined intellectual differences by means of standardized tests.
The correlation between general teaching ability and intellect, as measured by tests, was extremely low (+.164 was the average). Either the tests are not measures of intellect at all and hence the correlation +.164 is false, or the judgments of intellect include so many other qualities that they really are not judgments of intellect at all and the +.935 correlation is false.

It should be remembered that teachers are already a highly selected group. There could hardly be any correlation of +.935 between any two traits which were not practically identical. Since we know that the tests which were used were more than indifferent tests of what goes by the name of intellect, we are fairly correct in our conclusion that the correlation of +.935 between general teaching ability and general intellectual ability as estimated by judgments shows not an estimate of intellect to have been made, but rather an estimate of general ability under the name of intellect. The analysis in these instances simply was not made!

THE CORRELATION BETWEEN ABILITY TO TEACH AND SKILL IN DISCIPLINE

We have two separate measures of general teaching ability and two separate measures of skill in discipline. Correlations between the "A" group of judges' estimates for general teaching ability and the "B" group of judges' estimates for skill in discipline are recorded under the caption I A and VI B. The correlations under the caption I B and VI A are the correlations between the "B" group of judges' estimates of general teaching ability and the "A" group of judges' estimates for skill in discipline. The third column gives the correlations which have been corrected for attenuation.

                                                                         r between Trait I and Trait VI,
                                   I A and VI B       I B and VI A       Corrected for Attenuation
Town A   Grade teachers            +.776 ±.055        +.712 ±.068        +.787 ±.052
         High-school teachers      +.650 ±.149        +.829 ±.080        +.789 ±.094
Town B   Grade teachers            +.686 ±.089        +.580 ±.112        +.699 ±.085
         High-school teachers      +.703 ±.140        +.767 ±.081        +1.000+ (1)
Town C   Grade teachers            +.679 ±.098        +.696 ±.094        +.824 ±.058
         High-school teachers      +.900 ±.060        +.625 ±.192        +.964 ±.022
Average (weighted for size
of group)                          +.741 ±.036        +.703 ±.040        +.789 ±.001

(1) Correlations above +1.0 are of course wrong, in the sense that there cannot be more than perfect correspondence.

The correlation between general teaching ability and skill in discipline, when weighted for size of groups, is +.789 ±.001. Here again we find a higher correlation than we would ordinarily expect. It is not higher than that between general teaching ability and general intellectual ability, although we would certainly hold that it should be. This is accounted for by the fact that disciplinary skill can be better judged than intellect, and, therefore, the tendency to spread a judgment might be lessened; but there is more of the explanation in the fact that discipline was the sixth trait that was rated. By the time that the sixth column is reached there is a fairly definite temptation to vary ratings as a matter of principle, or as a device to relieve monotony, or simply because one wants to.

It is fair to assume that, if discipline had been the second rather than the sixth trait to be rated, the correlation would have been higher. By how much is, of course, uncertain. As far as tradition goes, the correlation should have been higher between general teaching ability and skill in discipline than between general teaching merit and intellectual strength. For it is everywhere assumed that in public-school teaching, skill in discipline is the first requisite. The fact that 153 teachers in groups rating each other found a higher mutual relationship between general teaching ability and general intellectual ability than between general teaching ability and skill in discipline is, to say the least, interesting.
It is also hard to account for, except by the fact that judgments of particular traits are really defenses of the general estimate rather than estimates of particular traits which have been considered in isolation.

Of course, governing skill is a constituent of good teaching, but that the true correlation is as high as +.787 is to be much doubted. If it were true, it should mean that the drill sergeant would be the best teacher. It would also imply that mere order-keeping was a larger part of instruction than we believe it to be. The factor of spread of general opinion is also present here.

The correlation between general intellectual ability and skill in discipline, when weighted for size of groups, is +.719, and when corrected for attenuation is +.863, ±.020.

The correlation between Trait II and Trait VI, when corrected for attenuation, gives the final correlation between general intellectual ability and skill in discipline. This correlation, instead of revealing the fact of the case, is, if taken at its face value, nothing short of preposterous. Were this really the truth, what a prodigy of intellect the "strict," but often dull, teacher would be! If we thus generalized, we would also hold that Grant, admittedly a past master in control, also towered above Lincoln in mental stature.

THE CORRELATION BETWEEN GENERAL INTELLECTUAL ABILITY AND SKILL IN DISCIPLINE

                                                                         r between Trait II and Trait VI,
                                   II A and VI B (1)  II B and VI A (2)  Corrected for Attenuation
Town A   Grade teachers            +.700 ±.070        +.800 ±.049        +.941
         High-school teachers      +.525 ±.187        +.805 ±.090        +.698
Town B   Grade teachers            +.756 ±.072        +.609 ±.106        +.824
         High-school teachers      +.583 ±.183        +.789 ±.104        +.968
Town C   Grade teachers            +.915 ±.029        +.663 ±.102        +.932
         High-school teachers      +.024 ±.316        +.750 ±.297        +.245
Average (weighted for size
of group)                          +.697 ±.042        +.741 ±.036        +.863

(1) II A and VI B is read: the correlation between Group A judges' estimate of general intellectual ability with Group B judgments of skill in discipline, when both groups of judges are judging the same teacher.
(2) II B and VI A is similarly interpreted.

THE INFLUENCE OF GENERAL ESTIMATE

The factor of spread of general opinion to particular traits is here well illustrated. We must remember that these teachers are a relatively selected group for general intellectual ability. All are graduates of normal school or college, and this would tend to lower the correlation. Of course, there is some correlation between general intellectual ability and skill in discipline. A stark fool could not control a class, but common sense would prohibit us from believing that any such mutual relationship as a correlation of +.863 suggests is the actual fact. We would also deny, irrespective of these or any other data likely to be presented, that there was no closer relationship between teaching ability and discipline than between intellect and discipline. And yet our findings, if interpreted literally, show this.

This factor of spread of general estimate can be illustrated in another way. Allow, for the purpose of the illustration, that the supervisors' estimates of general teaching merit adequately represent the facts. If we correlate what the teachers rated as intellect with what the supervisors rated as general ability, we get a valuable evidence that teachers rate teaching ability even when they are asked to rate general intellectual ability. The same thing can be done for teachers' estimates for skill in discipline and supervisors' ratings for general teaching ability. These correlations are as follows:
r between Supervisors' Estimate of General Teaching Ability and General Intellectual Ability as Estimated by Teachers

                                   As Estimated by        As Estimated by
                                   Group A Teachers       Group B Teachers
Town A   Grade teachers            +.883 (52 cases)       +.869
         High-school teachers      +.840 (15 cases)       +.885
Town B   Grade teachers            +.945 (35 cases)       +.971
         High-school teachers      +.611 (13 cases)       +.750
Town C   Grade teachers            +.768 (30 cases)       +.741
         High-school teachers      +.477 (10 cases)       +.999

r between Supervisors' Estimate of General Teaching Ability and Skill in Discipline as Estimated by Teachers

                                   As Estimated by        As Estimated by
                                   Group A Teachers       Group B Teachers
Town A   Grade teachers            +.786                  +.580
         High-school teachers      +.450                  +.708
Town B   Grade teachers            +.679                  +.729
         High-school teachers      +.562                  +.847
Town C   Grade teachers            +.759                  +.956
         High-school teachers      +.890                  +.772

The average correlation, weighting for size of group judgments, between supervisors' estimates of general teaching ability and mutual judgments of the teachers for general intellectual ability is +.876; between supervisors' estimates of general teaching ability and mutual judgments of the teachers for skill in discipline, +.744. We could not hold that any such relation really held between general teaching ability and either general intellectual ability or skill in discipline. These correlations are another and sufficient evidence of the fact that in analyzed judgments the factor of the spread of the general estimate is present in a most vicious form.
The factor of spread is shown by these data:

CORRELATIONS, CORRECTED FOR ATTENUATION, BETWEEN GENERAL TEACHING ABILITY AND GENERAL INTELLECTUAL ABILITY, WHEN BOTH ARE JUDGED BY GROUPS OF TEACHERS

Town A   Grade teachers            +.957
         High-school teachers      +.937
Town B   Grade teachers            +1.000
         High-school teachers      +.925
Town C   Grade teachers            +.713
         High-school teachers      +1.000
Average (weighted for size of groups)   +.935 ±.014

CORRELATIONS BETWEEN GENERAL TEACHING ABILITY AND SKILL IN DISCIPLINE WHEN BOTH ARE JUDGED BY GROUPS OF TEACHERS

Town A   Grade teachers            +.787
         High-school teachers      +.789
Town B   Grade teachers            +.698
         High-school teachers      +1.000
Town C   Grade teachers            +.824
         High-school teachers      +.964
Average (weighted for size of groups)   +.789 ±.041

CORRELATIONS BETWEEN SKILL IN DISCIPLINE AND GENERAL INTELLECTUAL ABILITY, WHEN BOTH ARE JUDGED BY GROUPS OF TEACHERS

Town A   Grade teachers            +.941
         High-school teachers      +.698
Town B   Grade teachers            +.824
         High-school teachers      +.968
Town C   Grade teachers            +.932
         High-school teachers      +.245
Average (weighted for size of groups)   +.863 ±.025

CORRELATION BETWEEN ABILITY TO TEACH, AS JUDGED BY SUPERVISORS, AND GENERAL INTELLECTUAL ABILITY, AS JUDGED BY GROUPS OF TEACHERS

Average   +.876

CORRELATION BETWEEN ABILITY TO TEACH, AS JUDGED BY SUPERVISORS, AND SKILL IN DISCIPLINE, AS JUDGED BY GROUPS OF TEACHERS

Average   +.744

It would seem that, when estimates are made of specific traits and such high correlations are obtained between the traits, a damaging factor of spread of general estimate must be allowed as a fact.

Some conclusions follow:

First, that teachers, when rating each other for specific qualities, such as intellect or skill in discipline, agree in their estimates. This is shown by correlating the ratings of the same teachers for the same trait. These correlations average +.858.
Second, that when ratings are made for specific qualities, a correlation between these ratings and those for general teaching ability is so high that a very great spread of the general estimate is present in the judgments for particular qualities.

Third, this factor of the halo of general estimate, being present in particular judgments, is further shown by correlating the ratings for general teaching ability as they are given by the supervisors with ratings for particular qualities obtained by group judgments. Some average correlations are given below:

General teaching ability, as obtained by group judgments, with
  general intellectual ability, similarly obtained                 +.935 ± .014
General teaching ability, as obtained by group judgments, with
  skill in discipline, similarly obtained                          +.989 ± .001
General intellectual ability, as obtained by group judgments,
  and skill in discipline, similarly obtained                      +.863 ± .020
General teaching ability, as obtained by supervisors' estimates,
  with general intellectual ability, as obtained by group
  judgments                                                        +.876 ± .020
General teaching ability, as obtained by supervisors' estimates,
  with skill in discipline                                         +.744 ± .091

Fourth, when analysis is attempted, analysis is not obtained, but ratings are obtained and these ratings are vitally influenced by the general estimate.

It might be urged that this factor of spread of general estimate was greatly stimulated by the method of scoring, by the nature of instructions which were given, or by other reasons.

FAILURE OF ATTEMPTS AT ANALYSIS

To check up the factor of spread in other circumstances, the analyzed ratings have been obtained of 129 teachers in a New York school system. Here a regular Boyce score card was used and the teachers were rated by their superintendent, by their respective principals, and by their supervisors. Each teacher was rated for forty-five distinct qualities. A list of these qualities is found on page 64.
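The spread of general estimate described in the conclusions above can be illustrated with a small simulation. This is a sketch on invented data, not a reanalysis of the ratings. It assumes two truly unrelated traits and a rating rule that mixes each true trait with the judge's general impression of the teacher. The names `simulate_halo` and `halo_weight`, and the mixing rule itself, are assumptions made for illustration only.

```python
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_halo(halo_weight, n_teachers=2000, seed=1):
    """Correlation between ratings of two truly independent traits.

    Each rating = halo_weight * general impression
                  + (1 - halo_weight) * true trait.
    All three quantities are independent standard normal draws (assumed).
    """
    rng = random.Random(seed)
    trait_a, trait_b = [], []
    for _ in range(n_teachers):
        general = rng.gauss(0, 1)   # judge's overall estimate of the teacher
        true_a = rng.gauss(0, 1)    # e.g. voice
        true_b = rng.gauss(0, 1)    # e.g. interest in community
        trait_a.append(halo_weight * general + (1 - halo_weight) * true_a)
        trait_b.append(halo_weight * general + (1 - halo_weight) * true_b)
    return pearson(trait_a, trait_b)


print(round(simulate_halo(0.0), 2))   # no halo: ratings reflect the traits only
print(round(simulate_halo(0.5), 2))   # half of each rating is general impression
```

Under this assumed rule the expected correlation is w²/(w² + (1 − w)²) for halo weight w, so a judge who lets general impression count for half of each rating produces a correlation near +.5 between traits that are in fact unrelated.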
Some of the usual errors in rating were present. One error that is worth mentioning is that, although the instructions specifically pointed out that good means above the average, the distributions of the ratings were in part skewed somewhat sharply from a normal distribution.

Within any group of sufficient size it may be assumed for statistical purposes that the following distribution will satisfactorily represent the facts: 10 per cent very poor; 20 per cent poor; 40 per cent medium or average; 20 per cent good; and 10 per cent excellent. The distributions of the ratings which were given follow:

General Teaching Ability:
              No.   Per Cent   Per Cent Normally Expected
Very poor       0      0.0       10
Poor            5      3.9       20
Medium         34     27.8       40
Good           71     55.0       20
Excellent      19     14.7       10

Skill in Discipline:
              No.   Per Cent   Per Cent Normally Expected
Very poor       1      0.7       10
Poor            4      3.0       20
Medium         27     20.9       40
Good           61     47.2       20
Excellent      36     27.9       10

General Intellectual Ability:
              No.   Per Cent   Per Cent Normally Expected
Very poor       0      0.0       10
Poor            0      0.0       20
Medium         32     24.8       40
Good           80     62.0       20
Excellent      17     13.1       10

The distributions for the ratings in voice are similarly massed and are also high. The distributions for the other traits for which ratings were made have not been worked out.

The extent of the mismarking can be seen in the case of general intellectual ability. Assuming the least possible error and assuming the approximate truth that a normal distribution of mental strength will be present in a group as large as that which has been here considered, both of which assumptions may fairly be made, we found that only 17 per cent of the teachers received a proper rating.
We also found that 16 per cent of the teachers were rated two steps too high. The remainder were misrated by one step. All were "rated up." The same fault thus appeared in score-card rating as in general-estimate rating. This skewing of the distributions is of importance to us, not because it shows that actual ratings hardly correspond with the probable facts, but because it reduces the number of groups or the spread of the distribution. When correlations between traits are computed from data so greatly restricted in range, the correlations are lowered considerably, not because a low correlation is the ultimate fact, but because the lack of spread in the distribution reduces the correlation mathematically. This rather technical consideration need not unduly concern us, for the factor of spread of judgment may be shown in quite another way, through correlations between qualities so large that only undue spread of judgment can account for them. Eight correlations of traits follow:

General teaching ability with general intellectual ability      +.677 ± .03
General teaching ability with skill in discipline               +.787 ± .02
General teaching ability with voice                             +.632 ± .04
General intellectual ability with voice                         +.625 ± .04
General intellectual ability with skill in discipline           +.560 ± .04
Voice with interest in community                                +.500 ± .04
Voice with skill in discipline                                  +.438 ± .06
Skill in discipline with morals                                 +.333 ± .11

These ratings were made, of course, entirely independent of this study and under circumstances calling for unusual care and thoroughness.

Common sense would tell us that the correlation between voice (defined on the score card as "voice - pitch, quality, clearness of school-room voice") and interest in community is probably zero, but here it was found to be +.500, while voice and discipline was +.438, and general intellectual capacity and voice was +.625.
The sizes of the correlations do not correspond to the importance of the relationships.

These data are worthy of more extended treatment than the correlations on page 61 would indicate. The inter-correlations (120 in number) have been computed for the following traits: general appearance, health, voice, intellectual capacity, accuracy, self-control, sense of justice, academic preparation, interest in the life of community, ability to meet and interest patrons, professional interest and growth, use of English, discipline (governing skill), attention to individual needs, and general development of pupils.

The Correlations between 15 Traits as Found in the Score-Card Rating of 129 Teachers of a City in New York State

                                                  With 30.       With 45.
Trait                                             Discipline     Moral Influence
 1. General Appearance                              .513           .482
 2. Health                                          .491           .427
 3. Voice                                           .438           .172
 4. Intellectual Capacity                           .560           .173
 7. Accuracy                                        .461           .218
11. Self-Control                                    .503           .467
14. Sense of Justice                                .647           .529
15. Academic Preparation                            .419           .423
20. Interest in the Life of the Community           .362           .376
21. Ability to Meet and Interest Patrons            .584           .499
24. Professional Interest and Growth                .621           .589
26. Use of English                                  .572           .427
30. Discipline                                                     .333
40. Attention to Individual Need                                   .621
43. General Development of Pupils                                  .290

General Appearance with Health: .727
[The remaining entries of the table are not legible in the source.]

The most obvious fact about these correlations is monotonous similarity. They do not vary with the relevance of the relationships. Most of them are too high. This fact illustrates again the factor of spread of general estimate, which can be shown in still a different fashion. The distribution of the correlations follows:

Frequency   Correlation Range
    4       +.1 to +.2
    2       +.2 to +.3
   15       +.3 to +.4
   38       +.4 to +.5
   30       +.5 to +.6
   19       +.6 to +.7
    8       +.7 to +.8
    3       +.8 to +.9
True average +.5

1.
Suppose there were present in these correlations 100 per cent spread of general estimate, and that the correlation typical of the halo effect was +.5. We should not expect all the correlations to be exactly +.5. They would vary or be grouped around +.5 as a mean in a normal probability curve. That is, they would tend to be at +.5, but some would be a little above and some a little below. The probable error of the +.5 correlation, the number of individuals being 126, is ±.068 by the formula S.D. = (1 − r²)/√N. Using ±.068 as the S.D. of the probability curve, we can plot the position of the 120 correlations as they would occur if pure chance only were operating. If we place the distribution of the correlations as we have them upon a distribution as they would occur by pure chance, then we can see very clearly what part of the total number of correlations needs other explanation than that of pure chance variation from a typical one. The following chart shows this comparison:

[Chart: COMPARISON OF NORMAL CURVE AND DISTRIBUTION OF CORRELATIONS. The solid line is the normal curve, the broken line the distribution of correlations. Central tendency in both cases +.5.]

2. Of the 120 correlations, 15 lie beyond the limits of pure-chance variations from a mean and 105 lie within the limits of chance variations. Within these limits the positions of the correlations are not far dissimilar from what they would be if chance only were operating. For 105 of the correlations no other facts than 100 per cent spread of general estimate and chance variation are needed.

The reader must determine for himself how significant the other 15 are. The 15 correlations which are not explainable by chance follow:

Voice and Moral Influence                                            +.172
Intellectual Capacity and Moral Influence                            +.173
Intellectual Capacity and Academic Preparation                       +.126
Academic Preparation and Ability to Meet and Interest Patrons        +.158
Accuracy and Moral Influence                                         +.218
General Appearance and Health                                        +.727
Accuracy and Attention to Individual Needs                           +.725
Self-Control and Sense of Justice                                    +.872
Sense of Justice and Ability to Meet and Interest Patrons            +.766
Sense of Justice and Use of English                                  +.771
Sense of Justice and Attention to Individual Needs                   +.822
Professional Interest and Growth and Use of English                  +.748
Use of English and General Development of Pupils                     +.720
Discipline and General Development of Pupils                         +.787
Attention to Individual Needs and General Development of Pupils      +.807

It will be seen that 5 of these correlations are too low and 10 are too high for explanation on a basis of mathematical chance. Take the correlation of voice with moral influence of +.172. Why should voice correlate with moral influence so loosely and correlate +.682 with intellect, .628 with accuracy, .454 with academic preparation, and .500 with interest in the life of the community? Why should voice correlate with accuracy as highly as it does with skill in discipline? In fact, it is not clear why the 5 especially low correlations are the ones which they happen to be, instead of being a part of any other 50 that one might pick almost at random.

The correlations that are so high that they are beyond the range of chance variation from the mean are also a little difficult to explain from any necessary relationships. Three of them have general development of the pupils as one factor and use of English, discipline, and attention to individual needs as the others. Very likely these show true relationships, but the same data show equally high mutual relationships between sense of justice and use of English; sense of justice and ability to meet people; self-control and sense of justice.
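The chance limits used in this argument can be sketched as follows, assuming the conventional formula S.D. = (1 − r²)/√N for the variability of a correlation coefficient and a band of three S.D. on either side of +.5. Both the formula and the width of the band are assumptions; the text reports only the value ±.068 and does not say where the limits were drawn. The function names are hypothetical.

```python
from math import sqrt


def sd_of_r(r, n):
    """Conventional large-sample S.D. of a correlation coefficient (assumed)."""
    return (1 - r ** 2) / sqrt(n)


def chance_band(mean_r, n, k=3.0):
    """Limits mean_r +/- k S.D.; k = 3 is an assumption, not from the text."""
    sd = sd_of_r(mean_r, n)
    return mean_r - k * sd, mean_r + k * sd


# The 15 coefficients listed in the text as not explainable by chance.
listed = [0.172, 0.173, 0.126, 0.158, 0.218,
          0.727, 0.725, 0.872, 0.766, 0.771,
          0.822, 0.748, 0.720, 0.787, 0.807]

low, high = chance_band(0.5, 126)
outside = [r for r in listed if r < low or r > high]

print(round(sd_of_r(0.5, 126), 3))    # close to the reported .068
print(round(low, 3), round(high, 3))  # assumed chance limits around +.5
print(len(outside), "of", len(listed), "listed coefficients fall outside")
```

A narrower or wider band would change how many coefficients count as exceptional, which is one reason the choice of limits should be stated whenever this kind of comparison is made.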
Of the 120 correlations 105, or 87 per cent, could be explained by mere-chance variation, if the statement "there is as much correlation between any two traits as between any other two" were literally true. The remaining 15 coefficients of correlation perhaps can best be accounted for by a mixture of true insight and more or less of the usual amount of tendency to spread one's general estimate over particular judgments.

DATA ON BOYCE'S SCORE CARD

The score card upon which the ratings used in the last section were made is the work of A. C. Boyce. His study is reported in the Fourteenth Year-Book of the National Society for the Study of Education, Part IV. It is reported in the introduction of this study. This study is perhaps the most extended and the best one on the rating of teachers. Boyce's original data, as reported, contain some very good evidences of the factor of spread of general estimate. As this point is not stressed in the Boyce report, it might be well to close this discussion with a consideration of that report.

Boyce found the following correlations between general teaching ability and forty-five traits:

General Teaching Ability with                      r      Rank
 1. General Appearance                            +.47     43
 2. Health                                        +.56     39
 3. Voice                                         +.53     42
 4. Intellectual Capacity                         +.62     34
 5. Initiative and Self-reliance                  +.77     13
 6. Adaptability and Resourcefulness              +.80     11
 7. Accuracy                                      +.74     17
 8. Industry                                      +.69     24
 9. Enthusiasm and Optimism                       +.71     22
10. Integrity and Sincerity                       +.63     33
11. Self-control                                  +.66     30
12. Promptness                                    +.66     29
13. Tact                                          +.69     25
14. Sense of Justice                              +.61     36
15. Academic Preparation                          +.41     44
16. Professional Preparation                      +.38     45
17. Grasp of Subject-matter                       +.72     19
18. Understanding of Children                     +.76     15
19. Interest in the Life of the School            +.65     31
20. Interest in the Life of the Community         +.62     35
21. Ability to Meet and Interest Patrons          +.61     38
22. Interest in Lives of Pupils                   +.69     26
23. Cooperation and Loyalty                       +.66     28
24. Professional Interest and Growth              +.72     18
25. Daily Preparation                             +.68     27
26. Use of English                                +.55     40
27. Care of Light, Heat, and Ventilation          +.61     37
28. Neatness of Room                              +.54     41
29. Care of Routine                               +.64     32
30. Discipline (Governing Skill)                  +.79     12
31. Definiteness and Clearness of Aim             +.81     10
32. Skill in Habit Formation                      +.86      5
33. Skill in Stimulating Thought                  +.84      8
34. Skill in Teaching How to Study                +.84      7
35. Skill in Questioning                          +.72     20
36. Choice of Subject-matter                      +.85      6
37. Organization of Subject-matter                +.87      3
38. Skill and Care in Assignment                  +.82      9
39. Skill in Motivating Work                      +.74     16
40. Attention to Individual Needs                 +.76     14
41. Attention and Response of the Class           +.86      4
42. Growth of Pupils in Subject-matter            +.87      2
43. General Development of Pupils                 +.88      1
44. Stimulation of Community                      +.70     23
45. Moral Influence                               +.71     21

These correlations cannot be taken at their face value for two reasons: (a) They have not been corrected for attenuation and, hence, are far too low. (b) The procedure by which they were obtained injects an error which would make them too high.

These correlations are based upon data collected from 39 schools. That is, in 39 schools some judge rated the teachers for the traits which have been mentioned. All the ratings were then put on one correlation table and the mutual relationships were worked out. Had the correlation for each of the 39 original ratings been worked out and the mean and variability of the distribution of these correlations been given, we should know what we had. When, however, ratings from 39 sets of teachers and from as many different judges are combined before they are correlated, we do not know just what we have. At best, the resulting correlations form a composite of the correlations between the respective pairs of traits, plus an erroneous pooling of sets of data.
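Why pooling ratings from many judges can raise a correlation may be seen from a small simulation. This is a sketch on invented data under one assumed mechanism, namely that judges differ in leniency from school to school; the text does not say which mechanism was at work in Boyce's data. Within every simulated school the two traits are unrelated. The function `simulate_pooling` and all its parameters are hypothetical.

```python
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_pooling(n_schools=39, teachers_per_school=30, seed=7):
    """Return (mean within-school r, r computed on the pooled table)."""
    rng = random.Random(seed)
    pooled_x, pooled_y, within = [], [], []
    for _ in range(n_schools):
        # One judge per school; the judge's leniency shifts every rating
        # that judge gives, on both traits alike (assumed mechanism).
        leniency = rng.uniform(-2, 2)
        xs = [leniency + rng.gauss(0, 1) for _ in range(teachers_per_school)]
        ys = [leniency + rng.gauss(0, 1) for _ in range(teachers_per_school)]
        within.append(pearson(xs, ys))
        pooled_x.extend(xs)
        pooled_y.extend(ys)
    mean_within = sum(within) / len(within)
    return mean_within, pearson(pooled_x, pooled_y)


mean_within, pooled = simulate_pooling()
print(round(mean_within, 2), round(pooled, 2))
```

Reporting the separate within-school coefficients together with their mean and variability, as the text recommends, keeps differences between judges from being counted as a relation between traits.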
The result of such pooling is by no means a simple correlation between the traits, such as Boyce took for granted. Boyce gave as statistical reference Thorndike's Mental and Social Measurement, page 172 et seq. Neither Thorndike nor any other statistician would justify statistical liberties of the kind that Boyce took. It is also impossible to compute the reliability of his coefficients of correlation.

Assuming, moreover, that these two errors, lack of correction and improper treatment of data, check each other and that, by good fortune, the correlations as presented are true, what are the probabilities of the correlations harboring a vicious spread of general estimate? The only reference Boyce made to the factor of spread or absence of analysis is on page 42, where he discussed the value of judging for separate traits. "The topics must not be too few," he said, "for either they will be so general that little analysis is made, or, if not general, they will be sure to leave out important points."

In discussing the significance of the correlations between general teaching ability and specific traits, Boyce does not seem to have been much impressed with the factor of spread. Before assuming its absence, however, he should have computed the correlations between the respective pairs of traits. In other data we find that the correlations between pairs of traits cover the same range as do the correlations between specific traits and general teaching ability. This computation was not made by Boyce, nor can it be made from any of his reported data. The best evidence for the presence or absence of spread of general estimate is, therefore, not available. The distribution of the correlations which he did report, those between general teaching ability and the specific traits, is as follows:

Frequency   Correlation Range
    0       +.300 to +.350
    1       +.350 to +.400
    2       +.400 to +.450
    3       +.450 to +.500
    4       +.500 to +.550
    5       +.550 to +.600
    7       +.600 to +.650
    8       +.650 to +.700
    8       +.700 to +.750
    4       +.750 to +.800
    5       +.800 to +.850
    6       +.850 to +.900

The average correlation between general teaching ability and specific traits is +.70.
This is, of course, far too high. Unfortunately, we do not know what variations from +.70 pure chance would explain. Within a variation of ±.10, 60 per cent of the correlations fall. Further, within a variation of ±.15, 85 per cent of the correlations fall. If teaching is a complex process and if the traits recorded are distinct and specific, even to a small degree, the fact that 85 per cent of the correlations are found within a range of ±.15 is exceedingly suggestive of the presence of this spread of general estimate.

A cursory examination of the individual correlations suggests the same thing. There is only .03 difference between the correlation of academic preparation and the correlation of professional preparation with general teaching ability! Within a range of .04 come the correlations of industry, enthusiasm, tact, grasp of subject-matter, interest in lives of the pupils, professional growth, daily preparation, skill in questioning, moral influence, and stimulation of community with general teaching ability. The least important trait of all is professional preparation! Within ±.04 the following are of the same significance: care of routine, intellectual capacity, integrity and sincerity, self-control, promptness, sense of justice, interest in the life of the school, interest in the life of community, ability to meet and interest patrons, cooperation and loyalty, daily preparation, and care of light, heat, and ventilation. This decided monotony of the size of the correlations, which are obviously too high, is patent witness of the presence of spread of general estimate.

In our consideration of the correlations between general teaching ability, intellectual strength, and skill in discipline for Towns A, B, and C, the fact that analysis of general worth into specific traits was not as complete as one would ordinarily have supposed is statistically demonstrated.
When the ratings of 129 teachers in a New York school system for 45 separate traits were examined, the evidence again showed that analyzed judgments are far from being beyond question. In Boyce's study, as reported, while complete statistical treatment is out of the question, the correlations, as given, do not show the range that common sense would lead us to expect. Their monotonous similarity also suggests that, when analyzed judgments are attempted, the influence of general estimate is so strong that the resulting analyses are perhaps even more justifications of the general estimates than they are judgments of the specific trait.

The purpose of this chapter has been to present data which show that general estimate permeates judgments of specific traits to a degree which has not hitherto been sufficiently emphasized.
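The monotony of Boyce's coefficients, discussed in this chapter, can be checked directly from the 45 values in his table. The sketch below stores them in hundredths so that the comparisons are exact. The helper `share_within` is hypothetical, and counting values that lie exactly on a limit as inside the band is an assumption, so the shares it prints need not agree exactly with the rounded percentages quoted in the text.

```python
# Boyce's 45 correlations with general teaching ability, in hundredths,
# in score-card order (traits 1 to 45).
boyce_r = [
    47, 56, 53, 62, 77, 80, 74, 69, 71, 63,
    66, 66, 69, 61, 41, 38, 72, 76, 65, 62,
    61, 69, 66, 72, 68, 55, 61, 54, 64, 79,
    81, 86, 84, 84, 72, 85, 87, 82, 74, 76,
    86, 87, 88, 70, 71,
]


def share_within(values, center, half_width):
    """Fraction of values lying within center +/- half_width (inclusive)."""
    hits = sum(1 for v in values if abs(v - center) <= half_width)
    return hits / len(values)


mean_r = sum(boyce_r) / len(boyce_r) / 100
print(round(mean_r, 2))                           # average coefficient
print(round(share_within(boyce_r, 70, 10), 2))    # share within +/- .10 of .70
print(round(share_within(boyce_r, 70, 15), 2))    # share within +/- .15 of .70
```

The same helper can be pointed at any other center or band width to see how quickly the coefficients crowd together around their average.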