Cornell University Library

The original of this book is in the Cornell University Library. There are no known copyright restrictions in the United States on the use of the text.

http://www.archive.org/details/cu31924030584019

Non-Verbal Intelligence Tests for Use in China

By Herman Chan-En Liu

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in the Faculty of Philosophy, Columbia University

Published by Teachers College, Columbia University
New York City
1922
Copyright 1922, by Herman Chan-En Liu

To my friends
Emilie Bretthauer
James H. Franklin
Andrew MacLeish

ACKNOWLEDGMENTS

Grateful thanks and appreciation are hereby expressed to Professor Edward Lee Thorndike, of Teachers College, Columbia University, under whose almost daily guidance and inspiration this study has been carried out; to Miss Margaret P. Rae, principal of New York Public School No. 108, for her ready cooperation and assistance in making the experiment successful; and to Professor William Anderson McCall, Professor Henry Alford Ruger, Miss Ella Woodyard, of Teachers College, and a host of others for much valuable aid.

Herman Chan-En Liu

CONTENTS

CHAPTER
I. Introduction
   A. The Problem
   B. Intelligence Examinations in China
   C. The Development of Non-Verbal Tests in America
II. The Experiment
   A. The Preliminary Plan
   B. Tests Used in the Experiment
   C. Method of Procedure
III. Formation of a Criterion
   A. Elements of a Criterion
   B. Test Scores Weighting
   C. Method of Selection of the Final Criterion
IV. Selection of Test Elements
   A. Selection of Test Elements by Correlation Method
   B. Selection of Tests by Rating
   C. Selection of Tests by Partial Correlation
   D. Selection of Tests by Composite Method
   E. Weighting by Regression Equation
V. Re-testing
   A. Procedure of Re-testing
   B. Statistical Study
VI. Alternative Forms and Standardization
   A. Alternative Forms
   B. Standardization
      1. Norms
      2. Scaling
VII. The Chinese Non-Verbal Tests
   A. The Nature of the Tests
   B. Instructions to Examiner
   C. Directions for Giving the Tests
   D. Directions for Scoring the Tests
   E. Treatment of Results
   F. Caution
VIII. Summary and Conclusions
Appendices
   A. Samples of Form A of the Chinese Non-Verbal Intelligence Examination
   B. Samples of Records Kept
Bibliography

INDEX OF TABLES

I. Distribution of Age and the Numerical Values Assigned
II. Age Distribution Showing the Slope and the Increase of Scores as Age Advances (Data from Boys Who Have Taken the Pintner Non-Language Tests)
III. Grade Distribution of Pressey Scale
IV. Weighting of the Scales According to Q
V. Data for School Criterion (10 Selected Pupils)
VI. Data for Calculation of the Final Criterion (10 Selected Pupils)
VII. Correlations of Individual Tests with Final Criterion by Sheppard's and Product-Moment Methods
VIII. Correlations of Individual Tests with Combination of Beta 4 and 6, by Sheppard's Formula
IX. Correlations of the Different Scales with the Final Criterion and the Inter-Correlations of the Individual Tests
X. Correlations between the Individual Tests and the Basic Tests (Pintner 2 and 3, and Beta 6)
XI. Ratings of the Individual Tests by Competent Judges
XII. Individual Tests Rated re Application to Chinese
XIII. Correlations of the Individual Tests with the Final Criterion with the Elements of the Basic Tests Eliminated (r12.3)
XIV. Combined Value of the Individual Tests, as Determined by Ratings and Partial r Method
XV. Data for Calculation of Regression Equation
XVI. Distribution of Re-testing Scores by Grades
XVII. Distribution of Re-testing Scores by Ages

INDEX OF FIGURES

1. The Nine-Ring Puzzle
2. The Seven Mysterious Boards
3. Illustrations of the Seven Mysterious Boards
4. Illustrations of the Seven Mysterious Boards
5. Showing 27 Per Cent Overlapping of Grade III over Grade IV in the Scores of Pressey Primer Scale
6. Showing 21 Per Cent Overlapping of Grade III over Grade IV in the Scores of Myers Mental Measure
7. Showing 15.2 Per Cent Overlapping of Grade III over Grade IV in the Scores of Pintner Non-Language Tests
8. Showing 12 Per Cent Overlapping of Grade III over Grade IV in the Scores of Army Beta Examination
9. Showing 9.8 Per Cent Overlapping of Grade III over Grade IV in the Scores of Dearborn Group Tests of Intelligence

CHAPTER I

INTRODUCTION

A. THE PROBLEM

Psychological tests which have been applied in America with great success are now being experimented with in China. Progressive Chinese educators who are attempting to introduce the measurement movement into China, however, are confronted with the problem of procuring and selecting suitable test material. China, with its distinctive civilization and numerous dialects, presents a difficult field for the literal transcription of the American intelligence tests. This difficulty virtually prevents a widespread use in China of the language test, and makes necessary the construction of a non-language test. The present study is an attempt to develop a non-verbal scale, which, because of the elimination of language and schooling factors, may be used in China as an independent measure of general intelligence or as a supplement to a language test.

B. INTELLIGENCE EXAMINATIONS IN CHINA

The practice of setting intelligence examinations is not new in China. It is as old as our history, although the traditional methods have been crude and pseudo-psychological.

The earliest methods, which still prevail, are Kan Hsiang, physiognomy-reading, and Shan Ming, fortune-telling. Pseudo-psychologists in the guise of fortune-tellers and popular physiognomists are found everywhere. They are frequently consulted by uneducated parents as to the intelligence of their children, whose careers and destinies they foretell. The calculations of these pseudo-psychologists are said to be based upon the hour and date of birth, and physiognomic and anthropometric characteristics.
The system of competitive examinations, employed in China for centuries, was a sort of intelligence test. Its purpose was the selection of candidates for civil service. Scholars gathered at the examination halls, which were located in every district. There they were confined in little cells in which they composed classical essays on assigned subjects. Examinations were conducted and papers graded by high government officials. The results were announced with great ceremony, and the successful candidates honored with "Kung Ming," the equivalent of American academic degrees. The practice was founded on the theory that only the intelligent and educated men should rule.

No age or birth qualifications were required for participation in these examinations. Youngsters under twelve years of age, however, were sometimes released from the rigid, formal standards. In such cases the regular examination was often replaced by a series of "opposites or matching tests," in which the applicants were required to match assigned words and phrases. For instance, "East" would be expected to be matched with the word "West"; "above" with "below"; "mountains" with "oceans." The following is a typical "Dui Dzi," or opposites test:[1]

(a) Chiang Fu Djoh Ma
(b) Wang Dzi Cheng Lung

The translation of matching phrase (a) with phrase (b) is as follows:[2]

(a) Consider Father Being Horse
(b) Expect Son Becoming Dragon

[1] A story relates that a certain farmer carried his young son on his back to the examination hall. The examiner, upon the arrival of the youngster, was surprised at his presence and inquired of him how he had managed to come all the way from his distant home. The boy replied: "I came on my father's back." The boy's answer at once suggested to the examiner a topic for the opposites test, so he said: "Well, if you can match the phrase which I am about to give you, you are passed." The examiner then requested the boy to match "Consider Father Being Horse." The clever child, without a moment's hesitation, replied: "Expect Son Becoming Dragon." He had matched the assigned phrase so well that he was given a pass without further examination.

[2] These are not strictly "opposites tests," as understood in America, but rather matching tests. They are comprehensive, requiring on the part of the examinee quick understanding and sound reasoning.

Of the old intelligence tests used in the schools of China, there were certain kinds called "Tien Dzih," that is, "completion tests." Some teachers occasionally employed these tests in judging the brightness of their pupils; others employed them as supplementary methods of teaching elementary composition. Problems in composition were often made by omitting a few words from a well-constructed sentence, necessitating the filling in of the blanks by the children.

A type of test similar to the puzzles used by Ruger is also quite common in China. The most famous of these puzzles is the "Kiu Lien Huan," a nine-ring puzzle (see Fig. 1), consisting of nine connected copper rings mounted on a bar with a rod running through the center of the rings. The puzzle is how to get the rod out of the rings, a task which requires reasoning, and which seldom is solved by the trial-and-error method. The ring puzzle is used merely as a toy, not as a formal test, yet one often hears the remark, "Solve this puzzle and let us see how bright you are."

Fig. 1. The Nine-Ring Puzzle.

"Performance tests" also have been in use for centuries in China. The most noted one is "Yih Chih Tu," also called "Tsih Chiao Pan" (see Figs. 2, 3 and 4). Translated literally into English, it would be called "Increasing Wisdom Board," or the "Seven Mysterious Boards." It was called the "Increasing Wisdom Board" because playing with it was believed to increase one's wisdom.
It was called the "Seven Mysterious Boards" because with the seven pieces of different shapes and sizes which made up the game, many forms of men, animals, birds, and inanimate objects could be constructed. The game was played by any number of persons, each with his own set of forms. The purpose was to see which person could construct the largest number of objects out of his seven pieces, the winner being considered the most intelligent person in the group.[3]

[3] It is said that the game originated in the ancient imperial palace among the women of the court, who, in the great amount of leisure time at their disposal, welcomed such sport. Later it became popular among the people.

Fig. 2. The Seven Mysterious Boards.
Fig. 3. Illustrations of the Seven Mysterious Boards: 1, man walking; 2, carriage; 3, man running; 4 and 5, two animals fighting.
Fig. 4. Illustrations of the Seven Mysterious Boards: candlesticks and different kinds of vessels.

The various tests which have been described here cannot be termed "intelligence tests" in the strictly psychological sense because they are not standardized. They are not extensively used as a measure of general intelligence, but rather as intellectual games. They do demonstrate, however, that the practice of intelligence examinations, although crude and pseudo-psychological, does exist and has existed in China for centuries. It is quite possible that some of these old methods and materials may prove useful in the construction of a genuine intelligence test for China.

It is only within the last few years that scientific psychological measurements have been introduced in China. The earliest known experimental work on the subject is that conducted by Dr. W. W. Creighton.[4] From 1915 to 1917, under the direction of Professor W. H. Pyle, of the University of Missouri, Dr. Creighton made a study of the mental and physical characteristics of Cantonese children. The subjects under examination numbered approximately five hundred, most of them ranging from ten to eighteen years in age, although twenty-five women were among those examined. The mental tests used in this experiment were those of rote memory, logical memory, substitution, analogies, and dot patterns. In conducting this experiment Dr. Creighton met with great difficulties as a result of the many dialects prevailing in this province. In his report he says: "In the mental measurements we were confronted at once by language difficulties."

In 1918 Professor G. D. Walcott[5] measured the intelligence of the students in the senior class in Tsinghua College, Peking, who averaged twenty-two years of age. Professor Walcott used the Stanford Revision of the Binet Scale, with the Scott Group Test as a check. The results of the experiment were not very satisfactory as, in addition to the insufficiency of the scale for persons of that age, the language difficulties were insurmountable.

[4] Pyle, W. H.: "A Study of the Mental and Physical Characteristics of the Chinese," School and Society, Vol. VIII, No. 192 (August 31, 1918), pp. 264-69.

[5] Walcott, G. D.: "The Intelligence of Chinese Students," School and Society, Vol. XI (1920), pp. 474-80.

Somewhat later, in the fall of 1920, the Nanking Government Teachers College tried psychological tests for the entrance examination. This is the first attempt made by Chinese educators to introduce scientific intelligence tests into China. Two psychologists educated in America, Professors H. C. Chen and S. C. Liao,[6] devised five tests. The correlation of these tests and the average grades of the regular examination was .39.[7] Psychological tests for entrance examinations were next taken up by the Peking Government Teachers College.
The correlation between the tests and the average grades of the regular examination was practically zero (.000046).[8]

[6] Journal of Educational Research, Vol. III, No. 5 (May, 1921), p. [page number illegible in the source].

[7] As this goes to press, the author has received a copy of Mental Tests in China, written by Professors H. C. Chen and C. S. Liao. It contains thirty-five different tests, twenty-four of which are translated from American tests.

[8] The data are found in the Peking Teachers College Weekly, No. 132 (September 11, 1921), p. 3, but the correlation was computed by the author by the product-moment method.

At the present time, Chinese progressive educators, especially those trained in America, are eager to introduce the use of psychological tests into China. Institutions such as Nanking and Peking Teachers Colleges, as indicated above, have already started the movement. A few private and missionary schools have also adopted some form of tests. The Stanford-Binet Scale has been translated, although it is little used. Aside from these isolated experiments, however, very little has been done. Psychological tests remain virtually unknown. Here lies a great unexplored field of endeavor for the young Chinese schoolman trained in modern scientific method. He needs to understand, however, the difficulties which the use of the numerous dialects and the large percentage of illiteracy offer to the use of any language scale. Evidently a non-verbal test may hope to succeed where the language test is totally inadequate. The development of such a non-language scale is the purpose of the present study.

C. THE DEVELOPMENT OF NON-VERBAL TESTS IN AMERICA

Psychological tests may be roughly classified into two main groups: namely, language tests and non-language tests. The former includes those tests which require verbal response from the subject. The latter group of tests does not require such verbal response. The non-language tests, again, may be subdivided into a group of performance tests which require the doing of some task by means of certain actual mechanical manipulations, and a group which require the subject to work with geometrical designs, figures or pictures, indicating the results of his thinking by making lines or pictures.

Non-language or non-verbal intelligence tests are the outgrowth of the intelligence measurement movement. Of recent development and used extensively only within the last two or three years, these non-verbal tests have shared the fame of the language tests.

Among the tests devised by Alfred Binet, father of the movement for the measurement of intelligence, and published in his 1905-06 series, are a number of tests which do not require verbal responses.[9] For example, in the visual coordinations test, the examiner moves a lighted match slowly before the subject's eyes and notes whether he follows the movement with the properly coordinated movements of the head and eyes. In the test known as "prehension provoked tactually," he places a small wooden cube in contact with the palm or the back of the subject's hand to determine whether he can execute properly coordinated movements of grasping. In the drawing test, he shows the subject two drawings, permits him to look at them for ten seconds, and then requires him to draw the views from memory. None of these tests expects a verbal response from the subject.

[9] The other non-language tests in the Binet-Simon 1905 series are tests numbered 3, 4, 5, 10, 12, 21, 22, 23, and 29. In the 1908 series the non-language tests are numbered 9, 10, 11, 12, 14, 16, 23, 24, 33, 54. For the complete account see A. Binet and T. Simon, "Le développement de l'intelligence chez les enfants," in L'Année psychologique, 14, 1908, pp. 1-94; and A. Binet and T. Simon, "L'intelligence des imbéciles," in L'Année psychologique, 1909, pp. 1-147.

The scale devised by these French psychologists, Alfred Binet and T. Simon, was first translated and adapted for American use by Goddard.[10] Kuhlman[11] and Wallin[12] followed with further adaptations. The latest revisions of the scale are by Yerkes, Bridges, and Hardwick,[13] by Terman[14] and by Herring.[15] They all adhere more or less closely to the original Binet Scale, and consequently some of their tests are non-verbal in nature.

[10] Goddard, H. H.: "The Binet-Simon Measuring Scale of Intelligence. Revised," Training School Bulletin, Vol. VIII (1911), pp. 56-62.

[11] Kuhlman, F.: "A Revision of the Binet-Simon System for Measuring the Intelligence of Children," Journal of Psycho-Asthenics, monograph supplement, No. 1, p. 41.

[12] Wallin, J. E.: Experimental Studies of Mental Defectives: A Critique of the Binet-Simon Tests.

[13] Yerkes, R. M., Bridges, J. W. and Hardwick, R. S.: A Point Scale for Measuring Mental Ability.

[14] Terman, Lewis M.: The Measurement of Intelligence.

[15] Herring, John P.: Significance of Certain Elements in Intelligence Examinations, unpublished Ph.D. dissertation (Columbia University).

In spite of the merits of the Binet-Simon Scale and its revisions, their chief deficiency lies in the large proportion of tasks requiring language responses. This criticism of the scale was vigorously presented by Ayres in 1911. He pointed out that the Binet tests predominantly reflect the child's ability fluently to use words, and do not reveal his ability to do acts. Thus, it gives "a warped and partial measure of his real degree of intelligence."[16]

The language difficulty, inherent in the Binet-Simon Scale and its various revisions, became evident when the clinical psychologists attempted to apply it in various fields of practical work. They found that the Scale was utterly inadequate for the mental examination of non-English speaking people, speech defectives, the deaf, and those with language difficulties.
Hence they introduced non-language tests which do not require language responses on the part of the child for adequate performance.

[16] Ayres, L. P.: "The Binet-Simon Measuring Scale for Intelligence: Some Criticisms and Suggestions," Psychological Clinic, Vol. V (1911), pp. 187-96.

Among those who first used the non-language test were Healy and Fernald.[17] In carrying out mental examinations at the Juvenile Psychopathic Institute of Chicago, they had been confronted with the problem of testing a cosmopolitan population. Some of the inmates were illiterate, and some, though educated in their own tongue, were unable to speak the English language. Since they represented most of the nationalities and languages of Europe, no single test requiring language directions and responses could be adequate to measure them. In discussing their work, Healy and Fernald say: "The Binet-Simon Scale helps little where the language factor is a barrier, either on account of foreign parentage or insufficient schooling, or with uneducated deaf and dumb children."[18] They became convinced that language, as far as possible, should be eliminated from the mental examinations given to such subjects. They say: "In predicting the possible development of an individual under various conditions, it is most desirable to ascertain the mental ability quite apart from the individual's experience in formal training in our language, or indeed any language. It often becomes necessary to classify mentally a subject who has had no education in English-speaking schools, or indeed who has had but little schooling of any kind."[19]

[17] Healy, W., and Fernald, G. M.: "Tests for Practical Mental Classifications," Psychological Monographs, Vol. 54, No. x, pp. 4-5.

[18] Ibid.

The work carried on at the Institute not only proved the inadequacy of the language tests, but demonstrated the practical value of the non-language tests.
Healy and Fernald conclude as follows: "On one occasion we found ourselves able to demonstrate satisfactorily that a Gypsy boy of fifteen, quite innocent of schooling and knowledge of the three R's, had at least fair, if not good, native ability. And repeatedly a number of our tests have proved most serviceable in mentally classifying young, deaf and dumb children."[19]

[19] Ibid.

Knox,[20] in his work among the immigrants at Ellis Island, found it impossible (even with the services of an interpreter) to use scales in which language responses were required. Faced with this language obstacle, and under the necessity to diagnose mental disease and mental deficiency among the immigrants, Knox devised a series of non-language tests, many of which are excellent and still widely used in psychological clinics.

Pintner and Patterson[21] also found the language scale "absolutely inadequate to test the mentality of deaf children." They experimented with the Binet-Simon Scale, but were confronted with numerous difficulties, such as lack of comprehension of certain tasks due to physical deficiency which in turn had made for lack in the environment of opportunity for forms of experience needed to acquire the proper test reaction. Consequently, they constructed a scale of performance tests which requires practically no instructions for the child other than natural gestures. Pintner and Patterson consider the non-language feature of the test as a sine qua non in the measurement of mentality in the deaf. As to the importance of the non-language tests, they say: "Here we have a group of individuals, completely shut off from hearing language, and for that reason laboring under a language difficulty that only in rare cases is surmounted to the extent of making them comparable in language ability to ordinary hearing individuals. Any kind of tests involving reading or spoken language cannot be used as a test of their mentality. If we employ such tests for measuring the mentality of the deaf and use the standardization obtained from hearing children, we will not be measuring mentality but merely difference in language ability. There may be a greater percentage of feeble-mindedness among the deaf than among the hearing but the fact that a deaf child does not measure up to the language standard of a hearing child is no indication of mental deficiency." The performance tests lately have been used not only for the deaf but also for the non-English-speaking children, speech defectives, and children from different language environments.

[20] Knox, H. A.: "A Scale Based on the Work at Ellis Island for Establishing Mental Defects," Journal of the American Medical Association, Vol. LXII (March 7, 1914), pp. 741-47.

[21] Pintner, R. and Patterson, D. G.: "The Binet Scale and the Deaf Child," Journal of Educational Psychology, Vol. VI (1915), pp. 202 ff.

The development of the non-language tests was greatly advanced, and their practical value definitely recognized, as a result of the United States Army psychological examinations.[22] In 1917, when the psychologists took up the personnel work in the Army they soon discovered that many of the men were handicapped by language difficulties. In order to permit the illiterates a real opportunity to show their ability, a non-language scale was constructed. Demonstration charts and pantomime were used to convey the instructions to the examinees. These methods require no language directions or responses. This scale, known as the Army Beta Examination, consisted of seven tests, including maze, cube analysis, X-O series, digit-symbol, number checking, pictorial completion, and geometrical construction. The scale was applied to 23,547 men. Its results correlate with the Army language examination Alpha to the amount of .80; with Stanford-Binet, .73; with the composite of Alpha, Beta, and Stanford-Binet, .91.
This high correlation demonstrates the practicability of making non-language tests and the feasibility of their use where the language tests fail utterly. The unexpected efficiency of the Army Beta Examination thus demonstrated during the war later brought about a mushroom growth of the non-verbal test material. Thorndike,[23] champion of the measurement movement, who had charge of much of the statistical work in the development of the Army tests, was first to utilize the data and experience gained from these tests. The Thorndike Non-Verbal Examination follows the general nature of the Army Beta, but eliminates one weakness by providing ten alternative forms of the examination instead of the single form, thus reducing the error in measurement caused by unfair tutoring. Such alternative forms widen the field of usefulness of tests in many ways, permitting a study of the growth of intelligence by repeated testing, comparison of groups and individuals and increased reliability in the determination of the intelligence of groups and individuals.

[22] Yoakum, C. S. and Yerkes, R. M.: Army Mental Tests.

[23] Thorndike, E. L.: "A Standard Group Examination of Intelligence Independent of Language," Journal of Applied Psychology, Vol. 3, No. 1 (March, 1919), pp. 13-32.

Pintner,[24] who with Patterson constructed the Performance Scale, has also, since the war, devised a non-language group intelligence test. He realized that his Performance Scale, although it required no language response, was still clumsy and not convenient for application to a group. Consequently, in his later scale he devised a set of six non-language tests for group use. When compared with the results obtained from the Binet-Simon Scale the correlation was found to be .66. He recommends that such tests be used in mental survey work for school children and adults, particularly in communities containing a large foreign or illiterate element.
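The coefficients quoted in this chapter (for example .80, .73, and .91 for the Army Beta Examination, and .66 for Pintner's group tests) summarize how closely two sets of scores agree. As a minimal sketch, assuming such figures are product-moment coefficients of the kind the author reports computing for the Peking data, the calculation may be illustrated as follows. The function name and the two lists of scores are hypothetical, invented purely for illustration; they are not data from this study or from any of the tests named above.

```python
import math


def product_moment_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each score from its own mean.
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical scores for six examinees (illustrative only, not from the study).
non_verbal_scores = [12, 15, 9, 20, 17, 11]
language_scores = [30, 34, 25, 41, 39, 27]

r = product_moment_r(non_verbal_scores, language_scores)
print(round(r, 2))
```

A coefficient near 1 means that examinees who rank high on one test tend to rank high on the other; a coefficient near zero, like the figure reported for the Peking entrance tests, means the two sets of scores are practically unrelated.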
In addition to the non-verbal tests which have already been discussed, there are many others available. Among the more well-known scales are Myers' Mental Measure, Pressey's Primer Scale, Kingsbury's Primary Group Intelligence Tests, and Dearborn's Group Intelligence Tests. All these tests have been widely employed, with varying degrees of success, by psychologists.

The rapidity of the development of non-language tests has been phenomenal and indicates that it meets an important need. In the Binet-Simon Scale, there were only a few tests which required no language responses. Then followed the performance scales, developed by Healy, Knox, Pintner and others, in which language responses are completely eliminated. The Army Beta Examination, with its wide application among the millions of soldiers, demonstrated its practical value for intelligence measurement and for group use. Others have succeeded in advancing the non-verbal tests beyond the experimental stage. These tests are now applied to individuals and groups, both as an independent measure and as a supplement to language tests, with confidence that the results are trustworthy and fairly adequate.

[24] Pintner, R.: The Mental Survey.

CHAPTER II

THE EXPERIMENT

A. PRELIMINARY PLAN

In drafting a preliminary plan for the experiment it was first decided to devise a large number of tests and to try them out on Chinese children in America. Since the purpose of the experiment was to develop a non-verbal intelligence scale for use in China, it appeared essential that the subjects be Chinese. Ten non-verbal tests were consequently constructed and mimeographed for trials. Fifty-one persons were examined with these tests, after which the examinations were discontinued as impracticable.
The reasons for the disuse of the examinations were threefold: first, the tests were constructed by the subjective method instead of by the objective or scientific method; second, the tests were mimeographed instead of being printed, causing the test material to be in many instances indistinct and difficult of recognition; third, Chinese subjects were scarce, and the few who were available were difficult to deal with. Three months' time and much labor had been expended, and naturally the results were discouraging. An important fact, however, was revealed by these examinations; namely, the children of naturalized Chinese and of Chinese long resident in this country had been affected by their American environment and training, so that they were more American than Chinese. Tests which were applicable to American-Chinese children would be quite irrelevant if applied to children in China. As a result of these findings, the mimeographed tests were abandoned and thought was turned to the formulation of a new plan.

A careful study was then made of all the available intelligence tests, especially the non-verbal forms. The new plan under consideration was to select the best elements in the American non-verbal tests and to attempt to develop them into a non-verbal scale for use in China. At the time (1920), there were available the following non-verbal and semi-non-verbal tests:

1. Army Beta Examination
2. Dearborn Group Tests of Intelligence, Series I
3. Haggerty Intelligence Examination Delta 2
4. Holley Picture Completion Test for Primary Grades
5. Myers Mental Measure
6. National Intelligence Tests
7. Otis Group Intelligence Examination
8. Pintner's Mental Survey Tests
9. Pressey Primer Scale
10. Trabue Mentimeters

The question arose whether all of these tests, or whether any of them, could be used in the experiment.
Before coming to a decision, it was necessary to formulate definitely the principles to be embodied in the proposed scale for use in China. After considerable study, the following principles were adopted as criteria:

1. Tests should involve no language responses from the subjects.
2. Test materials should be drawn from social environment common to all peoples.
3. Test material should exclude, as much as possible, school training.
4. Test material should be of interest to all types of subjects.
5. Tests should be valid as a measure of intelligence.
6. Tests should be reliable.
7. Objective methods should be employed in both giving and scoring of tests.
8. Tests should measure a wide range of intelligence.
9. Tests should indicate mental growth.
10. Tests should be adapted for group use.
11. Time for testing and scoring should be reasonably short.
12. Instructions for testing and scoring should be simplified for use by teachers and others who are not specialists in measurements.
13. Tests should have alternative forms as a preventive against the vicious effect of coaching.
14. Test material should be inexpensive, easy to handle, of small bulk, and easily kept in order.

1. Tests should involve no language responses from the subject.

General intelligence signifies a group of related inborn capacities for adapting one's self to specific situations in life. Inborn capacities, however, are never measured directly but are always inferred from the ability displayed. Language use is one of these abilities which ordinarily is a good index to intelligence, but it has its limitations. It cannot be employed, for instance, as a medium to measure intelligence when the language varies among the subjects under examination sufficiently to make understanding or executing the tasks difficult, slow, or impossible. Such a condition exists in China.
The languages spoken in the various sections differ widely: people from Peking do not understand the dialect of Canton, and the Shanghai dialect is different from that of Hankow. This diversity of dialects is not only characteristic of the provinces but exists in local districts of the same province. The written language, it is true, is identical throughout China, but comparatively few can read, 90 per cent of the Chinese people being illiterate. Under these conditions, a non-verbal test for use in China would have great superiority over any existing language test.

2. Test material should be drawn from social environment common to all peoples.

It is a well-known fact that social environment affects the development of intelligence. Edison, born and raised in the wilds of Thibet, would doubtless never have developed into the particular kind of mechanical genius he now is. To measure a Thibet-born Edison by the standards used in examining an American-born Edison would manifestly be inaccurate and unfair. The uncivilized Miaotze boy in Yunnan could not be expected to answer questions on automobiles or airplanes; and the New York boy, raised in the Bronx, could not be expected to answer intelligently questions on rice growing. There should be common grounds; the test material should be drawn from an experience common to all. Tests should measure capacity, and this can be accomplished by measuring only those traits possible of development by all subjects. Tests based on such a principle could be employed over all of China.

3. Test material should exclude, as far as possible, school training.

As ninety per cent of the Chinese people are illiterate, test material which requires school training must prove inadequate. Culture and school training are both acquired, not innate. They vary in different persons according to the environment to which they have been subjected.
The boy ignorant of mathematics could not be expected to solve problems in algebra as well as the son of an instructor in mathematics. In order to compare the native ability of children, therefore, the products of school training should be excluded from the test material.

4. Test material should be of interest to all types of subjects.

Interest in the tests is essential to proper reaction; therefore, a good test should arouse interest in the subjects of widely differing mentality and type of intellect. Unless this is accomplished, the results of the test will not indicate the actual intelligence. Errors have been made in drawing conclusions as to the intelligence of the individuals in a group, when these individuals have had interests different from those called out by the test. For instance, a mechanical test given to a co-educational class usually results in a higher score for the boys than for the girls. The scores in this case do not prove that the boys are more intelligent than the girls; they probably indicate rather the difference in degree of interest in the subject between the boys and girls. It is, therefore, evident that the tests to be adequate must be of common interest to the entire group.

5. Tests should be valid as a measure of intelligence.

A test is valid when it actually measures the trait which it professes to measure. A valid test, therefore, implies actual, consistent measurement. Whether a test is valid or not is determined by the correlation of test scores and the elements of the intelligence, as objectively known by other means. The checks on validity most often used are school marks and progress, and estimates by teachers and associates. In applying this principle, the reliability of such checks should be investigated.

6. Tests should be reliable.

Reliability in a test indicates the obtaining of similar results from two or more testings of the same subjects under the same conditions.
Perfect reliability implies identical results from two or more testings under identical conditions, and is, therefore, never completely attained; but competent authorities agree that the coefficient of reliability should be .90 or higher for a group of equal age.

7. Objective methods should be employed in both giving and scoring of tests.

Objectivity is attained when the methods and procedure of testing and scoring are uniform and independent of personal opinion so that the results may be verified by other testers. That is to say, methods of testing and scoring should be identical at all times for all testers. The personal equation of the teacher should be eliminated as far as possible. The results of the testing should endure verification in all cases where the same tests are applied to the same subjects, using the same methods under similar conditions.

8. Tests should measure a wide range of intelligence.

The term "general intelligence" means the combination of many mental traits. It varies in amount in individuals from practically zero in the lowest grade of idiots to that large quantity, at present unmeasured, of the world's greatest genius. Its distribution, according to the best available estimates, approximates a bell-shaped curve; that is to say, there are few of genius level, a large number of ordinary or average people, and comparatively few idiots. An intelligence test, to be entirely satisfactory, should be easy enough for all except the hopeless idiots to make some score and sufficiently difficult for a person of great genius not to make a perfect score. On the other hand, the scores should be distributed continuously and around one mode. Furthermore, the tests should measure a large number of unlike or differentiating traits.
The ideal way would be to measure every trait that contributes to intelligence and to give each trait a weighting proportional to its contribution to the total intelligence. This is impossible in our present state of knowledge, but an intelligence test should measure as many differentiating traits as possible.

9. Tests should indicate mental growth.

Intelligence develops along with the advance of chronological age up to a point believed to be somewhere near the end of the adolescent period. As the child grows older, his native endowment unfolds. So a normal ten-year-old child should be able to do more than the eight-year-old child and a normal eight-year-old child should know more than a six-year-old child. The intelligence test should reveal the different stages of development by improved scores with each increase in chronological age. This mental index is known as mental age. The mental age divided by the chronological age gives what is known as the intelligence quotient.

10. Tests should be adapted for group use.

Group testing enables the examiner to test many persons at a time and therefore makes possible the testing of many more children with the same expenditure of time, labor and money, than can be achieved by testing them singly. The success of the group-testing method was shown in the United States Army during the war. To test two million soldiers individually in so short a time was totally impossible, but by means of group tests the men were speedily sorted and classified. Group testing may not give such an accurate diagnosis as does individual testing, but it is generally satisfactory and can be supplemented by individual tests in exceptional cases. For general use in China, the tests must be adapted for group use.

11. Time for testing and scoring should be reasonably short.
Time for testing should be long enough for the average subject to respond without hurry, but it should be reasonably short so as not to cause fatigue in the subjects nor to entail such administrative inconvenience as to prevent its use. If two scales, for instance, give the same result, and one takes thirty minutes to give, while the other takes two hours, the former is certainly preferable to the latter. As to scoring, the test should be constructed so that it may be accurately, uniformly, and rapidly scored with little dependence upon the judgment of the persons doing it. Mechanical scoring devices should be employed whenever advisable.

12. Instructions for testing and scoring should be simplified for use by teachers and others who are not specialists in measurement.

There are not many psychologists in China. Most of the measurement work probably will be done by ambitious teachers and others who are not specialists in measurements. To facilitate the work, it is absolutely necessary that the instructions for both testing and scoring should be simplified so that they can be followed easily. The instructions should be clear, concise, and adequate, but must be brief, consistent, and uniform for all who are to be testers. Whenever possible, instructions should employ a preliminary demonstration test in order that the subjects may understand clearly what they are expected to do.

13. Tests should have alternative forms as a preventive against the vicious effect of coaching.

The one-form scale has at least two defects. First, if the tests are to be used as a basis for promotion in education or business, ambitious parents will be likely to purchase the material and coach their children with the object of increasing their scores. Second, the one-form scale cannot be used in retesting for a study of mental growth. Therefore, alternative forms should be prepared.
They should have the same value, however, as the original form, and measure the same traits.

14. Test material should be inexpensive, easy to handle, of small bulk and easily kept in order.

As communication is inconvenient in some parts of China and the merit of intelligence measurement is not as yet widely demonstrated there, it is important that every advantage be taken to facilitate the use of the tests. They should, therefore, be easy to handle; they should not be bulky nor contain apparatus which is difficult to keep in order; and the cost of the test material should be small.

B. TESTS USED IN THE PRESENT EXPERIMENT

In consideration of the principles adopted above, the following tests were selected from the ten non-verbal tests listed in the preceding section, to be used in the experiment:

1. Myers Mental Measure
2. Pintner's Non-language Tests
3. Pressey Primer Scale
4. Army Beta Examination
5. Dearborn Group Examination, Series I
   General Examination 1
   General Examination 2
   General Examination 3

A brief description of each of these tests follows:

1. Myers Mental Measure.
[Footnote: School and Society, Vol. 10, pp. 353-60 (1919).]

The Myers Mental Measure was devised by Carolyne E. Myers and Garry C. Myers for school use, the Measure being based upon the Army Beta tests. Mr. and Mrs. Myers were interested in the classification of children as early as possible on the basis of intelligence in order that children of marked ability might be selected for rapid advancement, and that those of very low grade intelligence might early be segregated. To do this, they devised a scale, universal in nature, with the hope that it could be applied to school children of all ages and given in 15 or 20 minutes to a large number of individuals. The scale consists of four tests, all of which are pictures. The first test is called a directions test. It requires the child to obey certain directions, such as to draw a line or make a mark in a particular way.
It furthermore needs no preliminary demonstration other than a brief pantomime with very little spoken instruction. The second test is a picture-completion test consisting of pictures of familiar objects or situations, with one important element missing which the subject must supply. The third is a learning test which requires the subject to substitute the proper symbols for other symbols, while the fourth is a common element test in which the subject is asked to mark the pictures which are similar in some way. Mr. and Mrs. Myers used the Stanford-Binet Scale as a check upon their own scale. Omitting test 3, which gives practically zero correlation, the total of tests 1, 2 and 4 correlates about .80 with Stanford-Binet.

2. Pintner's Non-Language Tests.
[Footnote: Pintner, R.: "A Non-language Group Intelligence Test," Journal of Applied Psychology, Vol. III, No. 3 (September, 1919).]

Pintner's Non-Language Tests were devised by Professor Rudolph Pintner with the purpose of measuring the general intelligence of the deaf, illiterate, and non-English speaking. A knowledge of English is not needed either to understand the directions or to make responses. The scale consists of six tests which have been arranged for group testing, suitable for children and for adults. The first is the imitation test, which is essentially the same as the Knox Test. The second and third are "easy learning" and "hard learning" tests respectively. The fourth is a "drawing completion" test, which is an abbreviated form of the larger drawing test devised by the same author. The fifth is the "reversed-drawing" test, which requires the subject to draw the reversal, or counterpart, of a drawing given. The last test is "picture-reconstruction," involving the rearranging of picture sections with the object of completing the entire picture.
All the correlations between each test and the total score were found positive and fairly high; and the correlation between the I Q on the Stanford-Binet and the percentile rank on the Pintner's Non-Language Tests was .66.

3. Pressey Primer Scale.
[Footnote: Pressey, S. L. and Pressey, L. W.: "Cross-out Tests," Journal of Applied Psychology, Vol. III (1919), pp. 143-150.]

Pressey Primer Scale is known as the "crossing-out" test. As the authors describe, "each test asks of the subject that by crossing out some one thing, he eliminate a wrong, irrelevant, or extreme element in a situation." The scale was devised for use in the first three grades. In the first test, the subjects are required to cross out an unnecessary dot in each of several groups of dots. The second involves the crossing out of the most discordant, or dissimilar, object from a group of three objects; the third, the crossing out of the superfluous block in each square, after the other blocks have been fitted into four patterns at the top of the page; and the fourth test provides for the crossing out, in each picture, of the absurd part.

4. Army Beta Examination.
[Footnote: See Yerkes and Yoakum, Army Mental Tests; also Memoirs of the National Academy of Science, Vol. XV.]

The Beta Examination was introduced primarily for the group testing in the Army during the World War of those illiterate in English. Instructions were given in the form of four demonstrations at the beginning of each test with gestures and pantomimes. The original or trial series consisted of fifteen tests, but after an extensive trial, seven tests were finally retained. These tests are known as maze, cube analysis, X-O series, digit-symbol, number checking, pictorial completion, and geometric construction. The maze test, devised by C. R. Brown, was retained from the preliminary trials because it could be successfully demonstrated, gave few zero scores, and correlated fairly well with the total scores of Army Alpha and
Beta. The cube analysis test was originally devised by Edwards at Camp Lee, to take the place of the usual form of test for arithmetical reasoning. Test 3 (X-O series) was an attempt to provide the equivalent of test 8 of Alpha. It proved to be an easy and effective way to indicate the institutional feeble-minded group. The digit-symbol test was modeled after the well-known substitution test which had been used in various forms by Woodworth, Pintner, Whipple, and others. Number checking was devised by Thorndike, and found satisfactory on all counts. The pictorial completion test was devised by Kelley and patterned originally after the Binet mutilated pictures. The last test, geometrical construction, was patterned after the various form-board tests. It was found particularly good in picking out the higher levels of ability. The product-moment coefficient of correlation between the Beta Examination weighted score and Stanford-Binet mental age was reported to be .731 ± .012.

5. The Dearborn Group Tests of Intelligence, Series I.
[Footnote: Dearborn, W. F.: The Dearborn Group Tests of Intelligence.]

The Dearborn Group Tests of Intelligence, Series I, were devised and standardized by Professor W. F. Dearborn, of Harvard. They are not linguistic, and consist of three parts (known as General Examinations 1, 2, and 3) for use in the first three grades. General Examination 1 contains a "directions test," a "clock test" and a "circus" test. General Examination 2 consists of seven "games" which, in order, are "color blocks," "substitution," "ladders," "picturemaking," "picture recognition," and "dominoes." General Examination 3 consists of "picture completion," "map of town," "ruler," and "number form puzzles." A correlation of .87 of the Stanford-Binet Scale with Dearborn tests has been reported.
[Footnote: Journal of Educational Research, Vol. III, No. 4, p. 308 (April, 1921).]

C. METHOD OF PROCEDURE

The present testing was carried on in Public School No. 108, situated in the section of New York City which is inhabited by immigrants. This school has only the kindergarten and the first four grades. Each grade is divided into two sections, so there are altogether nine sections in the school. During the fall of 1920, when the experiment was started, the enrollment was about 1,000. After a preliminary trial, it was found impossible to test all the pupils, as those who were in the kindergarten and the first grade could not follow the directions of the tests satisfactorily. They were eliminated from the testing and only the children from grades 2B to 4B were tested. In these grades, there were 185 boys and 216 girls, a total of 401. The distribution of the children according to nationalities was as follows:

Nationality          Number    Per Cent
Italian                 362       90.27
Chinese                  21        5.24
Jewish                   14        3.49
Jewish-Italian            2         .50
Chinese-Jewish            1         .25
Spanish-Italian           1         .25

Only a few of these children were Chinese; more than 90 per cent were Italians. However, since the purpose of the experiment was to select the best non-verbal tests, and since special forms for use in China would have to be made later and no norms were expected to result from the testing, the nativity of the subjects was wholly immaterial.

Prior to this experiment, the school had never used any standardized psychological or educational tests. The principal and the teachers were all deeply interested in the experiment and offered every possible assistance to make it a success. The writer took advantage of this unusually excellent opportunity to visit the school frequently and make friends with both teachers and pupils. In consequence, when he was ready to test the children, although a foreigner, he was no longer a stranger to the school population.

All the testing was done in a large classroom equipped with desks, blackboard and comfortable chairs.
Twenty-eight pupils were brought to this room, to be tested, at one time. The pupils were seated apart from each other, so the possibility of copying was reduced. Before giving a test to the children, the writer familiarized himself with the instructions by trying them with other children. All the examinations were conducted by the writer himself with the assistance of the principal, Miss Rae, and a college-trained teacher. Pains were taken to maintain uniformity both of the procedure in the testing and the environment in the room. The order of testing was always from the younger pupils to the older; that is, from Grade 2B to 4B. The testing time was from 10 a.m. to 12 m. and from 1 p.m. to 3 p.m. Every effort was made to make the testing informal and pleasant yet stimulating and searching. The scales were given on the following dates:

No. 1   Nov. 24-26, 1920   Myers Mental Measure
No. 2   Dec. 4-6, 1920     Pintner's Non-language Tests
No. 3   Dec. 14-16, 1920   Pressey Primer Scale
No. 4   Dec. 20-22, 1920   Army Beta Examination
No. 5   Jan. 4-8, 1921     Dearborn Group Tests of Intelligence

In giving the scales, all the original directions were followed literally, except in the cases of the Dearborn Group Tests of Intelligence and the Army Beta Examination, both of which were modified to meet the peculiar needs.

The altering of directions for the Dearborn Tests of Intelligence was very slight. The only change made was in the "clock" test of General Examination 1. The original direction calls for the subjects to draw in the clock hands, indicating the time when school begins in the morning, when school begins in the afternoon, and when school closes in the afternoon. In the school where the testing was done, starting and closing time is different for different children.
As it was therefore confusing for the children to answer these questions, the following directions were substituted: "In the first clock draw in the hands so as to show what time school assembly begins in the morning. Draw the hands in the next clock, to show what time school recess begins for lunch. In the third clock show what time school begins in the afternoon." As suggested by Dearborn, the tests were given in two periods, but with one day interval between them.

In the case of the Army Beta Examination, the procedure was considerably modified. The original directions call for a blackboard frame consisting of eight fitted sections, a blackboard chart in a continuous roll 27 feet long, cardboard pieces for Test 7, and patterns for constructing Test 2. It was impossible to get the original apparatus, so the school blackboard, self-made cardboard pieces, and real wood cubes were substituted. Furthermore, according to the original form, it was necessary to have an examiner, a demonstrator, and a number of orderlies. The demonstrator was charged with the single task of doing before the group just what the group was later to do with the examination blanks. The use of a special demonstrator, as provided for in the original tests, was considered both superfluous and cumbersome. The examiner also performed the duty of the demonstrator. As in other scales, he was to give the directions as well as demonstrate to the class the preliminary test. The adapted directions were as follows:

Directions for Test 1

As soon as the pupils have been properly seated, and examination blanks distributed, the examiner says, "Here are some papers. You must not open them or turn them over until you are told to." Holding up the Beta blank, the examiner continues: "In the place where it says name, print your name very clearly. Remember, print your full name.
If you are Mary Jones, print Mary Jones; if you are John Smith, print John Smith. Right after your name, in the place where it says rank, write your grade. Do you know in which grade you are? That is fine. Write down your grade very clearly, so that I can read it. Look over your paper again and show me whether all of you have written your name and grade very clearly."

Before the examination begins, each paper should be inspected by the assistants in order to make sure that the name and grade are clearly written. Then the examiner remarks, "Attention! Watch what I do on the blackboard. I am going to do here what you are going to do on your papers. Ask no questions. Wait till I say, 'Go ahead.' Now is everybody ready? Turn your paper over. This is Test 1, here (pointing to the page of record blank). Have you found it?" After all have found the page, the examiner continues, "Don't make any marks till I say 'Go ahead.' What I want you to do is to draw a line which shall pass through the pictures from left to right without touching any line. Now watch me work on the blackboard."

After touching both arrows, the examiner traces through the first maze with chalk, slowly and purposely makes one mistake by going into the blind alley at upper left-hand corner of the maze and asks the class, "Is this correct?" After the class answers "No," the examiner places his hand back to the place where he may start right again, and traces through the rest of the maze, indicating an attempt at haste and hesitating only at ambiguous points. After this is done, he says, "Everybody ready! All right. Go ahead. Hurry up." At the end of two minutes, the examiner says, "Stop! Turn over the page to Test 2."

Test 2

The examiner then continues, "This is Test 2 here. Look!" After everyone has found the page he says, "I want you to count the cubes and write the number in the little square below the picture. Now watch me work on these blocks."
The order of procedure is as follows:

a. The examiner points to the three-cube model on the blackboard, making a rotary movement of the pointer to embrace the entire picture.
b. With similar motion he points to the three-cube wood model on the desk.
c. The examiner next points to picture on blackboard and asks the class, "How much?"
d. The examiner turns to cube model and counts aloud, putting up the fingers while so doing, and encouraging the class to count with him.
e. The examiner taps each cube on the blackboard and asks the class, "How much?"
f. After the class answers correctly, the examiner counts the cubes on blackboard silently and writes proper figures in proper places. (The rest is the same as the original directions.)

After the demonstration is completed, the examiner says, "Everybody ready! All right. Go ahead. Hurry up," and at the end of 2½ minutes he says, "Stop! Look at me and don't turn the page."

Test 3. X-O Series

"This is Test 3 here. Look." After everyone has found the page, he says, "I want you to draw in X or O in the proper squares which are empty. Now watch me work on the blackboard." The examiner first points to the blank rectangles at the end, then traces each "O" in chart, then traces outline of "O's" in remaining spaces and draws them in. Then he traces first "X" in next sample, moves to next "X" by tracing the arc of an imaginary semicircle joining the two, and in the same manner traces each "X," moving an arc to the next. He then traces outlines of "X's" in the proper blank spaces, moving over imaginary arc in each case, and asks the class what should be drawn in. The examiner follows the answers of the class and fills in remaining problems very slowly. After the demonstration is finished, the examiner says, "All right. Go ahead. Hurry up!" At the end of 1¾ minutes he says, "Stop! Turn the page to Test 4."

Test 4. Digit-Symbol

"This is Test 4 here. Look."
After everyone has found the page, the examiner says, "I want you to study each number and memorize the symbol which represents it. Put in the right symbol under the right number." The examiner touches the number in first sample with index finger of right hand; holding finger there, finds with index finger of left hand the corresponding number in key; drops index finger of left hand to symbol for the number found; holding left hand in this position writes appropriate symbol in the lower half of the sample. Similarly with the other sample. But for the last three samples the class is asked to give the correct symbols. At end of the demonstration, the examiner says, "All right. Go ahead. Hurry up!" At the end of 2 minutes the examiner says, "Stop! But don't turn the page."

Test 5. Number Checking

"This is Test 5 here. Look." After everyone has found the page, the examiner says, "I want you to find out whether the two numbers are the same. If they are the same, write 'X' on the dotted line between them; if they are not the same, write 'O' on the dotted line between them. Now watch me do this on the blackboard." In this demonstration the examiner must get "Yes" or "No" responses from the class. If the wrong response is volunteered by the group, the examiner points to digits again and gives right response, "Yes" or "No" as the case may be. The examiner points to the first digit of the first number in the left column, then to the second digit of the first number in the left column and the second digit of the first number in the right column. He says "Yes" to the class and marks an "X" on the dotted line between the numbers. The examiner does the same for the second line of figures, but here he indicates by "O." In the last three samples, the class is asked to answer "Yes" or "No." After the demonstration is over, the examiner points to page and says, "All right. Go ahead. Hurry up!" At the end of 3 minutes, he says, "Stop! Turn over the page to Test 6."

Test 6. Pictorial Completion

"This is Test 6 here. Look! A lot of pictures." After everyone has found the page, the examiner says, "Every picture has something gone. I want you to fix it. Now look at the pictures on the blackboard." The examiner points to the picture of the hand, then to the place where the finger is missing and asks the class, "What is gone?" After the class has given a correct answer, he says, "That's right. The finger is gone." Then he draws in the finger. Similarly with the other samples. When the demonstration is finished, the examiner says, "Fix all the pictures on the whole page. All right. Go ahead. Hurry up!" At the end of 3 minutes, the examiner says, "Stop! But don't turn over the page."

Test 7. Geometrical Construction

"This is Test 7 here. Look." After everyone has found the page, the examiner says, "Here are blocks. Imagine that you could fill them in this square, and then draw in the intersecting lines in this square. Now watch me." The examiner points to the first figure on blackboard. He then takes the two pieces of cardboard, fits them on the similar drawing on blackboard to show that they correspond and puts them together on the square on blackboard to show that they fill it. Then, after running his finger over the line of intersections of the parts, he removes the pieces and draws the solution in the square on the blackboard. Similarly for the other samples. At the end of the demonstration, the examiner holds up the blanks, points to each square on the page and says, "All right. Go ahead. Hurry up!" At the end of 2½ minutes he says, "Everybody stop!" Papers are then collected by monitors immediately.

While the children were doing the tests, a general impression of their attitude was recorded. As a whole they showed fine spirit, worked enthusiastically, and seemed to enjoy the work. Judged by their manner, they seemed especially interested in the Pressey Absurdity Test and in all the Pictorial Completion Tests.
But in the case of the Dearborn Tests, the majority of the children were bewildered by lack of clearness in the directions and showed signs of fatigue due to the over-long time required.

Practically all the tests were scored by the writer himself, with great care. The Dearborn Group Tests of Intelligence were found to be the most difficult to score. It took, on an average, fifteen minutes to score a child's paper containing the three examinations. The amount of time required in scoring the Dearborn Tests seemed greatly out of proportion to the results obtained.

CHAPTER III
FORMATION OF A CRITERION

The chief object of the present study is to select the best group of tests from the five intelligence examinations which were given to the children in New York City Public School No. 108, with a view to modifying them for use in China. In order to do so, it is necessary to have a definite, constant criterion with which to compare tests. This criterion should be made up from as many factors as possible that are known to be indices of the constituents and development of general intelligence. These factors must be reliable indicators if the criterion, which is depended upon to determine the value of the selected tests, is to be trustworthy.

In this chapter, an account is given of the selection of the best elements to be included in the criterion with which to compare the tests. The elements of the criterion adopted are: (1) age, (2) school marks, (3) school progress, (4) teachers' estimates of intelligence, and (5) composite test scores of (a) Dearborn Group Tests of Intelligence, (b) Pintner Non-language Tests, (c) Army Beta Examination, (d) Myers Mental Measure, and (e) Pressey Primer Scale. Certain weights, to be described later, are given to age, school marks, school progress, and teachers' estimates of intelligence, and to each of the mental test scores, and the total is combined into one rating called the Final Criterion.

A. ELEMENTS OF THE CRITERION

The elements which may be included conveniently in a criterion for pupils' intelligence are age, teachers' estimates, school marks, school progress and test scores. All of these measure general intelligence in different ways, though their values are not equal. Some or all of them should be used in combination and given weight in reference to their special significance in showing the presence of intellectual ability.

1. Chronological Age.

The chronological ages of the children were copied directly from the school record, in order that they might be accurately known. As the administering of the intelligence scales extended from November 24, 1920, to January 8, 1921, the median date, December 17, 1920, was taken as the standard for calculating the ages. All the ages shown on the record book, therefore, run from the birth of the individual up to December 17, 1920.

The dependence of intelligence upon age in adults is a theoretical problem, but gradual mental growth in children is accepted by all psychologists as beyond dispute. Binet, Terman, Thorndike, and others have all found that the general intelligence of a child gradually develops as his age advances until he reaches maturity. Kelley¹ and Fretwell,² in their experimental studies, find that there is a negative correlation between age and achievement, an indication that among pupils in the same grade the younger are the brighter ones. Since all the subjects in this study are below fifteen years of age, it is evident that age should be considered in the making of the criterion for the selection of the tests, but that, in utilizing age as a criterion of intelligence, the young child in an advanced grade should be given a bonus and the older child in an early grade a demerit. Therefore an age distribution table was prepared and a numerical value was assigned to different ages in different sections of the grades.
As the ages for the sexes were different, so the values assigned were also different. For instance, in section B of the second grade boys, 9 was assigned to 7 yrs. 6 mo.; 8 to 8 years; 7 to 8 yrs. 6 mo.; 6 to 9 years; 5 to 9 yrs. 6 mo.; 4 to 10 years; and 3 to above 10 years. For a complete record, see Table I.

¹ Kelley, T. L.: Educational Guidance.
² Fretwell, E. K.: A Study in Educational Prognosis.

TABLE I
Distribution of Ages and the Numerical Value Assigned to Each Age

BOYS
Grade: 2B, 3A, 3B, 4A, 4B (No. and Value under each)
Age (Yr. Mo.): 8. 9. 9. 10. 10. 11. 11.6 12.0 12.6 13.0 13.6 14.0 14.6 15.0
2 3 5 2 2 10 9 13 9 16 2
11 10 9 8 7 6 5 4 3 3 3
20 7 6 1 3
12 11 10 9 8 7 6 5 4 3 3 3
1 15 1 4
13 12 11 10 9 8 7 6 5 4 3 3 3
13 5 7 2 5 1 1 1
14 13 12 11 10 9 8 7 6 5 4 3 3 3 3 3 3

Grade: 2B, 3A, 3B, 4A, 4B (No. and Value under each)
Age (Yr. Mo.)
7.0    10 11 12 13 14
7.6    9 1 10 11 12 13
8.0    30 8 10 9 5 10 11 12
8.6    1 7 7 8 4 9 1 10 11
9.0    17 6 18 7 15 8 11 9 3 10
9.6    2 5 2 6 4 7 10 8 2 9
10.0   2 4 6 5 6 6 18 7 19 8
10.6   3 4 6 5 3 6 3 7
11.0   3 1 4 2 5 9 6
11.6   1 3 4 5
12.0   1 3 3 3 4
12.6   3 1 3 3
13.0   1 3 4 3

2. Teachers' Estimates.

A teacher associates with children daily. She knows in a general way a pupil's strong points as well as his weak ones. Her estimate of his general intelligence should be accurate in some particulars if she clearly understands that her rating is to be based upon native intelligence and not school achievement. Fretwell found that the correlation of the composite of teachers' judgments of pupils with the composite of eleven tests was .66. Kelley found that "the correlation between class standing and the regression equation combination of the estimates of traits by teachers" was .76.
He remarked, as a result of his investigation, "With such a high correlation, a division of pupils into classes by means of teachers' estimates would be highly reliable."¹

Teachers' estimates are not enough, however, because a teacher may overemphasize some factors and neglect others. Terman found that teachers frequently err in estimating general intelligence because they neglect to consider age and emotional differences. Whipple found that teachers estimate the dull children too high and the bright children too low.

In this study, teachers were requested to separate their children into five classes, A, B, C, D, and E, on the assumption that "Intelligence is a general capacity of an individual consciously to adjust his thinking to new requirements: it is general mental adaptability to new problems and conditions of life."² They were asked to rate few as A or E, comparatively more as B or D, and the largest number as C. The teachers were warned not to grade the intelligence of their children by school achievement and deportment, but by the general ability or brightness shown both in academic work and in extra-curricular activities.

In order to be fair to the children, the teachers were requested to grade their children independently three times: November 24, 1920, December 16, 1920, and January 11, 1921. The dates were sufficiently far apart that the teachers scarcely remembered their previous marks. The aggregate of the three estimates was taken as the teacher's estimate of the general intelligence of the child.

To make statistical treatment possible, the letters A, B, C, D and E given by the teachers were transmuted into numerals, as follows:

Teachers' Estimate    Numerical Value Assigned
A                     4
B                     3
C                     2
D                     1
E                     0

¹ Kelley, T. L.: Educational Guidance, p. 16.
² Stern, W.: "The Psychological Methods of Testing Intelligence," translated by G. M. Whipple, Educational Psychology Monographs, No. 13, p. 3.

3. School Marks.

School marks have been the most universal method used for grading pupils. In the past, they have been the only method of judging the ability of the children recorded in school reports. While it is true that teachers often do not agree with each other, school marks are nevertheless a fair measure of mental ability. Fretwell found the correlation between school marks and a group of tests to be as high as .57.¹ McCall says, "Teachers' marks are important because they are now and will continue for some time to be the most universal method of rating pupils. In fact, they may continue forever to be the criterion for classification because teachers will soon be familiar with the simple mysteries of scientific measurement."²

In the school in which this experiment was carried on, there were weekly, monthly, and term examinations. The teachers mark the children by letters. The marks used in this study are the average school marks of the children in the fall term of 1920-21. For the convenience of statistical study, the school marks were turned into figures as follows:

School Marks    Numerical Value Assigned
A               10
B               8
C               7
D               5
E               3

4. School Progress.

By school progress is meant the progress which a child has made in the school, that is, his present class standing. The very fact that a child has been promoted to a certain grade and maintains his standing there shows that he must have the mental ability to handle the subjects. When a pupil fails to make satisfactory progress in his school work, he is ordinarily retarded or eliminated. It is clear, therefore, that advance in grade usually indicates development of intelligence, although there may be exceptions. Sometimes the school permits a pupil to move up a grade or class even though he has not done the work below, because the parents of the child insist upon it, or because the teacher wants to get rid of the backward child, or because the school must make room for younger pupils. However, the majority of the pupils are promoted because their mental ability permits the expected scholastic attainment, and therefore the grade reached should be utilized in building up the criterion for the selection of tests.

¹ Fretwell, E. K.: A Study in Educational Prognosis, p. 17.
² McCall, W. A.: How to Measure in Education.

In Public School No. 108, classes are divided into A and B sections, and pupils are promoted by sections twice a year, A being the lower section. The following numerical values were assigned to the different sections of the different grades:

School Progress    Numerical Value Assigned
Grade IIB          0
Grade IIIA         5
Grade IIIB         10
Grade IVA          15
Grade IVB          20

5. Test Scores.

All the tests given are standardized. They are all claimed to correlate highly with general intelligence. The correlation between Myers Mental Measure and the Stanford-Binet Scale was reported to be .80; between the Pressey Primer Scale and the Stanford-Binet Scale, .60; between the Army Beta Examination and the Stanford-Binet Scale, .73; and between the Dearborn Group Tests of Intelligence and the Stanford-Binet Scale, .87. It is safe to assume that if the individual scales are so valid as measures of intelligence, a combination of the test scores of all these scales would result in an excellent measurement of general intelligence. Based upon this assumption, the combined test scores were included in the final criterion.

B. TEST SCORES WEIGHTING

In order to study the combined value of all the scales given to the children, it was necessary to have a composite of all the test scores. This could be done by summing all the raw test scores of the different scales. But the merits and variabilities of the scores of the different scales are different. To sum the raw scores is, therefore, unfair. The next problem, then, was how to weight the different scales properly.
It was important to know the merits of the different scales before a weight was attached to each. One of the simplest methods for finding them was to prepare an age or grade distribution table and inspect the slope shown. This was based on the assumption that a child, as he advanced in age and grade, should make a higher score in an intelligence test. This gradual increase of scores in proportion to the advance in age and grade produces a slope on the distribution table. The more closely the scores for a given age lie together, and the more they rise on the whole with each increase in age (shown graphically by their clustering about the slope line), the more valuable is the scale.

Based on the above assumption, age and grade distribution tables of each scale were prepared for both sexes. An inspection of the tables shows the existence of some slope in all the scales; the Pintner Non-language Tests, the Army Beta Examination, and the Dearborn Group Tests, however, seemed better than the rest. All of the tables could not be shown here, but for illustration the distribution of Pintner's Non-language Tests is given in Table II. Attention is called to the slope and the gradual increase of scores as expressed by the medians, from 50.83 for age 8 to 106.5 for age 11.

Another rough method for finding the merits of the different scales is to compute the extent of overlapping of two groups of scores. The assumption is this: the less overlapping between different grades the tests show, the better measures of intelligence they are. For instance, a good scale should show the differences in mental traits between the child in Grade III and the child in Grade IV. The more the scale can indicate the difference, the more reliable the scale. Such overlapping can be computed by comparing the two total distributions of the test scores, stating the variabilities of the two groups and their central tendencies.
The method used in this study, however, is a shortened one, based on the following formula:¹

$$\text{Per cent overlapping of } A \text{ over } B = \frac{\text{No. of cases in } A > \text{median in } B}{N_A}$$

To illustrate the method, the data in Table III are taken. In this case A in the formula means Grade III and B Grade IV. The median for Grade IV is 67.5, which falls midway in the step 65-70.

¹ See Thorndike, E. L.: Mental and Social Measurements, p. 128 ff.

TABLE II
Age Distribution Showing the Slope and the Increase of Scores as Age Advances; Data from Boys Who Have Taken the Pintner Non-Language Tests

[Tally chart: scores in steps of five, from 0-5 to 130-135, tallied by age; the individual tallies are not legible.]

Age                8        9        10       11
Number of Cases    35       60       37       15
Median             50.83    66.25    81.50    106.5
Quartile           10.50    17.10    13.45    16.25

TABLE III
Grade Distributions of the Pressey Scale

Score    Grade III    Grade IV
85-90    9
80-85    5    8
75-80    9    26
70-75    18    19
65-70    17    17
60-65    23    16
55-60    20    20
50-55    10    10
45-50    13    8
40-45    15    2
35-40    7    2
30-35    3    2
25-30    4
20-25    3    1
15-20    1
10-15    1
5-10     1
0-5
Number of Cases    149    141
Median             59.375    67.5
Quartile           10.6    9.6

The number of Grade III pupils who equal or exceed this score is therefore 17/2 + 18 + 9 + 5, or 40.5, which is 27 per cent of the number in the third grade, 149. The per cent of overlapping of the third and fourth grades is, therefore, 27 per cent. It is illustrated by Fig. 5.
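The arithmetic of the shortened overlap method can be retraced in a few lines. The sketch below is only an illustration: the function name `percent_overlap` and the dictionary layout are assumptions made here, and it assumes, as the worked example does, that cases are spread evenly within a step so that the step containing the median of B contributes a proportional share. Only the Grade III steps at or above the Grade IV median are needed.

```python
def percent_overlap(upper_steps, n_a, median_b):
    """Per cent of group A scoring at or above the median of group B.

    upper_steps maps (low, high) score steps of group A to case counts.
    The step that contains median_b contributes only the share of its
    cases lying above the median (even spread within a step is assumed).
    """
    above = 0.0
    for (low, high), count in upper_steps.items():
        if low >= median_b:
            above += count                      # whole step lies above the median
        elif high > median_b:
            above += count * (high - median_b) / (high - low)  # partial step
    return 100.0 * above / n_a


# Grade III counts from Table III for the steps reaching the Grade IV median (67.5).
grade3_upper = {(65, 70): 17, (70, 75): 18, (75, 80): 9, (80, 85): 5}

overlap = percent_overlap(grade3_upper, n_a=149, median_b=67.5)
print(round(overlap, 1))  # 27.2, i.e. the 27 per cent quoted in the text
```

Passing the counts for another scale together with the Grade IV median for that scale would give the corresponding figure in the same way.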
By this method the per cents of overlapping were computed for Grade III and Grade IV in all the scales. The results were as follows (for illustrations, see Figs. 6, 7, 8 and 9):

Value Number    Scale        Per Cent Overlapping of Grade III over Grade IV
1               Dearborn     9.8
2               Army Beta    12.0
3               Pintner      15.2
4               Myers        21.0
5               Pressey      27.0

Fig. 5. Showing 27 per cent overlapping of Grade III over Grade IV in the scores of Pressey Primer Scale.
Fig. 6. Showing 21 per cent overlapping of Grade III over Grade IV in the scores of Myers Mental Measure.
Fig. 7. Showing 15.2 per cent overlapping of Grade III over Grade IV in the scores of Pintner's Non-language Tests.
Fig. 8. Showing 12 per cent overlapping of Grade III over Grade IV in the scores of Army Beta Examination.
Fig. 9. Showing 9.3 per cent overlapping of Grade III over Grade IV in the scores of Dearborn Group Tests of Intelligence.

The Dearborn Group Tests of Intelligence, the Army Beta Examination, and the Pintner Non-language Tests, which were found better than the others according to the slope method, also stand high here.

However, tests should ultimately be weighted according to the variabilities of their scores; the range and the deviations from the averages should be taken into consideration. The measure of variability used in this study is Q, or quartile deviation. Q is that distance on the base line of the normal curve which includes roughly half of the measures when laid off on each side of the average. It is computed by

$$Q = \frac{Q_3 - Q_1}{2}$$

That is, Q = half of the distance between the 75 percentile and the 25 percentile.

Q was computed for ages 8, 9 and 10 of both boys and girls, as shown in Table IV.

TABLE IV
Weighting of the Scales According to Q

Age                  Pressey    Pintner    Myers    Beta    Dearborn
BOYS
8                    11.5       10.5       5.4      13.3    39.0
9                    8.6        17.1       5.1      11.5    23.0
10                   11.7       13.5       5.6      11.6    23.0
GIRLS
8                    14.8       19.2       4.0      9.8     26.0
9                    8.6        28.8       5.2      10.2    16.0
10                   6.9        15.0       4.5      12.9    23.0
Total                62.1       104.1      29.8     69.3    150.0
or Abbrev. Total     6          10         3        7       15
Multiplier           1          1          2        2       1
Resulting Weight     6          10         6        14      15

The sum of these Q's in the different scales is 62.1 for the Pressey Primer Scale, 104.1 for the Pintner Non-language Tests, 29.8 for Myers Mental Measure, 69.3 for the Army Beta Examination, and 150 for the Dearborn Group Tests of Intelligence. These numbers were then reduced, for convenience, to 6, 10, 3, 7 and 15 respectively for the five scales. These values of the Q's show that, if the raw scores of the different scales were summed up just as they appear, the Dearborn Scale with its Q of 15 would have five times as great a weight as the Myers Scale with its Q of 3, and the Army Beta Scale would have almost the same weight as the Pressey Scale. These weights did not appear to correspond with the real value of the tests.

After several trial weightings, it was finally decided to multiply the Myers Scale scores and the Army Beta scores by 2, and the other scores by 1. The results showed that the scales were thus weighted fairly, as their values corresponded roughly with the results previously found by the overlapping method. The Army Beta Examination was found to be one of the best scales, and it should have at least as much weight as the Dearborn Scale. Although the Myers Scale was not considered one of the best, it was fair to assume that it should carry weight equal to the Pressey Scale. The following table indicates the weights:

Scale        Value No.    Per Cent Overlapping    Weighting
Dearborn     1            9.8                     15
Army Beta    2            12.0                    14
Pintner      3            15.2                    10
Myers        4            21.7                    6
Pressey      5            27.0                    6

After the raw scores of the different scales were weighted, they were summed up to get a composite score for each individual.

C. METHOD OF SELECTION OF THE FINAL CRITERION

After consideration of the various facts known about the subjects, and inspection of their correlations with the composite test score, the following composite (termed the school criterion) of age, school marks, teachers' estimates and school progress was tried. As previously explained (page 29), numerical values were assigned to different ages, so that a young child in an advanced grade receives more credit than an older child in the same grade (see Table I). Likewise, teachers' estimates of intelligence and school marks (see pages 29, 31-32), both of which were registered in letters, were transmuted into numbers. Numerical values were also assigned to the grades reached (see pages 32, 33). To illustrate the procedure, ten cases are shown in Table V. Pupil A received a credit of 9 for her age, 10 for her school marks, 12 for her teachers' estimates of her intelligence and 20 for her school progress. Similarly, pupil J received a credit of 3 for his age, 5 for his school marks, 4 for his teachers' estimates of his intelligence and 0 for his school progress. In the same way, credits were assigned to all the elements of the school criterion for each pupil.

This seemed a reasonable weighting of the facts. The correlations of the school criterion with the composite test score were .71 for the boys and .91 for the girls, with an average of .81. Since we may assume that the composite average of all the intelligence tests is a fairly true measure of intelligence, these high correlations are evidence that the school criterion is reasonable.

TABLE V
Data for School Criterion (10 Selected Pupils)

Grade                   IV B (Girls)                       II B (Boys)
Pupil                   A      B      C      D      E      F      G      H      I      J
Age
  Chron. age (yr-mo)    9-6    10-3   9-6    10-0   10-0   8-5    9-6    8-6    8-0    7-0
  Credit                9      8      9      8      8      8      5      7      8      3
School Marks
  Marks                 A      B+     A      B      B+     C      C      C      B      C
  Credit                10     8      10     7      8      5      5      5      7      5
Teachers' Estimates
  Marks (three per pupil, in the order printed): A A A A A A A A A B+ B+ B A+ A+ A C+ C+ C C+ C+ D+ E+ D B+ C+ C C+ E+ C
  Credit                12     12     12     9      12     6      6      2      8      4
School Progress
  Grade                 4B     4B     4B     4B     4B     2B     2B     2B     2B     2B
  Credit                20     20     20     20     20     0      0      0      0      0
Total                   51     48     51     44     48     19     16     14     23     12

However, a combination of the composite test score and the school criterion might be still more useful. So we combine them into what we have called the Final Criterion. The S.D. for the school criterion is 8 and that for the test score 12. It seems desirable to give each equal weight; therefore, the raw scores of the school criterion were multiplied by 3 and the composite test scores by 2. This may be expressed by an equation as follows:

Final Criterion = (3 × School Criterion) + (2 × Weighted Test Scores)

that is,

Final Criterion = [3 × (Age + School Marks + Teachers' Estimates + School Progress)] + [2 × (Dearborn + Pintner + Pressey + 2 × Army Beta + 2 × Myers)]

For further explanation, see Table VI, which contains data for ten pupils. Similarly, the final criterion was calculated for all the pupils.

The Final Criterion is the standard used to select the best test elements from the five intelligence scales for development into a valid and reliable measure of intelligence for use in China. The success of the work is, therefore, largely dependent upon the validity and reliability of the criterion. Now the questions arise: Is the final criterion valid and reliable? Are not the elements which make up the final criterion repeating themselves? Is it right to include the scores of the tests in the final criterion and then to judge the test elements by that combination? It is admitted here that the criterion elements do overlap and that there is no line of demarcation to differentiate them. For instance, when a teacher estimates the general intelligence of a child she considers his age and school achievement; and school progress involves many factors such as age, school marks and teachers' estimates.
But there is no doubt that every element measures something which is somewhat different from that which the other elements measure, and that no two of them measure exactly the same traits. Furthermore, all the criterion elements, as explained before, are in some degree measures of general intelligence, and each of the five scales has been reported to be a reliable intelligence test. A combination of all these factors certainly should make the final criterion reliable. Finally, it must be kept in mind that the purpose of this study is to select the best test elements from the five scales, and that the criterion required is simply a definite, constant standard. It really makes little difference whether the criterion elements to some extent overlap in their functions, for the final criterion will be applied uniformly to the test elements.

TABLE VI
Data for Ten Selected Pupils for Calculation of the Final Criterion

Pupils A to E are in Grade IV B (girls); pupils F to J are in Grade II B (boys). The School Criterion Total is the sum of the credits for age, school marks, teachers' estimates and school progress. Test Total = Pressey + Pintner + 2 × Myers + 2 × Beta + Dearborn. Final Criterion = (3 × School Criterion Total) + (2 × Test Total). Entries that are not legible in the source are shown as "..".

Pupil    School Criterion Total    3 × School Criterion Total    Pressey    Pintner    Myers    Army Beta    Dearborn    Test Total    2 × Test Total (abbreviated)    Final Criterion
A        51                        153                           65         130        19       90           197         610           ..                              275
B        48                        144                           59         91         24       73           169         513           ..                              246
C        51                        153                           78         130        28       108          210         690           138                             291
D        44                        132                           69         121        15       82           205         589           ..                              252
E        48                        144                           78         104        28       60           220         578           116                             260
F        19                        57                            ..         ..         ..       ..           ..          96            ..                              77
G        16                        48                            ..         ..         ..       ..           ..          180           36                              84
H        14                        42                            61         51         0        42           138         334           66                              108
I        23                        69                            ..         ..         ..       ..           ..          187           38                              107
J        12                        36                            24         15         10       15           87          176           36                              72

CHAPTER IV
SELECTION OF TEST ELEMENTS

A. SELECTION OF TEST ELEMENTS BY CORRELATION METHOD

The ultimate aim of this study is to select the best single tests from the five intelligence scales, with the hope that they may constitute a non-verbal intelligence scale for use in China. Chapter III has discussed the "final criterion."
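As a check on the equation for the Final Criterion, the calculation can be retraced for pupils whose figures are legible in Tables V and VI. The sketch below is illustrative only: the names `SCALE_WEIGHTS`, `weighted_test_total` and `final_criterion` are chosen here, and the "abbreviated" step is an assumption, namely that the weighted test total is rounded to the nearest ten and its last digit dropped before doubling. That reading reproduces the printed Final Criterion for the pupils used below, but the text does not state the rule.

```python
# Multipliers from the weighting section: Myers and Army Beta count double.
SCALE_WEIGHTS = {"pressey": 1, "pintner": 1, "myers": 2, "beta": 2, "dearborn": 1}


def weighted_test_total(scores):
    """Sum the five scale scores after applying the multipliers."""
    return sum(SCALE_WEIGHTS[name] * score for name, score in scores.items())


def final_criterion(school_criterion, scores):
    """3 x school criterion + 2 x abbreviated weighted test total."""
    total = weighted_test_total(scores)
    abbreviated = (total + 5) // 10  # assumption: round to nearest ten, drop last digit
    return 3 * school_criterion + 2 * abbreviated


# Scores as read from Table VI; school criterion totals from Table V.
pupil_a = {"pressey": 65, "pintner": 130, "myers": 19, "beta": 90, "dearborn": 197}
pupil_c = {"pressey": 78, "pintner": 130, "myers": 28, "beta": 108, "dearborn": 210}
pupil_j = {"pressey": 24, "pintner": 15, "myers": 10, "beta": 15, "dearborn": 87}

print(weighted_test_total(pupil_a))   # 610
print(final_criterion(51, pupil_a))   # 275
print(final_criterion(51, pupil_c))   # 291
print(final_criterion(12, pupil_j))   # 72
```

Because the school criterion has an S.D. of 8 and the test composite an S.D. of 12, the multipliers 3 and 2 give the two parts equal spread (24 each) before the abbreviation is applied.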
The present task is to utilize it as a basis for selection. For this purpose the correlations of every single test of the five scales with the final criterion have been worked out. It is assumed that any test element which correlates highly with the final criterion is good. This, however, does not mean that all the tests which correlate highly with the final criterion should be adopted in the Chinese Scale. A high correlation between two tests may arise because they measure the same traits, and the correlations so obtained are simply self-correlations. A good intelligence scale should measure a combination of different traits, so the test elements in the scale should measure as many different mental traits as possible. Consequently, the ultimate object should be to select those test elements which individually correlate highly with the final criterion but which correlate but little with each other. The writer has adopted r = .80 as a standard. The aim is to discover a group of test elements from the five scales which, combined together, will give a correlation above .80 with the final criterion.

Scattergrams were prepared charting every single test element against the final criterion, an inspection of which showed the following to have high correlations:

Pressey Primer Scale, test 4
Pintner Non-language Tests, tests 2 and 3
Myers Mental Measure, test 2
Army Beta Examination, tests 4, 5 and 6
Dearborn Group Tests of Intelligence, Series I:
    General Examination 1, test 17
    General Examination 2, test 4
    General Examination 3, test 1

After the scattergrams were inspected, the next step was to determine roughly the correlations of all the tests.
The formula used is Sheppard's,

$$r = \cos \pi U$$

where U is the "percentage of unlike-signed pairs,"¹ and

n = the number of cases
l = the number of + + and − − pairs
u = the number of + − pairs
d = the number of 0 0, 0 + and 0 − pairs

All the correlations which, by this method, were found to be above .60 were computed also by the product-moment method. Table VII shows the results as found.

¹ Thorndike, E. L.: An Introduction to the Theory of Mental and Social Measurements, pp. 170-71.

TABLE VII
Correlations of Individual Tests with Final Criterion by Sheppard's and Product-Moment Methods

Tests              Number of Cases    Column I: r (Sheppard)    Column II: r (Product-Moment)
Pressey 1          233                .51
Pressey 2          230                .28
Pressey 3          230                .42
Pressey 4          216                .61                       .54
Pintner 1          234                .59
Pintner 2          235                .90                       .69
Pintner 3          235                .84                       .62
Pintner 4          234                .48
Pintner 5          234                .59
Pintner 6          235                .59
Myers 1            235                .63                       .51
Myers 2            234                .66                       .47
Myers 3            235                .68                       .50
Myers 4            231                .66                       .49
Army 1             229                .54
Army 2             234                .39
Army 3             234                .30
Army 4             233                .61                       .44
Army 5             234                .68                       .52
Army 6             234                .75                       .65
Army 7             233                .45
Dearborn I 7       234                .66                       .45
Dearborn I 8       232                .51
Dearborn I 9       235                .40
Dearborn I 10      234                .63                       .52
Dearborn I 11      235                .66                       .50
Dearborn I 12      235                .42
Dearborn I 15      235                .36
Dearborn I 16      212                .72                       .56
Dearborn I 17      235                .51
Dearborn II 1      235                .39
Dearborn II 2      235                .82                       .58
Dearborn II 3      234                .36
Dearborn II 4      234                .66                       .46
Dearborn II 5      234                .45
Dearborn II 6      234                .48
Dearborn II 7      234                .75                       .56
Dearborn III 1     235                .68                       .58
Dearborn III 2     ..                 ..
Dearborn III 3     227                .64                       .43
Dearborn III 4     233                .56

Among these tests, the two types which appear the most promising are the completion tests and the learning tests. Other workers in this field find similar results. Each of these was used by the makers of three of the five scales tried out and was included in their final forms because of its value as an independent measure of intelligence. Consequently, these two types of tests have been made the core of the proposed Chinese scale. The other elements to be chosen should not correlate highly with these two combined, since any other test which does correlate highly with them probably measures the same traits and, consequently, would add little to the measurement.

The learning and completion tests selected were those from Army Beta (tests 4 and 6) rather than from the others, because this scale has had a wider use and more searching criticism than any of the others. With these as a basic group, the correlations with every other test in all the five scales were made. Table VIII shows the results.

TABLE VIII
Correlations of Individual Tests with Combination of Beta 4 and 6 by Sheppard's Formula

Tests              No. of Cases    r (Sheppard)
Pressey 1          346             .45
Pressey 2          338             .51
Pressey 3          341             .45
Pressey 4          344             .61
Pintner 1          312             .48
Pintner 2          313             .42
Pintner 3          313             .61
Pintner 4          313             .51
Pintner 5          312             .48
Pintner 6          312             .34
Myers 1            297             .45
Myers 2            324             .61
Myers 3            319             .51
Myers 4            292             .56
Army 1             371             .66
Army 2             370             .42
Army 3             370             .51
Army 5             368             .58
Army 7             374             .33
Dearborn I 7       334             .45
Dearborn I 8       324             .36
Dearborn I 9       335             .19
Dearborn I 10      333             .36
Dearborn I 11      331             .33
Dearborn I 12      336             .19
Dearborn I 15      329             .31
Dearborn I 16      331             .56
Dearborn I 17      297             .48
Dearborn II 1      341             .45
Dearborn II 2      342             .51
Dearborn II 3      340             .31
Dearborn II 4      340             .37
Dearborn II 5      343             .34
Dearborn II 6      303             .28
Dearborn II 7      305             .37
Dearborn III 1     266             .48
Dearborn III 3     341             .45
Dearborn III 4     303             .51

All the completion and learning tests show high correlations with the final criterion, and certain of the correlations of the tests against Beta 4 and 6 combined give promise. But, to explore further whether a better basal combination could be made, correlations were worked out between the criterion and various other combinations of tests. The results are shown in Table IX.
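The Sheppard coefficients reported in Tables VII and VIII rest on the relation r = cos πU given earlier. A minimal sketch of that calculation follows. The function names are chosen here for illustration, deviations are taken from whatever central value the caller supplies, and pairs with a zero deviation (the d pairs) are simply left out of U; the text does not say how those pairs were treated, so that handling is an assumption.

```python
import math


def count_pairs(xs, ys, x_center, y_center):
    """Count like-signed, unlike-signed and zero pairs of deviations."""
    like = unlike = zero = 0
    for x, y in zip(xs, ys):
        dx, dy = x - x_center, y - y_center
        if dx == 0 or dy == 0:
            zero += 1            # a "d" pair: at least one deviation is zero
        elif (dx > 0) == (dy > 0):
            like += 1            # + + or - - pair
        else:
            unlike += 1          # + - pair
    return like, unlike, zero


def sheppard_r(like, unlike):
    """r = cos(pi * U), with U the share of unlike-signed pairs.

    Assumption: zero pairs are excluded from the denominator.
    """
    counted = like + unlike
    if counted == 0:
        raise ValueError("no signed pairs to compare")
    return math.cos(math.pi * unlike / counted)


# With 33 per cent unlike-signed pairs the coefficient is about .51.
print(round(sheppard_r(like=67, unlike=33), 2))  # 0.51
```

Because the coefficient depends only on the signs of the deviations, it can be read quickly from a scattergram, which fits its use here as a rough first screening before the product-moment method.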
TABLE IX
Correlations of the Different Scales with the Final Criterion and the Inter-Correlations of the Individual Tests

Correlation Between                                               No. of Cases    r (Pearson)
Final Criterion and Pressey                                       237             .58
Final Criterion and Pintner                                       235             .78
Final Criterion and Myers                                         235             .65
Final Criterion and Army                                          235             .75
Final Criterion and Dearborn                                      235             .80
Final Criterion and Dearborn I                                    235             .69
Final Criterion and Dearborn II                                   235             .76
Final Criterion and Dearborn III                                  235             .67
Final Criterion and Dearborn I, 1-6                               230             .20
Final Criterion and Dearborn I, 7-15                              234             .63
Final Criterion and Pintner 2 + 3                                 236             .73
Final Criterion and Army 3, 4, 5, 6                               232             .714
Final Criterion and Army 4 + 6                                    234             .711
Final Criterion and Army 3, 4, 5, 6 + Pressey 2, 4                234             .696
Final Criterion and Pressey 2, 4                                  233             .47
Final Criterion and Army 4, 6 + Pressey 2, 4                      235             .56
Final Criterion and Pintner 2, 3, Army 6                          233             .815
"  Army 3, 4, 5, 6, and Pressey 2, 4                              235             .38
Pressey 4 and Army 5                                              ..              .49
Pintner 2 and Pintner 3                                           337             .73
Pintner 2 and Army 6                                              313             .28
Pintner 3 and Army 6                                              313             .37
Pintner 3 and Dearborn I                                          313             .614
Pintner 3 and Dearborn II                                         315             .551
Pintner 3 and Dearborn III                                        316             .57

Here are shown significant results, establishing the fact that a better combination than Army Beta 4 and 6 is Pintner's 2 and 3 and Army Beta 6, its correlation with the final criterion being .815. They are, however, still learning and completion tests. Tests 2 and 3 of Pintner's scale both correlate low with test 6 of the Army Beta (.28 and .37). These two types of tests really measure different traits. Pintner 2 and 3 are both included rather than either one alone because they really form a single test.¹

These three tests finally selected were now termed "The Basic Tests." They take only ten minutes to perform. The other tests to be included with these must be of a different type. This could be found out by correlating the individual tests with the basic tests.
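The composite correlations of Table IX, and the check just described, come down to two operations: adding the scores of several tests into one composite, and computing a product-moment coefficient. The sketch below shows both under stated assumptions. The helper names are chosen here, the six criterion values are borrowed from Table VI, and the test scores are hypothetical figures invented purely to make the example run; none of them come from the study.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def composite(*tests):
    """Add the scores of several tests pupil by pupil."""
    return [sum(scores) for scores in zip(*tests)]


criterion = [77, 84, 108, 246, 260, 291]   # final criterion values from Table VI
learning = [4, 6, 5, 9, 8, 10]             # hypothetical scores on a learning test
completion = [3, 2, 6, 7, 9, 8]            # hypothetical scores on a completion test
candidate = [5, 7, 4, 6, 9, 8]             # hypothetical scores on a further test

basic = composite(learning, completion)
print("basic tests vs criterion:", round(pearson_r(basic, criterion), 2))
print("candidate vs criterion:  ", round(pearson_r(candidate, criterion), 2))
print("candidate vs basic tests:", round(pearson_r(candidate, basic), 2))
```

On the reasoning of the text, a candidate test is worth adding when its correlation with the criterion is fairly high while its correlation with the basic tests is lower, since it then measures something the basic tests do not.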
The correlations between the basic tests and the individual tests were computed and compared with their correlations with the final criterion, as shown in Table X. The results indicated that the other tests were fairly good as independent measures because their correlations with the basic tests were in general lower than with the criterion. Test 4 of the Pressey scale and Test 7 of Dearborn Examination I were the best, as their correlations both were below .30.

¹ A second significant correlation shown in Table IX is that of the final criterion and the entire Dearborn test (.80). However, this should not be interpreted as proof that the Dearborn Group Tests are the best of the five scales. They take more than two hours to finish, and consequently the high correlation may be due to practice effect. Any test if prolonged might result in a fairly high correlation. No single test in the Dearborn battery, however, correlates higher than .58 (see Table VI) with the final criterion. It is worth noting (see Table VIII) that when Tests 1 and 6 are eliminated from Dearborn Group Examination I, little change in the total correlation is made; also that Group Examinations I, II and III each has almost the same value as the other, the correlations being .69, .76, .67 respectively. Each part of the Dearborn Scale, when used as a single measure of intelligence, is better than the Pressey Scale and just as good as the Myers Scale. Each of the three parts of the Dearborn Scale also correlates fairly high with the Pintner Scale, which also indicates the value of each part as a measure of intelligence.

As a whole test, the Pressey Primer Scale seems to be the poorest of the five scales used in this experiment. Its correlation with the final criterion is only .58. Tests 2 and 4 were found better than the other two tests, but their correlation with the final criterion was only .47. The correlations were not raised when combined with Tests 4 and 6 or Tests 3, 4, 5 and 6 of the Army Beta Examination.

According to this investigation, Myers Mental Measure was better than the Pressey Primer Scale, but it was inferior to the other three scales. The individual tests, however, all showed fairly high correlations with the final criterion.

Army Beta Examination as a whole had a correlation of .75 with the final criterion, which was good. When only the combined scores of Tests 3, 4, 5 and 6 were correlated with the final criterion, the result was r = .714; and the correlation between Tests 4 and 6 alone and the final criterion gave just as good a result (.711). This proved that Tests 4 and 6 were the best test elements for our purpose in the Army Beta Examination. The conclusion was further confirmed when these two tests, combined with tests from other scales, failed to raise the correlation higher than .711 (see Table IX).

Other things as well as the correlation being taken into account, Pintner's Non-language Scale seemed to give the best measure of intelligence, because (a) it correlated highly (.78) with the final criterion, (b) it did not take a long time to give, and (c) it was easy to score. The individual tests also correlated highly with the final criterion. Pintner Tests 2 and 3 with Test 6 of the Army Beta stood highest among all the individual tests in the five scales.

TABLE X
Correlations Between the Individual Tests and the Basic Tests (Pintner 2, 3 and Beta 6)

Name of Test         No. of Cases    r (Pearson)
Dearborn I-7             287            .26
Dearborn I-10            291            .35
Dearborn I-11            289            .47
Dearborn II-2            293            .47
Dearborn II-4            294            .40
Dearborn II-7            295            .46
Dearborn III-1           293            .44
Dearborn III-3           293            .36
Army 4                   313            .58
Army 5                   309            .38
Myers 1                  253            .51
Myers 2                  278            .39
Myers 3                  280            .34
Myers 4                  263            .73
Pressey 4                291            .25

However, as there are other factors to be considered in the selection of the tests besides the correlations, the tests to be combined with the basic tests were not finally selected pending further investigation.

B. SELECTION OF TESTS BY RATING

The rating method is not so accurate as the correlation method, but when the results of the latter are known the former can be wisely used to help in the selection of tests. Sometimes the judgments of specialists are as valuable as objective computation.

On October 20, 1921, the members of the psychology seminar at Teachers College, who are instructors and graduate students in the field of measurement, were asked to rate the individual tests in the different scales. A copy of the test material was distributed to each member and the instructions for administering the tests were read to them. They were then asked to rate two characteristics of each individual test as follows:

a. Can many alternative forms be prepared for the test? Assign a value of 10 for 10 or more alternative forms, 0 value for no alternative forms, and the others in proportion.

b. Is success in doing the test due to verbal instruction? Assign a value of 10 if the success in doing the test is entirely independent of verbal instruction, a value of 5 if the success is fairly due to the verbal instruction, 0 value if the success is entirely dependent upon the verbal instruction, and other values in proportion.

The results of the rating are shown in Table XI.
It was assumed that the instructors and the writer, being familiar with tests and their making, would be better judges than the members of the class, and therefore their judgments were weighted four times as heavily as those of the students.

A question of prime importance in the case of any test is whether or not it is applicable to Chinese. Consequently, ten Chinese advanced graduate students of education were asked to rate the tests in the same way as the seminar students. The test material was given to each of the judges and the instructions for giving the tests were read and explained to them. They were asked, "Is this test applicable to Chinese? Assign a value of 10 if it can be applied to Chinese very easily, a value of 5 if it can be applied with some difficulty and 0 value if it cannot be applied to Chinese at all."

The results of these ratings were not so satisfactory as anticipated. The writer finally assumed the responsibility, although he was guided by the ratings of other Chinese judges, to rate the individual tests. The results are shown in Table XII.

Both of the ratings of the two groups of judges indicate different values for the different tests. The three best tests according to this investigation were Tests 4 and 5 of the Army Beta Examination and Test 4 of the Pressey Primer Scale. This finding was still not considered final and a further investigation was made.

TABLE XI
Ratings of the Individual Tests by Competent Judges

A = Can many alternative forms be prepared?
V = Is success due to verbal instruction?

Test | Judges' ratings as printed | Av. (A V) | Ratings of judges A to J as printed | Av. (A V) | Final (A V)
Army 4 | 10 9 10 7 10 8 10 8 10 8 | 10.0 8.0 | 10 5 10 10 10 9 8 8 10 8 10 5 8 10 10 10 10 8 5 5 | 9.1 7.8 | 9.7 7.6
Army 5 | 10 8 10 4 10 8 10 4 10 5 | 10.0 5.8 | 10 3 10 7 10 6 7 10 10 5 10 2 10 8 10 8 10 5 10 7 | 9.7 6.1 | 9.9 5.9
Dearborn I-7 | 3 2 4 7 2 1 2 6 | 4.2 1.2 | 10 1 10 8 8 5 10 9 7 5 5 2 5 | 6.5 1.8 | 4.9 1.4
Dearborn I-10 | 4 2 7 1 3 1 4 3 10 2 | 5.6 1.8 | 10 6 3 2 6 1 10 8 3 5 2 2 | 4.8 1.0 | 5.3 1.5
Dearborn I-11 | 2 1 1 6 1 1 2 1 | 2.4 0.6 | 10 10 2 3 4 8 3 1 1 3 | 3.6 0.9 | 2.8 0.7
Dearborn II-2 | 8 4 2 1 6 8 2 10 5 | 6.8 2.4 | 10 10 10 2 1 10 10 2 9 3 5 5 3 3 | 6.5 1.8 | 6.7 2.2
Dearborn II-4 | 10 2 10 8 1 7 2 10 3 | 9.0 1.6 | 10 10 10 3 5 8 10 2 10 3 10 5 9 | 7.5 2.0 | 8.5 1.7
Dearborn II-7 | 4 3 2 2 5 5 2 10 2 | 5.2 1.8 | 10 10 9 1 8 1 2 2 10 2 10 3 3 | 5.6 1.5 | 5.3 1.7
Dearborn III-1 | 10 4 2 5 8 3 8 2 10 1 | 7.6 3.0 | 6 3 10 9 1 7 10 3 10 10 5 5 10 2 5 3 | 6.5 3.4 | 7.2 3.1
Dearborn III-3 | 2 2 1 3 4 2 1 3 | 2.4 1.4 | 10 1 5 5 4 10 2 10 2 | 3.9 1.0 | 2.9 1.3
Myers 1 | 10 1 5 9 9 1 10 1 | 8.6 0.6 | 10 10 10 6 10 10 8 10 10 8 | 7.8 1.4 | 8.3 0.8
Myers 2 | 10 8 5 4 8 8 9 9 10 7 | 8.4 7.2 | 10 5 10 10 5 5 10 10 4 10 10 7 10 3 10 5 5 6 | 9.0 4.5 | 8.6 6.3
Myers 3 | 10 8 4 3 9 2 7 3 10 7 | 8.0 4.6 | 10 10 10 1 4 10 10 1 10 10 4 10 10 5 6 2 | 9.0 2.3 | 8.3 3.5
Myers 4 | 10 8 4 1 8 2 7 2 10 6 | 7.8 3.8 | 10 10 10 6 10 10 10 2 7 10 10 3 6 3 | 8.4 2.3 | 8.0 3.3
Pressey 4 | 10 8 10 4 8 7 9 6 10 7 | 9.4 6.4 | 10 10 10 8 10 10 3 10 2 5 3 4 5 10 7 5 2 | 7.8 4.7 | 8.9 5.8

TABLE XII
Individual Tests Rated re Application to Chinese

Tests                Application
Army Beta 4             9.60
Army Beta 5             8.03
Dearborn I-7            6.00
Dearborn I-10           7.70
Dearborn I-11           7.30
Dearborn II-2           8.46
Dearborn II-4           8.16
Dearborn II-7           7.00
Dearborn III-1          9.76
Dearborn III-3          7.20
Myers 1                 7.86
Myers 2                 8.76
Myers 3                 9.06
Myers 4                 8.80
Pressey 4               9.00

C. SELECTION OF TESTS BY PARTIAL CORRELATION

To be certain that the other tests to be included in the Chinese scale should be different in their nature from the basic tests, all the completion and learning elements should be eliminated from the other tests. This is done by the method of partial correlation. The formula¹ used is:

$$r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{(1 - r_{13}^{2})(1 - r_{23}^{2})}}$$

r12 = The individual tests and the final criterion.
r13 = The individual tests and the basic tests.
r23 = The basic tests and the final criterion.

The results are shown in Table XIII. Test 4 of the Pressey Primer Scale has a distinctly high partial correlation (.60) with the criterion after the learning and completion elements are partialed out. As to the other tests, the partial correlations vary from -.25 to +.43.

¹ For a complete discussion on the partial correlation method, see Thorndike, E. L.: Theory of Mental and Social Measurements, p. 182; and Kelley, T. L.: "Table to Facilitate the Calculation of Partial Coefficient of Correlation and Regression Equations," Bulletin of University of Texas, 1916, No. 27.

TABLE XIII
Correlations of the Individual Tests with the Final Criterion with the Elements of the Basic Tests Eliminated (r12.3 Column)
r23 = .815 (r Final Criterion and Basic Tests)

Tests                Times    r12     r13     r12.3
Dearborn I-7           2      .45     .26      .42
Dearborn I-10          2      .52     .35      .43
Dearborn I-11          2      .50     .47      .23
Dearborn II-2          5      .58     .47      .38
Dearborn II-4          7      .46     .40      .25
Dearborn II-7          3      .56     .46      .36
Dearborn III-1         5      .58     .44      .43
Dearborn III-3         5      .43     .36      .26
Army 4                 2      .44     .58     -.07
Army 5                 3      .52     .38      .39
Myers 1                4      .51     .51      .19
Myers 2                4      .47     .39      .28
Myers 3                4      .50     .34      .41
Myers 4                5      .49     .73     -.25
Pressey 4              3      .54     .25      .60

D. SELECTION OF TESTS BY A COMPOSITE METHOD

The rating method and the partial correlation method both indicated the general value of the different tests, but each by itself could not be used as a basis for the selection of the tests. The best way was to use a combination of all the available methods together with a consideration of all the other factors. This could be accomplished by first summing up the results obtained from the different methods and then selecting the best tests according to the composite results, which are shown in Table XIV.

In comparison with the other factors, more weight should be attached to the partial correlations.
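The partial correlations of Table XIII can be checked directly from the first-order formula. The following is a minimal sketch, assuming the subscripts as defined above (1 = individual test, 2 = final criterion, 3 = basic tests) and r23 = .815; the function name `partial_r` is hypothetical, and the input values are read from Table XIII.

```python
import math


def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3 (variable 3 held constant)."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


R23 = 0.815  # basic tests with final criterion, as given in Table XIII

# (r12, r13) pairs for a few tests, read from Table XIII.
tests = {
    "Pressey 4": (0.54, 0.25),
    "Army 5": (0.52, 0.38),
    "Army 4": (0.44, 0.58),
}

for name, (r12, r13) in tests.items():
    print(name, round(partial_r(r12, r13, R23), 2))
# Pressey 4 0.6
# Army 5 0.39
# Army 4 -0.07
```

A test that shares most of its criterion-related variance with the basic tests, such as Army 4, drops to about zero once the basic tests are partialed out, whereas Pressey 4 keeps most of its value.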
Consequently, the partial correlations were multiplied by 50 so as to equalize them with the values of the "alternative forms," "verbal instruction," and "application to Chinese." The last column of Table XIV is the sum of the four values.

TABLE XIV
Combined Value of the Individual Tests as Determined by Ratings and Partial r Method

Tests              Alternative   Instruction   Application   Partial r   Combined
                      Forms                                     x 50       Value
Dearborn I-7           4.9           1.4           6.0          21.0       33.30
Dearborn I-10          5.3           1.5           7.7          21.5       30.20
Dearborn I-11          2.8           0.7           7.3          11.5       22.30
Dearborn II-2          6.7           2.2           8.5          19.0       36.40
Dearborn II-4          8.5           1.7           8.2          12.5       30.90
Dearborn II-7          5.3           1.7           7.0          18.0       32.00
Dearborn III-1         7.2           3.1           9.8          21.5       41.60
Dearborn III-3         2.9           1.3           7.2          13.0       24.40
Army 4                 9.7           7.6           9.6          -3.5       23.40
Army 5                 9.9           5.9           8.03         19.5       43.33
Myers 1                8.3           0.8           7.9           9.5       26.50
Myers 2                8.6           6.3           8.8          14.0       37.70
Myers 3                8.3           3.5           9.1          20.5       41.40
Myers 4                8.0           3.3           8.8         -12.5        7.60
Pressey 4              8.9           5.8           9.0          30.0       53.70

A review of the combined results shows that the following tests have the highest values:

Pressey Scale, Test 4      53.70
Army Beta, Test 5          43.33

Evidently, Test 4 of the Pressey Scale and Test 5 of the Army Beta Examination are the best among all the individual tests of the five scales to add to the basic tests. These two were consequently definitely selected to be included in the proposed Chinese scale.¹

¹ Test 1 of the Dearborn Group Examination III would likewise have been included, had it not closely resembled the basic tests. Other important objections to the Dearborn tests were: first, the value of the test might be due to practice effect; second, the test, comprising three pages of pictures, was too expensive.

With the selection of Test 4 of the Pressey Scale and Test 5 of the Army Beta Examination, to be included in the Chinese scale, it was necessary to consider the character of these two tests in greater detail. A study of their correlations with other elements showed the following results:

Correlation Between:             Correlation
Pressey 4 and Criterion              .54
Army 5 and Criterion                 .52
Pressey 4 and Basic Tests            .25
Army 5 and Basic Tests               .38
Pressey 4 and Army 5                 .49

Thus Pressey 4 and Army Beta 5 both correlate fairly high with the final criterion, and rather low with the basic tests. On the other hand, their correlation with each other was not high. This proved that the two tests were good measures of intelligence, each measuring traits different from those of the basic tests and from each other. Because of these special qualities and characteristics of the Pressey 4 and Army Beta 5, they were chosen, along with the basic tests, to form the proposed Chinese intelligence examination.

E. WEIGHTING BY REGRESSION EQUATION

It has been found that both Army 5 and Pressey 4 should be included in the proposed Chinese examination. The question then arises as to the amount of weight to be attached to the two tests and the basic tests. To solve this problem the regression equation was used. The regression equation follows:

$$X_1 = r_{12.34}\,\frac{\sigma_{1.234}}{\sigma_{2.134}}\,X_2 + r_{13.24}\,\frac{\sigma_{1.234}}{\sigma_{3.124}}\,X_3 + r_{14.23}\,\frac{\sigma_{1.234}}{\sigma_{4.123}}\,X_4$$

$$\sigma_{1.234} = \sigma_1 \sqrt{1 - r_{14}^{2}}\,\sqrt{1 - r_{13.4}^{2}}\,\sqrt{1 - r_{12.34}^{2}}$$

$$\sigma_{2.134} = \sigma_2 \sqrt{1 - r_{24}^{2}}\,\sqrt{1 - r_{23.4}^{2}}\,\sqrt{1 - r_{12.34}^{2}}$$

$$\sigma_{3.124} = \sigma_3 \sqrt{1 - r_{34}^{2}}\,\sqrt{1 - r_{23.4}^{2}}\,\sqrt{1 - r_{13.24}^{2}}$$

$$\sigma_{4.123} = \sigma_4 \sqrt{1 - r_{34}^{2}}\,\sqrt{1 - r_{24.3}^{2}}\,\sqrt{1 - r_{14.23}^{2}}$$

$$r_{12.34} = r_{12.4}\,A_{13.4,\,23.4} - B_{13.4,\,23.4}$$

$$r_{13.24} = r_{13.4}\,A_{12.4,\,23.4} - B_{12.4,\,23.4}$$

$$r_{14.23} = r_{14.3}\,A_{12.3,\,24.3} - B_{12.3,\,24.3}$$

TABLE XV
Data for Calculation of Regression Equation
1 = Criterion. 2 = Basic tests. 3 = Army Beta 5. 4 = Pressey 4.

          1        2        3        4
1
2       .815
3       .52      .38
4       .54      .25      .49
σ      45.3     28.0      8.1      5.6

In Table XV the figure "1" stands for criterion; "2" for basic tests; "3" for Army Beta 5; "4" for Pressey 4. These correlations were substituted in the above regression equation and the following result was obtained:

$$X_1 = 1.135\,X_2 + 0.578\,X_3 + 2.387\,X_4$$

According to the result of the regression equation, the different tests should be weighted as follows: (a) multiplying the basic tests score by 1.14; (b) multiplying Army Beta 5 by .58; (c) multiplying Pressey 4 by 2.39.

In consideration of the general impression of the tests, however, a conservative procedure was adopted. In giving final weights to the tests, the scores of the Basic Tests and of Army 5 were left unchanged, while the score of Pressey 4 was multiplied by 2. The weighted composite scores so obtained (called Composite A) were then correlated with the final criterion, the correlation found being .812. This result was very satisfactory, since it exceeds the goal of r = .80. In order to find out whether the weighting had raised the correlation or not, the raw composite scores of the Basic Tests, Pressey 4, and Army Beta 5 (called Composite B) were also correlated with the final criterion, the correlation found being .789. This showed that the weighting had raised the correlation slightly.

It should be kept in mind that the tests chosen are not based upon an empirical method of a single statistical computation, but upon all the possible available methods, such as correlation, rating by specialists, partial correlation, and regression equation. The test elements finally chosen from the five scales for the proposed Chinese Non-verbal Intelligence Examination are:

Test 2 of Pintner Non-language Tests
Test 3 of Pintner Non-language Tests
Test 5 of Army Beta Examination
Test 6 of Army Beta Examination
Test 4 of Pressey Primer Scale

CHAPTER V
RE-TESTING

A. PROCEDURE OF RE-TESTING

The tests to be included in the proposed Chinese intelligence examination having been tentatively selected, the next step was to determine their reliability and practicability. This could be done by giving the above tests to the same children and calculating the correlations of their scores with the final criterion. If the tests are reliable and practicable, they should correlate highly with the old criterion. An effort was made, therefore, to secure the same subjects who the year before had taken all the tests. Some of them had moved out of the district or gone to a higher school and it was impossible to locate all of them, but finally 190 children (from the earlier total of 401) were secured.

The re-testing was done from November 28 to 30, 1921, in the same room where the children were formerly tested. A uniform environment, which was similar to that at the first testing, was maintained throughout the examination. The same principal and the same teacher assisted in timing and policing. As in the first testing, 28 children were tested at a time; the children being sufficiently separated from each other, there was no opportunity for copying. The papers of three children, who continued working after the "stop" signal had been given, were discarded from the computation of the results, leaving papers for 187 children.

The directions for giving and scoring the tests were the same as those of the year before, with the exception of a slight modification in introduction (see Chapter VII for a complete record of the tests).

Preceding the testing, four boys and five girls were individually interviewed. Each was asked whether he could recall anything concerning the tests of the year before. All of them indeed remembered the occasion of the testing; they remembered "the good time they had had with the Chinese teacher," but not one of them could recall any of the tests.
In other words, these boys and girls had completely forgotten all about the first test, except for the vague idea of having done it. It is possible that the actual performing of the tests might recall the experience in previous testing, but in young children of this age the likelihood of recalling the tests of the year before seems so slight as to be immaterial. Consequently, the process of re-testing these children cannot be said to be influenced to any noticeable degree by repetition.

In the re-testing, the children appeared to enjoy their work. There was no sign of fatigue; instead, they were very enthusiastic. The writer obtained some interesting information on the effects of the tests upon the children by mixing with them during the recess. Joining in their play, he was constantly approached by them with such remarks as, "Mister, play some more games with us," "When will you come back again?", "Oh, I like to see the woman without a nose, and the poor fish without an eye," "There's lots of fun in making zeros and crosses."

The time consumed in testing was from 25 to 30 minutes. It is important to note, in discussing the time necessary for this testing, that none of the groups consumed more than 30 minutes in their testing, nor less than 25 minutes. This, of course, does not include the time taken in the distribution of test material nor for the preliminary remarks by the examiner.

The method of scoring the tests was very simple. Stencils were prepared in order to facilitate the work. With a small amount of practice, test papers could be scored very rapidly, even at the rate of a paper a minute.

B. STATISTICAL STUDY

The first step was to determine the general merits of the selected tests, from now on known as "The proposed Chinese Non-verbal Intelligence Examination." Tables of grade distribution and age distribution were prepared (Tables XVI and XVII), and the medians for the different grades and ages were calculated.
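A median for a grouped distribution of this kind is usually found by linear interpolation inside the class interval that contains the middle case. The sketch below shows that calculation; it is an illustration under stated assumptions, not a record of the writer's own working. The counts used (Grade IV: 83 cases, 32 below the 120-130 interval, 18 inside it; Grade III: 47 cases, 22 below the 100-110 interval, 11 inside it) are this sketch's reading of the flattened Table XVI, and `interpolated_median` is a hypothetical helper name.

```python
def interpolated_median(lower, width, n_total, n_below, n_in_class):
    """Median of grouped data by linear interpolation.

    lower      -- lower limit of the class interval containing the median
    width      -- width of that class interval
    n_total    -- total number of cases
    n_below    -- number of cases below the interval
    n_in_class -- number of cases inside the interval
    """
    return lower + (n_total / 2 - n_below) / n_in_class * width


# Assumed counts read from Table XVI (see the lead-in above).
grade_iv = interpolated_median(120, 10, 83, 32, 18)
grade_iii = interpolated_median(100, 10, 47, 22, 11)
print(round(grade_iv, 2))   # 125.28
print(round(grade_iii, 2))  # 101.36
```

Under these assumed counts the interpolation returns the Grade III and Grade IV medians printed in Table XVI.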
The medians found for the different grades were: Grade III, 101.36; Grade IV, 125.28; Grade V, 148.50. The result was encouraging as it showed a fair improvement in central tendencies for the different grades.

The median scores for the different ages were found as follows: Age 8, 85; age 9, 115; age 10, 128; age 11, 142; age 12, 148. The result was also encouraging. The medians for ages 11 and 12 were close to each other, probably because the 12-year-old children in these grades were duller than the average 12-year-old.

TABLE XVI
Distribution of Re-testing Scores by Grades

Re-testing Scores (class intervals): 170-180, 160-170, 150-160, 140-150, 130-140, 120-130, 110-120, 100-110, 90-100, 80-90, 70-80, 60-70, 50-60, 40-50, 30-40, 20-30, 10-20, 0-10

Frequencies (alignment with the class intervals not recoverable): 2 2 3 4 3 11 5 3 6 2 2 1 1 2 1 2 4 11 15 18 11 8 6 2 1 2 1 1 1 1 1 1 2 2 3 1 9

                     Grade III    Grade IV    Grade V
Number of Cases          47          83          57
Median                 101.36      125.28      148.5

TABLE XVII
Distribution of Re-testing Scores by Ages

Re-testing Scores (class intervals): 170-180, 160-170, 150-160, 140-150, 130-140, 120-130, 110-120, 100-110, 90-100, 80-90, 70-80, 60-70, 50-60, 40-50, 30-40, 20-30, 10-20, 0-10

Frequencies (alignment with the class intervals not recoverable): 1 1 3 6 7 10 5 9 7 1 2 2 1 2 2 1 4 8 14 9 6 9 2 4 3 1 1 1 2 6 7 9 6 8 2 2 3 1 2 1 1 1 1 1 1 1

                 Age 7   Age 8   Age 9   Age 10   Age 11   Age 12   Age 13
No. of Cases       2      57      68       44        9        4        2
Median                    85     115     127.7    142.2    147.5

The last step was to find out how closely the test scores of the selected tests corresponded with the old final criterion, which was used as a standard for the measure of general intelligence. Consequently, a scattergram was made and the correlation found, by the product-moment method, to be .8768. The result was very satisfactory. Theoretically the correlation between the selected tests and the old criterion should be higher than the correlation between any of the five scales and the old final criterion.
This was proven true, as shown in the following:¹

Correlations Between the Final Criterion and the Different Scales

Scales                       Correlation
The Selected Tests               .88
Dearborn Group Tests             .80
Pintner Tests                    .78
Army Beta Examination            .75
Myers Mental Measure             .65
Pressey Primer Scale             .58

¹ The first correlation is not strictly comparable to the others since it was obtained from the 187 cases of re-testing while the others were from the more than 250 cases in the first testing.

Judging by the results of the correlation of the selected tests with the old final criterion, by the comparatively short time to give the tests, and by the deep interest displayed by the children in doing the tests, together with their other merits, it seems fair to conclude that the selected five tests which are included in the proposed Chinese Non-verbal Intelligence Examination give better results than any of the five scales used in this experiment.

CHAPTER VI
ALTERNATIVE FORMS AND STANDARDIZATION

A. ALTERNATIVE FORMS

Although the selected five tests are to be considered the best among the five scales used in the experiment, they cannot be applied to Chinese as satisfactorily as to American children. For instance, in Tests 2, 3 and 5, Arabic figures are used in substitution and number-checking. Arabic figures are taught in all of the modern Chinese schools, but the children who have not attended a modern school or learned the Western arithmetic are wholly ignorant of the meaning of them. Chinese children, not of better class families in some modern city such as Shanghai, also will be greatly handicapped in performing Test 1. They can hardly be expected to draw the filament of an electric bulb. They cannot place a postage stamp in its proper American position on the envelope, nor complete the drawing of a pistol, a bowling game, a phonograph or a tennis net, for these objects are rare in China.
The same may be said of Test 4: the telephone, the gloves, the ABC, the American flag, the music scale, and so on, are most likely unknown to 99 per cent of Chinese children. Consequently, these tests cannot be applied unless alternative forms are devised. As explained in previous chapters, alternative forms have distinct advantages, besides their application to Chinese, such as the prevention of coaching and the provision of material for retesting.

In preparing the alternative forms, the criteria first adopted were strictly observed. One point was especially emphasized; namely, that the test material should be drawn from a social environment common to all people and the test should measure only those mental traits which every child has an equal opportunity to develop. This means that the test material selected should not be dependent upon any social or educational advantages. An attempt also was made to bring all of the alternative forms to yield the same result. The writer, however, cannot claim credit for such an achievement as yet, because the tests have not been tried out in China.

The first step in preparing the alternative forms was to devise a large number of test items. These were then submitted to ten graduate students originally from different parts of China. They were asked, "Is this common in your locality?" All the test items which were marked "Not common" in any of the localities were discarded. The selected test-elements were submitted to 2 Japanese, 2 Filipinos, 2 Indians, 2 Britons, and 2 Americans; and they were asked the same question, "Is this common in your country?" All those which were marked "Not common" were again discarded. Those remaining from this double sifting were finally gathered and sorted into forms.

Different methods are required for placing the test-items in the different individual tests.
For Tests 2, 3 and 5, the selection of the symbols was made by the chance method of tossing coins. For Tests 1 and 4, the pictures were arranged, by the combined judgments of three experts, according to their degree of difficulty, beginning with the easiest, and ending with the most difficult ones. The best method for arranging the tests in the order of their difficulty would be one in which the tests are given to several hundred children, with the answers scored either right or wrong, and the per cent of correct answers obtained.

In Tests 1 and 5 of the Chinese non-verbal forms, the preliminary demonstration is modified. To be uniform with the other tests, the marks and pictures to be used for the preliminary demonstration are printed at the top of each sheet. This is an improvement also because the use of a blackboard may be inconvenient or unfair.

The alternative forms thus devised cannot be claimed as the final forms. They must yet be tried out upon a large number of children, the norms for ages and grades must be computed, and the tests scaled; but judging by the results of the experiment, there is every reason to believe that the tests will prove reliable and useful.

B. STANDARDIZATION

The last step of scale construction is standardization: the obtaining of norms and scaling of the tests. In order to do this for the proposed Chinese Non-verbal Intelligence Scale, it is necessary to give it to a large number of Chinese subjects, perhaps 5000. The selected tests were only applied to about 200 pupils, very few of whom were Chinese. The devised alternative forms, furthermore, cannot be tried out in America. It was thus impossible to secure any age or grade norms to be reported here or to scale the tests. The final standardization must be done in China. However, the technique may be briefly discussed here.

1. Norms

The purpose of mental measurement is to reveal individual and group differences of intelligence.
To perform such a function, norms or standards of achievement for different ages and possibly grades are required. We cannot, however, test all the Chinese people between certain ages and compute the average achievement of each age. This is unnecessary as well as impracticable. The obtaining of reliable norms does not require the test of every child in the country, but it is essential that the subjects selected should be in random sampling, representing the whole range of intelligence from a low degree of moron to a high degree of genius. It is also essential that the subjects should be representative of all types of social environment in different parts of the country.

Norms are more valuable when they are stable. When a norm is stable, it indicates that the subjects are selected from random sampling and the number of cases is sufficient. As a rule, the greater the number of cases taken, the more stable are the norms; certainly a norm can be claimed to be stable only when it reaches the point where the addition of new cases does not materially alter the previous determination. The safest way to tell whether the norms are stable or not is to average the scores of a varying number of cases and watch the resulting fluctuations in the average. McCall states that "when the addition of, say, 100 cases does not materially alter the previously determined norm, the norm has stabilized."¹

¹ McCall, W. A.: How to Measure in Education, p. 315.

Norms for both age and grade should be worked out. However, in China the age norms will be more important than the grade norms, as the grades are not uniform in the schools. Care must be taken, however, in obtaining ages to record the actual date of birth according to both old and new calendar, as many subjects, undoubtedly, will follow the custom of reporting ages by years although they may be born at the end of the year.¹

2. Scaling

After the tests have been applied to a large number of subjects and the norms are obtained, scaling is comparatively an easy task. There are numerous methods of scaling tests. For the Chinese scale, the writer plans to adopt one or both of the two most commonly used methods: an age scale and a percentile scale.

(a) Age scale: The construction of an age scale merely requires the determination of stable norms. Given a norm for each age, any pupil's test-score may be transmuted into a mental age and intelligence quotient. Mental age is obtained from a comparison of the subject's performances with the standard for normal children of the same age. Let us suppose the subject tested is 10 years of age. If he can do as much as normal 10-year-old children do, the child has a mental age of 10, which in this case is normal. If he goes as far as normal 8-year-old children go, his mental age is 8. In this case, he is subnormal. In like manner a mental defective 10 years old may have only a mental age of five, and a genius of the same age may have a mental age of 13 or 14.

The intelligence quotient, often designated as I Q, is the ratio of mental age to chronological age. It is a valid expression of intelligence. On the basis of the Stanford Revision of the Binet Scale, Terman² suggests this classification of intelligence quotients:

I Q              Classification
Above 140        "Near" genius or genius
120-140          Very superior in intelligence
110-120          Superior intelligence
90-110           Normal, or average intelligence
80-90            Dullness
70-80            Border-line deficiency
Below 70         Feeble-mindedness

¹ According to the old custom in China, which still prevails in many portions of the country, age is reckoned in years, according to the calendar. For example, a man whose 25th birthday comes in December would be considered as already 25 years of age in the preceding January. This may be explained as resulting from the literal translation of Chinese into English.
In the Chinese language, age or "sui" is expressed in the phrase "in the 25th year," whereas in America this would be translated as "25 years old."

³ Terman, L. M.: The Measurement of Intelligence, p. 79.

(b) Percentile scale: The technique of percentile scale construction is described in detail by Pintner.⁴ After the test papers have been scored, a distribution table for each test is made. The percentiles are then calculated for each test, counting usually from the lower end of the table. The 25-percentile, or Q1, is the score found by counting off one-fourth of the scores. The 75-percentile is found by counting off three-fourths of the scores. Similarly, the 10-percentile is found by counting off one-tenth of the scores, the 20-percentile by counting off one-fifth of the scores, and so on for any other percentile. After the percentiles are calculated, the percentile table for each test should be prepared. To get the mental index of any individual, his percentile placement on each test is found by comparing his score with those in the table, and the median of these placements is then taken. Similarly the mental index for the class, for the grade, and for the entire school can be found. For purposes of rough classification, Pintner has adopted the following scheme:

Percentile     Classification
84-100         Very bright
72-83          Bright
39-71          Average
22-38          Backward
0-21           Dull

⁴ Pintner, R.: The Mental Survey, p. 21 ff.

CHAPTER VII

THE CHINESE NON-VERBAL TESTS¹

A. THE NATURE OF THE TESTS

The measurement of intelligence has recently become widespread in America. It has proved very helpful in solving many administrative problems. These tests are therefore introduced in the hope of facilitating Chinese educational work. The tests were scientifically constructed for the measurement of mental ability.
They are applicable to a large number of children at a time, in the Citizens' Schools or Higher Primary Schools. There are four forms, all of equal value. It is advisable to use different forms in the various grades, so as to prevent coaching. The period of testing does not exceed thirty minutes. The tests will enable the teacher or school administrator to measure the mental ability of pupils in groups for the following purposes:

1. Classification. The object of classification is to divide the pupils into homogeneous groups whose needs are similar, in order that the work can be more exactly adapted to them. With the application of these tests, a teacher can scientifically determine the mental ability of his pupils in a rapid and accurate manner.

2. Promotion. The variability in the ability to learn among children of any grade is great, and they do not progress at an equal rate. It is obviously unwise to attempt to force all of them to keep the same pace in their class work. The bright pupils should therefore be promoted as fast as their ability permits them to absorb their work, or their courses of study should be enriched; while the slow ones may be given more time, or the requirements upon them may be reduced to the minimum essentials.

3. Provision for the Backward. These tests may give a valuable indication of the probable causes of difficulty with troublesome backward children. Their restlessness, incorrigibility, and lack of school progress may be due to a mentality unequal to the strain of ordinary school work. The tests may therefore indicate those who should be segregated from the normal class and given special courses of study.

¹ The material in this chapter is translated from the Manual of Directions.

4. Vocational Guidance. These tests will not give a prognosis of fitness for specific trades or professions except along broad lines; they are selective.
The test scores will show whether a child should be encouraged to enter a profession or to do unskilled work. For instance, it would be absurd to encourage a child whose test indicates feeble-mindedness to study medicine, or a genius to be a riksha coolie.

Although these tests are primarily devised for the use of school children, they will be of aid to the employer in making a hasty classification of his employees, especially the unskilled laborers, and will help the employee to find early the place for which he is best fitted.

B. INSTRUCTIONS TO EXAMINER

1. Any intelligent person who has a pleasing personality can conduct a group examination with these non-verbal tests in a reasonably satisfactory manner and obtain fairly reliable results.

2. The examiner cannot give the examination satisfactorily until he has thoroughly mastered the technique. He should try the tests out on a smaller group of children than the one to be tested and then memorize the procedure. However, he should always read the directions from the manual.

3. The room for testing should be provided with chairs and desks. It should be free from distracting noises within or without. No visitor, school authority, or pupil should be permitted to enter or leave the room during an examination unless the reason for so doing is imperative. The school administration should so arrange the place and time of testing that no one is permitted to weaken the value of the tests by distracting the attention of the children in any manner.

4. Children should use pencils rather than pens. Each child should be provided with two pencils (with erasers), and the examiner should always have on hand a supply of sharpened pencils to be used if needed. If a child breaks his pencil, the examiner should supply another quietly and with as little loss of time to the child as possible.

5. It is better for the examiner to remain at the front of the room during the entire testing. He should ask the assistant, or appoint several pupils, to distribute the test papers.

6. Before the examination begins, those to be tested should be made to feel comfortable and put in an easy, contented but responsive frame of mind. Every effort should be made to make the testing as informal and as much like a game as possible, yet precision and exactness in obeying all the rules that have been worked out for administering the tests are essential. Otherwise the results obtained in different schools will be untrustworthy and not comparable.

7. In a given school, children should be tested in order from the lowest grade upward. So far as possible, the same examiner should give all the examinations within the school.

8. The examiner should give the directions in a clear, energetic voice. He should speak distinctly, at moderate speed, and loud enough to make his voice clearly audible to all the pupils in the room. He must make sure that each step is understood by all, that they turn to the proper page when a new test is to be begun, and that they give instant obedience to his directions.

9. The directions for giving the tests should be followed literally. Avoid all impromptu directions, since such variations may modify the results. Even though the directions are memorized, they should always be read from the manual in giving a test.

10. All should start and stop together. If a child comes in late or leaves the room early, or his work is otherwise interfered with, a note of the fact should be put on his paper at the time.

11. Accurate timing of the tests is of great importance. Use a watch with a second hand. Have an assistant act as timer if convenient.

12. The children must be constantly watched for copying.
Every precaution against cheating should be taken, yet the manner of the examiner should not be accusing or offensive to the self-respect of the pupils.

C. DIRECTIONS FOR GIVING THE TESTS

Read: "Would you like to play a game?"

(For pupils who can read and write.) "Before we begin I must ask you a few questions. First, I want to know your name. Please write your name at the upper right corner." (Hold up test blank and point.) (Pause.) "Have you all done that?" (Pause.) "That [text missing in source]

B. Sample of Records Kept

The following is a sample from the original record book, which is now kept in the library of Teachers College, Columbia University. All those who are interested in the full record may have access to it by communicating with the proper authorities.

[Rows marked * are incomplete in the source; their entries are given in printed order and may not fall under the correct columns.]

1.
Boys'     No.   Age       Nationality   Health    Promotion   School   Teachers'   Grade
Names           Yr. Mo.                                       Marks    Estimates
                (1)       (2)           (3)       (4)         (5)      (6)         (7)
Jo. G.    118   9   -     Italian       Teeth     Yes         B        CCB         III A
Wi. L.    119   9   -     Chinese       Teeth     Yes         B        BBB         III A
Jo. M.    120   9   -     Italian       Teeth     Yes         B        BBB         III A
De. M.    121   9   -     Italian       Tonsils   Yes         B+       ABA         III A
Ai. N.    122   9   -     Italian       Teeth     Yes         B        DDD         III A

2.
Thorndike-McCall Reading Scale
T Score   Reading Age   R.Q.
(8)       (9)           (10)
38.5      88.5          89
35        101.0         104
37.0      107.0         106
29.0      90.0          89
[Only four of the five entries survive in each of these columns; the row left blank cannot be identified.]

Credit Assigned to
Age    School Marks   Teachers' Estimates   School Progress
(11)   (12)           (13)                  (14)
7      7              2 2 3                 5
7      7              3 3 3                 5
7      7              3 3 3                 5
7      8              4 3 4                 5
7      7              1 1 1                 5

3.
School      Pressey + Pintner +       Tests    Final Criterion
Criterion   2 × Myers + 3 × Beta +    Total    (3 × Sch. Crit. +
Total       2 × Dearborn                       Test Total)
(15)        (16)                      (17)     (18)
26          484                       48       174
28          582                       58       200
28          513                       51       186
31          389                       39       171
22          337                       34       134

4.
Pressey Primer Scale
Total Score   Test I   Test II   Test III   Test IV   Test II-IV
(19)          (20)     (21)      (22)       (23)      (24)
38            21       17        *
80            21       21        18         20        41
81            24       19        21         17        36
44            8        11        14         11        22
74            22       18        18         16        34

5.
Pintner Non-language Tests
Test Total   Test I   Test II   Test III   Test IV   Test V   Test VI   II-III
(25)         (26)     (27)      (28)       (29)      (30)     (31)      (32)
93           2        35        38         14        4        73        *
101          1        38        33         18        6        5         71
80           4        27        27         14        4        4         54
58           0        22        22         14        44       *
13           10       1         2          *

6.
Myers Mental Measure
Test Total   Test I   Test II   Test III   Test IV
(33)         (34)     (35)      (36)       (37)
13           1        3         7          2
29           3        13        8          5
14           2        3         4          5
14           3        9         2          *
12           3        3         3          3

7.
Army Beta Examination
Test Total   I      II     III    IV     V      VI     VII    III-IV-V-VI   IV-VI
(38)         (39)   (40)   (41)   (42)   (43)   (44)   (45)   (46)          (47)
55½          7      6      15½    13     12     47     28     *
77½          9      6      21½    27     14     69     36     *
82½          5      8      25½    28     13     3      75     39            *
70½          8      9      8      17½    14     8      6      48            26
55½          6      4      16½    17     12     50     29     *

8.
Dearborn Group Tests, Series I
Grand Total   Exam. I   Exam. II   Exam. III
(48)          (49)      (50)       (51)
171           60        86         25
179           78        59         42
182           58        89         34
147           53        81         13
150           51        74         25

9.
Dearborn Examination 1
Total  I     II    III   IV    V     VI    VII   VIII  IX    X     XI    XII   XIII  XIV   XV    XVII  XVIII
(52)   (53)  (54)  (55)  (56)  (57)  (58)  (59)  (60)  (61)  (62)  (63)  (64)  (65)  (66)  (67)  (68)  (69)
60     3     3     3     1     3     3     4     3     1     4     3     4     4     3     3     8     9
79     3     3     3     1     3     3     4     3     1     4     4     4     6     3     16    20    *
58     3     3     3     1     3     2     3     3     4     4     5     3     14    8     *
51     3     3     3     1     2     3     3     4     4     4     3     8     10    *
5[?]   3     3     3     1     3     3     3     3     1     1     1     4     3     14    10    *

10.
Dearborn Examination 2
Total   I      II     III    IV     V      VI     VII
(70)    (71)   (72)   (73)   (74)   (75)   (76)   (77)
86      9½     15     21     14     11     12     4
59½     8½     15     15     2      9      9      1
89½     9½     15     24     11     9      13     *
81½     9½     12     24     11     14     11     *
74      10     15     15     10     8      11     5

11.
Dearborn Examination 3
Total   I      II     III    IV
(78)    (79)   (80)   (81)   (82)
25      14     11     *
42      15     4      23     *
34      14     9      11     *
13      3      2      8      *
25      9      16     *

12.
Tests Combinations
Composite A   Composite B   Beta III-IV-V-VI,   Beta IV-VI,
                            Pressey II-IV       Pressey II-IV
(83)          (84)          (85)                (86)
104           132           93                  104
152           110           69                  116
84            36            80                  62
94            105           72                  51
36            52            67                  53

13.
Tests Combinations
Pintner II-III,   Dearborn I,   Dearborn I,
Beta VI           I-VI          VII-XV
(87)              (88)          (89)
87                14            25
84                14            29
62                14            29
66                12            22
8                 14            21

14.
Re-Testing
Total   I      II     III    IV     V
(90)    (91)   (92)   (93)   (94)   (95)
166     50     48     33     14     21
170     44     50     35     18     23
143     50     49     15     8      21
116     23     49     11     13     20
116     49     38     6      8      15

BIBLIOGRAPHY

Ayres, L. P. "The Binet-Simon Measuring Scale for Intelligence: Some Criticisms and Suggestions." Psychological Clinic, Vol. V (1911), pp. 187-96.
Binet, A. and Simon, T. "Le développement de l'intelligence chez les enfants." L'Année psychologique, 14 (1908), pp. 1-94.
Binet, A. and Simon, T. "L'intelligence des imbéciles." L'Année psychologique (1909), pp. 1-47.
Chen, H. C. "Educational Research in China." Journal of Educational Research (May, 1921), Vol. III, No. 5, p. 394.
Chen, H. C. and Liao, C. S. Mental Tests. Commercial Press, Shanghai, China (1922).
Dearborn, W. F. The Dearborn Group Tests of Intelligence. J. B. Lippincott Co. (1920).
Fretwell, E. K. A Study in Educational Prognosis. Teachers College Contributions to Education, No. 99 (1919).
Goddard, H. H. "The Binet-Simon Measuring Scale of Intelligence Revised." Training School Bulletin, Vol. VIII (1911), pp. 56-62.
Healy, W. and Fernald, G. M. "Tests for Practical Mental Classifications." Psychological Monographs, Vol. No. 2, 54 (1911), pp. 4-5.
Herring, John P. "Significance of Certain Elements in Intelligence Examination." Unpublished Ph.D. dissertation, Columbia University (1921).
Kelley, T. L. Educational Guidance. Teachers College Contributions to Education, No. 71 (1914).
Kelley, T. L. "Table to Facilitate the Calculation of Partial Coefficient of Correlation and Regression Equation." Bulletin of the University of Texas (1916), No. 27.
Knox, H. A. "A Scale Based on the Work at Ellis Island for Establishing Mental Defects." Journal of the American Medical Association, Vol. LXII (March 7, 1914), pp. 741-747.
Kuhlman, F. "A Revision of the Binet-Simon System for Measuring the Intelligence of Children." Journal of Psycho-Asthenics, Monograph Supplement, No. 1 (1912), p. 14.
McCall, W. A. How to Measure in Education. Macmillan Co. (1922).
Myers, Caroline E. and Garry C. "A Group Intelligence Test." School and Society (1919), Vol. 10, pp. 355-360.
Peking Teachers College Weekly, No. 132 (Sept. 11, 1921), p. 3.
Pintner, R. "A Non-language Group Intelligence Test." Journal of Applied Psychology, Vol. III (Sept., 1919).
Pintner, R. The Mental Survey. D. Appleton Co. (1918).
Pintner, R. and Paterson, D. G. "The Binet Scale and the Deaf Child." Journal of Educational Psychology, Vol. VI (1915), pp. 202 ff.
Pressey, S. L. and Pressey, L. W. "Cross-out Tests." Journal of Applied Psychology, Vol. 3 (1919), pp. 143-150.
Pyle, W. H. "A Study of the Mental and Physical Characteristics of the Chinese." School and Society, Vol. VIII, No. 132 (August 31, 1918), pp. 264-269.
Stern, W. The Psychological Methods of Testing Intelligence. Translated by G. M. Whipple. Warwick and York, Baltimore (1914).
Terman, Lewis M. The Measurement of Intelligence. Riverside Textbooks in Education. Houghton Mifflin Co. (1918).
Thorndike, E. L. "A Standard Group Examination of Intelligence Independent of Language." Journal of Applied Psychology, Vol. III, No. 1 (March, 1919), pp. 13-32.
Thorndike, E. L. Mental and Social Measurements. Teachers College, Columbia University (1919).
Walcott, G. D. "The Intelligence of Chinese Students." School and Society, Vol. 11 (1920), pp. 474-480.
Wallin, J. E. Experimental Studies of Mental Defectives: A Critique of the Binet-Simon Tests. Warwick and York, Baltimore (1912).
Yerkes, R. M. "Psychological Examining in the United States Army." Memoirs of the National Academy of Science, Vol. XV (1921).
Yoakum, C. S. and Yerkes, R. M. Army Mental Tests. Henry Holt Co. (1920).
Yerkes, R. M., Bridges, J. W. and Hardwick, P. S. A Point Scale for Measuring Mental Ability. Warwick and York, Baltimore (1912).

VITA

Herman Chan-En Liu was born at Hanyang, Hupeh, China, on December 12, 1896. He received his early education in the Baptist School of Hanyang and graduated in 1914 from William Nast Academy, Kiukiang. He was an instructor in the same institution during the year 1914-15. He then attended Soochow University, from which he received the degree of Bachelor of Science in 1918. In the fall of 1918 he came to America for graduate work, attending the University of Chicago, from which he received the degree of Master of Arts in 1920. In partial fulfillment of the requirements for this degree, he submitted the thesis Historical Development of Coeducation in America. During the years 1920-1922 he was a student at Teachers College, except for the period in which he served as secretary of the Chinese Educational Commission, which toured America during the spring of 1921 for the purpose of studying American educational systems and practices.