 The original of this book is in 
 the Cornell University Library. 
 
 There are no known copyright restrictions in 
 the United States on the use of the text. 
 
 http://www.archive.org/details/cu31924003092917 
 
DEPARTMENT OF APPLIED MATHEMATICS, 
 
 UNIVERSITY COLLEGE, UNIVERSITY OF LONDON. 
 
 
DRAPERS' COMPANY RESEARCH
 
 MEMOIRS. 
 
 BIOMETRIC SERIES II. 
 
 MATHEMATICAL CONTRIBUTIONS TO THE 
 THEORY OF EVOLUTION. 
 
 XIV. ON THE GENERAL THEORY OF SKEW CORRELATION 
 
 AND NON-LINEAR REGRESSION. 
 
 BY 
 
 KARL PEARSON, F.R.S. 
 
 [WITH FIVE DIAGRAMS.] 
 
 LONDON: 
PUBLISHED BY DULAU AND CO., 37, SOHO SQUARE, W.
 
 1905. 
 
 Price Five Shillings. 
 
In March, 1903, the Worshipful Company of Drapers announced their intention of granting £1,000 to the University of London to be devoted to the furtherance of research and higher work at University College. After consultation between the University and College authorities, the Drapers' Company presented £1,000 to the University to assist the statistical work and higher teaching of the Department of Applied Mathematics. It seemed desirable to commemorate this, probably the first occasion on which a great City Company has directly endowed higher research work in mathematical science, by the issue of a special series of memoirs in the preparation of which the Department has been largely assisted by the grant. Such is the aim of the present series of "Drapers' Company Research Memoirs."
 
 K.P. 
 
Mathematical Contributions to the Theory of Evolution. — XIV. On the General 
 Theory of Skew Correlation and Non-linear Regression. 
 
 By Karl Pearson, F.R.S. 
 
Contents.

(1.) Introductory. General conceptions as to skew variation and correlation. General theory of skew variation within the limits of practical errors of sampling.
(2.) Generalised idea of correlation. The correlation ratio $\eta$ and its relation to the correlation coefficient $r$.
(3.) Probable errors of the correlation ratio and other constants of the arrays. Probable error of $r$.
(4.) On the higher types of regression. Homoscedastic and heteroscedastic systems. Homoclitic and heteroclitic systems.
(5.) Cubical regression. General equations for regression of any order.
(6.) Parabolic regression.
(7.) Linear regression.
(8.) Illustration A. On the skew correlation between number of branches to the whorl and position of the whorl on the spray in the case of Asperula odorata.
(9.) Illustration B. On the skew correlation between age and head height in girls.
(10.) Illustration C. On the skew correlation between size of cell and size of body in Daphnia magna.
(11.) Illustration D. On the skew correlation between number of branches to the whorl and position of the whorl on the stem in Equisetum arvense.
(12.) Quartic regression. Necessary criteria for various types of regression.
(13.) Illustration E. Calculation of quartic regression in the case of Equisetum arvense.
(14.) General conclusions. Nomenclature, clitic and scedastic curves. Difference between mere curve fitting and regression calculations. Remarks on retention of decimals.
 
 (1.) Introductory. 
 
In a series of memoirs presented to the Royal Society I have endeavoured to show that the Gaussian-Laplace normal distribution is very far from being a general law of frequency distribution either for errors of observation* or for the distribution of deviations from type such as occur in organic populations.† It is quite true that the

* "On Errors of Judgment, &c.," 'Phil. Trans.,' A, vol. 198, pp. 235-299.
† "On Skew Variation, &c.," 'Phil. Trans.,' A, vol. 186, pp. 343-414.
 
 
normal distribution applies within certain fields with a remarkable degree of accuracy, notably in a whole series of anthropometric, particularly craniometric, observations.* In other fields it is not even approximately correct, for example in the distribution of barometric variations,† of grades of fertility and incidence of disease.‡ For such cases I have introduced a series of skew frequency curves which serve the purpose of describing the frequency of innumerable skew distributions well within the errors of random sampling. An exact test for "goodness of fit" in the case of frequency distributions has also been now provided.§
 
In dealing with frequency which diverges more or less conspicuously from the normal law we require to bear in mind at least three important points:

(i.) Any expression for frequency must be a graduation formula. It is not a disadvantage, but a fundamental requisite that it should smooth off "Scheingipfeln," so far as these are irregularities within the limits of random sampling.

Hence formulae like those provided by Thiele‖ and Wundt's pupils,¶ which depend upon taking enough "moments" to reproduce the complete frequency, are a priori fallacious. Many interpolation formulae would do this completely, but such interpolation formulae are not graduation formulae.
 
 (ii.) The graduation formula must not depend upon the calculation of constants 
 having such a high probable error that their value is practically worthless. 
 
 Now, the probable error of high moments and products increases rapidly with their 
 dimensions ; hence there is, beyond the labour of arithmetic, a practical limit to the 
number of moments or products which can be effectively used in a graduation
 formula. 
 
 (iii.) There must be a systematic method of approaching frequency distributions, 
 which can be applied to all cases with reasonably practical ease. 
 
Now the immense majority, if not the totality, of frequency distributions in homogeneous material show, when the frequency is indefinitely increased, a tendency to give a smooth curve characterised by the following properties:
 
 (i.) The frequency starts from zero, increases slowly or rapidly to a maximum, and 
 then falls again to zero — probably at a quite different rate — as the character for which 
 the frequency is measured is steadily increased. This is the almost universal 
 unimodal distribution of the frequency of homogeneous series. Homogeneity may 
 
* 'Biometrika,' vol. I., p. 443; vol. II., p. 344; vol. III., p. 230.

† 'Phil. Trans.,' A, vol. 190, pp. 423-469.

‡ 'Phil. Trans.,' A, vol. 192, pp. 257-330; 'The Chances of Death,' vol. I., pp. 69, et seq.; 'Biometrika,' vol. I., p. 134 and p. 292; and for disease, 'Phil. Trans.,' A, vol. 186, pp. 390 and 407; A, vol. 197, p. 159.

§ 'Phil. Mag.,' vol. 50, 1900, pp. 157-174, and 'Biometrika,' vol. I., pp. 154-163.

‖ 'Forelaesninger over Almindelig Iagttagelseslaere,' Kjobenhavn, 1889; 'Theory of Observations,' London, 1903.

¶ Wundt, 'Philosophische Studien.' A whole series of papers, by G. F. Lipps and others, seems to me to quite miss the point of (i.) and (ii.) above.
 
 
 for practical purposes be taken to imply unimodality, although the converse is very 
 far from true. 
 
(ii.) In the next place there is generally contact of the frequency curve at the extremities of the range. These characteristics at once suggest the following form of frequency curve, if $y\,\delta x$ measure the frequency falling between $x$ and $x + \delta x$:

$$\frac{1}{y}\frac{dy}{dx} = \frac{x + a}{F(x)} \qquad \text{(i.)}$$

For in this case we have one mode only of the frequency, i.e., at $x = -a$, and $dy/dx$ will vanish when $y = 0$.

But the assumption of this form, as long as $F(x)$ is general, is itself extremely general, and it includes cases in which $dy/dx$ may not be zero, but take any values from $0$ to $\infty$, when $y = 0$.*
 
Now let us assume that $F(x)$ can be expanded by Maclaurin's theorem, and equals $b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \ldots$. Then our differential equation to the frequency will be

$$\frac{1}{y}\frac{dy}{dx} = \frac{x + a}{b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \ldots} \qquad \text{(ii.)}$$
 
There is now absolutely no difficulty in determining the unknown constants in terms of the moments of the system. Multiply up and also by $x^n$, and then integrate throughout the range of frequency, we have

$$\int x^n (b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \ldots)\frac{dy}{dx}\,dx = \int y\,(x + a)\,x^n\,dx \qquad \text{(iii.)}$$

Or, noting that $y = 0$ at the ends of the range, we have, with the usual notation for a total frequency $N$, i.e.,

$$N\mu'_n = \int y\,x^n\,dx \qquad \text{(iv.)},$$

the result by integration by parts

$$a\mu'_n + n b_0 \mu'_{n-1} + (n+1) b_1 \mu'_n + (n+2) b_2 \mu'_{n+1} + (n+3) b_3 \mu'_{n+2} + \ldots = -\mu'_{n+1} \qquad \text{(v.)}$$

Hence, if we write $n = 0, 1, 2, 3 \ldots s$ successively, we have $s + 1$ equations to find $a, b_0, b_1, b_2 \ldots b_{s-1}$ in terms of the moments. For example, if we stop at $b_0$ we require two moments, at $b_1$ three moments, at $b_2$ four moments, at $b_3$ six moments, at $b_4$ eight moments, and at $b_{s-1}$, $s > 2$, $2s - 2$ moments.
 
* For example, cases in which there is a minimum frequency or antimode at $x = -a$, and $dy/dx$ infinite at one or two values for which $y = 0$, as in the frequency distributions discussed in 'Phil. Trans.,' A, vol. 186, pp. 364-5, and 'Roy. Soc. Proc.,' vol. 62, p. 287, "Cloudiness, a Novel Case of Frequency."
 
 
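There is no difficulty whatever in finding the $b$'s; we have the system of equations, where $\mu'_0 = 1$:

$$\begin{aligned}
\mu'_0 a + 0 \cdot b_0 + \mu'_0 b_1 + 2\mu'_1 b_2 + 3\mu'_2 b_3 + 4\mu'_3 b_4 + \ldots &= -\mu'_1 \\
\mu'_1 a + \mu'_0 b_0 + 2\mu'_1 b_1 + 3\mu'_2 b_2 + 4\mu'_3 b_3 + 5\mu'_4 b_4 + \ldots &= -\mu'_2 \\
\mu'_2 a + 2\mu'_1 b_0 + 3\mu'_2 b_1 + 4\mu'_3 b_2 + 5\mu'_4 b_3 + 6\mu'_5 b_4 + \ldots &= -\mu'_3 \\
\mu'_3 a + 3\mu'_2 b_0 + 4\mu'_3 b_1 + 5\mu'_4 b_2 + 6\mu'_5 b_3 + 7\mu'_6 b_4 + \ldots &= -\mu'_4 \\
\mu'_4 a + 4\mu'_3 b_0 + 5\mu'_4 b_1 + 6\mu'_5 b_2 + 7\mu'_6 b_3 + 8\mu'_7 b_4 + \ldots &= -\mu'_5
\end{aligned} \qquad \text{(vi.)}$$

Hence, $a, b_0, b_1, b_2, b_3, \ldots$ are at once given in terms of the determinant $\Delta$ and its minors, where:

$$\Delta = \begin{vmatrix}
\mu'_0 & 0 & \mu'_0 & 2\mu'_1 & 3\mu'_2 & 4\mu'_3 & \ldots \\
\mu'_1 & \mu'_0 & 2\mu'_1 & 3\mu'_2 & 4\mu'_3 & 5\mu'_4 & \ldots \\
\mu'_2 & 2\mu'_1 & 3\mu'_2 & 4\mu'_3 & 5\mu'_4 & 6\mu'_5 & \ldots \\
\mu'_3 & 3\mu'_2 & 4\mu'_3 & 5\mu'_4 & 6\mu'_5 & 7\mu'_6 & \ldots \\
\mu'_4 & 4\mu'_3 & 5\mu'_4 & 6\mu'_5 & 7\mu'_6 & 8\mu'_7 & \ldots \\
\ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots
\end{vmatrix} \qquad \text{(vii.)}$$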
 
The results may be simplified slightly by taking the origin at the mean, and the moments about the mean, indicating this by dropping the dashes and putting $\mu_1 = 0$.
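The solution of the moment equations can be sketched numerically. The following is a minimal illustration, not part of the memoir: the function name `pearson_constants` and the use of exact fractions with Gauss-Jordan elimination are assumptions made for the example. It builds the first $s + 1$ equations, in which the coefficient of $b_k$ in equation $n$ is $(n + k)\mu'_{n+k-1}$, and solves them for $a, b_0, \ldots, b_{s-1}$.

```python
from fractions import Fraction


def pearson_constants(mu, s):
    """Solve the first s+1 moment equations for a, b_0, ..., b_{s-1}.

    mu[k] is the k-th moment about the chosen origin, with mu[0] == 1.
    Equation n reads: mu[n]*a + sum_k (n+k)*mu[n+k-1]*b_k = -mu[n+1].
    (Hypothetical helper written for illustration only.)
    """
    size = s + 1
    rows = []
    for n in range(size):
        row = [Fraction(mu[n])]
        for k in range(s):
            # the coefficient vanishes when n + k == 0
            row.append((n + k) * Fraction(mu[n + k - 1]) if n + k > 0 else Fraction(0))
        row.append(-Fraction(mu[n + 1]))
        rows.append(row)
    # Gauss-Jordan elimination in exact rational arithmetic
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pv = rows[col][col]
        rows[col] = [v / pv for v in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [vr - f * vc for vr, vc in zip(rows[r], rows[col])]
    solution = [rows[i][size] for i in range(size)]
    return solution[0], solution[1:]


# Moments about the mean of a normal series with unit variance: 1, 0, 1, 0, 3
a, b = pearson_constants([1, 0, 1, 0, 3], 3)
print(a, b)
```

For normal moments the sketch returns $a = 0$, $b_0 = -\mu_2$ and vanishing higher $b$'s, so that the differential equation reduces to the normal form.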
 
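Thus we have the following series of frequency curves, the origin being the mean:

(i.) Keeping $b_0$ only

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x}{\mu_2} \qquad \text{(viii.)}$$

This is the Laplace-Gaussian normal form.

(ii.) Keeping $b_0$, $b_1$ only

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x + \dfrac{\mu_3}{2\mu_2}}{\mu_2 + \dfrac{\mu_3}{2\mu_2}\,x} \qquad \text{(ix.)}$$

This is the Type III. curve of my memoir on skew variation.*

(iii.) Keeping $b_0$, $b_1$, $b_2$ only

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x + \dfrac{\mu_3(\mu_4 + 3\mu_2^2)}{10\mu_2\mu_4 - 18\mu_2^3 - 12\mu_3^2}}{\dfrac{\mu_2(4\mu_2\mu_4 - 3\mu_3^2)}{10\mu_2\mu_4 - 18\mu_2^3 - 12\mu_3^2} + \dfrac{\mu_3(\mu_4 + 3\mu_2^2)}{10\mu_2\mu_4 - 18\mu_2^3 - 12\mu_3^2}\,x + \dfrac{2\mu_2\mu_4 - 3\mu_3^2 - 6\mu_2^3}{10\mu_2\mu_4 - 18\mu_2^3 - 12\mu_3^2}\,x^2} \qquad \text{(x.)}$$

* 'Phil. Trans.,' A, vol. 186, p. 373.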
 
 
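This equation gave Types I.-VI. of my two memoirs on skew variation,* and provides at once the expressions

$$d = \text{distance from mode to mean} = \frac{\sigma\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)} \qquad \text{(xi.)},$$

$$\text{skewness} = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)} \qquad \text{(xii.)},$$

where $\sigma = \sqrt{\mu_2}$, $\beta_1 = \mu_3^2/\mu_2^3$, $\beta_2 = \mu_4/\mu_2^2$, given in my memoir on the theory of errors of observation without proof.†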
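As a small numerical check of the expressions for skewness and for the distance from mode to mean, the sketch below evaluates them from the second, third and fourth moments about the mean. The function name is hypothetical, and giving the result the sign of $\mu_3$ is an assumption of this example (the formulae themselves involve only $\sqrt{\beta_1}$). The trial moments are those of a Type III. curve whose mean exceeds its mode by one unit.

```python
import math


def skewness_and_mode_distance(mu2, mu3, mu4):
    """Return (skewness, d) from moments about the mean.

    skewness = sqrt(beta1)*(beta2 + 3) / (2*(5*beta2 - 6*beta1 - 9)),
    d = sigma * skewness.  The sign of mu3 is attached by assumption.
    """
    sigma = math.sqrt(mu2)
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    skew = math.sqrt(beta1) * (beta2 + 3) / (2 * (5 * beta2 - 6 * beta1 - 9))
    skew = math.copysign(skew, mu3)
    return skew, sigma * skew


# Trial case: mu2 = 4, mu3 = 8, mu4 = 72 (mean 4, mode 3)
skew, d = skewness_and_mode_distance(4.0, 8.0, 72.0)
print(skew, d)
```

For a symmetrical series ($\mu_3 = 0$) both quantities vanish, as they should.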
 
There is no theoretical limit, however, to this process; we can from (vi.) and (vii.) express the $a$ and $b$'s at once in terms of determinants, and expanding obtain forms which, like the formulae of Thiele, will fit closer and closer to the observed distribution of frequency, the more moments we take. But there are three fundamental practical objections to this. These are the following:
 
 (a.) Experience shows that the form (x.) suffices for certainly the great bulk of 
 frequency distributions, i.e., it describes them effectively within the limits of random 
 sampling. 
 
If the distribution be even approximately normal, the series in the denominator converges very rapidly, for the coefficients of every power of $x$ vanish for moments obeying the relationships:

$$\mu_{2s+1} = 0, \qquad \mu_{2s} = (2s - 1)\,\mu_2\,\mu_{2s-2},$$

which hold for a normal series.
 
(b.) The labour of arithmetic and of analysis becomes very great, if we desire to keep higher moments. If we go to $b_4$ we should have to calculate the first eight moments of the observations about their centroid, a by no means easy task. Further, the classification of the resulting curves and the criteria for the right one to use in a special case, although not absolutely prohibitive, if we only go as far as $b_3$, are for practical purposes idle in the case of taking into account $b_4$.

(c.) The probable errors of the higher moments are so large that the values found for $\mu_7$, $\mu_8$, &c., are quite untrustworthy, and even that for $\mu_6$ is doubtful,‡ unless we have frequency series far larger than usually occur in actual observations. This is a strong argument against the utility of any descriptions of frequency, such as those suggested by Thiele or Lipps, which depend upon moments higher than the fifth or sixth.
 
* 'Phil. Trans.,' A, vol. 186, pp. 343-414, and 'Phil. Trans.,' A, vol. 197, pp. 443-459.
† 'Phil. Trans.,' A, vol. 198, p. 277.

‡ In 'Phil. Trans.,' A, vol. 185, pp. 71-110, I have given a method of breaking up a frequency distribution into two normal series. I obtained long ago the criterion for determining whether such a resolution is possible or not. But it involves moments higher than the fifth, and the probable error of the criterion is thus so great that for practical purposes it is worthless.
 
 
The question of the probable deviations of the higher moments can be illustrated as follows, by finding the standard deviation of the moment when we take a number of random samples from a general population. Let $\Sigma_{\mu_s}$ be the standard deviation of $\mu_s$, then $100\,\Sigma_{\mu_s}/\mu_s$ is the percentage variability of $\mu_s$ due to random sampling. The table below shows the increase of these percentages in the case of the moments of normal distributions, which, quite as well as any other, will illustrate the rapid increase in probable error as we use higher and higher moments. The general values of the standard deviations of some of the moments were first given by Czuber,* then far more completely by Sheppard,† and a resume of all the results recently in 'Biometrika.'‡
 
Percentage Variability in Moments due to Random Sampling when the Series is supposed to be Normal.

Moment.      500 in series.    1000 in series.
$\mu_2$           6.3               4.5
$\mu_4$          14.6              10.3
$\mu_6$          30.1              21.3
$\mu_8$          60.6              42.9
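The order of magnitude of these percentages can be checked with a short computation. The sketch below is an illustration only and rests on an assumption not stated in the text above: that for an even moment of a normal series the large-sample result $\Sigma_{\mu_k}^2 = (\mu_{2k} - \mu_k^2)/N$ applies, the odd moments being zero. The function names are hypothetical. The figures it produces come out close to those tabulated, with small differences in the last place for the 8th moment.

```python
import math


def normal_moment(k):
    """k-th moment about the mean of a normal series with unit standard
    deviation: 1*3*5*...*(k-1) for even k, zero for odd k."""
    if k % 2:
        return 0
    result = 1
    for j in range(1, k, 2):
        result *= j
    return result


def percentage_variability(k, n):
    """100 * (standard deviation of the k-th moment) / (k-th moment) for
    samples of n, assuming Sigma^2 = (mu_2k - mu_k^2)/n (even k only)."""
    variance = (normal_moment(2 * k) - normal_moment(k) ** 2) / n
    return 100 * math.sqrt(variance) / normal_moment(k)


for k in (2, 4, 6, 8):
    print(k, round(percentage_variability(k, 500), 1),
          round(percentage_variability(k, 1000), 1))
```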
 
Precisely the same rapid increase takes place when we find the variabilities of the ratios $\mu_4/\mu_2^2$, $\mu_6/\mu_2^3$, $\mu_8/\mu_2^4$, &c., which are the forms in which the moments actually occur in our coefficients. In this case we have to remember that errors in the moments are correlated, but the correlations are given in the papers cited above.§ I find in this case the following series, which is almost as suggestive as the previous table.

Percentage Variabilities in Ratio of Moments due to Random Sampling, the Series being Normal.

Ratio.             500 in series.    1000 in series.
$\mu_4/\mu_2^2$         7.3               5.2
$\mu_6/\mu_2^3$        23.3              16.5
$\mu_8/\mu_2^4$        55.1              39.0
 
 The order of this increase of percentage variability, and therefore of probable error, 
 is the same for skew as for normal variation, and it seems therefore, with the length 
 
* 'Theorie der Beobachtungsfehler,' S. 130, et seq.

† 'Phil. Trans.,' A, vol. 192, pp. 122, et seq.

‡ Vol. II., pp. 273-281.

§ Ibid., p. 277.
 
 
of the series in customary use, idle to use the 7th or 8th moments; these have variabilities varying from 30 to 60 per cent. of their values, and accordingly we might easily on a random sample reach a 7th or 8th moment having half, or double the value it actually has in the general population. Constants based on these high moments will be practically idle. They may enable us to describe closely an individual random sample, but no safe argument can be drawn from this individual sample as to the general population at large, at any rate so far as the argument is based on the constants depending upon these high moments.
 
It seems to me accordingly obvious that, bearing in mind the object of a theory of frequency (i.e., the description of the distribution in the general population by aid of a graduated sample, agreeing with the general population within the probable errors of random sampling), we can dismiss from practical use all theories which call upon us to use moments as high as the seventh or eighth. Any use of the general form (ii.) beyond $b_3$, indirectly or directly, involves such higher moments. Personally I am inclined to doubt whether the continental series using higher moments are, from the standpoint of graduation, nearly as good as my form (ii.).

Hence we seem driven to the skew curves embraced in (x.) as a practical frequency series. If we have a frequency not described by (x.) we may, perhaps, use $\mu_5$ and $\mu_6$,* but it is difficult to see how its description can possibly be bettered by the use of still higher moments. This may seem a counsel of despair; but it is very far from being so in reality when we remember that (x.) has proved its efficiency now, I might almost say without exception, in a wide range of economic, physical, biometric, and actuarial data.
 
 In this memoir on skew correlation I shall accordingly confine my attention, for the 
 most part, to constants the discovery of which does not involve the use of moments 
 or products of higher than six dimensions, judging all above this limit to be, as a rule, 
 disqualified for practical service by the magnitude of their probable errors. 
 
 (2.) Generalised Idea of Correlation. 
 
Given any two variables or characters A and B, we say that they are correlated when, with different values $x$ of A, we do not find the same value $y$ of B equally likely to be associated. In other words, certain values of B are relatively more likely to occur with the value $x$ than others. The distribution of B's associated with a given value $x$ of A is termed an $x$-array of B's. If $N$ pairs of A and B are taken, and $n_x$ of these have the character A $= x$, these $n_x$ form the $x$-array of B's. This array, like any other frequency distribution, will have its mean, which we will denote by $\bar{y}_x$, and its
* Referring to equation (ii.), I propose to call curves which stop at $b_q$ skew curves of the $q$th order. Thus the normal curve is a skew curve of zero order; curve of Type III. is a skew curve of the 1st order; Types I., II., V., and VI. are of the 2nd order. I hope shortly to publish a discussion of skew curves of the 3rd order to complete the practically legitimate range of such curves.
 
 
standard deviation, which we will denote by $\sigma_{n_x}$. The mean of all the B characters shall be $\bar{y}$ and their variability given by the standard deviation $\sigma_y$. Similarly $\bar{x}$, $\sigma_x$ will denote the mean and standard deviation of the A's, and $n_y$, $\bar{x}_y$, and $\sigma_{n_y}$ the number of individuals, the mean and the standard deviation for a $y$-array of A's.

Now clearly a knowledge of $\bar{y}_x$ and $\sigma_{n_x}$ will not fix the B's which will be found associated with a given A, but it will define the limits of probable or even possible B's. The curve obtained by plotting $\bar{y}_x$ to $x$ is termed the regression curve of $y$ on $x$. A curve in which the ratio of $\sigma_{n_x}$ to the standard deviation $\sigma_y$ is plotted to $x$ may be termed a scedastic* curve. Since the standard deviation is always a positive quantity, this curve always lies on one side of the axis; it is a horizontal line in the case of normal correlation, i.e., the Gauss-Laplacian distribution of deviations, and coincides with the axis in any case where correlation passes into causation, i.e., when one value of B only is associated with each A.
 
The mean ordinate of this curve would clearly be a sort of general measure of the degree of correlation between A and B, but it seems for many reasons better to base our measure on the mean square of the weighted standard deviations of the arrays, or

$$\sigma_a^2 = S(n_x \sigma_{n_x}^2)/N \qquad \text{(xiii.)}$$

$\sigma_a$ will thus measure the average variability in B to be found associated with any A; its vanishing will mean that the scedastic curve as defined above will coincide with the axis. Now let a new quantity $\eta$, defined by

$$\sigma_a^2 = (1 - \eta^2)\,\sigma_y^2 \qquad \text{(xiv.)},$$

be introduced. Then clearly $\eta$ must lie between $\pm 1$, because $\sigma_a^2$ cannot be negative, being the sum of a number of positive squares. I term $\eta$ the correlation ratio, to distinguish it from the correlation coefficient represented by $r$. When $\eta = \pm 1$ the correlation is perfect or we have causation. Further we have by a well-known property of moments, if

$$\sigma_M^2 = S\{n_x(\bar{y}_x - \bar{y})^2\}/N \qquad \text{(xv.)},$$

$$\sigma_y^2 = \sigma_a^2 + \sigma_M^2,$$

or

$$\eta = \sigma_M/\sigma_y \qquad \text{(xvi.)}.$$

This shows us that the correlation ratio is the ratio of the variability of the means of the $x$-arrays to the variability of B's in general. If $\eta = 0$, it follows that $\sigma_M$ is zero, or from (xv.) that every $\bar{y}_x = \bar{y}$, i.e., there is no association of B's with special A's at all, or correlation is zero. Thus the correlation ratio $\eta$, as defined by either (xiv.) or (xvi.), is an excellent measure of the stringency of correlation, always lying numerically between the values 0 and 1, which mark absolute independence and
 * I.e., a curve which measures the " scatter " in the arrays. 
 
 
complete causation respectively. Further, remembering the definition of $r$, the coefficient of correlation, i.e.,

$$N r \sigma_x \sigma_y = S\{n_x (x - \bar{x})(\bar{y}_x - \bar{y})\} \qquad \text{(xvii.)},$$

we have, from (xv.) and (xvii.),

$$N(\eta^2 - r^2)\,\sigma_y^2 = S\left[n_x(\bar{y}_x - \bar{y})\left\{\bar{y}_x - \bar{y} - \frac{r\sigma_y}{\sigma_x}(x - \bar{x})\right\}\right].$$

Now let

$$Y = \bar{y} + \frac{r\sigma_y}{\sigma_x}(x - \bar{x}) \qquad \text{(xviii.)},$$

then (xviii.), as is well known, gives the best fitting straight line to the series of points $\bar{y}_x$ loaded with their respective $n_x$. We can now write

$$N(\eta^2 - r^2)\,\sigma_y^2 = S\{n_x(\bar{y}_x - Y)^2\} + S\{n_x(Y - \bar{y})(\bar{y}_x - Y)\}.$$

But, using (xviii.),

$$S\{n_x(Y - \bar{y})(\bar{y}_x - Y)\} = \frac{r\sigma_y}{\sigma_x}\,S\left[n_x(x - \bar{x})\left\{\bar{y}_x - \bar{y} - \frac{r\sigma_y}{\sigma_x}(x - \bar{x})\right\}\right] = 0.$$

Thus the last summation vanishes, and we have

$$N(\eta^2 - r^2)\,\sigma_y^2 = S\{n_x(\bar{y}_x - Y)^2\} \qquad \text{(xix.)}.$$

The right-hand side must always be positive, unless $\bar{y}_x = Y$, when it is zero. Hence we conclude that $\eta$ is always greater than $r$, or the correlation ratio greater than the correlation coefficient, except in the special case when the means of the $x$-arrays of $y$'s all fall on a straight line, i.e., we have linear regression, and then the two correlation constants are equal.

Thus the expression $(\eta^2 - r^2)\,\sigma_y^2$ has an important physical meaning; it is the mean square deviation of the regression curve from the straight line which fits this curve most closely.* We have now freed our treatment of correlation from any condition as to linearity of the regression, and it remains to consider the probable errors of the various quantities dealt with.
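The relation between the correlation ratio and the correlation coefficient can be verified on a small correlation table. The sketch below is illustrative only: the function name, the layout of the table (one row per $x$-array) and the trial frequencies are all assumptions of the example. It computes $\eta$ from the variability of the array means, $r$ from the product moment, and the mean square deviation of the array means from the best fitting straight line, which by (xix.) should equal $(\eta^2 - r^2)\,\sigma_y^2$.

```python
import math


def correlation_summary(table, xs, ys):
    """Return (eta, r, sigma_y squared, mean square deviation from the line).

    table[i][j] is the frequency of pairs with A = xs[i] and B = ys[j];
    every x-array (row) is assumed to be non-empty.
    """
    n_x = [sum(row) for row in table]
    total = sum(n_x)
    ybar_x = [sum(f * y for f, y in zip(row, ys)) / n for row, n in zip(table, n_x)]
    xbar = sum(n * x for n, x in zip(n_x, xs)) / total
    ybar = sum(n * m for n, m in zip(n_x, ybar_x)) / total
    var_x = sum(n * (x - xbar) ** 2 for n, x in zip(n_x, xs)) / total
    var_y = sum(f * (y - ybar) ** 2 for row in table for f, y in zip(row, ys)) / total
    var_m = sum(n * (m - ybar) ** 2 for n, m in zip(n_x, ybar_x)) / total
    cov = sum(n * (x - xbar) * (m - ybar) for n, x, m in zip(n_x, xs, ybar_x)) / total
    eta = math.sqrt(var_m / var_y)      # ratio of sigma_M to sigma_y
    r = cov / math.sqrt(var_x * var_y)  # correlation coefficient
    slope = cov / var_x                 # r * sigma_y / sigma_x
    gap = sum(n * (m - ybar - slope * (x - xbar)) ** 2
              for n, x, m in zip(n_x, xs, ybar_x)) / total
    return eta, r, var_y, gap


# Trial table with a strongly curved regression: r vanishes while eta is unity
eta, r, var_y, gap = correlation_summary([[0, 2], [2, 0], [0, 2]], [-1, 0, 1], [0, 1])
print(eta, r, (eta ** 2 - r ** 2) * var_y, gap)
```

In the trial table each value of A is associated with one value of B only, so the correlation ratio is unity although the correlation coefficient is zero.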
 
 (3.) Probable Errors of Constants of Correlation. 
 
 We shall first prove a number of general propositions relating to the probable 
 errors of correlation constants. We first note that if n and n' be the frequencies in 
 
 * The properties of the correlation ratio were briefly noted in a footnote to a paper by the author in 
 ' Roy. Soc. Proc.,' vol. 71, pp. 303-4. It has been systematically used in my laboratory for some years 
and determined alongside $r$ for many distributions.
 
 
any two sub-groups of a total $N$, for which no member of $n$ is a member of $n'$, then the standard deviation of $n$ due to random sampling is given by

$$\Sigma_n^2 = n\left(1 - \frac{n}{N}\right) \qquad \text{(xx.)},$$

and the correlation between deviations in $n$ and $n'$ due to random sampling is given by

$$\Sigma_n \Sigma_{n'} R_{nn'} = -\frac{n n'}{N} \qquad \text{(xxi.)}.$$
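The results (xx.) and (xxi.) can be checked exactly for a small case. The sketch below is an illustration under stated assumptions: the total $N$ is supposed divided into three mutually exclusive groups, the frequencies $n$ and $n'$ are identified with their expected values in the general population, and the function name is hypothetical. It enumerates every possible sample of $N$ individuals and forms the mean square deviation of one frequency and the mean product of the deviations of two.

```python
from fractions import Fraction
from math import factorial


def exact_count_moments(total, p1, p2):
    """Exact mean frequencies, variance of the first frequency, and
    covariance of the first two, for samples of `total` individuals falling
    into three exclusive groups with chances p1, p2 and 1 - p1 - p2."""
    p3 = 1 - p1 - p2
    e1 = e2 = e11 = e12 = Fraction(0)
    for n1 in range(total + 1):
        for n2 in range(total + 1 - n1):
            n3 = total - n1 - n2
            ways = factorial(total) // (factorial(n1) * factorial(n2) * factorial(n3))
            prob = ways * p1 ** n1 * p2 ** n2 * p3 ** n3
            e1 += prob * n1
            e2 += prob * n2
            e11 += prob * n1 * n1
            e12 += prob * n1 * n2
    return e1, e2, e11 - e1 * e1, e12 - e1 * e2


N = 6
n, n_prime, variance, covariance = exact_count_moments(N, Fraction(1, 2), Fraction(1, 3))
print(variance, n * (1 - n / N))        # the two sides of (xx.)
print(covariance, -n * n_prime / N)     # the two sides of (xxi.)
```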
 
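Problem I. To find the correlation in deviations due to random sampling between the number $n_{x_p}$ in the $x_p$-array of $y$'s and the number $n_{y_s}$ in the $y_s$-array of $x$'s.

If the symbol $\delta n$ denote the error or deviation in $n$, we have with an obvious subscript notation*

$$\delta n_{x_p} = \delta n_{x_p y_1} + \delta n_{x_p y_2} + \delta n_{x_p y_3} + \ldots + \delta n_{x_p y_q},$$

if there be $q$ groups of $y$'s, and again

$$\delta n_{y_s} = \delta n_{x_1 y_s} + \delta n_{x_2 y_s} + \delta n_{x_3 y_s} + \ldots + \delta n_{x_i y_s},$$

if there be $i$ groups of $x$'s.

Multiply the expressions for $\delta n_{x_p}$ and $\delta n_{y_s}$ together and we have

$$\delta n_{x_p}\,\delta n_{y_s} = (\delta n_{x_p y_s})^2 + S(\delta n_{x_p y_u}\,\delta n_{x_v y_s}),$$

where the summation is for every pair of values of $u$ and $v$, differing from $s$ and $p$.

Summing all such pairs of values for every random sample and dividing by the number of samples taken, we have the usual definition of correlation

$$\Sigma_{n_{x_p}} \Sigma_{n_{y_s}} R_{n_{x_p} n_{y_s}} = n_{x_p y_s}\left(1 - \frac{n_{x_p y_s}}{N}\right) - S\left(\frac{n_{x_p y_u}\,n_{x_v y_s}}{N}\right),$$

or,

$$\Sigma_{n_{x_p}} \Sigma_{n_{y_s}} R_{n_{x_p} n_{y_s}} = n_{x_p y_s} - \frac{n_{x_p} n_{y_s}}{N} \qquad \text{(xxii.)}.$$

This gives $R_{n_{x_p} n_{y_s}}$, the required correlation, since $\Sigma_{n_{x_p}}$ and $\Sigma_{n_{y_s}}$ are known from (xx.).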
 
Problem II. To find the correlation between deviations in the total $n_{x_p}$ of any array and in any sub-group $n_{x_p y_s}$ of this array.

We have at once

$$\delta n_{x_p}\,\delta n_{x_p y_s} = (\delta n_{x_p y_s})^2 + S(\delta n_{x_p y_s}\,\delta n_{x_p y_u}),$$

where $u$ is to be taken every value other than $s$ in the summation term. Summing for all random samples and dividing by their number, we have, after using results like (xx.) and (xxi.),

$$\Sigma_{n_{x_p}} \Sigma_{n_{x_p y_s}} R_{n_{x_p} n_{x_p y_s}} = n_{x_p y_s}\left(1 - \frac{n_{x_p}}{N}\right) \qquad \text{(xxiii.)},$$

which gives $R_{n_{x_p} n_{x_p y_s}}$.

* $n_{xy}$ = frequency of groups with characters $x$ and $y$.
 
 
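Proposition III. There is no correlation between deviations in the mean of an $x$-array $\bar{y}_{x_p}$ and the total number in that array.

$$n_{x_p}\,\bar{y}_{x_p} = S(n_{x_p y_u}\,y_u),$$

$$n_{x_p}\,\delta\bar{y}_{x_p}\,\delta n_{x_p} = -\bar{y}_{x_p}(\delta n_{x_p})^2 + S(\delta n_{x_p}\,\delta n_{x_p y_u}\,y_u).$$

Hence as before, using (xxiii.), &c.,

$$n_{x_p}\,\Sigma_{\bar{y}_{x_p}} \Sigma_{n_{x_p}} R_{\bar{y}_{x_p} n_{x_p}} = -\bar{y}_{x_p}\,n_{x_p}\left(1 - \frac{n_{x_p}}{N}\right) + S\left\{n_{x_p y_u}\left(1 - \frac{n_{x_p}}{N}\right) y_u\right\} = 0,$$

which proves that $R_{\bar{y}_{x_p} n_{x_p}}$ is zero.

Proposition IV. There is no correlation between deviations in the mean of an $x$-array and in the total number in any other array.

Proof as before.

Proposition V. There is no correlation between deviations in the mean of one $x$-array and in the mean of a second $x$-array.

We have

$$n_{x_p}\,\delta\bar{y}_{x_p} = S(\delta n_{x_p y_u}\,y_u) - \bar{y}_{x_p}\,\delta n_{x_p},$$

$$n_{x_{p'}}\,\delta\bar{y}_{x_{p'}} = S(\delta n_{x_{p'} y_{u'}}\,y_{u'}) - \bar{y}_{x_{p'}}\,\delta n_{x_{p'}}.$$

Multiply these two expressions together, sum for all random samples, and divide by the number of such samples. We find

$$\begin{aligned}
n_{x_p} n_{x_{p'}}\,\Sigma_{\bar{y}_{x_p}} \Sigma_{\bar{y}_{x_{p'}}} R_{\bar{y}_{x_p} \bar{y}_{x_{p'}}} = {} & -\bar{y}_{x_p}\,\bar{y}_{x_{p'}}\,n_{x_p} n_{x_{p'}}/N \\
& + \bar{y}_{x_p}\,S(n_{x_p}\,n_{x_{p'} y_{u'}}\,y_{u'})/N \\
& + \bar{y}_{x_{p'}}\,S(n_{x_{p'}}\,n_{x_p y_u}\,y_u)/N \\
& - S'(n_{x_p y_u}\,n_{x_{p'} y_{u'}}\,y_u\,y_{u'})/N.
\end{aligned}$$

The last term is $-n_{x_p} n_{x_{p'}}\,\bar{y}_{x_p}\,\bar{y}_{x_{p'}}/N$, and thus the right-hand side is identically zero. It thus appears that there is no correlation between errors made in finding the means of two arrays. This result is not at once obvious, although a very little consideration shows it must be true.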
 
 
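Proposition VI. To prove that the standard deviation of the mean $\bar{y}_{x_p}$ of any $x$-array due to random sampling equals $\sigma_{n_{x_p}}/\sqrt{n_{x_p}}$.

We have

$$n_{x_p}\,\delta\bar{y}_{x_p} = S(\delta n_{x_p y_u}\,y_u) - \bar{y}_{x_p}\,\delta n_{x_p}.$$

Square, sum for all random samples, and divide by the number of such samples. We have

$$\begin{aligned}
n_{x_p}^2\,\Sigma_{\bar{y}_{x_p}}^2 = {} & \bar{y}_{x_p}^2\,n_{x_p}\left(1 - \frac{n_{x_p}}{N}\right) - 2\bar{y}_{x_p}\,S\left\{n_{x_p y_u}\left(1 - \frac{n_{x_p}}{N}\right) y_u\right\} \\
& + S\left\{n_{x_p y_u}\left(1 - \frac{n_{x_p y_u}}{N}\right) y_u^2\right\} - 2S'\left\{\frac{n_{x_p y_u}\,n_{x_p y_{u'}}}{N}\,y_u\,y_{u'}\right\} \\
= {} & S(n_{x_p y_u}\,y_u^2) - n_{x_p}\,\bar{y}_{x_p}^2 \\
= {} & n_{x_p}\,\sigma_{n_{x_p}}^2.
\end{aligned}$$

Hence

$$\Sigma_{\bar{y}_{x_p}} = \sigma_{n_{x_p}}/\sqrt{n_{x_p}} \qquad \text{(xxiv.)}.$$

Thus the probable error of the mean of an array has exactly the same form as the probable error of the mean of a random sample of a definite number of individuals. The array may have a variable number of individuals, but we have seen in Proposition III. that there is no correlation between errors in its mean and errors in the total number of individuals contained in it.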
 
Problem VII. To find the probable error of the standard deviation of any array.

By a precisely similar investigation to that of the previous proposition we find

$$\Sigma_{\sigma_{n_{x_p}}}^2 = \frac{m_4 - m_2^2}{4\,m_2\,n_{x_p}} \qquad \text{(xxv.)},$$

where $m_2 = \sigma_{n_{x_p}}^2$ and $m_4$ are the 2nd and 4th moments of the array about its mean.

This is identical with the probable error we should have if the array were a random sample of constant size.

In many cases it will be sufficiently approximate to put $m_4 = 3m_2^2$, and we then have

$$.67449\,\Sigma_{\sigma_{n_{x_p}}} = .67449\,\frac{\sigma_{n_{x_p}}}{\sqrt{2n_{x_p}}} \qquad \text{(xxvi.)},$$

the well-known form for the probable error of the standard deviation of a normal distribution of a definite number of individuals.
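The probable errors of the mean and standard deviation of an array are readily computed. The sketch below is a minimal illustration with hypothetical function names. The first two functions follow (xxiv.) and (xxvi.); the third assumes the usual large-sample expression $(m_4 - m_2^2)/(4 m_2 n)$ for the square of the standard deviation of a standard deviation, which reduces to the normal form when $m_4 = 3m_2^2$. The trial figures are invented for the example.

```python
import math

PROBABLE_ERROR_FACTOR = 0.67449  # ratio of probable error to standard deviation


def probable_error_of_array_mean(sigma, n):
    """Probable error of the mean of an array of n individuals, as in (xxiv.)."""
    return PROBABLE_ERROR_FACTOR * sigma / math.sqrt(n)


def probable_error_of_array_sd(sigma, n):
    """Probable error of the standard deviation, normal form, as in (xxvi.)."""
    return PROBABLE_ERROR_FACTOR * sigma / math.sqrt(2 * n)


def probable_error_of_array_sd_general(m2, m4, n):
    """Same quantity from the 2nd and 4th moments of the array about its
    mean, assuming the large-sample form (m4 - m2^2) / (4 m2 n)."""
    return PROBABLE_ERROR_FACTOR * math.sqrt((m4 - m2 ** 2) / (4 * m2 * n))


# Trial array: 100 individuals with standard deviation 2
print(probable_error_of_array_mean(2.0, 100))
print(probable_error_of_array_sd(2.0, 100))
print(probable_error_of_array_sd_general(4.0, 48.0, 100))  # m4 = 3 m2^2
```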
 
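Problem VIII. To find the standard deviation of the standard deviation $\sigma_M$ of the means of the arrays due to random sampling.

Since

$$N\sigma_M^2 = S\{n_{x_p}(\bar{y}_{x_p} - \bar{y})^2\},$$

$$2N\sigma_M\,\delta\sigma_M = S\{\delta n_{x_p}(\bar{y}_{x_p} - \bar{y})^2\} + 2S\{n_{x_p}(\bar{y}_{x_p} - \bar{y})\,\delta\bar{y}_{x_p}\} - 2\,\delta\bar{y}\,S\{n_{x_p}(\bar{y}_{x_p} - \bar{y})\},$$

the last term of which vanishes, since

$$N\bar{y} = S(n_{x_p}\,\bar{y}_{x_p}).$$

Square the above relation, sum for all random samples, and divide by the number of such samples.

We find

$$\begin{aligned}
4N^2\sigma_M^2\,\Sigma_{\sigma_M}^2 = {} & S\left\{n_{x_p}\left(1 - \frac{n_{x_p}}{N}\right)(\bar{y}_{x_p} - \bar{y})^4\right\} \\
& - 2S'\left\{\frac{n_{x_p} n_{x_{p'}}}{N}(\bar{y}_{x_p} - \bar{y})^2(\bar{y}_{x_{p'}} - \bar{y})^2\right\} \\
& + 4S\{\Sigma_{n_{x_p}} \Sigma_{\bar{y}_{x_p}} R_{n_{x_p} \bar{y}_{x_p}}\,n_{x_p}(\bar{y}_{x_p} - \bar{y})^3\} \\
& + 4S'\{\Sigma_{n_{x_p}} \Sigma_{\bar{y}_{x_{p'}}} R_{n_{x_p} \bar{y}_{x_{p'}}}\,n_{x_{p'}}(\bar{y}_{x_p} - \bar{y})^2(\bar{y}_{x_{p'}} - \bar{y})\} \\
& + 4S'\{\Sigma_{\bar{y}_{x_p}} \Sigma_{\bar{y}_{x_{p'}}} R_{\bar{y}_{x_p} \bar{y}_{x_{p'}}}\,n_{x_p} n_{x_{p'}}(\bar{y}_{x_p} - \bar{y})(\bar{y}_{x_{p'}} - \bar{y})\} \\
& + 4S\{\Sigma_{\bar{y}_{x_p}}^2\,n_{x_p}^2(\bar{y}_{x_p} - \bar{y})^2\}.
\end{aligned}$$

But $R_{n_{x_p} \bar{y}_{x_p}}$, $R_{n_{x_p} \bar{y}_{x_{p'}}}$, and $R_{\bar{y}_{x_p} \bar{y}_{x_{p'}}}$ vanish by Propositions III., IV., and V. Further, by VI., $\Sigma_{\bar{y}_{x_p}} = \sigma_{n_{x_p}}/\sqrt{n_{x_p}}$. Hence we have

$$\begin{aligned}
4N^2\sigma_M^2\,\Sigma_{\sigma_M}^2 = {} & S\left\{n_{x_p}\left(1 - \frac{n_{x_p}}{N}\right)(\bar{y}_{x_p} - \bar{y})^4\right\} \\
& - 2S'\left\{\frac{n_{x_p} n_{x_{p'}}}{N}(\bar{y}_{x_p} - \bar{y})^2(\bar{y}_{x_{p'}} - \bar{y})^2\right\} \\
& + 4S\{n_{x_p}\,\sigma_{n_{x_p}}^2(\bar{y}_{x_p} - \bar{y})^2\}.
\end{aligned}$$

Now let

$$N\lambda_n = S\{n_{x_p}(\bar{y}_{x_p} - \bar{y})^n\}$$

be the $n$th moment of the means of the arrays about their mean. Then clearly $\lambda_2 = \sigma_M^2$. Further, since $S(n_{x_p}\,\sigma_{n_{x_p}}^2) = N\sigma_y^2(1 - \eta^2)$, we can write

$$S\{n_{x_p}\,\sigma_{n_{x_p}}^2(\bar{y}_{x_p} - \bar{y})^2\} = N\sigma_y^2(1 - \eta^2)\,\sigma_M^2\,\chi_1,$$

where $\chi_1$ is a purely numerical constant, which is equal to unity for those cases in which there is no correlation between the standard deviation of an array and the square of its mean's deviation from the mean. Thus finally we find

$$\Sigma_{\sigma_M}^2 = \frac{\lambda_4 - \lambda_2^2}{4N\lambda_2} + \frac{\sigma_y^2(1 - \eta^2)\,\chi_1}{N} \qquad \text{(xxvii.)}.$$

This enables us at once to find the probable error of the standard deviation of the means of the arrays.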
 
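Proposition IX. To find the correlation between the deviations due to random sampling in the values of $\sigma_y$ and $\sigma_M$.

We have

$$N\sigma_y^2 = S\{n_{y_s}(y_s - \bar{y})^2\},$$

$$2N\sigma_y\,\delta\sigma_y = S\{\delta n_{y_s}(y_s - \bar{y})^2\} - 2\,\delta\bar{y}\,S\{n_{y_s}(y_s - \bar{y})\};$$

the last term vanishes because $S(n_{y_s}\,y_s) = N\bar{y}$.
Thus

$$2N\sigma_y\,\delta\sigma_y = S\{\delta n_{y_s}(y_s - \bar{y})^2\}.$$

But from the previous proposition

$$2N\sigma_M\,\delta\sigma_M = S\{\delta n_{x_p}(\bar{y}_{x_p} - \bar{y})^2\} + 2S\{\delta\bar{y}_{x_p}\,n_{x_p}(\bar{y}_{x_p} - \bar{y})\}.$$

Multiply these two expressions together, sum for all random samples and divide by the number of such samples; we find

$$\begin{aligned}
4N^2\sigma_y\sigma_M\,\Sigma_{\sigma_y} \Sigma_{\sigma_M} R_{\sigma_y \sigma_M} = {} & S\{\Sigma_{n_{x_p}} \Sigma_{n_{y_s}} R_{n_{x_p} n_{y_s}}(\bar{y}_{x_p} - \bar{y})^2(y_s - \bar{y})^2\} \\
& + 2S\{\Sigma_{n_{y_s}} \Sigma_{\bar{y}_{x_p}} R_{n_{y_s} \bar{y}_{x_p}}\,n_{x_p}(y_s - \bar{y})^2(\bar{y}_{x_p} - \bar{y})\}.
\end{aligned}$$

To evaluate this, we require to find the two correlations expressed by $R_{n_{x_p} n_{y_s}}$ and $R_{n_{y_s} \bar{y}_{x_p}}$. We will consider the two summation terms separately.

First Term.

$$\delta n_{x_p} = \delta n_{x_p y_1} + \delta n_{x_p y_2} + \ldots + \delta n_{x_p y_{s'}} + \ldots$$

$$\delta n_{y_s} = \delta n_{x_1 y_s} + \delta n_{x_2 y_s} + \ldots + \delta n_{x_{p'} y_s} + \ldots$$

$$\delta n_{x_p}\,\delta n_{y_s} = (\delta n_{x_p y_s})^2 + S(\delta n_{x_p y_{s'}}\,\delta n_{x_{p'} y_s}),$$

where in the summation $p'$ and $s'$ are not equal to $p$ and $s$.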
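Proceeding in the usual manner we find

$$\Sigma_{n_{x_p}} \Sigma_{n_{y_s}} R_{n_{x_p} n_{y_s}} = n_{x_p y_s}\left(1 - \frac{n_{x_p y_s}}{N}\right) - \frac{1}{N}\left\{S(n_{x_p y_{s'}})\,S(n_{x_{p'} y_s}) - n_{x_p y_s}^2\right\},$$

where in the first sum $s'$ is to take all possible values, and in the second $p'$ is to take all possible values. Thus we have

$$\Sigma_{n_{x_p}} \Sigma_{n_{y_s}} R_{n_{x_p} n_{y_s}} = n_{x_p y_s} - \frac{n_{x_p} n_{y_s}}{N} \qquad \text{(xxviii.)}.$$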
 
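Substituting we find

$$\text{First Term} = S_1\{n_{x_p y_s}(\bar{y}_{x_p} - \bar{y})^2(y_s - \bar{y})^2\} - S_2\left\{\frac{n_{x_p} n_{y_s}}{N}(\bar{y}_{x_p} - \bar{y})^2(y_s - \bar{y})^2\right\}.$$

Here both the summations are really double summations; fixing our attention on any $x_p$, i.e., on any array of $y$'s for a given value of $x$, we have first to sum for all $y$'s in this array, and then we have to sum for all arrays. This is the meaning of $S_1$. In $S_2$ we are to associate every array of $x$'s with every array of $y$'s; hence this term will break up at once into two factors, i.e.,

$$S_2 = \frac{1}{N}\,S\{n_{x_p}(\bar{y}_{x_p} - \bar{y})^2\} \times S\{n_{y_s}(y_s - \bar{y})^2\} = N\sigma_y^2 \times \sigma_M^2.$$

Keeping $x_p$ constant first in $S_1$, we see that

$$S\{n_{x_p y_s}(y_s - \bar{y})^2\}$$

is the 2nd moment of the $y$'s in the $x_p$ array about the mean of the system

$$= n_{x_p}\{\sigma_{n_{x_p}}^2 + (\bar{y}_{x_p} - \bar{y})^2\}.$$

Combining we have

$$\begin{aligned}
\text{First Term} &= S\{n_{x_p}(\bar{y}_{x_p} - \bar{y})^4\} + S\{n_{x_p}\,\sigma_{n_{x_p}}^2(\bar{y}_{x_p} - \bar{y})^2\} - N\sigma_y^2\sigma_M^2 \\
&= N\{\lambda_4 + \sigma_y^2\sigma_M^2(1 - \eta^2)\chi_1 - \sigma_y^2\sigma_M^2\} \qquad \text{(xxix.)}.
\end{aligned}$$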
 
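We now turn to the second term which involves the discovery of $R_{n_{y_s} \bar{y}_{x_p}}$.

$$n_{x_p}\,\delta\bar{y}_{x_p} = S(\delta n_{x_p y_u}\,y_u) - \bar{y}_{x_p}\,\delta n_{x_p}.$$

Hence

$$n_{x_p}\,\delta\bar{y}_{x_p}\,\delta n_{y_s} = S(\delta n_{x_p y_u}\,\delta n_{y_s}\,y_u) - \bar{y}_{x_p}\,\delta n_{x_p}\,\delta n_{y_s}.$$

Sum for all random samples and divide by the number of such samples; we have

$$n_{x_p}\,\Sigma_{\bar{y}_{x_p}} \Sigma_{n_{y_s}} R_{n_{y_s} \bar{y}_{x_p}} = n_{x_p y_s}(y_s - \bar{y}_{x_p}) \qquad \text{(xxx.)}.$$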
 
[Pages 18 and 19 are missing from the scanned original.]
 
 
 In any other case, $\chi_2$, $\chi_1-1$, $(\mu_4-3\mu_2^2)/\mu_2^2$, $(\lambda_4-3\lambda_2^2)/\lambda_2^2$ will probably be small and
 thus

 Probable error of

 $$\eta=.67449\,(1-\eta^2)/\sqrt{N},\ \text{nearly} \tag{xxxiv.}$$
 
 This simple form suffices for many practical cases. 
 
 If greater exactitude is wanted, there is, however, no great labour in using 
 (xxxiii.). We find the means and standard deviations of each array. 
 
 Then $N\lambda_2$ and $N\lambda_4$ are the 2nd and 4th moments of the means of these arrays
 about their mean.

 $N\mu_2$ and $N\mu_4$ are the 2nd and 4th moments about the mean of the y-characters, and
 will always be known for skew variation.
 
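 $\chi_1$ is defined by

 $$\chi_1=\frac{S\{n_{x_p}\sigma_{a_{x_p}}^2(\bar{y}_{x_p}-\bar{y})^2\}}{N\sigma_y^2(1-\eta^2)\sigma_M^2} \tag{xxxv.}$$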
 
 and can be easily found when the means and standard deviations of each array have 
 been found. 
 
 The most troublesome expression is $\chi_2$ defined by
 
 But as we do not take usually more than 1 to 20 arrays, the discovery of their
 3rd moments is not an extremely difficult task. As a rule, however, $\chi_2$ is very small
 and may be fairly neglected, even when we must find $\chi_1-1$. All these points will
 be dealt with in the numerical illustrations given later in this paper. At present
 we note that the probable error of $\eta$ has been determined, and that its value for the
 general case is not really more complex than the value of the probable error of r in
 the general case, which requires the determination of product moments of the 4th
 order.*
 
 * Let $Np_{qq'}=S\{n_{xy}(x-\bar{x})^q(y-\bar{y})^{q'}\}$, then the probable error of r is given by
 
 y , f l[ P22 - 3^11^ P22 - 3^20^02 j?40 - 3^20^ Poi - 3j?02^ ^31 - ^PwPlO P\i - ^PuPdi \ , 
 
 "■"NI pn^ + 22)20^^02 ^ ip-ii? ^ W i'iii'20 i'11^02 ' r • (^^^^"•)- 
 
 This agrees with the value given by Sheppard ('Phil. Trans.,' A, vol. 192, p. 128), except that the $r^2$
 factor has been dropped by a printer's error in his paper. For the special case of a normal distribution, we
 have easily from the equation to the normal surface

 $$p_{40}=3p_{20}^2,\quad p_{04}=3p_{02}^2,\quad p_{31}=3p_{11}p_{20},\quad p_{13}=3p_{11}p_{02},\quad (p_{22}-3p_{11}^2)/p_{11}^2=(1-r^2)/r^2,$$

 and

 $$\text{probable error of } r=.67449\,(1-r^2)/\sqrt{N},$$

 the well-known form ('Phil. Trans.,' A, vol. 191, p. 245).
 
 
 (4.) On the Higher Types of Regression. 
 
 We have already seen how the introduction of the correlation ratio $\eta$ enables us to
 drop the limitations associated with the Gauss-Laplacian form of frequency, and the 
 Bravais correlation formulae. The fundamental step towards this advance was 
 undoubtedly taken by G. U. Yule in his paper in the 'Roy. Soc. Proc.,' vol. 60, 
 pp. 477 et seq., wherein he shows that if the regression be linear, the Bravais type of 
 formula applied to multiple correlation is still true, although we make no assumption 
 as to the form of the frequency surface. It would undoubtedly be a gain to have 
 skew frequency surfaces which would describe skew correlation for the great mass of 
 cases as effectively as the series of skew frequency curves describe skew variation, but
 although a considerable amount of progress has been made in the consideration of 
 these surfaces, their full theory has not yet been worked out owing to difficulties 
 of analysis, and their complete discussion must still be postponed. Yule's method 
 of approaching the problem from the form of the regression curves is, however, 
 available and capable of very great extension. Its chief advantage is that it 
 makes little or no assumption as to the distribution of frequency ; its chief defect 
 lies even in this advantage of generality : it does not enable us to predict the 
 probability of an individual with a given combination of characters. This follows at 
 once from the fact that we make no assumption as to the form of the distribution 
 within an array. Without some theory as to variation within the array, we are 
 reduced to the laborious process of calculating the standard deviation, skewness, and 
 other general characters of each array, a lengthy and troublesome process compared 
 with a theory which would, like the Bravais theory, give these at once in terms of a 
 few constants determined from the data as a whole. 
 
 In the great bulk of biometrical and economical enquiries, however, the regression 
 does not diverge very markedly from the linear form. In the cases of non- linear 
 regression that I have hitherto had to deal with, I find that parabolas of the 2nd
 or 3rd order will suffice as a rule to describe the deviation from linearity. If
 they did not, we could, of course, use curves of higher orders, but the difficulty 
 referred to in the first section of this paper at once arises : we then need to use 
 in the determination moments and product-moments of such high orders that the 
 probable errors of the constants are so high as to render valueless their calculation 
 from such statistical data as we can hope for in most actual inquiries. In the great
 bulk of investigations it is practically impossible to increase our random samples 
 from 500 to 1,000 individuals up to 50,000 to 100,000. Nor in the great 
 bulk of statistical cases is any such increase even desirable, for a fairly wide 
 experience shows that 2nd and 3rd order parabolae amply suffice to describe the
 skewness of the regression line. I shall accordingly classify skew correlation in the 
 following manner : — 
 
 
 (a.) Linear Regression:

 The mean of an x-array of y's, i.e., $\bar{y}_{x_p}$, is given by

 $$\bar{y}_{x_p}=a_0+a_1x_p \tag{xxxviii.}$$

 (b.) Parabolic* Regression:

 The mean of an x-array of y's, i.e., $\bar{y}_{x_p}$, is given by

 $$\bar{y}_{x_p}=a_0+a_1x_p+a_2x_p^2 \tag{xxxix.}$$

 (c.) Cubical* Regression:

 The mean of an x-array of y's, i.e., $\bar{y}_{x_p}$, is given by

 $$\bar{y}_{x_p}=a_0+a_1x_p+a_2x_p^2+a_3x_p^3 \tag{xl.}$$
 
 It is conceivable— in fact, from unpublished work already done, highly probable — 
 that the theory of skew variation will give regression curves, not of the exact form 
 involved in (xxxix.) or (xl.), but containing product terms in x and y. The most 
 general equation to a regression curve may be taken to be of the type 
 
 and what experience shows us is : that for the great bulk of vital phenomena it is 
 sufficient to expand by Maclaurin's theorem and keep the first three or four terms. 
 Indeed, in the large majority of cases, (xxxviii.) alone suffices. Hence, if (xxxix.) 
 or (xl.) fit the data within the limits of random sampling, we are not injudiciously 
 circumscribing future developments of the theory of skew correlation by casting our 
 regression curves into the above forms. I shall deal first with the theory of cubical 
 regression, for we can then obtain from this the conditions necessary for parabolic 
 and linear regressions. 
 
 I must remind the reader, however, that the form of the regression line does not in
 any way limit the nature of the distribution of the array about its mean; the
 variability of an array, i.e., the standard deviation of an array, having for its mean
 value $\sigma_y\sqrt{1-\eta^2}$, may or may not be the same for all arrays. If it is the same, or all
 arrays are equally scattered about their means, I shall speak of the system as a
 homoscedastic system, otherwise it is a heteroscedastic system. The Gauss-Laplacian
 correlation surface gives a homoscedastic linear system. Mr. Yule's linear regression
 is not necessarily homoscedastic; it may, however, be homoscedastic without being
 normal, and then the scatter of each array is measured by $\sigma_y\sqrt{1-r^2}$. When a
 system is homoscedastic, but not linear, then $\sigma_{a_x}^2=\sigma_y^2(1-\eta^2)$, and consequently the
 $\chi_1$ of (xxxv.) is equal to unity. $\chi_1=1$ is a necessary result of homoscedasticity.
 
 Lastly, we want a word to express the idea of all the arrays having equal skewness, 
 
 * 'Parabolic' and 'cubical' are here used in the narrower sense of regression curves corresponding to
 ordinary parabolae of the 2nd order and of the 3rd order respectively: in both cases the axis of the
 parabola being parallel to the axis of the y-character.
 
 
 or being asymmetrical in an equal degree about their means. I shall express this by
 the term homoclitic; generally the arrays will not be equally asymmetrical round their
 means, and in this case we shall speak of them as heteroclitic. If there were no
 skewness in any of the arrays, then $m_3$ of (xxxvi.) would be zero for all of them.
 I term arrays of no skewness isocurtic, and skew arrays allocurtic. If we supposed
 that a curve of Type III. would sufficiently express the skewness of an array, we
 should have

 $$\text{Sk.}=\tfrac{1}{2}\,m_3/\sigma_{a_x}^3$$
 and therefore from (xxxvi.) 
 
 _ 2S{n.,cr,„/(Sk.)(y.-y)} 
 
 For a homoscedastic system we have $\sigma_{a_x}=\sigma_y\sqrt{1-\eta^2}$, and therefore
 
 2SK(Sk)(^V^} 
 
 and for a homoclitic system 
 
 _ 2(Sk.)S{n.,or,,/(y.-^)} 
 
 For a homoclitic homoscedastic system, whether isocurtic or allocurtic, 
 
 2(Sk.)S{n.^(y.-^)} _, 
 
 Thus $\chi_2$ is to a certain extent a measure of both homoscedasticity and homoclisy.
 But as the correlation between $\sigma_{a_x}$ and $\bar{y}_{x}-\bar{y}$ is in most cases extremely small, while
 the skewness of the array can well change its sign with arrays above or below the
 mean, we can fairly consider the smallness of $\chi_2$ to be a measure of the approach to
 homoclisy. I am thus inclined to speak of $\chi_1-1$ and $\chi_2$ as measures of heteroscedasticity
 and heteroclisy. When they both vanish we have a homoscedastic homoclitic system.
 For such systems $\eta$, the correlation ratio, tells us effectively the scatter of any array,
 and as a rule all we want to know, in addition, is the form of the regression line.
 
 (5.) Cubical Regression. 
 
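 We have already used the following notation

 $$Np_{qq'}=S\{n_{xy}(x-\bar{x})^q(y-\bar{y})^{q'}\} \tag{xlii.}$$

 We shall shorten our formulae if we write

 $$r=p_{11}/(\sigma_x\sigma_y),\quad \epsilon=p_{21}/(\sigma_x^2\sigma_y),\quad \zeta=p_{31}/(\sigma_x^3\sigma_y),\quad \theta=p_{41}/(\sigma_x^4\sigma_y) \tag{xliii.}$$

 We have already used $\mu_q$ to denote $p_{0q}$, and we shall use $\nu_q$ for $p_{q0}$. Further, we
 write

 $$\beta_1=\nu_3^2/\nu_2^3,\quad \beta_2=\nu_4/\nu_2^2,\quad \beta_3=\nu_3\nu_5/\nu_2^4,\quad \beta_4=\nu_6/\nu_2^3 \tag{xliv.}$$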
 
 
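 $\sqrt{\beta_1}=\nu_3/\sigma_x^3$ will be of the same sign as $\nu_3$. These constants $\beta$ have been previously
 used in the theory of skew variation.*
 We shall further put

 $$\bar\epsilon=\epsilon-r\sqrt{\beta_1},\quad \bar\zeta=\zeta-r\beta_2,\quad \bar\theta=\theta-r\beta_3/\sqrt{\beta_1} \tag{xlv.}$$

 The regularity of the forms $\bar\epsilon$, $\bar\zeta$, $\bar\theta$ is rather screened by the above notation, which
 is introduced for brevity; using the $p_{qq'}$ notation, we have

 $$\bar\epsilon=\frac{p_{21}p_{20}-p_{11}p_{30}}{\sigma_x^4\sigma_y},\quad \bar\zeta=\frac{p_{31}p_{20}-p_{11}p_{40}}{\sigma_x^5\sigma_y},\quad \bar\theta=\frac{p_{41}p_{20}-p_{11}p_{50}}{\sigma_x^6\sigma_y} \tag{xlvi.}$$

 whence the law of formation of these constants is easily seen.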
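 The regression curve may now be conveniently put into the form

 $$\frac{\bar{y}_{x_p}-\bar{y}}{\sigma_y}=b_0+b_1\frac{x_p-\bar{x}}{\sigma_x}+b_2\Big(\frac{x_p-\bar{x}}{\sigma_x}\Big)^2+b_3\Big(\frac{x_p-\bar{x}}{\sigma_x}\Big)^3 \tag{xlvii.}$$

 Or, multiplying by $n_{x_p}$ and summing for all arrays,

 $$0=b_0+b_2+b_3\sqrt{\beta_1},$$

 the sign of $\sqrt{\beta_1}$ being always that of the 3rd moment. Hence, measuring from
 the means of the two characters, i.e., $X_p=x_p-\bar{x}$, $Y_{x_p}=\bar{y}_{x_p}-\bar{y}$, we may re-write (xlvii.)

 $$Y_{x_p}/\sigma_y=b_1(X_p/\sigma_x)+b_2\{(X_p/\sigma_x)^2-1\}+b_3\{(X_p/\sigma_x)^3-\sqrt{\beta_1}\} \tag{xlviii.}$$

 Now multiply by $n_{x_p}X_p/\sigma_x$ and sum for all arrays, remembering that

 $$Nr\sigma_x\sigma_y=S(n_{xy}XY)=S(n_{x_p}X_pY_{x_p}),$$

 we find

 $$r=b_1+b_2\sqrt{\beta_1}+b_3\beta_2.$$

 This enables us to get rid of $b_1$ and write (xlviii.)

 $$Y_{x_p}/\sigma_y=r(X_p/\sigma_x)+b_2\{(X_p/\sigma_x)^2-\sqrt{\beta_1}(X_p/\sigma_x)-1\}+b_3\{(X_p/\sigma_x)^3-\beta_2(X_p/\sigma_x)-\sqrt{\beta_1}\} \tag{xlix.}$$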
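 Now multiply by $n_{x_p}X_p^2/\sigma_x^2$ and sum for all arrays. We have

 $$\epsilon=r\sqrt{\beta_1}+b_2(\beta_2-\beta_1-1)+b_3(\beta_3/\sqrt{\beta_1}-\beta_2\sqrt{\beta_1}-\sqrt{\beta_1}),$$

 or

 $$\bar\epsilon=b_2\phi_2+b_3\phi_3 \tag{l.}$$

 where

 $$\phi_2=\beta_2-\beta_1-1,\qquad \phi_3=(\beta_3-\beta_1\beta_2-\beta_1)/\sqrt{\beta_1} \tag{li.}$$

 * 'Phil. Trans.,' A, vol. 186, p. 368, and A, vol. 198, p. 278.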
 
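 Eliminating $b_2$, we can write (xlix.)

 $$Y_{x_p}/\sigma_y=r(X_p/\sigma_x)+\frac{\bar\epsilon}{\phi_2}\{(X_p/\sigma_x)^2-\sqrt{\beta_1}(X_p/\sigma_x)-1\}+b_3\Big[(X_p/\sigma_x)^3-\beta_2(X_p/\sigma_x)-\sqrt{\beta_1}-\frac{\phi_3}{\phi_2}\{(X_p/\sigma_x)^2-\sqrt{\beta_1}(X_p/\sigma_x)-1\}\Big] \tag{lii.}$$

 Now multiply by $n_{x_p}(X_p/\sigma_x)^3$ and sum for all arrays; we find

 $$\zeta=r\beta_2+\frac{\bar\epsilon}{\phi_2}\phi_3+b_3\Big(\phi_4-\frac{\phi_3^2}{\phi_2}\Big),$$

 or

 $$b_3=(\bar\zeta\phi_2-\bar\epsilon\phi_3)/(\phi_2\phi_4-\phi_3^2) \tag{liii.}$$

 where

 $$\phi_4=\beta_4-\beta_2^2-\beta_1 \tag{liv.}$$

 It follows from (l.) that

 $$b_2=(\bar\epsilon\phi_4-\bar\zeta\phi_3)/(\phi_2\phi_4-\phi_3^2) \tag{lv.}$$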
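 The solution for the two higher coefficients can be checked numerically. The following is a minimal sketch in Python, not part of the memoir: the function name and the numerical constants are hypothetical, chosen only to exercise formulae (liii.) and (lv.). It assumes the pair of linear relations $\bar\epsilon=b_2\phi_2+b_3\phi_3$ and $\bar\zeta=b_2\phi_3+b_3\phi_4$, of which the second is a rearrangement of (liii.).

```python
def cubic_coefficients(eps_bar, zeta_bar, phi2, phi3, phi4):
    """Return (b2, b3) from formulae (lv.) and (liii.).

    The arguments are the reduced product constants and the phi
    constants of the x-distribution; all names are illustrative only.
    """
    det = phi2 * phi4 - phi3 ** 2  # common denominator of (liii.) and (lv.)
    if det == 0:
        raise ValueError("phi2*phi4 - phi3**2 vanishes; b2 and b3 are undetermined")
    b2 = (eps_bar * phi4 - zeta_bar * phi3) / det
    b3 = (zeta_bar * phi2 - eps_bar * phi3) / det
    return b2, b3


# Hypothetical constants, not taken from any data in the memoir.
PHI2, PHI3, PHI4 = 0.8, 0.3, 0.6
TRUE_B2, TRUE_B3 = -0.15, 0.05

# Build the two reduced constants from the assumed linear relations
#   eps_bar  = b2*phi2 + b3*phi3   (equation (l.))
#   zeta_bar = b2*phi3 + b3*phi4   (rearrangement of (liii.))
EPS_BAR = TRUE_B2 * PHI2 + TRUE_B3 * PHI3
ZETA_BAR = TRUE_B2 * PHI3 + TRUE_B3 * PHI4

# Solving by (liii.) and (lv.) should recover the coefficients we started from.
b2, b3 = cubic_coefficients(EPS_BAR, ZETA_BAR, PHI2, PHI3, PHI4)
print(b2, b3)
```

 The guard on the denominator reflects the remark made later in the text that $\phi_2\phi_4-\phi_3^2$ is in practice large relative to the numerators; the sketch simply refuses to divide when it vanishes.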
 
 We can thus write the cubic regression curve in either of the forms* 
 
 * The method is perfectly easy of extension, if we choose to use higher products and moments, to a
 regression curve of any order, e.g.,

 $$Y_{x_p}/\sigma_y=b_0+b_1(X_p/\sigma_x)+b_2(X_p/\sigma_x)^2+\ldots+b_n(X_p/\sigma_x)^n+\ldots$$

 For let $\epsilon_{q1}=S\{n_{x_p}Y_{x_p}X_p^q\}/(N\sigma_x^q\sigma_y)$, and $\gamma_q=\nu_q/\sigma_x^q=S(n_{x_p}X_p^q)/(N\sigma_x^q)$;
 we have:

 $$\begin{aligned}
 0&=b_0+\gamma_1b_1+b_2+\gamma_3b_3+\ldots+\gamma_nb_n+\ldots\\
 \epsilon_{11}&=\gamma_1b_0+b_1+\gamma_3b_2+\gamma_4b_3+\ldots+\gamma_{n+1}b_n+\ldots\\
 \epsilon_{21}&=b_0+\gamma_3b_1+\gamma_4b_2+\gamma_5b_3+\ldots+\gamma_{n+2}b_n+\ldots\\
 &\;\;\vdots\\
 \epsilon_{p1}&=\gamma_pb_0+\gamma_{p+1}b_1+\gamma_{p+2}b_2+\gamma_{p+3}b_3+\ldots+\gamma_{n+p}b_n+\ldots
 \end{aligned}$$

 Hence writing $\epsilon_{01}$ for 0, $\gamma_0=1$, $\gamma_1=0$, $\gamma_2=1$, we have

 $$b_n=(\epsilon_{01}\Delta_{0n}+\epsilon_{11}\Delta_{1n}+\epsilon_{21}\Delta_{2n}+\ldots+\epsilon_{p1}\Delta_{pn}+\ldots)/\Delta,$$

 where

 $$\Delta=\begin{vmatrix}
 \gamma_0&\gamma_1&\gamma_2&\gamma_3&\ldots&\gamma_n\\
 \gamma_1&\gamma_2&\gamma_3&\gamma_4&\ldots&\gamma_{n+1}\\
 \gamma_2&\gamma_3&\gamma_4&\gamma_5&\ldots&\gamma_{n+2}\\
 \vdots&&&&&\vdots\\
 \gamma_p&\gamma_{p+1}&\gamma_{p+2}&\gamma_{p+3}&\ldots&\gamma_{p+n}
 \end{vmatrix}$$

 and $\Delta_{qn}$ is the minor of the constituent in the (q+1)th row and (n+1)th column. As we have already
 noted, however, solutions involving anything beyond $\gamma_5$ are hardly likely to be of practical value.

 The value above for $b_n$ is the type equation given by the method of least squares, when we strike the
 best fitting curve to all the entries in the correlation table. I have already pointed out that the method
 of moments becomes identical with that of least squares, when we fit parabolae of any order ('Biometrika,'
 vol. I., p. 271). The retention of the method of moments, however, enables us, without abrupt change of
 method, to introduce the needful $\eta$, and to grasp at once the application of the proper Sheppard's
 corrections. The extension of the method of least squares to continua in space has not yet, as far as I am aware,
 been fully considered.
 
 
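 $$Y_{x_p}/\sigma_y=r(X_p/\sigma_x)+\frac{\bar\epsilon}{\phi_2}\{(X_p/\sigma_x)^2-\sqrt{\beta_1}(X_p/\sigma_x)-1\}+\frac{\bar\zeta\phi_2-\bar\epsilon\phi_3}{\phi_2\phi_4-\phi_3^2}\Big[(X_p/\sigma_x)^3-\beta_2(X_p/\sigma_x)-\sqrt{\beta_1}-\frac{\phi_3}{\phi_2}\{(X_p/\sigma_x)^2-\sqrt{\beta_1}(X_p/\sigma_x)-1\}\Big] \tag{lvi.}$$

 or

 $$Y_{x_p}/\sigma_y=r(X_p/\sigma_x)+\frac{\bar\epsilon\phi_4-\bar\zeta\phi_3}{\phi_2\phi_4-\phi_3^2}\{(X_p/\sigma_x)^2-\sqrt{\beta_1}(X_p/\sigma_x)-1\}+\frac{\bar\zeta\phi_2-\bar\epsilon\phi_3}{\phi_2\phi_4-\phi_3^2}\{(X_p/\sigma_x)^3-\beta_2(X_p/\sigma_x)-\sqrt{\beta_1}\} \tag{lvi.\ bis}$$

 The former arrangement of the solution, while it is apparently more cumbersome,
 is, perhaps, the better, for it gives us at once the measure of the deviation from
 parabolic or 2nd order regression, i.e., the approach of $\bar\zeta\phi_2-\bar\epsilon\phi_3$ to zero. In the case
 of normal correlation both $\bar\epsilon$ and $\bar\zeta$ vanish, and neglecting higher terms the condition
 for linear regression is that $\bar\epsilon=0$, and $\bar\zeta\phi_2-\bar\epsilon\phi_3=0$, or, again, $\bar\epsilon$ and $\bar\zeta=0$. For
 material in which the x-variability is isocurtic, $\beta_1=\beta_3=\phi_3=0$, and the regression
 curve takes the simple form

 $$Y_{x_p}/\sigma_y=r(X_p/\sigma_x)+\frac{\bar\epsilon}{\phi_2}\{(X_p/\sigma_x)^2-1\}+\frac{\bar\zeta}{\phi_4}\{(X_p/\sigma_x)^3-\beta_2(X_p/\sigma_x)\} \tag{lvi.\ ter}$$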
 
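 We now turn to express these relations in terms of the correlation ratio $\eta$.
 Multiply (lvi.) by $n_{x_p}Y_{x_p}/\sigma_y$, and sum for all arrays, we obtain

 $$\eta^2=r^2+\frac{\bar\epsilon}{\phi_2}(\epsilon-r\sqrt{\beta_1})+\frac{\bar\zeta\phi_2-\bar\epsilon\phi_3}{\phi_2\phi_4-\phi_3^2}\Big[\zeta-r\beta_2-\frac{\phi_3}{\phi_2}(\epsilon-r\sqrt{\beta_1})\Big],$$

 whence results

 $$\phi_2(\eta^2-r^2)-\bar\epsilon^2=\frac{(\bar\zeta\phi_2-\bar\epsilon\phi_3)^2}{\phi_2\phi_4-\phi_3^2} \tag{lvii.}$$

 (lvii.) is a necessary condition of cubical regression.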
 
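 It is of course not a sufficient condition, as we ought to show that $b_4$, $b_5$, &c., all
 vanish, and thus any number of conditions may be found. For example, multiply by
 $n_{x_p}X_p^4/\sigma_x^4$ and sum for all arrays, then

 $$\bar\theta=\frac{\bar\epsilon\phi_4-\bar\zeta\phi_3}{\phi_2\phi_4-\phi_3^2}(\beta_4-\beta_3-\beta_2)+\frac{\bar\zeta\phi_2-\bar\epsilon\phi_3}{\phi_2\phi_4-\phi_3^2}\,\frac{\beta_5-\beta_2\beta_3-\beta_1\beta_2}{\sqrt{\beta_1}} \tag{lviii.}$$

 is also a necessary condition. Here $\beta_5=\nu_3\nu_7/\sigma_x^{10}$. But the high as well as complicated
 value of the probable errors of such expressions renders it idle to consider them in
 practice.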
 
 
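 Substituting (lvii.) in (lvi.) we have:

 $$Y_{x_p}/\sigma_y=r(X_p/\sigma_x)+\frac{\bar\epsilon}{\phi_2}\{(X_p/\sigma_x)^2-\sqrt{\beta_1}(X_p/\sigma_x)-1\}\pm\sqrt{\frac{\phi_2(\eta^2-r^2)-\bar\epsilon^2}{\phi_2\phi_4-\phi_3^2}}\Big[(X_p/\sigma_x)^3-\beta_2(X_p/\sigma_x)-\sqrt{\beta_1}-\frac{\phi_3}{\phi_2}\{(X_p/\sigma_x)^2-\sqrt{\beta_1}(X_p/\sigma_x)-1\}\Big] \tag{lix.}$$

 Which sign is to be given to the root will often be visible on inspection of the
 observations. Otherwise the sign of the root must be the same as that of
 $\bar\zeta\phi_2-\bar\epsilon\phi_3$.
 (lix.) will save the calculation of $\zeta$ if the root-sign can be found by inspection.
 Finally there is a third form into which we may put the cubic. Eliminate $\phi_2\phi_4-\phi_3^2$
 from (lix.) by aid of (lvii.) and it becomes

 $$Y_{x_p}/\sigma_y=r(X_p/\sigma_x)+\frac{\bar\epsilon\bar\zeta-\phi_3(\eta^2-r^2)}{\bar\zeta\phi_2-\bar\epsilon\phi_3}\{(X_p/\sigma_x)^2-\sqrt{\beta_1}(X_p/\sigma_x)-1\}+\frac{\phi_2(\eta^2-r^2)-\bar\epsilon^2}{\bar\zeta\phi_2-\bar\epsilon\phi_3}\{(X_p/\sigma_x)^3-\beta_2(X_p/\sigma_x)-\sqrt{\beta_1}\} \tag{lx.}$$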
 
 At first sight this might appear to be the best form of the cubic, because it does
 not involve the 6th moment of the variable x. But this is very far from being the
 case in actual practice. The reason is simply this: $\bar\epsilon$, $\bar\zeta$ and $\eta^2-r^2$ are in most cases
 very small (they vanish in normal correlation) relatively to $\phi_2$ and $\phi_4$. Hence both
 numerators and denominators of the coefficients of the square and cubic terms are
 the ratio of small quantities, and accordingly subject to large probable errors. For
 this reason (lx.) was found in actual practice to be of no service. Of the other two
 forms (lvii.) and (lix.), which neither suffer from this defect, $\phi_2\phi_4-\phi_3^2$ being always
 large relative to the numerators, (lix.) while involving a 6th moment does not
 involve a 4th product, $\zeta$, and experience shows that the former is on the whole
 easier to determine and more exact than the latter. Hence (lix.) seems the preferable
 form, even if it be needful in certain cases to determine $\zeta$ in order to fix the
 sign of the radical. The cubic regression curve thus demands a knowledge of the
 correlation ratio $\eta$, of the "cubic product" $\epsilon$ and the sign by inspection or calculation
 of $\bar\zeta\phi_2-\bar\epsilon\phi_3$. Besides this, we require the first six moments of the independent
 variable x. Of course if the regression of x on y be required, as well as that of
 y on x, the second correlation ratio and cubic product as well as the first six moments
 of y must be found. It is rare, however, that both regression curves are needed for
 a single enquiry.
 
 As to the general form of (lix.), we note that there will always be a real point of 
 inflexion given by 
 
 ^/o-^=h Ms-^)/Mi) (Ixi.), 
 
 
 where 
 
 and further that there may be two points of horizontality given by a certain quadratic.
 Thus, in general, the regression line will tend to be part of an S-shaped curve. The
 horizontal points may be imaginary, or, if real, either they or the point of inflexion 
 may be far beyond the portion of the curve which crosses the observed field of 
 frequency. If we consider, however, the slope of the regression curve to measure 
 the regression in the neighbourhood of any point, we note that the regression is a 
 maximum at the point given by (Ixi.), and grows smaller and smaller towards the two 
 points of horizontality, i.e., points of complete local independence of the two 
 characters. These are not unfamiliar features in certain practical cases of skew 
 correlation,* and accordingly the cubic regression curve provides us with a ready 
 means of describing regression phenomena, which cannot be dealt with by the simple 
 line or the parabola. 
 
 It may of course be suggested that a quartic or quintic curve would give a 
 better result than a cubic. The answer to this is : Possibly, but the high moments 
 and products required render it impossible to deal even superficially with the probable 
 errors of the constants involved. The calculation of the probable error of $\eta$ is a
 sufficiently stiff task in the general case. To test the probable error of a condition
 like (lvii.), to say nothing of one like (lviii.), would involve an immense amount of
 work, since we should want the correlation of errors in $\eta$, $\bar\epsilon$, $\bar\zeta$, and $\bar\theta$. Speaking with
 some experience of practical statistical possibilities, I think the tendency to use very
 high moments or product-moments must be curtailed to the minimum of actual needs.
 We cannot deny the existence of skew variation, nor of the sensible curvature of
 regression lines. We must admit their existence as the result of statistical experience.
 This existence involves a great widening of the old frequency notions and the need
 for a new means of description. But we must remember that statistics are essentially
 a practical study, the art of describing by a few numerical constants observational
 experience, and we must curtail at every turn the desire to run riot in mathematical
 formulae, which cannot be generally applied in actual practice.† Still I propose later
 in this paper to deal with the general formulae for quartic regression.
 
 (6.) Parabolic Regression. 
 
 For a parabolic system $b_3$ must vanish, or nearly vanish. Hence we have from
 (liii.) and (lvii.)

 $$\bar\zeta\phi_2-\bar\epsilon\phi_3=0 \tag{lxii.}$$

 $$\phi_2(\eta^2-r^2)-\bar\epsilon^2=0 \tag{lxiii.}$$
 
 * Compare for example the regression line of age of mean age of bridegroom for actual age of bride, 
 which gives a typical S-shaped curve. See ' Biometrika,' vol. II., p. 20. 
 t These remarks have special reference to the points dealt with on p. 6. 
 
 
 From these conditions we find 
 
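 These give for the form of the parabolic regression curve

 $$Y_{x_p}/\sigma_y=r(X_p/\sigma_x)+\frac{\bar\epsilon}{\phi_2}\{(X_p/\sigma_x)^2-\sqrt{\beta_1}(X_p/\sigma_x)-1\} \tag{lxiv.}$$

 or

 $$Y_{x_p}/\sigma_y=r(X_p/\sigma_x)\pm\sqrt{\frac{\eta^2-r^2}{\phi_2}}\{(X_p/\sigma_x)^2-\sqrt{\beta_1}(X_p/\sigma_x)-1\} \tag{lxv.}$$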
 
 The latter form, besides the correlation coefficient and correlation ratio, requires only
 a knowledge of the skew variation constants $\beta_1$ and $\beta_2$, and is therefore very easy to
 determine. Except for very nearly linear regression, there can be no doubt as to the
 sign of $\sqrt{\eta^2-r^2}$, as we can tell at once whether the parabola ought to be concave or
 convex to the x-axis. In other cases the sign of $\sqrt{\eta^2-r^2}$ must be taken to coincide
 with that of $\bar\epsilon$, which must therefore be found. It will then be as easy to use (lxiv.)
 as (lxv.), although probably $\eta$ and r can be found with less error than $\bar\epsilon$.
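 As an illustration of how little (lxv.) demands, the following minimal Python sketch turns r, $\eta$, $\sqrt{\beta_1}$ and $\beta_2$ into the coefficients of a parabola in $X_p$, the position measured from its mean. The function and argument names are assumptions made for this sketch, and the numerical constants are those reported for the woodruff data of Illustration A later in this memoir, with the negative root chosen to agree with the sign of the reduced cubic product found there.

```python
import math


def parabola_from_lxv(r, eta, sqrt_beta1, beta2, sigma_x, sigma_y, y_mean, sign):
    """Coefficients (c0, c1, c2) of y = c0 + c1*X + c2*X**2 following (lxv.).

    X is measured from the mean of x; `sign` is +1 or -1 and fixes the
    sign of the root, which the text takes to agree with the cubic product.
    """
    phi2 = beta2 - sqrt_beta1 ** 2 - 1.0
    kappa = sign * math.sqrt((eta ** 2 - r ** 2) / phi2)
    c2 = kappa * sigma_y / sigma_x ** 2
    c1 = (r - kappa * sqrt_beta1) * sigma_y / sigma_x
    c0 = y_mean - kappa * sigma_y
    return c0, c1, c2


# Constants as printed for Illustration A (woodruff).
c0, c1, c2 = parabola_from_lxv(
    r=-0.207579,
    eta=0.249911,
    sqrt_beta1=0.130487,
    beta2=1.828767,
    sigma_x=1.336887,
    sigma_y=0.897842,
    y_mean=6.655375,
    sign=-1,
)

# Position of zero regression (vertex), converted back to whorl position.
x_vertex = 2.802651 - c1 / (2.0 * c2)
print(round(c0, 4), round(c1, 4), round(c2, 4), round(x_vertex, 3))
```

 Under these assumptions the three coefficients agree, to about four decimal places, with the parabola quoted in Illustration A, and the vertex falls close to the second whorl.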
 
 It is thus quite easy to allow for such curvature of the regression line as can be 
 expressed by a parabola of the 2"* order of the type considered. 
 
 We notice at once that the regression curve does not pass through the mean of the 
 two characters. Or, an individual with the mean of one character will most probably 
 not have the mean of a second character. This is a rather important result, which 
 follows at once for nearly all types of skew correlation. 
 
 It will be seen, for example, that Quetelet's "mean man," defended by Professor
 Edgeworth as theoretically justifiable, depends entirely on human characters giving
 linear regression curves. Such linear curves are certainly given by many pairs of 
 characters, e.g., cranial and body measurements, but there are certainly other 
 characters for which regression ceases to be sensibly linear, and the conception of the 
 " mean man " in this case fails. For example, if age be considered as a character, 
 then the regression is certainly not linear, and the individual of mean age will not 
 necessarily have either the mean physical or psychical characters. This seems of 
 some importance for the general conception of " type," if by type we denote the mean, 
 for probably there are other characters than age for which regression is skew. 
 
 The regression, i.e., $dY_{x_p}/dX_p$, will be zero for a point $X_{p_m}$ for which

 $$X_{p_m}/\sigma_x=\tfrac{1}{2}\Big\{\sqrt{\beta_1}\mp r\sqrt{\frac{\phi_2}{\eta^2-r^2}}\Big\} \tag{lxvi.}$$

 the sign of the root being determined as before. Clearly, therefore, unless r be very
 small, or $\eta^2$ diverges very sensibly from $r^2$, this point of zero regression may correspond
 to a very large abscissa, and in some cases will lie entirely outside the range of
 observable frequency.
 
 The parabola of regression cuts the line of regression, i.e., the line of best fit to
 the series of regression points, or to the means of the x-arrays, in two points
 determined by the quadratic equation

 $$(X_p/\sigma_x)^2-\sqrt{\beta_1}(X_p/\sigma_x)-1=0,$$

 or

 $$X_p/\sigma_x=\tfrac{1}{2}\{\sqrt{\beta_1}\pm\sqrt{\beta_1+4}\} \tag{lxvii.}$$

 These points are always real, and correspond, if regression be truly parabolic, to
 the same values of the x-character, whatever be the y-character of which we are
 considering the correlation. In the case of normal variation of the x-character
 only, these are the points of inflexion of the x-distribution.
 
 (7.) Linear Regression. 
 
 In this case it is necessary that both $b_2$ and $b_3$ vanish within the limits of random
 sampling, and, although these are not theoretically sufficient (for a whole series of
 relations between the higher product-moments could be written down*), they are for
 practical purposes sufficient.

 Hence we have the following conditions for linear regression:

 $$\eta^2=r^2 \tag{lxviii.}$$

 or, the coefficient of correlation, without regard to sign, should be equal to the
 correlation ratio. Further $\bar\epsilon$ should be zero, or

 $$p_{21}p_{20}-p_{11}p_{30}=0 \tag{lxix.}$$
 
 The theory of linear regression is so familiar that it need not be further discussed
 here. In the actual practice of statistics, the determination of the means of the
 x-arrays and the drawing of the regression line will often suffice to show the fairly
 trained eye whether the deviations from it are random or not. If they are not
 random, then we must proceed to the determination of $\eta$ and of the higher
 product-moments.
 
 The following are numerical examples of skew correlation, selected to illustrate the 
 theory developed above. 
 
 * For example, it is necessary in most cases that $\bar\zeta$ should vanish. In the instance of that very special
 case of linear regression, the Gauss-Laplacian normal frequency, it is easy to show that the constants $\bar\epsilon$, $\bar\zeta$
 both vanish as well as $\eta^2=r^2$.
 
 Statistical Illustrations.
 
 (8.) Illustration A. — On the Skew Correlation between Number of Branches to the 
 Whorl and Position of the Whorl on the Spray in the case of Asperula odorata. 
 
 In this case the material was collected in a lane near Horsham, Sussex, at 
 Whitsuntide, 1903, by Miss M. Radford. There were 150 independent sprays, the
 woodruff had just flowered, and the whorls were counted from the flower downwards. 
 Being early in the season, the maximum number of whorls was five, and, in some 
 cases, not even as many were available. The material was counted and tabled by 
 the author, and the results are exhibited in the table below : — 
 
 
 Table I. Correlation of Whorl-Branches and Position of Whorl.

 Whorl          Number of branches in whorl     $n_{x_p}$   $\bar{y}_{x_p}$   $\sigma_{a_{x_p}}$   $m_2$    $m_3$
 ($x_p$)          4     5     6     7     8

 First            -     3    66    42    39      150      6.7800      .8553      .7316    .1535
 Second           -     3    61    47    39      150      6.8133      .8437      .7117    .0985
 Third            -     6    60    40    44      150      6.8133      .9047      .8185    .0383
 Fourth           1    12    68    39    22      142      6.4859      .8780      .7709    .1347
 Fifth            1    13    53    10    10       87      6.1724      .8605      .7404    .4049

 Totals           2    37   308   178   154      679      6.6554        -          -        -
 
 We require the regression curve giving the probable number of branches for a 
 given whorl. 
 
 Dealing first with the skew variation in position, a purely arbitrary system 
 depending solely on the number of whorls dealt with in each position, we find, not 
 using Sheppard's correction,* 
 
 Mean = 2.802,651,   $\sigma_x$ = 1.336,887,

 $\nu_2$ = 1.787,268,   $\nu_3$ = .311,783,   $\nu_4$ = 5.841,682,   $\nu_5$ = 2.799,638,   $\nu_6$ = 22.678,308.

 Hence we determine

 $\beta_1$ = .017,027,   $\beta_2$ = 1.828,767,   $\beta_3$ = .085,545,   $\beta_4$ = 3.972,295,

 $\phi_2$ = .811,740,   $\phi_3$ = .286,465,   $\phi_4$ = .610,879,   and $\sqrt{\beta_1}$ = +.130,487.
 
 * The numbers are tabulated to six places, because we cannot be sure that the final calculations are for 
 the data true to two places, which is all we finally retain unless this is done. Any number of figures can 
 really be retained with perfect ease when the work is done on a calculator. 
 
 
 We now turn to the skew variation in the number of branches to the whorl, and 
 get the following constants : — 
 
 Mean = 6.655,375,   $\sigma_y$ = .897,842,

 $\mu_2$ = .806,124,   $\mu_3$ = .132,090,   $\mu_4$ = 1.138,410.
 
 The values of $\bar{y}_{x_p}$, $m_2$, and $m_3$ are given in table above. Using them we find

 $\sigma_M$ = .224,377,   $\eta$ = .249,911,   $\sigma_a=\sigma_y\sqrt{1-\eta^2}$ = .869,355,

 $\lambda_2=\sigma_M^2$ = .050,345,   $\lambda_4$ = .007,474,   $\chi_1$ = .990,862,   $\chi_2$ = -.059,851.
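 These array constants can be recomputed directly from the counts of Table I. The sketch below is a modern check in Python, not part of the original computation. It assumes that $\chi_1$ is the weighted sum of array variances times squared deviations of array means, divided by $N\sigma_y^2(1-\eta^2)\sigma_M^2$, and it places the two single 4-branch whorls in the fourth and fifth arrays, which is what the printed array totals of 142 and 87 require. All variable names are illustrative.

```python
import math

# Counts from Table I: one row per whorl position (first to fifth),
# one column per number of branches (4 to 8). The two 4-branch whorls
# are assigned to rows four and five, as inferred from the row totals.
BRANCHES = [4, 5, 6, 7, 8]
TABLE = [
    [0, 3, 66, 42, 39],
    [0, 3, 61, 47, 39],
    [0, 6, 60, 40, 44],
    [1, 12, 68, 39, 22],
    [1, 13, 53, 10, 10],
]


def array_stats(row):
    """Return (n, mean, variance) for one array of branch counts."""
    n = sum(row)
    mean = sum(c * y for c, y in zip(row, BRANCHES)) / n
    var = sum(c * (y - mean) ** 2 for c, y in zip(row, BRANCHES)) / n
    return n, mean, var


stats = [array_stats(row) for row in TABLE]
N = sum(n for n, _, _ in stats)
y_bar = sum(n * m for n, m, _ in stats) / N

# Second moment of all branch counts about the general mean (mu_2).
mu2 = sum(
    c * (y - y_bar) ** 2 for row in TABLE for c, y in zip(row, BRANCHES)
) / N

# Second and fourth moments of the array means about the general mean.
lam2 = sum(n * (m - y_bar) ** 2 for n, m, _ in stats) / N
lam4 = sum(n * (m - y_bar) ** 4 for n, m, _ in stats) / N

sigma_m = math.sqrt(lam2)
eta = sigma_m / math.sqrt(mu2)  # correlation ratio

# chi_1 under the assumed definition; unity for a homoscedastic system.
chi1 = sum(n * v * (m - y_bar) ** 2 for n, m, v in stats) / (
    N * mu2 * (1 - eta ** 2) * lam2
)

print(N, round(eta, 4), round(lam2, 5), round(lam4, 5), round(chi1, 3))
```

 Run on these counts, the sketch gives values within rounding of the printed $\eta$, $\lambda_2$, $\lambda_4$ and $\chi_1$; $\chi_2$ is not attempted because its defining equation is not legible in this copy.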
 
 These give by (xxxiii.), showing the numerical contribution of each term,

 $$\Sigma_\eta^2=\frac{1}{N}\{.878{,}991-.010{,}323-.000{,}888-.007{,}231+.013{,}578\},$$

 or the probable error of $\eta$ = .0242.
 
 Had we calculated the probable error of $\eta$ from (xxxiv.), we should have found for
 its value .0243. It is clear that for this special case the simple formula (xxxiv.) is
 amply sufficient, the small terms almost cancelling.
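 The two probable errors just quoted can be reproduced in a few lines. The following Python sketch is a check on the arithmetic only: it assumes that the five bracketed contributions of (xxxiii.) are to be summed and divided by N to give the squared standard deviation of errors in $\eta$, and that the probable error is .67449 times the square root of that quantity. The function names are illustrative.

```python
import math

PROBABLE_ERROR_FACTOR = 0.67449


def probable_error_simple(eta, n):
    """Approximate probable error of eta by the simple formula (xxxiv.)."""
    return PROBABLE_ERROR_FACTOR * (1.0 - eta ** 2) / math.sqrt(n)


def probable_error_from_terms(terms, n):
    """Probable error from the bracketed contributions of (xxxiii.)."""
    variance = sum(terms) / n  # assumed squared standard deviation of errors
    return PROBABLE_ERROR_FACTOR * math.sqrt(variance)


# Values printed for the woodruff data.
N = 679
ETA = 0.249911
TERMS = [0.878991, -0.010323, -0.000888, -0.007231, 0.013578]

pe_full = probable_error_from_terms(TERMS, N)
pe_simple = probable_error_simple(ETA, N)
print(round(pe_full, 4), round(pe_simple, 4))
```

 The leading contribution .878,991 is $(1-\eta^2)^2$, so the simple formula amounts to keeping that term alone.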
 
 We see that $\chi_1$ is almost unity, and the graph of $\sigma_{a_x}/\sigma_y$ shows indeed that the system
 is sensibly homoscedastic. $\chi_2$ is small, but a glance at the graph of the clitic curve
 on Diagram I. shows that we can hardly treat the system as homoclitic, the changes
 in the skewness forming a fairly uniform curve.*

 For practical purposes, we may treat the variability of the number of branches in
 any array as sufficiently closely given by $\sigma_y\sqrt{1-\eta^2}$.
 
 We now turn to the product-moments† and find

 $p_{11}$ = -.249,160,   $p_{31}$ = -.896,415,
 $p_{21}$ = -.236,289,   $p_{41}$ = -1.210,225.
 
 * Throughout these illustrations the clitic curve is plotted by calculating the skewness of the arrays
 from $\tfrac{1}{2}m_3/(m_2)^{3/2}$. See p. 23.

 † In calculating these products referred to the centroid from those referred to any axes, generally
 corresponding to whole numbers in the table, the following reduction formulae will be found useful.
 We take $N\Pi_{qq'}=S\{n_{xy}x'^q y'^{q'}\}$, $x'$ and $y'$ being measured from any axes; further, $\bar{x}'$, $\bar{y}'$ are the distances of the
 means from these axes, and $\nu_2$, $\nu_3$, $\nu_4$ the moments of the x-character about its mean as tabled above.

 $$\begin{aligned}
 p_{11}&=\Pi_{11}-\bar{x}'\Pi_{01},\\
 p_{21}&=\Pi_{21}-2\bar{x}'\Pi_{11}+\bar{x}'^2\Pi_{01}-\bar{y}'\nu_2,\\
 p_{31}&=\Pi_{31}-3\bar{x}'\Pi_{21}+3\bar{x}'^2\Pi_{11}-\bar{x}'^3\Pi_{01}-\bar{y}'\nu_3,\\
 p_{41}&=\Pi_{41}-4\bar{x}'\Pi_{31}+6\bar{x}'^2\Pi_{21}-4\bar{x}'^3\Pi_{11}+\bar{x}'^4\Pi_{01}-\bar{y}'\nu_4.
 \end{aligned}$$

 The p's should be further corrected for grouping by Sheppard's corrections (given on my p. 36), provided
 there be high contact at the contour of the surface of frequency. Sheppard's corrections have not in this
 
 These lead to

 r = -.207,579,   $\bar\epsilon$ = -.120,164,   $\bar\zeta$ = -.038,241,   $\bar\theta$ = -.285,890.
 
 Thus all the constants are determined. 
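 The product-moments and the constants derived from them can also be recovered from the counts of Table I. The sketch below is a modern check in Python rather than the hand method of the reduction formulae. It assumes that the reduced constants are formed as $\epsilon-r\sqrt{\beta_1}$, $\zeta-r\beta_2$ and $\theta-r\nu_5/\sigma_x^5$, and it places the two 4-branch whorls in the fourth and fifth arrays as the array totals require. All names are illustrative.

```python
import math

POSITIONS = [1, 2, 3, 4, 5]
BRANCHES = [4, 5, 6, 7, 8]
# Counts from Table I; zeros in the 4-branch column are inferred from totals.
TABLE = [
    [0, 3, 66, 42, 39],
    [0, 3, 61, 47, 39],
    [0, 6, 60, 40, 44],
    [1, 12, 68, 39, 22],
    [1, 13, 53, 10, 10],
]

# Flatten the table into (position, branches, count) cells.
cells = [
    (x, y, c)
    for x, row in zip(POSITIONS, TABLE)
    for y, c in zip(BRANCHES, row)
]
N = sum(c for _, _, c in cells)
x_bar = sum(c * x for x, _, c in cells) / N
y_bar = sum(c * y for _, y, c in cells) / N


def p(q, s):
    """Product-moment of order (q, s) about the means, as in (xlii.)."""
    return sum(c * (x - x_bar) ** q * (y - y_bar) ** s for x, y, c in cells) / N


sigma_x = math.sqrt(p(2, 0))
sigma_y = math.sqrt(p(0, 2))

r = p(1, 1) / (sigma_x * sigma_y)
eps = p(2, 1) / (sigma_x ** 2 * sigma_y)
zeta = p(3, 1) / (sigma_x ** 3 * sigma_y)
theta = p(4, 1) / (sigma_x ** 4 * sigma_y)

sqrt_beta1 = p(3, 0) / sigma_x ** 3
beta2 = p(4, 0) / sigma_x ** 4
gamma5 = p(5, 0) / sigma_x ** 5  # equals beta_3 / sqrt(beta_1)

# Reduced constants under the assumed reading of (xlv.).
eps_bar = eps - r * sqrt_beta1
zeta_bar = zeta - r * beta2
theta_bar = theta - r * gamma5

print(round(r, 4), round(eps_bar, 4), round(zeta_bar, 4), round(theta_bar, 4))
```

 On these counts the first, second and fourth constants agree with the values -.207,579, -.120,164 and -.285,890, and the third comes out near -.0382.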
 We find

 $\eta^2-r^2$ = .019,367,

 $\phi_2(\eta^2-r^2)-\bar\epsilon^2$ = .001,281,

 $\phi_2(\eta^2-r^2)-\bar\epsilon^2-(\bar\zeta\phi_2-\bar\epsilon\phi_3)^2/(\phi_2\phi_4-\phi_3^2)$ = .000,276.
 
 These should be respectively zero for linear, parabolic, and cubical regressions. It
 will be seen that they are satisfied with increasing closeness; we might well be
 satisfied even with the parabolic regression curve. The following are the regression
 curves determined, $\bar{y}_{x_p}$ being the actual number of branches in the whorl
 (= 6.655,375 + $Y_{x_p}$), and $x_p$ the actual position of the whorl:

 (a.) Straight line:

 $$\bar{y}_{x_p}=7.046{,}087-.139{,}408\,x_p.$$

 (b.) Parabola from (lxv.):

 $$\bar{y}_{x_p}=6.794{,}052-.125{,}872\,X_p-.077{,}592\,X_p^2;$$

 or,

 $$\bar{y}_{x_p}=6.853{,}561-.077{,}592\,(x_p-1.991{,}535)^2.$$

 This clearly gives a maximum number of branches, 6.8536, corresponding to
 $x_p$ = 1.9915, a value within the limits of observation.

 (c.) Cubic from (lix.):

 $$\bar{y}_{x_p}=6.799{,}399-.192{,}489\,X_p-.084{,}230\,X_p^2+.020{,}915\,X_p^3.$$

 Here $X_p$ is measured from the mean position = $x_p$ - 2.802,651, and $\bar{y}_{x_p}$ is, as before,
 the total number of branches for the given position.
 
 Condition (Ivii.) is so closely satisfied that we shall here get sensibly as good 
 results from (lix.) as from (Ivi.). 
 
 In the table below and in the curves of Diagram I. the values of the mean of 
 the arrays, as found from line, parabola, and cubic, are given and compared with 
 observation. 
 
 case been used, as this condition is not fulfilled. The axes $x'$, $y'$ actually taken for woodruff were those
 through the third whorl and through six branches.

 An obvious warning about the signs of the sums of the products may be given which may save
 computators some trouble. The axes being taken positive, as in the accompanying
 figure, then the sums of the products for $\Pi_{11}$ and $\Pi_{31}$ are positive in the 1st and
 3rd, negative in the 2nd and 4th quadrants. For $\Pi_{21}$ and $\Pi_{41}$ they are positive
 in the 1st and 4th quadrants and negative in the 2nd and 3rd quadrants. In
 the figure the axes are taken so as to suit the x and y-directions of the table on
 p. 31. Care must, of course, be paid to this point. The products may also
 be found from the $\bar{y}_{x_p}$'s in the manner indicated on p. 35, footnote. They were thus verified in this case.

 [Figure: axes +x and +y with the four quadrants numbered 1st to 4th.]
 
 
 Table II. — Mean Branches to each Whorl.

 x_p =                      0       1       2       3       4       5       6
 y_{x_p} from line        7.046   6.907   6.767   6.628   6.488   6.349   6.210
 y_{x_p} from parabola    6.546   6.777   6.854   6.775   6.541   6.151   5.607
 y_{x_p} from cubic       6.217   6.750   6.889   6.758   6.443   6.192   6.007
 Observed                   —     6.780   6.813   6.813   6.486   6.172
 
 1 
 
 I think we may safely say that in the relationship of branches to position of the
 whorl in woodruff we have a case of homoscedastic correlation, which is effectively
 described by a parabolic regression curve. Thus, in a case of this kind, it is only
 needful, besides the moments up to the fourth of the x-character, to find the
 correlation coefficient $r$ and the correlation ratio $\eta$.
 
 (9.) Illustration B. — On the Correlation between Age and Head Height in Girls. 
 
 The data for this are taken from my School Measurement series, and involve the 
 auricular heights of 2272 girls between the ages of 3 and 22. There was considerable 
 paucity of material at the extreme ends of the range, and accordingly as our correlation 
 curves are all obtained by weighting the observations, we can hardly expect good fits 
 near 3 or 22 years of age. The actual correlation table is given as Table III. 
 Sheppard's corrections were applied throughout, and the unit of height is 2 millims.

 In the first place the means, standard deviations, and 3rd moments of all the arrays
 of heights for different years of age were determined. These are given at the foot of
 Table III., but in actually calculating the constants more places of decimals were
 used. Then the first six moments of the frequency of the ages were found and the
 first four moments of the height frequencies. These are the x and y-frequencies.
 They give us : — 
 

 
 
 
 
 
 
 
 
Table III. — Correlation between Age and Auricular Height of Head in Girls.

 Constants of the arrays (means in 1-millim. units; standard deviations and third
 moments in 2-millim. units):

 Age      Total    Mean       Standard deviation   Third moment
 3-4         1    115.2500          —                   —
 4-5         7    116.9643        2.8853             - 42.822
 5-6        18    117.4722        2.9276             - 18.108
 6-7        40    119.1000        2.9641                7.679
 7-8        76    120.3026        2.9882             +  1.782
 8-9       125    121.6340        2.6366             -  6.171
 9-10      177    121.7246        3.3877             + 15.893
 10-11     235    122.8160        2.9653             +  2.330
 11-12     261    123.1427        3.2089             +   .238
 12-13     309    123.8908        3.2061             +  8.219
 13-14     263    124.8622        3.3589             -  7.286
 14-15     198    125.7146        3.5865             +  3.015
 15-16     214    126.1565        3.4663             -  9.615
 16-17     162    126.5340        3.8696             +  9.379
 17-18      95    126.9132        3.1679             +  2.991
 18-19      61    127.0205        3.1235             +  0.070
 19-20      13    129.5577        4.8406             - 29.164
 20-21       7    123.8214        2.5311             -  2.729
 21-22       8    126.5000        4.1414             + 85.816
 22-23       2    125.2500         .9574                —
 Totals   2272    124.0467        3.4541             +  5.206

 Frequencies for ages 14-15 to 22-23, with the totals of each height class over all ages:

 Height (millims.)   14-15  15-16  16-17  17-18  18-19  19-20  20-21  21-22  22-23   Totals
 102.25-104.25          —      —      —      —      —      —      —      —      —        2
 104.25-106.25          —      —      —      —      —      —      —      —      —       10
 106.25-108.25          —      1      —      —      —      —      —      —      —       10
 108.25-110.25          1      3      1      —      —      1      —      —      —       27
 110.25-112.25          9      2      4      1      1      —      —      —      —       56
 112.25-114.25          3      5      5      1      —      —      —      —      —       59
 114.25-116.25          7      6      8      2      2      —      1      —      —      115
 116.25-118.25          9     11      6      4      3      —      —      1      —      142
 118.25-120.25         11     19      6      6      3      2      1      —      —      244
 120.25-122.25         21     15     13      9      4      —      —      3      —      265
 122.25-124.25         22     18     25      9      4      1      2      —      1      261
 124.25-126.25         23     26     14     12     10      —      1      1      —      265
 126.25-128.25         20     18     16     13      9      1      —      —      1      219
 128.25-130.25         25     29     16     11      7      —      1      1      —      197
 130.25-132.25         15     18     12      6      6      4      1      1      —      131
 132.25-134.25          5     16      7      7      6      —      —      —      —       88
 134.25-136.25         13      9     11      8      2      1      —      —      —       77
 136.25-138.25          5     14      6      3      2      1      —      —      —       52
 138.25-140.25          2      2      4      2      —      —      —      —      —       20
 140.25-142.25          4      2      2      —      1      1      —      —      —       16
 142.25-144.25          3      —      4      —      1      —      —      —      —       11
 144.25-146.25          —      —      2      1      —      —      —      1      —        4
 146.25-148.25          —      —      —      —      —      1      —      —      —        1
 Totals               198    214    162     95     61     13      7      8      2     2272

 Frequencies for ages 7-8 to 13-14, read down each column (occupied cells only):

 7-8:   1, 5, 1, 4, 7, 9, 13, 9, 7, 6, 9, 3, 1, 1.
 8-9:   2, 5, 3, 8, 7, 22, 19, 17, 19, 10, 6, 6.
 9-10:  1, 1, 1, 12, 10, 15, 10, 24, 25, 23, 18, 8, 9, 5, 7, 3, 3, 1, 1.
 10-11: 4, 3, 8, 14, 23, 25, 29, 34, 33, 21, 17, 7, 8, 4, 2, 2.
 11-12: 2, 4, 2, 6, 6, 11, 15, 37, 34, 38, 29, 27, 16, 13, 10, 4, 2, 3, 2.
 12-13: 2, 5, 9, 16, 18, 44, 41, 33, 40, 27, 20, 17, 13, 9, 10, 3, 1.
 13-14: 2, 2, 4, 3, 4, 10, 13, 23, 32, 21, 32, 32, 39, 17, 8, 11, 4, 2, 2, 2.
 
 
 Height Constants (in 2-millim. units).

 Mean height = 124.0467 millims.
 $\sigma_y$ = 3.454,125
 $\mu_2$ = 11.930,977
 $\mu_3$ = 5.206,247
 $\mu_4$ = 438.639,633
 $\beta'_1$ = .015,960
 $\beta'_2$ = 3.081,454

 Age Constants (in year units).

 Mean age = 12.7007
 $\sigma_x$ = 3.064,819
 $\nu_2$ = 9.393,110
 $\nu_3$ = 1.051,882
 $\nu_4$ = 239.157,055
 $\nu_5$ = 104.298,702
 $\nu_6$ = 9536.265,059
 $\beta_1$ = .001,335
 $\beta_2$ = 2.710,593
 $\beta_3$ = .014,093
 $\beta_4$ = 11.506,681
 $\sqrt{\beta_1}$ = +.036,538
 $\phi_2$ = 1.709,258
 $\phi_3$ = .250,123
 $\phi_4$ = 4.158,032

 Further (in 1-millim. units):

 $\Sigma_M$ = 2.093,366 millims.
 $\lambda_2$ = 4.382,181
 $\lambda_4$ = 62.399,135

 Hence

 $$(\lambda_4 - 3\lambda_2^2)/(4\lambda_2^2) = .062,340.$$
 
 In the next place the products were worked out and referred to the means with
 the following results:*

 $p_{11}$ = 3.113,712, whence $r$ = .294,128,
 $p_{21}$ = -1.957,022, $\epsilon$ = -.071,065,
 $p_{31}$ = 74.447,616, $\zeta$ = -.048,576,
 $p_{41}$ = -108.701,559, $\theta$ = -.470,126.

 Further, from $\Sigma_M$, $\eta$ = .303,024.
 
 In deducing the product-moments after they had been referred to the means, the
 proper Sheppard's corrections were introduced. These are, if $\{p_{11}\}$, $\{p_{21}\}$, $\{p_{31}\}$,
 $\{p_{41}\}$ represent the uncorrected moments:

 $$p_{11} = \{p_{11}\}, \qquad p_{21} = \{p_{21}\},$$

 $$p_{31} = \{p_{31}\} - \tfrac{1}{4}\{p_{11}\}, \qquad p_{41} = \{p_{41}\} - \tfrac{1}{2}\{p_{21}\},$$

 the units of grouping being the units throughout.

 * These products were in this case (as in all other cases) verified by calculating from the means of the
 arrays $y_{x_p}$ the expressions

 $$\frac{S\{n_{x_p} y_{x_p}(x_p-\bar{x})\}}{N},\quad \frac{S\{n_{x_p} y_{x_p}(x_p-\bar{x})^2\}}{N},\quad \frac{S\{n_{x_p} y_{x_p}(x_p-\bar{x})^3\}}{N},\quad \frac{S\{n_{x_p} y_{x_p}(x_p-\bar{x})^4\}}{N}.$$

 Of course it is easiest to calculate these products about some arbitrary origin coinciding with the
 abscissa of one array. If these products be then $p'_{11}$, $p'_{21}$, $p'_{31}$, $p'_{41}$, and $\bar{x}'$ be the mean, we have

 $$p_{11} = p'_{11},$$

 $$p_{21} = p'_{21} - 2\bar{x}'p'_{11},$$

 $$p_{31} = p'_{31} - 3\bar{x}'p'_{21} + 3\bar{x}'^2p'_{11},$$

 $$p_{41} = p'_{41} - 4\bar{x}'p'_{31} + 6\bar{x}'^2p'_{21} - 4\bar{x}'^3p'_{11}.$$
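
 The two reductions just described, referring the product-moments to the mean and then applying Sheppard's corrections, can be sketched in a few lines. This is only an illustrative sketch: the function names are hypothetical, the y-character is assumed to be measured from its own mean already, and the unit of grouping is assumed to be the unit of measurement, as stated in the text.

```python
def refer_products_to_mean(p11, p21, p31, p41, x_mean):
    """Transfer product-moments S(x'^k * y)/N, k = 1..4, to the mean of x.

    Assumptions (not from the memoir's notation): x' is measured from an
    arbitrary origin, x_mean is the mean of x' from that origin, and y is
    already measured from its own mean.
    """
    q11 = p11
    q21 = p21 - 2 * x_mean * p11
    q31 = p31 - 3 * x_mean * p21 + 3 * x_mean**2 * p11
    q41 = p41 - 4 * x_mean * p31 + 6 * x_mean**2 * p21 - 4 * x_mean**3 * p11
    return q11, q21, q31, q41


def sheppard_correct_products(p11, p21, p31, p41):
    """Apply the corrections quoted above, for a grouping unit of 1."""
    return p11, p21, p31 - p11 / 4, p41 - p21 / 2


if __name__ == "__main__":
    # Made-up raw values, purely to show the order of the two steps.
    about_mean = refer_products_to_mean(1.2, 3.4, 9.0, 30.0, x_mean=0.5)
    print(sheppard_correct_products(*about_mean))
```

 The first and second product-moments are untouched by the grouping correction; only the third and fourth change.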
 From the constants for the arrays, I found 
 
 $$\chi_1 - 1 = -.000,675, \qquad \chi_2 = -.007,198.$$

 Whence the probable error of $\eta$ was determined by (xxxiii.). Its value was*

 Probable error of $\eta$ = .012,913.

 If found from the simple formula $.67449\,(1-\eta^2)/\sqrt{N}$, the value is .012,851. We
 accordingly are again forced to the conclusion that the probable error of $\eta$ may for
 practical purposes be found from this simple formula, instead of the complicated result
 (xxxiii.). Although both $\chi_1 - 1$ and $\chi_2$ are small, it is very doubtful whether we can
 legitimately consider the system as homoscedastic. The dotted line ab of Diagram II. would
 fairly well represent increasing variability with age. The skewness of the arrays is relatively
 small and changes sign so frequently, that we can certainly not attribute any law to
 such heteroclitic tendencies as there are. They are probably due to errors of random
 sampling from truly isocurtic material.
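
 The agreement between the two probable errors can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, N = 2272 and the value of the correlation ratio are taken from this illustration, and the five terms are the contributions quoted in the memoir's footnote on (xxxiii.), not recomputed from the formula itself.

```python
import math

N = 2272            # number of girls in Table III
ETA = 0.303024      # correlation ratio found from the means of the arrays

# Contributions of the successive terms of (xxxiii.), as quoted in the footnote.
TERMS = (0.824785, 0.001870, 0.004673, -0.000472, 0.001888)


def probable_error_simple(eta, n):
    """Simple formula: .67449 (1 - eta^2) / sqrt(N)."""
    return 0.67449 * (1 - eta**2) / math.sqrt(n)


def probable_error_from_terms(terms, n):
    """Full result: .67449 * sqrt(sum of terms / N)."""
    return 0.67449 * math.sqrt(sum(terms) / n)


if __name__ == "__main__":
    print(round(probable_error_from_terms(TERMS, N), 6))
    print(round(probable_error_simple(ETA, N), 6))
```

 The first quoted term is simply the square of one minus the squared correlation ratio, which is why the simple formula already accounts for nearly the whole of the result.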
 
 It will be seen that the height frequencies with $\beta'_1 = .0160$ and $\beta'_2 = 3.0815$ do not
 differ very much from a normal distribution; in fact, we can lay no stress on the
 heteroclisy of the system at all. But the values of the standard deviations of the
 arrays, or the graph of $\sigma_{n_x}/\sigma_y$, certainly shows increasing variation with increasing age,
 a phenomenon with which one is familiar in a variety of other human characters.†

 This heteroscedasticity, due to increasing variation with growth, would hardly have
 been anticipated from a mere inspection of the smallness of $\chi_1 - 1$; it is somewhat
 obscured by the irregular values of the standard deviations of the small arrays at
 the adult end of the age range. The mean value of the standard deviation of the
 weighted arrays is $\sigma_y\sqrt{1-\eta^2} = 3.2992$ in 2-millim. units.
 
 We now turn to the regression curves to see how far the conditions for the
 different types are satisfied. We have

 $$\eta^2 - r^2 = .005,312,$$

 $$\phi_2(\eta^2-r^2) - \epsilon^2 = .004,030,$$

 $$\phi_2(\eta^2-r^2) - \epsilon^2 - \frac{(\zeta\phi_2-\epsilon\phi_3)^2}{\phi_2\phi_4-\phi_3^2} = .000,604.$$

 But the first should be zero, if the regression be linear; the second, if it be
 parabolic; and the third, if it be cubical.

 * The contributions of the successive terms of (xxxiii.) are in fact given by

 $$\Sigma_\eta^2 = \frac{1}{N}\{.824,785 + .001,870 + .004,673 - .000,472 + .001,888\}.$$

 † See Pearson: 'The Chances of Death and other Studies of Evolution,' vol. I., pp. 296, 307,
 310, 314.
 
 We see increasing approximation to fulfilment of the several conditions. Referred
 to axes through the mean age and head height, the following are the regression
 curves:*

 (a.) Straight line:

 $$Y_{x_p} = .662,979\,X_p.$$

 (b.) Parabola (from equation (lxv.)):

 $$Y_{x_p} = .055,749 + .667,570\,X_p - .041,001\,X_p^2.$$

 (c.) Cubic (from equation (lvi.)):

 $$Y_{x_p} = .280,194 + .722,886\,X_p - .029,580\,X_p^2 - .002,223\,X_p^3.$$

 (c'.) Cubic (from equation (lix.)):

 $$Y_{x_p} = .296,076 + .812,249\,X_p - .028,004\,X_p^2 - .005,740\,X_p^3.$$

 (c') will not give as good results as (c), for it depends on a use of the condition
 (lvii.) which is not absolutely fulfilled.
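
 The fitted means for any age follow from these curves by measuring the age from the mean age, evaluating the polynomial, and adding the mean height. The sketch below is a minimal illustration of that arithmetic for the line, the parabola, and cubic (c); the function and constant names are hypothetical, and the mid-year ages 3.5, 4.5, ... are assumed to represent the yearly age groups.

```python
MEAN_AGE = 12.7007        # years
MEAN_HEIGHT = 124.0467    # millimetres

# Coefficients in ascending powers of X_p (age measured from the mean age).
LINE = (0.0, 0.662979)
PARABOLA = (0.055749, 0.667570, -0.041001)
CUBIC_C = (0.280194, 0.722886, -0.029580, -0.002223)


def fitted_height(coeffs, age):
    """Mean auricular height (mm) given by a regression polynomial at a given age."""
    x = age - MEAN_AGE
    y = 0.0
    for c in reversed(coeffs):   # Horner's scheme
        y = y * x + c
    return MEAN_HEIGHT + y


if __name__ == "__main__":
    for age in (3.5, 12.5, 20.5):
        row = [round(fitted_height(c, age), 2) for c in (LINE, PARABOLA, CUBIC_C)]
        print(age, row)
```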
 
 The following table gives the values in the case of the four curves : — 
 
 Table IV. — $y_{x_p}$ = Mean Auricular Height of Girl's Head at Given Age.

 x_p = age   Regression line   Regression parabola†   Cubic (c)   Cubic (c')   Observed
    3.5          117.95              114.49             116.90      118.94      115.25
    4.5          118.61              115.87             117.66      118.94      116.96
    5.5          119.27              117.17             118.42      119.16      117.47
    6.5          119.94              118.39             119.24      119.57      119.10
    7.5          120.60              119.52             120.08      120.14      120.30
    8.5          121.26              120.57             120.93      120.84      121.63
    9.5          121.92              121.55             121.78      121.62      121.72
   10.5          122.59              122.43             122.62      122.45      122.82
   11.5          123.25              123.24             123.42      123.26      123.14
   12.5          123.91              123.97             124.18      124.15      123.89
   13.5          124.58              124.61             124.88      124.95      124.86
   14.5          125.24              125.17             125.52      125.65      125.71
   15.5          125.90              125.65             126.07      126.22      126.16
   16.5          126.57              126.05             126.52      126.68      126.53
   17.5          127.23              126.36             126.87      126.93      126.91
   18.5          127.89              126.59             127.09      126.96      127.02
   19.5          128.55              126.75             127.18      126.74      129.56
   20.5          129.22              126.81             127.11      126.22      123.82
   21.5          129.88              126.80             126.88      125.38      126.50
   22.5          130.54              126.71             126.48      124.28      125.25

 * $Y_{x_p}$ is here measured in millimetres and $X_p$ in years.

 † The maximum ordinate is at the vertex of the parabola, i.e., $X_p$ = 8.1409, or age 20.84; its magnitude = 126.82.
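
 The position and height of the maximum quoted in the footnote follow directly from the coefficients of the regression parabola. The sketch below is illustrative; the function name is hypothetical, and the coefficients and means are those printed above for age and head height.

```python
def parabola_vertex(a0, a1, a2):
    """Vertex (X, Y) of Y = a0 + a1*X + a2*X^2."""
    x_v = -a1 / (2 * a2)
    y_v = a0 - a1**2 / (4 * a2)
    return x_v, y_v


MEAN_AGE = 12.7007        # years
MEAN_HEIGHT = 124.0467    # millimetres

if __name__ == "__main__":
    # Regression parabola (b) for age and auricular height, referred to the means.
    x_v, y_v = parabola_vertex(0.055749, 0.667570, -0.041001)
    print("X at vertex:", round(x_v, 4))
    print("age at vertex:", round(MEAN_AGE + x_v, 2))
    print("maximum height:", round(MEAN_HEIGHT + y_v, 2))
```

 Since the coefficient of the squared term is negative, the vertex is a maximum; a positive coefficient would give a minimum instead.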
 
 
 An examination of this table and the graphs on Diagram II. seems to show:

 (i.) That cubic (c) is considerably better than cubic (c').

 (ii.) That we do get a sensible betterment in passing from parabola to cubic, and,
 accordingly, that we must use in this case the cubic to effectively describe the regression
 within the range of observation. Probably neither cubic nor parabola would effectively
 serve for extrapolation even close to the limits of observation.

 Thus the cubic (c') starting at 3-4 with its point of inflection is clearly
 inadmissible, and the drop after 20 or 21 years of age, shown by both parabola and
 cubic, is, of course, only due to the anomalous character of the few girls over 18 left
 in the schools. Actually the shrinkage of measurements does not begin till at least
 26 years, and is then far more gradual than these curves indicate.

 But, as in all fitting of this kind, we obtain the best fit we can within the range,
 entirely at the expense of what may occur just outside the range. For this reason,
 as E. Perrin* has pointed out, a good interpolation curve is usually a bad
 extrapolation curve.

 * 'Biometrika,' vol. III., p. 99.

 We might sum up our results for auricular height with age in girls by saying:
 That the correlation is non-linear, effectively cubic; heteroscedastic, there being
 increasing variability with growth; that while the total height frequency is not very
 far from normal the array frequencies are slightly heteroclitic, but so very irregular in
 sign, that probably we are dealing with a case of isocurtic homoclisy, to which the
 sparsity of data in the extreme arrays gives an appearance of anomic heteroclisy.

 (10.) Illustration C. — On the Skew Correlation between Size of Cell and Size of Body
 in Daphnia magna.

 Dr. E. Warren has dealt with this point in a memoir published in 'Biometrika,'
 vol. II., pp. 255-9. The resulting regression curve of size of cell for given size of
 body is very far from linear, and it is quite clear that the correlation is skew. It
 has already been noted in 'Biometrika' that the relationship is considerably obscured
 by the irregularities produced by ecdysis. Our object at present, however, is purely
 theoretical, namely, to show how a certain system of constants and of curves describes
 the actual correlationship, and for this purpose Dr. Warren's observations form as
 good material for graduation as we could expect to find. The following Table V.
 gives the observations with the working scales attached. I must refer to
 Dr. Warren's paper (p. 256) for the relation between the units of grouping on the
 working scales and those of the actual measurements on body and cell lengths. As
 far as correcting the raw moments is concerned, Sheppard's corrections were used
 for the cell sizes, but not for the body lengths, because the number of individuals in
 the latter case was perfectly arbitrary and there is no approach to high contact. The
 product moments were also uncorrected. The product moments were found in both
 ways (see p. 35, footnote) and the results thus verified.
 
 Table V. gives the means, standard deviations, and third moments of the arrays ; 
 the latter are all small and superficially irregular in sign. I think we may say that 
 there is no marked and continuous heteroclisy. On the other hand, I think we may 
 say that while the clitic curve deviates to and fro from a zero base, the scedastic 
 curve would fit better to a parabolic curve than to the straight line which is its 
 mean. In other words, the variability of the cells increases with size of body (i.e.,
 growth) up to a certain stage and then decreases again. This result is obscured by
 the fall of the variability after each ecdysis. Roughly the ecdyses produce a rhythm 
 in all three curves, the regression curve, the scedastic curve, and the clitic curve. 
 When the means of the arrays are above the regression cubic, then the ordinates of 
 the scedastic curve are above their mean and those of the clitic curve show positive 
 skewness ; when they are below the regression curve, we have lessened variability 
 and negative skewness. In other words, the ecdyses are accompanied by lessened 
 cell variability and negative skewness of distribution. I think we may state that 
 there is a nomic heteroscedasticity due to growth of body, giving first an increased 
 variability with growth and afterwards a decrease with age. There is probably 
 isocurtic homoclisy. Both of these are, however, obscured by a semi-rhythmic 
 heteroscedasticity and heteroclisy introduced by the ecdyses. 
 
 We now turn to the constants of the cell and body length distributions, merely 
 noting that all these constants are given in terms of the units of the working scales. 
 
 Body Length Constants.

 Mean body length = 8.502,488
 $\sigma_x$ = 3.864,784
 $\nu_2$ = 14.936,562
 $\nu_3$ = -5.125,806
 $\nu_4$ = 432.769,533
 $\nu_5$ = -425.276,682
 $\nu_6$ = 15192.5375
 $\beta_1$ = .007,885
 $\beta_2$ = 1.939,793
 $\beta_3$ = .043,796
 $\beta_4$ = 4.559,091
 $\sqrt{\beta_1}$ = -.088,798
 $\phi_2$ = .931,908
 $\phi_3$ = -.232,167
 $\phi_4$ = .788,409

 Cell Constants.

 Mean cell = 9.268,657
 $\sigma_y$ = 2.541,734
 $\mu_2$ = 6.460,410
 $\mu_3$ = 2.142,362
 $\mu_4$ = 123.921,496
 $\beta'_1$ = .017,021
 $\beta'_2$ = 2.969,111

 Further

 $\Sigma_M$ = 1.454,600
 $\lambda_2$ = 2.115,862
 $\lambda_4$ = 15.142,840

 Hence

 $$(\lambda_4 - 3\lambda_2^2)/(4\lambda_2^2) = .095,615.$$
 
 [Table V. — Observations of cell size and body length in Daphnia on the working scales, with the means, standard deviations, and third moments of the arrays. The body of the table is not legible in this copy.]
 
 
 We have next the product moments referred to the means:

 $p_{11}$ = 3.892,863, whence $r$ = .394,862,
 $p_{21}$ = -12.104,322, $\epsilon$ = -.281,831,
 $p_{31}$ = 127.348,064, $\zeta$ = .098,578,
 $p_{41}$ = -541.433,455, $\theta$ = -.759,344.

 Further, from $\Sigma_M$,

 $$\eta = .572,287.$$

 From the constants for the arrays I deduced

 $$\chi_1 - 1 = -.108,148, \qquad \chi_2 = .088,323.$$

 These are higher values of $\chi_1 - 1$ and $\chi_2$ than we have found in the first two
 illustrations.

 We now obtain, showing the contribution of each term of (xxxiii.),

 $$\Sigma_\eta^2 = \frac{1}{N}\{.452,240 - .002,528 + .010,803 - .013,180 - .027,875\}.$$

 Whence probable error of $\eta$ = .67449 $\Sigma_\eta$ = .0097.

 Had we calculated the probable error of $\eta$ from (xxxiv.), we should have found it
 equal to .0101. The difference is greater than in the two previous illustrations, but
 is only .0004, and this would have no significance in any practical use of the probable
 error. We again conclude, therefore, that (xxxiv.) is sufficiently close to replace
 (xxxiii.) in practice.
 
 For the mean standard deviation of the weighted arrays we have

 $$\sigma_y\sqrt{1-\eta^2} = 2.084,358.$$
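
 Both the correlation ratio and this mean standard deviation of the arrays come from two constants only: the standard deviation of the means of the arrays and the standard deviation of the whole y-distribution. A minimal sketch, with hypothetical function names and the Daphnia constants printed above, is:

```python
import math


def correlation_ratio(sigma_m, sigma_y):
    """eta = (S.D. of the weighted means of the arrays) / (S.D. of y)."""
    return sigma_m / sigma_y


def mean_array_sd(sigma_y, eta):
    """Mean S.D. of the weighted arrays: sigma_y * sqrt(1 - eta^2)."""
    return sigma_y * math.sqrt(1 - eta**2)


if __name__ == "__main__":
    # Daphnia: cell-length constants on the working scale.
    sigma_m, sigma_y = 1.454600, 2.541734
    eta = correlation_ratio(sigma_m, sigma_y)
    print(round(eta, 6), round(mean_array_sd(sigma_y, eta), 6))
```

 Both standard deviations must be in the same units before the ratio is taken; in the head-height illustration the one is quoted in millimetres and the other in 2-millim. units, so the latter has to be doubled first.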
 If we now examine the criteria for the nature of the regression, we have

 $$\eta^2 - r^2 = .171,596,$$

 $$\phi_2(\eta^2-r^2) - \epsilon^2 = .080,483,$$

 $$\phi_2(\eta^2-r^2) - \epsilon^2 - \frac{(\zeta\phi_2-\epsilon\phi_3)^2}{\phi_2\phi_4-\phi_3^2} = .079,457.$$

 We should conclude, therefore, that linear regression is inadmissible, but that
 parabolic or cubic will be moderately successful, the latter not very much better than
 the former. Our moderate success only in this case is, of course, due to the
 irregularity of the results to be graduated, the influence of the ecdyses being so disturbing
 that we really need a curve periodically varying from the graduated regression curve.
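
 The three criteria can be evaluated mechanically once the constants are known. The sketch below is illustrative only: the function name is hypothetical, the expressions are those written out above, and the inputs are the Daphnia constants already given in this illustration.

```python
def regression_criteria(eta, r, eps, zeta, phi2, phi3, phi4):
    """Return the three expressions that should vanish for linear,
    parabolic, and cubical regression respectively."""
    linear = eta**2 - r**2
    parabolic = phi2 * linear - eps**2
    cubic = parabolic - (zeta * phi2 - eps * phi3) ** 2 / (phi2 * phi4 - phi3**2)
    return linear, parabolic, cubic


if __name__ == "__main__":
    # Daphnia: size of cell for given size of body.
    values = regression_criteria(
        eta=0.572287, r=0.394862, eps=-0.281831, zeta=0.098578,
        phi2=0.931908, phi3=-0.232167, phi4=0.788409,
    )
    print([round(v, 6) for v in values])
```

 Each expression is the preceding one less a non-negative term, so the sequence cannot increase; how fast it falls towards zero indicates how much is gained by each higher order of curve.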
 
 We have the following regression curves:

 (a.) Straight line:

 $$Y_{x_p} = .259,687\,X_p.$$

 (b.) Parabola from (lxv.):

 $$Y_{x_p} = 1.097,690 + .236,135\,X_p - .073,490\,X_p^2.$$

 The maximum occurs when $X_p = 1.6066$, and is given by $Y_{x_p} = 1.2874$, thus occurring
 within the limits of observation.*

 (c.) Cubic from (lix.):

 $$Y_{x_p} = .752,856 + .193,058\,X_p - .049,817\,X_p^2 + .001,710\,X_p^3.$$

 In all these cases $Y_{x_p}$ and $X_p$ are measured from the means of the cell and body
 lengths, or from 9.268,657 and 8.502,488 respectively.
 
 Table VI. gives the calculated and observed results, and the whole system is 
 represented in Diagram III. Either the parabola or cubic graduates quite well the 
 results, allowing for the periodic deviation, and we may fairly describe the system as 
 a heteroscedastic cubic regression with isocurtic homoclisy. The correlation ratio is 
 very sensibly different from the correlation coefficient. The regression cubic does not 
 differ widely from that given in ' Biometrika,' which was obtained without weighting 
 the means of the arrays, and by simply striking the best cubic of the given type 
 through the points. 
 
 Table VI. — $y_{x_p}$ = Mean Cell Length for Given Body Length in Daphnia.

 x_p = body length   Regression line   Regression parabola   Regression cubic   Observed
         1                7.320               4.458                5.047           5.300
         2                7.580               5.724                6.190           5.833
         3                7.840               6.842                7.166           7.790
         4                8.099               7.813                7.986           8.050
         5                8.359               8.638                8.661           9.473
         6                8.619               9.315                9.200           8.436
         7                8.879               9.846                9.613           8.596
         8                9.138              10.229                9.912          10.267
         9                9.398              10.466               10.105          10.761
        10                9.658              10.555               10.205          11.027
        11                9.917              10.498               10.220          10.953
        12               10.177              10.293               10.161           9.100
        13               10.437               9.942               10.038           9.000
        14               10.696               9.443                9.861          10.036
        15               10.956               8.798                9.642          10.317
 
 (11.) Illustration D. — On the Skew Correlation between Number of Branches to the
 Whorl and Position of the Whorl on the Stem in Equisetum arvense.
 
 I have selected this example not on account of any biological importance, because
 the material is, especially with regard to the first and last two whorls, unsatisfactory
 either on account of irregularity or of insufficiency of material. It has been taken
 purely from its statistical interest, because it gives a series with markedly skew
 correlation, having a regression curve of a rough S-shaped character. If we omit
 the first and last whorls, we get, as I have already shown,* a remarkably close fit
 with a cubical regression curve. My present object, however, is not to consider any
 law of growth, but merely a mass of statistical material, to be dealt with by the
 processes of the present paper.

 * Actual values on working scales, $x_p$ = 10.1091 and $y_{x_p}$ = 10.5560.

 We may anticipate that the irregularities of the series, indicated in the memoir
 just referred to, will make themselves manifest in a less satisfactory fitting of the
 regression curve than occurs when we deal with the more homogeneous group of
 equally weighted whorls fitted in the diagram of that paper. Table VII. gives the
 data, with the means, standard deviations, and third moments of each array.

 The axis of x shall be taken to give the position of the whorl on the stem and that
 of y to denote the number of branches. We require the regression curve of y on x,
 or the probable number of branches on a whorl in a given position. We shall not
 use Sheppard's corrections for the moments of either the x or y-characters, as high
 contact certainly does not hold for both at the low-value ends of their ranges.
 
 We have the following constants:

 Position Constants.

 Mean position = 6.403,315
 $\sigma_x$ = 3.542,604
 $\nu_2$ = 12.550,046
 $\nu_3$ = 8.249,534
 $\nu_4$ = 319.515,824
 $\nu_5$ = 644.095,176
 $\nu_6$ = 11203.5814
 $\beta_1$ = .034,429
 $\beta_2$ = 2.028,625
 $\beta_3$ = .214,190
 $\beta_4$ = 5.667,884
 $\sqrt{\beta_1}$ = .185,550
 $\phi_2$ = .994,196
 $\phi_3$ = .592,384
 $\phi_4$ = 1.518,136

 Branch Constants.

 Mean number of branches = 7.216,851
 $\sigma_y$ = 3.278,499
 $\mu_2$ = 10.748,557
 $\mu_3$ = -24.313,478
 $\mu_4$ = 245.811,660
 $\beta'_1$ = .476,044
 $\beta'_2$ = 2.127,658

 Further

 $\Sigma_M$ = 2.789,949
 $\lambda_2$ = 7.783,815
 $\lambda_4$ = 140.441,685

 Hence

 $$(\lambda_4 - 3\lambda_2^2)/(4\lambda_2^2) = -.170,503.$$
 
 * 'Proc. Roy. Soc.,' vol. 71, p. 308.

 [Table VII. — Number of branches to the whorl and position of the whorl in Equisetum arvense, with the means, standard deviations, and third moments of each array. The body of the table is not legible in this copy.]

 We have next the product moments referred to the means:

 $p_{11}$ = -8.225,585, whence $r$ = -.708,222,
 $p_{21}$ = -21.471,321, $\epsilon$ = -.390,436,
 $p_{31}$ = -205.084,042, $\zeta$ = +.029,733,
 $p_{41}$ = -917.984,938, $\theta$ = -.960,212.

 Further, from $\Sigma_M$,

 $$\eta = .850,984.$$
 
 From the constants for the arrays we deduce 
 
 Xj-1= --356,367, )(2= --312,952. 
 
 We now ohtain, showing the contribution of each term of (xxxiii. ), 
 
 V=i{'076,080--157,932-|--055,359 + -079,662 + -038,579}. 
 
 Whence probable error of 7^= -67449 ^,= '0054. 
 
 Had we calculated the probable error of $\eta$ from (xxxiv.) we should have found it
 equal to .0049. The difference .0005 is not of importance for practical purposes.
 Yet in this case it is clear that the values of $\chi_1 - 1$ and $\chi_2$ are very sensible. Thus we
 see that a very marked heteroscedastic and heteroclitic system with continuously
 changing standard deviation and skewness scarcely affects for practical purposes
 (i.e., to three significant figures) the probable error of $\eta$. All four of our illustrations
 therefore confirm the conclusion that:

 For practical purposes the probable error of the correlation ratio, $\eta$, may be taken
 as $.67449\,(1 - \eta^2)/\sqrt{N}$.
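
 The comparison of the two probable errors can be retraced from the printed figures alone. The following Python sketch is an editorial illustration, not part of the memoir's working; the variable names are hypothetical. It sums the five contributions, confirms that the leading one equals $(1 - \eta^2)^2$ (the only term the simplified formula keeps), and rescales the simplified value .0049 by the square root of the ratio, so that N is never needed.

```python
import math

eta = 0.850984  # correlation ratio, as printed

# The five contributions to N times the squared standard error of eta, as printed.
terms = [0.076080, -0.157932, 0.055359, 0.079662, 0.038579]

total = sum(terms)
leading = (1 - eta**2) ** 2  # the term retained by the simplified formula

pe_simple = 0.0049  # printed probable error from the simplified formula
# Both probable errors share the factor .67449 / sqrt(N), so their ratio is
# sqrt(total / leading) and N drops out.
pe_full = pe_simple * math.sqrt(total / leading)

print(f"sum of contributions = {total:.6f}")
print(f"(1 - eta^2)^2        = {leading:.6f}")
print(f"full probable error  = {pe_full:.4f}")
```

 Because the four remaining contributions nearly cancel one another, the full value exceeds the simplified one by only about a tenth.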
 
 Our Diagram IV. gives the values of the relative standard deviations of the arrays,
 or, $\sigma_{n_x}/\sigma_y$, the horizontal line giving $\sqrt{1 - \eta^2} = .5252$, or the mean value of the relative
 standard deviations of the weighted arrays. We have also the clitic curve giving
 $\tfrac{1}{2}\sqrt{\beta_1}$ for each array.* The remarkable smoothness of these scedastic and clitic curves
 in this case indicates how far certain types of correlation surfaces diverge from pure
 normality of distribution, the divergence being obviously nomic.
 
 We now turn to the regression curves and write down the conditions for the
 different types; the three expressions should be zero for linear, parabolic, and
 cubical regression respectively:

 $$\eta^2 - r^2 = .222,596,$$

 $$\phi_2(\eta^2 - r^2) - \epsilon^2 = .068,864,$$

 $$\phi_2(\eta^2 - r^2) - \epsilon^2 - (\zeta\phi_2 - \epsilon\phi_3)^2/(\phi_2\phi_4 - \phi_3^2) = .010,127.$$
 
 * $\tfrac{1}{2}\sqrt{\beta_1}$ = difference between mode and mean divided by standard deviation = skewness in the case of
 skew-curves of Type III. ('Phil. Trans.,' A, vol. 186, p. 373), and may be taken as a reasonable measure of
 the skewness for those cases in which the fuller form involving $\beta_2$ would involve too laborious calculations.
 If in equation (xii.) of the present memoir we put $\beta_2 = 3$ + a small quantity, and remember that $\beta_1$ is itself
 a small quantity, we see that the more correct formula for the skewness involving $\beta_2$ reduces, neglecting
 terms of 2nd order, to $\tfrac{1}{2}\sqrt{\beta_1}$.
 
 
 We see at once that the straight line is inadmissible, the parabola will not be very 
 good, and the cubic only moderately appropriate. The conditions are not nearly so 
 closely fulfilled as in the cases of woodruff and head heights; the last two are better
 than in the case of Daphnia cells, but while the deviations in the case of Daphnia 
 were irregular, there being no approximate smoothness in the scedastic or clitic 
 curves, we shall find here more uniform deviations which would probably be partially 
 allowed for by a quartic regression curve. 
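
 The first of these criteria, and the value of $\sqrt{1 - \eta^2}$ quoted for Diagram IV., follow directly from the printed $r$ and $\eta$. A minimal Python check, offered as an editorial sketch with hypothetical variable names:

```python
import math

eta = 0.850984   # correlation ratio, as printed
r = -0.708222    # correlation coefficient, as printed

linear_criterion = eta**2 - r**2       # should vanish for linear regression
relative_sd = math.sqrt(1 - eta**2)    # mean relative standard deviation of arrays

print(f"eta^2 - r^2       = {linear_criterion:.6f}")
print(f"sqrt(1 - eta^2)   = {relative_sd:.4f}")
```

 The straight line leaves unexplained a share of the variance of the array means equal to about 0.22 of the whole, which is why it is called inadmissible here.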
 
 The following are the regression curves:

 (a.) Straight line:

 $$Y_{x_p} = -.655,423\,X_p.$$

 (b.) Parabola from (lxv.):

 $$Y_{x_p} = 1.551,307 - .574,171\,X_p - .123,610\,X_p^2.$$

 The maximum ordinate is at the position $X_p = -2.3225$, or $x_p = 4.0808$, with
 maximum number of branches $y_p = 9.435$.

 (c.) Cubic from (lvi.):

 $$Y_{x_p} = 1.590,413 - .987,694\,X_p - .137,641\,X_p^2 + .016,605\,X_p^3.$$

 In all cases $X_p$ and $Y_{x_p}$ are measured from the mean position and the mean number
 of branches, i.e., 6.403,315 and 7.216,851 respectively.
 
 The following table contains the calculated and observed results:

 Table VIII. Mean Number of Branches to each Whorl in Equisetum.

 Position   Regression   Regression   Regression   Observed   Regression cubic
            line         parabola     cubic                   without first whorl

     1       10.758        8.262        7.506        7.619       [8.207]
     2       10.103        8.900        9.070        9.294        8.929
     3        9.447        9.291        9.920        9.627        9.869
     4        8.792        9.434       10.156        9.730       10.161
     5        8.137        9.330        9.876        9.643        9.911
     6        7.481        8.980        9.182        9.427        9.224
     7        6.826        8.382        8.172        8.732        8.205
     8        6.170        7.536        6.947        7.297        6.962
     9        5.515        6.444        5.605        5.555        5.599
    10        4.859        5.104        4.247        3.964        4.223
    11        4.204        3.517        2.971        2.443        2.939
    12        3.549        1.683        1.879        1.866        1.854
    13        2.893       -0.399        1.069        1.462        1.072
    14        2.238       -2.727        0.641        1.333        0.700
    15        1.582       -5.303        0.694        1.250        0.844
    16        0.927       -8.126        1.328        1.000        1.610
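
 The calculated columns of Table VIII. can be regenerated from curves (a.) to (c.) and the two printed means. The Python sketch below is an editorial aid under that reading of the constants; the names `CURVES`, `poly` and `branches` are hypothetical and do not come from the memoir.

```python
MEAN_POSITION = 6.403315   # mean position of whorl, as printed
MEAN_BRANCHES = 7.216851   # mean number of branches, as printed

# Coefficients in ascending powers of X_p (deviation from the mean position).
CURVES = {
    "line": [0.0, -0.655423],
    "parabola": [1.551307, -0.574171, -0.123610],
    "cubic": [1.590413, -0.987694, -0.137641, 0.016605],
}


def poly(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... by Horner's rule."""
    value = 0.0
    for c in reversed(coeffs):
        value = value * x + c
    return value


def branches(curve, position):
    """Fitted mean number of branches at a given whorl position."""
    return MEAN_BRANCHES + poly(CURVES[curve], position - MEAN_POSITION)


for position in range(1, 17):
    fitted = "  ".join(f"{branches(name, position):7.3f}" for name in CURVES)
    print(f"{position:2d}  {fitted}")

# Vertex of the parabola: the position of its maximum ordinate.
_, c1, c2 = CURVES["parabola"]
x_vertex = -c1 / (2 * c2)
y_vertex = MEAN_BRANCHES + poly(CURVES["parabola"], x_vertex)
print(f"X_p = {x_vertex:.4f}, x_p = {MEAN_POSITION + x_vertex:.4f}, y = {y_vertex:.3f}")
```

 The negative ordinates of the parabola at positions 13 to 16 appear directly in the printed rows.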
 
 In the last column I have placed the results of re-working the whole system, 
 omitting the first whorl as largely influenced by the ground condition at the foot of 
 
 
 the stem.* The improvement of fit is not sufficiently great to justify a publication of 
 all the constants for the distribution in this modified case. But there is improvement 
 for the higher whorls, which are so few in number as to be wholly insignificant when 
 compared with the weight of the first few low whorls. 
 
 It will be noticed at once that the line and the parabola (which gives at the top of
 the stem negative numbers !) are absolutely unsuitable for representing the facts of 
 the case. The cubic is better and certainly gives the general trend of the observa- 
 tions, but in this our last illustration we have clearly reached the limit of material to 
 which such cubical regression can be satisfactorily applied. See Diagram V. 
 
 (12.) Quartic Regression. 
 
 It seemed of some interest in this case of Equisetum to ascertain whether any real 
 improvement in description would be reached by considering the quartic regression 
 curve. I briefly indicate the theory in this case as developed from the general 
 method in the footnote, p. 25. We shall now have 
 
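 $$Y_{x_p}/\sigma_y = b_0 + b_1 (X_p/\sigma_x) + b_2 (X_p/\sigma_x)^2 + b_3 (X_p/\sigma_x)^3 + b_4 (X_p/\sigma_x)^4.$$

 Eliminating $b_0$ and $b_1$ by the processes familiar to us from the case of cubical
 regression, we have

 $$\begin{aligned}
 Y_{x_p}/\sigma_y = {} & r\,(X_p/\sigma_x) + b_2\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}\,(X_p/\sigma_x) - 1\} \\
 & + b_3\{(X_p/\sigma_x)^3 - \beta_2\,(X_p/\sigma_x) - \sqrt{\beta_1}\} \\
 & + b_4\{(X_p/\sigma_x)^4 - (\beta_3/\sqrt{\beta_1})\,(X_p/\sigma_x) - \beta_2\} \qquad \text{(lxx.)}.
 \end{aligned}$$

 Hence as before

 $$\left.\begin{aligned}
 \epsilon &= b_2\phi_2 + b_3\phi_3 + b_4\phi_5 \\
 \zeta &= b_2\phi_3 + b_3\phi_4 + b_4\phi_6 \\
 \theta &= b_2\phi_5 + b_3\phi_6 + b_4\phi_7
 \end{aligned}\right\} \qquad \text{(lxxi.)},$$

 where $\phi_2$, $\phi_3$, and $\phi_4$ are given as before by (li.) and (liv.), while

 $$\phi_5 = \beta_4 - \beta_3 - \beta_2 \qquad \text{(lxxii.)},$$

 $$\phi_6 = (\beta_5 - \beta_2\beta_3 - \beta_1\beta_2)/\sqrt{\beta_1} \qquad \text{(lxxiii.)},$$

 $$\phi_7 = (\beta_1\beta_6 - \beta_3^2 - \beta_1\beta_2^2)/\beta_1 \qquad \text{(lxxiv.)},$$

 and

 $$\beta_5 = \sqrt{\beta_1}\,\nu_7/\sigma_x^7, \qquad \beta_6 = \nu_8/\sigma_x^8 \qquad \text{(lxxv.)}.$$

 Solving, we have

 $$b_4 = \frac{\theta(\phi_2\phi_4 - \phi_3^2) - \epsilon(\phi_4\phi_5 - \phi_3\phi_6) - \zeta(\phi_2\phi_6 - \phi_3\phi_5)}{\begin{vmatrix} \phi_2 & \phi_3 & \phi_5 \\ \phi_3 & \phi_4 & \phi_6 \\ \phi_5 & \phi_6 & \phi_7 \end{vmatrix}} \qquad \text{(lxxvi.)}$$

 and

 $$\left.\begin{aligned}
 b_2 &= \frac{\epsilon\phi_4 - \zeta\phi_3}{\phi_2\phi_4 - \phi_3^2} - b_4\,\frac{\phi_4\phi_5 - \phi_3\phi_6}{\phi_2\phi_4 - \phi_3^2} \\
 b_3 &= \frac{\zeta\phi_2 - \epsilon\phi_3}{\phi_2\phi_4 - \phi_3^2} - b_4\,\frac{\phi_2\phi_6 - \phi_3\phi_5}{\phi_2\phi_4 - \phi_3^2}
 \end{aligned}\right\} \qquad \text{(lxxvii.)}.$$

 * 'Roy. Soc. Proc.,' vol. 71, pp. 308-310.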
 
 Substituting in (lxx.), the solution is completed. The advantage of this form is that
 we see clearly the modifications made in $b_2$ and $b_3$ as we pass from cubical to quartic
 regression. On the other hand, $\phi_6$ and $\phi_7$, as shown by (lxxv.), involve the 7th and
 8th moments of the $x$-character. These are not only very laborious to calculate, but,
 as we have already shown, are as a rule very untrustworthy.
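
 The solution just described is Cramer's rule applied to the three normal equations (lxxi.). The Python sketch below is an editorial illustration: every numerical value in it is hypothetical, chosen only so that the coefficient matrix is positive definite, and none is taken from the memoir. It forms $b_4$ as a ratio with the 3 by 3 determinant in the denominator, obtains $b_2$ and $b_3$ from the first two equations, and checks that all three equations are then satisfied.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Hypothetical moment functions and product-moment constants (illustration only).
phi2, phi3, phi4, phi5, phi6, phi7 = 1.0, 0.5, 2.0, 3.0, 3.5, 15.0
eps, zeta, theta = -0.40, 0.03, -0.96

minor = phi2 * phi4 - phi3**2          # the cubic-regression denominator
delta = det3([[phi2, phi3, phi5],
              [phi3, phi4, phi6],
              [phi5, phi6, phi7]])

# b4 by Cramer's rule: numerator expanded along the column (eps, zeta, theta).
b4 = (theta * minor
      - eps * (phi4 * phi5 - phi3 * phi6)
      - zeta * (phi2 * phi6 - phi3 * phi5)) / delta

# b2 and b3: the cubic values, each corrected by a multiple of b4.
b2 = (eps * phi4 - zeta * phi3) / minor - b4 * (phi4 * phi5 - phi3 * phi6) / minor
b3 = (zeta * phi2 - eps * phi3) / minor - b4 * (phi2 * phi6 - phi3 * phi5) / minor

residuals = [
    b2 * phi2 + b3 * phi3 + b4 * phi5 - eps,
    b2 * phi3 + b3 * phi4 + b4 * phi6 - zeta,
    b2 * phi5 + b3 * phi6 + b4 * phi7 - theta,
]
print(b2, b3, b4, residuals)
```

 Setting `b4 = 0` in the last two assignments gives back the cubic coefficients, which is the sense in which the form displays the modification made on passing to the quartic.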
 
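 If we proceed as on p. 26, equation (lvii.), we find

 $$\eta^2 - r^2 = b_2\epsilon + b_3\zeta + b_4\theta \qquad \text{(lxxviii.)}.$$

 Using this and not the third equation of (lxxi.), we replace (lxxvi.) by

 $$b_4 = \frac{(\phi_2\phi_4 - \phi_3^2)\left\{\eta^2 - r^2 - \epsilon^2/\phi_2 - (\zeta\phi_2 - \epsilon\phi_3)^2/[\phi_2(\phi_2\phi_4 - \phi_3^2)]\right\}}{\theta(\phi_2\phi_4 - \phi_3^2) - \epsilon(\phi_4\phi_5 - \phi_3\phi_6) - \zeta(\phi_2\phi_6 - \phi_3\phi_5)} \qquad \text{(lxxix.)}.$$

 This equation for $b_4$ only involves the 7th and not the 8th moment, but like the
 corresponding form (lx.) suffers from being a ratio of small quantities. (lxxvii.)
 completes the solution as before.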
 
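 (lxxvii.) and (lxxix.) in conjunction give us a necessary condition for quartic
 regression. We can indeed now write the whole series of conditions as follows:

 Linear regression:

 $$\eta^2 - r^2 = 0.$$

 Parabolic regression:

 $$\eta^2 - r^2 - \epsilon^2/\phi_2 = 0.$$

 Cubical regression:

 $$\eta^2 - r^2 - \epsilon^2/\phi_2 - \frac{(\zeta\phi_2 - \epsilon\phi_3)^2}{\phi_2(\phi_2\phi_4 - \phi_3^2)} = 0.$$

 Quartic regression:

 $$\eta^2 - r^2 - \epsilon^2/\phi_2 - \frac{(\zeta\phi_2 - \epsilon\phi_3)^2}{\phi_2(\phi_2\phi_4 - \phi_3^2)} - \frac{\{\theta(\phi_2\phi_4 - \phi_3^2) - \epsilon(\phi_4\phi_5 - \phi_3\phi_6) - \zeta(\phi_2\phi_6 - \phi_3\phi_5)\}^2}{(\phi_2\phi_4 - \phi_3^2)(\phi_2\phi_4\phi_7 - \phi_3^2\phi_7 - \phi_4\phi_5^2 - \phi_2\phi_6^2 + 2\phi_3\phi_5\phi_6)} = 0$$

 (lxxx.).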
 
 We now have a third possibility: we can get rid of the fourth product moment $\theta$
 from the value of $b_4$ and write it:

 $$b_4 = \pm\sqrt{\frac{(\phi_2\phi_4 - \phi_3^2)\left\{\eta^2 - r^2 - \epsilon^2/\phi_2 - (\zeta\phi_2 - \epsilon\phi_3)^2/[\phi_2(\phi_2\phi_4 - \phi_3^2)]\right\}}{\phi_2\phi_4\phi_7 - \phi_3^2\phi_7 - \phi_4\phi_5^2 - \phi_2\phi_6^2 + 2\phi_3\phi_5\phi_6}} \qquad \text{(lxxxi.)}.$$

 While this value of $b_4$ does not suffer like (lxxix.) from being the ratio of small
 quantities, and would a priori appear to save the calculation of $\theta$, yet the right sign of
 the root may not be obvious on inspection, so that an actual determination of $\theta$ to find
 the sign of $b_4$ may after all be needful. If (lxxx.) were absolutely satisfied, (lxxxi.),
 (lxxix.) and (lxxvi.) would lead to identical results; but this will rarely be true in
 practice. In any of the three cases $b_2$ and $b_3$ will be given by (lxxviii.). On the
 whole, I consider that (lxxxi.) and (lxxvi.) will give the better results, and probably
 the former the best, but it will generally require as much arithmetic as the latter.
 
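 (13.) Illustration E. Calculation of the Quartic Regression Curve in the Case of Equisetum arvense.

 The only new constants required are:

 $\nu_7 = 43,207.386$, whence $\beta_5 = 1.144,882$,

 $\nu_8 = 507,649.540$, $\beta_6 = 20.463,633$,

 and:

 $\phi_5 = 3.425,069$, $\phi_6 = 3.452,046$, $\phi_7 = 15.015,792$.

 These lead us to:

 $$\frac{\phi_4\phi_5 - \phi_3\phi_6}{\phi_2\phi_4 - \phi_3^2} = 2.723,384, \qquad \frac{\phi_2\phi_6 - \phi_3\phi_5}{\phi_2\phi_4 - \phi_3^2} = 1.211,194,$$

 $$\Delta_1 = \begin{vmatrix} \phi_2 & \phi_3 & \phi_5 \\ \phi_3 & \phi_4 & \phi_6 \\ \phi_5 & \phi_6 & \phi_7 \end{vmatrix} = 1.745,622.$$

 Our successive conditions are therefore:

 $$\eta^2 - r^2 = .222,596,$$

 $$\eta^2 - r^2 - \epsilon^2/\phi_2 = .069,266,$$

 $$\eta^2 - r^2 - \epsilon^2/\phi_2 - \frac{(\zeta\phi_2 - \epsilon\phi_3)^2}{\phi_2(\phi_2\phi_4 - \phi_3^2)} = .010,186,$$

 $$\eta^2 - r^2 - \epsilon^2/\phi_2 - \frac{(\zeta\phi_2 - \epsilon\phi_3)^2}{\phi_2(\phi_2\phi_4 - \phi_3^2)} - \frac{\{\theta(\phi_2\phi_4 - \phi_3^2) - \epsilon(\phi_4\phi_5 - \phi_3\phi_6) - \zeta(\phi_2\phi_6 - \phi_3\phi_5)\}^2}{(\phi_2\phi_4 - \phi_3^2)\,\Delta_1} = .007,200,$$

 whence we see the successive approximations to the fulfilment of the conditions.
 Clearly great gains arise when we pass from linear to parabolic, and from parabolic
 to cubic regression, but the advance is not so conspicuous when we pass to quartic
 regression.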
 
 
 We have:

 From (lxxvi.): $b_4 = .044,517$, and $b_2 = -.648,122$, $b_3 = .171,260$,
 From (lxxix.): $b_4 = .151,842$, and $b_2 = -.940,410$, $b_3 = .041,981$,
 From (lxxxi.): $b_4 = .025,999$, and $b_2 = -.597,691$, $b_3 = .193,688$.
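
 The first two values of $b_4$ can be recovered from the constants printed in this section without knowing $\phi_2$, $\phi_3$ or $\phi_4$ separately. The Python sketch below is an editorial check resting on one rearrangement that is not in the text: numerator and denominator of (lxxvi.) are both divided by $\phi_2\phi_4 - \phi_3^2$, and the determinant is expanded along its last row, so that only the two printed ratios (here called `A` and `B`, hypothetical names) are needed.

```python
# Constants as printed in sections (11.) and (13.).
eps, zeta, theta = -0.390436, 0.029733, -0.960212
phi5, phi6, phi7 = 3.425069, 3.452046, 15.015792
A = 2.723384  # (phi4*phi5 - phi3*phi6) / (phi2*phi4 - phi3**2)
B = 1.211194  # (phi2*phi6 - phi3*phi5) / (phi2*phi4 - phi3**2)
cubic_residual = 0.010186  # value of the cubic condition

# Numerator of (lxxvi.) divided by (phi2*phi4 - phi3**2).
numerator = theta - eps * A - zeta * B
# The 3x3 determinant divided by the same quantity (expansion along the last row).
denominator = phi7 - phi5 * A - phi6 * B

b4_lxxvi = numerator / denominator
b4_lxxix = cubic_residual / numerator
quartic_residual = cubic_residual - numerator**2 / denominator

print(f"b4 from (lxxvi.)  = {b4_lxxvi:.6f}")
print(f"b4 from (lxxix.)  = {b4_lxxix:.6f}")
print(f"quartic condition = {quartic_residual:.6f}")
```

 The small size of `numerator` and `cubic_residual` shows concretely why (lxxix.) is described as a ratio of small quantities.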
 
 The equations to the three corresponding quartics are:

 (a). $Y_{x_p} = 1.724,611 - .913,208\,X_p - .169,311\,X_p^2 + .012,629\,X_p^3 + .000,927\,X_p^4$,
 (b). $Y_{x_p} = 2.047,717 - .734,966\,X_p - .245,667\,X_p^2 + .003,096\,X_p^3 + .003,161\,X_p^4$,
 (c). $Y_{x_p} = 1.668,788 - .944,192\,X_p - .156,137\,X_p^2 + .014,283\,X_p^3 + .000,541\,X_p^4$.

 The values of $Y_{x_p}$ and $X_p$ are as before measured from the means, or 7.216,851 and
 6.403,315 respectively.
 
 The values of the observed and calculated ordinates are given in Table IX., and 
 the graph of the results in the lower half of Diagram V. 
 
 Table IX. Mean Number of Branches to Whorl in Equisetum deduced from Quartic Regression.

 Position   Quartic (a)   Quartic (b)   Quartic (c)   Observed

     1         7.731         8.269         7.637        7.619
     2         8.950         8.662         9.000        9.294
     3         9.715         9.222         9.800        9.627
     4        10.014         9.674        10.073        9.730
     5         9.858         9.816         9.866        9.643
     6         9.281         9.521         9.240        9.427
     7         8.339         8.740         8.270        8.732
     8         7.109         7.498         7.042        7.297
     9         5.692         5.898         5.656        5.555
    10         4.209         4.116         4.225        3.964
    11         2.816         2.407         2.875        2.443
    12         1.651         1.100         1.745        1.866
    13         0.930         0.600         0.987        1.462
    14         0.857         1.389         0.766        1.333
    15         1.665         4.022         1.259        1.250
    16         3.609         9.133         2.657        1.000
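
 The three calculated columns of Table IX. follow from quartics (a), (b) and (c) and the printed means. The Python sketch below is an editorial aid; the names `QUARTICS` and `ordinate` are hypothetical.

```python
MEAN_POSITION = 6.403315   # mean position of whorl, as printed
MEAN_BRANCHES = 7.216851   # mean number of branches, as printed

# Coefficients in ascending powers of X_p, as printed for quartics (a), (b), (c).
QUARTICS = {
    "a": [1.724611, -0.913208, -0.169311, 0.012629, 0.000927],
    "b": [2.047717, -0.734966, -0.245667, 0.003096, 0.003161],
    "c": [1.668788, -0.944192, -0.156137, 0.014283, 0.000541],
}


def ordinate(label, position):
    """Fitted mean number of branches from the quartic with the given label."""
    x = position - MEAN_POSITION
    value = 0.0
    for c in reversed(QUARTICS[label]):  # Horner's rule
        value = value * x + c
    return MEAN_BRANCHES + value


for position in range(1, 17):
    row = "  ".join(f"{ordinate(label, position):7.3f}" for label in QUARTICS)
    print(f"{position:2d}  {row}")
```

 The rapid rise of quartic (b) at positions 15 and 16 comes from its fourth-power coefficient, which is several times larger than in (a) or (c).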
 
 From these results we deduce the following conclusions : — 
 
 (i.) That the use of a quartic instead of a cubic regression curve has not very 
 markedly bettered the fit. The failure to get a closer fit lies largely in the nature of 
 the material. The number of plants with more than 13 whorls is very few, and their 
 contribution allows little weight to the tail of the regression curve. Further, all our 
 
 
 attempts to fit a smooth regression curve show that the observed data are unduly 
 flattened at the top. If we confine ourselves to a homogeneous series of 110 plants 
 with ten whorls apiece, we get a remarkably good fit.* The S-shape of the 
 regression line as indicated in both cubic and quartic does, however, appear to be 
 characteristic of the nature of the plant, and I take it that more ample material 
 would allow of a closer analytical description by a simple cubic. I doubt whether for 
 practical statistics the use of the quartic will often be requisite. 
 
 (ii.) The comparative failure of the quartic (b) shows us that a formula like (lxxix.)
 is of small service. This corresponds fully to our experience in the use of (lx.) in the
 case of the cubic. In both cases we get rid of a high moment by making a certain
 constant the ratio of two small quantities, and experience shows us that the result is
 unsatisfactory. It is accordingly preferable to use formulae involving high moments
 of one variable in preference to those with a ratio of small quantities.

 (iii.) The quartic (c) appears as good as, if not slightly better than, quartic (a). In
 (c) we have got rid of a high product moment, $\theta$, by supposing the quartic condition
 (lxxx.) rigidly fulfilled. This of course is not the case. It is clear that product
 moments like $\theta$ of the 5th order are far from advantageous, and this is the same principle
 which was in evidence when we found (lxv.) giving better results than (lxiv.) for
 parabolic regression. Hence we must further conclude that the use of third, fourth or
 fifth product moments is disadvantageous as compared respectively with fifth to eighth
 moments of one variable. Or, a moment two degrees higher is preferable to a product
 moment in calculating correlation values. This is, I think, consonant with our
 knowledge of the relative magnitude of the probable errors in the two cases.
 
 (14.) General Conclusions. 
 
 (i.) The present paper provides us with a general method of dealing with the 
 regression line and the variability of arrays in the case of skew correlation, without 
 any assumption as to the analytical form of the skew correlation surface. 
 
 (ii.) It provides a nomenclature and classification of the types of array variability 
 which may be of service. 
 
 Arrays are either homoclitic or heteroclitic, according as their skewnesses are of
 equal magnitude or not. Arrays are further homoscedastic or heteroscedastic,
 according as their standard deviations are alike or different. Skew arrays are termed
 allocurtic; if arrays are symmetrical about their mean, they are isocurtic.
 
 A heteroclitic system of arrays may be nomic or anomic, according as the skewness 
 of the arrays changes continuously or irregularly with the position of the array. 
 
 A heteroscedastic system of arrays is also either nomic or anomic, according as the 
 standard deviation of the arrays changes continuously or irregularly with the 
 
 * 'Roy. Soc. Proc.,' vol. 71, p. 308.
 
 position of the arrays. Anomic heteroclisy and anomic heteroscedasticity probably only 
 signify that our material is either heterogeneous or too sparse to free us from the 
 large errors of random sampling in the extreme arrays. Still the terms will be 
 found of use in describing the actual data. 
 
 The curve in which the skewness of the array is plotted to its position is termed 
 the clitic curve ; the curve in which the ratio of the standard deviation of the array 
 to the standard deviation of the character in the population at large is plotted to 
 position is termed a scedastic curve. 
 
 (iii.) The types of regression have been classified into linear, parabolic, cubic and 
 quartic. For most practical purposes the first three suffice. Necessary criteria 
 have been given for each case. But as in the case of the skew frequency of one 
 character, an indefinite number of conditions ought theoretically to be fulfilled. 
 Practically in dealing with frequency, no criteria are absolutely fulfilled, and the 
 probable errors of the expressions used become unmanageable as we ascend in the
 scale. We must therefore be content to estimate the degree of approximation with 
 which one or two necessary criteria are satisfied. 
 
 The fundamental test of deviation from the familiar form of linear regression is the
 inequality of the correlation coefficient $r$ and the newly introduced correlation
 ratio $\eta$. The probable error of this latter is determined. It is shown that
 $\sigma_y\sqrt{1 - \eta^2}$ is the mean standard deviation of a system of arrays in skew correlation.
 The ease with which $\eta$ can be calculated suggests that in many cases it should
 accompany, if not replace, the determination of the correlation coefficient.
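
 How $\eta$ is obtained from a table of arrays, and how it compares with $r$, can be shown on a toy example. The Python sketch below uses entirely hypothetical data (five arrays of unequal size) and hypothetical names; it takes $\eta$ as the weighted standard deviation of the array means divided by $\sigma_y$, and checks that the frequency-weighted mean square deviation within arrays equals $\sigma_y^2(1 - \eta^2)$.

```python
import math

# Hypothetical arrays: x value -> observed y values (illustration only).
arrays = {
    1: [2.0, 3.0, 2.5],
    2: [5.0, 6.0, 5.5, 6.5],
    3: [7.0, 8.0],
    4: [6.0, 7.0, 6.5],
    5: [3.0, 4.0],
}

xs = [x for x, ys in arrays.items() for _ in ys]
ys_all = [y for ys in arrays.values() for y in ys]
n = len(ys_all)

mean_x = sum(xs) / n
mean_y = sum(ys_all) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n
var_y = sum((y - mean_y) ** 2 for y in ys_all) / n
p11 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys_all)) / n
r = p11 / math.sqrt(var_x * var_y)

# Variance of the array means, each weighted by the frequency of its array.
var_means = sum(len(ys) * (sum(ys) / len(ys) - mean_y) ** 2
                for ys in arrays.values()) / n
eta = math.sqrt(var_means / var_y)

# Frequency-weighted mean of the squared deviations within arrays.
within = sum((y - sum(ys) / len(ys)) ** 2
             for ys in arrays.values() for y in ys) / n

probable_error = 0.67449 * (1 - eta**2) / math.sqrt(n)
print(f"r = {r:.4f}, eta = {eta:.4f}, eta^2 - r^2 = {eta**2 - r**2:.4f}")
print(f"within-array variance = {within:.4f} = {var_y * (1 - eta**2):.4f}")
print(f"probable error of eta = {probable_error:.4f}")
```

 In this made-up example the array means rise and then fall, so $r$ is small while $\eta$ is large, which is the situation the test is designed to detect.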
 
 In the determination of the constants of the regression curve we must use 
 moments and product moments. The limitations to the order of the curve used 
 depend : (a) on the labour of the arithmetic, (b) on the increasing probable errors of 
 the higher moments and product moments. For these reasons it seems idle to propose
 going beyond the 6th to 8th moments, or the 3rd to 5th product-moments. Practical
 experience suggests that little is to be gained by using moments beyond the 5th, or
 product moments beyond the 3rd. A quartic regression curve may be useful
 occasionally, but it has yet to justify its necessity. As our object is not to repro- 
 duce the given data, but to provide a graduation for them, which smooths down the 
 errors of random sampling, we believe that any legitimate and practical theory must 
 discard the high moments and high product moments with which Thiele and Lipps
 propose to deal. 
 
 (iv.) There is one point to which reference ought to be made. Some reader may 
 enquire why the method of my paper on curve fitting* should not be applied
 to these regression curves in general, as we have in practice once or twice
 already applied it. It would seem that that method is the easier, involving in the
 case of the quartic only quantities analogous to our $r$, $\epsilon$, $\zeta$ and $\theta$. The answer is
 
 * "On the Systematic Fitting of Curves to Observations and Measurements." 'Biometrika,'
 vol. I., pp. 265-303, and vol. II., pp. 1-23, especially the latter, pp. 11-15.
 
 
 straightforward: that process supposes every $y_{x_p}$ to have equal weight, or $n_{x_p}$ to be
 the same for each array. Hence the higher moments of the $x$-character, which are
 really involved, can be written down without calculation once and for all.* The
 complexity of our present investigation arises from the introduction of the weighting
 into the calculation of the moments of the $x$-character, as well as into that of the
 product moments $r$, $\epsilon$, $\zeta$, $\theta$. Our results therefore, although they might not look so
 good on a graph of the regression curve, would be markedly better, if due weight
 were given to the frequency of each array. The difference of the two conceptions is
 comparable to the determination of the regression on the one hand from the
 correlation coefficient, and on the other from merely striking a line through the
 plotted means of the arrays. The method of moments in the present case, if we
 except the use of $\eta$, is identical with that of fitting a curve to a continuum in space
 by the method of least squares.
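
 The distinction drawn here, between weighting each array by its frequency and merely striking a line through the plotted means, can be seen on a small made-up example. The Python sketch below uses hypothetical data and names throughout; it fits a straight line to the array means twice, once with weights equal to the array frequencies and once with equal weights, and compares both slopes with the regression slope $p_{11}/\sigma_x^2$ computed from the individual observations.

```python
# Hypothetical arrays of unequal frequency: x value -> observed y values.
arrays = {
    1: [2.0, 3.0, 4.0, 3.0],
    2: [4.0, 5.0, 6.0],
    3: [5.0, 7.0],
    4: [10.0],
}


def line_slope(points, weights):
    """Slope of the weighted least-squares line through (x, y) points."""
    total = sum(weights)
    mx = sum(w * x for w, (x, _) in zip(weights, points)) / total
    my = sum(w * y for w, (_, y) in zip(weights, points)) / total
    sxy = sum(w * (x - mx) * (y - my) for w, (x, y) in zip(weights, points))
    sxx = sum(w * (x - mx) ** 2 for w, (x, _) in zip(weights, points))
    return sxy / sxx


means = [(x, sum(ys) / len(ys)) for x, ys in arrays.items()]
frequencies = [len(ys) for ys in arrays.values()]

slope_weighted = line_slope(means, frequencies)          # weights n_x
slope_unweighted = line_slope(means, [1] * len(means))   # every array equal

# Regression slope from the individual observations, each with weight 1.
raw = [(x, y) for x, ys in arrays.items() for y in ys]
slope_raw = line_slope(raw, [1] * len(raw))

print(slope_weighted, slope_unweighted, slope_raw)
```

 The frequency-weighted fit through the means reproduces the slope from the raw observations, whereas the equal-weight fit lets the single observation in the last array pull the line as strongly as the four in the first.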
 
 (v.) No stress whatever is laid on the actual instances here selected for illustration of 
 the methods of this paper. I have merely chosen out of available material cases in 
 which I had come across skew regression of various types. Thus we find : — 
 
 (a.) The correlation of the number of branches and position of the whorl in 
 Asperula odorata is practically parabolic, homoscedastic and of nomic heteroclisy. 
 
 (b.) The correlation between auricular height of head and age in girls is cubical,
 of nomic heteroscedasticity and of anomic heteroclisy. It is probably really a case 
 of isocurtosis. 
 
 (c.) The correlation of size of cell and size of body in Daphnia magna, allowing 
 for the irregularities produced by the ecdyses, is parabolic or cubic, of nomic 
 heteroscedasticity, and probably, but for the above-mentioned irregularities, of 
 isocurtic homoclisy. 
 
 (d.) The correlation of the number of branches and position of the whorl in
 Equisetum arvense is cubical or possibly even quartic, of markedly nomic hetero- 
 scedasticity and markedly nomic heteroclisy. 
 
 It is not impossible that slips have occurred in the lengthy arithmetic involved, but 
 every important piece of work has been done independently twice, once by Dr. Alice 
 Lee, whom I have most heartily to thank for her unwearying assistance, and once 
 by myself. To preserve uniformity of working, the constants have in each case 
 been carried to six figures. This involves little or no additional trouble, using as we 
 do mechanical calculators. The final results are of course of no value beyond their 
 probable errors, which will be in the second or third place of figures. No doubt I 
 shall be told that there is a show of accuracy in the number of decimal figures 
 retained, which does not really exist. It does not exist (and I am as fully conscious 
 of its non-existence as any would-be critic) so far as our results fit the actual
 population, of which we have but a random sample. The figures, however, are of 
 importance, as far as testing accuracy of fit of result to actual sample goes. The 
 
 * 'Biometrika,' vol. II., p. 12.
 
 
 cubic or quartic curves may have coefficients insensible before the third or fourth 
 figure of decimals, and these coefficients have to be multiplied occasionally by 
 abscissae of the third or fourth powers of 7 to 9. Hence to get ordinates true, as 
 far as the sample goes, to the second or third figure, we require to work to a fairly 
 high number of figures. There is no magic in six figures, four or five would probably 
 satisfy another worker, but they are easily read off the calculator we use, and if the
 constants had been tabled only to four or five, no reader would have been able to 
 agree exactly, if he wished to test any of our results, even to three figures, with the 
 final ordinates. 
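
 The point about the number of figures can be illustrated with quartic (a) of Table IX. at the sixteenth whorl, where the abscissa measured from the mean is about 9.6. The Python sketch below is an editorial illustration; rounding the constants to four decimal places is an assumption made here for the sake of the example, and the names are hypothetical.

```python
MEAN_POSITION = 6.403315   # mean position of whorl, as printed
MEAN_BRANCHES = 7.216851   # mean number of branches, as printed

# Quartic (a), coefficients in ascending powers of X_p, to six decimals as printed.
QUARTIC_A = [1.724611, -0.913208, -0.169311, 0.012629, 0.000927]


def ordinate(coeffs, position):
    """Fitted mean number of branches at a whorl position."""
    x = position - MEAN_POSITION
    value = 0.0
    for c in reversed(coeffs):  # Horner's rule
        value = value * x + c
    return MEAN_BRANCHES + value


full = ordinate(QUARTIC_A, 16)
rounded = ordinate([round(c, 4) for c in QUARTIC_A], 16)

print(f"six-decimal constants : {full:.3f}")
print(f"four-decimal constants: {rounded:.3f}")
print(f"difference            : {full - rounded:.3f}")
```

 Nearly all of the difference comes from the last coefficient, which is multiplied by a fourth power of the order of 8,000.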
 
DIAGRAM I. SKEW CORRELATION IN ASPERULA ODORATA. [Scedastic curve, clitic curve, regression line, regression parabola and regression cubic; ordinate: number of branches to whorl.]

DIAGRAM II. [Title not recoverable from the scan. Regression parabola, regression cubic and clitic curve; abscissa: age of girl.]

DIAGRAM III. SKEW CORRELATION BETWEEN SIZES OF CELL AND BODY IN DAPHNIA. [Regression cubic and regression parabola; abscissa: size of body.]

DIAGRAM IV. SKEW CORRELATION BETWEEN BRANCHES AND POSITION OF WHORL IN EQUISETUM: SCEDASTIC AND CLITIC CURVES. [Scedastic curve and clitic curve.]

DIAGRAM V. SKEW CORRELATION BETWEEN BRANCHES AND POSITION OF WHORL IN EQUISETUM: REGRESSION CURVES. [Regression line, regression parabola, regression cubic, and quartics (a), (b) and (c); abscissa: position of whorl.]
 
DRAPERS' COMPANY RESEARCH MEMOIRS.

 DEPARTMENT OF APPLIED MATHEMATICS, UNIVERSITY COLLEGE, UNIVERSITY OF LONDON.

 These memoirs will be issued at short intervals. The following are ready or
 will probably appear later in this series:

 Biometric Series.

 I. Mathematical Contributions to the Theory of Evolution. XIII. On the Theory of Contingency and
 its Relation to Association and Normal Correlation. By Karl Pearson, F.R.S. Issued. Price 4s.
 II. Mathematical Contributions to the Theory of Evolution. XIV. On the Theory of Skew Correlation
 and Non-linear Regression. By Karl Pearson, F.R.S. Issued. Price 5s.
 III. Mathematical Contributions to the Theory of Evolution. XV. On Homotyposis in the Animal
 Kingdom. By Ernest Warren, D.Sc., Alice Lee, D.Sc., Edna Lea-Smith, Marion Radford
 and Karl Pearson, F.R.S. Shortly.

 Technical Series.

 I. On a Theory of the Stresses in Crane and Coupling Hooks with Experimental Comparison with
 Existing Theory. By E. S. Andrews, B.Sc.Eng., assisted by Karl Pearson, F.R.S. Issued.
 Price 3s.
 II. On some Disregarded Points in the Stability of Masonry Dams. By L. W. Atcherley, assisted by
 Karl Pearson, F.R.S. Issued. Price 3s. 6d.
 III. On the Graphics of Metal Arches, with Special Reference to the Relative Strength of Two-pivoted,
 Three-pivoted and Built-in Metal Arches. By L. W. Atcherley and Karl Pearson, F.R.S.
 Issued. Price 5s.
 IV. On Torsional Vibrations in Shafting. By Karl Pearson, F.R.S.
 