The original of this book is in the Cornell University Library. There are no known copyright restrictions in the United States on the use of the text. http://www.archive.org/details/cu31924031299955

THE ADJUSTMENT OF CENSUS AGE RETURNS [1]

In order to put the problem before us in its simplest form, it will be well to conceive of the ages returned at an enumeration of the population as arranged in a table of two columns. The first of these columns, which may have the heading "$x$", contains in order the possible ages of human life, expressed in years, from zero to the maximum age. Opposite each of these ages, in the second column, is placed the number of persons who have reported themselves as of that age. The second column may be headed "$P_x$". [2] Thus $P_{24}$ represents the number reported as 24 years old.

It is evident that in this table the values of $x$ progress from 0 to 100+ in simple arithmetical progression. The values of $P_x$, on the other hand, decrease as $x$ increases, approaching zero at the maximum value of $x$. This decrease, however, is not regular: $P_x$ is not always smaller than $P_{x-1}$ or $P_{x-2}$.

Consideration of the forces which influence the age constitution of the population shows that there is a necessary relation between the age and the number living at that age. This may be shown by assuming a hypothetical population in which there is no relation between the values of $x$ and $P'_x$ (where $P'_x$ represents the true number living at age $x$). In such a population $P'_x$ might be constant.

[1] This is the third of a series of three studies in age statistics. The first, on "The Comparative Accuracy of Different Forms of Quinquennial Age Groups," appeared in the Quarterly Publications of the American Statistical Association for March, 1900, and was devoted to an analysis of the nature of the errors in the reported ages of adults.
The second article, on "The Enumeration of Children," was printed in the same Publications for March, 1901, and dealt with the errors in the reported ages of children, with special reference to the relation of those errors to the apparent deficiencies in the number of children enumerated in the census. In connection with each of these studies I am under special obligations to Professor Walter F. Willcox, of Cornell University, Chief Statistician for the Division of Methods and Results of the Twelfth United States Census.

[2] Borrowing a term sometimes used in life-table notation to designate the mean number of survivors in any year of life.

80 THE BULLETIN OF

Now the force which in an actual population is most effective in causing variations from this hypothetical condition, namely the force of mortality, has been found to be so regular in its action that it may be very closely represented by a mathematical formula in which it is made a function of the age. [1] The other causes of divergence from the hypothetical constant value of $P'_x$ are: (a) variations in the force of mortality in different years, (b) variations in the birth-rate, and (c) migration. Our knowledge of the forces behind these disturbing factors is incomplete, but it may be supplied in part by empirical knowledge about the factors themselves. In short, the number living may properly be considered a function of the age, the form of the function being determined by the factors just considered. While the exact form of the function must remain unknown, an examination of the nature of the determining factors will often enable us to discover whether a reported number $P_x$ can correctly represent the value of $P'_x$.

If the values of $P_x$ as returned by any census are studied, it will be seen that the irregularities in the series seem to follow a more or less definite law. This may be seen most clearly in a graphic representation of the population classified by the reported ages.
If on the axis of abscissas we erect successive ordinates in such a manner that the area included between the ordinates at $x$ and $x+1$ represents $P_x$, it will be seen that for ages over 20 the values of $P_x$, in addition to following the general order of decreasing as the values of $x$ increase, are relatively larger at each recurrence of certain forms of the value of $x$. The values of $x$ may be classified in the order of the corresponding values of $P_x$ as follows:

(1) Numbers whose last digit is 0.
(2) Numbers whose last digit is 5.
(3) Numbers whose last digit is 2, 4, 6 or 8.
(4) Numbers whose last digit is 1, 3, 7 or 9.

[1] Gompertz' Law of Mortality as modified by Makeham. Cf. below, p. 98.

WESTERN RESERVE UNIVERSITY. 81

In those censuses in which the age question relates to the date of birth, rather than to the number of years lived, the same order holds good if we consider $x$ as representing the year of birth.

None of the factors which affect the values of $P'_x$ is of such a nature as to produce these regularly recurring relative maxima and minima. We must conclude that the $P_x$ series does not truly represent the $P'_x$ series. Hence the need of adjustment.

The legitimacy of applying some method of adjustment to the census age tables hinges on several considerations, chief among which is the greater probability of the $P'_x$ series forming a fairly smooth progression than of its conforming to the irregularities of the $P_x$ series. This probability is supported by both analysis and experience. As has been stated above, none of the factors which determine the form of the function $P'_x$ is of such a nature as to produce sudden irregularities in the series. It is certain, however, that changes in the birth and death rates, as well as variations in the amount and direction of migration, must leave their marks in the form of certain irregularities.
On this account it is probable that the curve representing the $P'_x$ series contains several points of inflexion [1] and that the first differences of the series do not always have the same sign. [2] Yet these flexures must not be confounded with the sharp angles of the curve representing the $P_x$ series. Without going into further details, we may conclude that it is not probable that in the age constitution of the population the principle of continuity is so far violated that the $P'_x$ series cannot be accurately represented by a fairly smooth curve. This conclusion, based on analysis of the controlling factors, is confirmed by experience. The greater the accuracy of a census, whether arising from special care in the enumeration or from the intelligence of the persons enumerated, the more nearly does the curve representing the age returns approach smoothness of form. Hence the legitimacy of applying a curve-smoothing process to the $P_x$ series.

[1] A point of inflexion occurs when a curve changes from concave towards the abscissa to convex, or vice versa.
[2] That is, while the values of $P'_x$ generally decrease as $x$ increases, in certain parts of the series the values of $P'_x$ increase with the values of $x$.

Before passing on to the subject of methods, it will be well to consider briefly the tests of a good adjustment. The thing desired is a smooth series which will adhere as closely as possible to the facts. The aim should be to eliminate the irregularities in the $P_x$ series caused by misstatement of age, while retaining those corresponding to actual irregularities in the age constitution of the population. In testing a particular method of adjustment we shall be aided by the fact that the real irregularities usually take the form of flexures in the curve covering a period of several terms of the series, while the irregularities caused by errors are more likely to appear as angular deflections, corresponding to abnormal values of single terms.
It follows that groups of terms are likely to be more accurate than single terms. This is especially the case when our knowledge of the form of error is sufficient to enable us to choose groups so constituted that the probability of the equality of positive and negative errors is a maximum. To obtain the closest agreement with the facts, such groups should contain as few terms as considerations of accuracy permit. The agreement of corresponding groups of terms of the adjusted and unadjusted series may be considered as one test of a good adjustment.

The various methods which have been used for the adjustment of age returns and for similar purposes can be classified for purposes of examination as: (1) Methods of Substitution, (2) Arbitrary Methods, (3) Methods of Averaging, (4) Methods of Algebraic Interpolation, (5) Graphic Methods.

I. Methods of Substitution.

Speaking accurately, what we have called "methods of substitution" are not methods of adjustment. Their distinguishing feature is the discarding of the census figures and the substitution of presumably more accurate figures gleaned from other sources. I have been able to find only one instance in which a complete table of the ages of a population has been prepared in this way. The age returns of the Italian census of 1881 were subjected to careful analysis, the results of which were published as a volume of the 'Annali di Statistica.' [1] Several adjusted age tables were prepared, in one of which the census returns were not used, except for the sum total of the population. [2] The arithmetic mean of the number of registered births in 1880, 1881 and 1882 was used as the basis of the calculation.
Then, on the somewhat naive assumption that the annual number of births had been increasing in arithmetical progression, the total number of births in the decade 1863-1872 was subtracted from the number in the decade 1873-1882, the difference was divided by 10, and the quotient was called the mean annual increase. The number of births in each of the hundred years preceding the census was thus easily computed, and the probable number of survivors at the date of the census was obtained by the use of Rameri's Italian life table. [3] The total number of males thus estimated to be living at the date of the census (13,822,447) was 442,936 less than the number enumerated in the census. The deficiency was ascribed to excess of emigration and other disturbing factors. The estimated numbers were therefore increased proportionately.

[1] 'Sulla Composizione della Popolazione per Età.' Rome, 1885. These studies were made in the 'Officio matematico della Statistica' under the direction of Luigi Perozzo.
[2] Op. cit., p. 5 and ff.
[3] The mean of the male births in 1880, 1881 and 1882 was 532,111. The mean annual increase in the number of births was found to be 2,215. Then if $S_x$ be the number of survivors at age $x$ out of 1,000,000 births, by the life table, the number of males $x$ years old living at the date of the census would be

$$\frac{532{,}111}{1{,}000{,}000}\,S_x\left(1 - \frac{2{,}215}{532{,}111}\,x\right).$$

It need not be pointed out that the method just described is extremely crude. For the Italian population, which has experienced a fairly regular rate of increase, it seems to have given passably good results. For a population subject to greater fluctuations in the rate of growth this method would give results very far from the truth. Even in the Italian case the goddess of chance must have been propitious.

There is less difficulty in substituting figures based on the registration of births and deaths for the earlier part of the age table.
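The Italian substitution estimate described above can be written as a short computation. The sketch below is our own illustration, not the procedure of the Italian office in detail: it takes the footnoted figures (mean male births 532,111 and mean annual increase 2,215) as given, treats the births $x$ years before the census as $532{,}111 - 2{,}215x$, multiplies by a survivor column $S_x$ per 1,000,000 births, and finally raises every age in the same proportion so that the total agrees with the enumerated total. The survivor column and census total in the example are hypothetical placeholders; Rameri's table is not reproduced, and the function names are ours.

```python
# Illustrative sketch of the "substitution" estimate; not the Italian office's code.
MEAN_BIRTHS = 532_111      # mean male births, 1880-1882 (footnote figure)
ANNUAL_INCREASE = 2_215    # assumed constant yearly increase in births (footnote figure)
RADIX = 1_000_000          # life-table survivors are per 1,000,000 births

def estimate_living(survivors, mean_births=MEAN_BIRTHS, increase=ANNUAL_INCREASE):
    """Estimated number living at each age x = 0, 1, 2, ...

    survivors[x] is S_x, the survivors at age x out of RADIX births.
    Births x years before the census are taken to be mean_births - increase * x,
    i.e. births are assumed to have risen in arithmetical progression."""
    return [(mean_births - increase * x) * s_x / RADIX
            for x, s_x in enumerate(survivors)]

def scale_to_total(estimates, census_total):
    """Raise (or lower) every age in the same proportion so that the
    estimates add up to the enumerated total."""
    factor = census_total / sum(estimates)
    return [e * factor for e in estimates]

# Hypothetical survivor column for ages 0-4 and a hypothetical census total.
toy_survivors = [1_000_000, 800_000, 760_000, 740_000, 730_000]
living = estimate_living(toy_survivors)
adjusted = scale_to_total(living, census_total=2_000_000)
```

The proportional step leaves the shape of the estimated age distribution untouched, so the result stands or falls with the arithmetical-progression assumption for births, which is the weakness the text points out.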
Substitution of registration figures was made for ages under 10 in the Italian studies mentioned above, and the results were incorporated in tables in which the higher ages were adjusted by other methods. [1] In the census of England and Wales, adjusted ages under five are obtained by distributing the number reported by the census among the single years in accordance with the registered births and deaths. [2] In a number of censuses in which no adjustment of the returned ages is attempted, the registration reports are made to furnish the material for tables showing the probable number of children of different ages living at the date of the census. [3] Comparisons of such tables and the census age tables often lead to the detection of errors in both.

The process of substitution at the lower ages is especially useful because none of the methods of adjustment which seem especially well adapted to the greater part of the age table is as successful when applied to these lower ages. This follows from the fact that the error in the reports of children's ages is of a different nature from that in the reports of the ages of adults, and from the high rate of mortality among young children. There seems, therefore, to be no reason why knowledge respecting the ages of children, gained from other sources, should not be made the basis of the adjustment of the lower terms of the $P_x$ series, if it is of a more accurate character than that furnished by the census. For the United States as a whole such knowledge is lacking, although this process of adjustment could be successfully applied to the age returns of some of our state censuses.

[1] Cf. Op. cit., or 'Censimento della Popolazione del Regno d'Italia al 31 Dicembre 1881,' Relazione Generale, p. 115.
[2] 'Census of England and Wales, 1891,' General Report, pp. 29, 105.
[3] Among the censuses which have followed this procedure are those of Germany, France, Hungary and Sweden.

II. Arbitrary Methods of Adjustment.
Under this head must be placed all attempts to smooth the age curve in which the justification for any particular change from the unadjusted figures is derived from the internal evidence yielded by an examination of the figures themselves. Such a method might be called analytical, were it not that the difficulties of a detailed analysis of the errors are so great that in applying the method one is compelled to rely very largely on more or less arbitrary assumptions.

An interesting attempt of this kind was made by Mr. W. W. Drew, Superintendent of the Bombay Census of 1891. Mr. Drew's adjustment illustrates so well both the possibilities and the limitations of this kind of adjustment that his description of it will be quoted at length. After showing that the numbers especially favored in the age reports are, in order, the multiples of ten, the multiples of five, and the multiples of two, Mr. Drew says: [1]

The conclusions that it seems to me to be quite fair to draw from this are that the ages not stated in round numbers are in the main correct; that where a person knows his age pretty thoroughly, but not the number of years quite accurately, he is more likely to record an even number than an odd one; that where he is more doubtful he will give the number ending in 5 or 0 that he thinks it to be nearest; and when he is quite uncertain he guesses at the nearest 10. For instance, a man of 43 who knew the exact day of his birth would record himself as such; but if he was not quite so accurate but knew his age pretty well he would be more likely to put himself down as 42 or 44 than 43. With still more uncertainty he would go to 45 or to 40; but he would not be likely to pass over the nearest number ending in 5, viz., 45, for a more distant one like 35 or 55; nor 40 […]

[1] 'Census of India,' 1891, Vol. VII, Bombay, Pt. 1, p. 55. For an account of an application of a similar method of adjustment to quinquennial age groups, see 'Census of India,' 1891, Assam, Vol. I, Report, p. 99.

[…] (D)

The earliest adjustments by formulas of this nature were of the simplest kind. Mr. J. Finlaison, an English actuary, in a report on the Law of Mortality of the Government Life Annuitants (March, 1829), described a method used by him in the graduation of mortality experience. He used a formula like (B) above, but consisting of five terms in place of three. Applying such a formula he obtained for his adjusted value:

$$P'_x = \tfrac{1}{25}\left[5P_x + 4(P_{x-1} + P_{x+1}) + \cdots + (P_{x-4} + P_{x+4})\right]$$

This formula is identical with that recommended by Th. Wittstein, a German actuary, nearly 40 years later. [1] Filipowski's method is very similar. [2] Assuming that $P_{x+\frac{1}{2}}$ should equal $\tfrac{1}{2}(P_x + P_{x+1})$, he deduced the following formula:

$$P'_x = \tfrac{1}{4}(P_{x-1} + 2P_x + P_{x+1})$$

In the formulas which have been given thus far there is one peculiarity to be noted. The principle on which they are all based is that in a series accurately representing the facts, each term would be the arithmetic mean of the terms in any group of which it is the median term. Now this relation holds true only in a series of the first order, that is, a series whose law is arithmetical progression. Such series are represented graphically by straight lines. Thus the effect of applying one of these formulas to either an age or a mortality curve would be to straighten it as well as to smooth it. If the unadjusted curve is so irregular as to require several applications of the adjustment formula, the error introduced by this straightening process will be considerable. The values of $P_x$ thus obtained will be too large when the curve is convex toward the axis of abscissas and too small when it is concave.

[1] Wittstein, 'Mathematische Statistik' (Hanover, 1867), p. 30; Journal of the Institute of Actuaries, XVII, pp. 418-420.
[2] Filipowski described his method in the 'Insurance Record' of December 9, 1870.
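The straightening effect described above can be checked with exact arithmetic. In the sketch below (our own illustration), Finlaison's weights are read as 1, 2, 3, 4, 5, 4, 3, 2, 1 over 25, which is what a five-term simple mean applied twice produces; this is our assumption about the terms elided in the printed formula. Filipowski's weights are 1, 2, 1 over 4. Applied to a straight line, both formulas return the line unchanged; applied to $P_x = x^2$, a curve convex toward the axis of abscissas, both return values that are too large. The helper names are hypothetical.

```python
from fractions import Fraction

def smooth(series, weights):
    """Apply a symmetric weighted mean with an odd number (2m + 1) of weights.
    Only interior terms are returned: the first and last m terms lack a full group."""
    m = len(weights) // 2
    return [sum(w * series[x + i - m] for i, w in enumerate(weights))
            for x in range(m, len(series) - m)]

def compose(a, b):
    """Weights equivalent to applying the weighted mean b and then a (a convolution)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

FIVE_TERM_MEAN = [Fraction(1, 5)] * 5
# Assumed reading of Finlaison's formula: the five-term mean applied twice,
# i.e. weights 1, 2, 3, 4, 5, 4, 3, 2, 1 over 25.
FINLAISON = compose(FIVE_TERM_MEAN, FIVE_TERM_MEAN)
FILIPOWSKI = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

ages = range(20)
straight = [Fraction(3 * x + 7) for x in ages]   # a series of the first order
convex = [Fraction(x * x) for x in ages]         # convex toward the axis of abscissas
```

On the convex series, Filipowski's formula raises every interior term by exactly 1/2 and Finlaison's wider formula by exactly 4, so the wider the group, the greater the straightening.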
However, we are not limited in our choice of adjustment formulas to those which presuppose a series of the first order. We can assume that the facts may be represented by a curve of any given order, [1] and construct our adjustment formulas accordingly. The general principles underlying the choice of these "weights" or coefficients seem to have been first considered by Schiaparelli, of Milan, in a study of the adjustment of meteorological observations. [2] It is not necessary to enter into a detailed discussion of the mathematical principles involved. The problem which Schiaparelli attacked may be stated as follows: given a group of $n$ terms, to find a combination of these terms that will represent the median term exactly if the values of the terms are exactly known, and which will give the closest approximation to the value of the median term when the observed values of the terms are subject to error.

Schiaparelli gives tables of the probable errors of the adjusted terms for series of different orders, but of the general parabolic form:

$$P_x = A + Bx + Cx^2 + \cdots$$

[1] It is not necessary that a curve of the order chosen should represent the facts for the entire series. It is sufficient if it fits the conditions of every group of $n$ terms.
[2] 'Sul modo di ricavare la vera espressione delle leggi della natura delle curve empiriche' [Milan, 1867].

The coefficients of the various terms for curves of different orders are easily obtained by substituting in the general formula (D) values derived from the principle that in a curve of the $n$th order, the $n$th differences are constant. The peculiar interest of Schiaparelli's work to the student of age statistics lies in the fact that it was the basis of several adjustments of the age reports of the Italian census of 1881.
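Schiaparelli's problem can be made concrete as follows. We assume, as our own reading, that "closest approximation" means the smallest error variance when every term carries an independent error of the same size; the weights must also reproduce the median term exactly for every polynomial of the chosen order. Under that assumption the weights follow from a small linear system, and they coincide with those of a local least-squares polynomial fit. This is a sketch of the idea, not Schiaparelli's own computation, and the function names are hypothetical.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Gauss-Jordan elimination in exact Fraction arithmetic (small systems only)."""
    n = len(matrix)
    rows = [list(r) + [b] for r, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [v / lead for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return [r[-1] for r in rows]

def min_variance_weights(n_terms, order):
    """Weights for offsets -h..h (n_terms = 2h + 1) that
    (1) return the median term exactly for any polynomial of degree <= order, and
    (2) minimise the sum of squared weights, i.e. the error variance when each
        term carries an independent error of the same size (our assumption).
    The minimiser has the form w_i = sum_k lam_k * i**k, where the moment
    matrix M[j][k] = sum_i i**(j + k) satisfies M lam = (1, 0, ..., 0)."""
    h = n_terms // 2
    offsets = [Fraction(i) for i in range(-h, h + 1)]
    moments = [[sum(i ** (j + k) for i in offsets) for k in range(order + 1)]
               for j in range(order + 1)]
    unit = [Fraction(1)] + [Fraction(0)] * order
    lam = solve(moments, unit)
    return [sum(l * i ** k for k, l in enumerate(lam)) for i in offsets]
```

For three terms and the first order this gives the simple mean of three. For five terms and the second order it gives $(-3, 12, 17, 12, -3)/35$; raising the order to three leaves these weights unchanged, because the odd-power conditions are already satisfied by symmetry.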
An account of the Italian adjustments may be found in the studies already mentioned in our account of adjustment by substitution. [1] The simplest form of adjustment was used, the formulas being

$$P'_x = \tfrac{1}{3}(P_{x-1} + P_x + P_{x+1})$$

and

$$P'_x = \tfrac{1}{5}(P_{x-2} + P_{x-1} + P_x + P_{x+1} + P_{x+2}).$$

These formulas, as we have stated above, are accurate only when applied to a series of the first order. Now, whatever may be the form of the age curve, it is certainly not a straight line. To obtain the final adjustment in the case mentioned, the three-term formula was applied twice to the crude returns in quinquennial groups. Values for single years were then interpolated, and the results were in turn smoothed by applying the three-term formula twice and the five-term formula three times. The resulting series is certainly "smooth," but how closely it represents the age constitution of the population at the time of the census is an open question. A graphic representation of the adjusted curve shows that it approaches the form of a straight line. The adjustment does not conform to one of the tests of a good adjustment: the general agreement of corresponding groups of terms of the adjusted and unadjusted series.

[1] Annali di Statistica, "Sulla Composizione della Popolazione per Età," Rome, 1885. Cf. especially pp. 61 and ff. A short account of the method of adjustment used, together with some of the results, will be found in "Censimento della Popolazione del Regno d'Italia al 31 Dicembre 1881," Relazione Generale, pp. xxxix-xliv and 113-115.

The justification given in the studies under consideration for the use of these simple forms of the adjustment formula is that the probable errors, as determined by Schiaparelli, were less for two successive applications of a formula of the first order than for one application of a formula of the second order. [1] It is hard to see what this fact has to do with the matter.
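Both the Italian argument and the objection to it can be checked numerically. The sketch below (our own illustration) composes repeated passes of the three-term and five-term means, computes the sum of squared weights (the error variance of an adjusted term relative to one term, assuming independent errors of equal size), and computes the amount each formula adds to every term of $P_x = x^2$. As a stand-in for "a formula of the second order" we assume the five-term weights $(-3, 12, 17, 12, -3)/35$, the least-squares weights exact for a parabola. The composite labelled `italian` covers only the single-year passes, not the earlier smoothing of quinquennial groups.

```python
from fractions import Fraction

def compose(*passes):
    """Weights equivalent to applying several symmetric weighted means in turn."""
    kernel = [Fraction(1)]
    for p in passes:
        out = [Fraction(0)] * (len(kernel) + len(p) - 1)
        for i, a in enumerate(kernel):
            for j, b in enumerate(p):
                out[i + j] += a * b
        kernel = out
    return kernel

def variance_factor(weights):
    """Sum of squared weights: error variance of an adjusted term relative to
    that of a single term, assuming independent errors of equal size."""
    return sum(w * w for w in weights)

def curvature_bias(weights):
    """Amount the formula adds to every term of P_x = x**2, i.e. the sum of
    w_i * i**2 over offsets i centred on the median term."""
    h = len(weights) // 2
    return sum(w * (i - h) ** 2 for i, w in enumerate(weights))

THREE = [Fraction(1, 3)] * 3
FIVE = [Fraction(1, 5)] * 5
SECOND_ORDER_FIVE = [Fraction(k, 35) for k in (-3, 12, 17, 12, -3)]   # assumed stand-in

twice_three = compose(THREE, THREE)                  # weights 1, 2, 3, 2, 1 over 9
italian = compose(THREE, THREE, FIVE, FIVE, FIVE)    # 17 single-year weights
```

Two passes of the three-term mean have a variance factor of 19/81, well below 17/35 for the second-order formula, which is the Italian argument. Yet the two passes add 4/3 to every term of $x^2$ while the second-order formula adds nothing; the full single-year sequence spreads each adjusted term over 17 years and adds 22/3.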
Schiaparelli deduced his coefficients of error on the assumption of an agreement between the order of the series and the adjustment formula. That is, he assumed that the adjustment formula was fitted to the form of the curve to be adjusted. If we apply a formula fitted for the adjustment of a series of the first order to a series of the second or third order, we introduce an error much greater than the difference between the "probable errors" of two adjustment formulas of the same form but of different orders. In fact, it is very certain that successive groups of terms of an accurate age series cannot be represented by curves of either the first or the second degree. Such curves cannot contain points of inflexion, which, as we have seen, are usually present in age curves.

Against the theory of adjustment upheld in the Italian age studies, it may further be stated that the errors in age reports cannot be treated as errors of observation. This point will be treated at greater length in another part of this study.

Some important contributions to the theory of the adjustment of irregular series were made by E. L. De Forest, whose first article on the subject appeared in the 'Smithsonian Report' for 1871. [2] De Forest devoted considerable study to different methods of obtaining the weights or coefficients used in adjustment formulas, and reached some very interesting results. He first used formulas in which the weights increased in arithmetical progression from the extreme terms to the middle one. These were discarded for formulas in which the coefficients formed a curve of the sixth order tangent to the axis of $x$ at the zero weights, that is, at the first terms not included in the formula. Another method which gave good results consisted in finding those values of the weights which would make the probable value of the fourth differences a minimum. This method assumes that the mortality curve (which De Forest had especially in mind) may be closely represented by successive curves of the third order. As such curves admit a point of inflexion, the assumption does not seem to be unfounded.

[1] Loc. cit., p. 75.
[2] For other articles by De Forest on this subject see The Analyst, Vols. IV and V; also a pamphlet on 'Interpolation and Adjustment of Series,' New Haven, 1876.

De Forest suggested several other ingenious methods of obtaining adjustment coefficients. In his subsequent articles in 'The Analyst' these methods were perfected. It must be said that De Forest's studies are the most thorough that have been made with reference to this kind of adjustment. However, his methods are not especially applicable to census data and probably have never been used for adjusting age tables.

The next method to be considered under this head is one devised by Mr. W. S. B. Woolhouse, and first used by him in graduating the $H^M$ table for the Institute of Actuaries. His description of it is as follows: [1]

"If we begin at the first age in the table and extract the numbers living at quinquennial intervals, $P_{10}, P_{15}, P_{20}, P_{25}, \ldots$ we can, by the formula for interpolation, determine all the intermediate values at the other ages, and so obtain a complete series of values that shall be continuous. Geometrically speaking, we shall then pass a continuous curve-line through the indicated quinquennial points. Against the adoption of such curve lines as the basis of the final table, there is manifestly this tangible objection, that the numbers at the ages of 10, 15, 20, 25 are made use of exclusively, and that the original numbers between those ages are wholly ignored as data.
This rather material objection, which is inherent in other methods of adjustment, is entirely removed by varying the epoch of the adopted quinquennial data, that is, by taking the five distinct series hereunder stated, viz.:

$$\begin{array}{llll}
P_{10} & P_{15} & P_{20} & P_{25} \\
P_{11} & P_{16} & P_{21} & P_{26} \\
P_{12} & P_{17} & P_{22} & P_{27} \\
P_{13} & P_{18} & P_{23} & P_{28} \\
P_{14} & P_{19} & P_{24} & P_{29}
\end{array}$$

Then by separately interpolating the intermediate values for each of these series, and by finally taking the arithmetical average, or mean value, of the five completed sets of results. The logical premise that virtually guides us to this last deduction is the recognized principle, that the probabilities of positive and negative errors are equal. Reverting again to a graphical illustration, all the points of the original data are thus occupied by five distinct curves, assimilating to the experience and to one another, and forming in combination a kind of network; and at every age the resulting ordinate of the adjusted curve is the arithmetical mean of the five corresponding ordinates, and the five curves are as it were mutually drawn in towards a central course."

[1] Journal of the Institute of Actuaries, XV, pp. 390, 391.

It may seem that this method is in one sense a method of interpolation, rather than a method of averaging. But its essential principle is that the value of the adjusted term is the mean of the values of five other terms. Moreover, Mr. Woolhouse has moulded the rather cumbersome process described above into the application of a formula similar to those described by Schiaparelli and De Forest. Thus:

$$P'_x = .20P_x + .192(P_{x-1} + P_{x+1}) + .168(P_{x-2} + P_{x+2}) + .056(P_{x-3} + P_{x+3}) + .024(P_{x-4} + P_{x+4}) - .016(P_{x-6} + P_{x+6}) - .024(P_{x-7} + P_{x+7})$$

This formula seems to be the best of its kind, and for many years was standard among actuaries. Mr. Woolhouse's use of quinquennial intervals gives his method certain advantages in its application to census figures. [1] If the percentage of reported ages concentrated on multiples of 5 were the same in each quinquennial group, and if the other errors of statement were distributed proportionately on the first, second, third, fourth, and fifth ages of each quinquennial group, the Woolhouse method would give a very close approximation to the facts. Of course, the errors in an age table are not distributed so evenly, but there is enough correspondence between the actual conditions and the hypothetical conditions just considered to afford some justification for the use of the Woolhouse method.

[1] Mr. Woolhouse did not have census problems in mind when he evolved his formula. He states that his reason for adopting an interval of five terms had reference chiefly to ease of computation. [Jour. Inst. Act., XXIX, 237.]

There is, however, one important objection to the use of what we have called "methods of averages" in the adjustment of census age returns. All of the adjustment formulas considered under this head were constructed with a view to the elimination of accidental errors. By accidental errors we mean natural errors arising from paucity of observations: errors whose distribution may be assumed to be effected by chance. With accidental errors the principle holds good that errors of a given amount are as liable to occur in one term as in another. That a given term has a positive error affords no basis for presuming that the error of the next term will be negative. The errors in the age table are not of this nature. They are systematic errors and take certain definite forms. There are undoubtedly some accidental errors in the age tables, but they are few in number, and tend to neutralize each other.
Schiaparelli's formulas were intended for the reduction of meteorological observations, while those of Woolhouse and De Forest were intended for the graduation of mortality tables based on the experience of life insurance companies. The errors in these tables are mainly such as would be eliminated if the number of observations were indefinitely large, and the application of adjustment formulas based on the general theory of errors is entirely justifiable. [1] Physical observations, such as those made in astronomical research, are often subject to both kinds of errors. Before an astronomer can apply the "method of least squares" with a view to obtaining the most probable value of a number of observations, he must first eliminate the known errors. Until the known errors are eliminated, the law of error is not applicable, as it relates solely to residual errors. [2]

[1] It is stated in the Census of Assam (India) for 1891 [Vol. I, p. 97] that the errors in census age reports "tend to eliminate each other" and that "the approach to absolute correctness" varies "in direct proportion with the number of persons included in the returns." This statement is without foundation.
[2] Cf. Sorley, 'Observations on the Graduation of Mortality Tables,' Jour. Inst. Act., XXII, 811.

It should also be remembered that the errors in census age tables are usually very large, while the adjustment formulas are adapted only to comparatively small errors. In order to smooth the age curve, it would usually be necessary to make several applications of any of these formulas. The necessary difference between the actual form of the age curve and the form assumed in the construction of any formula which might be used would in this way introduce a considerable element of error. This point need not be developed further, as it has already been brought out in the discussion of adjustment by simple arithmetical averages.
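The Woolhouse formula quoted earlier in this section can be examined in the same way. The sketch below is our own illustration; it writes the coefficients in units of 1/125 (25, 24, 21, 7, 3, 0, -2, -3 for offsets 0 to 7), taking the last coefficient as negative, the sign needed for the weights to sum to one. It checks three things on a hypothetical series: the formula reproduces any cubic exactly; it removes any error pattern that repeats every five years and sums to zero over the five; and it does not fully remove a pattern that repeats every ten years, such as a stronger preference for multiples of 10 than for multiples of 5. The series and pattern values are invented for the check.

```python
from fractions import Fraction

# Woolhouse's coefficients in units of 1/125 for offsets 0, 1, ..., 7
# (.200, .192, .168, .056, .024, 0, -.016, -.024); the formula is symmetric.
HALF = [25, 24, 21, 7, 3, 0, -2, -3]
WOOLHOUSE = [Fraction(w, 125) for w in HALF[:0:-1] + HALF]   # offsets -7..7

def woolhouse(series):
    """Adjusted values for every term that has seven neighbours on each side."""
    return [sum(w * series[x + i - 7] for i, w in enumerate(WOOLHOUSE))
            for x in range(7, len(series) - 7)]

ages = range(40)
cubic = [Fraction(100_000 - 50 * x ** 2 + x ** 3) for x in ages]   # hypothetical smooth series
five_year = [4, -1, -1, -1, -1]             # heap on ages ending in 0 or 5, zero sum over five
heaped = [c + five_year[x % 5] for x, c in zip(ages, cubic)]
ten_year = {0: 1, 5: -1}                    # multiples of 10 favoured at the expense of 5
decade = [c + ten_year.get(x % 10, 0) for x, c in zip(ages, cubic)]
```

Every residue class of offsets modulo 5 carries exactly one fifth of the total weight, which is why a five-year pattern summing to zero disappears. With the ten-year pattern, one fifth of it remains at the ages ending in 0 and 5, a small instance of the point that the formula can be justified only "in a measure" for census returns.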
As has been suggested, the Woolhouse formula happens to be so constructed that its use in the adjustment of census age returns can be in a measure justified. But as Mr. Woolhouse himself said in the passage quoted above, it is based "on the recognized principle that the probabilities of positive and negative errors are equal." Since this hypothesis does not apply to the periodic errors of the census age tables, the use of the Woolhouse formula cannot be recommended. What we want is a method of adjustment based, not on the theory of errors, but on the peculiar conditions of the problem under consideration.

It has seemed necessary to discuss this matter at some length, for in at least three instances methods of adjustment based on the theory of errors have been applied either to the census age returns or to mortality tables based on such returns. [1] Of course, what has been said against the use of these adjustment formulas for the smoothing of the age curve applies with equal force against their use in smoothing mortality tables based on census statistics.

[1] Cf., besides the Italian studies already mentioned, 'Census of New South Wales,' 1891, Statistician's Report, p. 150; 'Eighth Census of the United States,' 1860, Mortality and Miscellaneous Statistics, p. 518.

IV. Methods of Algebraic Interpolation.

From a mathematical standpoint, interpolation is not a method of adjustment, but a method of obtaining values of the intermediate terms of a series from known values taken at fixed intervals.
The ordinary method of interpolation is by the use of finite differences, although in theory any method by which the values of the constants in an assumed equation can be obtained may be used as a method of interpolation.¹

¹ De Forest, in his article in the 'Smithsonian Report' of 1871, already mentioned, develops a method of interpolation based on the principle that "in a continuous series whose law is given or assumed, the sum of a limited number of terms can be regarded as a definite integral, which is the aggregate of a succession of similar integrals corresponding to the terms considered" [Loc. cit., p. 277]. When the known terms of a series are subject to errors of observation, the "method of least squares" is most advantageous.

In the special problem presented by the census age returns, the particular method of interpolation is less important than the method of securing the "known values at fixed intervals," since none of the terms of the Px series can be supposed to be accurate. In the adjustment of the age returns of the English census, this difficulty is surmounted by graduating the Qx instead of the Px series. (Qx represents the number living at and above any age x, so that $Q_x = P_x + P_{x+1} + P_{x+2} + \cdots$.) It has been shown that the value of a group is more likely to be in accordance with the facts than is the value of a single term. The difference in the accuracy of groups and of single terms is more marked when the form of group used is chosen with reference to the nature of the error in the reported ages. Thus it is assumed in the case of the English census that the most accurate groups are the decennial groups P5–P14, P15–P24, P25–P34, etc., in which the year of greatest concentration is the median year. Now the values of these groups are the differences between the values of Qx taken at corresponding decennial intervals (for example, $P_5 + P_6 + \cdots + P_{14} = Q_5 - Q_{15}$), and the error in the Qx series is no larger¹ than the error of the series formed by the decennial groups. Here, then, we have "known values at fixed intervals" which may be conveniently used for interpolation. It should be noted that the terms of the Px series are the first differences of the Qx series ($P_x = Q_x - Q_{x+1}$). In interpolation by finite differences the first differences of the adjusted series cannot be expected to progress as smoothly as the series itself. This difficulty is partly overcome in the practice of the English census by interpolating the values of log Qx.²

¹ The relative error is less.
² By using the logarithm a constant percentage of error is changed to a constant amount of error.

This method of adjustment seems better adapted to census figures than any of the other methods thus far examined. But the form of age groups used in the English practice seems open to criticism. In the first place, a group of ten terms is too large: it covers up too many of the real irregularities of the age series. Moreover, the placing of the year of greatest concentration in the center of the group probably retains a greater error in the series than would be present if the first term of each group were a year of concentration. This follows from the fact that analysis of age returns shows that many more persons under-state their ages than over-state them.¹

In the appendix to this article will be found a table showing the results of applying this method of interpolation to the returns of the United States census of 1890. The process used was as described above, except that the fixed values used were log Q10, log Q15, log Q20, etc. It will be seen that the Px curve is fairly smooth, with the exception of a few points at the junctures of the quinquennial groups. These are due to the fact that the irregularities in the progression of the groups are reflected in the first differences of the Qx series.²

¹ Cf. my article on "The Comparative Accuracy of Different Forms of Quinquennial Age Groups," Quarterly Publications of the American Statistical Association, March, 1900.
² The adjustment was made by the use of four orders of central differences. For a good description of various methods of applying the principle of finite differences to interpolation, see the chapters on 'Finite Differences' and 'Interpolation' in the 'Text Book of the Institute of Actuaries,' Vol. II. See also Rice, 'Theory and Practice of Interpolation'; Boole, 'Finite Differences'; and Bowley, 'Elements of Statistics,' p. 242 ff.

The only other method of interpolation that needs consideration in this connection is interpolation by an exponential formula; more specifically, by the Gompertz-Makeham law of mortality. This formula may be written:¹

$P_x = k\,a^{x}\,b^{c^{x}}$

in which k, a, b, and c are constants determined from known terms of the series. This formula has been much used in actuarial practice, and has been found to express the decrease in the numbers living from the age of 10 upwards with a close degree of approximation.

It is evident that no formula of this kind can be expected to represent the age constitution of an actual population. It takes account of only one of the factors which determine the age constitution of the population, namely the force of mortality. This formula is mentioned here because it was used by Mr. G. F. Hardy in the graduation of the age tables of the census of India of 1881.² Mr. Hardy retained the results given by the formula only for the ages above 60. For the lower ages his results were used "simply as a base line by which to adjust, by the graphic method, the actual numbers recorded in the Census Returns, the adjusted curve being made to run into this base line about age 60."³ As Mr. Hardy's adjustment was made for the purpose of constructing life tables, which are intended to represent only the normal mortality, his procedure seems entirely justifiable.
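The text says only that k, a, b, and c are "determined from known terms of the series." One classical way to do this, assumed here rather than taken from Mr. Hardy's report, uses four equidistant values: since log Px = log k + x log a + c^x log b, the second differences of log Px at interval h form a geometric series with ratio c^h, from which c, then b, a, and k follow. The sketch below (function names and constants hypothetical) fits the formula exactly through four values, the case in which, as the article goes on to say, the formula becomes merely a cumbersome method of interpolation.

```python
import math

# Sketch only. Assumption: the four constants are fitted exactly through four
# equidistant values; the article does not say how Mr. Hardy fitted his.


def makeham(x, k, a, b, c):
    """Gompertz-Makeham form P_x = k * a**x * b**(c**x)."""
    return k * a**x * b ** (c**x)


def makeham_from_four(x0, h, values):
    """Constants (k, a, b, c) through values at x0, x0 + h, x0 + 2h, x0 + 3h.

    log P_x = log k + x log a + c**x log b, so the second differences of
    log P_x at interval h form a geometric series with ratio c**h.
    """
    if len(values) != 4:
        raise ValueError("need exactly four known values")
    logs = [math.log(v) for v in values]
    d1 = [logs[i + 1] - logs[i] for i in range(3)]  # first differences
    d2 = [d1[i + 1] - d1[i] for i in range(2)]      # second differences
    ch = d2[1] / d2[0]                              # equals c**h
    c = ch ** (1.0 / h)
    cx0 = c**x0
    log_b = d2[0] / (cx0 * (ch - 1.0) ** 2)
    log_a = (d1[0] - cx0 * (ch - 1.0) * log_b) / h
    log_k = logs[0] - x0 * log_a - cx0 * log_b
    return math.exp(log_k), math.exp(log_a), math.exp(log_b), c


if __name__ == "__main__":
    constants = (100000.0, 0.995, 0.999, 1.1)  # hypothetical k, a, b, c
    known = [makeham(x, *constants) for x in (20, 30, 40, 50)]
    fitted = makeham_from_four(20, 10, known)
    print(fitted)  # reproduces the constants up to rounding
    print(makeham(35, *fitted), makeham(35, *constants))  # interpolated vs. generating value
```

The sketch assumes the data actually follow the formula; with real returns the ratio of second differences may be negative or erratic, which is one more sign that such a formula cannot represent the age constitution of a population.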
For the purpose of obtaining a true statement of the age constitution of an enumerated population, however, arbitrary formulas of this nature are useless, unless, indeed, the number of constants in a formula is made equal to the number of known values which it is desired to retain in the adjusted curve. In such cases, however, an arbitrary formula loses any peculiar claim to represent a "law of mortality" and becomes a rather cumbersome method of interpolation.

¹ This formula is based on the supposition that there are two components in the law of human mortality: one expressing a chance distribution of deaths; the other expressing the decrease (with advancing age) of the ability to "withstand the forces tending to close life."
² Report on the Census of British India taken on the 17th February, 1881, Vol. I, pp. 160–162.
³ Loc. cit., p. 161.

V. Graphic Methods.

If an irregular series be represented graphically, and if a smooth curve be drawn which follows the general form of the irregular curve as closely as possible, we have an example of adjustment by the graphic method. If the terms of an irregular series are in successive groups, these groups may be represented by successive parallelograms, the width of each parallelogram being fixed by the number of terms in the corresponding group, and its height by the average value of the terms. Then if a smooth curve is drawn through the upper ends of the parallelograms, in such a manner as to add to each group as much area as it cuts off, we have an example of interpolation by the graphic method.¹

This method has been used in several censuses. Its advantages are manifest. It can be made to give as smooth a curve as is desired, and its application requires no mathematical knowledge. On the other hand, it has the manifest disadvantage that no two persons in applying it would obtain exactly the same results.
If any one using the method had preconceived ideas as to the form of the age curve, they would very probably be reflected in his results. Moreover, the small scale on which the curve must in practice be drawn prevents a close reading of the values of the interpolated terms. Probably a very good curve could be obtained by readjusting the results of a graphic adjustment by means of the Woolhouse formula. The same considerations as to the form of age groups to be used apply to the graphic method as to the method of differences.

¹ The best description of the graphic method, although from the standpoint of a partisan, is by T. B. Sprague, 'The Graphic Method of Adjusting Mortality Tables,' Jour. Inst. Act., XXVI, pp. 77–112. See also W. S. Jevons, 'Principles of Science,' Sec. on 'The Graphical Method.' It may be of interest to note that the graphic method was used by Milne in the graduation of his famous Carlisle life tables. See the articles by Wm. Sutton and Geo. King on the 'Method used by Milne in the Construction of the Carlisle Table of Mortality' in Vol. XXIV of the Jour. Inst. Act.

General Considerations.

It seems that of the methods discussed, interpolation by differences and the graphic method possess certain advantages when the curve to be smoothed is a census age table. Of the two, the graphic method is the more flexible, and will possibly give a smoother curve. It would appear, however, that these advantages are more than balanced by the fact that the method of interpolation by differences is more definite and objective: it will give identical results if applied independently by two or more persons.¹ The principles underlying interpolation by differences can be readily comprehended by anyone whose knowledge of algebra includes the binomial theorem.
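To make the "definite and objective" character of interpolation by differences concrete, the procedure the article describes for the 1890 returns (fixed values of log Qx at ages 10, 15, 20, and so on; interpolation between them; first differences to recover Px) can be sketched as follows. This is an illustration under stated assumptions, not the author's computation: it uses four-point (cubic) Lagrange interpolation in place of the four orders of central differences named in the footnote on the 1890 adjustment, and the function names and synthetic data are hypothetical. Because Qx is reproduced exactly at every pivot age, each quinquennial group of adjusted Px sums to the reported group total.

```python
import math

# Sketch only. Assumptions: cubic (four-point) Lagrange interpolation of
# log Q_x stands in for the author's central-difference formula; names and
# data are hypothetical.


def lagrange(xs, ys, x):
    """Value at x of the polynomial through the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def adjust_by_log_q(p, start=10, step=5):
    """Adjust single-year counts p[0], p[1], ... by interpolating log Q_x.

    Q_x (number at and above age x) is kept exactly at the pivot ages
    start, start + step, ...; between pivots log Q_x is interpolated with a
    cubic through the four nearest pivots. Returns {age: adjusted P_x} for
    ages start up to (but not including) the last pivot.
    """
    # Q_x = p[x] + p[x + 1] + ... ; q[len(p)] = 0
    q = [0.0] * (len(p) + 1)
    for x in range(len(p) - 1, -1, -1):
        q[x] = q[x + 1] + p[x]

    pivots = list(range(start, len(p), step))
    if len(pivots) < 4 or q[pivots[-1]] <= 0:
        raise ValueError("need four or more pivot ages with a positive Q_x")
    log_q = {x: math.log(q[x]) for x in pivots}

    adj_q = {}
    for x in range(start, pivots[-1] + 1):
        m = (x - start) // step                  # pivot at or just below x
        j = min(max(m - 1, 0), len(pivots) - 4)  # four-point window around x
        xs = pivots[j:j + 4]
        adj_q[x] = math.exp(lagrange(xs, [log_q[v] for v in xs], x))

    # P_x is the first difference of Q_x
    return {x: adj_q[x] - adj_q[x + 1] for x in range(start, pivots[-1])}


if __name__ == "__main__":
    # Hypothetical reported counts: a smooth decline with ages divisible by
    # five inflated by 30 % and all other ages deflated by 7.5 %.
    smooth = [1000 * math.exp(-0.03 * x) for x in range(91)]
    reported = [s * (1.3 if x % 5 == 0 else 0.925) for x, s in enumerate(smooth)]
    adjusted = adjust_by_log_q(reported)
    print(reported[40] / reported[41], adjusted[40] / adjusted[41])
```

On this synthetic series the heaping at 40, 45, and so on is largely removed while every group total from 10–14 to 85–89 is kept; ages below the first pivot are left unadjusted, in keeping with the article's remark that the adjusted series need not begin at the youngest ages.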
Both the graphic method and interpolation by differences give results which can be arranged in age groups whose totals will correspond to the totals of the corresponding groups of enumerated ages.

The graphic method fails completely at the more advanced ages. The values of the groups of terms above P85 are so small that they do not afford a sufficient basis for a graphic adjustment. The method of differences is more successful, but does not give as good results as at the lower ages. The percentage of error in the reported ages above 80 or 85 is very high, and the error is of a different nature from that found in the reports of lower ages: persons above 80 or 85 systematically over-state their ages. If the adjustment of these higher ages is necessary, it could perhaps best be done by using the decennial groups 85–94 and 95+, which are probably more accurate (for these ages) than the quinquennial groups recommended for the lower ages.

For ages below 20 there is no perceptible concentration on multiples of five. Hence it is probably sufficient to begin the adjusted series with P20. It may be added that no method of adjustment is satisfactory when applied to ages below five. The error here also takes the form of over-statement. This, coupled with the high mortality, makes it difficult to improve upon the reported figures, except by substituting figures drawn from more accurate sources.

¹ Assuming, of course, that the different computers use the same orders of differences and the same formula.

Allyn A. Young.

APPENDIX.

AGES S OF THE PoPUI