FORECASTING THE YIELD AND THE PRICE OF COTTON

BY HENRY LUDWELL MOORE
PROFESSOR OF POLITICAL ECONOMY IN COLUMBIA UNIVERSITY
AUTHOR OF "ECONOMIC CYCLES: THEIR LAW AND CAUSE," AND OF "LAWS OF WAGES"

"We have to contemplate social phenomena as susceptible of prevision, like all other classes, within the limits of exactness compatible with their higher complexity."
Auguste Comte.

THE MACMILLAN COMPANY, 1917. All rights reserved.
Copyright, 1917, by the Macmillan Company. Set up and printed. Published October, 1917.

CONTENTS

CHAPTER I
Introduction

CHAPTER II
THE MATHEMATICS OF CORRELATION
A Frequency Distribution
The Standard Deviation as a Measure of Dispersion
The Fitting of Straight Lines to Data
The Coefficient of Correlation

CHAPTER III
THE GOVERNMENT CROP REPORTS
The Character and the Aim of the Crop-Reporting Service
Technical Terms: Normal, Condition, Indicated Yield per Acre
The Accuracy of Forecasts Tested
Acreage and Production

CHAPTER IV
FORECASTING THE YIELD OF COTTON FROM WEATHER REPORTS
The Official Forecasts of the Yield of Representative States
Forecasting the Yield of Cotton from the Accumulated Effects of the Weather
The Results Compared for the Representative States
Three Possible Objections
CHAPTER V
THE LAW OF DEMAND FOR COTTON
Two Practical Methods of Approach
Statics and Dynamics Discriminated
A Complete Solution of the Problem

CHAPTER VI
Conclusions

CHAPTER I
INTRODUCTION

An eminent economist has recently told us that economists no longer talk so confidently as they once did of forecasting social phenomena, and that, confronted with the complexity of social relations, "the sober-minded investigator will be slow in laying too much stress on single causes, slow in generalization, slowest of all in prediction." An equally distinguished statistician has warned his colleagues of the dangers of using refined mathematical methods in the treatment of the loose data supplied by our official bureaus. These are authoritative warnings, and I have not been unmindful of them as the successive theses of this Essay have been developed. But the ultimate aim of all science is prediction; the most ample and trustworthy data of economic science are official statistics; and the only adequate means of exploiting raw statistics are mathematical methods.

The statistical devices used in the treatment of our problem of forecasting prices were, for the most part, invented for another purpose by Professor Karl Pearson, and rest upon the theory of probabilities. Of the Pearsonian superstructure one may repeat what Laplace has said of its foundation: that "the theory of probabilities is, at bottom, nothing but common sense reduced to calculation; it enables us to appreciate with exactitude what sound minds (les esprits justes) feel by a sort of instinct, often without being able to account for it."
On the Cotton Exchanges of the world there are always certain speculators, les esprits justes of the commodity market, who seem to know by a kind of instinct the degrees of significance to attach to Government crop reports, weather reports, changes in supply and demand, and the movements of general prices. Mathematical methods of probability reduce to system the extraction of truth contained in official statistics and enable the informed trader to compute, with relative exactitude, the influence upon prices of routine market factors.

The Department of Agriculture of the United States, referring to the use of its crop-reporting service, has briefly described the aim of its admirable statistical organization:

"Everything . . . which tends toward certainty, as regards either supply or demand, is distinctly advantageous to the farmer. Hence to throw light on future conditions and do away, as far as possible, with uncertainties as to supply and demand, is the principal object of the statistical work of the Department of Agriculture and constitutes the sole reason for the collection of data and the publication of information regarding current accumulations of farm products and concerning crop conditions and prospects. In so far, therefore, as these data are accurate and reliable (qualities which depend on the integrity and intelligence of crop correspondents and their interest in the work), the publication of the information secured can not fail to reduce the uncertainty regarding the future values of farm products, and thus have an important cash value to all farmers."

Without a doubt great values are at stake. If the size of the cotton crop of 1914 is taken as a standard, an error in an official crop report which should lead to an ultimate depression of one cent a pound in the price of cotton lint would cost the farmers $80,000,000, or more.
A corresponding error leading to a similar rise in price would entail upon manufacturers and consumers a comparably heavy loss. The Department of Agriculture has rendered its reports continuously for some fifty years, and yet, as far as I am aware, no one has either measured the degree of accuracy of the information it supplies "concerning crop conditions and prospects," or attempted to see whether, by different methods, more truth might not be gained from the stores of raw figures that its Bureaus collect.

Government Departments seeking appropriations are very likely, out of administrative necessity, to stress their successes and suppress their failures. In the January number of the official Crop Reporter, for 1900, this illustration is given of the value to planters of the Government crop-reporting service:

"The past year has afforded a striking example of the influence of the reports on prices. As early as August the Division of Statistics called attention to the prevailing drought and its deleterious effect upon the growing crops. July 1 the average condition reported was 87.8; August 1 it was 84; September 1 it was 68.5; October 1, 62.4; resulting in an average estimated yield of lint cotton per acre of 184 pounds. Now, the lowest price of futures in the New York Cotton Exchange during 1899 was reached June 29 when July deliveries sold at 5.43. The highest price was November 9, when July deliveries sold at 7.74. Commercial authorities of high standing had strongly disputed the position taken by the Division of Statistics, their estimates running as high in some cases as 12,000,000 bales. To-day the Department estimate of December 10 of 8,900,000 bales is generally conceded to be very close to the truth, even by these same commercial authorities.
While, therefore, the effect of these overestimates was only temporary, it was, nevertheless, sufficient to cause a loss of several millions to the cotton planters."

This is a success defiantly stressed. We shall have to take but a step to come upon a failure ingloriously suppressed. The reference to the preceding instance is given in the January number of the Crop Reporter, for 1900, p. 2. But in this same year 1900, the Crop Reporter for July, p. 2, gives the following account of the condition and prospects of the cotton crop for the current year:

"Not only was the condition on July 1 for the cotton region as a whole the lowest July condition on record, but in Georgia, Florida, Alabama, and Mississippi also it was the lowest in the entire period of 34 years for which records are available, while in Tennessee it was the lowest with one exception and in South Carolina, Texas, and Arkansas the lowest with two exceptions in the same period of 34 years. Excessive rains, drowning out the crop, and followed by an extraordinary growth of grass and weeds, are reported for almost every State, and the gravity of the situation is greatly increased by the general scarcity of labor. In South Carolina, Georgia, Alabama, Louisiana, and Texas considerable areas will have to be abandoned."

Notwithstanding this ill-boding forecast, the records of the Bureau of Statistics show that the yield per acre for 1900 was, with the exception of two years, the largest in three decades. Later on, as the crop approached maturity, the successive monthly reports departed more and more from the early forecast, and then the official Bureau issued a final estimate that approximated the truth. When, however, Secretary Wilson of the Department of Agriculture was seeking with the help of Senator W. B.
Allison to prevent the duplication by the Census Bureau of work usually done by his Department, he referred to the final estimate, by his Department, of the cotton crop of this same year, 1900-1901, as "an estimate so accurate that its subsequently ascertained close agreement with actual production was commented upon throughout the entire cotton world as a marvel of statistical forecasting." (Crop Reporter, March, 1902, p. 4.)

Now, obviously, what is needed for business and for scientific purposes is not one or more illustrations either in praise or in blame of the Government crop-reporting service, but a quantitative testing of the accuracy of the continuous service throughout a long period, say a quarter of a century. For business and for scientific purposes one must know the degree of accuracy with which, upon an average, from the official data available at any time, one can forecast the ultimate yield.

On many, if not upon all, of the Cotton Exchanges of the country, the daily variations of rainfall and temperature in the states of the Cotton Belt, during the growing season, are, for the information of brokers, plotted on a large map. The Government crop reports describe the variations of weather during the interval covered in their crop survey. The leading newspapers give daily reports of the "Weather in the Cotton Belt." To-day, August 18, 1916, the New York Times prints a typical description of the influence of weather reports on trading and prices:

COTTON ADVANCES IN STEADY MARKET
STORM IN THE GULF OF MEXICO KEEPS THE TALENT GUESSING BUT RAINS HELP TEXAS CROP

"There was a steady undertone in the cotton market yesterday, but trading was rather light when the wide-spread interest in cotton at this season is taken into consideration.
There was a manifest disposition to wait for further crop developments, and the fact that there was a storm in the Gulf of Mexico working toward the cotton belt made both the longs and the shorts a bit timid. On one hand there was the possibility that this disturbance might bring much needed rain to the region west of the Mississippi; on the other hand the danger that it might give bad weather to the Southern Atlantic States, where there has been severe damage by storms and too much rain. . .

"Private reports from the belt were not particularly bullish, as they told of scattered showers in Texas and improvement in some parts of the Eastern States. The talk of the approaching storm, however, rather overshadowed the reports of the weather of the minute, although the bulls did not neglect to call attention to the fact that there was no relief in Oklahoma, where rain was much needed."

A series of critical questions is suggested by the great importance which Government Bureaus, Cotton Exchanges, and the Public Press very obviously attach to the weather conditions as they are related to the cotton crop:

(1) Variations of both temperature and rainfall must affect the yield per acre of cotton, but do they affect the yield in the same way and to the same degree? What is the measure of the effect of each, independent of the other? What is the measure of their joint effect?

(2) In Texas, throughout a quarter of a century, the yield of cotton has been steadily falling, while in Georgia, throughout the same interval, the yield has been steadily increasing. How, then, can one measure the effects, jointly and separately, of rainfall and temperature upon the crop of each state and upon their combined crop?

(3) Suppose that the above questions are satisfactorily answered for one particular month. Would the answers be different for different months?
Or would the particular combination of temperature and rainfall for, say, July, produce an equal effect with the same combination for August? Are the answers for this and the above questions the same for all of the cotton-producing states?

(4) Supposing that one has solved the above questions for all of the states of the Cotton Belt, how could one take account of the variations of the weather from the beginning of the growing season up to a given date, in such a way as to be able to forecast their possible joint effects upon the ultimate yield of cotton?

(5) Supposing that one could forecast the yield per acre of cotton from the successive reports of the Weather Bureau, how would the degree of accuracy of such forecasts compare with the forecasts of the Crop-Reporting Board, which are based upon the direct observations of the thousands of correspondents of the Department of Agriculture?

A knowledge of the acreage and of the probable yield per acre of cotton will afford the necessary data to compute the probable supply. But in order to forecast the probable price of cotton lint, the law of demand for cotton must be known. That is to say, one must know the probable variation in the price that will accompany a computed variation in the supply. With regard to this question of demand, economic science is in the state which electrical science had reached about the middle of the nineteenth century.

It would appear that there are two sciences of economics, one of the class room and one of the market place, and the difference between the two is the same as the difference described by Fleeming Jenkin as existing between the Electricity of the Schools and the Electricity of the Practical Engineer:

"The difference between the Electricity of Schools and of the testing office has been mainly brought about by the absolute necessity in practice for definite measurement. The lecturer is content to say, under such and such circumstances, a current flows or a resistance is increased. The practical electrician must know how much current and how much resistance, or he knows nothing."

The Open Sesame to academic economics is the "law of supply and demand" or "the equation of demand and supply." No general problem within the confines of the science may be approached except through the "law of supply and demand." But, as incredible as it may seem, what the law of demand actually is for any one commodity is nowhere stated in the text-books. Indeed not only do the text-book writers forbear to state the law for any one commodity, but, as a rule, they either omit to say whether there is any hope of ever knowing the law in any concrete case, or else say bluntly that the law can never be known because their discussion of economic theory is confined to normalities within an hypothetical, static state.

The economist of the market place, however, not only must know that, under given circumstances of the supply, the price will rise or fall, but he must know the probable limits within which the price fluctuations will be confined.

In different ways many agencies, public and private, assemble facts that have a bearing upon the probable demand for cotton, and the findings of the several inquiries are published for the information of those directly or indirectly concerned. Each individual is left free to draw his own conclusions as to the joint effect of the many factors in the problem, and the resulting conduct of the many buyers gives definiteness to the law of demand for cotton. Would it not be possible to describe this resulting law of demand with a degree of precision as great as the accuracy with which, from elaborate Government reports as to crop-conditions and crop-prospects, the official Bureau forecasts the probable supply of cotton?
By means of the principles and methods presently to be described, it is possible for any person (1) from the current reports of the Weather Bureau as to rainfall and temperature in the states of the Cotton Belt, to forecast the yield of cotton with a greater degree of accuracy than the forecasts of the Department of Agriculture, and (2) from the prospective magnitude of the crop, to forecast the probable price per pound of cotton with a greater precision than the Department of Agriculture forecasts the yield of the crop.

The principal purpose of my Essay I should like to make very clear. It is not to point out the limitations of the work done in forecasting by the Department of Agriculture; much less, to urge any device of my own as a substitute for the methods that are followed by the official Statistical Bureaus. My chief aim has been to make a contribution to economic science by showing that the changes in the great basic industry of the South which dominate the whole economic life of the Cotton Belt are so much a matter of routine that, with a high degree of accuracy, they admit of being predicted from natural causes.

The business of economic science, as distinguished from economic practice, is to discover the routine in economic affairs. It aims to separate out the elements of the routine, to ascertain their interdependence, and to use the knowledge of their connections to anticipate experience by forecasting from known changes the probabilities of correlated changes. The seal of the true science is the confirmation of the forecasts; its value is measured by the control it enables us to exercise over ourselves and our environment.

CHAPTER II
THE MATHEMATICS OF CORRELATION

"The true Logic for this world is the Calculus of Probabilities, the only Mathematics for Practical Men."
James Clerk Maxwell.

In an Announcement[1] issued April 29, 1916, by the Office of Markets and Rural Organization of the U. S.
Department of Agriculture, there is a "Review of Some of the Provisions of the Pending Cotton Futures Bill, H. R. 11861, and of Causes of Differences Between Prices of Middling Cotton in New York and Liverpool." Three valuable charts are given of fluctuations in different markets of prices of spot cotton and prices of cotton futures. One of the charts is described in these words:

"Chart 3 shows the variation in the prices of futures[2] on the cotton exchanges at New York and New Orleans, as compared with the price of Middling as determined by averaging the quotations obtained from the designated spot markets, as follows: Norfolk, Augusta, Savannah, Montgomery, New Orleans, Memphis, Little Rock, Dallas, Houston, and Galveston. The chart covers the time between February 15, 1915, and January 22, 1916."

From a study of this and the other two charts, the writer of the Review concludes "that since the cotton futures Act[3] went into operation future quotations have fairly reflected spot values in both New York and New Orleans, and also in a general way over the entire South, and that the law has thus accomplished and is accomplishing the end for which it was enacted." (Ibid., p. 104.)

[1] Service and Regulatory Announcements, No. 9.
[2] The meaning of "futures" throughout the investigation is given in a description of the statistical data: "The future quotation for each day is always that for contracts which are to be fulfilled in the current month. During the last five days of a month, when contracts for the present month are no longer traded in, contracts for the following month are substituted, as they may be considered essentially the current month, for such contracts may be purchased or sold and immediately fulfilled or closed." Ibid., p. 101.
[3] The Act of 1914.

With the question as to whether the cotton futures Act is doing the work for which it was enacted, we are not at present concerned, but we are interested in the statement of fact that "since the cotton futures Act went into operation future quotations have fairly reflected spot values in both New York and New Orleans, and also in a general way over the entire South." The official words are that "future quotations have fairly reflected spot values." Just what is meant by fairly? How can one measure the degree of association between futures and spot values? Or, to put the question in another form, suppose one knew the probable spot values in the South; how could one forecast the price of futures on the cotton exchanges at New York and New Orleans? These are types of problems which the statistical methods we are about to describe enable us to solve. Let us propose a definite problem and connect the exposition of the statistical methods with the solution of the problem:

On Figure 1, the two graphs record for an interval of 42 days, from September 11 to October 30, 1915, the fluctuating prices of average spots in the South on the ten markets which were enumerated above, and the fluctuating prices of futures on the New York exchange. The general trend of each series of figures shows an ascent to about the middle of the record and then a descent. Suppose we were to make allowance for the general trend in the two graphs: what would be the degree of connection between the fluctuations of the futures from their general trend and the fluctuations of the spots from their general trend?

[Figure 1]
To be more definite, suppose we represent the general trend of each series of figures by a progressive average of five daily quotations; that is to say, suppose we place on both series for each day a mark indicating the mean of the respective quotations for the five days of which the given day is the middle day. We should then obtain for each series a number of points that would indicate its general trend. If we take the fluctuations of each series from its own general trend, we shall have the data for the problem which we propose to solve, namely, to ascertain the degree of association between the fluctuations of futures and the fluctuations of spots. Figure 2 shows the actual quotations for futures on the New York exchange and the general trend of the figures when the general trend is derived from a progressive average of five daily quotations. Figures 1 and 2 exhibit data for only 42 days.[4]

[Figure 2]

[4] The data used by the Government office, covering records for 275 to 280 days, were kindly supplied to me by Mr. Charles J. Brand, Chief of the Office of Markets and Rural Organization. When we come to the application of our statistical methods we shall use all of the available data.

We pass now to the development of the mathematical theory of correlation.[5]

A Frequency Distribution

Statistical tables that show either the absolute or relative frequencies of observations for given types of measurements are called frequency tables, or frequency distributions. The accompanying Table 1 is a frequency distribution showing the absolute frequencies in the fluctuations of the average prices of spots from their general trend, the general trend being derived from a progressive five days average.
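The progressive average described above is what would now be called a centred five-day moving average. The following Python sketch is a minimal illustration, not a reproduction of the author's working: it computes that average, takes each day's fluctuation from its own trend, and groups the fluctuations into classes of width .03 cent centred on zero, as in Table 1. The function names, the sample prices, and the choice to skip the first and last two days (which have no complete five-day window) are assumptions of this sketch.

```python
from collections import Counter


def progressive_mean(prices, window=5):
    """Centred moving average: for each day with (window // 2) days on
    either side, the mean of the quotations of which it is the middle day.
    Returns {day index: trend value}. `window` is assumed to be odd."""
    half = window // 2
    return {
        i: sum(prices[i - half:i + half + 1]) / window
        for i in range(half, len(prices) - half)
    }


def fluctuations(prices, window=5):
    """Deviation of each quotation from its own progressive mean.
    Days without a complete window at either end are skipped."""
    trend = progressive_mean(prices, window)
    return [prices[i] - trend[i] for i in sorted(trend)]


def frequency_table(values, unit=0.03):
    """Group values into classes of width `unit` centred on multiples of
    `unit` (class 0 runs from -unit/2 to +unit/2); returns {class: count}.
    A value exactly on a class boundary goes wherever Python's round()
    sends it (halves round to the nearest even integer)."""
    counts = Counter(round(v / unit) for v in values)
    return dict(sorted(counts.items()))


# Hypothetical quotations (cents per pound), for illustration only.
prices = [12.00, 12.05, 12.20, 12.10, 12.15, 12.30, 12.25, 12.40]
devs = fluctuations(prices)
print([round(d, 3) for d in devs])
print(frequency_table(devs))
```

Class indices in the returned table correspond to Column II of Tables 2 and 3 (fluctuations in units of grouping).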
After the raw observations, for purposes of facility in the handling of the data, have been grouped into appropriate frequency distributions, the next step is to describe the distributions by the aid of the fewest possible measurements that will enable one to summarize the features of the distribution which, for the purpose in hand, are most important. One of the most important summary descriptions of a frequency distribution is the mean value of the distribution. In the particular problem before us the mean value of the fluctuations of average spots from their general trend is the quantity that we wish to ascertain. This brings us to the first step in our mathematical work.

[5] I wish most gratefully to thank Professor Karl Pearson for the instruction that I received in his laboratory several years ago, and for the inspiration of his published works. To him, almost exclusively, I owe my knowledge of the theory of correlation. In beginning the study of Professor Pearson's writings, I received help from Professor G. U. Yule's article "On the Theory of Correlation," in the Journal of the Royal Statistical Society, December, 1897, and from Mr. W. Palin Elderton's treatise on Frequency-Curves and Correlation.

TABLE 1. Frequency Distribution of Fluctuations of the Prices of Average Spots from a Five Days Progressive Mean of Prices

Fluctuations of Average Spots (Cents)   Frequency (Number of Days on which the Fluctuations Occurred)
-.165 to -.135      3
-.135 to -.105      3
-.105 to -.075      4
-.075 to -.045     23
-.045 to -.015     55
-.015 to +.015    107
+.015 to +.045     54
+.045 to +.075     16
+.075 to +.105      7
+.105 to +.135      2
+.135 to +.165      1
Total             275

Theorem I. The algebraic sum of the deviations of a series of magnitudes from their arithmetical mean value is zero.

Let the magnitudes be $x_1, x_2, x_3, \ldots, x_n$, $N$ in number, and let their arithmetical mean value be $\bar{x}$.
Then, by the definition of the arithmetical mean, we have

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{N}, \qquad \therefore\; N\bar{x} = x_1 + x_2 + x_3 + \cdots + x_n,$$

and

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) + (x_3 - \bar{x}) + \cdots + (x_n - \bar{x}) = 0.$$

But the quantities on the left-hand side of the equation are the deviations of the magnitudes from the arithmetical mean of the magnitudes, and the sum of these deviations is proved to be zero. This theorem we shall use later on in our work.

Theorem II. The arithmetical mean of a series of magnitudes is equal to any arbitrary quantity plus the mean of the deviations of the magnitudes from the arbitrary quantity.

As before, let the magnitudes be $x_1, x_2, x_3, \ldots, x_n$, and let $P$ be the arbitrary quantity. Then

$$\bar{x} = \frac{S(x)}{N},$$

where $S(x)$ is put for the sum of the $x$'s. Also we have

$$x_1 = P + x'_1, \quad x_2 = P + x'_2, \quad x_3 = P + x'_3, \quad \ldots, \quad x_n = P + x'_n,$$

where $x'_1, x'_2, x'_3, \ldots, x'_n$ are the deviations of $x_1, x_2, x_3, \ldots, x_n$ from $P$. Therefore

$$S(x) = NP + S(x'), \qquad \bar{x} = \frac{S(x)}{N} = P + \frac{S(x')}{N},$$

which is the proposition we had to prove.

We shall now apply this latter theorem to find the mean value of the fluctuations of average spots from their general trend. The data are given in Table 2. Here, the arbitrary quantity from which the fluctuations are measured is zero. Column II gives the fluctuations measured from zero expressed in terms of the unit of grouping. According to Theorem II, the arithmetical mean is equal to the arbitrary quantity plus the mean of the deviations from the arbitrary quantity. Consequently, in this particular case, the arithmetical mean of the price fluctuations is (-.07) in units of grouping, or (-.002) in absolute units (cents).

The Standard Deviation as a Measure of Dispersion

The arithmetical mean of the frequency distribution gives us one of the most important summary descriptions of the distribution: it gives the centre of density of the distribution.
But in economic, as well as in most other, measurements it is extremely important to know how the several observations are grouped about the arithmetical mean of the measurements, and a coefficient showing the manner of grouping is a measure of dispersion. Just as we found that the arithmetical mean of the measurements gives us an idea of the centre of the density of the measurements, so, as a measure of dispersion, we might take the arithmetical mean of the deviations of the magnitudes from the mean of the observations. But if we followed this plan, we should meet with an embarrassing difficulty: the deviations of the measurements from the arithmetical mean are some of them positive and some of them negative, and if we take account of the signs of the deviations, then, according to Theorem I, the sum of the deviations is zero. We therefore choose, as our measure of dispersion, the square-root of the mean square of the deviations about the arithmetical mean of the observations, and we call this measure of dispersion the standard deviation. If we let $\sigma$ represent the standard deviation, then, if

$$x_1 - \bar{x} = X_1, \quad x_2 - \bar{x} = X_2, \quad x_3 - \bar{x} = X_3, \quad \ldots, \quad x_n - \bar{x} = X_n,$$

we shall have as the symbolic expression of the standard deviation

$$\sigma = \sqrt{\frac{S(X^2)}{N}}.$$

TABLE 2. Computation of the Mean of the Fluctuations of Average Spots from Their General Trend

Columns: I, fluctuations of average spots (cents); II, fluctuations expressed in units of grouping (unit = .03), x'; III, frequency, f; IV, product of Column II by Column III, fx'.

   I       II     III      IV
 -.15      -5       3     -15
 -.12      -4       3     -12
 -.09      -3       4     -12
 -.06      -2      23     -46
 -.03      -1      55     -55
  .00       0     107       0
 +.03      +1      54     +54
 +.06      +2      16     +32
 +.09      +3       7     +21
 +.12      +4       2      +8
 +.15      +5       1      +5
Totals            275    -140
                         +120
                          -20

The mean fluctuation from the general trend is, therefore, -20/275 = -.07 in units of grouping, or (-.07)(.03) = -.002 in absolute units.

Theorem III.
The square of the standard deviation of a series of magnitudes is equal to the mean square of the deviations of the magnitudes about an arbitrary quantity, minus the square of the difference between the arbitrary quantity and the mean of the magnitudes.

As before, let the quantities be $x_1, x_2, x_3, \ldots, x_n$ and their mean value be $\bar{x}$. Let the arbitrary quantity be $P$ and let the difference between the arbitrary quantity and the mean be $d_x$, so that $\bar{x} = P + d_x$. Let the deviations of the quantities from the arithmetical mean be $X_1, X_2, X_3, \ldots, X_n$ and their deviations from $P$ be $x'_1, x'_2, x'_3, \ldots, x'_n$. We shall then have

$$(1)\quad X_1 = x_1 - \bar{x}, \quad X_2 = x_2 - \bar{x}, \quad X_3 = x_3 - \bar{x}, \quad \ldots, \quad X_n = x_n - \bar{x};$$

$$(2)\quad P = \bar{x} - d_x;$$

$$(3)\quad \begin{aligned} x'_1 &= x_1 - P = x_1 - (\bar{x} - d_x) = (x_1 - \bar{x}) + d_x = X_1 + d_x,\\ x'_2 &= x_2 - P = x_2 - (\bar{x} - d_x) = (x_2 - \bar{x}) + d_x = X_2 + d_x,\\ &\;\;\vdots\\ x'_n &= x_n - P = x_n - (\bar{x} - d_x) = (x_n - \bar{x}) + d_x = X_n + d_x; \end{aligned}$$

$$(4)\quad \begin{aligned} (x'_1)^2 &= (X_1 + d_x)^2 = X_1^2 + 2d_x X_1 + d_x^2,\\ (x'_2)^2 &= (X_2 + d_x)^2 = X_2^2 + 2d_x X_2 + d_x^2,\\ &\;\;\vdots\\ (x'_n)^2 &= (X_n + d_x)^2 = X_n^2 + 2d_x X_n + d_x^2; \end{aligned}$$

$$(5)\quad \text{Therefore}\quad S(x'^2) = S(X^2) + 2d_x S(X) + N d_x^2.$$

But according to Theorem I, $S(X) = 0$, and, consequently,

$$\frac{S(X^2)}{N} = \frac{S(x'^2)}{N} - d_x^2.$$

Since $\sigma_x^2 = S(X^2)/N$, we have $\sigma_x^2 = \dfrac{S(x'^2)}{N} - d_x^2$, which was to be proved.

Corollary. The mean square deviation about the arithmetical mean of the observations is less than the mean square deviation about any other quantity.

We have just proved that $\dfrac{S(X^2)}{N} = \dfrac{S(x'^2)}{N} - d_x^2$. The left-hand side of the equation is a positive quantity because it is a mean square. The right-hand side must, therefore, also be a positive quantity, but it consists of the difference between two positive quantities, the greater of which is the mean square deviation about an arbitrary quantity. The same equation would hold no matter what the arbitrary quantity might be. Therefore, since $d_x^2$ is positive whenever $P$ differs from $\bar{x}$, the mean square deviation about the arithmetical mean is less than the mean square deviation about any other arbitrary quantity.
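Theorems II and III reduce the grouped computations of Tables 2 and 3 to a few sums about an arbitrary origin. The Python sketch below is a minimal illustration using the class marks and frequencies as transcribed in those tables: it computes the mean by Theorem II, checks Theorem I, and computes σ_x by Theorem III and, as a cross-check, directly from the definition. It also applies Sheppard's correction (subtracting h²/12 from the variance, h being the class width), which the footnote to Figure 3 mentions. The values it prints are computed here from the table counts, not quoted from the text, and the function names are illustrative.

```python
import math

# Tables 2 and 3: fluctuations in units of grouping (x') and frequencies (f).
X_PRIME = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
FREQ = [3, 3, 4, 23, 55, 107, 54, 16, 7, 2, 1]
UNIT = 0.03  # cents per unit of grouping


def grouped_mean(x_prime, freq, origin=0.0):
    """Theorem II: mean = arbitrary quantity P (`origin`) plus the mean
    of the deviations x' from P (all in units of grouping)."""
    n = sum(freq)
    return origin + sum(f * x for f, x in zip(freq, x_prime)) / n


def sd_via_arbitrary_origin(x_prime, freq):
    """Theorem III: sigma^2 = S(f x'^2)/N - d_x^2, where d_x = S(f x')/N
    is the mean deviation from the arbitrary origin (here zero)."""
    n = sum(freq)
    d_x = sum(f * x for f, x in zip(freq, x_prime)) / n
    mean_square = sum(f * x * x for f, x in zip(freq, x_prime)) / n
    return math.sqrt(mean_square - d_x ** 2)


def sd_direct(x_prime, freq):
    """Definition: square root of the mean square deviation about the mean."""
    n = sum(freq)
    mean = grouped_mean(x_prime, freq)
    return math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freq, x_prime)) / n)


def sheppard_corrected(sigma, h=1.0):
    """Sheppard's correction for grouped data: subtract h^2/12 from the
    variance, h being the class width (1 in units of grouping)."""
    return math.sqrt(sigma ** 2 - h ** 2 / 12)


mean_units = grouped_mean(X_PRIME, FREQ)  # -20/275
# Theorem I: frequency-weighted deviations about the mean sum to zero.
theorem_1_sum = sum(f * (x - mean_units) for f, x in zip(FREQ, X_PRIME))
sigma_units = sd_via_arbitrary_origin(X_PRIME, FREQ)

print("mean:", round(mean_units, 4), "units =", round(mean_units * UNIT, 5), "cents")
print("sigma:", round(sigma_units, 4), "units =", round(sigma_units * UNIT, 4), "cents")
print("with Sheppard's correction:", round(sheppard_corrected(sigma_units), 4), "units")
```

Working about the arbitrary origin keeps every intermediate quantity an integer until the final division, which is why the grouped method is convenient by hand; the direct computation is included only to confirm that the two agree.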
We shall now use this theorem to calculate the value of $\sigma_x$ for the fluctuations of the average prices of spot cotton from the general trend of prices. The data are given in Table 3.

Our mathematical theory of correlation is developed as an instrument to forecast economic events. We may stop for a moment, therefore, to consider the bearing of our results thus far upon the problem of forecasting. Figure 3 shows a smooth curve¹ passing closely to the broken line that represents the frequency distribution of the fluctuations of average spot prices from their general trend. This curve is symmetrical in the sense that the two sides of the figure are similarly disposed with reference to the maximum ordinate. If, for instance, the right-hand side of the figure were made to revolve about the maximum ordinate and be placed upon the left-hand side, the two parts of the curve would be congruent. This symmetrical curve is called the normal, or, sometimes, the Gaussian curve, after the author of Theoria Motus Corporum Coelestium, who was one of the first to investigate its properties. If we represent by x the deviations of the abscissas from …

¹ In fitting the smooth curve to the data, the value of σ was computed with Sheppard's correction.

TABLE 3.
Computation of the Standard Deviation of the Fluctuations of Average Spots from the General Trend

  I                 II                   III         IV             V        VI
  Fluctuations of   Fluctuations in      Frequency   Product of     (x')²    f(x')²
  average spots     units of grouping    f           Column II by
                    (unit = .03), x'                 Column III, fx'
   −.15              −5                    3          −15           25       75
   −.12              −4                    3          −12           16       48
   −.09              −3                    4          −12            9       36
   −.06              −2                   23          −46            4       92
   −.03              −1                   55          −55            1       55
    .00               0                  107            0            0        0
   +.03              +1                   54          +54            1       54
   +.06              +2                   16          +32            4       64
   +.09              +3                    7          +21            9       63
   +.12              +4                    2           +8           16       32
   +.15              +5                    1           +5           25       25
  Totals                                 275         −140                    544
                                                     +120
                                                      −20

According to the symbols in the text, $d_x$ is the mean deviation from the arbitrary origin. Consequently, in this particular case,

$$d_x=\frac{\Sigma fx'}{N}=\frac{-20}{275}=-.0727,\qquad d_x^2=.005285.$$

The mean square deviation about the arbitrary origin is

$$\frac{\Sigma f(x')^2}{N}=\frac{544}{275}=1.978182.$$

By Theorem III, $\sigma_x^2=\frac{\Sigma f(x')^2}{N}-d_x^2$, and, consequently, …

… if m and b in (4) are so determined that $Y/N$ is a minimum, then $Y'/N$ is greater than $Y/N$, and, by subtracting (4) from (5), we have

$$\begin{aligned}
\frac{Y'-Y}{N}&=2me_1(\sigma_x^2+\bar{x}^2)+2(e_2m+e_1b)\bar{x}+2be_2-2e_1p_{xy}-2e_2\bar{y}\\
&=2e_1\{m(\sigma_x^2+\bar{x}^2)+b\bar{x}-p_{xy}\}+2e_2\{m\bar{x}+b-\bar{y}\}.
\end{aligned}\tag{6}$$

Suppose m and b are so determined as to render $Y/N$ a minimum, and $e_1$ and $e_2$ are very small. Then $Y'/N$ and $Y/N$ are, for practical purposes, equal, and equation (6) may be put into the form

$$2e_1\{m(\sigma_x^2+\bar{x}^2)+b\bar{x}-p_{xy}\}+2e_2\{m\bar{x}+b-\bar{y}\}=0.\tag{7}$$

Since $e_1$ and $e_2$ are arbitrary, equation (7) can be a true equation only if the coefficients of $2e_1$ and $2e_2$ are each zero; that is,

(i) $m(\sigma_x^2+\bar{x}^2)+b\bar{x}-p_{xy}=0$;

(ii) $m\bar{x}+b-\bar{y}=0$.

Solve these two equations for m and b. Multiply (ii) by $\bar{x}$ and subtract the result from (i). We get $m\sigma_x^2-p_{xy}+\bar{x}\bar{y}=0$, and, consequently,

$$m=\frac{p_{xy}-\bar{x}\bar{y}}{\sigma_x^2}.$$

Substitute this value of m in (ii) and solve the resulting equation for b. We obtain

$$b=\bar{y}-\frac{p_{xy}-\bar{x}\bar{y}}{\sigma_x^2}\,\bar{x}.$$
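Conditions (i) and (ii) translate directly into code. The Python sketch below is our own illustration; its data are made up and are not taken from the Essay.

It computes m and b from the moments $p_{xy}$, $\bar{x}$, $\bar{y}$ and $\sigma_x^2$. It then checks that nudging m or b by small amounts $e_1$, $e_2$ only increases the mean square deviation, as the argument above requires.

```python
def fit_line(xs, ys):
    """Least-squares line y = m x + b from the moments used in the text:
    m = (p_xy - x̄ȳ) / σ_x²,  b = ȳ - m x̄."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    p_xy = sum(x * y for x, y in zip(xs, ys)) / n         # mean of the products xy
    sigma_x_sq = sum((x - x_bar) ** 2 for x in xs) / n    # σ_x²
    m = (p_xy - x_bar * y_bar) / sigma_x_sq
    b = y_bar - m * x_bar
    return m, b


def mean_square_deviation(xs, ys, m, b):
    """Mean square of the vertical deviations of the points from y = m x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical illustrative data (not from the Essay).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, b = fit_line(xs, ys)
best = mean_square_deviation(xs, ys, m, b)
for e1, e2 in [(0.01, 0.0), (0.0, 0.01), (-0.01, 0.01)]:
    # A small change e1 in m and/or e2 in b raises the mean square deviation.
    assert mean_square_deviation(xs, ys, m + e1, b + e2) > best
print(round(m, 4), round(b, 4))   # 1.99 0.05
```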
If we substitute these values of m and b in the equation to the straight line, $y=mx+b$, we get

$$y=\frac{p_{xy}-\bar{x}\bar{y}}{\sigma_x^2}\,x+\bar{y}-\frac{p_{xy}-\bar{x}\bar{y}}{\sigma_x^2}\,\bar{x}.\tag{8}$$

This is the equation to the straight line that makes $Y/N$, the mean square deviation of the points from the line, a minimum; it is the straight line that fits the data best. If our sole purpose were to find the line fitting best any given data, we might stop here. We should then compute from the given data the values of the constants in equation (8), and, by substituting these values in that equation, obtain the equation in its numerical form.

Our problem, however, is not completely solved by finding the equation connecting the two variables x and y. We wish to know, in any given case, how closely the two variables are associated. In the particular case which we have taken to illustrate our mathematical methods, we wish to know more than the equation connecting two series: the fluctuations of New York futures from their general trend, and the fluctuations of the average prices of spot cotton from their general trend. We also wish to know how closely the prices of futures and the prices of spot cotton are connected.

To approach this last problem we simplify equation (8). We have agreed to call $\bar{x}$ the mean of the x's in the scatter diagram, and $\bar{y}$ the mean of the y's. Suppose we call the point in the scatter diagram whose coordinates are $(\bar{x},\bar{y})$ the mean of the system of points. Let us inquire whether the straight line described by equation (8) passes through this point. If it does, the coordinates of the point $(\bar{x},\bar{y})$ must satisfy the equation. Substitute $\bar{x}$, $\bar{y}$ respectively for x and y in (8). We obtain

$$\bar{y}=\frac{p_{xy}-\bar{x}\bar{y}}{\sigma_x^2}\,\bar{x}+\bar{y}-\frac{p_{xy}-\bar{x}\bar{y}}{\sigma_x^2}\,\bar{x}=\bar{y}.\tag{9}$$
But this is a true and identical equation. Consequently, the line described by equation (8) passes through the mean of the system of points on the scatter diagram, that is to say, the point whose coordinates are $(\bar{x},\bar{y})$. The fact that the best-fitting line passes through the point $(\bar{x},\bar{y})$ enables us to simplify equation (8). By transposing we may write (8) as follows:

$$(y-\bar{y})=\frac{p_{xy}-\bar{x}\bar{y}}{\sigma_x^2}(x-\bar{x}).\tag{10}$$

The quantity $(y-\bar{y})$ is the deviation of the ordinate of the best-fitting straight line from the mean of the y's, and may be represented by Y. The quantity $(x-\bar{x})$ is the deviation of the abscissa of the line from the mean of the x's, and may be represented by X. As we have just proved, the line passes through the point $(\bar{x},\bar{y})$. If we transfer the origin from zero to that point, equation (10) may therefore be written

$$Y=\frac{p_{xy}-\bar{x}\bar{y}}{\sigma_x^2}\,X.\tag{11}$$

The effect of transferring the origin to the point $(\bar{x},\bar{y})$ is to get rid of the value of b in the equation to the straight line.

We shall now examine the quantity $\frac{p_{xy}-\bar{x}\bar{y}}{\sigma_x^2}=m$, which appears in both (10) and (11). We know that $p_{xy}$ is the mean value of the products xy. Let us define a new quantity $\pi_{xy}$ to be the mean product of the deviations of x and y from their respective means. Then, by definition,

$$\pi_{xy}=\frac{\Sigma(x-\bar{x})(y-\bar{y})}{N}=\frac{\Sigma(xy)}{N}-\bar{x}\,\frac{\Sigma(y)}{N}-\bar{y}\,\frac{\Sigma(x)}{N}+\bar{x}\bar{y}.$$

Therefore $\pi_{xy}=p_{xy}-\bar{x}\bar{y}$, and we may write

$$m=\frac{p_{xy}-\bar{x}\bar{y}}{\sigma_x^2}=\frac{\pi_{xy}}{\sigma_x^2}.$$

Make this substitution in equations (10) and (11) and we get

$$(y-\bar{y})=\frac{\pi_{xy}}{\sigma_x^2}(x-\bar{x});\tag{12}$$

$$Y=\frac{\pi_{xy}}{\sigma_x^2}\,X.\tag{13}$$

If, as a further step, we define r to be a quantity such that $r=\frac{\pi_{xy}}{\sigma_x\sigma_y}$, then, by substituting in (12) and (13), we may write the equation to the best-fitting straight line in either of the following forms:

$$(y-\bar{y})=r\frac{\sigma_y}{\sigma_x}(x-\bar{x});\tag{14}$$

$$Y=r\frac{\sigma_y}{\sigma_x}\,X.\tag{15}$$

The quantity r in these equations is called the coefficient of correlation.
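The following Python sketch checks two results at once, on hypothetical paired data of our own rather than the Essay's figures:

- the definition of $\pi_{xy}$ and the identity $\pi_{xy}=p_{xy}-\bar{x}\bar{y}$ give the same number;
- $r\sigma_y/\sigma_x$ reproduces the slope m of equation (8).

It also confirms that the line written in form (14) passes through the mean point $(\bar{x},\bar{y})$.

```python
import math


def correlation_summary(xs, ys):
    """Return x̄, ȳ, σ_x, σ_y, π_xy and r as defined in the text."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    p_xy = sum(x * y for x, y in zip(xs, ys)) / n
    # π_xy by definition: mean product of the deviations from the means ...
    pi_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    # ... agrees with the identity π_xy = p_xy − x̄ȳ.
    assert math.isclose(pi_xy, p_xy - x_bar * y_bar, abs_tol=1e-12)
    sigma_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    r = pi_xy / (sigma_x * sigma_y)
    return x_bar, y_bar, sigma_x, sigma_y, pi_xy, r


xs = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
x_bar, y_bar, sx, sy, pi_xy, r = correlation_summary(xs, ys)
slope = r * sy / sx                     # equation (14): r σ_y / σ_x


def predict(x):
    """Best-fitting line in form (14): y = ȳ + r (σ_y/σ_x)(x − x̄)."""
    return y_bar + slope * (x - x_bar)


print(round(r, 4), round(slope, 4), predict(x_bar) == y_bar)
```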
The Coefficient of Correlation

An inspection of equations (14) and (15) shows that, in order to secure the best fit of a straight line to given data, we need only compute from the data the values of $\bar{x},\bar{y},\sigma_x,\sigma_y,r$ and make the proper substitutions in (14) and (15). We have already discussed methods of computing $\bar{x},\bar{y},\sigma_x,\sigma_y$, and we now reach the question of the best method of computing r. As we have just seen,

$$r=\frac{\pi_{xy}}{\sigma_x\sigma_y}=\frac{\Sigma(x-\bar{x})(y-\bar{y})}{N\sigma_x\sigma_y},$$

and if we were indifferent to the labor of computation, we might use this formula to ascertain, in any concrete case, the value of r. We, however, found methods for computing $\bar{x},\bar{y},\sigma_x,\sigma_y$ by working with deviations from arbitrary quantities as origins. We now proceed to develop a method of computing r that retains the same arbitrary origins which we used in calculating the means and the standard deviations. We wish to find $\pi_{xy}$, and to derive its value by working with deviations of the x's and y's from arbitrary origins.

Theorem IV. The mean product of the deviations of two correlated variables from their respective arithmetical means is equal to the mean product of the deviations of the two variables from arbitrary origins, minus the difference between its arbitrary origin and the mean of the one variable multiplied by the difference between its arbitrary origin and the mean of the second variable.

Let the observations be $(x_1,y_1),(x_2,y_2),(x_3,y_3),\ldots,(x_n,y_n)$. Let $P$ be the arbitrary origin from which we measure the deviations of the x's, and $Q$ the arbitrary origin from which we measure the deviations of the y's. Let the deviations of the x's from $P$ be represented by $x'$ and the deviations of the y's from $Q$ by $y'$. Let the deviations of the x's from $\bar{x}$ be represented by X, and the deviations of the y's from $\bar{y}$ by Y.
If we put $\bar{x}-P=d_x$ and $\bar{y}-Q=d_y$, our Theorem IV is that

$$\frac{\Sigma(x-\bar{x})(y-\bar{y})}{N}=\frac{\Sigma(XY)}{N}=\frac{\Sigma(x'y')}{N}-d_xd_y.$$

We have

$$\begin{aligned}
x'_1&=x_1-P=x_1-(\bar{x}-d_x)=(x_1-\bar{x})+d_x=X_1+d_x,\\
x'_2&=x_2-P=x_2-(\bar{x}-d_x)=(x_2-\bar{x})+d_x=X_2+d_x,\\
&\;\;\vdots\\
x'_n&=x_n-P=x_n-(\bar{x}-d_x)=(x_n-\bar{x})+d_x=X_n+d_x.
\end{aligned}\tag{16}$$

Similarly,

$$\begin{aligned}
y'_1&=y_1-Q=y_1-(\bar{y}-d_y)=(y_1-\bar{y})+d_y=Y_1+d_y,\\
y'_2&=y_2-Q=y_2-(\bar{y}-d_y)=(y_2-\bar{y})+d_y=Y_2+d_y,\\
&\;\;\vdots\\
y'_n&=y_n-Q=y_n-(\bar{y}-d_y)=(y_n-\bar{y})+d_y=Y_n+d_y.
\end{aligned}$$

Therefore,

$$\begin{aligned}
x'_1y'_1&=(X_1+d_x)(Y_1+d_y)=X_1Y_1+d_yX_1+d_xY_1+d_xd_y,\\
x'_2y'_2&=(X_2+d_x)(Y_2+d_y)=X_2Y_2+d_yX_2+d_xY_2+d_xd_y,\\
&\;\;\vdots\\
x'_ny'_n&=(X_n+d_x)(Y_n+d_y)=X_nY_n+d_yX_n+d_xY_n+d_xd_y.
\end{aligned}$$

Summing both sides of these equations, we get

$$\Sigma(x'y')=\Sigma(XY)+d_y\Sigma(X)+d_x\Sigma(Y)+Nd_xd_y.$$

But, according to Theorem I, $\Sigma(X)=\Sigma(Y)=0$, and, consequently, $\Sigma(x'y')=\Sigma(XY)+Nd_xd_y$, or

$$\pi_{xy}=\frac{\Sigma(XY)}{N}=\frac{\Sigma(x-\bar{x})(y-\bar{y})}{N}=\frac{\Sigma(x'y')}{N}-d_xd_y.\tag{17}$$

This formula gives us a method of computing the factor $\frac{\Sigma(x-\bar{x})(y-\bar{y})}{N}$ in the value of r, the formula for which we know is

$$r=\frac{\Sigma(x-\bar{x})(y-\bar{y})}{N\sigma_x\sigma_y}.$$

[Table 4: correlation table, fluctuations of average spots (x') against fluctuations of New York futures (y') from their progressive means of 5 days]

The data in Table 4 will serve to illustrate the method of computing the coefficient of correlation.¹ We have proved that

$$r=\frac{\Sigma(x-\bar{x})(y-\bar{y})}{N\sigma_x\sigma_y}\qquad\text{and}\qquad\frac{\Sigma(x-\bar{x})(y-\bar{y})}{N}=\frac{\Sigma(x'y')}{N}-d_xd_y.$$

We recall that $d_x=\bar{x}-P$, where $P$ is the arbitrary origin from which $x'$ is measured, and $d_y=\bar{y}-Q$, where $Q$ is the arbitrary origin from which $y'$ is measured.

In the correlation table given in Table 4, $P$ is the origin from which the fluctuations of the average spots about the progressive means of 5 days are measured. It is taken at the point zero, which lies midway between (−.015) and (+.015). The values of $x'$, the fluctuations of average spots, are the distances to the right and to the left of the arbitrary origin. The sign of $x'$ is positive or negative according as the distance is to the right or to the left of that origin.

In a similar manner, the arbitrary origin $Q$, from which the fluctuations of New York futures about the progressive means of 5 days are measured, is taken at the point zero, which lies midway between (−.025) and (+.025). The fluctuations from $Q$, which are designated by $y'$, are negative toward the upper end of the table and positive toward the lower end.

In the scatter diagram given in Figure 4, the 275 observations were represented, according to their coordinates, by points on the diagram. In the same way, in the correlation table each observation falls in some one of the cells composing the table. The figure in the middle of a cell gives the number of observations in that cell. For example, in the upper left-hand corner there is one observation. This means that out of 275 days of observation there was one day on which two things held together: the fluctuation of average spots was between (−.165) and (−.135) from the general trend of spots, and the fluctuation of New York futures was between (−.275) and (−.225) from the general trend of New York futures.

In the same cell in the upper left-hand corner of the correlation table there is, above the figure 1, the figure 25, and below the figure 1, the figure (25). A similar arrangement is followed in all of the cells in which observations occur, and we now proceed to explain its meaning. The working unit in the classification of the fluctuations of spots is .03, and in the classification of the fluctuations of New York futures it is .05. The range of the fluctuations of spots runs from the mid-value of the first cell on the left to the mid-value of the last cell on the right, that is, from (−.15) to (+.15). Since the working unit of the x's is .03, this is from (−5) to (+5) working units. Similarly, the range of the y's is from (−.25) to (+.25), or, since the working unit is .05, from (−5) to (+5) working units.

¹ The method described in the text is the one most frequently required in actual experience. Where, however, the number of observations is small, which happens to be the case with a large part of the data in this Essay, a slight alteration of the procedure described in the text is necessary. A complete illustration of the method of correlation when the observations are few in number is given in Chapter III, Table 6.
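Theorem IV can be checked in the same way. The Python sketch below uses hypothetical data in working units, and the origins P and Q are arbitrary numbers of our own choosing.

It measures $x'$ and $y'$ from those origins and applies formula (17). It then compares the result with the mean product of deviations taken about the true means. Any choice of origins should give the same $\pi_{xy}$.

```python
def pi_from_arbitrary_origins(xs, ys, P, Q):
    """π_xy by Theorem IV: Σ(x'y')/N − d_x d_y, with x' = x − P, y' = y − Q."""
    n = len(xs)
    xp = [x - P for x in xs]
    yp = [y - Q for y in ys]
    d_x = sum(xp) / n                  # equals x̄ − P
    d_y = sum(yp) / n                  # equals ȳ − Q
    return sum(a * b for a, b in zip(xp, yp)) / n - d_x * d_y


def pi_direct(xs, ys):
    """π_xy from the definition: mean product of deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


# Hypothetical observations in working units, as if read off a correlation table.
xs = [-5, -2, -1, 0, 0, 1, 1, 2, 3]
ys = [-5, -1, -2, 0, 1, 0, 2, 3, 2]
for P, Q in [(0, 0), (2, -3), (-1.5, 4)]:
    print(P, Q, round(pi_from_arbitrary_origins(xs, ys, P, Q), 6),
          round(pi_direct(xs, ys), 6))
```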
Returning now to the one observation in the upper left-hand corner of the correlation table, we find that its distance from the zero point of the x's is (−5) working units, and its distance from the zero point of the y's is also (−5) working units. The product of these two, which is $x'y'$, is $(-5)(-5)=25$, and this explains the figure 25 at the top of this one cell. Since there is only one observation in this cell, if we weight the product 25 by 1 we get (25), which explains the figure (25) at the bottom of this particular cell.

To summarize:

- the figure in the middle of a cell is the frequency of the observations;
- the figure at the top of the cell is the product $x'y'$ in working units;
- the figure at the bottom of the cell is $x'y'$ weighted according to the number of observations in the cell.

The heavy lines that pass from the top to the bottom, and from the left to the right, of the correlation table divide it into four large divisions. All of the products in the cells of the upper left-hand and lower right-hand divisions are positive, and all of the products in the other two divisions are negative. If we sum all of the positive products separately and then all of the negative products, their net total gives us $\Sigma(x'y')$. If we then divide this result by 275 we obtain $\frac{\Sigma(x'y')}{N}$.

Let $\Sigma(+x'y')$ denote the sum of the positive products and $\Sigma(-x'y')$ the sum of the negative products. We find from Table 4 that

$$\Sigma(+x'y')=(25)+(15)+(5)+(32)+(4)+(18)+(6)+(8)+(24)+(32)+(12)+(4)+(18)+(14)+(25)+(19)+(28)+(18)+(8)+(28)+(18)+(8)+(6)+(6)+(18)+(12)+(15)+(16)+(20)+(25)=487;$$

and

$$\Sigma(-x'y')=(-3)+(-4)+(-5)+(-5)+(-2)=-19.$$

Consequently, $\Sigma(x'y')=487-19=468$, and $\frac{\Sigma(x'y')}{N}=\frac{468}{275}=1.7018$.
The quantity that we wish to determine next is $\frac{\Sigma(x-\bar{x})(y-\bar{y})}{N}$, which we know is equal to $\frac{\Sigma(x'y')}{N}-d_xd_y$. We found in the early part of this chapter that $d_x=-.073$ in working units. Just as we determined $d_x$, we can determine $d_y$ in a similar manner; the actual computation shows that $d_y=-.026$ in working units. Consequently, $d_xd_y=(-.073)(-.026)=.0019$, and

$$\frac{\Sigma(x'y')}{N}-d_xd_y=1.7018-.0019=1.6999,$$

which is, therefore, the value of $\frac{\Sigma(x-\bar{x})(y-\bar{y})}{N}$. But the coefficient of correlation r is equal to $\frac{\Sigma(x-\bar{x})(y-\bar{y})}{N\sigma_x\sigma_y}$, and since, in working units, …

… S; and 68 out of 100 lie between $\pm S$. The equation to the best-fitting straight line enables us to compute the most probable value of y corresponding to a given value of x. The value of S enables us to say within what limits any proportion of the actual observations are scattered about the straight line.

The coefficient of correlation r is the coefficient which we have been seeking as a measure of the degree of association between two variables. Where the association between the variables is perfect, $r=\pm1$ and $S=\sigma_y\sqrt{1-r^2}=0$. In that case, from the knowledge of the one variable we can, by means of the equation to the best-fitting straight line, forecast the other variable with perfect accuracy. When the association between the two variables is not perfect, r falls between the limiting values $\pm1$. Then $S=\sigma_y\sqrt{1-r^2}$ shows the accuracy with which, using the equation to the best-fitting straight line, the magnitude of the one variable may be predicted from a knowledge of the other.

We may illustrate these points by the problem of the relation between New York futures and average spot values in the South. We have found that the best-fitting straight line connecting the fluctuations in New York futures with the fluctuations in the price of spot cotton is $y=1.45x+.002$.
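The arithmetic of the last few paragraphs can be reproduced in a few lines of Python. This is a sketch of our own, not the author's worksheet. Its inputs are as follows:

- the weighted products are those listed above from Table 4;
- $d_x$ and $d_y$ are the rounded values −.073 and −.026;
- r = .714 and $\sigma_y$ = .085 are the values given for this example.

The bands apply the 68, 95 and 99.7 per cent rule for ±S, ±2S and ±3S. That rule presumes approximately normal errors. The helper `forecast_band` is a hypothetical name.

```python
import math

# Weighted products x'y' (bottom figures of the cells of Table 4), working units.
positive = [25, 15, 5, 32, 4, 18, 6, 8, 24, 32, 12, 4, 18, 14, 25,
            19, 28, 18, 8, 28, 18, 8, 6, 6, 18, 12, 15, 16, 20, 25]
negative = [-3, -4, -5, -5, -2]
N = 275

sum_xy = sum(positive) + sum(negative)     # Σ(x'y') = 487 − 19
mean_xy = sum_xy / N                       # Σ(x'y')/N
d_x, d_y = -0.073, -0.026                  # rounded means of x' and y'
pi_xy = mean_xy - d_x * d_y                # Theorem IV, in working units

# Accuracy of the forecasting formula y = 1.45x + .002 (absolute units).
r, sigma_y = 0.714, 0.085
S = sigma_y * math.sqrt(1 - r ** 2)


def forecast_band(x, k):
    """Most probable y for a given x, with the band of ±kS about it."""
    y_hat = 1.45 * x + 0.002
    return y_hat, y_hat - k * S, y_hat + k * S


print(sum_xy, round(mean_xy, 4), round(pi_xy, 4), round(S, 3))   # 468 1.7018 1.6999 0.06
for k, share in [(1, "68"), (2, "95"), (3, "99.7")]:
    y_hat, lo, hi = forecast_band(0.03, k)
    print(f"x = .03: y = {y_hat:.4f}; {share} per cent band {lo:.4f} to {hi:.4f}")
```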
For any given value of x, representing the fluctuation in the price of spot cotton, we can predict, by means of this formula, the most probable fluctuation in the price of New York futures.

We are, however, not content to forecast the most probable values of y; we wish to know, in addition, the degree of accuracy of the forecasts. The formula that has just been developed supplies an answer to this latter question. Since r = .714 and $\sigma_y$ = .085, $S=\sigma_y\sqrt{1-r^2}=.06$. From what we have learned about the significance of S, we know what to expect when we use the formula $y=1.45x+.002$ as a prediction formula:

- in 99.7 per cent of all the forecasts the error will lie within $\pm3S$;
- in 95 per cent of all the forecasts the error will lie within $\pm2S$;
- in 68 per cent of all the forecasts the error will lie within $\pm S$.

In beginning this chapter we referred to the official statement that "since the cotton futures Act went into operation, future quotations have fairly reflected spot values in both New York and New Orleans, and also in a general way over the entire South." We made the comment: "Just what is meant by fairly? How can one measure the degree of association between futures and spot values? Or, to put the question in another form, suppose one knew the probable spot values in the South, how could one forecast the price of futures on the cotton exchange at New York?" All of these questions may now be answered in a definite, numerical way:

- the degree of association between futures in New York and spot values in the South is measured by r = .714;
- the formula by which futures may be predicted from the knowledge of spot values is $y=1.45x+.002$;
- the error of the forecasts by means of this formula is measured by S = .06.

… already been dealt with, and only a few words need be said about the official method of estimating acreage. Throughout the whole period under investigation, 1890–1914, the Bureau of Statistics has again and again warned the public that its figures referring to acreage are merely estimates. They are not the results of extensive measurements such as are used by the Bureau of the Census. The Bureau has issued its reports as "the best available data, representing the fullest information obtainable at the time they are made,"¹ and it has frankly pointed out the limitations of the method which it has felt compelled to follow in estimating acreage.

The census figures of the acreage devoted to the several crops, which have appeared every ten years, have been taken by the Bureau of Statistics as the foundation upon which to base its calculation of the acreage under cultivation during the intercensal years. For each year between the census surveys, correspondents were asked to observe whether, as compared with the preceding year, there had been an increase or a decrease in the acreage of cotton in their respective districts. They were asked to express the change as a percentage change.
¹ Annual Report of the Bureau of Statistics for the Fiscal Year 1911–1912. Crop Reporter, December, 1912.

The Bureau of Statistics, using the last returns of the Census as the best available data, has computed the absolute value of the combined percentages of its correspondents. It has issued the result as the Department's estimate of the acreage of the cotton crop for the current year. The Bureau has regarded each of its estimates merely as "a consensus of judgment of many thousands of correspondents,"¹ and it has pointed out "that estimates made monthly from year to year, following each other during a period of 10 years, without means of verification or correction, are likely to be more or less out of line with conditions at the end of the 10-year period as disclosed by actual census enumerations. Cumulative errors, impossible of discovery, are likely to occur and cannot be corrected until census reports are available."² At the appearance of new census figures the Bureau of Statistics has revised its estimates of the preceding intercensal years, and the more recent census figures have been used as the basis of estimates for the following years.

It would doubtless be possible to test the accuracy of the method employed by the Bureau of Statistics for calculating the acreage of the crops. To be more exact, it would be possible to test how nearly the preliminary estimates correspond with the revised estimates. As far as I am aware this test has never been carried out.
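The procedure just described carries the census figure forward year by year with the correspondents' combined percentage changes. It can be sketched in Python as below. The function name and every number are hypothetical.

The point is only to show how a small, persistent bias in the reported percentages compounds, over the ten intercensal years, into the kind of cumulative error the Bureau warned of.

```python
def chain_estimates(census_acreage, pct_changes):
    """Carry a census acreage forward with yearly percentage changes.

    Each year's estimate is the previous estimate multiplied by
    (1 + change/100), the census figure serving as the starting point.
    """
    estimates = []
    current = census_acreage
    for pct in pct_changes:
        current *= 1 + pct / 100
        estimates.append(current)
    return estimates


# Hypothetical case: the true acreage stays at 100 (thousands of acres), but the
# combined reports overstate the change by one point every year.
true_path = chain_estimates(100.0, [0.0] * 10)
reported = chain_estimates(100.0, [1.0] * 10)
for year, (t, e) in enumerate(zip(true_path, reported), start=1):
    print(year, round(e, 2), f"{100 * (e - t) / t:.1f}% off")
```

After ten years a one-point annual overstatement leaves the estimate about ten and a half per cent too high. Only a fresh census would reveal this.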
The Bureau reports that, in the case of some of the crops, there has been a considerable difference between the two estimates.³ If the test were made and the method were found to be unsatisfactory, the problem would then present itself of finding a better method. The solution of that problem would be sought in either of two directions:

- the direct measurement used by the Bureau of the Census could be applied more often than once in ten years, and the method of estimates employed by the Bureau of Statistics checked at shorter intervals; or
- the quantitative connections between variations in acreage and variations in other economic factors could be discovered, and the acreage then computed from these known connections.

¹ Annual Report of the Bureau of Statistics for the Fiscal Year 1911–1912.
² Ibidem.
³ Crop Reporter, May, 1900, p. 2, "Department of Agriculture and the Census."

The former solution, which is undoubtedly the best, is urged by the Department of Agriculture.¹ But an agricultural survey is extremely expensive, and its results are frequently made known when, for many practical purposes, it is too late. The Bureau of Statistics has reported that "the results of the agricultural census which related to 1909 were not published in time to permit a revision of estimates of this Bureau until the close of 1911."² Furthermore, the Bureau of Statistics makes its estimates for current use, yet its estimate of the acreage of cotton is not published until about July 1. A number of industries dependent upon the acreage of cotton would profit by having a reliable estimate earlier in the year. Would it not be possible to have a fair estimate of the probable acreage even before the crop is planted? Table 12 illustrates a method by which the problem just described may be solved.
¹ Annual Report of the Bureau of Statistics for the Fiscal Year 1911–1912.
² Ibidem.

The acreage planted in cotton in any given year is largely dependent upon what has been the fortune, good or bad, of the cotton farmers in preceding years. If, for example, the price of cotton has been falling, fewer acres will be seeded to that particular crop.

TABLE 12. Percentage Change in the Acreage of Cotton and Percentage Change in the Production of Cotton Lint

        Acreage of    Absolute    Percentage   Production of   Absolute      Percentage
        cotton        change in   change in    cotton lint     change in     change in
Year    (thousands    acreage     acreage      (millions of    production    production
        of acres)                              bales)
1888                                            6.92
1889    20,180                                  7.47           +0.55          +7.95
1890    21,886         +1706       +8.45        8.56           +1.09         +14.59
1891    23,876         +1990       +9.09        8.94           +0.38          +4.44
1892    15,228         −8648      −36.22        6.66           −2.28         −25.50
1893    23,837         +8609      +56.53        7.43           +0.77         +11.56
1894    24,959         +1122       +4.71       10.03           +2.60         +34.99
1895    21,896         −3063      −12.27        7.15           −2.88         −28.71
1896    32,823        +10927      +49.90        8.52           +1.37         +19.16
1897    28,861         −3962      −12.08       10.99           +2.47         +28.99
1898    25,174         −3687      −12.78       11.44           +0.45          +4.10
1899    24,278          −896       −3.56        9.35           −2.09         −18.27
1900    24,982          +704       +2.90       10.12           +0.77          +8.24
1901    26,897         +1915       +7.67        9.51           −0.61          −6.03
1902    26,940           +43       +0.16       10.63           +1.12         +11.78
1903    26,952           +12       +0.04        9.85           −0.78          −7.34
1904    31,350         +4398      +16.32       13.44           +3.59         +36.45
1905    27,205         −4145      −13.22       10.58           −2.86         −21.28
1906    31,301         +4096      +15.06       13.27           +2.69         +25.43
1907    29,848         −1453       −4.64       11.11           −2.16         −16.28
1908    32,493         +2645       +8.86       13.24           +2.13         +19.17
1909    31,060         −1433       −4.41       10.00           −3.24         −24.47
1910    32,467         +1407       +4.53       11.61           +1.61         +16.10
1911    36,045         +3578      +11.02       15.69           +4.08         +35.14
1912    34,283         −1762       −4.89       13.70           −1.99         −12.68
1913    37,089         +2806       +8.18
There should therefore, in normal times, be some relation between the percentage change in the price of cotton last year over the preceding year and the percentage change in the acreage of cotton this year over last year. Table 12 presents the data with which to compute the relation between two variables:

- the percentage change in the acreage of a given year over the acreage of the preceding year;
- the percentage change in the price of cotton from the price prevailing two years before the current year to the price the year before the current year.

In the same way that this correlation table was prepared, similar tables were compiled connecting the percentage change in the acreage of cotton with the percentage change, in the preceding year, of other variables. A summary of the calculations is here given. The correlation between the percentage change in the acreage of cotton and

(1) the percentage change of the year before in the total production of cotton lint, r = −.641;
(2) the percentage change of the year before in the price per pound of cotton lint, r = .532;
(3) the percentage change of the year before in the value of the yield per acre of cotton lint, r = .508;
(4) the percentage change of the year before in the acreage of cotton, r = −.492;
(5) the percentage change of the year before in the yield per acre of cotton, r = −.217;
(6) the percentage change of the year before in the index number of general wholesale prices, r = .005.

From these calculations it is clear that the probable acreage can be forecast even before the cotton crop is planted. The degree of accuracy is substantially the same as that with which the Bureau of Statistics can forecast the yield per acre of cotton at the first of September.
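Correlation (1) can be recomputed from the percentage columns of Table 12. In the Python sketch below we assume that each acreage change from 1890 to 1913 is paired with the production change of the year before, 1889 to 1912. That pairing is our reading of the text, not the author's stated worksheet.

We compute r from the definition $r=\pi_{xy}/(\sigma_x\sigma_y)$. With this pairing the result comes out at about −.64, close to the −.641 given above. The sketch also returns the slope and intercept of the line in form (14), together with $S=\sigma_y\sqrt{1-r^2}$, so that the line can serve as an acreage forecast.

```python
import math

# Percentage change in production of cotton lint, 1889-1912 (Table 12).
prod_pct = [7.95, 14.59, 4.44, -25.50, 11.56, 34.99, -28.71, 19.16,
            28.99, 4.10, -18.27, 8.24, -6.03, 11.78, -7.34, 36.45,
            -21.28, 25.43, -16.28, 19.17, -24.47, 16.10, 35.14, -12.68]
# Percentage change in acreage of cotton, 1890-1913 (Table 12), so that each
# acreage change is paired with the production change of the year before.
acre_pct = [8.45, 9.09, -36.22, 56.53, 4.71, -12.27, 49.90, -12.08,
            -12.78, -3.56, 2.90, 7.67, 0.16, 0.04, 16.32, -13.22,
            15.06, -4.64, 8.86, -4.41, 4.53, 11.02, -4.89, 8.18]


def regression(xs, ys):
    """Return r, the slope and intercept of line (14), and S = σ_y√(1 − r²)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    pi_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    r = pi_xy / (sigma_x * sigma_y)
    slope = r * sigma_y / sigma_x
    return r, slope, y_bar - slope * x_bar, sigma_y * math.sqrt(1 - r * r)


r, slope, intercept, S = regression(prod_pct, acre_pct)
print(f"r = {r:.3f}; acreage change = {intercept:.2f} + ({slope:.3f}) x production change; S = {S:.2f}")
```

Given the production change of the year just ended, `intercept + slope * change` is the most probable percentage change in next year's acreage, with scatter S.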
We know from the results of the preceding chapter that, when the correlation between two variables is linear, the scatter of the observations about the line of regression is measured by $S=\sigma_y\sqrt{1-r^2}$. The degree of accuracy with which we can forecast results therefore depends upon the two factors $\sigma_y$ and $\sqrt{1-r^2}$. If we make allowance for the difference between the values of …

… months have increasing value as the crop approaches maturity. The comparative values of $S$ and $S'$ show that in every month, in all four of the representative states, the correlation equation gives a more accurate forecast than the official formula. Moreover, the correlation equation as a forecasting formula does not admit of the anomalous results brought out in the consideration of Table 13. There we found that in a number of cases $S'$ was greater than $\sigma_y$, which signifies that the forecasts are worse than useless. When the correlation equation is employed as a forecasting formula it is impossible for $S$ to exceed $\sigma_y$, since $S=\sigma_y\sqrt{1-r^2}$.

The results collected in the two tables in this section establish two of the theses enunciated at the beginning of the chapter:

(1) that some of the official reports referring to the representative states are valuable as forecasts, but that others are worse than useless, in the sense of supplying erroneous instruction as to the crop outlook and thereby suggesting a misdirection of activity on the part of farmers, dealers, and manufacturers;
(2) that even in the case of the useful forecasts the official method does not extract the full amount of truth contained in the laboriously collected data.
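The bound on S stated above follows in one line from the range of r. The display below sets it out. The sentences after it are a gloss of our own on why an $S'$ greater than $\sigma_y$ marks a forecast as worse than useless.

$$-1\le r\le 1\;\Longrightarrow\;0\le 1-r^2\le 1\;\Longrightarrow\;0\le S=\sigma_y\sqrt{1-r^2}\le\sigma_y.$$

Equality $S=\sigma_y$ holds only when $r=0$. A forecast that simply named the mean yield-ratio every year would err, in the mean-square sense, by exactly $\sigma_y$. An official formula with $S'>\sigma_y$ therefore does worse than that trivial forecast, and nothing in the official method prevents this.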
Forecasting the Yield of Cotton from the Accumulated Effects of the Weather[1]

Throughout the period from the first of May until the end of September, the growth of the cotton plant is watched with anxious solicitude. The changes of the weather may convert a crop that is flourishing at the first of June into a comparative failure at the time of harvest, or the damage of excessive heat in July may be offset by a beneficial rainfall in August. The effects of temperature and rainfall upon the crop vary from state to state, but in all cases the effects are cumulative, and the probable consequences of a rain or drought at any point in the growing season depend upon the quantity and distribution of the rainfall and temperature preceding the time in question.

[1] Professor J. Warren Smith, of the University of Ohio, and Mr. R. H. Hooker, of London, have been pioneers in dealing with special phases of the topic treated in this section. Professor Smith was one of the first to see the economic importance of forecasting the yield of the crops from the weather, and Mr. Hooker, as far as I know, led the way in the use of the method of multiple correlation to measure the joint effect of temperature and rainfall upon the yield of the crops.

The principal difficulty in forecasting the yield of the crop from the changes in the weather is that there are so many variables in the problem and all of the variables are interrelated. In a particular state it may be that the growing plant needs a cool, dry July and a rainy, hot August; but temperature and rainfall may be so interrelated that, on the average, in July when the weather is cool it is likewise rainy, and in August when the weather is hot it is also dry; and it might be more important that the crop should have rain in August than that July should be cool. The economist who, at any given time in the growth period, seeks to forecast the yield of cotton at harvest must be able to measure the effects upon the crop of the accumulated variations in temperature and rainfall up to the time in question.

Before passing to the account of the method adopted to measure the accumulated effect of the weather upon the growing crop, we shall consider the device for meeting the difficulties that are traceable to the secular trend and the cyclical variations in the yield per acre, the rainfall, and the temperature. In some states the yield per acre of cotton throughout the period under investigation has steadily increased, while in other states it has steadily decreased. Moreover, in all of the states the yield per acre, temperature, and rainfall have, during the same interval, been subjected to cyclical influences. In order to measure the relation between the yield per acre and the variations in the weather, we must make allowance for these secular and cyclical changes, and to do this we have profited by the experience of the Bureau of Statistics of the Department of Agriculture. We recall that, in order to forecast from the condition of the crop in any month the probable yield of the crop at the end of the year, the Bureau of Statistics does not work directly with the absolute values of the condition and the yield, but takes the condition-ratio and the yield-ratio, the forecasting formula being $C/C_5 = Y/Y_5$. In the preceding chapter we showed that equally good results would be obtained by using the formula $C/C_3 = Y/Y_3$; that is to say, instead of making the denominators five-year averages, to employ three-year averages. One advantage of the latter method is that when the available data are few and as many as possible must be utilized, the three-year method gives a larger number of cases upon which to base one's computations.
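The official rule $C/C_5 = Y/Y_5$ (or its three-year variant) predicts the yield by scaling the mean yield of the preceding years by the condition-ratio. A hedged Python sketch with hypothetical helper names and made-up figures; it assumes that $S'$ is the root-mean-square difference between the condition-ratio, taken as the forecast of the yield-ratio, and the yield-ratio actually realized.

```python
import math
from statistics import mean


def official_forecast(condition, past_conditions, past_yields):
    """Yield per acre implied by C / C_k = Y / Y_k, where C_k and Y_k are the
    means of the condition figures and yields for the k preceding years."""
    if not past_conditions or len(past_conditions) != len(past_yields):
        raise ValueError("need the same, non-empty run of preceding years for C and Y")
    return mean(past_yields) * condition / mean(past_conditions)


def official_error(condition_ratios, yield_ratios):
    """S' (assumed form): root-mean-square gap between the condition-ratio,
    used as the forecast, and the realized yield-ratio."""
    if not condition_ratios or len(condition_ratios) != len(yield_ratios):
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(mean((c - y) ** 2 for c, y in zip(condition_ratios, yield_ratios)))


# Hypothetical three-year base: mean condition 80, mean yield 200 lb of lint.
print(official_forecast(88, [70, 80, 90], [180, 200, 220]))
print(official_error([97, 103], [100, 100]))
```

With these made-up figures the forecast is 220.0 pounds of lint and the error is 3.0. Passing three or five preceding years switches between the two variants discussed in the text.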
We shall use this three-year method in correlating the temperature-ratios and rainfall-ratios of the several months with the yield-ratios of cotton. For each month the series to be correlated will be $T/T_3$, $Y/Y_3$; and $R/R_3$, $Y/Y_3$. In these formulae $T$ is the average temperature for the given month, and $T_3$ is the average temperature for the same month during the preceding three years; $R$ is the amount of rainfall for the given month, and $R_3$ is the average amount of rainfall for the same month during the preceding three years; $Y$ is the yield per acre of cotton for the given year, and $Y_3$ is the average yield per acre of cotton for the three years preceding the given year.

In Table 15 the method of preparing the data for computing the correlation is illustrated by the correlation, in Georgia, between the June temperature-ratio and the yield-ratio of cotton. When the two series that are given in columns VIII and IX of Table 15 are correlated, it is found that $r = .551$, and this value of $r$ gives for the value of the scatter $S = \sigma_y\sqrt{1-r^2} = 13.89\sqrt{1-r^2} = 11.59$. By referring to Table 13 we find that the official method of forecasting from the condition of the crop gives for the month of June, in Georgia, $S' = 15.20$. The official forecast, inasmuch as $\sigma_y = 13.89$, is worse than useless, while the forecast of the crop from the June temperature has a decided value. We also know from Table 13 that the official method of forecasting from the condition of the crop gives for the month of May, in Georgia, a value of $S' = 17.28$, which is also worse than useless. But the correlation between the May rainfall-ratio in Georgia and the yield-ratio in Georgia gives $r = -.410$, and $S = 12.67$. These two illustrations show that the cotton crop in Georgia is favorably affected by a dry May and a warm June.
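Preparing the ratio series can be sketched as follows. This is an illustration under stated assumptions, not the author's worksheet: ratios are expressed in per cent (the means near 100 quoted in this chapter suggest this), moments are population moments, and the eight years of temperatures and yields are invented, not the Georgia record.

```python
import math
from statistics import mean, pstdev


def trailing_ratios(values, k=3):
    """Each value divided by the mean of the k preceding values, in per cent.
    The first k years have no complete base and are dropped."""
    return [100.0 * values[t] / mean(values[t - k:t]) for t in range(k, len(values))]


def pearson_r(xs, ys):
    """Coefficient of correlation computed with population moments."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))


# Hypothetical June mean temperatures and yields (lb of lint) for eight years.
june_temp = [78.0, 79.5, 80.1, 77.2, 81.0, 79.0, 80.5, 78.4]
yields = [190.0, 205.0, 210.0, 180.0, 230.0, 200.0, 220.0, 195.0]

t_ratio = trailing_ratios(june_temp)  # T / T_3
y_ratio = trailing_ratios(yields)     # Y / Y_3
r = pearson_r(t_ratio, y_ratio)
s = pstdev(y_ratio) * math.sqrt(1 - r * r)  # scatter of the two-variable forecast
print(len(t_ratio), round(r, 3), round(s, 2))
```

Dropping the first three years is the price of the three-year base; a five-year base would drop five, which is the advantage of the shorter base noted in the text.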
How would it be possible to utilize the knowledge both of the rainfall in May and of the temperature in June to forecast, at the end of June, the probable yield of the crop? This question brings us to a consideration of the method of multiple correlation.

[TABLE 15. Georgia: preparation of the June temperature-ratio and the yield-ratio of cotton. The numerical entries are not recoverable from the scan. Recoverable column headings: II, June mean temperature, $T$; III, sum for preceding three years in Column II; IV, mean temperature for three years (Column III divided by three), $T_3$; V, yield per acre of cotton in pounds of lint, $Y$; VI, sum for preceding three years in Column V; VII, mean yield per acre of cotton (Column VI divided by three), $Y_3$; VIII, June temperature-ratio, $T/T_3$.]

In Chapter II, "The Mathematics of Correlation," we developed in considerable detail the theory of the correlation between two variables. The essential steps are:

(1) The assumption[1] that the two variables are related in a linear way by the equation $y = mx + b$;

(2) The calculation of the coefficient of correlation $r$; and

(3) The determination of the accuracy of the equation $y = mx + b$ as a forecasting formula by calculating the scatter $S = \sigma_y\sqrt{1-r^2}$.

In the theory of multiple correlation the essential steps run parallel to those in the theory of the correlation of two variables.
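The three two-variable steps can be checked on a toy series. The sketch below (hypothetical data, population moments) fits $y = mx + b$ with $m = r\sigma_y/\sigma_x$ and confirms that the root-mean-square residual about the least-squares line equals $\sigma_y\sqrt{1-r^2}$. This identity is why that expression measures the accuracy of the forecasting formula.

```python
import math
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Least-squares line y = m*x + b, with m = r * sigma_y / sigma_x."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    r = mean((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)  # step (2)
    m = r * sy / sx
    b = my - m * mx  # the line passes through (mean x, mean y)
    return m, b, r, sy


xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical data
ys = [2.0, 3.0, 5.0, 4.0]
m, b, r, sy = fit_line(xs, ys)

# Step (3): the scatter equals the root-mean-square residual about the line.
rms = math.sqrt(mean((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)))
S = sy * math.sqrt(1 - r * r)
print(round(m, 3), round(b, 3), round(r, 3), round(rms, 4), round(S, 4))
```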
In the case of three variables the three steps are:

(1) The assumption that the equation connecting the three variables is $x_0 = a_0 + a_1x_1 + a_2x_2$;

(2) The determination of the degree of association between the variable $x_0$ and the other two variables $x_1$, $x_2$ by calculating the coefficient of multiple correlation $R$;

(3) The determination of the accuracy of the equation $x_0 = a_0 + a_1x_1 + a_2x_2$ as a forecasting formula by calculating the scatter $S'' = \sigma_0\sqrt{1-R^2}$.

[1] There are methods for testing the legitimacy of the assumption, which should, of course, be applied.

We found, in the theory of the correlation of two variables, that the forecasting formula $y = mx + b$ may be put into the form $(y - \bar y) = m(x - \bar x)$. In a similar manner, when three variables $x_0$, $x_1$, $x_2$ are correlated, the forecasting formula may be put into the form $(x_0 - \bar x_0) = a_1(x_1 - \bar x_1) + a_2(x_2 - \bar x_2)$. Our statistical problem will be solved if we can determine from the statistical data the following three items:

(1) The values of the coefficients of $(x_1 - \bar x_1)$ and $(x_2 - \bar x_2)$ in the forecasting formula. These values are

$$a_1 = \frac{r_{01} - r_{02}r_{12}}{1 - r_{12}^2}\cdot\frac{\sigma_0}{\sigma_1}, \qquad a_2 = \frac{r_{02} - r_{01}r_{12}}{1 - r_{12}^2}\cdot\frac{\sigma_0}{\sigma_2}.$$

Here $r_{01}$ is the coefficient of correlation between the variables $x_0$, $x_1$; $r_{02}$ is the coefficient of correlation between $x_0$ and $x_2$; $r_{12}$ the correlation between $x_1$ and $x_2$; $\sigma_0$ is the standard deviation of the $x_0$'s; $\sigma_1$ is the standard deviation of the $x_1$'s; and $\sigma_2$ the standard deviation of the $x_2$'s. When these values of $a_1$, $a_2$ are substituted in the forecasting formula $(x_0 - \bar x_0) = a_1(x_1 - \bar x_1) + a_2(x_2 - \bar x_2)$, the most probable value of $x_0$ may be calculated from the known values of $x_1$, $x_2$.

(2) The value of the coefficient of multiple correlation, $R$. In the case of three variables $x_0$, $x_1$, $x_2$,

$$R^2 = \frac{r_{01}^2 + r_{02}^2 - 2r_{01}r_{02}r_{12}}{1 - r_{12}^2}.$$

(3) The value of the scatter, $S'' = \sigma_0\sqrt{1-R^2}$, which measures the root-mean-square of the deviations of the observed values of $x_0$ from the most probable values of $x_0$ when the most probable values are predicted from the forecasting formula $(x_0 - \bar x_0) = a_1(x_1 - \bar x_1) + a_2(x_2 - \bar x_2)$.

We may proceed at once to illustrate the method by showing how the yield-ratio of cotton, in Georgia, may be predicted from the rainfall-ratio for May and the temperature-ratio for June. Let the yield-ratio be $x_0$, the rainfall-ratio for May be $x_1$, and the temperature-ratio for June be $x_2$. The forecasting formula is $(x_0 - \bar x_0) = a_1(x_1 - \bar x_1) + a_2(x_2 - \bar x_2)$. From the statistical data we find that

(1) The mean values of $x_0$, $x_1$, $x_2$ are, respectively, $\bar x_0 = 104.05$; $\bar x_1 = 107.04$; $\bar x_2 = 100.34$;

(2) The standard deviations of $x_0$, $x_1$, $x_2$ are, respectively, $\sigma_0 = 13.890$; $\sigma_1 = 65.528$; $\sigma_2 = 3.142$;

(3) The coefficient of correlation between $x_0$ and $x_1$ is $r_{01} = -.410$; between $x_0$ and $x_2$, $r_{02} = .551$; between $x_1$ and $x_2$, $r_{12} = -.427$.

Substituting these numerical values for the algebraic symbols in the formulas for $a_1$ and $a_2$ gives $a_1 = -.045$ and $a_2 = 2.033$. After the proper substitutions have been made and the equation simplified, the forecasting formula becomes $x_0 = -95.12 - .045x_1 + 2.033x_2$. Substituting the same values in the formula for $R^2$ gives $R^2 = .340933$, or $R = .584$. Furthermore, since $S'' = \sigma_0\sqrt{1-R^2}$, we get as the numerical measure of the accuracy of the forecasting formula $S'' = 11.28$. We have seen that, by means of the forecast from the weather, we get a forecasting formula whose error is smaller than $\sigma_0$ even in May, while the Government report for May we have found to be erroneous and misleading.
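The Georgia arithmetic for three variables can be reproduced from the summary statistics alone. This is a minimal sketch: the function name is ours, and the inputs are the means, standard deviations, and correlations printed in the text.

```python
import math


def three_variable(r01, r02, r12, s0, s1, s2):
    """Coefficients a1, a2 and multiple correlation R for
    (x0 - m0) = a1 (x1 - m1) + a2 (x2 - m2), using the formulas in the text."""
    d = 1.0 - r12 ** 2
    a1 = (r01 - r02 * r12) / d * s0 / s1
    a2 = (r02 - r01 * r12) / d * s0 / s2
    r_sq = (r01 ** 2 + r02 ** 2 - 2.0 * r01 * r02 * r12) / d
    return a1, a2, math.sqrt(r_sq)


# Georgia: x0 = yield-ratio, x1 = May rainfall-ratio, x2 = June temperature-ratio.
means = (104.05, 107.04, 100.34)
sds = (13.890, 65.528, 3.142)
a1, a2, R = three_variable(-0.410, 0.551, -0.427, *sds)
S2 = sds[0] * math.sqrt(1.0 - R ** 2)  # scatter S''
print(round(a1, 4), round(a2, 4), round(R, 3), round(S2, 2))

# Intercept a0 = m0 - a1*m1 - a2*m2, from the coefficients as printed (-.045, 2.033).
a0 = means[0] - (-0.045) * means[1] - 2.033 * means[2]
print(round(a0, 2))
```

Run as written, it should give coefficients close to the printed $-.045$ and $2.033$, $R \approx .584$, $S'' \approx 11.28$, and an intercept near $-95.12$. The unrounded $a_2$ comes out near 2.0325, so its last printed digit differs slightly. Because $\bar x_2$ is about 100, a change of .001 in $a_2$ moves the intercept by about .1, which is why the sketch computes $a_0$ from the coefficients as printed.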
Furthermore, by means of the additional information given by the weather reports for June, we have been able to reduce the error of the forecast still further: the value of $S'' = \sigma_0\sqrt{1-R^2}$ for June is 11.28.

We may at this point take stock of our gains. From Table 13 we know that, according to the official method, the errors of the forecasts are: for May, $S' = 17.28$; for June, $S' = 15.20$; for July, $S' = 12.17$; for August, $S' = 11.92$; and for September, $S' = 11.08$. But by means of the method of forecasting from the data of the weather, we get, for May, $S'' = 12.67$, and for June, $S'' = 11.28$, where, in the June forecast, we use the rainfall-ratio for May and the temperature-ratio for June. Not only are the forecasts from the weather for these two months better than the forecasts by the official method from the condition of the crop, but the value of $S''$ for May is about as good as the value of $S'$ two months later, at the end of July; and the value of $S''$ for June is about as good as the value of $S'$ two or three months later, at the end, respectively, of August and September.

But our method admits of still further usefulness. From the accompanying Table 16, we see that the fruitfulness of the cotton crop is affected not only by the weather of May and June but also by that of July and of August. The coefficients for both temperature and rainfall in August have significant values. If at the end of August we should wish to forecast the yield of cotton from the accumulated effects of past weather (for example, of the May rainfall, the June temperature, and the August temperature), we have only to extend the principles of multiple correlation to cover four variables. The steps in the development are again three in number, and they run parallel with the three steps that we have already traversed in describing the correlation between two variables and the correlation between three variables.

TABLE 16.
Georgia. Correlation between the Yield-Ratio of Cotton and the Temperature-Ratio and Rainfall-Ratio (values of the coefficient of correlation)

| | May | June | July | August | September |
|---|---|---|---|---|---|
| Temperature-ratio | -.097 | .551 | -.032 | -.499 | .082 |
| Rainfall-ratio | -.410 | -.411 | -.254 | .426 | -.188 |

The three steps are:

(1) The assumption that the equation connecting the four variables $x_0$, $x_1$, $x_2$, $x_3$ is $x_0 = a_0 + a_1x_1 + a_2x_2 + a_3x_3$. It is not difficult to prove that this forecasting equation may be put into the form $(x_0 - \bar x_0) = a_1(x_1 - \bar x_1) + a_2(x_2 - \bar x_2) + a_3(x_3 - \bar x_3)$. By the Method of Least Squares the values of the coefficients in the forecasting equation are ascertained to be

$$a_1 = \frac{r_{01}(1 - r_{23}^2) + r_{02}(r_{13}r_{23} - r_{12}) + r_{03}(r_{12}r_{23} - r_{13})}{D}\cdot\frac{\sigma_0}{\sigma_1},$$

$$a_2 = \frac{r_{02}(1 - r_{13}^2) + r_{03}(r_{12}r_{13} - r_{23}) + r_{01}(r_{13}r_{23} - r_{12})}{D}\cdot\frac{\sigma_0}{\sigma_2},$$

$$a_3 = \frac{r_{03}(1 - r_{12}^2) + r_{01}(r_{12}r_{23} - r_{13}) + r_{02}(r_{12}r_{13} - r_{23})}{D}\cdot\frac{\sigma_0}{\sigma_3},$$

where $D = (1 - r_{23}^2) + r_{12}(r_{13}r_{23} - r_{12}) + r_{13}(r_{12}r_{23} - r_{13})$.

(2) The determination of the degree of association between $x_0$ and the three variables $x_1$, $x_2$, $x_3$ by calculating the coefficient of multiple correlation $R$, where

$$R^2 = \frac{r_{01}^2 + r_{02}^2 + r_{03}^2 - (r_{01}^2r_{23}^2 + r_{02}^2r_{13}^2 + r_{03}^2r_{12}^2) - 2(r_{01}r_{02}r_{12} + r_{01}r_{03}r_{13} + r_{02}r_{03}r_{23}) + 2(r_{01}r_{02}r_{13}r_{23} + r_{01}r_{03}r_{12}r_{23} + r_{02}r_{03}r_{12}r_{13})}{D};$$

(3) The determination of the accuracy of $(x_0 - \bar x_0) = a_1(x_1 - \bar x_1) + a_2(x_2 - \bar x_2) + a_3(x_3 - \bar x_3)$ as a forecasting formula by calculating the scatter $S'' = \sigma_0\sqrt{1-R^2}$.

To illustrate the use of this more complex forecasting formula we shall go through the work of ascertaining how the yield-ratio $x_0$ may be predicted from the knowledge of three variables: $x_1$ = May rainfall-ratio, $x_2$ = June temperature-ratio, $x_3$ = August temperature-ratio. From the statistical data we find that

(1) The mean values of $x_0$, $x_1$, $x_2$, $x_3$ are, respectively, $\bar x_0 = 104.05$; $\bar x_1 = 107.04$; $\bar x_2 = 100.34$; $\bar x_3 = 100.15$;

(2) The standard deviations of $x_0$, $x_1$, $x_2$, $x_3$ are, respectively, $\sigma_0 = 13.890$; $\sigma_1 = 65.528$; $\sigma_2 = 3.142$; $\sigma_3 = 1.761$;

(3) The coefficients of correlation are $r_{01} = -.410$; $r_{02} = .551$; $r_{03} = -.499$; $r_{12} = -.427$; $r_{13} = .014$; $r_{23} = -.126$.
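The four-variable formulas can be evaluated in the same way. This sketch assumes the closed forms above, which follow from solving the least-squares normal equations by determinants; the function name is ours.

```python
import math


def four_variable(r01, r02, r03, r12, r13, r23, s0, s1, s2, s3):
    """a1, a2, a3 and R for (x0 - m0) = a1(x1 - m1) + a2(x2 - m2) + a3(x3 - m3),
    following the closed forms in the text; all share the denominator `den`."""
    den = (1 - r23 ** 2) + r12 * (r13 * r23 - r12) + r13 * (r12 * r23 - r13)
    n1 = r01 * (1 - r23 ** 2) + r02 * (r13 * r23 - r12) + r03 * (r12 * r23 - r13)
    n2 = r02 * (1 - r13 ** 2) + r03 * (r12 * r13 - r23) + r01 * (r13 * r23 - r12)
    n3 = r03 * (1 - r12 ** 2) + r01 * (r12 * r23 - r13) + r02 * (r12 * r13 - r23)
    # R^2 equals the sum of each standardized coefficient n_i/den times r_0i;
    # expanding this sum gives the long expression for R^2 in the text.
    r_sq = (n1 * r01 + n2 * r02 + n3 * r03) / den
    return n1 / den * s0 / s1, n2 / den * s0 / s2, n3 / den * s0 / s3, math.sqrt(r_sq)


# Georgia: x0 yield-ratio; x1 May rainfall-, x2 June temperature-, x3 August temperature-ratio.
MEANS = (104.05, 107.04, 100.34, 100.15)
SDS = (13.890, 65.528, 3.142, 1.761)
a1, a2, a3, R = four_variable(-0.410, 0.551, -0.499, -0.427, 0.014, -0.126, *SDS)
S2 = SDS[0] * math.sqrt(1 - R ** 2)  # scatter S''
print(round(a1, 3), round(a2, 3), round(a3, 3), round(R, 3), round(S2, 2))

# Intercept a0 = m0 - a1*m1 - a2*m2 - a3*m3, from the coefficients rounded as printed.
printed = (-0.050, 1.743, -3.518)
a0 = MEANS[0] - sum(a * m for a, m in zip(printed, MEANS[1:]))
print(round(a0, 2))
```

It should print $-0.05$ (that is, $-.050$), 1.743, $-3.518$, $R = .732$, $S'' = 9.46$, and an intercept of 286.84, matching the forecasting formula reported next in the text.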
When these numerical values are substituted for the algebraic symbols in the above formulae, we obtain for the forecasting formula $x_0 = 286.84 - .050x_1 + 1.743x_2 - 3.518x_3$; for the coefficient of multiple correlation, $R = .732$; and for the scatter, $S'' = 9.46$.

In Figure 9 the continuous line represents the yield-ratios for the several years as they are computed from the actual statistics by means of the formula $Y/Y_3$; the dashed line represents the yield-ratios as they are computed from the rainfall-ratio of May, the temperature-ratio of June, and the temperature-ratio of August by means of the forecasting formula $x_0 = 286.84 - .050x_1 + 1.743x_2 - 3.518x_3$. The root-mean-square deviation of the actual ratios from the predicted ratios is $S'' = 9.46$.

Figure 10 illustrates the degree of precision in the forecast of the yield of cotton in Georgia at the end of August by means of the official method. The continuous line represents the actual yield-ratios, computed by the formula $Y/Y_5$; and the dashed line represents the theoretical yield-ratios as they are predicted by the formula $C/C_5$. The root-mean-square deviation of the actual ratios from the predicted ratios is $S' = 11.92$.

[Figure 9. Actual yield-ratio and predicted yield-ratio (plot not reproduced).]

[Figure 10. Actual yield-ratio and predicted yield-ratio (plot not reproduced).]

If now we compare the value of $S''$ for August with the value of $S'$ for September, that is, $S' = 11.08$, we see that from the weather data we can obtain, by means of mathematical methods, a better forecast at the end of August than the official method enables us to obtain from the condition of the crop at the end of September. A clearer view of the value of forecasting the cotton yield from the weather reports, by means of the methods that we have described, may be had from the following Table 17.
TABLE 17. Relative Accuracy of the Forecasts of the Yield per Acre of Cotton (1) from the Condition of the Crop, by the Official Method, and (2) from the Weather Reports, by the Method of Correlation. Georgia

| Error of the forecasts | May | June | July | August | September |
|---|---|---|---|---|---|
| $S'$ (condition of the crop, official method) | 17.28 | 15.20 | 12.17 | 11.92 | 11.08 |
| $S''$ (weather, method of correlation) | 12.67 | 11.28 | 9.94 | 9.46 | 9.46 |

In computing $S''$ we used: for May, the rainfall-ratio; for June, the May rainfall-ratio and the June temperature-ratio; for July, the May rainfall-ratio, the June temperature-ratio, and the July rainfall-ratio; for August and September, the May rainfall-ratio, the June temperature-ratio, and the August temperature-ratio.

From Table 17 it is seen that not only is $S''$ less than $S'$ for every month, but $S''$ for May is about as good as $S'$ two months later, at the end of July; $S''$ for June is about as good as $S'$ two months later, at the end of August; $S''$ for July is better than $S'$ two months later, at the end of September; and $S''$ at the end of August is better than $S'$ at the end of September.

As far as the state of Georgia is concerned we have proved our thesis: that it is possible, by means of the weather reports and mathematical methods, to forecast the yield per acre of cotton with a greater degree of precision than is attained by the reports of the official Bureau, with its vast organization for the collection and reduction of data referring to the condition of the growing crop.

The Results Compared for the Representative States

The thesis that has just been proved for the state of Georgia we shall now test for the representative states, Texas, Georgia, Alabama, and South Carolina, which together, in 1914, produced 65 per cent of the total crop of the United States.
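Before turning to the other states, the Georgia comparisons read from Table 17 can be restated as strict inequalities. This small sketch uses the figures of that table. Because "about as good as" in the text is a judgement, the sketch only records which pairs satisfy $S'' < S'$.

```python
MONTHS = ["May", "June", "July", "August", "September"]
S_OFFICIAL = [17.28, 15.20, 12.17, 11.92, 11.08]  # S': condition of the crop, official method
S_WEATHER = [12.67, 11.28, 9.94, 9.46, 9.46]      # S'': weather, method of correlation


def compare_with_later(lag):
    """Pair each month's S'' with the official S' `lag` months later."""
    return [(MONTHS[i], S_WEATHER[i], S_OFFICIAL[i + lag], S_WEATHER[i] < S_OFFICIAL[i + lag])
            for i in range(len(MONTHS) - lag)]


print(all(w < o for w, o in zip(S_WEATHER, S_OFFICIAL)))  # same month
for row in compare_with_later(2):
    print(row)
```

The May pair (12.67 against 12.17 at the end of July) fails the strict test, which matches the text's wording that it is only "about as good".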
In Table 18 are exhibited the correlations between the yield-ratios of cotton and the temperature-ratios and rainfall-ratios of the representative states.[1] If we refer to the earlier part of this chapter we shall see that the test of the accuracy of the official crop reports rests upon the yield-ratios and condition-ratios for the period 1894-1914. In order that the accuracy of the forecasts from the weather may be more fairly compared with the accuracy of the forecasts from the condition of the crop, the period of the weather observations has been taken, in the case of each state, as near as possible to the period 1894-1914. The Texas ratios are from 1894 to 1914; those of Georgia from 1895 to 1914; of South Carolina, 1894 to 1914. Because of the limited meteorological record in Alabama, the longest series of ratios that could be obtained runs from 1898 to 1915. The raw data are given in Tables 27, 28, 29, 30 of the Appendix to this chapter, and in computing the coefficients of correlation all of the data in the Tables have been used exactly as they are recorded.[2] It would have been possible, on several occasions, to increase the coefficients by omitting one or two rainfall-ratios which, in consequence of torrential storms, presented unduly large values; but no such liberty has been taken with the crude material, although for purposes of forecasting the yield in normal times such a procedure might have been justifiable.

[TABLE 18. Correlations between the yield-ratios of cotton and the temperature-ratios and rainfall-ratios of the representative states: figures not recoverable from the scan.]

[1] The statistics of the condition of the crop and of the yield per acre of cotton were kindly supplied to me by Mr. Leon M. Estabrook and Mr. George K. Holmes of the U. S. Department of Agriculture. The figures are reproduced in Tables 23, 24, 25, 26 of the Appendix to this chapter. The weather data for Georgia, Alabama, and South Carolina were taken from the publication of the U. S. Weather Bureau, Climatological Data for 1915, and refer, in each case, to the mean temperature and mean rainfall for the entire state. The coefficients of correlation between the yield-ratios and the weather-ratios were based upon 20 ratios in the case of Georgia (1895-1914); 17 ratios in the case of Alabama (1898-1915); 21 ratios in the case of South Carolina; and 21 ratios in the case of Texas (1894-1914). Because of the great size of Texas and the concentration of the cotton production in the Eastern and Central parts of the state, the mean temperature and mean rainfall were computed for those two sections from the records for the individual stations that are given in the Annual Reports of the Chief of the U. S. Weather Bureau. The thirty-one stations that were selected for the rainfall record were: Abilene, Albany, Austin, Brenham, Brownwood, Claytonville, Coleman, College Station, Corsicana, Dallas, Fairland, Fort Worth, Fredericksburg, Gainesville, Graham, Greenville, Huntsville, Lampasas, Longview, Menardville, Nacogdoches, Palestine, Panter, Paris, San Angelo, Sulphur Springs, Taylor, Temple, Tyler, Waco, Weatherford. The seventeen stations the records of which were used in compiling the mean temperature were: Abilene, Brenham, Brownwood, Corsicana, Dallas, Fort Worth, Greenville, Huntsville, Lampasas, Longview, Nacogdoches, Palestine, Paris, Taylor, Temple, Waco, Weatherford. The weather data for all four states are given in Tables 27, 28, 29, 30 of the Appendix to this chapter.

[2] The omission of the August rainfall for 1914, in Texas, is recorded in the Notes to Table 19.

A summary view of the relative accuracy of the forecast of the yield from the condition of the crop and the forecast from the accumulated rainfall and temperature is given in Table 19. In considering the following comments on Table 19 we shall bear in mind that $S'$ measures the accuracy of the forecasts from the condition of the crop by the official method, and $S''$ the accuracy of the forecasts from the weather by the method of correlation. The more accurate the forecasts, the smaller are the respective values of $S'$ and $S''$.

(1) There are four representative states (Texas, Georgia, Alabama, and South Carolina), and there are five monthly reports on the condition of the growing crop, describing the condition of the crop at the end of May, June, July, August, and September. There are, therefore, twenty cases in which the accuracy of the two methods may be compared. Table 19 shows that in 17 out of 20 cases the forecast from the weather by the method of correlation is more accurate than the forecast by the official method from the condition of the growing crop.

(2) For all of the representative states the forecasts by the official method from the May condition of the crop are worse than useless, because the values of $S'$ are larger than the corresponding values of $\sigma_y$.
On the contrary, the forecasts from the May weather by the method of correlation have a real value.[1] The forecasts from the weather for Georgia and South Carolina are, at the end of May, better than the official forecasts for June, and nearly as good as the official forecasts at the end of July; and the forecast for Alabama at the end of May is nearly as good as the official forecast at the end of September.

[1] The last of the Notes on Table 19 should be consulted.

TABLE 19. Relative Accuracy of the Forecasts of the Yield per Acre of Cotton (1) from the Condition of the Crop, by the Official Method, and (2) from the Weather Reports, by the Method of Correlation

$\sigma_y$: standard deviation of the yield-ratios; $S'$: error of the forecast from the condition of the crop; $S''$: error of the forecast from the weather.

| Representative state | $\sigma_y$ | May $S'$ | May $S''$ | June $S'$ | June $S''$ | July $S'$ | July $S''$ | August $S'$ | August $S''$ | September $S'$ | September $S''$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Texas | 24.64 | 26.38 | 25.31 | 22.11 | 25.15 | 19.23 | 22.58 | 17.86 | 16.80 | 13.77 | 16.80 |
| Georgia | 13.89 | 17.28 | 12.67 | 15.20 | 11.28 | 12.17 | 9.94 | 11.92 | 9.46 | 11.08 | 9.46 |
| Alabama | 13.37 | 18.65 | 10.40 | 17.59 | 9.70 | 13.58 | 9.52 | 12.24 | 9.19 | 11.66 | 9.19 |
| South Carolina | … | 21.90 | 17.42 | 19.06 | 16.10 | 17.02 | 16.10 | 19.03 | 14.98 | 15.28 | 14.98 |

Notes to Table 19. In computing $S'$, the formula $S' = \sqrt{\sum (C/C_5 - Y/Y_5)^2 / N}$ was used for every state and every month. In obtaining $S''$ the formulas $S'' = \sigma_y\sqrt{1-r^2}$ and $S'' = \sigma_y\sqrt{1-R^2}$ were used according as one or more independent variables were employed. The combinations of variables were:

In the case of Texas: for May, the temperature-ratio; for June, the rainfall-ratio; for July, the temperature-ratio; for August and September, the July temperature-ratio, the August temperature-ratio, and the August rainfall-ratio. In computing the correlation between the yield-ratio and the rainfall-ratio for August, the rainfall data for 1914 were not used.
In the case of Georgia: for May, the rainfall-ratio; for June, the May rainfall-ratio and the June temperature-ratio; for July, the May rainfall-ratio, the June temperature-ratio, and the July rainfall-ratio; for August and September, the May rainfall-ratio, the June temperature-ratio, and the August temperature-ratio.

In the case of Alabama: for May, the rainfall-ratio; for June, the May rainfall-ratio and the June rainfall-ratio; for July, the May rainfall-ratio, the June rainfall-ratio, and the July rainfall-ratio; for August and September, the May rainfall-ratio, the June rainfall-ratio, and the August temperature-ratio.

In the case of South Carolina: for May, the rainfall-ratio; for June and July, the May rainfall-ratio, the June rainfall-ratio, and the June temperature-ratio; for August and September, the May rainfall-ratio, the June temperature-ratio, and the August temperature-ratio.

The ratios that were correlated were obtained from the official statistics by means of the formulas $T/T_3$, $R/R_3$, $Y/Y_3$, where the symbols refer, respectively, to the temperature, rainfall, and yield per acre of cotton. The values of $\sigma_y$ are the standard deviations of the yield-ratios when the five-year progressive means are used. This explains the rather anomalous result that $S''$, in the case of Texas, is for May and June larger than $\sigma_y$: when the three-year means are the basis of the yield-ratios, the standard deviation is 25.60.

(3) For three out of the four states the prediction from the June condition of the crop by the official method is worse than useless, since the values of $S'$ are greater than the corresponding values of $\sigma_y$. But the forecasts from the weather are in all three cases of decided value, being in all three cases better than the official forecasts for the following month.

(4) For all of the states except Texas the forecast from the weather gives, for each month, a more accurate prediction than can be obtained by the official method from the condition of the crop one month later.
That is to say, for all of the states except Texas, $S''$ for May is smaller than $S'$ for June; $S''$ for June is smaller than $S'$ for July; $S''$ for July is smaller than $S'$ for August; and $S''$ for August is smaller than $S'$ for September.

(5) For all of the states except Texas the forecasts from the weather give for May, June, and July about as good predictions as can be obtained by the official method from the condition of the crop two months later. (In six out of the nine possible cases $S''$ is less than $S'$ two months later.)

Considering the character of these findings it is clear that, as far as concerns the representative states which produce sixty-five per cent of the total cotton crop of the United States, we may conclude, in terms of our thesis: "Notwithstanding the vast official organization for collecting and reducing data bearing upon the condition of the growing crop, it is possible, by means of mathematical methods, to make more accurate forecasts than the official reports, in the matter of the prospective yield per acre of cotton, simply from the data supplied by the Weather Bureau as to the current records of rainfall and temperature in the respective cotton states."

Three Possible Objections

The substance of this section, which is technical in character, is intended to meet three quite natural objections:

(1) In the preceding chapter and in the early part of the present chapter, the defects in the official forecasting formula were pointed out, and the method of correlation, with the equation $(y - \bar y) = r\frac{\sigma_y}{\sigma_x}(x - \bar x)$, was shown to give better results in all of the representative states and in every month of the growing season.
If this better forecasting formula were applied to the official data referring to the condition of the growing crop, would not the forecasts of yield per acre be better than the forecasts which we have been able to make from the current records of temperature and rainfall in the cotton states?

(2) Although data as to the condition of the growing cotton crop have been officially collected and published since 1866, the official Bureaus refrained, until 1911, from interpreting their own data in the definite form of quantitative forecasts. May not the defects in the official forecasts which we have located and measured be due to the fact that the official prediction formula, which was promulgated in 1911, has been applied in this Essay to data running through a quarter of a century?

(3) The problem of measuring the relation between the yield per acre of cotton and the amount of rainfall and the temperature at various epochs in its period of growth presents theoretical and practical difficulties that leave any attempt at solution open to possible objections. The chief difficulties are (1) that the series of available data (the yield-per-acre series and the series of rainfall and temperature records) are summary results of three classes of changes with three different sets of causes: (a) secular changes, (b) cyclical changes, (c) random changes; and (2) that there is no known statistical method that will enable one, with series as short as ours, to segregate satisfactorily the effects of these three types of changes. May it not be true, therefore, that the good results which we have obtained in our forecasts from the weather do not rest upon real causes but are largely spurious, inhering in the method which we have employed?

We shall proceed to the consideration of these three objections.
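Before taking up the objections, the tallies behind comments (1), (2), (4) and (5) on Table 19 can be checked mechanically. This small sketch uses the figures of Table 19 as read from the scan. The South Carolina standard deviation is not legible there and is set to `None`, so the May check covers the other three states only.

```python
# Table 19 as read from the scan: state -> (sigma_y, [(S', S'') for May..September]).
TABLE_19 = {
    "Texas": (24.64, [(26.38, 25.31), (22.11, 25.15), (19.23, 22.58), (17.86, 16.80), (13.77, 16.80)]),
    "Georgia": (13.89, [(17.28, 12.67), (15.20, 11.28), (12.17, 9.94), (11.92, 9.46), (11.08, 9.46)]),
    "Alabama": (13.37, [(18.65, 10.40), (17.59, 9.70), (13.58, 9.52), (12.24, 9.19), (11.66, 9.19)]),
    "South Carolina": (None, [(21.90, 17.42), (19.06, 16.10), (17.02, 16.10), (19.03, 14.98), (15.28, 14.98)]),
}


def beats_later(rows, lag, months):
    """For the first `months` months: is S'' below the official S' `lag` months later?"""
    return [rows[i][1] < rows[i + lag][0] for i in range(months)]


wins = sum(s2 < s1 for _, rows in TABLE_19.values() for s1, s2 in rows)
useless_may = [state for state, (sd, rows) in TABLE_19.items()
               if sd is not None and rows[0][0] > sd]
others = [rows for state, (_, rows) in TABLE_19.items() if state != "Texas"]
one_month = [ok for rows in others for ok in beats_later(rows, 1, 4)]
two_months = [ok for rows in others for ok in beats_later(rows, 2, 3)]

print(wins, "of 20")                           # comment (1)
print(useless_may)                             # comment (2), legible rows only
print(all(one_month))                          # comment (4)
print(sum(two_months), "of", len(two_months))  # comment (5)
```

It should report 17 of 20, the three legible states for May, True for the one-month comparison, and 6 of 9 for the two-month comparison.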
Table 20 presents the data that are necessary to compare the accuracy of the forecasts from the official material as to the condition of the crop with that of the forecasts from the records of temperature and rainfall, both of the forecasts being made by the methods of correlation. $S$ measures the accuracy of the forecast from the condition of the crop, where $S = \sigma_y\sqrt{1-r^2}$ and the forecasting equation is $(y - \bar y) = r\frac{\sigma_y}{\sigma_x}(x - \bar x)$. $S''$ measures the accuracy of the forecast from the weather and has the same value as it has retained throughout the investigations of this chapter. The smaller the values of $S$ and $S''$, the more accurate are the respective forecasts. Since there are four representative states and five months of the growing season, Table 20 offers 20 cases in which the values of $S$ and $S''$ may be compared. In 12 out of these 20 cases $S''$ is smaller than $S$. For purposes of exploiting the prospects of the crop, the earlier a reliable forecast can be obtained, the greater is its economic value. If, therefore, we omit the month of September, there are, in Table 20, 16 cases in which $S$ and $S''$ may be compared, and in 12 of these 16 cases $S''$ is smaller than $S$.

TABLE 20.
Relative Accuracy of the Forecasts of the Yield per Acre of Cotton (1) from Data as to the Condition of the Crop, by Means of the Correlation Equation; (2) from Data as to the Weather, by Means of the Method of Multiple Correlation

$S$: error of the forecast from the condition of the crop; $S''$: error of the forecast from the weather.

| Representative state | May $S$ | May $S''$ | June $S$ | June $S''$ | July $S$ | July $S''$ | August $S$ | August $S''$ | September $S$ | September $S''$ |
|---|---|---|---|---|---|---|---|---|---|---|
| Texas | 24.61 | 25.31 | 22.02 | 25.15 | 17.98 | 22.58 | 17.35 | 16.80 | 13.45 | 16.80 |
| Georgia | 13.87 | 12.67 | 13.14 | 11.28 | 10.43 | 9.94 | 10.39 | 9.46 | 9.30 | 9.46 |
| Alabama | 13.36 | 10.40 | 12.02 | 9.70 | 10.93 | 9.52 | 10.44 | 9.19 | 9.19 | 9.19 |
| South Carolina | 18.64 | 17.42 | 17.38 | 16.10 | 15.35 | 16.10 | 17.62 | 14.98 | 13.53 | 14.98 |

We conclude from these comparisons that, as far as concerns the representative states producing sixty-five per cent of the total cotton crop, the forecasts from the weather are at least as good as the forecasts from the condition of the crop by means of the formula $(y - \bar y) = r\frac{\sigma_y}{\sigma_x}(x - \bar x)$. This latter formula, we have already seen, gives a better forecast than the official formula in all of the months of the growing season of cotton, in all of the representative states.

Table 21 contains the material by means of which an opinion may be formed as to the proper answer to the second of our three questions. Throughout this Essay the error of the official formula has been measured by the scatter of the forecasts, $S' = \sqrt{\sum (C/C_5 - Y/Y_5)^2 / N}$, and in the case of each of the representative states the computed value of $S'$ rested upon the observations for the 21 years 1894 to 1914. In 1911 the Department of Agriculture defined its formula for predicting, from the monthly condition of the crop, the ultimate yield per acre of cotton; but until that year the condition figures were published without an official attempt to suggest what definite inference, as to the ultimate yield, should be made from the crop reports.
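The tallies drawn from Table 20 (12 of 20 cases, and 12 of 16 with September omitted) can be checked the same way. In this sketch a tie, such as Alabama in September, counts as a success for neither method.

```python
# Table 20: state -> [(S, S'') for May..September]; S from the condition of the
# crop via the correlation equation, S'' from the weather.
TABLE_20 = {
    "Texas": [(24.61, 25.31), (22.02, 25.15), (17.98, 22.58), (17.35, 16.80), (13.45, 16.80)],
    "Georgia": [(13.87, 12.67), (13.14, 11.28), (10.43, 9.94), (10.39, 9.46), (9.30, 9.46)],
    "Alabama": [(13.36, 10.40), (12.02, 9.70), (10.93, 9.52), (10.44, 9.19), (9.19, 9.19)],
    "South Carolina": [(18.64, 17.42), (17.38, 16.10), (15.35, 16.10), (17.62, 14.98), (13.53, 14.98)],
}


def count_weather_better(table, months=5):
    """Cases among the first `months` months in which S'' is strictly smaller than S."""
    return sum(s2 < s for rows in table.values() for s, s2 in rows[:months])


print(count_weather_better(TABLE_20), "of", 4 * 5)     # all five months
print(count_weather_better(TABLE_20, 4), "of", 4 * 4)  # September omitted
```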
It might therefore be assumed that the year 1911 would mark the beginning of a more accurate crop-reporting service, and that, under the new procedure of the Department of Agriculture, the conclusions we have drawn concerning the comparative accuracy of forecasts from the condition of the crop by means of the official formula, and of forecasts from the weather by means of correlation equations, would no longer hold true. Or one might urge that throughout the entire period 1894-1914 the crop-reporting service had been continuously improved, and that, consequently, the precision of the official formula as measured by the record for this long period would be no index of its present accuracy. These very natural doubts raise two important questions of fact: (a) Has the continuous improvement of the crop-reporting service throughout the 21 years 1894-1914 been such that the error of the forecasts since 1911 is less than the error which we obtain when the official formula is applied to the data from 1894 to 1911? (b) Is it true that the year 1911 begins a period of marked improvement in the crop-reporting service; or, in more definite terms, is the error of the forecasts for the years since 1911 less than the error for the same length of time preceding 1911? Table 21 supplies the material for a decision.

TABLE 21.
[The figures of this table are illegible in this copy.]
We recall that S' is the coefficient that measures the error of the forecasts for the whole period 1894-1914. S'₁, S'₂, S'₃ in Table 21 have the following meanings: S'₁ is the error of the forecasts when the official formula is applied to the data for the 17 years 1894-1910; S'₂ is the error of the forecasts for the years 1911-1914; S'₃ is the error for the preceding four years, 1907-1910.

We shall consider first the comparative values of S'₁ and S'₂. In Texas, the forecasts from the data for May, June, August, and September show that S'₁ is in all four months greater than S'₂; for July, the two coefficients are equal. Taking the values of these coefficients as they are, without regard to their probable errors, it is legitimate to infer that in Texas, during the years 1911-1914 as compared with the 17 years 1894-1910, there has been an improvement in the crop-reporting service. With regard to Georgia the contrary inference must be made: in three out of five cases S'₁ is less than S'₂, while in the remaining two cases the coefficients are equal. In Alabama there has possibly been some improvement: in three out of five cases S'₁ is greater than S'₂; in one case S'₁ is less than S'₂; and in one case the two coefficients are equal. In South Carolina there has been no change: in two out of five cases S'₁ is less than S'₂; in two cases S'₁ is greater than S'₂; and in one case they are equal.

From these findings the conclusion may be drawn that, as far as the representative states are concerned, there has been no such improvement in the crop-reporting service during the 21 years 1894-1914 as to make questionable the testing of the accuracy of the official forecasting formula by its application to the data which we have actually employed.
We come now to the other question of fact: Is it true that the year 1911, when the Department of Agriculture published its forecasting formula, initiated a period of more accurate crop reporting? The comparative values of S'₂ and S'₃ give the necessary figures. S'₂ measures the accuracy of the prediction for the four years 1911-1914, and S'₃ measures the accuracy of the forecasts when the official formula is applied to the data of the four years preceding 1911, namely, the years 1907-1910. There are 20 cases in which the values of S'₂ and S'₃ may be compared, and we find, to our surprise, that in ±5 out of 20 possible cases S'₂ is less than S'₃; that is to say, there has been absolutely no improvement in the recent crop-reporting service. It cannot reasonably be maintained, therefore, that because of improvements in the official forecasting service, the inferences which we have drawn, concerning the comparative accuracy of the forecasts from the weather and the official forecasts from the laboriously collected data about crop conditions, have not an abiding value.

[Illegible in this copy: the table on p. 128 and the opening pages of the Appendix, pp. 129-132.]

TABLE 24.
Georgia. Official Monthly Reports on the Condition of the Growing Cotton Crop, and Official Final Estimates of the Annual Yield per Acre in Pounds of Cotton Lint

        Condition of the crop                          Yield per acre
Year   June 1  July 1  August 1  September 1  October 1  (pounds of lint)
1889     80      86       91         90           87          155
1890     94      95       94         86           82          165
1891     80      85       86         82           78          155
1892     87      88       84         79           75          160
1893     87      86       83         77           76          136
1894     76      78       8?         84           79          155
1895     82      88       87         76           72          152
1896     95      94       92         71           67          122
1897     84      85       95         80           70          178
1898     89      90       91         80           75          183
1899     88      85       79         69           64          159
1900     89      74       77         69           67          172
1901     80      72       78         81           73          167
1902     94      91       83         68           62          165
1903     75      75       77         81           68          158
1904     78      85       91         86           78          205
1905     84      82       82         77           76          200
1906     86      82       74         72           68          165
1907     74      78       81         81           76          190
1908     80      83       85         77           71          190
1909     84      79       78         73           71          184
1910     81      78       70         71           68          173
1911     92      94       95         81           79          240
1912     74      72       68         70           65          163
1913     69      74       78         76           72          208
1914     80      83       82         81           81          239

TABLE 25.
Alabama. Official Monthly Reports on the Condition of the Growing Cotton Crop, and Official Final Estimates of the Annual Yield per Acre in Pounds of Cotton Lint

        Condition of the crop                          Yield per acre
Year   June 1  July 1  August 1  September 1  October 1  (pounds of lint)
1889     ?3      87       90         91           87          163
1890     93      95       93         84           80          160
1891     89      87       8?         83           76          165
1892     91      90       83         72           69          135
1893     82      80       79         78           76          148
1894     88      87       94         8?           84          160
1895     85      83       81         71           70          135
1896    103      98       93         66           ?           124
1897     81      85       88         80           73          155
1898     89      91       95         80           76          195
1899     86      88       82         76           70          176
1900     ?7      70       67         ?4           62          151
1901     78      80       82         75           65          156
1902     92      84       77         54           52          144
1903     73      7?       79         84           68          161
1904     80      8?       90         84           76          182
1905     87      83       79         70           70          173
1906     81      84       83         76           68          165
1907     65      68       72         73           68          109
1908     78      82       85         77           70          179
1909     83      64       68         ??           ?2          142
1910     83      81       71         72           67          160
1911     91      93       94         80           73          204
1912     74      78       73         75           68          173
1913     75      79       79         72           67          190
1914     85      88       81         77           78          209

TABLE 26.
South Carolina. Official Monthly Reports on the Condition of the Growing Cotton Crop, and Official Final Estimates of the Annual Yield per Acre in Pounds of Cotton Lint

        Condition of the crop                          Yield per acre
Year   June 1  July 1  August 1  September 1  October 1  (pounds of lint)
1889     78      84       90         87           81          141
1890     97      95       95         87           83          175
1891     80      80       83         81           72          160
1892     91      94       83         77           73          184
1893     88      83       75         63           62          142
1894     83      88       95         86           79          168
1895     72      84       81         82           64          141
1896     97      ?8       88         70           67          129
1897     87      86       92         84           74          189
1898     85      90       89         81           79          245
1899     86      88       78         66           62          165
1900     85      79       74         60           57          167
1901     80      70       75         80           67          141
1902     97      95       88         74           68          199
1903     76      74       76         80           70          178
1904     81      88       91         87           81          215
1905     78      78       79         75           74          220
1906     82      77       72         71           66          175
1907     77      79       81         83           77          215
1908     81      84       84         76           68          219
1909     83      77       77         74           70          210
1910     78      75       70         73           70          216
1911     80      84       86         74           73          280
1912     83      79       75         73           68          209
1913     68      73       75         77           71          235
1914     72      81       79         77           72          255

TABLE 27.
Texas. Temperature (Degrees Fahrenheit) and Rainfall (Inches) in Eastern and Central Texas

        May          June         July         August       September
Year   Temp.  Rain  Temp.  Rain  Temp.  Rain  Temp.  Rain  Temp.  Rain
1891   7?.3  2.64   81.7  2.48   82.7  1.71   81.1  1.70   77.3  2.26
1892   73.5  4.53   79.5  4.37   82.8  1.85   80.9  4.12   75.5  1.40
1893   73.0  4.71   80.0  2.88   8?.0  0.75   81.6  2.19   79.3  1.79
1894   74.5  3.60   78.8  2.56   82.0  2.03   79.8  6.26   76.9  2.57
1895   71.3  7.01   78.6  6.18   82.8  4.11   83.5  1.78   80.5  1.79
1896   77.8  1.62   83.4  0.99   84.8  1.94   85.2  1.64   78.0  4.56
1897   72.3  4.33   80.6  3.65   85.9  1.26   83.0  2.32   76.9  2.58
1898   74.5  3.55   80.2  5.63   82.3  2.09   82.6  2.50   77.7  1.61
1899   77.1  3.49   79.8  6.56   82.5  2.00   86.6  0.36   76.7  1.16
1900   72.3  5.89   81.8  1.85   81.8  4.50   82.0  2.98   80.8  6.60
1901   72.4  4.22   81.8  1.08   85.4  1.99   85.2  1.77   76.8  3.27
1902   76.4  3.78   82.8  2.19   81.6  7.73   85.2  0.11   75.0  4.51
1903   70.?  2.27   73.9  3.70   81.3  5.91   82.4  1.71   74.7  2.98
1904   72.3  4.76   79.4  4.57   81.8  2.45   81.9  2.16   79.0  2.82
1905   74.9  ?.95   81.1  4.47   80.7  4.90   83.9  1.03   79.5  2.02
1906   72.6  3.96   80.5  3.83   80.8  5.13   80.8  4.46   78.3  3.50
1907   67.6  6.80   80.4  1.85   83.1  2.87   85.1  1.01   79.0  1.29
1908   73.2  7.87   81.3  2.19   81.8  2.60   82.2  2.28   76.0  3.54
1909   72.2  2.91   81.1  3.07   80.5  1.56   85.4  2.00   78.0  0.88
1910   71.4  4.04   80.2  1.79   84.8  1.15   86.4  0.83   81.5  1.78
1911   73.2  1.50   84.7  0.65   82.8  4.36   84.6  2.93   83.1  1.53
1912   74.0  2.21   77.4  3.28   85.4  1.04   84.0  3.47   77.9  0.77
1913   73.0  3.55   79.0  2.66   84.9  1.52   84.7  1.03   73.2  5.70
1914   71.1  7.81   82.4  1.31   86.2  0.91   80.7  8.95   77.4  1.39

TABLE 28.
Georgia. Temperature (Degrees Fahrenheit) and Rainfall (Inches)
[The figures of this table are missing from this copy.]

TABLE 29.
Alabama. Temperature (Degrees Fahrenheit) and Rainfall (Inches)

        May          June         July         August       September
Year   Temp.  Rain  Temp.  Rain  Temp.  Rain  Temp.  Rain  Temp.  Rain
1896   75.8  3.44   77.2  5.24   80.9  5.09   82.2  2.30   75.8  1.76
1897   ?8.6  1.56   80.9  1.85   81.1  4.78   78.8  5.58   75.6  ?.??
1898   73.0  0.82   80.4  3.60   80.0  6.06   78.7  7.43   73.5  3.38
1899   75.9  2.03   79.8  2.54   80.4  6.76   81.3  3.68   72.7  0.66
1900   71.2  2.64   76.4 11.08   79.8  4.93   81.6  2.89   77.8  4.00
1901   69.8  5.08   78.5  2.80   82.2  3.40   78.6  8.86   72.1  4.19
1902   75.4  2.34   80.8  1.28   82.8  2.50   82.1  3.48   73.4  4.28
1903   69.0  6.05   73.2  4.88   80.0  3.98   80.5  3.57   73.2  1.42
1904   69.?  2.98   77.8  2.94   79.6  4.80   78.4  5.55   76.8  1.36
1905   74.2  5.51   79.0  4.56   79.4  4.56   79.2  5.30   76.2  2.61
1906   69.7  4.63   78.9  3.45   78.8  8.50   80.4  3.78   78.2  8.44
1907   68.0  7.94   73.6  2.85   81.0  5.00   80.4  3.50   74.8  5.50
1908   71.4  5.34   77.5  2.76   79.8  4.72   79.4  3.44   74.2  2.42
1909   68.6  6.51   78.0  7.82   79.3  4.52   81.0  3.30   73.7  2.87
1910   68.9  3.86   75.6  6.98   78.6  7.18   79.7  2.73   77.6  2.21
1911   72.9  2.85   80.6  3.86   78.0  5.66   79.1  4.97   80.4  2.32
1912   72.0  3.60   75.1  5.10   79.7  5.17   79.2  5.68   77.1  4.79
1913   71.9  3.14   77.5  3.54   81.1  5.00   80.5  2.38   73.1  6.96
1914   71.8  1.05   83.1  2.66   81.6  4.23   79.1  6.41   72.4  4.69
1915   74.5  6.34   78.8  3.66   80.4  5.23   78.9  6.07   76.6  4.43

TABLE 30.
South Carolina. Temperature (Degrees Fahrenheit) and Rainfall (Inches)

        May          June         July         August       September
Year   Temp.  Rain  Temp.  Rain  Temp.  Rain  Temp.  Rain  Temp.  Rain
1891   69.0  3.57   79.3  3.20   77.9  5.95   78.9  8.79   74.3  2.66
1892   71.1  5.53   75.1  5.25   78.9  7.44   79.5  4.42   72.6  6.41
1893   70.2  4.13   76.5  7.64   82.0  3.87   77.5 12.45   74.7  4.42
1894   70.7  3.43   77.0  3.91   77.7  8.24   77.9  7.28   75.0  6.51
1895   69.0  4.36   78.2  3.04   79.5  4.17   79.4  7.95   76.9  1.29
1896   76.7  2.74   77.9  5.42   80.7  8.17   80.4  4.14   75.0  2.94
1897   69.3  2.39   79.2  5.44   80.2  5.01   78.0  5.16   73.3  2.91
1898   73.8  1.35   79.7  4.15   80.0  7.81   78.7  9.81   76.0  4.06
1899   73.7  1.68   79.4  3.89   80.0  4.03   81.2  6.26   72.8  2.55
1900   70.2  2.37   76.2  7.94   81.2  4.08   83.0  2.13   77.1  2.83
1901   71.4  7.31   76.7  6.55   81.4  4.52   78.6  9.01   73.1  4.66
1902   74.0  2.69   78.5  4.48   80.8  3.79   78.6  5.07   72.1  3.74
1903   70.7  2.69   74.2  8.09   80.4  3.59   80.6  7.15   72.7  3.62
1904   70.6  2.04   77.?  4.06   79.4  5.96   77.6  8.47   75.8  2.46
1905   73.4  5.70   78.9  1.92   80.4  6.16   77.9  5.69   76.2  1.91
1906   70.7  3.00   78.4  8.88   78.4  8.40   80.6  6.62   78.0  4.86
1907   70.8  4.51   75.8  5.92   81.4  5.06   79.4  5.41   76.0  5.91
1908   71.8  2.92   76.6  4.90   79.8  5.43   78.6  9.11   72.4  2.86
1909   69.5  4.26   79.2  6.87   78.6  4.92   78.8  4.83   72.?  3.74
1910   69.8  4.03   75.5  7.78   79.4  6.83   79.0  6.00   75.8  3.10
1911   72.6  0.65   80.9  3.42   79.8  3.79   80.0  6.05   78.9  3.33
1912   72.4  4.08   75.5  5.68   79.6  5.22   79.2  3.69   77.7  5.91
1913   71.7  2.13   76.2  5.53   81.9  4.78   78.9  3.76   71.7  4.66
1914   72.1  0.83   81.1  3.80   80.0  5.56   79.0  5.88   71.3  3.63

CHAPTER V

THE LAW OF DEMAND FOR COTTON

"There is a general agreement as to the character and directions of the changes which various economic forces tend to produce. Much less progress has been made towards the quantitative determination of the relative strength of different economic forces." Alfred Marshall.
The investigations of the preceding two chapters have made us acquainted with the degree of reliability of the Government reports on the prospective cotton crop, and with the measure of accuracy with which, at any stage in the growing season, the prospective yield of cotton may be calculated from the past conditions and vicissitudes of the weather. The new problem that we face in this chapter carries the inquiry to its final stage: Assuming that the ultimate volume of the crop may be forecast with a known degree of precision, is it possible to predict the relation that will subsist between the size of the crop and the price of cotton lint? Is it possible to know the dynamic law of the demand for cotton?

Two Practical Methods of Approach

In Chapters III and IV, we found that the method of progressive averages enabled us to get valuable results in the problem of forecasting the amount of production from the Government reports on the condition of the growing crop, and from the records of temperature and rainfall in the Cotton Belt. We shall test the helpfulness of this same device in our present inquiry as to the form of the concrete law of demand for cotton.

Method of progressive averages. In Table 31 the data¹ are collected for computing the relation between the price-ratio and the production-ratio of cotton. The problem to be solved may be put into symbolic form: let p be the mean price per pound of cotton for any given year, and p₃ the mean price for the preceding three years; let P be the total production of cotton for the given year, and P₃ the mean production for the preceding three years.
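As a hedged illustration of these progressive-average ratios, the Python sketch below (the helper name is hypothetical) computes P/P₃ and p/p₃ for 1890 and 1891 from the first five years of Table 31. It reproduces the printed production-ratios, 120.7 and 116.9. For the price-ratios it gives 79.4 and 71.1, whereas Table 31 prints 79.6 and 70.9: the table divides p by p₃ after rounding p₃ to one decimal (10.8 and 10.3).

```python
# First five years of Table 31: production P (millions of 500-pound bales)
# and price p (cents per pound of upland cotton).
years = [1887, 1888, 1889, 1890, 1891]
production = [6.88, 6.92, 7.47, 8.56, 8.94]
price = [10.3, 10.7, 11.5, 8.6, 7.3]


def ratio_to_preceding_mean(series, window=3):
    """Return 100 * x_t / mean(x_{t-window}, ..., x_{t-1}) for each t >= window."""
    ratios = []
    for t in range(window, len(series)):
        preceding_mean = sum(series[t - window:t]) / window
        ratios.append(100 * series[t] / preceding_mean)
    return ratios


production_ratio = ratio_to_preceding_mean(production)  # P / P3, in per cent
price_ratio = ratio_to_preceding_mean(price)            # p / p3, in per cent
for year, x, y in zip(years[3:], production_ratio, price_ratio):
    print(year, round(x, 1), round(y, 1))
# 1890 120.7 79.4
# 1891 116.9 71.1
```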
Our problem is to find (1) the coefficient of correlation measuring the relation between p/p₃ and P/P₃; (2) the statistical law connecting p/p₃ with P/P₃, which is the concrete law of demand for cotton; (3) the error incurred in using the law of demand for cotton as a formula with which to forecast the price of cotton from the prospective size of the crop. The values of the series P/P₃ and p/p₃ for the period 1890 to 1913 are given in columns 4 and 7, respectively, of Table 31. The calculation of the items that constitute the solution of our problem gives:

(1) The coefficient of correlation between p/p₃ and P/P₃ is r = -.706;

¹ The crude data are taken from the Statistical Abstract of the United States, 1914, p. 505. "The production statistics relate, when possible, to the year of growth, but when figures for the year are wanting, a commercial crop which represents the trade movement is taken. The statistics of production have been compiled from publications of the United States Department of Agriculture for 1860 to 1898. Census figures have, however, been used when available, including those for 1899 to date." Ibid., note 1. "The value of lint per pound shown since 1902 relates to the average grade of upland cotton marketed prior to April 1 of the following year; from 1890 to 1901, the average price of middling cotton on the New Orleans Cotton Exchange." Ibid., note 2.

TABLE 31.
The Production-Ratio and the Price-Ratio of Cotton

P: production, equivalent 500-pound bales, gross weight (millions of bales); P₃: mean production for the preceding three years; P/P₃: production-ratio (per cent); p: price per pound of upland cotton (cents); p₃: mean price for the preceding three years; p/p₃: price-ratio (per cent).

Year      P      P₃     P/P₃      p      p₃     p/p₃
1887    6.88     -       -      10.3     -       -
1888    6.92     -       -      10.7     -       -
1889    7.47     -       -      11.5     -       -
1890    8.56    7.09   120.7     8.6    10.8    79.6
1891    8.94    7.65   116.9     7.3    10.3    70.9
1892    6.66    8.32    80.0     8.4     9.1    92.3
1893    7.43    8.05    92.3     7.5     8.1    92.6
1894   10.03    7.68   130.6     5.9     7.7    76.6
1895    7.15    8.04    88.9     8.2     7.3   112.3
1896    8.52    8.20   103.9     7.3     7.2   101.4
1897   10.99    8.57   128.2     5.6     7.1    78.9
1898   11.44    8.89   128.7     4.9     7.0    70.0
1899    9.35   10.32    90.6     7.6     5.9   128.8
1900   10.12   10.59    95.6     9.3     6.0   155.0
1901    9.51   10.30    92.3     8.1     7.3   111.0
1902   10.63    9.66   110.0     8.2     8.3    98.8
1903    9.85   10.09    97.6    12.2     8.5   143.5
1904   13.44   10.00   134.4     8.7     9.5    91.6
1905   10.58   11.31    93.5    10.9     9.7   112.4
1906   13.27   11.29   117.5    10.0    10.6    94.3
1907   11.11   12.43    89.4    11.5     9.9   116.2
1908   13.24   11.65   113.6     9.2    10.8    85.2
1909   10.00   12.54    79.7    14.3    10.2   140.2
1910   11.61   11.45   101.4    14.7    11.7   125.6
1911   15.69   11.62   135.0     9.7    12.7    76.4
1912   13.70   12.43   110.2    12.0    12.9    93.0
1913   14.16   13.67   103.6    13.1    12.1   108.3

(2) The concrete law of demand for cotton is y = -.975x + 206.03, where x is put for P/P₃, and y is the most probable value of p/p₃ corresponding to the given value of P/P₃;

(3) The accuracy with which the law of demand for cotton may be used to forecast the price of cotton lint is measured by $S = \sigma_y\sqrt{1 - r^2} = 16.38$.

Method of percentage changes.¹ In Table 32 the crude statistical data of production and prices are utilized in a different way.
From year to year both the price and the production of cotton undergo changes, and in the construction of Table 32 the hypothesis in mind was that there is a close relation between the percentage change of the price in any given year over the price of the preceding year, and the percentage change in production of the given year over the production of the preceding year. The percentage changes are tabulated in columns 4 and 7. The calculations based upon the data of this table show that

(1) The coefficient of correlation between the percentage change in price and the percentage change in production is r = -.819;

(2) The dynamic law of demand for cotton is y = -1.08x + 8.81, where x is put for the percentage change in production, and y is the most probable value of the percentage change in price corresponding to the given percentage change in production;

(3) The accuracy with which the dynamic law of demand for cotton may be used to forecast the percentage change in the price of cotton lint is measured by $S = \sigma_y\sqrt{1 - r^2} = 15.18$.

¹ A more ample description of this method is contained in Economic Cycles: Their Law and Cause, Chapter IV.

TABLE 32.
Percentage Changes in the Price and Production of Cotton Lint

       Production, equivalent 500-pound    Price per pound of upland
       bales, gross weight (millions)      cotton (cents)
Year   Amount   Change   Per cent change   Price   Change   Per cent change
1889    7.47      -          -              11.5     -          -
1890    8.56    +1.09     +14.59             8.6    -2.9     -25.22
1891    8.94    +0.38      +4.44             7.3    -1.3     -15.12
1892    6.66    -2.28     -25.50             8.4    +1.1     +15.07
1893    7.43    +0.77     +11.56             7.5    -0.9     -10.71
1894   10.03    +2.60     +34.99             5.9    -1.6     -21.33
1895    7.15    -2.88     -28.71             8.2    +2.3     +38.98
1896    8.52    +1.37     +19.16             7.3    -0.9     -10.98
1897   10.99    +2.47     +28.99             5.6    -1.7     -23.29
1898   11.44    +0.45      +4.10             4.9    -0.7     -12.50
1899    9.35    -2.09     -18.27             7.6    +2.7     +55.10
1900   10.12    +0.77      +8.24             9.3    +1.7     +22.37
1901    9.51    -0.61      -6.03             8.1    -1.2     -12.90
1902   10.63    +1.12     +11.78             8.2    +0.1      +1.23
1903    9.85    -0.78      -7.34            12.2    +4.0     +48.78
1904   13.44    +3.59     +36.45             8.7    -3.5     -28.69
1905   10.58    -2.86     -21.28            10.9    +2.2     +25.29
1906   13.27    +2.69     +25.43            10.0    -0.9      -8.26
1907   11.11    -2.16     -16.28            11.5    +1.5     +15.00
1908   13.24    +2.13     +19.17             9.2    -2.3     -20.00
1909   10.00    -3.24     -24.47            14.3    +5.1     +55.43
1910   11.61    +1.61     +16.10            14.7    +0.4      +2.80
1911   15.69    +4.08     +35.14             9.7    -5.0     -34.01
1912   13.70    -1.99     -12.68            12.0    +2.3     +23.71
1913   14.16    +0.46      +3.36            13.1    +1.1      +9.17

Figure 11 makes clear to the eye the measure of agreement between the actual percentage changes in price and the percentage changes as they are predicted from the law of demand.

A comparison of the results we obtain from these two methods of deriving the law of demand for cotton shows that there is very little difference between them so far as the accuracy of the forecasts is concerned. But, in Chapter I, we have said that it is possible to forecast the price of cotton from the size of the crop with greater accuracy than the Bureau of Statistics can forecast the yield of cotton from the known condition of the growing crop. This statement we shall now prove. Throughout our investigations we have measured the accuracy of forecasts by S = [...]

[A gap in this copy: the text of pp. 146-152 is missing.]

[...] is linear, such that $\phi(x_1, x_2, x_3, \dots, x_n) = u = a_0 + a_1x_1 + a_2x_2 + \dots + a_nx_n$; (2) That the interrelations of x₁, x₂, x₃, ..., xₙ are also linear, such that, for example, x₁ = b₁ + b₂x₂.

This second supposition will present no difficulties, since we have dealt with the problem of finding the relation between x₁ and x₂ when the connection is of the simple form x₁ = b₁ + b₂x₂. With regard to the first supposition, all that we need to do in order to obtain a satisfactory practical solution of our problem is so to determine from the actual statistics the values of the constants a₀, a₁, ..., aₙ that the correlation between x₀ and u, which we designate by R, shall be a maximum. In that case the value of $S = \sigma_0\sqrt{1 - R^2}$, which measures the root-mean-square error of the forecasts by means of the formula $x_0 = a_0 + a_1x_1 + a_2x_2 + \dots + a_nx_n$, will be a minimum.

Now precisely this problem has already received a general solution in the statistical theory of multiple correlation. If, for the sake of simplicity, we take the case of three variables, then the equation connecting x₀, x₁, x₂ in such a way that the correlation is a maximum between the actual values of x₀ and the predicted values of x₀ has the form

$$(x_0 - \bar{x}_0) = \frac{r_{01} - r_{02}r_{12}}{1 - r_{12}^2}\,\frac{\sigma_0}{\sigma_1}(x_1 - \bar{x}_1) + \frac{r_{02} - r_{01}r_{12}}{1 - r_{12}^2}\,\frac{\sigma_0}{\sigma_2}(x_2 - \bar{x}_2),$$

and $S = \sigma_0\sqrt{1 - R^2}$, where

$$R^2 = \frac{r_{01}^2 + r_{02}^2 - 2r_{01}r_{02}r_{12}}{1 - r_{12}^2}.$$

The forecasting equation enables us to predict the most probable values of x₀ from the known values of x₁ and x₂, and S measures the degree of accuracy with which the forecasts are made.

An example will make this abstract discussion much clearer. Professor Edgeworth makes the statement that "One important cause of alteration in demand curves is the increase of the consumer's purchasing power. The case in which that increase is only apparent, being due to a rise in prices (and the converse case), may be specially distinguished. Owing to the variability, it may be doubted whether Jevons's hope of constructing demand curves by statistics is capable of realisation."¹ Professor Edgeworth doubtless meant that, because of the many factors tending to produce a variation in the demand schedule, it might be doubtful whether Jevons's hope could be realised. But suppose, for the sake of simplicity and concreteness but in illustration of a method of complete solution, we limit our inquiry to this question: How may the relation between the price of cotton and the amount of cotton demanded be determined (1) when account is taken of the varying purchasing power of money; (2) when there is no variation in the purchasing power of money?

¹ Palgrave's Dictionary of Political Economy, "Demand Curves," Vol. I, p. 544.

Let x₀ be the percentage change in the price of cotton, x₁ the percentage change in the amount of cotton produced, and x₂ the percentage change in the index number of general prices. We may then put our problem and its solution in this form:

(1) x₀ = φ(x₁, x₂), and we assume as a preliminary hypothesis that the form of φ(x₁, x₂) is linear, so that we may write x₀ = a₀ + a₁x₁ + a₂x₂. According to the theory of multiple correlation, when the values of a₀, a₁, a₂ are so determined from the actual data as to make the correlation between the actual values of x₀ and the values of x₀ forecast by the above formula a maximum, then

$$(x_0 - \bar{x}_0) = \frac{r_{01} - r_{02}r_{12}}{1 - r_{12}^2}\,\frac{\sigma_0}{\sigma_1}(x_1 - \bar{x}_1) + \frac{r_{02} - r_{01}r_{12}}{1 - r_{12}^2}\,\frac{\sigma_0}{\sigma_2}(x_2 - \bar{x}_2).$$

The statistical material necessary for the computation of the quantities indicated in these symbols is given in Table 33. When the actual computations are made and the numerical values are substituted in the above equation, we obtain as our forecasting formula

$$x_0 = -.97x_1 + 1.60x_2 + 7.11.$$

This formula enables us to predict the probable value of x₀ for given values of x₁ and x₂; it enables us to say what the probable change in the price of cotton will be when we know the probable changes in the production of cotton and in the level of general prices.

TABLE 33.
Data for the Quantitative Determination of the Law of Demand for Cotton

Production: equivalent 500-pound bales, gross weight (millions of bales); Price: price per pound of upland cotton (cents); Index: Bureau of Labor's index of prices of "all commodities"; x₁: percentage change in the amount produced; x₀: percentage change in the price of cotton; x₂: percentage change in the index of general prices.

Year   Production   Price   Index      x₁        x₀       x₂
1889      7.47       11.5    115        -         -        -
1890      8.56        8.6    113     +14.59    -25.22    -1.74
1891      8.94        7.3    112      +4.44    -15.12    -0.88
1892      6.66        8.4    106     -25.50    +15.07    -5.36
1893      7.43        7.5    106     +11.56    -10.71     0.00
1894     10.03        5.9     96     +34.99    -21.33    -9.43
1895      7.15        8.2     94     -28.71    +38.98    -2.08
1896      8.52        7.3     90     +19.16    -10.98    -4.26
1897     10.99        5.6     90     +28.99    -23.29     0.00
1898     11.44        4.9     93      +4.10    -12.50    +3.33
1899      9.35        7.6    102     -18.27    +55.10    +9.68
1900     10.12        9.3    110      +8.24    +22.37    +7.84
1901      9.51        8.1    108      -6.03    -12.90    -1.82
1902     10.63        8.2    113     +11.78     +1.23    +4.63
1903      9.85       12.2    114      -7.34    +48.78    +0.88
1904     13.44        8.7    113     +36.45    -28.69    -0.88
1905     10.58       10.9    116     -21.28    +25.29    +2.65
1906     13.27       10.0    122     +25.43     -8.26    +5.17
1907     11.11       11.5    130     -16.28    +15.00    +6.56
1908     13.24        9.2    123     +19.17    -20.00    -5.38
1909     10.00       14.3    126     -24.47    +55.43    +2.44
1910     11.61       14.7    132     +16.10     +2.80    +4.76
1911     15.69        9.7    129     +35.14    -34.01    -2.27
1912     13.70       12.0    134     -12.68    +23.71    +3.88
1913     14.16       13.1    135      +3.36     +9.17    +0.75
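The computation behind this forecasting formula can be sketched in a few lines of Python. This is a minimal, non-authoritative reconstruction rather than the author's own worksheet. It takes the raw production, price and index series of Table 33 as transcribed here, assuming a 1907 price of 11.5 cents and a 1911 index of 129 (both recovered from the percentage changes printed beside them). It then forms the year-to-year percentage changes and applies the three-variable formulas given above, with standard deviations computed with divisor n. The printed coefficients can be compared with the published -.97, 1.60 and 7.11, and R, S and ρ₀₁ with .859, 13.56 and -.808. Small differences arising from rounding in the original hand computation would not be surprising.

```python
import math

# Table 33, 1889-1913 (as transcribed): production (millions of bales),
# price of upland cotton (cents per pound), Bureau of Labor "all commodities" index.
production = [7.47, 8.56, 8.94, 6.66, 7.43, 10.03, 7.15, 8.52, 10.99, 11.44, 9.35, 10.12, 9.51,
              10.63, 9.85, 13.44, 10.58, 13.27, 11.11, 13.24, 10.00, 11.61, 15.69, 13.70, 14.16]
price = [11.5, 8.6, 7.3, 8.4, 7.5, 5.9, 8.2, 7.3, 5.6, 4.9, 7.6, 9.3, 8.1,
         8.2, 12.2, 8.7, 10.9, 10.0, 11.5, 9.2, 14.3, 14.7, 9.7, 12.0, 13.1]
index = [115, 113, 112, 106, 106, 96, 94, 90, 90, 93, 102, 110, 108,
         113, 114, 113, 116, 122, 130, 123, 126, 132, 129, 134, 135]


def pct_changes(series):
    """Percentage change of each year over the preceding year."""
    return [100 * (b - a) / a for a, b in zip(series, series[1:])]


def mean(xs):
    return sum(xs) / len(xs)


def sd(xs):
    """Standard deviation with divisor n (the sigma of the text)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def corr(xs, ys):
    """Pearson coefficient of correlation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sd(xs) * sd(ys))


x0 = pct_changes(price)       # change in the price of cotton
x1 = pct_changes(production)  # change in the amount produced
x2 = pct_changes(index)       # change in the index of general prices

r01, r02, r12 = corr(x0, x1), corr(x0, x2), corr(x1, x2)
s0, s1, s2 = sd(x0), sd(x1), sd(x2)

# Coefficients of the forecasting equation, written in the form x0 = a0 + b1*x1 + b2*x2.
b1 = (r01 - r02 * r12) / (1 - r12 ** 2) * s0 / s1
b2 = (r02 - r01 * r12) / (1 - r12 ** 2) * s0 / s2
a0 = mean(x0) - b1 * mean(x1) - b2 * mean(x2)

# Multiple correlation R, root-mean-square error S, partial correlation rho01.
R = math.sqrt((r01 ** 2 + r02 ** 2 - 2 * r01 * r02 * r12) / (1 - r12 ** 2))
S = s0 * math.sqrt(1 - R ** 2)
rho01 = (r01 - r02 * r12) / math.sqrt((1 - r02 ** 2) * (1 - r12 ** 2))

print(f"x0 = {b1:.2f} x1 + {b2:.2f} x2 + {a0:.2f}")
print(f"r01 = {r01:.3f}, r02 = {r02:.3f}, R = {R:.3f}, S = {S:.2f}, rho01 = {rho01:.3f}")
```

Because these are the least-squares coefficients, the correlation between the actual x₀ and the fitted values equals R, and their root-mean-square difference equals S. That identity is a convenient check on any transcription of the table.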
Figure 12 traces for a period of twenty-five years the actual variations in the percentage changes in the price of cotton, together with the percentage changes as they are predicted by means of the formula x₀ = -.97x₁ + 1.60x₂ + 7.11.

[Figure 12. Percentage changes in the price of cotton, actual and as predicted by the formula x₀ = -.97x₁ + 1.60x₂ + 7.11.]

(2) The degree of accuracy with which the above forecasting formula enables us to predict the changes in the price of cotton is measured by $S = \sigma_0\sqrt{1 - R^2}$, where

$$R^2 = \frac{r_{01}^2 + r_{02}^2 - 2r_{01}r_{02}r_{12}}{1 - r_{12}^2}.$$

The computations from the statistical data of Table 33 show that R = .859 and S = 13.56. This is a very high coefficient of correlation, and consequently the forecasting formula makes possible the prediction of the changes in the price of cotton with a relatively high degree of precision.

We have now a solution of the first part of our problem. We know how to forecast the changes in the price of cotton when account is taken of the changes in the amount demanded and of variations in the purchasing power of money. We know, besides, the degree of reliability with which our forecasts are made. We next enter upon the second part of our problem: What is the relation between the changes in the price of cotton and the changes in the amount demanded when there are no changes in the purchasing power of money?

(3) Since, in the forecasting formula x₀ = -.97x₁ + 1.60x₂ + 7.11, the variables x₀, x₁, x₂ are percentage changes, if we put x₂ = 0 we obtain the answer to the second part of our problem. The equation x₀ = -.97x₁ + 7.11 expresses the relation between the changes in the price of cotton and the changes in the amount of cotton demanded when the purchasing power of money remains constant.
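Setting x₂ = 0 is all that separates the static form of the law from the dynamic one. The small Python sketch below (the helper name is hypothetical) evaluates the published formula for a crop 10 per cent larger than that of the preceding year, first with general prices rising 2 per cent and then with the purchasing power of money held constant. The 10 and 2 per cent inputs are arbitrary illustrations, not data from the text.

```python
def forecast_price_change(x1, x2=0.0):
    """Percentage change in the price of cotton from the published formula
    x0 = -.97 x1 + 1.60 x2 + 7.11, where x1 is the percentage change in
    production and x2 the percentage change in the index of general prices.
    The default x2 = 0 gives the static law x0 = -.97 x1 + 7.11."""
    return -0.97 * x1 + 1.60 * x2 + 7.11


print(round(forecast_price_change(10, 2), 2))  # dynamic law: 0.61
print(round(forecast_price_change(10), 2))     # static law: -2.59
# Under the static law, the crop increase at which no change in price is predicted:
print(round(7.11 / 0.97, 2))                   # 7.33
```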
Figure 13 traces the course of the actual changes in the price of cotton, and the changes as they would occur under the supposition that the level of general prices remains constant. The root-mean-square error of the forecasts by means of this formula is S = 15.38. Furthermore, by the theory of partial correlation we know that when x₂ is constant (in this case when x₂ = 0) the coefficient measuring the relation between x₀ and x₁ is

$$\rho_{01} = \frac{r_{01} - r_{02}r_{12}}{\sqrt{1 - r_{02}^2}\,\sqrt{1 - r_{12}^2}},$$

which, by computation from the statistical data of Table 33, gives ρ₀₁ = -.808.

[Figure 13. Percentage changes in the price of cotton, actual and as predicted on the supposition that the level of general prices remains constant.]

(4) If we collect our results bearing upon the relation of changes in the price of cotton, changes in the amount of cotton demanded, and changes in the purchasing power of money, we find that we have considered their interrelations under three different aspects:

(i) The relation between x₀ and x₁, when no attention is paid to the variable x₂, and x₀ is regarded as a simple function of x₁. This is the case of the dynamic law of demand in its simplest form. Here r₀₁ = -.819; S = 15.18. The graph is given in Figure 11.

(ii) The relation between x₀ and both x₁ and x₂, where x₀ is regarded as a function of two variables. This is illustrative of the case of the dynamic law of demand in its complex form. Here R = .859; S = 13.56. The graph is given in Figure 12.

(iii) The relation between x₀ and x₁ when x₂ = 0; that is to say, the relation between the changes in the price of cotton and the changes in the amount of cotton demanded when the purchasing power of money remains constant.
This is illustrative of the static law of demand. Here poi = - .808; S = 15.38. The graph is given in Figure 13. A consideration of these results will show how theo- retical difficulties disappear before a practical solution. One of the discouraging aspects of deductive, mathe- matical economics is that when a complete theoretical formulation is given of the possible relations of factors in a particular problem, one despairs of ever arriving at a concrete solution because of the multiplicity of the interrelated variables. But the attempt to give statis- ■ tical form to the equations expressing the interrelations of the variables shows that many of the hypothetical relations have no significance which needs to be re- garded in the practical situation. When we write the law of demand in the form Xo = (j>ixi, Xi, x^, . . .xj, it is true, as we have pointed out, that we do not know the form of nor the types of the interrelations of Xi, Xi, Xz, . . .x„; but when we are confronted with the prac- tical problem of forecasting a:o, we can find empirical functions that enable us to predict Xo with a high degree of accuracy. Nor is this the only result of the prac- tical solution. We find that since roi = — .819 and poi = — .808, there is really no difference in the close- ness of the relation between x^ and x^ whether we com- pletely ignore the variations in the purchasing power of money or regard the purchasing power of money as 162 Forecasting the Yield and the Price of Cotton constant. And since R = .859, we learn that while the relation between Xo and x^ may be fairly high (in this case r-o2 = .492) still there is only a small advantage in accuracy of forecasting when we consider Xo a func- tion of the two variables X\, Xi instead of a simple linear function of X\. If we were to regard Xo = ^(xi, X2, Xs, . . .X,,) and the correlations between Xo and Xs; . . . 
; $x_0$ and $x_n$ were small, little or nothing would be gained in accuracy of the forecast by considering these additional variables.¹

The method which we have adopted in the case of three variables is general in its character and may be applied to any number of variables. When $x_0 = \phi(x_1, x_2, x_3, \ldots, x_n)$ and the variables are all percentage changes, it is possible not only to deal with the dynamic law of demand in all of its natural complexity, but also to ascertain the static law of demand giving the relation between $x_0$ and $x_1$ when all of the other variables are equal to zero. The problem of ascertaining the statistical form of the law of demand receives by this method an adequate solution.

¹ "If A in part determines B, when we disregard other factors, and C in part determines B, when we disregard all else, and similarly D and E, it is argued that all these part-determinations can be added together and the sum will finally determine B. But the error made lies in the supposition that A, C, D, E, etc., are themselves independent. In the universe as we know it, all these factors are themselves to a greater or less extent associated or correlated, and in actual experience, but little effect is produced in lessening the variability of B, by introducing additional factors after we have taken the first few most highly associated phenomena." Pearson: Grammar of Science, 3rd edition, p. 172.

CHAPTER VI

CONCLUSIONS

The business of economic science, as distinguished from economic practice, is to discover the routine in economic affairs. It aims to separate out the elements of the routine, to ascertain their interrelations, and to use the knowledge of their connections to anticipate experience by forecasting from known changes the probabilities of correlated changes. The seal of the true science is the confirmation of the forecasts; its value is measured by the control it enables us to exercise over ourselves and our environment.
Economists, theoretical and practical, have grown impatient with any form of speculation that is not of immediate use. The present generation of theoretical economists expects an inquiry to be dynamic, to take account of the economic flux, to show a routine in change; otherwise, it is hypothetical, static, without significance in the affairs of daily life. The man of affairs must be convinced that an economic inquiry will either make directly for the common weal or else will reveal to him, in the pursuits of his daily life, a source of individual profit; otherwise, as far as he is concerned, the inquiry is academic, visionary, doctrinaire. The progress of the new type of economic theory is insured by the fact that it is profitable for practical men to give it their support.

Forecasting is the essential aim of both the economic scientist and the man of affairs. According to the most approved doctrine, economic profit has its origin in economic changes. Other forms of income (interest, wages, and rent) would exist in a purely stationary state, but there would be no profit. The talent of the director of industry in the modern state consists in his capacity to foresee and to exploit economic changes, and his profit is proportionate to the accuracy with which his forecasts are made. The economic scientist is likewise concerned with changes. His talent consists in his capacity to separate the general from the accidental, to detect the routine in the multitudinous details. His success is proportionate to the simplicity and generality of the routine that he may discover and the accuracy with which he is able to foretell the size and direction of future changes.
To exemplify the simple laws of economic change, it appeared advisable to begin, not with a complex industrial state like England or the United States, but with a contemporary, progressive society in which the whole economic life is dependent upon a few fundamental interests. It seemed that no territory would afford a more promising field for such a quest than the Cotton Belt of the United States. Throughout a long period it has been recognized that in the vast area of the Cotton Belt, which, with Russia excepted, equals in area a third of Europe, "Cotton is King." And not only is cotton the leading staple of the South, but three-fourths of the world's production of this indispensable commodity is the yield of our Cotton Belt. Not only does the change in the price and yield of this commodity affect the local Cotton Belt, but, to the extent that cotton enters into international trade, its vicissitudes are reflected throughout the world. Would it be possible to discover the routine in the yield and price of cotton so that the knowledge might be used for purposes of forecasting?

For their information as to the condition and promise of the growing cotton crop, farmers, brokers, manufacturers, and merchants rely primarily upon the reports of the United States Department of Agriculture. To meet the public demand, the Department of Agriculture has instituted a wonderful statistical organization. By a connection with many thousands of correspondents, by field-agents, by special experts in crop estimates, by a Bureau of Statistics and a Crop-Reporting Board, information has been systematically gathered and tabulated, and for several decades monthly reports have been issued throughout the growth season of the crop. Extraordinary precautions have been taken to prevent any leakage of the precious information before it is given to the public. What is the value of these reports?
Since they are issued under the aegis of the Government, they are assumed to be fairly accurate representations of the facts, and official authorities have, very naturally, lost no opportunity to point out the direct advantage to farmers of the expenditure of public funds for this particular purpose. Speculators have regarded the official documents as of value for their ends, and numerous rumors have circulated of bribes offered and bribes taken for advance information as to the contents of the reports. But what is the value of these crop reports in the sense of their degree of accuracy as descriptions of actual facts and their measure of reliability as forecasts?

Five reports are issued during the growth season of cotton and refer to the condition of the crop at the end of May, June, July, August, and September. An examination of these reports for a period extending over a quarter of a century, 1890-1914, shows:

(1) That the May report, covering the condition of the cotton crop in the whole country at the end of May, is so erroneous that any forecast from it is spurious. Any money that changes hands as a result of the report is the gain or loss of a simple gamble;
(2) That the June report as a basis of forecasts is better than the May report, but that its value for purposes of forecasting the yield per acre of cotton is negligible;
(3) That the remaining three reports (for July, August, and September) have real value, but the measurement of their degree of accuracy reveals the anomaly of the July report being as good as the report for August;
(4) That the official method of forecasting favors the farmers by giving an underestimate of the probable yield of cotton.

These are the concrete facts upon which the practical man in touch with actual affairs bases his economic conduct.
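Finding (4), that the official method underestimates the yield, is a statement about the sign of the forecast errors. A minimal Python sketch of that check follows. It assumes forecasts and realized yields per acre are available as paired lists. The function name is hypothetical, and the numbers in the usage lines are made up, not Department of Agriculture data.

```python
def forecast_bias(forecasts, actuals):
    """Mean signed error and share of underestimates for paired forecasts.

    A negative mean error, or a share of underestimates well above one half,
    points to forecasts that tend to fall short of the realized yield.
    """
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("need two non-empty lists of equal length")
    errors = [f - a for f, a in zip(forecasts, actuals)]
    mean_error = sum(errors) / len(errors)
    share_under = sum(1 for e in errors if e < 0) / len(errors)
    return mean_error, share_under

# Hypothetical yields in pounds per acre (illustration only).
forecast = [180.0, 190.0, 200.0, 210.0]
actual = [190.0, 195.0, 200.0, 205.0]
mean_error, share_under = forecast_bias(forecast, actual)
print(mean_error, share_under)
```

With only a quarter-century of annual observations, the sketch says nothing about how large a bias must be before it matters; that judgment needs the sampling error of the mean as well.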
Is it the part of a visionary to expect to obtain equally reliable forecasts of the cotton yield from the simple reports of the weather?

Lord Kelvin has told us that "when you can measure what you are speaking about and express it in numbers, you know something about it, but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind." By the help of statistical methods that rest upon the theory of probability, it is possible to measure the precise degree of accuracy of any method of forecasting, and, consequently, it is possible to compare the relative accuracy of forecasts based upon official reports and forecasts that are derived from the records of accumulated rainfall and temperature in the states of the Cotton Belt. For purposes of comparison we have taken the four leading cotton states, which together produce 65 per cent of the entire crop. These four states are Texas, Georgia, Alabama, and South Carolina. Not only do these four states produce the greater part of the total cotton crop, but they represent the weather conditions throughout the whole Cotton Belt: Texas exemplifies the conditions in the extreme Southwest; Georgia and South Carolina, those at the other extreme on the Atlantic Coast; and Alabama typifies the conditions on the Gulf of Mexico. We shall consider, for these representative states, the results of comparing the forecasts from the condition of the growing crop, by the official method, and the forecasts from the changes in rainfall and temperature, by a method which we have fully described. The comparison of methods will be made clear by examining first the results for the single state of Georgia.
From calculations based upon data covering a quarter of a century, we find in the case of Georgia:

(1) That for each of the five months of the growth season the forecast of the yield per acre of cotton which is based upon the weather data is decidedly better than the forecast from the condition of the crop, by means of the official formula;
(2) That for every month the forecasts from the weather are better than the forecasts a month later by the official method; or, more definitely, the forecasts from the accumulated weather at the end of May, June, July, and August are better than the forecasts by the official method at the end of June, July, August, and September;
(3) That when regard is paid to the probable errors of the coefficients measuring the accuracy of the forecasts, then, for every month, the forecasts from the weather are as good as the forecasts two months later by the official method. Or, more definitely, the forecast from the May weather is as good as the forecast by the official method at the end of July; the forecast from the joint effect of the May and June weather is as good as the forecast by the official method at the end of August; and the forecast from the accumulated weather at the end of July is as good as the forecast by the official method at the end of September.

We shall now extend our comparison to the results for the representative states, Texas, Georgia, Alabama, and South Carolina. As there are five monthly reports on the condition of the growing crop and we have taken four representative states, there are twenty cases in which the forecasts of the yield per acre of cotton may be compared:

(1) In 17 out of 20 cases the forecasts from the weather are more accurate than the forecasts from the condition of the crop, by the official method;
(2) For all of the representative states the forecasts by the official method from the May condition of the crop are worthless.
By contrast, all of the forecasts from the May weather have value. The forecasts from the weather for Georgia and South Carolina are, at the end of May, better than the forecasts by the official method at the end of June, and about as good as those at the end of July; and the forecast from the May weather in Alabama is about as good as the forecast by the official method at the end of September. The value of the forecast from the May weather in Texas is negligible;
(3) For three out of the four representative states the forecasts from the June condition of the crop, by means of the official method, are worthless. But in all three cases the forecasts from the accumulated weather at the end of June are better than the forecasts by the official method at the end of July;
(4) For all of the states except Texas the forecasts from the weather give, for each month, more accurate predictions than can be obtained by the official method from the condition of the crop one month later. The forecasts from the accumulated weather at the end of May, June, July, and August are better than the forecasts by the official method at the end of June, July, August, and September;
(5) For all of the states except Texas the forecasts from the accumulated weather at the end of May, June, and July are about as good as can be obtained by the official method from the condition of the crop two months later, at the end, respectively, of July, August, and September.

As the routine of measurable dependence of yield upon the weather is due to the presence of natural causes, it might easily be inferred that when we move to strictly social facts, no such routine will be found.
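Several of the comparisons above call one forecast "as good as" another when the difference between their accuracy coefficients is small relative to the probable errors. The rough Python sketch below rests on three assumptions of its own, none of them the author's stated procedure: the coefficient is taken to be a Pearson correlation $r$ between forecast and realized yield; the classical probable error $0.6745\,(1 - r^2)/\sqrt{n}$ is used with $n = 25$ years (1890-1914); and the two coefficients are treated as independent under a cutoff `k`.

```python
import math

def probable_error(r, n):
    """Classical probable error of a correlation coefficient r from n pairs."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

def clearly_better(r_a, r_b, n, k=2.0):
    """True if r_a exceeds r_b by more than k probable errors of the difference.

    Treats the two coefficients as independent, ignoring the correlation
    between them when both come from the same years; k is an assumed cutoff.
    """
    pe_diff = math.hypot(probable_error(r_a, n), probable_error(r_b, n))
    return r_a - r_b > k * pe_diff

n = 25  # years 1890-1914
print(clearly_better(0.80, 0.70, n))  # False: within two probable errors
print(clearly_better(0.90, 0.20, n))  # True
```

With only 25 years, a difference of 0.10 between two moderately high coefficients does not clear the cutoff. That is why claims of the form "as good as" deserve the caution the text gives them.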
It could be argued that the price of cotton results from "the law of supply and demand"; the supply may be predictable because it is primarily dependent upon natural causes; but the demand is a social fact and is the resultant of many individual choices, each of which, in its turn, is dependent upon many variable factors. By such a priori reasoning it would be easy to conclude that it is futile to attempt to find a predictable routine in the dependence of the price of cotton upon the size of the crop.

But again we are reminded of Lord Kelvin's statement that "when you can measure what you are speaking about and express it in numbers, you know something about it, but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind." Our researches have shown that there is a dynamic law of demand, a law that connects the price of cotton with the size of the crop, and that the knowledge of the law would have made possible the prediction of the price of cotton from 1890 to 1914 with a degree of accuracy higher than that attained by the formula of the Department of Agriculture in the annual prediction of the size of the crop at the end of September. Upon the appearance of the monthly cotton reports, great sums of money change hands because of the light they are supposed to throw upon the probable size of the crop. The most reliable report is, of course, the one nearest the harvest, but the accuracy of the forecasts of yield that are based upon this report is less than the accuracy with which the price of cotton can be predicted from the size of the crop, by means of the law of demand.

Both the yield and the price of cotton, therefore, are so much a matter of routine that they admit of prediction with a high degree of precision.
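The dynamic law of demand relates year-to-year percentage changes in the price of cotton to percentage changes in the size of the crop. A minimal Python sketch of fitting its simplest, linear form by least squares follows. The function names and the short series in the usage lines are hypothetical. The sketch also leaves out the purchasing-power variable that Chapter V brings in.

```python
def pct_changes(series):
    """Percentage change of each term over the one before it."""
    return [100.0 * (b - a) / a for a, b in zip(series, series[1:])]

def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical crop sizes and prices (illustration only).
crop = [10.0, 11.0, 9.9, 12.0, 11.4]
price = [10.0, 9.0, 10.5, 8.0, 9.0]
x1 = pct_changes(crop)   # percentage changes in the amount of cotton
x0 = pct_changes(price)  # percentage changes in the price of cotton
a, b = fit_line(x1, x0)
forecast = a + b * 5.0   # price change forecast for a crop 5 per cent larger
print(round(a, 2), round(b, 3), round(forecast, 2))
```

Working in percentage changes rather than in levels is what makes the fitted line a dynamic law. It describes how price moves from one year to the next as the crop changes, rather than relating price levels directly.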
In laying the foundation of the modern type of Economic Theory, Jevons foresaw the necessity of the simultaneous development of its deductive and its inductive phases: "I know not when we shall have a perfect system of statistics, but the want of it is the only insuperable obstacle in the way of making Economics an exact science. In the absence of complete statistics, the science will not be less mathematical, though it will be immensely less useful than if it were, comparatively speaking, exact. A correct theory is the first step towards improvement, by showing what we need and what we might accomplish."

"The deductive science of Economics must be verified and rendered useful by the purely empirical science of statistics. Theory must be invested with the reality and life of fact."¹

In the opinion of Professor Marshall the pressing need of economic science at the present time is "the quantitative determination of the relative strength of different economic forces."² And Professor Pareto, in like spirit, points out the conditions of the further development of our science: "The progress of Political Economy in the future will depend in great part upon the investigation of empirical laws, derived from statistics, which will then be compared with known theoretical laws, or will suggest the derivation from them of new laws."³

¹ Jevons: Theory of Political Economy, 3rd edition, pp. 12, 22. Cf. Jevons: Principles of Science, chapter xxii, end of the section on "Illustration of Empirical Quantitative Laws."
² Cf. the address of Professor W. J. Ashley as President of Section F of the British Association for the Advancement of Science. Report of the British Association for the Advancement of Science, 1907, p. 591.
³ Giornale degli Economisti, May 1907, p. 366.

The idea that I should like to emphasize is that because of the recent development of statistical theory and the improvement in the collection of statistical data, we are now able to meet the needs so clearly described by the masters of the science.

The great advance in the methodology of deductive economics, after Cournot's epoch-making work, was initiated by Léon Walras in his use of simultaneous equations for the purpose of completely surveying the interrelated factors in the problems of exchange, production, and distribution. It was necessary in his work and in the work of his successors to begin with a simple, hypothetical construction and to approach the concrete problem by the introduction of an increasing number of complicating factors. The equations expressing the relations between the variables were, of necessity, arbitrary, but the device made possible the envisaging of all the elements in the problem, and suggested the types of their interrelations. But economic theory has now reached the stage where, according to Professor Marshall, there is need of a "quantitative determination of the relative strength of the different economic forces"; and, according to Professor Pareto, empirical laws must be derived from statistics for the double purpose of comparing them with known theoretical laws and of gaining bases for new theoretical developments.

The statistical theory of multiple correlation is perfectly adapted to these demands.
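For three variables, as in Chapter V, multiple correlation reduces to two textbook formulas in the simple correlations $r_{01}$, $r_{02}$, $r_{12}$. These are the partial coefficient $\rho_{01}$ (with $x_2$ held constant) and the multiple coefficient $R$ of $x_0$ on $x_1$ and $x_2$. A minimal Python sketch follows; the function names are hypothetical. This section quotes $r_{01} = -.819$ and $r_{02} = .492$ but not $r_{12}$. The value $r_{12} = -0.3$ below is therefore an assumption chosen for illustration. It happens to reproduce both reported results, $\rho_{01} = -.808$ and $R = .859$, which suggests (but does not establish) that the underlying $r_{12}$ was near that value.

```python
import math

def partial_r01(r01, r02, r12):
    """Partial correlation of x0 and x1 with x2 held constant."""
    return (r01 - r02 * r12) / math.sqrt((1.0 - r02 ** 2) * (1.0 - r12 ** 2))

def multiple_r(r01, r02, r12):
    """Multiple correlation of x0 with its least-squares fit on x1 and x2."""
    r_squared = (r01 ** 2 + r02 ** 2 - 2.0 * r01 * r02 * r12) / (1.0 - r12 ** 2)
    return math.sqrt(r_squared)

# r01 and r02 as reported in Chapter V; r12 = -0.3 is an assumed value,
# not a figure from the text.
r01, r02, r12 = -0.819, 0.492, -0.3
rho01 = partial_r01(r01, r02, r12)  # close to the reported -.808
big_r = multiple_r(r01, r02, r12)   # close to the reported .859
print(round(rho01, 3), round(big_r, 3))
```

The same two quantities extend to any number of variables through the usual matrix formulation of least squares. The three-variable closed forms are shown only because they match the case worked out in Chapter V.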
No matter what may be the number of factors in the economic problem, the theory of multiple correlation is specially fitted to make a "quantitative determination" of their relative strength; and no matter how complex the functional relations between the variables, it can derive "empirical laws" which, by successive approximations, will describe the real relations with increasing accuracy. The mathematical method of deductive economics gives a coup d'œil of the factors in the problem; the statistical method of multiple correlation affixes their relative value and reveals the laws of their association. The mathematical method begins with an ultra-hypothetical construction and then, by successive complications, approaches a theoretical description of the concrete goal. The method of multiple correlation reverses the process: it begins with concrete reality in all of its natural complexity and proceeds to segregate the important factors, to measure their relative strength, and to ascertain the laws according to which they produce their joint effect. When the method of multiple correlation is thus applied to economic data it invests the findings of deductive economics with "the reality and life of fact"; it is the Statistical Complement of Deductive Economics.