THE MATHEMATICAL THEORY OF PROBABILITIES

THE MACMILLAN COMPANY
NEW YORK  BOSTON  CHICAGO  DALLAS  SAN FRANCISCO
MACMILLAN & CO., Limited
LONDON  BOMBAY  CALCUTTA  MELBOURNE
THE MACMILLAN CO. OF CANADA, Ltd.
TORONTO

THE MATHEMATICAL THEORY OF PROBABILITIES AND ITS APPLICATION TO FREQUENCY CURVES AND STATISTICAL METHODS

BY ARNE FISHER

TRANSLATED FROM THE DANISH BY CHARLOTTE DICKSON, B.A. (Columbia), Mathematical Assistant in the Department of Development and Research of the American Telephone and Telegraph Company, AND WILLIAM BONYNGE, B.A. (Belfast)

WITH INTRODUCTORY NOTES BY M. C. RORTY AND F. W. FRANKLAND, F.I.A., F.A.S., F.S.S.

VOLUME I: Mathematical Probabilities, Frequency Curves, Homograde and Heterograde Statistics

SECOND EDITION, GREATLY ENLARGED

NEW YORK: THE MACMILLAN COMPANY, 1922

Copyright, 1915 and 1922, by Arne Fisher. Set up and electrotyped. Published November, 1915. Second Edition, greatly enlarged, May, 1922. Printed in the United States of America.

INTRODUCTORY NOTE TO THE SECOND EDITION.

Mr. Fisher has requested that an introduction be written to this, the second edition of his work on probabilities, which shall indicate some of the practical applications of the mathematical theory with which his treatise deals.
The writer has only a limited knowledge of mathematical technique, yet it has so happened that in twenty-five years of active work as engineer, statistician and executive he has had frequent occasion to call upon the skill of trained mathematicians for the solution of practical problems involving frequency curves and probabilities. Among such mathematicians none has been more helpful, or quicker to perceive the possibility of making valuable applications of higher mathematics to business problems, than Mr. Fisher himself. For this reason it is a duty as well as a privilege to outline, at his request, certain actual practical experiences with mathematical applications and to indicate such possible applications for the future.

The writer's initial experience with frequency curves and probabilities was in the years 1902 and 1903, when it became evident, in analyzing various problems in telephone traffic, that certain peak loads, which were superimposed upon the normal seasonal, weekly, and daily fluctuations, could be accounted for only by the laws of chance. Recourse was, therefore, had to the formulae then available for approximate summations of the terms of the binomial expansion, and from these a series of curves was drawn which indicated for any given normal hourly traffic (as indicated by studies of seasonal, weekly, and daily variations) the probability that any given short period load would be equalled or exceeded. Practical experience with these curves soon showed that, in spite of minor errors, they were close enough to the real facts to make them of primary importance in traffic studies of all kinds, and particularly in the development of mechanical switching devices. Their use for such purposes has now become a commonplace in telephone engineering.

As a by-product of the preceding application there have been other interesting uses of the same probability curves.
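The peak-load calculation described above, summing the upper terms of the binomial expansion to find the chance that a given short-period load is equalled or exceeded, can be sketched in a few lines of modern code. The subscriber count and calling probability below are purely illustrative assumptions, not figures from the original traffic studies.

```python
from math import comb

def prob_load_at_least(n, p, k):
    """Probability that at least k of n independent sources are active
    at once, each active with probability p: the upper tail of the
    binomial expansion (p + q)^n."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Illustrative figures: 1000 subscribers, each off-hook 2% of the time.
# Chance that 30 or more lines are busy simultaneously:
tail = prob_load_at_least(1000, 0.02, 30)
```

Tabulating this tail probability against k, for each normal hourly traffic level, reproduces the kind of curve family Rorty describes.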
Effective studies have been made of the decrease in the total stocks of small machine parts that could be made possible by standardizing and reducing the number of types of screws, bolts, nuts, etc. The curves can also be applied directly to every line of business and every type of operation where prompt service must be given and where the demand arises from a large number of independent sources, and is, therefore, subject to peak loads determined by the laws of chance, which may be superimposed upon other "normal" peak loads varying with the days of the week, the hours of the day, etc.

Entirely separate applications of frequency curves are those necessary in actuarial work. These are relatively well known. But it is less generally known that one of the most important of business problems, that of depreciation, can be treated effectively only when approached on an actuarial basis with a full understanding of the frequency curves which govern the displacement, year by year, of the physical units involved.

A still further use of frequency curves and the theory of probabilities, which is of immediate practical importance, is in connection with sampling operations. The theory of sampling has already been well developed, but adequate efforts have not yet been made by mathematicians to reduce the processes of sampling to dependable simple rules that can be applied by business executives and statisticians untrained in higher mathematics. In census work, and in statistical and other reports made by business organizations, the waste of money, that could be avoided by an intelligent application of the theory of sampling, is very great. Not only can many reports and analyses be made much more cheaply and quickly by sampling processes, but they can also be made more accurately. Many important items of information can be determined only by trained specialists.
In such cases the only procedure, that does not involve prohibitive expense in large census operations, is to tie such items, by a sampling process, to other items which are susceptible of exact enumeration by relatively unskilled enumerators, and then to compute the totals for the special items from the relations of such items to the items which are completely enumerated.

All of the preceding are in the field of immediate practicalities. When we come to the future, one of the most promising uses of mathematics is in the development of logical processes. It is not going too far to say that all business, and most engineering operations, are fundamentally based on probabilities. The business man is always dealing in degrees of uncertainty, and even the engineer has only occasionally a definite set of conditions upon which to base his computations. Where the problem is primarily a financial one, he must balance the cost of overbuilding against the cost of underbuilding; and, if he combines business judgment with engineering skill, he will multiply the amount of each possible loss by the probability of its occurring, and will ordinarily choose, among all possible plans, the plan which involves the minimum probable loss. Here it is not inappropriate to interject the idea that the most practical logic must always be in terms of probabilities, and that a logic which deals, or pretends to deal, in certainties only is not alone useless, but is also harmful and misleading, when difficult problems are to be approached. Such problems can rarely, if ever, be solved except through the cumulation toward a certainty of many small probabilities established from uncorrelated, or only partially correlated, viewpoints.
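The minimum-probable-loss rule stated above (weight each possible loss by the probability of its occurring, then choose the plan with the smallest expectation) amounts to a short computation. The plans and figures here are hypothetical, invented only to illustrate the arithmetic.

```python
def expected_loss(outcomes):
    """Expected (probable) loss of a plan: the sum of probability
    times loss over the mutually exclusive outcomes foreseen."""
    return sum(p * loss for p, loss in outcomes)

# Hypothetical plans, each a list of (probability, loss) pairs
# balancing the cost of underbuilding against overbuilding.
plans = {
    "build small": [(0.6, 0), (0.4, 100_000)],  # 40% chance capacity falls short
    "build large": [(0.9, 25_000), (0.1, 0)],   # 90% chance of idle capacity
}
best = min(plans, key=lambda name: expected_loss(plans[name]))
# Here "build large" wins: 0.9 * 25,000 = 22,500 against 0.4 * 100,000 = 40,000.
```

The engineer with business judgment, in Rorty's phrase, is simply evaluating `expected_loss` for every feasible plan.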
A final suggestion which is to-day speculative, but may assume important practical aspects in the near future, is with respect to the applications of frequency curves and probabilities to physical and cosmic mathematics. In such mathematics we are forced to assume that all of our measures must arise out of the things measured. When we deal with physical velocities, it would seem that our only measures of velocity can arise out of the velocities themselves. Similar considerations hold true with respect to fundamental measures of physical extension. Under these circumstances we may talk in terms of infinite space and of infinite time, but we can hardly talk in terms of infinites when we are dealing with the dimensions of atomic structure and the velocities of material particles. In these cases it seems very highly probable that we are dealing with frequency distributions which we must measure and define in terms derived from such distributions themselves. With respect to such measures some of our frequency curves may have infinite "tails," but it is more probable that the frequency forms are such that they can be completely defined in finite terms. Along this same line, we may even risk a closing speculation that the relative proportions of organized matter and space in the stellar universe are determined through the operations of the laws of chance in establishing heterogeneities in what is otherwise a homogeneous void-filling medium.

M. C. RORTY.

New York City, March 2, 1922.

PREFACE TO THE SECOND EDITION.

At the time when the first edition of this little book was published in 1916, I expected to issue a second volume shortly after, dealing with frequency curves and frequency surfaces as well as the related problem of co-variation (correlation). The manuscript for this volume was completed and printing had already commenced on some of the chapters, when a series of misfortunes, not necessarily unexpected, overtook the work. A major part of the manuscript, while in transit to a friend in Denmark for review and corrections, went down with a Danish vessel when torpedoed by an outlaw German submarine. A duplicate copy was for some reason or other withheld by the British military censor and not returned to the writer until long after the termination of the world war. My third and final copy of the manuscript, which I had submitted to an American friend for critical review, was also lost in transit. The veritable nemesis which seems to have followed my efforts is, however, only a verification of the all-prevailing laws of chance, which every serious-minded student must face with unperturbed attitude. In fact, the above misfortunes have, after all, only made me more determined to complete another collection of notes, which I eventually hope to put into proper shape for publication.

In the meantime the first edition has been out of print for more than two years, and when the publisher asked me to prepare a new edition I took advantage of this opportunity to add several chapters on frequency functions and their application to heterograde statistical series so as to give a complete treatment of statistical functions involving one variable. The book is, therefore, twice its original size and contains the major part of what I originally intended for a second volume.

The reader will readily notice that my treatment of the subject is based throughout upon the principles of the classical probability theory as founded by Bernoulli, De Moivre and above all by the great Laplace and his disciple, Poisson. I am of the opinion that these principles and their further extension by the Scandinavian statisticians and actuaries, Gram, Thiele, Westergaard, Charlier, Wicksell and Jorgensen, offer as yet the best and also the most powerful tools for the treatment of collected statistical data by means of mathematical methods. In the way of adumbration and
economy of thought the Laplacean methods stand unsurpassed in the whole realm of mathematical statistics. I have, therefore, in this volume limited my investigations to a systematic treatment along these lines. I hope, however, in the forthcoming second volume to treat the methods of Pearson, Edgeworth, Kapteyn, Bachelier and Knibbs and show their relation to Laplace's theory.

The reason why the Laplacean doctrine of frequency curves has been ignored until comparatively recent years and has remained more or less obscure is perhaps due to the fact that for more than a century it remained a theory pure and simple and was used but sparingly in practical calculations. Any statistical theory, in order to be of use in practical work, must be arranged in such a manner that it is readily adaptable to numerical computations. Advanced mathematical computation has not been given its due reward and proper attention in our ordinary academic instruction. A high-grade mathematical computer is indeed a "rare bird," much more so in fact than a good mathematician. To arrange and plan the numerical work in connection with the theoretical formula so that the detailed and painstaking work is reduced to a minimum, and at the same time afford the proper means for checking and counterchecking, is by no means an easy task and often requires as much ingenuity as the actual development of the theoretical formulae. While Gauss has always been acknowledged as one of the world's greatest computers and in addition to his extensive work in pure mathematics also did much practical work in surveying, physics, and in financial and actuarial investigations, Laplace during his entire career remained a pure mathematician and apparently failed to grasp the paramount attributes required by a successful computer.
His attempt to inject himself into public life, as for instance when he secured for himself an appointment as minister of the interior, must be regarded as a dismal failure, as admitted in Napoleon's memorandum on his dismissal. The failure of Laplace to recognize fully the all-important phase of numerical computations in all observations on statistical mass phenomena is in my opinion the main reason why the Gaussian theory of observations and the allied subject of the theory of least squares has hitherto supplanted the admittedly superior theory of the great Frenchman. Gauss in addition to his theory furnished an essentially useful and elegant method for performing the necessary numerical calculations, while Laplace left this decidedly important aspect out of consideration altogether. It remained in reality to Charlier to furnish the Laplacean doctrine with a practical method for computing the various statistical parameters. And in the meantime the Gaussian methods reigned supreme while Laplace's great work was neglected.

The careful reader will readily notice that in the treatment of frequency curves I have allowed the semi-invariants, originally introduced in the theory of statistics by Thiele, to occupy a central position. In my opinion the semi-invariants represent a more powerful tool than the method of moments. I have also tried to rescue from oblivion the important and original memoir by the Danish actuary, Gram, and give to him and the French mathematician, Hermite, their due recognition as the earliest investigators of skew frequency distributions. Gram was perhaps the first investigator to make proper use of the orthogonal functional properties of the Laplacean normal frequency curve and its derivatives.
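Thiele's semi-invariants, which the text places at the center of the treatment of frequency curves, can be illustrated by a minimal sketch. The first three follow from the raw sample moments by the standard relations k1 = m1, k2 = m2 - m1^2, k3 = m3 - 3*m1*m2 + 2*m1^3 (the function name below is my own, not the book's notation).

```python
def semi_invariants(xs):
    """First three semi-invariants (cumulants) of a sample, computed
    from the raw moments m1, m2, m3: they equal the mean, the
    variance, and the third central moment respectively."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    m3 = sum(x ** 3 for x in xs) / n
    return m1, m2 - m1 ** 2, m3 - 3 * m1 * m2 + 2 * m1 ** 3
```

For a symmetric sample the third semi-invariant vanishes, which is one reason these quantities give such a compact description of skew frequency distributions.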
By means of an application of the orthogonal properties of the Hermite polynomials and their close relation to the theory of integral equations, the whole theory of frequency distribution can be presented in a decidedly compact form; and I deem no apology necessary for having introduced in my treatment of frequency curves some of the more elementary theorems of integral equations, that youngest branch of higher analysis, which at present occupies a central position in advanced mathematics.

The most recent investigations along those lines have been made by the Swedish astronomer, Charlier, and his disciples, Jorgensen and Wicksell. Unfortunately these investigations have hitherto not received adequate and systematic treatment in English and American texts on statistics, and it is my hope that the following pages may be of service in opening the eyes of English-speaking statisticians to the practical utility of these methods. The examples have all been selected so as to give a complete and detailed illustration of the application of the theory to essentially practical problems. I have, on the other hand, purposely refrained from giving the customary exercises, so-called, usually found in statistical texts, especially those in German and English. Although I have been a close student of and have read most of the published statistical text-books in about seven languages for the last ten years, I regret to state that I have found little or no practical value in such trick exercises, which as a rule have but slight relation to problems occurring in daily life.

Since the appearance of the first edition of this book in 1916 a number of excellent statistical texts have been issued.
Among these I may mention a new edition of Yule's well-known elementary text, a greatly enlarged edition of Bowley's Elements of Statistics, the new treatise by Caradog Jones, an enlarged German translation of Charlier's Grunddragen, a very lucid Swedish text by Wicksell, the scholarly and broadly planned Statistikens Teori i Grundrids (in Danish) by Westergaard, and last but not least, the thesis by Jorgensen, Frekvensflader og Korrelation.¹

Although an extended residence in the United States has perhaps improved my barbaric Dano-English, I fear that I must still apologize to the reader for my shortcomings in rhetoric and grammar. Most of the serious defects have, I hope, been overcome by the diligent efforts of my co-editor and translator, Miss C. Dickson, mathematical assistant in the department of Development and Research of the American Telephone and Telegraph Company. Miss Dickson's work has indeed been much beyond that of mere translation. Her knowledge of the mathematical theory of probabilities has enabled her to suggest to me several improvements in my Danish notes.

I am also under great obligations to a number of friends and colleagues who have assisted me in the preparation of this volume. I am especially indebted to Mr. E. C. Molina, the well-known probability expert of the American Telephone and Telegraph Company. Mr. Molina's extensive knowledge of the works of the old French masters, especially of those of Laplace, has been of the greatest value to me, and I can truthfully say that I have nowhere met a mathematician so thoroughly acquainted with the intricacies of the Theorie Analytique as Mr. Molina. My thanks are also due to Mr. F. L. Hoffman, the Statistician of the Prudential Insurance Company, for the interest he took in my work along those lines while I was employed as a computer in his department. To Messrs. M. C. Rorty and D. R.
Belcher of the American Telephone and Telegraph Company, I beg leave to express my best thanks for their kind advice and encouragement in the preparation of this volume.

It is indeed impossible to adequately express in a mere formal preface my obligations to Mr. Rorty in this matter. His introductory note I regard as one of the highest rewards I have received in this field of endeavor, where one must usually be content with the appreciation of one's peers. In this connection it is of interest to note that Mr. Rorty is the pioneer investigator in the application of the mathematical theory of probabilities to telephone engineering, which has been further developed in recent years by Molina of America, Erlang and Johannsen of Denmark, Holm of Sweden, Odell and Grinsted of Great Britain. The pioneer work by Mr. Rorty in this eminently practical field antedates the earliest work by Erlang in Tidsskrift for Matematik by nearly five years.

Last, but not least, I wish to convey my sincerest thanks to my Scandinavian compatriots, Westergaard, Charlier, Jorgensen, Wicksell and Guldberg, from whose works I have drawn so freely. To these gentlemen and to the works of the late Messrs. Gram and Thiele of Copenhagen I really owe anything of value which may be contained in this work.

Arne Fisher.

New York, April, 1922.

¹ As a pure probability text we may mention G. Castelnuovo's Calcolo delle Probabilita (Milano, 1919), as an exceptionally lucid and rigorous treatise. The recently issued Treatise on Probability by J. M. Keynes is briefly discussed in paragraph 138 of this book. A. F.

INTRODUCTORY NOTE TO THE FIRST EDITION.

I feel it a great honor to have been asked by my friend and colleague, Mr. Arne Fisher, of the Equitable Life Assurance Society of the United States, to write an introductory note to what appears to me the finest book as yet compiled in the English language on the subject of which it treats.
As an Examiner myself in Statistical Method for a British Colonial Government, it has been to me a heart-breaking experience, when implored by intending candidates for examination to recommend a text-book dealing with Mr. Fisher's subject matter, that it has heretofore been impossible for me to recommend one in the English language which covers the whole of the ground. Until comparatively recent years the case was even worse. While in French, in Italian, in German, in Danish, and in Dutch, scientific works on statistics were available galore, the dearth of such literature in the English language was little short of a national or racial scandal. With such works as those of Yule and Bowley, in recent years, there has been some possibility for the English-speaking student to acquire part of the knowledge needed. But it is hardly necessary to point out what a very large amount of new ground is covered by Mr. Fisher's new book as compared with such works as I have referred to.

Despite my professional connection with statistical and actuarial work of a technical character, my own personal interest in Mr. Fisher's book is concentrated principally on the metaphysical basis of the Probability-theory, and it is with regard to this aspect of the subject alone that I feel qualified to comment on his achievement. With all the controversy that has gone on through many decades among metaphysicians and among writers on logic interested especially in the bases of the theories of probability and induction, between the pure empiricists of the type of J. S. Mill and John Venn (at all events in the earliest edition of his work) on the one hand, and the (partly) a priori theorists who base their doctrine on the foundation of Laplace on the other hand, it has been a source of intense satisfaction to me, as in the main a disciple of the latter group of theorists, to note the masterly way in which Mr.
Arne Fisher disentangles the issues which arise in the keen and sometimes almost embittered controversy between these two schools of thought. It has always seemed to the present writer as if the very foundations of Epistemology were involved in this controversy. The impossibility of deriving the corpus of human knowledge exclusively from empirical data by any logically valid process — an impossibility which led Immanuel Kant to the creation of his epoch-making philosophical system — is hardly anywhere made more evident than in what seems to the present writer the unsuccessful effort of thinkers like John Venn to derive from such purely empirical data the entire Theory of Probability. The logical fallacy of the process is analogous to that perpetrated by John Stuart Mill in endeavoring to base the Law of Causality on what he termed an "inductio per simplicem enumerationem." Probably there is nowhere a more trenchant and conclusive exposure of the unsoundness of this point of view than in the Right Honorable Arthur James Balfour's monumental work "A Defense of Philosophic Doubt." It is therefore satisfactory to find that Mr. Fisher emphasizes, quite at the beginning of his treatise, that an a priori foundation for "Probability" judgments is indispensable.

Hardly less gratifying, from the metaphysical point of view, is Mr. Fisher's treatment of the celebrated quaestio vexata of Inverse Probabilities and his qualified vindication of Bayes' Rule against its modern detractors. Aside altogether from metaphysics, it is particularly satisfactory to note the full and clear way in which the author treats the Lexian Theory of Dispersion and of the "Stability" of statistical series and the extension of this theory by recent Scandinavian and Russian investigators — a branch of the science which has, till the appearance of this new work, not been adequately covered in English text-books.
It may of course be a moot question whether the preference given by our author to Charlier's method of treating "Frequency Curves" over the method of Professor Karl Pearson is well advised. But whatever the experts' verdict may be on debatable questions like these, the scientific world is to be congratulated on Mr. Fisher's presentment of a new and sound point of view, and he emphatically is to be congratulated on the production of a text-book which for many years to come will be invaluable both to students and to his confreres who are engaged in extending the boundaries of this fascinating science.

F. W. Frankland,
Member of the Actuarial Society of America, Fellow of the Institute of Actuaries of Great Britain and Ireland, and Fellow of the Royal Statistical Society of London.

New York, October 1, 1915.

PREFACE TO THE FIRST EDITION.

"Probability" has long ago ceased to be a mere theory of games of chance and is everywhere, especially on the continent, regarded as one of the most important branches of applied mathematics. This is proven by the increasing number of standard text-books in French, German, Italian, Scandinavian and Russian which have appeared during the last ten years. During this time the research work in the theory of probabilities has received a new impetus through the labors of the English biometricians under the leadership of Pearson, the Scandinavian statisticians Westergaard, Charlier and Kiaer, the German statistical school under Lexis, and the brilliant investigations of the Russian school of statisticians.

Each group of these investigations seems, however, to have moved along its own particular lines. The English schools have mostly limited their investigations to the field of biology as published in the extensive memoirs in the highly specialized journal, Biometrika.
The Scandinavian scholars have produced researches of a more general character, but most of these researches are unfortunately contained in Scandinavian scientific journals and are for this reason out of reach of the great majority of readers who are not familiar with any of the allied Scandinavian languages. This applies in a still greater degree to the Russians. German scholars of the Lexis school have also contributed important memoirs, but strangely enough their researches are little known in this country or in England, a fact which is emphasized by the belated English discussion on the theory of dispersion as developed by Lexis and his disciples. The same can also be said with regard to the Italian statisticians.

In the present work I have attempted to treat all these modern researches from a common point of view, based upon the mathematical principles as contained in the immortal work of the great Laplace, "Theorie analytique des Probabilites," a work which despite its age remains the most important contribution to the theory of probabilities to our present day. Charlier has rightly observed that the modern statistical methods may be based upon a few condensed rules contained in the great work of Laplace. This holds true despite the fact that many modern English writers of late have shown a certain distrust, not to say actual hostility, towards the so-called mathematical probabilities as defined by the French savant, and have in their place adopted the purely empirical probability ratios as defined by Mill, Venn and Chrystal. It is quite true that it is possible to build a consistent theory of such ratios, as for instance is done by the Danish astronomer and actuary, Thiele. The theory, however, then becomes purely a theory of observations in which the theory of probability takes a secondary place.
The distrust in the so-called mathematical a priori probabilities of Laplace I believe, however, to be unfounded, and the criticism to which that particular kind of probabilities is subjected by a few of the modern English writers is, I believe, due to a misapprehension of the true nature of the Bernoullian Theorem. This renowned theorem remains to-day the cornerstone of the theory of statistics, and upon it I have based the most important chapters of the present work. Following the beautiful investigations of Tschebycheff and Pizetti in their proofs of Bernoulli's Theorem and the closely related theorem of large numbers by Poisson, I have adopted the methods of the Swedish astronomer and statistician, Charlier, in the discussion of the Lexian dispersion theory.

The theory of frequency curves is treated from various points of view. I have first given a short historical introduction to the various investigations of the law of errors. The Gaussian normal curve of error was by the older school of statisticians held to be sufficient to represent all statistical frequencies, and actual observed deviations from the normal curve were attributed to the limited number of observations. Through the original memoirs of Lexis and the investigations of Thiele the fallacy of such a dogmatic belief was finally shown. The researches of Thiele, and later of Pearson, developed the theory of skew curves of error. As recently as 1905 Charlier finally showed that the whole theory of errors or frequency curves may be brought back to the principles of Laplace. I have treated this subject by the methods of both Pearson and Charlier, although I have given the methods of the latter a predominant place, because of their easy and simple application in the practical computations required by statistical work. The mathematical theory of correlation, which is treated in an elementary manner only, is based upon the same principles.
The statistical examples serve as illustrations of the theory, and it will be noted that it is possible to solve all the important statistical problems presenting themselves in daily work on the basis of a theory of mathematical probabilities instead of on a direct theory of statistical methods. I have here again followed Charlier in dividing all statistical problems into two distinct groups, namely, the homograde and the heterograde groups.

In treating the philosophical side of the subject I have naturally not gone into much detail. However, I have tried to emphasize the two diametrically opposite standpoints, namely the principle of what von Kries has called the principle of "cogent reason," and the principle which Boole has aptly termed "the equal distribution of ignorance." These two principles are clearly illustrated in the case of the so-called inverse probabilities. As far as pure theory is concerned, the theory of "inverse probabilities" is rigorous enough. It is only when making practical applications of the rule of inverse probabilities (the so-called Bayes' Rule) that many writers have made a fatal mistake by tacitly assuming the principle of "insufficient reason" as the only true rule of computation. This leads to paradoxical results as illustrated by the practical problem from the region of actuarial science in Chapter VI in this book.

In a work of this character I have naturally made an extended use of the higher mathematical analysis. However, the reader who is not versed in these higher methods need not feel alarmed on this account, as the elementary chapters are arranged in such a way that the more difficult paragraphs may be left out. I have in fact divided the treatise into two separate parts. The first part embraces the mathematical probabilities proper and their applications to homograde statistical series. This part, I think, constitutes what is usually given as a course in vital statistics in many American colleges. I hardly deem it worth while to give a
I hardly deem it worth while to give a detailed discussion on the collection and arrangement of the statistical data as to various frequency distributions. The mere graphical and serial representation of frequency functions by means of histographs and frequency columns is so simple and evident that a detailed description seems superfluous. The fitting of the various curves to analytical formulas and the determination of the various parameters seem to me of much greater importance. The theory of curve fitting which is treated in the second volume is founded upon a more advanced mathematical analysis and is for this reason out of reach of the average American student who desires to learn only the rudiments of modern statistical methods. Practical statisticians, on the other hand, will derive much benefit from these higher methods. It is a fact generally noted in mathematics that the practical application of a difficult theory is much simpler than that of a more elementary theory. This is amply proven by the appearance of an excellent little Scandinavian brochure by Charlier: "Grunddragen af den matematiske Statistikken." ("Rudiments of Mathematical Statistics.") I have always attempted to adapt theory to actual practical problems and requirements rather than to give a purely mathematical abstract discussion. In fact it has been my aim to present a theory of probabilities as developed in recent years which would prove of value to the practical statistician, the actuary, the biologist, the engineer and the medical man, as well as to the student who studies mathematics for the sake of mathematics alone. The nucleus of this work consisted of a number of notes written in Danish on various aspects of the theory of probabilities, collected from a great number of mathematical, philosophical and economic writings in various languages. At the suggestion of my former esteemed chief, Mr. H. W.
Robertson, F.A.S., Assistant Actuary of the Equitable Life Assurance Society of the United States, I was encouraged to collect these fragmentary notes in systematic form. The rendering in English was done by myself personally with the assistance of Mr. W. Bonynge. With his assistance most of the idiomatic errors due to my barbaric Dano-English have been eliminated. The notes stand, however, in the main as a faithful reproduction of my original English copy. Although the resulting "Dano-English" may have its great shortcomings as to rhetoric and grammar, I hope to have succeeded in expressing what I wanted to say in such a manner that my possible readers may follow me without difficulty. I gladly take the opportunity of expressing my thanks to a number of friends and colleagues who in various ways have assisted me in the preparation of this work. My most grateful thanks are due to Mr. F. W. Frankland, Mr. H. W. Robertson and Mr. Wm. Bonynge not only for reading the manuscript and most of the proofs, but also for the friendly help and encouragement in the completion of this volume. The introductory note by Mr. Frankland, coming from the pen of a scholar who for the most of a life-time has worked with statistical-mathematical subjects and who has taken a special interest in the philosophical and metaphysical aspects of the probability theory, I regard as one of the strong points of the book. My debts to Messrs. Frankland and Robertson as well as to Dr. W. Strong, Associate Actuary of the Mutual Life Insurance Company, are indeed of such a nature that they cannot be expressed in a formal preface. My thanks are also due to Mr. A. Pettigrew in correcting the first rough draught of the first three chapters at a time when my knowledge of English was most rudimentary, to Mr. M. Dawson, Consulting Actuary, and Mr. R.
Henderson, Actuary of the Equitable Life, for reading a few chapters in manuscript and making certain critical suggestions, to Professors C. Grove and W. Fite, of Columbia University, for numerous technical hints in the working out of various mathematical formulas in Chapter VI, to Miss G. Morse, librarian of the Equitable Library, in the search of certain bibliographical material. Last but not least I wish to express my sincerest thanks to several of my Scandinavian compatriots for allowing me to quote and use their researches on various statistical subjects. I want in this connection especially to mention Professor Charlier, of Lund, and Professors Westergaard and Johannsen, of Copenhagen. To The Macmillan Company and The New Era Printing Company I beg leave to convey my sincere appreciation of their very courteous and accommodating attitude in the manufacture of this work. Their spirit has been far from commercial in this — from a pure business standpoint — somewhat doubtful undertaking.

Arne Fisher.
New York, October, 1915.

TABLE OF CONTENTS.

PART I. MATHEMATICAL PROBABILITIES AND HOMOGRADE STATISTICS.

Chapter I. Introduction: General Principles and Philosophical Aspects.
1. Methods of Attack
2. Law of Causality
3. Hypothetical Judgments
4. Hypothetical Disjunctive Judgments
5. General Definition of the Probability of an Event
6. Equally Likely Cases
7. Objective and Subjective Probabilities

Chapter II. Historical and Bibliographical Notes.
8. Pioneer Writers
9. Bernoulli, de Moivre and Bayes
10. Application to Statistical Data
11. Laplace and Modern Writers

Chapter III. The Mathematical Theory of Probabilities.
12. Definition of Mathematical Probability
13. Example 1
14. Example 2
15. Example 3
16. Example 5
17. Example 6

Chapter IV. The Addition and Multiplication Theorems in Probabilities.
18. Systematic Treatment by Laplace
19.
Definition of Technical Terms
20. The Theorem of the Complete or Total Probability, or the Probability of "Either Or"
21. Theorem of the Compound Probability or the Probability of "As Well As"
22. Poincare's Proof of the Addition and Multiplication Theorem
23. Relative Probabilities
24. Multiplication Theorem
25. Probability of Repetitions
26. Application of the Addition and Multiplication Theorems in Problems in Probabilities
27. Example 12
28. Example 13
29. Example 14
30. Example 15
31. Example 16
32. Example 17
33. Example 18. De Moivre's Problem
34. Example 19
35. Example 20. Tchebycheff's Problem

Chapter V. Mathematical Expectation.
36. Definition, Mean Values
37. The Petrograd (St. Petersburg) Problem
38. Various Explanations of the Paradox. The Moral Expectation

Chapter VI. Probability a Posteriori.
39. Bayes's Rule. A Posteriori Probabilities
40. Discovery and History of the Rule
41. Bayes's Rule (Case I)
42. Bayes's Rule (Case II)
43. Determination of the Probabilities of Future Events Based upon Actual Observations
44. Examples on the Application of Bayes's Rule
45. Criticism of Bayes's Rule
46. Theory versus Practice
47. Probabilities Expressed by Integrals
48. Example 24
49. Example 25. Ring's Paradox
50. Conclusion

Chapter VII. The Law of Large Numbers.
51. A Priori and Empirical Probabilities
52. Extent and Usage of Both Methods
53. Average a Priori Probabilities
54. The Theory of Dispersion
55. Historical Development of the Law of Large Numbers

Chapter VIII. Introductory Formulas from the Infinitesimal Calculus.
56. Special Integrals
57. Wallis's Expression of π as an Infinite Product
58. De Moivre-Stirling's Formula

Chapter IX. Law of Large Numbers. Mathematical Deduction.
59. Repeated Trials
60. Most Probable Value
61. Simple Numerical Examples
62.
The Most Probable Value in a Series of Repeated Trials
63. Approximate Calculation of the Maximum Term
64. Expected or Probable Value
65. Summation Method of Laplace. The Mean Error
66. Mean Error of Various Algebraic Expressions
67. Tchebycheff's Theorem
68. The Theorems of Poisson and Bernoulli Proved by the Application of the Tchebycheffian Criterion
69. Bernoullian Scheme
70. Poisson's Scheme
71. Relation between Empirical Frequency Ratios and Mathematical Probabilities
72. Application of the Tchebycheffian Criterion

Chapter X. The Theory of Dispersion and the Criterions of Lexis and Charlier.
73. Bernoullian, Poisson and Lexis Series
74. The Mean and Dispersion
74a. Mean or Average Deviation
75. The Lexian Ratio and Charlier Coefficient of Disturbancy

Chapter XI. Application to Games of Chance and Statistical Problems.
76. Correlate between Theory and Practice
77. Homograde and Heterograde Series. Technical Terms
78. Computation of the Mean and the Dispersion in Practice
79. Westergaard's Experiments
80. Charlier's Experiments
81. Experiments by Bonynge and Fisher

Chapter XII. Continuation of the Application of the Theory of Probabilities to Homograde Statistical Series.
82. General Remarks
83. Analogy between Statistical Data and Mathematical Probabilities
84. Number of Comparison and Proportional Factors
85. Child Births in Sweden
86. Child Births in Denmark
87. Danish Marriage Series
88. Stillbirths
89. Coal Mine Fatalities
90. Reduced and Weighted Series in Statistics
91. Secular and Periodical Fluctuations
92. Cancer Statistics
93. Application of the Lexian Dispersion Theory in Actuarial Science. Conclusion

PART II. FREQUENCY CURVES AND HETEROGRADE STATISTICS

Chapter XIII. The Theory of Errors and Frequency Curves and Its Application to Statistical Series.
94. General Remarks.
The Hypotheses of Elementary Errors
95. Application to Statistical Series. Definitions
96. Compound Frequency Curves
97. Early Writers
98. Laplace and Gauss
99. Quetelet's Studies
100. Opperman, Gram, and Thiele
101. Modern Investigations

Chapter XIV. The Mathematical Theory of Frequency Curves.
102. Frequency Distributions
103. Parameters Considered as Symmetric Functions
104. Semi-Invariants of Thiele
105. The Fourier Integral Equation
106. Frequency Function as the Solution of an Integral Equation
107. The Normal or Laplacean Probability Function
108. Hermite's Polynomials
109. Orthogonal Functions
110. The Frequency Function Expressed as a Series
111. Derivation of Gram's Series
112. Absolute Frequencies
113. Coefficients Expressed by Semi-Invariants
114. Change of Origin and Unit

PART III. PRACTICAL APPLICATIONS OF THE THEORY.

Chapter XV. The Numerical Determination of the Parameters.
115. General Remarks
116. Remarks on Criticisms
117. Charlier's Computation Scheme
118. Comparison between Observed Data and Theoretical Values
119. Principle of Method of Least Squares
120. Gauss' Solution of Normal Equations
121. Arithmetical Application of Method

Chapter XVI. Logarithmically Transformed Frequency Functions.
122. Transformation of the Variate
123. The General Theory of Transformation
124. Logarithmic Transformation
125. The Mathematical Zero
126. Logarithmically Transformed Frequency Series
127. Parameters Determined by Least Squares
128. Application to Graduation of Mortality Tables
129. Formation of Observation Equations
130. Additional Examples

Chapter XVII. Frequency Curves and their Relation to the Bernoullian Series.
131. The Bernoullian Series
132. Poisson's Exponential
133. The Law of Small Numbers

Chapter XVIII. Poisson-Charlier Frequency Curves for Integral Variates.
134. Charlier's B Curve
135. Numerical Examples
136. Transformation of the Variate
137. Bernoullian Series Expressed as B Curves
138. Remarks on Mr. Keynes' Criticisms

PART I. MATHEMATICAL PROBABILITIES AND HOMOGRADE STATISTICS

CHAPTER I. INTRODUCTION: GENERAL PRINCIPLES AND PHILOSOPHICAL ASPECTS.

1. Methods of Attack. — The subject of the theory of probabilities may be attacked in two different ways, namely in a philosophical, and in a mathematical manner. At first the subject originated as isolated mathematical problems from games of chance. The pioneer writers on probability such as Cardano, Galileo, Pascal, Fermat, and Huyghens treated it in this way. The famous Bernoulli was, perhaps, the first to view the subject from the philosopher's point of view. Laplace wrote his well-known "Essai Philosophique des Probabilites," wherein he describes the whole science of probability as the application of common sense. During the last thirty years numerous eminent philosophical scholars such as Mill, Venn, and Keynes of England, Bertrand and Poincare of France, Sigwart, von Kries and Lange of Germany, Kroman of Denmark, and several Russian scholars have written on the philosophical aspect. In the ordinary presentation of the elements of the theory of probability as found in most English text-books, the treatment is wholly mathematical. The student is given the definition of a mathematical probability and the elementary theorems are then proved. We shall, in the following chapter, depart from this rule and first view the subject, briefly, from a philosophical standpoint. What the student may thus lose in time we hope he may gain in obtaining a broader view of the fundamental principles underlying our science.
At the same time, the reader who is unacquainted with the science of philosophy or pure logic need not feel alarmed, since not even the most elementary knowledge of the principles of formal logic is required for the understanding of the following chapter.

2. Law of Causality. — In a great treatise on the Chinese civilization, Oscar Peschel, the German geographer and philosopher, makes the following remarks: "Since our intellectual awakening, since we have appeared on the arena of history as the creators and guardians of the treasures of culture, we have sought after only one thing, of the presence of which the Chinese had no idea, and for which they would give hardly a bowl of rice. This invisible thing we call causality. We have admired a vast number of Chinese inventions, but even if we seek through their huge treasures of philosophical writing we are not indebted to them for a single theory or a single glance into the relation between cause and effect." The law of causality may be stated broadly as follows: Everything that happens, and everything that exists, necessarily happens or exists as the consequence of a previous state of things. This law cannot be proven. It must be taken, a priori, as an axiom; but once accepted as a truth it does away with the belief in a capricious ruling power, and even if the strongest disbeliever in the law may deny its truth in theory he invariably applies it in practice during his daily occupation in life. All future human activity is more or less influenced by past and present conditions. Modern historical writings, as for instance the works of the brilliant Italian historian, Ferrero, always seek to connect past events with present social and economic conditions. Likewise great and constructive statesmen in trying to shape the destinies of nations always reckon with past and present events and conditions. We often hear the term, "a man with foresight," applied to leading financiers and statesmen.
This does not mean that such men are gifted with a vision of the future, but simply that they, with a detailed and thorough knowledge of past and present events, associated with the particular undertaking in which they are interested, have drawn conclusions in regard to a future state of affairs. For example, when the Canadian Pacific officials, in the early eighties, chose Vancouver as the western terminal for the transcontinental railroad, at a time when practically the whole site of the present metropolis of western Canada was only a vast timber tract, they realized that the conditions then prevailing on this particular spot — the excellent shipping facilities, the favorable location in regard to the Oriental trade, and the natural wealth of the surrounding country — would bring forth a great city, and their predictions came true. Predictions with regard to the future must be taken seriously only when they are based upon a thorough knowledge of past and present events and conditions. Prophecies, taken in a purely biblical sense of the term and viewed from the law of causality, are mere guesses which may come true and may not. A prophet can hardly be called more than a successful guesser. Whether there have been persons gifted with a purely prophetic vision is a question which must be left to the theologians to wrangle over.

3. Hypothetical Judgments. — Any person with ordinary intellectual faculties may, however, predict certain future events with absolute certainty by a simple application of the principle of hypothetical judgment. The typical form of the hypothetical judgment is as follows: If a certain condition exists, or if a certain event takes place, then another definite event will surely follow. Or if A exists B will invariably follow. Mathematical theorems are examples of hypothetical judgments. Thus in the geometry of the plane we start with certain ideas (axioms) about the line and plane.
From these axioms we then deduce the theorems by mere hypothetical judgments. Thus in the Euclidian geometry we find the axiom of parallel lines, which assumes that through a point only one line can be drawn parallel to another given line, and from this assumption we then deduce the theorem that the sum of the angles in a triangle is 180°. But it must be borne in mind that this proof is valid only on the assumption of the actual existence of such lines. If we could prove directly by logical reasoning or by actual measurement that the sum of the angles in any triangle is equal to 180°, then we would be able to prove the above theorem, the so-called "hole in geometry," independently of the axiom of parallel lines. A Russian mathematician, Lobatschewsky, on the other hand, assumed that through a single point an infinite number of parallels might be drawn to a previously given line, and from this assumption he built up a complete and valid geometry of his own. Still another mathematician, Riemann, assumed that no lines were parallel to each other, and from this produced a perfectly valid surface geometry of the sphere. As examples of hypothetical judgment we have the two following well-known theorems from elementary geometry and algebra. If one of the angles of a triangle is divided into two parts, then the line of division intersects the opposite side. If a decadian number is divided by 5 there is no remainder from the division. In natural science, hypothetical judgments are founded on certain occurrences (phenomena) which, without exception, have taken place in the same manner, as shown by repeated observations. The statement that a suspended body will fall when its support is removed is a hypothetical judgment derived from actual experience and observation.

4. Hypothetical Disjunctive Judgments. — In hypothetical judgments we are always able to associate cause and effect.
It happens frequently, however, that our knowledge of a certain complex of present conditions and actions is such that we are not able to tell beforehand the resulting consequences or effects of such conditions and actions, but are able to state only that either an event E1, or an event E2, etc., or an event En will happen. This represents a hypothetical disjunctive judgment whose typical form is: If A exists either E1, E2, E3, … or En will happen. If we take a die, i. e., a homogeneous cube whose faces are marked with the numbers from one to six, and make an ordinary throw, we are not able to tell beforehand which side will turn up. True, we have here again a previous state of things, but the conditions do not allow such a simple analysis as the cases we have hitherto considered under the purely hypothetical judgment. Here a multitude of causes influence the final result — the weight and centre of gravity of the die, the infinite number of possible movements of the hand which throws the die, the force of contact with which the die strikes the table, the friction, etc. All these causes are so complex that our minds are not afforded an opportunity to grasp and distinguish the impulses that determine the fall of the die. In other words we are not able to say, a priori, which face will appear. We only know for certain that either 1, 2, 3, 4, 5, or 6 will appear. If a line is drawn through the vertex of a triangle, it either intersects the opposite side or it does not. If a number is divided by 5 the division either gives only an integral number or leaves a remainder. If an opening is made in the wall of a vessel partly filled with water, then either the water escapes or remains in the vessel. All the above cases are examples of hypothetical disjunctive judgments. The four cases show, however, a common characteristic.
They all have a certain partial domain, where one of the mutually exclusive events is certain to happen, while the other partial domain will bring forth the other event, and the total area of action embraces both events. Taking the triangle, we notice that the lines may pass through all the points inside of an angle of 360°, but only the lines falling inside the internal vertical angle, φ, of the triangle will produce the event in question, namely the line intersecting the opposite side. There will be an outflow from the vessel only if the hole is made in that part of the wall which is touched by the fluid. All problems do not allow of such simple analysis, however, as will be seen from the following example. Suppose we have an urn containing 1 white and 2 black balls and let a person draw one from the urn. The hypothetical disjunctive judgment immediately tells us that the ball will be either black or white, but the particular domain of each event cannot be limited to the fixed border lines of the former examples. Any one of the balls may occupy an infinite number of positions, and furthermore we may imagine an infinite number of movements of the hand which draws the ball, each movement being associated with a particular point of position of the ball in the urn. If we now assume each of the three balls to have occupied all possible positions in the urn, each point of position being associated with its proper movement of the hand, it is readily seen that a black ball will be encountered twice as often as a white ball in a particular point of position in the urn, and for this reason any particular movement of the hand which leads to this point of position grasps a black ball twice as often as a white ball.

5. General Definition of the Probability of an Event.
— All the above examples have shown the following characteristics: (1) A total general region or area of action in which all actions may take place, this total area being associated with all possible events. (2) A limited special domain in which the associated actions produce a special event only. If these areas and domains, as in the above cases, are of such a nature that they allow a purely quantitative determination, they may be treated by mathematical analysis. We define now, without entering further into its particular logical significance, the ratio of the second special and limited domain to the first total region or area as the probability of the happening of the event, E, associated with domain No. 2. We must, however, hasten to remark that it is only in a comparatively few cases that we are able, a priori, to make such a segregation of domains of actions. This may be possible in purely abstract examples, as for instance in the example of the division of the decadian number by 5. But in all cases where organic life enters as a dominant factor we are unable to make such sharp distinctions. If we were asked to determine the probability of an x-year-old person being alive one year from now, we should be able to form the hypothetical disjunctive judgment: An x-year-old person will be either alive or dead one year from now. But a further segregation into special domains as was the case with the balls in the urn is not possible. Many extremely complex causes enter into such a determination; the health of the particular person, the surroundings, the daily life, the climate, the social conditions, etc. Our only recourse in such cases is to actual observation. By observing a large number of persons of the same age, x, we may, in a purely empirical way, determine the rate of death or survival. Such a determination of an unknown probability is called an empirical probability.
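The empirical determination just described can be illustrated with a short simulation (a modern sketch, not part of the original text; the urn of 1 white and 2 black balls from the preceding section serves as the experiment, and the number of trials is an arbitrary choice):

```python
import random

# Urn from the preceding section: 1 white and 2 black balls.
urn = ["white", "black", "black"]

trials = 100_000
white_draws = sum(random.choice(urn) == "white" for _ in range(trials))

# The empirical probability is nothing but the observed frequency ratio.
empirical_probability = white_draws / trials

# It settles near the a priori value 1/3, so that a black ball is drawn
# about twice as often as a white one, as argued above.
print(empirical_probability)
```

In this abstract case the a priori value 1/3 is known and the experiment merely confirms it; in the mortality example of the text no such a priori value exists, and the frequency ratio (deaths observed divided by persons observed) is all we have.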
An empirical probability is thus a probability, into the determination of which actual experience has entered as a dominant factor.

6. Equally Likely Cases. — The main difficulty, in the application of the above definition of probability, lies in the determination of the question whether all the events or cases taking place in the general area of action may be regarded as equally likely or not. Two diametrically opposite views have here been brought forward by writers on probabilities. One view is based upon the principle which in logic is known as the principle of "insufficient reason," while the other view is based upon the principle of "cogent reason." The classical writers on the theory of probability, such as Jacob Bernoulli and Laplace, base the theory on the principle of insufficient reason exclusively. Thus Bernoulli declares the six possible cases by the throw of a die to be equally likely, since "on account of the equal form of all the faces and on account of the homogeneous structure and equally arranged weight of the die, there is no reason to assume that any face should turn up in preference to any other." In one place Laplace says that the possible cases are "cases of which we are equally ignorant," and in another place, "we have no reason to believe any particular case should happen in preference to any other." The opposite view, based on the principle of cogent reason, has been strongly endorsed in an admirable little treatise by the German scholar, Johannes von Kries.¹ Von Kries requires, first of all, as the main essential in a logical theory of probability, that "the arrangement of the equally likely cases must have a cogent reason and not be subject to arbitrary conditions." In several illustrative examples, von Kries shows how the principle of insufficient reason may lead to different and paradoxical results. The following example will illustrate the main points in von Kries's criticism.
Suppose we are given the following problem: Determine the probability of the existence of human beings on the planet Mars. By applying the first mentioned principle our reasoning would be as follows: We have no more reason to assume the actual existence of man on the planet than the complete absence. Hence the probability for the non-existence of a human being is equal to ½. Next we ask for the probability of the presence or non-presence of another earthly mammal, say the elephant. The answer is the same, ½. Now the probability for the absence of both man and elephant on the planet is ½ × ½ = ¼.² The probability for the absence of a third mammal, the horse, is also ½, or the probability for the absence of man, elephant, and horse is equal to (½)³ = ⅛. Proceeding in the same manner for all mammals we obtain a very small probability for the complete absence of all mammals on Mars, or a very large probability, almost equal to certainty, that the planet harbors at least one mammal known on our planet, an answer which certainly does not seem plausible. But we might as well have put the question from the start: what is the probability of the existence or absence of any one earthly mammal on Mars? The principle of insufficient reason when applied directly would here give the answer ½, while when applied in an indirect manner the same method gave an answer very near to certainty. An urn is known to contain white and black balls, but the number of the balls of the two different colors is unknown. What is the probability of drawing a white ball? The principle of insufficient reason gives us readily the answer: ½, while the principle of cogent reason would give the same answer only if it were known a priori that there were equal numbers of balls of each color in the urn before the drawing took place.

¹ "Die Principien der Wahrscheinlichkeitsrechnung," Berlin, 1886.
² See the chapter on multiplication of probabilities.
Since this knowledge is not present a priori, we are not able to give any answer, and the problem is considered outside the domain of probabilities. There is no doubt that the principle advocated by von Kries is the only logical one to apply, and a recent treatise on the theory of probability by Professor Bruhns of Leipzig¹ also gives the principle of cogent reason the most prominent place. On the other hand it must be admitted that if the principle was to be followed consistently in its very extreme it would of course exclude many problems now found in treatises on probability and limit the application of our theory considerably in scope. Still, however, we must agree with von Kries that it seems very foolhardy to assign cases of which we are absolutely in the dark, as being equally likely to occur. This very principle of insufficient reason is in very high degree responsible for the somewhat absurd answers to questions on the so-called "inverse probabilities," a name which in itself is a great misnomer. We shall later in the chapter on "a posteriori" probabilities discuss this question in detail. At present we shall only warn the student not to judge cases of which he has no knowledge whatsoever to be equally likely to occur. The old rule "experience is the best teacher" holds here, as everywhere else.

¹ "Kollektivmasslehre und Wahrscheinlichkeitsrechnung," Leipzig, 1903.

7. Objective and Subjective Probabilities. — In this connection it is interesting to note the lucid remarks by the Danish statistician, Westergaard. "By every well arranged game of chance, by lotteries, dice, etc.," Westergaard says, "everything is arranged in such a way that the causes influencing each draw or throw remain constant as far as possible. The balls are of the same size, of the same wood, and have the same density; they are carefully mixed and each ball is thus apparently subject to the influences of the same causes.
However, this is not so. Despite all our efforts the balls are different. It is impossible that they are of exactly mathematically spherical form. Each ball has its special deviation from the mathematical sphere, its special size and weight. No ball is absolutely similar to any one of the others. It is also impossible that they may be situated in the same manner in the bag. In short there is a multitude of apparently insignificant differences which determine that a certain definite ball and none of the other balls may be drawn from the bag. If such inequalities did not exist one of two things would happen. Either all balls would turn up simultaneously or else they would all remain in the bag. Many of these numerous causes are so small that they perhaps are invisible to the naked eye and completely escape all calculations, but by mutual action they may nevertheless produce a visible result." It thus appears that a rigorous application of the principle of cogent reason seems impossible. However, a compromise between this principle and the principle of insufficient reason may be effected by the following definition of equally possible cases, viz.: Equally possible cases are such cases in which we, after an exhaustive analysis of the physical laws underlying the structure of the complex of causes influencing the special event, are led to assume that no particular case will occur in preference to any other. True, this definition introduces a certain subjective element and may therefore be criticized by those readers who wish to make the whole theory of probabilities purely objective. Yet it seems to me preferable to the strict application of the principle of equal distribution of ignorance. Take again the question of the probability of the existence of human beings on the planet Mars. The principle of equal distribution of ignorance readily gives us without further ado the answer ½.
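The arithmetic behind von Kries's Mars reductio in the preceding section is easily made explicit (a modern sketch, not part of the original text; the count of mammal species is a hypothetical figure chosen only for illustration):

```python
# "Insufficient reason" assigns probability 1/2 to the absence of each
# earthly mammal on Mars, and the absences are multiplied as if independent.
p_absent_each = 0.5
n_species = 50  # hypothetical number of mammal species considered

# Probability that ALL of them are absent: (1/2) ** n, vanishingly small.
p_all_absent = p_absent_each ** n_species

# Hence the probability that at least one mammal is present is nearly
# certainty, the implausible conclusion the text objects to.
p_at_least_one_present = 1 - p_all_absent

print(p_all_absent)
print(p_at_least_one_present)
```

The paradox lies entirely in the starting point: each factor of ½ is a pure guess, yet multiplying fifty such guesses manufactures near-certainty out of complete ignorance.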
Modern astrophysical researches have, however, verified physical conditions on the planet which make the presence of organic life quite possible, and according to such an eminent authority as Mr. Lowell, perhaps absolutely certain. Yet these physical investigations are as yet not sufficiently complete, and not in such a form that they may be subjected to a purely quantitative analysis as far as the theory of probabilities is concerned. Viewed from the standpoint of the principle of cogent reason any attempt to determine the numerical value of the above probability must therefore be put aside as futile. This result, negative as it is, seems, however, preferable to the absolute guess of ½ as the probability.

CHAPTER II.

HISTORICAL AND BIBLIOGRAPHICAL NOTES.

8. Pioneer Writers. — The first attempt to define the measure of a probability of a future event is credited to the Greek philosopher, Aristotle. Aristotle calls an event probable when the majority, or at least the majority of the most intellectual persons, deem it likely to happen. This definition, although not allowing a purely quantitative measurement, makes use of a subjective judgment. The first really mathematical treatment of chance, however, is given by the two Italian mathematicians, Cardano and Galileo, who both solved several problems relating to the game of dice. Cardano, aside from his mathematical occupation, was also a professional gambler and had evidently noticed that in all kinds of gambling houses cheating was often resorted to. In order that the gamester might be fortified against such cheating practices, Cardano wrote a little treatise on gambling wherein he discussed several mathematical questions connected with the different games of dice as played in the Italian gambling houses at that time.
Galileo, although not a professional gambler, was often consulted by a certain Italian nobleman on several problems relating to the game of dice, and fortunately the great scholar has left some of his investigations in a short memoir. In the same manner the two great French mathematicians, Pascal and Fermat, were often asked by a professional gamester, the chevalier de Méré, to apply their mathematical skill to the solution of different gambling problems. It was this kind of investigation which probably led Pascal to the discovery of the arithmetical triangle, and the first rudiments of the combinatorial analysis, which had its origin in probability problems, and which later evolved into an independent branch of mathematical analysis. One of the earliest works from the illustrious Dutch physicist, Huyghens, is a small pamphlet entitled "de Ratiociniis in Ludo Aleae," printed in Leyden in the year 1657. Huyghens' tract is the first attempt at a systematic treatment of the subject. The famous Leibnitz also wrote on chance. His first reference to a mathematical probability is perhaps in a letter to the philosopher, Wolff, wherein he discusses the summation of the infinite series 1 − 1 + 1 − 1 + ···. Besides, he solved several problems.

9. Bernoulli, de Moivre and Bayes. — The first extensive treatise on the theory as a whole is from the hand of the famous Jacob Bernoulli. Bernoulli's book, "Ars Conjectandi," marks a revolution in the whole theory of chance. The author treats the subject from the mathematical as well as from a philosophical point of view, and shows the manifold applications of the new science to practical problems. Among other important theorems we here find the famous proposition which has become known as the Bernoulli Theorem in the mathematical theory of probabilities.
Bernoulli's work has recently been translated from the Latin into German,¹ and a student who is interested in the whole theory of probability should not fail to read this masterly work. The English mathematicians were the next to carry on the investigations. Abraham de Moivre, a French Huguenot, and one of the most remarkable mathematicians of his time, wrote the first English treatise on probabilities.² This book was certainly a worthy product of the masterful mind of its author, and may, even today, be read with useful results, although the method of demonstration often appears lengthy to the student who is accustomed to the powerful tools of modern analysis. The high esteem in which the work by de Moivre is held by modern writers is proven by the fact that E. Czuber, the eminent Austrian mathematician and actuary, as recently as two years ago translated the book into German. A certain problem (see Chap. IV) still goes under the name of "The Problem of de Moivre" in the modern literature on probability. A contemporary of de Moivre, Stirling, also contributed to the new branch of mathematics, and his name too is immortalized in the theory of probability by the formula which bears his name, and by which we are able to express large factorials to a very accurate degree of approximation.

¹ Ars Conjectandi, Ostwald's Klassiker No. 108, Leipzig, 1901.
² de Moivre: "The Doctrine of Chances," London, 1781.

The third important English contributor is the Oxford clergyman, T. Bayes. Bayes' treatise, which was published after his death by Price, in Philosophical Transactions for 1764, deals with the determination of the a posteriori probabilities, and marks a very important stepping stone in our whole theory. Unfortunately the rule known as Bayes' Rule has been applied very carelessly, and that mostly by some of Bayes' own countrymen; so the whole theory of Bayes has been repudiated by certain modern writers.
A recent contribution by the Danish philosophical writer, Dr. Kroman, seems, however, to have cleared up all doubts on the subject, and to have given Bayes his proper credit.

10. Application to Statistical Data. — In the eighteenth century some of the most celebrated mathematicians investigated problems in the theory of probability. The birth of life assurance gave the whole theory an important application to social problems, and the increasing desire for the collection of all kinds of statistical data by governmental bodies all over Europe gave the mathematicians some highly interesting material to which to apply their theories. No wonder, therefore, that we in this period find the names of some of the most illustrious mathematicians of that time, such as Daniel Bernoulli, Euler, Nicolas and John Bernoulli, Simpson, D'Alembert and Buffon, closely connected with the solution of problems in the theory of mathematical probabilities. We shall not attempt to give an account of the different works of these scientists, but shall only dwell briefly on the labors of Bernoulli and D'Alembert. In a memoir in the St. Petersburg Academy, Daniel Bernoulli is the first to discuss the so-called St. Petersburg Problem, one of the most hotly debated in the whole realm of our science. We may here mention that this problem is today one of the main pillars in the economic treatment of value. Bernoulli introduced in the discussion of the above mentioned problem the idea of the "moral expectation," which under slightly different names appears in nearly all standard writings on economics. D'Alembert is especially remembered for the critical attitude he took towards the whole theory. Although one of the most brilliant thinkers of his age, the versatile Frenchman made some great blunders in his attempt to criticize the theories of chance.
Buffon's name is remembered because of the needle problem, and he may properly be called the father of the so-called "geometrical" or "local" probabilities.

11. Laplace and Modern Writers. — We now come to that resplendent genius in the investigation of the mathematical theory of chance, the immortal Laplace, who in his great work, "Théorie Analytique des Probabilités," gave the final mathematical treatment of the subject. This massive volume leaves nothing to be desired and is still today — more than one hundred years after its first publication — a most valuable mine of information and compares favorably with much more modern treatises. But like all mines, it requires to be mined and is by no means easy reading for a beginner. An elementary extract, "Essai Philosophique des Probabilités," containing the more elementary parts of Laplace's greater work and stripped of all mathematical formulas, has recently appeared in an English translation.

Among later French works, Cournot's "Exposition de la Théorie des Chances et des Probabilités" (1843) treated the principal questions in the application of the theory to practical problems in sociology. In 1837 Poisson published his "Recherches sur les Probabilités," in which he for the first time proved the famous theorem which bears his name. Poisson and his Belgian contemporary, Quetelet, made extensive use of the theory in the treatment of statistical data. Among the most recent French works, we mention especially Bertrand's "Calcul des Probabilités" (Paris, 1888), Poincaré's "Calcul des Probabilités" (Paris, 1896), and Borel's "Calcul des Probabilités" (Paris, 1901). We especially recommend Poincaré's brilliant little treatise to every student who masters the French language, as this book makes no departure from the lively and elucidating manner in which this able mathematical writer treated the numerous subjects on which he wrote during his long and brilliant career as a mathematician.
Of Russian writers, the mathematician, Tchebycheff, has given some extensive general theorems relating to the law of large numbers. Unfortunately Tchebycheff's writings are for the most part scattered in French, German, Scandinavian and Russian journals, and thus are not easily accessible to the ordinary reader. A Russian artillery officer, Sabudski, has recently published a treatise on ballistics in German, wherein he extends the views formulated by Tchebycheff.

Of Scandinavian writers we mention T. N. Thiele, who probably was the first to publish a systematic treatise on skew curves.¹ An abridged edition of this very original work has recently been translated into English.² The Dane, Westergaard, is the author of the most extensive and thorough treatise on vital statistics which we possess at the present time. Westergaard's work has recently been translated into German,³ and is strongly recommended to the student of vital statistics on account of his clear and attractive style of presenting this important subject. The Swedish mathematicians Charlier and Gyldén have published a series of memoirs in different Scandinavian journals and scientific transactions. We may also, in this category, mention the numerous small articles by the eminent Danish actuary, Dr. Gram.

While the German mathematicians in general are the most fertile writers on almost every branch of pure and applied mathematics, they have not shown much activity in the theory of mathematical probability except in the past ten years. But during that time there have appeared at least a dozen standard works in German. Among these, the lucid and terse treatise by E. Czuber, the Austrian actuary and mathematician, is especially attractive to the beginner on account of the systematic treatment of the whole subject.⁴ A very original treatment is offered by H. Bruhns in his "Kollektivmasslehre und Wahrscheinlichkeitsrechnung" (Leipzig, 1903).

¹ "Almindelig Iagttagelseslære," Copenhagen, 1884.
² "Theory of Observations," London, 1903.
³ "Mortalität und Morbidität," Jena, 1902.
⁴ E. Czuber, "Wahrscheinlichkeitsrechnung," Leipzig, 1908 and 1910, 2 volumes.
Among the German works, we may also mention the book by Dr. Norman Herz in "Sammlung Schubert," and an excellent little work by Hack in the small pocket edition of "Sammlung Göschen." The theory of skew curves and correlation is presented by Lipps and Bruhns in extensive treatises.

We finally come to modern English writers on the subject. After the appearance of de Moivre's "Doctrine of Chances" the first work of importance was the book by de Morgan, "An Essay on the Theory of Probabilities." The latest text-book is Whitworth's "Choice and Chance" (Oxford Press, 1904); but none of these works, although very excellent in their manner of treatment of the subject, comes up to the French, Scandinavian, and German text-books. Nevertheless, some of the most important contributions to the whole theory have been made by the English statisticians and mathematicians, Crofton, Pearson, and Edgeworth. Especially have the frequency curves and correlation methods introduced by Professor Karl Pearson been very extensively used in direct applications to statistical and biological problems. Of purely statistical writers, we may mention G. Udny Yule, who has published a short treatise entitled "Theory of Statistics" (London, 1911). Numerous excellent memoirs have also appeared in the different English and American mathematical journals and statistical periodicals, especially in the quarterly publication, Biometrika, edited by Professor Karl Pearson.

In the above brief sketch, we have only mentioned the most important contributors to the theory of probabilities proper. Numerous able writers have written on the related subjects of least squares, the mathematical theory of statistics, and insurance mathematics.
We shall not discuss the works of these investigators at the present stage. Each of the most important works in the above mentioned branches will receive a short review in the corresponding chapters on statistics and assurance mathematics. The readers interested in the historical development of the theory of probabilities are advised to consult the special treatises on this subject by Todhunter and Czuber.¹

¹ After this chapter had gone to press I notice that a treatise by the eminent English scholar, Mr. Keynes, is being prepared by The Macmillan Co. In this connection I wish also to call attention to the recent publication by Bachelier (Calcul des probabilités, 1912), a work planned on a broad and extensive scale. — A. F.

CHAPTER III.

THE MATHEMATICAL THEORY OF PROBABILITIES.

12. Definition of Mathematical Probability. — "If our positive knowledge of the effect of a complex of causes is such that we may assume, a priori, t cases as being equally likely to occur, but of which only f, (f ≤ t), cases are favorable in causing the event, E, in which we are interested, then we define the proper fraction f/t = p as the mathematical probability of the happening of the event, E" (Czuber). We might also have defined an a priori probability as the ratio of the equally favorable cases to the co-ordinated possible cases. As is readily seen, this definition assumes a certain a priori knowledge of the possible and favorable conditions of the event in question, and the probability thus defined is therefore called "a priori probability." Denoting the event by the symbol, E, we express the probability of its occurrence by the symbol P(E), and the probability of its non-occurrence by P(Ē). Thus if t is the total number of equally possible cases and f the number of favorable cases for the event, we have:

P(E) = f/t = p,

and

P(Ē) = (t − f)/t = 1 − f/t = 1 − p = 1 − P(E).
This relation evidently gives us:

P(E) + P(Ē) = 1,

which is the symbolic expression for the hypothetical disjunctive judgment that the event E will either happen or not happen. If f = t, we have:

P(E) = t/t = 1,

which is the symbol for the hypothetical judgment that if A exists, E will surely happen. Similarly, if f = 0, we get

P(E) = 0/t = 0,

or the symbol for the hypothetical judgment: if A exists, E will not happen, or, what is the same, Ē will happen.

As we have already mentioned, in an a priori determination of a probability special stress must be laid upon the requirement that all possible cases must be equally likely to occur. The enumeration of these cases is by no means so easy as may appear at first sight. Even in the most simple problems, where there can apparently be no doubt about the possible cases being equally likely to occur, it is very easy to make a mistake, and some of the most eminent mathematicians and most acute thinkers have drawn erroneous conclusions in this respect. We shall give a few examples of such errors from the literature on the subject of the theory of probabilities, not on account of their historical interest alone, but also for the benefit of the novice who naturally is exposed to such errors.

13. Example 1. — An Italian nobleman, a professional gambler and an amateur mathematician, had, by continued observation of a game with three dice, noticed that the sum of 10 appeared more often than the sum of 9. He expressed his surprise at this to Galileo and asked for an explanation. The nobleman regarded the following combinations as favorable for the throw of 9:

1 2 6,  1 3 5,  1 4 4,  2 2 5,  2 3 4,  3 3 3,

and for the throw of 10 the six combinations of:

1 3 6,  1 4 5,  2 2 6,  2 3 5,  2 4 4,  3 3 4.

Galileo shows in a treatise entitled "Considerazione sopra il giuoco dei dadi" that these combinations cannot be regarded as being equally likely.
By painting each of the three dice a different color it is easy to see that an arrangement such as 1 2 6 can be produced in 6 different ways. Let the colors be white, black and red respectively. We may then make the following arrangements:

White  Black  Red
  1      2     6
  1      6     2
  2      1     6
  2      6     1
  6      1     2
  6      2     1

which gives 3! = 6 different arrangements. The arrangements of 1 4 4 can be made as follows:

White  Black  Red
  1      4     4
  4      1     4
  4      4     1

which gives 3 different arrangements. The arrangement of 3 3 3 can be made in one way only. By complete enumeration of equally favorable cases we obtain the following scheme:

Sum 9      Cases  |  Sum 10     Cases
1, 2, 6      6    |  1, 3, 6      6
1, 3, 5      6    |  1, 4, 5      6
1, 4, 4      3    |  2, 2, 6      3
2, 2, 5      3    |  2, 3, 5      6
2, 3, 4      6    |  2, 4, 4      3
3, 3, 3      1    |  3, 3, 4      3
Total       25    |  Total       27

The total number of equally possible cases by the different arrangements of the 18 faces on the dice is 6³ = 216. The probability of throwing 9 with three dice is therefore 25/216; of throwing 10 it is 27/216 = 1/8.

14. Example 2. — D'Alembert, the great French mathematician and natural philosopher and one of the ablest thinkers of his time, assigned ⅔ as the probability of throwing head at least once in two successive throws with a homogeneous coin. D'Alembert reasons as follows: If head appears first the game is finished and a second throw is not necessary. He therefore gives as equally possible cases (we denote head by H and tail by T): H, TH, TT, and determines thus the probability as ⅔. Where then is the error of D'Alembert? At first glance the chain of reasoning seems perfect. There are altogether three possible cases, of which two are in favor of the event. But are the three cases equally likely? To throw head in a single throw is evidently not the same as to throw head in two successive throws. D'Alembert has left out of consideration the fact that a double throw is allowed. The following analysis shows all the equally possible cases which may occur: HH, HT, TH, TT.
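Galileo's complete enumeration can be checked by brute force over all ordered throws of three colored dice; a minimal sketch in Python (the variable names are ours, not the book's):

```python
from itertools import product

# Enumerate all 6^3 = 216 equally likely ordered throws of three dice.
throws = list(product(range(1, 7), repeat=3))
count_9 = sum(1 for t in throws if sum(t) == 9)
count_10 = sum(1 for t in throws if sum(t) == 10)

# Galileo's tallies: 25 ways to throw a sum of 9, 27 ways for a sum of 10.
print(len(throws), count_9, count_10)  # 216 25 27
```

Treating the dice as distinguishable (ordered triples) is exactly the point of the colored-dice argument: it makes all 216 cases equally likely.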
Three of those cases favor the event. Hence we have:

P(E) = p = ¾.

We shall return to this problem at a later stage under the discussion of the law of large numbers.

The examples quoted have already shown that the enumeration of the equally likely cases requires a sharp distinction between the different combinations and arrangements of elements. In other words, the solution of the problems requires a knowledge of permutations and combinations. We assume here that the reader is already acquainted with the elements and formulas of the combinatorial analysis and shall therefore proceed with some more illustrations. In the following, when employing the binomial coefficients, we shall write C(n, k) for the number of combinations of n elements taken k at a time, instead of ⁿCₖ.

15. Example 3. — An urn contains a white and b black balls. A person draws k balls. What is the probability of drawing α white and β black balls? (α + β = k, α ≤ a, β ≤ b.)

The k balls may be drawn from the urn in as many ways as it is possible to select k elements from a + b elements, which may be done in

t = C(a + b, k) = C(a + b, α + β)

ways. Furthermore there are C(a, α) groups of α white and C(b, β) groups of β black balls. Since each combination of any one group of the first groups with any one group of the second groups is favorable for the event, we have as favorable cases:

f = C(a, α) × C(b, β),

and hence:

p = C(a, α) C(b, β) / C(a + b, α + β).

Example 4. — A special case of the above problem is the following question, which often appears in the well known game of whist. What are the respective chances that 0, 1, 2, 3, 4 aces are held by a specified player? There are altogether 52 cards in the game, equally distributed among 4 players. Of these cards 4 are aces and 48 are non-aces. Hence we have the following values for a, b, k, α and β:

a = 4, b = 48, k = 13, α = 0, 1, 2, 3, 4, β = 13, 12, 11, 10, 9.
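The formula of Example 3 can be evaluated exactly with Python's `math.comb` and `fractions.Fraction`; a minimal sketch for the whist case of Example 4 (the name `hypergeom` is ours, not the book's):

```python
from math import comb
from fractions import Fraction

def hypergeom(a, b, alpha, beta):
    # P(alpha white and beta black when drawing alpha+beta balls from a+b)
    return Fraction(comb(a, alpha) * comb(b, beta), comb(a + b, alpha + beta))

# Example 4: aces held by one whist player (a = 4 aces, b = 48 other cards,
# a hand of 13 cards).
probs = [hypergeom(4, 48, alpha, 13 - alpha) for alpha in range(5)]

assert probs[0] == Fraction(82251, 270725)  # no aces, the value in the text
assert sum(probs) == 1  # some number of aces from 0 to 4 is certain
```

The final assertion is exactly the "hypothetical disjunctive judgment" check the text describes: the five probabilities must add up to unity.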
Substituting in the above formula we get:

p₀ = C(4, 0) × C(48, 13) / C(52, 13) = 82251/270725,
p₁ = C(4, 1) × C(48, 12) / C(52, 13) = 118807/270725,
p₂ = C(4, 2) × C(48, 11) / C(52, 13) = 57798/270725,
p₃ = C(4, 3) × C(48, 10) / C(52, 13) = 11154/270725,
p₄ = C(4, 4) × C(48, 9) / C(52, 13) = 715/270725.

A hypothetical disjunctive judgment immediately tells us that in a game of whist a specified player must hold either 0, 1, 2, 3 or 4 aces. Any such judgment is certain to come true. Hence by adding the 5 above computed probabilities we obtain a check on the accuracy of our calculations. The actual addition of the numerical values of p₀, p₁, p₂, p₃ and p₄ gives us unity, which is the mathematical symbol for certainty.

Gauss, the renowned German mathematician and astronomer, was an eager whist player. During his forty-eight years of residence in the university town of Göttingen almost every evening he played a rubber of whist with some friends among the university professors. He kept a careful record of the distribution of the aces in each game. After his death these records were found among his papers, headed "Aces in Whist." The actual records agree with the results computed above.

16. Example 5. — An urn contains n similar balls. A part or all of the balls are drawn. What is the probability of drawing an even number of balls?

One ball may be drawn in as many ways as there are balls, two balls in as many ways as we may select two elements out of n elements, and so on. Hence we have for the total number of equally possible cases:

t = C(n, 1) + C(n, 2) + ··· + C(n, n).

We have now:

(1 + 1)ⁿ = 1 + C(n, 1) + C(n, 2) + ··· + C(n, n),

and

(1 − 1)ⁿ = 1 − C(n, 1) + C(n, 2) − ··· + (−1)ⁿ C(n, n).

The number of favorable cases is given by the expansion:

f = C(n, 2) + C(n, 4) + ···

The expression for t is the sum of the binomial coefficients less unity. Hence we have:

t = (1 + 1)ⁿ − 1 = 2ⁿ − 1.

If we add the two expansions of (1 + 1)ⁿ and (1 − 1)ⁿ and then subtract 2 we get the expansion for 2f. Hence we have:

2f = (1 + 1)ⁿ + (1 − 1)ⁿ − 2, ∴ f = 2ⁿ⁻¹ − 1.
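The counts t = 2ⁿ − 1 and f = 2ⁿ⁻¹ − 1 of Example 5 can be confirmed by enumerating every possible draw for small n; a minimal sketch (function name ours):

```python
from itertools import combinations

def even_draw_counts(n):
    # t = number of nonempty draws from n distinct balls,
    # f = number of draws containing an even number of balls.
    subsets = [s for k in range(1, n + 1) for s in combinations(range(n), k)]
    t = len(subsets)
    f = sum(1 for s in subsets if len(s) % 2 == 0)
    return f, t

for n in range(1, 9):
    f, t = even_draw_counts(n)
    assert t == 2**n - 1
    assert f == 2**(n - 1) - 1
```

For n = 1 the enumeration gives f = 0, t = 1, reproducing the p = 0, q = 1 case discussed in the text.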
Thus we shall have as the probability of drawing an even number of balls:

p = (2ⁿ⁻¹ − 1)/(2ⁿ − 1),

while for an uneven number:

q = 1 − p = 2ⁿ⁻¹/(2ⁿ − 1).

We notice that the probability of drawing an uneven number of balls is larger than the probability of drawing an even number. This apparently strange result is easily explained without the aid of algebra from the fact that when the urn contains one ball only, we cannot draw an even number. Hence we have p = 0, q = 1. With two balls we may draw an uneven number in two ways and an even number in one way, thus p = ⅓ and q = ⅔. The greater weight of q remains as long as n is finite; only when n = ∞ is p = q = ½.

17. Example 6. — A box contains n balls marked 1, 2, 3, ··· n. A person draws the n balls in succession, and no ball thus drawn is put back in the urn. Each drawing is consecutively marked 1, 2, 3, ··· n on n cards. What is the probability that no ball marked a (a = 1, 2, 3, ··· n) appears simultaneously with a drawing card marked a?

The number of equally possible cases is simply the number of permutations of n elements, which is equal to n! The number of favorable cases is given by the total number of derangements or relative permutations of n elements, i.e., such permutations wherein the numbers from 1 to n do not appear in their natural places. The formula for such relative permutations was first given by Euler in a memoir of the St. Petersburg Academy entitled "Quaestio Curiosa ex Doctrina Combinationis." Euler makes use of a recursion formula. A German mathematician, Lampe, has, however, derived the formula in a simpler manner in "Grunert's Archiv" for 18..4.

Lampe denotes by the symbol φ(1) the number of permutations wherein 1 does not appear in its natural place. By letting 1 remain fixed in the first place we obtain (n − 1)! permutations of the other remaining elements, or:
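The derangement counts of Example 6 can be checked by direct enumeration for small n and compared against the well-known closed form n! · Σₖ (−1)ᵏ/k! toward which Euler's and Lampe's derivations lead; a minimal sketch (names ours):

```python
from itertools import permutations
from math import factorial

def derangements(n):
    # Count permutations of 1..n in which no number sits in its natural place.
    return sum(
        1
        for perm in permutations(range(1, n + 1))
        if all(perm[i] != i + 1 for i in range(n))
    )

for n in range(1, 8):
    # Closed form: n! * sum_{k=0}^{n} (-1)^k / k!  (rounded to the nearest integer)
    formula = round(factorial(n) * sum((-1) ** k / factorial(k) for k in range(n + 1)))
    assert derangements(n) == formula
```

Dividing by n! gives the probability asked for in Example 6; it tends rapidly to 1/e as n grows.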
It is immaterial if the n subsidiary events have happened simultaneously or in succession. But it makes a difference if the events Ei, E2, E3, • • ■ En are independent, or dependent on each other. 1. Independent Emntt. — The probability, P{E) = p, for the simultaneous or consecutive appearance of several mutually ex- clusive events: 7?i, E2, • • • -E„ is equal to the product: pi-p^-ps- •• • Pn of the individual probabilities of the n events. Proof: Let the number of possible cases entering into the complex that brings forth the event E be t. Each of the ti 21] THEOREM OF THE COMPOUND PROBABILITY. 29 possible cases corresponding to the event Ei may occur simul- taneously with each one of the U cases corresponding to the event E2. Thus we have altogether h X <2 cases falling on Ei and E2 at the same time. Continuing in the same way of reasoning it is readily seen that the total number of equally possible cases resulting from the simultaneous occurrence of the events Ei, E2, E3, •■■■En is equal to = pi X 2J2 X P3 X ■■■ Pn. But p2 means here the probability for the happening of E2 after the actual occurrence of Ei, pz the probability for the happening of Ez after Ei and E2 have pieviously happened, and so on for all n events. Example 9. — A card is drawn from a whist deck and replaced by a joker, and then a second card is drawn. What is the prob- abilitv that both cards are aces? 30 THE ADDITION AND MULTIPLICATION THEOREMS. [22 Denoting the two subsidiary events by Ei and E2 we have: 4 3 3 3 P{E) = P{E0PiE2) = 52 52 13 X 52 676 ' The two above theorems are known as the multiplication theorems in probabilities. Reuschle has also suggested the name " the as well as probability." 22. Poincare's Proof of the Addition and Multiplication Theorem. — The French mathematician and physicist, H. Poincare, has derived the above theorems in a new and elegant manner in his excellent little treatise: " Lecons sur le Calcul des Probabilites," Paris, 1896. 
Poincaré's proof is briefly as follows: Let E₁ and E₂ be two arbitrary events.

E₁ and E₂ may happen in α different ways.
E₁ may happen but not E₂ in β different ways.
E₂ may happen but not E₁ in γ different ways.
Neither E₁ nor E₂ will happen in δ different ways.

We assume the total α + β + γ + δ cases to be equally likely to occur. The probability for the occurrence of E₁ is

p₁ = (α + β)/(α + β + γ + δ).

The probability for the occurrence of E₂ is

p₂ = (α + γ)/(α + β + γ + δ).

The probability for the occurrence of at least one of the events E₁ and E₂ is

p₃ = (α + β + γ)/(α + β + γ + δ).

The probability for the occurrence of both E₁ and E₂ is

p₄ = α/(α + β + γ + δ).

The probability for the occurrence of E₁ when E₂ has already occurred is

p₅ = α/(α + γ).

The probability for the occurrence of E₂ when E₁ has already occurred is

p₆ = α/(α + β).

The probability for the occurrence of E₁ when E₂ has not occurred is

p₇ = β/(β + δ).

The probability for the occurrence of E₂ when E₁ has not occurred is

p₈ = γ/(γ + δ).

We have now the following identical relations:

p₁ + p₂ = p₃ + p₄,  p₃ = p₁ + p₂ − p₄,

i.e., the probability that of two arbitrary events at least one will happen is equal to the probability that the first will happen, plus the probability that the second will happen, less the probability that both will happen. The particular problem which we happen to investigate may be of such a nature that the two events E₁ and E₂ cannot happen at the same time; in that case p₄ = 0, and we get:

p₃ = p₁ + p₂.

In this equation we immediately recognize the addition theorem for two mutually exclusive events. By substitution of the proper values we have furthermore:

p₄ = p₂ · p₅ or p₄ = p₁ · p₆.

These equations contain the theorems proved under § 21 of the probability for two mutually dependent events.

23. Relative Probabilities. — We shall now finally give an alternative demonstration of the same two theorems.
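Poincaré's identities hold for any nonnegative case counts α, β, γ, δ; a quick exact check with `fractions.Fraction` (the particular counts below are arbitrary, chosen only for illustration):

```python
from fractions import Fraction

alpha, beta, gamma, delta = 3, 5, 7, 11   # arbitrary case counts
total = alpha + beta + gamma + delta

p1 = Fraction(alpha + beta, total)          # E1 happens
p2 = Fraction(alpha + gamma, total)         # E2 happens
p3 = Fraction(alpha + beta + gamma, total)  # at least one happens
p4 = Fraction(alpha, total)                 # both happen
p5 = Fraction(alpha, alpha + gamma)         # E1, given E2 has occurred
p6 = Fraction(alpha, alpha + beta)          # E2, given E1 has occurred

assert p1 + p2 == p3 + p4       # hence p3 = p1 + p2 - p4
assert p4 == p2 * p5 == p1 * p6 # the multiplication theorem
```

Since the identities are algebraic in α, β, γ, δ, any other choice of counts would verify them equally well.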
It will, of course, be of benefit to the student to see the subject from as many view points as possible; moreover, the following remarks will contain some very useful hints for the solution of more complicated problems by the application of so-called "relative probabilities" and a few elementary theorems from the calculus of logic. The following paragraphs are mainly based upon a treatise in the Proceedings of the Royal Academy of Saxony by the German mathematician and actuary, F. Hausdorff.

In our fundamental definition of a mathematical probability for the happening of an event E, expressed in symbols by P(E), as the ratio of the equally favorable and equally possible cases resulting from a general complex of causes, we were able to compute the so-called ordinary or absolute probabilities. But if we, from among the favorable cases and possible cases, select only such as bring forward a certain different event, say F, then we obtain the "relative probability" for the happening of E under the assumption that the subsidiary event, F, has occurred previously. For this relative probability we shall employ the symbol P_F(E), which reads "the relative probability of E, posito F."

The following problem illustrates the meaning of relative probabilities. If an honor card is drawn from an ordinary deck of cards, what is the probability that it is a king? Denoting the subsidiary event of drawing an honor card by F, and the main event of drawing a king by E, we may write the above mentioned probability in the symbolic form P_F(E). If on the other hand we knew a priori that a king was drawn, we may also ask for the probability of having drawn an honor card. Since any king is also an honor card, we may write in symbols: P_E(F) = 1.

Before entering upon the immediate determination of relative probabilities we shall first define a few symbols from the calculus of logic.
We denote first of all the occurrence of an event E by E, the non-occurrence of the same event by Ē. Similarly we have for the occurrence and non-occurrence of other events F, G, H, ··· and F̄, Ḡ, H̄, ···. E + F means that at least one of the two events E and F will happen. E × F, or simply E·F, means the occurrence of both E and F. From the above definitions it follows immediately that the non-occurrence of E + F is expressed by Ē·F̄, and that

E = E·F + E·F̄.

This last relation simply states that E will happen when either E and F happen simultaneously or when E and the non-appearance of F happen at the same time. If, furthermore, F₁, F̄₁, F₂, F̄₂, ··· Fₙ, F̄ₙ constitute the members of a complete disjunction, i.e., mutually exclusive events, we have in general:

E = E·F₁ + E·F̄₁ + E·F₂ + E·F̄₂ + ··· + E·Fₙ + E·F̄ₙ.

From the original definition of a probability it now follows:

P(E) = P(E·F) + P(E·F̄),

and

P(E) = P(E·F₁) + P(E·F̄₁) + P(E·F₂) + P(E·F̄₂) + ··· + P(E·Fₙ) + P(E·F̄ₙ),

i.e., the probability that of several mutually exclusive events one at least will happen is the sum of the probabilities of the happening of the separate events. This is the symbolic form of the addition theorem.

24. Multiplication Theorem. — We next take two arbitrary events. From these events we may form the following combinations: E·F, E·F̄, Ē·F, Ē·F̄, i.e.:

Both E and F happen,
E happens but not F,
F happens but not E,
Neither E nor F happens.

Furthermore let α, β, γ, δ be the respective numbers of the favorable cases for the above four combinations of the events E and F. Following the previous method of Poincaré, we shall have:

P(E) = (α + β)/(α + β + γ + δ),  P(F) = (α + γ)/(α + β + γ + δ),

P(E·F) = α/(α + β + γ + δ),

and

P_E(F) = α/(α + β),  P_F(E) = α/(α + γ).

25. Probability of Repetitions. — From the above equations it immediately follows:

P(E·F) = P(E) × P_E(F) = P(F) × P_F(E),

which is the symbolic form of the multiplication theorem of compound probabilities.
In special cases it may happen that the different subsidiary events $E_1, E_2, E_3, \cdots, E_n$ are all similar. We shall then have, following the symbolic method:

$$E = E_1\cdot E_2\cdot E_3 \cdots E_n = E_1\cdot E_1\cdot E_1 \cdots E_1 = E_1^n,$$

and $P(E) = P(E_1^n) = P(E_1)^n$. This gives us the following theorem: The probability of the repetition $n$ times of a certain event, $E$, is equal to the $n$th power of its absolute probability. Thus if $P(E) = p$ we have immediately $P(\bar{E}) = 1 - p$,

$$P(E^n) = P(E)^n = p^n, \qquad P(\bar{E}^n) = P(\bar{E})^n = (1-p)^n.$$

Thus the probability of the occurrence of $E$ at least once in $n$ trials is

$$P(E + E + \cdots \; n \text{ times}) = 1 - P(\bar{E}^n) = 1 - (1-p)^n.$$

Denoting the numerical value of this probability by $Q$ we have: $1 - Q = (1-p)^n$. Solving this equation for $n$ we shall have:

$$n = \frac{\log(1-Q)}{\log(1-p)}.$$

Whenever $n$ equals, or is greater than, the above logarithmic value for given values of $Q$ and $p$, the probability of at least one occurrence equals or exceeds the previously given proper fraction $Q$. To illustrate:

Example 10. — How often must a die be thrown so that the probability that a six appears at least once is greater than $\frac12$? Here $p = \frac16$, $Q = \frac12$. Hence we must select for $n$ the smallest positive integer satisfying the relation:

$$n > \frac{\log(1-\frac12)}{\log(1-\frac16)} = \frac{\log 2}{\log\frac65} = \frac{.301030}{.079181} = 3.80,$$

i.e., $n = 4$. For this particular value of $n$ we have in reality $Q = 1 - \left(\frac56\right)^4 = .518$.

26. Application of the Addition and Multiplication Theorems in Problems in Probabilities. — We shall next proceed to illustrate the theorems of the preceding paragraphs by a few examples. First, we shall apply the demonstrated theorems to some of the examples we have already solved by a direct application of the fundamental definition of a mathematical probability.

Example 11. — We take first of all our old friend, the problem of D'Alembert. What is the probability of throwing head at least once in two successive throws with a uniform coin? This problem is most easily solved by finding first the probability of not getting head in two successive throws.
By the multiplication theorem this probability is $p = \frac12 \times \frac12 = \frac14$. Then the probability of getting head at least once is $1 - \frac14 = \frac34$, from a simple application of the rule in § 25.

A more lengthy analysis is as follows. Denoting the event by $E$, the following cases may bring forth the desired event: head in the first throw, which we shall denote by $H_1$, and head in the second throw, which we denote by $H_2$; or head in the first throw ($H_1$) and tail in the second ($T_2$); or, finally, tail in the first ($T_1$) and head in the second ($H_2$). Then we have:

$$E = H_1\cdot H_2 + H_1\cdot T_2 + T_1\cdot H_2,$$

or:

$$P(E) = P(H_1)P(H_2) + P(H_1)P(T_2) + P(T_1)P(H_2) = \tfrac12\cdot\tfrac12 + \tfrac12\cdot\tfrac12 + \tfrac12\cdot\tfrac12 = \tfrac34.$$

27. Example 12. — What is the probability of throwing at least twelve in a single throw with three dice? The expected event occurs when either 12, 13, 14, … or 18 is thrown. Of these events only one may happen at a time. We may, therefore, apply the addition theorem and obtain as the total probability:

$$p = p_{12} + p_{13} + p_{14} + \cdots + p_{18},$$

where $p_{12}, p_{13}, \cdots, p_{18}$ are the respective probabilities of throwing the sums 12, 13, … or 18. These subsidiary probabilities were determined in § 13 under the problem of Galileo, and:

$$p = \tfrac{25}{216} + \tfrac{21}{216} + \tfrac{15}{216} + \tfrac{10}{216} + \tfrac{6}{216} + \tfrac{3}{216} + \tfrac{1}{216} = \tfrac{81}{216} = \tfrac38.$$

28. Example 13. — An urn contains $a$ white, $b$ black and $c$ red balls. A single ball is drawn $\alpha + \beta + \gamma$ times in succession, the ball thus drawn being replaced before the next drawing takes place. To determine the probability that (1) there appear first $\alpha$ white, then $\beta$ black and finally $\gamma$ red balls; (2) the drawn balls appear in three closed groups of $\alpha$ white, $\beta$ black and $\gamma$ red balls, but the order of these groups is arbitrary; (3) white, black and red balls appear in the same numbers as above, but in any order whatsoever.

1.
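The two examples above admit a quick numerical spot-check, a minimal sketch in which Example 10 is tested by trial and Example 12 by direct enumeration of the 216 equally possible cases:

```python
from fractions import Fraction
from itertools import product

# Example 10: smallest n with 1 - (5/6)^n > 1/2.
n = 1
while 1 - (5 / 6) ** n <= 1 / 2:
    n += 1
assert n == 4
assert abs((1 - (5 / 6) ** 4) - 0.518) < 0.001

# Example 12: enumerate the 216 cases for three dice directly.
throws = list(product(range(1, 7), repeat=3))
favorable = [t for t in throws if sum(t) >= 12]
p = Fraction(len(favorable), len(throws))
assert p == Fraction(81, 216) == Fraction(3, 8)

print(n, p)
```

Both checks agree with the text: four throws suffice for the die, and 81 of the 216 three-dice cases give a sum of at least twelve.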
Denoting the three subsidiary events of drawing $\alpha$ white, $\beta$ black and $\gamma$ red balls by $F_1$, $F_2$ and $F_3$, and the main event of drawing the balls in the prescribed order by $E$, we may write the probability of the occurrence of the main event in the following symbolic form:

$$P(E) = P(F_1)\cdot P(F_2)\cdot P(F_3).$$

Substituting the algebraic values for $P(F_1)$, $P(F_2)$ and $P(F_3)$ in the expression for $P(E)$, and then applying Hausdorff's rule (§ 24), we get:

$$P(E) = p_1 = \frac{a^\alpha}{(a+b+c)^\alpha} \times \frac{b^\beta}{(a+b+c)^\beta} \times \frac{c^\gamma}{(a+b+c)^\gamma} = \frac{a^\alpha b^\beta c^\gamma}{(a+b+c)^{\alpha+\beta+\gamma}}.$$

2. In the second part of the problem the order of the three different groups is immaterial. The three subsidiary events $F_1$, $F_2$ and $F_3$ may therefore be arranged in any order whatsoever. The total number of arrangements is $3! = 6$. The probability of the happening of any one of these arrangements separately is the same as the probability computed under (1). By applying the addition theorem we get therefore as the probability of the occurrence of this event:

$$p_2 = \frac{6\,a^\alpha b^\beta c^\gamma}{(a+b+c)^{\alpha+\beta+\gamma}}.$$

3. The third part is more easily solved by a direct application of the definition of a mathematical probability. The order of the balls drawn is here immaterial. Of each individual combination of $\alpha$ white, $\beta$ black and $\gamma$ red balls it is possible to form $(\alpha+\beta+\gamma)!/(\alpha!\,\beta!\,\gamma!)$ different permutations, giving the total number of favorable cases. The total number of equally possible cases is here $(a+b+c)^{\alpha+\beta+\gamma}$. Hence we have:

$$p_3 = \frac{(\alpha+\beta+\gamma)!}{\alpha!\,\beta!\,\gamma!} \times \frac{a^\alpha b^\beta c^\gamma}{(a+b+c)^{\alpha+\beta+\gamma}}.$$

29. Example 14. — In an urn are $n$ balls, among which are $\alpha$ white and $\beta$ black $(\alpha + \beta = n)$. What is the probability in three successive drawings, without replacement, of drawing (1) first two white and then one black ball, (2) two white and one black ball in any order whatsoever? The probability of drawing first one white, then another white and finally a black ball is:

$$p_1 = \frac{\alpha}{n} \times \frac{\alpha-1}{n-1} \times \frac{\beta}{n-2}.$$

The probability of any of the other arrangements is the same, and hence we have for (2):

$$p_2 = 3p_1 = \frac{3\,\alpha(\alpha-1)\beta}{n(n-1)(n-2)}.$$

30. Example 15. — What is the chance of throwing a doublet of sixes at least once in $n$ consecutive throws with two dice? (Pascal's Problem.)

Chevalier de Méré, a French nobleman and a great friend of all games of chance, went more deeply into the complex of causes in different games than most of the ordinary gamblers of his time. Although not a proficient mathematician, he nevertheless understood enough to pose some very interesting problems, the ideas for which he got from the gambling resorts he frequented. De Méré was a friend of the great French mathematician and philosopher, Blaise Pascal, and went to him whenever he wanted information on some apparently obscure point in the different games in which he participated. The chevalier had from patient observation noticed that he could profitably bet to throw a six at least once in four throws with a single die. He now reasoned that the number of throws needed to throw a doublet at least once with two dice ought to be proportional to the corresponding number of possible cases. For one die there are 6 possible cases, for two dice 36. Thus de Méré thought he could safely bet to throw a doublet of sixes in 24 throws with two dice. An actual trial in several games of dice proved extremely disastrous to the finances of the nobleman, who then went to Pascal for an explanation.

Pascal solved the problem by a direct application of the definition of a mathematical probability. We shall, however, solve it by an application of the multiplication theorem. The probability of getting a doublet of sixes in a single throw is $\frac{1}{36}$. The probability of not getting a double six is therefore $1 - \frac{1}{36} = \frac{35}{36}$. The probability of the happening of this event $n$ consecutive times is $\left(\frac{35}{36}\right)^n$.
Thus the probability of getting a double six at least once in $n$ throws with two dice becomes:

$$p = 1 - \left(\tfrac{35}{36}\right)^n.$$

Solving this equation for $n$ we shall have:

$$n = \frac{\log(1-p)}{\log 35 - \log 36};$$

for $p = \frac12$ we shall have:

$$n = \frac{\log 2}{\log 36 - \log 35} = 24.6.$$

Hence for 25 throws we may safely bet one to one, while for 24 throws such a bet was unfavorable. This shows the fallacy of de Méré's reasoning.

31. Example 16. — An urn, $A$, contains $a$ balls of which $\alpha$ are white; another similar urn, $B$, contains $b$ balls of which $\beta$ are white. A single ball is drawn from one of the two urns. What is the probability that the ball is white?

The beginner may easily make the following error in the solution of this problem. The probability of getting a white ball from $A$ is $\alpha/a$, from $B$, $\beta/b$. Thus the total probability of getting a white ball is: $\alpha/a + \beta/b$. This result is, however, wrong, for we may, by selecting proper values for $a$, $b$, $\alpha$ and $\beta$, obtain a total probability which in numerical value is greater than unity. Thus if $a = 5$, $b = 7$, $\alpha = 5$, $\beta = 4$, we get as the total probability:

$$p = \tfrac55 + \tfrac47 = \tfrac{11}{7}.$$

This result is evidently wrong, since a mathematical probability is never an improper fraction. The error lies in the fact that we have regarded the two events of drawing a ball from either urn as both independent and mutually exclusive.

A simple application of the symbolic rule for relative probabilities gives the result immediately. The main event, $E$, is composed of the two following subsidiary events: (1) to get a white ball from $A$, or (2) to get a white ball from $B$. We shall symbolically denote these two events by $A\cdot W$ and $B\cdot W$ respectively. Thus we have:

$$P(E) = P(A\cdot W) + P(B\cdot W) = P(A)P_A(W) + P(B)P_B(W).$$

Now the probability of selecting urn $A$ is $P(A) = p_1 = \frac12$; similarly, for $B$: $P(B) = p_2 = \frac12$. The probability of getting a white ball from $A$ when this particular urn has previously been selected is expressed by the relative probability: $P_A(W) = p_3 = \frac{\alpha}{a}$. Similarly for $B$: $P_B(W) = p_4 = \frac{\beta}{b}$.
Substituting these different values in the expression for $P(E)$ we get finally:

$$P(E) = \tfrac12\left(\frac{\alpha}{a} + \frac{\beta}{b}\right).$$

For the particular numerical example we have:

$$P(E) = \tfrac12\left(\tfrac55 + \tfrac47\right) = \tfrac{11}{14}.$$

32. Example 17. — The probability of the happening of a certain event, $E$, is $p$, while the probability of the non-occurrence of the same event is $q = 1 - p$. The trial is now to be repeated $n$ times. The probability that there will be first $\alpha$ successes and then $\beta$ failures is:

$$P(E^\alpha)\,P_{E^\alpha}(\bar{E}^\beta) = p^\alpha q^\beta \qquad (\alpha + \beta = n).$$

This is the probability that the two complementary events $E$ and $\bar{E}$ happen in the order prescribed above. When the order in which the successes and failures happen plays no role during the $n$ trials, that is to say, when it is only required to obtain $\alpha$ successes and $\beta$ failures in any order whatsoever in $n$ total trials, then the arrangement of the $\alpha$ factors $p$ and $\beta$ factors $q$ is immaterial. The total number of arrangements of $n$ elements of which $\alpha$ are equal to $p$ and $\beta$ equal to $q$ is simply $n!/(\alpha!\,\beta!)$. For any one particular arrangement of $\alpha$ factors $p$ and $\beta$ factors $q$ the probability of the happening of the two complementary events in this particular arrangement is equal to $p^\alpha q^\beta$. The addition theorem immediately gives the answer for $\alpha$ successes and $\beta$ failures in any order whatsoever as:

$$P(E^\alpha\cdot\bar{E}^\beta) = p_\alpha = \binom{n}{\alpha} p^\alpha q^\beta.$$

Let us, for the present, regard this probability as a function of the variable quantity $\alpha$ ($n$ being a constant quantity).

33. Example 18. — A trial may result in any one of the $n+1$ equally likely values $0, 1, 2, \cdots, n$. What is the probability $p_s$ that $i$ independent trials give the total sum $s$? The probability of each separate result being $1/(n+1)$, the required probability is the coefficient of $x^s$ in the expansion of

$$\left(\frac{1 + x + x^2 + \cdots + x^n}{n+1}\right)^i = \frac{1}{(n+1)^i}\,(1-x^{n+1})^i\,(1-x)^{-i}.$$

By actual multiplication we get a power series in $x$. The terms containing $x^s$ are obtained in the following manner: the first term of the first factor, $(1-x^{n+1})^i$, is multiplied with the term $\binom{i+s-1}{s}x^s$ of the second factor; the second term of the first factor, $-\binom{i}{1}x^{n+1}$, is multiplied with the term $\binom{i+s-n-2}{s-n-1}x^{s-n-1}$ of the second factor; the third term of the first factor, $\binom{i}{2}x^{2n+2}$, is multiplied with the term $\binom{i+s-2n-3}{s-2n-2}x^{s-2n-2}$ of the second factor; and so on. Thus the coefficient of $x^s$ is equal to

$$\binom{i+s-1}{s} - \binom{i}{1}\binom{i+s-n-2}{s-n-1} + \binom{i}{2}\binom{i+s-2n-3}{s-2n-2} - \cdots.$$

The above expression may by further reductions be brought to the form:

$$\frac{(s+1)(s+2)\cdots(s+i-1)}{1\cdot 2\cdots(i-1)} - \binom{i}{1}\frac{(s-n)(s-n+1)\cdots(s-n+i-2)}{1\cdot 2\cdots(i-1)} + \binom{i}{2}\frac{(s-2n-1)(s-2n)\cdots(s-2n+i-3)}{1\cdot 2\cdots(i-1)} - \cdots.$$

The series breaks off, of course, as soon as negative factors appear in the numerator. The required probability is therefore:

$$p_s = \frac{1}{(n+1)^i}\left\{\frac{(s+1)(s+2)\cdots(s+i-1)}{1\cdot 2\cdots(i-1)} - \binom{i}{1}\frac{(s-n)(s-n+1)\cdots(s-n+i-2)}{1\cdot 2\cdots(i-1)} + \cdots\right\}. \tag{1}$$

34. Example 19. — If a single experiment or observation is made on $n$ pairs of opposite (complementary) events, $F_\alpha$ and $\bar{F}_\alpha$, with the respective probabilities of happening $p_\alpha$ and $q_\alpha$ $(\alpha = 1, 2, 3, \cdots, n)$, to determine the probability that (1) exactly $r$, (2) at least $r$ of the events $F_\alpha$ will happen.

This problem is of great importance, especially in life assurance mathematics. It happens frequently that an actuary is called upon to determine the probability that exactly $r$ persons will be alive $m$ years from now out of a group of $n$ persons of any age whatsoever, each person's age and his individual coefficient of survival through the period being known beforehand.

Various demonstrations have been given of this result. The first elementary proof was probably due to Mr. George King, the English actuary, in his well-known text-book. The Austrian mathematician and actuary, E. Czuber, has simplified King's method in his "Wahrscheinlichkeitsrechnung" (1903). Later the Italian actuary, Toja, gave an elegant proof in the Bollettino degli Attuari, Vol. 12. Finally, another Italian mathematician, P. Medolaghi, has investigated the problem from the standpoint of symbolic logic. In the following we shall adhere to the demonstration of Czuber and also give a short outline of the symbolic method.
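Part (1) of Example 19 can be computed directly by working through the $n$ events one at a time, accumulating the products of $p$'s and $q$'s just as the proof below does symbolically. The probabilities chosen here are arbitrary illustrations, not values from the text:

```python
from fractions import Fraction

def exactly_r_distribution(ps):
    """Distribution of the number of events F_a that happen, the events
    being independent with different probabilities p_a (Example 19, part 1)."""
    dist = [Fraction(1)]  # probability that 0 of 0 events happen
    for p in ps:
        new = [Fraction(0)] * (len(dist) + 1)
        for r, prob in enumerate(dist):
            new[r] += prob * (1 - p)   # F_a fails
            new[r + 1] += prob * p     # F_a happens
        dist = new
    return dist

# Arbitrary illustrative probabilities (not taken from the text):
ps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]
dist = exactly_r_distribution(ps)
assert sum(dist) == 1

# "At least r" then follows at once by the addition theorem:
at_least_2 = dist[2] + dist[3]
print(dist, at_least_2)
```

With these values the chance that exactly all three events happen is $\frac12\cdot\frac13\cdot\frac14 = \frac1{24}$, which the routine reproduces, and "at least 2" is the sum of the last two terms.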
In order to answer the first part of the problem we must form all possible combinations of $r$ factors $p$ and $n-r$ factors $q$, and then sum all such products of $n$ factors. Denoting the event by $E_{[r]}$ we have:

$$P(E_{[r]}) = \sum p_a p_b \cdots p_k\,(1-p_l)(1-p_m)\cdots(1-p_v), \tag{1}$$

the summation extending over all the $\binom{n}{r}$ ways of selecting the $r$ events that happen. We shall now denote the sum of all products in (1) containing $k$ of the probabilities $p_1, p_2, \cdots, p_n$ by $S_k$, and treat $S_k$ symbolically as the $k$th power, $S^k$, of a single quantity $S$.

"• 34] EXAMPLE 19. 45 If we expand the algebraic expression: we have: +(-l)-'(„", )«■-••■■ We may therefore write P{E) = ,.. . „.,_)_^ ,. when every expo- nent is replaced by an index number {i. e., S* replaced by S^) and the expansion broken off at the term S". The student must of course constantly bear in mind the symbolic meaning of S^. The second part of the problem is easily solved by the sym- bolic method. Denoting this particular event hj Er, we have the following identity: P{Er) - P{Er+{) = P(£p,) or P{Er) - P{E,r,) = P{Er+l). The following relations are self-evident: P(Eo) = 1; S" 1 PiEm) - i^s~ 1 + S' P{Ei) = P{Eo) - P{EJ = 1 - jqp^, also; __S S S^ P{E2) = PiEi) - P{E,,,) - 1 ^ 5 (1 + s)2 - (1 + S)2- The complete induction gives us finally: _ S' P{Er) -^j ^gy. Assuming the rule is true for r, we may easily prove it is true for r + 1 also. We have in fact: (1 + sy (1 + s)^i (1 + s)^i • 46 THE ADDITION AND MULTIPLICATION THEOREMS. [35 35. Example 20. Tchebycheff's Problem.— The following solu- tion of a very interesting problem is due to the eminent Russian mathematician, Tchebycheff, one of the foremost of modern analysts. A proper fraction is chosen at random. What is the proba- bility it is in its lowest terms? Stated in a slightly different wording the same question may also be put as follows: If A/B is a proper fraction, what is the probability that A and B are prime to each other? If Pi, Ps, Pi, ■ ■ ■ Pm denote respectively the probabilities that each of the primes 2, 3, 5, • • • m is not a common factor of numerator and denominator of A/B, then the probability that no prime number is a common factor is: P = P2 ■ Pi ■ Pi ■ ■ ■ Pm ■ ■ ■ p, ■ ■ ■ a,P2=^---Pn = ^. The arithmetic mean of all the numbers written on the balls is: O-lXl + 023:2 + • • • anXn N which agrees with the mean as defined above. 38] VARIOUS EXPLANATIONS OF THE PARADOX. 51 37. The Petrograd (St. Petersburg) Problem. 
— In this connection it is worth noting a celebrated problem which, on account of its paradoxical nature, has become a veritable stumbling block and has been discussed by some of the most eminent writers on probabilities. The problem was first suggested by Daniel Bernoulli in a communication to the Petrograd — or, as it was then called, the St. Petersburg — Academy in 1738. The Petrograd problem may briefly be stated as follows:

Two persons, $A$ and $B$, are interested in a game of tossing a coin under the following conditions. An ordinary coin is tossed until head turns up, which is the deciding event. If head turns up the first time, $A$ pays one dollar to $B$; if head appears first at the second toss, $B$ is to receive two dollars; if first at the third toss, four dollars; and so on. What is the mathematical expectation of $B$? Or, in other words, how much must $B$ pay to $A$ before the game starts in order that the game may be considered fair?

The mathematical expectation of $B$ in the first trial is $\frac12 \times 1 = \frac12$. The mathematical expectation for head appearing first in the second toss is $\left(\frac12\right)^2 \times 2 = \frac12$. In general, the mathematical probability that head appears for the first time in the $n$th toss is $\left(\frac12\right)^n$, and the co-ordinated expectation is $2^{n-1} \div 2^n = \frac12$. Thus the total expectation is expressed by the following series:

$$\tfrac12 + \tfrac12 + \tfrac12 + \cdots.$$

As $n \to \infty$ this series diverges, and it thus appears that $B$ could afford to pay an infinite amount of money for his expected gain.

38. Various Explanations of the Paradox. The Moral Expectation. — This evidently paradoxical result has called forth a number of explanations in various forms by some eminent mathematicians. One of the commentators was D'Alembert. It was to be expected that the famous encyclopaedist, who — as we have seen — did not view the theory of probabilities in too kindly a manner, would not hesitate to attack. He returns repeatedly to this problem in the "Opuscules" (1761) and in "Doutes et questions" (Amsterdam, 1770).
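The divergence of the series $\frac12 + \frac12 + \frac12 + \cdots$ shows itself in trial as well: the empirical average of $B$'s winnings does not settle down as more games are played, but keeps creeping upward. The following simulation is an illustration of ours, not part of the original text:

```python
import random

# One Petrograd game: B receives 2^(n-1) dollars when head first
# appears at the n-th toss.
def one_game(rng):
    payout = 1
    while rng.random() < 0.5:  # tail: the deciding head has not yet come
        payout *= 2
    return payout

rng = random.Random(1922)
for games in (100, 10_000, 1_000_000):
    mean = sum(one_game(rng) for _ in range(games)) / games
    print(games, mean)
```

Unlike an ordinary game of chance, the running average here has no finite limit to converge to; longer runs merely admit the rarer, larger payouts that De Morgan's "larger net" catches.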
D'Alembert distinguishes between two forms of possibility, viz., metaphysical and physical possibility. An event is by him called a metaphysical possibility when it is not absurd; when the event is not too "uncommon" in the ordinary course of happenings, it is a physical possibility. That head would appear for the first time only after 1,000 throws is metaphysically possible but quite impossible physically. This contention is rather bold. "What would," as Czuber remarks, "D'Alembert have said to an actually reported case in 'Grunert's Archiv,' where in a game of whist each of the four players held 13 cards of one suit?" The numerical probability of such an event is of an order far smaller than 1 in 635,013,559,600, the latter number being merely the number of possible hands of thirteen cards. D'Alembert's definitions, including the half-metaphorical term "ordinary course," are rather vague. And what numerical value of the mathematical probability constitutes a physical impossibility? D'Alembert gives three arbitrary solutions for the probability of getting head first in the $n$th throw, each formed from $\left(\frac12\right)^n$ by means of arbitrary constants $\alpha$, $\beta$, $B$, $K$ and an odd number $q$ — for instance

$$\frac{1}{2^n(1+\beta n^\alpha)}.$$

Daniel Bernoulli himself gives a solution wherein he introduces the term "moral expectation." If a person possesses a sum of money equal to $x$, then, according to Bernoulli, the moral value of a small increase $dx$ is

$$dy = k\,\frac{dx}{x},$$

$k$ being a constant quantity. Integrating, we get:

$$y = \int_a^b k\,\frac{dx}{x} = k(\log b - \log a) = k\log\frac{b}{a},$$

which is the moral expectation of an increase $b - a$ of an original fortune $a$. If now $x$ denotes the sum owned by $B$, we may replace the mathematical expectations by their corresponding moral expectations, that is to say, replace $2^{n-1}/2^n$ by $(1/2^n)\,k\log\left(\dfrac{x + 2^{n-1}}{x}\right)$, and we then have:

$$k\left[\frac12\log\frac{x+1}{x} + \frac14\log\frac{x+2}{x} + \frac18\log\frac{x+4}{x} + \cdots\right],$$

which is a convergent series. In this connection it may be mentioned that the Bernoullian hypothesis has found quite extensive use in the modern theory of utility.
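Bernoulli's replacement series converges, as may be verified numerically. A minimal sketch, taking $k = 1$ and a few hypothetical fortunes $x$:

```python
from math import log

# Bernoulli's moral expectation of the Petrograd game (with k = 1):
# the terms (1/2^n) * log((x + 2^(n-1)) / x) form a convergent series,
# in contrast with the divergent 1/2 + 1/2 + ... of the raw expectations.
def moral_expectation(x, terms=200):
    return sum(
        (0.5 ** n) * log((x + 2 ** (n - 1)) / x)
        for n in range(1, terms + 1)
    )

for fortune in (1, 100, 10_000):
    print(fortune, moral_expectation(fortune))
```

The tail terms behave like $n\log 2 / 2^n$, so two hundred terms are far more than enough; note also that the moral expectation decreases as the fortune $x$ grows, in keeping with Bernoulli's hypothesis that a dollar is worth less to a rich man.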
De Morgan, in his splendid little treatise "On Probabilities," takes the view that the solution as first given is by no means an anomaly. He quotes an actual experiment in coin tossing by Buffon. Out of 2,048 trials, 1,061 gave head at the first toss, 494 at the second, 232 at the third, 137 at the fourth, 56 at the fifth, 29 at the sixth, 25 at the seventh, 8 at the eighth and 6 at the ninth. Computing the various mathematical expectations, we find that the maximum value is found in the 25 sets with head in the seventh toss, which give a gain of $25 \times 64 = 1,600$. The most rare occurrence, the 6 sets with head in the ninth throw, gives a gain of $6 \times 256 = 1,536$, which is the next highest gain in all the nine sets. De Morgan furthermore contends that if Buffon had tried a thousand times as many games, the results would not only have given more, but more per game, arguing "that a larger net would have caught not only more fish but more varieties of fish; and in two millions of sets, we might have seen cases in which head did not appear till the twentieth throw." Furthermore, "the player might continue until he had realized not only any given sum, but any given sum per game." Therefore, according to De Morgan, the mathematical expectation of a player in a single game must be infinite.

CHAPTER VI.

PROBABILITY A POSTERIORI.

39. Bayes's Rule. A Posteriori Probabilities. — The problems hitherto considered have all had certain points in common. Before entering upon the calculation of the mathematical probability of the happening of the event in question, we knew beforehand a certain complex of causes which operated in the general domain of action. We also were able to separate this general complex of productive causes into two distinct minor domains of complexes, of which one would bring forth the event, $E$, while the other domain would act towards the production of the opposite event, $\bar{E}$.
Furthermore, we were able to measure the respective quantitative magnitudes of the two domains, and then, by a simple algebraic operation, determine the probability as a proper fraction. The addition and multiplication theorems did not introduce any new principles, but only gave us a set of systematic rules which facilitated and shortened the calculation of the relations between the different absolute probabilities. The above method of determination of a mathematical probability is known as an a priori determination, and such probabilities are termed a priori probabilities.

The problems treated in the preceding chapters have nearly all related to different games of chance or to purely abstract mathematical quantities. The inorganic nature of problems of this kind has made it possible for us to treat them in a relatively simple manner. In many of the problems which we shall consider hereafter, organic elements enter as a dominant factor and make the analysis much more complicated and difficult. Social and biological investigations, which are of much larger benefit and practical value than problems in games of chance, often lead to a completely different category of probability problems, known as problems in "a posteriori probabilities." In problems where organic life enters into the calculations, the complex of productive causes is so varied and manifold that our minds are not able to pigeonhole the different productive causes and place them in their proper domains of action. But we know that such causes do exist and are the origin of the event. If now, by a series of observations, we have noticed the actual occurrence of the event, $E$ (or the occurrence of the opposite event, $\bar{E}$), the problem of the determination of an a posteriori probability is to find the probability that the event $E$ originated from a certain complex, say $F$.
We must then, first of all, form a complete hypothetical judgment of the form: $E$ happens from the complex $F_1$, or $F_2$, or $F_3$, …, or $F_n$. But we must not forget that, in general, the different complexes $F_\alpha$ $(\alpha = 1, 2, \cdots, n)$ of the disjunction are not known a priori. We must, therefore, determine the respective probabilities of the actual existence of such disjunctive complexes $F_\alpha$. These probabilities of existence of the complexes of causes are in general different for each member — a fact which has often been overlooked by many investigators and writers on a posteriori probabilities, and which has given rise to meaningless and paradoxical results.

40. Discovery and History of the Rule. — The first discoverer of the rule for the computation of a posteriori probabilities by a purely deductive process was the English clergyman, T. Bayes. Bayes's treatise was first published after the death of the author by his friend, Dr. Price, in the Philosophical Transactions for 1763. The treatise by the English clergyman was for a long time almost forgotten, even by the author's own countrymen; and later English writers have lost sight of the true "Bayes's Rule" and substituted a false — or, to be more accurate, a special case of — the exact rule in the different algebraic texts, under the discussion of the so-called "inverse probabilities," a name which is probably due to de Morgan, and which is in itself a great misnomer. This point we shall presently discuss in detail.

The careless application of the rule has recently led to a certain distrust of the whole theory of "a posteriori probabilities." Scandinavian mathematicians were probably the first to criticize the theory. In 1879, Mr. J. Bing, a Danish actuary, took a very critical attitude towards the mathematical principles underlying Bayes's Rule, in a scholarly article in the mathematical journal Tidsskrift for Matematik.
Bing's article caused a sharp, and often heated, discussion among the older and younger Danish mathematicians of that time; but his views seem to have gained the upper hand, and even so great an authority on the whole subject as the late Dr. T. N. Thiele, in his well-known work, "Theory of Observations" (London, 1903), refers to Bing's article as "a crushing proof of the fallacies underlying the determination of a posteriori probabilities by a purely deductive method." As recently as 1908, the Danish writer on philosophy, Dr. Kroman, has taken up the cudgels in defense of Bayes in a contribution in the Transactions of the Royal Danish Academy of Science, which has done much towards the removal of many obscure and erroneous views of the older authors. Among English writers, Professor Chrystal, in a lecture delivered before the Actuarial Society of Edinburgh, has also given a sharp criticism of the rule, although he does not go so deeply into the real nature of the problem as either Bing or Kroman.

Despite Chrystal's advice to "bury the laws of inverse probabilities decently out of sight, and not embalm them in text books and examination papers," the old view still holds sway in recent professional examination papers. It is therefore absolutely necessary for the student preparing for professional examinations to be acquainted with the theory. In the following paragraphs we shall, therefore, give the mathematical theory of Bayes's Rule, with several examples illustrating its application to actual problems, together with a criticism of the rule.

41. Bayes's Rule (Case I). — (The different complexes of causes producing the observed event, E, possess different a priori probabilities of existence.) Let $E$ denote a certain state or condition which can appear under one only of the mutually exclusive complexes of causes $F_1, F_2, \cdots$, and not otherwise.
Let the probability of the actual existence of $F_1$ be $\kappa_1$, and, if $F_1$ really exists, let $\omega_1$ be the "productive probability" of bringing forth the observed event, $E$ ($E$ being of a different nature from $F$), which can occur only after the previous existence of one of the mutually exclusive complexes, $F$. Let, in the same manner, $F_2$ have an "existence probability" $\kappa_2$ and a "productive probability" $\omega_2$; $F_3$ an existence probability $\kappa_3$ and a productive probability $\omega_3$; etc. If now, by actual observation, we have noted that the event $E$ has occurred exactly $m$ times in $n$ trials, then the probability that the complex $F_1$ was the origin of $E$ is:

$$Q_1 = \frac{\kappa_1\,\omega_1^m(1-\omega_1)^{n-m}}{\sum \kappa_\alpha\,\omega_\alpha^m(1-\omega_\alpha)^{n-m}} \qquad (\alpha = 1, 2, 3, \cdots).$$

Similarly, the probability that the complex $F_2$ was the origin is:

$$Q_2 = \frac{\kappa_2\,\omega_2^m(1-\omega_2)^{n-m}}{\sum \kappa_\alpha\,\omega_\alpha^m(1-\omega_\alpha)^{n-m}},$$

and so on for the other complexes.

Proof. — Let the number of equally possible cases in the general domain of action which lead to one of the complexes $F_\alpha$ be $t$. Furthermore, of these $t$ cases let $f_1$ be favorable to the existence of complex $F_1$, $f_2$ to $F_2$, $f_3$ to $F_3$, etc. Then the probabilities of the existence of the different complexes $F_\alpha$ $(\alpha = 1, 2, 3, \cdots, n)$ are:

$$\kappa_1 = \frac{f_1}{t}, \quad \kappa_2 = \frac{f_2}{t}, \quad \kappa_3 = \frac{f_3}{t}, \; \cdots$$

respectively. Of the $f_1$ favorable cases for complex $F_1$, $\lambda_1$ are also favorable to the occurrence of $E$; of the $f_2$ favorable cases for complex $F_2$, $\lambda_2$ are also favorable to the occurrence of $E$; of the $f_3$ favorable cases for complex $F_3$, $\lambda_3$ are also favorable to the occurrence of $E$. The probability of the happening of $E$ under the assumption that $F_1$ exists, i.e., the relative probability $P_{F_1}(E)$, is:

$$\omega_1 = \frac{\lambda_1}{f_1},$$

or in general:

$$\omega_\alpha = \frac{\lambda_\alpha}{f_\alpha} \qquad (\alpha = 1, 2, 3, \cdots).$$

The total number of equally likely cases for the simultaneous occurrence of the event $E$ with any one of the favorable cases for $F_1, F_2, F_3, \cdots$ is: $\lambda_1 + \lambda_2 + \lambda_3 + \cdots = \Sigma\lambda_\alpha$.
The number of favorable cases for the simultaneous occurrence of $F_1$ and $E$ is $\lambda_1$; for the simultaneous occurrence of $F_2$ and $E$, $\lambda_2$; etc. Hence we have as measures of their corresponding probabilities:

$$Q_1 = \frac{\lambda_1}{\Sigma\lambda_\alpha}, \qquad Q_2 = \frac{\lambda_2}{\Sigma\lambda_\alpha}, \; \cdots.$$

But $\lambda_1 = \omega_1 f_1$, $\lambda_2 = \omega_2 f_2$, etc., and $f_1 = \kappa_1 t$, $f_2 = \kappa_2 t$, etc. Hence $\lambda_1 = \omega_1\kappa_1 t$, $\lambda_2 = \omega_2\kappa_2 t$, etc. Substituting these values in the above expressions for $Q_1$, $Q_2$, we get:

$$Q_1 = \frac{\kappa_1\omega_1}{\Sigma\kappa_\alpha\omega_\alpha}, \qquad Q_2 = \frac{\kappa_2\omega_2}{\Sigma\kappa_\alpha\omega_\alpha}, \; \cdots$$

as the respective probabilities that the observed event originated from the complexes $F_1, F_2, F_3, \cdots$. Such probabilities are called a posteriori probabilities.

Let us now for a moment investigate the above expressions for $Q_1, Q_2, \cdots$. The numerator in the expression for $Q_1$ is $\kappa_1\omega_1$. But $\kappa_1$ is simply the a priori probability of the existence of $F_1$, while $\omega_1$ is the a priori productive probability of bringing forth the observed event from the complex $F_1$. The product $\kappa_1\omega_1$ is simply the probability that the event $E$ occurred by way of $F_1$. In the denominator we have the expression $\Sigma\kappa_\alpha\omega_\alpha$ $(\alpha = 1, 2, \cdots, n)$, which is the total probability of getting $E$ from any of the complexes $F_\alpha$.

From Example 17 (Chapter IV) we know that the probability of getting $E$ exactly $m$ times from $F_1$ in $n$ total trials is:

$$p_1 = \binom{n}{m}\kappa_1\,\omega_1^m(1-\omega_1)^{n-m},$$

and the probability of getting $E$, $m$ times out of $n$, from any one of the complexes, $F$, is:

$$\Sigma p_\alpha = \binom{n}{m}\Sigma\kappa_\alpha\,\omega_\alpha^m(1-\omega_\alpha)^{n-m} \qquad (\alpha = 1, 2, 3, \cdots).$$

If, by actual observation, we know the event $E$ to have happened exactly $m$ times out of $n$, then the a posteriori probability that $F_1$ was the origin is:

$$Q_1 = \frac{\binom{n}{m}\kappa_1\,\omega_1^m(1-\omega_1)^{n-m}}{\binom{n}{m}\Sigma\kappa_\alpha\,\omega_\alpha^m(1-\omega_\alpha)^{n-m}} \qquad (\alpha = 1, 2, 3, \cdots). \tag{I}$$

The factors $\binom{n}{m}$ in numerator and denominator of course cancel each other. It will be noticed that, in the above proof, it is not assumed that the a posteriori probability is proportional to the a priori probability — an assumption usually made in the ordinary texts on algebra.

42.
Bayes's Rule (Case II). — (Special Case: the a priori probabilities of existence of the different complexes are equal.) Sometimes the different complexes $F$ may be of such special character that their a priori probabilities of existence are equal, i.e.,

$$\kappa_1 = \kappa_2 = \kappa_3 = \kappa_4 = \cdots = \kappa_n.$$

In this case equation (I) simply reduces to:

$$Q_1 = \frac{\omega_1^m(1-\omega_1)^{n-m}}{\Sigma\,\omega_\alpha^m(1-\omega_\alpha)^{n-m}}. \tag{II}$$

Equation (I) gives, however, the most general expression for Bayes's Rule, which may be stated as follows: If a definite observed event, $E$, can originate from a certain series of mutually exclusive complexes, $F$, and if the actual occurrence of the event has been observed, then the probability that it originated from a specified complex, or a specified group of complexes, is also the "a posteriori" probability, or probability of existence, of the specified complex or group of complexes.

43. Determination of the Probabilities of Future Events Based Upon Actual Observations. — It happens frequently that our knowledge of the general domain of action is so incomplete that we are not able to determine, a priori, the probability of the occurrence of a certain expected event. As we have already stated in the introduction to a posteriori probabilities, this is nearly always the case in problems wherein organic life enters as a determining factor or momentum. But the same state of affairs may also occur in the category of problems relating to games of chance which we have hitherto considered. Suppose we had an urn which was known to contain white and black balls only, but in which the actual ratio in which the balls of the two different colors were mixed was unknown. With this knowledge beforehand, we should not be able to determine the probability of drawing a white ball.
If, on the other hand, we knew, from actual experience by repeated observations, the results of former drawings from the same urn, the conditions in the general domain of action remaining unchanged during each separate drawing, then these results might be used in the determination of the probability of a specified event in future drawings. Our problem may be stated in its most general form as follows:

Let $F$ denote a certain state or condition in the general domain of action, which state or condition can appear only in one or the other of the mutually exclusive forms $F_1, F_2, F_3, \cdots$, and not otherwise. Let the probabilities of existence of $F_1, F_2, F_3, \cdots$ be $\kappa_1, \kappa_2, \kappa_3, \cdots$ respectively, and, when one of the complexes $F_1, F_2, F_3, \cdots$ exists (occurs), let $\omega_1, \omega_2, \omega_3, \cdots$ be the respective productive probabilities of bringing forth a specified event, $E$. If now, by actual observation, we know the event, $E$, to have happened exactly $m$ times out of $n$ total trials (the conditions in the general domain of action being the same at each individual trial), what then is the probability that the event, $E$, will happen in the $(n+1)$th trial also?

By Bayes's Rule we determined the " a posteriori " probabilities or the probabilities of existence of the complexes $F_1, F_2, \cdots$ as:

$$Q_a = \frac{\kappa_a\,\omega_a^m(1-\omega_a)^{n-m}}{\Sigma\,\kappa_a\,\omega_a^m(1-\omega_a)^{n-m}} \qquad (a = 1, 2, 3, \cdots).$$

In the $(n+1)$th trial $E$ may happen from any one of the mutually exclusive complexes $F_1, F_2, F_3, \cdots$, whose respective probabilities of producing the event, $E$, are $\omega_1, \omega_2, \omega_3, \cdots$. The addition theorem then gives us as the total probability of the occurrence of $E$ in the $(n+1)$th trial:

$$R = \Sigma\,Q_a\,\omega_a = Q_1\omega_1 + Q_2\omega_2 + Q_3\omega_3 + \cdots = \frac{\Sigma\,\kappa_a\,\omega_a^m(1-\omega_a)^{n-m}\,\omega_a}{\Sigma\,\kappa_a\,\omega_a^m(1-\omega_a)^{n-m}} \qquad (a = 1, 2, 3, \cdots). \tag{III}$$

If the a priori probabilities of existence are of equal magnitude (Case II), the factors $\kappa$ in the above expression cancel each other in numerator and denominator, and we have:

$$R = \frac{\Sigma\,\omega_a^{m+1}(1-\omega_a)^{n-m}}{\Sigma\,\omega_a^m(1-\omega_a)^{n-m}}. \tag{IV}$$

44.
Examples on the Application of Bayes's Rule. — Example 21. — An urn contains two balls, white or black or both kinds. What is the probability of getting a white ball in the first drawing; and if this event has happened and the ball been replaced, what then is the probability of getting white in the following drawing?

Three conditions are here possible in the urn: there may be 0, 1, or 2 white balls. Each hypothetical condition has a probability of existence equal to $\tfrac13$, and the productive probabilities for white are 0, $\tfrac12$ and 1 respectively. The total probability of getting white is therefore:

$$\tfrac13\cdot 0 + \tfrac13\cdot\tfrac12 + \tfrac13\cdot 1 = \tfrac12.$$

If we now draw a white ball, then the probabilities that it came from the complexes $F_1$, $F_2$, $F_3$, respectively, are:

$$0 \div \tfrac12 = 0, \qquad \tfrac16 \div \tfrac12 = \tfrac13, \qquad \tfrac13 \div \tfrac12 = \tfrac23.$$

These are also the new existence probabilities of the three conditions. The probability for white in the second drawing is therefore:

$$0\cdot 0 + \tfrac13\cdot\tfrac12 + \tfrac23\cdot 1 = \tfrac56.$$

This solution of the problem is, however, not a unique solution, because it is an arbitrary solution. It is arbitrary in this respect, that we have without further consideration given all three complexes the same probability of existence, $\tfrac13$. We shall discuss this part of the question under the chapter on the criticism of Bayes's Rule.

Example 22. — An urn contains five balls, of which a part is known to be white and the rest black. A ball is drawn four times in succession and replaced after each drawing. Three of the drawings gave a white ball and one a black ball. What is the probability that we will get a white ball in the fifth drawing?

In regard to the contents of the urn the following four hypotheses are possible:

$F_1$: 4 white, 1 black ball,
$F_2$: 3 white, 2 black balls,
$F_3$: 2 white, 3 black balls,
$F_4$: 1 white, 4 black balls.

Since we do not know anything about the ratio of distribution of the differently colored balls, we may by a direct application of the principle of insufficient reason regard the four complexes as equally probable, or: $\kappa_1 = \kappa_2 = \kappa_3 = \kappa_4 = \tfrac14$.
If either $F_1$, $F_2$, $F_3$ or $F_4$ exists, the respective productive probabilities are:

$$\omega_1 = \tfrac45, \quad \omega_2 = \tfrac35, \quad \omega_3 = \tfrac25, \quad \omega_4 = \tfrac15.$$

By a direct substitution in the formula:

$$R = \frac{\Sigma\,\omega_a^m(1-\omega_a)^{n-m}\,\omega_a}{\Sigma\,\omega_a^m(1-\omega_a)^{n-m}} \qquad (a = 1, 2, 3, 4),$$

for $n = 4$ and $m = 3$, we get:

$$R = \frac{(\tfrac45)^4(\tfrac15) + (\tfrac35)^4(\tfrac25) + (\tfrac25)^4(\tfrac35) + (\tfrac15)^4(\tfrac45)}{(\tfrac45)^3(\tfrac15) + (\tfrac35)^3(\tfrac25) + (\tfrac25)^3(\tfrac35) + (\tfrac15)^3(\tfrac45)} = \tfrac{47}{73}.$$

45. Criticism of Bayes's Rule. — In most English treatises on the theory of chance the " a posteriori " determination of a mathematical probability is discussed under the so-called " inverse probabilities." This somewhat misleading name was probably first introduced by the eminent English mathematician and actuary, Augustus de Morgan. In the opening of the discussion of a posteriori probabilities in the third chapter of his treatise, " An Essay on Probabilities," de Morgan says: " In the preceding chapter, we have calculated the chances of an event, knowing the circumstances under which it is to happen or fail. We are now to place ourselves in an inverted position: we know the event, and ask what is the probability which results from the event in favor of any set of circumstances under which the same might have happened."

Is this really an inverse process? By the a priori or — as de Morgan prefers to call them — the direct probabilities, we started from a definitely known condition and determined the probability of a future event, $E$, or, what is the same, the probability of a specified future state of affairs. Here we start knowing the present condition and try to determine a past condition. The process apparently appears to be the inverse of the former, although both are the same. We possess a definite knowledge of a certain condition and try to determine the probability of the existence of a specified state of affairs, in general different from the first condition; but whether this state of affairs occurred in the past or is to occur in the future has no bearing on our problem.
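Before proceeding, it may be noted that Examples 21 and 22 above reduce to direct substitutions in the formulas of §§ 41–43, and can be checked mechanically. The following sketch is my own illustration, not part of the original text (the function names `posterior` and `predictive` are arbitrary); it uses exact rational arithmetic:

```python
from fractions import Fraction as F

def posterior(kappa, omega, m, n):
    """A posteriori probabilities of the complexes F_a, formula (I):
    Q_a is proportional to kappa_a * omega_a^m * (1 - omega_a)^(n - m)."""
    terms = [k * w**m * (1 - w)**(n - m) for k, w in zip(kappa, omega)]
    total = sum(terms)
    return [t / total for t in terms]

def predictive(kappa, omega, m, n):
    """Probability that E happens also in the (n+1)th trial, formula (III)."""
    q = posterior(kappa, omega, m, n)
    return sum(qa * w for qa, w in zip(q, omega))

# Example 21: two balls, hypotheses of 0, 1 or 2 white, each kappa = 1/3.
kappa = [F(1, 3)] * 3
omega = [F(0), F(1, 2), F(1)]
print(posterior(kappa, omega, 1, 1))   # [0, 1/3, 2/3]
print(predictive(kappa, omega, 1, 1))  # 5/6

# Example 22: five balls, 3 white results in 4 drawings, kappa = 1/4 each.
kappa = [F(1, 4)] * 4
omega = [F(4, 5), F(3, 5), F(2, 5), F(1, 5)]
print(predictive(kappa, omega, 3, 4))  # 47/73
```

The same two functions reproduce the later examples of this chapter when supplied with the corresponding hypothetical complexes.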
In other words, time does not enter as a determining factor. And even if we were willing to admit the two processes of determining the different probabilities to be inverse, the probabilities themselves cannot be said to be inverse. Nevertheless, this misleading name appears over and over again in examination papers in England and in America, as a thoroughly embalmed corpse which ought to have been buried long ago. What is really needed is a change of the customary nomenclature in the whole theory of probability. Instead of direct and inverse, a priori and a posteriori probabilities, it would be more proper to speak about " prospective " and " retrospective " probabilities in the application of Bayes's Rule. All probabilities are in reality determined by an empirical process. That there is a certain probability of throwing a six with a die we only know after we have formed a definite conception of a die. The only probabilities which we perhaps rightly may name a priori are the arbitrary probabilities in purely mathematical problems where we assume an ideal state of affairs. " There is," to quote the Danish writer on logic, Dr. Kroman, " really more reason to doubt the a priori than the a posteriori probabilities, and it would be more natural and also more exact in the application of Bayes's Rule to speak about the actual or original and the new or gained probability."

The discussion above has really no direct bearing on Bayes's Rule, but was introduced in order to give the student a clearer understanding of the main principles underlying the whole determination of a posteriori probabilities by means of actual experimental observations, and also to remove some obscure points. From his ordinary mathematical training every student of mathematics has an almost intuitive understanding of an inverse process.
Naturally, when he encounters again and again the customary heading " inverse probabilities " in text-books, he obtains from the very start — almost before he begins to read the particular chapter — an inverse idea of the subject instead of the idea he really ought to have. Nowhere in continental texts on the theory of probabilities will the reader be able to find the words direct and inverse applied in the same sense as in English texts since the introduction of these terms by de Morgan. We advise readers who have become accustomed to the old terms to pay no serious attention to them.

46. Theory Versus Practice. — In § 41 we reduced Bayes's Rule to its most general form:

$$Q_1 = \frac{\kappa_1\,\omega_1^m(1-\omega_1)^{n-m}}{\Sigma\,\kappa_a\,\omega_a^m(1-\omega_a)^{n-m}}. \tag{I}$$

This is an exact expression for the rule, but it is at the same time almost impossible to employ it in practice. Only in a few exceptional cases do we know, a priori, the different values of the often numerous probabilities of existence, $\kappa_a$, of the complexes $F_a$; and in order to apply the rule with exact results we require sufficient facts and information about the different complexes of causes from which the observed event, $E$, originated. Bayes deduced the rule from special examples resulting from drawings of balls of different colors from an urn where the different complexes of causes were materially existent. The probability of a cause, or of a certain complex of causes, did not here mean the probability of existence of such a complex, but the probability that the observed event originated from this particular complex. In order to elucidate this statement we give the following simple example:

Example 23. — A bag contains 4 coins, of which one is coined with two heads, the other three having both head and tail. A coin is drawn at random and tossed four times in succession, and each time head turns up. What is the probability that it was the coin with two heads?
The two complexes $F_1$ and $F_2$ which may produce the event, $E$, are: $F_1$, the coin with two heads, and $F_2$, an ordinary coin. The probability of existence of $F_1$ is the probability of drawing the single coin with two heads, which is equal to $\tfrac14$; the probability of existence of the other complex, $F_2$, is equal to $\tfrac34$. The respective productive probabilities are 1 and $\tfrac12$. Thus $\kappa_1 = \tfrac14$, $\kappa_2 = \tfrac34$, $\omega_1 = 1$ and $\omega_2 = \tfrac12$. Substituting these values in formula (I) $(n = 4,\ m = 4)$, we get:

$$Q_1 = \frac{\tfrac14\cdot 1^4}{\tfrac14\cdot 1^4 + \tfrac34\cdot(\tfrac12)^4} = \frac{\tfrac14}{\tfrac14 + \tfrac{3}{64}} = \tfrac{16}{19}.$$

But in most cases we do not know anything about the material existence of the complexes of causes from which the event, $E$, originated. On the contrary, we are forced to form a hypothesis about their actual existence. To start with a simple case we take Example 21 of § 44. We assumed here three equally possible conditions in the urn before the drawings, namely the presence of 0, 1, or 2 white balls. From this assumption we found the probability of getting a white ball in the second drawing, after we had previously drawn a white ball and put it back in the urn before the second drawing, to be equal to $\tfrac56$. As we have already remarked, this solution is not unique, because it is an arbitrary solution. It is arbitrary to assign, without any consideration whatsoever, $\tfrac13$ as the probability of existence to each of the three conditions. Let us suppose that the two balls bore the numbers 1 and 2 respectively. We may then form the following equally likely conditions: $b_1b_2$, $b_1w_2$, $w_1b_2$, $w_1w_2$, each condition having an a priori probability of existence equal to $\tfrac14$ and a productive probability for the drawing of a white ball equal to 0, $\tfrac12$, $\tfrac12$ and 1 respectively.
Thus:

$$\kappa_1 = \kappa_2 = \kappa_3 = \kappa_4 = \tfrac14$$

and

$$\omega_1 = 0, \quad \omega_2 = \tfrac12, \quad \omega_3 = \tfrac12, \quad \omega_4 = 1.$$

The respective a posteriori probabilities, that is, the new or gained probabilities of the four hypothetical conditions, become now, by the application of Bayes's Rule (Formula II):

$$Q_1 = 0, \quad Q_2 = \tfrac14, \quad Q_3 = \tfrac14, \quad Q_4 = \tfrac12.$$

Hence the probability for white in the second drawing is (Formula IV):

$$R = 0\cdot 0 + \tfrac14\cdot\tfrac12 + \tfrac14\cdot\tfrac12 + \tfrac12\cdot 1 = \tfrac34.$$

In the first solution we got $\tfrac56$ for the same probability. Which answer is now the true one? Neither one! The true answer is that the problem is not given in such a form that the last question — the probability of getting a white ball in the second drawing — may be settled without any doubt. The answer must be conditional. Following the first hypothesis we got $\tfrac56$, while the second hypothesis gives $\tfrac34$ as the answer.

We next proceed to Example 22, which is almost identical in form to the first one, the only difference being a greater variety of hypothetical conditions. We started here with the following four hypotheses: $F_1$: 4 white, 1 black ball; $F_2$: 3 white, 2 black; $F_3$: 2 white, 3 black; and $F_4$: 1 white and 4 black balls, assigning $\tfrac14$ as the hypothetical existence probability of each. By marking the 5 balls, similarly as in the last example, with the numbers from 1 to 5, we may form the complexes:

$F_1$: 4 white and 1 black ball in $\binom{5}{4} = 5$ ways,
$F_2$: 3 white and 2 black balls in $\binom{5}{3} = 10$ ways,
$F_3$: 2 white and 3 black balls in $\binom{5}{2} = 10$ ways,
$F_4$: 1 white and 4 black balls in $\binom{5}{1} = 5$ ways.

This gives us a total of $5 + 10 + 10 + 5 = 30$ different complexes. Assuming all of these complexes equally likely to occur, we get the following probabilities of existence and productive probabilities:

$$\kappa_1 = \kappa_2 = \kappa_3 = \cdots = \kappa_{30} = \tfrac{1}{30},$$

$$\omega_1 = \omega_2 = \cdots = \omega_5 = \tfrac45 \quad \text{(productive prob. for } F_1\text{)},$$
$$\omega_6 = \omega_7 = \cdots = \omega_{15} = \tfrac35 \quad \text{(productive prob. for } F_2\text{)},$$
$$\omega_{16} = \omega_{17} = \cdots = \omega_{25} = \tfrac25 \quad \text{(productive prob. for } F_3\text{)},$$
$$\omega_{26} = \omega_{27} = \cdots = \omega_{30} = \tfrac15 \quad \text{(productive prob. for } F_4\text{)}.$$
The total probability of getting a white ball in the fifth drawing is now:

$$R = \frac{\Sigma\,\omega_a^3(1-\omega_a)\,\omega_a}{\Sigma\,\omega_a^3(1-\omega_a)} \qquad (a = 1, 2, 3, \cdots, 30).$$

Actual substitution of the above values of $\omega$ in this formula gives us the final result as $R = \tfrac{17}{28}$.

47. Probabilities Expressed by Integrals. — By making an extended use of the infinitesimal calculus, Mr. Bing and Dr. Kroman in their memoirs arrived at much more ambiguous results through an application of the rule of Bayes. Starting with the fundamental rule as given in equation (I) in § 41, we may at times encounter somewhat simpler conditions inside the domain of causes. The total complex of actions may embrace a large number of smaller sub-complexes, construed in such a way that the change from one complex to another may be regarded as a continuous process, so that the productive probabilities increase by an infinitely small quantity from a certain lower limit, $a$, to an upper limit, $b$. Denoting such continuously increasing probabilities by $v$ and the corresponding small probabilities of existence by $u\,dv$, we have as the total probability of obtaining $E$ from any one of the minor complexes with a productive probability between $\alpha$ and $\beta$ $(a \le \alpha,\ \beta \le b)$:

$$p = \int_\alpha^\beta uv\,dv.$$

The probability that, when $E$ has happened, it originated from one of those minor complexes, or the probability of existence of some one of those complexes, is:

$$Q = \frac{\int_\alpha^\beta uv\,dv}{\int_a^b uv\,dv}.$$

The situation may be still more simplified by the following considerations.
In the continuous total complex between the limits $a$ and $b$ we have altogether $(b-a)/dv$ individual minor complexes. If we assume all of these complexes to possess the same probability of existence, we must have:

$$u\,dv = \frac{dv}{b-a}.$$

The two formulas then take on the form:

$$p = \frac{1}{b-a}\int_\alpha^\beta v\,dv \qquad \text{and} \qquad Q = \frac{\int_\alpha^\beta v\,dv}{\int_a^b v\,dv}.$$

A still more specialized form is obtained by letting $a = 0$ and $b = 1$, which gives:

$$p = \int_\alpha^\beta v\,dv \qquad \text{and} \qquad Q = \frac{\int_\alpha^\beta v\,dv}{\int_0^1 v\,dv}.$$

The above formulas may perhaps be made more intelligible to the reader by a geometrical illustration. Let the various productive probabilities, $v$, be plotted along the $X$ axis in a Cartesian coordinate system in the interval from $a$ to $b$ $(a < b)$. To any one of these probabilities, say $v_r$, there corresponds a certain probability of existence, $u_r$, represented by a $Y$ ordinate. In the same manner the next following productive probability, $v_{r+1}$, will have a probability of existence represented by an ordinate $u_{r+1}$. It is now possible to represent the various $u$'s by means of areas instead of line ordinates. Thus the probability of existence, $u_r$, is in the figure represented by the small shaded rectangle, with a base equal to $v_{r+1} - v_r = \Delta v_r$ and an altitude of $u_r$, the total area being equal to $u_r\,\Delta v_r$. That this is so follows from the well-known elementary theorem of geometry that areas of rectangles with equal bases are directly proportional to their altitudes. The sum of the different $u$'s is thus in the figure represented as the sum of the areas of the various small rectangles in the staircase-shaped histograph. Now, according to our assumption, $u$ is a continuous function in the interval from $a$ to $b$. We may, therefore, divide this interval, $b - a$, into $n$ smaller equal intervals. Let

$$v_{r+1} - v_r = \Delta v_r = \frac{b-a}{n}$$

be one of these smaller divisions. By choosing $n$ sufficiently large, $(b-a)/n$, or $\Delta v$, becomes a very small quantity, and by letting $n$ approach infinity as a limiting value we have:

$$\lim_{n\to\infty}\Sigma\,u\,\Delta v = \int_a^b u\,dv.$$
In this case the histograph is replaced by a continuous curve, and $u\,dv$ is the probability of existence that the productive probability is enclosed between $v$ and $v + dv$.¹ The probability of getting $E$ from any one of the complexes is evidently given by the total area of the small rectangles, or, in the continuous case, by means of the integral:

$$\int_a^b uv\,dv.$$

¹ A more rigorous analysis would be as follows: We plot along the abscissa axis intervals of the length $\epsilon$, so that the middle of each interval has a distance from the origin equal to an integral multiple of $\epsilon$. If now $\epsilon$ is chosen sufficiently small, we may regard the probability of existence, $u$, for values of the variable $v$ between $r\epsilon - \tfrac12\epsilon$ and $r\epsilon + \tfrac12\epsilon$ as a constant, and the probability that $v$ falls between the limits $r\epsilon - \tfrac12\epsilon$ and $r\epsilon + \tfrac12\epsilon$ may hence be expressed as $\epsilon u_r$. When $\epsilon$ approaches 0 as a limiting value this expression becomes $u\,dv$. See the similar discussion under frequency curves.

In the same way the probability that $E$ originated from any of the complexes between $\alpha$ and $\beta$ is:

$$\frac{\int_\alpha^\beta uv\,dv}{\int_a^b uv\,dv}.$$

The special case $a = 0$ and $b = 1$ needs no further commentary. We are now in a position to consider the examples of Bing and Kroman. Any student familiar with multiple integration will find no difficulty in the following analysis. For the benefit of readers to whom the evaluation of the various integrals may seem somewhat difficult, we may refer to the addenda at the close of this treatise or to any standard treatise on the calculus as, for instance, Williamson's " Integral Calculus."

48. Example 24. — An urn contains a very large number of similarly shaped balls. In 10 successive drawings (with replacements) we have obtained 7 balls with the number 1, 2 with the number 2, and one with the number 3. What is the probability of obtaining a ball with another number in the following drawing? We must here distinguish between 4 kinds of balls, namely balls marked 1, 2, 3, or " other balls."
A general scheme of distribution of the balls in the urn may be given as follows:

$nx$ balls marked with the number 1,
$ny$ balls marked with the number 2,
$nz$ balls marked with the number 3, and
$nt = n(1 - x - y - z)$ other balls.

Here $x$, $y$, $z$ and $t$ represent the respective productive probabilities. If we now let all such probabilities assume all possible values between 0 and 1, with intervals of $1/n$, we obtain the possible conditions in the total complex of actions. Each of these conditions has a probability of existence, $s$, and the productive probabilities $x$, $y$, $z$, and $1 - x - y - z$. The original probability of 7 ones, 2 twos and 1 three in 10 drawings is:

$$\frac{10!}{7!\,2!\,1!}\,x^7 y^2 z.$$

Now, when $n$ is a very large number, the interval $1/n$ becomes a very small quantity, and we may approximately write $s = u\,dx\,dy\,dz$, and also write the above sum as a triple integral:

$$\frac{10!}{7!\,2!\,1!}\int_0^1\!\!\int_0^{p}\!\!\int_0^{q} u\,x^7 y^2 z\,dx\,dy\,dz,$$

where $p = 1 - x$ and $q = 1 - x - y$. If now the above event has happened, then the probability of getting a differently marked ball in the 11th drawing is:

$$Q = \frac{\displaystyle\int\!\!\int\!\!\int u\,x^7 y^2 z\,(1 - x - y - z)\,dx\,dy\,dz}{\displaystyle\int\!\!\int\!\!\int u\,x^7 y^2 z\,dx\,dy\,dz}.$$

It is, however, quite impossible to evaluate the above integral without knowing the form of the function $u$; but unfortunately our information at hand tells us absolutely nothing in regard to this. Perhaps the balls bear the numbers 1, 2 and 3 only, or perhaps there is an equal distribution up to 10,000 or any other number. Our information is really so insufficient that it is quite hopeless to attempt a calculation of the a posteriori probability. Many adherents of the inverse probability method venture, however, boldly forth with the following solution, based upon the perfectly arbitrary hypothesis that all the $u$'s are of equal magnitude. This gives the special integral:

$$Q = \frac{\displaystyle\int\!\!\int\!\!\int x^7 y^2 z\,(1 - x - y - z)\,dx\,dy\,dz}{\displaystyle\int\!\!\int\!\!\int x^7 y^2 z\,dx\,dy\,dz},$$

where once more it must be remembered that $x + y + z \le 1$.
In this case the limits of $x$ are 0 and 1, those of $y$ are 0 and $1-x$, and those of $z$ are 0 and $1-x-y$. This is a well-known form of the triple integral which may be evaluated by means of Dirichlet's Theorem:

$$\int\!\!\int\!\!\int x^{l-1}\,y^{m-1}\,z^{n-1}\,(1 - x - y - z)^{h}\,dx\,dy\,dz = \frac{\Gamma(l)\,\Gamma(m)\,\Gamma(n)\,\Gamma(h+1)}{\Gamma(l+m+n+h+1)}$$

(see Williamson's Calculus). Remembering the well-known relation between gamma functions and factorials, viz., $\Gamma(n+1) = n!$, we find, by a mere substitution in the integral, the value of the probability in question to be $1 : 14$.

Another and equally plausible result is obtained by a slightly different wording of the problem. Ten successive drawings have resulted in balls marked 1, 2, or 3. What is the probability of obtaining a ball not bearing such a number in the 11th drawing? This probability is given by the formula:

$$Q = \frac{\int_0^1 v^{10}(1 - v)\,dv}{\int_0^1 v^{10}\,dv} = 1 : 12,$$

quite a different result from the one given above.

49. Example 25 — Bing's Paradox. — A still more astonishing paradox is produced by Bing when he applies Bayes's Rule to a problem from mortality statistics. A mortality table gives the ratio of the number of persons living during a certain period to the number living at the beginning of the period, all persons being of the same age. By recording the deaths during the specified period (say one year) it has been ascertained that of $s$ persons, say forty years of age at the beginning of the period, $m$ have died during the period. The observed ratio is then $(s-m)/s$. If $s$ is a very large number this ratio may (as we shall have occasion to prove at a later stage) be taken as an approximation of the true ratio of probability of survival during the period. If $s$ is not sufficiently large, the believers in the inverse theory ought to be able to evaluate this ratio by an application of Bayes's Rule, by means of an analysis similar to the following: Let $y$ be the general symbol for the probability of a forty-year-old person being alive one year from hence.
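Before taking up Bing's analysis, it may be remarked that the two valuations of Example 24 are easy to confirm numerically from the gamma-function relation; the following brief check is my own, using only Python's standard library:

```python
from math import gamma, isclose

# First wording: Dirichlet's theorem applied to x^7 y^2 z (1 - x - y - z) in the
# numerator and x^7 y^2 z in the denominator; the ratio reduces to 1/14.
num = gamma(8) * gamma(3) * gamma(2) * gamma(2) / gamma(15)
den = gamma(8) * gamma(3) * gamma(2) * gamma(1) / gamma(14)
print(num / den)  # 1/14

# Second wording: ratio of the integrals of v^10 (1 - v) and v^10 over (0, 1).
q = (1/11 - 1/12) / (1/11)
print(q)          # 1/12
```

The discrepancy between the two computed values is precisely the ambiguity the text is pointing out: both rest on the same " equal distribution of ignorance," applied to differently framed hypotheses.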
Each of such persons will in general be subject to different conditions, and the general symbol, $y$, will therefore have to be understood as the symbol for all the possible productive probability values, changing from 0 to 1 by a continuous process. Assuming $s$ a very large number, each condition will have a probability of existence equal to $u\,dy$. We may now ask: What is the probability that the rate of survival of a group of $s$ persons aged 40 is situated between the limits $\alpha$ and $\beta$? The answer, according to Bayes's Rule, is:

$$\frac{\int_\alpha^\beta y^{s-m}(1-y)^m\,u\,dy}{\int_0^1 y^{s-m}(1-y)^m\,u\,dy}. \tag{I}$$

Let us furthermore divide the whole year into two equal parts, and let $y_1$ be the probability of surviving the first half year, $y_2$ the probability of surviving the second half, and $u_1\,dy_1$, $u_2\,dy_2$ the corresponding probabilities of existence. Then the respective a posteriori probabilities for $y_1$ and $y_2$ are:

$$\frac{y_1^{\,s-m_1}(1-y_1)^{m_1}\,u_1\,dy_1}{\int_0^1 y_1^{\,s-m_1}(1-y_1)^{m_1}\,u_1\,dy_1} \qquad \text{and} \qquad \frac{y_2^{\,s-m}(1-y_2)^{m_2}\,u_2\,dy_2}{\int_0^1 y_2^{\,s-m}(1-y_2)^{m_2}\,u_2\,dy_2} \qquad (m_1 + m_2 = m),$$

where $m_1$ and $m_2$ represent the number of deaths in the respective half years. The probability that both $y_1$ and $y_2$ are true is then, according to the multiplication theorem:

$$\frac{y_1^{\,s-m_1}(1-y_1)^{m_1}\,y_2^{\,s-m}(1-y_2)^{m_2}\,u_1 u_2\,dy_1\,dy_2}{\int_0^1 y_1^{\,s-m_1}(1-y_1)^{m_1}\,u_1\,dy_1\,\int_0^1 y_2^{\,s-m}(1-y_2)^{m_2}\,u_2\,dy_2},$$

where $y = y_1 y_2$. The probability that the probability of survival for a full year, $y$, is situated between the limits $\alpha$ and $\beta$ is therefore:

$$\frac{\displaystyle\int\!\!\int y_1^{\,s-m_1}(1-y_1)^{m_1}\,y_2^{\,s-m}(1-y_2)^{m_2}\,u_1 u_2\,dy_1\,dy_2}{\displaystyle\int_0^1 y_1^{\,s-m_1}(1-y_1)^{m_1}\,u_1\,dy_1\,\int_0^1 y_2^{\,s-m}(1-y_2)^{m_2}\,u_2\,dy_2}, \tag{II}$$
where the limits in the double integral in the numerator are determined by the relation $\alpha \le y_1 y_2 \le \beta$.

Choosing the principle of insufficient reason as the basis of our calculations, merely assuming that all possible events are, in the absence of any grounds for inference, equally likely, the various quantities expressed by the general symbol, $u$, become equal and constant and cancel each other in numerator and denominator, which brings the a posteriori probabilities expressed by (I) and (II) to the forms:

$$\frac{\int_\alpha^\beta y^{s-m}(1-y)^m\,dy}{\int_0^1 y^{s-m}(1-y)^m\,dy} \tag{III}$$

and

$$\frac{\displaystyle\int\!\!\int y_1^{\,s-m_1}(1-y_1)^{m_1}\,y_2^{\,s-m}(1-y_2)^{m_2}\,dy_1\,dy_2}{\displaystyle\int_0^1 y_1^{\,s-m_1}(1-y_1)^{m_1}\,dy_1\,\int_0^1 y_2^{\,s-m}(1-y_2)^{m_2}\,dy_2}, \tag{IV}$$

where the limits in the numerator of the latter expression are determined by the relation $\alpha < y_1 y_2 < \beta$. Letting

$$y_2 = \frac{y}{y_1} \qquad \text{and then} \qquad 1 - y_1 = z(1 - y),$$

this latter expression may, after a simple substitution, be brought to the form:

$$\frac{\displaystyle\int_\alpha^\beta\!\!\int_0^1 y^{\,s-m}(1-y)^{m+1}\,\frac{z^{m_1}(1-z)^{m_2}}{1 - z(1-y)}\,dz\,dy}{\displaystyle\int_0^1 y_1^{\,s-m_1}(1-y_1)^{m_1}\,dy_1\,\int_0^1 y_2^{\,s-m}(1-y_2)^{m_2}\,dy_2} \tag{V}$$

(see appendix). Mr. Bing now puts the further question: What is the probability that a new person forty years of age, entering the original large group of $s$ persons, will survive one year, when we assume $m_1 = m_2 = 0$? (III) gives the answer:

$$\frac{\int_0^1 y^{s+1}\,dy}{\int_0^1 y^{s}\,dy} = \frac{s+1}{s+2}.$$

Formula (V), on the other hand, gives us:

$$\frac{\displaystyle\int_0^1\!\!\int_0^1 y^{\,s+1}(1-y)\,\frac{dz\,dy}{1 - z(1-y)}}{\displaystyle\int_0^1 y_1^{\,s}\,dy_1\,\int_0^1 y_2^{\,s}\,dy_2} = \left(\frac{s+1}{s+2}\right)^2.$$

As the above analysis is perfectly general, we might equally well have applied it to each of the semi-annual periods, which would give us an a posteriori probability of survival equal to $\left(\dfrac{s+1}{s+2}\right)^2$ for each half year, or a compound probability of $\left(\dfrac{s+1}{s+2}\right)^4$ for the whole year. Extending this process, it is easily seen that by dividing the year into $n$ parts we shall have $\left(\dfrac{s+1}{s+2}\right)^n$ as the final probability a posteriori that a forty-year-old person will reach the age of forty-one.
By letting $n$ increase indefinitely, the above quantity approaches 0 as its limiting value, and we thus obtain the paradox of Bing: If, among a large group of $s$ equally old persons, we have observed no deaths during a full calendar year, then another person of the same age outside the group is sure to die inside the calendar year.

This is evidently a very strange result, and yet, working on the basis of the principle of insufficient reason, the mathematical deductions and formulas exhibit no errors. Mr. Bing disposes of the whole matter by simply denying the validity and existence of a posteriori probabilities. Dr. Kroman, on the other hand, defends Bayes's Rule. " Mathematics," Kroman says, " is — as Huxley has justly remarked — an exceedingly fine millstone, but one must not expect to get wheat flour after having put oats in the quern." According to the Danish scholar the paradox is due to the use of a wrong formula. We ought to have used the general formula (II) instead of formula (V), which is a special case. In the general formula we encounter the functions $u$, denoting the probabilities of existence of the various productive probabilities $y$. As we do not know anything about this function $u$, it is hopeless to attempt a calculation. This brings the criticism down to the fundamental question whether we shall build the theory of probabilities on the principle of " cogent reason " or the principle of " insufficient reason."

50. Conclusion. — Contradictory results of a kind similar to the ones given above have led several eminent mathematicians to a complete denunciation of the laws underlying a posteriori probabilities. Professor Chrystal, especially, becomes extremely severe in his criticism in the previously mentioned address before the Actuarial Society of Edinburgh. He advises " practical people like the actuaries, much though they may justly respect Laplace, not to air his weaknesses in their annual examinations.
The indiscretions of a great man should be quietly allowed to be forgotten." Although one may heartily agree with Professor Chrystal's candid attack on the belief in authority too often prevailing among mathematical students, I think — aside from the fact that the rule was originally given by Bayes — that the great French savant has been accused unjustly, as the following remarks perhaps may tend to show.

In our statement of Bayes's Rule we followed an exact mathematical method, and the final formula (I) is theoretically as correct as any previously demonstrated in this work. The customary definition of a mathematical probability, as the ratio of equally favorable to coordinated possible cases, is not done away with in this new kind of probabilities; the former are found in the numerator and the latter in the denominator; and if we take care that each of the particular formulas, with its definite requirements, is applied to its particular case, we do not go beyond pure mathematics or logic. But are we able to get complete and exact information about these requirements? In the example of the tossing of a coin with two heads this information was at hand. Here we were able to enumerate exactly the different mutually exclusive causes from which the observed event originated. We were also able to determine the exact quantitative measures for the probabilities, $\kappa$, that these complexes existed, as well as the different productive probabilities, $\omega$. Here the most rigid requirements could be satisfied, and the rule therefore gave a true answer.

In the other examples we encountered a different state of affairs. Here we were not able to enumerate directly the different complexes of causes from which the event originated, but were forced to form different and arbitrary hypotheses about the complexes of origin, $F$, and each hypothesis gave, in general, a different result.
Furthermore, we assumed a priori that the different probabilities of the actual existence of the complexes were all equal in magnitude, and it was, therefore, the special formula (II) we employed in the determination of the a posteriori probabilities. In this formula the different $\kappa$'s do not enter at all as a determining factor; only the productive probabilities, $\omega$, are considered. The assumption that all the $\kappa$'s are equal in magnitude is based upon the principle of insufficient reason, or, as Boole calls it, " the equal distribution of ignorance." The principle of equal distribution of ignorance makes, in the case of continuously varying productive probabilities, $v$, the function, $u$, of the probabilities of existence of the various complexes equal to a constant quantity. In other words, the curve in Fig. 1 is replaced by a straight line of the form $u = k$.

Now, as a matter of fact, we possess in most cases some partial knowledge of the complexes of action producing the event in question. This partial knowledge — although far from complete enough to permit a rigorous use of formula (I) — is nevertheless sufficient to justify us in discarding completely any general hypothesis assuming such simple conditions as the above. Such partial knowledge is, for instance, found in the Paradox of Bing. Here the rather absurd hypothesis was made that the possible values of the probability of surviving a certain period were equally probable; in other words, that it is equally probable that 0, 1, 2, $\cdots$, or $s$ persons will die in the particular period. " Common sense, however, tells us that it is far more probable that, for instance, 90 per cent. of a large number of forty-year-old persons will survive the period than that no one or every one will die in the same period " (Kroman). The indiscreet use of formula (II) therefore naturally leads to paradoxical results.
On the other hand, the fallacy of the happy-go-lucky computers employing the special case (II) of Bayes's Rule, as well as of the critics of Laplace, lies in their failure to make a proper distinction between "equal distribution of ignorance" and "partial cogent reason," which latter expression properly may be termed "an unequal distribution of ignorance." If, despite the actual presence of such unequal distribution of ignorance, we still insist on using the special formula (II), which is only to be used in the case of an equal distribution of ignorance, it is no wonder we encounter ambiguous answers. Not the rule itself, its discoverer, or Laplace, but the indiscreet computer is the one to blame. Messrs. Bing, Venn and Chrystal, in their various criticisms, have filled the quern with some rather "wild oats" and expected to get wheat flour; and that one of those critics in his disappointment at not getting the expected flour should blame Laplace is hardly just. So much for the principle of "equal distribution of ignorance." It may be of interest to see how matters turn out when we, like von Kries, insist upon the principle of "cogent reason" as the true basis of our computations. The reader will quite readily see that a rigorous application of the Rule of Bayes in its most general form, as given by formula (I), really tacitly assumes this very principle. In formula (I) we require not alone an exact enumeration of the various complexes from which the observed event may originate, but also exact and complete information about the structure of such complexes in order to evaluate their various probabilities of existence. If such information is present, we can meet even the most stringent requirements of the general formula, and we will get a correct answer. But in the vast majority of cases, not to say all cases, such information is not at hand, and any attempt to make a computation by means of Bayes's Rule must be regarded as hopeless.
We may, however, again remark that very seldom are we in complete ignorance of the conditions of the complexes, which is the same thing as saying that we are not in a position to employ the principle of equal distribution of ignorance in a rigorous manner. From other experiments on the same kind of event, or from other sources, we may have attained some partial information, even if insufficient to employ the principle of cogent reason. Is such information now to be completely ignored in an attempt to give a reasonable, although approximate, answer? It is but natural that the mathematician should attempt to obtain as much of such information as possible and use it in the evaluation of the various probabilities of existence. Thus, for instance, if, in the Paradox of Bing, we had observed that the probability of survival for a forty-year-old person never had been below .75 and never above .95, it would be but reasonable to substitute those limits in their proper integrals in order to attain an approximate answer. To illustrate this somewhat subjective determination of an a posteriori probability, we take another example from the memoirs of Bing and Kroman.

Example (24). — A merchant receives a cargo of 100,000 pieces of fruit. If every single fruit is untainted, the value of the cargo may be put at 10,000 Kroner. On the other hand, any part of the cargo more or less tainted is considered worthless. The merchant has never before received a similar cargo and does not know how the fruit has been affected by travel. As samples he has selected 30 pieces picked at random from the cargo, and all samples proved to be fresh. He asks a mathematician what value he can put on the cargo. If the mathematician uses the special formula (II), assuming an equal distribution of ignorance — therefore assuming that it is equally probable that, for example, none, 5,000 or all the individual pieces of fruit were untainted — the answer is:

$$10{,}000 \cdot \frac{\int_0^1 v^{31}\,dv}{\int_0^1 v^{30}\,dv} = 10{,}000 \cdot \frac{31}{32} = 9687.5 \text{ Kroner.}$$
If we use the true rule, the a posteriori probability of the wholesomeness of the cargo is given by the quotient of integrals:

$$\frac{\int_0^1 v^{31}\,u\,dv}{\int_0^1 v^{30}\,u\,dv},$$

where v is the general expression for a possible probability of wholesomeness between 0 and 1, and u dv the corresponding probability of existence. Now if the mathematician has no complete information as to this particular function, u, it would be foolish of him to attempt a calculation, since the hypothesis of an equal probability of existence for all possible values of v evidently gives an arbitrary and perhaps a very erroneous result. On the other hand, the computer may possibly have access to some partial information. Perhaps the merchant has received fruit of a similar kind, or heard about cargoes of this particular kind of fruit received by other dealers. If now the merchant were able to inform the computer that in a great number of similar cases the probability of wholesomeness had been between 0.9 and 1 with an approximately even distribution, while it never had been below 0.9, then nothing would hinder the mathematician from presenting the following computation:

$$\frac{\int_{0.9}^1 v^{31}\,dv}{\int_{0.9}^1 v^{30}\,dv} = 0.9726,$$

and telling the merchant that on the basis of the information given 9,726 Kroner would be a fair price for the cargo. This is really the point of view taken by the English mathematician, Professor Karl Pearson, one of the ablest writers on mathematical statistics of the present time, when he says: "I start, as most writers on mathematics have done, with 'the equal distribution of ignorance,' or I assume the truth of Bayes's Theorem. I hold this theorem not as rigidly demonstrated, but I think with Edgeworth that the hypothesis of the equal distribution of ignorance is, within the limits of practical life, justified by our experience of statistical ratios, which are unknown, i. e., such ratios do not tend to cluster markedly round any particular point."
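Both valuations in the example admit elementary closed forms, so they can be checked directly. The following sketch is not part of the original text; the function name and the use of exact rational arithmetic are choices of mine:

```python
from fractions import Fraction

def posterior_mean(lo, hi, n=30):
    """Mean of v over [lo, hi] weighted by the likelihood v^n (all n samples fresh):
    the quotient of the integrals of v^(n+1) and v^n, evaluated in closed form."""
    num = (Fraction(hi) ** (n + 2) - Fraction(lo) ** (n + 2)) / (n + 2)
    den = (Fraction(hi) ** (n + 1) - Fraction(lo) ** (n + 1)) / (n + 1)
    return num / den

# Equal distribution of ignorance over [0, 1]: 10,000 * 31/32
print(float(10000 * posterior_mean(0, 1)))                   # 9687.5 Kroner

# Partial information: v evenly distributed over [0.9, 1]
print(round(float(posterior_mean(Fraction(9, 10), 1)), 4))   # 0.9726
```

Both printed figures agree with the text's 9,687.5 and 9,726 Kroner.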
To sum up the above remarks: Theoretically Bayes's Rule is true. If we are able to enumerate and determine the probabilities of existence of the complexes of origin, it will also give true results in practice. If we are justified in assuming the principle of "insufficient reason" or "equal distribution of ignorance" as the basis for our calculations, formula (II) may be employed with exact results after a rigid enumeration of the complexes. If the principle of "cogent reason" is required as the basis, an exact computation is in general hopeless, and we can only, after having obtained partial subjective information, give an approximate answer. With these remarks we shall conclude the elementary discussion of the merely theoretical part of the subject. The following chapters require in most cases a knowledge of the infinitesimal calculus, and many of the questions discussed above will appear in a new and instructive light by this treatment.

CHAPTER VII.

THE LAW OF LARGE NUMBERS.

51. A Priori and Empirical Probabilities. — In the previous chapters we limited ourselves to the discussion of such mathematical probabilities where we, a priori, on account of our knowledge of the various domains or complexes of actions, were able to enumerate the respective favorable and unfavorable possibilities associated with the occurrence or non-occurrence of the event in question. "The real importance of the theory of probability in regard to mass phenomena consists, however, in determining the mathematical relations of the various probabilities not in a deductive, but in an empirical manner — without an a priori exhaustive knowledge of the mutual relations and actions between cause and effect — by means of statistical enumeration of the frequency of the observed event. The conception of a probability finds its justification in the close relation between the mathematical probabilities and relative frequencies as determined in a purely empirical way.
This relation is established by means of the famous Law of Large Numbers" (A. A. Tschuprow). To return to our original definition of a mathematical probability as the ratio of the favorable to the coordinated equally possible cases, we first notice that this definition is wholly arbitrary, like many mathematical definitions. The contention of Stuart Mill that every definition contains an axiom is rather far fetched. In mathematics a definition does not necessarily need to be metaphysical. A striking example is offered in mechanics by the definitions of force as given by Lagrange and Kirchhoff. What is force? "Force," Lagrange says, "is a cause which tends to produce motion." Kirchhoff, on the other hand, tells us that force is the product of mass and acceleration. Lagrange's definition is wholly metaphysical. Whenever a definition is to be of use in a purely exact science such as mathematics, it must teach us how to measure the particular phenomena which we are investigating. Thus, to quote Poincaré, "it is not necessary that the definition tells us what force really is, whether it is a cause or the effect of motion." An analogous case is offered in the criticism of a mathematical probability as defined by Laplace, and the attempts to place the whole theory of probabilities on a purely empirical basis by Stuart Mill, Venn and Chrystal. These writers contend "that probability is not an attribute of any particular event happening on any particular occasion. Unless an event can happen, or be conceived to happen, a great many times, there is no sense in speaking of its probability." The whole attack is directed against the definition of a mathematical probability in a single trial, which definition evidently is regarded by the empiricists as having no sense. The word "sense" must evidently be considered as having a purely metaphysical meaning.
In the same manner Kirchhoff's definition might be dismissed as having no sense, since it would seem as difficult to conceive force as a purely mathematical product of two factors, mass and acceleration, as it is to conceive the definition of a mathematical probability as a ratio. The metaphysical trend of thought of the above writers is shown in their various definitions of the probability of an event. Mill defines it merely as the relative frequency of happenings inside a large number of trials, and Venn gives a similar definition, while Chrystal gives the following: "If, on taking any very large number N out of a series of cases in which an event, E, is in question, E happens on pN occasions, the probability of the event, E, is said to be p." Let us, for a moment, look more closely into these statements. Any definition, if it bears its name rightly, must mean the same to all persons. Now, as a matter of fact, the vagueness in a half metaphorical term like "any very large number" illustrates its weakness. The question immediately confronts us: "What is a very large number?" Is it 100, 1,000 or perhaps 1,000,000? A fixed universal standard for the value of N seems out of the question, and the definition — although perhaps readily grasped in a "general way" — can hardly be said to be happily chosen. Another, and perfectly rigorous, definition is the following one given by the Danish astronomer and actuary, T. N. Thiele. Thiele tells us that "common usage" has assigned the word probability as the name "for the limiting value of the relative frequency of an event, when the number of observations (trials), under which the event happens, approaches infinity as a limit." A similar definition is given later on by the American actuary R.
Henderson, who says: "The numerical measure which has been universally adopted for the probability of an event under given circumstances is the ultimate value, as the number of cases is indefinitely increased, of the ratio of the number of times the event happens under those circumstances to the total possible number of times." There is nothing ambiguous or vague in these definitions. Infinity, taken in a purely quantitative sense, has a perfectly uniform meaning in mathematics. The new definition differs, however, radically from our customary definition of a mathematical a priori probability. We cannot, therefore, agree with Mr. Henderson when he continues: "The measure there given has been universally adopted and this holds true in spite of the fact that the rule has been stated in ways which on their face differ widely from that above given. The one most commonly given is that if an event can happen in a ways and fail in b ways, all of which are equally likely, the probability of the event is the ratio of a to the sum of a and b. It is readily seen that if we read into this statement the meaning of the words 'equally likely,' this measure, so far as it goes, reduces to a particular case of that given above." In order to investigate this statement somewhat more closely, let us try to measure the probability of throwing head with an ordinary coin by both our old definition of a mathematical probability and the definition by Mr. Henderson of what we shall term an empirical probability. Denoting the first kind of probability by P(E) and the second by P'(E), we have in ordinary symbols:

$$P(E) = \tfrac{1}{2}, \qquad P'(E) = \lim_{v \to \infty} F(E, v),$$

where the symbol F(E, v) denotes the relative frequency of the event, E, in v total trials. No a priori knowledge will tell us offhand if P'(E) will approach ½ as its ultimate value. The two methods are radically different.
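The contrast between the two measures can be illustrated numerically. The sketch below is not part of the original text; the trial counts and the random seed are arbitrary choices of mine. It tosses a simulated fair coin and watches the empirical ratio F(E, v) settle toward the a priori value ½:

```python
import random

random.seed(1)

def relative_frequency(trials):
    """F(E, v): relative frequency of heads in `trials` simulated fair-coin tosses."""
    heads = sum(random.random() < 0.5 for _ in range(trials))
    return heads / trials

# The empirical ratio wanders, but drifts toward 1/2 as v grows.
for v in (100, 10_000, 1_000_000):
    print(v, relative_frequency(v))
```

No single finite run proves that the limit is ½; the simulation only exhibits the clustering that the Law of Large Numbers, proved later in the chapter, guarantees.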
By the first method the determination of the numerical measure of a probability depends simply on our ability to judge and segregate the equally possible cases into cases favorable and unfavorable to the event E. By the second method the determination of the probability depends not alone on the segregation and consequent enumeration of the favorable from the total cases, but chiefly on the extent of our observations or trials on the event in question.

52. Extent and Usage of Both Methods. — Before entering into a more detailed discussion of the actual quantitative comparison of the two methods, it might be of use to compare their various extents of usage. In this respect the empirical method is vastly superior to the a priori. A rigorous application of the a priori method, as far as concrete problems go, is limited to simple games of chance. As soon as we begin to tackle practical sociological or economic problems it leaves us in a helpless state. If we were to ask about the probability that a certain person forty years of age would die inside a year, it would be of little use to try to determine this in an a priori manner. Even a purely deductive process, as illustrated by Bayes's Rule in the earlier chapters, leads to paradoxical results. Our a priori knowledge of the complexes of causes governing death or survival is so incomplete that even a qualitative — not to speak of a quantitative — judgment is out of the question. The empirical method shows us at least a way to obtain a measure for the probability of the event in question. By observing during a period of a year an infinite number of forty-year-old persons, of whom, after an exhaustive qualitative investigation, we are led to believe that their present conditions as far as health, social occupation, environments, etc., are concerned are equally similar, we may by an enumeration of those who died during the year obtain the desired ratio as defined by P'(E).
Of course, observation of an infinite number of cases is practically impossible. An approximate ratio may be formed by taking a finite, but large, number of cases under observation. But how large a number? This very question leads straight to another problem, namely the quantitative determination of the range of variance between the approximate ratio and the ideal ultimate ratio as defined by the relation

$$P'(E) = \lim_{v \to \infty} F(E, v).$$

Since it is impossible to make an infinite number of observations, we cannot find the exact value of the range of such variations. But we may, however, determine the probability that this range does not exceed a certain fixed quantity, say λ, in absolute magnitude. Stated in compact form, our problem reduces to the following: to determine the probability of the existence of the inequality

$$\left| \lim_{v \to \infty} F(E, v) - \frac{a}{s} \right| \le \lambda,$$

where both a and s are finite numbers. This, to a certain extent, contains in a nutshell some of the most important problems in probabilities. The above problem may be solved in two distinct ways. The first, and perhaps the most logical, way is by a direct process. This is the method followed by T. N. Thiele in his "Almindelig Iagttagelseslære,"¹ published in Copenhagen, 1889, a most original work, which moves along wholly novel lines. Thiele distinguishes between (1) actual observation series as recorded from observation, in other words statistical data; (2) theoretical observation series, giving the conclusions as to the outcome of future observations; and (3) methodical laws of series, where the number of observations is increased indefinitely. By such a process, purely a theory of observations, the whole theory of probability becomes of secondary importance and rests wholly upon the theory of observed series, a fact thoroughly emphasized by Thiele himself.
When the author first, in the closing chapters of his book, makes use of the word probability, it is only because "common usage" has assigned this word as the name for the ultimate frequency ratio designated by our symbol $\lim_{v \to \infty} F(E, v)$. The problem may, however, be solved in an indirect way, which is the one I shall adopt. This method, as first consistently deduced by Laplace, has for its basis our original definition of a mathematical a priori probability and may be briefly sketched as follows: We first of all postulate the existence of an a priori probability as defined, although its actual determination, by a priori knowledge, is impossible except in a few cases, as, for instance, simple games of chance, drawing balls from urns, etc. Denoting such a probability by P(E), or p, we next ask: What will be the expected number, say a, of actual happenings of the event, E, expressed in terms of s and p, when we make s consecutive trials instead of a single trial, and what will be the number of happenings of E when s approaches infinity as its ultimate value? If such a relation is found between p, a and s, where p is the unknown quantity, we have also found a means of determining the value of p in known quantities. Our next question is: What is the probability that the absolute value of the difference between p and the relative frequency of the event, as expressed by the ratio of a to s, does not exceed a previously assigned quantity? Or, the probability that

$$\left| \frac{a}{s} - p \right| \le \lambda?$$

Now, as the reader will see later, we shall prove that

$$\lim_{v \to \infty} F(E, v) = P(E) = p.$$

It must, however, be remembered that this result is reached by a mathematical deduction, based upon the postulate of mathematical probabilities, and not in the manner suggested in the above statement by Mr. Henderson.

¹ English edition, "Theory of Observations," London, 1905.
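The probability asked for in the inequality above can itself be estimated empirically, ahead of the deduction that follows in later chapters. This sketch is not part of the original text; p, s, λ, the number of repetitions and the seed are illustrative values of mine:

```python
import random

random.seed(4)

# Estimate P(|a/s - p| <= lambda) by repeating the s-trial experiment many times.
p, s, lam, repetitions = 0.5, 1000, 0.02, 2000

hits = 0
for _ in range(repetitions):
    a = sum(random.random() < p for _ in range(s))  # happenings of E in s trials
    hits += abs(a / s - p) <= lam

print(hits / repetitions)  # estimated probability that a/s lies within lambda of p
```

For these parameters the estimate comes out near 0.8; the exact value is a sum of binomial terms, and Chapters IX ff. develop the approximations that replace such brute-force counting.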
It is only after having established such purely quantitative relations that we are entitled to extend the laws of mathematical probabilities, as deduced in the earlier chapters, to other problems than the simple problems of games of chance.

53. Average a Priori Probabilities. — In the previous paragraphs of this chapter, another important matter is to be noted, namely the assumption that the complex of causes producing the event in question remains constant during the repeated trials (observations), or, stated in other words, that the mathematical a priori probability remains constant. Under this limitation the extension of the laws of mathematical probabilities would have but a very limited practical application. In all statistical mass phenomena such an ideal state of affairs is rather a very rare exception. If we consider an ordinary mortality investigation, we know with absolute certainty that no two persons are identically alike as far as health, occupation, environment and numerous other things are concerned. Thus the postulated mathematical probability for death or survival during a whole calendar year will in general be different for each person. We may, however, conceive an average probability of survival for a full year defined by the relation

$$p_0 = \frac{p_1 + p_2 + p_3 + \cdots + p_s}{s} = \frac{\sum p}{s},$$

where p₁, p₂, p₃, … are the postulated probabilities of each individual under observation. Our task is now to find:

1. An algebraic relation between the average probability as defined above, the absolute frequency a and the total number of observations (trials) s;

2. The same relation when s approaches infinity as its ultimate value;

3. The probability of the existence of the inequality

$$\left| \frac{a}{s} - p_0 \right| \le \lambda,$$

where a denotes the absolute frequency of the occurrence of the event, s the total number of observations (trials) and λ an arbitrary constant.

54. The Theory of Dispersion.
— As we mentioned before, the empirical ratio a/s represents only an approximation of the ideal ultimate value of $\lim_{v \to \infty} F(E, v)$. If we now make a series of observations (trials) on the occurrence of a certain event E, such that instead of a single set of s individual observations we take N such sets, we shall have N relative frequency ratios:

$$\frac{a_1}{s},\ \frac{a_2}{s},\ \frac{a_3}{s},\ \cdots,\ \frac{a_N}{s}.$$

Since the ratios are approximations only of the ultimate ratio, they will in general exhibit discrepancies as to their numerical values and may be regarded as N different empirical approximations. The question now arises how these various empirical ratios group themselves around the value of $\lim_{v \to \infty} F(E, v)$. The distribution of the empirical ratios around the ultimate ratio is by Lexis called "dispersion."

55. Historical Development of the Law of Large Numbers. — The first mathematician to investigate the problems we have roughly outlined in the previous paragraphs was the renowned Jacob Bernoulli in the classic "Ars Conjectandi," which rightly may be classified as one of the most important contributions on the subject. Bernoulli's researches culminate in the theorem which bears his name and forms the corner-stone of modern mathematical statistics. That Bernoulli fully realized the great practical importance of these investigations is proven by the heading of the fourth part of his book, which runs as follows: "Artis Conjectandi Pars Quarta, tradens usum et applicationem praecedentis doctrinae in civilibus et oeconomicis." It is also here that we first encounter the terms "a priori" and "a posteriori" probabilities. Bernoulli's researches were limited to such cases where the a priori probabilities remained constant during the series or the whole sets of series of observations.
Poisson, a French mathematician, later treated in a series of memoirs the more general case where the a priori probabilities varied with each individual trial. He also introduced the technical term "Law of Large Numbers" ("Loi des Grands Nombres"). Finally Lexis, through the publication in 1877 of his brochure "Zur Theorie der Massenerscheinungen der menschlichen Gesellschaft," treated the dispersion theory and forged the closing link of the chain connecting the theory of a priori probabilities and empirical frequency ratios. Of late years the Russian mathematician Tchebycheff, the Scandinavian statisticians Westergaard and Charlier, and the Italian scholar Pizetti have contributed several important papers. It is on the basis of these papers that the following mathematical treatment is founded. In certain cases, however, we shall not attempt to enter too deeply into the theory of certain definite integrals, which is essential for a rigorous mathematical analysis, but which also requires an extensive mathematical knowledge which many of my readers, perhaps, do not possess. To readers interested in the analysis of the various integrals we may refer to the original works of Czuber and Charlier.

CHAPTER VIII.

INTRODUCTORY FORMULAS FROM THE INFINITESIMAL CALCULUS.

56. Special Integrals. — In the following chapters we shall attempt to investigate the theory of probabilities from the standpoint of the calculus. Although a knowledge of the elements of this branch of mathematics is presupposed to be possessed by the student, we shall for the sake of convenience briefly review and demonstrate a few formulas from the higher analysis of which we shall make frequent use in the following paragraphs. All such formulas have been given in the elementary instruction of the calculus, and only such readers who do not have this particular branch of mathematics fresh in memory from their school days need pay any serious attention to the first few paragraphs.

57.
Wallis's Expression for π as an Infinite Product. — We wish first of all to determine the value of the definite integral:

$$J_n = \int_0^{\pi/2} \sin^n x \, dx, \tag{1}$$

under the assumption that n is a positive integral number. This integral is geometrically equal to the area between the x axis, the axis of y, the ordinate corresponding to the abscissa ½π, and the graph of the function y = sinⁿ x. Letting u′ = Dₓu = sin x, v = sin^(n−1) x, we get by partial integration:

$$J_n = \Big[-\cos x \sin^{n-1} x\Big]_0^{\pi/2} + \int_0^{\pi/2} (n-1)\sin^{n-2} x \cos^2 x \, dx. \tag{2}$$

If we substitute the upper and lower limits in the first term on the right-hand side of the above expression for J_n, this term reduces to 0, assuming n > 1. Thus we have:

$$J_n = (n-1)\int_0^{\pi/2} \sin^{n-2} x \cos^2 x \, dx.$$

Putting cos² x = 1 − sin² x, we get:

$$J_n = (n-1)\int_0^{\pi/2} \sin^{n-2} x \, dx - (n-1)\int_0^{\pi/2} \sin^n x \, dx. \tag{3}$$

The last integral is, however, equal to J_n, and the first integral is, following the notation from (1), equal to J_{n−2}. We shall therefore have:

$$J_n + (n-1)J_n = (n-1)J_{n-2}, \quad \text{or} \quad nJ_n = (n-1)J_{n-2}. \tag{4}$$

Replacing n by n − 1, n − 2, n − 3, … successively, we get:

$$nJ_n = (n-1)J_{n-2}, \qquad (n-1)J_{n-1} = (n-2)J_{n-3}, \qquad (n-2)J_{n-2} = (n-3)J_{n-4}, \qquad \cdots$$

According as n is even or uneven, we shall have one of the following equations at the bottom of the recursion formula:

$$J_0 = \int_0^{\pi/2} \sin^0 x \, dx = \int_0^{\pi/2} dx = \tfrac{1}{2}\pi, \quad \text{or} \quad J_1 = \int_0^{\pi/2} \sin x \, dx = \Big[-\cos x\Big]_0^{\pi/2} = 1. \tag{5}$$

If, for even values of n, we let n = 2m, and, for uneven values, n = 2m − 1, we get finally the following recursion formulas:

$$2m\,J_{2m} = (2m-1)J_{2m-2}, \qquad (2m-1)J_{2m-1} = (2m-2)J_{2m-3},$$
$$(2m-2)J_{2m-2} = (2m-3)J_{2m-4}, \qquad (2m-3)J_{2m-3} = (2m-4)J_{2m-5},$$
$$\cdots$$
$$2J_2 = 1 \cdot J_0, \qquad 3J_3 = 2 \cdot J_1.$$

Successive multiplication of the above equations gives us finally:

$$J_{2m} = \frac{(2m-1)(2m-3)\cdots 1}{2m(2m-2)\cdots 2} \cdot \frac{\pi}{2}, \qquad J_{2m-1} = \frac{(2m-2)(2m-4)\cdots 2}{(2m-1)(2m-3)\cdots 3}.$$

We may now draw some very interesting conclusions from the above equations.
Both integrals represent geometrically areas bounded by the graphs of the functions y = sin^(2m) x and y = sin^(2m−1) x respectively. The difference of the ordinates of these graphs, namely

$$(\sin x - 1)\sin^{2m-1} x,$$

is evidently decreasing with increasing values of the positive integer m, since sin x lies between 0 and +1, and sin^(2m−1) x approaches the value 0 except for certain values of x. The larger we select m, the less is the difference of the two areas, and the ratio of the two integrals will therefore approach 1, or the expression

$$\frac{(2m-2)(2m-4)\cdots 2}{(2m-1)(2m-3)\cdots 3} \quad \text{approaches} \quad \frac{(2m-1)(2m-3)\cdots 1}{2m(2m-2)\cdots 2} \cdot \frac{\pi}{2}.$$

Hence:

$$\frac{\pi}{2} = \lim_{m \to \infty} \frac{2^2 \cdot 4^2 \cdot 6^2 \cdots (2m-2)^2 \cdot 2m}{1^2 \cdot 3^2 \cdot 5^2 \cdots (2m-3)^2 (2m-1)^2}.$$

Multiplying numerator and denominator with 2² · 4² · 6² ⋯ (2m − 2)², we get:

$$\frac{\pi}{2} = \lim_{m \to \infty} \frac{2^{4m-3}\, m\, [(m-1)!]^4}{[(2m-1)!]^2}, \quad \text{or} \quad \lim_{m \to \infty} \frac{2^{2m}(m!)^2}{(2m)!\sqrt{2m}} = \sqrt{\tfrac{1}{2}\pi}.$$

This is the formula originally discovered by the English mathematician, John Wallis (1616–1703), and by means of which π may be expressed as an infinite product.

58. De Moivre–Stirling's Formula. — We are now in a position to give a demonstration of Stirling's formula for the approximate value of n! for large values of n. A. de Moivre seems to have been the first to attempt this approximation. In the first edition of his "Doctrine of Chances" (1718) he reaches a result which must be regarded as final, except for the determination of an unknown constant factor. Stirling succeeded in completing this last step in his remarkable "Methodus Differentialis" (1730). In the second edition of the "Doctrine of Chances" (1738) de Moivre gives the complete formula with full credit to Stirling. He mentions as his belief that Stirling in his final calculation possibly has made use of the formula of Wallis. The demonstration by the older English authors is rather lengthy, and much shorter methods have been devised by later writers.
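The Wallis product derived in §57 converges slowly but is easy to verify numerically: squaring the last relation gives π/2 as the limit of the partial products (2·2)/(1·3) · (4·4)/(3·5) ⋯. The sketch below is not part of the original text; the function name and the number of factors are choices of mine:

```python
import math

def wallis(m):
    """Partial Wallis product with m factors: prod of (2k)^2 / ((2k-1)(2k+1)),
    which tends to pi/2 as m grows."""
    prod = 1.0
    for k in range(1, m + 1):
        prod *= (2 * k) * (2 * k) / ((2 * k - 1) * (2 * k + 1))
    return prod

print(2 * wallis(100_000))  # approaches pi, accurate here to about 5 decimals
```

The error of the m-factor partial product is of order 1/m, which is why the product is of theoretical rather than computational interest.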
Most authors make use of the Eulerian integral of the second order, by which any factorial may be expressed by a gamma function:

$$\Gamma(n+1) = \int_0^\infty x^n e^{-x}\,dx = n!.$$

Another method makes use of the well-known Euler Summation Formula from the calculus of finite differences. This method is of special interest to actuarial students, who frequently use the Eulerian formula in the computation of various life contingencies. For the benefit of those interested in this particular method we may refer to the treatises of Seliwanoff and Markhoff, two Russian mathematicians.¹ The Italian mathematician, Cesaro, has, however, derived the formula in a much simpler manner.² Cesaro starts with the inequalities:

$$e < \left(1 + \frac{1}{n}\right)^{n+1/2} < e^{1 + \frac{1}{12n(n+1)}}. \tag{I}$$

From a well-known theorem on logarithms we have:

$$\tfrac{1}{2}\log_e \frac{n+1}{n} = \frac{1}{2n+1} + \frac{1}{3(2n+1)^3} + \frac{1}{5(2n+1)^5} + \cdots,$$

which also may be written as follows:

$$y = \left(n + \tfrac{1}{2}\right)\log_e\left(1 + \frac{1}{n}\right) = 1 + \frac{1}{3(2n+1)^2} + \frac{1}{5(2n+1)^4} + \cdots.$$

If all the coefficients 3, 5, … are replaced by the number 3, we obtain a geometrical series. The summation of this infinite series shows that

$$1 < y < 1 + \frac{1}{12n(n+1)}.$$

If we let

$$u_n = \frac{n!\,e^n}{n^{n+1/2}}, \qquad u_{n+1} = \frac{(n+1)!\,e^{n+1}}{(n+1)^{n+3/2}},$$

then

$$\frac{u_n}{u_{n+1}} = \frac{(1 + 1/n)^{n+1/2}}{e}.$$

Dividing the quantities in (I) by e, we have:

$$1 < \frac{u_n}{u_{n+1}} < e^{\frac{1}{12n(n+1)}}. \tag{II}$$

The exponent of e may be written as follows:

$$\frac{1}{12n(n+1)} = \frac{1}{12n} - \frac{1}{12(n+1)}.$$

Making use of this relation, (II) may be written in the following form:

$$1 < \frac{u_n}{u_{n+1}} < \frac{e^{-1/12(n+1)}}{e^{-1/12n}}.$$

Denoting the quantity u_n · e^(−1/12n) by u_n′, we shall have two monotone number sequences:

$$u_1, u_2, u_3, \cdots, u_n, u_{n+1}, \cdots,$$
$$u_1', u_2', u_3', \cdots, u_n', u_{n+1}', \cdots.$$

These two sequences show some very remarkable features. With increasing values of n the values of u_n decrease, or the sequence is a monotone decreasing number sequence.

¹ Seliwanoff, "Lehrbuch der Differenzenrechnung," Leipzig, 1905, pages 59–60; Markhoff, "Differenzenrechnung," Leipzig, 1898.

² Cesaro, "Corso di analisi algebrica," Torino, 1884, pages 270 and 480.
The values of u_n′ become larger when n is increased, and form therefore a monotone increasing number sequence. But any member of this latter sequence satisfies, however, the inequality

$$u_n' < u_n.$$

Since both number sequences are situated in a finite interval, it follows from the well-known theorem of Weierstrass that they both have a clustering point, i. e., a point in whose immediate region an infinite number of points of the sequence are located. Denoting this point of cluster by a, we have here an increasing and a decreasing monotone sequence which both converge towards a, or:

$$\lim_{n \to \infty} u_n = \lim_{n \to \infty} u_n' = a.$$

If we now let lim u_n′ = lim u_n · e^(−1/12n) = a, then we shall have for every finite value of n:

$$u_n e^{-1/12n} < a < u_n, \quad \text{or} \quad a = u_n e^{-\theta/12n} \qquad (0 < \theta < 1).$$

This gives us finally the following expression for n!:

$$n! = a \cdot n^{n+1/2} \cdot e^{-n} \cdot e^{\theta/12n}. \tag{III}$$

In this expression we need only determine the unknown coefficient a. The formula of Wallis gives immediately:

$$\lim_{n \to \infty} \frac{(2 \cdot 4 \cdot 6 \cdots 2n)^2}{(2n)!\sqrt{2n}} = \lim_{n \to \infty} \frac{2^{2n}(n!)^2}{(2n)!\sqrt{2n}} = \sqrt{\tfrac{1}{2}\pi}.$$

Substituting in this latter expression the value for the factorials as found in (III), and neglecting the quantity θ/12n, we have after a few reductions:

$$\frac{a}{2} = \sqrt{\tfrac{1}{2}\pi}, \quad \text{or} \quad a = \sqrt{2\pi},$$

from which we easily obtain De Moivre–Stirling's Formula in its final form:

$$n! = \sqrt{2\pi} \cdot n^{n+1/2} \cdot e^{-n}.$$

This remarkable approximation formula gives even for comparatively small values of n surprisingly accurate results. Thus, for instance, we have:

$$10! = 3{,}628{,}800; \qquad \sqrt{2\pi} \cdot 10^{10.5} \cdot e^{-10} = 3{,}598{,}696.$$

CHAPTER IX.

LAW OF LARGE NUMBERS. MATHEMATICAL DEDUCTION.

59. Repeated Trials. — Let us consider a general domain of action wherein the determining causes remain constant and produce either one or the other of the opposite and mutually exclusive events, E and Ē, with the respective a priori probabilities p and q (q = 1 − p) in a single trial.
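The closing numerical comparison is easily reproduced. This sketch is not part of the original text; the function name is mine:

```python
import math

def stirling(n):
    """De Moivre-Stirling approximation: n! ~ sqrt(2*pi) * n^(n + 1/2) * e^(-n)."""
    return math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)

# Exact 10! versus the approximation: 3,628,800 vs about 3,598,696.
print(math.factorial(10), round(stirling(10)))
```

The relative error, roughly e^(θ/12n) − 1 ≈ 1/(12n), is under one per cent. already at n = 10, which is what makes the formula so useful in the binomial approximations of the next chapter.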
The trial (observation) will, however, be repeated s times with the explicit assumption that the outward conditions influencing the different trials remain unaltered during each observation. The simplest example of observations of this kind is offered by repeated drawings of balls from an urn containing white and black balls only, where the ball is put back in the urn and mixed thoroughly with the rest before the next drawing takes place. We keep now a record of the repetitions of the opposite events, E and Ē, during the s trials, irrespective of the order in which these two events may happen. This record must necessarily be of one of the following forms:

E happens s times, Ē 0 times,
E happens s − 1 times, Ē 1 time,
E happens s − 2 times, Ē 2 times,
···························
E happens 0 times, Ē s times.

In Chapter IV, Example 17, we showed that the probabilities of the above combinations of the two events, E and Ē, were determined by the expansion of the binomial (p + q)^s. The general term

s!/(α!β!) · p^α q^β

is the probability P(E^α Ē^β) that E will happen α times and Ē β times in the s total trials. Each separate term of the binomial expansion of (p + q)^s represents the probability of the happening of the two events in the order given in the above scheme.

60. Most Probable Value. — In dealing with these various terms, it has usually been the custom of the English and French mathematicians, as well as of many German scholars, to pay particular attention to a special term, the maximum term, which generally is known as the "most probable value" or the "mode." Russian and Scandinavian writers and the followers of the Lexis statistical school of Germany have preferred to make another quantity, known as the "probable" or "expected value," the nucleus of their investigations. Although it is our intention to follow the latter method, we shall discuss first, briefly, the most probable value. Two questions are then of special interest to us: (1) What particular event is most probable to happen?
(2) What is the probability that an event will occur whose probability does not differ from that of the most probable event by more than a previously fixed quantity?

Neither of the two questions offers any particular difficulties of principle from a theoretical point of view. Regarding the probability P(E^α Ē^β), which we shall denote by T, as a function of the variable quantity α, T evidently will reach a maximum value for a certain value of α (β = s − α), and we need only determine the greatest term in the above binomial expansion. In order to answer the second question we have only to pick out all the terms which are situated between the two fixed limits. Their sum is then the probability that those two limits are not exceeded.

61. Simple Numerical Examples. — When s is a comparatively small number the actual expansion may be performed by simple arithmetic. We shall, for the benefit of the student, give a simple example of this kind.

A pair of dice is thrown 4 times in succession, to investigate the chance of throwing doublets. In a single throw the probability of getting a doublet is p = 1/6 (q = 5/6). Expanding (1/6 + 5/6)⁴ by means of the binomial theorem we get

(1/6)⁴ + 4(1/6)³(5/6) + 6(1/6)²(5/6)² + 4(1/6)(5/6)³ + (5/6)⁴.

Each of the above terms represents the probability of the occurrence of the various combinations of doublets (E) and non-doublets (Ē), and it is readily seen that the event of getting no doublets at all, represented by the last term, (5/6)⁴ = 0.4823, has the greatest probability. In other words it is the most probable event. Let us next repeat the trial 12 times instead of 4.
The 13 possible probabilities for the various combinations of doublets and non-doublets will then be expressed by the respective terms in the expansion of

(1/6 + 5/6)¹².

The 13 members have as their common denominator the quantity 2,176,782,336 and as numerators the following quantities: 1; 60; 1,650; 27,500; 309,375; 2,475,000; 14,437,500; 61,875,000; 193,359,375; 429,687,500; 644,531,250; 585,937,500; 244,140,625. This shows that the most probable combination is the one of 2 doublets and 10 non-doublets, having a numerical value equal to .2961.

A further comparison will show that the most probable event in the second series had the probability .2961, whereas .4823 was its value in the first series. In other words, the probability of the most probable event decreases when the trials (observations) are increased. This is due to the fact that the total number of possible cases becomes large with the increase of experiments.

Another question which presents itself in this connection is the following: What is the probability that an event will occur whose probability does not differ from the most probable value by more than a previously fixed quantity? Let us suppose we were asked to determine the probability that a doublet does not occur oftener than 6 times and not less than 1 time in 12 trials. This probability is found by adding the numerical values of the probabilities as given in the binomial expansion, from the term containing p = 1/6 raised to the sixth power to the term containing p raised to the first power, or

(14,437,500 + 61,875,000 + 193,359,375 + 429,687,500 + 644,531,250 + 585,937,500) / 2,176,782,336.

62. The Most Probable Value in a Series of Repeated Trials. — In the examples just given we determined the probability for the happening of the most probable event in a series of s observations by a direct expansion of the binomial (p + q)^s. This may be done whenever s is a comparatively small number.
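For small s this direct expansion is easily carried out by machine as well as by hand. The following sketch (the function name is ours) reproduces the thirteen numerators, the most probable combination, and the sum just computed:

```python
from fractions import Fraction
from math import comb

def binomial_terms(s, p):
    """All s+1 terms of (p + q)**s: term k is the probability of exactly k successes."""
    q = 1 - p
    return [comb(s, k) * p**k * q**(s - k) for k in range(s + 1)]

# exact probabilities of k doublets in 12 throws of a pair of dice
probs = binomial_terms(12, Fraction(1, 6))
# numerators over the common denominator 6**12 = 2,176,782,336
numerators = [pr.numerator * (6**12 // pr.denominator) for pr in probs]
print(numerators)
k_star = max(range(13), key=lambda k: probs[k])
print(k_star, float(probs[k_star]))   # most probable: 2 doublets, ≈ 0.2961
print(float(sum(probs[1:7])))         # P(1 ≤ doublets ≤ 6) ≈ 0.8866
```

Working with `Fraction` keeps every term exact, so the printed numerators can be compared digit for digit with those in the text.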
But, when s takes on large values, this method becomes impracticable, not to say impossible. Suppose that s = 1,400; then the actual straightforward expansion of (p + q)^1400 would require a tremendous work of calculation which no practical computer would be willing to undertake. We must therefore in some way or other seek a method of approximation by which this labor of calculation may be avoided, and try to find an approximate formula by which we are able to express the maximum term in a simple manner, involving little computation and at the same time yielding results close enough for practical as well as theoretical purposes. Jacob Bernoulli in his famous treatise "Ars Conjectandi" was the first mathematician to solve this problem. Bernoulli also gave an expression for the probability that the departure from the most probable value should not exceed previously fixed limits. The method, however, was very laborious, and the final form was first reached by Laplace in "Théorie des Probabilités."

We saw before in Chapter IV that the general term

T(α) = s!/(α!β!) · p^α q^β    (β = s − α)

in the binomial expansion of (p + q)^s represented the probability that an event, E, will happen α times and fail β times in s trials, where p and q were the respective probabilities for success and failure in a single trial. The exponent α may here take all positive integral values in the interval (0, s), including both limits. The question now arises: which particular value of α, say αₙ, will make the above quantity a maximum term in the expansion of the binomial? If αₙ really is this particular value, then the term

s!/(αₙ!βₙ!) · p^(αₙ) q^(βₙ)    (II)

must be at least as great as its two neighbors,

s!/((αₙ + 1)!(βₙ − 1)!) · p^(αₙ+1) q^(βₙ−1)    (I)

and

s!/((αₙ − 1)!(βₙ + 1)!) · p^(αₙ−1) q^(βₙ+1).    (III)

Dividing (II) by (III) and (II) by (I) we obtain the following inequalities, which also may be written

(βₙ + 1)p ≥ αₙq  and  (αₙ + 1)q ≥ βₙp.
The following reductions are self-evident:

(s − αₙ + 1)p ≥ αₙ(1 − p), or sp + p ≥ αₙ,

and

(αₙ + 1)q ≥ (s − αₙ)p, or αₙq + αₙp ≥ sp − q, or αₙ ≥ sp − q.

From which we see that αₙ satisfies the following relation:

ps − q ≤ αₙ ≤ ps + p.

Since p + q = 1, we notice that αₙ is enclosed between two limits whose difference in absolute magnitude equals unity. The whole interval in which αₙ is situated being equal to unity, and since αₙ must be an integral number, this particular αₙ is determined uniquely as a positive integer when both ps − q and ps + p are fractional quantities. If ps − q is an integral number, ps + p will also be integral, and both limits satisfy the above inequality; since by the nature of the problem αₙ can take positive integral values only, the binomial expansion of (p + q)^s must in this case have two terms which are greater than any of the rest.

Dividing both sides of the inequality by s, we shall have

p − q/s ≤ αₙ/s ≤ p + p/s.

Since both p and q are proper fractions, both p/s and q/s are less than 1/s. We may therefore safely assume that the highest possible difference between the two quotients αₙ/s and βₙ/s and the probabilities p and q will never exceed 1/s. Now if s is a very large number this quantity may be neglected, and we may therefore write ps = αₙ and qs = βₙ. Substituting these values in our original expression for the general term of the binomial expansion we get as the maximum term:

T = s!/((sp)!(sq)!) · p^(sp) q^(sq).

63. Approximate Calculation of the Maximum Term, T. — When the trials are repeated a large number of times the straightforward calculation of the maximum term becomes very laborious. The only table facilitating an exact computation is in a work, "Tabularum ad Faciliorem et Breviorem Probabilitatis Computationem Utilium, Enneas," by the Danish mathematician, C. F. Degen.
This table, which was published in 1824, gives the logarithms to twelve places for all values of n! from n = 1 to n = 1,200. Degen's table is, however, not easily obtained, and even if it were, it would be of little or no value for factorials above 1,200!. Our only resort is therefore to find an approximate expression for the above value of n!. This is most conveniently done by making use of Stirling's formula for factorials of high orders. We have

s! = s^(s+1/2) e^(−s) √(2π),
(sp)! = (sp)^(sp+1/2) e^(−sp) √(2π),
(sq)! = (sq)^(sq+1/2) e^(−sq) √(2π).

Substituting the above values in the expression s!/((sp)!(sq)!)·p^(sp)q^(sq) we get

T = [s^(s+1/2) e^(−s) √(2π) · p^(sp) q^(sq)] / [(sp)^(sp+1/2) e^(−sp) √(2π) · (sq)^(sq+1/2) e^(−sq) √(2π)],

which reduces to

T = 1/√(2πspq)

as an approximate value for the maximum term.

Tchebycheff's Theorems. — Despite all that has been said about the most probable value, its use is somewhat limited, and it might well, without harm, be left out of the whole theory of probabilities. Just because an event is the most probable it by no means follows that it is a very probable event. In fact the expression (√(2πspq))^(−1), which for large values of s converges towards zero, shows that the most probable event in reality is a very improbable event. This statement may seem a little paradoxical, but it is easily understood by realizing that the most probable event is only one possible combination among a large number of equally possible combinations of a different order. Instead of finding the most probable event it is more important in practical calculations to determine the average number or mean value of the absolute frequencies of successes. In Chapter V we pointed out the close relation between a mathematical expectation and the mean value of a variable.
This relation is used by the Russian mathematician, Tchebycheff, as the basis of some very general and far-reaching theorems in probabilities, by means of which the Law of Large Numbers may be established in an elegant and elementary manner.

64. Expected or Probable Value. — In Chapter V we defined the product of a certain sum, s, and the probability of winning such a sum as the mathematical expectation of s. It is, however, not necessary to associate the happening of the event with a monetary gain or loss; in fact this often serves to confuse the reader, and we may generalize the definition as follows. If a variable α may assume any of the values a₁, a₂, a₃, ···, aₙ, each with a respective probability of existence φ(a₁), φ(a₂), φ(a₃), ···, φ(aₙ), the expected or probable value of α is defined as e(α) = a₁φ(a₁) + a₂φ(a₂) + ··· + aₙφ(aₙ).

For the mean error of the sum of two independent variables α and β, with probability functions φ and ψ, we have:

ε²(α + β) = Σ[α − e(α)]²φ(α)·Σψ(β) + 2ΣΣ[α − e(α)][β − e(β)]φ(α)ψ(β) + Σ[β − e(β)]²ψ(β)·Σφ(α).

A mere inspection will show that the first and the last terms of this expression equal ε²(α) and ε²(β) respectively. The first term may be written as follows:

Σ[α − e(α)]²φ(α)·Σψ(β) = ε²(α),  since Σψ(β) = 1.

The same also holds true for the last term. With regard to the middle term we found before that e[α − e(α)] = 0. Hence it follows by mere inspection that this term becomes 0. Thus we finally have:

ε²(α + β) = ε²(α) + ε²(β),  or  ε(α + β) = √(ε²(α) + ε²(β)).

Since the middle term is always 0, it follows a fortiori that

ε(α − β) = √(ε²(α) + ε²(β)),

and also that ε(kα) = kε(α), where k is a constant. This gives us the following theorems:

The mean error of the sum or of the difference of two quantities is equal to the square root of the sum of the squares of each separate mean error.

The mean error of any quantity multiplied by a constant is equal to this same constant multiplied by the mean error of the quantity. (See Appendix.)
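These two theorems can be verified by a quick simulation (a sketch; the mean errors 3 and 4 and the sample size are illustrative choices of ours):

```python
import random
from statistics import pstdev

random.seed(1)
N = 200_000
alpha = [random.gauss(0, 3) for _ in range(N)]   # mean error ε(α) = 3
beta = [random.gauss(0, 4) for _ in range(N)]    # mean error ε(β) = 4

# mean error of the sum: should approach sqrt(3**2 + 4**2) = 5
eps_sum = pstdev(a + b for a, b in zip(alpha, beta))
# mean error of a constant multiple: should approach 2 * 3 = 6
eps_scaled = pstdev(2 * a for a in alpha)
print(eps_sum, eps_scaled)   # ≈ 5 and ≈ 6
```

The independence of α and β is essential here; it is exactly what makes the middle (product) term vanish in the derivation above.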
The above theorems may easily be extended to any number of variables, α, β, γ, ···, so that in general we have

ε(α + β + γ + ···) = √(ε²(α) + ε²(β) + ε²(γ) + ···).

We shall later make use of this formula in a comparison of the different rates of mortality among different population groups.

So far we have computed the mean error for the absolute frequencies of a, and the quantity √(spq) was compared with the most probable number of successes sp. But it may also be useful to know the mean error of the relative frequencies. This calculation is performed by reducing the mean error of the absolute frequencies in the same ratio as the absolute frequencies are reduced to relative frequencies. We saw before that e(a) = sp. The relative frequency of the probable value is e(a)/s = sp/s = p. The mean error of p therefore is √(pq/s). The following remarks of Westergaard are worthy of note: "When a length is measured in meters and this measurement may be effected with an uncertainty of say 2 meters, the length in centimetres is then simply found by multiplication by 100 and the uncertainty is 200 cm. When we wish to find the mean error of p instead of sp we only need to divide the mean error √(spq) by s, which gives √(pq/s)." The same result is also easily obtained from the formula ε(kα) = kε(α) when we let k = 1/s.

67. Tchebycheff's Theorem. — Tchebycheff's brochure appeared first in Liouville's Journal for 1866 under the title "Des valeurs moyennes." A later demonstration was given by the Italian mathematician, Pizetti, in the annals of the University of Geneva for 1892. The nucleus in both Tchebycheff's and Pizetti's investigations is the expression for the mean error:

ε²(ξ) = Σ[ξ − e(ξ)]²φ(ξ).    (1)

The variable ξ may be of any form whatsoever; it may thus for instance be the sum of several variables α, β, γ, ···. Denoting by P the probability that the absolute deviation |ξ − e(ξ)| remains smaller than a given quantity a, the above expression leads to the inequality

P > 1 − ε²(ξ)/a².    (4)

Let also a = λε(ξ). We then have by a mere substitution in the above inequality:

P_T > 1 − 1/λ².    (5)
This constitutes the first of Tchebycheff's criteria, which says: The probability that the absolute value of the difference |α − e(α)| does not exceed a certain multiple, λ (λ > 1), of the mean error is greater than 1 − 1/λ².

Now we made no restrictions as to the variable ξ, which may be composed of the sum of several independent variables, α, β, γ, ···. We saw before that

ε²(α + β + γ + ···) = ε²(α) + ε²(β) + ε²(γ) + ···.

Tchebycheff's criterion may therefore be extended as follows: The Tchebycheffian probability, P_T, that the difference |α + β + γ + ··· − e(α) − e(β) − e(γ) − ···| will never exceed λ times the mean error ε (λ > 1) is greater than 1 − 1/λ².

68. The Theorems of Poisson and Bernoulli proved by the Application of the Tchebycheffian Criterion. — Bernoulli in his researches limited himself to the solution of the problem in which the probabilities for the observed event remained constant during the total number of observations or trials. Poisson has treated the more general case, wherein the individual probability for the happening of the event in a single trial varies during the total s trials. This may probably best be illustrated by an urn schema. Suppose we have s urns U₁, U₂, ···, U_s with white and black balls in various numbers. Let the probability for drawing a white ball from the urns U₁, U₂, ···, U_s in a single trial be p₁, p₂, ···, p_s respectively, and q₁, q₂, ···, q_s the chances for drawing a black ball in a single trial. If a ball is drawn from each urn, what is the probability of drawing α white and s − α black balls in s trials? It is easily seen that the Bernoullian Theorem is a special case when the contents of the s urns and the respective probabilities for drawing a white ball in a single trial are the same for all urns.

69. Bernoullian Scheme. — We shall now show how the Tchebycheffian criterion may be used in answering the question given above.
First of all we shall start with the simpler case of the Bernoullian urn-schema. Here the probability for drawing a white or a black ball from each of the s urns in a single trial is p and q respectively. The square of the mean error in a single trial is pq. From the formulas in § 66 it then follows:

ε² = ε₁² + ε₂² + ··· = pq + pq + pq + ··· (s times) = spq,  or  ε = √(spq).

While the above expression gives us the mean error of the absolute frequency of the variable a, the mean error of the relative frequency of a to the total number of trials, s, is given as

ε = √(pq/s).

We now ask: What is the total probability that the absolute deviation of the relative frequency a/s from its expected value sp/s = p never becomes larger than λ times the mean error ε = √(pq/s)? Letting λ = √s/t and using the symbol P_T for this particular probability, we have according to Tchebycheff's criterion:

P_T > 1 − 1/λ²,  or  P_T > 1 − t²/s.

Since the mean error is equal to √(pq/s) we have:

λε = √(pq)/t.

The answer to our question above follows now a fortiori: The total probability that the absolute deviation of the relative frequency from the postulated a priori probability, p, never exceeds the quantity √(pq)/t is greater than 1 − t²/s.

By taking t large enough we may reduce √(pq)/t (where pq is a fraction whose maximum value never can exceed 1/4) below any previously assigned quantity, δ, however small. If, for instance, we choose the value .0001 for δ, we may rest assured that √(pq)/t will be less than δ when we take t larger than 5,000. But no matter how large t is, so long as it remains a finite number, we may, by letting s increase beyond all bounds, bring the probability 1 − t²/s as near to unity as we please.

70. Poisson's Scheme. — Here the probabilities for drawing a white ball vary from urn to urn and are p_i, q_i (i = 1, 2, 3, ···, s). If by p₀ and q₀ we denote the arithmetic mean or the average value of the s p's and s q's, such that

p₀ = (p₁ + p₂ + p₃ + ··· + p_s)/s,    (3)
q₀ = (q₁ + q₂ + q₃ + ··· + q_s)/s,    (4)

and assume that p₀ and q₀ denote the constant probabilities during each of the s trials (observations), we should according to the Bernoullian Theorem have:

e(a_B) = sp₀,    (5)
ε(a_B) = √(sp₀q₀),    (6)

where a_B stands for the absolute frequency in a Bernoullian series. An actual comparison of (1), (5) and (3) shows that:

e(a_P) = e(a_B),    (7)

where a_P is the symbol for the absolute frequency in a Poisson series. In other words: if the s trials had been performed with constant probability for success equal to p₀ instead of with varying probabilities p₁, p₂, ···, p_s, the expected or probable value would be the same for the Bernoullian and the Poisson scheme. With regard to the mean error we find, however, after a little calculation,

ε_P²(a) = ε_B²(a) − Σ(p_i − p₀)².    (8)

The expression for the mean error in Poisson's Theorem is of the following form:

ε_P = √(p₁q₁ + p₂q₂ + p₃q₃ + ··· + p_s q_s) = √(Σp_i q_i)    (i = 1, 2, 3, ···, s).

Now p_i q_i may be transformed as follows. Writing

p_i = p₀ + (p_i − p₀),  q_i = q₀ − (p_i − p₀),

and multiplying, we obtain:

p_i q_i = p₀q₀ + (q₀ − p₀)(p_i − p₀) − (p_i − p₀)².

Summing over all s values of i, and observing that Σ(p_i − p₀) = 0 by the definition of p₀, we get Σp_i q_i = sp₀q₀ − Σ(p_i − p₀)², which is the relation (8): the Poissonian mean error can never exceed the Bernoullian.
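Relation (8) can be checked numerically for an arbitrary set of varying probabilities (a sketch; the value of s and the p's are illustrative):

```python
import random

random.seed(3)
s = 50
ps = [random.uniform(0.1, 0.9) for _ in range(s)]   # varying urn probabilities
p0 = sum(ps) / s                                    # their arithmetic mean

lhs = sum(p * (1 - p) for p in ps)                          # Σ p_i q_i  (Poisson ε²)
rhs = s * p0 * (1 - p0) - sum((p - p0) ** 2 for p in ps)    # sp₀q₀ − Σ(p_i − p₀)²
print(abs(lhs - rhs) < 1e-9)   # True: the identity holds exactly
```

Since Σ(p_i − p₀)² is never negative, the printed identity shows directly that the Poisson mean error is at most the Bernoullian one computed from the average probability p₀.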

As we already observed in the introductory remarks to this chapter, it is impossible to perform a certain experiment an infinite number of times, and it is therefore out of the question to determine the limiting and ideal value of the a posteriori probability; we must satisfy ourselves with an approximation by performing a finite number of trials, or let s be a finite number. The quotient a ÷ s is then the empirical approximate a posteriori probability. We know also that, although this quotient is an approximation of the postulated a priori probability only, by increasing s, or what amounts to the same thing, by making a large number of trials, the difference between the approximate empirical probability ratio, a ÷ s, and the a priori probability, p, becomes smaller as the number of trials is increased. But how small is the difference? Or how many times shall we repeat the trials (observations) so that, for practical purposes, we may disregard this difference? It does not suffice to be satisfied with the fact that the difference becomes proportionately smaller the greater we make the number of trials, and merely insist that in order to avoid large errors it is only necessary to operate with very large numbers. Immediately the question arises: What constitutes a large number? Is 100 a large number, or is 1,000, 10,000, 100,000 or even a million an answer to this question? As long as this question remains unanswered, it helps but little to harp upon the "law of large numbers," a tendency which unfortunately is too manifest in many statistical researches by amateur statisticians. As long as a definition, much less a numerical determination, of the range of "small numbers" is lacking, little stress ought to be laid on such remarks based on the metaphorical terms of "small" and "large" numbers.

72. Application of the Tchebycheffian Criterion.
— It is readily seen that even a rough quantitative determination of the difference between the approximate a posteriori probability and the postulated a priori probability, based upon the mere vague statement of "large numbers," is utterly impossible, and it remains to be seen, therefore, if the theory of probabilities offers us a criterion that might serve as a preliminary test for the above difference. To restate our problem: If p is the postulated a priori probability and a ÷ s is the empirical probability (a posteriori) or relative frequency of the event, E, what is the probability that the difference |(a/s) − p| does not exceed a previously assigned quantity?

In the mean error and the associated theorem of Tchebycheff we have a simple and easily applied criterion to test this probability. Tchebycheff's rule states that the probability, P_T, of a deviation of a variable from its probable value not larger than λ times its mean error is greater than 1 − 1/λ². For

λ = 3:  P_T > 1 − 1/9 = 0.888,
λ = 4:  P_T > 1 − 1/16 = 0.937,
λ = 5:  P_T > 1 − 1/25 = 0.96.

This shows that a deviation from the expected or probable value of the variable equal to 4 or 5 times the mean error possesses a very small probability, and such deviations are extremely rare. Let us for example assume that the observed rate of mortality in a certain population group is equal to .0200. Let furthermore the number exposed to risk equal 10,000. The mean error is

(.02 × .98 / 10,000)^(1/2) = .0014.

If the number of lives exposed to risk was one million instead of 10,000, the mean error would be

(.02 × .98 / 1,000,000)^(1/2) = .00014.

A deviation four times this latter quantity is equal to .00056, and according to Tchebycheff's criterion the probability for the non-occurrence of a deviation above .00056 is greater than .937; that is, the probability of dying inside a year will not be higher than .0206 or less than .0194.
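The computation above can be reproduced in a few lines (a sketch; the function name is ours):

```python
from math import sqrt

def chebyshev_interval(rate, exposed, lam):
    """Mean error sqrt(p*q/s) of an observed rate, and the interval
    rate ± lam * mean_error, which by Tchebycheff's criterion holds
    with probability greater than 1 - 1/lam**2."""
    mean_error = sqrt(rate * (1 - rate) / exposed)
    return mean_error, rate - lam * mean_error, rate + lam * mean_error

for exposed in (10_000, 1_000_000):
    me, lo, hi = chebyshev_interval(0.02, exposed, 4)
    print(exposed, round(me, 5), round(lo, 4), round(hi, 4))
```

For one million exposed this prints the mean error .00014 and the interval (.0194, .0206) obtained in the text.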
For an observation series of 4,000,000 homogeneous elements we might by a similar procedure expect to find a rate of mortality between 0.02 − 0.00028 and 0.02 + 0.00028. Thus we notice that the mean error of the relative frequency numbers decreases as the number of observations increases.

CHAPTER X.

THE THEORY OF DISPERSION AND THE CRITERIA OF LEXIS AND CHARLIER.

73. Bernoullian, Poisson and Lexis Series. — In the previous chapter we limited our discussion to single sets consisting of s individual trials and found in the mean error and the criterion of Tchebycheff a measure for the uncertainty with which the relative frequency ratio a/s as well as the absolute frequency a were affected. How will matters now turn out if, instead of a single set, we make N sets of trials? As already mentioned in paragraph 54, in general in N such sets we shall obtain N different values of a, denoting the absolute frequency of the event, represented by the sequence a₁, a₂, a₃, ···, a_N. Our object is now to investigate whether the distribution of the above values of a around a certain norm is subject to some simple mathematical law, and if possible to find a measure for such distributions. In this connection it is of great importance whether the postulated a priori probabilities remain constant or not during the N sample sets. Three cases are of special importance to us.¹

1. The probability of the happening of the event remains constant during all the N sets. The series as given by the absolute frequencies in each set is known as a Bernoullian Series.

2. The same probability varies from trial to trial inside each of the N sample sets, the variations being the same from set to set. The series as given by the absolute frequencies is in this case known as a Poisson Series.

3. The probability remains constant in any one particular set but varies from set to set. The absolute frequency series as produced in this way is called a Lexis Series.
The above definition of these three series may, perhaps, be made clearer by a concrete urn scheme.

A. Bernoullian Series. — s balls are drawn one at a time from an urn containing black and white balls in constant proportion during all drawings. Such drawings constitute a sample set. Let us in this particular set have obtained say a₁ white and β₁ black balls, where a₁ + β₁ = s. We make N sets of drawings under the same conditions, keeping a record of the white balls drawn in each set. The number sequence thus obtained,

a₁, a₂, a₃, ···, a_N,

is a Bernoullian Series.

B. Poisson Series. — s individual urns contain white and black balls, the proportion of white to black varying from urn to urn. A single ball is drawn from each urn and its color noted. In this way we get a₁ white and β₁ black balls, constituting a set. The balls thus drawn are replaced in their respective urns and a second set of s drawings is performed as before, resulting in a₂ white and β₂ black balls. The number sequence, a₁, a₂, a₃, ···, a_N, of white balls in N sets represents a Poisson Series.

C. Lexis Series. — s balls are drawn one at a time under the same conditions as set No. 1 in the Bernoullian series. The a₁ white and β₁ black thus drawn constitute the first set. In the second and following sets the composition of the urn is changed from set to set. The number sequence representing the number of white balls in the N respective sets,

a₁, a₂, a₃, ···, a_N,

is a Lexian Series. The scheme of drawings is the same as in the Bernoullian Series except that the proportion of white to black balls varies from set to set.

74. The Mean and Dispersion. — Since we have no a priori reasons for choosing any one particular value of the various a's of the above sequences in preference to any other, we might give equal weight to each set and take the arithmetic mean of the N values of a as defined by the formula:

M = (a₁ + a₂ + a₃ + ··· + a_N)/N.

¹ The terminology is due to Charlier.
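The three urn schemes just described may also be imitated by simulation (a sketch; the numbers of sets, drawings, and probabilities are illustrative choices of ours):

```python
import random
from statistics import pstdev

random.seed(7)
N, s, p0, spread = 5000, 100, 0.3, 0.25

def draws(s, p):
    """One sample set: the number of white balls in s drawings with probability p."""
    return sum(random.random() < p for _ in range(s))

# A. Bernoullian: p constant within and between sets
bernoulli = [draws(s, p0) for _ in range(N)]

# B. Poisson: p varies from urn to urn inside a set, same urns in every set
ps = [random.uniform(p0 - spread, p0 + spread) for _ in range(s)]
poisson = [sum(random.random() < p for p in ps) for _ in range(N)]

# C. Lexian: p constant inside a set but varying from set to set
lexis = [draws(s, random.uniform(p0 - spread, p0 + spread)) for _ in range(N)]

print(pstdev(bernoulli), pstdev(poisson), pstdev(lexis))
```

The printed dispersions anticipate the results proved below: the Lexian dispersion far exceeds the Bernoullian value √(sp₀q₀), while the Poissonian dispersion falls somewhat below it on the average.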
It will be unnecessary to enter into a detailed discussion of the mean, which is a quantity used on numerous occasions in every-day life. We shall, however, define another important function known as the dispersion (standard deviation). The dispersion is denoted by the Greek letter σ and is defined by the formula

σ² = [(a₁ − M)² + (a₂ − M)² + ··· + (a_N − M)²]/N.

We shall now attempt to find the expected value of the mean and the dispersion in the three series. First of all take the Bernoullian Series. Let the constant probability for success in a single trial be p₀. We have then for the various expected values or mathematical expectations of a:

Set No. 1: e(a₁) = sp₀
Set No. 2: e(a₂) = sp₀
···
Set No. N: e(a_N) = sp₀

or:

[e(a₁) + e(a₂) + ··· + e(a_N)]/N = Σe(a_ν)/N = Nsp₀/N = sp₀,

which shows that the mean in a Bernoullian Series of N sample sets is equal to the expected value of the absolute frequency in a single set. In regard to the dispersion we have for the various sets:

Set No. 1: e(a₁ − M)² = ε²(a₁) = sp₀q₀
Set No. 2: e(a₂ − M)² = ε²(a₂) = sp₀q₀
···

Another measure of the variability is the mean (average) deviation, ϑ, defined by means of the following relation:

ϑ = [|a₁ − M| + |a₂ − M| + |a₃ − M| + ··· + |a_N − M|]/N,

where |a_ν − M| means the absolute difference between a_ν and M. We shall now proceed to determine the expected value of ϑ on the assumption that the observed data follow the Bernoullian Law. The mean in a Bernoullian series with constant probability p₀ we found before to be equal to sp₀, which was the expected value of a in a single sample set of s trials. The expected value of the absolute difference in the νth set is therefore:

e|a_ν − sp₀| = Σ|a_ν − sp₀|·P(a_ν).

When L > 1, the series is called hypernormal. When L < 1, the series is a subnormal series. It is easily seen from the respective formulas that the Poisson Series are subnormal series whereas the Lexian Series are hypernormal. The great majority of statistical series are — as we shall have occasion to see in the following chapter — of a hypernormal kind and correspond thus to the Lexian Series. In § 74 we found the dispersion in the Lexis series as

σ_L² = σ_B² + (s² − s)σ_b².

A computation of the Charlier Coefficient of Disturbancy from the observed values gives 100ρ = 50.80, whereas the theoretical value is 55.38, showing a decidedly hypernormal dispersion, a result which was to be expected since the probabilities of drawing black vary from set to set in the various sample sets.

All the above experiments show a completely satisfactory verification of the various theorems of the previous chapters and may perhaps serve as a vindication of the followers of Laplace, who like him hold that an a priori foundation for probability judgments is indispensable.

CHAPTER XII.

CONTINUATION OF THE APPLICATION OF THE THEORY OF PROBABILITIES TO HOMOGRADE STATISTICAL SERIES.

82. General Remarks. — In this chapter it is our intention to discuss the application of the theory of probabilities to homograde statistical series, with special reference to vital statistics. We owe the reader an apology, however, inasmuch as in the former paragraphs we have employed the term statistics without defining its meaning in a rigorous manner. A definition may perhaps appear superfluous, since statistics nowadays is almost a household word. The term unfortunately is often employed as a mere phrase without any understanding of its real meaning. This applies especially to that band of self-styled statisticians, mere dilettanti, who, with an energy which undoubtedly could be better employed otherwise, attempt to investigate and analyze mass phenomena regardless of method and system.
When investigations are undertaken by such dilettanti the common gibe that "statistics will prove anything" becomes, alas, only too true and proves at least that " like other mathematical tools they can be wielded effectively only by those who have taken the trouble to understand the way they work."^ By the science of statistics we understand the recording and subsequent quantitative analysis of observed mass phenomena. By mathematical statistics {also called statistical methods) we understand the quantitative determination and measurement of the effect of a complex of causes acting on the object under investigation as furnished by previously recorded observations as to certain attri- butes among a collective body of individual objects. Practical statistics — if such a name may be used — then simply becomes the mechanical collection of statistical data, i. e., the recording of the observed attributes of each individual. In no way do we wish to underestimate the importance of this process 1 See Nunn, "Exercises in Algebra" (London, 1914), pages 432-33. 146 83] STATISTICAL DATA AND MATHEMATICAL PROBABILITIES. 147 which is as important for the statistical analysis as is the gathering of structural materials for the erection of a large building. Mathematical statistics is thus the tool we must use in the final analysis of the statistical data. It is a very effective and powerful tool when used properly by the investigator. At the same time it is not an automatic calculating machine in which we need only put the material and read off the result on a dial. A person without any knowledge whatsoever about the nature of loga- rithms may in a few hours be taught how to use a logarithmic table in practical computations, but it would be foolish to view the formulas and criteria from probabilities when applied to statistical data in the same light as a table of logarithms in cal- culating work. 
Such formulas and criteria must be used with caution and discretion and only by those who have taken the trouble to make a thorough study of probabilities and to master their real meaning and their relation to mass phenomena. If put in the hands of mere amateurs the formulas become as dangerous a toy as a razor to a child.

It is not our intention to give in this work a description of the technique of the collection of the material, which depends to a large extent on local social conditions and for which it is difficult to give a set of fixed rules. In the following we shall treat the mathematical methods of statistics exclusively, and furthermore make the theory of probabilities the basis of our investigations.

83. Analogy between Statistical Data and Mathematical Probabilities. — Let us for the moment imagine a closed community with a stationary population from year to year, and let us denote the size of such a population by s. Let us furthermore suppose we were given a series of numbers, m₁, m₂, m₃, ⋯ m_N, denoting the number of children born in various years in this community. The ratios

m₁ : s, m₂ : s, m₃ : s, ⋯ m_N : s

may then be looked upon as probabilities of a childbirth in various years. As Charlier justly remarks, "such an identification of a statistical ratio with a mathematical probability is at first sight a mere analogy which possibly may have very little in common with the observed statistical phenomena, but a closer scrutiny shows the great importance for statistics of such a view." If such ratios could be regarded as mathematical probabilities, wherein the various m's were identical to favorable cases in s total trials, the mean and the dispersion could be determined a priori from the Bernoullian Theorem. The founders of mathematical statistics regarded the identification of an ordinary statistical series with a Bernoullian Series almost as axiomatic.
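If the identification held, the Bernoullian Theorem would fix the mean and dispersion of such a series in advance: M = sp and σ_B = √(spq). A minimal sketch of this a priori computation (the community size and birth probability below are hypothetical, chosen only for illustration):

```python
import math

def bernoulli_a_priori(s, p):
    """A priori mean and dispersion of a Bernoullian series:
    M = s*p expected favorable cases in s trials, with
    dispersion sigma_B = sqrt(s*p*q)."""
    q = 1.0 - p
    return s * p, math.sqrt(s * p * q)

# Hypothetical closed community of s = 100,000 with a yearly
# birth probability p = 0.028 (illustrative figures only):
M, sigma_B = bernoulli_a_priori(100_000, 0.028)
```

An observed yearly dispersion much larger than this σ_B is precisely what marks a series as hypernormal in the sense of Lexis.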
This view is found even among some leading writers of the present time. Among others we apparently find this traditional view in the work of the eminent English actuary, G. King, in his classic "Text Book." In Chapter II of this well-known standard actuarial treatise a probability is defined as follows: "If an event may happen in α ways and fail in β ways, all these ways being equally likely, the probability of the happening of the event is α ÷ (α + β)." With this definition as a basis King then deduces the elementary formulas of the addition and multiplication theorems. He then continues: "Passing now to the mortality table, if there be lₓ persons living at age x, and if l_{x+n} of these survive to age x + n, then the probability that a life aged x will survive n years is l_{x+n} ÷ lₓ = ₙpₓ." And again, "the probability that a life aged x and a life aged y will both survive n years is ₙpₓ × ₙp_y."

From the above it would appear that the author unreservedly assumes a one-to-one correspondence between the l_{x+n} survivors and "favorable ways" as known from ordinary games of chance, and a similar correspondence between the original lₓ persons and "equally possible cases." A simple consideration will show that there exists no a priori reason for such a unique correspondence between ordinary empirical death rates and mathematical probabilities. None of the original lₓ persons can be considered as

¹ Mr. H. Moir in his "Primer of Insurance" tried to avoid the difficulty by giving a wholly new definition of "equally likely events." According to Moir "events may be said to be 'equally likely' when they recur with regularity in the long run." Apart from the half metaphorical term "in the long run" Mr. Moir fails to state what he means by the expression "with regularity."
If the statement is to be understood as regular repetitions of a certain event in various sample sets, it is evident that we may obtain a regular recurrence of the observed absolute frequencies in a Poisson Series, where — as we know — the events are not equally likely. — A. F.

being "equally likely" as in the sense of games of chance. Numerous factors, such as heredity, environment, climatic and economic conditions, etc., play here a vital part in the various complexes embracing the original lₓ persons.

The belief in an absolute identity of mathematical probabilities and statistical frequency ratios seems to have originated from Gauss. The great German mathematician — or rather the dogmatic faith in his authority as a mathematician — proved thus for a number of years a veritable stumbling block to a fruitful development of mathematical statistics. Gauss and his followers maintained that all statistical mass phenomena could be made to conform with the law of errors as exhibited by the so-called Gaussian Normal Error Curve. If certain statistical series exhibited discrepancies, they claimed that such deviations arose from the limited number of observations. The deviations would become less marked if the number of observed values was enlarged, and would eventually disappear as the number of observations approached infinity as its ultimate value. The Gaussian dogma held sway despite the fact that the Danish actuary, Oppermann, and the French mathematicians, Bienaymé and Cournot, had pointed out that several statistical series, despite all efforts to the contrary, offered a persistent defiance to the Gaussian law. The first real attack on the dogma laid down so authoritatively by Gauss was delivered by the French actuary, Dormoy, in certain investigations relating to the French census.
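King's mortality-table definitions quoted above translate directly into code. The lₓ column below is invented for illustration — it is not from the "Text Book" or any actual mortality table — but the two rules, ₙpₓ = l_{x+n} ÷ lₓ and the product rule for two independent lives, are the ones King states:

```python
# Hypothetical l_x column: number living at each age out of an
# arbitrary radix (illustrative figures, not an actual table).
lx = {30: 95000, 40: 90000, 50: 82000}

def npx(table, x, n):
    """Probability that a life aged x survives n years:
    n_p_x = l_{x+n} / l_x (King's definition)."""
    return table[x + n] / table[x]

# Probability that a life aged 30 AND a life aged 40 both
# survive 10 years: n_p_x * n_p_y (multiplication theorem).
p_joint = npx(lx, 30, 10) * npx(lx, 40, 10)
```

The text's objection is not to this arithmetic but to treating the lₓ lives as "equally possible cases" in the sense of games of chance.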
It was, however, only after the appearance of the already mentioned brochure by Lexis, "Die Massenerscheinungen, etc.," that a correct idea was gained about the real nature of statistical series. The Lexian theory was expounded in the previous chapters of this work, and we are therefore ready to enter upon the investigation of a few selected mass observations from the domain of vital statistics.

84. Number of Comparison and Proportional Factors. — In the mathematical treatment of the Lexian theory of dispersion we tacitly assumed that the total number of individual trials in a sample set, or the number of comparison, s, remained constant from set to set. In the observations on games of chance it remained in our power to arrange the actual experiments in such a manner that s would be constant. In actual social statistical series such simple conditions do not exist. In comparing the number of births in a country with the total population it is readily noticed that the population does not remain constant but varies from year to year. For this reason the various numbers m denoting the births are not directly comparable with one another. We may, however, easily form a new series of the form:

(s : s₁)m₁, (s : s₂)m₂, (s : s₃)m₃, ⋯ (s : s_N)m_N,

wherein the various numbers, m₁, m₂, m₃, ⋯ , corresponding to the numbers of comparison s₁, s₂, s₃, ⋯ , are reduced to a constant number of comparison s. This series is by Charlier called a reduced statistical series. Such a reduction requires, in many cases, a certain correction. However, when the general ratios s : s_k (k = 1, 2, 3, ⋯ N) are close to unity, the reduced series may be treated as a directly observed series. In most of the following examples, taken from Scandinavian statistical tabular works, the proportional factor s : s_k is close to unity, as shown in the table below. For Sweden I have, following Charlier, assumed a stationary population s = 5,000,000. The corresponding Danish s I have taken as 2,500,000.

Proportional Factors for a Hypothetical Stationary Population in Sweden and Denmark Equal to 5,000,000 and 2,500,000 Respectively.

             Sweden                             Denmark
  Year   Inhabitants   s : s_k      Year   Inhabitants   s : s_k
  1876   4,429,713     1.1288       1888   2,143,000     1.1666
  1877   4,484,542     1.1150       89     2,161,000     1.1569
  1878   4,531,863     1.1033       1890   2,179,000     1.1473
  79     4,578,901     1.0919       91     2,195,000     1.1390
  1880   4,565,668     1.0952       92     2,210,000     1.1312
  81     4,572,245     1.0936       93     2,226,000     1.1230
  82     4,579,115     1.0919       94     2,248,000     1.1121
  83     4,603,595     1.0861       1895   2,276,000     1.0984
  84     4,644,448     1.0765       96     2,306,000     1.0841
  1885   4,682,769     1.0677       97     2,338,000     1.0694
  86     4,717,189     1.0600       98     2,371,000     1.0544
  87     4,734,901     1.0560       99     2,403,000     1.0404
  88     4,748,257     1.0530       1900   2,432,000     1.0280
  89     4,774,409     1.0472       01     2,462,000     1.0154
  1890   4,784,981     1.0449       02     2,491,000     1.0036
  91     4,802,751     1.0410       03     2,519,000     0.9925
  92     4,806,865     1.0402       04     2,546,000     0.9819
  93     4,824,150     1.0365       1905   2,574,000     0.9713
  94     4,873,183     1.0261       06     2,603,000     0.9604
  1895   4,919,260     1.0165       07     2,635,000     0.9488
  96     4,962,568     1.0076       08     2,668,000     0.9370
  97     5,009,632     0.9981       09     2,702,000     0.9252
  98     5,062,918     0.9875       1910   2,737,000     0.9134
  1899   5,097,402     0.9809       11     2,800,000     0.8929
  1900   5,136,441     0.9734       1912   2,830,000     0.8834

The above figures are taken from "Sveriges officielle statistik" and "Statistisk Aarbog for Danmark" for 1913 (Précis de Statistique, 1913).

85. Child Births in Sweden. — From Charlier's "Grunddragen" I select the following example showing the number of children born in Sweden in the period from 1881-1900, as reduced to a stationary population of 5,000,000. Here s = 5,000,000, N = 20, M₀ = 140,000.

  Year       m         m − M₀       (m − M₀)²
  1881    145,230     + 5,230      27,352,900
  82      146,640     + 6,640      44,089,600
  83      144,320     + 4,320      18,662,400
  84      149,360     + 9,360      87,609,600
  1885    146,600     + 6,600      43,560,000
  86      148,270     + 8,270      68,392,900
  87      148,020     + 8,020      64,320,400
  88      143,680     + 3,680      13,542,400
  89      138,300     − 1,700       2,890,000
  1890    139,600     −   400         160,000
  91      141,070     + 1,070       1,144,900
  92      134,830     − 5,170      26,728,900
  93      136,540     − 3,460      11,971,600
  94      134,840     − 5,160      26,625,600
  1895    136,820     − 3,180      10,112,400
  96      135,330     − 4,670      21,808,900
  97      132,750     − 7,250      52,562,500
  98      134,820     − 5,180      26,832,400
  99      131,320     − 8,680      75,342,400
  1900    134,460     − 5,540      30,691,600
  Sum                 + 53,190    654,401,400
                      − 50,390

From which we obtain:

b = (+ 53,190 − 50,390) : 20 = 140,
M = M₀ + b = 140,140,
σ² = 654,401,400 : 20 − b² = 32,700,470, or σ = 5,718.

The empirical probability of a birth (p₀) is p₀ = M : s = 0.02803, so that q₀ = 1 − p₀ = 0.97197, and the Bernoullian dispersion σ_B = √(s p₀ q₀) = 369.0. The actual observed dispersion (5,718) is thus much greater than the Bernoullian. The birth series is considerably hypernormal. The Lexian ratio has the value L = 5,718 : 369.0 = 15.50, while the Charlier coefficient of disturbancy is 100ρ = 4.07. Both the values of L and ρ show that the birth series by no means can be compared with the ordinary games of chance but is subject to outward perturbing influences.

86. Child Births in Denmark. — The following example shows the corresponding birth series for Denmark in the 25-year period from 1888-1912, as reduced to a stationary population of 2,500,000. The computation of the various parameters follows:

b = (39,713 − 30,287) : 25 = + 377,
M = M₀ + b = 73,377,
σ² = 281,208,156 : 25 − b² = 11,106,197.2,
σ_B² = s p₀ q₀ = 71,223 (p₀ = M : s = 0.0293508),
L = σ : σ_B = 12.5,
100ρ = 100 √(σ² − σ_B²) : M = 4.5.

90. Reduced and Weighted Series in Statistics. — ⋯ we may consider the reduced values (s : s₁)m₁ and (s : s₂)m₂ (m₁ and m₂ standing for the number of white balls) as the number of white balls drawn in sample sets of 1,000 single drawings. But these values are not equally reliable.
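Before taking up the weighted series, the whole computation of § 85 can be verified mechanically. The sketch below recomputes b, M, σ, σ_B, the Lexian ratio L, and the Charlier coefficient 100ρ from the twenty reduced birth figures of the Swedish table; the formulas are those of the text, while the script itself is merely our illustration:

```python
import math

# The m values of the Swedish birth series, 1881-1900, reduced
# to a stationary population s = 5,000,000 (section 85).
m = [145230, 146640, 144320, 149360, 146600, 148270, 148020,
     143680, 138300, 139600, 141070, 134830, 136540, 134840,
     136820, 135330, 132750, 134820, 131320, 134460]
s, M0, N = 5_000_000, 140_000, 20

b = sum(mk - M0 for mk in m) / N          # correction to the working mean M0
M = M0 + b                                # mean of the series, M = 140,140
var = sum((mk - M0) ** 2 for mk in m) / N - b ** 2
sigma = math.sqrt(var)                    # observed dispersion, about 5,718
p0 = M / s                                # empirical probability of a birth
sigma_B = math.sqrt(s * p0 * (1 - p0))    # Bernoullian dispersion, about 369
L = sigma / sigma_B                       # Lexian ratio, about 15.5
rho100 = 100 * math.sqrt(var - sigma_B ** 2) / M   # Charlier coefficient, about 4.07
```

The results agree with the text: M = 140,140, σ ≈ 5,718, L ≈ 15.5, and 100ρ ≈ 4.07.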
The mean error in the second series is in fact 10 times as large as the mean error in the first series. In order to overcome this difficulty we ask the reader to consider the following series:

The element (s : s₁)m₁ is repeated s₁ times,
the element (s : s₂)m₂ is repeated s₂ times,
 ⋯
the element (s : s_N)m_N is repeated s_N times.

In this way we obtain a series with s₁ + s₂ + s₃ + ⋯ + s_N elements, which may be termed a reduced and weighted series, since the elements with the larger values of s_k appear oftener than those with the smaller values of s_k. We shall now see if it is possible to determine the expected value of the mean and the dispersion if the series is supposed to follow the Bernoullian Law.

The mean is defined by the following relation:

M = [s₁(s : s₁)m₁ + s₂(s : s₂)m₂ + ⋯ + s_N(s : s_N)m_N] : [s₁ + s₂ + ⋯ + s_N]
  = s Σm_k : Σs_k.

Denoting the average empirical probability by p₀, we have Σm_k : Σs_k = p₀, and hence M_B = s p₀.

As to the dispersion, it takes on the following form:

σ² = { s₁[(s : s₁)m₁ − s p₀]² + s₂[(s : s₂)m₂ − s p₀]² + ⋯ + s_N[(s : s_N)m_N − s p₀]² } : [s₁ + s₂ + ⋯ + s_N]
   = Σ (s² : s_k)(m_k − s_k p₀)² : Σs_k,   (k = 1, 2, 3, ⋯ N).

In finding the theoretical dispersion, assuming a Bernoullian distribution for which p₀ may be used as an approximation of the mathematical a priori probability, we ask the reader to examine the general term of the expression for σ², viz.: (s² : s_k)(m_k − s_k p₀)² : Σs_k. If the individual trials follow the Bernoullian Law, the expected value of the factor (m_k − s_k p₀)² takes the form:

ε[(m_k − s_k p₀)²] = Σ (m_k − s_k p₀)² φ(m_k) = s_k p₀ q₀.

This brings the general term for σ² to the form s² p₀ q₀ : Σs_k. Thus the expected value of
-^?^- ( ^- "-:-)<- 1S-S9 1 377 — Is3 -12 2.196 ISiO '"> 476 — >4 -11 924 91 3 410 -15(.^ -10 1.5O0 V- 4 444 -116 — 9 1.044 93 5 4ti2 — v»S — s 7S4 ■H 6 423 -137 — 1 959 ls-i>o 1 442 -lis - 6 70S 96 S 493 — 67 - 5 3.%S u~ 9 505 — 55 - 4 220 US, 10 v"15 - 45 - 3 135 99 11 513 — 47 _ 2 94 1900 12 547 - 13 - 1 13 01 13 ?i»o + 35 0-: 14 540 - 20 + 1 -20 03 15 .5>0 — 20 + 2 40 04 16 609 -r 49 -1 — o 147 1905 ■ ~ 639 — "v + 4 316 06 IS 61v> — 59 — 5 295 o: 19 IV.S — - 9S — 6 55.> OS ■20 631 - 71 -1- 7 497 09 ■"^l i^nS — 12-3 ^~ X 9>4 1910 •>■■» no +150 — i> 1.350 11 23 "10 +150 +10 1.5tXi 12 721 -161 —11 1 ~-\ 1913 ■:o ris -15< -12 :>f«.i = :< 27n A computation of the disp)er5ion and the Charlier coefficient of disturbancy gives a \"alue of lO0..-> in the neighborhood of IS, indicating marked fluctuations. An inspection of the series shows immediately that there is a markeii increase in the rate of death from cancer. Working out the secular disturbances in the ordi- 166 HOMOGRADE STATISTICAL SERIES. [93 nary manner we find: „ 18,276 , , „„ ^^^ = -1:300 = i4-o^ indicating an increase of death from cancer of about 14 persons pr. annum for a population of 1,000,000. Eliminating the secular disturbances in the same manner as above, we now get a coefficient of disturbancy equal to 0.983t {i = V — 1), practically a normal dispersion when taking into account the mean error due to sampling. 93. Application of the Lexian Dispersion Theory in Actuarial Theory. Conclusion. — The Russian actuary, Jastremsky, has applied the Lexian Dispersion Theory in testing the influence of medical selection in life assurance.^ The research by Jastremsky evolves about the following question. Is medical selection a phenomena independent of the age of the assured? Let ^'^qx denote the observed rate of mortality after t years' duration of assurance. In the same manner qj^^^ denotes the rate of mor- tality of a life aged x after 5 or more years of duration (< ^ 5). 
Forming the ratio q_x^(t) : q_x^(5) for all ages x we obtain a certain homograde series for which we may compute the Lexian Ratio and the Charlier Coefficient, and thus determine if the fluctuations are due to sampling only or are dependent on the age of the assured. Space does not allow us to give a detailed account of the very interesting research by Jastremsky as applied to the Austro-Hungarian Mortality Table (Vienna, 1909), and we shall limit ourselves to quoting his final results as to the Lexian Ratio, L, for Whole Life Assurances and Endowment Assurances:

   t     L, Whole Life Assurances     L, Endowment Assurances
   1              0.88                         1.01
   2              0.89                         0.96
   3              1.12                         1.05
   4              1.05                         0.98
   5              1.07                         0.91

The above values of L all lie close to unity, and the series may therefore be considered as Bernoullian Series where the fluctuations are due to sampling entirely. Or in other words, the ratio

¹ Jastremsky, "Der Auslese-Koeffizient," Zeitschr. f. d. ges. Vers.-Wiss., Band XII, 1912.