REVISED ERROR COMPONENTS: MORE EFFICIENT ESTIMATION WITH COMBINED CROSS SECTION AND TIME SERIES

Robert W. Resek

Faculty Working Paper #228
College of Commerce and Business Administration
University of Illinois at Urbana-Champaign
January 29, 1975

1. Introduction

Estimation of combined cross-section time-series models is frequently carried out by error components methods as well as by covariance models. In this paper we analyze the small sample relation between these estimators. In addition we propose a specific new estimator, the revised error components estimator, whose small sample efficiency is always greater than that of either competing estimator.

In this work we rely heavily on previous studies of error component models. Swamy and Arora [10] developed exact finite sample results and an a-class of estimators. As one might expect, the revised error components estimator here, as well as the r-class which includes it, are special cases of their a-class. In addition readers will see the close dependence on work of Nerlove [8], Balestra and Nerlove [2], Arora [1], Fuller and Battese [3, 4], and Wallace and Hussain [11].

We proceed by restating briefly known results about error components models. Then we develop, in the two component case, the r-class of estimators and the specific member which is optimal. Alternative variance estimators are considered and their relative merit determined. Finally, the three component model is developed and the results summarized.

2. Model

Consider a linear econometric model:

(1)  y = β₀i + Xβ + ε

i is an (NT × 1) vector of ones; y and ε are (NT × 1); X is (NT × k) and is measured as deviations from column means so that X'i = 0; β is (k × 1).

For simplicity there are N elements of the cross section, called "states," and T time elements. Each state-time combination has exactly one observation. Within X, y, and ε, the time subscripts vary most rapidly, so all time observations within a state are adjacent. The model (1) will be transformed by premultiplication by appropriate transformation matrices P₁, P₂, etc., where each P has NT columns. These will all be expressed as Kronecker products, e.g. P = A ⊗ B, where A has N columns and operates on states and B has T columns and operates on time.

Now consider the elements of ε, where ε_ij refers to state i, time j:

(2)  ε_ij = u_ij + v_i + w_j

where each u, v, and w is an independent normal random variable and is independent of X. All results are conditional on X in this small sample analysis. The mean of each component is zero and their variances are σ_u², σ_v², σ_w². The normality assumption is necessary with small samples to provide χ² and F distributions, but we show below that the major results do not depend on these distributions. For the analysis we define two new variances:

(3)  σ₁² = σ_u² + Tσ_v²
     σ₂² = σ_u² + Nσ_w²

It is clear that σ₁² ≥ σ_u² and σ₂² ≥ σ_u². These inequalities play a very important role below. Employing (2) and (3), Eε = 0 and

(4)  V = Eεε' = σ_u²(I ⊗ I) + (σ₁² − σ_u²)(I ⊗ (1/T)J) + (σ₂² − σ_u²)((1/N)J ⊗ I)

where J = ii' is an N × N or T × T matrix of ones.
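A minimal numerical sketch (illustrative values for N, T, and the variance components; not part of the original development) builds V of (4) with Kronecker products and confirms the idempotency of the two non-identity matrices noted below:

```python
import numpy as np

N, T = 4, 5
sig_u2, sig_v2, sig_w2 = 1.0, 0.5, 0.25
sig1_2 = sig_u2 + T * sig_v2            # equation (3)
sig2_2 = sig_u2 + N * sig_w2

I_N, I_T = np.eye(N), np.eye(T)
J_N, J_T = np.ones((N, N)), np.ones((T, T))

# equation (4): V = sig_u^2 (I⊗I) + (sig1^2 - sig_u^2)(I⊗J/T) + (sig2^2 - sig_u^2)(J/N⊗I)
V = (sig_u2 * np.kron(I_N, I_T)
     + (sig1_2 - sig_u2) * np.kron(I_N, J_T / T)
     + (sig2_2 - sig_u2) * np.kron(J_N / N, I_T))

# the two non-identity matrices in (4) are idempotent
for P in (np.kron(I_N, J_T / T), np.kron(J_N / N, I_T)):
    assert np.allclose(P @ P, P)
```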
The square matrices in (4) are each idempotent.

The BLUE estimator of β is Generalized Least Squares (GLS), but this depends on the unknown matrix V. In turn this matrix depends on three unknown elements σ_u², σ₁², and σ₂². Since V need only be known up to a scalar multiple, we define:

(5)  γ₁ = σ_u²/σ₁²
     γ₂ = σ_u²/σ₂²

Thus we need only know γ₁ and γ₂. During some of the development we will consider the model as stated (the three component model). For simplicity we also consider the two component model in which:

(6)  σ_w² = 0,  σ₂² = σ_u²,  γ₂ = 1

In this situation there is one unknown value γ₁, which will be called γ.

3. Useful Transformations

The estimation is facilitated by use of various transformations. Consider a T by T Helmert matrix φ [9, p. 13]. The first column is φ₁ = T^(−1/2) i and the remaining (T−1) columns are φ₂; φ is orthogonal. Therefore

(7)  φφ' = I_T;  φ'φ = I_T;  φ₁'φ₁ = 1;  φ₁φ₁' = (1/T)J;  φ₂φ₂' = I − (1/T)J

     P₁ = I ⊗ φ₁'
     P₂ = I ⊗ φ₂'
     P₃ = I ⊗ φ₁φ₁' = I ⊗ (1/T)J = P₁'P₁
     P₄ = I ⊗ φ₂φ₂' = I ⊗ (I − (1/T)J) = P₂'P₂

where P₃ + P₄ = I and P₃P₄ = 0. When P₁ and P₂ are vertically augmented they yield the orthogonal matrix I ⊗ φ' and so together provide independent data sets which convey all sample information. We also define (where φ is now N × N):

     P₅ = φ₁φ₁' ⊗ I = (1/N)J ⊗ I
     P₆ = φ₂φ₂' ⊗ I = (I − (1/N)J) ⊗ I

so that

(8)  V = σ_u² I + (σ₁² − σ_u²) P₃ + (σ₂² − σ_u²) P₅

We will use this notation to develop initial estimates of β as well as estimates of the unknown variances. We turn exclusively to the two component model until the results in that case are developed. For it:

(9)  V = σ_u²(I − P₃) + σ₁² P₃ = σ_u² P₄ + σ₁² P₃

(10) V⁻¹ = (1/σ_u²) P₄ + (1/σ₁²) P₃   since P₃ + P₄ = I

4. Estimation of σ_u² and σ₁²

Transform (1) by P₁:

(11) P₁y = β₀ P₁i + P₁Xβ + P₁ε

(12) V₁ = E P₁εε'P₁' = σ₁² I

V₁ is N by N and ordinary least squares (OLS) yields BLUE estimates given the transformed data. Define

(13) Z₁ = X'P₁'P₁X = X'P₃X

(14) b₁ = Z₁⁻¹ X'P₃y

(15) b₀ = Σ_ij y_ij / NT = grand mean

(16) var b₁ = σ₁² Z₁⁻¹

(17) var b₀ = σ₁²/NT

The residual vector e₁ is employed to estimate σ₁²:

(18) σ̂₁² = e₁'e₁/q

(19) q = N − 1 − k

qσ̂₁²/σ₁² follows a chi square distribution with q degrees of freedom. Equation (14) implies that we assume Z₁ nonsingular and hence positive definite. The analysis can proceed without this assumption, but that generalization always brings unwanted confusion; the reader may develop results without it.

Similarly P₂ may be employed (and we assume Z₂ nonsingular):

(20) P₂y = P₂Xβ + P₂ε

(21) V₂ = σ_u² I,  an N(T−1) square matrix;  Z₂ = X'P₂'P₂X = X'P₄X

(22) b₂ = Z₂⁻¹ X'P₄y

(23) var b₂ = σ_u² Z₂⁻¹

The residuals e₂ are employed to estimate σ_u²:

(24) σ̂_u² = e₂'e₂/n,  n = N(T−1) − k

As above, nσ̂_u²/σ_u² follows a chi square distribution. We note that b₁ and b₂ are independent and include all the sample information. They are independent of σ̂₁² and σ̂_u², which are also pairwise independent. We will rely on an estimate of γ:

(25) γ̂ = σ̂_u²/σ̂₁²

γ̂/γ is clearly distributed F with (n, q) degrees of freedom.

Before proceeding, consider P₃ and P₄. P₂'P₂ = P₄P₄, so b₂ = b₄. Also e₄ = P₂'e₂, so that

     e₄'e₄ = e₂'P₂P₂'e₂ = e₂'(I ⊗ φ₂'φ₂)e₂ = e₂'e₂

Thus P₂ and P₄ yield identical coefficient vectors and variance estimates. The same can be shown to be true of P₁ and P₃.

P₂ and P₄ are also identical to the direct estimation of (1) by ordinary least squares after (N−1) state dummy variables are added. This is the covariance (CV) estimate and we will refer to b₂ by that name.
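The estimators of this section can be sketched directly. The fragment below (simulated two component data; all names and parameter values are illustrative, not from the paper) computes b₁, b₂, σ̂₁², σ̂_u², and γ̂ using P₃ and P₄, which the text shows give the same results as the Helmert forms P₁ and P₂:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, k = 10, 6, 2

X = rng.normal(size=(N * T, k))
X = X - X.mean(axis=0)                        # deviations from column means, X'i = 0
P3 = np.kron(np.eye(N), np.ones((T, T)) / T)  # state-mean projection
P4 = np.eye(N * T) - P3                       # within-state deviations

beta = np.ones(k)
eps = rng.normal(size=N * T) + np.repeat(rng.normal(0, 0.7, size=N), T)  # u + v
y = X @ beta + eps
y = y - y.mean()                              # remove the grand mean, cf. (15)

Z1, Z2 = X.T @ P3 @ X, X.T @ P4 @ X           # (13), (21)
b1 = np.linalg.solve(Z1, X.T @ P3 @ y)        # between-states estimator (14)
b2 = np.linalg.solve(Z2, X.T @ P4 @ y)        # within / covariance estimator (22)

q, n = N - 1 - k, N * (T - 1) - k             # (19), (24)
e1, e2 = P3 @ y - P3 @ X @ b1, P4 @ y - P4 @ X @ b2
sig1_hat2 = (e1 @ e1) / q                     # (18)
sig_u_hat2 = (e2 @ e2) / n                    # (24)
gamma_hat = sig_u_hat2 / sig1_hat2            # (25)
```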
For the development we employ only these estimators of the variances, but other estimators exist which are in some ways superior; these are discussed below. We will now develop the Generalized Least Squares and Error Components estimators based on the analysis so far.

5. Generalized Least Squares

The GLS estimator can be written employing this notation:

     b_g = [X'V⁻¹X]⁻¹[X'V⁻¹y],   X = P₃X + P₄X

Employing (10), (13), and (21),

     X'V⁻¹X = X'(P₃ + P₄)[(1/σ_u²)P₄ + (1/σ₁²)P₃](P₃ + P₄)X
            = (1/σ_u²)X'P₄X + (1/σ₁²)X'P₃X
            = (1/σ_u²)Z₂ + (1/σ₁²)Z₁
            = (1/σ_u²)[Z₂ + γZ₁]

Employing (5), (14), and (22),

(26) b_g = [Z₂ + γZ₁]⁻¹[X'P₄y + γX'P₃y]

(27) b_g = W₂*b₂ + W₁*b₁

where W₂* = [Z₂ + γZ₁]⁻¹Z₂, W₁* = γ[Z₂ + γZ₁]⁻¹Z₁, and W₂* + W₁* = I.

(28) var(b_g) = σ_u²[Z₂ + γZ₁]⁻¹

These are the well known results which depend on known values for σ_u² and σ₁², or γ. (27) shows that b_g is a weighted average of b₂ and b₁.

6. Revised Error Components

We introduce a class of estimators, the r-class, of which one member will be Revised Error Components (REC). For the r-class, γ in (26) is replaced by rγ̂ where 0 ≤ r. The r-class is a subset of the a-class of estimators discussed by Swamy and Arora. However, their purpose was the broad proof of asymptotic distribution, etc., and their multiple parameters, while desirable for that purpose, are not desirable here. The r-class estimator is:

(29) b_r = [Z₂ + rγ̂Z₁]⁻¹[Z₂b₂ + rγ̂Z₁b₁] = W_2r b₂ + W_1r b₁

where W_2r + W_1r = I. Then

     E(b_r|γ̂) = W_2r Eb₂ + W_1r Eb₁ = (W_2r + W_1r)β = β

Hence b_r is conditionally unbiased for any γ̂ and r, and so is also unconditionally unbiased. The variance of b_r conditional on γ̂ is

(30) var(b_r|γ̂) = σ_u²[Z₂ + rγ̂Z₁]⁻¹[Z₂ + (r²γ̂²/γ)Z₁][Z₂ + rγ̂Z₁]⁻¹

Although expressions like (30) are frequently employed to characterize the efficiency of (29), their use is incorrect because γ̂ is a random variable estimated in the process. The correct matrix is the expectation of (30) taken over γ̂.

The r-class is interesting because it includes other major estimators as special cases. Error components (EC) has r = 1. Covariance (CV) has r = 0. GLS has r = 1 and γ̂ = γ. Finally REC, which is specified below, is a member of this class.

Since every member of the r-class is unbiased, we develop REC by minimizing the variances. In order to proceed we need additional assumptions about Z₁ and Z₂. For the development of estimates we employ an overly strict assumption and subsequently relax it. Our final assumptions are more specific than we would desire, but we believe that other analysis, including Monte Carlo analysis, will show the particular estimator to be more efficient than alternatives in any real case. This is because there is substantial breadth to what is employable in our final assumptions.
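The r-class (29) reduces to a short function; a sketch reusing the names of the previous fragment (r = 1 gives EC, r = 0 gives CV):

```python
import numpy as np

def r_class(r, gamma_hat, Z1, Z2, b1, b2):
    """r-class estimator (29): a matrix-weighted average of b2 and b1."""
    A = Z2 + r * gamma_hat * Z1
    return np.linalg.solve(A, Z2 @ b2 + r * gamma_hat * Z1 @ b1)

b_ec = r_class(1.0, gamma_hat, Z1, Z2, b1, b2)   # error components
b_cv = r_class(0.0, gamma_hat, Z1, Z2, b1, b2)   # covariance (equals b2)
```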
7. Relation Between Z₁ and Z₂

For this discussion we consider three different factors entering Z₁ and Z₂. These matrices have different numbers of underlying observations; those observations have different variation; and the pattern of variation is different. The first two elements, if considered without the third, lead to our two matrices being proportional to each other.

Consider the number of observations: Z₁ derives from P₁ and has N state means, while Z₂ (from P₄) has N(T−1) deviations from those means. Hence Z₂ has (T−1) times the observations of Z₁ and will be (T−1) times as large from this factor alone.

Second, we observe that often variation between states is larger (say m times as large) than variation within states, so that each observation making up Z₁ will be m times as large as one in Z₂. Combining these two elements, Z₁ is m/(T−1) = λ times the size of Z₂.

The third source of variation is the "pattern of variation." Thus two variables in X may be very collinear within states (with large cross product in Z₂), but unrelated between states. One very important reason for employing error components estimation rather than covariance is the full use of this information source.

We temporarily restrict ourselves to the first two sources of variation and make this assumption:

A1   Z₁ is proportional to Z₂:

(31) Z₁ = λZ₂

8. Optimization of r and Efficiency Comparison

Rather than looking directly at the variance of b_r, we define the proportion by which each variance matrix element exceeds the corresponding element in the variance of the GLS estimator. Our assumption A1 makes this relative variance (RV) a constant scalar for each matrix. Thus for GLS under A1:

(32) var(b_g) = σ_u²(1 + λγ)⁻¹ Z₂⁻¹

For the r-class:

(33) var(b_r|γ̂) = [1 + RV(b_r|γ̂)] var(b_g)

(34) RV(b_r|γ̂) = (1 + λγ)(1 + λr²γ̂²/γ)/(1 + λrγ̂)² − 1 = λγ(1 − rγ̂/γ)²/(1 + λrγ̂)²

For CV we set r = 0 and for EC we set r = 1:

(35) CV: RV(b₂) = λγ

(36) EC: RV(b_e|γ̂) = λγ(1 − γ̂/γ)²/(1 + λγ̂)²

Our goal now is to minimize (34) with respect to r. The resulting estimator will have its RV compared with (35) and (36). Clearly we remain interested in the expected value of (34) and (36) over the distribution of γ̂.

Examine the denominator of (34). Employing the inequality at (3), we have σ₁² ≥ σ_u², or γ ≤ 1. Because the true parameter satisfies this constraint, we will apply it also to the estimate. Thus (25) is replaced by

(37) γ̂ = σ̂_u²/σ̂₁²   if σ̂_u² ≤ σ̂₁²
     γ̂ = 1           if σ̂₁² < σ̂_u²

This constraint is widely employed in current estimation by error components. Given this constraint:

(38) 1 ≤ 1 + λrγ̂ ≤ 1 + λr

In addition, from the discussion of section 7 we know λ to be fairly small in general, and below we set r ≤ 1. Thus the bounds provided by the inequality are relatively narrow. We note however that under the relaxed assumptions below the counterpart of λ may be greater than 1. Employ (38) with (34):

(39) RV(b_r|γ̂) < λγ(1 − rγ̂/γ)²
     RV(b_r|γ̂) > λγ(1 − rγ̂/γ)²/(1 + λr)²

A similar pair of inequalities holds for EC when we set r = 1 in (39).

Rather than minimizing the relative variance over r, we will minimize each limit as given by (39). Consider now the distribution of γ̂/γ. Define its mean and second moment:

(40) μ = E(γ̂/γ),  σ² + μ² = E(γ̂/γ)²

Then take the expectation of (39):

(41) RV(b_r) = E[RV(b_r|γ̂)] < λγ[1 − 2rμ + r²(σ² + μ²)]
     RV(b_r) > λγ[1 − 2rμ + r²(σ² + μ²)]/(1 + λr)²

The optimal r values for the upper and lower limits are:

(42) r₁ = μ/(σ² + μ²)
     r₂ = (μ + λ)/(σ² + μ² + λ)

Since we wish to minimize the largest possible variance, we place greatest weight on the upper limit value, r₁. Also we note that as λ approaches zero, r₂ goes to r₁, and that r₁ is free of any dependence on λ.

Thus REC, the revised error components estimator, is defined as the member of the r-class with r = r₁. The estimator is called b*. This value in principle can be determined whether or not A1 is true, and we advocate its use in general. As previously indicated, we shall below indicate other cases where it is optimal.
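Given the first two moments of γ̂/γ, the optimal values in (42) are immediate; a minimal sketch (function and argument names are illustrative):

```python
def optimal_r(mu, sigma2, lam=0.0):
    """Optimal r for the bounds in (41): r1 for the upper limit, r2 for the lower (42)."""
    m2 = sigma2 + mu**2               # sigma^2 + mu^2 = E(gamma_hat/gamma)^2
    r1 = mu / m2                      # free of any dependence on lambda
    r2 = (mu + lam) / (m2 + lam)
    return r1, r2
```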
Under assumption A1 we find these conclusions concerning EC, CV, and REC:

C1.  REC is more efficient than CV.

C2.  REC is known to be more efficient than EC in this sense:
     a. The upper limit of the inequality (41) for REC is below the upper limit for EC.
     b. The lower limit of the inequality is below the lower limit for EC as long as λ² < σ² + μ². Since μ > 1 in most cases, this allows a very large λ.
     c. RV(b*) is strictly less than RV(b_e) by (41) if (1 + λ)² < σ² + μ².
     We believe in practice b* will always be superior to b_e.

C3.  EC is more efficient than CV if σ² + μ² < 2μ. It is less efficient if σ² + μ² > 2(μ + λ) + λ². Intermediate cases are uncertain.

Following section 10 we provide a specific simple example so one can see the numerical efficiency gain.

9. Distribution of γ̂

The preceding results are specific and useful for any distribution of γ̂. We now turn to the specific distribution that arises from our estimates (25) and (37). The distribution stated below (25) depends on the original assumption that all errors have normal distributions. In this case γ̂/γ was F with n and q degrees of freedom. For the present we ignore the restriction that γ̂ ≤ 1 but return to it later. We recall that n, defined in (24), depends on NT and will in almost every case be fairly large. q, on the other hand, is N − 1 − k and can be very small. For example, one could have N = 7, k = 5, so q = 1. Indeed an important reason for later considering other estimates of σ₁² is that q here can be zero, so the estimate would not exist.

For this F distribution we have (for q > 4):

     μ = q/(q − 2)
     σ² + μ² = q²(n + 2)/[(q − 2)(q − 4)n]

(43) r₁ = (q − 4)n/[q(n + 2)]

We see that the mean of F becomes infinite when q = 2 and σ² does when q = 4.

In (38) we found that the fact that γ̂ ≤ 1 was very useful. It plays an extremely important role here, too. Under (37), σ̂₁² ≥ σ̂_u², so 0 ≤ γ̂ ≤ 1 and 0 ≤ γ̂/γ ≤ 1/γ. Thus the range of γ̂ is finite and all moments therefore must exist. The distribution is now a "truncated F" [F(n, q, F*)] with (n, q) degrees of freedom and an upper limit F*. By this we mean the corresponding F for values 0 ≤ F < F*, with a "spike" at F* such that P(F = F*) = P(F ≥ F*). In our situation F* = 1/γ. Since γ is an unknown parameter we cannot know F* or the resulting mean or variance of F, but we can examine them conditional on various F*.

The moments of F(n, q, F*) were found by numerical integration and appear in Table 1. n → ∞ was employed since, in general use, n will be large. The truncation reduces μ and (σ² + μ²) and also increases r. It is clear that a prior point estimate or distribution of γ, and hence F*, is required to select the most efficient estimator, and the reader may consider the optimal use of such prior information. We propose that a broad rule of thumb be employed and will show where it is better than alternatives. The rule is

(44) r = (q + 4)n/[(q + 11)(n + 2)]   if q < 15
     r = (q − 4)n/[q(n + 2)]          if q ≥ 15

This rule was devised as a simple way to approximate Table 1 when F* = 3 and to include the effect of n. It may be shown that it leads to a smaller relative variance than EC for the values in the table as long as either F* ≤ 5 or q ≥ 4. REC is now defined as the member of the r-class with r defined as in (44). Two conclusions can be derived.

C4.  REC is superior to both EC and CV if F* ≤ 5 or q ≥ 4. If F* > 5 and q ≤ 3 then the r-class should be employed with an appropriate (small) value of r.

C5.  EC is superior to CV if: F* = 2 for any q; F* = 3 and q ≥ 4; F* = 5 and q ≥ 7; F* > 5 and q ≥ 8. This is found by direct application of C3 to Table 1.

Clearly the small sample properties are very dependent on the true unknown γ.
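A sketch of the computation behind Table 1 and the rule of thumb (44) follows (it assumes scipy is available; the spike at F* carries the probability P(F ≥ F*), as defined above, and the printed cell should approximately reproduce the q = 5, F* = 3 entries of Table 1):

```python
from scipy import stats
from scipy.integrate import quad

def truncated_f_moments(n, q, f_star):
    """Mean, second moment, and r = mu/(sigma^2 + mu^2) of the truncated F."""
    F = stats.f(n, q)
    spike = F.sf(f_star)                                    # P(F >= F*)
    m1 = quad(lambda x: x * F.pdf(x), 0, f_star)[0] + f_star * spike
    m2 = quad(lambda x: x**2 * F.pdf(x), 0, f_star)[0] + f_star**2 * spike
    return m1, m2, m1 / m2

def rule_44(n, q):
    """Rule of thumb (44)."""
    if q < 15:
        return (q + 4) * n / ((q + 11) * (n + 2))
    return (q - 4) * n / (q * (n + 2))

# large n stands in for the n -> infinity used in Table 1;
# the q = 5, F* = 3 cell of Table 1 is (1.39, 2.60, .53)
print(truncated_f_moments(10_000, 5, 3.0))
print(rule_44(98, 5))   # the example value .551 used in section 10
```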
Monte Carlo studies should reach various conclusions according to the variances selected in advance for analysis. It is important to note that in Table 1 the biggest effect on r is created by F* and not by the degrees of freedom. This implies intuitively that the results will not materially be altered if the underlying distribution is different from F, as long as the truncation is present.

Table 1
Truncated F Distribution: Mean, Variance, and Optimal r
(numerator degrees of freedom n → ∞; q = denominator degrees of freedom)

                               Truncation F*
 q                   2       3       5      10       ∞
 1   μ            1.44    1.92    2.69    4.14       *
     σ² + μ²      2.54    4.90   11.01   32.35       *
     r             .57     .39     .24     .13       *
 2   μ            1.35    1.68    2.13    2.78       *
     σ² + μ²      2.23    3.87    7.41   16.82       *
     r             .60     .43     .29     .16       *
 3   μ            1.30    1.55    1.83    2.16    3.00
     σ² + μ²      2.05    3.29    5.52   10.10       *
     r             .63     .47     .33     .21       *
 4   μ            1.26    1.46    1.65    1.82    2.00
     σ² + μ²      1.93    2.89    4.36    6.72       *
     r             .65     .51     .38     .27       *
 5   μ            1.24    1.39    1.52    1.62    1.67
     σ² + μ²      1.84    2.60    3.59    4.87    8.33
     r             .67     .53     .42     .33     .20
 6   μ            1.22    1.34    1.43    1.49    1.50
     σ² + μ²      1.77    2.38    3.05    3.78    4.50
     r             .69     .56     .47     .39     .33
 7   μ            1.20    1.30    1.37    1.40    1.40
     σ² + μ²      1.71    2.20    2.67    3.10    3.27
     r             .70     .59     .51     .45     .42
 8   μ            1.19    1.27    1.32    1.33    1.33
     σ² + μ²      1.65    2.06    2.39    2.66    2.67
     r             .72     .62     .55     .50     .50

Note: r = μ/(σ² + μ²); * denotes an infinite value.

We have at this point developed the REC estimator and have shown it to be broadly superior to EC and to CV. It remains to expand the analysis in three ways: first, to relax the initial assumption A1; second, to consider other estimators of σ₁² and σ_u²; and third, to present the estimators for a three component model.

10. Alternative Assumptions about Z

The analysis above depends on A1 and we now discuss a variety of means by which comparable results may be achieved. While the family ultimately outlined here is not all inclusive, it contains a wide variety of different kinds of situations. Initially we will state the major alternative assumption and then, since its meaning may not be instantly clear, we will consider specific situations under which it will arise.

For the matrices Z₁ or Z₂ we may write eigenvectors and eigenvalues in a matrix equation, e.g.

     Z₂A₂ = A₂Λ₂

where A₂ is an orthogonal matrix of eigenvectors and Λ₂ a diagonal matrix of eigenvalues. A₂ may not be unique because its columns may be permuted along with the elements of Λ₂, and if there are multiple eigenvalues, there are multiple solutions. We assume the following:

A2   For the matrices Z₂ and Z₁ there exist matrices of eigenvectors A₂ and A₁ such that A₂ = A₁ = A. It is permissible to perform a single variance transformation on both Z matrices before finding eigenvectors.

One way of achieving this is assumption A1 above. Another direct way is the following:

A3   Z₁ and Z₂ are diagonal matrices.

This would arise when all the Xs are uncorrelated both within states and between states. Incidentally, this clearly includes the case where there is a single regressor.

Other possibilities are best approached by first performing a variance transformation on Z₁ and Z₂ as mentioned in the assumption. Form the diagonal matrix Q such that each diagonal element is the square root of the reciprocal of the corresponding diagonal element of Z₂, i.e.

(45) Q_ii = Z₂,ii^(−1/2),  Q_ij = 0 for i ≠ j

Define

(46) Z₂* = QZ₂Q,  Z₁* = QZ₁Q

It is clear that by construction the diagonal of Z₂* consists of ones. We assume:

A4   The diagonal elements of Z₁* are equal to each other. The implication of this is that variances in Z₁ are proportional to those in Z₂.

This transformation will be made before A2 is applied.
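The transformation (45)–(46) and a check of A2 can be sketched as follows (a minimal illustration; for symmetric positive definite matrices, A2 holds exactly when the transformed matrices commute, since commuting symmetric matrices are simultaneously diagonalizable):

```python
import numpy as np

def variance_transform(Z2, Z1):
    """Apply Q of (45) to both matrices, as in (46)."""
    Q = np.diag(1.0 / np.sqrt(np.diag(Z2)))
    return Q @ Z2 @ Q, Q @ Z1 @ Q

def satisfies_A2(Z2s, Z1s, tol=1e-10):
    """A2 (common eigenvectors) holds iff the transformed matrices commute."""
    return np.allclose(Z2s @ Z1s, Z1s @ Z2s, atol=tol)
```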
Its use, as seen below, is to transform both X and β without changing the content of the analysis.

Along with A4, any one of the following is sufficient for A2:

A5   There are two explanatory variables in the model (k = 2).

A6   Z₂* and Z₁* each have a single constant everywhere off the main diagonal (but these two constants may be different).

A7   Z₂* and Z₁* are each circular symmetric matrices.

A8   Z₂* and Z₁* are each tridiagonal matrices.

See Press [9] for specific discussion of the latter two assumptions. Another possibility is that Z₁ and Z₂ are each conformably block diagonal with separate blocks satisfying different assumptions above. In general terms these assumptions say that one makes a single variance transformation to Z₁ and Z₂ and the results are pattern matrices of a single type but with different coefficients.

Now we proceed to assume A4 and A2 are consecutively employed:

(47) Z₂*A = (QZ₂Q)A = AΛ₂
     Z₁*A = (QZ₁Q)A = AΛ₁

Define M as the reciprocal square root of Λ₂ so that:

(48) M²Λ₂ = I

Reconsider the original model (1):

(49) y = β₀i + Xβ + ε = β₀i + XQAM(QAM)⁻¹β + ε

Define

(50) δ = (QAM)⁻¹β

so that

(51) y = β₀i + (XQAM)δ + ε

We will now examine the variance matrix of the transformed coefficient vector δ. This will be minimized in the sense that alternative estimators will have variance matrices which differ by a positive semi-definite matrix, indicating the results will also hold for estimates of β. The transformation QAM alters the matrices Z₁ and Z₂. These become:

(52) Z₂** = MA'QZ₂QAM = MΛ₂M = I
     Z₁** = MA'QZ₁QAM = MΛ₁M = Λ₁Λ₂⁻¹ = Λ

Λ is a diagonal matrix whose i-th diagonal element is the ratio of the eigenvalue of Z₁ to the corresponding eigenvalue of Z₂:

(53) λᵢ = λ₁ᵢ/λ₂ᵢ

We proceed directly to the variance of b_r (30), where b_r is now an estimate of δ:

(54) var(b_r|γ̂) = σ_u²[I + rγ̂Λ]⁻¹[I + (r²γ̂²/γ)Λ][I + rγ̂Λ]⁻¹

Clearly all off-diagonal elements (covariances) are zero and the diagonal elements may be written:

(55) var(b_ri|γ̂) = σ_u²[1 + λᵢrγ̂]⁻²[1 + (r²γ̂²/γ)λᵢ]

Previously this depended on a scalar λ for the entire matrix, but now it depends on λᵢ. Relative variance continues to be as in (33) but is unique for each parameter in δ:

(56) RV(b_ri|γ̂) = λᵢγ(1 − rγ̂/γ)²/(1 + λᵢrγ̂)²

This result is identical to (34) except that the subscript is added to λ. Our justification for choosing the upper limit value in (42) to determine r is strengthened because we are free of dependence on λ. All subsequent results hold in their entirety but now apply to each parameter being estimated. C1 can be restated as follows: the variance of the CV estimator of δ exceeds that of the REC estimator by a positive semi-definite matrix. Since this difference is true for estimators of δ, it is also true of the comparable estimators of β. Each other conclusion has a comparable interpretation, showing in particular REC to be better than the alternatives. We should note that in some cases λᵢ may be fairly large, leading to wide inequalities in (41) but also to larger relative efficiency differences.

An example is presented to make the potential efficiency gain more concrete. This example falls under A4 and A5:

     X'P₄X = Z₂ = [ 16  −36]
                  [−36  100]

     X'P₃X = Z₁ = [  4    8]
                  [  8   25]

The only restriction used is that variances are proportional, i.e. 16/100 = 4/25. The example was selected so the pattern of variation differs substantially; that is, the cross products are large negative and positive numbers (−36 and +8).
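The computation reported next can be verified with a few lines; a sketch using the example's matrices:

```python
import numpy as np

Z2 = np.array([[16., -36.], [-36., 100.]])
Z1 = np.array([[4., 8.], [8., 25.]])

Q = np.diag(1.0 / np.sqrt(np.diag(Z2)))     # diag(.25, .10), as in (45)
Z2s, Z1s = Q @ Z2 @ Q, Q @ Z1 @ Q

lam2, A = np.linalg.eigh(Z2s)               # eigenvalues .1 and 1.9
lam1 = np.diag(A.T @ Z1s @ A)               # .45 and .05 (same eigenvectors, A2)
print(lam1 / lam2)                          # lambda_i of (53): 4.5 and .0263
```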
The variance transformation is performed and the eigenvectors found:

     Q = [.25   0 ]        A = (1/√2) [1   1]
         [ 0   .10]                   [1  −1]

The eigenvalues are:

     λ₂ᵢ = [.1, 1.9],  λ₁ᵢ = [.45, .05]
     λ₁ = 4.5,  λ₂ = .02631

Assume also γ = 1/3, q = 5, n = 98. Then by (44), r = .551.

For δ₁:   RV(CV) = 1.5
          RV(EC) ≤ 1.5(.82) = 1.23
          RV(REC) ≤ 1.5(.26) = .39

For δ₂:   RV(CV) = .0088
          RV(EC) ≤ .0072
          RV(REC) ≤ .0022

The figures indicate that there is little difference among the three estimators relative to δ₂ but substantial difference for δ₁. For δ₁, CV has variance that is 150% larger than GLS with known variance matrix, while EC has variance up to 123% larger and REC up to 39% larger. Thus the variance of CV is 12% greater than that of EC and 80% greater than that of REC. For δ₂ the ordering is the same, but REC is only known to be 0.66% superior to CV.

This concludes the discussion of alternative assumptions about Z. We turn in the next section to alternative variance estimates.

11. Alternative Variance Estimates

The estimates above of σ₁² and σ_u², (18) and (24), are not the only such estimates. Arora [1] employs those given here. Fuller and Battese [3, 4] use σ̂_u², but for σ₁² employ the estimator below, while Wallace and Hussain [11] use estimates which are each different. We will consider the merits of the major alternative forms.

Recall that σ̂₁² and σ̂_u² are advantageous because they are each chi square with q and n degrees of freedom respectively. Their disadvantage, which we will show, is that they "lose" more degrees of freedom than the model requires and hence are, in certain cases, inefficient. This is of importance particularly when there are few (or zero) degrees of freedom.

In the following we need some known results concerning quadratic forms.

1. Assume x is N(0, σ²I) and A is idempotent of rank r. Then Q = x'Ax is σ² times a chi square(r) variable. Its cumulants are [6, p. 168]

(57) κ_s(Q) = C_s r,  where C_s = 2^(s−1)(s−1)!

The cumulants convey the same information as moments [5, p. 20]. The first four are κ₁ = μ₁' (the mean), κ₂ = μ₂, κ₃ = μ₃, and κ₄ = μ₄ − 3μ₂², where μ₂, μ₃, and μ₄ are the second, third, and fourth moments about the mean. All cumulants for s ≥ 3 in a normal distribution are zero.

2. Assume x ~ N(0, V) and A is a general symmetric matrix. Then Q = x'Ax has the following cumulants [7, p. 153]:

(58) κ_s(Q) = C_s Σᵢ λᵢˢ = C_s tr (AV)ˢ

where the λᵢ are the eigenvalues of AV and tr (AV)ˢ is the trace of its s-th power.

3. Assume x ~ N(0, V) and A and B are general symmetric matrices. Then covar(x'Ax, x'Bx) = 2 tr (AVBV). This is obvious from the variance of x'(A + B)x.

Now we will examine variance estimates of the type employed by Wallace and Hussain [11]. First employ ordinary least squares to estimate (1) and find the residual vector e:

     e = Mε = [I − (1/NT)J − X(X'X)⁻¹X']ε

Define

     Q₃° = e'P₃e = ε'MP₃Mε
     Q₄° = e'P₄e = ε'MP₄Mε

The distributions of Q₃° and Q₄° depend on

     Q₃ = P₃MVMP₃,  Q₄ = P₄MVMP₄

where these are symmetric matrices which serve the role of AV in (58). First examine Q₄:

     Q₄ = σ_u²(P₄ − K₄°) + (σ₁² − σ_u²) P₄X(X'X)⁻¹X'P₃X(X'X)⁻¹X'P₄

where

     K_i° = P_iX(X'X)⁻¹X'P_i
     K_i = (X'X)⁻¹X'P_iX
     k_i = tr K_i = tr K_i°
     K_ij = K_iK_j, etc., for any sequence of subscripts

and k_{i^r j^s} is k with r subscripts each equal to i followed by s subscripts each equal to j. K_i = (Z₁ + Z₂)⁻¹Z_i and is non-negative definite. Hence

     0 ≤ k_i ≤ k;  k₃ + k₄ = k;  k₃₃ + k₃₄ = k₃, etc., for any complementary pair

Employing eigenvalues one may show k₃₃ ≥ k₃²/k and k₃₄ ≤ k₃k₄/k.
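The cumulant formula (58) is used repeatedly in what follows; as a compact reference, a sketch:

```python
import numpy as np
from math import factorial

def quad_form_cumulant(A, V, s):
    """kappa_s of x'Ax with x ~ N(0, V), by (58): C_s tr((AV)^s)."""
    C_s = 2 ** (s - 1) * factorial(s - 1)
    return C_s * np.trace(np.linalg.matrix_power(A @ V, s))
```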
We may proceed further under the assumptions employed above.

Under A1:  k_{3^r 4^p} = μ^r (1 − μ)^p k,   with μ = λ/(1 + λ)

Under A2:  k_{3^r 4^p} = Σᵢ μᵢ^r (1 − μᵢ)^p,   with μᵢ = λᵢ/(1 + λᵢ)

12. Estimate of σ_u²

Define

(59) σ_u*² = Q₄°/m₄

(60) m₄ = N(T − 1) − k₄

(61) E σ_u*² = σ_u²(1 + y k₃₄/m₄)

(62) y = (σ₁²/σ_u²) − 1

(63) var σ_u*² = (2σ_u⁴/m₄)[1 − (1/m₄){k₃₄ − 2y k₃₄₃ − y² k₃₄₃₃}]

(61) and (63) show the dependence of this estimator on y and therefore on σ₁²/σ_u². Hence we must assume y is small. Given this and the small size of k₃₄, we note that σ_u*² is approximately unbiased and the bracketed term in the variance equation is approximately 1. If y is small the cumulants are:

(64) κ_s(σ_u*²) = C_s σ_u^(2s) m₄^(1−s) [1 − (1/m₄)(k − k_{3^s}) + O(k₃₄/m₄)]

where k_{3^s} is k with s subscripts each equal to 3. The last term signifies terms of order at most k₃₄/m₄. Deleting that term,

(65) κ_s(σ_u*²) ≤ C_s σ_u^(2s) m₄^(1−s)

which are the cumulants of a chi square with m₄ degrees of freedom. Hence σ_u*² is superior, in the sense of cumulants, to such a distribution and hence also to σ̂_u².

The smallness of y can be made more specific by employing A2. In that case

(66) κ_s(σ_u*²) = C_s σ_u^(2s) m₄^(1−s) [1 + (1/m₄){−k + Σᵢ μᵢˢ(1 + y − yμᵢ)ˢ}]

This is superior to χ²(m₄) if the term in braces is negative for all s. In turn this is true if each element in the sum is less than or equal to 1, or

     1 ≥ μᵢ(1 + y − yμᵢ),   i.e.  y ≤ 1/μᵢ

This is also a necessary condition. In our example above λᵢ = (4.5, .026), so μᵢ = (.82, .026); thus y ≤ 1.22, or σ₁² ≤ 2.22σ_u², or γ ≥ .45. In every case under A2, σ₁² ≤ 2σ_u² will suffice, so that all cumulants of σ_u*² are less than those of σ̂_u².

In many cases under A1 or A2, the largest λ will be, say, 1/10. In this case μ = 1/11 and y ≤ 11 is sufficient for σ_u*² to be superior to σ̂_u². Therefore we conclude that σ_u*² is the superior estimator in cumulants if either σ₁²/σ_u² is small or the largest root λ is small.

Despite the advantages of σ_u*², the gain will in almost every case be small because there are many degrees of freedom. Hence it seems wise to generally employ σ̂_u².

13. Estimates of σ₁²

By the same approach,

     Q₃ = σ₁²[(I − (1/N)J) ⊗ (1/T)J − 2K₃° + K₃₃°] + σ_u²[K₃° − K₃₃°]

Define

(67) σ₁*² = [Q₃° − σ̂_u² k₃₄]/m₃

(68) m₃ = N − 1 − k + k₄₄

This is the estimator of Fuller and Battese. For simplicity one could employ Q₃°/(N − 1) or Q₃°/m₃, but these require assumptions that σ_u², etc., are small. Cumulants can be found generally:

(69) κ_s(σ₁*²) = C_s σ₁^(2s) m₃^(1−s) [1 − (1/m₃)(k₄₄ − k_{4^s}) + O(γk₃₄/m₃)]

The last term depends on m₄, which may not be very small, but it also carries γ = σ_u²/σ₁², which is always less than one. (69) includes the effect of σ̂_u². To make (69) clearer we now employ A2. Under it:

(70) κ_s(σ₁*²) = C_s σ₁^(2s) m₃^(1−s) [1 − (1/m₃) Σᵢ (1 − μᵢ)² {1 − (1 − μᵢ)^(s−2) (1 − μᵢ + γμᵢ)ˢ} + (1/m₃)(N(T−1) − k)^(−s) (−γk₃₄)ˢ]

(71) κ_s(σ₁*²) ≤ C_s σ₁^(2s) m₃^(1−s)

The inequality in (71) is true for s ≥ 3 without any additional assumptions. When s = 2, the requirement is that γ be a small amount less than one. One form of requirement for (71) to hold is:

     2[N(T − 1) − k] ≥ k/(1 − γ)^(1/2)

For example, if γ = .99 and k = 4, then (71) is true if N(T − 1) ≥ 24.
This depends on the fact that at least one μᵢ ≤ 1/2. Our real interest is var(σ₁*²), and one can show var σ₁*² ≤ var σ̂₁² if N(T − 1) − k ≥ k/4, which will of course always be true.

We have shown that σ₁*² is cumulant superior to a chi square with m₃ degrees of freedom as well as to σ̂₁², and hence it should always be employed. Clearly there will be some cases with small N where σ₁*² exists and the alternative does not. We will employ the chi square distribution as an approximation as we return to REC, but we note that the truncation of the F distribution means the exact distribution is not required in any event. Thus we propose that REC be employed as above but with γ̂* = σ̂_u²/σ₁*² replacing γ̂. The new degrees of freedom are used in defining r (m₃ replaces q).

In this section we have shown that the estimate of σ₁² of Fuller and Battese is superior to that of Arora, but that both serve well in the REC estimator.

14. Three Component REC Model

Now consider the original model with variance matrix given by (4). The analysis was developed with σ₂² = σ_u², but that assumption is now dropped and the third component restored. We employ four transformations defined as in section 3:

(72) P₃₅ = P₃P₅    P₃₆ = P₃P₆
     P₄₅ = P₄P₅    P₄₆ = P₄P₆

One may show P₃₅ + P₄₅ = P₅; P₃₆ + P₄₆ = P₆; P₃₅ + P₃₆ = P₃; etc. When the model is transformed by P₃₅ we have the grand mean for all observations, which is zero within X. P₄₅ yields mean values for each time period; P₃₆ yields mean values for each state; and P₄₆ yields deviations from both the state mean and the time mean. When regressions are performed on each transformed model in turn:

     P₃₅ yields only the estimate of β₀.
     P₄₆ yields b₄ and a new σ̂_u², which has [(N − 1)(T − 1) − k] degrees of freedom.
     P₃₆ yields b₁ and σ̂₁², which are identical to b₁ and σ̂₁² above.
     P₄₅ yields b₅ and σ̂₂², where the latter is chi square with T − 1 − k degrees of freedom.

We could also define σ₁*² and σ₂*² as was done in the preceding section, with the same implications. All of the estimators under consideration are now weighted averages of b₄, b₁, and b₅. Define the following:

     γ₁ = σ_u²/σ₁²,  γ₂ = σ_u²/σ₂²

     Z₄ = X'P₄₆X
     Z₅ = X'P₄₅X
     Z₁ = X'P₃₆X = X'P₃X
     Z₄ + Z₅ = Z₂

     W₀ = Z₄ + r_a γ̂₁Z₁ + r_b γ̂₂Z₅
     W₄ = W₀⁻¹Z₄
     W₁ = r_a γ̂₁ W₀⁻¹Z₁
     W₅ = r_b γ̂₂ W₀⁻¹Z₅

Then the r-class estimator is defined as:

(73) b_r = W₄b₄ + W₁b₁ + W₅b₅

This estimator is again identical to the a-class of Swamy and Arora. Indeed the entire seven parameter a-class can be expressed as part of the two parameter r-class if we set:

(74) r_a = a₇/[a₆(a₁ + a₂)],  r_b = a₃/[a₆(a₄ + a₅)]

Although the a notation was desirable for the purposes of that article, the r notation is simpler and more sensible for general estimation purposes.

There are many specific estimators available for a three component model such as this one. For each of the two ratios γ₁ and γ₂ we may employ EC, CV, REC, or omit the component and return to a two component model. These imply, respectively, that r_a = 1, r_a = 0, r_a = r_a*, or r_a γ̂₁ = 1. Since the same four options are available for the two components, there are 4 × 4 = 16 different specifications clearly available.

The variance of b_r is:

(75) var(b_r|γ̂₁, γ̂₂) = σ_u² W₀⁻¹[Z₄ + (r_a²γ̂₁²/γ₁)Z₁ + (r_b²γ̂₂²/γ₂)Z₅]W₀⁻¹

We now proceed directly to an assumption about Z similar to that made above:

A9   After a single variance transformation Q is applied to Z₁, Z₅, and Z₄, such that Z₁* = QZ₁Q etc., there exist eigenvector matrices for the transformed Z that are identical.

This is applied as above to transform the parameter vector to a new vector δ. The eigenvalues are contained in Λ₄, Λ₁, and Λ₅.
Define

     λ_ai = λ₁ᵢ/λ₄ᵢ,  λ_bi = λ₅ᵢ/λ₄ᵢ

Then the relative variance of b_ri may be shown to be:

(76) RV(b_ri|γ̂₁, γ̂₂) = Q*/(1 + λ_ai r_a γ̂₁ + λ_bi r_b γ̂₂)² ≤ Q*

     Q* = λ_ai γ₁(1 − r_a γ̂₁/γ₁)² + λ_bi γ₂(1 − r_b γ̂₂/γ₂)² + λ_ai λ_bi γ₁γ₂(r_a γ̂₁/γ₁ − r_b γ̂₂/γ₂)²

Here μ₁ and σ₁² are the mean and variance of (γ̂₁/γ₁); μ₂ and σ₂² are the mean and variance of (γ̂₂/γ₂); and σ₁₂ is their covariance. (In this passage σ₁² and σ₂² denote moments of these ratios, not the variance components of section 2.) If σ̂_u², σ̂₁², and σ̂₂² are respectively chi square with n°, q₁, and q₂ degrees of freedom, then γ̂₁ and γ̂₂ are each F with a common numerator but independent denominators. If these distributions are not truncated, then:

(77) σ₁₂ = (2/n°) μ₁μ₂

This value will be small when n° is large. We minimize Q* over r_a and r_b. The optimal value r_a* after some computation is found to be:

(78) r_a* = r₁[1 + λ_bi γ₂{1 − r_b*(μ₂ + σ₁₂/μ₁)}]

r₁ is the optimal r value for the two component model defined in (42). A similar equation holds for r_b*. It is seen from this that r_a* is approximated by r₁ if (λ_bi γ₂) is small or if q₂ is not small (so that r_b* is approximately one). Because this approximation exists, we define the three component REC estimator as the r-class with r_a = r₁ and r_b = r₂, the two component values computed from q₁ and q₂ respectively.

It remains for us to contrast the efficiency of the alternative estimators for the three component model. We indicated below (74) the variety of estimation methods available. The selection of estimation method depends on the true values of γ₁ and γ₂. Hence the most efficient method is two components if the corresponding γ is close to one. CV is advisable if q is very small or λ large, and EC is good if q is large and γ not close to one. The most efficient estimator in every case is REC with appropriately chosen r. In general use one may choose r by the rules given in (42), but shift to another estimator in the special cases indicated by the previous paragraph. The general use of any other estimator is clearly inefficient.

We have developed the r-class of estimators and have shown that a specific member, REC, should be used in essentially all error component problems. In the process we have developed substantial information about small sample properties of error components, covariance, and REC estimators. Further theoretical and Monte Carlo analysis should broaden the specific assumptions and results.

References

[1]  Arora, S. S., "Error Components Regression Models and Their Applications," Annals of Economic and Social Measurement, 2, October 1973, 451-61.

[2]  Balestra, P. and M. Nerlove, "Pooling Cross Section and Time Series Data in the Estimation of a Dynamic Model: The Demand for Natural Gas," Econometrica, 34, July 1966, 585-612.

[3]  Fuller, W. A. and G. E. Battese, "Transformations for Estimation of Linear Models with Nested-Error Structure," Journal of the American Statistical Association, 68, September 1973, 626-32.

[4]  Fuller, W. A. and G. E. Battese, "Estimation of Linear Models with Crossed-Error Structure," Journal of Econometrics, 2, March 1974.

[5]  Johnson, Norman L. and Samuel Kotz, Discrete Distributions, Boston, 1969.

[6]  Johnson, Norman L. and Samuel Kotz, Continuous Univariate Distributions - 1, Boston, 1970.

[7]  Johnson, Norman L. and Samuel Kotz, Continuous Univariate Distributions - 2, Boston, 1970.

[8]  Nerlove, M., "Further Evidence on the Estimation of Dynamic Economic Relations from a Time Series of Cross Sections," Econometrica, 39, March 1971, 359-82.

[9]  Press, S. James, Applied Multivariate Analysis, New York, 1972.

[10] Swamy, P. A. V. B. and S. S.
Arora, "The Exact Finite Sample Properties of the Estimators of Coefficients in the Error Components Regression Models," Econometrica, 40, March 1972, 253-60.

[11] Wallace, T. D. and A. Hussain, "The Use of Error Components Models in Combining Cross Section with Time Series Data," Econometrica, 37, January 1969.